Commit Graph

241 Commits

Author SHA1 Message Date
Dawid Medrek
96509c4cf7 db/hints: Make sync points be created for all hosts when not specified
Sync points are created, via POST HTTP requests, for a subset of nodes
in the cluster. Those nodes are specified in a request's parameter
`target_hosts`. When the parameter is empty, Scylla should assume
the user wants to create a sync point for ALL nodes.

Before these changes, sync points were created only for LIVE nodes.
If a node was dead but still part of the cluster and the user
requested creating a sync point leaving the parameter `target_hosts`
empty, the dead node was skipped during the creation of the sync point.
That was inconsistent with the guarantees the sync point API provides.

In this commit, we fix that issue and add a test verifying that
the changes have made the implementation compliant with the design
of the sync point API -- the test only passes after this commit.

Fixes scylladb/scylladb#9413

Closes scylladb/scylladb#19750
2024-08-07 13:15:20 +02:00
Pavel Emelyanov
dd7c7c301d hints: Const-ify gossiper references and anchor pointers
There are two places in hints code that need gossiper: hist_sender
calling gossiper::is_alive() and endpoint_downtime_not_bigger_than()
helper in manager. Both can live with const gossiper, so the dependency
references and anchor pointers can be restricted to const too.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-07-26 16:28:54 +03:00
Dawid Medrek
8b6e887e02 db/hints: Verify that Scylla limits the concurrency of written hints
In 6e79d64, the behavior of `manager::too_many_in_flight_hints_for()`
was accidentally modified. It remained unnoticed for some time
and then fixed. In this commit, we add a test verifying that
the concurrency of hints being written to disk is indeed limited
and the limitations are imposed properly.
2024-07-18 13:49:29 +02:00
Piotr Dulikowski
3c535641fd Merge 'service/storage_proxy: Add metrics keeping track of incoming hints' from Dawid Mędrek
Although Scylla already exposes metrics keeping track of various information related to hinted handoff, all of them correspond to either storing or sending hints. However, when debugging, it's also crucial to be aware of how many hints are coming to a given node and what their size is. Unfortunately, the existing metrics are not enough to obtain that information.

This PR introduces the following new metrics:

* `sent_bytes_total` – the total size of the hints that have been sent from a given shard,
* `received_hints_total` – the total number of hints that a given shard has received,
* `received_hints_bytes_total` – the total size of the hints a given shard has received.

It also renames `hints_manager_sent` to `hints_manager_sent_total` to avoid conflicts of prefixes between that metric and `sent_bytes_total` in tests.

Fixes scylladb/scylladb#10987

Closes scylladb/scylladb#18976

* github.com:scylladb/scylladb:
  db/hints: Add a metric for the size of sent hints
  service/storage_proxy: Add metrics for received hints
2024-07-08 10:29:53 +02:00
Dawid Medrek
0e1cb0dc73 db/hints: Add logging when ignoring hint directories
In 2446cce, we stopped trying to attempt to create
endpoint managers for invalid hint directories
even when their names represented IP addresses or
host IDs. In this commit, we add logging informing
the user about it.

Refs scylladb/scylladb#19173

Closes scylladb/scylladb#19618
2024-07-04 20:14:52 +03:00
Dawid Medrek
2446cce272 db/hints: Initialize endpoint managers only for valid hint directories
Before these changes, it could happen that Scylla initialized
endpoint managers for hint directories representing

* host IDs before migrating hinted handoff to using host IDs,
* IP addresses after the migration.

One scenario looked like this:

1. Start Scylla and upgrade the cluster to using host IDs.
2. Create, by hand, a hint directory representing an IP address.
3. Trigger changing the host filter in hinted handoff; it could
   be achieved by, for example, restricting the set of data
   centers Scylla is allowed to save hints for.

When changing the host filter, we browse the hint directories
and create endpoint managers if we can send hints towards
the node corresponding to a given hint directory. We only
accepted hint directories representing IP addresses
and host IDs. However, we didn't check whether the local node
has already been upgraded to host-ID-based hinted handoff
or not. As a result, endpoint managers were created for
both IP addresses and host IDs, no matter whether we were
before or after the migration.

These changes make sure that any time we browse the hint
directories, we take that into account.

Fixes scylladb/scylladb#19172

Closes scylladb/scylladb#19173
2024-06-21 15:59:49 +02:00
Dawid Medrek
670830091c db/hints: Use dedicated functions to lock a shared mutex
Seastar has functions implementing locking a `seastar::shared_mutex`.
We should use those now instead of reimplementing them in Scylla.

Closes scylladb/scylladb#19253
2024-06-14 20:31:37 +02:00
Piotr Dulikowski
0b5a0c969a Merge 'hinted handoff: migrate sync point to host ID' from Michael Litvak
Change the format of sync points to use host ID instead of IPs, to be consistent with the use of host IDs in hinted handoff module.
Introduce sync point v3 format which is the same as v2 except it stores host IDs instead of IPs.
The decoding supports both formats with host IDs and IPs, so a sync point contains now a variant of either types, and in the case of new type the translation is avoided.

Fixes #18653

Closes scylladb/scylladb#19134

* github.com:scylladb/scylladb:
  db/hints: migrate sync point to host ID
  db/hints: rename sync point structures with _v1 suffix to _v1_v2
2024-06-13 06:16:00 +02:00
Dawid Medrek
dc41086c57 db/hints: Add a metric for the size of sent hints
In this commit, we add a new metric `sent_total_size`
keeping track of how many bytes of hints a node
has sent. The metric is supposed to complement its
counterpart in storage proxy that counts how many
bytes of hints a node has received. That information
should prove useful in analyzing statistics of
a cluster -- load on given nodes and where it comes
from.

We also change the name of the matric `sent`
to `sent_total` to avoid the conflict of prefixes
between the two metrics.
2024-06-12 18:20:08 +02:00
Michael Litvak
afc9a1a8a6 db/hints: migrate sync point to host ID
Change the format of sync points to use host ID instead of IPs, to be
consistent with the use of host IDs in hinted handoff module.
Introduce sync point v3 format which is the same as v2 except it stores
host IDs instead of IPs.
The encoding of sync points now always uses the new v3 format with host
IDs.
The decoding supports both formats with host IDs and IPs, so a sync point
contains now a variant of either types, and in the case of the new
format the translation from IP to host ID is avoided.
2024-06-11 11:07:00 +02:00
Dawid Medrek
a5528a2093 db/hints: Log when ignoring invalid hint directories
In 58784cd, aa4b06a and other commits migrating
hinted handoff from IPs to host IDs (scylladb/scylladb#15567),
we started ignoring hint directories of invalid names,
i.e. those that represent neither an IP address, nor a host ID.
They remain on disk and are taken into account while computing
e.g. the total size of hints, but they're not used in any way.

These changes add logs informing the user when Scylla
encounters such a directory.

Closes scylladb/scylladb#17566
2024-06-07 19:19:15 +02:00
Pavel Emelyanov
84f0bab27c hints/manager: Simplify hints dir evaluation
Currently the code wraps simple "if" with std::invoke over a lambda.
Also, the local variable that gets the result, is declared as const one,
which prevents it from being std::move()-d in the very next line.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#19106
2024-06-06 08:31:30 +03:00
Dawid Medrek
e855794327 db/hints: Introduce an error injection to test draining
We want to verify that a hint directory is drained
when any of the nodes correspodning to it leaves
the cluster. The test scenario should happen before
the whole cluster has been migrated to
the host-ID-based hinted handoff, so when we still
rely on the mappings between hint endpoint managers
and the hint directories managed by them.

To make such a test possible, in these changes we
introduce an error injection rejecting incoming
hints. We want to test a scenario when:

1. hints are saved towards a given node -- node N1,
2. N1 changes its IP to a different one,
3. some other node -- node N2 -- changes its IP
   to the original IP of N1,
4. hints are saved towards N2 and they are stored
   in the same directory as the hints saved towards
   N1 before,
5. we start draining N2.

Because at some point N2 needs to be stopped,
it may happen that some mutations towards
a distributed system table generate a hint
to N2 BEFORE it has finished changing its IP,
effectively creating another hint directory
where ALL of the hints towards the node
will be stored from there on. That would disturb
the test scenario. Hence, this error injection is
necessary to ensure that all of the steps in the
test proceed as expected.
2024-05-29 19:32:41 +02:00
Dawid Medrek
745a9c6ab8 db/hints: Ensure that draining happens
Before hinted handoff is migrated to using host IDs
to identify nodes in the cluster, we keep track
of mappings between hint endpoint managers
identified by host IDs and the hint directories
managed by them and represented by IP addresses.
As a consequence, it may happen that one hint
directory corresponds to multiple nodes
-- it's intended. See 64ba620 for more details.

Before these changes, we only started the draining
process of a hint directory if the node leaving
the cluster corresponded to that hint directory
AND was identified by the same host ID as
the hint endpoint manager managing that directory.
As a result, the draining did not always happen
when it was supposed to.

Draining should start no matter which of the nodes
corresponding to a hint directory is leaving
the cluster. This commit ensures that it happens.
2024-05-29 19:32:38 +02:00
Kefu Chai
dbfdc71d2d treewide: fix typos in comment and error messages
these typos were identified by codespell

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18868
2024-05-26 11:54:36 +03:00
Pavel Emelyanov
b24fb8dc87 inet_address: Remove to_sstring() in favor of fmt::to_string
The existing inet_address::to_string() calls fmt::format("{}", *this)
anyway. However, the to_string() method is declared in .cc file, while
form formatter is in the header and is equipeed with constexprs so
that converting an address to string is done as much as possible
compile-time.

Also, though minor, fmt::to_string(foo) is believed to be even faster
than fmt::format("{}", foo).

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#18712
2024-05-21 09:43:08 +03:00
Dawid Medrek
c9bbb92b1a db/hints: Remove migrating flag before initializing endpoint managers
Before these changes, if initializing endpoint
managers after the migration of hinted handoff
to host ID is done throws an exception, we
don't remove the flag indicating the migration
is still in progress. However, the migration
has, in practice, finished -- all of the
hint directories have been mapped to host IDs
and all of the nodes in the cluster are
host-ID-based. Because of that, it makes sense
to remove the flag early on.
2024-05-13 16:40:47 +02:00
Dawid Medrek
bdcde0c210 db/hints: Prevent segmentation fault when initializing endpoint managers
If hinted handoff is still IP-based and there is
a hint directory representing an IP without
a corresponding mapping to a host ID in
`locator::token_metadata`, an attemp to initialize
its endpoint manager will result in a segmentation
fault. This commit prevents that.
2024-05-13 16:40:47 +02:00
Kefu Chai
2a9a874e19 db,service: fix typos in comments
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18567
2024-05-09 08:26:44 +03:00
Dawid Medrek
46ab22f805 db/hints: Add endpoint_downtime_not_bigger_than()
We add an auxiliary function checking if a node
hasn't been down for too long. Although
`gms::gossiper` provides already exposes a function
responsible for that, it requires that its
argument be an IP address. That's the reason
we add a new function.
2024-04-28 01:22:59 +02:00
Dawid Medrek
0ef8d67d32 db/hints: Migrate hinted handoff when cluster feature is enabled
These changes migrate hinted handoff to using
host ID as soon as the corresponding cluster
feature is enabled.

When a node starts, it defaults to creating
directories naming them after IP addresses.
When the whole cluster has upgraded
to a version of Scylla that can handle
directories representing host IDs,
we perform a migration of the IP folders,
i.e. we try to rename them to host IDs.
Invalid directories, i.e. those that
represent neither an IP address, nor a host
ID, are removed.

During the migration, hinted handoff is
disabled. It is necessary because we have
to modify the disk's contents, so new hints
cannot be saved until the migration finishes.
2024-04-28 01:22:57 +02:00
Dawid Medrek
58784cd8db db/hints: Handle arbitrary directories in resource manager
Before these changes, resource manager only handled
the case when directories it browsed represented
valid host IDs. However, since before migrating
hinted handoff to using host IDs we still name
directories after IP addresses, that would lead
to exceptins that shouldn't happen.

We make resource manager handle directories
of arbitrary names correctly.
2024-04-27 22:31:07 +02:00
Dawid Medrek
ee84e810ca db/hints: Start using hint_directory_manager
We start keeping track of mappings IP - host ID.
The mappings are between endpoint managers
(identified by host IDs) and the hint directories
managed by them (represented by IP addresses).

This is a prelude to handling IP directories
by the hint shard manager.

The structure should only be used by the hint
manager before it's migrated to using host IDs.
The reason for that is that we rely on the
information obtained from the structure, but
it might not make sense later on.

When we start creating directories named after
host IDs and there are no longer directories
representing IP addresses, there is no relation
between host IDs and IPs -- just because
the structure is supposed to keep track between
endpoint managers and hint directories that
represent IP addresses. If they represent
host IDs, the connection between the two
is lost.

Still using the data structure could lead
to bugs, e.g. if we tried to associate
a given endpoint manager's host ID with its
corresponding IP address from
locator::token_metadata, it could happen that
two different host IDs would be bound to
the same IP address by the data structure:
node A has IP I1, node A changes its IP to I2,
node B changes its IP to I1. Though nodes
A and B have different host IDs (because they
are unique), the code would try to save hints
towards node B in node A's hint directory,
which should NOT happen.

Relying on the data structure is thus only
safe before migrating hinted handoff to using
host IDs. It may happen that we save a hint
in the hint directory of the wrong node indeed,
but since migration to using host IDs is
a process that only happens once, it's a price
we are ready to pay. It's only imperative to
prevent it from happening in normal
circumstances.
2024-04-27 22:31:07 +02:00
Dawid Medrek
aa4b06a895 db/hints: Enforce providing IP in get_ep_manager()
We drop the default argument in the function's signature.
Also, we adjust the code of change_host_filter() to
be able to perform calls to get_ep_manager().
2024-04-27 22:31:07 +02:00
Dawid Medrek
934e4bb45e db/hints: Add function for migrating hint directories to host ID
We add a function that will be used while
migrating hinted handoff to using host IDs.
It iterates over existing hint directories
and tries to rename them to the corresponding
host IDs. In case of a failure, we remove
it so that at the end of its execution
the only remaining directories are those
that represent host IDs.
2024-04-27 22:31:04 +02:00
Dawid Medrek
e36f853f9b db/hints: Take both IP and host ID when storing hints
The store_hint() method starts taking both an IP
and a host ID as its arguments. The rationale
for the change is depending on the stage of
the cluster (before an upgrade to the
host-ID-based hinted handdof and after it),
we might need to create a directory representing
either an IP address, or a host ID.

Because locator::topology can change in the
before obtaining the host ID we pass
and when the function is being executed,
we need to pass both parameters explicitly
to ensure the consistency between them.
2024-04-27 20:35:58 +02:00
Dawid Medrek
063d4d5e91 db/hints: Prepare initializing endpoint managers for migrating from IP to host ID
We extract the initialization of endpoint managers
from the start method of the hint manager
to a separate function and make it handle directories
that represent either IP addresses, or host IDs;
other directories are ignored.

It's necessary because before Scylla is upgraded
to a version that uses host-ID-based hinted handoff,
we need to continue only managing IP directories.
When Scylla has been upgraded, we will need to handle
host ID directories.

It may also happen that after an upgrade (but not
before it), Scylla fails while renaming
the directories, so we end up with some of them
representing IP address, and some representing
host IDs. After these changes, the code handles
that scenario as well.
2024-04-27 20:35:53 +02:00
Dawid Medrek
cfd03fe273 db/hints: Migrate to locator::host_id
We change the type of node identifiers
used within the module and fix compilation.
Directories storing hints to specific nodes
are now represented by host IDs instead of
IPs.
2024-04-26 22:44:04 +02:00
Dawid Medrek
c585444c60 db/hints: Fix indentation 2024-04-26 22:44:03 +02:00
Kefu Chai
a439ebcfce treewide: include fmt/ranges.h and/or fmt/std.h
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we include `fmt/ranges.h` and/or `fmt/std.h`
for formatting the container types, like vector, map
optional and variant using {fmt} instead of the homebrew
formatter based on operator<<.
with this change, the changes adding fmt::formatter and
the changes using ostream formatter explicitly, we are
allowed to drop `FMT_DEPRECATED_OSTREAM` macro.

Refs scylladb#13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-04-19 22:56:16 +08:00
Dawid Medrek
b36becc1f3 db/hints: Fix too_many_in_flight_hints_for
The semantics of the function was accidentally
modified in 6e79d64. The consequence of the change
was that we didn't limit memory consumption:
the function always returned false for any node
different from the local node. The returned value
is used by storage_proxy to decide whether it
is able to store a hint or not.

This commit fixes the problem by taking other
nodes into consideration again.

Fixes #17636

Closes scylladb/scylladb#17639
2024-03-06 09:48:30 +02:00
Pavel Emelyanov
7c5c89ba8d Revert "Merge 'Use utils::directories instead of db::config to get dirs' from Patryk Wróbel"
This reverts commit 370fbd346c, reversing
changes made to 0912d2a2c6.

This makes scylla-manager mis-interpret the data_file_directories
somehow, issue #17078
2024-01-31 15:08:14 +03:00
Patryk Wrobel
781a6a5071 utils/directories: make utils::directories::set an internal type
Previously, utils::directories::set could have been used by
clients of utils::directories class to provide dirs for creation.
Due to moving the responsibility for providing paths of dirs from
db::config to utils::directories, such usage is no longer the case.

This change:
 - defines utils::directories::set in utils/directories.cc to disallow
   its usage by the clients of utils::directories
 - makes utils::directories::create_and_verify() member function
   private; now it is used only by the internals of the class
 - introduces a new member function to utils::directories called
   create_and_verify_sharded_directory() to limit the functionality
   provided to clients

Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
2024-01-29 13:20:41 +01:00
Kefu Chai
be364d30fd db: do not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16664
2024-01-09 11:44:19 +02:00
Benny Halevy
6e79d647e6 db/hints/manager: use locator::topology rather than fb_utilities
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-05 08:42:49 +02:00
Dawid Medrek
ddc385bce0 db/hints: Remove an unused namespace 2023-10-06 13:25:30 +02:00
Dawid Medrek
76d414012b db/hints: Coroutinize change_host_filter() 2023-10-06 13:25:30 +02:00
Dawid Medrek
09eb30e6f1 db/hints: Coroutinize drain_for()
This commit turns the function into a coroutine
and makes the code less compact and more readable.
2023-10-06 13:25:30 +02:00
Dawid Medrek
907a572e24 db/hints: Clean up can_hint_for()
This commit gets rid of unnecessary additional calls to functions
and makes all lines abide by the limit of 120 characters.
2023-10-06 13:25:30 +02:00
Dawid Medrek
596e1f9859 db/hints: Clean up store_hint()
This commit makes the function abide by the limit
of 120 characters per line.
2023-10-06 13:25:30 +02:00
Dawid Medrek
8a43f94ca6 db/hints: Clean up too_many_in_flight_hints_for()
This commit makes the return statement more readable.
It also makes the comment abide by the limit of 120 characters per line.
2023-10-06 13:25:30 +02:00
Dawid Medrek
96a5906621 db/hints: Refactor get_ep_manager() 2023-10-06 13:25:30 +02:00
Dawid Medrek
8b591be3c3 db/hints: Coroutinize wait_for_sync_point()
This commit coroutinizes the function and adds
a comment explaining a non-trivial case.
2023-10-06 13:25:27 +02:00
Dawid Medrek
fee3aafd80 db/hints: Use std::span in calculate_current_sync_point
std::span is a lot more flexible than std::vector as it allows
for arbitrary contiguous ranges.
2023-10-06 12:36:05 +02:00
Dawid Medrek
64fd4d6323 db/hints: Clean up manager::forbid_hints_for_eps_with_pending_hints() 2023-10-06 12:26:55 +02:00
Dawid Medrek
58cd5c4167 db/hints: Clean up manager::forbid_hints() 2023-10-06 12:26:55 +02:00
Dawid Medrek
f8ed93f5bc db/hints: Clean up manager::allow_hints() 2023-10-06 12:26:52 +02:00
Dawid Medrek
bfe32bcf89 db/hints: Coroutinize compute_hints_dir_device_id() 2023-10-06 12:18:30 +02:00
Dawid Medrek
8f28eb6522 db/hints: Clean up manager::stop()
This commit gets rid of boilerplate in the function,
leverages a range pipe and explicit types to make
the code more readable, and changes the logs to
make it clearer what happens.
2023-10-06 12:18:30 +02:00
Dawid Medrek
a384caece0 db/hints: Clean up manager::start()
This commit coroutinizes the function and makes it less compact.
2023-10-06 12:18:30 +02:00