Currently, when attempting to send a hint, we might choose its
recipients in one of two ways:
- If the original destination is a natural endpoint of the hint, we only
send the hint to that node and none other,
- Otherwise, we send the hint to all current replicas of the mutation.
There is a problem when we decommission a node: while data is streamed
away from that node, it is still considered to be a natural endpoint of
the data that it used to own. Because of that, it might happen that a
hint is sent directly to it but streaming will miss it, effectively
resulting in the hint being discarded.
As sending the hint _only_ to the leaving replica is a rather bad idea,
send the hint to all replicas also in the case when the original
destiantion of the hint is leaving.
Note that this is a conservative fix written only with the decommission
+ vnode-based keyspaces combo in mind. In general, such "data loss" can
occur in other situations where the replica set is changing and we go
through a streaming phase, i.e. other topology operations in case of
vnodes and tablet load balancing. However, the consistency guarantees of
hinted handoff in the face of topology changes are not defined and it is
not clear what they should be, if there should be any at all. The
picture is further complicated by the fact that hints are used by
materialized views, and sending view updates to more replicas than
necessary can introduce inconsistencies in the form of "ghost rows".
This fix was developed in response to a failing test which checked the
hint replay + decommission scenario, and it makes it work again.
Fixesscylladb/scylla-dtest#4582
Refs scylladb/scylladb#19835
It's a small method and it is only used once in send_one_mutation.
Inlining it lets us get rid of its declaration in the header - now, if
one needs to change the variables passed from one function to another,
it is no longer necessary to change the header.
In scylladb/scylladb@7301a96, in the function `hint_endpoint_manager::store_hint()`,
we transformed the lambda passed to `seastar::with_gate()` to a coroutine lambda
to improve the readability. However, there was a subtle problem related to
lifetimes of the captures that needed to be addressed:
* Since we started `co_await`ing in the lambda, the captures were at risk of
being destructed too soon. The usual solution is to wrap a coroutine lambda
within a `seastar::coroutine::lambda` object and rely on the extended lifetime
enforced by the semantics of the language.
See `docs/dev/lambda-coroutine-fiasco.md` for more context.
* However, since we don't immediately `co_await` the future returned by
`with_gate()`, we cannot rely on the extended lifetime provided by the wrapper.
The document linked in the previous bullet point suggests keeping the passed
coroutine lambda as a variable and pass it as a reference to `with_gate()`.
However, that's not feasible either because we discard the returned future and
the function returns almost instantly -- destructing every local object, which
would encompass the lambda too.
The solution used in the commit was to move captures of the lambda into
the lambda's body. That helped because Seastar's backend is responsible for
keeping all of the local variables alive until the lambda finishes its execution.
However, we didn't move all of the captures into the lambda -- the missing one
was the `this` pointer that was implicitly used in the lambda.
Address sanitiser hasn't reported any bugs related to the pointer yet, but
the bug is most likely there.
In this commit, we transform the lambda's body into a new member function
and only call it from the lambda. This way, we don't need to care about
the lifetimes of the captures because Seastar ensures that the function's
arguments stay alive until the coroutine finishes.
Choosing this solution instead of assigning `this` to a pointer variable
inside the lambda's body and using it to refer to the object's members
has actual benefit: it's not possible to accidentally forget to refer
to a member of the object via the pointer; it also makes the code less
awkward.
Before these changes, we didn't specify which I/O scheduling
group commitlog instances in hinted handoff should use.
In this commit, we set it explicitly to the commitlog
scheduling group. The rationale for this choice is the fact
we don't want to cause a bottleneck on the write path
-- if hints are written too slowly, new incoming mutations
(NOT hints) might be rejected due to a too high number
of hints currently being written to disk; see
`storage_proxy::create_write_response_handler_helper()`
for more context.
Fixesscylladb/scylladb#18654Closesscylladb/scylladb#19170
assert() is traditionally disabled in release builds, but not in
scylladb. This hasn't caused problems so far, but the latest abseil
release includes a commit [1] that causes a 1000 insn/op regression when
NDEBUG is not defined.
Clearly, we must move towards a build system where NDEBUG is defined in
release builds. But we can't just define it blindly without vetting
all the assert() calls, as some were written with the expectation that
they are enabled in release mode.
To solve the conundrum, change all assert() calls to a new SCYLLA_ASSERT()
macro in utils/assert.hh. This macro is always defined and is not conditional
on NDEBUG, so we can later (after vetting Seastar) enable NDEBUG in release
mode.
[1] 66ef711d68Closesscylladb/scylladb#20006
There are two places in hints code that need gossiper: hist_sender
calling gossiper::is_alive() and endpoint_downtime_not_bigger_than()
helper in manager. Both can live with const gossiper, so the dependency
references and anchor pointers can be restricted to const too.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
In 6e79d64, the behavior of `manager::too_many_in_flight_hints_for()`
was accidentally modified. It remained unnoticed for some time
and then fixed. In this commit, we add a test verifying that
the concurrency of hints being written to disk is indeed limited
and the limitations are imposed properly.
Until now, the constant `HINT_FILE_WRITE_TIMEOUT` was
declared as a static member of `db::hints::manager`.
However, the constant is only ever used in one
translation unit, so it makes more sense to move it
there and not include boilerplate in a header.
In this commit, we add a new metric `sent_total_size`
keeping track of how many bytes of hints a node
has sent. The metric is supposed to complement its
counterpart in storage proxy that counts how many
bytes of hints a node has received. That information
should prove useful in analyzing statistics of
a cluster -- load on given nodes and where it comes
from.
We also change the name of the matric `sent`
to `sent_total` to avoid the conflict of prefixes
between the two metrics.
This pull request introduces host ID in the Hinted Handoff module. Nodes are now identified by their host IDs instead of their IPs. The conversion occurs on the boundary between the module and `storage_proxy.hh`, but aside from that, IPs have been erased.
The changes take into considerations that there might still be old hints, still identified by IPs, on disk – at start-up, we map them to host IDs if it's possible so that they're not lost.
Refs scylladb/scylladb#6403Fixesscylladb/scylladb#12278Closesscylladb/scylladb#15567
* github.com:scylladb/scylladb:
docs: Update Hinted Handoff documentation
db/hints: Add endpoint_downtime_not_bigger_than()
db/hints: Migrate hinted handoff when cluster feature is enabled
db/hints: Handle arbitrary directories in resource manager
db/hints: Start using hint_directory_manager
db/hints: Enforce providing IP in get_ep_manager()
db/hints: Introduce hint_directory_manager
db/hints/resource_manager: Update function description
db/hints: Coroutinize space_watchdog::scan_one_ep_dir()
db/hints: Expose update lock of space watchdog
db/hints: Add function for migrating hint directories to host ID
db/hints: Take both IP and host ID when storing hints
db/hints: Prepare initializing endpoint managers for migrating from IP to host ID
db/hints: Migrate to locator::host_id
db/hints: Remove noexcept in do_send_one_mutation()
service: Add locator::host_id to on_leave_cluster
service: Fix indentation
db/hints: Fix indentation
This commit introduces a new class responsible
for keeping track of mappings IP-host ID.
Before hinted handoff is migrated to using
host IDs, hint directories still have to
represent IP addresses. However, since
we identify endpoint managers by host IDs
already, we need to be able to associate
them with the directories they manage.
This class serves this purpose.
We extract the initialization of endpoint managers
from the start method of the hint manager
to a separate function and make it handle directories
that represent either IP addresses, or host IDs;
other directories are ignored.
It's necessary because before Scylla is upgraded
to a version that uses host-ID-based hinted handoff,
we need to continue only managing IP directories.
When Scylla has been upgraded, we will need to handle
host ID directories.
It may also happen that after an upgrade (but not
before it), Scylla fails while renaming
the directories, so we end up with some of them
representing IP address, and some representing
host IDs. After these changes, the code handles
that scenario as well.
We change the type of node identifiers
used within the module and fix compilation.
Directories storing hints to specific nodes
are now represented by host IDs instead of
IPs.
While the function is marked as noexcept, the returned
future can in fact store an exception. We remove the
specifier to reflect the actual behavior of the
function.
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we include `fmt/ranges.h` and/or `fmt/std.h`
for formatting the container types, like vector, map
optional and variant using {fmt} instead of the homebrew
formatter based on operator<<.
with this change, the changes adding fmt::formatter and
the changes using ostream formatter explicitly, we are
allowed to drop `FMT_DEPRECATED_OSTREAM` macro.
Refs scylladb#13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
get0() dates back from the days where Seastar futures carried tuples, and
get0() was a way to get the first (and usually only) element. Now
it's a distraction, and Seastar is likely to deprecate and remove it.
Replace with seastar::future::get(), which does the same thing.
database::get_token_metadata() is switched to token_metadata2.
get_all_ips method is added to the host_id-based token_metadata, since
its convenient and will be used in several places. It returns all current
nodes converted to inet_address by means of the topology
contained within token_metadata.
hint_sender::can_send: if the node has already left the
cluster we may not find its host_id. This case is handled
in the same way as if it's not a normal token owner - we
simply send a hint to all replicas.
Fixes some typos as found by codespell run on the code.
In this commit, I was hoping to fix only comments, not user-visible alerts, output, etc.
Follow-up commits will take care of them.
Refs: https://github.com/scylladb/scylladb/issues/16255
Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>
This commit makes with_file_update_mutex() a method of hint_endpoint_manager
and introduces db::hints::manager::with_file_update_mutex_for() for accessing
it from the outside. This way, hint_endpoint_manager is hidden and no one
needs to know about its existence.
This commit makes the function less compact and abides by the limit
of 120 characters per line; that makes the code more readable.
We start using fmt::to_string instead of seastar::format("{:d"})
to convert strings to integers -- the new way is the preferred one.
The changes also name variables in a more descriptive way.
This commit makes the function less compact and abides by the limit
of 120 characters per line. That makes the code more readable.
It also doesn't unnecessarily call c_str() on seastar::sstring.
There is no need to call c_str() on the name of the directory entry.
In fact, the used overload std::stoi() takes an std::string as its
argument. Providing seastar::sstring instead of const char* is more
efficient because we can allocate just the right amount of memory
and std::memcpy it, i.e. call std::string(const char*, std::size_t).
Using the overload std::string(const char*) would need to first
traverse the string to find the null byte.
This is a small change, all the more because paths don't tend to
be long, but it's some gain nonetheless.
The commit also inserts a few empty lines to make the code less
compact and improve readability as a result.
An anonymous namespace is a safer mechanism than the static
keyword. When adding a new piece of code, it's easy to
forget about adding the static. In that case, that code
might undergo external linkage. However, when code is put
in an anonymous namespace (when it should not), the linker
will immediately detect it (in most cases), and
the programmer will be able to spot and fix their mistake
right away.
This commit renames `end_point_hints_manager` to `hint_endpoint_manager`
to be consistent with other names used in the module (they all start
with `hint_`).
This commit continues modularizing manager.hh.
After moving the declaration of sender to a dedicated
header file, these changes move its implementation to
a separate source file.
This commit is yet another step in modularizing manager.hh.
We move the declaration of sender to a dedicated file.
Its implementation will follow in a future commit.
The premise of these changes is the fact that we cannot have
a cycle of #includes.
Because the declaration of `sender` is going to be moved to
a separate header file in a future commit, and because that
header file is going to be included in the file where
`end_point_hints_manager` is declared, we will need to rely
on `end_point_hints_manager` being an incomplete type there.
A consequence of that is that we cannot access any of
`end_point_hints_manager`'s methods.
This commit prepares the ground for it by moving
the definition of the function to the source file where
`end_point_hints_manager` will be a complete type.
This commit continues moving end_point_hints_manager to its
dedicated files. After moving the declaration of the class,
these changes move the implementation.
This commit is yet another step in modularizing manager.hh.
We move the declaration of the class to a dedicated header file.
The implementation will follow in a future commit.
This commit moves types used by shard hint manager
and related to storing hints on disk to another file.
It is yet another step in modularizing manager.hh.
This commit extracts the logger used in manager.cc
to prepare the ground for modularization of manager.hh
into separate smaller files. We want to preserve
the logging behavior (at least for the time being),
which means new files should use the same logger.
These changes serve that purpose.
Currently, data structures used in manager.hh
use their own aliases for gms::inet_address.
It is clear they all should use the same type
and having different names for it only reduces
readability of the code. This commit introduces
a common alias -- endpoint_id -- and gets rid
of the other ones.
This commit is also the first step in modularizing
manager.hh by extracting common types to another
file.