Commit Graph

54 Commits

Author SHA1 Message Date
Dawid Mędrek
0a6137218a db/hints: Cancel draining when stopping node
Draining hints may occur in one of the two scenarios:

* a node leaves the cluster and the local node drains all of the hints
  saved for that node,
* the local node is being decommissioned.

Draining may take some time and the hint manager won't stop until it
finishes. It's not a problem when decommissioning a node, especially
because we want the cluster to retain the data stored in the hints.
However, it may become a problem when the local node started draining
hints saved for another node and now it's being shut down.

There are two reasons for that:

* Generally, in situations like that, we'd like to be able to shut down
  nodes as fast as possible. The data stored in the hints won't
  disappear from the cluster yet since we can restart the local node.
* Draining hints may introduce flakiness in tests. Replaying hints doesn't
  have the highest priority and it's reflected in the scheduling groups we
  use as well as the explicitly enforced throughput. If there are a large
  number of hints to be replayed, it might affect our tests.
  It's already happened, see: scylladb/scylladb#21949.

To solve those problems, we change the semantics of draining. It will behave
as before when the local node is being decommissioned. However, when the
local node is only being stopped, we will immediately cancel all ongoing
draining processes and stop the hint manager. To amend for that, when we
start a node and it initializes a hint endpoint manager corresponding to
a node that's already left the cluster, we will begin the draining process
of that endpoint manager right away.

That should ensure all data is retained, while possibly speeding up
the shutdown process.

There's a small trade-off to it, though. If we stop a node, we can then
remove it. It won't have a chance to replay hints it might've before
these changes, but that's an edge case. We expect this commit to bring
more benefit than harm.

We also provide tests verifying that the implementation works as intended.

Fixes scylladb/scylladb#21949

Closes scylladb/scylladb#22811
2025-03-13 11:55:15 +02:00
Kefu Chai
6e4cb20a69 tree: implement boost::accumulate with std::ranges library
Replace boost::accumulate() calls with std::ranges facilities. This
change reduces external dependencies and modernizes the codebase.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#23062
2025-02-26 23:22:02 +02:00
Kefu Chai
7ff0d7ba98 tree: Remove unused boost headers
This commit eliminates unused boost header includes from the tree.

Removing these unnecessary includes reduces dependencies on the
external Boost.Adapters library, leading to faster compile times
and a slightly cleaner codebase.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#22857
2025-02-15 20:32:22 +02:00
Avi Kivity
f3eade2f62 treewide: relicense to ScyllaDB-Source-Available-1.0
Drop the AGPL license in favor of a source-available license.
See the blog post [1] for details.

[1] https://www.scylladb.com/2024/12/18/why-were-moving-to-a-source-available-license/
2024-12-18 17:45:13 +02:00
Avi Kivity
9024e4940c counters.hh: drop unused boost includes
Re-add them to source files that need them.

Closes scylladb/scylladb#21738
2024-12-05 12:27:41 +02:00
Kefu Chai
f436edfa22 mutation: remove unused "#include"s
these unused includes are identified by clang-include-cleaner. after
auditing the source files, all of the reports have been confirmed.

please note, because `mutation/mutation.hh` does not include
`seastar/coroutine/maybe_yield.hh` anymore, and quite a few source
files were relying on this header to bring in the declaration of
`maybe_yield()`, we have to include this header in the places where
this symbol is used. the same applies to `seastar/core/when_all.hh`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-11-29 14:01:44 +08:00
Kefu Chai
24d14b601b treewide: s/boost::adaptors::map_values/std::views::values/
now that we are allowed to use C++23. we now have the luxury of using
`std::views::values`.

in this change, we:

- replace `boost::adaptors::map_values` with `std::views::values`
- update affected code to work with `std::views::values`
- the places where we use `boost::join()` are not changed, because
  we cannot use `std::views::concat` yet. this helper is only
  available in C++26.

to reduce the dependency to boost for better maintainability, and
leverage standard library features for better long-term support.

this change is part of our ongoing effort to modernize our codebase
and reduce external dependencies where possible.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#21265
2024-10-27 21:32:45 +02:00
Kefu Chai
6ead5a4696 treewide: move log.hh into utils/log.hh
the log.hh under the root of the tree was created keep the backward
compatibility when seastar was extracted into a separate library.
so log.hh should belong to `utils` directory, as it is based solely
on seastar, and can be used all subsystems.

in this change, we move log.hh into utils/log.hh to that it is more
modularized. and this also improves the readability, when one see
`#include "utils/log.hh"`, it is obvious that this source file
needs the logging system, instead of its own log facility -- please
note, we do have two other `log.hh` in the tree.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-10-22 06:54:46 +03:00
Pavel Emelyanov
dd7c7c301d hints: Const-ify gossiper references and anchor pointers
There are two places in hints code that need gossiper: hist_sender
calling gossiper::is_alive() and endpoint_downtime_not_bigger_than()
helper in manager. Both can live with const gossiper, so the dependency
references and anchor pointers can be restricted to const too.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-07-26 16:28:54 +03:00
Dawid Medrek
2446cce272 db/hints: Initialize endpoint managers only for valid hint directories
Before these changes, it could happen that Scylla initialized
endpoint managers for hint directories representing

* host IDs before migrating hinted handoff to using host IDs,
* IP addresses after the migration.

One scenario looked like this:

1. Start Scylla and upgrade the cluster to using host IDs.
2. Create, by hand, a hint directory representing an IP address.
3. Trigger changing the host filter in hinted handoff; it could
   be achieved by, for example, restricting the set of data
   centers Scylla is allowed to save hints for.

When changing the host filter, we browse the hint directories
and create endpoint managers if we can send hints towards
the node corresponding to a given hint directory. We only
accepted hint directories representing IP addresses
and host IDs. However, we didn't check whether the local node
has already been upgraded to host-ID-based hinted handoff
or not. As a result, endpoint managers were created for
both IP addresses and host IDs, no matter whether we were
before or after the migration.

These changes make sure that any time we browse the hint
directories, we take that into account.

Fixes scylladb/scylladb#19172

Closes scylladb/scylladb#19173
2024-06-21 15:59:49 +02:00
Dawid Medrek
a5528a2093 db/hints: Log when ignoring invalid hint directories
In 58784cd, aa4b06a and other commits migrating
hinted handoff from IPs to host IDs (scylladb/scylladb#15567),
we started ignoring hint directories of invalid names,
i.e. those that represent neither an IP address, nor a host ID.
They remain on disk and are taken into account while computing
e.g. the total size of hints, but they're not used in any way.

These changes add logs informing the user when Scylla
encounters such a directory.

Closes scylladb/scylladb#17566
2024-06-07 19:19:15 +02:00
Dawid Medrek
58784cd8db db/hints: Handle arbitrary directories in resource manager
Before these changes, resource manager only handled
the case when directories it browsed represented
valid host IDs. However, since before migrating
hinted handoff to using host IDs we still name
directories after IP addresses, that would lead
to exceptins that shouldn't happen.

We make resource manager handle directories
of arbitrary names correctly.
2024-04-27 22:31:07 +02:00
Dawid Medrek
59d49c5219 db/hints: Coroutinize space_watchdog::scan_one_ep_dir() 2024-04-27 22:31:07 +02:00
Dawid Medrek
cfd03fe273 db/hints: Migrate to locator::host_id
We change the type of node identifiers
used within the module and fix compilation.
Directories storing hints to specific nodes
are now represented by host IDs instead of
IPs.
2024-04-26 22:44:04 +02:00
Avi Kivity
7cb1c10fed treewide: replace seastar::future::get0() with seastar::future::get()
get0() dates back from the days where Seastar futures carried tuples, and
get0() was a way to get the first (and usually only) element. Now
it's a distraction, and Seastar is likely to deprecate and remove it.

Replace with seastar::future::get(), which does the same thing.
2024-02-02 22:12:57 +08:00
Dawid Medrek
1c70a18fc7 db/hints: Use manager as API for hint_endpoint_manager
This commit makes with_file_update_mutex() a method of hint_endpoint_manager
and introduces db::hints::manager::with_file_update_mutex_for() for accessing
it from the outside. This way, hint_endpoint_manager is hidden and no one
needs to know about its existence.
2023-10-06 12:15:01 +02:00
Dawid Medrek
18a2831186 db/hints: Use reference for storage proxy
This commit makes db::hints::manager store service::storage_proxy
as a reference instead of a seastar::shared_ptr. The manager is
owned by storage proxy, so it only lives as long as storage proxy
does. Hence, it makes little sense to store the latter as a shared
pointer; in fact, it's very confusing and may be error-prone.
The field never changes, so it's safe to keep it as a reference
(especially because copy and move constructors of db::hints::manager
are both deleted). What's more, we ensure that the hint manager
has access to storage proxy as soon as it's created.

The same changes were applied to db::hints::resource_manager.
The rationale is the same.
2023-10-06 11:54:15 +02:00
Pavel Emelyanov
bc62ca46d4 lister: Make lister::dir_entry_types an enum_set
This type is currently an unordered_set, but only consists of at most
two elements. Making it an enum_set renders it into a size_t variable
and better describes the intention.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-11-17 19:01:45 +03:00
Benny Halevy
ebbbf1e687 lister: move to utils
There's nothing specific to scylla in the lister
classes, they could (and maybe should) be part of
the seastar library.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-02-28 12:36:03 +02:00
Avi Kivity
fcb8d040e8 treewide: use Software Package Data Exchange (SPDX) license identifiers
Instead of lengthy blurbs, switch to single-line, machine-readable
standardized (https://spdx.dev) license identifiers. The Linux kernel
switched long ago, so there is strong precedent.

Three cases are handled: AGPL-only, Apache-only, and dual licensed.
For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0),
reasoning that our changes are extensive enough to apply our license.

The changes we applied mechanically with a script, except to
licenses/README.md.

Closes #9937
2022-01-18 12:15:18 +01:00
Benny Halevy
f4cd535e3d resource_manager: remove unnecessary include of lister.hh from header file
But define namespace fs = std::filesystem in the header
since many use sites already depend on it
and it's a convention throught scylla's code.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-01-11 17:04:16 +02:00
Piotr Dulikowski
de1679b1b9 hints: make hints concurrency configurable and reduce the default
Previously, hinted handoff had a hardcoded concurrency limit - at most
128 hints could be sent from a single shard at once. This commit makes
this limit configurable by adding a new configuration option:
`max_hinted_handoff_concurrency_per_shard`. This option can be updated
in runtime. Additionally, the default concurrency per shard is made
lower and is now 8.

The motivation for reducing the concurrency was to mitigate the negative
impact hints may have on performance of the receiving node due to them
not being properly isolated with respect to I/O.

Tests:
- unit(dev)
- dtest(hintedhandoff_additional_test.py)

Refs: #8624

Closes #8646
2021-06-22 15:58:56 +02:00
Pavel Emelyanov
92a4278cd1 hints: Drop storage service from managers
The storage service pointer is only used so (un)subscribe
to (from) lifecycle events. Now the subscription is gone,
so can the storage service pointer.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-06-17 15:09:36 +03:00
Avi Kivity
a55b434a2b treewide: extent copyright statements to present day 2021-06-06 19:18:49 +03:00
Piotr Dulikowski
60ac68b7a2 hints/resource_manager: add comments to register_manager
Adds more comments to resource_manager::register_manager in order to
better explain what this function is doing.
2020-11-19 16:34:37 +01:00
Piotr Dulikowski
c0c10b918c hints/resource_manager: fix indentation
Fixes indentation in prepare_per_device_limits.
2020-11-19 16:34:37 +01:00
Piotr Dulikowski
ead6a3f036 hints/resource_manager: improve mutual exclusion
This commit causes start, stop and register_manager methods of the
resource_manager to be serialized with respect to each other using the
_operation_lock.

Those function modify internal state, so it's best if they are
protected with a semaphore. Additionally, those function are not going
to be used frequently, therefore it's perfectly fine to protect them in
such a coarse manner.

Now, space_watchdog has a dedicated lock for serializing its on_timer
logic with resource_manager::register_manager. The reason for separate
lock is that resource_manager::stop cannot use the same lock as the
space_watchdog - otherwise a situation could occur in which
space_watchdog waits for semaphore units held by
resource_manager::stop(), and resource_manager::stop() waits until the
space_watchdog stops its asynchronous event loop.
2020-11-19 16:34:37 +01:00
Piotr Dulikowski
362aebee7b hints/resource_manager: correct prepare_per_device_limits usage
The resource_manager::prepare_per_device_limits function calculates disk
quota for registered hints managers, and creates an association map:
from a storage device id to those hints manager which store hints on
that device (_per_device_limits_map)

This function was used with an assumption that it is idempotent - which
is a wrong assumption. In resource_manager::register_manager, if the
resource_manager is already started, prepare_per_device_limits would be
called, and those hints managers which were previously added to the
_per_device_limits_map would be added again. This would cause the space
used by those managers to be calculated twice, which would artificially
lower the limit which we impose on the space hints are allowed to occupy
on disk.

This patch fixes this problem by changing the prepare_per_device_limits
function to operate on a hints manager passed by argument. Now, we make
sure that this function is called on each hints manager only once.
2020-11-19 16:34:37 +01:00
Piotr Dulikowski
a4f03d72b3 hints/resource_manager: allow registering managers after start
This change modifies db::hints::resource_manager so that it is now
possible to add hints::managers after it was started.

This change will make it possible to register the regular hints manager
later in runtime, if it wasn't enabled at boot time.
2020-11-17 10:15:47 +01:00
Piotr Sarna
180a1505fd hints: track resource_manager sending queue length
The number of tasks waiting for a hint to be sent is now tracked.
2020-08-11 17:43:53 +02:00
Benny Halevy
a96087165a hints: get_device_id: use seastar file_stat
This avoids potential use-after-move, since undefined c++ sequencing order
may std::move(f) in the lambda capture before evaluating f.stat().

Also, this makes use of a more generic library function that doesn't
require to open and hold on to the file in the application.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20200514152054.162168-1-bhalevy@scylladb.com>
2020-05-15 10:11:45 +02:00
Avi Kivity
88ade3110f treewide: replace calls to engine().some_api() with some_api()
This removes the need to include reactor.hh, a source of compile
time bloat.

In some places, the call is qualified with seastar:: in order
to resolve ambiguities with a local name.

Includes are adjusted to make everything compile. We end up
having 14 translation units including reactor.hh, primarily for
deprecated things like reactor::at_exit().

Ref #1
2020-04-05 12:46:04 +03:00
Avi Kivity
1799cfa88a logalloc: use namespace-scope seastar::idle_cpu_handler and related rather than reactor scope
This allows us to drop a #include <reactor.hh>, reducing compile time.

Several translation units that lost access to required declarations
are updated with the required includes (this can be an include of
reactor.hh itself, in case the translation unit that lost it got it
indirectly via logalloc.hh)

Ref #1.
2020-04-05 12:45:08 +03:00
Pavel Emelyanov
d1775dd701 utils: Move disk-error-handler into it
The disk-error-handler is purely auxiliary thing that helps
propagating IO errors to the rest of the code. It well
deserves not sitting in the root namespace.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20200207112443.18475-1-xemul@scylladb.com>
2020-02-09 17:26:52 +02:00
Piotr Sarna
9c5a5a5ac2 treewide: add names to semaphores
By default, semaphore exceptions bring along very little context:
either that a semaphore was broken or that it timed out.
In order to make debugging easier without introducing significant
runtime costs, a notion of named semaphore is added.
A named semaphore is simply a semaphore with statically defined
name, which is present in its errors, bringing valuable context.
A semaphore defined as:

  auto sem = semaphore(0);

will present the following message when it breaks:
"Semaphore broken"
However, a named semaphore:

  auto named_sem = named_semaphore(0, named_semaphore_exception_factory{"io_concurrency_sem"});

will present a message with at least some debugging context:

  "Semaphore broken: io_concurrency_sem"

It's not much, but it would really help in pinpointing bugs
without having to inspect core dumps.

At the same time, it does not incur any costs for normal
semaphore operations (except for its creation), but instead
only uses more CPU in case an error is actually thrown,
which is considered rare and not to be on the hot path.

Refs #4999

Tests: unit(dev), manual: hardcoding a failure in view building code
2019-11-26 15:14:21 +02:00
Vlad Zolotarov
d253846c91 hinted handoff: fix a race on a directory removal between space_watchdog and drain_for()
The endpoint directories scanned by space_watchdog may get deleted
by the manager::drain_for().

If a deleted directory is given to a lister::scan_dir() this will end up
in an exception and as a result a space_watchdog will skip this round
and hinted handoff is going to be disabled (for all agents including MVs)
for the whole space_watchdog round.

Let's make sure this doesn't happen by serializing the scanning and deletion
using end_point_hints_manager::file_update_mutex.

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2019-08-20 11:46:46 -04:00
Vlad Zolotarov
b34c36baa2 hinted handoff: make taking file_update_mutex safe
end_point_hints_manager::file_update_mutex is taken by space_watchdog
but while space_watchdog is waiting for it the corresponding
end_point_hints_manager instance may get destroyed by manager::drain_for()
or by manager::stop().

This will end up in a use-after-free event.

Let's change the end_point_hints_manager's API in a way that would prevent
such an unsafe locking:

   - Introduce the with_file_update_mutex().
   - Make end_point_hints_manager::file_update_mutex() method private.

Fixes #4685
Fixes #4836

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2019-08-20 11:26:19 -04:00
Benny Halevy
ff4d8b6e85 treewide: use std::filesystem
Rather than {std::experimental,boost,seastar::compat}::filesystem

On Sat, 2019-03-23 at 01:44 +0200, Avi Kivity wrote:
> The intent for seastar::compat was to allow the application to choose
> the C++ dialect and have seastar follow, rather than have seastar choose
> the types and have the application follow (as in your patch).

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-03-28 14:21:10 +02:00
Benny Halevy
857ff4f59a database: directly use std::experimental::filesystem::path for lister::path
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2018-12-02 22:02:10 +02:00
Benny Halevy
585ac6e641 database: use std::experimental::filesystem::path for lister::path
We would like to get rid of boost::filesystem and gradually replace it with
std::experimental::filesystem.

TODO: using namespace fs = std::experimental::filesystem,
use fs::path directly, rather than lister::path

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2018-12-02 22:02:10 +02:00
Vlad Zolotarov
aca0882a3f hinted handoff: enable storing hints before starting messaging_service
When messaging_service is started we may immediately receive a mutation
from another node (e.g. in the MV update context). If hinted handoff is not
ready to store hints at that point we may fail some of MV updates.

We are going to resolve this by start()ing hints::managers before we
start messaging_service and blocking hints replaying until all relevant
objects are initialized.

Refs #3828

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2018-10-18 16:49:58 -04:00
Duarte Nunes
6dcb7a39d4 db/hints/manager: Move decision about blocking hints to the manager
The space_watchdog enables or disables hints for the managers
associated with a particular device. We encapsulate this decision
inside the hints::managers by introducing the update_backlog()
function.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-10-16 20:35:00 +01:00
Duarte Nunes
207c9c8e38 db/hints/resource_manager: Correctly account resources in space_watchdog
A db::hints::resource_manager manages the resources for one or two
db::hints::managers. Each of these can be using the same or different
devices. The db::hints::space_watchdog periodically checks whether
each manager is within their resource allocation, and if not disables
it.

The watchdog iterates over the managers and accounts for the total
size they are using. This is wrong, since it can account in the same
variable the size consumed by managers using different devices.

We fix this while taking advantage of the fact that on_timer is now
called in the context of a seastar::thread, instead of using future
combinators.

Fixes #3821

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-10-16 20:34:54 +01:00
Duarte Nunes
25d266bdc1 db/hints/resource_manager: Replace timer with seastar::thread
Will make on_timer() much simpler to allow fixing a bug in subsequent
patches.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-10-16 20:32:16 +01:00
Duarte Nunes
278aa13bb0 db/hints/resource_manager: Ensure managers are correctly registered
Registering a manager for a new device used
std::unordered_map::emplace(), which may not insert the specified
value if one with the same key has already been added. This could
happen if both managers were using the same device and the fiber
deferred in-between adding them.

Found during code reading. Could cause hints to not be disabled for an
overloaded manager.

Fixes #3822

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-10-16 20:32:16 +01:00
Duarte Nunes
9e3b09cf48 db/hints/resource_manager: Fix formatting
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-10-16 20:32:16 +01:00
Piotr Sarna
828497ad19 hints: amend a comment in device limits
To make the comment less confusing, 'group of managers'
is used instead of 'device'.

Refs #3516

Reported-by: Vlad Zolotarov <vladz@scylladb.com>
Signed-off-by: Piotr Sarna <sarna@scylladb.com>
Message-Id: <60c9ab6b47195570f7ce7dff9556e3739b7ae00f.1529862547.git.sarna@scylladb.com>
2018-06-24 19:14:59 +01:00
Piotr Sarna
8b43ac3a57 hints: reserve more space for dedicated storage
Reserving 10% of space for hints managers makes sense if the device
is shared with other components (like /data or /commitlog).
But, if hints directory is mounted on a dedicated storage, it makes
sense to reserve much more - 90% was chosen as a sane limit.
Whether storage is 'dedicated' or not is based on a simple check
if given hints directory is a mount point.

Fixes #3516

Signed-off-by: Piotr Sarna <sarna@scylladb.com>
2018-06-22 10:27:00 +02:00
Piotr Sarna
32f86ca61e hints: add is_mountpoint function
A helper function that checks whether a path is also a mount point
is added.

Signed-off-by: Piotr Sarna <sarna@scylladb.com>
2018-06-22 10:26:52 +02:00
Piotr Sarna
b6c1b8c5ef hints: make space_watchdog device-aware
Instead of having one static space limit for all directories,
space_watchdog now keeps a per-device limit, shared among
hints managers residing on the same disks.

References #3516

Signed-off-by: Piotr Sarna <sarna@scylladb.com>
2018-06-22 10:26:45 +02:00