We drop the default argument in the function's signature.
Also, we adjust the code of change_host_filter() to
be able to perform calls to get_ep_manager().
This commit introduces a new class responsible
for keeping track of mappings IP-host ID.
Before hinted handoff is migrated to using
host IDs, hint directories still have to
represent IP addresses. However, since
we identify endpoint managers by host IDs
already, we need to be able to associate
them with the directories they manage.
This class serves this purpose.
We expose the update lock of space watchdog
to be able to prevent it from scanning
hint directories. It will be necessary in an
upcoming commit when we will be renaming hint
directories and possibly removing some of them.
Race conditions are unacceptable, so resource
manager cannot be able to access the directory
during that time.
We add a function that will be used while
migrating hinted handoff to using host IDs.
It iterates over existing hint directories
and tries to rename them to the corresponding
host IDs. In case of a failure, we remove
it so that at the end of its execution
the only remaining directories are those
that represent host IDs.
The store_hint() method starts taking both an IP
and a host ID as its arguments. The rationale
for the change is depending on the stage of
the cluster (before an upgrade to the
host-ID-based hinted handdof and after it),
we might need to create a directory representing
either an IP address, or a host ID.
Because locator::topology can change in the
before obtaining the host ID we pass
and when the function is being executed,
we need to pass both parameters explicitly
to ensure the consistency between them.
We extract the initialization of endpoint managers
from the start method of the hint manager
to a separate function and make it handle directories
that represent either IP addresses, or host IDs;
other directories are ignored.
It's necessary because before Scylla is upgraded
to a version that uses host-ID-based hinted handoff,
we need to continue only managing IP directories.
When Scylla has been upgraded, we will need to handle
host ID directories.
It may also happen that after an upgrade (but not
before it), Scylla fails while renaming
the directories, so we end up with some of them
representing IP address, and some representing
host IDs. After these changes, the code handles
that scenario as well.
We change the type of node identifiers
used within the module and fix compilation.
Directories storing hints to specific nodes
are now represented by host IDs instead of
IPs.
While the function is marked as noexcept, the returned
future can in fact store an exception. We remove the
specifier to reflect the actual behavior of the
function.
We extend the function
endpoint_lifecycle_subscriber::on_leave_cluster
by another argument -- locator::host_id.
It's more convenient to have a consistent
pair of IP and host ID.
Repair may miss some tablets that migrated across nodes.
So if tombstones expire after some timeout, then we can
have data resurrection.
Set default tombstone_gc mode to "repair" for tables which
use tablets (if repair is required).
Fixes: #16627.
Closesscylladb/scylladb#18013
* github.com:scylladb/scylladb:
test: check default value of tombstone_gc
test: topology: move some functions to util.py
cql3: statements: change default tombstone_gc mode for tablets
When reclaiming memory from bloom filters, do not remove them from
_recognised_components, as that leads to the on-disk filter component
being left back on disk when the SSTable is deleted.
Fixes#18398
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
Closesscylladb/scylladb#18400
The FIXME was added back then because we thought the interface of
compaction_group_for_sstable might have to be adjusted if a sstable
were allowed to temporarily span multiple tablets until it's split,
but we have gone a different path.
If a sstable's key range incorrectly spans more than one tablet,
that will be considered a bug and an exception is thrown.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closesscylladb/scylladb#18410
We have a concurrent modification conflict in tests and suspect
duplicated requests but since we don't log successful requests
we have no way to verify if that's the case. get_mutations_internal log
will help to tell wchich nodes are trying to push auth or
service levels mutations into raft.
Refs scylladb/scylladb#18319Closesscylladb/scylladb#18413
Today with the backport automation, the developer added the relevant backport label, but without any explanation of why
Adding the PR template with a placeholder for the developer to add his decision about backport yes or no
The placeholder is marked as a task, so once the explanation is added, the task must be checked as completed
Also adding another check to the PR summary will make it clear to the maintainer/reviewer if the developer explained about backport
Closesscylladb/scylladb#18275
* github.com:scylladb/scylladb:
[github] add action to verify PR tasks was completed
[github] add PR template
There are two places that get total:live stats for a table snapshot --
database::get_snapshot_details() and table::get_snapshot_details(). Both
do pretty similar thing -- walk the table/snapshots/ directory, then
each of the found sub-directory and accumulate the found files' sizes
into snapshot details structure.
Both try to tell total from live sizes by checking whether an sstable
component found in snapshots is present in the table datadir. The
database code does it in a more correct way -- not just checks the file
presense, but also compares if it's a hardlink on the snapshot file,
while the table code just checks if the file of the same name exists.
This patch does both -- makes both database and table call the same
helper method for a single snapshot details, and makes the generalized
version use more elaborated collision check, thus fixing the per-table
details getting behavior.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closesscylladb/scylladb#18347
The event is used in a loop.
Found by clang-tidy:
```
streaming/stream_result_future.cc:80:49: warning: 'event' used after it was moved [bugprone-use-after-move]
listener->handle_stream_event(std::move(event));
^
streaming/stream_result_future.cc:80:39: note: move occurred here
listener->handle_stream_event(std::move(event));
^
streaming/stream_result_future.cc:80:49: note: the use happens in a later loop iteration than the move
listener->handle_stream_event(std::move(event));
^
```
Fixes#18332Closesscylladb/scylladb#18333
It's pretty straightforward, but prior to that, exception handling needs some care
Closesscylladb/scylladb#18378
* github.com:scylladb/scylladb:
view-builder: Coroutinize stop()
view_builder: Do not try to handle step join exceptions on stop
we are using `fmt::ostream_formatter` which was introduced in
{fmt} v9.0.0, see https://github.com/fmtlib/fmt/releases/tag/9.0.0 .
before this change, we depend on Seastar to find {fmt}. but
the minimal version of {fmt} required by Seastar is 5.0.0, which
cannot fulfill the needs to build scylladb.
in this change, we find {fmt} package in scylla, and specify the
minimal required version of 9.0.0, so the build can fail at the
configuration time. {fmt} v8 could be still being used by users.
for instance, ubuntu:jammy comes with libfmt-dev 8.1.1. and
ubuntu:jammy is EOL in Apr 2027, see
https://ubuntu.com/about/release-cycle .
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#18386
Metric family config lets a user configure the metric family aggregate labels.
This patch modifies the existing relable-config from file to accept
metric family config.
Similar to the existing relable_config, it adds a metric_family_configs
section. For example, the following configuration demonstrates changing
aggregate labels by name and regular expression.
```
metric_family_configs:
- name: storage_service
aggregate_labels: [shard]
- regex: (storage_proxy.*)
aggregate_labels: [shard, scheduling_group_name]
```
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Closesscylladb/scylladb#18339
The API endpoint in question calls table::get_snapshot_detail() which just walks table/snapshots/ directory. This can clash with creating a new snapshot. Database-wide walk is guarded with snapshot-ctl's locking, so should the per-table API do
Closesscylladb/scylladb#18414
* github.com:scylladb/scylladb:
snapshot: Get per-table snapshot size under snapshot lock
snapshot: Move per-table snap API to other snapshot endpoints
in `partition_entry::apply_to_incomplete()`, we pass `*dst_snp` and
`std::move(dst_snp)` to build the capture variable list of a lambda,
but the order of evaluation of these variables are unspecified.
fortunately, we haven't run into any issues at this moment. but this
is not future-proof. so, let's avoid this by storing a reference
of the dereferenced smart pointer, and use it later on.
this issue is identified by clang-tidy:
```
/home/kefu/dev/scylladb/mutation/partition_version.cc:500:53: warning: 'dst_snp' used after it was moved [bugprone-use-after-move]
500 | cur = partition_snapshot_row_cursor(s, *dst_snp),
| ^
/home/kefu/dev/scylladb/mutation/partition_version.cc:502:23: note: move occurred here
502 | dst_snp = std::move(dst_snp),
| ^
/home/kefu/dev/scylladb/mutation/partition_version.cc:500:53: note: the use and move are unsequenced, i.e. there is no guarantee about the order in which they are evaluated
500 | cur = partition_snapshot_row_cursor(s, *dst_snp),
| ^
/home/kefu/dev/scylladb/mutation/partition_version.cc:501:57: warning: 'src_snp' used after it was moved [bugprone-use-after-move]
501 | src_cur = partition_snapshot_row_cursor(s, *src_snp, can_move),
| ^
/home/kefu/dev/scylladb/mutation/partition_version.cc:504:23: note: move occurred here
504 | src_snp = std::move(src_snp),
| ^
/home/kefu/dev/scylladb/mutation/partition_version.cc:501:57: note: the use and move are unsequenced, i.e. there is no guarantee about the order in which they are evaluated
501 | src_cur = partition_snapshot_row_cursor(s, *src_snp, can_move),
| ^
```
Fixes#18360
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#18361
When issuing warnings about partitions with the number of rows above a configured threshold, the large partitions handler does not take into consideration the number of range tombstone markers in the total rows count. This fix adds the number of range tombstone markers to the total number of rows and saves this total in system.large_partitions.rows (if it is above the threshold). It also adds a new column range_tombstones to the system.large_partitions table which only contains the number of range tombstone markers for the given partition.
This PR fixes the first part of issue #13968
It does not cover distinguishing between live and dead rows. A subsequent PR will handle that.
Closesscylladb/scylladb#18346
* github.com:scylladb/scylladb:
sstables: add docs changes for system.large_partitions
sstable: large data handler needs to count range tombstones as rows
before this change, if we generate the building system with plain
`Ninja`, instead of `Ninja Multi-Config` using cmake, the build
fails, because `${scylla_build_mode_${CMAKE_BUILD_TYPE}}` is not
defined. so the profile used for building the rust library would be
"rust-", which does not match any of the profiles defined by
`Cargo.toml`.
in this change, we use `$CMAKE_BUILD_TYPE` instead of "$config". as
the former is defined for non-multi generator. while the latter
is. see https://cmake.org/cmake/help/latest/generator/Ninja%20Multi-Config.html
with this change, we are able to generate the building system properly
with the "Ninja" generator. if we just want to run some static analyzer
against the source tree or just want to build scylladb with a single
configuration, the "Ninja" generator is a good fit.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#18353
Walking per-table snapshot directory without lock is racy. There's
snapshot-ctl locking that's used to get db-wide snapshot details, it
should be used to get per-table snapshot details too
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
So that they are collected in one place and to facilitate next patch
that's going to use snapshot-ctl for per-table API too
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This commit adds the description and usage instructions of Scylla Doctor
to the "How to Report a ScyllaDB Problem" page.
Scylla Doctor replaces Health Check Report, so the description of
and references to the latter are removed with this commit.
Fixes https://github.com/scylladb/scylladb/issues/16276Closesscylladb/scylladb#17617
The off_strategy_updater is used during repair to update the automatic
off strategy timer so off_strategy compaction starts automatically only
after repair finishes. We still use off_strategy for tablets. So we
should still turn on the updater.
The update logic is used for vnode tables. We can share the code with
vnode table instead of copying, but since there is a possibility we
could disable off_strategy for tablets. We'd better postpone the code
sharing as follow ups. If later, we decide to disable off_strategy for
tablets, we can remove the updater for tablet.
Fixes#18196Closesscylladb/scylladb#18266
Currently a new node is marked as alive too late, after it is already
reported as a pending node. The patch series changes replace procedure
to be the same as what node_ops do: first stop reporting the IP of the
node that is being replaced as a natural replica for writes, then mark
the IP is alive, and only after that report the IP as a pending endpoint.
Fixes: scylladb/scylladb#17421
* 'gleb/17421-fix-v2' of github.com:scylladb/scylla-dev:
test_replace_reuse_ip: add data plane load
sync_raft_topology_nodes: make replace procedure similar to nodeops one
storage_service: topology_coordinator: fix indentation after previous patch
storage_service: topology coordinator: drop ring check in node_state::replacing state
In this commit we enhance test_replace_reuse_ip
to reproduce #17421. We create a test table and run
insert queries on it while the first node is
being replaced. In this form the test fails
without the fix from the previous commit. Some
insert requests fail with [Unavailable exception]
"Cannot achieve consistency level for cl QUORUM...".
In replace-with-same-ip a new node calls gossiper.start_gossiping
from join_token_ring with the 'advertise' parameter set to false.
This means that this node will fail echo RPC-s from other nodes,
making it appear as not alive to them. The node changes this only
in storage_service::join_node_response_handler, when the topology
coordinator notifies it that it's actually allowed to join the
cluster. The node calls _gossiper.advertise_to_nodes({}), and
only from this moment other nodes can see it as alive.
The problem is that topology coordinator sends this notification
in topology::transition_state::join_group0 state. In this state
nodes of the cluster already see the new node as pending,
they react with calling tmpr->add_replacing_endpoint and
update_topology_change_info when they process the corresponding
raft notification in sync_raft_topology_nodes. When the new
token_metadata is published, assure_sufficient_live_nodes
sees the new node in pending_endpoints. All of this happen
before the new node handled successful join notification,
so it's not alive yet. Suppose we had a cluster with three
nodes and we're replacing on them with a fourth node.
For cl=qurum assure_sufficient_live_nodes throws if
live < need + pending, which in our case becomes 2 < 2 + 1.
The end effect is that during replace-with-same-ip
data plane requests can fail with unavailable_exception,
breaking availability.
The patch makes boot procedure more similar to node ops one.
It splits the marking of a node as "being replaced" and adding it to
pending set in to different steps and marks it as alive in the middle.
So when the node is in topology::transition_state::join_group0 state
it marked as "being replaced" which means it will no longer be used for
reads and writes. Then, in the next state, new node is marked as alive and
is added to pending list.
fixesscylladb/scylladb#17421
Currently, the last mutation emitted by split_mutation could be empty.
It can happen as follows:
- consume range tombstone change at pos `1` with some timestamp
- consume clustering row at pos `2`
- flush: this will create mutation with range tombstone (1, 2) and
clustering row at 2
- consume range tombstone change at pos `2` with no timestamp (i.e.
closing rtc)
- end of partition
since the closing rtc has the same position as the clustering row, no
additional range tombstone will be emitted -- the only necessary range
tombstone was already emitted in the previous mutation.
On the other hand, `test_split_mutations` expects all emitted mutations
to be non-empty, which is a sane expectation for this function.
The test catched a case like this with random-seed=629157129.
Fix this by skipping the last mutation if it turns out to be empty.
Fixes: scylladb/scylladb#18042Closesscylladb/scylladb#18375
```
service/storage_service.cc:4288:62: warning: 'req' used after it was moved [bugprone-use-after-move]
node_ops_insert(ops_uuid, coordinator, std::move(req.ignore_nodes), [this, coordinator, req = std::move(req)] () mutable {
^
service/storage_service.cc:4288:107: note: move occurred here
node_ops_insert(ops_uuid, coordinator, std::move(req.ignore_nodes), [this, coordinator, req = std::move(req)] () mutable {
^
service/storage_service.cc:4288:62: note: the use and move are unsequenced, i.e. there is no guarantee about the order in which they are evaluated
node_ops_insert(ops_uuid, coordinator, std::move(req.ignore_nodes), [this, coordinator, req = std::move(req)] () mutable {
^
```
if evaluation order is right-to-left (GCC), req is moved first, and req.ignore_nodes will be empty,
so nodes that should be ignored will still be considered, potentially resulting in a failure during
replace.
https://godbolt.org/z/jPcM6GEx1
courtesy of clang-tidy.
Fixes#18324.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closesscylladb/scylladb#18366
Currently, if tombstone_gc mode isn't specified for a table,
then "timeout" is used by default. With tablets, running
"nodetool repair -pr" may miss a tablet if it migrated across
the nodes. Then, if we expire tombstones for ranges that
weren't repaired, we may get data resurrection.
Set default tombstone_gc mode value for DDLs that don't
specify it. It's set to "repair" for tables which use tablets
unless they use local replication strategy or rf = 1.
Otherwise it's set to "timeout".
Commit 23c891923e (main: make sure view_builder doesn't propagate
semaphore errors) ignored some exceptions that could pop up from the
_build_step/do_build_step() serialized action, since they are "benign"
on stop.
Later there came b56b10a4bb (view_builder: do_build_step: handle
unexpected exceptions) that plugged any exception from the action in
question, regardless of they happen on stop or run-time.
Apparently, the latter commit supersedes the former.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This commit excludes OSS-specific links and content
added in https://github.com/scylladb/scylladb/pull/17624
to separate files and adds the include directive `.. scylladb_include_flag::`
to include these files in the doc source files.
Reason: Adding the link to the Open Source upgrade guide
(/upgrade/upgrade-opensource/upgrade-guide-from-5.4-to-6.0/enable-consistent-topology)
breaks the Enterprise documentation because the Enterprise docs don't
contain that upgrade guide. We must add separate files for OSS and
Enterprise to prevent failing the Enterprise build and breaking the
links.
Closesscylladb/scylladb#18372
```
sstables/storage.cc:152:21: warning: 'file_path' used after it was moved [bugprone-use-after-move]
remove_file(file_path).get();
^
sstables/storage.cc:145:64: note: move occurred here
auto w = file_writer(output_stream<char>(std::move(sink)), std::move(file_path));
```
It's a regression when TOC is found for a new sstable, and we try to delete temporary TOC.
courtesy of clang-tidy.
Fixes#18323.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closesscylladb/scylladb#18367
In order to check if a snapshot of a certain name exists the checking
method opens directory. It can be made with more lightweight call.
Also, though not critical, is that it fogets to close it.
Coroutinuze the method while at it.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closesscylladb/scylladb#18365
There's a database::get_snapshot_details() method that returns collection of all snapshots for all ks.cf out there and there are several *snapshot_details* aux structures around it. This PR keeps only one "details" and cleans up the way it propagates from database up to the respective API calls.
Closesscylladb/scylladb#18317
* github.com:scylladb/scylladb:
snapshot_ctl: Brush up true_snapshots_size() internals
snapshot_ctl: Remove unused details struct
snapshot_ctl: No double recoding of details
database,snapshots: Move database::snapshot_details into snapshot_ctl
database,snapshots: Make database::get_snapshot_details() return map, not vector
table,snapshots: Move table::snapshot_details into snapshot_ctl
`future::get0()` was deprecated in favor of `future::get()`. so
let's use the latter instead. this change silences a `-Wdeprecated`
warning.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#18357