Commit Graph

14 Commits

Author SHA1 Message Date
Dawid Mędrek
f83f911bae test: cluster: Reduce wait time in test_sync_point
If everything is OK, the sync point will not resolve with node 3 dead.
As a result, the waiting will use all of the time we allocate for it,
i.e. 30 seconds. That's a lot of time.

There's no easy way to verify that the sync point will NOT resolve, but
let's at least reduce the waiting to 3 seconds. If there's a bug, that
should still be enough to trigger it in some CI run, while reducing the
average time each run needs.
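
A minimal sketch of the bounded wait, assuming a hypothetical
`await_sync_point()` coroutine; the test fails only if the sync point
resolves before the timeout:

    import asyncio

    async def assert_sync_point_does_not_resolve(await_sync_point, timeout_s: float = 3.0) -> None:
        # `await_sync_point` is a hypothetical coroutine factory that returns
        # once the sync point resolves; if everything is OK it never returns
        # here, so we expect the timeout to expire.
        try:
            await asyncio.wait_for(await_sync_point(), timeout=timeout_s)
        except asyncio.TimeoutError:
            return  # Expected: the sync point did not resolve within the window.
        raise AssertionError("sync point resolved even though node 3 is dead")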
2026-02-10 17:05:02 +01:00
Dawid Mędrek
a256ba7de0 test: cluster: Fix test_sync_point
The test had a few shortcomings that made it flaky or simply wrong:

1. We were verifying that hints were written by checking the size of
   in-flight hints. However, that could potentially lead to problems
   in rare situations.

   For instance, if all of the hints failed to be written to disk, the
   size of in-flight hints would drop to zero, but a sync point created
   at that moment would correspond to the empty state.

   In such a situation, we should fail immediately and indicate what
   the cause was.

2. A sync point corresponds to the hints that have already been written
   to disk. The number of those is tracked by the metric `written`.
   It's a much more reliable way to make sure that hints have been
   written to the commitlog. That ensures that the sync point we'll
   create will really correspond to those hints.

3. The auxiliary function `wait_for` used in the test works like this:
   it executes the passed callback and looks at the result. If it's
   `None`, it retries it. Otherwise, the callback is deemed to have
   finished its execution and no further retries will be attempted.

   Before this commit, the callback simply returned a bool, so even a
   `False` result counted as a finished execution and no retries ever
   happened. We fix it to return `None` until the condition holds; see
   the sketch below.
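
A minimal sketch of the corrected callback shape, assuming a hypothetical
metric-fetching coroutine is passed in (the helper and its name are
illustrative, not the test framework's actual API):

    from typing import Awaitable, Callable, Optional

    def hints_written_checker(
        fetch_written: Callable[[], Awaitable[int]],
        expected: int,
    ) -> Callable[[], Awaitable[Optional[int]]]:
        # Build a wait_for()-style callback around the `written` metric.
        async def check() -> Optional[int]:
            written = await fetch_written()   # hypothetical metric fetcher
            if written < expected:
                return None                   # None => wait_for() retries
            return written                    # non-None => done, stop retrying
        return check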

Note that this fixes scylladb/scylladb#28203, which was a manifestation
of scylladb/scylladb#25879. We created a sync point that corresponded
to the empty state, and so it immediately resolved, even when node 3
was still dead.

Refs scylladb/scylladb#25879
Fixes scylladb/scylladb#28203
2026-02-10 17:05:02 +01:00
Dawid Mędrek
c5239edf2a test: cluster: Await sync points asynchronously
There's a dedicated HTTP API for communicating with the cluster, so
let's use it instead of yet another custom solution.
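
A minimal aiohttp sketch of how a sync point can be awaited over a node's
REST API; the endpoint path, parameters, and response shape are assumptions
for illustration, not the test framework's actual client:

    import aiohttp

    async def await_sync_point(host: str, sync_point_id: str, timeout_s: int, port: int = 10000):
        # Assumed endpoint: GET /hinted_handoff/sync_point polls a previously
        # created sync point until it resolves or the timeout expires.
        url = f"http://{host}:{port}/hinted_handoff/sync_point"
        params = {"id": sync_point_id, "timeout": str(timeout_s)}
        async with aiohttp.ClientSession() as session:
            async with session.get(url, params=params) as resp:
                resp.raise_for_status()
                return await resp.json()  # e.g. "DONE" or "IN_PROGRESS"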
2026-02-10 17:05:02 +01:00
Dawid Mędrek
ac4af5f461 test: cluster: Create sync points asynchronously
There's a dedicated HTTP API for communicating with the nodes, so let's
use it instead of yet another custom solution.
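
A minimal aiohttp sketch of creating a sync point over a node's REST API;
the endpoint and the response shape are assumptions for illustration, not
the test framework's actual client:

    import aiohttp

    async def create_sync_point(host: str, port: int = 10000) -> str:
        # Assumed endpoint: POST /hinted_handoff/sync_point creates a sync
        # point covering the hints currently on disk and returns its id.
        url = f"http://{host}:{port}/hinted_handoff/sync_point"
        async with aiohttp.ClientSession() as session:
            async with session.post(url) as resp:
                resp.raise_for_status()
                return await resp.json()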
2026-02-10 17:05:01 +01:00
Dawid Mędrek
628e74f157 test: cluster: Fetch hint metrics asynchronously
There's a dedicated API for fetching metrics now. Let's use it instead
of maintaining yet another, worse, ad-hoc solution.
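
A minimal aiohttp sketch of fetching a hints metric from the node's
Prometheus endpoint (the port, path, and metric name are assumptions used
for illustration):

    import aiohttp

    async def fetch_metric(host: str, name: str, port: int = 9180) -> float:
        # Sum all samples of a metric from the node's /metrics endpoint.
        url = f"http://{host}:{port}/metrics"
        async with aiohttp.ClientSession() as session:
            async with session.get(url) as resp:
                resp.raise_for_status()
                text = await resp.text()
        total = 0.0
        for line in text.splitlines():
            # e.g. name = "scylla_hints_manager_written" (assumed name)
            if line.startswith(name):
                total += float(line.rsplit(" ", 1)[-1])
        return total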
2026-02-10 17:04:59 +01:00
Andrei Chekun
cc5ac75d73 test.py: remove deprecated skip_mode decorator
Finish the deprecation of the skip_mode function in favor of
pytest.mark.skip_mode. This PR only cleans up and migrates the leftover
tests that still use the old form of skip_mode.

Closes scylladb/scylladb#28299
2026-01-25 18:17:27 +02:00
Tomasz Grabiec
5e6935f276 test: Use ManagerClient.{disable,enable}_tablet_balancing() 2026-01-13 00:38:00 +01:00
Andrei Chekun
c950c2e582 test.py: convert skip_mode function to pytest.mark
The skip_mode function works only on individual functions and only in
cluster tests. This is OK when we need to skip one test, but it can't be
used with pytestmark to automatically mark all tests in a file. The goal
of this PR is to migrate skip_mode to a dynamic pytest.mark that can be
used like an ordinary mark.
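
A minimal sketch of both forms this enables (the mark's arguments and the
`manager` fixture are assumptions used for illustration):

    import pytest

    # File-wide: mark every test in the module at once - the main thing
    # the old skip_mode() function could not do.
    pytestmark = pytest.mark.skip_mode("release", "needs debug-only instrumentation")

    # Per-test: the same mark applied to a single test function.
    @pytest.mark.skip_mode("debug", "too slow in debug mode")
    async def test_example(manager):
        ...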

Closes scylladb/scylladb#27853

[avi: apply to test/cluster/test_tablets.py::test_table_creation_wakes_up_balancer]
2026-01-08 21:55:16 +02:00
Andrei Chekun
24d17c3ce5 test.py: rewrite the wait_for_first_completed
Rewrite wait_for_first_completed to return only the first completed task,
while guaranteeing that all cancelled and finished tasks are awaited (and
thus disappear).
Use wait_for_first_completed to avoid falsely passing tests in the future
and issues like #26148.
Use gather_safely to await the tasks, removing the warning that a coroutine
was never awaited.
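
A minimal asyncio sketch of the contract described above (gather_safely is
stood in for by plain gather with return_exceptions=True):

    import asyncio
    from typing import Awaitable, Sequence, TypeVar

    T = TypeVar("T")

    async def wait_for_first_completed_sketch(aws: Sequence[Awaitable[T]]) -> T:
        # Run everything as tasks so asyncio.wait() accepts them.
        tasks = [asyncio.ensure_future(a) for a in aws]
        done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
        # Cancel the losers and await them, so every task really disappears
        # and no "coroutine was never awaited" warning is left behind.
        for task in pending:
            task.cancel()
        await asyncio.gather(*pending, return_exceptions=True)
        # Return (or re-raise) only the first completed result.
        return next(iter(done)).result()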

Closes scylladb/scylladb#26435
2025-10-22 01:13:43 +03:00
Wojciech Mitros
eb92f50413 hinted_handoff: drain hints after the target node stops owning tokens
When a node is being replaced, it enters a "left" state while still
owning tokens. Before this patch, this is also the time when we start
draining hints targeted to this node, so the hints may get sent before
the token ownership gets migrated to another replica, and these hints
may get lost.
In this patch we postpone the hint draining for the "left" nodes to
the time when we know that the target nodes no longer hold ownership
of any tokens - so they're no longer referenced in topology. I'm
calling such nodes "released".

Before this change, when we started draining hints, we still knew the
IP addresses of the target nodes. We lose this information after the node
enters the "left" stage, so when draining hints after a node is "released",
we can no longer drain hints that target a specific IP rather than a host_id.
We may have hints targeted at IPs if the migration from IP-based to
host_id-based hints hasn't happened yet. The migration happens when
a cluster feature, available since 2024.2.0, gets enabled, so such hints
can only exist if we perform a direct upgrade from a version before
2024.2.0 to a version that has this change (2025.4.0+).
To avoid losing hints completely when such an upgrade is combined
with a node removal/replacement, we still drain hints when the node
enters a "left" state and the migration of hints to host_id wasn't
performed yet. For these drains, the problematic scenario can't
occur because it only affects tablets, and when upgrading from
a version before 2024.2.0, no tablets can exist yet.
If we perform such a drain, we no longer need to drain hints when
entering the "released" state, so we only drain when entering that
state if the migration was already completed.
With this setup, we'll always drain hints at least once when a node
is leaving. However, if the migration to host_ids finishes between
entering the "left" state and the "released" state, we'll attempt
to drain the hints twice. This shouldn't be a problem though, because
each `drain_for()` is performed under the `_drain_lock`, and after
a `hint_endpoint_manager` is drained it is removed, so we won't try
to drain it twice.
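
A hedged Python sketch (illustration only, not the actual C++ code) of the
resulting rule: drain on "left" only when the host_id migration hasn't
completed yet, and on "released" only when it has:

    from enum import Enum, auto

    class NodeTransition(Enum):
        LEFT = auto()       # no longer a member, but may still own tokens
        RELEASED = auto()   # no longer referenced in topology at all

    def should_drain(transition: NodeTransition, host_id_migration_done: bool) -> bool:
        # Illustrative only: mirrors the rules described above.
        if transition is NodeTransition.LEFT:
            # Drain early only if hints may still be keyed by IP address;
            # such hints imply a pre-2024.2.0 cluster, hence no tablets and
            # no risk of the lost-hint scenario for these drains.
            return not host_id_migration_done
        if transition is NodeTransition.RELEASED:
            # Otherwise wait until the node no longer owns any tokens.
            return host_id_migration_done
        return False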

This patch also includes a test for verifying that hints are properly
replayed after a node replace.

Fixes https://github.com/scylladb/scylladb/issues/24980

Closes scylladb/scylladb#24981
2025-09-24 07:11:59 +02:00
Wojciech Mitros
10b8e1c51c storage_proxy: send hints to pending replicas
Consider the following scenario:
- Current replica set is [A, B, C]
- write succeeds on [A, B], and a hint is logged for node C
- before the hint is replayed, D bootstraps and the token migrates from C to D
- hint is replayed to node C while D is pending, but it's too late, since streaming for that token is already done
- C is cleaned up, replayed data is lost, and D has a stale copy until next repair.
In this scenario we effectively fail to send the hint. It is also more likely to happen with tablets,
as it can happen for every tablet migration.

This issue is particularly detrimental to materialized views. View updates use hints by default and a specific
view update may be sent to just one view replica (when a single base replica has a different row state due to
reordering or missed writes). When we lose a hint for such a view update, we can generate a persistent inconsistency
between the base and view - ghost rows can appear due to a lost tombstone and rows may be missing in the view due
to a lost row update. Such inconsistencies can't be fixed by repairing either the view or the base table.

To handle this, in this patch we add the pending replicas to the list of targets of each hint, even if the original
target is still alive.
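
A hedged sketch of the resulting target selection (plain Python, names are
illustrative rather than the storage_proxy code):

    def hint_targets(original_target: str, pending_replicas: set[str]) -> set[str]:
        # Illustrative only: a hint logged for `original_target` is also sent
        # to every pending replica of its token, even when the original target
        # is alive, so a migrating or rebuilding replica can't miss the update.
        return {original_target, *pending_replicas}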

This will cause some updates to be redundant. These updates are probably unavoidable for now, but they shouldn't
be too common either. The scenarios for them are:
1. managing to send the hint to the source of a migrating replica before its token is streamed - the write will
arrive at the pending replica through streaming anyway
2. the hint target not being the source of the migration - if we managed to apply the original write of the hint to
the actual source of the migration, the pending replica will get it during streaming
3. sending the same hint to many targets at a similar time - while sending to each target, we'll see the same pending
replica for the hint so we'll send it multiple times
4. possible retries where even though the hint was successfully sent to the main target, we failed to send it to the
pending replica, so we need to retry the entire write

This patch handles both tablet migrations and tablet rebuilds. In the future, for tablet migrations, we can avoid
sending the hint to pending replicas if the hint target is not the source of the migration, which would allow us to
avoid the redundant writes 2 and 3. For rack-aware RF, this will be as simple as checking whether the replicas are
in the same rack.

We also add a test case reproducing the issue.

Co-Authored-By: Raphael S. Carvalho <raphaelsc@scylladb.com>

Fixes https://github.com/scylladb/scylladb/issues/19835

Closes scylladb/scylladb#25590
2025-09-08 09:18:20 +02:00
Evgeniy Naydanov
f6e3fdd778 test.py: rework log_browsing for dtest migration
Rework the `ScyllaLogFile.wait_for()` method to make it easier
to add the required methods to the ScyllaNode class of the ccm-like shim.

Also, add a `ScyllaLogFile.grep_for_errors()` method and
rework `ScyllaLogFile.grep()`.
2025-05-19 11:50:55 +00:00
Dawid Mędrek
0a6137218a db/hints: Cancel draining when stopping node
Draining hints may occur in one of two scenarios:

* a node leaves the cluster and the local node drains all of the hints
  saved for that node,
* the local node is being decommissioned.

Draining may take some time and the hint manager won't stop until it
finishes. It's not a problem when decommissioning a node, especially
because we want the cluster to retain the data stored in the hints.
However, it may become a problem when the local node has started draining
hints saved for another node and is now being shut down.

There are two reasons for that:

* Generally, in situations like that, we'd like to be able to shut down
  nodes as fast as possible. The data stored in the hints won't
  disappear from the cluster yet since we can restart the local node.
* Draining hints may introduce flakiness in tests. Replaying hints doesn't
  have the highest priority and it's reflected in the scheduling groups we
  use as well as the explicitly enforced throughput. If there are a large
  number of hints to be replayed, it might affect our tests.
  It's already happened, see: scylladb/scylladb#21949.

To solve those problems, we change the semantics of draining. It will behave
as before when the local node is being decommissioned. However, when the
local node is only being stopped, we will immediately cancel all ongoing
draining processes and stop the hint manager. To compensate for that, when we
start a node and it initializes a hint endpoint manager corresponding to
a node that's already left the cluster, we will begin the draining process
of that endpoint manager right away.

That should ensure all data is retained, while possibly speeding up
the shutdown process.
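
A hedged Python sketch (not the C++ implementation) of the changed stop
semantics: draining is awaited only on decommission, a plain stop cancels
it, and draining restarts on the next boot for endpoints whose node has
already left:

    import asyncio

    class HintManagerSketch:
        # Illustrative only: mirrors the shutdown behaviour described above.
        def __init__(self) -> None:
            self._drains: list[asyncio.Task] = []

        async def stop(self, decommissioning: bool) -> None:
            if decommissioning:
                # Decommission: the hints are the last copy of that data,
                # so wait for every ongoing drain to finish.
                await asyncio.gather(*self._drains)
            else:
                # Plain stop: cancel ongoing drains; the hint files stay on
                # disk and draining restarts on the next boot for endpoint
                # managers whose target node has already left the cluster.
                for drain in self._drains:
                    drain.cancel()
                await asyncio.gather(*self._drains, return_exceptions=True)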

There's a small trade-off, though. If we stop a node and then remove it,
it won't have had a chance to replay the hints it would have replayed
before these changes, but that's an edge case. We expect this commit to bring
more benefit than harm.

We also provide tests verifying that the implementation works as intended.

Fixes scylladb/scylladb#21949

Closes scylladb/scylladb#22811
2025-03-13 11:55:15 +02:00
Artsiom Mishuta
d1198f8318 test.py: rename topology_custom folder to cluster
Rename the topology_custom folder to cluster,
as it contains not only topology test cases.
2025-03-04 10:32:44 +01:00