Modify test_batchlog_replay_failure_during_repair to also check
that there isn't data resurrection if flushing hints falls within
the repair cache timeout.
(cherry picked from commit e3dcb7e827)
Currently, if flushing hints falls within the repair cache timeout,
then the flush_time is set to batchlog_manager::_last_replay.
_last_replay is updated on each replay, even if some batches weren't
replayed. Due to that, we risk the data resurrection.
Update _last_replay only if all batches were replayed.
Fixes: https://github.com/scylladb/scylladb/issues/24415.
(cherry picked from commit 4d0de1126f)
reader_permit::release_base_resources() is a soft evict for the permit:
it releases the resources aquired during admission. This is used in
cases where a single process owns multiple permits, creating a risk for
deadlock, like it is the case for repair. In this case,
release_base_resources() acts as a manual eviction mechanism to prevent
permits blockings each other from admission.
Recently we found a bad interaction between release_base_resources() and
permit eviction. Repair uses both mechanism: it marks its permits as
inactive and later it also uses release_base_resources(). This partice
might be worth reconsidering, but the fact remains that there is a bug
in the reader permit which causes the base resources to be released
twice when release_base_resources() is called on an already evicted
permit. This is incorrect and is fixed in this patch.
Improve release_base_resources():
* make _base_resources const
* move signal call into the if (_base_resources_consumed()) { }
* use reader_permit::impl::signal() instead of
reader_concurrency_semaphore::signal()
* all places where base resources are released now call
release_base_resources()
A reproducer unit test is added, which fails before and passes after the
fix.
Fixes: #28083Closesscylladb/scylladb#28155
(cherry picked from commit b7bc48e7b7)
Closesscylladb/scylladb#28242
In storage_service::raft_topology_cmd_handler we pass a lambda
wrapped in coroutine::lambda to a function that creates streaming_task_impl.
The lambda is kept in streaming_task_impl that invokes it in its run
method.
The lambda captures may be destroyed before the lambda is called, leading
to use after free.
Do not wrap a lambda passed to streaming_task_impl into coroutine::lambda.
Use this auto dissociate the lambda lifetime from the calling statement.
Fixes: https://github.com/scylladb/scylladb/issues/28200.
Closesscylladb/scylladb#28201
(cherry picked from commit 65cba0c3e7)
Closesscylladb/scylladb#28241
The loop that unwraps nested exception, rethrows nested exception and saves pointer to the temporary std::exception& inner on stack, then continues. This pointer is, thus, pointing to a released temporary
Closesscylladb/scylladb#28143
(cherry picked from commit 829bd9b598)
Closesscylladb/scylladb#28239
It is observed that:
repair - repair[667d4a59-63fb-4ca6-8feb-98da49946d8b]: Failed to update
system.repair_history table of node d27de212-6f32-4649ad76-a9ef1165fdcb:
seastar::rpc::remote_verb_error
(repair[667d4a59-63fb-4ca6-8feb-98da49946d8b]: range (minimum
token,maximum token) is not in the format of (start, end])
This is because repair checks the end of the range to be repaired needs
to be inclusive. When small_table_optimization is enabled for regular
repair, a (minimum token,maximum token) will be used.
To fix, we can relax the check of (start, end] for the min max range.
Fixes#27220
Backport to all active branches.
(cherry picked from commit e97a504)
Parent PR: #27357Closesscylladb/scylladb#27460
The `make_key` lambda erroneously allocates a fixed 8-byte buffer
(`sizeof(s.size())`) for variable-length strings, potentially causing
uninitialized bytes to be included. If such bytes exist and they are
not valid UTF-8 characters, deserialization fails:
```
ERROR 2026-01-16 08:18:26,062 [shard 0:main] testlog - snapshot_list_contains_dropped_tables: cql env callback failed, error: exceptions::invalid_request_exception (Exception while binding column p1: marshaling error: Validation failed - non-UTF8 character in a UTF8 string, at byte offset 7)
```
Fixes#28195.
Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>
Closesscylladb/scylladb#28197
(cherry picked from commit 8aca7b0eb9)
Closesscylladb/scylladb#28208
Consider the following scenario:
1. A table has RF=3 and writes use CL=QUORUM
2. One node is down
3. There is a pending tablet migration from the unavailable node
that is reverted
During the revert, there can be a time window where the pending replica
being cleaned up still accepts writes. This leads to write failures,
as only two nodes (out of four) are able to acknowledge writes.
This patch fixes the issue by adding a barrier to the cleanup_target
tablet transition state, ensuring that the coordinator switches back to
the previous replica set before cleanup is triggered.
Fixes https://github.com/scylladb/scylladb/issues/26512
It's a pre existing issue. Backport is required to all recent 2025.x versions.
- (cherry picked from commit 669286b1d6)
- (cherry picked from commit 67f1c6d36c)
- (cherry picked from commit 6163fedd2e)
Parent PR: #27413Closesscylladb/scylladb#27427
* github.com:scylladb/scylladb:
topology_coordinator: Fix the indentation for the cleanup_target case
topology_coordinator: Add barrier to cleanup_target
test_node_failure_during_tablet_migration: Increase RF from 2 to 3
Fixes#27992
When doing a commit log oversized allocation, we lock out all other writers by grabbing
the _request_controller semaphore fully (max capacity).
We thereafter assert that the semaphore is in fact zero. However, due to how things work
with the bookkeep here, the semaphore can in fact become negative (some paths will not
actually wait for the semaphore, because this could deadlock).
Thus, if, after we grab the semaphore and execution actually returns to us (task schedule),
new_buffer via segment::allocate is called (due to a non-fully-full segment), we might
in fact grab the segment overhead from zero, resulting in a negative semaphore.
The same problem applies later when we try to sanity check the return of our permits.
Fix is trivial, just accept less-than-zero values, and take same possible ltz-value
into account in exit check (returning units)
Added whitebox (special callback interface for sync) unit test that provokes/creates
the race condition explicitly (and reliably).
Closesscylladb/scylladb#27998
(cherry picked from commit a7cdb602e1)
Closesscylladb/scylladb#28097
Currently, tablet allocation intentionally ignores current load (
introduced by the commit #1e407ab) which could cause identical shard
selection when allocating a small number of tablets in the same topology.
When a tablet allocator is asked to allocate N tablets (where N is smaller
than the number of shards on a node), it selects the first N lowest shards.
If multiple such tables are created, each allocator run picks the same
shards, leading to tablet imbalance across shards.
This change initializes the load sketch with the current shard load,
scaled into the [0,1] range, ensuring allocation still remains even
while starting from globally least-loaded shards.
Fixes https://github.com/scylladb/scylladb/issues/27620
Closes https://github.com/scylladb/scylladb/pull/27802Closesscylladb/scylladb#28106
`test_schema_versioning_with_recovery` is currently flaky. It performs
a write with CL=ALL and then checks if the schema version is the same on
all nodes by calling `verify_table_versions_synced`. All nodes are expected
to sync their schema before handling the replica write. The node in
RECOVERY mode should do it through a schema pull, and other nodes should do
it through a group 0 read barrier.
The problem is in `verify_local_schema_versions_synced` that compares the
schema versions in `system.local`. The node in RECOVERY mode updates the
schema version in `system.local` after it acknowledges the replica write
as completed. Hence, the check can fail.
We fix the problem by making the function wait until the schema versions
match.
Note that RECOVERY mode is about to be retired together with the whole
gossip-based topology in 2026.2. So, this test is about to be deleted.
However, we still want to fix it, so that it doesn't bother us in older
branches.
Fixes#23803Closesscylladb/scylladb#28114
(cherry picked from commit 6b5923c64e)
Closesscylladb/scylladb#28175
We currently do it only for a bootstrapping node, which is a bug. The
missing IP can cause an internal error, for example, in the following
scenario:
- replace fails during streaming,
- all live nodes are shut down before the rollback of replace completes,
- all live nodes are restarted,
- live nodes start hitting internal error in all operations that
require IP of the replacing node (like client requests or REST API
requests coming from nodetool).
We fix the bug here, but we do it separately for replace with different
IP and replace with the same IP.
For replace with different IP, we persist the IP -> host ID mapping
in `system.peers` just like for bootstrap. That's necessary, since there
is no other way to determine IP of the replacing node on restart.
For replace with the same IP, we can't do the same. This would require
deleting the row corresponding to the node being replaced from
`system.peers`. That's fine in theory, as that node is permanently
banned, so its IP shouldn't be needed. Unfortunately, we have many
places in the code where we assume that IP of a topology member is always
present in the address map or that a topology member is always present in
the gossiper endpoint set. Examples of such places:
- nodetool operations,
- REST API endpoints,
- `db::hints::manager::store_hint`,
- `group0_voter_handler::update_nodes`.
We could fix all those places and verify that drivers work properly when
they see a node in the token metadata, but not in `system.peers`.
However, that would be too risky to backport.
We take a different approach. We recover IP of the replacing node on
restart based on the state of the topology state machine and
`system.peers` just after loading `system.peers`.
We rely on the fact that group 0 is set up at this point. The only case
where this assumption is incorrect is a restart in the Raft-based
recovery procedure. However, hitting this problem then seems improbable,
and even if it happens, we can restart the node again after ensuring
that no client and REST API requests come before replace is rolled back
on the new topology coordinator. Hence, it's not worth to complicate the
fix (by e.g. looking at the persistent topology state instead of the
in-memory state machine).
Fixes#28057
Backport this PR to all branches as it fixes a problematic bug.
- (cherry picked from commit fc4c2df2ce)
- (cherry picked from commit 4526dd93b1)
- (cherry picked from commit 749b0278e5)
- (cherry picked from commit 0fed9f94f8)
Manually cherry-picked:
- 90b5b2c5f5
- 92b165b8c0
Parent PR: #27435Closesscylladb/scylladb#28098
* https://github.com/scylladb/scylladb:
gossiper: add_saved_endpoint: make generations of excluded nodes negative
test: introduce test_full_shutdown_during_replace
utils: error_injection: allow aborting wait_for_message
raft topology: preserve IP -> ID mapping of a replacing node on restart
pylib/rest_client.py: encode injection name
utils/error_injection: allow to abort `injection_handler::wait_for_message()`
The explanation is in the new comment in `gossiper::add_saved_endpoint`.
We add a test for this change. It's "extremely white-box", but it's better
than nothing.
(cherry picked from commit 0fed9f94f8)
We currently do it only for a bootstrapping node, which is a bug. The
missing IP can cause an internal error, for example, in the following
scenario:
- replace fails during streaming,
- all live nodes are shut down before the rollback of replace completes,
- all live nodes are restarted,
- live nodes start hitting internal error in all operations that
require IP of the replacing node (like client requests or REST API
requests coming from nodetool).
We fix the bug here, but we do it separately for replace with different
IP and replace with the same IP.
For replace with different IP, we persist the IP -> host ID mapping
in `system.peers` just like for bootstrap. That's necessary, since there
is no other way to determine IP of the replacing node on restart.
For replace with the same IP, we can't do the same. This would require
deleting the row corresponding to the node being replaced from
`system.peers`. That's fine in theory, as that node is permanently
banned, so its IP shouldn't be needed. Unfortunately, we have many
places in the code where we assume that IP of a topology member is always
present in the address map or that a topology member is always present in
the gossiper endpoint set. Examples of such places:
- nodetool operations,
- REST API endpoints,
- `db::hints::manager::store_hint`,
- `group0_voter_handler::update_nodes`.
We could fix all those places and verify that drivers work properly when
they see a node in the token metadata, but not in `system.peers`.
However, that would be too risky to backport.
We take a different approach. We recover IP of the replacing node on
restart based on the state of the topology state machine and
`system.peers` just after loading `system.peers`.
We rely on the fact that group 0 is set up at this point. The only case
where this assumption is incorrect is a restart in the Raft-based
recovery procedure. However, hitting this problem then seems improbable,
and even if it happens, we can restart the node again after ensuring
that no client and REST API requests come before replace is rolled back
on the new topology coordinator. Hence, it's not worth to complicate the
fix (by e.g. looking at the persistent topology state instead of the
in-memory state machine).
(cherry picked from commit fc4c2df2ce)
Sometimes it's convenient to use slashes in injection names,
for example my_component/my_method/my_condition. Without quote()
we get 'handler not found' error from Scylla.
(cherry picked from commit 92b165b8c0)
Call discover_staging_sstables in view_update_generator::start() instead
of in the constructor, because the constructor is called during
initialization before sstables are loaded.
The initialization order was changed in 5d1f74b86a and caused this
regression. It means the view update generator won't discover staging
sstables on startup and view updates won't be generated for them. It
also causes issues in sstable cleanup.
view_update_generator::start() is called in a later stage of the
initialization, after sstable loading, so do the discovery of staging
sstables there.
Fixesscylladb/scylladb#27956
(cherry picked from commit 5077b69c06)
Closesscylladb/scylladb#28091
Currently, database::truncate_table_on_all_shards calls the table::can_flush only on the coordinator shard
and therefore it may miss shards with dirty data if the coordinator shard happens to have empty memtables, leading to clearing the memtables with dirty data rather than flushing them.
This change fixes that by making flush safe to be called, even if the memtable list is empty, and calling it on every shard that can flush (i.e. seal_immediate_fn is engaged).
Also, change database_test::do_with_some_data is use random keys instead of hard-coded key names, to reproduce this issue with `snapshot_list_contains_dropped_tables`.
Fixes#27639
* The issue exists since forever and might cause data loss due to wrongly clearing the memtable, so it needs backport to all live versions
- (cherry picked from commit ec4069246d)
- (cherry picked from commit 5be6b80936)
- (cherry picked from commit 0342a24ee0)
- (cherry picked from commit 02ee341a03)
- (cherry picked from commit 2a803d2261)
- (cherry picked from commit 93b827c185)
- (cherry picked from commit ebd667a8e0)
Parent PR: #27643Closesscylladb/scylladb#28071
* https://github.com/scylladb/scylladb:
test: database_test: do_with_some_data: randomize keys
database: truncate_table_on_all_shards: drop outdated TODO comment
database: truncate_table_on_all_shards: consider can_flush on all shards
memtable_list: unify can_flush and may_flush
test: database_test: add test_flush_empty_table_waits_on_outstanding_flush
replica: table, storage_group, compaction_group: add needs_flush
test: database_test: do_with_some_data_in_thread: accept void callback function
The driver must see server_c before we stop server_a, otherwise
there will be no live host in the pool when we attempt to drop
the keyspace:
```
@pytest.mark.asyncio
async def test_not_enough_token_owners(manager: ManagerClient):
"""
Test that:
- the first node in the cluster cannot be a zero-token node
- removenode and decommission of the only token owner fail in the presence of zero-token nodes
- removenode and decommission of a token owner fail in the presence of zero-token nodes if the number of token
owners would fall below the RF of some keyspace using tablets
"""
logging.info('Trying to add a zero-token server as the first server in the cluster')
await manager.server_add(config={'join_ring': False},
property_file={"dc": "dc1", "rack": "rz"},
expected_error='Cannot start the first node in the cluster as zero-token')
logging.info('Adding the first server')
server_a = await manager.server_add(property_file={"dc": "dc1", "rack": "r1"})
logging.info('Adding two zero-token servers')
# The second server is needed only to preserve the Raft majority.
server_b = (await manager.servers_add(2, config={'join_ring': False}, property_file={"dc": "dc1", "rack": "rz"}))[0]
logging.info(f'Trying to decommission the only token owner {server_a}')
await manager.decommission_node(server_a.server_id,
expected_error='Cannot decommission the last token-owning node in the cluster')
logging.info(f'Stopping {server_a}')
await manager.server_stop_gracefully(server_a.server_id)
logging.info(f'Trying to remove the only token owner {server_a} by {server_b}')
await manager.remove_node(server_b.server_id, server_a.server_id,
expected_error='cannot be removed because it is the last token-owning node in the cluster')
logging.info(f'Starting {server_a}')
await manager.server_start(server_a.server_id)
logging.info('Adding a normal server')
await manager.server_add(property_file={"dc": "dc1", "rack": "r2"})
cql = manager.get_cql()
await wait_for_cql_and_get_hosts(cql, [server_a], time.time() + 60)
> async with new_test_keyspace(manager, "WITH replication = {'class': 'NetworkTopologyStrategy', 'replication_factor': 2} AND tablets = { 'enabled': true }") as ks_name:
test/cluster/test_not_enough_token_owners.py:57:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/usr/lib64/python3.14/contextlib.py:221: in __aexit__
await anext(self.gen)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
manager = <test.pylib.manager_client.ManagerClient object at 0x7f37efe00830>
opts = "WITH replication = {'class': 'NetworkTopologyStrategy', 'replication_factor': 2} AND tablets = { 'enabled': true }"
host = None
@asynccontextmanager
async def new_test_keyspace(manager: ManagerClient, opts, host=None):
"""
A utility function for creating a new temporary keyspace with given
options. It can be used in a "async with", as:
async with new_test_keyspace(ManagerClient, '...') as keyspace:
"""
keyspace = await create_new_test_keyspace(manager.get_cql(), opts, host)
try:
yield keyspace
except:
logger.info(f"Error happened while using keyspace '{keyspace}', the keyspace is left in place for investigation")
raise
else:
> await manager.get_cql().run_async("DROP KEYSPACE " + keyspace, host=host)
E cassandra.cluster.NoHostAvailable: ('Unable to complete the operation against any hosts', {<Host: 127.69.108.39:9042 dc1>: ConnectionException('Pool for 127.69.108.39:9042 is shutdown')})
test/cluster/util.py:544: NoHostAvailable
```
Fixes#28011Closesscylladb/scylladb#28040
(cherry picked from commit 34df158605)
Closesscylladb/scylladb#28070
The semaphore has detection and protection against regular resource
leaks, where some resources go unaccounted for and are not released by
the time the semaphore is destroyed. There is no detection or protection
against negative leaks: where resources are "made up" of thin air. This
kind of leaks looks benign at first sight, a few extra resources won't
hurt anyone so long as this is a small amount. But turns out that even a
single extra count resource can defeat a very important anti-deadlock
protection in can_admit_read(): the special case which admits a new
permit regardless of memory resources, when all original count resources
all available. This check uses ==, so if resource > original, the
protection is defeated indefinitely. Instead of just changing == to >=,
we add detection of such negative leaks to signal(), via
on_internal_error_noexcept().
At this time I still don't now how this negative leak happens (the code
doesn't confess), with this detection, hopefully we'll get a clue from
tests or the field. Note that on_internal_error_noexcept() will not
generate a coredump, unless ScyllaDB is explicitely configured to do so.
In production, it will just generate an error log with a backtrace.
The detection also clams the _resources to _initial_resources, to
prevent any damage from the negativae leak.
I just noticed that there is no unit test for the deadlock protection
described above, so one is added in this PR, even if only loosely
related to the rest of the patch.
Fixes: SCYLLADB-163
Closesscylladb/scylladb#27764
(cherry picked from commit e4da0afb8d)
Closesscylladb/scylladb#28002
With randomized keys, and since we're inserting only 2 keys,
it is possible that they would end up owned only by a single shard,
reproducing #27639 in snapshot_list_contains_dropped_tables.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit ebd667a8e0)
The comment was added in 83323e155e
Since then, table::seal_active_memtable was improved to guarantee
waiting on oustanding flushes on success (See d55a2ac762), so
we can remove this TODO comment (it also not covered by any issue
so nobody is planned to ever work on it).
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 93b827c185)
can_flush might return a different value for each shard
so check it right before deciding whether to flush or clear a memtable
shard.
Note that under normal condition can_flush would always return true
now that it checks only the presence of the seal memtable function
rather than check memtable_list::empty().
Fixes#27639
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 2a803d2261)
Now that we have a unit test proving that it's safe to flush an
empty memtable list there is no need to distinguish between
may_flush and can_flush.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 02ee341a03)
Test that table::flush waits on outstanding flushes, even if the active memtable is empty
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 0342a24ee0)
Table needs flush if not all its memtable lists are empty.
To be used in the next patch for a unit test.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 5be6b80936)
Many test cases already assume `func` is being called a seastar
thread and although the function they pass returns a (ready) future,
it serves no purpose other than to conform to the interface.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit ec4069246d)
Service levels cache is empty after upgrade to consistent topology
if no mutations are commited to `system.service_levels_v2` or rolling
restart is not done.
To fix the bug, this commit adds service levels cache reloading after
upgrading the SL data accessor to v2 in `storage_service::topology_state_load()`.
Fixes SCYLLADB-90
(cherry picked from commit be16e42cb0)
before doing migration to raft
There is no need to call `service_level_controller::upgrade_to_v2()`
on every topology state load, we only need to do it once.
(cherry picked from commit 53d0a2b5dc)
Fixes#27077
Multiple points can be clarified relating to:
* Names of each sub-procedure could be clearer
* Requirements of each sub-procedure could be clearer
* Clarify which keyspaces are relevant and how to check them
* Fix typos in keyspace name
Closesscylladb/scylladb#26855
(cherry picked from commit a0734b8605)
Closesscylladb/scylladb#27154
Currently, all apis that start a compaction have two versions:
synchronous and asynchronous. They share most of the implementation,
but some checks and params have diverged.
Unify the handlers of synchronous and asynchronous cleanup, major
compaction, and upgrade_sstables.
Fixes: https://github.com/scylladb/scylladb/issues/26715.
Requires backports to all live versions
- (cherry picked from commit 12dabdec66)
- (cherry picked from commit 044b001bb4)
- (cherry picked from commit fdd623e6bc)
Parent PR: #26746Closesscylladb/scylladb#26886
* github.com:scylladb/scylladb:
api: storage_service: tasks: unify upgrade_sstable
api: storage_service: tasks: force_keyspace_cleanup
pi: storage_service: tasks: unify force_keyspace_compaction
Currently, batchlog replay is considered successful even if all batches fail
to be sent (they are replayed later). However, repair requires all batches
to be sent successfully. Currently, if batchlog isn't cleared, the repair never
learns and updates the repair_time. If GC mode is set to "repair", this means
that the tombstones written before the repair_time (minus propagation_delay)
can be GC'd while not all batches were replied.
Consider a scenario:
- Table t has a row with (pk=1, v=0);
- There is an entry in the batchlog that sets (pk=1, v=1) in table t;
- The row with pk=1 is deleted from table t;
- Table t is repaired:
- batchlog reply fails;
- repair_time is updated;
- propagation_delay seconds passes and the tombstone of pk=1 is GC'd;
- batchlog is replayed and (pk=1, v=1) inserted - data resurrection!
Do not update repair_time if sending any batch fails. The data is still repaired.
For tablet repair the repair runs, but at the end the exception is passed
to topology coordinator. Thanks to that the repair_time isn't updated.
The repair request isn't removed as well, due to which the repair will need
to rerun.
Apart from that, a batch is removed from the batchlog if its version is invalid
or unknown. The condition on which we consider a batch too fresh to replay
is updated to consider propagation_delay.
Fixes: https://github.com/scylladb/scylladb/issues/24415
Data resurrection fix; needs backport to all versions
- (cherry picked from commit 502b03dbc6)
- (cherry picked from commit 904183734f)
- (cherry picked from commit 7f20b66eff)
- (cherry picked from commit e1b2180092)
- (cherry picked from commit d436233209)
- (cherry picked from commit 1935268a87)
- (cherry picked from commit 6fc43f27d0)
Parent PR: #26319Closesscylladb/scylladb#26762
* github.com:scylladb/scylladb:
repair: throw if flush failed in get_flush_time
db: fix indentation
test: add reproducer for data resurrection
repair: fail tablet repair if any batch wasn't sent successfully
db/batchlog_manager: fix making decision to skip batch replay
db: repair: throw if replay fails
db/batchlog_manager: delete batch with incorrect or unknown version
db/batchlog_manager: coroutinize replay_all_failed_batches
Replace -1 with 0 for the liveness check operation to avoid triggering digest validation failures. This prevents rare fatal errors when the cluster is recovering and ensures the test does not violate append_seq invariants.
The value -1 was causing invalid digest results in the append_seq structure, leading to assertion failures. This could happen when the sentinel value was the first (or only) element being appended, resulting in a digest that did not match the expected value.
By using 0 instead, we ensure that the digest calculations remain valid and consistent with the expected behavior of the test.
The specific value of the sentinel is not important, as long as it is a valid elem_t that does not violate the invariants of the append_seq structure. In particular, the sentinel value is typically used only when no valid result is received from any server in the current loop iteration, in which case the loop will retry.
Fixes: scylladb/scylladb#27307
Backporting to active branches - this is a test-only fix (low risk) for a flaky test that exists in older branches (thus affects the CI of active branches).
- (cherry picked from commit 3af5183633)
- (cherry picked from commit 4ba3e90f33)
Parent PR: #28010Closesscylladb/scylladb#28037
* https://github.com/scylladb/scylladb:
test/raft: use valid sentinel in liveness check to prevent digest errors
test/raft: improve debugging in randomized_nemesis_test
test/raft: improve reporting in the randomized_nemesis_test digest functions
Replace -1 with 0 for the liveness check operation to avoid triggering
digest validation failures. This prevents rare fatal errors when the
cluster is recovering and ensures the test does not violate append_seq
invariants.
The value -1 was causing invalid digest results in the append_seq
structure, leading to assertion failures. This could happen when the
sentinel value was the first (or only) element being appended, resulting
in a digest that did not match the expected value.
By using 0 instead, we ensure that the digest calculations remain valid
and consistent with the expected behavior of the test.
The specific value of the sentinel is not important, as long as it is
a valid elem_t that does not violate the invariants of the append_seq
structure. In particular, the sentinel value is typically used only
when no valid result is received from any server in the current loop
iteration, in which case the loop will retry.
Fixes: scylladb/scylladb#27307
(cherry picked from commit 4ba3e90f33)
Move the post-condition check before the assertion to ensure it is
always executed first. Before, the wrong value could be passed to the
digest_remove assertion, making the pre-check trigger there instead of
the post-check as expected.
Also, add a check in the append_seq constructor to ensure that the
digest value is valid when creating an append_seq object.
(cherry picked from commit 3af5183633)