Mirror of https://github.com/scylladb/scylladb.git (synced 2026-04-27 03:45:11 +00:00)
gossiper: check for a race condition in do_apply_state_locally
In do_apply_state_locally, a race condition can occur if a task is suspended at a preemption point while the node entry is not locked. During this time, the host may be removed from _endpoint_state_map. When the task resumes, it can insert an entry with an empty host ID into the map, causing various errors, including a node crash.

This change:
1. Adds a check after locking the map entry: if a gossip ACK update does not contain a host ID, we verify that an entry for that node still exists in the gossiper’s _endpoint_state_map.
2. Removes xfail from the test_gossiper_race test since the issue is now fixed.
3. Adds exception handling in `do_shadow_round` to skip responses from nodes that sent an empty host ID.

This re-applies commit 13392a40d4, which was reverted in 46aa59fe49, after fixing the issues that caused the CI to fail.

Fixes: scylladb/scylladb#25702
Fixes: scylladb/scylladb#25621
Ref: scylladb/scylla-enterprise#5613

(cherry picked from commit f08df7c9d7)
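
The race described in the commit message can be illustrated with a small, self-contained model. This is only a sketch in Python with asyncio, not the gossiper's C++ code: `EndpointMap`, `apply_state_locally`, `run_race` and the `recheck` flag are hypothetical names, and the dictionary layout is an assumption made for the example. It shows a task that yields before taking the per-entry lock, a concurrent removal of the entry, and the effect of re-checking the map once the lock is held.

```python
import asyncio


class EndpointMap:
    """Toy stand-in for an endpoint state map (hypothetical, for illustration)."""

    def __init__(self):
        self.states = {}  # endpoint -> state dict, may contain "host_id"
        self.locks = {}   # endpoint -> asyncio.Lock

    def lock_for(self, endpoint):
        return self.locks.setdefault(endpoint, asyncio.Lock())


async def apply_state_locally(emap, endpoint, remote_state, recheck=True):
    # Preemption point: the task yields before the entry is locked,
    # so another task may remove the endpoint in the meantime.
    await asyncio.sleep(0)
    async with emap.lock_for(endpoint):
        # Re-check under the lock: an update without a host ID is only
        # applied if the endpoint is still present in the map.
        if recheck and "host_id" not in remote_state and endpoint not in emap.states:
            return False
        merged = dict(emap.states.get(endpoint, {}))
        merged.update(remote_state)
        emap.states[endpoint] = merged
        return True


async def run_race(recheck):
    emap = EndpointMap()
    emap.states["10.0.0.1"] = {"host_id": "h1", "status": "NORMAL"}
    update = {"status": "LEFT"}  # an update that carries no host ID
    task = asyncio.create_task(
        apply_state_locally(emap, "10.0.0.1", update, recheck)
    )
    await asyncio.sleep(0)        # let the task reach its preemption point
    emap.states.pop("10.0.0.1")   # the endpoint is removed concurrently
    await task
    return emap.states
```

In this model, `asyncio.run(run_race(False))` leaves an entry for `10.0.0.1` that has no `host_id`, while `asyncio.run(run_race(True))` leaves the map empty because the stale update is dropped.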
committed by GitHub Action
parent e8b903979e
commit e157e8577e
@@ -15,7 +15,6 @@ from test.pylib.manager_client import ManagerClient
 
 @pytest.mark.asyncio
 @skip_mode('release', 'error injections are not supported in release mode')
-@pytest.mark.xfail(reason="https://github.com/scylladb/scylladb/issues/25621")
 async def test_gossiper_race_on_decommission(manager: ManagerClient):
     """
     Test for gossiper race scenario (https://github.com/scylladb/scylladb/issues/25621):