mirror of
https://github.com/scylladb/scylladb.git
synced 2026-06-04 05:53:13 +00:00
In do_apply_state_locally, a race condition can occur if a task is suspended at a preemption point while the node entry is not locked. During this time, the host may be removed from _endpoint_state_map. When the task resumes, this can lead to inserting an entry with an empty host ID into the map, causing various errors, including a node crash. This change 1. adds a check after locking the map entry: if a gossip ACK update does not contain a host ID, we verify that an entry with that host ID still exists in the gossiper’s _endpoint_state_map. 2. Removes xfail from the test_gossiper_race test since the issue is now fixed. 3. Adds exception handling in `do_shadow_round` to skip responses from nodes that sent an empty host ID. This re-applies the commit13392a40d4that was reverted in46aa59fe49, after fixing the issues that caused the CI to fail. Fixes: scylladb/scylladb#25702 Fixes: scylladb/scylladb#25621 Ref: scylladb/scylla-enterprise#5613 (cherry picked from commitf08df7c9d7)