mirror of https://github.com/seaweedfs/seaweedfs.git (synced 2026-06-09 18:32:43 +00:00)
d1b1338558
fix(wdclient): prevent stale cache fallback for empty volume locations

## Problem

During Kubernetes pod restarts, volume servers temporarily disconnect and their locations are removed from vidMap. The deleteLocation function leaves an empty array [] in the vid2Locations map instead of removing the key entirely.

GetLocations() was checking 'if found && len(locations) > 0'. That condition is false for an empty array, so the lookup fell through to the cache chain and returned STALE locations from before the restart.

This caused the S3 gateway to try connecting to old pod IPs that no longer exist, resulting in connection timeouts and hanging registry sync jobs.

Example timeline:
1. Volume pod at 10.131.1.28:8081 registers volumes 10 and 12
2. S3 gateway caches: vid2Locations[10] = [10.131.1.28:8081]
3. Pod restarts and gets new IP 10.131.1.65:8081
4. Master sends delete → vid2Locations[10] = [] (empty, but key exists)
5. BUG: GetLocations(10) sees found=true, len=0 → falls back to cache
6. Returns stale 10.131.1.28:8081 instead of waiting for the new location
7. S3 requests time out trying to reach the unreachable old IP

## Solution

Distinguish between two cases:
- found=true, locations=[]: the volume explicitly has no locations (e.g. during a restart) → return nil, false (no fallback to cache)
- found=false: the volume was never seen in the current map → check the cache (preserves cache benefits for unknown volumes)

An empty array explicitly means 'this volume currently has no locations', which is semantically different from 'volume unknown'. Therefore, do not fall back to the stale cache for explicitly empty volumes.

## Testing

Added tests:
- TestGetLocationsEmptyArrayNoFallback: verifies that empty arrays don't use the cache
- TestGetLocationsUnknownVolumeUsesCache: verifies that unknown volumes still use the cache
- All existing tests pass

## Impact

Fixes registry sync job hangs during SeaweedFS upgrades and restarts. The S3 gateway will now correctly wait for updated volume locations instead of using stale cached IPs.
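A minimal Go sketch of the lookup rule described under Solution, assuming a simplified model. `volumeMap`, `Location`, `newVolumeMap`, `addLocation`, and the single `cache` pointer are hypothetical names chosen for illustration, not SeaweedFS's actual `wdclient` types or signatures. The `main` function replays steps 1 to 5 of the example timeline for volume 10, then uses a hypothetical volume 20 that is absent from the current map to show that unknown volumes still fall back to the cache.

```go
package main

import (
	"fmt"
	"sync"
)

// Location is a hypothetical stand-in for a volume server address.
type Location struct {
	URL string
}

// volumeMap is a simplified, hypothetical model of a client-side volume map:
// volume id -> known locations, plus an optional older generation (cache)
// that is consulted as a fallback.
type volumeMap struct {
	mu            sync.RWMutex
	vid2Locations map[uint32][]Location
	cache         *volumeMap
}

func newVolumeMap(cache *volumeMap) *volumeMap {
	return &volumeMap{vid2Locations: make(map[uint32][]Location), cache: cache}
}

// addLocation records that vid is served at loc.
func (vm *volumeMap) addLocation(vid uint32, loc Location) {
	vm.mu.Lock()
	defer vm.mu.Unlock()
	vm.vid2Locations[vid] = append(vm.vid2Locations[vid], loc)
}

// deleteLocation removes loc for vid. When the last location is removed, the
// key stays in the map with an empty slice, matching the deleteLocation
// behavior the commit message describes.
func (vm *volumeMap) deleteLocation(vid uint32, loc Location) {
	vm.mu.Lock()
	defer vm.mu.Unlock()
	locs, ok := vm.vid2Locations[vid]
	if !ok {
		return
	}
	kept := make([]Location, 0, len(locs))
	for _, l := range locs {
		if l != loc {
			kept = append(kept, l)
		}
	}
	vm.vid2Locations[vid] = kept
}

// GetLocations distinguishes the two cases from the Solution section:
//   - key present with an empty slice: the volume has no locations right now,
//     so return nil, false without consulting the cache;
//   - key absent: the volume is unknown here, so fall back to the cache.
func (vm *volumeMap) GetLocations(vid uint32) ([]Location, bool) {
	vm.mu.RLock()
	locs, found := vm.vid2Locations[vid]
	vm.mu.RUnlock()

	if found {
		if len(locs) == 0 {
			return nil, false // explicitly empty: no stale fallback
		}
		return append([]Location(nil), locs...), true // copy for the caller
	}
	if vm.cache != nil {
		return vm.cache.GetLocations(vid)
	}
	return nil, false
}

func main() {
	oldIP := Location{URL: "10.131.1.28:8081"}

	// Older generation still remembers the pre-restart address.
	cache := newVolumeMap(nil)
	cache.addLocation(10, oldIP)
	cache.addLocation(20, oldIP) // volume 20 is hypothetical

	// Current generation: volume 10 was registered, then deleted on restart.
	current := newVolumeMap(cache)
	current.addLocation(10, oldIP)
	current.deleteLocation(10, oldIP)

	locs, ok := current.GetLocations(10)
	fmt.Println(len(locs), ok) // 0 false

	// Volume 20 was never added to the current map, so the cache answers.
	locs, ok = current.GetLocations(20)
	fmt.Println(locs, ok) // [{10.131.1.28:8081}] true
}
```

Keeping the empty key, rather than deleting it, is what lets the lookup tell 'known but currently without locations' apart from 'never seen', which is the distinction the fix relies on.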
Related: OutSystems.SeaWeedfs Helm chart, vega cluster incident 2026-06-24