Sometimes rdeStaging reduce shards die after the lock is acquired. When that happens, the automatic rerun of the shard fails because the lock is still in place, so that specific TLD is not staged and has to wait for the next call to rdeStaging.
rdeStaging runs every 4 hours, but the current lock lives for 5 hours.
This means that on the next run of rdeStaging the lock still hasn't timed out, so staging for that TLD fails again, and we have to wait for the run after that, for a total delay of 8 hours.
Shortening the lock timeout to less than the 4-hour interval between rdeStaging runs solves this issue.
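
The timing argument can be checked with a small sketch. This is illustrative only and is not code from this change: the function name, its parameters, and the 3-hour example value are hypothetical, and it assumes the lock is acquired at hour 0 by the run whose shard dies.

```python
def first_successful_run(lock_ttl_hours, run_interval_hours=4, max_runs=10):
    """Return the hour of the first scheduled run that finds the lock expired.

    Hour 0 is the run whose reduce shard died while holding the lock
    (an assumption made for this sketch).
    """
    for n in range(1, max_runs + 1):
        run_time = n * run_interval_hours
        # A later run can only stage the TLD once the stale lock has timed out.
        if run_time >= lock_ttl_hours:
            return run_time
    return None


print(first_successful_run(5))  # current 5-hour lock: prints 8
print(first_successful_run(3))  # hypothetical 3-hour lock: prints 4
```

With any timeout below the run interval, the very next scheduled run finds the lock expired, so the delay drops from 8 hours to 4.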
NOTE: This is just a "quick patch" solution. To really fix the rdeStaging failure we need to fix the lock itself.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=166102387