Files
seaweedfs/weed
chrislu f18ff58476 fix: Critical offset persistence race condition causing message loss
This fix addresses the root cause of the 28% message loss detected during
consumer group rebalancing with 2 consumers:

CHANGES:
1. **OffsetCommit**: Don't silently ignore SMQ persistence errors
   - Previously, if offset persistence to SMQ failed, we'd continue anyway
   - Now we return an error code so client knows offset wasn't persisted
   - This prevents silent data loss during rebalancing

2. **OffsetFetch**: Add retry logic with exponential backoff
   - During rebalancing, brief race condition between commit and persistence
   - Retry offset fetch up to 3 times with 5-10ms delays
   - Ensures we get the latest committed offset even during rebalances

3. **Enhanced Logging**: Critical errors now logged at ERROR level
   - SMQ persistence failures are logged as CRITICAL with detailed context
   - Helps diagnose similar issues in production

ROOT CAUSE:
When rebalancing occurs, consumers query OffsetFetch for their next offset.
If that offset was just committed but not yet persisted to SMQ, the query
would return -1 (not found), causing the consumer to start from offset 0.
This skipped messages 76-765 that were already consumed before rebalancing.

IMPACT:
- Fixes message loss during normal rebalancing operations
- Ensures offset persistence is mandatory, not optional
- Addresses the 28% data loss detected in comprehensive load tests

TESTING:
- Single consumer test should show 0 missing (unchanged)
- Dual consumer test should show 0 missing (was 3,413 missing)
- Rebalancing no longer causes offset gaps
2025-10-16 19:39:44 -07:00
..
2025-10-13 18:05:17 -07:00
2025-10-14 13:04:33 -07:00
2025-10-13 18:05:17 -07:00
2025-10-13 18:05:17 -07:00
2025-10-15 22:56:47 -07:00
2025-10-13 18:05:17 -07:00
2025-08-31 23:31:28 -07:00
2025-08-22 01:15:42 -07:00
2025-10-15 16:08:21 -07:00
2025-10-13 18:05:17 -07:00
2024-06-25 09:18:11 -07:00
2025-10-13 18:05:17 -07:00
2025-08-30 11:15:48 -07:00
2025-10-13 18:05:17 -07:00
2024-02-14 08:26:38 -08:00
2025-10-13 18:05:17 -07:00
2025-10-13 18:05:17 -07:00
2025-10-13 18:05:17 -07:00
2025-10-13 18:05:17 -07:00
2025-07-19 21:43:34 -07:00
2025-03-17 23:13:27 -07:00