seaweedfs/weed at 6eb3bc46bd1e5320d09e55ca01ca9b364f6eaefa - seaweedfs - Anomalous Gitea

mirrors/seaweedfs

mirror of https://github.com/seaweedfs/seaweedfs.git synced 2026-07-27 18:43:43 +00:00

Chris Lu 6eb3bc46bd fix(kafka): close late-joiner orphan race in consumer-group rebalance

The CI-observed orphan (one consumer with empty assignment after
`TestConsumerGroups/Rebalancing/MultipleConsumersJoin`) came from a
race in the group coordinator: once the leader had taken its member-
list snapshot in its JoinGroup response, a new member could still
arrive before the leader's SyncGroup landed. The gateway accepted the
stale SyncGroup, moved to Stable, and the late joiner's own SyncGroup
then served an empty Assignment from the Stable-state path — leaving
it silently unassigned with no further rebalance to fix it.

Three changes in `handleJoinGroup` / `handleSyncGroup` close the race:

- Late join during `CompletingRebalance` bumps the generation and
  resets to `PreparingRebalance`, so the leader's in-flight SyncGroup
  fails its generation check and the round restarts with the new
  member in the snapshot.
- SyncGroup generation-mismatch returns `REBALANCE_IN_PROGRESS` (not
  `ILLEGAL_GENERATION`) while the group is rebalancing, mirroring the
  existing heartbeat fix — otherwise Sarama's `Consume()` tears down
  on the stale SyncGroup instead of retrying.
- Leader SyncGroup verifies its assignment covers every current
  member and rejects with `REBALANCE_IN_PROGRESS` otherwise, as a
  belt-and-suspenders catch for joins that slip in between the
  leader's JoinGroup reply and its SyncGroup without going through
  `CompletingRebalance` state.
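
The three checks above can be sketched as a toy state machine. This is only an illustration under stated assumptions, not the gateway's actual code: the `group` type, its fields, and the `join` / `completeJoin` / `sync` methods are hypothetical names, and blocking, timeouts, and locking are left out. The numeric error codes are the standard Kafka protocol values.

```go
package main

// Kafka protocol error codes (standard values from the protocol spec).
const (
	errNone                int16 = 0
	errIllegalGeneration   int16 = 22
	errRebalanceInProgress int16 = 27
)

type groupState int

const (
	stateEmpty groupState = iota
	statePreparingRebalance
	stateCompletingRebalance
	stateStable
)

// group is a hypothetical, simplified stand-in for per-group coordinator state.
type group struct {
	state       groupState
	generation  int32
	leader      string
	members     map[string]bool
	assignments map[string][]int32
}

func newGroup() *group {
	return &group{members: map[string]bool{}}
}

// join models a JoinGroup request and returns the generation the member sees.
func (g *group) join(memberID string) int32 {
	known := g.members[memberID]
	g.members[memberID] = true
	if g.leader == "" {
		g.leader = memberID // first member becomes leader in this sketch
	}
	switch g.state {
	case stateEmpty, stateStable:
		g.generation++
		g.state = statePreparingRebalance
	case stateCompletingRebalance:
		if !known {
			// Change 1: a late joiner invalidates the leader's snapshot,
			// so bump the generation and restart the round.
			g.generation++
			g.state = statePreparingRebalance
		}
	}
	return g.generation
}

// completeJoin models the point where JoinGroup responses go out and the
// leader has taken its member-list snapshot.
func (g *group) completeJoin() {
	g.state = stateCompletingRebalance
}

// sync models a SyncGroup request. Only the leader supplies assignments.
func (g *group) sync(memberID string, generation int32, assignments map[string][]int32) ([]int32, int16) {
	if generation != g.generation {
		if g.state == statePreparingRebalance || g.state == stateCompletingRebalance {
			// Change 2: tell the client to retry instead of tearing down.
			return nil, errRebalanceInProgress
		}
		return nil, errIllegalGeneration
	}
	if memberID == g.leader && g.state == stateCompletingRebalance {
		for m := range g.members {
			if _, ok := assignments[m]; !ok {
				// Change 3: the assignment must cover every current member.
				return nil, errRebalanceInProgress
			}
		}
		g.assignments = assignments
		g.state = stateStable
	}
	if g.state != stateStable {
		// Simplification: a real coordinator would hold the follower's
		// request until the leader's assignment arrives.
		return nil, errRebalanceInProgress
	}
	return g.assignments[memberID], errNone
}
```

In this sketch, if members `a` and `b` join, `completeJoin` runs, and `c` then joins, the leader's SyncGroup for the old generation gets `REBALANCE_IN_PROGRESS`. The next round succeeds only once the leader's assignment includes `c`.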

Verified: baseline reliably reproduces the orphan locally; with the
fix `TestConsumerGroups` passes end-to-end (53s total,
`MultipleConsumersJoin` 15-17s) and a 10-iteration stress loop against
the same gateway is 10/10 green with every consumer getting exactly
one partition.

2026-04-20 14:39:37 -07:00

fix(admin): list all masters and dedupe EC file counts in dashboard (#9093)

2026-04-15 22:28:54 -07:00

fix(wdclient,volume): compare master leader with ServerAddress.Equals (#9089)

2026-04-15 12:29:31 -07:00

peer chunk sharing 3/8: mount peer-serve HTTP endpoint (#9132)

2026-04-18 20:03:34 -07:00

fix(s3): preserve exact policy document in embedded IAM put/get-user-policy (#9025)

2026-04-10 18:09:22 -07:00

peer chunk sharing 2/8: filer mount registry (#9131)

2026-04-18 20:03:23 -07:00

fix: resolve gRPC DNS resolution issues in Kubernetes #8384 (#8387)

2026-02-19 15:46:02 -08:00

go fmt

2026-04-10 17:31:14 -07:00

feat(iam): implement group inline policy actions (#8992)

2026-04-08 15:57:04 -07:00

fix(iam): preserve actions/resources in GetUserPolicy fallback (#9009)

2026-04-09 11:48:51 -07:00

chore: remove ~50k lines of unreachable dead code (#8913)

2026-04-03 16:04:27 -07:00

chore: remove ~50k lines of unreachable dead code (#8913)

2026-04-03 16:04:27 -07:00

fix(mount): retry saveEntry on transient filer errors; stop mismapping Canceled to EIO (#9141)

2026-04-20 00:31:37 -07:00

fix(kafka): close late-joiner orphan race in consumer-group rebalance

2026-04-20 14:39:37 -07:00

Adjust rename events metadata format (#8854)

2026-03-30 18:25:11 -07:00

fix(mount): remove fid pool to stop master over-allocating volumes (#9111)

2026-04-16 15:51:13 -07:00

peer chunk sharing 1/8: proto definitions (#9130)

2026-04-18 20:02:55 -07:00

fix(plugin): remove Min Volume Age field from vacuum plugin worker config (#9095)

2026-04-16 01:35:12 -07:00

fix(weed/query/engine): check for nil pointers (#9114)

2026-04-16 16:27:55 -07:00

chore: remove ~50k lines of unreachable dead code (#8913)

2026-04-03 16:04:27 -07:00

go fmt

2026-04-10 17:31:14 -07:00

fix(s3api): self-heal stale .versions latest-version pointer on read (#9125)

2026-04-17 14:57:59 -07:00

fix(sync): use per-cluster TLS for HTTP volume connections in filer.sync (#8974)

2026-04-07 14:11:44 -07:00

[nfs] Add NFS (#9067)

2026-04-14 20:48:24 -07:00

peer chunk sharing 2/8: filer mount registry (#9131)

2026-04-18 20:03:23 -07:00

chore: remove ~50k lines of unreachable dead code (#8913)

2026-04-03 16:04:27 -07:00

fix(shell): volume.fsck keeps going past a single broken chunk manifest (#9140)

2026-04-19 23:06:28 -07:00

…

Export master_disconnections metrics on volume servers (#9104)

2026-04-17 15:15:26 -07:00

fix(volume): keep vacuum running past dangling .idx entries (#9115)

2026-04-16 22:01:34 -07:00

Prevent split-brain: Persistent ClusterID and Join Validation (#8022)

2026-01-18 14:02:34 -08:00

fix(master): eagerly remove volume from writable when assign hits limit (#9108)

2026-04-16 12:50:30 -07:00

4.21

2026-04-19 14:38:29 -07:00

fix(wdclient,volume): compare master leader with ServerAddress.Equals (#9089)

2026-04-15 12:29:31 -07:00

fix(mini): shut down admin/s3/webdav/filer before volume/master on Ctrl+C (#9112)

2026-04-16 16:11:01 -07:00

Makefile

Move SQL engine and PostgreSQL server to their own binaries (#8417)

2026-02-23 16:27:08 -08:00

weed.go

chore: remove ~50k lines of unreachable dead code (#8913)

2026-04-03 16:04:27 -07:00