mirror of https://github.com/seaweedfs/seaweedfs.git synced 2026-05-22 01:31:34 +00:00

Files

Chris Lu c4e1885053 fix(ec): honor disk_id in ReceiveFile so EC shards respect admin placement (#9184) (#9185)

* test(volume_server): reproduce #9184 EC ReceiveFile disk-placement bug

The plugin-worker EC task sends shards via ReceiveFile, which picks
Locations[0] as the target directory regardless of the admin planner's
TargetDisk assignment. ReceiveFileInfo has no disk_id field, so there
is no wire channel to honor the plan.

Adds StartSingleVolumeClusterWithDataDirs to the integration framework
so tests can launch a volume server with N data directories. The new
repro asserts the current (buggy) behavior: sending three distinct EC
shards via ReceiveFile leaves all three files in dir[0] and the other
dirs empty. When the fix adds disk_id to ReceiveFileInfo, this
assertion must flip to verify the planned placement is respected.

* fix(ec): honor disk_id in ReceiveFile so EC shards respect admin placement

Before this change, VolumeServer.ReceiveFile for EC shards always
selected the first HDD location (Locations[0]). The plugin-worker EC
task had no way to pass the admin planner's per-shard disk
assignment — ReceiveFileInfo carried no disk_id field — so every
received EC shard piled onto a single disk per destination server.
On multi-disk servers this caused uneven load (one disk absorbing all
EC shard I/O), frequent ENOSPC retries, and a growing EC backlog
under sustained ingest (see issue #9184).

Changes:
- proto: add disk_id to ReceiveFileInfo, mirroring
  VolumeEcShardsCopyRequest.disk_id.
- worker: DistributeEcShards tracks the planner-assigned disk per
  shard; sendShardFileToDestination forwards that disk id. Metadata
  files (ecx/ecj/vif) inherit the disk of the first data shard
  targeting the same node so they land next to the shards.
- server: ReceiveFile honors disk_id when > 0 with bounds
  validation; disk_id=0 (unset) falls back to the same
  auto-selection pattern as VolumeEcShardsCopy (prefer disk that
  already has shards for this volume, then any HDD with free space,
  then any location with free space).
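The server-side selection order described in the last bullet can be sketched as follows. This is a minimal illustration, not the actual SeaweedFS code: `diskLocation`, its fields, and `pickLocation` are hypothetical names introduced here, and free space is modeled as a simple slot counter.

```go
package main

import (
	"errors"
	"fmt"
)

// diskLocation is a hypothetical stand-in for one data directory
// on a volume server.
type diskLocation struct {
	Dir         string
	IsHDD       bool
	FreeSlots   int  // simplified notion of "has free space"
	HasEcShards bool // shards of the target volume already live here
}

// pickLocation returns the index of the location that should receive
// an EC shard. disk_id > 0 is honored after a bounds check; disk_id == 0
// means "unset" and triggers auto-selection.
func pickLocation(locs []diskLocation, diskID uint32) (int, error) {
	if diskID > 0 {
		// Compare in the uint32 domain so the check cannot wrap.
		if diskID >= uint32(len(locs)) {
			return -1, fmt.Errorf("disk_id %d out of range (%d locations)", diskID, len(locs))
		}
		return int(diskID), nil
	}
	// 1. Prefer a disk that already holds shards for this volume.
	for i, l := range locs {
		if l.HasEcShards {
			return i, nil
		}
	}
	// 2. Then any HDD with free space.
	for i, l := range locs {
		if l.IsHDD && l.FreeSlots > 0 {
			return i, nil
		}
	}
	// 3. Then any location with free space.
	for i, l := range locs {
		if l.FreeSlots > 0 {
			return i, nil
		}
	}
	return -1, errors.New("no location with free space")
}

func main() {
	locs := []diskLocation{
		{Dir: "/data0", IsHDD: true, FreeSlots: 5},
		{Dir: "/data1", IsHDD: true, FreeSlots: 5},
		{Dir: "/data2", IsHDD: false, FreeSlots: 5, HasEcShards: true},
	}
	for _, id := range []uint32{1, 0, 7} {
		idx, err := pickLocation(locs, id)
		fmt.Println("disk_id", id, "->", idx, err)
	}
}
```

Because 0 doubles as "unset", a caller cannot pin a shard to disk 0 explicitly in this sketch; it can only reach disk 0 through auto-selection.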

Tests updated:
- TestReceiveFileEcShardHonorsDiskID asserts three shards sent with
  disk_id={1,2,0} land on data dirs 1, 2, and 0 respectively.
- TestReceiveFileEcShardRejectsInvalidDiskID pins the out-of-range
  disk_id rejection path.

* fix(volume-rust): honor disk_id in ReceiveFile for EC shards

Mirror the Go-side change: when disk_id > 0 place the EC shard on the
requested disk; when unset, auto-select with the same preference order
as volume_ec_shards_copy (disk already holding shards, then any HDD,
then any disk).

* fix(volume): compare disk_id as uint32 to avoid 32-bit overflow

On 32-bit Go builds `int(fileInfo.DiskId) >= len(Locations)` can wrap a
high-bit uint32 to a negative int, bypassing the bounds check before the
index operation. Compare in the uint32 domain instead.
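The wrap can be reproduced on any platform by converting through `int32`, which is what `int` amounts to on a 32-bit build. The helper names below are hypothetical and exist only to contrast the two comparisons.

```go
package main

import "fmt"

// inRangeSigned32 mimics the old check as it would behave where int is
// 32 bits wide. A high-bit uint32 becomes negative, so ">= n" is false
// and the value is wrongly treated as in range.
func inRangeSigned32(diskID uint32, n int) bool {
	return !(int32(diskID) >= int32(n))
}

// inRangeUnsigned compares in the uint32 domain, so no wrap is possible.
func inRangeUnsigned(diskID uint32, n int) bool {
	return diskID < uint32(n)
}

func main() {
	var diskID uint32 = 0x80000000 // high bit set
	const locations = 3

	fmt.Println("signed 32-bit check accepts:", inRangeSigned32(diskID, locations))
	fmt.Println("uint32 check accepts:       ", inRangeUnsigned(diskID, locations))
}
```

The signed variant accepts the value and would go on to index with a negative number; the unsigned variant rejects it.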

* test(ec): fail invalid-disk_id test on transport error

Previously a transport-level error from CloseAndRecv silently passed the
test by returning early, masking any real gRPC failure. Fail loudly so
only the structured ReceiveFileResponse rejection path counts as a pass.
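The distinction the test now draws can be sketched as a small helper. The `receiveFileResponse` type and its `Error` field are simplified assumptions standing in for the generated gRPC message, and `checkRejected` is a hypothetical name.

```go
package main

import (
	"errors"
	"fmt"
)

// receiveFileResponse is a simplified, assumed shape of the response
// returned by CloseAndRecv.
type receiveFileResponse struct {
	Error string
}

// checkRejected returns nil only when the server answered with a
// structured rejection. A transport error or a successful response
// both count as test failures.
func checkRejected(resp *receiveFileResponse, err error) error {
	if err != nil {
		return fmt.Errorf("transport error instead of structured rejection: %w", err)
	}
	if resp == nil || resp.Error == "" {
		return errors.New("expected rejection, got success")
	}
	return nil
}

func main() {
	fmt.Println(checkRejected(&receiveFileResponse{Error: "invalid disk_id"}, nil))
	fmt.Println(checkRejected(nil, errors.New("connection reset")))
	fmt.Println(checkRejected(&receiveFileResponse{}, nil))
}
```

In a real test the non-nil result would be passed to `t.Fatalf` rather than printed.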

* docs(test): explain why DiskId=0 auto-selects dir 0 in EC placement test

Documents the load-bearing assumption that shards are never mounted in
this test, so loc.FindEcVolume always returns false and auto-select
falls through to the first HDD. Saves future readers from re-deriving
the expected directory for the DiskId=0 case.

* fix(test): preserve baseDir/volume path for single-dir clusters

StartSingleVolumeClusterWithDataDirs started naming the data directory
volume0 even in the dataDirCount=1 case, which broke Scrub tests that
reach into baseDir/volume via CorruptDatFile / CorruptEcShardFile /
CorruptEcxFile. Keep the legacy name for single-dir clusters; only use
the indexed "volumeN" layout when multiple disks are requested.
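The naming rule can be illustrated with a short helper. `dataDirs` is a hypothetical function written for this sketch; only the `volume` and `volumeN` directory names come from the commit message.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// dataDirs returns the data directory paths for a test cluster.
// A single-dir cluster keeps the legacy baseDir/volume path; multi-dir
// clusters use the indexed baseDir/volume0, baseDir/volume1, ... layout.
func dataDirs(baseDir string, count int) []string {
	if count <= 1 {
		return []string{filepath.Join(baseDir, "volume")}
	}
	dirs := make([]string, 0, count)
	for i := 0; i < count; i++ {
		dirs = append(dirs, filepath.Join(baseDir, fmt.Sprintf("volume%d", i)))
	}
	return dirs
}

func main() {
	fmt.Println(dataDirs("base", 1))
	fmt.Println(dataDirs("base", 3))
}
```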

2026-04-22 10:30:13 -07:00

framework

fix(ec): honor disk_id in ReceiveFile so EC shards respect admin placement (#9184) (#9185)

2026-04-22 10:30:13 -07:00

grpc

fix(ec): honor disk_id in ReceiveFile so EC shards respect admin placement (#9184) (#9185)

2026-04-22 10:30:13 -07:00

http

Rust volume server implementation with CI (#8539)

2026-03-26 17:24:35 -07:00

loadtest

go fmt

2026-04-10 17:31:14 -07:00

matrix

Rust volume server implementation with CI (#8539)

2026-03-26 17:24:35 -07:00

merge

Adds volume.merge command with deduplication and disk-based backend (#8441)

2026-02-25 10:12:09 -08:00

rust

Rust volume server implementation with CI (#8539)

2026-03-26 17:24:35 -07:00

DEV_PLAN.md

Add volume server integration test suite and CI workflow (#8322)

2026-02-13 00:40:56 -08:00

Makefile

Add volume server integration test suite and CI workflow (#8322)

2026-02-13 00:40:56 -08:00

README.md

Add volume server integration test suite and CI workflow (#8322)

2026-02-13 00:40:56 -08:00

README.md

Volume Server Integration Tests

This package contains integration tests for SeaweedFS volume server HTTP and gRPC APIs.

Run Tests

Run tests from repo root:

go test ./test/volume_server/... -v

If a weed binary is not found, the harness will build one automatically.

Optional environment variables

WEED_BINARY: explicit path to the weed executable (disables auto-build).
VOLUME_SERVER_IT_KEEP_LOGS=1: keep temporary test directories and process logs.
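A harness could read these two variables roughly as follows. This is a sketch under stated assumptions, not the framework's actual code: `resolveWeedBinary` and `keepLogs` are hypothetical names, and the step where the harness looks for an existing binary before building is omitted.

```go
package main

import (
	"fmt"
	"os"
)

// resolveWeedBinary returns the explicit binary path from WEED_BINARY.
// When the variable is unset, autoBuild is true, meaning the harness is
// free to locate or build a binary itself.
func resolveWeedBinary(getenv func(string) string) (path string, autoBuild bool) {
	if p := getenv("WEED_BINARY"); p != "" {
		return p, false
	}
	return "", true
}

// keepLogs reports whether temporary directories and process logs
// should be preserved after the run.
func keepLogs(getenv func(string) string) bool {
	return getenv("VOLUME_SERVER_IT_KEEP_LOGS") == "1"
}

func main() {
	path, autoBuild := resolveWeedBinary(os.Getenv)
	fmt.Println("binary:", path, "auto-build:", autoBuild)
	fmt.Println("keep logs:", keepLogs(os.Getenv))
}
```

Passing the lookup function as a parameter keeps the helpers testable without mutating the process environment.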

Current scope (Phase 0)

Shared cluster/framework utilities
Matrix profile definitions
Initial HTTP admin endpoint checks
Initial gRPC state/status checks

More API coverage is tracked in test/volume_server/DEV_PLAN.md.