mirror of
https://github.com/seaweedfs/seaweedfs.git
synced 2026-05-29 13:10:21 +00:00
* admin: report file and delete counts for EC volumes

The admin bucket size fix (#9058) left object counts at zero for EC-encoded data because VolumeEcShardInformationMessage carried no file count. Billing/monitoring dashboards therefore still under-report objects once a bucket is EC-encoded.

Thread file_count and delete_count end-to-end:

- Add file_count/delete_count to VolumeEcShardInformationMessage (proto fields 8 and 9) and regenerate master_pb.
- Compute them lazily on volume servers by walking the .ecx index once per EcVolume, cache on the struct, and keep the cache in sync inside DeleteNeedleFromEcx (distinguishing live vs already-tombstoned entries so idempotent deletes do not drift the counts).
- Populate the new proto fields from EcVolume.ToVolumeEcShardInformationMessage and carry them through the master-side EcVolumeInfo / topology sync.
- Aggregate in admin collectCollectionStats, deduping per volume id: every node holding shards of an EC volume reports the same counts, so summing across nodes would otherwise multiply the object count by the number of shard holders.

Regression tests cover the initial .ecx walk, live/tombstoned delete bookkeeping (including idempotent and missing-key cases), and the admin dedup path for an EC volume reported by multiple nodes.

* ec: include .ecj journal in EcVolume delete count

The initial delete count only reflected .ecx tombstones. It missed any needle that was journaled in .ecj but not yet folded into .ecx (e.g. on partial recovery). Expand initCountsLocked to take the union of .ecx tombstones and .ecj journal entries, deduped by needle id, so:

- an id that is both tombstoned in .ecx and listed in .ecj counts once
- a duplicate .ecj entry counts once
- an .ecj id with a live .ecx entry is counted as deleted (not live)
- an .ecj id with no matching .ecx entry is still counted

Covered by TestEcVolumeFileAndDeleteCountEcjUnion.
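The union rule above fits in a few lines of Go. This is a minimal sketch under stated assumptions, not the actual initCountsLocked code. `ecxEntry` and `deleteCountFromUnion` are hypothetical names, and the inputs are plain slices rather than ids decoded from the .ecx and .ecj files.

```go
package main

import "fmt"

// ecxEntry is a hypothetical in-memory view of one .ecx record: the needle
// id plus whether its size field already holds a tombstone.
type ecxEntry struct {
	id         uint64
	tombstoned bool
}

// deleteCountFromUnion returns the number of deleted needles as the union
// of .ecx tombstones and .ecj journal ids, deduplicated by needle id.
func deleteCountFromUnion(ecx []ecxEntry, ecj []uint64) int {
	deleted := make(map[uint64]struct{})
	for _, e := range ecx {
		if e.tombstoned {
			deleted[e.id] = struct{}{}
		}
	}
	// Every journaled id counts as deleted, whether its .ecx entry is live,
	// already tombstoned (the map dedupes it), or missing entirely.
	// Duplicate .ecj entries also collapse into a single map key.
	for _, id := range ecj {
		deleted[id] = struct{}{}
	}
	return len(deleted)
}

func main() {
	ecx := []ecxEntry{{id: 1}, {id: 2, tombstoned: true}, {id: 3}}
	// 2: tombstoned and journaled; 3: live but journaled twice; 9: no .ecx entry.
	ecj := []uint64{2, 3, 3, 9}
	fmt.Println(deleteCountFromUnion(ecx, ecj)) // 3 (ids 2, 3 and 9)
}
```

Keying a map by needle id provides the dedup directly, so all four listed cases reduce to one set union.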
* ec: report delete count authoritatively and tombstone once per delete

Address two issues with the previous EcVolume file/delete count work:

1. The delete count was computed lazily on first heartbeat and mixed in a .ecj-union fallback to "recover" partial state. That diverged from how regular volumes report counts (always live from the needle map) and had drift cases when .ecj got reconciled. Replace it with an eager walk of .ecx at NewEcVolume time, maintained incrementally on every DeleteNeedleFromEcx call. Semantics now match needle_map_metric: FileCount is the total number of needles ever recorded in .ecx (live + tombstoned) and DeleteCount is the tombstones, so live = FileCount - DeleteCount. Drop the .ecj-union logic entirely.

2. A single EC needle delete fanned out to every node holding a replica of the primary data shard and called DeleteNeedleFromEcx on each, which inflated the per-volume delete total by the replica factor. Rewrite doDeleteNeedleFromRemoteEcShardServers to try replicas in order and stop at the first success (one tombstone per delete). It falls back to other shards only when the primary shard has no home (ErrEcShardMissing sentinel), not on transient RPC errors.

Admin aggregation now folds EC counts correctly: FileCount is deduped per volume id (every shard holder has an identical .ecx) and DeleteCount is summed across nodes (each delete tombstones exactly one node). Live object count = deduped FileCount - summed DeleteCount.

Tests updated to match the new semantics:

- EC volume counts seed FileCount as total .ecx entries (live + tombstoned), DeleteCount as tombstones.
- DeleteNeedleFromEcx keeps FileCount constant and increments DeleteCount only on live->tombstone transitions.
- Admin dedup test uses distinct per-node delete counts (5 + 3 + 2) to prove they're summed, while FileCount=100 is applied once.
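Here is a minimal Go sketch of the admin-side fold described above. It assumes each shard holder's heartbeat has already been reduced to a (volume id, file count, delete count) triple. `ecShardReport` and `liveObjects` are hypothetical names, not the actual collectCollectionStats code. Treating a volume whose summed deletes exceed its file count as contributing zero, and returning it so the caller can log it, is an assumption of this sketch.

```go
package main

import "fmt"

// ecShardReport is a hypothetical stand-in for the per-node EC counts the
// admin reads from VolumeEcShardInformationMessage fields 8 and 9.
type ecShardReport struct {
	volumeID    uint32
	fileCount   uint64 // identical on every shard holder (same .ecx)
	deleteCount uint64 // node-local: each delete tombstones exactly one node
}

// liveObjects dedupes FileCount per volume id, sums DeleteCount across
// nodes, and returns live = deduped FileCount - summed DeleteCount.
// Volumes whose summed deletes exceed their file count are returned as
// skewed and add nothing to the live total.
func liveObjects(reports []ecShardReport) (live uint64, skewed []uint32) {
	files := make(map[uint32]uint64)
	deletes := make(map[uint32]uint64)
	var order []uint32 // first-seen order keeps the output deterministic
	for _, r := range reports {
		if _, seen := files[r.volumeID]; !seen {
			order = append(order, r.volumeID)
			files[r.volumeID] = r.fileCount
		}
		deletes[r.volumeID] += r.deleteCount
	}
	for _, vid := range order {
		if deletes[vid] > files[vid] {
			skewed = append(skewed, vid)
			continue
		}
		live += files[vid] - deletes[vid]
	}
	return live, skewed
}

func main() {
	// One EC volume reported by three shard holders, using the test's numbers.
	reports := []ecShardReport{{7, 100, 5}, {7, 100, 3}, {7, 100, 2}}
	live, skewed := liveObjects(reports)
	fmt.Println(live, len(skewed)) // 90 0
}
```

With FileCount=100 reported by three nodes holding 5, 3 and 2 deletes, the fold yields 100 - 10 = 90 live objects rather than 300 - 10.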
* ec: test fixture uses real vid; admin warns on skewed ec counts

- writeFixture now builds the .ecx/.ecj/.ec00/.vif filenames from the actual vid passed in, instead of hardcoding "_1". The existing tests all use vid=1, so behaviour is unchanged, but the helper no longer silently diverges from its documented parameter.
- collectCollectionStats logs a glog warning when an EC volume's summed delete count exceeds its deduped file count, surfacing the anomaly (stale heartbeat, counter drift, etc.) instead of silently dropping the volume from the object count.

* ec: derive file/delete counts from .ecx/.ecj file sizes

seedCountsFromEcx walked the full .ecx index at volume load, which is wasted work: .ecx has fixed-size entries (NeedleMapEntrySize) and .ecj has fixed-size deletion records (NeedleIdSize), so both counts are pure file-size arithmetic:

    fileCount   = ecxFileSize / NeedleMapEntrySize
    deleteCount = ecjFileSize / NeedleIdSize

Rip out the cached counters, countsLock, seedCountsFromEcx, and the recordDelete helper. Track ecjFileSize directly on the EcVolume struct, seed it from Stat() at load, and bump it on every successful .ecj append inside DeleteNeedleFromEcx under ecjFileAccessLock. Skip the .ecj write entirely when the needle is already tombstoned so the derived delete count stays idempotent on repeat deletes. Heartbeats now compute counts in O(1).

Tests updated: the initial fixture pre-populates .ecj with two ids to verify the file-size derivation end-to-end. The delete test keeps its idempotent-re-delete / missing-needle invariants (unchanged externally, now enforced by the early return rather than a cache guard).

* ec: sync Rust volume server with Go file/delete count semantics

Mirror the Go-side EC file/delete count work in the Rust volume server so mixed Go/Rust clusters report consistent bucket object counts in the admin dashboard.

- Add file_count (8) and delete_count (9) to the Rust copy of VolumeEcShardInformationMessage (seaweed-volume/proto/master.proto).
- EcVolume gains ecj_file_size, seeded from the journal's metadata on open and bumped inside journal_delete on every successful append.
- file_and_delete_count() returns counts derived in O(1) from ecx_file_size / NEEDLE_MAP_ENTRY_SIZE and ecj_file_size / NEEDLE_ID_SIZE, matching Go's FileAndDeleteCount.
- to_volume_ec_shard_information_messages populates the new proto fields instead of defaulting them to zero.
- mark_needle_deleted_in_ecx now returns a DeleteOutcome enum (NotFound / AlreadyDeleted / Tombstoned). This lets journal_delete skip both the .ecj append and the size bump when the needle is missing or already tombstoned, keeping the derived delete_count idempotent on repeat or no-op deletes.
- Rust's EcVolume::new no longer replays .ecj into .ecx on load. Go's RebuildEcxFile is only called from specific decode/rebuild gRPC handlers, not on volume open, and replaying on load was hiding the deletion journal from the new file-size-derived delete counter. rebuild_ecx_from_journal is kept as dead_code for future decode paths that may want the same replay semantics.

Also clean up the Go FileAndDeleteCount to drop unnecessary runtime guards against zero constants: NeedleMapEntrySize and NeedleIdSize are compile-time non-zero.

test_ec_volume_journal is updated to pre-populate the .ecx with the needles it deletes, and extended to verify that repeat and missing-id deletes do not drift the derived counts.

* ec: document enterprise-reserved proto field range on ec shard info

Both OSS master.proto copies now note that fields 10-19 are reserved for future upstream additions while 20+ are owned by the enterprise fork. Enterprise already pins data_shards/parity_shards at 20/21, so keeping OSS additions inside 8-19 avoids wire-level collisions for mixed deployments.
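The O(1) count derivation used by both volume servers can be sketched in Go: fileCount = ecxFileSize / NeedleMapEntrySize and deleteCount = ecjFileSize / NeedleIdSize. The constant values assume the default layout of an 8-byte needle id, a 4-byte offset and a 4-byte size. `ecCounts`, `applyDelete` and the lowercase outcome names are hypothetical stand-ins for the EcVolume fields and the Rust DeleteOutcome enum.

```go
package main

import "fmt"

// Assumed entry sizes for the default build: 8-byte needle id, 4-byte
// offset and 4-byte size per .ecx entry (a 5-byte-offset build would use
// 17). These are illustrative local copies, not the real constants.
const (
	needleIdSize       = 8
	needleMapEntrySize = needleIdSize + 4 + 4
)

// deleteOutcome mirrors the NotFound / AlreadyDeleted / Tombstoned cases.
type deleteOutcome int

const (
	notFound deleteOutcome = iota
	alreadyDeleted
	tombstoned
)

// ecCounts is a hypothetical holder for the two sizes an EcVolume tracks.
type ecCounts struct {
	ecxFileSize int64
	ecjFileSize int64
}

// fileAndDeleteCount derives both counts in O(1) from the file sizes.
func (c *ecCounts) fileAndDeleteCount() (fileCount, deleteCount uint64) {
	return uint64(c.ecxFileSize / needleMapEntrySize), uint64(c.ecjFileSize / needleIdSize)
}

// applyDelete grows the tracked .ecj size by one record only when a live
// needle was tombstoned. Missing or already-deleted needles skip the
// append, so repeated deletes cannot drift the derived delete count.
func (c *ecCounts) applyDelete(outcome deleteOutcome) {
	if outcome == tombstoned {
		c.ecjFileSize += needleIdSize
	}
}

func main() {
	c := ecCounts{ecxFileSize: 100 * needleMapEntrySize, ecjFileSize: 2 * needleIdSize}
	c.applyDelete(tombstoned)
	c.applyDelete(alreadyDeleted) // repeat delete: no change
	c.applyDelete(notFound)       // unknown needle: no change
	fmt.Println(c.fileAndDeleteCount()) // 100 3
}
```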
* ec(rust): resolve .ecx/.ecj helpers from ecx_actual_dir

ecx_file_name() and ecj_file_name() resolved from self.dir_idx, but new() opens the actual files from ecx_actual_dir, which may fall back to the data dir when the idx dir does not contain the index. After a fallback, read_deleted_needles() and rebuild_ecx_from_journal() would read or rebuild the wrong (nonexistent) path while heartbeats reported counts from the file actually in use, silently dropping deletes.

Point idx_base_name() at ecx_actual_dir, which is initialized to dir_idx and only diverges after a successful fallback, so every call site agrees with the file new() has open. The pre-fallback call in new() (line 142) still returns the dir_idx path because ecx_actual_dir == dir_idx at that point.

Update the destroy() sweep to build the dir_idx cleanup paths explicitly instead of leaning on the helpers, so post-fallback stale files in the idx dir are still removed.

* ec: reset ecj size after rebuild; rollback ecx tombstone on ecj failure

Two EC delete-count correctness fixes applied symmetrically to Go and Rust volume servers.

1. rebuild_ecx_from_journal (Rust) now sets ecj_file_size = 0 after recreating the empty journal, matching the on-disk truth. Previously the cached size still reflected the pre-rebuild journal and file_and_delete_count() would keep reporting stale delete counts. The Go side has no equivalent bug because RebuildEcxFile runs in an offline helper that does not touch an EcVolume struct.

2. DeleteNeedleFromEcx / journal_delete used to tombstone the .ecx entry before writing the .ecj record. If the .ecj append then failed, the needle was permanently marked deleted but the heartbeat-reported delete_count never advanced (it is derived from .ecj file size). A retry would see AlreadyDeleted and early-return, leaving the drift permanent.

Both languages now capture the entry's file offset and original size bytes during the mark step, attempt the .ecj append, and on failure roll the .ecx tombstone back by writing the original size bytes at the known offset. A rollback that itself errors is logged (glog / tracing) but cannot re-sync the files. This is the same failure mode a double disk error would produce, and it is unavoidable without a full on-disk transaction log.

Go: wrap MarkNeedleDeleted in a closure that captures the file offset into an outer variable, then pass the offset + oldSize to the new rollbackEcxTombstone helper on .ecj seek/write errors.

Rust: DeleteOutcome::Tombstoned now carries the size_offset and a [u8; SIZE_SIZE] copy of the pre-tombstone size field. journal_delete destructures on Tombstoned and calls restore_ecx_size on .ecj append failure.

* test(ec): widen admin /health wait to 180s for cold CI

TestEcEndToEnd starts master, 14 volume servers, filer, 2 workers and admin in sequence, and previously waited only 60s for admin's HTTP server to come up. On cold GitHub runners the tail of the earlier subprocess startups eats most of that budget, and the wait occasionally times out (last hit on run 24374773031). The local fast path is still ~20s total, so the bump only raises the timeout ceiling and does not slow the happy path.

* test(ec): fork volume servers in parallel in TestEcEndToEnd

startWeed is non-blocking (just cmd.Start()), so the per-process fork + mkdir + log-file-open overhead for 14 volume servers was serialized for no reason. On cold CI disks that overhead stacks up and eats into the subsequent admin /health wait, which is how run 24374773031 flaked.

Wrap the volume-server loop in a sync.WaitGroup and guard runningCmds with a mutex so concurrent appends are safe. startWeed still calls t.Fatalf on failure. From a spawned goroutine this marks the test as failed but exits only that goroutine, not the test (the testing package documents FailNow as belonging on the test goroutine). That is acceptable here because the fail-fast isn't something we rely on for precise ordering.
* ec: fsync ecx before ecj, truncate on failure, harden rebuild

Four correctness fixes covering both volume servers.

1. Durability ordering (Go + Rust). After marking the .ecx tombstone we now fsync .ecx before touching .ecj, so a crash between the two files cannot leave the journal with an entry for a needle whose tombstone is still sitting in page cache. Once the fsync returns, the tombstone is the source of truth: reads see "deleted", and delete_count may under-count by one (benign, idempotent retries) but never over-reports. If the fsync itself fails we restore the original size bytes and surface the error. The .ecj append is then followed by its own Sync so the reported delete_count matches the on-disk journal once the write returns.

2. .ecj truncation on append failure. write_all may have extended the journal on disk before sync_all / Sync errors out, leaving the cached ecj_file_size out of sync with the physical length and drifting delete_count permanently after restart. Both languages now capture the pre-append size, truncate the file back via set_len / Truncate on any write or sync failure, and only then restore the .ecx tombstone. Truncation errors are logged (same-fd length resets cannot realistically fail) but cannot themselves re-sync the files.

3. Atomic rebuild_ecx_from_journal (Rust, dead code today but wired up on any future decode path). Previously a failed mark_needle_deleted_in_ecx call was swallowed with `let _ = ...` and the journal was still removed, silently losing tombstones. We now bubble up any non-NotFound error, fsync .ecx after the whole replay succeeds, and only then drop and recreate .ecj. NotFound is still ignored (expected race between delete and encode).

4. Missing-.ecx hardening (Rust). mark_needle_deleted_in_ecx used to return Ok(NotFound) when self.ecx_file was None, hiding a closed or corrupt volume behind what looks like an idempotent no-op. It now returns an io::Error carrying the volume id so callers (e.g. journal_delete) fail loudly instead.

Existing Go and Rust EC test suites stay green.

* ec: make .ecx immutable at runtime; track deletes in memory + .ecj

Refactors both volume servers so the sealed sorted .ecx index is never mutated during normal operation. Runtime deletes are committed to the .ecj deletion journal and tracked in an in-memory deleted-needle set. Read-path lookups consult that set to mask out deleted ids on top of the immutable .ecx record. Mirrors the intended design on both Go and Rust sides.

EcVolume gains a `deletedNeedles` / `deleted_needles` set seeded from .ecj in NewEcVolume / EcVolume::new.

DeleteNeedleFromEcx / journal_delete:

1. Looks the needle up read-only in .ecx.
2. Missing needle -> no-op.
3. Pre-existing .ecx tombstone (from a prior decode/rebuild) -> mirror into the in-memory set, no .ecj append.
4. Otherwise append the id to .ecj, fsync, and only then publish the id into the set. A partial write is truncated back to the pre-append length so the on-disk journal and the in-memory set cannot drift.

FindNeedleFromEcx / find_needle_from_ecx now return TombstoneFileSize when the id is in the in-memory set, even though the bytes on disk still show the original size.

FileAndDeleteCount:

    fileCount   = .ecx size / NeedleMapEntrySize   (unchanged)
    deleteCount = len(deletedNeedles)              (was: .ecj size / NeedleIdSize)

The RebuildEcxFile / rebuild_ecx_from_journal decode-time helpers still fold .ecj into .ecx. That is the one place tombstones land in the physical index, and it runs offline on closed files. Rust's rebuild helper now also clears the in-memory set when it succeeds.

Dead code removed on the Rust side: `DeleteOutcome`, `mark_needle_deleted_in_ecx`, `restore_ecx_size`. Go drops the runtime `rollbackEcxTombstone` path. Neither helper was needed once .ecx stopped being a runtime mutation target.

TestEcVolumeSyncEnsuresDeletionsVisible (issue #7751) is rewritten as TestEcVolumeDeleteDurableToJournal, which exercises the full durability chain: delete -> .ecj fsync -> FindNeedleFromEcx masks via the in-memory set -> raw .ecx bytes are *unchanged* -> Close + RebuildEcxFile folds the journal into .ecx -> raw bytes now show the tombstone, as CopyFile in the decode path expects.
494 lines
12 KiB
Protocol Buffer
syntax = "proto3";

package master_pb;

option go_package = "github.com/seaweedfs/seaweedfs/weed/pb/master_pb";

import "volume_server.proto";

//////////////////////////////////////////////////

service Seaweed {
    rpc SendHeartbeat (stream Heartbeat) returns (stream HeartbeatResponse) {
    }
    rpc KeepConnected (stream KeepConnectedRequest) returns (stream KeepConnectedResponse) {
    }
    rpc LookupVolume (LookupVolumeRequest) returns (LookupVolumeResponse) {
    }
    rpc Assign (AssignRequest) returns (AssignResponse) {
    }
    rpc StreamAssign (stream AssignRequest) returns (stream AssignResponse) {
    }
    rpc Statistics (StatisticsRequest) returns (StatisticsResponse) {
    }
    rpc CollectionList (CollectionListRequest) returns (CollectionListResponse) {
    }
    rpc CollectionDelete (CollectionDeleteRequest) returns (CollectionDeleteResponse) {
    }
    rpc VolumeList (VolumeListRequest) returns (VolumeListResponse) {
    }
    rpc LookupEcVolume (LookupEcVolumeRequest) returns (LookupEcVolumeResponse) {
    }
    rpc VacuumVolume (VacuumVolumeRequest) returns (VacuumVolumeResponse) {
    }
    rpc DisableVacuum (DisableVacuumRequest) returns (DisableVacuumResponse) {
    }
    rpc EnableVacuum (EnableVacuumRequest) returns (EnableVacuumResponse) {
    }
    rpc VolumeMarkReadonly (VolumeMarkReadonlyRequest) returns (VolumeMarkReadonlyResponse) {
    }
    rpc GetMasterConfiguration (GetMasterConfigurationRequest) returns (GetMasterConfigurationResponse) {
    }
    rpc ListClusterNodes (ListClusterNodesRequest) returns (ListClusterNodesResponse) {
    }
    rpc LeaseAdminToken (LeaseAdminTokenRequest) returns (LeaseAdminTokenResponse) {
    }
    rpc ReleaseAdminToken (ReleaseAdminTokenRequest) returns (ReleaseAdminTokenResponse) {
    }
    rpc Ping (PingRequest) returns (PingResponse) {
    }
    rpc RaftListClusterServers (RaftListClusterServersRequest) returns (RaftListClusterServersResponse) {
    }
    rpc RaftAddServer (RaftAddServerRequest) returns (RaftAddServerResponse) {
    }
    rpc RaftRemoveServer (RaftRemoveServerRequest) returns (RaftRemoveServerResponse) {
    }
    rpc RaftLeadershipTransfer (RaftLeadershipTransferRequest) returns (RaftLeadershipTransferResponse) {
    }
    rpc VolumeGrow (VolumeGrowRequest) returns (VolumeGrowResponse) {
    }
}

//////////////////////////////////////////////////

message DiskTag {
    uint32 disk_id = 1;
    repeated string tags = 2;
}

message Heartbeat {
    string ip = 1;
    uint32 port = 2;
    string public_url = 3;
    uint64 max_file_key = 5;
    string data_center = 6;
    string rack = 7;
    uint32 admin_port = 8;
    repeated VolumeInformationMessage volumes = 9;
    // delta volumes
    repeated VolumeShortInformationMessage new_volumes = 10;
    repeated VolumeShortInformationMessage deleted_volumes = 11;
    bool has_no_volumes = 12;

    // erasure coding
    repeated VolumeEcShardInformationMessage ec_shards = 16;
    // delta erasure coding shards
    repeated VolumeEcShardInformationMessage new_ec_shards = 17;
    repeated VolumeEcShardInformationMessage deleted_ec_shards = 18;
    bool has_no_ec_shards = 19;

    map<string, uint32> max_volume_counts = 4;
    uint32 grpc_port = 20;
    repeated string location_uuids = 21;
    string id = 22; // volume server id, independent of ip:port for stable identification

    // state flags
    volume_server_pb.VolumeServerState state = 23;

    repeated DiskTag disk_tags = 24;
}

message HeartbeatResponse {
    uint64 volume_size_limit = 1;
    string leader = 2;
    string metrics_address = 3;
    uint32 metrics_interval_seconds = 4;
    repeated StorageBackend storage_backends = 5;
    repeated string duplicated_uuids = 6;
    bool preallocate = 7;
}

message VolumeInformationMessage {
    uint32 id = 1;
    uint64 size = 2;
    string collection = 3;
    uint64 file_count = 4;
    uint64 delete_count = 5;
    uint64 deleted_byte_count = 6;
    bool read_only = 7;
    uint32 replica_placement = 8;
    uint32 version = 9;
    uint32 ttl = 10;
    uint32 compact_revision = 11;
    int64 modified_at_second = 12;
    string remote_storage_name = 13;
    string remote_storage_key = 14;
    string disk_type = 15;
    uint32 disk_id = 16;
}

message VolumeShortInformationMessage {
    uint32 id = 1;
    string collection = 3;
    uint32 replica_placement = 8;
    uint32 version = 9;
    uint32 ttl = 10;
    string disk_type = 15;
    uint32 disk_id = 16;
}

message VolumeEcShardInformationMessage {
    uint32 id = 1;
    string collection = 2;
    uint32 ec_index_bits = 3;
    string disk_type = 4;
    uint64 expire_at_sec = 5; // used to record the destruction time of ec volume
    uint32 disk_id = 6;
    repeated int64 shard_sizes = 7; // optimized: sizes for shards in order of set bits in ec_index_bits
    uint64 file_count = 8; // total needles in the .ecx index (live + tombstoned)
    uint64 delete_count = 9; // node-local tombstones in the .ecj deletion journal
    // fields 10-19 reserved for future upstream open-source additions.
    // fields 20+ are owned by the enterprise fork (e.g. data_shards/parity_shards)
    // and must not be used here without coordination.
}

message StorageBackend {
    string type = 1;
    string id = 2;
    map<string, string> properties = 3;
}

message Empty {
}

message SuperBlockExtra {
    message ErasureCoding {
        uint32 data = 1;
        uint32 parity = 2;
        repeated uint32 volume_ids = 3;
    }
    ErasureCoding erasure_coding = 1;
}

message KeepConnectedRequest {
    string client_type = 1;
    string client_address = 3;
    string version = 4;
    string filer_group = 5;
    string data_center = 6;
    string rack = 7;
}

message VolumeLocation {
    string url = 1;
    string public_url = 2;
    repeated uint32 new_vids = 3;
    repeated uint32 deleted_vids = 4;
    string leader = 5; // optional when leader is not itself
    string data_center = 6; // optional when DataCenter is in use
    uint32 grpc_port = 7;
    repeated uint32 new_ec_vids = 8;
    repeated uint32 deleted_ec_vids = 9;
}

message ClusterNodeUpdate {
    string node_type = 1;
    string address = 2;
    bool is_add = 4;
    string filer_group = 5;
    int64 created_at_ns = 6;
}

message KeepConnectedResponse {
    VolumeLocation volume_location = 1;
    ClusterNodeUpdate cluster_node_update = 2;
    LockRingUpdate lock_ring_update = 3;
}

// LockRingUpdate is sent by the master to all filers when the lock ring
// membership changes. The master batches rapid changes (e.g., node drop + join)
// and sends the complete member list atomically, avoiding intermediate ring
// states that would cause unnecessary lock churn.
message LockRingUpdate {
    string filer_group = 1;
    repeated string servers = 2;
    int64 version = 3;
}

message LookupVolumeRequest {
    repeated string volume_or_file_ids = 1;
    string collection = 2; // optional, a bit faster if provided.
}
message LookupVolumeResponse {
    message VolumeIdLocation {
        string volume_or_file_id = 1;
        repeated Location locations = 2;
        string error = 3;
        string auth = 4;
    }
    repeated VolumeIdLocation volume_id_locations = 1;
}

message Location {
    string url = 1;
    string public_url = 2;
    uint32 grpc_port = 3;
    string data_center = 4;
}

message AssignRequest {
    uint64 count = 1;
    string replication = 2;
    string collection = 3;
    string ttl = 4;
    string data_center = 5;
    string rack = 6;
    string data_node = 7;
    uint32 memory_map_max_size_mb = 8;
    uint32 writable_volume_count = 9;
    string disk_type = 10;
    uint64 expected_data_size = 11; // hint for size-aware volume selection
}

message VolumeGrowRequest {
    uint32 writable_volume_count = 1;
    string replication = 2;
    string collection = 3;
    string ttl = 4;
    string data_center = 5;
    string rack = 6;
    string data_node = 7;
    uint32 memory_map_max_size_mb = 8;
    string disk_type = 9;
}

message AssignResponse {
    string fid = 1;
    uint64 count = 4;
    string error = 5;
    string auth = 6;
    repeated Location replicas = 7;
    Location location = 8;
}

message StatisticsRequest {
    string replication = 1;
    string collection = 2;
    string ttl = 3;
    string disk_type = 4;
}
message StatisticsResponse {
    uint64 total_size = 4;
    uint64 used_size = 5;
    uint64 file_count = 6;
}

//
// collection related
//
message Collection {
    string name = 1;
}
message CollectionListRequest {
    bool include_normal_volumes = 1;
    bool include_ec_volumes = 2;
}
message CollectionListResponse {
    repeated Collection collections = 1;
}

message CollectionDeleteRequest {
    string name = 1;
}
message CollectionDeleteResponse {
}

//
// volume related
//
message DiskInfo {
    string type = 1;
    int64 volume_count = 2;
    int64 max_volume_count = 3;
    int64 free_volume_count = 4;
    int64 active_volume_count = 5;
    repeated VolumeInformationMessage volume_infos = 6;
    repeated VolumeEcShardInformationMessage ec_shard_infos = 7;
    int64 remote_volume_count = 8;
    uint32 disk_id = 9;
    repeated string tags = 10;
}
message DataNodeInfo {
    string id = 1;
    map<string, DiskInfo> diskInfos = 2;
    uint32 grpc_port = 3;
    string address = 4; // ip:port for connecting to the volume server
}
message RackInfo {
    string id = 1;
    repeated DataNodeInfo data_node_infos = 2;
    map<string, DiskInfo> diskInfos = 3;
}
message DataCenterInfo {
    string id = 1;
    repeated RackInfo rack_infos = 2;
    map<string, DiskInfo> diskInfos = 3;
}
message TopologyInfo {
    string id = 1;
    repeated DataCenterInfo data_center_infos = 2;
    map<string, DiskInfo> diskInfos = 3;
}
message VolumeListRequest {
}
message VolumeListResponse {
    TopologyInfo topology_info = 1;
    uint64 volume_size_limit_mb = 2;
}

message LookupEcVolumeRequest {
    uint32 volume_id = 1;
}
message LookupEcVolumeResponse {
    uint32 volume_id = 1;
    message EcShardIdLocation {
        uint32 shard_id = 1;
        repeated Location locations = 2;
    }
    repeated EcShardIdLocation shard_id_locations = 2;
}

message VacuumVolumeRequest {
    float garbage_threshold = 1;
    uint32 volume_id = 2;
    string collection = 3;
}
message VacuumVolumeResponse {
}

message DisableVacuumRequest {
    bool by_plugin = 1;
}
message DisableVacuumResponse {
}

message EnableVacuumRequest {
    bool by_plugin = 1;
}
message EnableVacuumResponse {
}

message VolumeMarkReadonlyRequest {
    string ip = 1;
    uint32 port = 2;
    uint32 volume_id = 4;
    string collection = 5;
    uint32 replica_placement = 6;
    uint32 version = 7;
    uint32 ttl = 8;
    string disk_type = 9;
    bool is_readonly = 10;
}
message VolumeMarkReadonlyResponse {
}

message GetMasterConfigurationRequest {
}
message GetMasterConfigurationResponse {
    string metrics_address = 1;
    uint32 metrics_interval_seconds = 2;
    repeated StorageBackend storage_backends = 3;
    string default_replication = 4;
    string leader = 5;
    uint32 volume_size_limit_m_b = 6;
    bool volume_preallocate = 7;
    // MIGRATION: fields 8-9 help migrate master.toml [master.maintenance] to admin script plugin. Remove after March 2027.
    string maintenance_scripts = 8;
    uint32 maintenance_sleep_minutes = 9;
}

message ListClusterNodesRequest {
    string client_type = 1;
    string filer_group = 2;
    int32 limit = 4;
}
message ListClusterNodesResponse {
    message ClusterNode {
        string address = 1;
        string version = 2;
        int64 created_at_ns = 4;
        string data_center = 5;
        string rack = 6;
    }
    repeated ClusterNode cluster_nodes = 1;
}

message LeaseAdminTokenRequest {
    int64 previous_token = 1;
    int64 previous_lock_time = 2;
    string lock_name = 3;
    string client_name = 4;
    string message = 5;
}
message LeaseAdminTokenResponse {
    int64 token = 1;
    int64 lock_ts_ns = 2;
}

message ReleaseAdminTokenRequest {
    int64 previous_token = 1;
    int64 previous_lock_time = 2;
    string lock_name = 3;
}
message ReleaseAdminTokenResponse {
}

message PingRequest {
    string target = 1; // default to ping itself
    string target_type = 2;
}
message PingResponse {
    int64 start_time_ns = 1;
    int64 remote_time_ns = 2;
    int64 stop_time_ns = 3;
}

message RaftAddServerRequest {
    string id = 1;
    string address = 2;
    bool voter = 3;
}
message RaftAddServerResponse {
}

message RaftRemoveServerRequest {
    string id = 1;
    bool force = 2;
}
message RaftRemoveServerResponse {
}

message RaftListClusterServersRequest {
}
message RaftListClusterServersResponse {
    message ClusterServers {
        string id = 1;
        string address = 2;
        string suffrage = 3;
        bool isLeader = 4;
    }
    repeated ClusterServers cluster_servers = 1;
}

message RaftLeadershipTransferRequest {
    string target_id = 1; // Optional: target server ID. If empty, transfers to any eligible follower
    string target_address = 2; // Optional: target server address. Required if target_id is specified
}
message RaftLeadershipTransferResponse {
    string previous_leader = 1;
    string new_leader = 2;
}

message VolumeGrowResponse {
}