18 Commits

Author SHA1 Message Date
Chris Lu
10cc06333b cluster: restrict Ping RPC to known peers of the requested type (#9445)
Ping previously dialled whatever host:port the caller asked for. Gate
each server's Ping handler on cluster membership: masters check the
topology, registered cluster nodes, and configured master peers; volume
servers only accept their seed/current masters; filers accept tracked
peer filers, the master-learned volume server set, and configured
masters.

Use address-indexed peer lookups to keep Ping target validation O(1):
- topology maintains a pb.ServerAddress -> *DataNode index alongside
  the dc/rack/node tree, kept in sync from doLinkChildNode and
  UnlinkChildNode plus the ip/port-rewrite branch in
  GetOrCreateDataNode. GetTopology now returns nil on a detached
  subtree instead of panicking, so the linkage hooks can no-op safely.
- vid_map tracks a refcount per volume-server address so
  hasVolumeServer answers without scanning every vid location. The
  add path skips empty-address entries the same way the delete path
  already does, so a zero-value Location cannot leak a permanent
  serverRefCount[""] bucket.
- masters reuse a cached master-address set from MasterClient instead
  of walking the configured peer slice on every request.
- volume servers compare against a pre-built seed-master set and
  protect currentMaster reads/writes with an RWMutex, fixing the
  data race with the heartbeat goroutine. The seed slice is copied
  on construction so external mutation cannot desync it from the
  frozen lookup set.
- cluster.check drops the direct volume-to-volume sweep; volume
  servers no longer carry a peer-volume list, and the note next to
  the dropped probe is reworded to make clear that direct
  volume-to-volume reachability is intentionally not validated by
  this command.
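
The per-address refcount described in the vid_map bullet above can be
sketched roughly as follows. This is a minimal illustration, not the
actual vid_map code: the serverRefs type and its add/remove/has methods
are hypothetical names, and only the behaviour stated in the bullet
(O(1) membership, empty addresses skipped on both add and delete) is
modelled.

```go
package main

import "fmt"

// serverRefs is a hypothetical stand-in for the per-address refcount the
// commit message describes; the real vid_map fields are not shown here.
type serverRefs struct {
	count map[string]int
}

func newServerRefs() *serverRefs {
	return &serverRefs{count: make(map[string]int)}
}

// add records one more volume location served by addr. Empty addresses
// are skipped so a zero-value location never creates a "" bucket.
func (s *serverRefs) add(addr string) {
	if addr == "" {
		return
	}
	s.count[addr]++
}

// remove drops one reference and deletes the bucket once it reaches zero.
func (s *serverRefs) remove(addr string) {
	if addr == "" {
		return
	}
	n, ok := s.count[addr]
	if !ok {
		return
	}
	if n <= 1 {
		delete(s.count, addr)
		return
	}
	s.count[addr] = n - 1
}

// has answers membership with a single map lookup instead of a scan.
func (s *serverRefs) has(addr string) bool {
	return s.count[addr] > 0
}

func main() {
	refs := newServerRefs()
	refs.add("10.0.0.1:8080")
	refs.add("10.0.0.1:8080")
	refs.add("") // ignored
	refs.remove("10.0.0.1:8080")
	fmt.Println(refs.has("10.0.0.1:8080"), refs.has(""), len(refs.count))
}
```

The add and delete paths share the same empty-address guard, so the two
cannot drift apart and leave a bucket that is never decremented.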

Update the volume-server integration tests that drove Ping through the
new admission gate: success-path coverage now targets the master peer
(the only type a volume server tracks), and the unknown/unreachable
path asserts the InvalidArgument the gate now returns instead of the
old downstream dial error.

Mirror the same admission gate in the Rust volume server crate: a
seed-master HashSet built once at startup plus a tokio RwLock over the
heartbeat-tracked current master, both consulted in is_known_ping_target
on every Ping, with InvalidArgument returned for any target that isn't
a recognised master.
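
As a rough Go sketch of the volume-server gate described above (a frozen
seed-master set plus a lock-protected current master), assuming
hypothetical names throughout: pingGate, newPingGate, setCurrentMaster
and check are illustrative only, and a plain error value stands in for
the gRPC InvalidArgument status the real handlers return.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errUnknownTarget is a stand-in for the InvalidArgument status; the real
// servers return a gRPC status, which is not modelled in this sketch.
var errUnknownTarget = errors.New("invalid argument: unknown ping target")

// pingGate is a hypothetical type holding the two sources of truth the
// commit message names for a volume server.
type pingGate struct {
	seeds map[string]struct{} // built once, never mutated afterwards

	mu      sync.RWMutex
	current string // updated by the heartbeat goroutine
}

// newPingGate copies the seed addresses into a set, so later mutation of
// the caller's slice cannot change what the gate accepts.
func newPingGate(seedMasters []string) *pingGate {
	seeds := make(map[string]struct{}, len(seedMasters))
	for _, m := range seedMasters {
		seeds[m] = struct{}{}
	}
	return &pingGate{seeds: seeds}
}

// setCurrentMaster is what a heartbeat loop would call on leader change.
func (g *pingGate) setCurrentMaster(addr string) {
	g.mu.Lock()
	g.current = addr
	g.mu.Unlock()
}

// check admits a target only if it is a seed master or the current master.
func (g *pingGate) check(target string) error {
	if _, ok := g.seeds[target]; ok {
		return nil
	}
	g.mu.RLock()
	cur := g.current
	g.mu.RUnlock()
	if target != "" && target == cur {
		return nil
	}
	return errUnknownTarget
}

func main() {
	seedList := []string{"m1:9333", "m2:9333"}
	g := newPingGate(seedList)
	seedList[0] = "other:1" // no effect on the frozen set
	g.setCurrentMaster("m3:9333")
	fmt.Println(g.check("m1:9333"), g.check("m3:9333"), g.check("other:1"))
}
```

Reading the current master under RLock keeps the common Ping path cheap
while still ordering it against heartbeat writes.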
2026-05-12 13:00:52 -07:00
promalert
9012069bd7 chore: execute goimports to format the code (#7983)
* chore: execute goimports to format the code

Signed-off-by: promalert <promalert@outlook.com>

* goimports -w .

---------

Signed-off-by: promalert <promalert@outlook.com>
Co-authored-by: Chris Lu <chris.lu@gmail.com>
2026-01-07 13:06:08 -08:00
Konstantin Lebedev
ff1392f7f4 [shell] use constant for hdd disk type (#6337)
use constant for hdd disk type
2024-12-10 08:43:59 -08:00
chrislu
ec30a504ba refactor 2024-09-29 10:38:22 -07:00
chrislu
701abbb9df add IsResourceHeavy() to command interface 2024-09-28 20:23:01 -07:00
vadimartynov
8aae82dd71 Added context for the MasterClient's methods to avoid endless loops (#5628)
* Added context for the MasterClient's methods to avoid endless loops

* Returned WithClient function. Added WithClientCustomGetMaster function

* Hid unused ctx arguments

* Using a common context for the KeepConnectedToMaster and WaitUntilConnected functions

* Changed the context termination check in the tryConnectToMaster function

* Added a child context to the tryConnectToMaster function

* Added a common context for KeepConnectedToMaster and WaitUntilConnected functions in benchmark
2024-06-14 11:40:34 -07:00
Benoît Knecht
56287bd07d weed/shell: Cluster check other disk types (#5245)
* weed/shell: Cluster check other disk types

The `cluster.check` command only took the empty (`""`) and `hdd` disk types
into consideration, but a cluster with only `ssd` or `nvme` disk types would be
equally valid.

This commit simply checks that _any_ disk type is defined, and that some
volumes are available for it.

Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>

* weed/shell: Replace loop that copies slice

Use the following construct instead of a `for` loop:

```golang
x = append(x, y...)
```

See https://staticcheck.dev/docs/checks#S1011.

Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>

* weed/shell: Check disk types when filer is in use

Filer stores its metadata logs in generic (i.e. `""`) or HDD disk type volumes,
so make sure those disk types exist and have volumes associated with them when
Filer is deployed in the cluster.

Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>

---------

Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
2024-01-29 10:36:37 -08:00
chrislu
81fdf3651b grpc connection to filer add sw-client-id header 2023-01-20 01:48:12 -08:00
askeipx
2e78a522ab remove old raft servers if they don't answer to pings for too long (#3398)
* remove old raft servers if they don't answer to pings for too long

add ping durations as options

rename ping fields

fix some todos

get masters through masterclient

raft remove server from leader

use raft servers to ping them

CheckMastersAlive for hashicorp raft only

* prepare blocking ping

* pass waitForReady as param

* pass waitForReady through all functions

* waitForReady works

* refactor

* remove unneeded params

* rollback unneeded changes

* fix
2022-08-23 23:18:21 -07:00
chrislu
26dbc6c905 move to https://github.com/seaweedfs/seaweedfs 2022-07-29 00:17:28 -07:00
chrislu
89948a373b fix error reporting for "Need to a hdd disk type"
related to https://github.com/chrislusf/seaweedfs/issues/3128
2022-05-31 12:43:55 -07:00
chrislu
94635e9b5c filer: add filer group 2022-05-01 21:59:16 -07:00
chrislu
460d56d283 shell: cluster.check prints out clock delta and network latency 2022-04-16 13:24:17 -07:00
chrislu
1f03fcccb1 fix nil in cluster_check shell command
fix https://github.com/chrislusf/seaweedfs/issues/2905
2022-04-12 08:47:27 -07:00
chrislu
4aae87f405 check missing hdd disk type 2022-04-04 14:48:00 -07:00
chrislu
105578a2f2 skip pinging self for master and volume server 2022-04-01 20:25:35 -07:00
chrislu
4ecba915f3 add check between peers 2022-04-01 17:40:25 -07:00
chrislu
4b5c0e3fa9 check cluster connectivity 2022-04-01 17:27:49 -07:00