The equality and less-than operators for shard_id should ignore the CPU id as well.
In a 3-node cluster where each node has 4 CPUs, on the first node:
Before:
[fedora@ip-172-30-0-99 ~]$ netstat -nt|grep 100\:7000
tcp 0 0 172.30.0.99:36998 172.30.0.100:7000 ESTABLISHED
tcp 0 0 172.30.0.99:36772 172.30.0.100:7000 ESTABLISHED
tcp 0 0 172.30.0.99:40125 172.30.0.100:7000 ESTABLISHED
tcp 0 0 172.30.0.99:60182 172.30.0.100:7000 ESTABLISHED
tcp 0 0 172.30.0.99:38013 172.30.0.100:7000 ESTABLISHED
tcp 0 0 172.30.0.99:51997 172.30.0.100:7000 ESTABLISHED
tcp 0 0 172.30.0.99:56532 172.30.0.100:7000 ESTABLISHED
After:
[fedora@ip-172-30-0-99 ~]$ netstat -nt|grep 100\:7000
tcp 0 0 172.30.0.99:45661 172.30.0.100:7000 ESTABLISHED
tcp 0 0 172.30.0.99:57395 172.30.0.100:7000 ESTABLISHED
tcp 0 0 172.30.0.99:37807 172.30.0.100:7000 ESTABLISHED
tcp 0 36 172.30.0.99:50567 172.30.0.100:7000 ESTABLISHED
Each shard of a node is supposed to hold exactly 1 connection to a peer
node, so each node should have #cpu connections to that peer (4 here,
which matches the "After" output).
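
A minimal sketch of why the comparison matters, assuming each shard keeps
its client connections in a map keyed by shard_id. The names below
(ShardId, connections_to_peer) are hypothetical and this is Python for
illustration only, not the project's actual code. When cpu_id is excluded
from equality, ordering, and hashing, every lookup for the same peer
address lands on the same map entry, so a shard opens one connection per
peer no matter which remote CPU a message targets.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True, order=True)
class ShardId:
    """Hypothetical stand-in for shard_id: a peer address plus a CPU id."""
    addr: str
    # compare=False drops cpu_id from ==, <, and the generated hash.
    cpu_id: int = field(compare=False)


def connections_to_peer(num_shards, peer_addr, target_cpus):
    """Count connections a node opens to one peer (illustrative model).

    Each local shard owns its own client map; a connection is created
    only when the key is not already present in that map.
    """
    total = 0
    for _ in range(num_shards):
        clients = {}
        for cpu in target_cpus:
            clients.setdefault(ShardId(peer_addr, cpu), object())
        total += len(clients)
    return total


# 4 local shards, messages addressed to remote CPUs 0..3 of one peer.
count = connections_to_peer(4, "172.30.0.100", [0, 1, 2, 3])
```

In this model the count equals the number of local shards, which is the
#cpu figure stated above. If cpu_id took part in the comparison, each
distinct (address, cpu) pair would get its own map entry and its own
connection.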
With this patch, the cluster is much more stable than before on AWS. So
far, I have seen no timeouts in the gossip SYN message exchange.