mirror of
https://github.com/tendermint/tendermint.git
synced 2026-01-03 11:45:18 +00:00
mempool: fix broadcastTxRoutine leak (#3478)
Refs #3306, irisnet@fdbb676 I ran an irishub validator. After the validator node ran several days, I dump the whole goroutine stack. I found that there were hundreds of broadcastTxRoutine. However, the connected peer quantity was less than 30. So I belive that there must be broadcastTxRoutine leakage issue. According to my analysis, I think the root cause of this issue locate in below code: select { case <-next.NextWaitChan(): // see the start of the for loop for nil check next = next.Next() case <-peer.Quit(): return case <-memR.Quit(): return } As we know, if multiple paths are avaliable in the same time, then a random path will be selected. Suppose that next.NextWaitChan() and peer.Quit() are both avaliable, and next.NextWaitChan() is chosen. // send memTx msg := &TxMessage{Tx: memTx.tx} success := peer.Send(MempoolChannel, cdc.MustMarshalBinaryBare(msg)) if !success { time.Sleep(peerCatchupSleepIntervalMS * time.Millisecond) continue } Then next will be non-empty and the peer send operation won't be success. As a result, this go routine will be track into infinite loop and won't be released. My proposal is to check peer.Quit() and memR.Quit() in every loop no matter whether next is nil.
This commit is contained in:
committed by
Anton Kaliaev
parent
6de7effb05
commit
1bb8e02a96
@@ -179,6 +179,10 @@ func (memR *MempoolReactor) broadcastTxRoutine(peer p2p.Peer) {
|
||||
peerID := memR.ids.GetForPeer(peer)
|
||||
var next *clist.CElement
|
||||
for {
|
||||
// In case of both next.NextWaitChan() and peer.Quit() are variable at the same time
|
||||
if !memR.IsRunning() || !peer.IsRunning() {
|
||||
return
|
||||
}
|
||||
// This happens because the CElement we were looking at got garbage
|
||||
// collected (removed). That is, .NextWait() returned nil. Go ahead and
|
||||
// start from the beginning.
|
||||
|
||||
Reference in New Issue
Block a user