Add tcp_keepalive_timeout_ms option, change default to 60s

The default TCP keepalive timeout is currently 10s, so clients are
disconnected after 10 seconds of not replying to TCP keepalive
packets. This value is reasonable most of the time, but we've seen
client disconnects where this timeout was exceeded, resulting in
fencing. The cause is unknown at this time, but network interruptions
are suspected.

This change adds a configurable value for this specific client socket
timeout. It enforces that its value is above UNRESPONSIVE_PROBES, whose
value remains unchanged.
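
As a rough illustration of the check described above, here is a minimal
sketch of how a total timeout could be split into the three Linux
keepalive parameters. It is not the code from this change: the helper
name, the struct, the probe count of 3 and the 1s probe interval are all
assumptions, chosen only to be consistent with the documented 3000ms
lower bound.

```c
#include <errno.h>

/* Assumed values: the commit names UNRESPONSIVE_PROBES but not its value. */
#define UNRESPONSIVE_PROBES 3	/* assumed: unanswered probes before giving up */
#define PROBE_INTERVAL_SECS 1	/* assumed: seconds between probes */

/* Hypothetical container for the values a socket would be given. */
struct keepalive_params {
	int idle_secs;	/* would feed TCP_KEEPIDLE */
	int intvl_secs;	/* would feed TCP_KEEPINTVL */
	int probes;	/* would feed TCP_KEEPCNT */
};

/*
 * Hypothetical helper: split a total timeout in milliseconds into idle
 * time plus the probing phase, so that
 *   idle_secs + probes * intvl_secs
 * approximates the requested timeout.  Returns 0 on success or -EINVAL
 * if the timeout does not leave room for the probing phase.
 */
int split_keepalive_timeout(unsigned int timeout_ms,
			    struct keepalive_params *kp)
{
	unsigned int probe_ms = UNRESPONSIVE_PROBES * PROBE_INTERVAL_SECS * 1000;

	/* The total must be strictly larger than the probing phase. */
	if (timeout_ms <= probe_ms)
		return -EINVAL;

	/* Sub-second precision is dropped; keepalive works in whole seconds. */
	kp->idle_secs = (timeout_ms - probe_ms) / 1000;
	if (kp->idle_secs < 1)
		kp->idle_secs = 1;	/* TCP_KEEPIDLE must be at least 1 */

	kp->intvl_secs = PROBE_INTERVAL_SECS;
	kp->probes = UNRESPONSIVE_PROBES;
	return 0;
}
```

Under these assumptions a 60000ms timeout becomes 57s of idle time
followed by 3 probes sent 1s apart, and a value of 3000 or lower is
rejected.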

The default value is changed from 10000ms (10s) to 60000ms (60s). We
assume this value is better suited for customers. It has been briefly
trialed, and the trial suggests that it may help clients ride out
network-level interruptions.

Signed-off-by: Auke Kok <auke.kok@versity.com>
Auke Kok
2025-09-09 09:58:25 -07:00
parent fd8aaa0810
commit f67462750b
4 changed files with 60 additions and 9 deletions

@@ -130,6 +130,23 @@ the server for the filesystem if it is elected leader.
The assigned number must match one of the slots defined with \-Q options
when the filesystem was created with mkfs. If the number assigned
doesn't match a number created during mkfs then the mount will fail.
.TP
.B tcp_keepalive_timeout_ms=<number>
This option sets the amount of time, in milliseconds, that a client
connection will wait for a reply to TCP keepalive packets before deciding that
the connection is dead. This setting is per-mount and only changes
the behavior of that mount.
.sp
The default value of this setting is 60000msec (60s). Any precision
beyond a whole second is likely unrealistic due to the nature of
TCP keepalive mechanisms in the Linux kernel. Valid values are
greater than 3000 (3s).
.sp
The TCP keepalive mechanism is complex and observing a lost connection
quickly is important to maintain cluster stability. If the local
network suffers from intermittent outages this option may provide
some respite to overcome these outages without the cluster becoming
desynchronized.
.SH VOLUME OPTIONS
Volume options are persistent options which are stored in the super
block in the metadata device and which apply to all mounts of the volume.