Drain conn workers before nulling client->conn in destroy

scoutfs_client_destroy nulled client->conn before scoutfs_net_free_conn
had a chance to drain the conn's workqueue. An in-flight proc_worker
running client_lock_recover dispatches scoutfs_lock_recover_request
synchronously, which in turn calls scoutfs_client_lock_recover_response.
That helper reads client->conn and passes it to scoutfs_net_response, so
a concurrently nulled pointer made submit_send dereference conn->lock
through a NULL conn, tripping a KASAN null-ptr-deref followed by a
general protection fault (GPF).

The race only became reachable in practice once reconnect started
draining pending client requests with -ECONNRESET: the farewell can now
return while the server is still sending requests on the re-established
socket.

Reorder so that scoutfs_net_free_conn runs first. Its shutdown_worker
drains conn->workq before any memory is freed, and only then is
client->conn nulled. The original intent of nulling the pointer, to
catch buggy late callers, is preserved.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 22:10:30 +00:00 · 2026-04-19 10:11:14 -07:00
parent 25b9457d07
commit dc74104804
1 changed file with 8 additions and 2 deletions
--- a/kmod/src/client.c
+++ b/kmod/src/client.c
@@ -774,10 +774,16 @@ void scoutfs_client_destroy(struct super_block *sb)
 	/* make sure worker isn't using the conn */
 	cancel_delayed_work_sync(&client->connect_dwork);

-	/* make racing conn use explode */
+	/*
+	 * Drain the conn's workers before nulling client->conn.  In-flight
+	 * proc_workers dispatch request handlers that call back into client
+	 * response helpers (e.g. scoutfs_client_lock_recover_response) which
+	 * read client->conn; nulling it first races with those workers and
+	 * causes submit_send to dereference a NULL conn->lock.
+	 */
 	conn = client->conn;
-	client->conn = NULL;
 	scoutfs_net_free_conn(sb, conn);
+	client->conn = NULL;

 	if (client->workq)
 		destroy_workqueue(client->workq);
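To make the ordering argument concrete, here is a minimal userspace model in C, assuming pthreads as a stand-in for the kernel workqueue machinery. Every name in it (model_conn, model_client, model_proc_worker, model_client_setup, model_client_destroy) is hypothetical and not part of scoutfs. A single pthread plays an in-flight proc_worker that reads client->conn the way scoutfs_client_lock_recover_response does, and pthread_join plays the role of shutdown_worker draining conn->workq. Because the join happens before the pointer is cleared, the worker can never observe a NULL conn.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical userspace model of the destroy ordering; these are not
 * scoutfs or kernel types.  One pthread stands in for an in-flight
 * proc_worker on conn->workq, and pthread_join() stands in for
 * shutdown_worker draining that queue.
 */
struct model_conn {
	pthread_t worker;	/* the single "in-flight proc_worker" */
	int responses_sent;	/* written only by the worker thread */
};

struct model_client {
	_Atomic(struct model_conn *) conn;
	atomic_bool saw_null_conn;	/* set if a worker lost the race */
};

/* Models a response helper that reads client->conn before sending. */
static void *model_proc_worker(void *arg)
{
	struct model_client *client = arg;
	struct model_conn *conn = atomic_load(&client->conn);

	if (!conn) {
		/* In the kernel, this is where submit_send would fault. */
		atomic_store(&client->saw_null_conn, true);
		return NULL;
	}
	conn->responses_sent++;
	return NULL;
}

/* Publish the conn first, then start the worker that will read it. */
static int model_client_setup(struct model_client *client)
{
	struct model_conn *conn = calloc(1, sizeof(*conn));

	if (!conn)
		return -1;
	atomic_init(&client->conn, conn);
	atomic_init(&client->saw_null_conn, false);
	if (pthread_create(&conn->worker, NULL, model_proc_worker, client)) {
		atomic_store(&client->conn, NULL);
		free(conn);
		return -1;
	}
	return 0;
}

/*
 * Fixed ordering: drain the worker, free the conn, and only then clear
 * client->conn.  Returns the number of responses the worker sent.
 */
static int model_client_destroy(struct model_client *client)
{
	struct model_conn *conn = atomic_load(&client->conn);
	int sent;

	pthread_join(conn->worker, NULL);	/* drain, like shutdown_worker */
	sent = conn->responses_sent;		/* safe: worker has exited */
	free(conn);
	atomic_store(&client->conn, NULL);	/* late callers now see NULL */
	return sent;
}
```

A caller runs model_client_setup() and then model_client_destroy(); with this ordering the worker's single response is always counted and saw_null_conn stays false. Moving the atomic_store() ahead of pthread_join() reproduces the original bug as a nondeterministic race, which is why the sketch only exercises the fixed order. Depending on the libc, building it may need -pthread.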