Drain conn workers before nulling client->conn in destroy

scoutfs_client_destroy nulled client->conn before scoutfs_net_free_conn
had a chance to drain the conn's workqueue.  An in-flight proc_worker
running client_lock_recover dispatches scoutfs_lock_recover_request
synchronously, which in turn calls scoutfs_client_lock_recover_response.
That helper reads client->conn and hands it to scoutfs_net_response, so
a racing NULL made submit_send dereference conn->lock and trip a KASAN
null-ptr-deref followed by a GPF.

The race only became reachable in practice once reconnect started draining
pending client requests with -ECONNRESET, because the farewell can now
return while the server is still sending requests on the re-established
socket.

Reorder so scoutfs_net_free_conn runs first; its shutdown_worker drains
conn->workq before any memory is freed, then client->conn is nulled.
The original intent of nulling to catch buggy late callers is preserved.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Auke Kok
2026-04-19 10:11:14 -07:00
parent 25b9457d07
commit dc74104804

@@ -774,10 +774,16 @@ void scoutfs_client_destroy(struct super_block *sb)
 	/* make sure worker isn't using the conn */
 	cancel_delayed_work_sync(&client->connect_dwork);
 
-	/* make racing conn use explode */
+	/*
+	 * Drain the conn's workers before nulling client->conn. In-flight
+	 * proc_workers dispatch request handlers that call back into client
+	 * response helpers (e.g. scoutfs_client_lock_recover_response) which
+	 * read client->conn; nulling it first races with those workers and
+	 * causes submit_send to dereference a NULL conn->lock.
+	 */
 	conn = client->conn;
-	client->conn = NULL;
 	scoutfs_net_free_conn(sb, conn);
+	client->conn = NULL;
 
 	if (client->workq)
 		destroy_workqueue(client->workq);
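
The ordering this change establishes can be modeled in userspace. The sketch below is only an illustration under stated assumptions: struct model_client, struct model_conn, model_proc_worker, model_client_setup and model_client_destroy are hypothetical names, not scoutfs code, and pthread_join stands in for the workqueue drain that scoutfs_net_free_conn is described as performing.

```c
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical stand-ins for the scoutfs client and conn; not the real layout. */
struct model_conn {
	pthread_mutex_t lock;
	int responses_sent;
};

struct model_client {
	struct model_conn *conn;
	pthread_t worker;
};

/* Models a proc_worker whose handler calls back into a client response helper. */
static void *model_proc_worker(void *arg)
{
	struct model_client *client = arg;
	struct model_conn *conn;

	usleep(1000);				/* likely still in flight when destroy starts */
	conn = client->conn;			/* the read the response helper performs */
	pthread_mutex_lock(&conn->lock);	/* would fault if conn were NULL */
	conn->responses_sent++;
	pthread_mutex_unlock(&conn->lock);
	return NULL;
}

/* Allocates the conn and starts one worker. Returns 0 on success, -1 on failure. */
static int model_client_setup(struct model_client *client)
{
	struct model_conn *conn = calloc(1, sizeof(*conn));

	if (!conn)
		return -1;
	pthread_mutex_init(&conn->lock, NULL);
	client->conn = conn;
	if (pthread_create(&client->worker, NULL, model_proc_worker, client)) {
		pthread_mutex_destroy(&conn->lock);
		free(conn);
		client->conn = NULL;
		return -1;
	}
	return 0;
}

/*
 * Fixed ordering: drain the worker, free the conn, then null the pointer.
 * Returns how many responses the worker sent before the conn went away.
 */
static int model_client_destroy(struct model_client *client)
{
	struct model_conn *conn = client->conn;
	int sent;

	pthread_join(client->worker, NULL);	/* stands in for draining conn->workq */
	sent = conn->responses_sent;
	pthread_mutex_destroy(&conn->lock);
	free(conn);
	client->conn = NULL;			/* buggy late callers still fault on NULL */
	return sent;
}
```

Calling model_client_setup and then model_client_destroy lets the worker finish its send before the pointer is cleared. Moving the client->conn = NULL store above pthread_join in this model would recreate the shape of the original race, where the worker can read NULL and fault on conn->lock.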