Mirror of https://github.com/versity/scoutfs.git (synced 2026-01-10 21:50:20 +00:00)
There are a few bad corner cases in the state machine that governs how
client transactions are opened, modified, and committed.

The worst problem is on the server side.  All server request handlers
need to cope with resent requests without causing bad side effects.
Both get_log_trees and commit_log_trees would try to fully process
resent requests.  _get_log_trees() looks safe because it works with the
log_trees that was stored previously.  _commit_log_trees() is not safe
because it can rotate out the srch log file referenced by the sent
log_trees every time it's processed.  This could create extra srch
entries which would delete the first instance of entries.  Worse still,
by injecting the same block structure into the system multiple times it
ends up causing multiple frees of the blocks that make up the srch file.

The client side problems are slightly different, but related.  There
aren't strong constraints which guarantee that we'll only send a commit
request after a get request succeeds.  In unusual circumstances the
commit request in the write worker could come before the first get in
mount succeeds.  Far worse is that we can send multiple commit requests
for one transaction if it changes as we get errors during multiple
queued write attempts, particularly if we get errors from get_log_trees
after having successfully committed.

This hardens all these paths to ensure a strict sequence of
get_log_trees, transaction modification, and commit_log_trees.

On the server we add *_trans_seq fields to the log_trees struct so that
both get_ and commit_ can see that they've already prepared a commit to
send or have already committed the incoming commit, respectively.  We
can use the get_trans_seq field as the trans_seq of the open transaction
and get rid of the entire separate mechanism we used to have for
tracking open trans seqs in the clients.  We can get the same info by
walking the log_trees and looking at their *_trans_seq fields.
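
The server-side check can be sketched roughly as follows.  This is a
simplified user-space model, not the scoutfs code: the struct layouts,
the model_* function names, the commit_trans_seq field name, the
srch_rotations counter, and the error codes are all assumptions made for
illustration.  It only shows how comparing the two sequence fields lets
resent get and commit requests return without repeating side effects.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for a client's log_trees item (hypothetical layout). */
struct log_trees_model {
	uint64_t get_trans_seq;    /* seq of the most recently opened transaction */
	uint64_t commit_trans_seq; /* seq of the most recently committed transaction */
	unsigned srch_rotations;   /* stands in for the non-idempotent srch rotation */
};

/* Hypothetical server state: next transaction seq to hand out, starting at 1. */
struct server_model {
	uint64_t next_trans_seq;
};

/* A transaction is open when a get has not yet been matched by a commit. */
static bool trans_is_open(const struct log_trees_model *lt)
{
	return lt->get_trans_seq != lt->commit_trans_seq;
}

/*
 * Open a new transaction, or hand back the one that is already open when
 * the request is a resend.  Returns the trans_seq of the open transaction.
 */
static uint64_t model_get_log_trees(struct server_model *srv,
				    struct log_trees_model *lt)
{
	if (!trans_is_open(lt))
		lt->get_trans_seq = srv->next_trans_seq++;

	return lt->get_trans_seq;
}

/*
 * Commit the transaction identified by sent_seq.  A resend of a commit that
 * was already applied returns success without touching anything, so the
 * srch rotation happens at most once per transaction.
 */
static int model_commit_log_trees(struct log_trees_model *lt, uint64_t sent_seq)
{
	/* seq 0 means no get ever succeeded, so there is nothing to commit */
	if (sent_seq == 0)
		return -EINVAL;

	/* resent request for a commit we already processed */
	if (sent_seq == lt->commit_trans_seq)
		return 0;

	/* neither the open transaction nor the last committed one */
	if (sent_seq != lt->get_trans_seq)
		return -EIO;

	lt->srch_rotations++;
	lt->commit_trans_seq = lt->get_trans_seq;
	return 0;
}
```

In this model the set of open trans seqs falls out of the same fields:
any log_trees whose get_trans_seq differs from its commit_trans_seq has
an open transaction with that get_trans_seq.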

In the client we have the write worker immediately return success if
mount hasn't opened the first transaction.  Then we don't have the
worker return to allow further modification until it has gotten success
from get_log_trees.

Signed-off-by: Zach Brown <zab@versity.com>