mirror of
https://github.com/versity/scoutfs.git
synced 2026-01-07 04:26:29 +00:00
c65b70f2aac731364623b59c63652b9cb20a9376
The first pass of the buddy allocator had a fixed indirect block so it couldn't address large devices. It didn't index set bits or slots for each order so we spent a lot of cpu searching for free space. And it didn't precisely account for stable free space so it could spend a lot of cpu time discovering that free space can't be used because it wasn't stable.

This fixes these initial critical flaws in the buddy allocator. Before it could only address a few hundred megs and now it can address 2^64 blocks. Before it limited bulk inode creation searching for slots and leaf bits and now other components are much higher in the profiles with greater create rates.

First we remove the special case single indirect block. The root now references a block that can be at any height. The root records the height and each block records its level. We descend until we hit the leaf. We add a stack of the blocks traversed so that we can ascend and fix up parent indexing after we modify a leaf.

Now that we can have quite a lot of parent indirect blocks we can no longer have a static bitmap for allocating buddy blocks. We instead precisely preallocate two blocks for every buddy block that will be used to address all the device blocks. The blkno offset of these pairs of buddy blocks can be calculated for a given position in the tree. Allocating a blkno xors the low bit of the blkno and freeing is a nop. This happily gets rid of the specific allocation of buddy blocks with its regions and worrying about stable free blocks itself.

Then we index the first set index in a block for each order. In parent blocks this tells you the slot you can traverse to find a free region of that order. In leaf blocks it tells you the specific block offset of the first free extent. This is kept up to date as we set and clear buddy bits in leaves and free_order bits in parent slots. Allocation now is a simple matter of block reads and array dereferencing.
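As a rough illustration of the per-order index in a leaf, here is a minimal sketch. It is not the code in this patch: the sk_ names, the tiny 8-block leaf, and the one-byte bitmaps are all assumptions made to keep the example short, and splitting a larger order to satisfy a smaller one is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy leaf: 8 blocks, orders 0..3 (8, 4, 2, 1 bits each). */
#define SK_ORDERS 4
#define SK_NONE   0xff	/* no set bit at this order */

struct sk_leaf {
	uint8_t bits[SK_ORDERS];	/* one small buddy bitmap per order */
	uint8_t first_set[SK_ORDERS];	/* lowest set bit index, or SK_NONE */
};

static void sk_leaf_init(struct sk_leaf *lf)
{
	for (int o = 0; o < SK_ORDERS; o++) {
		lf->bits[o] = 0;
		lf->first_set[o] = SK_NONE;
	}
}

/* Rescan one order's bitmap; only needed after clearing the first bit. */
static void sk_update_first(struct sk_leaf *lf, int order)
{
	uint8_t count = (uint8_t)(8 >> order);

	lf->first_set[order] = SK_NONE;
	for (uint8_t i = 0; i < count; i++) {
		if (lf->bits[order] & (1u << i)) {
			lf->first_set[order] = i;
			break;
		}
	}
}

/* Mark a region free at this order and keep the index up to date. */
static void sk_set_bit(struct sk_leaf *lf, int order, uint8_t nr)
{
	assert(order >= 0 && order < SK_ORDERS && nr < (8 >> order));

	lf->bits[order] |= (uint8_t)(1u << nr);
	if (lf->first_set[order] == SK_NONE || nr < lf->first_set[order])
		lf->first_set[order] = nr;
}

/* Mark a region used at this order and keep the index up to date. */
static void sk_clear_bit(struct sk_leaf *lf, int order, uint8_t nr)
{
	assert(order >= 0 && order < SK_ORDERS && nr < (8 >> order));

	lf->bits[order] &= (uint8_t)~(1u << nr);
	if (lf->first_set[order] == nr)
		sk_update_first(lf, order);
}

/*
 * Allocate a free extent of exactly this order.  Returns the block
 * offset within the leaf, or -1 if nothing is free at this order.
 * Finding the extent is a single array dereference.
 */
static int sk_alloc_order(struct sk_leaf *lf, int order)
{
	uint8_t nr = lf->first_set[order];

	if (nr == SK_NONE)
		return -1;
	sk_clear_bit(lf, order, nr);
	return nr << order;
}
```

In this sketch the rescan only happens when the bit being cleared was the indexed one, so the common allocation path never searches the bitmap. A parent block could keep the same kind of array over its slots' free_order bits.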
And we now precisely account for frees that should not satisfy allocation until after a transaction commit. We record frees of stable data in extent nodes in an rbtree after their buddy blocks have been dirtied. Because their blocks are dirtied we can free them as the transaction commits without errors. Similarly, we can also revert them if the transaction commit fails so that they don't satisfy allocation. This prevents us from having to hang or go read-only if a transaction commit fails.

The two changes visible to callers are easy argument changes: scoutfs_buddy_free() now takes a seq to specify when the blocks were first allocated, and the arguments of scoutfs_buddy_alloc_same() now reflect that it only makes sense for single block allocations.

Unfortunately all these changes are interrelated so the resulting patch amounts to a rewrite. The core buddy bitmap helper functions and loops are the same but the surrounding block container code changes significantly.

Signed-off-by: Zach Brown <zab@versity.com>