Zach Brown c65b70f2aa Use full radix for buddy and record first set
The first pass of the buddy allocator had a fixed indirect block so it
couldn't address large devices.  It didn't index set bits or slots for
each order so we spent a lot of cpu searching for free space.  And it
didn't precisely account for stable free space so it could spend a lot
of cpu time discovering that free space can't be used because it wasn't
stable.

This fixes these initial critical flaws in the buddy allocator.  Before,
it could only address a few hundred megs; now it can address 2^64
blocks.  Before, searching for slots and leaf bits limited bulk inode
creation; now create rates are much greater and other components show
up much higher in the profiles.

First we remove the special case single indirect block.  The root now
references a block that can be at any height.  The root records the
height and each block records its level.  We descend until we hit the
leaf.  We add a stack of the blocks traversed so that we can ascend and
fix up parent indexing after we modify a leaf.
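The descent and ascent can be sketched roughly as follows.  This is a
minimal userspace illustration, not the scoutfs code: the fanout
(SLOTS), MAX_HEIGHT, and the struct and function names are all
hypothetical, and a per-node free count stands in for the real per-order
free bits in parent slots.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical geometry: 4 slots per parent, at most 8 levels. */
#define SLOT_SHIFT 2
#define SLOTS (1 << SLOT_SHIFT)
#define MAX_HEIGHT 8

struct node {
	int level;                 /* 0 for leaves, counting up toward the root */
	int nr_free;               /* simplified stand-in for per-order free bits */
	struct node *child[SLOTS]; /* only used by parents */
};

struct root {
	int height;                /* levels in the tree, including the leaves */
	struct node *ref;          /* the top block, at level height - 1 */
};

struct path {
	int nr;
	struct node *nodes[MAX_HEIGHT]; /* parents traversed, top first */
};

/* Walk from the root to the leaf for leaf index 'pos', pushing parents. */
static struct node *descend(struct root *root, uint64_t pos, struct path *path)
{
	struct node *n = root->ref;

	path->nr = 0;
	while (n->level > 0) {
		int slot = (pos >> ((n->level - 1) * SLOT_SHIFT)) & (SLOTS - 1);

		assert(path->nr < MAX_HEIGHT);
		path->nodes[path->nr++] = n;
		n = n->child[slot];
	}
	return n;
}

/* After a leaf changes, pop the stack (deepest parent first) and
 * recompute each parent's summary from its children. */
static void ascend(struct path *path)
{
	while (path->nr > 0) {
		struct node *p = path->nodes[--path->nr];
		int i;

		p->nr_free = 0;
		for (i = 0; i < SLOTS; i++)
			p->nr_free += p->child[i]->nr_free;
	}
}
```

A caller descends to a leaf, modifies it, and then calls ascend() with
the same path so every parent on the way back up reflects the change.
Recording the path avoids needing parent pointers in the blocks.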

Now that we can have quite a lot of parent indirect blocks we can no
longer have a static bitmap for allocating buddy blocks.  We instead
precisely preallocate two blocks for every buddy block that will be used
to address all the device blocks.  The blkno offset of these pairs of
buddy blocks can be calculated for a given position in the tree.
Allocating a blkno xors the low bit of the blkno and freeing is a nop.
This happily gets rid of the dedicated allocator for buddy blocks, along
with its regions and its own worrying about stable free blocks.
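One way the pair offsets could be calculated is sketched below.  The
layout is an assumption for illustration only: levels are stored
contiguously from an even base blkno, leaves first, with one pair per
buddy block, and LEAF_BITS, SLOTS, and all names are hypothetical.
Keeping the base even is what makes xoring the low bit stay within a
pair.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical geometry; the real values come from the block format. */
#define LEAF_BITS 1024ULL   /* device blocks tracked by one leaf */
#define SLOTS 64ULL         /* child references per parent */
#define MAX_LEVELS 16

struct layout {
	uint64_t base;                 /* first blkno of the region, must be even */
	int nr_levels;
	uint64_t count[MAX_LEVELS];    /* buddy blocks at each level, leaves first */
	uint64_t first[MAX_LEVELS];    /* pair index of the first block in a level */
};

/* round up without overflowing for n near UINT64_MAX */
static uint64_t div_round_up(uint64_t n, uint64_t d)
{
	return n / d + (n % d != 0);
}

/* size each level until a single block references everything below it */
static void layout_init(struct layout *lay, uint64_t base, uint64_t total_blocks)
{
	uint64_t n = div_round_up(total_blocks, LEAF_BITS);
	uint64_t pairs = 0;
	int level = 0;

	assert(total_blocks > 0);
	assert((base & 1) == 0);
	lay->base = base;
	for (;;) {
		assert(level < MAX_LEVELS);
		lay->count[level] = n;
		lay->first[level] = pairs;
		pairs += n;
		level++;
		if (n == 1)
			break;
		n = div_round_up(n, SLOTS);
	}
	lay->nr_levels = level;
}

/* the even blkno of the pair preallocated for the block at (level, index) */
static uint64_t pair_blkno(const struct layout *lay, int level, uint64_t index)
{
	assert(level < lay->nr_levels && index < lay->count[level]);
	return lay->base + 2 * (lay->first[level] + index);
}

/* "allocating" a new copy of a buddy block switches to the other half
 * of its pair; freeing the old half needs no bookkeeping */
static uint64_t cow_blkno(uint64_t blkno)
{
	return blkno ^ 1;
}
```

With these hypothetical parameters a device of 2^64 - 1 blocks needs ten
levels, and the preallocated region stays a small fraction of the device.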

Then we record the first set index in a block for each order.  In parent
blocks this tells you the slot you can traverse to find a free region of
that order.  In leaf blocks it tells you the specific block offset of
the first free extent.  This is kept up to date as we set and clear
buddy bits in leaves and free_order bits in parent slots.  Allocation
now is a simple matter of block reads and array dereferencing.
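A minimal sketch of keeping a per-order first set index current in a
leaf is below.  It assumes a hypothetical 64-block leaf with one 64-bit
bitmap per order, and the names are illustrative.  Buddy splitting and
merging are left out so that only the indexing is shown: setting a bit
can only lower the index, and clearing only requires a search when the
cleared bit was the first one.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical leaf: 64 device blocks, orders 0..6, one 64-bit map per order. */
#define ORDERS 7
#define NO_BIT 64

struct leaf {
	uint64_t bits[ORDERS];  /* bit i of order o: blocks [i << o, (i + 1) << o) free */
	int first_set[ORDERS];  /* lowest set bit per order, NO_BIT if none */
};

static int nbits(int order)
{
	return 64 >> order;
}

static int find_next_set(uint64_t word, int start, int limit)
{
	int i;

	for (i = start; i < limit; i++)
		if (word & (1ULL << i))
			return i;
	return NO_BIT;
}

static void leaf_init(struct leaf *leaf)
{
	int o;

	for (o = 0; o < ORDERS; o++) {
		leaf->bits[o] = 0;
		leaf->first_set[o] = NO_BIT;
	}
}

static void set_buddy_bit(struct leaf *leaf, int order, int nr)
{
	assert(nr < nbits(order));
	leaf->bits[order] |= 1ULL << nr;
	if (nr < leaf->first_set[order])
		leaf->first_set[order] = nr;
}

static void clear_buddy_bit(struct leaf *leaf, int order, int nr)
{
	assert(nr < nbits(order));
	leaf->bits[order] &= ~(1ULL << nr);
	if (nr == leaf->first_set[order])
		leaf->first_set[order] =
			find_next_set(leaf->bits[order], nr + 1, nbits(order));
}

/* take the first free extent of exactly 'order': returns its block
 * offset in the leaf, or -1 if there is none */
static int alloc_exact(struct leaf *leaf, int order)
{
	int nr = leaf->first_set[order];

	if (nr == NO_BIT)
		return -1;
	clear_buddy_bit(leaf, order, nr);
	return nr << order;
}
```

Finding a free extent is then a single array read of first_set[order]
instead of a bitmap scan.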

And we now precisely account for frees that should not satisfy
allocation until after a transaction commit.  We record frees of stable
data in extent nodes in an rbtree after their buddy blocks have been
dirtied.  Because their blocks are dirtied we can free them as the
transaction commits without errors.  Similarly, we can also revert them
if the transaction commit fails so that they don't satisfy allocation.
This prevents us from having to hang or go read-only if a transaction
commit fails.
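The commit and revert paths for pending frees might look roughly like
this.  It is a userspace sketch under stated assumptions: a sorted
linked list stands in for the kernel rbtree, a single 64-bit map stands
in for the already-dirtied buddy blocks, and every name is hypothetical.
Allocation only consults the free map, so recorded frees can't satisfy
it until commit.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct pending_free {
	uint64_t blkno;
	struct pending_free *next;
};

struct trans {
	uint64_t free_map;             /* hypothetical 64-block allocator state */
	struct pending_free *pending;  /* sorted by blkno; stand-in for an rbtree */
};

/* record a free of stable data; it isn't visible to allocation yet */
static int record_stable_free(struct trans *t, uint64_t blkno)
{
	struct pending_free **pos = &t->pending;
	struct pending_free *pf;

	assert(blkno < 64);
	while (*pos && (*pos)->blkno < blkno)
		pos = &(*pos)->next;

	pf = malloc(sizeof(*pf));
	if (!pf)
		return -1;
	pf->blkno = blkno;
	pf->next = *pos;
	*pos = pf;
	return 0;
}

/* commit succeeded: apply the frees; nothing here can fail because the
 * state being modified was made writable when the free was recorded */
static void commit_pending_frees(struct trans *t)
{
	while (t->pending) {
		struct pending_free *pf = t->pending;

		t->pending = pf->next;
		t->free_map |= 1ULL << pf->blkno;
		free(pf);
	}
}

/* commit failed: drop the records so the blocks never satisfy allocation */
static void revert_pending_frees(struct trans *t)
{
	while (t->pending) {
		struct pending_free *pf = t->pending;

		t->pending = pf->next;
		free(pf);
	}
}

/* allocation only looks at free_map, so pending frees are invisible */
static int alloc_block(struct trans *t)
{
	int i;

	for (i = 0; i < 64; i++) {
		if (t->free_map & (1ULL << i)) {
			t->free_map &= ~(1ULL << i);
			return i;
		}
	}
	return -1;
}
```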

The two changes visible to callers are easy argument changes:
scoutfs_buddy_free() now takes a seq that specifies when the blocks were
first allocated, and scoutfs_buddy_alloc_same()'s arguments now reflect
that it only makes sense for single block allocations.

Unfortunately all these changes are interrelated, so the resulting patch
amounts to a rewrite.  The core buddy bitmap helper functions and loops
are the same, but the surrounding block container code changes
significantly.

Signed-off-by: Zach Brown <zab@versity.com>
2016-11-08 16:05:37 -08:00