Convert the segment allocator to store its free region bitmaps in the
btree.
This is a very straightforward mechanical transformation. We split the
allocator region into a big-endian index key and the bitmap value
payload. We're careful to operate on aligned copies of the bitmaps so
that they're long-aligned.
We can remove all the funky functions that were needed when writing the
ring. All we're left with is a call to apply the pending allocations to
dirty btree blocks before writing the btree.
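The key/payload split above can be sketched as follows. This is a
hypothetical illustration, not the actual scoutfs structures: the names
`alloc_key_encode` and `bitmap_copy_aligned`, the 64-bit region index,
and the buffer sizes are all assumptions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Encode the region index as a big-endian key so that byte-wise key
 * comparison in the btree sorts regions in index order. */
static void alloc_key_encode(uint8_t key[8], uint64_t region_index)
{
	for (int i = 0; i < 8; i++)
		key[i] = (uint8_t)(region_index >> (56 - 8 * i));
}

/* Btree item payloads aren't guaranteed to be long-aligned, so copy
 * the bitmap out into long-aligned storage before using word-wide
 * bitmap helpers on it. */
static void bitmap_copy_aligned(unsigned long *dst, const void *payload,
				size_t bytes)
{
	memcpy(dst, payload, bytes);
}
```

The big-endian encoding is what lets a simple memcmp-style key
comparison in the btree produce the right ordering of regions.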
Signed-off-by: Zach Brown <zab@versity.com>
Using the treap to be able to incrementally read and write the manifest
and allocation storage from all nodes wasn't quite ready for prime time.
The biggest problem is that it's hard to invalidate cached nodes that
are the target of native pointers, whether for consistency or under
memory pressure. This was getting in the way of adding shared support as
readers and writers try to use as much of their treap caches as they
can. There were other serious problems that we'd run into eventually:
memory pressure from duplicate caching in native nodes and the page
cache, small IOs from reading a page at a time, the risk of
pathologically imbalanced treaps, and the ring being corrupted if the
migration balancing doesn't work (the model assumed you could always
dirty an individual node in a transaction, but in fact you have to dirty
all the parents in each new transaction).
Let's back off to a much simpler mechanism while we build the rest of
the system around it. We can revisit aggressively optimizing this when
it's our worst problem.
We'll store the indexes that the manifest server needs in simple
preallocated rings with log entries. The server has to read the index
in its entirety into a native rbtree before it can work on it. We won't
access the physical ring from mounts anymore, they'll send messages to
the server.
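The replay on the server can be sketched roughly like this. The entry
header layout, the zero-length terminator, and the function names are
all assumptions for illustration, not the real on-disk format:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical on-disk shape: a preallocated ring of variable-length
 * log entries, each preceded by a small length header. */
struct ring_entry_hdr {
	uint16_t len;	/* bytes of payload that follow, 0 ends the log */
};

/* Walk every entry in the ring in order; returns the number of entries
 * replayed, or -1 if an entry runs off the end of the ring. */
static int ring_replay(const uint8_t *ring, size_t size)
{
	struct ring_entry_hdr hdr;
	size_t off = 0;
	int nr = 0;

	while (off + sizeof(hdr) <= size) {
		memcpy(&hdr, ring + off, sizeof(hdr));
		if (hdr.len == 0)
			break;
		off += sizeof(hdr);
		if (off + hdr.len > size)
			return -1;
		/* a real server would unpack the entry here and insert
		 * it into its native rbtree index */
		off += hdr.len;
		nr++;
	}
	return nr;
}
```

The point of the design is visible in the loop: the server pays one
sequential read of the whole ring up front, and afterwards works
entirely against the in-memory rbtree.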
The ring callers are now working with a pinned tree in memory so the
interface can be a bit simpler. By storing the indexes in their own
rings the code and write path become a lot simpler: we have an IO
submission path for each index instead of "dirtying" calls per index and
then a writing call.
All this is much more robust and much less likely to get in our way as
we stand up the rest of the system around it.
Signed-off-by: Zach Brown <zab@versity.com>
Our statfs callback was still using the old buddy allocator.
We add a free segments field to the super and have it track the number
of free segments in the allocator. We then use that to calculate the
number of free blocks for statfs.
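The calculation is a straight multiply, since segments are a fixed
number of blocks. A minimal sketch, where `BLOCKS_PER_SEGMENT` is an
assumed value and `statfs_free_blocks` is a hypothetical name:

```c
#include <stdint.h>

/* assumed segment geometry, not the real scoutfs constant */
#define BLOCKS_PER_SEGMENT 256ULL

/* derive the statfs free block count from the super's count of free
 * segments in the allocator */
static uint64_t statfs_free_blocks(uint64_t free_segments)
{
	return free_segments * BLOCKS_PER_SEGMENT;
}
```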
Signed-off-by: Zach Brown <zab@versity.com>
The first pass manifest and allocator storage used a simple ring log
that was entirely replayed into memory to be used. That risked the
manifest being too large to fit in memory, especially with large keys
and large volumes.
So we move to using an indexed persistent structure that can be read on
demand and cached. We use a treap of byte-referenced nodes stored in a
circular ring.
The code interface is modeled a bit on the in-memory rbtree interface,
except that we manage allocation and can hit IO errors: we return data
pointers to the item payload instead of item structs, and we can return
errors.
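One way to return either a payload pointer or an error through a single
return value is the kernel's ERR_PTR convention. A sketch under that
assumption, with hypothetical `treap_*` names standing in for whatever
helpers the real interface uses:

```c
#include <errno.h>

/* encode a negative errno in a pointer return value */
static inline void *treap_err_ptr(long err)
{
	return (void *)err;
}

/* recover the errno from an encoded pointer */
static inline long treap_ptr_err(const void *ptr)
{
	return (long)ptr;
}

/* errors live in the top page of the address space, so a lookup can
 * hand back either a payload pointer or an error through one value */
static inline int treap_is_err(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-4095;
}
```

Callers then test the returned pointer before dereferencing it, which
is how an rbtree-like lookup interface can also surface IO errors.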
The manifest and allocator are converted over and the old ring code is
removed entirely.
Signed-off-by: Zach Brown <zab@versity.com>
Add all the core structural components to be able to modify metadata. We
modify items in fs write operations, track dirty items in the cache,
allocate free segment block regions, stream dirty items into segments,
write out the segments, update the manifest to reference the written
segments, and write out a new ring that has the new manifest.
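The middle of that pipeline, streaming dirty items into a segment and
recording the manifest entry that will reference it, can be modeled in
miniature. Every name and layout here is illustrative, not the real
scoutfs format:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* a dirty item waiting to be streamed out, keys assumed sorted */
struct item {
	uint64_t key;
	uint32_t len;
	const void *val;
};

/* what the manifest needs to later find items in a written segment */
struct manifest_entry {
	uint64_t segno;
	uint64_t first_key;
	uint64_t last_key;
};

/* Stream sorted dirty items into the segment buffer and fill in the
 * manifest entry that will reference the written segment; returns the
 * bytes used in the segment. */
static size_t write_segment(uint8_t *seg, size_t seg_size,
			    const struct item *items, int nr,
			    uint64_t segno, struct manifest_entry *ment)
{
	size_t off = 0;

	for (int i = 0; i < nr; i++) {
		if (off + sizeof(uint64_t) + items[i].len > seg_size)
			break;
		memcpy(seg + off, &items[i].key, sizeof(uint64_t));
		off += sizeof(uint64_t);
		memcpy(seg + off, items[i].val, items[i].len);
		off += items[i].len;
	}

	ment->segno = segno;
	ment->first_key = items[0].key;
	ment->last_key = items[nr - 1].key;
	return off;
}
```

The key range stored in the manifest entry is what lets later reads
decide whether a segment could contain a given item at all.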
Signed-off-by: Zach Brown <zab@versity.com>