Commit Graph

877 Commits

Author SHA1 Message Date
Zach Brown
005cf99f42 scoutfs: use vmalloc for high order xattr allocs
The xattr item stream is constructed from a large contiguous region
that contains the struct header, the key, and the value.  The value
can be larger than a page so kmalloc is likely to fail as the system
gets fragmented.

Our recent move to the item cache added a significant source of page
allocation churn which moved the system towards fragmentation much more
quickly and was causing high-order allocation failures in testing.
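
A minimal sketch of the fallback, with hypothetical helper names rather
than the actual scoutfs functions:

#include <linux/slab.h>
#include <linux/vmalloc.h>
#include <linux/mm.h>

/*
 * Sketch only: try kmalloc quietly first, fall back to vmalloc when
 * high-order pages aren't available.  kvfree() frees either kind.
 */
static void *xattr_buf_alloc_sketch(size_t size)
{
        void *buf;

        buf = kmalloc(size, GFP_NOFS | __GFP_NOWARN);
        if (!buf)
                buf = vmalloc(size);

        return buf;
}

static void xattr_buf_free_sketch(void *buf)
{
        kvfree(buf);
}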

Signed-off-by: Zach Brown <zab@versity.com>
2020-10-26 15:19:03 -07:00
Zach Brown
c61175e796 scoutfs: remove unused radix code
Remove the radix allocator that was added as we experimented with packed
extent items.  It didn't work out.

Signed-off-by: Zach Brown <zab@versity.com>
2020-10-26 15:19:03 -07:00
Zach Brown
e60f4e7082 scoutfs: use full extents for data and alloc
Previously we'd avoided full extents in file data mapping items because
we were deleting items from forest btrees directly.  That created
deletion items for every version of file extents as they were modified.
Now we have the item cache which can remove deleted items from memory
when deletion items aren't necessary.

By layering file data extents on an extent layer, we can also transition
allocators to use extents and fix a lot of problems in the radix block
allocator.

Most of this change is churn from changing allocator function and struct
names.

File data extents no longer have to manage loading and storing from and
to packed extent items at a fixed granularity.  All those loops are torn
out and data operations now call the extent layer with their callbacks
instead of calling its packed item extent functions.  This now means
that fallocate and especially restoring offline extents can use larger
extents.  Small file block allocation now comes from a cached extent
which reduces item calls for small file data streaming writes.

The big change in the server is to use more root structures to manage
recursive modification instead of relying on the allocator to notice and
do the right thing.  The radix allocator tried to notice when it was
actively operating on a root that it was also using to allocate and free
metadata blocks.  This resulted in a lot of bugs.  Instead we now double
buffer the server's avail and freed roots so that the server fills and
drains the stable roots from the previous transaction.  We also double
buffer the core fs metadata avail root so that we can increase the time
to reuse freed metadata blocks.

The server now only moves free extents into client allocators when they
fall below a low threshold.  This reduces the shared modification of the
client's allocator roots which requires cold block reads on both the
client and server.
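
As a rough illustration of that refill policy (names and thresholds are
made up, not the actual server code):

#include <linux/types.h>

#define CLIENT_ALLOC_LO_SKETCH          (64ULL * 1024)  /* blocks */
#define CLIENT_ALLOC_TARGET_SKETCH      (256ULL * 1024) /* blocks */

/*
 * Sketch only: leave the client's allocator root alone until its free
 * count drops below a low watermark, then fill back up toward a target.
 */
static u64 client_refill_blocks_sketch(u64 client_free)
{
        if (client_free >= CLIENT_ALLOC_LO_SKETCH)
                return 0;

        return CLIENT_ALLOC_TARGET_SKETCH - client_free;
}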

Signed-off-by: Zach Brown <zab@versity.com>
2020-10-26 15:19:03 -07:00
Zach Brown
8f946aa478 scoutfs: add btree item extent allocator
Add an allocator which uses btree items to store extents.  Both the
client and server will use this for btree blocks, the client will use it
for srch blocks and data extents, and the server will move extents
between the core fs allocator btree roots and the clients' roots.

Signed-off-by: Zach Brown <zab@versity.com>
2020-10-26 15:19:03 -07:00
Zach Brown
b605407c29 scoutfs: add extent layer
Add infrastructure for working with extents.  Callers provide callbacks
which operate on their extent storage while this code performs the
fiddly splitting and merging of extents.  This layer doesn't have any
persistent structures of its own; it only operates on native structs in
memory.
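
A hedged sketch of what such a callback interface can look like; the
names and signatures are invented, not the scoutfs definitions:

#include <linux/types.h>

struct ext_sketch {
        u64 start;
        u64 len;
};

/*
 * Callers hand the extent core a set of callbacks that operate on their
 * storage (btree items, in-memory trees, etc.) and the core performs
 * the splitting and merging against whatever they return.
 */
struct ext_ops_sketch {
        int (*next)(void *arg, u64 start, struct ext_sketch *found);
        int (*insert)(void *arg, u64 start, u64 len);
        int (*remove)(void *arg, u64 start, u64 len);
};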

Signed-off-by: Zach Brown <zab@versity.com>
2020-10-26 15:19:03 -07:00
Zach Brown
b28acdf904 scoutfs: use larger percpu_counter batch
The percpu_counter library merges the per-cpu counters with a shared
count when the per-cpu counter gets larger than a certain value.  The
default is very small, so we often end up taking a shared lock to update
the count.  Use a larger batch so that we take the lock less often.
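
A minimal sketch of the idea, assuming a kernel that exports
percpu_counter_add_batch() (older kernels spell it __percpu_counter_add());
the batch value here is made up:

#include <linux/percpu_counter.h>

#define COUNTER_BATCH_SKETCH    1024    /* much larger than the default */

static inline void counter_add_sketch(struct percpu_counter *pcpu, s64 amt)
{
        /* only folds into the shared count once a cpu strays past the batch */
        percpu_counter_add_batch(pcpu, amt, COUNTER_BATCH_SKETCH);
}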

Signed-off-by: Zach Brown <zab@versity.com>
2020-08-26 14:39:12 -07:00
Zach Brown
ae97ffd6fc scoutfs: remove unused kvec.h
We've removed the last use of kvecs to describe item values.

Signed-off-by: Zach Brown <zab@versity.com>
2020-08-26 14:39:12 -07:00
Zach Brown
12067e99ab scoutfs: remove item granular work from forest
Now that the item cache is bearing the load of high frequency item
calls, we can remove all the item granular work that the forest was
trying to do.  The item cache amortizes the cost of the forest so its
remaining methods can go straight to the btrees and don't need
complicated state to reduce the overhead of item calls.

Signed-off-by: Zach Brown <zab@versity.com>
2020-08-26 14:39:12 -07:00
Zach Brown
6bacd95aea scoutfs: fs uses item cache instead of forest
Use the new item cache for all the item work in the fs instead of
calling into the forest of btrees.  Most of this is mechanical
conversion from the _forest calls to the _item calls.  The item cache
no longer supports the kvec argument for describing values so all the
callers pass in the value pointer and length directly.

The item cache doesn't support saving items as they're deleted and later
restoring them from an error unwinding path.  There were only two users
of this.  Directory entries can easily guarantee that deletion won't
fail by dirtying the items first in the item cache.  Xattr updates were
a little trickier.  They can combine dirtying, creating, updating, and
deleting to atomically switch between items that describe different
versions of a multi-item value.  This also fixed a bug in the srch
xattrs where replacing an xattr would create a new id for the xattr and
leave existing srch items referencing a now deleted id.  Replacing now
reuses the old id.

And finally we add back in the locking and transaction item cache
integration.

Signed-off-by: Zach Brown <zab@versity.com>
2020-08-26 14:39:12 -07:00
Zach Brown
45e594396f scoutfs: add an item cache above the btrees
Add an item cache between fs callers and the forest of btrees.  Calling
out to the btrees for every item operation was far too expensive.  This
gives us a flexible in-memory structure for working with items that
isn't bound by the constraints of persistent block IO.  We can stream
large groups of items to and from the btrees relatively rarely and then
use efficient kernel memory structures for the more frequent item
operations.

This adds the infrastructure; nothing is calling it yet.
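
A rough sketch of the kind of in-memory node such a cache can be built
around; the fields are hypothetical, not the real scoutfs structures:

#include <linux/rbtree.h>
#include <linux/list.h>
#include <linux/types.h>

struct key_sketch { u8 bytes[48]; };    /* stand-in for the fs item key */

struct cached_item_sketch {
        struct rb_node node;            /* keyed by item key in the cache rbtree */
        struct list_head dirty_entry;   /* on a per-transaction dirty list */
        bool deletion;                  /* negative entry: key known to be absent */
        bool dirty;
        struct key_sketch key;
        unsigned int val_len;
        u8 val[];                       /* value bytes follow the struct */
};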

Signed-off-by: Zach Brown <zab@versity.com>
2020-08-26 14:39:12 -07:00
Zach Brown
b1757a061e scoutfs: add forest methods for item cache
Add forest calls that the item cache will use.  It needs to read all the
items in the leaf blocks of the forest btrees that could contain the key,
write dirty items to the log btree, and dirty bits in the bloom block as
items are dirtied.

Signed-off-by: Zach Brown <zab@versity.com>
2020-08-26 14:39:12 -07:00
Zach Brown
1a994137f4 scoutfs: add btree methods for item cache
Add btree calls to call a callback for all items in a leaf, and to
insert a list of items into their leaf blocks.  These will be used by
the item cache to populate the cache and to write dirty items into dirty
btree blocks.

Signed-off-by: Zach Brown <zab@versity.com>
2020-08-26 14:39:12 -07:00
Zach Brown
57af2bd34b scoutfs: give btree walk callers more keys
The current btree walk recorded the start and end of child subtrees as
it walked, and it could give the caller the next key to iterate towards
after the block it returned.  Future methods want to get at the key
bounds of child subtrees, so we add a key range struct that all walk
callers provide and fill it with all the interesting keys calculated
during the walk.
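
A hedged sketch of the kind of range struct a walk can fill in
(hypothetical names):

#include <linux/types.h>

struct key_sketch { u8 bytes[48]; };    /* stand-in for the fs item key */

/*
 * Sketch only: every walk caller provides one of these and the walk
 * fills it with the key bounds it calculated on the way down.
 */
struct walk_key_range_sketch {
        struct key_sketch start;        /* smallest key covered by the returned block */
        struct key_sketch end;          /* largest key covered by the returned block */
        struct key_sketch iter_next;    /* next key to continue iteration towards */
};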

Signed-off-by: Zach Brown <zab@versity.com>
2020-08-26 14:39:12 -07:00
Zach Brown
9e975dffe1 scoutfs: refactor btree split condition
Btree traversal doesn't split a block if it has room for the caller's
item.  Extract this test into a function so that an upcoming btree call
can test that each of multiple insertions into a leaf will fit.

Signed-off-by: Zach Brown <zab@versity.com>
2020-08-26 14:39:12 -07:00
Zach Brown
d440056e6f scoutfs: remove unused xattr index code
Remove the last remnants of the indexed xattrs which used fs items.
This makes the significant change of renumbering the key zones so I
wanted it in its own commit.

Signed-off-by: Zach Brown <zab@versity.com>
2020-08-26 14:39:12 -07:00
Zach Brown
d1e62a43c9 scoutfs: fix leaking alloc bits in merge
In a merge where the input and source trees are the same, the input
block can be an initial pre-cow version of the dirty source block.
Dirtying blocks in the change will clear allocations in the dirty source
block but they will remain in the pre-cow input block.  The merge can
then set these blocks in the dst, even though they were also used by
allocation, because they're still set in the pre-cow input block.

This fix is clumsy, but minimal and specific to this problem.  A more
thorough fix is being worked on which introduces more staging allocator
trees and should stop callers from modifying the currently active avail
or free trees.

Signed-off-by: Zach Brown <zab@versity.com>
2020-08-26 14:39:12 -07:00
Zach Brown
289caeb353 scoutfs: trace leaf_bit of modified radix bits
Signed-off-by: Zach Brown <zab@versity.com>
2020-08-26 14:39:12 -07:00
Zach Brown
ba879b977a scoutfs: expand radix merge tracing
Add a trace event for entering _radix_merge() and rename the current
per-merge trace event.

Signed-off-by: Zach Brown <zab@versity.com>
2020-08-26 14:39:12 -07:00
Zach Brown
5c6b263d97 scoutfs: trace radix bit ops before assertions
Trace operations before they can trigger assertions so we can see the
violating operation in the traces.

Signed-off-by: Zach Brown <zab@versity.com>
2020-08-26 14:39:12 -07:00
Zach Brown
ca6b7f1e6d scoutfs: lock invalidate only syncs dirty
Lock invalidation has to make sure that changes are visible to future
readers.  It was syncing if the current transaction is dirty.  This was
never optimal, but it wasn't catastrophic when concurrent invalidation
work could all block on one sync in progress.

With the move to a single invalidation worker serially invalidating
locks it became unacceptable.  Invalidation happening in the presence of
writers would constantly sync the current transaction while very old
unused write locks were invalidated.  Their changes had long since been
committed in previous transactions.

We add a lock field to remember the sequence of the transaction that
could have been dirtied under the lock.  If that transaction has already
been committed by the time we invalidate the lock, it doesn't have to
sync.
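
Sketched out with hypothetical names, the check reduces to a sequence
comparison:

#include <linux/types.h>

/*
 * Sketch only: invalidation only needs to sync if the transaction that
 * could have dirtied items under this lock hasn't been committed yet.
 */
static bool lock_inval_needs_sync_sketch(u64 lock_dirty_trans_seq,
                                         u64 committed_trans_seq)
{
        return lock_dirty_trans_seq > committed_trans_seq;
}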

Signed-off-by: Zach Brown <zab@versity.com>
2020-08-26 14:39:12 -07:00
Zach Brown
55dde87bb1 scoutfs: fix lock invalidation work deadlock
The client lock network message processing callbacks were built to
simply perform the processing work for the message in the networking
work context that it was called in.  This particularly makes sense for
invalidation because it has to interact with other components that
require blocking contexts (syncing commits, invalidating inodes,
truncating pages, etc).

The problem is that these messages are per-lock.  With the right
workloads we can use all the capacity for executing work just in lock
invalidation work.  There is no more work execution available for other
network processing.  Critically, the blocked invalidation work is
waiting for the commit thread to get its network responses before
invalidation can make forward progress.  I was easily reproducing
deadlocks by leaving behind a lot of locks and then triggering a flood
of invalidation requests on behalf of shrinking due to memory pressure.

The fix is to put locks on lists and have a small fixed number of work
contexts process all the locks pending for each message type.  The
network callbacks don't block, they just put the lock on the list and
queue the work that will walk the lists.  Invalidation now blocks one
work context, not the number of incoming requests.

There were some wait conditions in work that used to use the lock workq.
Other paths that change those conditions now have to know to queue the
work specifically rather than just waking tasks, which used to include
the blocked work executors.

The other subtle impact of the change is that we can no longer rely on
networking to shut down message processing work that was happening in its
callbacks.  We have to specifically stop our work queues in _shutdown.
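
A rough sketch of the pattern with made-up names (the real code carries
more state and covers more message types):

#include <linux/list.h>
#include <linux/spinlock.h>
#include <linux/workqueue.h>

struct lock_info_sketch {
        spinlock_t lock;
        struct list_head inval_list;
        struct workqueue_struct *workq;
        struct work_struct inval_work;  /* one worker drains the whole list */
};

struct held_lock_sketch {
        struct list_head inval_entry;
};

/* network callback: never blocks, just queues the lock and the worker */
static void inval_request_cb_sketch(struct lock_info_sketch *linfo,
                                    struct held_lock_sketch *lck)
{
        spin_lock(&linfo->lock);
        if (list_empty(&lck->inval_entry))
                list_add_tail(&lck->inval_entry, &linfo->inval_list);
        spin_unlock(&linfo->lock);

        queue_work(linfo->workq, &linfo->inval_work);
}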

Signed-off-by: Zach Brown <zab@versity.com>
2020-08-26 14:39:12 -07:00
Zach Brown
f4db553c28 scoutfs: fix error unwinding in server advance_seq
While checking for lost server commit holds, I noticed that the
advance_seq request path had obviously incorrect unwinding after getting
an error.  Fix it up so that it always unlocks and applies its commit.

Signed-off-by: Zach Brown <zab@versity.com>
2020-08-26 14:39:12 -07:00
Zach Brown
4b9c02ba32 scoutfs: add committed_seq to statfs_more
Add the committed_seq to statfs_more, which gives the greatest seq that
has been committed.  This lets callers discover that the seq for a change
they made has been committed.

Signed-off-by: Zach Brown <zab@versity.com>
2020-08-26 14:39:12 -07:00
Zach Brown
6356440073 scoutfs: add error message for client commit error
We had a debugging WARN_ON that warns when a client has an error
committing its transaction.  Let's add a bit more detail and promote it
to a proper error.  These should not happen.

Signed-off-by: Zach Brown <zab@versity.com>
2020-08-26 14:39:12 -07:00
Zach Brown
9658412d09 scoutfs: add forest counters
Add a bunch of counters to track significant events in the forest.

Signed-off-by: Zach Brown <zab@versity.com>
2020-08-26 14:39:12 -07:00
Zach Brown
57c7caf348 scoutfs: fix forest dirty log tracking
The forest code is responsible for constructing a consistent fs image
out of the items spread across all the btrees written by mounts in the
system.

Usually readers walk a btree looking for log trees that they should
read.  As a mount modifies items in its dirty log tree, readers need to
be sure to check that in-memory dirty log tree even though it isn't
present in the btree that records persistent log trees.

The code did this by setting a flag to indicate that readers using a
lock should check the dirty log tree.  But the flag usage wasn't
properly locked, leaving a window where a reader and writer could race
and future readers wouldn't know that they should check the dirty log
tree.  When we rarely hit that race we'd see item errors that made no
sense, like not being able to find an inode item to update after having
just created it in the current transaction.

To fix this, we clean up the tree tracking in the forest code.

We get rid of the static forest_root structs in the lock_private that
were used to track the two special-case roots that aren't found in log
tree items: the in-memory dirty log root and the final fs root.  All
roots are now dynamically allocated.  We use a flag in the root to
identify it as the dirty log root, and identify the fs root by its
rid/nr.  This results in a bunch of caller churn as we remove lpriv from
root identifying functions.

We get rid of the idea of the writer adding a static root to the list as
well as marking the log as needing to read the root.  Instead we make
all root management happen as we refresh the list.  The forest maintains
a commit sequence and writers set state in the lock to indicate that the
lock has dirty items in the log during this transaction.  Iteration then
compares the state set by the commit, writer, and the last refresh to
determine if a new refresh needs to happen.
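
Sketched as a predicate (hypothetical names), that refresh decision
looks something like:

#include <linux/types.h>

/*
 * Sketch only: re-read the lock's list of roots if a commit has
 * happened since our last refresh, or if a writer has marked the lock
 * as having dirty log items since then.
 */
static bool forest_needs_refresh_sketch(u64 commit_seq, u64 writer_seq,
                                        u64 refreshed_seq)
{
        return refreshed_seq < commit_seq || refreshed_seq < writer_seq;
}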

Properly tracking the presence of dirty items lets us recognize when the
lock no longer has dirty items in the log and we can stop locking and
reading the dirty log and fall back to reading the committed stable
version.  The previous code didn't do that, it would lock and read the
dirty root forever.

While we're in here, we fix the locking around setting bloom bits and
have it track the version of the log tree that was set so that we don't
have to clear set bits as the log version is rotated out by the server.

There was also a subtle bug where we could hit two stale errors for the
same root and return -EIO because the refresh we triggered also returned
stale.  We rework the retry logic to use a separate error code to force
refreshing so that we can't accidentally return -EIO by conflating
reading stale blocks with forcing a refresh.

And finally, we no longer record that we need the dirty log tree in a
root if we have a lock that could never read.  It's a minor optimization
that doesn't change functional behaviour.

Signed-off-by: Zach Brown <zab@versity.com>
2020-08-26 14:39:12 -07:00
Zach Brown
f8bf1718a0 scoutfs: add a bunch of btree counters
Add some counters for the most basic btree events.

Signed-off-by: Zach Brown <zab@versity.com>
2020-08-26 14:39:12 -07:00
Zach Brown
c415cab1e9 scoutfs: use srch to track .srch. xattrs
Using strictly coherent btree items to map the hash of xattr names to
inode numbers proved the value of the functionality, but it was too
expensive.  We now have the more efficient srch infrastructure to use.

We change from the .indx. to the .srch. tag, and change the ioctl from
find_xattr to search_xattrs.  The idea is to communicate that these are
accelerated searches, not precise index lookups and are relatively
expensive.

Rather than maintaining btree items, xattr setting and deleting emit
srch entries which either track the xattr or combine with the previous
tracking entry to remove it.  These are done under the lock that
protects the main xattr item, so we can remove the separate locking of
the previous index items.

The semantics of the search ioctl needs to change a bit.  Because
searches are so expensive we now return a flag to indicate that the
search completed.  While we're there, we also allow a last_ino parameter
so that searches can be divided up and run in parallel.

Signed-off-by: Zach Brown <zab@versity.com>
2020-08-26 14:39:12 -07:00
Zach Brown
f8e1812288 scoutfs: add srch infrastructure
This introduces the srch mechanism that we'll use to accelerate finding
files based on the presence of a given named xattr.  This is an
optimized version of the initial prototype that was using locked btree
items for .indx. xattrs.

This is built around specific compressed data structures, having the
operation cost match the reality of orders of magnitude more writers
than readers, and adopting a relaxed locking model.  Combining all of
this, maintaining the xattrs no longer tanks creation rates while still
providing excellent search latencies, given that searches are defined as
rare and relatively expensive.

The core data type is the srch entry which maps a hashed name to an
inode number.  Mounts can append entries to the end of unsorted log
files during their transaction.  The server tracks these files and
rotates them into a list of files as they get large enough.  Mounts have
compaction work that regularly asks the server for a set of files to
read and combine into a single sorted output file.  The server only
initiates compactions when it sees a number of files of roughly the same
size.  Searches then walk all the committed srch files, both log files
and sorted compacted files, looking for entries that associate an xattr
name with an inode number.
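
As a hedged illustration of the core data type (field names are made up
and this is not the on-disk format):

#include <linux/types.h>

/*
 * Sketch only: an entry associates the hash of a .srch. xattr name with
 * the inode that set it.  A later matching entry cancels it out during
 * compaction and searches.
 */
struct srch_entry_sketch {
        __le64 name_hash;
        __le64 ino;
        __le64 id;
};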

Signed-off-by: Zach Brown <zab@versity.com>
2020-08-26 14:39:12 -07:00
Zach Brown
cca83b1758 scoutfs: rework get_fs_roots to get_roots
The get_fs_roots rpc and server interfaces were built around individual
roots.  Rebuild them around passing a struct so that we can add roots
without impacting all the current users.

Signed-off-by: Zach Brown <zab@versity.com>
2020-08-26 14:39:12 -07:00
Zach Brown
8c114ddb87 scoutfs: increase max btree item size
Now that we have larger blocks we can have a larger max item.  This was
increased to make room for the srch compaction items which store a good
number of srch files in their value.

Signed-off-by: Zach Brown <zab@versity.com>
2020-08-26 14:39:12 -07:00
Zach Brown
ab271f4682 scoutfs: report sm metadata blocks in statfs
The conversion of the super block metadata block counters to units of
large metadata blocks forgot to scale back to the small block size when
filling out the block count fields in the statfs rpc.  This resulted in
the free and total metadata use being off by the factor of large to
small block size (default of ~16x at the moment).
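
The missing conversion is just a scale by the ratio of the block sizes;
a sketch assuming the 4KB small and 64KB large sizes (hypothetical
constant names):

#include <linux/types.h>

#define SM_BLOCK_SHIFT_SKETCH   12      /* 4KB small blocks */
#define LG_BLOCK_SHIFT_SKETCH   16      /* 64KB large blocks */

/* counts are stored in large-block units, statfs reports small blocks */
static u64 lg_to_sm_blocks_sketch(u64 lg_blocks)
{
        return lg_blocks << (LG_BLOCK_SHIFT_SKETCH - SM_BLOCK_SHIFT_SKETCH);
}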

Signed-off-by: Zach Brown <zab@versity.com>
2020-08-26 14:39:12 -07:00
Zach Brown
42e7fbb4f7 scoutfs: switch to using fnv1a for hashing
We had a few uses of crc for hashing.  That was fine for initial
testing, but the huge number of xattrs that srch records was seeing
very bad collisions from the clumsy combination of crc32c values into
a 64bit hash.  Replace it with FNV-1a for now.

This also takes the opportunity to use 3 hash functions in the forest
bloom filter so that we can extract them from the 64bit hash of the key
rather than iterating and recalculating hashes for each function.
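
For reference, 64-bit FNV-1a with its standard constants (the scoutfs
helper and the way it carves the bloom functions out of the hash may
differ):

#include <linux/types.h>

static u64 fnv1a_64_sketch(const void *data, unsigned int len)
{
        const u8 *bytes = data;
        u64 hash = 0xcbf29ce484222325ULL;       /* FNV-1a offset basis */
        unsigned int i;

        for (i = 0; i < len; i++) {
                hash ^= bytes[i];
                hash *= 0x100000001b3ULL;       /* FNV-1a 64-bit prime */
        }

        return hash;
}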

Signed-off-by: Zach Brown <zab@versity.com>
2020-08-26 14:39:12 -07:00
Zach Brown
f48112e2a7 scoutfs: allocate contig block pages with nowarn
We first attempt to allocate our large logically contiguous cached
blocks with physically contiguous pages to minimize the impact on the
tlb.  When that fails we fall back to vmalloc()ed blocks.  Sadly,
high-order page allocation failure is expected and we forgot to provide
the flag that suppresses the page allocation failure message.
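
A minimal sketch of the attempt-then-fallback with hypothetical names
(the real block cache also remembers which path it took so it can free
correctly):

#include <linux/gfp.h>
#include <linux/mm.h>
#include <linux/vmalloc.h>

static void *block_data_alloc_sketch(unsigned int order)
{
        struct page *page;

        /* quiet, opportunistic attempt at physically contiguous pages */
        page = alloc_pages(GFP_NOFS | __GFP_NOWARN | __GFP_NORETRY, order);
        if (page)
                return page_address(page);

        /* expected fallback once memory fragments */
        return vmalloc(PAGE_SIZE << order);
}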

Signed-off-by: Zach Brown <zab@versity.com>
2020-08-26 14:39:12 -07:00
Zach Brown
07ba053021 scoutfs: check super blkno fields
We had a bug where mkfs would set a free data blkno allocator bit past
the end of the device.  (Just at it, in fact.  Those fenceposts.)  Add
some checks at mount to make sure that the allocator blkno ranges in the
super don't have obvious mistakes.

Signed-off-by: Zach Brown <zab@versity.com>
2020-08-26 14:39:12 -07:00
Zach Brown
69e5f5ae5f scoutfs: add btree walk trace point
Signed-off-by: Zach Brown <zab@versity.com>
2020-08-26 14:39:12 -07:00
Zach Brown
2980edac53 scoutfs: restore btree block verification
Signed-off-by: Zach Brown <zab@versity.com>
2020-08-26 14:39:12 -07:00
Zach Brown
f9ff25db23 scoutfs: add dirent name fingerprint
Entries in a directory are indexed by the hash of their name.  This
introduces a perfectly random access pattern.  And this results in a cow
storm once directories get large enough that the leaf blocks that
store their entries are larger than our commits.  Each commit ends up
being full of cowed leaf blocks that contain a single new entry.

The dirent name fingerprints change the dirent key to first start with a
fingerprint of the name.  This reduces the scope of hash randomization
from the entire directory to entries with the same fingerprint.
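
Purely as an illustration (the commit doesn't spell out the fingerprint
format), the key change amounts to leading with something derived from
the name before the randomizing hash:

#include <linux/types.h>

/*
 * Hypothetical key layout: entries first sort by a fingerprint of the
 * name, so hash randomization only scatters entries that share a
 * fingerprint rather than the whole directory.
 */
struct dirent_key_sketch {
        __be64 dir_ino;
        __be64 name_fingerprint;        /* derived from the name bytes */
        __be64 name_hash;               /* full hash still breaks ties */
};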

On real customer dir sizes and file names we saw roughly 3x create rate
improvements from being able to create more entries in leaf blocks
within a commit.

Signed-off-by: Zach Brown <zab@versity.com>
2020-08-26 14:39:12 -07:00
Zach Brown
0a47e8f936 Revert "scoutfs: add block visited bit"
The radix allocator no longer uses the block visited bit because it
maintains its own much richer private per-block data stored off the priv
pointer.

Signed-off-by: Zach Brown <zab@versity.com>

This reverts commit 294b6d1f79e6d00ba60e26960c764d10c7f4b8a5.
2020-08-26 14:39:12 -07:00
Zach Brown
3a82090ab1 scoutfs: have per-fs inode nr allocators
We had previously seen lock contention between mounts that were either
resolving paths by looking up entries in directories or writing xattrs
in file inodes as they did archiving work.

The previous attempt to avoid this contention was to give each directory
its own inode number allocator which ensured that inodes created for
entries in the directory wouldn't share lock groups with inodes in other
directories.

But this creates the problem of operating on few files per lock for
reasonably small directories.  It also creates more server commits as
each new directory gets its inode allocation reservation.

The fix is to have separate mount-wide allocators for directories and
for everything else.  This puts directories and files in separate groups
and locks, regardless of directory population.
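
The selection itself is trivial; a sketch with made-up names:

#include <linux/fs.h>

struct ino_alloc_sketch {
        u64 next_ino;
        u64 nr_left;            /* remaining from the server reservation */
};

struct sbi_sketch {
        struct ino_alloc_sketch dir_ino_alloc;  /* directories */
        struct ino_alloc_sketch ino_alloc;      /* everything else */
};

/* directories and other files draw from separate mount-wide pools */
static struct ino_alloc_sketch *pick_ino_alloc_sketch(struct sbi_sketch *sbi,
                                                      umode_t mode)
{
        return S_ISDIR(mode) ? &sbi->dir_ino_alloc : &sbi->ino_alloc;
}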

Signed-off-by: Zach Brown <zab@versity.com>
2020-08-26 14:39:12 -07:00
Zach Brown
4d0b78f5cb scoutfs: add counters for server commits
Add some counters for server commits.

Signed-off-by: Zach Brown <zab@versity.com>
2020-08-26 14:39:12 -07:00
Zach Brown
e6ae397d12 Revert "scoutfs: switch block cache to rbtree"
We had switched away from the radix_tree because we were adding a
_block_move call which couldn't fail.  We no longer need that call, so
we can go back to storing cached blocks in the radix tree which can use
RCU lookups.

This revert has some conflict resolution around recent commits to add
the IO_BUSY block flag and the switch to _LG_ blocks.

This reverts commit 10205a5670dd96af350cf481a3336817871a9a5b.

Signed-off-by: Zach Brown <zab@versity.com>
2020-08-26 14:39:12 -07:00
Zach Brown
e5f5ee2679 Revert "scoutfs: add scoutfs_block_move"
We added _block_move for the radix allocator, but it no longer needs it.

This reverts commit 6bb0726689981eb9699296ae2cb4c8599add5b90.
2020-08-26 14:39:12 -07:00
Zach Brown
8fe683dab8 scoutfs: cow dirty radix blocks instead of moving
The radix allocator has to be careful to not get lost in recursion
trying to allocate metadata blocks for its dirty radix blocks while
allocating metadata blocks for others.

The first pass had used path data structures to record the references to
all the blocks we'd need to modify to reflect the frees and allocations
performed while dirtying radix blocks.  Once it had all the path blocks
it moved the old clean blocks into new dirty locations so that the
dirtying couldn't fail.

This had two very bad performance implications.  First, it meant that
trying to read clean versions of dirtied trees would always read the old
blocks again because their clean version had been moved to the dirty
version.  Typically this wouldn't happen but the server does exactly
this every time it tries to merge freed blocks back into its avail
allocator.  This created a significant IO load on the server.  Secondly,
that block cache move not being allowed to fail motivated us to move to
a locked rbtree for the block cache instead of the lockless rcu
radix_tree.

This changes the recursion avoidance to use per-block private metadata
to track every block that we allocate and cow rather than move.  Each
dirty block knows its parent ref and the blknos it would clear and set.
If dirtying fails we can walk back through all the blocks we dirty and
restore their original references before dropping all the dirty blocks
and returning an error.  This lets us get rid of the path structure
entirely and results in a much cleaner system.

This change meant tracking free blocks without clearing them as they're
used to satisfy dirty block allocations.  The change now has a cursor
that walks the avail metadata tree without modifying it.  While building
this it became clear that tracking the first set bits of refs doesn't
provide any value if we're always searching from a cursor.  The cursor
ends up providing the same value of avoiding constantly searching empty
initial bits and refs.  Maintaining the first metadata was just
overhead.

Signed-off-by: Zach Brown <zab@versity.com>
2020-08-26 14:39:12 -07:00
Zach Brown
6d7b8233c6 scoutfs: add radix merge retry counter
Signed-off-by: Zach Brown <zab@versity.com>
2020-08-26 14:39:12 -07:00
Zach Brown
26ccaca80b scoutfs: add commit written counter
Signed-off-by: Zach Brown <zab@versity.com>
2020-08-26 14:39:12 -07:00
Zach Brown
ca8abeebb1 scoutfs: check fs root in forest hint
The forst code has a hint call to gives iterators a place to start
reading from before they acquire locks.  It was checking all the log
trees but it wasn't checking the main fs tree.  This happened to be OK
today because we're not yet merging items from the log trees into the
main fs tree, but we don't want to miss them once we do start merging
the trees.

Signed-off-by: Zach Brown <zab@versity.com>
2020-08-26 14:39:12 -07:00
Zach Brown
b7943c5412 scoutfs: avoid reading forest roots with block IO
The forest item operations were reading the super block to find the
roots that they should read items from.

This was easiest to implement to start, but it is too expensive.  We
have to find the roots for every newly acquired lock and every call to
walk the inode seq indexes.

To avoid all these reads we first send the current stable versions of
the fs and logs btrees roots along with root grants.  Then we add a net
command to get the current stable roots from the server.  This is used
to refresh the roots if stale blocks are encountered and on the seq
index queries.

Signed-off-by: Zach Brown <zab@versity.com>
2020-08-26 14:39:12 -07:00
Zach Brown
304dbbbafa scoutfs: merge partial allocator blocks
The server fills radix allocators for the client to consume while
allocating during a transaction.  The radix merge function used to move
an entire radix block at a time.  With larger blocks this becomes much
too coarse and can move way too much in one call.

This moves allocator bits a word at a time and more precisely moves the
amount that the caller asked for.
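
A hedged sketch of moving bits at word granularity while respecting the
caller's count (hypothetical helper; the radix allocator is later
replaced by extent allocators entirely):

#include <linux/bitops.h>
#include <linux/types.h>

/*
 * Move up to 'wanted' set bits from *src to *dst within one 64-bit
 * word.  Returns how many bits were actually moved.
 */
static u64 move_word_bits_sketch(u64 *src, u64 *dst, u64 wanted)
{
        u64 word = *src;
        u64 moved = 0;

        while (word && moved < wanted) {
                unsigned int bit = __ffs64(word);

                word &= ~(1ULL << bit);
                *dst |= 1ULL << bit;
                moved++;
        }

        *src = word;
        return moved;
}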

Signed-off-by: Zach Brown <zab@versity.com>
2020-08-26 14:39:12 -07:00
Zach Brown
177af7f746 scoutfs: use larger metadata blocks
Introduce different constants for small and large metadata block
sizes.

The small 4KB size is used for the super block, quorum blocks, and as
the granularity of file data block allocation.  The larger 64KB size is
used for the radix, btree, and forest bloom metadata block structures.

The bulk of this is obvious transitions from the old single constant to
the appropriate new constant.  But there are a few more involved
changes, though just barely.

The block crc calculation now needs the caller to pass in the size of
the block.  The radix function to return free bytes instead returns free
blocks and the caller is responsible for knowing how big its managed
blocks are.
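
A sketch of the split constants and a size-aware crc helper (names and
header layout are hypothetical):

#include <linux/crc32c.h>
#include <linux/types.h>

#define BLOCK_SM_SHIFT_SKETCH   12      /* 4KB: super, quorum, data grain */
#define BLOCK_SM_SIZE_SKETCH    (1 << BLOCK_SM_SHIFT_SKETCH)
#define BLOCK_LG_SHIFT_SKETCH   16      /* 64KB: radix, btree, bloom */
#define BLOCK_LG_SIZE_SKETCH    (1 << BLOCK_LG_SHIFT_SKETCH)

struct block_hdr_sketch {
        __le32 crc;
        /* rest of the header and block payload follow */
};

/* callers now say how big their block is instead of assuming one size */
static u32 block_crc_sketch(struct block_hdr_sketch *hdr, u32 block_size)
{
        return crc32c(~0, (char *)hdr + sizeof(hdr->crc),
                      block_size - sizeof(hdr->crc));
}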

Signed-off-by: Zach Brown <zab@versity.com>
2020-08-26 14:39:12 -07:00