Commit Graph

933 Commits

Andy Grover
bed33c7ffd Remove item accounting
Remove kmod/src/count.h
Remove scoutfs_trans_track_item()
Remove reserved/actual fields from scoutfs_reservation

Signed-off-by: Andy Grover <agrover@versity.com>
2021-01-20 17:01:08 -08:00
Zach Brown
d64dd89ead Fix item cache page memory corruption
The item cache page life cycle is tricky.  There are no proper page
reference counts, everything is done by nesting the page rwlock inside
item_cache_info rwlock.  The intent is that you can only reference pages
while you hold the rwlocks appropriately.  The per-cpu page references
are outside that locking regime so they add a reference count.  Now
there are reference counts for the main cache index reference and for
each per-cpu reference.

The end result of all this is that you can only reference pages outside
of locks if you're protected by references.

Lock invalidation messed this up by trying to add its right split page
to the lru after it was unlocked.  Its page reference wasn't protected
at this point.  Shrinking could be freeing that page, and so it could be
putting a freed page's memory back on the lru.

Shrinking had a little bug where it was using list_move to move an
initialized lru_head list_head.  It turns out to be harmless (list_del
will just follow pointers to itself and set itself as next and prev all
over again), but boy does it catch one's eye.  Let's remove all
confusion and drop the reference while holding the cinf->rwlock instead
of trying to optimize freeing outside locks.

Finally, the big one: inserting a read item after compacting the page to
make room was inserting through stale parent pointers into the old
pre-compacted page, rather than the new page that was swapped in by
compaction.  This left references to a freed page in the page rbtree and
hilarity ensued.

Signed-off-by: Zach Brown <zab@versity.com>
2021-01-20 09:02:29 -08:00
Andy Grover
d731c1577e Filesystem version instead of format hash check
Instead of hashing headers, define an interop version. Do not mount
superblocks that have a different version, either higher or lower.

Since this is pretty much the same as the format hash except it's a
constant, minimal code changes are needed.

Initial dev version is 0, with the intent that version will be bumped to
1 immediately prior to tagging initial release version.

Update README. Fix comments.

Add interop version to notes and modinfo.
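
As a rough sketch of the mount-time check (the field and constant names
here are illustrative, not necessarily the real ones in format.h):

    /* sketch only: refuse to mount any superblock whose interop version
     * differs from the one this module was built for, higher or lower */
    if (le64_to_cpu(super->version) != SCOUTFS_INTEROP_VERSION)
        return -EINVAL;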

Signed-off-by: Andy Grover <agrover@versity.com>
2021-01-15 10:53:00 -08:00
Zach Brown
3139d3ea68 Add move_blocks ioctl
Add a relatively constrained ioctl that moves extents between regular
files.  This is intended to be used by tasks which combine many existing
files into a much larger file without reading and writing all the file
contents.
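
A rough userspace sketch of combining files this way; the argument
struct and helper below are illustrative assumptions, the real ABI lives
in the scoutfs ioctl header:

    #include <sys/ioctl.h>
    #include <linux/types.h>

    /* illustrative only; see the scoutfs ioctl header for the real ABI */
    struct move_blocks_args {
        __u64 from_fd;      /* source regular file */
        __u64 from_off;     /* byte offset in the source */
        __u64 len;          /* bytes worth of extents to move */
        __u64 to_off;       /* byte offset in the destination */
    };

    /* append the contents of from_fd to the end of the large to_fd file
     * without reading or writing the file data itself;
     * SCOUTFS_IOC_MOVE_BLOCKS comes from the scoutfs ioctl header */
    static int append_by_moving(int to_fd, int from_fd, __u64 from_len,
                                __u64 to_size)
    {
        struct move_blocks_args args = {
            .from_fd  = from_fd,
            .from_off = 0,
            .len      = from_len,
            .to_off   = to_size,
        };

        return ioctl(to_fd, SCOUTFS_IOC_MOVE_BLOCKS, &args);
    }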

Signed-off-by: Zach Brown <zab@versity.com>
2021-01-14 13:42:22 -08:00
Zach Brown
4da3d47601 Move ALLOC_DETAIL ioctl definition
By convention we have the _IO* ioctl definition after the argument
structs, and ALLOC_DETAIL got it a bit wrong, so move it down.

Signed-off-by: Zach Brown <zab@versity.com>
2021-01-14 13:42:22 -08:00
Zach Brown
aa1b1fa34f Add util.h for kernel helpers
Add a little header for inline convenience functions.

Signed-off-by: Zach Brown <zab@versity.com>
2021-01-14 13:42:22 -08:00
Andy Grover
7cac1e7136 Merge pull request #1 from agrover/use-argp
Rework scoutfs command-line parsing
2021-01-13 11:14:08 -08:00
Andy Grover
2c5871c253 Change release ioctl to be denominated in bytes not blocks
This more closely matches stage ioctl and other conventions.

Also change release code to use offset/length nomenclature for consistency.

Signed-off-by: Andy Grover <agrover@versity.com>
2021-01-12 16:29:42 -08:00
Zach Brown
fc003a5038 Consistently sample data alloc total_len
With many concurrent writers we were seeing excessive commits forced
because the transaction thought the data allocator was running low.  It
was checking the raw total_len value in the data_avail alloc_root for
the number of free data blocks.  But this read wasn't locked, and
allocators could completely remove a large free extent and then
re-insert a slightly smaller free extent as they perform their
allocation.  The transaction could see a temporarily very small total_len
and trigger a commit.

Data allocations are serialized by a heavy mutex so we don't want to
have the reader try and use that to see a consistent total_len.  Instead
we create a data allocator run-time struct that has a consistent
total_len that is updated after all the extent items are manipulated.
This also gives us a place to put the caller's cached extent so that it
can be included in the total_len; previously it wasn't included in the
free total that the transaction saw.

The file data allocator can then initialize and use this struct instead
of its raw use of the root and cached extent.  Then the transaction can
sample its consistent total_len that reflects the root and cached
extent.
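
Roughly, the run-time struct looks something like this sketch (the
locking primitive and field names are illustrative):

    /* sketch: run-time data allocator with a consistent free total */
    struct data_alloc_info {
        spinlock_t lock;
        u64 total_len;  /* free blocks in the root plus the cached extent */
    };

    /* the allocator updates the total only after all the extent items and
     * the cached extent are back in a consistent state */
    static void data_alloc_update_total(struct data_alloc_info *dai, u64 total)
    {
        spin_lock(&dai->lock);
        dai->total_len = total;
        spin_unlock(&dai->lock);
    }

    /* the transaction samples a value that never reflects a half-finished
     * remove-then-reinsert of a free extent */
    static u64 data_alloc_total_len(struct data_alloc_info *dai)
    {
        u64 total;

        spin_lock(&dai->lock);
        total = dai->total_len;
        spin_unlock(&dai->lock);

        return total;
    }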

A subtle detail is that fallocate can't use _free_data to return an
allocated extent on error to the avail pool.  It instead frees into the
data_free pool like normal frees.  It doesn't really matter that this
could prematurely drain the avail pool because it's in an error path.

Signed-off-by: Zach Brown <zab@versity.com>
2021-01-06 09:25:32 -08:00
Zach Brown
1e0f8ee27a Finally change all 'ci' inode info ptrs to 'si'
Finally get rid of the last silly vestige of the ancient 'ci' name and
update the scoutfs_inode_info pointers to si.  This is just a global
search and replace, nothing functional changes.

Signed-off-by: Zach Brown <zab@versity.com>
2020-12-15 15:20:02 -08:00
Zach Brown
807ae11ee9 Protect per-inode extent items with extent_sem
Now that we have full precision extents a writer with i_mutex and a page
lock can be modifying large extent items which cover much of the
surrounding pages in the file.  Readers can be in a different page with
only the page lock and try to work with extent items as the writer is
deleting and creating them.

We add a per-inode rwsem which just protects file extent item
manipulation.  We try to acquire it as close to the item use as possible
in data.c which is the only place we work with file extent items.
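
The pattern is roughly the following sketch; the helper names are made
up, the real call sites are in data.c:

    /* in scoutfs_inode_info (sketch) */
    struct rw_semaphore extent_sem;    /* protects file extent items */

    /* readers, e.g. get_block filling in a mapping for a page */
    down_read(&si->extent_sem);
    ret = lookup_file_extent(inode, iblock, &ext);    /* hypothetical */
    up_read(&si->extent_sem);

    /* writers deleting and re-creating extent items, e.g. write or stage */
    down_write(&si->extent_sem);
    ret = modify_file_extents(inode, start, len);     /* hypothetical */
    up_write(&si->extent_sem);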

This stops rare read corruption we were seeing where get_block in a
reader was racing with extent item deletion in a stager at a further
offset in the file.

Signed-off-by: Zach Brown <zab@versity.com>
2020-12-15 11:56:50 -08:00
Zach Brown
7ca3672a67 Update repo README.md, remove from kmod
Move the main scoutfs README.md from the old kmod/ location into the top
of the new single repository.  We update the language and instructions
just a bit to reflect that we can checkout and build the module and
utilities from the single repo.

Signed-off-by: Zach Brown <zab@versity.com>
2020-12-07 10:39:20 -08:00
Zach Brown
aa6e210ac7 Fix kmod spec path in dist tarball
For some reason, the make dist rule in kmod/ put the spec file in a
scoutfs-$ver/ directory, instead of scoutfs-kmod-$ver/ like the rest of
the files, and unlike the scoutfs-utils-$ver/ directory that the utils
spec file goes into in the utils dist tarball.

This adds -kmod to the path for the spec file so that it matches the
rest of the kmod dist tarball.

Signed-off-by: Zach Brown <zab@versity.com>
2020-12-07 10:39:20 -08:00
Zach Brown
e2dfffcab9 scoutfs: search_xattrs name requires srch tag
The search_xattrs ioctl is only going to find entries for xattrs with
the .srch. tag, since only those create srch entries as they're created
and destroyed.  Export the xattr tag parsing so that the ioctl can return
-EINVAL for xattrs which don't have the scoutfs prefix and the .srch.
tag.

Signed-off-by: Zach Brown <zab@versity.com>
2020-12-03 09:58:35 -08:00
Zach Brown
f0ddf5ff04 scoutfs: search_xattrs returns each ino once
Hash collisions can lead to multiple xattr ids in an inode being found
for a given name hash value.  If this happens we only want to return the
inode number once.

Signed-off-by: Zach Brown <zab@versity.com>
2020-12-03 09:58:35 -08:00
Zach Brown
18aee0ebbd scoutfs: fix lost entries in resumed srch compact
Compacting very large srch files can use all of a given operation's
metadata allocator.  When this happens we record the position in the
srch files of the compaction in the pending item.

We could lose entries when this happens because the kway_next callback
would advance the srch file position as it read entries and put them in
the tournament tree leaves, not as it put them in the output file.  On
resume we'd continue from the entries that would next be read into the
tournament leaves, skipping the entries still sitting in the leaves.

This refactors the kway merge callbacks to differentiate between getting
entries at the position and advancing the positions.  We initialize the
tournament leaves by getting entries at the positions and only advance
the position as entries leave the tournament tree and are either stored
in the output srch files or are dropped.
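
Conceptually the callbacks split into "look at the entry at the
position" and "move the position forward", along these lines (names are
illustrative):

    /* sketch of the split k-way merge callbacks */
    struct kway_ops {
        /* copy out the entry at the current position without consuming it */
        int (*peek)(void *arg, int idx, struct scoutfs_srch_entry *sre);
        /* consume the entry, only called once it has left the tournament
         * tree and was stored in the output file or dropped */
        int (*advance)(void *arg, int idx);
    };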

Signed-off-by: Zach Brown <zab@versity.com>
2020-12-03 09:58:35 -08:00
Zach Brown
c35f1ff324 scoutfs: inc end when search xattrs retries
In the rare case that searching for xattrs only finds deletions within
its window it retries the search past the window.  The end entry is
inclusive and is the last entry that can be returned.  When retrying the
search we need to start from the entry after that to ensure forward
progress.

Signed-off-by: Zach Brown <zab@versity.com>
2020-12-03 09:58:35 -08:00
Zach Brown
6770a31683 scoutfs: consistently trim srch entry range
We have to limit the number of srch entries that we'll track while
performing a search for all the inodes that contain xattrs that match
the search hash value.

As we hit the limit on the number of entries to track we have to drop
entries.  As we drop entries we can't return any inodes for entries
past the dropped entries.  We were updating the end point of the search
as we dropped entries past the tracked set, but we weren't updating the
search end point if we dropped the last currently tracked entry.

And we were setting the end point to the dropped entry, not to the entry
before it.  This could lead us to spuriously returning deleted entries
if we drop the creation entry and then allow tracking its deletion
later.

This fixes both those problems.  We now properly set the end point to
just before the dropped entry for all entries that we drop.

Signed-off-by: Zach Brown <zab@versity.com>
2020-12-03 09:58:35 -08:00
Zach Brown
9395360324 scoutfs: add srch entry inc/dec
We're going to need to increment and decrement srch entries in coming
fixes.

Signed-off-by: Zach Brown <zab@versity.com>
2020-12-03 09:58:35 -08:00
Zach Brown
7c5823ad12 scoutfs: drop duplicate compacted srch entries
The k-way merge used by srch file compaction only dropped the second
entry in a pair of duplicate entries.  Duplicate entries are both
supposed to be removed so that entries for removed xattrs don't take up
space in the files.

This both drops the second entry and removes the first encoded entry.
As we encode entries we remember their starting offset and the previous
entry that they were encoded from.  When we hit a duplicate entry
we undo the encoding of the previous entry.
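
In outline the encoder only needs to remember where the previous entry
started, something like this sketch (helper names are made up):

    /* sketch: drop both halves of a duplicate pair within a block */
    if (have_prev && srch_entries_equal(&sre, &prev_sre)) {
        blk_off = prev_off;     /* un-encode the previous entry */
        have_prev = false;      /* and don't encode this one */
        continue;
    }

    prev_off = blk_off;         /* where this entry is about to land */
    prev_sre = sre;
    have_prev = true;
    blk_off += encode_entry(blk, blk_off, &sre);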

This only works within srch file blocks.  We can still have duplicate
entries that span blocks but that's unlikely and relatively harmless.

Signed-off-by: Zach Brown <zab@versity.com>
2020-12-03 09:58:35 -08:00
Zach Brown
560c91a0e4 scoutfs: fix binary search for sorted srch block
The search_xattrs ioctl looks for srch entries in srch files that map
the caller's hashed xattr name to inodes.  As it searches it maintains a
range of entries that it is looking for.  When it searches sorted srch
files for entries it first performs a binary search for the start of the
range and then iterates over the blocks until it reaches the end of its
range.

The binary search for the start of the range was a bit wrong.  If the
start of the range was less than all the blocks then the binary search
could wrap the left index, try to get a file block at a negative index,
and return an error for the search.

This is relatively hard to hit in practice.  You have to search for the
xattr name with the smallest hashed value and have a sorted srch file
that's just the right size so that blk offset 0 is the last block
compared in the binary search, which sets the right index to -1.  If
there are lots of xattrs, or sorted files of other lengths, the search
works fine and the bug isn't hit.

This fixes the binary search so that it specifically records the first
block offset that intersects with the range and tests that the left and
right offsets haven't been inverted.  Now that we're not breaking out of
the binary search loop we can more obviously put each block reference
that we get.
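
The corrected search has roughly this shape, a standalone sketch rather
than the actual srch.c code:

    /*
     * Sketch: find the first sorted block that could contain the start of
     * the range.  The default answer is block 0, so a start key smaller
     * than every block can never produce a negative block offset.
     */
    static u64 find_start_block(u64 start, u64 nr_blocks,
                                u64 (*first_key_in_blk)(u64 blk))
    {
        s64 left = 0;
        s64 right = nr_blocks - 1;
        u64 found = 0;

        while (left <= right) {
            s64 mid = left + ((right - left) >> 1);

            if (first_key_in_blk(mid) <= start) {
                found = mid;        /* candidate, keep looking right */
                left = mid + 1;
            } else {
                right = mid - 1;    /* may go negative, but is never used
                                     * as a block offset */
            }
        }

        return found;
    }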

Signed-off-by: Zach Brown <zab@versity.com>
2020-12-03 09:58:35 -08:00
Zach Brown
4647a6ccb2 scoutfs: fix srch btree iref puts
The srch code was putting btree item refs even when the btree ops hadn't
returned success.  That happens to be harmless, but refs only need to be
put when btree ops return success and have set the reference.

Signed-off-by: Zach Brown <zab@versity.com>
2020-12-03 09:58:35 -08:00
Andy Grover
1bef610416 scoutfs: Don't destroy sroot unless srch_search_xattrs() was called
Until then, sroot is uninitialized so it's not safe to call
destroy_rb_root().

Signed-off-by: Andy Grover <agrover@versity.com>
2020-12-03 09:02:31 -08:00
Zach Brown
9375b9d3b7 scoutfs: commit while enough meta for dirty items
Dirty items in a client transaction are stored in OS pages.  When the
transaction is committed each item is stored in its position in a dirty
btree block in the client's existing log btree.  Allocators are refilled
between transaction commits so a given commit must have sufficient meta
allocator space (avail blocks and unused freed entries) for all the
btree blocks that are dirtied.

The number of btree blocks that are written, and thus the number of cow
allocations and frees, depends on the number of blocks in the log btree
and the distribution of dirty items amongst those blocks.  In a typical
load items will be near each other and many dirty items in smaller
kernel pages will be stored in fewer larger btree blocks.

But with the right circumstances, the ratio of dirty pages to dirty
blocks can be much smaller.  With a very large directory and random
entry renames you can easily have 1 btree block dirtied for every page
of dirty items.

Our existing meta allocator fill targets and the number of dirty item
cache pages we allowed did not properly take this into account.  It was
possible (and, it turned out, relatively easy to test for with a huge
directory and random renames) to run out of meta avail
blocks while storing dirty items in dirtied btree blocks.

This rebalances our targets and thresholds to make it more likely that
we'll have enough allocator resources to commit dirty items.  Instead of
having an arbitrary limit on the number of dirty item cache pages, we
require that a given number of dirty item cache pages have a given
number of allocator blocks available.
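
The resulting check boils down to something like this sketch; the ratio
is illustrative, the real constants live in the transaction code:

    /* sketch: only keep accepting dirty item cache pages while the meta
     * allocator could absorb the btree blocks they might dirty */
    #define META_BLOCKS_PER_DIRTY_PAGE 2    /* illustrative */

    static bool can_dirty_another_page(u64 avail_meta_blocks,
                                       u64 dirty_item_pages)
    {
        return avail_meta_blocks >=
               (dirty_item_pages + 1) * META_BLOCKS_PER_DIRTY_PAGE;
    }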

We require a decent number of available blocks for each dirty page, so
we increase the server's target number of blocks to give the client so
that it can still build large transactions.

This code is conservative and should not be a problem in practice, but
it's theoretically possible to build a log btree and set of dirty items
that would dirty more blocks than this code assumes.  We will probably
revisit this as we add proper support for ENOSPC.

Signed-off-by: Zach Brown <zab@versity.com>
2020-12-02 09:25:13 -08:00
Zach Brown
ae286bf837 scoutfs: update srch _alloc_meta_low callers
The srch system checks that it has allocator space while deleting srch
files and while merging them and dirtying output blocks.  Update the
callers to check for the correct number of avail or freed blocks that
they need between each check.

Signed-off-by: Zach Brown <zab@versity.com>
2020-12-02 09:25:13 -08:00
Zach Brown
a5d9ac5514 scoutfs: rework scoutfs_alloc_meta_low, takes arg
Previously, scoutfs_alloc_meta_lo_thresh() returned true when only a
small static number of metadata blocks were either available to allocate or
had space for freeing.  This didn't make a lot of sense as the correct
number depends on how many allocations each caller will make during
their atomic transaction.

Rework the call to take an argument for the number of avail or freed
blocks available to test.  This first pass just uses the existing
number; we'll get to the callers.

Signed-off-by: Zach Brown <zab@versity.com>
2020-12-02 09:25:13 -08:00
Andy Grover
cf278f5fa0 scoutfs: Tidy some enum usage
Prefer named to anonymous enums. This helps readability a little.

Use enum as param type if possible (a couple spots).

Remove unused enum in lock_server.c.

Define enum spbm_flags using shift notation for consistency.

Rename get_file_block()'s "gfb" parameter to "flags" for consistency.

Signed-off-by: Andy Grover <agrover@versity.com>
2020-11-30 13:35:44 -08:00
Andy Grover
73333af364 scoutfs: Use enum for lock mode
Signed-off-by: Andy Grover <agrover@versity.com>
2020-11-30 13:35:44 -08:00
Zach Brown
2f3d1c395e scoutfs: show metadev_path in sysfs/mount_options
We forgot to add metadev_path to the options that are found in the
mount_options sysfs directory.

Signed-off-by: Zach Brown <zab@versity.com>
2020-11-24 14:02:02 -08:00
Zach Brown
222e5f1b9d scoutfs: convert endian in SCOUTFS_IS_META_BDEV
We missed that flags is le64.
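
The fix is just to convert before masking, along the lines of this
sketch (the flag name and struct layout here are assumed):

    /* before (sketch): masked the raw little-endian value */
    /*   ((sbi)->super.flags & SCOUTFS_FLAG_IS_META_BDEV)           */

    /* after (sketch): convert the le64 flags to cpu order first */
    #define SCOUTFS_IS_META_BDEV(sbi) \
        (!!(le64_to_cpu((sbi)->super.flags) & SCOUTFS_FLAG_IS_META_BDEV))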

Signed-off-by: Zach Brown <zab@versity.com>
2020-11-24 14:02:02 -08:00
Zach Brown
08eb75c508 scoutfs: update README.md for metadev_path
Update the README.md introduction to scoutfs to mention the need for and
use of metadata and data block devices.

Signed-off-by: Zach Brown <zab@versity.com>
2020-11-19 11:41:20 -08:00
Andy Grover
9f151fde92 scoutfs: Use separate block devices for metadata and data
Require that a second path to the metadata bdev be given via a mount option.

Verify that the superblock on the meta device matches the superblock
also written to the data device.  Change code as needed in super.c to
allow both to be read.  Remove the check for overlapping meta and data
blknos, since they are now on entirely separate bdevs.

Use meta_bdev for superblock, quorum, and block.c reads and writes.

Signed-off-by: Andy Grover <agrover@versity.com>
2020-11-19 11:41:20 -08:00
Zach Brown
ff532eba75 scoutfs: recover max lock write_version
Write locks are given an increasing version number as they're granted,
which makes its way into items in the log btrees and is used to find the
most recent version of an item.

The initialization of the lock server's next write_version for granted
locks dates back to the initial prototype of the forest of log btrees.
It is only initialized to zero as the module is loaded.  This means that
reloading the module, perhaps by rebooting, resets all the item versions
to 0 and can lead to newly written items being ignored in favour of
older existing items with greater versions from a previous mount.

To fix this we initialize the lock server's write_version to the
greatest of all the versions in items in log btrees.  We add a field to
the log_trees struct which records the greatest version which is
maintained as we write out items in transactions.  These are read by the
server as it starts.
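
In rough outline (the field names are illustrative):

    /* client commit path (sketch): track the greatest item version that
     * was written into this log btree, with vers the version of the item
     * being written out */
    if (vers > le64_to_cpu(lt->max_item_vers))
        lt->max_item_vers = cpu_to_le64(vers);

    /* server startup (sketch): never hand out a write_version at or below
     * anything already present in the log btrees */
    next_write_version = max(next_write_version,
                             le64_to_cpu(lt->max_item_vers) + 1);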

Then lock recovery needs to include the write_version so that the
lock_server can be sure to set the next write_version past the greatest
version in the currently granted locks.

Signed-off-by: Zach Brown <zab@versity.com>
2020-10-30 11:14:10 -07:00
Zach Brown
736d9d7df8 scoutfs: remove struct scoutfs_log_trees_val
The log_trees structs store the data that is used by client commits.
The primary struct is communicated over the wire so it includes the rid
and nr that identify the log.  The _val struct was stored in btree item
values and was missing the rid and nr because those were stored in the
item's key.

It's madness to duplicate the entire struct just to shave off those two
fields.  We can remove the _val struct and store the main struct in item
values, including the rid and nr.

Signed-off-by: Zach Brown <zab@versity.com>
2020-10-30 11:14:10 -07:00
Andy Grover
e6228ead73 scoutfs: Ensure padding in structs remains zeroed
Audit code for structs allocated on the stack without initialization, or
allocated with kmalloc() instead of kzalloc().

- avl.c: zero padding in avl_node on insert.
- btree.c: Verify item padding is zero, or WARN_ONCE.
- inode.c: scoutfs_inode contains scoutfs_timespecs, which have padding.
- net.c: zero pad in net header.
- net.h: scoutfs_net_addr has padding, zero it in scoutfs_addr_from_sin().
- xattr.c: scoutfs_xattr has padding, zero it.
- forest.c: item_root in forest_next_hint() appears to either be
    assigned-to or unused, so no need to zero it.
- key.h: Ensure padding is zeroed in scoutfs_key_set_{zeros,ones}
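
The fixes follow a couple of simple patterns, sketched here with
illustrative field names:

    /* stack structs with padding: zero the whole struct before filling it */
    struct scoutfs_key key;

    memset(&key, 0, sizeof(key));
    key.sk_zone = zone;             /* illustrative field */

    /* heap structs with padding: allocate zeroed memory from the start */
    xat = kzalloc(sizeof(struct scoutfs_xattr) + name_len + size, GFP_NOFS);
    if (!xat)
        return -ENOMEM;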

Signed-off-by: Andy Grover <agrover@versity.com>
2020-10-29 14:15:33 -07:00
Andy Grover
13438c8f5d scoutfs: Remove struct scoutfs_betimespec
Unused.

Signed-off-by: Andy Grover <agrover@versity.com>
2020-10-29 14:15:33 -07:00
Andy Grover
d9d9b65f14 scoutfs: remove __packed from all struct definitions
Instead, explicitly add padding fields, and adjust member ordering to
eliminate compiler-added padding between members, and at the end of the
struct (if possible: some structs end in a u8[0] array.)

This should prevent unaligned accesses. Not a big deal on x86_64, but
other archs like aarch64 really want this.
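
As an illustration of the pattern (not an actual scoutfs struct):

    /* members ordered so the compiler has nothing to insert between them,
     * with the tail padded out explicitly instead of relying on __packed */
    struct example_ondisk {
        __le64 ino;
        __le64 blkno;
        __le32 flags;
        __u8   type;
        __u8   __pad[3];
    };
    /* e.g. BUILD_BUG_ON(sizeof(struct example_ondisk) != 24) in an init
     * path catches any layout the compiler still disagrees with */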

Signed-off-by: Andy Grover <agrover@versity.com>
2020-10-29 14:15:33 -07:00
Andy Grover
5e1c8586cc scoutfs: ensure btree values end on 8-byte-alignment boundary
Round val_len up to BTREE_VALUE_ALIGN (8), to keep mid_free_len aligned.

Signed-off-by: Andy Grover <agrover@versity.com>
2020-10-29 14:15:33 -07:00
Andy Grover
68d7a2e2cb scoutfs: align items in item cache to 8 bytes
This ensures that structs which are internally 8-byte aligned remain so
when in the item cache.

16-byte alignment doesn't seem necessary, so just do 8.

Signed-off-by: Andy Grover <agrover@versity.com>
2020-10-29 14:15:33 -07:00
Andy Grover
87cb971630 scoutfs: fix hash compiler warnings
Signed-off-by: Andy Grover <agrover@versity.com>
2020-10-29 14:15:33 -07:00
Zach Brown
dc47ec65e4 scoutfs: remove btree value owner footer offset
We were using a trailing owner offset to iterate over btree item values
from the back of the block towards the front.  We did this to reclaim
fragmented free space in a block to satisfy an allocation instead of
having to split the block, which is expensive mostly because it has to
allocate and free metadata blocks.

In the before times, we used to compact items by sorting items by their
offset, moving them, and then sorting them by their keys again.  The
sorting by keys was expensive so we added these owner offsets to be able
to compact without sorting.

But the complexity of maintaining the owner metadata is not worth it.
We can avoid the expensive sorting by keys by allocating a temporary
array of item offsets and sorting only it by the value offset.  That's
nice and quick; it was the key comparisons that were expensive.  Then we
can remove the owner offset entirely, as well as the block header final
free region that compaction needed.
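
The comparison is then a cheap offset compare rather than a key compare,
roughly (assuming a sort_r()-style callback and illustrative field
names):

    /* sketch: sort a throwaway array of item indexes by value offset so
     * values can be repacked without re-sorting the items by key */
    static int cmp_ind_by_val_off(const void *a, const void *b,
                                  const void *priv)
    {
        const struct scoutfs_btree_block *bt = priv;
        u16 ia = *(const u16 *)a;
        u16 ib = *(const u16 *)b;

        return (int)le16_to_cpu(bt->items[ia].val_off) -
               (int)le16_to_cpu(bt->items[ib].val_off);
    }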

And we also don't compact as often in the modern era because we do the
bulk of our work in the item cache instead of in the btree, and we've
changed the split/merge/compaction heuristics to avoid constantly
splitting/merging/compacting when an item population happens to hover
right around a shared threshold.

Signed-off-by: Zach Brown <zab@versity.com>
2020-10-29 14:15:33 -07:00
Zach Brown
dbea353b92 scoutfs: bring back sort_priv
Bring back sort_priv; we have a need for sorting with a caller argument.

Signed-off-by: Zach Brown <zab@versity.com>
2020-10-29 14:15:33 -07:00
Zach Brown
2e7053497e scoutfs: remove free_*_blocks super fields
Remove the old superblock fields which were used to track free blocks
found in the radix allocators.  We now walk all the allocators when we
need to know the free totals, rather than trying to keep fields in sync.

Signed-off-by: Zach Brown <zab@versity.com>
2020-10-26 15:19:03 -07:00
Zach Brown
735c2c6905 scoutfs: fix btree split/join setting parent keys
Before the introduction of the AVL tree to sort btree items, the items
were sorted by sorting a small packed array of offsets.  The final
offset in that array pointed to the item in the block with the greatest
key.

With the move to sorting items in an AVL tree by nodes embedded in item
structs, we now don't have the array of offsets and instead have a dense
array of items.  Creation and deletion of items always works with the
final item in the array.

last_item() used to return the item with the greatest key by returning
the item pointed to by the final entry in the sorted offset array.
After the change it returned the final entry in the item array, which is
what creation and deletion want, but that is no longer the item with the
greatest key.

But splitting and joining still used last_item() to find the item in the
block with the greatest key for updating references to blocks in
parents.  Since the introduction of the AVL tree, splitting and joining
have been corrupting the tree by setting parent block reference keys to
whatever item happened to be at the end of the array, not the item with
the greatest key.
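
One way to express the fix: a helper that walks to the right-most node
of the AVL tree gives split and join the true greatest item (the AVL
accessor names here are illustrative):

    /* sketch: the item with the greatest key is the right-most AVL node */
    static struct scoutfs_btree_item *
    greatest_item(struct scoutfs_btree_block *bt)
    {
        struct scoutfs_avl_node *node = avl_root(&bt->item_root); /* illustrative */

        if (!node)
            return NULL;

        while (avl_right(node))             /* illustrative accessor */
            node = avl_right(node);

        return container_of(node, struct scoutfs_btree_item, node);
    }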

The extent code recently pushed hard enough to hit this by working with
relatively random extent items in the core allocation btrees.
Eventually the parent block reference keys got out of sync and we'd fail
to find items by descending into the wrong children when looking for
them.  Extent deletion hit this during allocation, returned -ENOENT, and
the allocator turned that into -ENOSPC.

With this fixed we can repeatedly create and delete millions of files with
heavily fragmented extents in a tiny metadata device.  Eventually it
actually runs out of space instead of spuriously returning ENOSPC in a
matter of minutes.

Signed-off-by: Zach Brown <zab@versity.com>
2020-10-26 15:19:03 -07:00
Zach Brown
a848477e64 scoutfs: remove unused packed extents
We use full data extent items now, we don't need the packed extent
structures.

Signed-off-by: Zach Brown <zab@versity.com>
2020-10-26 15:19:03 -07:00
Zach Brown
b094b18618 scoutfs: compact fewer srch files each time
With the introduction of incremental srch file compaction we added some
fields to the srch_compact struct to record the position of compaction
in each file.  This increased the size of the struct past the limit the
btree places on the size of item values.

We decrease the number of files per compaction from 8 to 4 to cut the
size of the srch_compact struct in half.  This compacts twice as often,
but still relatively infrequently, and it uses half the space for srch
files waiting to hit the compaction threshold.

Signed-off-by: Zach Brown <zab@versity.com>
2020-10-26 15:19:03 -07:00
Zach Brown
7a3749d591 scoutfs: incremental srch compaction
Previously the srch compaction work would output the entire compacted
file and delete the input files in one atomic commit.  The server would
send the input files and an allocator to the client, and the client
would send back an output file and an allocator that included the
deletion of the input files.  The server would merge in the allocator
and replace the input file items with the output file item.

Doing it this way required giving an enormous allocation pool to the
client in a radix, which would deal with recursive operations
(allocating from and freeing to the radix that is being modified).  We
no longer have the radix allocator, and we use single block avail/free
lists instead of recursively modifying the btrees with free extent
items.  The compaction RPC needs to work with a finite amount of
allocator resources that can be stored in an alloc list block.

The compaction work now does a fixed amount of work and a compaction
operation spans multiple work iterations.

A single compaction struct is now sent between the client and server in
the get_compact and commit_compact messages.  The client records any
partial progress in the struct.  The server writes that position into
PENDING items.  It first searches for pending items to give to clients
before searching for files to start a new compaction operation.

The compact struct has flags to indicate whether the output file is
being written or the input files are being deleted.  The server manages
the flags and sets the input file deletion flag only once the result of
the compaction has been reflected in the btree items which record srch
files.

We added the progress fields to the compaction struct, making it even
bigger than it already was, so we take the time to allocate them rather
than declaring them on the stack.

It's worth mentioning that each operation now taking a reasonably
bounded amount of time will make it feasible to decide that it has
failed and needs to be fenced.

Signed-off-by: Zach Brown <zab@versity.com>
2020-10-26 15:19:03 -07:00
Zach Brown
d589881855 scoutfs: add tot m/d device blocks to statfs_more
The total_{meta,data}_blocks scoutfs_super_block fields initialized by
mkfs aren't visible to userspace anywhere.  Add them to statfs_more so
that tools can get the totals (and use them for df, in this particular
case).

Signed-off-by: Zach Brown <zab@versity.com>
2020-10-26 15:19:03 -07:00
Zach Brown
2073a672a0 scoutfs: remove unused statfs RPC
Remove the statfs RPC from the client and server now that we're using
allocator iteration to calculate free blocks.

Signed-off-by: Zach Brown <zab@versity.com>
2020-10-26 15:19:03 -07:00
Zach Brown
33374d8fe6 scoutfs: get statfs free blocks with alloc_foreach
Use alloc_foreach to count the free blocks in all the allocators instead
of sending an RPC to the server.  We cache the results so that constant
df calls don't generate a constant stream of IO.
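
The caching is just a timestamped pair of counts, roughly (names and the
expiry interval are illustrative):

    /* sketch: reuse a recent count instead of re-walking every allocator */
    struct free_blocks_cache {
        struct mutex mutex;
        unsigned long expires;          /* jiffies */
        u64 meta_free;
        u64 data_free;
    };

    static int cached_free_blocks(struct super_block *sb,
                                  struct free_blocks_cache *c,
                                  u64 *meta_free, u64 *data_free)
    {
        int ret = 0;

        mutex_lock(&c->mutex);
        if (time_after(jiffies, c->expires)) {
            /* hypothetical wrapper around the alloc_foreach walk */
            ret = count_free_blocks(sb, &c->meta_free, &c->data_free);
            if (ret == 0)
                c->expires = jiffies + 10 * HZ;     /* illustrative */
        }
        *meta_free = c->meta_free;
        *data_free = c->data_free;
        mutex_unlock(&c->mutex);

        return ret;
    }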

Signed-off-by: Zach Brown <zab@versity.com>
2020-10-26 15:19:03 -07:00