Update for cli args and options changes. Reorder subcommands to match
scoutfs built-in help.
Consistent ScoutFS capitalization.
Tighten up some descriptions and verbiage for consistency and omit
descriptions of internals in a few spots.
Add SEE ALSO for blockdev(8) and wipefs(8).
Signed-off-by: Andy Grover <agrover@versity.com>
Make it static and then use it for both argp_parse and
cmd_register_argp.
Split commands into five groups to make their purpose easier to
understand.
Mention that each command has its own help text, and that we go out of
our way to spare the user from having to give the fs path.
Signed-off-by: Andy Grover <agrover@versity.com>
This has some fancy parsing going on, and I decided to just leave it
in the main function instead of going to the effort to move it all
to the parsing function.
Signed-off-by: Andy Grover <agrover@versity.com>
Support max-meta-size and max-data-size using KMGTP units with rounding.
Detect other fs signatures using blkid library.
Detect ScoutFS super using magic value.
Move read_block() from print.c into util.c since blkid also needs it.
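As a rough illustration, KMGTP unit parsing with binary multiples could
look like the sketch below; parse_size() and its exact behavior are
hypothetical, not the actual mkfs code.

```c
#include <stdlib.h>
#include <string.h>
#include <errno.h>

/* Parse a number with an optional K/M/G/T/P binary-unit suffix.
 * Illustrative only; returns 0 on success, -EINVAL on bad input. */
static int parse_size(const char *str, unsigned long long *ret)
{
	static const char units[] = "KMGTP";
	unsigned long long val;
	const char *u;
	char *endp;

	errno = 0;
	val = strtoull(str, &endp, 0);
	if (errno || endp == str)
		return -EINVAL;

	if (*endp != '\0') {
		u = strchr(units, *endp);
		if (!u || endp[1] != '\0')
			return -EINVAL;
		val <<= 10 * (u - units + 1);	/* K=2^10 .. P=2^50 */
	}

	*ret = val;
	return 0;
}
```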
Signed-off-by: Andy Grover <agrover@versity.com>
Print a warning when printing a data device, since the user probably
wanted the metadata device.
Change read_block() to return an error value. Otherwise there are
confusing ENOMEM messages when pread() fails, e.g. when trying to print
/dev/null.
Signed-off-by: Andy Grover <agrover@versity.com>
Make offset and length optional. Allow size units (KMGTP) to be used
for offset/length.
release: Since off/len are no longer given in 4k blocks, round offset
down and length up to 4KiB. Emit a message if rounding occurs.
Make version a required option.
stage: change the argument ordering to src (the archive file) then dest
(the staged file).
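The offset/length rounding described above amounts to aligning the
region outward to 4KiB boundaries; a minimal sketch (names
hypothetical):

```c
#include <stdint.h>

#define BLOCK_SIZE 4096ULL

/* Round offset down and length up so the 4KiB-aligned region still
 * covers the original byte range.  Illustrative only. */
static void round_region(uint64_t *off, uint64_t *len)
{
	uint64_t start = *off & ~(BLOCK_SIZE - 1);
	uint64_t end = (*off + *len + BLOCK_SIZE - 1) & ~(BLOCK_SIZE - 1);

	*off = start;
	*len = end - start;
}
```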
Signed-off-by: Andy Grover <agrover@versity.com>
Implement a fallback mechanism for opening paths to a filesystem. If
explicitly given, use that. If env var is set, use that. Otherwise, use
current working directory.
Use wordexp to expand ~, $HOME, etc.
Signed-off-by: Andy Grover <agrover@versity.com>
Finally get rid of the last silly vestige of the ancient 'ci' name and
update the scoutfs_inode_info pointers to si. This is just a global
search and replace; no functional change.
Signed-off-by: Zach Brown <zab@versity.com>
Add a test which stages a file in multiple parts while a long-lived
process is blocking on offline extents trying to compare the file to the
known contents.
Signed-off-by: Zach Brown <zab@versity.com>
Now that we have full precision extents a writer with i_mutex and a page
lock can be modifying large extent items which cover much of the
surrounding pages in the file. Readers can be in a different page with
only the page lock and try to work with extent items as the writer is
deleting and creating them.
We add a per-inode rwsem which just protects file extent item
manipulation. We try to acquire it as close to the item use as possible
in data.c which is the only place we work with file extent items.
This stops rare read corruption we were seeing where get_block in a
reader was racing with extent item deletion in a stager at a further
offset in the file.
Signed-off-by: Zach Brown <zab@versity.com>
Move the main scoutfs README.md from the old kmod/ location into the top
of the new single repository. We update the language and instructions
just a bit to reflect that we can check out and build the module and
utilities from the single repo.
Signed-off-by: Zach Brown <zab@versity.com>
The README in tests/ had gone a bit stale. While it was originally
written to be a README.md displayed in the github repo, we can
still use it in place as a quick introduction to the tests.
Signed-off-by: Zach Brown <zab@versity.com>
When we had three repos the run-tests harness helped by checking
branches in kmod and utils repos to build and test. Now that we have
one repo we can just use the sibling kmod/ and utils/ dirs in the repo.
Signed-off-by: Zach Brown <zab@versity.com>
Now that we're in one repo, utils can get its format and ioctl headers
from the authoritative kmod files. When we're building a dist tarball
we copy the files over so that the build from the dist tarball can use
them.
Signed-off-by: Zach Brown <zab@versity.com>
For some reason, the make dist rule in kmod/ put the spec file in a
scoutfs-$ver/ directory, instead of scoutfs-kmod-$ver/ like the rest of
the files and instead of scoutfs-utils-$ver/ that the spec file for
utils is put in the utils dist tarball.
This adds -kmod to the path for the spec file so that it matches the
rest of the kmod dist tarball.
Signed-off-by: Zach Brown <zab@versity.com>
Add a trivial top-level Makefile that just runs Make in all the subdirs.
This will probably expand over time.
Signed-off-by: Zach Brown <zab@versity.com>
Add a utility that mimics our search_xattrs ioctl with directory entry
walking and fgetxattr as efficiently as it can so we can use it to test
large file populations.
Signed-off-by: Zach Brown <zab@versity.com>
The search_xattrs ioctl is only going to find entries for xattrs with
the .srch. tag which create srch entries as they're created and
destroyed. Export the xattr tag parsing so that the ioctl can return
-EINVAL for xattrs which don't have the scoutfs prefix and the .srch.
tag.
Signed-off-by: Zach Brown <zab@versity.com>
Hash collisions can lead to multiple xattr ids in an inode being found
for a given name hash value. If this happens we only want to return the
inode number once.
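The dedup amounts to remembering the last inode emitted and skipping
repeats; a minimal sketch assuming entries arrive sorted by inode
number (names illustrative):

```c
#include <stddef.h>
#include <stdint.h>

/* Copy inode numbers to out[], emitting each inode only once even
 * when hash collisions produce several entries for it. */
static size_t unique_inos(const uint64_t *inos, size_t nr, uint64_t *out)
{
	size_t i, n = 0;

	for (i = 0; i < nr; i++) {
		if (n == 0 || out[n - 1] != inos[i])
			out[n++] = inos[i];
	}
	return n;
}
```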
Signed-off-by: Zach Brown <zab@versity.com>
Compacting very large srch files can use all of a given operation's
metadata allocator. When this happens we record the compaction's
position in the srch files in the pending item.
We could lose entries when this happens because the kway_next callback
would advance the srch file position as it read entries and put them in
the tournament tree leaves, not as it put them in the output file. We'd
continue from the entries that were next to go in the tournament leaves,
not from what was in the leaves.
This refactors the kway merge callbacks to differentiate between getting
entries at the position and advancing the positions. We initialize the
tournament leaves by getting entries at the positions and only advance
the position as entries leave the tournament tree and are either stored
in the output srch files or are dropped.
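The get/advance split can be sketched with a minimal peek/advance
iterator over sorted streams; the tournament tree is elided and all
names are hypothetical:

```c
#include <stddef.h>

struct pos {
	const int *ents;
	size_t nr;
	size_t at;
};

/* Return the entry at the current position without consuming it. */
static const int *peek(struct pos *p)
{
	return p->at < p->nr ? &p->ents[p->at] : NULL;
}

/* Consume the current entry, only after it has been stored. */
static void advance(struct pos *p)
{
	p->at++;
}

/* Merge two sorted streams; a position only advances as its entry is
 * written to the output, mirroring the refactored callbacks. */
static size_t merge(struct pos *a, struct pos *b, int *out)
{
	size_t n = 0;

	for (;;) {
		const int *ea = peek(a);
		const int *eb = peek(b);

		if (!ea && !eb)
			break;
		if (!eb || (ea && *ea <= *eb)) {
			out[n++] = *ea;
			advance(a);
		} else {
			out[n++] = *eb;
			advance(b);
		}
	}
	return n;
}
```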
Signed-off-by: Zach Brown <zab@versity.com>
In the rare case that searching for xattrs only finds deletions within
its window it retries the search past the window. The end entry is
inclusive and is the last entry that can be returned. When retrying the
search we need to start from the entry after that to ensure forward
progress.
Signed-off-by: Zach Brown <zab@versity.com>
We have to limit the number of srch entries that we'll track while
performing a search for all the inodes that contain xattrs that match
the search hash value.
As we hit the limit on the number of entries to track we have to drop
entries. As we drop entries we can't return any inodes for entries
past the dropped entries. We were updating the end point of the search
as we dropped entries past the tracked set, but we weren't updating the
search end point if we dropped the last currently tracked entry.
And we were setting the end point to the dropped entry, not to the entry
before it. This could lead us to spuriously returning deleted entries
if we drop the creation entry and then allow tracking its deletion
later.
This fixes both those problems. We now properly set the end point to
just before the dropped entry for all entries that we drop.
Signed-off-by: Zach Brown <zab@versity.com>
The k-way merge used by srch file compaction only dropped the second
entry in a pair of duplicate entries. Duplicate entries are both
supposed to be removed so that entries for removed xattrs don't take up
space in the files.
This both drops the second entry and removes the first encoded entry.
As we encode entries we remember their starting offset and the previous
entry that they were encoded from. When we hit a duplicate entry
we undo the encoding of the previous entry.
This only works within srch file blocks. We can still have duplicate
entries that span blocks, but that's unlikely and relatively harmless.
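Dropping both halves of a duplicate pair can be sketched over a flat
output array: remember where the previous entry was encoded and rewind
it when its duplicate arrives (illustrative only, real entries are
variable-length encoded):

```c
#include <stddef.h>

/* Append entries to out[], removing both members of an adjacent
 * duplicate pair by undoing the previously encoded entry. */
static size_t merge_drop_dups(const int *ents, size_t nr, int *out)
{
	size_t n = 0;
	size_t i = 0;

	while (i < nr) {
		if (n > 0 && out[n - 1] == ents[i]) {
			n--;		/* undo the previous encoding */
			i++;		/* and drop the duplicate too */
		} else {
			out[n++] = ents[i++];
		}
	}
	return n;
}
```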
Signed-off-by: Zach Brown <zab@versity.com>
The search_xattrs ioctl looks for srch entries in srch files that map
the caller's hashed xattr name to inodes. As it searches it maintains a
range of entries that it is looking for. When it searches sorted srch
files for entries it first performs a binary search for the start of the
range and then iterates over the blocks until it reaches the end of its
range.
The binary search for the start of the range was a bit wrong. If the
start of the range was less than all the blocks then the binary search
could wrap the left index, try to get a file block at a negative index,
and return an error for the search.
This is relatively hard to hit in practice. You have to search for the
xattr name with the smallest hashed value and have a sorted srch file
that's just the right size so that blk offset 0 is the last block
compared in the binary search, which sets the right index to -1. With
lots of xattrs, or sorted files of a different length, the search works
fine.
This fixes the binary search so that it specifically records the first
block offset that intersects with the range and tests that the left and
right offsets haven't been inverted. Now that we're not breaking out of
the binary search loop we can more obviously put each block reference
that we get.
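The shape of the corrected search can be sketched as a lower-bound
binary search that records the first intersecting block and terminates
when the indexes cross, rather than wrapping; types and names here are
hypothetical:

```c
#include <stdint.h>

/* Find the first block whose greatest key is >= start, i.e. the first
 * block that can intersect the search range.  last_keys[] is sorted;
 * returns -1 when no block intersects.  Never wraps below zero. */
static int64_t first_block(const uint64_t *last_keys, int64_t nr,
			   uint64_t start)
{
	int64_t left = 0;
	int64_t right = nr - 1;
	int64_t found = -1;

	while (left <= right) {
		int64_t mid = left + (right - left) / 2;

		if (last_keys[mid] >= start) {
			found = mid;	/* candidate, keep looking left */
			right = mid - 1;
		} else {
			left = mid + 1;
		}
	}
	return found;
}
```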
Signed-off-by: Zach Brown <zab@versity.com>
The srch code was putting btree item refs even outside of success
paths. This is harmless, but refs only need to be put when btree ops
return success and have set the reference.
Dirty items in a client transaction are stored in OS pages. When the
transaction is committed each item is stored in its position in a dirty
btree block in the client's existing log btree. Allocators are refilled
between transaction commits so a given commit must have sufficient meta
allocator space (avail blocks and unused freed entries) for all the
btree blocks that are dirtied.
The number of btree blocks that are written, thus the number of cow
allocations and frees, depends on the number of blocks in the log btree
and the distribution of dirty items amongst those blocks. In a typical
load items will be near each other and many dirty items in smaller
kernel pages will be stored in fewer larger btree blocks.
But with the right circumstances, the ratio of dirty pages to dirty
blocks can be much smaller. With a very large directory and random
entry renames you can easily have 1 btree block dirtied for every page
of dirty items.
Our existing meta allocator fill targets and the number of dirty item
cache pages we allowed did not properly take this into account. It was
possible (and, it turned out, relatively easy to test for with a huge
directory and random renames) to run out of meta avail blocks while
storing dirty items in dirtied btree blocks.
This rebalances our targets and thresholds to make it more likely that
we'll have enough allocator resources to commit dirty items. Instead of
having an arbitrary limit on the number of dirty item cache pages, we
require that a given number of dirty item cache pages have a given
number of allocator blocks available.
We require a decent number of available blocks for each dirty page, so
we increase the server's target number of blocks to give the client so
that it can still build large transactions.
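The relationship can be expressed as a simple check: commit once the
dirty pages, at an assumed worst-case blocks-per-page ratio, can no
longer be covered by the remaining allocator blocks. The ratio and
names below are hypothetical, not the tuned values.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed worst-case btree blocks dirtied per dirty item cache page. */
#define BLOCKS_PER_DIRTY_PAGE 2

/* True when the commit can no longer be guaranteed to have enough
 * meta allocator blocks for the pages' dirtied btree blocks. */
static bool must_commit(uint64_t dirty_pages, uint64_t avail_blocks)
{
	return dirty_pages * BLOCKS_PER_DIRTY_PAGE >= avail_blocks;
}
```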
This code is conservative and should not be a problem in practice, but
it's theoretically possible to build a log btree and set of dirty items
that would dirty more blocks than this code assumes. We will probably
revisit this as we add proper support for ENOSPC.
Signed-off-by: Zach Brown <zab@versity.com>
The srch system checks that it has allocator space while deleting srch
files and while merging them and dirtying output blocks. Update the
callers to check for the correct number of avail or freed blocks needed
between each check.
Signed-off-by: Zach Brown <zab@versity.com>
Previously, scoutfs_alloc_meta_lo_thresh() returned true when a small
static number of metadata blocks were either available to allocate or
had space for freeing. This didn't make a lot of sense as the correct
number depends on how many allocations each caller will make during
their atomic transaction.
Rework the call to take an argument for the number of avail or freed
blocks to test for. This first pass just uses the existing number;
we'll update the callers next.
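The reworked interface takes the caller's required count instead of a
static constant; roughly (structure and names illustrative):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical allocator counters: blocks available to allocate and
 * remaining room for recording freed blocks. */
struct alloc_counts {
	uint64_t avail;
	uint64_t freed;
};

/* True when fewer than nr blocks can be allocated or fewer than nr
 * frees can still be recorded, so the caller must commit first.
 * A sketch of the threshold check taking a per-caller count. */
static bool meta_lo_thresh(const struct alloc_counts *ac, uint64_t nr)
{
	return ac->avail < nr || ac->freed < nr;
}
```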
Signed-off-by: Zach Brown <zab@versity.com>
Add a test that randomly renames entries in a single large directory.
This has caught bugs in the reservation of allocator resources for
client transactions.
Signed-off-by: Zach Brown <zab@versity.com>
Prefer named to anonymous enums. This helps readability a little.
Use enum as param type if possible (a couple spots).
Remove unused enum in lock_server.c.
Define enum spbm_flags using shift notation for consistency.
Rename get_file_block()'s "gfb" parameter to "flags" for consistency.
Signed-off-by: Andy Grover <agrover@versity.com>