Add a new distinguishable return value (ENOBUFS) from allocator for if
the transaction cannot alloc space. This doesn't mean the filesystem is
full -- opening a new transaction may result in forward progress.
Alter fallocate and get_blocks code to check for this err val and retry
with a new transaction. Handling actual ENOSPC can still happen, of
course.
Add counter called "alloc_trans_retry" and increment it from both spots.
Signed-off-by: Andy Grover <agrover@versity.com>
[zab@versity.com: fixed up write_begin error paths]
The item cache page life cycle is tricky. There are no proper page
reference counts, everthing is done by nesting the page rwlock inside
item_cache_info rwlock. The intent is that you can only reference pages
while you hold the rwlocks appropriately. The per-cpu page references
are outside that locking regime so they add a reference count. Now
there are reference counts for the main cache index reference and for
each per-cpu reference.
The end result of all this is that you can only reference pages outside
of locks if you're protected by references.
Lock invalidation messed this up by trying to add its right split page
to the lru after it was unlocked. Its page reference wasn't protected
at this point. Shrinking could be freeing that page, and so it could be
putting a freed page's memory back on the lru.
Shrinking had a little bug that it was using list_move to move an
initialized lru_head list_head. It turns out to be harmless (list_del
will just follow pointers to itself and set itself as next and prev all
over again), but boy does it catch one's eye. Let's remove all
confusion and drop the reference while holding the cinf->rwlock instead
of trying to optimize freeing outside locks.
Finally, the big one: inserting a read item after compacting the page to
make room was inserting into stale parent pointers into the old
pre-compacted page, rather than the new page that was swapped in by
compaction. This left references to a freed page in the page rbtree and
hilarity ensued.
Signed-off-by: Zach Brown <zab@versity.com>
Instead of hashing headers, define an interop version. Do not mount
superblocks that have a different version, either higher or lower.
Since this is pretty much the same as the format hash except it's a
constant, minimal code changes are needed.
Initial dev version is 0, with the intent that version will be bumped to
1 immediately prior to tagging initial release version.
Update README. Fix comments.
Add interop version to notes and modinfo.
Signed-off-by: Andy Grover <agrover@versity.com>
Add a relatively constrained ioctl that moves extents between regular
files. This is intended to be used by tasks which combine many existing
files into a much larger file without reading and writing all the file
contents.
Signed-off-by: Zach Brown <zab@versity.com>
By convention we have the _IO* ioctl definition after the argument
structs and ALLOC_DETAIL got it a bit wrong so move it down.
Signed-off-by: Zach Brown <zab@versity.com>
We were checking for the wrong magic value.
We now need to use -f when running mkfs in run-tests for things to work.
Signed-off-by: Andy Grover <agrover@versity.com>
This more closely matches stage ioctl and other conventions.
Also change release code to use offset/length nomenclature for consistency.
Signed-off-by: Andy Grover <agrover@versity.com>
Update for cli args and options changes. Reorder subcommands to match
scoutfs built-in help.
Consistent ScoutFS capitalization.
Tighten up some descriptions and verbiage for consistency and omit
descriptions of internals in a few spots.
Add SEE ALSO for blockdev(8) and wipefs(8).
Signed-off-by: Andy Grover <agrover@versity.com>
Make it static and then use it both for argp_parse as well as
cmd_register_argp.
Split commands into five groups, to help understanding of their
usefulness.
Mention that each command has its own help text, and that we are being
fancy to keep the user from having to give fs path.
Signed-off-by: Andy Grover <agrover@versity.com>
This has some fancy parsing going on, and I decided to just leave it
in the main function instead of going to the effort to move it all
to the parsing function.
Signed-off-by: Andy Grover <agrover@versity.com>
Support max-meta-size and max-data-size using KMGTP units with rounding.
Detect other fs signatures using blkid library.
Detect ScoutFS super using magic value.
Move read_block() from print.c into util.c since blkid also needs it.
Signed-off-by: Andy Grover <agrover@versity.com>
Print warning if printing a data dev, you probably wanted the meta dev.
Change read_block to return err value. Otherwise there are confusing
ENOMEM messages when pread() fails. e.g. try to print /dev/null.
Signed-off-by: Andy Grover <agrover@versity.com>
Make offset and length optional. Allow size units (KMGTP) to be used
for offset/length.
release: Since off/len no longer given in 4k blocks, round offset and
length to to 4KiB, down and up respectively. Emit a message if rounding
occurs.
Make version a required option.
stage: change ordering to src (the archive file) then the dest (the
staged file).
Signed-off-by: Andy Grover <agrover@versity.com>
With many concurrent writers we were seeing excessive commits forced
because it thought the data allocator was running low. The transaction
was checking the raw total_len value in the data_avail alloc_root for
the number of free data blocks. But this read wasn't locked, and
allocators could completely remove a large free extent and then
re-insert a slightly smaller free extent as they perform their
alloction. The transaction could see a temporary very small total_len
and trigger a commit.
Data allocations are serialized by a heavy mutex so we don't want to
have the reader try and use that to see a consistent total_len. Instead
we create a data allocator run-time struct that has a consistent
total_len that is updated after all the extent items are manipulated.
This also gives us a place to put the caller's cached extent so that it
can be included in the total_len, previously it wasn't included in the
free total that the transaction saw.
The file data allocator can then initialize and use this struct instead
of its raw use of the root and cached extent. Then the transaction can
sample its consistent total_len that reflects the root and cached
extent.
A subtle detail is that fallocate can't use _free_data to return an
allocated extent on error to the avail pool. It instead frees into the
data_free pool like normal frees. It doesn't really matter that this
could prematurely drain the avail pool because it's in an error path.
Signed-off-by: Zach Brown <zab@versity.com>
Implement a fallback mechanism for opening paths to a filesystem. If
explicitly given, use that. If env var is set, use that. Otherwise, use
current working directory.
Use wordexp to expand ~, $HOME, etc.
Signed-off-by: Andy Grover <agrover@versity.com>
Finally get rid of the last silly vestige of the ancient 'ci' name and
update the scoutfs_inode_info pointers to si. This is just a global
search and replace, nothing functional changes.
Signed-off-by: Zach Brown <zab@versity.com>
Add a test which stages a file in multiple parts while a long-lived
process is blocking on offline extents trying to compare the file to the
known contents.
Signed-off-by: Zach Brown <zab@versity.com>
Now that we have full precision extents a writer with i_mutex and a page
lock can be modifying large extent items which cover much of the
surrounding pages in the file. Readers can be in a different page with
only the page lock and try to work with extent items as the writer is
deleting and creating them.
We add a per-inode rwsem which just protects file extent item
manipulation. We try to acquire it as close to the item use as possible
in data.c which is the only place we work with file extent items.
This stops rare read corruption we were seeing where get_block in a
reader was racing with extent item deletion in a stager at a further
offset in the file.
Signed-off-by: Zach Brown <zab@versity.com>
Move the main scoutfs README.md from the old kmod/ location into the top
of the new single repository. We update the language and instructions
just a bit to reflect that we can checkout and build the module and
utilities from the single repo.
Signed-off-by: Zach Brown <zab@versity.com>
The README in tests/ had gone a bit stale. While it was originally
written to be a README.md displayed in the github repo, we can
still use it in place as a quick introduction to the tests.
Signed-off-by: Zach Brown <zab@versity.com>
When we had three repos the run-tests harness helped by checking
branches in kmod and utils repos to build and test. Now that we have
one repo we can just use the sibling kmod/ and utils/ dirs in the repo.
Signed-off-by: Zach Brown <zab@versity.com>
Now that we're in one repo utils can get its format and ioctl headers
from the authoriative kmod files. When we're building a dist tarball
we copy the files over so that the build from the dist tarball can use
them.
Signed-off-by: Zach Brown <zab@versity.com>