Extend scoutfs_dir_add_next_linkref() to be able to return multiple
backrefs under the lock for each call and have it take an argument to
limit the number of backrefs that can be added and returned.
Its return code changes a bit: it now returns 1 on success instead of
0, so we have to be a little careful with callers that were expecting 0.
It still returns -ENOENT when no entries are found.
We break up its tracepoint into one that records each entry added and
one that records the result of each call.
This will be used by an ioctl to give callers just the entries that
point to an inode instead of assembling full paths from the root.
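As a rough illustration of the new calling convention, here is a
minimal userspace sketch. It is not the scoutfs implementation: the
demo_* names, the struct layout, and the argument list are all
assumptions made for the example. It only models "return 1 when entries
were added, -ENOENT when none were found, never add more than the
limit".

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a backref entry; not the real scoutfs struct. */
struct demo_backref {
	uint64_t dir_ino;
	uint64_t dir_pos;
};

/*
 * Model of the convention described above: copy up to 'limit' entries
 * starting at *next, return 1 if any were added and -ENOENT if none
 * were found.  The signature is an assumption for illustration.
 */
static int demo_add_next_linkrefs(const struct demo_backref *src,
				  size_t nr_src, size_t *next,
				  struct demo_backref *dst, size_t limit,
				  size_t *nr_added)
{
	size_t n = 0;

	while (*next < nr_src && n < limit)
		dst[n++] = src[(*next)++];

	*nr_added = n;
	return n ? 1 : -ENOENT;
}

/*
 * A caller that used to test for 0 now has to treat any positive
 * return as success and -ENOENT as the normal end of iteration.
 */
static long demo_count_all(const struct demo_backref *src, size_t nr_src,
			   size_t limit)
{
	struct demo_backref batch[8];
	size_t next = 0;
	size_t added;
	long total = 0;
	int ret;

	if (limit == 0 || limit > 8)
		return -EINVAL;

	for (;;) {
		ret = demo_add_next_linkrefs(src, nr_src, &next, batch,
					     limit, &added);
		if (ret == -ENOENT)
			break;
		if (ret < 0)
			return ret;
		total += (long)added;
	}

	return total;
}
```

The point of the sketch is the caller's loop: checking "ret == 0" for
success would never match under the new convention.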
Signed-off-by: Zach Brown <zab@versity.com>
Our open by handle functions didn't care that the inode wasn't
referenced and let tasks open unlinked inodes by number. This
interacted badly with the inode deletion mechanisms which required that
inodes couldn't be cached on other nodes after the transaction which
removed their final reference.
If a task did accidentally open a file by inode while it was being
deleted it could see the inode items in an inconsistent state and return
very confusing errors that look like corruption.
The fix is to give the handle iget callers a flag to tell iget to only
get the inode if it has a positive nlink. If iget sees that the inode
has been unlinked it returns -ENOENT.
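The check can be sketched in userspace as follows. The flag name
DEMO_IGET_LINKED, the demo_inode struct, and the table lookup are all
hypothetical; the real iget path works on cached kernel inodes and is
not shown in this message.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag: only return inodes that still have a link. */
#define DEMO_IGET_LINKED (1 << 0)

/* Hypothetical stand-in for an inode; only the fields the check needs. */
struct demo_inode {
	uint64_t ino;
	unsigned int nlink;
};

/*
 * Find 'ino' in a small table.  With DEMO_IGET_LINKED set, an inode
 * whose nlink has dropped to 0 is reported as -ENOENT, exactly as if
 * it did not exist.
 */
static int demo_iget(struct demo_inode *table, size_t nr, uint64_t ino,
		     int flags, struct demo_inode **ret)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		if (table[i].ino != ino)
			continue;
		if ((flags & DEMO_IGET_LINKED) && table[i].nlink == 0)
			return -ENOENT;
		*ret = &table[i];
		return 0;
	}

	return -ENOENT;
}
```

Callers that open by handle would pass the flag; internal callers that
legitimately need an unlinked inode (deletion, for example) would not.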
Signed-off-by: Zach Brown <zab@versity.com>
As subsystems were built I tended to use interruptible waits in the hope
that we'd let users break out of most waits.
The reality is that we have significant code paths that have trouble
unwinding. Final inode deletion during iput->evict in a task is a good
example. It's madness to have a pending signal turn an inode deletion
from an efficient inline operation to a deferred background orphan inode
scan deletion.
It also happens that golang built pre-emptive thread scheduling around
signals. Under load we see a surprising amount of signal spam and it
has created surprising errors in cases which would otherwise have been
fine.
This changes waits to expect that IOs (including network commands) will
complete reasonably promptly. We remove all interruptible waits with
the notable exception of breaking out of a pending mount. That requires
shuffling setup around a little bit so that the first network message we
wait for is the lock for getting the root inode.
Signed-off-by: Zach Brown <zab@versity.com>
Directory entries were the last items that had large variable length
keys because they stored the entry name in the key. We'd like to have
small fixed size keys so let's store dirents with small keys.
Entries for lookup are stored at the hash of the name instead of the
full name. The key also contains the unique readdir pos so that we
don't have to deal with collision on creation. The lookup procedure now
does need to iterate over all the readdir positions for the hash value
and compare the names.
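The lookup procedure can be sketched like this. It is a userspace
model under stated assumptions: the items live in a flat array rather
than in the item cache, FNV-1a stands in for whatever name hash scoutfs
actually uses, and every demo_* name is made up for the example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical dirent item: (hash, pos) model the key, the rest the value. */
struct demo_dirent_item {
	uint64_t hash;
	uint64_t pos;
	uint64_t ino;
	const char *name;
};

/* FNV-1a, used here only as a placeholder hash function. */
static uint64_t demo_name_hash(const char *name, size_t len)
{
	uint64_t h = 14695981039346656037ULL;
	size_t i;

	for (i = 0; i < len; i++) {
		h ^= (unsigned char)name[i];
		h *= 1099511628211ULL;
	}
	return h;
}

/*
 * Visit every item stored at the name's hash, whatever its readdir
 * pos, and compare the full names.  Items that merely collide on the
 * hash are skipped by the name comparison.
 */
static const struct demo_dirent_item *
demo_lookup(const struct demo_dirent_item *items, size_t nr, const char *name)
{
	size_t len = strlen(name);
	uint64_t hash = demo_name_hash(name, len);
	size_t i;

	for (i = 0; i < nr; i++) {
		if (items[i].hash != hash)
			continue;
		if (strlen(items[i].name) == len &&
		    memcmp(items[i].name, name, len) == 0)
			return &items[i];
	}

	return NULL;
}
```

Because pos is unique per entry, two names with the same hash simply
land at different keys and creation never has to resolve a collision.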
Entries for link backref walking are stored with the entry's position in
the parent dir instead of the entry's name. The name is then stored in
the value. Inode to path conversion can still walk the backref items
without having to look up dirent items.
These changes mean that all directory entry items are now stored at a
small key with some u64s (hash, pos, parent dir, etc) and have a value
with the dirent struct and full entry name. This lets us use the same
key and value format for the three entry key types. We no longer have
to allocate keys, we can store them on the stack.
We store the entry's hash and pos in the dirent struct in the item value
so that any item has all the fields to reference all the other item
keys. We store the same values in the dentry_info so that deletion
(unlink and rename) can find all the entries.
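To make the "any item can reference all the other keys" property
concrete, here is a small sketch. The key layout, the field order, and
all of the demo_* names are assumptions for illustration; the actual
scoutfs key and dirent structs are not reproduced in this message.

```c
#include <stdint.h>

/* Hypothetical type tags for the three entry key types. */
enum demo_key_type {
	DEMO_DIRENT_KEY = 1,	/* lookup by name hash */
	DEMO_READDIR_KEY,	/* iteration by readdir pos */
	DEMO_LINK_BACKREF_KEY,	/* inode back to its parent dirs */
};

/* Small fixed-size key: cheap to build on the stack, never allocated. */
struct demo_key {
	uint8_t type;
	uint64_t a;
	uint64_t b;
	uint64_t c;
};

/* Fields that, per the text above, are kept in every dirent value. */
struct demo_dirent {
	uint64_t dir_ino;
	uint64_t ino;
	uint64_t hash;
	uint64_t pos;
};

static struct demo_key demo_dirent_key(const struct demo_dirent *d)
{
	struct demo_key k = { DEMO_DIRENT_KEY, d->dir_ino, d->hash, d->pos };
	return k;
}

static struct demo_key demo_readdir_key(const struct demo_dirent *d)
{
	struct demo_key k = { DEMO_READDIR_KEY, d->dir_ino, d->pos, 0 };
	return k;
}

static struct demo_key demo_backref_key(const struct demo_dirent *d)
{
	struct demo_key k = { DEMO_LINK_BACKREF_KEY, d->ino, d->dir_ino,
			      d->pos };
	return k;
}
```

Given the value of any one of the three items, unlink or rename can
build the other two keys without a name-based search.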
The ino_path ioctl can now much more clearly iterate over parent
directories and entry positions instead of oh so cleverly iterating over
null terminated names in the parent directories. The ioctl interface
structs and implementation become simpler.
Signed-off-by: Zach Brown <zab@versity.com>
This is implemented by filling in our export ops functions.
When we get those right, the VFS handles most of the details for us.
Internally, scoutfs handles are two u64s (ino and parent ino) and a
type which indicates whether the handle contains the parent ino or not.
Surprisingly enough, no existing type matches this pattern so we use our
own types to identify the handle.
Most of the export ops are self-explanatory. scoutfs_encode_fh() takes
an inode and an optional parent and encodes those into the smallest
handle that would fit. scoutfs_fh_to_[dentry|parent] turn an existing
file handle into a dentry.
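The handle format can be sketched in userspace as below. The type
values, the demo_* names, and the use of host byte order are
assumptions for the example; only the shape (one or two u64s plus a
type, with the length counted in 32-bit words as the VFS does) follows
the description above.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical handle types; the values are placeholders. */
#define DEMO_FILEID_INO		0x81
#define DEMO_FILEID_WITH_PARENT	0x82
/* Same value the kernel uses for FILEID_INVALID. */
#define DEMO_FILEID_INVALID	0xff

struct demo_fid {
	uint64_t ino;
	uint64_t parent_ino;
};

/*
 * Encode ino, and parent_ino if given, into fh.  *max_len is in 32-bit
 * words.  If the buffer is too small, report the needed length and
 * return the invalid type.
 */
static int demo_encode_fh(uint32_t *fh, int *max_len, uint64_t ino,
			  const uint64_t *parent_ino)
{
	int len = parent_ino ? 4 : 2;

	if (*max_len < len) {
		*max_len = len;
		return DEMO_FILEID_INVALID;
	}

	memcpy(fh, &ino, sizeof(ino));
	if (parent_ino)
		memcpy(fh + 2, parent_ino, sizeof(*parent_ino));

	*max_len = len;
	return parent_ino ? DEMO_FILEID_WITH_PARENT : DEMO_FILEID_INO;
}

/* Decode a handle; returns 0 on success, -1 on a bad type or length. */
static int demo_decode_fh(const uint32_t *fh, int fh_len, int fh_type,
			  struct demo_fid *fid)
{
	int need = (fh_type == DEMO_FILEID_WITH_PARENT) ? 4 : 2;

	if (fh_type != DEMO_FILEID_INO && fh_type != DEMO_FILEID_WITH_PARENT)
		return -1;
	if (fh_len < need)
		return -1;

	memcpy(&fid->ino, fh, sizeof(fid->ino));
	fid->parent_ino = 0;
	if (fh_type == DEMO_FILEID_WITH_PARENT)
		memcpy(&fid->parent_ino, fh + 2, sizeof(fid->parent_ino));

	return 0;
}
```

The type doubles as the length indicator: a decoder only reads the
parent ino when the type says it is present.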
scoutfs_get_parent() is a bit different and would be called on
directory inodes to connect a disconnected dentry path. For
scoutfs_get_parent(), we can export add_next_linkref() and use the backref
mechanism to quickly find a parent directory.
scoutfs_get_name() is almost identical to scoutfs_get_parent(). Here we're
linking an inode to a name which exists in the parent directory. We can also
use add_next_linkref, and simply copy the name from the backref.
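Copying the name out of a backref might look roughly like the sketch
below. The demo_linkref struct and the demo_get_name() signature are
hypothetical; the sketch assumes the backref stores a name length and
an unterminated name, and that the output buffer holds NAME_MAX + 1
bytes as the VFS get_name callers provide.

```c
#include <errno.h>
#include <limits.h>
#include <stdint.h>
#include <string.h>

#ifndef NAME_MAX
#define NAME_MAX 255
#endif

/* Hypothetical backref: parent dir plus the entry name, no terminator. */
struct demo_linkref {
	uint64_t dir_ino;
	uint16_t name_len;
	char name[NAME_MAX];
};

/*
 * Copy the entry name for 'parent_ino' into name_out, which must hold
 * NAME_MAX + 1 bytes.  Names longer than NAME_MAX are rejected rather
 * than truncated.
 */
static int demo_get_name(const struct demo_linkref *ref, uint64_t parent_ino,
			 char *name_out)
{
	if (ref->dir_ino != parent_ino)
		return -ENOENT;
	if (ref->name_len > NAME_MAX)
		return -EINVAL;

	memcpy(name_out, ref->name, ref->name_len);
	name_out[ref->name_len] = '\0';
	return 0;
}
```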
As a result of this patch we can also now export scoutfs file systems
via NFS, however testing NFS thoroughly is outside the scope of this
work so export support should be considered experimental at best.
Signed-off-by: Mark Fasheh <mfasheh@versity.com>
[zab edited <= NAME_MAX]