The key printing functions only output the key material that's described
by the format. We have some callers that need to increment or decrement
keys, so they expand them to full size keys. This expansion, and the
extra low-significance precision it adds, was hidden from the traces.
This adds a helper that prints the key material with the format and then
appends an encoding of the trailing bytes.
The key printer was a huge mess of cases and ifs that made it hard to
integrate a sane helper. We also take the opportunity to break it up
into zone|type key printer functions. The isolation makes it much
clearer to see what's going on.
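As a rough sketch of the shape of the new helper (the function names
here are made up, not the actual scoutfs symbols), it prints the
formatted portion and then appends the trailing bytes as hex:

    /* made-up sketch: print the format'd portion of the key, then the
     * trailing expansion bytes as hex so full-size keys show in traces */
    static int snprintf_key_and_tail(char *buf, size_t size, u8 *key,
                                     size_t fmt_len, size_t key_len)
    {
        int len = snprintf_key_format(buf, size, key); /* per zone|type */
        size_t i;

        for (i = fmt_len; i < key_len; i++)
            len += scnprintf(buf + len, size - len, ".%02x", key[i]);

        return len;
    }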
Signed-off-by: Zach Brown <zab@versity.com>
We can't block on a lock while holding the transaction open because
that'd stop lock downconversion from syncing to write out items while it
is converting from EX. Add a warning if we try to acquire a blocking
lock while holding a transaction.
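A minimal sketch of the check (the flag and transaction helper are
illustrative, not the real scoutfs names):

    /* illustrative: a blocking lock request with a transaction held can
     * deadlock against downconversion trying to sync dirty items */
    if (!(flags & LKF_TRYLOCK))
        WARN_ON_ONCE(scoutfs_trans_held());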
Signed-off-by: Zach Brown <zab@versity.com>
Initially the index walking ioctl only ever output a single entry per
iteration. So the number of entries to return and the next entry
pointer to copy to userspace were maintained in the post-increment of
the for loop.
When we added locking of the index item results we made it possible to
not copy any entries in a loop iteration. When that happened the nr and
pointer would be incremented without initializing the entry. The ioctl
caller would see a garbage entry in the results.
This was visible in scoutfs/002 test results on a volume that had an
interesting file population after having run through all the other
scoutfs tests. The uninitialized entries would show up as garbage in
the size index portion of the test.
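The fix is essentially to only advance the entry count and the
userspace pointer once an entry has actually been filled in; roughly,
with illustrative names:

    /* sketch of the corrected loop shape, not the literal ioctl code */
    for (nr = 0; nr < walk.nr_entries; ) {
        ret = next_index_entry(&ent); /* can produce no entry under lock */
        if (ret == -ENOENT)
            break;
        if (ret < 0)
            return ret;
        if (ret == 0)
            continue;
        if (copy_to_user(uent, &ent, sizeof(ent)))
            return -EFAULT;
        uent++;
        nr++; /* only advanced after ent has been written */
    }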
Signed-off-by: Zach Brown <zab@versity.com>
If item iteration finds a hole in the cache it tries to read items.
After the items are read it can look at the cached region and return
items or -ENOENT. We recently added an end key to limit how far we can
read and cache items.
The end key addition correctly limited the cache read to the lock end
value. It could never read item cache ranges beyond that. This means
that it can't iterate past the end value and should return -ENOENT if it
gets past end. But the code forgot to do that; it only checked for
iteration past last before returning -ENOENT. Instead it spins,
continually finding a hole past end but inside last, trying to read
items but limiting them to end, then finding the same hole again.
Triggering this requires a lock end that's nearer than the last
iteration key. That's hard to do because most of our item reads are
covered by inode group locks which extend well past iteration inside a
given inode. Inode index item iteration can easily trigger this if
there are no items. I tripped over it when walking empty indexes (data_seq or
online_blocks with no regular files).
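The fix is a small additional check; something along these lines, with
illustrative key helpers:

    /* stop at the lock's end key as well as the iteration's last key,
     * since the item cache can never be populated past end */
    if (key_compare(key, end) > 0 || key_compare(key, last) > 0) {
        ret = -ENOENT;
        break;
    }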
Signed-off-by: Zach Brown <zab@versity.com>
The client connection loop was a bit of a mess. It only slept between
retries in one particular case; other failures to connect would spin
and livelock forever.
The fixed loop now has a much more orderly reconnect procedure. Each
connecting sender always tries once. Then retry attempts back off
exponentially, settling at a nice long timeout. After long enough it'll
return errors.
This fixes livelocks in the xfstests that mount and unmount around
dm-flakey config. generic/{034,039,040} would easily livelock before
this fix.
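The pacing follows the usual exponential backoff pattern; roughly, with
illustrative constants and helpers:

    /* illustrative backoff loop, not the literal client connect code */
    delay_ms = CONN_RETRY_MIN_MS;
    waited_ms = 0;
    while ((ret = try_connect(conn)) < 0) {
        if (waited_ms > CONN_RETRY_LIMIT_MS)
            return ret; /* give up and hand the error back */
        msleep(delay_ms);
        waited_ms += delay_ms;
        delay_ms = min(delay_ms * 2, CONN_RETRY_MAX_MS);
    }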
Signed-off-by: Zach Brown <zab@versity.com>
stackglue is trying to call dlm posix symbols that don't exist in some
RHEL dlm kernels. We're not using this functionality yet so let's just
tear it out for now.
Signed-off-by: Zach Brown <zab@versity.com>
There's a fair amount of lock.c that's dead code now that we're using
dlmglue. Some of the dead code is seen as unused and is throwing
warnings. This silences them by removing the code.
Signed-off-by: Zach Brown <zab@versity.com>
Some dlmglue functions are unused by the current ifdefery. They're
throwing warnings that obscure other warnings in the build. This
broadens the ifdef coverage so that we don't get warnings. The unused
code will either be promoted to an interface or removed as dlmglue
evolves into a reusable component.
Signed-off-by: Zach Brown <zab@versity.com>
We lock multiple inodes by order of their inode number. This fixes
the directory entry paths that hold parent dir and target inode locks.
Link and unlink are easy because they just acquire the existing parent
dir and target inode locks.
Lookup is a little squirrely because we don't want to try and order
the parent dir lock with locks down in iget. It turns out that it's
safe to drop the dir lock before calling iget as long as iget handles
racing the inode cache instantiation with inode deletion.
Creation is the remaining pattern and it's a little weird because we
want to lock the newly created inode before we create it and the items
that store it. We add a function that correctly orders the locks,
transaction, and inode cache instantiation.
Signed-off-by: Zach Brown <zab@versity.com>
It looked like it was easier to have a helper dirty and delete items.
But now that we also have to pass in locks the interface gets messy
enough that it's easier to have the caller take care of it.
Signed-off-by: Zach Brown <zab@versity.com>
Previously we had lots of inode creation callers that used a function to
create the dirent items and we had unlink remove entries by hand.
Rename is different because it wants to remove and add multiple links as
it does its work, including recreating links that it has deleted.
We rework add_entry_item() so that it gets the specific fields it needs
instead of getting them from the vfs structs. This makes it clear that
callers are responsible for the source of the fields. Specifically we
need to be able to add entries during failed rename cleanup without
allocating a new readdir pos from the parent dir.
With callers now responsible for the inputs to add_entry_items() we move
some of its code out into all callers: checking name length, dirtying
the parent dir inode, and allocating a readdir pos from the parent.
We then refactor most of _unlink() into a del_entry_items() to match
addition. This removes the last user of scoutfs_item_delete_many() and
it will be removed in a future commit.
With the entry item helpers taking specific fields all the helpers they
use also need to use specific fields instead of the vfs structs.
To make rename cluster safe we need to get cluster locks for all the
inodes that we work with. We also have to check that the locally cached
vfs input is still valid after acquiring the locks. We only check the
basic structural correctness of the args: that parent dirs don't violate
ancestor rules by creating loops and that the entries assumed by the
rename arguments still exist, or still don't.
Signed-off-by: Zach Brown <zab@versity.com>
Add a lock name that has a global scope in a given lockspace. It's not
associated with any file system items. We add a scope to the lock name
to indicate if a lock is global or not and set that in other lock naming
initialization. We permit lock allocation to accept null start and end
keys.
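Roughly, the lock name gains a scope alongside the existing naming
fields; this layout is only illustrative:

    /* illustrative only; the real scoutfs lock name layout may differ */
    struct lock_name {
        u8 scope;    /* global within the lockspace vs. covering fs items */
        u8 zone;
        u8 type;
        u64 first;
        u64 second;
    };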
Signed-off-by: Zach Brown <zab@versity.com>
Add a function that can lock multiple inodes in order of their inode
numbers. It handles nulls and duplicate inodes.
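The ordering itself is the familiar pattern; a sketch for two inodes,
with the single-inode lock call named illustratively:

    /* lock a and b in inode number order, tolerating NULL and duplicate
     * inodes; the real helper covers more than two inodes */
    if (a && b && a != b && scoutfs_ino(a) > scoutfs_ino(b)) {
        swap(a, b);
        swap(a_lock, b_lock);
    }

    ret = 0;
    if (a)
        ret = lock_one_inode(sb, mode, a, a_lock);
    if (ret == 0 && b && b != a)
        ret = lock_one_inode(sb, mode, b, b_lock);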
Signed-off-by: Zach Brown <zab@versity.com>
Without this we return -ESPIPE when a process tries to seek on a regular
file.
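The change amounts to wiring up the stock seek helper in the regular
file operations (the fops struct name is illustrative, other methods
omitted):

    const struct file_operations scoutfs_file_fops = {
        .llseek = generic_file_llseek,
    };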
Signed-off-by: Mark Fasheh <mfasheh@versity.com>
[zab: adapted to new lock call]
Signed-off-by: Zach Brown <zab@zabbo.net>
We need to lock and refresh the VFS inode before permissions are checked
in system calls; otherwise we risk checking against stale inode metadata.
Signed-off-by: Mark Fasheh <mfasheh@versity.com>
[zab: adapted to newer lock call]
Signed-off-by: Zach Brown <zab@versity.com>
With trylock implemented we can add locking in readpage. After that it's
pretty easy to implement our own read/write functions which at this
point are more or less wrapping the kernel helpers in the correct
cluster locking.
Data invalidation is a bit interesting. If the lock we are invalidating
is an inode group lock, we use the lock boundaries to incrementally
search our inode cache. When an inode struct is found, we sync and
(optionally) truncate pages.
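A sketch of the invalidation walk, assuming the group lock's key range
maps to an inode number range (locals named illustratively):

    /* only visit inodes that are already in the vfs cache */
    for (ino = start_ino; ino <= end_ino; ino++) {
        struct inode *inode = ilookup(sb, ino);

        if (!inode)
            continue;
        filemap_write_and_wait(inode->i_mapping); /* sync dirty pages */
        if (invalidate)
            truncate_inode_pages(inode->i_mapping, 0);
        iput(inode);
    }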
Signed-off-by: Mark Fasheh <mfasheh@versity.com>
[zab: adapted to newer lock call, fixed some error handling]
Signed-off-by: Zach Brown <zab@versity.com>
Now that we have the inode refreshing flags let's add them to the
callers that want to have a current inode after they have their lock.
Callers locking newly created items use the new inode flag to reset the
refresh gen.
A few inode tests are moved down to after locking so that they test
the current refreshed inode.
Signed-off-by: Zach Brown <zab@versity.com>
Lock callers can specify that they want inode fields reread from items
after the lock is acquired. dlmglue sets a refresh_gen in the locks
that we store in inodes to track when they were last refreshed and if
they need a refresh.
Signed-off-by: Zach Brown <zab@versity.com>
In addition to setting NEEDS_REFRESH when locks are acquired out of NL,
we now also give them a refresh_gen that is assigned by incrementing a
long lived counter in the super.
This gives callers a strictly increasing read-only indication that the
lock has changed. They don't have to serialize users to clear
NEEDS_REFRESH and transfer it to some other serialized state.
scoutfs will use this with the multiple inodes that are refreshed with
respect to the lock's refresh_gen.
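The per-inode consumer is then a simple compare and update; roughly,
with the inode info and refresh call named illustratively:

    /* each inode remembers the last lock refresh_gen it was read under */
    if (si->refresh_gen != lock->refresh_gen) {
        ret = read_inode_items(inode, lock); /* reread fields from items */
        if (ret == 0)
            si->refresh_gen = lock->refresh_gen;
    }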
Signed-off-by: Zach Brown <zab@versity.com>
This is based on Mark Fasheh <mfasheh@versity.com>'s series that
introduced inode refreshing after locking and a trylock for readpage.
Rework the inode locking function so that it's more clearly named and
takes flags and the inode struct.
We have callers that want to lock the logical inode but aren't doing
anything with the vfs inode so we provide that specific entry point.
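Roughly, the entry points end up along these lines (signatures are
approximate):

    int scoutfs_lock_inode(struct super_block *sb, int mode, int flags,
                           struct inode *inode, struct scoutfs_lock **lock);
    int scoutfs_lock_ino(struct super_block *sb, int mode, int flags,
                         u64 ino, struct scoutfs_lock **lock);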
Signed-off-by: Zach Brown <zab@versity.com>
This portion of the port needs a bit of work before we can use
it in scoutfs. In the meantime, disable it so that we can build
on debug kernels.
Signed-off-by: Mark Fasheh <mfasheh@versity.com>
These make it hard to read the header and are very ocfs2-specific
functions that would get moved when we merge this upstream anyway.
Signed-off-by: Mark Fasheh <mfasheh@versity.com>
When we're not the server node, 'mani' is NULL, so derefing it in our
loop causes a crash. That said, we don't need it anyway - the loop will
eventually end when our btree walk (via btree_prev_overlap_or_next())
ends.
Signed-off-by: Mark Fasheh <mfasheh@versity.com>
We move struct ocfs2_lock_res_ops and flags to dlmglue.c so that
locks.c can get access to it. Similarly, we export
ocfs2_lock_res_init_common() so that locks.c can initialize each lockres
before use. Also, free_lock_tree() now has to happen before we shut
down the dlm - this gives dlmglue the opportunity to unlock their
underlying dlm locks before we go off freeing the structures.
Signed-off-by: Mark Fasheh <mfasheh@versity.com>
Ultimately the direct dlm lock calls will go away. For now though we
grab the lockspace off our cluster connection object. In order to get
this going, I stubbed out our recovery callbacks, which for now just
get us a print when a node goes down.
Signed-off-by: Mark Fasheh <mfasheh@versity.com>
These work with little modification. We comment out a couple of
ocfs2-specific lines and decouple a few more variables from the osb
structure. As it stands, ocfs2 could also use these init/shutdown
functions with little modification.
Signed-off-by: Mark Fasheh <mfasheh@versity.com>
Dlmglue is built on top of this. Bring in the portions we need, which
include the stackglue API as well as most of the fs/dlm implementation.
I left off the Ocfs2 specific version and connection handling. Also
left out is the old Ocfs2 dlm support which we'll never want.
Like dlmglue, we keep as much of the generic stackglue code intact
here. This will make translating to/from upstream patches much easier.
Signed-off-by: Mark Fasheh <mfasheh@versity.com>
This only leaked into the bast function. I retained the debug print -
it'll be turned off in our build anyway, and that's what we'd
want to do upstream.
Signed-off-by: Mark Fasheh <mfasheh@versity.com>
We need this for the lockres name. It also turns out to be the only
thing we need from fs/ocfs2/ocfs2_lockid.h.
Signed-off-by: Mark Fasheh <mfasheh@versity.com>
Some of this leaks through even after the big #ifdef'ing - ocfs2 had
to special case printing the name of dentry locks. We don't have such
a need so it's easy to drop those calls.
Signed-off-by: Mark Fasheh <mfasheh@versity.com>
We only want the generic stuff. Long term the Ocfs2 specific code would be
what's left in fs/ocfs2/dlmglue.[ch].
Signed-off-by: Mark Fasheh <mfasheh@versity.com>