Some dlmglue functions are unused by the current ifdefery. They're
throwing warnings that obscure other warnings in the build. This
broadens the ifdef coverage so that we don't get warnings. The unused
code will either be promoted to an interface or removed as dlmglue
evolves into a reusable component.
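As a rough sketch of the pattern (the guard name here is invented, not
the one dlmglue.c actually uses), unused definitions just move under the
same conditional as their eventual callers:

#ifdef UNUSED_DLMGLUE_CODE	/* hypothetical guard */
/* the compiler never sees an unused static function */
static int lockres_unused_helper(int level)
{
	return level;
}
#endif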
Signed-off-by: Zach Brown <zab@versity.com>
We lock multiple inodes in order of their inode numbers. This fixes
the directory entry paths that hold parent dir and target inode locks.
Link and unlink are easy because they just acquire the existing parent
dir and target inode locks.
Lookup is a little squirrely because we don't want to try to order
the parent dir lock with locks taken down in iget. It turns out that it's
safe to drop the dir lock before calling iget as long as iget handles
racing the inode cache instantiation with inode deletion.
Creation is the remaining pattern and it's a little weird because we
want to lock the newly created inode before we create it and the items
that store it. We add a function that correctly orders the locks,
transaction, and inode cache instantiation.
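A minimal sketch of that ordering, with invented helper names standing
in for the real lock, transaction, and item calls:

/* sketch only: every helper name here is an assumption */
static int create_locked(struct inode *dir, u64 ino)
{
	struct lock *lock;
	int ret;

	/* lock the new inode number before anyone can instantiate it */
	ret = lock_ino(dir->i_sb, ino, &lock);
	if (ret)
		return ret;

	ret = hold_trans(dir->i_sb);
	if (ret == 0) {
		/* create the items, then instantiate the vfs inode */
		ret = create_inode_items(dir, ino);
		release_trans(dir->i_sb);
	}

	unlock_ino(lock);
	return ret;
}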
Signed-off-by: Zach Brown <zab@versity.com>
It looked like it was easier to have a helper dirty and delete items.
But now that we also have to pass in locks, the interface gets messy
enough that it's easier to have the caller take care of it.
Signed-off-by: Zach Brown <zab@versity.com>
Previously we had lots of inode creation callers that used a function to
create the dirent items and we had unlink remove entries by hand.
Rename is different because it wants to remove and add multiple links as
it does its work, including recreating links that it has deleted.
We rework add_entry_item() so that it gets the specific fields it needs
instead of getting them from the vfs structs. This makes it clear that
callers are responsible for the source of the fields. Specifically we
need to be able to add entries during failed rename cleanup without
allocating a new readdir pos from the parent dir.
With callers now responsible for the inputs to add_entry_items() we move
some of its code out into all callers: checking name length, dirtying
the parent dir inode, and allocating a readdir pos from the parent.
We then refactor most of _unlink() into a del_entry_items() to match
addition. This removes the last user of scoutfs_item_delete_many(),
which will itself be removed in a future commit.
With the entry item helpers taking specific fields all the helpers they
use also need to use specific fields instead of the vfs structs.
To make rename cluster safe we need to get cluster locks for all the
inodes that we work with. We also have to check that the locally cached
vfs input is still valid after acquiring the locks. We only check the
basic structural correctness of the args: that parent dirs don't violate
ancestor rules by creating loops and that the entries the rename
arguments assume to be present or absent are still in that state.
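Sketched with a hypothetical lookup_entry() helper (not the real scoutfs
call), the revalidation boils down to re-reading the entries under the
locks and comparing them with the vfs arguments:

/* with all cluster locks held, re-check what the args assume */
ret = lookup_entry(old_dir, old_dentry->d_name.name,
		   old_dentry->d_name.len, &found_ino);
if (ret == 0 && found_ino != old_inode->i_ino)
	ret = -ENOENT;	/* the entry now points somewhere else */
/* -ENOENT from the lookup itself also fails the rename: it raced */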
Signed-off-by: Zach Brown <zab@versity.com>
Add a lock name that has a global scope in a given lockspace. It's not
associated with any file system items. We add a scope to the lock name
to indicate whether a lock is global and set that in the other lock
naming initialization. We permit lock allocation to accept null start and end
keys.
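Illustratively (field and constant names here are assumptions, not the
actual definitions), the name gains a scope byte and global locks simply
carry no key range:

struct lock_name {
	u8 scope;	/* e.g. LOCK_SCOPE_GLOBAL or LOCK_SCOPE_FS_ITEMS */
	u8 type;
	u64 first;
	u64 second;
};

/* global locks cover no items, so no start/end keys are given */
ret = lock_alloc(sb, &name, NULL, NULL, &lock);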
Signed-off-by: Zach Brown <zab@versity.com>
Add a function that can lock multiple inodes in order of their inode
numbers. It handles NULL and duplicate inode arguments.
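A minimal sketch of the idea, with an invented fixed-arity signature and
helpers:

/* sort by inode number, then skip NULLs and repeated inodes */
static int lock_inodes(struct inode *a, struct inode *b,
		       struct inode *c, struct inode *d)
{
	struct inode *ino[] = { a, b, c, d };
	struct inode *prev = NULL;
	int ret = 0;
	int i;

	sort_inodes_by_ino(ino, 4);	/* hypothetical; NULLs sort last */

	for (i = 0; i < 4 && ino[i] && ret == 0; i++) {
		if (ino[i] != prev)	/* lock each inode only once */
			ret = lock_one_inode(ino[i]);
		prev = ino[i];
	}
	return ret;
}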
Signed-off-by: Zach Brown <zab@versity.com>
Without this we return -ESPIPE when a process tries to seek on a regular
file.
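One plausible shape of the fix (the lock helpers here are assumptions;
the generic helper is stock kernel code):

static loff_t scoutfs_file_llseek(struct file *file, loff_t offset,
				  int whence)
{
	struct inode *inode = file->f_mapping->host;
	loff_t ret;

	/* lock and refresh so i_size is current for SEEK_END */
	ret = lock_and_refresh_inode(inode);	/* hypothetical */
	if (ret)
		return ret;

	ret = generic_file_llseek(file, offset, whence);
	unlock_inode(inode);			/* hypothetical */
	return ret;
}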
Signed-off-by: Mark Fasheh <mfasheh@versity.com>
[zab: adapted to new lock call]
Signed-off-by: Zach Brown <zab@zabbo.net>
We need to lock and refresh the VFS inode before permissions are checked
in system calls; otherwise we risk checking against stale inode metadata.
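The pattern looks roughly like this; the flag and lock call names are
assumptions based on the lock rework elsewhere in this series:

/* refresh inode items under the lock before checking permissions */
ret = scoutfs_lock_inode(sb, DLM_LOCK_PR, SCOUTFS_LKF_REFRESH_INODE,
			 inode, &lock);
if (ret)
	return ret;

ret = inode_permission(inode, MAY_WRITE);	/* sees fresh fields */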
Signed-off-by: Mark Fasheh <mfasheh@versity.com>
[zab: adapted to newer lock call]
Signed-off-by: Zach Brown <zab@versity.com>
With trylock implemented we can add locking in readpage. After that it's
pretty easy to implement our own read/write functions, which at this
point more or less wrap the kernel helpers in the correct cluster
locking.
Data invalidation is a bit interesting. If the lock we are invalidating
is an inode group lock, we use the lock boundaries to incrementally
search our inode cache. When an inode struct is found, we sync and
(optionally) truncate pages.
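A sketch of that search; ilookup(), filemap_write_and_wait(), and
truncate_inode_pages() are stock kernel calls, while mapping the lock to
an inode number range is assumed:

/* walk the inode numbers covered by the group lock */
for (ino = range_start; ino <= range_end; ino++) {
	struct inode *inode = ilookup(sb, ino);

	if (!inode)
		continue;	/* not in our inode cache */

	filemap_write_and_wait(inode->i_mapping);	/* sync pages */
	if (invalidating)
		truncate_inode_pages(inode->i_mapping, 0);
	iput(inode);
}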
Signed-off-by: Mark Fasheh <mfasheh@versity.com>
[zab: adapted to newer lock call, fixed some error handling]
Signed-off-by: Zach Brown <zab@versity.com>
Now that we have the inode refreshing flags let's add them to the
callers that want to have a current inode after they have their lock.
Callers locking newly created items use the new inode flag to reset the
refresh gen.
A few inode tests are moved down to after locking so that they test
the current refreshed inode.
Signed-off-by: Zach Brown <zab@versity.com>
Lock callers can specify that they want inode fields reread from items
after the lock is acquired. dlmglue sets a refresh_gen in the locks;
we store it in inodes to track when they were last refreshed and
whether they need a refresh.
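The refresh test is then a simple generation comparison; the field names
here are assumptions:

/* reread items only when the lock has a newer gen than the inode */
if (si->refresh_gen != lock->refresh_gen) {
	ret = read_inode_items(inode);	/* hypothetical reread */
	if (ret == 0)
		si->refresh_gen = lock->refresh_gen;
}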
Signed-off-by: Zach Brown <zab@versity.com>
In addition to setting NEEDS_REFRESH when locks are acquired out of NL,
we now also give them a refresh_gen that is assigned by incrementing a
long-lived counter in the super.
This gives callers a strictly increasing read-only indication that the
lock has changed. They don't have to serialize users to clear
NEEDS_REFRESH and transfer it to some other serialized state.
scoutfs will use this with the multiple inodes that are refreshed with
respect to the lock's refresh_gen.
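The producer side, sketched with assumed field names and an atomic64
counter standing in for the counter in the super:

/* stamp locks that leave NL with a new strictly increasing gen */
if (old_level == DLM_LOCK_NL && new_level > DLM_LOCK_NL)
	lockres->l_refresh_gen =
		atomic64_inc_return(&sbi->next_refresh_gen);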
Signed-off-by: Zach Brown <zab@versity.com>
This is based on a series from Mark Fasheh <mfasheh@versity.com> that
introduced inode refreshing after locking and a trylock for readpage.
Rework the inode locking function so that it's more clearly named and
takes flags and the inode struct.
We have callers that want to lock the logical inode but aren't doing
anything with the vfs inode so we provide that specific entry point.
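The assumed shape of the two entry points (exact prototypes may differ):

/* lock a vfs inode, optionally refreshing it via flags */
int scoutfs_lock_inode(struct super_block *sb, int mode, int flags,
		       struct inode *inode, struct scoutfs_lock **lock);

/* lock a logical inode number with no vfs inode involved */
int scoutfs_lock_ino(struct super_block *sb, int mode, int flags,
		     u64 ino, struct scoutfs_lock **lock);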
Signed-off-by: Zach Brown <zab@versity.com>
This portion of the port needs a bit of work before we can use
it in scoutfs. In the meantime, disable it so that we can build
on debug kernels.
Signed-off-by: Mark Fasheh <mfasheh@versity.com>
These make the header hard to read, and they're very ocfs2-specific
functions that would get moved when we merge this upstream anyway.
Signed-off-by: Mark Fasheh <mfasheh@versity.com>
When we're not the server node, 'mani' is NULL, so dereferencing it in our
loop causes a crash. That said, we don't need it anyway - the loop will
eventually end when our btree walk (via btree_prev_overlap_or_next())
ends.
Signed-off-by: Mark Fasheh <mfasheh@versity.com>
We move struct ocfs2_lock_res_ops and flags to dlmglue.c so that
locks.c can get access to them. Similarly, we export
ocfs2_lock_res_init_common() so that locks.c can initialize each lockres
before use. Also, free_lock_tree() now has to happen before we shut
down the dlm - this gives dlmglue the opportunity to unlock the
underlying dlm locks before we go off freeing the structures.
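The resulting shutdown order, sketched (the dlm shutdown call is a
stand-in name):

free_lock_tree(sbi);	/* unlocks each lockres's dlm lock */
shutdown_dlm(sbi);	/* now safe to leave the lockspace */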
Signed-off-by: Mark Fasheh <mfasheh@versity.com>
Ultimately the direct dlm lock calls will go away. For now though we
grab the lockspace off our cluster connection object. In order to get
this going, I stubbed out our recovery callbacks, which now get us a
print when a node goes down.
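The stubs follow the fs/dlm lockspace ops from <linux/dlm.h>; a minimal
sketch of the callback that produces the print:

/* stubbed recovery callback: just note the downed node for now */
static void scoutfs_recover_slot(void *arg, struct dlm_slot *slot)
{
	printk(KERN_INFO "scoutfs: node %d went down\n", slot->nodeid);
}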
Signed-off-by: Mark Fasheh <mfasheh@versity.com>
These work with little modification: we comment out a couple of
ocfs2-specific lines and decouple a few more variables from the osb
structure. As it stands, ocfs2 could also use these init/shutdown
functions.
Signed-off-by: Mark Fasheh <mfasheh@versity.com>
Dlmglue is built on top of this. Bring in the portions we need, which
include the stackglue API as well as most of the fs/dlm implementation.
I left off the Ocfs2 specific version and connection handling. Also
left out is the old Ocfs2 dlm support which we'll never want.
Like dlmglue, we keep as much of the generic stackglue code intact
here. This will make translating to/from upstream patches much easier.
Signed-off-by: Mark Fasheh <mfasheh@versity.com>
This only leaked into the bast function. I retained the debug print -
it'll be turned off in our build anyway, and that's what we'd
want to do upstream.
Signed-off-by: Mark Fasheh <mfasheh@versity.com>
We need this for the lockres name. It also turns out to be the only
thing we need from fs/ocfs2/ocfs2_lockid.h.
Signed-off-by: Mark Fasheh <mfasheh@versity.com>
Some of this leaks through even after the big #ifdef'ing - ocfs2 had
to special-case printing the name of dentry locks. We don't have such
a need so it's easy to drop those calls.
Signed-off-by: Mark Fasheh <mfasheh@versity.com>
We only want the generic stuff. Long term, the ocfs2-specific code would
be what's left in fs/ocfs2/dlmglue.[ch].
Signed-off-by: Mark Fasheh <mfasheh@versity.com>
Like the mtime index, this index is unused. Removing it is a near
identical task. Running the same createmany test from our last
patch gives us the following:
$ createmany -o '/scoutfs/file_%lu' 10000000
total: 10000000 creates in 598.28 seconds: 16714.59 creates/second
real 9m58.292s
user 0m7.420s
sys 5m44.632s
So after both indices are gone, we go from a 12m56s run time to 9m58s,
saving almost 3 minutes which translates into a total performance
increase of about 23%.
Signed-off-by: Mark Fasheh <mfasheh@versity.com>
This index is unused - we can gain some create performance by removing it.
To verify this, I ran createmany for 10 million files:
$ createmany -o '/scoutfs/file_%lu' 10000000
Before this patch:
total: 10000000 creates in 776.54 seconds: 12877.56 creates/second
real 12m56.557s
user 0m7.861s
sys 6m56.986s
After this patch:
total: 10000000 creates in 691.92 seconds: 14452.46 creates/second
real 11m31.936s
user 0m7.785s
sys 6m19.328s
So removing the index gained us about a minute and a half on the test or a
12% performance increase.
Signed-off-by: Mark Fasheh <mfasheh@versity.com>
Truncation updates extents that intersect with the input range. It
starts with the first block in the range and iterates until it has
searched for all the extents that could cover the range.
Extents are stored in items at their final block location so that we can
use _next to find intersections. Truncation was searching for the next
extent starting from the full extent that it was still looking for.
That means it was starting the search at the last block of the extent,
not the first, so it missed all the extents that didn't overlap with
that last block.
This is fixed by searching from a temporary single-block extent at the
start of the search range.
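A sketch of the fix; the extent struct and search helper names are
assumptions:

/* search from a one-block extent at the start of the range so
 * _next finds any stored extent whose final block is >= start */
memset(&key, 0, sizeof(key));
key.blk_off = start;
key.blocks = 1;
ret = extent_next(inode, &key, &found);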
Signed-off-by: Zach Brown <zab@versity.com>
Offline extents weren't being merged because they all had their physical
blkno set to 0 and all the extent calculations didn't treat them
specially. They would only merge if the physical blocks of two extent
were contiguous. Instead of special casing offline extents everywhere
we store them with a physical blkno set to the logical blk_off. This
lets all the current extent calculations work as expected.
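With that convention a single merge test covers both online and offline
extents; a sketch with assumed field names:

/* extents merge when both logical and physical runs continue;
 * offline extents now pass because blkno tracks blk_off */
static bool extents_merge(struct extent *left, struct extent *right)
{
	return left->blk_off + left->blocks == right->blk_off &&
	       left->blkno + left->blocks == right->blkno &&
	       left->flags == right->flags;
}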
Signed-off-by: Zach Brown <zab@versity.com>
Release tries to reinstate extents if it sees an error while releasing.
Those item manipulations need to be covered by the transaction.
Signed-off-by: Zach Brown <zab@versity.com>
The existing release interface specified byte regions to release but
that didn't match what the underlying file data mapping structure is
capable of. What happens if you specify a single byte to release? Does
it release the whole block? Does it release nothing? Does it return an
error?
By making the interface match the capability of the operation we make
the functioning of the system that much more predictable. Callers are
forced to think about implementing their desires in terms of block
granular releasing.
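Illustratively, the arguments become block granular; these field names
are assumptions, not the real interface:

/* callers express release in whole blocks, nothing finer */
struct release_args {
	__u64 block;	/* first logical block to release */
	__u64 count;	/* number of blocks to release */
};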
Signed-off-by: Zach Brown <zab@versity.com>
The ->statfs method was still using the super_block in the super_info
that was read during mount. This will get progressively more out
of date.
We add a network message to ask the server for the current fields that
impact statfs. This is always racy and the fields are mostly nonsense,
but we try our best.
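A sketch of the resulting ->statfs path; the request helper, reply
struct, and block size constant are assumptions:

static int scoutfs_statfs(struct dentry *dentry, struct kstatfs *kst)
{
	struct super_block *sb = dentry->d_sb;
	struct net_statfs nst;	/* hypothetical reply struct */
	int ret;

	/* ask the server for its current view instead of trusting
	 * the super block we read at mount */
	ret = client_statfs(sb, &nst);
	if (ret)
		return ret;

	kst->f_bsize = SCOUTFS_BLOCK_SIZE;	/* assumed constant */
	kst->f_blocks = le64_to_cpu(nst.total_blocks);
	kst->f_bfree = le64_to_cpu(nst.free_blocks);
	kst->f_files = le64_to_cpu(nst.inode_count);
	return 0;
}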
Signed-off-by: Zach Brown <zab@versity.com>