Commit Graph

384 Commits

Author SHA1 Message Date
Mark Fasheh
72a8e9e171 scoutfs: pull in some of ocfs2 stackglue
Dlmglue is built on top of this. Bring in the portions we need which
includes the stackglue API as well as most of the fs/dlm implementation.
I left off the Ocfs2 specific version and connection handling. Also
left out is the old Ocfs2 dlm support which we'll never want.

Like dlmglue, we keep as much of the generic stackglue code intact
here. This will make translating to/from upstream patches much easier.

Signed-off-by: Mark Fasheh <mfasheh@versity.com>
2017-08-23 21:40:20 -05:00
Mark Fasheh
960f8e08bb scoutfs: copy in DLM_LVB_LEN from fs/ocfs2/dlm/dlmapi.h
Signed-off-by: Mark Fasheh <mfasheh@versity.com>
2017-08-23 19:06:18 -05:00
Mark Fasheh
114760365c scoutfs: fix up ocfs2_log_dlm_error()
We're still referencing some ocfs2 specific lock names here,
take them out.

Signed-off-by: Mark Fasheh <mfasheh@versity.com>
2017-08-23 19:00:05 -05:00
Mark Fasheh
61499c5d30 scoutfs: pull in struct ocfs2_dlm_debug from fs/ocfs2/ocfs2.h
We need this for the dlmglue global context.

Signed-off-by: Mark Fasheh <mfasheh@versity.com>
2017-08-23 18:52:49 -05:00
Mark Fasheh
1b59ed99fb scoutfs: remove ocfs2_lock_res->l_type
We don't need it - this is the only ocfs2-ism in struct ocfs2_lock_res.

Signed-off-by: Mark Fasheh <mfasheh@versity.com>
2017-08-23 18:51:38 -05:00
Mark Fasheh
bb100356d9 scoutfs: pull in some fields from ocfs2_super for dlmglue
This is all the dlmglue global context needed.

Signed-off-by: Mark Fasheh <mfasheh@versity.com>
2017-08-23 18:37:06 -05:00
Mark Fasheh
1831014c24 scoutfs: remove usage of ocfs2_lock_type_string()
This only leaked into the bast function. I retained the debug print -
it'll be turned off in our build anyway, and that's what we'd
want to do upstream.

Signed-off-by: Mark Fasheh <mfasheh@versity.com>
2017-08-23 18:16:00 -05:00
Mark Fasheh
13963d22e3 scoutfs: pull in OCFS2_LOCK_ID_MAX_LEN
We need this for the lockres name. It also turns out to be the only
thing we need from fs/ocfs2/ocfs2_lockid.h.

Signed-off-by: Mark Fasheh <mfasheh@versity.com>
2017-08-23 18:12:54 -05:00
Mark Fasheh
9bfb9c059d scoutfs: copy struct ocfs2_lock_res
Grab this from fs/ocfs2/ocfs2.h and put it in dlmglue.h.

Signed-off-by: Mark Fasheh <mfasheh@versity.com>
2017-08-23 18:07:52 -05:00
Mark Fasheh
99d00a5a2f scoutfs: dlmglue needs to #include "dlmglue.h"
Signed-off-by: Mark Fasheh <mfasheh@versity.com>
2017-08-23 18:07:31 -05:00
Mark Fasheh
2142648906 scoutfs: include linux/dlm.h
dlmglue needs this as we're no longer hooking it into the stackglue
component.

Signed-off-by: Mark Fasheh <mfasheh@versity.com>
2017-08-23 17:59:10 -05:00
Mark Fasheh
498a2f3721 scoutfs: ifdef out usage of OCFS2_LOCK_TYPE_DENTRY
Some of this leaks through even after the big #ifdef'ing - ocfs2 had
to special case printing the name of dentry locks. We don't have such
a need so it's easy to drop those calls.

Signed-off-by: Mark Fasheh <mfasheh@versity.com>
2017-08-23 17:57:34 -05:00
Mark Fasheh
bf6020c22b scoutfs: hide lockdep_keys in dlmglue for now
This belongs behind #ifdef CONFIG_DEBUG_LOCK_ALLOC in the
upstream code too.

Signed-off-by: Mark Fasheh <mfasheh@versity.com>
2017-08-23 17:47:50 -05:00
Mark Fasheh
d4a89a5fbc scoutfs: dlmglue ifdef out ocfs2_build_lock_name()
This was missed in the initial #ifdef patch.

Signed-off-by: Mark Fasheh <mfasheh@versity.com>
2017-08-23 17:46:55 -05:00
Mark Fasheh
500baca533 scoutfs: wrap some mlog calls in dlmglue
Signed-off-by: Mark Fasheh <mfasheh@versity.com>
2017-08-23 17:15:23 -05:00
Mark Fasheh
eae932e0fe scoutfs: dlmglue fix sched.h header
Upstream moved linux/sched.h to linux/sched/signal.h. CentOS still uses
the old header location.

Signed-off-by: Mark Fasheh <mfasheh@versity.com>
2017-08-23 16:05:34 -05:00
Mark Fasheh
bc2fef7fc8 scoutfs: ifdef out ocfs2 specific callbacks and functions
We only want the generic stuff. Long term the Ocfs2 specific code would be
what's left in fs/ocfs2/dlmglue.[ch].

Signed-off-by: Mark Fasheh <mfasheh@versity.com>
2017-08-23 16:05:24 -05:00
Mark Fasheh
fc21a0253c scoutfs: Hook dlmglue into our build system
Signed-off-by: Mark Fasheh <mfasheh@versity.com>
2017-08-23 15:54:08 -05:00
Mark Fasheh
f7e3f6f9e6 scoutfs: import fs/ocfs2/dlmglue.[ch] from Linux v4.13-rc6
Signed-off-by: Mark Fasheh <mfasheh@versity.com>
2017-08-22 19:07:53 -05:00
Mark Fasheh
021404bb6a scoutfs: remove inode ctime index
Like the mtime index, this index is unused. Removing it is a near
identical task. Running the same createmany test from our last
patch gives us the following:

 $ createmany -o '/scoutfs/file_%lu' 10000000

 total: 10000000 creates in 598.28 seconds: 16714.59 creates/second

 real    9m58.292s
 user    0m7.420s
 sys     5m44.632s

So after both indices are gone, we go from a 12m56s run time to 9m58s,
saving almost 3 minutes, which translates into a total run time
reduction of about 23%.

Signed-off-by: Mark Fasheh <mfasheh@versity.com>
2017-08-22 15:59:13 -07:00
Mark Fasheh
d59367262d scoutfs: remove inode mtime index
This index is unused - we can gain some create performance by removing it.

To verify this, I ran createmany for 10 million files:

 $ createmany -o '/scoutfs/file_%lu' 10000000

Before this patch:
 total: 10000000 creates in 776.54 seconds: 12877.56 creates/second

 real    12m56.557s
 user    0m7.861s
 sys     6m56.986s

After this patch:
 total: 10000000 creates in 691.92 seconds: 14452.46 creates/second

 real    11m31.936s
 user    0m7.785s
 sys     6m19.328s

So removing the index gained us about a minute and a half on the test or a
12% performance increase.

Signed-off-by: Mark Fasheh <mfasheh@versity.com>
2017-08-22 15:59:13 -07:00
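
The percentages quoted in the two index-removal commits above can be re-derived from the createmany timings they report. The sketch below is not part of either commit; it only redoes the arithmetic, and the helper names are made up for illustration. It reports both the reduction in elapsed time and the gain in creates per second, since the two figures differ for the same pair of runs.

```python
# Elapsed seconds reported by createmany in the two commit messages.
baseline = 776.54   # both indices present
no_mtime = 691.92   # mtime index removed
no_both = 598.28    # mtime and ctime indices removed


def time_saved_pct(before, after):
    """Percentage reduction in elapsed time."""
    return (before - after) / before * 100.0


def rate_gain_pct(before, after):
    """Percentage increase in creates per second for the same amount of work."""
    return (before / after - 1.0) * 100.0


for label, after in (("mtime removed", no_mtime), ("both removed", no_both)):
    print(f"{label}: {baseline - after:.2f}s saved, "
          f"{time_saved_pct(baseline, after):.1f}% less time, "
          f"{rate_gain_pct(baseline, after):.1f}% more creates/second")
```

Under this reading, the "12%" in the mtime commit matches the gain in creates per second, while the "about 23%" in the ctime commit matches the reduction in elapsed time.
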
Zach Brown
8135b18c76 scoutfs: start truncate from first block
Truncation updates extents that intersect with the input range.  It
starts with the first block in the range and iterates until it has
searched for all the extents that could cover the range.

Extents are stored in items at their final block location so that we can
use _next to find intersections.  Truncation was searching for the next
extent after the full extent that it was still searching for.  That
means it was starting the search at the last block in the extent, not
the first.  It would miss all the extents that didn't overlap with the
last block it was searching for.

This is fixed by searching from a temporary single block extent at the
start of the search range.

Signed-off-by: Zach Brown <zab@versity.com>
2017-08-17 15:29:08 -07:00
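
The truncate bug described above can be modeled in user space. The following is a minimal sketch, not scoutfs code: extents are plain tuples, `item_next` is a hypothetical stand-in for the `_next` lookup, and the extent layout is invented. It assumes only what the message states, that items are keyed by their final block and that the search returns the next item at or after the search key.

```python
import bisect

# Each extent is (first_block, last_block); items are keyed by last_block.
extents = [(0, 9), (10, 19), (20, 29), (30, 39)]
keys = [last for _, last in extents]  # sorted because extents don't overlap


def item_next(key):
    """Index of the first extent whose last block is >= key, or None."""
    i = bisect.bisect_left(keys, key)
    return i if i < len(extents) else None


def intersecting(start, end, search_from):
    """Walk extents from search_from, collecting those overlapping [start, end]."""
    found = []
    i = item_next(search_from)
    while i is not None and i < len(extents) and extents[i][0] <= end:
        if extents[i][1] >= start:
            found.append(extents[i])
        i += 1
    return found


start, end = 5, 25
buggy = intersecting(start, end, search_from=end)    # starts at the range's last block
fixed = intersecting(start, end, search_from=start)  # starts at the range's first block
print("search from last block: ", buggy)
print("search from first block:", fixed)
```

In this model the search that starts from the last block only sees the extent that overlaps block 25, while the search that starts from the first block sees every extent intersecting the range.
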
Mark Fasheh
d1ae486d83 scoutfs: provide ->llseek
Without this we return -ESPIPE when a process tries to seek on a regular
file.

Signed-off-by: Mark Fasheh <mfasheh@versity.com>
2017-08-14 19:57:13 -07:00
Zach Brown
07bbc418c3 scoutfs: merge offline extents
Offline extents weren't being merged because they all had their physical
blkno set to 0 and all the extent calculations didn't treat them
specially.  They would only merge if the physical blocks of two extents
were contiguous.  Instead of special casing offline extents everywhere
we store them with a physical blkno set to the logical blk_off.  This
lets all the current extent calculations work as expected.

Signed-off-by: Zach Brown <zab@versity.com>
2017-08-14 09:19:03 -07:00
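
To see why storing the logical offset as the physical block number lets offline extents merge, consider a generic merge rule that requires both logical and physical contiguity. This is a sketch under assumptions: the `Extent` fields, the `offline` flag check, and `can_merge` are hypothetical and do not mirror the actual scoutfs structures.

```python
from dataclasses import dataclass


@dataclass
class Extent:
    blk_off: int           # logical start block
    blkno: int             # physical start block
    count: int             # length in blocks
    offline: bool = False


def can_merge(a, b):
    """Generic rule: same kind, logically and physically contiguous."""
    return (a.offline == b.offline
            and a.blk_off + a.count == b.blk_off
            and a.blkno + a.count == b.blkno)


def offline_extent(blk_off, count, blkno_is_logical):
    """Build an offline extent using either the old or the new blkno scheme."""
    blkno = blk_off if blkno_is_logical else 0
    return Extent(blk_off, blkno, count, offline=True)


# Old scheme: every offline extent has blkno 0, so physical contiguity fails.
old = can_merge(offline_extent(0, 8, False), offline_extent(8, 8, False))
# New scheme: blkno tracks blk_off, so adjacent offline extents look contiguous.
new = can_merge(offline_extent(0, 8, True), offline_extent(8, 8, True))
print(old, new)
```

The merge rule itself is unchanged between the two calls; only the stored blkno differs.
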
Zach Brown
7cc09761f5 scoutfs: release item cleanup needs transaction
Release tries to re-instate extents if it sees an error during release.
Those item manipulations need to be covered by the transaction.

Signed-off-by: Zach Brown <zab@versity.com>
2017-08-14 09:19:03 -07:00
Zach Brown
c7ad9fe772 scoutfs: make release block granular
The existing release interface specified byte regions to release but
that didn't match what the underlying file data mapping structure is
capable of.  What happens if you specify a single byte to release?  Does
it release the whole block?  Does it release nothing?  Does it return an
error?

By making the interface match the capability of the operation we make
the functioning of the system that much more predictable.  Callers are
forced to think about implementing their desires in terms of block
granular releasing.

Signed-off-by: Zach Brown <zab@versity.com>
2017-08-14 09:19:03 -07:00
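
The ambiguity of a byte-granular release interface can be shown with two equally plausible ways of mapping a byte range onto blocks. This is an illustrative sketch only: the 4096-byte block size and both helper functions are assumptions, not the scoutfs implementation.

```python
BLOCK_SIZE = 4096  # assumed block size, for illustration only


def blocks_fully_inside(offset, length):
    """Blocks entirely contained in the byte range (round inward)."""
    first = -(-offset // BLOCK_SIZE)        # ceiling division
    end = (offset + length) // BLOCK_SIZE   # exclusive
    return list(range(first, max(first, end)))


def blocks_touched(offset, length):
    """Blocks that contain at least one byte of the range (round outward)."""
    if length == 0:
        return []
    first = offset // BLOCK_SIZE
    end = -(-(offset + length) // BLOCK_SIZE)  # exclusive
    return list(range(first, end))


# A single byte at offset 100: one reading releases nothing, the other
# releases all of block 0.
print(blocks_fully_inside(100, 1), blocks_touched(100, 1))
```

An interface that takes block numbers directly never has to pick between these two readings.
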
Zach Brown
87ab27beb1 scoutfs: add statfs network message
The ->statfs method was still using the super_block in the super_info
that was read during mount.  This will get progressively more out
of date.

We add a network message to ask the server for the current fields that
impact statfs.  This is always racy and the fields are mostly nonsense,
but we try our best.

Signed-off-by: Zach Brown <zab@versity.com>
2017-08-11 10:43:35 -07:00
Zach Brown
ba7bde30fc scoutfs: delete inode index items
Delete inode index items when deleting all the items associated with an
inode after it's been unlinked and had all its references dropped.

The index items should always match the fields in the inode item so we
read it to determine the index items that should be deleted, regardless
of if we have the vfs inode cached or not.  We take the opportunity to
collapse the two callers of item deletion which looked up the inode into
item deletion so that it can use the inode fields.

The deletion of index items is partially verified by an inode index test
in xfstests which makes sure that unlinked files are no longer present
in the index.

Signed-off-by: Zach Brown <zab@versity.com>
2017-08-11 10:13:56 -07:00
Zach Brown
3768e3c41c scoutfs: don't add dirs to data_seq index
Directories were getting added to the data_seq index.  It might have
looked like they weren't because their data_seqs were always 0 but when
inodes are created they don't have 'have_item' set so all the fields are
added regardless of their current value.

We'd rather not have to wade through directories when looking for regular
file data in the data_seq index so let's explicitly test for regular
files when updating the data_seq index items.

Signed-off-by: Zach Brown <zab@versity.com>
2017-08-11 10:13:56 -07:00
Zach Brown
1398b2316d scoutfs: clean up racey inode index updates
The updating of the inode index items was racy.  It loaded the inode
values, updated the items, loaded the fields again, and then stored the
fields in the inode info.  All without locking.  Concurrent attempts
could get the fields scrambled and racing with other paths that update
the fields could get the items and inode info out of sync.

This fixes up the two races by only reading the inode fields once and
performing the multi-stage update under a mutex.  We add a new lock to
avoid ordering problems with trying to add an existing lock at these
points in the locking hierarchy.  We specifically use a mutex because
the item functions can block.

Now the inode index field update just has to safely race with concurrent
access to the fields.

This was found by generic/037 once getattr started refreshing the inode.
It now passes again.

Signed-off-by: Zach Brown <zab@versity.com>
2017-08-11 10:07:42 -07:00
Zach Brown
cdb58a967a scoutfs: give module fs scoutfs alias
Use MODULE_ALIAS_FS() to register the "scoutfs" fs alias so that
modprobe can find the module if it's installed and visible to depmod.

We don't yet have clever enough xfstests to mess around with modules.  I
manually verified this by installing the module in /lib/modules and
trying mount -t scoutfs before and after the change.

Signed-off-by: Zach Brown <zab@versity.com>
2017-08-10 18:07:26 -07:00
Zach Brown
c1b2ad9421 scoutfs: separate client and server net processing
The networking code was really suffering by trying to combine the client
and server processing paths into one file.  The code can be a lot
simpler by giving the client and server their own processing paths that
take their different socket lifecycles into account.

The client maintains a single connection.  Blocked senders work on the
socket under a sending mutex.  The recv path runs in work that can be
canceled after first shutting down the socket.

A long running server work function acquires the listener lock, manages
the listening socket, and accepts new sockets.  Each accepted socket has
a single recv work blocked waiting for requests.  That then spawns
concurrent processing work which sends replies under a sending mutex.
All of this is torn down by shutting down sockets and canceling work
which frees its context.

All this restructuring makes it a lot easier to track what is happening
in mount and unmount between the client and server.  This fixes bugs
where unmount was failing because the monolithic socket shutdown
function was queueing other work while it was running and draining.

Signed-off-by: Zach Brown <zab@versity.com>
2017-08-04 10:47:42 -07:00
Zach Brown
74a80b772e scoutfs: add endian_swap.h
Add a helper header for conversions between little and big endian.

Signed-off-by: Zach Brown <zab@versity.com>
2017-08-04 10:44:06 -07:00
Zach Brown
b98f97e143 scoutfs: use hlist hash for data cursors
The rhashtable API has changed over time.  Continuing to use it means
having to worry about maintaining different APIs in different kernel
generations.

We have a static pool of cursors so we don't need the flexibility of the
resizable rhashtable.  We can roll a simple array of hlist heads to use
as a hash table.

And finally, these cursors will probably disappear eventually anyway.
Let's not invest too much in them.

Signed-off-by: Zach Brown <zab@versity.com>
2017-08-04 10:44:06 -07:00
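
The "simple array of hlist heads" mentioned above is a fixed-size chained hash table. The sketch below models the idea in user space; `CursorTable`, its bucket count, and the integer keys are all hypothetical, and Python lists stand in for the kernel's hlist chains.

```python
NR_BUCKETS = 64  # assumed size; fixed up front, unlike a resizable table


class CursorTable:
    """Fixed array of chains, analogous to an array of hlist heads."""

    def __init__(self, nr_buckets=NR_BUCKETS):
        self.buckets = [[] for _ in range(nr_buckets)]

    def _bucket(self, key):
        # Pick the chain for this key; the array never grows or shrinks.
        return self.buckets[hash(key) % len(self.buckets)]

    def insert(self, key, cursor):
        self._bucket(key).append((key, cursor))

    def lookup(self, key):
        for k, cursor in self._bucket(key):
            if k == key:
                return cursor
        return None

    def remove(self, key):
        bucket = self._bucket(key)
        bucket[:] = [(k, c) for k, c in bucket if k != key]


table = CursorTable()
table.insert(7, "cursor-a")
table.insert(7 + NR_BUCKETS, "cursor-b")  # collides with key 7 in this model
print(table.lookup(7), table.lookup(7 + NR_BUCKETS))
```

With a static pool of entries, chain length stays bounded by the pool size, which is why a table that never resizes is sufficient here.
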
Zach Brown
9f4095bffb scoutfs: break the build if we export raw types
Raw [su]{8,16,32,64} types keep leaking into our exported headers where
they break userspace builds.  Make sure that we only use the exported __
types and add a check to break our build if we get it wrong.

Signed-off-by: Zach Brown <zab@versity.com>
2017-08-04 10:37:49 -07:00
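
A check of the kind described above can be approximated with a pattern match over header text. This sketch is not the check the commit adds; the regular expression, the `raw_type_lines` helper, and the sample struct are assumptions chosen to illustrate the idea.

```python
import re

# Matches kernel-internal fixed-width types such as u8 or s64, but not the
# exported __u8 / __s64 forms: the leading underscores are word characters,
# so \b cannot match directly in front of the "u" or "s".
RAW_TYPE = re.compile(r"\b[su](?:8|16|32|64)\b")


def raw_type_lines(header_text):
    """Return (line_number, line) pairs that use a raw type."""
    return [(n, line)
            for n, line in enumerate(header_text.splitlines(), 1)
            if RAW_TYPE.search(line)]


# Hypothetical exported header with one offending field.
header = """\
struct example_ioctl_args {
    __u64 ino;
    u32 flags;
    __le16 len;
};
"""

for n, line in raw_type_lines(header):
    print(f"line {n}: raw type in exported header: {line.strip()}")
```

A build could fail whenever this list is non-empty. Note that this simple pattern would also flag raw type names that appear inside comments.
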
Zach Brown
cefe06af61 scoutfs: add git describe to built module
It's handy to quickly find the git commit that built a given module.  We
add a MOD_INFO() tag for it so we can see it in modinfo on the built
module.  We add an ELF note that the kernel tracks in
/sys/module/$m/notes/ when the module is loaded.

Signed-off-by: Zach Brown <zab@versity.com>
2017-08-03 15:07:23 -07:00
Zach Brown
6d16034112 scoutfs: remove old dlm make -I
We don't need arguments for a dlm build.

Signed-off-by: Zach Brown <zab@versity.com>
2017-08-03 15:07:23 -07:00
Zach Brown
65c3ac5043 scoutfs: Add cluster locking to node/file ops
This gives us cluster locking for the overwhelming majority of metadata ops
that scoutfs supports. In particular, we can create and modify file metadata
from one node and immediately see the changes reflected on another node.

In addition to synchronization, the cluster locks here are providing an I/O
endpoint for our item cache, ensuring that it doesn't read stale items.

Readdir and file read/write are notable exceptions - they require a more
specific approach and will be implemented in a future patch.

Signed-off-by: Mark Fasheh <mfasheh@versity.com>
[fixed iget unlock and truncated commit message summary]
Signed-off-by: Zach Brown <zab@versity.com>
2017-08-03 11:16:35 -07:00
Zach Brown
172cff5537 scoutfs: return -ENODATA from getxattr
The conversion to the multi-item xattrs accidentally returned -EIO when
an attribute wasn't found instead of -ENODATA.  That broke a huge number
of xfstests because ls can look up xattrs and return EIO.

Signed-off-by: Zach Brown <zab@versity.com>
2017-08-02 11:16:12 -07:00
Mark Fasheh
325eadca9f scoutfs: check for NULL lock in scoutfs_unlock
This reduces the amount of duplicate code in callers and makes error
handling easier. The alternative is to sprinkle the code with 'if (lock)'
lines at the end of our functions.

Signed-off-by: Mark Fasheh <mfasheh@versity.com>
2017-07-27 12:33:21 -07:00
Mark Fasheh
4ff2148f10 scoutfs: Don't use stale root in get_manifest_refs
get_manifest_refs was using the btree root in its stale copy of the
super block.  It is supposed to use the btree root that it was given by
its caller who went to the trouble of finding a sufficiently current
btree root.

Signed-off-by: Mark Fasheh <mfasheh@versity.com>
[zab: added commit message and fixed formatting]
Signed-off-by: Zach Brown <zab@versity.com>
2017-07-27 12:32:05 -07:00
Mark Fasheh
a65b28d440 scoutfs: lock impossible ino group for listen lock
Otherwise we get into a problem where the listen lock is conflicting with
regular inode group requests. Since we never drop the listen lock and it (by
design) blocks progress on another node, those inode group requests may
hang.

Signed-off-by: Mark Fasheh <mfasheh@versity.com>
2017-07-19 19:04:41 -05:00
Mark Fasheh
2d11f08f5e scoutfs: Remove unused functions, scoutfs_[un]lock_addr
Signed-off-by: Mark Fasheh <mfasheh@versity.com>
2017-07-19 19:04:41 -05:00
Zach Brown
13ebd8d18c scoutfs: don't use delayed downconvert work
The delayed downconvert work wasn't being canceled on shutdown.  60s
after unmount at least the net lock's timer would fire and crash trying
to queue the delayed work on the destroyed workqueue.

Proactively unlocking the locks isn't always beneficial to begin with.
The relative costs of mispredicting the future are wildly different if
we have to re-read item caches from segments or have to downconvert a
blocking read lock.

So we can just remove the delayed work to fix the bug and remove a
moving piece that would need to be considered and tuned.  There's still
a race where we can get basts after destroying the workqueue but before
we destroy the lockspace; we'll get there.

Signed-off-by: Zach Brown <zab@versity.com>
2017-07-19 13:30:03 -07:00
Zach Brown
47b26d7888 scoutfs: add end to _item_delete
Add the end argument to scoutfs_item_delete() to limit how many items it
will read into the cache.

Signed-off-by: Zach Brown <zab@versity.com>
2017-07-19 13:30:03 -07:00
Zach Brown
d5b4677e7f scoutfs: add end to _dirty, _delete_many, _update
These transformations are mechanical and there aren't many callers of
these so we combine them into one commit.

Signed-off-by: Zach Brown <zab@versity.com>
2017-07-19 13:30:03 -07:00
Zach Brown
d78ed098a7 scoutfs: add cache reading limit to _set_batch
Add an end argument to _set_batch to specify the limit of
items we'll read into the cache.

And it turns out that the loop in _set_batch that was meant to cache all the
items covered by the batch didn't try hard enough.  It would stop once
the first key was covered but didn't make sure that the coverage
extended to cover last.  This can happen if segment boundaries happen to
fall within the items that make up the batch.  Fix it up while we're in
here.

Signed-off-by: Zach Brown <zab@versity.com>
2017-07-19 13:30:03 -07:00
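
The coverage bug fixed in _set_batch can be modeled as a loop over segment reads. The sketch below is an assumption-laden illustration: the segment layout, `read_segment`, and both caching functions are hypothetical, and keys are plain integers rather than scoutfs keys.

```python
import bisect

# Hypothetical segments, each covering an inclusive key range.
SEGMENTS = [(0, 99), (100, 199), (200, 299)]
STARTS = [first for first, _ in SEGMENTS]


def read_segment(key):
    """Return the (first, last) range cached by reading the segment holding key."""
    seg = SEGMENTS[bisect.bisect_right(STARTS, key) - 1]
    if not seg[0] <= key <= seg[1]:
        raise KeyError(key)
    return seg


def cache_first_only(first, last):
    """Old behaviour in this model: stop once the first key is covered."""
    return [read_segment(first)]


def cache_whole_batch(first, last):
    """Keep reading until the cached ranges extend through last."""
    cached = []
    key = first
    while key <= last:
        rng = read_segment(key)
        cached.append(rng)
        key = rng[1] + 1  # continue just past what is now cached
    return cached


def covers(cached, first, last):
    """True if the cached ranges span everything from first through last."""
    return bool(cached) and cached[0][0] <= first and cached[-1][1] >= last


first, last = 90, 210  # a batch that straddles two segment boundaries
print(covers(cache_first_only(first, last), first, last))
print(covers(cache_whole_batch(first, last), first, last))
```

The first call stops after one segment and leaves the tail of the batch uncached; the second keeps going until the last key is covered.
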
Zach Brown
0b64a4c83f scoutfs: lock inode index item iteration
Add locks around inode index item iteration.  This is tricky because the
inode index items are enormous and we can't default to coarse locks that
let it read and iterate over the entire key space.  We use the manifest
to find the next small fixed size region to lock and iterate from.

Signed-off-by: Zach Brown <zab@versity.com>
2017-07-19 13:30:03 -07:00
Zach Brown
f611c769e2 scoutfs: add 'end' to item_next to limit reads
Add an end key to the item_next calls to limit how many items will be
read into the cache.  Callers typically get this from the lock they hold
that covers the iteration.  We differentiate between iteration and
caching so that a series of small iterations (listxattr on inodes,
namespace walk in small dirs) can be satisfied by a single read of
adjacent items from segments.

Signed-off-by: Zach Brown <zab@versity.com>
2017-07-19 13:30:03 -07:00
Zach Brown
4f6f842efa scoutfs: add inode index item locking
Add a locking wrapper for the inode index items.  It maps the index
fields to a lock name for each index type.

Signed-off-by: Zach Brown <zab@versity.com>
2017-07-19 13:30:03 -07:00