Commit Graph

473 Commits

Author SHA1 Message Date
Zach Brown
a30f0bf82f scoutfs: stop spurious lockdep warning from dlm
The fs/dlm code has a harmless but unannotated inversion between
connection and socket locking that triggers during shutdown and disables
lockdep.  We don't want it to mask our warnings during testing that may
happen after the first shared unmount so we disable lockdep around the
dlm shutdown.  It's not ideal but then neither are distro kernels that
ship with lockdep warnings.

Signed-off-by: Zach Brown <zab@versity.com>
2017-10-12 14:51:30 -07:00
Zach Brown
8dee30047c scoutfs: fix xattr trans reservation
The xattr trans reservation assumed that it was only dirtying items for
the new xattr size.  It didn't account for dirty deletion items for
parts from a larger previous xattr.

With this fixed generic/070 no longer triggers warnings.

Signed-off-by: Zach Brown <zab@versity.com>
2017-10-12 13:59:52 -07:00
Zach Brown
cb879d9f37 scoutfs: add network greeting message
Add a network greeting message that's exchanged between the client and
server on every connection to make sure that we have the correct file
system and format hash.

Signed-off-by: Zach Brown <zab@versity.com>
2017-10-12 13:57:31 -07:00
Zach Brown
ce4daa817a scoutfs: add support for format_hash
Calculate the hash of format.h and ioctl.h and make sure the hash stored
in the super during mkfs matches our calculated hash on mount.

Signed-off-by: Zach Brown <zab@versity.com>
2017-10-12 13:57:31 -07:00
Zach Brown
80a4b7df2c scoutfs: move btree parent min to format.h
mkfs needs to know the size of the largest btree when figuring out how
big to make the ring.  It needs to know how few items we can have in
parent blocks and to know that it needs to know how empty the blocks can
get.

Signed-off-by: Zach Brown <zab@versity.com>
2017-10-12 13:57:31 -07:00
Zach Brown
8bbb859f0c scoutfs: move scoutfs_ioctl definition
We're going to be strictly enforcing matching format.h and ioctl.h
between userspace and kernel space.  Let's get the exported kernel
function definition out of ioctl.h.

Signed-off-by: Zach Brown <zab@versity.com>
2017-10-12 13:57:31 -07:00
Zach Brown
d593e2caa0 scoutfs: warn if we read items without cache limit
All the item ops now know the limit of the items they're allowed to read
into the cache.  Warn if someone asks to read items without knowing
how much they're allowed to read based on their lock coverage.

Signed-off-by: Zach Brown <zab@versity.com>
2017-10-09 15:31:29 -07:00
Zach Brown
365048b785 scoutfs: add full lock arg to _item_set_batch()
Add the full lock arg to _item_set_batch() so that it can verify lock
coverage.

Signed-off-by: Zach Brown <zab@versity.com>
2017-10-09 15:31:29 -07:00
Zach Brown
9b31c9795b scoutfs: add full lock arg to _item_delete()
Add the full lock arg to _item_delete() so that it can verify lock
coverage.

Signed-off-by: Zach Brown <zab@versity.com>
2017-10-09 15:31:29 -07:00
Zach Brown
6cd64f3228 scoutfs: add full lock arg to _item_update()
Add the full lock arg to _item_update() so that it can verify lock
coverage.

Signed-off-by: Zach Brown <zab@versity.com>
2017-10-09 15:31:29 -07:00
Zach Brown
0aa16f5ef6 scoutfs: add lock arg to _item_create()
scoutfs_item_create() hasn't been working with lock coverage.  It
wouldn't return -ENOENT if it didn't have the lock cached.  It would
create items outside lock coverate so they wouldn't be invalidated and
re-read if another node modified the item.

Add a lock arg and teach it to populate the cache so that it's correctly
consistent.

Signed-off-by: Zach Brown <zab@versity.com>
2017-10-09 15:31:29 -07:00
Zach Brown
47f5946c90 scoutfs: fix lock name comparison
The lock name comparison had a typo where it didn't compare the second
fields between the two names.  Only inode index items used the second
field.  This bug could cause lock matching when the names don't match
and trigger lock coverage warnings.

While we're in there don't rely so heavily on readers knowing the
relative precedence of subtraction and (magical gcc empty) ternary
operators.

Signed-off-by: Zach Brown <zab@versity.com>
2017-10-09 15:31:29 -07:00
Zach Brown
950436461a scoutfs: add lock coverage for inode index items
Add lock coverage for inode index items.

Sadly, this isn't trivial. We have to predict the value of the indexed
fields before the operation to lock those items.  One value in
particular we can't reliably predict: the sequence of the transaction we
enter after locking.  Also operations can create an absolute ton of
index item updates -- rename can modify nr_inodes * items_per_inode * 2
items, so maybe 24 today.  And these items can be arbitrarily positioned
in the key space.

So to handle all this we add functions to gather predicted item values
we'll need to lock sort and lock them all, then pass appropriate locks
down to the item functions during inode updates.

The trickiest bit of the index locking code is having to retry if the
sequence number changes.  Preparing locks has to guess the sequence
number of its upcoming trans and then makes item update decisions based
on that.  If we enter and have a different sequence number then we need
to back off and retry with the correct sequence number (we may find that
we'll need to update the indexed meta seq and need to have it locked).

The use of the functions is straight forward.  Sites figure out the
predicted sizes, lock, pass the locks to inode updates, and unlock.

While we're at it we replace the individual item field tracking
variables in the inode info with an array of indexed values.  The code
ends up a bit nicer.  It also gets rid of the indexed time fields that
were left behind and were unused.

It's worth noting that we're getting exclusive locks on the index
updates.  Locking the meta/data seq updates results in complete global
serialization of all changes.  We'll need concurrent writer locks to get
concurrency back.

Signed-off-by: Zach Brown <zab@versity.com>
2017-10-09 15:31:29 -07:00
Zach Brown
960bc4d53c scoutfs: add lock coverage for stage ioctl
Signed-off-by: Zach Brown <zab@versity.com>
2017-10-09 15:31:29 -07:00
Zach Brown
aa70903154 scoutfs: add lock coverage for data paths
Use per_task storage on the inode to pass locks from high level read and
write lock holders down into the callbacks that operate under the locks
so that the locks can then be passed to the item functions.

Signed-off-by: Zach Brown <zab@versity.com>
2017-10-09 15:31:29 -07:00
Zach Brown
c3e690a1ac scoutfs: add per_task storage helper
Add some functions for storing and using per-task storage in a list.
Callers can use this to pass pointers to children in a given scope when
interfaces don't allow for passing individual arguments amongst
concurrent callers in the scope.

Signed-off-by: Zach Brown <zab@versity.com>
2017-10-09 15:31:29 -07:00
Zach Brown
0535e249d1 scoutfs: add lock arg to scoutfs_update_inode_item
Add a full lock argument to scoutfs_update_inode_item() and use it to
pass the lock's end key into item_update().  This'll get changed into
passing the full lock into _update soon.

Signed-off-by: Zach Brown <zab@versity.com>
2017-10-09 15:31:29 -07:00
Zach Brown
32a68e84cf scoutfs: add full lock coverage to _item_dirty()
Add the full lock argument to _item_dirty() so that it can verify lock
coverage in addition to limiting item cache population to the range
covered by the lock.

This also ropes in scoutfs_dirty_inode_item() which is a thin wrapper
around _item_dirty();

Signed-off-by: Zach Brown <zab@versity.com>
2017-10-09 15:31:29 -07:00
Zach Brown
1c6e3e39bf scoutfs: add full lock coverage to _item_next*()
Add the full lock argument to _item_next*() so that it can verify lock
coverage in addition to limiting item cache population to the range
covered by the lock.

Signed-off-by: Zach Brown <zab@versity.com>
2017-10-09 15:31:29 -07:00
Zach Brown
0e4627ea65 scoutfs: add locking of link backref traversal
Add cluster locking around the link backref item lookups during ino
to path traversal.

Signed-off-by: Zach Brown <zab@versity.com>
2017-10-09 15:31:29 -07:00
Zach Brown
3a277bac6f scoutfs: protect orphan items with node_id_lock
Orphan processing only works with orphans on its node today.  Protect
that orphan item use with the node_id lock.

Signed-off-by: Zach Brown <zab@versity.com>
2017-10-09 15:31:29 -07:00
Zach Brown
b2668fee9a scoutfs: protect node free block items
Now that we have a long-lived node_id lock we can use it to protect the
free block items in the node zone.

Signed-off-by: Zach Brown <zab@versity.com>
2017-10-09 15:31:29 -07:00
Zach Brown
9e3954a918 scoutfs: add lock around data item truncation
Add cluster lock coverage to scoutfs_data_truncate_items() and plumb the
lock down into the item functions.

Signed-off-by: Zach Brown <zab@versity.com>
2017-10-09 15:31:29 -07:00
Zach Brown
55709c4345 scoutfs: add lock coverage testing to item_lookup*
Let's give the item functions the full lock so that they can make sure
that the lock has coverage for the keys involved in the operation.

This _lookup*() conversion is first so it adds the
lock_coverager() helper.

Signed-off-by: Zach Brown <zab@versity.com>
2017-10-09 15:31:29 -07:00
Zach Brown
1193fbc9c5 scoutfs: add a node_id lock
A mount's node_id item operations need to be locked.  For now let's
use a lock that's held for the duration of the mount.  It makes
it trivial for us to use it with node_id items but we'll have work
to do if we want to opportunistically get access to other mount's
node_id items while they're still up.

Signed-off-by: Zach Brown <zab@versity.com>
2017-10-09 15:31:29 -07:00
Zach Brown
85dbc21dc6 scoutfs: use lock end keys in rename verification
scoutfs_rename() looks for dirents again after acquiring cluster locks.
It needs to pass in the lock end keys to limit the items that are read
into the cache.

Signed-off-by: Zach Brown <zab@versity.com>
2017-10-09 15:31:29 -07:00
Zach Brown
1da18d17cf scoutfs: use trylock for global server lock
Shared unmount hasn't worked for a long time because we didn't have the
server work woken out of blocking trying to acquire the lock.  In the
old lock code the wait conditions didn't test ->shutdown.

dlmglue doesn't give us a reasonable way to break a caller out of a
blocked lock.  We could add some code to do it with a global context
that'd have to wake all locks or add a call with a lock resource name,
not a held lock, that'd wake that specific lock.  Neither sound great.

So instead we'll use trylock to get the server lock.  It's guaranteed to
make reasonble forward progress.  The server work is already requeued
with a delay to retry.

While we're at it we add a global server lock instead of using the weird
magical inode lock in the fs space.  The server lock doesn't need keys
or to participate in item cache consistency, etc.

With this unmount works.  All mounts will now generate regular
background trylock requests.

Signed-off-by: Zach Brown <zab@versity.com>
2017-10-09 15:31:29 -07:00
Mark Fasheh
28a6b82690 scoutfs: allow some recursive locking in dlmglue
Scoutfs can get into a situation where it wants to acquire a lock twice.
This happens when we have a parent and child in the same inode group. A
create operation will lock that group twice, once for each inode.
Instead of forcing callers to remember which inode groups they've
locked, we handle this internally within dlmglue.

Add a dlmglue lock type flag that indicates we might recursively lock a
resource. During locking, when dlmglue sees that flag and we already
have the lock at an appropriate level, it will allow the lock operation
to continue even when the lock is marked blocking.

Signed-off-by: Mark Fasheh <mfasheh@versity.com>
2017-10-04 08:55:14 -07:00
Mark Fasheh
c5e6676b04 scoutfs: remove some ifdef'd out dlmglue code
We can dump the ocfs2-specifics as well as any definitions that have been
exported via the header file. This makes reading through and modifying
dlmglue much more palatable.

Signed-off-by: Mark Fasheh <mfasheh@versity.com>
2017-10-04 08:55:14 -07:00
Mark Fasheh
17c6025cb7 scoutfs: clean up some comments in lock.c
We finished the lock lru work and can remove these TODO comments.

Signed-off-by: Mark Fasheh <mfasheh@versity.com>
2017-10-04 08:55:14 -07:00
Zach Brown
b6c592f099 scoutfs: don't dirty btree buffers
The btree sets some buffer head flags before it writes them to satisfy
submit_bh() requirements.  It was setting dirty which wasn't required.
Nothing every cleared dirty so those buffers sat around and were never
freed.  Each btree block we wrote sat around forever.  Eventually the vm
gets clogged up and the world backs up trying to allocate pages to
write and we see massive stalls.

With this fix we no longer see the 'buffers' vm stat continously grow
and IO rates are consistent over time.

Signed-off-by: Zach Brown <zab@versity.com>
2017-09-28 13:59:49 -07:00
Zach Brown
4bc565be39 scoutfs: silence bulk_alloc gcc warning
Some versions of gcc correctly noticed that bulk_alloc() had a
case where it wouldn't initialize ret if the first segno was 0.  This
won't happen because the client response processing returns an error in
this case.  So this just shuts up the warning.

Signed-off-by: Zach Brown <zab@versity.com>
2017-09-28 13:59:49 -07:00
Zach Brown
c5ddec7058 scoutfs: more aggressively shrink items
The old item shrinking was very conservative.  It would only try and
reclaim items from the front of the range of the oldest items in the
lru.  It would stop making progress if all the items in the front of the
lru are in a range whose initial item can't be reclaimed.

With the item shrinking not making progress memory fills with items.
Eventually the system backs up behind an allocation during segment
writing blocking waiting for free pages.

We fix this by much more aggressively shrinking items.  We now look for
a region of items around the oldest items to shrink.  If those fall in
the middle of a range then we use the memory from the items to construct
a new range and split the existing range.  Now the only way we'll refuse
to shrink items is if they're dirty.   We have a reasonably small cap on
the number of dirty items so we shoudln't get stuck.

Signed-off-by: Zach Brown <zab@versity.com>
2017-09-28 13:59:49 -07:00
Zach Brown
fd509840d4 scoutfs: use pages for seg shrink object count
The VM wasn't very excited about trying to reclaim our seg count when we
returned small count of the number of large segment objects available
for reclaim.  Each segment represents a ton of memory so we want to give
the VM more visibility into the scale of the cache to encourage it to
shrink it.  We define the object count for the seg shrinker as the
number of pages of segments in the lru.

Signed-off-by: Zach Brown <zab@versity.com>
2017-09-28 13:59:49 -07:00
Zach Brown
ccefffe74f scoutfs: add item, range, lock alloc/free counters
Add some counters to track allocation and freeing of our structures that
are subject to shrinking.  This lets us eyeball the counters to see if
we have runaway leaks.

Signed-off-by: Zach Brown <zab@versity.com>
2017-09-28 13:59:49 -07:00
Zach Brown
15aa09b0c2 scoutfs: add shrink exit trace points
Add trace points that show the incoming nr_to_scan and resulting object
count for shrinker calls.

Signed-off-by: Zach Brown <zab@versity.com>
2017-09-28 13:59:49 -07:00
Mark Fasheh
43a2d63f79 scoutfs: replace trace_printk in bio.c
Signed-off-by: Mark Fasheh <mfasheh@versity.com>
2017-09-28 13:59:49 -07:00
Mark Fasheh
e67e500940 scoutfs: turn off tracing in dlmglue.c
Put this behind a #define. Leave the asserts (mlog_bug_on_msg) though
and redefine their macro to printk instead of going to the trace
buffer.

Signed-off-by: Mark Fasheh <mfasheh@versity.com>
2017-09-28 13:59:49 -07:00
Mark Fasheh
3430edb60b scoutfs: replace trace_printk in item.c
Signed-off-by: Mark Fasheh <mfasheh@versity.com>
2017-09-28 13:59:49 -07:00
Mark Fasheh
2a07e6f642 scoutfs: replace trace_printk in data.c
Signed-off-by: Mark Fasheh <mfasheh@versity.com>
2017-09-28 13:59:49 -07:00
Mark Fasheh
deecfa0ad5 scoutfs: replace trace_printk in trans.c
Signed-off-by: Mark Fasheh <mfasheh@versity.com>
2017-09-28 13:59:49 -07:00
Mark Fasheh
285842086d scoutfs: replace trace_printk in ioctl.c
Signed-off-by: Mark Fasheh <mfasheh@versity.com>
2017-09-28 13:59:49 -07:00
Mark Fasheh
8ad6ff9d41 scoutfs: replace trace_printk in inode.c
Signed-off-by: Mark Fasheh <mfasheh@versity.com>
2017-09-28 13:59:49 -07:00
Mark Fasheh
44a19b63c0 scoutfs: replace trace_printk in segment.c
Signed-off-by: Mark Fasheh <mfasheh@versity.com>
2017-09-28 13:59:49 -07:00
Mark Fasheh
cf3f9fee75 scoutfs: replace trace_printk in lock.c
Also clean up these traces a bit and make a lock_info trace class
which we can expand in a future patch.

Signed-off-by: Mark Fasheh <mfasheh@versity.com>
2017-09-28 13:59:49 -07:00
Mark Fasheh
7739a0084e scoutfs: replace trace_printk in xattr.c
Signed-off-by: Mark Fasheh <mfasheh@versity.com>
2017-09-28 13:59:49 -07:00
Mark Fasheh
87adeb9306 scoutfs: replace trace_printk in manifest.c
Signed-off-by: Mark Fasheh <mfasheh@versity.com>
2017-09-28 13:59:49 -07:00
Mark Fasheh
0d28930271 scoutfs: replace trace_printk in super.c
Signed-off-by: Mark Fasheh <mfasheh@versity.com>
2017-09-28 13:59:49 -07:00
Mark Fasheh
a5283e6f2c scoutfs: replace trace_printk in dir.c
Signed-off-by: Mark Fasheh <mfasheh@versity.com>
2017-09-28 13:59:49 -07:00
Mark Fasheh
2c1f117bef scoutfs: replace trace_printk in compact.c
Signed-off-by: Mark Fasheh <mfasheh@versity.com>
2017-09-28 13:59:49 -07:00