Commit Graph

608 Commits

Author SHA1 Message Date
Zach Brown
ae6907623c scoutfs: add btree rw error traces and counters
Add some trivial traces and counters around btree block IO errors.

Signed-off-by: Zach Brown <zab@versity.com>
2018-05-01 11:48:19 -07:00
Zach Brown
24cc5cc296 scoutfs: lock manifest root request
The manifest root request processing samples the stable_manifest_root in
the server info.  The stable_manifest_root is updated after a
commit has succeeded.

The read of stable_manifest_root in request processing was locking the
manifest.  The update during commit doesn't lock the manifest so these
paths were racing.  The race is very tight, a few cpu stores, but it
could in theory give a client a malformed root that could be
misinterpreted as corruption.

Add a seqcount around the store of the stable manifest root during
commit and its load during request processing.  This ensures that
clients always get a consistent manifest root.
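
A minimal sketch of the seqcount pattern, with illustrative struct and
field names rather than the actual scoutfs code:

  #include <linux/seqlock.h>

  struct server_info {
          seqcount_t root_seqcount;                       /* illustrative */
          struct scoutfs_btree_root stable_manifest_root;
  };

  /* commit: publish the new stable root inside the write seqcount;
   * commits are already serialized against each other */
  static void store_stable_root(struct server_info *sinf,
                                struct scoutfs_btree_root *root)
  {
          write_seqcount_begin(&sinf->root_seqcount);
          sinf->stable_manifest_root = *root;
          write_seqcount_end(&sinf->root_seqcount);
  }

  /* request processing: retry the copy until it reads a consistent root */
  static void read_stable_root(struct server_info *sinf,
                               struct scoutfs_btree_root *root)
  {
          unsigned int seq;

          do {
                  seq = read_seqcount_begin(&sinf->root_seqcount);
                  *root = sinf->stable_manifest_root;
          } while (read_seqcount_retry(&sinf->root_seqcount, seq));
  }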

Signed-off-by: Zach Brown <zab@versity.com>
2018-04-27 09:06:35 -07:00
Zach Brown
7d7f8e45b7 scoutfs: more carefully manage private bh bits
The management of _checked and _valid_crc private bits in the
buffer_head wasn't quite right.

_checked indicates that the block has been checked and that the
expensive crc verification doesn't need to be recalculated.  _valid_crc
then indicates the result of the crc verification.

_checked is read without locks.  First, we didn't make sure that
_valid_crc was stored before _checked.  Multiple tasks could race to see
_checked before _valid_crc.  So we add some memory barriers.

Then we didn't clear _checked when re-reading a stale block.  This meant
that the moment the block was read its private flags could still
indicate that it had a valid crc.  We clear the private bits before we
read so that we'll recalculate the crc.
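
A rough sketch of the intended ordering, using illustrative bit names and
a hypothetical verify_block_crc() helper:

  #include <linux/buffer_head.h>

  /* illustrative private bit numbers starting at BH_PrivateStart */
  enum {
          BH_ScoutfsChecked = BH_PrivateStart,
          BH_ScoutfsValidCrc,
  };

  static bool verify_block_crc(struct buffer_head *bh);  /* hypothetical */

  static bool block_crc_ok(struct buffer_head *bh)
  {
          if (!test_bit(BH_ScoutfsChecked, &bh->b_state)) {
                  bool valid = verify_block_crc(bh);

                  if (valid)
                          set_bit(BH_ScoutfsValidCrc, &bh->b_state);
                  /* make sure _valid_crc is visible before _checked */
                  smp_mb__before_atomic();
                  set_bit(BH_ScoutfsChecked, &bh->b_state);
                  return valid;
          }

          /* pairs with the write barrier before trusting _valid_crc */
          smp_rmb();
          return test_bit(BH_ScoutfsValidCrc, &bh->b_state);
  }

  /* clear the private bits before re-reading a stale block */
  static void invalidate_block_bits(struct buffer_head *bh)
  {
          clear_bit(BH_ScoutfsChecked, &bh->b_state);
          clear_bit(BH_ScoutfsValidCrc, &bh->b_state);
  }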

Signed-off-by: Zach Brown <zab@versity.com>
2018-04-27 09:06:35 -07:00
Zach Brown
fe8b155061 scoutfs: add btree corruption messages
Signed-off-by: Zach Brown <zab@versity.com>
2018-04-27 09:06:35 -07:00
Zach Brown
3efcc87413 scoutfs: add corruption messages for namei
Add scoutfs_corruption() calls for corruption associated with mapping
names to inodes.

Signed-off-by: Zach Brown <zab@versity.com>
2018-04-27 09:06:35 -07:00
Zach Brown
c9573d13bb scoutfs: add scoutfs_corruption()
Add a helper for printing a message warning about corruption.
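
A minimal sketch of the kind of wrapper this describes; the message
format and arguments are assumptions, not the actual implementation:

  #define scoutfs_corruption(sb, fmt, args...)                          \
          printk(KERN_ERR "scoutfs %s: corruption detected: " fmt "\n", \
                 (sb)->s_id, ##args)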

Signed-off-by: Zach Brown <zab@versity.com>
2018-04-27 09:06:35 -07:00
Zach Brown
ac259c82a0 scoutfs: allow interrupting client sends
Waiting for replies to sent requests wasn't interruptible.  This was
preventing ctrl-c from breaking out of mount when a server wasn't yet
around to accept connections.

The only complication was that the receive thread was accessing the
sender's struct outside of the lock.  An interrupted sender could remove
their struct while receive was processing it.  We rework recv processing
so that it only uses the sender struct under the lock.  This introduces
a cpu copy of the payload but they're small and relatively infrequent
control messages.
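
The wait side of that change might look roughly like this sketch; the
request struct and its fields are assumptions:

  /* sleep until the reply arrives, but let signals (e.g. ctrl-c during
   * mount) break out of the wait */
  ret = wait_event_interruptible(rq->waitq, rq->reply_arrived);
  if (ret)
          return ret;     /* -ERESTARTSYS when interrupted */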

Signed-off-by: Zach Brown <zab@versity.com>
2018-04-13 15:49:14 -07:00
Zach Brown
8061a5cd28 scoutfs: add server bind warning
Emit an error message if the server fails to bind.  It can mean that
there is a misconfigured address.  But we might want to be able to bind
if the address becomes available, so we don't hard error.  We only emit
the message once for a series of failures.

Signed-off-by: Zach Brown <zab@versity.com>
2018-04-13 15:49:14 -07:00
Zach Brown
81b3159508 scoutfs: return errors from read_items
The introduction of the helper to handle stale segment retrying was
masking errors.  It's meant to pass through the caller's return status
when it doesn't return -EAGAIN to trigger stale read retries.

Signed-off-by: Zach Brown <zab@versity.com>
2018-04-13 15:49:14 -07:00
Zach Brown
676d1e32ef scoutfs: more carefully trace backref walk loop
We were only issuing one kernel warning when we couldn't resolve a path
to an inode due to excessive retries.  It was hard to capture and we
only saw details from the first instance.

This adds a counter for each time we see excessive retries and returns
-ELOOP in that case.  We also extend the link backref adding trace point
to include the found entry, if any.

Signed-off-by: Zach Brown <zab@versity.com>
2018-04-13 10:09:31 -07:00
Zach Brown
c118f7cc03 scoutfs: add option to force tiny btree blocks
Add a tunable option to force using tiny btree blocks on an active
mount.  This lets us quickly exercise large btrees.

Signed-off-by: Zach Brown <zab@versity.com>
2018-04-13 08:59:03 -07:00
Zach Brown
e145267c05 scoutfs: allow smaller btree keys and values
Now that we're using small file system keys we can dramatically shrink
the maximum allowed btree keys and values.  This more accurately matches
the current users and lets us fit more possible items in each block,
which in turn allows turning the block size way down while still fitting
multiple worst case largest items per block.

Signed-off-by: Zach Brown <zab@versity.com>
2018-04-13 08:59:03 -07:00
Zach Brown
31286ad714 scoutfs: add options debugfs dir
Add a debugfs dir that will offer debugging options for an actively
mounted volume.

Signed-off-by: Zach Brown <zab@versity.com>
2018-04-13 08:59:03 -07:00
Zach Brown
90de34361c scoutfs: add trigger for advancing btree ring
Add a trigger that lets us force advancing the btree ring to the start
of the next half.  It's only safe to do this once migration has moved
all the blocks out of the old half.

Signed-off-by: Zach Brown <zab@versity.com>
2018-04-13 08:59:03 -07:00
Zach Brown
e1f32a0f8b scoutfs: fix spurious hard stale block errors
The stale block handling code only handled the case where we read
through a stale root into blocks that have been overwritten in the
persistent store.  In this case you'll get a new root and the read will
be OK.

It didn't handle the case where we have stale blocks cached at the
blocks of the legitimate current root.  In this case we get ESTALE from
each stale block and because the root doesn't change when we retry we
assume the persistent structure is corrupt.

This case can happen when the btree ring wraps and there are still
blocks cached at the head of the ring.  This became much more possible
when we moved to small fixed size keys.

The fix is to retry reading individual blocks or segments before
returning -ESTALE and expecting the caller to get a new root and try
again.  In the stale cache case this will allow the more recent correct
blocks to be read.
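
A simplified sketch of the retry, with hypothetical helper names standing
in for the real block reading calls:

  static int read_block_retry(struct super_block *sb, u64 blkno, u64 seq,
                              struct buffer_head **bh_ret)
  {
          int retries = 2;        /* illustrative retry count */
          int ret;

          for (;;) {
                  ret = read_and_check_block(sb, blkno, seq, bh_ret);
                  if (ret != -ESTALE || retries-- == 0)
                          return ret;     /* caller refreshes the root */
                  /* drop the stale cached copy so the next read hits disk */
                  forget_cached_block(sb, blkno);
          }
  }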

Signed-off-by: Zach Brown <zab@versity.com>
2018-04-13 08:59:03 -07:00
Zach Brown
966c8b8cbc scoutfs: alloc inos at multiple of lock group
Inode allocations come from batches that are reserved for directories.
As the batch is exhausted a new one is acquired and allocated from.

The batch size was arbitrarily set to the human friendly 10000.  This
doesn't interact well with the lock group size being a power of two.
Each allocation batch will straddle an inode group with its previous and
next inode batch.

This often doesn't matter because directories very rarely have more than
9000 entries.  But as entries pass 10000 they'd see surprising
contention with other inode ranges in directories.

Tweak the allocation size to be a multiple of the lock group size to
stop this from happening.
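
In other words, something along these lines, where the group size and
batch constants are illustrative rather than the real values:

  /* lock groups cover a power-of-two range of inode numbers */
  #define SCOUTFS_LOCK_INODE_GROUP_NR     1024ULL          /* illustrative */

  /* allocate inodes in batches that are a whole number of lock groups */
  #define SCOUTFS_INO_ALLOC_BATCH \
          round_up(10000ULL, SCOUTFS_LOCK_INODE_GROUP_NR)  /* = 10240 */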

Signed-off-by: Zach Brown <zab@versity.com>
2018-04-04 09:53:58 -07:00
Zach Brown
045380ca55 scoutfs: don't negatively cache unread segments
Previously we changed item reading to try and read from the start of its
locked range instead of from the key that wasn't found in the cache.
This greatly improved the performance of access patterns that didn't
proceed in key order.

We rightly shrank the range of items that we'd claim to cache by the
segments that we read.  But we missed the case where our search key
falls between two segments and we chose to read the next segment instead
of the previous.  If the previous segment in this case overlapped with
the lock range then we were claiming to cache the segment's contents but
weren't reading it.

This would result in bad negative caching of items that existed.
scoutfs/500 was tripping over this as it tried to rename a file created
by another node.  The local renaming node would try to look up a key
that only existed in level 0, and it would negatively cache the items in
the previous level 1 segment without reading them.

We fix this by shrinking the caching range down as we're considering
manifest entries instead of up as we process each segment read because
we have to shrink based on the segments in the manifest, not the ones we
chose to read.

With this fixed the rename can see those items in the level 1 segment
again.

Signed-off-by: Zach Brown <zab@versity.com>
2018-04-04 09:15:27 -05:00
Zach Brown
5001631dd9 scoutfs: add item deletion tracing
Add some traces for item deletion functions with their return values.

Signed-off-by: Zach Brown <zab@versity.com>
2018-04-04 09:15:27 -05:00
Zach Brown
62e26c5d96 scoutfs: scoutfs_bug_on to show bad append order
Use scoutfs_bug_on() to freak out if we append items to a segment out of
order.  We don't really return errors from this path (we should), but for
now at least share the keys that show the problem.

Signed-off-by: Zach Brown <zab@versity.com>
2018-04-04 09:15:27 -05:00
Zach Brown
704714c2ee scoutfs: add scoutfs_bug_on()
Add a BUG_ON() wrapper that identifies the file system via the super
block and prints the condition and some additional formatted output.
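
A rough sketch of such a wrapper; the message prefix and arguments are
assumptions rather than a copy of the real one:

  #define scoutfs_bug_on(sb, cond, fmt, args...)                        \
  do {                                                                  \
          if (unlikely(cond)) {                                         \
                  printk(KERN_CRIT "scoutfs %s: fatal condition '%s': " \
                         fmt "\n", (sb)->s_id, #cond, ##args);          \
                  BUG();                                                \
          }                                                             \
  } while (0)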

Signed-off-by: Zach Brown <zab@versity.com>
2018-04-04 09:15:27 -05:00
Zach Brown
4b413ed804 scoutfs: add seg item append trace point
Signed-off-by: Zach Brown <zab@versity.com>
2018-04-04 09:15:27 -05:00
Zach Brown
9148f24aa2 scoutfs: use single small key struct
Variable length keys lead to having a key struct point to the buffer
that contains the key.  With dirents and xattrs now using small keys we
can convert everyone to using a single key struct and significantly
simplify the system.

We no longer have a separate generic key buf struct that points to
specific per-type key storage.  All items use the key struct and fill
out the appropriate fields.  All the code that paired a generic key buf
struct and a specific key type struct is collapsed down to a key struct.
There's no longer the difference between a key buf that shares a
read-only key, has its own precise allocation, or has a max size
allocation for incrementing and decrementing.

Each key user now has an init function that fills out its fields.  It looks
a lot like the old pattern but we no longer have separate key storage that
the buf points to.

A bunch of code now takes the address of static key storage instead of
managing allocated keys.  Conversely, swapping now uses the full keys
instead of pointers to the keys.

We don't need all the functions that worked on the generic key buf
struct because they had different lengths.  Copy, clone, length init,
memcpy, all of that goes away.

The item API had some functions that tested the length of keys and
values.  The key length tests vanish, and that gets rid of the _same()
call.  The _same_min() call only had one user who didn't also test for
the value length being too large.  Let's leave caller key constraints in
callers instead of trying to hide them on the other side of a bunch of
item calls.

We no longer have to track the number of key bytes when calculating if
an item population will fit in segments.  This removes the key length
from reservations, transactions, and segment writing.

The item cache key querying ioctls no longer have to deal with variable
length keys.  They simply specify the start key, the ioctls return the
number of keys copied instead of bytes, and the caller is responsible
for incrementing the next search key.

The segment no longer has to store the key length.  It stores the key
struct in the item header.

The fancy variable length key formatting and printing can be removed.
We have a single format for the universal key struct.  The SK_ wrappers
that bracketed calls to use preempt-safe per-cpu buffers can turn back
into their normal calls.

Manifest entries are now a fixed size.  We can simply split them between
btree keys and values and initialize them instead of allocating them.
This means that level 0 entries don't have their own format that sorts
by the seq.  They're sorted by the key like all the other levels.
Compaction needs to sweep all of them looking for the oldest, while reads
can stop sweeping once they can no longer overlap.  This makes rare
compaction more expensive and common reading less expensive, which is
the right tradeoff.
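
The shape of the change, sketched with illustrative field names and
constants (the real key layout and zone/type values differ):

  struct scoutfs_key {
          __u8    sk_zone;        /* which keyspace the item lives in */
          __u8    sk_type;        /* inode, dirent, xattr, ... */
          __le64  sk_first;       /* per-type meaning, e.g. inode number */
          __le64  sk_second;      /* e.g. name hash */
          __le64  sk_third;       /* e.g. readdir position */
  } __packed;

  /* callers init keys on the stack instead of allocating key buffers */
  static void init_dirent_key(struct scoutfs_key *key, u64 dir_ino,
                              u64 hash, u64 pos)
  {
          *key = (struct scoutfs_key) {
                  .sk_zone   = SCOUTFS_FS_ZONE,           /* illustrative */
                  .sk_type   = SCOUTFS_DIRENT_TYPE,       /* illustrative */
                  .sk_first  = cpu_to_le64(dir_ino),
                  .sk_second = cpu_to_le64(hash),
                  .sk_third  = cpu_to_le64(pos),
          };
  }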

Signed-off-by: Zach Brown <zab@versity.com>
2018-04-04 09:15:27 -05:00
Zach Brown
df6a8af71f scoutfs: remove name from dirent keys
Directory entries were the last items that had large variable length
keys because they stored the entry name in the key.  We'd like to have
small fixed size keys so let's store dirents with small keys.

Entries for lookup are stored at the hash of the name instead of the
full name.  The key also contains the unique readdir pos so that we
don't have to deal with collision on creation.  The lookup procedure now
does need to iterate over all the readdir positions for the hash value
and compare the names.

Entries for link backref walking are stored with the entry's position in
the parent dir instead of the entry's name.  The name is then stored in
the value.  Inode to path conversion can still walk the backref items
without having to lookup dirent items.

These changes mean that all directory entry items are now stored at a
small key with some u64s (hash, pos, parent dir, etc) and have a value
with the dirent struct and full entry name.  This lets us use the same
key and value format for the three entry key types.  We no longer have
to allocate keys, we can store them on the stack.

We store the entry's hash and pos in the dirent struct in the item value
so that any item has all the fields to reference all the other item
keys.  We store the same values in the dentry_info so that deletion
(unlink and rename) can find all the entries.

The ino_path ioctl can now much more clearly iterate over parent
directories and entry positions instead of oh so cleverly iterating over
null terminated names in the parent directories.  The ioctl interface
structs and implementation become simpler.
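
A sketch of what lookup becomes under this scheme; the hash helper, item
iteration helper, and dirent fields are hypothetical stand-ins:

  static int lookup_dirent(struct inode *dir, const char *name,
                           unsigned int name_len, u64 *ino_ret)
  {
          u64 hash = dirent_name_hash(name, name_len);   /* hypothetical */
          struct scoutfs_dirent *dent;
          u64 pos = 0;
          int ret;

          /* walk every entry item stored at (dir, hash, pos) */
          for (;;) {
                  ret = next_dirent_at_hash(dir, hash, &pos, &dent);
                  if (ret < 0)
                          return ret;     /* -ENOENT once exhausted */

                  if (dent->name_len == name_len &&
                      !memcmp(dent->name, name, name_len)) {
                          *ino_ret = le64_to_cpu(dent->ino);
                          return 0;
                  }
                  pos++;
          }
  }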

Signed-off-by: Zach Brown <zab@versity.com>
2018-04-04 09:15:27 -05:00
Zach Brown
0bfc4b72c5 scoutfs: fix old comment in item.c
Signed-off-by: Zach Brown <zab@versity.com>
2018-04-04 09:15:27 -05:00
Zach Brown
b0bd273acc scoutfs: remove support for multi-element kvecs
Originally the item interfaces were written with full support for
vectored keys and values.  Callers constructed keys and values made up
of header structs and data buffers.  Segments supported much larger
values which could span pages when stored in memory.

But over time we've pulled that support back.  Keys are described by a
key struct instead of a multi-element kvec.  Values are now much smaller
and don't span pages.  The item interfaces still use the kvec arrays but
everyone only uses a single element.

So let's make the world a whole lot less awful by having the item
interfaces support only a single value buffer specified by a kvec.  A
bunch of code disappears and the result is much easier to understand.

Signed-off-by: Zach Brown <zab@versity.com>
2018-04-04 09:15:27 -05:00
Zach Brown
966d0176f6 scoutfs: remove seg kvec_from_pages
Item values have been limited to a single value vector entry for a
while.  They can't span 4K blocks in the segment format so they can't
cross kernel pages which are never smaller than 4K.

We don't need a helper to build a vector of their contents across
disjoint pages.  This removes the last user of multi-element kvecs.

Signed-off-by: Zach Brown <zab@versity.com>
2018-04-04 09:15:27 -05:00
Zach Brown
982a0a313e scoutfs: allocate contiguous dirent for creation
The values used in dirent item creation are one of the few places we
have value kvecs with multiple entries.  Let's instead allocate and copy
the dirent struct and name into a contiguous buffer so that we can move
towards single vector values.

Signed-off-by: Zach Brown <zab@versity.com>
2018-04-04 09:15:27 -05:00
Zach Brown
08f544cc15 scoutfs: remove scoutfs_item_lookup_exact() size
Every caller of scoutfs_item_lookup_exact() provided a size that matches
the value buffer.  Let's remove the redundant arg and use the value
buffer length as the exact size to match.

Signed-off-by: Zach Brown <zab@versity.com>
2018-04-04 09:15:27 -05:00
Zach Brown
9c1b393404 scoutfs: don't track offline sparse blocks
There were some mistakes in tracking offline blocks.

Online and offline block counts are meant to only refer to actual data
contents.  Sparse blocks in an archived file shouldn't be counted as
offline.

But the code was marking unallocated blocks as offline.  This could
corrupt the offline block count if a release extended past i_size and
marked the blocks in the mapping item as offline even though they're
past i_size.

We could have clamped the block walking to not go past i_size.  But we
still would have had the problem of having offline blocks track sparse
blocks.

Instead we can fix the problem by only marking blocks offline if they
had allocated blocks.  This means that sparse regions are never marked
offline and will always read zeros.  Now a release that extends past
i_size will not do anything to the unallocated blocks in the mapping
item past i_size and the offline block count will be consistent.

(Also the 'modified' and 'dirty' booleans were redundant; we only need
one of the two.)

Signed-off-by: Zach Brown <zab@versity.com>
2018-04-02 10:21:58 -07:00
Nic Henke
9d18d3a7aa Add script to build rpms and populate distro release
To better support building RPMs for multiple distribution versions, we
need a bit of help to organize the RPMs. We take the path of adding a
'rpms/7.4.1708' style directory, with the implicit knowledge this is for
CentOS and RHEL. Other distribution handling is left for the future.

To ease DOCKER_IMAGE selection for different distribution versions, the
environment variable DISTRO_VERS can be used. This simplifies a bunch of
call locations and scripting when we don't need to change the Docker
image flavor beyond this distribution version toggle.

e.g.: DISTRO_VERS=el73 ./indocker.sh ./build_rpms.sh

The directory tree ends up looking like this:
rpms/7.4.1708/kmod-scoutfs-1.0-0.git.5fee207.el7.x86_64.rpm
rpms/7.3.1611/kmod-scoutfs-1.0-0.git.5fee207.el7.x86_64.rpm
2018-03-29 15:39:36 -07:00
Nic Henke
22f1ded17b Add RPM builds for scoutfs-kmod
This adds in the makefile targets and spec file template we need to
build RPMs. Most of the heavy lifting is taken care of by our docker
container and the rpmbuild.sh script distributed in it.

The git versioning comes from 'git describe --long', which gives us the
tag, the number of commits and the abbreviated commit name. This allows
us to use the number of commits as the RPM release version, letting yum
understand how to process 'yum update' ordering.

yum update shows us the proper processing, along with how our versioning
lines up in the RPMs:
---> Package kmod-scoutfs.x86_64 0:0-0.3.gb83d29d.el7 will be updated
---> Package kmod-scoutfs.x86_64 0:0-0.4.g2e5324e.el7 will be an update
The rpm file name is: kmod-scoutfs-0-0.4.g2e5324e.el7.x86_64.rpm

When we build release RPMs, we'll toggle _release, giving us an rpm name
and version like kmod-scoutfs-0-1.4.g2e5324e.el7.x86_64.rpm. The toggle
of 0/1 is enough to tell yum that all of the non-release RPMs with the
same version are older than the released RPMs. This allows for the
release to yum update cleanly over development versions.

The git hash helps map RPM names to the git version and the contents of
the .note-git_describe; for this RPM it was: heads/nic/rpms-0-g2e5324.
The RPM doesn't contain the branch name, but we can add that and other
info later if needed.

We are not naming the module for a kernel version, as that does not seem
to be standard practice upstream. Instead, we'll make use of our
Artifactory repos and upload the RPMs to the correct places (7.3 vs 7.4
directories, etc).
2018-03-29 15:39:36 -07:00
Zach Brown
995e43aa18 scoutfs: hold the alloc sem during truncate
The super info's alloc_rwsem protects the local node free segment and
block bitmap items.  The truncate code wasn't holding the rwsem so
it could race with other local node allocator item users and corrupt the
bitmaps.  In the best case this could corrupt structures that trigger
EIO.  The corrupt items could also create duplicate block allocations
that clobber each other and corrupt data.
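
Conceptually the fix just nests the truncate's item updates inside the
same lock as the other allocator users; a sketch, with an assumed
sbi->alloc_rwsem field and a hypothetical helper:

  down_write(&sbi->alloc_rwsem);
  ret = truncate_block_mapping_items(inode, start, end);  /* hypothetical */
  up_write(&sbi->alloc_rwsem);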

Signed-off-by: Zach Brown <zab@versity.com>
2018-03-16 09:18:50 -07:00
Zach Brown
3818f72776 scoutfs: fix inefficient backwards item reading
Iterating over items backwards would result in a lot of extra work.

When an item isn't present in the cache we go and search the segments
for the item.  Once we find the item in its stack of segments we also
read in and cache all the items from the missing item to the end of all
the segments.

This reduced complexity a bit but had very bad worst case performance.
If you read items backwards you constantly get cache misses that each
search the segments for the item and then try to cache everything to the
end of the segment.  You're essentially working uncached and are doing
quite a lot of work to get that single missed item cached each time.

This adds the complexity to cache all the items in the segment stack
around the missed item, not just after the missed item.  Now reverse
iteration hits cached items for everything in the segment after the
initial miss.

To make this work we have to pass the full lock coverage range to the
item reading path.  Then we search the manifest for segments that
contain the missing key and use those segments' ranges to determine the
full range of items that we'll cache.  Then we again search the manifest
for all the level 0 segments that intersect that range.

That range extension is only for cached reads; it doesn't apply to the
'next' call, which ignores caching.  That operation is getting different
enough that we pull it out into its own function.

Signed-off-by: Zach Brown <zab@versity.com>
2018-03-15 13:15:04 -07:00
Zach Brown
c4de85fd82 scoutfs: cleanup xattr item storage
Honoring the XATTR_REMOVE flag in xattr deletion exposed an interesting
bug in getxattr().  We were unconditionally returning the max xattr value
size when someone tried to probe an existing xattr's value size by
calling getxattr with size == 0.  Some kernel paths did this to probe
the existence of xattrs.  They expected to get an error if the xattr
didn't exist, but we were giving them the max possible size.  This
kernel path then tried to remove the xattrs with XATTR_REMOVE and that
now failed and caused a bunch of errors in xfstests.

The fix is to return the real xattr value size when getxattr is called
with size == 0.  To do that with the old format we'd have to iterate
over all the items which happened to be pretty awkward in the current
code paths.

So we're taking this opportunity to land a change that had been brewing
for a while.  We now form the xattr keys from the hash of the name and
the item values now store a contiguous logical header, the name, and the
value.  This makes it very easy for us to have the full xattr value
length in the header and return it from getxattr when size == 0.

Now all tests pass while honoring the XATTR_CREATE and XATTR_REMOVE
flags.

And the code is a whole lot easier to follow.  And we've removed another
barrier for moving to small fixed size keys.
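
A sketch of the value layout and the size-probing path this enables; the
struct layout is an assumption, not the on-disk format:

  struct xattr_value_hdr {              /* assumed layout */
          __le16 name_len;
          __le16 val_len;
          __u8   data[];                /* name bytes, then value bytes */
  };

  static ssize_t copy_xattr_value(struct xattr_value_hdr *hdr,
                                  void *buf, size_t size)
  {
          size_t val_len = le16_to_cpu(hdr->val_len);

          if (size == 0)
                  return val_len;       /* probe: report the real size */
          if (size < val_len)
                  return -ERANGE;

          memcpy(buf, hdr->data + le16_to_cpu(hdr->name_len), val_len);
          return val_len;
  }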

Signed-off-by: Zach Brown <zab@versity.com>
2018-03-15 09:23:57 -07:00
Zach Brown
c438f5d887 scoutfs: remove scoutfs_item_set_batch()
scoutfs_item_set_batch() has a rocky history of being a giant pain in
the butt.  It's been a lot simpler to have callers use individual item
ops instead of trying to describe a compound item operation to something
like _set_batch().

Its last user has gone away so we can remove it and never speak of it
again.  And there was much rejoicing.

Signed-off-by: Zach Brown <zab@versity.com>
2018-03-15 09:23:57 -07:00
Zach Brown
4101c655a5 scoutfs: rework set_xattr to honor XATTR_ flags
We weren't properly honoring the XATTR_{CREATE,REPLACE} flags.

For a start we weren't even passing them in to our _xattr_set() from
_setxattr().  So that's something.

Even if we had been, we left it to scoutfs_item_set_batch() to return errors.  This
is wrong because the xattr flags are xattr granular, not item granular.
We don't want _REPLACE to fail when replacing a larger xattr value
because later items in the xattr don't have matching existing items.
(And it had some bugs where it could livelock if you set flags and items
already existed. :high_fives:).

Now that we have the _save and _restore calls we can avoid _set_batch's
bad semantics and bugs entirely.  It's easy for us to compare the flags
to item lookups, delete the old, create the new, and restore the old on
errors.
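
So the flag handling can live at the xattr level, roughly like this
sketch (lookup_old_xattr() is a hypothetical stand-in):

  ret = lookup_old_xattr(inode, name, &found);
  if (ret < 0)
          return ret;

  if ((flags & XATTR_CREATE) && found)
          return -EEXIST;
  if ((flags & XATTR_REPLACE) && !found)
          return -ENODATA;

  /*
   * Then: save the old items, delete them, create the new items, and
   * restore the saved items if creation fails.
   */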

Signed-off-by: Zach Brown <zab@versity.com>
2018-03-15 09:23:57 -07:00
Zach Brown
acfc4b357b scoutfs: add item saving and restoring
Add item cache functions for saving and restoring items.  This lets
callers more easily undo changes while they have transactions pinned.

Signed-off-by: Zach Brown <zab@versity.com>
2018-03-15 09:23:57 -07:00
Zach Brown
4dad03a3dd scoutfs: add item_is_dirty() helper
We had an absolute ton of open-coded tests of an item's dirty flag.
Let's hide it off in a helper so we're less likely to mess it up.

Signed-off-by: Zach Brown <zab@versity.com>
2018-03-15 09:23:57 -07:00
Zach Brown
7fb6841b1e scoutfs: free val while deleting items
It was silly to hand off deleted values to callers to free.  We can just
free as we delete and save a bunch of caller value manipulation.

Signed-off-by: Zach Brown <zab@versity.com>
2018-03-15 09:23:57 -07:00
Zach Brown
77f29fa021 scoutfs: allow null val in scoutfs_item_lookup
Some callers may want to just test if an item is present and not
necessarily want to set up storage for copying the value in.

Signed-off-by: Zach Brown <zab@versity.com>
2018-03-15 09:23:57 -07:00
Zach Brown
9f51b63f8d scoutfs: check snprintf_key() format args
Add the function attribute to snprintf_key() to have the compiler verify
its print format and args.  I noticed some buggy changes that didn't
throw errors.  Happily none of the existing calls had problems.
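
For reference, this is the kernel's __printf() annotation; the argument
positions below are an assumption about snprintf_key()'s real signature:

  __printf(3, 4)        /* fmt is arg 3, first vararg is arg 4 */
  int snprintf_key(char *buf, size_t size, const char *fmt, ...);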

Signed-off-by: Zach Brown <zab@versity.com>
2018-03-15 09:23:57 -07:00
Zach Brown
0b54d71b98 scoutfs: avoid double unlock
We weren't sufficiently careful in reacting to basts.  If a bast arrived
while an unlock was in flight we'd turn around and try to unlock again,
returning an error, and exploding.

More carefully only act on basts if we have an active mode that needs to
be unlocked.  Now if the racy bast arrives we'll ignore it and end up
freeing the lock in processing after the unlock succeeds.

Signed-off-by: Zach Brown <zab@versity.com>
2018-03-13 13:14:26 -07:00
Zach Brown
d58c8d5993 scoutfs: move lock work after dependencies
Some of the lock processing path was happening too early.  Both
maintenance of the locks on the LRU and waking waiters depend on
whether there is work pending and on the granted mode.  Those are
changed in the middle by processing so we need to move these two bits of
work down so that they can consume the updated state.

Signed-off-by: Zach Brown <zab@versity.com>
2018-03-09 14:29:55 -08:00
Zach Brown
9ad0f81084 scoutfs: add some lock/item consistency checks
Add some tests to the locking paths to see if we violate item caching
rules.

As we finish locking calls we make sure that the item cache is
consistent with the lock mode.  And we make sure that we don't free
locks before they've been unlocked and had a chance to check the
item cache.

Signed-off-by: Zach Brown <zab@versity.com>
2018-03-09 14:29:55 -08:00
Zach Brown
2aa613dae5 scoutfs: add scoutfs_item_range_cached()
Add a quick helper for querying if a given range of keys is covered by
the item cache.

Signed-off-by: Zach Brown <zab@versity.com>
2018-03-09 14:29:55 -08:00
Zach Brown
951b6d8dcd scoutfs: add d_revalidate trace
Add a trace event to get some visibility into dentry revalidation.

Signed-off-by: Zach Brown <zab@versity.com>
2018-03-09 14:29:55 -08:00
Zach Brown
2136a973ed scoutfs: copy names in rename trace event
The rename trace event was recording and later dereferencing pointers to
dentry names that could be long gone by the time the output is
generated.  We need to copy the name strings into the trace buffers.

Signed-off-by: Zach Brown <zab@versity.com>
2018-03-09 14:29:55 -08:00
Zach Brown
8ec5b7efe3 scoutfs: remove bio page add trace
This is a very chatty trace event that doesn't add much value.  Let's
remove it and make a lot more room for other more interesting trace
events.

Signed-off-by: Zach Brown <zab@versity.com>
2018-03-09 14:29:55 -08:00
Zach Brown
6adb24f0f5 scoutfs: clean up compaction destruction
We're seeing warnings from trying to destroy the server work queue while
it's still active.  Auditing shows that almost all of the sources of
queued work are shutdown before we destroy the work queue.

Except for the compaction func.  It queues itself via the sneaky call to
scoutfs_compact_kick() inside scoutfs_client_finish_compaction().  What
a mess.  We only wait for work to finish running in
scoutfs_compact_destroy(); we don't forbid further queueing.  So with
just the right races it looks possible to have the compact func
executing after we return from _destroy().  It can then later try to
queue the commit_work in the server workqueue.

It's pretty hard to imagine this race, but it's made a bit easier by the
startling fact that we don't free the compact info struct.  That makes
it a little easier to imagine use-after-destroy not exploding.

So let's forcibly forbid chain queueing during compaction shutdown by
using cancel_work_sync().  It marks the work canceling while flushing so
the queue_work in the work func won't do anything.  This should ensure
that the compaction func isn't running when destroy returns.

Also while we're at it actually free the allocated compaction info
struct!  Cool cool cool.
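
The shutdown path then reduces to something like this sketch; the info
struct and how it's found are assumptions:

  void scoutfs_compact_destroy(struct super_block *sb)
  {
          struct compact_info *ci = SCOUTFS_SB(sb)->compact_info;  /* assumed */

          /* waits for a running func and blocks it from re-queueing itself */
          cancel_work_sync(&ci->work);
          kfree(ci);
  }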

Signed-off-by: Zach Brown <zab@versity.com>
2018-03-06 10:32:33 -08:00
Zach Brown
241b52d55a scoutfs: reserve at least one xattr item value
Even when we're setting an xattr with no value we still have a file
system item value that contains the xattr value header which tells us
that this is the last value.

This fixes a warning that would be issued if we tried to set an xattr
with a zero length value.  We'd try to dirty an item value with the
header after having reserved zero bytes for item values.  To hit the
warning the inode couldn't already be dirty so that the xattr value
didn't get to hide in the unused reservation for dirtying the inode
item's value.

Signed-off-by: Zach Brown <zab@versity.com>
2018-02-28 22:14:15 -08:00