Commit Graph

49 Commits

Author SHA1 Message Date
Zach Brown
96f2ad29dc Add inode crtime creation time
Add an inode creation time field.  It's created for all new inodes.
It's visible to stat_more.  setattr_more can set it during
restore.

Signed-off-by: Zach Brown <zab@versity.com>
2021-07-08 11:00:30 -07:00
Zach Brown
73bf916182 Return ENOSPC as space gets low
Returning ENOSPC is challenging because clients work against allocators
that are a fraction of the whole, and because we use COW transactions we
need to be able to allocate in order to free.  This adds support for
returning ENOSPC to client posix allocators as free space gets low.

For metadata, we reserve a number of free blocks for making progress
with client and server transactions which can free space.  The server
sets the low flag in a client's allocator if we start to dip into
reserved blocks.  In the client we add an argument to entering a
transaction which indicates if we're allocating new space (as opposed to
just modifying existing data or freeing).  When an allocating
transaction runs low and the server low flag is set then we return
ENOSPC.

Adding an argument to transaction holders and having it return ENOSPC
gave us the opportunity to clean it up and make it a little clearer.
More work is done outside the wait_event function and it now
specifically waits for a transaction to cycle when it forces a commit
rather than spinning until the transaction worker acquires the lock and
stops it.

For data the same pattern applies except there are no reserved blocks
and we don't COW data so it's a simple case of returning the hard ENOSPC
when the data allocator flag is set.

The server needs to consider the reserved count when refilling the
client's meta_avail allocator and when swapping between the two
meta_avail and meta_free allocators.

We add the reserved metadata block count to statfs_more so that df can
subtract it from the free meta blocks and make it clear when enospc is
going to be returned for metadata allocations.

We increase the minimum device size in mkfs so that small testing
devices provide sufficient reserved blocks.

And finally we add a little test that makes sure we can fill both
metadata and data to ENOSPC and then recover by deleting what we filled.

Signed-off-by: Zach Brown <zab@versity.com>
2021-07-07 14:13:14 -07:00
Andy Grover
4630b77b45 cleanup: Use flexible array members instead of 0-length arrays
See Documentation/process/deprecated.rst:217; items[] is now preferred
over items[0].

Signed-off-by: Andy Grover <agrover@versity.com>
2021-04-07 10:14:47 -07:00
Andy Grover
0deb232d3f Support O_TMPFILE and allow MOVE_BLOCKS into released extents
Support O_TMPFILE: Create an unlinked file and put it on the orphan list.
If it ever gains a link, take it off the orphan list.

Change MOVE_BLOCKS ioctl to allow moving blocks into offline extent ranges.
Ioctl callers must set a new flag to enable this operation mode.

RH-compat: tmpfile support is actually backported by RH into the 3.10 kernel.
We need to use some of their kabi-maintaining wrappers to use it:
use a struct inode_operations_wrapper instead of base struct
inode_operations, set S_IOPS_WRAPPER flag in i_flags. This lets
RH's modified vfs_tmpfile() find our tmpfile fn pointer.

Add a test that tests both creating tmpfiles as well as moving their
contents into a destination file via MOVE_BLOCKS.

xfstests common/004 now runs because tmpfile is supported.

Signed-off-by: Andy Grover <agrover@versity.com>
2021-04-05 14:23:44 -07:00
Zach Brown
3139d3ea68 Add move_blocks ioctl
Add a relatively constrained ioctl that moves extents between regular
files.  This is intended to be used by tasks which combine many existing
files into a much larger file without reading and writing all the file
contents.

Signed-off-by: Zach Brown <zab@versity.com>
2021-01-14 13:42:22 -08:00
Zach Brown
4da3d47601 Move ALLOC_DETAIL ioctl definition
By convention we have the _IO* ioctl definition after the argument
structs and ALLOC_DETAIL got it a bit wrong so move it down.

Signed-off-by: Zach Brown <zab@versity.com>
2021-01-14 13:42:22 -08:00
Andy Grover
2c5871c253 Change release ioctl to be denominated in bytes not blocks
This more closely matches stage ioctl and other conventions.

Also change release code to use offset/length nomenclature for consistency.

Signed-off-by: Andy Grover <agrover@versity.com>
2021-01-12 16:29:42 -08:00
Andy Grover
cf278f5fa0 scoutfs: Tidy some enum usage
Prefer named to anonymous enums. This helps readability a little.

Use enum as param type if possible (a couple spots).

Remove unused enum in lock_server.c.

Define enum spbm_flags using shift notation for consistency.

Rename get_file_block()'s "gfb" parameter to "flags" for consistency.

Signed-off-by: Andy Grover <agrover@versity.com>
2020-11-30 13:35:44 -08:00
Andy Grover
d9d9b65f14 scoutfs: remove __packed from all struct definitions
Instead, explicitly add padding field, and adjust member ordering to
eliminate compiler-added padding between members, and at the end of the
struct (if possible: some structs end in a u8[0] array.)

This should prevent unaligned accesses. Not a big deal on x86_64, but
other archs like aarch64 really want this.

Signed-off-by: Andy Grover <agrover@versity.com>
2020-10-29 14:15:33 -07:00
Zach Brown
d589881855 scoutfs: add tot m/d device blocks to statfs_more
The total_{meta,data}_blocks scoutfs_super_block fields initialized by
mkfs aren't visible to userspace anywhere.  Add them to statfs_more so
that tools can get the totals (and use them for df, in this particular
case).

Signed-off-by: Zach Brown <zab@versity.com>
2020-10-26 15:19:03 -07:00
Zach Brown
3d790b24d5 scoutfs: add alloc_detail ioctl
Add an ioctl which copies details of each persistent allocator to
userspace.  This will be used by a scoutfs command to give information
about the allocators in the system.

Signed-off-by: Zach Brown <zab@versity.com>
2020-10-26 15:19:03 -07:00
Zach Brown
4b9c02ba32 scoutfs: add committed_seq to statfs_more
Add the committed_seq to statfs_more which gives the greatest seq which
has been committed.  This lets callers discover that a seq for a change
they made has been committed.

Signed-off-by: Zach Brown <zab@versity.com>
2020-08-26 14:39:12 -07:00
Zach Brown
c415cab1e9 scoutfs: use srch to track .srch. xattrs
Using strictly coherent btree items to map the hash of xattr names to
inode numbers proved the value of the functionality, but it was too
expensive.  We now have the more efficient srch infrastructure to use.

We change from the .indx. to the .srch. tag, and change the ioctl from
find_xattr to search_xattrs.  The idea is to communicate that these are
accelerated searches, not precise index lookups and are relatively
expensive.

Rather than maintaining btree items, xattr setting and deleting emit
srch entries which either track the xattr or combine with the previous
tracker and remove the entry.  These are done under the lock that
protects the main xattr item, so we can remove the separate locking of
the previous index items.

The semantics of the search ioctl need to change a bit.  Because
searches are so expensive we now return a flag to indicate that the
search completed.  While we're there, we also allow a last_ino parameter
so that searches can be divided up and run in parallel.

Signed-off-by: Zach Brown <zab@versity.com>
2020-08-26 14:39:12 -07:00
Benjamin LaHaise
f5863142be scoutfs: add data_wait_err for reporting errors
Add support for reporting errors to data waiters via a new
SCOUTFS_IOC_DATA_WAIT_ERR ioctl.  This allows waiters to return an error
to readers when staging fails.

Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
[zab: renamed to data_wait_err, took ino arg]
Signed-off-by: Zach Brown <zab@versity.com>
2020-05-29 13:50:13 -07:00
Zach Brown
edd8fe075c scoutfs: remove lsm code
Remove all the now unused code that deals with lsm: segment IO, the item
cache, and the manifest.

Signed-off-by: Zach Brown <zab@versity.com>
2020-01-17 11:21:36 -08:00
Zach Brown
a7ce9f22e2 scoutfs: add statfs ioctl
Add an ioctl that can fill a user struct with file system info.  We're
going to use this to find the fsid and rid of a mount.

Signed-off-by: Zach Brown <zab@versity.com>
2019-08-20 15:52:13 -07:00
Zach Brown
d8bc962fc5 scoutfs: unpriv listxattr_hidden only shows .hide.
Our hidden attributes are hidden so that they don't leak out of
the system when archiving tools transfer xattrs from listxattr along
with the file.  They're not intended to be secret, in fact users want to
see their contents like they want to see other fs metadata that they
can't update which describes the system.

Make our listxattr ioctl only return hidden xattrs and allow anyone to
see the results if they can read the file.   Rename it to more
accurately describe its intended use.

Signed-off-by: Zach Brown <zab@versity.com>
2019-06-28 10:23:55 -07:00
Zach Brown
663ce53109 scoutfs: clean up _IO ioctl macro usage
Accurately set the direction bits, pack down the used numbers, and
remove stale old ioctl definitions.

Signed-off-by: Zach Brown <zab@versity.com>
2019-06-28 10:23:55 -07:00
Zach Brown
4a29cb5888 scoutfs: naturally align ioctl structs
Order the ioctl struct field definitions and add padding so that
runtimes with different word sizes don't add different padding.
Userspace is spared having to deal with packing and we don't
have to worry about compat translation in the kernel.

We had two persistent structures that crossed the ioctl, a key and a
timespec, so we explicitly translate to and from their persistent types
in the ioctl.

Signed-off-by: Zach Brown <zab@versity.com>
2019-06-27 11:39:11 -07:00
Zach Brown
7dfbd3950f scoutfs: add index of inodes by xattr names
Add a .indx. xattr tag which adds the inode to an index of inodes keyed
by the hash of xattr names.  An ioctl is added which then returns all
the inodes which may contain an xattr of the given name.  Dropping all
xattrs now has to parse the name to find out if it also has to delete an
index item.

Signed-off-by: Zach Brown <zab@versity.com>
2019-06-24 09:58:22 -07:00
Zach Brown
a7fef3d7dd scoutfs: add listxattr_raw ioctl
Add an ioctl which can be used to iterate over the keys for all the
xattrs on an inode.  It is privileged, can see hidden inodes, and has an
iteration cursor so that it can make its way through very large numbers
of xattrs.

Signed-off-by: Zach Brown <zab@versity.com>
2019-06-24 09:58:22 -07:00
Zach Brown
c010afa8ff scoutfs: add setattr_more ioctl
Add an ioctl that can be used by userspace to restore a file to its
offline state.  To do that it needs to set inode fields that are
otherwise not exposed and create an offline extent.

Signed-off-by: Zach Brown <zab@versity.com>
2019-05-30 13:45:52 -07:00
Zach Brown
a6782fc03f scoutfs: add data waiting
One of the core features of scoutfs is the ability to transparently
migrate file contents to and from an archive tier.  For this to be
transparent we need file system operations to trigger staging the file
contents back into the file system as needed.

This adds the infrastructure which operations use to wait for offline
extents to come online and which provides userspace with a list of
blocks that the operations are waiting for.

We add some waiting infrastructure that callers use to lock, check for
offline extents, and unlock and wait before checking again to see if
they're still offline.  We add these checks and waiting to data io
operations that could encounter offline extents.

This has to be done carefully so that we don't wait while holding locks
that would prevent staging.  We use per-task structures to discover when
we are the first user of a cluster lock on an inode, indicating that
it's safe for us to wait because we don't hold any locks.

And while we're waiting our operation is tracked and reported to
userspace through an ioctl.  This is a non-blocking ioctl, it's up to
userspace to decide how often to check and how large a region to stage.

Waiters are woken up when the file contents could have changed, not
specifically when we know that the extent has come online.  This lets us
wake waiters when their lock is revoked so that they can block waiting
to reacquire the lock and test the extents again.  It lets us provide
coherent demand staging across the cluster without fine grained waiting
protocols sent between the nodes.  It may result in some spurious wakeups
and work but hopefully it won't, and it's a very simple and functional
first pass.

Signed-off-by: Zach Brown <zab@versity.com>
2019-05-21 11:33:26 -07:00
Zach Brown
9148f24aa2 scoutfs: use single small key struct
Variable length keys lead to having a key struct point to the buffer
that contains the key.  With dirents and xattrs now using small keys we
can convert everyone to using a single key struct and significantly
simplify the system.

We no longer have a separate generic key buf struct that points to
specific per-type key storage.  All items use the key struct and fill
out the appropriate fields.  All the code that paired a generic key buf
struct and a specific key type struct is collapsed down to a key struct.
There's no longer the difference between a key buf that shares a
read-only key, has its own precise allocation, or has a max size
allocation for incrementing and decrementing.

Each key user now has an init function that fills out its fields.  It
looks a lot like the old pattern but we no longer have separate key
storage that the buf points to.

A bunch of code now takes the address of static key storage instead of
managing allocated keys.  Conversely, swapping now uses the full keys
instead of pointers to the keys.

We don't need all the functions that worked on the generic key buf
struct because they had different lengths.  Copy, clone, length init,
memcpy, all of that goes away.

The item API had some functions that tested the length of keys and
values.  The key length tests vanish, and that gets rid of the _same()
call.  The _same_min() call only had one user who didn't also test for
the value length being too large.  Let's leave caller key constraints in
callers instead of trying to hide them on the other side of a bunch of
item calls.

We no longer have to track the number of key bytes when calculating if
an item population will fit in segments.  This removes the key length
from reservations, transactions, and segment writing.

The item cache key querying ioctls no longer have to deal with variable
length keys.  They simply specify the start key, the ioctls return the
number of keys copied instead of bytes, and the caller is responsible
for incrementing the next search key.

The segment no longer has to store the key length.  It stores the key
struct in the item header.

The fancy variable length key formatting and printing can be removed.
We have a single format for the universal key struct.  The SK_ wrappers
that bracketed calls to use preempt-safe per-cpu buffers can turn back
into their normal calls.

Manifest entries are now a fixed size.  We can simply split them between
btree keys and values and initialize them instead of allocating them.
This means that level 0 entries don't have their own format that sorts
by the seq.  They're sorted by the key like all the other levels.
Compaction needs to sweep all of them looking for the oldest and read
can stop sweeping once it can no longer overlap.  This makes rare
compaction more expensive and common reading less expensive, which is
the right tradeoff.

Signed-off-by: Zach Brown <zab@versity.com>
2018-04-04 09:15:27 -05:00
Zach Brown
df6a8af71f scoutfs: remove name from dirent keys
Directory entries were the last items that had large variable length
keys because they stored the entry name in the key.  We'd like to have
small fixed size keys so let's store dirents with small keys.

Entries for lookup are stored at the hash of the name instead of the
full name.  The key also contains the unique readdir pos so that we
don't have to deal with collision on creation.  The lookup procedure now
does need to iterate over all the readdir positions for the hash value
and compare the names.

Entries for link backref walking are stored with the entry's position in
the parent dir instead of the entry's name.  The name is then stored in
the value.  Inode to path conversion can still walk the backref items
without having to lookup dirent items.

These changes mean that all directory entry items are now stored at a
small key with some u64s (hash, pos, parent dir, etc) and have a value
with the dirent struct and full entry name.  This lets us use the same
key and value format for the three entry key types.  We no longer have
to allocate keys, we can store them on the stack.

We store the entry's hash and pos in the dirent struct in the item value
so that any item has all the fields to reference all the other item
keys.  We store the same values in the dentry_info so that deletion
(unlink and rename) can find all the entries.

The ino_path ioctl can now much more clearly iterate over parent
directories and entry positions instead of oh so cleverly iterating over
null terminated names in the parent directories.  The ioctl interface
structs and implementation become simpler.

Signed-off-by: Zach Brown <zab@versity.com>
2018-04-04 09:15:27 -05:00
Zach Brown
c1311783d5 scoutfs: add tracking of online and offline blocks
Signed-off-by: Zach Brown <zab@versity.com>
2018-02-21 09:36:44 -08:00
Zach Brown
a49061a7d9 scoutfs: remove the size index
We aren't using the size index. It has runtime and code maintenance
costs that aren't worth paying.  Let's remove it.

Removing it from the format and no longer maintaining it are
straightforward.

The bulk of this patch is actually the act of removing it from the index
locking functions.  We no longer have to predict the size that will be
stored during the transaction to lock the index items that will be
created during the transaction.  A bunch of code to predict the size and
then pass it into locking and transactions goes away.  Like other inode
fields we now update the size as it changes.

Signed-off-by: Zach Brown <zab@versity.com>
2018-01-30 15:03:35 -08:00
Zach Brown
8bbb859f0c scoutfs: move scoutfs_ioctl definition
We're going to be strictly enforcing matching format.h and ioctl.h
between userspace and kernel space.  Let's get the exported kernel
function definition out of ioctl.h.

Signed-off-by: Zach Brown <zab@versity.com>
2017-10-12 13:57:31 -07:00
Mark Fasheh
021404bb6a scoutfs: remove inode ctime index
Like the mtime index, this index is unused. Removing it is a near
identical task. Running the same createmany test from our last
patch gives us the following:

 $ createmany -o '/scoutfs/file_%lu' 10000000

 total: 10000000 creates in 598.28 seconds: 16714.59 creates/second

 real    9m58.292s
 user    0m7.420s
 sys     5m44.632s

So after both indices are gone, we go from a 12m56s run time to 9m58s,
saving almost 3 minutes, which translates into a total performance
increase of about 23%.

Signed-off-by: Mark Fasheh <mfasheh@versity.com>
2017-08-22 15:59:13 -07:00
Mark Fasheh
d59367262d scoutfs: remove inode mtime index
This index is unused - we can gain some create performance by removing it.

To verify this, I ran createmany for 10 million files:

 $ createmany -o '/scoutfs/file_%lu' 10000000

Before this patch:
 total: 10000000 creates in 776.54 seconds: 12877.56 creates/second

 real    12m56.557s
 user    0m7.861s
 sys     6m56.986s

After this patch:
 total: 10000000 creates in 691.92 seconds: 14452.46 creates/second

 real    11m31.936s
 user    0m7.785s
 sys     6m19.328s

So removing the index gained us about a minute and a half on the test or a
12% performance increase.

Signed-off-by: Mark Fasheh <mfasheh@versity.com>
2017-08-22 15:59:13 -07:00
Zach Brown
c7ad9fe772 scoutfs: make release block granular
The existing release interface specified byte regions to release but
that didn't match what the underlying file data mapping structure is
capable of.  What happens if you specify a single byte to release?  Does
it release the whole block?  Does it release nothing?  Does it return an
error?

By making the interface match the capability of the operation we make
the functioning of the system that much more predictable.  Callers are
forced to think about implementing their desires in terms of block
granular releasing.

Signed-off-by: Zach Brown <zab@versity.com>
2017-08-14 09:19:03 -07:00
Zach Brown
9f4095bffb scoutfs: break the build if we export raw types
Raw [su]{8,16,32,64} types keep leaking into our exported headers where
they break userspace builds.  Make sure that we only use the exported __
types and add a check to break our build if we get it wrong.

Signed-off-by: Zach Brown <zab@versity.com>
2017-08-04 10:37:49 -07:00
Zach Brown
2eecbbe78a scoutfs: add item cache key ioctls
These ioctls let userspace see the items and ranges that are cached.

Signed-off-by: Zach Brown <zab@versity.com>
2017-06-27 14:04:38 -07:00
Zach Brown
5f11cdbfe5 scoutfs: add and index inode meta and data seqs
For each transaction we send a message to the server asking for a
unique sequence number to associate with the transaction.  When we
change metadata or data of an inode we store the current transaction seq
in the inode and we index it with index items like the other inode
fields.

The server remembers the sequences it gives out.  When we go to walk the
inode sequence indexes we ask the server for the largest stable seq and
limit results to that seq.  This ensures that we never return seqs that
are past dirty items so never have inodes and seqs appear in the past.

Nodes use the sync timer to regularly cycle through seqs and ensure that
inode seq index walks don't get stuck on their otherwise idle seq.

Signed-off-by: Zach Brown <zab@versity.com>
2017-05-23 12:12:24 -07:00
Zach Brown
5307c56954 scoutfs: add a stat_more ioctl
We have inode fields that we want to return to userspace with very low
overhead.

Signed-off-by: Zach Brown <zab@versity.com>
2017-05-16 14:28:10 -07:00
Zach Brown
b97587b8fa scoutfs: add indexing of inodes by fields
Add items for indexing inodes by their fields.  When we update the inode
item we also delete the old index items and create the new items.  We
rename and refactor the old inode since ioctl to now walk the inode
index items.

Signed-off-by: Zach Brown <zab@versity.com>
2017-05-16 10:48:12 -07:00
Nic Henke
5c54bdbf85 Change type for DATA_VERSION ioctl to __u64
For consistency and to keep upstream users (scout-utils, etc) from
needing to include different type headers, we'll change the type to
match the rest of the header.

Signed-off-by: Nic Henke <nic.henke@versity.com>
2017-04-18 14:07:23 -07:00
Zach Brown
a310027380 Remove the find xattr ioctls
The current plan for finding populations of inodes to search no longer
involves xattr backrefs.  We're about to change the xattr storage format
so let's remove these interfaces so we don't have to update them.

Signed-off-by: Zach Brown <zab@versity.com>
2017-04-18 13:44:54 -07:00
Zach Brown
fff6fb4740 Restore link backref items
Convert the link backref code from btree items to the item cache.

Now that the backref items have the full entry name we can traverse a
link with one item lookup.  We don't need to lock the inode and verify
that the entry at the backref offset really points to our inode.  The
link backref walk gets a lot simpler.

But we have to widen the ioctl cursor to store a full dir ino and path
name instead of just the dir's backref counter.

Signed-off-by: Zach Brown <zab@versity.com>
2017-04-18 13:44:54 -07:00
Zach Brown
c6b688c2bf Add staging ioctl
This adds the ioctl for writing archived file contents back into the
file if the data_version still matches.

Signed-off-by: Zach Brown <zab@versity.com>
Reviewed-by: Mark Fasheh <mfasheh@versity.com>
2016-11-16 14:45:08 -08:00
Zach Brown
df561bbd19 Add offline extent flag and release ioctl
Add the _OFFLINE flag to indicate offline extents.  The release ioctl
frees extents within the release range and sets their _OFFLINE flag if
the data_version still matches.

We tweak the existing truncate item function just a bit to support
making extents offline.  We make it take an explicit range of blocks to
remove instead of just giving it the size and it learns to mark extents
offline and update them instead of always deleting them.

Reads from offline extents return zeros like reading from a sparse
region (later it will trigger demand staging) and writing to offline
extents clears the offline flag (later only staging can do that).

Signed-off-by: Zach Brown <zab@versity.com>
Reviewed-by: Mark Fasheh <mfasheh@versity.com>
2016-11-16 14:45:08 -08:00
Zach Brown
5d87418925 Add ioctl for sampling inode data version
Add an ioctl that samples the inode's data_version.

Signed-off-by: Zach Brown <zab@versity.com>
Reviewed-by: Mark Fasheh <mfasheh@versity.com>
2016-11-16 14:45:08 -08:00
Zach Brown
ae6cc83d01 Raise the nlink limit
A few xfstests tests were failing because they tried to create a decent
number of hard links to a file.

We had a small nlink limit because the inode-paths ioctl copied all the
paths for all the hard links to a userspace buffer which could be
enormous if there was a larger nlink limit.

The hard link backref disk format already has a natural counter that
could be used as a cursor to iterate over all the hard links that point
to a given inode.

This refactors the inode_paths ioctl into a ino_path ioctl that returns
a single path for the given counter and returns the counter for the next
path that links to the inode.  Happily this lets us get rid of all the
weird path component lists and allocations.  Now there's just the kernel
path buffer that gets null terminated path components and the userspace
buffer that those are copied to.

We don't fully relax the nlink limit.  stat(2) returns the link count as
a u32.  We go a step further and limit it to S32_MAX so that apps might
avoid sign bugs.  That still gives us a more generous limit than ext4
and btrfs which are around U16_MAX.

Signed-off-by: Zach Brown <zab@versity.com>
Reviewed-by: Mark Fasheh <mfasheh@versity.com>
2016-11-16 14:45:08 -08:00
Zach Brown
16e94f6b7c Search for file data that has changed
We don't overwrite existing data.  Every file data write has to allocate
new blocks and update block mapping items.

We can search for inodes whose data has changed by filtering block
mapping item walks by the sequence number.  We do this by using the
exact same code for finding changed inodes but using the block mapping
key type.

Signed-off-by: Zach Brown <zab@versity.com>
2016-10-20 13:55:14 -07:00
Zach Brown
c90710d26b scoutfs: add find xattr ioctls
Add ioctls that return the inode numbers that probably contain the given
xattr name or value.  To support these we add items that index inodes by
the presence of xattr items whose names or values hash to a given hash
value.

Signed-off-by: Zach Brown <zab@versity.com>
2016-08-23 12:14:55 -07:00
Zach Brown
0991622a21 scoutfs: add inode_paths ioctl
This adds the ioctl that returns all the paths from the root to a given
inode.  The implementation only traverses btree items to keep it
isolated from the vfs object locking and life cycles, but that could be
a performance problem.  This is another motivation to accelerate the
btree code.

Signed-off-by: Zach Brown <zab@versity.com>
2016-08-11 16:46:18 -07:00
Zach Brown
90a73506c1 scoutfs: remove homebrew tracing
Oh, thank goodness.  It turns out that there's a crash extension for
working with tracepoints in crash dumps.  Let's use standard tracepoints
and pretend this tracing hack never happened.

Signed-off-by: Zach Brown <zab@versity.com>
2016-07-20 12:08:12 -07:00
Zach Brown
b51511466a scoutfs: add inodes_since ioctl
Add the ioctl that lets us find out about inodes that have changed
since a given sequence number.

A sequence number is added to the btree items so that we can track the
tree update that it last changed in.  We update this as we modify
items and maintain it across item copying for splits and merges.

The big change is using the parent item ref and item sequence numbers
to guide iteration over items in the tree.  The easier change is to have
the current iteration skip over items whose sequence number is too old.

The more subtle change has to do with how iteration is terminated.  The
current termination could stop when it doesn't find an item because that
could only happen at the final leaf.  When we're ignoring items with old
seqs this can happen at the end of any leaf.  So we change iteration to
keep advancing through leaf blocks until it crosses the last key value.

We add an argument to btree walking which communicates the next key that
can be used to continue iterating from the next leaf block.  This works
for the normal walk case as well as the seq walking case where walking
terminates prematurely in an interior node full of parent items with old
seqs.

Now that we're more robustly advancing iteration with btree walk calls
and the next key we can get rid of the 'next_leaf' hack which was trying
to do the same thing inside the btree walk code.  It wasn't right for
the seq walking case and was pretty fiddly.

The next_key increment could wrap the maximal key at the right spine of
the tree so we have _inc saturate instead of wrap.

And finally, we want these inode scans to not have to skip over all the
other items associated with each inode as it walks looking for inodes
with the given sequence number.  We change the item sort order to first
sort by type instead of by inode.  We've wanted this more generally to
isolate item types that have different access patterns.

Signed-off-by: Zach Brown <zab@versity.com>
2016-07-05 14:46:20 -07:00
Zach Brown
7d6dd91a24 scoutfs: add tracing messages
This adds tracing functionality that's cheap and easy to
use.  By constantly gathering traces we'll always have rich
history to analyze when something goes wrong.

Signed-off-by: Zach Brown <zab@versity.com>
2016-05-28 11:11:15 -07:00