Add an inode creation time field. It's set for all new inodes.
It's visible to stat_more, and setattr_more can set it during
restore.
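As a rough sketch of how a tool might read it (the struct layout and
ioctl name here are assumptions for illustration, not the exact
interface; the usual ioctl and stdio headers are assumed):

    /* hypothetical layout; only the creation time fields matter here */
    struct stat_more_sketch {
            __u64 valid_bytes;      /* set to sizeof() so versions interoperate */
            __u64 crtime_sec;       /* new inode creation time */
            __u32 crtime_nsec;
            __u32 _pad;
    };

    static void print_crtime(int fd)
    {
            struct stat_more_sketch stm = { .valid_bytes = sizeof(stm) };

            /* SCOUTFS_IOC_STAT_MORE assumed to come from the fs ioctl header */
            if (ioctl(fd, SCOUTFS_IOC_STAT_MORE, &stm) == 0)
                    printf("created %llu.%09u\n",
                           (unsigned long long)stm.crtime_sec, stm.crtime_nsec);
    }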
Signed-off-by: Zach Brown <zab@versity.com>
Returning ENOSPC is challenging because we have clients working on
allocators which are a fraction of the whole, and because we use COW
transactions we need to be able to allocate in order to free. This adds
support for returning ENOSPC to client posix allocators as free space
gets low.
For metadata, we reserve a number of free blocks for making progress
with client and server transactions which can free space. The server
sets the low flag in a client's allocator if we start to dip into
reserved blocks. In the client we add an argument to entering a
transaction which indicates if we're allocating new space (as opposed to
just modifying existing data or freeing). When an allocating
transaction runs low and the server low flag is set then we return
ENOSPC.
Adding an argument to transaction holders and having it return ENOSPC
gave us the opportunity to clean it up and make it a little clearer.
More work is done outside the wait_event function and it now
specifically waits for a transaction to cycle when it forces a commit
rather than spinning until the transaction worker acquires the lock and
stops it.
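The client side check ends up looking roughly like this (a sketch of the
pattern; the function and flag names are illustrative, not the real
code):

    /* sketch: enter a transaction, failing allocating holders when space is low */
    static int hold_trans_sketch(struct super_block *sb, bool allocating)
    {
            for (;;) {
                    if (allocating && client_meta_alloc_low(sb)) {
                            /*
                             * The server set the low flag because we're dipping
                             * into reserved blocks.  New space is refused, but
                             * holders that only modify or free keep going so
                             * that freeing can make progress.
                             */
                            return -ENOSPC;
                    }

                    if (try_hold_trans(sb))
                            return 0;

                    /* wait for the running transaction to cycle, then retry */
                    wait_for_trans_cycle(sb);
            }
    }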
For data the same pattern applies except there are no reserved blocks
and we don't COW data so it's a simple case of returning the hard ENOSPC
when the data allocator flag is set.
The server needs to consider the reserved count when refilling the
client's meta_avail allocator and when swapping between the
meta_avail and meta_free allocators.
We add the reserved metadata block count to statfs_more so that df can
subtract it from the free meta blocks and make it clear when enospc is
going to be returned for metadata allocations.
We increase the minimum device size in mkfs so that small testing
devices provide sufficient reserved blocks.
And finally we add a little test that makes sure we can fill both
metadata and data to ENOSPC and then recover by deleting what we filled.
Signed-off-by: Zach Brown <zab@versity.com>
Support O_TMPFILE: Create an unlinked file and put it on the orphan list.
If it ever gains a link, take it off the orphan list.
Change MOVE_BLOCKS ioctl to allow moving blocks into offline extent ranges.
Ioctl callers must set a new flag to enable this operation mode.
RH-compat: tmpfile support is actually backported by RH into the 3.10 kernel.
We need to use some of their kabi-maintaining wrappers to use it:
use a struct inode_operations_wrapper instead of base struct
inode_operations, set S_IOPS_WRAPPER flag in i_flags. This lets
RH's modified vfs_tmpfile() find our tmpfile fn pointer.
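The pattern looks roughly like this (a sketch from memory of the RH kabi
wrappers; our function names here are placeholders):

    /* sketch: placeholder scoutfs function names */
    static const struct inode_operations_wrapper scoutfs_dir_iops_wrapper = {
            .ops = {
                    .lookup = scoutfs_lookup,
                    .create = scoutfs_create,
                    /* ... the rest of the usual dir inode_operations ... */
            },
            .tmpfile = scoutfs_tmpfile,     /* lives in the wrapper, not .ops */
    };

    /* when setting up a directory inode: */
    inode->i_op = &scoutfs_dir_iops_wrapper.ops;
    inode->i_flags |= S_IOPS_WRAPPER;       /* vfs_tmpfile() checks this flag */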
Add a test that covers both creating tmpfiles and moving their
contents into a destination file via MOVE_BLOCKS.
xfstests common/004 now runs because tmpfile is supported.
Signed-off-by: Andy Grover <agrover@versity.com>
Add a relatively constrained ioctl that moves extents between regular
files. This is intended to be used by tasks which combine many existing
files into a much larger file without reading and writing all the file
contents.
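As an illustration of the intended use, a tool combining files might
loop over sources and move their extents to the tail of the
destination; the struct and ioctl names below are placeholders, not the
real interface:

    /* placeholder struct; the real one lives in the exported ioctl header */
    struct move_blocks_sketch {
            __u64 from_fd;          /* source file descriptor */
            __u64 from_off;         /* byte offset in the source */
            __u64 len;              /* bytes to move */
            __u64 to_off;           /* byte offset in the destination */
    };

    static int append_extents(int to_fd, int from_fd, __u64 to_off, __u64 len)
    {
            struct move_blocks_sketch mb = {
                    .from_fd = from_fd, .from_off = 0,
                    .len = len, .to_off = to_off,
            };

            /* extents are re-linked into the destination, no data is copied */
            return ioctl(to_fd, SCOUTFS_IOC_MOVE_BLOCKS, &mb);
    }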
Signed-off-by: Zach Brown <zab@versity.com>
By convention we have the _IO* ioctl definition after the argument
structs and ALLOC_DETAIL got it a bit wrong so move it down.
Signed-off-by: Zach Brown <zab@versity.com>
This more closely matches the stage ioctl and other conventions.
Also change release code to use offset/length nomenclature for consistency.
Signed-off-by: Andy Grover <agrover@versity.com>
Prefer named to anonymous enums. This helps readability a little.
Use enum as param type if possible (a couple spots).
Remove unused enum in lock_server.c.
Define enum spbm_flags using shift notation for consistency.
Rename get_file_block()'s "gfb" parameter to "flags" for consistency.
Signed-off-by: Andy Grover <agrover@versity.com>
Instead, explicitly add padding fields, and adjust member ordering to
eliminate compiler-added padding between members and at the end of the
struct (when possible: some structs end in a u8[0] array).
This should prevent unaligned accesses. Not a big deal on x86_64, but
other archs like aarch64 really want this.
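For example (an illustrative struct, not one of ours):

    /* before: the compiler inserts 4 hidden bytes after 'flags' to align 'count' */
    struct example_before {
            __u32 flags;
            __u64 count;
    };

    /* after: members ordered largest first, any hole made explicit */
    struct example_after {
            __u64 count;
            __u32 flags;
            __u32 _pad;     /* explicit padding, zeroed by callers */
    };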
Signed-off-by: Andy Grover <agrover@versity.com>
The total_{meta,data}_blocks scoutfs_super_block fields initialized by
mkfs aren't visible to userspace anywhere. Add them to statfs_more so
that tools can get the totals (and use them for df, in this particular
case).
Signed-off-by: Zach Brown <zab@versity.com>
Add an ioctl which copies details of each persistent allocator to
userspace. This will be used by a scoutfs command to give information
about the allocators in the system.
Signed-off-by: Zach Brown <zab@versity.com>
Add the committed_seq to statfs_more which gives the greatest seq which
has been committed. This lets callers discover that a seq for a change
they made has been committed.
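A caller can poll for it to find out when its change is durable; this
sketch assumes the struct and ioctl names, only committed_seq comes
from this patch:

    /* sketch: returns > 0 once the change behind my_seq is durable */
    static int change_committed(int fd, __u64 my_seq)
    {
            struct statfs_more_sketch {
                    __u64 valid_bytes;
                    __u64 committed_seq;
                    /* ... fsid, block counts, etc ... */
            } sfm = { .valid_bytes = sizeof(sfm) };

            if (ioctl(fd, SCOUTFS_IOC_STATFS_MORE, &sfm) < 0)
                    return -1;

            /* committed_seq only moves forward, so >= means committed */
            return sfm.committed_seq >= my_seq;
    }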
Signed-off-by: Zach Brown <zab@versity.com>
Using strictly coherent btree items to map the hash of xattr names to
inode numbers proved the value of the functionality, but it was too
expensive. We now have the more efficient srch infrastructure to use.
We change from the .indx. to the .srch. tag, and change the ioctl from
find_xattr to search_xattrs. The idea is to communicate that these are
accelerated searches, not precise index lookups and are relatively
expensive.
Rather than maintaining btree items, xattr setting and deleting emit
srch entries which either track the xattr or combine with the previous
tracker and remove the entry. Because these are done under the lock
that protects the main xattr item, we can remove the separate locking
of the previous index items.
The semantics of the search ioctl need to change a bit. Because
searches are so expensive we now return a flag to indicate that the
search completed. While we're there, we also allow a last_ino parameter
so that searches can be divided up and run in parallel.
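Callers are expected to loop until the completion flag is set, roughly
like this sketch (sx is the ioctl argument struct, inos its output
buffer; field and flag names approximate the interface described
above):

    sx.first_ino = range_start;
    sx.last_ino = range_end;        /* slices let searches run in parallel */

    for (;;) {
            long nr = ioctl(fd, SCOUTFS_IOC_SEARCH_XATTRS, &sx);
            if (nr < 0)
                    break;

            /*
             * The returned inodes *may* have the xattr; the caller has to
             * check them before acting, this is an accelerated search and
             * not a precise index.
             */
            process_candidates(inos, nr);

            if (sx.output_flags & SEARCH_XATTRS_DONE)
                    break;                          /* the search completed */
            sx.first_ino = inos[nr - 1] + 1;        /* resume after the last hit */
    }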
Signed-off-by: Zach Brown <zab@versity.com>
Add support for reporting errors to data waiters via a new
SCOUTFS_IOC_DATA_WAIT_ERR ioctl. This allows waiters to return an error
to readers when staging fails.
Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
[zab: renamed to data_wait_err, took ino arg]
Signed-off-by: Zach Brown <zab@versity.com>
Add an ioctl that can fill a user struct with file system info. We're
going to use this to find the fsid and rid of a mount.
Signed-off-by: Zach Brown <zab@versity.com>
Our hidden attributes are hidden so that they don't leak out of
the system when archiving tools transfer xattrs from listxattr along
with the file. They're not intended to be secret; in fact, users want
to see their contents just as they want to see other fs metadata that
describes the system but that they can't update.
Make our listxattr ioctl only return hidden xattrs and allow anyone to
see the results if they can read the file. Rename it to more
accurately describe its intended use.
Signed-off-by: Zach Brown <zab@versity.com>
Order the ioctl struct field definitions and add padding so that
runtimes with different word sizes don't add different padding.
Userspace is spared having to deal with packing and we don't
have to worry about compat translation in the kernel.
We had two persistent structures that crossed the ioctl, a key and a
timespec, so we explicitly translate to and from their persistent types
in the ioctl.
Signed-off-by: Zach Brown <zab@versity.com>
Add a .indx. xattr tag which adds the inode to an index of inodes keyed
by the hash of xattr names. An ioctl is added which then returns all
the inodes which may contain an xattr of the given name. Dropping all
xattrs now has to parse the name to find out if it also has to delete an
index item.
Signed-off-by: Zach Brown <zab@versity.com>
Add an ioctl which can be used to iterate over the keys for all the
xattrs on an inode. It is privileged, can see hidden xattrs, and has an
iteration cursor so that it can make its way through very large numbers
of xattrs.
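The calling pattern is roughly the following sketch (struct fields and
the ioctl name are approximations, assuming an ino and a caller-owned
buf[]):

    /* sketch: iterate every xattr name on an inode with a resumable cursor */
    struct list_xattrs_sketch {
            __u64 ino;              /* inode to list, privilege required */
            __u64 id_pos;           /* iteration cursor, zero to start */
            __u32 hash_pos;         /* second half of the cursor */
            __u32 buf_bytes;
            __u64 buf_ptr;          /* receives null terminated names */
    } lx = { .ino = ino, .buf_ptr = (unsigned long)buf, .buf_bytes = sizeof(buf) };

    for (;;) {
            long bytes = ioctl(fd, SCOUTFS_IOC_LIST_XATTRS, &lx);
            if (bytes <= 0)
                    break;          /* error, or no more names */

            for (char *name = buf; name < buf + bytes; name += strlen(name) + 1)
                    puts(name);     /* the cursor in lx advances for the next call */
    }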
Signed-off-by: Zach Brown <zab@versity.com>
Add an ioctl that can be used by userspace to restore a file to its
offline state. To do that it needs to set inode fields that are
otherwise not exposed and create an offline extent.
Signed-off-by: Zach Brown <zab@versity.com>
One of the core features of scoutfs is the ability to transparently
migrate file contents to and from an archive tier. For this to be
transparent we need file system operations to trigger staging the file
contents back into the file system as needed.
This adds the infrastructure which operations use to wait for offline
extents to come online and which provides userspace with a list of
blocks that the operations are waiting for.
We add some waiting infrastructure that callers use to lock, check for
offline extents, and unlock and wait before checking again to see if
they're still offline. We add these checks and waiting to data io
operations that could encounter offline extents.
This has to be done carefully so that we don't wait while holding locks
that would prevent staging. We use per-task structures to discover when
we are the first user of a cluster lock on an inode, indicating that
it's safe for us to wait because we don't hold any locks.
And while we're waiting our operation is tracked and reported to
userspace through an ioctl. This is a non-blocking ioctl; it's up to
userspace to decide how often to check and how large a region to stage.
Waiters are woken up when the file contents could have changed, not
specifically when we know that the extent has come online. This lets us
wake waiters when their lock is revoked so that they can block waiting
to reacquire the lock and test the extents again. It lets us provide
coherent demand staging across the cluster without fine grained waiting
protocols sent between the nodes. It may result in some spurious wakeups
and work but hopefully it won't, and it's a very simple and functional
first pass.
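The per-operation flow is roughly the following (a simplified sketch of
the control flow, not the real helpers):

    /* sketch: read path waiting for an offline extent to come online */
    static int read_block_sketch(struct inode *inode, u64 blkno)
    {
            int ret;

            for (;;) {
                    lock_inode_cluster(inode);

                    if (!block_is_offline(inode, blkno)) {
                            ret = do_read(inode, blkno);
                            unlock_inode_cluster(inode);
                            return ret;
                    }

                    /*
                     * Record the waiter so the ioctl can report it, drop the
                     * lock so staging isn't blocked, and wait for any wakeup
                     * that means the extents could have changed.
                     */
                    ret = track_data_waiter(inode, blkno);
                    unlock_inode_cluster(inode);
                    if (ret)
                            return ret;
                    wait_for_possible_change(inode, blkno);
            }
    }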
Signed-off-by: Zach Brown <zab@versity.com>
Variable length keys lead to having a key struct point to the buffer
that contains the key. With dirents and xattrs now using small keys we
can convert everyone to using a single key struct and significantly
simplify the system.
We no longer have a separate generic key buf struct that points to
specific per-type key storage. All items use the key struct and fill
out the appropriate fields. All the code that paired a generic key buf
struct and a specific key type struct is collapsed down to a key struct.
There's no longer the difference between a key buf that shares a
read-only key, has its own precise allocation, or has a max size
allocation for incrementing and decrementing.
Each key user now has an init function that fills out its fields. It
looks a lot like the old pattern but we no longer have separate key storage that
the buf points to.
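So a key user now looks something like this (illustrative field and
constant names, not the exact struct):

    /* illustrative: one fixed size key struct shared by every item type */
    struct key_sketch {
            __u8    zone;
            __u8    type;
            __le64  first;
            __le64  second;
    };

    static void init_xattr_key(struct key_sketch *key, u64 ino, u64 name_hash)
    {
            *key = (struct key_sketch) {
                    .zone   = FS_ZONE,
                    .type   = XATTR_TYPE,
                    .first  = cpu_to_le64(ino),
                    .second = cpu_to_le64(name_hash),
            };
    }

    /* callers keep keys on the stack; no separate key storage to point at */
    struct key_sketch key;
    init_xattr_key(&key, ino, name_hash);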
A bunch of code now takes the address of static key storage instead of
managing allocated keys. Conversely, swapping now uses the full keys
instead of pointers to the keys.
We don't need all the functions that worked on the generic key buf
struct because they had different lengths. Copy, clone, length init,
memcpy, all of that goes away.
The item API had some functions that tested the length of keys and
values. The key length tests vanish, and that gets rid of the _same()
call. The _same_min() call only had one user who didn't also test for
the value length being too large. Let's leave caller key constraints in
callers instead of trying to hide them on the other side of a bunch of
item calls.
We no longer have to track the number of key bytes when calculating if
an item population will fit in segments. This removes the key length
from reservations, transactions, and segment writing.
The item cache key querying ioctls no longer have to deal with variable
length keys. They simply specify the start key, the ioctls return the
number of keys copied instead of bytes, and the caller is responsible
for incrementing the next search key.
The segment no longer has to store the key length. It stores the key
struct in the item header.
The fancy variable length key formatting and printing can be removed.
We have a single format for the universal key struct. The SK_ wrappers
that bracketed calls to use preempt-safe per-cpu buffers can turn back
into their normal calls.
Manifest entries are now a fixed size. We can simply split them between
btree keys and values and initialize them instead of allocating them.
This means that level 0 entries don't have their own format that sorts
by the seq. They're sorted by the key like all the other levels.
Compaction needs to sweep all of them looking for the oldest and read
can stop sweeping once it can no longer overlap. This makes rare
compaction more expensive and common reading less expensive, which is
the right tradeoff.
Signed-off-by: Zach Brown <zab@versity.com>
Directory entries were the last items that had large variable length
keys because they stored the entry name in the key. We'd like to have
small fixed size keys so let's store dirents with small keys.
Entries for lookup are stored at the hash of the name instead of the
full name. The key also contains the unique readdir pos so that we
don't have to deal with collision on creation. The lookup procedure now
does need to iterate over all the readdir positions for the hash value
and compare the names.
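The lookup loop amounts to something like this sketch (helper names are
illustrative):

    /* sketch: find a name under its hashed dirent key */
    static int lookup_dirent_sketch(u64 dir_ino, const char *name, unsigned len,
                                    u64 *ino_ret)
    {
            u64 hash = dirent_name_hash(name, len);
            u64 pos = 0;
            struct dirent_val dent;

            /* walk every readdir pos stored under this dir/hash key */
            while (!next_dirent_item(dir_ino, hash, pos, &dent)) {
                    if (dent.name_len == len && !memcmp(dent.name, name, len)) {
                            *ino_ret = dent.ino;    /* full name matched */
                            return 0;
                    }
                    pos = dent.pos + 1;
            }

            return -ENOENT;
    }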
Entries for link backref walking are stored with the entry's position in
the parent dir instead of the entry's name. The name is then stored in
the value. Inode to path conversion can still walk the backref items
without having to lookup dirent items.
These changes mean that all directory entry items are now stored at a
small key with some u64s (hash, pos, parent dir, etc) and have a value
with the dirent struct and full entry name. This lets us use the same
key and value format for the three entry key types. We no longer have
to allocate keys, we can store them on the stack.
We store the entry's hash and pos in the dirent struct in the item value
so that any item has all the fields to reference all the other item
keys. We store the same values in the dentry_info so that deletion
(unlink and rename) can find all the entries.
The ino_path ioctl can now much more clearly iterate over parent
directories and entry positions instead of oh so cleverly iterating over
null terminated names in the parent directories. The ioctl interface
structs and implementation become simpler.
Signed-off-by: Zach Brown <zab@versity.com>
We aren't using the size index. It has runtime and code maintenance
costs that aren't worth paying. Let's remove it.
Removing it from the format and no longer maintaining it are
straightforward.
The bulk of this patch is actually the act of removing it from the index
locking functions. We no longer have to predict the size that will be
stored during the transaction to lock the index items that will be
created during the transaction. A bunch of code to predict the size and
then pass it into locking and transactions goes away. Like other inode
fields we now update the size as it changes.
Signed-off-by: Zach Brown <zab@versity.com>
We're going to be strictly enforcing matching format.h and ioctl.h
between userspace and kernel space. Let's get the exported kernel
function definition out of ioctl.h.
Signed-off-by: Zach Brown <zab@versity.com>
Like the mtime index, this index is unused. Removing it is a near
identical task. Running the same createmany test from our last
patch gives us the following:
$ createmany -o '/scoutfs/file_%lu' 10000000
total: 10000000 creates in 598.28 seconds: 16714.59 creates/second
real 9m58.292s
user 0m7.420s
sys 5m44.632s
So after both indices are gone, we go from a 12m56s run time to 9m58s,
saving almost 3 minutes which translates into a total performance
increase of about 23%.
Signed-off-by: Mark Fasheh <mfasheh@versity.com>
This index is unused - we can gain some create performance by removing it.
To verify this, I ran createmany for 10 million files:
$ createmany -o '/scoutfs/file_%lu' 10000000
Before this patch:
total: 10000000 creates in 776.54 seconds: 12877.56 creates/second
real 12m56.557s
user 0m7.861s
sys 6m56.986s
After this patch:
total: 10000000 creates in 691.92 seconds: 14452.46 creates/second
real 11m31.936s
user 0m7.785s
sys 6m19.328s
So removing the index gained us about a minute and a half on the test or a
12% performance increase.
Signed-off-by: Mark Fasheh <mfasheh@versity.com>
The existing release interface specified byte regions to release but
that didn't match what the underlying file data mapping structure is
capable of. What happens if you specify a single byte to release? Does
it release the whole block? Does it release nothing? Does it return an
error?
By making the interface match the capability of the operation we make
the functioning of the system that much more predictable. Callers are
forced to think about implementing their desires in terms of block
granular releasing.
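So a release call now looks roughly like this (struct and ioctl names
are placeholders):

    /* placeholder struct: the interface only speaks in whole blocks */
    struct release_sketch {
            __u64 block;            /* first block to release */
            __u64 count;            /* number of blocks */
            __u64 data_version;     /* must still match the file's version */
    };

    struct release_sketch rel = {
            .block = start_block,           /* caller already rounded to blocks */
            .count = nr_blocks,
            .data_version = vers,
    };
    ret = ioctl(fd, SCOUTFS_IOC_RELEASE, &rel);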
Signed-off-by: Zach Brown <zab@versity.com>
Raw [su]{8,16,32,64} types keep leaking into our exported headers where
they break userspace builds. Make sure that we only use the exported __
types and add a check to break our build if we get it wrong.
Signed-off-by: Zach Brown <zab@versity.com>
For each transaction we send a message to the server asking for a
unique sequence number to associate with the transaction. When we
change metadata or data of an inode we store the current transaction seq
in the inode and we index it with index items like the other inode
fields.
The server remembers the sequences it gives out. When we go to walk the
inode sequence indexes we ask the server for the largest stable seq and
limit results to that seq. This ensures that we never return seqs that
could still have dirty items, and so inodes and seqs never appear in the past.
Nodes use the sync timer to regularly cycle through seqs and ensure that
inode seq index walks don't get stuck on their otherwise idle seq.
Signed-off-by: Zach Brown <zab@versity.com>
Add items for indexing inodes by their fields. When we update the inode
item we also delete the old index items and create the new items. We
rename and refactor the old inode since ioctl to now walk the inode
index items.
Signed-off-by: Zach Brown <zab@versity.com>
For consistency and to keep upstream users (scout-utils, etc) from
needing to include different type headers, we'll change the type to
match the rest of the header.
Signed-off-by: Nic Henke <nic.henke@versity.com>
The current plan for finding populations of inodes to search no longer
involves xattr backrefs. We're about to change the xattr storage format
so let's remove these interfaces so we don't have to update them.
Signed-off-by: Zach Brown <zab@versity.com>
Convert the link backref code from btree items to the item cache.
Now that the backref items have the full entry name we can traverse a
link with one item lookup. We don't need to lock the inode and verify
that the entry at the backref offset really points to our inode. The
link backref walk gets a lot simpler.
But we have to widen the ioctl cursor to store a full dir ino and path
name instead of just the dir's backref counter.
Signed-off-by: Zach Brown <zab@versity.com>
This adds the ioctl for writing archived file contents back into the
file if the data_version still matches.
Signed-off-by: Zach Brown <zab@versity.com>
Reviewed-by: Mark Fasheh <mfasheh@versity.com>
Add the _OFFLINE flag to indicate offline extents. The release ioctl
frees extents within the release range and sets their _OFFLINE flag if
the data_version still matches.
We tweak the existing truncate item function just a bit to support
making extents offline. We make it take an explicit range of blocks to
remove instead of just giving it the size and it learns to mark extents
offline and update them instead of always deleting them.
Reads from offline extents return zeros like reading from a sparse
region (later it will trigger demand staging) and writing to offline
extents clears the offline flag (later only staging can do that).
Signed-off-by: Zach Brown <zab@versity.com>
Reviewed-by: Mark Fasheh <mfasheh@versity.com>
A few xfstests tests were failing because they tried to create a decent
number of hard links to a file.
We had a small nlink limit because the inode-paths ioctl copied all the
paths for all the hard links to a userspace buffer which could be
enormous if there was a larger nlink limit.
The hard link backref disk format already has a natural counter that
could be used as a cursor to iterate over all the hard links that point
to a given inode.
This refactors the inode_paths ioctl into an ino_path ioctl that returns
a single path for the given counter and returns the counter for the next
path that links to the inode. Happily this lets us get rid of all the
weird path component lists and allocations. Now there's just the kernel
path buffer that gets null terminated path components and the userspace
buffer that those are copied to.
We don't fully relax the nlink limit. stat(2) returns the link count as
a u32. We go a step further and limit it to S32_MAX so that apps might
avoid sign bugs. That still gives us a more generous limit than ext4
and btrfs which are around U16_MAX.
Signed-off-by: Zach Brown <zab@versity.com>
Reviewed-by: Mark Fasheh <mfasheh@versity.com>
We don't overwrite existing data. Every file data write has to allocate
new blocks and update block mapping items.
We can search for inodes whose data has changed by filtering block
mapping item walks by the sequence number. We do this by using the
exact same code for finding changed inodes but using the block mapping
key type.
Signed-off-by: Zach Brown <zab@versity.com>
Add ioctls that return the inode numbers that probably contain the given
xattr name or value. To support these we add items that index inodes by
the presence of xattr items whose names or values hash to a given hash
value.
Signed-off-by: Zach Brown <zab@versity.com>
This adds the ioctl that returns all the paths from the root to a given
inode. The implementation only traverses btree items to keep it
isolated from the vfs object locking and life cycles, but that could be
a performance problem. This is another motivation to accelerate the
btree code.
Signed-off-by: Zach Brown <zab@versity.com>
Oh, thank goodness. It turns out that there's a crash extension for
working with tracepoints in crash dumps. Let's use standard tracepoints
and pretend this tracing hack never happened.
Signed-off-by: Zach Brown <zab@versity.com>
Add the ioctl that lets us find out about inodes that have changed
since a given sequence number.
A sequence number is added to the btree items so that we can track the
tree update that it last changed in. We update this as we modify
items and maintain it across item copying for splits and merges.
The big change is using the parent item ref and item sequence numbers
to guide iteration over items in the tree. The easier change is to have
the current iteration skip over items whose sequence number is too old.
The more subtle change has to do with how iteration is terminated. The
current termination could stop when it doesn't find an item because that
could only happen at the final leaf. When we're ignoring items with old
seqs this can happen at the end of any leaf. So we change iteration to
keep advancing through leaf blocks until it crosses the last key value.
We add an argument to btree walking which communicates the next key that
can be used to continue iterating from the next leaf block. This works
for the normal walk case as well as the seq walking case where walking
terminates prematurely in an interior node full of parent items with old
seqs.
Now that we're more robustly advancing iteration with btree walk calls
and the next key we can get rid of the 'next_leaf' hack which was trying
to do the same thing inside the btree walk code. It wasn't right for
the seq walking case and was pretty fiddly.
The next_key increment could wrap the maximal key at the right spine of
the tree so we have _inc saturate instead of wrap.
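The increment ends up along these lines (simplified here to a single
value; the real key has multiple fields):

    /* sketch: advance to the next possible key without wrapping past the max */
    static void key_inc_saturating(u64 *key)
    {
            if (*key != U64_MAX)
                    (*key)++;
            /* at the right spine the "next" key stays maximal so iteration
             * terminates instead of wrapping back to the start */
    }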
And finally, we want these inode scans to not have to skip over all the
other items associated with each inode as it walks looking for inodes
with the given sequence number. We change the item sort order to first
sort by type instead of by inode. We've wanted this more generally to
isolate item types that have different access patterns.
Signed-off-by: Zach Brown <zab@versity.com>
This adds tracing functionality that's cheap and easy to
use. By constantly gathering traces we'll always have rich
history to analyze when something goes wrong.
Signed-off-by: Zach Brown <zab@versity.com>