We only want the generic stuff. Long term, the ocfs2-specific code would be
what's left in fs/ocfs2/dlmglue.[ch].
Signed-off-by: Mark Fasheh <mfasheh@versity.com>
Like the mtime index, this index is unused. Removing it is a near-identical
task. Running the same createmany test from our last
patch gives us the following:
$ createmany -o '/scoutfs/file_%lu' 10000000
total: 10000000 creates in 598.28 seconds: 16714.59 creates/second
real 9m58.292s
user 0m7.420s
sys 5m44.632s
So after both indices are gone, we go from a 12m56s run time to 9m58s,
saving almost 3 minutes, or about a 23% reduction in total run time.
Signed-off-by: Mark Fasheh <mfasheh@versity.com>
This index is unused - we can gain some create performance by removing it.
To verify this, I ran createmany for 10 million files:
$ createmany -o '/scoutfs/file_%lu' 10000000
Before this patch:
total: 10000000 creates in 776.54 seconds: 12877.56 creates/second
real 12m56.557s
user 0m7.861s
sys 6m56.986s
After this patch:
total: 10000000 creates in 691.92 seconds: 14452.46 creates/second
real 11m31.936s
user 0m7.785s
sys 6m19.328s
So removing the index gained us about a minute and a half on the test or a
12% performance increase.
Signed-off-by: Mark Fasheh <mfasheh@versity.com>
Truncation updates extents that intersect with the input range. It
starts with the first block in the range and iterates until it has
searched for all the extents that could cover the range.
Extents are stored in items keyed by their final block location so that
we can use _next to find intersections. Truncation was searching for the
next extent after the full extent covering the range it was still
searching. That means it was starting the search at the last block in
the range, not the first, so it would miss all the extents that didn't
overlap with that last block.
This is fixed by searching from a temporary single-block extent at the
start of the search range.
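A minimal sketch of the fix, with hypothetical names for the extent form
and the helper:

#include <linux/types.h>

/* hypothetical extent form; the real scoutfs extent items differ */
struct extent_range {
	u64 ino;
	u64 blk_off;	/* logical starting block */
	u64 blocks;	/* length in blocks */
};

/*
 * Build the search key from a temporary single-block extent at the start
 * of the remaining truncate range.  Extent items are keyed by their final
 * block, so a one-block extent at 'start' sorts at or before any stored
 * extent whose last block is >= start and _next finds every intersection.
 */
static void init_search_extent(struct extent_range *ext, u64 ino, u64 start)
{
	ext->ino = ino;
	ext->blk_off = start;
	ext->blocks = 1;
}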
Signed-off-by: Zach Brown <zab@versity.com>
Offline extents weren't being merged because they all had their physical
blkno set to 0 and none of the extent calculations treated them
specially. They would only merge if the physical blocks of two extents
were contiguous. Instead of special casing offline extents everywhere we
store them with a physical blkno set to the logical blk_off. This lets
all the current extent calculations work as expected.
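A sketch of why this works, with made-up field names: once an offline
extent's blkno equals its blk_off, the unchanged physical contiguity test
merges adjacent offline extents for free.

#include <linux/types.h>

/* hypothetical fields; the real scoutfs extent items differ */
struct ext {
	u64 blk_off;	/* logical start */
	u64 blkno;	/* physical start; equals blk_off when offline */
	u64 blocks;	/* length in blocks */
	bool offline;
};

/* the existing contiguity test, untouched, now merges offline neighbours */
static bool extents_mergeable(struct ext *left, struct ext *right)
{
	return left->offline == right->offline &&
	       left->blk_off + left->blocks == right->blk_off &&
	       left->blkno + left->blocks == right->blkno;
}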
Signed-off-by: Zach Brown <zab@versity.com>
Release tries to reinstate extents if it sees an error during release.
Those item manipulations need to be covered by the transaction.
Signed-off-by: Zach Brown <zab@versity.com>
The existing release interface specified byte regions to release but
that didn't match what the underlying file data mapping structure is
capable of. What happens if you specify a single byte to release? Does
it release the whole block? Does it release nothing? Does it return an
error?
By making the interface match the capability of the operation we make
the functioning of the system that much more predictable. Callers are
forced to think about implementing their desires in terms of block
granular releasing.
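For illustration, a hypothetical block-granular argument struct; the real
scoutfs ioctl layout may differ:

#include <linux/types.h>
#include <linux/errno.h>

/* callers name whole blocks, so there's no partial-block ambiguity left */
struct release_blocks_args {
	__u64 block;	/* first file block to release */
	__u64 count;	/* number of blocks to release */
};

static int check_release_args(struct release_blocks_args *args)
{
	if (args->block + args->count < args->block)
		return -EINVAL;	/* only overflow is left to reject */
	return 0;
}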
Signed-off-by: Zach Brown <zab@versity.com>
The ->statfs method was still using the super_block in the super_info
that was read during mount. This will get progressively more out
of date.
We add a network message to ask the server for the current fields that
impact statfs. This is always racy and the fields are mostly nonsense,
but we try our best.
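Roughly, the new path looks like this; the request helper, reply struct,
and fields are hypothetical:

#include <linux/fs.h>
#include <linux/statfs.h>

/* hypothetical reply payload filled in by the net client */
struct net_statfs_reply {
	__le64 total_blocks;
	__le64 free_blocks;
	__le64 total_inodes;
};

/* hypothetical request helper implemented by the net client code */
int client_statfs_request(struct super_block *sb,
			  struct net_statfs_reply *reply);

/* ask the server at call time instead of using the mount-time super copy */
static int scoutfs_statfs(struct dentry *dentry, struct kstatfs *kst)
{
	struct super_block *sb = dentry->d_sb;
	struct net_statfs_reply reply;
	int ret;

	ret = client_statfs_request(sb, &reply);
	if (ret)
		return ret;

	kst->f_blocks = le64_to_cpu(reply.total_blocks);
	kst->f_bfree = le64_to_cpu(reply.free_blocks);
	kst->f_bavail = kst->f_bfree;
	kst->f_files = le64_to_cpu(reply.total_inodes);

	return 0;
}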
Signed-off-by: Zach Brown <zab@versity.com>
Delete inode index items when deleting all the items associated with an
inode after it's been unlinked and had all its references dropped.
The index items should always match the fields in the inode item so we
read it to determine the index items that should be deleted, regardless
of whether we have the vfs inode cached or not. We take the opportunity
to collapse the two callers that looked up the inode before deleting its
items into the item deletion path itself so that it can use the inode
fields.
The deletion of index items is partially verified by an inode index test
in xfstests which makes sure that unlinked files are no longer present
in the index.
Signed-off-by: Zach Brown <zab@versity.com>
Directories were getting added to the data_seq index. It might have
looked like they weren't because their data_seqs were always 0 but when
inodes are created they don't have 'have_item' set so all the fields are
added regardless of their current value.
We'd rather not have to wade through directories when looking for
regular file data in the data_seq index, so let's explicitly test for
regular files when updating the data_seq index items.
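The fix boils down to a check like this when deciding whether to maintain
data_seq index items (the helper name is made up):

#include <linux/fs.h>

/* only regular files carry data_seq index items */
static bool should_index_data_seq(struct inode *inode)
{
	return S_ISREG(inode->i_mode);
}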
Signed-off-by: Zach Brown <zab@versity.com>
The updating of the inode index items was racy. It loaded the inode
values, updated the items, loaded the fields again, and then stored the
fields in the inode info. All without locking. Concurrent attempts
could get the fields scrambled and racing with other paths that update
the fields could get the items and inode info out of sync.
This fixes up the two races by only reading the inode fields once and
performing the multi-stage update under a mutex. We add a new lock to
avoid ordering problems with trying to add an existing lock at these
points in the locking hierarchy. We specifically use a mutex because
the item functions can block.
Now the inode index field update just has to safely race with concurrent
access to the fields.
This was found by generic/037 once getattr started refreshing the inode.
It now passes again.
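Roughly, the locked update looks like this; all the names here are
hypothetical stand-ins for the real scoutfs inode info and item calls:

#include <linux/mutex.h>
#include <linux/types.h>

/* hypothetical stand-in for the per-inode info the commit describes */
struct idx_info {
	struct mutex index_mutex;	/* new lock covering the update */
	u64 meta_seq;
	u64 data_seq;
};

int update_index_items(u64 meta_seq, u64 data_seq);	/* hypothetical */

/* read the fields once, then do the whole multi-stage update under the mutex */
static int update_index(struct idx_info *si)
{
	u64 meta_seq, data_seq;
	int ret;

	mutex_lock(&si->index_mutex);
	meta_seq = si->meta_seq;
	data_seq = si->data_seq;
	ret = update_index_items(meta_seq, data_seq);
	mutex_unlock(&si->index_mutex);

	return ret;
}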
Signed-off-by: Zach Brown <zab@versity.com>
Use MODULE_ALIAS_FS() to register the "scoutfs" fs alias so that
modprobe can find the module if it's installed and visible to depmod.
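The change amounts to a one-liner alongside the module's other MODULE_*
declarations (MODULE_ALIAS_FS() comes from linux/fs.h):

MODULE_ALIAS_FS("scoutfs");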
We don't yet have clever enough xfstests to mess around with modules. I
manually verified this by installing the module in /lib/modules and
trying mount -t scoutfs before and after the change.
Signed-off-by: Zach Brown <zab@versity.com>
The networking code was really suffering from trying to combine the
client and server processing paths into one file. The code can be a lot
simpler by giving the client and server their own processing paths that
take their different socket lifecycles into account.
The client maintains a single connection. Blocked senders work on the
socket under a sending mutex. The recv path runs in work that can be
canceled after first shutting down the socket.
A long running server work function acquires the listener lock, manages
the listening socket, and accepts new sockets. Each accepted socket has
a single recv work blocked waiting for requests. That then spawns
concurrent processing work which sends replies under a sending mutex.
All of this is torn down by shutting down sockets and canceling work
which frees its context.
All this restructuring makes it a lot easier to track what is happening
in mount and unmount between the client and server. This fixes bugs
where unmount was failing because the monolithic socket shutdown
function was queueing other work while it was itself being drained.
Signed-off-by: Zach Brown <zab@versity.com>
The rhashtable API has changed over time. Continuing to use it means
having to worry about maintaining different APIs in different kernel
generations.
We have a static pool of cursors so we don't need the flexibility of the
resizable rhashtable. We can roll a simple array of hlist heads to use
as a hash table.
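A minimal sketch of the replacement, assuming a fixed bucket array; the
sizes and struct fields here are illustrative:

#include <linux/hash.h>
#include <linux/list.h>
#include <linux/types.h>

#define CURSOR_HASH_BITS 6	/* illustrative; sized for the static pool */

/* hypothetical cursor shape; only the hashing matters here */
struct cursor {
	struct hlist_node hnode;
	u64 key;
};

static struct hlist_head cursor_hash[1 << CURSOR_HASH_BITS];

static struct hlist_head *cursor_bucket(u64 key)
{
	return &cursor_hash[hash_64(key, CURSOR_HASH_BITS)];
}

/* callers already serialize around the pool, so no bucket locking is shown */
static struct cursor *cursor_lookup(u64 key)
{
	struct cursor *curs;

	hlist_for_each_entry(curs, cursor_bucket(key), hnode)
		if (curs->key == key)
			return curs;

	return NULL;
}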
And finally, these cursors will probably disappear eventually anyway.
Let's not invest too much in them.
Signed-off-by: Zach Brown <zab@versity.com>
Raw [su]{8,16,32,64} types keep leaking into our exported headers where
they break userspace builds. Make sure that we only use the exported __
types and add a check to break our build if we get it wrong.
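For example, a struct in an exported header should only use the __
prefixed types that userspace sees (illustrative struct, not one of ours):

#include <linux/types.h>

struct example_ondisk {
	__le64 blkno;
	__le32 flags;
	__u8   level;
	__u8   _pad[3];
};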
Signed-off-by: Zach Brown <zab@versity.com>
It's handy to quickly find the git commit that built a given module. We
add a MODULE_INFO() tag for it so we can see it in modinfo on the built
module. We also add an ELF note that the kernel exposes in
/sys/module/$m/notes/ when the module is loaded.
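The modinfo side is just a MODULE_INFO() entry, roughly like this; the
tag name is illustrative and the value would normally be generated from
git describe by the build:

#include <linux/module.h>

#define SCOUTFS_GIT_DESCRIBE "v0.0-0-g0000000"	/* normally set by the build */

MODULE_INFO(scoutfs_git_describe, SCOUTFS_GIT_DESCRIBE);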
Signed-off-by: Zach Brown <zab@versity.com>
This gives us cluster locking for the overwhelming majority of metadata ops
that scoutfs supports. In particular, we can create and modify file metadata
from one node and immediately see the changes reflected on another node.
In addition to synchronization, the cluster locks here are providing an
I/O endpoint for our item cache, ensuring that it doesn't read stale
items.
Readdir and file read/write are notable exceptions - they require a more
specific approach and will be implemented in a future patch.
Signed-off-by: Mark Fasheh <mfasheh@versity.com>
[fixed iget unlock and truncated commit message summary]
Signed-off-by: Zach Brown <zab@versity.com>
The conversion to the multi-item xattrs accidentally returned -EIO when
an attribute wasn't found instead of -ENODATA. That broke a huge number
of xfstests because ls can look up xattrs and return EIO.
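The fix is the usual translation where the item lookup misses; a sketch
with a hypothetical lookup helper:

#include <linux/errno.h>
#include <linux/types.h>

/* hypothetical item lookup; the real multi-item xattr code differs */
int lookup_xattr_items(const char *name, void *buffer, size_t size);

static int getxattr_sketch(const char *name, void *buffer, size_t size)
{
	int ret = lookup_xattr_items(name, buffer, size);

	/* a missing item means the attribute doesn't exist: that's
	 * -ENODATA to the vfs, not -EIO */
	if (ret == -ENOENT)
		ret = -ENODATA;

	return ret;
}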
Signed-off-by: Zach Brown <zab@versity.com>
This reduces the amount of duplicate code in callers and makes error
handling easier. The alternative is to sprinkle the code with 'if (lock)'
lines at the end of our functions.
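One plausible shape of the change, with hypothetical names: the unlock
helper tolerates a NULL lock so error paths can call it unconditionally.

#include <linux/fs.h>

struct scoutfs_lock;	/* hypothetical; only the pointer matters here */

void scoutfs_unlock(struct super_block *sb, struct scoutfs_lock *lock)
{
	if (!lock)
		return;

	/* ... drop the cluster lock and free it ... */
}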
Signed-off-by: Mark Fasheh <mfasheh@versity.com>
get_manifest_refs was using the btree root in its stale copy of the
super block. It is supposed to use the btree root that it was given by
its caller who went to the trouble of finding a sufficiently current
btree root.
Signed-off-by: Mark Fasheh <mfasheh@versity.com>
[zab: added commit message and fixed formatting]
Signed-off-by: Zach Brown <zab@versity.com>
Otherwise we get into a problem where the listen lock is conflicting with
regular inode group requests. Since we never drop the listen lock and it (by
design) blocks progress on another node, those inode group requests may
hang.
Signed-off-by: Mark Fasheh <mfasheh@versity.com>
The delayed downconvert work wasn't being canceled on shutdown. 60s
after unmount, at least the net lock's timer would fire and crash trying
to queue the delayed work on the destroyed workqueue.
Proactively unlocking the locks isn't always beneficial to begin with.
The relative costs of mispredicting the future are wildly different if
we have to re-read item caches from segments or have to downconvert a
blocking read lock.
So we can just remove the delayed work to fix the bug and remove a
moving piece that would need to be considered and tuned. There's still
a race where we can get basts after destroying the workqueue but before
we destroy the lockspace; we'll get there.
Signed-off-by: Zach Brown <zab@versity.com>
These transformations are mechanical and there aren't many callers of
these so we combine them into one commit.
Signed-off-by: Zach Brown <zab@versity.com>
Add an end argument to _set_batch to specify the limit of
items we'll read into the cache.
And it turns out that the loop in _set_batch that was meant to cache all
the items covered by the batch didn't try hard enough. It would stop once
the first key was covered but didn't make sure that the coverage
extended to cover last. This can happen if segment boundaries happen to
fall within the items that make up the batch. Fix it up while we're in
here.
Signed-off-by: Zach Brown <zab@versity.com>
Add locks around inode index item iteration. This is tricky because the
inode index items are enormous and we can't default to coarse locks that
let it read and iterate over the entire key space. We use the manifest
to find the next small fixed size region to lock and iterate from.
Signed-off-by: Zach Brown <zab@versity.com>
Add an end key to the item_next calls to limit how many items will be
read into the cache. Callers typically get this from the lock they hold
that covers the iteration. We differentiate between iteration and
caching so that a series of small iterations (listxattr on inodes,
namespace walk in small dirs) can be satisfied by a single read of
adjacent items from segments.
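A hedged sketch of the resulting call shape; the prototype and type names
are illustrative rather than the exact scoutfs API:

#include <linux/fs.h>
#include <linux/uio.h>

struct item_key;	/* hypothetical key type */

/*
 * 'last' bounds the caller's iteration while 'end', usually taken from
 * the covering lock, bounds how far ahead items may be read into the
 * cache so nearby small iterations hit the cache.
 */
int item_next(struct super_block *sb, struct item_key *key,
	      struct item_key *last, struct item_key *end, struct kvec *val);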
Signed-off-by: Zach Brown <zab@versity.com>
Add a locking wrapper for the inode index items. It maps the index
fields to a lock name for each index type.
Signed-off-by: Zach Brown <zab@versity.com>
Add an item reading variant that just returns the next key that it finds
in segments after the given key. This will be used while iterating
to find the next key to lock and then try to iterate towards.
Signed-off-by: Zach Brown <zab@versity.com>
The item cache can only be populated with items that are covered by
locks. Require callers to provide the farthest key that can be covered
by the locks. Locks provide a key for exactly this purpose.
Signed-off-by: Zach Brown <zab@versity.com>
Let both check_range and read_items take a NULL end. check_range just
doesn't do anything with the end of the range. read_items defaults
to trying to read as many items as it can but clamps to the extent of
the segments that intersect with the key.
This will let us incrementally add end arguments to the item functions
that are initially passed in as NULL in callers as we add lock coverage.
Signed-off-by: Zach Brown <zab@versity.com>
We don't need the dlm to track key ranges if we implement ranges by
mapping keys to resources which represent ranges of the key space.
Signed-off-by: Zach Brown <zab@versity.com>
Instead of locking one resource with ranges we'll have callers map their
logical resources to a tuple name that we'll store in lock resources.
The names still map to ranges for cache reading and cache invalidation
but the ranges aren't exposed to the DLM. This lets us use the stock
DLM and distribute resources across masters.
Signed-off-by: Zach Brown <zab@versity.com>
We intend to use more of the dlm lock levels. Let's use its modes
directly so we don't have to maintain a mental map from differently
named modes.
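The stock modes come straight from linux/dlmconstants.h; for example (the
particular mapping shown is illustrative):

#include <linux/dlmconstants.h>

/* dlm modes used directly instead of our own differently named modes */
static const int mode_read  = DLM_LOCK_PR;	/* protected read, shared */
static const int mode_write = DLM_LOCK_EX;	/* exclusive */
static const int mode_null  = DLM_LOCK_NL;	/* compatible with everything */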
Signed-off-by: Zach Brown <zab@versity.com>
Holding a DLM lock protects a range of the key space. The DLM locks
span inodes or regions of inodes. We need the sort order in LSM items
to match the DLM range keys so that we can read all the items covered by
a lock into the cache from a region of LSM segments. If their orders
differered then we'd have to jump around segments to find all the items
covered by a given DLM lock.
Previously we were sorting by type then, within types, by inode. Now we
want to sort by inode then by type. But there are structures which
previously had a type but weren't then sorted by inode. We introduce
zones as the primary sort key. Inode index and node zones are sorted by
the inode fields and node ids respectively. Then comes the fs zone,
sorted first by inode and then by the type of the key.
The bulk of this is the mechanical introduction of the zone field to the
keys, moving the type field down, and a bulk rename of _KEY to _TYPE.
But there are some more substantial changes.
The orphan keys needed to be put in a zone. They fit in the NODE zone
which is all about resources that nodes hold and would need to be
cleaned up if the node went away.
The key formatting is significantly changed to match the new sort order.
Formatted keys are now generally of the form "zone.primary.type..."
And finally with the keys now properly sorted by inodes we can correctly
construct a single range of item cache keys to invalidate when unlocking
the inode group locks.
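A hedged sketch of the resulting key layout; field widths and trailing
fields are illustrative, the point is the sort order:

#include <linux/types.h>

/*
 * zone sorts first, then the zone's primary value (index value, node id,
 * or inode number), then the key type.  Big-endian fields let a simple
 * memcmp() of the packed bytes give the sort order.
 */
struct key_sketch {
	__u8   zone;
	__be64 primary;
	__u8   type;
	/* type-specific fields follow */
} __attribute__((packed));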
Signed-off-by: Zach Brown <zab@versity.com>
We're going to need to be able to sample the current stable manifest
root occasionally. We're adding it now because we don't yet
have the lock plumbing that would provide the lvb. Eventually
this call will bubble up into the locking and the root will be
stored in the lock instead of always requested.
Signed-off-by: Zach Brown <zab@versity.com>
Convert the net server metadata dirtying and committing code to use the
btree instead of the ring. It has to be careful to set up and tear down
the btree info as it starts up and shuts down the server.
This fixes up some questionable setup/teardown changes made in the
previous patches to convert the manifest and allocator. We could rebase
the patches to merge those together. But given that the previous
patches don't work at all without the net updates it might not be worth
the trouble.
Signed-off-by: Zach Brown <zab@versity.com>
Convert the segment allocator to store its free region bitmaps in the
btree.
This is a very straightforward mechanical transformation. We split the
allocator region into a big-endian index key and the bitmap value
payload. We're careful to operate on aligned copies of the bitmaps so
that they're long aligned.
We can remove all the funky functions that were needed when writing the
ring. All we're left with is a call to apply the pending allocations to
dirty btree blocks before writing the btree.
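The aligned-copy part looks roughly like this; the region size and the
meaning of set bits are assumptions here:

#include <linux/bitmap.h>
#include <linux/string.h>

#define REGION_BITS 1024	/* illustrative region size */

/* the bitmap lives in a btree item value that may not be long aligned,
 * so work on an aligned copy and store it back */
static int alloc_from_region(void *item_val)
{
	unsigned long bits[BITS_TO_LONGS(REGION_BITS)];
	unsigned long nr;

	memcpy(bits, item_val, sizeof(bits));
	nr = find_next_bit(bits, REGION_BITS, 0);	/* assume set == free */
	if (nr >= REGION_BITS)
		return -1;

	clear_bit(nr, bits);
	memcpy(item_val, bits, sizeof(bits));
	return nr;
}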
Signed-off-by: Zach Brown <zab@versity.com>
Convert the manifest to store entries in persistent btree keys and
values instead of using the rbtree in memory from the ring.
The btree doesn't have a sort function. It just compares variable
length keys. The most complicated part of this transformation is
dealing with the fallout of this. The compare function can't compare
different search keys and item keys so searches need to construct full
synthetic btree keys to search. It also can't return different
comparisons, like overlapping, so the caller needs to do a bit more work
to use key comparisons to find overlapping segments. And it can't
compare differently depending on the level of the manifest so we store
the manifest in keys differently depending on whether it's in level 0 or
not.
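For illustration only, and ignoring the level 0 case just described, the
searches build complete synthetic keys along these lines; the struct and
sizes are made up:

#include <linux/kernel.h>
#include <linux/string.h>
#include <linux/types.h>

#define MAX_ITEM_KEY_LEN 32	/* illustrative */

/* a full, zero-padded key in the stored form that the compare can use */
struct manifest_skey {
	__u8 level;
	__u8 key[MAX_ITEM_KEY_LEN];
} __attribute__((packed));

static void init_manifest_search_key(struct manifest_skey *skey, u8 level,
				     const void *key, unsigned int len)
{
	memset(skey, 0, sizeof(*skey));
	skey->level = level;
	memcpy(skey->key, key, min_t(unsigned int, len, sizeof(skey->key)));
}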
All mount clients can now see the manifest blocks. They can query the
manifest directly when trying to find segments to read. We can get rid
of all the networking calls that were finding the segments for readers.
We change the manifest functions that relied on the ring to instead make
their changes to the manifest persistent. We don't touch the allocator
or the rest of the manifest server, though, so this commit breaks the
world. It'll be restored in future patches as we update the segment
allocator and server to work with the btree.
Signed-off-by: Zach Brown <zab@versity.com>
Add a cow btree whose blocks are stored in a persistently allocated
ring. This will let us incrementally index very large data sets
efficiently.
This is an adaptation of the previous btree code which now uses the
ring, stores variable length keys, and augments the items with bits that
are ORed up through parents.
Signed-off-by: Zach Brown <zab@versity.com>
Now that we have a dlm, this is a needless indirection. Merge all fields
back into the lock_info struct.
Signed-off-by: Mark Fasheh <mfasheh@versity.com>