Commit Graph

137 Commits

Author SHA1 Message Date
Auke Kok
64fcbdc15e Zero out dirent padding to avoid leaking to disk.
This allocation here currently leaks through __pad[7] which
is written to disk. Use the initializer to enforce zeroing
the pad. The name member is written right after.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2026-03-05 16:20:06 -08:00
Auke Kok
533f309aec Switch to .get_inode_acl() to avoid rcu corruption.
In el9.6, the kernel VFS no longer goes through xattr handlers to
retrieve ACLs, but instead calls the FS drivers' .get_{inode_}acl
method.  In the initial compat version we hooked up .get_acl, given
that its name is identical to the one used in the past.

However, this results in caching issues, as was encountered by customers
and exposed in the added test case `basic-acl-consistency`. The result
is that some group ACL entries may appear randomly missing. Dropping
caches may temporarily fix the issue.

The root cause of the issue is that the VFS now has 2 separate paths to
retrieve ACLs from the FS driver, and they have conflicting
implications for caching. `.get_acl` is purely meant for filesystems
like overlay/ecryptfs where no caching should ever go on as they are
fully passthrough only. Filesystems with dentries (i.e. all normal
filesystems) should not expose this interface, and should instead expose
the .get_inode_acl method. And indeed, in introducing the new interface,
the upstream kernel converts all but a few fs's to use .get_inode_acl().

The functional change in the driver is to detach KC_GET_ACL_DENTRY and
introduce KC_GET_INODE_ACL to handle the new (and required) interface.
KC_SET_ACL_DENTRY is detached as well because it was a different
changeset in the kernel, and we should keep the two separated for good
measure now.
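
A minimal sketch of the wiring the commit describes, assuming the el9.6 prototype; KC_GET_INODE_ACL is the commit's config symbol, while scoutfs_get_acl() and the ops struct name are placeholders:

```c
/* Non-runnable fragment: sketch of the compat switch, not scoutfs's code. */
#ifdef KC_GET_INODE_ACL
static struct posix_acl *scoutfs_get_inode_acl(struct inode *inode, int type,
					       bool rcu)
{
	if (rcu)
		return ERR_PTR(-ECHILD);	/* retry outside the RCU walk */
	return scoutfs_get_acl(inode, type);
}

static const struct inode_operations scoutfs_file_iops = {
	/* .get_acl is deliberately left unset: it is only for
	 * non-caching, passthrough filesystems like overlay/ecryptfs */
	.get_inode_acl	= scoutfs_get_inode_acl,
};
#endif
```

With .get_inode_acl in place the VFS caches the result in the inode, which is what restores the missing group-entry behavior seen in `basic-acl-consistency`.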

Signed-off-by: Auke Kok <auke.kok@versity.com>
2026-01-30 11:31:43 -08:00
Auke Kok
afb6ba00ad POSIX ACL changes.
The .get_acl() method now gets passed a mnt_idmap arg, and we can now
choose to implement either .get_acl() or .get_inode_acl(). Technically
.get_acl() is a new implementation, and .get_inode_acl() is the old.
That second method now also gets an rcu flag passed, but we should be
fine either way.

Deeper under the covers however we do need to hook up the .set_acl()
method for inodes, otherwise setfacl will just fail with -ENOTSUPP. To
make this not super messy (it already is) we tack on the get_acl()
changes here.

This is all roughly ca. v6.1-rc1-4-g7420332a6ff4.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2025-10-30 13:59:44 -04:00
Zach Brown
609fc56cd6 Merge pull request #203 from versity/auke/new_inode_ctime
Fix new_inode ctime assignment.
2025-02-25 15:23:16 -08:00
Auke Kok
e3e2cfceec Fix new_inode ctime assignment.
Very old copy/paste bug here: we want to update new_inode's ctime
instead. old_inode is already updated.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2025-02-18 13:15:49 -05:00
Auke Kok
e9d147260c Fix ctx->pos updating to properly handle dent gaps
We need to ensure we're emitting dents with the proper position
and we already have them as part of our dent. The only caveat is
to increment ctx->pos once beyond the list to make sure the caller
doesn't call us once more.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2025-01-27 14:49:04 -05:00
Auke Kok
cad12d5ce8 Avoid deadlock in _readdir() due to copy_to_user().
dir_emit() will copy_to_user, which can pagefault. If this happens while
cluster locked, we could deadlock.

We use a single page to stage dir_emit data, and iterate between
fetching dirents while locked, and emitting them while not locked.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2025-01-27 14:49:04 -05:00
Auke Kok
1bcd1d4d00 Drop readdir pre-.iterate() compat (el7.5ish).
These 2 sections of compat for readdir are wholly obsolete and can be
hard dropped, which restores the method to look like current upstream
code.

This was added in ddd1a4e.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2025-01-23 14:28:40 -05:00
Auke Kok
d5c2768f04 .tmpfile method now passed a struct file, which must be opened.
v6.0-rc6-9-g863f144f12ad changes the VFS method to pass in a struct
file and not a dentry in preparation for tmpfile support in fuse.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2024-10-03 12:41:05 -07:00
Auke Kok
4ef64c6fcf Vfs methods become user namespace mount aware.
v5.11-rc4-24-g549c7297717c

All of these VFS methods are now passed a user_namespace.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2024-10-03 12:41:05 -07:00
Zach Brown
38e6f11ee4 Add quota support
Signed-off-by: Zach Brown <zab@versity.com>
2024-06-28 15:09:05 -07:00
Zach Brown
4a8240748e Add project ID support
Add support for project IDs.  They're managed through the _attr_x
interfaces and are inherited from the parent directory during creation.

Signed-off-by: Zach Brown <zab@versity.com>
2024-06-28 15:09:05 -07:00
Zach Brown
fb5331a1d9 Add inode retention bit
Add a bit to the private scoutfs inode flags which indicates that the
inode is in retention mode.  The bit is visible through the _attr_x
interface.  It can only be set on regular files and when set it prevents
modification to all but non-user xattrs.  It can be cleared by root.

Signed-off-by: Zach Brown <zab@versity.com>
2024-06-28 15:09:05 -07:00
Zach Brown
2af6f47c8b Fix bad error exit path in unlink
Unlink looks up the entry items for the name it is removing because we
no longer store the extra key material in dentries.  If this lookup
fails it will use an error path which releases a transaction that wasn't
held.  Thankfully this error path is unlikely (corruption or systemic
errors like eio or enomem) so we haven't hit this in practice.

Signed-off-by: Zach Brown <zab@versity.com>
2024-06-25 15:11:20 -07:00
Auke Kok
d480243c11 Support .read/write_iter callbacks in lieu of .aio_read/write
The aio_read and aio_write callbacks are no longer used by newer
kernels, which now use iter-based readers and writers.

We can avoid implementing plain .read and .write as an iter will
be generated when needed for us automatically.

We add a new data_wait_check_iter() function accordingly.

With these methods removed from the kernel, the el8 kernel no
longer uses the extended ops wrapper struct and is much closer now
to upstream. As a result, a lot of methods are moving around from
inode_dir_operations to and from inode_file_operations etc, and
perhaps things will look a bit more structured as a result.

Finally, we need a slightly different data_wait_check() that
accounts for the iter and offset properly.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2023-10-09 15:35:40 -04:00
Auke Kok
ec50e66fff Timespec64 changes for yr2038.
Provide a fallback `current_time(inode)` implementation for older
kernels.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2023-10-09 15:35:40 -04:00
Auke Kok
006555d42a READ_ONCE() replaces ACCESS_ONCE()
v3.18-rc3-2-g230fa253df63 forces us to replace ACCESS_ONCE() with
READ_ONCE(), but the latter is probably the better interface and works with
non-scalar types.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2023-10-09 15:35:40 -04:00
Zach Brown
0316c22026 Extend scoutfs_dir_add_next_linkrefs
Extend scoutfs_dir_add_next_linkref() to be able to return multiple
backrefs under the lock for each call and have it take an argument to
limit the number of backrefs that can be added and returned.

Its return code changes a bit in that it returns 1 on success instead of
0 so we have to be a little careful with callers who were expecting 0.
It still returns -ENOENT when no entries are found.

We break up its tracepoint into one that records each entry added and
one that records the result of each call.

This will be used by an ioctl to give callers just the entries that
point to an inode instead of assembling full paths from the root.

Signed-off-by: Zach Brown <zab@versity.com>
2023-06-14 14:12:10 -07:00
Zach Brown
a61b8d9961 Fix renaming into root directory
The VFS performs a lot of checks on renames before calling the fs
method.  We acquire locks and refresh inodes in the rename method so we
have to duplicate a lot of the vfs checks.

One of the checks involves loops with ancestors and subdirectories.  We
missed the case where the root directory is the destination and doesn't
have any parent directories.  The backref walker it calls returns
-ENOENT instead of 0 with an empty set of parents and that error bubbled
up to rename.

The fix is to notice when we're asking for ancestors of the one
directory that can't have ancestors and short circuit the test.

Signed-off-by: Zach Brown <zab@versity.com>
2023-03-08 11:00:59 -08:00
Zach Brown
71ed4512dc Include primary lock write_seq for write_only vers
FS items are deleted by logging a deletion item that has a greater item
version than the item to delete.  The versions are usually maintained by
the write_seq of the exclusive write lock that protects the item.  Any
newer write hold will have a greater version than all previous write
holds so any items created under the lock will have a greater vers than
all previous items under the lock.  All deletion items will be merged
with the older item and both will be dropped.

This doesn't work for concurrent write-only locks.  The write-only locks
match with each other so their write_seqs are assigned in the order
that they are granted.  That grant order can be mismatched with item
creation order.  We can get deletion items with lesser versions than the
item to delete because of when each creation's write-only lock was
granted.

Write only locks are used to maintain consistency between concurrent
writers and readers, not between writers.  Consistency between writers
is done with another primary write lock.  For example, if you're writing
seq items to a write-only region you need to have the write lock on the
inode for the specific seq item you're writing.

The fix, then, is to pass these primary write locks down to the item
cache so that it can choose an item version that is the greatest amongst
the transaction, the write-only lock, and the primary lock.  This now
ensures that the primary lock's increasing write_seq makes it down to
the item, bringing item version ordering in line with exclusive holds of
the primary lock.

All of this to fix concurrent inode updates sometimes leaving behind
duplicate meta_seq items because old seq item deletions ended up with
older versions than the seq item they tried to delete, nullifying the
deletion.

Signed-off-by: Zach Brown <zab@versity.com>
2022-11-15 13:26:32 -08:00
Zach Brown
aed4313995 Simplify dentry verification
Now that we've removed the hash and pos from the dentry_info struct we
can do without it.  We can store the refresh gen in the d_fsdata pointer
(sorry, 64bit only for now... could allocate if we needed to.)  This gets
rid of the lock coverage spinlocks and puts a bit more pressure on lock
lookup, which we already know we have to make more efficient.  We can
get rid of all the dentry info allocation calls.

Now that we're not setting d_op as we allocate d_fsdata we put the ops
on the super block so that we get d_revalidate called on all our
dentries.

We also are a bit more precise about the errors we can return from
verification.  If the target of a dentry link changes then we return
-ESTALE rather than silently performing the caller's operation on
another inode.

Signed-off-by: Zach Brown <zab@versity.com>
2022-10-27 14:32:06 -07:00
Zach Brown
c92a7ff705 Don't use dentry private hash/pos for deletion
The dentry cache life cycles are far too crazy to rely on d_fsdata being
kept in sync with the rest of the dentry fields.  Callers can do all
sorts of crazy things with dentries.  Only unlink and rename need these
fields and those operations are already so expensive that item lookups
to get the current actual hash and pos are lost in the noise.

Signed-off-by: Zach Brown <zab@versity.com>
2022-10-26 16:42:26 -07:00
Zach Brown
29538a9f45 Add POSIX ACL support
Add support for the POSIX ACLs as described in acl(5).  Support is
enabled by default and can be explicitly enabled or disabled with the
acl or noacl mount options, respectively.

Signed-off-by: Zach Brown <zab@versity.com>
2022-09-28 10:36:10 -07:00
Zach Brown
798fbb793e Move to xattr_handler xattr prefix dispatch
Move to the use of the array of xattr_handler structs on the super to
dispatch set and get from generic_ based on the xattr prefix.  This
will make it easier to add handling of the pseudo "system." ACL xattrs.

Signed-off-by: Zach Brown <zab@versity.com>
2022-09-21 14:24:52 -07:00
Zach Brown
bddca171ee Call iput outside cluster locked transactions
The final iput of an inode can delete items in cluster locked
transactions.   It was never safe to call iput within locked
transactions but we never saw the problem.   Recent work on inode
deletion raised the issue again.

This makes sure that we always perform iput outside of locked
transactions.  The only interesting change is making scoutfs_new_inode()
return the allocated inode on error so that the caller can put the inode
after releasing the transaction.

Signed-off-by: Zach Brown <zab@versity.com>
2022-03-11 15:29:20 -08:00
Zach Brown
15d7eec1f9 Disallow opening unlinked files by handle
Our open by handle functions didn't care that the inode wasn't
referenced and let tasks open unlinked inodes by number.  This
interacted badly with the inode deletion mechanisms which required that
inodes couldn't be cached on other nodes after the transaction which
removed their final reference.

If a task did accidentally open a file by inode while it was being
deleted it could see the inode items in an inconsistent state and return
very confusing errors that look like corruption.

The fix is to give the handle iget callers a flag to tell iget to only
get the inode if it has a positive nlink.   If iget sees that the inode
has been unlinked it returns enoent.

Signed-off-by: Zach Brown <zab@versity.com>
2022-01-24 09:40:08 -08:00
Bryant G. Duffy-Ly
1b8e3f7c05 Add basic renameat2 syscall support
Support the generic renameat2 syscall, then add support for the
RENAME_NOREPLACE flag. To support the flag we need to check
the existence of both entries and return -EEXIST.

Signed-off-by: Bryant G. Duffy-Ly <bduffyly@versity.com>
2021-11-19 17:54:02 -06:00
Zach Brown
95ed36f9d3 Maintain inode count in super and log trees
Add a count of used inodes to the super block and a change in the inode
count to the log_trees struct.   Client transactions track the change in
inode count as they create and delete inodes.   The log_trees delta is
added to the count in the super as finalized log_trees are deleted.

Signed-off-by: Zach Brown <zab@versity.com>
2021-10-28 12:30:47 -07:00
Bryant Duffy-Ly
66b8c5fbd7 Enhance clarity of some kfree paths
In some of the allocation paths there are goto statements
that end up calling kfree(). That is fine, but in cases
where the pointer is not initially set to NULL we
might have undefined behavior. kfree() on a NULL pointer
does nothing, so essentially these changes should not
change behavior, but they clarify the code paths.

Signed-off-by: Bryant Duffy-Ly <bduffyly@versity.com>
2021-10-06 18:07:27 -05:00
Zach Brown
6ca8c0eec2 Consistently initialize dentry info
Unfortunately, we're back in kernels that don't yet have d_op->d_init.
We allocate our dentry info manually as we're given dentries.  The
recent verification work forgot to consistently make sure the info was
allocated before using it.   Fix that up, and while we're at it be a bit
more robust in how we check to see that it's been initialized without
grabbing the d_lock.

Signed-off-by: Zach Brown <zab@versity.com>
2021-09-13 14:41:07 -07:00
Zach Brown
ea2b01434e Add support for i_version
This adds i_version to our inode and maintains it as we allocate, load,
modify, and store inodes.  We set the flag in the superblock so
in-kernel users can use i_version to see changes in our inodes.

Signed-off-by: Zach Brown <zab@versity.com>
2021-09-13 14:41:07 -07:00
Zach Brown
46edf82b6b Add inode crtime creation time
Add an inode creation time field.  It's created for all new inodes.
It's visible to stat_more.  setattr_more can set it during
restore.

Signed-off-by: Zach Brown <zab@versity.com>
2021-09-03 11:14:41 -07:00
Zach Brown
79fbaa6481 Verify dentries after locking
Our dir methods were trusting dentry args.  The vfs code paths use
i_mutex to protect dentries across revalidate or lookup and method
calls.  But that doesn't protect methods running in other mounts.
Multiple nodes can interleave the initial lookup or revalidate then
actual method call.

Rename got this right.  It is very paranoid about verifying inputs after
acquiring all the locks it needs.

We extend this pattern to the rest of the methods that need to use the
mapping of name to inode (and our hash and pos) in dentries.  Once we
acquire the parent dir lock we verify that the dentry is still current,
returning -EEXIST or -ENOENT as appropriate.

Along these lines, we tighten up dentry info correctness a bit by
updating our dentry info (recording lock coverage and hash/pos) for
negative dentries produced by lookup or as the result of unlink.

Signed-off-by: Zach Brown <zab@versity.com>
2021-08-31 09:49:32 -07:00
Zach Brown
d6bed7181f Remove almost all interruptible waits
As subsystems were built I tended to use interruptible waits in the hope
that we'd let users break out of most waits.

The reality is that we have significant code paths that have trouble
unwinding.  Final inode deletion during iput->evict in a task is a good
example.  It's madness to have a pending signal turn an inode deletion
from an efficient inline operation to a deferred background orphan inode
scan deletion.

It also happens that golang built pre-emptive thread scheduling around
signals.  Under load we see a surprising amount of signal spam and it
has created surprising error cases which would have otherwise been fine.

This changes waits to expect that IOs (including network commands) will
complete reasonably promptly.  We remove all interruptible waits with
the notable exception of breaking out of a pending mount.  That requires
shuffling setup around a little bit so that the first network message we
wait for is the lock for getting the root inode.

Signed-off-by: Zach Brown <zab@versity.com>
2021-07-30 13:22:42 -07:00
Zach Brown
4893a6f915 scoutfs_dirents_equal should return bool
It looks like it returned u64 because it was derived from _name_hash().

Signed-off-by: Zach Brown <zab@versity.com>
2021-07-30 13:22:42 -07:00
Zach Brown
73bf916182 Return ENOSPC as space gets low
Returning ENOSPC is challenging because we have clients working on
allocators which are a fraction of the whole and we use COW transactions
so we need to be able to allocate to free.  This adds support for
returning ENOSPC to client posix allocators as free space gets low.

For metadata, we reserve a number of free blocks for making progress
with client and server transactions which can free space.  The server
sets the low flag in a client's allocator if we start to dip into
reserved blocks.  In the client we add an argument to entering a
transaction which indicates if we're allocating new space (as opposed to
just modifying existing data or freeing).  When an allocating
transaction runs low and the server low flag is set then we return
ENOSPC.

Adding an argument to transaction holders and having it return ENOSPC
gave us the opportunity to clean it up and make it a little clearer.
More work is done outside the wait_event function and it now
specifically waits for a transaction to cycle when it forces a commit
rather than spinning until the transaction worker acquires the lock and
stops it.

For data the same pattern applies except there are no reserved blocks
and we don't COW data so it's a simple case of returning the hard ENOSPC
when the data allocator flag is set.

The server needs to consider the reserved count when refilling the
client's meta_avail allocator and when swapping between the two
meta_avail and meta_free allocators.

We add the reserved metadata block count to statfs_more so that df can
subtract it from the free meta blocks and make it clear when enospc is
going to be returned for metadata allocations.

We increase the minimum device size in mkfs so that small testing
devices provide sufficient reserved blocks.

And finally we add a little test that makes sure we can fill both
metadata and data to ENOSPC and then recover by deleting what we filled.

Signed-off-by: Zach Brown <zab@versity.com>
2021-07-07 14:13:14 -07:00
Zach Brown
07210b5734 Reliably delete orphaned inodes
Orphaned items haven't been deleted for quite a while -- the call to the
orphan inode scanner has been commented out for ages.  The deletion of
the orphan item didn't take rid zone locking into account as we moved
deletion from being strictly local to being performed by whoever last
used the inode.

This reworks orphan item management and brings back orphan inode
scanning to correctly delete orphaned inodes.

We get rid of the rid zone that was always _WRITE locked by each mount.
That made it impossible for other mounts to get a _WRITE lock to delete
orphan items.  Instead we rename it to the orphan zone and have orphan
item callers get _WRITE_ONLY locks inside their inode locks.  Now all
nodes can create and delete orphan items as they have _WRITE locks on
the associated inodes.

Then we refresh the orphan inode scanning function.  It now runs
regularly in the background of all mounts.  It avoids creating cluster
lock contention by finding candidates with unlocked forest hint reads
and by testing inode caches locally and via the open map before properly
locking and trying to delete the inode's items.

Signed-off-by: Zach Brown <zab@versity.com>
2021-07-02 10:52:46 -07:00
Zach Brown
22371fe5bd Fully destroy inodes after all mounts evict
Today an inode's items are deleted once its nlink reaches zero and the
final iput is called in a local mount.  This can delete inodes from
under other mounts which have opened the inode before it was unlinked on
another mount.

We fix this by adding cached inode tracking.  Each mount maintains
groups of cached inode bitmaps at the same granularity as inode locking.
As a mount performs its final iput it gets a bitmap from the server
which indicates if any other mount has inodes in the group open.

This makes the two fast paths of opening and closing linked files and of
deleting a file that was unlinked locally only pay a moderate cost of
either maintaining the bitmap locally and only getting the open map once
per lock group.  Removing many files in a group will only lock and get
the open map once per group.

Signed-off-by: Zach Brown <zab@versity.com>
2021-04-21 12:17:33 -07:00
Andy Grover
0deb232d3f Support O_TMPFILE and allow MOVE_BLOCKS into released extents
Support O_TMPFILE: Create an unlinked file and put it on the orphan list.
If it ever gains a link, take it off the orphan list.

Change MOVE_BLOCKS ioctl to allow moving blocks into offline extent ranges.
Ioctl callers must set a new flag to enable this operation mode.

RH-compat: tmpfile support is actually backported by RH into the 3.10 kernel.
We need to use some of their kabi-maintaining wrappers to use it:
use a struct inode_operations_wrapper instead of base struct
inode_operations, set S_IOPS_WRAPPER flag in i_flags. This lets
RH's modified vfs_tmpfile() find our tmpfile fn pointer.

Add a test that tests both creating tmpfiles as well as moving their
contents into a destination file via MOVE_BLOCKS.

xfstests common/004 now runs because tmpfile is supported.

Signed-off-by: Andy Grover <agrover@versity.com>
2021-04-05 14:23:44 -07:00
Zach Brown
da5911c311 Use d_materialise_unique to splice dir dentries
When we're splicing in dentries in lookup we can be splicing the result
of changes on other nodes into a stale dcache.  The stale dcache might
contain dir entries and the dcache does not allow aliased directories.

Use d_materialise_unique() to splice in dir inodes so that we remove all
aliased dentries which must be stale.

We can still use d_splice_alias() for all other inode types.  Any
existing stale dentries will fail revalidation before they're used.

Signed-off-by: Zach Brown <zab@versity.com>
2021-01-26 14:46:07 -08:00
Andy Grover
bed33c7ffd Remove item accounting
Remove kmod/src/count.h
Remove scoutfs_trans_track_item()
Remove reserved/actual fields from scoutfs_reservation

Signed-off-by: Andy Grover <agrover@versity.com>
2021-01-20 17:01:08 -08:00
Andy Grover
cf278f5fa0 scoutfs: Tidy some enum usage
Prefer named to anonymous enums. This helps readability a little.

Use enum as param type if possible (a couple spots).

Remove unused enum in lock_server.c.

Define enum spbm_flags using shift notation for consistency.

Rename get_file_block()'s "gfb" parameter to "flags" for consistency.

Signed-off-by: Andy Grover <agrover@versity.com>
2020-11-30 13:35:44 -08:00
Zach Brown
6bacd95aea scoutfs: fs uses item cache instead of forest
Use the new item cache for all the item work in the fs instead of
calling into the forest of btrees.  Most of this is mechanical
conversion from the _forest calls to the _item calls.  The item cache
no longer supports the kvec argument for describing values so all the
callers pass in the value pointer and length directly.

The item cache doesn't support saving items as they're deleted and later
restoring them from an error unwinding path.  There were only two users
of this.  Directory entries can easily guarantee that deletion won't
fail by dirtying the items first in the item cache.  Xattr updates were
a little trickier.  They can combine dirtying, creating, updating, and
deleting to atomically switch between items that describe different
versions of a multi-item value.  This also fixed a bug in the srch
xattrs where replacing an xattr would create a new id for the xattr and
leave existing srch items referencing a now deleted id.  Replacing now
reuses the old id.

And finally we add back in the locking and transaction item cache
integration.

Signed-off-by: Zach Brown <zab@versity.com>
2020-08-26 14:39:12 -07:00
Zach Brown
42e7fbb4f7 scoutfs: switch to using fnv1a for hashing
We had a few uses of crc for hashing.  That was fine enough for initial
testing but the huge number of xattrs that srch is recording was
seeing very bad collisions from the clumsy combination of crc32c into
a 64bit hash.  Replace it with FNV for now.

This also takes the opportunity to use 3 hash functions in the forest
bloom filter so that we can extract them from the 64bit hash of the key
rather than iterating and recalculating hashes for each function.

Signed-off-by: Zach Brown <zab@versity.com>
2020-08-26 14:39:12 -07:00
Zach Brown
f9ff25db23 scoutfs: add dirent name fingerprint
Entries in a directory are indexed by the hash of their name.  This
introduces a perfectly random access pattern.  And this results in a cow
storm as directories get large enough such that the leaf blocks that
store their entries are larger than our commits.  Each commit ends up
being full of cowed leaf blocks that contain a single new entry.

The dirent name fingerprints change the dirent key to first start with a
fingerprint of the name.  This reduces the scope of hash randomization
from the entire directory to entries with the same fingerprint.

On real customer dir sizes and file names we saw roughly 3x create rate
improvements from being able to create more entries in leaf blocks
within a commit.

Signed-off-by: Zach Brown <zab@versity.com>
2020-08-26 14:39:12 -07:00
Zach Brown
3a82090ab1 scoutfs: have per-fs inode nr allocators
We had previously seen lock contention between mounts that were either
resolving paths by looking up entries in directories or writing xattrs
in file inodes as they did archiving work.

The previous attempt to avoid this contention was to give each directory
its own inode number allocator which ensured that inodes created for
entries in the directory wouldn't share lock groups with inodes in other
directories.

But this creates the problem of operating on few files per lock for
reasonably small directories.  It also creates more server commits as
each new directory gets its inode allocation reservation.

The fix is to have mount-wide separate allocators for directories and
for everything else.  This puts directories and files in separate groups
and locks, regardless of directory population.

Signed-off-by: Zach Brown <zab@versity.com>
2020-08-26 14:39:12 -07:00
Zach Brown
495358996c scoutfs: fix older kc readdir emit
When we added the kernelcompat layer around the old and new readdir
interfaces there was some confusion in the old readdir interface filldir
arguments.  We were passing in our scoutfs dent item struct pointer
instead of the filldir callback buf pointer.  This prevented readdir
from working in older kernels because filldir would immediately see a
corrupt buf and return an error.

This renames the emit compat macro arguments to make them consistent
with the other calls and readdir now provides the correct pointer to the
emit wrapper.

Signed-off-by: Zach Brown <zab@versity.com>
2020-04-21 16:28:06 -07:00
Zach Brown
48448d3926 scoutfs: convert fs callers to forest
Convert fs callers to work with the btree forest calls instead of the
lsm item cache calls.  This is mostly a mechanical syntax conversion.
The inode dirtying path does now update the item rather than simply
dirtying it.

Signed-off-by: Zach Brown <zab@versity.com>
2020-01-17 11:21:36 -08:00
Zach Brown
ddd1a4ef5a scoutfs: support newer ->iterate readdir
The modern upstream kernel has a ->iterate() readdir file_operations
method which takes a context and calls dir_emit().   We add some
kernelcompat helpers to juggle the various function definitions, types,
and arguments to support both the old ->readdir(filldir) and the new
->iterate(ctx) interfaces.

Signed-off-by: Zach Brown <zab@versity.com>
2020-01-15 14:57:57 -08:00
Zach Brown
08a140c8b0 scoutfs: use our locking service
Convert client locking to call the server's lock service instead of
using a fs/dlm lockspace.

The client code gets some shims to send and receive lock messages to and
from the server.  Callers use our lock mode constants instead of the
DLM's.

Locks are now identified by their starting key instead of an additional
scoped lock name so that we don't have more mapping structures to track.
The global rename lock uses keys that are defined by the format as only
used for locking.

The biggest change is in the client lock state machine.  Instead of
calling the dlm and getting callbacks we send messages to our server and
get called from incoming message processing.  We don't have everything
come through a per-lock work queue.  Instead we send requests either
from the blocking lock caller or from a shrink work queue.  Incoming
messages are called in the net layer's blocking work contexts so we
don't need to do any more work to defer to other contexts.

The different processing contexts leads to a slightly different lock
life cycle.  We refactor and separate allocation and freeing from
tracking and removing locks in data structures.  We add a _get and _put
to track active use of locks and then async references to locks by
holders and requests are tracked separately.

Our lock service's rules are a bit simpler in that we'll only ever send
one request at a time and the server will only ever send one request at
a time.  We do have to do a bit of work to make sure we process back to
back grant reponses and invalidation requests from the server.

As of this change the lock setup and destruction paths are a little
wobbly.  They'll be shored up as we add lock recovery between the client
and server.

Signed-off-by: Zach Brown <zab@versity.com>
2019-04-12 10:54:07 -07:00