Compacting sorted srch files can take multiple transactions because they
can be very large. Each transaction resumes at a byte offset in a block
where the previous transaction stopped.
The resuming code tests that the byte offsets are sane but had a mistake
in testing the offset to skip to. It returned an error if the
compaction resumed from the last possible safe offset for decoding
entries.
If a system is unlucky enough to have a compaction transaction stop at
just this offset then compaction stops making forward progress as each
attempt to resume returns an error.
The fix allows continuation from this last safe offset while returning
errors for attempts to continue *past* that offset. This matches all
the encoding code which allows encoding the last entry in the block at
this offset.
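As an illustrative sketch only (the real field and constant names in the
code differ), the resume bound effectively changes from rejecting the
boundary offset to allowing it:

    /* hypothetical resume sanity check */
    if (pos > SRCH_BLOCK_SAFE_BYTES)    /* was: pos >= SRCH_BLOCK_SAFE_BYTES */
            return -EINVAL;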
Signed-off-by: Zach Brown <zab@versity.com>
Add a test for srch compaction getting stuck with errors while
continuing a partial operation. It ensures that a block has an encoded
entry at the _SAFE_BYTES offset, that an operation stops precisely at
that offset, and then watches for errors.
Signed-off-by: Zach Brown <zab@versity.com>
The srch compaction request building function and the srch compaction
worker both have logic to recognize a valid response with no input files
indicating that there's no work to do. The server unfortunately
translated nr == 0 into ENOENT and sent that error response to the
client. This caused the client to increment error counters in the
common case when there's no compaction work to perform. We'd like the
error counter to reflect actual errors, and we're about to check it in
a test, so let's fix this up so that the server sends a successful
response with nr == 0 to indicate that there's no work to do.
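A minimal sketch of the intended server behaviour, with made-up names
standing in for the real request handling code:

    /* illustrative only: nothing-to-do is a successful, empty response */
    nr = count_compaction_input_files(req);     /* hypothetical helper */
    if (nr < 0)
            return nr;                          /* real errors still fail */

    resp->nr = nr;                              /* nr == 0: no work to do */
    return 0;                                   /* was: returned -ENOENT */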
Signed-off-by: Zach Brown <zab@versity.com>
The server had a few lower level seqcounts that it used to protect
state. One user got it wrong by forgetting to disable preemption
around writers. Debug kernels warned when write_seqcount_begin() was
called without preemption disabled.
We fix that user and make it easier to get right in the future by having
one higher level seqlock and using that consistently for seq read
begin/retry and write lock/unlock patterns.
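The pattern looks roughly like this; the lock and field names are
illustrative rather than the server's actual state:

    static DEFINE_SEQLOCK(stat_seqlock);    /* one higher level seqlock */
    static u64 stat_value;

    static u64 read_stat(void)
    {
            unsigned int seq;
            u64 val;

            do {
                    seq = read_seqbegin(&stat_seqlock);
                    val = stat_value;
            } while (read_seqretry(&stat_seqlock, seq));

            return val;
    }

    static void write_stat(u64 val)
    {
            /* the embedded spinlock serializes writers and takes care of
             * preemption, unlike a bare write_seqcount_begin() */
            write_seqlock(&stat_seqlock);
            stat_value = val;
            write_sequnlock(&stat_seqlock);
    }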
Signed-off-by: Zach Brown <zab@versity.com>
The rpmbuild support files no longer define the previously used kernel
module macros. This carves out the differences between el7 and el8 with
conditionals based on the distro we are building for.
Signed-off-by: Ben McClelland <ben.mcclelland@versity.com>
In rhel7 this is a nested struct with ktime_t. However, in rhel8
ktime_t is a simple s64, and not a union, and thus we can't do
this as easily. Just memset it.
Signed-off-by: Auke Kok <auke.kok@versity.com>
New kernels expect to do a partial match when a .prefix is used here,
and provide a .name member in case matching should look at the whole
string. This is what we want.
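For illustration, with a made-up handler (the actual prefix string in
the code may differ), the two matching modes look like:

    /* .prefix asks the VFS for a partial (prefix) match, which is what
     * we want; .name would require an exact whole-string match */
    static const struct xattr_handler example_xattr_handler = {
            .prefix = "scoutfs.",   /* hypothetical prefix */
            /* .get and .set callbacks omitted from this sketch */
    };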
Signed-off-by: Auke Kok <auke.kok@versity.com>
The caller takes care of caching for us. If we also cache, it
interferes with the memory management of cached ACLs and breaks.
Signed-off-by: Auke Kok <auke.kok@versity.com>
The aio_read and aio_write callbacks are no longer used by newer
kernels, which now use iter based readers and writers.
We can avoid implementing plain .read and .write as an iter will
be generated for us automatically when needed.
We add a new data_wait_check_iter() function accordingly.
With these methods removed from the kernel, the el8 kernel no
longer uses the extended ops wrapper struct and is much closer now
to upstream. As a result, a lot of methods move between
inode_dir_operations and inode_file_operations etc, and perhaps
things will look a bit more structured.
We also need a slightly different data_wait_check() that
accounts for the iter and offset properly.
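A rough sketch of the resulting shape, with hypothetical scoutfs_*
names; only the use of the iter methods and the absence of plain
.read/.write reflect the actual change:

    static ssize_t scoutfs_file_read_iter(struct kiocb *iocb,
                                          struct iov_iter *to)
    {
            /* hypothetical call into the new data_wait_check_iter() path */
            int ret = scoutfs_data_wait_check_iter(iocb->ki_filp,
                                                   iocb->ki_pos, to);
            if (ret < 0)
                    return ret;

            return generic_file_read_iter(iocb, to);
    }

    static const struct file_operations scoutfs_file_fops = {
            .read_iter  = scoutfs_file_read_iter,
            /* .write_iter is analogous; no .read/.write needed, the VFS
             * builds an iter for callers that still use them */
    };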
Signed-off-by: Auke Kok <auke.kok@versity.com>
.readpages is obsolete in el8 kernels. We implement the .readahead
method instead which is passed a struct readahead_control. We use
the readahead_page(rac) accessor to retrieve pages one by one from the
control struct.
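The conversion follows the usual pattern; the per-page helper here is
hypothetical and stands in for the existing read path, which is
expected to unlock each page:

    static void scoutfs_readahead(struct readahead_control *rac)
    {
            struct page *page;

            /* readahead_page() hands back each page locked and with a
             * reference that we drop once the read path has taken over */
            while ((page = readahead_page(rac))) {
                    scoutfs_readpage(rac->file, page);      /* hypothetical */
                    put_page(page);
            }
    }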
Signed-off-by: Auke Kok <auke.kok@versity.com>
v4.9-12228-g530e9b76ae8f drops all (un)register_(hot)cpu_notifier()
API functions. From here on we need to use the new cpuhp_* API.
We avoid this entirely for now, at the cost of leaking pages until
the filesystem is unmounted.
Signed-off-by: Auke Kok <auke.kok@versity.com>
Convert the timeout struct into a u64 nsecs value before passing it to
the trace point event, so as not to overflow the 64bit limitation on args.
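Something along these lines, assuming the timeout is a struct
timespec64; the helper and event names are made up:

    /* hypothetical: flatten the timeout before handing it to tracing */
    static void trace_timeout(struct timespec64 *timeout)
    {
            u64 timeout_ns = timespec64_to_ns(timeout);

            trace_example_lock_wait(timeout_ns);    /* fits in one u64 arg */
    }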
Signed-off-by: Auke Kok <auke.kok@versity.com>
v4.16-rc1-1-g9b2c45d479d0
This interface now returns (sizeof (addr)) on success, instead of 0.
Therefore, we have to change the error condition detection.
The compat for older kernels handles the addrlen check internally.
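The adjusted check looks roughly like this; the surrounding socket code
is only for illustration:

    static int example_get_name(struct socket *sock, struct sockaddr_in *sin)
    {
            int ret;

            /* now returns the address length on success, negative on error;
             * previously it returned 0 and filled in a separate addrlen */
            ret = kernel_getsockname(sock, (struct sockaddr *)sin);
            if (ret < 0)
                    return ret;

            return 0;
    }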
Signed-off-by: Auke Kok <auke.kok@versity.com>
MS_* flags from <linux/mount.h> should not be used in the kernel
anymore from 4.x onwards. Instead, we need to use the SB_* versions.
Signed-off-by: Auke Kok <auke.kok@versity.com>
Move to the more recent interfaces for counting and scanning cached
objects to shrink.
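In outline, with hypothetical cache-walking helpers:

    static unsigned long example_count_objects(struct shrinker *shrink,
                                               struct shrink_control *sc)
    {
            return example_nr_cached_items();       /* hypothetical */
    }

    static unsigned long example_scan_objects(struct shrinker *shrink,
                                              struct shrink_control *sc)
    {
            /* returns how many items were actually freed */
            return example_free_cached_items(sc->nr_to_scan);   /* hypothetical */
    }

    static struct shrinker example_shrinker = {
            .count_objects = example_count_objects,
            .scan_objects  = example_scan_objects,
            .seeks         = DEFAULT_SEEKS,
    };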
Signed-off-by: Zach Brown <zab@versity.com>
Signed-off-by: Auke Kok <auke.kok@versity.com>
Move towards modern bio interfaces, while unfortunately carrying along a
bunch of compat functions that let us still work with the old
incompatible interfaces.
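The modern submission path that the compat wrappers translate to looks
roughly like this; the completion handler and surrounding variables are
illustrative:

    bio = bio_alloc(GFP_NOFS, nr_pages);
    bio_set_dev(bio, bdev);
    bio->bi_iter.bi_sector = sector;
    bio->bi_opf = REQ_OP_READ;
    bio->bi_end_io = example_end_io;    /* hypothetical completion handler */
    bio_add_page(bio, page, PAGE_SIZE, 0);
    submit_bio(bio);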
Signed-off-by: Zach Brown <zab@versity.com>
Signed-off-by: Auke Kok <auke.kok@versity.com>
memalloc_nofs_save() was introduced as preferable to trying to use GFP
flags to indicate that a task should not recurse during reclaim. We use
it instead of the _noio_ variant we were using before.
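The pattern is scope-based rather than per-allocation:

    unsigned int nofs_flags;

    /* from <linux/sched/mm.h>: allocations in this scope implicitly
     * behave as GFP_NOFS, so reclaim won't recurse into the filesystem */
    nofs_flags = memalloc_nofs_save();
    /* ... allocate and dirty whatever is needed ... */
    memalloc_nofs_restore(nofs_flags);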
Signed-off-by: Zach Brown <zab@versity.com>
__percpu_counter_add was renamed to percpu_counter_add_batch to make it
clear that the __ doesn't mean it's less safe, as it does in other calls
in the API, but just that it takes an additional batch parameter.
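The call itself only changes name, e.g. with a made-up counter and
batch size:

    /* on older kernels this is still __percpu_counter_add() */
    percpu_counter_add_batch(&example_counter, 1, 32);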
Signed-off-by: Zach Brown <zab@versity.com>
Signed-off-by: Auke Kok <auke.kok@versity.com>
There are new interfaces available but the old one has been retained
for us to use. On older kernels we need to fall back to the previous
names of these functions.
Signed-off-by: Auke Kok <auke.kok@versity.com>
Provide a fallback in degraded mode for kernels pre-v4.15-rc3 by directly
manipulating the member as needed.
Signed-off-by: Auke Kok <auke.kok@versity.com>
Since v4.6-rc3-27-g9902af79c01a, inode->i_mutex has been replaced
with ->i_rwsem. However, inode_lock() and related functions have long
worked as intended and provide fully exclusive locking of the inode.
To avoid a name clash on pre-rhel8 kernels, we have to rename a
stack variable in `src/file.c`.
Signed-off-by: Auke Kok <auke.kok@versity.com>
Since v4.15-rc3-4-gae5e165d855d, <linux/iversion.h> contains a new
inode->i_version API and it is not included by default.
Signed-off-by: Auke Kok <auke.kok@versity.com>
The new variant of the code that recomputes the augmented value
is designed to handle non-scalar types, and to facilitate that it
has new semantics for the _compute callback. The callback is now
passed a boolean flag `exit` indicating that, if the value isn't
changed, it should exit and halt propagation. It now returns whether
propagation should stop, rather than the newly computed value, and it
updates the computed value directly in the node.
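Under the new semantics a compute callback looks roughly like this; the
node type and its augmented 'total' field are hypothetical:

    static bool example_compute_total(struct example_node *node, bool exit)
    {
            u64 total = node->value;

            if (node->rb.rb_left)
                    total += rb_entry(node->rb.rb_left,
                                      struct example_node, rb)->total;
            if (node->rb.rb_right)
                    total += rb_entry(node->rb.rb_right,
                                      struct example_node, rb)->total;

            if (exit && node->total == total)
                    return true;    /* unchanged: halt propagation */

            node->total = total;    /* update the node directly */
            return false;
    }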
Signed-off-by: Auke Kok <auke.kok@versity.com>
Fixes: Error: implicit declaration of function ‘blkdev_put’
Previously this was an `extern` in <fs.h> and was picked up implicitly,
hence the need to include its header explicitly now.
Signed-off-by: Auke Kok <auke.kok@versity.com>
v4.1-rc4-22-g92cf211874e9 merges this into preempt.h, and on
rhel7 kernels we don't need this include anymore either.
Signed-off-by: Auke Kok <auke.kok@versity.com>
v3.15-rc1-6-g1a56f2aa4752 removes flush_work_sync entirely, but
ever since v3.6-rc1-25-g606a5020b9bd which made all workqueues
non-reentrant, it has been equivalent to flush_work.
This is safe because in all cases only one server->work can be
in flight at a time.
Signed-off-by: Auke Kok <auke.kok@versity.com>
v3.18-rc3-2-g230fa253df63 forces us to replace ACCESS_ONCE() with
READ_ONCE(), which is probably the better interface anyway and works
with non-scalar types.
Signed-off-by: Auke Kok <auke.kok@versity.com>
PAGE_CACHE_SIZE was previously defined to be equivalent to PAGE_SIZE.
This symbol was removed in v4.6-rc1-32-g1fa64f198b9f.
Signed-off-by: Auke Kok <auke.kok@versity.com>
Because we `-include src/kernelcompat.h` from the command line,
this header gets included before any of the kernel includes in
most .c and .h files. We should at least make sure we pull in
<linux/fs.h> and <linux/kernel.h> since they're required.
Signed-off-by: Auke Kok <auke.kok@versity.com>
Server code that wants to dirty blocks by holding a commit won't be
allowed to until the current allocators for the server transaction have
enough space for the holder. As an active holder applies the commit the
allocators are refilled and the waiting holders will proceed.
But the current allocators can have no resources as the server starts
up. There will never be active holders to apply the commit and refill
the allocators. In this case all the holders will block indefinitely.
The fix is to trigger a server commit when a holder doesn't have room.
It used to be that commits were only triggered when apply callers were
waiting. We transfer some of that logic into a new 'committing' field
so that we can have commits in flight without apply callers waiting. We
add it to the server commit tracing.
While we're at it we clean up the logic that tests if a hold can
proceed. It used to be confusingly split across two functions that both
could sample the current allocator space remaining. This could lead to
weird cases where the first holder could use the second alloc remaining
call, not the one whose values were tested to see if the holder could
fit. Now each hold check only samples the allocators once.
And finally we fix a subtle case where the budget exceeded message could
spuriously trigger when dirtying the freed list created a new empty
block after the holder recorded the amount of space in the freed block.
Signed-off-by: Zach Brown <zab@versity.com>
Data preallocation attempts to allocate large aligned regions of
extents. It tried to fill the hole around a write offset that
didn't contain an extent. It missed the case where there can be
multiple extents between the start of the region and the hole.
It could try to overwrite these additional existing extents and writes
could return EINVAL.
We fix this by trimming the preallocation to start at the write offset
if there are any extents in the region before the write offset. The
data preallocation test output has to be updated now that allocation
extents won't grow towards the start of the region when there are
existing extents.
Signed-off-by: Zach Brown <zab@versity.com>
Log merge completions were spliced in one server commit. It's possible
to get enough completion work pending that it all can't be completed in
one server commit. Operations fail with ENOSPC and because these
changes can't be unwound cleanly the server asserts.
This allows the completion splicing to break the work up into multiple
commits.
Processing completions in multiple commits means that request creation
can observe the merge status in states that weren't possible before.
Splicing is careful to maintain an elevated nr_complete count while the
client can't get requests because the tree is rebalancing.
Signed-off-by: Zach Brown <zab@versity.com>
The move_blocks ioctl finds extents to move in the source file by
searching from the starting block offset of the region to move.
Logically, this is fine. After each extent item is deleted the next
search will find the next extent.
The problem is that deleted items still exist in the item cache. The
next iteration has to skip over all the deleted extents from the start
of the region. This is fine with large extents, but with heavily
fragmented extents this creates a huge amplification of the number of
items to traverse when moving the fragmented extents in a large file.
(It's not quite O(n^2)/2 over the total extents, since deleted items
are purged as we write out the dirty items in each transaction, but
it's still immense.)
The fix is to simply start searching for the next extent after the one
we just moved.
Signed-off-by: Zach Brown <zab@versity.com>
If the _contig_only option isn't set then we try to preallocate aligned
regions of files. The initial implementation naively only allowed one
preallocation attempt in each aligned region. If it got a small
allocation that didn't fill the region then every future allocation
in the region would be a single block.
This changes every preallocation in the region to attempt to fill the
hole in the region that iblock fell in. It uses an extra extent search
(item cache search) to try and avoid thousands of single block
allocations.
Signed-off-by: Zach Brown <zab@versity.com>