Since v6.5-rc1-7-g9b6304c1d537, current_time() is no longer
extern, so we need to update this grep regex to continue to match.
Signed-off-by: Auke Kok <auke.kok@versity.com>
Right now a client requesting a null mode for a lock will cause
invalidations of all existing granted modes of the lock across the
cluster.
This is unnecessarily broad. The absolute requirement is that a null
request invalidates other existing granted modes on the client. That's
how the client safely resolves shrinking's desire to free locks while
the locks are in use. It relies on turning it into a race between use
and remote invalidation.
But that only requires invalidating existing grants from the requesting
client, not all clients. It is always safe for null grants to coexist
with all grants on other clients. Consider the existing mechanics
involving null modes. First, null locks are instantiated on the client
before sending any requests at all. At any given time newly allocated
null locks are coexisting with all existing locks across the cluster.
Second, the server frees the client entry tracking struct the moment it
sends a null grant to the client. From that point on the client's null
lock can not have any impact on the rest of the lock holders because the
server has forgotten about it.
So we add this case to the server's test that two client lock modes are
compatible. We take the opportunity to comment the heck out of this
function instead of making it a dense boolean composition. The only
functional change is the addition of this case; the existing cases are
refactored but unchanged.
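As a rough sketch of the shape this takes (the mode names and helper
here are illustrative, not the actual scoutfs symbols):

    enum { MODE_NULL, MODE_READ, MODE_WRITE };

    /* can 'granted' on one client coexist with 'req' from another? */
    static bool modes_compatible(int granted, int req, bool same_client)
    {
            /*
             * null requests must invalidate existing grants on the
             * requesting client, but can always coexist with grants
             * held by other clients.
             */
            if (req == MODE_NULL)
                    return !same_client;

            /* concurrent readers are fine */
            if (granted == MODE_READ && req == MODE_READ)
                    return true;

            /* any other combination conflicts */
            return false;
    }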
Signed-off-by: Zach Brown <zab@versity.com>
When freeing acked responses in the net layer we sweep the send and
resend queues looking for queued responses up to the sequence number
we've had acked. The code that did this used a weird pattern of
returning ints and adding them which gave me pause. Clean it up to use
bools and bitwise or (not the short-circuiting ||) to more obviously
communicate what's going on.
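The resulting pattern is roughly this (the names are illustrative):

    /* |= instead of || so both sweeps always run */
    bool freed = false;

    freed |= free_acked_responses(&ninf->send_queue, seq);
    freed |= free_acked_responses(&ninf->resend_queue, seq);
    if (freed)
            wake_up(&ninf->send_waitq);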
Signed-off-by: Zach Brown <zab@versity.com>
Over time some fields have been added to the lock struct which haven't
been added to the lock tracing output. Add some of the more relevant
lock fields to tracing.
Signed-off-by: Zach Brown <zab@versity.com>
Lock object lifetimes in the lock server are protected by reference
counts. References are acquired while holding a lock on an rbtree.
Unfortunately, the decision to free lock objects wasn't tested while
also holding that lock on the rbtree. A caller putting their object
would test the refcount, then wait to get the rbtree lock to remove it
from the tree.
There's a possible race where the decision is made to remove the object
but another reference is added before the object is removed. This was
seen in testing and manifested as an incoming request handling path adding
a request message to the object before it is freed, losing the message.
Clients would then hang on a lock that never saw a response because
their request was freed with the lock object.
The fix is to hold the rbtree lock when testing the refcount and
deciding to free. It adds a bit more contention but not significantly
so, given the wild existing contention on a per-fs spinlocked rbtree.
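The kernel's atomic_dec_and_lock() expresses exactly this; a minimal
sketch with hypothetical struct and field names:

    static void lock_obj_put(struct lock_info *linf, struct lock_obj *obj)
    {
            /*
             * Only decide to free while holding the rbtree lock so a
             * racing lookup can't re-take a reference on an object
             * we've already committed to freeing.
             */
            if (atomic_dec_and_lock(&obj->refcount, &linf->lock)) {
                    rb_erase(&obj->node, &linf->root);
                    spin_unlock(&linf->lock);
                    kfree(obj);
            }
    }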
Signed-off-by: Zach Brown <zab@versity.com>
In v5.17-rc4-53-g3a3bae50af5d, we can no longer leave this method
unhooked, as the mm caller now calls it blindly. All in-kernel
filesystems were fixed in this change.
aops->invalidatepage was the old aops method that would free pages
with private attached data. This method is replaced with the
new invalidate_folio method. If this method is NULL, the memory
will become orphaned. (v5.17-rc4-29-gf50015a596fa)
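The compat hookup then looks something like this (the compat define and
method names are illustrative):

    static const struct address_space_operations scoutfs_file_aops = {
    #ifdef KC_HAVE_INVALIDATE_FOLIO        /* hypothetical compat define */
            .invalidate_folio       = scoutfs_invalidate_folio,
    #else
            .invalidatepage         = scoutfs_invalidatepage,
    #endif
            /* ... */
    };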
Signed-off-by: Auke Kok <auke.kok@versity.com>
We use check_add_overflow(a, b, d) here to validate that (off, len)
pairs do not exceed the maximum value of the type. The kernel
conveniently has several macros to sort out the problems with signed or
unsigned types.
However, we're not interested in purely seeing whether (a + b)
overflows, because we're using this for (off, len) overflow checks,
where the bytes we read are from 0 to len - 1. We must therefore call
this check with (b) being "len - 1".
I've made sure that we don't accidentally fail when (len == 0)
in all cases by making sure we've already checked this condition
before, and moving code around as needed to ensure that (len > 0)
in all cases where we check.
The check_add_overflow macro requires a (d) argument in which the
result of the addition is temporarily stored and then checked to see if
an overflow occurred. We put a `tmp` variable of the correct type on
the stack as needed to make the checks function.
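The checks then take this shape (assuming u64 off and len, with len > 0
already guaranteed by the caller):

    u64 tmp;

    /* the bytes span off .. off + len - 1, so test the last byte */
    if (check_add_overflow(off, len - 1, &tmp))
            return -EINVAL;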
The simple-release-extents test mistakenly relied on this buggy wrap code,
so it needs fixing. The move-blocks test also got it wrong.
Signed-off-by: Auke Kok <auke.kok@versity.com>
We consistently enter scoutfs_data_wait_check when len == 0 from
scoutfs_aio_write() which directly passes the i_size_read() value,
and for cases where we `echo >> $FILE` this is always reached.
This can cause the wrapping check to fail, since `0 + (0 - 1) < 0`,
which triggers the WARN_ON_ONCE wrap check that also needs updating to
allow certain operations on huge files.
More importantly, we can just omit all these checks when `len == 0`
anyway, since they should always succeed and should never require
taking all the locks.
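So the function can simply bail out up front, something like:

    /* zero-length checks always succeed; skip the wrap check and locks */
    if (len == 0)
            return 0;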
Signed-off-by: Auke Kok <auke.kok@versity.com>
Passing a holder pointer to these functions now replaces the FMODE_EXCL
flag. For the same reason _put no longer takes flags, but the holder
instead.
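A sketch of the before and after, using the super block as the holder
(the compat define is hypothetical):

    #ifdef KC_BLKDEV_HOLDER        /* hypothetical compat define */
            bdev = blkdev_get_by_path(path, mode, sb, NULL);
            /* ... */
            blkdev_put(bdev, sb);
    #else
            bdev = blkdev_get_by_path(path, mode | FMODE_EXCL, sb);
            /* ... */
            blkdev_put(bdev, mode | FMODE_EXCL);
    #endif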
Signed-off-by: Auke Kok <auke.kok@versity.com>
v6.4-rc2-198-g05bdb9965305 adds a new type for passing flags instead
of abusing fmode_t flags. They are essentially the same flags, just
in a new type.
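The conversion is mostly mechanical:

    /* new blk_mode_t open flags replace the fmode_t ones */
    blk_mode_t mode = BLK_OPEN_READ | BLK_OPEN_WRITE;
    /* was: fmode_t mode = FMODE_READ | FMODE_WRITE; */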
Signed-off-by: Auke Kok <auke.kok@versity.com>
v6.0-rc6-9-g863f144f12ad changes the VFS method to pass in a struct
file and not a dentry in preparation for tmpfile support in fuse.
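The method now receives the open file and finishes the open itself;
roughly (the scoutfs names here are illustrative):

    static int scoutfs_tmpfile(struct user_namespace *mnt_userns,
                               struct inode *dir, struct file *file,
                               umode_t mode)
    {
            struct inode *inode;

            /* ... allocate and set up the inode as before ... */

            /* instantiate into the open file rather than a dentry */
            d_tmpfile(file, inode);
            return finish_open_simple(file, 0);
    }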
Signed-off-by: Auke Kok <auke.kok@versity.com>
The current spec template can't handle future major el releases
gracefully and fails to build entirely. We isolate all changes
so that they are either "el7 specific" or generic. This rids us
entirely of el8 specific conditionals.
Signed-off-by: Auke Kok <auke.kok@versity.com>
Folios are the new data types used for passing pages. For now,
folios only appear to have a single page. Future kernels will
change that.
Signed-off-by: Auke Kok <auke.kok@versity.com>
v5.19-rc4-52-ge33c267ab70d adds shrinker names to the registration
call to aid with debugging shrinkers, which are highly opaque.
To enable this you'll have to recompile the kernel with
CONFIG_SHRINKER_DEBUG=y though, since it's disabled by default in
OSV kernels.
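The registration call just grows a printf-style name argument (the
name here is illustrative):

    /* the name appears under /sys/kernel/debug/shrinker when enabled */
    ret = register_shrinker(&sbi->shrinker, "scoutfs-inode");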
Signed-off-by: Auke Kok <auke.kok@versity.com>
The iter based read/write calls can support splice in el9 if we
hook up these calls; otherwise splice will stop working. ->write()
is hooked up similarly to v3.15-rc4-330-g8d0207652cbe, and ->read()
to the generic implementation.
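Assuming the generic helpers are sufficient for us, the hookup is just:

    .splice_read    = generic_file_splice_read,
    .splice_write   = iter_file_splice_write,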
Signed-off-by: Auke Kok <auke.kok@versity.com>
We instead opt to use sock_setsockopt, which is generally exactly the
same and can easily be mapped back to kernel_setsockopt without
impacting the code significantly.
There are 3 options we set with usec timevals, and handling those is
different enough now that it requires a bit more compat code, so we
split them out into separate compat functions.
Some of the TCP sock functions also have slightly different signatures
(struct socket vs. struct sock), so we split those out as well. Some
no longer return success, either.
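One of the usec timeval compat helpers might look roughly like this
(the helper name is hypothetical):

    static int compat_sock_set_rcvtimeo(struct socket *sock, long usec)
    {
            struct __kernel_sock_timeval tv = {
                    .tv_sec = usec / USEC_PER_SEC,
                    .tv_usec = usec % USEC_PER_SEC,
            };

            return sock_setsockopt(sock, SOL_SOCKET, SO_RCVTIMEO_NEW,
                                   KERNEL_SOCKPTR(&tv), sizeof(tv));
    }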
Signed-off-by: Auke Kok <auke.kok@versity.com>
We switch to using 64-bit usec structs and the recommended replacement
functions from Documentation/core-api/timekeeping.rst.
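For example, where we previously read a timeval we now use the 64-bit
safe interfaces recommended there:

    struct timespec64 now;

    ktime_get_real_ts64(&now);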
Signed-off-by: Auke Kok <auke.kok@versity.com>
In v5.11-rc4-8-ge65ce2a50cf6 the *set handler is passed a
user_namespace struct pointing to the map from the mount.
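For example, an xattr set handler's signature now includes the
namespace argument:

    static int scoutfs_xattr_set(const struct xattr_handler *handler,
                                 struct user_namespace *mnt_userns,
                                 struct dentry *dentry, struct inode *inode,
                                 const char *name, const void *value,
                                 size_t size, int flags);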
Signed-off-by: Auke Kok <auke.kok@versity.com>
Greg KH tells us to do just this in v5.4-rc5-31-g9927c6fa3e1d:

    No one checks the return value of debugfs_create_atomic_t(),
    as it's not needed, so make the return value void, so that no
    one tries to do so in the future.
Signed-off-by: Auke Kok <auke.kok@versity.com>
In v5.12-rc6-9-g4f0f586bf0c8, all list_sort functions use the
list_cmp_func_t type, which compares list_head members. Its arguments
are now required to be `const` and the compiler will check them. This
propagates into our callers.
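A comparison callback now takes this form (struct and field names are
illustrative):

    static int cmp_items(void *priv, const struct list_head *a,
                         const struct list_head *b)
    {
            const struct item *ia = list_entry(a, struct item, head);
            const struct item *ib = list_entry(b, struct item, head);

            return ia->key < ib->key ? -1 : ia->key > ib->key ? 1 : 0;
    }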
Signed-off-by: Auke Kok <auke.kok@versity.com>
v5.7-rc2-1174-gfd4f12bc38c3 significantly rewrites the bpf iterator
which hits this _next() function. It also adds a check that verifies
that the *pos is incremented after every call, even if it goes beyond
the last member (in which case it's not used).
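The fix is to advance *pos unconditionally, even when returning NULL
past the last member; a sketch with illustrative names:

    static void *scoutfs_seq_next(struct seq_file *m, void *v, loff_t *pos)
    {
            /* always advance, even past the end where it goes unused */
            (*pos)++;

            return lookup_nth_object(m->private, *pos);
    }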
Signed-off-by: Auke Kok <auke.kok@versity.com>
v5.11-rc4-7-g2f221d6f7b88 changes setattr_prepare from an extern to a
plain int. There's no further impact on the compat code to keep it
working, except for the detection regex.
Signed-off-by: Auke Kok <auke.kok@versity.com>
We could use sizeof_field as a direct replacement (which is the same)
except that this entire thing can directly use offsetofend().
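offsetofend() folds the offsetof() + sizeof_field() sum into one
expression, e.g. (struct and field names are illustrative):

    /* bytes from the start of the struct through the end of 'last' */
    size_t end = offsetofend(struct scoutfs_ioctl_args, last);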
Signed-off-by: Auke Kok <auke.kok@versity.com>
The wrapper in setattr_more that translates the operations to attr_x
needs to decide whether to ask attr_x to perform a change to any of
the fields passed to it or not. For the date and size fields this
is implicit - we always tell attr_x to change them. For any of the
other fields, it should be explicit.
The only field in the struct that this applies to is data_version.
Because the data_version field defaults to zero, we use a non-zero
value as the condition for passing data_version down to attr_x.
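The wrapper's decision reduces to something like this (the field and
flag names are illustrative):

    /* only ask attr_x to set data_version when the caller supplied one */
    if (sm.data_version != 0) {
            ax.data_version = sm.data_version;
            ax.x_mask |= SCOUTFS_ATTR_X_DATA_VERSION;
    }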
Previously, the code would always pass a data_version=0 down to attr_x,
triggering one of the validity checks, making it return -EINVAL. We
add a simple test case to test for this issue.
Signed-off-by: Auke Kok <auke.kok@versity.com>
These new shrinkers were recently added. Because there are very few
ways to debug them, or even see them functioning properly, we should at
least add counters for them.
Signed-off-by: Auke Kok <auke.kok@versity.com>
In 29160b0b I mistakenly disabled all caching of ACLs for el8
instead of only disabling cache lookups. The correct change
should have been to disable cache lookups only, and leave setting the
acl cache after storing or fetching, as the kernel needs this data
to resolve acls when doing permission checks.
Restore the acl cache insertions.
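Concretely, after a successful store or fetch we again let the VFS
cache the result, along the lines of:

    /* let permission checks hit the cache instead of re-reading items */
    if (!ret)
            set_cached_acl(inode, type, acl);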
Signed-off-by: Auke Kok <auke.kok@versity.com>
Add support for the indx xattr tag which lets xattrs determine their
sort order by inode number in a global index.
Signed-off-by: Zach Brown <zab@versity.com>
Add support for project IDs. They're managed through the _attr_x
interfaces and are inherited from the parent directory during creation.
Signed-off-by: Zach Brown <zab@versity.com>
Change the read_xattr_totals ioctl to use the weak item cache instead of
manually reading and merging the fs items for the xattr totals on every
call.
Signed-off-by: Zach Brown <zab@versity.com>
The _READ_XATTR_TOTALS ioctl had manual code for merging the .totl.
total and value while reading fs items. We're going to want to do this
in another reader so let's put these in their own functions that clearly
isolate the logic of merging the fs items into a coherent result.
We can get rid of some of the totl_read_ counters that tracked which
items we were merging. They weren't adding much value and conflated the
reading ioctl interface with the merging logic.
Signed-off-by: Zach Brown <zab@versity.com>
Add a forest item reading interface that lets the caller specify the net
roots instead of always getting them from a network request.
Signed-off-by: Zach Brown <zab@versity.com>
Add the weak item cache that is used for reads that can handle results
being a little behind. This gives us a lot more freedom to implement
a cache that favors concurrent reads.
Signed-off-by: Zach Brown <zab@versity.com>
Add a bit to the private scoutfs inode flags which indicates that the
inode is in retention mode. The bit is visible through the _attr_x
interface. It can only be set on regular files, and when set it
prevents all modification except to non-user xattrs. It can be cleared
by root.
Signed-off-by: Zach Brown <zab@versity.com>
Now that we have the attr_x calls we can implement stat_more with
get_attr_x and setattr_more with set_attr_x.
The conversion of stat_more fixes a surprising consistency bug.
stat_more wasn't acquiring a cluster lock for the inode nor refreshing
it so it could have returned stale data if modifications were made in
another mount.
Signed-off-by: Zach Brown <zab@versity.com>
The existing stat_more and setattr_more interfaces aren't extensible.
This solves that problem by adding attribute interfaces which specify
the specific fields to work with.
We're about to add a few more inode fields and it makes sense to add
them to this extensible structure rather than adding more ioctls or
relatively clumsy xattrs. This is modeled loosely on the upstream
kernel's statx support.
The ioctl entry points call core functions so that we can also implement
the existing stat_more and setattr_more interfaces in terms of these new
attr_x functions.
Signed-off-by: Zach Brown <zab@versity.com>
Initially setattr_more followed the general pattern where extent
manipulation might require multiple transactions if there are lots of
extent items to work with. The scoutfs_data_init_offline_extent()
function that creates an offline extent handled transactions itself.
But in this case the call only supports adding a single offline extent.
It will always use a small fixed amount of metadata and could be
combined with other metadata changes in one atomic transaction.
This changes scoutfs_data_init_offline_extent() to have the caller
handle transactions, inode updates, etc. This lets the caller perform
all the restore changes in one transaction. This interface change will
then be used as we add another caller that adds a single offline extent
in the same way.
Signed-off-by: Zach Brown <zab@versity.com>
Add a little inline helper to test whether the mounted format version
supports a feature or not, returning an errno that callers can return
directly when a shared expected error is appropriate.
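The helper boils down to something like this (the names and the errno
choice here are illustrative):

    static inline int scoutfs_fmt_vers_check(struct super_block *sb, u64 need)
    {
            /* callers return this directly when the feature is missing */
            return scoutfs_fmt_vers(sb) >= need ? 0 : -EOPNOTSUPP;
    }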
Signed-off-by: Zach Brown <zab@versity.com>