The move_blocks ioctl intends to only move extents whose bytes fall
inside i_size. This is easy except for a final extent that straddles an
i_size that isn't aligned to 4K data blocks.
The code that checked for an extent being entirely past i_size, and
that limited the number of blocks to move by i_size, clumsily compared
i_size offsets in bytes with extent counts in 4KB blocks. In just the
right circumstances, probably with the help of a byte length to move
that is much larger than i_size, the length calculation could result in
trying to move 0 blocks. Once this hit, the loop would keep finding
that extent, calculating 0 blocks to move, and would be stuck.
We fix this by clamping the count of blocks in extents to move in terms
of byte offsets at the start of the loop. This gets rid of the extra
size checks and byte offset use in the loop. We also add a sanity check
to make sure that we can't get stuck if, say, corruption resulted in an
otherwise impossible zero length extent.
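The clamping can be sketched as follows (a minimal model with
illustrative names, not the actual scoutfs code): comparing byte
offsets rounded to whole blocks means a final block straddling i_size
still moves, and an extent entirely past i_size yields 0 up front
rather than mid-loop.

```c
#include <assert.h>
#include <stdint.h>

#define BLOCK_SHIFT 12                   /* 4KB data blocks */
#define BLOCK_SIZE  (1ULL << BLOCK_SHIFT)

/*
 * Clamp a count of blocks to move so we never move blocks that start
 * at or past i_size.  Working entirely in block units derived from
 * byte offsets avoids the mixed-unit math that could compute 0 blocks
 * to move and leave the loop stuck on the same extent.
 */
static uint64_t clamp_move_blocks(uint64_t start_block, uint64_t count,
				  uint64_t i_size)
{
	/* round i_size up to whole blocks; a straddling final block moves */
	uint64_t size_blocks = (i_size + BLOCK_SIZE - 1) >> BLOCK_SHIFT;

	if (start_block >= size_blocks)
		return 0;		/* extent entirely past i_size */
	if (start_block + count > size_blocks)
		count = size_blocks - start_block;
	return count;
}
```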
Signed-off-by: Zach Brown <zab@versity.com>
When we truncate away from a partial block we need to zero its tail that
was past i_size and dirty it so that it's written.
We missed the typical vfs boilerplate of calling block_truncate_page
from setattr->set_size that does this. We need to be a little careful
to pass our file lock down to get_block and then queue the inode for
writeback so it's written out with the transaction. This follows the
pattern in .write_end.
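The tail zeroing can be sketched as a toy model over an in-memory
block (block_truncate_page does the real page-cache version; the name
here is illustrative):

```c
#include <assert.h>
#include <string.h>

#define BLOCK_SIZE 4096

/*
 * Zero the tail of a partial final block past the new i_size so that
 * stale bytes aren't left behind when the block is written out.  A
 * block-aligned i_size has no tail to zero.
 */
static void zero_block_tail(unsigned char *block, unsigned long i_size)
{
	unsigned long off = i_size & (BLOCK_SIZE - 1);

	if (off)		/* only a partial final block has a tail */
		memset(block + off, 0, BLOCK_SIZE - off);
}
```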
Signed-off-by: Zach Brown <zab@versity.com>
The d_prune_aliases in lock invalidation was thought to be safe because
the caller held an inode reference, so surely it couldn't get into
iput_final.
I missed the fundamental dcache pattern that dput can ascend through
parents and end up in inode eviction for entirely unrelated inodes.
It's very easy for this to deadlock: imagine, if nothing else, that the
lock that inode invalidation blocks on in
dput->iput->evict->delete->lock is itself in the list of locks to
invalidate in the caller.
We fix this by always kicking off d_prune and dput into async work.
This increases the chance that inodes will still be referenced after
invalidation, preventing inline deletion. More deletions can be
deferred until the orphan scanner finds them. It should be rare,
though. We're still likely to put and drop invalidated inodes before a
writer gets around to removing the final unlink and asking us for the
omap that describes our cached inodes.
To perform the d_prune in work we make it a behavioural flag and make
our queued iputs a little more robust. We use much safer and more
understandable locking to cover the count and the new flags, and we
queue re-entrant work items on their own workqueue instead of using one
work instance in the system_wq.
Signed-off-by: Zach Brown <zab@versity.com>
Add a quick test of the index items to make sure that rapid inode
updates don't create duplicate meta_seq items.
Signed-off-by: Zach Brown <zab@versity.com>
FS items are deleted by logging a deletion item that has a greater item
version than the item to delete. The versions are usually maintained by
the write_seq of the exclusive write lock that protects the item. Any
newer write hold will have a greater version than all previous write
holds, so any items created under the lock will have a greater version
than all previous items under the lock. All deletion items will be merged
with the older item and both will be dropped.
This doesn't work for concurrent write-only locks. The write-only locks
match with each other, so their write_seqs are assigned in the order
that they are granted. That grant order can be mismatched with item
creation order. We can get deletion items with lesser versions than the
item to delete because of when each creation's write-only lock was
granted.
Write-only locks are used to maintain consistency between concurrent
writers and readers, not between writers. Consistency between writers
is done with another primary write lock. For example, if you're writing
seq items to a write-only region you need to have the write lock on the
inode for the specific seq item you're writing.
The fix, then, is to pass these primary write locks down to the item
cache so that it can choose an item version that is the greatest amongst
the transaction, the write-only lock, and the primary lock. This now
ensures that the primary lock's increasing write_seq makes it down to
the item, bringing item version ordering in line with exclusive holds of
the primary lock.
All of this to fix concurrent inode updates sometimes leaving behind
duplicate meta_seq items because old seq item deletions ended up with
older versions than the seq item they tried to delete, nullifying the
deletion.
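The version selection can be sketched as follows (illustrative names,
not the actual item cache code): taking the greatest of the three
sequences means a deletion performed under a later hold of the primary
lock always gets a version greater than the item it deletes, even when
write-only lock grants were reordered relative to item creation.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Pick an item version as the greatest of the transaction's seq, the
 * write-only lock's write_seq, and the primary write lock's write_seq.
 * The primary lock's increasing write_seq is what restores ordering
 * between item creation and later deletion.
 */
static uint64_t item_version(uint64_t trans_seq, uint64_t wronly_seq,
			     uint64_t primary_seq)
{
	uint64_t vers = trans_seq;

	if (wronly_seq > vers)
		vers = wronly_seq;
	if (primary_seq > vers)
		vers = primary_seq;
	return vers;
}
```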
Signed-off-by: Zach Brown <zab@versity.com>
Now that we've removed the hash and pos from the dentry_info struct we
can do without it. We can store the refresh gen in the d_fsdata pointer
(sorry, 64-bit only for now... we could allocate if we needed to.) This gets
rid of the lock coverage spinlocks and puts a bit more pressure on lock
lookup, which we already know we have to make more efficient. We can
get rid of all the dentry info allocation calls.
Now that we're not setting d_op as we allocate d_fsdata we put the ops
on the super block so that we get d_revalidate called on all our
dentries.
We also are a bit more precise about the errors we can return from
verification. If the target of a dentry link changes then we return
-ESTALE rather than silently performing the caller's operation on
another inode.
Signed-off-by: Zach Brown <zab@versity.com>
Add a lock call to get the current refresh_gen of a held lock. If the
lock doesn't exist or isn't readable then we return 0. This can be used
to track lock coverage of structures without the overhead and lifetime
binding of the lock coverage struct.
Signed-off-by: Zach Brown <zab@versity.com>
scoutfs_sysfs_exit() is called during error handling in module init.
When scoutfs is built-in (so, never.) the __exit section won't be
loaded. Remove the __exit annotation so it's always available to be
called.
Signed-off-by: Zach Brown <zab@versity.com>
The dentry cache life cycles are far too unpredictable to rely on
d_fsdata being kept in sync with the rest of the dentry fields.
Callers can do all sorts of crazy things with dentries. Only unlink
and rename need these
fields and those operations are already so expensive that item lookups
to get the current actual hash and pos are lost in the noise.
Signed-off-by: Zach Brown <zab@versity.com>
The test shell helpers for saving and restoring mount options were
trying to put each mount's option value in an array. It meant to build
the array key by concatenating the option name and the mount number.
But it didn't isolate the option "name" variable when evaluating it,
instead always evaluating "name_" to nothing and building keys for all
options that only contained the mount index. This then broke when tests
attempted to save and restore multiple options.
Signed-off-by: Zach Brown <zab@versity.com>
Add mount options for the size of preallocation and for whether or not
it should be restricted to extending writes. Disabling the default
restriction to streaming writes lets preallocation happen in aligned
regions of the preallocation size when they contain no extents.
Signed-off-by: Zach Brown <zab@versity.com>
The orphan_scan_delay_ms option setting code mistakenly set the default
before testing the option for -1 (not the default) to discover if
multiple options had been set. This made any attempt to set the option fail.
Initialize the option to -1 so the first set succeeds and apply the
default if we don't set the value.
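The corrected ordering can be sketched as follows (a minimal model;
the names and default value are illustrative, not the actual option
code): the duplicate test only works if -1 still means "unset" when
the first set arrives, so the default must be applied after parsing.

```c
#include <assert.h>

#define OPT_UNSET			(-1)
#define ORPHAN_SCAN_DELAY_DEFAULT	10000	/* illustrative default */

/*
 * The option starts at -1 ("unset").  A duplicate set is detected by
 * seeing a value other than -1 before storing.  The bug was applying
 * the default before this test, which made every first set look like a
 * duplicate and fail.
 */
static int set_delay(long *opt, long val)
{
	if (*opt != OPT_UNSET)
		return -1;	/* option given more than once */
	*opt = val;
	return 0;
}

/* only after parsing do we fall back to the default */
static void apply_default(long *opt)
{
	if (*opt == OPT_UNSET)
		*opt = ORPHAN_SCAN_DELAY_DEFAULT;
}
```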
Signed-off-by: Zach Brown <zab@versity.com>
The simple-xattr-unit test had a helper that failed by exiting with
non-zero instead of emitting a message. Let's make it a bit easier to
see what's going on.
Signed-off-by: Zach Brown <zab@versity.com>
Add support for POSIX ACLs as described in acl(5). Support is
enabled by default and can be explicitly enabled or disabled with the
acl or noacl mount options, respectively.
Signed-off-by: Zach Brown <zab@versity.com>
The upcoming acl support wants to be able to get and set xattrs from
callers who already have cluster locks and transactions. We refactor
the existing xattr get and set calls into locked and unlocked variants.
It's mostly boring code motion with the unfortunate situation that the
caller needs to acquire the totl cluster lock before holding a
transaction before calling into the xattr code. We push the parsing of
the tags to the caller of the locked get and set so that they can know
to acquire the right lock. (The acl callers will never be setting
scoutfs. prefixed xattrs so they will never have tags.)
Signed-off-by: Zach Brown <zab@versity.com>
Move to the use of the array of xattr_handler structs on the super to
dispatch set and get from generic_ based on the xattr prefix. This
will make it easier to add handling of the pseudo "system." ACL xattrs.
Signed-off-by: Zach Brown <zab@versity.com>
try_delete_inode_items() is responsible for making sure that it's safe
to delete an inode's persistent items. One of the things it has to
check is that there isn't another deletion attempt on the inode in this
mount. It sets a bit in lock data while it's working and backs off if
the bit is already set.
Unfortunately it was always clearing this bit as it exited, regardless
of whether it set it or not. This would let the next attempt perform
the deletion again before the working task had finished. This was often
not a problem because background orphan scanning is the only source of
regular concurrent deletion attempts.
But it's a big problem if a deletion attempt takes a very long time. It
gives enough time for an orphan scan attempt to clear the bit, try
again, and clobber whoever is performing the very slow deletion.
I hit this in a test that built files with an absurd number of
fragmented extents. The second concurrent orphan attempt was able to
proceed with deletion and performed a bunch of duplicate data extent
frees and caused corruption.
The fix is to only clear the bit if we set it. Now all concurrent
attempts will back off until the first task is done.
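The back-off fix can be sketched as follows (a simplified,
single-threaded model of the lock-data bit; names are illustrative):
each attempt records whether it set the bit, and only the attempt that
set it clears it on exit.

```c
#include <assert.h>
#include <stdbool.h>

static bool deleting;	/* stands in for the bit in lock data */

/* returns 0 and claims the bit, or -1 to tell the caller to back off */
static int try_delete(bool *we_set_it)
{
	if (deleting) {
		*we_set_it = false;
		return -1;	/* another attempt in flight, back off */
	}
	deleting = true;
	*we_set_it = true;
	return 0;
}

/*
 * The bug was clearing unconditionally here, which let the next
 * attempt start deleting while the slow first task was still working.
 */
static void finish_delete(bool we_set_it)
{
	if (we_set_it)
		deleting = false;
}
```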
Signed-off-by: Zach Brown <zab@versity.com>
Add a test which gives the server a transaction with a free list block
that contains blknos that each dirty an individual btree block in the
global data free extent btree.
Signed-off-by: Zach Brown <zab@versity.com>
Recently scoutfs_alloc_move() was changed to try and limit the amount of
metadata blocks it could allocate or free. The intent was to stop
concurrent holders of a transaction from fully consuming the available
allocator for the transaction.
The limiting logic was a bit off. It stopped when the allocator had the
caller's limit remaining, not when it had consumed the caller's limit.
This is overly permissive and could still allow concurrent callers to
consume the allocator. It was also triggering warning messages when a
call consumed more than its allowed budget while holding a transaction.
Unfortunately, we don't have per-caller tracking of allocator resource
consumption. The best we can do is sample the allocators as we start
and return if they drop by the caller's limit. This is overly
conservative in that it charges any consumption by concurrent callers
to all callers.
This isn't perfect but it makes the failure case less likely and the
impact shouldn't be significant. We don't often have a lot of
concurrency and the limits are larger than callers will typically
consume.
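The corrected check can be sketched as follows (illustrative names,
not the actual allocator code): the old test stopped when the pool had
the budget remaining, which let each caller run it nearly dry; the fix
snapshots the available count at the start and stops once the pool has
dropped by the caller's budget.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Stop once the pool has dropped by the caller's budget since the
 * caller's starting snapshot.  Consumption by concurrent callers is
 * charged to everyone, which errs on the conservative side.
 */
static int hit_limit(uint64_t avail_at_start, uint64_t avail_now,
		     uint64_t budget)
{
	return avail_at_start - avail_now >= budget;
}

/*
 * The buggy version, for contrast: with a large pool this never fires
 * until the pool is nearly exhausted, no matter how much one caller
 * consumed.
 */
static int hit_limit_buggy(uint64_t avail_now, uint64_t budget)
{
	return avail_now <= budget;
}
```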
Signed-off-by: Zach Brown <zab@versity.com>
Add scoutfs_alloc_meta_low_since() to test if the metadata avail or
freed resources have been used by a given amount since a previous
snapshot.
Signed-off-by: Zach Brown <zab@versity.com>
As _get_log_trees() in the server prepares the log_trees item for the
client's commit, it moves all the freed data extents from the log_trees
item into core data extent allocator btree items. If the freed blocks
are very fragmented then it can exceed a commit's metadata allocation
budget trying to dirty blocks in the free data extent btree.
The fix is to move the freed data extents in multiple commits. First we
move a limited number in the main commit that does all the rest of the
work preparing the commit. Then we try to move the remaining freed
extents in multiple additional commits.
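The batching can be sketched as follows (a toy model with illustrative
names; the real code dirties free-extent btree blocks inside server
commits): draining a bounded number of extents per commit keeps any
single commit within its metadata allocation budget.

```c
#include <assert.h>

/*
 * Move freed data extents into the core free extent btree a bounded
 * batch at a time, one batch per commit, until none remain.  Returns
 * the number of commits used.
 */
static int drain_freed_extents(int total, int per_commit)
{
	int commits = 0;

	while (total > 0) {
		int n = total < per_commit ? total : per_commit;

		/* ...dirty free-extent btree blocks for n extents... */
		total -= n;
		commits++;	/* each batch is its own commit */
	}
	return commits;
}
```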
Signed-off-by: Zach Brown <zab@versity.com>
Callers who send to specific client connections can get -ENOTCONN if
their client has gone away. We forgot to free the send tracking struct
in that case.
Signed-off-by: Zach Brown <zab@versity.com>
The omap code keeps track of rids that are connected to the server. It
only freed the tracked rids as the server told it that rids were being
removed. But that removal only happened as clients were evicted. If
the server shut down it'd leave the old rid entries around. They'd be
leaked as the mount was unmounted and could linger and create duplicate
entries if the server started back up and the same clients reconnected.
The fix is to free the tracking rids as the server shuts down. They'll
be rebuilt as clients reconnect if the server restarts.
Signed-off-by: Zach Brown <zab@versity.com>
If we return an error from .fill_super without having set sb->s_root
then the vfs won't call our put_super. Our fill_super is careful to
call put_super so that it can tear down partial state, but we weren't
doing this with a few very early errors in fill_super. This tripped
leak detection when we weren't freeing the sbi when returning errors
from bad option parsing.
Signed-off-by: Zach Brown <zab@versity.com>