This was inadvertently left out of the main CW locking commit. We
simply need to seq_print the new fields. We add them to the end of
the line, thus preserving backwards compatibility with old versions
of the debug format.
Signed-off-by: Mark Fasheh <mfasheh@versity.com>
This will give us concurrency yet still allow our ioctls to drive cache
syncing/invalidation on other nodes. Our lock_coverage() checks evolve
to handle direct dlm modes, allowing us to verify correct usage of CW
locks.
As a test, we can run createmany on two nodes at the same time, each
working in their own directory. The following commands were run on each
node:
$ mkdir /scoutfs/`uname -n`
$ cd /scoutfs/`uname -n`
$ /root/createmany -o ./file_$i 100000
Before this patch that test wouldn't finish in any reasonable amount of
time and I would kill it after some number of hours.
After this patch, we make swift progress through the test:
[root@fstest3 fstest3.site]# /root/createmany -o ./file_$i 100000
- created 10000 (time 1509394646.11 total 0.31 last 0.31)
- created 20000 (time 1509394646.38 total 0.59 last 0.28)
- created 30000 (time 1509394646.81 total 1.01 last 0.43)
- created 40000 (time 1509394647.31 total 1.51 last 0.50)
- created 50000 (time 1509394647.82 total 2.02 last 0.51)
- created 60000 (time 1509394648.40 total 2.60 last 0.58)
- created 70000 (time 1509394649.06 total 3.26 last 0.66)
- created 80000 (time 1509394649.72 total 3.93 last 0.66)
- created 90000 (time 1509394650.36 total 4.56 last 0.64)
total: 100000 creates in 35.02 seconds: 2855.80 creates/second
[root@fstest4 fstest4.fstestnet]# /root/createmany -o ./file_$i 100000
- created 10000 (time 1509394647.35 total 0.75 last 0.75)
- created 20000 (time 1509394647.89 total 1.28 last 0.54)
- created 30000 (time 1509394648.46 total 1.86 last 0.58)
- created 40000 (time 1509394648.96 total 2.35 last 0.49)
- created 50000 (time 1509394649.51 total 2.90 last 0.55)
- created 60000 (time 1509394650.07 total 3.46 last 0.56)
- created 70000 (time 1509394650.79 total 4.19 last 0.72)
- created 80000 (time 1509394681.26 total 34.66 last 30.47)
- created 90000 (time 1509394681.63 total 35.03 last 0.37)
total: 100000 creates in 35.50 seconds: 2816.76 creates/second
Signed-off-by: Mark Fasheh <mfasheh@versity.com>
These variants will unconditionally overwrite any existing cached
items, making them appropriate for use with CW locked inode index
items.
Signed-off-by: Mark Fasheh <mfasheh@versity.com>
This is a bit trickier than just dropping in a CW holders count.
dlmglue was comparing levels with a simple greater-than or less-than check.
Since CW locks are not compatible with PR or EX, this check breaks down.
Instead we provide a function which can tell us whether a conversion to a
given lock level is compatible (cache-wise) with the level we have.
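To sketch the shape of that check (a hedged, user-space illustration; the
helper name and layout are made up, not the actual dlmglue code):

/* Illustrative only: mode names mirror the DLM's, the helper is invented. */
enum mode { MODE_NL, MODE_CR, MODE_CW, MODE_PR, MODE_EX };

/* Does the cache built under "held" remain usable at "wanted"? */
static int cache_compatible(enum mode held, enum mode wanted)
{
        /* identical modes always share a cache */
        if (held == wanted)
                return 1;
        /* CW is incompatible with both PR and EX, so a simple level
         * comparison can no longer express compatibility */
        if (held == MODE_CW || wanted == MODE_CW)
                return 0;
        /* otherwise the old ordering still holds: a stronger held mode
         * covers a weaker wanted mode */
        return held > wanted;
}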
We also have some slightly more complicated logic in downconvert. As a
result we update the helper that dlmglue uses to choose a downconvert level.
Signed-off-by: Mark Fasheh <mfasheh@versity.com>
dlmglue does some holder checks that can become unwieldy, especially
with the upcoming CW patch. Put them in a helper function.
Signed-off-by: Mark Fasheh <mfasheh@versity.com>
I accidentally left this off with the initial dlmglue commit. I enabled
it here so that I could see our CW locks happening in real time. We
don't print the lock name yet but that will be remedied in a future patch.
Turning this on gives us a debugfs file,
/sys/kernel/debug/scoutfs/<fsid>/locking_state
which exports the full lock state to userspace. The information
exported on each lock is extensive. The export includes each lock's name,
level, blocking level, request state, flags, etc. We also get a count of
lock attempts and failures for each level (cw, pr, ex). In addition, we
get the total time and max time waited on a given lock request.
Signed-off-by: Mark Fasheh <mfasheh@versity.com>
I noticed while working on other code that we weren't trying to
free potentially allocated btree iter keys if one of them saw an
allocation failure.
Signed-off-by: Zach Brown <zab@versity.com>
The augmenting of the btree to track items with bits set was too fiddly
for its own good. We were able to migrate old btree blocks with a simple
stored key instead, which also fixed livelocks that arose as the parent
and item bits got out of sync. The bit tracking is now unused buggy code
that can be removed.
Signed-off-by: Zach Brown <zab@versity.com>
The bit tracking code was a bit much (HA). It introduced a lot of
complexity just to provide a way to migrate blocks from the old
half of the ring into the current half of the ring.
We can get rid of a ton of code and potential for bugs if we simply
store a persistent migration key in the super and use it to
sweep the tree looking for old blocks to dirty. A simple tree walk
that dirties and returns the next key is all we need.
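A toy model of the sweep (purely illustrative; in scoutfs the key is
stored in the super and the walk happens in the btree code):

/* Toy sweep: blocks remember which half of the ring last wrote them and
 * the only persistent state is the migration "key" (an index here). */
#define NR_BLOCKS 8

struct toy_ring {
        int written_by[NR_BLOCKS];      /* which half wrote each block */
        int current_half;
        int migration_key;              /* persistent resume point */
};

/* Dirty the next old block at or after the stored key and advance the
 * key; returns 0 once the sweep has walked past the end of the tree. */
static int toy_migrate_one(struct toy_ring *r)
{
        int i;

        for (i = r->migration_key; i < NR_BLOCKS; i++) {
                if (r->written_by[i] != r->current_half) {
                        r->written_by[i] = r->current_half;     /* "dirty" */
                        r->migration_key = i + 1;               /* resume here */
                        return 1;
                }
        }
        r->migration_key = 0;   /* nothing old left */
        return 0;
}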
Signed-off-by: Zach Brown <zab@versity.com>
scoutfs_xattr_set() refreshes the cached inode item with its current vfs
inode. It has to refresh the vfs inode as it acquires the lock before it
can treat that vfs inode as current.
Signed-off-by: Zach Brown <zab@versity.com>
Expand the generic lock tracing event to trace the level and holders,
add an event for acquiring a lock, and switch the invalidation event
over to using the lock class.
Signed-off-by: Zach Brown <zab@versity.com>
We had callers using the initialization macro, but it just didn't do
anything. Trying to delete one of the resulting uninitialized entries
triggered a bug. fsx-mpi tripped over this on shutdown after seeing a
consistency error.
Signed-off-by: Zach Brown <zab@versity.com>
fsx-mpi spins, creating contention between EX lock holders on different
nodes. It was tripping assertions in item invalidation as it tried to
invalidate dirty items. Tracing showed that we were allowing holders of
locks while we were invalidating. Our invalidation function would
commit the current transaction, another task would hold the lock and
dirty an item, and then invalidation would continue on and try to
invalidate the dirty item. The invalidation code has always assumed
that it's not running concurrently with item dirtying.
The recursive locking change allowed acquiring blocked locks if the
recursive flag was set. It'd then check holders after calling
downconvert_worker (invalidation for us) and retry the downconvert if a
holder appeared. Because it allowed recursive holders regardless of who
was already holding the lock, holders could arrive once downconvert
started on the blocked lock. Not only did this create our problem with
invalidation, it could also leave items behind if the holder dirtied an
item and dropped the lock after invalidation but before downconvert
checked the holders again.
The fix is to only allow recursive holders on blocked locks that already
have holders. This ensures that holders will never increase past zero
on blocked locks. Once the downconvert sees the holders drain it will
call invalidation which won't have racing dirtiers. We can remove the
holder check after invalidation entirely.
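The heart of the check, as a hedged sketch (the struct and names here are
made up, not the dlmglue fields):

/* Only admit a recursive holder if the blocked lock already has holders;
 * that way holders can never climb back above zero once the downconvert
 * has seen them drain, and invalidation runs without racing dirtiers. */
struct toy_lock {
        int holders;    /* tasks currently holding the lock */
        int blocked;    /* a conversion to a lower level is pending */
};

static int may_hold_recursive(struct toy_lock *lk)
{
        return !lk->blocked || lk->holders > 0;
}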
With this fixed fsx-mpi no longer tries to invalidate dirty items as it
bounces locks back and forth.
Signed-off-by: Zach Brown <zab@versity.com>
We weren't setting the new flag in the mapped buffer head. This tells
the caller that the buffer is newly allocated and needs to be zeroed.
Without this we expose unwritten newly allocated block contents.
fsx found this almost immediately. With this fixed fsx passes.
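For reference, the shape of the fix in a get_block style callback (a
sketch with illustrative names, not the literal scoutfs change):

#include <linux/buffer_head.h>

/* Sketch of a get_block callback reporting a newly allocated block. */
static int example_get_block(struct inode *inode, sector_t iblock,
                             struct buffer_head *bh, int create)
{
        sector_t blkno = 0;             /* stand-in for the real mapping */
        bool allocated = false;         /* set when we allocate a new block */

        /* ... look up or allocate blkno for iblock ... */

        map_bh(bh, inode->i_sb, blkno);
        if (allocated)
                set_buffer_new(bh);     /* tell callers to zero the new block */
        return 0;
}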
Signed-off-by: Zach Brown <zab@versity.com>
Simple attr changes are mostly handled by the VFS; we just have to mirror
them into our inode. Truncates are done in a separate set of transactions.
We use a flag to indicate an in-progress truncate. This allows us to
detect and continue the truncate should the node crash.
Index locking is a bit complicated, so we add a helper function to grab
index locks and start a transaction.
With this patch we now pass the following xfstests:
generic/014
generic/101
generic/313
Signed-off-by: Mark Fasheh <mfasheh@versity.com>
Call it scoutfs_inode_index_try_lock_hold since it may fail and unwind
as part of normal (not an error) operation. This lets us re-use the
name in an upcoming patch.
Signed-off-by: Mark Fasheh <mfasheh@versity.com>
Renaming a dir between parents and clobbering an existing empty dir
wasn't correctly updating the parent link counts. Updating parent link
counts when dirs are moved between parents is an independent operation
from decreasing the link count of an existing victim target of the
rename.
Signed-off-by: Zach Brown <zab@versity.com>
We only set the .getattr method to our locked getattr filler for regular
files. Set it for all file types so that stat, etc., will see the current
inode.
Signed-off-by: Zach Brown <zab@versity.com>
The fs/dlm code has a harmless but unannotated inversion between
connection and socket locking that triggers during shutdown and disables
lockdep. We don't want it to mask our warnings during testing that may
happen after the first shared unmount so we disable lockdep around the
dlm shutdown. It's not ideal but then neither are distro kernels that
ship with lockdep warnings.
Signed-off-by: Zach Brown <zab@versity.com>
The xattr trans reservation assumed that it was only dirtying items for
the new xattr size. It didn't account for the deletion items dirtied for
the parts of a larger previous xattr.
With this fixed generic/070 no longer triggers warnings.
Signed-off-by: Zach Brown <zab@versity.com>
Add a network greeting message that's exchanged between the client and
server on every connection to make sure that we have the correct file
system and format hash.
Signed-off-by: Zach Brown <zab@versity.com>
Calculate the hash of format.h and ioctl.h and make sure the hash stored
in the super during mkfs matches our calculated hash on mount.
Signed-off-by: Zach Brown <zab@versity.com>
mkfs needs to know the size of the largest btree when figuring out how
big to make the ring. That means knowing how few items we can have in
parent blocks, and to know that it needs to know how empty the blocks
can get.
Signed-off-by: Zach Brown <zab@versity.com>
We're going to be strictly enforcing matching format.h and ioctl.h
between userspace and kernel space. Let's get the exported kernel
function definition out of ioctl.h.
Signed-off-by: Zach Brown <zab@versity.com>
All the item ops now know the limit of the items they're allowed to read
into the cache. Warn if someone asks to read items without knowing
how much they're allowed to read based on their lock coverage.
Signed-off-by: Zach Brown <zab@versity.com>
scoutfs_item_create() hasn't been working with lock coverage. It
wouldn't return -ENOENT if it didn't have the lock cached. It would
create items outside lock coverage so they wouldn't be invalidated and
re-read if another node modified the item.
Add a lock arg and teach it to populate the cache so that it's correctly
consistent.
Signed-off-by: Zach Brown <zab@versity.com>
The lock name comparison had a typo where it didn't compare the second
fields between the two names. Only inode index items used the second
field. This bug could cause lock matching when the names don't match
and trigger lock coverage warnings.
While we're in there don't rely so heavily on readers knowing the
relative precedence of subtraction and (magical gcc empty) ternary
operators.
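A hedged sketch of the fixed comparison, with made-up field names and
explicit conditionals instead of leaning on subtraction and ternary
precedence:

struct toy_lock_name {
        unsigned long long first;
        unsigned long long second;
};

/* returns <0, 0, >0; both fields participate in the comparison */
static int toy_cmp_lock_names(const struct toy_lock_name *a,
                              const struct toy_lock_name *b)
{
        if (a->first != b->first)
                return a->first < b->first ? -1 : 1;
        if (a->second != b->second)
                return a->second < b->second ? -1 : 1;
        return 0;
}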
Signed-off-by: Zach Brown <zab@versity.com>
Add lock coverage for inode index items.
Sadly, this isn't trivial. We have to predict the value of the indexed
fields before the operation to lock those items. One value in
particular we can't reliably predict: the sequence of the transaction we
enter after locking. Also operations can create an absolute ton of
index item updates -- rename can modify nr_inodes * items_per_inode * 2
items, so maybe 24 today. And these items can be arbitrarily positioned
in the key space.
So to handle all this we add functions to gather the predicted item values
we'll need to lock, sort and lock them all, and then pass the appropriate
locks down to the item functions during inode updates.
The trickiest bit of the index locking code is having to retry if the
sequence number changes. Preparing locks has to guess the sequence
number of its upcoming trans and then make item update decisions based
on that. If we enter the trans and get a different sequence number then
we need to back off and retry with the correct sequence number (we may
find that we now need to update the indexed meta seq and need to have it
locked).
The use of the functions is straightforward. Sites figure out the
predicted sizes, lock, pass the locks to inode updates, and unlock.
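Roughly, the call sites end up with this shape (a hedged sketch; every
name below is made up rather than the actual scoutfs API):

/* Prepare-and-retry loop around index item locking; all names invented. */
struct toy_ctx {
        unsigned long long trans_seq;   /* set when we enter the trans */
};

extern unsigned long long guess_next_seq(struct toy_ctx *c);
extern void lock_index_items_for_seq(struct toy_ctx *c, unsigned long long seq);
extern void unlock_index_items(struct toy_ctx *c);
extern void enter_trans(struct toy_ctx *c);
extern void exit_trans(struct toy_ctx *c);
extern void update_items(struct toy_ctx *c);

static void toy_locked_update(struct toy_ctx *c)
{
        unsigned long long seq = guess_next_seq(c);

        for (;;) {
                lock_index_items_for_seq(c, seq);       /* predicted values */
                enter_trans(c);
                if (c->trans_seq == seq)
                        break;
                /* guessed wrong: back off and redo the lock set with the
                 * sequence number we actually got */
                seq = c->trans_seq;
                exit_trans(c);
                unlock_index_items(c);
        }
        update_items(c);        /* locks are passed down to the item calls */
        exit_trans(c);
        unlock_index_items(c);
}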
While we're at it we replace the individual item field tracking
variables in the inode info with an array of indexed values. The code
ends up a bit nicer. It also gets rid of the indexed time fields that
were left behind and were unused.
It's worth noting that we're getting exclusive locks on the index
updates. Locking the meta/data seq updates results in complete global
serialization of all changes. We'll need concurrent writer locks to get
concurrency back.
Signed-off-by: Zach Brown <zab@versity.com>
Use per_task storage on the inode to pass locks from high level read and
write lock holders down into the callbacks that operate under the locks
so that the locks can then be passed to the item functions.
Signed-off-by: Zach Brown <zab@versity.com>
Add some functions for storing and using per-task storage in a list.
Callers can use this to pass pointers to children in a given scope when
interfaces don't allow for passing individual arguments amongst
concurrent callers in the scope.
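A user-space analogue of the idea (hedged; the names and the pthread
keying are illustrative, not the kernel implementation):

#include <pthread.h>
#include <stddef.h>

/* Entries are tagged with the owning task and chained on a list hanging
 * off some shared object; a callee later in the same scope looks up the
 * pointer its caller stashed.  List locking is omitted for brevity. */
struct per_task_entry {
        struct per_task_entry *next;
        pthread_t task;
        void *ptr;
};

struct per_task_list {
        struct per_task_entry *head;
};

static void per_task_add(struct per_task_list *l,
                         struct per_task_entry *ent, void *ptr)
{
        ent->task = pthread_self();
        ent->ptr = ptr;
        ent->next = l->head;
        l->head = ent;
}

static void *per_task_get(struct per_task_list *l)
{
        struct per_task_entry *ent;

        for (ent = l->head; ent; ent = ent->next)
                if (pthread_equal(ent->task, pthread_self()))
                        return ent->ptr;
        return NULL;
}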
Signed-off-by: Zach Brown <zab@versity.com>
Add a full lock argument to scoutfs_update_inode_item() and use it to
pass the lock's end key into item_update(). This'll get changed into
passing the full lock into _update soon.
Signed-off-by: Zach Brown <zab@versity.com>
Add the full lock argument to _item_dirty() so that it can verify lock
coverage in addition to limiting item cache population to the range
covered by the lock.
This also ropes in scoutfs_dirty_inode_item(), which is a thin wrapper
around _item_dirty().
Signed-off-by: Zach Brown <zab@versity.com>
Add the full lock argument to _item_next*() so that it can verify lock
coverage in addition to limiting item cache population to the range
covered by the lock.
Signed-off-by: Zach Brown <zab@versity.com>
Orphan processing only works with orphans on its node today. Protect
that orphan item use with the node_id lock.
Signed-off-by: Zach Brown <zab@versity.com>
Add cluster lock coverage to scoutfs_data_truncate_items() and plumb the
lock down into the item functions.
Signed-off-by: Zach Brown <zab@versity.com>
Let's give the item functions the full lock so that they can make sure
that the lock has coverage for the keys involved in the operation.
This _lookup*() conversion comes first, so it adds the
lock_coverage() helper.
Signed-off-by: Zach Brown <zab@versity.com>