scoutfs

mirror of https://github.com/versity/scoutfs.git synced 2026-04-27 00:25:06 +00:00

Author	SHA1	Message	Date
Zach Brown	d593e2caa0	scoutfs: warn if we read items without cache limit All the item ops now know the limit of the items they're allowed to read into the cache. Warn if someone asks to read items without knowing how much they're allowed to read based on their lock coverage. Signed-off-by: Zach Brown <zab@versity.com>	2017-10-09 15:31:29 -07:00
Zach Brown	365048b785	scoutfs: add full lock arg to _item_set_batch() Add the full lock arg to _item_set_batch() so that it can verify lock coverage. Signed-off-by: Zach Brown <zab@versity.com>	2017-10-09 15:31:29 -07:00
Zach Brown	9b31c9795b	scoutfs: add full lock arg to _item_delete() Add the full lock arg to _item_delete() so that it can verify lock coverage. Signed-off-by: Zach Brown <zab@versity.com>	2017-10-09 15:31:29 -07:00
Zach Brown	6cd64f3228	scoutfs: add full lock arg to _item_update() Add the full lock arg to _item_update() so that it can verify lock coverage. Signed-off-by: Zach Brown <zab@versity.com>	2017-10-09 15:31:29 -07:00
Zach Brown	0aa16f5ef6	scoutfs: add lock arg to _item_create() scoutfs_item_create() hasn't been working with lock coverage. It wouldn't return -ENOENT if it didn't have the lock cached. It would create items outside lock coverage so they wouldn't be invalidated and re-read if another node modified the item. Add a lock arg and teach it to populate the cache so that it's correctly consistent. Signed-off-by: Zach Brown <zab@versity.com>	2017-10-09 15:31:29 -07:00
Zach Brown	47f5946c90	scoutfs: fix lock name comparison The lock name comparison had a typo where it didn't compare the second fields between the two names. Only inode index items used the second field. This bug could cause lock matching when the names don't match and trigger lock coverage warnings. While we're in there don't rely so heavily on readers knowing the relative precedence of subtraction and (magical gcc empty) ternary operators. Signed-off-by: Zach Brown <zab@versity.com>	2017-10-09 15:31:29 -07:00
Zach Brown	950436461a	scoutfs: add lock coverage for inode index items Add lock coverage for inode index items. Sadly, this isn't trivial. We have to predict the value of the indexed fields before the operation to lock those items. One value in particular we can't reliably predict: the sequence of the transaction we enter after locking. Also operations can create an absolute ton of index item updates -- rename can modify nr_inodes * items_per_inode * 2 items, so maybe 24 today. And these items can be arbitrarily positioned in the key space. So to handle all this we add functions to gather the predicted item values we'll need to lock, sort and lock them all, then pass appropriate locks down to the item functions during inode updates. The trickiest bit of the index locking code is having to retry if the sequence number changes. Preparing locks has to guess the sequence number of its upcoming trans and then makes item update decisions based on that. If we enter and have a different sequence number then we need to back off and retry with the correct sequence number (we may find that we'll need to update the indexed meta seq and need to have it locked). The use of the functions is straightforward. Sites figure out the predicted sizes, lock, pass the locks to inode updates, and unlock. While we're at it we replace the individual item field tracking variables in the inode info with an array of indexed values. The code ends up a bit nicer. It also gets rid of the indexed time fields that were left behind and were unused. It's worth noting that we're getting exclusive locks on the index updates. Locking the meta/data seq updates results in complete global serialization of all changes. We'll need concurrent writer locks to get concurrency back. Signed-off-by: Zach Brown <zab@versity.com>	2017-10-09 15:31:29 -07:00
Zach Brown	960bc4d53c	scoutfs: add lock coverage for stage ioctl Signed-off-by: Zach Brown <zab@versity.com>	2017-10-09 15:31:29 -07:00
Zach Brown	aa70903154	scoutfs: add lock coverage for data paths Use per_task storage on the inode to pass locks from high level read and write lock holders down into the callbacks that operate under the locks so that the locks can then be passed to the item functions. Signed-off-by: Zach Brown <zab@versity.com>	2017-10-09 15:31:29 -07:00
Zach Brown	c3e690a1ac	scoutfs: add per_task storage helper Add some functions for storing and using per-task storage in a list. Callers can use this to pass pointers to children in a given scope when interfaces don't allow for passing individual arguments amongst concurrent callers in the scope. Signed-off-by: Zach Brown <zab@versity.com>	2017-10-09 15:31:29 -07:00
Zach Brown	0535e249d1	scoutfs: add lock arg to scoutfs_update_inode_item Add a full lock argument to scoutfs_update_inode_item() and use it to pass the lock's end key into item_update(). This'll get changed into passing the full lock into _update soon. Signed-off-by: Zach Brown <zab@versity.com>	2017-10-09 15:31:29 -07:00
Zach Brown	32a68e84cf	scoutfs: add full lock coverage to _item_dirty() Add the full lock argument to _item_dirty() so that it can verify lock coverage in addition to limiting item cache population to the range covered by the lock. This also ropes in scoutfs_dirty_inode_item() which is a thin wrapper around _item_dirty(); Signed-off-by: Zach Brown <zab@versity.com>	2017-10-09 15:31:29 -07:00
Zach Brown	1c6e3e39bf	scoutfs: add full lock coverage to _item_next() Add the full lock argument to _item_next() so that it can verify lock coverage in addition to limiting item cache population to the range covered by the lock. Signed-off-by: Zach Brown <zab@versity.com>	2017-10-09 15:31:29 -07:00
Zach Brown	0e4627ea65	scoutfs: add locking of link backref traversal Add cluster locking around the link backref item lookups during ino to path traversal. Signed-off-by: Zach Brown <zab@versity.com>	2017-10-09 15:31:29 -07:00
Zach Brown	3a277bac6f	scoutfs: protect orphan items with node_id_lock Orphan processing only works with orphans on its node today. Protect that orphan item use with the node_id lock. Signed-off-by: Zach Brown <zab@versity.com>	2017-10-09 15:31:29 -07:00
Zach Brown	b2668fee9a	scoutfs: protect node free block items Now that we have a long-lived node_id lock we can use it to protect the free block items in the node zone. Signed-off-by: Zach Brown <zab@versity.com>	2017-10-09 15:31:29 -07:00
Zach Brown	9e3954a918	scoutfs: add lock around data item truncation Add cluster lock coverage to scoutfs_data_truncate_items() and plumb the lock down into the item functions. Signed-off-by: Zach Brown <zab@versity.com>	2017-10-09 15:31:29 -07:00
Zach Brown	55709c4345	scoutfs: add lock coverage testing to item_lookup* Let's give the item functions the full lock so that they can make sure that the lock has coverage for the keys involved in the operation. This _lookup*() conversion is first so it adds the lock_coverager() helper. Signed-off-by: Zach Brown <zab@versity.com>	2017-10-09 15:31:29 -07:00
Zach Brown	1193fbc9c5	scoutfs: add a node_id lock A mount's node_id item operations need to be locked. For now let's use a lock that's held for the duration of the mount. It makes it trivial for us to use it with node_id items but we'll have work to do if we want to opportunistically get access to other mount's node_id items while they're still up. Signed-off-by: Zach Brown <zab@versity.com>	2017-10-09 15:31:29 -07:00
Zach Brown	85dbc21dc6	scoutfs: use lock end keys in rename verification scoutfs_rename() looks for dirents again after acquiring cluster locks. It needs to pass in the lock end keys to limit the items that are read into the cache. Signed-off-by: Zach Brown <zab@versity.com>	2017-10-09 15:31:29 -07:00
Zach Brown	1da18d17cf	scoutfs: use trylock for global server lock Shared unmount hasn't worked for a long time because we didn't have the server work woken out of blocking trying to acquire the lock. In the old lock code the wait conditions didn't test ->shutdown. dlmglue doesn't give us a reasonable way to break a caller out of a blocked lock. We could add some code to do it with a global context that'd have to wake all locks or add a call with a lock resource name, not a held lock, that'd wake that specific lock. Neither sounds great. So instead we'll use trylock to get the server lock. It's guaranteed to make reasonable forward progress. The server work is already requeued with a delay to retry. While we're at it we add a global server lock instead of using the weird magical inode lock in the fs space. The server lock doesn't need keys or to participate in item cache consistency, etc. With this unmount works. All mounts will now generate regular background trylock requests. Signed-off-by: Zach Brown <zab@versity.com>	2017-10-09 15:31:29 -07:00
Mark Fasheh	28a6b82690	scoutfs: allow some recursive locking in dlmglue Scoutfs can get into a situation where it wants to acquire a lock twice. This happens when we have a parent and child in the same inode group. A create operation will lock that group twice, once for each inode. Instead of forcing callers to remember which inode groups they've locked, we handle this internally within dlmglue. Add a dlmglue lock type flag that indicates we might recursively lock a resource. During locking, when dlmglue sees that flag and we already have the lock at an appropriate level, it will allow the lock operation to continue even when the lock is marked blocking. Signed-off-by: Mark Fasheh <mfasheh@versity.com>	2017-10-04 08:55:14 -07:00
Mark Fasheh	c5e6676b04	scoutfs: remove some ifdef'd out dlmglue code We can dump the ocfs2-specifics as well as any definitions that have been exported via the header file. This makes reading through and modifying dlmglue much more palatable. Signed-off-by: Mark Fasheh <mfasheh@versity.com>	2017-10-04 08:55:14 -07:00
Mark Fasheh	17c6025cb7	scoutfs: clean up some comments in lock.c We finished the lock lru work and can remove these TODO comments. Signed-off-by: Mark Fasheh <mfasheh@versity.com>	2017-10-04 08:55:14 -07:00
Zach Brown	ccf5301c37	scoutfs: add -Werror for build errors We insist on a warning free build but it's up to human diligence to discover and address warnings. We've also caught errors when compilers in automated testing saw problems that the compilers in developer environments didn't. That is, a human only could have noticed by investigating the output from successful test runs. Let's put some weight behind our promise of a warning free build and turn gcc warnings into errors. Signed-off-by: Zach Brown <zab@versity.com>	2017-09-28 13:59:49 -07:00
Zach Brown	b6c592f099	scoutfs: don't dirty btree buffers The btree sets some buffer head flags before it writes them to satisfy submit_bh() requirements. It was setting dirty which wasn't required. Nothing ever cleared dirty so those buffers sat around and were never freed. Each btree block we wrote sat around forever. Eventually the vm gets clogged up and the world backs up trying to allocate pages to write and we see massive stalls. With this fix we no longer see the 'buffers' vm stat continuously grow and IO rates are consistent over time. Signed-off-by: Zach Brown <zab@versity.com>	2017-09-28 13:59:49 -07:00
Zach Brown	4bc565be39	scoutfs: silence bulk_alloc gcc warning Some versions of gcc correctly noticed that bulk_alloc() had a case where it wouldn't initialize ret if the first segno was 0. This won't happen because the client response processing returns an error in this case. So this just shuts up the warning. Signed-off-by: Zach Brown <zab@versity.com>	2017-09-28 13:59:49 -07:00
Zach Brown	c5ddec7058	scoutfs: more aggressively shrink items The old item shrinking was very conservative. It would only try and reclaim items from the front of the range of the oldest items in the lru. It would stop making progress if all the items in the front of the lru are in a range whose initial item can't be reclaimed. With the item shrinking not making progress memory fills with items. Eventually the system backs up behind an allocation during segment writing blocking waiting for free pages. We fix this by much more aggressively shrinking items. We now look for a region of items around the oldest items to shrink. If those fall in the middle of a range then we use the memory from the items to construct a new range and split the existing range. Now the only way we'll refuse to shrink items is if they're dirty. We have a reasonably small cap on the number of dirty items so we shouldn't get stuck. Signed-off-by: Zach Brown <zab@versity.com>	2017-09-28 13:59:49 -07:00
Zach Brown	fd509840d4	scoutfs: use pages for seg shrink object count The VM wasn't very excited about trying to reclaim our seg count when we returned a small count of the number of large segment objects available for reclaim. Each segment represents a ton of memory so we want to give the VM more visibility into the scale of the cache to encourage it to shrink it. We define the object count for the seg shrinker as the number of pages of segments in the lru. Signed-off-by: Zach Brown <zab@versity.com>	2017-09-28 13:59:49 -07:00
Zach Brown	ccefffe74f	scoutfs: add item, range, lock alloc/free counters Add some counters to track allocation and freeing of our structures that are subject to shrinking. This lets us eyeball the counters to see if we have runaway leaks. Signed-off-by: Zach Brown <zab@versity.com>	2017-09-28 13:59:49 -07:00
Zach Brown	15aa09b0c2	scoutfs: add shrink exit trace points Add trace points that show the incoming nr_to_scan and resulting object count for shrinker calls. Signed-off-by: Zach Brown <zab@versity.com>	2017-09-28 13:59:49 -07:00
Mark Fasheh	43a2d63f79	scoutfs: replace trace_printk in bio.c Signed-off-by: Mark Fasheh <mfasheh@versity.com>	2017-09-28 13:59:49 -07:00
Mark Fasheh	e67e500940	scoutfs: turn off tracing in dlmglue.c Put this behind a #define. Leave the asserts (mlog_bug_on_msg) though and redefine their macro to printk instead of going to the trace buffer. Signed-off-by: Mark Fasheh <mfasheh@versity.com>	2017-09-28 13:59:49 -07:00
Mark Fasheh	3430edb60b	scoutfs: replace trace_printk in item.c Signed-off-by: Mark Fasheh <mfasheh@versity.com>	2017-09-28 13:59:49 -07:00
Mark Fasheh	2a07e6f642	scoutfs: replace trace_printk in data.c Signed-off-by: Mark Fasheh <mfasheh@versity.com>	2017-09-28 13:59:49 -07:00
Mark Fasheh	deecfa0ad5	scoutfs: replace trace_printk in trans.c Signed-off-by: Mark Fasheh <mfasheh@versity.com>	2017-09-28 13:59:49 -07:00
Mark Fasheh	285842086d	scoutfs: replace trace_printk in ioctl.c Signed-off-by: Mark Fasheh <mfasheh@versity.com>	2017-09-28 13:59:49 -07:00
Mark Fasheh	8ad6ff9d41	scoutfs: replace trace_printk in inode.c Signed-off-by: Mark Fasheh <mfasheh@versity.com>	2017-09-28 13:59:49 -07:00
Mark Fasheh	44a19b63c0	scoutfs: replace trace_printk in segment.c Signed-off-by: Mark Fasheh <mfasheh@versity.com>	2017-09-28 13:59:49 -07:00
Mark Fasheh	cf3f9fee75	scoutfs: replace trace_printk in lock.c Also clean up these traces a bit and make a lock_info trace class which we can expand in a future patch. Signed-off-by: Mark Fasheh <mfasheh@versity.com>	2017-09-28 13:59:49 -07:00
Mark Fasheh	7739a0084e	scoutfs: replace trace_printk in xattr.c Signed-off-by: Mark Fasheh <mfasheh@versity.com>	2017-09-28 13:59:49 -07:00
Mark Fasheh	87adeb9306	scoutfs: replace trace_printk in manifest.c Signed-off-by: Mark Fasheh <mfasheh@versity.com>	2017-09-28 13:59:49 -07:00
Mark Fasheh	0d28930271	scoutfs: replace trace_printk in super.c Signed-off-by: Mark Fasheh <mfasheh@versity.com>	2017-09-28 13:59:49 -07:00
Mark Fasheh	a5283e6f2c	scoutfs: replace trace_printk in dir.c Signed-off-by: Mark Fasheh <mfasheh@versity.com>	2017-09-28 13:59:49 -07:00
Mark Fasheh	2c1f117bef	scoutfs: replace trace_printk in compact.c Signed-off-by: Mark Fasheh <mfasheh@versity.com>	2017-09-28 13:59:49 -07:00
Mark Fasheh	3a5093c6ae	scoutfs: replace trace_printk in alloc.c Signed-off-by: Mark Fasheh <mfasheh@versity.com>	2017-09-28 13:59:49 -07:00
Zach Brown	215ba7d4ad	scoutfs: more reliably set btree parent item bits Most paths correctly calculated the bits to set in a parent item by combining the bits set in the child with the half bit of the child block's position in the ring. With the exception of fixing up the parent item bits after descent by walking the path. This mistake caused the parent item half bits to be zero and prevented migrating of blocks from the old half of the ring. Fix it by introducing a helper function to calculate the parent ref item bits and consistently using it. Signed-off-by: Zach Brown <zab@versity.com>	2017-09-20 10:26:40 -07:00
Zach Brown	42b33d616e	scoutfs: fix btree bit iteration store_pos_bits() was trying to iterate over bits that were different between the existing bits set in the item and the new bits that will be set. It used a too clever for_each helper that tried to only iterate as many times as there were bits. But it messed up and only used ffs to find the next bit for the first iteration. From then on it would iterate over bits that weren't different. This would cause the counts to be changed when the bits didn't change and end up being wildly wrong. Fix this by using a much clearer loop. It still breaks out when there are no more different bits and we're only using a few low bits so the number of iterations is tiny. Signed-off-by: Zach Brown <zab@versity.com>	2017-09-20 10:26:40 -07:00
Zach Brown	0b15cfe7f8	scoutfs: split on btree deletion We were getting asserts during deletion that insertion while updating a parent item didn't have enough room in the block. Our btree has variable length keys. During a merge it's possible that the items moved between the blocks can result in the final key of a block changing from a small key to a large key. To update the parent ref the parent block must have as much free space as the difference in the key sizes. We ensure free space in parents during descent by trying to split the block. Deletion wasn't doing that, it was only trying to merge blocks. We need to try to split as well as merge during deletion. And we have to update the merge threshold so that we don't just split the resulting block again if it doesn't have the min free space for a new parent item. Signed-off-by: Zach Brown <zab@versity.com>	2017-09-19 21:19:25 -07:00
Zach Brown	1012ee5e8f	scoutfs: use block mapping items Move to static mapping items instead of unbounded extents. We get more predictable data structures and simpler code but still get reasonably dense metadata. We no longer need all the extent code needed to split and merge extents, test for overlaps, and all that. The functions that use the mappings (get_block, fiemap, truncate) now have a pattern where they decode the mapping item into an allocated native representation, do their work, and encode the result back into the dense item. We do have to grow the largest possible item value to fit the worst case encoding expansion of random block numbers. The local allocators are no longer two extents but are instead simple bitmaps: one for full segments and one for individual blocks. There are helper functions to free and allocate segments and blocks, with careful coordination of, for example, freeing a segment once all of its constituent blocks are free. _fiemap is refactored a bit to make it more clear what's going on. There's one function that either merges the next bit with the currently building extent or fills the current and starts recording from a non-mergable additional block. The old loop worked this way but was implemented with a single squirrely iteration over the extents. This wasn't feasible now that we're also iterating over blocks inside the mapping items. It's a lot clearer to call out to merge or fill the fiemap entry. The dirty item reservation counts for using the mappings is reduced significantly because each modification no longer has to assume that it might merge with two adjacent contiguous neighbours. Signed-off-by: Zach Brown <zab@versity.com>	2017-09-19 11:25:38 -07:00
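
The last entry above (1012ee5e8f) describes local allocators built from two bitmaps, one for full segments and one for individual blocks, where a segment is freed once all of its constituent blocks are free. A minimal sketch of that coordination follows. It is an illustration only: the struct, the function names, and the sizes (4 segments of 8 blocks) are hypothetical and are not taken from the scoutfs source.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sizes for illustration; not scoutfs's real constants. */
#define BLOCKS_PER_SEG 8
#define NR_SEGS        4
#define ALL_BLOCKS ((uint8_t)((1u << BLOCKS_PER_SEG) - 1))
#define ALL_SEGS   ((uint8_t)((1u << NR_SEGS) - 1))

/* Hypothetical allocator state: two bitmaps, as the commit message describes. */
struct local_alloc {
	uint8_t seg_free;          /* bit s set: segment s is entirely free */
	uint8_t blk_free[NR_SEGS]; /* bit b set: block b of a broken-up segment is free */
};

void alloc_init(struct local_alloc *la)
{
	la->seg_free = ALL_SEGS;
	for (int s = 0; s < NR_SEGS; s++)
		la->blk_free[s] = 0;
}

/* Return the index of the lowest set bit, or -1 if no bit is set. */
int lowest_bit(uint8_t bits)
{
	for (int i = 0; i < 8; i++)
		if (bits & (1u << i))
			return i;
	return -1;
}

/* Allocate a whole segment; returns its index or -1 if none are free. */
int alloc_segment(struct local_alloc *la)
{
	int s = lowest_bit(la->seg_free);

	if (s >= 0)
		la->seg_free &= (uint8_t)~(1u << s);
	return s;
}

/* Allocate one block, breaking up a free segment only when needed. */
int alloc_block(struct local_alloc *la)
{
	int s, b;

	for (s = 0; s < NR_SEGS; s++)
		if (la->blk_free[s])
			break;
	if (s == NR_SEGS) {
		s = alloc_segment(la);
		if (s < 0)
			return -1;
		la->blk_free[s] = ALL_BLOCKS;
	}
	b = lowest_bit(la->blk_free[s]);
	la->blk_free[s] &= (uint8_t)~(1u << b);
	return s * BLOCKS_PER_SEG + b;
}

/* Free one block; once every block in the segment is free, free the segment. */
void free_block(struct local_alloc *la, int blkno)
{
	int s = blkno / BLOCKS_PER_SEG;
	int b = blkno % BLOCKS_PER_SEG;

	assert(blkno >= 0 && s < NR_SEGS);
	la->blk_free[s] |= (uint8_t)(1u << b);
	if (la->blk_free[s] == ALL_BLOCKS) {
		la->blk_free[s] = 0;
		la->seg_free |= (uint8_t)(1u << s);
	}
}
```

In this sketch a segment is tracked in exactly one bitmap at a time: either its bit is set in `seg_free`, or it has been broken up and its free blocks are tracked in `blk_free`. Freeing the final outstanding block moves the segment back to `seg_free`, so a later `alloc_segment()` can hand it out whole again.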


481 Commits