Also clean up these traces a bit and make a lock_info trace class
which we can expand in a future patch.
Signed-off-by: Mark Fasheh <mfasheh@versity.com>
Most paths correctly calculated the bits to set in a parent item by
combining the bits set in the child with the half bit of the child
block's position in the ring.
The exception was the code that fixed up the parent item bits after
descent by walking the path. This mistake caused the parent item half
bits to be zero and prevented migration of blocks from the old half of
the ring.
Fix it by introducing a helper function to calculate the parent ref item
bits and consistently using it.
Signed-off-by: Zach Brown <zab@versity.com>
store_pos_bits() was trying to iterate over bits that were different
between the existing bits set in the item and the new bits that will be
set. It used an overly clever for_each helper that tried to only iterate
as many times as there were bits. But it messed up and only used ffs to
find the next bit on the first iteration. From then on it would iterate
over bits that weren't different. This caused the counts to be changed
when the bits hadn't changed, leaving them wildly wrong.
Fix this by using a much clearer loop. It still breaks out when there
are no more different bits and we're only using a few low bits so the
number of iterations is tiny.
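For reference, a minimal user-space style sketch of the clearer loop
shape, with hypothetical names standing in for the real item accounting:

#include <strings.h>    /* ffs() */

/*
 * Walk only the bits that differ between the old and new masks, adjusting
 * a per-bit count up or down, and stop as soon as no different bits remain.
 */
static void adjust_counts(unsigned int old_bits, unsigned int new_bits,
                          int *counts)
{
        unsigned int diff = old_bits ^ new_bits;
        int bit;

        while (diff) {
                bit = ffs(diff) - 1;
                if (new_bits & (1U << bit))
                        counts[bit]++;
                else
                        counts[bit]--;
                diff &= ~(1U << bit);
        }
}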
Signed-off-by: Zach Brown <zab@versity.com>
We were hitting asserts during deletion because the insertion that
updates a parent item didn't have enough room in the block.
Our btree has variable length keys. During a merge it's possible that
the items moved between the blocks can result in the final key of a
block changing from a small key to a large key. To update the parent
ref the parent block must have as much free space as the difference in
the key sizes.
We ensure free space in parents during descent by trying to split the
block. Deletion wasn't doing that, it was only trying to merge blocks.
We need to try to split as well as merge during deletion.
And we have to update the merge threshold so that we don't just split
the resulting block again if it doesn't have the min free space for a
new parent item.
Signed-off-by: Zach Brown <zab@versity.com>
Move to static mapping items instead of unbounded extents.
We get more predictable data structures and simpler code but still get
reasonably dense metadata.
We no longer need all the extent code needed to split and merge extents,
test for overlaps, and all that. The functions that use the mappings
(get_block, fiemap, truncate) now have a pattern where they decode the
mapping item into an allocated native representation, do their work, and
encode the result back into the dense item.
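As a rough sketch of that decode/modify/encode pattern (the mapping
structure and the decode/encode helpers here are illustrative
assumptions, not the real interfaces):

#include <linux/fs.h>
#include <linux/slab.h>

/* hypothetical native representation of one dense mapping item */
struct native_mapping {
        u64 first_iblock;
        u64 blknos[64];
};

/* assumed helpers that unpack and repack the dense item value */
int decode_mapping_item(struct inode *inode, u64 iblock,
                        struct native_mapping *map);
int encode_mapping_item(struct inode *inode, struct native_mapping *map);

static int set_mapped_block(struct inode *inode, u64 iblock, u64 blkno)
{
        struct native_mapping *map;
        int ret;

        map = kmalloc(sizeof(*map), GFP_NOFS);
        if (!map)
                return -ENOMEM;

        /* decode the dense item value into the allocated native form */
        ret = decode_mapping_item(inode, iblock, map);
        if (ret < 0)
                goto out;

        /* do the work; iblock is assumed to fall inside this item's range */
        map->blknos[iblock - map->first_iblock] = blkno;

        /* encode the result back into the dense item */
        ret = encode_mapping_item(inode, map);
out:
        kfree(map);
        return ret;
}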
We do have to grow the largest possible item value to fit the worst case
encoding expansion of random block numbers.
The local allocators are no longer two extents but are instead simple
bitmaps: one for full segments and one for individual blocks. There are
helper functions to free and allocate segments and blocks, with careful
coordination of, for example, freeing a segment once all of its
constituent blocks are free.
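A hedged sketch of that coordination on the free path, using the
kernel's bitmap helpers (the structure, names, and segment size are
assumptions):

#include <linux/bitmap.h>
#include <linux/bitops.h>

#define BLOCKS_PER_SEGMENT 64UL         /* assumed blocks per segment */

struct local_alloc {
        unsigned long *segment_bitmap;  /* bit set: whole segment is free */
        unsigned long *block_bitmap;    /* bit set: individual block is free */
};

/* the caller is assumed to hold the allocator's lock */
static void local_free_block(struct local_alloc *la, unsigned long blkno)
{
        unsigned long segno = blkno / BLOCKS_PER_SEGMENT;
        unsigned long first = segno * BLOCKS_PER_SEGMENT;

        set_bit(blkno, la->block_bitmap);

        /* once every block in the segment is free again, hand the whole
         * segment back to the segment bitmap and drop its per-block bits */
        if (find_next_zero_bit(la->block_bitmap, first + BLOCKS_PER_SEGMENT,
                               first) >= first + BLOCKS_PER_SEGMENT) {
                bitmap_clear(la->block_bitmap, first, BLOCKS_PER_SEGMENT);
                set_bit(segno, la->segment_bitmap);
        }
}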
_fiemap is refactored a bit to make it clearer what's going on.
There's one function that either merges the next block with the currently
building extent or fills the current one and starts recording from a
non-mergeable additional block. The old loop worked this way but was
implemented with a single squirrely iteration over the extents. This
wasn't feasible now that we're also iterating over blocks inside the
mapping items. It's a lot clearer to call out to merge or fill the
fiemap entry.
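A sketch of that merge-or-fill helper (the pending extent struct is
illustrative; fiemap_fill_next_extent() is the kernel interface the fill
would use):

#include <linux/fiemap.h>

/* the fiemap extent we're currently building up */
struct pending_fiemap {
        u64 logical;
        u64 phys;
        u64 len;
        u32 flags;
};

/*
 * Either extend the extent we're building with a contiguous block, or emit
 * the built extent and start recording from the non-mergeable block.
 */
static int merge_or_fill(struct fiemap_extent_info *fieinfo,
                         struct pending_fiemap *cur, u64 logical, u64 phys,
                         u64 len, u32 flags)
{
        int ret = 0;

        if (cur->len && flags == cur->flags &&
            logical == cur->logical + cur->len &&
            phys == cur->phys + cur->len) {
                cur->len += len;
                return 0;
        }

        if (cur->len)
                ret = fiemap_fill_next_extent(fieinfo, cur->logical,
                                              cur->phys, cur->len,
                                              cur->flags);

        cur->logical = logical;
        cur->phys = phys;
        cur->len = len;
        cur->flags = flags;

        return ret;
}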
The dirty item reservation counts for using the mappings are reduced
significantly because each modification no longer has to assume that it
might merge with two adjacent contiguous neighbours.
Signed-off-by: Zach Brown <zab@versity.com>
Add a function for updating the value of a dirty item. Callers can use
this to make changes without having to worry about failure.
Signed-off-by: Zach Brown <zab@versity.com>
The item cache maintains a count of the number of dirty items, keys, and
values. It updates the counts as it dirties and cleans items. There
are callers who want to modify the accounting directly instead of having
the accounting updated as a side effect of cleaning and re-dirtying the
item.
Signed-off-by: Zach Brown <zab@versity.com>
We were seeing warnings in destroy_workqueue() which meant that work was
queued on the server workqueue after it was drained and before it was
finally destroyed.
The only work that wasn't properly waited for was the commit work. It
looks like it'd be idle because the server receive threads all wait for
their request processing work to finish. But the way the commit work is
batched means that a request can have its commit processed by executing
commit work while leaving the work queued for another run.
Fix this by specifically waiting for the commit work to finish after the
server work has waited for all the recv and compaction work to finish.
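A minimal sketch of that shutdown ordering, with assumed struct and
field names; flush_work() and destroy_workqueue() are the kernel
primitives involved:

#include <linux/workqueue.h>

/* assumed server state holding the workqueue and the batched commit work */
struct server_info {
        struct workqueue_struct *wq;
        struct work_struct commit_work;
};

static void shutdown_server_workqueue(struct server_info *server)
{
        /* by this point all recv and compaction work has been waited for */

        /* the batched commit work can have processed a request's commit and
         * still be queued for another run, so wait for it explicitly */
        flush_work(&server->commit_work);

        destroy_workqueue(server->wq);
}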
I wasn't able to reliably trigger the assertion in repeated xfstests
runs. This change also survived many runs; let's see if it stops the
destroy_workqueue() assertion from triggering in the future.
Signed-off-by: Zach Brown <zab@versity.com>
The item count estimate functions didn't obviously differentiate between
adding to a count and resetting it. Most callers initialized the count
struct to 0 on the stack, incremented their estimate once, and passed it
in. The problem is that those same functions that increment once in
callers are also used in other estimates to build counts based on
multiple operations.
This tripped up the data truncate path. It looped and kept incrementing
its count while truncating a file with lots of extents until the count
got so large that it didn't fit in a segment by itself and blocked
forever.
This cleans up the item count code so that it's much harder to get
wrong. We differentiate between the SIC_*() high level count estimates
that are meant to be passed in to _hold_trans(), and the internal
__count_*() functions which are used to add up the item counts that make
up an aggregate operation.
With this fix the only way to use the count in extent truncation is to
correctly reset it to the item count for each transaction.
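The usage pattern the cleanup enforces looks roughly like this sketch;
the SIC macro, hold/release calls, and truncate helper are stand-ins
based on the description above, not the exact interfaces:

static int truncate_all_extents(struct super_block *sb, struct inode *inode)
{
        bool done = false;
        int ret = 0;

        while (!done) {
                /* each pass builds a fresh SIC_*() estimate for exactly one
                 * transaction instead of incrementing a count that lives
                 * across loop iterations */
                ret = scoutfs_hold_trans(sb, SIC_TRUNC_EXTENTS());
                if (ret)
                        break;

                ret = truncate_some_extents(sb, inode, &done);
                scoutfs_release_trans(sb);
                if (ret)
                        break;
        }

        return ret;
}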
Signed-off-by: Zach Brown <zab@versity.com>
We had a bug where a caller was slowly increasing their item count for
every transaction they attempted in a loop. Eventually the item count
grew to be too large to fit in a segment and they slept indefinitely.
Let's warn on invalid and impossibly large item counts as we enter
transactions.
Signed-off-by: Zach Brown <zab@versity.com>
This was used by compaction to wake local holders who were waiting for
compaction to free up level 0 segments for them to enter the
transaction. Throttling level 0 segment writes works differently now
and doesn't involve blocking trans holders.
Signed-off-by: Zach Brown <zab@versity.com>
All this does is uninitialize the dlmglue debug infrastructure.
However, we already do that in ocfs2_dlm_shutdown(), resulting in some
double frees on unmount.
Signed-off-by: Mark Fasheh <mfasheh@versity.com>
xfstests generic/028 was crashing dereferencing NULL locks. It'd hit
either rename trying to refresh an inode with a NULL lock or lookup
trying to pass a NULL lock's end to item lookup.
The addition of the lock LRU fixed a bug in lock_name_keys() where it
wouldn't drop a lock when _cluster_lock() returned an error. But it
always returned 0 instead of returning the error. Returning 0 without
setting the lock caused the callers to deref their NULL locks.
We also forcefully NULL the lock at the start of the function. It was
lucky that callers had already NULLed their locks. If they hadn't they
would have been following random on-stack memory and it might have been
harder to debug.
Signed-off-by: Zach Brown <zab@versity.com>
We're missing initialization of this field. It should never cause a
problem today because we always do a list add immediately afterwards
but let's be extra careful here and initialize it just in case. We
also add a sanity check in ocfs2_add_lockres_tracking() that the
lockres hasn't already been put on the debug list.
Signed-off-by: Mark Fasheh <mfasheh@versity.com>
The dlmglue lockres gets put on a debugging list at initialization time,
and taken off the list at uninit time. We were missing the uninit
portion of this cycle, causing some list debugging warnings.
Call ocfs2_lock_res_free() in free_scoutfs_lock(). In addition, we had
a raw kfree() of a scoutfs lock in find_alloc_scoutfs_lock() which also
needed to be replaced.
Signed-off-by: Mark Fasheh <mfasheh@versity.com>
We saw inode index queries spinning. They were finding no cached
entries in their locked region but the next key in the segments was in
the region. This can happen if an item has been deleted in the current
transaction. The query won't walk up into the new dirty seq but it
will try to walk the old seq. The item will still be in the segments
but won't be visible to item_next because it's marked deleted. The
query will spin finding the next stale key to read from and finding it
missing in the cache.
This is fixed by taking the current coherent cache at its word. When it
tells us there are no entries we advance the manifest check key past the
locked region. In this case it'll skip past the cached deletion item and
look for the next key in the segments.
Signed-off-by: Zach Brown <zab@versity.com>
The move to dlmglue necessitated removing the old LRU code. That's
fine; it didn't work anyway.
We can't drop locks in the shrinker directly, so instead we have the
shrinker put them on a workqueue where they are dropped.
The rules for the LRU are simple. Locks get a users count. Any process
that holds the lock or is in the process of acquiring the lock
increments this count. When unlock is called, the count is decremented.
We can use the value of this count to manage the LRU - scoutfs_unlock
puts locks on the LRU when the count reaches zero, lock_name_keys()
always takes them off.
If the lock is selected for reclaim, callers wanting to use the lock
will need to wait. We achieve this with a pair of flags.
SCOUTFS_LOCK_RECLAIM is used to indicate that the lock is now queued for
reclaim. Once the lock is ready to be destroyed, we set the
SCOUTFS_LOCK_DROPPED flag, telling callers to put the lock and retry
their rbtree search.
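A hedged sketch of the users-count rule; the structures and field names
are assumptions, only the list and spinlock primitives are the kernel's:

#include <linux/list.h>
#include <linux/spinlock.h>

struct lock_info {
        spinlock_t lock;
        struct list_head lru_list;
};

struct scoutfs_lock {
        unsigned int users;
        struct list_head lru_entry;
};

/* called from scoutfs_unlock(): the last user puts the lock on the LRU,
 * lock_name_keys() always takes it back off when a new user arrives */
static void put_lock_user(struct lock_info *linfo, struct scoutfs_lock *lck)
{
        spin_lock(&linfo->lock);
        if (--lck->users == 0)
                list_add_tail(&lck->lru_entry, &linfo->lru_list);
        spin_unlock(&linfo->lock);
}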
Signed-off-by: Mark Fasheh <mfasheh@versity.com>
With the dlmglue transition over, we can finally remove some of the no
longer used portions of lock.[ch].
Signed-off-by: Mark Fasheh <mfasheh@versity.com>
The staging ioctl is just a thin wrapper around writing. If we allowed
partial-block staging then the write would zero a newly allocated block
and only stage in the partial region of the block, leaving zeros in the
file that didn't exist before.
We prevent staging when the starting offset isn't block aligned. We
prevent staging when the final offset isn't block aligned unless it
matches the size because the stage ends in the final partial block of
the file.
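The offset rule boils down to a check like this sketch, where
SCOUTFS_BLOCK_SIZE (assumed to be a power of two) and the argument names
are assumptions:

#include <linux/types.h>

/* return true if the staging write's offsets are acceptable */
static bool stage_aligned(loff_t pos, size_t count, loff_t i_size)
{
        loff_t end = pos + count;

        /* the starting offset must be block aligned */
        if (pos & (SCOUTFS_BLOCK_SIZE - 1))
                return false;

        /* the final offset must be block aligned unless it matches the file
         * size, i.e. the stage ends in the final partial block of the file */
        if ((end & (SCOUTFS_BLOCK_SIZE - 1)) && end != i_size)
                return false;

        return true;
}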
This is verified by an xfstest (scoutfs/003) that is in flight.
Signed-off-by: Zach Brown <zab@versity.com>
We were seeing __block_write_begin spin when staging writes were called
after a read of an offline region saw an error. It turns out that
the way __block_write_begin iterates through buffer heads on a page will
livelock if b_size is 0.
Our get_block was clearing b_blocknr and b_size before doing anything.
It'd set them when it allocated blocks or found existing mapped
blocks. But it'd leave them 0 on an error and trigger this hang.
So we'll back off and only do the same things to the result bh that
ext2/3 do; presumably that's what's actually supported. We only set
mapped, set or clear new, and set b_size to less than the input b_size.
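A sketch of how constrained the result handling ends up; the mapping
lookup is a placeholder, while map_bh() and the buffer flag helpers are
the kernel's:

#include <linux/buffer_head.h>

/* assumed helper: returns < 0 on error, 0 for a hole, > 0 when mapped,
 * filling blkno/len and whether the blocks were newly allocated */
int lookup_or_alloc_mapping(struct inode *inode, sector_t iblock, int create,
                            u64 *blkno, u64 *len, bool *allocated);

static int sketch_get_block(struct inode *inode, sector_t iblock,
                            struct buffer_head *bh_result, int create)
{
        bool allocated;
        u64 blkno;
        u64 len;
        int ret;

        ret = lookup_or_alloc_mapping(inode, iblock, create,
                                      &blkno, &len, &allocated);
        if (ret <= 0)
                return ret;     /* leave bh_result untouched on error or hole */

        map_bh(bh_result, inode->i_sb, blkno);
        if (allocated)
                set_buffer_new(bh_result);
        else
                clear_buffer_new(bh_result);

        /* only ever shrink b_size, never zero it or grow it */
        bh_result->b_size = min_t(u64, bh_result->b_size,
                                  len << inode->i_blkbits);
        return 0;
}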
While we're at it we remove a totally bogus extent flag check that's
done before seeing if the next extent we found even intersects with the
logical block that we're searching for. The extra test is performed
again correctly inside the check for the extents overlapping. It is an
artifact from the days when the "extents" were a single block and didn't
need to check for overlaps.
Signed-off-by: Zach Brown <zab@versity.com>
scoutfs_item_next() could livelock given the right key and segment key
boundaries. This was easiest to trigger with an fio command that wrote
a lot of data:
fio --filesize=100m --nrfiles=25 --name=100m --numjobs=1 \
--iodepth=1 --ioengine=sync --fallocate=0 \
--rw=write --openfiles=256 \
--directory=$TEST_DIR
There were two problems.
First, if it found a cached region that didn't contain a next item it
would try to read the *end* of the existing cached region instead of
trying to populate more items by reading from the key past the existing
cached region. This is fixed by incrementing the key to read from after
setting it to the end of the cached region.
Second, it got totally confused by non-merged but adjacent cached
regions. It would find a cached region that contains the search key and
try to read from the key after that region, but that key could also be
cached and just not merged with its previous region. This is fixed by
(duh) having an allocated pos key that we set as we walk through cached
regions. It used to always try and read from the search key which was
bonkers.
With these fixes fio now completes.
Signed-off-by: Zach Brown <zab@versity.com>
The key printing functions only output the key material that's described
by the format. We have some callers that need to increment or decrement
keys, so they expand them to full size keys. This expansion, and the
extra high-precision but low-significance bytes it adds, was hidden from
the traces.
This adds a helper that prints the key material with the format and then
appends an encoding of the trailing bytes.
The key printer was a huge mess of cases and ifs that made it hard to
integrate a sane helper. We also take the opportunity to break it up
into zone|type key printer functions. The isolation makes it much
clearer to see what's going on.
Signed-off-by: Zach Brown <zab@versity.com>
We can't block on a lock while holding the transaction open because
that'd stop lock downconversion from syncing to write out items while it
is converting from EX. Add a warning if we try to acquire a blocking
lock while holding a transaction.
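The check itself can be as small as this sketch suggests; the
trans-holding test is an assumed helper, not an existing call:

/* warn if we're about to block on a cluster lock with a transaction held,
 * since downconversion needs to sync dirty items to get out of EX */
static void warn_blocking_lock_in_trans(struct super_block *sb, bool trylock)
{
        WARN_ON_ONCE(!trylock && scoutfs_trans_held(sb));  /* assumed helper */
}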
Signed-off-by: Zach Brown <zab@versity.com>
Initially the index walking ioctl only ever output a single entry per
iteration. So the number of entries to return and the next entry
pointer to copy to userspace were maintained in the post-increment of
the for loop.
When we added locking of the index item results we made it possible to
not copy any entries in a loop iteration. When that happened the nr and
pointer would be incremented without initializing the entry. The ioctl
caller would see a garbage entry in the results.
This was visible in scoutfs/002 test results on a volume that had an
interesting file population after having run through all the other
scoutfs tests. The uninitialized entries would show up as garbage in
the size index portion of the test.
Signed-off-by: Zach Brown <zab@versity.com>
If item iteration finds a hole in the cache it tries to read items.
After the items are read it can look at the cached region and return
items or -ENOENT. We recently added an end key to limit how far we can
read and cache items.
The end key addition correctly limited the cache read to the lock end
value. It could never read item cache ranges beyond that. This means
that it can't iterate past the end value and should return -ENOENT if it
gets past end. But the code forgot to do that; it only checked for
iteration past last before returning -ENOENT. It spins continually: it
finds a hole past end but inside last, tries to read items but limits
them to end, then finds the same hole again.
Triggering this requires a lock end that's nearer than the last
iteration key. That's hard to do because most of our item reads are
covered by inode group locks which extend well past iteration inside a
given inode. Inode index items can easily trigger this if there are no
items. I tripped over it when walking empty indexes (data_seq or
online_blocks with no regular files).
Signed-off-by: Zach Brown <zab@versity.com>
The client connection loop was a bit of a mess. It only slept between
retries in one particular case. Other failures to connect would spin
forever and livelock.
The fixed loop now has a much more orderly reconnect procedure. Each
connecting sender always tries once. Then retry attempts back off
exponentially, settling at a nice long timeout. After long enough it'll
return errors.
This fixes livelocks in the xfstests that mount and unmount around
dm-flakey config. generic/{034,039,040} would easily livelock before
this fix.
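The reconnect pacing could look roughly like this sketch; the constants
and helpers are assumptions rather than the client's real code:

#include <linux/delay.h>

#define MIN_CONNECT_DELAY_MS    10UL
#define MAX_CONNECT_DELAY_MS    (60UL * 1000)
#define MAX_CONNECT_TRIES       100

struct client_info;
int try_connect(struct client_info *client);    /* assumed helper */

static int connect_with_backoff(struct client_info *client)
{
        unsigned long delay_ms = 0;
        int tries = 0;
        int ret;

        for (;;) {
                ret = try_connect(client);      /* always try at least once */
                if (ret == 0 || ++tries > MAX_CONNECT_TRIES)
                        return ret;             /* eventually return errors */

                /* back off exponentially, settling at a long ceiling */
                if (delay_ms)
                        delay_ms = min_t(unsigned long, delay_ms * 2,
                                         MAX_CONNECT_DELAY_MS);
                else
                        delay_ms = MIN_CONNECT_DELAY_MS;
                msleep(delay_ms);
        }
}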
Signed-off-by: Zach Brown <zab@versity.com>
stackglue is trying to call dlm posix symbols that don't exist in some
RHEL dlm kernels. We're not using this functionality yet so let's just
tear it out for now.
Signed-off-by: Zach Brown <zab@versity.com>
There's a fair amount of lock.c that's dead code now that we're using
dlmglue. Some of the dead code is seen as unused and is throwing
warnings. This silences the warnings by removing the dead code.
Signed-off-by: Zach Brown <zab@versity.com>
Some dlmglue functions are unused by the current ifdefery. They're
throwing warnings that obscure other warnings in the build. This
broadens the ifdef coverage so that we don't get warnings. The unused
code will either be promoted to an interface or removed as dlmglue
evolves into a reusable component.
Signed-off-by: Zach Brown <zab@versity.com>
We lock multiple inodes by order of their inode number. This fixes
the directory entry paths that hold parent dir and target inode locks.
Link and unlink are easy because they just acquire the existing parent
dir and target inode locks.
Lookup is a little squirrely because we don't want to try and order
the parent dir lock with locks down in iget. It turns out that it's
safe to drop the dir lock before calling iget as long as iget handles
racing the inode cache instantiation with inode deletion.
Creation is the remaining pattern and it's a little weird because we
want to lock the newly created inode before we create it and the items
that store it. We add a function that correctly orders the locks,
transaction, and inode cache instantiation.
Signed-off-by: Zach Brown <zab@versity.com>
It looked like it was easier to have a helper dirty and delete items.
But now that we also have to pass in locks the interface gets messy
enough that it's easier to have the caller take care of it.
Signed-off-by: Zach Brown <zab@versity.com>
Previously we had lots of inode creation callers that used a function to
create the dirent items and we had unlink remove entries by hand.
Rename is different because it wants to remove and add multiple links as
it does its work, including recreating links that it has deleted.
We rework add_entry_item() so that it gets the specific fields it needs
instead of getting them from the vfs structs. This makes it clear that
callers are responsible for the source of the fields. Specifically we
need to be able to add entries during failed rename cleanup without
allocating a new readdir pos from the parent dir.
With callers now responsible for the inputs to add_entry_items() we move
some of its code out into all callers: checking name length, dirtying
the parent dir inode, and allocating a readdir pos from the parent.
We then refactor most of _unlink() into a del_entry_items() to match
addition. This removes the last user of scoutfs_item_delete_many() and
it will be removed in a future commit.
With the entry item helpers taking specific fields all the helpers they
use also need to use specific fields instead of the vfs structs.
To make rename cluster safe we need to get cluster locks for all the
inodes that we work with. We also have to check that the locally cached
vfs input is still valid after acquiring the locks. We only check the
basic structural correctness of the args: that parent dirs don't violate
ancestor rules by creating loops and that the entries assumed by the
rename arguments still exist, or still don't, as the arguments expect.
Signed-off-by: Zach Brown <zab@versity.com>
Add a lock name that has a global scope in a given lockspace. It's not
associated with any file system items. We add a scope to the lock name
to indicate if a lock is global or not and set that in other lock naming
intitialization. We permit lock allocation to accept null start and end
keys.
Signed-off-by: Zach Brown <zab@versity.com>
Add a function that can lock multiple inodes in order of their inode
numbers. It handles nulls and duplicate inodes.
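For two inodes, the ordering and the null/duplicate handling could look
like this sketch, with lock_inode()/unlock_inode() standing in for the
real calls:

#include <linux/fs.h>
#include <linux/kernel.h>

struct scoutfs_lock;
int lock_inode(struct super_block *sb, int mode, struct inode *inode,
               struct scoutfs_lock **ret_lock);
void unlock_inode(struct super_block *sb, int mode, struct scoutfs_lock *lock);

static int lock_inodes_in_order(struct super_block *sb, int mode,
                                struct inode *a, struct inode *b,
                                struct scoutfs_lock **a_lock,
                                struct scoutfs_lock **b_lock)
{
        int ret = 0;

        *a_lock = NULL;
        *b_lock = NULL;

        /* lock in inode number order, keeping locks matched to callers' args */
        if (a && b && a->i_ino > b->i_ino) {
                swap(a, b);
                swap(a_lock, b_lock);
        }
        /* a duplicate only needs one lock; its second pointer stays null */
        if (a == b)
                b = NULL;

        if (a) {
                ret = lock_inode(sb, mode, a, a_lock);
                if (ret)
                        return ret;
        }

        if (b) {
                ret = lock_inode(sb, mode, b, b_lock);
                if (ret && *a_lock) {
                        unlock_inode(sb, mode, *a_lock);
                        *a_lock = NULL;
                }
        }

        return ret;
}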
Signed-off-by: Zach Brown <zab@versity.com>
Without this we return -ESPIPE when a process tries to seek on a regular
file.
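Wiring the generic helper into the file operations is all that's
needed; a sketch of the kind of change, with the fops name here being an
assumption:

#include <linux/fs.h>

const struct file_operations scoutfs_file_fops = {
        .llseek         = generic_file_llseek,
        /* the existing read/write/mmap/etc. entries are unchanged */
};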
Signed-off-by: Mark Fasheh <mfasheh@versity.com>
[zab: adapted to new lock call]
Signed-off-by: Zach Brown <zab@zabbo.net>
We need to lock and refresh the VFS inode before it checks permissions in
system calls, otherwise we risk checking against stale inode metadata.
Signed-off-by: Mark Fasheh <mfasheh@versity.com>
[zab: adapted to newer lock call]
Signed-off-by: Zach Brown <zab@versity.com>
With trylock implemented we can add locking in readpage. After that it's
pretty easy to implement our own read/write functions which at this
point are more or less wrapping the kernel helpers in the correct
cluster locking.
Data invalidation is a bit interesting. If the lock we are invalidating
is an inode group lock, we use the lock boundaries to incrementally
search our inode cache. When an inode struct is found, we sync and
(optionally) truncate pages.
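A sketch of that incremental search; ilookup() and the page cache calls
are the kernel's, while mapping the lock boundaries to an inode number
range is assumed to have been done by the caller:

#include <linux/fs.h>
#include <linux/mm.h>

static int sync_and_drop_inode_range(struct super_block *sb, u64 first_ino,
                                     u64 last_ino, bool drop_pages)
{
        struct inode *inode;
        int ret = 0;
        u64 ino;

        for (ino = first_ino; ino <= last_ino; ino++) {
                /* only touch inodes that are actually in the inode cache */
                inode = ilookup(sb, ino);
                if (!inode)
                        continue;

                ret = filemap_write_and_wait(inode->i_mapping);
                if (ret == 0 && drop_pages)
                        truncate_inode_pages(inode->i_mapping, 0);
                iput(inode);
                if (ret)
                        break;
        }

        return ret;
}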
Signed-off-by: Mark Fasheh <mfasheh@versity.com>
[zab: adapted to newer lock call, fixed some error handling]
Signed-off-by: Zach Brown <zab@versity.com>
Now that we have the inode refreshing flags let's add them to the
callers that want to have a current inode after they have their lock.
Callers locking newly created items use the new inode flag to reset the
refresh gen.
A few inode tests are moved down to after locking so that they test
the current refreshed inode.
Signed-off-by: Zach Brown <zab@versity.com>