Commit Graph

858 Commits

Author SHA1 Message Date
Zach Brown
ca6b7f1e6d scoutfs: lock invalidate only syncs dirty
Lock invalidation has to make sure that changes are visible to future
readers.  It was syncing whenever the current transaction was dirty.
This was never optimal, but it wasn't catastrophic when concurrent
invalidation work could all block on one sync in progress.

With the move to a single invalidation worker serially invalidating
locks it became unacceptable.  Invalidation happening in the presence of
writers would constantly sync the current transaction while very old
unused write locks were invalidated.  Their changes had long since been
committed in previous transactions.

We add a lock field to remember the sequence of the transaction that
could have been dirtied under the lock.  If that transaction has already
been committed by the time we invalidate the lock it doesn't have to
sync.
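
As a rough sketch of the idea, with made-up struct and field names
rather than the real scoutfs code:

    /* illustrative sketch only; names are hypothetical */
    struct example_lock {
            u64 dirtied_trans_seq;  /* seq of the trans that may hold our dirty items */
    };

    static bool lock_inval_needs_sync(struct example_lock *lck, u64 committed_seq)
    {
            /* only sync if the dirtying transaction hasn't been committed yet */
            return lck->dirtied_trans_seq > committed_seq;
    }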

Signed-off-by: Zach Brown <zab@versity.com>
2020-08-26 14:39:12 -07:00
Zach Brown
55dde87bb1 scoutfs: fix lock invalidation work deadlock
The client lock network message processing callbacks were built to
simply perform the processing work for the message in the networking
work context that it was called in.  This particularly makes sense for
invalidation because it has to interact with other components that
require blocking contexts (syncing commits, invalidating inodes,
truncating pages, etc).

The problem is that these messages are per-lock.  With the right
workloads we can use all the capacity for executing work just in lock
invalidation work.  There is no more work execution available for other
network processing.  Critically, the blocked invalidation work is
waiting for the commit thread to get its network responses before
invalidation can make forward progress.  I was easily reproducing
deadlocks by leaving behind a lot of locks and then triggering a flood
of invalidation requests on behalf of shrinking due to memory pressure.

The fix is to put locks on lists and have a small fixed number of work
contexts process all the locks pending for each message type.  The
network callbacks don't block, they just put the lock on the list and
queue the work that will walk the lists.  Invalidation now blocks one
work context, not the number of incoming requests.
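
The shape of the change looks roughly like the following sketch, where
the struct, the inval_wq workqueue, and process_lock_invalidation() are
all hypothetical stand-ins for the real scoutfs code:

    /* per-message-type pending list drained by a single work context */
    struct lock_work_list {
            spinlock_t lock;
            struct list_head pending;
            struct work_struct work;
    };

    /* net callback: never blocks, just queues the lock and the worker */
    static void queue_lock_invalidation(struct lock_work_list *lwl,
                                        struct held_lock *hl)
    {
            spin_lock(&lwl->lock);
            if (list_empty(&hl->inval_entry))
                    list_add_tail(&hl->inval_entry, &lwl->pending);
            spin_unlock(&lwl->lock);
            queue_work(inval_wq, &lwl->work);
    }

    /* one blocking work context drains the list serially */
    static void invalidation_worker(struct work_struct *work)
    {
            struct lock_work_list *lwl = container_of(work, struct lock_work_list, work);
            struct held_lock *hl, *tmp;
            LIST_HEAD(list);

            spin_lock(&lwl->lock);
            list_splice_init(&lwl->pending, &list);
            spin_unlock(&lwl->lock);

            list_for_each_entry_safe(hl, tmp, &list, inval_entry) {
                    list_del_init(&hl->inval_entry);
                    process_lock_invalidation(hl);  /* can block: sync, truncate, ... */
            }
    }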

There were some wait conditions in work that used to use the lock workq.
Other paths that change those conditions now have to know to queue the
work specifically, not just wake tasks which included blocked work
executors.

The other subtle impact of the change is that we can no longer rely on
networking to shut down message processing work that was happening in
its callbacks.  We have to specifically stop our work queues in
_shutdown.

Signed-off-by: Zach Brown <zab@versity.com>
2020-08-26 14:39:12 -07:00
Zach Brown
f4db553c28 scoutfs: fix error unwinding in server advance_seq
While checking for lost server commit holds, I noticed that the
advance_seq request path had obviously incorrect unwinding after getting
an error.  Fix it up so that it always unlocks and applies its commit.

Signed-off-by: Zach Brown <zab@versity.com>
2020-08-26 14:39:12 -07:00
Zach Brown
4b9c02ba32 scoutfs: add committed_seq to statfs_more
Add the committed_seq to statfs_more, which gives the greatest seq that
has been committed.  This lets callers discover that a seq for a change
they made has been committed.

Signed-off-by: Zach Brown <zab@versity.com>
2020-08-26 14:39:12 -07:00
Zach Brown
6356440073 scoutfs: add error message for client commit error
We had a debugging WARN_ON that warns when a client has an error
committing their transaction.  Let's add a bit more detail and promote
it to a proper error.  These should not happen.

Signed-off-by: Zach Brown <zab@versity.com>
2020-08-26 14:39:12 -07:00
Zach Brown
9658412d09 scoutfs: add forest counters
Add a bunch of counters to track significant events in the forest.

Signed-off-by: Zach Brown <zab@versity.com>
2020-08-26 14:39:12 -07:00
Zach Brown
57c7caf348 scoutfs: fix forest dirty log tracking
The forest code is responsible for constructing a consistent fs image
out of the items spread across all the btrees written by mounts in the
system.

Usually readers walk a btree looking for log trees that they should
read.  As a mount modifies items in its dirty log tree, readers need to
be sure to check that in-memory dirty log tree even though it isn't
present in the btree that records persistent log trees.

The code did this by setting a flag to indicate that readers using a
lock should check the dirty log tree.  But the flag usage wasn't
properly locked, which left a race between readers and writers that
could leave future readers not knowing that they should check the dirty
log tree.  When we rarely hit that race we'd see item errors that made
no sense, like not being able to find an inode item to update after
having just created it in the current transaction.

To fix this, we clean up the tree tracking in the forest code.

We get rid of the static forest_root structs in the lock_private that
were used to track the two special-case roots that aren't found in log
tree items: the in-memory dirty log root and the final fs root.  All
roots are now dynamically allocated.  We use a flag in the root to
identify it as the dirty log root, and identify the fs root by its
rid/nr.  This results in a bunch of caller churn as we remove lpriv from
root identifying functions.

We get rid of the idea of the writer adding a static root to the list as
well as marking the log as needing to read the root.  Instead we make
all root management happen as we refresh the list.  The forest maintains
a commit sequence and writers set state in the lock to indicate that the
lock has dirty items in the log during this transaction.  Iteration then
compares the state set by the commit, writer, and the last refresh to
determine if a new refresh needs to happen.

Properly tracking the presence of dirty items lets us recognize when the
lock no longer has dirty items in the log and we can stop locking and
reading the dirty log and fall back to reading the committed stable
version.  The previous code didn't do that, it would lock and read the
dirty root forever.
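
Roughly, the refresh decision reduces to comparing sequence numbers and
state recorded by commits, writers, and the last refresh; a hypothetical
sketch, not the scoutfs code:

    static bool needs_refresh(struct lock_forest_state *st, u64 forest_commit_seq)
    {
            /* a writer dirtied items under this lock in the open transaction */
            if (st->dirtied_in_current_trans)
                    return true;
            /* a commit happened since we last rebuilt the root list */
            return st->last_refresh_seq < forest_commit_seq;
    }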

While we're in here, we fix the locking around setting bloom bits and
have it track the version of the log tree that was set so that we don't
have to clear set bits as the log version is rotated out by the server.

There was also a subtle bug where we could hit two stale errors for the
same root and return -EIO because the refresh we triggered itself
returned stale.  We rework the retrying logic to use a separate error
code to force refreshing so that we can't accidentally trigger EIO by
conflating reading stale blocks with forcing a refresh.

And finally, we no longer record that we need the dirty log tree in a
root if we have a lock that could never read.  It's a minor optimization
that doesn't change functional behaviour.

Signed-off-by: Zach Brown <zab@versity.com>
2020-08-26 14:39:12 -07:00
Zach Brown
f8bf1718a0 scoutfs: add a bunch of btree counters
Add some counters for the most basic btree events.

Signed-off-by: Zach Brown <zab@versity.com>
2020-08-26 14:39:12 -07:00
Zach Brown
c415cab1e9 scoutfs: use srch to track .srch. xattrs
Using strictly coherent btree items to map the hash of xattr names to
inode numbers proved the value of the functionality, but it was too
expensive.  We now have the more efficient srch infrastructure to use.

We change from the .indx. to the .srch. tag, and change the ioctl from
find_xattr to search_xattrs.  The idea is to communicate that these are
accelerated searches, not precise index lookups and are relatively
expensive.

Rather than maintaining btree items, xattr setting and deleting emits
srch entries which either track the xattr or combine with the previous
entry and remove it.  These are done under the lock that protects the
main xattr item, so we can remove the separate locking of the previous
index items.

The semantics of the search ioctl need to change a bit.  Because
searches are so expensive we now return a flag to indicate that the
search completed.  While we're there, we also allow a last_ino parameter
so that searches can be divided up and run in parallel.

Signed-off-by: Zach Brown <zab@versity.com>
2020-08-26 14:39:12 -07:00
Zach Brown
f8e1812288 scoutfs: add srch infrastructure
This introduces the srch mechanism that we'll use to accelerate finding
files based on the presence of a given named xattr.  This is an
optimized version of the initial prototype that was using locked btree
items for .indx. xattrs.

This is built around specific compressed data structures, having the
operation cost match the reality of orders of magnitude more writers
than readers, and adopting a relaxed locking model.  Combining all of
this, maintaining the xattrs no longer tanks creation rates while still
delivering excellent search latencies, given that searches are defined
as rare and relatively expensive.

The core data type is the srch entry which maps a hashed name to an
inode number.  Mounts can append entries to the end of unsorted log
files during their transaction.  The server tracks these files and
rotates them into a list of files as they get large enough.  Mounts have
compaction work that regularly asks the server for a set of files to
read and combine into a single sorted output file.  The server only
initiates compactions when it sees a number of files of roughly the same
size.  Searches then walk all the committed srch files, both log files
and sorted compacted files, looking for entries that associate an xattr
name with an inode number.
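
In the abstract, a srch entry is just a (hash, inode) pair appended to a
log, and compaction triggers once enough files land in the same size
class.  A hypothetical sketch of those shapes (on disk the fields would
be fixed-endian and packed):

    #include <stdint.h>

    struct srch_entry {
            uint64_t hash;  /* hash of the xattr name */
            uint64_t ino;   /* inode that set (or cleared) the xattr */
    };

    /* bucket files by power-of-two size; compact when enough share a bucket */
    static int srch_file_size_class(uint64_t bytes)
    {
            int c = 0;

            while (bytes >>= 1)
                    c++;
            return c;
    }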

Signed-off-by: Zach Brown <zab@versity.com>
2020-08-26 14:39:12 -07:00
Zach Brown
cca83b1758 scoutfs: rework get_fs_roots to get_roots
The get_fs_roots rpc and server interfaces were built around individual
roots.  Rebuild them around passing a struct so that we can add roots
without impacting all the current users.

Signed-off-by: Zach Brown <zab@versity.com>
2020-08-26 14:39:12 -07:00
Zach Brown
8c114ddb87 scoutfs: increase max btree item size
Now that we have larger blocks we can have a larger max item.  This was
increased to make room for the srch compaction items which store a good
number of srch files in their value.

Signed-off-by: Zach Brown <zab@versity.com>
2020-08-26 14:39:12 -07:00
Zach Brown
ab271f4682 scoutfs: report sm metadata blocks in statfs
The conversion of the super block metadata block counters to units of
large metadata blocks forgot to scale back to the small block size when
filling out the block count fields in the statfs rpc.  This resulted in
the free and total metadata counts being off by the ratio of large to
small block size (~16x with the current defaults).

Signed-off-by: Zach Brown <zab@versity.com>
2020-08-26 14:39:12 -07:00
Zach Brown
42e7fbb4f7 scoutfs: switch to using fnv1a for hashing
We had a few uses of crc for hashing.  That was fine for initial
testing, but the huge number of xattrs that srch records was seeing very
bad collisions from the clumsy combination of crc32c values into a 64bit
hash.  Replace it with FNV-1a for now.

This also takes the opportunity to use 3 hash functions in the forest
bloom filter so that we can extract them from the 64bit hash of the key
rather than iterating and recalculating hashes for each function.
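
For reference, 64-bit FNV-1a is tiny, something like the following with
the standard FNV-1a constants; the bloom bit derivation shown is only an
illustration of carving three functions out of one 64-bit hash, not
necessarily the exact shifts scoutfs uses:

    #include <stdint.h>
    #include <stddef.h>

    /* standard 64-bit FNV-1a */
    static uint64_t fnv1a_64(const void *data, size_t len)
    {
            const uint8_t *p = data;
            uint64_t h = 0xcbf29ce484222325ULL;     /* offset basis */

            while (len--) {
                    h ^= *p++;
                    h *= 0x100000001b3ULL;          /* FNV prime */
            }
            return h;
    }

    /* derive three bloom filter bit positions from one hash */
    static void bloom_bits(uint64_t h, unsigned int nr_bits, unsigned int out[3])
    {
            out[0] = (h >> 0)  % nr_bits;
            out[1] = (h >> 21) % nr_bits;
            out[2] = (h >> 42) % nr_bits;
    }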

Signed-off-by: Zach Brown <zab@versity.com>
2020-08-26 14:39:12 -07:00
Zach Brown
f48112e2a7 scoutfs: allocate contig block pages with nowarn
We first attempt to allocate our large logically contiguous cached
blocks with physically contiguous pages to minimize the impact on the
tlb.  When that fails we fall back to vmalloc()ed blocks.  Sadly,
high-order page allocation failure is expected and we forgot to provide
the flag that suppresses the page allocation failure message.
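
The usual kernel pattern looks roughly like this sketch (not the actual
scoutfs block code):

    static void *alloc_block(size_t size)
    {
            /* try physically contiguous pages first, quietly */
            void *blk = (void *)__get_free_pages(GFP_NOFS | __GFP_NOWARN,
                                                 get_order(size));

            if (!blk)
                    blk = vmalloc(size);    /* virtually contiguous fallback */
            return blk;
    }

    static void free_block(void *blk, size_t size)
    {
            /* freeing has to match whichever allocation succeeded */
            if (is_vmalloc_addr(blk))
                    vfree(blk);
            else
                    free_pages((unsigned long)blk, get_order(size));
    }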

Signed-off-by: Zach Brown <zab@versity.com>
2020-08-26 14:39:12 -07:00
Zach Brown
07ba053021 scoutfs: check super blkno fields
We had a bug where mkfs would set a free data blkno allocator bit past
the end of the device.  (Just at it, in fact.  Those fenceposts.)  Add
some checks at mount to make sure that the allocator blkno ranges in the
super don't have obvious mistakes.

Signed-off-by: Zach Brown <zab@versity.com>
2020-08-26 14:39:12 -07:00
Zach Brown
69e5f5ae5f scoutfs: add btree walk trace point
Signed-off-by: Zach Brown <zab@versity.com>
2020-08-26 14:39:12 -07:00
Zach Brown
2980edac53 scoutfs: restore btree block verification
Signed-off-by: Zach Brown <zab@versity.com>
2020-08-26 14:39:12 -07:00
Zach Brown
f9ff25db23 scoutfs: add dirent name fingerprint
Entries in a directory are indexed by the hash of their name.  This
produces an effectively random access pattern, which results in a cow
storm once directories get large enough that the leaf blocks storing
their entries are larger than our commits.  Each commit ends up being
full of cowed leaf blocks that each contain a single new entry.

The dirent name fingerprints change the dirent key to first start with a
fingerprint of the name.  This reduces the scope of hash randomization
from the entire directory to entries with the same fingerprint.
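
One hypothetical way to picture the new key ordering, where the real
fingerprint derivation may well differ; the point is only that similar
names sort near each other before the hash takes over:

    #include <stdint.h>

    /* hypothetical: key sort order becomes (fingerprint, name hash, ...) */
    static uint64_t name_fingerprint(const char *name, unsigned int len)
    {
            uint64_t fp = 0;
            unsigned int i;

            /* pack leading name bytes so related names land in nearby keys */
            for (i = 0; i < 8 && i < len; i++)
                    fp = (fp << 8) | (uint8_t)name[i];
            return fp;
    }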

On real customer dir sizes and file names we saw roughly 3x create rate
improvements from being able to create more entries in leaf blocks
within a commit.

Signed-off-by: Zach Brown <zab@versity.com>
2020-08-26 14:39:12 -07:00
Zach Brown
0a47e8f936 Revert "scoutfs: add block visited bit"
The radix allocator no longer uses the block visited bit because it
maintains its own much richer private per-block data stored off the priv
pointer.

Signed-off-by: Zach Brown <zab@versity.com>

This reverts commit 294b6d1f79e6d00ba60e26960c764d10c7f4b8a5.
2020-08-26 14:39:12 -07:00
Zach Brown
3a82090ab1 scoutfs: have per-fs inode nr allocators
We had previously seen lock contention between mounts that were either
resolving paths by looking up entries in directories or writing xattrs
in file inodes as they did archiving work.

The previous attempt to avoid this contention was to give each directory
its own inode number allocator which ensured that inodes created for
entries in the directory wouldn't share lock groups with inodes in other
directories.

But this creates the problem of operating on few files per lock for
reasonably small directories.  It also creates more server commits as
each new directory gets its inode allocation reservation.

The fix is to have mount-wide separate allocators for directories and
for everything else.  This puts directories and files in separate groups
and locks, regardless of directory population.

Signed-off-by: Zach Brown <zab@versity.com>
2020-08-26 14:39:12 -07:00
Zach Brown
4d0b78f5cb scoutfs: add counters for server commits
Add some counters for server commits.

Signed-off-by: Zach Brown <zab@versity.com>
2020-08-26 14:39:12 -07:00
Zach Brown
e6ae397d12 Revert "scoutfs: switch block cache to rbtree"
We had switched away from the radix_tree because we were adding a
_block_move call which couldn't fail.  We no longer need that call, so
we can go back to storing cached blocks in the radix tree which can use
RCU lookups.

This revert has some conflict resolution around recent commits to add
the IO_BUSY block flag and the switch to _LG_ blocks.

This reverts commit 10205a5670dd96af350cf481a3336817871a9a5b.

Signed-off-by: Zach Brown <zab@versity.com>
2020-08-26 14:39:12 -07:00
Zach Brown
e5f5ee2679 Revert "scoutfs: add scoutfs_block_move"
We added _block_move for the radix allocator, but it no longer needs it.

This reverts commit 6bb0726689981eb9699296ae2cb4c8599add5b90.
2020-08-26 14:39:12 -07:00
Zach Brown
8fe683dab8 scoutfs: cow dirty radix blocks instead of moving
The radix allocator has to be careful to not get lost in recursion
trying to allocate metadata blocks for its dirty radix blocks while
allocating metadata blocks for others.

The first pass had used path data structures to record the references to
all the blocks we'd need to modify to reflect the frees and allocations
performed while dirtying radix blocks.  Once it had all the path blocks
it moved the old clean blocks into new dirty locations so that the
dirtying couldn't fail.

This had two very bad performance implications.  First, it meant that
trying to read clean versions of dirtied trees would always read the old
blocks again because their clean version had been moved to the dirty
version.  Typically this wouldn't happen but the server does exactly
this every time it tries to merge freed blocks back into its avail
allocator.  This created a significant IO load on the server.  Secondly,
that block cache move not being allowed to fail motivated us to move to
a locked rbtree for the block cache instead of the lockless rcu
radix_tree.

This changes the recursion avoidance to use per-block private metadata
to track every block that we allocate and cow rather than move.  Each
dirty block knows its parent ref and the blknos it would clear and set.
If dirtying fails we can walk back through all the blocks we dirtied and
restore their original references before dropping all the dirty blocks
and returning an error.  This lets us get rid of the path structure
entirely and results in a much cleaner system.
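
A hypothetical sketch of that per-block tracking and the unwind on
failure; the struct, fields, and list head are stand-ins, not the real
scoutfs types:

    struct dirty_radix_priv {
            struct list_head entry;         /* all blocks dirtied this attempt */
            struct radix_ref *parent_ref;   /* ref that was rewritten to point at us */
            struct radix_ref old_ref;       /* what it pointed to before */
            u64 cleared_blkno;              /* avail bit consumed for this block */
            u64 set_blkno;                  /* freed bit recorded for the old block */
    };

    /* on failure, walk back and restore the original references */
    static void unwind_dirty_blocks(struct list_head *dirtied)
    {
            struct dirty_radix_priv *priv, *tmp;

            list_for_each_entry_safe_reverse(priv, tmp, dirtied, entry) {
                    *priv->parent_ref = priv->old_ref;
                    list_del(&priv->entry);
                    /* drop the dirty block, forget its cleared/set blknos */
            }
    }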

This change meant tracking free blocks without clearing them as they're
used to satisfy dirty block allocations.  The change now has a cursor
that walks the avail metadata tree without modifying it.  While building
this it became clear that tracking the first set bits of refs doesn't
provide any value if we're always searching from a cursor.  The cursor
provides the same benefit of avoiding constantly searching empty initial
bits and refs, so maintaining the first-set metadata was just overhead.

Signed-off-by: Zach Brown <zab@versity.com>
2020-08-26 14:39:12 -07:00
Zach Brown
6d7b8233c6 scoutfs: add radix merge retry counter
Signed-off-by: Zach Brown <zab@versity.com>
2020-08-26 14:39:12 -07:00
Zach Brown
26ccaca80b scoutfs: add commit written counter
Signed-off-by: Zach Brown <zab@versity.com>
2020-08-26 14:39:12 -07:00
Zach Brown
ca8abeebb1 scoutfs: check fs root in forest hint
The forest code has a hint call that gives iterators a place to start
reading from before they acquire locks.  It was checking all the log
trees but it wasn't checking the main fs tree.  This happened to be OK
today because we're not yet merging items from the log trees into the
main fs tree, but we don't want to miss them once we do start merging
the trees.

Signed-off-by: Zach Brown <zab@versity.com>
2020-08-26 14:39:12 -07:00
Zach Brown
b7943c5412 scoutfs: avoid reading forest roots with block IO
The forest item operations were reading the super block to find the
roots that it should read items from.

This was easiest to implement to start, but it is too expensive.  We
have to find the roots for every newly acquired lock and every call to
walk the inode seq indexes.

To avoid all these reads we first send the current stable versions of
the fs and logs btree roots along with root grants.  Then we add a net
command to get the current stable roots from the server.  This is used
to refresh the roots if stale blocks are encountered and on the seq
index queries.
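
The read path then becomes a retry loop along these lines, with
hypothetical function names standing in for the real scoutfs calls:

    static int read_with_roots(struct forest_info *finf, struct read_args *args)
    {
            int ret;

            for (;;) {
                    ret = forest_read_items(finf, args);
                    if (ret != -ESTALE)
                            return ret;
                    /* blocks referenced by our cached roots were overwritten;
                     * ask the server for the current stable roots and retry */
                    ret = client_get_roots(finf, &finf->fs_root, &finf->logs_root);
                    if (ret)
                            return ret;
            }
    }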

Signed-off-by: Zach Brown <zab@versity.com>
2020-08-26 14:39:12 -07:00
Zach Brown
304dbbbafa scoutfs: merge partial allocator blocks
The server fills radix allocators for the client to consume while
allocating during a transaction.  The radix merge function used to move
an entire radix block at a time.  With larger blocks this becomes much
too coarse and can move way too much in one call.

This moves allocator bits a word at a time and more precisely moves the
amount that the caller asked for.
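
The core of the word-at-a-time move, as a simplified sketch that honors
the caller's count exactly (the real code also handles offsets and
whole-word fast paths):

    #include <stdint.h>
    #include <stddef.h>

    /* move up to 'count' set bits from src words into dst words */
    static uint64_t move_free_bits(uint64_t *dst, uint64_t *src,
                                   size_t nr_words, uint64_t count)
    {
            uint64_t moved = 0;
            size_t i;

            for (i = 0; i < nr_words && moved < count; i++) {
                    while (src[i] && moved < count) {
                            uint64_t bit = src[i] & -src[i];    /* lowest set bit */

                            src[i] &= ~bit;
                            dst[i] |= bit;
                            moved++;
                    }
            }
            return moved;
    }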

Signed-off-by: Zach Brown <zab@versity.com>
2020-08-26 14:39:12 -07:00
Zach Brown
177af7f746 scoutfs: use larger metadata blocks
Introduce different constants for small and large metadata block
sizes.

The small 4KB size is used for the super block, quorum blocks, and as
the granularity of file data block allocation.  The larger 64KB size is
used for the radix, btree, and forest bloom metadata block structures.

The bulk of this is obvious transitions from the old single constant to
the appropriate new constant.  But there are a few more involved
changes, though just barely.

The block crc calculation now needs the caller to pass in the size of
the block.  The radix function to return free bytes instead returns free
blocks and the caller is responsible for knowing how big its managed
blocks are.

Signed-off-by: Zach Brown <zab@versity.com>
2020-08-26 14:39:12 -07:00
Zach Brown
99bc710f03 scoutfs: remove tiny btree block option
It used to take significant effort to create very tall btrees because
they only stored small references to large LSM segments.  Now they store
all file system metadata and we can easily create sufficiently large
btrees for testing.  We don't need the tiny btree option.

Signed-off-by: Zach Brown <zab@versity.com>
2020-08-26 14:39:12 -07:00
Zach Brown
ac0e58839d scoutfs: remove btree _before and _after
There are no users of these variants of _prev and _next so they can be
removed.  Support for them was also dropped in the previous reworking of
the internal structure of the btree blocks.

Signed-off-by: Zach Brown <zab@versity.com>
2020-08-26 14:39:12 -07:00
Zach Brown
efd9763355 scoutfs: use efficient btree block structures
This btree implementation was first built for the relatively light duty
of indexing segments in the LSM item implementation.  We're now using it
as the core metadata index.  It's already using a lot of cpu to do its
job with small blocks and it only gets more expensive as the block size
increases.  These changes reduce the CPU use of working with the btree
block structures.

We use a balanced binary tree to index items by key in the block.  This
gives us rare tree balancing cost on insertion and deletion instead of
the memmove overhead of maintaining a dense array of item offsets sorted
by key.  The keys are stored in the item structs, which are stored in
an array at the front of the block, so searching for an item uses
contiguous cachelines.

We add a trailing owner offset to values so that we can iterate through
them.  This is used to track space freed up by values instead of paying
the memmove cost of keeping all the values at the end of the block.  We
occasionally reclaim the fragmented value free space instead of
splitting the block.

Direct item lookups use a small hash table at the end of the block
which maps offsets to items.  It uses linear probing and is guaranteed
to have a light load factor so lookups are very likely to only need
a single cache lookup.
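
A sketch of the kind of lookup that allows, with a purely hypothetical
layout where slots at the end of the block hold item-array indices plus
one and zero marks an empty slot:

    #include <stdint.h>

    static int block_hash_lookup(const uint16_t *slots, unsigned int nr_slots,
                                 const uint32_t *item_hashes, uint32_t key_hash)
    {
            unsigned int i = key_hash % nr_slots;
            unsigned int probes;

            for (probes = 0; probes < nr_slots; probes++) {
                    if (slots[i] == 0)
                            return -1;                      /* not present */
                    if (item_hashes[slots[i] - 1] == key_hash)
                            return slots[i] - 1;            /* item index */
                    i = (i + 1) % nr_slots;                 /* linear probe */
            }
            return -1;
    }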

We adjust the watermark for triggering a join from half of a block down
to a quarter.  This results in less utilized blocks on average.  But it
creates distance between the join and split thresholds so we get less
cpu use from constantly joining and splitting if item populations happen
to hover around the previously shared threshold.

While shifting the implementation we chose not to add support for some
features that no longer make sense.  There are no longer callers of
_before and _after, and having synthetic tests use small btree blocks no
longer makes sense when we can easily create very tall trees.  Both
those btree interfaces and the tiny btree block support will be removed.

Signed-off-by: Zach Brown <zab@versity.com>
2020-08-26 14:39:12 -07:00
Zach Brown
f59336085d scoutfs: add avl
Add the little avl implementation that we're going to use for indexing
items within the btree blocks.

Signed-off-by: Zach Brown <zab@versity.com>
2020-08-26 14:39:12 -07:00
Zach Brown
ad99636af8 scoutfs: use scoutfs_key as btree key
The btree currently uses variable length big-endian buffers that are
compared with memcmp() as keys.  This is a historical relic of the time
when keys could be very large.  We had dirent keys that included the
name and manifest entries that included those fs keys.

But now all the btree callers are jumping through hoops to translate
their fs keys into big-endian btree keys.  And the memcmp() of the
keys is showing up in profiles.

This makes the btree take native scoutfs_key structs as its key.  The
forest callers which are working with fs keys can just pass their keys
straight through.  The server btree callers with their private btrees
get key fields defined for their use instead of having individual
big-endian key structs.

A nice side-effect of this is that splitting parents doesn't have to
assume that a maximal key will be inserted by a child split.  We can
have more keys in parents and wider trees.

Signed-off-by: Zach Brown <zab@versity.com>
2020-08-26 14:39:12 -07:00
Zach Brown
f9df3ada6c scoutfs: remove MAX key TYPE and ZONE
These were used for constructing arrays of string mappings of key
fields.  We don't print keys with symbolic strings anymore so we don't
need to maintain these values anymore.

Signed-off-by: Zach Brown <zab@versity.com>
2020-08-26 14:39:12 -07:00
Zach Brown
22716c0389 scoutfs: add scoutfs_key_is_zeros()
Add a little function for testing if a given scoutfs key is all zeros.
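
The helper amounts to roughly the following sketch (the real body may
differ):

    static bool scoutfs_key_is_zeros(const struct scoutfs_key *key)
    {
            return memchr_inv(key, 0, sizeof(*key)) == NULL;
    }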

Signed-off-by: Zach Brown <zab@versity.com>
2020-08-26 14:39:12 -07:00
Zach Brown
c98e75006e scoutfs: remove lock_client entries in commit
The lock server maintains some items in btrees in the server.  It is
usually called by the server core during a commit so it doesn't need to
worry about managing commits.  But the lock recovery timeout code
happens in its own async context.  It needs to protect the lock_client
item removals with a commit.

This was causing failures during xfstests that simulate node crashes by
unmounting with dm-flakey.  Lock recovery would dirty blocks in the
btree writer outside of a commit.  The first server commit holder would
find dirty blocks and throw an assertion indicating that someone
modified blocks without holding a commit.

Signed-off-by: Zach Brown <zab@versity.com>
2020-06-18 14:07:43 -07:00
Zach Brown
ff9386faba scoutfs: export server commit holds
The calls for holding and applying commits in the server are currently
private.  The lock server is a server component that has been separated
out into its own file.  Most of the time the server calls it during
commits so the btree changes made in the lock server are protected by
the commits.  But there are btree calls in the lock server that happen
outside of calls from the server.

Exporting these calls will let the lock server make all its btree
changes in server commits.

Signed-off-by: Zach Brown <zab@versity.com>
2020-06-18 14:07:43 -07:00
Benjamin LaHaise
f5863142be scoutfs: add data_wait_err for reporting errors
Add support for reporting errors to data waiters via a new
SCOUTFS_IOC_DATA_WAIT_ERR ioctl.  This allows waiters to return an error
to readers when staging fails.

Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
[zab: renamed to data_wait_err, took ino arg]
Signed-off-by: Zach Brown <zab@versity.com>
2020-05-29 13:50:13 -07:00
Zach Brown
d16b18562d scoutfs: make sure forest sees dirty log tree
Item writes are first stored in dirty blocks in the private version of
the mount's log tree.  Local readers need to be sure to check the dirty
version of the mount's log tree to make sure that they see the result of
writes.  Usually trees are found by walking the log tree items stored in
another btree in the super.  The private dirty version of a mount's log
tree hasn't been committed yet and isn't visible in these items.

The forest uses its lock private data to track which lock has seen items
written and so should always check the local dirty log tree when
reading.  The intent was to use the per-lock static forest_root for the
log tree to record that it had been marked by a write and was then
always used for reads.

We stored the forest info's rid and tested for a non-zero forest_root
rid as the mechanism for always checking the dirty log root during
reads.  But we weren't setting the forest info rid as each transaction
opened.  It was always 0, so readers never added the dirty log tree for
reading.

The fix is to use the more reliable indication that the log root has
items for us by testing the flag that all the bits have been set.  Then
we're also sure to always set the rid/nr of the forest_info record of
our log tree, and the per-lock forest_root copy of it whenever we use
it.

This fixed spurious errors we were seeing as creates tried to read
the item they just wrote as memory reclaim freed locks.

Signed-off-by: Zach Brown <zab@versity.com>
2020-04-29 12:02:47 -07:00
Zach Brown
e3b1f2e2b0 scoutfs: add counters for radix enospc
Add counters for the various sources of ENOSPC from the radix block
allocator.

Signed-off-by: Zach Brown <zab@versity.com>
2020-04-22 16:08:03 -07:00
Zach Brown
9ad86d4d29 scoutfs: commit trans before premature enospc
File data allocations come from radix allocators which are populated by
the server before each client transaction.  It's possible to fully
consume the data allocator within one transaction if the number of dirty
metadata blocks is kept low.  This could result in premature ENOSPC.

This was happening to the archive-light-cycle test.  If the transactions
performed by previous tests lined up just right then the creation of the
initial test files could see ENOSPC and cause all sorts of nonsense in
the rest of the test, culminating in cmp commands stuck in offline
waits.

This introduces high and low data allocator water marks for
transactions.  The server tries to fill data allocators for each
transaction to the high water mark and the client forces the commit of a
transaction if its data allocator falls below the low water mark.
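
On the client side the check is essentially the following, with
hypothetical names for the allocator and water mark:

    /* force a commit when the data allocator runs low, so the server can
     * refill it up to the high water mark for the next transaction */
    if (data_alloc_free_blocks(dalloc) < DATA_ALLOC_LOW_WATER)
            force_trans_commit(sb);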

The archive-light-cycle test now passes easily and we see the
trans_commit_data_alloc_low counter increasing during the test.

Signed-off-by: Zach Brown <zab@versity.com>
2020-04-22 16:08:03 -07:00
Zach Brown
7da8ddb8a1 scoutfs: fix data.h include guard
The identifier for data.h's include guard was brought over from an old
file and still had the old name.  Update it to reflect its use in data,
not filerw.

Signed-off-by: Zach Brown <zab@versity.com>
2020-04-22 16:08:03 -07:00
Zach Brown
495358996c scoutfs: fix older kc readdir emit
When we added the kernelcompat layer around the old and new readdir
interfaces there was some confusion in the old readdir interface filldir
arguments.  We were passing in our scoutfs dent item struct pointer
instead of the filldir callback buf pointer.  This prevented readdir
from working in older kernels because filldir would immediately see a
corrupt buf and return an error.

This renames the emit compat macro arguments to make them consistent
with the other calls and readdir now provides the correct pointer to the
emit wrapper.

Signed-off-by: Zach Brown <zab@versity.com>
2020-04-21 16:28:06 -07:00
Zach Brown
d2a15ea506 scoutfs: fix depth-first radix next bit search
The radix block next bit search could return a spurious -ENOENT if it
ran out of references in a parent block further down the tree.  It needs
to bubble up to try the next ref in its parent so that it keeps
performing a depth-first search of the entire tree.

This led to an assertion being tripped in _radix_merge.  Getting an
early -ENOENT caused it to start searching from 0 again.  When it's
iterating over a read-only input it could find the same leaf and try to
clear source bits that were already cleared.

Signed-off-by: Zach Brown <zab@versity.com>
2020-04-16 10:33:28 -07:00
Zach Brown
2c5e3aa551 scoutfs: trace radix merge input root and leaf bit
Add a bit more detail to the radix merge trace.  It was missing the
input block and leaf bit.  Also use abbreviations of the fields in the
trace output so that it's slightly less enormous.

Signed-off-by: Zach Brown <zab@versity.com>
2020-04-16 10:33:28 -07:00
Zach Brown
2478d124dd scoutfs: use random radix block ref seqs
The seq portion of radix block references is intended to differentiate
versions of a given block location over time.  The current method of
incrementing the existing value as the block is dirtied is risky.  It
means that every lineage of a block has the same sequence number
progression.  Different trees referencing the same block over time could
get confused.  It's more robust to have large random numbers.  The
collision window is then evenly distributed over the 64bit space rather
than being bunched up in the initial seq values.
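
So dirtying a block now assigns a fresh random seq rather than
old_seq + 1; roughly (the ref field name is illustrative):

    u64 seq;

    get_random_bytes(&seq, sizeof(seq));
    ref->seq = cpu_to_le64(seq);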

Signed-off-by: Zach Brown <zab@versity.com>
2020-04-16 10:33:28 -07:00
Zach Brown
968e719a9a scoutfs: check for bad radix merge count
When we're merging bits that are set in a read-only input tree then we
can't try to merge more bits than exist in the input tree.  That'll
cause us to loop around and double-free bits.

Signed-off-by: Zach Brown <zab@versity.com>
2020-04-16 10:33:28 -07:00