Convert client locking to call the server's lock service instead of
using a fs/dlm lockspace.
The client code gets some shims to send and receive lock messages to and
from the server. Callers use our lock mode constants instead of the
DLM's.
Locks are now identified by their starting key instead of an additional
scoped lock name so that we don't have more mapping structures to track.
The global rename lock uses keys that are defined by the format as only
used for locking.
The biggest change is in the client lock state machine. Instead of
calling the dlm and getting callbacks we send messages to our server and
get called from incoming message processing. We don't have everything
come through a per-lock work queue. Instead we send requests either
from the blocking lock caller or from a shrink work queue. Incoming
messages are handled in the net layer's blocking work contexts so we
don't need to do any more work to defer to other contexts.
The different processing contexts lead to a slightly different lock
life cycle. We refactor and separate allocation and freeing from
tracking and removing locks in data structures. We add a _get and _put
to track active use of locks, and async references to locks by holders
and requests are tracked separately.
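Roughly, the split looks like this (a sketch only; the struct and
function names here are illustrative, not the actual client code):

    struct client_lock {
            struct rb_node node;        /* tracked in the lock tree */
            atomic_t refcount;          /* holders, requests, and callers */
            int mode;                   /* our lock mode constants */
            struct scoutfs_key start;   /* locks are identified by start key */
    };

    static struct client_lock *lock_get(struct client_lock *lck)
    {
            atomic_inc(&lck->refcount);
            return lck;
    }

    static void lock_put(struct client_lock *lck)
    {
            /* freed only after it's been removed from tracking structures */
            if (atomic_dec_and_test(&lck->refcount))
                    kfree(lck);
    }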
Our lock service's rules are a bit simpler in that we'll only ever send
one request at a time and the server will only ever send one request at
a time. We do have to do a bit of work to make sure we process
back-to-back grant responses and invalidation requests from the server.
As of this change the lock setup and destruction paths are a little
wobbly. They'll be shored up as we add lock recovery between the client
and server.
Signed-off-by: Zach Brown <zab@versity.com>
Add a specific lock method for locking the global rename lock instead of
having the caller specify it as a global lock. We're getting rid of the
notion of lock scopes and requiring all locks to be related to keys.
The rename lock will use magic keys at the end of the volume.
Signed-off-by: Zach Brown <zab@versity.com>
We were only issuing one kernel warning when we couldn't resolve a path
to an inode due to excessive retries. It was hard to capture and we
only saw details from the first instance.
This adds a counter for each time we see excessive retries and returns
-ELOOP in that case. We also extend the link backref adding trace point
to include the found entry, if any.
Signed-off-by: Zach Brown <zab@versity.com>
Variable length keys lead to having a key struct point to the buffer
that contains the key. With dirents and xattrs now using small keys we
can convert everyone to using a single key struct and significantly
simplify the system.
We no longer have a separate generic key buf struct that points to
specific per-type key storage. All items use the key struct and fill
out the appropriate fields. All the code that paired a generic key buf
struct and a specific key type struct is collapsed down to a key struct.
There's no longer the difference between a key buf that shares a
read-only key, has its own precise allocation, or has a max size
allocation for incrementing and decrementing.
Each key user now has an init function that fills out its fields. It
looks a lot like the old pattern but we no longer have separate key
storage that the buf points to.
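For example, a dirent key init might now look something like this
(field and constant names are approximate, not the real definitions):

    static void init_dirent_key(struct scoutfs_key *key, u64 ino, u64 hash,
                                u64 pos)
    {
            memset(key, 0, sizeof(*key));
            key->zone = SCOUTFS_FS_ZONE;        /* illustrative field names */
            key->ino = cpu_to_le64(ino);
            key->type = SCOUTFS_DIRENT_TYPE;
            key->first = cpu_to_le64(hash);
            key->second = cpu_to_le64(pos);
    }

    /* callers can keep keys on the stack instead of managing allocations */
    struct scoutfs_key key;

    init_dirent_key(&key, ino, hash, pos);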
A bunch of code now takes the address of static key storage instead of
managing allocated keys. Conversely, swapping now uses the full keys
instead of pointers to the keys.
We don't need all the functions that worked on the generic key buf
struct because they had different lengths. Copy, clone, length init,
memcpy, all of that goes away.
The item API had some functions that tested the length of keys and
values. The key length tests vanish, and that gets rid of the _same()
call. The _same_min() call only had one user who didn't also test for
the value length being too large. Let's leave caller key constraints in
callers instead of trying to hide them on the other side of a bunch of
item calls.
We no longer have to track the number of key bytes when calculating if
an item population will fit in segments. This removes the key length
from reservations, transactions, and segment writing.
The item cache key querying ioctls no longer have to deal with variable
length keys. They simply specify the start key, the ioctls return the
number of keys copied instead of bytes, and the caller is responsible
for incrementing the next search key.
The segment no longer has to store the key length. It stores the key
struct in the item header.
The fancy variable length key formatting and printing can be removed.
We have a single format for the universal key struct. The SK_ wrappers
that bracketed calls to use preempt-safe per-cpu buffers can turn back
into their normal calls.
Manifest entries are now a fixed size. We can simply split them between
btree keys and values and initialize them instead of allocating them.
This means that level 0 entries don't have their own format that sorts
by the seq. They're sorted by the key like all the other levels.
Compaction needs to sweep all of them looking for the oldest and read
can stop sweeping once it can no longer overlap. This makes rare
compaction more expensive and common reading less expensive, which is
the right tradeoff.
Signed-off-by: Zach Brown <zab@versity.com>
Directory entries were the last items that had large variable length
keys because they stored the entry name in the key. We'd like to have
small fixed size keys so let's store dirents with small keys.
Entries for lookup are stored at the hash of the name instead of the
full name. The key also contains the unique readdir pos so that we
don't have to deal with collision on creation. The lookup procedure now
does need to iterate over all the readdir positions for the hash value
and compare the names.
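A rough sketch of that walk, with stand-in names and signatures (the
real helpers and the dirent value layout will differ):

    static int lookup_dirent(struct super_block *sb, u64 dir_ino, u64 hash,
                             const char *name, unsigned int name_len,
                             u64 *ino_ret)
    {
            u8 buf[SCOUTFS_MAX_DENT_BYTES];         /* placeholder size */
            struct scoutfs_dirent *dent = (void *)buf;
            struct kvec val = { buf, sizeof(buf) };
            struct scoutfs_key key;
            struct scoutfs_key last;
            int ret;

            /* every readdir pos stored under this name's hash */
            init_dirent_key(&key, dir_ino, hash, 0);
            init_dirent_key(&last, dir_ino, hash, U64_MAX);

            for (;;) {
                    /* assume _next returns the value length it copied */
                    ret = scoutfs_item_next(sb, &key, &last, &val);
                    if (ret < 0)
                            break;  /* -ENOENT once the hash is exhausted */

                    /* only a name comparison tells us if this entry matches */
                    if (ret == offsetof(struct scoutfs_dirent, name) + name_len &&
                        !memcmp(dent->name, name, name_len)) {
                            *ino_ret = le64_to_cpu(dent->ino);
                            ret = 0;
                            break;
                    }

                    scoutfs_key_inc(&key);  /* on to the next colliding pos */
            }

            return ret;
    }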
Entries for link backref walking are stored with the entry's position in
the parent dir instead of the entry's name. The name is then stored in
the value. Inode to path conversion can still walk the backref items
without having to lookup dirent items.
These changes mean that all directory entry items are now stored at a
small key with some u64s (hash, pos, parent dir, etc) and have a value
with the dirent struct and full entry name. This lets us use the same
key and value format for the three entry key types. We no longer have
to allocate keys, we can store them on the stack.
We store the entry's hash and pos in the dirent struct in the item value
so that any item has all the fields to reference all the other item
keys. We store the same values in the dentry_info so that deletion
(unlink and rename) can find all the entries.
The ino_path ioctl can now much more clearly iterate over parent
directories and entry positions instead of oh so cleverly iterating over
null terminated names in the parent directories. The ioctl interface
structs and implementation become simpler.
Signed-off-by: Zach Brown <zab@versity.com>
Originally the item interfaces were written with full support for
vectored keys and values. Callers constructed keys and values made up
of header structs and data buffers. Segments supported much larger
values which could span pages when stored in memory.
But over time we've pulled that support back. Keys are described by a
key struct instead of a multi-element kvec. Values are now much smaller
and don't span pages. The item interfaces still use the kvec arrays but
everyone only uses a single element.
So let's make the world a whole lot less awful by having the item
interfaces only support a single value buffer specified by a kvec. A
bunch of code disappears and the result is much easier to understand.
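The call pattern ends up looking roughly like this (the init helper and
exact signatures are approximations):

    static int read_inode_item(struct super_block *sb, u64 ino,
                               struct scoutfs_inode *sinode)
    {
            struct scoutfs_key key;
            struct kvec val = { sinode, sizeof(*sinode) };

            init_inode_key(&key, ino);      /* hypothetical init helper */

            /* one key struct and one contiguous value buffer */
            return scoutfs_item_lookup(sb, &key, &val);
    }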
Signed-off-by: Zach Brown <zab@versity.com>
The values used in dirent item creation are one of the few places we
have value kvecs with multiple entries. Let's instead allocate and copy
the dirent struct and name into a contiguous buffer so that we can move
towards single vector values.
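Something like the following sketch, with approximate names and fields:

    static int create_dirent_item(struct super_block *sb,
                                  struct scoutfs_key *key, const char *name,
                                  unsigned int name_len, u64 ino, u8 type)
    {
            struct scoutfs_dirent *dent;
            unsigned int val_len = sizeof(*dent) + name_len;
            struct kvec val;
            int ret;

            /* pack the dirent header and the name into one contiguous buffer */
            dent = kmalloc(val_len, GFP_NOFS);
            if (!dent)
                    return -ENOMEM;

            dent->ino = cpu_to_le64(ino);
            dent->type = type;
            memcpy(dent->name, name, name_len);     /* name[] flexible array */

            val.iov_base = dent;
            val.iov_len = val_len;
            ret = scoutfs_item_create(sb, key, &val);

            kfree(dent);
            return ret;
    }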
Signed-off-by: Zach Brown <zab@versity.com>
Every caller of scoutfs_item_lookup_exact() provided a size that matches
the value buffer. Let's remove the redundant arg and use the value
buffer length as the exact size to match.
Signed-off-by: Zach Brown <zab@versity.com>
Initially we had d_revalidate always return that the dentry was invalid.
This avoids dentry cache consistency problems across the cluster by
always performing lookups. That's slow by itself, but it turns out that
the dentry invalidation that happens on revalidation failure is very
expensive if you have lots of dentries.
So we switched to forcefully dropping dirents as we revoked their lock.
That avoided the cost of revalidation failure but it added the problem
that dentries are unhashed when their locks are dropped. This causes
paths like getcwd() to return errors when they see unhashed dentries
instead of trying to revalidate them.
This implements a d_revalidate which actually does work to determine if
the dentry is still valid. When we populate dentries under a lock we
add them to a list on the lock. As we drop the lock we remove them from
the list. But the dentry is not modified. This lets paths like
getcwd() still work. Then we implement revalidation that does the
actual item lookups if the dentry's lock has been dropped. This lets
revalidation return success and avoid the terrible invalidation costs
from returning failure and then calling lookup to populate a new dentry.
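In rough sketch form (the real structures, locking, and helpers will
differ; dentry_info and revalidate_entry_items() are stand-ins):

    static int scoutfs_d_revalidate(struct dentry *dentry, unsigned int flags)
    {
            struct dentry_info *di = dentry->d_fsdata;

            if (flags & LOOKUP_RCU)
                    return -ECHILD;

            /* still on its lock's list: the covering lock hasn't been dropped */
            if (!list_empty(&di->lock_entry))
                    return 1;

            /*
             * The lock was dropped.  Look the items up again and return 1
             * if the entry still matches, instead of failing and paying
             * for invalidation plus a fresh lookup.
             */
            return revalidate_entry_items(dentry);
    }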
This brings us more in line with the revalidation behaviour of other
systems that maintain multi-node dcache consistency.
Signed-off-by: Zach Brown <zab@versity.com>
Having an inode number allocation pool in the super block meant that all
allocations across the mount are interleaved. This means that
concurrent file creation in different directories will create
overlapping inode numbers. This leads to lock contention as reasonable
workloads will tend to distribute work by directories.
The easy fix is to have per-directory inode number allocation pools. We
take the opportunity to clean up the network request so that the caller
gets the allocation instead of having it be fed back in via a weird
callback.
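A sketch of the per-directory pool (names and fields are illustrative,
and refill races are glossed over here):

    struct ino_alloc_pool {
            spinlock_t lock;
            u64 next_ino;   /* next number to hand out */
            u64 nr_free;    /* numbers left in the batch from the server */
    };

    static int alloc_ino(struct super_block *sb, struct ino_alloc_pool *pool,
                         u64 *ino_ret)
    {
            int ret;

            spin_lock(&pool->lock);
            if (pool->nr_free == 0) {
                    spin_unlock(&pool->lock);
                    /* the caller gets the new range back in the reply */
                    ret = request_ino_batch(sb, pool);
                    if (ret)
                            return ret;
                    spin_lock(&pool->lock);
            }
            *ino_ret = pool->next_ino++;
            pool->nr_free--;
            spin_unlock(&pool->lock);

            return 0;
    }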
Signed-off-by: Zach Brown <zab@versity.com>
We aren't using the size index. It has runtime and code maintenance
costs that aren't worth paying. Let's remove it.
Removing it from the format and no longer maintaining it are
straightforward.
The bulk of this patch is actually the act of removing it from the index
locking functions. We no longer have to predict the size that will be
stored during the transaction to lock the index items that will be
created during the transaction. A bunch of code to predict the size and
then pass it into locking and transactions goes away. Like other inode
fields we now update the size as it changes.
Signed-off-by: Zach Brown <zab@versity.com>
This is implemented by filling in our export ops functions.
When we get those right, the VFS handles most of the details for us.
Internally, scoutfs handles are two u64's (ino and parent ino) and a
type which indicates whether the handle contains the parent ino or not.
Surprisingly enough, no existing type matches this pattern so we use our
own types to identify the handle.
Most of the export ops are self-explanatory. scoutfs_encode_fh() takes
an inode and an optional parent and encodes those into the smallest
handle that would fit. scoutfs_fh_to_[dentry|parent] turn an existing
file handle into a dentry.
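A sketch of the encoding; the FILEID_ type values are made up per the
above, and the exact export op signature depends on the kernel version:

    struct scoutfs_fid {
            __u64 ino;
            __u64 parent_ino;       /* only present in the parent variant */
    };

    static int scoutfs_encode_fh(struct inode *inode, __u32 *fh, int *max_len,
                                 struct inode *parent)
    {
            struct scoutfs_fid *fid = (struct scoutfs_fid *)fh;
            int type = parent ? FILEID_SCOUTFS_WITH_PARENT : FILEID_SCOUTFS;
            int len = parent ? 4 : 2;               /* in __u32 units */

            if (*max_len < len) {
                    *max_len = len;
                    return FILEID_INVALID;
            }

            fid->ino = inode->i_ino;
            if (parent)
                    fid->parent_ino = parent->i_ino;
            *max_len = len;

            return type;
    }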
scoutfs_get_parent() is a bit different and would be called on
directory inodes to connect a disconnected dentry path. For
scoutfs_get_parent(), we can export add_next_linkref() and use the backref
mechanism to quickly find a parent directory.
scoutfs_get_name() is almost identical to scoutfs_get_parent(). Here we're
linking an inode to a name which exists in the parent directory. We can also
use add_next_linkref, and simply copy the name from the backref.
As a result of this patch we can also now export scoutfs file systems
via NFS, however testing NFS thoroughly is outside the scope of this
work so export support should be considered experimental at best.
Signed-off-by: Mark Fasheh <mfasheh@versity.com>
[zab edited <= NAME_MAX]
We have a bug filed where the fs got stuck spinning in
scoutfs_dir_get_backref_path(). There have been enough changes lately that
we're not sure if this issue still exists. Catch if we have an excessive
number of iterations through our loop there and exit with some debug info.
Signed-off-by: Mark Fasheh <mfasheh@versity.com>
Today we use unconditional dentry revalidation to provide directory
entry consistency. Any time the vfs tries to use a cached dentry we
tell it to drop it and perform a lookup. This hits our item cache which
is kept consistent by the locks.
This would just be a waste of cpu if it weren't for how heavyweight the
vfs revalidation->lookup path is here. It doesn't just invalidate the
entry; it uses shrink_dcache_parent() to drop all the cached entries in
the subtree rooted at the cached entry.
We saw 22 second long cpu livelocks in this shrink_dcache_parent() when
creating and archiving empty files.
Instead let's let the vfs use dcache entries. We only invalidate them as
we're dropping the lock that covers them. (Today coarse inode locks
cover all the entries in batches of inodes.) We can use d_drop() to
remove entries from the cache to stop them from satisfying lookup
without trying to free all the dentries under them.
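Roughly (the structure and field names are illustrative):

    /* as a lock is invalidated, unhash the dentries it covered */
    static void drop_covered_dentries(struct held_lock *lck)
    {
            struct dentry_info *di, *tmp;

            spin_lock(&lck->lock);
            list_for_each_entry_safe(di, tmp, &lck->dentries, lock_entry) {
                    list_del_init(&di->lock_entry);
                    /*
                     * d_drop() just unhashes so future lookups miss; it
                     * doesn't walk and free the whole subtree the way
                     * revalidation failure does via shrink_dcache_parent().
                     */
                    d_drop(di->dentry);
            }
            spin_unlock(&lck->lock);
    }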
Signed-off-by: Zach Brown <zab@versity.com>
Simple attr changes are mostly handled by the VFS; we just have to
mirror them into our inode. Truncates are done in a separate set of
transactions.
We use a flag to indicate an in-progress truncate. This allows us to
detect and continue the truncate should the node crash.
Index locking is a bit complicated, so we add a helper function to grab
index locks and start a transaction.
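The rough shape of the size change, with approximate helper names and
transaction boundaries elided:

    static int set_size(struct inode *inode, u64 new_size)
    {
            struct scoutfs_inode_info *si = SCOUTFS_I(inode);
            int ret;

            /* one transaction: persist the flag and the new size */
            si->flags |= SCOUTFS_INO_FLAG_TRUNCATE;
            i_size_write(inode, new_size);
            ret = scoutfs_update_inode_item(inode);
            if (ret)
                    return ret;

            /* the data items are deleted in their own series of transactions */
            ret = truncate_data_items(inode, new_size);
            if (ret)
                    return ret;

            /* clear the flag so recovery knows it doesn't need to finish this */
            si->flags &= ~SCOUTFS_INO_FLAG_TRUNCATE;
            return scoutfs_update_inode_item(inode);
    }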
With this patch we now pass the following xfstests:
generic/014
generic/101
generic/313
Signed-off-by: Mark Fasheh <mfasheh@versity.com>
Call it scoutfs_inode_index_try_lock_hold since it may fail and unwind
as part of normal (not an error) operation. This lets us re-use the
name in an upcoming patch.
Signed-off-by: Mark Fasheh <mfasheh@versity.com>
Renaming a dir between parents and clobbering an existing empty dir
wasn't correctly updating the parent link counts. Updating parent link
counts when dirs are moved between parents is an independent operation
from decreasing the link count of an existing target that the rename
clobbers.
Signed-off-by: Zach Brown <zab@versity.com>
We only set the .getattr method to our locked getattr filler for regular
files. Set it for all files so that stat, etc, will see the current
inode for all file types.
Signed-off-by: Zach Brown <zab@versity.com>
scoutfs_item_create() hasn't been working with lock coverage. It
wouldn't return -ENOENT if it didn't have the lock cached. It would
create items outside lock coverage so they wouldn't be invalidated and
re-read if another node modified the item.
Add a lock arg and teach it to populate the cache so that it's correctly
consistent.
Signed-off-by: Zach Brown <zab@versity.com>
Add lock coverage for inode index items.
Sadly, this isn't trivial. We have to predict the value of the indexed
fields before the operation to lock those items. One value in
particular we can't reliably predict: the sequence of the transaction we
enter after locking. Also operations can create an absolute ton of
index item updates -- rename can modify nr_inodes * items_per_inode * 2
items, so maybe 24 today. And these items can be arbitrarily positioned
in the key space.
So to handle all this we add functions to gather the predicted item
values we'll need to lock, sort and lock them all, then pass appropriate
locks down to the item functions during inode updates.
The trickiest bit of the index locking code is having to retry if the
sequence number changes. Preparing locks has to guess the sequence
number of its upcoming trans and then makes item update decisions based
on that. If we enter and have a different sequence number then we need
to back off and retry with the correct sequence number (we may find that
we'll need to update the indexed meta seq and need to have it locked).
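The retry loop looks something like this (function names are
approximations of the real helpers):

    static int lock_hold_index(struct super_block *sb, struct inode *inode,
                               struct scoutfs_item_count cnt,
                               struct list_head *ind_locks)
    {
            u64 seq;
            int ret;

            for (;;) {
                    seq = scoutfs_trans_sample_seq(sb);     /* guess the next seq */

                    ret = prepare_index_locks(sb, ind_locks, inode, seq);
                    if (ret)
                            return ret;

                    ret = scoutfs_hold_trans(sb, cnt);
                    if (ret) {
                            unlock_index_locks(sb, ind_locks);
                            return ret;
                    }

                    /* a different seq can change which index items need locking */
                    if (scoutfs_trans_seq(sb) == seq)
                            return 0;

                    scoutfs_release_trans(sb);
                    unlock_index_locks(sb, ind_locks);
            }
    }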
The use of the functions is straightforward. Sites figure out the
predicted sizes, lock, pass the locks to inode updates, and unlock.
While we're at it we replace the individual item field tracking
variables in the inode info with an array of indexed values. The code
ends up a bit nicer. It also gets rid of the indexed time fields that
were left behind and were unused.
It's worth noting that we're getting exclusive locks on the index
updates. Locking the meta/data seq updates results in complete global
serialization of all changes. We'll need concurrent writer locks to get
concurrency back.
Signed-off-by: Zach Brown <zab@versity.com>
Add a full lock argument to scoutfs_update_inode_item() and use it to
pass the lock's end key into item_update(). This'll get changed into
passing the full lock into _update soon.
Signed-off-by: Zach Brown <zab@versity.com>
Add the full lock argument to _item_dirty() so that it can verify lock
coverage in addition to limiting item cache population to the range
covered by the lock.
This also ropes in scoutfs_dirty_inode_item() which is a thin wrapper
around _item_dirty().
Signed-off-by: Zach Brown <zab@versity.com>
Add the full lock argument to _item_next*() so that it can verify lock
coverage in addition to limiting item cache population to the range
covered by the lock.
Signed-off-by: Zach Brown <zab@versity.com>
Let's give the item functions the full lock so that they can make sure
that the lock has coverage for the keys involved in the operation.
This _lookup*() conversion is first so it adds the
lock_coverager() helper.
Signed-off-by: Zach Brown <zab@versity.com>
scoutfs_rename() looks for dirents again after acquiring cluster locks.
It needs to pass in the lock end keys to limit the items that are read
into the cache.
Signed-off-by: Zach Brown <zab@versity.com>
Move to static mapping items instead of unbounded extents.
We get more predictable data structures and simpler code but still get
reasonably dense metadata.
We no longer need all the extent code needed to split and merge extents,
test for overlaps, and all that. The functions that use the mappings
(get_block, fiemap, truncate) now have a pattern where they decode the
mapping item into an allocated native representation, do their work, and
encode the result back into the dense item.
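The pattern is roughly as follows (helpers, masks, and sizes here are
placeholders):

    static int set_mapped_block(struct super_block *sb, struct scoutfs_key *key,
                                u64 iblock, u64 blkno)
    {
            struct native_mapping *map;
            u8 enc[MAX_MAPPING_VAL];        /* worst case encoded size */
            int len;
            int ret;

            map = kzalloc(sizeof(*map), GFP_NOFS);
            if (!map)
                    return -ENOMEM;

            /* decode the dense item into the native representation */
            ret = read_and_decode_mapping(sb, key, map);
            if (ret)
                    goto out;

            /* do the work on the native form */
            map->blocks[iblock & MAPPING_BLOCK_MASK] = blkno;

            /* pack it back down and update the item */
            len = encode_mapping(enc, map);
            ret = scoutfs_item_update_value(sb, key, enc, len);
    out:
            kfree(map);
            return ret;
    }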
We do have to grow the largest possible item value to fit the worst case
encoding expansion of random block numbers.
The local allocators are no longer two extents but are instead simple
bitmaps: one for full segments and one for individual blocks. There are
helper functions to free and allocate segments and blocks, with careful
coordination of, for example, freeing a segment once all of its
constituent blocks are free.
_fiemap is refactored a bit to make it more clear what's going on.
There's one function that either merges the next bit with the currently
building extent or fills the current and starts recording from a
non-mergeable additional block. The old loop worked this way but was
implemented with a single squirrelly iteration over the extents. This
wasn't feasible now that we're also iterating over blocks inside the
mapping items. It's a lot clearer to call out to merge or fill the
fiemap entry.
The dirty item reservation counts for using the mappings are reduced
significantly because each modification no longer has to assume that it
might merge with two adjacent contiguous neighbours.
Signed-off-by: Zach Brown <zab@versity.com>
The item count estimate functions didn't obviously differentiate between
adding to a count and resetting it. Most callers initialized the count
struct to 0 on the stack, incremented their estimate once, and passed it
in. The problem is that those same functions that increment once in
callers are also used in other estimates to build counts based on
multiple operations.
This tripped up the data truncate path. It looped and kept incrementing
its count while truncating a file with lots of extents until the count
got so large that it didn't fit in a segment by itself and blocked
forever.
This cleans up the item count code so that it's much harder to get
wrong. We differentiate between the SIC_*() high level count estimates
that are meant to be passed in to _hold_trans(), and the internal
__count_*() functions which are used to add up the item counts that make
up an aggregate operation.
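The layering looks roughly like this (fields, sizes, and the helpers
called are simplified stand-ins):

    /* internal helpers only ever add to a caller's count */
    static void __count_dirent_items(struct scoutfs_item_count *cnt,
                                     unsigned int name_len)
    {
            cnt->items += 3;        /* entry, readdir, and link backref items */
            cnt->keys += 3 * MAX_KEY_SIZE;
            cnt->vals += 3 * (sizeof(struct scoutfs_dirent) + name_len);
    }

    /* SIC_*() estimates start from zero and describe one whole operation */
    static inline struct scoutfs_item_count SIC_create(unsigned int name_len)
    {
            struct scoutfs_item_count cnt = {0,};

            __count_inode_item(&cnt);
            __count_dirent_items(&cnt, name_len);

            return cnt;
    }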
With this fix the only way to use the count in extent truncation is to
correctly reset it to the item count for each transaction.
Signed-off-by: Zach Brown <zab@versity.com>
We lock multiple inodes by order of their inode number. This fixes
the directory entry paths that hold parent dir and target inode locks.
Link and unlink are easy because they just acquire the existing parent
dir and target inode locks.
Lookup is a little squirrelly because we don't want to try and order
the parent dir lock with locks down in iget. It turns out that it's
safe to drop the dir lock before calling iget as long as iget handles
racing the inode cache instantiation with inode deletion.
Creation is the remaining pattern and it's a little weird because we
want to lock the newly created inode before we create it and the items
that store it. We add a function that correctly orders the locks,
transaction, and inode cache instantiation.
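The ordering rule itself is simple enough to sketch; lock_inode() and
unlock_inode() stand in for the real calls:

    static int lock_inode_pair(struct super_block *sb, struct inode *a,
                               struct inode *b)
    {
            struct inode *first = a;
            struct inode *second = b;
            int ret;

            /* always acquire the two locks in increasing inode number order */
            if (second->i_ino < first->i_ino)
                    swap(first, second);

            ret = lock_inode(sb, first);
            if (ret == 0 && second != first) {
                    ret = lock_inode(sb, second);
                    if (ret)
                            unlock_inode(sb, first);
            }

            return ret;
    }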
Signed-off-by: Zach Brown <zab@versity.com>
Previously we had lots of inode creation callers that used a function to
create the dirent items and we had unlink remove entries by hand.
Rename is different because it wants to remove and add multiple links as
it does its work, including recreating links that it has deleted.
We rework add_entry_item() so that it gets the specific fields it needs
instead of getting them from the vfs structs. This makes it clear that
callers are responsible for the source of the fields. Specifically we
need to be able to add entries during failed rename cleanup without
allocating a new readdir pos from the parent dir.
With callers now responsible for the inputs to add_entry_items() we move
some of its code out into all callers: checking name length, dirtying
the parent dir inode, and allocating a readdir pos from the parent.
We then refactor most of _unlink() into a del_entry_items() to match
addition. This removes the last user of scoutfs_item_delete_many() and
it will be removed in a future commit.
With the entry item helpers taking specific fields all the helpers they
use also need to use specific fields instead of the vfs structs.
To make rename cluster safe we need to get cluster locks for all the
inodes that we work with. We also have to check that the locally cached
vfs input is still valid after acquiring the locks. We only check the
basic structural correctness of the args: that parent dirs don't violate
ancestor rules to create loops and that the entries assumed by the
rename arguments still exist, or not.
Signed-off-by: Zach Brown <zab@versity.com>
We need to lock and refresh the VFS inode before it checks permissions in
system calls, otherwise we risk checking against stale inode metadata.
Signed-off-by: Mark Fasheh <mfasheh@versity.com>
[zab: adapted to newer lock call]
Signed-off-by: Zach Brown <zab@versity.com>
Now that we have the inode refreshing flags let's add them to the
callers that want to have a current inode after they have their lock.
Callers locking newly created items use the new inode flag to reset the
refresh gen.
A few inode tests are moved down to after locking so that they can test
the current refreshed inode.
Signed-off-by: Zach Brown <zab@versity.com>
Lock callers can specify that they want inode fields reread from items
after the lock is acquired. dlmglue sets a refresh_gen in the locks
that we store in inodes to track when they were last refreshed and if
they need a refresh.
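A sketch of the check (field and helper names are approximate):

    /* after acquiring the lock, reread the inode items if the lock is newer */
    static int refresh_inode(struct inode *inode, struct held_lock *lock)
    {
            struct scoutfs_inode_info *si = SCOUTFS_I(inode);
            int ret = 0;

            if (si->refresh_gen != lock->refresh_gen) {
                    ret = read_inode_items(inode);
                    if (ret == 0)
                            si->refresh_gen = lock->refresh_gen;
            }

            return ret;
    }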
Signed-off-by: Zach Brown <zab@versity.com>
This is based on Mark Fasheh <mfasheh@versity.com>'s series that
introduced inode refreshing after locking and a trylock for readpage.
Rework the inode locking function so that it's more clearly named and
takes flags and the inode struct.
We have callers that want to lock the logical inode but aren't doing
anything with the vfs inode so we provide that specific entry point.
Signed-off-by: Zach Brown <zab@versity.com>
We move struct ocfs2_lock_res_ops and flags to dlmglue.c so that
locks.c can get access to it. Similarly, we export
ocfs2_lock_res_init_common() so that locks.c can initialize each lockres
before use. Also, free_lock_tree() now has to happen before we shut
down the dlm - this gives dlmglue the opportunity to unlock their
underlying dlm locks before we go off freeing the structures.
Signed-off-by: Mark Fasheh <mfasheh@versity.com>
This gives us cluster locking for the overwhelming majority of metadata ops
that scoutfs supports. In particular, we can create and modify file metadata
from one node and immediately see the changes reflected on another node.
In addition to synchronization the cluster locks here are providing an I/O
endpoint for our item cache, ensuring that it doesn't read stale items.
Readdir and file read/write are notable exceptions - they require a more
specific approach and will be implemented in a future patch.
Signed-off-by: Mark Fasheh <mfasheh@versity.com>
[fixed iget unlock and truncated commit message summary]
Signed-off-by: Zach Brown <zab@versity.com>
These transformations are mechanical and there aren't many callers of
these so we combine them into one commit.
Signed-off-by: Zach Brown <zab@versity.com>
Add an end key to the item_next calls to limit how many items will be
read into the cache. Callers typically get this from the lock they hold
that covers the iteration. We differentiate between iteration and
caching so that a series of small iterations (listxattr on inodes,
namespace walk in small dirs) can be satisfied by a single read of
adjacent items from segments.
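A small xattr iteration sketches the idea (names and signatures are
approximate):

    static int count_xattr_items(struct super_block *sb, u64 ino)
    {
            struct scoutfs_key key;
            struct scoutfs_key last;
            struct kvec val = { NULL, 0 };  /* only the keys matter here */
            int nr = 0;
            int ret;

            init_xattr_key(&key, ino, 0);
            init_xattr_key(&last, ino, U64_MAX);    /* typically the lock's end key */

            for (;;) {
                    ret = scoutfs_item_next(sb, &key, &last, &val);
                    if (ret == -ENOENT)
                            return nr;      /* walked past the final xattr */
                    if (ret < 0)
                            return ret;

                    nr++;
                    scoutfs_key_inc(&key);  /* advance past the returned item */
            }
    }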
Signed-off-by: Zach Brown <zab@versity.com>
The item cache can only be populated with items that are covered by
locks. Require callers to provide the farthest key that can be covered
by the locks. Locks provide a key for exactly this purpose.
Signed-off-by: Zach Brown <zab@versity.com>
Holding a DLM lock protects a range of the key space. The DLM locks
span inodes or regions of inodes. We need the sort order in LSM items
to match the DLM range keys so that we can read all the items covered by
a lock into the cache from a region of LSM segments. If their orders
differed then we'd have to jump around segments to find all the items
covered by a given DLM lock.
Previously we were sorting by type then, within types, by inode. Now we
want to sort by inode then by type. But there are structures which
previously had a type but weren't then sorted by inode. We introduce
zones as the primary sort key. Inode index and node zones are sorted by
the inode fields and node ids respectively. Then comes the fs zone,
sorted first by inode and then by the type of the key.
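The resulting order, shown with illustrative fields:

    static int cmp_u64(u64 a, u64 b)
    {
            return a < b ? -1 : a > b ? 1 : 0;
    }

    /* zone first, then the primary value (inode, node id), then the type */
    static int compare_keys(struct item_key *a, struct item_key *b)
    {
            return cmp_u64(a->zone, b->zone) ?:
                   cmp_u64(a->primary, b->primary) ?:
                   cmp_u64(a->type, b->type) ?:
                   cmp_u64(a->offset, b->offset);
    }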
The bulk of this is the mechanical introduction of the zone field to the
keys, moving the type field down, and a bulk rename of _KEY to _TYPE.
But there are some more substantial changes.
The orphan keys needed to be put in a zone. They fit in the NODE zone
which is all about resources that nodes hold and would need to be
cleaned up if the node went away.
The key formatting is significantly changed to match the new sort order.
Formatted keys are now generally of the form "zone.primary.type..."
And finally with the keys now properly sorted by inodes we can correctly
construct a single range of item cache keys to invalidate when unlocking
the inode group locks.
Signed-off-by: Zach Brown <zab@versity.com>
Add a relatively small universal value size limit. This will be needed
by more dense item packing to predict the worst case padding to avoid
full items crossing block boundaries.
We refactor the existing symlink and xattr item value limit to use this
new limit.
Signed-off-by: Zach Brown <zab@versity.com>
We're shrinking the max item value size so we need to store symlinks
with large target paths in multiple items. The arbitrary max value size
defined here will be replaced in the future with the new global maximum
value size.
Signed-off-by: Zach Brown <zab@versity.com>
We had a simple mechanism for ensuring that transaction didn't create
more items than would fit in a single written segment. We calculated
the most dirty items that a holder could generate and assumed that all
holders dirtied that much.
This had two big problems.
The first was that it wasn't accounting for nested holds.
write_begin/end calls the generic inode dirtying path while holding a
transaction. This ended up deadlocking as the dirty inode waited to be
able to write while its trans, held back in write_begin, prevented
writeout.
The second was that the worst case (full size xattr) item dirtying is
enormous and meaningfully restricts concurrent transaction holders.
With no currently dirty items you can have less than 16 full size xattr
writes. This concurrency limit only gets worse as the transaction fills
up with dirty items.
This fixes those problems. It adds precise accounting of the dirty
items that can be created while a transaction is held. These
reservations are tracked in journal_info so that they can be used by
nested holds. The precision allows much greater concurrency as
something like a create will try to reserve a few hundred bytes instead
of 64k. Normal sized xattr operations won't try to reserve the largest
possible space.
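A sketch of the journal_info nesting (the fields and helpers here are
assumed, not the real ones):

    struct trans_reservation {
            struct scoutfs_item_count cnt;  /* what this holder reserved */
            int holders;                    /* nesting depth */
    };

    static int hold_trans(struct super_block *sb, struct scoutfs_item_count cnt)
    {
            struct trans_reservation *res = current->journal_info;

            if (res) {
                    /* nested hold: it's covered by the outer reservation */
                    res->holders++;
                    return 0;
            }

            res = kzalloc(sizeof(*res), GFP_NOFS);
            if (!res)
                    return -ENOMEM;

            res->cnt = cnt;
            res->holders = 1;
            current->journal_info = res;

            /* wait until the currently dirty items plus cnt fit in a segment */
            return wait_for_reservation(sb, &cnt);
    }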
We add some feedback from the item cache to the transaction to issue
warnings if a holder dirties more items than it reserved.
Now that we have precise item/key/value counts (segment space
consumption is a function of all three :/) we can't have a single atomic
track transaction holders. We add a long-overdue trans_info and put a
proper lock and fields there and much more clearly track transaction
serialization amongst the holders and writer.
Signed-off-by: Zach Brown <zab@versity.com>
The use of the Scout ioctls for inode-since and data-since on the root
directory is a rather helpful boost. This allows user code to start on
blank filesystems and monitor activity without needing to create files.
The ioctl code was already present, so wiring it into the directory
file operations was all that needed to happen.
Signed-off-by: Nic Henke <nic.henke@versity.com>
Reviewed-by: Zach Brown <zab@versity.com>
Reviewed-by: Mark Fasheh <mfasheh@versity.com>