scoutfs

mirror of https://github.com/versity/scoutfs.git synced 2026-05-01 10:25:43 +00:00

Author	SHA1	Message	Date
Zach Brown	08f544cc15	scoutfs: remove scoutfs_item_lookup_exact() size Every caller of scoutfs_item_lookup_exact() provided a size that matches the value buffer. Let's remove the redundant arg and use the value buffer length as the exact size to match. Signed-off-by: Zach Brown <zab@versity.com>	2018-04-04 09:15:27 -05:00
Zach Brown	9c1b393404	scoutfs: don't track offline sparse blocks There were some mistakes in tracking offline blocks. Online and offline block counts are meant to only refer to actual data contents. Sparse blocks in an archived file shouldn't be counted as offline. But the code was marking unallocated blocks as offline. This could corrupt the offline block count if a release extended past i_size and marked the blocks in the mapping item as offline even though they're past i_size. We could have clamped the block walking to not go past i_size. But we still would have had the problem of having offline blocks track sparse blocks. Instead we can fix the problem by only marking blocks offline if they had allocated blocks. This means that sparse regions are never marked offline and will always read zeros. Now a release that extends past i_size will not do anything to the unallocated blocks in the mapping item past i_size and the offline block count will be consistent. (Also the 'modified' and 'dirty' booleans were redundant, we only need one of the two.) Signed-off-by: Zach Brown <zab@versity.com>	2018-04-02 10:21:58 -07:00
Nic Henke	9d18d3a7aa	Add script to build rpms and populate distro release To better support building RPMs for multiple distribution versions, we need a bit of help to organize the RPMs. We take the path of adding a 'rpms/7.4.1708' style directory, with the implicit knowledge this is for CentOS and RHEL. Other distribution handling is left for the future. To ease DOCKER_IMAGE selection for different distribution versions, the environment variable DISTRO_VERS can be used. This simplifies a bunch of call locations and scripting when we don't need to change the Docker image flavor beyond this distribution version toggle. i.e: DISTRO_VERS=el73 ./indocker.sh ./build_rpms.sh The directory tree ends up looking like this: rpms/7.4.1708/kmod-scoutfs-1.0-0.git.5fee207.el7.x86_64.rpm rpms/7.3.1611/kmod-scoutfs-1.0-0.git.5fee207.el7.x86_64.rpm	2018-03-29 15:39:36 -07:00
Nic Henke	22f1ded17b	Add RPM builds for scoutfs-kmod This adds in the makefile targets and spec file template we need to build RPMs. Most of the heavy lifting is taken care of by our docker container and the rpmbuild.sh script distributed in it. The git versioning comes from 'git describe --long', which gives us the tag, the number of commits and the abbreviated commit name. This allows us to use the number of commits as the RPM release version, letting yum understand how to process 'yum update' ordering. yum update shows us the proper processing, along with how our versioning lines up in the RPMs: ---> Package kmod-scoutfs.x86_64 0:0-0.3.gb83d29d.el7 will be updated ---> Package kmod-scoutfs.x86_64 0:0-0.4.g2e5324e.el7 will be an update The rpm file name is: kmod-scoutfs-0-0.4.g2e5324e.el7.x86_64.rpm When we build release RPMS, we'll toggle _release, giving us a rpm name and version like kmod-scoutfs-0-1.4.g2e5324e.el7.x86_64.rpm. The toggle of 0/1 is enough to tell yum that all of the non-release RPMs with the same version are older than the released RPMs. This allows for the release to yum update cleanly over development versions. The git hash helps map RPM names to the git version and the contents of the .note-git_describe, for this RPM it was: heads/nic/rpms-0-g2e5324. The RPM doesn't contain the branch name, but we can add that and other info later if needed. We are not naming the module for a kernel version, that does not seem to be standard practice upstream. Instead, we'll make use of our Artifactory repos and upload the RPMs to the correct places (7.3 vs 7.4 directories, etc).	2018-03-29 15:39:36 -07:00
Zach Brown	995e43aa18	scoutfs: hold the alloc sem during truncate The super info's alloc_rwsem protects the local node free segment and block bitmap items. The truncate code wasn't holding using the rwsem so it could race with other local node allocator item users and corrupt the bitmaps. In the best case this could corrupt structures that trigger EIO. The corrupt items could also create duplicate block allocations that clobber each other and corrupt data. Signed-off-by: Zach Brown <zab@versity.com>	2018-03-16 09:18:50 -07:00
Zach Brown	3818f72776	scoutfs: fix inefficient backwards item reading Iterating over items backwards would result in a lot of extra work. When an item isn't present in the cache we go and search the segments for the item. Once we find the item in its stack of segments we also read in and cache all the items from the missing item to the end of all the segments. This reduced complexity a bit but had very bad worst case performance. If you read items backwards you constantly get cache misses that each search the segments for the item and then try to cache everything to the end of the segment. You're essentially working uncached and are doing quite a lot of work to get that single missed item cached each time. This adds the complexity to cache all the items in the segment stack around the missed item, not just after the missed item. Now reverse iteration hits cached items for everything in the segment after the initial miss. To make this work we have to pass the full lock coverage range to the item reading path. Then we search the manifest for segments that contain the missing key and use those segment's ranges to determine the full range of items that we'll cache. Then we again search the manifest for all the level 0 segments that intersect that range. That range extension is only for cached reads, it doesn't apply to the 'next' call which ignores caching. That operation is getting different enough that we pull it out into its own function. Signed-off-by: Zach Brown <zab@versity.com>	2018-03-15 13:15:04 -07:00
Zach Brown	c4de85fd82	scoutfs: cleanup xattr item storage Honoring the XATTR_REMOVE flag in xattr deletion exposed an interesting bug in getxattr(). We were unconditinally returning the max xattr value size when someone tried to probe an existing xattrs' value size by calling getxattr with size == 0. Some kernel paths did this to probe the existance of xattrs. They expected to get an error if the xattr didn't exist, but we were giving them the max possible size. This kernel path then tried to remove the xattrs with XATTR_REMOVE and that now failed and caused a bunch of errors in xfstests. The fix is to return the real xattr value size when getxattr is called with size == 0. To do that with the old format we'd have to iterate over all the items which happened to be pretty awkward in the current code paths. So we're taking this opportunity to land a change that had been brewing for a while. We now form the xattr keys from the hash of the name and the item values now store a logical contiquous header, the name, and the value. This makes it very easy for us to have the full xattr value length in the header and return it from getxattr when size == 0. Now all tests pass while honororing the XATTR_CREATE and XATTR_REMOVE flags. And the code is a whole lot easier to follow. And we've removed another barrier for moving to small fixed size keys. Signed-off-by: Zach Brown <zab@versity.com>	2018-03-15 09:23:57 -07:00
Zach Brown	c438f5d887	scoutfs: remove scoutfs_item_set_batch() scoutfs_item_set_batch() has a rocky history of being a giant pain in the butt. It's been a lot simpler to have callers use individual item ops instead of trying to describe a compound item operation to sometihng like _set_batch(). Its last user has gone away so we can remove it and never speak of it again. And there was much rejoycing. Signed-off-by: Zach Brown <zab@versity.com>	2018-03-15 09:23:57 -07:00
Zach Brown	4101c655a5	scoutfs: rework set_xattr to honor XATTR_ flags We weren't properly honoring the XATTR_{CREATE,REPLACE} flags. For a start we weren't even passing them in to our _xattr_set() from _setxattr(). So that's something. We left it to scoutfs_item_set_batch() to return errors if we were. This is wrong because the xattr flags are xattr granular, not item granular. We don't want _REPLACE to fail when replacing a larger xattr value because later items in the xattr don't have matching existing items. (And it had some bugs where it could livelock if you set flags and items already existed. :high_fives:). Now that we have the _save and _restore calls we can avoid _set_batch's bad semantics and bugs entirely. It's easy for us to compare the flags to item lookups, delete the old, create the new, and restore the old on errors. Signed-off-by: Zach Brown <zab@versity.com>	2018-03-15 09:23:57 -07:00
Zach Brown	acfc4b357b	scoutfs: add item saving and restoring Add item cache functions for saving and restoring items. This lets callers more easily undo changes while they have transactions pinned. Signed-off-by: Zach Brown <zab@versity.com>	2018-03-15 09:23:57 -07:00
Zach Brown	4dad03a3dd	scoutfs: add item_is_dirty() helper We had an absolute ton of open coding of testing an item's dirty flag. Let's hide it off in a helper so we're less likely to mess it up. Signed-off-by: Zach Brown <zab@versity.com>	2018-03-15 09:23:57 -07:00
Zach Brown	7fb6841b1e	scoutfs: free val while deleting items It was silly to hand off deleted values to callers to free. We can just free as we delete and save a bunch of caller value manipulation. Signed-off-by: Zach Brown <zab@versity.com>	2018-03-15 09:23:57 -07:00
Zach Brown	77f29fa021	scoutfs: allow null val in scoutfs_item_lookup Some callers may want to just test if an item is present and not necessarily want to setup storage for copying the value in. Signed-off-by: Zach Brown <zab@versity.com>	2018-03-15 09:23:57 -07:00
Zach Brown	9f51b63f8d	scoutfs: check snprintf_key() format args Add the function attribute to snprintf_key() to have the compiler verify its print format and args. I noticed some buggy changes that didn't throw errors. Happily none of the existing calls had problems. Signed-off-by: Zach Brown <zab@versity.com>	2018-03-15 09:23:57 -07:00
Zach Brown	0b54d71b98	scoutfs: avoid double unlock We weren't sufficiently careful in reacting to basts. If a bast arrived whlie an unlock is in flight we'd turn around and try to unlock again, returning an error, and exploding. More carefully only act on basts if we have an active mode that needs to be unlocked. Now if the racey bast arrives we'll ignore it and end up freeing the lock in processing after the unlock succeeds. Signed-off-by: Zach Brown <zab@versity.com>	2018-03-13 13:14:26 -07:00
Zach Brown	d58c8d5993	scoutfs: move lock work after dependencies Some of the lock processing path was happening too early. Both maintainance of the locks on the LRU and waking waiters depends on whether there is work pending and on the the granted mode. Those are changed in the middle by processing so we need to move these two bits of work down so that they can consume the updated state. Signed-off-by: Zach Brown <zab@versity.com>	2018-03-09 14:29:55 -08:00
Zach Brown	9ad0f81084	scoutfs: add some lock/item consistency checks Add some tests to the locking paths to see if we violate item caching rules. As we finish locking calls we make sure that the item cache is consistent with the lock mode. And we make sure that we don't free locks before they've been unlocked and had a chance to check the item cache. Signed-off-by: Zach Brown <zab@versity.com>	2018-03-09 14:29:55 -08:00
Zach Brown	2aa613dae5	scoutfs: add scoutfs_item_range_cached() Add a quick helper for querying if a given range of keys is covered by the item cache. Signed-off-by: Zach Brown <zab@versity.com>	2018-03-09 14:29:55 -08:00
Zach Brown	951b6d8dcd	scoutfs: add d_revalidate trace Add a trace event to get some visibility into dentry revalidation. Signed-off-by: Zach Brown <zab@versity.com>	2018-03-09 14:29:55 -08:00
Zach Brown	2136a973ed	scoutfs: copy names in rename trace event The rename trace event was recording and later dereferencing pointers to dentry names that could be long gone by the time the output is generated. We need to copy the name strings into the trace buffers. Signed-off-by: Zach Brown <zab@versity.com>	2018-03-09 14:29:55 -08:00
Zach Brown	8ec5b7efe3	scoutfs: remove bio page add trace This is a very chatty trace evenet that doesn't add much value. Let's remove it and make a lot more room for other more interesting trace events. Signed-off-by: Zach Brown <zab@versity.com>	2018-03-09 14:29:55 -08:00
Zach Brown	6adb24f0f5	scoutfs: clean up compaction destruction We're seeing warnings from trying to destroy the server work queue while it's still active. Auditing shows that almost all of the sources of queued work are shutdown before we destroy the work queue. Except for the compaction func. It queues itself via the sneaky call to scoutfs_compact_kick() inside scoutfs_client_finish_compaction(). What a mess. We only wait for work to finish running in scoutfs_compact_destroy(), we don't forbid further queueing. So with just the right races it looks possible to have the compact func executing after we return from _destroy(). It can then later try to queue the commit_work in the server workqueue. It's pretty hard to imagine this race, but it's made a bit easier by the startling fact that we don't free the compact info struct. That makes it a little easier to imagine use-after-destroy not exploding. So let's forcibly forbid chain queueing during compaction shutdown by using cancel_work_sync(). It marks the work canceling while flushing so the queue_work in the work func won't do anything. This should ensure that the compaction func isn't running when destroy returns. Also while we're at it actually free the allocated compaction info struct! Cool cool cool. Signed-off-by: Zach Brown <zab@versity.com>	2018-03-06 10:32:33 -08:00
Zach Brown	241b52d55a	scoutfs: reserve at least one xattr item value Even when we're setting an xattr with no value we still have a file system item value that contains the xattr value header which tells us that this is the last value. This fixes a warning that would be issued if we tried to set an xattr with a zero length value. We'd try to dirty an item value with the header after having reserved zero bytes for item values. To hit the warning the inode couldn't already be dirty so that the xattr value didn't get to hide in the unsed reservation for dirtying the inode item's value. Signed-off-by: Zach Brown <zab@versity.com>	2018-02-28 22:14:15 -08:00
Zach Brown	f9e282048f	scoutfs: revalidate dentries by checking items Initially we had d_revalidate always return that the dentry was invalid. This avoids dentry cache consistency problems across the cluster by always performing lookups. That's slow by itself, but it turns out that the dentry invalidation that happens on revalidation failure is very expensive if you have lots of dentries. So we switched to forcefully dropping dirents as we revoked their lock. That avoided the cost of revalidation failure but it adds the problem that dentries are unhashed when their locks are dropped. This causes paths like getcwd() to return errors when they see unhashed dentries instead of trying to revalidate them. This implements a d_revalidate which actually does work to determine if the dentry is still valid. When we populate dentries under a lock we add them to a list on the lock. As we drop the lock we remove them from the list. But the dentry is not modified. This lets paths like getcwd() still work. Then we implement revalidation that does the actual item lookups if the dentry's lock has been dropped. This lets revalidation return success and avoid the terrible invalidation costs from returning failure and then calling lookup to populate a new dentry. This brings us more in line with the revalidation behaviour of other systems that maintain multi-node dcache consistency. Signed-off-by: Zach Brown <zab@versity.com>	2018-02-28 11:18:57 -08:00
Zach Brown	c76c6582f0	scoutfs: release server conn under mutex I was rarely seeing null derefs during unmount. The per-mount listening scoutfs_server_func() was seeing null sock->ops as it called kernel_sock_shutdown() to shutdown the connected client sockets. sock_release() sets the ops to null. We're not supposed to use a socket after we call it. The per-connection scoutfs_server_recv_func() calls sock_release() as it tears down its connection. But it does this before it removes the connection from the listener's list. There's a brief window where the connection's socket has been released but is still visible on the list. If the listener tries to shutdown during this time it will crash. Hitting this window depends on scheduling races during unmount. The unmount path has the client close its connection to the server then the server closes all its connected clients. If the local mount is the server then it will have recv work see an error as the client disconnects and it will be racing to shut down the connection with the listening thread during unmount. I think I only saw this in my guests because they're running slower debug kernels on my slower laptop. The window of vulnerability while the released socket is on the list is longer. The fix is to release the socket while we hold the mutex and are removing the connection from the list. A released socket is never visible on the list. While we're at it don't use list_for_each_entry_safe() to iterate over the connection list. We're not modifying it. This is an lingering artifact from previous versions of the server code. Signed-off-by: Zach Brown <zab@versity.com>	2018-02-22 14:27:01 -08:00
Zach Brown	e31e828aff	scoutfs: don't livelock conflicting waiters If there are two tasks waiting for conflicting modes, say a writer waiting for a CW index lock and a index walker waiting for a PR index lock, they can livelock. In the ast one of their modes will be granted. We'll wake them under the lock now that they can see that their mode is ready. But then while still under the lock we see a conflicting waiter, and no users, so we immediately start converting the lock away to the other waiting conflicting mode. The woken waiter is scheduled but now sees that the lock isn't granted anymore because it's converting. This bounces back and forth forever. The fix is to refuse to start conversion while there are still waiters for the currently granted mode. Once they finish it'll be able to convert. Signed-off-by: Zach Brown <zab@versity.com>	2018-02-21 15:14:29 -08:00
Zach Brown	302b0f5316	scoutfs: track inode 512b block count We weren't doing anything with the inode blocks field. We weren't even initializing it which explains why we'd sometimes see garbage i_blocks values in scoutfs inodes in segments. The logical blocks field reflects the contents of the file regardless of whether its online or not. It's the sum of our online and offline block tracking. So we can initialize it to our persistent online and offline counts and then keep it in sync as blocks are allocated and freed. Signed-off-by: Zach Brown <zab@versity.com>	2018-02-21 09:36:44 -08:00
Zach Brown	c1311783d5	scoutfs: add tracking of online and offline blocks Signed-off-by: Zach Brown <zab@versity.com>	2018-02-21 09:36:44 -08:00
Zach Brown	f52dc28322	scoutfs: simplify lock use of kernel dlm We had an excessive number of layers between scoutfs and the dlm code in the kernel. We had dlmglue, the scoutfs locks, and task refs. Each layer had structs that track the lifetime of the layer below it. We were about to add another layer to hold on to locks just a bit longer so that we can avoid down conversion and transaction commit storms under contention. This collapses all those layers into simple state machine in lock.c that manages the mode of dlm locks on behalf of the file system. The users of the lock interface are mainly unchanged. We did change from a heavier trylock to a lighter nonblock lock attempt and have to change the single rare readpage use. Lock fields change so a few external users of those fields change. This not only removes a lot of code it also contains functional improvements. For example, it can now convert directly to CW locks with a single lock request instead of having to use two by first converting to NL. It introduces the concept of an unlock grace period. Locks won't be dropped on behalf of other nodes soon after being unlocked so that tasks have a chance to batch up work before the other node gets a chance. This can result in two orders of magnitude improvements in the time it takes to, say, change a set of xattrs on the same file population from two nodes concurrently. There are significant changes to trace points, counters, and debug files that follow the implementation changes. Signed-off-by: Zach Brown <zab@versity.com>	2018-02-14 15:00:17 -08:00
Zach Brown	d42a3115c9	scoutfs: fix livelock in item_set_batch scoutfs_item_set_batch() first tries to populate the item cache with the range of keys it's going to be modifying. It does this by walking the input key range and trying to read any missing regions. It made a bad assumption that reading from the final present key of a cached range would read more items into the cache. That was often the case when the last present key landed in a segment that contained more keys. But if the last present key was at the end of a segment the read wouldn't make any difference. It'd keep trying to read that final present key indefinitely. The fix is to try and populate the item cache starting with the first key that's missing from the cache by incrementing the last key that we found in the cache. This stopped scoutfs/507 from reliably getting stuck trying to modify an xattr whose single item happened to land at the end of a segment. Signed-off-by: Zach Brown <zab@versity.com>	2018-02-09 17:58:19 -08:00
Zach Brown	4ff1e3020f	scoutfs: allocate inode numbers per directory Having an inode number allocation pool in the super block meant that all allocations across the mount are interleaved. This means that concurrent file creation in different directories will create overlapping inode numbers. This leads to lock contention as reasonable work loads will tend to distribute work by directories. The easy fix is to have per-directory inode number allocation pools. We take the opportunity to clean up the network request so that the caller gets the allocation instead of having it be fed back in via a weird callback. Signed-off-by: Zach Brown <zab@versity.com>	2018-02-09 17:58:19 -08:00
Zach Brown	a49061a7d9	scoutfs: remove the size index We aren't using the size index. It has runtime and code maintenance costs that aren't worth paying. Let's remove it. Removing it from the format and no longer maintaining it are straight forward. The bulk of this patch is actually the act of removing it from the index locking functions. We no longer have to predict the size that will be stored during the transaction to lock the index items that will be created during the transaction. A bunch of code to predict the size and then pass it into locking and transactions goes away. Like other inode fields we now update the size as it changes. Signed-off-by: Zach Brown <zab@versity.com>	2018-01-30 15:03:35 -08:00
Zach Brown	9cc750c4ec	scoutfs: remove lock idr in free, not put The idr entry that identifies a lock's position in the debugfs locks file is allocated early in the process of building up a lock. Today the idr entry is only destroyed in put_(), which is called later once reference counts are established. Errors before then just call free_() and can leave idrs around that reference freed memory. This always destroys the idr entry in free_(). We no longer leave idr entries around that reference freed memory. This fixes use after free while walking the debugfs file which can hit in scoutfs/006 which uses the locks file. Signed-off-by: Zach Brown <zab@versity.com>	2018-01-30 11:04:47 -08:00
Mark Fasheh	ac09f03327	scoutfs: open by handle This is implemented by filling in our export ops functions. When we get those right, the VFS handles most of the details for us. Internally, scoutfs handles are two u64's (ino and parent ino) and a type which indicates whether the handle contains the parent ino or not. Surpisingly enough, no existing type matches this pattern so we use our own types to identify the handle. Most of the export ops are self explanatory scoutfs_encode_fh() takes an inode and an optional parent and encodes those into the smallest handle that would fit. scoutfs_fh_to_[dentry\|parent] turn an existing file handle into a dentry. scoutfs_get_parent() is a bit different and would be called on directory inodes to connect a disconnected dentry path. For scoutfs_get_parent(), we can export add_next_linkref() and use the backref mechanism to quickly find a parent directory. scoutfs_get_name() is almost identical to scoutfs_get_parent(). Here we're linking an inode to a name which exists in the parent directory. We can also use add_next_linkref, and simply copy the name from the backref. As a result of this patch we can also now export scoutfs file systems via NFS, however testing NFS thoroughly is outside the scope of this work so export support should be considered experimental at best. Signed-off-by: Mark Fasheh <mfasheh@versity.com> [zab edited <= NAME_MAX]	2018-01-26 11:59:47 -08:00
Mark Fasheh	b015927e7b	scoutfs: add debug check for scout-107 We have a bug filed where the fs got stuck spinning in scoutfs_dir_get_backref_path(). There's been enough changes lately that we're not sure if this issue still exists. Catch if we have an excessive number of iterations through our loop there and exit with some debug info. Signed-off-by: Mark Fasheh <mfasheh@versity.com>	2018-01-26 10:30:26 -08:00
Zach Brown	e803b10bca	scoutfs: drop lock refcnt/users under task ref If scoutfs_unlock() sees that it isn't the last task using a lock it just returns. It doesn't unlock the lock and it doesn't drop the lock refcnt and users. This leaks the lock refcnt and users because find_alloc_scoutfs_lock() always increments them when it finds a lock. Inflated counts will stop the shrinker from freeing the locks and eventually the counts will wrap and could cause locks to be freed while they're still in use. We can either always drop the refcnt/users in unlock or we can drop them in lock as we notice that our task already has the lock. I chose to have the task ref hold one refcnt/users which are only dropped as the final task unlocks. Signed-off-by: Zach Brown <zab@versity.com>	2018-01-15 10:04:15 -08:00
Zach Brown	f54e59eef1	scoutfs: add debugfs "locks" for scoutfs_lock Add a file for showing the scoutfs_lock struct contents. This is the layer above the detailed dlmglue/dlm info provided in the existing "locking_state" file. Signed-off-by: Zach Brown <zab@versity.com>	2018-01-15 10:04:15 -08:00
Zach Brown	a9c7511c8b	scoutfs: add more scoutfs lock tracing fields We were missing the blocking level and count of cw holders. Signed-off-by: Zach Brown <zab@versity.com>	2018-01-15 10:04:15 -08:00
Zach Brown	fb6c128503	scoutfs: move unblock_lock trace under lock It samples fields that are only consistent under the lock. We also want to see the fields every time it rechecks the conditions that stop it from downconverting. Signed-off-by: Zach Brown <zab@versity.com>	2018-01-15 10:04:15 -08:00
Mark Fasheh	afc798599f	scoutfs: invalidate cache when we free locks We weren't invalidating our cache before freeing locks due to memory pressure. This would cause stale data on the node which originally held the lock. Fix this by firing a callback from dlmglue before we free a lock from the system. On the scoutfs side, the callback is wired to call our invalidate function. This will ensure that the right data and metadata hit disk before another node is allowed to acquire that lock. Signed-off-by: Mark Fasheh <mfasheh@versity.com>	2018-01-14 10:58:05 -08:00
Mark Fasheh	3661f06bec	scoutfs: add trigger to drop lock cache We have a corruption that can happen when a lock is reclaimed but it's cache is still dirty. Detect this corruption by placing a trigger in statfs which fires off lock reclaim. Statfs is nice because for scoutfs it's lockless, which means there should not be any references on locks when the trigger is fired. Signed-off-by: Mark Fasheh <mfasheh@versity.com>	2018-01-14 10:58:05 -08:00
Mark Fasheh	5cc05d663e	scoutfs: count some lock events by type We use a new event callback in dlmglue so that scout has a chance to do some per-lock type counters. I included the most important dlmglue events - basically those which can cost us network or disk traffic. Right now scout just counts downconvert events since those are the most interesting to us. We also just count on the ino and index locks for now. Signed-off-by: Mark Fasheh <mfasheh@versity.com>	2017-12-21 13:20:56 -08:00
Zach Brown	829126790b	scoutfs: retry stale btree and segment reads We don't have strict consistency protocols protecting the "physical" caches that hold btree blocks and segments. We have metadata that tells a reader that it's hit a stale cached entry and needs to invalidate and read a the current version from the media. This implements the retrying. If we get stale sequence numbers in segments or btree blocks we invalidate them from the cache and return -ESTALE. This can only happen when reading structures that could have been modified remotely. This means btree reads in the clients and segment reads for everyone. btree reads on the server are always consistent because it is the only writer. Adding retrying to item reading and compaction catches all of these cases. Stale reads are triggered by inconsistency. But that could also be persistent corruption in persistent media. Callers need to be careful to turn their retries into hard errors if they're persistent. Item reading can do this because it knows the btree root seq that anchored the walk. Compaction doesn't do this today. That gets addressed in a big sweep of error handling at some point in the not too distant future. Signed-off-by: Zach Brown <zab@versity.com>	2017-12-20 12:21:13 -08:00
Zach Brown	e354fd18b1	scoutfs: add sysfs.c, fsid file I wanted to add a sysfs file that exports the fsid for the mount of a given device. But our use of sysfs was confusing and spread through super.c and counters.c. This moves the core of our sysfs use to sysfs.c. Instead of defining the per-mount dir as a kset we define it as an object with attributes which gives us a place to add an fsid attribute. counters still have their own whack of sysfs implementation. We'll let it keep it for now but we could move it into sysfs.c. It's just counter interation around the insane sysfs obj/attr/type nonsense. For now it just needs to know to add its counters dir as a child of the per-mount dir instead of adding it to the kset. Signed-off-by: Zach Brown <zab@versity.com>	2017-12-20 12:21:13 -08:00
Zach Brown	9ed34f8892	scoutfs: add triggers Signed-off-by: Zach Brown <zab@versity.com>	2017-12-20 12:21:13 -08:00
Zach Brown	bbcf76e154	scoutfs: reformat counters.h Clean up the counter definition macro. Sort the entries and clean up whitespace so that adding counters in the future will be more orderly and satisfying. Signed-off-by: Zach Brown <zab@versity.com>	2017-12-20 12:21:13 -08:00
Mark Fasheh	29d97086aa	scoutfs: trace print lockspace and lock names correctly We weren't using the right string macros in the recent lock traces, fix that. Also osb->cconn->cc_name is NULL terminated so we don't need to keep the string length around. Signed-off-by: Mark Fasheh <mfasheh@versity.com>	2017-12-15 15:34:07 -08:00
Mark Fasheh	cdc2249fcf	scoutfs: dlmglue/lock counters and tracing We can use the excellent code in counters.h to easily place a whole set of useful counters in dlmglue: - one for every kind of wait in cluster_lock (blocked, busy, etc) - one for each type of dlm operation (lock/unlock requests, converts, etc) - one for each type of downconvert (cw/pr/ex) These will give us a decent idea of the amount and type of lock traffic a given node is seeing. In addition, we add a second trace at the bottom of invalidate_caches. By turning both traces in invalidate_caches on, we can look at our trace log to see how long a given locks downconvert took. Signed-off-by: Mark Fasheh <mfasheh@versity.com>	2017-12-15 10:31:11 -08:00
Zach Brown	bbbb098e38	scoutfs: warn on stale cached segments We've yet to really wire up the eventual consistency of btree ring blocks and segments. The btree block reading code has had a warning that fires if it sees stale blocks for a long time (which we've yet to hit) but we have no such warning in the segment. If we hit stale segments we could have very unpredictable results. So let's add a quick warning to highlight the case to save us heartache if we hit it before implementing full retrying. Signed-off-by: Zach Brown <zab@versity.com>	2017-12-12 18:04:45 -06:00
Mark Fasheh	4a043dfd3f	scoutfs: add some tracing to dlmglue The cluster_lock and cluster_unlock traces are close to each other but not quite there so they have to be two different traces (thanks tracepoints!). The rest (ocfs2_unblock_lock, ocfs2_simple_drop_lock) can use a shared trace class. Signed-off-by: Mark Fasheh <mfasheh@versity.com>	2017-12-12 13:47:37 -08:00

1 2 3 4 5 ...

581 Commits