Commit Graph

42 Commits

Author SHA1 Message Date
Zach Brown
c061ada671 scoutfs: mounts connect once server is listening
An elected leader writes a quorum block showing that it's elected before
it assumes exclusive access to the device and starts bringing up the
server.  This lets another later elected leader find and fence it if
something happens.

Other mounts were trying to connect to the server once this elected
quorum block was written and before the server was listening.  They'd
get connection refused, decide to elect a new leader, and try to fence
the server that's still running.

Now, they should have tried much harder to connect to the elected leader
instead of taking a single failed attempt as fatal.  But that's a
problem for another day that involves more work in balancing timeouts
and retries.

But mounts should not have tried to connect to the server until it's
listening.  That's easy to signal by adding a simple listening flag to
the quorum block.  Now mounts only try to connect once they see the
listening flag and don't hit these racy refused connections.
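The flag check above can be sketched in plain C.  This is a minimal illustration with hypothetical field and flag names (`term`, `flags`, `QUORUM_FLAG_LISTENING`); the real scoutfs quorum block format differs:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical quorum block layout; the real scoutfs format differs. */
#define QUORUM_FLAG_LISTENING (1u << 0)

struct quorum_block {
	uint64_t term;   /* election term that wrote this block */
	uint32_t flags;  /* QUORUM_FLAG_LISTENING once accept() is up */
};

/* Mounts only try to connect once the elected leader has published the
 * listening flag, instead of racing a refused connection and fencing a
 * server that is still coming up. */
static bool leader_ready(const struct quorum_block *blk)
{
	return (blk->flags & QUORUM_FLAG_LISTENING) != 0;
}
```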

Signed-off-by: Zach Brown <zab@versity.com>
2019-05-30 15:01:00 -07:00
Zach Brown
36b0df336b scoutfs: add unmount barrier
Now that a mount's client is responsible for electing and starting a
server we need to be careful about coordinating unmount.  We can't
let unmounting clients leave the remaining mounted clients without
quorum.

The server carefully tracks who is mounted and who is unmounting while
it is processing farewell requests.  It only sends responses to voting
mounts while quorum remains or once all of the voting clients are
trying to unmount.

We use a field in the quorum blocks to communicate to the final set of
unmounting voters that their farewells have been processed and that they
can finish unmounting without trying to re-establish quorum.

The commit introduces and maintains the unmount_barrier field in the
quorum blocks.  It is passed to the server from the election, the
server sends it to the client and writes new versions, and the client
compares what it received with what it sees in quorum blocks.
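The barrier comparison can be sketched as follows.  This is a hedged illustration with assumed names, not the actual scoutfs code: the client records the unmount_barrier value the server sent it and later compares it with the value it reads from quorum blocks:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative sketch of the unmount_barrier handshake.  Once the
 * server has written a barrier newer than the one the client was
 * handed, the client's farewell has been processed and it may finish
 * unmounting without trying to re-establish quorum. */
static bool farewell_processed(uint64_t barrier_from_server,
			       uint64_t barrier_in_quorum_block)
{
	return barrier_in_quorum_block > barrier_from_server;
}
```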

The commit then has the clients send their unique name to the server
who stores it in persistent mounted client records and compares the
names to the quorum config when deciding which farewell requests
can be responded to.

Now that farewell response processing can block for a very long time it
is moved off into async work so that it doesn't prevent net connections
from being shut down and re-established.  This also makes it easier to
make global decisions based on the count of pending farewell requests.

Signed-off-by: Zach Brown <zab@versity.com>
2019-04-12 10:54:07 -07:00
Zach Brown
fe63b566c9 scoutfs: use _unaligned instead of __packed
We were relying on a cute (and probably broken) trick of defining
pointers to unaligned base types with __packed.  Modern versions of gcc
warn about this.

Instead we either directly access unaligned types with get_ and
put_unaligned, or we copy unaligned data into aligned copies before
working with it.
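The get_/put_unaligned pattern can be expressed in portable C: rather than dereferencing a __packed pointer to an unaligned integer, copy the bytes through memcpy and let the compiler emit safe loads and stores.  A minimal sketch:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* memcpy-based unaligned access; the compiler lowers these to plain
 * loads/stores on architectures that allow them, and to byte accesses
 * where they don't, without undefined behavior either way. */
static uint64_t get_unaligned_u64(const void *p)
{
	uint64_t v;

	memcpy(&v, p, sizeof(v));
	return v;
}

static void put_unaligned_u64(uint64_t v, void *p)
{
	memcpy(p, &v, sizeof(v));
}
```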

Signed-off-by: Zach Brown <zab@versity.com>
2019-04-12 10:54:07 -07:00
Zach Brown
e88b5732ad scoutfs: track trans seq in btree
Currently the server tracks the outstanding transaction sequence numbers
that clients have open in a simple list in memory.  That list isn't
properly cleaned up when a client unmounts, and a new server that takes
over after a crash won't know about open transaction sequence numbers.

This stores open transaction sequence numbers in a shared persistent
btree instead of in memory.  It removes tracking for clients as they
send their farewell during unmount.  A new server that starts up will
see existing entries for clients that were created by old servers.

This fixes a bug where a client who unmounts could leave behind a
pending sequence number that would never be cleaned up and would
indefinitely limit the visibility of index items that came after it.

Signed-off-by: Zach Brown <zab@versity.com>
2019-04-12 10:54:07 -07:00
Zach Brown
ec0fb5380a scoutfs: implement lock recovery
When a server crashes all the connected clients still have operational
locks and can be using them to protect IO.  As a new server starts up
its lock service needs to account for those outstanding locks before
granting new locks to clients.

This implements lock recovery by having the lock service recover locks
from clients as it starts up.

First the lock service stores records of connected clients in a btree
off the super block.  Records are added as the server receives their
greeting and are removed as the server receives their farewell.

Then the server checks for existing persistent records as it starts up.
If it finds any it enters recovery and waits for all the old clients to
reconnect before resuming normal processing.

We add lock recover request and response messages that are used to
communicate locks from the clients to the server.

Signed-off-by: Zach Brown <zab@versity.com>
2019-04-12 10:54:07 -07:00
Zach Brown
74366f0df1 scoutfs: make networking more reliable
The current networking code has loose reliability guarantees.  If a
connection between the client and server is broken then the client
reconnects as though it's an entirely new connection.  The client resends
requests but no responses are resent.  A client's requests could be
processed twice on the same server.  The server throws away disconnected
client state.

This was fine, sort of, for the simple requests we had implemented so
far.  It's not good enough for the locking service which would prefer to
let networking worry about reliable message delivery so it doesn't have
to track and replay partial state across reconnection between the same
client and server.

This adds the infrastructure to ensure that requests and responses
between a given client and server will be delivered across reconnected
sockets and will only be processed once.

The server keeps track of disconnected clients and restores state if the
same client reconnects.  This required some work around the greetings so
that clients and servers can recognize each other.  Now that the server
remembers disconnected clients we add a farewell request so that servers
can forget about clients that are shutting down and won't be
reconnecting.

Now that connections between the client and server are preserved we can
resend responses across reconnection.  We add outgoing message sequence
numbers which are used to drop duplicates and communicate the received
sequence back to the sender to free responses once they're received.
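The receive-side bookkeeping can be sketched like this.  Field and function names are assumptions for illustration, not scoutfs's actual identifiers:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Each peer stamps outgoing messages with an increasing seq.  The
 * receiver drops anything at or below the highest seq it has already
 * processed, so messages resent across a reconnected socket are only
 * processed once.  The acked seq travels back to the sender, which can
 * then free its retained responses. */
struct recv_state {
	uint64_t last_seq;   /* highest seq processed so far */
};

static bool should_process(struct recv_state *st, uint64_t seq)
{
	if (seq <= st->last_seq)
		return false;   /* duplicate from a resend, drop it */
	st->last_seq = seq;
	return true;
}
```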

When the client is reconnecting to a new server it resets its receive
state that was dependent on the old server and it drops responses which
were being sent to a server instance which no longer exists.

This stronger reliable messaging guarantee will make it much easier
to implement lock recovery which can now rewind state relative to
requests that are in flight and replay existing state on a new server
instance.

Signed-off-by: Zach Brown <zab@versity.com>
2019-04-12 10:54:07 -07:00
Zach Brown
675275fbf1 scoutfs: use hdr.fsid in greeting instead of id
The network greeting exchange was mistakenly using the global super
block magic number instead of the per-volume fsid to identify the
volumes that the endpoints are working with.  This prevented the check
from doing its only job: to fail when clients in one volume try to
connect to a server in another.

Signed-off-by: Zach Brown <zab@versity.com>
2019-04-12 10:54:07 -07:00
Zach Brown
288d781645 scoutfs: start and stop server with quorum
Currently all mounts try to get a dlm lock which gives them exclusive
access to become the server for the filesystem.  That isn't going to
work if we're moving to locking provided by the server.

This uses quorum election to determine who should run the server.  We
switch from long running server work blocked trying to get a lock to
calls which start and stop the server.

Signed-off-by: Zach Brown <zab@versity.com>
2019-04-12 10:54:07 -07:00
Zach Brown
34b8950bca scoutfs: initial lock server core
Add the core lock server code for providing a lock service from our
server.  The lock messages are wired up but nothing calls them.

Signed-off-by: Zach Brown <zab@versity.com>
2019-04-12 10:54:07 -07:00
Zach Brown
7e9d40d65a scoutfs: init ret when freeing zero extents
The server forgot to initialize ret to 0 and might return
undefined errnos if a client asked it to free zero extents.

Signed-off-by: Zach Brown <zab@versity.com>
2018-09-12 15:37:45 -07:00
Zach Brown
2cc990406a scoutfs: compact using net requests
Currently compaction is only performed by one thread running in the
server.  Total metadata throughput of the system is limited by only
having one compaction operation in flight at a time.

This refactors the compaction code to have the server send compaction
requests to clients who then perform the compaction and send responses
to the server.  This spreads compaction load out amongst all the clients
and greatly increases total compaction throughput.

The manifest keeps track of compactions that are in flight at a given
level so that we maintain segment count invariants with multiple
compactions in flight.  It also uses the sparse bitmap to lock down
segments that are being used as inputs to avoid duplicating items across
two concurrent compactions.

A server thread still coordinates which segments are compacted.  The
search for a candidate compaction operation is largely unchanged.  It
now has to deal with being unable to process a compaction because its
segments are busy.  We add some logic to keep searching in a level until
we find a compaction that doesn't intersect with current compaction
requests.  If there are none at the level we move up to the next level.

The server will only issue a given number of compaction requests to a
client at a time.  When it needs to send a compaction request it rotates
through the current clients until it finds one that doesn't have the max
in flight.
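The rotation above can be sketched as a simple rotor over per-client in-flight counts.  The limit and names here are illustrative assumptions, not scoutfs's actual values:

```c
#include <assert.h>

/* in_flight[i] counts compaction requests outstanding at client i and
 * *rotor remembers where the last search stopped, so requests spread
 * across clients rather than piling onto the first one. */
#define MAX_IN_FLIGHT 2

static int pick_compaction_client(const int *in_flight, int nr, int *rotor)
{
	for (int i = 0; i < nr; i++) {
		int c = (*rotor + i) % nr;

		if (in_flight[c] < MAX_IN_FLIGHT) {
			*rotor = (c + 1) % nr;
			return c;
		}
	}
	return -1;   /* everyone is at the limit */
}
```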

If a client disconnects the server forgets the compactions it had sent
to that client.  If those compactions still need to be processed they'll
be sent to the next client.

The segnos that are allocated for compaction are not reclaimed if a
client disconnects or the server crashes.  This is a known deficiency
that will be addressed with the broader work to add crash recovery to
the multiple points in the protocol where the server and client trade
ownership of persistent state.

The server needs to block as it does work for compaction in the
notify_up and response callbacks.  We move them out from under spin
locks.

The server needs to clean up allocated segnos for a compaction request
that fails.  We let the client send a data payload along with an error
response so that it can give the server the id of the compaction that
failed.

Signed-off-by: Zach Brown <zab@versity.com>
2018-08-28 15:34:30 -07:00
Zach Brown
62d6c11e3c scoutfs: clean up workqueue flags
We had gotten a bit sloppy with the workqueue flags.  We needed _UNBOUND
in some workqueues where we wanted concurrency by scheduling across cpus
instead of waiting for the current (very long running) work on a cpu to
finish.  We add NON_REENTRANT out of an abundance of caution.  It has
gone away in modern kernels and is probably not needed here, but
according to the docs we would want it so we at least document that fact
by using it.

Signed-off-by: Zach Brown <zab@versity.com>
2018-08-28 15:34:30 -07:00
Zach Brown
0adbd7e439 scoutfs: have server track connected clients
This extends the notify up and down calls to let the server keep track
of connected clients.

It adds the notion of per-connection info that is allocated for each
connection.  It's passed to the notification callbacks so that callers
can have per-client storage without having to manage allocations in the
callbacks.

It adds the node_id argument to the notification callbacks to indicate
if the call is for the listening socket itself or an accepted client
connection on that listening socket.

Signed-off-by: Zach Brown <zab@versity.com>
2018-08-28 15:34:30 -07:00
Zach Brown
746293987c scoutfs: let server send msg to specific node_id
The current sending interfaces only send a message to the peer of a
given connection.  For the server to send to a specific connected client
it'd have to track connections itself and send to them.

This adds a sending interface that uses the node_id to send to a
specific connected client.  The conn argument is the listening socket
and its accepted sockets are searched for the destination node_id.

Signed-off-by: Zach Brown <zab@versity.com>
2018-08-28 15:34:30 -07:00
Zach Brown
8b3193ea72 scoutfs: server allocates node_id
Today node_ids are randomly assigned.  This adds the risk of failure
from random number generation and still allows for the risk of
collisions.

Switch to assigning strictly advancing node_ids on the server during the
initial connection greeting message exchange.  This simplifies the
system and allows us to derive information from the relative values of
node_ids in the system.

To do this we refactor the greeting code from internal to the net layer
to proper client and server request and response processing.  This lets
the server manage persistent node_id storage and allows the client to
wait for a node_id during mount.
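The assignment itself reduces to a strictly advancing counter.  A minimal sketch, with the caveat that the real server persists the next value (here it just lives in a struct) so ids keep advancing across server instances:

```c
#include <assert.h>
#include <stdint.h>

/* Strictly advancing node_id assignment during the greeting exchange.
 * Monotonic ids remove the collision risk of random assignment and let
 * the relative values of node_ids carry meaning. */
struct server_ids {
	uint64_t next_node_id;
};

static uint64_t greeting_assign_node_id(struct server_ids *ids)
{
	return ids->next_node_id++;
}
```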

Now that net_connect is sync in the client we don't need the notify_up
callback anymore.  The client can perform those duties when the connect
returns.

The net code still has to snoop on request and response processing to
see when the greetings have been exchanged and allow messages to flow.

Signed-off-by: Zach Brown <zab@versity.com>
2018-08-28 15:34:30 -07:00
Zach Brown
a25b6324d2 scoutfs: maintain free_blocks in one place
The free_blocks counter in the super is meant to track the number of
total blocks in the primary free extent index.  Callers of extent
manipulation were trying to keep it in sync with the extents.

Segment allocation was allocating extents manually using a cursor.  It
forgot to update free_blocks.  Segment freeing then freed the segment as
an extent, which did update free_blocks.  The count thus accumulated
over time until it exceeded total blocks and caused df to report
negative usage.

This updates the free_blocks count in server extent io which is the only
place we update the extent items themselves.  This ensures that we'll
keep the count in sync with the extent items.  Callers don't have to
worry about it.

Signed-off-by: Zach Brown <zab@versity.com>

2018-08-21 13:25:05 -07:00
Zach Brown
d708421cfb scoutfs: remove unused client and server code
The previous commit added shared networking code and disabled the old
unused code.  This removes all that unused client and server code that
was refactored to become the shared networking code.

Signed-off-by: Zach Brown <zab@versity.com>
2018-07-27 09:50:21 -07:00
Zach Brown
17dec65a52 scoutfs: add bidirectional network messages
The client and server networking code was a bit too rudimentary.

The existing code only had support for the client synchronously and
actively sending requests that the server could only passively respond
to.  We're going to need the server to be able to send requests to
connected clients and it can't block waiting for responses from each
one.

This refactors sending and receiving in both the client and server code
into shared networking code.  It's built around a connection struct that
then holds the message state.  Both peers on the connection can send
requests and send responses.

The existing code only retransmitted requests down newly established
connections.  Requests could be processed twice.

This adds robust reliability guarantees.  Requests are resent until
their response is received.  Requests are only processed once by a given
peer, regardless of the connection's transport socket.  Responses are
reliably resent until acknowledged.

This only adds the new refactored code and disables the old unused code
to keep the diff footprint minimal.  A following commit will remove all
the unused code.

Signed-off-by: Zach Brown <zab@versity.com>
2018-07-27 09:50:21 -07:00
Zach Brown
295bf6b73b scoutfs: return free extents to server
Freed file data extents are tracked in free extent items in each node.
They could only be re-used in the future for file data extent allocation
on that node.  Allocations on other nodes or, critically, segment
allocation on the server could never see those free extents.  With the
right allocation patterns, particularly allocating on node X and freeing
on node Y, all the free extents can build up on a node and starve other
allocations.

This adds a simple high water mark after which nodes start returning
free extents to the server.  From there they can satisfy segment
allocations or be sent to other nodes for file data extent allocation.
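The high water mark policy can be sketched as a threshold check.  The threshold value here is an illustrative assumption, not scoutfs's actual mark:

```c
#include <assert.h>
#include <stdint.h>

/* Once a node's pool of free extent blocks exceeds the mark, the
 * excess is handed back to the server, where it can satisfy segment
 * allocations or be redistributed to other nodes. */
#define FREE_BLOCKS_HIGH_WATER 4096

static uint64_t blocks_to_return(uint64_t local_free_blocks)
{
	if (local_free_blocks <= FREE_BLOCKS_HIGH_WATER)
		return 0;
	return local_free_blocks - FREE_BLOCKS_HIGH_WATER;
}
```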

Signed-off-by: Zach Brown <zab@versity.com>
2018-07-05 16:19:31 -07:00
Zach Brown
e19716a0f2 scoutfs: clean up super block use
The code that works with the super block had drifted a bit.  We still
had two super blocks from an old design and we weren't doing anything
with the crc.
Move to only using one super block at a fixed blkno and store and verify
its crc field by sharing code with the btree block checksumming.

Signed-off-by: Zach Brown <zab@versity.com>
2018-06-29 15:56:42 -07:00
Zach Brown
002daf3c1c scoutfs: return -ENOSPC to client alloc segno
The server send_reply interface is confusing.  It uses errors to shut
down the connection, so a client's -ENOSPC has to be communicated in
the message reply payload instead.

The segno allocation server processing needs to set the segno to 0 so
that the client gets it and translates that into -ENOSPC.

Signed-off-by: Zach Brown <zab@versity.com>
2018-06-29 14:42:06 -07:00
Zach Brown
2efba47b77 scoutfs: satisfy large allocs with smaller extents
The previous fallocate and get_block allocators only looked for free
extents larger than the requested allocation size.  This prematurely
returns -ENOSPC if a very large allocation is attempted.  Some xfstests
stress low free space situations by fallocating almost all the free
space in the volume.

This adds an allocation helper function that finds the biggest free
extent to satisfy an allocation, possibly after trying to get more free
extents from the server.  It looks for previous extents in the index of
extents by length.  This builds on the previously added item and extent
_prev operations.

Allocators need to then know the size of the allocation they got instead
of assuming they got what they asked for.  The server can also return a
smaller extent so it needs to communicate the extent length, not just
its start.

Signed-off-by: Zach Brown <zab@versity.com>
2018-06-29 14:42:06 -07:00
Zach Brown
04660dbfee scoutfs: add scoutfs_extent_prev()
Add an extent function for iterating backwards through extents.  We add
the wrapper and have the extent IO functions call their storage _prev
functions.  Data extent IO can now call the new scoutfs_item_prev().

Signed-off-by: Zach Brown <zab@versity.com>
2018-06-29 14:42:06 -07:00
Zach Brown
9c74f2011d scoutfs: add server work tracing
Add some server workqueue and work tracing to chase down the destruction
of an active workqueue.

Signed-off-by: Zach Brown <zab@versity.com>
2018-06-29 14:42:06 -07:00
Zach Brown
41c29c48dd scoutfs: add extent corruption cases
The extent code was originally written to panic if it hit errors during
cleanup that resulted in inconsistent metadata.  The more reasonable
strategy is to warn about the corruption and leave it to corrective
measures to resolve.  In this case we continue returning the error that
caused us to try to clean up.

Signed-off-by: Zach Brown <zab@versity.com>
2018-06-29 14:42:06 -07:00
Zach Brown
1b3645db8b scoutfs: remove dead server allocator code
Remove the bitmap segno allocator code that the server used to use to
manage allocations.

Signed-off-by: Zach Brown <zab@versity.com>
2018-06-29 14:42:06 -07:00
Zach Brown
c01a715852 scoutfs: use extents in the server allocator
Have the server use the extent core to maintain free extent items in the
allocation btree instead of the bitmap items.

We add a client request to allocate an extent of a given length.  The
existing segment alloc and free now work with a segment's worth of
blocks.

The server maintains counters in the super block of free blocks instead
of free segments.  We maintain an allocation cursor so that allocation
results tend to cycle through the device.  It's stored in the super so
that it is maintained across server instances.

This doesn't remove unused dead code to keep the commit from getting too
noisy.  It'll be removed in a future commit.

Signed-off-by: Zach Brown <zab@versity.com>
2018-06-29 14:42:06 -07:00
Zach Brown
f3007f10ca scoutfs: shut down server on commit errors
We hadn't yet implemented any error handling in the server when commits
fail.

Commit errors are serious and we take them as a sign that something has
gone horribly wrong.  This patch prints commit error warnings to the
console and shuts down.  Clients will try to reconnect and resend their
requests.

The hope is that another server will be able to make progress.  But this
same node could become the server again and it could well be that the
errors are persistent.

The next steps are to implement server startup backoff, client retry
backoff, and hard failure policies.

Signed-off-by: Zach Brown <zab@versity.com>
2018-05-01 11:48:19 -07:00
Zach Brown
24cc5cc296 scoutfs: lock manifest root request
The manifest root request processing samples the stable_manifest_root in
the server info.  The stable_manifest_root is updated after a
commit has succeeded.

The read of stable_manifest_root in request processing was locking the
manifest.  The update during commit doesn't lock the manifest so these
paths were racing.  The race is very tight, a few cpu stores, but it
could in theory give a client a malformed root that could be
misinterpreted as corruption.

Add a seqcount around the store of the stable manifest root during
commit and its load during request processing.  This ensures that
clients always get a consistent manifest root.
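The seqcount pattern can be sketched in userspace C (the kernel's seqcount_t behaves this way, though this stand-in omits the memory barriers a real multi-threaded version needs): the single writer bumps the counter to odd before updating and to even after, and readers retry until they see a stable even counter on both sides of their load:

```c
#include <assert.h>
#include <stdint.h>

struct stable_root {
	unsigned int seq;
	uint64_t blkno;   /* stands in for the manifest root */
};

static void store_stable_root(struct stable_root *r, uint64_t blkno)
{
	r->seq++;         /* odd: update in progress */
	r->blkno = blkno;
	r->seq++;         /* even: update complete */
}

static uint64_t load_stable_root(const struct stable_root *r)
{
	unsigned int start;
	uint64_t blkno;

	do {
		start = r->seq;
		blkno = r->blkno;
	} while ((start & 1) || start != r->seq);

	return blkno;
}
```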

Signed-off-by: Zach Brown <zab@versity.com>
2018-04-27 09:06:35 -07:00
Zach Brown
8061a5cd28 scoutfs: add server bind warning
Emit an error message if the server fails to bind.  It can mean that
an address is badly configured.  But we might want to be able to bind
once the address becomes available, so we don't hard error.  We only
emit the message once for a series of failures.

Signed-off-by: Zach Brown <zab@versity.com>
2018-04-13 15:49:14 -07:00
Zach Brown
9148f24aa2 scoutfs: use single small key struct
Variable length keys lead to having a key struct point to the buffer
that contains the key.  With dirents and xattrs now using small keys we
can convert everyone to using a single key struct and significantly
simplify the system.

We no longer have a separate generic key buf struct that points to
specific per-type key storage.  All items use the key struct and fill
out the appropriate fields.  All the code that paired a generic key buf
struct and a specific key type struct is collapsed down to a key struct.
There's no longer the difference between a key buf that shares a
read-only key, has its own precise allocation, or has a max size
allocation for incrementing and decrementing.

Each key user now has an init function that fills out its fields.  It
looks a lot like the old pattern but we no longer have separate key
storage that the buf points to.

A bunch of code now takes the address of static key storage instead of
managing allocated keys.  Conversely, swapping now uses the full keys
instead of pointers to the keys.

We don't need all the functions that worked on the generic key buf
struct because they had different lengths.  Copy, clone, length init,
memcpy, all of that goes away.

The item API had some functions that tested the length of keys and
values.  The key length tests vanish, and that gets rid of the _same()
call.  The _same_min() call only had one user who didn't also test for
the value length being too large.  Let's leave caller key constraints in
callers instead of trying to hide them on the other side of a bunch of
item calls.

We no longer have to track the number of key bytes when calculating if
an item population will fit in segments.  This removes the key length
from reservations, transactions, and segment writing.

The item cache key querying ioctls no longer have to deal with variable
length keys.  They simply specify the start key, the ioctls return the
number of keys copied instead of bytes, and the caller is responsible
for incrementing the next search key.

The segment no longer has to store the key length.  It stores the key
struct in the item header.

The fancy variable length key formatting and printing can be removed.
We have a single format for the universal key struct.  The SK_ wrappers
that bracketed calls to use preempt safe per cpu buffers can turn back
into their normal calls.

Manifest entries are now a fixed size.  We can simply split them between
btree keys and values and initialize them instead of allocating them.
This means that level 0 entries don't have their own format that sorts
by the seq.  They're sorted by the key like all the other levels.
Compaction needs to sweep all of them looking for the oldest and read
can stop sweeping once it can no longer overlap.  This makes rare
compaction more expensive and common reading less expensive, which is
the right tradeoff.

Signed-off-by: Zach Brown <zab@versity.com>
2018-04-04 09:15:27 -05:00
Zach Brown
c76c6582f0 scoutfs: release server conn under mutex
I was rarely seeing null derefs during unmount.  The per-mount listening
scoutfs_server_func() was seeing null sock->ops as it called
kernel_sock_shutdown() to shutdown the connected client sockets.
sock_release() sets the ops to null.  We're not supposed to use a socket
after we call it.

The per-connection scoutfs_server_recv_func() calls sock_release() as it
tears down its connection.  But it does this before it removes the
connection from the listener's list.  There's a brief window where the
connection's socket has been released but is still visible on the list.
If the listener tries to shutdown during this time it will crash.

Hitting this window depends on scheduling races during unmount.  The
unmount path has the client close its connection to the server then the
server closes all its connected clients.  If the local mount is the
server then it will have recv work see an error as the client
disconnects and it will be racing to shut down the connection with the
listening thread during unmount.

I think I only saw this in my guests because they're running slower
debug kernels on my slower laptop.  The window of vulnerability while
the released socket is on the list is longer.

The fix is to release the socket while we hold the mutex and are
removing the connection from the list.  A released socket is never
visible on the list.
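The ordering fix can be sketched single-threaded, with the "mutex" reduced to a flag so the invariant is assertable without real threads.  The types and names are illustrative stand-ins, not the scoutfs structs:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* The point of the fix: the socket is released while the connection is
 * being unlinked under the lock, so a released socket is never visible
 * to the shutdown path walking the list. */
static bool conn_lock_held;

struct conn {
	struct conn *next;
	bool sock_released;
};

static struct conn *conn_list;

static void conn_teardown(struct conn *c)
{
	conn_lock_held = true;         /* mutex_lock(&conn_mutex) */

	struct conn **p = &conn_list;  /* unlink from the listener's list */
	while (*p && *p != c)
		p = &(*p)->next;
	if (*p)
		*p = c->next;

	c->sock_released = true;       /* sock_release() under the lock */
	assert(conn_lock_held);

	conn_lock_held = false;        /* mutex_unlock(&conn_mutex) */
}
```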

While we're at it, don't use list_for_each_entry_safe() to iterate over
the connection list.  We're not modifying it.  This is a lingering
artifact from previous versions of the server code.

Signed-off-by: Zach Brown <zab@versity.com>
2018-02-22 14:27:01 -08:00
Zach Brown
f52dc28322 scoutfs: simplify lock use of kernel dlm
We had an excessive number of layers between scoutfs and the dlm code in
the kernel.  We had dlmglue, the scoutfs locks, and task refs.  Each
layer had structs that track the lifetime of the layer below it.  We
were about to add another layer to hold on to locks just a bit longer so
that we can avoid down conversion and transaction commit storms under
contention.

This collapses all those layers into simple state machine in lock.c that
manages the mode of dlm locks on behalf of the file system.

The users of the lock interface are mainly unchanged.  We did change
from a heavier trylock to a lighter nonblock lock attempt and have to
change the single rare readpage use.  Lock fields change so a few
external users of those fields change.

This not only removes a lot of code it also contains functional
improvements.  For example, it can now convert directly to CW locks with
a single lock request instead of having to use two by first converting
to NL.

It introduces the concept of an unlock grace period.  Locks won't be
dropped on behalf of other nodes soon after being unlocked so that tasks
have a chance to batch up work before the other node gets a chance.
This can result in two orders of magnitude improvements in the time it
takes to, say, change a set of xattrs on the same file population from
two nodes concurrently.

There are significant changes to trace points, counters, and debug files
that follow the implementation changes.

Signed-off-by: Zach Brown <zab@versity.com>
2018-02-14 15:00:17 -08:00
Zach Brown
4ff1e3020f scoutfs: allocate inode numbers per directory
Having an inode number allocation pool in the super block meant that all
allocations across the mount are interleaved.  This means that
concurrent file creation in different directories will create
overlapping inode numbers.  This leads to lock contention as reasonable
work loads will tend to distribute work by directories.

The easy fix is to have per-directory inode number allocation pools.  We
take the opportunity to clean up the network request so that the caller
gets the allocation instead of having it be fed back in via a weird
callback.

Signed-off-by: Zach Brown <zab@versity.com>
2018-02-09 17:58:19 -08:00
Zach Brown
ec91a4375f scoutfs: unlock the server listen lock
Turns out the server wasn't explicitly unlocking the listen lock!  This
ended up working because we only shut down an active server on unmount
and unmount will tear down the lock space which will drop the still held
listen lock.

That's just dumb.

But it also forced using an awkward lock flag to avoid setting up a task
ref for the lock hold which wouldn't have been torn down otherwise.  By
adding the unlock we restore balance to the force and can get rid of that
flag.

Cool, cool, cool.

Signed-off-by: Zach Brown <zab@versity.com>
2017-12-08 17:00:44 -06:00
Mark Fasheh
8064a161f0 scoutfs: better tracking of recursive lock holders
This replaces the fragile recursive locking logic in dlmglue. In particular
that code fails when we have a pending downconvert and a process comes in
for a level that's compatible with the existing level. The downconvert will
still happen which causes us to now believe we are holding a lock that we
are not! We could go back to checking for holders that raced our downconvert
worker but that had problems of its own (see commit e8f7ef0).

Instead of trying to infer from lock state what we are allowed to do, let's
be explicit. Each lock now has a tree of task refs. If you come in to
acquire a lock, we look for our task in that tree. If it's not there, we
know this is the first time this task wanted that lock, so we can continue.
Otherwise we increment a count on the task ref and return the already
locked lock. Unlock does the opposite - it finds the task ref and decreases
the count. On zero it will proceed with the actual unlock.

The owning task is the only process allowed to manipulate a task ref, so we
only have to lock manipulation of the tree. We make an exception for
global locks which might be unlocked from another process context (in this
case that means the node id lock).

Signed-off-by: Mark Fasheh <mfasheh@versity.com>
2017-12-08 10:25:30 -08:00
Zach Brown
cb879d9f37 scoutfs: add network greeting message
Add a network greeting message that's exchanged between the client and
server on every connection to make sure that we have the correct file
system and format hash.

Signed-off-by: Zach Brown <zab@versity.com>
2017-10-12 13:57:31 -07:00
Zach Brown
1da18d17cf scoutfs: use trylock for global server lock
Shared unmount hasn't worked for a long time because nothing would wake
the server work out of blocking while trying to acquire the lock.  In
the old lock code the wait conditions didn't test ->shutdown.

dlmglue doesn't give us a reasonable way to break a caller out of a
blocked lock.  We could add some code to do it with a global context
that'd have to wake all locks or add a call with a lock resource name,
not a held lock, that'd wake that specific lock.  Neither sound great.

So instead we'll use trylock to get the server lock.  It's guaranteed to
make reasonable forward progress.  The server work is already requeued
with a delay to retry.
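The trylock-and-requeue pattern might look like this in a userspace model (Python sketch with hypothetical names; the kernel version uses delayed workqueue requeueing rather than a timer):

```python
import threading

RETRY_DELAY = 0.01  # stand-in for the delayed work requeue interval

def server_work(lock, shutdown, did_run):
    """One pass of the server work: trylock, then either run or requeue."""
    if shutdown.is_set():
        return                       # unmount can always get the work to exit
    if not lock.acquire(blocking=False):
        # Didn't get the lock; requeue ourselves with a delay instead of
        # blocking inside the lock call where shutdown can't reach us.
        threading.Timer(RETRY_DELAY, server_work,
                        (lock, shutdown, did_run)).start()
        return
    try:
        did_run.set()                # ... run the server ...
    finally:
        lock.release()
```

Because the work never blocks inside the lock call, shutdown only has to stop the pending work from requeueing rather than break a waiter out of the lock manager.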

While we're at it we add a global server lock instead of using the weird
magical inode lock in the fs space.  The server lock doesn't need keys
or to participate in item cache consistency, etc.

With this, unmount works.  All mounts will now generate regular
background trylock requests.

Signed-off-by: Zach Brown <zab@versity.com>
2017-10-09 15:31:29 -07:00
Zach Brown
7854471475 scoutfs: fix server wq destroy warning
We were seeing warnings in destroy_workqueue() which meant that work was
queued on the server workqueue after it was drained and before it was
finally destroyed.

The only work that wasn't properly waited for was the commit work.  It
looks like it'd be idle because the server receive threads all wait for
their request processing work to finish.  But the way the commit work is
batched means that a request can have its commit processed by executing
commit work while leaving the work queued for another run.

Fix this by specifically waiting for the commit work to finish after the
server work has waited for all the recv and compaction work to finish.
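The resulting tear-down ordering can be sketched like so (a simplified Python model using futures in place of workqueue items; the names are illustrative):

```python
from concurrent.futures import ThreadPoolExecutor

def stop_server(pool, recv_tasks, get_pending_commit):
    """Shutdown ordering: wait out the recv/processing tasks first, since
    any of them may queue new commit work, and only then wait for the
    commit work they may have left queued for another run."""
    for task in recv_tasks:
        task.result()                   # all recv and processing work done
    commit = get_pending_commit()
    if commit is not None:
        commit.result()                 # the possibly still-queued commit pass
    pool.shutdown()                     # safe: nothing can queue more work
```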

I wasn't able to reliably trigger the warning in repeated xfstests
runs.  This fix survived many runs as well; let's see if it stops the
destroy_workqueue() warning from triggering in the future.

Signed-off-by: Zach Brown <zab@versity.com>
2017-09-12 15:22:03 -07:00
Zach Brown
51e03dcb7a scoutfs: refactor inode locking function
This is based on Mark Fasheh <mfasheh@versity.com>'s series that
introduced inode refreshing after locking and a trylock for readpage.

Rework the inode locking function so that it's more clearly named and
takes flags and the inode struct.

We have callers that want to lock the logical inode but aren't doing
anything with the vfs inode, so we provide that specific entry point.

Signed-off-by: Zach Brown <zab@versity.com>
2017-08-30 10:37:59 -07:00
Zach Brown
87ab27beb1 scoutfs: add statfs network message
The ->statfs method was still using the super_block in the super_info
that was read during mount.  This will get progressively more out
of date.

We add a network message to ask the server for the current fields that
impact statfs.  This is always racy and the fields are mostly nonsense,
but we try our best.

Signed-off-by: Zach Brown <zab@versity.com>
2017-08-11 10:43:35 -07:00
Zach Brown
c1b2ad9421 scoutfs: separate client and server net processing
The networking code was really suffering by trying to combine the client
and server processing paths into one file.  The code can be a lot
simpler by giving the client and server their own processing paths that
take their different socket lifecycles into account.

The client maintains a single connection.  Blocked senders work on the
socket under a sending mutex.  The recv path runs in work that can be
canceled after first shutting down the socket.

A long running server work function acquires the listener lock, manages
the listening socket, and accepts new sockets.  Each accepted socket has
a single recv work blocked waiting for requests.  That then spawns
concurrent processing work which sends replies under a sending mutex.
All of this is torn down by shutting down sockets and canceling work
which frees its context.
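The per-connection structure might be modeled like this (userspace Python sketch; a list of requests and a thread pool stand in for the socket and the workqueue):

```python
import threading
from concurrent.futures import ThreadPoolExecutor

def serve_connection(requests, replies, pool):
    """Model of one accepted socket: a single recv loop spawns concurrent
    processing work whose replies serialize under a sending mutex."""
    send_mutex = threading.Lock()

    def process(req):
        with send_mutex:                 # concurrent replies don't interleave
            replies.append(("reply", req))

    futures = [pool.submit(process, req) for req in requests]  # recv loop
    for f in futures:
        f.result()                       # tear-down waits out the processing
```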

All this restructuring makes it a lot easier to track what is happening
in mount and unmount between the client and server.  It fixes bugs where
unmount was failing because the monolithic socket shutdown function was
queueing other work while it was being drained.

Signed-off-by: Zach Brown <zab@versity.com>
2017-08-04 10:47:42 -07:00