scoutfs

mirror of https://github.com/versity/scoutfs.git synced 2026-01-08 13:01:23 +00:00

Author	SHA1	Message	Date
Bryant G. Duffy-Ly	3ae0ebd0d8	Fix block-stale-read test case The current test case attempts to create a state to read by calling setattr and getattr in an attempt to force block cache reads. It so happens that this does not always force cache block reads, which in rare cases causes this test case to fail. The new test case removes all the extra bouncing around of mount points and instead directly calls scoutfs df, which walks everyone's allocators to summarize the block counts, and those blocks are guaranteed to exist. Therefore, we do not have to create any sort of state prior to trying to force a read. Signed-off-by: Bryant G. Duffy-Ly <bduffyly@versity.com>	2021-11-19 15:41:54 -06:00
Zach Brown	714b7f2a84	Merge pull request #54 from bgly/bduffyly/abort_conn Fix client/server abort conn on force unmount	2021-11-09 13:29:20 -08:00
Zach Brown	945f8b4828	Merge pull request #58 from bgly/bduffyly/print_data Fix scoutfs print <data_dev> hang	2021-11-09 09:50:14 -08:00
Zach Brown	b5ccefeeb9	Merge pull request #59 from versity/zab/v1_release_notes Add release notes with the 1.0 GA release v1.0	2021-11-08 16:09:42 -08:00
Zach Brown	ea08942824	Add release notes with the 1.0 GA release Let's try maintaining release notes in a file in the repo. There are lots of schemes for associating commits and release notes and this seems like the simplest place to start. Signed-off-by: Zach Brown <zab@versity.com>	2021-11-08 14:42:33 -08:00
Bryant G. Duffy-Ly	95f2a87864	Fix scoutfs print <data_dev> hang If a user tries to print a data device, detect that it is a data device and exit early. Signed-off-by: Bryant G. Duffy-Ly <bduffyly@versity.com>	2021-11-08 16:16:13 -06:00
Bryant G. Duffy-Ly	38ee2defd5	Add a filter for forced unmount error output [85164.299902] scoutfs f.8c19e1.r.facf2e error: server error writing btree blocks: -5 [144308.589596] scoutfs f.c9397a.r.8ae97f error: server error -5 freeing merged btree blocks: looping commit del/upd freeing item [174646.005596] scoutfs f.15f0b3.r.1862df error: server error -5 freeing merged btree blocks: final commit del/upd freeing item [146653.893676] scoutfs f.c7f188.r.34e23c error: server error writing super block: -5 [273218.436675] scoutfs f.dd4157.r.f0da7e error: server failed to bind to 127.0.0.1:42002, err -98 [376832.542823] scoutfs f.049985.r.1a8987 error: error -5 reading quorum block 19 to update event 1 term 3 The above is an example output that will be filtered out Signed-off-by: Bryant G. Duffy-Ly <bduffyly@versity.com>	2021-11-08 07:36:02 -06:00
Bryant G. Duffy-Ly	0fc8ccb122	Fix exiting out of btree_walk early for force_umnt We do not want to short-circuit btree_walk early, it is better to handle the force unmount on the caller side. Therefore, remove this from btree_walk. Signed-off-by: Bryant G. Duffy-Ly <bduffyly@versity.com>	2021-11-05 15:21:09 -05:00
Bryant G. Duffy-Ly	e4a3c2b95d	Break client/server out of waiting network replies If there is a forced unmount we call _net_shutdown from umount_begin in order to tell the server and clients to break out of pending network replies. We then add the call to abort within the shutdown_worker since most of the mucking with send and resend queues is done there. Signed-off-by: Bryant G. Duffy-Ly <bduffyly@versity.com>	2021-11-05 15:21:04 -05:00
Bryant G. Duffy-Ly	cf4e6611d3	Fix inconsistency assertions at commit_log_merge Only BUG_ON for inconsistencies, not for commit errors or failures to delete the original request. Signed-off-by: Bryant G. Duffy-Ly <bduffyly@versity.com>	2021-11-05 15:18:57 -05:00
Bryant G. Duffy-Ly	65429a9cc4	Ensure that writer_init and alloc_init are cleaned In scoutfs_server_worker we do not properly handle the cleanup of _block_writer_init and alloc_init. On error paths, if either of those contexts was initialized, we can call alloc_prepare_commit or writer_forget_all to drop the block references and clear the dirty status of all the blocks in the writer. Signed-off-by: Bryant G. Duffy-Ly <bduffyly@versity.com>	2021-11-05 15:05:42 -05:00
Zach Brown	d764ed7c43	Merge pull request #57 from versity/zab/update_readme Update README.md	2021-11-05 11:34:44 -07:00
Zach Brown	465e5ee769	Update README.md Remove a bunch of old language from the README. We're no longer in the early days of the open release so we can remove all the alpha quality language. And the system has grown sufficiently that the repo README isn't a great place for a small getting started doc. There just isn't room to do the subject justice. If we need such a thing for the project we'll put it as a first order doc in the repo that'd be distributed along with everything else. Signed-off-by: Zach Brown <zab@versity.com>	2021-11-05 11:16:57 -07:00
Bryant G. Duffy-Ly	83a6bbb640	Fix inconsistency in server_log_merge_free_work In order to safely free blocks we need to first dirty the work. This allows for resume later on without a double free. Signed-off-by: Bryant G. Duffy-Ly <bduffyly@versity.com>	2021-11-03 17:09:51 -05:00
Zach Brown	f02d68f567	Merge pull request #55 from versity/zab/v1_format_version Zab/v1 format version	2021-11-03 10:18:50 -07:00
Zach Brown	5d6a510e25	Merge pull request #56 from versity/zab/xattr_shrink_bad_items Fix xattr update out of bounds access	2021-11-02 10:17:06 -07:00
Zach Brown	1b4d291bf7	Fix xattr update out of bounds access As we update xattrs we need to update any existing old items with the contents of the new xattr that uses those items. The loop that updated existing items only took the old xattr size into account and assumed that the new xattr would use those items. If the new xattr size used fewer parts then the attempt to update all the old parts that weren't covered by the new size would go very wrong. The length of the region in the new xattr would be negative so it'd try to use the max part length. Worse, it'd copy these max part length regions outside the input new xattr buffer. Typically this would land in addressable memory and copy garbage into the unused old items before they were later deleted. However, it could access so far outside the input buffer that it could cross a page boundary into inaccessible memory and fault. We saw this in the field while trying to repeatedly incrementally shrink a large xattr. This fixes the loop that updates overlapping items between the new and old xattr to start with the smaller of their two item counts. Now it will only update items that are actually used by both xattrs and will only safely access the new xattr input buffer. Signed-off-by: Zach Brown <zab@versity.com>	2021-11-01 11:33:17 -07:00
Zach Brown	223ee5deef	Declare v1 of the stable persistent format From now on if we make incompatible changes to structures or messages then we update the format version and ensure that the code can deal with all the versions in its supported range. Signed-off-by: Zach Brown <zab@versity.com>	2021-10-28 12:30:47 -07:00
Zach Brown	8f60ac06c5	Clean up our ioctl numbers We had arbitrarily chosen an ioctl code 's' to match scoutfs, but of course that conflicts. This chooses an arbitrary hole in the upstream reservations from ioctl-number.rst. Then we make sure to have our _IO[WR] usage reflect the direction of the final type parameter. For most of our ioctls userspace is writing an argument parameter to perform an operation (that often has side effects). Most of our ioctls should be _IOW because userspace is writing the parameter, not _IOR (though the operation tends to read state). A few ioctls copy output back to userspace in the parameter so they're _IOWR. Signed-off-by: Zach Brown <zab@versity.com>	2021-10-28 12:30:47 -07:00
Zach Brown	932a842ae3	Remove valid_bytes from stat _more ioctls The idea here was that we'd expand the size of the struct and valid_bytes would tell the kernel which fields were present in userspace's struct. That doesn't combine well with the ioctl convention of having the size of the type baked into the ioctl number. We'll remove this to make the world less surprising. If we expand the interface we'd add additional ioctls and types. Signed-off-by: Zach Brown <zab@versity.com>	2021-10-28 12:30:47 -07:00
Zach Brown	618a7a4c47	Remove unused lock server alloc and wri While checking in on some other code I noticed that we have lingering allocator and writer contexts over in the lock server. The lock server used to manage its own client state and recovery. We've since moved that into shared recov functionality in the server. The lock server no longer manipulates its own btrees and doesn't need these unused references to the server's contexts. Signed-off-by: Zach Brown <zab@versity.com>	2021-10-28 12:30:47 -07:00
Zach Brown	9ebf43db99	Spread out key zone and type values Introduce some space between the current key zone and type values so that we have room to insert new keys amongst the current keys if we need to. A spacing of 4 is arbitrarily chosen as small enough to still give us intuitively small numbers while leaving enough room to grow, given how long it's taken to come to the current number of keys. Signed-off-by: Zach Brown <zab@versity.com>	2021-10-28 12:30:47 -07:00
Zach Brown	e38beee85a	Stop using inode index key type as array index The code that updates inode index items on behalf of indexed fields uses an array to track changes in the fields. Those array indexes were the raw key type values. We're about to introduce some sparse space between all the key values so that we have some room to add keys in the future at arbitrary sort positions amongst the previous keys. We don't want the inode index item updating code to keep using raw types as array indices when the type values are no longer small dense values. We introduce indirection from type values to array indices to keep the tracking array in the in-memory inode struct small. Signed-off-by: Zach Brown <zab@versity.com>	2021-10-28 12:30:47 -07:00
Zach Brown	20ac2e35fa	Remove clock_sync field from net message As we freeze the format let's remove this old experiment to try and make it easier to line up traces from different mounts. It never worked particularly well and I think it could be argued that trying to merge trace logs on different machines isn't a particularly meaningful thing to do. You care about how they interact not what they were doing at the same time with their independent resources. Signed-off-by: Zach Brown <zab@versity.com>	2021-10-28 12:30:47 -07:00
Zach Brown	80ee2c6d57	Harden client transaction processing There are a few bad corner cases in the state machine that governs how client transactions are opened, modified, and committed. The worst problem is on the server side. All server request handlers need to cope with resent requests without causing bad side effects. Both get_log_trees and commit_log_trees would try to fully process resent requests. _get_log_trees() looks safe because it works with the log_trees that was stored previously. _commit_log_trees() is not safe because it can rotate out the srch log file referenced by the sent log_trees every time it's processed. This could create extra srch entries which would delete the first instance of entries. Worse still, by injecting the same block structure into the system multiple times it ends up causing multiple frees of the blocks that make up the srch file. The client side problems are slightly different, but related. There aren't strong constraints which guarantee that we'll only send a commit request after a get request succeeds. In crazy circumstances the commit request in the write worker could come before the first get in mount succeeds. Far worse is that we can send multiple commit requests for one transaction if it changes as we get errors during multiple queued write attempts, particularly if we get errors from get_log_trees after having successfully committed. This hardens all these paths to ensure a strict sequence of get_log_trees, transaction modification, and commit_log_trees. On the server we add _trans_seq fields to the log_trees struct so that both get_ and commit_ can see that they've already prepared a commit to send or have already committed the incoming commit, respectively. We can use the get_trans_seq field as the trans_seq of the open transaction and get rid of the entire separate mechanism we used to have for tracking open trans seqs in the clients. We can get the same info by walking the log_trees and looking at their _trans_seq fields. In the client we have the write worker immediately return success if mount hasn't opened the first transaction. Then we don't have the worker return to allow further modification until it has gotten success from get_log_trees. Signed-off-by: Zach Brown <zab@versity.com>	2021-10-28 12:30:47 -07:00
Zach Brown	42c4c6dd24	Move transaction sbi fields to trans_info The transaction code was built a million years ago and put all of its data in our core super block info. This finally moves the rest of the private transaction fields out of the core super block and into the transaction info. This makes it clear that it's private to trans.c and brings it in line with the rest of the subsystems in the tree. Signed-off-by: Zach Brown <zab@versity.com>	2021-10-28 12:30:47 -07:00
Zach Brown	7d71b610af	Add server extent motion tracking Add tracking in the alloc functions that the server uses to move extents between allocator structures on behalf of client mounts. Signed-off-by: Zach Brown <zab@versity.com>	2021-10-28 12:30:47 -07:00
Zach Brown	70ede28e39	Remove unused traced_extent leavings Remove some lingering support helpers for the traced_extent struct that we haven't used in a while. Signed-off-by: Zach Brown <zab@versity.com>	2021-10-28 12:30:47 -07:00
Zach Brown	b477604339	Don't clobber srch compact errors The srch compaction worker will wait a bit before attempting another compaction as it finishes a compaction that failed. Unfortunately, it clobbered the errors it got during compaction with the result of sending the commit to the server with the error flag. If the commit is successful then it thinks there were no errors and immediately re-queues itself to try the next compaction. If the error is persistent, as it was with a bug in how we merged log files with a single page's worth of entries, then we can spin indefinitely getting an error, clobbering the error with the commit result, and immediately queueing our work to do it all over again. This fix preserves existing errors when getting the result of the commit and will correctly back off. If we get persistent merge errors at least they won't consume significant resources. We add a counter for commits that report errors so we can get some visibility if this happens. Signed-off-by: Zach Brown <zab@versity.com>	2021-10-28 12:30:47 -07:00
Zach Brown	75f9aabe75	Allow compacting logs down to a single page The k-way merge function at the core of the srch file entry merging had some bookkeeping math (calculating number of parents) that couldn't handle merging a single incoming entry stream, so it threw a warning and returned an error. When refusing to handle that case, it was assuming that caller was trying to merge down a single log file which doesn't make any sense. But in the case of multiple small unsorted logs we can absolutely end up with their entries stored in one sorted page. We have one sorted input page that's merging multiple log files. The merge function is also the path that writes to the output file so we absolutely need to handle this case. We more carefully calculate the number of parents, clamping it to one parent when we'd otherwise get "(roundup(1) -> 1) - 1 == 0" when calculating the number of parents from the number of inputs. We can relax the warning and error to refuse to merge nothing. The test triggers this case by putting single search entries in the log files for mounts and unmounting them to force rotation of the mount log files into mergeable rotated log files. Signed-off-by: Zach Brown <zab@versity.com>	2021-10-28 12:30:47 -07:00
Zach Brown	cf512c5fcf	Use inode_count field for statfs file counts Our statfs implementation had clients reading the super block and using the next free inode number to guess how many inodes there might be. We are very aggressive with giving directories private pools of inode numbers to allocate from. They're often not used at all, creating huge gaps in allocated inode numbers. The ratio of the average number of allocations per directory to the batch size given to each directory is the factor that the used inode count can be off by. Now that we have a precise count of active inodes we can use that to return accurate counts of inodes in the files fields in the statfs struct. We still don't have static inode allocation so the fields don't make a ton of sense. We fake the total and free count to give a reasonable estimate of the total files that doesn't change while the free count is calculated from the correct count of used inodes. While we're at it we add a request to get the summed fields that the server can cheaply discover in cache rather than having the client always perform read IOs. Signed-off-by: Zach Brown <zab@versity.com>	2021-10-28 12:30:47 -07:00
Zach Brown	a53d6d1a8e	Add scoutfs_alloc_foreach_super which takes super Add an alloc_foreach variant which uses the caller's super to walk the allocators rather than always reading it off the device. Signed-off-by: Zach Brown <zab@versity.com>	2021-10-28 12:30:47 -07:00
Zach Brown	95ed36f9d3	Maintain inode count in super and log trees Add a count of used inodes to the super block and a change in the inode count to the log_trees struct. Client transactions track the change in inode count as they create and delete inodes. The log_trees delta is added to the count in the super as finalized log_trees are deleted. Signed-off-by: Zach Brown <zab@versity.com>	2021-10-28 12:30:47 -07:00
Zach Brown	94e5bc1457	Remove unused scoutfs_last_ino() Signed-off-by: Zach Brown <zab@versity.com>	2021-10-28 12:30:46 -07:00
Zach Brown	366f615c9f	Add support for our format version We had previously started on a relatively simple notion of an interoperability version which wasn't quite right. This fleshes out support for a more functional format version. The super blocks have a single version that defines behaviour of the running system. The code supports a range of versions and we add some initial interfaces for updating the version while the system is offline. All of this together should let us safely change the underlying format over time. Signed-off-by: Zach Brown <zab@versity.com>	2021-10-28 12:30:46 -07:00
Zach Brown	ac2587017e	Add write_nr to quorum blocks Add a write_nr field to the quorum block header which is incremented with every write. Each event also gets a write_nr field that is set to the incremented value from the header. This gives us a history of the order of event updates that isn't sensitive to misconfigured time. Signed-off-by: Zach Brown <zab@versity.com>	2021-10-28 12:30:46 -07:00
Zach Brown	1cdcf41ac7	Move more block read/write functions to util We're adding another command that does block IO so move some block reading and writing functions out of mkfs. We also grow a few function variants and call the write_sync variant from mkfs instead of having it manually sync. Signed-off-by: Zach Brown <zab@versity.com>	2021-10-28 12:30:46 -07:00
Zach Brown	024426df28	Add a file for userspace quorum config helpers Signed-off-by: Zach Brown <zab@versity.com>	2021-10-28 12:30:46 -07:00
Zach Brown	a0690070ae	Don't null terminate our note strings The code that shows the note sections as files uses the section size to define the size of the notes payload. We don't need to null terminate the strings to define their lengths. Doing so puts a null in the notes file which isn't appreciated by many readers. Signed-off-by: Zach Brown <zab@versity.com>	2021-10-28 12:30:46 -07:00
Zach Brown	4e00f95014	run-tests builds our targets with -j The test harness might as well use all cpus when building. It's reasonably safe to assume both that the test systems are otherwise idle and that the build is likely to succeed. Signed-off-by: Zach Brown <zab@versity.com>	2021-10-28 12:30:46 -07:00
Zach Brown	0c95388f3b	Set TCP_USER_TIMEOUT in addition to keepalives TCP keepalive probes only work when the connection is idle. They're not sent when there's unacked send data being retransmitted. If the server fails while we're retransmitting we don't break the connection and try to elect and connect to a new server until the very long default connection timeouts or the server comes back and the stale connection is aborted. We can set TCP_USER_TIMEOUT to break an unresponsive connection when there's written data. It changes the behavior of the keepalive probes so we rework them a bit to clearly apply our timeout consistently between the two mechanisms. Signed-off-by: Zach Brown <zab@versity.com>	2021-10-28 12:30:46 -07:00
Zach Brown	d255dd3b32	Fix SCOUTFs typo in totl name nr define Signed-off-by: Zach Brown <zab@versity.com>	2021-10-28 12:10:45 -07:00
Zach Brown	9b4ac64312	Consistently stop fencing as server stops As the server comes up it needs to fence any previous servers before it assumes exclusive access to the device. If fencing fails it can leave fence requests behind. The error path for these very early failures didn't shut down fencing so we'd have lingering fence requests span the life cycle of server startup and shutdown. The next time the server starts up in this mount it can try to create the fence request again, get an error because a lingering one already exists, and immediately shut down. The result is that fencing errors that hit that initial attempt during server startup can become persistent fencing errors for the lifetime of that mount, preventing it from ever successfully starting the server. Moving the fence stop call so that all exiting error paths hit it consistently cleans up fence requests and avoids this problem. The next server instance will get a chance to process the fence request again. It might well hit the same error, but at least it gets a chance. Signed-off-by: Zach Brown <zab@versity.com>	2021-10-28 12:10:45 -07:00
Zach Brown	22f9ab4dab	Merge pull request #53 from bgly/fix_mkdir_test Fix mkdir-rename-rmdir test script	2021-10-26 11:53:15 -07:00
Bryant Duffy-Ly	501953d69e	Fix mkdir-rename-rmdir test script The current script gets stuck in an infinite loop when the test suite is started with 1 mount point. This is due to the advancement part of the script in which it advances the ops for each mount. The current while loop checks for when the op_mnt wraps by checking if it equals 0. But the problem is we set each of the op_mnts to 0 during the advancement, so when it wraps it still equates to 0, so it is an infinite loop. Therefore, the fix is to check at the end of the loop whether the last op's mount number wrapped, and if so, break out. Signed-off-by: Bryant Duffy-Ly <bduffyly@versity.com>	2021-10-21 11:41:02 -05:00
Bryant Duffy-Ly	66b8c5fbd7	Enhance clarity of some kfree paths In some of the allocation paths there are goto statements that end up calling kfree(). That is fine, but in cases where the pointer is not initially set to NULL we might have undefined behavior. kfree on a NULL pointer does nothing, so these changes should not change behavior, but they clarify the code paths. Signed-off-by: Bryant Duffy-Ly <bduffyly@versity.com>	2021-10-06 18:07:27 -05:00
Zach Brown	3c6c2194bd	Merge pull request #51 from versity/zab/totl_xattr_tag Zab/totl xattr tag	2021-09-13 18:06:28 -07:00
Zach Brown	6ca8c0eec2	Consistently initialize dentry info Unfortunately, we're back in kernels that don't yet have d_op->d_init. We allocate our dentry info manually as we're given dentries. The recent verification work forgot to consistently make sure the info was allocated before using it. Fix that up, and while we're at it be a bit more robust in how we check to see that it's been initialized without grabbing the d_lock. Signed-off-by: Zach Brown <zab@versity.com>	2021-09-13 14:41:07 -07:00
Zach Brown	ea2b01434e	Add support for i_version This adds i_version to our inode and maintains it as we allocate, load, modify, and store inodes. We set the flag in the superblock so in-kernel users can use i_version to see changes in our inodes. Signed-off-by: Zach Brown <zab@versity.com>	2021-09-13 14:41:07 -07:00
Zach Brown	d5eec7d001	Fix uninitialized srch ret that won't happen More recent gcc notices that ret in delete_files can be undefined if nr is 0 while missing that we won't call delete_files in that case. Seems worth fixing, regardless. Signed-off-by: Zach Brown <zab@versity.com>	2021-09-13 14:41:07 -07:00
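The next sketch shows the loop bound fixed in 1b4d291bf7 ("Fix xattr update out of bounds access"). It is a minimal C model, not the scoutfs implementation. It assumes a hypothetical fixed part size, XATTR_PART_MAX, and models the old xattr's items as a plain array. Bounding the loop by the smaller of the two part counts keeps every part offset inside the new value, so the copy never reads past the input buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-item payload size; the real limit isn't given in the commit. */
#define XATTR_PART_MAX 8

static size_t nr_parts(size_t bytes)
{
	return (bytes + XATTR_PART_MAX - 1) / XATTR_PART_MAX;
}

static size_t min_size(size_t a, size_t b)
{
	return a < b ? a : b;
}

/*
 * Copy the new value into the existing old items that both xattrs use.
 * With the old bound (old parts only), new_size - off could underflow and
 * min_size() would pick XATTR_PART_MAX, reading past new_val. Bounding by
 * the smaller part count keeps off < new_size for every iteration.
 * Returns the number of old parts updated.
 */
static size_t update_shared_parts(char parts[][XATTR_PART_MAX], size_t old_size,
				  const char *new_val, size_t new_size)
{
	size_t shared = min_size(nr_parts(old_size), nr_parts(new_size));
	size_t i;

	for (i = 0; i < shared; i++) {
		size_t off = i * XATTR_PART_MAX;
		size_t part_len = min_size(new_size - off, XATTR_PART_MAX);

		memcpy(parts[i], new_val + off, part_len);
	}
	return shared;
}
```

Shrinking a 30-byte value to 5 bytes updates only the first part. The remaining old parts are left untouched for the caller to delete.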
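The following sketch illustrates the _IOW/_IOWR convention described in 8f60ac06c5 ("Clean up our ioctl numbers"). It uses the standard Linux <linux/ioctl.h> macros. The magic number 0xE8 and both argument structs are hypothetical placeholders, not the values scoutfs chose.

```c
#include <assert.h>
#include <stdint.h>
#include <linux/ioctl.h>

/* Hypothetical magic number and argument structs, for illustration only. */
#define EXAMPLE_IOCTL_MAGIC 0xE8

struct example_op_args {	/* userspace writes this; the kernel only reads it */
	uint64_t ino;
	uint64_t flags;
};

struct example_query_args {	/* the kernel also copies results back into it */
	uint64_t ino;
	uint64_t result;
};

/* Userspace passes parameters in: _IOW, even if the operation reads state. */
#define EXAMPLE_IOC_DO_OP	_IOW(EXAMPLE_IOCTL_MAGIC, 1, struct example_op_args)
/* Parameters go in and results come back out through the same struct: _IOWR. */
#define EXAMPLE_IOC_QUERY	_IOWR(EXAMPLE_IOCTL_MAGIC, 2, struct example_query_args)
```

The direction bits are part of the command number, so _IOC_DIR() on each command reports exactly what userspace declared.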
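The next sketch shows why valid_bytes was dropped in 932a842ae3 ("Remove valid_bytes from stat _more ioctls"). The argument size is encoded in the ioctl number, so growing the struct already produces a different command. The magic number, command number, and structs are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <linux/ioctl.h>

/* Hypothetical magic number and structs, for illustration only. */
#define EXAMPLE_IOCTL_MAGIC 0xE8

struct example_stat_v1 {
	uint64_t a;
	uint64_t b;
};

/* The same struct grown by one field. */
struct example_stat_v2 {
	uint64_t a;
	uint64_t b;
	uint64_t c;
};

/* Same magic and number; only the baked-in size differs. */
#define EXAMPLE_IOC_STAT_V1	_IOR(EXAMPLE_IOCTL_MAGIC, 3, struct example_stat_v1)
#define EXAMPLE_IOC_STAT_V2	_IOR(EXAMPLE_IOCTL_MAGIC, 3, struct example_stat_v2)
```

Because the two commands already differ, a size field inside the struct would be redundant. Extending the interface with a new ioctl and type, as the commit proposes, follows the same convention.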
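The following sketch illustrates the indirection introduced in e38beee85a ("Stop using inode index key type as array index"), once 9ebf43db99 spreads key types out by 4. The type names and values are hypothetical. A switch maps sparse on-disk type values to dense indices. The per-inode tracking array therefore stays sized by the number of indexed fields rather than by the largest type value.

```c
#include <assert.h>

/* Hypothetical sparse key type values, spaced by 4; real names/values may differ. */
enum {
	EX_INODE_INDEX_META_SEQ_TYPE = 4,
	EX_INODE_INDEX_DATA_SEQ_TYPE = 8,
};

/* Dense indices used only for the in-memory tracking array. */
enum {
	EX_IDX_META_SEQ = 0,
	EX_IDX_DATA_SEQ,
	EX_IDX_NR,
};

/* Map a sparse type value to a dense array index, or -1 if it isn't indexed. */
static int index_type_to_idx(int type)
{
	switch (type) {
	case EX_INODE_INDEX_META_SEQ_TYPE:
		return EX_IDX_META_SEQ;
	case EX_INODE_INDEX_DATA_SEQ_TYPE:
		return EX_IDX_DATA_SEQ;
	default:
		return -1;
	}
}

/* Hypothetical in-memory inode fragment: sized by EX_IDX_NR, not by type values. */
struct ex_inode_info {
	unsigned long long tracked[EX_IDX_NR];
};
```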
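The next sketch is a reduced model of the resend handling in 80ee2c6d57 ("Harden client transaction processing"). Only the two sequence fields of log_trees are kept. The global sequence source and the side-effect counter are hypothetical, and the real handlers do much more. A resent get returns the transaction that was already prepared. A resent commit returns success without repeating side effects such as srch file rotation.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical, heavily reduced log_trees: only the two sequence fields. */
struct ex_log_trees {
	uint64_t get_trans_seq;		/* seq of the transaction handed to the client */
	uint64_t commit_trans_seq;	/* seq of the last transaction committed */
};

static uint64_t ex_next_seq = 1;	/* hypothetical global sequence source */
static int ex_commit_side_effects;	/* counts work like srch file rotation */

/* get_log_trees: only open a new transaction once the previous one committed. */
static uint64_t ex_get_log_trees(struct ex_log_trees *lt)
{
	if (lt->get_trans_seq <= lt->commit_trans_seq)
		lt->get_trans_seq = ex_next_seq++;
	return lt->get_trans_seq;	/* a resent get sees the same seq */
}

/* commit_log_trees: a resent commit must not repeat its side effects. */
static int ex_commit_log_trees(struct ex_log_trees *lt, uint64_t seq)
{
	if (seq != lt->get_trans_seq)
		return -EINVAL;		/* commit for a transaction we didn't open */
	if (lt->commit_trans_seq == seq)
		return 0;		/* already committed: resent request */

	ex_commit_side_effects++;
	lt->commit_trans_seq = seq;
	return 0;
}
```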
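The following sketch illustrates the error preservation in b477604339 ("Don't clobber srch compact errors"). The function and counter names are hypothetical. The counter here counts commits sent with the error flag, which is our reading of the commit message and an assumption.

```c
#include <assert.h>
#include <errno.h>

static int ex_error_commits;	/* hypothetical: commits sent with the error flag */

/*
 * Combine the compaction result with the result of committing it to the
 * server. The bug was an unconditional "ret = commit_ret", which hid a
 * compaction error whenever the commit itself succeeded.
 */
static int ex_finish_compaction(int compact_ret, int commit_ret)
{
	int ret = compact_ret;

	if (compact_ret < 0)
		ex_error_commits++;

	if (ret == 0)
		ret = commit_ret;

	return ret;	/* nonzero tells the worker to back off before retrying */
}
```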
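The next sketch shows the parent count clamp in 75f9aabe75 ("Allow compacting logs down to a single page"). It assumes a binary tournament tree over the input streams, which is what the roundup formula in the commit message suggests. It uses a userspace stand-in for the kernel's roundup_pow_of_two().

```c
#include <assert.h>
#include <errno.h>

/* Userspace stand-in for the kernel's roundup_pow_of_two(), for n >= 1. */
static unsigned long ex_roundup_pow_of_two(unsigned long n)
{
	unsigned long p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

/*
 * Number of parent nodes in a tournament tree over nr_inputs leaves.
 * One input gives roundup(1) - 1 == 0, so clamp to a single parent.
 * Merging nothing is the only case refused.
 */
static long ex_nr_merge_parents(unsigned long nr_inputs)
{
	unsigned long nr;

	if (nr_inputs == 0)
		return -EINVAL;

	nr = ex_roundup_pow_of_two(nr_inputs) - 1;
	return nr > 0 ? (long)nr : 1;
}
```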
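The following sketch illustrates the inode count bookkeeping in 95ed36f9d3 ("Maintain inode count in super and log trees"), using hypothetical reduced structs. Transactions accumulate a signed delta in their log_trees. That delta is folded into the super's count when a finalized log_trees is deleted.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reduced structs; field names are illustrative. */
struct ex_super {
	uint64_t inode_count;
};

struct ex_log_trees {
	int64_t inode_count_delta;
};

static void ex_create_inode(struct ex_log_trees *lt)
{
	lt->inode_count_delta++;
}

static void ex_delete_inode(struct ex_log_trees *lt)
{
	lt->inode_count_delta--;
}

/* Fold a finalized log_trees' delta into the super as the log_trees is deleted. */
static void ex_retire_log_trees(struct ex_super *super, struct ex_log_trees *lt)
{
	/* unsigned addition wraps mod 2^64, so a negative delta decrements */
	super->inode_count += (uint64_t)lt->inode_count_delta;
	lt->inode_count_delta = 0;
}
```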
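The next sketch shows the version range check implied by 366f615c9f ("Add support for our format version"). The super block carries one version and the code accepts a range. The constants and function names are hypothetical. Since 223ee5deef declares v1 as the first stable format, the range here is simply 1 to 1.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical supported range; scoutfs's real constants aren't shown here. */
#define EX_FORMAT_VERSION_MIN 1
#define EX_FORMAT_VERSION_MAX 1

struct ex_super {
	uint64_t format_version;	/* the single version the system runs at */
};

/* Refuse to run against a version outside the range this code supports. */
static int ex_check_format_version(uint64_t vers)
{
	if (vers < EX_FORMAT_VERSION_MIN || vers > EX_FORMAT_VERSION_MAX)
		return -EINVAL;
	return 0;
}

/* Offline update: only move the super to a version this code also supports. */
static int ex_set_format_version(struct ex_super *super, uint64_t vers)
{
	int ret = ex_check_format_version(vers);

	if (ret == 0)
		super->format_version = vers;
	return ret;
}
```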
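The following sketch illustrates the write_nr ordering added in ac2587017e ("Add write_nr to quorum blocks"), using hypothetical reduced structs. Every write of the block bumps the header counter and stamps the updated event with it. Comparing write_nr values then orders events without trusting wall clocks.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reduced quorum block: a header counter and two events. */
struct ex_quorum_event {
	uint64_t write_nr;
	uint64_t term;
};

struct ex_quorum_block {
	uint64_t write_nr;		/* incremented on every write of the block */
	struct ex_quorum_event events[2];
};

/* Record an event, stamping it with the incremented header counter. */
static void ex_record_event(struct ex_quorum_block *blk, int which, uint64_t term)
{
	blk->write_nr++;
	blk->events[which].write_nr = blk->write_nr;
	blk->events[which].term = term;
}

/* Order events by write_nr rather than by possibly misconfigured timestamps. */
static int ex_event_newer(const struct ex_quorum_event *a,
			  const struct ex_quorum_event *b)
{
	return a->write_nr > b->write_nr;
}
```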
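The next sketch is a userspace version of the idea in 0c95388f3b ("Set TCP_USER_TIMEOUT in addition to keepalives"). It uses the Linux socket options SO_KEEPALIVE, TCP_KEEPIDLE, TCP_KEEPINTVL, TCP_KEEPCNT, and TCP_USER_TIMEOUT. scoutfs sets these from the kernel, and the one-second probe split here is an assumption, not its actual tuning. Per tcp(7), when TCP_USER_TIMEOUT is set alongside keepalives, it overrides them in deciding when to drop the connection. That is why both are derived from one timeout.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Apply one timeout to both mechanisms: keepalives cover an idle
 * connection, TCP_USER_TIMEOUT covers unacked data being retransmitted.
 * The probe timing below is a hypothetical choice.
 */
static int ex_set_conn_timeout(int fd, int timeout_sec)
{
	int on = 1;
	int idle = 1;				/* start probing after 1s idle */
	int intvl = 1;				/* then probe every second */
	int cnt = timeout_sec;			/* give up after timeout_sec probes */
	unsigned int user_ms = (unsigned int)timeout_sec * 1000;

	if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) ||
	    setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof(idle)) ||
	    setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl, sizeof(intvl)) ||
	    setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &cnt, sizeof(cnt)) ||
	    setsockopt(fd, IPPROTO_TCP, TCP_USER_TIMEOUT, &user_ms, sizeof(user_ms)))
		return -1;
	return 0;
}
```

TCP_KEEPCNT is capped by the kernel (127 on Linux), so a much longer timeout would need a longer probe interval instead.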
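The following sketch is a reduced model of the fencing cleanup in 9b4ac64312 ("Consistently stop fencing as server stops"), with hypothetical names. A fence request that survives a failed startup makes the next attempt fail with -EEXIST. Routing every early error through one label that stops fencing lets the next startup try again.

```c
#include <assert.h>
#include <errno.h>

static int ex_fence_pending;	/* 1 while a fence request exists */

static int ex_fence_start(int fence_fails)
{
	if (ex_fence_pending)
		return -EEXIST;		/* lingering request from a prior attempt */
	ex_fence_pending = 1;		/* request created before fencing runs */
	return fence_fails ? -EIO : 0;
}

static void ex_fence_stop(void)
{
	ex_fence_pending = 0;		/* tear down any outstanding requests */
}

static int ex_server_start(int fence_fails)
{
	int ret;

	ret = ex_fence_start(fence_fails);
	if (ret < 0)
		goto out;

	/* ... rest of startup; any later error would also goto out ... */
	return 0;			/* running server stops fencing when it stops */

out:
	ex_fence_stop();		/* every early error path cleans up */
	return ret;
}
```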
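The next sketch is a C translation, with hypothetical names, of the advance step fixed in 501953d69e ("Fix mkdir-rename-rmdir test script"). The script's per-op mount numbers behave like an odometer. With one mount, every digit wraps straight back to 0, so checking for 0 can't detect completion. Stopping when the carry runs off the last op ends the loop for any mount count.

```c
#include <assert.h>

/*
 * Advance each op's mount number like an odometer digit. Returns 1 when
 * the carry runs off the last op, meaning every combination has been run.
 */
static int ex_advance(int *op_mnt, int nr_ops, int nr_mounts)
{
	int i;

	for (i = 0; i < nr_ops; i++) {
		op_mnt[i] = (op_mnt[i] + 1) % nr_mounts;
		if (op_mnt[i] != 0)
			return 0;	/* no carry: next combination is ready */
	}
	return 1;			/* last op's mount number wrapped */
}

/* Count the combinations visited; terminates even when nr_mounts == 1. */
static int ex_count_combinations(int nr_ops, int nr_mounts)
{
	int op_mnt[8] = { 0 };
	int n = 0;

	assert(nr_ops > 0 && nr_ops <= 8 && nr_mounts > 0);

	do {
		n++;			/* the script would run the ops here */
	} while (!ex_advance(op_mnt, nr_ops, nr_mounts));

	return n;
}
```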
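The following userspace sketch illustrates the pattern in 66b8c5fbd7 ("Enhance clarity of some kfree paths"). free() stands in for kfree(), since both ignore NULL, and the names are hypothetical. Initializing each pointer to NULL makes a shared error label safe no matter which allocation failed.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Toy setup path with a shared exit label. It frees everything before
 * returning; real code would keep the allocations on success.
 */
static int ex_setup(size_t len, int fail_second)
{
	char *first = NULL;
	char *second = NULL;	/* must be NULL if we jump before assigning it */
	int ret = -ENOMEM;

	first = malloc(len);
	if (!first)
		goto out;

	if (!fail_second)	/* simulate an allocation failure on request */
		second = malloc(len);
	if (!second)
		goto out;

	ret = 0;
out:
	free(second);		/* free(NULL) is a no-op */
	free(first);
	return ret;
}
```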


1534 Commits