scoutfs

mirror of https://github.com/versity/scoutfs.git synced 2026-01-07 12:35:28 +00:00

Author	SHA1	Message	Date
Zach Brown	07210b5734	Reliably delete orphaned inodes Orphaned items haven't been deleted for quite a while -- the call to the orphan inode scanner has been commented out for ages. The deletion of the orphan item didn't take rid zone locking into account as we moved deletion from being strictly local to being performed by whoever last used the inode. This reworks orphan item management and brings back orphan inode scanning to correctly delete orphaned inodes. We get rid of the rid zone that was always _WRITE locked by each mount. That made it impossible for other mounts to get a _WRITE lock to delete orphan items. Instead we rename it to the orphan zone and have orphan item callers get _WRITE_ONLY locks inside their inode locks. Now all nodes can create and delete orphan items as they have _WRITE locks on the associated inodes. Then we refresh the orphan inode scanning function. It now runs regularly in the background of all mounts. It avoids creating cluster lock contention by finding candidates with unlocked forest hint reads and by testing inode caches locally and via the open map before properly locking and trying to delete the inode's items. Signed-off-by: Zach Brown <zab@versity.com>	2021-07-02 10:52:46 -07:00
Zach Brown	a1d46e1a92	Fix mkfs btree item offset calculation mkfs was miscalculating the offset of the start of the free region in the center of blocks as it populated blocks with items. It was using the length of the free region as its offset in the block. To find the offset of the end of the free region in the block it has to be taken relative to the end of the item array. Signed-off-by: Zach Brown <zab@versity.com>	2021-06-17 09:36:00 -07:00
Zach Brown	3488b4e6e0	Add scoutfs print support for log merge items Add support for printing all the items in the log_merge tree that the server uses to track log merging. Signed-off-by: Zach Brown <zab@versity.com>	2021-06-17 09:36:00 -07:00
Zach Brown	c482204fcf	Clean up btree root printing in superblock Over time the printing of the btree roots embedded in the super block has gotten a little out of hand. Add a helper macro for the printf format and args and re-order them to match their order in the superblock. Signed-off-by: Zach Brown <zab@versity.com>	2021-06-17 09:36:00 -07:00
Zach Brown	9711fef122	Update for core, trans, and item seq use We now have a core seq number in the super that is advanced for multiple users. The client transaction seq comes from the core seq so we remove the trans_seq from the super. The item version is also converted to use a seq that's derived from the core seq. Signed-off-by: Zach Brown <zab@versity.com>	2021-06-17 09:36:00 -07:00
Zach Brown	38a4a56741	Stop writing to other quorum slot blocks The core quorum work loop assumes that it has exclusive access to its slot's quorum block. It uniquely marks blocks it writes and verifies the marks on read to discover if another mount has written to its slot under the assumption that this must be a configuration error that put two mounts in the same slot. But the design of the leader bit in the block violates the invariant that only a slot will write to its block. As the server comes up and fences previous leaders it writes to their block to clear their leader bit. The final hole in the design is that because we're fencing mounts, not slots, each slot can have two mounts in play. An active mount can be using the slot and there can still be a persistent record of a previous mount in the slot that crashed that needs to be fenced. All this comes together to have the server fence an old mount in a slot while a new mount is coming up. The new mount sees the mark change and freaks out and stops participating in quorum. The fix is to rework the quorum blocks so that each slot only writes to its own block. Instead of the server writing to each fenced mount's slot, it writes a fence event to its block once all previous mounts have been fenced. We add a bit of bookkeeping so that the server can discover when all block leader fence operations have completed. Each event gets its own term so we can compare events to discover live servers. We get rid of the write marks and instead have an event that is written as a quorum agent starts up and is then checked on every read to make sure it still matches. Signed-off-by: Zach Brown <zab@versity.com>	2021-05-31 13:10:45 -07:00
Zach Brown	76076011a2	Add scoutfs-fenced man page Signed-off-by: Zach Brown <zab@versity.com>	2021-05-26 14:18:39 -07:00
Zach Brown	bdc0282fa7	Describe fencing in the scoutfs.5 man page Signed-off-by: Zach Brown <zab@versity.com>	2021-05-26 14:18:39 -07:00
Zach Brown	1e460e5cb0	Add scoutfs-fenced and its run scripts to spec Install the scoutfs-fenced daemon and its run scripts in the rpm spec file. Signed-off-by: Zach Brown <zab@versity.com>	2021-05-26 14:18:39 -07:00
Zach Brown	877e30d60f	Add client address to mounted_client item Add the peername of the client's connected socket to its mounted_client item as it mounts. If the client doesn't recover then fencing can use the IP to find the host to fence. Signed-off-by: Zach Brown <zab@versity.com>	2021-05-26 14:18:39 -07:00
Zach Brown	1f1f40f079	Add fence agent that processes fence requests Signed-off-by: Zach Brown <zab@versity.com>	2021-05-26 14:18:28 -07:00
Zach Brown	d0b04e790c	Add data-alloc-zone-blocks argument to mkfs Add an argument to mkfs which sets the data_alloc_zone_blocks volume option. Signed-off-by: Zach Brown <zab@versity.com>	2021-05-21 15:31:02 -07:00
Zach Brown	54644a5074	Add data_alloc_zone_blocks volume option Add the data_alloc_zone_blocks volume option. This changes the behaviour of the server to try and give mounts free data extents which fall in exclusive fixed-size zones. We add the field to the scoutfs_volume_options struct and add it to the set_volopt server handler which enforces constraints on the size of the zones. We then add fields to the log_trees struct which records the size of the zones and sets bits for the zones that contain free extents in the data_avail allocator root. The get_log_trees handler is changed to read all the zone bitmaps from all the items, pass those bitmaps in to _alloc_move to direct data allocations, and finally update the bitmaps in the log_trees items to cover the newly allocated extents. The log_trees data_alloc_zone fields are cleared as the mount's logs are reclaimed to indicate that the mount is no longer writing to the zone. The policy mechanism of finding free extents based on the bitmaps is implemented down in _data_alloc_move(). Signed-off-by: Zach Brown <zab@versity.com>	2021-05-21 15:31:02 -07:00
Zach Brown	9de3ae6dcb	Index free extents by order of length Allocators store free extents in two items, one sorted by their blkno position and the other by their precise length. The length index makes it easy to search for precise extent lengths, but it makes it hard to search for a large extent within a given blkno region. Skipping in the blkno dimension has to be done for every precise length value. We don't need that level of precision. If we index the extents by a coarser order of the length then we have a fixed number of orders in which we have to skip in the blkno dimension when searching within a specific region. This changes the length item to be stored at the log(8) order of the length of the extents. This groups extents into orders that are close to the human-friendly base 10 orders of magnitude. With this change the order field in the key no longer stores the precise extent length. To preserve the length of the extent we need to use another field. The only 64bit field remaining is the first which is a higher comparison priority than the type. So we use the highest comparison priority zone field to differentiate the position and order indexes and can now use all three 64bit fields in the key. Finally, we have to be careful when constructing a key to use _next when searching for a large extent. Previously keys were relying on the magic property that building a key from an extent length of 0 ended up at the key value -0 = 0. That only worked because we never stored zero length extents. We now store zero length orders so we can't use the negative trick anymore. We explicitly treat 0 length extents carefully when building keys and we subtract the order from U64_MAX to store the orders from largest to smallest. Signed-off-by: Zach Brown <zab@versity.com>	2021-05-21 15:25:56 -07:00
Zach Brown	0aa6005c99	Add volume options super, server, and sysfs Introduce global volume options. They're stored in the superblock and can be seen in sysfs files that use network commands to get and set the options on the server. Signed-off-by: Zach Brown <zab@versity.com>	2021-05-19 14:15:06 -07:00
Zach Brown	c6fd807638	Use recov to manage lock recovery Now that we have the recov layer we can have the lock server use it to track lock recovery. The lock server no longer needs its own recovery tracking structures and can instead call recov. We add a call for the server to call to kick lock processing once lock recovery finishes. We can get rid of the persistent lock_client items now that the server is driving recovery from the mounted_client items. Signed-off-by: Zach Brown <zab@versity.com>	2021-04-13 12:10:35 -07:00
Andy Grover	0deb232d3f	Support O_TMPFILE and allow MOVE_BLOCKS into released extents Support O_TMPFILE: Create an unlinked file and put it on the orphan list. If it ever gains a link, take it off the orphan list. Change MOVE_BLOCKS ioctl to allow moving blocks into offline extent ranges. Ioctl callers must set a new flag to enable this operation mode. RH-compat: tmpfile support is actually backported by RH into 3.10 kernel. We need to use some of their kabi-maintaining wrappers to use it: use a struct inode_operations_wrapper instead of base struct inode_operations, set S_IOPS_WRAPPER flag in i_flags. This lets RH's modified vfs_tmpfile() find our tmpfile fn pointer. Add a test that tests both creating tmpfiles as well as moving their contents into a destination file via MOVE_BLOCKS. xfstests common/004 now runs because tmpfile is supported. Signed-off-by: Andy Grover <agrover@versity.com>	2021-04-05 14:23:44 -07:00
Andy Grover	efe5d92458	Reserve space in superblock for IPv6 addresses Define a family field, and add a union for IPv4 and v6 variants, although v6 is not supported yet. Family field is now used to determine presence of address in a quorum slot, instead of checking if addr is zero. Signed-off-by: Andy Grover <agrover@versity.com>	2021-03-12 14:10:42 -08:00
Zach Brown	f18fa0e97a	Update scoutfs print for centralized block_ref Update scoutfs print to use the new block_ref struct instead of the handful of per-block type ref structs that we had accumulated. Signed-off-by: Zach Brown <zab@versity.com>	2021-03-01 09:49:17 -08:00
Zach Brown	9878312b4d	Update man pages for quorum slot changes Update the man pages with descriptions of the new mkfs -Q quorum slot configuration and quorum_slot_nr mount option. Signed-off-by: Zach Brown <zab@versity.com>	2021-02-22 13:28:38 -08:00
Zach Brown	57f34e90e9	Use mounted_client item as sign of farewell As clients unmount they send a farewell request that cleans up persistent state associated with the mount. The client needs to be sure that it gets processed, and we must maintain a majority of quorum members mounted to be able to elect a server to process farewell requests. We had a mechanism using the unmount_barrier fields in the greeting and super_block to let the final unmounting quorum majority know that their farewells have been processed and that they didn't need to keep trying to reconnect. But we missed that we also need this out of band farewell handling signal for non-quorum member clients as well. The server can send farewells to a non-member client as well as the final majority and then tear down all the connections before the non-quorum client can see its farewell response. It also needs to be able to know that its farewell has been processed before the server let the final majority unmount. We can remove the custom unmount_barrier method and instead have all unmounting clients check for their mounted_client item in the server's btree. This item is removed as the last step of farewell processing so if the client sees that it has been removed it knows that it doesn't need to resend the farewell and can finish unmounting. This fixes a bug where a non-quorum unmount could hang if it raced with the final majority unmounting. I was able to trigger this hang in our tests with 5 mounts and 3 quorum members. Signed-off-by: Zach Brown <zab@versity.com>	2021-02-22 13:28:38 -08:00
Zach Brown	79f6878355	Clean up block writing in mkfs scoutfs mkfs had two block writing functions: write_block to fill out some block header fields including crc calculation, and then write_block_raw to pwrite the raw buffer to the bytes in the device. These were used inconsistently as blocks came and went over time. Most callers filled out all the header fields themselves and called the raw writer. write_block was only used for super writing, which made sense because it clobbered the block's header with the super header so the caller's set header magic and seq fields would be lost. This cleans up the mess. We only have one block writer and the caller provides all the hdr fields. Everything uses it instead of filling out the fields themselves and calling the raw writer. Signed-off-by: Zach Brown <zab@versity.com>	2021-02-22 13:28:38 -08:00
Zach Brown	87fcad5428	Update scoutfs mkfs and print for quorum slots Signed-off-by: Zach Brown <zab@versity.com>	2021-02-22 13:28:38 -08:00
Zach Brown	406d157891	Add stringify macro to utils Add macros for stringifying either the name of a macro or its value. In keeping with making our utils/ sort of look like kernel code, we use the kernel stringify names. Signed-off-by: Zach Brown <zab@versity.com>	2021-02-18 12:57:30 -08:00
Andy Grover	15fd2ccc02	utils: Do not assert if release is given unaligned offset or length This is checked for by the kernel ioctl code, so giving unaligned values will return an error, instead of aborting with an assert. Signed-off-by: Andy Grover <agrover@versity.com>	2021-01-29 09:30:57 -08:00
Andy Grover	d731c1577e	Filesystem version instead of format hash check Instead of hashing headers, define an interop version. Do not mount superblocks that have a different version, either higher or lower. Since this is pretty much the same as the format hash except it's a constant, minimal code changes are needed. Initial dev version is 0, with the intent that version will be bumped to 1 immediately prior to tagging initial release version. Update README. Fix comments. Add interop version to notes and modinfo. Signed-off-by: Andy Grover <agrover@versity.com>	2021-01-15 10:53:00 -08:00
Zach Brown	eb3981c103	Add move-blocks scoutfs cli command Add a move-blocks command that translates arguments and calls the MOVE_BLOCKS ioctl. Signed-off-by: Zach Brown <zab@versity.com>	2021-01-14 13:42:22 -08:00
Andy Grover	299062a456	Fix mkfs check for existing ScoutFS superblock We were checking for the wrong magic value. We now need to use -f when running mkfs in run-tests for things to work. Signed-off-by: Andy Grover <agrover@versity.com>	2021-01-13 16:32:41 -08:00
Andy Grover	2c5871c253	Change release ioctl to be denominated in bytes not blocks This more closely matches stage ioctl and other conventions. Also change release code to use offset/length nomenclature for consistency. Signed-off-by: Andy Grover <agrover@versity.com>	2021-01-12 16:29:42 -08:00
Andy Grover	d48b447e75	Do not set -Wpadded except for checking kmod-shared headers Remove now-unneeded manual padding in arg structs. Signed-off-by: Andy Grover <agrover@versity.com>	2021-01-12 16:29:42 -08:00
Andy Grover	5241bba7f6	Update scoutfs.8 man page Update for cli args and options changes. Reorder subcommands to match scoutfs built-in help. Consistent ScoutFS capitalization. Tighten up some descriptions and verbiage for consistency and omit descriptions of internals in a few spots. Add SEE ALSO for blockdev(8) and wipefs(8). Signed-off-by: Andy Grover <agrover@versity.com>	2021-01-12 16:29:42 -08:00
Andy Grover	e0a2175c2e	Use argp info instead of duplicating for cmd_register() Make it static and then use it both for argp_parse as well as cmd_register_argp. Split commands into five groups, to help understanding of their usefulness. Mention that each command has its own help text, and that we are being fancy to keep the user from having to give fs path. Signed-off-by: Andy Grover <agrover@versity.com>	2021-01-12 16:29:42 -08:00
Andy Grover	f2cd1003f6	Implement argp support for walk-inodes This has some fancy parsing going on, and I decided to just leave it in the main function instead of going to the effort to move it all to the parsing function. Signed-off-by: Andy Grover <agrover@versity.com>	2021-01-12 16:29:42 -08:00
Andy Grover	97c6cc559e	Implement argp support for data-waiting and data-wait-err These both have a lot of required options. Signed-off-by: Andy Grover <agrover@versity.com>	2021-01-12 16:29:42 -08:00
Andy Grover	7c54c86c38	Implement argp support for setattr Signed-off-by: Andy Grover <agrover@versity.com>	2021-01-12 16:29:42 -08:00
Andy Grover	e1ba508301	Implement argp support for counters Signed-off-by: Andy Grover <agrover@versity.com>	2021-01-12 16:29:42 -08:00
Andy Grover	f35154eb19	counters: Ensure name_wid[0] is initialized to zero I was seeing some segfaults and other weirdness without this. Signed-off-by: Andy Grover <agrover@versity.com>	2021-01-12 16:29:42 -08:00
Andy Grover	7befc61482	Implement argp support for mkfs and add --force Support max-meta-size and max-data-size using KMGTP units with rounding. Detect other fs signatures using blkid library. Detect ScoutFS super using magic value. Move read_block() from print.c into util.c since blkid also needs it. Signed-off-by: Andy Grover <agrover@versity.com>	2021-01-12 16:29:42 -08:00
Andy Grover	6b5ddf2b3a	Implement argp support for print Print warning if printing a data dev, you probably wanted the meta dev. Change read_block to return err value. Otherwise there are confusing ENOMEM messages when pread() fails. e.g. try to print /dev/null. Signed-off-by: Andy Grover <agrover@versity.com>	2021-01-12 10:47:47 -08:00
Andy Grover	d025122fdd	Implement argp support for listxattr-hidden Rename to list-hidden-xattrs. Signed-off-by: Andy Grover <agrover@versity.com>	2021-01-12 10:47:47 -08:00
Andy Grover	706fe9a30e	Implement argp support for search-xattrs Get fs path via normal methods, and make xattr an argument not an option. Signed-off-by: Andy Grover <agrover@versity.com>	2021-01-12 10:47:47 -08:00
Andy Grover	0f17ecb9e3	Implement argp support for stage/release Make offset and length optional. Allow size units (KMGTP) to be used for offset/length. release: Since off/len no longer given in 4k blocks, round offset and length to 4KiB, down and up respectively. Emit a message if rounding occurs. Make version a required option. stage: change ordering to src (the archive file) then the dest (the staged file). Signed-off-by: Andy Grover <agrover@versity.com>	2021-01-12 10:47:47 -08:00
Andy Grover	10df01eb7a	Implement argp support for ino-path Signed-off-by: Andy Grover <agrover@versity.com>	2021-01-04 11:49:31 -08:00
Andy Grover	68b8e4098d	Implement argp support for stat and statfs Signed-off-by: Andy Grover <agrover@versity.com>	2021-01-04 11:49:31 -08:00
Andy Grover	5701184324	Implement argp support for df Convert arg parsing to use argp. Use new get_path() helper fn. Add -h human-readable option. Signed-off-by: Andy Grover <agrover@versity.com>	2021-01-04 11:49:31 -08:00
Andy Grover	a3035582d3	Add strdup_or_error() Add a helper function to handle the impossible event that strdup fails. Signed-off-by: Andy Grover <agrover@versity.com>	2021-01-04 11:49:31 -08:00
Andy Grover	9e47a32257	Add get_path() Implement a fallback mechanism for opening paths to a filesystem. If explicitly given, use that. If env var is set, use that. Otherwise, use current working directory. Use wordexp to expand ~, $HOME, etc. Signed-off-by: Andy Grover <agrover@versity.com>	2021-01-04 11:49:31 -08:00
Zach Brown	e386b900ee	Remove README.md from utils This was just boilerplate for the utils repo. Signed-off-by: Zach Brown <zab@versity.com>	2020-12-07 10:39:20 -08:00
Zach Brown	86cf3ec4ab	Remove format.h and ioctl.h from utils Now that we're in one repo utils can get its format and ioctl headers from the authoritative kmod files. When we're building a dist tarball we copy the files over so that the build from the dist tarball can use them. Signed-off-by: Zach Brown <zab@versity.com>	2020-12-07 10:39:20 -08:00
Andy Grover	9a647a98f1	scoutfs-utils: Header changes to match kmod PR 41 Signed-off-by: Andy Grover <agrover@versity.com>	2020-11-30 13:35:39 -08:00

334 Commits
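
Commit 9de3ae6dcb above describes indexing free extents by the log(8) order of their length and storing that order subtracted from U64_MAX so that larger orders sort first. The following is a minimal sketch of that arithmetic only, not the code from the repository. The names extent_order() and order_key_field() are hypothetical, and mapping a zero length to order 0 is an assumption, since the commit only says zero lengths are handled carefully.

```c
#include <assert.h>
#include <stdint.h>

/* A 64-bit length has at most 22 base-8 orders: 0 through 21. */
#define MAX_ORDER 21

/*
 * Hypothetical helper: floor(log8(len)) for len > 0.
 * Assumption: a zero length is placed in order 0.
 */
static unsigned int extent_order(uint64_t len)
{
	unsigned int order = 0;

	/* each base-8 order is three bits wide */
	while (len >= 8) {
		len >>= 3;
		order++;
	}

	/* the "fixed number of orders" the commit message relies on */
	assert(order <= MAX_ORDER);
	return order;
}

/*
 * Hypothetical helper: the value stored in the key so that an
 * ascending key comparison visits the largest orders first.
 */
static uint64_t order_key_field(uint64_t len)
{
	return UINT64_MAX - extent_order(len);
}
```

With this layout, lengths 8 through 63 share order 1 and lengths 64 through 511 share order 2, so a search within a blkno region skips across at most 22 order groups rather than across every distinct length.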
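
Commits 7befc61482 and 0f17ecb9e3 above mention accepting KMGTP size units on the command line. As an illustration of what such parsing can look like, here is a small sketch. It is not taken from the scoutfs utilities: parse_size() is a hypothetical name, and treating the units as binary multiples (K = 1024) and accepting only upper-case single-letter suffixes are both assumptions.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse "123", "4K", "2G", ... into a byte count.
 * Assumption: units are binary multiples, K = 2^10 up to P = 2^50.
 * Returns 0 on success or a negative errno value.
 */
static int parse_size(const char *str, uint64_t *out)
{
	static const char units[] = "KMGTP";
	unsigned long long val;
	unsigned int shift = 0;
	char *end;
	int i;

	assert(str != NULL && out != NULL);

	/* require a leading digit: rejects "", signs, and whitespace */
	if (*str < '0' || *str > '9')
		return -EINVAL;

	errno = 0;
	val = strtoull(str, &end, 10);
	if (errno)
		return -ERANGE;

	if (*end != '\0') {
		for (i = 0; units[i]; i++) {
			if (*end == units[i])
				break;
		}
		/* unknown suffix, or trailing characters after it */
		if (!units[i] || end[1] != '\0')
			return -EINVAL;
		shift = 10 * (i + 1);
	}

	/* refuse values that would overflow 64 bits once scaled */
	if (val > (UINT64_MAX >> shift))
		return -ERANGE;

	*out = (uint64_t)val << shift;
	return 0;
}
```

In this sketch "4K" yields 4096 and "1G" yields 1073741824, while "4KB" and "4X" are rejected with -EINVAL.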
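
Commit 0f17ecb9e3 above says the release command rounds the offset down and the length up to 4KiB and emits a message when rounding occurs. The sketch below follows that wording literally, rounding each value independently. Whether the real utility instead rounds the end of the range is not stated in the log, so treat that as an open assumption. The name align_release_range() and the message text are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define RELEASE_ALIGN 4096ULL	/* 4KiB, per the commit message */

/*
 * Hypothetical helper: round *offset down and *length up to 4KiB.
 * Returns 1 if either value changed, 0 if both were already aligned.
 */
static int align_release_range(uint64_t *offset, uint64_t *length)
{
	uint64_t off, len;

	assert(offset != NULL && length != NULL);
	/* rounding up must not wrap past UINT64_MAX */
	assert(*length <= UINT64_MAX - (RELEASE_ALIGN - 1));

	off = *offset & ~(RELEASE_ALIGN - 1);
	len = (*length + RELEASE_ALIGN - 1) & ~(RELEASE_ALIGN - 1);

	if (off == *offset && len == *length)
		return 0;

	/* tell the user that the request was adjusted */
	fprintf(stderr, "rounded offset %llu to %llu, length %llu to %llu\n",
		(unsigned long long)*offset, (unsigned long long)off,
		(unsigned long long)*length, (unsigned long long)len);

	*offset = off;
	*length = len;
	return 1;
}
```

For example, an offset of 5000 with a length of 100 becomes offset 4096 with length 4096, while an already aligned request is passed through unchanged and produces no message.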