scoutfs

mirror of https://github.com/versity/scoutfs.git synced 2026-07-08 17:16:48 +00:00

Author	SHA1	Message	Date
Auke Kok	440c3dc769	Add punch-offline scoutfs subcommand. A minimal punch_offline ioctl wrapper. Argument style is adopted from stage/release. Following the syntax for the option of stage/release, this calls the punch offline ioctl, punching any offline extent within the designated range from offset with length. Signed-off-by: Auke Kok <auke.kok@versity.com>	2026-03-13 15:45:52 -07:00
Zach Brown	f0c7996612	Limit client locks with option instead of shrinker The use of the VM shrinker was a bad fit for locks. Shrinking a lock requires a round trip with the server to request a null mode. The VM treats the locks like a cache, as expected, which leads to huge numbers of locks accumulating and then being shrunk in bulk. This creates a huge backlog of locks making their way through the network conversation with the server that implements invalidating to a null mode and freeing. It starves other network and lock processing, possibly for minutes. This removes the VM shrinker and instead introduces an option that sets a limit on the number of idle locks. As the number of locks exceeds the count we only try to free an oldest lock at each lock call. This results in a lock freeing pace that is proportional to the allocation of new locks by callers and so is throttled by the work done while callers hold locks. It avoids the bulk shrinking of 10s of thousands of locks that we see in the field. Signed-off-by: Zach Brown <zab@versity.com>	2026-01-08 10:58:50 -08:00
Auke Kok	2884a92408	Avoid using bash special device nodes. Bash has special handling for these standard IO files, but there are cases where customers have special restrictions set on them. Likely to avoid leaking error data out of system logs as part of IDS software. In any case, we can just reopen existing file descriptors here in both these cases to avoid this entirely. This will always work. Signed-off-by: Auke Kok <auke.kok@versity.com>	2025-12-04 13:24:48 -05:00
Zach Brown	7ef62894bd	Add ino_alloc_per_lock option Add an option that can limit the number of inode numbers that are allocated per lock group. Signed-off-by: Zach Brown <zab@versity.com>	2025-11-13 17:19:04 -08:00
Zach Brown	8ddf9b8c8c	Handle disappearing fencing requests and targets The userspace fencing process wasn't careful about handling underlying directories that disappear while it was working. On the server/fenced side, fencing requests can linger after they've been resolved by writing 1 to fenced or error. The script could come back around to see the directory before the server finally removes it, causing all later uses of the request dir to fail. We saw this in the logs as a bunch of cat errors for the various request files. On the local fence script side, all the mounts can be in the process of being unmounted so both the /sys/fs dirs and the mount itself can be removed while we're working. For both, when we're working with the /sys/fs files we read them without logging errors and then test that the dir still exists before using what we read. When fencing a mount, we stop if findmnt doesn't find the mount and then raise a umount error if the /sys/fs dir exists after umount fails. And while we're at it, we have each script's logging append instead of truncating (if, say, it's a log file instead of an interactive tty). Signed-off-by: Zach Brown <zab@versity.com>	2025-11-13 12:43:31 -08:00
Auke Kok	f67462750b	Add tcp_keepalive_timeout_ms option, change default to 60s The default TCP keepalive value is currently 10s, resulting in clients being disconnected after 10 seconds of not replying to a TCP keepalive packet. These keepalive values are reasonable most of the time, but we've seen client disconnects where this timeout has been exceeded, resulting in fencing. The cause for this is unknown at this time, but it is suspected that network intermissions are happening. This change adds a configurable value for this specific client socket timeout. It enforces that its value is above UNRESPONSIVE_PROBES, whose value remains unchanged. The default value of 10000ms (10s) is changed to 60s. This is the value we're assuming is much better suited for customers and has been briefly trialed, showing that it may help to avoid network level interruptions better. Signed-off-by: Auke Kok <auke.kok@versity.com>	2025-10-28 18:45:43 -04:00
Auke Kok	70bd936213	Ignore sparse error about stat.h on el8. On el8, sparse is at 0.6.4 in epel-release, but it fails with: ``` [SP src/util.c] src/util.c: note: in included file (through /usr/include/sys/stat.h): /usr/include/bits/statx.h:30:6: error: not a function <noident> /usr/include/bits/statx.h:30:6: error: bad constant expression type ``` This is due to us needing O_DIRECT from <fcntl.h>, so we set _GNU_SOURCE before including it, but this causes (through _USE_GNU in sys/stat.h) statx.h to be included, and that has __has_include, and sparse is too dumb to understand it. Just shut it up. Signed-off-by: Auke Kok <auke.kok@versity.com>	2025-10-06 12:27:25 -05:00
Zach Brown	d0cf026298	Require sparse, and filter kernel sparse output Fail the build if we don't check with sparse in both the kernel and userspace utils. Add a filtering wrapper to the kernel build so that we have a place to filter out uninteresting errors from kernel sources that we're building against. Signed-off-by: Zach Brown <zab@versity.com>	2025-10-03 09:35:36 -07:00
Zach Brown	c6dab3c306	Remove wordexp expansion of utils path argument scoutfs cli commands were using a helper that tried to perform word expansion on the path argument. This was done with the intent of providing the convenience of shell expansion (env vars, ~) within the cli command argument. But it breaks paths that accidentally have their file names match the syntax that wordexp supports. "[ ]" tripped up files in the wild. We don't need to provide shell expansion functionality in our argument parsing. The shell can do that. The cli must pass the arguments straight through, no parsing at all. Signed-off-by: Zach Brown <zab@versity.com>	2025-02-18 11:55:37 -08:00
Zach Brown	295f751aed	Add test_bit to utils bitmap Add test_bit() to the trivial utils bitmap.c implementation. Signed-off-by: Zach Brown <zab@versity.com>	2025-01-22 09:58:58 -08:00
Zach Brown	7f6032d9b4	Add lk rbtree wrapper Import the kernel's rbtree implementation with a wrapper so we can use it from userspace. Signed-off-by: Zach Brown <zab@versity.com>	2025-01-22 09:58:49 -08:00
Zach Brown	7e3a6537ec	Add userspace version of our dirent name hash Signed-off-by: Zach Brown <zab@versity.com>	2025-01-22 09:58:41 -08:00
Zach Brown	49b7b70438	Add userspace version of our mode to type Signed-off-by: Zach Brown <zab@versity.com>	2025-01-22 09:58:31 -08:00
Zach Brown	de0fdd1f9f	Promote userspace btree block initialization Signed-off-by: Zach Brown <zab@versity.com>	2025-01-22 09:58:23 -08:00
Zach Brown	a6d7de3c00	Add fls64() alias for userspace flsll() Signed-off-by: Zach Brown <zab@versity.com>	2025-01-22 09:58:16 -08:00
Zach Brown	2c2c127c5e	Add put_unaligned_leXX() for userspace Signed-off-by: Zach Brown <zab@versity.com>	2025-01-22 09:58:10 -08:00
Zach Brown	9491c784e7	Add srch_encode_entry() for userspace utils Signed-off-by: Zach Brown <zab@versity.com>	2025-01-22 09:57:56 -08:00
Zach Brown	c3b30930fa	Add bloom filter index calc for userspace utils Signed-off-by: Zach Brown <zab@versity.com>	2025-01-22 09:57:46 -08:00
Zach Brown	e7e46a80e6	Add userspace NSEC_PER_SEC Signed-off-by: Zach Brown <zab@versity.com>	2025-01-22 09:57:39 -08:00
Zach Brown	1ddf752f42	Import a few more functions to our list.h Import a few more functions from the kernel's list.h into our imported copy. Signed-off-by: Zach Brown <zab@versity.com>	2025-01-22 09:57:29 -08:00
Zach Brown	14b65c6360	Fix printing alloc list block extents The list alloc blocks have an array of blknos that are offset by a start field in the block header. The print code wasn't using that and was always referencing the beginning of the array, which could miss blocks. Signed-off-by: Zach Brown <zab@versity.com>	2025-01-22 09:57:21 -08:00
Auke Kok	8a4b0967cb	Add fiemap output through scoutfs util. There's filefrag already, and that works, but its output is very inconsistent between various OS release versions, and it has already meant that we'd needed to adjust tests to account for these little but insignificant changes. A lot more work than useful. It's even more changed in el9. This adds `scoutfs get-fiemap FILE` and prints out block extent info with flags that we care about as an abbreviated letter: U for Unwritten, L for Last, and O for Unknown (as in, "offline"). The -P/--physical and -L/--logical options turn off logical or physical offset display, in case you only want to see the offsets in either units. You can pass -b/--byte to display offsets and lengths in byte values. The block size will then be obtained from fstat() of the queried file (4096 for scoutfs). I've removed all uses of filefrag from our scoutfs tests. Xfstests still calls it but their internal diff takes care of that issue. Where needed and appropriate, the tests are adjusted so that the output of `scoutfs get-fiemap` is as close as it can be to what it used to be, so that reading the test results allows the quick view of what might have been going wrong. There are some output strings I have not bothered to update because there's no real value to updating every output string to match, and we just adjust the golden file accordingly. Signed-off-by: Auke Kok <auke.kok@versity.com>	2024-10-03 15:38:34 -07:00
Auke Kok	ac00f5cedb	Free after getline(), even if fail, and catch eof() on el9 getline() allocates the space for the return value even if there is an error, so when it returns an error, we still have to free() it. In el9, when reading stdin we will get errno=0 returned (no error) when we hit the end of stdin. This behavior is different from el7/8. We don't want to throw an error here to avoid failing the test, since it doesn't. Signed-off-by: Auke Kok <auke.kok@versity.com>	2024-10-03 12:41:05 -07:00
Auke Kok	00ebe92186	Add stddef.h to util.h to avoid duplicate offsetof() def. In el9 releases, our includes declare offsetof() before our header chain includes stddef.h, which doesn't properly check if offsetof is already defined, leading to a redefinition. Just include stddef at all times here. Signed-off-by: Auke Kok <auke.kok@versity.com>	2024-10-03 12:41:05 -07:00
Auke Kok	570c05898c	Correct endian conversion length (blkno is le64) Trivial correction of wrong bitlength conversion. Signed-off-by: Auke Kok <auke.kok@versity.com>	2024-10-03 12:41:05 -07:00
Auke Kok	3b8d2eab8e	Sparse fix for epel 0.6.4 sparse - redefines We should rely on sparse from epel to do automated sparse checking and not a git tag. But the 0.6.4 build currently fails on sparse/gcc redefines. This magic Awk script from Zach processes sparse and gcc internal defines and leaves the one intact that sparse doesn't have. Signed-off-by: Auke Kok <auke.kok@versity.com>	2024-09-27 15:37:47 -04:00
Auke Kok	267c1cc2d5	Check meta flags bit set/unset for devices. This extra check assures the passed meta device and data device are indeed what they should be, and protects against unwanted swapping or repeated duplicate device arguments. Signed-off-by: Auke Kok <auke.kok@versity.com>	2024-07-12 15:22:45 -04:00
Zach Brown	1bc83e9e2d	Add indx xattr tag support to utils Signed-off-by: Zach Brown <zab@versity.com>	2024-06-28 15:09:05 -07:00
Zach Brown	e0bb6ca481	Add quota support to utils Add scoutfs cli commands for managing quotas and add its persistent structures to the print command. Signed-off-by: Zach Brown <zab@versity.com>	2024-06-28 15:09:05 -07:00
Zach Brown	4a8240748e	Add project ID support Add support for project IDs. They're managed through the _attr_x interfaces and are inherited from the parent directory during creation. Signed-off-by: Zach Brown <zab@versity.com>	2024-06-28 15:09:05 -07:00
Zach Brown	fb5331a1d9	Add inode retention bit Add a bit to the private scoutfs inode flags which indicates that the inode is in retention mode. The bit is visible through the _attr_x interface. It can only be set on regular files and when set it prevents modification to all but non-user xattrs. It can be cleared by root. Signed-off-by: Zach Brown <zab@versity.com>	2024-06-28 15:09:05 -07:00
Zach Brown	de304628ea	Add attr_x commands and documentation to utils Signed-off-by: Zach Brown <zab@versity.com>	2024-06-28 14:53:49 -07:00
Bryant G. Duffy-Ly	9ba4271c26	Add new max format version of 2 We're about to add new format structures so increment the max version to 2. Future commits will add the features before we release version 2 in the wild. Signed-off-by: Zach Brown <zab@zabbo.net>	2024-06-28 14:53:49 -07:00
Bryant G. Duffy-Ly	90cfaf17d1	Initial support for different inode sizes We're about to increase the inode size and increment the format version. Inode reading and writing has to handle different valid inode sizes as allowed by the format version. This is the initial skeletal work that later patches which really increase the inode size will further refine to add the specific known sizes and format versions. Signed-off-by: Bryant G. Duffy-Ly <bduffyly@versity.com> [zab@versity.com: reworded description, reworked to use _within] Signed-off-by: Zach Brown <zab@versity.com>	2024-06-28 14:53:49 -07:00
Zach Brown	d6642da44d	Prevent downgrade of format version Don't let change-format-version decrease the format version. It doesn't have the machinery to go back and migrate newer structures to older structures that would be compatible with code expecting the older version. Signed-off-by: Bryant G. Duffy-Ly <bduffyly@versity.com> [zab@versity.com: split from initial patch with other changes] Signed-off-by: Zach Brown <zab@versity.com>	2024-06-25 15:11:20 -07:00
Zach Brown	6db69b7a4f	Set root inode crtime in mkfs When we added the crtime creation timestamp to the inode we forgot to update mkfs to set the crtime of the root inode. Signed-off-by: Zach Brown <zab@versity.com>	2024-06-25 15:11:18 -07:00
Zach Brown	90a4c82363	Make log merge wait timeout tunable Add a mount option for the amount of time that log merge creation can wait before giving up. We add some counters so we can see how often the timeout is being hit and what the average successful wait time is. Signed-off-by: Zach Brown <zab@versity.com>	2024-01-25 11:25:56 -08:00
Ben McClelland	d2c2fece2a	Add rpm spec file support for el8 builds The rpmbuild support files no longer define the previously used kernel module macros. This carves out the differences between el7 and el8 with conditionals based on the distro we are building for. Signed-off-by: Ben McClelland <ben.mcclelland@versity.com>	2023-10-09 15:35:40 -04:00
Zach Brown	2279e9657f	Add get_referring_entries scoutfs command Add a cli command for the get_referring_entries ioctl. Signed-off-by: Zach Brown <zab@versity.com>	2023-06-14 14:12:10 -07:00
Zach Brown	912906f050	Make quorum heartbeat timeout tunable Add mount and sysfs options for changing the quorum heartbeat timeout. This allows setting a longer delay in taking over for failed hosts that has a greater chance of surviving temporary non-fatal delays. We also double the existing default timeout to 10s which is still reasonably responsive. Signed-off-by: Zach Brown <zab@versity.com>	2023-05-17 14:44:27 -07:00
Zach Brown	e7bd1b45dc	Add prepare-empty-data-device scoutfs command Add a command for writing a super block to a new data device after reading the metadata device to ensure that there's no existing data on the old data device. Signed-off-by: Zach Brown <zab@versity.com>	2023-04-17 12:47:50 -07:00
Zach Brown	18903ce500	Alphabetize command listing in scoutfs man page List the scoutfs utility commands in the man page in alphabetical order. Signed-off-by: Zach Brown <zab@versity.com>	2023-04-17 12:47:50 -07:00
Zach Brown	b76e22ffcf	Refactor user util functions for device size Split the existing device_size() into get_device_size() and limit_device_size(). An upcoming command wants to get the device size without applying limiting policy. Signed-off-by: Zach Brown <zab@versity.com>	2023-04-17 12:47:50 -07:00
Zach Brown	3363b4fb79	Flush device caches in buffered util cmds Add calls to our new device cache flushing helper in commands that use buffered reads. Signed-off-by: Zach Brown <zab@versity.com>	2023-01-18 10:52:02 -08:00
Zach Brown	ddb5cce2a5	Add quick utils flush_device helper Add a quick helper that just calls cache flushing ioctls on different kinds of files. Signed-off-by: Zach Brown <zab@versity.com>	2023-01-18 10:27:47 -08:00
Zach Brown	ef2daf8857	Make data preallocation tunable Make mount options for the size of preallocation and whether or not it should be restricted to extending writes. Disabling the default restriction to streaming writes lets it preallocate in aligned regions of the preallocation size when they contain no extents. Signed-off-by: Zach Brown <zab@versity.com>	2022-10-14 14:03:35 -07:00
Zach Brown	29538a9f45	Add POSIX ACL support Add support for the POSIX ACLs as described in acl(5). Support is enabled by default and can be explicitly enabled or disabled with the acl or noacl mount options, respectively. Signed-off-by: Zach Brown <zab@versity.com>	2022-09-28 10:36:10 -07:00
Zach Brown	49df98f5a8	Add skip-likely-huge print option Add an option to skip printing structures that are likely to be so huge that the print output becomes completely unwieldy on large systems. Signed-off-by: Zach Brown <zab@versity.com>	2022-07-06 15:07:57 -07:00
Zach Brown	26ae9c6e04	Verify local unmount testing fence script The fence script we use for our single node multi-mount tests only knows how to fence by using forced unmount to destroy a mount. As of now, the tests only generate failing nodes that need to be fenced by using forced unmount as well. This results in the awkward situation where the testing fence script doesn't have anything to do because the mount is already gone. When the test fence script has nothing to do we might not notice if it isn't run. This adds explicit verification to the fencing tests that the script was really run. It adds per-invocation logging to the fence script and the test makes sure that it was run. While we're at it, we take the opportunity to tidy up some of the scripting around this. We use a sysfs file with the data device major:minor numbers so that the fencing script can find and unmount mounts without having to ask them for their rid. They may not be operational. Signed-off-by: Zach Brown <zab@versity.com>	2022-03-28 14:52:08 -07:00
Zach Brown	a67ea30bb7	Add orphan_scan_delay_ms mount option Add a mount option to set the delay between scanning of the orphan list. The sysfs file for the option is writable so this option can be set at run time. Signed-off-by: Zach Brown <zab@versity.com>	2022-03-10 11:43:11 -08:00
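
Commit ac00f5cedb above notes that getline() can allocate its buffer even when it returns an error, so the caller still has to free() it. A minimal sketch of that pattern follows. It is not taken from the scoutfs sources: count_lines() is a hypothetical helper, and it uses ferror() to tell end of input from a read error, which is an assumption of this sketch rather than the errno check the commit describes.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/*
 * Hypothetical helper, not part of scoutfs: count the lines that can be
 * read from a stream.  Returns the line count, or -1 on a read error.
 */
long count_lines(FILE *f)
{
	char *line = NULL;	/* getline() allocates and grows this buffer */
	size_t size = 0;
	long count = 0;
	long ret;

	while (getline(&line, &size, f) != -1)
		count++;

	/* -1 covers both end of input and failure; ferror() tells them apart */
	ret = ferror(f) ? -1 : count;

	/* the buffer may have been allocated even if getline() failed */
	free(line);

	return ret;
}
```

Calling count_lines(stdin) on input that ends normally returns the number of lines rather than an error, and the single free() covers both the success and failure paths.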

365 Commits