Bash has special handling when these standard IO files, but
there are cases where customers have special restrictions set
on them. Likely to avoid leaking error data out of system logs
as part of IDS software.
In any case, we can just reopen existing file descriptors here
in both these cases to avoid this entirely. This will always
work.
Signed-off-by: Auke Kok <auke.kok@versity.com>
The userspace fencing process wasn't careful about handling underlying
directories that disappear while it was working.
On the server/fenced side, fencing requests can linger after they've
been resolved by writing 1 to fenced or error. The script could come
back around to see the directory before the server finally removes it,
causing all later uses of the request dir to fail. We saw this in the
logs as a bunch of cat errors for the various request files.
On the local fence script side, all the mounts can be in the process of
being unmounted so both the /sys/fs dirs and the mount it self can be
removed while we're working.
For both, when we're working with the /sys/fs files we read them without
logging errors and then test that the dir still exists before using what
we read. When fencing a mount, we stop if findmnt doesn't find the
mount and then raise a umount error if the /sys/fs dir exists after
umount fails.
And while we're at it, we have each scripts logging append instead of
truncating (if, say, it's a log file instead of an interactive tty).
Signed-off-by: Zach Brown <zab@versity.com>
The default TCP keepalive value is currently 10s, resulting in clients
being disconnected after 10 seconds of not replying to a TCP keepalive
packet. These keepalive values are reasonable most of the times, but
we've seen client disconnects where this timeout has been exceeded,
resulting in fencing. The cause for this is unknown at this time, but it
is suspected that network intermissions are happening.
This change adds a configurable value for this specific client socket
timeout. It enforces that its value is above UNRESPONSIVE_PROBES, whose
value remains unchanged.
The default value of 10000ms (10s) is changed to 60s. This is the value
we're assuming is much better suited for customers and has been briefly
trialed, showing that it may help to avoid network level interruptions
better.
Signed-off-by: Auke Kok <auke.kok@versity.com>
On el8, sparse is at 0.6.4 in epel-release, but it fails with:
```
[SP src/util.c]
src/util.c: note: in included file (through /usr/include/sys/stat.h):
/usr/include/bits/statx.h:30:6: error: not a function <noident>
/usr/include/bits/statx.h:30:6: error: bad constant expression type
```
This is due to us needing O_DIRECT from <fcntl.h>, so we set _GNU_SOURCE
before including it, but this causes (through _USE_GNU in sys/stat.h)
statx.h to be included, and that has __has_include, and sparse is too
dumb to understand it.
Just shut it up.
Signed-off-by: Auke Kok <auke.kok@versity.com>
Fail the build if we don't check with sparse in both the kernel and
userspace utils. Add a filtering wrapper to the kernel build so that we
have a place to filter out uninteresting errors from kernel sources that
we're building against.
Signed-off-by: Zach Brown <zab@versity.com>
scoutfs cli commands were using a helper that tried to perform word
expansion on the path argument. This was done with the intent of
providing the convenience of shell expansion (env vars, ~) within the
cli command argument.
But it breaks paths that accidentally have their file names match the
syntax that wordexp supports. "[ ]" tripped up files in the wild.
We don't need to provide shell expansion functionality in our argument
parsing. The shell can do that. The cli must pass the arguments
straight through, no parsing at all.
Signed-off-by: Zach Brown <zab@versity.com>
The list alloc blocks have an array of blknos that are offset by a start
field in the block header. The print code wasn't using that and was
always referencing the beginning of the array, which could miss blocks.
Signed-off-by: Zach Brown <zab@versity.com>
There's filefrag already, and that works, but, it's output is very
inconsistent between various OS release versions, and it has already
meant that we'd needed to adjust tests to account for these little
but insignificant changes. A lot more work than useful. It's even
more changed in el9.
This adds `scoutfs get-fiemap FILE` and prints out block extent
info with flags that we care about as an abbreviated letter: U for
Unwritten, L for Last, and O for Unknown (as in, "offline").
The -P/--physical and -L/--logical options turn off logical or physical
offset display, in case you only want to see the offsets in either
units. You can pass -b/--byte to display offsets and lengths in
byte values. The block size will then be obtained from fstat() of
the queried file (4096 for scoutfs).
I've removed all uses of filefrag from our scoutfs tests. Xfstests
still calls it but their internal diff takes care of that issue.
Where needed and appropriate, the tests are adjusted so that the output
of `scoutfs get-fiemap` is as close as it can to what it used to be,
so that reading the test results allows the quick view of what might
have been going wrong.
There are some output strings I have not bothered to update because
there's no real value to updating every output string to match,
and we just adjust the golden file accordingly.
Signed-off-by: Auke Kok <auke.kok@versity.com>
getline() allocates the space for the return value even if there is an
error, so when it returns an error, we still have to free() it.
In el9, when reading stdin we will get errno=0 returned (no error) when
we hit the end of stdin. This behavior is different from el7/8. We don't
want to throw an error here to avoid failing the test, since it doesn't.
Signed-off-by: Auke Kok <auke.kok@versity.com>
In el9 releases, our includes declare offsetof() before our header
chain includes stddef.h, which doesn't properly check if offsetof
is already defined, leading to a redefinition. Just include stddef
at all times here.
Signed-off-by: Auke Kok <auke.kok@versity.com>
We should rely on sparse from epel to do automated sparse checking and
not a git tag. But the 0.6.4 build currently fails on sparse/gcc
redefines.
This magic Awk from Zach script processes sparse and gcc internal defines
and leaves the one intact that sparse doesn't have.
Signed-off-by: Auke Kok <auke.kok@versity.com>
This extra check assures the passed meta device and data device
are indeed what they should be, and prevents against unwanted
swapping or repeated duplicate device arguments.
Signed-off-by: Auke Kok <auke.kok@versity.com>
Add support for project IDs. They're managed through the _attr_x
interfaces and are inherited from the parent directory during creation.
Signed-off-by: Zach Brown <zab@versity.com>
Add a bit to the private scoutfs inode flags which indicates that the
inode is in retention mode. The bit is visible through the _attr_x
interface. It can only be set on regular files and when set it prevents
modification to all but non-user xattrs. It can be cleared by root.
Signed-off-by: Zach Brown <zab@versity.com>
We're about to add new format structures so increment the max version to
2. Future commits will add the features before we release version 2 in
the wild.
Signed-off-by: Zach Brown <zab@zabbo.net>
We're about to increase the inode size and increment the format version.
Inode reading and writing has to handle different valid inode sizes as
allowed by the format version. This is the initial skeletal work that
later patches which really increase the inode size will further refine
to add the specific known sizes and format versions.
Signed-off-by: Bryant G. Duffy-Ly <bduffyly@versity.com>
[zab@versity.com: reworded description, reworked to use _within]
Signed-off-by: Zach Brown <zab@versity.com>
Don't let change-format-version decrease the format version. It doesn't
have the machinery to go back and migrate newer structures to older
structures that would be compatible with code expecting the older
version.
Signed-off-by: Bryant G. Duffy-Ly <bduffyly@versity.com>
[zab@versity.com: split from initial patch with other changes]
Signed-off-by: Zach Brown <zab@versity.com>
When we added the crtime creation timestamp to the inode we forgot to
update mkfs to set the crtime of the root inode.
Signed-off-by: Zach Brown <zab@versity.com>
Add a mount option for the amount of time that log merge creation can
wait before giving up. We add some counters so we can see how often
the timeout is being hit and what the average successfull wait time is.
Signed-off-by: Zach Brown <zab@versity.com>
The rpmbuild support files no longer define the previously used kernel
module macros. This carves out the differences between el7 and el8 with
conditionals based on the distro we are building for.
Signed-off-by: Ben McClelland <ben.mcclelland@versity.com>
Add mount and sysfs options for changing the quorum heartbeat timeout.
This allows setting a longer delay in taking over for failed hosts that
has a greater chance of surviving temporary non-fatal delays.
We also double the existing default timeout to 10s which is still
reasonably responsive.
Signed-off-by: Zach Brown <zab@versity.com>
Add a command for writing a super block to a new data device after
reading the metadata device to ensure that there's no existing
data on the old data device.
Signed-off-by: Zach Brown <zab@versity.com>
Split the existing device_size() into get_device_size() and
limit_device_size(). An upcoming command wants to get the device size
without applying limiting policy.
Signed-off-by: Zach Brown <zab@versity.com>
Make mount options for the size of preallocation and whether or not it
should be restricted to extending writes. Disabling the default
restriction to streaming writes lets it preallocate in aligned regions
of the preallocation size when they contain no extents.
Signed-off-by: Zach Brown <zab@versity.com>
Add support for the POSIX ACLs as described in acl(5). Support is
enabled by default and can be explicitly enabled or disabled with the
acl or noacl mount options, respectively.
Signed-off-by: Zach Brown <zab@versity.com>
Add an option to skip printing structures that are likely to be so huge
that the print output becomes completely unwieldly on large systems.
Signed-off-by: Zach Brown <zab@versity.com>
The fence script we use for our single node multi-mount tests only knows
how to fence by using forced unmount to destroy a mount. As of now, the
tests only generate failing nodes that need to be fenced by using forced
unmount as well. This results in the awkward situation where the
testing fence script doesn't have anything to do because the mount is
already gone.
When the test fence script has nothing to do we might not notice if it
isn't run. This adds explicit verification to the fencing tests that
the script was really run. It adds per-invocation logging to the fence
script and the test makes sure that it was run.
While we're at it, we take the opportunity to tidy up some of the
scripting around this. We use a sysfs file with the data device
major:minor numbers so that the fencing script can find and unmount
mounts without having to ask them for their rid. They may not be
operational.
Signed-off-by: Zach Brown <zab@versity.com>
Add a mount option to set the delay betwen scanning of the orphan list.
The sysfs file for the option is writable so this option can be set at
run time.
Signed-off-by: Zach Brown <zab@versity.com>
The man pages and inline help blurbs for the recently added format
version and quorum config commands incorrectly described the device
arguments which are needed.
Signed-off-by: Zach Brown <zab@versity.com>
Add the get-allocated-inos scoutfs command which wraps the
GET_ALLOCATED_INOS ioctl. It'll be used by tests to find items
associated with an inode instead of trying to open the inode by a
constructed handle after it was unlinked.
Signed-off-by: Zach Brown <zab@versity.com>