Commit Graph

1750 Commits

Author SHA1 Message Date
Zach Brown
a9da27444f Merge pull request #128 from versity/zab/prealloc_fragmentation
Zab/prealloc fragmentation
2023-06-29 09:57:32 -07:00
Zach Brown
49fe89741d Merge pull request #125 from versity/zab/get_referring_entries
Zab/get referring entries
2023-06-29 09:57:06 -07:00
Zach Brown
847916860d Advance move_blocks extent search offset
The move_blocks ioctl finds extents to move in the source file by
searching from the starting block offset of the region to move.
Logically, this is fine.  After each extent item is deleted the next
search will find the next extent.

The problem is that deleted items still exist in the item cache.  The
next iteration has to skip over all the deleted extents from the start
of the region.  This is fine with large extents, but with heavily
fragmented extents this creates a huge amplification of the number of
items to traverse when moving the fragmented extents in a large file.
(It's not quite O(n^2)/2 for the total extents, deleted items are purged
as we write out the dirty items in each transaction.. but it's still
immense.)

The fix is to simply start searching for the next extent after the one
we just moved.

Signed-off-by: Zach Brown <zab@versity.com>
2023-06-28 16:54:28 -07:00
Zach Brown
564b942ead Write test for hole filling noncontig prealloc
Add a test which exercises filling holes in prealloc regions when the
_contig_only prealloc option is not set.

Signed-off-by: Zach Brown <zab@versity.com>
2023-06-28 16:16:04 -07:00
Zach Brown
3d99fda0f6 Preallocate data around iblock when noncontig
If the _contig_only option isn't set then we try to preallocate aligned
regions of files.  The initial implementation naively only allowed one
preallocation attempt in each aligned region.  If it got a small
allocation that didn't fill the region then every future allocation
in the region would be a single block.

This changes every preallocation in the region to attempt to fill the
hole in the region that iblock fell in.  It uses an extra extent search
(item cache search) to try and avoid thousands of single block
allocations.

Signed-off-by: Zach Brown <zab@versity.com>
2023-06-28 12:21:25 -07:00
Zach Brown
6c0ab75477 Merge pull request #126 from versity/zab/rht_block_shrink_deadlock
Avoid deadlock from block reclaim in rht resize
2023-06-16 10:30:16 -07:00
Zach Brown
89b238a5c4 Add more acceptable quorum delay during testing
Loaded VMs can see a few more seconds delay.

Signed-off-by: Zach Brown <zab@versity.com>
2023-06-16 09:38:58 -07:00
Zach Brown
05371b83f0 Update expected console messages during testing
Signed-off-by: Zach Brown <zab@versity.com>
2023-06-16 09:37:37 -07:00
Zach Brown
acafb869e7 Avoid deadlock from block reclaim in rht resize
The RCU hash table uses deferred work to resize the hash table.  There's
a time during resize when hash table iteration will return EAGAIN until
resize makes more progress.  During this time resize can perform
GFP_KERNEL allocations.

Our shrinker tries to iterate over its RCU hash table to find blocks to
reclaim.  It tries to restart iteration if it gets EAGAIN on the
assumption that it will be usable again soon.

Combine the two and our shrinker can get stuck retrying iteration
indefinitely because it's shrinking on behalf of the hash table resizing
that is trying to allocate the next table before making iteration work
again.  We have to stop shrinking in this case so that the resizing
caller can proceed.

Signed-off-by: Zach Brown <zab@versity.com>
2023-06-15 14:45:26 -07:00
Zach Brown
74c5fe1115 Add get-referring-entries test
Signed-off-by: Zach Brown <zab@versity.com>
2023-06-14 14:12:10 -07:00
Zach Brown
2279e9657f Add get_referring_entries scoutfs command
Add a cli command for the get_referring_entries ioctl.

Signed-off-by: Zach Brown <zab@versity.com>
2023-06-14 14:12:10 -07:00
Zach Brown
707752a7bf Add get_referring_entries ioctl
Add an ioctl that gives the callers all entries that refer to an inode.
It's like a backwards readdir.  It's a light bit of translation between
the internal _add_next_linkrefs() list of entries and the ioctl
interface of a buffer of entry structs.

Signed-off-by: Zach Brown <zab@versity.com>
2023-06-14 14:12:10 -07:00
Zach Brown
0316c22026 Extend scoutfs_dir_add_next_linkrefs
Extend scoutfs_dir_add_next_linkref() to be able to return multiple
backrefs under the lock for each call and have it take an argument to
limit the number of backrefs that can be added and returned.

Its return code changes a bit in that it returns 1 on success instead of
0 so we have to be a little careful with callers who were expecting 0.
It still returns -ENOENT when no entries are found.

We break up its tracepoint into one that records each entry added and
one that records the result of each call.

This will be used by an ioctl to give callers just the entries that
point to an inode instead of assembling full paths from the root.

Signed-off-by: Zach Brown <zab@versity.com>
2023-06-14 14:12:10 -07:00
Zach Brown
5a1e5639c2 Merge pull request #124 from versity/zab/fix_quo_hb_mount_option
Zab/fix quo hb mount option
2023-06-07 10:50:32 -07:00
Zach Brown
950963375b Update quorum heartbeat test for mount option
Update the quorum_heartbeat_timeout_ms test to also test the mount
option, not just updating the timeout via sysfs.  This takes some
reworking as we have to avoid the active leader/server when setting the
timeout via the mount option.  We also allow for a bit more slack around
comparing kernel sleeps and userspace wall clocks.

Signed-off-by: Zach Brown <zab@versity.com>
2023-05-23 09:57:13 -07:00
Zach Brown
e52435b993 Add t_mount_opt
Add a test helper that mounts with a mount option.

Signed-off-by: Zach Brown <zab@versity.com>
2023-05-22 16:30:01 -07:00
Zach Brown
2b72c57cb0 Fix crash in quorum_heartbeat_timeout_ms parsing
Mount option parsing runs early enough that the rest of the option
read/write serialization infrastructure isn't set up yet.  The
quorum_heartbeat_timeout_ms mount option tried to use a helper that
updated the stored option but it wasn't initialized yet so it crashed.

The helper was really only to have the option validity test in one
place.  It's reworked to only verify the option and the actual setting
is left to the callers.

Signed-off-by: Zach Brown <zab@versity.com>
2023-05-22 16:29:56 -07:00
Zach Brown
9c67b2a42d Merge pull request #122 from versity/zab/v1.13
v1.13 Release
2023-05-19 11:38:48 -07:00
Zach Brown
0b38aeb5a4 v1.13 Release
Finish the release notes for the 1.13 release.

Signed-off-by: Zach Brown <zab@versity.com>
v1.13
2023-05-19 10:38:40 -07:00
Zach Brown
2daf873983 Merge pull request #121 from versity/zab/heartbeat_fencing_tweaks
Zab/heartbeat fencing tweaks
2023-05-18 17:10:40 -07:00
Zach Brown
904c5dce90 Filter forced unmount transaction commit error
Add a transaction commit error message to the set of errors we ignore
when triggering forced unmount.

Signed-off-by: Zach Brown <zab@versity.com>
2023-05-18 15:50:34 -07:00
Zach Brown
57c6d78df8 Add test of quorum heartbeat timeout setting
Signed-off-by: Zach Brown <zab@versity.com>
2023-05-18 15:50:33 -07:00
Zach Brown
74e9d0f764 Silence test syfs option failure
If setting a sysfs option failes the bash write error is output.  It
contains the script line number which can fail over time, leading to
mismatched golden output failures if we used the output as an expected
indication of failure.  Callers should test its rc and output
accordingly if they want the failure logged and compared.

Signed-off-by: Zach Brown <zab@versity.com>
2023-05-18 11:15:28 -07:00
Zach Brown
98eb0eb649 Add t_quorum_nrs test helper
Add a quick function that outputs the fs numbers of the quorum mounts.

Signed-off-by: Zach Brown <zab@versity.com>
2023-05-18 11:15:28 -07:00
Zach Brown
15de0c21c1 Have quorum drop messages on force unmount
Forced unmount is supposed to isolate the mount from the world.  The
net.c TCP messaging returns errors when sending during forced unmount.
The quorum code has its own UDP messaging and wasn't taking forced
unmount into account.

This lead to quorum still being able to send resignation messages to
other quorum peers during forced unmount, making it hard to test
heartbeat timeouts with forced unmount.

The quorum messaging is already unreliable so we can easily make it drop
messages during forced unmount.  Now forced unmount more fully isolates
the quorum code and it becomes easier to test.

Signed-off-by: Zach Brown <zab@versity.com>
2023-05-18 10:01:19 -07:00
Zach Brown
7b65767803 Track and log quorum heartbeat delays
Add tracking and reporting of delays in sending or receiving quorum
heartbeat messages.  We measure the time between back to back sends or
receives of heartbeat messages.  We record these delays truncated down
to second granularity in the quorum sysfs status file.  We log messages
to the console for each longest measured delay up to the maximum
configurable heartbeat timeout.

Signed-off-by: Zach Brown <zab@versity.com>
2023-05-17 14:44:27 -07:00
Zach Brown
46640e4ff9 Add counter for quorum heartbeat send failures
Add a counter which tracks the number of heartbeat message send attempts
which fail.

Signed-off-by: Zach Brown <zab@versity.com>
2023-05-17 14:44:27 -07:00
Zach Brown
912906f050 Make quorum heartbeat timeout tunable
Add mount and sysfs options for changing the quorum heartbeat timeout.
This allows setting a longer delay in taking over for failed hosts that
has a greater chance of surviving temporary non-fatal delays.

We also double the existing default timeout to 10s which is still
reasonably responsive.

Signed-off-by: Zach Brown <zab@versity.com>
2023-05-17 14:44:27 -07:00
Zach Brown
ec02cf442b Use lower latency allocation in quorum socket
The quorum udp socket allocation still allowed starting io which can
trigger longer latencies trying to free memory.  We change the flags to
prefer dipping into emergency pools and then failing rather than
blocking trying to satisfy an allocation.  We'd much rather have a given
heartbeat attempt fail and have the opportunity to succeed at the next
interval rather than running the risk of blocking across multiple
intervals.

Signed-off-by: Zach Brown <zab@versity.com>
2023-05-17 14:44:27 -07:00
Zach Brown
0e9cd1eea5 Use specific work queue for quorum work
The quorum work was using the system workq.  While that's mostly fine,
we can create a dedicated workqueue with the specific flags that we
need.  The quorum work needs to run promptly to avoid fencing so we set
it to high priority.

Signed-off-by: Zach Brown <zab@versity.com>
2023-05-17 14:44:27 -07:00
Zach Brown
e18ea24561 Move quorum recv that sets timeout before check
In the quorum work loop some message receive actions extend the timeout
after the timeout expiration is checked.  This is usually fine when the
work runs soon after the messages are received and before the timeout
expires.  But under load the work might not schedule until long after
both the message has been received and the timeout has expired.

If the message was a heartbeat message then the wakeup delay would be
mistaken for lack of activity on the server and it would try to take
over for an otherwise active server.

This moves the extension of the heartbeat on message receive to before
the timeout is checked.  In our case of a delayed heartbeat message it
would still find it in the recv queue and extend the timeout, avoiding
fencing an active server.

Signed-off-by: Zach Brown <zab@versity.com>
2023-05-17 09:56:53 -07:00
Zach Brown
723309ff75 Merge pull request #120 from versity/zab/v1.12
v1.12 Release
2023-04-17 15:33:36 -07:00
Zach Brown
9bfad7d324 v1.12 Release
Finish the release notes for the 1.12 release.

Signed-off-by: Zach Brown <zab@versity.com>
v1.12
2023-04-17 14:30:08 -07:00
Zach Brown
448e0abacb Merge pull request #118 from versity/zab/prepare_empty_data_dev
Zab/prepare empty data dev
2023-04-17 14:20:29 -07:00
Zach Brown
2a6d827e7a Add test for changing devices
Add a relatively small initial test for swapping devices around.

Signed-off-by: Zach Brown <zab@versity.com>
2023-04-17 12:47:50 -07:00
Zach Brown
e7bd1b45dc Add prepare-empty-data-device scoutfs command
Add a command for writing a super block to a new data device after
reading the metadata device to ensure that there's no existing
data on the old data device.

Signed-off-by: Zach Brown <zab@versity.com>
2023-04-17 12:47:50 -07:00
Zach Brown
6ded240089 Add t_rc test execution helper function
Add a quick wrapper to run commands whose output is saved while only
echoing their return code.

Signed-off-by: Zach Brown <zab@versity.com>
2023-04-17 12:47:50 -07:00
Zach Brown
99a20bc383 Put scratch mount point in test tmp dirs
Some tests had grown a bad pattern of making a mount point for the
scratch mount in the root /mnt directory.  Change them to use a mount
point in their test's temp directory outside the testing fs.

Signed-off-by: Zach Brown <zab@versity.com>
2023-04-17 12:47:50 -07:00
Zach Brown
18903ce500 Alphabetize command listing in scoutfs man page
List the scoutfs utility commands in the man page in alphabetical order.

Signed-off-by: Zach Brown <zab@versity.com>
2023-04-17 12:47:50 -07:00
Zach Brown
b76e22ffcf Refactor user util functions for device size
Split the existing device_size() into get_device_size() and
limit_device_size().  An upcoming command wants to get the device size
without applying limiting policy.

Signed-off-by: Zach Brown <zab@versity.com>
2023-04-17 12:47:50 -07:00
Zach Brown
d6863d6832 Merge pull request #119 from versity/zab/inode_nsec
Set sb->s_time_gran to support nsecs
2023-04-17 12:39:13 -07:00
Zach Brown
bb01a3990f Set sb->s_time_gran to support nsecs
We missed initializing sb->s_time_gran which controls how some parts of
the kernel truncate the granularity of nsec in timespec.  Some paths
don't use it at all so time would be maintained at full precision.  But
other paths, particularly setattr_copy() from userspace and
notify_change() from the kernel use it to truncate as times are set.

Setting s_time_gran to 1 maintains full nsec precision.

Signed-off-by: Zach Brown <zab@versity.com>
2023-03-24 10:50:34 -07:00
Zach Brown
409631ceb1 Merge pull request #117 from versity/zab/rename_into_root
Zab/rename into root
2023-03-13 09:28:57 -07:00
Zach Brown
f1264c7e47 Add test to rename into root directory
The ancestor tests in rename were preventing renaming into the root
directory.

Signed-off-by: Zach Brown <zab@versity.com>
2023-03-08 11:00:59 -08:00
Zach Brown
a61b8d9961 Fix renaming into root directory
The VFS performs a lot of checks on renames before calling the fs
method.  We acquire locks and refresh inodes in the rename method so we
have to duplciate a lot of the vfs checks.

One of the checks involves loops with ancestors and subdirectories.  We
missed the case where the root directory is the destination and doesn't
have any parent directories.  The backref walker it calls returns
-ENOENT instead of 0 with an empty set of parents and that error bubbled
up to rename.

The fix is to notice when we're asking for ancestors of the one
directory that can't have ancestors and short circuit the test.

Signed-off-by: Zach Brown <zab@versity.com>
2023-03-08 11:00:59 -08:00
Zach Brown
eac57a1f7a Merge pull request #116 from versity/zab/v1.11
v1.11 Release
2023-02-02 12:02:45 -08:00
Zach Brown
5512d5c03e v1.11 Release
Finish the release notes for the 1.11 release.

Signed-off-by: Zach Brown <zab@versity.com>
v1.11
2023-02-02 11:00:38 -08:00
Zach Brown
8cf7be4651 Merge pull request #115 from versity/zab/utils_flush
Zab/utils flush
2023-02-02 10:25:12 -08:00
Zach Brown
3363b4fb79 Flush device caches in buffered util cmds
Add calls to our new device cache flushing helper in commands that use
buffered reads.

Signed-off-by: Zach Brown <zab@versity.com>
2023-01-18 10:52:02 -08:00
Zach Brown
ddb5cce2a5 Add quick utils flush_device helper
Add a quick helper that just calls cache flushing ioctls on different
kinds of files.

Signed-off-by: Zach Brown <zab@versity.com>
2023-01-18 10:27:47 -08:00