Commit Graph

1584 Commits

Author SHA1 Message Date
Zach Brown
a67ea30bb7 Add orphan_scan_delay_ms mount option
Add a mount option to set the delay betwen scanning of the orphan list.
The sysfs file for the option is writable so this option can be set at
run time.

Signed-off-by: Zach Brown <zab@versity.com>
2022-03-10 11:43:11 -08:00
Zach Brown
f3b7c683f0 Fix quorum_server_nr syfs file name typo
The quorum_slot_nr mount option was being mistakenly shown in a sysfs
file named quorum_server_nr.

Signed-off-by: Zach Brown <zab@versity.com>
2022-03-09 11:12:36 -08:00
Zach Brown
8decc54467 Clean up mount option handling
The mount options code is some of the oldest in the tree and is weirdly
split between options.c and super.c.  This cleans up the options code,
moves it all to options.c, and reworks it to be more in line with the
modern subsystem convenction of storing state in an allocated info
struct.

Rather than putting the parsed options in the super for everyone to
directly reference we put them in the private options info struct and
add a locked read function.  This will let us add sysfs files to change
mount options while safely serializing with readers.

All the users of mount options that used to directly reference the
parsed struct now call the read function to get a copy.  They're all
small local changes except for quorum which saves a static copy of the
quorum slot number because it references it in so many places and relies
on it not changing.

Finally, we remove the empty debugfs "options" directory.

Signed-off-by: Zach Brown <zab@versity.com>
2022-03-09 11:12:36 -08:00
Zach Brown
5adcf7677f Export omap group calc helper
The inode caller of omap was manually calculating the group and bits,
which isn't fantastic.   Export the little helper to calculate it so
the inode caller doesn't have to.

Signed-off-by: Zach Brown <zab@versity.com>
2022-03-09 11:12:36 -08:00
Zach Brown
07f03d499f Remove duplicate orphan work delay calculation
You can almost feel the editing mistake that brought the delay
calculation into the conditional and forgot to remove the initial
calculation at declaration.

Signed-off-by: Zach Brown <zab@versity.com>
2022-03-09 11:12:23 -08:00
Zach Brown
c5068efef0 Merge pull request #75 from versity/zab/bad_mount_option
Zab/bad mount option
2022-02-28 09:07:15 -08:00
Zach Brown
66678dc63b Fail mounts with unknown options
Weirdly, the mout option parser silently returned when it found mount
options that weren't recognized.

Signed-off-by: Zach Brown <zab@versity.com>
2022-02-21 10:44:56 -08:00
Zach Brown
b2834d3c28 Add basic bad mount testing
Add some tests which exercise the kinds of reasonable mistakes that
people will make in the field.

Signed-off-by: Zach Brown <zab@versity.com>
2022-02-21 10:44:38 -08:00
Zach Brown
cff50bec6b Merge pull request #74 from versity/zab/fallocate_read_inversion
Zab/fallocate read inversion
2022-02-21 09:58:49 -08:00
Zach Brown
4d6350b3b0 Fix lock ordering in fallocate
We were seeing ABBA deadlocks on the dio_count wait and extent_sem
between fallocate and reads.  It turns out that fallocate got lock
ordering wrong.

This brings fallocate in line with the rest of the adherents to the lock
heirarchy.   Most importantly, the extent_sem is used after the
dio_count.   While we're at it we bring the i_mutex down to just before
the cluster lock for consistency.

Signed-off-by: Zach Brown <zab@versity.com>
2022-02-17 14:48:13 -08:00
Zach Brown
48966b42bb Add simple fallocate test
Signed-off-by: Zach Brown <zab@versity.com>
2022-02-17 11:20:08 -08:00
Zach Brown
97cb8ad50d Merge pull request #72 from versity/zab/quick_man_fix
Clean quorum and format change command docs
2022-02-09 09:22:50 -08:00
Zach Brown
ae08a797ae Clean quorum and format change command docs
The man pages and inline help blurbs for the recently added format
version and quorum config commands incorrectly described the device
arguments which are needed.

Signed-off-by: Zach Brown <zab@versity.com>
2022-02-08 11:23:27 -08:00
Zach Brown
2634fadfcb Merge pull request #71 from versity/zab/v1_1_release
Zab/v1 1 release
v1.1
2022-02-04 11:35:39 -08:00
Zach Brown
0c1f19556d Prepare v1.2-rc release
Add the v1.2-rc section to the release notes so that we can add entries
with commits as needed.

Signed-off-by: Zach Brown <zab@versity.com>
2022-02-04 11:32:53 -08:00
Zach Brown
19caae3da8 v1.1 Release
Finish off the release notes for the 1.1 release.

Signed-off-by: Zach Brown <zab@versity.com>
2022-02-04 11:32:37 -08:00
Zach Brown
2989afbf46 Merge pull request #70 from versity/zab/silence_duplicate_log_merge_complete_error
Silence resent log merge commit error
2022-02-02 14:35:01 -08:00
Zach Brown
730a84af92 Silence resent log merge commit error
The server's log merge complete request handler was considering the
absence of the client's original request as a failure.  Unfortunately,
this case is possible if a previous server successfully completed the
client's request but the response was lost because it stopped for
whatever reason.

The failure was being logged as a hard error to the console which was
causing tests to occasionally fail during server failover that hit just
as the log merge completion was being processed.

The error was being sent to the client as a response, we just need to
silence the message for these expected but rare errors.

We also fix the related case where the server printed the even more
harsh WARN_ON if there was a next original request but it wasn't the one
we expected to find from our requesting client.

Signed-off-by: Zach Brown <zab@versity.com>
2022-02-02 11:26:36 -08:00
Zach Brown
5b77133c3b Merge pull request #68 from versity/zab/collection_of_fixes
Zab/collection of fixes
2022-01-24 11:22:41 -08:00
Zach Brown
329ac0347d Remove unused scoutfs_net_cancel_request()
The net _cancel_request call hasn't been used or tested in approximately
a bazillion years.   Best to get rid of it and have to add and test it
if we think we need it again.

Signed-off-by: Zach Brown <zab@versity.com>
2022-01-24 09:40:08 -08:00
Zach Brown
15d7eec1f9 Disallow openening unlinked files by handle
Our open by handle functions didn't care that the inode wasn't
referenced and let tasks open unlinked inodes by number.  This
interacted badly with the inode deletion mechanisms which required that
inodes couldn't be cached on other nodes after the transaction which
removed their final reference.

If a task did accidentally open a file by inode while it was being
deleted it could see the inode items in an inconsistent state and return
very confusing errors that look like corruption.

The fix is to give the handle iget callers a flag to tell iget to only
get the inode if it has a positive nlink.   If iget sees that the inode
has been unlinked it returns enoent.

Signed-off-by: Zach Brown <zab@versity.com>
2022-01-24 09:40:08 -08:00
Zach Brown
cff17a4cae Remove unused flags scoutfs_inode_refresh arg
The flags argument to scoutfs_inode_refresh wasn't being used.

Signed-off-by: Zach Brown <zab@versity.com>
2022-01-24 09:40:08 -08:00
Zach Brown
9fa2c6af89 Use get-allocated-inos in orphan-inodes test
The orphan inodes test needs to test if inode items exist as it
manipulates inodes.  It used to open the inode by a handle but we're
fixing that to not allow opening unlinked files.   The
get-allocated-inos ioctl tests for the presence of items owned by the
inode regardless of any other vfs state so we can use it to verify what
scoutfs is doing as we work with the vfs inodes.

Signed-off-by: Zach Brown <zab@versity.com>
2022-01-24 09:40:08 -08:00
Zach Brown
e067961714 Add get-allocated-inos scoutfs command
Add the get-allocated-inos scoutfs command which wraps the
GET_ALLOCATED_INOS ioctl.   It'll be used by tests to find items
associated with an inode instead of trying to open the inode by a
constructed handle after it was unlinked.

Signed-off-by: Zach Brown <zab@versity.com>
2022-01-24 09:40:08 -08:00
Zach Brown
7a96e03148 Add get_allocated_inos ioctl
Add an ioctl that can give some indication of inodes that have inode
items.   We're exposing this for tests that verify the handling of open
unlinked inodes.

Signed-off-by: Zach Brown <zab@versity.com>
2022-01-24 09:40:08 -08:00
Zach Brown
e9b3cc873a Export scoutfs_inode_init_key
We're adding an ioctl that wants to build inode item keys so let's
export the private inode key initializer.

Signed-off-by: Zach Brown <zab@versity.com>
2022-01-24 09:40:08 -08:00
Zach Brown
5f2259c48f Revert "Fix client/server race btwn lock recov and farewell"
This reverts commit 61ad844891.

This fix was trying to ensure that lock recovery response handling
can't run after farewell calls reclaim_rid() by jumping through a bunch
of hoops to tear down locking state as the first farewell request
arrived.

It introduced very slippery use after free during shutdown.  It appears
that it was from drain_workqueue() previously being able to stop
chaining work.   That's no longer possible when you're trying to drain
two workqueues that can queue work in each other.

We found a much clearer way to solve the problem so we can toss this.

Signed-off-by: Zach Brown <zab@versity.com>
2022-01-24 09:40:08 -08:00
Zach Brown
e14912974d Wait for lock recovery before sending farewell
We recently found that the server can send a farewell response and try
to tear down a client's lock state while it was still in lock recovery
with the client.   The lock recovery response could add a lock
for the client after farell's reclaim_rid() had thought the client was
gone forever and tore down its locks.

This left a lock in the lock server that wasn't associated with any
clients and so could never be invalidated.   Attempts to acquire
conflicting locks with it would hang forever, which we saw as hangs in
testing with lots of unmounting.

We tried to fix it by serializing incoming request handling and
forcefully clobbering the client's lock state as we first got
the farewell request.   That went very badly.

This takes another approach of trying to explicitly wait for lock
recovery to finish before sending farewell responses.   It's more in
line with the overall pattern of having the client be up and functional
until farewell tears it down.

With this in place we can revert the other attempted fix that was
causing so many problems.

Signed-off-by: Zach Brown <zab@versity.com>
2022-01-24 09:39:51 -08:00
Zach Brown
813ce24d79 Move local-force-unmount test script into tests/
The local-force-unmount fenced fencing script only works when all the
mounts are on the local host and it uses force unmount.   It is only
used in our specific local testing scripts.  Packaging it as an example
lead people to believe that it could be used to cobble together a
multi-host testing network, however temporary.

Move it from being in utils and packged to being private to our tests so
that it doesn't present an attractive nuisance.

Signed-off-by: Zach Brown <zab@versity.com>
2022-01-19 11:33:34 -08:00
Zach Brown
e2ce5ab6da Free pending recovery state on shutdown
scoutfs_recov_shutdown() tried to move the recovery tracking structs off
the shared list and into a private list so they could be freed.  But
then it went and walked the now empty shared list to free entries.  It
should walk the private list.

This would leak a small amount of memory in the rare cases where the
server was shutdown while recovery was still pending.

Signed-off-by: Zach Brown <zab@versity.com>
2022-01-19 09:22:48 -08:00
Zach Brown
89ca903c41 Print log trees get/commit seqs
Back when we added the get/commit transaction sequence numbers to the
log_trees we forgot to add them to the scoutfs print output.

Signed-off-by: Zach Brown <zab@versity.com>
2022-01-19 09:21:02 -08:00
Zach Brown
e3c7e21c40 Use write memory barrier in set_shutting_down
The server's little set_shutting_down() helper accidentally used a read
barrier instead of a write barrier.

Signed-off-by: Zach Brown <zab@versity.com>
2022-01-19 09:17:38 -08:00
Zach Brown
e97ea5407d Merge pull request #64 from bgly/bduffyly/quorum_race
Fix client/server race between lock recov and farewell processing
2022-01-14 09:03:00 -08:00
Bryant G. Duffy-Ly
8db5c118c3 Change clent to c_ent
To make it clearer; changing clent to c_ent to represent
client entry.

Signed-off-by: Bryant G. Duffy-Ly <bduffyly@versity.com>
2022-01-13 13:33:05 -06:00
Bryant G. Duffy-Ly
61ad844891 Fix client/server race btwn lock recov and farewell
Tear down client lock server state and set a boolean so that
there is no race between client/server processing lock recovery
at the same time as farewell.

Currently there is a bug where if server and clients are unmounted
then work from the client is processed out of order, which leaves
behind a server_lock for a RID that no longer exists.
In order to fix this we need to serialize SCOUTFS_NET_CMD_FAREWELL
in recv_worker.

Signed-off-by: Bryant G. Duffy-Ly <bduffyly@versity.com>
2022-01-13 13:32:56 -06:00
Zach Brown
2c8f5d8fc1 Merge pull request #65 from versity/zab/item_cache_move_page_seq
Preserve item cache page max_seq as items move
2022-01-13 09:12:23 -08:00
Bryant G. Duffy-Ly
8a504cd5ae Add client/server unmount race on lock_recov unit test
This unit test reproduces the race we have between
client and server diong lock recovery while farewell
is processed.

Signed-off-by: Bryant G. Duffy-Ly <bduffyly@versity.com>
2022-01-12 21:29:00 -06:00
Zach Brown
99a1cc704f Preserve item cache page max_seq as items move
The max_seq and active reader mechanisms in the item cache stop readers
from reading old items and inserting them in the cache after newer items
have been reclaimed by memory pressure.  The max_seq field in the pages
must reflect the greatest seq of the items in the page so that reclaim
knows that the page contains items newer than old readers and must not
be removed.

We update the page max_seq as items are inserted or as they're dirtied
in the page.   There's an additional subtle effect that the max_seq can
also protect items which have been erased.  Deletion items are erased
from the pages as a commit completes.   The max_seq in that page will
still protect it from being reclaimed even though no items have that seq
value themselves.

That protection fails if the range of keys containing the erased item is
moved to another page with a lower max_seq.   The item mover only
updated the destination page's max_seq for each item that was moved.  It
missed that the empty space between the items might have a larger
max_seq from an erased item.  We don't know where the erased item is so
we have to assume that a larger max_seq in the source page must be set
on the destination page.

This could explain very rare item cache corruption where nodes were
seeing deleted directory entry items reappearing.  It would take a
specific sequence of events involving large directories with an isolated
removal, a delayed item cache reader, a commit, and then enough
insertions to split the page all happening in precisely the wrong
sequence.

Signed-off-by: Zach Brown <zab@versity.com>
2022-01-12 10:23:55 -08:00
Zach Brown
166ab58b99 Merge pull request #62 from versity/zab/change_quorum_config
Zab/change quorum config
2021-11-29 12:18:15 -08:00
Zach Brown
8bc1ee8346 Add change-quorum-config command
Add a command to change the quorum config which starts by only supports
updating the super block whlie the file system is oflfine.

Signed-off-by: Zach Brown <zab@versity.com>
2021-11-24 15:41:04 -08:00
Zach Brown
285b68879a Set quorum config ver to 1 in mkfs and print
We're adding a command to change the quorum config which updates its
version number.  Let's make the version a little more visible and start
it at the more humane 1.

Signed-off-by: Zach Brown <zab@versity.com>
2021-11-24 15:41:04 -08:00
Zach Brown
1ac3efe701 Add meta_super_in_use utils helper
Move the code that checks that the super is in use from
change-format-version into its own function in util.c.   We'll use it in
an upcoming command to change the quorum config.

Signed-off-by: Zach Brown <zab@versity.com>
2021-11-24 15:40:25 -08:00
Zach Brown
ce76682db7 Make mkfs quorum helpers available
Move functions for printing and validating the quorum config from mkfs.c
to quorum.c so that they can be used in an upcoming command to change
the quorum config.

Signed-off-by: Zach Brown <zab@versity.com>
2021-11-24 13:44:51 -08:00
Zach Brown
686f8515bc Fix --quorum-count typo in mkfs error message
The change from --quorum-count to --quorum-slot forgot to update a
mention of the option in an error message in mkfs when it wasn't
provided.

Signed-off-by: Zach Brown <zab@versity.com>
2021-11-24 13:44:51 -08:00
Zach Brown
93bc52cc54 Merge pull request #60 from bgly/bduffyly/block_stale_reads
Fix block-stale-read test case
2021-11-24 10:25:26 -08:00
Zach Brown
1108d1288a Merge pull request #61 from bgly/bduffyly/rename2
Add basic renameat2 syscall support
2021-11-24 10:24:23 -08:00
Bryant G. Duffy-Ly
0abcd5a004 Take generic/025/078 off expunge list adding 23/24
We want to enable the test case for:
generic/023 - tests that renameat2 syscall exists
generic/024 - renameat2 with NOREPLACE flag

Move both generic/025 and 078 to the no run list so that
we can test the unsupported output if the flags were
passed that were not supported.

Example output:
generic/025      [not run] fs doesn't support RENAME_EXCHANGE
generic/078      [not run] fs doesn't support RENAME_WHITEOUT

Signed-off-by: Bryant G. Duffy-Ly <bduffyly@versity.com>
2021-11-19 17:54:19 -06:00
Bryant G. Duffy-Ly
888ad8ec5c Add renameat2 unit test case
The goal of the test case is to have two mount points
with two async calls made to do renameat2. This allows
for two calls to race to call renameat2 RENAME_NOREPLACE.
When this happens you expect one of them to fail with a
-EEXIST. This would validate that the new flag works.
Essentially one of the two calls to renameat should hit the
new RENAME_NOREPLACE code and exit early.

Signed-off-by: Bryant G. Duffy-Ly <bduffyly@versity.com>
2021-11-19 17:54:13 -06:00
Bryant G. Duffy-Ly
16ea0ef671 Add syscall wrapper for renameat2
Signed-off-by: Bryant G. Duffy-Ly <bduffyly@versity.com>
2021-11-19 17:54:08 -06:00
Bryant G. Duffy-Ly
1b8e3f7c05 Add basic renameat2 syscall support
Support generic renameat2 syscall then add support for the
RENAME_NOREPLACE flag. To suppor the flag we need to check
the existance of both entries and return -EXIST.

Signed-off-by: Bryant G. Duffy-Ly <bduffyly@versity.com>
2021-11-19 17:54:02 -06:00