scoutfs/kmod/src/omap.h
Zach Brown d846eec5e8 Harden final inode deletion
We were seeing a number of problems coming from races that allowed tasks
in a mount to try and concurrently delete an inode's items.  We could
see error messages indicating that deletion failed with -ENOENT, we
could see users of inodes behave erratically as inodes were deleted from
under them, and we could see eventual server errors trying to merge
overlapping data extents which were "freed" (added to transaction lists)
multiple times.

This commit addresses the problems in one relatively large patch.  While
we could mechanically split up the fixes, they're all interdependent and
splitting them up (bisecting through them) could cause failures that
would be devilishly hard to diagnose.

First we stop allowing multiple cached vfs inodes.  This was initially
done to avoid deadlocks between lock invalidation and final inode
deletion.  We add a specific lookup that's used by invalidation which
ignores any inodes which are in I_NEW or I_FREEING.  Now that iget can
wait on inode flags we call iget5_locked before acquiring the cluster
lock.  This ensures that we can only have one cached vfs inode for a
given inode number in evict_inode trying to delete.

Now that we can only have one cached inode, we can rework the omap
tracking to use _set and _clear instead of _inc and _put.  This isn't
strictly necessary but is a simplification and lets us issue warnings if
we see that we ever try to set an inode number's bit on behalf of
multiple cached inodes.  We also add a _test helper.
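
The _set, _test, and _clear semantics described above can be sketched in
user space as a plain bitmap.  This is only an illustration under stated
assumptions: the sketch_ names, the fixed 1024 inode range, and the
choice to return -EEXIST on a double set are all hypothetical and are
not taken from the scoutfs implementation, which tracks bits per group
inside the kernel.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical range for illustration; not a scoutfs constant. */
#define SKETCH_NR_INOS 1024

static uint64_t sketch_bits[SKETCH_NR_INOS / 64];
unsigned int sketch_warnings;

/* Set the bit for ino, warning if a second cached inode tries to set it. */
int sketch_omap_set(uint64_t ino)
{
	uint64_t mask = 1ULL << (ino % 64);

	if (ino >= SKETCH_NR_INOS)
		return -EINVAL;

	if (sketch_bits[ino / 64] & mask) {
		sketch_warnings++;
		fprintf(stderr, "warning: ino %llu already set\n",
			(unsigned long long)ino);
		return -EEXIST;
	}

	sketch_bits[ino / 64] |= mask;
	return 0;
}

/* Report whether the bit for ino is currently set. */
bool sketch_omap_test(uint64_t ino)
{
	if (ino >= SKETCH_NR_INOS)
		return false;

	return (sketch_bits[ino / 64] >> (ino % 64)) & 1;
}

/* Clear the bit for ino; clearing an unset bit is a no-op here. */
void sketch_omap_clear(uint64_t ino)
{
	if (ino < SKETCH_NR_INOS)
		sketch_bits[ino / 64] &= ~(1ULL << (ino % 64));
}
```

With one bit per inode number instead of a count, a second set for the
same inode number is detectable, which is what makes the warning
possible.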

Orphan scanning would try to perform deletion by instantiating a cached
inode and then putting it, triggering eviction and final deletion.  This
was an attempt to simplify concurrency but ended up causing more
problems.  It no longer tries to interact with the inode cache at all
and attempts to safely delete inode items directly.  It uses the omap
test to determine that it should skip an already cached inode.

We had attempted to forbid opening inodes by handle if they had an nlink
of 0.  Since we allowed multiple cached inodes for an inode number this
was to prevent adding cached inodes that were being deleted.  It was
only performing the check on newly allocated inodes, though, so it could
get a reference to the cached inode that the scanner had inserted for
deleting.  We're choosing to keep restricting opening by handle to only
linked inodes so we also check existing inodes after they're refreshed.

We're left with a task evicting an inode and the orphan scanner racing
to delete an inode's items.  We move the work of determining if it's
safe to delete out of scoutfs_omap_should_delete() and into
try_delete_inode_items() which is called directly from eviction and
scanning.  This is mostly code motion but we do make three critical
changes.  We get rid of the goofy concurrent deletion detection in
delete_inode_items() and instead use a bit in the lock data to serialize
multiple attempts to delete an inode's items.  We no longer assume that
the inode must still be around because we were called from evict and
specifically check that the inode item is still present for deleting.
Finally, we use the omap test to discover that we shouldn't delete an
inode that is locally cached (and would not be included in the omap
response).  We do all this under the inode write lock to serialize
between mounts.
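
The three checks described above can be sketched as a small user space
model.  Everything here is an assumption made for illustration: the
sketch_ names, the struct layouts, the use of a C11 atomic_flag to stand
in for the bit in the lock data, and the remotely_open field standing in
for the omap response.  It does not reproduce try_delete_inode_items()
and it omits the inode write lock that the real code holds.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-lock data that carries the bit. */
struct sketch_lock_data {
	atomic_flag deleting;
};

/* Hypothetical view of what the deleting task can observe. */
struct sketch_inode_state {
	bool item_present;	/* the inode item still exists */
	bool locally_cached;	/* local omap test says it is cached */
	bool remotely_open;	/* omap response says another mount has it */
};

enum sketch_result { SKETCH_SKIPPED, SKETCH_DELETED };

enum sketch_result sketch_try_delete(struct sketch_lock_data *ld,
				     struct sketch_inode_state *st)
{
	enum sketch_result res = SKETCH_SKIPPED;

	/* Only one attempt at a time gets past the bit. */
	if (atomic_flag_test_and_set(&ld->deleting))
		return SKETCH_SKIPPED;

	/* Don't assume the item exists just because we were called. */
	if (st->item_present && !st->locally_cached && !st->remotely_open) {
		st->item_present = false;	/* models deleting the items */
		res = SKETCH_DELETED;
	}

	atomic_flag_clear(&ld->deleting);
	return res;
}
```

A second caller that arrives after a successful deletion finds the item
gone and skips, instead of failing with -ENOENT as in the races
described at the top of this message.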

Signed-off-by: Zach Brown <zab@versity.com>
2022-03-11 15:28:58 -08:00

24 lines
1014 B
C

#ifndef _SCOUTFS_OMAP_H_
#define _SCOUTFS_OMAP_H_

int scoutfs_omap_set(struct super_block *sb, u64 ino);
bool scoutfs_omap_test(struct super_block *sb, u64 ino);
void scoutfs_omap_clear(struct super_block *sb, u64 ino);
int scoutfs_omap_client_handle_request(struct super_block *sb, u64 id,
				       struct scoutfs_open_ino_map_args *args);
void scoutfs_omap_calc_group_nrs(u64 ino, u64 *group_nr, int *bit_nr);

int scoutfs_omap_add_rid(struct super_block *sb, u64 rid);
int scoutfs_omap_remove_rid(struct super_block *sb, u64 rid);
int scoutfs_omap_finished_recovery(struct super_block *sb);
int scoutfs_omap_server_handle_request(struct super_block *sb, u64 rid, u64 id,
				       struct scoutfs_open_ino_map_args *args);
int scoutfs_omap_server_handle_response(struct super_block *sb, u64 rid,
					struct scoutfs_open_ino_map *resp_map);
void scoutfs_omap_server_shutdown(struct super_block *sb);

int scoutfs_omap_setup(struct super_block *sb);
void scoutfs_omap_destroy(struct super_block *sb);

#endif
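
The signature of scoutfs_omap_calc_group_nrs() suggests that an inode
number is split into a group number and a bit number within that group.
A minimal user space sketch of that kind of split follows.  The
sketch_ name and the group size of 1024 bits are assumptions made for
illustration; the real constant and the body of the function are not
shown in this header.

```c
#include <stdint.h>

/* Hypothetical group size; the real value is defined elsewhere. */
#define SKETCH_GROUP_BITS 1024

/* Split an inode number into a group number and a bit in that group. */
void sketch_calc_group_nrs(uint64_t ino, uint64_t *group_nr, int *bit_nr)
{
	*group_nr = ino / SKETCH_GROUP_BITS;
	*bit_nr = (int)(ino % SKETCH_GROUP_BITS);
}
```

Under these assumptions inode number 2050 falls in group 2 at bit 2.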