Compare commits


19 Commits

Author SHA1 Message Date
Zach Brown
e095127ae9 v1.14 Release
Finish the release notes for the 1.14 release.

Signed-off-by: Zach Brown <zab@versity.com>
2023-06-29 10:03:53 -07:00
Zach Brown
a9da27444f Merge pull request #128 from versity/zab/prealloc_fragmentation
Zab/prealloc fragmentation
2023-06-29 09:57:32 -07:00
Zach Brown
49fe89741d Merge pull request #125 from versity/zab/get_referring_entries
Zab/get referring entries
2023-06-29 09:57:06 -07:00
Zach Brown
847916860d Advance move_blocks extent search offset
The move_blocks ioctl finds extents to move in the source file by
searching from the starting block offset of the region to move.
Logically, this is fine.  After each extent item is deleted the next
search will find the next extent.

The problem is that deleted items still exist in the item cache.  The
next iteration has to skip over all the deleted extents from the start
of the region.  This is fine with large extents, but with heavily
fragmented extents this creates a huge amplification of the number of
items to traverse when moving the fragmented extents in a large file.
(It's not quite O(n^2)/2 for the total extents, since deleted items are
purged as we write out the dirty items in each transaction, but it's
still immense.)

The fix is to simply start searching for the next extent after the one
we just moved.

Signed-off-by: Zach Brown <zab@versity.com>
2023-06-28 16:54:28 -07:00
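The amplification can be sketched with a toy model (hypothetical names, not the scoutfs item cache) that counts how many items each search strategy visits while "moving" eight single-block extents:

```c
#include <assert.h>

/* Toy model: array indices are block offsets, deleted[] marks
 * tombstones that a search still has to skip over, like deleted items
 * lingering in the item cache until the transaction is written. */
#define NR 8

/* Find the first live extent at or after 'from'; count items visited. */
static int next_extent(const int deleted[NR], int from, int *visited)
{
	int i;

	for (i = from; i < NR; i++) {
		(*visited)++;
		if (!deleted[i])
			return i;
	}
	return -1; /* like -ENOENT */
}

/* "Move" every extent, either restarting each search at the region
 * start (the old behavior) or just past the last moved extent. */
static int move_all(int restart_at_region_start, int *visited)
{
	int deleted[NR] = {0};
	int pos = 0;
	int i;

	*visited = 0;
	for (;;) {
		i = next_extent(deleted, restart_at_region_start ? 0 : pos,
				visited);
		if (i < 0)
			return 0;
		deleted[i] = 1;	/* moved extents leave deleted items */
		pos = i + 1;	/* the fix: resume after the moved extent */
	}
}
```

With eight extents the restarting walk visits 44 items while the advancing walk visits 8, and the gap grows quadratically with fragmentation.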
Zach Brown
564b942ead Write test for hole filling noncontig prealloc
Add a test which exercises filling holes in prealloc regions when the
_contig_only prealloc option is not set.

Signed-off-by: Zach Brown <zab@versity.com>
2023-06-28 16:16:04 -07:00
Zach Brown
3d99fda0f6 Preallocate data around iblock when noncontig
If the _contig_only option isn't set then we try to preallocate aligned
regions of files.  The initial implementation naively only allowed one
preallocation attempt in each aligned region.  If it got a small
allocation that didn't fill the region then every future allocation
in the region would be a single block.

This changes every preallocation in the region to attempt to fill the
hole in the region that iblock falls in.  It uses an extra extent search
(an item cache search) to try to avoid thousands of single-block
allocations.

Signed-off-by: Zach Brown <zab@versity.com>
2023-06-28 12:21:25 -07:00
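A minimal userspace sketch of the region math (an assumed simplification: REGION stands in for opts.data_prealloc_blocks and a bitmap stands in for the extent items):

```c
#include <assert.h>
#include <stdint.h>

#define REGION	8	/* stand-in for data_prealloc_blocks */
#define NBLK	32

/* Find the hole around iblock within its aligned region: trim the
 * start past any extent ending at or before iblock, and trim the end
 * back to the next extent after iblock. */
static void hole_around(const int used[NBLK], uint64_t iblock,
			uint64_t *start, uint64_t *count)
{
	uint64_t rstart = iblock - (iblock % REGION);
	uint64_t s = rstart;
	uint64_t e = rstart + REGION;
	uint64_t b;

	for (b = rstart; b <= iblock; b++)
		if (used[b])
			s = b + 1;
	for (b = iblock + 1; b < rstart + REGION; b++) {
		if (used[b]) {
			e = b;
			break;
		}
	}

	*start = s;
	*count = e - s;
}
```

Preallocating [start, start + count) fills the hole with one extent instead of leaving every later write in the region to allocate single blocks.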
Zach Brown
6c0ab75477 Merge pull request #126 from versity/zab/rht_block_shrink_deadlock
Avoid deadlock from block reclaim in rht resize
2023-06-16 10:30:16 -07:00
Zach Brown
89b238a5c4 Add more acceptable quorum delay during testing
Loaded VMs can see a few more seconds delay.

Signed-off-by: Zach Brown <zab@versity.com>
2023-06-16 09:38:58 -07:00
Zach Brown
05371b83f0 Update expected console messages during testing
Signed-off-by: Zach Brown <zab@versity.com>
2023-06-16 09:37:37 -07:00
Zach Brown
acafb869e7 Avoid deadlock from block reclaim in rht resize
The RCU hash table uses deferred work to resize the hash table.  There's
a time during resize when hash table iteration will return EAGAIN until
resize makes more progress.  During this time resize can perform
GFP_KERNEL allocations.

Our shrinker tries to iterate over its RCU hash table to find blocks to
reclaim.  It tries to restart iteration if it gets EAGAIN on the
assumption that it will be usable again soon.

Combine the two and our shrinker can get stuck retrying iteration
indefinitely because it's shrinking on behalf of the hash table resizing
that is trying to allocate the next table before making iteration work
again.  We have to stop shrinking in this case so that the resizing
caller can proceed.

Signed-off-by: Zach Brown <zab@versity.com>
2023-06-15 14:45:26 -07:00
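The shape of the fix can be modeled in a few lines (a toy walk function, not the rhashtable API): once the walk reports EAGAIN the shrinker returns -1 rather than retrying, so the allocation driving the resize can make progress:

```c
#include <assert.h>
#include <errno.h>

static int resizing;	/* stand-in for "table resize in progress" */

enum { WALK_OK, WALK_END };

static int walk_next(void)
{
	return resizing ? -EAGAIN : WALK_END;
}

static long shrink(void)
{
	long freed = 0;
	int ret;

	while ((ret = walk_next()) == WALK_OK)
		freed++;	/* reclaim a block */
	if (ret == -EAGAIN)
		return -1;	/* stop: we may be shrinking on behalf of
				 * the resize that blocks iteration */
	return freed;
}
```

Returning -1 is the conventional shrinker signal for "stop calling me"; the old code's goto-restart loop could spin here forever.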
Zach Brown
74c5fe1115 Add get-referring-entries test
Signed-off-by: Zach Brown <zab@versity.com>
2023-06-14 14:12:10 -07:00
Zach Brown
2279e9657f Add get_referring_entries scoutfs command
Add a cli command for the get_referring_entries ioctl.

Signed-off-by: Zach Brown <zab@versity.com>
2023-06-14 14:12:10 -07:00
Zach Brown
707752a7bf Add get_referring_entries ioctl
Add an ioctl that gives the caller all entries that refer to an inode.
It's like a backwards readdir.  It's a light bit of translation between
the internal _add_next_linkrefs() list of entries and the ioctl
interface of a buffer of entry structs.

Signed-off-by: Zach Brown <zab@versity.com>
2023-06-14 14:12:10 -07:00
Zach Brown
0316c22026 Extend scoutfs_dir_add_next_linkrefs
Extend scoutfs_dir_add_next_linkref() to be able to return multiple
backrefs under the lock for each call and have it take an argument to
limit the number of backrefs that can be added and returned.

Its return code changes a bit in that it returns 1 on success instead of
0 so we have to be a little careful with callers who were expecting 0.
It still returns -ENOENT when no entries are found.

We break up its tracepoint into one that records each entry added and
one that records the result of each call.

This will be used by an ioctl to give callers just the entries that
point to an inode instead of assembling full paths from the root.

Signed-off-by: Zach Brown <zab@versity.com>
2023-06-14 14:12:10 -07:00
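The new return convention (a positive count, -ENOENT for none, never 0) can be illustrated with a toy search over an array standing in for the backref items:

```c
#include <assert.h>
#include <errno.h>

/* Toy model of the return convention only; the array stands in for
 * the backref items, not the real locked item search. */
static int add_next_linkrefs(const int backrefs[], int total,
			     int from, int count)
{
	int nr = 0;
	int i;

	for (i = from; i < total && nr < count; i++)
		if (backrefs[i])
			nr++;

	/* 1..count on success, -ENOENT when nothing was found */
	return nr ? nr : -ENOENT;
}
```

Callers that used to test `ret == 0` for success now have to test `ret > 0`, which is the care the message warns about.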
Zach Brown
5a1e5639c2 Merge pull request #124 from versity/zab/fix_quo_hb_mount_option
Zab/fix quo hb mount option
2023-06-07 10:50:32 -07:00
Zach Brown
950963375b Update quorum heartbeat test for mount option
Update the quorum_heartbeat_timeout_ms test to also test the mount
option, not just updating the timeout via sysfs.  This takes some
reworking as we have to avoid the active leader/server when setting the
timeout via the mount option.  We also allow for a bit more slack around
comparing kernel sleeps and userspace wall clocks.

Signed-off-by: Zach Brown <zab@versity.com>
2023-05-23 09:57:13 -07:00
Zach Brown
e52435b993 Add t_mount_opt
Add a test helper that mounts with a mount option.

Signed-off-by: Zach Brown <zab@versity.com>
2023-05-22 16:30:01 -07:00
Zach Brown
2b72c57cb0 Fix crash in quorum_heartbeat_timeout_ms parsing
Mount option parsing runs early enough that the rest of the option
read/write serialization infrastructure isn't set up yet.  The
quorum_heartbeat_timeout_ms mount option tried to use a helper that
updated the stored option, but that storage wasn't initialized yet, so
it crashed.

The helper really only existed to keep the option validity test in one
place.  It's reworked to only verify the option; the actual setting is
left to the callers.

Signed-off-by: Zach Brown <zab@versity.com>
2023-05-22 16:29:56 -07:00
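The split can be sketched as a pure verification helper (the minimum bound here is a hypothetical stand-in, not the option's real valid range); storing the value under the seqlock is left to callers that know the lock is initialized:

```c
#include <assert.h>
#include <errno.h>

#define HYPOTHETICAL_MIN_MS 100ULL	/* illustration only */

/* Verify only; no stored state is touched, so this is safe to call
 * from early mount option parsing before the seqlock exists. */
static int verify_timeout_ms(int parse_ret, unsigned long long val)
{
	if (parse_ret < 0)
		return -EINVAL;		/* the string didn't parse */
	if (val < HYPOTHETICAL_MIN_MS)
		return -EINVAL;		/* out of range */
	return 0;			/* caller stores val itself */
}
```

The sysfs store path then takes the seqlock and writes the value only after verification succeeds, as the diff below shows.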
Zach Brown
9c67b2a42d Merge pull request #122 from versity/zab/v1.13
v1.13 Release
2023-05-19 11:38:48 -07:00
22 changed files with 1198 additions and 129 deletions

View File

@@ -1,6 +1,26 @@
Versity ScoutFS Release Notes
=============================
---
v1.14
\
*Jun 29, 2023*
Add get\_referring\_entries ioctl for getting directory entries that
refer to an inode.
Fix excessive CPU use in the move\_blocks interface when moving a large
number of extents.
Reduce fragmented data allocation when contig\_only prealloc is not in
use by more consistently allocating multi-block extents within each
aligned prealloc region.
Avoid rare deadlock in metadata block cache reclaim under both heavy
load and memory pressure.
Fix crash when using quorum\_heartbeat\_timeout\_ms mount option.
---
v1.13
\

View File

@@ -1096,6 +1096,7 @@ static int block_shrink(struct shrinker *shrink, struct shrink_control *sc)
struct super_block *sb = binf->sb;
struct rhashtable_iter iter;
struct block_private *bp;
bool stop = false;
unsigned long nr;
u64 recently;
@@ -1107,7 +1108,6 @@ static int block_shrink(struct shrinker *shrink, struct shrink_control *sc)
nr = DIV_ROUND_UP(nr, SCOUTFS_BLOCK_LG_PAGES_PER);
restart:
recently = accessed_recently(binf);
rhashtable_walk_enter(&binf->ht, &iter);
rhashtable_walk_start(&iter);
@@ -1129,12 +1129,15 @@ restart:
if (bp == NULL)
break;
if (bp == ERR_PTR(-EAGAIN)) {
/* hard exit to wait for rcu rebalance to finish */
rhashtable_walk_stop(&iter);
rhashtable_walk_exit(&iter);
scoutfs_inc_counter(sb, block_cache_shrink_restart);
synchronize_rcu();
goto restart;
/*
* We can be called from reclaim in the allocation
* to resize the hash table itself. We have to
* return so that the caller can proceed and
* enable hash table iteration again.
*/
scoutfs_inc_counter(sb, block_cache_shrink_stop);
stop = true;
break;
}
scoutfs_inc_counter(sb, block_cache_shrink_next);
@@ -1157,8 +1160,11 @@ restart:
rhashtable_walk_stop(&iter);
rhashtable_walk_exit(&iter);
out:
return min_t(u64, (u64)atomic_read(&binf->total_inserted) * SCOUTFS_BLOCK_LG_PAGES_PER,
INT_MAX);
if (stop)
return -1;
else
return min_t(u64, INT_MAX,
(u64)atomic_read(&binf->total_inserted) * SCOUTFS_BLOCK_LG_PAGES_PER);
}
struct sm_block_completion {

View File

@@ -34,7 +34,7 @@
EXPAND_COUNTER(block_cache_shrink_next) \
EXPAND_COUNTER(block_cache_shrink_recent) \
EXPAND_COUNTER(block_cache_shrink_remove) \
EXPAND_COUNTER(block_cache_shrink_restart) \
EXPAND_COUNTER(block_cache_shrink_stop) \
EXPAND_COUNTER(btree_compact_values) \
EXPAND_COUNTER(btree_compact_values_enomem) \
EXPAND_COUNTER(btree_delete) \

View File

@@ -456,11 +456,13 @@ static int alloc_block(struct super_block *sb, struct inode *inode,
} else {
/*
* Preallocation of aligned regions only preallocates if
* the aligned region contains no extents at all. This
* could be fooled by offline sparse extents but we
* don't want to iterate over all offline extents in the
* aligned region.
* Preallocation within aligned regions tries to
* allocate an extent to fill the hole in the region
* that contains iblock. We search for a next extent
* from the start of the region. If it's at the start
* we might have to search again to find an existing
* extent at the end of the region. (This next could be
* given to us by the caller).
*/
div64_u64_rem(iblock, opts.data_prealloc_blocks, &rem);
start = iblock - rem;
@@ -468,8 +470,20 @@ static int alloc_block(struct super_block *sb, struct inode *inode,
ret = scoutfs_ext_next(sb, &data_ext_ops, &args, start, 1, &found);
if (ret < 0 && ret != -ENOENT)
goto out;
if (found.len && found.start < start + count)
count = 1;
/* trim count if there's an extent in the region before iblock */
if (found.len && found.start < iblock) {
count -= (found.start + found.len) - start;
start = found.start + found.len;
/* see if there's also an extent after iblock */
ret = scoutfs_ext_next(sb, &data_ext_ops, &args, iblock, 1, &found);
if (ret < 0 && ret != -ENOENT)
goto out;
}
/* trim count by a next extent in the region */
if (found.len && found.start > start && found.start < start + count)
count = (found.start - start);
}
/* overall prealloc limit */
@@ -1253,6 +1267,7 @@ int scoutfs_data_move_blocks(struct inode *from, u64 from_off,
from_iblock = from_off >> SCOUTFS_BLOCK_SM_SHIFT;
count = (byte_len + SCOUTFS_BLOCK_SM_MASK) >> SCOUTFS_BLOCK_SM_SHIFT;
to_iblock = to_off >> SCOUTFS_BLOCK_SM_SHIFT;
from_start = from_iblock;
/* only move extent blocks inside i_size, careful not to wrap */
from_size = i_size_read(from);
@@ -1329,7 +1344,7 @@ int scoutfs_data_move_blocks(struct inode *from, u64 from_off,
/* find the next extent to move */
ret = scoutfs_ext_next(sb, &data_ext_ops, &from_args,
from_iblock, 1, &ext);
from_start, 1, &ext);
if (ret < 0) {
if (ret == -ENOENT) {
done = true;
@@ -1417,6 +1432,12 @@ int scoutfs_data_move_blocks(struct inode *from, u64 from_off,
i_size_read(from);
i_size_write(to, to_size);
}
/* find next after moved extent, avoiding wrapping */
if (from_start + len < from_start)
from_start = from_iblock + count + 1;
else
from_start += len;
}

View File

@@ -1253,75 +1253,93 @@ int scoutfs_symlink_drop(struct super_block *sb, u64 ino,
}
/*
* Find the next link backref key for the given ino starting from the
* given dir inode and final entry position. If we find a backref item
* we add an allocated copy of it to the head of the caller's list.
* Find the next link backref items for the given ino starting from the
* given dir inode and final entry position. For each backref item we
* add an allocated copy of it to the head of the caller's list.
*
* Returns 0 if we added an entry, -ENOENT if we didn't, and -errno for
* search errors.
* Callers who are building a path can add one entry for each parent.
* They're left with a list of entries from the root down in list order.
*
* Callers who are gathering multiple entries for one inode get the
* entries in the opposite order that their items are found.
*
* Returns +ve for number of entries added, -ENOENT if no entries were
* found, or -errno on error. It weirdly won't return 0, but early
* callers preferred -ENOENT so we use that for the case of no entries.
*
* Callers are comfortable with the race inherent to incrementally
* building up a path with individual locked backref item lookups.
* gathering backrefs across multiple lock acquisitions.
*/
int scoutfs_dir_add_next_linkref(struct super_block *sb, u64 ino,
u64 dir_ino, u64 dir_pos,
struct list_head *list)
int scoutfs_dir_add_next_linkrefs(struct super_block *sb, u64 ino, u64 dir_ino, u64 dir_pos,
int count, struct list_head *list)
{
struct scoutfs_link_backref_entry *prev_ent = NULL;
struct scoutfs_link_backref_entry *ent = NULL;
struct scoutfs_lock *lock = NULL;
struct scoutfs_key last_key;
struct scoutfs_key key;
int nr = 0;
int len;
int ret;
ent = kmalloc(offsetof(struct scoutfs_link_backref_entry,
dent.name[SCOUTFS_NAME_LEN]), GFP_KERNEL);
if (!ent) {
ret = -ENOMEM;
goto out;
}
INIT_LIST_HEAD(&ent->head);
init_dirent_key(&key, SCOUTFS_LINK_BACKREF_TYPE, ino, dir_ino, dir_pos);
init_dirent_key(&last_key, SCOUTFS_LINK_BACKREF_TYPE, ino, U64_MAX,
U64_MAX);
init_dirent_key(&last_key, SCOUTFS_LINK_BACKREF_TYPE, ino, U64_MAX, U64_MAX);
ret = scoutfs_lock_ino(sb, SCOUTFS_LOCK_READ, 0, ino, &lock);
if (ret)
goto out;
ret = scoutfs_item_next(sb, &key, &last_key, &ent->dent,
dirent_bytes(SCOUTFS_NAME_LEN), lock);
scoutfs_unlock(sb, lock, SCOUTFS_LOCK_READ);
lock = NULL;
if (ret < 0)
goto out;
while (nr < count) {
ent = kmalloc(offsetof(struct scoutfs_link_backref_entry,
dent.name[SCOUTFS_NAME_LEN]), GFP_NOFS);
if (!ent) {
ret = -ENOMEM;
goto out;
}
len = ret - sizeof(struct scoutfs_dirent);
if (len < 1 || len > SCOUTFS_NAME_LEN) {
scoutfs_corruption(sb, SC_DIRENT_BACKREF_NAME_LEN,
corrupt_dirent_backref_name_len,
"ino %llu dir_ino %llu pos %llu key "SK_FMT" len %d",
ino, dir_ino, dir_pos, SK_ARG(&key), len);
ret = -EIO;
goto out;
INIT_LIST_HEAD(&ent->head);
ret = scoutfs_item_next(sb, &key, &last_key, &ent->dent,
dirent_bytes(SCOUTFS_NAME_LEN), lock);
if (ret < 0) {
if (ret == -ENOENT && prev_ent)
prev_ent->last = true;
goto out;
}
len = ret - sizeof(struct scoutfs_dirent);
if (len < 1 || len > SCOUTFS_NAME_LEN) {
scoutfs_corruption(sb, SC_DIRENT_BACKREF_NAME_LEN,
corrupt_dirent_backref_name_len,
"ino %llu dir_ino %llu pos %llu key "SK_FMT" len %d",
ino, dir_ino, dir_pos, SK_ARG(&key), len);
ret = -EIO;
goto out;
}
ent->dir_ino = le64_to_cpu(key.skd_major);
ent->dir_pos = le64_to_cpu(key.skd_minor);
ent->name_len = len;
ent->d_type = dentry_type(ent->dent.type);
ent->last = false;
trace_scoutfs_dir_add_next_linkref_found(sb, ino, ent->dir_ino, ent->dir_pos,
ent->name_len);
list_add(&ent->head, list);
prev_ent = ent;
ent = NULL;
nr++;
scoutfs_key_inc(&key);
}
list_add(&ent->head, list);
ent->dir_ino = le64_to_cpu(key.skd_major);
ent->dir_pos = le64_to_cpu(key.skd_minor);
ent->name_len = len;
ret = 0;
out:
trace_scoutfs_dir_add_next_linkref(sb, ino, dir_ino, dir_pos, ret,
ent ? ent->dir_ino : 0,
ent ? ent->dir_pos : 0,
ent ? ent->name_len : 0);
scoutfs_unlock(sb, lock, SCOUTFS_LOCK_READ);
trace_scoutfs_dir_add_next_linkrefs(sb, ino, dir_ino, dir_pos, count, nr, ret);
if (ent && list_empty(&ent->head))
kfree(ent);
return ret;
kfree(ent);
return nr ?: ret;
}
static u64 first_backref_dir_ino(struct list_head *list)
@@ -1396,7 +1414,7 @@ retry:
}
/* get the next link name to the given inode */
ret = scoutfs_dir_add_next_linkref(sb, ino, dir_ino, dir_pos, list);
ret = scoutfs_dir_add_next_linkrefs(sb, ino, dir_ino, dir_pos, 1, list);
if (ret < 0)
goto out;
@@ -1404,7 +1422,7 @@ retry:
par_ino = first_backref_dir_ino(list);
while (par_ino != SCOUTFS_ROOT_INO) {
ret = scoutfs_dir_add_next_linkref(sb, par_ino, 0, 0, list);
ret = scoutfs_dir_add_next_linkrefs(sb, par_ino, 0, 0, 1, list);
if (ret < 0) {
if (ret == -ENOENT) {
/* restart if there was no parent component */
@@ -1416,6 +1434,8 @@ retry:
par_ino = first_backref_dir_ino(list);
}
ret = 0;
out:
if (ret < 0)
scoutfs_dir_free_backref_path(sb, list);

View File

@@ -15,6 +15,8 @@ struct scoutfs_link_backref_entry {
u64 dir_ino;
u64 dir_pos;
u16 name_len;
u8 d_type;
bool last;
struct scoutfs_dirent dent;
/* the full name is allocated and stored in dent.name[] */
};
@@ -24,9 +26,8 @@ int scoutfs_dir_get_backref_path(struct super_block *sb, u64 ino, u64 dir_ino,
void scoutfs_dir_free_backref_path(struct super_block *sb,
struct list_head *list);
int scoutfs_dir_add_next_linkref(struct super_block *sb, u64 ino,
u64 dir_ino, u64 dir_pos,
struct list_head *list);
int scoutfs_dir_add_next_linkrefs(struct super_block *sb, u64 ino, u64 dir_ino, u64 dir_pos,
int count, struct list_head *list);
int scoutfs_symlink_drop(struct super_block *sb, u64 ino,
struct scoutfs_lock *lock, u64 i_size);

View File

@@ -114,8 +114,8 @@ static struct dentry *scoutfs_get_parent(struct dentry *child)
int ret;
u64 ino;
ret = scoutfs_dir_add_next_linkref(sb, scoutfs_ino(inode), 0, 0, &list);
if (ret)
ret = scoutfs_dir_add_next_linkrefs(sb, scoutfs_ino(inode), 0, 0, 1, &list);
if (ret < 0)
return ERR_PTR(ret);
ent = list_first_entry(&list, struct scoutfs_link_backref_entry, head);
@@ -138,9 +138,9 @@ static int scoutfs_get_name(struct dentry *parent, char *name,
LIST_HEAD(list);
int ret;
ret = scoutfs_dir_add_next_linkref(sb, scoutfs_ino(inode), dir_ino,
0, &list);
if (ret)
ret = scoutfs_dir_add_next_linkrefs(sb, scoutfs_ino(inode), dir_ino,
0, 1, &list);
if (ret < 0)
return ret;
ret = -ENOENT;

View File

@@ -1398,6 +1398,110 @@ out:
return ret ?: nr;
}
/*
* Copy entries that point to an inode to the user's buffer. We copy to
* userspace from copies of the entries that are acquired under a lock
* so that we don't fault while holding cluster locks. It also gives us
* a chance to limit the amount of work under each lock hold.
*/
static long scoutfs_ioc_get_referring_entries(struct file *file, unsigned long arg)
{
struct super_block *sb = file_inode(file)->i_sb;
struct scoutfs_ioctl_get_referring_entries gre;
struct scoutfs_link_backref_entry *bref = NULL;
struct scoutfs_link_backref_entry *bref_tmp;
struct scoutfs_ioctl_dirent __user *uent;
struct scoutfs_ioctl_dirent ent;
LIST_HEAD(list);
u64 copied;
int name_len;
int bytes;
long nr;
int ret;
if (!capable(CAP_DAC_READ_SEARCH))
return -EPERM;
if (copy_from_user(&gre, (void __user *)arg, sizeof(gre)))
return -EFAULT;
uent = (void __user *)(unsigned long)gre.entries_ptr;
copied = 0;
nr = 0;
/* use entry as cursor between calls */
ent.dir_ino = gre.dir_ino;
ent.dir_pos = gre.dir_pos;
for (;;) {
ret = scoutfs_dir_add_next_linkrefs(sb, gre.ino, ent.dir_ino, ent.dir_pos, 1024,
&list);
if (ret < 0) {
if (ret == -ENOENT)
ret = 0;
goto out;
}
/* _add_next adds each entry to the head, _reverse for key order */
list_for_each_entry_safe_reverse(bref, bref_tmp, &list, head) {
list_del_init(&bref->head);
name_len = bref->name_len;
bytes = ALIGN(offsetof(struct scoutfs_ioctl_dirent, name[name_len + 1]),
16);
if (copied + bytes > gre.entries_bytes) {
ret = -EINVAL;
goto out;
}
ent.dir_ino = bref->dir_ino;
ent.dir_pos = bref->dir_pos;
ent.ino = gre.ino;
ent.entry_bytes = bytes;
ent.flags = bref->last ? SCOUTFS_IOCTL_DIRENT_FLAG_LAST : 0;
ent.d_type = bref->d_type;
ent.name_len = name_len;
if (copy_to_user(uent, &ent, sizeof(struct scoutfs_ioctl_dirent)) ||
copy_to_user(&uent->name[0], bref->dent.name, name_len) ||
put_user('\0', &uent->name[name_len])) {
ret = -EFAULT;
goto out;
}
kfree(bref);
bref = NULL;
uent = (void __user *)uent + bytes;
copied += bytes;
nr++;
if (nr == LONG_MAX || (ent.flags & SCOUTFS_IOCTL_DIRENT_FLAG_LAST)) {
ret = 0;
goto out;
}
}
/* advance cursor pos from last copied entry */
if (++ent.dir_pos == 0) {
if (++ent.dir_ino == 0) {
ret = 0;
goto out;
}
}
}
ret = 0;
out:
kfree(bref);
list_for_each_entry_safe(bref, bref_tmp, &list, head) {
list_del_init(&bref->head);
kfree(bref);
}
return nr ?: ret;
}
long scoutfs_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
{
switch (cmd) {
@@ -1433,6 +1537,8 @@ long scoutfs_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
return scoutfs_ioc_read_xattr_totals(file, arg);
case SCOUTFS_IOC_GET_ALLOCATED_INOS:
return scoutfs_ioc_get_allocated_inos(file, arg);
case SCOUTFS_IOC_GET_REFERRING_ENTRIES:
return scoutfs_ioc_get_referring_entries(file, arg);
}
return -ENOTTY;

View File

@@ -559,4 +559,118 @@ struct scoutfs_ioctl_get_allocated_inos {
#define SCOUTFS_IOC_GET_ALLOCATED_INOS \
_IOW(SCOUTFS_IOCTL_MAGIC, 16, struct scoutfs_ioctl_get_allocated_inos)
/*
* Get directory entries that refer to a specific inode.
*
* @ino: The target ino that we're finding referring entries to.
* Constant across all the calls that make up an iteration over all the
* inode's entries.
*
* @dir_ino: The inode number of a directory containing the entry to our
* inode to search from. If this parent directory contains no more
* entries to our inode then we'll search through other parent directory
* inodes in inode order.
*
* @dir_pos: The position in the dir_ino parent directory of the entry
* to our inode to search from. If there is no entry at this position
* then we'll search through other entry positions in increasing order.
* If we exhaust the parent directory then we'll search through
* additional parent directories in inode order.
*
* @entries_ptr: A pointer to the buffer where found entries will be
* stored. The pointer must be aligned to 16 bytes.
*
* @entries_bytes: The size of the buffer that will contain entries.
*
* To start iterating set the desired target ino, dir_ino to 0, dir_pos
* to 0, and set entries_ptr and entries_bytes to a sufficiently large
* buffer.
* Each entry struct that's stored in the buffer adds some overhead so a
* large multiple of the largest possible name is a reasonable choice.
* (A few multiples of PATH_MAX perhaps.)
*
* Each call returns the total number of entries that were stored in the
* entries buffer. Zero is returned when the search was successful and
* no referring entries were found. The entries can be iterated over by
* advancing each starting struct offset by the total number of bytes in
* each entry. If the _LAST flag is set on an entry then there were no
* more entries referring to the inode at the time of the call and
* iteration can be stopped.
*
* To resume iteration set the next call's starting dir_ino and dir_pos
* to one past the last entry seen. Increment the last entry's dir_pos,
* and if it wrapped to 0, increment its dir_ino.
*
* This does not check that the caller has permission to read the
* entries found in each containing directory. It requires
* CAP_DAC_READ_SEARCH which bypasses path traversal permissions
* checking.
*
* Entries returned by a single call can reflect any combination of
* racing creation and removal of entries. Each entry existed at the
* time it was read though it may have changed in the time it took to
* return from the call. The set of entries returned may no longer
* reflect the current set of entries and may not have existed at the
* same time.
*
* This has no knowledge of the life cycle of the inode. It can return
* 0 when there are no referring entries because either the target inode
* doesn't exist, it is in the process of being deleted, or because it
* is still open while being unlinked.
*
* On success this returns the number of entries filled in the buffer.
* A return of 0 indicates that no entries referred to the inode.
*
* EINVAL is returned when there is a problem with the buffer. Either
* it was not aligned or it was not large enough for the first entry.
*
* Many other errnos indicate hard failure to find the next entry.
*/
struct scoutfs_ioctl_get_referring_entries {
__u64 ino;
__u64 dir_ino;
__u64 dir_pos;
__u64 entries_ptr;
__u64 entries_bytes;
};
/*
* @dir_ino: The inode of the directory containing the entry.
*
* @dir_pos: The readdir f_pos position of the entry within the
* directory.
*
* @ino: The inode number of the target of the entry.
*
* @flags: Flags associated with this entry.
*
* @d_type: Inode type as specified with DT_ enum values in readdir(3).
*
* @entry_bytes: The total bytes taken by the entry in memory, including
* the name and any alignment padding. The start of a following entry
* will be found after this number of bytes.
*
* @name_len: The number of bytes in the name not including the trailing
* null, ala strlen(3).
*
* @name: The null terminated name of the referring entry.  In the
* struct definition this array is sized to naturally align the struct.
* That number of padding bytes is not necessarily found in the buffer
* returned by _get_referring_entries.
*/
struct scoutfs_ioctl_dirent {
__u64 dir_ino;
__u64 dir_pos;
__u64 ino;
__u16 entry_bytes;
__u8 flags;
__u8 d_type;
__u8 name_len;
__u8 name[3];
};
#define SCOUTFS_IOCTL_DIRENT_FLAG_LAST (1 << 0)
#define SCOUTFS_IOC_GET_REFERRING_ENTRIES \
_IOW(SCOUTFS_IOCTL_MAGIC, 17, struct scoutfs_ioctl_get_referring_entries)
#endif
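The buffer walk and cursor rules above can be sketched from a userspace caller's point of view (the struct layout is copied from this header; the helper names and buffer contents are illustrative, not part of the interface):

```c
#include <assert.h>
#include <stdint.h>

struct scoutfs_ioctl_dirent {
	uint64_t dir_ino;
	uint64_t dir_pos;
	uint64_t ino;
	uint16_t entry_bytes;
	uint8_t flags;
	uint8_t d_type;
	uint8_t name_len;
	uint8_t name[3];
};

/* The next entry starts entry_bytes after this one. */
static struct scoutfs_ioctl_dirent *
next_ent(struct scoutfs_ioctl_dirent *ent)
{
	return (struct scoutfs_ioctl_dirent *)
		((char *)ent + ent->entry_bytes);
}

/* Resume one past the last entry seen: bump dir_pos, and carry a
 * wrap to 0 into dir_ino, as the comment above describes. */
static void advance_cursor(const struct scoutfs_ioctl_dirent *last,
			   uint64_t *dir_ino, uint64_t *dir_pos)
{
	*dir_ino = last->dir_ino;
	*dir_pos = last->dir_pos + 1;
	if (*dir_pos == 0)
		(*dir_ino)++;
}
```

A caller loops: issue the ioctl, walk entries with next_ent() until the returned count is consumed, stop if an entry has the _LAST flag, otherwise advance_cursor() from the last entry and call again.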

View File

@@ -131,10 +131,8 @@ static void init_default_options(struct scoutfs_mount_options *opts)
opts->quorum_slot_nr = -1;
}
static int set_quorum_heartbeat_timeout_ms(struct super_block *sb, int ret, u64 val)
static int verify_quorum_heartbeat_timeout_ms(struct super_block *sb, int ret, u64 val)
{
DECLARE_OPTIONS_INFO(sb, optinf);
if (ret < 0) {
scoutfs_err(sb, "failed to parse quorum_heartbeat_timeout_ms value");
return -EINVAL;
@@ -145,10 +143,6 @@ static int set_quorum_heartbeat_timeout_ms(struct super_block *sb, int ret, u64
return -EINVAL;
}
write_seqlock(&optinf->seqlock);
optinf->opts.quorum_heartbeat_timeout_ms = val;
write_sequnlock(&optinf->seqlock);
return 0;
}
@@ -232,9 +226,10 @@ static int parse_options(struct super_block *sb, char *options, struct scoutfs_m
case Opt_quorum_heartbeat_timeout_ms:
ret = match_u64(args, &nr64);
ret = set_quorum_heartbeat_timeout_ms(sb, ret, nr64);
ret = verify_quorum_heartbeat_timeout_ms(sb, ret, nr64);
if (ret < 0)
return ret;
opts->quorum_heartbeat_timeout_ms = nr64;
break;
case Opt_quorum_slot_nr:
@@ -493,6 +488,7 @@ static ssize_t quorum_heartbeat_timeout_ms_store(struct kobject *kobj, struct ko
const char *buf, size_t count)
{
struct super_block *sb = SCOUTFS_SYSFS_ATTRS_SB(kobj);
DECLARE_OPTIONS_INFO(sb, optinf);
char nullterm[30]; /* more than enough for octal -U64_MAX */
u64 val;
int len;
@@ -503,9 +499,13 @@ static ssize_t quorum_heartbeat_timeout_ms_store(struct kobject *kobj, struct ko
nullterm[len] = '\0';
ret = kstrtoll(nullterm, 0, &val);
ret = set_quorum_heartbeat_timeout_ms(sb, ret, val);
if (ret == 0)
ret = verify_quorum_heartbeat_timeout_ms(sb, ret, val);
if (ret == 0) {
write_seqlock(&optinf->seqlock);
optinf->opts.quorum_heartbeat_timeout_ms = val;
write_sequnlock(&optinf->seqlock);
ret = count;
}
return ret;
}

View File

@@ -817,22 +817,17 @@ TRACE_EVENT(scoutfs_advance_dirty_super,
TP_printk(SCSBF" super seq now %llu", SCSB_TRACE_ARGS, __entry->seq)
);
TRACE_EVENT(scoutfs_dir_add_next_linkref,
TRACE_EVENT(scoutfs_dir_add_next_linkref_found,
TP_PROTO(struct super_block *sb, __u64 ino, __u64 dir_ino,
__u64 dir_pos, int ret, __u64 found_dir_ino,
__u64 found_dir_pos, unsigned int name_len),
__u64 dir_pos, unsigned int name_len),
TP_ARGS(sb, ino, dir_ino, dir_pos, ret, found_dir_pos, found_dir_ino,
name_len),
TP_ARGS(sb, ino, dir_ino, dir_pos, name_len),
TP_STRUCT__entry(
SCSB_TRACE_FIELDS
__field(__u64, ino)
__field(__u64, dir_ino)
__field(__u64, dir_pos)
__field(int, ret)
__field(__u64, found_dir_ino)
__field(__u64, found_dir_pos)
__field(unsigned int, name_len)
),
@@ -841,16 +836,43 @@ TRACE_EVENT(scoutfs_dir_add_next_linkref,
__entry->ino = ino;
__entry->dir_ino = dir_ino;
__entry->dir_pos = dir_pos;
__entry->ret = ret;
__entry->found_dir_ino = dir_ino;
__entry->found_dir_pos = dir_pos;
__entry->name_len = name_len;
),
TP_printk(SCSBF" ino %llu dir_ino %llu dir_pos %llu ret %d found_dir_ino %llu found_dir_pos %llu name_len %u",
SCSB_TRACE_ARGS, __entry->ino, __entry->dir_pos,
__entry->dir_ino, __entry->ret, __entry->found_dir_pos,
__entry->found_dir_ino, __entry->name_len)
TP_printk(SCSBF" ino %llu dir_ino %llu dir_pos %llu name_len %u",
SCSB_TRACE_ARGS, __entry->ino, __entry->dir_ino,
__entry->dir_pos, __entry->name_len)
);
TRACE_EVENT(scoutfs_dir_add_next_linkrefs,
TP_PROTO(struct super_block *sb, __u64 ino, __u64 dir_ino,
__u64 dir_pos, int count, int nr, int ret),
TP_ARGS(sb, ino, dir_ino, dir_pos, count, nr, ret),
TP_STRUCT__entry(
SCSB_TRACE_FIELDS
__field(__u64, ino)
__field(__u64, dir_ino)
__field(__u64, dir_pos)
__field(int, count)
__field(int, nr)
__field(int, ret)
),
TP_fast_assign(
SCSB_TRACE_ASSIGN(sb);
__entry->ino = ino;
__entry->dir_ino = dir_ino;
__entry->dir_pos = dir_pos;
__entry->count = count;
__entry->nr = nr;
__entry->ret = ret;
),
TP_printk(SCSBF" ino %llu dir_ino %llu dir_pos %llu count %d nr %d ret %d",
SCSB_TRACE_ARGS, __entry->ino, __entry->dir_ino,
__entry->dir_pos, __entry->count, __entry->nr, __entry->ret)
);
TRACE_EVENT(scoutfs_write_begin,

View File

@@ -18,6 +18,7 @@ t_filter_dmesg()
# the kernel can just be noisy
re=" used greatest stack depth: "
re="$re|sched: RT throttling activated"
# mkfs/mount checks partition tables
re="$re|unknown partition table"

View File

@@ -153,7 +153,27 @@ t_mount()
test "$nr" -lt "$T_NR_MOUNTS" || \
t_fail "fs nr $nr invalid"
eval t_quiet mount -t scoutfs \$T_O$nr \$T_DB$nr \$T_M$nr
eval t_quiet mount -t scoutfs \$T_O$nr\$opt \$T_DB$nr \$T_M$nr
}
#
# Mount with an optional mount option string. If the string is empty
# then the saved mount options are used. If the string has contents
# then it is appended to the end of the saved options with a separating
# comma.
#
# Unlike t_mount, this doesn't inherently fail via t_quiet; errors are
# returned so that bad options can be tested.
#
t_mount_opt()
{
local nr="$1"
local opt="${2:+,$2}"
test "$nr" -lt "$T_NR_MOUNTS" || \
t_fail "fs nr $nr invalid"
eval mount -t scoutfs \$T_O$nr\$opt \$T_DB$nr \$T_M$nr
}
t_umount()

View File

@@ -24,3 +24,325 @@
/mnt/test/test/data-prealloc/file-2: 5 extents found
/mnt/test/test/data-prealloc/file-1: 3 extents found
/mnt/test/test/data-prealloc/file-2: 3 extents found
== block writes into region allocs hole
wrote blk 24
wrote blk 32
wrote blk 40
wrote blk 55
wrote blk 63
wrote blk 71
wrote blk 72
wrote blk 79
wrote blk 80
wrote blk 87
wrote blk 88
wrote blk 95
before:
24.. 1:
32.. 1:
40.. 1:
55.. 1:
63.. 1:
71.. 2:
79.. 2:
87.. 2:
95.. 1: eof
writing into existing 0 at pos 0
wrote blk 0
0.. 1:
1.. 7: unwritten
24.. 1:
32.. 1:
40.. 1:
55.. 1:
63.. 1:
71.. 2:
79.. 2:
87.. 2:
95.. 1: eof
writing into existing 0 at pos 1
wrote blk 15
0.. 1:
1.. 14: unwritten
15.. 1:
24.. 1:
32.. 1:
40.. 1:
55.. 1:
63.. 1:
71.. 2:
79.. 2:
87.. 2:
95.. 1: eof
writing into existing 0 at pos 2
wrote blk 19
0.. 1:
1.. 14: unwritten
15.. 1:
16.. 3: unwritten
19.. 1:
20.. 4: unwritten
24.. 1:
32.. 1:
40.. 1:
55.. 1:
63.. 1:
71.. 2:
79.. 2:
87.. 2:
95.. 1: eof
writing into existing 1 at pos 0
wrote blk 25
0.. 1:
1.. 14: unwritten
15.. 1:
16.. 3: unwritten
19.. 1:
20.. 4: unwritten
24.. 1:
25.. 1:
26.. 6: unwritten
32.. 1:
40.. 1:
55.. 1:
63.. 1:
71.. 2:
79.. 2:
87.. 2:
95.. 1: eof
writing into existing 1 at pos 1
wrote blk 39
0.. 1:
1.. 14: unwritten
15.. 1:
16.. 3: unwritten
19.. 1:
20.. 4: unwritten
24.. 1:
25.. 1:
26.. 6: unwritten
32.. 1:
33.. 6: unwritten
39.. 1:
40.. 1:
55.. 1:
63.. 1:
71.. 2:
79.. 2:
87.. 2:
95.. 1: eof
writing into existing 1 at pos 2
wrote blk 44
0.. 1:
1.. 14: unwritten
15.. 1:
16.. 3: unwritten
19.. 1:
20.. 4: unwritten
24.. 1:
25.. 1:
26.. 6: unwritten
32.. 1:
33.. 6: unwritten
39.. 1:
40.. 1:
41.. 3: unwritten
44.. 1:
45.. 3: unwritten
55.. 1:
63.. 1:
71.. 2:
79.. 2:
87.. 2:
95.. 1: eof
writing into existing 2 at pos 0
wrote blk 48
0.. 1:
1.. 14: unwritten
15.. 1:
16.. 3: unwritten
19.. 1:
20.. 4: unwritten
24.. 1:
25.. 1:
26.. 6: unwritten
32.. 1:
33.. 6: unwritten
39.. 1:
40.. 1:
41.. 3: unwritten
44.. 1:
45.. 3: unwritten
48.. 1:
49.. 6: unwritten
55.. 1:
63.. 1:
71.. 2:
79.. 2:
87.. 2:
95.. 1: eof
writing into existing 2 at pos 1
wrote blk 62
0.. 1:
1.. 14: unwritten
15.. 1:
16.. 3: unwritten
19.. 1:
20.. 4: unwritten
24.. 1:
25.. 1:
26.. 6: unwritten
32.. 1:
33.. 6: unwritten
39.. 1:
40.. 1:
41.. 3: unwritten
44.. 1:
45.. 3: unwritten
48.. 1:
49.. 6: unwritten
55.. 1:
56.. 6: unwritten
62.. 1:
63.. 1:
71.. 2:
79.. 2:
87.. 2:
95.. 1: eof
writing into existing 2 at pos 2
wrote blk 67
0.. 1:
1.. 14: unwritten
15.. 1:
16.. 3: unwritten
19.. 1:
20.. 4: unwritten
24.. 1:
25.. 1:
26.. 6: unwritten
32.. 1:
33.. 6: unwritten
39.. 1:
40.. 1:
41.. 3: unwritten
44.. 1:
45.. 3: unwritten
48.. 1:
49.. 6: unwritten
55.. 1:
56.. 6: unwritten
62.. 1:
63.. 1:
64.. 3: unwritten
67.. 1:
68.. 3: unwritten
71.. 2:
79.. 2:
87.. 2:
95.. 1: eof
writing into existing 3 at pos 0
wrote blk 73
0.. 1:
1.. 14: unwritten
15.. 1:
16.. 3: unwritten
19.. 1:
20.. 4: unwritten
24.. 1:
25.. 1:
26.. 6: unwritten
32.. 1:
33.. 6: unwritten
39.. 1:
40.. 1:
41.. 3: unwritten
44.. 1:
45.. 3: unwritten
48.. 1:
49.. 6: unwritten
55.. 1:
56.. 6: unwritten
62.. 1:
63.. 1:
64.. 3: unwritten
67.. 1:
68.. 3: unwritten
71.. 2:
73.. 1:
74.. 5: unwritten
79.. 2:
87.. 2:
95.. 1: eof
writing into existing 3 at pos 1
wrote blk 86
0.. 1:
1.. 14: unwritten
15.. 1:
16.. 3: unwritten
19.. 1:
20.. 4: unwritten
24.. 1:
25.. 1:
26.. 6: unwritten
32.. 1:
33.. 6: unwritten
39.. 1:
40.. 1:
41.. 3: unwritten
44.. 1:
45.. 3: unwritten
48.. 1:
49.. 6: unwritten
55.. 1:
56.. 6: unwritten
62.. 1:
63.. 1:
64.. 3: unwritten
67.. 1:
68.. 3: unwritten
71.. 2:
73.. 1:
74.. 5: unwritten
79.. 2:
81.. 5: unwritten
86.. 1:
87.. 2:
95.. 1: eof
writing into existing 3 at pos 2
wrote blk 92
0.. 1:
1.. 14: unwritten
15.. 1:
16.. 3: unwritten
19.. 1:
20.. 4: unwritten
24.. 1:
25.. 1:
26.. 6: unwritten
32.. 1:
33.. 6: unwritten
39.. 1:
40.. 1:
41.. 3: unwritten
44.. 1:
45.. 3: unwritten
48.. 1:
49.. 6: unwritten
55.. 1:
56.. 6: unwritten
62.. 1:
63.. 1:
64.. 3: unwritten
67.. 1:
68.. 3: unwritten
71.. 2:
73.. 1:
74.. 5: unwritten
79.. 2:
81.. 5: unwritten
86.. 1:
87.. 2:
89.. 3: unwritten
92.. 1:
93.. 2: unwritten
95.. 1: eof

View File

@@ -0,0 +1,18 @@
== root inode returns nothing
== crazy large unused inode does nothing
== basic entry
file
== rename
renamed
== hard link
file
link
== removal
== different dirs
== file types
type b name block
type c name char
type d name dir
type f name file
type l name symlink
== all name lengths work

View File

@@ -1,2 +1,5 @@
== bad timeout values fail
== test different timeouts
== bad mount option fails
== mount option
== sysfs
== reset all options

View File

@@ -5,6 +5,7 @@ inode-items-updated.sh
simple-inode-index.sh
simple-staging.sh
simple-release-extents.sh
get-referring-entries.sh
fallocate.sh
basic-truncate.sh
data-prealloc.sh

View File

@@ -6,6 +6,15 @@
#
t_require_commands scoutfs stat filefrag dd touch truncate
write_block()
{
local file="$1"
local blk="$2"
dd if=/dev/zero of="$file" bs=4096 seek=$blk count=1 conv=notrunc status=none
echo "wrote blk $blk"
}
write_forwards()
{
local prefix="$1"
@@ -70,6 +79,25 @@ print_extents_found()
filefrag "$prefix"* 2>&1 | grep "extent.*found" | t_filter_fs
}
#
# print the logical start, len, and flags if they're there.
#
print_logical_extents()
{
local file="$1"
filefrag -v -b4096 "$file" 2>&1 | t_filter_fs | awk '
($1 ~ /[0-9]+:/) {
if ($NF !~ /[0-9]+:/) {
flags=$NF
} else {
flags=""
}
print $2, $6, flags
}
'
}
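The awk filter above keys on `filefrag -v` extent rows (whose first field looks like `N:`) and prints the logical start, the length, and the trailing flags field when one is present. A standalone sketch with canned filefrag-style input (the sample block numbers are made up for illustration):

```shell
# Same awk as print_logical_extents: rows whose first field matches
# "N:" are extent rows; $2 is the logical start ("N.."), $6 the length
# ("N:"), and a last field that is not "N:"-shaped is a flags string.
print_logical() {
	awk '
	($1 ~ /[0-9]+:/) {
		if ($NF !~ /[0-9]+:/) {
			flags=$NF
		} else {
			flags=""
		}
		print $2, $6, flags
	}
	'
}
print_logical <<'EOF'
File size of /mnt/test/file is 393216 (96 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..       0:      66812..     66812:      1:
   1:        1..       7:      66813..     66819:      7:             unwritten
   2:       95..      95:      66907..     66907:      1:             last,eof
EOF
```

Run against the sample input it emits lines like `1.. 7: unwritten`, the same shape as the golden output earlier in this diff; the header and "File size" lines are skipped because their first field has no digit before a colon.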
t_save_all_sysfs_mount_options data_prealloc_blocks
t_save_all_sysfs_mount_options data_prealloc_contig_only
restore_options()
@@ -133,4 +161,70 @@ t_set_sysfs_mount_option 0 data_prealloc_contig_only 0
write_forwards $prefix 3
print_extents_found $prefix
#
# prepare aligned regions of 8 blocks that we'll write into.
# We'll write into the first, last, and middle block of each
# region which was prepared with no existing extents, one at
# the start, and one at the end.
#
# Let's keep this last because it creates a ton of output to read
# through.
#
echo "== block writes into region allocs hole"
t_set_sysfs_mount_option 0 data_prealloc_blocks 8
t_set_sysfs_mount_option 0 data_prealloc_contig_only 1
touch "$prefix"
truncate -s 0 "$prefix"
# write initial blocks in regions
base=0
for sides in 0 1 2 3; do
for i in 0 1 2; do
case "$sides" in
# none
0) ;;
# left
1) write_block $prefix $((base + 0)) ;;
# right
2) write_block $prefix $((base + 7)) ;;
# both
3) write_block $prefix $((base + 0))
write_block $prefix $((base + 7)) ;;
esac
((base+=8))
done
done
echo before:
print_logical_extents "$prefix"
# now write into the first, middle, and last empty block of each
t_set_sysfs_mount_option 0 data_prealloc_contig_only 0
base=0
for sides in 0 1 2 3; do
for i in 0 1 2; do
echo "writing into existing $sides at pos $i"
case "$sides" in
# none
0) left=$base; right=$((base + 7));;
# left
1) left=$((base + 1)); right=$((base + 7));;
# right
2) left=$((base)); right=$((base + 6));;
# both
3) left=$((base + 1)); right=$((base + 6));;
esac
case "$i" in
# start
0) write_block $prefix $left ;;
# end
1) write_block $prefix $right ;;
# mid (both has 6 blocks internally)
2) write_block $prefix $((left + 3)) ;;
esac
print_logical_extents "$prefix"
((base+=8))
done
done
t_pass

View File

@@ -0,0 +1,99 @@
#
# Test the _GET_REFERRING_ENTRIES ioctl via the get-referring-entries CLI
# command.
#
# consistently print only entry names
filter_names() {
exec cut -d ' ' -f 8- | sort
}
# print entries with type characters to match find. not happy with hard
# coding, but abi won't change much.
filter_types() {
exec cut -d ' ' -f 5- | \
sed \
-e 's/type 1 /type p /' \
-e 's/type 2 /type c /' \
-e 's/type 4 /type d /' \
-e 's/type 6 /type b /' \
-e 's/type 8 /type f /' \
-e 's/type 10 /type l /' \
-e 's/type 12 /type s /' \
| \
sort
}
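The numeric values that filter_types rewrites are the DT_* constants from `<dirent.h>` (DT_FIFO=1, DT_CHR=2, DT_DIR=4, DT_BLK=6, DT_REG=8, DT_LNK=10, DT_SOCK=12), which is why the hard coding is tolerable. A quick standalone check of the mapping with synthetic entry lines (the names are made up for illustration):

```shell
# The same sed mapping as filter_types, applied to synthetic lines.
# The trailing space in each pattern keeps "type 1 " from matching
# the "1" at the start of "type 10 ".
map_types() {
	sed \
		-e 's/type 1 /type p /' \
		-e 's/type 2 /type c /' \
		-e 's/type 4 /type d /' \
		-e 's/type 6 /type b /' \
		-e 's/type 8 /type f /' \
		-e 's/type 10 /type l /' \
		-e 's/type 12 /type s /'
}
printf '%s\n' \
	"type 4 name dir" \
	"type 8 name file" \
	"type 10 name symlink" | map_types
```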
n_chars() {
local n="$1"
printf 'A%.0s' $(eval echo {1..\$n})
}
GRE="scoutfs get-referring-entries -p $T_M0"
echo "== root inode returns nothing"
$GRE 1
echo "== crazy large unused inode does nothing"
$GRE 4611686018427387904 # 1 << 62
echo "== basic entry"
touch $T_D0/file
ino=$(stat -c '%i' $T_D0/file)
$GRE $ino | filter_names
echo "== rename"
mv $T_D0/file $T_D0/renamed
$GRE $ino | filter_names
echo "== hard link"
mv $T_D0/renamed $T_D0/file
ln $T_D0/file $T_D0/link
$GRE $ino | filter_names
echo "== removal"
rm $T_D0/file $T_D0/link
$GRE $ino
echo "== different dirs"
touch $T_D0/file
ino=$(stat -c '%i' $T_D0/file)
for i in $(seq 1 10); do
mkdir $T_D0/dir-$i
ln $T_D0/file $T_D0/dir-$i/file-$i
done
diff -u <(find $T_D0 -type f -printf '%f\n' | sort) <($GRE $ino | filter_names)
rm $T_D0/file
echo "== file types"
mkdir $T_D0/dir
touch $T_D0/dir/file
mkdir $T_D0/dir/dir
ln -s $T_D0/dir/file $T_D0/dir/symlink
mknod $T_D0/dir/char c 1 3 # null
mknod $T_D0/dir/block b 7 0 # loop0
for name in $(ls -UA $T_D0/dir | sort); do
ino=$(stat -c '%i' $T_D0/dir/$name)
$GRE $ino | filter_types
done
rm -rf $T_D0/dir
echo "== all name lengths work"
mkdir $T_D0/dir
touch $T_D0/dir/file
ino=$(stat -c '%i' $T_D0/dir/file)
name=""
> $T_TMP.unsorted
for i in $(seq 1 255); do
name+="a"
echo "$name" >> $T_TMP.unsorted
ln $T_D0/dir/file $T_D0/dir/$name
done
sort $T_TMP.unsorted > $T_TMP.sorted
rm $T_D0/dir/file
$GRE $ino | filter_names > $T_TMP.gre
diff -u $T_TMP.sorted $T_TMP.gre
rm -rf $T_D0/dir
t_pass

View File

@@ -17,43 +17,52 @@ set_bad_timeout() {
t_fail "set bad q hb to $to"
}
set_quorum_timeouts()
set_timeout()
{
local to="$1"
local was
local nr="$1"
local how="$2"
local to="$3"
local is
for nr in $(t_quorum_nrs); do
local mnt="$(eval echo \$T_M$nr)"
was=$(t_get_sysfs_mount_option $nr quorum_heartbeat_timeout_ms)
if [ $how == "sysfs" ]; then
t_set_sysfs_mount_option $nr quorum_heartbeat_timeout_ms $to
is=$(t_get_sysfs_mount_option $nr quorum_heartbeat_timeout_ms)
fi
if [ $how == "mount" ]; then
t_umount $nr
t_mount_opt $nr "quorum_heartbeat_timeout_ms=$to"
fi
if [ "$is" != "$to" ]; then
t_fail "tried to set qhbto on $nr to $to but got $is"
fi
done
is=$(t_get_sysfs_mount_option $nr quorum_heartbeat_timeout_ms)
if [ "$is" != "$to" ]; then
t_fail "tried to set qhbto on $nr via $how to $to but got $is"
fi
}
test_timeout()
{
local to="$1"
local orig_to
local how="$1"
local to="$2"
local start
local nr
local sv
local delay
local low
local high
# set new timeouts, saving original
orig_to=$(t_get_sysfs_mount_option 0 quorum_heartbeat_timeout_ms)
set_quorum_timeouts $to
# set timeout on non-server quorum mounts
sv=$(t_server_nr)
for nr in $(t_quorum_nrs); do
if [ $nr -ne $sv ]; then
set_timeout $nr $how $to
fi
done
# give followers time to recv heartbeats and reset timeouts
sleep 1
# tear down the current server/leader
nr=$(t_server_nr)
t_force_umount $nr
t_force_umount $sv
# see how long it takes for the next leader to start
start=$(time_ms)
@@ -64,15 +73,15 @@ test_timeout()
echo "to $to delay $delay" >> $T_TMP.delay
# restore the mount that we tore down
t_mount $nr
t_mount $sv
# reset the original timeouts
set_quorum_timeouts $orig_to
# make sure the new leader delay was reasonable, allowing for some slack
low=$((to - 1000))
high=$((to + 5000))
# make sure the new leader delay was reasonable
test "$delay" -gt "$to" || t_fail "delay $delay < to $to"
# allow 5 seconds of slop
test "$delay" -lt $(($to + 5000)) || t_fail "delay $delay > to $to + 5sec"
test "$delay" -lt "$low" && t_fail "delay $delay < low $low (to $to)"
test "$delay" -gt "$high" && t_fail "delay $delay > high $high (to $to)"
}
echo "== bad timeout values fail"
@@ -80,10 +89,29 @@ set_bad_timeout 0
set_bad_timeout -1
set_bad_timeout 1000000
echo "== test different timeouts"
echo "== bad mount option fails"
if [ "$(t_server_nr)" == 0 ]; then
nr=1
else
nr=0
fi
t_umount $nr
t_mount_opt $nr "quorum_heartbeat_timeout_ms=1000000" 2>/dev/null && \
t_fail "bad mount option succeeded"
t_mount $nr
echo "== mount option"
def=$(t_get_sysfs_mount_option 0 quorum_heartbeat_timeout_ms)
test_timeout $def
test_timeout 3000
test_timeout $((def + 19000))
test_timeout mount $def
test_timeout mount 3000
test_timeout mount $((def + 19000))
echo "== sysfs"
test_timeout sysfs $def
test_timeout sysfs 3000
test_timeout sysfs $((def + 19000))
echo "== reset all options"
t_remount_all
t_pass

View File

@@ -209,6 +209,29 @@ A path within a ScoutFS filesystem.
.RE
.PD
.TP
.BI "get-referring-entries [-p|--path PATH] INO"
.sp
Find directory entries that reference an inode number.
.sp
Display all the directory entries that refer to a given inode. Each
entry includes the inode number of the directory that contains it, the
d_off and d_type values for the entry as described by
.BR readdir (3),
and the name of the entry.
.RS 1.0i
.PD 0
.TP
.B "-p, --path PATH"
A path within a ScoutFS filesystem.
.TP
.B "INO"
The inode number of the target inode.
.RE
.PD
.TP
.BI "ino-path INODE-NUM [-p|--path PATH]"
.sp

View File

@@ -0,0 +1,150 @@
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/ioctl.h>
#include <fcntl.h>
#include <errno.h>
#include <string.h>
#include <limits.h>
#include <argp.h>
#include "sparse.h"
#include "parse.h"
#include "util.h"
#include "format.h"
#include "ioctl.h"
#include "cmd.h"
struct gre_args {
char *path;
u64 ino;
};
static int do_get_referring_entries(struct gre_args *args)
{
struct scoutfs_ioctl_get_referring_entries gre;
struct scoutfs_ioctl_dirent *dent;
unsigned int bytes;
void *buf;
int ret;
int fd;
fd = get_path(args->path, O_RDONLY);
if (fd < 0)
return fd;
bytes = PATH_MAX * 1024;
buf = malloc(bytes);
if (!buf) {
fprintf(stderr, "couldn't allocate %u byte buffer\n", bytes);
ret = -ENOMEM;
goto out;
}
gre.ino = args->ino;
gre.dir_ino = 0;
gre.dir_pos = 0;
gre.entries_ptr = (intptr_t)buf;
gre.entries_bytes = bytes;
for (;;) {
ret = ioctl(fd, SCOUTFS_IOC_GET_REFERRING_ENTRIES, &gre);
if (ret <= 0) {
if (ret < 0) {
ret = -errno;
fprintf(stderr, "ioctl failed: %s (%d)\n", strerror(errno), errno);
}
goto out;
}
dent = buf;
while (ret-- > 0) {
printf("dir %llu pos %llu type %u name %s\n",
dent->dir_ino, dent->dir_pos, dent->d_type, dent->name);
gre.dir_ino = dent->dir_ino;
gre.dir_pos = dent->dir_pos;
if (dent->flags & SCOUTFS_IOCTL_DIRENT_FLAG_LAST) {
ret = 0;
goto out;
}
dent = (void *)dent + dent->entry_bytes;
}
if (++gre.dir_pos == 0) {
if (++gre.dir_ino == 0) {
ret = 0;
goto out;
}
}
}
out:
close(fd);
free(buf);
return ret;
}
static int parse_opt(int key, char *arg, struct argp_state *state)
{
struct gre_args *args = state->input;
int ret;
switch (key) {
case 'p':
args->path = strdup_or_error(state, arg);
break;
case ARGP_KEY_ARG:
if (args->ino)
argp_error(state, "more than one argument given");
ret = parse_u64(arg, &args->ino);
if (ret)
argp_error(state, "inode parse error");
break;
case ARGP_KEY_FINI:
if (!args->ino) {
argp_error(state, "must provide inode number");
}
break;
default:
break;
}
return 0;
}
static struct argp_option options[] = {
{ "path", 'p', "PATH", 0, "Path to ScoutFS filesystem"},
{ NULL }
};
static struct argp argp = {
options,
parse_opt,
"INODE-NUM",
"Print directory entries that refer to inode number"
};
static int get_referring_entries_cmd(int argc, char **argv)
{
struct gre_args args = {NULL};
int ret;
ret = argp_parse(&argp, argc, argv, 0, NULL, &args);
if (ret)
return ret;
return do_get_referring_entries(&args);
}
static void __attribute__((constructor)) get_referring_entries_ctor(void)
{
cmd_register_argp("get-referring-entries", &argp, GROUP_SEARCH, get_referring_entries_cmd);
}