Compare commits

..

40 Commits

Author SHA1 Message Date
Auke Kok
72dc5695a6 Introduce meta_reserve_blocks mount option, default value.
This option adds a mount option, with default value of 16384, that adds
an additional reserve amount of blocks for the meta device.

The default value is 16384, which corresponds to 1GB of space, and just
about doubles the internal value for the reserve that is calculated
based on clients/mounts dynamically in sort of standard values. It also
just compromises about less than 2% of the meta device size for the
smallest meta device size.

A suggested value for larger deployments is like somewhere around 256
blocks per GB of meta device size, i.e. 1/64 of the meta device space,
and about 1.6% in effect.

Customers who are running into issues can adjust their mount options to
increase the value to have a larger safety buffer, or decrease it to
potentially have a way to get out of low space conditions temporarily.
Obviously one would want to increase the value of this option after
resolving the low space condition issues as soon as possible.

Our test suite will run with meta_reserve_blocks=0, so that the behavior
of any of our tests is functionally unaffected by this change, and won't
interfere with resolving underlying ENOSPC issues and their resolution.
The addition of this option however allows us to artifically create
ENOSPC conditions at will, and we may want to add tests specifically
that do so.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2025-04-17 16:06:33 -04:00
Zach Brown
459de5b478 Merge pull request #211 from versity/auke/tapf-output
TAP formatted output.
2025-04-15 14:25:06 -07:00
Auke Kok
24031cde1d TAP formatted output.
Stored as `results/scoutfs.tap`, this file contains TAP format 14
generated test results.

Embedded in the output are some metadata so that these files can be
aggregated and stored in an unique and deduplicating way, but using a
generated UUID at the start of testing. The file itself also catches git
ID, date, and kernel version, as well as the (possibly altered) test
sequence used.

Any test that has diff or dmesg output will be considered failed, and a
copy of the relevant data is included as comments.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2025-04-15 12:02:41 -07:00
Zach Brown
04cc41719c Merge pull request #209 from versity/auke/basic-truncate-yes-pipefail
Ignore pipefail alternative error when not a tty.
2025-04-14 13:15:03 -07:00
Auke Kok
7ea084082d Ignore pipefail alternative error when not a tty.
This happens with the basic-truncate test, only. It's the only user
of the `yes` program.

The `yes` command normally fails gracefully under the usual runs that
are attached to some terminal. But when the test script runs entirely
under something else, it will throw a needless error message that
pollutes the test output:

  `yes: standard output: Broken pipe`

Adjust the redirect to omit all stderr for `yes` in this case.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2025-04-14 11:13:39 -07:00
Zach Brown
f565451f76 Merge pull request #208 from versity/zab/v1.24
v1.24 Release
2025-03-17 11:18:42 -07:00
Zach Brown
05f14640fb v1.24 Release
Finish the release notes for the 1.24 release.

Signed-off-by: Zach Brown <zab@versity.com>
2025-03-14 12:19:30 -07:00
Zach Brown
609fc56cd6 Merge pull request #203 from versity/auke/new_inode_ctime
Fix new_inode ctime assignment.
2025-02-25 15:23:16 -08:00
Zach Brown
a4b5a256eb Merge pull request #175 from versity/auke/mmap
Support for mmap() writable mappings.
2025-02-20 14:03:01 -08:00
Zach Brown
f701ce104c Merge pull request #204 from versity/zab/remove_wordexp
Remove wordexp expansion of utils path argument
2025-02-19 09:27:15 -08:00
Zach Brown
c6dab3c306 Remove wordexp expansion of utils path argument
scoutfs cli commands were using a helper that tried to perform word
expansion on the path argument.  This was done with the intent of
providing the convenience of shell expansion (env vars, ~) within the
cli command argument.

But it breaks paths that accidentally have their file names match the
syntax that wordexp supports.   "[ ]" tripped up files in the wild.

We don't need to provide shell expansion functionality in our argument
parsing.  The shell can do that.  The cli must pass the arguments
straight through, no parsing at all.

Signed-off-by: Zach Brown <zab@versity.com>
2025-02-18 11:55:37 -08:00
Auke Kok
e3e2cfceec Fix new_inode ctime assignment.
Very old copy/paste bug here, we want to update new_inode's ctime
instead. old_inode already is updated.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2025-02-18 13:15:49 -05:00
Zach Brown
5a10c79409 Merge pull request #201 from versity/auke/fixes_pre_parallel_restore
Misc. fixes and changes to support parallel_restore and check.
2025-02-02 06:53:25 -08:00
Auke Kok
e9d147260c Fix ctx->pos updating to properly handle dent gaps
We need to assure we're emitting dents with the proper position
and we already have them as part of our dent. The only caveat is
to increment ctx->pos once beyond the list to make sure the caller
doesn't call us once more.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2025-01-27 14:49:04 -05:00
Auke Kok
6c85879489 Assert unlock doesn't underflow lock user count.
While debugging a double unlock error we hit this condition and
debugging would have been a lot easier had we enforced this simple
constraint that we can't decrement the lock users count if it's
already 0.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2025-01-27 14:49:04 -05:00
Auke Kok
8b76a53cf3 Avoid cluster locking while put_user() in _allocated_inos.
Similar to fiemap, readdir and walk_inodes, this method could have
put_user during a page fault, causing potentially a deadlock.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2025-01-27 14:49:04 -05:00
Auke Kok
e76a171c40 Avoid faulting while cluster locked in _walk_inodes.
Similar to readdir and fiemap vfs methods, we can't copy to user while
holding cluster locks. The previous comment about it being safe no
longer applies, and this could deadlock.

Rewrite the loop to iterate and store entries in a page, then flush
the page contents while not holding a clusterlock.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2025-01-27 14:49:04 -05:00
Auke Kok
8cb08507d6 Do not copy to user while holding locks in scoutfs_data_fiemap()
Now that we support mmap writes, at any point in time we could
pagefault and lock for writes. That means - just like readdir -
we can no longer lock and copy_to_user, since it also may page fault
and thus deadlock.

We statically allocate 32 extent entries on the stack and use
these to shuffle out fiemap entries at a time, locking and
unlocking around collecting and fiemap_fill_extent_next.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2025-01-27 14:49:04 -05:00
Auke Kok
cad12d5ce8 Avoid deadlock in _readdir() due to copy_to_user().
dir_emit() will copy_to_user, which can pagefault. If this happens while
cluster locked, we could deadlock.

We use a single page to stage dir_emit data, and iterate between
fetching dirents while locked, and emitting them while not locked.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2025-01-27 14:49:04 -05:00
Auke Kok
e59a5f8ebd Readdir w/offset validation.
Verify using xfs_io that readdir offsets match expected output.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2025-01-27 14:49:04 -05:00
Auke Kok
1bcd1d4d00 Drop readdir pre-.iterate() compat (el7.5ish).
These 2 sections of compat for readdir are wholly obsolete and can be
hard dropped, which restores the method to look like current upstream
code.

This was added in ddd1a4e.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2025-01-23 14:28:40 -05:00
Auke Kok
b944f609aa remap_pages ops becomes obsolete. 2025-01-23 14:28:40 -05:00
Auke Kok
519b47a53c mmap() trace events.
We merely trace exit values and position, and ignore length.

Because vm_fault_t is __bitwise, sparse will loudly complain about
a plain cast to u32, so we must __force (on el8). ret will be 512 in
normal cases.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2025-01-23 14:28:40 -05:00
Auke Kok
92f704d35a Enable all xfstests mmap() tests.
Now that all of these should be passing, we enable all mmap() tests in
xfstests, and update the golden output with the new tests.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2025-01-23 14:28:40 -05:00
Auke Kok
311bf75902 Add mmap tests.
Two test programs are added. The run time is about 1min on my el7
instance.

The test script finishes up with a read/write mmap test on offline
extents to verify the data wait paths in those functions.

One program will perform vfs read/write and mmap read/write calls on
the same file from across 5 threads (mounts) repeatedly.  The goal
is to assure there are no locking issues between read/write paths.

The second test program performs consistency checking on a file that is
repeatedly written/read using memory maps and normal reads and writes,
and the content is verified after every operation.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2025-01-23 14:28:40 -05:00
Benjamin LaHaise
3788d67101 Add support for writable shared mmap()ings
Add support for writable MAP_SHARED mmap()ings.  Avoid issues with late
writepage()s building transactions by doing the block_write_begin() work in
scoutfs_data_page_mkwrite().  Ensure the page is marked dirty and prepared
for write, then let the VM complete the write when the page is flushed or
invalidated.

Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: Auke Kok <auke.kok@versity.com>
2025-01-23 14:28:40 -05:00
Benjamin LaHaise
b7a3d03711 Add support for read only mmap()
Adds the required memory mapped ops struct and page fault handler
for reads.

Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: Auke Kok <auke.kok@versity.com>
2025-01-23 14:28:40 -05:00
Zach Brown
295f751aed Add test_bit to utils bitmap
Add test_bit() to the trivial utils bitmap.c implementation.

Signed-off-by: Zach Brown <zab@versity.com>
2025-01-22 09:58:58 -08:00
Zach Brown
7f6032d9b4 Add lk rbtree wrapper
Import the kernel's rbtree implementation with a wrapper so we can use
it from userspace.

Signed-off-by: Zach Brown <zab@versity.com>
2025-01-22 09:58:49 -08:00
Zach Brown
7e3a6537ec Add userspace version of our dirent name hash
Signed-off-by: Zach Brown <zab@versity.com>
2025-01-22 09:58:41 -08:00
Zach Brown
49b7b70438 Add userspace version of our mode to type
Signed-off-by: Zach Brown <zab@versity.com>
2025-01-22 09:58:31 -08:00
Zach Brown
de0fdd1f9f Promote userspace btree block initialization
Signed-off-by: Zach Brown <zab@versity.com>
2025-01-22 09:58:23 -08:00
Zach Brown
a6d7de3c00 Add fls64() alias for userspace flsll()
Signed-off-by: Zach Brown <zab@versity.com>
2025-01-22 09:58:16 -08:00
Zach Brown
2c2c127c5e Add put_unaligned_leXX() for userspace
Signed-off-by: Zach Brown <zab@versity.com>
2025-01-22 09:58:10 -08:00
Zach Brown
9491c784e7 Add srch_encode_entry() for userspace utils
Signed-off-by: Zach Brown <zab@versity.com>
2025-01-22 09:57:56 -08:00
Zach Brown
c3b30930fa Add bloom filter index calc for userspace utils
Signed-off-by: Zach Brown <zab@versity.com>
2025-01-22 09:57:46 -08:00
Zach Brown
e7e46a80e6 Add userspace NSEC_PER_SEC
Signed-off-by: Zach Brown <zab@versity.com>
2025-01-22 09:57:39 -08:00
Zach Brown
1ddf752f42 Import a few more functions to our list.h
Import a few more functions from the kernel's list.h into our imported
copy.

Signed-off-by: Zach Brown <zab@versity.com>
2025-01-22 09:57:29 -08:00
Zach Brown
14b65c6360 Fix printing alloc list block extents
The list alloc blocks have an array of blknos that are offset by a start
field in the block header.  The print code wasn't using that and was
always referencing the beginning of the array, which could miss blocks.

Signed-off-by: Zach Brown <zab@versity.com>
2025-01-22 09:57:21 -08:00
Zach Brown
934f6c7648 Merge pull request #199 from versity/zab/v1.23
v1.23 Release
2024-12-11 17:02:52 -08:00
45 changed files with 2955 additions and 284 deletions

View File

@@ -1,6 +1,22 @@
Versity ScoutFS Release Notes
=============================
---
v1.24
\
*Mar 14, 2025*
Add support for coherent read and write mmap() mappings of regular file
data between mounts.
Fix a bug that was causing scoutfs utilities to parse and change some
file names before passing them on to the kernel for processing. This
fixes spurious scoutfs command errors for files with the offending
patterns in their names.
Fix a bug where rename wasn't updating the ctime of the inode at the
destination name if it existed.
---
v1.23
\

View File

@@ -6,26 +6,6 @@
ccflags-y += -include $(src)/kernelcompat.h
#
# v3.10-rc6-21-gbb6f619b3a49
#
# _readdir changes from fop->readdir() to fop->iterate() and from
# filldir(dirent) to dir_emit(ctx).
#
ifneq (,$(shell grep 'iterate.*dir_context' include/linux/fs.h))
ccflags-y += -DKC_ITERATE_DIR_CONTEXT
endif
#
# v3.10-rc6-23-g5f99f4e79abc
#
# Helpers including dir_emit_dots() are added in the process of
# switching dcache_readdir() from fop->readdir() to fop->iterate()
#
ifneq (,$(shell grep 'dir_emit_dots' include/linux/fs.h))
ccflags-y += -DKC_DIR_EMIT_DOTS
endif
#
# v3.18-rc2-19-gb5ae6b15bd73
#
@@ -431,3 +411,26 @@ endif
ifneq (,$(shell grep 'struct file.*bdev_file_open_by_path.const char.*path' include/linux/blkdev.h))
ccflags-y += -DKC_BDEV_FILE_OPEN_BY_PATH
endif
# v4.0-rc7-1796-gfe0f07d08ee3
#
# direct-io changes modify inode_dio_done to now be called inode_dio_end
ifneq (,$(shell grep 'void inode_dio_end.struct inode' include/linux/fs.h))
ccflags-y += -DKC_INODE_DIO_END
endif
#
# v5.0-6476-g3d3539018d2c
#
# page fault handlers return a bitmask vm_fault_t instead
# Note: el8's header has a slightly modified prefix here
ifneq (,$(shell grep 'typedef.*__bitwise unsigned.*int vm_fault_t' include/linux/mm_types.h))
ccflags-y += -DKC_MM_VM_FAULT_T
endif
# v3.19-499-gd83a08db5ba6
#
# .remap pages becomes obsolete
ifneq (,$(shell grep 'int ..remap_pages..struct vm_area_struct' include/linux/mm.h))
ccflags-y += -DKC_MM_REMAP_PAGES
endif

View File

@@ -560,7 +560,7 @@ static int scoutfs_get_block(struct inode *inode, sector_t iblock,
u64 offset;
int ret;
WARN_ON_ONCE(create && !inode_is_locked(inode));
WARN_ON_ONCE(create && !rwsem_is_locked(&si->extent_sem));
/* make sure caller holds a cluster lock */
lock = scoutfs_per_task_get(&si->pt_data_lock);
@@ -1551,13 +1551,17 @@ int scoutfs_data_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
struct super_block *sb = inode->i_sb;
const u64 ino = scoutfs_ino(inode);
struct scoutfs_lock *lock = NULL;
struct scoutfs_extent *info = NULL;
struct page *page = NULL;
struct scoutfs_extent ext;
struct scoutfs_extent cur;
struct data_ext_args args;
u32 last_flags;
u64 iblock;
u64 last;
int entries = 0;
int ret;
int complete = 0;
if (len == 0) {
ret = 0;
@@ -1568,16 +1572,11 @@ int scoutfs_data_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
if (ret)
goto out;
inode_lock(inode);
down_read(&si->extent_sem);
ret = scoutfs_lock_inode(sb, SCOUTFS_LOCK_READ, 0, inode, &lock);
if (ret)
goto unlock;
args.ino = ino;
args.inode = inode;
args.lock = lock;
page = alloc_page(GFP_KERNEL);
if (!page) {
ret = -ENOMEM;
goto out;
}
/* use a dummy extent to track */
memset(&cur, 0, sizeof(cur));
@@ -1586,48 +1585,93 @@ int scoutfs_data_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
iblock = start >> SCOUTFS_BLOCK_SM_SHIFT;
last = (start + len - 1) >> SCOUTFS_BLOCK_SM_SHIFT;
args.ino = ino;
args.inode = inode;
/* outer loop */
while (iblock <= last) {
ret = scoutfs_ext_next(sb, &data_ext_ops, &args,
iblock, 1, &ext);
if (ret < 0) {
if (ret == -ENOENT)
/* lock */
inode_lock(inode);
down_read(&si->extent_sem);
ret = scoutfs_lock_inode(sb, SCOUTFS_LOCK_READ, 0, inode, &lock);
if (ret) {
up_read(&si->extent_sem);
inode_unlock(inode);
break;
}
args.lock = lock;
/* collect entries */
info = page_address(page);
memset(info, 0, PAGE_SIZE);
while (entries < (PAGE_SIZE / sizeof(struct fiemap_extent)) - 1) {
ret = scoutfs_ext_next(sb, &data_ext_ops, &args,
iblock, 1, &ext);
if (ret < 0) {
if (ret == -ENOENT)
ret = 0;
complete = 1;
last_flags = FIEMAP_EXTENT_LAST;
break;
}
trace_scoutfs_data_fiemap_extent(sb, ino, &ext);
if (ext.start > last) {
/* not setting _LAST, it's for end of file */
ret = 0;
last_flags = FIEMAP_EXTENT_LAST;
break;
complete = 1;
break;
}
if (scoutfs_ext_can_merge(&cur, &ext)) {
/* merged extents could be greater than input len */
cur.len += ext.len;
} else {
/* fill it */
memcpy(info, &cur, sizeof(cur));
entries++;
info++;
cur = ext;
}
iblock = ext.start + ext.len;
}
trace_scoutfs_data_fiemap_extent(sb, ino, &ext);
/* unlock */
scoutfs_unlock(sb, lock, SCOUTFS_LOCK_READ);
up_read(&si->extent_sem);
inode_unlock(inode);
if (ext.start > last) {
/* not setting _LAST, it's for end of file */
ret = 0;
if (ret)
break;
}
if (scoutfs_ext_can_merge(&cur, &ext)) {
/* merged extents could be greater than input len */
cur.len += ext.len;
} else {
ret = fill_extent(fieinfo, &cur, 0);
/* emit entries */
info = page_address(page);
for (; entries > 0; entries--) {
ret = fill_extent(fieinfo, info, 0);
if (ret != 0)
goto unlock;
cur = ext;
goto out;
info++;
}
iblock = ext.start + ext.len;
if (complete)
break;
}
/* still one left, it's in cur */
if (cur.len)
ret = fill_extent(fieinfo, &cur, last_flags);
unlock:
scoutfs_unlock(sb, lock, SCOUTFS_LOCK_READ);
up_read(&si->extent_sem);
inode_unlock(inode);
out:
if (ret == 1)
ret = 0;
if (page)
__free_page(page);
trace_scoutfs_data_fiemap(sb, start, len, ret);
return ret;
@@ -1914,6 +1958,236 @@ int scoutfs_data_waiting(struct super_block *sb, u64 ino, u64 iblock,
return ret;
}
#ifdef KC_MM_VM_FAULT_T
static vm_fault_t scoutfs_data_page_mkwrite(struct vm_fault *vmf)
{
struct vm_area_struct *vma = vmf->vma;
#else
static int scoutfs_data_page_mkwrite(struct vm_area_struct *vma,
struct vm_fault *vmf)
{
#endif
struct page *page = vmf->page;
struct file *file = vma->vm_file;
struct inode *inode = file_inode(file);
struct scoutfs_inode_info *si = SCOUTFS_I(inode);
struct super_block *sb = inode->i_sb;
struct scoutfs_lock *lock = NULL;
SCOUTFS_DECLARE_PER_TASK_ENTRY(pt_ent);
DECLARE_DATA_WAIT(dw);
struct write_begin_data wbd;
u64 ind_seq;
loff_t pos;
loff_t size;
unsigned int len = PAGE_SIZE;
vm_fault_t ret = VM_FAULT_SIGBUS;
int err;
pos = vmf->pgoff << PAGE_SHIFT;
sb_start_pagefault(sb);
err = scoutfs_lock_inode(sb, SCOUTFS_LOCK_WRITE,
SCOUTFS_LKF_REFRESH_INODE, inode, &lock);
if (err) {
ret = vmf_error(err);
goto out;
}
size = i_size_read(inode);
if (scoutfs_per_task_add_excl(&si->pt_data_lock, &pt_ent, lock)) {
/* data_version is per inode, whole file must be online */
err = scoutfs_data_wait_check(inode, 0, size,
SEF_OFFLINE,
SCOUTFS_IOC_DWO_WRITE,
&dw, lock);
if (err != 0) {
if (err < 0)
ret = vmf_error(err);
goto out_unlock;
}
}
/* scoutfs_write_begin */
memset(&wbd, 0, sizeof(wbd));
INIT_LIST_HEAD(&wbd.ind_locks);
wbd.lock = lock;
/*
* Start transaction before taking page locks - we want to make sure we're
* not locking a page, then waiting for trans, because writeback might race
* against it and cause a lock inversion hang - as demonstrated by both
* holetest and fsstress tests in xfstests.
*/
do {
err = scoutfs_inode_index_start(sb, &ind_seq) ?:
scoutfs_inode_index_prepare(sb, &wbd.ind_locks, inode,
true) ?:
scoutfs_inode_index_try_lock_hold(sb, &wbd.ind_locks,
ind_seq, false);
} while (err > 0);
if (err < 0) {
ret = vmf_error(err);
goto out_trans;
}
down_write(&si->extent_sem);
if (!trylock_page(page)) {
ret = VM_FAULT_NOPAGE;
goto out_sem;
}
ret = VM_FAULT_LOCKED;
if ((page->mapping != inode->i_mapping) ||
(!PageUptodate(page)) ||
(page_offset(page) > size)) {
unlock_page(page);
ret = VM_FAULT_NOPAGE;
goto out_sem;
}
if (page->index == (size - 1) >> PAGE_SHIFT)
len = ((size - 1) & ~PAGE_MASK) + 1;
err = __block_write_begin(page, pos, PAGE_SIZE, scoutfs_get_block);
if (err) {
ret = vmf_error(err);
unlock_page(page);
goto out_sem;
}
/* end scoutfs_write_begin */
/*
* We mark the page dirty already here so that when freeze is in
* progress, we are guaranteed that writeback during freezing will
* see the dirty page and writeprotect it again.
*/
set_page_dirty(page);
wait_for_stable_page(page);
/* scoutfs_write_end */
scoutfs_inode_set_data_seq(inode);
scoutfs_inode_inc_data_version(inode);
file_update_time(vma->vm_file);
scoutfs_update_inode_item(inode, wbd.lock, &wbd.ind_locks);
scoutfs_inode_queue_writeback(inode);
out_sem:
up_write(&si->extent_sem);
out_trans:
scoutfs_release_trans(sb);
scoutfs_inode_index_unlock(sb, &wbd.ind_locks);
/* end scoutfs_write_end */
out_unlock:
scoutfs_per_task_del(&si->pt_data_lock, &pt_ent);
scoutfs_unlock(sb, lock, SCOUTFS_LOCK_WRITE);
out:
sb_end_pagefault(sb);
if (scoutfs_data_wait_found(&dw)) {
/*
* It'd be really nice to not hold the mmap_sem lock here
* before waiting for data, and then return VM_FAULT_RETRY
*/
err = scoutfs_data_wait(inode, &dw);
if (err == 0)
ret = VM_FAULT_NOPAGE;
else
ret = vmf_error(err);
}
trace_scoutfs_data_page_mkwrite(sb, scoutfs_ino(inode), pos, (__force u32)ret);
return ret;
}
#ifdef KC_MM_VM_FAULT_T
static vm_fault_t scoutfs_data_filemap_fault(struct vm_fault *vmf)
{
struct vm_area_struct *vma = vmf->vma;
#else
static int scoutfs_data_filemap_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
{
#endif
struct file *file = vma->vm_file;
struct inode *inode = file_inode(file);
struct scoutfs_inode_info *si = SCOUTFS_I(inode);
struct super_block *sb = inode->i_sb;
struct scoutfs_lock *inode_lock = NULL;
SCOUTFS_DECLARE_PER_TASK_ENTRY(pt_ent);
DECLARE_DATA_WAIT(dw);
loff_t pos;
int err;
vm_fault_t ret = VM_FAULT_SIGBUS;
pos = vmf->pgoff;
pos <<= PAGE_SHIFT;
retry:
err = scoutfs_lock_inode(sb, SCOUTFS_LOCK_READ,
SCOUTFS_LKF_REFRESH_INODE, inode, &inode_lock);
if (err < 0)
return vmf_error(err);
if (scoutfs_per_task_add_excl(&si->pt_data_lock, &pt_ent, inode_lock)) {
/* protect checked extents from stage/release */
atomic_inc(&inode->i_dio_count);
err = scoutfs_data_wait_check(inode, pos, PAGE_SIZE,
SEF_OFFLINE, SCOUTFS_IOC_DWO_READ,
&dw, inode_lock);
if (err != 0) {
if (err < 0)
ret = vmf_error(err);
goto out;
}
}
#ifdef KC_MM_VM_FAULT_T
ret = filemap_fault(vmf);
#else
ret = filemap_fault(vma, vmf);
#endif
out:
if (scoutfs_per_task_del(&si->pt_data_lock, &pt_ent))
kc_inode_dio_end(inode);
scoutfs_unlock(sb, inode_lock, SCOUTFS_LOCK_READ);
if (scoutfs_data_wait_found(&dw)) {
err = scoutfs_data_wait(inode, &dw);
if (err == 0)
goto retry;
ret = VM_FAULT_RETRY;
}
trace_scoutfs_data_filemap_fault(sb, scoutfs_ino(inode), pos, (__force u32)ret);
return ret;
}
static const struct vm_operations_struct scoutfs_data_file_vm_ops = {
.fault = scoutfs_data_filemap_fault,
.page_mkwrite = scoutfs_data_page_mkwrite,
#ifdef KC_MM_REMAP_PAGES
.remap_pages = generic_file_remap_pages,
#endif
};
static int scoutfs_file_mmap(struct file *file, struct vm_area_struct *vma)
{
file_accessed(file);
vma->vm_ops = &scoutfs_data_file_vm_ops;
return 0;
}
const struct address_space_operations scoutfs_file_aops = {
#ifdef KC_MPAGE_READ_FOLIO
.dirty_folio = block_dirty_folio,
@@ -1945,6 +2219,7 @@ const struct file_operations scoutfs_file_fops = {
.splice_read = generic_file_splice_read,
.splice_write = iter_file_splice_write,
#endif
.mmap = scoutfs_file_mmap,
.unlocked_ioctl = scoutfs_ioctl,
.fsync = scoutfs_file_fsync,
.llseek = scoutfs_file_llseek,

View File

@@ -11,11 +11,13 @@
* General Public License for more details.
*/
#include <linux/kernel.h>
#include <linux/stddef.h>
#include <linux/fs.h>
#include <linux/slab.h>
#include <linux/uio.h>
#include <linux/xattr.h>
#include <linux/namei.h>
#include <linux/mm.h>
#include "format.h"
#include "file.h"
@@ -434,6 +436,15 @@ out:
return d_splice_alias(inode, dentry);
}
/*
* Helper to make iterating through dirent ptrs aligned
*/
static inline struct scoutfs_dirent *next_aligned_dirent(struct scoutfs_dirent *dent, u8 len)
{
return (void *)dent +
ALIGN(offsetof(struct scoutfs_dirent, name[len]), __alignof__(struct scoutfs_dirent));
}
/*
* readdir simply iterates over the dirent items for the dir inode and
* uses their offset as the readdir position.
@@ -441,76 +452,112 @@ out:
* It will need to be careful not to read past the region of the dirent
* hash offset keys that it has access to.
*/
static int KC_DECLARE_READDIR(scoutfs_readdir, struct file *file,
void *dirent, kc_readdir_ctx_t ctx)
static int scoutfs_readdir(struct file *file, struct dir_context *ctx)
{
struct inode *inode = file_inode(file);
struct super_block *sb = inode->i_sb;
struct scoutfs_lock *dir_lock = NULL;
struct scoutfs_dirent *dent = NULL;
/* we'll store name_len in dent->__pad[0] */
#define hacky_name_len __pad[0]
struct scoutfs_key last_key;
struct scoutfs_key key;
struct page *page = NULL;
int name_len;
u64 pos;
int entries = 0;
int ret;
int complete = 0;
struct scoutfs_dirent *end;
if (!kc_dir_emit_dots(file, dirent, ctx))
if (!dir_emit_dots(file, ctx))
return 0;
dent = alloc_dirent(SCOUTFS_NAME_LEN);
if (!dent) {
page = alloc_page(GFP_KERNEL);
if (!page)
return -ENOMEM;
}
end = page_address(page) + PAGE_SIZE;
init_dirent_key(&last_key, SCOUTFS_READDIR_TYPE, scoutfs_ino(inode),
SCOUTFS_DIRENT_LAST_POS, 0);
ret = scoutfs_lock_inode(sb, SCOUTFS_LOCK_READ, 0, inode, &dir_lock);
if (ret)
goto out;
/*
* lock and fetch dirent items, until the page no longer fits
* a max size dirent (288b). Then unlock and dir_emit the ones
* we stored in the page.
*/
for (;;) {
init_dirent_key(&key, SCOUTFS_READDIR_TYPE, scoutfs_ino(inode),
kc_readdir_pos(file, ctx), 0);
/* lock */
ret = scoutfs_lock_inode(sb, SCOUTFS_LOCK_READ, 0, inode, &dir_lock);
if (ret)
break;
ret = scoutfs_item_next(sb, &key, &last_key, dent,
dirent_bytes(SCOUTFS_NAME_LEN),
dir_lock);
if (ret < 0) {
if (ret == -ENOENT)
dent = page_address(page);
pos = ctx->pos;
while (next_aligned_dirent(dent, SCOUTFS_NAME_LEN) < end) {
init_dirent_key(&key, SCOUTFS_READDIR_TYPE, scoutfs_ino(inode),
pos, 0);
ret = scoutfs_item_next(sb, &key, &last_key, dent,
dirent_bytes(SCOUTFS_NAME_LEN),
dir_lock);
if (ret < 0) {
if (ret == -ENOENT) {
ret = 0;
complete = 1;
}
break;
}
name_len = ret - sizeof(struct scoutfs_dirent);
dent->hacky_name_len = name_len;
if (name_len < 1 || name_len > SCOUTFS_NAME_LEN) {
scoutfs_corruption(sb, SC_DIRENT_READDIR_NAME_LEN,
corrupt_dirent_readdir_name_len,
"dir_ino %llu pos %llu key "SK_FMT" len %d",
scoutfs_ino(inode),
pos,
SK_ARG(&key), name_len);
ret = -EIO;
break;
}
pos = le64_to_cpu(dent->pos) + 1;
dent = next_aligned_dirent(dent, name_len);
entries++;
}
/* unlock */
scoutfs_unlock(sb, dir_lock, SCOUTFS_LOCK_READ);
if (ret < 0)
break;
dent = page_address(page);
for (; entries > 0; entries--) {
ctx->pos = le64_to_cpu(dent->pos);
if (!dir_emit(ctx, dent->name, dent->hacky_name_len,
le64_to_cpu(dent->ino),
dentry_type(dent->type))) {
ret = 0;
goto out;
}
dent = next_aligned_dirent(dent, dent->hacky_name_len);
/* always advance ctx->pos past */
ctx->pos++;
}
if (complete)
break;
}
name_len = ret - sizeof(struct scoutfs_dirent);
if (name_len < 1 || name_len > SCOUTFS_NAME_LEN) {
scoutfs_corruption(sb, SC_DIRENT_READDIR_NAME_LEN,
corrupt_dirent_readdir_name_len,
"dir_ino %llu pos %llu key "SK_FMT" len %d",
scoutfs_ino(inode),
kc_readdir_pos(file, ctx),
SK_ARG(&key), name_len);
ret = -EIO;
goto out;
}
pos = le64_to_cpu(key.skd_major);
kc_readdir_pos(file, ctx) = pos;
if (!kc_dir_emit(ctx, dirent, dent->name, name_len, pos,
le64_to_cpu(dent->ino),
dentry_type(dent->type))) {
ret = 0;
break;
}
kc_readdir_pos(file, ctx) = pos + 1;
}
out:
scoutfs_unlock(sb, dir_lock, SCOUTFS_LOCK_READ);
kfree(dent);
if (page)
__free_page(page);
return ret;
}
@@ -1765,7 +1812,7 @@ retry:
}
old_inode->i_ctime = now;
if (new_inode)
old_inode->i_ctime = now;
new_inode->i_ctime = now;
inode_inc_iversion(old_dir);
inode_inc_iversion(old_inode);
@@ -1973,7 +2020,7 @@ const struct inode_operations scoutfs_symlink_iops = {
};
const struct file_operations scoutfs_dir_fops = {
.KC_FOP_READDIR = scoutfs_readdir,
.iterate = scoutfs_readdir,
#ifdef KC_FMODE_KABI_ITERATE
.open = scoutfs_dir_open,
#endif

View File

@@ -58,25 +58,23 @@
* key space after we find no items in a given lock region. This is
* relatively cheap because reading is going to check the segments
* anyway.
*
* This is copying to userspace while holding a read lock. This is safe
* because faulting can send a request for a write lock while the read
* lock is being used. The cluster locks don't block tasks in a node,
* they match and the tasks fall back to local locking. In this case
* the spin locks around the item cache.
*/
static long scoutfs_ioc_walk_inodes(struct file *file, unsigned long arg)
{
struct super_block *sb = file_inode(file)->i_sb;
struct scoutfs_ioctl_walk_inodes __user *uwalk = (void __user *)arg;
struct scoutfs_ioctl_walk_inodes walk;
struct scoutfs_ioctl_walk_inodes_entry ent;
struct scoutfs_ioctl_walk_inodes_entry *ent = NULL;
struct scoutfs_ioctl_walk_inodes_entry *end;
struct scoutfs_key next_key;
struct scoutfs_key last_key;
struct scoutfs_key key;
struct scoutfs_lock *lock;
struct page *page = NULL;
u64 last_seq;
u64 entries = 0;
int ret = 0;
int complete = 0;
u32 nr = 0;
u8 type;
@@ -107,6 +105,10 @@ static long scoutfs_ioc_walk_inodes(struct file *file, unsigned long arg)
}
}
page = alloc_page(GFP_KERNEL);
if (!page)
return -ENOMEM;
scoutfs_inode_init_index_key(&key, type, walk.first.major,
walk.first.minor, walk.first.ino);
scoutfs_inode_init_index_key(&last_key, type, walk.last.major,
@@ -115,77 +117,107 @@ static long scoutfs_ioc_walk_inodes(struct file *file, unsigned long arg)
/* cap nr to the max the ioctl can return to a compat task */
walk.nr_entries = min_t(u64, walk.nr_entries, INT_MAX);
ret = scoutfs_lock_inode_index(sb, SCOUTFS_LOCK_READ, type,
walk.first.major, walk.first.ino,
&lock);
if (ret < 0)
goto out;
end = page_address(page) + PAGE_SIZE;
for (nr = 0; nr < walk.nr_entries; ) {
/* outer loop */
for (nr = 0;;) {
ent = page_address(page);
/* make sure _pad and minor are zeroed */
memset(ent, 0, PAGE_SIZE);
ret = scoutfs_item_next(sb, &key, &last_key, NULL, 0, lock);
if (ret < 0 && ret != -ENOENT)
ret = scoutfs_lock_inode_index(sb, SCOUTFS_LOCK_READ, type,
le64_to_cpu(key.skii_major),
le64_to_cpu(key.skii_ino),
&lock);
if (ret)
break;
if (ret == -ENOENT) {
/* done if lock covers last iteration key */
if (scoutfs_key_compare(&last_key, &lock->end) <= 0) {
ret = 0;
/* inner loop 1 */
while (ent + 1 < end) {
ret = scoutfs_item_next(sb, &key, &last_key, NULL, 0, lock);
if (ret < 0 && ret != -ENOENT)
break;
if (ret == -ENOENT) {
/* done if lock covers last iteration key */
if (scoutfs_key_compare(&last_key, &lock->end) <= 0) {
ret = 0;
complete = 1;
break;
}
/* continue iterating after locked empty region */
key = lock->end;
scoutfs_key_inc(&key);
scoutfs_unlock(sb, lock, SCOUTFS_LOCK_READ);
/* avoid double-unlocking here after break */
lock = NULL;
ret = scoutfs_forest_next_hint(sb, &key, &next_key);
if (ret < 0 && ret != -ENOENT)
break;
if (ret == -ENOENT ||
scoutfs_key_compare(&next_key, &last_key) > 0) {
ret = 0;
complete = 1;
break;
}
key = next_key;
ret = scoutfs_lock_inode_index(sb, SCOUTFS_LOCK_READ,
type,
le64_to_cpu(key.skii_major),
le64_to_cpu(key.skii_ino),
&lock);
if (ret)
break;
continue;
}
/* continue iterating after locked empty region */
key = lock->end;
ent->major = le64_to_cpu(key.skii_major);
ent->ino = le64_to_cpu(key.skii_ino);
scoutfs_key_inc(&key);
scoutfs_unlock(sb, lock, SCOUTFS_LOCK_READ);
ent++;
entries++;
ret = scoutfs_forest_next_hint(sb, &key, &next_key);
if (ret < 0 && ret != -ENOENT)
goto out;
if (nr + entries >= walk.nr_entries) {
complete = 1;
break;
}
}
if (ret == -ENOENT ||
scoutfs_key_compare(&next_key, &last_key) > 0) {
ret = 0;
scoutfs_unlock(sb, lock, SCOUTFS_LOCK_READ);
if (ret < 0)
break;
/* inner loop 2 */
ent = page_address(page);
for (; entries > 0; entries--) {
if (copy_to_user((void __user *)walk.entries_ptr, ent,
sizeof(struct scoutfs_ioctl_walk_inodes_entry))) {
ret = -EFAULT;
goto out;
}
key = next_key;
ret = scoutfs_lock_inode_index(sb, SCOUTFS_LOCK_READ,
key.sk_type,
le64_to_cpu(key.skii_major),
le64_to_cpu(key.skii_ino),
&lock);
if (ret < 0)
goto out;
continue;
walk.entries_ptr += sizeof(struct scoutfs_ioctl_walk_inodes_entry);
ent++;
nr++;
}
ent.major = le64_to_cpu(key.skii_major);
ent.minor = 0;
ent.ino = le64_to_cpu(key.skii_ino);
if (copy_to_user((void __user *)walk.entries_ptr, &ent,
sizeof(ent))) {
ret = -EFAULT;
if (complete)
break;
}
nr++;
walk.entries_ptr += sizeof(ent);
scoutfs_key_inc(&key);
}
scoutfs_unlock(sb, lock, SCOUTFS_LOCK_READ);
out:
if (page)
__free_page(page);
if (nr > 0)
ret = nr;
return ret;
}
@@ -1163,11 +1195,15 @@ static long scoutfs_ioc_get_allocated_inos(struct file *file, unsigned long arg)
struct scoutfs_lock *lock = NULL;
struct scoutfs_key key;
struct scoutfs_key end;
struct page *page = NULL;
u64 __user *uinos;
u64 bytes;
u64 ino;
u64 *ino;
u64 *ino_end;
int entries = 0;
int nr;
int ret;
int complete = 0;
if (!(file->f_mode & FMODE_READ)) {
ret = -EBADF;
@@ -1189,47 +1225,83 @@ static long scoutfs_ioc_get_allocated_inos(struct file *file, unsigned long arg)
goto out;
}
page = alloc_page(GFP_KERNEL);
if (!page) {
ret = -ENOMEM;
goto out;
}
ino_end = page_address(page) + PAGE_SIZE;
scoutfs_inode_init_key(&key, gai.start_ino);
scoutfs_inode_init_key(&end, gai.start_ino | SCOUTFS_LOCK_INODE_GROUP_MASK);
uinos = (void __user *)gai.inos_ptr;
bytes = gai.inos_bytes;
nr = 0;
ret = scoutfs_lock_ino(sb, SCOUTFS_LOCK_READ, 0, gai.start_ino, &lock);
if (ret < 0)
goto out;
for (;;) {
while (bytes >= sizeof(*uinos)) {
ret = scoutfs_lock_ino(sb, SCOUTFS_LOCK_READ, 0, gai.start_ino, &lock);
if (ret < 0)
goto out;
ret = scoutfs_item_next(sb, &key, &end, NULL, 0, lock);
if (ret < 0) {
if (ret == -ENOENT)
ino = page_address(page);
while (ino < ino_end) {
ret = scoutfs_item_next(sb, &key, &end, NULL, 0, lock);
if (ret < 0) {
if (ret == -ENOENT) {
ret = 0;
complete = 1;
}
break;
}
if (key.sk_zone != SCOUTFS_FS_ZONE) {
ret = 0;
break;
complete = 1;
break;
}
/* all fs items are owned by allocated inodes, and _first is always ino */
*ino = le64_to_cpu(key._sk_first);
scoutfs_inode_init_key(&key, *ino + 1);
ino++;
entries++;
nr++;
bytes -= sizeof(*uinos);
if (bytes < sizeof(*uinos)) {
complete = 1;
break;
}
if (nr == INT_MAX) {
complete = 1;
break;
}
}
if (key.sk_zone != SCOUTFS_FS_ZONE) {
ret = 0;
break;
}
scoutfs_unlock(sb, lock, SCOUTFS_LOCK_READ);
/* all fs items are owned by allocated inodes, and _first is always ino */
ino = le64_to_cpu(key._sk_first);
if (put_user(ino, uinos)) {
if (ret < 0)
break;
ino = page_address(page);
if (copy_to_user(uinos, ino, entries * sizeof(*uinos))) {
ret = -EFAULT;
break;
goto out;
}
uinos++;
bytes -= sizeof(*uinos);
if (++nr == INT_MAX)
uinos += entries;
entries = 0;
if (complete)
break;
scoutfs_inode_init_key(&key, ino + 1);
}
scoutfs_unlock(sb, lock, SCOUTFS_LOCK_READ);
out:
if (page)
__free_page(page);
return ret ?: nr;
}

View File

@@ -29,50 +29,6 @@ do { \
})
#endif
#ifndef KC_ITERATE_DIR_CONTEXT
typedef filldir_t kc_readdir_ctx_t;
#define KC_DECLARE_READDIR(name, file, dirent, ctx) name(file, dirent, ctx)
#define KC_FOP_READDIR readdir
#define kc_readdir_pos(filp, ctx) (filp)->f_pos
#define kc_dir_emit_dots(file, dirent, ctx) dir_emit_dots(file, dirent, ctx)
#define kc_dir_emit(ctx, dirent, name, name_len, pos, ino, dt) \
(ctx(dirent, name, name_len, pos, ino, dt) == 0)
#else
typedef struct dir_context * kc_readdir_ctx_t;
#define KC_DECLARE_READDIR(name, file, dirent, ctx) name(file, ctx)
#define KC_FOP_READDIR iterate
#define kc_readdir_pos(filp, ctx) (ctx)->pos
#define kc_dir_emit_dots(file, dirent, ctx) dir_emit_dots(file, ctx)
#define kc_dir_emit(ctx, dirent, name, name_len, pos, ino, dt) \
dir_emit(ctx, name, name_len, ino, dt)
#endif
#ifndef KC_DIR_EMIT_DOTS
/*
* Kernels before ->iterate and don't have dir_emit_dots so we give them
* one that works with the ->readdir() filldir() method.
*/
static inline int dir_emit_dots(struct file *file, void *dirent,
filldir_t filldir)
{
if (file->f_pos == 0) {
if (filldir(dirent, ".", 1, 1,
file->f_path.dentry->d_inode->i_ino, DT_DIR))
return 0;
file->f_pos = 1;
}
if (file->f_pos == 1) {
if (filldir(dirent, "..", 2, 1,
parent_ino(file->f_path.dentry), DT_DIR))
return 0;
file->f_pos = 2;
}
return 1;
}
#endif
#ifdef KC_POSIX_ACL_VALID_USER_NS
#define kc_posix_acl_valid(user_ns, acl) posix_acl_valid(user_ns, acl)
#else
@@ -438,4 +394,20 @@ static inline int kc_tcp_sock_set_nodelay(struct socket *sock)
}
#endif
#ifdef KC_INODE_DIO_END
#define kc_inode_dio_end inode_dio_end
#else
#define kc_inode_dio_end inode_dio_done
#endif
#ifndef KC_MM_VM_FAULT_T
typedef unsigned int vm_fault_t;
static inline vm_fault_t vmf_error(int err)
{
if (err == -ENOMEM)
return VM_FAULT_OOM;
return VM_FAULT_SIGBUS;
}
#endif
#endif

View File

@@ -302,6 +302,7 @@ static void lock_inc_count(unsigned int *counts, enum scoutfs_lock_mode mode)
static void lock_dec_count(unsigned int *counts, enum scoutfs_lock_mode mode)
{
BUG_ON(mode < 0 || mode >= SCOUTFS_LOCK_NR_MODES);
BUG_ON(counts[mode] == 0);
counts[mode]--;
}

View File

@@ -39,6 +39,7 @@ enum {
Opt_orphan_scan_delay_ms,
Opt_quorum_heartbeat_timeout_ms,
Opt_quorum_slot_nr,
Opt_meta_reserve_blocks,
Opt_err,
};
@@ -52,6 +53,7 @@ static const match_table_t tokens = {
{Opt_orphan_scan_delay_ms, "orphan_scan_delay_ms=%s"},
{Opt_quorum_heartbeat_timeout_ms, "quorum_heartbeat_timeout_ms=%s"},
{Opt_quorum_slot_nr, "quorum_slot_nr=%s"},
{Opt_meta_reserve_blocks, "meta_reserve_blocks=%s"},
{Opt_err, NULL}
};
@@ -126,6 +128,9 @@ static void free_options(struct scoutfs_mount_options *opts)
#define MIN_DATA_PREALLOC_BLOCKS 1ULL
#define MAX_DATA_PREALLOC_BLOCKS ((unsigned long long)SCOUTFS_BLOCK_SM_MAX)
#define SCOUTFS_META_RESERVE_DEFAULT_BLOCKS 16384
static void init_default_options(struct scoutfs_mount_options *opts)
{
memset(opts, 0, sizeof(*opts));
@@ -136,6 +141,7 @@ static void init_default_options(struct scoutfs_mount_options *opts)
opts->orphan_scan_delay_ms = -1;
opts->quorum_heartbeat_timeout_ms = SCOUTFS_QUORUM_DEF_HB_TIMEO_MS;
opts->quorum_slot_nr = -1;
opts->meta_reserve_blocks = SCOUTFS_META_RESERVE_DEFAULT_BLOCKS;
}
static int verify_log_merge_wait_timeout_ms(struct super_block *sb, int ret, int val)
@@ -167,6 +173,24 @@ static int verify_quorum_heartbeat_timeout_ms(struct super_block *sb, int ret, u
return 0;
}
static int verify_meta_reserve_blocks(struct super_block *sb, int ret, int val)
{
/*
* Ideally we set a limit to something reasonable like 1/2 the actual
* total_meta_blocks, but we can't yet get this info when mount is called
*/
if (ret < 0) {
scoutfs_err(sb, "failed to parse meta_reserve_blocks value");
return -EINVAL;
}
if (val < 0 || val > INT_MAX) {
scoutfs_err(sb, "invalid meta_reserve_blocks value %d, must be between 0 and %d",
val, INT_MAX);
return -EINVAL;
}
return 0;
}
/*
* Parse the option string into our options struct. This can allocate
@@ -279,6 +303,14 @@ static int parse_options(struct super_block *sb, char *options, struct scoutfs_m
opts->quorum_slot_nr = nr;
break;
case Opt_meta_reserve_blocks:
ret = match_int(args, &nr);
ret = verify_meta_reserve_blocks(sb, ret, nr);
if (ret < 0)
return ret;
opts->meta_reserve_blocks = nr;
break;
default:
scoutfs_err(sb, "Unknown or malformed option, \"%s\"", p);
return -EINVAL;
@@ -371,6 +403,7 @@ int scoutfs_options_show(struct seq_file *seq, struct dentry *root)
seq_printf(seq, ",orphan_scan_delay_ms=%u", opts.orphan_scan_delay_ms);
if (opts.quorum_slot_nr >= 0)
seq_printf(seq, ",quorum_slot_nr=%d", opts.quorum_slot_nr);
seq_printf(seq, ".meta_reserve_blocks=%llu", opts.meta_reserve_blocks);
return 0;
}
@@ -589,6 +622,17 @@ static ssize_t quorum_slot_nr_show(struct kobject *kobj, struct kobj_attribute *
}
SCOUTFS_ATTR_RO(quorum_slot_nr);
static ssize_t meta_reserve_blocks_show(struct kobject *kobj, struct kobj_attribute *attr, char *buf)
{
struct super_block *sb = SCOUTFS_SYSFS_ATTRS_SB(kobj);
struct scoutfs_mount_options opts;
scoutfs_options_read(sb, &opts);
return snprintf(buf, PAGE_SIZE, "%lld\n", opts.meta_reserve_blocks);
}
SCOUTFS_ATTR_RO(meta_reserve_blocks);
static struct attribute *options_attrs[] = {
SCOUTFS_ATTR_PTR(data_prealloc_blocks),
SCOUTFS_ATTR_PTR(data_prealloc_contig_only),
@@ -597,6 +641,7 @@ static struct attribute *options_attrs[] = {
SCOUTFS_ATTR_PTR(orphan_scan_delay_ms),
SCOUTFS_ATTR_PTR(quorum_heartbeat_timeout_ms),
SCOUTFS_ATTR_PTR(quorum_slot_nr),
SCOUTFS_ATTR_PTR(meta_reserve_blocks),
NULL,
};

View File

@@ -13,6 +13,7 @@ struct scoutfs_mount_options {
unsigned int orphan_scan_delay_ms;
int quorum_slot_nr;
u64 quorum_heartbeat_timeout_ms;
u64 meta_reserve_blocks;
};
void scoutfs_options_read(struct super_block *sb, struct scoutfs_mount_options *opts);

View File

@@ -286,6 +286,52 @@ TRACE_EVENT(scoutfs_data_alloc_block_enter,
STE_ENTRY_ARGS(ext))
);
TRACE_EVENT(scoutfs_data_page_mkwrite,
TP_PROTO(struct super_block *sb, __u64 ino, __u64 pos, __u32 ret),
TP_ARGS(sb, ino, pos, ret),
TP_STRUCT__entry(
SCSB_TRACE_FIELDS
__field(__u64, ino)
__field(__u64, pos)
__field(__u32, ret)
),
TP_fast_assign(
SCSB_TRACE_ASSIGN(sb);
__entry->ino = ino;
__entry->pos = pos;
__entry->ret = ret;
),
TP_printk(SCSBF" ino %llu pos %llu ret %u ",
SCSB_TRACE_ARGS, __entry->ino, __entry->pos, __entry->ret)
);
TRACE_EVENT(scoutfs_data_filemap_fault,
TP_PROTO(struct super_block *sb, __u64 ino, __u64 pos, __u32 ret),
TP_ARGS(sb, ino, pos, ret),
TP_STRUCT__entry(
SCSB_TRACE_FIELDS
__field(__u64, ino)
__field(__u64, pos)
__field(__u32, ret)
),
TP_fast_assign(
SCSB_TRACE_ASSIGN(sb);
__entry->ino = ino;
__entry->pos = pos;
__entry->ret = ret;
),
TP_printk(SCSBF" ino %llu pos %llu ret %u ",
SCSB_TRACE_ARGS, __entry->ino, __entry->pos, __entry->ret)
);
DECLARE_EVENT_CLASS(scoutfs_data_file_extent_class,
TP_PROTO(struct super_block *sb, __u64 ino, struct scoutfs_extent *ext),

View File

@@ -772,11 +772,14 @@ static int alloc_move_empty(struct super_block *sb,
u64 scoutfs_server_reserved_meta_blocks(struct super_block *sb)
{
DECLARE_SERVER_INFO(sb, server);
struct scoutfs_mount_options opts;
u64 server_blocks;
u64 client_blocks;
u64 log_blocks;
u64 nr_clients;
scoutfs_options_read(sb, &opts);
/* server has two meta_avail lists it swaps between */
server_blocks = SCOUTFS_SERVER_META_FILL_TARGET * 2;
@@ -801,7 +804,7 @@ u64 scoutfs_server_reserved_meta_blocks(struct super_block *sb)
nr_clients = server->nr_clients;
spin_unlock(&server->lock);
return server_blocks + (max(1ULL, nr_clients) * client_blocks);
return server_blocks + (max(1ULL, nr_clients) * client_blocks) + opts.meta_reserve_blocks;
}
/*

2
tests/.gitignore vendored
View File

@@ -10,3 +10,5 @@ src/stage_tmpfile
src/create_xattr_loop
src/o_tmpfile_umask
src/o_tmpfile_linkat
src/mmap_stress
src/mmap_validate

View File

@@ -13,7 +13,9 @@ BIN := src/createmany \
src/create_xattr_loop \
src/fragmented_data_extents \
src/o_tmpfile_umask \
src/o_tmpfile_linkat
src/o_tmpfile_linkat \
src/mmap_stress \
src/mmap_validate
DEPS := $(wildcard src/*.d)
@@ -23,8 +25,10 @@ ifneq ($(DEPS),)
-include $(DEPS)
endif
src/mmap_stress: LIBS+=-lpthread
$(BIN): %: %.c Makefile
gcc $(CFLAGS) -MD -MP -MF $*.d $< -o $@
gcc $(CFLAGS) -MD -MP -MF $*.d $< -o $@ $(LIBS)
.PHONY: clean
clean:

88
tests/funcs/tap.sh Normal file
View File

@@ -0,0 +1,88 @@
#
# Generate TAP format test results
#
t_tap_header()
{
local runid=$1
local sequence=( $(echo $tests) )
local count=${#sequence[@]}
# avoid recreating the same TAP result over again - harness sets this
[[ -z "$runid" ]] && runid="*test*"
cat > $T_RESULTS/scoutfs.tap <<TAPEOF
TAP version 14
1..${count}
#
# TAP results for run ${runid}
#
# host/run info:
#
# hostname: ${HOSTNAME}
# test start time: $(date --utc)
# uname -r: $(uname -r)
# scoutfs commit id: $(git describe --tags)
#
# sequence for this run:
#
TAPEOF
# Sequence
for t in ${tests}; do
echo ${t/.sh/}
done | cat -n | expand | column -c 120 | expand | sed 's/^ /#/' >> $T_RESULTS/scoutfs.tap
echo "#" >> $T_RESULTS/scoutfs.tap
}
t_tap_progress()
{
(
local i=$(( testcount + 1 ))
local testname=$1
local result=$2
local diff=""
local dmsg=""
if [[ -s "$T_RESULTS/tmp/${testname}/dmesg.new" ]]; then
dmsg="1"
fi
if ! cmp -s golden/${testname} $T_RESULTS/output/${testname}; then
diff="1"
fi
if [[ "${result}" == "100" ]] && [[ -z "${dmsg}" ]] && [[ -z "${diff}" ]]; then
echo "ok ${i} - ${testname}"
elif [[ "${result}" == "103" ]]; then
echo "ok ${i} - ${testname}"
echo "# ${testname} ** skipped - permitted **"
else
echo "not ok ${i} - ${testname}"
case ${result} in
101)
echo "# ${testname} ** skipped **"
;;
102)
echo "# ${testname} ** failed **"
;;
esac
if [[ -n "${diff}" ]]; then
echo "#"
echo "# diff:"
echo "#"
diff -u golden/${testname} $T_RESULTS/output/${testname} | expand | sed 's/^/# /'
fi
if [[ -n "${dmsg}" ]]; then
echo "#"
echo "# dmesg:"
echo "#"
cat "$T_RESULTS/tmp/${testname}/dmesg.new" | sed 's/^/# /'
fi
fi
) >> $T_RESULTS/scoutfs.tap
}

27
tests/golden/mmap Normal file
View File

@@ -0,0 +1,27 @@
== mmap_stress
thread 0 complete
thread 1 complete
thread 2 complete
thread 3 complete
thread 4 complete
== basic mmap/read/write consistency checks
== mmap read from offline extent
0: offset: 0 length: 2 flags: O.L
extents: 1
1
00000200: ea ea ea ea ea ea ea ea ea ea ea ea ea ea ea ea ................
0
0: offset: 0 length: 2 flags: ..L
extents: 1
== mmap write to an offline extent
0: offset: 0 length: 2 flags: O.L
extents: 1
1
0
0: offset: 0 length: 2 flags: ..L
extents: 1
00000000 ea ea ea ea ea ea ea ea ea ea ea ea ea ea ea ea |................|
00000010 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 |................|
00000020 ea ea ea ea ea ea ea ea ea ea ea ea ea ea ea ea |................|
00000030
== done

View File

@@ -0,0 +1,97 @@
== create content
== readdir all
00000000: d_off: 0x00000001 d_reclen: 0x18 d_type: DT_DIR d_name: .
00000001: d_off: 0x00000002 d_reclen: 0x18 d_type: DT_DIR d_name: ..
00000002: d_off: 0x00000003 d_reclen: 0x18 d_type: DT_REG d_name: a
00000003: d_off: 0x00000004 d_reclen: 0x20 d_type: DT_REG d_name: aaaaaaaa
00000004: d_off: 0x00000005 d_reclen: 0x28 d_type: DT_REG d_name: aaaaaaaaaaaaaaa
00000005: d_off: 0x00000006 d_reclen: 0x30 d_type: DT_REG d_name: aaaaaaaaaaaaaaaaaaaaaa
00000006: d_off: 0x00000007 d_reclen: 0x38 d_type: DT_REG d_name: aaaaaaaaaaaaaaaaaaaaaaaaaaaaa
00000007: d_off: 0x00000008 d_reclen: 0x38 d_type: DT_REG d_name: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
00000008: d_off: 0x00000009 d_reclen: 0x40 d_type: DT_REG d_name: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
00000009: d_off: 0x0000000a d_reclen: 0x48 d_type: DT_REG d_name: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
0000000a: d_off: 0x0000000b d_reclen: 0x50 d_type: DT_REG d_name: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
0000000b: d_off: 0x0000000c d_reclen: 0x58 d_type: DT_REG d_name: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
0000000c: d_off: 0x0000000d d_reclen: 0x60 d_type: DT_REG d_name: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
0000000d: d_off: 0x0000000e d_reclen: 0x68 d_type: DT_REG d_name: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
0000000e: d_off: 0x0000000f d_reclen: 0x70 d_type: DT_REG d_name: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
0000000f: d_off: 0x00000010 d_reclen: 0x70 d_type: DT_REG d_name: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
00000010: d_off: 0x00000011 d_reclen: 0x78 d_type: DT_REG d_name: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
00000011: d_off: 0x00000012 d_reclen: 0x80 d_type: DT_REG d_name: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
00000012: d_off: 0x00000013 d_reclen: 0x88 d_type: DT_REG d_name: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
00000013: d_off: 0x00000014 d_reclen: 0x90 d_type: DT_REG d_name: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
00000014: d_off: 0x00000015 d_reclen: 0x98 d_type: DT_REG d_name: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
00000015: d_off: 0x00000016 d_reclen: 0xa0 d_type: DT_REG d_name: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
00000016: d_off: 0x00000017 d_reclen: 0xa8 d_type: DT_REG d_name: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
00000017: d_off: 0x00000018 d_reclen: 0xa8 d_type: DT_REG d_name: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
00000018: d_off: 0x00000019 d_reclen: 0xb0 d_type: DT_REG d_name: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
00000019: d_off: 0x0000001a d_reclen: 0xb8 d_type: DT_REG d_name: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
0000001a: d_off: 0x0000001b d_reclen: 0xc0 d_type: DT_REG d_name: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
0000001b: d_off: 0x0000001c d_reclen: 0xc8 d_type: DT_REG d_name: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
0000001c: d_off: 0x0000001d d_reclen: 0xd0 d_type: DT_REG d_name: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
0000001d: d_off: 0x0000001e d_reclen: 0xd8 d_type: DT_REG d_name: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
0000001e: d_off: 0x0000001f d_reclen: 0xe0 d_type: DT_REG d_name: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
0000001f: d_off: 0x00000020 d_reclen: 0xe0 d_type: DT_REG d_name: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
00000020: d_off: 0x00000021 d_reclen: 0xe8 d_type: DT_REG d_name: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
00000021: d_off: 0x00000022 d_reclen: 0xf0 d_type: DT_REG d_name: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
00000022: d_off: 0x00000023 d_reclen: 0xf8 d_type: DT_REG d_name: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
00000023: d_off: 0x00000024 d_reclen: 0x100 d_type: DT_REG d_name: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
00000024: d_off: 0x00000025 d_reclen: 0x108 d_type: DT_REG d_name: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
00000025: d_off: 0x00000026 d_reclen: 0x110 d_type: DT_REG d_name: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
== readdir offset
00000014: d_off: 0x00000015 d_reclen: 0x98 d_type: DT_REG d_name: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
00000015: d_off: 0x00000016 d_reclen: 0xa0 d_type: DT_REG d_name: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
00000016: d_off: 0x00000017 d_reclen: 0xa8 d_type: DT_REG d_name: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
00000017: d_off: 0x00000018 d_reclen: 0xa8 d_type: DT_REG d_name: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
00000018: d_off: 0x00000019 d_reclen: 0xb0 d_type: DT_REG d_name: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
00000019: d_off: 0x0000001a d_reclen: 0xb8 d_type: DT_REG d_name: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
0000001a: d_off: 0x0000001b d_reclen: 0xc0 d_type: DT_REG d_name: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
0000001b: d_off: 0x0000001c d_reclen: 0xc8 d_type: DT_REG d_name: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
0000001c: d_off: 0x0000001d d_reclen: 0xd0 d_type: DT_REG d_name: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
0000001d: d_off: 0x0000001e d_reclen: 0xd8 d_type: DT_REG d_name: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
0000001e: d_off: 0x0000001f d_reclen: 0xe0 d_type: DT_REG d_name: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
0000001f: d_off: 0x00000020 d_reclen: 0xe0 d_type: DT_REG d_name: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
00000020: d_off: 0x00000021 d_reclen: 0xe8 d_type: DT_REG d_name: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
00000021: d_off: 0x00000022 d_reclen: 0xf0 d_type: DT_REG d_name: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
00000022: d_off: 0x00000023 d_reclen: 0xf8 d_type: DT_REG d_name: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
00000023: d_off: 0x00000024 d_reclen: 0x100 d_type: DT_REG d_name: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
00000024: d_off: 0x00000025 d_reclen: 0x108 d_type: DT_REG d_name: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
00000025: d_off: 0x00000026 d_reclen: 0x110 d_type: DT_REG d_name: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
== readdir len (bytes)
00000000: d_off: 0x00000001 d_reclen: 0x18 d_type: DT_DIR d_name: .
00000001: d_off: 0x00000002 d_reclen: 0x18 d_type: DT_DIR d_name: ..
00000002: d_off: 0x00000003 d_reclen: 0x18 d_type: DT_REG d_name: a
00000003: d_off: 0x00000004 d_reclen: 0x20 d_type: DT_REG d_name: aaaaaaaa
00000004: d_off: 0x00000005 d_reclen: 0x28 d_type: DT_REG d_name: aaaaaaaaaaaaaaa
00000005: d_off: 0x00000006 d_reclen: 0x30 d_type: DT_REG d_name: aaaaaaaaaaaaaaaaaaaaaa
00000006: d_off: 0x00000007 d_reclen: 0x38 d_type: DT_REG d_name: aaaaaaaaaaaaaaaaaaaaaaaaaaaaa
== introduce gap
00000000: d_off: 0x00000001 d_reclen: 0x18 d_type: DT_DIR d_name: .
00000001: d_off: 0x00000002 d_reclen: 0x18 d_type: DT_DIR d_name: ..
00000002: d_off: 0x00000003 d_reclen: 0x18 d_type: DT_REG d_name: a
00000003: d_off: 0x00000004 d_reclen: 0x20 d_type: DT_REG d_name: aaaaaaaa
00000004: d_off: 0x00000005 d_reclen: 0x28 d_type: DT_REG d_name: aaaaaaaaaaaaaaa
00000005: d_off: 0x00000006 d_reclen: 0x30 d_type: DT_REG d_name: aaaaaaaaaaaaaaaaaaaaaa
00000006: d_off: 0x00000007 d_reclen: 0x38 d_type: DT_REG d_name: aaaaaaaaaaaaaaaaaaaaaaaaaaaaa
00000007: d_off: 0x00000008 d_reclen: 0x38 d_type: DT_REG d_name: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
00000008: d_off: 0x00000009 d_reclen: 0x40 d_type: DT_REG d_name: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
00000009: d_off: 0x00000014 d_reclen: 0x48 d_type: DT_REG d_name: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
00000014: d_off: 0x00000015 d_reclen: 0x98 d_type: DT_REG d_name: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
00000015: d_off: 0x00000016 d_reclen: 0xa0 d_type: DT_REG d_name: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
00000016: d_off: 0x00000017 d_reclen: 0xa8 d_type: DT_REG d_name: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
00000017: d_off: 0x00000018 d_reclen: 0xa8 d_type: DT_REG d_name: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
00000018: d_off: 0x00000019 d_reclen: 0xb0 d_type: DT_REG d_name: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
00000019: d_off: 0x0000001a d_reclen: 0xb8 d_type: DT_REG d_name: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
0000001a: d_off: 0x0000001b d_reclen: 0xc0 d_type: DT_REG d_name: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
0000001b: d_off: 0x0000001c d_reclen: 0xc8 d_type: DT_REG d_name: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
0000001c: d_off: 0x0000001d d_reclen: 0xd0 d_type: DT_REG d_name: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
0000001d: d_off: 0x0000001e d_reclen: 0xd8 d_type: DT_REG d_name: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
0000001e: d_off: 0x0000001f d_reclen: 0xe0 d_type: DT_REG d_name: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
0000001f: d_off: 0x00000020 d_reclen: 0xe0 d_type: DT_REG d_name: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
00000020: d_off: 0x00000021 d_reclen: 0xe8 d_type: DT_REG d_name: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
00000021: d_off: 0x00000022 d_reclen: 0xf0 d_type: DT_REG d_name: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
00000022: d_off: 0x00000023 d_reclen: 0xf8 d_type: DT_REG d_name: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
00000023: d_off: 0x00000024 d_reclen: 0x100 d_type: DT_REG d_name: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
00000024: d_off: 0x00000025 d_reclen: 0x108 d_type: DT_REG d_name: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
00000025: d_off: 0x00000026 d_reclen: 0x110 d_type: DT_REG d_name: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
== cleanup

View File

@@ -22,6 +22,8 @@ generic/024
generic/025
generic/026
generic/028
generic/029
generic/030
generic/031
generic/032
generic/033
@@ -53,6 +55,7 @@ generic/073
generic/076
generic/078
generic/079
generic/080
generic/081
generic/082
generic/084
@@ -81,10 +84,12 @@ generic/116
generic/117
generic/118
generic/119
generic/120
generic/121
generic/122
generic/123
generic/124
generic/126
generic/128
generic/129
generic/130
@@ -95,6 +100,7 @@ generic/136
generic/138
generic/139
generic/140
generic/141
generic/142
generic/143
generic/144
@@ -153,6 +159,7 @@ generic/210
generic/211
generic/212
generic/214
generic/215
generic/216
generic/217
generic/218
@@ -173,6 +180,9 @@ generic/238
generic/240
generic/244
generic/245
generic/246
generic/247
generic/248
generic/249
generic/250
generic/252
@@ -231,6 +241,7 @@ generic/317
generic/319
generic/322
generic/324
generic/325
generic/326
generic/327
generic/328
@@ -244,6 +255,7 @@ generic/337
generic/341
generic/342
generic/343
generic/346
generic/348
generic/353
generic/355
@@ -305,7 +317,9 @@ generic/424
generic/425
generic/426
generic/427
generic/428
generic/436
generic/437
generic/439
generic/440
generic/443
@@ -315,6 +329,7 @@ generic/448
generic/449
generic/450
generic/451
generic/452
generic/453
generic/454
generic/456
@@ -438,6 +453,7 @@ generic/610
generic/611
generic/612
generic/613
generic/614
generic/618
generic/621
generic/623
@@ -451,6 +467,7 @@ generic/632
generic/634
generic/635
generic/637
generic/638
generic/639
generic/640
generic/644
@@ -862,4 +879,4 @@ generic/688
generic/689
shared/002
shared/032
Passed all 495 tests
Passed all 512 tests

View File

@@ -464,6 +464,7 @@ for i in $(seq 0 $((T_NR_MOUNTS - 1))); do
if [ "$i" -lt "$T_QUORUM" ]; then
opts="$opts,quorum_slot_nr=$i"
fi
opts="$opts,meta_reserve_blocks=0"
opts="${opts}${T_MNT_OPTIONS}"
msg "mounting $meta_dev|$data_dev on $dir"
@@ -512,6 +513,11 @@ msg "running tests"
> "$T_RESULTS/skip.log"
> "$T_RESULTS/fail.log"
# generate a test ID to make sure we can de-duplicate TAP results in aggregation
. funcs/tap.sh
t_tap_header $(uuidgen)
testcount=0
passed=0
skipped=0
failed=0
@@ -637,6 +643,11 @@ for t in $tests; do
test -n "$T_ABORT" && die "aborting after first failure"
fi
# record results for TAP format output
t_tap_progress $test_name $sts
((testcount++))
done
msg "all tests run: $passed passed, $skipped skipped, $skipped_permitted skipped (permitted), $failed failed"

View File

@@ -6,6 +6,7 @@ inode-items-updated.sh
simple-inode-index.sh
simple-staging.sh
simple-release-extents.sh
simple-readdir.sh
get-referring-entries.sh
fallocate.sh
basic-truncate.sh
@@ -17,6 +18,7 @@ projects.sh
large-fragmented-free.sh
format-version-forward-back.sh
enospc.sh
mmap.sh
srch-safe-merge-pos.sh
srch-basic-functionality.sh
simple-xattr-unit.sh

181
tests/src/mmap_stress.c Normal file
View File

@@ -0,0 +1,181 @@
#define _GNU_SOURCE
/*
* mmap() stress test for scoutfs
*
* This test exercises the scoutfs kernel module's locking by
* repeatedly reading/writing using mmap and pread/write calls
* across 5 clients (mounts).
*
* Each thread operates on a single thread/client, and performs
* operations in a random order on the file.
*
* The goal is to assure that locking between _page_mkwrite vfs
* calls and the normal read/write paths do not cause deadlocks.
*
* There is no content validation performed. All that is done is
* assure that the programs continues without errors.
*/
#include <sys/types.h>
#include <stdio.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdlib.h>
#include <string.h>
#include <stdbool.h>
#include <sys/mman.h>
#include <pthread.h>
#include <errno.h>
static int size = 0;
static int count = 0; /* XXX make this duration instead */
struct thread_info {
int nr;
int fd;
};
static void *run_test_func(void *ptr)
{
void *buf = NULL;
char *addr = NULL;
struct thread_info *tinfo = ptr;
int c = 0;
int fd;
ssize_t read, written, ret;
int preads = 0, pwrites = 0, mreads = 0, mwrites = 0;
fd = tinfo->fd;
if (posix_memalign(&buf, 4096, size) != 0) {
perror("calloc");
exit(-1);
}
addr = mmap(NULL, size, PROT_WRITE | PROT_READ, MAP_SHARED, fd, 0);
if (addr == MAP_FAILED) {
perror("mmap");
exit(-1);
}
usleep(100000); /* 0.1sec to allow all threads to start roughly at the same time */
for (;;) {
if (++c > count)
break;
switch (rand() % 4) {
case 0: /* pread */
preads++;
for (read = 0; read < size;) {
ret = pread(fd, buf, size - read, read);
if (ret < 0) {
perror("pwrite");
exit(-1);
}
read += ret;
}
break;
case 1: /* pwrite */
pwrites++;
memset(buf, (char)(c & 0xff), size);
for (written = 0; written < size;) {
ret = pwrite(fd, buf, size - written, written);
if (ret < 0) {
perror("pwrite");
exit(-1);
}
written += ret;
}
break;
case 2: /* mmap read */
mreads++;
memcpy(buf, addr, size); /* noerr */
break;
case 3: /* mmap write */
mwrites++;
memset(buf, (char)(c & 0xff), size);
memcpy(addr, buf, size); /* noerr */
break;
}
}
munmap(addr, size);
free(buf);
printf("thread %u complete: preads %u pwrites %u mreads %u mwrites %u\n", tinfo->nr,
mreads, mwrites, preads, pwrites);
return NULL;
}
int main(int argc, char **argv)
{
pthread_t thread[5];
struct thread_info tinfo[5];
int fd[5];
int ret;
int i;
if (argc != 8) {
fprintf(stderr, "%s requires 7 arguments - size count file1 file2 file3 file4 file5\n", argv[0]);
exit(-1);
}
size = atoi(argv[1]);
if (size <= 0) {
fprintf(stderr, "invalid size, must be greater than 0\n");
exit(-1);
}
count = atoi(argv[2]);
if (count < 0) {
fprintf(stderr, "invalid count, must be greater than 0\n");
exit(-1);
}
/* create and truncate one fd */
fd[0] = open(argv[3], O_RDWR | O_CREAT | O_TRUNC, 00644);
if (fd[0] < 0) {
perror("open");
exit(-1);
}
/* make it the test size */
if (posix_fallocate(fd[0], 0, size) != 0) {
perror("fallocate");
exit(-1);
}
/* now open the rest of the fds */
for (i = 1; i < 5; i++) {
fd[i] = open(argv[3+i], O_RDWR);
if (fd[i] < 0) {
perror("open");
exit(-1);
}
}
/* start threads */
for (i = 0; i < 5; i++) {
tinfo[i].fd = fd[i];
tinfo[i].nr = i;
ret = pthread_create(&thread[i], NULL, run_test_func, (void*)&tinfo[i]);
if (ret) {
perror("pthread_create");
exit(-1);
}
}
/* wait for complete */
for (i = 0; i < 5; i++)
pthread_join(thread[i], NULL);
for (i = 0; i < 5; i++)
close(fd[i]);
exit(0);
}

159
tests/src/mmap_validate.c Normal file
View File

@@ -0,0 +1,159 @@
#define _GNU_SOURCE
/*
* mmap() content consistency checking for scoutfs
*
* This test program validates that content from memory mappings
* are consistent across clients, whether written/read with mmap or
* normal writes/reads.
*
* One side of (read/write) will always be memory mapped. It may
* be that both sides do memory mapped (33% of the time).
*/
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <sys/mman.h>
#include <fcntl.h>
#include <errno.h>
static int count = 0;
static int size = 0;
static void run_test_func(int fd1, int fd2)
{
void *buf1 = NULL;
void *buf2 = NULL;
char *addr1 = NULL;
char *addr2 = NULL;
int c = 0;
ssize_t read, written, ret;
/* buffers for both sides to compare */
if (posix_memalign(&buf1, 4096, size) != 0) {
perror("calloc1");
exit(-1);
}
if (posix_memalign(&buf2, 4096, size) != 0) {
perror("calloc1");
exit(-1);
}
/* memory maps for both sides */
addr1 = mmap(NULL, size, PROT_WRITE | PROT_READ, MAP_SHARED, fd1, 0);
if (addr1 == MAP_FAILED) {
perror("mmap1");
exit(-1);
}
addr2 = mmap(NULL, size, PROT_WRITE | PROT_READ, MAP_SHARED, fd2, 0);
if (addr2 == MAP_FAILED) {
perror("mmap2");
exit(-1);
}
for (;;) {
if (++c > count) /* 10k iterations */
break;
/* put a pattern in buf1 */
memset(buf1, c & 0xff, size);
/* pwrite or mmap write from buf1 */
switch (c % 3) {
case 0: /* pwrite */
for (written = 0; written < size;) {
ret = pwrite(fd1, buf1, size - written, written);
if (ret < 0) {
perror("pwrite");
exit(-1);
}
written += ret;
}
break;
default: /* mmap write */
memcpy(addr1, buf1, size);
break;
}
/* pread or mmap read to buf2 */
switch (c % 3) {
case 2: /* pread */
for (read = 0; read < size;) {
ret = pread(fd2, buf2, size - read, read);
if (ret < 0) {
perror("pwrite");
exit(-1);
}
read += ret;
}
break;
default: /* mmap read */
memcpy(buf2, addr2, size);
break;
}
/* compare bufs */
if (memcmp(buf1, buf2, size) != 0) {
fprintf(stderr, "memcmp() failed\n");
exit(-1);
}
}
munmap(addr1, size);
munmap(addr2, size);
free(buf1);
free(buf2);
}
int main(int argc, char **argv)
{
int fd[1];
if (argc != 5) {
fprintf(stderr, "%s requires 4 arguments - size count file1 file2\n", argv[0]);
exit(-1);
}
size = atoi(argv[1]);
if (size <= 0) {
fprintf(stderr, "invalid size, must be greater than 0\n");
exit(-1);
}
count = atoi(argv[2]);
if (count < 3) {
fprintf(stderr, "invalid count, must be greater than 3\n");
exit(-1);
}
/* create and truncate one fd */
fd[0] = open(argv[3], O_RDWR | O_CREAT | O_TRUNC, 00644);
if (fd[0] < 0) {
perror("open");
exit(-1);
}
fd[1] = open(argv[4], O_RDWR , 00644);
if (fd[1] < 0) {
perror("open");
exit(-1);
}
/* make it the test size */
if (posix_fallocate(fd[0], 0, size) != 0) {
perror("fallocate");
exit(-1);
}
/* run the test function */
run_test_func(fd[0], fd[1]);
close(fd[0]);
close(fd[1]);
exit(0);
}

View File

@@ -11,7 +11,7 @@ FILE="$T_D0/file"
# final block as we truncated past it.
#
echo "== truncate writes zeroed partial end of file block"
yes | dd of="$FILE" bs=8K count=1 status=none iflag=fullblock
yes 2>/dev/null | dd of="$FILE" bs=8K count=1 status=none iflag=fullblock
sync
# not passing iflag=fullblock causes the file occasionally to just be

54
tests/tests/mmap.sh Normal file
View File

@@ -0,0 +1,54 @@
#
# test mmap() and normal read/write consistency between different nodes
#
t_require_commands mmap_stress mmap_validate scoutfs xfs_io
echo "== mmap_stress"
mmap_stress 8192 2000 "$T_D0/mmap_stress" "$T_D1/mmap_stress" "$T_D2/mmap_stress" "$T_D3/mmap_stress" "$T_D4/mmap_stress" | sed 's/:.*//g' | sort
echo "== basic mmap/read/write consistency checks"
mmap_validate 256 1000 "$T_D0/mmap_val1" "$T_D1/mmap_val1"
mmap_validate 8192 1000 "$T_D0/mmap_val2" "$T_D1/mmap_val2"
mmap_validate 88400 1000 "$T_D0/mmap_val3" "$T_D1/mmap_val3"
echo "== mmap read from offline extent"
F="$T_D0/mmap-offline"
touch "$F"
xfs_io -c "pwrite -S 0xEA 0 8192" "$F" > /dev/null
cp "$F" "${F}-stage"
vers=$(scoutfs stat -s data_version "$F")
scoutfs release "$F" -V "$vers" -o 0 -l 8192
scoutfs get-fiemap -L "$F"
xfs_io -c "mmap -rwx 0 8192" \
-c "mread -v 512 16" "$F" &
sleep 1
# should be 1 - data waiting
jobs | wc -l
scoutfs stage "${F}-stage" "$F" -V "$vers" -o 0 -l 8192
# xfs_io thread <here> will output 16 bytes of read data
sleep 1
# should be 0 - no more waiting jobs, xfs_io should have exited
jobs | wc -l
scoutfs get-fiemap -L "$F"
echo "== mmap write to an offline extent"
# reuse the same file
scoutfs release "$F" -V "$vers" -o 0 -l 8192
scoutfs get-fiemap -L "$F"
xfs_io -c "mmap -rwx 0 8192" \
-c "mwrite -S 0x11 528 16" "$F" &
sleep 1
# should be 1 job waiting
jobs | wc -l
scoutfs stage "${F}-stage" "$F" -V "$vers" -o 0 -l 8192
# no output here from write
sleep 1
# should be 0 - no more waiting jobs, xfs_io should have exited
jobs | wc -l
scoutfs get-fiemap -L "$F"
# read back contents to assure write changed the file
dd status=none if="$F" bs=1 count=48 skip=512 | hexdump -C
echo "== done"
t_pass

View File

@@ -0,0 +1,37 @@
#
# verify d_off output of xfs_io is consistent.
#
t_require_commands xfs_io
filt()
{
grep d_off | cut -d ' ' -f 1,4-
}
echo "== create content"
for s in $(seq 1 7 250); do
f=$(printf '%*s' $s | tr ' ' 'a')
touch ${T_D0}/$f
done
echo "== readdir all"
xfs_io -c "readdir -v" $T_D0 | filt
echo "== readdir offset"
xfs_io -c "readdir -v -o 20" $T_D0 | filt
echo "== readdir len (bytes)"
xfs_io -c "readdir -v -l 193" $T_D0 | filt
echo "== introduce gap"
for s in $(seq 57 7 120); do
f=$(printf '%*s' $s | tr ' ' 'a')
rm -f ${T_D0}/$f
done
xfs_io -c "readdir -v" $T_D0 | filt
echo "== cleanup"
rm -rf $T_D0
t_pass

View File

@@ -65,26 +65,14 @@ EOF
cat << EOF > local.exclude
generic/003 # missing atime update in buffered read
generic/029 # mmap missing
generic/030 # mmap missing
generic/075 # file content mismatch failures (fds, etc)
generic/080 # mmap missing
generic/103 # enospc causes trans commit failures
generic/108 # mount fails on failing device?
generic/112 # file content mismatch failures (fds, etc)
generic/120 # (can't exec 'cause no mmap)
generic/126 # (can't exec 'cause no mmap)
generic/141 # mmap missing
generic/213 # enospc causes trans commit failures
generic/215 # mmap missing
generic/246 # mmap missing
generic/247 # mmap missing
generic/248 # mmap missing
generic/318 # can't support user namespaces until v5.11
generic/321 # requires selinux enabled for '+' in ls?
generic/325 # mmap missing
generic/338 # BUG_ON update inode error handling
generic/346 # mmap missing
generic/347 # _dmthin_mount doesn't work?
generic/356 # swap
generic/357 # swap
@@ -92,16 +80,13 @@ generic/409 # bind mounts not scripted yet
generic/410 # bind mounts not scripted yet
generic/411 # bind mounts not scripted yet
generic/423 # symlink inode size is strlen() + 1 on scoutfs
generic/428 # mmap missing
generic/430 # xfs_io copy_range missing in el7
generic/431 # xfs_io copy_range missing in el7
generic/432 # xfs_io copy_range missing in el7
generic/433 # xfs_io copy_range missing in el7
generic/434 # xfs_io copy_range missing in el7
generic/437 # mmap missing
generic/441 # dm-mapper
generic/444 # el9's posix_acl_update_mode is buggy ?
generic/452 # exec test - no mmap
generic/467 # open_by_handle ESTALE
generic/472 # swap
generic/484 # dm-mapper
@@ -118,11 +103,9 @@ generic/565 # xfs_io copy_range missing in el7
generic/568 # falloc not resulting in block count increase
generic/569 # swap
generic/570 # swap
generic/614 # mmap missing
generic/620 # dm-hugedisk
generic/633 # mmap, id-mapped mounts missing in el7
generic/633 # id-mapped mounts missing in el7
generic/636 # swap
generic/638 # mmap missing
generic/641 # swap
generic/643 # swap
EOF

View File

@@ -10,6 +10,11 @@
* Just a quick simple native bitmap.
*/
int test_bit(unsigned long *bits, u64 nr)
{
return !!(bits[nr / BITS_PER_LONG] & (1UL << (nr & (BITS_PER_LONG - 1))));
}
void set_bit(unsigned long *bits, u64 nr)
{
bits[nr / BITS_PER_LONG] |= 1UL << (nr & (BITS_PER_LONG - 1));

View File

@@ -1,6 +1,7 @@
#ifndef _BITMAP_H_
#define _BITMAP_H_
int test_bit(unsigned long *bits, u64 nr);
void set_bit(unsigned long *bits, u64 nr);
void clear_bit(unsigned long *bits, u64 nr);
u64 find_next_set_bit(unsigned long *start, u64 from, u64 total);

20
utils/src/bloom.c Normal file
View File

@@ -0,0 +1,20 @@
#include <errno.h>
#include "sparse.h"
#include "util.h"
#include "format.h"
#include "hash.h"
#include "bloom.h"
void calc_bloom_nrs(struct scoutfs_key *key, unsigned int *nrs)
{
u64 hash;
int i;
hash = scoutfs_hash64(key, sizeof(struct scoutfs_key));
for (i = 0; i < SCOUTFS_FOREST_BLOOM_NRS; i++) {
nrs[i] = (u32)hash % SCOUTFS_FOREST_BLOOM_BITS;
hash >>= SCOUTFS_FOREST_BLOOM_FUNC_BITS;
}
}

6
utils/src/bloom.h Normal file
View File

@@ -0,0 +1,6 @@
#ifndef _BLOOM_H_
#define _BLOOM_H_
void calc_bloom_nrs(struct scoutfs_key *key, unsigned int *nrs);
#endif

View File

@@ -8,7 +8,7 @@
#include "leaf_item_hash.h"
#include "btree.h"
static void init_block(struct scoutfs_btree_block *bt, int level)
void btree_init_block(struct scoutfs_btree_block *bt, int level)
{
int free;
@@ -33,7 +33,7 @@ void btree_init_root_single(struct scoutfs_btree_root *root,
memset(bt, 0, SCOUTFS_BLOCK_LG_SIZE);
init_block(bt, 0);
btree_init_block(bt, 0);
}
static void *alloc_val(struct scoutfs_btree_block *bt, int len)

View File

@@ -1,6 +1,7 @@
#ifndef _BTREE_H_
#define _BTREE_H_
void btree_init_block(struct scoutfs_btree_block *bt, int level);
void btree_init_root_single(struct scoutfs_btree_root *root,
struct scoutfs_btree_block *bt,
u64 seq, u64 blkno);

View File

@@ -156,6 +156,16 @@ static inline void list_move_tail(struct list_head *list,
list_add_tail(list, head);
}
/**
* list_is_head - tests whether @list is the list @head
* @list: the entry to test
* @head: the head of the list
*/
static inline int list_is_head(const struct list_head *list, const struct list_head *head)
{
return list == head;
}
/**
* list_empty - tests whether a list is empty
* @head: the list to test.
@@ -242,6 +252,15 @@ static inline void list_splice_init(struct list_head *list,
for (pos = (head)->next, n = pos->next; pos != (head); \
pos = n, n = pos->next)
/**
* list_entry_is_head - test if the entry points to the head of the list
* @pos: the type * to cursor
* @head: the head for your list.
* @member: the name of the list_head within the struct.
*/
#define list_entry_is_head(pos, head, member) \
(&pos->member == (head))
/**
* list_for_each_entry - iterate over list of given type
* @pos: the type * to use as a loop counter.
@@ -307,4 +326,28 @@ static inline void list_splice_init(struct list_head *list,
#define list_next_entry(pos, member) \
list_entry((pos)->member.next, typeof(*(pos)), member)
/**
* list_prev_entry - get the prev element in list
* @pos: the type * to cursor
* @member: the name of the list_head within the struct.
*/
#define list_prev_entry(pos, member) \
list_entry((pos)->member.prev, typeof(*(pos)), member)
/**
* list_for_each_entry_safe_reverse - iterate backwards over list safe against removal
* @pos: the type * to use as a loop cursor.
* @n: another type * to use as temporary storage
* @head: the head for your list.
* @member: the name of the list_head within the struct.
*
* Iterate backwards over list of given type, safe against removal
* of list entry.
*/
#define list_for_each_entry_safe_reverse(pos, n, head, member) \
for (pos = list_last_entry(head, typeof(*pos), member), \
n = list_prev_entry(pos, member); \
!list_entry_is_head(pos, head, member); \
pos = n, n = list_prev_entry(n, member))
#endif

View File

@@ -0,0 +1,24 @@
#ifndef _LK_RBTREE_WRAPPER_H_
#define _LK_RBTREE_WRAPPER_H_
/*
* We're using this lame hack to build and use the kernel's rbtree in
* userspace. We drop the kernel's rbtree*[ch] implementation in and
* use them with this wrapper. We only have to remove the kernel
* includes from the imported files.
*/
#include <stdbool.h>
#include "util.h"
#define rcu_assign_pointer(a, b) do { a = b; } while (0)
#define READ_ONCE(a) ({ a; })
#define WRITE_ONCE(a, b) do { a = b; } while (0)
#define unlikely(a) ({ a; })
#define EXPORT_SYMBOL(a) /* nop */
#include "rbtree_types.h"
#include "rbtree.h"
#include "rbtree_augmented.h"
#endif

24
utils/src/mode_types.c Normal file
View File

@@ -0,0 +1,24 @@
#include <unistd.h>
#include <sys/stat.h>
#include "sparse.h"
#include "util.h"
#include "format.h"
#include "mode_types.h"
unsigned int mode_to_type(mode_t mode)
{
#define S_SHIFT 12
static unsigned char mode_types[S_IFMT >> S_SHIFT] = {
[S_IFIFO >> S_SHIFT] = SCOUTFS_DT_FIFO,
[S_IFCHR >> S_SHIFT] = SCOUTFS_DT_CHR,
[S_IFDIR >> S_SHIFT] = SCOUTFS_DT_DIR,
[S_IFBLK >> S_SHIFT] = SCOUTFS_DT_BLK,
[S_IFREG >> S_SHIFT] = SCOUTFS_DT_REG,
[S_IFLNK >> S_SHIFT] = SCOUTFS_DT_LNK,
[S_IFSOCK >> S_SHIFT] = SCOUTFS_DT_SOCK,
};
return mode_types[(mode & S_IFMT) >> S_SHIFT];
#undef S_SHIFT
}

6
utils/src/mode_types.h Normal file
View File

@@ -0,0 +1,6 @@
#ifndef _MODE_TYPES_H_
#define _MODE_TYPES_H_
unsigned int mode_to_type(mode_t mode);
#endif

46
utils/src/name_hash.h Normal file
View File

@@ -0,0 +1,46 @@
#ifndef _SCOUTFS_NAME_HASH_H_
#define _SCOUTFS_NAME_HASH_H_
#include "hash.h"
/*
* Test a bit number as though an array of bytes is a large len-bit
* big-endian value. nr 0 is the LSB of the final byte, nr (len - 1) is
* the MSB of the first byte.
*/
static int test_be_bytes_bit(int nr, const char *bytes, int len)
{
return bytes[(len - 1 - nr) >> 3] & (1 << (nr & 7));
}
/*
* Generate a 32bit "fingerprint" of the name by extracting 32 evenly
* distributed bits from the name. The intent is to have the sort order
* of the fingerprints reflect the memcmp() sort order of the names
* while mapping large names down to small fs keys.
*
* Names that are smaller than 32bits are biased towards the high bits
* of the fingerprint so that most significant bits of the fingerprints
* consistently reflect the initial characters of the names.
*/
static inline u32 dirent_name_fingerprint(const char *name, unsigned int name_len)
{
int name_bits = name_len * 8;
int skip = max(name_bits / 32, 1);
u32 fp = 0;
int f;
int n;
for (f = 31, n = name_bits - 1; f >= 0 && n >= 0; f--, n -= skip)
fp |= !!test_be_bytes_bit(n, name, name_bits) << f;
return fp;
}
static inline u64 dirent_name_hash(const char *name, unsigned int name_len)
{
return scoutfs_hash32(name, name_len) |
((u64)dirent_name_fingerprint(name, name_len) << 32);
}
#endif

View File

@@ -645,6 +645,8 @@ static int print_alloc_list_block(int fd, char *str, struct scoutfs_block_ref *r
u64 blkno;
u64 start;
u64 len;
u64 st;
u64 nr;
int wid;
int ret;
int i;
@@ -663,27 +665,37 @@ static int print_alloc_list_block(int fd, char *str, struct scoutfs_block_ref *r
AL_REF_A(&lblk->next), le32_to_cpu(lblk->start),
le32_to_cpu(lblk->nr));
if (lblk->nr) {
wid = printf(" exts: ");
start = 0;
len = 0;
for (i = 0; i < le32_to_cpu(lblk->nr); i++) {
if (len == 0)
start = le64_to_cpu(lblk->blknos[i]);
len++;
if (i == (le32_to_cpu(lblk->nr) - 1) ||
start + len != le64_to_cpu(lblk->blknos[i + 1])) {
if (wid >= 72)
wid = printf("\n ");
wid += printf("%llu,%llu ", start, len);
len = 0;
}
}
printf("\n");
st = le32_to_cpu(lblk->start);
nr = le32_to_cpu(lblk->nr);
if (st >= SCOUTFS_ALLOC_LIST_MAX_BLOCKS ||
nr > SCOUTFS_ALLOC_LIST_MAX_BLOCKS ||
(st + nr) > SCOUTFS_ALLOC_LIST_MAX_BLOCKS) {
printf(" (invalid start and nr fields)\n");
goto out;
}
if (lblk->nr == 0)
goto out;
wid = printf(" exts: ");
start = 0;
len = 0;
for (i = 0; i < nr; i++) {
if (len == 0)
start = le64_to_cpu(lblk->blknos[st + i]);
len++;
if (i == (nr - 1) || (start + len) != le64_to_cpu(lblk->blknos[st + i + 1])) {
if (wid >= 72)
wid = printf("\n ");
wid += printf("%llu,%llu ", start, len);
len = 0;
}
}
printf("\n");
out:
next = lblk->next;
free(lblk);
return print_alloc_list_block(fd, str, &next);

629
utils/src/rbtree.c Normal file
View File

@@ -0,0 +1,629 @@
// SPDX-License-Identifier: GPL-2.0-or-later
/*
Red Black Trees
(C) 1999 Andrea Arcangeli <andrea@suse.de>
(C) 2002 David Woodhouse <dwmw2@infradead.org>
(C) 2012 Michel Lespinasse <walken@google.com>
linux/lib/rbtree.c
*/
#include "lk_rbtree_wrapper.h"
/*
* red-black trees properties: https://en.wikipedia.org/wiki/Rbtree
*
* 1) A node is either red or black
* 2) The root is black
* 3) All leaves (NULL) are black
* 4) Both children of every red node are black
* 5) Every simple path from root to leaves contains the same number
* of black nodes.
*
* 4 and 5 give the O(log n) guarantee, since 4 implies you cannot have two
* consecutive red nodes in a path and every red node is therefore followed by
* a black. So if B is the number of black nodes on every simple path (as per
* 5), then the longest possible path due to 4 is 2B.
*
* We shall indicate color with case, where black nodes are uppercase and red
* nodes will be lowercase. Unknown color nodes shall be drawn as red within
* parentheses and have some accompanying text comment.
*/
/*
* Notes on lockless lookups:
*
* All stores to the tree structure (rb_left and rb_right) must be done using
* WRITE_ONCE(). And we must not inadvertently cause (temporary) loops in the
* tree structure as seen in program order.
*
* These two requirements will allow lockless iteration of the tree -- not
* correct iteration mind you, tree rotations are not atomic so a lookup might
* miss entire subtrees.
*
* But they do guarantee that any such traversal will only see valid elements
* and that it will indeed complete -- does not get stuck in a loop.
*
* It also guarantees that if the lookup returns an element it is the 'correct'
* one. But not returning an element does _NOT_ mean it's not present.
*
* NOTE:
*
* Stores to __rb_parent_color are not important for simple lookups so those
* are left undone as of now. Nor did I check for loops involving parent
* pointers.
*/
static inline void rb_set_black(struct rb_node *rb)
{
rb->__rb_parent_color |= RB_BLACK;
}
static inline struct rb_node *rb_red_parent(struct rb_node *red)
{
return (struct rb_node *)red->__rb_parent_color;
}
/*
* Helper function for rotations:
* - old's parent and color get assigned to new
* - old gets assigned new as a parent and 'color' as a color.
*/
static inline void
__rb_rotate_set_parents(struct rb_node *old, struct rb_node *new,
struct rb_root *root, int color)
{
struct rb_node *parent = rb_parent(old);
new->__rb_parent_color = old->__rb_parent_color;
rb_set_parent_color(old, new, color);
__rb_change_child(old, new, parent, root);
}
static __always_inline void
__rb_insert(struct rb_node *node, struct rb_root *root,
void (*augment_rotate)(struct rb_node *old, struct rb_node *new))
{
struct rb_node *parent = rb_red_parent(node), *gparent, *tmp;
while (true) {
/*
* Loop invariant: node is red.
*/
if (unlikely(!parent)) {
/*
* The inserted node is root. Either this is the
* first node, or we recursed at Case 1 below and
* are no longer violating 4).
*/
rb_set_parent_color(node, NULL, RB_BLACK);
break;
}
/*
* If there is a black parent, we are done.
* Otherwise, take some corrective action as,
* per 4), we don't want a red root or two
* consecutive red nodes.
*/
if(rb_is_black(parent))
break;
gparent = rb_red_parent(parent);
tmp = gparent->rb_right;
if (parent != tmp) { /* parent == gparent->rb_left */
if (tmp && rb_is_red(tmp)) {
/*
* Case 1 - node's uncle is red (color flips).
*
* G g
* / \ / \
* p u --> P U
* / /
* n n
*
* However, since g's parent might be red, and
* 4) does not allow this, we need to recurse
* at g.
*/
rb_set_parent_color(tmp, gparent, RB_BLACK);
rb_set_parent_color(parent, gparent, RB_BLACK);
node = gparent;
parent = rb_parent(node);
rb_set_parent_color(node, parent, RB_RED);
continue;
}
tmp = parent->rb_right;
if (node == tmp) {
/*
* Case 2 - node's uncle is black and node is
* the parent's right child (left rotate at parent).
*
* G G
* / \ / \
* p U --> n U
* \ /
* n p
*
* This still leaves us in violation of 4), the
* continuation into Case 3 will fix that.
*/
tmp = node->rb_left;
WRITE_ONCE(parent->rb_right, tmp);
WRITE_ONCE(node->rb_left, parent);
if (tmp)
rb_set_parent_color(tmp, parent,
RB_BLACK);
rb_set_parent_color(parent, node, RB_RED);
augment_rotate(parent, node);
parent = node;
tmp = node->rb_right;
}
/*
* Case 3 - node's uncle is black and node is
* the parent's left child (right rotate at gparent).
*
* G P
* / \ / \
* p U --> n g
* / \
* n U
*/
WRITE_ONCE(gparent->rb_left, tmp); /* == parent->rb_right */
WRITE_ONCE(parent->rb_right, gparent);
if (tmp)
rb_set_parent_color(tmp, gparent, RB_BLACK);
__rb_rotate_set_parents(gparent, parent, root, RB_RED);
augment_rotate(gparent, parent);
break;
} else {
tmp = gparent->rb_left;
if (tmp && rb_is_red(tmp)) {
/* Case 1 - color flips */
rb_set_parent_color(tmp, gparent, RB_BLACK);
rb_set_parent_color(parent, gparent, RB_BLACK);
node = gparent;
parent = rb_parent(node);
rb_set_parent_color(node, parent, RB_RED);
continue;
}
tmp = parent->rb_left;
if (node == tmp) {
/* Case 2 - right rotate at parent */
tmp = node->rb_right;
WRITE_ONCE(parent->rb_left, tmp);
WRITE_ONCE(node->rb_right, parent);
if (tmp)
rb_set_parent_color(tmp, parent,
RB_BLACK);
rb_set_parent_color(parent, node, RB_RED);
augment_rotate(parent, node);
parent = node;
tmp = node->rb_left;
}
/* Case 3 - left rotate at gparent */
WRITE_ONCE(gparent->rb_right, tmp); /* == parent->rb_left */
WRITE_ONCE(parent->rb_left, gparent);
if (tmp)
rb_set_parent_color(tmp, gparent, RB_BLACK);
__rb_rotate_set_parents(gparent, parent, root, RB_RED);
augment_rotate(gparent, parent);
break;
}
}
}
/*
* Inline version for rb_erase() use - we want to be able to inline
* and eliminate the dummy_rotate callback there
*/
static __always_inline void
____rb_erase_color(struct rb_node *parent, struct rb_root *root,
void (*augment_rotate)(struct rb_node *old, struct rb_node *new))
{
struct rb_node *node = NULL, *sibling, *tmp1, *tmp2;
while (true) {
/*
* Loop invariants:
* - node is black (or NULL on first iteration)
* - node is not the root (parent is not NULL)
* - All leaf paths going through parent and node have a
* black node count that is 1 lower than other leaf paths.
*/
sibling = parent->rb_right;
if (node != sibling) { /* node == parent->rb_left */
if (rb_is_red(sibling)) {
/*
* Case 1 - left rotate at parent
*
* P S
* / \ / \
* N s --> p Sr
* / \ / \
* Sl Sr N Sl
*/
tmp1 = sibling->rb_left;
WRITE_ONCE(parent->rb_right, tmp1);
WRITE_ONCE(sibling->rb_left, parent);
rb_set_parent_color(tmp1, parent, RB_BLACK);
__rb_rotate_set_parents(parent, sibling, root,
RB_RED);
augment_rotate(parent, sibling);
sibling = tmp1;
}
tmp1 = sibling->rb_right;
if (!tmp1 || rb_is_black(tmp1)) {
tmp2 = sibling->rb_left;
if (!tmp2 || rb_is_black(tmp2)) {
/*
* Case 2 - sibling color flip
* (p could be either color here)
*
* (p) (p)
* / \ / \
* N S --> N s
* / \ / \
* Sl Sr Sl Sr
*
* This leaves us violating 5) which
* can be fixed by flipping p to black
* if it was red, or by recursing at p.
* p is red when coming from Case 1.
*/
rb_set_parent_color(sibling, parent,
RB_RED);
if (rb_is_red(parent))
rb_set_black(parent);
else {
node = parent;
parent = rb_parent(node);
if (parent)
continue;
}
break;
}
/*
* Case 3 - right rotate at sibling
* (p could be either color here)
*
* (p) (p)
* / \ / \
* N S --> N sl
* / \ \
* sl Sr S
* \
* Sr
*
* Note: p might be red, and then both
* p and sl are red after rotation(which
* breaks property 4). This is fixed in
* Case 4 (in __rb_rotate_set_parents()
* which set sl the color of p
* and set p RB_BLACK)
*
* (p) (sl)
* / \ / \
* N sl --> P S
* \ / \
* S N Sr
* \
* Sr
*/
tmp1 = tmp2->rb_right;
WRITE_ONCE(sibling->rb_left, tmp1);
WRITE_ONCE(tmp2->rb_right, sibling);
WRITE_ONCE(parent->rb_right, tmp2);
if (tmp1)
rb_set_parent_color(tmp1, sibling,
RB_BLACK);
augment_rotate(sibling, tmp2);
tmp1 = sibling;
sibling = tmp2;
}
/*
* Case 4 - left rotate at parent + color flips
* (p and sl could be either color here.
* After rotation, p becomes black, s acquires
* p's color, and sl keeps its color)
*
* (p) (s)
* / \ / \
* N S --> P Sr
* / \ / \
* (sl) sr N (sl)
*/
tmp2 = sibling->rb_left;
WRITE_ONCE(parent->rb_right, tmp2);
WRITE_ONCE(sibling->rb_left, parent);
rb_set_parent_color(tmp1, sibling, RB_BLACK);
if (tmp2)
rb_set_parent(tmp2, parent);
__rb_rotate_set_parents(parent, sibling, root,
RB_BLACK);
augment_rotate(parent, sibling);
break;
} else {
sibling = parent->rb_left;
if (rb_is_red(sibling)) {
/* Case 1 - right rotate at parent */
tmp1 = sibling->rb_right;
WRITE_ONCE(parent->rb_left, tmp1);
WRITE_ONCE(sibling->rb_right, parent);
rb_set_parent_color(tmp1, parent, RB_BLACK);
__rb_rotate_set_parents(parent, sibling, root,
RB_RED);
augment_rotate(parent, sibling);
sibling = tmp1;
}
tmp1 = sibling->rb_left;
if (!tmp1 || rb_is_black(tmp1)) {
tmp2 = sibling->rb_right;
if (!tmp2 || rb_is_black(tmp2)) {
/* Case 2 - sibling color flip */
rb_set_parent_color(sibling, parent,
RB_RED);
if (rb_is_red(parent))
rb_set_black(parent);
else {
node = parent;
parent = rb_parent(node);
if (parent)
continue;
}
break;
}
/* Case 3 - left rotate at sibling */
tmp1 = tmp2->rb_left;
WRITE_ONCE(sibling->rb_right, tmp1);
WRITE_ONCE(tmp2->rb_left, sibling);
WRITE_ONCE(parent->rb_left, tmp2);
if (tmp1)
rb_set_parent_color(tmp1, sibling,
RB_BLACK);
augment_rotate(sibling, tmp2);
tmp1 = sibling;
sibling = tmp2;
}
/* Case 4 - right rotate at parent + color flips */
tmp2 = sibling->rb_right;
WRITE_ONCE(parent->rb_left, tmp2);
WRITE_ONCE(sibling->rb_right, parent);
rb_set_parent_color(tmp1, sibling, RB_BLACK);
if (tmp2)
rb_set_parent(tmp2, parent);
__rb_rotate_set_parents(parent, sibling, root,
RB_BLACK);
augment_rotate(parent, sibling);
break;
}
}
}
/* Non-inline version for rb_erase_augmented() use */
void __rb_erase_color(struct rb_node *parent, struct rb_root *root,
void (*augment_rotate)(struct rb_node *old, struct rb_node *new))
{
____rb_erase_color(parent, root, augment_rotate);
}
EXPORT_SYMBOL(__rb_erase_color);
/*
* Non-augmented rbtree manipulation functions.
*
* We use dummy augmented callbacks here, and have the compiler optimize them
* out of the rb_insert_color() and rb_erase() function definitions.
*/
static inline void dummy_propagate(struct rb_node *node, struct rb_node *stop) {}
static inline void dummy_copy(struct rb_node *old, struct rb_node *new) {}
static inline void dummy_rotate(struct rb_node *old, struct rb_node *new) {}
static const struct rb_augment_callbacks dummy_callbacks = {
.propagate = dummy_propagate,
.copy = dummy_copy,
.rotate = dummy_rotate
};
void rb_insert_color(struct rb_node *node, struct rb_root *root)
{
__rb_insert(node, root, dummy_rotate);
}
EXPORT_SYMBOL(rb_insert_color);
void rb_erase(struct rb_node *node, struct rb_root *root)
{
struct rb_node *rebalance;
rebalance = __rb_erase_augmented(node, root, &dummy_callbacks);
if (rebalance)
____rb_erase_color(rebalance, root, dummy_rotate);
}
EXPORT_SYMBOL(rb_erase);
/*
* Augmented rbtree manipulation functions.
*
* This instantiates the same __always_inline functions as in the non-augmented
* case, but this time with user-defined callbacks.
*/
void __rb_insert_augmented(struct rb_node *node, struct rb_root *root,
void (*augment_rotate)(struct rb_node *old, struct rb_node *new))
{
__rb_insert(node, root, augment_rotate);
}
EXPORT_SYMBOL(__rb_insert_augmented);
/*
* This function returns the first node (in sort order) of the tree.
*/
struct rb_node *rb_first(const struct rb_root *root)
{
struct rb_node *n;
n = root->rb_node;
if (!n)
return NULL;
while (n->rb_left)
n = n->rb_left;
return n;
}
EXPORT_SYMBOL(rb_first);
struct rb_node *rb_last(const struct rb_root *root)
{
struct rb_node *n;
n = root->rb_node;
if (!n)
return NULL;
while (n->rb_right)
n = n->rb_right;
return n;
}
EXPORT_SYMBOL(rb_last);
struct rb_node *rb_next(const struct rb_node *node)
{
struct rb_node *parent;
if (RB_EMPTY_NODE(node))
return NULL;
/*
* If we have a right-hand child, go down and then left as far
* as we can.
*/
if (node->rb_right) {
node = node->rb_right;
while (node->rb_left)
node = node->rb_left;
return (struct rb_node *)node;
}
/*
* No right-hand children. Everything down and left is smaller than us,
* so any 'next' node must be in the general direction of our parent.
* Go up the tree; any time the ancestor is a right-hand child of its
* parent, keep going up. First time it's a left-hand child of its
* parent, said parent is our 'next' node.
*/
while ((parent = rb_parent(node)) && node == parent->rb_right)
node = parent;
return parent;
}
EXPORT_SYMBOL(rb_next);
struct rb_node *rb_prev(const struct rb_node *node)
{
struct rb_node *parent;
if (RB_EMPTY_NODE(node))
return NULL;
/*
* If we have a left-hand child, go down and then right as far
* as we can.
*/
if (node->rb_left) {
node = node->rb_left;
while (node->rb_right)
node = node->rb_right;
return (struct rb_node *)node;
}
/*
* No left-hand children. Go up till we find an ancestor which
* is a right-hand child of its parent.
*/
while ((parent = rb_parent(node)) && node == parent->rb_left)
node = parent;
return parent;
}
EXPORT_SYMBOL(rb_prev);
void rb_replace_node(struct rb_node *victim, struct rb_node *new,
struct rb_root *root)
{
struct rb_node *parent = rb_parent(victim);
/* Copy the pointers/colour from the victim to the replacement */
*new = *victim;
/* Set the surrounding nodes to point to the replacement */
if (victim->rb_left)
rb_set_parent(victim->rb_left, new);
if (victim->rb_right)
rb_set_parent(victim->rb_right, new);
__rb_change_child(victim, new, parent, root);
}
EXPORT_SYMBOL(rb_replace_node);
void rb_replace_node_rcu(struct rb_node *victim, struct rb_node *new,
struct rb_root *root)
{
struct rb_node *parent = rb_parent(victim);
/* Copy the pointers/colour from the victim to the replacement */
*new = *victim;
/* Set the surrounding nodes to point to the replacement */
if (victim->rb_left)
rb_set_parent(victim->rb_left, new);
if (victim->rb_right)
rb_set_parent(victim->rb_right, new);
/* Set the parent's pointer to the new node last after an RCU barrier
* so that the pointers onwards are seen to be set correctly when doing
* an RCU walk over the tree.
*/
__rb_change_child_rcu(victim, new, parent, root);
}
EXPORT_SYMBOL(rb_replace_node_rcu);
static struct rb_node *rb_left_deepest_node(const struct rb_node *node)
{
for (;;) {
if (node->rb_left)
node = node->rb_left;
else if (node->rb_right)
node = node->rb_right;
else
return (struct rb_node *)node;
}
}
struct rb_node *rb_next_postorder(const struct rb_node *node)
{
const struct rb_node *parent;
if (!node)
return NULL;
parent = rb_parent(node);
/* If we're sitting on node, we've already seen our children */
if (parent && node == parent->rb_left && parent->rb_right) {
/* If we are the parent's left node, go to the parent's right
* node then all the way down to the left */
return rb_left_deepest_node(parent->rb_right);
} else
/* Otherwise we are the parent's right node, and the parent
* should be next */
return (struct rb_node *)parent;
}
EXPORT_SYMBOL(rb_next_postorder);
struct rb_node *rb_first_postorder(const struct rb_root *root)
{
if (!root->rb_node)
return NULL;
return rb_left_deepest_node(root->rb_node);
}
EXPORT_SYMBOL(rb_first_postorder);

328
utils/src/rbtree.h Normal file
View File

@@ -0,0 +1,328 @@
/* SPDX-License-Identifier: GPL-2.0-or-later */
/*
Red Black Trees
(C) 1999 Andrea Arcangeli <andrea@suse.de>
linux/include/linux/rbtree.h
To use rbtrees you'll have to implement your own insert and search cores.
This will avoid us to use callbacks and to drop drammatically performances.
I know it's not the cleaner way, but in C (not in C++) to get
performances and genericity...
See Documentation/core-api/rbtree.rst for documentation and samples.
*/
#ifndef _LINUX_RBTREE_H
#define _LINUX_RBTREE_H
#define rb_parent(r) ((struct rb_node *)((r)->__rb_parent_color & ~3))
#define rb_entry(ptr, type, member) container_of(ptr, type, member)
#define RB_EMPTY_ROOT(root) (READ_ONCE((root)->rb_node) == NULL)
/* 'empty' nodes are nodes that are known not to be inserted in an rbtree */
#define RB_EMPTY_NODE(node) \
((node)->__rb_parent_color == (unsigned long)(node))
#define RB_CLEAR_NODE(node) \
((node)->__rb_parent_color = (unsigned long)(node))
extern void rb_insert_color(struct rb_node *, struct rb_root *);
extern void rb_erase(struct rb_node *, struct rb_root *);
/* Find logical next and previous nodes in a tree */
extern struct rb_node *rb_next(const struct rb_node *);
extern struct rb_node *rb_prev(const struct rb_node *);
extern struct rb_node *rb_first(const struct rb_root *);
extern struct rb_node *rb_last(const struct rb_root *);
/* Postorder iteration - always visit the parent after its children */
extern struct rb_node *rb_first_postorder(const struct rb_root *);
extern struct rb_node *rb_next_postorder(const struct rb_node *);
/* Fast replacement of a single node without remove/rebalance/add/rebalance */
extern void rb_replace_node(struct rb_node *victim, struct rb_node *new,
struct rb_root *root);
extern void rb_replace_node_rcu(struct rb_node *victim, struct rb_node *new,
struct rb_root *root);
static inline void rb_link_node(struct rb_node *node, struct rb_node *parent,
struct rb_node **rb_link)
{
node->__rb_parent_color = (unsigned long)parent;
node->rb_left = node->rb_right = NULL;
*rb_link = node;
}
static inline void rb_link_node_rcu(struct rb_node *node, struct rb_node *parent,
struct rb_node **rb_link)
{
node->__rb_parent_color = (unsigned long)parent;
node->rb_left = node->rb_right = NULL;
rcu_assign_pointer(*rb_link, node);
}
#define rb_entry_safe(ptr, type, member) \
({ typeof(ptr) ____ptr = (ptr); \
____ptr ? rb_entry(____ptr, type, member) : NULL; \
})
/**
* rbtree_postorder_for_each_entry_safe - iterate in post-order over rb_root of
* given type allowing the backing memory of @pos to be invalidated
*
* @pos: the 'type *' to use as a loop cursor.
* @n: another 'type *' to use as temporary storage
* @root: 'rb_root *' of the rbtree.
* @field: the name of the rb_node field within 'type'.
*
* rbtree_postorder_for_each_entry_safe() provides a similar guarantee as
* list_for_each_entry_safe() and allows the iteration to continue independent
* of changes to @pos by the body of the loop.
*
* Note, however, that it cannot handle other modifications that re-order the
* rbtree it is iterating over. This includes calling rb_erase() on @pos, as
* rb_erase() may rebalance the tree, causing us to miss some nodes.
*/
#define rbtree_postorder_for_each_entry_safe(pos, n, root, field) \
for (pos = rb_entry_safe(rb_first_postorder(root), typeof(*pos), field); \
pos && ({ n = rb_entry_safe(rb_next_postorder(&pos->field), \
typeof(*pos), field); 1; }); \
pos = n)
/* Same as rb_first(), but O(1) */
#define rb_first_cached(root) (root)->rb_leftmost
static inline void rb_insert_color_cached(struct rb_node *node,
struct rb_root_cached *root,
bool leftmost)
{
if (leftmost)
root->rb_leftmost = node;
rb_insert_color(node, &root->rb_root);
}
static inline struct rb_node *
rb_erase_cached(struct rb_node *node, struct rb_root_cached *root)
{
struct rb_node *leftmost = NULL;
if (root->rb_leftmost == node)
leftmost = root->rb_leftmost = rb_next(node);
rb_erase(node, &root->rb_root);
return leftmost;
}
static inline void rb_replace_node_cached(struct rb_node *victim,
struct rb_node *new,
struct rb_root_cached *root)
{
if (root->rb_leftmost == victim)
root->rb_leftmost = new;
rb_replace_node(victim, new, &root->rb_root);
}
/*
* The below helper functions use 2 operators with 3 different
* calling conventions. The operators are related like:
*
* comp(a->key,b) < 0 := less(a,b)
* comp(a->key,b) > 0 := less(b,a)
* comp(a->key,b) == 0 := !less(a,b) && !less(b,a)
*
* If these operators define a partial order on the elements we make no
* guarantee on which of the elements matching the key is found. See
* rb_find().
*
* The reason for this is to allow the find() interface without requiring an
* on-stack dummy object, which might not be feasible due to object size.
*/
/**
* rb_add_cached() - insert @node into the leftmost cached tree @tree
* @node: node to insert
* @tree: leftmost cached tree to insert @node into
* @less: operator defining the (partial) node order
*
* Returns @node when it is the new leftmost, or NULL.
*/
static __always_inline struct rb_node *
rb_add_cached(struct rb_node *node, struct rb_root_cached *tree,
bool (*less)(struct rb_node *, const struct rb_node *))
{
struct rb_node **link = &tree->rb_root.rb_node;
struct rb_node *parent = NULL;
bool leftmost = true;
while (*link) {
parent = *link;
if (less(node, parent)) {
link = &parent->rb_left;
} else {
link = &parent->rb_right;
leftmost = false;
}
}
rb_link_node(node, parent, link);
rb_insert_color_cached(node, tree, leftmost);
return leftmost ? node : NULL;
}
/**
* rb_add() - insert @node into @tree
* @node: node to insert
* @tree: tree to insert @node into
* @less: operator defining the (partial) node order
*/
static __always_inline void
rb_add(struct rb_node *node, struct rb_root *tree,
bool (*less)(struct rb_node *, const struct rb_node *))
{
struct rb_node **link = &tree->rb_node;
struct rb_node *parent = NULL;
while (*link) {
parent = *link;
if (less(node, parent))
link = &parent->rb_left;
else
link = &parent->rb_right;
}
rb_link_node(node, parent, link);
rb_insert_color(node, tree);
}
/**
* rb_find_add() - find equivalent @node in @tree, or add @node
* @node: node to look-for / insert
* @tree: tree to search / modify
* @cmp: operator defining the node order
*
* Returns the rb_node matching @node, or NULL when no match is found and @node
* is inserted.
*/
static __always_inline struct rb_node *
rb_find_add(struct rb_node *node, struct rb_root *tree,
int (*cmp)(struct rb_node *, const struct rb_node *))
{
struct rb_node **link = &tree->rb_node;
struct rb_node *parent = NULL;
int c;
while (*link) {
parent = *link;
c = cmp(node, parent);
if (c < 0)
link = &parent->rb_left;
else if (c > 0)
link = &parent->rb_right;
else
return parent;
}
rb_link_node(node, parent, link);
rb_insert_color(node, tree);
return NULL;
}
/**
* rb_find() - find @key in tree @tree
* @key: key to match
* @tree: tree to search
* @cmp: operator defining the node order
*
* Returns the rb_node matching @key or NULL.
*/
static __always_inline struct rb_node *
rb_find(const void *key, const struct rb_root *tree,
int (*cmp)(const void *key, const struct rb_node *))
{
struct rb_node *node = tree->rb_node;
while (node) {
int c = cmp(key, node);
if (c < 0)
node = node->rb_left;
else if (c > 0)
node = node->rb_right;
else
return node;
}
return NULL;
}
/**
* rb_find_first() - find the first @key in @tree
* @key: key to match
* @tree: tree to search
* @cmp: operator defining node order
*
* Returns the leftmost node matching @key, or NULL.
*/
static __always_inline struct rb_node *
rb_find_first(const void *key, const struct rb_root *tree,
int (*cmp)(const void *key, const struct rb_node *))
{
struct rb_node *node = tree->rb_node;
struct rb_node *match = NULL;
while (node) {
int c = cmp(key, node);
if (c <= 0) {
if (!c)
match = node;
node = node->rb_left;
} else if (c > 0) {
node = node->rb_right;
}
}
return match;
}
/**
* rb_next_match() - find the next @key in @tree
* @key: key to match
* @tree: tree to search
* @cmp: operator defining node order
*
* Returns the next node matching @key, or NULL.
*/
static __always_inline struct rb_node *
rb_next_match(const void *key, struct rb_node *node,
int (*cmp)(const void *key, const struct rb_node *))
{
node = rb_next(node);
if (node && cmp(key, node))
node = NULL;
return node;
}
/**
* rb_for_each() - iterates a subtree matching @key
* @node: iterator
* @key: key to match
* @tree: tree to search
* @cmp: operator defining node order
*/
#define rb_for_each(node, key, tree, cmp) \
for ((node) = rb_find_first((key), (tree), (cmp)); \
(node); (node) = rb_next_match((key), (node), (cmp)))
#endif /* _LINUX_RBTREE_H */

View File

@@ -0,0 +1,313 @@
/* SPDX-License-Identifier: GPL-2.0-or-later */
/*
Red Black Trees
(C) 1999 Andrea Arcangeli <andrea@suse.de>
(C) 2002 David Woodhouse <dwmw2@infradead.org>
(C) 2012 Michel Lespinasse <walken@google.com>
linux/include/linux/rbtree_augmented.h
*/
#ifndef _LINUX_RBTREE_AUGMENTED_H
#define _LINUX_RBTREE_AUGMENTED_H
/*
* Please note - only struct rb_augment_callbacks and the prototypes for
* rb_insert_augmented() and rb_erase_augmented() are intended to be public.
* The rest are implementation details you are not expected to depend on.
*
* See Documentation/core-api/rbtree.rst for documentation and samples.
*/
struct rb_augment_callbacks {
void (*propagate)(struct rb_node *node, struct rb_node *stop);
void (*copy)(struct rb_node *old, struct rb_node *new);
void (*rotate)(struct rb_node *old, struct rb_node *new);
};
extern void __rb_insert_augmented(struct rb_node *node, struct rb_root *root,
void (*augment_rotate)(struct rb_node *old, struct rb_node *new));
/*
* Fixup the rbtree and update the augmented information when rebalancing.
*
* On insertion, the user must update the augmented information on the path
* leading to the inserted node, then call rb_link_node() as usual and
* rb_insert_augmented() instead of the usual rb_insert_color() call.
* If rb_insert_augmented() rebalances the rbtree, it will callback into
* a user provided function to update the augmented information on the
* affected subtrees.
*/
static inline void
rb_insert_augmented(struct rb_node *node, struct rb_root *root,
const struct rb_augment_callbacks *augment)
{
__rb_insert_augmented(node, root, augment->rotate);
}
static inline void
rb_insert_augmented_cached(struct rb_node *node,
struct rb_root_cached *root, bool newleft,
const struct rb_augment_callbacks *augment)
{
if (newleft)
root->rb_leftmost = node;
rb_insert_augmented(node, &root->rb_root, augment);
}
/*
* Template for declaring augmented rbtree callbacks (generic case)
*
* RBSTATIC: 'static' or empty
* RBNAME: name of the rb_augment_callbacks structure
* RBSTRUCT: struct type of the tree nodes
* RBFIELD: name of struct rb_node field within RBSTRUCT
* RBAUGMENTED: name of field within RBSTRUCT holding data for subtree
* RBCOMPUTE: name of function that recomputes the RBAUGMENTED data
*/
#define RB_DECLARE_CALLBACKS(RBSTATIC, RBNAME, \
RBSTRUCT, RBFIELD, RBAUGMENTED, RBCOMPUTE) \
static inline void \
RBNAME ## _propagate(struct rb_node *rb, struct rb_node *stop) \
{ \
while (rb != stop) { \
RBSTRUCT *node = rb_entry(rb, RBSTRUCT, RBFIELD); \
if (RBCOMPUTE(node, true)) \
break; \
rb = rb_parent(&node->RBFIELD); \
} \
} \
static inline void \
RBNAME ## _copy(struct rb_node *rb_old, struct rb_node *rb_new) \
{ \
RBSTRUCT *old = rb_entry(rb_old, RBSTRUCT, RBFIELD); \
RBSTRUCT *new = rb_entry(rb_new, RBSTRUCT, RBFIELD); \
new->RBAUGMENTED = old->RBAUGMENTED; \
} \
static void \
RBNAME ## _rotate(struct rb_node *rb_old, struct rb_node *rb_new) \
{ \
RBSTRUCT *old = rb_entry(rb_old, RBSTRUCT, RBFIELD); \
RBSTRUCT *new = rb_entry(rb_new, RBSTRUCT, RBFIELD); \
new->RBAUGMENTED = old->RBAUGMENTED; \
RBCOMPUTE(old, false); \
} \
RBSTATIC const struct rb_augment_callbacks RBNAME = { \
.propagate = RBNAME ## _propagate, \
.copy = RBNAME ## _copy, \
.rotate = RBNAME ## _rotate \
};
/*
* Template for declaring augmented rbtree callbacks,
* computing RBAUGMENTED scalar as max(RBCOMPUTE(node)) for all subtree nodes.
*
* RBSTATIC: 'static' or empty
* RBNAME: name of the rb_augment_callbacks structure
* RBSTRUCT: struct type of the tree nodes
* RBFIELD: name of struct rb_node field within RBSTRUCT
* RBTYPE: type of the RBAUGMENTED field
* RBAUGMENTED: name of RBTYPE field within RBSTRUCT holding data for subtree
* RBCOMPUTE: name of function that returns the per-node RBTYPE scalar
*/
#define RB_DECLARE_CALLBACKS_MAX(RBSTATIC, RBNAME, RBSTRUCT, RBFIELD, \
RBTYPE, RBAUGMENTED, RBCOMPUTE) \
static inline bool RBNAME ## _compute_max(RBSTRUCT *node, bool exit) \
{ \
RBSTRUCT *child; \
RBTYPE max = RBCOMPUTE(node); \
if (node->RBFIELD.rb_left) { \
child = rb_entry(node->RBFIELD.rb_left, RBSTRUCT, RBFIELD); \
if (child->RBAUGMENTED > max) \
max = child->RBAUGMENTED; \
} \
if (node->RBFIELD.rb_right) { \
child = rb_entry(node->RBFIELD.rb_right, RBSTRUCT, RBFIELD); \
if (child->RBAUGMENTED > max) \
max = child->RBAUGMENTED; \
} \
if (exit && node->RBAUGMENTED == max) \
return true; \
node->RBAUGMENTED = max; \
return false; \
} \
RB_DECLARE_CALLBACKS(RBSTATIC, RBNAME, \
RBSTRUCT, RBFIELD, RBAUGMENTED, RBNAME ## _compute_max)
#define RB_RED 0
#define RB_BLACK 1
#define __rb_parent(pc) ((struct rb_node *)(pc & ~3))
#define __rb_color(pc) ((pc) & 1)
#define __rb_is_black(pc) __rb_color(pc)
#define __rb_is_red(pc) (!__rb_color(pc))
#define rb_color(rb) __rb_color((rb)->__rb_parent_color)
#define rb_is_red(rb) __rb_is_red((rb)->__rb_parent_color)
#define rb_is_black(rb) __rb_is_black((rb)->__rb_parent_color)
static inline void rb_set_parent(struct rb_node *rb, struct rb_node *p)
{
rb->__rb_parent_color = rb_color(rb) | (unsigned long)p;
}
static inline void rb_set_parent_color(struct rb_node *rb,
struct rb_node *p, int color)
{
rb->__rb_parent_color = (unsigned long)p | color;
}
static inline void
__rb_change_child(struct rb_node *old, struct rb_node *new,
struct rb_node *parent, struct rb_root *root)
{
if (parent) {
if (parent->rb_left == old)
WRITE_ONCE(parent->rb_left, new);
else
WRITE_ONCE(parent->rb_right, new);
} else
WRITE_ONCE(root->rb_node, new);
}
static inline void
__rb_change_child_rcu(struct rb_node *old, struct rb_node *new,
struct rb_node *parent, struct rb_root *root)
{
if (parent) {
if (parent->rb_left == old)
rcu_assign_pointer(parent->rb_left, new);
else
rcu_assign_pointer(parent->rb_right, new);
} else
rcu_assign_pointer(root->rb_node, new);
}
extern void __rb_erase_color(struct rb_node *parent, struct rb_root *root,
void (*augment_rotate)(struct rb_node *old, struct rb_node *new));
static __always_inline struct rb_node *
__rb_erase_augmented(struct rb_node *node, struct rb_root *root,
const struct rb_augment_callbacks *augment)
{
struct rb_node *child = node->rb_right;
struct rb_node *tmp = node->rb_left;
struct rb_node *parent, *rebalance;
unsigned long pc;
if (!tmp) {
/*
* Case 1: node to erase has no more than 1 child (easy!)
*
* Note that if there is one child it must be red due to 5)
* and node must be black due to 4). We adjust colors locally
* so as to bypass __rb_erase_color() later on.
*/
pc = node->__rb_parent_color;
parent = __rb_parent(pc);
__rb_change_child(node, child, parent, root);
if (child) {
child->__rb_parent_color = pc;
rebalance = NULL;
} else
rebalance = __rb_is_black(pc) ? parent : NULL;
tmp = parent;
} else if (!child) {
/* Still case 1, but this time the child is node->rb_left */
tmp->__rb_parent_color = pc = node->__rb_parent_color;
parent = __rb_parent(pc);
__rb_change_child(node, tmp, parent, root);
rebalance = NULL;
tmp = parent;
} else {
struct rb_node *successor = child, *child2;
tmp = child->rb_left;
if (!tmp) {
/*
* Case 2: node's successor is its right child
*
* (n) (s)
* / \ / \
* (x) (s) -> (x) (c)
* \
* (c)
*/
parent = successor;
child2 = successor->rb_right;
augment->copy(node, successor);
} else {
/*
* Case 3: node's successor is leftmost under
* node's right child subtree
*
* (n) (s)
* / \ / \
* (x) (y) -> (x) (y)
* / /
* (p) (p)
* / /
* (s) (c)
* \
* (c)
*/
do {
parent = successor;
successor = tmp;
tmp = tmp->rb_left;
} while (tmp);
child2 = successor->rb_right;
WRITE_ONCE(parent->rb_left, child2);
WRITE_ONCE(successor->rb_right, child);
rb_set_parent(child, successor);
augment->copy(node, successor);
augment->propagate(parent, successor);
}
tmp = node->rb_left;
WRITE_ONCE(successor->rb_left, tmp);
rb_set_parent(tmp, successor);
pc = node->__rb_parent_color;
tmp = __rb_parent(pc);
__rb_change_child(node, successor, tmp, root);
if (child2) {
rb_set_parent_color(child2, parent, RB_BLACK);
rebalance = NULL;
} else {
rebalance = rb_is_black(successor) ? parent : NULL;
}
successor->__rb_parent_color = pc;
tmp = successor;
}
augment->propagate(tmp, NULL);
return rebalance;
}
static __always_inline void
rb_erase_augmented(struct rb_node *node, struct rb_root *root,
const struct rb_augment_callbacks *augment)
{
struct rb_node *rebalance = __rb_erase_augmented(node, root, augment);
if (rebalance)
__rb_erase_color(rebalance, root, augment->rotate);
}
static __always_inline void
rb_erase_augmented_cached(struct rb_node *node, struct rb_root_cached *root,
const struct rb_augment_callbacks *augment)
{
if (root->rb_leftmost == node)
root->rb_leftmost = rb_next(node);
rb_erase_augmented(node, &root->rb_root, augment);
}
#endif /* _LINUX_RBTREE_AUGMENTED_H */

34
utils/src/rbtree_types.h Normal file
View File

@@ -0,0 +1,34 @@
/* SPDX-License-Identifier: GPL-2.0-or-later */
#ifndef _LINUX_RBTREE_TYPES_H
#define _LINUX_RBTREE_TYPES_H
struct rb_node {
unsigned long __rb_parent_color;
struct rb_node *rb_right;
struct rb_node *rb_left;
} __attribute__((aligned(sizeof(long))));
/* The alignment might seem pointless, but allegedly CRIS needs it */
struct rb_root {
struct rb_node *rb_node;
};
/*
* Leftmost-cached rbtrees.
*
* We do not cache the rightmost node based on footprint
* size vs number of potential users that could benefit
* from O(1) rb_last(). Just not worth it, users that want
* this feature can always implement the logic explicitly.
* Furthermore, users that want to cache both pointers may
* find it a bit asymmetric, but that's ok.
*/
struct rb_root_cached {
struct rb_root rb_root;
struct rb_node *rb_leftmost;
};
#define RB_ROOT (struct rb_root) { NULL, }
#define RB_ROOT_CACHED (struct rb_root_cached) { {NULL, }, NULL }
#endif

View File

@@ -44,3 +44,37 @@ int srch_decode_entry(void *buf, struct scoutfs_srch_entry *sre,
return tot;
}
static int encode_u64(__le64 *buf, u64 val)
{
int bytes;
val = (val << 1) ^ ((s64)val >> 63); /* shift sign extend */
bytes = (fls64(val) + 7) >> 3;
put_unaligned_le64(val, buf);
return bytes;
}
int srch_encode_entry(void *buf, struct scoutfs_srch_entry *sre, struct scoutfs_srch_entry *prev)
{
u64 diffs[] = {
le64_to_cpu(sre->hash) - le64_to_cpu(prev->hash),
le64_to_cpu(sre->ino) - le64_to_cpu(prev->ino),
le64_to_cpu(sre->id) - le64_to_cpu(prev->id),
};
u16 lengths = 0;
int bytes;
int tot = 2;
int i;
for (i = 0; i < array_size(diffs); i++) {
bytes = encode_u64(buf + tot, diffs[i]);
lengths |= bytes << (i << 2);
tot += bytes;
}
put_unaligned_le16(lengths, buf);
return tot;
}

View File

@@ -3,5 +3,6 @@
int srch_decode_entry(void *buf, struct scoutfs_srch_entry *sre,
struct scoutfs_srch_entry *prev);
int srch_encode_entry(void *buf, struct scoutfs_srch_entry *sre, struct scoutfs_srch_entry *prev);
#endif

View File

@@ -7,7 +7,6 @@
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <wordexp.h>
#include "util.h"
#include "format.h"
@@ -18,26 +17,15 @@
static int open_path(char *path, int flags)
{
wordexp_t exp_result;
int ret;
ret = wordexp(path, &exp_result, WRDE_NOCMD | WRDE_SHOWERR | WRDE_UNDEF);
if (ret) {
fprintf(stderr, "wordexp() failure for \"%s\": %d\n", path, ret);
ret = -EINVAL;
goto out;
}
ret = open(exp_result.we_wordv[0], flags);
ret = open(path, flags);
if (ret < 0) {
ret = -errno;
fprintf(stderr, "failed to open '%s': %s (%d)\n",
path, strerror(errno), errno);
}
out:
wordfree(&exp_result);
return ret;
}

View File

@@ -70,6 +70,8 @@ do { \
#define container_of(ptr, type, memb) \
((type *)((void *)(ptr) - offsetof(type, memb)))
#define NSEC_PER_SEC 1000000000
#define BITS_PER_LONG (sizeof(long) * 8)
#define U8_MAX ((u8)~0ULL)
#define U16_MAX ((u16)~0ULL)
@@ -82,6 +84,7 @@ do { \
\
(_x == 0 ? 0 : 64 - __builtin_clzll(_x)); \
})
#define fls64(x) flsll(x)
#define ilog2(x) \
({ \
@@ -99,6 +102,16 @@ emit_get_unaligned_le(16)
emit_get_unaligned_le(32)
emit_get_unaligned_le(64)
#define emit_put_unaligned_le(nr) \
static inline void put_unaligned_le##nr(u##nr val, void *buf) \
{ \
__le##nr x = cpu_to_le##nr(val); \
memcpy(buf, &x, sizeof(x)); \
}
emit_put_unaligned_le(16)
emit_put_unaligned_le(32)
emit_put_unaligned_le(64)
/*
* return -1,0,+1 based on the memcmp comparison of the minimum of their
* two lengths. If their min shared bytes are equal but the lengths