Introduction
scoutfs is a clustered in-kernel Linux filesystem designed and built from the ground up to support large archival systems.
Its key differentiating features are:
- Integrated consistent indexing accelerates archival maintenance operations
- Commit logs allow nodes to write concurrently without contention
It meets best-of-breed expectations:
- Fully consistent POSIX semantics between nodes
- Rich metadata to ensure the integrity of metadata references
- Atomic transactions to maintain consistent persistent structures
- First class kernel implementation for high performance and low latency
- Open GPLv2 implementation
Learn more in the white paper.
Current Status
Alpha Open Source Development
scoutfs is under heavy active development. We're developing it in the open to give the community an opportunity to affect the design and implementation.
The core architectural design elements are in place. Much surrounding functionality hasn't been implemented. It's appropriate for early adopters and interested developers, not for production use.
In that vein, expect significant incompatible changes to both the format of network messages and persistent structures. Format hash-checking has been removed in preparation for release, so if there is any doubt, running mkfs again is strongly recommended.
The current kernel module is developed against the RHEL/CentOS 7.x kernel to minimize the friction of developing and testing with partners' existing infrastructure. Once we're happy with the design we'll shift development to the upstream kernel while maintaining distro compatibility branches.
Community Mailing List
Please join us on the open scoutfs-devel@scoutfs.org mailing list hosted on Google Groups for all discussion of scoutfs.
Quick Start
The following is a very rough example of the procedure for getting up and running; experience will be needed to fill in the gaps. We're happy to help on the mailing list.
The requirements for running scoutfs on a small cluster are:
- One or more nodes running x86-64 CentOS/RHEL 7.4 (or 7.3)
- Access to two shared block devices
- IPv4 connectivity between the nodes
The steps for getting scoutfs mounted and operational are:
- Get the kernel module running on the nodes
- Make a new filesystem on the devices with the userspace utilities
- Mount the devices on all the nodes
In this example we run all of these commands on three nodes. The names of the block devices are the same on all the nodes.
- Get the Kernel Module and Userspace Binaries
- Either use snapshot RPMs built from git by Versity:
    rpm -i https://scoutfs.s3-us-west-2.amazonaws.com/scoutfs-repo-0.0.1-1.el7_4.noarch.rpm
    yum install scoutfs-utils kmod-scoutfs

- Or use the binaries built from checked out git repositories:

    yum install kernel-devel
    git clone git@github.com:versity/scoutfs.git
    make -C scoutfs
    modprobe libcrc32c
    insmod scoutfs/kmod/src/scoutfs.ko
    alias scoutfs=$PWD/scoutfs/utils/src/scoutfs
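Before moving on, it can help to confirm that the module actually loaded. This is a minimal sanity check using standard kernel tooling, not part of the scoutfs utilities:

    # confirm the scoutfs module is loaded
    lsmod | grep scoutfs

    # print metadata for the module built from the tree above
    modinfo scoutfs/kmod/src/scoutfs.ko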
- Make a New Filesystem (destroys contents, no questions asked)
We specify that two of our three nodes must be present to form a quorum for the system to function.
    scoutfs mkfs -Q 2 /dev/meta_dev /dev/data_dev
- Mount the Filesystem
Each mounting node provides its local IP address on which it will run an internal server for the other mounts if it is elected the leader by the quorum.
    mkdir /mnt/scoutfs
    mount -t scoutfs -o server_addr=$NODE_ADDR,metadev_path=/dev/meta_dev /dev/data_dev /mnt/scoutfs
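On a cluster, the same mount is run on every node with that node's own address. A rough per-node sketch, assuming hostname -I reports an IPv4 address the other nodes can reach (NODE_ADDR here is just a shell variable, as above):

    # derive this node's address; assumes the first address is reachable by peers
    NODE_ADDR=$(hostname -I | awk '{print $1}')
    mkdir -p /mnt/scoutfs
    mount -t scoutfs -o server_addr=$NODE_ADDR,metadev_path=/dev/meta_dev /dev/data_dev /mnt/scoutfs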
- For Kicks, Observe the Metadata Change Index
The meta_seq index tracks the inodes that are changed in each transaction.

    scoutfs walk-inodes meta_seq 0 -1 /mnt/scoutfs
    touch /mnt/scoutfs/one; sync
    scoutfs walk-inodes meta_seq 0 -1 /mnt/scoutfs
    touch /mnt/scoutfs/two; sync
    scoutfs walk-inodes meta_seq 0 -1 /mnt/scoutfs
    touch /mnt/scoutfs/one; sync
    scoutfs walk-inodes meta_seq 0 -1 /mnt/scoutfs
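As a sketch of how an archival agent might use this index incrementally rather than rescanning from 0, it could remember the newest sequence it has processed and only walk entries from there forward. The LAST_SEQ bookkeeping below is hypothetical, and parsing of the walk-inodes output is deliberately omitted since its format isn't shown here:

    # hypothetical polling loop: only visit inodes changed since the last pass
    LAST_SEQ=0
    while true; do
        scoutfs walk-inodes meta_seq $LAST_SEQ -1 /mnt/scoutfs
        # ... update LAST_SEQ from the newest sequence seen (output parsing omitted) ...
        sleep 60
    done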