Introduction
scoutfs is a clustered in-kernel Linux filesystem designed and built from the ground up to support large archival systems.
Its key differentiating features are:
- Integrated consistent indexing to accelerate archival maintenance operations
- Shared LSM index structure to scale metadata rates with storage bandwidth
- Decoupled logical locking from serialized device writes to reduce contention
It meets best-of-breed expectations:
- Fully consistent POSIX semantics between nodes
- Rich metadata to ensure the integrity of metadata references
- Atomic transactions to maintain consistent persistent structures
- First class kernel implementation for high performance and low latency
- Open GPLv2 implementation
Learn more in the white paper.
Current Status
Alpha Open Source Development
scoutfs is under heavy active development. We're developing it in the open to give the community an opportunity to affect the design and implementation.
The core architectural design elements are in place. Much surrounding functionality hasn't been implemented. It's appropriate for early adopters and interested developers, not for production use.
In that vein, expect significant incompatible changes to both the format of network messages and persistent structures. To avoid mistakes, the implementation currently calculates a hash of the format and ioctl header files in the source tree. The kernel module will refuse to mount a volume created by userspace utilities with a mismatched hash, and it will refuse to connect to a remote node with a mismatched hash. This means having to unmount, mkfs, and remount everything across many functional changes. Once the format is nailed down we'll wire up forward and backward compatibility machinery and remove this temporary safety measure.
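The compatibility check described above can be sketched in a few lines of shell. This is a toy illustration, not the in-kernel code: the header file names and the use of sha256 specifically are assumptions made for the example.

```shell
# Toy sketch of the format-hash safety measure: hash the headers that
# define the persistent and ioctl formats, then refuse to proceed on a
# mismatch. File names and hash algorithm are assumptions.
set -e
mkdir -p /tmp/scoutfs-hash-demo
cd /tmp/scoutfs-hash-demo
printf 'struct example_super { int x; };\n' > format.h
printf '#define EXAMPLE_IOC 1\n' > ioctl.h
built_hash=$(cat format.h ioctl.h | sha256sum | cut -d' ' -f1)
ondisk_hash=$built_hash   # pretend the volume was made with matching utilities
if [ "$built_hash" = "$ondisk_hash" ]; then
    echo "format hashes match: mount allowed"
else
    echo "format hash mismatch: refusing to mount" >&2
    exit 1
fi
```

A real mismatch would take the `exit 1` path, which is why every format change currently forces a fresh mkfs.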
The current kernel module is developed against the RHEL/CentOS 7.x kernel to minimize the friction of developing and testing with partners' existing infrastructure. Once we're happy with the design we'll shift development to the upstream kernel while maintaining distro compatibility branches.
Community Mailing List
Please join us on the open scoutfs-devel@scoutfs.org mailing list hosted on Google Groups for all discussion of scoutfs.
Quick Start
The following is a very rough example of the procedure to get up and running; experience will be needed to fill in the gaps. We're happy to help on the mailing list.
The requirements for running scoutfs on a small cluster are:
- One or more nodes running x86-64 CentOS/RHEL 7.4 (or 7.3)
- Access to a single shared block device
- IPv4 connectivity between the nodes
The steps for getting scoutfs mounted and operational are:
- Get the kernel module running on the nodes
- Make a new filesystem on the device with the userspace utilities
- Mount the device on all the nodes
In this example we run all of these commands on two nodes. The block device name is the same on all the nodes.
- Get the Kernel Module and Userspace Binaries

  Either use snapshot RPMs built from git by Versity:

  ```
  rpm -i https://scoutfs.s3-us-west-2.amazonaws.com/scoutfs-repo-0.0.1-1.el7_4.noarch.rpm
  yum install scoutfs-utils kmod-scoutfs
  ```

  Or use the binaries built from checked out git repositories:

  ```
  yum install kernel-devel
  git clone git@github.com:versity/scoutfs-kmod-dev.git
  make -C scoutfs-kmod-dev module
  modprobe libcrc32c
  insmod scoutfs-kmod-dev/src/scoutfs.ko
  git clone git@github.com:versity/scoutfs-utils-dev.git
  make -C scoutfs-utils-dev
  alias scoutfs=$PWD/scoutfs-utils-dev/src/scoutfs
  ```
- Make a New Filesystem (destroys contents, no questions asked)

  We specify that every node will participate in quorum voting by configuring each in the super block with options to mkfs.

  ```
  scoutfs mkfs -o quorum_slot node1:0:172.16.1.1 \
          -o quorum_slot node2:0:172.16.1.2 /dev/shared_block_device
  ```
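For reference, each quorum_slot value in the mkfs example is a colon-separated triple: the node name (which mount later matches against uniq_name), a numeric slot field, and the node's IPv4 address. A minimal sketch of splitting one apart, with the middle field's exact meaning left as an assumption:

```shell
# Split a quorum_slot triple from the mkfs example into its fields.
# Field meanings beyond the name and IPv4 address are assumptions.
slot_opt="node1:0:172.16.1.1"
name=${slot_opt%%:*}          # node1 - matched by uniq_name at mount time
rest=${slot_opt#*:}
slot=${rest%%:*}              # 0 - the numeric slot field
addr=${rest#*:}               # 172.16.1.1 - the node's IPv4 address
echo "name=$name slot=$slot addr=$addr"
```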
- Mount the Filesystem

  Each mounting node provides the name that was given to the quorum_slot option to mkfs.

  ```
  mkdir /mnt/scoutfs
  mount -t scoutfs -o uniq_name=$NODENAME /dev/shared_block_device /mnt/scoutfs
  ```
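Since a typo in uniq_name is easy to make, a pre-mount sanity check could confirm the name against the slots configured at mkfs. This is a hypothetical convenience script, not part of the scoutfs tooling:

```shell
# Hypothetical pre-mount check: confirm this node's uniq_name matches
# one of the names configured via -o quorum_slot at mkfs time.
NODENAME=node2
slots="node1:0:172.16.1.1 node2:0:172.16.1.2"
found=no
for s in $slots; do
    if [ "${s%%:*}" = "$NODENAME" ]; then
        found=yes
    fi
done
echo "uniq_name $NODENAME configured: $found"
```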
- For Kicks, Observe the Metadata Change Index

  The meta_seq index tracks the inodes that are changed in each transaction.

  ```
  scoutfs walk-inodes meta_seq 0 -1 /mnt/scoutfs
  touch /mnt/scoutfs/one; sync
  scoutfs walk-inodes meta_seq 0 -1 /mnt/scoutfs
  touch /mnt/scoutfs/two; sync
  scoutfs walk-inodes meta_seq 0 -1 /mnt/scoutfs
  touch /mnt/scoutfs/one; sync
  scoutfs walk-inodes meta_seq 0 -1 /mnt/scoutfs
  ```
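The behavior being demonstrated, that each committed transaction gets a new sequence number and a re-modified inode moves to the newest one, can be mimicked with a toy counter. This simulation is purely illustrative and uses none of scoutfs itself:

```shell
# Toy model of the meta_seq index: every committed transaction bumps a
# sequence number, and an inode modified again is re-indexed at the
# newest sequence, as the repeated touch/sync pairs are meant to show.
txn=0
commit() {
    txn=$((txn + 1))          # each sync'd transaction gets a new seq
    eval "seq_$1=$txn"        # record the inode's latest meta_seq
}
commit one                    # like: touch one; sync  -> one at seq 1
commit two                    # like: touch two; sync  -> two at seq 2
commit one                    # touching one again     -> one at seq 3
echo "one=$seq_one two=$seq_two"
```

After these three commits the toy index reports one at sequence 3 and two at sequence 2, mirroring how an archival agent can walk meta_seq to find only recently changed inodes.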