mirror of https://github.com/versity/scoutfs.git synced 2026-04-23 15:00:29 +00:00

Go to file

Zach Brown 80ee2c6d57 Harden client transaction processing

There are a few bad corner cases in the state machine that governs how
client transactions are opened, modified, and committed.

The worst problem is on the server side.   All server request handlers
need to cope with resent requests without causing bad side effects.
Both get_log_trees and commit_log_trees would try to fully processes
resent requests.  _get_log_trees() looks safe because it works with the
log_trees that was stored previously.  _commit_log_trees() is not safe
because it can rotate out the srch log file referenced by the sent
log_trees every time it's processed.  This could create extra srch
entries which would delete the first instance of entries.  Worse still,
by injecting the same block structure into the system multiple times it
ends up causing multiple frees of the blocks that make up the srch file.

The client side problems are slightly different, but related.   There
aren't strong constraints which guarantee that we'll only send a commit
request after a get request succeeds.   In crazy circumstances the
commit request in the write worker could come before the first get in
mount succeeds.   Far worse is that we can send multiple commit requests
for one transaction if it changes as we get errors during multiple
queued write attempts, particularly if we get errors from get_log_trees
after having successfully committed.

This hardens all these paths to ensure a strict sequence of
get_log_trees, transaction modification, and commit_log_trees.

On the server we add *_trans_seq fields to the log_trees struct so that
both get_ and commit_ can see that they've already prepared a commit to
send or have already committed the incoming commit, respectively.   We
can use the get_trans_seq field as the trans_seq of the open transaction
and get rid of the entire seperate mechanism we used to have for
tracking open trans seqs in the clients.  We can get the same info by
walking the log_trees and looking at their *_trans_seq fields.

In the client we have the write worker immediately return success if
mount hasn't opened the first transaction.   Then we don't have the
worker return to allow further modification until it has gotten success
from get_log_trees.

Signed-off-by: Zach Brown <zab@versity.com>

2021-10-28 12:30:47 -07:00

kmod

Harden client transaction processing

2021-10-28 12:30:47 -07:00

tests

Allow compacting logs down to a single page

2021-10-28 12:30:47 -07:00

utils

Harden client transaction processing

2021-10-28 12:30:47 -07:00

.gitignore

Initial commit

2020-12-07 09:47:12 -08:00

Makefile

Add simple top-level Makefile

2020-12-07 10:39:20 -08:00

README.md

Update repo README.md for quorum slots

2021-02-22 13:28:38 -08:00

README.md

Introduction

scoutfs is a clustered in-kernel Linux filesystem designed and built from the ground up to support large archival systems.

Its key differentiating features are:

Integrated consistent indexing accelerates archival maintenance operations
Commit logs allow nodes to write concurrently without contention

It meets best of breed expectations:

Fully consistent POSIX semantics between nodes
Rich metadata to ensure the integrity of metadata references
Atomic transactions to maintain consistent persistent structures
First class kernel implementation for high performance and low latency
Open GPLv2 implementation

Learn more in the white paper.

Current Status

Alpha Open Source Development

scoutfs is under heavy active development. We're developing it in the open to give the community an opportunity to affect the design and implementation.

The core architectural design elements are in place. Much surrounding functionality hasn't been implemented. It's appropriate for early adopters and interested developers, not for production use.

In that vein, expect significant incompatible changes to both the format of network messages and persistent structures. Since the format hash-checking has now been removed in preparation for release, if there is any doubt, mkfs is strongly recommended.

The current kernel module is developed against the RHEL/CentOS 7.x kernel to minimize the friction of developing and testing with partners' existing infrastructure. Once we're happy with the design we'll shift development to the upstream kernel while maintaining distro compatibility branches.

Community Mailing List

Please join us on the open scoutfs-devel@scoutfs.org mailing list hosted on Google Groups for all discussion of scoutfs.

Quick Start

This following a very rough example of the procedure to get up and running, experience will be needed to fill in the gaps. We're happy to help on the mailing list.

The requirements for running scoutfs on a small cluster are:

One or more nodes running x86-64 CentOS/RHEL 7.4 (or 7.3)
Access to two shared block devices
IPv4 connectivity between the nodes

The steps for getting scoutfs mounted and operational are:

Get the kernel module running on the nodes
Make a new filesystem on the devices with the userspace utilities
Mount the devices on all the nodes

In this example we use three nodes. The names of the block devices are the same on all the nodes. Two of the nodes will be quorum members. A majority of quorum members must be mounted to elect a leader to run a server that all the mounts connect to. It should be noted that two quorum members results in a majority of one, each member itself, so split brain elections are possible but so unlikely that it's fine for a demonstration.

Get the Kernel Module and Userspace Binaries

Either use snapshot RPMs built from git by Versity:

rpm -i https://scoutfs.s3-us-west-2.amazonaws.com/scoutfs-repo-0.0.1-1.el7_4.noarch.rpm
yum install scoutfs-utils kmod-scoutfs

Or use the binaries built from checked out git repositories:

yum install kernel-devel
git clone git@github.com:versity/scoutfs.git
make -C scoutfs
modprobe libcrc32c
insmod scoutfs/kmod/src/scoutfs.ko
alias scoutfs=$PWD/scoutfs/utils/src/scoutfs

Make a New Filesystem (destroys contents)

We specify quorum slots with the addresses of each of the quorum member nodes, the metadata device, and the data device.
```
scoutfs mkfs -Q 0,$NODE0_ADDR,12345 -Q 1,$NODE1_ADDR,12345 /dev/meta_dev /dev/data_dev
```
Mount the Filesystem

First, mount each of the quorum nodes so that they can elect and start a server for the remaining node to connect to. The slot numbers were specified with the leading "0,..." and "1,..." in the mkfs options above.
```
mount -t scoutfs -o quorum_slot_nr=$SLOT_NR,metadev_path=/dev/meta_dev /dev/data_dev /mnt/scoutfs
```
Then mount the remaining node which can now connect to the running server.
```
mount -t scoutfs -o metadev_path=/dev/meta_dev /dev/data_dev /mnt/scoutfs
```

For Kicks, Observe the Metadata Change Index

The meta_seq index tracks the inodes that are changed in each transaction.

scoutfs walk-inodes meta_seq 0 -1 /mnt/scoutfs
touch /mnt/scoutfs/one; sync
scoutfs walk-inodes meta_seq 0 -1 /mnt/scoutfs
touch /mnt/scoutfs/two; sync
scoutfs walk-inodes meta_seq 0 -1 /mnt/scoutfs
touch /mnt/scoutfs/one; sync
scoutfs walk-inodes meta_seq 0 -1 /mnt/scoutfs

Languages

C 86.4%

Shell 10%

Roff 2.5%

TeX 0.8%

Makefile 0.3%