mirror of
https://github.com/versity/scoutfs.git
synced 2025-12-23 05:25:18 +00:00
e182914e5180aea454c29c13663603e236419db0
The log merging process is meant to provide parallelism across workers in mounts. The idea is that the server hands out a bunch of concurrent non-intersecting work that's based on the structure of the stable input fs_root btree.

The nature of the parallel work (cow of the blocks that intersect a key range) means that the ranges of concurrently issued work can't overlap or the work will all cow the same input blocks, freeing that input stable block multiple times. We're seeing this in testing.

Correctness was intended by having an advancing key that sweeps sorted ranges. Duplicate ranges would never be hit as the key advanced past each range it visited. This was broken by the mapping of the fs item keys to log merge tree keys, which clobbers the sk_zone key value. It effectively interleaves the ranges of each zone in the fs root (meta indexes, orphans, fs items).

With just the right log merge conditions that involve logged items in the right places and partially completed work to insert remaining ranges behind the key, ranges can be stored at mapped keys that end up with ranges out of order. The server iterates over these and ends up issuing overlapping work, which results in duplicated frees of the input blocks.

The fix, without changing the format of the stored log tree items, is to perform a full sweep of all the range items and determine the next item by looking at the full precision stored keys. This ensures that the processed ranges always advance and never overlap.

Signed-off-by: Zach Brown <zab@versity.com>
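The sweep described in the fix can be illustrated with a small standalone sketch. This is not the scoutfs code: the `demo_key` and `demo_range` types, the two-field key layout, and all function names are hypothetical simplifications chosen for illustration. It assumes stored ranges do not overlap each other and only shows the idea of scanning every stored range and picking the next one by comparing full precision keys, rather than trusting the order in which the ranges happen to be stored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simplified key: the zone is the most significant field. */
struct demo_key {
	uint8_t zone;
	uint64_t ino;
};

/* Hypothetical range item with inclusive start and end keys. */
struct demo_range {
	struct demo_key start;
	struct demo_key end;
};

/* Compare keys at full precision, zone first. */
static int demo_key_cmp(const struct demo_key *a, const struct demo_key *b)
{
	if (a->zone != b->zone)
		return a->zone < b->zone ? -1 : 1;
	if (a->ino != b->ino)
		return a->ino < b->ino ? -1 : 1;
	return 0;
}

/* Advance to the next key; returns 0 if the key space wrapped. */
static int demo_key_inc(struct demo_key *k)
{
	if (++k->ino != 0)
		return 1;
	return ++k->zone != 0;
}

/*
 * Visit every stored range, regardless of storage order, and return
 * the one with the smallest start key that is not before *next.
 * Returns NULL when no such range remains.
 */
static const struct demo_range *
demo_next_range(const struct demo_range *ranges, size_t nr,
		const struct demo_key *next)
{
	const struct demo_range *best = NULL;
	size_t i;

	for (i = 0; i < nr; i++) {
		if (demo_key_cmp(&ranges[i].start, next) < 0)
			continue;
		if (!best || demo_key_cmp(&ranges[i].start, &best->start) < 0)
			best = &ranges[i];
	}
	return best;
}

/*
 * Hand out ranges in strictly advancing key order by moving the
 * sweep key just past the end of each range that is issued.
 */
static size_t demo_sweep(const struct demo_range *ranges, size_t nr,
			 struct demo_range *out, size_t max)
{
	struct demo_key next = { 0, 0 };
	const struct demo_range *r;
	size_t n = 0;

	while (n < max && (r = demo_next_range(ranges, nr, &next))) {
		out[n++] = *r;
		next = r->end;
		if (!demo_key_inc(&next))
			break;
	}
	return n;
}
```

Calling `demo_sweep()` on an array whose storage order interleaves zones still yields ranges sorted by full key, so each issued range starts after the previous one ends. The cost is a scan of all range items for every range issued, which this sketch accepts in exchange for not depending on storage order.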
Introduction
scoutfs is a clustered in-kernel Linux filesystem designed to support large archival systems. It features additional interfaces and metadata so that archive agents can perform their maintenance workflows without walking all the files in the namespace. Its cluster support lets deployments add nodes to satisfy archival tier bandwidth targets.
The design goal is to reach file populations in the trillions, with the archival bandwidth to match, while remaining operational and responsive.
Highlights of the design and implementation include:
- Fully consistent POSIX semantics between nodes
- Atomic transactions to maintain consistent persistent structures
- Integrated archival metadata replaces syncing to external databases
- Dynamic separation of resources lets nodes write in parallel
- 64-bit throughout; no limits on file or directory sizes or counts
- Open GPLv2 implementation
Community Mailing List
Please join us on the open scoutfs-devel@scoutfs.org mailing list, hosted on Google Groups.
Languages
- C: 87.2%
- Shell: 9.1%
- Roff: 2.5%
- TeX: 0.9%
- Makefile: 0.3%