Zach Brown 4c2a287474 Protect clusters locks with refcounts
The first pass at managing the cluster lock state machine used a simple
global spinlock.  It's time to break it up.

This adds refcounting to the cluster lock struct.  Rather than managing
global data structures and individual lock state all under a global
spinlock, we use per-structure locks, a lock spinlock, and a lock
refcount.

Active users of the cluster lock hold a reference.  This primarily lets
unlock only check global structures once the refcounts say that it's
time to remove the lock from the structures.  The careful use of the
refcount to avoid locks that are being freed during lookup also paves
the way for using mostly read-only RCU lookup structures soon.

The global LRU is still modified on every lock use; that will also be
removed in future work.

The linfo spinlock is now only used for the LRU and lookup structures.
Its other uses are removed, which forces more careful use of the
finer-grained locks that initially just mirrored the linfo spinlock to
keep those introductory patches safe.

The move from a single global lock to more fine-grained locks creates
nesting that has to be managed.  Shrinking and recovery in particular
need to be careful as they transition from the spinlocks used to find
cluster locks to acquiring the cluster lock spinlock.

The presence of freeing locks in the lookup indexes means that some
callers need to retry if they hit freeing locks.  We have to add this
protection to recovery iterating over locks by their key value, but it
wouldn't have made sense to build that around the lookup rbtree as it's
going away.  It makes sense to use the range tree that we're going to
keep using to make sure we don't accidentally introduce locks whose
ranges overlap (which would lead to item cache inconsistency).

Signed-off-by: Zach Brown <zab@versity.com>
Signed-off-by: Chris Kirby <ckirby@versity.com>
2025-10-31 15:38:31 -05:00

Introduction

scoutfs is a clustered in-kernel Linux filesystem designed to support large archival systems. It features additional interfaces and metadata so that archive agents can perform their maintenance workflows without walking all the files in the namespace. Its cluster support lets deployments add nodes to satisfy archival tier bandwidth targets.

The design goal is to reach file populations in the trillions, with the archival bandwidth to match, while remaining operational and responsive.

Highlights of the design and implementation include:

  • Fully consistent POSIX semantics between nodes
  • Atomic transactions to maintain consistent persistent structures
  • Integrated archival metadata replaces syncing to external databases
  • Dynamic separation of resources lets nodes write in parallel
  • 64-bit throughout; no limits on file or directory sizes or counts
  • Open GPLv2 implementation

Community Mailing List

Please join us on the open scoutfs-devel@scoutfs.org mailing list hosted on Google Groups.
