Block EL9 minor version upgrades

Since kABI migrations across minor versions is a thing of the past going forward, we now: - Detect if we're on EL9 - If so, add a requirement on the various flavors of release package to that specific major.minor version This appropriately does not allow upgrades across minor versions.
Merge pull request #194 from versity/zab/v1.22
2026-01-09 13:23:14 +00:00 · 2024-12-02 16:04:24 -08:00 · 2024-11-03 15:06:40 -08:00 · 2024-11-01 13:05:52 -07:00 · 2024-10-31 16:56:51 -07:00 · 2024-10-31 15:35:11 -07:00
8 changed files with 129 additions and 26 deletions
--- a/ReleaseNotes.md
+++ b/ReleaseNotes.md
@@ -1,6 +1,21 @@
 Versity ScoutFS Release Notes
 =============================

+---
+v1.22
+\
+*Nov 1, 2024*
+
+Add support for building against the RHEL9 family of kernels.
+
+Fix failure of the setattr\_more ioctl() to set the attributes of a
+zero-length file when restoring.
+
+Fix support for POSIX ACLs in the RHEL8 and later family of kernels.
+
+Fix a race condition in the lock server that could drop lock requests
+under heavy load and cause cluster lock attempts to hang.
+
 ---
 v1.21
 \
--- a/kmod/scoutfs-kmod.spec.in
+++ b/kmod/scoutfs-kmod.spec.in
@@ -4,9 +4,6 @@
 %define kmod_git_describe @@GITDESCRIBE@@
 %define pkg_date %(date +%%Y%%m%%d)

-# Disable the building of the debug package(s).
-%define debug_package %{nil}
-
 # take kernel version or default to uname -r
 %{!?kversion: %global kversion %(uname -r)}
 %global kernel_version %{kversion}
@@ -56,6 +53,18 @@ Source:		%{kmod_name}-kmod-%{kmod_version}.tar
 %global flavors_to_build x86_64
 %endif

+# el9 sanity: make sure we lock to the minor release we built for and block upgrades
+%{lua:
+  if string.match(rpm.expand("%{dist}"), "%.el9") then
+    rpm.define("el9 1")
+  end
+}
+
+%if 0%{?el9}
+%define release_major_minor 9.%{lua: print(rpm.expand("%{dist}"):match("%.el9_(%d)"))}
+Requires: system-release = %{release_major_minor}
+%endif
+
 %description
 %{kmod_name} - kernel module

--- a/kmod/src/lock_server.c
+++ b/kmod/src/lock_server.c
@@ -202,21 +202,48 @@ static u8 invalidation_mode(u8 granted, u8 requested)

 /*
 * Return true of the client lock instances described by the entries can
- * be granted at the same time.  Typically this only means they're both
- * modes that are compatible between nodes. In addition there's the
- * special case where a read lock on a client is compatible with a write
- * lock on the same client because the client's cache covered by the
- * read lock is still valid if they get a write lock.
+ * be granted at the same time.  There's only three cases where this is
+ * true.
+ *
+ * First, the two locks are both of the same mode that allows full
+ * sharing -- read and write only.  The only point of these modes is
+ * that everyone can share them.
+ *
+ * Second, a write lock gives the client permission to read as well.
+ * This means that a client can upgrade its read lock to a write lock
+ * without having to invalidate the existing read and drop caches.
+ *
+ * Third, null locks are always compatible between clients.  It's as
+ * though the client with the null lock has no lock at all.  But it's
+ * never compatible with all locks on the client requesting null.
+ * Sending invalidations for existing locks on a client when we get a
+ * null request is how we resolve races in shrinking locks -- we turn it
+ * into the unsolicited remote invalidation case.
+ *
+ * All other mode and client combinations can not be shared, most
+ * typically a write lock invalidating all other non-write holders to
+ * drop caches and force a read after the write has completed.
 */
 static bool client_entries_compatible(struct client_lock_entry *granted,
 				      struct client_lock_entry *requested)
 {
-	return (granted->mode == requested->mode &&
-		(granted->mode == SCOUTFS_LOCK_READ ||
-		 granted->mode == SCOUTFS_LOCK_WRITE_ONLY)) ||
-	       (granted->rid == requested->rid &&
-		granted->mode == SCOUTFS_LOCK_READ &&
-		requested->mode == SCOUTFS_LOCK_WRITE);
+	/* only read and write_only can be full shared */
+	if ((granted->mode == requested->mode) &&
+	    (granted->mode == SCOUTFS_LOCK_READ || granted->mode == SCOUTFS_LOCK_WRITE_ONLY))
+		return true;
+
+	/* _write includes reading, so a client can upgrade its read to write */
+	if (granted->rid == requested->rid &&
+	    granted->mode == SCOUTFS_LOCK_READ &&
+	    requested->mode == SCOUTFS_LOCK_WRITE)
+		return true;
+
+	/* null is always compatible across clients, never within a client */
+	if ((granted->rid != requested->rid) &&
+	    (granted->mode == SCOUTFS_LOCK_NULL || requested->mode == SCOUTFS_LOCK_NULL))
+		return true;
+
+	return false;
 }

 /*
@@ -317,16 +344,18 @@ static void put_server_lock(struct lock_server_info *inf,

 	BUG_ON(!mutex_is_locked(&snode->mutex));

+	spin_lock(&inf->lock);
+
 	if (atomic_dec_and_test(&snode->refcount) &&
 	    list_empty(&snode->granted) &&
 	    list_empty(&snode->requested) &&
 	    list_empty(&snode->invalidated)) {
-		spin_lock(&inf->lock);
 		rb_erase(&snode->node, &inf->locks_root);
-		spin_unlock(&inf->lock);
 		should_free = true;
 	}

+	spin_unlock(&inf->lock);
+
 	mutex_unlock(&snode->mutex);

 	if (should_free) {
--- a/kmod/src/net.c
+++ b/kmod/src/net.c
@@ -502,12 +502,12 @@ static void scoutfs_net_proc_worker(struct work_struct *work)
 * Free live responses up to and including the seq by marking them dead
 * and moving them to the send queue to be freed.
 */
-static int move_acked_responses(struct scoutfs_net_connection *conn,
-				struct list_head *list, u64 seq)
+static bool move_acked_responses(struct scoutfs_net_connection *conn,
+				 struct list_head *list, u64 seq)
 {
 	struct message_send *msend;
 	struct message_send *tmp;
-	int ret = 0;
+	bool moved = false;

 	assert_spin_locked(&conn->lock);

@@ -519,20 +519,20 @@ static int move_acked_responses(struct scoutfs_net_connection *conn,

 		msend->dead = 1;
 		list_move(&msend->head, &conn->send_queue);
-		ret = 1;
+		moved = true;
 	}

-	return ret;
+	return moved;
 }

 /* acks are processed inline in the recv worker */
 static void free_acked_responses(struct scoutfs_net_connection *conn, u64 seq)
 {
-	int moved;
+	bool moved;

 	spin_lock(&conn->lock);

-	moved = move_acked_responses(conn, &conn->send_queue, seq) +
+	moved = move_acked_responses(conn, &conn->send_queue, seq) |
 		move_acked_responses(conn, &conn->resend_queue, seq);

 	spin_unlock(&conn->lock);
--- a/kmod/src/scoutfs_trace.h
+++ b/kmod/src/scoutfs_trace.h
@@ -1046,9 +1046,12 @@ DECLARE_EVENT_CLASS(scoutfs_lock_class,
 		sk_trace_define(start)
 		sk_trace_define(end)
 		__field(u64, refresh_gen)
+		__field(u64, write_seq)
+		__field(u64, dirty_trans_seq)
 		__field(unsigned char, request_pending)
 		__field(unsigned char, invalidate_pending)
 		__field(int, mode)
+		__field(int, invalidating_mode)
 		__field(unsigned int, waiters_cw)
 		__field(unsigned int, waiters_pr)
 		__field(unsigned int, waiters_ex)
@@ -1061,9 +1064,12 @@ DECLARE_EVENT_CLASS(scoutfs_lock_class,
 		sk_trace_assign(start, &lck->start);
 		sk_trace_assign(end, &lck->end);
 		__entry->refresh_gen = lck->refresh_gen;
+		__entry->write_seq = lck->write_seq;
+		__entry->dirty_trans_seq = lck->dirty_trans_seq;
 		__entry->request_pending = lck->request_pending;
 		__entry->invalidate_pending = lck->invalidate_pending;
 		__entry->mode = lck->mode;
+		__entry->invalidating_mode = lck->invalidating_mode;
 		__entry->waiters_pr = lck->waiters[SCOUTFS_LOCK_READ];
 		__entry->waiters_ex = lck->waiters[SCOUTFS_LOCK_WRITE];
 		__entry->waiters_cw = lck->waiters[SCOUTFS_LOCK_WRITE_ONLY];
@@ -1071,10 +1077,11 @@ DECLARE_EVENT_CLASS(scoutfs_lock_class,
 		__entry->users_ex = lck->users[SCOUTFS_LOCK_WRITE];
 		__entry->users_cw = lck->users[SCOUTFS_LOCK_WRITE_ONLY];
        ),
-        TP_printk(SCSBF" start "SK_FMT" end "SK_FMT" mode %u reqpnd %u invpnd %u rfrgen %llu waiters: pr %u ex %u cw %u users: pr %u ex %u cw %u",
+        TP_printk(SCSBF" start "SK_FMT" end "SK_FMT" mode %u invmd %u reqp %u invp %u refg %llu wris %llu dts %llu waiters: pr %u ex %u cw %u users: pr %u ex %u cw %u",
 		  SCSB_TRACE_ARGS, sk_trace_args(start), sk_trace_args(end),
-		  __entry->mode, __entry->request_pending,
-		  __entry->invalidate_pending, __entry->refresh_gen,
+		  __entry->mode, __entry->invalidating_mode, __entry->request_pending,
+		  __entry->invalidate_pending, __entry->refresh_gen, __entry->write_seq,
+		  __entry->dirty_trans_seq,
 		  __entry->waiters_pr, __entry->waiters_ex, __entry->waiters_cw,
 		  __entry->users_pr, __entry->users_ex, __entry->users_cw)
 );
--- a/tests/golden/lock-shrink-read-race
+++ b/tests/golden/lock-shrink-read-race
@@ -0,0 +1,2 @@
+=== setup
+=== spin reading and shrinking
--- a/tests/sequence
+++ b/tests/sequence
@@ -25,6 +25,7 @@ totl-xattr-tag.sh
 quota.sh
 lock-refleak.sh
 lock-shrink-consistency.sh
+lock-shrink-read-race.sh
 lock-pr-cw-conflict.sh
 lock-revoke-getcwd.sh
 lock-recover-invalidate.sh
--- a/tests/tests/lock-shrink-read-race.sh
+++ b/tests/tests/lock-shrink-read-race.sh
@@ -0,0 +1,40 @@
+#
+# We had a lock server refcounting bug that could let one thread get a
+# reference on a lock struct that was being freed by another thread.  We
+# were able to reproduce this by having all clients try and produce a
+# lot of read and null requests.
+#
+# This will manfiest as a hung lock and timed out test runs, probably
+# with hung task messages on the console.  Depending on how the race
+# turns out, it can trigger KASAN warnings in
+# process_waiting_requests().
+#
+
+READERS_PER=3
+SECS=30
+
+echo "=== setup"
+touch "$T_D0/file"
+
+echo "=== spin reading and shrinking"
+END=$((SECONDS + SECS))
+for m in $(t_fs_nrs); do
+	eval file="\$T_D${m}/file"
+
+	# lots of tasks reading as fast as they can
+	for t in $(seq 1 $READERS_PER); do
+		(while [ $SECONDS -lt $END ]; do
+			stat $file > /dev/null
+		 done) &
+	done
+	# one task shrinking (triggering null requests) and reading
+	(while [ $SECONDS -lt $END ]; do
+		stat $file > /dev/null
+		t_trigger_arm_silent statfs_lock_purge $m
+		stat -f "$file" > /dev/null
+	 done) &
+done
+
+wait
+
+t_pass
Author	SHA1	Message	Date
Greg Cymbalski	110d5ea0d5	Block EL9 minor version upgrades Since kABI migrations across minor versions is a thing of the past going forward, we now: - Detect if we're on EL9 - If so, add a requirement on the various flavors of release package to that specific major.minor version This appropriately does not allow upgrades across minor versions.	2024-12-02 16:04:24 -08:00
Zach Brown	aed7169fac	Merge pull request #194 from versity/zab/v1.22 v1.22 Release	2024-11-03 15:06:40 -08:00
Zach Brown	7f313f2818	v1.22 Release Finish the release notes for the 1.22 release. Signed-off-by: Zach Brown <zab@versity.com>	2024-11-01 13:05:52 -07:00
Zach Brown	6b4e666952	Merge pull request #193 from versity/zab/hung_lock_fixes Zab/hung lock fixes	2024-10-31 16:56:51 -07:00
Zach Brown	4a26059d00	Add lock-shrink-read-race test Add a quick test that races readers and shrinking to stress lock object refcount racing between concurrent lock request handling threads in the lock server. Signed-off-by: Zach Brown <zab@versity.com>	2024-10-31 15:35:11 -07:00
Zach Brown	19e78c32fc	Allow null lock compatibility between nodes Right now a client requesting a null mode for a lock will cause invalidations of all existing granted modes of the lock across the cluster. This is unneccessarily broad. The absolute requirement is that a null request invalidates other existing granted modes on the client. That's how the client safely resolves shrinking's desire to free locks while the locks are in use. It relies on turning it into the race between use and remote invalidation. But that only requires invalidating existing grants from the requesting client, not all clients. It is always safe for null grants to coexist with all grants on other clients. Consider the existing mechanics involving null modes. First, null locks are instatiated on the client before sending any requests at all. At any given time newly allocated null locks are coexisting with all existing locks across the cluster. Second, the server frees the client entry tracking struct the moment it sends a null grant to the client. From that point on the client's null lock can not have any impact on the rest of the lock holders because the server has forgotten about it. So we add this case to the server's test that two client lock modes are compatible. We take the opportunity to comment the heck out of this function instead of making it a dense boolean composition. The only functional change is the addition of this case, the existing cases are refactored but unchanged. Signed-off-by: Zach Brown <zab@versity.com>	2024-10-31 15:34:59 -07:00
Zach Brown	8c1a45c9f5	Use bools instead of weird addition as or in net When freeing acked reesponses in the net layer we sweep the send and resend queues looking for queued responses up to the sequence number we've had acked. The code that did this used a weird pattern of returning ints and adding them which gave me pause. Clean it up to use bools and or (not short-circuiting \|\|) to more obviously communicate what's going on. Signed-off-by: Zach Brown <zab@versity.com>	2024-10-30 13:38:12 -07:00
Zach Brown	5a6eb569f3	Add some lock debugging trace fields Over time some fields have been added to the lock struct which haven't been added to the lock tracing output. Add some of the more relevant lock fields to tracing. Signed-off-by: Zach Brown <zab@versity.com>	2024-10-30 13:16:04 -07:00
Zach Brown	69d9040e68	Close lock server use-after-free race Lock object lifetimes in the lock server are protected by reference counts. References are acquired while holding a lock on an rbtree. Unfortunately, the decision to free lock objects wasn't tested while also holding that lock on the rbtree. A caller putting their object would test the refcount, then wait to get the rbtree lock to remove it from the tree. There's a possible race where the decision is made to remove the object but another reference is added before the object is removed. This was seen in testing and manifest as an incoming request handling path adding a request message to the object before it is freed, losing the message. Clients would then hang on a lock that never saw a response because their request was freed with the lock object. The fix is to hold the rbtree lock when testing the refcount and deciding to free. It adds a bit more contention but not significantly so, given the wild existing contention on a per-fs spinlocked rbtree. Signed-off-by: Zach Brown <zab@versity.com>	2024-10-30 13:04:13 -07:00
Zach Brown	d94ec29ffa	Merge pull request #192 from versity/greg/with-debug-kmod Generate debug packages	2024-10-24 15:35:03 -07:00
Greg Cymbalski	70c36ae394	Generate debug packages We had previously explicitly disabled this; let's start generating them.	2024-10-24 14:56:09 -07:00