
Versity ScoutFS Release Notes
=============================

---
v1.7
\
*Aug 26, 2022*

* **Fixed possible persistent errors moving freed data extents**
\
  Fixed a case where the server could hit persistent errors trying to
  move a client's freed extents in one commit.  The client had to free
  a large number of extents that occupied distant positions in the
  global free extent btree.  Very large fragmented files could cause
  this.  The server now moves the freed extents in multiple commits and
  can always ensure forward progress.

* **Fixed possible persistent errors from freed duplicate extents**
\
  Background orphan deletion wasn't properly synchronizing with
  foreground tasks deleting very large files.  If a deletion took long
  enough then background deletion could also attempt to delete inode items
  while the foreground deletion was still making progress.  This could
  create duplicate deletions of data extent items, which caused the server
  to abort when it later discovered the duplicate extents as it merged
  free lists.
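
As a rough illustration of the multi-commit approach described in the first fix above, the sketch below splits a large set of freed extents into bounded batches, one batch per commit, so that every commit stays within a fixed budget and always makes progress.  It is not the ScoutFS server code: `move_freed_extents`, `commit_budget`, and the tuple representation of an extent are hypothetical names chosen for this example.

```python
from typing import Iterable, List, Tuple

# Hypothetical representation of an extent: (start block, length in blocks).
Extent = Tuple[int, int]


def move_freed_extents(freed: Iterable[Extent], commit_budget: int) -> List[List[Extent]]:
    """Group freed extents into per-commit batches of at most commit_budget extents."""
    if commit_budget < 1:
        raise ValueError("commit_budget must be at least 1")
    commits: List[List[Extent]] = []
    batch: List[Extent] = []
    for extent in freed:
        batch.append(extent)
        if len(batch) == commit_budget:
            commits.append(batch)  # this batch is full, so it becomes one commit
            batch = []
    if batch:
        commits.append(batch)  # final, partially filled commit
    return commits


# Ten widely separated extents moved with a budget of four extents per commit.
commits = move_freed_extents([(i * 1000, 8) for i in range(10)], commit_budget=4)
print([len(c) for c in commits])  # prints [4, 4, 2]
```

Because each batch is bounded, no single commit has to absorb the whole set of extents, however fragmented the file was.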

---
v1.6
\
*Jul 7, 2022*

* **Fix memory leaks in rare corner cases**
\
  Analysis tools found a few corner cases that leaked small structures,
  generally around error handling or startup and shutdown.

* **Add --skip-likely-huge scoutfs print command option**
\
  Add an option to scoutfs print to reduce the size of the output
  so that it can be used to see system-wide metadata without being
  overwhelmed by file-level details.

---
v1.5
\
*Jun 21, 2022*

* **Fix persistent error during server startup**
\
  Fixed a case where the server would hit the same error on every
  startup, preventing the system from mounting.  This required a rare
  but valid state across the clients.

* **Fix a client hang that would lead to fencing**
\
  The client module's use of in-kernel networking was missing an annotation,
  which could cause communication to hang.  The server would fence the
  client when it stopped communicating.  This could be identified by the
  server fencing a client after it disconnected with no attempt by the
  client to reconnect.

---
v1.4
\
*May 6, 2022*

* **Fix possible client crash during server failover**
\
  Fixed a narrow window during server failover and lock recovery that
  could cause a client mount to believe that it had an inconsistent item
  cache and panic.  This required very specific lock state and messaging
  patterns between multiple mounts and multiple servers which made it
  unlikely to occur in the field.

---
v1.3
\
*Apr 7, 2022*

* **Fix rare server instability under heavy load**
\
  Fixed a case of server instability under heavy load due to concurrent
  work fully exhausting metadata block allocation pools reserved for a
  single server transaction.  This would cause a brief interruption as the
  server shut down and the next server started up and made progress as
  pending work was retried.

* **Fix slow fencing preventing server startup**
\
  If a server had to process many fence requests with a slow fencing
  mechanism it could be interrupted before it finished.  The server
  now makes sure heartbeat messages are sent while it is making progress
  on fencing requests so that other quorum members don't interrupt the
  process.

* **Performance improvement in getxattr and setxattr**
\
  Kernel allocation patterns in the getxattr and setxattr
  implementations were causing significant contention between CPUs.  Their
  allocation strategy was changed so that concurrent tasks can call these
  xattr methods without degrading performance.

---
v1.2
\
*Mar 14, 2022*

* **Fix deadlock between fallocate() and read() system calls**
\
  Fixed a lock inversion that could cause two tasks to deadlock if they
  performed fallocate() and read() on a file at the same time.  The
  deadlock was uninterruptible so the machine needed to be rebooted.  This
  was relatively rare as fallocate() is usually used to prepare files
  before they're used.

* **Fix instability from heavy file deletion workloads**
\
  Fixed rare circumstances under which background file deletion cleanup
  tasks could try to delete a file while it is being deleted by another
  task.  Heavy load across multiple nodes, either many files being deleted
  or large files being deleted, increased the chances of this happening.
  Heavy staging could cause this problem because staging can create many
  internal temporary files that need to be deleted.
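
The lock inversion behind the fallocate() and read() deadlock above can be illustrated with a small sketch.  A lock inversion occurs when two code paths acquire the same pair of locks in opposite orders; one common way to avoid it is to have every path take the locks in the same order.  The release note does not say how ScoutFS resolved the inversion, so this is only a generic illustration, and `lock_a`, `lock_b`, and `run_op` are hypothetical names.

```python
import threading

# Hypothetical stand-ins for two locks that both operations need.
lock_a = threading.Lock()
lock_b = threading.Lock()
completed = []


def run_op(name: str) -> None:
    # Every task takes lock_a before lock_b.  An inversion would be one
    # task taking lock_b first while another holds lock_a and waits for
    # lock_b, leaving both blocked forever.
    with lock_a:
        with lock_b:
            completed.append(name)


threads = [threading.Thread(target=run_op, args=(n,)) for n in ("fallocate", "read")]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(completed))  # prints ['fallocate', 'read']
```

With a single acquisition order, neither task can hold one lock while waiting on a task that holds the other.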

---
v1.1
\
*Feb 4, 2022*


* **Add scoutfs(1) change-quorum-config command**
\
  Add a change-quorum-config command to scoutfs(1) to change the quorum
  configuration stored in the metadata device while the filesystem is
  unmounted.  This can be used to change the mounts that will
  participate in quorum and the IP addresses they use.

* **Fix rare risk of item cache corruption**
\
  Code review found a rare potential source of item cache corruption.
  If this happened it would look as though deleted parts of the filesystem
  had returned, but only at the time they were deleted.  Previously deleted
  items were not affected.  This problem only affected the item cache, never
  persistent storage.  Unmounting and remounting would drop the bad item
  cache and resync it with the correct persistent data.

---
v1.0
\
*Nov 8, 2021*


* **Initial Release**
\
  Version 1.0 marks the first GA release.