Commit Graph

96 Commits

Author SHA1 Message Date
Bryant G. Duffy-Ly
90cfaf17d1 Initial support for different inode sizes
We're about to increase the inode size and increment the format version.
Inode reading and writing has to handle different valid inode sizes as
allowed by the format version.   This is the initial skeletal work that
later patches which really increase the inode size will further refine
to add the specific known sizes and format versions.

Signed-off-by: Bryant G. Duffy-Ly <bduffyly@versity.com>
[zab@versity.com: reworded description, reworked to use _within]
Signed-off-by: Zach Brown <zab@versity.com>
2024-06-28 14:53:49 -07:00
Zach Brown
6db69b7a4f Set root inode crtime in mkfs
When we added the crtime creation timestamp to the inode we forgot to
update mkfs to set the crtime of the root inode.

Signed-off-by: Zach Brown <zab@versity.com>
2024-06-25 15:11:18 -07:00
Zach Brown
b76e22ffcf Refactor user util functions for device size
Split the existing device_size() into get_device_size() and
limit_device_size().  An upcoming command wants to get the device size
without applying limiting policy.

Signed-off-by: Zach Brown <zab@versity.com>
2023-04-17 12:47:50 -07:00
Zach Brown
3363b4fb79 Flush device caches in buffered util cmds
Add calls to our new device cache flushing helper in commands that use
buffered reads.

Signed-off-by: Zach Brown <zab@versity.com>
2023-01-18 10:52:02 -08:00
Zach Brown
285b68879a Set quorum config ver to 1 in mkfs and print
We're adding a command to change the quorum config which updates its
version number.  Let's make the version a little more visible and start
it at the more humane 1.

Signed-off-by: Zach Brown <zab@versity.com>
2021-11-24 15:41:04 -08:00
Zach Brown
ce76682db7 Make mkfs quorum helpers available
Move functions for printing and validating the quorum config from mkfs.c
to quorum.c so that they can be used in an upcoming command to change
the quorum config.

Signed-off-by: Zach Brown <zab@versity.com>
2021-11-24 13:44:51 -08:00
Zach Brown
686f8515bc Fix --quorum-count typo in mkfs error message
The change from --quorum-count to --quorum-slot forgot to update a
mention of the option in an error message in mkfs when it wasn't
provided.

Signed-off-by: Zach Brown <zab@versity.com>
2021-11-24 13:44:51 -08:00
Zach Brown
95ed36f9d3 Maintain inode count in super and log trees
Add a count of used inodes to the super block and a change in the inode
count to the log_trees struct.   Client transactions track the change in
inode count as they create and delete inodes.   The log_trees delta is
added to the count in the super as finalized log_trees are deleted.

Signed-off-by: Zach Brown <zab@versity.com>
2021-10-28 12:30:47 -07:00
Zach Brown
366f615c9f Add support for our format version
We had previously started on a relatively simple notion of an
interoperability version which wasn't quite right.  This fleshes out
support for a more functional format version.   The super blocks have a
single version that defines behaviour of the running system.   The code
supports a range of versions and we add some initial interfaces for
updating the version while the system is offline.   All of this together
should let us safely change the underlying format over time.

Signed-off-by: Zach Brown <zab@versity.com>
2021-10-28 12:30:46 -07:00
Zach Brown
1cdcf41ac7 Move more block read/write functions to util
We're adding another command that does block IO so move some block
reading and writing functions out of mkfs.   We also grow a few function
variants and call the write_sync variant from mkfs instead of having it
manually sync.

Signed-off-by: Zach Brown <zab@versity.com>
2021-10-28 12:30:46 -07:00
Zach Brown
36fcc4665d Align first free ino to lock group
Currently the first inode number that can be allocated directly follows
the root inode.  This means the first batch of allocated inodes are in
the same lock group as the root inode.

The root inode is a bit special.  It is always hot as absolute path
lookups and inode-to-path resolution always read directory entries from
the root.

Let's try aligning the first free inode number to the next inode lock
group boundary.  This will stop work in those inodes from necessarily
conflicting with work in the root inode.

Signed-off-by: Zach Brown <zab@versity.com>
2021-08-25 10:14:38 -07:00
Zach Brown
fd686cab86 Fix total_data_blocks calculation in mkfs
mkfs was incorrectly initializing total_data_blocks.  The field is meant
to record the number of blocks from the start of the device that the
filesystem could access.  mkfs was subtracting the initial reserved area
of the device, meaning the number of blocks that the filesystem might
access.

This could allow accesses past devices if mount checks the device size
against the smaller total_data_blocks.

And we're about to use total_data_blocks as the start of a new extent to
add when growing the volume.  It needs to be fixed so that this new
grown free extent doesn't overlap with the end of the existing free
extents.

Signed-off-by: Zach Brown <zab@versity.com>
2021-07-30 13:26:32 -07:00
Zach Brown
4c1181c055 Remove first_ and last_ super blkno fields
There are fields in the super block that specify the range of blocks
that would be used for metadata or data.  They are from the time when a
single block device was carved up into regions for metadata and data.

They don't make sense now that we have separate metadata and data block
devices.  The starting blkno is static and we go to the end of the
device.

This removes the fields now that they serve no purpose.   The only use
of them to check that freed extents fell within the correct bounds can
still be performed by using the static starting number or roughly using
the size of the devices.  It's not perfect, but this is already only
a check to see that the blknos aren't utter nonsense.

We're removing the fields now to avoid having to update them while
worrying about users when resizing devices.

Signed-off-by: Zach Brown <zab@versity.com>
2021-07-30 13:22:42 -07:00
Zach Brown
84454b38c5 Add mkfs -A for small device sizes
Normally mkfs would fail if we specify meta or data devices that are too
small.  We'd like to use small devices for test scenarios, though, so
add an option to allow specifying sizes smaller than the minumum
required sizes.

Signed-off-by: Zach Brown <zab@versity.com>
2021-07-07 14:13:14 -07:00
Zach Brown
73bf916182 Return ENOSPC as space gets low
Returning ENOSPC is challenging because we have clients working on
allocators which are a fraction of the whole and we use COW transactions
so we need to be able to allocate to free.  This adds support for
returning ENOSPC to client posix allocators as free space gets low.

For metadata, we reserve a number of free blocks for making progress
with client and server transactions which can free space.  The server
sets the low flag in a client's allocator if we start to dip into
reserved blocks.  In the client we add an argument to entering a
transaction which indicates if we're allocating new space (as opposed to
just modifying existing data or freeing).  When an allocating
transaction runs low and the server low flag is set then we return
ENOSPC.

Adding an argument to transaciton holders and having it return ENOSPC
gave us the opportunity to clean it up and make it a little clearer.
More work is done outside the wait_event function and it now
specifically waits for a transaction to cycle when it forces a commit
rather than spinning until the transaction worker acquires the lock and
stops it.

For data the same pattern applies except there are no reserved blocks
and we don't COW data so it's a simple case of returning the hard ENOSPC
when the data allocator flag is set.

The server needs to consider the reserved count when refilling the
client's meta_avail allocator and when swapping between the two
meta_avail and meta_free allocators.

We add the reserved metadata block count to statfs_more so that df can
subtract it from the free meta blocks and make it clear when enospc is
going to be returned for metadata allocations.

We increase the minimum device size in mkfs so that small testing
devices provide sufficient reserved blocks.

And finally we add a little test that makes sure we can fill both
metadata and data to ENOSPC and then recover by deleting what we filled.

Signed-off-by: Zach Brown <zab@versity.com>
2021-07-07 14:13:14 -07:00
Zach Brown
9711fef122 Update for core, trans, and item seq use
We now have a core seq number in the super that is advanced for multiple
users.    The client transaction seq comes from the core seq so we
remove the trans_seq from the super.  The item version is also converted
to use a seq that's derived from the core seq.

Signed-off-by: Zach Brown <zab@versity.com>
2021-06-17 09:36:00 -07:00
Zach Brown
d0b04e790c Add data-alloc-zone-blocks argument to mkfs
Add an argument to mkfs which sets the data_alloc_zone_blocks volume
option.

Signed-off-by: Zach Brown <zab@versity.com>
2021-05-21 15:31:02 -07:00
Zach Brown
9de3ae6dcb Index free extents by order of length
Allocators store free extents in two items, one sorted by their blkno
position and the other by their precise length.

The length index makes it easy to search for precise extent lengths, but
it makes it hard to search for a large extent within a given blkno
region.  Skipping in the blkno dimension has to be done for every
precise length value.

We don't need that level of precision.  If we index the extents by a
coarser order of the length then we have a fixed number of orders in
which we have to skip in the blkno dimension when searching within a
specific region.

This changes the length item to be stored at the log(8) order of the
length of the extents.  This groups extents into orders that are close
to the human-friendly base 10 orders of magnitude.

With this change the order field in the key no longer stores the precise
extent length.  To preserve the length of the extent we need to use
another field.  The only 64bit field remaining is the first which is a
higher comparision priority than the type.  So we use the highest
comparison priority zone field to differentiate the position and order
indexes and can now use all three 64bit fields in the key.

Finally, we have to be careful when constructing a key to use _next when
searching for a large extent.  Previously keys were relying on the magic
property that building a key from an extent length of 0 ended up at the
key value -0 = 0.  That only worked because we never stored zero length
extents.  We now store zero length orders so we can't use the negative
trick anymore.  We explicitly treat 0 length extents carefully when
building keys and we subtract the order from U64_MAX to store the orders
from largest to smallest.

Signed-off-by: Zach Brown <zab@versity.com>
2021-05-21 15:25:56 -07:00
Andy Grover
efe5d92458 Reserve space in superblock for IPv6 addresses
Define a family field, and add a union for IPv4 and v6 variants, although
v6 is not supported yet.

Family field is now used to determine presence of address in a quorum slot,
instead of checking if addr is zero.

Signed-off-by: Andy Grover <agrover@versity.com>
2021-03-12 14:10:42 -08:00
Zach Brown
79f6878355 Clean up block writing in mkfs
scoutfs mkfs had two block writing functions: write_block to fill out
some block header fields including crc calculation, and then
write_block_raw to pwrite the raw buffer to the bytes in the device.

These were used inconsistenly as blocks came and went over time.  Most
callers filled out all the header fields themselves and called the raw
writer.  write_block was only used for super writing, which made sense
because it clobbered the block's header with the super header so the
caller's set header magic and seq fields would be lost.

This cleans up the mess.  We only have one block writer and the caller
provides all the hdr fields.  Everything uses it instead of filling out
the fields themselves and calling the raw writer.

Signed-off-by: Zach Brown <zab@versity.com>
2021-02-22 13:28:38 -08:00
Zach Brown
87fcad5428 Update scoutfs mkfs and print for quorum slots
Signed-off-by: Zach Brown <zab@versity.com>
2021-02-22 13:28:38 -08:00
Andy Grover
d731c1577e Filesystem version instead of format hash check
Instead of hashing headers, define an interop version. Do not mount
superblocks that have a different version, either higher or lower.

Since this is pretty much the same as the format hash except it's a
constant, minimal code changes are needed.

Initial dev version is 0, with the intent that version will be bumped to
1 immediately prior to tagging initial release version.

Update README. Fix comments.

Add interop version to notes and modinfo.

Signed-off-by: Andy Grover <agrover@versity.com>
2021-01-15 10:53:00 -08:00
Andy Grover
d48b447e75 Do not set -Wpadded except for checking kmod-shared headers
Remove now-unneeded manual padding in arg structs.

Signed-off-by: Andy Grover <agrover@versity.com>
2021-01-12 16:29:42 -08:00
Andy Grover
e0a2175c2e Use argp info instead of duplicating for cmd_register()
Make it static and then use it both for argp_parse as well as
cmd_register_argp.

Split commands into five groups, to help understanding of their
usefulness.

Mention that each command has its own help text, and that we are being
fancy to keep the user from having to give fs path.

Signed-off-by: Andy Grover <agrover@versity.com>
2021-01-12 16:29:42 -08:00
Andy Grover
7befc61482 Implement argp support for mkfs and add --force
Support max-meta-size and max-data-size using KMGTP units with rounding.

Detect other fs signatures using blkid library.

Detect ScoutFS super using magic value.

Move read_block() from print.c into util.c since blkid also needs it.

Signed-off-by: Andy Grover <agrover@versity.com>
2021-01-12 16:29:42 -08:00
Andy Grover
8f72d16609 scoutfs-utils: Use separate block devices for metadata and data
mkfs: Take two block devices as arguments. Write everything to metadata
dev, and the superblock to the data dev. UUIDs match. Differentiate by
checking a bit in a new "flags" field in the superblock.

Refactor device_size() a little. Convert spaces to tabs.

Move code to pretty-print sizes to dev.c so we can use it in error
messages there, as well as in mkfs.c.

print: Include flags in output.

Add -D and -M options for setting max dev sizes

Allow sizes to be specified using units like "K", "G" etc.

Note: -D option replaces -S option, and uses above units rather than
the number of 4k data blocks.

Update man pages for cmdline changes.

Signed-off-by: Andy Grover <agrover@versity.com>
2020-11-19 11:41:54 -08:00
Zach Brown
ea7c41d876 scoutfs-utils: remove free_*_blocks super fields
The kernel is no longer storing the total free space in all allocators
in super block fields.

Signed-off-by: Zach Brown <zab@versity.com>
2020-10-26 15:19:41 -07:00
Zach Brown
669e7f733b scoutfs-utils: add -S to limit device size
Add an option to mkfs to have it limit the size of the device that's
used by mkfs.

Signed-off-by: Zach Brown <zab@versity.com>
2020-10-26 15:19:41 -07:00
Zach Brown
4bd86d1a00 scoutfs-utils: return error for small device
The check for a small device didn't return an error code because it was
copied from error tests of ret for an error code.  It has to generate
one, do so.

Signed-off-by: Zach Brown <zab@versity.com>
2020-10-26 15:19:41 -07:00
Zach Brown
23711f05f6 scoutfs-utils: alloc and data uses full extents
Signed-off-by: Zach Brown <zab@versity.com>
2020-10-26 15:19:41 -07:00
Zach Brown
d87e2e0166 scoutfs-utils: add btree insertion for mkfs
Use little helpers to insert items into new single block btrees for
mkfs.  We're about to insert a whole bunch more items.

Signed-off-by: Zach Brown <zab@versity.com>
2020-10-26 15:19:41 -07:00
Zach Brown
9bb32b8003 scoutfs-utils: fix last data blkno
The calculation of the last valid data blkno was off by one.  It was
calculating the total number of small blocks that fit in the device
size.

Signed-off-by: Zach Brown <zab@versity.com>
2020-08-26 14:39:28 -07:00
Zach Brown
5f0dbc5f85 scoutfs-utils: remove radix _first fields
The recent cleanup of the radix allocator included removing tracking of
the first set bits or references in blocks.

Signed-off-by: Zach Brown <zab@versity.com>
2020-08-26 14:39:28 -07:00
Zach Brown
39993d8b5f scoutfs-utils: use larger metadata blocks
Signed-off-by: Zach Brown <zab@versity.com>
2020-08-26 14:39:28 -07:00
Zach Brown
b86a1bebbb scoutfs-utils: support btree avl and hash
Update the internal structure of btree blocks to use the avl item index
and hash table direct item lookup.

Signed-off-by: Zach Brown <zab@versity.com>
2020-08-26 14:39:28 -07:00
Zach Brown
aa84f7c601 scoutfs-utils: use scoutfs_key as btree key
Track the kernel changes to use the scoutfs_key struct as the btree key
instead of a big-endian binary blob.

Signed-off-by: Zach Brown <zab@versity.com>
2020-08-26 14:39:28 -07:00
Zach Brown
ec782fff8d scoutfs-utils: meta and data free blocks
The super block now tracks free metadata and data blocks in separate
counters.

Signed-off-by: Zach Brown <zab@versity.com>
2020-02-25 12:04:17 -08:00
Zach Brown
ff436db49b scoutfs-utils: add support for radix alloc
Add support for initializing radix allocator blocks that describe free
space in mkfs and support for printing them out.

Signed-off-by: Zach Brown <zab@versity.com>
2020-02-25 12:04:17 -08:00
Zach Brown
e0a49c46a7 scoutfs-utils: add packed extents and bitmaps
Signed-off-by: Zach Brown <zab@versity.com>
2020-01-17 11:22:04 -08:00
Zach Brown
3776c18c66 scoutfs-utils: switch to btree forest
Remove all the lsm code from mkfs and print, replacing
it with the forest of btrees.

Signed-off-by: Zach Brown <zab@versity.com>
2020-01-17 11:22:04 -08:00
Zach Brown
7cd8738add scoutfs-utils: net uses rid instead of node_id
Now that networking is identifing clients by their rid some persistent
structures are using that to store records of clients.

Signed-off-by: Zach Brown <zab@versity.com>
2019-08-20 15:52:17 -07:00
Zach Brown
3670a5b80d scoutfs-utils: remove quorum slot config
The format no longer has statically configured named slots.  The only
persistent config is the number of monts that must be voting to reach
quorum.  The quorum blocks now have a log of successfull elections.

Signed-off-by: Zach Brown <zab@versity.com>
2019-08-20 15:52:17 -07:00
Zach Brown
3c9eeeb2ef scoutfs-utils: add transaction seq btree
Signed-off-by: Zach Brown <zab@versity.com>
2019-04-12 10:54:20 -07:00
Zach Brown
64bdda717c scoutfs-utils: move super id to block hdr magic
Move the magic value that identifies the super block into the block
header and use it for btree blocks as well.

Signed-off-by: Zach Brown <zab@versity.com>
2019-04-12 10:54:20 -07:00
Zach Brown
ea969a5dde scoutfs-utils: update format.h for quorum
Signed-off-by: Zach Brown <zab@versity.com>
2019-04-12 10:54:20 -07:00
Zach Brown
bbfa71361f scoutfs-utils: compaction request format update
Signed-off-by: Zach Brown <zab@versity.com>
2018-08-28 15:34:33 -07:00
Zach Brown
078d2f6073 scoutfs-utils: update format for greeting node_id
Signed-off-by: Zach Brown <zab@versity.com>
2018-08-28 15:34:33 -07:00
Zach Brown
7abf5c1e2b scoutfs-utils: calculate segment crc in mkfs
Signed-off-by: Zach Brown <zab@versity.com>
2018-08-21 13:27:37 -07:00
Zach Brown
ea2ec838ec scoutfs-utils: use one super and verify its crc
Signed-off-by: Zach Brown <zab@versity.com>
2018-06-29 15:56:36 -07:00
Zach Brown
59739e0057 scoutfs-utils: remove sneaky tab in mkfs output
We had a tab in the mkfs output that'd cause it to be misaligned.

Signed-off-by: Zach Brown <zab@versity.com>
2018-06-29 14:42:08 -07:00