scoutfs: use efficient btree block structures

This btree implementation was first built for the relatively light duty
of indexing segments in the LSM item implementation.  We're now using it
as the core metadata index.  It already uses a lot of CPU to do its job
with small blocks, and it only gets more expensive as the block size
increases.  These changes reduce the CPU cost of working with the btree
block structures.

We use a balanced binary tree (an AVL tree) to index items by key in the
block.  This costs us only occasional tree rebalancing on insertion and
deletion instead of the memmove overhead of maintaining a dense array of
item offsets sorted by key.  The keys are stored in the item structs,
which are stored in an array at the front of the block, so searching for
an item uses contiguous cachelines.
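
A minimal sketch of the layout idea described above: items live in a
dense array and are linked into a search tree by small references
rather than being kept sorted in place.  All demo_* names are
hypothetical, and the insertion here does not rebalance, unlike the AVL
tree the commit uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_ITEMS 64

/* hypothetical item: links hold array index + 1, 0 means "no child" */
struct demo_item {
	uint16_t left;
	uint16_t right;
	uint64_t key;
};

struct demo_block {
	uint16_t root;
	uint16_t nr_items;
	struct demo_item items[DEMO_MAX_ITEMS];
};

/* Append the item to the array and link it in; existing items never move. */
static int demo_insert(struct demo_block *bl, uint64_t key)
{
	uint16_t *link = &bl->root;
	struct demo_item *it;

	while (*link) {
		it = &bl->items[*link - 1];
		if (key == it->key)
			return -1;	/* duplicate key */
		link = key < it->key ? &it->left : &it->right;
	}
	if (bl->nr_items == DEMO_MAX_ITEMS)
		return -1;		/* block is full */

	it = &bl->items[bl->nr_items++];
	it->left = 0;
	it->right = 0;
	it->key = key;
	*link = bl->nr_items;		/* index of new item + 1 */
	return 0;
}

/* Descend the tree by key; every node visited is inside the item array. */
static struct demo_item *demo_lookup(struct demo_block *bl, uint64_t key)
{
	uint16_t ref = bl->root;

	while (ref) {
		struct demo_item *it = &bl->items[ref - 1];

		if (key == it->key)
			return it;
		ref = key < it->key ? it->left : it->right;
	}
	return NULL;
}
```

Insertion only writes the new array slot and one link, which is the
property that replaces the memmove of a sorted offset array.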

We add a trailing owner offset to values so that we can iterate through
them.  This is used to track space freed up by values instead of paying
the memmove cost of keeping all the values at the end of the block.  We
occasionally reclaim the fragmented value free space instead of
splitting the block.
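
A rough sketch of how a trailing owner field can drive compaction.  The
commit does not describe how freed gaps are encoded, so the free flag
and length stored in the trailer below are assumptions made only for
this example, as are all demo_* names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_VAL_BYTES	256
#define DEMO_NR_ITEMS	8
#define DEMO_FREE_FLAG	0x8000u	/* assumed: marks a freed record */

struct demo_item {
	uint16_t val_off;
	uint16_t val_len;
};

struct demo_block {
	struct demo_item items[DEMO_NR_ITEMS];
	uint16_t free_end;	/* records occupy [free_end, DEMO_VAL_BYTES) */
	uint8_t vals[DEMO_VAL_BYTES];
};

static uint16_t get_trailer(const uint8_t *p)
{
	uint16_t t;

	memcpy(&t, p, sizeof(t));
	return t;
}

static void put_trailer(uint8_t *p, uint16_t t)
{
	memcpy(p, &t, sizeof(t));
}

/* Store a value below the existing ones, followed by its owner index. */
static void demo_add_val(struct demo_block *bl, uint16_t owner,
			 const void *val, uint16_t len)
{
	assert(owner < DEMO_NR_ITEMS && bl->free_end >= len + 2);
	bl->free_end -= len + 2;
	memcpy(&bl->vals[bl->free_end], val, len);
	put_trailer(&bl->vals[bl->free_end + len], owner);
	bl->items[owner].val_off = bl->free_end;
	bl->items[owner].val_len = len;
}

/* Free in place: no memmove, just rewrite the trailer as a free record. */
static void demo_free_val(struct demo_block *bl, uint16_t owner)
{
	struct demo_item *it = &bl->items[owner];

	put_trailer(&bl->vals[it->val_off + it->val_len],
		    DEMO_FREE_FLAG | (it->val_len + 2));
	it->val_len = 0;
}

/* Sweep from the end of the block, sliding live values over freed gaps. */
static void demo_compact(struct demo_block *bl)
{
	uint16_t src = DEMO_VAL_BYTES;	/* end of the next record to read */
	uint16_t dst = DEMO_VAL_BYTES;	/* end of the next compacted record */

	while (src > bl->free_end) {
		uint16_t t = get_trailer(&bl->vals[src - 2]);
		uint16_t rec;

		if (t & DEMO_FREE_FLAG) {
			src -= t & ~DEMO_FREE_FLAG;	/* skip the gap */
			continue;
		}
		rec = bl->items[t].val_len + 2;
		memmove(&bl->vals[dst - rec], &bl->vals[src - rec], rec);
		bl->items[t].val_off = dst - rec;	/* owner follows value */
		src -= rec;
		dst -= rec;
	}
	bl->free_end = dst;
}
```

The trailer is what lets the sweep find each value's item and fix its
val_off without searching the item array.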

Direct item lookups use a small hash table at the end of the block
which maps hashed keys to item offsets.  It uses linear probing and is
guaranteed to have a light load factor so lookups are very likely to
only need a single cacheline access.
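
A small sketch of a linear probing table of item references with a
capped load factor.  The hash function, table size, and demo_* names
are assumptions for illustration and are not taken from the scoutfs
source.  Deletion and duplicate checks are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_HASH_NR	16	/* table slots */
#define DEMO_MAX_ITEMS	12	/* caps the load factor at 75% */

struct demo_item {
	uint64_t key;
};

struct demo_leaf {
	uint16_t nr_items;
	struct demo_item items[DEMO_MAX_ITEMS];
	uint16_t hash[DEMO_HASH_NR];	/* item index + 1, 0 means empty */
};

/* assumed hash: multiplicative mixing, reduced to a slot number */
static unsigned int demo_hash(uint64_t key)
{
	return (unsigned int)((key * 0x9E3779B97F4A7C15ull) >> 32) %
	       DEMO_HASH_NR;
}

static int demo_insert(struct demo_leaf *leaf, uint64_t key)
{
	unsigned int i;

	if (leaf->nr_items == DEMO_MAX_ITEMS)
		return -1;

	/* the table is never full, so probing always finds an empty slot */
	for (i = demo_hash(key); leaf->hash[i]; i = (i + 1) % DEMO_HASH_NR)
		;

	leaf->items[leaf->nr_items].key = key;
	leaf->hash[i] = ++leaf->nr_items;
	return 0;
}

static struct demo_item *demo_lookup(struct demo_leaf *leaf, uint64_t key)
{
	unsigned int i;

	/* neighbouring slots are adjacent in memory, so short probes are cheap */
	for (i = demo_hash(key); leaf->hash[i]; i = (i + 1) % DEMO_HASH_NR) {
		struct demo_item *it = &leaf->items[leaf->hash[i] - 1];

		if (it->key == key)
			return it;
	}
	return NULL;
}
```

Because at least a quarter of the slots stay empty, a lookup for a
missing key is guaranteed to terminate at an empty slot.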

We adjust the watermark for triggering a join from half of a block down
to a quarter.  This results in less utilized blocks on average, but it
creates distance between the join and split thresholds so we burn less
CPU constantly joining and splitting when item populations happen to
hover around the previously shared threshold.
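
The effect of separating the two thresholds can be sketched with a
pair of predicates.  The block size and helper names are hypothetical;
only the half and quarter watermarks come from the text above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_BLOCK_BYTES	4096u	/* assumed block size */
#define DEMO_OLD_JOIN_MARK	(DEMO_BLOCK_BYTES / 2)
#define DEMO_NEW_JOIN_MARK	(DEMO_BLOCK_BYTES / 4)

/* A block splits when an insertion would not fit. */
static bool demo_needs_split(size_t used, size_t insert_bytes)
{
	return used + insert_bytes > DEMO_BLOCK_BYTES;
}

/* A block joins with a sibling once it falls under the watermark. */
static bool demo_needs_join(size_t used, size_t watermark)
{
	return used < watermark;
}

/* Each half of an evenly split block holds about half of the bytes. */
static size_t demo_half_after_split(size_t used)
{
	return used / 2;
}
```

A full block that splits leaves halves near 50% utilization.  With the
old watermark a single deletion from a half triggers a join, and the
next insertion can split again.  With the quarter watermark the half
has to lose roughly half of its contents before a join is considered.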

While shifting the implementation we choose not to add support for some
features that no longer make sense.  There are no longer callers of
_before and _after, and having synthetic tests use small btree blocks
no longer makes sense when we can easily create very tall trees.  Both
those btree interfaces and the tiny btree block support will be removed.

Signed-off-by: Zach Brown <zab@versity.com>
Zach Brown
2020-04-29 14:50:11 -07:00
committed by Zach Brown
parent f59336085d
commit efd9763355
2 changed files with 693 additions and 425 deletions

File diff suppressed because it is too large


@@ -196,17 +196,10 @@ struct scoutfs_avl_node {
 } __packed;
 /* when we split we want to have multiple items on each side */
-#define SCOUTFS_BTREE_MAX_VAL_LEN (SCOUTFS_BLOCK_SIZE / 8)
+#define SCOUTFS_BTREE_MAX_VAL_LEN 512
-/*
- * The min number of free bytes we must leave in a parent as we descend
- * to modify. This guarantees enough free bytes in a parent to insert a
- * new child reference item as a child block splits.
- */
-#define SCOUTFS_BTREE_PARENT_MIN_FREE_BYTES \
-	(sizeof(struct scoutfs_btree_item_header) + \
-	 sizeof(struct scoutfs_btree_item) + \
-	 sizeof(struct scoutfs_btree_ref))
+/* each value ends with an offset which lets compaction iterate over values */
+#define SCOUTFS_BTREE_VAL_OWNER_BYTES sizeof(__le16)
 /*
  * When debugging we can tune the splitting and merging thresholds to
@@ -236,24 +229,37 @@ struct scoutfs_btree_root {
 	__u8 height;
 } __packed;
-struct scoutfs_btree_item_header {
-	__le32 off;
-} __packed;
 struct scoutfs_btree_item {
+	struct scoutfs_avl_node node;
 	struct scoutfs_key key;
+	__le16 val_off;
 	__le16 val_len;
-	__u8 val[0];
 } __packed;
 struct scoutfs_btree_block {
 	struct scoutfs_block_header hdr;
-	__le32 free_end;
-	__le32 nr_items;
+	struct scoutfs_avl_root item_root;
+	__le16 nr_items;
+	__le16 total_item_bytes;
+	__le16 mid_free_len;
+	__le16 last_free_off;
+	__le16 last_free_len;
 	__u8 level;
-	struct scoutfs_btree_item_header item_hdrs[0];
+	struct scoutfs_btree_item items[0];
+	/* leaf blocks have a fixed size item offset hash table at the end */
 } __packed;
+/*
+ * Try to aim for a 75% load in a leaf full of items with no value.
+ * We'll almost never see this because most items have values and most
+ * blocks aren't full.
+ */
+#define SCOUTFS_BTREE_LEAF_ITEM_HASH_NR \
+	((SCOUTFS_BLOCK_SIZE - sizeof(struct scoutfs_btree_block)) / \
+	 (sizeof(struct scoutfs_btree_item) + (sizeof(__le16))) * 100 / 75)
+#define SCOUTFS_BTREE_LEAF_ITEM_HASH_BYTES \
+	(SCOUTFS_BTREE_LEAF_ITEM_HASH_NR * sizeof(__le16))
 struct scoutfs_mounted_client_btree_val {
 	__u8 flags;
 } __packed;
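
To make the hash table sizing arithmetic concrete, here is the same
calculation with made-up sizes.  The block, header, and item sizes
below are assumptions chosen for the example; the real values depend on
struct definitions that this diff does not show in full.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_BLOCK_SIZE	65536u	/* assumed block size */
#define DEMO_BLOCK_HDR	64u	/* assumed sizeof(btree block struct) */
#define DEMO_ITEM_BYTES	48u	/* assumed sizeof(btree item struct) */
#define DEMO_SLOT_BYTES	2u	/* sizeof(__le16) hash table slot */

/* most items with no value that could share a block with their slots */
#define DEMO_MAX_ITEMS \
	((DEMO_BLOCK_SIZE - DEMO_BLOCK_HDR) / \
	 (DEMO_ITEM_BYTES + DEMO_SLOT_BYTES))

/* scale the slot count up so that a full leaf sits at a 75% load */
#define DEMO_HASH_NR	(DEMO_MAX_ITEMS * 100 / 75)
#define DEMO_HASH_BYTES	(DEMO_HASH_NR * DEMO_SLOT_BYTES)

/* load factor of the table, in percent, for a given item count */
static unsigned int demo_load_percent(unsigned int nr_items)
{
	return nr_items * 100 / DEMO_HASH_NR;
}
```

With these assumed sizes the leaf holds at most 1309 items, the table
has 1745 slots, and it takes 3490 bytes at the end of the block.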