Compare commits


6 Commits

Author SHA1 Message Date
Zach Brown
9e3529060e v1.15 Release
Finish the release notes for the 1.15 release.

Signed-off-by: Zach Brown <zab@versity.com>
2023-07-17 12:07:13 -07:00
Zach Brown
1672b3ecec Merge pull request #130 from versity/zab/noncontig_alloc_einval
Fix partial preallocation when _contig_only = 0
2023-07-17 10:21:18 -07:00
Zach Brown
55f9435fad Fix partial preallocation when _contig_only = 0
Data preallocation attempts to allocate large aligned regions of
extents.  It tried to fill the hole around a write offset that
didn't contain an extent, but missed the case where there can be
multiple extents between the start of the region and the hole.
It could try to overwrite these additional existing extents, and
writes could then return EINVAL.

We fix this by trimming the preallocation to start at the write offset
if there are any extents in the region before the write offset.  The
data preallocation test output has to be updated now that allocation
extents won't grow towards the start of the region when there are
existing extents.

Signed-off-by: Zach Brown <zab@versity.com>
2023-07-17 09:36:09 -07:00
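
To make the trimming rule concrete, here is a standalone userspace
sketch of the fixed logic. It is not scoutfs code: struct extent and
trim_prealloc are hypothetical names, and it models only the "extent
before the write offset" trim, not the later trim by a next extent
after the write.

#include <stdio.h>

typedef unsigned long long u64;

/* hypothetical stand-in for a found extent; len == 0 means none */
struct extent { u64 start; u64 len; };

/* start prealloc at iblock when an extent precedes it in the region */
static void trim_prealloc(u64 iblock, u64 region_blocks,
                          struct extent found, u64 *start, u64 *count)
{
        *start = iblock - (iblock % region_blocks); /* region start */
        *count = region_blocks;                     /* whole region */

        if (found.len && found.start < iblock) {
                *count -= iblock - *start;
                *start = iblock;
        }
}

int main(void)
{
        /* extent at blocks 8-9, write at block 13, 8-block regions */
        struct extent found = { 8, 2 };
        u64 start, count;

        trim_prealloc(13, 8, found, &start, &count);
        printf("prealloc [%llu, +%llu)\n", start, count); /* [13, +3) */
        return 0;
}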
Zach Brown
072f6868d3 Merge pull request #131 from versity/zab/server_merge_splice_failure
Process log merge splicing in many commits
2023-07-15 21:03:32 -07:00
Zach Brown
8a64b46a2f Process log merge splicing in many commits
Log merge completions were spliced in one server commit.  It's possible
to get enough completion work pending that it all can't be completed in
one server commit.  Operations fail with ENOSPC, and because these
changes can't be unwound cleanly, the server asserts.

This allows the completion splicing to break the work up into multiple
commits.

Processing completions in multiple commits means that request creation
can observe the merge status in states that weren't possible before.
Splicing is careful to maintain an elevated nr_complete count so that
clients can't get requests while the tree is rebalancing.

Signed-off-by: Zach Brown <zab@versity.com>
2023-07-14 13:28:29 -07:00
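
The shape of the change is a pattern worth spelling out. Below is a
minimal standalone simulation, not scoutfs code (splice_some and BUDGET
are hypothetical stand-ins): do work until a per-commit budget runs
out, return -EINPROGRESS, and let the caller apply the commit and call
back in.

#include <stdio.h>
#include <errno.h>

#define BUDGET 3 /* stand-in for the allocator low-water check */

static int splice_some(int *next, int total)
{
        int used = 0;

        while (*next < total) {
                if (used == BUDGET)
                        return -EINPROGRESS; /* commit, then call back in */
                printf("spliced completion %d\n", (*next)++);
                used++;
        }
        return 0; /* all completions processed */
}

int main(void)
{
        int next = 0;
        int ret;

        while ((ret = splice_some(&next, 8)) == -EINPROGRESS)
                printf("-- commit applied, re-entering --\n");
        return ret;
}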
Zach Brown
14901c39aa Merge pull request #129 from versity/zab/v1.14
v1.14 Release
2023-06-29 11:30:01 -07:00
5 changed files with 112 additions and 41 deletions

View File

@@ -1,6 +1,19 @@
Versity ScoutFS Release Notes
=============================
---
v1.15
\
*Jul 17, 2023*
Process log btree merge splicing in multiple commits. This prevents a
rare case where pending log merge completions contain more work than can
be done in a single server commit, causing the server to trigger an
assert shortly after starting.
Fix spurious EINVAL from data writes when data\_prealloc\_contig\_only was
set to 0.
---
v1.14
\

View File

@@ -458,11 +458,9 @@ static int alloc_block(struct super_block *sb, struct inode *inode,
/*
* Preallocation within aligned regions tries to
* allocate an extent to fill the hole in the region
- * that contains iblock. We search for a next extent
- * from the start of the region. If it's at the start
- * we might have to search again to find an existing
- * extent at the end of the region. (This next could be
- * given to us by the caller).
+ * that contains iblock. We'd have to add a bit of plumbing
+ * to find previous extents so we only search for a next
+ * extent from the front of the region and from iblock.
*/
div64_u64_rem(iblock, opts.data_prealloc_blocks, &rem);
start = iblock - rem;
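As a worked example: with data_prealloc_blocks = 8 and a write at
iblock 13, div64_u64_rem() leaves rem = 5, so start = 13 - 5 = 8, the
first block of the aligned region containing the write.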
@@ -473,15 +471,15 @@ static int alloc_block(struct super_block *sb, struct inode *inode,
/* trim count if there's an extent in the region before iblock */
if (found.len && found.start < iblock) {
- count -= (found.start + found.len) - start;
- start = found.start + found.len;
+ count -= iblock - start;
+ start = iblock;
/* see if there's also an extent after iblock */
ret = scoutfs_ext_next(sb, &data_ext_ops, &args, iblock, 1, &found);
if (ret < 0 && ret != -ENOENT)
goto out;
}
- /* trim count by a next extent in the region */
+ /* trim count by next extent after iblock */
if (found.len && found.start > start && found.start < start + count)
count = (found.start - start);
}
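As a concrete illustration of the fix: in the 8-block region starting
at block 8, suppose extents exist at blocks 8-9 and 11-12 and a write
lands at iblock 13. The old code trimmed start to 10, just past the
first extent it found, and the second search from iblock couldn't see
the extent at 11, so the preallocation overlapped it and the write
failed with EINVAL. Starting at iblock itself can never cover an
extent that sits before the write offset.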

View File

@@ -2058,6 +2058,13 @@ out:
* reset the next range key if there's still work to do. If the
* operation is complete then we tear down the input log_trees items and
* delete the status.
*
* Processing all the completions can take more than one transaction.
* We return -EINPROGRESS if we have to commit a transaction and the
caller will apply the commit and immediately call back in so we can
* perform another commit. We need to be very careful to leave the
* status in a state where requests won't be issued at the wrong time
(by forcing nr_complete to a batch while we delete them).
*/
static int splice_log_merge_completions(struct super_block *sb,
struct scoutfs_log_merge_status *stat,
@@ -2070,15 +2077,29 @@ static int splice_log_merge_completions(struct super_block *sb,
struct scoutfs_log_merge_range rng;
struct scoutfs_log_trees lt = {{{0,}}};
SCOUTFS_BTREE_ITEM_REF(iref);
bool upd_stat = true;
int einprogress = 0;
struct scoutfs_key key;
char *err_str = NULL;
u32 alloc_low;
u32 tmp;
u64 seq;
int ret;
int err;
/* mustn't rebalance fs tree parents while reqs rely on their key bounds */
if (WARN_ON_ONCE(le64_to_cpu(stat->nr_requests) > 0))
return -EIO;
/*
* Be overly conservative about how low the allocator can get
before we commit. This gives us a lot of work to do in each
commit while still leaving the allocator a pretty big floor for
the theoretically unbounded alloc list splicing to work with.
*/
scoutfs_alloc_meta_remaining(&server->alloc, &alloc_low, &tmp);
alloc_low = min(alloc_low, tmp) / 4;
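If, say, the two remaining counts come back as 4000 and 6000 metadata
blocks, alloc_low becomes min(4000, 6000) / 4 = 1000, so a pass can
burn through roughly three quarters of the smaller pool before the
scoutfs_alloc_meta_low() checks below trip and push the rest of the
work into another commit.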
/*
* Splice in all the completed subtrees at the initial parent
* blocks in the main fs_tree before rebalancing any of them.
@@ -2100,6 +2121,22 @@ static int splice_log_merge_completions(struct super_block *sb,
seq = le64_to_cpu(comp.seq);
/*
* Use having cleared the lists as an indication that
* we've already set the parents and don't need to dirty
* the btree blocks to do it all over again. This is
* safe because there is always an fs block that the
* merge dirties and frees into the meta_freed list.
*/
if (comp.meta_avail.ref.blkno == 0 && comp.meta_freed.ref.blkno == 0)
continue;
if (scoutfs_alloc_meta_low(sb, &server->alloc, alloc_low)) {
einprogress = -EINPROGRESS;
ret = 0;
goto out;
}
ret = scoutfs_btree_set_parent(sb, &server->alloc, &server->wri,
&super->fs_root, &comp.start,
&comp.root);
@@ -2134,6 +2171,14 @@ static int splice_log_merge_completions(struct super_block *sb,
}
}
/*
* Once we start rebalancing we force the number of completions
* to a batch so that requests won't be issued. Once we're done
* we clear the completion count and requests can flow again.
*/
if (le64_to_cpu(stat->nr_complete) < LOG_MERGE_SPLICE_BATCH)
stat->nr_complete = cpu_to_le64(LOG_MERGE_SPLICE_BATCH);
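Pinning nr_complete at LOG_MERGE_SPLICE_BATCH across the intermediate
commits is what keeps that promise: the request path, which per the
comment above won't issue requests while a full batch of completions
is pending, stays quiet until the count is cleared at the end of the
splice.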
/*
* Now with all the parent blocks spliced in, rebalance items
* amongst parents that needed to split/join and delete the
@@ -2155,6 +2200,12 @@ static int splice_log_merge_completions(struct super_block *sb,
seq = le64_to_cpu(comp.seq);
if (scoutfs_alloc_meta_low(sb, &server->alloc, alloc_low)) {
einprogress = -EINPROGRESS;
ret = 0;
goto out;
}
/* balance when there was a remaining key range */
if (le64_to_cpu(comp.flags) & SCOUTFS_LOG_MERGE_COMP_REMAIN) {
ret = scoutfs_btree_rebalance(sb, &server->alloc,
@@ -2194,18 +2245,11 @@ static int splice_log_merge_completions(struct super_block *sb,
}
}
/* update the status once all completes are processed */
scoutfs_key_set_zeros(&stat->next_range_key);
stat->nr_complete = 0;
/* update counts and done if there's still ranges to process */
if (!no_ranges) {
init_log_merge_key(&key, SCOUTFS_LOG_MERGE_STATUS_ZONE, 0, 0);
ret = scoutfs_btree_update(sb, &server->alloc, &server->wri,
&super->log_merge, &key,
stat, sizeof(*stat));
if (ret < 0)
err_str = "update status";
scoutfs_key_set_zeros(&stat->next_range_key);
stat->nr_complete = 0;
ret = 0;
goto out;
}
@@ -2241,6 +2285,12 @@ static int splice_log_merge_completions(struct super_block *sb,
(le64_to_cpu(lt.finalize_seq) < le64_to_cpu(stat->seq))))
continue;
if (scoutfs_alloc_meta_low(sb, &server->alloc, alloc_low)) {
einprogress = -EINPROGRESS;
ret = 0;
goto out;
}
fr.root = lt.item_root;
scoutfs_key_set_zeros(&fr.key);
fr.seq = cpu_to_le64(scoutfs_server_next_seq(sb));
@@ -2274,9 +2324,10 @@ static int splice_log_merge_completions(struct super_block *sb,
}
le64_add_cpu(&super->inode_count, le64_to_cpu(lt.inode_count_delta));
}
/* everything's done, remove the merge operation */
upd_stat = false;
init_log_merge_key(&key, SCOUTFS_LOG_MERGE_STATUS_ZONE, 0, 0);
ret = scoutfs_btree_delete(sb, &server->alloc, &server->wri,
&super->log_merge, &key);
@@ -2285,12 +2336,23 @@ static int splice_log_merge_completions(struct super_block *sb,
else
err_str = "deleting merge status item";
out:
if (upd_stat) {
init_log_merge_key(&key, SCOUTFS_LOG_MERGE_STATUS_ZONE, 0, 0);
err = scoutfs_btree_update(sb, &server->alloc, &server->wri,
&super->log_merge, &key,
stat, sizeof(struct scoutfs_log_merge_status));
if (err && !ret) {
err_str = "updating merge status item";
ret = err;
}
}
if (ret < 0)
scoutfs_err(sb, "server error %d splicing log merge completion: %s", ret, err_str);
BUG_ON(ret); /* inconsistent */
- return ret;
+ return ret ?: einprogress;
}
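(ret ?: einprogress uses the GNU ?: extension that's common in kernel
code: it evaluates to ret when ret is nonzero, so a real error takes
precedence over the -EINPROGRESS hint.)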
/*
@@ -2465,6 +2527,12 @@ static void server_log_merge_free_work(struct work_struct *work)
}
/*
* Clients regularly ask if there is log merge work to do. We process
* completions inline before responding so that we don't create large
* delays between completion processing and the next request. We don't
mind if the client get_log_merge request sees high latency; the
* blocked caller has nothing else to do.
*
* This will return ENOENT to the client if there is no work to do.
*/
static int server_get_log_merge(struct super_block *sb,
@@ -2532,14 +2600,22 @@ restart:
goto out;
}
- /* maybe splice now that we know if there's ranges */
+ /* splice if we have a batch or ran out of ranges */
no_next = ret == -ENOENT;
no_ranges = scoutfs_key_is_zeros(&stat.next_range_key) && ret == -ENOENT;
if (le64_to_cpu(stat.nr_requests) == 0 &&
(no_next || le64_to_cpu(stat.nr_complete) >= LOG_MERGE_SPLICE_BATCH)) {
ret = splice_log_merge_completions(sb, &stat, no_ranges);
- if (ret < 0)
+ if (ret == -EINPROGRESS) {
+ mutex_unlock(&server->logs_mutex);
+ ret = server_apply_commit(sb, &hold, 0);
+ if (ret < 0)
+ goto respond;
+ server_hold_commit(sb, &hold);
+ mutex_lock(&server->logs_mutex);
+ } else if (ret < 0) {
goto out;
+ }
/* splicing resets key and adds ranges, could finish status */
goto restart;
}
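When splicing returns -EINPROGRESS here, logs_mutex is dropped before
the commit is applied, presumably so the commit write-out isn't
serialized behind the lock, and both the commit hold and the mutex are
reacquired before the restart re-reads the merge status under the
usual protection.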
@@ -2741,6 +2817,7 @@ out:
mutex_unlock(&server->logs_mutex);
ret = server_apply_commit(sb, &hold, ret);
respond:
return scoutfs_net_response(sb, conn, cmd, id, ret, &req, sizeof(req));
}

View File

@@ -122,7 +122,6 @@ wrote blk 39
25.. 1:
26.. 6: unwritten
32.. 1:
- 33.. 6: unwritten
39.. 1:
40.. 1:
55.. 1:
@@ -143,10 +142,8 @@ wrote blk 44
25.. 1:
26.. 6: unwritten
32.. 1:
- 33.. 6: unwritten
39.. 1:
40.. 1:
- 41.. 3: unwritten
44.. 1:
45.. 3: unwritten
55.. 1:
@@ -167,10 +164,8 @@ wrote blk 48
25.. 1:
26.. 6: unwritten
32.. 1:
- 33.. 6: unwritten
39.. 1:
40.. 1:
- 41.. 3: unwritten
44.. 1:
45.. 3: unwritten
48.. 1:
@@ -193,10 +188,8 @@ wrote blk 62
25.. 1:
26.. 6: unwritten
32.. 1:
- 33.. 6: unwritten
39.. 1:
40.. 1:
- 41.. 3: unwritten
44.. 1:
45.. 3: unwritten
48.. 1:
@@ -221,10 +214,8 @@ wrote blk 67
25.. 1:
26.. 6: unwritten
32.. 1:
- 33.. 6: unwritten
39.. 1:
40.. 1:
- 41.. 3: unwritten
44.. 1:
45.. 3: unwritten
48.. 1:
@@ -252,10 +243,8 @@ wrote blk 73
25.. 1:
26.. 6: unwritten
32.. 1:
- 33.. 6: unwritten
39.. 1:
40.. 1:
- 41.. 3: unwritten
44.. 1:
45.. 3: unwritten
48.. 1:
@@ -285,10 +274,8 @@ wrote blk 86
25.. 1:
26.. 6: unwritten
32.. 1:
- 33.. 6: unwritten
39.. 1:
40.. 1:
- 41.. 3: unwritten
44.. 1:
45.. 3: unwritten
48.. 1:
@@ -304,7 +291,6 @@ wrote blk 86
73.. 1:
74.. 5: unwritten
79.. 2:
- 81.. 5: unwritten
86.. 1:
87.. 2:
95.. 1: eof
@@ -320,10 +306,8 @@ wrote blk 92
25.. 1:
26.. 6: unwritten
32.. 1:
- 33.. 6: unwritten
39.. 1:
40.. 1:
- 41.. 3: unwritten
44.. 1:
45.. 3: unwritten
48.. 1:
@@ -339,10 +323,8 @@ wrote blk 92
73.. 1:
74.. 5: unwritten
79.. 2:
- 81.. 5: unwritten
86.. 1:
87.. 2:
- 89.. 3: unwritten
92.. 1:
93.. 2: unwritten
95.. 1: eof

View File

@@ -168,7 +168,8 @@ print_extents_found $prefix
# the start, and one at the end.
#
# Let's keep this last because it creates a ton of output to read
- # through.
+ # through. The correct output is tied to preallocation strategy so it
+ # has to be verified each time we change preallocation.
#
echo "== block writes into region allocs hole"
t_set_sysfs_mount_option 0 data_prealloc_blocks 8
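With the fixed strategy, preallocation for a write into a hole starts
at the written block, so the expected output above no longer contains
the unwritten runs (the removed "33.. 6: unwritten" style lines) that
used to grow back toward the start of each 8-block region.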