scoutfs

Mirror of https://github.com/versity/scoutfs.git, synced 2026-01-05 11:45:09 +00:00

Author	SHA1	Message	Date
Bryant G. Duffy-Ly	9ba4271c26	Add new max format version of 2 We're about to add new format structures so increment the max version to 2. Future commits will add the features before we release version 2 in the wild. Signed-off-by: Zach Brown <zab@zabbo.net>	2024-06-28 14:53:49 -07:00
Bryant G. Duffy-Ly	90cfaf17d1	Initial support for different inode sizes We're about to increase the inode size and increment the format version. Inode reading and writing has to handle different valid inode sizes as allowed by the format version. This is the initial skeletal work that later patches which really increase the inode size will further refine to add the specific known sizes and format versions. Signed-off-by: Bryant G. Duffy-Ly <bduffyly@versity.com> [zab@versity.com: reworded description, reworked to use _within] Signed-off-by: Zach Brown <zab@versity.com>	2024-06-28 14:53:49 -07:00
Zach Brown	6931cb7b0e	Add scoutfs_inode_[gs]et_flags Add functions for getting and setting our private inode flags. Signed-off-by: Zach Brown <zab@versity.com>	2024-06-28 14:53:49 -07:00
Zach Brown	7d4db05445	Add scoutfs_item_lookup_smaller_zero Add a lookup variant that returns an error if the item value is larger than the caller's value buffer size and which zeros the rest of the caller's buffer if the returned value is smaller. Signed-off-by: Zach Brown <zab@versity.com>	2024-06-28 14:53:49 -07:00
Zach Brown	7b71250072	Merge pull request #176 from versity/zab/accumulated_fixes Zab/accumulated fixes	2024-06-26 13:21:50 -07:00
Zach Brown	8e37be279c	Use seqlock to protect inode fields We were using a seqcount to protect high-frequency reads and writes to some of our private inode fields. The writers were serialized by the caller but that's a bit too easy to get wrong. We're already storing the write seqcount update so the additional internal spinlock stores in seqlocks aren't a significant additional overhead. The seqlocks also handle preemption for us. Signed-off-by: Zach Brown <zab@versity.com>	2024-06-25 15:11:20 -07:00
Zach Brown	d6642da44d	Prevent downgrade of format version Don't let change-format-version decrease the format version. It doesn't have the machinery to go back and migrate newer structures to older structures that would be compatible with code expecting the older version. Signed-off-by: Bryant G. Duffy-Ly <bduffyly@versity.com> [zab@versity.com: split from initial patch with other changes] Signed-off-by: Zach Brown <zab@versity.com>	2024-06-25 15:11:20 -07:00
Zach Brown	4b87045447	Pre-declare scoutfs_lock in forest.h Definitions in forest.h use lock pointers. Pre-declare the struct so it doesn't break inclusion without lock.h, following current practice in the header. Signed-off-by: Zach Brown <zab@versity.com>	2024-06-25 15:11:20 -07:00
Zach Brown	3f773a8594	Fix uninit written in scoutfs_file_write_iter scoutfs_file_write_iter tried to track written bytes and return those unless there was an error. But written was uninitialized if we got errors in any of the calls leading up to performing the write. The bytes written were also not being passed to the generic_write_sync helper. This fixes up all those inconsistencies and makes it look like the write_iter path in other filesystems. Signed-off-by: Zach Brown <zab@versity.com>	2024-06-25 15:11:20 -07:00
Zach Brown	c385eea9a1	Check for all offline in scoutfs_file_write_iter When we write to file contents we change the data_version. To stage old contents into an offline region the data_version of the file must match the archived copy. When writing we have to make sure that there is no offline data so that we don't increase the data_version which will prevent staging of any other file regions because the data_versions no longer match. scoutfs_file_write_iter was only checking for offline data in its write region, not the entire file. Fix it to match the _aio_write method and check the whole file. Signed-off-by: Zach Brown <zab@versity.com>	2024-06-25 15:11:20 -07:00
Zach Brown	c296bc1959	Remove scoutfs_data_wait_check_iter scoutfs_data_wait_check_iter() was checking the contiguous region of the file starting at its pos and extending for iov_iter_count() bytes. The caller can do that with the previous _data_wait_check() method by providing the same count that _check_iter() was using. Signed-off-by: Zach Brown <zab@versity.com>	2024-06-25 15:11:20 -07:00
Zach Brown	3052feac29	Have item cache show unprotected lock The item cache has a few safety checks that make sure that an operation is performed while holding a lock that covers the item. It dumped a stack trace via WARN when that wasn't true, but it didn't include any details about the keys or lock modes involved. This adds a message that's printed once which includes the keys and modes when an operation is attempted that isn't protected. Signed-off-by: Zach Brown <zab@versity.com>	2024-06-25 15:11:20 -07:00
Zach Brown	1fa0d7727c	scoutfs_item_create checks wrong lock mode scoutfs_item_create() was checking that its lock had a read mode, when it should have been checking for a write mode. This worked out because callers with write mode locks are also protecting reads. Signed-off-by: Zach Brown <zab@versity.com>	2024-06-25 15:11:20 -07:00
Zach Brown	2af6f47c8b	Fix bad error exit path in unlink Unlink looks up the entry items for the name it is removing because we no longer store the extra key material in dentries. If this lookup fails it takes an error path which releases a transaction that wasn't held. Thankfully this error path is unlikely (corruption or systemic errors like EIO or ENOMEM) so we haven't hit this in practice. Signed-off-by: Zach Brown <zab@versity.com>	2024-06-25 15:11:20 -07:00
Zach Brown	6db69b7a4f	Set root inode crtime in mkfs When we added the crtime creation timestamp to the inode we forgot to update mkfs to set the crtime of the root inode. Signed-off-by: Zach Brown <zab@versity.com>	2024-06-25 15:11:18 -07:00
Zach Brown	8ca1f1994d	Merge pull request #174 from versity/zab/trace_block_estale Add tracepoint as block read returns ESTALE	2024-06-11 09:42:45 -07:00
Zach Brown	48716461e4	Add tracepoint as block read returns ESTALE Block reads can return ESTALE naturally as mounts read through old cached blocks. We won't always log it as an error but we should add a tracepoint that can be inspected. Signed-off-by: Zach Brown <zab@versity.com>	2024-06-10 11:03:38 -07:00
Zach Brown	965b692bdc	Merge pull request #171 from versity/zab/v1.20 v1.20 Release	2024-04-22 14:51:39 -07:00
Zach Brown	c3c4b08038	v1.20 Release Finish the release notes for the 1.20 release. Signed-off-by: Zach Brown <zab@versity.com> v1.20	2024-04-22 13:20:42 -07:00
Zach Brown	0519830229	Merge pull request #165 from versity/greg/kmod-uninstall-cleanup More cleanly drive weak-modules on install/uninstall	2024-04-11 14:32:06 -07:00
Greg Cymbalski	4d6e1a14ae	More safely install/uninstall with weak-modules This addresses some minor issues with how we drive the weak-modules infrastructure for running on kernels the module was not explicitly built for. For one, we now drive weak-modules at install-time more explicitly (it was adding symlinks for all modules into the right place for the running kernel, whereas now it only handles that for scoutfs against all installed kernels). Also we no longer leave stale modules on the filesystem after an uninstall/upgrade, similar to what's done for vsm's kmods right now. RPM's pre/postinstall scriptlets are used to drive weak-modules to clean things up. Note that this (intentionally) does not (re)generate initrds of any kind. Finally, this was tested on both the native kernel version and on updates that would need the migrated modules. As a result, installs are a little quicker, the module still gets migrated successfully, and uninstalls correctly remove (only) the packaged module.	2024-04-11 13:20:50 -07:00
Greg Cymbalski	fc3e061ea8	Merge pull request #164 from versity/greg/preserve-git-describe Encode git info into spec to keep git info in final kmod	2024-03-29 13:48:33 -07:00
Greg Cymbalski	a4bc3fb27d	Capture git info at spec creation time, pass into make	2024-02-05 15:44:10 -08:00
Zach Brown	67990a7007	Merge pull request #162 from versity/zab/v1.19 v1.19 Release	2024-01-30 15:46:49 -08:00
Zach Brown	ba819be8f9	v1.19 Release Finish the release notes for the 1.19 release. Signed-off-by: Zach Brown <zab@versity.com> v1.19	2024-01-30 12:11:23 -08:00
Zach Brown	1b103184ca	Merge pull request #161 from versity/zab/merge_timeout_option_fix Correctly set the log_merge_wait_timeout_ms option	2024-01-30 12:07:10 -08:00
Zach Brown	c3890abd7b	Correctly set the log_merge_wait_timeout_ms option The initial code for setting the timeout used the wrong parsed variable. Signed-off-by: Zach Brown <zab@versity.com>	2024-01-30 12:01:35 -08:00
Zach Brown	5ab38bfa48	Merge pull request #160 from versity/zab/log_merging_speedups Zab/log merging speedups	2024-01-29 12:26:55 -08:00
Zach Brown	e9ad61b444	Delete multiple log trees items per server commit server_log_merge_free_work() is responsible for freeing all the input log trees for a log merge operation that has finished. It looks for the next item to free, frees the log btree it references, and then deletes the item. It was doing this with a full server commit for each item which can take an agonizingly long time. This changes it to perform multiple deletions in a commit as long as there's plenty of alloc space. The moment the commit's alloc space gets low it applies the commit and opens a new one. This sped up the deletion of a few hundred thousand log tree items from taking hours to seconds. Signed-off-by: Zach Brown <zab@versity.com>	2024-01-25 11:30:17 -08:00
Zach Brown	91bbf90f71	Don't pin input btrees when merging The btree_merge code was pinning leaf blocks for all input btrees as it iterated over them. This doesn't work when there are a very large number of input btrees. It can run out of memory trying to hold a reference to a 64KiB leaf block for each input root. This reworks the btree merging code. It reads a window of blocks from all input trees to get a set of merged items. It can take multiple passes to complete the merge but by setting the merge window large enough this overhead is reduced. Merging now consumes a fixed amount of memory rather than using memory proportional to the number of input btrees. Signed-off-by: Zach Brown <zab@versity.com>	2024-01-25 11:30:17 -08:00
Zach Brown	b5630f540d	Add tracing of the log merge finalizing decision Signed-off-by: Zach Brown <zab@versity.com>	2024-01-25 11:30:17 -08:00
Zach Brown	90a4c82363	Make log merge wait timeout tunable Add a mount option for the amount of time that log merge creation can wait before giving up. We add some counters so we can see how often the timeout is being hit and what the average successful wait time is. Signed-off-by: Zach Brown <zab@versity.com>	2024-01-25 11:25:56 -08:00
Zach Brown	f654fa0fda	Send syncs once when starting to merge The server sends sync requests to clients when it sees that they have open log trees that need to be committed for log merging to proceed. These are currently sent in the context of each client's get_log_trees request, resulting in sync requests queued for one client from all clients. Depending on message delivery and commit latencies, this can create a sync storm. The server's sends are reliable and the open commits are marked with the seq when they opened. It's easy for us to record having sent syncs to all open commits so that future attempts can be avoided. Later open commits will have higher seqs and will get a new round of syncs sent. Signed-off-by: Zach Brown <zab@versity.com>	2024-01-25 11:25:20 -08:00
Zach Brown	50168a2d2a	Check each client's last log item for stable seq The server was checking all client log_trees items to search for the lowest commit seq that was still open. This can be expensive when there are a lot of finalized log_trees items that won't have open seqs. Only the last log_trees item for each client rid can be open, and the items are sorted by rid and nr, so we can easily only check the last item for each client rid. Signed-off-by: Zach Brown <zab@versity.com>	2024-01-25 11:24:50 -08:00
Zach Brown	3c0616524a	Only search last log_trees per rid for finalizing During get_log_trees the server checks log_trees items to see if it should start a log merge operation. It did this by iterating over all log_trees items and there can be quite a lot of them. It doesn't need to see all of the items. It only needs to see the most recent log_trees item for each mount. That's enough to make the decisions that start the log merging process. Signed-off-by: Zach Brown <zab@versity.com>	2024-01-25 11:23:59 -08:00
Zach Brown	8d3e6883c6	Merge pull request #159 from versity/auke/trans_hold Fix ret output for scoutfs_trans_hold trace pt.	2024-01-09 09:23:32 -08:00
Auke Kok	8747dae61c	Fix ret output for scoutfs_trans_hold trace pt. Signed-off-by: Auke Kok <auke.kok@versity.com>	2024-01-08 16:27:41 -08:00
Zach Brown	fffcf4a9bb	Merge pull request #158 from versity/zab/kasan_stack_oob_get_reg Ignore spurious KASAN unwind warning	2023-11-22 10:04:18 -08:00
Zach Brown	b552406427	Ignore spurious KASAN unwind warning KASAN could raise a spurious warning if the unwinder started in code without ORC metadata and tried to access the KASAN stack frame redzones. This was fixed upstream but we can rarely see it in older kernels. We can ignore these messages. Signed-off-by: Zach Brown <zab@versity.com>	2023-11-21 12:25:16 -08:00
Zach Brown	d812599e6b	Merge pull request #157 from versity/zab/dmsetup_test_devices Zab/dmsetup test devices	2023-11-21 10:13:02 -08:00
Zach Brown	03ab5cedb6	clean up createmany-parallel-mounts test This test is trying to make sure that concurrent work isn't much, much slower than individual work. It does this by timing creating a bunch of files in a dir on a mount and then timing doing the same in two mounts concurrently. But it messed up the concurrency pretty badly. It had the concurrent createmany tasks creating files with a full path. That means that every create is trying to read all the parent directories. The way inode number allocation works means that one of the mounts is likely to be getting a write lock that includes a shared parent. This created a ton of cluster lock contention between the two tasks. Then it didn't sync the creates between phases. It could be accidentally recording the time it took to write out the dirty single creates as time taken during the parallel creates. By syncing between phases and having the createmany tasks create files relative to their per-mount directories we actually perform concurrent work and test that we're not creating contention outside of the task load. This became a problem as we switched from loopback devices to device mapper devices. The loopback writers were using buffered writes so we were masking the IO cost of constantly invalidating and refilling the item cache by turning the reads into memory copies out of the page cache. While we're in here we actually clean up the created files and then use t_fail to fail the test while the files still exist so they can be examined. Signed-off-by: Zach Brown <zab@versity.com>	2023-11-15 15:12:57 -08:00
Zach Brown	2b94cd6468	Add loop module kernel message filter Now that we're not setting up per-mount loopback devices we may not have the loop module loaded until tests are running. Signed-off-by: Zach Brown <zab@versity.com>	2023-11-15 13:39:38 -08:00
Zach Brown	5507ee5351	Use device-mapper for per-mount test devices We don't directly mount the underlying devices for each mount because the kernel notices multiple mounts and doesn't set up a new super block for each. Previously the script used loopback devices to create the local shared block construct 'cause it was easy. This introduced corruption of blocks that saw concurrent read and write IOs. The buffered kernel file IO paths that loopback eventually degrades into by default (via splice) could have buffered readers copying out of pages without the page lock while writers modified the page. This manifested as occasional crc failure of blocks that we knowingly issue concurrent reads and writes to from multiple mounts (the quorum and super blocks). This changes the script to use device-mapper linear passthrough devices. Their IOs don't hit a caching layer and don't provide an opportunity to corrupt blocks. Signed-off-by: Zach Brown <zab@versity.com>	2023-11-15 13:39:38 -08:00
Zach Brown	1600a121d9	Merge pull request #156 from versity/zab/large_fragmented_free_hung_task Extend hung task timeout for large-fragmented-free	2023-11-15 09:49:13 -08:00
Zach Brown	6daf24ff37	Extend hung task timeout for large-fragmented-free Our large fragmented free test creates pathologically fragmented file extents which are as expensive as possible to free. We know that debugging kernels can take a long time to do this so we can extend the hung task timeout. Signed-off-by: Zach Brown <zab@versity.com>	2023-11-14 15:01:37 -08:00
Zach Brown	cd5d9ff3e0	Merge pull request #154 from versity/zab/srch_test_fixes Zab/srch test fixes	2023-11-13 09:47:46 -08:00
Zach Brown	d94e49eb63	Fix quoted glob in srch-basic-functionality One of the phases of this test wanted to delete files but got the glob quoting wrong. This didn't matter for the original test but when we changed the test to use its own xattr name then those existing undeleted files got confused with other files in later phases of the test. This changes the test to delete the files with a more reliable find pattern instead of using shell glob expansion. Signed-off-by: Zach Brown <zab@versity.com>	2023-11-09 14:16:36 -08:00
Zach Brown	1dbe408539	Add tracing of srch compact struct communication Signed-off-by: Zach Brown <zab@versity.com>	2023-11-09 14:16:33 -08:00
Zach Brown	bf21699ad7	bulk_create_paths test tool takes xattr name Previously the bulk_create_paths test tool used the same xattr name for each category of xattrs it was creating. This created a problem where two tests got their xattrs confused with each other. The first test created a bunch of srch xattrs, failed, and didn't clean up after itself. The second test saw these search xattrs as its own and got very confused when there were far more srch xattrs than it thought it had created. This lets each test specify the srch xattr names that are created by bulk_create_paths so that tests can work with their xattrs independent of each other. Signed-off-by: Zach Brown <zab@versity.com>	2023-11-09 14:15:44 -08:00
Zach Brown	c7c67a173d	Specifically wait for compaction in srch test We just added a test to try and get srch compaction stuck by having an input file continue at a specific offset. To exercise the bug the test needs to perform 6 compactions. It needs to merge 4 sets of logs into 4 sorted files, it needs to make partial progress merging those 4 sorted files into another file, and then finally attempt to continue compacting from the partial progress offset. The first version of the test didn't necessarily ensure that these compactions happened. It created far too many log files then just waited for time to pass. If the host was slow then the mounts may not make it through the initial logs to try and compact the sorted files. The triggers wouldn't fire and the test would fail. These changes much more carefully orchestrate and watch the various steps of compaction to make sure that we trigger the bug. Signed-off-by: Zach Brown <zab@versity.com>	2023-11-09 14:13:13 -08:00
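Commit 7d4db05445 states the buffer rule for scoutfs_item_lookup_smaller_zero() but gives no code. The following userspace sketch illustrates that rule. It models only the copy into the caller's buffer, not the item lookup or locking. The helper name copy_val_smaller_zero() is hypothetical. The -EINVAL error and the 0 success return are assumptions, because the commit message only says that an error is returned.

```c
#include <errno.h>
#include <string.h>

/*
 * Hypothetical helper (not part of scoutfs) modeling the buffer rule of
 * scoutfs_item_lookup_smaller_zero() from commit 7d4db05445.
 * Assumed: -EINVAL for an oversized value, 0 on success.
 */
static int copy_val_smaller_zero(void *buf, size_t buf_len,
				 const void *val, size_t val_len)
{
	/* A stored value larger than the caller's buffer is an error. */
	if (val_len > buf_len)
		return -EINVAL;

	/* Copy the stored value into the front of the caller's buffer. */
	memcpy(buf, val, val_len);

	/* Zero whatever part of the caller's buffer the value didn't fill. */
	memset((char *)buf + val_len, 0, buf_len - val_len);
	return 0;
}
```

Take an 8-byte buffer as an example. A 3-byte value fills the first three bytes and the remaining five are zeroed. If the same 3-byte value is passed with a 2-byte buffer length, it is rejected.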

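Commit e9ad61b444 describes a batching change. Many log_trees item deletions now go into one server commit, and the commit is applied only when its alloc space gets low. The toy model below sketches that loop. The struct, the fixed per-deletion cost, the low-space threshold, and all function names are hypothetical stand-ins. They are not server_log_merge_free_work() or scoutfs's real allocator API.

```c
/* Hypothetical model of one open server commit's allocator budget. */
struct commit {
	int alloc_avail;	/* blocks left in this commit's allocator */
	int nr_commits;		/* commits applied so far */
};

#define COMMIT_ALLOC	1024	/* assumed alloc space in a fresh commit */
#define LOW_ALLOC	16	/* assumed "getting low" threshold */
#define DELETE_COST	4	/* assumed cost to free one log btree + item */

static void open_commit(struct commit *c)
{
	c->alloc_avail = COMMIT_ALLOC;
}

static void apply_commit(struct commit *c)
{
	c->nr_commits++;
}

/* Free nr_items log_trees items, batching deletions into few commits. */
static int free_log_trees_batched(struct commit *c, int nr_items)
{
	c->nr_commits = 0;
	open_commit(c);

	for (int i = 0; i < nr_items; i++) {
		/* Apply this commit and open a new one once alloc space is low. */
		if (c->alloc_avail < LOW_ALLOC) {
			apply_commit(c);
			open_commit(c);
		}
		/* Free the referenced log btree and delete its item. */
		c->alloc_avail -= DELETE_COST;
	}

	/* Apply the final commit holding the remaining deletions. */
	apply_commit(c);
	return c->nr_commits;
}
```

With these made-up numbers, freeing 1000 items takes 4 commits rather than the 1000 that one commit per item would need. Real savings depend on the actual allocator costs.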

1881 Commits