Add support for writable MAP_SHARED mmap()ings. Avoid issues with late
writepage()s building transactions by doing the block_write_begin() work in
scoutfs_data_page_mkwrite(). Ensure the page is marked dirty and prepared
for write, then let the VM complete the write when the page is flushed or
invalidated.
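The handler roughly follows the generic block_page_mkwrite() pattern. The
following is only a sketch of that shape, not the literal scoutfs code; the
scoutfs_get_block() and transaction helpers are assumptions, and the
page_mkwrite signature differs across kernel versions:

    #include <linux/buffer_head.h>
    #include <linux/fs.h>
    #include <linux/mm.h>
    #include <linux/pagemap.h>

    static int sketch_data_page_mkwrite(struct vm_area_struct *vma,
                                        struct vm_fault *vmf)
    {
            struct page *page = vmf->page;
            struct inode *inode = file_inode(vma->vm_file);
            loff_t size;
            unsigned int len;
            int ret;

            /* hypothetical helper: hold a transaction before dirtying */
            ret = scoutfs_hold_trans(inode->i_sb);
            if (ret)
                    return VM_FAULT_SIGBUS;

            lock_page(page);
            size = i_size_read(inode);
            if (page->mapping != inode->i_mapping ||
                page_offset(page) >= size) {
                    unlock_page(page);
                    ret = VM_FAULT_NOPAGE;
                    goto out;
            }

            if (page->index == size >> PAGE_SHIFT)
                    len = size & ~PAGE_MASK;
            else
                    len = PAGE_SIZE;

            /* map and allocate blocks now so writepage() has nothing to do */
            ret = __block_write_begin(page, 0, len, scoutfs_get_block);
            if (!ret)
                    ret = block_commit_write(page, 0, len);
            if (ret) {
                    unlock_page(page);
                    ret = VM_FAULT_SIGBUS;
                    goto out;
            }

            set_page_dirty(page);
            ret = VM_FAULT_LOCKED;  /* page stays locked for the caller */
    out:
            scoutfs_release_trans(inode->i_sb);  /* hypothetical helper */
            return ret;
    }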
Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
The lock server maintains some items in btrees in the server. It is
usually called by the server core during a commit so it doesn't need to
worry about managing commits. But the lock recovery timeout code
happens in its own async context. It needs to protect the lock_client
item removals with a commit.
This was causing failures during xfstests that simulate node crashes by
unmounting with dm-flakey. Lock recovery would dirty blocks in the
btree writer outside of a commit. The first server commit holder would
find dirty blocks and throw an assertion indicating that someone
modified blocks without holding a commit.
Signed-off-by: Zach Brown <zab@versity.com>
The calls for holding and applying commits in the server are currently
private. The lock server is a server component that has been separated
out into its own file. Most of the time the server calls it during
commits so the btree changes made in the lock server are protected by
the commits. But there are btree calls in the lock server that happen
outside of calls from the server.
Exporting these calls will let the lock server make all its btree
changes in server commits.
Signed-off-by: Zach Brown <zab@versity.com>
Add support for reporting errors to data waiters via a new
SCOUTFS_IOC_DATA_WAIT_ERR ioctl. This allows waiters to return an error
to readers when staging fails.
Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
[zab: renamed to data_wait_err, took ino arg]
Signed-off-by: Zach Brown <zab@versity.com>
Item writes are first stored in dirty blocks in the private version of
the mount's log tree. Local readers need to check the dirty version of
the mount's log tree to make sure that they see the result of writes.
Usually trees are found by walking the log tree items stored in
another btree in the super. The private dirty version of a mount's log
tree hasn't been committed yet and isn't visible in these items.
The forest uses its lock private data to track which lock has seen items
written and so should always check the local dirty log tree when
reading. The intent was to use the per-lock static forest_root for the
log tree to record that it had been marked by a write and was then
always used for reads.
We stored the forest info's rid and tested for a non-zero forest_root
rid as the mechanism for always checking the dirty log root during
reads. But we weren't setting the forest info rid as each transaction
opened. It was always 0, so readers never added the dirty log tree for
reading.
The fix is to use the more reliable indication that the log root has
items for us: testing the flag that records that all the bits have been
set. We also now always set the rid/nr of the forest_info record of
our log tree, and the per-lock forest_root copy of it, whenever we use
it.
This fixed spurious errors we were seeing when creates tried to read
the item they had just written while memory reclaim was freeing locks.
Signed-off-by: Zach Brown <zab@versity.com>
File data allocations come from radix allocators which are populated by
the server before each client transaction. It's possible to fully
consume the data allocator within one transaction if the number of dirty
metadata blocks is kept low. This could result in premature ENOSPC.
This was happening to the archive-light-cycle test. If the transactions
performed by previous tests lined up just right then the creation of the
initial test files could see ENOSPC and cause all sorts of nonsense in
the rest of the test, culminating in cmp commands stuck in offline
waits.
This introduces high and low data allocator water marks for
transactions. The server tries to fill data allocators for each
transaction to the high water mark and the client forces the commit of a
transaction if its data allocator falls below the low water mark.
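The client side check looks roughly like the sketch below; the names,
thresholds, and the way free blocks are counted are illustrative, not the
real values:

    #include <linux/fs.h>
    #include <linux/types.h>

    /* illustrative thresholds, not the real marks */
    #define DATA_ALLOC_HIGH_WATER   (64 * 1024)     /* blocks */
    #define DATA_ALLOC_LOW_WATER    (8 * 1024)

    struct data_alloc_sketch {
            u64 free_blocks;        /* blocks left in this trans's allocator */
    };

    /* called from the write path while a transaction is held */
    static int maybe_force_commit(struct super_block *sb,
                                  struct data_alloc_sketch *da)
    {
            if (da->free_blocks >= DATA_ALLOC_LOW_WATER)
                    return 0;

            /*
             * hypothetical helper: commit this transaction and open a new
             * one, which the server refills to DATA_ALLOC_HIGH_WATER
             */
            return commit_and_reopen_trans(sb);
    }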
The archive-light-cycle test now passes easily and we see the
trans_commit_data_alloc_low counter increasing during the test.
Signed-off-by: Zach Brown <zab@versity.com>
The identifier for data.h's include guard was brought over from an old
file and still had the old name. Update it to reflect its use in data,
not filerw.
Signed-off-by: Zach Brown <zab@versity.com>
When we added the kernelcompat layer around the old and new readdir
interfaces there was some confusion in the old readdir interface filldir
arguments. We were passing in our scoutfs dent item struct pointer
instead of the filldir callback buf pointer. This prevented readdir
from working in older kernels because filldir would immediately see a
corrupt buf and return an error.
This renames the emit compat macro arguments to make them consistent
with the other calls and readdir now provides the correct pointer to the
emit wrapper.
Signed-off-by: Zach Brown <zab@versity.com>
The radix block next bit search could return a spurious -ENOENT if it
ran out of references in a parent block further down the tree. It needs
to bubble up to try the next ref in its parent so that it keeps
performing a depth-first search of the entire tree.
This led to an assertion being tripped in _radix_merge. Getting an
early -ENOENT caused it to start searching from 0 again. When it's
iterating over a read-only input it could find the same leaf and try to
clear source bits that were already cleared.
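Conceptually the search is a depth-first walk where an empty subtree isn't a
final answer. A sketch with invented types and helpers (read_ref(),
leaf_next_set_bit(), and friends are assumptions), just to show the bubbling:

    #include <linux/err.h>
    #include <linux/types.h>

    #define REFS_PER_BLOCK  256     /* illustrative */

    static int find_next_set_bit(struct radix_block *parent, int level,
                                 u64 *bit_ret)
    {
            struct radix_block *child;
            int ret;
            int i;

            if (level == 0)
                    return leaf_next_set_bit(parent, bit_ret);  /* assumed */

            for (i = 0; i < REFS_PER_BLOCK; i++) {
                    child = read_ref(parent, i);            /* assumed */
                    if (!child)
                            continue;       /* no ref, try the next one */
                    if (IS_ERR(child))
                            return PTR_ERR(child);

                    ret = find_next_set_bit(child, level - 1, bit_ret);
                    put_block(child);                       /* assumed */
                    if (ret != -ENOENT)
                            return ret;     /* found a bit or hit an error */
                    /* empty subtree: bubble up and try the next ref */
            }

            return -ENOENT;
    }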
Signed-off-by: Zach Brown <zab@versity.com>
Add a bit more detail to the radix merge trace. It was missing the
input block and leaf bit. Also use abbreviations of the fields in the
trace output so that it's slightly less enormous.
Signed-off-by: Zach Brown <zab@versity.com>
The seq portion of radix block references is intended to differentiate
versions of a given block location over time. The current method of
incrementing the existing value as the block is dirtied is risky. It
means that every lineage of a block has the same sequence number
progression. Different trees referencing the same block over time could
get confused. It's more robust to have large random numbers. The
collision window is then evenly distributed over the 64bit space rather
than all being bunched up in the initial seq values.
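A sketch of the change, with an illustrative ref struct; get_random_bytes()
supplies the large random seq when a block is dirtied instead of incrementing
the previous value:

    #include <asm/byteorder.h>
    #include <linux/random.h>
    #include <linux/types.h>

    struct ref_sketch {
            __le64 blkno;
            __le64 seq;
    };

    /* old, risky: every lineage of a block shares one seq progression */
    static void dirty_ref_seq_increment(struct ref_sketch *ref)
    {
            le64_add_cpu(&ref->seq, 1);
    }

    /* new: large random seqs spread collisions over the 64bit space */
    static void dirty_ref_seq_random(struct ref_sketch *ref)
    {
            u64 seq;

            get_random_bytes(&seq, sizeof(seq));
            ref->seq = cpu_to_le64(seq);
    }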
Signed-off-by: Zach Brown <zab@versity.com>
When we're merging bits that are set in a read-only input tree, we
can't try to merge more bits than exist in the input tree. That'll
cause us to loop around and double-free bits.
Signed-off-by: Zach Brown <zab@versity.com>
We were using bitmap_xor() to set and clear blocks of allocator bits at
a time. bitmap_xor() is a ternary function with two const input
pointers and we were providing the changing destination as a const input
pointer. That doesn't seem wise.
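One safer shape (not necessarily the exact change made here), with
illustrative arguments standing in for the allocator leaf fields; explicit
set/clear calls only write through the destination pointer:

    #include <linux/bitmap.h>
    #include <linux/types.h>

    /*
     * Rather than bitmap_xor(bits, bits, change, nbits), which feeds the
     * destination back in through a const source pointer, set or clear
     * the run of bits directly.
     */
    static void change_alloc_bits(unsigned long *bits, unsigned int start,
                                  unsigned int count, bool set)
    {
            if (set)
                    bitmap_set(bits, start, count);
            else
                    bitmap_clear(bits, start, count);
    }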
Signed-off-by: Zach Brown <zab@versity.com>
An incorrect warning condition was added as fallocate was implemented.
It tried to warn about reads arriving while the staging ioctl was at
work. But the
staging boolean is set on the inode when the staging ioctl has the inode
mutex. It protects against writes, but page reading doesn't use the
mutex. It's perfectly acceptable for reads to be attempted while the
staging ioctl is busy. We rely on this so that a large read can consume
data as it is being staged.
The warning caused reads to fail while the staging ioctl was working.
Typically this would hit read-ahead and just force sync reads. But it
could hit sync reads and cause EIO.
Signed-off-by: Zach Brown <zab@versity.com>
Add specific error messages for failures that can happen as the server
commits log trees from the client. These are severe enough that we'd
like to know about them.
Signed-off-by: Zach Brown <zab@versity.com>
Back in ancient LSM times these functions to read and write the super
block reused the bio functions that LSM segment IO used. Each IO would
be performed with privately allocated pages and bios.
When we got rid of the LSM code we got rid of the bio functions. It was
quick and easy to transition super read/write to use buffer_heads. This
introduced sharing of the super's buffer_head between readers and
writers. First we saw concurrent readers being confused by the uptodate
bit and added a bunch of complexity to coordinate its use.
Now we're seeing the writer copy its super for writing into the buffer
that readers are using, causing crc failures on read. Let's not use
buffer_heads anymore (always good advice).
We added quick block functions to read and write small blocks with
private pages and bios. Use those here to read and write the super so
that readers and writers operate on their own buffers again.
Signed-off-by: Zach Brown <zab@versity.com>
Add two quick functions which perform IO on small fixed size 4K blocks
to or from the caller's buffer with privately allocated pages and bios.
Callers have no interaction with each other. This matches the behaviour
expected by callers of scoutfs_read_super and _write_super.
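A minimal sketch of the read side, using one recent flavour of the bio API
for illustration; the function name, the 4K size macro, and the compat
details for older kernels are all assumptions:

    #include <linux/bio.h>
    #include <linux/gfp.h>
    #include <linux/highmem.h>
    #include <linux/string.h>

    #define QUICK_BLOCK_SHIFT       12      /* the fixed 4K blocks */
    #define QUICK_BLOCK_SIZE        (1 << QUICK_BLOCK_SHIFT)

    static int quick_block_read(struct block_device *bdev, u64 blkno, void *buf)
    {
            struct page *page;
            struct bio *bio;
            int ret;

            /* a private page and bio per call: callers never share IO state */
            page = alloc_page(GFP_NOFS);
            if (!page)
                    return -ENOMEM;

            bio = bio_alloc(GFP_NOFS, 1);
            if (!bio) {
                    __free_page(page);
                    return -ENOMEM;
            }

            bio_set_dev(bio, bdev);
            bio->bi_iter.bi_sector = blkno << (QUICK_BLOCK_SHIFT - 9);
            bio->bi_opf = REQ_OP_READ;
            bio_add_page(bio, page, QUICK_BLOCK_SIZE, 0);

            ret = submit_bio_wait(bio);
            if (ret == 0)
                    memcpy(buf, page_address(page), QUICK_BLOCK_SIZE);

            bio_put(bio);
            __free_page(page);
            return ret;
    }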
Signed-off-by: Zach Brown <zab@versity.com>
We miscalculated the length of extents to create when initializing
offline extents for setattr_more. We were clamping the extent length in
each packed extent item by the full size of the offline extent, ignoring
the iblock position that we were starting from.
Signed-off-by: Zach Brown <zab@versity.com>
With the introduction of packed extent items the setattr_more ioctl had
to be careful not to try and dirty all the extent items in one
transaction. But it pulled the extent creation call up too high and was
doing it before some argument checks that were done after the inode was
refreshed by acquiring its lock. This moves the extent creation to be
done after the args are verified for the inode.
Signed-off-by: Zach Brown <zab@versity.com>
Don't return -ENOENT from fiemap on a file with no extents. The
operation is supposed to succeed with no extents.
Signed-off-by: Zach Brown <zab@versity.com>
The setattr_more ioctl has its own helper for creating uninitialized
extents when we know that there can't be any other existing extents. We
don't have to worry about freeing blocks they might have referenced.
This helper forgot to actually store the modified extents back into
packed extent items after setting extents offline.
Signed-off-by: Zach Brown <zab@versity.com>
Add a bit more tracing to stage, release, and unwritten extent
conversion so we can get a bit more visibility into the threads staging
and releasing.
Signed-off-by: Zach Brown <zab@versity.com>
We need to invalidate old stale blocks we encounter when reading old
bloom block references written by other nodes. This is the same
consistency mechanism used by btree blocks.
Signed-off-by: Zach Brown <zab@versity.com>
A quick update of the comment describing the forest's use of the bloom
filter block. It used to be a tree of bloom filter items.
Signed-off-by: Zach Brown <zab@versity.com>
Remove a bunch of unused counters which have accumulated over time as
we've worked on the code and forgotten to remove them.
Signed-off-by: Zach Brown <zab@versity.com>
Forest item iteration allocates iterator positions for each tree root
it reads from. The postorder destruction of the iterator nodes wasn't
quite right because we were balancing the nodes as they were freed.
That can change parent/child relationships and cause postorder iteration
to skip some nodes, leaking memory. It would have worked if we just
freed the nodes without using rb_erase to balance.
The fix is to actually iterate over the rbnodes while using the destroy
helper which rebalances as it frees.
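Two safe shapes for the teardown, with invented names; what doesn't work is
walking in postorder while calling rb_erase() on each visited node, since the
rebalancing changes links out from under the walk:

    #include <linux/rbtree.h>
    #include <linux/slab.h>

    struct iter_pos {
            struct rb_node node;
            /* ... per-root iteration state ... */
    };

    /* re-read the first node after every erase; rb_erase() may rebalance */
    static void destroy_positions(struct rb_root *root)
    {
            struct rb_node *node;
            struct iter_pos *pos;

            while ((node = rb_first(root))) {
                    pos = rb_entry(node, struct iter_pos, node);
                    rb_erase(node, root);
                    kfree(pos);
            }
    }

    /* or: a postorder walk that only frees and never rebalances */
    static void destroy_positions_postorder(struct rb_root *root)
    {
            struct iter_pos *pos, *tmp;

            rbtree_postorder_for_each_entry_safe(pos, tmp, root, node)
                    kfree(pos);
            *root = RB_ROOT;
    }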
Signed-off-by: Zach Brown <zab@versity.com>
The conversion to reading the super with buffer_head IO caused racing
readers to risk spurious errors. Clearing uptodate to force device
access could race with a concurrently waking reader, which could wake,
find uptodate cleared, and think that an IO error had occurred.
The buffer_head functions generally require higher level serialization
of this kind of use of the uptodate bit. We use bh_private as a counter
to ensure that we don't clear uptodate while there are active readers.
We then also use a private buffer_head bit to satisfy batches of waiting
readers with each IO.
Signed-off-by: Zach Brown <zab@versity.com>
Updating the _first tracking in leaf bits was pretty confusing because
we tried to mash all the tracking updates from all leaf modifications
into one shared code path.
It had a bug where merging would advance _first tracking by the number
of bits merged in the leaf rather than the number of contiguous set bits
after the new first. This led to allocation failures eventually, as
_first ended up past the actual set bits in the leaf.
This fixes that by moving _first tracking updates into the leaf callers
that modify bits and to the parent ref updating code.
In the process we also fix little bugs in the support code that were
found by the radix block consistency checking.
Signed-off-by: Zach Brown <zab@versity.com>
The client lock code forgot to call into the forest to clear its
per-lock tracking before freeing the lock. This would result in a slow
memory leak over time as locks were reclaimed by memory pressure. It
shouldn't have affected consistency.
Signed-off-by: Zach Brown <zab@versity.com>
The block end_io path could lose wakeups. Both the bio submission
task and a bio's end_io completion could see an io_count > 1 and neither
would set the block uptodate before dropping their io_count and waking.
It got into this mess because readers were waiting for io_count to drop
to 0. We add an io_busy bit which indicates that IO is still in flight
and which waiters now wait for. This gives the final io_count drop a
chance to do its completion work before clearing io_busy, dropping its
reference, and waking waiters.
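A sketch of the idea with invented names; the final io_count drop finishes
the IO before clearing io_busy, and waiters sleep on io_busy rather than on
the count reaching zero:

    #include <linux/atomic.h>
    #include <linux/bitops.h>
    #include <linux/errno.h>
    #include <linux/sched.h>
    #include <linux/wait.h>

    enum {
            BLOCK_BIT_UPTODATE,
            BLOCK_BIT_ERROR,
            BLOCK_BIT_IO_BUSY,
    };

    struct block_sketch {
            unsigned long bits;
            atomic_t io_count;
    };

    /* called by both the submitter and each bio's end_io */
    static void block_io_done(struct block_sketch *bl, int err)
    {
            if (err)
                    set_bit(BLOCK_BIT_ERROR, &bl->bits);

            if (!atomic_dec_and_test(&bl->io_count))
                    return;

            /* only the final drop gets to finish the IO */
            if (!test_bit(BLOCK_BIT_ERROR, &bl->bits))
                    set_bit(BLOCK_BIT_UPTODATE, &bl->bits);

            clear_bit(BLOCK_BIT_IO_BUSY, &bl->bits);
            smp_mb__after_atomic();
            wake_up_bit(&bl->bits, BLOCK_BIT_IO_BUSY);
    }

    static int block_wait_for_io(struct block_sketch *bl)
    {
            int ret;

            ret = wait_on_bit(&bl->bits, BLOCK_BIT_IO_BUSY,
                              TASK_UNINTERRUPTIBLE);
            if (ret)
                    return ret;
            return test_bit(BLOCK_BIT_ERROR, &bl->bits) ? -EIO : 0;
    }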
Signed-off-by: Zach Brown <zab@versity.com>
The first pass at the radix allocator wasn't paying a lot of attention
to the allocation cursors.
This more carefully manages them. They're only advanced after
allocating. Previously the metadata alloc cursor was advanced as it
searched through leaves that it might allocate from. We test for
wrapping past the specific final allocatable bit, rather than the limit
of what the radix height can store. This required pushing knowledge of
metadata or data allocs down through some of the code paths.
Signed-off-by: Zach Brown <zab@versity.com>
Reclaim freed metadata blocks in the server by merging the stable freed
tree into the allocator as a commit opens, when we can trust that the
stable version of the freed allocator in the super is a strict subset of
the allocator's dirty freed tree.
Signed-off-by: Zach Brown <zab@versity.com>
Server processing paths had open coded management of holding and
applying transactions. Refactor that into hold_commit() and
apply_commit() helpers. It makes the code a whole lot clearer and gives
us a place in hold_commit() to add code that needs to be run before
anything is modified in a commit on the server.
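The calling pattern looks roughly like the sketch below; the struct name,
helper signatures, and the work function are assumptions, not the real
prototypes:

    /* server request handlers now follow this shape */
    static int server_handler_sketch(struct server_info *server)
    {
            int ret;

            ret = hold_commit(server);      /* pre-modification work lives here */
            if (ret)
                    return ret;

            ret = modify_server_btrees(server);     /* the real work */

            return apply_commit(server, ret);       /* commit, or back out on error */
    }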
Signed-off-by: Zach Brown <zab@versity.com>
The server now consistently reclaims free space in client allocator
radix trees. It merges the client's freed trees as the client
opens a new transaction. And it reclaims all the client's trees
when it is removed.
Signed-off-by: Zach Brown <zab@versity.com>
The conversion of the btree to using allocators missed freeing blocks in
two places. As we overwrite dirty new blocks we weren't freeing the old
stable block as its reference was overwritten. And as we removed the
final item in the tree we weren't freeing the final empty block as it's
removed.
Signed-off-by: Zach Brown <zab@versity.com>
The removal of extent allocators in the server removed the tracking of
total free blocks in the system as extents were allocated and freed.
This restores tracking of total free blocks by observing the difference
in each allocator's sm_total count as a new version is stored during a
commit on the server.
We change the single free_blocks counter in the super to separate counts
of free metadata and data blocks to reflect the metadata and data
allocators. The statfs net command is updated.
Signed-off-by: Zach Brown <zab@versity.com>
Now that we have the allocators that use radix blocks we can remove all
the code that was using btree items to store free block bitmaps.
Signed-off-by: Zach Brown <zab@versity.com>
Convert metadata block and file data extent allocations to use the radix
allocator.
Most of this is simple transitions between types and calls. The server
no longer has to initialize blocks because mkfs can write a single
radix parent block with fully set parent refs to initialize a full
radix. We remove the code and fields that were responsible for adding
uninitialized data and metadata.
The rest of the unused block allocator code is only ifdefed out. It'll
be removed in a separate patch to reduce noise here.
Signed-off-by: Zach Brown <zab@versity.com>
Add the allocator that uses bits stored in the leaves of a cow radix.
It'll replace two metadata and data allocators that were previously
storing allocation bitmap fragments in btree items.
Signed-off-by: Zach Brown <zab@versity.com>
Add a call to move a block's location in the cache without failure. The
radix allocator is going to use this to dirty radix blocks while making
atomic changes to multiple paths through multiple radix trees.
Signed-off-by: Zach Brown <zab@versity.com>
Switch the block cache from indexing blocks in a radix tree to using an
rbtree. We lose the RCU lookups but we gain being able to move blocks
around in the cache without allocation failure. And we no longer have
the problem of not being able to index large blocks with a 32bit long
radix key.
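A sketch of the rbtree indexing with an illustrative block struct; insertion
links a caller-allocated node so it can't fail, and the full 64bit blkno is
the key:

    #include <linux/rbtree.h>
    #include <linux/types.h>

    struct cache_block_sketch {
            struct rb_node node;
            u64 blkno;
    };

    static struct cache_block_sketch *cache_lookup(struct rb_root *root,
                                                   u64 blkno)
    {
            struct rb_node *n = root->rb_node;
            struct cache_block_sketch *bl;

            while (n) {
                    bl = rb_entry(n, struct cache_block_sketch, node);
                    if (blkno < bl->blkno)
                            n = n->rb_left;
                    else if (blkno > bl->blkno)
                            n = n->rb_right;
                    else
                            return bl;
            }
            return NULL;
    }

    /* caller has allocated ins and ensured the key isn't already present */
    static void cache_insert(struct rb_root *root,
                             struct cache_block_sketch *ins)
    {
            struct rb_node **p = &root->rb_node;
            struct rb_node *parent = NULL;
            struct cache_block_sketch *bl;

            while (*p) {
                    bl = rb_entry(*p, struct cache_block_sketch, node);
                    parent = *p;
                    if (ins->blkno < bl->blkno)
                            p = &(*p)->rb_left;
                    else
                            p = &(*p)->rb_right;
            }

            rb_link_node(&ins->node, parent, p);
            rb_insert_color(&ins->node, root);
    }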
Signed-off-by: Zach Brown <zab@versity.com>
Add functions for callers to maintain a visited bit in cached blocks.
The radix allocator is going to use this to count the number of clean
blocks it sees across paths through the radix which can share parent
blocks.
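A sketch of the visited bit helpers with assumed names; the atomic
test-and-set means a path that revisits a shared parent block doesn't count
it twice:

    #include <linux/bitops.h>
    #include <linux/types.h>

    #define BLOCK_BIT_VISITED       0       /* illustrative bit number */

    /* returns true only for the first visit since the bit was cleared */
    static bool block_mark_visited(unsigned long *bits)
    {
            return !test_and_set_bit(BLOCK_BIT_VISITED, bits);
    }

    static void block_clear_visited(unsigned long *bits)
    {
            clear_bit(BLOCK_BIT_VISITED, bits);
    }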
Signed-off-by: Zach Brown <zab@versity.com>
The bloom block reading code forgot to test if the read block was stale.
It would trust whatever it read. Now the read performed while building
up the roots to use can return stale and be retried.
Signed-off-by: Zach Brown <zab@versity.com>
Update the summary of the benefit we get from concurrent per-mount
commits. Instead of describing it specifically in terms of LSM we
abstract it out a bit to make it also true of writing per-mount log
btrees.
Signed-off-by: Zach Brown <zab@versity.com>
The forest item iterator was missing items. Picture the following
search pattern:
- find a candidate item to return in a root
- ignore a greater candidate to return in another root
- find the first candidate item's deletion in another root
The problem was that finding the deletion item didn't reset the notion
that we'd found a key. The next item from the second root was never
used because the found key wasn't reset and that root had already
searched past the found key.
The core architectural problem is that iteration can't examine each item
only once given that keys and deletions can be randomly distributed
across the roots.
The most efficient way to solve the problem is to really sort the
iteration positions in each root and then walk those in order. We
get the right answer and pay some data structure overhead to perform
the minimum number of btree searches.
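A sketch of the sorted-position walk with invented types and helpers; each
root contributes one position, the positions are kept sorted by their next
key, and the iterator repeatedly consumes the smallest:

    #include <linux/rbtree.h>
    #include <linux/types.h>

    /* one iteration position per btree root, sorted by its next key */
    struct root_pos_sketch {
            struct rb_node node;
            struct scoutfs_key key;         /* next candidate key in this root */
            bool deletion;                  /* the item is a deletion marker */
    };

    static struct root_pos_sketch *smallest_pos(struct rb_root *positions)
    {
            struct rb_node *first = rb_first(positions);

            return first ? rb_entry(first, struct root_pos_sketch, node) : NULL;
    }

    /* after consuming a position's item, advance it and re-sort it */
    static void advance_pos(struct rb_root *positions,
                            struct root_pos_sketch *pos)
    {
            rb_erase(&pos->node, positions);
            if (read_next_item(pos) == 0)               /* assumed helper */
                    insert_pos_sorted(positions, pos);  /* assumed helper */
            else
                    free_pos(pos);                      /* assumed helper */
    }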
Signed-off-by: Zach Brown <zab@versity.com>