Previously the quorum configuration specified the number of votes needed
to elect the leader. This gave the cluster configuration an excessive
amount of freedom, which created all sorts of problems that had to be
designed around.
Most acutely, though, it required a probabilistic mechanism for mounts
to persistently record that they're starting a server so that future
servers could find and possibly fence them. They would write to a lot
of quorum blocks and trust that it was unlikely that future servers
would overwrite all of their written blocks. Overwriting was always
possible, which would be bad enough, but it also required so much IO
that we had to use long election timeouts to avoid spurious fencing.
These longer timeouts had already gone wrong on some storage
configurations, leading to hung mounts.
To fix this and other problems we see coming, like live membership
changes, we now specifically configure the number and identity of mounts
which will be participating in quorum voting. With specific identities,
mounts now have a corresponding specific block they can write to and
which future servers can read from to see if they're still running.
We change the quorum config in the super block from a single
quorum_count to an array of quorum slots, each of which specifies the
address of the mount assigned to that slot. The mount argument to
specify a quorum voter changes from "server_addr=$addr" to
"quorum_slot_nr=$nr", which specifies the mount's slot. The slot's
address is used for udp election messages and tcp server connections.
Now that unique IP addresses are specifically configured for all the
quorum members, we can use UDP messages to send and receive the vote
messages in the raft protocol to elect a leader. The quorum code no
longer has to read and write disk block votes; its core loop is now much
more reasonable, simply waiting for received network messages or
timeouts to advance the raft election state machine.
The quorum blocks are now used by the slots to store their persistent
raft term and to record their leader state. Event fields in the block
record the timestamp of the most recent interesting events that happened
to the slot.
Now that raft doesn't use IO, we can leave the quorum election work
running in the background. The raft work in the quorum members is
always running so we can use a much more typical raft implementation
with heartbeats. Critically, this decouples the client and election
life cycles. Quorum is always running and is responsible for starting
and stopping the server. The client repeatedly tries to connect to a
server; it has nothing to do with deciding to participate in quorum.
Finally, we add a quorum/status sysfs file which shows the state of the
quorum raft protocol in a member mount and has the last messages that
were sent to or received from the other members.
Signed-off-by: Zach Brown <zab@versity.com>
Prefer named to anonymous enums. This helps readability a little.
Use enum as param type if possible (a couple spots).
Remove unused enum in lock_server.c.
Define enum spbm_flags using shift notation for consistency.
Rename get_file_block()'s "gfb" parameter to "flags" for consistency.
Signed-off-by: Andy Grover <agrover@versity.com>
Require that a second path to the metadata bdev be given via a mount
option.
Verify that the meta sb matches the sb also written to the data sb.
Change code as needed in super.c to allow both to be read. Remove the
check for overlapping meta and data blknos, since they are now on
entirely separate bdevs.
Use meta_bdev for superblock, quorum, and block.c reads and writes.
Signed-off-by: Andy Grover <agrover@versity.com>
It used to take significant effort to create very tall btrees because
they only stored small references to large LSM segments. Now they store
all file system metadata and we can easily create sufficiently large
btrees for testing. We don't need the tiny btree option.
Signed-off-by: Zach Brown <zab@versity.com>
Add a server_addr mount option that takes an ipv4 address. This will be
used by the upcoming changes to quorum voting to indicate that a mount
should participate in voting and to specify the address that its server
should listen on.
Signed-off-by: Zach Brown <zab@versity.com>
Currently all mounts try to get a dlm lock which gives them exclusive
access to become the server for the filesystem. That isn't going to
work if we're moving to locking provided by the server.
This uses quorum election to determine who should run the server. We
switch from long-running server work that blocked trying to get a lock to
calls which start and stop the server.
Signed-off-by: Zach Brown <zab@versity.com>
Each mount is now given a specified unique name. This can be used to
identify a reconnecting mount, indicating that an old instance with the
same unique name can no longer exist and doesn't need to be fenced.
Signed-off-by: Zach Brown <zab@versity.com>
Add a tunable option to force using tiny btree blocks on an active
mount. This lets us quickly exercise large btrees.
Signed-off-by: Zach Brown <zab@versity.com>
To actually use it, we first have to copy symbols over from the dlm build
into the scoutfs source directory. Make that happen automatically for us in
the Makefile.
The only users of locking at the moment are mount, unmount and xattr
read/write. Adding more locking calls should be straightforward.
The LVB based server ip communication didn't work out, and LVBs as they
are written don't make sense in a range locking world. So instead, we
record the server ip address in the superblock. This is protected by the
listen lock, which also arbitrates which node will be the manifest
server.
We take and drop the dlm lock on each lock/unlock call. Lock caching will
come in a future patch.
Signed-off-by: Mark Fasheh <mfasheh@versity.com>