From 1be8004ca7bd6062e2fddde9ad669c55bee14138 Mon Sep 17 00:00:00 2001
From: Bart Van Assche
Date: Tue, 6 Oct 2015 13:47:18 +0000
Subject: [PATCH] scst/README_in-tree: Minimize diffs with scst/README

git-svn-id: http://svn.code.sf.net/p/scst/svn/trunk@6532 d57e44dd-8a1f-0410-8b47-8ef2f437770f
---
 scst/README_in-tree | 334 +++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 330 insertions(+), 4 deletions(-)

diff --git a/scst/README_in-tree b/scst/README_in-tree
index f21545b5f..efb12a791 100644
--- a/scst/README_in-tree
+++ b/scst/README_in-tree
@@ -259,7 +259,7 @@ SCST sysfs interface
 --------------------

 SCST sysfs interface designed to be self descriptive and self
-containing. This means that a high level managament tool for it can be
+containing. This means that a high level management tool for it can be
 written once and automatically support any future sysfs interface
 changes (attributes additions or removals, new target drivers and dev
 handlers, etc.) without any modifications. Scstadmin is an example of
@@ -1046,11 +1046,21 @@ Each vdisk_fileio's device has the following attributes in

  - size_mb - contains size of this virtual device in MB.

+ - pr_file_name - Full path of the file or block device in which to store
+   persistent reservation information. The default value for this attribute is
+   /var/lib/scst/pr/${device_name}. Writing a new value into this sysfs
+   attribute is only allowed if the device is not exported. Modifying this
+   sysfs attribute causes the persistent reservation state to be reloaded.
+
  - t10_dev_id - contains and allows to set T10 vendor specific
    identifier for Device Identification VPD page (0x83) of INQUIRY data.
    By default VDISK handler always generates t10_dev_id for every new
    created device at creation time based on the device name and
    scst_vdisk_ID scst_vdisk.ko module parameter (see below).
+   Note: some initiators, e.g. VMware's ESXi or MS Hyper-V, only looks
+   at the first eight characters of t10_dev_id. You have to make sure
+   that these first eight characters are unique or VMware will consider
+   these devices as identical.

  - eui64_id - allows to set the EUI-64 based device identifier in the SCSI
    device identification VPD page (83h). This identifier must be 8,
@@ -1258,6 +1268,286 @@ persistent reservations from this device are released, upon reconnect
 the initiators will see it.


+Implicit ALUA Support
+---------------------
+
+SCST supports implicit asymmetric logical unit access (ALUA). Implicit ALUA is
+a feature defined by the ANSI T10 SCSI committee that allows a target to tell
+the initiator which path to use in a multipath setup. The redundant paths
+between initiator and target can be used either for redundancy or for load
+sharing purposes. The target can either be a single target system running SCST
+with multiple communication interfaces or two target systems each running SCST
+and configured in a high availability setup.
+
+In the SPC-4 standard the following concepts are defined related to ALUA:
+* Relative target port ID. A number between 1 and 65535 that uniquely
+  identifies a target port. These numbers must be unique over the target as
+  a whole, even if that target consists of multiple systems each running SCST.
+* Target port group asymmetric access state. One of active/optimized,
+  active/non-optimized, standby, unavailable, logical block dependent or
+  offline. The access state of a port defines which (if any) SCSI commands
+  will be processed by the target port.
+* Target port preference indicator. This indicator is additional information
+  next to the asymmetric access state that is provided by the target to an
+  initiator and that may impact the decision taken by the initiator about
+  which path that will be chosen.
+
+More detailed information about ALUA can be found in section 5.11.2 of the
+ANSI T10 standard called SPC-4.
+
+ALUA support in SCST
+....................
+
+SCST allows to define implicit ALUA settings for each unique combination of
+SCST device and SCST target. An initiator however queries ALUA settings by
+sending an appropriate SCSI command to a specific LUN of an SCST target. Each
+such LUN maps uniquely to an SCST device. For hardware SCST target drivers,
+e.g. ib_srpt, there is a one-to-one correspondence between SCST target and
+SCSI target port. With other SCST targets, e.g. iSCSI-SCST, by default the
+only relationship between SCST targets and SCSI target ports is that all SCST
+targets defined on a system are visible via all SCSI target ports. See also
+the iSCSI-SCST documentation about the allowed_portal attribute for
+information about how to associate iSCSI targets with a single physical
+interface.
+
+Notes:
+- In a H.A. setup it is the responsibility of the user to synchronize ALUA
+  information between the individual systems running SCST. There are no
+  provisions in SCST to exchange ALUA information automatically between
+  individual systems.
+- In order to support H.A. setups it is possible to let one SCST system
+  report information about target ports present in other SCST systems.
+- With SCST, and certainly in a H.A. setup, it is possible to configure ALUA
+  such that an initiator receives information that is not standard compliant,
+  e.g. setting all target ports in the offline state. It is the responsibility
+  of the user to make sure that the information queried by an initiator is
+  consistent independent of the LUN and the target port used by the initiator
+  to query this information.
+- Before building a H.A. setup consisting of two or more SCST systems one
+  should evaluate whether it's acceptable that persistent reservation commands,
+  SCSI task management commands and MODE SELECT commands will only be processed
+  by a single node instead of being processed by all nodes.
+
+Configuring ALUA in SCST
+........................
+
+SCST allows to configure the following settings related to implicit ALUA
+for each unique combination of SCST target and virtual SCST device
+(vdisk_fileio, vdisk_blockio, vcdrom, ...):
+* The target port group asymmetric access state. SCST supports all ALUA port
+  states except logical block dependent.
+* The preference indicator for a target port group.
+* The relative target port ID associated with the SCST target.
+
+It is possible to configure the following ALUA-related information via the
+sysfs interface of SCST:
+* Device groups, where each device group has a name and contains zero or more
+  SCST devices. If a device group contains only a single SCST device, the name
+  of the group may be identical to the device name. See also
+  /sys/kernel/scst_tgt/device_groups/mgmt.
+* Which devices are inside a device group. See also
+  /sys/kernel/scst_tgt/device_groups//devices/mgmt.
+* Target groups, where each target group has a name and contains zero or more
+  SCST target names. See also
+  /sys/kernel/scst_tgt/device_groups//target_groups/mgmt.
+* Target port group identifier. This is a number in the range 0..65535 and is
+  called the TARGET PORT GROUP in SPC-4. See also
+  /sys/kernel/scst_tgt/device_groups//target_groups//group_id.
+* Target port group preference indicator. This is a boolean value called the
+  PREF bit in SPC-4. See also /sys/kernel/scst_tgt/device_groups//target_groups//preferred.
+* Target port group state name. One of active, nonoptimized, standby,
+  unavailable, offline or transitioning. See also
+  /sys/kernel/scst_tgt/device_groups//target_groups//state.
+* Target group contents - zero or more target names. The target names either
+  exist on the local system or on a remote system in a H.A. setup. For target
+  names that refer to SCST targets on another system only the relative target
+  port identifier matters, not the assigned name. See also
+  /sys/kernel/scst_tgt/device_groups//target_groups//mgmt.
+* Relative target identifier. See also
+  /sys/kernel/scst_tgt/device_groups//target_groups///rel_tgt_id.
+
+The steps involved in configuring ALUA are:
+* Identify the SCST devices that will always share the same ALUA settings and
+  state. Assign a name to each such group of SCST devices. If a device group
+  only contains a single device, the group name may be identical to the device
+  name.
+* Configure that device group in SCST via sysfs.
+* Identify the SCSI target ports that will always share the same ALUA settings
+  and state. Assign a name, a group ID and preference indicator to each such
+  SCSI target port group.
+* Configure the target port group information in SCST via sysfs.
+* Identify all SCST targets that can be accessed via a target port group.
+* Assign all these SCST target names to the target group via sysfs.
+* Assign a relative target port identifier to each target.
+
+As an example, in a H.A. setup with two systems each having one InfiniBand
+HCA controlled by the ib_srpt driver and where each system exports two LUNs
+the following configuration can be used in scst.conf on both systems:
+
+DEVICE_GROUP dgroup1 {
+        DEVICE disk01
+
+        TARGET_GROUP tgroup1 {
+                group_id 256
+                preferred 1
+                state active
+                TARGET fe80:0000:0000:0000:0002:c903:00fa:b7e1 {
+                        rel_tgt_id 1
+                }
+        }
+        TARGET_GROUP tgroup2 {
+                group_id 257
+                state standby
+                TARGET fe80:0000:0000:0000:0002:c903:00fa:b7f2 {
+                        rel_tgt_id 2
+                }
+        }
+}
+
+DEVICE_GROUP dgroup2 {
+        DEVICE disk02
+
+        TARGET_GROUP tgroup1 {
+                group_id 256
+                state standby
+                TARGET fe80:0000:0000:0000:0002:c903:00fa:b7e1 {
+                        rel_tgt_id 1
+                }
+        }
+        TARGET_GROUP tgroup2 {
+                group_id 257
+                preferred 1
+                state active
+                TARGET fe80:0000:0000:0000:0002:c903:00fa:b7f2 {
+                        rel_tgt_id 2
+                }
+        }
+}
+
+
+Checking the Target Configuration
+.................................
+
+One way to verify the implicit ALUA configuration from a Linux initiator is
+via the commands provided in the sg3_utils package. The first step is to
+verify whether for a certain LUN implicit ALUA has been configured on the
+target. This is possible by checking whether the TPGS=1 text appears in the
+sg_inq output, where /dev/sdb is a device node created by the ib_srp initiator:
+
+# sg_inq /dev/sdb
+standard INQUIRY:
+  PQual=0 Device_type=0 RMB=0 version=0x05 [SPC-3]
+  [AERC=0] [TrmTsk=0] NormACA=0 HiSUP=1 Resp_data_format=2
+  SCCS=0 ACC=0 TPGS=1 3PC=0 Protect=0 BQue=0
+  EncServ=0 MultiP=0 [MChngr=0] [ACKREQQ=0] Addr16=1
+  [RelAdr=0] WBus16=0 Sync=0 Linked=0 [TranDis=0] CmdQue=1
+  [SPI: Clocking=0x0 QAS=0 IUS=0]
+    length=66 (0x42) Peripheral device type: disk
+ Vendor identification: SCST_FIO
+ Product identification: disk01
+ Product revision level: 300
+ Unit serial number: 27cddc71
+
+The next step is to verify the target group configuration. That is possible
+by verifying whether the output of the sg_rtpg command matches the values
+configured on the target:
+
+# sg_rtpg /dev/sdb
+Report target port groups:
+  target port group id : 0x100 , Pref=1
+    target port group asymmetric access state : 0x00
+    T_SUP : 0, O_SUP : 0, LBD_SUP : 0, U_SUP : 1, S_SUP : 1, AN_SUP : 1, AO_SUP : 1
+    status code : 0x02
+    vendor unique status : 0x00
+    target port count : 01
+    Relative target port ids:
+      0x01
+  target port group id : 0x101 , Pref=0
+    target port group asymmetric access state : 0x00
+    T_SUP : 0, O_SUP : 0, LBD_SUP : 0, U_SUP : 1, S_SUP : 1, AN_SUP : 1, AO_SUP : 1
+    status code : 0x02
+    vendor unique status : 0x00
+    target port count : 01
+    Relative target port ids:
+      0x02
+
+The relative target port ID and the target port group ID for a certain path
+can be queried e.g. as follows:
+
+# sg_vpd -p di /dev/sdb
+Device Identification VPD page:
+  Addressed logical unit:
+    designator type: T10 vendor identification, code set: ASCII
+      vendor id: SCST_FIO
+      vendor specific: 27cddc71-disk01
+    designator type: EUI-64 based, code set: Binary
+      0x3237636464633731
+  Target port:
+    designator type: Relative target port, code set: Binary
+      Relative target port: 0x1
+    designator type: Target port group, code set: Binary
+      Target port group: 0x100
+
+
+Initiator Support
+.................
+
+On Linux systems implicit ALUA support is provided by the scsi_dh_alua kernel
+driver in combination with the user space multipathd daemon. You will have to
+modify at least the following in /etc/multipath.conf to enable implicit ALUA:
+* hardware_handler "1 alua"
+* prio alua
+* path_grouping_policy group_by_prio
+* path_checker tur
+
+Notes:
+- Newer versions of multipathd support a parameter called
+  "detect_prio". It can be more convenient to enable this parameter instead of
+  setting the parameter "prio" to "alua" for only those LUNs that support ALUA.
+- Older versions of multipathd (e.g. RHEL 5 and SLES 10 SP1) need
+  'prio_callout "/sbin/mpath_prio_alua /dev/%n"' instead of 'prio alua'.
+
+# multipath -ll
+23237636464633731 dm-3 SCST_FIO,disk01
+size=1.0G features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 alua' wp=rw
+|-+- policy='service-time 0' prio=1 status=active
+| `- 10:0:0:0 sdd 8:48 active ready running
+`-+- policy='service-time 0' prio=130 status=enabled
+  `- 11:0:0:0 sde 8:64 active ready running
+23133326137346538 dm-4 SCST_FIO,disk02
+size=1.0G features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 alua' wp=rw
+|-+- policy='service-time 0' prio=130 status=active
+| `- 10:0:0:2 sdn 8:208 active ready running
+`-+- policy='service-time 0' prio=1 status=enabled
+  `- 11:0:0:2 sdp 8:240 active ready running
+
+The following information can be derived from the above output:
+* That the hardware handler (hw_handler) has been set to "1 alua".
+* That multipathd created two priority groups - one with priority 1 and one
+  with priority 130.
+* That the SRP path with SCSI host number 10 will be used for communication
+  with LUN "disk01" and that the SRP path with SCSI host number 11 will be used
+  for communication with LUN "disk02".
+
+More information about how to configure the device mapper and the scsi_dh_alua
+driver can be found in the manual of your Linux distribution ("man
+multipath.conf", "man multipath" and "man multipathd").
+
+Windows initiator systems support ALUA from Windows Server 2008 on. For more
+information about ALUA support in Windows Server, see also:
+* Microsoft, Windows Server 2008 R2 Multipath I/O Overview, MSDN
+  (http://technet.microsoft.com/en-us/library/cc725907.aspx).
+* Microsoft, Multipathing Support in Windows Server 2008, July 2008, MSDN
+  (http://blogs.msdn.com/b/san/archive/2008/07/27/multipathing-support-in-windows-server-2008.aspx).
+* Microsoft, ALUA MPIO Logo Test, MSDN
+  (http://msdn.microsoft.com/en-us/library/gg607458%28v=vs.85%29.aspx).
+
+
 Caching
 -------

@@ -1345,6 +1635,41 @@ Note, on some real-life workloads write through caching might perform
 better, than write back one with the barrier protection turned on.


+Errors caching
+..............
+
+When using virtual device in FILEIO mode, the Linux page cache comes
+into picture. The negative side of it is that it's sometimes also
+caching errored pages. That is, if the underlying file experiences IO
+errors, those errors might be cached by the Linux page cache. As a
+result, even when the underlying file recovers and stops failing IOs,
+the initiator may still hit IO errors returned by the Linux page cache,
+until the cache re-reads the errored pages (usually it happens pretty
+soon, but not immediately). To make sure that cached pages are dropped,
+one of the following can be done:
+
+- Detach the SCSI virtual device (del_device) and re-attach it
+  (add_device). This should evict all the cached pages, unless somebody
+  else holds the same "filename" opened.
+
+- Issue a BLKFLSBUF ioctl to the same "filename" you provided for "add_device".
+
+For the second option, a rudimentary C code is required:
+
+fd = open(filename, O_RDWR);
+if (fd < 0) {
+        err = errno;
+        ...
+} else {
+        err = ioctl(fd, BLKFLSBUF);
+        if (err < 0) {
+                err = errno;
+                ...
+        }
+        close(fd);
+}
+
+
 BLOCKIO VDISK mode
 ------------------

@@ -1386,9 +1711,9 @@ IMPORTANT: If SCST 1.x BLOCKIO worked by default in NV_CACHE mode, when
            non-NV_CACHE mode, when each device reported to remote
            initiators as having write back caching, and synchronizes the
            internal device's cache on each SYNCHRONIZE_CACHE command
-           from the initiators. It might lead to some PERFORMANCE LOSS,
+           from the initiators. It might lead to some *PERFORMANCE LOSS*,
            so if you are are sure in your power supply and want to
-           restore 1.x behavior, your should recreate your BLOCKIO
+           restore the 1.x behavior, your should recreate your BLOCKIO
            devices in NV_CACHE mode.


@@ -1631,7 +1956,7 @@ sessions, which is enough.
 7. For hardware on target.

  - Make sure that your target hardware (e.g. target FC or network card)
-   and underlaying IO hardware (e.g. IO card, like SATA, SCSI or RAID to
+   and underlying IO hardware (e.g. IO card, like SATA, SCSI or RAID to
    which your disks connected) don't share the same PCI bus. You can
    check it using lspci utility. They have to work in parallel, so it
    will be better if they don't compete for the bus. The problem is not
@@ -1668,6 +1993,7 @@ IMPORTANT: If you use on initiator some versions of Windows (at least W2K)
 See also important notes about setting block sizes >512 bytes for
 VDISK FILEIO devices above.

+
 9. In some cases, for instance working with SSD devices, which consume
 100% of a single CPU load for data transfers in their internal threads,
 to maximize IOPS it can be needed to assign for those threads dedicated