SCSI RDMA Protocol (SRP) Target driver for Linux
=================================================
The SRP target driver has been designed to work on top of the Linux
InfiniBand kernel drivers -- either the InfiniBand drivers included
with a Linux distribution or the OFED InfiniBand drivers. For more
information about using the SRP target driver in combination with
OFED, see also README.ofed.
The SRP target driver has been implemented as an SCST driver. This
makes it possible to support many I/O modes on real and virtual
devices. A few examples of supported device handlers are:
1. scst_disk. This device handler implements transparent pass-through
of SCSI commands and allows SRP to access and export real SCSI
devices, e.g. disks, hardware RAID volumes, and tape libraries, as
SRP LUNs.
2. scst_vdisk, either in fileio or in blockio mode. This device handler
makes it possible to export software RAID volumes, LVM volumes, IDE
disks, and regular files as SRP LUNs.
3. nullio. The nullio device handler makes it possible to measure the
performance of the SRP target implementation without performing any
actual I/O.
Installation
------------
Proceed as follows to compile and install the SRP target driver:
1. To minimize QUEUE_FULL conditions, apply the
scst_increase_max_tgt_cmds patch as follows:
cd ${SCST_DIR}
patch -p0 < srpt/patches/scst_increase_max_tgt_cmds.patch
This patch increases SCST's per-device queue size from 48 to 64. This
helps to avoid QUEUE_FULL conditions because the size of the transmit
queue in the Linux SRP initiator is also 64.
Note: the SCSI layer of kernel 2.6.33 will have dynamic queue depth
adjustment. When using SRP initiator systems with kernel 2.6.33 or later,
this patch is less important.
2. Now compile and install SRPT:
cd ${SCST_DIR}
make -s scst_clean scst scst_install
make -s srpt_clean srpt srpt_install
make -s scstadm scstadm_install
3. Edit the installed file /etc/init.d/scst and add ib_srpt to the
SCST_MODULES variable.
4. Configure SCST such that it will be started during system boot:
chkconfig scst on
The ib_srpt kernel module supports the following parameters:
* srp_max_message_size (unsigned integer)
Maximum size of an SRP control message in bytes. Examples of SRP control
messages are: login request, logout request, data transfer request, ...
The larger this parameter, the more scatter/gather list elements can be
sent at once. Use the following formula to compute an appropriate value
for this parameter: 68 + 16 * (max_sg_elem_count). The default value of
this parameter is 2116, which corresponds to an sg list with 128 elements.
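The formula above can be verified with a quick shell calculation. The element
count of 128 and the resulting value 2116 come from the text; the modprobe
invocation at the end is a hypothetical usage example.

```shell
# Sketch: compute srp_max_message_size from the formula 68 + 16 * max_sg_elem_count.
max_sg_elem_count=128                      # default S/G list size from above
msg_size=$((68 + 16 * max_sg_elem_count))
echo "$msg_size"                           # prints 2116, the documented default
# Hypothetical invocation for a larger S/G list (256 elements):
# modprobe ib_srpt srp_max_message_size=$((68 + 16 * 256))
```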
* srp_max_rdma_size (unsigned integer)
Maximum number of bytes that may be transferred at once via RDMA. Defaults
to 65536 bytes, which is sufficient to use the full bandwidth of low-latency
HCAs such as Mellanox's ConnectX series. Increasing this value may decrease
latency for applications transferring large amounts of data at once via
direct I/O.
* thread (0 or 1)
Whether incoming SRP requests will be processed in the context of the IB
interrupt that was triggered by the request (thread=0) or in the context
of a separate thread (thread=1). The choice thread=0 results in the best
performance, while thread=1 makes debugging easier. If a kernel oops is
triggered inside an interrupt handler, the system will be halted; as a
result, the call trace associated with the oops will not be written to the
kernel log in /var/log/messages. With thread=1, however, the SRPT code runs
in thread context: a kernel oops generated there only kills the offending
thread, other threads keep running, and the call trace is written to the
on-disk kernel log.
* trace_flag (unsigned integer, only available in debug builds)
The individual bits of the trace_flag parameter define which categories
of trace messages are sent to the kernel log and which are suppressed.
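To make these parameters persistent across reboots, they can be placed in a
modprobe options file. The file path follows the standard modprobe.d
convention; the srp_max_rdma_size and thread values below are illustrative
assumptions, while srp_max_message_size=2116 is the documented default.

```
# /etc/modprobe.d/ib_srpt.conf -- hypothetical example; values are assumptions
options ib_srpt srp_max_message_size=2116 srp_max_rdma_size=65536 thread=1
```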
Configuring the SRP Target System
---------------------------------
First of all, create the file /etc/scst.conf. The example below shows how
to create this file using the scstadmin tool:
/etc/init.d/scst stop
/etc/init.d/scst start
scstadmin -ClearConfig /etc/scst.conf
scstadmin -adddev disk01 -path /dev/ram0 -handler vdisk -options NV_CACHE
scstadmin -adddev disk02 -path /dev/ram1 -handler vdisk -options NV_CACHE
scstadmin -assigndev disk01 -group Default -lun 0
scstadmin -assigndev disk02 -group Default -lun 1
scstadmin -assigndev 4:0:0:0 -group Default -lun 2
scstadmin -WriteConfig /etc/scst.conf
cat /etc/scst.conf
Now load the new configuration:
/etc/init.d/scst reload
Configuring the SRP Initiator System
------------------------------------
First of all, load the SRP kernel module as follows:
modprobe ib_srp
Next, discover the new SRP target by running the ibsrpdm command:
ibsrpdm -c
Now let the initiator system log in to the target system:
ibsrpdm -c | while read -r target_info; do
  echo "${target_info}" > /sys/class/infiniband_srp/${SRP_HCA_NAME}/add_target
done
Finally run lsscsi to display the details of the newly discovered SCSI disks:
lsscsi
SRP targets can be recognized in the output of lsscsi by looking for
the disk names assigned on the SCST target ("disk01" in the example below):
[8:0:0:0] disk SCST_FIO disk01 102 /dev/sdb
Notes:
* You can edit /etc/infiniband/openib.conf to load the SRP driver and the
SRP HA daemon automatically, i.e. set SRP_LOAD=yes and SRPHA_ENABLE=yes.
* To set up and use the high-availability feature you need the dm-multipath
driver and the multipath tool.
* Please refer to the OFED-1.x user manual for more detailed instructions
on how to enable and use the HA feature. See e.g. http://www.mellanox.com/related-docs/prod_software/Mellanox_OFED_Linux_user_manual_1_40_1.pdf.
Performance Notes - Initiator Side
----------------------------------
* For latency-sensitive applications, using the noop scheduler on the
initiator side can give significantly better results than other schedulers.
* The following parameters have a small but measurable impact on SRP
performance:
* /sys/class/block/${dev}/queue/rq_affinity
* /proc/irq/${ib_int_no}/smp_affinity
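These two knobs can be set from a shell. The device name, IRQ number, and CPU
choice below are placeholders for illustration; look up the actual values on
your system.

```shell
# Sketch: steer block-layer completions to the submitting CPU and pin the
# HCA interrupt to CPU 2. ${dev} and ${ib_int_no} are placeholders.
cpu=2
mask=$(printf '%x' $((1 << cpu)))   # CPU 2 -> bitmask 0x4
echo "$mask"                        # prints 4
# echo 1       > /sys/class/block/${dev}/queue/rq_affinity
# echo "$mask" > /proc/irq/${ib_int_no}/smp_affinity
```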
Performance Notes - Target Side
-------------------------------
* In some cases, e.g. when working with SSD devices whose internal threads
consume 100% of a single CPU for data transfers, maximizing IOPS may
require assigning dedicated CPUs to those threads via the Linux CPU
affinity facilities. No IRQ processing should be done on those CPUs;
check this via /proc/interrupts. See the taskset command and
Documentation/IRQ-affinity.txt in your kernel's source tree for how to
assign CPU affinity to tasks and IRQs.
The reason is that processing of incoming commands in SIRQ context can
be done on the same CPUs as the SSD devices' data transfer threads. As a
result, those threads do not receive all the available CPU power and
perform worse.
As an alternative to CPU affinity assignment, you can enable the SRP
target driver's internal thread, which allows the Linux CPU scheduler to
distribute the load better among the available CPUs. To do so, load the
ib_srpt module with the parameter "thread=1".
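A CPU affinity assignment of the kind described above can be sketched as
follows. The CPU numbers and the thread PID are assumptions; find the actual
PID of the SSD worker thread with e.g. "ps -eLf".

```shell
# Sketch: dedicate CPUs 2-3 to an SSD worker thread via taskset.
SSD_THREAD_PID=1234                       # placeholder, not a real PID
echo taskset -pc 2,3 "$SSD_THREAD_PID"    # echoed as a dry run
# Drop the leading "echo" to actually apply the affinity, then inspect
# /proc/interrupts to confirm no IRQs are serviced on CPUs 2 and 3.
```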
Send questions about this driver to scst-devel@lists.sourceforge.net, CC:
Vu Pham <vuhuong@mellanox.com> and Bart Van Assche <bart.vanassche@gmail.com>.