SCSI RDMA Protocol (SRP) Target driver for Linux
================================================

The SRP target driver has been designed to work on top of the Linux
InfiniBand kernel drivers -- either the InfiniBand drivers included
with a Linux distribution or the OFED InfiniBand drivers. For more
information about using the SRP target driver in combination with
OFED, see also README.ofed.

The SRP target driver has been implemented as an SCST driver. This
makes it possible to support many I/O modes on real and virtual
devices. A few examples of supported device handlers are:

1. scst_disk. This device handler implements transparent pass-through
   of SCSI commands and allows SRP to access and export real SCSI
   devices, i.e. disks, hardware RAID volumes and tape libraries, as
   SRP LUNs.

2. scst_vdisk, either in fileio or in blockio mode. This device handler
   makes it possible to export software RAID volumes, LVM volumes, IDE
   disks, and regular files as SRP LUNs.

3. nullio. The nullio device handler makes it possible to measure the
   performance of the SRP target implementation without performing any
   actual I/O.


Installation
------------

Proceed as follows to compile and install the SRP target driver:

1. To minimize QUEUE_FULL conditions, apply the
   scst_increase_max_tgt_cmds patch as follows:

   cd ${SCST_DIR}
   patch -p0 < srpt/patches/scst_increase_max_tgt_cmds.patch

   This patch increases SCST's per-device queue size from 48 to 64. This
   helps to avoid QUEUE_FULL conditions because the size of the transmit
   queue in Linux' SRP initiator is also 64.

   Note: the SCSI layer of kernel 2.6.33 will have dynamic queue depth
   adjustment. When using SRP initiator systems with kernel 2.6.33 or
   later, this patch is less important.

2. Now compile and install SRPT:

   cd ${SCST_DIR}
   make -s scst_clean scst scst_install
   make -s srpt_clean srpt srpt_install
   make -s scstadm scstadm_install

3. Edit the installed file /etc/init.d/scst and add ib_srpt to the
   SCST_MODULES variable.

4. Configure SCST such that it will be started during system boot:

   chkconfig scst on

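As an illustration of step 3, the SCST_MODULES line in /etc/init.d/scst
would end up looking something like the fragment below. The exact set of
modules varies per installation; the module names other than ib_srpt are
just an example:

```shell
# Hypothetical SCST_MODULES line in /etc/init.d/scst after adding ib_srpt;
# which other modules are listed depends on the device handlers you use.
SCST_MODULES="scst scst_vdisk ib_srpt"
```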

The ib_srpt kernel module supports the following parameters:

* srp_max_message_size (unsigned integer)
  Maximum size of an SRP control message in bytes. Examples of SRP control
  messages are: login request, logout request, data transfer request, ...
  The larger this parameter, the more scatter/gather list elements can be
  sent at once. Use the following formula to compute an appropriate value
  for this parameter: 68 + 16 * (max_sg_elem_count). The default value of
  this parameter is 2116, which corresponds to an sg list with 128 elements.

* srp_max_rdma_size (unsigned integer)
  Maximum number of bytes that may be transferred at once via RDMA. Defaults
  to 65536 bytes, which is sufficient to use the full bandwidth of low-latency
  HCAs such as Mellanox' ConnectX series. Increasing this value may decrease
  latency for applications transferring large amounts of data at once via
  direct I/O.

* thread (0 or 1)
  Whether incoming SRP requests will be processed in the context of the IB
  interrupt that was triggered by the request (thread=0) or in the context
  of a separate thread (thread=1). The choice thread=0 results in the best
  performance, while thread=1 makes debugging easier. If a kernel oops is
  triggered inside an interrupt handler, the system will be halted. As a
  result, the call trace associated with the kernel oops will not be written
  to the kernel log in /var/log/messages. When using thread=1 however, the
  SRPT code runs in thread context. Any kernel oops generated in thread
  context will cause only the offending thread to be killed. Other threads
  will keep running and call traces will be written to the on-disk kernel
  log.

* trace_flag (unsigned integer, only available in debug builds)
  The individual bits of the trace_flag parameter define which categories of
  trace messages are sent to the kernel log and which ones are suppressed.

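As a sanity check of the srp_max_message_size formula above, shell
arithmetic confirms that the default value of 2116 indeed corresponds to a
scatter/gather list with 128 elements:

```shell
# Message size needed for a given scatter/gather element count,
# per the formula 68 + 16 * max_sg_elem_count given above.
max_sg_elem_count=128
srp_max_message_size=$((68 + 16 * max_sg_elem_count))
echo "$srp_max_message_size"   # prints 2116
```

The parameters themselves are set at module load time, e.g.
"modprobe ib_srpt thread=1", using the parameter names listed above.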

Configuring the SRP Target System
---------------------------------

First of all, create the file /etc/scst.conf. The example below shows how
this file can be created with the scstadmin tool:

   /etc/init.d/scst stop
   /etc/init.d/scst start

   scstadmin -ClearConfig /etc/scst.conf
   scstadmin -adddev disk01 -path /dev/ram0 -handler vdisk -options NV_CACHE
   scstadmin -adddev disk02 -path /dev/ram1 -handler vdisk -options NV_CACHE
   scstadmin -assigndev disk01 -group Default -lun 0
   scstadmin -assigndev disk02 -group Default -lun 1
   scstadmin -assigndev 4:0:0:0 -group Default -lun 2
   scstadmin -WriteConfig /etc/scst.conf
   cat /etc/scst.conf

Now load the new configuration:

   /etc/init.d/scst reload


Configuring the SRP Initiator System
------------------------------------

First of all, load the SRP kernel module as follows:

   modprobe ib_srp

Next, discover the new SRP target by running the ibsrpdm command:

   ibsrpdm -c

Now let the initiator system log in to the target system:

   ibsrpdm -c | while read target_info; do
       echo "${target_info}" > /sys/class/infiniband_srp/${SRP_HCA_NAME}/add_target
   done

Finally, run lsscsi to display the details of the newly discovered SCSI disks:

   lsscsi

SRP targets can be recognized in the output of lsscsi by looking for
the disk names assigned on the SCST target ("disk01" in the example below):

   [8:0:0:0] disk SCST_FIO disk01 102 /dev/sdb

Notes:
* You can edit /etc/infiniband/openib.conf to load the SRP driver and the
  SRP HA daemon automatically, i.e. set SRP_LOAD=yes and SRPHA_ENABLE=yes.
* To set up and use the high availability feature you need the dm-multipath
  driver and the multipath tool.
* Please refer to the OFED-1.x user manual for more detailed instructions
  on how to enable and use the HA feature. See e.g.
  http://www.mellanox.com/related-docs/prod_software/Mellanox_OFED_Linux_user_manual_1_40_1.pdf.


Performance Notes - Initiator Side
----------------------------------

* For latency-sensitive applications, using the noop scheduler on the
  initiator side can give significantly better results than the other
  schedulers.

* The following parameters have a small but measurable impact on SRP
  performance:
  * /sys/class/block/${dev}/queue/rq_affinity
  * /proc/irq/${ib_int_no}/smp_affinity

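As a sketch of how these knobs can be applied (the device name sdb and the
IRQ number 42 below are placeholders; look up the actual IB interrupt
number in /proc/interrupts):

```shell
# Select the noop scheduler for the SRP block device (assumed to be sdb).
echo noop > /sys/class/block/sdb/queue/scheduler

# Complete block requests on the CPU that submitted them.
echo 1 > /sys/class/block/sdb/queue/rq_affinity

# Restrict the IB interrupt (IRQ 42 is a placeholder) to CPU 0 (mask 0x1).
echo 1 > /proc/irq/42/smp_affinity
```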

Performance Notes - Target Side
-------------------------------

* In some cases, for instance when working with SSD devices whose internal
  data transfer threads consume 100% of a single CPU, it can be necessary
  for maximal IOPS to assign dedicated CPUs to those threads using the
  Linux CPU affinity facilities. No IRQ processing should be done on those
  CPUs; check this via /proc/interrupts. See the taskset command and
  Documentation/IRQ-affinity.txt in your kernel's source tree for how to
  assign CPU affinity to tasks and IRQs.

  The reason for this is that processing of incoming commands in SIRQ
  context can be done on the same CPUs on which the SSD devices' threads
  are doing data transfers. As a result, those threads won't receive all
  the CPU power and will perform worse.

  As an alternative to CPU affinity assignment, you can try to enable the
  SRP target's internal thread. This allows the Linux CPU scheduler to
  distribute the load better among the available CPUs. To enable the SRP
  target driver's internal thread, load the ib_srpt module with the
  parameter "thread=1".

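The affinity assignment described above could look like this (the PID 1234
and the IRQ number 42 are placeholders for the SSD device's transfer thread
and the IB interrupt, respectively):

```shell
# Dedicate CPUs 2 and 3 to the SSD device's data transfer thread
# (replace 1234 with the actual thread PID, e.g. from ps or top).
taskset -p -c 2,3 1234

# Keep IRQ processing away from those CPUs: route the IB interrupt
# (IRQ 42 is a placeholder) to CPUs 0 and 1 only (mask 0x3).
echo 3 > /proc/irq/42/smp_affinity
```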

Send questions about this driver to scst-devel@lists.sourceforge.net, CC:
Vu Pham <vuhuong@mellanox.com> and Bart Van Assche <bart.vanassche@gmail.com>.