Bart Van Assche ed195d5e4f Added more comments.
git-svn-id: http://svn.code.sf.net/p/scst/svn/trunk@1939 d57e44dd-8a1f-0410-8b47-8ef2f437770f
2010-08-06 17:28:04 +00:00

SCSI RDMA Protocol (SRP) Target driver for Linux
=================================================

The SRP target driver has been designed to work on top of the Linux
InfiniBand kernel drivers -- either the InfiniBand drivers included
with a Linux distribution or the OFED InfiniBand drivers. For more
information about using the SRP target driver in combination with
OFED, see also README.ofed.

The SRP target driver has been implemented as an SCST target driver.
This makes it possible to support a wide range of I/O modes on real and
virtual devices. A few examples of supported device handlers are:

1. scst_disk. This device handler implements transparent pass-through
   of SCSI commands and allows SRP to access and export real
   SCSI devices, e.g. disks, hardware RAID volumes and tape libraries,
   as SRP LUNs.

2. scst_vdisk, either in fileio or in blockio mode. This device handler
   makes it possible to export software RAID volumes, LVM volumes, IDE
   disks, and regular files as SRP LUNs.

3. nullio. The nullio device handler makes it possible to measure the
   performance of the SRP target implementation without performing any
   actual I/O.


Installation
------------

Proceed as follows to compile and install the SRP target driver:

1. To minimize QUEUE_FULL conditions, apply the
   scst_increase_max_tgt_cmds patch as follows:

   cd ${SCST_DIR}
   patch -p0 < srpt/patches/scst_increase_max_tgt_cmds.patch

   This patch increases SCST's per-device queue size from 48 to 64. This
   helps to avoid QUEUE_FULL conditions because the transmit queue of the
   Linux SRP initiator is also 64 entries deep.

   Note: the SCSI layer of kernel 2.6.33 will have dynamic queue depth
   adjustment. When using SRP initiator systems with kernel 2.6.33 or later,
   this patch is less important.

2. Now compile and install SRPT:

   cd ${SCST_DIR}
   make -s scst_clean scst scst_install
   make -s srpt_clean srpt srpt_install
   make -s scstadm scstadm_install

3. Edit the installed file /etc/init.d/scst and add ib_srpt to the
   SCST_MODULES variable.

4. Configure SCST such that it will be started during system boot:

   chkconfig scst on
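After step 3, the modules line in /etc/init.d/scst might look like the
following. The modules other than ib_srpt are illustrative; keep whatever
your installation already lists and append ib_srpt:

```shell
# Illustrative fragment of /etc/init.d/scst; only the ib_srpt
# addition matters here, the rest depends on your configuration:
SCST_MODULES="scst scst_vdisk ib_srpt"
```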

The ib_srpt kernel module supports the following parameters:
* srp_max_message_size (unsigned integer)
  Maximum size of an SRP control message in bytes. Examples of SRP control
  messages are: login request, logout request, data transfer request, ...
  The larger this parameter, the more scatter/gather list elements can be
  sent at once. Use the following formula to compute an appropriate value
  for this parameter: 68 + 16 * (max_sg_elem_count). The default value of
  this parameter is 2116, which corresponds to an sg list with 128 elements.
* srp_max_rdma_size (unsigned integer)
  Maximum number of bytes that may be transferred at once via RDMA. Defaults
  to 65536 bytes, which is sufficient to use the full bandwidth of low-latency
  HCAs such as Mellanox's ConnectX series. Increasing this value may decrease
  latency for applications that transfer large amounts of data at once via
  direct I/O.
* thread (0 or 1)
  Whether incoming SRP requests are processed in the context of the IB
  interrupt triggered by the request (thread=0) or in the context of a
  separate kernel thread (thread=1). thread=0 yields the best performance,
  while thread=1 makes debugging easier: if a kernel oops is triggered
  inside an interrupt handler, the system is halted, so the call trace
  associated with the oops is never written to the kernel log in
  /var/log/messages. With thread=1, however, the SRPT code runs in thread
  context. An oops generated in thread context only kills the offending
  thread; the other threads keep running and the call trace is written to
  the on-disk kernel log.
* trace_flag (unsigned integer, only available in debug builds)
  The individual bits of the trace_flag parameter define which categories
  of trace messages are sent to the kernel log and which are not.
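
As a quick check of the srp_max_message_size formula above, the value for a
hypothetical configuration with 256 scatter/gather elements can be computed
in the shell. The modprobe line is illustrative and must be run as root:

```shell
# srp_max_message_size = 68 + 16 * max_sg_elem_count
# For a hypothetical sg list of 256 elements:
msg_size=$((68 + 16 * 256))
echo ${msg_size}    # prints 4164
# modprobe ib_srpt srp_max_message_size=${msg_size}   # illustrative; run as root
```

With the default of 128 elements the same formula yields 68 + 16 * 128 = 2116,
matching the documented default.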


Configuring the SRP Target System
---------------------------------

First of all, create the file /etc/scst.conf. The example below shows how
to create this file with the scstadmin tool:

  /etc/init.d/scst stop
  /etc/init.d/scst start

  scstadmin -ClearConfig /etc/scst.conf
  scstadmin -adddev disk01 -path /dev/ram0 -handler vdisk -options NV_CACHE
  scstadmin -adddev disk02 -path /dev/ram1 -handler vdisk -options NV_CACHE
  scstadmin -assigndev disk01 -group Default -lun 0
  scstadmin -assigndev disk02 -group Default -lun 1
  scstadmin -assigndev 4:0:0:0 -group Default -lun 2
  scstadmin -WriteConfig /etc/scst.conf
  cat /etc/scst.conf

Now load the new configuration:

  /etc/init.d/scst reload


Configuring the SRP Initiator System
------------------------------------

First of all, load the SRP kernel module as follows:

   modprobe ib_srp

Next, discover the new SRP target by running the ibsrpdm command:

   ibsrpdm -c

Now let the initiator system log in to the target system:

   ibsrpdm -c | while read target_info; do
       echo "${target_info}" > \
           /sys/class/infiniband_srp/${SRP_HCA_NAME}/add_target
   done

Finally run lsscsi to display the details of the newly discovered SCSI disks:

   lsscsi

SRP targets can be recognized in the output of lsscsi by looking for
the disk names assigned on the SCST target ("disk01" in the example below):

   [8:0:0:0]    disk    SCST_FIO disk01            102  /dev/sdb

Notes:
* You can edit /etc/infiniband/openib.conf to load the srp driver and the
  srp HA daemon automatically, i.e. set SRP_LOAD=yes and SRPHA_ENABLE=yes.
* To set up and use the high availability feature you need the dm-multipath
  driver and the multipath tool.
* Please refer to the OFED-1.x user manual for more detailed instructions
  on how to enable and how to use the HA feature. See e.g. http://www.mellanox.com/related-docs/prod_software/Mellanox_OFED_Linux_user_manual_1_40_1.pdf.


Performance Notes - Initiator Side
----------------------------------

* For latency-sensitive applications, using the noop scheduler on the
  initiator side can give significantly better results than other schedulers.

* The following parameters have a small but measurable impact on SRP
  performance:
  * /sys/class/block/${dev}/queue/rq_affinity
  * /proc/irq/${ib_int_no}/smp_affinity
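
  Both knobs take simple integer values; smp_affinity in particular is a
  hexadecimal CPU bitmask. A minimal sketch of computing and applying such a
  mask (the IRQ number and CPU choice are placeholders, not recommendations):

```shell
# Build the hex bitmask for CPUs 0 and 2: (1 << 0) | (1 << 2) = 0x5
printf '%x\n' $(( (1 << 0) | (1 << 2) ))    # prints 5
# As root, apply it to the HCA's interrupt (IRQ number is system-specific):
# echo 5 > /proc/irq/${ib_int_no}/smp_affinity
# And steer block-layer completions toward the submitting CPU:
# echo 1 > /sys/class/block/${dev}/queue/rq_affinity
```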


Performance Notes - Target Side
-------------------------------

* In some cases, e.g. when working with SSD devices whose internal threads
  consume 100% of a single CPU for data transfers, it may be necessary to
  assign dedicated CPUs to those threads via the Linux CPU affinity
  facilities in order to maximize IOPS. No IRQ processing should be done
  on those CPUs; check this with /proc/interrupts. See the taskset command
  and Documentation/IRQ-affinity.txt in your kernel's source tree for how
  to assign CPU affinity to tasks and IRQs.

  The reason for this is that processing of incoming commands in SIRQ
  context can happen on the same CPUs on which the SSD devices' threads
  are doing data transfers. As a result, those threads do not receive all
  the CPU power they need and perform worse.

  As an alternative to CPU affinity assignment, you can try enabling the
  SRP target's internal thread. This allows the Linux CPU scheduler to
  distribute the load better among the available CPUs. To enable the SRP
  target driver's internal thread, load the ib_srpt module with the
  parameter "thread=1".
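
  As a sketch of the affinity assignment described above, the following pins
  a hypothetical SSD worker thread to CPUs 2 and 3 with taskset (the PID is
  a placeholder):

```shell
# CPUs 2 and 3 correspond to the bitmask (1 << 2) | (1 << 3) = 0xc:
mask=$(printf '0x%x' $(( (1 << 2) | (1 << 3) )))
echo ${mask}    # prints 0xc
# taskset -p ${mask} 1234   # 1234 is a placeholder PID; run as root
```

  Keep IRQ processing off CPUs 2 and 3 by excluding them from the
  smp_affinity masks of the relevant interrupts, and verify the result in
  /proc/interrupts.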


Send questions about this driver to scst-devel@lists.sourceforge.net, CC:
Vu Pham <vuhuong@mellanox.com> and Bart Van Assche <bart.vanassche@gmail.com>.