SCSI RDMA Protocol (SRP) Target driver for Linux
=================================================

The SRP target driver has been designed to work on top of the Linux RDMA
kernel drivers -- either the RDMA drivers included with a Linux distribution
or the OFED RDMA drivers. For more information about using the SRP target
driver in combination with OFED, see also README.ofed.

The SRP target driver has been implemented as an SCST driver. This
makes it possible to support a lot of I/O modes on real and virtual
devices. A few examples of supported device handlers are:

1. scst_disk. This device handler implements transparent pass-through
   of SCSI commands and makes it possible to access and export real
   SCSI devices, such as disks, hardware RAID volumes and tape libraries,
   as SRP LUNs.

2. scst_vdisk, either in fileio or in blockio mode. This device handler
   makes it possible to export software RAID volumes, LVM volumes, IDE
   disks, and normal files as SRP LUNs.

3. nullio. The nullio device handler makes it possible to measure the
   performance of the SRP target implementation without performing any
   actual I/O.


Installation
------------

Building and installing the SRP target driver is possible as follows:

   cd ${SCST_DIR}
   if type -p rpm >/dev/null; then
      make -s rpm
      sudo rpm -U rpmbuilddir/RPMS/*/*rpm scstadmin/rpmbuilddir/RPMS/*/*rpm
   else
      make -s scst_clean srpt_clean scst srpt scstadmin
      sudo make -s scst_install srpt_install scstadm_install
   fi

The ib_srpt kernel module supports the following parameters:
* one_target_per_port (boolean) and
* use_node_guid_in_target_name (boolean)
  ib_srpt can operate in one of the following three modes:
  1. Access control configuration per HCA and assigning a "ib_srpt_target_<n>"
     style name to each HCA.
  2. Access control configuration per HCA and referring to a HCA via its node
     GUID (e.g. 0002:c903:0005:f34a).
  3. Access control configuration per HCA port and referring to a HCA via its
     port GID (e.g. fe80:0000:0000:0000:0002:c903:0005:f34b).
  Mode (1) is chosen if both one_target_per_port and
  use_node_guid_in_target_name are false. Mode (2) is chosen if
  one_target_per_port is false and use_node_guid_in_target_name is true. Mode
  (3) is chosen if one_target_per_port is true. This last mode is the
  default mode.
* rdma_cm_port (number)
  A 16-bit number that specifies the port number to be registered via the
  RDMA/CM. Must be specified to make communication over RoCE or iWARP
  possible. If this parameter is zero (the default value) the SRP target
  driver does not register with the RDMA/CM.
* srp_max_req_size (number)
  Maximum size of an SRP control message in bytes. Examples of SRP control
  messages are: login request, logout request, data transfer request, ...
  The larger this parameter, the more scatter/gather list elements can be
  sent at once. Use the following formula to compute an appropriate value
  for this parameter: 68 + 16 * (sg_tablesize). The default value of
  this parameter is 4148, which corresponds to an sg table size of 255.
* srp_max_rsp_size (number)
  Maximum size of an SRP response message in bytes. Sense data is sent back
  via these messages towards the initiator. The default size is 256 bytes.
  With this value there remains (256-36) = 220 bytes for sense data.
* srp_max_rdma_size (number)
  Maximum number of bytes that may be transferred at once via RDMA. Defaults
  to 65536 bytes, which is sufficient to use the full bandwidth of low-latency
  HCAs. Increasing this value may decrease latency for applications
  transferring large amounts of data at once.
* srpt_srq_size (number, default 4095)
  ib_srpt uses a shared receive queue (SRQ) for processing incoming SRP
  requests. This number may have to be increased when a large number of
  initiator systems is accessing a single SRP target system.
* srpt_sq_size (number, default 4096)
  Per-channel InfiniBand send queue size. The default setting is sufficient
  for a credit limit of 128. Changing this parameter to a smaller value may
  cause RDMA requests to be retried and hence may slow down data transfer
  severely.
* trace_flag (unsigned integer, only available in debug builds)
  The individual bits of the trace_flag parameter define which categories of
  trace messages should be sent to the kernel log and which ones not.
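
The sizing formula for srp_max_req_size and the mode selection rules described
above can be sketched in shell as follows. The function names are made up for
this illustration and are not part of ib_srpt; the boolean parameters are
assumed to be passed as 0 or 1.

```shell
# Hypothetical helpers; the names are illustrative and not part of ib_srpt.

# srp_max_req_size SG_TABLESIZE
# Applies the formula from the text: 68 + 16 * sg_tablesize.
srp_max_req_size() {
    echo $((68 + 16 * $1))
}

# srpt_mode ONE_TARGET_PER_PORT USE_NODE_GUID_IN_TARGET_NAME
# Prints the operating mode (1, 2 or 3) for the given values (0 or 1).
srpt_mode() {
    if [ "$1" = 1 ]; then
        echo 3
    elif [ "$2" = 1 ]; then
        echo 2
    else
        echo 1
    fi
}

srp_max_req_size 255   # prints 4148, the default value
srpt_mode 0 1          # prints 2
```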


Configuring the SRP Target System
---------------------------------

The first step is to choose whether access control will be configured per
HCA or per HCA port and to create a modprobe configuration file that reflects
this choice. An example:

  # cat /etc/modprobe.d/ib_srpt.conf
  options ib_srpt one_target_per_port=1

Next, create the file /etc/scst.conf. You can create this file with
the scstadmin tool as follows:

  /etc/init.d/scst stop
  /etc/init.d/scst start

Now configure SCST using scstadmin - see also the scstadmin documentation for
further information. Once finished, save the configuration to /etc/scst.conf:

  scstadmin -write_config /etc/scst.conf  (sysfs version)
or
  scstadmin -WriteConfig /etc/scst.conf   (procfs version)

One can verify the contents of scst.conf e.g. as follows:

  cat /etc/scst.conf

Now verify that loading the configuration from file works correctly:

  /etc/init.d/scst reload

Note: when using InfiniBand, loading the ib_ipoib kernel module and assigning
an IP address to each IPoIB interface is only needed when using the RDMA/CM.
When using the IB/CM, loading the ib_ipoib kernel module is allowed but not
necessary.


Configuring the SRP Initiator System
------------------------------------

First of all, load the SRP kernel module as follows:

   modprobe ib_srp

Next, when using InfiniBand, discover the new SRP target by running the
srp_daemon command:

   for d in /dev/infiniband/umad*; do srp_daemon -oacd$d; done

If you want the initiator system to log in to all SRP targets available in
the same InfiniBand subnet, that is possible as follows (-e = execute):

   for d in /dev/infiniband/umad*; do srp_daemon -oecd$d; done

If you want to let the initiator log in to a specific target you can do that
e.g. as follows:

   echo "id_ext=0002c903000f1366,ioc_guid=0002c903000f1366,dgid=fe800000000000000002c903000f1367,pkey=ffff,service_id=0002c903000f1366" > /sys/class/infiniband_srp/${SRP_HCA_NAME}/add_target

The meaning of the parameters in the above command is as follows:
   * id_ext: must match ioc_guid.
   * ioc_guid: see also the documentation of the ib_srpt ioc_guid parameter.
   * dgid: target HCA port GID to connect to.
   * pkey: IB partition key (P_Key) of the target to connect to.
   * service_id: must match ioc_guid.
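
Since id_ext and service_id must match ioc_guid, the login string can be
assembled from just the IOC GUID, the target port GID and the partition key.
A minimal sketch, in which the function name srp_login_string and the default
of ffff for the partition key are assumptions made for this example:

```shell
# Hypothetical helper, not part of ib_srp or ib_srpt.
# srp_login_string IOC_GUID DGID [PKEY]
# PKEY defaults to ffff here (an assumption; check pkeys/0 in sysfs).
srp_login_string() {
    ioc_guid=$1
    dgid=$2
    pkey=${3:-ffff}
    # id_ext and service_id are set to the same value as ioc_guid.
    printf 'id_ext=%s,ioc_guid=%s,dgid=%s,pkey=%s,service_id=%s\n' \
        "$ioc_guid" "$ioc_guid" "$dgid" "$pkey" "$ioc_guid"
}

srp_login_string 0002c903000f1366 fe800000000000000002c903000f1367
```

To log in, redirect the output of this function into
/sys/class/infiniband_srp/${SRP_HCA_NAME}/add_target.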

When using RoCE or iWARP, log in to the target system to determine the id_ext
and ioc_guid parameters and use these to log in. An example:

    [ target system ]
    # sed 's/,\(pkey\|dgid\|service_id\)=[^,]*//g' $(find /sys/kernel/scst_tgt/targets/ib_srpt -name login_info) | uniq
    id_ext=0002c90300a34270,ioc_guid=0002c90300a34270

    [ initiator system ]
    echo dest=192.168.5.1:5000,id_ext=0002c90300a34270,ioc_guid=0002c90300a34270 \
      >/sys/class/infiniband_srp/srp-mlx4_0-1/add_target
    echo dest=192.168.6.1:5000,id_ext=0002c90300a34270,ioc_guid=0002c90300a34270 \
      >/sys/class/infiniband_srp/srp-mlx4_0-2/add_target
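
The two steps above (strip the InfiniBand-only fields from login_info, then
prepend the destination address) can be combined into one helper. This is
only a sketch: the function name roce_login_string is hypothetical, and the
login_info line used in the example call is a made-up sample whose field
order is an assumption.

```shell
# Hypothetical helper, not part of ib_srp or ib_srpt.
# roce_login_string DEST LOGIN_INFO
# DEST is <IP address>:<rdma_cm_port>; LOGIN_INFO is one login_info line.
roce_login_string() {
    dest=$1
    # Drop the pkey, dgid and service_id fields; keep id_ext and ioc_guid.
    ids=$(printf '%s\n' "$2" | sed -E 's/,(pkey|dgid|service_id)=[^,]*//g')
    printf 'dest=%s,%s\n' "$dest" "$ids"
}

# Sample login_info line (made up for this example).
info="id_ext=0002c90300a34270,ioc_guid=0002c90300a34270,pkey=ffff,dgid=fe800000000000000002c90300a34271,service_id=0002c90300a34270"
roce_login_string 192.168.5.1:5000 "$info"
```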

Initiator port GIDs can be queried e.g. via sysfs:

$ for f in /sys/devices/*/*/*/infiniband/*/ports/*/gids/0; do echo $f; \
cat $f | sed 's/://g'; done
/sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/infiniband/mlx4_0/ports/1/gids/0
fe800000000000000002c9030005f34b
/sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/infiniband/mlx4_0/ports/2/gids/0
fe800000000000000002c9030005f34c
/sys/devices/pci0000:00/0000:00:1c.0/0000:05:00.0/infiniband/mlx4_1/ports/1/gids/0
fe800000000000000002c9030003cca7
/sys/devices/pci0000:00/0000:00:1c.0/0000:05:00.0/infiniband/mlx4_1/ports/2/gids/0
fe800000000000000002c9030003cca8

Finally run lsscsi to display the details of the newly discovered SCSI disks:

   lsscsi

SRP targets can be recognized in the output of lsscsi by looking for
the disk names assigned on the SCST target ("disk01" in the example below):

   [8:0:0:0]    disk    SCST_FIO disk01            102  /dev/sdb


Target names
------------

The name assigned by the ib_srpt target driver to an SCST target is either
ib_srpt_target_<n>, the node GUID of a HCA in hexadecimal form with a colon
after every fourth digit, or the port GID with a colon after every fourth
digit. The HCA node GUID and the port GIDs can be obtained via the
ibv_devinfo command. An example:

# ibv_devinfo -v | grep -E '[^a-z]port:|guid|GID'
node_guid:      0002:c903:0005:f34e
sys_image_guid: 0002:c903:0005:f351
  port: 1
    GID[0]:     fe80:0000:0000:0000:0002:c903:0005:f34f
  port: 2
    GID[0]:     fe80:0000:0000:0000:0002:c903:0005:f350

Once the ib_srpt driver has been loaded the available SCST targets can be
queried as follows:

# (cd /sys/kernel/scst_tgt/targets/ib_srpt && ls -d [0-9a-f]*)
fe80:0000:0000:0000:0002:c903:0005:f34f
fe80:0000:0000:0000:0002:c903:0005:f350
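
Target names use the colon-separated notation while SRP login strings use
plain hexadecimal digits, so converting between both notations is sometimes
needed. A small sketch; the function names are hypothetical:

```shell
# Hypothetical helpers for converting GUID/GID notations.

# gid_strip_colons fe80:0000:... -> fe800000...
gid_strip_colons() {
    printf '%s\n' "$1" | tr -d ':'
}

# gid_add_colons fe800000... -> fe80:0000:...
# Inserts a colon after every fourth hex digit and drops the trailing colon.
gid_add_colons() {
    printf '%s\n' "$1" | sed -E 's/([0-9a-fA-F]{4})/\1:/g; s/:$//'
}

gid_add_colons fe800000000000000002c9030005f34f
gid_strip_colons 0002:c903:0005:f34e
```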


Session names
-------------

The name assigned by the ib_srpt target driver to a session depends on the
mode in which it is operating. If one_target_per_port=y then the source port
GID is used as the session name. If one_target_per_port=n then the 128-bit SRP
initiator port identifier is used as the session name. This identifier is sent
by the SRP initiator to the SRP target via the SRP_LOGIN_REQ information unit.
The Linux SRP initiator (ib_srp) generates the initiator port identifier as
follows:
- The first eight bytes are the identifier extension ('initiator_ext' parameter
  specified in the login string echoed into the sysfs file 'add_target').
- The last eight bytes are the GUID of the initiator HCA port used to
  communicate with the target.

An example:

[ INITIATOR ]

$ for f in /sys/devices/*/*/*/infiniband/*/ports/*/gids/0; do echo $f; \
cat $f; done
/sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/infiniband/mlx4_0/ports/1/gids/0
fe80:0000:0000:0000:0002:c903:0005:f34b
/sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/infiniband/mlx4_0/ports/2/gids/0
fe80:0000:0000:0000:0002:c903:0005:f34c
/sys/devices/pci0000:00/0000:00:1c.0/0000:05:00.0/infiniband/mlx4_1/ports/1/gids/0
fe80:0000:0000:0000:0002:c903:0003:cca7
/sys/devices/pci0000:00/0000:00:1c.0/0000:05:00.0/infiniband/mlx4_1/ports/2/gids/0
fe80:0000:0000:0000:0002:c903:0003:cca8

[ TARGET, after login ]

$ (cd /sys/kernel/scst_tgt/targets/ib_srpt/[0-9a-f]* && ls -d sessions/*)
sessions/fe80:0000:0000:0000:0002:c903:0003:cca7
sessions/fe80:0000:0000:0000:0002:c903:0005:f34b
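
For the one_target_per_port=n case, the composition of the 128-bit initiator
port identifier described above can be sketched as follows. The function name
is hypothetical, and the sketch assumes that the port GUID equals the last
eight bytes of GID 0 of the initiator port. It only illustrates which bytes
end up where; it does not claim to reproduce the exact textual form of the
session name.

```shell
# Hypothetical helper, not part of ib_srp.
# srp_initiator_port_id INITIATOR_EXT PORT_GID
# INITIATOR_EXT: 16 hex digits; PORT_GID: GID 0 of the initiator port.
srp_initiator_port_id() {
    gid_hex=$(printf '%s' "$2" | tr -d ':')
    # Assumption: the last 16 hex digits of GID 0 are the port GUID.
    port_guid=$(printf '%s' "$gid_hex" | sed -E 's/^.*(.{16})$/\1/')
    # First eight bytes: identifier extension; last eight bytes: port GUID.
    printf '%s%s\n' "$1" "$port_guid"
}

srp_initiator_port_id 0000000000000000 fe80:0000:0000:0000:0002:c903:0005:f34b
```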


LUN masking
-----------

In a straightforward configuration every LUN is visible to every initiator.
It is possible however to make a different set of LUNs visible to each
initiator by using the LUN masking feature of SCST. SRP initiators are
identified by their session name (see above). An example of an scst.conf
file using LUN masking for ib_srpt:

TARGET_DRIVER ib_srpt {
        TARGET fe80:0000:0000:0000:0002:c903:0005:f34b {
                enabled 1
                rel_tgt_id 1

                # LUNs visible by all initiators not listed below
                LUN 0 disk01

                GROUP grp1 {
                        # LUNs visible by initiator system 1
                        LUN 0 disk02

                        INITIATOR fe80:0000:0000:0000:0002:c903:0005:f34b
                }

                GROUP grp2 {
                        # LUNs visible by initiator system 2
                        LUN 0 disk03

                        INITIATOR fe80:0000:0000:0000:0002:c903:0005:f34c
                }
        }
}
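
When many initiators have to be configured, the GROUP sections of such a file
can be generated instead of typed by hand. A minimal sketch, assuming one LUN
and one initiator per group; the function name scst_group_stanza is made up
for this example and is not part of scstadmin:

```shell
# Hypothetical generator for one GROUP section of scst.conf.
# scst_group_stanza GROUP LUN DEVICE INITIATOR
scst_group_stanza() {
    printf '                GROUP %s {\n' "$1"
    printf '                        LUN %s %s\n' "$2" "$3"
    printf '\n'
    printf '                        INITIATOR %s\n' "$4"
    printf '                }\n'
}

scst_group_stanza grp1 0 disk02 fe80:0000:0000:0000:0002:c903:0005:f34b
```

The generated text has to be placed inside the TARGET section by hand; the
result should be checked by reloading the configuration as described earlier.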


Adding and Removing LUNs Dynamically
------------------------------------

It is possible to add and/or remove LUNs on the target without restarting
target or initiator. This can be done either via scstadmin or directly via the
sysfs interface. Although the SCST core will notify the initiator about LUN
changes, Linux initiators will ignore these notifications. To bring a Linux
initiator back in sync after a LUN change, the initiator has to be told to
rescan SCSI devices. Rescanning SCSI devices is possible e.g. via the
rescan-scsi-bus.sh script that can be found here:
http://www.garloff.de/kurt/linux/#rescan-scsi. An example:

$ rescan-scsi-bus --hosts=${srp_host_id} --channels=0 --ids=0 --luns=0-31
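
If the rescan script is not available, a rescan can also be triggered through
the scan attribute that the Linux kernel exposes for each SCSI host. This is
an alternative to the script named above, not something described elsewhere in
this README. The function name and the optional SYSFS_ROOT argument (only
there to make dry runs possible) are assumptions of this sketch:

```shell
# Hypothetical helper for rescanning one SCSI host via sysfs.
# scsi_rescan_host HOST_ID [SYSFS_ROOT]
scsi_rescan_host() {
    root=${2:-/sys}
    # "- - -" is a wildcard for channel, target id and LUN.
    echo "- - -" > "$root/class/scsi_host/host$1/scan"
}

# Example (as root): scsi_rescan_host "${srp_host_id}"
```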


InfiniBand Partitions
---------------------

Just like VLANs make it possible to segment traffic on an Ethernet network,
partitions make it possible to segment traffic on an InfiniBand network. Each
InfiniBand partition is identified by a partition key, which is a 16-bit
number. During fabric initialization the subnet manager assigns one or more
partition keys to each InfiniBand port. For opensm, partitions are defined in
/etc/opensm/partitions.conf. ib_srpt uses the partition with index 0. Which
partition key corresponds to index 0 can be found out by querying sysfs:

$ head /sys/class/infiniband/*/ports/*/pkeys/0
==> /sys/class/infiniband/mlx4_0/ports/1/pkeys/0 <==
0xffff

==> /sys/class/infiniband/mlx4_0/ports/2/pkeys/0 <==
0xffff
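
The value read from sysfs carries a 0x prefix, while the pkey parameter in the
SRP login string shown earlier is written without it. The sketch below
converts between the two and reports the membership type encoded in the most
significant bit of the partition key (set means full membership, clear means
limited membership). The function names are hypothetical.

```shell
# Hypothetical helpers for working with partition keys.

# pkey_for_login 0xffff -> ffff
pkey_for_login() {
    printf '%x\n' "$(($1))"
}

# pkey_membership 0xffff -> full, pkey_membership 0x7fff -> limited
pkey_membership() {
    if [ $(($1 & 0x8000)) -ne 0 ]; then
        echo full
    else
        echo limited
    fi
}

pkey_for_login 0xffff
pkey_membership 0xffff
```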


High availability
-----------------

If there are redundant paths in the IB network between initiator and target,
automatic path failover can be set up on the initiator as follows:
* Edit /etc/infiniband/openib.conf to load the SRP driver and SRP HA daemon
  automatically: set SRP_LOAD=yes and SRPHA_ENABLE=yes.
* To set up and use the high availability feature you need the dm-multipath
  driver and multipath tool.
* Please refer to the OFED-1.x user manual for more detailed instructions
  on how to enable and how to use the HA feature. See e.g.
  http://www.mellanox.com/related-docs/prod_software/Mellanox_OFED%20_Linux_user_manual_1_5_1_2.pdf.

A setup with automatic failover between redundant targets is possible by
installing and configuring DRBD on both targets. If the initiator system
supports mirroring (e.g. Linux), you can use the following approach:
* Configure DRBD in Active/Active mode.
* Configure the initiator(s) for mirroring between the redundant targets.
If the initiator system does not support mirroring (e.g. VMware ESX), you
can use the following approach:
* Configure DRBD in Active/Passive mode and enable STONITH mode in the
  Heartbeat software.

For more information, see also:
* http://www.drbd.org/
* http://www.linux-ha.org/wiki/Main_Page


Performance Notes - Target Side
-------------------------------

* Building the SCST core and the ib_srpt target driver in release mode
  improves performance compared to debug mode.

* When using high-latency storage devices (hard disks), the default value
  chosen by SCST for DEVICE.threads_num should be fine. When using
  low-latency storage devices though (SSDs), DEVICE.threads_num should be set
  to 1 or 2 in /etc/scst.conf in order to reach optimal performance for small
  block sizes (e.g. 4 KB).

* When multiple InfiniBand HCAs are present in a target system, the Linux
  kernel by default will assign the associated interrupt handlers to CPU 0.
  Even irqbalance will often assign the interrupt handlers of multiple HCAs
  to the same CPU. That is unfortunate because it leads to unfair handling of
  SRP sessions. The solution is to assign InfiniBand HCA interrupts manually
  to different CPUs. That is possible by looking up the InfiniBand interrupt
  numbers in /proc/interrupts and by writing proper bitmasks into
  /proc/irq/<n>/smp_affinity.
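
The bitmask written into smp_affinity is a hexadecimal number in which bit
<c> selects CPU <c>. A minimal sketch for computing the mask of a single CPU;
the function name is made up for this example, and the sketch only covers CPU
numbers below 32:

```shell
# Hypothetical helper: hexadecimal affinity mask for one CPU (0..31).
# cpu_affinity_mask CPU
cpu_affinity_mask() {
    printf '%x\n' "$((1 << $1))"
}

cpu_affinity_mask 0
cpu_affinity_mask 3
# Example (as root), with <n> an interrupt number from /proc/interrupts:
#   cpu_affinity_mask 3 > /proc/irq/<n>/smp_affinity
```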


Performance Notes - Initiator Side
----------------------------------

* Choose a proper value for the ib_srp kernel module parameter
  cmd_sg_entries. The default value 12 works well for buffered reads, while
  the throughput for write-dominated workloads improves by changing this value
  to 255. One way to set this kernel module parameter is as follows:

  echo options ib_srp cmd_sg_entries=255 >>/etc/modprobe.d/ib_srp.conf

* For multithreaded workloads using small block sizes, changing rq_affinity
  to 2 improves IOPS significantly (Linux kernel 3.1 and later; see also
  commit 5757a6d76cdf6dda2a492c09b985c015e86779b1).

* For latency-sensitive applications, using the noop scheduler at the initiator
  side can give significantly better results than other schedulers.

* The SRP initiator limits by default the queue depth to 64 commands. If your
  workload benefits from a larger queue depth, enlarge the queue depth by
  setting the max_cmd_per_lun and queue_size parameters in the SRP login
  string.

* The following parameters have a small but measurable impact on SRP
  performance:
  * /sys/class/block/${dev}/queue/rotational
  * /sys/class/block/${dev}/queue/rq_affinity
  * /proc/irq/${ib_int_no}/smp_affinity


Performance Notes - Both Sides
------------------------------

* Disabling CONFIG_SCHED_DEBUG and CONFIG_SCHEDSTATS in the kernel config
  improves performance.

* Disable CONFIG_IRQSOFF_TRACER so that CONFIG_TRACE_IRQFLAGS is disabled too.

* Consider which memory allocator to use. With recent kernels using the SLUB
  memory allocator instead of SLAB may help. On multi-socket systems the SLAB
  memory allocator may result in better performance. Please note that SLAB is
  tunable while SLUB is not. See also http://lkml.org/lkml/2010/7/9/264 and
  http://www.ibm.com/developerworks/linux/library/l-linux-slab-allocator/.
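
Whether the options named above are enabled can be checked against the
configuration file of the running kernel. A minimal sketch; the function
names are hypothetical, and the location of the configuration file (often
/boot/config-$(uname -r)) depends on the distribution:

```shell
# Hypothetical helpers for checking a kernel configuration file.

# kconfig_enabled OPTION CONFIG_FILE
# Succeeds if OPTION is set to y or m in CONFIG_FILE.
kconfig_enabled() {
    grep -q "^$1=[ym]" "$2"
}

# check_perf_kconfig CONFIG_FILE
# Prints every performance-relevant debug option that is still enabled.
check_perf_kconfig() {
    for opt in CONFIG_SCHED_DEBUG CONFIG_SCHEDSTATS \
               CONFIG_IRQSOFF_TRACER CONFIG_TRACE_IRQFLAGS; do
        if kconfig_enabled "$opt" "$1"; then
            echo "$opt is enabled"
        fi
    done
}

# Example: check_perf_kconfig "/boot/config-$(uname -r)"
```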


Frequently Asked Questions
--------------------------

Q: Every now and then "SRP abort called" and "SRP reset_device called"
   messages are logged at the initiator side. Around the same time I see the
   following message in the target log: "ib_srpt: ***ERROR***: Command ...: IB
   completion for idx ... has not been received in time (SRPT command state
   ...)". What do these messages mean and how can I fix this?

A: This means that a timeout occurred while a HCA was waiting for an
   acknowledgment message. Check the IB network for bad IB cables, bad HCAs
   and/or bad switch ports. Also make sure that the HCA firmware is up to
   date.

Q: Loading the kernel module ib_srpt triggers a kernel panic with a call trace
   like the one below. What is the cause of this and how can this be solved?

   Call Trace:
    [<ffffffffa02f2a50>] srpt_alloc_ioctx+0x60/0xb0 [ib_srpt]
    [<ffffffffa02f2f0a>] srpt_alloc_ioctx_ring+0xea/0x1e0 [ib_srpt]
    [<ffffffffa02f32e9>] srpt_add_one+0x2e9/0x670 [ib_srpt]
    [<ffffffffa015a480>] ib_register_client+0x80/0xa0 [ib_core]
    [<ffffffffa02421eb>] srpt_init_module+0x1eb/0x235 [ib_srpt]
    [<ffffffff81000344>] do_one_initcall+0x34/0x1a0
    [<ffffffff8107a63c>] sys_init_module+0xdc/0x260
    [<ffffffff81002e3b>] system_call_fastpath+0x16/0x1b

A: This means that you are using a system on which OFED has been installed but
   that ib_srpt has been compiled against the in-tree kernel headers instead
   of the OFED kernel headers. You can fix this by rebuilding ib_srpt against
   the OFED kernel headers. The ib_srpt makefile should detect the OFED kernel
   headers automatically - at least if ib_srpt is built after OFED has been
   installed.


Feedback
--------

Send questions about this driver to scst-devel@lists.sourceforge.net.