svn+ssh://yanb123@svn.code.sf.net/p/scst/svn/trunk
........
r5246 | vlnb | 2014-01-29 05:30:24 +0200 (Wed, 29 Jan 2014) | 3 lines
Put CDB control byte parsing in one place
........
r5247 | vlnb | 2014-01-29 06:16:58 +0200 (Wed, 29 Jan 2014) | 3 lines
Better version of the previous patch
........
r5248 | vlnb | 2014-01-30 03:40:48 +0200 (Thu, 30 Jan 2014) | 7 lines
[PATCH 1/2] scst_sysfs: Make it easier to add new target sysfs attributes
This patch does not change any functionality.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
........
r5249 | vlnb | 2014-01-30 03:41:54 +0200 (Thu, 30 Jan 2014) | 10 lines
[PATCH 2/2] scst_sysfs: Add I/O statistics per target
Although it is possible to obtain these statistics by iterating over
all sessions and by computing the sum of the per-target statistics,
make per-target statistics directly available such that these can be
retrieved easily.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
........
r5250 | vlnb | 2014-01-30 04:32:44 +0200 (Thu, 30 Jan 2014) | 3 lines
Update for 3.13 kernels
........
r5251 | bvassche | 2014-01-30 11:16:27 +0200 (Thu, 30 Jan 2014) | 1 line
nightly build: Add kernel 3.13 build infrastructure
........
r5252 | bvassche | 2014-01-30 11:30:18 +0200 (Thu, 30 Jan 2014) | 1 line
scripts/kernel-functions: Add a bug fix for the kernel 3.13 series that is not yet present in the kernel 3.13 stable series
........
r5253 | bvassche | 2014-01-30 11:31:32 +0200 (Thu, 30 Jan 2014) | 1 line
nightly build: Add kernel version 3.13.1
........
r5254 | vlnb | 2014-01-31 04:32:02 +0200 (Fri, 31 Jan 2014) | 10 lines
scst_pres: Simplify PR locking
Since the time during which a PR read or write lock is held is short,
use a mutex to implement PR read and write locking. So although this
patch excludes multiple simultaneous readers that shouldn't affect the
time needed to process a PR operation measurably.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
........
r5255 | vlnb | 2014-01-31 04:33:11 +0200 (Fri, 31 Jan 2014) | 5 lines
scst_vdisk: Check that "filename" is specified at most once
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
........
r5256 | vlnb | 2014-01-31 04:35:20 +0200 (Fri, 31 Jan 2014) | 5 lines
scst_vdisk: Sort "add_device_parameters" alphabetically
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
........
r5260 | bvassche | 2014-02-03 11:03:03 +0200 (Mon, 03 Feb 2014) | 1 line
scripts/list-source-files: Handle Mercurial subdirectories properly
........
r5264 | bvassche | 2014-02-06 14:46:51 +0200 (Thu, 06 Feb 2014) | 9 lines
scst_local: Fix a kernel oops for kernel versions < 2.6.37
Avoid that scst_local triggers "BUG: unable to handle kernel NULL
pointer dereference" on kernel versions before 2.6.37. This patch
fixes a regression introduced via patch "scst_local: Avoid
deadlock during module removal with kernel 3.6" (trunk r4566).
Reported-by: Sebastian Herbszt <herbszt@gmx.de>
........
r5266 | bvassche | 2014-02-06 15:30:06 +0200 (Thu, 06 Feb 2014) | 14 lines
Hush Coverity warning of scst_ws_push_single_write() uninitialized pointer
Coverity warns that sgv may be used uninitialized. The warning
applies to WRITE SAME commands with LBDATA == PBDATA == 0 (replicate
a single block of user data into the specified LBA range).
The warning appears to be spurious - when LBDATA == PBDATA == 0,
scst_ws_write_cmd_finished() will not use the uninitialized value
saved by scst_ws_push_single_write().
Move initialization of sgv earlier in the function to quiesce the warning.
Signed-off-by: Steven J. Magnani <steve@digidescorp.com>
........
r5267 | bvassche | 2014-02-06 15:38:28 +0200 (Thu, 06 Feb 2014) | 7 lines
qla2x00t: Re-sync help text with the code
The ql2xfdmienable module parameter defaults to 1, but the help text
claims it defaults to zero.
Signed-off-by: Steven J. Magnani <steve@digidescorp.com>
........
r5268 | bvassche | 2014-02-06 16:17:49 +0200 (Thu, 06 Feb 2014) | 6 lines
ib_srpt: Avoid that disabling a target triggers a race condition
Avoid that disabling a target triggers a race condition with
SRP relogin. At least in theory this race condition could result
in a kernel crash.
........
r5269 | bvassche | 2014-02-07 09:31:38 +0200 (Fri, 07 Feb 2014) | 9 lines
scst_sysfs: Fix a build failure on kernels 2.6.2[678]
The sysfs API is supported from kernel 2.6.26 on and uses the swap()
macro while the swap() macro was introduced in kernel 2.6.29. Hence
provide a definition of the swap() macro for kernels before 2.6.29.
Signed-off-by: Sebastian Herbszt <herbszt@gmx.de>
[bvanassche: Moved swap() definition a few lines down and added #ifndef/#endif]
........
r5270 | bvassche | 2014-02-07 09:45:15 +0200 (Fri, 07 Feb 2014) | 2 lines
regression tests: Run the 2.6.26..2.6.32 tests on the sysfs code instead of procfs
........
r5271 | bvassche | 2014-02-07 10:11:28 +0200 (Fri, 07 Feb 2014) | 1 line
nightly build: Update kernel versions
........
r5272 | bvassche | 2014-02-07 14:43:25 +0200 (Fri, 07 Feb 2014) | 16 lines
scst_user, rt: Wake command processing thread when needed
In a fully-preemptible realtime kernel (CONFIG_PREEMPT_RT_FULL=y),
SCSI commands from an initiator time out because the userland target
application is never woken to process them.
This is because in a fully-preemptible realtime kernel, soft-IRQ
(tasklet) execution always occurs in a ksoftirqd thread and
preempt_count is not manipulated on soft-IRQ processing entry/exit.
This makes in_interrupt() useless for determining whether soft-IRQ
processing is occurring; instead, in_serving_softirq() should be
used for that purpose.
Signed-off-by: Steven J. Magnani <steve@digidescorp.com>
[bvanassche: Elaborated source code comment]
........
r5273 | bvassche | 2014-02-07 14:46:39 +0200 (Fri, 07 Feb 2014) | 9 lines
scst_vdisk: Build fix for kernels 2.6.27..2.6.30
add_to_page_cache_lru and __lock_page_killable are exported since
kernel version 2.6.30. See also patch "Staging: pohmelfs: kconfig/makefile
and vfs changes" (commit 18bc0bbd162e3eb3e7ea2953c315ad4113a57164;
included in kernel v2.6.30).
Signed-off-by: Sebastian Herbszt <herbszt@gmx.de>
........
r5274 | vlnb | 2014-02-08 03:04:27 +0200 (Sat, 08 Feb 2014) | 9 lines
scst_user: Convert sgv_purge_interval to jiffies before use
The sgv_purge_interval from userland is passed down without conversion to
jiffies. Yet, if it is zero, the default value is (60 * HZ).
Convert to jiffies before passing down.
Signed-off-by: Steven J. Magnani <steve@digidescorp.com>
........
r5275 | vlnb | 2014-02-08 03:52:03 +0200 (Sat, 08 Feb 2014) | 19 lines
Fix spurious BUG when parse_type != SCST_USER_PARSE_STANDARD
Changeset 4224 introduced EXTRACHECKS for valid lba/data_len and state
at the end of the parsing phase of command processing.
However, the checks do not account for deferral of parsing to userland,
as occurs when SCST_USER_PARSE_CALL or SCST_USER_PARSE_EXCEPTION are specified.
In such cases the checks report errors on commands that userland has not yet
had an opportunity to parse.
NOTE: this includes a refactoring of the EXTRACHECKS to improve clarity.
The rework is not exactly equivalent to the original code, but does
conform to the comments describing the original code.
Specifically, the original code would not trap an illegal command state
unless there was also an illegal lba or data_len.
Signed-off-by: Steven J. Magnani <steve@digidescorp.com>
with some improvements
........
r5276 | bvassche | 2014-02-08 10:24:28 +0200 (Sat, 08 Feb 2014) | 1 line
scst: Build fix for kernel versions before 2.6.37
........
r5277 | bvassche | 2014-02-09 18:50:10 +0200 (Sun, 09 Feb 2014) | 1 line
scst_debug.h: Avoid that the sBUG() and sBUG_ON() definitions confuse the smatch static code checker
........
r5281 | vlnb | 2014-02-13 06:02:56 +0200 (Thu, 13 Feb 2014) | 8 lines
iscsi-scst: fix offset calculation
Fixed a subtle bug in iSCSI-SCST with incorrectly calculated offsets
for non-page aligned transfers. Originally discovered, investigated and
fix suggested by Кирилл Тюшев, then Shahar Salzman tested and proved it.
See http://sourceforge.net/mailarchive/message.php?msg_id=31924078
........
r5282 | vlnb | 2014-02-13 06:15:31 +0200 (Thu, 13 Feb 2014) | 3 lines
Web update
........
r5283 | bvassche | 2014-02-14 15:05:55 +0200 (Fri, 14 Feb 2014) | 7 lines
Makefiles: remove redundant 'depmod' invocations
Running 'make modules_install' already triggers invocation of depmod,
hence leave it out from those Makefiles that use 'make modules_install'.
Signed-off-by: Steven J. Magnani <steve@digidescorp.com>
........
r5284 | bvassche | 2014-02-14 15:48:54 +0200 (Fri, 14 Feb 2014) | 2 lines
Makefiles: Convert from "install" to "make modules_install"
........
r5285 | bvassche | 2014-02-14 16:46:11 +0200 (Fri, 14 Feb 2014) | 1 line
mvsas_tgt/Makefile: Remove trailing whitespace
........
r5286 | bvassche | 2014-02-14 17:52:10 +0200 (Fri, 14 Feb 2014) | 18 lines
Makefiles: calculate KVER properly
When deriving the kernel version (KVER) from KDIR, the file
$(KDIR)/include/config/kernel.release should be preferred over
'make kernelversion'.
For example, the Ubuntu 3.2.0-23-generic kernel has a kernel.release
file containing '3.2.0-23-generic', but 'make kernelversion' returns
3.2.14. Since the modules are stored under /lib/modules/3.2.0-23-generic,
the value in kernel.release is the correct one to use.
Also:
- Evaluate KVER only once
- All depmod commands must include KVER
Signed-off-by: Steven J. Magnani <steve@digidescorp.com>
[bvanassche: Split long lines / removed trailing whitespace]
........
r5287 | bvassche | 2014-02-14 21:27:09 +0200 (Fri, 14 Feb 2014) | 1 line
nightly build: Update kernel versions
........
r5288 | bvassche | 2014-02-18 10:31:44 +0200 (Tue, 18 Feb 2014) | 22 lines
scst, qla2x00t: Prevent inappropriate sleeping with a real-time kernel
With a realtime kernel with full preemption (CONFIG_PREEMPT_RT_FULL),
spinlocks can sleep, interrupt handlers run in thread context, and
the standard local_irq functions manipulate preemptibility, not HW
interruptibility. Under these conditions, most calls to local_irq
functions should be replaced by no-ops. The CONFIG_PREEMPT_RT patch
defines _nort versions of local_irq functions that compile away
under CONFIG_PREEMPT_RT_FULL and compile to their "normal"
equivalents otherwise.
Define _nort equivalents to support compilation against both
"normal" and RT-patched kernels, and use the _nort local_irq
functions in cases where spinlocks are taken within a
local_irq_save() or local_irq_disable() block. Without these
changes, runtime warnings about "sleeping function called from
invalid context" occur.
Signed-off-by: Steven J. Magnani <steve@digidescorp.com>
[bvanassche: Edited patch description and comment in scst_priv.h]
........
r5289 | bvassche | 2014-02-18 10:40:36 +0200 (Tue, 18 Feb 2014) | 13 lines
Makefiles: respect DESTDIR when specified
Not all SCST components handle DESTDIR properly, or at all.
In particular:
* INSTALL_MOD_PATH should account for DESTDIR when 'make modules_install'
is invoked, so the kernel make infrastructure deploys the modules
and runs depmod against the proper directory tree.
* depmods must include a '-b' option to reference the proper directory tree.
* Drop special ISCSI_DESTDIR.
Signed-off-by: Steven J. Magnani <steve@digidescorp.com>
........
r5290 | bvassche | 2014-02-18 10:41:30 +0200 (Tue, 18 Feb 2014) | 7 lines
Makefiles: 'uninstall' target fixes
Some components don't have 'uninstall' targets although the top-level
Makefile references them. Some others don't remove the proper file.
Signed-off-by: Steven J. Magnani <steve@digidescorp.com>
........
r5291 | vlnb | 2014-02-19 05:45:48 +0200 (Wed, 19 Feb 2014) | 8 lines
Fix incorrect start and length calculation for issuing block discard requests
Block layer always expects start and length in 512 byte blocks, so they
should be corrected for non-512b SCST devices.
Original patch from Ken Raeburn <raeburn@permabit.com>
........
r5292 | vlnb | 2014-02-19 06:06:10 +0200 (Wed, 19 Feb 2014) | 3 lines
Cleanups
........
r5293 | vlnb | 2014-02-19 06:21:00 +0200 (Wed, 19 Feb 2014) | 12 lines
scst_user: Complete "Preparing" / "finished" symmetry
Add some TRACE statements so events sent to userland are bracketed by
"Preparing" and "finished". This makes it a little easier to find the
boundaries between the various stages of command processing in trace output.
Note, this patch does not implement a 'finished' message for TM events;
there is already a "TM reply" message that can serve that purpose.
Signed-off-by: Steven J. Magnani <steve@digidescorp.com>
........
r5294 | bvassche | 2014-02-19 09:38:57 +0200 (Wed, 19 Feb 2014) | 1 line
nightly build: Update kernel versions
........
r5295 | bvassche | 2014-02-19 10:51:35 +0200 (Wed, 19 Feb 2014) | 1 line
scripts/blockdev-perftest: Fix bashisms
........
r5296 | vlnb | 2014-02-20 07:54:49 +0200 (Thu, 20 Feb 2014) | 3 lines
put_page_callback patch for 3.13.3+ kernels
........
r5300 | vlnb | 2014-02-21 04:08:05 +0200 (Fri, 21 Feb 2014) | 3 lines
Docs update
........
r5301 | bvassche | 2014-02-21 09:44:55 +0200 (Fri, 21 Feb 2014) | 1 line
nightly build: Add support for the put_page_callback-3.13.3 patch
........
r5302 | bvassche | 2014-02-21 09:48:21 +0200 (Fri, 21 Feb 2014) | 1 line
nightly build: Update kernel versions
........
r5303 | bvassche | 2014-02-21 12:02:11 +0200 (Fri, 21 Feb 2014) | 1 line
nightly build: Update kernel versions
........
r5304 | bvassche | 2014-02-21 12:09:45 +0200 (Fri, 21 Feb 2014) | 1 line
nightly build: Update kernel versions
........
r5305 | bvassche | 2014-02-24 08:56:05 +0200 (Mon, 24 Feb 2014) | 1 line
nightly build: Update kernel versions
........
r5306 | bvassche | 2014-02-24 08:56:44 +0200 (Mon, 24 Feb 2014) | 1 line
Spelling fix: initator -> initiator
........
r5307 | bvassche | 2014-02-24 09:30:50 +0200 (Mon, 24 Feb 2014) | 1 line
make rpm: Do not remove rpmbuilddir
........
r5308 | bvassche | 2014-02-24 09:39:45 +0200 (Mon, 24 Feb 2014) | 5 lines
scst_local: Add newline to sysfs output
Signed-off-by: Sebastian Herbszt <herbszt@gmx.de>
[bvanassche: Reduced source code line length to 80 columns]
........
r5309 | bvassche | 2014-02-25 12:55:36 +0200 (Tue, 25 Feb 2014) | 1 line
put_page_callback-3.12.11.patch: Add
........
r5310 | bvassche | 2014-02-25 12:57:27 +0200 (Tue, 25 Feb 2014) | 1 line
put_page_callback-3.10.30.patch: Add
........
r5311 | bvassche | 2014-02-25 12:58:08 +0200 (Tue, 25 Feb 2014) | 1 line
nightly build: Add support for kernels >= 3.10.30 and >= 3.12.11
........
r5312 | bvassche | 2014-02-25 12:59:54 +0200 (Tue, 25 Feb 2014) | 1 line
nightly build: Update kernel versions
........
r5315 | vlnb | 2014-02-26 04:32:39 +0200 (Wed, 26 Feb 2014) | 3 lines
Make internal memory layout more cache friendly
........
r5316 | vlnb | 2014-02-26 04:49:38 +0200 (Wed, 26 Feb 2014) | 5 lines
scst_vdisk: Make vendor, product ID and related fields configurable via sysfs
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
........
r5320 | bvassche | 2014-03-02 10:49:50 +0200 (Sun, 02 Mar 2014) | 1 line
Documentation spelling fix: change INQUERY into INQUIRY
........
git-svn-id: http://svn.code.sf.net/p/scst/svn/branches/iser@5321 d57e44dd-8a1f-0410-8b47-8ef2f437770f
SCSI RDMA Protocol (SRP) Target driver for Linux
=================================================
The SRP target driver has been designed to work on top of the Linux
InfiniBand kernel drivers -- either the InfiniBand drivers included
with a Linux distribution or the OFED InfiniBand drivers. For more
information about using the SRP target driver in combination with
OFED, see also README.ofed.
The SRP target driver has been implemented as an SCST driver. This
makes it possible to support many I/O modes on real and virtual
devices. A few examples of supported device handlers are:
1. scst_disk. This device handler implements transparent pass-through
of SCSI commands and allows SRP to access and to export real
SCSI devices, e.g. disks, hardware RAID volumes and tape libraries,
as SRP LUNs.
2. scst_vdisk, either in fileio or in blockio mode. This device handler
makes it possible to export software RAID volumes, LVM volumes, IDE
disks, and normal files as SRP LUNs.
3. nullio. The nullio device handler makes it possible to measure the
performance of the SRP target implementation without performing any
actual I/O.
Installation
------------
Building and installing the SRP target driver is possible as follows:
cd ${SCST_DIR}
make -s scst_clean scst scst_install
make -s srpt_clean srpt srpt_install
make -s scstadm scstadm_install
The ib_srpt kernel module supports the following parameters:
* one_target_per_port (boolean) and
* use_node_guid_in_target_name (boolean)
ib_srpt can operate in one of the following three modes:
1. Access control configuration per HCA and assigning a "ib_srpt_target_<n>"
style name to each HCA.
2. Access control configuration per HCA and referring to an HCA via its node
GUID (e.g. 0002:c903:0005:f34a).
3. Access control configuration per HCA port and referring to an HCA port via
its port GID (e.g. fe80:0000:0000:0000:0002:c903:0005:f34b).
Mode (1) is chosen if both one_target_per_port and
use_node_guid_in_target_name are false. Mode (2) is chosen if
one_target_per_port is false and use_node_guid_in_target_name is true. Mode
(3) is chosen if one_target_per_port is true.
* srp_max_req_size (number)
Maximum size of an SRP control message in bytes. Examples of SRP control
messages are: login request, logout request, data transfer request, ...
The larger this parameter, the more scatter/gather list elements can be
sent at once. Use the following formula to compute an appropriate value
for this parameter: 68 + 16 * (sg_tablesize). The default value of
this parameter is 4148, which corresponds to an sg table size of 255.
* srp_max_rsp_size (number)
Maximum size of an SRP response message in bytes. Sense data is sent back
via these messages towards the initiator. The default size is 256 bytes.
With this value, (256-36) = 220 bytes remain for sense data.
* srp_max_rdma_size (number)
Maximum number of bytes that may be transferred at once via RDMA. Defaults
to 65536 bytes, which is sufficient to use the full bandwidth of low-latency
HCAs. Increasing this value may decrease latency for applications
transferring large amounts of data at once.
* srpt_srq_size (number, default 4095)
ib_srpt uses a shared receive queue (SRQ) for processing incoming SRP
requests. This number may have to be increased when a large number of
initiator systems is accessing a single SRP target system.
* srpt_sq_size (number, default 4096)
Per-channel InfiniBand send queue size. The default setting is sufficient
for a credit limit of 128. Changing this parameter to a smaller value may
cause RDMA requests to be retried and hence may slow down data transfer
severely.
* trace_flag (unsigned integer, only available in debug builds)
The individual bits of the trace_flag parameter define which categories of
trace messages should be sent to the kernel log and which ones not.
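The sizing arithmetic given above for srp_max_req_size and srp_max_rsp_size
can be checked with two small shell functions. This is only a sketch: the
function names are made up for illustration and are not part of SCST. The
constants 68, 16 and 36 are taken from the parameter descriptions above.

```shell
# Hypothetical helpers, not part of SCST.

# srp_max_req_size value for a given scatter/gather table size,
# using the formula 68 + 16 * sg_tablesize quoted above.
srp_max_req_size() {
    echo $((68 + 16 * $1))
}

# Number of bytes left for sense data in a response message of the given
# size, assuming the 36-byte overhead implied by the text above.
srp_sense_bytes() {
    echo $(($1 - 36))
}
```

For example, "srp_max_req_size 255" prints 4148 and "srp_sense_bytes 256"
prints 220, which matches the default values documented above.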
Configuring the SRP Target System
---------------------------------
The first step is to choose whether access control will be configured per
HCA or per HCA port and to create a modprobe configuration file that reflects
this choice. An example:
# cat /etc/modprobe.d/ib_srpt.conf
options ib_srpt one_target_per_port=1
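The mapping from the three ib_srpt modes described in the Installation
section to module options could be captured as follows. This is a sketch
only; the function name is hypothetical, and it assumes that the two boolean
parameters accept 0 and 1 as in the example above.

```shell
# Hypothetical helper: print the modprobe "options" line for one of the
# three ib_srpt modes (1, 2 or 3) described in the Installation section.
ib_srpt_mode_options() {
    case "$1" in
        1) echo "options ib_srpt one_target_per_port=0 use_node_guid_in_target_name=0" ;;
        2) echo "options ib_srpt one_target_per_port=0 use_node_guid_in_target_name=1" ;;
        3) echo "options ib_srpt one_target_per_port=1" ;;
        *) echo "usage: ib_srpt_mode_options 1|2|3" >&2; return 1 ;;
    esac
}

# Possible use (as root):
#   ib_srpt_mode_options 3 > /etc/modprobe.d/ib_srpt.conf
```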
Next, create the file /etc/scst.conf. You can create this file with
the scstadmin tool as follows:
/etc/init.d/scst stop
/etc/init.d/scst start
Now configure SCST using scstadmin - see also the scstadmin documentation for
further information. Once finished, save the configuration to /etc/scst.conf:
scstadmin -write_config /etc/scst.conf (sysfs version)
or
scstadmin -WriteConfig /etc/scst.conf (procfs version)
One can verify the contents of scst.conf e.g. as follows:
cat /etc/scst.conf
Now verify that loading the configuration from file works correctly:
/etc/init.d/scst reload
Configuring the SRP Initiator System
------------------------------------
First of all, load the SRP kernel module as follows:
modprobe ib_srp
Next, discover the new SRP target by running the srp_daemon command:
for d in /dev/infiniband/umad*; do srp_daemon -oacd$d; done
If you want to let the initiator system log in to all SRP targets available
in the same InfiniBand subnet, that is possible as follows (-e = execute):
for d in /dev/infiniband/umad*; do srp_daemon -oecd$d; done
If you want to let the initiator log in to a specific target you can do that
e.g. as follows:
echo "id_ext=0002c903000f1366,ioc_guid=0002c903000f1366,dgid=fe800000000000000002c903000f1367,pkey=ffff,service_id=0002c903000f1366" > /sys/class/infiniband_srp/${SRP_HCA_NAME}/add_target
The meaning of the parameters in the above command is as follows:
* id_ext: must match ioc_guid.
* ioc_guid: see also the documentation of the ib_srpt ioc_guid parameter.
* dgid: target HCA port GID to connect to.
* pkey: IB partition key (P_Key) of the target to connect to.
* service_id: must match ioc_guid.
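Since id_ext and service_id must both match ioc_guid, the login string can
be assembled from just the ioc_guid and the target port GID. A minimal
sketch follows. The function name is hypothetical, and the default pkey of
ffff is an assumption taken from the example above rather than a documented
default.

```shell
# Hypothetical helper: build an SRP login string.
#   $1 = ioc_guid (16 hex digits, no colons)
#   $2 = target port GID, with or without colons
#   $3 = pkey (optional; ffff is assumed when omitted)
srp_login_string() (
    ioc_guid=$1
    dgid=$(printf '%s' "$2" | tr -d ':')   # the login string uses no colons
    pkey=${3:-ffff}
    printf 'id_ext=%s,ioc_guid=%s,dgid=%s,pkey=%s,service_id=%s\n' \
        "$ioc_guid" "$ioc_guid" "$dgid" "$pkey" "$ioc_guid"
)

# Possible use (as root):
#   srp_login_string 0002c903000f1366 fe80:0000:0000:0000:0002:c903:000f:1367 \
#       > /sys/class/infiniband_srp/${SRP_HCA_NAME}/add_target
```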
Target GIDs can be queried e.g. via sysfs:
$ for f in /sys/devices/*/*/*/infiniband/*/ports/*/gids/0; do echo $f; \
cat $f | sed 's/://g'; done
/sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/infiniband/mlx4_0/ports/1/gids/0
fe800000000000000002c9030005f34b
/sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/infiniband/mlx4_0/ports/2/gids/0
fe800000000000000002c9030005f34c
/sys/devices/pci0000:00/0000:00:1c.0/0000:05:00.0/infiniband/mlx4_1/ports/1/gids/0
fe800000000000000002c9030003cca7
/sys/devices/pci0000:00/0000:00:1c.0/0000:05:00.0/infiniband/mlx4_1/ports/2/gids/0
fe800000000000000002c9030003cca8
Finally, run lsscsi to display the details of the newly discovered SCSI disks:
lsscsi
SRP targets can be recognized in the output of lsscsi by looking for
the disk names assigned on the SCST target ("disk01" in the example below):
[8:0:0:0] disk SCST_FIO disk01 102 /dev/sdb
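On a system with many disks it can help to filter the lsscsi output. The
sketch below assumes the column layout of the sample line above, vendor
strings that start with "SCST_" as in that line, and no spaces inside the
vendor or model fields. The function name is made up for illustration.

```shell
# Hypothetical filter: read lsscsi output on standard input and print the
# model name and device node of every line whose vendor field (third
# column) starts with "SCST_".
scst_disks() {
    awk '$3 ~ /^SCST_/ { print $4, $NF }'
}

# Possible use:
#   lsscsi | scst_disks
```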
Target names
------------
The name assigned by the ib_srpt target driver to an SCST target is one of
the following: ib_srpt_target_<n>; the node GUID of an HCA in hexadecimal
form with a colon after every fourth digit; or the port GID with a colon
after every fourth digit. The HCA node GUIDs and port GIDs can be obtained
via the ibv_devinfo command. An example:
# ibv_devinfo -v | grep -E '[^a-z]port:|guid|GID'
node_guid: 0002:c903:0005:f34e
sys_image_guid: 0002:c903:0005:f351
port: 1
GID[0]: fe80:0000:0000:0000:0002:c903:0005:f34f
port: 2
GID[0]: fe80:0000:0000:0000:0002:c903:0005:f350
Once the ib_srpt driver has been loaded the available SCST targets can be
queried as follows:
# (cd /sys/kernel/scst_tgt/targets/ib_srpt && ls -d [0-9a-f]*)
fe80:0000:0000:0000:0002:c903:0005:f34f
fe80:0000:0000:0000:0002:c903:0005:f350
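Some tools print GUIDs and GIDs without colons, so converting them to the
"colon after every fourth digit" form used in target names can be handy. A
minimal sketch, with a hypothetical function name:

```shell
# Hypothetical helper: insert a colon after every fourth hexadecimal digit,
# e.g. 0002c9030005f34e -> 0002:c903:0005:f34e.
format_guid() {
    printf '%s\n' "$1" | sed 's/[0-9a-fA-F]\{4\}/&:/g; s/:$//'
}
```

The same function works for 64-bit GUIDs (16 digits) and 128-bit GIDs
(32 digits).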
Session names
-------------
The name assigned by the ib_srpt target driver to a session depends on the
mode in which it is operating. If one_target_per_port=y then the source port
GID is used as the session name. If one_target_per_port=n then the 128-bit SRP
initiator port identifier is used as the session name. This identifier is sent
by the SRP initiator to the SRP target via the SRP_LOGIN_REQ information unit.
The Linux SRP initiator (ib_srp) generates the initiator port identifier as
follows:
- The first eight bytes are the identifier extension ('initiator_ext' parameter
specified in the login string echoed into the sysfs file 'add_target').
- The last eight bytes are the GUID of the initiator HCA port used to
communicate with the target.
An example:
[ INITIATOR ]
$ for f in /sys/devices/*/*/*/infiniband/*/ports/*/gids/0; do echo $f; \
cat $f; done
/sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/infiniband/mlx4_0/ports/1/gids/0
fe80:0000:0000:0000:0002:c903:0005:f34b
/sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/infiniband/mlx4_0/ports/2/gids/0
fe80:0000:0000:0000:0002:c903:0005:f34c
/sys/devices/pci0000:00/0000:00:1c.0/0000:05:00.0/infiniband/mlx4_1/ports/1/gids/0
fe80:0000:0000:0000:0002:c903:0003:cca7
/sys/devices/pci0000:00/0000:00:1c.0/0000:05:00.0/infiniband/mlx4_1/ports/2/gids/0
fe80:0000:0000:0000:0002:c903:0003:cca8
[ TARGET, after login ]
$ (cd /sys/kernel/scst_tgt/targets/ib_srpt/[0-9a-f]* && ls -d sessions/*)
sessions/fe80:0000:0000:0000:0002:c903:0003:cca7
sessions/fe80:0000:0000:0000:0002:c903:0005:f34b
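The way ib_srp composes the 128-bit initiator port identifier (identifier
extension first, port GUID last) can be illustrated as follows. This is a
sketch with a hypothetical function name. It prints 32 hexadecimal digits
without separators; the exact textual form that ib_srpt uses for the session
name is not covered here.

```shell
# Hypothetical helper: concatenate the 64-bit identifier extension ($1) and
# the 64-bit initiator port GUID ($2), each given as 16 hexadecimal digits
# without colons, into a 128-bit initiator port identifier.
srp_initiator_port_id() {
    for part in "$1" "$2"; do
        printf '%s\n' "$part" | grep -Eq '^[0-9a-fA-F]{16}$' || {
            echo "expected 16 hex digits, got: $part" >&2
            return 1
        }
    done
    printf '%s%s\n' "$1" "$2"
}
```

The port GUID is the lower 64 bits of the port GID, e.g. 0002c9030003cca7
for the GID fe80:0000:0000:0000:0002:c903:0003:cca7.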
LUN masking
-----------
In a straightforward configuration every LUN is visible to every initiator.
It is possible however to make a different set of LUNs visible to each
initiator by using the LUN masking feature of SCST. SRP initiators are
identified by their session name (see above). An example of an scst.conf
file using LUN masking for ib_srpt:
TARGET_DRIVER ib_srpt {
TARGET fe80:0000:0000:0000:0002:c903:0005:f34b {
enabled 1
rel_tgt_id 1
# LUNs visible by all initiators not listed below
LUN 0 disk01
GROUP grp1 {
# LUNs visible by initiator system 1
LUN 0 disk02
INITIATOR fe80:0000:0000:0000:0002:c903:0005:f34b
}
GROUP grp2 {
# LUNs visible by initiator system 2
LUN 0 disk03
INITIATOR fe80:0000:0000:0000:0002:c903:0005:f34c
}
}
}
Adding and Removing LUNs Dynamically
------------------------------------
It is possible to add and/or remove LUNs on the target without restarting
target or initiator. This can be done either via scstadmin or directly via the
sysfs interface. Although the SCST core will notify the initiator about LUN
changes, Linux initiators will ignore these notifications. In order to bring a
Linux initiator back in sync after a LUN change, the initiator has to be told
to rescan SCSI devices. Rescanning SCSI devices is e.g. possible via the
rescan-scsi-bus.sh script that can be found here:
http://www.garloff.de/kurt/linux/#rescan-scsi. An example:
$ rescan-scsi-bus --hosts=${srp_host_id} --channels=0 --ids=0 --luns=0-31
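If the rescan script is not available, the Linux SCSI midlayer also accepts
a wildcard scan request through the "scan" attribute of a SCSI host in
sysfs. A minimal sketch, assuming the SCSI host number of the SRP connection
is known. The function name and the optional second argument (an alternate
sysfs root, only useful for dry runs) are made up for illustration. Note
that such a scan only picks up newly added LUNs; it does not remove devices
for LUNs that have disappeared.

```shell
# Hypothetical helper: ask SCSI host $1 to scan all channels, target IDs
# and LUNs. $2 optionally overrides the sysfs root (default: /sys).
scsi_rescan_host() {
    scan_dir="${2:-/sys}/class/scsi_host/host$1"
    [ -d "$scan_dir" ] || { echo "no such SCSI host: host$1" >&2; return 1; }
    # "- - -" is the wildcard for channel, target ID and LUN.
    echo "- - -" > "$scan_dir/scan"
}

# Possible use (as root):
#   scsi_rescan_host ${srp_host_id}
```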
InfiniBand Partitions
---------------------
Just as a VLAN segments traffic on an Ethernet network, partitions segment
traffic on an InfiniBand network. Each InfiniBand partition is identified
by a partition key, which is a 16-bit number. During fabric initialization
the subnet manager assigns one or more partition keys to each InfiniBand
port. For opensm, partitions are defined in /etc/opensm/partitions.conf.
ib_srpt uses the partition with index 0. Which partition key corresponds to
index 0 can be found by querying sysfs:
$ head /sys/class/infiniband/*/ports/*/pkeys/0
==> /sys/class/infiniband/mlx4_0/ports/1/pkeys/0 <==
0xffff
==> /sys/class/infiniband/mlx4_0/ports/2/pkeys/0 <==
0xffff
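The SRP login string shown earlier writes the partition key without the
"0x" prefix (pkey=ffff). The sketch below reads the key at index 0 from
sysfs and prints it in that form. The function name and the optional third
argument (an alternate sysfs root for dry runs) are hypothetical.

```shell
# Hypothetical helper: print the partition key at index 0 of HCA $1,
# port $2, without the leading "0x". $3 optionally overrides the sysfs
# root (default: /sys).
pkey_index0() {
    pkey=$(cat "${3:-/sys}/class/infiniband/$1/ports/$2/pkeys/0") || return 1
    printf '%s\n' "${pkey#0x}"
}

# Possible use:
#   pkey_index0 mlx4_0 1
```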
High availability
-----------------
If there are redundant paths in the IB network between initiator and target,
automatic path failover can be set up on the initiator as follows:
* Edit /etc/infiniband/openib.conf to load the SRP driver and SRP HA daemon
automatically: set SRP_LOAD=yes and SRPHA_ENABLE=yes.
* To set up and use the high availability feature you need the dm-multipath
driver and multipath tool.
* Please refer to the OFED-1.x user manual for more detailed instructions
on how to enable and how to use the HA feature. See e.g.
http://www.mellanox.com/related-docs/prod_software/Mellanox_OFED%20_Linux_user_manual_1_5_1_2.pdf.
A setup with automatic failover between redundant targets is possible by
installing and configuring DRBD on both targets. If the initiator system
supports mirroring (e.g. Linux), you can use the following approach:
* Configure DRBD in Active/Active mode.
* Configure the initiator(s) for mirroring between the redundant targets.
If the initiator system does not support mirroring (e.g. VMware ESX), you
can use the following approach:
* Configure DRBD in Active/Passive mode and enable STONITH mode in the
Heartbeat software.
For more information, see also:
* http://www.drbd.org/
* http://www.linux-ha.org/wiki/Main_Page
Performance Notes - Target Side
-------------------------------
* Building the SCST core and the ib_srpt target driver in release mode
improves performance compared to debug mode.
* When using high-latency storage devices (hard disks), the default value
chosen by SCST for DEVICE.threads_num should be fine. When using
low-latency storage devices though (SSDs), DEVICE.threads_num should be set
to 1 or 2 in /etc/scst.conf in order to reach optimal performance for small
block sizes (e.g. 4 KB).
* When multiple InfiniBand HCAs are present in a target system the Linux
kernel by default will assign the associated interrupt handlers to CPU 0.
Even irqbalance will often assign the interrupt handlers of multiple HCAs
to the same CPU. That is unfortunate because it leads to unfair handling of
SRP sessions. The solution is to assign InfiniBand HCA interrupts manually
to different CPUs. That is possible by looking up the InfiniBand
interrupt numbers in /proc/interrupts and by writing proper bitmasks into
/proc/irq/<n>/smp_affinity.
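The bitmask written into smp_affinity has one bit per CPU, bit 0 being
CPU 0. A minimal sketch for pinning an interrupt to a single CPU follows.
The function name is hypothetical, and the sketch only covers CPU numbers
0..31; systems with more CPUs use comma-separated 32-bit groups, which it
does not produce.

```shell
# Hypothetical helper: print the hexadecimal smp_affinity mask that selects
# only CPU $1 (0..31).
cpu_affinity_mask() {
    if ! [ "$1" -ge 0 ] 2>/dev/null || ! [ "$1" -le 31 ] 2>/dev/null; then
        echo "CPU number must be in the range 0..31" >&2
        return 1
    fi
    printf '%x\n' $((1 << $1))
}

# Possible use (as root), with <n> taken from /proc/interrupts:
#   cpu_affinity_mask 2 > /proc/irq/<n>/smp_affinity
```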
Performance Notes - Initiator Side
----------------------------------
* Choose a proper value for the ib_srp kernel module parameter
cmd_sg_entries. The default value 12 works well for buffered reads while
the throughput for write-dominated workloads improves by changing this value
to 255. One way to set this kernel module parameter is as follows:
echo options ib_srp cmd_sg_entries=255 >>/etc/modprobe.d/ib_srp.conf
* For multithreaded workloads using small block sizes, setting rq_affinity
to 2 improves IOPS significantly (Linux kernel 3.1 and later; see also
commit 5757a6d76cdf6dda2a492c09b985c015e86779b1).
* For latency-sensitive applications, using the noop scheduler at the
initiator side can give significantly better results than other schedulers.
* By default the SRP initiator limits the queue depth to 64 commands. If your
workload benefits from a larger queue depth, enlarge the queue depth by
setting the max_cmd_per_lun parameter in the SRP login string.
* The following parameters have a small but measurable impact on SRP
performance:
* /sys/class/block/${dev}/queue/rotational
* /sys/class/block/${dev}/queue/rq_affinity
* /proc/irq/${ib_int_no}/smp_affinity
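The rq_affinity and scheduler settings mentioned above can be applied per
block device through sysfs. A minimal sketch, assuming a kernel that offers
the noop scheduler. The function name and the optional second argument (an
alternate sysfs root for dry runs) are hypothetical.

```shell
# Hypothetical helper: apply the block-queue settings discussed above to
# one SRP-attached disk, e.g. "srp_tune_queue sdb". $2 optionally overrides
# the sysfs root (default: /sys).
srp_tune_queue() {
    queue="${2:-/sys}/class/block/$1/queue"
    [ -d "$queue" ] || { echo "no such block device: $1" >&2; return 1; }
    echo 2 > "$queue/rq_affinity"    # complete requests on the submitting CPU
    echo noop > "$queue/scheduler"   # no request reordering at the initiator
}
```

These settings do not survive a reboot or a re-login, so they have to be
reapplied whenever the SRP disks reappear.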
Performance Notes - Both Sides
------------------------------
* Disabling CONFIG_SCHED_DEBUG and CONFIG_SCHEDSTATS in the kernel config
helps.
* Disable CONFIG_IRQSOFF_TRACER such that CONFIG_TRACE_IRQFLAGS is disabled.
* Consider which memory allocator to use. With recent kernels using the SLUB
memory allocator instead of SLAB may help. On multi-socket systems the SLAB
memory allocator may result in better performance. Please note that SLAB is
tunable while SLUB is not. See also http://lkml.org/lkml/2010/7/9/264 and
http://www.ibm.com/developerworks/linux/library/l-linux-slab-allocator/.
Frequently Asked Questions
--------------------------
Q: Every now and then "SRP abort called" and "SRP reset_device called"
messages are logged at the initiator side. Around the same time I see the
following message in the target log: "ib_srpt: ***ERROR***: Command ...: IB
completion for idx ... has not been received in time (SRPT command state
...)". What do these messages mean and how can I fix this?
A: This means that a timeout occurred while an HCA was waiting for an
acknowledgment message. Check the IB network for bad IB cables, bad HCAs
and/or bad switch ports. Also make sure that the HCA firmware is up to
date.
Q: Loading the kernel module ib_srpt triggers a kernel panic with a call trace
like the one below. What is the cause of this and how can this be solved?
Call Trace:
[<ffffffffa02f2a50>] srpt_alloc_ioctx+0x60/0xb0 [ib_srpt]
[<ffffffffa02f2f0a>] srpt_alloc_ioctx_ring+0xea/0x1e0 [ib_srpt]
[<ffffffffa02f32e9>] srpt_add_one+0x2e9/0x670 [ib_srpt]
[<ffffffffa015a480>] ib_register_client+0x80/0xa0 [ib_core]
[<ffffffffa02421eb>] srpt_init_module+0x1eb/0x235 [ib_srpt]
[<ffffffff81000344>] do_one_initcall+0x34/0x1a0
[<ffffffff8107a63c>] sys_init_module+0xdc/0x260
[<ffffffff81002e3b>] system_call_fastpath+0x16/0x1b
A: This means that you are using a system on which OFED has been installed but
that ib_srpt has been compiled against the non-OFED kernel headers instead
of the OFED kernel headers. You can fix this by rebuilding ib_srpt against
the OFED kernel headers. The ib_srpt makefile should detect the OFED kernel
headers automatically - at least if ib_srpt is built after OFED has been
installed.
Feedback
--------
Send questions about this driver to scst-devel@lists.sourceforge.net, CC:
Vu Pham <vuhuong@mellanox.com> and Bart Van Assche <bvanassche@acm.org>.