mirror of
https://github.com/SCST-project/scst.git
synced 2026-05-22 05:01:27 +00:00
svn+ssh://yanb123@svn.code.sf.net/p/scst/svn/trunk
........
r5875 | bvassche | 2014-11-16 19:58:07 +0200 (Sun, 16 Nov 2014) | 1 line
nightly build: Update kernel versions
........
r5878 | bvassche | 2014-11-19 02:17:41 +0200 (Wed, 19 Nov 2014) | 1 line
srpt/Makefile: Add double quotes around a path
........
r5879 | bvassche | 2014-11-19 02:20:20 +0200 (Wed, 19 Nov 2014) | 1 line
scripts/generate-release-archive: Accept an optional list of file names
........
r5880 | bvassche | 2014-11-22 13:12:29 +0200 (Sat, 22 Nov 2014) | 1 line
nightly build: Update kernel versions
........
r5881 | bvassche | 2014-11-24 19:59:14 +0200 (Mon, 24 Nov 2014) | 4 lines
ib_srpt: Add support for HCA's that do not support SRQ
Based on a patch provided by Parav Pandit <Parav.Pandit@Emulex.Com>
........
r5882 | vlnb | 2014-11-26 09:02:17 +0200 (Wed, 26 Nov 2014) | 3 lines
Update for kernels 3.17.x
........
r5883 | bvassche | 2014-11-26 10:05:09 +0200 (Wed, 26 Nov 2014) | 1 line
Add kernel 3.17 build infrastructure
........
r5884 | bvassche | 2014-11-26 10:07:08 +0200 (Wed, 26 Nov 2014) | 1 line
nightly build: Add kernel 3.17
........
r5885 | bvassche | 2014-11-26 10:16:44 +0200 (Wed, 26 Nov 2014) | 6 lines
Fix kernel 3.17 checkpatch warnings about 'long long unsigned'
Avoid that checkpatch reports the following warning:
WARNING: type 'long long unsigned' should be specified in 'unsigned long long' order.
........
r5886 | bvassche | 2014-11-26 15:38:52 +0200 (Wed, 26 Nov 2014) | 1 line
Build fixes for RHEL 6.6 kernel 2.6.32-504
........
r5887 | bvassche | 2014-11-26 16:39:51 +0200 (Wed, 26 Nov 2014) | 1 line
ib_srpt: Make the send queue full messages more informational
........
r5888 | bvassche | 2014-11-26 18:25:57 +0200 (Wed, 26 Nov 2014) | 1 line
scripts/specialize-patch: Support blanks around numbers inside parentheses
........
r5889 | bvassche | 2014-11-26 21:42:10 +0200 (Wed, 26 Nov 2014) | 1 line
scripts/specialize-patch: Reduce noise in nightly build output
........
r5890 | vlnb | 2014-11-27 06:36:33 +0200 (Thu, 27 Nov 2014) | 3 lines
Cleanup
........
r5891 | bvassche | 2014-11-27 17:18:58 +0200 (Thu, 27 Nov 2014) | 1 line
scst.h: Add uintptr_t
........
r5892 | bvassche | 2014-11-27 17:19:21 +0200 (Thu, 27 Nov 2014) | 1 line
ib_srpt: Add support for immediate data
........
r5893 | bvassche | 2014-11-27 17:24:17 +0200 (Thu, 27 Nov 2014) | 1 line
ib_srpt: Log reject reason
........
r5894 | bvassche | 2014-11-27 17:29:29 +0200 (Thu, 27 Nov 2014) | 1 line
ib_srpt: Rework the max_sge computation changes from r5795
........
r5895 | bvassche | 2014-11-28 11:16:37 +0200 (Fri, 28 Nov 2014) | 1 line
scst: Add scripts/rebuild-rhel-kernel-rpm to the SCST release archive
........
r5903 | bvassche | 2014-12-03 13:50:06 +0200 (Wed, 03 Dec 2014) | 4 lines
scripts/rebuild-rhel-kernel-rpm: Fix an error message
Reported-by: Hiroyuki Sato <hiroysato@gmail.com>
........
r5904 | bvassche | 2014-12-03 19:06:57 +0200 (Wed, 03 Dec 2014) | 1 line
iscsi-scst/kernel/patches/rhel/put_page_callback-2.6.32-504.patch: Add
........
r5905 | bvassche | 2014-12-03 19:07:31 +0200 (Wed, 03 Dec 2014) | 1 line
scripts/rebuild-rhel-kernel-rpm: Add support for RHEL 6.6
........
r5910 | bvassche | 2014-12-04 13:50:58 +0200 (Thu, 04 Dec 2014) | 1 line
scripts/generate-kernel-patch: Swap two filters
........
r5912 | bvassche | 2014-12-04 14:19:56 +0200 (Thu, 04 Dec 2014) | 4 lines
/etc/init.d/scst: Exit with status code 0 upon 'start' if already running
Reported-by: Dimitar Tanev <dimitar@linuxdevgroup.org>
........
r5913 | vlnb | 2014-12-05 01:41:52 +0200 (Fri, 05 Dec 2014) | 3 lines
FORMAT commands should be strictly serialized
........
r5914 | vlnb | 2014-12-05 01:43:51 +0200 (Fri, 05 Dec 2014) | 3 lines
Oops, fix for the previous commit
........
r5928 | vlnb | 2014-12-06 07:02:27 +0200 (Sat, 06 Dec 2014) | 3 lines
Web updates
........
r5929 | bvassche | 2014-12-09 14:33:16 +0200 (Tue, 09 Dec 2014) | 1 line
rpm build: Add support for qla2x00t driver in QLogic git repository
........
r5931 | vlnb | 2014-12-11 06:27:17 +0200 (Thu, 11 Dec 2014) | 3 lines
Docs update
........
r5932 | vlnb | 2014-12-11 06:34:36 +0200 (Thu, 11 Dec 2014) | 8 lines
scst_vdisk: Increase virtual device name length
This change makes integration with OpenStack easier since OpenStack GUIDs
are 36 characters long: 32 hex characters and four dashes.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
........
r5933 | vlnb | 2014-12-11 06:38:04 +0200 (Thu, 11 Dec 2014) | 11 lines
vdisk_blockio: Report invalid scatterlists
It is possible for a target driver to pass a scatterlist via
scst_cmd_set_tgt_sg() that is valid for the vdisk_fileio handler
but not for the vdisk_blockio handler. Complain loudly if an invalid
scatterlist is passed to vdisk_blockio because such scatterlists
cause silent data corruption with most Linux block drivers.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
........
r5934 | bvassche | 2014-12-11 14:31:03 +0200 (Thu, 11 Dec 2014) | 1 line
scst_vdisk: Follow-up for r5932
........
r5935 | bvassche | 2014-12-11 14:37:02 +0200 (Thu, 11 Dec 2014) | 1 line
ib_srpt: Log P_Key during login
........
r5936 | bvassche | 2014-12-12 11:29:42 +0200 (Fri, 12 Dec 2014) | 1 line
scripts/generate-kernel-patch: Include scst_pg.sgml instead of sgv_cache.sgml
........
r5937 | bvassche | 2014-12-12 11:34:55 +0200 (Fri, 12 Dec 2014) | 1 line
doc/scst_pg.sgml: Remove trailing whitespace
........
r5938 | bvassche | 2014-12-17 09:48:40 +0200 (Wed, 17 Dec 2014) | 1 line
nightly build: Update kernel versions
........
r5939 | vlnb | 2014-12-19 05:50:58 +0200 (Fri, 19 Dec 2014) | 3 lines
Fallback to the old qla driver if the git one not detected
........
r5940 | vlnb | 2014-12-19 05:55:14 +0200 (Fri, 19 Dec 2014) | 7 lines
Replace in cases, where sporadic failures are possible, HARDWARE ERROR
by INTERNAL TARGET FAILURE, which is retriable (some OS'es don't retry
HARDWARE ERROR)
Reported and suggested by Shahar Salzman <shahar.salzman@kaminario.com>
........
r5941 | vlnb | 2014-12-20 05:48:07 +0200 (Sat, 20 Dec 2014) | 7 lines
scst_vdisk: Only accept NAA IDs allowed by SPC
See also paragraph 7.8.6.6 NAA designator format in SPC-4.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
........
r5942 | vlnb | 2014-12-20 05:49:23 +0200 (Sat, 20 Dec 2014) | 11 lines
scst_vdisk: Remove superfluous llseek() calls
vfs_read() and vfs_write() ignore the file offset set by llseek().
Hence remove the llseek() calls that occur just before vfs_read() and
vfs_write(). See also the implementation in the Linux kernel of the
pread64() and pwrite64() system calls for examples of code that uses
vfs_read() and vfs_write().
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
........
r5943 | bvassche | 2014-12-22 14:28:13 +0200 (Mon, 22 Dec 2014) | 1 line
Source code spelling fix: Equivilant -> Equivalent
........
r5944 | bvassche | 2014-12-22 14:28:56 +0200 (Mon, 22 Dec 2014) | 1 line
Source code spelling fix: accesss -> access
........
r5945 | bvassche | 2014-12-22 14:29:51 +0200 (Mon, 22 Dec 2014) | 1 line
Source code spelling fix: addres -> address
........
r5946 | bvassche | 2014-12-22 14:31:08 +0200 (Mon, 22 Dec 2014) | 1 line
Source code spelling fix: authentification -> authentication
........
r5947 | bvassche | 2014-12-22 14:32:30 +0200 (Mon, 22 Dec 2014) | 1 line
Source code comment spelling fix: explicitely -> explicitly
........
r5948 | bvassche | 2014-12-22 14:33:06 +0200 (Mon, 22 Dec 2014) | 1 line
Source code comment spelling fix: hander -> handler
........
r5949 | bvassche | 2014-12-22 14:33:37 +0200 (Mon, 22 Dec 2014) | 1 line
Source code comment spelling fix: loosing -> losing
........
r5950 | bvassche | 2014-12-22 14:35:00 +0200 (Mon, 22 Dec 2014) | 1 line
Spelling fix: occured -> occurred
........
r5951 | bvassche | 2014-12-22 14:35:51 +0200 (Mon, 22 Dec 2014) | 1 line
Source code comment spelling fix: refering -> referring
........
r5952 | bvassche | 2014-12-22 14:36:47 +0200 (Mon, 22 Dec 2014) | 1 line
Spelling fix: shrinked -> shrunk
........
r5953 | bvassche | 2014-12-22 15:08:34 +0200 (Mon, 22 Dec 2014) | 1 line
Spelling fix: choosen -> chosen
........
r5954 | bvassche | 2014-12-22 15:09:20 +0200 (Mon, 22 Dec 2014) | 1 line
Spelling fix: existant -> existent
........
r5955 | bvassche | 2014-12-22 15:10:41 +0200 (Mon, 22 Dec 2014) | 1 line
Update for kernel 3.18
........
r5956 | bvassche | 2014-12-22 15:15:55 +0200 (Mon, 22 Dec 2014) | 1 line
Spelling fix: immediatelly -> immediately
........
r5957 | bvassche | 2014-12-24 16:28:36 +0200 (Wed, 24 Dec 2014) | 1 line
nightly build: Add kernel 3.18
........
r5958 | bvassche | 2014-12-29 14:14:52 +0200 (Mon, 29 Dec 2014) | 1 line
scst_lib: Convert spaces into tabs (reported by checkpatch)
........
r5959 | bvassche | 2015-01-06 15:25:28 +0200 (Tue, 06 Jan 2015) | 1 line
scst_calc_block_shift: Log block shift and sector size upon mismatch
........
r5960 | bvassche | 2015-01-07 11:20:06 +0200 (Wed, 07 Jan 2015) | 4 lines
scst_local: Fix unique per session sas address
Signed-off-by: Sebastian Herbszt <herbszt@gmx.de>
........
r5961 | bvassche | 2015-01-09 14:23:25 +0200 (Fri, 09 Jan 2015) | 4 lines
scst_sysfs: return EINVAL on too big LUN
Signed-off-by: Sebastian Herbszt <herbszt@gmx.de>
........
r5962 | bvassche | 2015-01-10 17:52:57 +0200 (Sat, 10 Jan 2015) | 1 line
nightly build: Update kernel versions
........
r5963 | bvassche | 2015-01-13 10:42:28 +0200 (Tue, 13 Jan 2015) | 10 lines
scst: Switch to thread context before executing a reservation command
Persistent reservation commands need thread context because
scst_pr_is_cmd_allowed() locks the PR mutex. Reservation commands
either need BH or thread context. Hence switch from atomic to
thread context before processing such commands.
Reported-by: Shahar Salzman <shahar.salzman@kaminario.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
........
r5964 | bvassche | 2015-01-13 10:51:08 +0200 (Tue, 13 Jan 2015) | 5 lines
scst_parse_unmap_descriptors(): Avoid using GFP_KERNEL in atomic context
Reported-by: Shahar Salzman <shahar.salzman@kaminario.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
........
r5965 | bvassche | 2015-01-13 10:55:46 +0200 (Tue, 13 Jan 2015) | 68 lines
qla2x00t: Copy entire SCST sense buffer to q2x ctio
There seems to be a bug in passing sense information to QLA HBAs, where
the last 2 bytes of the sense data (ASC, ASCQ) are not copied to the low
level sense buffer.
We encountered this in ESX, which relies on these 2 bytes to parse the
MISCOMPARE sense code (0xE1, 0x1D, 0x00).
Bellow is a simple test to recreate this issue, but during vMotion
operations (where VMs are moved from one host to another), this may
cause the operation to fail leaving the VM in an inconsistent state.
The test I ran to verify that we are indeed missing the bytes is the
following:
1. Create a SCST based device
2. Expose the device to 2 ESX hosts
3. Format the device as VMFS5, create a test directory
4. From both hosts, I start writing to this directory (no VMs involved,
just write normal files)
At this stage, both ESX hosts try to take access to the directory.
The VMFS filesystem contains a per-directory lock which is managed by
COMPARE AND WRITE command.
Each ESX will attempt to change the VMFS lock location from unlocked to
locked to create the new file.
Obviously there are bound to be failures (which are equivalent to
programming locking conflicts), these are reported by the MISCOMPARE
sense code.
Upon these MISCOMPARE errors, the host will re-try taking the lock until
it succeeds, and will then proceed to perform the write operation on the
directory.
Due to the bug in copying the sense buffer from the SCST core to the QLA
ctio, instead of the full sense code, only the key (0xE) is sent, and
ESX does not know how to handle it resulting in IO error.
Here are the errors as they appear on the command line:
/vmfs/volumes/54a297c4-ca5af1cc-7f94-002219d20f28/ats_test #
./open_close_test-esx2.sh
./open_close_test-esx2.sh: line 8: can't create
ats_fileoptest-esx2_1.txt: Input/output error
./open_close_test-esx2.sh: line 8: can't create
ats_fileoptest-esx2_21.txt: Input/output error
./open_close_test-esx2.sh: line 8: can't create
ats_fileoptest-esx2_110.txt: Input/output error
./open_close_test-esx2.sh: line 8: can't create
ats_fileoptest-esx2_111.txt: Input/output error
In the /var/log/vmkernel.log, we can see that the sense information is
missing (0xE, 0x0, 0x0) instead of (0xE, 0x1D, 0x0).
2014-12-30T12:13:20.714Z cpu6:33519)ScsiDeviceIO: 2338:
Cmd(0x412e84f957c0) 0x89, CmdSN 0x234d from world 519051 to dev
"eui.0024f400d5020007" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0xe 0x0 0x0.
2014-12-30T12:13:20.766Z cpu6:33519)ScsiDeviceIO: 2338:
Cmd(0x412e84f91d00) 0x89, CmdSN 0x2350 from world 519051 to dev
"eui.0024f400d5020007" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0xe 0x0 0x0.
2014-12-30T12:13:20.766Z cpu6:33519)ScsiDeviceIO: 2338:
Cmd(0x412e80449fc0) 0x89, CmdSN 0x234f from world 519051 to dev
"eui.0024f400d5020007" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0xe 0x0 0x0.
This patch fixes this issue, the test will run without a problem with the
fix (no IO errors, all the files are properly written to the directory).
Signed-off-by: Shahar Salzman <shahar.salzman@kaminario.com>
Reviewed-by: Eran Mann <eran.mann@kaminario.com>
[bvanassche: simplified implementation]
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
........
r5966 | bvassche | 2015-01-13 11:38:09 +0200 (Tue, 13 Jan 2015) | 5 lines
qla2x00t: Register for RSCNs in target mode
The QLogic firmware and qla2xxx do not register for RSCNs in
target-only mode, so do that explicitly.
........
r5967 | bvassche | 2015-01-14 10:06:12 +0200 (Wed, 14 Jan 2015) | 1 line
scst_targ: Use tabs instead of spaces for indentation (detected by checkpatch)
........
r5968 | bvassche | 2015-01-15 10:58:39 +0200 (Thu, 15 Jan 2015) | 4 lines
scst_targ: Avoid triggering a kernel panic if dev_user_parse() returns SCST_CMD_STATE_STOP
Reported-by: Ilan Steinberg <ilan.steinberg@kaminario.com>
........
r5969 | vlnb | 2015-01-16 03:21:10 +0200 (Fri, 16 Jan 2015) | 3 lines
Fix READ BUFFER and WRITE BUFFER commands
........
r5970 | vlnb | 2015-01-16 05:16:26 +0200 (Fri, 16 Jan 2015) | 3 lines
Follow up for r5968
........
r5971 | vlnb | 2015-01-16 05:53:29 +0200 (Fri, 16 Jan 2015) | 5 lines
Report during user devices unjam LUN NOT SUPPORTED sense
Reported-By: shahar.salzman <shahar.salzman@kaminario.com>
........
r5972 | bvassche | 2015-01-16 15:01:58 +0200 (Fri, 16 Jan 2015) | 2 lines
scst.spec.in: Rename variable kver into kversion
........
r5973 | bvassche | 2015-01-16 15:12:22 +0200 (Fri, 16 Jan 2015) | 2 lines
scst.spec.in: Pass kernel version via RPM-variable %{kversion} instead of shell variable ${KVER}
........
r5974 | bvassche | 2015-01-16 15:16:06 +0200 (Fri, 16 Jan 2015) | 6 lines
scst.spec.in: Determine version number correctly on a koji server
This patch has been tested on a koji build server and also on four
different RPM-based distributions (CentOS 7, Fedora 20, openSuSE 13.2
and SLES 11 SP3).
........
r5975 | bvassche | 2015-01-16 18:12:38 +0200 (Fri, 16 Jan 2015) | 1 line
scst.spec.in: Leave out kernel version from RPM name
........
r5976 | bvassche | 2015-01-16 18:20:10 +0200 (Fri, 16 Jan 2015) | 1 line
scst.spec.in: Add DKMS support
........
r5977 | vlnb | 2015-01-20 06:18:07 +0200 (Tue, 20 Jan 2015) | 3 lines
Revert r5964 as not needed
........
r5978 | vlnb | 2015-01-20 06:20:13 +0200 (Tue, 20 Jan 2015) | 3 lines
Revert r5963 as not needed
........
r5979 | bvassche | 2015-01-20 17:04:23 +0200 (Tue, 20 Jan 2015) | 13 lines
scst: Rework SCSI pass-through support for kernel versions >= 2.6.30
Changes in this patch:
- Rework the SCSI pass-through code such that for kernel versions
>= 2.6.30 the scst_exec_req_fifo patch is no longer needed.
- Modify the pass-through code such that blk_rq_append_bio() is only
called for kernel version 2.6.30. For later kernel versions
blk_make_request() is called instead.
- Rework scst_scsi_exec_async().
- Add debug tracing of SCSI pass-through result status.
- Add a lockdep_assert_held() call in scsi_end_async().
........
r5980 | bvassche | 2015-01-20 19:13:13 +0200 (Tue, 20 Jan 2015) | 1 line
nightly build: Update kernel versions
........
r5981 | vlnb | 2015-01-21 06:15:42 +0200 (Wed, 21 Jan 2015) | 3 lines
Follow up for r5979
........
r5982 | vlnb | 2015-01-21 06:20:53 +0200 (Wed, 21 Jan 2015) | 5 lines
Fix returning changeable values for caching mode page
Reported by Consus <consus@gmx.com>
........
r5983 | bvassche | 2015-01-21 15:11:56 +0200 (Wed, 21 Jan 2015) | 1 line
scst.h: Fix a sparse warning for kernels 2.6.29..2.6.31
........
r5984 | vlnb | 2015-01-22 07:03:17 +0200 (Thu, 22 Jan 2015) | 9 lines
[PATCH] scst_local: Fix bidirectional command support
scsi_setup_cmnd() sets sc_data_direction to DMA_TO_DEVICE for bidirectional
commands. Hence test SCpnt->request->next_rq instead of sc_data_direction
to figure out whether or not a command is bidirectional.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
........
r5985 | vlnb | 2015-01-22 07:06:45 +0200 (Thu, 22 Jan 2015) | 12 lines
[PATCH] scst_main: Suppress a checkpatch warning triggered by INIT_CACHEP{,_ALIGN}
Avoid that checkpatch v3.18 reports the following warning for these
two macros:
WARNING: Macros with flow control statements should be avoided
This patch does not change any functionality.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
........
r5986 | vlnb | 2015-01-22 07:09:17 +0200 (Thu, 22 Jan 2015) | 9 lines
scst_vdisk: Micro-optimize vdisk_caching_pg
This patch does not change any behavior but micro-optimizes
vdisk_caching_pg(). Declaring the array caching_pg[] const reduces
11 bytes from the assembler code of this function.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
........
r5987 | vlnb | 2015-01-22 07:10:42 +0200 (Thu, 22 Jan 2015) | 10 lines
scst: Suppress a smatch warning in vdisk_unmap_range()
Avoid that the static source code analysis tool 'smatch' reports
the following warning:
vdisk_unmap_range() warn: should 'blocks << cmd->dev->block_shift' be a 64 bit type?
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
........
r5988 | vlnb | 2015-01-22 07:13:59 +0200 (Thu, 22 Jan 2015) | 27 lines
scst_vdisk: Fix zero-copy read for tmpfs
For some filesystems, e.g. tmpfs, address_space.readpage is NULL.
Disable zero-copy reading for such filesystems. See also shmem_aops
in mm/shmem.c. See also inode_init_always() and empty_aops in fs/inode.c.
This patch avoids that the following call trace is triggered:
BUG: unable to handle kernel NULL pointer dereference at (null)
Call Trace:
[<ffffffffa0547d66>] prepare_read+0x106/0x1d0 [scst_vdisk]
[<ffffffffa0547f20>] fileio_alloc_data_buf+0xf0/0x330 [scst_vdisk]
[<ffffffffa046fc9b>] scst_prepare_space+0x9b/0x6e0 [scst]
[<ffffffffa047d4d5>] scst_process_active_cmd+0x545/0x840 [scst]
[<ffffffffa047dad2>] scst_cmd_init_done+0x302/0x5d0 [scst]
[<ffffffffa0563ab2>] scst_cmd_init_stage1_done.constprop.37+0x12/0x20 [iscsi_scst]
[<ffffffffa056a9ea>] scsi_cmnd_start+0x25a/0x550 [iscsi_scst]
[<ffffffffa056b4a8>] cmnd_rx_start+0x148/0x1a0 [iscsi_scst]
[<ffffffffa056e4f8>] process_read_io+0x3b8/0x800 [iscsi_scst]
[<ffffffffa056ea07>] scst_do_job_rd+0xc7/0x220 [iscsi_scst]
[<ffffffffa056efed>] istrd+0x16d/0x2e0 [iscsi_scst]
[<ffffffff81079efd>] kthread+0xed/0x110
[<ffffffff817227fc>] ret_from_fork+0x7c/0xb0
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
........
r5989 | vlnb | 2015-01-24 07:37:57 +0200 (Sat, 24 Jan 2015) | 5 lines
scst_local: Rework data direction detection code
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
........
r5990 | bvassche | 2015-01-26 13:32:32 +0200 (Mon, 26 Jan 2015) | 1 line
ib_srpt: Detect Mellanox OFED 2.3 correctly
........
r5991 | vlnb | 2015-01-28 07:07:46 +0200 (Wed, 28 Jan 2015) | 3 lines
Cleanups
........
git-svn-id: http://svn.code.sf.net/p/scst/svn/branches/iser@5993 d57e44dd-8a1f-0410-8b47-8ef2f437770f
463 lines
19 KiB
Plaintext
SCSI RDMA Protocol (SRP) Target driver for Linux
=================================================

The SRP target driver has been designed to work on top of the Linux RDMA
kernel drivers -- either the RDMA drivers included with a Linux distribution
or the OFED RDMA drivers. For more information about using the SRP target
driver in combination with OFED, see also README.ofed.

The SRP target driver has been implemented as an SCST driver. This
makes it possible to support many I/O modes on real and virtual
devices. A few examples of supported device handlers are:

1. scst_disk. This device handler implements transparent pass-through
   of SCSI commands and allows SRP to access and export real
   SCSI devices, such as disks, hardware RAID volumes and tape libraries,
   as SRP LUNs.

2. scst_vdisk, either in fileio or in blockio mode. This device handler
   makes it possible to export software RAID volumes, LVM volumes, IDE
   disks, and normal files as SRP LUNs.

3. nullio. The nullio device handler makes it possible to measure the
   performance of the SRP target implementation without performing any
   actual I/O.


Installation
------------

Building and installing the SRP target driver is possible as follows:

    cd ${SCST_DIR}
    if type -p rpm >/dev/null; then
        make -s rpm
        sudo rpm -U rpmbuilddir/RPMS/*/*rpm scstadmin/rpmbuilddir/RPMS/*/*rpm
    else
        make -s scst_clean srpt_clean scst srpt scstadmin
        sudo make -s scst_install srpt_install scstadm_install
    fi

The ib_srpt kernel module supports the following parameters:

* max_sge_delta (unsigned): Number to subtract from max_sge. Some but not
  all HCAs support up to max_sge S/G-list elements in RDMA communication.
  The default value of this parameter is 3 and works with all HCAs. If you
  know that the HCAs used by the ib_srpt driver support S/G-lists longer
  than max_sge - 3, you can decrease this parameter. Note: setting this
  parameter too low will cause every SRP login to fail and will cause a
  message similar to the following to be logged on the target system:
  "ib_srpt: RDMA t ... for idx ... failed with status 12".
* one_target_per_port (boolean) and
* use_node_guid_in_target_name (boolean)
  ib_srpt can operate in one of the following three modes:
  1. Access control configuration per HCA and assigning an
     "ib_srpt_target_<n>" style name to each HCA.
  2. Access control configuration per HCA and referring to an HCA via its
     node GUID (e.g. 0002:c903:0005:f34a).
  3. Access control configuration per HCA port and referring to an HCA via
     its port GID (e.g. fe80:0000:0000:0000:0002:c903:0005:f34b).
  Mode (1) is chosen if both one_target_per_port and
  use_node_guid_in_target_name are false. Mode (2) is chosen if
  one_target_per_port is false and use_node_guid_in_target_name is true. Mode
  (3) is chosen if one_target_per_port is true. This last mode is the
  default mode.
* rdma_cm_port (number)
  A 16-bit number that specifies the port number to be registered via the
  RDMA/CM. Must be specified to make communication over RoCE or iWARP
  possible. If this parameter is zero (the default value) the SRP target
  driver does not register with the RDMA/CM.
* srp_max_req_size (number)
  Maximum size of an SRP control message in bytes. Examples of SRP control
  messages are: login request, logout request, data transfer request, ...
  The larger this parameter, the more scatter/gather list elements can be
  sent at once. Use the following formula to compute an appropriate value
  for this parameter: 68 + 16 * (sg_tablesize). The default value of
  this parameter is 4148, which corresponds to an sg table size of 255.
* srp_max_rsp_size (number)
  Maximum size of an SRP response message in bytes. Sense data is sent back
  via these messages towards the initiator. The default size is 256 bytes.
  With this value there remain (256-36) = 220 bytes for sense data.
* srp_max_rdma_size (number)
  Maximum number of bytes that may be transferred at once via RDMA. Defaults
  to 65536 bytes, which is sufficient to use the full bandwidth of low-latency
  HCAs. Increasing this value may decrease latency for applications
  transferring large amounts of data at once.
* srpt_srq_size (number, default 4095)
  ib_srpt uses a shared receive queue (SRQ) for processing incoming SRP
  requests. This number may have to be increased when a large number of
  initiator systems is accessing a single SRP target system.
* srpt_sq_size (number, default 4096)
  Per-channel InfiniBand send queue size. The default setting is sufficient
  for a credit limit of 128. Changing this parameter to a smaller value may
  cause RDMA requests to be retried and hence may slow down data transfer
  severely.
* trace_flag (unsigned integer, only available in debug builds)
  The individual bits of the trace_flag parameter define which categories of
  trace messages are sent to the kernel log.

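The arithmetic quoted for srp_max_req_size and srp_max_rsp_size can be checked
with a few lines of shell. This is only a minimal sketch: the two function
names are made up for this illustration and are not part of ib_srpt or
scstadmin. The constants 68, 16 and 36 are the ones given in the parameter
descriptions above.

```shell
# Hypothetical helpers; only the constants come from the parameter list.

# Print the srp_max_req_size value that fits a given sg table size,
# using the formula 68 + 16 * sg_tablesize.
srp_max_req_size() {
    echo $((68 + 16 * $1))
}

# Print how many bytes remain for sense data for a given srp_max_rsp_size,
# assuming the 36 bytes of overhead mentioned above.
srp_sense_bytes() {
    echo $(($1 - 36))
}

srp_max_req_size 255   # sg table size that corresponds to the default
srp_sense_bytes 256    # default srp_max_rsp_size
```

With the default values this prints 4148 and 220, matching the numbers in the
parameter descriptions.
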

Configuring the SRP Target System
---------------------------------

The first step is to choose whether access control will be configured per
HCA or per HCA port and to create a modprobe configuration file that reflects
this choice. An example:

    # cat /etc/modprobe.d/ib_srpt.conf
    options ib_srpt one_target_per_port=1

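The mapping from the three ib_srpt modes to a modprobe options line could be
sketched as follows. The function name is hypothetical. The parameter
combinations follow the mode descriptions in the parameter list, and writing
the boolean values as 0 and 1 follows the example above.

```shell
# Hypothetical helper: print the modprobe options line for one of the three
# ib_srpt modes (1, 2 or 3) described in the parameter list.
ib_srpt_options() {
    case "$1" in
        1) echo "options ib_srpt one_target_per_port=0 use_node_guid_in_target_name=0" ;;
        2) echo "options ib_srpt one_target_per_port=0 use_node_guid_in_target_name=1" ;;
        3) echo "options ib_srpt one_target_per_port=1" ;;
        *) echo "unknown mode: $1" >&2; return 1 ;;
    esac
}

ib_srpt_options 3
```

For mode 3 this prints the same line as the example configuration file. The
output would still have to be saved as root to /etc/modprobe.d/ib_srpt.conf.
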
Next, create the file /etc/scst.conf. One way to do this is to restart SCST,
configure it with the scstadmin tool and save the resulting configuration.
First restart SCST:

    /etc/init.d/scst stop
    /etc/init.d/scst start

Now configure SCST using scstadmin - see also the scstadmin documentation for
further information. Once finished, save the configuration to /etc/scst.conf:

    scstadmin -write_config /etc/scst.conf (sysfs version)
or
    scstadmin -WriteConfig /etc/scst.conf (procfs version)

The contents of scst.conf can be verified e.g. as follows:

    cat /etc/scst.conf

Now verify that loading the configuration from file works correctly:

    /etc/init.d/scst reload

Note: when using InfiniBand, loading the ib_ipoib kernel module and assigning
an IP address to each IPoIB interface is only needed when using the RDMA/CM.
When using the IB/CM, loading the ib_ipoib kernel module is allowed but not
necessary.


Configuring the SRP Initiator System
------------------------------------

First of all, load the SRP kernel module as follows:

    modprobe ib_srp

Next, when using InfiniBand, discover the new SRP target by running the
srp_daemon command:

    for d in /dev/infiniband/umad*; do srp_daemon -oacd$d; done

If you want to let the initiator system log in to all SRP targets available
in the same InfiniBand subnet, that is possible as follows (-e = execute):

    for d in /dev/infiniband/umad*; do srp_daemon -oecd$d; done

If you want to let the initiator log in to a specific target you can do that
e.g. as follows:

    echo "id_ext=0002c903000f1366,ioc_guid=0002c903000f1366,dgid=fe800000000000000002c903000f1367,pkey=ffff,service_id=0002c903000f1366" > /sys/class/infiniband_srp/${SRP_HCA_NAME}/add_target

The meaning of the parameters in the above command is as follows:
* id_ext: must match ioc_guid.
* ioc_guid: see also the documentation of the ib_srpt ioc_guid parameter.
* dgid: target HCA port GID to connect to.
* pkey: IB partition key (P_Key) of the target to connect to.
* service_id: must match ioc_guid.

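Because id_ext and service_id must both match ioc_guid, the login string can
be assembled from three values. The helper below is a hypothetical sketch and
not part of ib_srp. Its default of ffff for pkey is an assumption taken from
the example above.

```shell
# Hypothetical helper: assemble an InfiniBand add_target login string.
# Arguments: ioc_guid, dgid (32 hex digits, no colons), optional pkey.
srp_ib_login_string() {
    ioc_guid=$1
    dgid=$2
    pkey=${3:-ffff}   # assumed default, as in the example above
    # id_ext and service_id must match ioc_guid, so the same value is reused.
    echo "id_ext=${ioc_guid},ioc_guid=${ioc_guid},dgid=${dgid},pkey=${pkey},service_id=${ioc_guid}"
}

srp_ib_login_string 0002c903000f1366 fe800000000000000002c903000f1367
```

This prints the string used in the example above. To log in, it would still
have to be written to the add_target file of the initiator HCA port.
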
When using RoCE or iWARP, log in to the target system to determine the id_ext
and ioc_guid parameters and use these to log in. An example:

    [ target system ]
    # sed 's/,\(pkey\|dgid\|service_id\)=[^,]*//g' $(find /sys/kernel/scst_tgt/targets/ib_srpt -name login_info) | uniq
    id_ext=0002c90300a34270,ioc_guid=0002c90300a34270

    [ initiator system ]
    echo dest=192.168.5.1:5000,id_ext=0002c90300a34270,ioc_guid=0002c90300a34270 \
      >/sys/class/infiniband_srp/srp-mlx4_0-1/add_target
    echo dest=192.168.6.1:5000,id_ext=0002c90300a34270,ioc_guid=0002c90300a34270 \
      >/sys/class/infiniband_srp/srp-mlx4_0-2/add_target

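What the sed expression on the target system does can be tried on a sample
string without access to sysfs. The sample line below is made up for this
illustration, including its dgid value and the order of its fields; a real
login_info file may look different.

```shell
# Sample login_info line; the dgid value and the field order are assumptions.
login_info="id_ext=0002c90300a34270,ioc_guid=0002c90300a34270,dgid=fe800000000000000002c90300a34271,pkey=ffff,service_id=0002c90300a34270"

# Drop the pkey, dgid and service_id fields. sed -E with (a|b|c) is the
# extended-regex spelling of the \(a\|b\|c\) expression used above.
strip_ib_fields() {
    sed -E 's/,(pkey|dgid|service_id)=[^,]*//g'
}

echo "$login_info" | strip_ib_fields
```

Only the id_ext and ioc_guid fields remain, which are the two values needed
for a RoCE or iWARP login.
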
Initiator port GIDs can be queried e.g. via sysfs:

    $ for f in /sys/devices/*/*/*/infiniband/*/ports/*/gids/0; do echo $f; \
        cat $f | sed 's/://g'; done
    /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/infiniband/mlx4_0/ports/1/gids/0
    fe800000000000000002c9030005f34b
    /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/infiniband/mlx4_0/ports/2/gids/0
    fe800000000000000002c9030005f34c
    /sys/devices/pci0000:00/0000:00:1c.0/0000:05:00.0/infiniband/mlx4_1/ports/1/gids/0
    fe800000000000000002c9030003cca7
    /sys/devices/pci0000:00/0000:00:1c.0/0000:05:00.0/infiniband/mlx4_1/ports/2/gids/0
    fe800000000000000002c9030003cca8

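The colon removal in the loop above can also be wrapped in a small function,
for example to turn a GID copied from other output into the form used for
dgid. The function name is hypothetical.

```shell
# Hypothetical helper: convert a colon-separated GID into the 32-digit form
# without colons.
gid_to_dgid() {
    echo "$1" | tr -d ':'
}

gid_to_dgid fe80:0000:0000:0000:0002:c903:0005:f34b
```

For this input the result is fe800000000000000002c9030005f34b, the first GID
shown in the output above.
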
Finally run lsscsi to display the details of the newly discovered SCSI disks:

    lsscsi

SRP targets can be recognized in the output of lsscsi by looking for
the disk names assigned on the SCST target ("disk01" in the example below):

    [8:0:0:0] disk SCST_FIO disk01 102 /dev/sdb


Target names
------------

The name assigned by the ib_srpt target driver to an SCST target is either
ib_srpt_target_<n>, the node GUID of an HCA in hexadecimal form with a colon
after every fourth digit, or the port GID with a colon after every fourth
digit. The HCA node GUID and the port GIDs can be obtained via the
ibv_devinfo command. An example:

    # ibv_devinfo -v | grep -E '[^a-z]port:|guid|GID'
    node_guid: 0002:c903:0005:f34e
    sys_image_guid: 0002:c903:0005:f351
    port: 1
    GID[0]: fe80:0000:0000:0000:0002:c903:0005:f34f
    port: 2
    GID[0]: fe80:0000:0000:0000:0002:c903:0005:f350

Once the ib_srpt driver has been loaded, the available SCST targets can be
queried as follows:

    # (cd /sys/kernel/scst_tgt/targets/ib_srpt && ls -d [0-9a-f]*)
    fe80:0000:0000:0000:0002:c903:0005:f34f
    fe80:0000:0000:0000:0002:c903:0005:f350

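The "colon after every fourth digit" notation can be reproduced from a plain
hexadecimal string as sketched below. The function name is hypothetical, and
the input is assumed to be lowercase hexadecimal with a length that is a
multiple of four.

```shell
# Hypothetical helper: insert a colon after every fourth hex digit and drop
# the trailing colon.
add_colons() {
    echo "$1" | sed 's/..../&:/g; s/:$//'
}

add_colons 0002c9030005f34e
add_colons fe800000000000000002c9030005f34f
```

This prints 0002:c903:0005:f34e and fe80:0000:0000:0000:0002:c903:0005:f34f,
the node GUID and the first port GID from the ibv_devinfo example above.
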
|
|
Session names
|
|
-------------
|
|
|
|
The name assigned by the ib_srpt target driver to a session depends on the
|
|
mode in which it is operating. If one_target_per_port=y then the source port
|
|
GID is used as the session name. If one_target_per_port=n then the 128-bit SRP
|
|
initiator port identifier is used as the session name. This identifier is sent
|
|
by the SRP initiator to the SRP target via the SRP_LOGIN_REQ information unit.
|
|
The Linux SRP initiator (ib_srp) generates the initiator port identifier as
|
|
follows:
|
|
- The first eight bytes are the identifier extension ('initiator_ext' parameter
|
|
specified in the login string echoed into the sysfs file 'add_target').
|
|
- The last eight bytes are the GUID of the initiator HCA port used to
|
|
communicate with the target.
|
|
|
|
An example:
|
|
|
|
[ INITIATOR ]
|
|
|
|
$ for f in /sys/devices/*/*/*/infiniband/*/ports/*/gids/0; do echo
|
|
f; cat $f; done
|
|
/sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/infiniband/mlx4_0/ports/1/gids/0
|
|
fe80:0000:0000:0000:0002:c903:0005:f34b
|
|
/sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/infiniband/mlx4_0/ports/2/gids/0
|
|
fe80:0000:0000:0000:0002:c903:0005:f34c
|
|
/sys/devices/pci0000:00/0000:00:1c.0/0000:05:00.0/infiniband/mlx4_1/ports/1/gids/0
|
|
fe80:0000:0000:0000:0002:c903:0003:cca7
|
|
/sys/devices/pci0000:00/0000:00:1c.0/0000:05:00.0/infiniband/mlx4_1/ports/2/gids/0
|
|
fe80:0000:0000:0000:0002:c903:0003:cca8
|
|
|
|
[ TARGET, after login ]
|
|
|
|
$ (cd /sys/kernel/scst_tgt/targets/ib_srpt/[0-9a-f]* && ls -d sessions/*)
|
|
sessions/fe80:0000:0000:0000:0002:c903:0003:cca7
|
|
sessions/fe80:0000:0000:0000:0002:c903:0005:f34b


LUN masking
-----------

In a straightforward configuration every LUN is visible to every initiator.
It is possible however to make a different set of LUNs visible to each
initiator by using the LUN masking feature of SCST. SRP initiators are
identified by their session name (see above). An example of an scst.conf
file using LUN masking for ib_srpt:

TARGET_DRIVER ib_srpt {
    TARGET fe80:0000:0000:0000:0002:c903:0005:f34b {
        enabled 1
        rel_tgt_id 1

        # LUNs visible to all initiators not listed below
        LUN 0 disk01

        GROUP grp1 {
            # LUNs visible to initiator system 1
            LUN 0 disk02

            INITIATOR fe80:0000:0000:0000:0002:c903:0005:f34b
        }

        GROUP grp2 {
            # LUNs visible to initiator system 2
            LUN 0 disk03

            INITIATOR fe80:0000:0000:0000:0002:c903:0005:f34c
        }
    }
}


Adding and Removing LUNs Dynamically
------------------------------------

It is possible to add and/or remove LUNs on the target without restarting
the target or the initiator. This can be done either via scstadmin or directly
via the sysfs interface. Although the SCST core will notify the initiator
about LUN changes, Linux initiators ignore these notifications. To bring a
Linux initiator back in sync after a LUN change, the initiator has to be told
to rescan SCSI devices. One way to do that is the rescan-scsi-bus.sh script
that can be found here: http://www.garloff.de/kurt/linux/#rescan-scsi.
An example:

    $ rescan-scsi-bus --hosts=${srp_host_id} --channels=0 --ids=0 --luns=0-31
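
The sysfs route mentioned above can be sketched as follows. This is a rough
sketch under stated assumptions, not a replacement for scstadmin: it assumes
that the per-target luns/mgmt file accepts "add <device> <lun>" and
"del <lun>" commands, and that the initiator kernel offers the generic
/sys/class/scsi_host/host<n>/scan file. The function names and the SCST_ROOT
and SYSFS_ROOT variables are hypothetical and exist only so that the
functions can be tried against a scratch directory.

```shell
#!/bin/sh
# Sketch only. On a real system leave SCST_ROOT and SYSFS_ROOT unset.
SCST_ROOT="${SCST_ROOT:-/sys/kernel/scst_tgt}"
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# [ TARGET ] Export SCST device $2 as LUN $3 of ib_srpt target $1.
srpt_add_lun() {
    echo "add $2 $3" > "$SCST_ROOT/targets/ib_srpt/$1/luns/mgmt"
}

# [ TARGET ] Remove LUN $2 from ib_srpt target $1.
srpt_del_lun() {
    echo "del $2" > "$SCST_ROOT/targets/ib_srpt/$1/luns/mgmt"
}

# [ INITIATOR ] Ask SCSI host $1 to rescan all channels, ids and LUNs.
scsi_rescan_host() {
    echo "- - -" > "$SYSFS_ROOT/class/scsi_host/host$1/scan"
}
```

A typical sequence would be srpt_add_lun on the target followed by
scsi_rescan_host ${srp_host_id} on the initiator.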


InfiniBand Partitions
---------------------

Just as VLANs segment traffic on an Ethernet network, partitions segment
traffic on an InfiniBand network. Each InfiniBand partition is identified by a
partition key, which is a 16-bit number. During fabric initialization the
subnet manager assigns one or more partition keys to each InfiniBand port. For
opensm, partitions are defined in /etc/opensm/partitions.conf. ib_srpt uses the
partition with index 0. The partition key that corresponds to index 0 can be
found by querying sysfs:

    $ head /sys/class/infiniband/*/ports/*/pkeys/0
    ==> /sys/class/infiniband/mlx4_0/ports/1/pkeys/0 <==
    0xffff

    ==> /sys/class/infiniband/mlx4_0/ports/2/pkeys/0 <==
    0xffff
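
To interpret a value read from a pkeys file, the following sketch may help.
It relies on the InfiniBand convention that the most significant bit of the
16-bit partition key encodes the membership type (1 = full, 0 = limited) and
that the remaining 15 bits identify the partition. The function name
decode_pkey is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: split a 16-bit P_Key into membership type and the
# 15-bit partition number.
# $1: partition key, e.g. 0xffff
decode_pkey() {
    pkey=$(( $1 ))
    if [ $(( pkey & 0x8000 )) -ne 0 ]; then
        membership=full
    else
        membership=limited
    fi
    printf '%s 0x%04x\n' "$membership" $(( pkey & 0x7fff ))
}
```

Possible usage:
decode_pkey "$(cat /sys/class/infiniband/mlx4_0/ports/1/pkeys/0)".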


High availability
-----------------

If there are redundant paths in the IB network between initiator and target,
automatic path failover can be set up on the initiator as follows:
* Edit /etc/infiniband/openib.conf to load the SRP driver and SRP HA daemon
  automatically: set SRP_LOAD=yes and SRPHA_ENABLE=yes.
* To set up and use the high availability feature you need the dm-multipath
  driver and multipath tool.
* Please refer to the OFED-1.x user manual for more detailed instructions
  on how to enable and how to use the HA feature. See e.g.
  http://www.mellanox.com/related-docs/prod_software/Mellanox_OFED%20_Linux_user_manual_1_5_1_2.pdf.

A setup with automatic failover between redundant targets is possible by
installing and configuring DRBD on both targets. If the initiator system
supports mirroring (e.g. Linux), you can use the following approach:
* Configure DRBD in Active/Active mode.
* Configure the initiator(s) for mirroring between the redundant targets.
If the initiator system does not support mirroring (e.g. VMware ESX), you
can use the following approach:
* Configure DRBD in Active/Passive mode and enable STONITH mode in the
  Heartbeat software.

For more information, see also:
* http://www.drbd.org/
* http://www.linux-ha.org/wiki/Main_Page


Performance Notes - Target Side
-------------------------------

* Building the SCST core and the ib_srpt target driver in release mode
  improves performance compared to debug mode.

* When using high-latency storage devices (hard disks), the default value
  chosen by SCST for DEVICE.threads_num should be fine. When using
  low-latency storage devices though (SSDs), DEVICE.threads_num should be set
  to 1 or 2 in /etc/scst.conf in order to reach optimal performance for small
  block sizes (e.g. 4 KB).

* When multiple InfiniBand HCAs are present in a target system the Linux
  kernel by default will assign the associated interrupt handlers to CPU 0.
  Even irqbalance will often assign the interrupt handlers of multiple HCAs
  to the same CPU. That is unfortunate because it leads to unfair handling of
  SRP sessions. The solution is to assign InfiniBand HCA interrupts manually
  to different CPUs. That is possible by looking up the InfiniBand
  interrupt numbers in /proc/interrupts and by writing proper bitmasks into
  /proc/irq/<n>/smp_affinity.
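
The bitmask written into smp_affinity has bit <c> set for every CPU <c> that
may handle the interrupt. A minimal sketch of how such a mask can be computed
and written is shown below. The function names and the PROC_ROOT variable are
hypothetical, and the sketch only handles a single CPU in the range 0..31
because larger masks have to be written as comma-separated 32-bit groups.

```shell
#!/bin/sh
# Print the smp_affinity bitmask (hex, no 0x prefix) that selects CPU $1.
# Valid for CPU numbers 0..31 only.
cpu_to_affinity_mask() {
    printf '%x\n' $(( 1 << $1 ))
}

# Pin interrupt number $1 to CPU $2.
# PROC_ROOT is a hypothetical override for dry runs; leave it unset normally.
pin_irq_to_cpu() {
    cpu_to_affinity_mask "$2" > "${PROC_ROOT:-/proc}/irq/$1/smp_affinity"
}
```

The interrupt numbers themselves have to be read from /proc/interrupts first.
The name under which an HCA shows up there depends on the HCA driver, so no
pattern is suggested here.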


Performance Notes - Initiator Side
----------------------------------

* Choose a proper value for the ib_srp kernel module parameter
  cmd_sg_entries. The default value 12 works well for buffered reads while
  the throughput for write-dominated workloads improves by changing this value
  to 255. One way to set this kernel module parameter is as follows:

    echo options ib_srp cmd_sg_entries=255 >>/etc/modprobe.d/ib_srp.conf

* For multithreaded workloads using small block sizes, setting rq_affinity
  to 2 improves IOPS significantly (Linux kernel 3.1 and later; see also
  commit 5757a6d76cdf6dda2a492c09b985c015e86779b1).

* For latency-sensitive applications, using the noop scheduler at the initiator
  side can give significantly better results than other schedulers.

* The SRP initiator limits by default the queue depth to 64 commands. If your
  workload benefits from a larger queue depth, enlarge the queue depth by
  setting the max_cmd_per_lun and queue_size parameters in the SRP login
  string.

* The following parameters have a small but measurable impact on SRP
  performance:
  * /sys/class/block/${dev}/queue/rotational
  * /sys/class/block/${dev}/queue/rq_affinity
  * /proc/irq/${ib_int_no}/smp_affinity
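
The block-layer settings from the list above could be applied with a small
helper like the one below. This is a sketch under assumptions: the function
name and the SYSFS_ROOT variable are hypothetical, the value 0 for rotational
is a guess that only makes sense for SSD-backed targets, and kernels that no
longer offer the noop scheduler will reject that write.

```shell
#!/bin/sh
# Apply the initiator-side block queue settings to block device $1 (e.g. sdc).
# SYSFS_ROOT is a hypothetical override for dry runs; leave it unset normally.
tune_srp_block_dev() {
    q="${SYSFS_ROOT:-/sys}/class/block/$1/queue"
    # Complete each request on the CPU that submitted it.
    echo 2 > "$q/rq_affinity"
    # Use the noop I/O scheduler on the initiator.
    echo noop > "$q/scheduler"
    # Assumption: the storage behind the target is not rotational.
    echo 0 > "$q/rotational"
}
```

These settings do not survive a reboot or a new SRP login, so they have to be
reapplied whenever the SCSI device is recreated.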


Performance Notes - Both Sides
------------------------------

* Disabling CONFIG_SCHED_DEBUG and CONFIG_SCHEDSTATS in the kernel config
  improves performance.

* Disable CONFIG_IRQSOFF_TRACER such that CONFIG_TRACE_IRQFLAGS is disabled.

* Consider which memory allocator to use. With recent kernels, using the SLUB
  memory allocator instead of SLAB may help. On multi-socket systems the SLAB
  memory allocator may result in better performance. Please note that SLAB is
  tunable while SLUB is not. See also http://lkml.org/lkml/2010/7/9/264 and
  http://www.ibm.com/developerworks/linux/library/l-linux-slab-allocator/.


Frequently Asked Questions
--------------------------

Q: Every now and then "SRP abort called" and "SRP reset_device called"
   messages are logged at the initiator side. Around the same time I see the
   following message in the target log: "ib_srpt: ***ERROR***: Command ...: IB
   completion for idx ... has not been received in time (SRPT command state
   ...)". What do these messages mean and how can I fix this?

A: This means that a timeout occurred while an HCA was waiting for an
   acknowledgment message. Check the IB network for bad IB cables, bad HCAs
   and/or bad switch ports. Also make sure that the HCA firmware is up to
   date.

Q: Loading the kernel module ib_srpt triggers a kernel panic with a call trace
   like the one below. What is the cause of this and how can it be solved?

   Call Trace:
   [<ffffffffa02f2a50>] srpt_alloc_ioctx+0x60/0xb0 [ib_srpt]
   [<ffffffffa02f2f0a>] srpt_alloc_ioctx_ring+0xea/0x1e0 [ib_srpt]
   [<ffffffffa02f32e9>] srpt_add_one+0x2e9/0x670 [ib_srpt]
   [<ffffffffa015a480>] ib_register_client+0x80/0xa0 [ib_core]
   [<ffffffffa02421eb>] srpt_init_module+0x1eb/0x235 [ib_srpt]
   [<ffffffff81000344>] do_one_initcall+0x34/0x1a0
   [<ffffffff8107a63c>] sys_init_module+0xdc/0x260
   [<ffffffff81002e3b>] system_call_fastpath+0x16/0x1b

A: This means that you are using a system on which OFED has been installed but
   on which ib_srpt has been compiled against the in-tree kernel headers
   instead of the OFED kernel headers. You can fix this by rebuilding ib_srpt
   against the OFED kernel headers. The ib_srpt makefile should detect the
   OFED kernel headers automatically - at least if ib_srpt is built after OFED
   has been installed.


Feedback
--------

Send questions about this driver to scst-devel@lists.sourceforge.net.