ib_srpt Test Procedure
======================

At least the following tests must be run before releasing a new SRPT version:

* Make sure that SRPT compiles and installs without triggering any compiler
  warning. Use the following command to compile and install SRPT:

    for d in scst srpt; do make -C $d -s clean && make -C $d -s install; done

* Verify the output of run-regression-tests for kernel versions starting at
  2.6.23 up to and including the latest released kernel.

* Verify that SRPT compiles, installs and works fine when following the
  instructions in README.ofed for the latest released OFED distribution and
  on the latest released CentOS, Ubuntu and openSUSE distributions.

* Verify that module loading and unloading works fine.

* Verify that rejecting logins does not trigger a memory leak, e.g. as
  follows:
  * Run the following command on the target system:

      ${SCST_TRUNK}/scripts/monitor-memory-usage | tee memlog.txt

  * Run the following command on the initiator system:

      target_id="$(/usr/sbin/ibsrpdm -c -d /dev/infiniband/umad0)"
      for ((i=0;i<100000;i++)); do echo "$target_id" >/sys/class/infiniband_srp/srp-mlx4_0-1/add_target; done

* Verify that the following I/O stress test does not report any errors even
  when left running for several hours:

    dev=...
    umount /mnt
    mkfs.ext4 -O ^has_journal $dev
    while \
      mount $dev /mnt && \
      rm -rf /mnt/test* && \
      fio --verify=md5 -rw=randwrite --size=10m --bs=4k \
          --loops=1000000 --iodepth=64 --group_reporting --sync=1 --direct=1 \
          --norandommap --ioengine=aio --directory=/mnt --name=test --thread \
          --numjobs=80 --runtime=30 && \
      fsck -N $dev
    do
      true
    done

* Repeat the above test with SCST_MAX_TGT_DEV_COMMANDS set to 48 and after
  having applied the patch below on ib_srpt.c. The expected result is that no
  data corruption occurs but that on the target error messages are logged
  about a negative req_lim value. The lowest expected value for req_lim
  is -15.

    Index: srpt/src/ib_srpt.c
    ===================================================================
    --- srpt/src/ib_srpt.c (revision 2412)
    +++ srpt/src/ib_srpt.c (working copy)
    @@ -2411,7 +2411,7 @@
             ch->max_ti_iu_len = it_iu_len;
             rsp->buf_fmt = cpu_to_be16(SRP_BUF_FORMAT_DIRECT
                                        | SRP_BUF_FORMAT_INDIRECT);
    -        rsp->req_lim_delta = cpu_to_be32(ch->rq_size);
    +        rsp->req_lim_delta = cpu_to_be32(ch->rq_size + 16);
             atomic_set(&ch->req_lim, ch->rq_size);
     
             /* create cm reply */

* Verify that a SCSI reset works properly by running the following command on
  an initiator system (note: with kernel version 2.6.37 and before the command
  below triggers a bug in the Linux SRP initiator -- see also
  https://bugzilla.kernel.org/show_bug.cgi?id=13893):

    (target)    echo add debug > /sys/kernel/scst_tgt/targets/ib_srpt/trace_level
    (initiator) sg_reset -d ${initiator_device}

  Verify that the target logged that it has processed a SRP_TSK_LUN_RESET
  message.

* Run the following command on a target system:

    while true; do /etc/init.d/scst stop; sleep 3; /etc/init.d/scst start; sleep 5; done

  and the following commands on an initiator system:

    target_id="$(/usr/sbin/ibsrpdm -c -d /dev/infiniband/umad0)"
    while true; do date; rmmod ib_srp; modprobe ib_srp; echo "${target_id}" > /sys/class/infiniband_srp/srp-mlx4_0-1/add_target; sleep 2; done

  and verify that nothing unexpected happens.

* Log in twice from an initiator system, and verify that the first session
  receives a DREQ upon the second login:

    target_id="$(/usr/sbin/ibsrpdm -c -d /dev/infiniband/umad0)"
    dmesg -c
    echo "${target_id}" > /sys/class/infiniband_srp/srp-mlx4_0-1/add_target
    dmesg -c
    echo "${target_id}" > /sys/class/infiniband_srp/srp-mlx4_0-1/add_target
    dmesg -c
    echo "${target_id}" > /sys/class/infiniband_srp/srp-mlx4_0-1/add_target

* Test low memory conditions: load SRPT, reduce the amount of available
  memory by creating a large file on a tmpfs file system and run a stress
  test on an initiator system.

* Test the state machine for SCST commands in SRPT by using SCST's error
  injection mechanism. Add the following to scst/src/Makefile, log in from an
  initiator system and trigger SRP I/O:

    EXTRA_CFLAGS += -DCONFIG_SCST_DEBUG -g
    EXTRA_CFLAGS += -DCONFIG_SCST_DEBUG_TM -DCONFIG_SCST_TM_DBG_GO_OFFLINE

* Test with multiple values of ib_srp_tablesize in the range 1..128.

* Verify that the initiator does not lock up while running the command below
  (see also https://bugzilla.kernel.org/show_bug.cgi?id=14235):

    dev=...
    fio --bs=512 --buffered=0 --ioengine=libaio --rw=read --invalidate=1 \
        --thread --numjobs=8 --loops=10 --gtod_reduce=1 --group_reporting \
        --name=$dev --filename=$dev

* Test whether queue overflow recovery works correctly as follows:
  - On the target, reload ib_srpt with srpt_sq_size set to 64. Add e.g. the
    following line at the end of /etc/modprobe.d/99-local.conf:

      options ib_srpt srpt_sq_size=64

  - On the initiator, run a direct I/O test with large block sizes, e.g.

      scripts/blockdev-perftest -f -d -j -m 12 -M 24 /dev/sdb

  - On the initiator, run the following two commands in parallel:

      scripts/blockdev-perftest -r -d -j -m 12 -M 24 /dev/sdb

      fio --verify=md5 -rw=randwrite --size=10m --bs=4k \
          --loops=1000000 --iodepth=64 --group_reporting --sync=1 --direct=1 \
          --norandommap --ioengine=aio --directory=/mnt --name=test --thread \
          --numjobs=80 --runtime=30

* Test whether aborting multipart RDMA transfers works correctly as follows:
  - On the target, reload ib_srpt with srpt_sq_size set to 64.
  - On the initiator, run a direct I/O test with large block sizes, e.g.
    128 KB.
  - Verify that on the target kernel messages similar to the following are
    logged frequently:

      ib_srpt: ***ERROR***: srpt_perform_rdmas[2966]: ib_post_send() returned -12 for 1/2

  - On the target, unload and reload the ib_srpt kernel module.
  - Verify that no kernel crash occurs on the target.
  - Repeat the above a few times.
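
The log check in the last test ("kernel messages similar to the following are
logged frequently") can be partly automated. The following is a minimal
sketch, not part of the SRPT tree: the function name count_post_send_errors is
hypothetical, and the grep pattern is only derived from the sample message
above, so it assumes that the real log lines keep that wording and the -12
return value.

```shell
#!/bin/bash

# Hypothetical helper: read kernel log text on stdin and print the number of
# lines that look like the ib_post_send() failure shown in the sample above.
# The pattern is an assumption based on that single sample message.
count_post_send_errors() {
    # grep -c prints the match count; it exits with status 1 when the count
    # is 0, so "|| true" keeps the function usable under "set -e".
    grep -c 'ib_srpt: .*srpt_perform_rdmas.*ib_post_send() returned -12' || true
}
```

On the target it could be fed from the kernel log, e.g.
"dmesg | count_post_send_errors", and the printed count compared between two
runs to judge whether the messages keep appearing.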