diff --git a/www/bart_res.txt b/www/bart_res.txt
new file mode 100644
index 000000000..a939e1ad7
--- /dev/null
+++ b/www/bart_res.txt
@@ -0,0 +1,22 @@
+iSCSI over IPoIB (two DDR PCIe 1.0 ConnectX HCAs connected back to back). The results
+for the buffered I/O test with a block size of 512K (initiator)
+against a file of 1GB residing on a tmpfs filesystem on the target are
+as follows:
+
+write-test: iSCSI-SCST 243 MB/s; IET 192 MB/s.
+read-test:  iSCSI-SCST 291 MB/s; IET 223 MB/s.
+
+And for a block size of 4 KB:
+
+write-test: iSCSI-SCST 43 MB/s; IET 42 MB/s.
+read-test:  iSCSI-SCST 288 MB/s; IET 221 MB/s.
+
+In other words: depending on the test scenario, SCST transfers data between 2% and
+30% faster via the iSCSI protocol over this network.
+
+Something that is not relevant for this comparison, but interesting to
+know: with the SRP implementation in SCST the maximum read throughput
+is 1290 MB/s on the same setup.
+
+Measured by Bart Van Assche
+
diff --git a/www/comparison.html b/www/comparison.html
index dc6317765..5e7ac1109 100644
--- a/www/comparison.html
+++ b/www/comparison.html
@@ -66,9 +66,7 @@
REMARKS:
-1. STGT has SCSI target engine with memory management in user space and small hooks in the kernel to interact with in-kernel target drivers.
+1. STGT has its SCSI target engine and memory management in user space, with small hooks in the kernel to interact with in-kernel target drivers.
+   As a direct consequence, a fully user space STGT target (e.g. iSCSI) can run without any kernel modules.
2. All iSCSI management implemented in user space and actual data transfers in kernel space without user space involved.
@@ -280,7 +290,8 @@ reinstatement 15
4. The result "in average" is listed. One target can be better somewhere, another one somewhere else. Although manual tuning of target and
-   system parameters tends the restore the difference listed in the comparison.
+   system parameters tends to restore the difference listed in the comparison. You can find example measurements here,
+   here and here.
5. All SCST and its drivers' kernel patches supposed to be applied and SCST with the drivers built in the release or performance build. Without the kernel patches SCST performance will be at "****+" level, except for the case, when user space backstorage handler used
@@ -290,14 +301,17 @@ reinstatement 15
-7. Some zero-copy functionality isn't available from user space. For instance, zero-copy send to a socket.
+7. Some zero-copy functionality isn't available from user space, sometimes fundamentally.
+   For instance, zero-copy FILEIO with page cache or zero-copy send to a socket. Also STGT can't use splice() for in-kernel
+   target drivers, because it has memory management in user space. To use splice() with socket-based user space target drivers,
+   STGT would need a deep redesign of the internal interactions between target drivers, the core and backend handlers.
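As an illustration of the kind of zero-copy socket send that is only possible in kernel space, here is a minimal, hypothetical sketch; the helper name and its arguments are invented and it is not taken from iSCSI-SCST:

#include <linux/mm.h>
#include <linux/net.h>
#include <linux/socket.h>

/*
 * Hypothetical sketch, not iSCSI-SCST code: hand the pages of a data buffer
 * directly to the network stack with kernel_sendpage(), so the payload is
 * never copied into an intermediate buffer.  A user space target has to go
 * through send()/write(), which copies the data at least once.
 */
static int example_zero_copy_send(struct socket *sock, struct page **pages,
                                  int nr_pages, size_t last_page_len)
{
        int i;

        for (i = 0; i < nr_pages; i++) {
                size_t len = (i == nr_pages - 1) ? last_page_len : PAGE_SIZE;
                int flags = (i == nr_pages - 1) ? 0 : MSG_MORE;
                int res = kernel_sendpage(sock, pages[i], 0, len, flags);

                if (res < 0)
                        return res; /* error; a real driver would also handle short sends */
        }
        return 0;
}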
8. "Local access to emulated backstorage devices" means that you can access emulated by a SCSI target devices - locally on the target host. For instance, you can mount your ISO image as a SCSI CDROM device locally on the - target host.
+   locally on the target host. For instance, you can mount your ISO image from a CDROM device emulated by the target
+   locally on the target host.
-9. "Advanced devices visibility management" means that different initiators can see different sets
-   of devices from the same target. This feature is essential with hardware targets, which don't have a possibility
9. "Advanced devices access control" means that different initiators can see different sets + of devices from the same target. This feature is essential for hardware targets, which don't have ability to create virtual targets.
10. Not well tested, because at the moment there is no backend using this functionality.
diff --git a/www/comparison_history.html b/www/comparison_history.html
index efd05f1e9..cb7233653 100644
--- a/www/comparison_history.html
+++ b/www/comparison_history.html
@@ -52,9 +52,14 @@
blocked my access to the LIO mailing list, preventing me from tell that to the interested people myself. After all our previous discussions and with his skills and experience it's nearly impossible to believe that Nicholas Bellinger didn't know that SCST is a lot more generic than LIO and has zero-copy
-in all the same places, where LIO has. So, that comparison page looked
-like rather a deliberate cheating attempt and for SCST there was no other way, except to setup own, correct comparison.
-This comparison table turned out to be very useful, so it was extended to cover all the SCSI target areas.
+in all the same places where LIO has. Thus, that comparison page looked rather like a deliberate cheating
+attempt. It seems SCST is so much superior to LIO that Nicholas Bellinger gave up technical discussions and started
+attacking people's perception of SCST, trying to convince them of the
+opposite.
+
+So, for SCST there was no other way except to set up its own,
+correct comparison. This comparison table turned out to be very useful,
+so it was extended to cover all the SCSI target areas.
diff --git a/www/contributing.html b/www/contributing.html
index 9367cb3e2..c69e9c0c6 100644
--- a/www/contributing.html
+++ b/www/contributing.html
@@ -45,9 +45,9 @@
better support and troubleshooting for you.
-At the moment, SCST in FILEIO mode uses standard Linux read() and write() syscalls paths, which copy data from the page cache to the supplied buffer and back. Zero-copy FILEIO
@@ -83,7 +83,7 @@
That's all. For WRITEs the current code path would remain unchanged.
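As a rough, hypothetical sketch of the direction described above (none of this is actual SCST code; the function and parameter names are invented): instead of copying, the pages already present in the page cache could be looked up and referenced directly, then placed into the command's SG vector.

#include <linux/fs.h>
#include <linux/mm.h>
#include <linux/pagemap.h>

/*
 * Hypothetical sketch, not SCST code: look up the page cache pages backing a
 * READ request and take a reference on each of them, so they can be put into
 * the command's SG vector instead of being copied into a separate buffer.
 * The references must be dropped with put_page() once the data has been sent
 * to the initiator.
 */
static int example_get_cached_pages(struct file *filp, loff_t offset,
                                    int nr_pages, struct page **pages)
{
        struct address_space *mapping = filp->f_mapping;
        pgoff_t index = offset >> PAGE_SHIFT;
        int i;

        for (i = 0; i < nr_pages; i++) {
                struct page *page = find_get_page(mapping, index + i);

                if (page == NULL)
                        return i;       /* not cached: fall back to the copying path */
                pages[i] = page;        /* reference held until the transfer completes */
        }
        return nr_pages;
}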
-Implementation should be similar to zero-copy FILEIO for READ commands and should be done after it. All incoming data should be inserted in the page cache, then dereferenced in
@@ -91,7 +91,7 @@
page cache, namely, locking issues related to it. They should be carefully investigated.
-Support for PERSISTENT RESERVE IN and PERSISTENT RESERVE OUT is required to work in many cluster environments, e.g. Windows 2003 Cluster.
@@ -107,7 +107,7 @@
devices only and reject PERSISTENT RESERVE IN and OUT commands for pass-through devices with "COMMAND NOT SUPPORTED" sense data.
-At the moment, if security name for an initiator reassigned (moved) to another security group, the existing sessions from that initiator are not automatically reassigned to
@@ -129,7 +129,7 @@
At the moment, if an initiator or several initiators simultaneously send to target too many commands, especially in seek intensive workloads, target can get
@@ -279,7 +279,7 @@
Then, at the latest stage of the development, logic to not schedule the flow control work on idle devices should be added.
-At the moment, scst_vdisk handler doesn't support O_DIRECT option and possibility to set it was disabled. This limitation caused by Linux kernel expectation that memory supplied to
@@ -291,7 +291,7 @@
by pages, taken directly from dio->curr_user_address. Each such page should be referenced by page_cache_get(). That's all.
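For illustration only, a hypothetical fragment of what that referencing could look like (names invented, not the actual scst_vdisk code):

#include <linux/mm.h>
#include <linux/pagemap.h>

/*
 * Hypothetical sketch, not scst_vdisk code: take an extra reference on each
 * page of the vector obtained from the direct-IO path, so the pages stay
 * pinned while the command uses them.  Matching page_cache_release()/put_page()
 * calls must be made when the command completes.
 */
static void example_pin_pages(struct page **pages, int nr_pages)
{
        int i;

        for (i = 0; i < nr_pages; i++)
                page_cache_get(pages[i]);       /* equivalent to get_page() */
}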
-At the moment, in scst_vdisk handler command execution function vdisk_do_job() is overcomplicated and not very performance effective. It would be good to replace all those
@@ -309,7 +309,7 @@
return vdisk_exec_fns[cmd->cdb[0]](cmd);
}
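To make the dispatch-table idea concrete, here is a rough, hypothetical sketch (handler names and bodies are invented; only the vdisk_exec_fns lookup mirrors the fragment quoted above):

/*
 * Hypothetical sketch of the opcode dispatch table idea, not the real
 * scst_vdisk code: one handler per SCSI opcode, selected by a single table
 * lookup instead of a large switch/if chain in vdisk_do_job().
 */
typedef int (*vdisk_exec_fn)(struct scst_cmd *cmd);

static int vdisk_exec_read(struct scst_cmd *cmd)          { /* ... */ return 0; }
static int vdisk_exec_write(struct scst_cmd *cmd)         { /* ... */ return 0; }
static int vdisk_exec_not_supported(struct scst_cmd *cmd) { /* ... */ return 0; }

static vdisk_exec_fn vdisk_exec_fns[256];

static void vdisk_init_exec_fns(void)
{
        int i;

        for (i = 0; i < 256; i++)
                vdisk_exec_fns[i] = vdisk_exec_not_supported;
        vdisk_exec_fns[0x28] = vdisk_exec_read;         /* READ(10) */
        vdisk_exec_fns[0x2a] = vdisk_exec_write;        /* WRITE(10) */
        /* ... remaining supported opcodes ... */
}

static int vdisk_do_job(struct scst_cmd *cmd)
{
        return vdisk_exec_fns[cmd->cdb[0]](cmd);
}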
-In the pass-through mode (i.e. using the pass-through device handlers scst_disk, scst_tape, etc) SCSI commands, coming from remote initiators,
@@ -323,6 +323,98 @@
In sgv_big_order_alloc.diff you can find a possible way to solve this issue.
+
+In some cases a target driver might need to register memory used for data buffers in the
+hardware. At the moment, none of the SCST target drivers, including the InfiniBand SRP target driver,
+needs that feature. But if in the future there is a need for such a feature, it can easily be
+added by extending the SCST SGV cache. The SCST SGV cache is the memory management
+subsystem in SCST. It doesn't free each data buffer that is no longer used back to the
+system, but keeps it for a while to let it be reused by the
+next consecutive command, to reduce command processing latency and, hence, improve performance.
+
+To support memory buffer registration, it can be extended in the following way (a rough, hypothetical sketch follows the list below):
+
+1. Struct scst_tgt_template would be extended to have 2 new callbacks:
+
+2. SCST core would be extended to have 4 new functions:
+
+3. The workflow would be the following:
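The actual callbacks, functions and workflow are not spelled out in this excerpt. Purely to illustrate the idea, a hypothetical sketch of the two callbacks could look like this (all names are invented and this is not the real SCST API):

/*
 * Hypothetical sketch only -- these callbacks and their names are invented
 * for illustration.  The idea is that the SGV cache calls back into the
 * target driver when it allocates a new data buffer (so the driver can
 * register the memory with its hardware, e.g. an RDMA HCA) and when it
 * finally frees that buffer (so the driver can unregister it).  Because the
 * SGV cache keeps buffers cached between commands, the relatively expensive
 * registration happens once per buffer, not once per command.
 */
struct example_tgt_template_ext {
        /* called right after the SGV cache allocates a new SG vector */
        int (*alloc_data_buf_done)(struct scst_tgt *tgt,
                                   struct scatterlist *sg, int sg_cnt);

        /* called right before the SGV cache returns the SG vector to the system */
        void (*free_data_buf)(struct scst_tgt *tgt,
                              struct scatterlist *sg, int sg_cnt);
};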
+
+SCST might also be used with non-SCSI speaking transports, like NBD or AoE. Such cooperation
+would allow them to use an SCST-emulated backend.
+
+For user space targets this is trivial: they should simply use SCST-emulated devices locally
+via the scst_local module.
+
+For in-kernel non-SCSI target drivers it's a bit more complicated. They should implement a small layer
+which would translate their internal READ/WRITE requests to the corresponding SCSI commands and, on the
+way back, SCSI status and sense codes to their internal status codes.
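A hypothetical sketch of what such a translation layer would do for a READ request (not taken from any existing driver; the names are invented, only the READ(10) CDB layout is standard SCSI):

#include <linux/string.h>
#include <linux/types.h>

/*
 * Hypothetical sketch: build a SCSI READ(10) CDB for an internal block-level
 * READ request, as a translation layer for an NBD- or AoE-like transport
 * would have to do.  'lba' is the starting logical block address and
 * 'nr_blocks' the transfer length in blocks; both are assumed to fit the
 * READ(10) fields.
 */
static void example_build_read10_cdb(u8 cdb[10], u32 lba, u16 nr_blocks)
{
        memset(cdb, 0, 10);
        cdb[0] = 0x28;                          /* READ(10) opcode */
        cdb[2] = (lba >> 24) & 0xff;            /* LBA, big-endian */
        cdb[3] = (lba >> 16) & 0xff;
        cdb[4] = (lba >> 8) & 0xff;
        cdb[5] = lba & 0xff;
        cdb[7] = (nr_blocks >> 8) & 0xff;       /* transfer length, big-endian */
        cdb[8] = nr_blocks & 0xff;
}

On the completion path the layer would map GOOD or CHECK CONDITION plus the sense data back to the transport's native status codes.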
diff --git a/www/scst_pg.html b/www/scst_pg.html
index 48ff20d18..0865e98a4 100644
--- a/www/scst_pg.html
+++ b/www/scst_pg.html
@@ -334,7 +334,7 @@
All outstanding commands will be finished regularly. After scst_unregister_session() returned no new commands must be sent to SCST via scst_rx_cmd(). Also, the caller must ensure that no
-scst_rx_cmd() or scst_rx_mgmt_fn_*() is called in paralell with
+scst_rx_cmd() or scst_rx_mgmt_fn_*() is called in parallel with
scst_unregister_session().
Function scst_unregister_session() can be called before result_fn() of scst_register_session() called, i.e. during the session registration/initialization.
@@ -582,8 +582,8 @@
These functions allows to add or delete some SCST threads. For example, if exec() function in
-your device handler works synchronously, i.e. wait for job's completition, in order to prevent performance loss you
+your device handler works synchronously, i.e. waits for the job's completion, in order to prevent performance loss you
can add for SCST as many threads as there are devices serviced by your device handler.
Function scst_add_threads() starts requested number of threads. It is defined as the following:
int scst_add_threads(
diff --git a/www/tomasz_res.txt b/www/tomasz_res.txt
new file mode 100644
index 000000000..ad90007dc
--- /dev/null
+++ b/www/tomasz_res.txt
@@ -0,0 +1,52 @@
+The target is running Debian Lenny 64-bit userspace on an Intel Celeron 2.93GHz CPU, 2 GB RAM.
+
+The initiator is running Debian Etch 64-bit userspace, open-iscsi 2.0-869, on an Intel Xeon 3050/2.13GHz, 8 GB RAM.
+
+
+Each test was repeated 6 times; "sync" was run and caches were dropped on both sides before each test was started.
+
+dd parameters were as below, so 6.6 GB of data was read each time:
+
+dd if=/dev/sdag of=/dev/null bs=64k count=100000
+
+
+Data was read from two block devices:
+- /dev/md0, which is RAID-1 on two ST31500341AS 1.5 TB drives
+- encrypted dm-crypt device which is on top of /dev/md0
+
+The encrypted device was created with the following additional options passed to cryptsetup
+(it provides the best performance on systems where the CPU is a bottleneck, but with decreased
+security compared to the default options):
+
+-c aes-ecb-plain -s 128
+
+
+Generally, the CPU on the target was the bottleneck, so I also measured the load on the target.
+
+
+md0, crypt columns - averages from dd
+us, sy, id, wa - averages from vmstat
+
+
+1. Disk speeds on the target
+
+Raw performance: 102.17 MB/s
+Raw performance (encrypted): 50.21 MB/s
+
+
+2. Read-ahead on the initiator: 256 (default); md0, crypt - MB/s
+
+ md0 us sy id wa | crypt us sy id wa
+STGT 50.63 4% 45% 18% 33% | 32.52 3% 62% 16% 19%
+SCST (debug + no patches) 43.75 0% 26% 30% 44% | 42.05 0% 84% 1% 15%
+SCST (fullperf + patches) 45.18 0% 25% 33% 42% | 44.12 0% 81% 2% 17%
+
+
+3. Read-ahead on the initiator: 16384; md0, crypt - MB/s
+
+ md0 us sy id wa | crypt us sy id wa
+STGT 56.43 3% 55% 2% 40% | 46.90 3% 90% 3% 4%
+SCST (debug + no patches) 73.85 0% 58% 1% 41% | 42.70 0% 85% 0% 15%
+SCST (fullperf + patches) 76.27 0% 63% 1% 36% | 42.52 0% 85% 0% 15%
+
+Measured by Tomasz Chmielewski
diff --git a/www/vl_res.txt b/www/vl_res.txt
new file mode 100644
index 000000000..cb35758f5
--- /dev/null
+++ b/www/vl_res.txt
@@ -0,0 +1,220 @@
+Setup:
+
+Target: HT 2.4GHz Xeon, x86_32, 2GB of memory limited to 256MB by the kernel
+command line to reduce the test data footprint, 75GB 15K RPM SCSI disk as
+backstorage, dual port 1Gbps E1000 Intel network card, 2.6.29 kernel.
+
+Initiator: 1.7GHz Xeon, x86_32, 1GB of memory limited to 256MB by the kernel
+command line to reduce the test data footprint, dual port 1Gbps E1000
+Intel network card, 2.6.27 kernel, open-iscsi 2.0-870-rc3.
+
+The target exported a 5GB file on XFS for FILEIO and a 5GB partition for
+BLOCKIO.
+
+All the tests were run 3 times and the average written down. All the values are
+in MB/s. The tests were run with the CFQ and deadline IO schedulers on the
+target. All other parameters on both target and initiator were default.
+
+==================================================================
+
+I. SEQUENTIAL ACCESS OVER SINGLE LINE
+
+1. # dd if=/dev/sdX of=/dev/null bs=512K count=2000
+
+ ISCSI-SCST IET STGT
+NULLIO: 106 105 103
+FILEIO/CFQ: 82 57 55
+FILEIO/deadline 69 69 67
+BLOCKIO/CFQ 81 28 -
+BLOCKIO/deadline 80 66 -
+
+------------------------------------------------------------------
+
+2. # dd if=/dev/zero of=/dev/sdX bs=512K count=2000
+
+I didn't do other write tests, because I have data on those devices.
+
+ ISCSI-SCST IET STGT
+NULLIO: 114 114 114
+
+------------------------------------------------------------------
+
+3. /dev/sdX formatted in ext3 and mounted in /mnt on the initiator. Then
+
+# dd if=/mnt/q of=/dev/null bs=512K count=2000
+
+was run (/mnt/q had been created earlier by the next test)
+
+ ISCSI-SCST IET STGT
+FILEIO/CFQ: 94 66 46
+FILEIO/deadline 74 74 72
+BLOCKIO/CFQ 95 35 -
+BLOCKIO/deadline 94 95 -
+
+------------------------------------------------------------------
+
+4. /dev/sdX formatted in ext3 and mounted in /mnt on the initiator. Then
+
+# dd if=/dev/zero of=/mnt/q bs=512K count=2000
+
+was run (this is the test that created /mnt/q)
+
+ ISCSI-SCST IET STGT
+FILEIO/CFQ: 97 91 88
+FILEIO/deadline 98 96 90
+BLOCKIO/CFQ 112 110 -
+BLOCKIO/deadline 112 110 -
+
+------------------------------------------------------------------
+
+Conclusions:
+
+1. ISCSI-SCST FILEIO on buffered READs is 27% faster than IET (94 vs
+74). With CFQ the difference is 42% (94 vs 66).
+
+2. ISCSI-SCST FILEIO on buffered READs is 30% faster than STGT (94 vs
+72). With CFQ the difference is 104% (94 vs 46).
+
+3. ISCSI-SCST BLOCKIO on buffered READs has about the same performance
+as IET, but with CFQ it's 170% faster (95 vs 35).
+
+4. Buffered WRITEs are not so interesting, because they are asynchronous with
+many outstanding commands at a time, hence latency insensitive, but even
+here ISCSI-SCST is always a bit faster than IET.
+
+5. STGT is always the worst, sometimes considerably.
+
+6. BLOCKIO on buffered WRITEs is consistently faster than FILEIO, so,
+definitely, there is room for future improvement here.
+
+7. For some reason, access through a file system is considerably better than
+access to the same device directly.
+
+==================================================================
+
+II. Mostly random "realistic" access.
+
+For this test I used the io_trash utility. This utility emulates DB-like
+access. For more details see http://lkml.org/lkml/2008/11/17/444. To
+show the value of target-side caching, in this test the target was run with its full
+2GB of memory. I ran io_trash with the following parameters: "2 2 ./
+500000000 50000000 10 4096 4096 300000 10 90 0 10". Total execution
+time was measured.
+
+ ISCSI-SCST IET STGT
+FILEIO/CFQ: 4m45s 5m 5m17s
+FILEIO/deadline 5m20s 5m22s 5m35s
+BLOCKIO/CFQ 23m3s 23m5s -
+BLOCKIO/deadline 23m15s 23m25s -
+
+Conclusions:
+
+1. FILEIO is 500% (five times!) faster than BLOCKIO
+
+2. STGT is, as usual, always the worst
+
+3. Deadline is always a bit slower
+
+==================================================================
+
+III. SEQUENTIAL ACCESS OVER MPIO
+
+Unfortunately, my dual port network card isn't capable of simultaneous
+data transfers, so I had to do some "modeling" and put my network
+devices in 100Mbps mode. To make this model more realistic I also used
+my old IDE 5200RPM hard drive, capable of producing 35MB/s
+throughput locally. So I modeled the case of double 1Gbps links with 350MB/s
+backstorage, provided all the following conditions are satisfied:
+
+ - Both links are capable of simultaneous data transfers
+
+ - There is a sufficient amount of CPU power on both the initiator and the target
+to cover the requirements of the data transfers.
+
+All the tests were done with iSCSI-SCST only.
+
+1. # dd if=/dev/sdX of=/dev/null bs=512K count=2000
+
+NULLIO: 23
+FILEIO/CFQ: 20
+FILEIO/deadline 20
+BLOCKIO/CFQ 20
+BLOCKIO/deadline 17
+
+Single line NULLIO is 12.
+
+So, there is a 67% improvement using 2 lines. With 1Gbps it should be
+the equivalent of 200MB/s. Not too bad.
+
+==================================================================
+
+Connections to the target were made with the following iSCSI parameters:
+
+# iscsi-scst-adm --op show --tid=1 --sid=0x10000013d0200
+InitialR2T=No
+ImmediateData=Yes
+MaxConnections=1
+MaxRecvDataSegmentLength=2097152
+MaxXmitDataSegmentLength=131072
+MaxBurstLength=2097152
+FirstBurstLength=262144
+DefaultTime2Wait=2
+DefaultTime2Retain=0
+MaxOutstandingR2T=1
+DataPDUInOrder=Yes
+DataSequenceInOrder=Yes
+ErrorRecoveryLevel=0
+HeaderDigest=None
+DataDigest=None
+OFMarker=No
+IFMarker=No
+OFMarkInt=Reject
+IFMarkInt=Reject
+
+# ietadm --op show --tid=1 --sid=0x10000013d0200
+InitialR2T=No
+ImmediateData=Yes
+MaxConnections=1
+MaxRecvDataSegmentLength=262144
+MaxXmitDataSegmentLength=131072
+MaxBurstLength=2097152
+FirstBurstLength=262144
+DefaultTime2Wait=2
+DefaultTime2Retain=20
+MaxOutstandingR2T=1
+DataPDUInOrder=Yes
+DataSequenceInOrder=Yes
+ErrorRecoveryLevel=0
+HeaderDigest=None
+DataDigest=None
+OFMarker=No
+IFMarker=No
+OFMarkInt=Reject
+IFMarkInt=Reject
+
+# tgtadm --op show --mode session --tid 1 --sid 1
+MaxRecvDataSegmentLength=2097152
+MaxXmitDataSegmentLength=131072
+HeaderDigest=None
+DataDigest=None
+InitialR2T=No
+MaxOutstandingR2T=1
+ImmediateData=Yes
+FirstBurstLength=262144
+MaxBurstLength=2097152
+DataPDUInOrder=Yes
+DataSequenceInOrder=Yes
+ErrorRecoveryLevel=0
+IFMarker=No
+OFMarker=No
+DefaultTime2Wait=2
+DefaultTime2Retain=0
+OFMarkInt=Reject
+IFMarkInt=Reject
+MaxConnections=1
+RDMAExtensions=No
+TargetRecvDataSegmentLength=262144
+InitiatorRecvDataSegmentLength=262144
+MaxOutstandingUnexpectedPDUs=0
+
+Measured by Vladislav Bolkhovitin