diff --git a/www/Gentoo-HOWTO.pdf b/www/Gentoo-HOWTO.pdf new file mode 100644 index 000000000..260f6b7c7 Binary files /dev/null and b/www/Gentoo-HOWTO.pdf differ diff --git a/www/bart_res.txt b/www/bart_res.txt new file mode 100644 index 000000000..9c87726a8 --- /dev/null +++ b/www/bart_res.txt @@ -0,0 +1,39 @@ +Setup: + +Target: 2.6.29 kernel, 64 bit, Intel E8400 CPU @ 3.00GHz, 4 GB RAM, SCST trunk + revision 727 (which is close to the 1.0.1 release). A file of 1 GB residing + on a tmpfs filesystem has been exported via SCST. + +Initiator: 2.6.29 kernel, 64 bit, Intel E6750 CPU @ 2.66 GHz, 2 GB RAM, + openSUSE 11.0 userspace. + +Network: two MHGH28-XTC (MT26418) ConnectX InfiniBand HCAs connected back to + back, which are DDR PCIe 1.0 HCAs. The IPoIB stack was configured with the + default MTU of 2044 bytes on both interfaces and was using datagram mode. + ib_read_bw reported a throughput of 1394 MB/s for this network, and netperf + reported a TCP/IP throughput of 1200 MB/s (with default parameters). + +Results: + +Buffered I/O, block size of 512K (dd if=/dev/sdb of=/dev/null bs=512K): + +write-test: iSCSI-SCST 243 MB/s; IET 192 MB/s. +read-test: iSCSI-SCST 291 MB/s; IET 223 MB/s. + +Buffered I/O, block size of 4 KB (dd if=/dev/sdb of=/dev/null bs=4K): + +write-test: iSCSI-SCST 43 MB/s; IET 42 MB/s. +read-test: iSCSI-SCST 288 MB/s; IET 221 MB/s. + +In other words: depending on the test scenario, SCST transfers data between 2% and +30% faster via the iSCSI protocol over this network than IET does. + +Note: at least for the tests with a block size of 4 KB, the initiator +system was the bottleneck, not the target system. + +Something that is not relevant for this comparison, but interesting to +know: with the SRP implementation in SCST the maximum read throughput +is 1290 MB/s on the same setup.
+ +Measured by Bart Van Assche + diff --git a/www/comparison.html b/www/comparison.html new file mode 100644 index 000000000..1e6a078ab --- /dev/null +++ b/www/comparison.html @@ -0,0 +1,566 @@ + + + + + + + + +SCSI Targets Comparison + + + + +
+ + + + + +
+ + +
+

Features comparison between Linux SCSI targets

+ +

This feature comparison is intended to be a complete and fair feature-by-feature comparison between the listed targets, without any bias toward SCST. If you see anything wrong or anything missing, you are welcome to report it on the scst-devel mailing list and it will be corrected.

+ +

Sebastian Riemer also wrote a good summary in his e-mail (April 2013).

+ +

As of June 2011; briefly reviewed in April 2013.

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+SCST +STGT +IET +LIO/TCM +
General +
Upstream kernel - - - Since 2.6.38
Generic Target Engine + + iSCSI only +
Architecture Kernel only User space only Split + 1 + Kernel only
Stability + + + Probably
Performance 2 ***** + 3 + *** **** ****-
Zero-copy passing data between target and +backend drivers + + 4 + + + 5 + + +
Support for transports that don't supply expected transfer values (Wide (parallel) SCSI, SAS) + - - -
Interface with user space SysFS (or obsolete + ProcFS) Custom - ConfigFS/IOCTL/ProcFS
Features +
Target drivers in kernel space + - - +
Target drivers in user space Via scst_local (e.g. + using STGT + pass-through) + - Via tcm_loop (e.g. + using STGT pass-through)
Backstorage handlers in kernel space + - - +
Backstorage handlers in user space + + - -
Advanced devices access control +7 + - - +
Automatic sessions reassignment (changes in the +access control immediately seen by initiators) + - - -
Support for Asynchronous Event Notifications +(AEN) + - - -
Notifications for devices added/removed or +resized through AENs or Unit Attentions (initiators can instantly see +any target reconfiguration in a PnP-like manner) + - - -
Bidirectional Commands + + - +
Extended CDB (size >16 bytes) + + - +
Descriptor sense support + + - -
RESERVE/RELEASE +(Windows 2003 clustering) + + + +
Safe RESERVE/RELEASE implementation according to +SCSI requirements 9 Safe Safe Safe from + v1.4.18 Not safe
Safe implementation of Task Management commands +10 Safe Not safe Not safe LUN RESET - safe. + Other TM commands not + implemented.
Support for SCSI task attributes, including +ORDERED commands + + -,
data + corruption possible + + 11
-,
data + corruption possible + + 11
Persistent (SCSI-3) Reservations +(Windows 2008 clustering / RHEL5 I/O fencing) + +
(not all + functionality + implemented)
- +
Durable, i.e. transactional, save of Persistent +Through Power Loss Persistent Reservation data Durable Not supported - Not durable
ALUA +/- (Implicit + only) + 19 + - - +/- + 19
Failover Clustering + + + +
Different threading models to choose the best +performing + - - -
CPU affinity control + - - +
I/O context grouping between I/O threads (big +performance win with CFQ) + - + -
Per-initiator I/O context grouping (big +performance and fairness win if several initiators access the same +virtual or backend device on the target) + - - -
Protection against commands with +wrong transfer size or transfer direction (may lead to crash or +hard lockup of the target) + - - -
Protection against crashing the target by making it allocate too much memory for buffers and go into an OOM state + + - - -
Caching of allocated buffers + - - -
Latencies measurement facility + - - -
Configuration tool with the ability to automatically apply config file changes on the fly without any restarts scstadmin - - rtsadmin?
SCSI MIBs - - - +- + 12
Supported transports and hardware +
iSCSI + + + +
QLogic (Fibre Channel and FCoE) + - - +
Emulex (Fibre Channel and FCoE) + - - +
SRP + - - Preliminary
iSER + + - +
Marvell (SAS) Beta - - -
FCoE + Under + development - Alpha
LSI (Parallel (Wide) SCSI and Fibre Channel) + Alpha - - -
LSI (SAS) Preliminary + (not completed) - - -
IBM pSeries Virtual SCSI + + - Preliminary
Local access to emulated backstorage devices +6 scst_local - - tcm_loop
Supported backstorage +
Kernel side FILEIO + - + +
Kernel side BLOCKIO + - + +
User space FILEIO + + - -
O_DIRECT FILEIO fileio_tgt + - -
Async FILEIO - + - -
Native RAMDISK - - - +
SCSI pass-through + 13 + Single + initiator only, not + enforced + 14 + - Single initiator only, not enforced, + limited functionality for tapes + 14
Zero-copy data read/write to/from backstorage + BLOCKIO, user space + FILEIO in O_DIRECT mode, + pass-through + 15 + - + 5 + BLOCKIO BLOCKIO, pass- + through
Cache safe8 +FILEIO Safe Safe only RDWR + backend Safe Safe
Cache safe8 +BLOCKIO Safe - Not safe Safe
4k sectors support in pass-through mode + - - ?
4k, 2k, 1k and 512 byte sectors emulation +in modes, other than pass-through + + - +
Virtual CD devices emulation from ISO files + + + - -
Possibility to write to CD devices emulated from ISO files - + - -
Emulation of virtual tape and media changer +devices (VTL) - Experimental - -
Thin provisioning support +
? - +
iSCSI Target +
Architecture Split + 1 + User space + only Split + 1 + Kernel only
Interface with user space SysFS (or obsolete + ProcFS)/ + IOCTL/Netlink - IOCTL/ProcFS/ + Netlink ConfigFS/IOCTL/ProcFS
Zero-copy data send/receive Send only + 16 + In some cases, + send only + 5 + Send only Send only
Multiple connections per +session (MS/C) - - + +
Max ErrorRecoveryLevel 0 0 0 2
Support for limiting number of initiators +allowed to connect to a target + - + -
Per-portal targets visibility control + - + -
Per-initiators targets visibility control + + + -
Support for AHS + + - -
Support for iSCSI redirects + + + -
Bidirectional Commands + + - -
Extended CDB (size >16 bytes) + + - -
Support for AENs (initiators can instantly see any +target reconfiguration in a PnP-like manner) + - - -
Support for iSNS + + + -
Safe implementation of Task Management commands +10 Safe Not safe Not safe ABORT TASK - not safe, + LUN RESET - safe, + other TM commands not + implemented.
Safe implementation of connections and sessions +reinstatement 17 Safe Not safe Not safe Not safe
Usage of hardware instructions for digest +calculations, if available + - - +
Each connection multithreaded digest calculation + + - - -
Safe restart 18 + Safe ? Not safe before + v1.4.18. After - + probably safe. ?
iSCSI MIBs - - - +- + 12
Local access target +
Bidirectional support + - - +
Support for AENs (initiators can instantly see any +target reconfiguration in a PnP-like manner) + - - -
+ +
+

REMARKS:

+ +

1. All iSCSI management is implemented in user space; the actual data transfers happen in kernel space without user space involvement.

+ +

2. The result "on average" is listed. One target can be better in some areas, another in others, although manual tuning of target and system parameters tends to restore the difference listed in the comparison. You can find example measurements here, here and here.

+ +

3. This assumes that all SCST and driver kernel patches are applied and that SCST and its drivers are built in the release or performance build. Without the kernel patches, SCST performance will be at the "****+" level, except when a user space backstorage handler is used with the iSCSI-SCST target driver, where performance will be at the "***+" level.

+ +

4. In SCST, data is always passed between target and backend drivers in a zero-copy manner without any additional kernel patches, except when local access (scst_local) is used with a user space backend.

+ +

5. Some zero-copy functionality isn't available from user space, sometimes fundamentally. + For instance, zero-copy FILEIO with page cache or zero-copy send to a socket. Also STGT can't use splice() for in-kernel + target drivers, because it has memory management in user space. To use splice() with socket-based user space target drivers + STGT would need a deep redesign of internal interactions between target drivers, core and backend handlers. But in + some cases STGT can use zero-copy sendfile().

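The zero-copy sendfile() path mentioned in remark 5 can be illustrated from user space. The following is a minimal sketch (not STGT's actual code; all file names are made up) that copies a file with sendfile(2), so the kernel moves the data straight out of the page cache without a user-space bounce buffer. Output to a regular file is Linux-specific behavior:

```python
import os
import tempfile

def zero_copy_copy(src_path, dst_path):
    """Copy src to dst via sendfile(2): data never passes through a
    user-space buffer.  On Linux, a regular file is accepted as the
    output since 2.6.33."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        size = os.fstat(src.fileno()).st_size
        offset = 0
        while offset < size:
            # os.sendfile returns the number of bytes actually sent
            sent = os.sendfile(dst.fileno(), src.fileno(), offset, size - offset)
            if sent == 0:
                break
            offset += sent
    return offset

# Hypothetical demo with throwaway files
with tempfile.TemporaryDirectory() as d:
    src = os.path.join(d, "src.bin")
    dst = os.path.join(d, "dst.bin")
    with open(src, "wb") as f:
        f.write(b"SCST" * 4096)
    copied = zero_copy_copy(src, dst)
```

As the remark notes, this only covers the send direction: there is no equivalently simple zero-copy receive into the page cache from user space.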
+ +

6. "Local access to emulated backstorage devices" means that you can access devices emulated by a SCSI target locally on the target host. For instance, you can locally mount an ISO image from a CDROM device emulated by the target.

+ +

7. "Advanced devices access control" means that different initiators can see different sets of devices from the same target. This feature is required for hardware targets, which don't have the ability to create virtual targets.

+ +

8. "Cache safe" means that cache synchronization commands (SYNCHRONIZE_CACHE and the FUA attribute) from initiators do what they are expected to do, i.e. push all the requested blocks from all caches, including the devices' caches, to non-volatile media.

+ +

9. SCSI requires that if an initiator clears a reservation held by another initiator, the reservation holder must be notified about the reservation clearance. Otherwise, several initiators can simultaneously change data supposed to be protected by the reservation, which can corrupt it. This is what someone working for VMware called "Russian roulette with your data" on the VMware community forum. Of course, it can affect not only VMware but any other cluster implementation relying on this functionality.

+ +

10. After a task management command has completed, and before the corresponding response is sent to the initiator that issued it, all the affected SCSI commands must reach a state where they cannot affect commands that this initiator sends after the task management response. This is the safe implementation. The unsafe implementation only marks all the affected SCSI commands as aborted and then immediately sends the task management response to the initiator. This only guarantees that the initiator will never receive responses for those commands; it does not guarantee that none of them will be executed by the backstorage *AFTER* a SCSI command the initiator sends once it has received the task management response, thinking that all the aborted commands were actually fully aborted. This can lead to data corruption.

+ +

11. Both IET and LIO report support for the full task management model in their INQUIRY responses, but they process ORDERED commands the same way as SIMPLE commands, i.e. allow them to be freely reordered before execution. That violates the SCSI standard and can lead to data corruption for any application relying on the command ordering provided by the ORDERED attribute.

+ +

12. LIO exports the information needed for an RFC 4455 implementation, but requires an additional module implementing RFC 4455. At the moment there is no open source implementation of such a module.

+ +

13. SCSI pass-through mode allows you to export a local SCSI-capable device. For instance, with it you can share your parallel SCSI tape or SATA DVD-RW drive on your iSCSI network.

+ +

14. STGT and LIO don't emulate all the SCSI host functionality necessary to share SCSI devices in pass-through mode with several initiators (LIO has some of the necessary processing, but not all). They can only pass SCSI commands from initiators to SCSI devices and responses back. This is safe only with a single initiator. This limitation isn't enforced in any way, and neither STGT nor LIO issues any warning about it, so users will not be notified of the limitation and can quietly corrupt their data. You can find more technical information about it here. Also, LIO in pass-through mode doesn't do the sense processing necessary for tape devices to correctly return residual information, so tapes can be used with it only with limited functionality.

+ +

15. You can find a proposal for how to implement zero-copy FILEIO in SCST on the Contributing page.

+ +

16. Doesn't need any kernel patch, except when used with a user space backend.

+ +

17. Connection and session reinstatement is basically a kind of task management command, because it implies aborting commands. So, similarly to the safe task management above, a safe implementation of connection and session reinstatement must not accept SCSI commands from the new connection/session until all the SCSI commands in the connection/session being reinstated reach a state where they can't affect the new commands.

+ +

18. "Safe restart" means that after the iSCSI target restarts, all the connected initiators will seamlessly restore all connections that existed before the restart. "Not safe" means that the connected initiators will most likely fail to restore existing connections and see errors. However, your iSCSI initiator also needs to handle the safe restart correctly. For instance, old (pre-CentOS/RHEL 5) open-iscsi has problems in this area, but the latest versions handle it well.

+ +

19. Generic implementation, i.e. not coupled to any particular cluster implementation, which means additional effort is needed to use it with each particular cluster setup.

+ +
+
+
+ + + + + + + + + + diff --git a/www/contributing.html b/www/contributing.html new file mode 100644 index 000000000..c5b0155ec --- /dev/null +++ b/www/contributing.html @@ -0,0 +1,229 @@ + + + + + + + +SCST Contributing + + + +
+ + +
+
+

Contributing to SCST

+ +

If you would like to contribute to SCST development, you can do so in many ways:

+ +
    +
• By sending donations. They will be spent on further work making SCST better, including buying new hardware, as well as on providing better support and troubleshooting for you. If you want to donate a different amount than those listed on the provided buttons, you can directly edit the URL they point to.
  • +
  • By sending patches, which fix bugs or implement new functionality. + See below a list of possible SCST improvements with some possible + implementation ideas.
  • +
• By writing or updating various documentation to keep it complete and up to date. For instance, the SCST internals description document is quite outdated in some areas. In particular, many functions have been renamed since it was written. It would be good to bring it up to date.
  • +
  • By reporting bugs or other problems.
  • +
+ +

Possible SCST extensions and improvements

+ +

Asynchronous FILEIO in scst_vdisk handler

+ +

At the moment the scst_vdisk handler for FILEIO uses regular synchronous read()/write() calls and builds a deep queue depth by using multiple threads. This is not a very high performance model of operation. It would be much better to use asynchronous, non-blocking I/O calls.

+ +

In user space, native AIO has been available for many years, but the ability to use it from inside the kernel was added only recently. Changing FILEIO to use the new interface should significantly (up to several times) increase the performance of FILEIO devices.

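The "deep queue depth via multiple threads" model described above can be sketched in user space (this is an illustration of the concept, not scst_vdisk code; the file name and worker count are arbitrary): each blocking pread() occupies one worker thread, so N workers keep up to N I/Os in flight, whereas native AIO would achieve the same queue depth from a single thread.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

BLOCK = 4096

def read_block(fd, idx):
    # Each worker issues a blocking pread(); with N workers the device
    # sees up to N requests in flight -- the thread-pool way of building
    # queue depth for synchronous I/O.
    return os.pread(fd, BLOCK, idx * BLOCK)

def threaded_fileio(path, nblocks, workers=8):
    fd = os.open(path, os.O_RDONLY)
    try:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            return list(pool.map(lambda i: read_block(fd, i), range(nblocks)))
    finally:
        os.close(fd)
```

With asynchronous I/O, the per-request thread (and its context-switch cost) disappears, which is where the expected performance win comes from.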
+ +

Support for O_DIRECT in scst_vdisk handler

+ +

At the moment, the scst_vdisk handler doesn't support the O_DIRECT option, and the possibility to set it has been disabled. This limitation is caused by the Linux kernel's expectation that memory supplied to read() and write() with the O_DIRECT flag is mapped into some user space application. Having O_DIRECT together with the asynchronous FILEIO described above would be another significant performance boost for modern solid state devices. For instance, in the fio utility direct AIO long ago proved to be the fastest way to benchmark storage.

+ +

It is relatively easy to remove that limitation. The function dio_refill_pages() should be modified to check whether current->mm is NULL before calling get_user_pages(). If it is NULL, then, instead of calling get_user_pages(), dio->pages should be filled with pages taken directly from dio->curr_user_address, and each such page should be referenced with page_cache_get(). That's all.

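The alignment rules that make O_DIRECT awkward can be seen from user space too. A rough sketch (assumptions: Linux, a 4 KB logical block size, and a filesystem that supports O_DIRECT; tmpfs, notably, rejects it, so the code falls back to buffered I/O in that case): the buffer, file offset and length must all be block-aligned, and an anonymous mmap() conveniently provides a page-aligned buffer.

```python
import mmap
import os

BLKSZ = 4096  # assumed logical block size; the real value comes from the device

def read_direct(path, nbytes=BLKSZ):
    """Read nbytes (a multiple of BLKSZ) with O_DIRECT when possible.

    O_DIRECT requires the buffer address, file offset and transfer
    length to be block-aligned; an anonymous mmap() is page-aligned."""
    direct = getattr(os, "O_DIRECT", 0)          # Linux-only flag
    try:
        fd = os.open(path, os.O_RDONLY | direct)
    except OSError:
        # e.g. tmpfs does not support O_DIRECT -- fall back to buffered I/O
        fd = os.open(path, os.O_RDONLY)
    try:
        buf = mmap.mmap(-1, nbytes)              # page-aligned anonymous buffer
        got = os.readv(fd, [buf])                # read into the aligned buffer
        return buf[:got]
    finally:
        os.close(fd)
```

The kernel-side fix described above serves the same purpose: giving dio_refill_pages() usable pages when there is no user-space mapping to pin.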
+ +

Solve SG IO count limitation issue in pass-through mode

+ +

In pass-through mode (i.e. using the pass-through device handlers such as scst_tape, etc.), SCSI commands coming from remote initiators are passed to the local SCSI hardware on the target as is, without any modifications. Like any other hardware, the local SCSI hardware cannot handle commands whose data size and/or scatter-gather segment count exceed certain limits. For some commands SCST can split them into subcommands and hence work around this problem, but that isn't always possible. For instance, for tapes, splitting write commands could corrupt the tape data.

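The splitting workaround mentioned above can be sketched as a simple chunking function (an illustration of the idea, not SCST code; the limit values are made up): a scatter-gather list is cut into subcommand-sized pieces that each stay within the hardware's segment-count and byte-count limits.

```python
def split_transfer(sg_list, max_segs, max_bytes):
    """Split a scatter-gather list (a list of segment lengths in bytes)
    into chunks that respect the hardware limits, the way SCST can split
    some commands into subcommands."""
    chunks, cur, cur_bytes = [], [], 0
    for seg in sg_list:
        assert seg <= max_bytes, "single segment exceeds hardware limit"
        if len(cur) == max_segs or cur_bytes + seg > max_bytes:
            chunks.append(cur)          # flush: this chunk is full
            cur, cur_bytes = [], 0
        cur.append(seg)
        cur_bytes += seg
    if cur:
        chunks.append(cur)
    return chunks
```

For a disk READ/WRITE each chunk maps cleanly to a subcommand at a shifted LBA; for a tape WRITE such a split would change the block boundaries on the medium, which is exactly why it isn't always possible.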
+ +

If you hit this issue, you will see symptoms like small transfers working well while large transfers stall, and messages like "Unable to complete command due to SG IO count limitation" printed in the kernel logs.

+ +

The only complete way to fix this problem is to allocate data buffers whose number of entries stays within the SG IO count limitation. In sgv_big_order_alloc.diff you can find a possible way to solve this issue.

+ +

You can also look at the patch sgv_big_order_alloc-sfw5-rc3.diff created by Frank Zago for SCST 2.0.0; it was submitted too late to be included in that release. An update for the SCST trunk is welcome!

+ +

Note that the scst_disk handler already implements a workaround for this.

+ +

Memory registration

+ +

In some cases a target driver might need to register the memory used for data buffers with the hardware. At the moment, none of the SCST target drivers, including the InfiniBand SRP target driver, need that feature, but should the need arise in the future, it can easily be added by extending the SCST SGV cache. The SGV cache is SCST's memory management subsystem: instead of freeing each no-longer-used data buffer back to the system, it keeps the buffer for a while so it can be reused by the next command, reducing command processing latency and hence improving performance.

+ +

To support memory buffer registration, it could be extended in the following way:

+ +

1. Struct scst_tgt_template would be extended to have 2 new callbacks:

+ +
    + +
  • int register_buffer(struct scst_cmd *cmd)
  • + +
  • int unregister_buffer(unsigned long mem_priv, void *scst_priv)
  • + +
+ +

2. SCST core would be extended to have 4 new functions:

+ +
    + +
  • int scst_mem_registered(struct scst_cmd *cmd)
  • + +
  • int scst_mem_deregistered(void *scst_priv)
  • + +
  • int scst_set_mem_priv(struct scst_cmd *cmd, unsigned long mem_priv)
  • + +
  • unsigned long scst_get_mem_priv(struct scst_cmd *cmd)
  • + +
+ +

3. The workflow would be the following:

+ +
    +
  1. If the target driver defines the register_buffer() and unregister_buffer() callbacks, the SCST core would allocate a dedicated SGV cache for each instance of struct scst_tgt, i.e. per target.
  2. + +
  3. On an SGV cache miss when allocating a memory buffer for a command, SCST would check whether the register_buffer() callback is defined in the target driver's template and, if so, call it.
  4. + +
  5. In the register_buffer() callback the target driver would take the necessary actions to start registration of the command's memory buffer.
  6. + +
  7. When the register_buffer() callback returns, the SCST core would suspend processing of the corresponding command and switch to processing the next commands.
  8. + +
  9. After the memory registration finished, the target driver would call scst_set_mem_priv() + to associate the memory buffer with some internal data.
  10. + +
  11. Then the target driver would call scst_mem_registered() and SCST would resume processing + the command. Functions scst_set_mem_priv() and scst_mem_registered() can be called from inside register_buffer(). + In this case SCST core would continue processing the command immediately without suspending.
  12. + +
  13. After the command finishes, the corresponding memory buffer would remain in the SGV cache in the registered state and would be reused by subsequent commands. For each of them, the target driver can at any time look up the data associated with the registered buffer using scst_get_mem_priv().
  14. + +
  15. When the SGV cache decides that it is time to free the memory buffer, it would call the target driver's unregister_buffer() callback.
  16. + +
  17. In this callback the target driver would take the necessary actions to start deregistration of the command's memory buffer.
  18. + +
  19. When the unregister_buffer() callback returns, the SGV cache would suspend freeing the corresponding buffer and switch to its other work.
  20. + +
  21. After the memory deregistration has finished, the target driver would call scst_mem_deregistered() and pass it the scst_priv pointer received in unregister_buffer(). Then the memory buffer would be freed by the SGV cache. The function scst_mem_deregistered() can be called from inside unregister_buffer(); in that case the SGV cache would free the buffer immediately without suspending.
  22. +
+ +
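The register/reuse/unregister lifecycle proposed above can be sketched as a user-space toy model (purely illustrative: only register_buffer() and unregister_buffer() come from the proposal; the class names and everything else are made up): the cache pays the registration cost once per buffer and only unregisters when it actually evicts.

```python
class ToyTargetDriver:
    """Hypothetical driver: 'registers' buffers by handing out a handle,
    standing in for real hardware memory registration."""
    def __init__(self):
        self.registered = {}          # buffer id -> fake hardware handle
        self._next_handle = 100

    def register_buffer(self, buf_id):
        self.registered[buf_id] = self._next_handle
        self._next_handle += 1
        return self.registered[buf_id]

    def unregister_buffer(self, buf_id):
        del self.registered[buf_id]

class ToySgvCache:
    """Keeps freed buffers around so registration happens once, not per command."""
    def __init__(self, driver):
        self.driver = driver
        self.free_bufs = []           # (buf_id, mem_priv) kept registered
        self._next_buf = 0

    def alloc(self):
        if self.free_bufs:            # cache hit: already-registered buffer
            return self.free_bufs.pop()
        buf_id = self._next_buf       # cache miss: register a fresh buffer
        self._next_buf += 1
        return buf_id, self.driver.register_buffer(buf_id)

    def free(self, buf):
        self.free_bufs.append(buf)    # stays registered for the next command

    def shrink(self):
        while self.free_bufs:         # eviction: only now really unregister
            buf_id, _ = self.free_bufs.pop()
            self.driver.unregister_buffer(buf_id)
```

The suspension/resumption machinery (scst_mem_registered() and friends) is deliberately left out; the point here is only the amortization of the registration cost across commands.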

SCST usage with non-SCSI transports

+ +

SCST might also be used with transports that don't speak SCSI, like NBD or AoE. Such cooperation would allow them to use an SCST-emulated backend.

+ +

For user space targets this is trivial: they should simply use SCST-emulated devices locally via the scst_local module.

+ +

For in-kernel non-SCSI target drivers it's a bit more complicated: they should implement a small layer which would translate their internal READ/WRITE requests into the corresponding SCSI commands and, on the way back, SCSI status and sense codes into their internal status codes.

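The request-to-CDB half of such a translation layer is straightforward. As a sketch (illustrative helper, not SCST code), a block-level READ request with a 32-bit LBA maps onto a SCSI READ(10) command descriptor block:

```python
def read10_cdb(lba, num_blocks):
    """Build a SCSI READ(10) CDB for an internal block READ request --
    the kind of translation an NBD/AoE front end would perform."""
    assert 0 <= lba < 2**32 and 0 < num_blocks < 2**16
    return bytes([
        0x28,                                    # READ(10) opcode
        0x00,                                    # flags (RDPROTECT/DPO/FUA) clear
        (lba >> 24) & 0xFF, (lba >> 16) & 0xFF,  # logical block address,
        (lba >> 8) & 0xFF, lba & 0xFF,           #   big-endian
        0x00,                                    # group number
        (num_blocks >> 8) & 0xFF,                # transfer length in blocks,
        num_blocks & 0xFF,                       #   big-endian
        0x00,                                    # control
    ])
```

WRITE translates the same way with opcode 0x2A, and the reverse direction maps the returned SCSI status and sense data back onto the transport's own error codes.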
+ +

GET CONFIGURATION command

+ +

The SCSI command GET CONFIGURATION is mandatory for SCSI multimedia devices, like CD/DVD-ROMs and recorders; see the MMC standard. Currently SCST lacks support for it, which leads to problems with programs that depend on the result of GET CONFIGURATION.

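For reference, the command in question is a 10-byte CDB defined by MMC. A sketch of its layout (field layout per the MMC standard; the helper itself is illustrative):

```python
def get_configuration_cdb(rt=0, starting_feature=0, alloc_len=8):
    """Build a GET CONFIGURATION (46h) CDB as defined in MMC.

    rt (bits 0-1 of byte 1) selects which feature descriptors the drive
    returns (0 = all features from starting_feature upward)."""
    return bytes([
        0x46,                                 # GET CONFIGURATION opcode
        rt & 0x03,                            # RT field
        (starting_feature >> 8) & 0xFF,       # starting feature number,
        starting_feature & 0xFF,              #   big-endian
        0x00, 0x00, 0x00,                     # reserved
        (alloc_len >> 8) & 0xFF,              # allocation length,
        alloc_len & 0xFF,                     #   big-endian
        0x00,                                 # control
    ])
```

Emulating it in the SCST core would mean parsing these fields and returning a feature header plus the feature descriptors appropriate for the emulated medium.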
+ +

It would be good to add support for it in the SCST core.

+
+
+
+ + + + + + + + + + diff --git a/www/downloads.html b/www/downloads.html new file mode 100644 index 000000000..fd40f4bb6 --- /dev/null +++ b/www/downloads.html @@ -0,0 +1,99 @@ + + + + + + + + +SCST Downloads + + + + +
+ + + + + +
+
+

SCST Downloads

+ +

The latest stable version of SCST is 3.3. + The latest updates for that version are + available on the 3.3.x branch in the SVN + repository.

+ +

You can also download prebuilt SCST modules for + Scientific Linux CERN 5 (RHEL5-based), + Ubuntu, + Debian, + Alpine Linux and + openSUSE + (spec).

+ +

Recently, SCST gained the ability to build Debian packages using the "make dpkg" command. The previous instructions on how to build an SCST DKMS package for Debian-based, systemd-enabled Linux distributions can be found here.

+ +

As a complete SCST-based system you can try Enterprise Storage OS (ESOS).

+ +

There is also a well-done user space port, which you can find here.

+ +

The latest development version of SCST is 3.4. You can download it, as well as the target drivers and user space utilities, directly from the SCST SVN. You can access it using either the web-based SVN repository viewer or anonymous access:

+ +

svn checkout svn://svn.code.sf.net/p/scst/svn/trunk scst-trunk

+ +

You can also find the latest updates for the stable branches in the SCST SVN. More information about accessing the SVN repository can be found here. Alternatively, you can download it as a GNU tarball from here.

+ +

The history of pre-SVN SCST development is available in the SCST CVS repository, which is accessible via the web-based CVS repository viewer or anonymous CVS access.

+ + +
+
+
+ + + + + + + + + diff --git a/www/handler_fileio_tgt.html b/www/handler_fileio_tgt.html new file mode 100644 index 000000000..bf16041a6 --- /dev/null +++ b/www/handler_fileio_tgt.html @@ -0,0 +1,97 @@ + + + + + + + + +FILEIO Target SCST Handler + + + + +
+ + + + + +
+ + +
+

FILEIO_TGT handler

+ +

The user space program fileio_tgt uses the interface of SCST's scst_user device handler and lets you see how it works in various modes. Fileio_tgt provides mostly the same functionality as SCST's kernel space scst_vdisk handler, with the only exceptions that it supports O_DIRECT mode and doesn't support BLOCKIO. O_DIRECT mode is basically the same as BLOCKIO but also supports files, so for some loads it can be significantly faster than regular FILEIO access. All the remarks about BLOCKIO mode in SCST's README file apply to O_DIRECT mode as well.

+ +

You can find the latest development version of this handler in the SCST SVN. See the download page for how to set up access to it.

+ +
+
+
+ + + + + + + + + diff --git a/www/images/LPe16002.jpg b/www/images/LPe16002.jpg new file mode 100644 index 000000000..ce16b045b Binary files /dev/null and b/www/images/LPe16002.jpg differ diff --git a/www/images/Orange.css b/www/images/Orange.css new file mode 100644 index 000000000..c7cd8500c --- /dev/null +++ b/www/images/Orange.css @@ -0,0 +1,357 @@ +* { margin: 0; padding: 0; } + +body { + margin: 0; padding: 0; + font: 71%/1.5em Verdana, 'Trebuchet MS', Arial, Sans-serif; + background: url(headerbg-orange.gif) repeat-x; + color: #666666; + text-align: center; +} + +/* links */ +a { + background: inherit; + color: #EC981F; +} +a:hover { + background: inherit; + color: #806B4D; +} + +/* headers */ +h1, h2, h3, .companysubtitles { + font: bold 1em 'Trebuchet MS', Tahoma, Sans-serif; + text-transform: uppercase; + color: #555; +} +h1 { font-size: 1.5em; } +h2 { font-size: 1.3em; } +h3 { font-size: 1.2em; text-transform: none;} +.companysubtitles { font-size: 1.3em; } + +#main h1, #rightbar h1 { + padding: 10px 0 5px 5px; + margin: 0 0 0 10px; + text-transform: uppercase; + border-bottom: 1px solid #f2f2f2; +} +#sidebar h1 { + padding: 10px 0px 5px 30px; + background: url(square_arrow.gif) no-repeat 2px 12px; + margin: 0; + text-transform: uppercase; +} + +p, h1, h2, h3, h4, .companysubtitles { margin: 10px 15px; } + +ul, ol { + margin: 10px 30px; + padding: 0 15px; + color: #EC981F; +} + +ul span, ol span { color: #666666; } + +/* images */ +/* img { border: 2px solid #CCC; } */ +img.float-right { margin: 5px 0px 5px 15px; } +img.float-left { margin: 5px 15px 5px 0px; } + +a img { border: 0px solid #EC981F; } +a:hover img { border: 0px solid #806B4D !important;border: 0px solid #EC981F; } + +code { + margin: 5px 0; + padding: 10px; + text-align: left; + display: block; + overflow: auto; + font: 500 1em/1.5em 'Lucida Console', 'courier new', monospace; + background: #FAFAFA; + border: 1px solid #f2f2f2; + border-left: 3px solid #EC981F; +} + +acronym { cursor: 
help;border-bottom: 1px solid #777; } + +blockquote { + margin: 15px; + padding: 0 0 0 32px; + background: #FAFAFA url(quote.gif) no-repeat 5px 10px !important; + background-position: 8px 10px; + border: 1px solid #f2f2f2; + border-left: 3px solid #EC981F; + font-weight: bold; +} + +/* form elements */ +form { + margin:10px; padding: 0; + border: 1px solid #f2f2f2; + background-color: #FAFAFA; +} +label { + display:block; + font-weight:bold; + margin:5px 0; +} +input { + padding: 4px; + border:1px solid #eee; + font: normal 1em/1.5em Verdana, sans-serif; + color:#777; +} +textarea { + width:350px; + padding:4px; + font: normal 1em/1.5em Verdana, sans-serif; + border:1px solid #eee; + height:100px; + display:block; + color:#777; +} +input.button { + margin: 0; + font: bold 1em Arial, Sans-serif; + border: 1px solid #CCC; + background: #FFF; + padding: 2px 3px; + color: #333; +} + +/* search form */ +form.searchform { + background: transparent; + border: none; + margin: 0; padding: 0; +} +form.searchform input.textbox { + margin: 0; + width: 120px; + border: 1px solid #CCC; + background: #FFF; + color: #333; + vertical-align: top; +} +form.searchform input.button { + width: 55px; + vertical-align: top; +} + +/*****************/ +/* Layout */ +/*****************/ +#wrap { + margin: 0 auto; + width: 908px; + text-align: left; + background: #FFF; +} +#content-wrap { + clear:both; + margin: 0; padding:0; + width: 908px; +} + +/* header */ +#header { + position: relative; + background: url(headerbg-orange.gif) repeat-x 0% 0%; + height: 84px; +} + +div.logoimg { + position: relative; + background: url(logo.gif) no-repeat; + top: 15px; + height: 50px; +} + +#header h1#logo { + position: absolute; + margin: 0; padding: 0; + font: bolder 4.1em 'Trebuchet MS', Arial, Sans-serif; + letter-spacing: -2px; + color: #CCC; + /*text-transform: lowercase;*/ + /* change the values of top and Left to adjust the position of the logo*/ + top: 0; left: 55px; +} +#header h2#slogan { + 
position: absolute; + margin: 0; padding: 0; + font: bold 2em 'Trebuchet MS', Arial, Sans-serif; + text-transform: none; + color: #FFF; + /* change the values of top and Left to adjust the position of the slogan*/ + top: 30px; left:65px; +} +#header .searchform { + position: absolute; + top: 5px; right: 3px; +} + +/* main column */ +#main { + float: left; + margin-left: 15px; + padding: 0; + /*width: 54%;*/ + width: 72%; + /*border-left: 1px solid #f2f2f2; */ +} + +#main2 { + float: left; + margin-left: 15px; + padding: 0; + width: 96%; +} + +.post-footer { + background-color: #FAFAFA; + padding: 5px; margin-top: 20px; + font-size: 95%; + border: 1px solid #f2f2f2; +} +.post-footer .date { + background: url(clock.gif) no-repeat left center; + padding-left: 20px; margin: 0 10px 0 5px; +} +.post-footer .comments { + background: url(comment.gif) no-repeat left center; + padding-left: 20px; margin: 0 10px 0 5px; +} +.post-footer .readmore { + background: url(page.gif) no-repeat left center; + padding-left: 20px; margin: 0 10px 0 5px; +} + +/* sideabar */ +#sidebar { + float: left; + width: 24%; + margin: 0; padding: 0; + display: inline; +} +#sidebar ul.sidemenu { + list-style: none; + text-align: left; + margin: 0 0 8px 0; + padding-right: 0; + text-decoration: none; +} +#sidebar ul.sidemenu li { + border-bottom: 1px solid #EFF0F1; + background: url(arrow.gif) no-repeat 2px 5px; + padding: 2px 5px 2px 20px; +} + +* html body #sidebar ul.sidemenu li { height: 1%; } + +#sidebar ul.sidemenu a { + font-weight: bold; + background-image: none; + text-decoration: none; +} + +/* rightbar */ +#rightbar { + float: right; + width: 24%; + padding: 0; margin: 0; +} + +/* footer */ +#footer { + clear: both; + background: #FFF url(footerbg.gif) repeat-x left top; + border-top: 1px solid #F2F2F2; + text-align: center; + height: 50px; +} +#footer a { + text-decoration: none; + font-weight: bold; +} + +/* menu */ +#menu { + clear: both; + margin: 0; padding: 0; +} +#menu ul { + 
position: relative; + bottom: 4px; + margin: 0; padding: 0; + float: left; + font: bold 1.4em 'Trebuchet MS', Tahoma, Arial, Sans-serif; + width: 850px;/* 775px; */ + border: 1px solid #808080; + border-width: 0 0 4px 0; + list-style: none; +} +#menu ul li{ + display: inline; +} +#menu ul li a { + position: relative; bottom: -4px; + float: left; + color: #808080; + padding: 0px 10px; + text-decoration: none; + background: white url(menudivide.gif) repeat-y right top; + border-bottom: 4px solid #808080; +} +#menu ul li a:hover{ + color: black; + background-color: #F3F3F3; + border-bottom: 4px solid #FFA600; +} +#menu ul li#current a{ + color: #333; + background-color: #F3F3F3; + border-bottom: 4px solid #FFA600; +} +#menu ul li#sponsorship a{ + color: red; +} +#menu ul li#sp_current a{ + color: red; + background-color: #F3F3F3; + border-bottom: 4px solid #FFA600; +} + +/* Font colors */ +font.names { color: #EC981F ; } + +/* Company Logo Boxes */ +.companybox { + border-color : #FFFFFF; + border : 0px #FFFFFF; + border-top : #999999 0pt solid; + border-left : #999999 0pt solid; + border-right : #999999 0pt solid; + border-bottom : #999999 0pt solid; + text-align: left; + padding: 10px 10px 10px 10px; +} + + +/* Alignment classes */ +.float-left { float: left;} +.float-right { float: right; } +.align-left { text-align: left; } +.align-right { text-align: right; } +.align-center { text-align: center; } +.align-justify { text-align: justify; } + +/* display classes */ +.clear { clear: both; } +.block { display: block; } +.hide { display: none; } +.orange { color: #FFA600; } +.tab { padding: 0px 0px 0px 16px; } + + + diff --git a/www/images/arrow.gif b/www/images/arrow.gif new file mode 100644 index 000000000..b139f7e40 Binary files /dev/null and b/www/images/arrow.gif differ diff --git a/www/images/avago.jpg b/www/images/avago.jpg new file mode 100644 index 000000000..98be7cac0 Binary files /dev/null and b/www/images/avago.jpg differ diff --git a/www/images/clock.gif 
b/www/images/clock.gif new file mode 100644 index 000000000..df5d85da6 Binary files /dev/null and b/www/images/clock.gif differ diff --git a/www/images/comment.gif b/www/images/comment.gif new file mode 100644 index 000000000..951082f93 Binary files /dev/null and b/www/images/comment.gif differ diff --git a/www/images/fig1.png b/www/images/fig1.png new file mode 100644 index 000000000..94b9fa071 Binary files /dev/null and b/www/images/fig1.png differ diff --git a/www/images/fig2.png b/www/images/fig2.png new file mode 100644 index 000000000..3bdf882dc Binary files /dev/null and b/www/images/fig2.png differ diff --git a/www/images/fig3.png b/www/images/fig3.png new file mode 100644 index 000000000..e502f2dd3 Binary files /dev/null and b/www/images/fig3.png differ diff --git a/www/images/fig4.png b/www/images/fig4.png new file mode 100644 index 000000000..667e41289 Binary files /dev/null and b/www/images/fig4.png differ diff --git a/www/images/footerbg.gif b/www/images/footerbg.gif new file mode 100644 index 000000000..11bdbfc69 Binary files /dev/null and b/www/images/footerbg.gif differ diff --git a/www/images/headerbg-orange.gif b/www/images/headerbg-orange.gif new file mode 100644 index 000000000..22b25faaa Binary files /dev/null and b/www/images/headerbg-orange.gif differ diff --git a/www/images/headerbg.gif b/www/images/headerbg.gif new file mode 100644 index 000000000..92002f723 Binary files /dev/null and b/www/images/headerbg.gif differ diff --git a/www/images/init_scst.png b/www/images/init_scst.png new file mode 100644 index 000000000..98e342d22 Binary files /dev/null and b/www/images/init_scst.png differ diff --git a/www/images/iss.jpg b/www/images/iss.jpg new file mode 100644 index 000000000..d9a395393 Binary files /dev/null and b/www/images/iss.jpg differ diff --git a/www/images/logo.gif b/www/images/logo.gif new file mode 100755 index 000000000..2ae2688d0 Binary files /dev/null and b/www/images/logo.gif differ diff --git a/www/images/menubg.gif 
b/www/images/menubg.gif new file mode 100644 index 000000000..eee09e1e6 Binary files /dev/null and b/www/images/menubg.gif differ diff --git a/www/images/menubg_current.gif b/www/images/menubg_current.gif new file mode 100644 index 000000000..a0c506f15 Binary files /dev/null and b/www/images/menubg_current.gif differ diff --git a/www/images/menudivide.gif b/www/images/menudivide.gif new file mode 100644 index 000000000..00abf242e Binary files /dev/null and b/www/images/menudivide.gif differ diff --git a/www/images/page.gif b/www/images/page.gif new file mode 100644 index 000000000..0c49ba877 Binary files /dev/null and b/www/images/page.gif differ diff --git a/www/images/quote.gif b/www/images/quote.gif new file mode 100644 index 000000000..43cbdb3fa Binary files /dev/null and b/www/images/quote.gif differ diff --git a/www/images/scst_cmd_thread.png b/www/images/scst_cmd_thread.png new file mode 100644 index 000000000..824c7f6ff Binary files /dev/null and b/www/images/scst_cmd_thread.png differ diff --git a/www/images/scst_mgmt_cmd_thread.png b/www/images/scst_mgmt_cmd_thread.png new file mode 100644 index 000000000..dfcaa19b4 Binary files /dev/null and b/www/images/scst_mgmt_cmd_thread.png differ diff --git a/www/images/scst_mgmt_thread.png b/www/images/scst_mgmt_thread.png new file mode 100644 index 000000000..78f34e7f6 Binary files /dev/null and b/www/images/scst_mgmt_thread.png differ diff --git a/www/images/square_arrow.gif b/www/images/square_arrow.gif new file mode 100644 index 000000000..29dbcb85f Binary files /dev/null and b/www/images/square_arrow.gif differ diff --git a/www/images/t_emulex.gif b/www/images/t_emulex.gif new file mode 100644 index 000000000..5c4c3d3bb Binary files /dev/null and b/www/images/t_emulex.gif differ diff --git a/www/images/t_fcoe.gif b/www/images/t_fcoe.gif new file mode 100755 index 000000000..d8b63e00e Binary files /dev/null and b/www/images/t_fcoe.gif differ diff --git a/www/images/t_lsi.gif b/www/images/t_lsi.gif new file 
mode 100755 index 000000000..7c7caa8cd Binary files /dev/null and b/www/images/t_lsi.gif differ diff --git a/www/images/t_qlogic.gif b/www/images/t_qlogic.gif new file mode 100644 index 000000000..5f351cce2 Binary files /dev/null and b/www/images/t_qlogic.gif differ diff --git a/www/images/t_rdma.gif b/www/images/t_rdma.gif new file mode 100755 index 000000000..0b985bf23 Binary files /dev/null and b/www/images/t_rdma.gif differ diff --git a/www/images/t_sas.gif b/www/images/t_sas.gif new file mode 100644 index 000000000..38edd7074 Binary files /dev/null and b/www/images/t_sas.gif differ diff --git a/www/images/t_unsupported.gif b/www/images/t_unsupported.gif new file mode 100755 index 000000000..6ff9b1771 Binary files /dev/null and b/www/images/t_unsupported.gif differ diff --git a/www/images/tooltips.js b/www/images/tooltips.js new file mode 100644 index 000000000..307095d1c --- /dev/null +++ b/www/images/tooltips.js @@ -0,0 +1,52 @@ +var d = document; +var offsetfromcursorY=15 // y offset of tooltip +var ie=d.all && !window.opera; +var ns6=d.getElementById && !d.all; +var tipobj,op; + +function tooltip(el,txt) { + tipobj=d.getElementById('mess'); + tipobj.innerHTML = txt; + op = 0.1; + tipobj.style.opacity = op; + tipobj.style.display="block"; + tipobj.style.visibility="visible"; + + el.onmousemove=positiontip; + appear(); +} + +function hide_info(el) { + d.getElementById('mess').style.visibility='hidden'; + d.getElementById('mess').style.display='none'; + el.onmousemove=''; +} + +function ietruebody(){ +return (d.compatMode && d.compatMode!="BackCompat")? d.documentElement : d.body +} + +function positiontip(e) { + var curX=(ns6)?e.pageX : event.clientX+ietruebody().scrollLeft; + var curY=(ns6)?e.pageY : event.clientY+ietruebody().scrollTop; + var winwidth=ie? ietruebody().clientWidth : window.innerWidth-20 + var winheight=ie? ietruebody().clientHeight : window.innerHeight-20 + + var rightedge=ie? winwidth-event.clientX : winwidth-e.clientX; + var bottomedge=ie? 
winheight-event.clientY-offsetfromcursorY : winheight-e.clientY-offsetfromcursorY; + + if (rightedge < tipobj.offsetWidth) tipobj.style.left=curX-tipobj.offsetWidth+"px"; + else tipobj.style.left=curX+"px"; + + if (bottomedge < tipobj.offsetHeight) tipobj.style.top=curY-tipobj.offsetHeight-offsetfromcursorY+"px" + else tipobj.style.top=curY+offsetfromcursorY+"px"; +} + +function appear() { + if(op < 0.9) { + op += 0.07; + tipobj.style.opacity = op; + tipobj.style.filter = 'alpha(opacity='+op*100+')'; + t = setTimeout('appear()', 30); + } +} diff --git a/www/index.html b/www/index.html new file mode 100644 index 000000000..d8ea84628 --- /dev/null +++ b/www/index.html @@ -0,0 +1,207 @@ + + + + + + + + + +SCST: A Generic SCSI Target Subsystem for Linux + + + +
+ + +
+
+

Generic SCSI Target Subsystem for Linux

+ +

The generic SCSI target subsystem for Linux (SCST) allows the creation of sophisticated storage devices from any Linux box. These devices can provide advanced functionality, like replication, thin provisioning, deduplication, high availability, automatic backup, etc. Another class of such devices is Virtual Tape Libraries (VTL) as well as other disk-based backup solutions.

+

SCST devices can use any link that supports SCSI-style data exchange: iSCSI, Fibre Channel, FCoE, SAS, InfiniBand (SRP), Wide (parallel) SCSI, etc.

+

It might + well be that your favorite storage appliance is running SCST in the firmware.

+ +

The SCST project consists of a set of subprojects: the generic SCSI target mid-layer itself (SCST core) with a set of device handlers, as well as target drivers and user-space utilities.

Features of SCST Core

+
    +
• SCST core provides a unified, consistent interface between SCSI target drivers and the Linux kernel, as well as between the Linux kernel and storage backend handlers, connecting target drivers with real or emulated storage backends.
  • +
• SCST core performs all required pre- and post-processing of incoming requests, as well as the necessary error recovery.
  • +
• SCST core takes care of most issues related to execution contexts, thus practically eliminating one of the most complicated problems in kernel driver development. For example, the target drivers for Marvell SAS adapters or for InfiniBand SRP are less than 3000 lines of code each.
  • +
• Very low overhead and fine-grained locking allow SCST to reach maximum performance and scalability. In particular, incoming requests can be processed in the caller's context or in one of SCST core's internal tasklets without any extra context switches.
  • +
• The device handler architecture allows various I/O modes for backend storage handling. For example, pass-through device handlers can export real SCSI hardware, and the vdisk device handler can export files as virtual disks.
  • +
• Advanced per-initiator device visibility management (LUN masking) allows different initiators to see different sets of devices with different access permissions. For instance, initiator A could see devices X and Y exported from target T as read-writable, while initiator B on the same target T could see device Y read-only and device Z read-writable. This feature is required for hardware targets which have no ability to create virtual targets (SAS adapters, for instance).
  • +
• SCST core emulates the necessary functionality of a SCSI host adapter, because from the remote initiators' point of view a SCSI target acts as a SCSI host with its own devices. This is especially important in pass-through mode with a one-to-many relationship, i.e. when multiple initiators can connect to the exported pass-through devices. A deeper elaboration of why this is needed can be found in this message in the thread "Question for pass-through target design" on the linux-scsi mailing list. Some of the emulated functions are the following:
      +
    • Generation of necessary UNIT ATTENTIONs, their storage and delivery to all connected + remote initiators.
    • + +
    • RESERVE/RELEASE functionality.
    • + +
    • All types of RESETs and other task management functions.
    • + +
• The REPORT LUNS command, as well as SCSI address space management, in order to present a consistent address space to all remote initiators, since local SCSI devices cannot know about each other in order to report via the REPORT LUNS command. Additionally, SCST core responds with an error to all commands addressed to non-existing devices and provides access control, so different remote initiators can see different sets of devices.
    • + +
    • Other necessary functionality (task attributes, etc.) as specified in SCSI standards.
    • +
    +
  • + +
• SCST core has a multithreaded design and complete SMP support, so, if necessary, all your CPU cores will participate in command processing.
  • +
  • Well documented.
  • +
+
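The per-initiator LUN masking described above can be driven at runtime through SCST's sysfs interface. The sketch below plays out the initiator-B scenario; the sysfs paths, device and group names, the IQNs, and the `read_only` parameter are assumptions based on the SCST 2.x sysfs layout, so consult the SCST README before relying on them.

```shell
# Invented target IQN; paths assume the SCST 2.x sysfs layout
TGT=/sys/kernel/scst_tgt/targets/iscsi/iqn.2006-10.net.vlnb:tgt

# Create a security group for initiator B
echo "create B_group" > $TGT/ini_groups/mgmt

# Let initiator B see device Y read-only as LUN 0
echo "add dev_Y 0 read_only=1" > $TGT/ini_groups/B_group/luns/mgmt

# Bind initiator B's iSCSI name to the group
echo "add iqn.2005-03.org.open-iscsi:host-b" > $TGT/ini_groups/B_group/initiators/mgmt
```

Initiators not matched by any group fall back to the target's default LUN set, which is how initiator A can keep seeing devices X and Y read-writable.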

Interoperability between remote and local SCSI initiators (i.e. sd, st, etc.) is an additional issue that SCST is going to address (it is not implemented yet). It is necessary because local SCSI initiators can change the state of the device, for example RESERVE the device, or change some of its parameters, and that could happen behind SCST's back, i.e. remote initiators would not know about it, which could lead to various problems, including data corruption. Thus, RESERVE/RELEASE commands, locally generated UNIT ATTENTIONs, etc. should be intercepted and passed through SCST core.

+ +

You can find a comparison of SCST with other SCSI targets on the Comparison page. Some highlights of what it can mean for end users can be found on the iSCSI-SCST page.

SCST core supports the following I/O modes

+
    +
• Pass-through mode with a one-to-many relationship, i.e. when multiple initiators can connect to the exported pass-through devices, for virtually all SCSI device types: disks (type 0), tapes (type 1), processors (type 3), CDROMs (type 5), MO disks (type 7), medium changers (type 8) and RAID controllers (type 0xC). In this mode you can, for instance, share your parallel SCSI tape or SATA DVD-RW device on your iSCSI network.
  • +
• FILEIO mode, which allows using files on file systems, or block devices, as remotely available virtual SCSI disks or CDROMs with the benefits of the Linux cache.
  • +
• BLOCKIO mode, which performs direct block I/O with a block device, bypassing the page cache for all operations. This mode works well with high-end storage HBAs and for applications that either do not need caching between application and disk or need large-block throughput.
  • +
• User space mode using the scst_user device handler, which allows implementing high-performance virtual SCSI devices in user space. Compared with fully in-kernel device handlers, this mode has very low overhead (a few percent).
  • +
• Performance-testing device handlers, as well as NULLIO mode, which provide a way to measure performance directly, without the overhead of actual data transfers from/to the underlying SCSI devices.
  • +
+ +
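As an illustration of how these modes are typically selected, here is a minimal scst.conf-style sketch in the format used by scstadmin. The device names, file paths and IQN are invented, and the exact syntax depends on the SCST/scstadmin version, so treat this as a shape, not a reference.

```
HANDLER vdisk_fileio {
        DEVICE disk01 {
                # A plain file exported as a virtual disk, served
                # through the Linux page cache (FILEIO mode)
                filename /storage/disk01.img
        }
}

HANDLER vdisk_blockio {
        DEVICE disk02 {
                # A block device exported with direct block I/O,
                # bypassing the page cache (BLOCKIO mode)
                filename /dev/sdb
        }
}

TARGET_DRIVER iscsi {
        enabled 1
        TARGET iqn.2006-10.net.vlnb:tgt {
                enabled 1
                LUN 0 disk01
                LUN 1 disk02
        }
}
```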

Certification

+ +

The SCST core engine has passed VMware certification as part of Scale's Intelligent Clustered Storage technology and the ISS STORCIUM solution. It has also passed VMware and Microsoft certification as part of the SanDisk ION Accelerator and storage arrays developed by Open-E, Inc.

+ +

In October 2012 a Hewlett-Packard ProLiant BL465c Gen8 with SCST-based storage earned the maximum score of 59.99@62 tiles in VMmark Version 2.1.1. In June 2016 this result was surpassed by an HPE ProLiant DL580 Gen9.

+
+ +
+
+ + + + + + + + + diff --git a/www/iscsi-scst/index.html b/www/iscsi-scst/index.html new file mode 100644 index 000000000..2baabeccc --- /dev/null +++ b/www/iscsi-scst/index.html @@ -0,0 +1,21 @@ + + + + + + +Old iSCSI-SCST page + + + + + + + + diff --git a/www/max_outstanding_r2t.txt b/www/max_outstanding_r2t.txt new file mode 100644 index 000000000..ef60d8af7 --- /dev/null +++ b/www/max_outstanding_r2t.txt @@ -0,0 +1,61 @@
+	The MaxOutstandingR2T iSCSI parameter and its
+	influence on performance in the case of
+	high-latency links.
+
+
+Let's assume we have a 1Gbps network between initiator and target with
+10ms latency (a good short-distance WAN/Internet link to another
+building in the same town). We want to send a backup from the initiator
+to a tape or tape library on the target. We are limited to sending only
+1 write command at a time, because our tape doesn't allow more. We will
+send 2MB of data in each command.
+
+Our initiator and target negotiated the typical values InitialR2T No,
+ImmediateData Yes, FirstBurstLength 65536, MaxBurstLength 262144. Other
+parameters don't matter for our task, except MaxOutstandingR2T. Suppose
+we can run 2 pieces of iSCSI target software on the target: one
+supporting only MaxOutstandingR2T 1 and another one supporting
+MaxOutstandingR2T >1. The first target negotiated MaxOutstandingR2T 1,
+the second one 16.
+
+10ms latency means that a 1-byte packet sent from the initiator reaches
+the target in 5ms. Then 5ms in the opposite direction. 1Gbps bandwidth
+means that 64K of data is transferred from the initiator to the target
+in about 0.5ms. The maximum throughput we can have with a 1Gbps link is
+about 120MB/s.
+
+For the sake of simplification we will suppose that the targets process
+requests and the initiator processes responses quickly enough, so we
+can ignore the additional latency that processing on both sides adds.
+
+Since each R2T request must ask for no more than MaxBurstLength of
+data, we need ((2MB - 64K)/256K) = 8 R2T requests to send.
The first 64K will be
+sent as immediate/unsolicited data, without an explicit R2T request.
+
+
+1. The first target, with MaxOutstandingR2T 1.
+
+MaxOutstandingR2T 1 means that only one R2T request can be active at a
+time, i.e. the next request for data transfer can be sent only after
+the previous one has completed and all its data has been received.
+
+Thus, on the first target each command will be completed in:
+
+5 + 0.5 + (5 + 5 + 256K/64K * 0.5) * 7 + (5 + 5 + (256K - 64K)/64K *
+0.5) + 5 = 106ms, i.e. 9 IOPS, which is 18MB/s. Remember, we have a
+120MB/s link.
+
+
+2. The second target, with MaxOutstandingR2T 16.
+
+With MaxOutstandingR2T 16 the second target can send all R2T requests
+at once, and the first R2T can be sent before the immediate/unsolicited
+data has been received.
+
+Thus, on the second target each command will be completed in:
+
+5 + 5 + 5 + 256K/64K * 0.5 * 7 + (256K-64K)/64K * 0.5 + 5 = 35.5ms,
+i.e. 28 IOPS, which is 56MB/s.
+
+
+Thus, the second target with MaxOutstandingR2T 16 will perform 56/18 =
+3.1 times faster than the first target with MaxOutstandingR2T 1. diff --git a/www/mc_s.html b/www/mc_s.html new file mode 100644 index 000000000..17ab34d17 --- /dev/null +++ b/www/mc_s.html @@ -0,0 +1,266 @@ + + + + + + + + +MC/S vs MPIO + + + + +
+ + + + + +
+ + +
+ +

MC/S vs MPIO

+ +

MC/S (Multiple Connections per Session) is a feature of the iSCSI protocol which allows combining several connections inside a single session for performance and failover purposes. Let's consider what practical value this feature has compared with OS-level multipath (MPIO), and try to answer why no Open Source OS supports it, despite the many years that the iSCSI protocol has been in active use, nor is going to implement it in the future.

+ +

MC/S is done at the iSCSI level, while MPIO is done at a higher level. Hence, the whole MPIO infrastructure is shared among all SCSI transports, including Fibre Channel, SAS, etc.

+ +

MC/S was designed at a time when most OSes didn't have standard OS-level multipath. Instead, each vendor had its own implementation, which created huge interoperability problems. So one of the goals of MC/S was to address this issue and unify the multipath area in a single standard. But nowadays almost all OSes have OS-level multipath implemented using standard SCSI facilities, hence this purpose of MC/S is no longer valid.

+ +

It is usually claimed that MC/S has the following 2 advantages over MPIO:

+ +
    +
  1. Faster failover recovery.
  2. + +
  3. Better performance.
  4. + +
+ +

Let's look at how realistic those claims are.

+ +

Failover recovery time

+ +

Let's consider a single target exporting a single device over 2 links.

+ +

For MC/S, failover recovery is quite simple: all outstanding SCSI commands are reassigned to another connection. No other actions are necessary, because the session (i.e. the I_T Nexus) remains the same. Consequently, all reservations and other SCSI state, as well as other initiators connected to the device, remain unaffected.

+ +

For MPIO, failover recovery is much more complicated, because it involves transferring all outstanding commands and SCSI state from one I_T Nexus to another. The first thing the initiator will do is abort all outstanding commands on the faulted I_T Nexus. There are 2 approaches for that: the CLEAR TASK SET and LUN RESET task management functions.

+ +

The CLEAR TASK SET function aborts all commands on the device. Unfortunately, it has limitations: it isn't always supported by the device, and having a single task set shared among initiators isn't always appropriate for the application.

+ +

The LUN RESET function resets the device.

+ +

Both the CLEAR TASK SET and LUN RESET functions can harm other initiators to some degree, because all commands from all initiators, not only from the one doing the failover recovery, will be aborted. Additionally, LUN RESET resets all SCSI settings for all connected initiators to the initial state, and if the device held a reservation from any initiator, it will be cleared.

But the harm is minimal:

+ +
    +
• With the TAS bit set in the Control mode page, all the aborted commands will be returned to the affected initiators with TASK ABORTED status, so they can simply retry them immediately. For CLEAR TASK SET, if TAS isn't set, all affected initiators will be notified by the Unit Attention COMMANDS CLEARED BY ANOTHER INITIATOR, so they too can immediately retry all outstanding commands.
  • + +
• In case of a device reset, the affected initiators will be notified via the corresponding Unit Attention about the reset of all SCSI settings to the initial state. Then the initiators can take the necessary recovery actions. Usually no recovery actions are needed, except for the reservation holder, whose reservation was cleared; for it, recovery might not be trivial. But Persistent Reservations solve this issue, because they are not cleared by a device reset.
  • +
+ +

Thus, with Persistent Reservations or using the CLEAR TASK SET function, the additional failover recovery time that MPIO has compared to MC/S is the time to wait for the reset or command aborts to finish, plus the time to retry all the aborted commands. On a properly configured system it should be less than a few seconds, which is well acceptable in practice. If the Linux storage stack were improved to allow aborting all commands submitted to it (currently it is only possible to wait for their completion), the time to abort all the commands could be decreased to a fraction of a second.

+ +

Performance

+ +

First of all, neither MC/S nor MPIO can improve performance if only one SCSI command is sent to the target at a time, for instance in the case of tape backup and restore. Both MC/S and MPIO work at the command level, so they can't split the data transfer for a single command over several links. Only bonding (also known as NIC teaming or Link Aggregation) can improve performance in this case, because it works at the link level.

+ +
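The command-level versus link-level distinction can be put into numbers with a toy model: with only one outstanding command, a whole command travels over one link, so only link-level striping (bonding) shortens the transfer. The 2 MB command size, 10 ms round trip and ~120 MB/s link below follow the WAN example in max_outstanding_r2t.txt; the model itself is a simplifying assumption, not a measurement.

```python
def throughput_mb_s(cmd_mb, rtt_ms, link_mb_s, links, stripes_one_command):
    """Steady-state throughput with one outstanding command at a time."""
    # MC/S and MPIO dispatch whole commands, so one command uses one link;
    # bonding works below the command level and can use all links at once.
    data_bw = link_mb_s * (links if stripes_one_command else 1)
    cmd_time_ms = rtt_ms + cmd_mb / data_bw * 1000.0
    return cmd_mb / cmd_time_ms * 1000.0

one_link = throughput_mb_s(2, 10, 120, 1, False)
mpio_two = throughput_mb_s(2, 10, 120, 2, False)  # second link stays idle
bond_two = throughput_mb_s(2, 10, 120, 2, True)   # both links carry each command
```

With these numbers the two-link MPIO case is no faster than a single link (about 75 MB/s), while bonding pushes the same single-stream workload to roughly 109 MB/s.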

MC/S over several links preserves command execution order, i.e. with it commands are executed in the same order in which they were submitted. MPIO can't preserve this order, because it can't see which command was submitted earlier on which link. Delays in link processing can change the order of commands by the time the target receives them.

+ +

Since initiators usually send commands in the order optimal for performance, reordering can hurt performance to some degree. But this can happen only with a naive target implementation, which can't recover the optimal command execution order. Currently Linux is not naive and is quite good in this area. See, for instance, the section "SEQUENTIAL ACCESS OVER MPIO" in those measurements. Don't look at the absolute numbers; look at the percentage of performance improvement from using the second link. The result is equivalent to 200 MB/s over two 1Gbps links, which is close to the possible maximum.

+ +

If free command reordering is forbidden for a device, either by use of the ORDERED tag, or if the Queue Algorithm Modifier in the Control mode page is set to 0, then MPIO will have to maintain command order by sending commands over only a single link. But in practice this case is really rare: 99.(9)% of OSes and applications allow free command reordering, and it is enabled by default.

+ +

On the other hand, strictly preserving command order as MC/S does has a downside as well. It can lead to a so-called "command ordering bottleneck", where newer commands have to wait until one or more older commands get executed, although it would be better for performance to reorder them. As a result, MPIO sometimes has better performance than MC/S, especially in setups where the maximum IOPS number is important. See, for instance, here.

+ +
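The ordering bottleneck can be illustrated with a toy seek-distance model: represent each queued command by the disk position it touches and count head movement. A target that must preserve submission order (as MC/S guarantees) pays for every back-and-forth, while a target free to reorder (as MPIO permits) can serve the same commands in elevator order. The positions and the seek-distance cost model are made-up illustrative assumptions.

```python
positions = [10, 90, 20, 80, 30, 70]  # regions touched by six queued commands

def total_seek(order):
    # Total head movement when commands are served in the given order.
    head, dist = 0, 0
    for p in order:
        dist += abs(p - head)
        head = p
    return dist

in_order = total_seek(positions)           # strict submission order (MC/S-like)
reordered = total_seek(sorted(positions))  # free reordering (MPIO-like target)
```

In this contrived queue, strict ordering costs more than three times the head movement of the reordered schedule, which is the kind of gap that shows up as lost IOPS.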

When MC/S is better than MPIO

+ +

For the sake of completeness, we should mention that there are marginal cases where MPIO can't be used or will not provide any benefit, but MC/S can be successful:

+ +
    +
  1. When strict commands order is required.
  2. + +
  3. When aborted commands can't be retried.
  4. + +
+ +

For disks, both of these are always false. However, for some tape drives and backup applications one or both can be true. But in practice:

+ +
    + +
• There are no known tape drives or backup applications which can use multiple outstanding commands at a time; they all support and use only a single outstanding command at a time. MC/S can't increase performance for them; only bonding can. So in this case there is no difference between MC/S and MPIO.
  • + +
• The inability to retry commands is rather a limitation of legacy tape drives, which support only implicit-address commands, not of MPIO. Modern tape drives and backup applications can use explicit-address commands, which can be aborted and then retried, hence they are compatible with MPIO.
  • + +
+ +

Conclusion

+ +

Thus:

+ +
    +
1. The cost to develop MC/S is high, but its benefits are marginal and can be fully eliminated by future MPIO improvements.
  2. + +
3. MPIO allows utilizing the existing infrastructure for all transports, not only iSCSI.
  4. + +
  5. All transports can benefit from improvements in MPIO.
  6. + +
7. With MPIO there is no need to create multiple layers with very similar functionality.
  8. + +
9. MPIO doesn't have the command ordering bottleneck which MC/S has.
  10. + +
+ +

Simply put, MC/S is a workaround, done at the wrong level, for some deficiencies of the existing SCSI standards used for MPIO, namely the lack of a way to group several I_T Nexuses with the ability to reassign commands between them and preserve command order among them. If in the future those features are added to the SCSI standards, MC/S will not be needed at all, hence all investment in it will be voided. No surprise, then, that no Open Source OS supports it or is going to implement it. Moreover, when back in 2005 there was an attempt to add an MC/S-capable iSCSI initiator to Linux, it was rejected. See here and here for more details.

+ +
+
+
+ + + + + + + + + + diff --git a/www/scst_admin.html b/www/scst_admin.html new file mode 100644 index 000000000..8f644bd74 --- /dev/null +++ b/www/scst_admin.html @@ -0,0 +1,95 @@ + + + + + + + + +SCST Admin Utility + + + + +
+ + + + + +
+ + +
+

SCST administration utility

+ +

The SCST administration utility scstadmin was developed by Mark Buechler.

+ +

With it you can configure every aspect of SCST, either manually or automatically, using either a plain text config file or a MySQL database.

+ +

An especially useful feature of scstadmin is its ability to figure out changes in the scst.conf file and apply them on the fly to the currently running system. In other words, you can have the SCST subsystem running with the configuration from scst.conf, then edit this file, e.g. add new devices, and scstadmin will figure out that you added those devices and add them to SCST.

+ + +
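A typical edit-and-apply session along those lines might look like the sketch below. The option names are an assumption modeled on scstadmin 2.x and the file path is invented, so verify against `scstadmin -help` before use.

```shell
# Check the edited config file for errors first (option names are
# hypothetical, modeled on scstadmin 2.x -- verify with `scstadmin -help`)
scstadmin -check_config /etc/scst.conf

# Apply the differences between scst.conf and the running SCST
# subsystem on the fly
scstadmin -config /etc/scst.conf

# Save the currently running configuration back to the file
scstadmin -write_config /etc/scst.conf
```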
 
+
+
+
+ + + + + + + + + diff --git a/www/scstvslio.html b/www/scstvslio.html new file mode 100644 index 000000000..4cbc09489 --- /dev/null +++ b/www/scstvslio.html @@ -0,0 +1,119 @@ + + + + + + + +SCST vs LIO/TCM + + + + +
+ + + + + +
+ + +
+

SCST vs LIO/TCM

+

LIO, recently renamed to TCM, is another implementation of a SCSI target framework for Linux, independent of SCST. It started as the PyX iSCSI target and was then adapted to other transports, but it is still in many ways iSCSI-oriented. You can find an example of people not being happy with it here.

+ +

The LIO maintainer, Nicholas Bellinger, is very good at building personal relationships and promoting LIO, although often using misleading half-truths and simply deceitful statements about LIO's current state, capabilities and future directions, as well as about its competitor, SCST. For instance, he set up a LIO targets comparison page with obviously wrong statements about SCST, such as that it isn't fully zero-copy or isn't a generic target engine (while LIO, of course, is presented as fully zero-copy and a fully generic target engine). Any attempts to correct it were simply ignored.

+ +

With those tricks Nicholas Bellinger was able to attract key Linux kernel developers, and they suddenly reversed their opinion about the Linux SCSI target subsystem. They had previously asserted that an in-kernel SCSI target was the wrong direction, that a SCSI target must be in user space, and that STGT was therefore what everybody needed. Now their opinion is that the SCSI target driver should be in kernel space and the only target good enough for them is LIO, no matter that:

+ +
    +
  1. SCST is a lot more mature and advanced
  2. + +
  3. SCST from the beginning is a generic SCSI target
  4. + +
  5. SCST has a lot more features
  6. + +
  7. SCST has better performance
  8. + +
  9. SCST has a lot more users
  10. + +
  11. SCST has much bigger community
  12. + +
+ +

So, contrary to the Linux kernel community's basic principle that the best code should win, the worse code was chosen.

+ +

You can find more background on the choice of LIO as the mainline kernel SCSI target subsystem by reading this thread, as well as by searching for target-related topics in the Linux kernel and Linux SCSI mailing lists.

+ +

Particularly notable is that James Bottomley said from the beginning that SCST couldn't be merged into the mainline kernel because it didn't offer a drop-in replacement for STGT, which would be needed to avoid having 2 target infrastructures in the kernel at the same time. But since LIO can't offer user space backend drivers and doesn't have an analog of the ibmvstgt driver, for LIO the drop-in replacement wasn't a requirement, so 2.6.38+ kernels happily contain both STGT and LIO.

+
+
+
+ + + + + + + + + + diff --git a/www/scstvsstgt.html b/www/scstvsstgt.html new file mode 100644 index 000000000..24ee0780e --- /dev/null +++ b/www/scstvsstgt.html @@ -0,0 +1,99 @@ + + + + + + + + +SCST vs STGT + + + + +
+ + + + + +
+ + +
+

SCST vs STGT

+

STGT is an alternative implementation of a SCSI target framework for Linux, independent of SCST. It has a different architecture, in which the SCSI target state machine is placed in user space, while in SCST all the processing is done in the kernel. STGT's architecture was acknowledged by the Linux SCSI subsystem maintainers as the "right" one, so the kernel part of STGT quickly found its way into the kernel.

+ +

But such an architecture has several inherent problems, among them performance and complexity. See the description of the set of patches submitted for the first iteration of the in-kernel inclusion review, and the comments on the Linux kernel mailing list.

+ +

See also the following important discussions:

+ + +

Time has proved that STGT is too weak to satisfy modern storage requirements. It is now obsolete and will soon be removed from the mainline kernel.

+ +
+
+
+ + + + + + + + + + diff --git a/www/sgv_big_order_alloc-sfw5-rc3.diff b/www/sgv_big_order_alloc-sfw5-rc3.diff new file mode 100644 index 000000000..132d2dde4 --- /dev/null +++ b/www/sgv_big_order_alloc-sfw5-rc3.diff @@ -0,0 +1,596 @@ +Index: scst/include/scst_sgv.h +=================================================================== +--- scst/include/scst_sgv.h (revision 3134) ++++ scst/include/scst_sgv.h (working copy) +@@ -82,12 +82,14 @@ void sgv_pool_put(struct sgv_pool *pool) + void sgv_pool_flush(struct sgv_pool *pool); + + void sgv_pool_set_allocator(struct sgv_pool *pool, +- struct page *(*alloc_pages_fn)(struct scatterlist *, gfp_t, void *), +- void (*free_pages_fn)(struct scatterlist *, int, void *)); ++ struct page *(*alloc_pages_fn)(struct scatterlist *, ++ gfp_t, int, void *), ++ void (*free_pages_fn)(struct scatterlist *, int, int, void *)); + + struct scatterlist *sgv_pool_alloc(struct sgv_pool *pool, unsigned int size, + gfp_t gfp_mask, int flags, int *count, +- struct sgv_pool_obj **sgv, struct scst_mem_lim *mem_lim, void *priv); ++ struct sgv_pool_obj **sgv, struct scst_mem_lim *mem_lim, void *priv, ++ int max_sg_count); + void sgv_pool_free(struct sgv_pool_obj *sgv, struct scst_mem_lim *mem_lim); + + void *sgv_get_priv(struct sgv_pool_obj *sgv); +Index: scst/src/scst_mem.h +=================================================================== +--- scst/src/scst_mem.h (revision 3134) ++++ scst/src/scst_mem.h (working copy) +@@ -37,6 +37,8 @@ struct sgv_pool_obj { + int cache_num; + int pages; + ++ int alloc_order; ++ + /* jiffies, protected by sgv_pool_lock */ + unsigned long time_stamp; + +@@ -66,9 +68,9 @@ struct sgv_pool_cache_acc { + */ + struct sgv_pool_alloc_fns { + struct page *(*alloc_pages_fn)(struct scatterlist *sg, gfp_t gfp_mask, +- void *priv); ++ int alloc_order, void *priv); + void (*free_pages_fn)(struct scatterlist *sg, int sg_count, +- void *priv); ++ int alloc_order, void *priv); + }; + + /* +Index: scst/src/scst_lib.c 
+=================================================================== +--- scst/src/scst_lib.c (revision 3134) ++++ scst/src/scst_lib.c (working copy) +@@ -4454,7 +4454,6 @@ int scst_alloc_space(struct scst_cmd *cm + int atomic = scst_cmd_atomic(cmd); + int flags; + struct scst_tgt_dev *tgt_dev = cmd->tgt_dev; +- static int ll; + + TRACE_ENTRY(); + +@@ -4465,40 +4464,23 @@ int scst_alloc_space(struct scst_cmd *cm + flags |= SGV_POOL_ALLOC_NO_CACHED; + + cmd->sg = sgv_pool_alloc(tgt_dev->pool, cmd->bufflen, gfp_mask, flags, +- &cmd->sg_cnt, &cmd->sgv, &cmd->dev->dev_mem_lim, NULL); ++ &cmd->sg_cnt, &cmd->sgv, &cmd->dev->dev_mem_lim, NULL, ++ tgt_dev->max_sg_cnt); + if (cmd->sg == NULL) + goto out; + +- if (unlikely(cmd->sg_cnt > tgt_dev->max_sg_cnt)) { +- if ((ll < 10) || TRACING_MINOR()) { +- PRINT_INFO("Unable to complete command due to " +- "SG IO count limitation (requested %d, " +- "available %d, tgt lim %d)", cmd->sg_cnt, +- tgt_dev->max_sg_cnt, cmd->tgt->sg_tablesize); +- ll++; +- } +- goto out_sg_free; +- } ++ EXTRACHECKS_BUG_ON(cmd->sg_cnt > tgt_dev->max_sg_cnt); + + if (cmd->data_direction != SCST_DATA_BIDI) + goto success; + + cmd->out_sg = sgv_pool_alloc(tgt_dev->pool, cmd->out_bufflen, gfp_mask, + flags, &cmd->out_sg_cnt, &cmd->out_sgv, +- &cmd->dev->dev_mem_lim, NULL); ++ &cmd->dev->dev_mem_lim, NULL, tgt_dev->max_sg_cnt); + if (cmd->out_sg == NULL) + goto out_sg_free; + +- if (unlikely(cmd->out_sg_cnt > tgt_dev->max_sg_cnt)) { +- if ((ll < 10) || TRACING_MINOR()) { +- PRINT_INFO("Unable to complete command due to " +- "SG IO count limitation (OUT buffer, requested " +- "%d, available %d, tgt lim %d)", cmd->out_sg_cnt, +- tgt_dev->max_sg_cnt, cmd->tgt->sg_tablesize); +- ll++; +- } +- goto out_out_sg_free; +- } ++ EXTRACHECKS_BUG_ON(cmd->out_sg_cnt > tgt_dev->max_sg_cnt); + + success: + res = 0; +@@ -4507,12 +4489,6 @@ out: + TRACE_EXIT(); + return res; + +-out_out_sg_free: +- sgv_pool_free(cmd->out_sgv, &cmd->dev->dev_mem_lim); +- cmd->out_sgv = NULL; 
+- cmd->out_sg = NULL; +- cmd->out_sg_cnt = 0; +- + out_sg_free: + sgv_pool_free(cmd->sgv, &cmd->dev->dev_mem_lim); + cmd->sgv = NULL; +Index: scst/src/scst_mem.c +=================================================================== +--- scst/src/scst_mem.c (revision 3134) ++++ scst/src/scst_mem.c (working copy) +@@ -110,8 +110,8 @@ static void sgv_dtor_and_free(struct sgv + TRACE_MEM("Destroying sgv obj %p", obj); + + if (obj->sg_count != 0) { +- pool->alloc_fns.free_pages_fn(obj->sg_entries, +- obj->sg_count, obj->allocator_priv); ++ pool->alloc_fns.free_pages_fn(obj->sg_entries, obj->sg_count, ++ obj->alloc_order, obj->allocator_priv); + } + if (obj->sg_entries != obj->sg_entries_data) { + if (obj->trans_tbl != +@@ -522,11 +522,13 @@ out: + } + + static void sgv_free_sys_sg_entries(struct scatterlist *sg, int sg_count, +- void *priv) ++ int alloc_order, void *priv) + { + int i; ++ const int num_pages = 1 << alloc_order; + +- TRACE_MEM("sg=%p, sg_count=%d", sg, sg_count); ++ TRACE_MEM("sg=%p, sg_count=%d, alloc_order=%d", ++ sg, sg_count, alloc_order); + + for (i = 0; i < sg_count; i++) { + struct page *p = sg_page(&sg[i]); +@@ -538,36 +540,23 @@ static void sgv_free_sys_sg_entries(stru + (unsigned long)p, len, pages); + + while (pages > 0) { +- int order = 0; +- +-/* +- * __free_pages() doesn't like freeing pages with not that order with +- * which they were allocated, so disable this small optimization. 
+- */ +-#if 0 +- if (len > 0) { +- while (((1 << order) << PAGE_SHIFT) < len) +- order++; +- len = 0; +- } +-#endif + TRACE_MEM("free_pages(): order %d, page %lx", +- order, (unsigned long)p); ++ alloc_order, (unsigned long)p); + +- __free_pages(p, order); ++ __free_pages(p, alloc_order); + +- pages -= 1 << order; +- p += 1 << order; ++ pages -= num_pages; ++ p += num_pages; + } + } + } + +-static struct page *sgv_alloc_sys_pages(struct scatterlist *sg, +- gfp_t gfp_mask, void *priv) ++static struct page *sgv_alloc_sys_pages(struct scatterlist *sg, gfp_t gfp_mask, ++ int alloc_order, void *priv) + { +- struct page *page = alloc_pages(gfp_mask, 0); ++ struct page *page = alloc_pages(gfp_mask, alloc_order); + +- sg_set_page(sg, page, PAGE_SIZE, 0); ++ sg_set_page(sg, page, PAGE_SIZE << alloc_order, 0); + TRACE_MEM("page=%p, sg=%p, priv=%p", page, sg, priv); + if (page == NULL) { + TRACE(TRACE_OUT_OF_MEM, "%s", "Allocation of " +@@ -579,7 +568,7 @@ static struct page *sgv_alloc_sys_pages( + static int sgv_alloc_sg_entries(struct scatterlist *sg, int pages, + gfp_t gfp_mask, enum sgv_clustering_types clustering_type, + struct trans_tbl_ent *trans_tbl, +- const struct sgv_pool_alloc_fns *alloc_fns, void *priv) ++ const struct sgv_pool_alloc_fns *alloc_fns, int alloc_order, void *priv) + { + int sg_count = 0; + int pg, i, j; +@@ -594,7 +583,7 @@ static int sgv_alloc_sg_entries(struct s + gfp_mask |= __GFP_ZERO; + #endif + +- for (pg = 0; pg < pages; pg++) { ++ for (pg = 0; pg < pages; pg += 1 << alloc_order) { + void *rc; + #ifdef CONFIG_SCST_DEBUG_OOM + if (((gfp_mask & __GFP_NOFAIL) != __GFP_NOFAIL) && +@@ -603,7 +592,7 @@ static int sgv_alloc_sg_entries(struct s + else + #endif + rc = alloc_fns->alloc_pages_fn(&sg[sg_count], gfp_mask, +- priv); ++ alloc_order, priv); + if (rc == NULL) + goto out_no_mem; + +@@ -623,8 +612,8 @@ static int sgv_alloc_sg_entries(struct s + if (merged == -1) + sg_count++; + +- TRACE_MEM("pg=%d, merged=%d, sg_count=%d", pg, merged, +- 
sg_count); ++ TRACE_MEM("pg=%d, merged=%d, sg_count=%d", ++ pg, merged, sg_count); + } + + if ((clustering_type != sgv_no_clustering) && (trans_tbl != NULL)) { +@@ -645,7 +634,7 @@ out: + return sg_count; + + out_no_mem: +- alloc_fns->free_pages_fn(sg, sg_count, priv); ++ alloc_fns->free_pages_fn(sg, sg_count, alloc_order, priv); + sg_count = 0; + goto out; + } +@@ -704,32 +693,16 @@ out_free: + goto out; + } + +-static struct sgv_pool_obj *sgv_get_obj(struct sgv_pool *pool, int cache_num, +- int pages, gfp_t gfp_mask, bool get_new) ++static struct sgv_pool_obj *sgv_create_obj(struct sgv_pool *pool, ++ int cache_num, ++ int pages, gfp_t gfp_mask, ++ int locked) + { + struct sgv_pool_obj *obj; + +- spin_lock_bh(&pool->sgv_pool_lock); +- +- if (unlikely(get_new)) { +- /* Used only for buffers preallocation */ +- goto get_new; +- } +- +- if (likely(!list_empty(&pool->recycling_lists[cache_num]))) { +- obj = list_entry(pool->recycling_lists[cache_num].next, +- struct sgv_pool_obj, recycling_list_entry); +- +- list_del(&obj->sorted_recycling_list_entry); +- list_del(&obj->recycling_list_entry); +- +- pool->inactive_cached_pages -= pages; +- +- spin_unlock_bh(&pool->sgv_pool_lock); +- goto out; +- } ++ if (!locked) ++ spin_lock_bh(&pool->sgv_pool_lock); + +-get_new: + if (pool->cached_entries == 0) { + TRACE_MEM("Adding pool %p to the active list", pool); + spin_lock_bh(&sgv_pools_lock); +@@ -759,6 +732,57 @@ get_new: + spin_unlock_bh(&pool->sgv_pool_lock); + } + ++ return obj; ++} ++ ++/* FZ Notes: cache_num == order, and we should have pages = 1 << cache_num. */ ++static struct sgv_pool_obj *sgv_get_obj(struct sgv_pool *pool, int cache_num, ++ int pages, gfp_t gfp_mask, ++ int max_sg_count, bool get_new) ++{ ++ struct sgv_pool_obj *obj; ++ ++ spin_lock_bh(&pool->sgv_pool_lock); ++ ++ if (unlikely(get_new)) { ++ /* Used only for buffers preallocation */ ++ /* TODO: caller of that should now call ++ * sgv_create_obj, and this will go away. 
*/ ++ goto get_new; ++ } ++ ++ if (likely(!list_empty(&pool->recycling_lists[cache_num]))) { ++ list_for_each_entry(obj, &pool->recycling_lists[cache_num], ++ recycling_list_entry) { ++ ++ TRACE_MEM("obj %p, sg_count %d (max %d)", obj, ++ obj->sg_count, max_sg_count); ++ ++ if (unlikely(obj->sg_count > max_sg_count)) ++ continue; ++ ++ obj = list_entry(pool->recycling_lists[cache_num].next, ++ struct sgv_pool_obj, ++ recycling_list_entry); ++ ++ list_del(&obj->sorted_recycling_list_entry); ++ list_del(&obj->recycling_list_entry); ++ ++ pool->inactive_cached_pages -= pages; ++ ++ spin_unlock_bh(&pool->sgv_pool_lock); ++ ++ /* FZ: note entirely sure of that check. Need ++ * to investigate. */ ++ /*EXTRACHECKS_BUG_ON(obj->alloc_order <= cache_num);*/ ++ ++ goto out; ++ } ++ } ++ ++get_new: ++ obj = sgv_create_obj(pool, cache_num, pages, gfp_mask, true); ++ + out: + return obj; + } +@@ -908,14 +932,17 @@ static void sgv_uncheck_allowed_mem(stru + */ + struct scatterlist *sgv_pool_alloc(struct sgv_pool *pool, unsigned int size, + gfp_t gfp_mask, int flags, int *count, +- struct sgv_pool_obj **sgv, struct scst_mem_lim *mem_lim, void *priv) ++ struct sgv_pool_obj **sgv, struct scst_mem_lim *mem_lim, void *priv, ++ int max_sg_count) + { + struct sgv_pool_obj *obj; + int cache_num, pages, cnt; + struct scatterlist *res = NULL; + int pages_to_alloc; ++ int alloc_order; + int no_cached = flags & SGV_POOL_ALLOC_NO_CACHED; + bool allowed_mem_checked = false, hiwmk_checked = false; ++ int tmp; + + TRACE_ENTRY(); + +@@ -958,7 +985,7 @@ struct scatterlist *sgv_pool_alloc(struc + allowed_mem_checked = true; + + obj = sgv_get_obj(pool, cache_num, pages_to_alloc, gfp_mask, +- flags & SGV_POOL_ALLOC_GET_NEW); ++ max_sg_count, flags & SGV_POOL_ALLOC_GET_NEW); + if (unlikely(obj == NULL)) { + TRACE(TRACE_OUT_OF_MEM, "Allocation of " + "sgv_pool_obj failed (size %d)", size); +@@ -967,7 +994,30 @@ struct scatterlist *sgv_pool_alloc(struc + + if (obj->sg_count != 0) { + TRACE_MEM("Cached 
obj %p", obj); +- atomic_inc(&pool->cache_acc[cache_num].hit_alloc); ++ ++ if (unlikely(max_sg_count < obj->sg_count)) { ++ TRACE_MEM("Too many SG entries %d (max %d)", ++ obj->sg_count, max_sg_count); ++ ++ sgv_put_obj(obj); ++ ++ obj = sgv_create_obj(pool, cache_num, ++ pages_to_alloc, gfp_mask, ++ false); ++ if (obj && ++ unlikely(max_sg_count < obj->sg_count)) { ++ sgv_put_obj(obj); ++ obj = NULL; ++ } ++ ++ if (obj == NULL) { ++ TRACE(TRACE_OUT_OF_MEM, "Allocation of " ++ "sgv_pool_obj failed (size %d)", ++ size); ++ goto out_fail; ++ } ++ } else ++ atomic_inc(&pool->cache_acc[cache_num].hit_alloc); + goto success; + } + +@@ -1045,16 +1095,44 @@ struct scatterlist *sgv_pool_alloc(struc + TRACE_MEM("Big or no_cached obj %p (size %d)", obj, sz); + } + +- obj->sg_count = sgv_alloc_sg_entries(obj->sg_entries, +- pages_to_alloc, gfp_mask, pool->clustering_type, +- obj->trans_tbl, &pool->alloc_fns, priv); +- if (unlikely(obj->sg_count <= 0)) { +- obj->sg_count = 0; +- if ((flags & SGV_POOL_RETURN_OBJ_ON_ALLOC_FAIL) && +- (cache_num >= 0)) +- goto out_return1; +- else +- goto out_fail_free_sg_entries; ++ /* Allocate the scatter gather entries. Since the memory we ++ * request may fit in too many entries, we try to start with ++ * an order big enough. That will save some useless ++ * allocations. 
*/ ++ alloc_order = 0; ++ tmp = pages_to_alloc; ++ while (tmp > max_sg_count) { ++ tmp >>= 1; ++ alloc_order++; ++ } ++ ++ while (1) { ++ obj->sg_count = sgv_alloc_sg_entries(obj->sg_entries, ++ pages_to_alloc, ++ gfp_mask, ++ pool->clustering_type, ++ obj->trans_tbl, ++ &pool->alloc_fns, ++ alloc_order, priv); ++ if (unlikely(obj->sg_count <= 0)) { ++ obj->sg_count = 0; ++ if ((flags & SGV_POOL_RETURN_OBJ_ON_ALLOC_FAIL) && ++ (cache_num >= 0)) ++ goto out_return1; ++ else ++ goto out_fail_free_sg_entries; ++ } ++ ++ obj->alloc_order = alloc_order; ++ ++ if (likely(obj->sg_count <= max_sg_count)) ++ break; ++ ++ obj->owner_pool->alloc_fns.free_pages_fn(obj->sg_entries, ++ obj->sg_count, ++ obj->alloc_order, ++ obj->allocator_priv); ++ alloc_order++; + } + + if (cache_num >= 0) { +@@ -1230,7 +1308,7 @@ void sgv_pool_free(struct sgv_pool_obj * + sgv_put_obj(obj); + } else { + obj->owner_pool->alloc_fns.free_pages_fn(obj->sg_entries, +- obj->sg_count, obj->allocator_priv); ++ obj->sg_count, obj->alloc_order, obj->allocator_priv); + kfree(obj); + sgv_hiwmk_uncheck(pages); + } +@@ -1289,7 +1367,7 @@ struct scatterlist *scst_alloc(int size, + * So, let's always don't use clustering. + */ + cnt = sgv_alloc_sg_entries(res, pages, gfp_mask, sgv_no_clustering, +- NULL, &sys_alloc_fns, NULL); ++ NULL, &sys_alloc_fns, 0, NULL); + if (cnt <= 0) + goto out_free; + +@@ -1326,7 +1404,7 @@ void scst_free(struct scatterlist *sg, i + + sgv_hiwmk_uncheck(count); + +- sgv_free_sys_sg_entries(sg, count, NULL); ++ sgv_free_sys_sg_entries(sg, count, 0, NULL); + kfree(sg); + return; + } +@@ -1580,8 +1658,9 @@ static void sgv_pool_destroy(struct sgv_ + * See the SGV pool documentation for more details. 
+ */ + void sgv_pool_set_allocator(struct sgv_pool *pool, +- struct page *(*alloc_pages_fn)(struct scatterlist *, gfp_t, void *), +- void (*free_pages_fn)(struct scatterlist *, int, void *)) ++ struct page *(*alloc_pages_fn)(struct scatterlist *, gfp_t, ++ int, void *), ++ void (*free_pages_fn)(struct scatterlist *, int, int, void *)) + { + pool->alloc_fns.alloc_pages_fn = alloc_pages_fn; + pool->alloc_fns.free_pages_fn = free_pages_fn; +Index: scst/src/dev_handlers/scst_user.c +=================================================================== +--- scst/src/dev_handlers/scst_user.c (revision 3134) ++++ scst/src/dev_handlers/scst_user.c (working copy) +@@ -163,9 +163,9 @@ static int dev_user_disk_done(struct scs + static int dev_user_tape_done(struct scst_cmd *cmd); + + static struct page *dev_user_alloc_pages(struct scatterlist *sg, +- gfp_t gfp_mask, void *priv); ++ gfp_t gfp_mask, int alloc_order, void *priv); + static void dev_user_free_sg_entries(struct scatterlist *sg, int sg_count, +- void *priv); ++ int alloc_order, void *priv); + + static void dev_user_add_to_ready(struct scst_user_cmd *ucmd); + +@@ -392,7 +392,7 @@ static void dev_user_free_ucmd(struct sc + } + + static struct page *dev_user_alloc_pages(struct scatterlist *sg, +- gfp_t gfp_mask, void *priv) ++ gfp_t gfp_mask, int alloc_order, void *priv) + { + struct scst_user_cmd *ucmd = (struct scst_user_cmd *)priv; + int offset = 0; +@@ -401,8 +401,11 @@ static struct page *dev_user_alloc_pages + + /* *sg supposed to be zeroed */ + +- TRACE_MEM("ucmd %p, ubuff %lx, ucmd->cur_data_page %d", ucmd, +- ucmd->ubuff, ucmd->cur_data_page); ++ TRACE_MEM("ucmd %p, ubuff %lx, ucmd->cur_data_page %d, alloc_order %d", ++ ucmd, ucmd->ubuff, ucmd->cur_data_page, alloc_order); ++ ++ if (unlikely(alloc_order != 0)) ++ goto out; + + if (ucmd->cur_data_page == 0) { + TRACE_MEM("ucmd->first_page_offset %d", +@@ -495,7 +498,7 @@ static void __dev_user_free_sg_entries(s + } + + static void dev_user_free_sg_entries(struct 
scatterlist *sg, int sg_count, +- void *priv) ++ int alloc_order, void *priv) + { + struct scst_user_cmd *ucmd = (struct scst_user_cmd *)priv; + +@@ -582,7 +585,8 @@ static int dev_user_alloc_sg(struct scst + ucmd->buff_cached = cached_buff; + + cmd->sg = sgv_pool_alloc(pool, bufflen, gfp_mask, flags, &cmd->sg_cnt, +- &ucmd->sgv, &dev->udev_mem_lim, ucmd); ++ &ucmd->sgv, &dev->udev_mem_lim, ucmd, ++ cmd->tgt_dev->max_sg_cnt); + if (cmd->sg != NULL) { + struct scst_user_cmd *buf_ucmd = + (struct scst_user_cmd *)sgv_get_priv(ucmd->sgv); +@@ -614,20 +618,7 @@ static int dev_user_alloc_sg(struct scst + cmd, cmd->out_sg, cmd->out_sg_cnt, cmd->sg_cnt); + } + +- if (unlikely(cmd->sg_cnt > cmd->tgt_dev->max_sg_cnt)) { +- static int ll; +- if ((ll < 10) || TRACING_MINOR()) { +- PRINT_INFO("Unable to complete command due to " +- "SG IO count limitation (requested %d, " +- "available %d, tgt lim %d)", +- cmd->sg_cnt, cmd->tgt_dev->max_sg_cnt, +- cmd->tgt->sg_tablesize); +- ll++; +- } +- cmd->sg = NULL; +- /* sgv will be freed in dev_user_free_sgv() */ +- res = -1; +- } ++ EXTRACHECKS_BUG_ON(cmd->sg_cnt > cmd->tgt_dev->max_sg_cnt); + } else { + TRACE_MEM("Buf not alloced (ucmd %p, h %d, buff_cached, %d, " + "sg_cnt %d, ubuff %lx, sgv %p", ucmd, ucmd->h, +@@ -3137,6 +3128,14 @@ static int dev_user_prealloc_buffer(stru + + TRACE_ENTRY(); + ++ { ++ /* The SGV patch cannot support that feature because ++ * we don't know either the target or the number of SG ++ * buffer of the target. 
*/ ++ res = -EINVAL; ++ goto out; ++ } ++ + mutex_lock(&dev_priv_mutex); + dev = (struct scst_user_dev *)file->private_data; + res = dev_user_check_reg(dev); +@@ -3188,7 +3187,7 @@ static int dev_user_prealloc_buffer(stru + pool = dev->pool; + + sg = sgv_pool_alloc(pool, bufflen, GFP_KERNEL, SGV_POOL_ALLOC_GET_NEW, +- &sg_cnt, &ucmd->sgv, &dev->udev_mem_lim, ucmd); ++ &sg_cnt, &ucmd->sgv, &dev->udev_mem_lim, ucmd, 0); + if (sg != NULL) { + struct scst_user_cmd *buf_ucmd = + (struct scst_user_cmd *)sgv_get_priv(ucmd->sgv); diff --git a/www/sgv_big_order_alloc.diff b/www/sgv_big_order_alloc.diff new file mode 100644 index 000000000..46d24c6a4 --- /dev/null +++ b/www/sgv_big_order_alloc.diff @@ -0,0 +1,445 @@ +In the pass-through mode (i.e. using the pass-through device handlers +scst_disk, scst_tape, etc) SCSI commands, coming from remote initiators, +are passed to local SCSI hardware on target as is, without any +modifications. As any other hardware, the local SCSI hardware can not +handle commands with amount of data and/or segments count in +scatter-gather array bigger some values. If you have this issue you will +see symptoms like small transfers work well, but large ones stall and +messages like: "Unable to complete command due to SG IO count +limitation" are printed in the kernel logs. + +This is proposed patch to solve that. It allows SGV cache do allocation +of pages with order > 0, i.e. more than 1 page per SG entry. + +Compile tested only. 
+ +Index: scst/include/scst.h +=================================================================== +--- scst/include/scst.h (revision 558) ++++ scst/include/scst.h (working copy) +@@ -2649,12 +2649,13 @@ struct sgv_pool *sgv_pool_create(const c + void sgv_pool_destroy(struct sgv_pool *pool); + + void sgv_pool_set_allocator(struct sgv_pool *pool, +- struct page *(*alloc_pages_fn)(struct scatterlist *, gfp_t, void *), +- void (*free_pages_fn)(struct scatterlist *, int, void *)); ++ struct page *(*alloc_pages_fn)(struct scatterlist *, gfp_t, void *, int), ++ void (*free_pages_fn)(struct scatterlist *, int, void *, int)); + + struct scatterlist *sgv_pool_alloc(struct sgv_pool *pool, unsigned int size, + gfp_t gfp_mask, int flags, int *count, +- struct sgv_pool_obj **sgv, struct scst_mem_lim *mem_lim, void *priv); ++ struct sgv_pool_obj **sgv, struct scst_mem_lim *mem_lim, void *priv, ++ int max_sg_count); + void sgv_pool_free(struct sgv_pool_obj *sgv, struct scst_mem_lim *mem_lim); + + void *sgv_get_priv(struct sgv_pool_obj *sgv); +Index: scst/src/scst_mem.h +=================================================================== +--- scst/src/scst_mem.h (revision 558) ++++ scst/src/scst_mem.h (working copy) +@@ -36,6 +36,8 @@ struct sgv_pool_obj { + /* if <0 - pages, >0 - order */ + int order_or_pages; + ++ int alloc_order; ++ + struct { + /* jiffies, protected by pool_mgr_lock */ + unsigned long time_stamp; +@@ -67,9 +69,9 @@ struct sgv_pool_cache_acc { + + struct sgv_pool_alloc_fns { + struct page *(*alloc_pages_fn)(struct scatterlist *sg, gfp_t gfp_mask, +- void *priv); ++ void *priv, int alloc_order); + void (*free_pages_fn)(struct scatterlist *sg, int sg_count, +- void *priv); ++ void *priv, int alloc_order); + }; + + struct sgv_pool { +Index: scst/src/scst_lib.c +=================================================================== +--- scst/src/scst_lib.c (revision 558) ++++ scst/src/scst_lib.c (working copy) +@@ -1663,34 +1663,18 @@ int scst_alloc_space(struct 
scst_cmd *cm + flags |= SCST_POOL_ALLOC_NO_CACHED; + + cmd->sg = sgv_pool_alloc(tgt_dev->pool, cmd->bufflen, gfp_mask, flags, +- &cmd->sg_cnt, &cmd->sgv, &cmd->dev->dev_mem_lim, NULL); ++ &cmd->sg_cnt, &cmd->sgv, &cmd->dev->dev_mem_lim, NULL, ++ tgt_dev->max_sg_cnt); + if (cmd->sg == NULL) + goto out; + +- if (unlikely(cmd->sg_cnt > tgt_dev->max_sg_cnt)) { +- static int ll; +- if (ll < 10) { +- PRINT_INFO("Unable to complete command due to " +- "SG IO count limitation (requested %d, " +- "available %d, tgt lim %d)", cmd->sg_cnt, +- tgt_dev->max_sg_cnt, cmd->tgt->sg_tablesize); +- ll++; +- } +- goto out_sg_free; +- } ++ EXTRACHECKS_BUG_ON(cmd->sg_cnt > tgt_dev->max_sg_cnt); + + res = 0; + + out: + TRACE_EXIT(); + return res; +- +-out_sg_free: +- sgv_pool_free(cmd->sgv, &cmd->dev->dev_mem_lim); +- cmd->sgv = NULL; +- cmd->sg = NULL; +- cmd->sg_cnt = 0; +- goto out; + } + + void scst_release_space(struct scst_cmd *cmd) +Index: scst/src/scst_mem.c +=================================================================== +--- scst/src/scst_mem.c (revision 558) ++++ scst/src/scst_mem.c (working copy) +@@ -118,7 +118,7 @@ out_head: + } + + static void scst_free_sys_sg_entries(struct scatterlist *sg, int sg_count, +- void *priv) ++ void *priv, int alloc_order) + { + int i; + +@@ -134,7 +134,7 @@ static void scst_free_sys_sg_entries(str + (unsigned long)p, len, pages); + + while (pages > 0) { +- int order = 0; ++ int order = alloc_order; + + /* + * __free_pages() doesn't like freeing pages with not that order with +@@ -159,9 +159,9 @@ static void scst_free_sys_sg_entries(str + } + + static struct page *scst_alloc_sys_pages(struct scatterlist *sg, +- gfp_t gfp_mask, void *priv) ++ gfp_t gfp_mask, void *priv, int alloc_order) + { +- struct page *page = alloc_pages(gfp_mask, 0); ++ struct page *page = alloc_pages(gfp_mask, alloc_order); + + sg_set_page(sg, page, PAGE_SIZE, 0); + TRACE_MEM("page=%p, sg=%p, priv=%p", page, sg, priv); +@@ -174,10 +174,10 @@ static struct page 
*scst_alloc_sys_pages + + static int scst_alloc_sg_entries(struct scatterlist *sg, int pages, + gfp_t gfp_mask, int clustered, struct trans_tbl_ent *trans_tbl, +- const struct sgv_pool_alloc_fns *alloc_fns, void *priv) ++ const struct sgv_pool_alloc_fns *alloc_fns, void *priv, int alloc_order) + { + int sg_count = 0; +- int pg, i, j; ++ int pg, i, j, pg_inc = 1 << alloc_order; + int merged = -1; + + TRACE_MEM("pages=%d, clustered=%d", pages, clustered); +@@ -189,7 +189,7 @@ static int scst_alloc_sg_entries(struct + gfp_mask |= __GFP_ZERO; + #endif + +- for (pg = 0; pg < pages; pg++) { ++ for (pg = 0; pg < pages; pg += pg_inc) { + void *rc; + #ifdef CONFIG_SCST_DEBUG_OOM + if (((gfp_mask & __GFP_NOFAIL) != __GFP_NOFAIL) && +@@ -198,7 +198,7 @@ static int scst_alloc_sg_entries(struct + else + #endif + rc = alloc_fns->alloc_pages_fn(&sg[sg_count], gfp_mask, +- priv); ++ priv, alloc_order); + if (rc == NULL) + goto out_no_mem; + if (clustered) { +@@ -229,7 +229,7 @@ out: + return sg_count; + + out_no_mem: +- alloc_fns->free_pages_fn(sg, sg_count, priv); ++ alloc_fns->free_pages_fn(sg, sg_count, priv, alloc_order); + sg_count = 0; + goto out; + } +@@ -292,7 +292,7 @@ static void sgv_dtor_and_free(struct sgv + { + if (obj->sg_count != 0) { + obj->owner_pool->alloc_fns.free_pages_fn(obj->sg_entries, +- obj->sg_count, obj->allocator_priv); ++ obj->sg_count, obj->allocator_priv, obj->alloc_order); + } + if (obj->sg_entries != obj->sg_entries_data) { + if (obj->trans_tbl != +@@ -308,6 +308,36 @@ static void sgv_dtor_and_free(struct sgv + return; + } + ++static struct sgv_pool_obj *sgv_pool_cached_create(struct sgv_pool *pool, ++ int order, gfp_t gfp_mask, bool locked) ++{ ++ struct sgv_pool_obj *obj; ++ int pages = 1 << order; ++ ++ if (!locked) ++ spin_lock_bh(&sgv_pools_mgr.mgr.pool_mgr_lock); ++ ++ pool->acc.cached_entries++; ++ pool->acc.cached_pages += pages; ++ ++ spin_unlock_bh(&sgv_pools_mgr.mgr.pool_mgr_lock); ++ ++ obj = kmem_cache_alloc(pool->caches[order], ++ 
gfp_mask & ~(__GFP_HIGHMEM|GFP_DMA)); ++ if (likely(obj)) { ++ memset(obj, 0, sizeof(*obj)); ++ obj->order_or_pages = order; ++ obj->owner_pool = pool; ++ } else { ++ spin_lock_bh(&sgv_pools_mgr.mgr.pool_mgr_lock); ++ pool->acc.cached_entries--; ++ pool->acc.cached_pages -= pages; ++ spin_unlock_bh(&sgv_pools_mgr.mgr.pool_mgr_lock); ++ } ++ ++ return obj; ++} ++ + static struct sgv_pool_obj *sgv_pool_cached_get(struct sgv_pool *pool, + int order, gfp_t gfp_mask) + { +@@ -332,23 +362,7 @@ static struct sgv_pool_obj *sgv_pool_cac + goto out; + } + +- pool->acc.cached_entries++; +- pool->acc.cached_pages += pages; +- +- spin_unlock_bh(&sgv_pools_mgr.mgr.pool_mgr_lock); +- +- obj = kmem_cache_alloc(pool->caches[order], +- gfp_mask & ~(__GFP_HIGHMEM|GFP_DMA)); +- if (likely(obj)) { +- memset(obj, 0, sizeof(*obj)); +- obj->order_or_pages = order; +- obj->owner_pool = pool; +- } else { +- spin_lock_bh(&sgv_pools_mgr.mgr.pool_mgr_lock); +- pool->acc.cached_entries--; +- pool->acc.cached_pages -= pages; +- spin_unlock_bh(&sgv_pools_mgr.mgr.pool_mgr_lock); +- } ++ obj = sgv_pool_cached_create(pool, order, gfp_mask, true); + + out: + return obj; +@@ -546,12 +560,13 @@ static void scst_uncheck_allowed_mem(str + + struct scatterlist *sgv_pool_alloc(struct sgv_pool *pool, unsigned int size, + gfp_t gfp_mask, int flags, int *count, +- struct sgv_pool_obj **sgv, struct scst_mem_lim *mem_lim, void *priv) ++ struct sgv_pool_obj **sgv, struct scst_mem_lim *mem_lim, void *priv, ++ int max_sg_count) + { + struct sgv_pool_obj *obj; + int order, pages, cnt; + struct scatterlist *res = NULL; +- int pages_to_alloc; ++ int pages_to_alloc, alloc_order; + struct kmem_cache *cache; + int no_cached = flags & SCST_POOL_ALLOC_NO_CACHED; + bool allowed_mem_checked = false, hiwmk_checked = false; +@@ -605,7 +620,23 @@ struct scatterlist *sgv_pool_alloc(struc + if (obj->sg_count != 0) { + TRACE_MEM("Cached sgv_obj %p", obj); + EXTRACHECKS_BUG_ON(obj->order_or_pages != order); +- 
atomic_inc(&pool->cache_acc[order].hit_alloc); ++ ++ if (unlikely(max_sg_count < obj->sg_count)) { ++ TRACE_MEM("Too many SG entries %d (max %d)", ++ obj->sg_count, max_sg_count); ++ ++ sgv_pool_cached_put(obj); ++ ++ obj = sgv_pool_cached_create(pool, order, ++ gfp_mask, false); ++ if (obj == NULL) { ++ TRACE(TRACE_OUT_OF_MEM, "Allocation of " ++ "sgv_pool_obj failed (size %d)", ++ size); ++ goto out_fail; ++ } ++ } else ++ atomic_inc(&pool->cache_acc[order].hit_alloc); + goto success; + } + +@@ -682,15 +713,27 @@ struct scatterlist *sgv_pool_alloc(struc + TRACE_MEM("Big or no_cached sgv_obj %p (size %d)", obj, sz); + } + +- obj->sg_count = scst_alloc_sg_entries(obj->sg_entries, +- pages_to_alloc, gfp_mask, pool->clustered, obj->trans_tbl, +- &pool->alloc_fns, priv); +- if (unlikely(obj->sg_count <= 0)) { +- obj->sg_count = 0; +- if ((flags & SCST_POOL_RETURN_OBJ_ON_ALLOC_FAIL) && cache) +- goto out_return1; +- else +- goto out_fail_free_sg_entries; ++ alloc_order = 0; ++ while (1) { ++ obj->sg_count = scst_alloc_sg_entries(obj->sg_entries, ++ pages_to_alloc, gfp_mask, pool->clustered, ++ obj->trans_tbl, &pool->alloc_fns, priv, alloc_order); ++ if (unlikely(obj->sg_count <= 0)) { ++ obj->sg_count = 0; ++ if ((flags & SCST_POOL_RETURN_OBJ_ON_ALLOC_FAIL) && cache) ++ goto out_return1; ++ else ++ goto out_fail_free_sg_entries; ++ } ++ obj->alloc_order = alloc_order; ++ ++ if (max_sg_count >= obj->sg_count) ++ break; ++ ++ obj->owner_pool->alloc_fns.free_pages_fn(obj->sg_entries, ++ obj->sg_count, obj->allocator_priv, ++ obj->alloc_order); ++ alloc_order++; + } + + if (cache) { +@@ -815,7 +858,7 @@ void sgv_pool_free(struct sgv_pool_obj * + sgv_pool_cached_put(sgv); + } else { + sgv->owner_pool->alloc_fns.free_pages_fn(sgv->sg_entries, +- sgv->sg_count, sgv->allocator_priv); ++ sgv->sg_count, sgv->allocator_priv, sgv->alloc_order); + pages = (sgv->sg_count != 0) ? 
-sgv->order_or_pages : 0; + kfree(sgv); + sgv_pool_hiwmk_uncheck(pages); +@@ -861,7 +904,7 @@ struct scatterlist *scst_alloc(int size, + * So, always don't use clustering. + */ + *count = scst_alloc_sg_entries(res, pages, gfp_mask, 0, NULL, +- &sys_alloc_fns, NULL); ++ &sys_alloc_fns, NULL, 0); + if (*count <= 0) + goto out_free; + +@@ -888,7 +931,7 @@ void scst_free(struct scatterlist *sg, i + + sgv_pool_hiwmk_uncheck(count); + +- scst_free_sys_sg_entries(sg, count, NULL); ++ scst_free_sys_sg_entries(sg, count, NULL, 0); + kfree(sg); + return; + } +@@ -1060,8 +1103,8 @@ void sgv_pool_deinit(struct sgv_pool *po + } + + void sgv_pool_set_allocator(struct sgv_pool *pool, +- struct page *(*alloc_pages_fn)(struct scatterlist *, gfp_t, void *), +- void (*free_pages_fn)(struct scatterlist *, int, void *)) ++ struct page *(*alloc_pages_fn)(struct scatterlist *, gfp_t, void *, int), ++ void (*free_pages_fn)(struct scatterlist *, int, void *, int)) + { + pool->alloc_fns.alloc_pages_fn = alloc_pages_fn; + pool->alloc_fns.free_pages_fn = free_pages_fn; +Index: scst/src/dev_handlers/scst_user.c +=================================================================== +--- scst/src/dev_handlers/scst_user.c (revision 559) ++++ scst/src/dev_handlers/scst_user.c (working copy) +@@ -168,9 +168,9 @@ static int dev_user_disk_done(struct scs + static int dev_user_tape_done(struct scst_cmd *cmd); + + static struct page *dev_user_alloc_pages(struct scatterlist *sg, +- gfp_t gfp_mask, void *priv); ++ gfp_t gfp_mask, void *priv, int alloc_order); + static void dev_user_free_sg_entries(struct scatterlist *sg, int sg_count, +- void *priv); ++ void *priv, int alloc_order); + + static void dev_user_add_to_ready(struct scst_user_cmd *ucmd); + +@@ -368,7 +368,7 @@ static void dev_user_free_ucmd(struct sc + } + + static struct page *dev_user_alloc_pages(struct scatterlist *sg, +- gfp_t gfp_mask, void *priv) ++ gfp_t gfp_mask, void *priv, int alloc_order) + { + struct scst_user_cmd *ucmd = (struct 
scst_user_cmd *)priv; + int offset = 0; +@@ -377,8 +377,11 @@ static struct page *dev_user_alloc_pages + + /* *sg supposed to be zeroed */ + +- TRACE_MEM("ucmd %p, ubuff %lx, ucmd->cur_data_page %d", ucmd, +- ucmd->ubuff, ucmd->cur_data_page); ++ TRACE_MEM("ucmd %p, ubuff %lx, ucmd->cur_data_page %d, alloc_order %d", ++ ucmd, ucmd->ubuff, ucmd->cur_data_page, alloc_order); ++ ++ if (unlikely(alloc_order != 0)) ++ goto out; + + if (ucmd->cur_data_page == 0) { + TRACE_MEM("ucmd->first_page_offset %d", +@@ -468,7 +471,7 @@ static void __dev_user_free_sg_entries(s + } + + static void dev_user_free_sg_entries(struct scatterlist *sg, int sg_count, +- void *priv) ++ void *priv, int alloc_order) + { + struct scst_user_cmd *ucmd = (struct scst_user_cmd *)priv; + +@@ -537,7 +540,8 @@ static int dev_user_alloc_sg(struct scst + ucmd->buff_cached = cached_buff; + + cmd->sg = sgv_pool_alloc(dev->pool, bufflen, gfp_mask, flags, +- &cmd->sg_cnt, &ucmd->sgv, &dev->udev_mem_lim, ucmd); ++ &cmd->sg_cnt, &ucmd->sgv, &dev->udev_mem_lim, ucmd, ++ cmd->tgt_dev->max_sg_cnt); + if (cmd->sg != NULL) { + struct scst_user_cmd *buf_ucmd = + (struct scst_user_cmd *)sgv_get_priv(ucmd->sgv); +@@ -559,21 +563,7 @@ static int dev_user_alloc_sg(struct scst + "last_len %d, l %d)", ucmd, cached_buff, ucmd->ubuff, + last_len, cmd->sg[cmd->sg_cnt-1].length); + +- if (unlikely(cmd->sg_cnt > cmd->tgt_dev->max_sg_cnt)) { +- static int ll; +- if (ll < 10) { +- PRINT_INFO("Unable to complete command due to " +- "SG IO count limitation (requested %d, " +- "available %d, tgt lim %d)", +- cmd->sg_cnt, +- cmd->tgt_dev->max_sg_cnt, +- cmd->tgt->sg_tablesize); +- ll++; +- } +- cmd->sg = NULL; +- /* sgv will be freed in dev_user_free_sgv() */ +- res = -1; +- } ++ EXTRACHECKS_BUG_ON(cmd->sg_cnt > cmd->tgt_dev->max_sg_cnt); + } else { + TRACE_MEM("Buf not alloced (ucmd %p, h %d, buff_cached, %d, " + "sg_cnt %d, ubuff %lx, sgv %p", ucmd, ucmd->h, diff --git a/www/target_emulex.html b/www/target_emulex.html new file 
mode 100644 index 000000000..be4f61fda --- /dev/null +++ b/www/target_emulex.html @@ -0,0 +1,101 @@ + + + + + + + + +Emulex FC/FCoE target driver + + + + +
+ + + + + +
+ + +
+

Target driver for Emulex FC/FCoE

+

SCST Emulex + +

The Emulex OneCore Storage FC/FCoE driver (ocs_fc_scst) is developed and maintained by Broadcom. It is available on the + OneCore Storage SDK page.

+ +

The Emulex OneCore Storage SDK supports the Service Level Interface 4 (SLI-4) API and is compatible with the latest generation of Emulex 8 and 16 Gb/s Fibre Channel HBAs (LPe15000 and LPe16000 series), as well as the latest generation of 10 and 40 Gb/s FCoE UCNAs (OCe14000 series). It supports both target and initiator modes of operation and a number of advanced features: NPIV, T10-PI, etc.

+ +

If you intend to use persistent reservations with this target driver, you may need to apply this patch.

+ +

Note: the drivers on the SourceForge Emulex Drivers page are very old, unmaintained, and not recommended for new designs.

+ +

+ +
 
+
+ +
+ + +
+ + + + + + + + + diff --git a/www/target_fcoe.html b/www/target_fcoe.html new file mode 100644 index 000000000..26e8bde28 --- /dev/null +++ b/www/target_fcoe.html @@ -0,0 +1,95 @@ + + + + + + + + +FCoE Target Driver + + + + +
+ + + + + +
+ + +
+

FCoE target

+

SCST Fcoe The SCST Fibre Channel over Ethernet (FCoE) target was developed by the Open-FCoE team and Joe Eykholt. Since February 2010 its main development place has been the SCST SVN repository.

+

You can download the latest development version from the SCST SVN repository. See the download page for how to set up access to it.





+ +
 
+
+ +
+ + +
+ + + + + + + + + diff --git a/www/target_ibmvscsi.html b/www/target_ibmvscsi.html new file mode 100644 index 000000000..09c39c177 --- /dev/null +++ b/www/target_ibmvscsi.html @@ -0,0 +1,96 @@ + + + + + + + + +SRP Target Driver + + + + +
+ + + + + +
+ + +
+

IBM Virtual SCSI Target

+

The virtual SCSI (VSCSI) protocol, as defined in the Power Architecture Standard, allows one logical partition (LPAR) to access SCSI targets provided by another LPAR. The LPAR that provides one or more SCSI targets is called the VIO server or VIOS. The ibmvstgt driver is a VIOS driver that makes it possible to access exported target devices via the VSCSI protocol.

+

This driver is based on the ibmvstgt driver, but compared to the original ibmvstgt it has a number of important fixes and improvements. The port was made by Bart Van Assche.

+

You can download it from the SCST SVN repository. See the download page for how to set up access to it.

+


+ +
 
+
+ +
+ + +
+ + + + + + + + + diff --git a/www/target_iscsi.html b/www/target_iscsi.html new file mode 100644 index 000000000..9b6fd430d --- /dev/null +++ b/www/target_iscsi.html @@ -0,0 +1,109 @@ + + + + + + + + +iSCSI Target Driver + + + + +
+ + + + + +
+ + +
+

ISCSI target driver iSCSI-SCST with iSER support

+

ISCSI-SCST is an iSCSI target driver for SCST. It originated from IET, but then became a deep rework of it, with a lot of fixes and improvements in all areas of performance, stability and functionality.

+ +

The latest improvement was the addition of iSER support; many thanks to Yan Burman and Mellanox Technologies!

+ +

If you are an IET user, carefully read the README files of both iSCSI-SCST and the SCST core before installation. You can also use a migration tool developed by Scalable Informatics Inc., which will convert your IET machine to an iSCSI-SCST machine. See the README for more details.
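To give a feel for the kind of translation such a migration performs (the device path and IQN below are invented for illustration, and the exact scst.conf syntax may vary between SCST/scstadmin releases, so always check the README files), a simple IET export and a possible iSCSI-SCST equivalent could look like:

```
# ietd.conf (IET) -- hypothetical example
Target iqn.2010-01.com.example:disk0
        Lun 0 Path=/dev/sdb,Type=fileio

# scst.conf (iSCSI-SCST, managed by scstadmin) -- hypothetical equivalent
HANDLER vdisk_fileio {
        DEVICE disk0 {
                filename /dev/sdb
        }
}
TARGET_DRIVER iscsi {
        enabled 1
        TARGET iqn.2010-01.com.example:disk0 {
                enabled 1
                LUN 0 disk0
        }
}
```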

+ +

You can find the latest development version of this driver in the SCST SVN. See the download page for how to set up access to it.

+ +

Certification

+ +

ISCSI-SCST has passed VMware certification as part of Scale's Intelligent Clustered Storage technology developed by Scale Computing, as well as VMware and Microsoft certification as part of storage arrays developed by Open-E, Inc. and Starboard Storage.

+ + +
+
+
+ + + + + + + + + diff --git a/www/target_local.html b/www/target_local.html new file mode 100644 index 000000000..ddfc65a39 --- /dev/null +++ b/www/target_local.html @@ -0,0 +1,104 @@ + + + + + + + + +Local Target Driver + + + + +
+ + + + + +
+ + +
+

Target driver for local access

+ +

This driver allows you to access devices that are exported via SCST + directly on the same Linux system that they are exported from.

+ +

It makes no assumptions in the code about the device types on the target, so + any device handlers that you load in SCST will be visible, including tapes + and so forth.
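As an illustrative sketch only (the device name and backing file are invented, and the exact scst.conf stanza for scst_local may differ between SCST releases, so consult the README), exporting a file-backed virtual disk back to the local system might be configured along these lines:

```
HANDLER vdisk_fileio {
        DEVICE localdisk {
                filename /var/lib/scst/localdisk.img
        }
}
TARGET_DRIVER scst_local {
        TARGET scst_local_tgt {
                LUN 0 localdisk
        }
}
```

After loading the scst_local module and applying such a configuration, the exported device should appear as an ordinary SCSI disk on the same host (e.g. visible via lsscsi).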

+ +

Additionally, this driver allows creation of fully functional target drivers in user space. + See README for more details.

+ +

This driver was made by Richard Sharpe.

+ +

You can download the latest development version from the SCST SVN repository. See the download page for how to set up access to it.




+ +
 
+
+ +
+ + +
+ + + + + + + + + diff --git a/www/target_lsi.html b/www/target_lsi.html new file mode 100644 index 000000000..e68d30c1d --- /dev/null +++ b/www/target_lsi.html @@ -0,0 +1,99 @@ + + + + + + + + +LSI Target Driver + + + + +
+ + + + + +
+ + +
+

Target driver for LSI/MPT adapters

+

SCST LSI The target driver for LSI/MPT adapters was originally developed by Hu Gang; Erik Habbinga then continued the development.

+ +

It supports parallel SCSI (SPI), including Wide SCSI, and Fibre Channel, and should also work with SAS. This driver is at the alpha stage and available for download from the SCST SVN repository. See the download page for how to set up access to it.

+ +

Recently Theodore Vaida updated it for the latest hardware generation, including 12G support. You can download the current version from GitHub.

+ +


+ +
 
+
+ +
+ + +
+ + + + + + + + + diff --git a/www/target_mvsas.html b/www/target_mvsas.html new file mode 100644 index 000000000..6af1f54ff --- /dev/null +++ b/www/target_mvsas.html @@ -0,0 +1,93 @@ + + + + + + + + +Marvell SAS Target Driver + + + + +
+ + + + + +
+ + +
+

Target driver for Marvell SAS adapters

+

SCST Marvell SAS +

The target driver for Marvell SAS adapters is developed by Marvell and Andy Yan. It is a fully functional SAS target driver.

+ +

It is at the beta stage. You can download it from the SCST SVN repository. See the download page for how to set up access to it.




+ +
 
+
+ +
+ + +
+ + + + + + + + + diff --git a/www/target_old.html b/www/target_old.html new file mode 100644 index 000000000..254c1f2d4 --- /dev/null +++ b/www/target_old.html @@ -0,0 +1,129 @@ + + + + + + + + +Old Unsupported SCST Target Drivers + + + + +
+ + + + + +
+ + +
+

Target driver for QLogic ISP chipsets

+

SCST Unsupported This is an SCST driver for QLogic ISP chipsets, commonly used in many SCSI and FC host bus adapters. It is based on Matthew Jacob's (http://www.feral.com) multiplatform driver for ISP chipsets. The update for SCST was made by Stanislaw Gruszka for Open-E Inc.

+ +

The latest release is 1.0.2. It supports kernel versions between 2.6.16 and 2.6.32.

+

This driver has been obsoleted in favor of qla2x00t.

+
+ + +

Old target driver for QLogic qla2x00t adapters for 2.4 kernels

+

SCST Unsupported The old target driver for QLogic qla2x00t adapters is capable of working on 2.4 kernels. It has all required features and looks to be quite stable. It is designed to work in conjunction with the initiator driver, which is intended to perform all the initialization and shutdown tasks. In the current release, Red Hat's driver from the stock 2.4.20 kernel was taken as a base for the initiator driver. It was then patched to enable the target mode and provide all necessary callbacks, and it is still able to work as initiator only. A mode in which a host acts as the initiator and the target simultaneously is also supported. This driver has been obsoleted in favor of the 2.6-based driver.

+

The latest version is 0.9.3.4. It requires Linux kernel version 2.4.20 or higher and SCST version 0.9.3-pre4 or higher. If you are lucky, it also works on 2.6 kernels; see the README file for details. It has been tested on i386 only, but should work on any other platform supported by Linux.

+

Currently it is not supported and is listed here for historical reasons only.

+
+ +

Target drivers for Adaptec 7xxx and QLogic QLA12xx adapters

+

SCST Unsupported Target drivers for Adaptec 7xxx and QLogic QLA12xx adapters have been developed by Hu Gang and are available for download from http://bj.soulinfo.com/~hugang/scst/tgt/. These drivers are not complete, but look to be a good starting point if you are going to use one of these adapters. The SCST team does not have the appropriate hardware, and therefore has not tested and does not support these drivers. Send all questions to Hu Gang &lt; hugang at soulinfo com &gt;. If some of these drivers don't compile for you, try again with SCST version 0.9.3-pre2.



+
+ +

Patches for UNH-iSCSI Target 1.5.03 and 1.6.00 to SCST

+

SCST Unsupported SCST is much more advanced than the internal mid-level of the UNH-iSCSI target driver. With SCST the iSCSI target benefits from all its features and gets the ability to use all its advantages, like high performance and scalability, SMP support, required SCSI functionality emulation, etc.

+ +

Since the interface between SCST and the target drivers is based on work done by UNH IOL, it was relatively simple to update the UNH-iSCSI target to work over SCST. Mostly it was a "search and replace" job. The built-in scsi_target remains available as a compile-time option.

+ +

Requires Linux kernel version 2.4.20 or higher, or 2.6.7 or higher, and SCST version 0.9.2 or higher.

+

Currently it is not supported and is listed here for historical reasons only.

+ +
+
+ +
+ + + + + + + + + diff --git a/www/target_qla2x00t.html b/www/target_qla2x00t.html new file mode 100644 index 000000000..8003f7d3f --- /dev/null +++ b/www/target_qla2x00t.html @@ -0,0 +1,114 @@ + + + + + + + + +QLogic Fibre Channel Target Driver + + + + +
+ + + + + +
+ + +
+

Target driver qla2x00t for QLogic FC adapters

+

SCST QLogic This is a target driver for QLogic qla2xxx (22xx and newer) Fibre Channel adapters.

+ +

Starting from version 3.1, this driver supports 16G Hilda QLogic chip based adapters (post-Hilda QLogic chips are not supported) and has many other important improvements. This driver should also support FCoE, but that has never been verified. It has passed intensive internal SanDisk tests. It is stable and production ready. This driver is in stable maintenance mode in favor of the QLogic git tree driver (see below), and it is the recommended driver to use in production at the moment.

+ +

You can find the latest updates for this driver in the SVN trunk.

+ +

You can find the latest QLogic-maintained version of this driver, with full support for the latest QLogic chips for both FC and FCoE, including 32G FC, in git://git.qlogic.com/scst-qla2xxx-unified.git. It is maintained by QLogic, hence located in QLogic's git. See the SVN root README for instructions on how to integrate it into the SCST build tree.

+ +

The version released in March 2018 seems to be fully functional and stable. However, NPIV and T10-PI, as well as modifying the port name, are not yet supported. Also, this driver does not support old adapters (the same as the in-kernel qla2xxx it is based on).

+ + +
 
+
+ +
+ + +
+ + + + + + + + + diff --git a/www/target_srp.html b/www/target_srp.html new file mode 100644 index 000000000..1aed712d2 --- /dev/null +++ b/www/target_srp.html @@ -0,0 +1,95 @@ + + + + + + + + +SRP Target Driver + + + + +
+ + + + + +
+ + +
+

Infiniband SCSI RDMA protocol (SRP) target driver

+

SCST SRP The SCSI RDMA Protocol (SRP) target driver was developed by Vu Pham. Since March 2008 the main development place of the SRP target driver has been the SCST SVN repository. It is maintained by Bart Van Assche.

+

This driver is ready for the mainline Linux kernel and is going to be pushed to it together with other SCST patches.

+


+ +
 
+
+ +
+ + +
+ + + + + + + + + diff --git a/www/targets.html b/www/targets.html new file mode 100644 index 000000000..b2f22f900 --- /dev/null +++ b/www/targets.html @@ -0,0 +1,96 @@ + + + + + + + + +SCST SCSI Target Drivers & Utilities + + + + +
+ + + + + +
+ + +
+

SCST Target Drivers

+ +

SCST has target drivers for:

+
    +
  • iSCSI with iSER
  • +
  • Fibre Channel QLogic qla2xxx series
  • +
  • Infiniband SCSI RDMA Protocol (SRP)
  • +
  • Marvell SAS adapters
  • +
  • Emulex FC/FCoE
  • +
  • LSI/MPT adapters (parallel SCSI, including Wide Ultra320, SAS, Fibre Channel)
  • +
  • FCoE
  • +
  • Local access
  • +
  • IBM pSeries Virtual SCSI
  • +
  • ...
  • +
+ +
 
+
+
+
+ + + + + + + + + diff --git a/www/tomasz_res.txt b/www/tomasz_res.txt new file mode 100644 index 000000000..d8b23f150 --- /dev/null +++ b/www/tomasz_res.txt @@ -0,0 +1,52 @@ +The target is running Debian Lenny 64bit userspace on an Intel Celeron 2.93GHz CPU, 2 GB RAM. + +Initiator is running Debian Etch 64 bit userspace, open-iscsi 2.0-869, Intel Xeon 3050/2.13GHz, 8 GB RAM. + + +Each test was repeated 6 times, "sync" was made and caches were dropped on both sides before each test was started. + +dd parameters were like below, so 6.6 GB of data was read each time: + +dd if=/dev/sdag of=/dev/null bs=64k count=100000 + + +Data was read from two block devices: +- /dev/md0, which is RAID-1 on two ST31500341AS 1.5 TB drives +- encrypted dm-crypt device which is on top of /dev/md0 + +Encrypted device was created with the following additional options passed to cryptsetup +(it provides the most performance on systems where CPU is a bottleneck, but with decreased +security when compared to default options): + +-c aes-ecb-plain -s 128 + + +Generally, CPU on the target was a bottleneck, so I also tested the load on target. + + +md0, crypt columns - averages from dd +us, sy, id, wa - averages from vmstat + + +1. Disk speeds on the target + +Raw performance: 102.17 MB/s +Raw performance (encrypted): 50.21 MB/s + + +2. Read-ahead on the initiator: 256 (default); md0, crypt - MB/s + + md0 us sy id wa | crypt us sy id wa +STGT 50.63 4% 45% 18% 33% | 32.52 3% 62% 16% 19% +SCST (debug + no patches) 43.75 0% 26% 30% 44% | 42.05 0% 84% 1% 15% +SCST (fullperf + patches) 45.18 0% 25% 33% 42% | 44.12 0% 81% 2% 17% + + +3. 
Read-ahead on the initiator: 16384; md0, crypt - MB/s + + md0 us sy id wa | crypt us sy id wa +STGT 56.43 3% 55% 2% 40% | 46.90 3% 90% 3% 4% +SCST (debug + no patches) 73.85 0% 58% 1% 41% | 42.70 0% 85% 0% 15% +SCST (fullperf + patches) 76.27 0% 63% 1% 36% | 42.52 0% 85% 0% 15% + +Measured by Tomasz Chmielewski diff --git a/www/users.html b/www/users.html new file mode 100644 index 000000000..502747eac --- /dev/null +++ b/www/users.html @@ -0,0 +1,181 @@ + + + + + + + + +SCST Users + + + + + + +
+ + + + + +
+
+

SCST Users

+ +
+ + Companies developed SCST target drivers for their adapters + + + + + + + + + + +
+ Emulex + Cavium + Marvell Technology Group
+ Mellanox Technologies
+ +

+ + Companies using SCST in their products and solutions + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ One Stop Systems + Axxana + Bloombase
+ Data Domain + Enjellic Systems Development + Hewlett-Packard
+ IBM + InMage + Intelligent Systems Services Inc.
+ Kaminario + Logicworks + Open-E, Inc.
+ Openfiler + Oracle + Orabuntu-lxc
+ OS NEXUS + Pranah Storage Technologies + Proxmox
+ QStar Technologies + Scalable Informatics + Scale Computing
+ Small Tree Communications + Soul Information Technology Co., Ltd. + StarWind Software
+ System Fabric Works, Inc. + TechoPhil Ltd
+ +

+ + Companies using SCST for their internal storage infrastructure + + + + + + + +
+ DataCrunch Company + ISM eCompany + SEAKR Engineering, Inc.
+ +
+ +

If your company is using SCST in its products, solutions or internal storage infrastructure, please contact vst at vlnb net and we will be proud to add you to this page. Also, if your company has certified its SCST-powered product or solution, we will be proud to state on our pages that the SCST engine has successfully passed the certification tests. This is the least your company can do to show its appreciation for SCST.

+ +
 
+
+
+
+ + + + + + + + + diff --git a/www/vl_res.txt b/www/vl_res.txt new file mode 100644 index 000000000..80c2a110e --- /dev/null +++ b/www/vl_res.txt @@ -0,0 +1,220 @@ +Setup: + +Target: HT 2.4GHz Xeon, x86_32, 2GB of memory limited to 256MB by kernel +command line to have less test data footprint, 75GB 15K RPM SCSI disk as +backstorage, dual port 1Gbps E1000 Intel network card, 2.6.29 kernel. + +Initiator: 1.7GHz Xeon, x86_32, 1GB of memory limited to 256MB by kernel +command line to have less test data footprint, dual port 1Gbps E1000 +Intel network card, 2.6.27 kernel, open-iscsi 2.0-870-rc3. + +The target exported a 5GB file on XFS for FILEIO and 5GB partition for +BLOCKIO. + +All the tests were ran 3 times and average written. All the values are +in MB/s. The tests were ran with CFQ and deadline IO schedulers on the +target. All other parameters on both target and initiator were default. + +================================================================== + +I. SEQUENTIAL ACCESS OVER SINGLE LINE + +1. # dd if=/dev/sdX of=/dev/null bs=512K count=2000 + + ISCSI-SCST IET STGT +NULLIO: 106 105 103 +FILEIO/CFQ: 82 57 55 +FILEIO/deadline 69 69 67 +BLOCKIO/CFQ 81 28 - +BLOCKIO/deadline 80 66 - + +------------------------------------------------------------------ + +2. # dd if=/dev/zero of=/dev/sdX bs=512K count=2000 + +I didn't do other write tests, because I have data on those devices. + + ISCSI-SCST IET STGT +NULLIO: 114 114 114 + +------------------------------------------------------------------ + +3. /dev/sdX formatted in ext3 and mounted in /mnt on the initiator. Then + +# dd if=/mnt/q of=/dev/null bs=512K count=2000 + +were ran (/mnt/q was created before by the next test) + + ISCSI-SCST IET STGT +FILEIO/CFQ: 94 66 46 +FILEIO/deadline 74 74 72 +BLOCKIO/CFQ 95 35 - +BLOCKIO/deadline 94 95 - + +------------------------------------------------------------------ + +4. /dev/sdX formatted in ext3 and mounted in /mnt on the initiator. 
Then + +# dd if=/dev/zero of=/mnt/q bs=512K count=2000 + +were ran (/mnt/q was created by the next test before) + + ISCSI-SCST IET STGT +FILEIO/CFQ: 97 91 88 +FILEIO/deadline 98 96 90 +BLOCKIO/CFQ 112 110 - +BLOCKIO/deadline 112 110 - + +------------------------------------------------------------------ + +Conclusions: + +1. ISCSI-SCST FILEIO on buffered READs on 27% faster than IET (94 vs +74). With CFQ the difference is 42% (94 vs 66). + +2. ISCSI-SCST FILEIO on buffered READs on 30% faster than STGT (94 vs +72). With CFQ the difference is 104% (94 vs 46). + +3. ISCSI-SCST BLOCKIO on buffered READs has about the same performance +as IET, but with CFQ it's on 170% faster (95 vs 35). + +4. Buffered WRITEs are not so interesting, because they are async. with +many outstanding commands at time, hence latency insensitive, but even +here ISCSI-SCST always a bit faster than IET. + +5. STGT always the worst, sometimes considerably. + +6. BLOCKIO on buffered WRITEs is constantly faster, than FILEIO, so, +definitely, there is a room for future improvement here. + +7. For some reason assess on file system is considerably better, than +the same device directly. + +================================================================== + +II. Mostly random "realistic" access. + +For this test I used io_trash utility. This utility emulates DB-like +access. For more details see http://lkml.org/lkml/2008/11/17/444. To +show value of target-side caching in this test target was ran with full +2GB of memory. I ran io_trash with the following parameters: "2 2 ./ +500000000 50000000 10 4096 4096 300000 10 90 0 10". Total execution +time was measured. + + ISCSI-SCST IET STGT +FILEIO/CFQ: 4m45s 5m 5m17s +FILEIO/deadline 5m20s 5m22s 5m35s +BLOCKIO/CFQ 23m3s 23m5s - +BLOCKIO/deadline 23m15s 23m25s - + +Conclusions: + +1. FILEIO on 500% (five times!) faster than BLOCKIO + +2. STGT, as usually, always the worst + +3. 
Deadline always a bit slower + +================================================================== + +III. SEQUENTIAL ACCESS OVER MPIO + +Unfortunately, my dual port network card isn't capable of simultaneous +data transfers, so I had to do some "modeling" and put my network +devices in 100Mbps mode. To make this model more realistic I also used +my old IDE 5200RPM hard drive capable to produce locally 35MB/s +throughput. So I modeled the case of double 1Gbps links with 350MB/s +backstorage, if all the following rules satisfied: + + - Both links a capable of simultaneous data transfers + + - There is sufficient amount of CPU power on both initiator and target +to cover requirements for the data transfers. + +All the tests were done with iSCSI-SCST only. + +1. # dd if=/dev/sdX of=/dev/null bs=512K count=2000 + +NULLIO: 23 +FILEIO/CFQ: 20 +FILEIO/deadline 20 +BLOCKIO/CFQ 20 +BLOCKIO/deadline 17 + +Single line NULLIO is 12. + +So, there is a 96% on NULLIO and 83% with HDD storage improvement using +2 lines. With 1Gbps 20MB/s should be equivalent of 200MB/s. Quite good! 
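The percentage figures quoted in the conclusions of section I can be re-derived with a few lines of Python (a trivial helper added here for illustration only, not part of the original test harness):

```python
def speedup_pct(fast, slow):
    """Percentage by which throughput `fast` exceeds `slow` (both in MB/s)."""
    return round((fast - slow) / slow * 100)

# FILEIO buffered READs from ext3 (test I.3), iSCSI-SCST vs IET and STGT:
print(speedup_pct(94, 74))   # deadline, vs IET: 27
print(speedup_pct(94, 66))   # CFQ, vs IET: 42
print(speedup_pct(94, 46))   # CFQ, vs STGT: 104
```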
+ +================================================================== + +Connection to the target were made with the following iSCSI parameters: + +# iscsi-scst-adm --op show --tid=1 --sid=0x10000013d0200 +InitialR2T=No +ImmediateData=Yes +MaxConnections=1 +MaxRecvDataSegmentLength=2097152 +MaxXmitDataSegmentLength=131072 +MaxBurstLength=2097152 +FirstBurstLength=262144 +DefaultTime2Wait=2 +DefaultTime2Retain=0 +MaxOutstandingR2T=1 +DataPDUInOrder=Yes +DataSequenceInOrder=Yes +ErrorRecoveryLevel=0 +HeaderDigest=None +DataDigest=None +OFMarker=No +IFMarker=No +OFMarkInt=Reject +IFMarkInt=Reject + +# ietadm --op show --tid=1 --sid=0x10000013d0200 +InitialR2T=No +ImmediateData=Yes +MaxConnections=1 +MaxRecvDataSegmentLength=262144 +MaxXmitDataSegmentLength=131072 +MaxBurstLength=2097152 +FirstBurstLength=262144 +DefaultTime2Wait=2 +DefaultTime2Retain=20 +MaxOutstandingR2T=1 +DataPDUInOrder=Yes +DataSequenceInOrder=Yes +ErrorRecoveryLevel=0 +HeaderDigest=None +DataDigest=None +OFMarker=No +IFMarker=No +OFMarkInt=Reject +IFMarkInt=Reject + +# tgtadm --op show --mode session --tid 1 --sid 1 +MaxRecvDataSegmentLength=2097152 +MaxXmitDataSegmentLength=131072 +HeaderDigest=None +DataDigest=None +InitialR2T=No +MaxOutstandingR2T=1 +ImmediateData=Yes +FirstBurstLength=262144 +MaxBurstLength=2097152 +DataPDUInOrder=Yes +DataSequenceInOrder=Yes +ErrorRecoveryLevel=0 +IFMarker=No +OFMarker=No +DefaultTime2Wait=2 +DefaultTime2Retain=0 +OFMarkInt=Reject +IFMarkInt=Reject +MaxConnections=1 +RDMAExtensions=No +TargetRecvDataSegmentLength=262144 +InitiatorRecvDataSegmentLength=262144 +MaxOutstandingUnexpectedPDUs=0 + +Measured by Vladislav Bolkhovitin