mirror of
https://github.com/SCST-project/scst.git
synced 2026-05-14 09:11:27 +00:00
Web updates
git-svn-id: http://svn.code.sf.net/p/scst/svn/trunk@763 d57e44dd-8a1f-0410-8b47-8ef2f437770f
22
www/bart_res.txt
Normal file
@@ -0,0 +1,22 @@
ISCSI over IPoIB (two DDR PCIe 1.0 ConnectX HCA's connected back to back). The results
for the buffered I/O test with a block size of 512K (initiator)
against a file of 1GB residing on a tmpfs filesystem on the target are
as follows:

write-test: iSCSI-SCST 243 MB/s; IET 192 MB/s.
read-test: iSCSI-SCST 291 MB/s; IET 223 MB/s.

And for a block size of 4 KB:

write-test: iSCSI-SCST 43 MB/s; IET 42 MB/s.
read-test: iSCSI-SCST 288 MB/s; IET 221 MB/s.

Or: depending on the test scenario, SCST transfers data between 2% and
30% faster via the iSCSI protocol over this network.

Something that is not relevant for this comparison, but interesting to
know: with the SRP implementation in SCST the maximal read throughput
is 1290 MB/s on the same setup.

Measured by Bart Van Assche
@@ -66,9 +66,7 @@
<tr>
<th align="left"> Architecture </th> <td> Kernel only</td> <td> User space only
<sup><A HREF="#1">1</A>
</sup> </td> <td> Split <sup>
<A HREF="#2">2</A>
</sup> </td> <td> Kernel only </td>
</sup> </td> <td> - </td> <td> Kernel only </td>
</tr>
<tr>
<th align="left"> Stability </th> <td> + </td> <td> +

@@ -91,7 +89,7 @@
transfer values (parallel SCSI, SAS) </th> <td> + </td> <td> - </td> <td> - </td> <td> - </td>
</tr>
<tr>
<th align="left"> User Interface </th> <td> ProcFS </td> <td> Custom </td> <td> IOCTL/ProcFS </td> <td> ConfigFS/IOCTL/ProcFS </td>
<th align="left"> Interface with user space</th> <td> ProcFS </td> <td> Custom </td> <td> - </td> <td> ConfigFS/IOCTL/ProcFS </td>
</tr>

@@ -115,7 +113,7 @@ transfer values (parallel SCSI, SAS) </th> <td> + </td> <td> - </td> <td> -
<sup><A HREF="#8">8</A></sup></th> <td> + </td> <td> - </td> <td> - </td> <td> - </td>
</tr>
<tr>
<th align="left"> Advanced devices visibility management
<th align="left"> Advanced devices access control
<sup><A HREF="#9">9</A></sup></th> <td> + </td> <td> - </td> <td> - </td> <td> - </td>
</tr>
<tr>

@@ -123,10 +121,8 @@ transfer values (parallel SCSI, SAS) </th> <td> + </td> <td> - </td> <td> -
(AEN) </th> <td> + </td> <td> - </td> <td> - </td> <td> - </td>
</tr>
<tr>
<th align="left"> AEN for devices added/removed</th> <td> + </td> <td> - </td> <td> - </td> <td> - </td>
</tr>
<tr>
<th align="left"> AEN for devices resized</th> <td> + </td> <td> - </td> <td> - </td> <td> - </td>
<th align="left"> Notification for devices added/removed or
resized through AENs or Unit Attentions</th> <td> + </td> <td> - </td> <td> - </td> <td> - </td>
</tr>
<tr>
<th align="left"> Bidirectional Commands </th> <td> + <sup>

@@ -213,7 +209,7 @@ SCSI requirements <sup><A HREF="#11">11</A></sup></th> <td> Safe </td> <td> S
<th align="left"> User space side FILEIO </th> <td> + </td> <td> + </td> <td> - </td> <td> - </td>
</tr>
<tr>
<th align="left"> SCSI Pass-through </th> <td> + </td> <td> - </td> <td> - </td> <td> Single initiator only, not enforced
<th align="left"> SCSI pass-through </th> <td> + </td> <td> - </td> <td> - </td> <td> Single initiator only, not enforced
<sup><A HREF="#14">14</A></sup></td>
</tr>
<tr>

@@ -224,7 +220,15 @@ SCSI requirements <sup><A HREF="#11">11</A></sup></th> <td> Safe </td> <td> S
in modes, other than pass-through </th> <td> + </td> <td> - </td> <td> - </td> <td> + </td>
</tr>
<tr>
<th align="left"> CDROM emulation from ISO files </th> <td> + </td> <td> + </td> <td> - </td> <td> - </td>
<th align="left"> Virtual CD devices emulation from ISO files </th> <td> + </td> <td> + </td> <td> - </td> <td> - </td>
</tr>
<tr>
<th align="left"> Possibility to write to emulated from ISO files
CD devices</th> <td> - </td> <td> + </td> <td> - </td> <td> - </td>
</tr>
<tr>
<th align="left"> Emulation of virtual tape and media changer
devices</th> <td> - </td> <td>Experimental</td> <td> - </td> <td> - </td>
</tr>

<tr bgcolor="#E0E0E0">

@@ -239,6 +243,11 @@ in modes, other than pass-through </th> <td> + </td> <td> - </td> <td> - <
</sup></td> <td> Kernel only </td>
</tr>
<tr>
<th align="left"> Interface with user space</th> <td>IOCTL/ProcFS/
Netlink</td> <td> - </td> <td>IOCTL/ProcFS/
Netlink</td> <td> ConfigFS/IOCTL/ProcFS </td>
</tr>
<tr>
<th align="left"> Multiple connections per session (MS/C) </th> <td> - </td> <td> - </td> <td> - </td> <td> + </td>
</tr>
<tr>

@@ -270,7 +279,8 @@ reinstatement <sup><A HREF="#15">15</A></sup></th> <td> Safe </td> <td> Not s
<br/>
<p><strong><big><u>REMARKS:</u></big></strong></p>

<p><A NAME="1"></A> 1. STGT has SCSI target engine with memory management in user space and small hooks in the kernel to interact with in-kernel target drivers.</p>
<p><A NAME="1"></A> 1. STGT has SCSI target engine and memory management in user space with small hooks in the kernel to interact with in-kernel target drivers.
As a direct consequence, fully user space STGT target (e.g. iSCSI) can run without any kernel modules needed.</p>

<p><A NAME="2"></A> 2. All iSCSI management implemented in user space and actual data transfers in kernel space without user space involved.</p>

@@ -280,7 +290,8 @@ reinstatement <sup><A HREF="#15">15</A></sup></th> <td> Safe </td> <td> Not s
<a href="http://lists.wpkg.org/pipermail/stgt/2009-February/002630.html">http://lists.wpkg.org/pipermail/stgt/2009-February/002630.html</a></p>

<p><A NAME="4"></A> 4. The result "in average" is listed. One target can be better somewhere, another one somewhere else. Although manual tuning of target and
system parameters tends the restore the difference listed in the comparison.</p>
system parameters tends the restore the difference listed in the comparison. You can find example measurements <a href="vl_res.txt">here</a>,
<a href="bart_res.txt">here</a> and <a href="tomasz_res.txt">here</a>.</p>

<p><A NAME="5"></A> 5. All SCST and its drivers' kernel patches supposed to be applied and SCST with the drivers built in the release or performance build.
Without the kernel patches SCST performance will be at "****+" level, except for the case, when user space backstorage handler used

@@ -290,14 +301,17 @@ reinstatement <sup><A HREF="#15">15</A></sup></th> <td> Safe </td> <td> Not s
The conclusion was made by source code study only. LIO should have performance on the IET level or less,
because of more processing overhead. It might be much less for small block sizes.</p>

<p><A NAME="7"></A> 7. Some zero-copy functionality isn't available from user space. For instance, zero-copy send to a socket.</p>
<p><A NAME="7"></A> 7. Some zero-copy functionality isn't available from user space, sometimes fundamentally.
For instance, zero-copy FILEIO with page cache or zero-copy send to a socket. Also STGT can't use splice() for in-kernel
target drivers, because it has memory management in user space. To use splice() with socket-based user space target drivers
STGT would need a deep redesign of internal interactions between target drivers, core and backend handlers.</p>
<p><A NAME="8"></A> 8. "Local access to emulated backstorage devices" means that you can access emulated by a SCSI target devices
locally on the target host. For instance, you can mount your ISO image as a SCSI CDROM device locally on the
target host.</p>
locally on the target host. For instance, you can mount your ISO image from emulated by the target
CDROM device on the locally target host.</p>

<p><A NAME="9"></A> 9. "Advanced devices visibility management" means that different initiators can see different sets
of devices from the same target. This feature is essential with hardware targets, which don't have a possibility
<p><A NAME="9"></A> 9. "Advanced devices access control" means that different initiators can see different sets
of devices from the same target. This feature is essential for hardware targets, which don't have ability
to create virtual targets.</p>

<p><A NAME="10"></A> 10. Not well tested, because at the moment there is no backend using this functionality.</p>

@@ -52,9 +52,14 @@ blocked my access to the LIO mailing list, preventing me from tell that
to the interested people myself. After all our previous discussions and
with his skills and experience it's nearly impossible to believe that Nicholas Bellinger
didn't know that SCST is a lot more generic than LIO and has zero-copy
in all the same places, where LIO has. So, that comparison page looked
like rather a deliberate cheating attempt and for SCST there was no other way, except to setup own, correct comparison.
This comparison table turned out to be very useful, so it was extended to cover all the SCSI target areas.</p>
in all the same places, where LIO has. Thus, that comparison page looked like rather a deliberate cheating
attempt. Seems SCST is so much superior over LIO, so Nicholas Bellinger gave up technical discussions and started
attacking people's perception about SCST, trying to inspire them the
opposite.</p>

<p>So, for SCST there was no other way, except to setup own,
correct comparison. This comparison table turned out to be very useful,
so it was extended to cover all the SCSI target areas.</p>

</div>
</div>
@@ -45,9 +45,9 @@
better support and troubleshooting for you.
</ul>

<h1>Possible SCST improvements</h1>
<h1>Possible SCST extensions and improvements</h1>

<h3>Zero-copy FILEIO for READ-direction commands</h3>
<A NAME="ZC_READ"></A><h3>Zero-copy FILEIO for READ-direction commands</h3>

<p>At the moment, SCST in FILEIO mode uses standard Linux read() and write() syscalls paths,
which copy data from the page cache to the supplied buffer and back. Zero-copy FILEIO

@@ -83,7 +83,7 @@

<p>That's all. For WRITEs the current code path would remain unchanged.</p>

<h3>Zero-copy FILEIO for WRITE-direction commands</h3>
<A NAME="ZC_WRITE"></A><h3>Zero-copy FILEIO for WRITE-direction commands</h3>

<p>Implementation should be similar to zero-copy FILEIO for READ commands and should
be done after it. All incoming data should be inserted in the page cache, then dereferenced in

@@ -91,7 +91,7 @@
page cache, namely, locking issues related to it. They should be carefully
investigated.</p>

<h3>Persistent reservations</h3>
<A NAME="PR"></A><h3>Persistent reservations</h3>

<p>Support for PERSISTENT RESERVE IN and PERSISTENT RESERVE OUT is required to
work in many cluster environments, e.g. Windows 2003 Cluster.</p>

@@ -107,7 +107,7 @@
devices only and reject PERSISTENT RESERVE IN and OUT commands for
pass-through devices with "COMMAND NOT SUPPORTED" sense data.</p>

<h3>Automatic sessions reassignment</h3>
<A NAME="AUTO_SESS"></A><h3>Automatic sessions reassignment</h3>

<p>At the moment, if security name for an initiator reassigned (moved) to another security
group, the existing sessions from that initiator are not automatically reassigned to

@@ -129,7 +129,7 @@
<li><span>Resume the activities.</span></li>
</ul>

<h3>Dynamic I/O flow control</h3>
<A NAME="DYN_FLOW"></A><h3>Dynamic I/O flow control</h3>

<p>At the moment, if an initiator or several initiators simultaneously send to
target too many commands, especially in seek intensive workloads, target can get

@@ -279,7 +279,7 @@
<p>Then, at the latest stage of the development, logic to not schedule the
flow control work on idle devices should be added.</p>

<h3>Support for O_DIRECT in scst_vdisk handler</h3>
<A NAME="O_DIRECT"></A><h3>Support for O_DIRECT in scst_vdisk handler</h3>

<p>At the moment, scst_vdisk handler doesn't support O_DIRECT option and possibility to set it
was disabled. This limitation caused by Linux kernel expectation that memory supplied to

@@ -291,7 +291,7 @@
by pages, taken directly from dio->curr_user_address. Each such page should be referenced
by page_cache_get(). That's all.</p>

<h3>Refactoring of command execution path in scst_vdisk handler</h3>
<A NAME="VDISK_REFACTOR"></A><h3>Refactoring of command execution path in scst_vdisk handler</h3>

<p>At the moment, in scst_vdisk handler command execution function vdisk_do_job() is
overcomplicated and not very performance effective. It would be good to replace all those

@@ -309,7 +309,7 @@
return vdisk_exec_fns[cmd->cdb[0]](cmd);
}</p></listing>
<h3>Solve SG IO count limitation issue in pass-through mode</h3>
<A NAME="SG_LIMIT"></A><h3>Solve SG IO count limitation issue in pass-through mode</h3>

<p>In the pass-through mode (i.e. using the pass-through device handlers
scst_disk, scst_tape, etc) SCSI commands, coming from remote initiators,

@@ -323,6 +323,98 @@

<p>In <a href="sgv_big_order_alloc.diff">sgv_big_order_alloc.diff</a> you
can find a possible way to solve this issue.</p>

<A NAME="MEM_REG"></A><h3>Memory registration</h3>

<p>In some cases a target driver might need to register memory used for data buffers in the
hardware. At the moment, none of SCST target drivers, including InfiniBand SRP target driver,
need that feature. But in case if in future there is a need in such a feature, it can be easily
added by extending SCST SGV cache. The SCST SGV cache is a memory management
subsystem in SCST. It doesn't free to the system each data buffer,
which is not used anymore, but keeps it for a while to let it be reused by the
next consecutive command to reduce command processing latency and, hence, improve performance.</p>

<p>To support memory buffers registrations, it can be extended by the following way:</p>

<p>1. Struct scst_tgt_template would be extended to have 2 new callbacks:</p>

<ul>
<li><span>int register_buffer(struct scst_cmd *cmd)</span></li>
<li><span>int unregister_buffer(unsigned long mem_priv, void *scst_priv)</span></li>
</ul>

<p>2. SCST core would be extended to have 4 new functions:</p>

<ul>
<li><span>int scst_mem_registered(struct scst_cmd *cmd)</span></li>
<li><span>int scst_mem_deregistered(void *scst_priv)</span></li>
<li><span>int scst_set_mem_priv(struct scst_cmd *cmd, unsigned long mem_priv)</span></li>
<li><span>unsigned long scst_get_mem_priv(struct scst_cmd *cmd)</span></li>
</ul>

<p>3. The workflow would be the following:</p>

<ol>
<li><span>If target driver defined register_buffer() and unregister_buffer() callbacks,
SCST core would allocate a dedicated SGV cache for each instance of struct scst_tgt,
i.e. target.</span></li>

<li><span>When there would be an SGV cache miss in memory buffer for a command allocation,
SCST would check if register_buffer() callback was defined in the target driver's template
and, if yes, would call it.</span></li>

<li><span>In register_buffer() callback the target driver would do necessary actions to
start registration of the commands memory buffer.</span></li>

<li><span>Upon register_buffer() callback returns, SCST core would suspend processing the
corresponding command and would switch to processing of the next commands.</span></li>

<li><span>After the memory registration finished, the target driver would call scst_set_mem_priv()
to associate the memory buffer with some internal data.</span></li>

<li><span>Then the target driver would call scst_mem_registered() and SCST would resume processing
the command.</span></li>

<li><span>After the command finished, the corresponding memory buffer would remain in the
SGV cache in the registered state and would be reused by the next commands. For each of them
the target driver can at any time figure out the associated with the registered buffer data
by using scst_get_mem_priv().</span></li>

<li><span>When the SGV cache decide that there is a time to free the memory buffer, it would
call the target driver's unregister_buffer() callback.</span></li>

<li><span>In this callback the target driver would do necessary actions to start deregistration of the
commands memory buffer.</span></li>

<li><span>Upon unregister_buffer() callback returns, SGV cache would suspend freeing the corresponding buffer
and would switch to other deals it has.</span></li>

<li><span>After the memory deregistration finished, the target driver would call scst_mem_deregistered()
and pass to it scst_priv pointer, received in unregister_buffer(). Then the memory buffer
would be freed by the SGV cache.
</span></li>
</ol>
<A NAME="NON_SCSI_TGT"></A><h3>SCST usage with non-SCSI transports</h3>

<p>SCST might also be used with non-SCSI speaking transports, like NBD or AoE. Such cooperation
would allow them to use SCST-emulated backend.</p>

<p>For user space targets this is trivial: they simply should use SCST-emulated devices locally
via scst_local module.</p>

<p>For in-kernel non-SCSI target driver it's a bit more complicated. They should implement a small layer,
which would translate their internal READ/WRITE requests to corresponding SCSI commands and, on the
way back, SCSI status and sense codes to their internal status codes.</p>
</div>
</div>
</div>
@@ -334,7 +334,7 @@
</ul></span></li></ul></p>
<p>All outstanding commands will be finished regularly. After <strong>scst_unregister_session()</strong> returned
no new commands must be sent to SCST via <strong>scst_rx_cmd()</strong>. Also, the caller must ensure that no
<strong>scst_rx_cmd()</strong> or <strong>scst_rx_mgmt_fn_*()</strong> is called in paralell with
<strong>scst_rx_cmd()</strong> or <strong>scst_rx_mgmt_fn_*()</strong> is called in parallel with
<strong>scst_unregister_session()</strong>.</p>
<p>Function <strong>scst_unregister_session()</strong> can be called before <strong>result_fn()</strong> of
<strong>scst_register_session()</strong> called, i.e. during the session registration/initialization.</p>

@@ -582,8 +582,8 @@
<li><span><strong>int (*dev_done) (struct scst_cmd *cmd)</strong> - called to notify device handler about the result of the command's execution and perform some post processing. If <strong>parse()</strong> function is called, <strong>dev_done()</strong> is guaranteed to be called as well. The command's fields <strong>tgt_resp_flags</strong> and <strong>resp_data_len</strong> should be set by this function, but SCST offers good defaults. Pay attention to "atomic" attribute of the command, which can be get via <strong>scst_cmd_atomic()</strong>: it is true if the function called in the atomic (non-sleeping) context. Returns the command's next state or <strong>SCST_CMD_STATE_DEFAULT</strong>, if the next default state should be used, or <strong>SCST_CMD_STATE_NEED_THREAD_CTX</strong> if the function called in atomic context, but requires sleeping. In the last case, the function will be recalled in the thread context, where sleeping is allowed.</span></li>
<li><span><strong>int (*task_mgmt_fn) (struct scst_mgmt_cmd *mgmt_cmd, struct scst_tgt_dev *tgt_dev, struct scst_cmd *cmd_to_abort)</strong> - called to execute a task management command. Returns:
<ul>
<li><span><strong>SCST_DEV_TM_COMPLETED_SUCCESS</strong> - the command is done with success, no firther actions required</span></li>
<li><span><strong>SCST_DEV_TM_COMPLETED_FAILED</strong> - the command is failed, no firther actions required</span></li>
<li><span><strong>SCST_DEV_TM_COMPLETED_SUCCESS</strong> - the command is done with success, no further actions required</span></li>
<li><span><strong>SCST_DEV_TM_COMPLETED_FAILED</strong> - the command is failed, no further actions required</span></li>
<li><span><strong>SCST_DEV_TM_NOT_COMPLETED</strong> - regular standard actions for the command should be done</span></li>
</ul><strong>NOTE</strong>: for <strong>SCST_ABORT_TASK</strong> called under spinlock</span></li>
<li><span><strong>void (*on_free_cmd) (struct scst_cmd *cmd)</strong> - called to notify device handler that the command is about to be freed. Could be called on IRQ context.</span></li>

@@ -1052,7 +1052,7 @@

<h3>13.7 scst_add_threads() and scst_del_threads()</h3>
<p>These functions allows to add or delete some SCST threads. For example, if <strong>exec()</strong> function in
your device handler works synchronously, i.e. wait for job's completition, in order to prevent performance loss you
your device handler works synchronously, i.e. wait for job's completion, in order to prevent performance loss you
can add for SCST as many threads as there are devices serviced by your device handler.</p>
<p>Function <strong>scst_add_threads()</strong> starts requested number of threads. It is defined as the following:</p>
<p><code>int scst_add_threads(<br />
52
www/tomasz_res.txt
Normal file
@@ -0,0 +1,52 @@
The target is running Debian Lenny 64bit userspace on an Intel Celeron 2.93GHz CPU, 2 GB RAM.

Initiator is running Debian Etch 64 bit userspace, open-iscsi 2.0-869, Intel Xeon 3050/2.13GHz, 8 GB RAM.

Each test was repeated 6 times, "sync" was made and caches were dropped on both sides before each test was started.

dd parameters were like below, so 6.6 GB of data was read each time:

dd if=/dev/sdag of=/dev/null bs=64k count=100000

Data was read from two block devices:
- /dev/md0, which is RAID-1 on two ST31500341AS 1.5 TB drives
- encrypted dm-crypt device which is on top of /dev/md0

Encrypted device was created with the following additional options passed to cryptsetup
(it provides the most performance on systems where CPU is a bottleneck, but with decreased
security when compared to default options):

-c aes-ecb-plain -s 128

Generally, CPU on the target was a bottleneck, so I also tested the load on target.

md0, crypt columns - averages from dd
us, sy, id, wa - averages from vmstat

1. Disk speeds on the target

Raw performance: 102.17 MB/s
Raw performance (encrypted): 50.21 MB/s

2. Read-ahead on the initiator: 256 (default); md0, crypt - MB/s

                           md0    us  sy  id  wa  | crypt  us  sy  id  wa
STGT                       50.63  4%  45% 18% 33% | 32.52  3%  62% 16% 19%
SCST (debug + no patches)  43.75  0%  26% 30% 44% | 42.05  0%  84%  1% 15%
SCST (fullperf + patches)  45.18  0%  25% 33% 42% | 44.12  0%  81%  2% 17%

3. Read-ahead on the initiator: 16384; md0, crypt - MB/s

                           md0    us  sy  id  wa  | crypt  us  sy  id  wa
STGT                       56.43  3%  55%  2% 40% | 46.90  3%  90%  3%  4%
SCST (debug + no patches)  73.85  0%  58%  1% 41% | 42.70  0%  85%  0% 15%
SCST (fullperf + patches)  76.27  0%  63%  1% 36% | 42.52  0%  85%  0% 15%

Measured by Tomasz Chmielewski
220
www/vl_res.txt
Normal file
@@ -0,0 +1,220 @@
Setup:

Target: HT 2.4GHz Xeon, x86_32, 2GB of memory limited to 256MB by kernel
command line to have less test data footprint, 75GB 15K RPM SCSI disk as
backstorage, dual port 1Gbps E1000 Intel network card, 2.6.29 kernel.

Initiator: 1.7GHz Xeon, x86_32, 1GB of memory limited to 256MB by kernel
command line to have less test data footprint, dual port 1Gbps E1000
Intel network card, 2.6.27 kernel, open-iscsi 2.0-870-rc3.

The target exported a 5GB file on XFS for FILEIO and a 5GB partition for
BLOCKIO.

All the tests were run 3 times and the average written. All the values are
in MB/s. The tests were run with the CFQ and deadline IO schedulers on the
target. All other parameters on both target and initiator were default.

==================================================================

I. SEQUENTIAL ACCESS OVER SINGLE LINE

1. # dd if=/dev/sdX of=/dev/null bs=512K count=2000

                  ISCSI-SCST   IET   STGT
NULLIO:               106      105    103
FILEIO/CFQ:            82       57     55
FILEIO/deadline        69       69     67
BLOCKIO/CFQ            81       28      -
BLOCKIO/deadline       80       66      -

------------------------------------------------------------------

2. # dd if=/dev/zero of=/dev/sdX bs=512K count=2000

I didn't do other write tests, because I have data on those devices.

                  ISCSI-SCST   IET   STGT
NULLIO:               114      114    114

------------------------------------------------------------------

3. /dev/sdX formatted in ext3 and mounted in /mnt on the initiator. Then

# dd if=/mnt/q of=/dev/null bs=512K count=2000

was run (/mnt/q was created before by the next test)

                  ISCSI-SCST   IET   STGT
FILEIO/CFQ:            94       66     46
FILEIO/deadline        74       74     72
BLOCKIO/CFQ            95       35      -
BLOCKIO/deadline       94       95      -

------------------------------------------------------------------

4. /dev/sdX formatted in ext3 and mounted in /mnt on the initiator. Then

# dd if=/dev/zero of=/mnt/q bs=512K count=2000

was run (/mnt/q was created by the next test before)

                  ISCSI-SCST   IET   STGT
FILEIO/CFQ:            97       91     88
FILEIO/deadline        98       96     90
BLOCKIO/CFQ           112      110      -
BLOCKIO/deadline      112      110      -

------------------------------------------------------------------

Conclusions:

1. ISCSI-SCST FILEIO on buffered READs is 27% faster than IET (94 vs
74). With CFQ the difference is 42% (94 vs 66).

2. ISCSI-SCST FILEIO on buffered READs is 30% faster than STGT (94 vs
72). With CFQ the difference is 104% (94 vs 46).

3. ISCSI-SCST BLOCKIO on buffered READs has about the same performance
as IET, but with CFQ it's 170% faster (95 vs 35).

4. Buffered WRITEs are not so interesting, because they are async with
many outstanding commands at a time, hence latency insensitive, but even
here ISCSI-SCST is always a bit faster than IET.

5. STGT is always the worst, sometimes considerably.

6. BLOCKIO on buffered WRITEs is consistently faster than FILEIO, so,
definitely, there is room for future improvement here.

7. For some reason access through a file system is considerably better than
access to the same device directly.

==================================================================

II. Mostly random "realistic" access.

For this test I used the io_trash utility. This utility emulates DB-like
access. For more details see http://lkml.org/lkml/2008/11/17/444. To
show the value of target-side caching, in this test the target was run with full
2GB of memory. I ran io_trash with the following parameters: "2 2 ./
500000000 50000000 10 4096 4096 300000 10 90 0 10". Total execution
time was measured.

                  ISCSI-SCST   IET     STGT
FILEIO/CFQ:          4m45s     5m      5m17s
FILEIO/deadline      5m20s     5m22s   5m35s
BLOCKIO/CFQ          23m3s     23m5s     -
BLOCKIO/deadline     23m15s    23m25s    -

Conclusions:

1. FILEIO is about 500% (five times!) faster than BLOCKIO

2. STGT, as usual, is always the worst

3. Deadline is always a bit slower

==================================================================

III. SEQUENTIAL ACCESS OVER MPIO

Unfortunately, my dual port network card isn't capable of simultaneous
data transfers, so I had to do some "modeling" and put my network
devices in 100Mbps mode. To make this model more realistic I also used
my old IDE 5200RPM hard drive capable of producing locally 35MB/s
throughput. So I modeled the case of double 1Gbps links with 350MB/s
backstorage, if all the following rules are satisfied:

- Both links are capable of simultaneous data transfers

- There is a sufficient amount of CPU power on both initiator and target
to cover requirements for the data transfers.

All the tests were done with iSCSI-SCST only.

1. # dd if=/dev/sdX of=/dev/null bs=512K count=2000

NULLIO:            23
FILEIO/CFQ:        20
FILEIO/deadline    20
BLOCKIO/CFQ        20
BLOCKIO/deadline   17

Single line NULLIO is 12.

So, there is a 67% improvement using 2 lines. With 1Gbps it should be
equivalent of 200MB/s. Not too bad.

==================================================================

Connection to the target was made with the following iSCSI parameters:

# iscsi-scst-adm --op show --tid=1 --sid=0x10000013d0200
InitialR2T=No
ImmediateData=Yes
MaxConnections=1
MaxRecvDataSegmentLength=2097152
MaxXmitDataSegmentLength=131072
MaxBurstLength=2097152
FirstBurstLength=262144
DefaultTime2Wait=2
DefaultTime2Retain=0
MaxOutstandingR2T=1
DataPDUInOrder=Yes
DataSequenceInOrder=Yes
ErrorRecoveryLevel=0
HeaderDigest=None
DataDigest=None
OFMarker=No
IFMarker=No
OFMarkInt=Reject
IFMarkInt=Reject

# ietadm --op show --tid=1 --sid=0x10000013d0200
InitialR2T=No
ImmediateData=Yes
MaxConnections=1
MaxRecvDataSegmentLength=262144
MaxXmitDataSegmentLength=131072
MaxBurstLength=2097152
FirstBurstLength=262144
DefaultTime2Wait=2
DefaultTime2Retain=20
MaxOutstandingR2T=1
DataPDUInOrder=Yes
DataSequenceInOrder=Yes
ErrorRecoveryLevel=0
HeaderDigest=None
DataDigest=None
OFMarker=No
IFMarker=No
OFMarkInt=Reject
IFMarkInt=Reject

# tgtadm --op show --mode session --tid 1 --sid 1
MaxRecvDataSegmentLength=2097152
MaxXmitDataSegmentLength=131072
HeaderDigest=None
DataDigest=None
InitialR2T=No
MaxOutstandingR2T=1
ImmediateData=Yes
FirstBurstLength=262144
MaxBurstLength=2097152
DataPDUInOrder=Yes
DataSequenceInOrder=Yes
ErrorRecoveryLevel=0
IFMarker=No
OFMarker=No
DefaultTime2Wait=2
DefaultTime2Retain=0
OFMarkInt=Reject
IFMarkInt=Reject
MaxConnections=1
RDMAExtensions=No
TargetRecvDataSegmentLength=262144
InitiatorRecvDataSegmentLength=262144
MaxOutstandingUnexpectedPDUs=0

Measured by Vladislav Bolkhovitin