<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<meta name="Keywords" content="MC/S, MC/S vs MPIO, multiple connections per session">
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<meta name="author" content="Daniel Fernandes">
<meta name="Robots" content="index,follow">
<link rel="stylesheet" href="images/Orange.css" type="text/css">
<title>MC/S vs MPIO</title>
</head>

<body>
<!-- wrap starts here -->
<div id="wrap">
<div id="header">
<div class="logoimg"></div><h1 id="logo"><span class="orange"> </span></h1>
<h2 id=slogan>Generic SCSI Target Subsystem for Linux</h2>
</div>

<div id="menu">
<ul>
<li><a href="index.html">Home</a></li>
<li><a href="https://github.com/SCST-project/scst">Main</a></li>
<li><a href="https://github.com/SCST-project/scst/releases">News</a></li>
<li><a href="targets.html">Drivers</a></li>
<li><a href="downloads.html">Downloads</a></li>
<li><a href="contributing.html">Contributing</a></li>
<li id="current"><a href="comparison.html">Comparison</a></li>
<li><a href="users.html">Users</a></li>
</ul>
</div>

<!-- content-wrap starts here -->
<div id="content-wrap">
<div id="sidebar">
<h1>Comparison</h1>
<ul class="sidemenu">
<li><a href="comparison.html">Features comparison</a></li>
<li><a href="scstvslio.html">SCST vs LIO/TCM</a></li>
<li><a href="scstvsstgt.html">SCST vs STGT</a></li>
<li><a href="mc_s.html">MC/S vs MPIO</a></li>
</ul>
</div>

<div id="main">

<h1>MC/S vs MPIO</h1>

<p>MC/S (Multiple Connections per Session) is a feature of the iSCSI
protocol that allows several connections to be combined inside a single
session for performance and failover purposes. Let's consider what
practical value this feature has compared with OS level multipath
(MPIO), and try to answer why no Open Source OS supports it, despite
the many years since the iSCSI protocol came into active use, or plans
to implement it in the future.</p>

<p>MC/S is done at the iSCSI level, while MPIO is done at a higher
level. Hence, all MPIO infrastructure is shared among all SCSI
transports, including Fibre Channel, SAS, etc.</p>

<p>MC/S was designed at a time when most OSes didn't have standard OS level
multipath. Instead, each vendor had its own implementation, which
created huge interoperability problems. So, one of the goals of MC/S was
to address this issue and standardize the multipath area in a single standard. But
nowadays almost all OSes have OS level multipath implemented using
standard SCSI facilities, hence this purpose of MC/S is no longer valid.</p>

<p>It is usually claimed that MC/S has the following 2 advantages over MPIO:</p>

<ol>
<li><span>Faster failover recovery.</span></li>

<li><span>Better performance.</span></li>

</ol>

<p>Let's look at how realistic those claims are.</p>

<h2>Failover recovery time</h2>

<p>Let's consider a single target exporting a single device over 2 links.</p>

<p>For MC/S, failover recovery is quite simple: all outstanding SCSI
commands are reassigned to another connection. No other actions are
necessary, because the session (i.e. the I_T Nexus) remains the same.
Consequently, all reservations and other SCSI state, as well as other
initiators connected to the device, remain unaffected.</p>
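
<p>The reassignment described above can be sketched as a small toy model.
This is only an illustration, not code from any iSCSI implementation: the
<code>Session</code> class, its fields and the connection names are all
hypothetical. It shows that only the mapping of commands to connections
changes, while session-level state is not touched.</p>

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    """Toy model of one iSCSI session (one I_T Nexus) with several connections."""
    connections: list                                # names of live connections
    outstanding: dict = field(default_factory=dict)  # command tag -> connection
    reservation_holder: bool = True                  # stands in for session-level SCSI state

    def submit(self, tag, connection):
        self.outstanding[tag] = connection

    def connection_failed(self, failed):
        """Reassign every command of the failed connection to a surviving one."""
        self.connections.remove(failed)
        if not self.connections:
            raise RuntimeError("no connection left, session recovery is needed")
        target = self.connections[0]
        moved = [tag for tag, conn in self.outstanding.items() if conn == failed]
        for tag in moved:
            self.outstanding[tag] = target
        return moved


session = Session(connections=["conn0", "conn1"])
session.submit(1, "conn0")
session.submit(2, "conn1")
session.submit(3, "conn0")
moved = session.connection_failed("conn0")
print(moved, session.outstanding, session.reservation_holder)
```

<p>In this model commands 1 and 3 move to the surviving connection and
nothing else changes, which is why no other initiator notices the
failover.</p>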

<p>For MPIO, failover recovery is much more complicated, because
it involves transferring all outstanding commands and SCSI state from one
I_T Nexus to another. The first thing the initiator will do is
abort all outstanding commands on the faulted
I_T Nexus. There are 2 approaches for that: the CLEAR TASK SET and LUN RESET
task management functions.</p>

<p>The CLEAR TASK SET function aborts all commands on the device.
Unfortunately, it has limitations: it isn't always supported by the device,
and having a single task set shared among initiators isn't always
appropriate for the application.</p>

<p>The LUN RESET function resets the device.</p>

<p>Both the CLEAR TASK SET and LUN RESET functions can somewhat harm
other initiators, because all commands from all initiators, not only
from the one doing the failover recovery, will be aborted. Additionally, LUN
RESET resets all SCSI settings for all connected initiators to the
initial state and, if the device had a reservation from any initiator, it will
be cleared.</p>

<p>But the harm is minimal:</p>

<ul>
<li><span>With the TAS bit set in the Control Mode page, all the aborted commands will
be returned to all affected initiators with TASK ABORTED status, so they
can simply retry them immediately. For CLEAR TASK SET, if TAS isn't set,
all affected initiators will be notified by the Unit Attention COMMANDS
CLEARED BY ANOTHER INITIATOR, so they can also immediately retry all
outstanding commands.</span></li>

<li><span>In case of a device reset, the affected initiators will be notified via
the corresponding Unit Attention about the reset of
all SCSI settings to the initial state. Then the initiators can take the necessary
recovery actions. Usually no recovery actions are needed, except for the
reservation holder, whose reservation was cleared. For it, recovery might
not be trivial. But Persistent Reservations solve this issue, because
they are not cleared by a device reset.</span></li>
</ul>

<p>Thus, with Persistent Reservations or with the CLEAR TASK SET function,
the additional failover recovery time that MPIO has compared to MC/S
is the time to wait for the reset or command abort to finish plus the time to
retry all the aborted commands. On a properly configured system it
should be less than a few seconds, which is quite acceptable in practice.
If the Linux storage stack were improved to allow aborting all commands
submitted to it (currently it is only possible to wait for their completion), then
the time to abort all the commands could be decreased to a fraction of a second.</p>
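
<p>The side effects of the two task management functions discussed in this
section can be sketched as follows. This is a deliberately simplified toy
model with hypothetical names (<code>Device</code>,
<code>clear_task_set</code>, <code>lun_reset</code>), not the behaviour of
any particular target: it only tracks which commands get aborted, how other
initiators learn about it, and which kind of reservation survives.</p>

```python
class Device:
    """Toy model of a logical unit shared by several initiators (I_T Nexuses)."""

    def __init__(self, tas=True):
        self.tas = tas                      # TAS bit of the Control Mode page
        self.outstanding = {}               # initiator -> list of command tags
        self.reservation = None             # holder of a plain (non-persistent) reservation
        self.persistent_reservation = None  # holder of a Persistent Reservation

    def submit(self, initiator, tag):
        self.outstanding.setdefault(initiator, []).append(tag)

    def _abort_all(self, requester):
        """Abort every command; record how each other initiator finds out."""
        notices = {}
        for initiator, tags in self.outstanding.items():
            if tags and initiator != requester:
                notices[initiator] = "TASK ABORTED" if self.tas else "UNIT ATTENTION"
        aborted = {i: list(t) for i, t in self.outstanding.items()}
        self.outstanding = {i: [] for i in self.outstanding}
        return aborted, notices

    def clear_task_set(self, requester):
        return self._abort_all(requester)   # reservations are left alone

    def lun_reset(self, requester):
        self.reservation = None             # the plain reservation is lost
        return self._abort_all(requester)   # the Persistent Reservation survives


dev = Device(tas=True)
dev.reservation = "B"
dev.persistent_reservation = "B"
dev.submit("A", 10)
dev.submit("B", 20)
aborted, notices = dev.lun_reset(requester="A")
to_retry = aborted["A"]   # initiator A resubmits these on its surviving path
print(aborted, notices, dev.reservation, dev.persistent_reservation)
```

<p>In the model, initiator B loses its plain reservation but keeps the
persistent one, and both initiators are left with a list of commands that
they can retry.</p>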

<h2>Performance</h2>

<p>First, neither MC/S nor MPIO can improve performance if only
one SCSI command is sent to the target at a time, for instance in the case of
tape backup and restore. Both MC/S and MPIO work at the command level,
so they can't split the data transfer of a single command over several links.
Only bonding (also known as NIC teaming or Link Aggregation) can improve
performance in this case, because it works at the link level.</p>
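
<p>The point about a single outstanding command can be made with a one-line
idealised model. The function name and the 100 MB/s per-link figure are
assumptions chosen for illustration; the only rule encoded is that each
command travels over exactly one link.</p>

```python
def aggregate_throughput(link_mb_s, links, outstanding_commands):
    """Idealised throughput when each command travels over exactly one link."""
    busy_links = min(links, outstanding_commands)
    return busy_links * link_mb_s


for depth in (1, 2, 8):
    print(depth, aggregate_throughput(100.0, links=2, outstanding_commands=depth))
```

<p>With one outstanding command the second link stays idle no matter whether
MC/S or MPIO distributes the commands.</p>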

<p>MC/S over several links preserves command execution order, i.e. with
it commands are executed in the same order as they were submitted. MPIO
can't preserve this order, because it can't see which command was
submitted earlier on which link. Delays in link processing can change
the order of commands at the point where the target receives them.</p>
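
<p>How link delays reorder commands can be shown with a small simulation.
The delays, the round-robin distribution and the function name below are
assumptions made up for the example. Command tags double as submission
sequence numbers, which is roughly the role the session-wide command
sequence number plays for MC/S.</p>

```python
def arrival_order(commands, link_delay_ms, submit_gap_ms=1.0):
    """Return command tags in the order in which the target receives them.

    Commands are submitted round-robin over the links, one every submit_gap_ms.
    """
    arrivals = []
    for index, tag in enumerate(commands):
        link = index % len(link_delay_ms)
        arrival_time = index * submit_gap_ms + link_delay_ms[link]
        arrivals.append((arrival_time, index, tag))
    return [tag for _, _, tag in sorted(arrivals)]


received = arrival_order([0, 1, 2, 3, 4, 5], link_delay_ms=[5.0, 0.5])
print(received)           # order seen by the target over two unequal links
print(sorted(received))   # order restored from session-wide sequence numbers
```

<p>With equal delays the commands arrive in submission order. Once one link
is slower they interleave, and only a numbering shared by both links lets the
target put them back in order.</p>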

<p>Since initiators usually send commands in the order that is optimal for
performance, reordering can somewhat hurt performance. But this can happen only with
a naive target implementation, which can't recover the optimal command execution
order. Currently Linux is not naive and is quite good in this area. See, for
instance, the section "SEQUENTIAL ACCESS OVER MPIO" in <a
href="vl_res.txt">those measurements</a>. Don't look at the absolute
numbers; look at the percentage of performance improvement from using the second link.
The result is equivalent to 200 MB/s over two 1 Gbps links, which is close to
the possible maximum.</p>
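
<p>To put the 200 MB/s figure into perspective, here is a rough upper bound
for two 1 Gbps links. The framing numbers are assumptions that are not taken
from the measurements: a standard 1500-byte MTU, TCP over IPv4 without
options, 1 MB taken as 10<sup>6</sup> bytes, and iSCSI header overhead
ignored.</p>

```python
LINK_BITS_PER_SECOND = 1_000_000_000   # one 1 Gbps link
LINKS = 2
MEASURED_MB_S = 200.0                  # figure quoted in the text

# Assumption: each full-size frame occupies 1538 bytes on the wire
# (1500 MTU + 14 Ethernet header + 4 FCS + 8 preamble + 12 inter-frame gap)
# and carries 1460 bytes of TCP payload (1500 - 20 IP header - 20 TCP header).
WIRE_BYTES = 1538
PAYLOAD_BYTES = 1460

raw_mb_s = LINKS * LINK_BITS_PER_SECOND / 8 / 1_000_000
payload_mb_s = raw_mb_s * PAYLOAD_BYTES / WIRE_BYTES

print(f"raw line rate:       {raw_mb_s:.1f} MB/s")
print(f"TCP payload ceiling: {payload_mb_s:.1f} MB/s")
print(f"measured / ceiling:  {MEASURED_MB_S / payload_mb_s:.0%}")
```

<p>Under these assumptions the ceiling is a little under 240 MB/s, so 200 MB/s
is roughly 84% of what the two links could carry at best.</p>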

<p>If free command reordering is forbidden for a device, either
by use of the ORDERED tag or because the Queue Algorithm Modifier in the Control
Mode page is set to 0, then MPIO will have to maintain command order by
sending commands over only a single link. But in practice this case is
really rare: 99.(9)% of OSes and applications allow free command
reordering, and it is enabled by default.</p>
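
<p>The rule in the paragraph above can be sketched as a path selection
policy. The function and path names are hypothetical, and this is not how any
particular multipath implementation is written. It only encodes the decision:
when ordering must be kept, fall back to a single path, otherwise spread
commands over all paths.</p>

```python
RESTRICTED_REORDERING = 0     # Queue Algorithm Modifier value named in the text
UNRESTRICTED_REORDERING = 1   # free command reordering is allowed


def usable_paths(paths, queue_algorithm_modifier, uses_ordered_tags):
    """Return the paths that may carry commands without breaking ordering rules."""
    if uses_ordered_tags or queue_algorithm_modifier == RESTRICTED_REORDERING:
        return list(paths[:1])   # ordering matters: stay on one link
    return list(paths)           # free reordering: use every link


def dispatch(commands, paths):
    """Distribute commands round-robin over the given paths."""
    return [(tag, paths[i % len(paths)]) for i, tag in enumerate(commands)]


paths = ["path_a", "path_b"]
free = usable_paths(paths, UNRESTRICTED_REORDERING, uses_ordered_tags=False)
strict = usable_paths(paths, RESTRICTED_REORDERING, uses_ordered_tags=False)
print(dispatch([1, 2, 3], free))
print(dispatch([1, 2, 3], strict))
```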

<p>On the other hand, strictly preserving command order as MC/S does has a
downside as well. It can lead to a so-called "command ordering
bottleneck", when newer commands have to wait until one or more older
commands get executed, although it would be better for performance to
reorder them. As a result, MPIO sometimes has better performance than
MC/S, especially in setups where the maximum number of IOPS is important. See,
for instance,
<a href="http://article.gmane.org/gmane.linux.scsi/16311">here</a>.
</p>
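
<p>The bottleneck can be illustrated with a toy model of a target that
executes one command at a time. The arrival times and the service time are
made-up numbers and the function name is hypothetical. Command 0 is delayed
on a slow link while commands 1 to 3 have already arrived.</p>

```python
def finish_times(arrivals, service_time, strict_order):
    """Completion time of each command on a single executor.

    arrivals[i] is the time at which command i (numbered in submission
    order) reaches the target.
    """
    indexes = range(len(arrivals))
    if strict_order:
        order = list(indexes)                               # submission order
    else:
        order = sorted(indexes, key=lambda i: arrivals[i])  # arrival order
    clock = 0.0
    finished = [0.0] * len(arrivals)
    for i in order:
        clock = max(clock, arrivals[i]) + service_time
        finished[i] = clock
    return finished


arrivals = [10.0, 1.0, 2.0, 3.0]
strict = finish_times(arrivals, service_time=1.0, strict_order=True)
free = finish_times(arrivals, service_time=1.0, strict_order=False)
print("strict order:", strict, "mean", sum(strict) / len(strict))
print("free reorder:", free, "mean", sum(free) / len(free))
```

<p>With strict ordering every command waits behind the late one. With free
reordering only the late command itself is late.</p>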

<h2>When MC/S is better than MPIO</h2>

<p>For the sake of completeness, we should mention that there are marginal cases where MPIO can't be used or will not
provide any benefit, but MC/S can be successful:</p>

<ol>
<li><span>When strict command order is required.</span></li>

<li><span>When aborted commands can't be retried.</span></li>

</ol>

<p>For disks, both of them are always false. However, for some tape drives
and backup applications one or both can be true. But in practice:</p>

<ul>

<li><span>There are neither known tape drives nor backup
applications that can use multiple outstanding commands at
a time. All of them support and use only a single outstanding
command at a time. MC/S can't increase performance for them; only
bonding can. So, in this case there is no difference between MC/S
and MPIO.</span></li>

<li><span>The inability to retry commands is a
limitation of legacy tape drives, which support only implicit
address commands, rather than of MPIO. Modern tape drives and backup
applications can use explicit address commands, which can be
aborted and then retried, hence they are compatible with MPIO.</span></li>

</ul>
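
<p>Why explicit address commands can be retried safely is easy to see in a toy
model of a tape. The <code>Tape</code> class and its methods are hypothetical
and much simpler than a real drive. The scenario is a write that the drive
executed but whose response was lost with the failed path, so the initiator
retries it.</p>

```python
class Tape:
    """Toy sequential medium: a list of recorded blocks."""

    def __init__(self):
        self.blocks = []

    def write_implicit(self, data):
        """Implicit address: write wherever the drive currently is (the end)."""
        self.blocks.append(data)

    def write_explicit(self, address, data):
        """Explicit address: the command itself names the target block."""
        if address > len(self.blocks):
            raise ValueError("address is beyond the recorded data")
        self.blocks[address:] = [data]


def write_with_lost_response(write, *args):
    """The drive executes the write, the response is lost, the initiator retries."""
    write(*args)   # executed, but the initiator never learns about it
    write(*args)   # retry after failover to the other path


legacy = Tape()
write_with_lost_response(legacy.write_implicit, "block0")

modern = Tape()
write_with_lost_response(modern.write_explicit, 0, "block0")

print(legacy.blocks, modern.blocks)
```

<p>In the model the implicit retry records the block twice, while the explicit
retry lands on the same address and leaves a single copy.</p>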

<h2>Conclusion</h2>

<p>Thus:</p>

<ol>
<li><span>The cost of developing MC/S is high, but its benefits are marginal and can be
fully eliminated by future MPIO improvements.</span></li>

<li><span>MPIO allows existing infrastructure to be used for all
transports, not only iSCSI.
</span></li>

<li><span>All transports can benefit from improvements in MPIO.</span></li>

<li><span>With MPIO there is no need to create multiple layers providing very similar
functionality.</span></li>

<li><span>MPIO doesn't have the command ordering bottleneck that MC/S has.</span></li>

</ol>

<p>Simply put, MC/S is a workaround, done at the wrong level, for some deficiencies of the existing SCSI standards used for MPIO,
namely the lack of a way to group several I_T Nexuses with the ability to reassign commands
between them and preserve command order among them. If those features are added to the SCSI standards in the future, MC/S will
not be needed at all, hence all investments in it will be voided. It is no surprise, then, that no
Open Source OS supports it or is going to implement it. Moreover,
when back in 2005 there was an attempt to add an MC/S capable iSCSI initiator to Linux, it was
rejected. For more details see <a href="http://article.gmane.org/gmane.linux.scsi/15769">here</a>
and <a href="http://article.gmane.org/gmane.linux.scsi/16301">here</a>.
</p>

</div>
</div>
</div>
<!-- wrap ends here -->
<!-- footer starts here -->
<div id="footer">
<p>© Copyright 2004 - 2021 <b><font color="#EC981F">Vladislav Bolkhovitin, Bart Van Assche & others</font></b>
Design by: <b><font color="#EC981F">Daniel Fernandes</font></b> </p>
</div>
<!-- footer ends here -->
<!-- Piwik -->
<script type="text/javascript">
var pkBaseURL = (("https:" == document.location.protocol) ? "https://apps.sourceforge.net/piwik/scst/" : "http://apps.sourceforge.net/piwik/scst/");
document.write(unescape("%3Cscript src='" + pkBaseURL + "piwik.js' type='text/javascript'%3E%3C/script%3E"));
</script><script type="text/javascript">
piwik_action_name = '';
piwik_idsite = 1;
piwik_url = pkBaseURL + "piwik.php";
piwik_log(piwik_action_name, piwik_idsite, piwik_url);
</script>
<object><noscript><p><img src="http://apps.sourceforge.net/piwik/scst/piwik.php?idsite=1" alt="piwik"></p></noscript></object>
<!-- End Piwik Tag -->
</body>
</html>