Vladislav Bolkhovitin 3d2959d5bf Fix for qla2x00t deadlock crash
The symptom of the crash is that one finds the system deadlocked
spinning on scsi_qla_host_t.hardware_lock in qla2x00_enable_tgt_mode
with a stack something like this:

crash> bt
PID: 6155   TASK: ffff88006e4bc3c0  CPU: 1   COMMAND: "scst_uid"
 #0 [ffff88007b915b28] machine_kexec at ffffffff8103163b
 #1 [ffff88007b915b88] crash_kexec at ffffffff810b8e52
 #2 [ffff88007b915c58] panic at ffffffff814ed0ab
 #3 [ffff88007b915cd8] spin_bug at ffffffff8127cd46
 #4 [ffff88007b915d18] _raw_spin_lock at ffffffff8127d015
 #5 [ffff88007b915d68] _spin_lock_irqsave at ffffffff814f02e4
 #6 [ffff88007b915d88] qla2x00_enable_tgt_mode at ffffffffa047b672 [qla2xxx]
 #7 [ffff88007b915db8] q2t_host_action at ffffffffa06db6a6 [qla2x00tgt]
 #8 [ffff88007b915df8] q2t_enable_tgt at ffffffffa06db6ea [qla2x00tgt]
 #9 [ffff88007b915e18] scst_process_tgt_enable_store at ffffffffa04f102e [scst]
#10 [ffff88007b915e48] scst_tgt_enable_store_work_fn at ffffffffa04f1176 [scst]
#11 [ffff88007b915e58] scst_process_sysfs_works at ffffffffa04e8bbe [scst]
#12 [ffff88007b915e78] sysfs_work_thread_fn at ffffffffa04e8db5 [scst]
#13 [ffff88007b915ed8] kthread at ffffffff8108f976
#14 [ffff88007b915f48] kernel_thread at ffffffff8100c20a

I was pulling my hair out on this one, because with the spinlock
debugging (enhanced to capture the PID along with the task pointer), I
figured out that the task (and process) that originally locked the lock
was gone!  It got really confusing when I added more spinlock debug code
to the kernel to detect locks held in the task switching and
task/process termination paths -- and didn't catch anything terminating
with locks held!

I finally tracked the problem down to two things:

1.  When qla24xx_create_vhost creates a new virtual scsi_qla_host_t it
does it by copying the physical (aka parent) scsi_qla_host_t.  Under the
right unlucky conditions, this can happen with the hardware_lock held
(the spinlock is embedded in the structure).

2.  The code should only be locking the hardware_lock of the physical
scsi_qla_host_t, because the lock is associated with the hardware.
Unfortunately, quite a few places are not using to_qla_parent to make
sure they lock the correct lock.  One of those places is
qla2x00_enable_tgt_mode.  Along with the deadlock, this has the
potential to leave the hardware and driver structures in unpredictable
states, because the lock isn't always providing serialization.
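
To make the first point concrete, here is a minimal user-space sketch of the hazard. It is not driver code: toy_lock, toy_host and every toy_* function are hypothetical stand-ins invented for illustration, and the lock is a plain flag rather than a kernel spinlock. It only shows how a byte-for-byte copy taken while the parent's lock is held leaves the copy with a lock that nobody will ever release.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for a kernel spinlock: it only records state. */
struct toy_lock {
    bool locked;
    int owner_pid;
};

/* Hypothetical, heavily reduced stand-in for scsi_qla_host_t. */
struct toy_host {
    struct toy_lock hardware_lock;  /* embedded, so it is copied with the struct */
    struct toy_host *parent;        /* NULL for the physical host */
    int instance;
};

static void toy_lock_acquire(struct toy_lock *l, int pid)
{
    assert(!l->locked);  /* a real spinlock would spin here instead */
    l->locked = true;
    l->owner_pid = pid;
}

static void toy_lock_release(struct toy_lock *l)
{
    l->locked = false;
    l->owner_pid = 0;
}

/* Mimics creating a virtual host by copying the parent byte for byte. */
static void toy_create_vhost_by_copy(struct toy_host *vhost,
                                     struct toy_host *parent, int instance)
{
    memcpy(vhost, parent, sizeof(*vhost));
    vhost->parent = parent;
    vhost->instance = instance;
}

/* Returns true if the copy inherited a held lock that the parent has since released. */
static bool toy_copy_inherits_held_lock(void)
{
    struct toy_host phys = { .instance = 0 };
    struct toy_host vhost;

    toy_lock_acquire(&phys.hardware_lock, 1234);  /* another context holds the lock */
    toy_create_vhost_by_copy(&vhost, &phys, 1);   /* copy is taken while it is held */
    toy_lock_release(&phys.hardware_lock);        /* releases only the parent's lock */

    return vhost.hardware_lock.locked && !phys.hardware_lock.locked;
}
```

In this sketch the recorded owner (1234) has long since moved on, which matches the confusing observation above that the task that "locked" the lock was gone.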

The fix entails two things:

1. Zeroing the lock after copying the scsi_qla_host_t structure:  This
won't stop the deadlock, but will enable the spinlock debug code to
easily catch anything that misbehaves and locks the wrong lock.  I also
initialized the other locks because they could have the same problem.  I
also initialized the list heads, because they could end up holding
dangling references.  I did not initialize all pointers, because there
are quite a few that point to read only data and are OK (and I didn't
have time to research all of them).

2.  Using to_qla_parent everywhere a lock is taken and the
scsi_qla_host_t structure might be virtual.  This is a lot of changes,
but they are the same thing over and over again.
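
The two parts of the fix can be sketched in user space as follows. This is an illustration under stated assumptions, not the actual patch: sk_lock, sk_host and all sk_* functions are hypothetical names, the lock is a plain flag, and sk_to_parent is only assumed to behave like to_qla_parent (return the parent if there is one, otherwise the host itself).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a spinlock. */
struct sk_lock {
    bool locked;
};

/* Hypothetical, reduced stand-in for scsi_qla_host_t. */
struct sk_host {
    struct sk_lock hardware_lock;
    struct sk_host *parent;  /* NULL for the physical host */
};

/* Assumed analogue of to_qla_parent: resolve any host to its physical host. */
static struct sk_host *sk_to_parent(struct sk_host *h)
{
    return h->parent ? h->parent : h;
}

static void sk_lock_init(struct sk_lock *l)
{
    l->locked = false;
}

/* Fix part 1: reinitialize the embedded lock right after the copy. */
static void sk_create_vhost(struct sk_host *vhost, struct sk_host *parent)
{
    memcpy(vhost, parent, sizeof(*vhost));
    sk_lock_init(&vhost->hardware_lock);
    vhost->parent = parent;
}

/* Fix part 2: always take the physical host's lock, never the copy's. */
static void sk_hw_lock(struct sk_host *h)
{
    struct sk_host *p = sk_to_parent(h);

    assert(!p->hardware_lock.locked);  /* a real spinlock would spin here */
    p->hardware_lock.locked = true;
}

static void sk_hw_unlock(struct sk_host *h)
{
    sk_to_parent(h)->hardware_lock.locked = false;
}

/* Returns true if the copy starts unlocked and locking goes to the parent. */
static bool sk_fix_demo(void)
{
    struct sk_host phys = { .parent = NULL };
    struct sk_host vhost;
    bool copy_clean, parent_taken;

    phys.hardware_lock.locked = true;   /* lock is held while the copy is made */
    sk_create_vhost(&vhost, &phys);
    phys.hardware_lock.locked = false;  /* original holder releases it */

    copy_clean = !vhost.hardware_lock.locked;
    sk_hw_lock(&vhost);                 /* caller passes the virtual host */
    parent_taken = phys.hardware_lock.locked && !vhost.hardware_lock.locked;
    sk_hw_unlock(&vhost);

    return copy_clean && parent_taken && !phys.hardware_lock.locked;
}
```

With both parts in place, the copy's own lock is never taken in this sketch, so any code path that does take it stands out to lock debugging.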

I did not make an effort to look for scalar or pointer fields that are
being picked from the wrong structure.  That's getting to be as much
pain as merging up to the latest QLogic driver (which would have gotten
rid of this problem).

From "Robinson, Herbie" <Herbie.Robinson@stratus.com>




git-svn-id: http://svn.code.sf.net/p/scst/svn/trunk@4420 d57e44dd-8a1f-0410-8b47-8ef2f437770f
2012-07-24 19:01:47 +00:00

This is the SCST development repository. It contains not a single SCST
project, as one might think, but a number of subprojects, organized as
follows:

1. SCST core in scst/ subdirectory

2. Administration utility for SCST core scstadmin in scstadmin/

3. Target drivers in their own subdirectories: qla2x00t/, iscsi-scst/, etc.

4. User space programs in usr/ subdirectory, like fileio_tgt.

5. Various docs in doc/ subdirectory.

These subprojects are in most cases independent of each other,
although some of them depend on the SCST core. They are kept in a single
repository only to simplify their development; they are released
independently.

Thus, use "make all" only if you really need to build everything.
Otherwise, build only what you need. For example, for iSCSI-SCST:

make scst scst_install iscsi iscsi_install

For more information about each subproject, see its README file.

Vladislav Bolkhovitin <vst@vlnb.net>, http://scst.sourceforge.net