I'm not sure how this happened but the patch that was intended to fix abort
handling was incomplete. This patch fixes that patch as follows:
- If aborting the SCSI command failed, wait until the SCSI command
completes.
- Return SUCCESS instead of FAILED if an abort attempt races with SCSI
command completion.
- Since qla2xxx_eh_abort() increments the sp reference count by calling
sp_get(), decrement the sp reference count before returning.
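The reference counting rule above can be sketched in userspace C. This is a minimal illustration, not the driver code: struct srb_sketch, sp_get_sketch(), sp_put_sketch() and eh_abort_sketch() are hypothetical stand-ins, the wait for completion is only simulated, and the value returned on the failed-abort path is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the driver's SRB; only what the sketch needs. */
struct srb_sketch {
	int ref_count;   /* references currently held on this SRB */
	bool completed;  /* set once the SCSI command has completed */
};

enum eh_result { EH_SUCCESS, EH_FAILED };

static void sp_get_sketch(struct srb_sketch *sp)
{
	sp->ref_count++;
}

static void sp_put_sketch(struct srb_sketch *sp)
{
	assert(sp->ref_count > 0); /* a put must match an earlier get */
	sp->ref_count--;
}

/*
 * abort_ok simulates whether the abort request itself succeeded.
 * The single exit path drops the reference taken at the top.
 */
static enum eh_result eh_abort_sketch(struct srb_sketch *sp, bool abort_ok)
{
	enum eh_result ret;

	sp_get_sketch(sp);
	if (sp->completed) {
		/* Abort raced with completion: nothing left to abort. */
		ret = EH_SUCCESS;
	} else if (abort_ok) {
		sp->completed = true;
		ret = EH_SUCCESS;
	} else {
		/* The driver would block here until completion; simulated. */
		sp->completed = true;
		ret = EH_FAILED; /* assumption of this sketch, not from the patch */
	}
	sp_put_sketch(sp);
	return ret;
}
```

Using a single exit point makes it harder to add a return path that forgets the put.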
Fixes: 1b4ae64d8da6 ("qla2xxx: Fix a race condition between aborting and completing a SCSI command")
[ commit 8dd9593cc07ad7d999bef81b06789ef873a94881 upstream ]
git-svn-id: http://svn.code.sf.net/p/scst/svn/trunk@8538 d57e44dd-8a1f-0410-8b47-8ef2f437770f
Since qla2x00_abort_srb() starts with increasing the reference count of
@sp, decrease that same reference count before returning.
[ commit d2d2b5a5741d317bed1fa38211f1f3b142d8cf7a upstream ]
git-svn-id: http://svn.code.sf.net/p/scst/svn/trunk@8537 d57e44dd-8a1f-0410-8b47-8ef2f437770f
The current driver reports a dev_loss_tmo of 0 for NVMe devices on a short
cable pull. This causes the NVMe controller to be freed along with the NVMe
namespace. The side effect is that IO stops. By not setting dev_loss_tmo to
0, the NVMe namespace stays until the cable is plugged back in. This allows
IO to resume afterward.
[ commit 03cc44bf682af289d6536eb911e928b415bd0e1f upstream ]
git-svn-id: http://svn.code.sf.net/p/scst/svn/trunk@8535 d57e44dd-8a1f-0410-8b47-8ef2f437770f
Leverage the generic routine, qla24xx_update_fw_options(), for the
configuration of firmware options for ISP27xx/ISP28xx.
[ commit a36f1443e6fc738c1bcfc4be80d6f1609163c614 upstream ]
git-svn-id: http://svn.code.sf.net/p/scst/svn/trunk@8534 d57e44dd-8a1f-0410-8b47-8ef2f437770f
The following sequence of events leads to the NVMe port disappearing:
- device port shut
- nvme_fc_unregister_remoteport
- device port online
- remote port delete completes
- relogin is scheduled
- "post gidpn" message appears due to rscn generation # mismatch
In short, if a device comes back online sooner than an unregister
completion, a mismatch in rscn generation number occurs, which is not
handled correctly during device relogin. Fix this by starting with a redo
of GNL.
When ql2xextended_error_logging is enabled, the re-plugged device's
discovery stops with the following messages printed:
--8<--
qla2xxx [0000:41:00.0]-480d:3: Relogin scheduled.
qla2xxx [0000:41:00.0]-4800:3: DPC handler sleeping.
qla2xxx [0000:41:00.0]-2902:3: qla24xx_handle_relogin_event 21:00:00:24:ff:17:9e:91 DS 0 LS 7 P 0 del 2 cnfl (null) rscn 1|2 login 1|2 fl 1
qla2xxx [0000:41:00.0]-28e9:3: qla24xx_handle_relogin_event 1666 21:00:00:24:ff:17:9e:91 post gidpn
qla2xxx [0000:41:00.0]-480e:3: Relogin end.
--8<--
[ commit 9e744591ef1b8df27c25c68dac858dada8688f77 upstream ]
git-svn-id: http://svn.code.sf.net/p/scst/svn/trunk@8533 d57e44dd-8a1f-0410-8b47-8ef2f437770f
The current code does not recover properly from early initialization
failures. This patch makes the following changes:
- Properly unwind allocations during probe() failures.
- Protect against non-initialization memory allocations during
unwinding.
- Propagate error status during HW initialization.
- Release SCSI host reference when memory allocations fail.
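The unwinding pattern behind the list above can be sketched in userspace C with goto labels. This is a minimal illustration under stated assumptions: struct host_sketch, alloc_step(), hw_init_step() and the fail_step test hook are hypothetical and do not correspond to driver functions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical host state; field names are illustrative only. */
struct host_sketch {
	void *req_ring;
	void *rsp_ring;
	int host_refs;  /* stands in for the SCSI host reference */
};

/* fail_step selects which step fails (0 = none); a test hook only. */
static void *alloc_step(int step, int fail_step, size_t len)
{
	return step == fail_step ? NULL : malloc(len);
}

static int hw_init_step(int fail_step)
{
	return fail_step == 3 ? -EIO : 0;
}

static int probe_sketch(struct host_sketch *h, int fail_step)
{
	int ret;

	assert(h != NULL);
	/* Start from NULL so unwinding never frees an unset pointer. */
	h->req_ring = NULL;
	h->rsp_ring = NULL;
	h->host_refs = 1;

	ret = -ENOMEM;
	h->req_ring = alloc_step(1, fail_step, 64);
	if (!h->req_ring)
		goto put_host;

	h->rsp_ring = alloc_step(2, fail_step, 64);
	if (!h->rsp_ring)
		goto free_req;

	ret = hw_init_step(fail_step); /* propagate the real error code */
	if (ret)
		goto free_rsp;

	return 0;

free_rsp:
	free(h->rsp_ring);
	h->rsp_ring = NULL;
free_req:
	free(h->req_ring);
	h->req_ring = NULL;
put_host:
	h->host_refs--; /* release the host reference on every failure */
	return ret;
}
```

Each label undoes exactly one earlier step, so a failure at any point releases only what was acquired before it.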
[ commit 26a77799195f4ff105f877042012c7fb355b3da1 upstream ]
git-svn-id: http://svn.code.sf.net/p/scst/svn/trunk@8532 d57e44dd-8a1f-0410-8b47-8ef2f437770f
Every qla2xxx async command is sent using an SRB buffer. While the SRB
buffer is being set up, the timer for the command is started before all
memory allocation has finished. Under memory pressure, a memory allocation
can sleep and not wake up before the timer expires. Once the timer has
expired, the timer thread accesses uninitialized fields, resulting in a
NULL pointer crash.
This patch fixes this crash by starting the timer only after everything
is set up.
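The ordering rule can be sketched in userspace C. This is a minimal illustration, not the driver code: struct async_cmd_sketch, cmd_setup_sketch() and timer_fire_sketch() are hypothetical names, and the timer is modelled as a flag plus an explicit fire function.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical async command; only the fields the sketch needs. */
struct async_cmd_sketch {
	void (*timeout_cb)(struct async_cmd_sketch *cmd);
	void *buf;
	bool timer_armed;
	bool timed_out;
};

static void on_timeout_sketch(struct async_cmd_sketch *cmd)
{
	cmd->timed_out = true;
}

/* Simulated timer expiry: only an armed timer may run the callback. */
static void timer_fire_sketch(struct async_cmd_sketch *cmd)
{
	if (!cmd->timer_armed)
		return;
	assert(cmd->timeout_cb != NULL); /* holds because arming is last */
	cmd->timeout_cb(cmd);
}

static int cmd_setup_sketch(struct async_cmd_sketch *cmd, size_t len)
{
	memset(cmd, 0, sizeof(*cmd));

	/* In the kernel this allocation may sleep for a long time. */
	cmd->buf = malloc(len);
	if (!cmd->buf)
		return -ENOMEM; /* timer was never armed, nothing can fire */

	cmd->timeout_cb = on_timeout_sketch;

	/* Arm the timer last, once every field it uses is initialized. */
	cmd->timer_armed = true;
	return 0;
}
```

If the allocation fails or stalls, the timer is not yet armed, so the timeout handler cannot run against a half-initialized command.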
The backtrace shows the following:
PID: 3720 TASK: ffff996928401040 CPU: 0 COMMAND: "qla2xxx_1_dpc"
0 [ffff99652751b698] __schedule at ffffffff965676c7
1 [ffff99652751b728] schedule at ffffffff96567bc9
2 [ffff99652751b738] schedule_timeout at ffffffff965655e8
3 [ffff99652751b7e0] io_schedule_timeout at ffffffff9656726d
4 [ffff99652751b810] congestion_wait at ffffffff95fd8d12
5 [ffff99652751b870] isolate_migratepages_range at ffffffff95fddaf3
6 [ffff99652751b930] compact_zone at ffffffff95fdde96
7 [ffff99652751b980] compact_zone_order at ffffffff95fde0bc
8 [ffff99652751ba20] try_to_compact_pages at ffffffff95fde481
9 [ffff99652751ba80] __alloc_pages_direct_compact at ffffffff9655cc31
10 [ffff99652751bae0] __alloc_pages_slowpath at ffffffff9655d101
11 [ffff99652751bbd0] __alloc_pages_nodemask at ffffffff95fc0e95
12 [ffff99652751bc80] dma_generic_alloc_coherent at ffffffff95e3217f
13 [ffff99652751bcc8] x86_swiotlb_alloc_coherent at ffffffff95e6b7a1
14 [ffff99652751bcf8] qla2x00_rft_id at ffffffffc055b5e0 [qla2xxx]
15 [ffff99652751bd50] qla2x00_loop_resync at ffffffffc0533e71 [qla2xxx]
16 [ffff99652751be68] qla2x00_do_dpc at ffffffffc05210ca [qla2xxx]
PID: 0 TASK: ffffffff96a18480 CPU: 0 COMMAND: "swapper/0"
0 [ffff99652fc03ae0] machine_kexec at ffffffff95e63674
1 [ffff99652fc03b40] __crash_kexec at ffffffff95f1ce12
2 [ffff99652fc03c10] crash_kexec at ffffffff95f1cf00
3 [ffff99652fc03c28] oops_end at ffffffff9656c758
4 [ffff99652fc03c50] no_context at ffffffff9655aa7e
5 [ffff99652fc03ca0] __bad_area_nosemaphore at ffffffff9655ab15
6 [ffff99652fc03cf0] bad_area_nosemaphore at ffffffff9655ac86
7 [ffff99652fc03d00] __do_page_fault at ffffffff9656f6b0
8 [ffff99652fc03d70] do_page_fault at ffffffff9656f915
9 [ffff99652fc03da0] page_fault at ffffffff9656b758
[exception RIP: unknown or invalid address]
RIP: 0000000000000000 RSP: ffff99652fc03e50 RFLAGS: 00010202
RAX: 0000000000000000 RBX: ffff99652b79a600 RCX: ffff99652b79a760
RDX: ffff99652b79a600 RSI: ffffffffc0525ad0 RDI: ffff99652b79a600
RBP: ffff99652fc03e60 R8: ffffffff96a18a18 R9: ffffffff96ee3c00
R10: 0000000000000002 R11: ffff99652fc03de8 R12: ffff99652b79a760
R13: 0000000000000100 R14: ffffffffc0525ad0 R15: ffff99652b79a600
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
10 [ffff99652fc03e50] qla2x00_sp_timeout at ffffffffc0525af8 [qla2xxx]
11 [ffff99652fc03e68] call_timer_fn at ffffffff95ea7f58
12 [ffff99652fc03ea0] run_timer_softirq at ffffffff95eaa3bd
13 [ffff99652fc03f18] __do_softirq at ffffffff95ea0f05
14 [ffff99652fc03f88] call_softirq at ffffffff9657832c
15 [ffff99652fc03fa0] do_softirq at ffffffff95e2e675
16 [ffff99652fc03fc0] irq_exit at ffffffff95ea1285
17 [ffff99652fc03fd8] smp_apic_timer_interrupt at ffffffff965796c8
18 [ffff99652fc03ff0] apic_timer_interrupt at ffffffff96575df2
[ commit 3a4b6cc7332130ac5cbf3b505d8cddf0aa2ea745 upstream ]
git-svn-id: http://svn.code.sf.net/p/scst/svn/trunk@8530 d57e44dd-8a1f-0410-8b47-8ef2f437770f
Relogin fails to move forward because the scan_state flag indicates that the
device is not there. Before the relogin process, the session delete process
accidentally modified the scan_state flag.
[ commit 8b5292bcfcacf15182a77a973a98d310e76fd58b upstream ]
git-svn-id: http://svn.code.sf.net/p/scst/svn/trunk@8529 d57e44dd-8a1f-0410-8b47-8ef2f437770f
Reject eh_{abort|device_reset|target_reset} when the rport is being torn
down or the chip is down.
[ commit 7f4374e67b3046c9628cf0ab93a117704a38e95d upstream ]
git-svn-id: http://svn.code.sf.net/p/scst/svn/trunk@8528 d57e44dd-8a1f-0410-8b47-8ef2f437770f
A firmware dump captured during a LOOP Init error does not yield any
significant information. This patch removes the call that triggers firmware
dump collection during Loop Initialization.
[ commit 5e5402c147083786db2238302e25c44b7a7dc5e9 upstream ]
git-svn-id: http://svn.code.sf.net/p/scst/svn/trunk@8527 d57e44dd-8a1f-0410-8b47-8ef2f437770f
For target mode, the default number of Q-Pairs allowed is 2. If the number
of Q-Pairs allocated is lower than the default, then the lower value should
be set as the default.
[ commit 178235f43ea142cf0f07dba67657494fcec21254 upstream ]
git-svn-id: http://svn.code.sf.net/p/scst/svn/trunk@8526 d57e44dd-8a1f-0410-8b47-8ef2f437770f
If an abort times out, the Abort IOCB completion and the Abort timer can race
against each other. This patch provides a unique error code for the timer
path to allow proper cleanup.
[ commit 0c6df59061b23c7a951836d23977be34e896d3da upstream ]
git-svn-id: http://svn.code.sf.net/p/scst/svn/trunk@8525 d57e44dd-8a1f-0410-8b47-8ef2f437770f
With a debug kernel we see the following warnings indicating a memory leak:
[28809.523959] WARNING: CPU: 3 PID: 6790 at lib/dma-debug.c:978
dma_debug_device_change+0x166/0x1d0
[28809.523964] pci 0000:0c:00.6: DMA-API: device driver has pending DMA
allocations while released from device [count=5]
[28809.523964] One of leaked entries details: [device
address=0x00000002aefe4000] [size=8208 bytes] [mapped with DMA_BIDIRECTIONAL]
[mapped as coherent]
Fix this by unmapping DMA memory.
[ commit 5d328de64d89400dcf9911125844d8adc0db697f upstream ]
git-svn-id: http://svn.code.sf.net/p/scst/svn/trunk@8523 d57e44dd-8a1f-0410-8b47-8ef2f437770f
Use vzalloc instead of using vmalloc to allocate memory and then zeroing it
with memset. This simplifies the code.
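The same simplification can be shown with a userspace analogy, assuming malloc()/memset() in place of vmalloc()/memset() and calloc() in place of vzalloc(). The helper names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Before: allocate, then clear in a second step. */
static void *alloc_then_clear(size_t len)
{
	void *buf = malloc(len);

	if (buf)
		memset(buf, 0, len);
	return buf;
}

/* After: a single call that returns zeroed memory. */
static void *alloc_zeroed(size_t len)
{
	return calloc(1, len);
}

/* Helper for checking that a buffer contains only zero bytes. */
static int is_all_zero(const void *buf, size_t len)
{
	const unsigned char *p = buf;
	size_t i;

	assert(buf != NULL);
	for (i = 0; i < len; i++) {
		if (p[i] != 0)
			return 0;
	}
	return 1;
}
```

Both helpers return the same result; the second leaves no window in which the buffer exists but has not been cleared.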
[ commit 56cc8fae5f7e9f38cb367754c52491ba1645d1bf upstream ]
git-svn-id: http://svn.code.sf.net/p/scst/svn/trunk@8522 d57e44dd-8a1f-0410-8b47-8ef2f437770f
A NULL check before dma_pool_destroy() is redundant, so remove it. This was
detected by coccinelle.
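The reason the caller-side check is redundant is that the destroy function itself tolerates NULL. A userspace sketch of that convention follows; struct pool_sketch, pool_create_sketch() and pool_destroy_sketch() are hypothetical names, not the DMA pool API.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical pool object used only for this illustration. */
struct pool_sketch {
	void *backing;
	size_t size;
};

static struct pool_sketch *pool_create_sketch(size_t size)
{
	struct pool_sketch *p = malloc(sizeof(*p));

	if (!p)
		return NULL;
	p->backing = malloc(size);
	if (!p->backing) {
		free(p);
		return NULL;
	}
	p->size = size;
	return p;
}

/* Accepts NULL, like free(); callers need no check of their own. */
static void pool_destroy_sketch(struct pool_sketch *p)
{
	if (!p)
		return;
	assert(p->backing != NULL);
	free(p->backing);
	free(p);
}
```

With this convention, an error path can call the destroy function unconditionally instead of wrapping every call in an if statement.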
[ commit 0b3b6fe299c471e44ed8713b7a602882626e693f upstream ]
git-svn-id: http://svn.code.sf.net/p/scst/svn/trunk@8521 d57e44dd-8a1f-0410-8b47-8ef2f437770f
Since r8322 was the result of an incorrect application of an upstream patch,
revert it.
git-svn-id: http://svn.code.sf.net/p/scst/svn/trunk@8520 d57e44dd-8a1f-0410-8b47-8ef2f437770f
r8478 was not necessary to fix the reported problem. Additionally, it
introduced a new problem, namely that detach_tgt was not called if the
associated device was deleted after the LUN was deleted and before it
was freed.
git-svn-id: http://svn.code.sf.net/p/scst/svn/trunk@8517 d57e44dd-8a1f-0410-8b47-8ef2f437770f
Avoid that deleting a device concurrently with scst_cm_init_inq_finish()
causes command processing to hang.
Reported-by: valera <valer4ik@users.sourceforge.net>
Fixes: 0bb6de9471 ("scst_vdisk: Avoid that LUN refresh triggers a general protection fault" / r7101)
git-svn-id: http://svn.code.sf.net/p/scst/svn/trunk@8515 d57e44dd-8a1f-0410-8b47-8ef2f437770f
This patch makes the implementation of scst_prepare_request_sense() consistent
with the other code that submits internal commands.
git-svn-id: http://svn.code.sf.net/p/scst/svn/trunk@8512 d57e44dd-8a1f-0410-8b47-8ef2f437770f
This patch does not change any functionality but improves source code
readability.
git-svn-id: http://svn.code.sf.net/p/scst/svn/trunk@8511 d57e44dd-8a1f-0410-8b47-8ef2f437770f
Some scripts that use the SCST sysfs interface depend on filp_close()
having been called before device deletion via the sysfs interface
finishes. Hence make device deletion synchronous again.
git-svn-id: http://svn.code.sf.net/p/scst/svn/trunk@8510 d57e44dd-8a1f-0410-8b47-8ef2f437770f
Merge scst_free_device() and scst_finally_free_device() into a single
function. Increase dev->refcnt when registering a device or virtual
device. Kill and decrease dev->refcnt when unregistering a device or
virtual device. These changes ensure that scst_free_device() is
only called after all users (commands and tgt_devs) have stopped
accessing the SCST device.
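The lifetime rule described above can be sketched with a plain counter in userspace C. This is a simplified model under stated assumptions: struct dev_sketch and the dev_*_sketch() helpers are hypothetical, the counter is not atomic, and setting a flag stands in for scst_free_device().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical device object; a flag stands in for the actual free. */
struct dev_sketch {
	int refcnt;
	bool registered;
	bool freed;
};

static void dev_get_sketch(struct dev_sketch *dev)
{
	assert(!dev->freed); /* no new users after the final put */
	dev->refcnt++;
}

static void dev_put_sketch(struct dev_sketch *dev)
{
	assert(dev->refcnt > 0);
	if (--dev->refcnt == 0)
		dev->freed = true; /* last user gone: free the device */
}

/* Registration holds one reference for as long as the device exists. */
static void dev_register_sketch(struct dev_sketch *dev)
{
	dev->refcnt = 1;
	dev->registered = true;
	dev->freed = false;
}

/* Unregistration drops the registration reference. */
static void dev_unregister_sketch(struct dev_sketch *dev)
{
	dev->registered = false;
	dev_put_sketch(dev);
}
```

In this model a command that still holds a reference keeps the device alive after unregistration, and the device is freed only when that command drops its reference.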
Fixes: 3f2d50b589 ("scst: Do not suspend command processing when deleting a device"; r8067)
git-svn-id: http://svn.code.sf.net/p/scst/svn/trunk@8501 d57e44dd-8a1f-0410-8b47-8ef2f437770f
scst_free_tgt_dev() waits until pending I/O commands have finished and hence
can take a while.
git-svn-id: http://svn.code.sf.net/p/scst/svn/trunk@8487 d57e44dd-8a1f-0410-8b47-8ef2f437770f