Commit Graph

9282 Commits

Author SHA1 Message Date
Brian M
b92e091999 scst: add pr_state sysfs attribute for PR state save/restore
Add a read/write pr_state attribute to scst_device that serializes the
current persistent reservation state (generation, reservation type/scope,
and all registrants with their transport IDs) to a text format, and
restores it from the same format.

This provides a stable interface for saving and restoring PR state across
device transitions where the in-memory state would otherwise be lost.
2026-03-12 13:58:22 +03:00
Brian M
7a4ab042e6 iscsi-scst: Fix performance regression
A recent change in iscsi_threads_pool_get changed 2 from being the
minimum count to the maximum.  Rectify.
2026-03-04 11:41:03 +03:00
Brian M
84a23a254b qla2xxx: fix session free stall by using scst_unregister_session wait=0
Calling scst_unregister_session(wait=1) from qlt_free_session_done
blocks the qla2xxx_wq worker until session teardown completes, but
teardown requires the TM thread to process SCST_UNREG_SESS_TM while
the TM thread is blocked on scst_mutex held by concurrent session
teardown in the global management thread. Under load this stall
exceeds the hung-task timeout. Switch to wait=0 and wait on
fcport->unreg_done instead, matching the pattern in iscsi-scst.
2026-03-02 10:18:30 +03:00
Brian M
1e94a45bd3 scstadmin/init.d: unload qla2x00tgt before qla2xxx_scst
qla2x00tgt holds a reference on qla2xxx_scst, so it must be unloaded
first.  The wrong order caused a 30-second timeout on every shutdown.
2026-03-02 10:11:36 +03:00
Brian M
7ba376819c debian, dev_handlers: Allow conditional compilation of modules
Add the ability to turn off compilation of scst_local, srpt and
several dev_handler. Default to compiling (i.e. no change).
2026-02-18 10:17:23 +03:00
Gleb Chesnokov
8d1673b2ac qla2x00t-32gbit: Fix build for kernel versions before v4.8 2026-02-12 11:19:30 +03:00
Gleb Chesnokov
afe15f2755 qla2x00t-32gbit: edif: Fix dma_free_coherent() size
Earlier in the function, the ha->flt buffer is allocated with size
sizeof(struct qla_flt_header) + FLT_REGIONS_SIZE but freed in the error
path with size SFP_DEV_SIZE.

Fixes: 84318a9f01ce ("scsi: qla2xxx: edif: Add send, receive, and accept for auth_els")
Cc: stable@vger.kernel.org
Signed-off-by: Thomas Fourier <fourier.thomas@gmail.com>
Link: https://patch.msgid.link/20260112134326.55466-2-fourier.thomas@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
[ commit 56bd3c0f749f upstream ]
2026-02-12 11:19:30 +03:00
Gleb Chesnokov
44e851bddd qla2x00t-32gbit: Sanitize payload size to prevent member overflow
In qla27xx_copy_fpin_pkt() and qla27xx_copy_multiple_pkt(), the frame_size
reported by firmware is used to calculate the copy length into
item->iocb. However, the iocb member is defined as a fixed-size 64-byte
array within struct purex_item.

If the reported frame_size exceeds 64 bytes, subsequent memcpy calls will
overflow the iocb member boundary. While extra memory might be allocated,
this cross-member write is unsafe and triggers warnings under
CONFIG_FORTIFY_SOURCE.

Fix this by capping total_bytes to the size of the iocb member (64 bytes)
before allocation and copying. This ensures all copies remain within the
bounds of the destination structure member.

Fixes: 875386b98857 ("scsi: qla2xxx: Add Unsolicited LS Request and Response Support for NVMe")
Signed-off-by: Jiasheng Jiang <jiashengjiangcool@gmail.com>
Reviewed-by: Himanshu Madhani <hmadhani2024@gmail.com>
Link: https://patch.msgid.link/20260106205344.18031-1-jiashengjiangcool@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
[ commit 19bc5f2a6962 upstream ]
2026-02-12 11:19:30 +03:00
Gleb Chesnokov
2e4f2dc5dc qla2x00t-32gbit: Enable/disable IRQD_NO_BALANCING during reset
A dynamic remove/add storage adapter test hits EEH on PowerPC:

  EEH: [c00000000004f77c] __eeh_send_failure_event+0x7c/0x160
  EEH: [c000000000048464] eeh_dev_check_failure.part.0+0x254/0x660
  EEH: [c000000000934e0c] __pci_read_msi_msg+0x1ac/0x280
  EEH: [c000000000100f68] pseries_msi_compose_msg+0x28/0x40
  EEH: [c00000000020e1cc] irq_chip_compose_msi_msg+0x5c/0x90
  EEH: [c000000000214b1c] msi_domain_set_affinity+0xbc/0x100
  EEH: [c000000000206be4] irq_do_set_affinity+0x214/0x2c0
  EEH: [c000000000206e04] irq_set_affinity_locked+0x174/0x230
  EEH: [c000000000207044] irq_set_affinity+0x64/0xa0
  EEH: [c000000000212890] write_irq_affinity.constprop.0.isra.0+0x130/0x150
  EEH: [c00000000068868c] proc_reg_write+0xfc/0x160
  EEH: [c0000000005adb48] vfs_write+0xf8/0x4e0
  EEH: [c0000000005ae234] ksys_write+0x84/0x140
  EEH: [c00000000002e994] system_call_exception+0x164/0x310
  EEH: [c00000000000bfe8] system_call_vectored_common+0xe8/0x278

The irqbalance daemon kicks in before invoking qla2xxx->slot_reset
during the EEH recovery process.

  irqbalance daemon
  ->irq_set_affinity()
  ->msi_domain_set_affinity()
  ->irq_chip_set_affiinity_parent()
  ->xive_irq_set_affinity()
  ->pseries_msi_compose_ms()
  ->__pci_read_msi_msg()
  ->irq_chip_compose_msi_msg()

In __pci_read_msi_msg(), the first MSI-X vector is set to all F by the
irqbalance daemon.  pci_write_msg_msix: index=0, lo=ffffffff hi=fffffff

IRQ balancing is not required during adapter reset.

Enable "IRQ_NO_BALANCING" bit before starting adapter reset and disable
it calling pci_restore_state(). The irqbalance daemon is disabled for
this short period of time (~2s).

Co-developed-by: Kyle Mahlkuch <Kyle.Mahlkuch@ibm.com>
Signed-off-by: Kyle Mahlkuch <Kyle.Mahlkuch@ibm.com>
Signed-off-by: Wen Xiong <wenxiong@linux.ibm.com>
Link: https://patch.msgid.link/20251028142427.3969819-3-wenxiong@linux.ibm.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
[ commit eaea513077cd upstream ]
2026-02-12 11:19:30 +03:00
Gleb Chesnokov
15d25e24a5 nightly build: Update kernel versions
Another kernel versions update
2026-02-12 10:55:58 +03:00
Lev Vainblat
6c6e7251b2 iscsi-scst: Fix wrong variable in chap_calc_digest_af_alg memcpy
Use 'bytes' (the return value from af_alg_final) instead of 'res'
(which is 0 after the last successful af_alg_update call) when
copying the digest.

This bug caused the memcpy to copy 0 bytes, resulting in an
uninitialized digest buffer. It also triggered a GCC
-Werror=stringop-overflow warning because 'res' could theoretically
be negative, leading to a huge unsigned size.
2026-01-23 12:43:06 +03:00
Gleb Chesnokov
a2add2daa3 scstadmin.spec: Drop legacy symlink check in RPM %files
The symlink-based %files conditional was added when scstadmin supported
a procfs variant. Procfs support is legacy now, so drop the conditional
and always package the man pages.
2025-12-31 14:55:44 +03:00
Gleb Chesnokov
2d5862d4ae scst/include/backport.h: Unbreak the Ubuntu 20.04/22.04 build 2025-12-30 11:35:28 +03:00
Gleb Chesnokov
b2a1f6e66a Bump the version number to 3.11.0-pre
These changes have been generated by running the following command:

$ scripts/update-version 3 11 0 -pre
2025-12-29 13:06:44 +03:00
Gleb Chesnokov
1b48521653 nightly build: Update kernel versions v3.10 2025-12-29 12:40:21 +03:00
Gleb Chesnokov
3f9bb45ccb www: Update the version number from 3.8 to 3.9 2025-12-29 12:40:21 +03:00
Gleb Chesnokov
d4cb03e2b8 Bump the version number to 3.10.0
These changes have been generated by running the following command:

$ scripts/update-version 3 10 0
2025-12-29 12:40:21 +03:00
Gleb Chesnokov
60ba03998c scst/ChangeLog: Summarize the changes for the upcoming 3.10 release 2025-12-29 12:40:21 +03:00
Gleb Chesnokov
d11040a0b1 scst_lib: Use bdev_fput() to release bdev files
See also upstream commit 22650a99821d ("fs,block: yield devices
early") # v6.9.
2025-12-12 13:55:53 +03:00
Tony Battersby
1a7cfc8e68 scst_cmd_set_sn: remove lockless fast path
The lockless fast path for scst_cmd_set_sn() can cause commands to
lockup in state EXEC_CHECK_SN when there are multiple scst_tgts
accessing the same scst_device, for example two initiators
connected to the two ports of a dual-port QLogic FC HBA in target mode
both reading from the same shared disk.  The multithreaded_init_done
value is too low-level for this; it does not take the higher-level
configuration into account.

- Remove the lockless fast path.
- Remove multithreaded_init_done, which enabled/disabled the lockless
  fast path.
- Push the locking down into scst_cmd_set_sn(), which will now apply
  regardless of set_sn_on_restart_cmd, which matters for mixed-driver
  (e.g. iSCSI+qla2xxx) target-mode setups.
- Remove a bunch of comments explaining the rules for the lockless fast
  path.

Fixes: https://github.com/SCST-project/scst/issues/333
Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
2025-12-11 12:19:22 +03:00
Gleb Chesnokov
cb6cdf3a82 qla2x00t, qla2x00t-32gbit: Port to Linux kernel v6.19
Support for the following changes in the Linux kernel v6.19:

  - 383d89699c50 ("treewide: Drop pci_save_state() after
    pci_restore_state()")
2025-12-10 21:42:47 +03:00
Gleb Chesnokov
12c870abe9 qla2x00t-32gbit: Fix improper freeing of purex item
In qla2xxx_process_purls_iocb(), an item is allocated via
qla27xx_copy_multiple_pkt(), which internally calls
qla24xx_alloc_purex_item().

The qla24xx_alloc_purex_item() function may return a pre-allocated item
from a per-adapter pool for small allocations, instead of dynamically
allocating memory with kzalloc().

An error handling path in qla2xxx_process_purls_iocb() incorrectly uses
kfree() to release the item. If the item was from the pre-allocated
pool, calling kfree() on it is a bug that can lead to memory corruption.

Fix this by using the correct deallocation function,
qla24xx_free_purex_item(), which properly handles both dynamically
allocated and pre-allocated items.

Fixes: 875386b98857 ("scsi: qla2xxx: Add Unsolicited LS Request and Response Support for NVMe")
Signed-off-by: Zilin Guan <zilin@seu.edu.cn>
Reviewed-by: Himanshu Madhani <hmadhani2024@gmail.com>
Link: https://patch.msgid.link/20251113151246.762510-1-zilin@seu.edu.cn
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
[ commit 78b1a242fe61 upstream ]
2025-12-10 21:42:47 +03:00
Gleb Chesnokov
b5777ff929 scst: annotate workqueues for WQ_PERCPU / WQ_UNBOUND
Upstream workqueue changes introduce a new WQ_PERCPU flag and plan to
switch alloc_workqueue()'s default from per-CPU to unbound

To kepp SCST behaviour unchanged across kernels, this patch makes all
alloc_workqueue() users explicit about whether they want per-CPU or
unbound queues.
2025-12-10 21:42:47 +03:00
Gleb Chesnokov
b097d010fd qla2x00t-32gbit: Replace use of system_unbound_wq with system_dfl_wq
Currently if a user enqueue a work item using schedule_delayed_work()
the used wq is "system_wq" (per-cpu wq) while queue_delayed_work() use
WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to
schedule_work() that is using system_wq and queue_work(), that makes use
again of WORK_CPU_UNBOUND.

This lack of consistency cannot be addressed without refactoring the
API.

system_unbound_wq should be the default workqueue so as not to enforce
locality constraints for random work whenever it's not required.

Adding system_dfl_wq to encourage its use when unbound work should be
used.

The old system_unbound_wq will be kept for a few release cycles.

Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
Link: https://patch.msgid.link/20251031095643.74246-2-marco.crivellari@suse.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
[ commit 49783aca15fb upstream ]
2025-12-10 21:42:47 +03:00
Gleb Chesnokov
fe28091a05 qla2x00t-32gbit: Backport to older kernel versions 2025-12-09 22:33:47 +03:00
Tony Battersby
64918c69a2 qla2x00t-32gbit: target: Add on_abort_cmd callback
This enables the initiator to abort commands that are stuck pending in
the HW without waiting for a timeout.

Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
2025-12-09 22:33:47 +03:00
Tony Battersby
f4b52771b6 qla2x00t-32gbit: target: Improve safety of cmd lookup by handle
The driver associates two different structs with numeric handles and
passes the handles to the hardware.  When the hardware passes the handle
back to the driver, the driver consults a table of void * to convert the
handle back to the struct without checking the type of struct.  This can
lead to type confusion if the HBA firmware misbehaves (and some firmware
versions do).  So verify the type of struct is what is expected before
using it.

But we can also do better than that.  Also verify that the exchange
address of the message sent from the hardware matches the exchange
address of the command being returned.  This adds an extra guard against
buggy HBA firmware that returns duplicate messages multiple times (which
has also been seen) in case the driver has reused the handle for a
different command of the same type.

These problems were seen on a QLE2694L with firmware 9.08.02 when
testing SLER / SRR support.  The SRR caused the HBA to flood the
response queue with hundreds of bogus entries.

Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
Link: https://patch.msgid.link/7c7cb574-fe62-42ae-b800-d136d8dd89ca@cybernetics.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
[ commit 4f5eb50f7c82 upstream ]
2025-12-09 22:33:47 +03:00
Tony Battersby
6f2d20360e qla2x00t-32gbit: target: Add back SRR support
Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
2025-12-09 22:33:47 +03:00
Tony Battersby
46e11e4b32 qla2x00t-32gbit: target: Add back SRR support
Background: loading qla2xxx with "ql2xtgt_tape_enable=1" enables
Sequence Level Error Recovery (SLER), which is most commonly used for
tape drives.  With SLER enabled, if there is a recoverable I/O error
during a SCSI command, a Sequence Retransmission Request (SRR) will be
used to retry the I/O at a low-level completely within the driver
without propagating the error to the upper levels of the SCSI stack.

SRR support was removed in 2017 by commit 2c39b5ca2a8c ("qla2xxx: Remove
SRR code"). Add it back, new and improved.

The old removed SRR code used sequence numbers to correlate the SRR
CTIOs with SRR immediate notify messages.  I don't see how that would
work reliably with MSI-X interrupts and multiple queues.  So instead use
the exchange address to find the command associated with the immediate
notify (qlt_srr_to_cmd).

The old removed SRR code had a function qlt_check_srr_debug() to
simulate a SRR, but it didn't work for me.  Instead I just used fiber
optic attenuators attached to the FC cable to reduce the strength of the
signal and induce errors.  Unfortunately this only worked for inducing
SRRs on Data-Out (write) commands, so that is all I was able to test.

The code to build a new scatterlist for a SRR with nonzero offset has
been improved to reduce memory requirements and has been well-tested.
However it does not support protection information.

When a single cmd gets multiple SRRs, the old removed SRR code would
restore the data buffer from the values in cmd->se_cmd before processing
the new SRR.  That might be needed if the offset for the new SRR was
lower than the offset for the previous SRR, but I am not sure if that
can happen.  In my testing, when a single cmd gets multiple SRRs, the
SRR offset always increases or stays the same.  But in case it can
decrease, I added the function qlt_restore_orig_sg().  If this is not
supposed to happen then qlt_restore_orig_sg() can be removed to simplify
the code.

I ran into some HBA firmware bugs with QLE269x, QLE27xx, and QLE28xx
firmware 9.05.xx - 9.08.xx where a SRR would cause the HBA to misbehave
badly.  Since SRRs are rare and therefore difficult to test, I figured
it would be worth checking for the buggy firmware and disabling SLER
with a warning instead of letting others run into the same problem on
the rare occasion that they get a SRR.  This turned out to be difficult
because the firmware version isn't known in the normal NVRAM config
routine, so I added a second NVRAM config routine that is called after
the firmware version is known.

Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
Link: https://patch.msgid.link/654b7181-b79e-40ed-a15b-6d6e441a5d5f@cybernetics.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
[ commit c7bd85a7b9c5 upstream ]
2025-12-09 22:33:47 +03:00
Tony Battersby
cdbe624028 qla2x00t-32gbit: target: Improve cmd logging
- Add the command tag to various messages so that different messages
  about the same command can be correlated.

- For CTIO errors (i.e. when the HW reports an error about a cmd), print
  the cmd tag, opcode, state, initiator WWPN, and LUN.  This info helps
  an administrator determine what is going wrong.

- When a command experiences a transport error, log a message when it
  is freed.  This makes debugging exceptions easier.

Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
Link: https://patch.msgid.link/c579987d-5658-41ae-9653-f0e58c9d1880@cybernetics.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
[ commit 04957d8c9852 upstream ]
2025-12-09 22:33:47 +03:00
Tony Battersby
ead8a100c9 qla2x00t-32gbit: target: Add cmd->rsp_sent
Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
2025-12-09 22:33:47 +03:00
Tony Battersby
658dce1cae qla2x00t-32gbit: target: Add cmd->rsp_sent
Add cmd->rsp_sent to indicate that the SCSI status has been sent
successfully, so that SCST can be informed of any transport errors.
This will also be used for logging in later patches.

Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
Link: https://patch.msgid.link/d4b0203f-7817-4517-9789-5866bb24fad7@cybernetics.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
[ commit f4199d581256 upstream ]
2025-12-09 22:33:47 +03:00
Tony Battersby
22a6aabf23 qla2x00t-32gbit: target: Fix invalid memory access with big CDBs
Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
2025-12-09 22:33:47 +03:00
Tony Battersby
c7e629c8fb qla2x00t-32gbit: target: Fix invalid memory access with big CDBs
struct atio7_fcp_cmnd is a variable-length data structure because of
add_cdb_len, but it is embedded in struct atio_from_isp and copied
around like a fixed-length data structure.  For big CDBs > 16 bytes,
get_datalen_for_atio() called on a fixed-length copy of the atio will
access invalid memory.

In some cases this can be fixed by moving the atio to the end of the
data structure and using a variable-length allocation.  In other cases
such as allocating struct qla_tgt_cmd, the fixed-length data structures
are preallocated for speed, so in the case that add_cdb_len != 0,
allocate a separate buffer for the CDB.  Also add memcpy_atio() as a
safeguard against invalid memory accesses.

Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
Link: https://patch.msgid.link/306a9d0b-3c89-42fc-a69c-eebca8171347@cybernetics.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
[ commit 091719c21d5a upstream ]
2025-12-09 22:33:47 +03:00
Tony Battersby
f9023c31ef qla2x00t-32gbit: Fix TMR failure handling
(target mode)

If handle_tmr() fails:

- The code for QLA_TGT_ABTS results in memory-use-after-free and
  double-free:
	qlt_do_tmr_work()
		qlt_build_abts_resp_iocb()
			qpair->req->outstanding_cmds[h] = (srb_t *)mcmd;
		mempool_free(mcmd, qla_tgt_mgmt_cmd_mempool); FIRST FREE
	qlt_handle_abts_completion()
		mcmd = qlt_ctio_to_cmd()
			cmd = req->outstanding_cmds[h];
			return cmd;
		vha  = mcmd->vha; USE-AFTER-FREE
		ha->tgt.tgt_ops->free_mcmd(mcmd); SECOND FREE

- qlt_send_busy() makes no sense because it sends a SCSI command
  response instead of a TMR response.

Instead just call qlt_xmit_tm_rsp() to send a TMR failed response,
since that code is well-tested and handles a number of corner cases.
But it would be incorrect to call ha->tgt.tgt_ops->free_mcmd() after
handle_tmr() failed, so add a flag to mcmd indicating the proper way to
free the mcmd so that qlt_xmit_tm_rsp() can be used for both cases.

Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
Link: https://patch.msgid.link/09a1ff3d-6738-4953-a31b-10e89c540462@cybernetics.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
[ commit 3d56983cc6f0 upstream ]
2025-12-09 22:33:47 +03:00
Tony Battersby
33bfeab7d0 qla2x00t-32gbit: target: Improve checks in qlt_xmit_response / qlt_rdy_to_xfer
Similar fixes to both functions:

qlt_xmit_response:

 - If the cmd cannot be processed, remember to call ->free_cmd() to
   prevent the target-mode midlevel from seeing a cmd lockup.

 - Do not try to send the response if the exchange has been terminated.

 - Check for chip reset once after lock instead of both before and after
   lock.

 - Give errors from qlt_pre_xmit_response() a lower priority to
   compensate for removing the first check for chip reset.

qlt_rdy_to_xfer:

 - Check for chip reset after lock instead of before lock to avoid
   races.

 - Do not try to receive data if the exchange has been terminated.

 - Give errors from qlt_pci_map_calc_cnt() a lower priority to
   compensate for moving the check for chip reset.

Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
Link: https://patch.msgid.link/cd6ccd31-33fa-4454-be36-507bf578a546@cybernetics.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
[ commit 5c50d84798eb upstream ]
2025-12-09 22:33:47 +03:00
Tony Battersby
dd23781350 qla2x00t-32gbit: target: Fix races with aborting commands
sqa_on_hw_pending_cmd_timeout() currently unmaps DMA, sets
outstanding_cmds[h] to NULL, and forces the command to complete.  This
could cause a kernel crash if the HW later accesses the DMA mapping.
It can also cause other problems if outstanding_cmds[h] is reused for a
different command.  Fix by doing this instead:

- In sqa_on_hw_pending_cmd_timeout(), call qlt_send_term_exchange()
  first and then restart the timeout.  After another timeout, reset the
  ISP.

Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
2025-12-09 22:33:47 +03:00
Tony Battersby
c704a87271 qla2x00t-32gbit: target: Fix races with aborting commands
cmd->cmd_lock only protects cmd->aborted, but when deciding how to
process a cmd, it is necessary to consider other factors such as
cmd->state and if the chip has been reset, which are protected by
qpair->qp_lock_ptr.  So replace cmd_lock with qp_lock_ptr, whick makes
it possible to check additional values and make decisions about what to
do without racing with the CTIO handler and other code.

 - Lock cmd->qpair->qp_lock_ptr when aborting a cmd.

 - Eliminate cmd->cmd_lock and change cmd->aborted to a bitfield since
   it is now protected by qp_lock_ptr just like all the other flags.

 - Add another command state QLA_TGT_STATE_DONE to avoid any possible
   races between qlt_abort_cmd() and tgt_ops->free_cmd().

 - Add the cmd->sent_term_exchg flag to indicate if
   qlt_send_term_exchange() has already been called.

 - Export qlt_send_term_exchange() for SCST so that it can be called
   directly instead of trying to make qlt_abort_cmd() work for both TMR
   abort and HW timeout.

Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
Link: https://patch.msgid.link/2c8d03e4-308b-4d5a-a418-a334be23f815@cybernetics.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
[ commit 17488f139074 upstream ]
2025-12-09 22:33:47 +03:00
Tony Battersby
2c8529b5ba qla2x00t-32gbit: Clear cmds after chip reset
Commit aefed3e5548f ("scsi: qla2xxx: target: Fix offline port handling
and host reset handling") caused two problems:

1. Commands sent to FW, after chip reset got stuck and never freed as FW
   is not going to respond to them anymore.

2. BUG_ON(cmd->sg_mapped) in qlt_free_cmd().  Commit 26f9ce53817a
   ("scsi: qla2xxx: Fix missed DMA unmap for aborted commands")
   attempted to fix this, but introduced another bug under different
   circumstances when two different CPUs were racing to call
   qlt_unmap_sg() at the same time: BUG_ON(!valid_dma_direction(dir)) in
   dma_unmap_sg_attrs().

So revert "scsi: qla2xxx: Fix missed DMA unmap for aborted commands" and
partially revert "scsi: qla2xxx: target: Fix offline port handling and
host reset handling" at __qla2x00_abort_all_cmds.

Fixes: aefed3e5548f ("scsi: qla2xxx: target: Fix offline port handling and host reset handling")
Fixes: 26f9ce53817a ("scsi: qla2xxx: Fix missed DMA unmap for aborted commands")
Co-developed-by: Dmitry Bogdanov <d.bogdanov@yadro.com>
Signed-off-by: Dmitry Bogdanov <d.bogdanov@yadro.com>
Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
Link: https://patch.msgid.link/0e7e5d26-e7a0-42d1-8235-40eeb27f3e98@cybernetics.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
[ commit d46c69a087aa upstream ]
2025-12-09 22:33:47 +03:00
Tony Battersby
61518181f5 qla2x00t-32gbit: target: Fix term exchange when cmd_sent_to_fw == 1
Properly set the nport_handle field of the terminate exchange message.
Previously when this field was not set properly, the term exchange would
fail when cmd_sent_to_fw == 1 but work when cmd_sent_to_fw == 0 (i.e. it
would fail when the HW was actively transferring data or status for the
cmd but work when the HW was idle).  With this change, term exchange
works in any cmd state, which now makes it possible to abort a command
that is locked up in the HW.

Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
Link: https://patch.msgid.link/1a221699-969b-4f28-8ea4-395d2f7a7c0a@cybernetics.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
[ commit ed382b95f5de upstream ]
2025-12-09 22:33:47 +03:00
Tony Battersby
c89a047a59 qla2x00t-32gbit: target: Improve debug output for term exchange
Print better debug info when terminating a command, and print the
response status from the hardware.

Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
Link: https://patch.msgid.link/22f8a0b6-0e24-474d-9f28-9d65c9b7af03@cybernetics.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
[ commit c34e373f535e upstream ]
2025-12-09 22:33:47 +03:00
Tony Battersby
635ea86df3 qla2x00t-32gbit: target: Remove code for unsupported hardware
As far as I can tell, CONTINUE_TGT_IO_TYPE and CTIO_A64_TYPE are message
types from non-FWI2 boards (older than ISP24xx), which are not supported
by qla_target.c.  Removing them makes it possible to turn a void * into
the real type and avoid some typecasts.

Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
Link: https://patch.msgid.link/cb006628-e321-4e30-a60b-08b37b8685a5@cybernetics.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
[ commit 9da4e1dcea46 upstream ]
2025-12-09 22:33:47 +03:00
Tony Battersby
6ea8d5e6ae qla2x00t-32gbit: Use reinit_completion on mbx_intr_comp
If a mailbox command completes immediately after
wait_for_completion_timeout() times out, ha->mbx_intr_comp could be left
in an inconsistent state, causing the next mailbox command not to wait
for the hardware.  Fix by reinitializing the completion before use.

Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
Link: https://patch.msgid.link/11b6485e-0bfd-4784-8f99-c06a196dad94@cybernetics.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
[ commit 957aa5974989 upstream ]
2025-12-09 22:33:47 +03:00
Tony Battersby
41692d22dd qla2x00t-32gbit: Fix lost interrupts with qlini_mode=disabled
When qla2xxx is loaded with qlini_mode=disabled,
ha->flags.disable_msix_handshake is used before it is set, resulting in
the wrong interrupt handler being used on certain HBAs
(qla2xxx_msix_rsp_q_hs() is used when qla2xxx_msix_rsp_q() should be
used).  The only difference between these two interrupt handlers is that
the _hs() version writes to a register to clear the "RISC" interrupt,
whereas the other version does not.  So this bug results in the RISC
interrupt being cleared when it should not be.  This occasionally causes
a different interrupt handler qla24xx_msix_default() for a different
vector to see ((stat & HSRX_RISC_INT) == 0) and ignore its interrupt,
which then causes problems like:

qla2xxx [0000:02:00.0]-d04c:6: MBX Command timeout for cmd 20,
  iocontrol=8 jiffies=1090c0300 mb[0-3]=[0x4000 0x0 0x40 0xda] mb7 0x500
  host_status 0x40000010 hccr 0x3f00
qla2xxx [0000:02:00.0]-101e:6: Mailbox cmd timeout occurred, cmd=0x20,
  mb[0]=0x20. Scheduling ISP abort
(the cmd varies; sometimes it is 0x20, 0x22, 0x54, 0x5a, 0x5d, or 0x6a)

This problem can be reproduced with a 16 or 32 Gbps HBA by loading
qla2xxx with qlini_mode=disabled and running a high IOPS test while
triggering frequent RSCN database change events.

While analyzing the problem I discovered that even with
disable_msix_handshake forced to 0, it is not necessary to clear the
RISC interrupt from qla2xxx_msix_rsp_q_hs() (more below).  So just
completely remove qla2xxx_msix_rsp_q_hs() and the logic for selecting
it, which also fixes the bug with qlini_mode=disabled.

The test below describes the justification for not needing
qla2xxx_msix_rsp_q_hs():

Force disable_msix_handshake to 0:
qla24xx_config_rings():
if (0 && (ha->fw_attributes & BIT_6) && (IS_MSIX_NACK_CAPABLE(ha)) &&
    (ha->flags.msix_enabled)) {

In qla24xx_msix_rsp_q() and qla2xxx_msix_rsp_q_hs(), check:
  (rd_reg_dword(&reg->host_status) & HSRX_RISC_INT)

Count the number of calls to each function with HSRX_RISC_INT set and
the number with HSRX_RISC_INT not set while performing some I/O.

If qla2xxx_msix_rsp_q_hs() clears the RISC interrupt (original code):
qla24xx_msix_rsp_q:    50% of calls have HSRX_RISC_INT set
qla2xxx_msix_rsp_q_hs:  5% of calls have HSRX_RISC_INT set
(# of qla2xxx_msix_rsp_q_hs interrupts) =
    (# of qla24xx_msix_rsp_q interrupts) * 3

If qla2xxx_msix_rsp_q_hs() does not clear the RISC interrupt (patched
code):
qla24xx_msix_rsp_q:    100% of calls have HSRX_RISC_INT set
qla2xxx_msix_rsp_q_hs:   9% of calls have HSRX_RISC_INT set
(# of qla2xxx_msix_rsp_q_hs interrupts) =
    (# of qla24xx_msix_rsp_q interrupts) * 3

In the case of the original code, qla24xx_msix_rsp_q() was seeing
HSRX_RISC_INT set only 50% of the time because qla2xxx_msix_rsp_q_hs()
was clearing it when it shouldn't have been.  In the patched code,
qla24xx_msix_rsp_q() sees HSRX_RISC_INT set 100% of the time, which
makes sense if that interrupt handler needs to clear the RISC interrupt
(which it does).  qla2xxx_msix_rsp_q_hs() sees HSRX_RISC_INT only 9% of
the time, which is just overlap from the other interrupt during the
high IOPS test.

Tested with SCST on:
QLE2742  FW:v9.08.02 (32 Gbps 2-port)
QLE2694L FW:v9.10.11 (16 Gbps 4-port)
QLE2694L FW:v9.08.02 (16 Gbps 4-port)
QLE2672  FW:v8.07.12 (16 Gbps 2-port)
both initiator and target mode

Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
Link: https://patch.msgid.link/56d378eb-14ad-49c7-bae9-c649b6c7691e@cybernetics.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
[ commit 4f6aaade2a22 upstream ]
2025-12-09 22:33:47 +03:00
Tony Battersby
12e3d15b9d qla2x00t-32gbit: Fix initiator mode with qlini_mode=exclusive
When given the module parameter qlini_mode=exclusive, qla2xxx in
initiator mode is initially unable to successfully send SCSI commands to
devices it finds while scanning, resulting in an escalating series of
resets until an adapter reset clears the issue.  Fix by checking the
active mode instead of the module parameter.

Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
Link: https://patch.msgid.link/1715ec14-ba9a-45dc-9cf2-d41aa6b81b5e@cybernetics.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
[ commit 8f58fc64d559 upstream ]
2025-12-09 22:33:47 +03:00
Tony Battersby
5328814318 qla2x00t-32gbit: Revert "qla2x00t-32gbit: Perform lockless command completion in abort path"
This reverts commit 0367076b0817d5c75dfb83001ce7ce5c64d803a9.

The commit being reverted added code to __qla2x00_abort_all_cmds() to
call sp->done() without holding a spinlock.  But unlike the older code
below it, this new code failed to check sp->cmd_type and just assumed
TYPE_SRB, which results in a jump to an invalid pointer in target-mode
with TYPE_TGT_CMD:

qla2xxx [0000:65:00.0]-d034:8: qla24xx_do_nack_work create sess success
  0000000009f7a79b
qla2xxx [0000:65:00.0]-5003:8: ISP System Error - mbx1=1ff5h mbx2=10h
  mbx3=0h mbx4=0h mbx5=191h mbx6=0h mbx7=0h.
qla2xxx [0000:65:00.0]-d01e:8: -> fwdump no buffer
qla2xxx [0000:65:00.0]-f03a:8: qla_target(0): System error async event
  0x8002 occurred
qla2xxx [0000:65:00.0]-00af:8: Performing ISP error recovery -
  ha=0000000058183fda.
BUG: kernel NULL pointer dereference, address: 0000000000000000
PF: supervisor instruction fetch in kernel mode
PF: error_code(0x0010) - not-present page
PGD 0 P4D 0
Oops: 0010 [#1] SMP
CPU: 2 PID: 9446 Comm: qla2xxx_8_dpc Tainted: G           O       6.1.133 #1
Hardware name: Supermicro Super Server/X11SPL-F, BIOS 4.2 12/15/2023
RIP: 0010:0x0
Code: Unable to access opcode bytes at 0xffffffffffffffd6.
RSP: 0018:ffffc90001f93dc8 EFLAGS: 00010206
RAX: 0000000000000282 RBX: 0000000000000355 RCX: ffff88810d16a000
RDX: ffff88810dbadaa8 RSI: 0000000000080000 RDI: ffff888169dc38c0
RBP: ffff888169dc38c0 R08: 0000000000000001 R09: 0000000000000045
R10: ffffffffa034bdf0 R11: 0000000000000000 R12: ffff88810800bb40
R13: 0000000000001aa8 R14: ffff888100136610 R15: ffff8881070f7400
FS:  0000000000000000(0000) GS:ffff88bf80080000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffffffffd6 CR3: 000000010c8ff006 CR4: 00000000003706e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 ? __die+0x4d/0x8b
 ? page_fault_oops+0x91/0x180
 ? trace_buffer_unlock_commit_regs+0x38/0x1a0
 ? exc_page_fault+0x391/0x5e0
 ? asm_exc_page_fault+0x22/0x30
 __qla2x00_abort_all_cmds+0xcb/0x3e0 [qla2xxx_scst]
 qla2x00_abort_all_cmds+0x50/0x70 [qla2xxx_scst]
 qla2x00_abort_isp_cleanup+0x3b7/0x4b0 [qla2xxx_scst]
 qla2x00_abort_isp+0xfd/0x860 [qla2xxx_scst]
 qla2x00_do_dpc+0x581/0xa40 [qla2xxx_scst]
 kthread+0xa8/0xd0
 </TASK>

Then commit 4475afa2646d ("scsi: qla2xxx: Complete command early within
lock") added the spinlock back, because not having the lock caused a
race and a crash.  But qla2x00_abort_srb() in the switch below already
checks for qla2x00_chip_is_down() and handles it the same way, so the
code above the switch is now redundant and still buggy in target-mode.
Remove it.

Cc: stable@vger.kernel.org
Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
Link: https://patch.msgid.link/3a8022dc-bcfd-4b01-9f9b-7a9ec61fa2a3@cybernetics.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
[ commit b57fbc88715b upstream ]
2025-12-09 22:33:47 +03:00
Gleb Chesnokov
29aa9a3202 scst: Port to Linux kernel v6.19
Support for the following core changes in the Linux kernel v6.19:

  - 15115830c887 ("preempt: Cleanup the macro maze a bit")
2025-12-09 17:00:47 +03:00
Gleb Chesnokov
78d41552b4 scst/include/backport.h: Unbreak build on kernels < 6.13 2025-12-09 16:06:22 +03:00
Gleb Chesnokov
f1faed032e Revert "qla2x00t-32gbit: Fix memcpy() field-spanning write issue"
This reverts commit 6f4b10226b6b1e7d1ff3cdb006cf0f6da6eed71e.

We've been testing this patch and it turns out there is a significant
bug here. This leaks memory and causes a driver hang.

Link: https://lore.kernel.org/linux-scsi/yq1zfajqpec.fsf@ca-mkp.ca.oracle.com/
Signed-off-by: John Meneghini <jmeneghi@redhat.com>
Acked-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
[ commit 285654d58a74 upstream ]
2025-12-09 16:06:22 +03:00
Gleb Chesnokov
7ce251a956 qla2x00t-32gbit: Fix incorrect sign of error code in qla_nvme_xmt_ls_rsp()
Change the error code EAGAIN to -EAGAIN in qla_nvme_xmt_ls_rsp() to
align with qla2x00_start_sp() returning negative error codes or
QLA_SUCCESS, preventing logical errors.

Fixes: 875386b98857 ("scsi: qla2xxx: Add Unsolicited LS Request and Response Support for NVMe")
Signed-off-by: Qianfeng Rong <rongqianfeng@vivo.com>
Message-ID: <20250905075446.381139-4-rongqianfeng@vivo.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
[ commit 9877c004e9f4 upstream ]
2025-12-09 16:06:22 +03:00