SCST user space device handler interface description

Vladislav Bolkhovitin

Version 3.7.0-pre

Introduction

The SCST user space device handler module, scst_user, is a device handler for SCST that makes it possible to implement complete, full-featured virtual SCSI devices in user space within the SCST environment.

This document assumes that the reader is familiar with the SCST architecture and the states through which SCSI commands pass during processing in SCST. Module scst_user basically only provides hooks to those states. Their description can be found on the SCST web page at http://scst.sf.net.

User space API

Module scst_user provides the /dev/scst_user character device, which a user space handler drives through system calls such as open(), ioctl(), poll() and close().

The device /dev/scst_user can be opened in blocking or non-blocking mode using the O_NONBLOCK flag. In blocking mode the SCST_USER_REPLY_AND_GET_CMD ioctl() blocks until there is a new subcommand to process. In non-blocking mode, if there are no pending subcommands, SCST_USER_REPLY_AND_GET_CMD returns immediately with the EAGAIN error code, and the user space device handler can use poll() to be notified about the arrival of new subcommands. Blocking mode is the default.

The scst_user API is defined in the scst_user.h file.

IOCTL() functions
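The non-blocking mode of /dev/scst_user described above can be sketched as follows. This is a minimal illustration, not code from scst_user: the helper names open_handler_dev() and wait_for_subcommand() are hypothetical, and it assumes that the device is opened read-write and that pending subcommands are signalled through POLLIN (the text only says that poll() can be used).

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stddef.h>

/*
 * Open the scst_user control device; a non-zero nonblock selects
 * O_NONBLOCK mode. Returns the file descriptor or -1 (errno set).
 * ASSUMPTION: read-write access (O_RDWR) is what the device expects.
 */
static int open_handler_dev(const char *path, int nonblock)
{
	assert(path != NULL);
	return open(path, O_RDWR | (nonblock ? O_NONBLOCK : 0));
}

/*
 * Wait until fd reports readable data or timeout_ms expires.
 * Returns 1 if readable, 0 on timeout, -1 on a poll() error or when
 * the descriptor reports only an error condition.
 * ASSUMPTION: new subcommands are signalled with POLLIN.
 */
static int wait_for_subcommand(int fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
	int rc;

	do {
		rc = poll(&pfd, 1, timeout_ms);
	} while (rc < 0 && errno == EINTR);	/* retry after a signal */

	if (rc < 0)
		return -1;
	if (rc == 0)
		return 0;
	return (pfd.revents & POLLIN) ? 1 : -1;
}
```

Once wait_for_subcommand() returns 1, a handler would issue the SCST_USER_REPLY_AND_GET_CMD ioctl() and treat an EAGAIN error as a reason to call poll() again.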

The following IOCTL functions are available. Each of them takes one argument. All of them, except SCST_USER_REGISTER_DEVICE, return 0 on success or -1 in case of error, with errno set appropriately.

SCST_USER_REGISTER_DEVICE

SCST_USER_REGISTER_DEVICE registers a new virtual user space device. The argument is:

    struct scst_user_dev_desc {
        aligned_u64 version_str;
        aligned_u64 license_str;
        uint8_t type;
        uint8_t sgv_shared;
        uint8_t sgv_disable_clustered_pool;
        int32_t sgv_single_alloc_pages;
        int32_t sgv_purge_interval;
        uint8_t has_own_order_mgmt;
        struct scst_user_opt opt;
        uint32_t block_size;
        char name[SCST_MAX_NAME];
        char sgv_name[SCST_MAX_NAME];
    },

where, if sgv_single_alloc_pages is greater than 0, the SGV cache works in the fixed size buffers mode and the field sets the size of each buffer in pages. See the SGV cache documentation (http://scst.sourceforge.net/sgv_cache.txt) for more details.

SCST_USER_REGISTER_DEVICE returns registered device's handler or -1 in case of error, with errno set appropriately.

In order to unregister the device, either call the SCST_USER_UNREGISTER_DEVICE function or call close() on its file descriptor.

SCST_USER_UNREGISTER_DEVICE
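Preparing the SCST_USER_REGISTER_DEVICE argument can be sketched as below. This is only an illustration under stated assumptions: the example_* types are local mirrors of the structures printed in this document (real code includes scst_user.h instead), EXAMPLE_MAX_NAME is a placeholder for SCST_MAX_NAME, fill_dev_desc() is a hypothetical helper, and version_str is assumed to carry a user space pointer to a NUL-terminated string.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define EXAMPLE_MAX_NAME 50	/* placeholder for SCST_MAX_NAME */

/* Stand-in for aligned_u64: a 64-bit value with 8-byte alignment. */
typedef uint64_t ex_aligned_u64 __attribute__((aligned(8)));

struct example_user_opt {	/* mirrors struct scst_user_opt */
	uint8_t parse_type, on_free_cmd_type, memory_reuse_type;
	uint8_t partial_transfers_type;
	uint32_t partial_len;
	uint8_t tst, queue_alg, tas, swp, d_sense, has_own_order_mgmt;
};

struct example_dev_desc {	/* mirrors struct scst_user_dev_desc */
	ex_aligned_u64 version_str;
	ex_aligned_u64 license_str;
	uint8_t type;
	uint8_t sgv_shared;
	uint8_t sgv_disable_clustered_pool;
	int32_t sgv_single_alloc_pages;
	int32_t sgv_purge_interval;
	uint8_t has_own_order_mgmt;
	struct example_user_opt opt;
	uint32_t block_size;
	char name[EXAMPLE_MAX_NAME];
	char sgv_name[EXAMPLE_MAX_NAME];
};

/*
 * Zero the descriptor and set the basic fields. Returns 0 on success
 * or -1 if name does not fit into the name field with its NUL.
 */
static int fill_dev_desc(struct example_dev_desc *desc, const char *name,
			 uint8_t type, uint32_t block_size,
			 const char *version_str)
{
	size_t len;

	assert(desc != NULL && name != NULL);
	len = strlen(name);
	if (len >= sizeof(desc->name))
		return -1;

	memset(desc, 0, sizeof(*desc));
	/* ASSUMPTION: version_str holds a pointer to a C string. */
	desc->version_str = (uint64_t)(uintptr_t)version_str;
	desc->type = type;		/* SCSI peripheral device type */
	desc->block_size = block_size;
	memcpy(desc->name, name, len + 1);
	return 0;
}
```

With the real header, the filled descriptor would be passed as the argument of ioctl(fd, SCST_USER_REGISTER_DEVICE, ...), and closing fd later unregisters the device.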

SCST_USER_UNREGISTER_DEVICE is obsolete and should not be used. Just close the device's fd instead.

SCST_USER_SET_OPTIONS/SCST_USER_GET_OPTIONS

SCST_USER_SET_OPTIONS and SCST_USER_GET_OPTIONS set or get, respectively, the options that control various aspects of SCSI command processing. The argument is:

    struct scst_user_opt {
        uint8_t parse_type;
        uint8_t on_free_cmd_type;
        uint8_t memory_reuse_type;
        uint8_t partial_transfers_type;
        uint32_t partial_len;
        uint8_t tst;
        uint8_t queue_alg;
        uint8_t tas;
        uint8_t swp;
        uint8_t d_sense;
        uint8_t has_own_order_mgmt;
    }

SCST_USER_REPLY_AND_GET_CMD

SCST_USER_REPLY_AND_GET_CMD replies to the current subcommand from SCST and gets the next one in a single call. If ioctl() returns 0, SCST_USER_REPLY_AND_GET_CMD returns an SCST subcommand in the argument, which is defined as follows:

    struct scst_user_get_cmd {
        uint32_t cmd_h;
        uint32_t subcode;
        union {
            uint64_t preply;
            struct scst_user_sess sess;
            struct scst_user_scsi_cmd_parse parse_cmd;
            struct scst_user_scsi_cmd_alloc_mem alloc_cmd;
            struct scst_user_scsi_cmd_exec exec_cmd;
            struct scst_user_scsi_on_free_cmd on_free_cmd;
            struct scst_user_on_cached_mem_free on_cached_mem_free;
            struct scst_user_tm tm_cmd;
        };
    }

Apart from cmd_h, subcode and preply, the union members contain the command-specific payload.

For every received subcommand the user space device handler shall call SCST_USER_REPLY_AND_GET_CMD or SCST_USER_REPLY_CMD to tell SCST that processing of the subcommand is finished, even though some subcommands don't return a value.

The possible subcommands are described in the section "SCST_USER subcommands".

SCST_USER_REPLY_AND_GET_MULTI
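The reply-and-fetch cycle of SCST_USER_REPLY_AND_GET_CMD can be sketched without a real device as shown below. Everything here is illustrative: the example_* structures are trimmed local stand-ins, exchange_fn is a hypothetical hook that a real handler would implement with ioctl(fd, SCST_USER_REPLY_AND_GET_CMD, &cmd), and the sketch assumes that preply carries a user space pointer to the reply (0 when there is nothing to reply to) and that a reply echoes the cmd_h and subcode it answers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct example_reply_cmd {
	uint32_t cmd_h;
	uint32_t subcode;
	int32_t result;		/* stands in for the reply union */
};

struct example_get_cmd {
	uint32_t cmd_h;
	uint32_t subcode;
	uint64_t preply;	/* stands in for the payload union */
};

/* Hypothetical transport hook; returns 0 on success, -1 on error. */
typedef int (*exchange_fn)(void *ctx, struct example_get_cmd *cmd);

/* Reply to each subcommand while fetching the next one. Returns the
 * number of subcommands handled before exchange() failed. */
static int serve(exchange_fn exchange, void *ctx)
{
	struct example_get_cmd cmd;
	struct example_reply_cmd reply;
	int handled = 0;

	assert(exchange != NULL);
	memset(&cmd, 0, sizeof(cmd));	/* preply == 0: no reply yet */

	while (exchange(ctx, &cmd) == 0) {
		memset(&reply, 0, sizeof(reply));
		reply.cmd_h = cmd.cmd_h;	/* ASSUMPTION: echo handle */
		reply.subcode = cmd.subcode;	/* ASSUMPTION: echo subcode */
		handled++;

		/* Attach the reply to the next call through preply. */
		memset(&cmd, 0, sizeof(cmd));
		cmd.preply = (uint64_t)(uintptr_t)&reply;
	}
	return handled;
}

/* Demo transport: hands out `total` subcommands with handles
 * 1..total and records the handle of the last reply it received. */
struct fake_dev {
	uint32_t issued, total, last_replied;
};

static int fake_exchange(void *ctx, struct example_get_cmd *cmd)
{
	struct fake_dev *dev = ctx;

	if (cmd->preply != 0) {
		const struct example_reply_cmd *r =
			(const struct example_reply_cmd *)(uintptr_t)cmd->preply;
		dev->last_replied = r->cmd_h;
	}
	if (dev->issued == dev->total)
		return -1;		/* nothing more to hand out */
	cmd->cmd_h = ++dev->issued;
	cmd->subcode = 0;		/* placeholder subcode */
	cmd->preply = 0;
	return 0;
}
```

Running serve(fake_exchange, &dev) with a struct fake_dev whose total is 3 handles three subcommands, and the last reply seen by the demo transport carries handle 3.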

SCST_USER_REPLY_AND_GET_MULTI replies to multiple subcommands from SCST and gets multiple next subcommands in a single call. Its argument is defined as:

    struct scst_user_get_multi {
        aligned_u64 preplies;
        int16_t replies_cnt;
        int16_t replies_done;
        int16_t cmds_cnt;
        struct scst_user_get_cmd cmds[0];
    }

It returns 0 on success or -1 in case of error, with errno set appropriately.

SCST_USER_REPLY_CMD
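Because struct scst_user_get_multi ends with a variable-length cmds array, its argument has to be allocated with room for the commands. The sketch below shows one way to do that. It uses local stand-in types (the payload size of example_get_cmd is a placeholder), the helper names are hypothetical, and it assumes that cmds_cnt tells the kernel how many entries the caller has provided.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct example_get_cmd {
	uint32_t cmd_h;
	uint32_t subcode;
	uint64_t payload[4];	/* placeholder for the payload union */
};

struct example_get_multi {
	uint64_t preplies;
	int16_t replies_cnt;
	int16_t replies_done;
	int16_t cmds_cnt;
	struct example_get_cmd cmds[];	/* the header spells this cmds[0] */
};

/* Allocate a zeroed argument with room for max_cmds subcommands.
 * Returns NULL if max_cmds is not positive or memory is exhausted. */
static struct example_get_multi *alloc_get_multi(int16_t max_cmds)
{
	struct example_get_multi *m;

	if (max_cmds <= 0)
		return NULL;
	m = calloc(1, sizeof(*m) + (size_t)max_cmds * sizeof(m->cmds[0]));
	if (m != NULL)
		m->cmds_cnt = max_cmds;	/* ASSUMPTION: capacity on input */
	return m;
}

/* Bounds-checked access to one received subcommand. */
static struct example_get_cmd *multi_cmd_at(struct example_get_multi *m,
					    int16_t i)
{
	assert(m != NULL && i >= 0 && i < m->cmds_cnt);
	return &m->cmds[i];
}
```

The buffer is released with free() once the handler no longer needs it.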

The SCST_USER_REPLY_CMD IOCTL function allows the user space handler to return the result of a command's execution. Its argument is defined as:

    struct scst_user_reply_cmd {
        uint32_t cmd_h;
        uint32_t subcode;
        union {
            int32_t result;
            struct scst_user_scsi_cmd_reply_parse parse_reply;
            struct scst_user_scsi_cmd_reply_alloc_mem alloc_reply;
            struct scst_user_scsi_cmd_reply_exec exec_reply;
        };
    }

The parse_reply member is defined as:

    struct scst_user_scsi_cmd_reply_parse {
        uint8_t queue_type;
        uint8_t data_direction;
        uint16_t cdb_len;
        aligned_i64 lba;
        uint32_t op_flags;
        aligned_i64 data_len;
        int32_t bufflen;
        int32_t out_bufflen;
    }

The alloc_reply member is defined as:

    struct scst_user_scsi_cmd_reply_alloc_mem {
        uint64_t pbuf;
    }

The exec_reply member is defined as:

    struct scst_user_scsi_cmd_reply_exec {
        int32_t resp_data_len;
        uint64_t pbuf;
        uint8_t reply_type;
        uint8_t status;
        uint8_t sense_len;
        aligned_u64 psense_buffer;
    }

SCST_USER_FLUSH_CACHE

SCST_USER_FLUSH_CACHE flushes the SGV cache for the corresponding virtual user space device and queues a SCST_USER_ON_CACHED_MEM_FREE subcommand for every cached memory buffer. While SCST_USER_FLUSH_CACHE executes, at least one other thread must process all incoming subcommands; otherwise the call fails with the EBUSY error after a timeout.

SCST_USER_FLUSH_CACHE doesn't have any parameters.

SCST_USER_FLUSH_CACHE returns 0 on success or -1 in case of error, with errno set appropriately.

SCST_USER_DEVICE_CAPACITY_CHANGED

SCST_USER_DEVICE_CAPACITY_CHANGED queues a CAPACITY DATA HAS CHANGED Unit Attention or the corresponding Asynchronous Event to the corresponding virtual device. This notifies the remote initiators connected to the device and allows them to pick up the new device size automatically. Use SCST_USER_DEVICE_CAPACITY_CHANGED after the device has been resized.

SCST_USER_DEVICE_CAPACITY_CHANGED doesn't have any parameters.

SCST_USER_DEVICE_CAPACITY_CHANGED returns 0 on success or -1 in case of error, with errno set appropriately.

SCST_USER_GET_EXTENDED_CDB

SCST_USER_GET_EXTENDED_CDB requests the extended CDB when the CDB size is more than SCST_MAX_CDB_SIZE bytes. In this case SCST_USER_GET_EXTENDED_CDB returns the additional CDB data beyond the first SCST_MAX_CDB_SIZE bytes.

SCST_USER_GET_EXTENDED_CDB has the following argument:

    struct scst_user_get_ext_cdb {
        uint32_t cmd_h;
        aligned_u64 ext_cdb_buffer;
    }

SCST_USER_GET_EXTENDED_CDB returns 0 on success or -1 in case of error, with errno set appropriately.

SCST_USER_PREALLOC_BUFFER

SCST_USER_PREALLOC_BUFFER asks to preallocate a buffer. It has the following argument:

    union scst_user_prealloc_buffer {
        struct scst_user_prealloc_buffer_in in;
        struct scst_user_prealloc_buffer_out out;
    }

The in member is defined as:

    struct scst_user_prealloc_buffer_in {
        aligned_u64 pbuf;
        uint32_t bufflen;
        uint8_t for_clust_pool;
    }

The out member is defined as:

    struct scst_user_prealloc_buffer_out {
        uint32_t cmd_h;
    }

SCST_USER_PREALLOC_BUFFER returns 0 on success or -1 in case of error, with errno set appropriately.

SCST_USER subcommands

SCST_USER_ATTACH_SESS

SCST_USER_ATTACH_SESS notifies the user space handler that a new initiator's session is about to be attached to the device. The payload contains struct scst_user_sess, which is defined as follows:

    struct scst_user_sess {
        uint64_t sess_h;
        uint64_t lun;
        uint16_t threads_num;
        uint8_t rd_only;
        uint16_t scsi_transport_version;
        uint16_t phys_transport_version;
        char initiator_name[SCST_MAX_NAME];
        char target_name[SCST_MAX_NAME];
    }

When SCST_USER_ATTACH_SESS is returned, it is guaranteed that no other commands are being executed or pending.

After SCST_USER_ATTACH_SESS has been processed, the user space device handler shall reply using the "result" field of the corresponding reply command.

SCST_USER_DETACH_SESS

SCST_USER_DETACH_SESS notifies the user space handler that the corresponding initiator is about to be detached from the particular device. The payload contains struct scst_user_sess, in which only the handle field is valid.

When SCST_USER_DETACH_SESS is returned, it is guaranteed that no other commands are being executed or pending.

No return value is expected in the reply to this subcommand, although SCST_USER_REPLY_AND_GET_CMD or SCST_USER_REPLY_CMD must still be called.

SCST_USER_PARSE

SCST_USER_PARSE returns a SCSI command in the PARSE state of the SCST processing. The PARSE state is intended to check the validity of the command and to determine the data transfer type and the necessary data buffer size. This subcommand is returned only if the SCST_USER_SET_OPTIONS parse_type isn't set to SCST_USER_PARSE_STANDARD. If parse_type is set to SCST_USER_PARSE_STANDARD, the standard SCST internal parser for this SCSI device type does the whole job.

The payload contains struct scst_user_scsi_cmd_parse, which is defined as follows:

    struct scst_user_scsi_cmd_parse {
        uint64_t sess_h;
        uint8_t cdb[SCST_MAX_CDB_SIZE];
        uint16_t cdb_len;
        aligned_i64 lba;
        uint32_t timeout;
        int32_t bufflen;
        aligned_i64 data_len;
        int32_t out_bufflen;
        uint32_t op_flags;
        uint8_t queue_type;
        uint8_t data_direction;
        uint8_t expected_values_set;
        uint8_t expected_data_direction;
        int32_t expected_transfer_len;
        int32_t expected_out_transfer_len;
        uint32_t sn;
    }

In the PARSE state of SCSI command processing the user space device handler shall check the command and provide SCST with values for the command data buffer length, data flow direction and timeout, which it shall return using the corresponding reply command.

In case of any error, the error reporting should be deferred until the SCST_USER_EXEC subcommand, where the appropriate SAM status and sense shall be set.

SCST_USER_ALLOC_MEM

SCST_USER_ALLOC_MEM returns a SCSI command in the memory allocation state of the SCST processing. In this state the user space device handler shall allocate the command's data buffer with length bufflen and then return it to SCST using the corresponding reply command. SCST then internally converts it into an SG vector for its own use and for use by target drivers.

If the memory reuse type is disabled (i.e. set to SCST_USER_MEM_NO_REUSE), there are no special requirements for the buffer memory or its alignment; it can be just what malloc() returned. If the memory reuse type is enabled, the buffer shall be page size aligned, for example by allocating it with the memalign() function.

The payload contains struct scst_user_scsi_cmd_alloc_mem, which is defined as follows:

    struct scst_user_scsi_cmd_alloc_mem {
        uint64_t sess_h;
        uint8_t cdb[SCST_MAX_CDB_SIZE];
        uint16_t cdb_len;
        int32_t alloc_len;
        uint8_t queue_type;
        uint8_t data_direction;
        uint32_t sn;
    }

Memory allocation, preparation and freeing are among the most complicated and expensive operations during SCSI command processing. Module scst_user provides a way to almost completely eliminate those operations by reusing once allocated memory for subsequent SCSI commands. It is controlled by the memory reuse type option. Since the SGV cache caches SG vectors, which can be bigger than the actual data sizes of SCSI commands, the alloc_len field can also be bigger than what the SCSI command actually requires.

Memory reuse can be used in both SCSI tagged and untagged queuing environments. In the SCSI tagged queuing environment the SGV cache makes sure that several commands don't use the same buffer simultaneously by asking the user space handler to allocate a new data buffer when all cached ones are busy.
Some important notes:

 - If the user space handler needs to call fork(), it must call madvise() with the MADV_DONTFORK flag for all allocated data buffers, otherwise the parent or child process could lose the connection with them, which could lead to data corruption.

 - The interface assumes that all memory allocated by the user space handler is DMA'able by the target hardware. This is true for most modern systems, except when the target hardware isn't capable of using a 64-bit address space and either the system has more than 4GB of memory or the memory addresses lie in a range that cannot be reached with 32-bit addresses.

In case of any error, the error reporting should be deferred until the SCST_USER_EXEC subcommand, where the appropriate SAM status and sense should be set.

SCST_USER_EXEC
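The buffer requirements stated for SCST_USER_ALLOC_MEM (page size alignment when memory reuse is enabled, MADV_DONTFORK when the handler may fork) can be sketched as follows. This is an illustration, not code from scst_user: the helper names are hypothetical, posix_memalign() is used in place of the memalign() mentioned in the text, and rounding the length up to whole pages is a choice of this sketch so that madvise() never touches pages shared with other heap allocations.

```c
#define _GNU_SOURCE		/* MADV_DONTFORK / MADV_DOFORK on glibc */
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Round len up to a multiple of page; returns 0 on overflow. */
static size_t round_to_pages(size_t len, size_t page)
{
	if (len > SIZE_MAX - (page - 1))
		return 0;
	return (len + page - 1) / page * page;
}

/* Allocate a page-aligned buffer of at least len bytes and exclude
 * it from fork(). Returns NULL on failure. */
static void *alloc_data_buffer(size_t len)
{
	long page = sysconf(_SC_PAGESIZE);
	size_t rounded;
	void *buf = NULL;

	if (page <= 0 || len == 0)
		return NULL;
	rounded = round_to_pages(len, (size_t)page);
	if (rounded == 0)
		return NULL;
	if (posix_memalign(&buf, (size_t)page, rounded) != 0)
		return NULL;
	if (madvise(buf, rounded, MADV_DONTFORK) != 0) {
		free(buf);
		return NULL;
	}
	assert((uintptr_t)buf % (uintptr_t)page == 0);
	return buf;
}

/* Undo MADV_DONTFORK before handing the pages back to the heap, so
 * that later unrelated allocations are inherited by children again. */
static void free_data_buffer(void *buf, size_t len)
{
	long page = sysconf(_SC_PAGESIZE);

	if (buf == NULL || page <= 0)
		return;
	(void)madvise(buf, round_to_pages(len, (size_t)page), MADV_DOFORK);
	free(buf);
}
```

A handler would return the address from alloc_data_buffer() in the pbuf field of its reply and call free_data_buffer() with the same length when the buffer is no longer needed.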

SCST_USER_EXEC returns a SCSI command in the execution state of the SCST processing. The user space handler should execute the SCSI command and reply using the corresponding reply command.

In some cases, for performance reasons, the SCST_USER_ALLOC_MEM subcommand isn't returned before SCST_USER_EXEC for READ-type SCSI commands. Thus, if the pbuf pointer is 0 and the SCSI command needs a data transfer, the user space handler should be prepared to allocate the data buffer with size alloc_len, which can be bigger (due to the SGV cache) than what the SCSI command actually requires. The bufflen field, however, contains the correct value. All the memory reuse rules described for SCST_USER_ALLOC_MEM apply to SCST_USER_EXEC as well.

The payload contains struct scst_user_scsi_cmd_exec, which is defined as follows:

    struct scst_user_scsi_cmd_exec {
        uint64_t sess_h;
        uint8_t cdb[SCST_MAX_CDB_SIZE];
        uint16_t cdb_len;
        aligned_i64 lba;
        aligned_i64 data_len;
        int32_t bufflen;
        int32_t alloc_len;
        uint64_t pbuf;
        uint8_t queue_type;
        uint8_t data_direction;
        uint8_t partial;
        uint32_t timeout;
        aligned_u64 p_out_buf;
        int32_t out_bufflen;
        uint32_t sn;
        uint32_t parent_cmd_h;
        int32_t parent_cmd_data_len;
        uint32_t partial_offset;
    }

It is guaranteed that only commands of the same queue_type per session can be returned simultaneously.

Any error should be reported via the appropriate SAM status and sense. If an error happens for a subcommand of a partial data transfers command, all other subcommands of this command, whether they have already been passed to the user space handler or will be passed in the future, will be aborted by scst_user, and the user space handler should ignore them.

SCST_USER_ON_FREE_CMD
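Reporting an SCST_USER_EXEC error through SAM status and sense can be sketched as below. The sense layout is the standard 18-byte fixed format and 0x02 is the SAM CHECK CONDITION status code. The rest is an assumption of this sketch: example_reply_exec is a local mirror of struct scst_user_scsi_cmd_reply_exec, set_check_condition() is a hypothetical helper, psense_buffer is assumed to carry a user space pointer to the sense bytes, resp_data_len is assumed to be 0 on error, and reply_type is left untouched because its values are not listed here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define EX_SAM_STAT_CHECK_CONDITION 0x02	/* SAM status code */
#define EX_SENSE_FIXED_LEN 18			/* fixed-format sense size */

struct example_reply_exec {	/* mirrors the exec reply structure */
	int32_t resp_data_len;
	uint64_t pbuf;
	uint8_t reply_type;
	uint8_t status;
	uint8_t sense_len;
	uint64_t psense_buffer;
};

/*
 * Fill sense[] with fixed-format sense data and point the reply at
 * it. The sense buffer must stay valid until the reply is submitted.
 */
static void set_check_condition(struct example_reply_exec *r,
				uint8_t sense[EX_SENSE_FIXED_LEN],
				uint8_t key, uint8_t asc, uint8_t ascq)
{
	assert(r != NULL && sense != NULL);

	memset(sense, 0, EX_SENSE_FIXED_LEN);
	sense[0] = 0x70;			/* current error, fixed format */
	sense[2] = key & 0x0f;			/* sense key */
	sense[7] = EX_SENSE_FIXED_LEN - 8;	/* additional sense length */
	sense[12] = asc;			/* additional sense code */
	sense[13] = ascq;			/* additional sense code qualifier */

	r->resp_data_len = 0;			/* ASSUMPTION: no data on error */
	r->status = EX_SAM_STAT_CHECK_CONDITION;
	r->sense_len = EX_SENSE_FIXED_LEN;
	r->psense_buffer = (uint64_t)(uintptr_t)sense;
}
```

For example, set_check_condition(&reply, sense, 0x05, 0x24, 0x00) reports ILLEGAL REQUEST with INVALID FIELD IN CDB. If the d_sense option selects descriptor format sense, the buffer would have to be built in that format instead.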

SCST_USER_ON_FREE_CMD returns a SCSI command when the command is about to be freed. At this stage the user space device handler can do any necessary cleanup, for instance, free the memory allocated for the data buffer.

The payload contains struct scst_user_scsi_on_free_cmd, which is defined as follows:

    struct scst_user_scsi_on_free_cmd {
        uint64_t pbuf;
        int32_t resp_data_len;
        uint8_t buffer_cached;
        uint8_t aborted;
        uint8_t status;
        uint8_t delivery_status;
    }

The user space handler should reply using the corresponding reply command. No error code is needed.

SCST_USER_ON_CACHED_MEM_FREE

The SCST_USER_ON_CACHED_MEM_FREE subcommand is returned when the SGV cache has decided that a buffer isn't needed anymore. This happens after some time of inactivity or when the system is under memory pressure.

The payload contains struct scst_user_on_cached_mem_free, which is defined as follows:

    struct scst_user_on_cached_mem_free {
        uint64_t pbuf;
    }

SCST_USER_TASK_MGMT_RECEIVED

The SCST_USER_TASK_MGMT_RECEIVED subcommand notifies the handler that a task management function has been received. The payload contains struct scst_user_tm, which is defined as follows:

    struct scst_user_tm {
        uint64_t sess_h;
        uint32_t fn;
        uint32_t cmd_h_to_abort;
        uint32_t cmd_sn;
        uint8_t cmd_sn_set;
    }

On this notification the dev handler should do its best to ensure that all SCSI commands aborted by this TM command complete as soon as possible.

The "result" field of the corresponding reply command is ignored for this subcommand.

SCST_USER_TASK_MGMT_DONE

The SCST_USER_TASK_MGMT_DONE subcommand notifies the handler that all commands aborted by the task management function have finished, so the dev handler can perform the actual actions required by this TM command, for instance, reset all MODE PAGES variables to their default values.

The payload contains struct scst_user_tm, which was defined above.

After the TM function is completed, the device handler shall reply using the "result" field of the corresponding reply command.

Commands processing flow example

As an example, consider a simple synchronous VTL, which serves one virtual SCSI tape device and can process only one command at a time from any initiator. At the beginning the VTL opens using Then it using Then it prepares struct Then it prepares If the received SCSI command is READ-type one, SCST does the necessary preparations, then the VTL receives Then it prepares That's all for this SCSI command. For the next command the used data buffer will be reused. For WRITE-type SCSI commands the processing is the same, but