SCST SGV cache description Vladislav Bolkhovitin Version 2.1.0 Introduction

SCST SGV cache is a memory management subsystem in SCST. One can call it a "memory pool", but Linux kernel already have a mempool interface, which serves different purposes. SGV cache provides to SCST core, target drivers and backend dev handlers facilities to allocate, build and cache SG vectors for data buffers. The main advantage of it is the caching facility, when it doesn't free to the system each vector, which is not used anymore, but keeps it for a while (possibly indefinitely) to let it be reused by the next consecutive command. This allows to: Reduce commands processing latencies and, hence, improve performance; Make commands processing latencies predictable, which is essential for RT applications. The freed SG vectors are kept by the SGV cache either for some (possibly indefinite) time, or, optionally, until the system needs more memory and asks to free some using the set_shrinker() interface. Also the SGV cache allows to: Cluster pages together. "Cluster" means merging adjacent pages in a single SG entry. It allows to have less SG entries in the resulting SG vector, hence improve performance handling it as well as allow to work with bigger buffers on hardware with limited SG capabilities. Set custom page allocator functions. For instance, scst_user device handler uses this facility to eliminate unneeded mapping/unmapping of user space pages and avoid unneeded IOCTL calls for buffers allocations. In fileio_tgt application, which uses a regular malloc() function to allocate data buffers, this facility allows ~30% less CPU load and considerable performance increase. Prevent each initiator or all initiators altogether to allocate too much memory and DoS the target. Consider 10 initiators, which can have access to 10 devices each. Any of them can queue up to 64 commands, each can transfer up to 1MB of data. So, all of them in a peak can allocate up to 10*10*64 = ~6.5GB of memory for data buffers. This amount must be limited somehow and the SGV cache performs this function. Implementation

From implementation POV the SGV cache is a simple extension of the kmem cache. It can work in 2 modes: With fixed size buffers. With a set of power 2 size buffers. In this mode each SGV cache (struct sgv_pool) has SGV_POOL_ELEMENTS (11 currently) of kmem caches. Each of those kmem caches keeps SGV cache objects (struct sgv_pool_obj) corresponding to SG vectors with size of order X pages. For instance, request to allocate 4 pages will be served from kmem cache[2&rsqb, since the order of the of number of requested pages is 2. If later request to allocate 11KB comes, the same SG vector with 4 pages will be reused (see below). This mode is in average allows less memory overhead comparing with the fixed size buffers mode. Consider how the SGV cache works in the set of buffers mode. When a request to allocate new SG vector comes, sgv_pool_alloc() via sgv_get_obj() checks if there is already a cached vector with that order. If yes, then that vector will be reused and its length, if necessary, will be modified to match the requested size. In the above example request for 11KB buffer, 4 pages vector will be reused and modified using trans_tbl to contain 3 pages and the last entry will be modified to contain the requested length - 2*PAGE_SIZE. If there is no cached object, then a new sgv_pool_obj will be allocated from the corresponding kmem cache, chosen by the order of number of requested pages. Then that vector will be filled by pages and returned. In the fixed size buffers mode the SGV cache works similarly, except that it always allocate buffer with the predefined fixed size. I.e. even for 4K request the whole buffer with predefined size, say, 1MB, will be used. In both modes, if size of a request exceeds the maximum allowed for caching buffer size, the requested buffer will be allocated, but not cached. Freed cached sgv_pool_obj objects are actually freed to the system either by the purge work, which is scheduled once in 60 seconds, or in sgv_shrink() called by system, when it's asking for memory. Interface sgv_pool *sgv_pool_create()

struct sgv_pool *sgv_pool_create( const char *name, enum sgv_clustering_types clustered, int single_alloc_pages, bool shared, int purge_interval) This function creates and initializes an SGV cache. It has the following arguments: 0, then the SGV cache will work in the fixed size buffers mode. In this case single_alloc_pages sets the size of each buffer in pages. Returns the resulting SGV cache or NULL in case of any error. void sgv_pool_del()

void sgv_pool_del( struct sgv_pool *pool) This function deletes the corresponding SGV cache. If the cache is shared, it will decrease its reference counter. If the reference counter reaches 0, the cache will be destroyed. void sgv_pool_flush()

void sgv_pool_flush( struct sgv_pool *pool) This function flushes, i.e. frees, all the cached entries in the SGV cache. void sgv_pool_set_allocator()

void sgv_pool_set_allocator( struct sgv_pool *pool, struct page *(*alloc_pages_fn)(struct scatterlist *sg, gfp_t gfp, void *priv), void (*free_pages_fn)(struct scatterlist *sg, int sg_count, void *priv)); This function allows to set for the SGV cache a custom pages allocator. For instance, scst_user uses such function to supply to the cache mapped from user space pages. This function should return the allocated page or NULL, if no page was allocated. struct scatterlist *sgv_pool_alloc()

struct scatterlist *sgv_pool_alloc( struct sgv_pool *pool, unsigned int size, gfp_t gfp_mask, int flags, int *count, struct sgv_pool_obj **sgv, struct scst_mem_lim *mem_lim, void *priv) This function allocates an SG vector from the SGV cache. It has the following parameters: This function returns pointer to the resulting SG vector or NULL in case of any error. void sgv_pool_free()

void sgv_pool_free( struct sgv_pool_obj *sgv, struct scst_mem_lim *mem_lim) This function frees previously allocated SG vector, referenced by SGV cache object sgv. void *sgv_get_priv(struct sgv_pool_obj *sgv)

void *sgv_get_priv( struct sgv_pool_obj *sgv) This function allows to get the allocation private data for this SGV cache object sgv. The private data are set by sgv_pool_alloc(). void scst_init_mem_lim()

void scst_init_mem_lim( struct scst_mem_lim *mem_lim) This function initializes memory limits structure mem_lim according to the current system configuration. This structure should be latter used to track and limit allocated by one or more SGV caches memory. Runtime information and statistics.

Runtime information and statistics is available in /sys/kernel/scst_tgt/sgv.