scylladb

Author	SHA1	Message	Date
Tomasz Grabiec	e68cf55514	utils: cached_file: Fix alloc-dealloc mismatch during eviction on_evicted() is invoked in the LSA allocator context, set in the reclaimer callback installed by the cache_tracker. However, cached_pages are allocated in the standard allocator context (note: page content is allocated inside LSA via lsa_buffer). The LSA region will happily deallocate these, thinking that these are large objects which were delegated to the standard allocator. But the _non_lsa_memory_in_use metric will underflow. When it underflows enough, shard_segment_pool.total_memory() will become 0 and memory reclamation will stop doing anything, leading to apparent OOM. The fix is to switch to the standard allocator context inside cached_page::on_evicted(). evict_range() was also given the same treatment as a precaution; it is currently invoked only in the standard allocator context. Fixes #10056	2022-02-23 18:38:05 +01:00
Tomasz Grabiec	b734615f51	util: cached_file: Fix corruption after memory reclamation was triggered from population If memory reclamation is triggered inside _cache.emplace(), the _cache btree can get corrupted. Reclaimers erase from it, and emplace() assumes that the tree is not modified during its execution. It first locates the target node and then does memory allocation. Fix by running emplace() under allocating section, which disables memory reclamation. The bug manifests with assert failures, e.g: ./utils/bptree.hh:1699: void bplus::node<unsigned long, cached_file::cached_page, cached_file::page_idx_less_comparator, 12, bplus::key_search::linear, bplus::with_debug::no>::refill(Less) [Key = unsigned long, T = cached_file::cached_page, Less = cached_file::page_idx_less_comparator, NodeSize = 12, Search = bplus::key_search::linear, Debug = bplus::with_debug::no]: Assertion `p._kids[i].n == this' failed. Fixes #9915 Message-Id: <20220130175639.15258-1-tgrabiec@scylladb.com>	2022-01-30 19:57:35 +02:00
Avi Kivity	fcb8d040e8	treewide: use Software Package Data Exchange (SPDX) license identifiers Instead of lengthy blurbs, switch to single-line, machine-readable standardized (https://spdx.dev) license identifiers. The Linux kernel switched long ago, so there is strong precedent. Three cases are handled: AGPL-only, Apache-only, and dual licensed. For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0), reasoning that our changes are extensive enough to apply our license. The changes we applied mechanically with a script, except to licenses/README.md. Closes #9937	2022-01-18 12:15:18 +01:00
Avi Kivity	369afe3124	treewide: use coroutine::maybe_yield() instead of co_await make_ready_future() The dedicated API shows the intent, and may be a tiny bit faster. Closes #9382	2021-09-23 12:28:56 +02:00
Tomasz Grabiec	f553db69f7	cached_file: Issue single I/O for the whole read range on miss Currently, reading a page range would issue I/O for each missing page. This is inefficient, better to issue a single I/O for the whole range and populate cache from that. As an optimization, issue a single I/O if the first page is missing. This is important for index reads which optimistically try to read 32KB of index file to read the partition index page.	2021-07-02 19:02:14 +02:00
Tomasz Grabiec	1f74863bf8	sstables, cached_file: Evict cache gently when sstable is destroyed We must evict before the _cached_index_file associated with the sstable goes away. Better to do it gently to avoid stalls.	2021-07-02 19:02:14 +02:00
Tomasz Grabiec	934824394a	sstables, cached_file: Avoid copying buffers from cache when parsing promoted index	2021-07-02 19:02:14 +02:00
Tomasz Grabiec	7b6f18b4ed	cached_file: Introduce get_page_units() Will be needed later for reading a page view which cannot use make_tracked_temporary_buffer(). Standardize on get_page_units(), converting existing code to wrap the units in a deleter.	2021-07-02 19:02:14 +02:00
Tomasz Grabiec	484e06d69b	cached_file: Always start at offset 0 All current uses start at offset 0, so simplify the code by assuming it.	2021-07-02 19:02:13 +02:00
Tomasz Grabiec	078a6e422b	sstables: Cache all index file reads After this patch, there is a single index file page cache per sstable, shared by index readers. The cache survives reads, which reduces the amount of I/O on subsequent reads. As part of this, cached_file needed to be adjusted in the following ways. The page cache may occupy a significant portion of memory. Keeping the pages in the standard allocator could cause memory fragmentation problems. To avoid them, cached_file is changed to keep buffers in LSA using the lsa_buffer allocation method. When a page is needed by the seastar I/O layer, it needs to be copied to a temporary_buffer which is stable, so must be allocated in the standard allocator space. We copy the page on-demand. Concurrent requests for the same page will share the temporary_buffer. When a page is not used, it only lives in the LSA space. In the subsequent patches cached_file::stream will be adjusted to also support access via cached_page::ptr_type directly, to avoid materializing a temporary_buffer. While a page is used, it is not linked in the LRU so that it is not freed. This ensures that the storage which is actively consumed remains stable, either via temporary_buffer (kept alive by its deleter), or by cached_page::ptr_type directly.	2021-07-02 19:02:13 +02:00
Tomasz Grabiec	019956739d	cached_file: Switch to bplus::tree In order to be able to move it to LSA later.	2021-07-02 10:25:58 +02:00
Tomasz Grabiec	8fbea0b5b7	utils: cached_file: Introduce file wrapper It's an adaptor between seastar::file and cached_file. It gives a seastar::file which will serve reads using a given cached_file as a read-through cache.	2021-07-02 10:25:58 +02:00
Tomasz Grabiec	8e2118069b	sstables: cached_file: Account buffers returned by cached_file under read_permit We want buffers to be accounted only when they are used outside cached_file. Cached pages should not be accounted because they will stay around for longer than the read after subsequent commits.	2021-07-02 10:25:58 +02:00
Avi Kivity	a55b434a2b	treewide: extend copyright statements to present day	2021-06-06 19:18:49 +03:00
Tomasz Grabiec	17ee1a2eed	utils: cached_file: Fix compilation error Fix field initialization order problem. In file included from ./sstables/mc/bsearch_clustered_cursor.hh:28, from sstables/index_reader.hh:32, from sstables/sstables.cc:49: ./utils/cached_file.hh: In constructor 'cached_file::stream::stream(cached_file&, const seastar::io_priority_class&, tracing::trace_state_ptr, cached_file::page_idx_type, cached_file::offset_type)': ./utils/cached_file.hh:119:34: error: 'cached_file::stream::_trace_state' will be initialized after [-Werror=reorder] 119 | tracing::trace_state_ptr _trace_state; | ^~~~~~~~~~~~ ./utils/cached_file.hh:117:23: error: 'cached_file::page_idx_type cached_file::stream::_page_idx' [-Werror=reorder] 117 | page_idx_type _page_idx; | ^~~~~~~~~ ./utils/cached_file.hh:127:9: error: when initialized here [-Werror=reorder] 127 | stream(cached_file& cf, const io_priority_class& pc, tracing::trace_state_ptr trace_state, | ^~~~~~ Message-Id: <1592478082-22505-1-git-send-email-tgrabiec@scylladb.com>	2020-06-18 14:08:29 +03:00
Tomasz Grabiec	58532cdf11	cached_file, sstables: Add tracing to index binary search and page cache	2020-06-16 16:15:24 +02:00
Tomasz Grabiec	c95dd67d11	utils: Introduce cached_file It is a read-through cache of a file. Will be used to cache contents of the promoted index area from the index file. Currently, cached pages are evicted manually using the invalidate_*() method family, or when the object is destroyed. The cached_file represents a subset of the file. The reason for this is to satisfy two requirements. One is that we have a page-aligned caching, where pages are aligned relative to the start of the underlying file. This matches requirements of the seastar I/O engine on I/O requests. Another requirement is to have an effective way to populate the cache using an unaligned buffer which starts in the middle of the file when we know that we won't need to access bytes located before the buffer's position. See populate_front(). If we couldn't assume that, we wouldn't be able to insert an unaligned buffer into the cache.	2020-06-16 16:15:23 +02:00
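
The read path described in f553db69f7 (one I/O for the whole range on a miss) on top of the page-aligned read-through cache introduced in c95dd67d11 can be sketched with a toy model. This is only an illustration under stated assumptions: `toy_page_cache`, its 4-byte `page_size`, the in-memory `std::string` standing in for the file, and the `io_count()` counter are all hypothetical and are not ScyllaDB's `cached_file` API. The sketch issues one simulated I/O starting at the first missing page of the requested range and populates every page up to the end of the range from that buffer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Hypothetical toy model of a page-aligned read-through cache.
// It is NOT ScyllaDB's cached_file; names and sizes are illustrative.
class toy_page_cache {
public:
    static constexpr std::size_t page_size = 4;  // tiny on purpose

    explicit toy_page_cache(std::string backing) : _backing(std::move(backing)) {}

    // Returns bytes [pos, pos + len), clipped to the end of the backing data.
    std::string read(std::size_t pos, std::size_t len) {
        if (pos >= _backing.size() || len == 0) {
            return {};
        }
        const std::size_t end = std::min(pos + len, _backing.size());
        const std::size_t first = pos / page_size;
        const std::size_t last = (end - 1) / page_size;

        // On the first miss, issue ONE I/O covering the rest of the range.
        for (std::size_t idx = first; idx <= last; ++idx) {
            if (_pages.count(idx) == 0) {
                populate(idx, last);
                break;
            }
        }

        // Assemble the result from cached pages only.
        std::string out;
        for (std::size_t idx = first; idx <= last; ++idx) {
            const std::string& page = _pages.at(idx);
            const std::size_t page_start = idx * page_size;
            const std::size_t from = std::max(pos, page_start) - page_start;
            const std::size_t to = std::min(end, page_start + page.size()) - page_start;
            out.append(page, from, to - from);
        }
        return out;
    }

    std::size_t io_count() const { return _io_count; }
    std::size_t cached_pages() const { return _pages.size(); }

private:
    // Simulates a single page-aligned I/O for pages [first_missing, last].
    void populate(std::size_t first_missing, std::size_t last) {
        assert(first_missing <= last);
        ++_io_count;
        const std::size_t start = first_missing * page_size;
        const std::size_t stop = std::min((last + 1) * page_size, _backing.size());
        const std::string buf = _backing.substr(start, stop - start);
        for (std::size_t off = 0; off < buf.size(); off += page_size) {
            // emplace() keeps a page that is already cached.
            _pages.emplace(first_missing + off / page_size, buf.substr(off, page_size));
        }
    }

    std::string _backing;
    std::map<std::size_t, std::string> _pages;
    std::size_t _io_count = 0;
};
```

With a 16-byte backing string, a first read of 9 bytes at offset 2 touches three pages but costs one simulated I/O, and a later read that falls inside those pages costs none.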

17 Commits
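
The accounting failure described in e68cf55514 can be illustrated with a small model. Everything here is an assumption made for illustration: `toy_allocator`, `alloc_ctx`, and `scoped_ctx` are hypothetical names and do not reproduce ScyllaDB's LSA interfaces. The sketch only shows the mechanism the commit message describes: an object allocated in the standard context but freed in the LSA context decrements an unsigned counter that was never incremented, so the counter wraps around, while switching back to the standard context for the duration of the free keeps the counter intact.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Hypothetical allocator contexts; not ScyllaDB's real types.
enum class alloc_ctx { standard, lsa };

// Toy allocator that accounts memory only when the LSA context is active,
// loosely mimicking large objects delegated to the standard allocator.
struct toy_allocator {
    alloc_ctx ctx = alloc_ctx::standard;
    std::uint64_t non_lsa_memory_in_use = 0;  // unsigned, so it can wrap

    void* alloc(std::size_t size) {
        if (ctx == alloc_ctx::lsa) {
            non_lsa_memory_in_use += size;
        }
        return ::operator new(size);
    }

    void free(void* p, std::size_t size) {
        if (ctx == alloc_ctx::lsa) {
            // Wraps around if the object was not allocated in this context.
            non_lsa_memory_in_use -= size;
        }
        ::operator delete(p);
    }
};

// RAII guard that switches the active context and restores it on scope exit.
class scoped_ctx {
    toy_allocator& _a;
    alloc_ctx _prev;
public:
    scoped_ctx(toy_allocator& a, alloc_ctx c) : _a(a), _prev(a.ctx) { _a.ctx = c; }
    ~scoped_ctx() { _a.ctx = _prev; }
    scoped_ctx(const scoped_ctx&) = delete;
    scoped_ctx& operator=(const scoped_ctx&) = delete;
};

// Buggy eviction: frees in whatever context the caller (the reclaimer) set.
inline void evict_buggy(toy_allocator& a, void* page, std::size_t size) {
    a.free(page, size);
}

// Fixed eviction: switches to the standard context before freeing.
inline void evict_fixed(toy_allocator& a, void* page, std::size_t size) {
    scoped_ctx guard(a, alloc_ctx::standard);
    a.free(page, size);
}
```

The guard mirrors the shape of the fix named in the commit message (switching context inside the eviction callback), but the real change operates on ScyllaDB's own allocator machinery, which this sketch does not attempt to reproduce.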