scylladb

Author	SHA1	Message	Date
Kefu Chai	a1dcddd300	utils: do not include unused headers these unused includes were identified by clangd. see https://clangd.llvm.org/guides/include-cleaner#unused-include-warning for more details on the "Unused include" warning. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#16833	2024-01-18 12:50:06 +02:00
Michał Chojnowski	c7d9d35030	utils: cached_file: deglobalize cached_file metrics Move cached_file metrics from a thread_local variable to cache_tracker. This is needed so that cache_tracker can know the memory usage of index caches (for purposes of cache eviction) without relying on globals. But it also makes sense even without that motive.	2023-09-01 22:34:41 +02:00
Michał Chojnowski	50b429f255	config: add index_cache_fraction Adds a configurable upper limit to memory usage by index caches. See the source code comments added in this patch for more details. This patch shouldn't change visible behaviour, because the limit is set to 1.0 by default, so it is never triggered. We will change the default in a future patch.	2023-09-01 22:34:23 +02:00
Raphael S. Carvalho	050ce9ef1d	cached_file: Evict unused pages that aren't linked to LRU yet It was found that cached_file dtor can hit the following assert after OOM: cached_file_test: utils/cached_file.hh:379: cached_file::~cached_file(): Assertion `_cache.empty()' failed. cached_file's dtor iterates through all entries and evicts those that are linked to LRU, under the assumption that all unused entries were linked to LRU. That's partially correct. get_page_ptr() may fetch more than 1 page due to read ahead, but it will only call cached_page::share() on the first page, the one that will be consumed now. share() is responsible for automatically placing the page into LRU once refcount drops to zero. If the read is aborted midway, before cached_file has a chance to hit the 2nd page (read ahead) in cache, it will remain there with refcount 0 and unlinked from LRU, in hope that a subsequent read will bring it out of that state. Our main user of cached_file is per-sstable index caching. If the scenario above happens, and the sstable and its associated cached_file are destroyed before the 2nd page is hit, cached_file will not be able to clear all the cache because some of the pages are unused and not linked. A page read ahead will now be linked into LRU so it doesn't sit in memory indefinitely. This also allows the cached_file dtor to clear all cache if some of those pages brought in advance aren't fetched later. A reproducer was added. Fixes #14814. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes #14818	2023-07-27 00:01:46 +02:00
Pavel Emelyanov	66e43912d6	code: Switch to seastar API level 7 In that level no io_priority_class-es exist. Instead, all the IO happens in the context of current sched-group. File API no longer accepts prio class argument (and makes io_intent arg mandatory to impls). So the change consists of - removing all usage of io_priority_class - patching file_impl's inheritors to updated API - priority manager goes away altogether - IO bandwidth update is performed on respective sched group - tune-up scylla-gdb.py io_queues command The first change is huge and was made semi-automatically by: - grep io_priority_class \| default_priority_class - remove all calls, found methods' args and class' fields Patching file_impl-s is smaller, but also mechanical: - replace io_priority_class& argument with io_intent* one - pass intent to lower file (if applicable) Dropping the priority manager is: - git-rm .cc and .hh - sed out all the #include-s - fix configure.py and cmakefile The scylla-gdb.py update is a bit hairy -- it needs to use task queues list for IO classes names and shares, but to detect whether it should, it checks whether the "commitlog" group is present. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes #13963	2023-06-06 13:29:16 +03:00
Michał Chojnowski	f340c9cca5	utils: lru: remove unlink_from_lru() unlink_from_lru() allows for unlinking elements from cache without notifying the cache. This messes up any potential cache bookkeeping. Improved that by replacing all uses of unlink_from_lru() with calls to lru::remove(), which does have access to cache's metadata.	2022-10-17 12:07:27 +02:00
Michał Chojnowski	d785364375	cache: make all cache unlinks explicit Our LSA cache is implemented as an auto_unlink Boost intrusive list, meaning that elements of the list unlink themselves from the list automatically on destruction. Some parts of the code rely on that, and don't unlink them manually. However, this precludes accurate bookkeeping about the cache. Elements only have access to themselves and their neighbours, not to any bookkeeping context. Therefore, a destructor cannot update the relevant metadata. In this patch, we fix this by adding explicit unlink calls to places where it would be done by a destructor. In a following patch, we will add an assert to the destructor to check that every element is unlinked before destruction.	2022-10-17 12:07:27 +02:00
Tomasz Grabiec	e68cf55514	utils: cached_file: Fix alloc-dealloc mismatch during eviction on_evicted() is invoked in the LSA allocator context, set in the reclaimer callback installed by the cache_tracker. However, cached_pages are allocated in the standard allocator context (note: page content is allocated inside LSA via lsa_buffer). The LSA region will happily deallocate these, thinking that these are large objects which were delegated to the standard allocator. But the _non_lsa_memory_in_use metric will underflow. When it underflows enough, shard_segment_pool.total_memory() will become 0 and memory reclamation will stop doing anything, leading to apparent OOM. The fix is to switch to the standard allocator context inside cached_page::on_evicted(). evict_range() was also given the same treatment as a precaution; it is currently only invoked in the standard allocator context. Fixes #10056	2022-02-23 18:38:05 +01:00
Tomasz Grabiec	b734615f51	util: cached_file: Fix corruption after memory reclamation was triggered from population If memory reclamation is triggered inside _cache.emplace(), the _cache btree can get corrupted. Reclaimers erase from it, and emplace() assumes that the tree is not modified during its execution. It first locates the target node and then does memory allocation. Fix by running emplace() under allocating section, which disables memory reclamation. The bug manifests with assert failures, e.g: ./utils/bptree.hh:1699: void bplus::node<unsigned long, cached_file::cached_page, cached_file::page_idx_less_comparator, 12, bplus::key_search::linear, bplus::with_debug::no>::refill(Less) [Key = unsigned long, T = cached_file::cached_page, Less = cached_file::page_idx_less_comparator, NodeSize = 12, Search = bplus::key_search::linear, Debug = bplus::with_debug::no]: Assertion `p._kids[i].n == this' failed. Fixes #9915 Message-Id: <20220130175639.15258-1-tgrabiec@scylladb.com>	2022-01-30 19:57:35 +02:00
Avi Kivity	fcb8d040e8	treewide: use Software Package Data Exchange (SPDX) license identifiers Instead of lengthy blurbs, switch to single-line, machine-readable standardized (https://spdx.dev) license identifiers. The Linux kernel switched long ago, so there is strong precedent. Three cases are handled: AGPL-only, Apache-only, and dual licensed. For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0), reasoning that our changes are extensive enough to apply our license. The changes were applied mechanically with a script, except to licenses/README.md. Closes #9937	2022-01-18 12:15:18 +01:00
Avi Kivity	369afe3124	treewide: use coroutine::maybe_yield() instead of co_await make_ready_future() The dedicated API shows the intent, and may be a tiny bit faster. Closes #9382	2021-09-23 12:28:56 +02:00
Tomasz Grabiec	f553db69f7	cached_file: Issue single I/O for the whole read range on miss Currently, reading a page range would issue I/O for each missing page. This is inefficient, better to issue a single I/O for the whole range and populate cache from that. As an optimization, issue a single I/O if the first page is missing. This is important for index reads which optimistically try to read 32KB of index file to read the partition index page.	2021-07-02 19:02:14 +02:00
Tomasz Grabiec	1f74863bf8	sstables, cached_file: Evict cache gently when sstable is destroyed We must evict before the _cached_index_file associated with the sstable goes away. Better to do it gently to avoid stalls.	2021-07-02 19:02:14 +02:00
Tomasz Grabiec	934824394a	sstables, cached_file: Avoid copying buffers from cache when parsing promoted index	2021-07-02 19:02:14 +02:00
Tomasz Grabiec	7b6f18b4ed	cached_file: Introduce get_page_units() Will be needed later for reading a page view which cannot use make_tracked_temporary_buffer(). Standardize on get_page_units(), converting existing code to wrap the units in a deleter.	2021-07-02 19:02:14 +02:00
Tomasz Grabiec	484e06d69b	cached_file: Always start at offset 0 All current uses start at offset 0, so simplify the code by assuming it.	2021-07-02 19:02:13 +02:00
Tomasz Grabiec	078a6e422b	sstables: Cache all index file reads After this patch, there is a single index file page cache per sstable, shared by index readers. The cache survives reads, which reduces the amount of I/O on subsequent reads. As part of this, cached_file needed to be adjusted in the following ways. The page cache may occupy a significant portion of memory. Keeping the pages in the standard allocator could cause memory fragmentation problems. To avoid them, cached_file is changed to keep buffers in LSA using the lsa_buffer allocation method. When a page is needed by the seastar I/O layer, it needs to be copied to a temporary_buffer which is stable, so must be allocated in the standard allocator space. We copy the page on-demand. Concurrent requests for the same page will share the temporary_buffer. When a page is not used, it only lives in the LSA space. In the subsequent patches cached_file::stream will be adjusted to also support access via cached_page::ptr_type directly, to avoid materializing a temporary_buffer. While a page is used, it is not linked in the LRU so that it is not freed. This ensures that the storage which is actively consumed remains stable, either via temporary_buffer (kept alive by its deleter), or by cached_page::ptr_type directly.	2021-07-02 19:02:13 +02:00
Tomasz Grabiec	019956739d	cached_file: Switch to bplus::tree In order to be able to move it to LSA later.	2021-07-02 10:25:58 +02:00
Tomasz Grabiec	8fbea0b5b7	utils: cached_file: Introduce file wrapper It's an adapter between seastar::file and cached_file. It gives a seastar::file which will serve reads using a given cached_file as a read-through cache.	2021-07-02 10:25:58 +02:00
Tomasz Grabiec	8e2118069b	sstables: cached_file: Account buffers returned by cached_file under read_permit We want buffers to be accounted only when they are used outside cached_file. Cached pages should not be accounted because they will stay around for longer than the read after subsequent commits.	2021-07-02 10:25:58 +02:00
Avi Kivity	a55b434a2b	treewide: extend copyright statements to present day	2021-06-06 19:18:49 +03:00
Tomasz Grabiec	17ee1a2eed	utils: cached_file: Fix compilation error Fix field initialization order problem. In file included from ./sstables/mc/bsearch_clustered_cursor.hh:28, from sstables/index_reader.hh:32, from sstables/sstables.cc:49: ./utils/cached_file.hh: In constructor 'cached_file::stream::stream(cached_file&, const seastar::io_priority_class&, tracing::trace_state_ptr, cached_file::page_idx_type, cached_file::offset_type)': ./utils/cached_file.hh:119:34: error: 'cached_file::stream::_trace_state' will be initialized after [-Werror=reorder] 119 \| tracing::trace_state_ptr _trace_state; \| ^~~~~~~~~~~~ ./utils/cached_file.hh:117:23: error: 'cached_file::page_idx_type cached_file::stream::_page_idx' [-Werror=reorder] 117 \| page_idx_type _page_idx; \| ^~~~~~~~~ ./utils/cached_file.hh:127:9: error: when initialized here [-Werror=reorder] 127 \| stream(cached_file& cf, const io_priority_class& pc, tracing::trace_state_ptr trace_state, \| ^~~~~~ Message-Id: <1592478082-22505-1-git-send-email-tgrabiec@scylladb.com>	2020-06-18 14:08:29 +03:00
Tomasz Grabiec	58532cdf11	cached_file, sstables: Add tracing to index binary search and page cache	2020-06-16 16:15:24 +02:00
Tomasz Grabiec	c95dd67d11	utils: Introduce cached_file It is a read-through cache of a file. Will be used to cache contents of the promoted index area from the index file. Currently, cached pages are evicted manually using the invalidate_*() method family, or when the object is destroyed. The cached_file represents a subset of the file. The reason for this is to satisfy two requirements. One is that we have page-aligned caching, where pages are aligned relative to the start of the underlying file. This matches the requirements of the seastar I/O engine on I/O requests. Another requirement is to have an effective way to populate the cache using an unaligned buffer which starts in the middle of the file when we know that we won't need to access bytes located before the buffer's position. See populate_front(). If we couldn't assume that, we wouldn't be able to insert an unaligned buffer into the cache.	2020-06-16 16:15:23 +02:00
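
The page-aligned caching described in c95dd67d11, and the single I/O per missing range from f553db69f7, can be illustrated with a small standalone C++ sketch. It is not code from utils/cached_file.hh: the 4096-byte page size and the names page_range, pages_for, offset_in_page and aligned_read_for are assumptions made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch; names and page size are assumptions,
// not taken from utils/cached_file.hh.
constexpr std::uint64_t page_size = 4096;

// Inclusive range of page indexes, counted from the start of the file.
struct page_range {
    std::uint64_t first;
    std::uint64_t last;
};

// Map the byte range [offset, offset + size) to the pages that cover it.
inline page_range pages_for(std::uint64_t offset, std::uint64_t size) {
    assert(size > 0); // an empty read touches no page
    return page_range{offset / page_size, (offset + size - 1) / page_size};
}

// Position of the first requested byte inside its page.
inline std::uint64_t offset_in_page(std::uint64_t offset) {
    return offset % page_size;
}

// One page-aligned read that covers the whole requested range,
// instead of one read per missing page.
struct aligned_read {
    std::uint64_t pos; // file position, a multiple of page_size
    std::uint64_t len; // length in bytes, a multiple of page_size
};

inline aligned_read aligned_read_for(std::uint64_t offset, std::uint64_t size) {
    const page_range r = pages_for(offset, size);
    return aligned_read{r.first * page_size, (r.last - r.first + 1) * page_size};
}
```

For example, under the assumed 4096-byte page size, a 100-byte read at offset 5000 falls entirely in page 1, so the sketch would issue one 4096-byte read at position 4096; a 32KB read at offset 0 covers pages 0 through 7.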

24 Commits