This fixes a use-after-free bug when parsing clustering key across pages. Also includes a fix for allocating section retry, which is potentially not safe (not in practice yet). Details of the first problem: Clustering key index lookup is based on the index file page cache. We do a binary search within the index, which involves parsing index blocks touched by the algorithm. Index file pages are 4 KB chunks which are stored in LSA. To parse the first key of the block, we reuse clustering_parser, which is also used when parsing the data file. The parser is stateful and accepts consecutive chunks as temporary_buffers. The parser is supposed to keep its state across chunks. In93482439, the promoted index cursor was optimized to avoid fully page copy when parsing index blocks. Instead, parser is given a temporary_buffer which is a view on the page. A bit earlier, inb1b5bda, the parser was changed to keep shared fragments of the buffer passed to the parser in its internal state (across pages) rather than copy the fragments into a new buffer. This is problematic when buffers come from page cache because LSA buffers may be moved around or evicted. So the temporary_buffer which is a view on the LSA buffer is valid only around the duration of a single consume() call to the parser. If the blob which is parsed (e.g. variable-length clustering key component) spans pages, the fragments stored in the parser may be invalidated before the component is fully parsed. As a result, the parsed clustering key may have incorrect component values. This never causes parsing errors because the "length" field is always parsed from the current buffer, which is valid, and component parsing will end at the right place in the next (valid) buffer. The problematic path for clustering_key parsing is the one which calls primitive_consumer::read_bytes(), which is called for example for text components. Fixed-size components are not parsed like this, they store the intermediate state by copying data. This may cause incorrect clustering keys to be parsed when doing binary search in the index, diverting the search to an incorrect block. Details of the solution: We adapt page_view to a temporary_buffer-like API. For this, a new concept is introduced called ContiguousSharedBuffer. We also change parsers so that they can be templated on the type of the buffer they work with (page_view vs temporary_buffer). This way we don't introduce indirection to existing algorithms. We use page_view instead of temporary_buffer in the promoted index parser which works with page cache buffers. page_view can be safely shared via share() and stored across allocating sections. It keeps hold to the LSA buffer even across allocating sections by the means of cached_file::page_ptr. Fixes #20766 Closes scylladb/scylladb#20837 * github.com:scylladb/scylladb: sstables: bsearch_clustered_cursor: Add trace-level logging sstables: bsearch_clustered_cursor: Move definitions out of line test, sstables: Verify parsing stability when allocating section is retried test, sstables: Verify parsing stability when buffers cross page boundary sstables: bsearch_clustered_cursor: Switch parsers to work with page_view cached_file: Adapt page_view to ContiguousSharedBuffer cached_file: Change meaning of page_view::_size to be relative to _offset rather than page start sstables, utils: Allow parsers to work with different buffer types sstables: promoted_index_block_parser: Make reset() always bring parser to initial state sstables: bsearch_clustered_cursor: Switch read_block_offset() to use the read() method sstables: bsearch_clustered_cursor: Fix parsing when allocating section is retried (cherry picked from commitfb8743b2d6) Closes scylladb/scylladb#20906
41 lines
1.4 KiB
C++
41 lines
1.4 KiB
C++
/*
|
|
* Copyright (C) 2024-present ScyllaDB
|
|
*/
|
|
|
|
/*
|
|
* SPDX-License-Identifier: AGPL-3.0-or-later
|
|
*/
|
|
|
|
#pragma once
|
|
|
|
#include <concepts>
|
|
#include <memory>
|
|
|
|
// A contiguous buffer of char objects which can be trimmed and
|
|
// supports zero-copy sharing of its underlying memory.
|
|
template<typename T>
|
|
concept ContiguousSharedBuffer = std::movable<T>
|
|
&& std::default_initializable<T>
|
|
&& requires(T& obj, size_t pos, size_t len) {
|
|
|
|
// Creates a new buffer that shares the memory of the original buffer.
|
|
// The lifetime of the new buffer is independent of the original buffer.
|
|
{ obj.share() } -> std::same_as<T>;
|
|
|
|
// Like share() but the new buffer represents a sub-range of the original buffer.
|
|
{ obj.share(pos, len) } -> std::same_as<T>;
|
|
|
|
// Trims the suffix of a buffer so that 'len' is the index of the first removed byte.
|
|
{ obj.trim(len) } -> std::same_as<void>;
|
|
|
|
// Trims the prefix of the buffer so that `pos` is the index of the first byte after the trim.
|
|
{ obj.trim_front(pos) } -> std::same_as<void>;
|
|
|
|
{ obj.begin() } -> std::same_as<const char*>;
|
|
{ obj.get() } -> std::same_as<const char*>;
|
|
{ obj.get_write() } -> std::same_as<char*>;
|
|
{ obj.end() } -> std::same_as<const char*>;
|
|
{ obj.size() } -> std::same_as<size_t>;
|
|
{ obj.empty() } -> std::same_as<bool>;
|
|
};
|