scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-08 16:03:20 +00:00

Files

Michał Chojnowski 882a3c60e4 utils/cached_file: reduce latency (and increase overhead) of partially-cached reads

Currently, `cached_file::stream` (used only by index_reader,
to read index pages) works as follows.

Assume that the caller requested a read of the range [pos, pos + size).
Then:

- If the first page of the requested range is uncached,
  the entire [pos, pos + size) range is read from disk (even if some
  later pieces of it are cached), the resulting pages are added to the cache,
  and the read completes (most likely) from the cached pages.
- If the first page of the read is cached, then the rest of the read
  is handled page-by-page, in a sequential loop, serving each page
  either from cache (if present) or from disk.

For example, assume that pages 0, 1, 2, 3, 4 are requested.

If exactly pages 1 and 2 are cached, then `stream` will read the entire [0, 4] range
from disk and insert the missing pages 0, 3 and 4 into the cache, and then it will
continue serving the read from cache.

If exactly pages 0 and 3 are cached, then it will serve 0 from cache,
then it will read 1 from disk and insert it into cache,
then it will read 2 from disk and insert it into cache,
then it will serve 3 from cache,
then it will read 4 from disk and insert it into cache.

If exactly the first page is cached, a 128 KiB read turns
into 31 sequential read I/O ops (one per remaining 4 KiB page).
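
To make the two cases concrete, here is a minimal model of the pre-patch behaviour as described above. It is only a sketch: `page_range` and `plan_reads_before_patch` are hypothetical names, the cache is modelled as a plain set of page indices, and none of this is the actual `cached_file` code.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

// Inclusive range of page indices fetched from disk by a single I/O op.
using page_range = std::pair<std::size_t, std::size_t>;

// Hypothetical model of the pre-patch strategy (not the real cached_file code).
// Returns the disk reads issued for pages [first, last], in issue order.
std::vector<page_range> plan_reads_before_patch(std::size_t first, std::size_t last,
                                                const std::set<std::size_t>& cached) {
    assert(first <= last);
    std::vector<page_range> reads;
    if (cached.count(first) == 0) {
        // First page uncached: one read covers the whole range,
        // including pages that are already cached.
        reads.emplace_back(first, last);
        return reads;
    }
    // First page cached: the rest is handled page by page,
    // with one separate read per uncached page.
    for (std::size_t p = first + 1; p <= last; ++p) {
        if (cached.count(p) == 0) {
            reads.emplace_back(p, p);
        }
    }
    return reads;
}
```

For pages 0..4 with {1, 2} cached, the model returns the single read [0, 4]. With {0, 3} cached it returns [1, 1], [2, 2], [4, 4]. For pages 0..31 (a 128 KiB read, assuming 4 KiB pages) with only page 0 cached, it returns 31 single-page reads.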

This is weird, and doesn't look intended. In one case, we are reading even pages
we already have, just to avoid fragmenting the read, and in the other case
we are reading pages one-by-one (sequentially!) even if they are neighbours.

I'm not sure if cached_file should minimize IOPS or byte throughput,
but the current state is surely suboptimal. Even if its read strategy
is somehow optimal, it should still at least coalesce contiguous reads
and perform the non-contiguous reads in parallel.
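
For reference, the coalescing alternative mentioned above could be sketched as follows. This is not what this patch implements. `page_range` and `coalesce_uncached` are hypothetical names, the cache is modelled as a set of page indices, and issuing the resulting reads in parallel is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

// Inclusive range of page indices fetched from disk by a single I/O op.
using page_range = std::pair<std::size_t, std::size_t>;

// Hypothetical sketch: group the uncached pages of [first, last] into
// maximal contiguous runs, so that each run needs only one disk read.
std::vector<page_range> coalesce_uncached(std::size_t first, std::size_t last,
                                          const std::set<std::size_t>& cached) {
    assert(first <= last);
    std::vector<page_range> runs;
    for (std::size_t p = first; p <= last; ++p) {
        if (cached.count(p) != 0) {
            continue; // Served from cache, no disk read needed.
        }
        if (!runs.empty() && runs.back().second + 1 == p) {
            runs.back().second = p; // Extend the current run by one page.
        } else {
            runs.emplace_back(p, p); // Start a new run.
        }
    }
    return runs;
}
```

For pages 0..4 with {0, 3} cached, this yields two reads, [1, 2] and [4, 4], which read no cached page and could be issued concurrently.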

This patch leans into minimizing IOPS. After the patch, we serve
as many front pages from the cache as we can, but when we see
an uncached page, we read the entire remainder of the read from disk.

In other words, it is as if we trimmed the longest cached prefix off
the read request, and then performed the rest using the pre-patch logic.

For example, if exactly pages 0 and 3 are cached,
then we serve 0 from cache,
then we read [1, 4] from disk and insert everything into cache.
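
The post-patch behaviour can be modelled in a few lines. As before, this is a sketch under assumptions: `page_range` and `plan_reads_after_patch` are hypothetical names, the cache is a plain set of page indices, and the real `cached_file` code is structured differently.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

// Inclusive range of page indices fetched from disk by a single I/O op.
using page_range = std::pair<std::size_t, std::size_t>;

// Hypothetical model of the post-patch strategy (not the real cached_file code).
// Returns the disk reads issued for pages [first, last]: at most one.
std::vector<page_range> plan_reads_after_patch(std::size_t first, std::size_t last,
                                               const std::set<std::size_t>& cached) {
    assert(first <= last);
    // Serve the longest cached prefix from the cache.
    std::size_t p = first;
    while (p <= last && cached.count(p) != 0) {
        ++p;
    }
    std::vector<page_range> reads;
    if (p <= last) {
        // First uncached page found: read the entire remainder in one op,
        // even if some later pages in it are cached.
        reads.emplace_back(p, last);
    }
    return reads;
}
```

For pages 0..4 with {0, 3} cached, the model returns the single read [1, 4]. For pages 0..31 with only page 0 cached, it returns the single read [1, 31] instead of 31 separate reads.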

For partially-cached files, this will result in more bytes read
from disk, but fewer I/O operations. This might be a bad thing. But if so,
then we should lean the other way in a more explicit and efficient
way than we currently do.

Closes scylladb/scylladb#20935

2024-10-04 17:39:38 +02:00

abi

…

arch/powerpc/crc32-vpmsum

…

lsa

utils/stall_free: introduce reserve_gently

2024-06-18 23:36:30 +05:30

s3/client: Don't move file from write_body's lambda

2024-09-17 09:48:09 +03:00

allocation_strategy.hh

logalloc: add hold_reserve

2024-07-08 16:08:27 +02:00

amortized_reserve.hh

utils: do not include unused headers

2024-01-18 12:50:06 +02:00

anchorless_list.hh

…

array-search.cc

…

array-search.hh

treewide: de-static namespace scope functions in headers

2024-10-01 14:02:50 +03:00

ascii.cc

Typos: fix typos in comments

2023-12-02 22:37:22 +02:00

ascii.hh

…

assert.hh

treewide: change assert() to SCYLLA_ASSERT()

2024-08-05 08:23:35 +03:00

atomic_vector.hh

…

aws_sigv4.cc

treewide: use seastar::format() or fmt::format() explicitly

2024-09-11 23:21:40 +03:00

aws_sigv4.hh

treewide: de-static namespace scope functions in headers

2024-10-01 14:02:50 +03:00

base64.cc

utils: do not include unused headers

2024-01-18 12:50:06 +02:00

base64.hh

…

big_decimal.cc

treewide: use seastar::format() or fmt::format() explicitly

2024-09-11 23:21:40 +03:00

big_decimal.hh

treewide: replace formatter<std::string_view> with formatter<string_view>

2024-04-19 07:44:07 +03:00

bit_cast.hh

utils: bit_cast: drop unused #includes

2023-12-12 21:09:51 +08:00

bloom_calculations.cc

…

bloom_calculations.hh

treewide: change assert() to SCYLLA_ASSERT()

2024-08-05 08:23:35 +03:00

bloom_filter.cc

utils/i_filter: introduce get_filter_size()

2024-06-24 12:06:01 +05:30

bloom_filter.hh

utils/i_filter: introduce get_filter_size()

2024-06-24 12:06:01 +05:30

bounded_stats_deque.hh

…

bptree.hh

treewide: change assert() to SCYLLA_ASSERT()

2024-08-05 08:23:35 +03:00

buffer_input_stream.cc

…

buffer_input_stream.hh

…

buffer_view-to-managed_bytes_view.hh

…

build_id.cc

treewide: change assert() to SCYLLA_ASSERT()

2024-08-05 08:23:35 +03:00

build_id.hh

…

cached_file_stats.hh

…

cached_file.hh

utils/cached_file: reduce latency (and increase overhead) of partially-cached reads

2024-10-04 17:39:38 +02:00

chunked_vector.hh

utils: chunked_vector: add ctor from std::initializer_list

2024-06-25 12:08:06 +03:00

class_registrator.hh

…

clmul.hh

…

CMakeLists.txt

utils: add on_internal_error with common logger

2024-01-31 16:45:09 +02:00

coarse_steady_clock.hh

…

collection-concepts.hh

…

compact-radix-tree.hh

treewide: change assert() to SCYLLA_ASSERT()

2024-08-05 08:23:35 +03:00

composite_abort_source.hh

treewide: change assert() to SCYLLA_ASSERT()

2024-08-05 08:23:35 +03:00

config_file_impl.hh

config: specialize from-string conversion for bool

2024-07-18 18:38:22 +03:00

config_file.cc

config: avoid binding an lvalue reference to an rvalue reference

2024-06-27 19:36:13 +03:00

config_file.hh

config: do not provide default value for set_value() and friends

2024-09-25 15:45:42 +03:00

contiguous_shared_buffer.hh

sstables, utils: Allow parsers to work with different buffer types

2024-09-27 01:24:54 +02:00

coroutine.hh

utils: do not include unused headers

2024-01-18 12:50:06 +02:00

crc.hh

treewide: de-static namespace scope functions in headers

2024-10-01 14:02:50 +03:00

cross-shard-barrier.hh

treewide: change assert() to SCYLLA_ASSERT()

2024-08-05 08:23:35 +03:00

data_input.hh

…

date.h

treewide: apply codespell to the comments in source code

2023-12-20 10:25:03 +02:00

digest_algorithm.hh

feature: grandfather DIGEST_FOR_NULL_VALUES

2024-05-18 00:24:00 +03:00

digester.hh

feature: grandfather DIGEST_FOR_NULL_VALUES

2024-05-18 00:24:00 +03:00

directories.cc

directories: mark verification_error() with [[noreturn]]

2024-09-30 12:07:15 +08:00

directories.hh

directories: prevent inode cache fragmentation by orderly verifying data directory contents

2024-02-01 12:20:23 +05:30

disk-error-handler.cc

extensions: Add exception types for IO extensions and handle in memtable write path

2024-08-11 13:52:35 +03:00

disk-error-handler.hh

treewide: replace std::result_of_t with std::invoke_result_t

2024-05-26 16:45:42 +03:00

div_ceil.hh

utils/div_ceil: add constraints to template arguments

2024-08-04 15:32:01 +03:00

double-decker.hh

treewide: change assert() to SCYLLA_ASSERT()

2024-08-05 08:23:35 +03:00

dynamic_bitset.cc

…

dynamic_bitset.hh

…

entangled.hh

treewide: change assert() to SCYLLA_ASSERT()

2024-08-05 08:23:35 +03:00

enum_option.hh

config, enum_option: allow round-trip string conversion

2024-07-10 20:39:01 +03:00

error_injection.cc

…

error_injection.hh

utils: Ensure const correctness of injection_handler::get().

2024-08-20 14:15:50 +02:00

estimated_histogram.hh

treewide: change assert() to SCYLLA_ASSERT()

2024-08-05 08:23:35 +03:00

exception_container.hh

treewide: replace formatter<std::string_view> with formatter<string_view>

2024-04-19 07:44:07 +03:00

exceptions.cc

utils: do not include unused headers

2024-01-18 12:50:06 +02:00

exceptions.hh

exceptions: s/#warn/#warning/

2024-02-01 14:50:17 +02:00

exponential_backoff_retry.hh

treewide: replace std::result_of_t with std::invoke_result_t

2024-05-26 16:45:42 +03:00

extremum_tracking.hh

…

file_lock.cc

treewide: change assert() to SCYLLA_ASSERT()

2024-08-05 08:23:35 +03:00

file_lock.hh

utils: Remove unused operator<< for file_lock object

2024-02-02 15:20:40 +01:00

flush_queue.hh

treewide: change assert() to SCYLLA_ASSERT()

2024-08-05 08:23:35 +03:00

fmt-compat.hh

…

fragment_range.hh

treewide: de-static namespace scope functions in headers

2024-10-01 14:02:50 +03:00

fragmented_temporary_buffer.hh

sstables, utils: Allow parsers to work with different buffer types

2024-09-27 01:24:54 +02:00

hash.hh

…

hashers.cc

…

hashers.hh

…

hashing.hh

treewide: include used headers

2024-05-27 17:34:38 +03:00

histogram_metrics_helper.cc

…

histogram_metrics_helper.hh

histogram_metrics_helper: support native histogram

2024-01-23 13:12:34 +02:00

histogram.hh

utils/histogram.hh: Make summary support inifinite bucket.

2024-08-22 23:34:24 +03:00

http.hh

Update seastar submodule

2024-09-18 13:59:22 +03:00

human_readable.cc

utils/human_readable: add fmt::formatter for human_readable_value

2024-03-12 14:53:55 +08:00

human_readable.hh

treewide: replace formatter<std::string_view> with formatter<string_view>

2024-04-19 07:44:07 +03:00

i_filter.cc

treewide: change assert() to SCYLLA_ASSERT()

2024-08-05 08:23:35 +03:00

i_filter.hh

utils/i_filter: introduce get_filter_size()

2024-06-24 12:06:01 +05:30

immutable-collection.hh

…

input_stream.hh

…

int_range.hh

treewide: change assert() to SCYLLA_ASSERT()

2024-08-05 08:23:35 +03:00

intrusive_btree.hh

treewide: change assert() to SCYLLA_ASSERT()

2024-08-05 08:23:35 +03:00

intrusive-array.hh

treewide: change assert() to SCYLLA_ASSERT()

2024-08-05 08:23:35 +03:00

large_bitset.cc

treewide: change assert() to SCYLLA_ASSERT()

2024-08-05 08:23:35 +03:00

large_bitset.hh

…

latency.hh

…

lexicographical_compare.hh

…

like_matcher.cc

…

like_matcher.hh

…

limiting_data_source.cc

…

limiting_data_source.hh

…

linearizing_input_stream.hh

…

lister.cc

treewide: use seastar::format() or fmt::format() explicitly

2024-09-11 23:21:40 +03:00

lister.hh

treewide: de-static namespace scope functions in headers

2024-10-01 14:02:50 +03:00

loading_cache.hh

treewide: change assert() to SCYLLA_ASSERT()

2024-08-05 08:23:35 +03:00

loading_shared_values.hh

treewide: change assert() to SCYLLA_ASSERT()

2024-08-05 08:23:35 +03:00

log_heap.hh

…

logalloc.cc

treewide: change assert() to SCYLLA_ASSERT()

2024-08-05 08:23:35 +03:00

logalloc.hh

treewide: change assert() to SCYLLA_ASSERT()

2024-08-05 08:23:35 +03:00

lru.hh

treewide: change assert() to SCYLLA_ASSERT()

2024-08-05 08:23:35 +03:00

managed_bytes.cc

utils/managed_bytes: add fmt::formatters for managed_bytes and friends

2024-02-23 11:32:41 +08:00

managed_bytes.hh

utils/managed_bytes: add support for fmt::to_string() to bytes and friends

2024-04-19 22:56:13 +08:00

managed_ref.hh

…

managed_vector.hh

util: do not use variable length array

2023-11-20 23:02:41 +02:00

maybe_yield.hh

…

memory_data_sink.hh

s3/client: Unmark put-object lambdas from mutable

2024-07-04 10:07:48 +03:00

memory_limit_reached.hh

Typos: fix typos in comments

2023-12-02 22:37:22 +02:00

multiprecision_int.cc

…

multiprecision_int.hh

…

murmur_hash.cc

utils/murmur_hash: replace rotl64() with std::rotl()

2024-06-24 08:24:43 +03:00

murmur_hash.hh

treewide: de-static namespace scope functions in headers

2024-10-01 14:02:50 +03:00

mutable_view.hh

utils: managed_bytes: rewrite managed_bytes methods in terms of managed_bytes_view

2024-02-09 17:00:33 +01:00

neat-object-id.hh

…

observable.hh

…

on_internal_error.cc

utils: add on_internal_error with common logger

2024-01-31 16:45:09 +02:00

on_internal_error.hh

treewide: change assert() to SCYLLA_ASSERT()

2024-08-05 08:23:35 +03:00

overloaded_functor.hh

…

phased_barrier.hh

…

preempt.hh

utils: preempt: add preemption_source

2024-02-07 18:31:28 +01:00

pretty_printers.cc

treewide: change assert() to SCYLLA_ASSERT()

2024-08-05 08:23:35 +03:00

pretty_printers.hh

treewide: fix misspellings in code comments

2024-01-31 09:16:10 +02:00

ranges.hh

…

rate_limiter.cc

…

rate_limiter.hh

Typos: fix typos in comments

2023-12-02 22:37:22 +02:00

recent_entries_map.hh

…

result_combinators.hh

…

result_loop.hh

treewide: replace seastar::future::get0() with seastar::future::get()

2024-02-02 22:12:57 +08:00

result_try.hh

treewide: change assert() to SCYLLA_ASSERT()

2024-08-05 08:23:35 +03:00

result.hh

…

reusable_buffer.hh

treewide: change assert() to SCYLLA_ASSERT()

2024-08-05 08:23:35 +03:00

rjson.cc

utils/rjson.cc: include the function name in exception message

2024-09-12 15:22:49 +03:00

rjson.hh

treewide: change assert() to SCYLLA_ASSERT()

2024-08-05 08:23:35 +03:00

rpc_utils.hh

…

runtime.cc

…

runtime.hh

…

sequenced_set.hh

…

serialization.hh

treewide: de-static namespace scope functions in headers

2024-10-01 14:02:50 +03:00

serialized_action.hh

…

simple_hashers.hh

…

small_vector.hh

treewide: do not define FMT_DEPRECATED_OSTREAM

2024-04-19 22:57:36 +08:00

sorting.hh

utils/sorting: allow to pass any container as verticies

2024-08-08 10:42:09 +02:00

stall_free.hh

utils/stall_free: introduce reserve_gently

2024-06-18 23:36:30 +05:30

streaming_histogram.hh

…

tagged_integer.hh

utils/tagged_integer: remove conversion to underlying integer

2024-08-15 02:12:58 +02:00

throttle.hh

treewide: change assert() to SCYLLA_ASSERT()

2024-08-05 08:23:35 +03:00

to_string.cc

nodetool: rebuild: add force option

2024-08-19 17:20:12 +03:00

to_string.hh

utils/to_string: include fmt/std.h if fmt >= v10

2024-04-23 12:09:05 +03:00

top_k.hh

treewide: change assert() to SCYLLA_ASSERT()

2024-08-05 08:23:35 +03:00

tuple_utils.hh

…

unconst.hh

…

updateable_value.cc

…

updateable_value.hh

Typos: fix typos in comments

2023-12-02 22:37:22 +02:00

user_provided_param.hh

nodetool: rebuild: add force option

2024-08-19 17:20:12 +03:00

utf8.cc

Typos: fix typos in comments

2023-12-02 22:37:22 +02:00

utf8.hh

…

UUID_gen.cc

utils: UUID_gen: include <atomic>

2024-04-30 09:07:22 +03:00

UUID_gen.hh

treewide: change assert() to SCYLLA_ASSERT()

2024-08-05 08:23:35 +03:00

uuid.cc

treewide: use seastar::format() or fmt::format() explicitly

2024-09-11 23:21:40 +03:00

UUID.hh

treewide: change assert() to SCYLLA_ASSERT()

2024-08-05 08:23:35 +03:00

value_or_reference.hh

kl::reader::make_reader: Unify interface with mx::reader::make_reader

2024-08-13 10:02:43 +02:00

variant_element.hh

…

vle.hh

treewide: de-static namespace scope functions in headers

2024-10-01 14:02:50 +03:00

xx_hasher.hh

feature: grandfather DIGEST_FOR_NULL_VALUES

2024-05-18 00:24:00 +03:00