Currently, `cached_file::stream` (used only by index_reader, to read
index pages) works as follows.
Assume that the caller requested a read of the range [pos, pos + size).
Then:
- If the first page of the requested range is uncached,
the entire [pos, pos + size) range is read from disk (even if some
later pieces of it are cached), the resulting pages are added to the cache,
and the read completes (most likely) from the cached pages.
- If the first page of the read is cached, then the rest of the read
is handled page-by-page, in a sequential loop, serving each page
either from cache (if present) or from disk.
For example, assume that pages 0, 1, 2, 3, 4 are requested.
If exactly pages 1 and 2 are cached, then `stream` will read the entire [0, 4] range
from disk and insert the missing pages 0, 3 and 4, and then it will continue serving
the read from cache.
If exactly pages 0 and 3 are cached, then it will serve 0 from cache,
then it will read 1 from disk and insert it into cache,
then it will read 2 from disk and insert it into cache,
then it will serve 3 from cache,
then it will read 4 from disk and insert it into cache.
If exactly the first page is cached, a 128 KiB read (32 pages of 4 KiB)
turns into 31 sequential single-page I/O operations.
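To make the two cases concrete, here is a minimal model of the pre-patch behaviour. It is a sketch under simplifying assumptions, not the real `cached_file` code: pages are plain indices, the cache is a `std::set` of page indices, and each disk read is a closed page range. `read_op` and `old_strategy` are hypothetical names.

```cpp
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical model, not the real cached_file code: each disk read
// is a closed page range [first, last].
struct read_op {
    uint64_t first;
    uint64_t last;
};

// Models the pre-patch stream() over pages [first, last] and returns
// the disk reads it would issue, in order.
std::vector<read_op> old_strategy(uint64_t first, uint64_t last,
                                  std::set<uint64_t>& cache) {
    std::vector<read_op> ops;
    if (cache.count(first) == 0) {
        // First page uncached: read the whole range, even pages already cached.
        ops.push_back({first, last});
        for (uint64_t p = first; p <= last; ++p) {
            cache.insert(p);
        }
        return ops;
    }
    // First page cached: walk the range page by page, issuing one
    // sequential single-page read for each uncached page.
    for (uint64_t p = first; p <= last; ++p) {
        if (cache.count(p) == 0) {
            ops.push_back({p, p});
            cache.insert(p);
        }
    }
    return ops;
}
```

With pages 1 and 2 cached, `old_strategy(0, 4, cache)` returns the single read [0, 4]; with pages 0 and 3 cached it returns [1, 1], [2, 2], [4, 4]; with only page 0 cached, requesting pages [0, 31] yields 31 reads.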
This is weird, and doesn't look intended. In one case, we re-read pages
we already have, just to avoid fragmenting the read, and in the other case
we read pages one by one (sequentially!) even if they are neighbours.
I'm not sure whether cached_file should minimize IOPS or the number of
bytes read from disk, but the current state is surely suboptimal. Even if
its choice of which pages to read were somehow optimal, it should still at
least coalesce contiguous reads and perform the non-contiguous reads in parallel.
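A minimal sketch of the coalescing mentioned above, using the same hypothetical page model (`read_op` and `coalesced_missing_runs` are illustrative names, and this is not what the patch implements): each maximal run of uncached pages becomes one read. The runs are disjoint, so they could be issued concurrently, for example with Seastar's `parallel_for_each`; that part is not shown.

```cpp
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical model: a disk read covering the closed page range [first, last].
struct read_op {
    uint64_t first;
    uint64_t last;
};

// Returns one read per maximal run of uncached pages in [first, last].
// Cached pages are skipped, so no page is read twice.
std::vector<read_op> coalesced_missing_runs(uint64_t first, uint64_t last,
                                            const std::set<uint64_t>& cache) {
    std::vector<read_op> runs;
    for (uint64_t p = first; p <= last; ++p) {
        if (cache.count(p) != 0) {
            continue;  // served from cache
        }
        if (!runs.empty() && runs.back().last + 1 == p) {
            runs.back().last = p;    // extend the current run
        } else {
            runs.push_back({p, p});  // start a new run
        }
    }
    return runs;
}
```

For the "pages 0 and 3 cached" example this yields two reads, [1, 2] and [4, 4], instead of the three single-page reads of the pre-patch code.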
This patch leans into minimizing IOPS. After the patch, we serve
as many leading pages from the cache as we can, but as soon as we see
an uncached page, we read the entire remainder of the request from disk.
This is equivalent to trimming the longest cached prefix off the request
and handling the rest with the logic from before the patch.
For example, if exactly pages 0 and 3 are cached,
then we serve 0 from cache,
then we read [1, 4] from disk and insert everything into cache.
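The post-patch strategy can be modelled the same way. This is a sketch under the same assumptions (a `std::set` of page indices as the cache, closed page ranges, hypothetical names `read_op` and `new_strategy`), not the actual implementation: skip the longest cached prefix, then issue at most one read for everything after it.

```cpp
#include <cstdint>
#include <optional>
#include <set>

// Hypothetical model: a disk read covering the closed page range [first, last].
struct read_op {
    uint64_t first;
    uint64_t last;
};

// Models the post-patch stream() over pages [first, last]: the longest
// cached prefix is served from cache, the remainder is read in one op.
// Returns std::nullopt if the whole range is already cached.
std::optional<read_op> new_strategy(uint64_t first, uint64_t last,
                                    std::set<uint64_t>& cache) {
    uint64_t p = first;
    while (p <= last && cache.count(p) != 0) {
        ++p;  // skip the cached prefix
    }
    if (p > last) {
        return std::nullopt;  // fully served from cache
    }
    // Every page of the remainder is read and inserted, even if some
    // of them (e.g. page 3 in the example) were already cached.
    for (uint64_t q = p; q <= last; ++q) {
        cache.insert(q);
    }
    return read_op{p, last};
}
```

With pages 0 and 3 cached, `new_strategy(0, 4, cache)` returns the single read [1, 4], matching the example above; when the first page is uncached it returns [0, 4], the same as before the patch.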
For partially-cached files, this will result in more bytes read
from disk, but fewer IOPS. This might be a bad trade-off. But if so,
then we should lean the other way (toward fewer bytes read) more
explicitly and efficiently than the code before this patch did.
Closes scylladb/scylladb#20935