scylladb/docs/dev/row_cache.md
copilot-swe-agent[bot] 481d5ae2e5 Replace LRU with W-TinyLFU cache eviction policy
Add Count-Min Sketch for frequency estimation and replace the single LRU
list with a W-TinyLFU policy using window, probation, and protected segments.
Update cache_tracker::touch() to use the new touch() method that handles
segment-aware promotion.

Co-authored-by: dorlaor <1735237+dorlaor@users.noreply.github.com>
2026-03-13 21:09:52 +00:00

Row Cache

Introduction

This document assumes familiarity with the mutation model and MVCC.

The cache is always paired with the underlying mutation source that it mirrors, so from the outside it appears to contain the same set of writes. Internally, it keeps a subset of the data in memory, together with information about which parts are missing. Elements which are fully represented are called "complete". Complete ranges of elements are called "continuous".

Eviction

Eviction is about removing parts of the data from memory and recording the fact that information about those parts is missing. Eviction doesn't change the set of writes represented by cache as part of its mutation_source interface.
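The idea of "removing data while recording that it is missing" can be illustrated with a toy model. The sketch below is not the ScyllaDB data structure: `cache_model`, `entry`, `evict` and `is_continuous` are hypothetical names, and it assumes that a per-entry `continuous` flag describes the gap between an entry and its predecessor.

```cpp
#include <cassert>
#include <map>

// Hypothetical toy model, not the real row_cache types.
// entry.continuous == true means: nothing is missing between the
// previous cached key and this key.
struct entry {
    bool continuous;
};
using cache_model = std::map<int, entry>;

// Removes a key from memory and records that information is now
// missing by marking the successor as discontinuous.
void evict(cache_model& cache, int key) {
    auto it = cache.find(key);
    if (it == cache.end()) {
        return;
    }
    it = cache.erase(it);            // returns the successor
    if (it != cache.end()) {
        it->second.continuous = false;
    }
}

// True if [lo, hi] can be answered from memory alone: both bounds are
// cached and no gap was recorded between them.
bool is_continuous(const cache_model& cache, int lo, int hi) {
    assert(lo <= hi);
    auto first = cache.find(lo);
    auto last = cache.find(hi);
    if (first == cache.end() || last == cache.end()) {
        return false;
    }
    for (auto it = first; it != last;) {
        ++it;
        if (!it->second.continuous) {
            return false;
        }
    }
    return true;
}
```

In this model a reader that finds a discontinuous range would have to fall back to the underlying source, which is why eviction leaves the externally visible set of writes unchanged.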

The smallest object which can be evicted, called the eviction unit, is currently a single row (rows_entry). Eviction units are managed by a W-TinyLFU policy owned by a cache_tracker, and this policy determines the eviction order. The policy is shared among many tables; currently, there is one per database.

W-TinyLFU Eviction Policy

The cache uses a W-TinyLFU (Window Tiny Least Frequently Used) eviction policy, which combines recency and frequency information for better hit rates than plain LRU.

The policy organizes entries into three segments:

  • Window (~1% of cache): A small LRU that admits all new entries. This allows new entries to build up frequency information before competing for main cache space.
  • Probation (~19% of cache): Part of the main SLRU cache. Entries from the window compete with probation victims for admission using a TinyLFU frequency filter.
  • Protected (~80% of cache): The other part of the main SLRU cache. Entries are promoted here from probation when accessed again.
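The movement of entries between the three segments can be sketched as follows. This is a minimal illustration, not the cache_tracker code: `segmented_lru`, `spill_window` and `where` are hypothetical names, keys are plain integers, and the frequency filter that governs admission is left out so that only placement and promotion are shown.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <list>
#include <unordered_map>

enum class segment { window, probation, protected_ };

// Hypothetical sketch of segment-aware placement and promotion.
class segmented_lru {
    using list_t = std::list<std::uint64_t>;   // front = most recent, back = victim
    struct slot {
        segment seg;
        list_t::iterator pos;
    };
    list_t _window, _probation, _protected;
    std::unordered_map<std::uint64_t, slot> _index;
    std::size_t _protected_cap;

    list_t& list_of(segment s) {
        switch (s) {
        case segment::window: return _window;
        case segment::probation: return _probation;
        default: return _protected;
        }
    }
    // Moves an entry to the front of dst; splice keeps the iterator valid.
    void move_to(slot& s, segment dst) {
        list_t& to = list_of(dst);
        to.splice(to.begin(), list_of(s.seg), s.pos);
        s.seg = dst;
    }
public:
    explicit segmented_lru(std::size_t protected_cap) : _protected_cap(protected_cap) {}

    // New entries are always admitted to the window.
    void insert(std::uint64_t key) {
        assert(_index.count(key) == 0);
        _window.push_front(key);
        _index.emplace(key, slot{segment::window, _window.begin()});
    }
    // Simplification: moves the window's victim into probation unconditionally.
    void spill_window() {
        if (!_window.empty()) {
            move_to(_index.at(_window.back()), segment::probation);
        }
    }
    // A hit in probation promotes to protected; other hits refresh recency.
    void touch(std::uint64_t key) {
        auto it = _index.find(key);
        if (it == _index.end()) {
            return;
        }
        slot& s = it->second;
        if (s.seg != segment::probation) {
            move_to(s, s.seg);
            return;
        }
        move_to(s, segment::protected_);
        if (_protected.size() > _protected_cap) {
            // Demote the least recently used protected entry.
            move_to(_index.at(_protected.back()), segment::probation);
        }
    }
    segment where(std::uint64_t key) const { return _index.at(key).seg; }
};
```

Keeping an iterator per entry makes every move an O(1) splice, which is one reason linked lists are a common choice for this kind of policy.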

The TinyLFU frequency filter uses a Count-Min Sketch to compactly estimate access frequency. When eviction is needed, the window victim competes with the probation victim: the entry with higher estimated frequency survives in probation while the other is evicted. The sketch is periodically aged (all counts halved) to adapt to changing access patterns.
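A Count-Min Sketch with aging and the resulting admission check might look roughly like this. The sketch is illustrative only: `count_min_sketch`, `admit`, the four hash rows, the 8-bit saturating counters, the hash mixer and the `sample_size` aging trigger are all assumptions, not details taken from the ScyllaDB implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical Count-Min Sketch for frequency estimation.
class count_min_sketch {
    static constexpr std::size_t depth = 4;   // number of hash rows (assumed)
    std::size_t _width;                       // counters per row
    std::vector<std::uint8_t> _counters;      // depth * _width saturating counters
    std::size_t _additions = 0;
    std::size_t _sample_size;                 // age after this many increments

    // Mixes the key with a per-row seed (splitmix64-style finalizer).
    static std::uint64_t mix(std::uint64_t x, std::uint64_t seed) {
        x += (seed + 1) * 0x9e3779b97f4a7c15ULL;
        x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
        x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
        return x ^ (x >> 31);
    }
    std::size_t index(std::uint64_t key, std::size_t row) const {
        return row * _width + static_cast<std::size_t>(mix(key, row) % _width);
    }
public:
    count_min_sketch(std::size_t width, std::size_t sample_size)
        : _width(width), _counters(depth * width, 0), _sample_size(sample_size) {
        assert(width > 0 && sample_size > 0);
    }
    // Records one access; triggers aging once enough accesses were seen.
    void increment(std::uint64_t key) {
        for (std::size_t row = 0; row < depth; ++row) {
            std::uint8_t& c = _counters[index(key, row)];
            if (c < 255) {
                ++c;
            }
        }
        if (++_additions >= _sample_size) {
            age();
        }
    }
    // The minimum over all rows; collisions can only inflate it.
    std::uint32_t estimate(std::uint64_t key) const {
        std::uint8_t m = 255;
        for (std::size_t row = 0; row < depth; ++row) {
            m = std::min(m, _counters[index(key, row)]);
        }
        return m;
    }
    // Halves every counter so that old popularity fades.
    void age() {
        for (std::uint8_t& c : _counters) {
            c >>= 1;
        }
        _additions = 0;
    }
};

// The window victim replaces the probation victim only if it is
// estimated to be accessed more often.
inline bool admit(const count_min_sketch& s, std::uint64_t window_victim,
                  std::uint64_t probation_victim) {
    return s.estimate(window_victim) > s.estimate(probation_victim);
}
```

Resolving ties in favour of the probation victim is a design choice made here for simplicity; it means a new entry has to show strictly higher frequency before it displaces an established one.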

All rows_entry objects which are owned by a cache_tracker are assumed to be either contained in a cache (in some row_cache::partitions_type) or be owned by a (detached) partition_snapshot. When the last row from a partition_entry is evicted, the containing cache_entry is evicted from the cache.
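The cascade from row eviction to entry eviction can be pictured with a small model. The names `toy_cache` and `evict_row` are hypothetical, and the model ignores versions, snapshots and dummy entries; it only shows that removing the last row also removes the containing entry.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical model: partition key -> positions of the rows kept in memory.
using toy_cache = std::map<std::string, std::set<int>>;

// Evicts one row. Returns true if this was the last row of the
// partition, in which case the partition's entry is dropped as well.
bool evict_row(toy_cache& cache, const std::string& pk, int row) {
    auto p = cache.find(pk);
    if (p == cache.end() || p->second.erase(row) == 0) {
        return false;   // nothing to evict
    }
    if (p->second.empty()) {
        cache.erase(p);
        return true;
    }
    return false;
}
```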

We never evict individual partition_version objects independently of the containing partition_entry. When the latest version becomes fully evicted, we evict the whole partition_entry together with all unreferenced versions. Snapshots become detached. partition_snapshots go away on their own, but eviction can make them contain no rows. Snapshots can undergo eviction even after they have been detached from their original partition_entry.

The static row is not evictable; it goes away together with the partition. Partition reads which only read from the static row keep it alive by touching the last dummy rows_entry.

Every partition_version has a dummy entry after all rows (position_in_partition::after_all_clustering_rows()) so that the partition can be tracked by the eviction policy even if it doesn't have any rows, and so that it can be marked as fully discontinuous when all of its rows get evicted.

rows_entry objects in memtables are not owned by a cache_tracker, so they are not evictable. Data referenced by partition_snapshots created on non-evictable partition entries is not transferred to cache, so unevictable snapshots are not made evictable.