scylladb/replica
Mikołaj Sielużycki 4cd42f97d0 table: Prevent creating unbounded number of sstables
If we reach a situation where the flush rate exceeds the compaction
rate, we may end up with an arbitrarily large number of sstables on
disk. If a read is executed in such a case, the amount of memory it
requires is proportional to the number of sstables for the given shard,
which in extreme cases can lead to OOM.

In the wild, this was observed in two scenarios:
- A node with >10 shards creates a keyspace with thousands of tables,
  drops the keyspace and shuts down before compaction finishes. Dropping
  a keyspace drops its tables, and each dropped table causes smp::count
  writes to the system.local table, each followed by a flush, which
  creates tens of thousands of sstables. The bootstrap read from
  system.local then runs out of memory.
- A failure to agree on table schema (due to a code bug) between nodes
  during repair resulted in excessive flushing of small sstables which
  compaction couldn't keep up with.

The unit test introduced in this patch series shows that even
hard-coding maximum shares for compaction and minimum shares for
flushing doesn't tilt the balance towards compaction enough to prevent
the problem. Since it's a fast-producer, slow-consumer problem, the
remaining solution is to block the producer until the consumer catches
up. If there are too many sstable runs originating from memtable, we
block the current flush until the number of sstables is reduced (via
ongoing compaction or a truncate operation).
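
The blocking idea described above can be sketched roughly as follows.
This is a minimal illustration, not the code in this patch: it uses
standard-library threads rather than Seastar primitives, and the names
flush_gate, begin_flush, try_begin_flush, on_runs_removed and the
max_runs threshold are all hypothetical.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>

// Hypothetical sketch of "block the producer until the consumer catches up".
// Not ScyllaDB's implementation; names and the threshold are assumptions.
class flush_gate {
    std::mutex _mtx;
    std::condition_variable _cv;
    std::size_t _runs = 0;        // sstable runs that originated from memtable flushes
    const std::size_t _max_runs;  // assumed limit on such runs
public:
    explicit flush_gate(std::size_t max_runs) : _max_runs(max_runs) {}

    // Producer side: called before a flush writes a new sstable run.
    // Blocks while the number of outstanding runs is at the limit.
    void begin_flush() {
        std::unique_lock<std::mutex> lk(_mtx);
        _cv.wait(lk, [this] { return _runs < _max_runs; });
        ++_runs;
    }

    // Non-blocking variant: returns false instead of waiting.
    bool try_begin_flush() {
        std::lock_guard<std::mutex> lk(_mtx);
        if (_runs >= _max_runs) {
            return false;
        }
        ++_runs;
        return true;
    }

    // Consumer side: called when compaction or truncate removes n runs.
    // Wakes any flushes waiting in begin_flush().
    void on_runs_removed(std::size_t n) {
        {
            std::lock_guard<std::mutex> lk(_mtx);
            assert(n <= _runs);
            _runs -= n;
        }
        _cv.notify_all();
    }

    std::size_t runs() {
        std::lock_guard<std::mutex> lk(_mtx);
        return _runs;
    }
};
```

In this sketch a flusher would call begin_flush() before writing, and
the compaction or truncate path would call on_runs_removed() once the
corresponding sstables are gone, which releases any blocked flush.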
2022-06-15 10:57:28 +02:00