mirror of
https://github.com/scylladb/scylladb.git
synced 2026-05-24 00:32:15 +00:00
If the flush rate exceeds the compaction rate, we may end up with an arbitrarily large number of sstables on disk. A read executed in that state needs memory proportional to the number of sstables on the given shard, which in extreme cases can lead to OOM.

In the wild, this was observed in 2 scenarios:

- A node with >10 shards creates a keyspace with thousands of tables, drops the keyspace and shuts down before compaction finishes. Dropping the keyspace drops its tables, and each dropped table causes smp::count writes to the system.local table, with a flush after each write, which creates tens of thousands of sstables. The bootstrap read from system.local then runs out of memory.
- A failure to agree on table schema (due to a code bug) between nodes during repair resulted in excessive flushing of small sstables, which compaction couldn't keep up with.

The unit test introduced in this patch series shows that even hard-setting maximum shares for compaction and minimum shares for flushing doesn't tilt the balance towards compaction enough to prevent the problem.

Since this is a fast producer, slow consumer problem, the remaining solution is to block the producer until the consumer catches up. If there are too many table runs originating from memtable, we block the current flush until the number of sstables is reduced (via ongoing compaction or a truncate operation).
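
The blocking idea can be sketched with standard C++ threads. This is only an illustration, not the code in this patch series: the real implementation runs on Seastar and its details are not shown here. The names flush_gate, begin_flush, runs_removed, simulate and the max_runs threshold are all hypothetical.

```cpp
#include <algorithm>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>

// Hypothetical gate (not ScyllaDB's implementation): counts sstable runs
// that came from memtable flushes and have not been compacted away yet.
class flush_gate {
    std::mutex _m;
    std::condition_variable _cv;
    std::size_t _runs = 0;
    const std::size_t _max_runs; // hypothetical threshold
public:
    explicit flush_gate(std::size_t max_runs) : _max_runs(max_runs) {}

    // Producer side: call before a flush creates a new run. Blocks while
    // the backlog is at the limit; returns the backlog after this flush.
    std::size_t begin_flush() {
        std::unique_lock<std::mutex> lk(_m);
        _cv.wait(lk, [this] { return _runs < _max_runs; });
        return ++_runs;
    }

    // Consumer side: compaction (or a truncate) removed n runs.
    void runs_removed(std::size_t n) {
        {
            std::lock_guard<std::mutex> lk(_m);
            _runs -= std::min(n, _runs);
        }
        _cv.notify_all(); // wake any flush blocked in begin_flush()
    }

    std::size_t runs() {
        std::lock_guard<std::mutex> lk(_m);
        return _runs;
    }
};

// Fast producer, slow consumer: performs `flushes` flushes against one
// compactor thread and returns the highest backlog the producer observed.
std::size_t simulate(std::size_t flushes, std::size_t max_runs) {
    flush_gate gate(max_runs);
    std::thread compactor([&] {
        std::size_t removed = 0;
        while (removed < flushes) {
            if (gate.runs() > 0) {
                gate.runs_removed(1); // one run compacted away
                ++removed;
            } else {
                std::this_thread::yield(); // nothing to compact yet
            }
        }
    });
    std::size_t peak = 0;
    for (std::size_t i = 0; i < flushes; ++i) {
        peak = std::max(peak, gate.begin_flush());
    }
    compactor.join();
    return peak;
}
```

With this gate the backlog seen by the producer never exceeds max_runs, regardless of how much faster flushing is than compaction; the cost is that flushes stall while the limit is reached.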