mirror of
https://github.com/scylladb/scylladb.git
synced 2026-04-29 12:47:02 +00:00
When off-strategy is disabled, data segregation is not postponed, meaning that getting partition estimate right is important to decrease filter's false positives. With streaming, we don't have min and max timestamps at destination, well, we could have extended the RPC verb to send them, but turns out we can deduce easily the amount of windows using default TTL. Given partitioner random nature, it's not absurd to assume that a given range being streamed may overlap with all windows, meaning that each range will yield one sstable for each window when segregating incoming data. Today, we assume the worst of 100 windows (which is the max amount of sstables the input data can be segregated into) due to the lack of metadata for estimating the window count. But given that users are recommended to target a max of ~20 windows, it means partition estimate is being downsized 5x more than needed. Let's improve it by using default TTL when estimating window count, so even on absence of timestamp metadata, the partition estimation won't be way off. Fixes #15704. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>