mirror of
https://github.com/scylladb/scylladb.git
synced 2026-05-30 03:30:49 +00:00
When murmur3_partitioner_ignore_msb_bits = 12 (which we'd like to be the default), a scan range can be split into a large number of subranges, each going to a separate shard. With the current implementation, subranges were queried sequentially, resulting in very long latency when the table was empty or nearly empty. Switch to an exponential retry mechanism, where the number of subranges queried doubles each time, dropping the latency from O(number of subranges) to O(log(number of subranges)). If, during an iteration of a retry, we read at most one range from each shard, then partial results are merged by concatentation. This optimizes for the dense(r) case, where few partial results are required. If, during an iteration of a retry, we need more than one range per shard, then we collapse all of a shard's ranges into just one range, and merge partial results by sorting decorated keys. This reduces the number of sstable read creations we need to make, and optimizes for the sparse table case, where we need many partial results, most of which are empty. We don't merge subranges that come from different partition ranges, because those need to be sorted in request order, not decorated key order. [tgrabiec: trivial conflicts] Message-Id: <20161220170532.25173-1-avi@scylladb.com>