mirror of
https://github.com/scylladb/scylladb.git
synced 2026-04-25 19:10:42 +00:00
"When reading a single row, the read can often be satisfied by just one of the data source candidates. To exploit this, an optimization sorts the data source candidates by timestamp and reads mutations from the most recent source to the oldest. Once all needed cells are present and their earliest timestamp is still later than the latest timestamp of the remaining data sources, the read can be terminated early.

However, this optimization can also backfire: the data sources are read sequentially, so if all of them have to be read eventually, we end up worse off than without it. Thus the optimization can be disabled up front, or enabled to run only until its efficiency degrades below a certain threshold. Counters are also added to column-families so that its performance can be observed.

Benchmarking

Benchmarking was done with the cache disabled and at a constant op rate of 4k (1/3 of the max op rate on my box), against 3 sstables containing the same 10000 rows.
1) Optimization turned off (all sstables read in parallel)
latency mean              : 1.3 [simple:1.3]
latency median            : 1.0 [simple:1.0]
latency 95th percentile   : 2.4 [simple:2.4]
latency 99th percentile   : 2.9 [simple:2.9]
latency 99.9th percentile : 8.0 [simple:8.0]
latency max               : 13.5 [simple:13.5]

2) Optimization turned on, best case (1 of 3 sstables read)
latency mean              : 0.6 [simple:0.6]
latency median            : 0.6 [simple:0.6]
latency 95th percentile   : 1.0 [simple:1.0]
latency 99th percentile   : 1.2 [simple:1.2]
latency 99.9th percentile : 4.4 [simple:4.4]
latency max               : 13.4 [simple:13.4]

3) Optimization turned on, best case, IN query (1 of 3 sstables read)
latency mean              : 0.7 [simple_in:0.7]
latency median            : 0.6 [simple_in:0.6]
latency 95th percentile   : 1.1 [simple_in:1.1]
latency 99th percentile   : 1.4 [simple_in:1.4]
latency 99.9th percentile : 5.4 [simple_in:5.4]
latency max               : 16.8 [simple_in:16.8]

4) Optimization turned on, worst case (3 of 3 sstables read sequentially)
latency mean              : 2.8 [simple:2.8]
latency median            : 2.3 [simple:2.3]
latency 95th percentile   : 5.4 [simple:5.4]
latency 99th percentile   : 6.5 [simple:6.5]
latency 99.9th percentile : 13.5 [simple:13.5]
latency max               : 19.2 [simple:19.2]

5) Optimization turned on, mid case (2 of 3 sstables read sequentially)
latency mean              : 1.4 [simple:1.4]
latency median            : 1.1 [simple:1.1]
latency 95th percentile   : 2.7 [simple:2.7]
latency 99th percentile   : 3.2 [simple:3.2]
latency 99.9th percentile : 7.7 [simple:7.7]
latency max               : 15.1 [simple:15.1]"

Ref #324

* 'bdenes/optimize_single_row_read_v6' of github.com:denesb/scylla:
  Add unit tests for single_key_sstable_reader
  Add counters for the single-key reader optimization
  Add single_key_parallel_scan_threshold option
  single_key_sstable_reader: optimize single-row queries
  single_key_sstable_reader: move reading code into it's own method
  Add selects_only_full_rows() and selects_only_full_rows_with_atomic_columns()
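
The early-termination rule described in the message above can be illustrated with a small standalone sketch. It is not the code from this series: `data_source`, `cell`, `read_result` and `read_single_row` are hypothetical names invented for the example, and it assumes C++17, whole atomic cells only, and no tombstones. The sketch sorts the sources from newest to oldest by their maximum timestamp, merges them one at a time, and stops once every wanted cell is present with a timestamp later than anything the next (and therefore every remaining) source can contain.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration only; these are not ScyllaDB types.
using timestamp_t = std::int64_t;

struct cell {
    std::string value;
    timestamp_t ts;
};

struct data_source {
    timestamp_t max_timestamp;        // newest write this source can contain
    std::map<std::string, cell> row;  // cells it holds for the queried key
};

struct read_result {
    std::map<std::string, cell> cells;
    std::size_t sources_read = 0;
};

// Reads sources sequentially, newest first, and stops as soon as no
// remaining source could override any of the wanted cells.
read_result read_single_row(std::vector<data_source> sources,
                            const std::vector<std::string>& wanted) {
    assert(!wanted.empty());  // an empty selection would make the check vacuous

    std::sort(sources.begin(), sources.end(),
              [](const data_source& a, const data_source& b) {
                  return a.max_timestamp > b.max_timestamp;
              });

    read_result res;
    for (std::size_t i = 0; i < sources.size(); ++i) {
        // Merge this source: the cell with the higher timestamp wins.
        for (const auto& [name, c] : sources[i].row) {
            auto it = res.cells.find(name);
            if (it == res.cells.end() || it->second.ts < c.ts) {
                res.cells.insert_or_assign(name, c);
            }
        }
        ++res.sources_read;
        if (i + 1 == sources.size()) {
            break;  // nothing left to read
        }

        // Find the earliest timestamp among the wanted cells, if all exist.
        bool complete = true;
        timestamp_t earliest = std::numeric_limits<timestamp_t>::max();
        for (const auto& name : wanted) {
            auto it = res.cells.find(name);
            if (it == res.cells.end()) {
                complete = false;
                break;
            }
            earliest = std::min(earliest, it->second.ts);
        }

        // Sources are sorted, so the next one bounds all remaining ones.
        if (complete && earliest > sources[i + 1].max_timestamp) {
            break;
        }
    }
    return res;
}
```

In this sketch `sources_read` plays the role of an efficiency counter: a value of 1 corresponds to the best case in the benchmark, while a value equal to the number of sources corresponds to the worst case, where every source was read sequentially and nothing was saved.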