Files
scylladb/tests
Duarte Nunes 5e9cd128ad Merge 'Single key sstable reader optimization' from Botond
"When reading a single row it is possible that the read will be satisfied
by just reading from one of the data source candidates. To exploit this
an optimization is employed which sorts data source candidates by their
timestamp and reads mutations from the most recent to the oldest. When
all needed cells are present and their earliest timestamp is still
later than the latest one of the remaining data source the read can be
terminated early.
However this optimization also has the possibility to backfire as the
data sources are read sequentially, so if all of them has to be read
eventually then we will end up worse then without it.
Thus the optimization can be disabled up-front or enabled to only run
until its efficiency degrades below a certain threshold.
Also counters are added to column-families to make it possible to
observe how well it performs.

Benchmarking

Benchmarking was done with disabled cache and at a constant op rate of
4k (1/3 of the max op rate on my box), against 3 sstables containing the
same 10000 rows.

1) Optimization turned off (all sstables read paralelly)
latency mean              : 1.3 [simple:1.3]
latency median            : 1.0 [simple:1.0]
latency 95th percentile   : 2.4 [simple:2.4]
latency 99th percentile   : 2.9 [simple:2.9]
latency 99.9th percentile : 8.0 [simple:8.0]
latency max               : 13.5 [simple:13.5]

2) Optimization turned on, best case (1 of 3 sstables read)
latency mean              : 0.6 [simple:0.6]
latency median            : 0.6 [simple:0.6]
latency 95th percentile   : 1.0 [simple:1.0]
latency 99th percentile   : 1.2 [simple:1.2]
latency 99.9th percentile : 4.4 [simple:4.4]
latency max               : 13.4 [simple:13.4]

3) Optimization turned on, best case, IN query (1 of 3 sstables read)
latency mean              : 0.7 [simple_in:0.7]
latency median            : 0.6 [simple_in:0.6]
latency 95th percentile   : 1.1 [simple_in:1.1]
latency 99th percentile   : 1.4 [simple_in:1.4]
latency 99.9th percentile : 5.4 [simple_in:5.4]
latency max               : 16.8 [simple_in:16.8]

4) Optimization turned on, worst case (3 of 3 sstables read sequentally)
latency mean              : 2.8 [simple:2.8]
latency median            : 2.3 [simple:2.3]
latency 95th percentile   : 5.4 [simple:5.4]
latency 99th percentile   : 6.5 [simple:6.5]
latency 99.9th percentile : 13.5 [simple:13.5]
latency max               : 19.2 [simple:19.2]

5) Optimization turned on, mid case (2 of 3 sstables read sequentally)
latency mean              : 1.4 [simple:1.4]
latency median            : 1.1 [simple:1.1]
latency 95th percentile   : 2.7 [simple:2.7]
latency 99th percentile   : 3.2 [simple:3.2]
latency 99.9th percentile : 7.7 [simple:7.7]
latency max               : 15.1 [simple:15.1]"

Ref #324

* 'bdenes/optimize_single_row_read_v6' of github.com:denesb/scylla:
  Add unit tests for single_key_sstable_reader
  Add counters for the single-key reader optimization
  Add single_key_parallel_scan_threshold option
  single_key_sstable_reader: optimize single-row queries
  single_key_sstable_reader: move reading code into it's own method
  Add selects_only_full_rows() and selects_only_full_rows_with_atomic_columns()
2017-10-18 16:38:53 +01:00
..
2017-09-29 15:17:53 +02:00
2017-07-16 11:55:08 +02:00
2017-08-10 15:01:10 -04:00