The direct motivation for this is to be able to use two index readers
from a single mutation reader, one for the lower bound of the range and
one for the upper bound, without sacrificing the optimization of
avoiding index reads when forwarding to partition ranges which are
close by. After the change, all index readers of a given sstable will
share index buffers, so the lower bound reader can reuse the page read
by the upper bound reader.
The reason for using two readers is to be able to skip inside the
partition range, not only outside of it. This is not possible if we
use the same index reader to locate the upper bound of the range,
because we may only advance the cursor.
The index reader can already be queried only with monotonic positions,
so the concept of a cursor is ingrained. Making it explicit will make it
easier to define behavior for forwarding within the partition.
After the change:
- lower_bound() is renamed to advance_to() and doesn't return
the position, only advances the cursor
- data file position for partition under cursor can be obtained
at any time with data_file_position()
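
The cursor semantics described above could be sketched roughly as
follows. This is only an illustration, not the real index_reader: the
names entry, index_cursor and data_range are hypothetical, keys are
compared as plain strings, and the index is an in-memory vector rather
than pages read from disk. Only advance_to() and data_file_position()
are names taken from this change.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified index entry: a partition key and the offset of
// that partition in the Data file.
struct entry {
    std::string key;
    uint64_t data_offset;
};

// Illustrative cursor over entries sorted by key (assumption: plain string
// ordering). The caller must keep `entries` alive while the cursor is used.
class index_cursor {
    const std::vector<entry>& _entries;
    size_t _pos = 0;
    uint64_t _data_file_size;
public:
    index_cursor(const std::vector<entry>& entries, uint64_t data_file_size)
        : _entries(entries), _data_file_size(data_file_size) {}

    // Moves the cursor to the first entry whose key is >= `key`.
    // Returns nothing, and never moves the cursor backwards.
    void advance_to(const std::string& key) {
        while (_pos < _entries.size() && _entries[_pos].key < key) {
            ++_pos;
        }
    }

    // Data file position of the partition under the cursor, or the end of
    // the Data file when the cursor is past the last entry.
    uint64_t data_file_position() const {
        return _pos < _entries.size() ? _entries[_pos].data_offset
                                      : _data_file_size;
    }
};

// Two independent cursors locate the Data file range for keys in
// [start, end); the lower one can later be advanced inside the range
// without disturbing the upper one.
inline std::pair<uint64_t, uint64_t> data_range(
        const std::vector<entry>& entries, uint64_t data_file_size,
        const std::string& start, const std::string& end) {
    index_cursor lower(entries, data_file_size);
    index_cursor upper(entries, data_file_size);
    lower.advance_to(start);
    upper.advance_to(end);
    return {lower.data_file_position(), upper.data_file_position()};
}
```

In this sketch, a call to advance_to() with a key that sorts before the
current position is simply a no-op, which is one possible way to define
the monotonic contract.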
If the sstable Summary is not present, Scylla does not refuse to boot
but instead creates summary information on the fly. There is a bug in
this code, though. A Summary file is a map between keys and offsets into
the Index file, but the code creates a map between keys and Data file
offsets instead. Fix it by keeping the offset of an index entry in the
index_entry structure and using it during Summary file creation.
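
To illustrate the distinction between the two offsets, here is a
minimal sketch. The struct layouts, the build_summary() helper and the
fixed sampling interval are assumptions made for the example; they do
not reproduce the actual Scylla structures or sampling policy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical index entry carrying both offsets.
struct index_entry {
    std::string key;
    uint64_t data_offset;   // where the partition starts in the Data file
    uint64_t index_offset;  // where this entry itself starts in the Index file
};

// Hypothetical Summary entry: a sampled key and an offset into the Index file.
struct summary_entry {
    std::string key;
    uint64_t index_offset;
};

// Samples every `sampling_interval`-th index entry into the summary.
// The offset recorded is the Index file offset, not the Data file offset.
inline std::vector<summary_entry> build_summary(
        const std::vector<index_entry>& index, size_t sampling_interval) {
    std::vector<summary_entry> summary;
    if (sampling_interval == 0) {
        return summary; // nothing sensible to sample
    }
    for (size_t i = 0; i < index.size(); i += sampling_interval) {
        summary.push_back({index[i].key, index[i].index_offset});
    }
    return summary;
}
```

The bug described above would correspond to pushing
index[i].data_offset in the loop instead.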
Reviewed-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20161116165421.GA22296@scylladb.com>
index_reader is a helper that implements index lookups. Its goal is to
avoid dropping read buffers if they still may be needed (for example to
get end bound of the range or after fast forwarding the reader).
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
This is done so we can use other consumers. An example of that is
regeneration of the Summary from an existing Index.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
The one thing that still shows up pretty high in the read_indexes flamegraph
is allocations.
We can, however, do better. Since most of the index is the keys anyway - and we need
all of them, the amount of memory we use by copying the buffers over is about the same
as the space we would use by just keeping the buffers around.
So we can change index_entry to just keep the shared_buffers, and since we always access
it through views anyway, that is perfectly fine. The index_entry destructor will then
release() the temporary_buffer, instead of doing this after the buffer copy.
This gives us a nice additional 4 %.
perf_sstable_g --smp 1 --iterations 30 --parallelism 1 --mode index_read
Before:
839484.65 +- 585.52 partitions / sec (30 runs, 1 concurrent ops)
After:
873323.18 +- 442.52 partitions / sec (30 runs, 1 concurrent ops)
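
The idea of keeping the buffer and handing out views could be sketched
as below. This is a simplified stand-in, not the actual change: a
std::shared_ptr<const std::string> plays the role of the shared
temporary_buffer, and the buffered_entry class and its offsets are
hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical entry that shares ownership of the page it was parsed from
// instead of copying its key out of it.
class buffered_entry {
    std::shared_ptr<const std::string> _buf; // keeps the page bytes alive
    size_t _key_off;
    size_t _key_len;
public:
    buffered_entry(std::shared_ptr<const std::string> buf,
                   size_t key_off, size_t key_len)
        : _buf(std::move(buf)), _key_off(key_off), _key_len(key_len) {}

    // Returns a view into the shared buffer; no allocation, no copy.
    // The view is valid for as long as this entry is alive.
    std::string_view key() const {
        return std::string_view(*_buf).substr(_key_off, _key_len);
    }
};
```

Several entries parsed from the same page then point into one
allocation, and the page is freed only when the last entry referring to
it is destroyed.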
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Now that we are using the NSM, and not the general parser for the index, there
is no reason to keep using disk_string<>s in it. Since they are standing in the
way of further optimizations, let's get rid of them and use bytes directly.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Since the child is a base class, we don't need to pass a reference: we can
just cast our 'this' pointer.
By doing that, the move constructor can come back.
Welcome back, move constructor.
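
A rough sketch of the pattern, assuming the base class is templated on
the child (CRTP). The names consumer_base and counting_consumer are
hypothetical and stand in for the real classes. Because the base reaches
the child through a cast of 'this' rather than a stored reference,
nothing dangles after a move and a defaulted move constructor is
sufficient.

```cpp
#include <cassert>
#include <utility>

// Base class templated on the child. It needs no stored reference to the
// child: it casts its own 'this' when it has to call into it.
template <typename Derived>
class consumer_base {
public:
    int consume(int input) {
        return static_cast<Derived&>(*this).process(input);
    }
};

// Hypothetical child. With no reference member pointing back at itself,
// the defaulted move operations are safe.
class counting_consumer : public consumer_base<counting_consumer> {
    int _total = 0;
public:
    counting_consumer() = default;
    counting_consumer(counting_consumer&&) = default;
    counting_consumer& operator=(counting_consumer&&) = default;

    // Accumulates the input and returns the running total.
    int process(int input) {
        _total += input;
        return _total;
    }
};
```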
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>