All the SSTable read path can now take an io_priority. The public functions will
take a default parameter which is Seastar's default priority.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
We have an API that wraps open_file_dma which we use in some places, but in
many other places we call the reactor version directly.
This patch changes the latter to match the former. It will have the added benefit
of allowing us to make easier changes to these interfaces if needed.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <29296e4ec6f5e84361992028fe3f27adc569f139.1451950408.git.glauber@scylladb.com>
Since bytes is a very generic value that is returned from many calls,
it is easy to pass it by mistake to a function expecting a data_value,
and to get a wrong result. It is impossible for the data_value constructor
to know if the argument is a genuine bytes variable, a data_value of another
type, but serialized, or some other serialized data type.
To prevent misuse, make the data_value(bytes) constructor
(and complementary data_value(optional<bytes>) explicit.
We use boost::any to convert to and from database values (stored in
serlialized form) and native C++ values. boost::any captures information
about the data type (how to copy/move/delete etc.) and stores it inside
the boost::any instance. We later retrieve the real value using
boost::any_cast.
However, data_value (which has a boost::any member) already has type
information as a data_type instance. By teaching data_type intances about
the corresponding native type, we can elimiante the use of boost::any.
While boost::any is evil and eliminating it improves efficiency somewhat,
the real goal is growing native type support in data_type. We will use that
later to store native types in the cache, enabling O(log n) access to
collections, O(1) access to tuples, and more efficient large blob support.
The current codes assumes a particular dir/generation pair. We
will use it for a more generic case. This code could really use some
clean up, by the way. We should do it later.
Signed-off-by: Glauber Costa <glommer@scylladb.com>
There is no reason aside from testing for a table to just change its generation
number.
There will be, however, when we support loading new sstables. The method
however needs to be completely rewritten, so let's make sure the tests are not
using that.
Signed-off-by: Glauber Costa <glommer@scylladb.com>
Now that we are using the NSM, and not the general parser for the index, there
is no reason to keep using disk_string<>s in it. Since it is staying in the way
of further optimizations, let's get rid of it and use bytes directly.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
read_indexes was one of the first functions coded in the sstable read path. At
the time, I made the (now so obviously) wrong decision to code it generic
enough so that we could specify the number of items to be read, instead of an
upper bound in the file.
The main reason for that, was that without the Summary, we have no way to know
where to stop reading, and the Summary is a relatively new addition to the C*
codebase: while I didn't really check when it got in, the code is full of tests
for its presence.
That turned out to be totally useless: we always read the indexes with the help
of the Summary. While the Summary is a relatively new addition to C*, it is
present in all version we aim to support. Meaning that reads without the
Summary will never happen in our codebase.
Even if, in the future, we happen to ditch the Summary file, we are very likely
to do so in favor of some other structure that also allows us to manipulate precise
borders in the Index.
The code as it is, however, would not be too big of a problem if that wasn't
causing us performance problems. But it is, and the majority of it is caused by
the fact that our underlying read_indexes do not know in advance how many bytes
to read, forcing us to do an element-per-element read.
It's time for a change.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
When a schema is available, we use it. However, we have, by now, way too many
tests. Some of them use tables for which we don't even know the schema. It would
have been a massive amount of work to require a schema for all of them - so I am
keeping both constructors around.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>