Before this patch, reading large ranges from a compressed data file involved
two inefficiencies:
1. The compressed data file was read one compressed chunk at a time.
Such a chunk is around 30 KB in size, well below our desired sstable
read-ahead size (sstable_buffer_size = 128 KB).
2. Because the compressed chunks have variable length (the uncompressed
chunk has a fixed length) they are not aligned to disk blocks, so
consecutive chunks have overlapping blocks which were unnecessarily
read twice.
The fix for both issues is to build the compressed_file_input_stream on
an existing file_input_stream, instead of using direct file IO to read the
individual chunks. file_input_stream takes care of doing the appropriate
amount of read-ahead, and the compressed_file_input_stream layer does the
decompression of the data read from the underlying layer.
Fixes#992.
Historical note: Implementing compressed_file_input_stream on top of
file_input_stream was already tried in the past, and rejected. The problem
at that time was that compressed_file_input_stream's constructor did not
specify the *end* of the range to read, so that when we wanted to read
only a small range we got too much read-ahead beyond the exactly one
compressed chunk that we needed to read. Following the fix to issue #964,
we now know on every streaming read also the intended *end* of the stream,
so we can now use this to stop reading at the end of the last required
chunk, even when we use a read-ahead buffer much larger than a chunk.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <1457304335-8507-1-git-send-email-nyh@scylladb.com>
All the SSTable read path can now take an io_priority. The public functions will
take a default parameter which is Seastar's default priority.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
recently, "file" started to use a shared_ptr internally, and is already
copy-able and reference counted, and there is no reason to use
lw_shared_ptr<file>. This patch cleans up a few remaining places where
lw_shared_ptr<file> was used.
Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
If compression is used, we should provide both uncompressed and
compressed length to metadata collector, so as for the ratio to
be computed. Stats metadata stores compression ratio.
Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
lz4 is the unique compressor algorithm supported so far.
missing deflate and snappy algorithms.
Adding them should be relatively easy though.
Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
The field compressor is about saying which compressor algorithm
must be used in compression of sstable data file.
This is a small step towards compressed sstable data file.
Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
We always return a future, but with the threaded writer, we can get rid of
that. So while reads will still return a future, the writer will be able to
return void.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
CRC component is composed of chunk size, and a vector of checksums
for each chunk (at most chunk size bytes) composing the data file.
The implementation is about computing the checksum every time the
output stream of data file gets written. A write to output stream
may cross the chunk boundary, so that must be handled properly.
Note that CRC component will only be created if compression isn't
being used.
Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
It would be useful in some situations to know where does the data ends. If the
file is uncompressed, this is equivalent to the file length. If the data file
is compressed, this information needs to come from the compression structure.
Provide a method that encodes that.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Starting with LZ4, the default compressor.
Stub functions were added to other compression algorithms, which should
eventually be replaced with an actual implementation.
Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
Reviewed-by: Nadav Har'El <nyh@cloudius-systems.com>
Previously we had both a "compression" structure (read from the Compression
Info file on disk) and a "compression_metadata" class with additional
information, which std::move()ed parts of the compression structure.
This caused problems for the simplistic sstable-writing test (which does
the non-interesting thing of writing a previously-read sstable).
I'm ashamed to say, fixing this was very hard, because all this code is
built like a house of cards - try to change one thing, and everything
falls apart. After many failed attempts in trying to improve this code, what
I ended up doing is simply *extending* the "compression" structure - the
extended part isn't read or written, but it is in the structure.
We also no longer move a shared pointer to the compression structure,
but rather just an ordinary pointer; The assumption is that the user
will already make sure that the sstable structure will live for the
durations of any processing on it - and the compression structure is just
one part of this sstable structure.
Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
Move compress.{cc,hh} from db/ to sstables/. This makes more sense, as
this code is only used for sstables (un)compression.
Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>