file_data_source_impl always calls file::size(), which can be slow. This
slows down applications that create many short-lived input streams on the
same file (for random-access processing of a subset of the data).
Fix by not calling size(), and letting the file code handle short reads
itself.
the file_data_sink_impl::put() code assumes it is always called on buffers
with size multiple of dma alignment (4096), except the *last* one. After
writing one unaligned-size buffer, further writes cannot continue because
the offset into the file no longer has the right alignment! If a caller
does try to do that, there is a bug in the caller (it's not a run-time error,
it's a design bug), and better discover it quickly with an assert, as I do
in this patch.
I had such a caller in an example application, and it took me a whole day
of debugging just to figure out that this is where the caller actually had
a bug.
Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
Reviewed-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
Our file-output code contains various layers making sometimes contradictory
assumptions, and it is a real art-form to make it all work together.
They usually do work together well, but there was one undetected bug for
large writes to a file output stream:
The problem is what happens when we try to write a large buffer (larger
than the output stream's buffer) in one output_stream::write() call.
By default, output_stream uses the efficient, zero-copy, implementation
which calls the underlying data sink's put function on the entire written
buffer, without copying it to the stream's buffer first.
Unfortunately, this solution does NOT work on *file* output streams.
Because of our use of AIO and O_DIRECT, we can only write from aligned
buffers, and at aligned (multiple of dma_alignment) sizes. Even a large
size cannot be fully written if not a multiple of dma_alignment, and
the need to align the buffers, and data already on the output_stream,
complicate things further.
Amazingly, we already had an option "_trim_to_size" in output_stream to
do the right thing, and we just need to enable it for file output stream.
In special cases (aligned position, aligned input buffer) it might be
possible to do something even more efficient - zero copy and just one
write request - but in the general case, _trim_to_size is exactly what
we needed.
Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
For some reason, I added a fsync call when the file underlying the
stream gets truncated. That happens when flushing a file, which
size isn't aligned to the requested DMA buffer.
Instead, fsync should only be called when closing the stream, so this
patch changes the code to do that.
Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
There is a hidden exception that could be thrown insize file::dma_get_bulk()
if file size is not aligned with fstream::_buffer_size. In this case
file::dma_read_bulk() will be given a _buffer_size as a length for the last data
chunk in the file too. file::dma_read_bulk() will get a short read (till EOF)
and then will try to read beyond it (by calling file::read_maybe_eof())
in order to differentiate between I/O error and EOF. file::read_maybe_eof()
will throw a file::eof_error exception to indicate the EOF, it will be caught
by file::dma_read_bulk() and since we have read some "good" bytes by now this
exception won't be forwarded further. However the damage by throwing the exception
has already been done and we want to avoid this in fstream flow (unless there are
real errors).
In order to prevent the above we will always request file::dma_read_bulk() to
read the amount of data it should be able to deliver (not beyond EOF).
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Move the get() logic in fstream.cc into the file::dma_read_bulk()
fixing some issues:
- Fix the funny "alignment" calculation.
- Make sure the length is aligned too.
- Added new functions:
- dma_read(pos, len): returns a temporary_buffer with read data and
doesn't assume/require any alignment from either "pos"
or "len". Unlike dma_read_bulk() this function will
trim the resulting buffer to the requested size.
- dma_read_exactly(pos, len): does exactly what dma_read(pos, len) does but it
will also throw and exception if it failed to read
the required number of bytes (e.g. EOF is reached).
- Changed the names of parameters of dma_read(pos, buf, len) in order to emphasize
that they have to be aligned.
- Added a description to dma_read(pos, buf, len) to make it even more clear.
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
In fstream.cc, use the convenient new temporary_buffer<char>::aligned()
function for creating an aligned temporary buffer - instead of repeating
its implementation.
Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
The file_input_stream interface was messy: it was not fiber safe (e.g., code
doing seek() in the middle of an ongoing read_exactly()), and went against
the PIMPL philosophy.
So this patch removes the file_input_stream class, and replaces it with a
completely different design:
We now have in fstream.hh a global function:
input_stream<char>
make_file_input_stream(
lw_shared_ptr<file> file, uint64_t offset = 0,
uint64_t buffer_size = 8192);
In other words, instead of "seeking" in an input stream, we just open a new
input stream object at a particular offset of the given file. Multiple input
streams might be concurrently active on the same file.
Note how make_file_input_stream now returns a regular "input_stream", not a
subtype, and it can be used just like any normal input_stream to read the stream
starting at the given position.
This patch makes "input_stream" a "final" type: we no longer subclass it in our
code, and we shouldn't in the future because it goes against the PIMPL design
(the subclass should be of the inner workings, like the data_source_impl, not
of input_stream).
Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
Our file_stream interface supports seek, but when we try to seek to arbitrary
locations that are smaller than an aio-boundary (say, for instance, f->seek(4)),
we will end up not being able to perform the read.
We need to guarantee the reads are aligned, and will then present to the caller
the buffer properly offset.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>