Instead of issuing a system call for every aio, wait for them to accumulate,
and issue them all at once. This reduces syscall count, and allows the kernel
to batch requests (by plugging the I/O queues during the call). A poller is
added so that requests are not delayed too much.
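
The batching idea above can be sketched roughly as follows. This is only an
illustration, not the actual reactor code: io_request and batched_submitter
are hypothetical names, and the system call is simulated with a counter
rather than issued for real.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one pending asynchronous I/O request.
struct io_request {
    int id;
};

// Hypothetical batching queue: queue() only records the request, and
// flush() hands everything pending to the kernel in one simulated call.
class batched_submitter {
    std::vector<io_request> _pending;
    std::size_t _syscalls = 0;   // number of simulated submit calls
    std::size_t _submitted = 0;  // total requests submitted so far
public:
    void queue(io_request r) { _pending.push_back(r); }

    // Meant to be called by a poller so queued requests are not delayed
    // for long. Returns true if anything was submitted.
    bool flush() {
        if (_pending.empty()) {
            return false;
        }
        ++_syscalls;                     // one call covers the whole batch
        _submitted += _pending.size();
        _pending.clear();
        return true;
    }

    std::size_t syscalls() const { return _syscalls; }
    std::size_t submitted() const { return _submitted; }
    std::size_t pending() const { return _pending.size(); }
};
```

With this shape, eight queued requests followed by one flush() cost a single
simulated call instead of eight.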
Reviewed-by: Pekka Enberg <penberg@cloudius-systems.com>
The compressor field specifies which compression algorithm must be
used when compressing the sstable data file.
This is a small step towards compressed sstable data files.
Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
To handle the fact that --data-file-directories is supposed to be 1+
folders.
Note that boost::program_options already "reserves" the use of std::vector
as receiver of values for multitoken options (i.e. those with more than
one value). Thus, options receiving a list of tokens via command line
should adhere to the multi-token rules, i.e. space separated values.
End result is that --data-file-directories now accepts multiple paths,
white space separated,
i.e. --data-file-directories <path1> <path2>
And as it turns out, this is really a nicer way of writing stuff than
using "," or ":" separation of paths etc, so...
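
To illustrate the space separated multi-token rule, here is a small
standard-library-only sketch. It is not boost::program_options code and does
not claim to match its parser exactly; collect_multitoken is a hypothetical
helper that gathers every token following an option until the next token
that starts with "--".

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: collects every token that follows `option`, up to
// the next token starting with "--" or the end of the argument list.
std::vector<std::string> collect_multitoken(const std::vector<std::string>& args,
                                            const std::string& option) {
    std::vector<std::string> values;
    bool collecting = false;
    for (const auto& arg : args) {
        if (arg.rfind("--", 0) == 0) {   // arg starts with "--"
            collecting = (arg == option);
            continue;
        }
        if (collecting) {
            values.push_back(arg);
        }
    }
    return values;
}
```

Given the arguments --data-file-directories <path1> <path2>, the helper
returns both paths in order.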
Signed-off-by: Calle Wilund <calle@cloudius-systems.com>
Pekka says:
"This series is the initial table merging code conversion. We now store
column family metadata in the database but without information about the
actual columns."
From Glauber:
"Up until now, we were still generating one future per element that we write.
Now that we have new infrastructure, we can avoid that, and generate only the
ones we really need to. This has the added advantage of lifting the need to do
lambda captures and allowing for a more straightforward forwarding of rest...
parameters"
Up until now, we were still generating one future per element that we write.
Now that we have new infrastructure, we can avoid that, and generate only the
ones we really need to. This has the added advantage of lifting the need to do
lambda captures and allowing for a more straightforward forwarding of rest...
parameters
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
We always return a future, but with the threaded writer, we can get rid of
that. So while reads will still return a future, the writer will be able to
return void.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Table merging code needs to compare schema_ptrs for equality so add
comparison operators for column_definition and schema classes.
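
A minimal sketch of what such operators could look like. The *_sketch types
and their fields are assumptions made for illustration and are far simpler
than the real column_definition and schema classes.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for column_definition.
struct column_definition_sketch {
    std::string name;
    std::string type_name;
};

inline bool operator==(const column_definition_sketch& a,
                       const column_definition_sketch& b) {
    return a.name == b.name && a.type_name == b.type_name;
}

inline bool operator!=(const column_definition_sketch& a,
                       const column_definition_sketch& b) {
    return !(a == b);
}

// Hypothetical, simplified stand-in for schema.
struct schema_sketch {
    std::string ks_name;
    std::string cf_name;
    std::vector<column_definition_sketch> columns;
};

inline bool operator==(const schema_sketch& a, const schema_sketch& b) {
    return a.ks_name == b.ks_name && a.cf_name == b.cf_name
        && a.columns == b.columns;
}

inline bool operator!=(const schema_sketch& a, const schema_sketch& b) {
    return !(a == b);
}

using schema_sketch_ptr = std::shared_ptr<const schema_sketch>;

// operator== on shared_ptr compares addresses, so dereference to compare
// the schemas themselves. Two null pointers compare equal.
inline bool same_schema(const schema_sketch_ptr& a, const schema_sketch_ptr& b) {
    if (!a || !b) {
        return a == b;
    }
    return *a == *b;
}
```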
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
For some reason, I added a fsync call when the file underlying the
stream gets truncated. That happens when flushing a file whose
size isn't aligned to the requested DMA buffer.
Instead, fsync should only be called when closing the stream, so this
patch changes the code to do that.
Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
Glauber says:
"The current series fixes a small conversion bug and brings some much needed
cleanups (like in key.cc).
With that in place, it implements and wires support for collections and range
tombstones."
We're looking up shared_ptr<column_identifier> type so make sure we
look up by value, not by pointer.
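
The difference can be sketched with a plain std::unordered_map. The names
below (column_identifier_sketch, id_value_hash, id_value_equal) are
hypothetical and only show the general technique, assuming non-null keys;
they are not the actual lookup code touched by this patch.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <unordered_map>

// Hypothetical, simplified stand-in for column_identifier.
struct column_identifier_sketch {
    std::string text;
};

using id_ptr = std::shared_ptr<column_identifier_sketch>;

// Hash and compare the pointed-to value instead of the pointer address.
// Keys are assumed to be non-null.
struct id_value_hash {
    std::size_t operator()(const id_ptr& p) const {
        return std::hash<std::string>()(p->text);
    }
};

struct id_value_equal {
    bool operator()(const id_ptr& a, const id_ptr& b) const {
        return a->text == b->text;
    }
};

// Default hashing/equality on shared_ptr works on the address.
using by_pointer_map = std::unordered_map<id_ptr, int>;
// Custom functors make the lookup work on the identifier's value.
using by_value_map = std::unordered_map<id_ptr, int, id_value_hash, id_value_equal>;
```

With by_pointer_map, a second shared_ptr holding an equal identifier misses;
with by_value_map it finds the entry.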
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
We did so because passing shared pointers in the old code was so much easier.
But it is no longer, so we can avoid the reference bump.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
This is the code to write a range tombstone. This is not yet wired up to actually do it.
The use case for collections is a lot simpler, and will be handled first.
The actual code should be virtually identical, though.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
We can insert markers at the end of composites, which can be used to identify
the presence of ranges in a column.
One option would be to change all methods in sstables/key.hh to take an
optional marker parameter, and append that as the last marker.
But because we are talking about a single byte, and always added to the end,
it's a lot easier to allow the composite to be created normally, and then
replace the last byte with the marker.
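
A rough sketch of the "build normally, then patch the last byte" approach.
The type names and the marker values below are illustrative assumptions, not
taken from sstables/key.hh.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical byte container standing in for a serialized composite.
using bytes_sketch = std::vector<std::uint8_t>;

// Illustrative marker values; the real encoding may differ.
enum class composite_marker_sketch : std::uint8_t {
    none = 0x00,
    end_range = 0x01,
    start_range = 0xff,
};

// Takes an already built composite and overwrites its final byte with
// the requested marker. The length and all other bytes are unchanged.
inline bytes_sketch with_marker(bytes_sketch composite, composite_marker_sketch m) {
    if (composite.empty()) {
        throw std::invalid_argument("cannot place a marker on an empty composite");
    }
    composite.back() = static_cast<std::uint8_t>(m);
    return composite;
}
```

This keeps the composite-building functions free of an extra optional
parameter, at the cost of one post-processing step for the callers that need
a marker.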
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Tomek got confused with the fact that we had to pass bytes_type for this code
to work. And well, that's understandable: that code evolved quite a bit since
its first user, and now the interface is not quite the best for the job,
forcing us to employ weird tricks like that for the code to work.
In this cleanup, I am creating a serializer object, that will encode
information about how to serialize the component passed. In the majority of the
cases, a simple sstable_serializer will just serialize to itself - accepting as
parameters byte_views.
In the case we need to operate on a deeply exploded view - the only case for
which we truly needed types, the respective serializer will take a types vector
and use it accordingly.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
In a testament to how confusing our old code was, while collapsing the futures
I ended up getting the end_of_row element inside the loop for clustered keys.
The end of row, as the name implies, should only be written at the end of the
(thrift) row (= CQL partition).
Move it to the right place.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
writer.hh includes sstables.hh which includes writer.hh
We can remove the circular reference if we include core/fstream.hh in writer.hh instead
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
From Glauber:
"This is my attempt to convert the sstable write code to seastar threads.
The code does look a lot cleaner now, and the future path on how to improve
it, a bit more clear."
This code is blatantly wrong, because it writes stack variables to the
underlying storage.
After this patch, the code is no longer wrong. Right is better than wrong,
so we should apply this.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Technically speaking, the current code is not wrong. However, it was written
before we had do_with, and I ended up dowithing it while chasing our erratic
bug under the suspicion that this code could somehow be related with our bug.
Turns out it isn't, but now that I went through the trouble of dowithing it -
and since do_with is easier to reason about and guarantee liveness, let's go
with this option.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
checksum and crc are written inside the main function so we don't need
to export the file stream. But since the functions are actually trivial
we can just .get() the whole thing instead of changing them.
The others are still kept as futures and called after async::thread completes,
for maximum parallelism.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
After the last round of cleanups, this function turned out to be exactly the
same as write_column_name, except that the composite differs. Because the
composite is passed as a parameter, we can just use the same function for all
and pre-create the composite.
This will make the implementation of collections a lot easier, since for
collections we will prepend each element with the column name.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
This is my attempt to convert the sstable code to seastar threads, as
we have been extensively discussing. I haven't yet measured how much we
gain by this, but the final code looks *so* much better and less complicated,
that this alone should be enough reasoning.
Here's how I've done it, so you can easily follow:
Every function that we use and that returns a future is copied to another function
with the same name but ending in _t. This is better than copying the whole thing,
because it can be done in logical pieces that are easier to follow. This is also
easier to verify.
function_t() will do the same as function(), but will return void.
I am not changing more than I need to, so in the final code, without all the
do_withs and other stuff, there are some parts that start to cry for a cleanup.
They are left as is for now, and I will return to them later once the patch is
merged.
In this initial patch, you get the main write_components converted, and this
nice explanation message.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>