Commit Graph

44 Commits

Author SHA1 Message Date
Glauber Costa
799a6b5962 sstables: change summary_la to summary_ka
What we implement is ka, not la. Since the summary is the one element that
actually changed in the 2.2 implementation, it is particularly important that
we get this one right. I have previously missed this.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-08-11 17:47:48 +03:00
Glauber Costa
7bbf8c2a6f sstable types: correctly state version of metadata field
Don't let the current name fool you: Having this listed as "la" here
was just lack of discipline on my part. I meant by it "the format from
which we are importing" - which was named la for Origin. I wasn't
really thinking at the time that it would be dangerous to stop between
versions.

This should read ka, not la.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-08-07 08:31:55 -05:00
Avi Kivity
6a9d0495f8 sstables: fix memory corruption in metadata parsing
Since parsing involves a unique_ptr<metadata> holding a pointer to a
subclass of metadata, it must define a virtual destructor, or it can
cause memory leaks when deleted, or, with C++14 sized deallocators, it
can cause the wrong memory pool to be used for deleting the object.

Seen on EC2.

Define a virtual destructor to tell the compiler how to destroy
and free the object.
2015-07-22 17:46:37 +03:00
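A minimal sketch of the hazard addressed by 6a9d0495f8, assuming a simplified hierarchy. `metadata_base` and `stats_like` are hypothetical stand-ins, not the real classes. With the virtual destructor in place, destroying the object through a base-class `unique_ptr` runs the subclass destructor; without it, the same deletion would be undefined behavior.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for the metadata base class.
struct metadata_base {
    // Without this virtual destructor, deleting a derived object through
    // a metadata_base pointer is undefined behavior.
    virtual ~metadata_base() = default;
};

// Hypothetical subclass that records when its destructor runs.
struct stats_like : metadata_base {
    explicit stats_like(int& counter) : destroyed(counter) {}
    ~stats_like() override { ++destroyed; }
    int& destroyed;
};

// Owns a subclass through a base-class unique_ptr, lets it go out of scope,
// and reports how many subclass destructors ran.
inline int destroy_through_base() {
    int destroyed = 0;
    {
        std::unique_ptr<metadata_base> p = std::make_unique<stats_like>(destroyed);
        assert(p != nullptr);
    }
    return destroyed;
}
```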
Tomasz Grabiec
e9a050da78 sstables: Obtain the key from entries using get_key() rather than casting to bytes_view
The entry contains not only the key, but other stuff like
position. Why would casting to bytes_view give a view of just the
key and not the whole entry? Better to be explicit.
2015-07-22 10:27:48 +02:00
Raphael S. Carvalho
79532b6603 sstables: merge prepare_statistics and add_statistics_metadata
The two separate functions can now be merged. As a result, the code
that generates statistics data is now much easier to understand.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-06-24 18:58:17 +03:00
Raphael S. Carvalho
f831d1bce9 sstables: add support to generate compaction metadata
compaction metadata is composed of ancestors and cardinality.

ancestors data is generated by the compaction process, so it will be
empty for the time being.

cardinality data is generated by hashing the keys, offering the
values to hyperloglog and retrieving a buffer with the data to be
stored.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
Reviewed-by: Nadav Har'El <nyh@cloudius-systems.com>
2015-06-23 12:13:26 -03:00
Glauber Costa
2dbd2b408a sstables: change describe_type's return type to auto
We always return a future, but with the threaded writer, we can get rid of
that. So while reads will still return a future, the writer will be able to
return void.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-06-08 15:25:35 +03:00
Raphael S. Carvalho
a9619866bb sstables: add initial support to generation of Statistics file
Statistics file is composed of three types of metadata:
- Validation
- Stats
- Compaction

This patch is adding support to generate the first two types.
Compaction is the hardest one to generate because it depends on
external modules. Anyway, I plan to convert whatever is needed
for us to support Compaction metadata as soon as possible.

Regarding Stats metadata, we're filling the fields sstable_level
and repaired_at with default values. sstable_level is related to
compaction, and repaired_at is related to SSTable repair.
Besides the fact that we support neither compaction nor SSTable repair
yet, those values come from upper layers in Cassandra.

Given the facts mentioned above, Statistics file is being generated
with only Validation and Stats metadata. Its on-disk format is
flexible enough so that a missing metadata won't damage it.
So it's technically possible to proceed without Compaction metadata
for the time being.

For reference:
../io/sstable/MetadataCollector.java
../io/sstable/ColumnStats.java
../io/sstable/format/big/BigTableWriter.java

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-06-02 10:32:12 +03:00
Raphael S. Carvalho
1310e79b98 sstables: convert EstimatedHistogram to C++ and start using it
In addition, this patch also fixes serialization and deserialization of
estimated histogram. Problem was found by reading the respective methods
in origin implementation.

The first element of the array offset is used for both the first and
second element of the array bucket. So given an array bucket of size N,
array offset will be of size N - 1. Our code wasn't handling this.

The new representation of estimated histogram provides us with methods
needed for writing the component Statistics.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-06-02 10:32:12 +03:00
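The offset/bucket pairing described in 1310e79b98 can be sketched as follows. This only illustrates the indexing rule (N buckets, N - 1 offsets, with the first offset used twice); the function name and the pair-based return type are assumptions made for the example, not the actual serializer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Returns the (offset, bucket) pairs in the order they would be written.
inline std::vector<std::pair<int64_t, int64_t>>
pair_offsets_with_buckets(const std::vector<int64_t>& offsets,
                          const std::vector<int64_t>& buckets) {
    // N buckets go with N - 1 offsets.
    assert(!offsets.empty() && buckets.size() == offsets.size() + 1);
    std::vector<std::pair<int64_t, int64_t>> out;
    out.reserve(buckets.size());
    for (std::size_t i = 0; i < buckets.size(); ++i) {
        // offsets[0] is paired with both buckets[0] and buckets[1].
        out.emplace_back(offsets[i == 0 ? 0 : i - 1], buckets[i]);
    }
    return out;
}
```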
Raphael S. Carvalho
05f2bfbe77 sstables: convert StreamingHistogram to C++ and start using it
This step was important to extend streaming_histogram with methods
needed for writing the SSTable component Statistics.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-06-02 10:32:12 +03:00
Raphael S. Carvalho
53a26a5966 sstables: move disk_* types to a header
That's needed to avoid circular dependencies of header files.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-06-02 10:32:12 +03:00
Raphael S. Carvalho
bdd3fe61c5 sstables: add initial support to generation of CRC component
CRC component is composed of chunk size, and a vector of checksums
for each chunk (at most chunk size bytes) composing the data file.
The implementation is about computing the checksum every time the
output stream of data file gets written. A write to output stream
may cross the chunk boundary, so that must be handled properly.
Note that CRC component will only be created if compression isn't
being used.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-06-01 12:25:01 -03:00
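A rough sketch of the chunk-boundary handling described in bdd3fe61c5. The class name `chunk_checksummer` is hypothetical, and Adler-32 is used only because it is short to write out; the commit message does not say which checksum function is used. The point is that a single write may be split across several chunks, and the resulting checksums do not depend on how the writes were sliced.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>
#include <vector>

// Accumulates one checksum per chunk of at most chunk_size bytes.
class chunk_checksummer {
    std::size_t _chunk_size;
    std::size_t _filled = 0;      // bytes consumed in the current chunk
    uint32_t _a = 1, _b = 0;      // running Adler-32 state
    std::vector<uint32_t> _checksums;

    void finish_chunk() {
        _checksums.push_back((_b << 16) | _a);
        _a = 1;
        _b = 0;
        _filled = 0;
    }

public:
    explicit chunk_checksummer(std::size_t chunk_size) : _chunk_size(chunk_size) {
        assert(chunk_size > 0);
    }

    // A write may cross one or more chunk boundaries, so it is consumed
    // piece by piece, closing a chunk whenever it fills up.
    void write(std::string_view data) {
        while (!data.empty()) {
            const std::size_t n = std::min(data.size(), _chunk_size - _filled);
            for (unsigned char c : data.substr(0, n)) {
                _a = (_a + c) % 65521;
                _b = (_b + _a) % 65521;
            }
            _filled += n;
            data.remove_prefix(n);
            if (_filled == _chunk_size) {
                finish_chunk();
            }
        }
    }

    // Closes the trailing partial chunk, if any, and returns all checksums.
    std::vector<uint32_t> finish() {
        if (_filled > 0) {
            finish_chunk();
        }
        return _checksums;
    }
};
```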
Avi Kivity
f2b82fd455 sstables/types.hh: add missing includes
2015-05-26 16:53:34 +03:00
Raphael S. Carvalho
4611fd373d sstables: add missing copyright
Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-05-26 09:51:15 +03:00
Raphael S. Carvalho
57060b5dfe sstables: add initial support to generation of summary file
Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-05-20 15:17:21 -03:00
Glauber Costa
9d3ef62789 sstables: add convenience constructors for filter type
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-05-19 11:22:41 -04:00
Raphael S. Carvalho
7dc9ba1714 sstables: write summary position field as little endian
The field entries must be written in memory order.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-05-17 11:55:35 +03:00
Glauber Costa
34c6cca845 sstable types: convert a deletion time to a tombstone
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-05-13 17:14:03 -04:00
Glauber Costa
0a4a5914a8 sstables: add helper method for deletion_time
That should make it easier for code to test if a cell or range is live whenever
needed.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Reviewed-by: Nadav Har'El <nyh@cloudius-systems.com>
2015-05-13 17:14:03 -04:00
Glauber Costa
2fba948ad8 sstables: move timestamps to signed integer
This is to follow Origin

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-05-13 17:14:02 -04:00
Raphael S. Carvalho
c646307711 sstables: add initial support to generation of data file
This initial version supports:
Regular columns
Clustering key
Compound Partition key
Compound Clustering key
Static Row

What's not supported:
Counters
Range tombstones
Collections
Compression
anything else that wasn't mentioned in the support list.

The generation of the data file consists of iterating through
a set of mutation_partition from a column_family, then writing
the SSTable rows according to the format.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
Reviewed-by: Nadav Har'El <nyh@cloudius-systems.com>
2015-05-05 11:15:46 +03:00
Raphael S. Carvalho
f328609837 sstables: add write support to bytes_view
disk_string provides an easy way of serializing a string into the form
{ size, string[size] }. sstables::key and atomic_cell, among other types,
provide a bytes_view for the view of data, so that's why this change
is needed. Otherwise, I would have to convert bytes_view into bytes,
which requires a copy.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
Reviewed-by: Nadav Har'El <nyh@cloudius-systems.com>
2015-05-05 11:15:45 +03:00
Glauber Costa
198f55dc5c sstables: don't expose summary binary search
There is no need to expose binary search. It can be an internal function
that is accessible for test only.

Also, in the end, the implementation of the summary version was such a simple
one, that there is no need to have a specific method for that. We can just pass
the summary entries as a parameter.

Some header file massage is needed to keep it compiling

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-04-29 09:47:19 -04:00
Glauber Costa
28f936d6d2 sstables: summary binary search
Search code is trivially taken from Origin, but adapted so that the comparison
is done explicitly.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-04-24 10:11:50 -04:00
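A minimal sketch of a binary search with an explicit three-way comparison, in the spirit of 28f936d6d2. The key type (`std::string_view`), the function names, and the "negative value encodes the insertion point" return convention are assumptions chosen for the example; they are not taken from the actual code or from Origin.

```cpp
#include <algorithm>
#include <cassert>
#include <string_view>
#include <vector>

// Three-way comparison spelled out explicitly instead of relying on operator<.
inline int compare_keys(std::string_view a, std::string_view b) {
    return a.compare(b);
}

// Returns the index of key in the sorted entries, or -(insertion point) - 1
// when the key is absent.
inline int summary_binary_search(const std::vector<std::string_view>& entries,
                                 std::string_view key) {
    assert(std::is_sorted(entries.begin(), entries.end()));
    int low = 0;
    int high = static_cast<int>(entries.size()) - 1;
    while (low <= high) {
        const int mid = low + (high - low) / 2;
        const int c = compare_keys(entries[mid], key);
        if (c < 0) {
            low = mid + 1;
        } else if (c > 0) {
            high = mid - 1;
        } else {
            return mid;
        }
    }
    return -low - 1;
}
```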
Glauber Costa
8508a246bb sstable: convenient view for types
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-04-24 10:11:49 -04:00
Glauber Costa
5b8b2e835a sstable: add sstable::key type
We have our own representation of a partition_key, clustering_key, etc. They
may differ slightly from a legacy sstable key because we don't necessarily
serialize composites in our internal representation the same way as Origin
does. This patch encodes the Origin composite serialization, so we can create
keys that are compatible with Origin's understanding of what a partition key
should look like.

This would be used when serializing or deserializing to/from an sstable.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-04-24 10:11:49 -04:00
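For orientation, a sketch of what a composite serialization of the kind mentioned in 5b8b2e835a could look like. The layout assumed here (a 16-bit big-endian length, the component bytes, then a single zero end-of-component byte) is an assumption about Origin's composite encoding and is not stated in the commit message; `serialize_composite` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <string_view>
#include <vector>

// Serializes each component as { uint16 length, bytes, end-of-component }.
// The layout is an assumption made for illustration.
inline std::vector<uint8_t>
serialize_composite(const std::vector<std::string_view>& components) {
    std::vector<uint8_t> out;
    for (std::string_view c : components) {
        assert(c.size() <= std::numeric_limits<uint16_t>::max());
        const auto size = static_cast<uint16_t>(c.size());
        out.push_back(static_cast<uint8_t>(size >> 8));   // length, high byte
        out.push_back(static_cast<uint8_t>(size & 0xff)); // length, low byte
        out.insert(out.end(), c.begin(), c.end());        // component bytes
        out.push_back(0);                                 // end-of-component marker
    }
    return out;
}
```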
Raphael S. Carvalho
c6e31346d8 sstables: add support to write the component Summary
The definition of summary_la at types.hh provides a good explanation
on the on-disk format of the Summary file.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-04-11 11:24:53 +03:00
Glauber Costa
a505ac487f sstables: use bytes instead of sstring
We should have done that from the start

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-04-11 11:22:30 +03:00
Raphael S. Carvalho
08e5d3ca8b sstables: add support to write the component statistics
This code adds the ability to write statistics to disk.

On-disk format:

uint32_t Size;
struct {
    uint32_t metadata_type;
    uint32_t offset; /* offset into this file */
} metadata_metadata[Size];

* each metadata_metadata entry corresponds to a metadata
stored in the file.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-04-04 12:51:50 +03:00
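One way to read the on-disk format in 08e5d3ca8b is as a small table of contents at the head of the file. The sketch below computes the offsets under the assumption that the metadata bodies are laid out back to back immediately after the table; that layout, and the names `toc_entry` and `build_toc`, are assumptions for illustration only.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

struct toc_entry {
    uint32_t metadata_type;
    uint32_t offset; // offset into the Statistics file
};

// Given (type, serialized size) for each metadata in file order, computes
// the table of contents that precedes them.
inline std::vector<toc_entry>
build_toc(const std::vector<std::pair<uint32_t, uint32_t>>& types_and_sizes) {
    assert(types_and_sizes.size() <= (UINT32_MAX - 4) / 8);
    // Header: uint32_t Size, then Size entries of { type, offset }.
    uint32_t position = 4 + 8 * static_cast<uint32_t>(types_and_sizes.size());
    std::vector<toc_entry> toc;
    for (const auto& [type, size] : types_and_sizes) {
        toc.push_back({type, position});
        position += size; // the next metadata starts where this one ends
    }
    return toc;
}
```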
Raphael S. Carvalho
44735a3c88 sstables: add support to write the component filter
Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-03-31 17:11:00 +03:00
Nadav Har'El
f80ac5a629 sstables: rework compression metadata to fix test.
Previously we had both a "compression" structure (read from the Compression
Info file on disk) and a "compression_metadata" class with additional
information, which std::move()ed parts of the compression structure.
This caused problems for the simplistic sstable-writing test (which does
the non-interesting thing of writing a previously-read sstable).

I'm ashamed to say, fixing this was very hard, because all this code is
built like a house of cards - try to change one thing, and everything
falls apart. After many failed attempts in trying to improve this code, what
I ended up doing is simply *extending* the "compression" structure - the
extended part isn't read or written, but it is in the structure.

We also no longer move a shared pointer to the compression structure,
but rather just an ordinary pointer; the assumption is that the user
will already make sure that the sstable structure will live for the
duration of any processing on it - and the compression structure is just
one part of this sstable structure.

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
2015-03-29 16:14:53 +03:00
Raphael S. Carvalho
daaa1a6dcb sstables: extend it to support write of components
For the time being, compression info is the only component being
written by store(). Changes introduced by this patch are generic,
so as to make it easier to write other components as well.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-03-25 13:22:42 +02:00
Glauber Costa
37b9b4a08f sstable summary: provide method to query an index in the summary
Although the entries are in an array, and live on disk, the disk array
abstraction turns out to be a bad abstraction to read it in. This is because,
contrary to other types, the key sizes are not to be found on-disk. It is a lot
more convenient to treat it as a normal array to be constructed as a separate
step.

We will construct this array at load time, and provide a method that, given an
index, returns the corresponding key/position.  After a binary search - to be
implemented - we'll be able to fetch the real data.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-03-24 11:31:20 +02:00
Glauber Costa
e7011c1ce9 sstable: add two more summary fields
After the keys array, the Summary file includes the first and last keys in this file's
range. Add this to the format.

Note that there is still more information after that. But that seems to be related to
the writing method (it says mmap in my files), and not relevant for us.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-03-24 11:31:19 +02:00
Glauber Costa
bcf7e42933 column mask
Each column has a byte in the file that determines how to process whatever
data comes next. In the actual file, we can see one of those values, or a
combination of them.

Because it is an enum, no new parser is needed.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-03-15 10:47:34 +02:00
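A sketch of how a one-byte column mask (bcf7e42933) can be modelled as an enum whose values combine as bit flags. The flag names and numeric values below are illustrative assumptions, not taken from the commit; the point is only that a byte read from the file converts directly to the enum, so no dedicated parser is required.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative flag values (assumptions, not taken from the source).
enum class column_mask : uint8_t {
    none = 0x00,
    deletion = 0x01,
    expiration = 0x02,
    counter = 0x04,
    counter_update = 0x08,
    range_tombstone = 0x10,
};

// Combines two flags into one mask value.
constexpr column_mask operator|(column_mask a, column_mask b) {
    return static_cast<column_mask>(static_cast<uint8_t>(a) |
                                    static_cast<uint8_t>(b));
}

// Tests whether a (possibly combined) mask contains the given flag.
inline bool has(column_mask value, column_mask flag) {
    assert(flag != column_mask::none);
    return (static_cast<uint8_t>(value) & static_cast<uint8_t>(flag)) != 0;
}

// The enum has the same width as the on-disk byte, so the byte converts
// directly and can hold a single flag or a combination.
inline column_mask mask_from_byte(uint8_t b) {
    return static_cast<column_mask>(b);
}
```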
Glauber Costa
4e73bf8b11 sstables: deletion_time structure
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-03-15 10:47:33 +02:00
Glauber Costa
c0ad2a8e0e sstables: parse the index file
We usually don't read the whole file into memory, so the probing interface will
also allow for the specification of boundaries that we should use for
reading.

The sstable needs to be informed - usually by the schema - of how many columns
the partition key is composed of - 1 for simple keys, more than one, for
composites.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-03-10 15:13:14 -03:00
Avi Kivity
f039904d75 Merge branch 'master' into db
Updated usages of std::hash<some_enum_type> to accommodate 25168fc73d.
2015-03-01 15:24:47 +02:00
Glauber Costa
fb3682cb4f sstable statistics file
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Reviewed-by: Nadav Har'El <nyh@cloudius-systems.com>
2015-02-26 12:14:19 -05:00
Glauber Costa
0d98caf885 summary file
TODO: read in the actual index. This is schema-dependent.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Reviewed-by: Nadav Har'El <nyh@cloudius-systems.com>
2015-02-26 12:14:19 -05:00
Glauber Costa
1b75a5bccb bloom filter
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Reviewed-by: Nadav Har'El <nyh@cloudius-systems.com>
2015-02-26 12:14:19 -05:00
Glauber Costa
d810f03bb7 compression file
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Reviewed-by: Nadav Har'El <nyh@cloudius-systems.com>
2015-02-26 12:14:19 -05:00
Glauber Costa
cc4f8f09e5 parser: disk_hash type
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Reviewed-by: Nadav Har'El <nyh@cloudius-systems.com>
2015-02-26 12:06:25 -05:00
Glauber Costa
001606209a sstable parser
This is a parser for the sstable files based on template recursion.
With this, one can easily extend it to parse any complex data structure
by writing

    parse(in, struct.a, struct.b ... );

This patch contains the most basic types used during parsing as building
blocks.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-02-26 12:06:25 -05:00
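The template-recursion idea from 001606209a can be sketched roughly as below. This is a synchronous toy over an in-memory buffer, assuming big-endian unsigned integers as the only basic type; `input`, `header`, and these `parse` overloads are hypothetical and are not the parser from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <vector>

// Hypothetical input cursor over an in-memory buffer.
struct input {
    const std::vector<uint8_t>& data;
    std::size_t pos = 0;
};

// Building block: read one big-endian unsigned integer.
template <typename T>
std::enable_if_t<std::is_unsigned<T>::value> parse(input& in, T& out) {
    assert(in.pos + sizeof(T) <= in.data.size());
    out = 0;
    for (std::size_t i = 0; i < sizeof(T); ++i) {
        out = static_cast<T>((out << 8) | in.data[in.pos++]);
    }
}

// Recursive case: parse the first field, then recurse on the rest.
template <typename First, typename... Rest>
std::enable_if_t<(sizeof...(Rest) > 0)> parse(input& in, First& first, Rest&... rest) {
    parse(in, first);
    parse(in, rest...);
}

// A composite structure only has to list its fields in on-disk order.
struct header {
    uint16_t version;
    uint32_t count;
};

inline void parse(input& in, header& h) {
    parse(in, h.version, h.count);
}
```

Because a structure's `parse` is just another overload, structures can be nested as fields of larger structures without changing the recursive case.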