From Calle:
"Yet another reworking of schema/schema_builder interaction.
* schema::raw_schema now has single sorted column vector + offsets
* schema can be constructed from raw_schema
* schema_builder also contains raw_schema
* schema_builder can construct from schema, ensuring all info is kept
* schema_builder->schema via raw_schema"
There's no need to use decorated_key as schema results map key because
we just convert it back to a keyspace name string. It does, however,
cause problems when trying to pass the results around to different CPUs
so just switch to the sstring type.
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
This runs the file transformer on a file before returning the result.
This is used for templating support.
Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
This patch cleans up the gossiper implementation by using the new square
bracket operator for path parameters and the general container_to_vec
function.
Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
Reviewed-by: Pekka Enberg <penberg@cloudius-systems.com>
Statistics file is composed of three types of metadata:
- Validation
- Stats
- Compaction
This patch adds support for generating the first two types.
Compaction is the hardest one to generate because it depends on
external modules. Anyway, I plan to convert whatever is needed
for us to support Compaction metadata as soon as possible.
For Stats metadata, we fill the fields sstable_level and repaired_at
with default values. sstable_level is related to compaction, and
repaired_at is related to SSTable repair.
Besides the fact that we support neither compaction nor SSTable repair
yet, those values come from upper layers in Cassandra.
Given the facts above, the Statistics file is generated with only
Validation and Stats metadata. Its on-disk format is flexible enough
that a missing metadata type won't damage it, so it's technically
possible to proceed without Compaction metadata for the time being.
For reference:
../io/sstable/MetadataCollector.java
../io/sstable/ColumnStats.java
../io/sstable/format/big/BigTableWriter.java
Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
Column name helper is used to manage a list of column names, where
each component is either the max or min of its own position.
It's useful when generating statistics because Stats metadata has
both max_column_names and min_column_names lists.
For reference: ../io/sstable/ColumnNameHelper.java
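As an illustration of the per-position max/min idea, here is a minimal
sketch. It assumes components can be compared as plain strings and that
components present in only one list are kept as-is; the names
column_name, merge_components, merge_max and merge_min are hypothetical
and not taken from the patch.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in: a column name is a list of components, compared
// here as plain strings. Real code would compare by each component's type.
using column_name = std::vector<std::string>;

// Fold `candidate` into `current`, choosing one component per position.
template <typename Pick>
void merge_components(column_name& current, const column_name& candidate, Pick pick) {
    const std::size_t common = std::min(current.size(), candidate.size());
    for (std::size_t i = 0; i < common; ++i) {
        current[i] = pick(current[i], candidate[i]);
    }
    // Assumption: components only the candidate has are appended unchanged.
    for (std::size_t i = common; i < candidate.size(); ++i) {
        current.push_back(candidate[i]);
    }
}

// Keep the larger component at each position.
inline void merge_max(column_name& current, const column_name& candidate) {
    merge_components(current, candidate,
        [](const std::string& a, const std::string& b) { return std::max(a, b); });
}

// Keep the smaller component at each position.
inline void merge_min(column_name& current, const column_name& candidate) {
    merge_components(current, candidate,
        [](const std::string& a, const std::string& b) { return std::min(a, b); });
}
```

Calling merge_max and merge_min once per written column would leave the
two lists holding the per-position extremes that Stats metadata needs.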
Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
In addition, this patch fixes serialization and deserialization of the
estimated histogram. The problem was found by reading the respective
methods in the origin implementation.
The first element of the array offset is used for both the first and
second element of the array bucket. So given an array bucket of size N,
array offset will be of size N - 1. Our code wasn't handling this.
The new representation of estimated histogram provides us with methods
needed for writing the component Statistics.
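The offset/bucket layout described above can be sketched as follows.
This is only an illustration under assumptions: the struct and function
names are hypothetical, and the on-disk form is modelled as a list of
(offset, bucket) pairs rather than real byte serialization.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical in-memory form: N buckets and N - 1 offsets.
struct estimated_histogram {
    std::vector<int64_t> bucket_offsets;
    std::vector<int64_t> buckets;
};

using histogram_pairs = std::vector<std::pair<int64_t, int64_t>>;

// Emit one (offset, bucket) pair per bucket. The first offset is written
// twice: once for bucket 0 and once for bucket 1.
inline histogram_pairs to_pairs(const estimated_histogram& h) {
    if (h.buckets.size() < 2 || h.bucket_offsets.size() + 1 != h.buckets.size()) {
        throw std::invalid_argument("expected N buckets and N - 1 offsets, N >= 2");
    }
    histogram_pairs out;
    for (std::size_t i = 0; i < h.buckets.size(); ++i) {
        out.emplace_back(h.bucket_offsets[i == 0 ? 0 : i - 1], h.buckets[i]);
    }
    return out;
}

// Rebuild the histogram: N pairs give N buckets and N - 1 offsets. The
// offset of the first pair duplicates the second one, so it is skipped.
inline estimated_histogram from_pairs(const histogram_pairs& pairs) {
    estimated_histogram h;
    for (std::size_t i = 0; i < pairs.size(); ++i) {
        if (i > 0) {
            h.bucket_offsets.push_back(pairs[i].first);
        }
        h.buckets.push_back(pairs[i].second);
    }
    return h;
}
```

Treating the pair count as N and the offset count as N - 1 on the read
side is the detail the old code got wrong, according to the text above.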
Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
This step was important to extend streaming_histogram with methods
needed for writing the SSTable component Statistics.
Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
Validation metadata stores the partitioner name and the bloom filter
chance. Cassandra gets the partitioner name by getting an object of the
class itself and getting its canonical name.
Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
The Digest file stores the Adler-32 checksum of the data file, converted
into a string. A test case is added.
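For readers unfamiliar with the checksum, a minimal sketch follows. It
assumes the string form is the decimal representation of the 32-bit
value; adler32 and digest_string are hypothetical helper names, and
production code would more likely use a library routine than this
hand-rolled loop.

```cpp
#include <cstdint>
#include <string>

// Plain Adler-32: two running sums modulo 65521, combined into 32 bits.
inline uint32_t adler32(const std::string& data) {
    const uint32_t mod = 65521;
    uint32_t a = 1;
    uint32_t b = 0;
    for (unsigned char byte : data) {
        a = (a + byte) % mod;
        b = (b + a) % mod;
    }
    return (b << 16) | a;
}

// Assumption: the digest is stored as the decimal text of the checksum.
inline std::string digest_string(const std::string& data) {
    return std::to_string(adler32(data));
}
```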
Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
CRC component is composed of chunk size, and a vector of checksums
for each chunk (at most chunk size bytes) composing the data file.
The implementation updates the checksum every time the output stream
of the data file gets written. A write to the output stream may cross a
chunk boundary, so that must be handled properly.
Note that CRC component will only be created if compression isn't
being used.
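The chunk-boundary handling can be sketched roughly like this. It is an
illustration only: chunked_checksummer is a hypothetical name, Adler-32
is assumed as the per-chunk checksum, and the real code hooks into the
data file's output stream instead of taking raw buffers.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Accumulates one checksum per chunk_size bytes across arbitrary writes.
class chunked_checksummer {
    std::size_t _chunk_size;
    std::size_t _filled = 0;          // bytes consumed in the current chunk
    uint32_t _a = 1;                  // running Adler-32 state
    uint32_t _b = 0;
    std::vector<uint32_t> _checksums;

    void finish_chunk() {
        _checksums.push_back((_b << 16) | _a);
        _a = 1;
        _b = 0;
        _filled = 0;
    }
public:
    explicit chunked_checksummer(std::size_t chunk_size) : _chunk_size(chunk_size) {
        if (chunk_size == 0) {
            throw std::invalid_argument("chunk size must be positive");
        }
    }

    // A single write may span several chunks, so consume it piecewise.
    void update(const unsigned char* data, std::size_t len) {
        while (len > 0) {
            const std::size_t n = std::min(len, _chunk_size - _filled);
            for (std::size_t i = 0; i < n; ++i) {
                _a = (_a + data[i]) % 65521;
                _b = (_b + _a) % 65521;
            }
            data += n;
            len -= n;
            _filled += n;
            if (_filled == _chunk_size) {
                finish_chunk();
            }
        }
    }

    // Flush the trailing partial chunk, if any, and return all checksums.
    std::vector<uint32_t> finish() {
        if (_filled > 0) {
            finish_chunk();
        }
        return _checksums;
    }
};
```

With this shape, the resulting checksum vector depends only on the bytes
written and the chunk size, not on how the writes were split.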
Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
Instead of using do_with(), we open-code it and do it badly, dropping
the reference count on the remote shard.
Fix by dropping the reference count on the local core.
Use std::unordered_map instead of boost::bimap. std::unordered_map is
much easier to use, and using bimap here is a premature optimization.
We can iterate over the map to check whether a host_id is unique.
Modifying host_id is not a frequent or performance-sensitive operation
anyway.
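A minimal sketch of the iterate-to-check-uniqueness approach, with
hypothetical string types standing in for the real endpoint and host ID
classes; host_id_taken and update_host_id are illustrative names, not
the functions in the patch.

```cpp
#include <algorithm>
#include <string>
#include <unordered_map>

// Hypothetical simplified types; real code uses dedicated classes.
using endpoint = std::string;
using host_id = std::string;
using endpoint_to_host_id = std::unordered_map<endpoint, host_id>;

// Linear scan: true if an endpoint other than `self` already maps to `id`.
inline bool host_id_taken(const endpoint_to_host_id& map, const host_id& id,
                          const endpoint& self) {
    return std::any_of(map.begin(), map.end(), [&](const auto& entry) {
        return entry.second == id && entry.first != self;
    });
}

// Set or update the host ID of `ep`. Returns false, leaving the map
// untouched, when another endpoint already owns `id`.
inline bool update_host_id(endpoint_to_host_id& map, const endpoint& ep,
                           const host_id& id) {
    if (host_id_taken(map, id, ep)) {
        return false;
    }
    map[ep] = id;
    return true;
}
```

The scan is O(n) in the number of endpoints, which is acceptable given
that host_id updates are rare.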