Commit Graph

325 Commits

Author SHA1 Message Date
Raphael S. Carvalho
2608427469 sstables: add support to range tombstone of a clustered row
Range tombstones for a clustered row weren't supported, and an assert
placed as a reminder of that was being triggered.
A test case was added.

Fixes #158.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-08-18 10:41:25 +03:00
Avi Kivity
7a14bcd66e Merge "API: add get estimated row size histogram to column family" from Amnon
"This series cleans the streaming_histogram and the estimated histogram that
were imported from origin. It then uses them to get the estimated min and max row
size in the API."
2015-08-16 17:31:23 +03:00
Glauber Costa
d552d99cdd sstables: record index file size on opening
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-08-16 17:12:20 +03:00
Raphael S. Carvalho
82425fd24a sstables: initial work on handling a partially written sstable
The solution was proposed by Nadav. When writing a new sstable,
write all usual files, write the TOC to a temporary file, and
then rename it, which is atomic.
Files not belonging to any TOC are invalid, so we ensure that
partially written sstables aren't reused.

Avi also proposed using fsync on the sstable directory to guarantee
that the files reached the disk before sealing the sstable.

Subsequently, we should add code to avoid loading an sstable whose
TOC is either temporary or doesn't exist. Temporary TOC files
should also be deleted.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-08-16 13:01:44 +03:00
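The write-then-rename scheme described in this commit can be sketched in a few lines. This is a minimal illustration using std::filesystem, not the project's code: the names seal_sstable, is_sealed, TOC.txt and TOC.txt.tmp are assumptions made for the example, and the fsync of the file and directory that the message mentions is only marked by a comment.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper: write the TOC under a temporary name, then rename it
// into place. On POSIX file systems the rename is atomic, so a reader sees
// either no TOC or a complete one.
inline fs::path seal_sstable(const fs::path& dir,
                             const std::vector<std::string>& components) {
    assert(!components.empty());
    const fs::path tmp = dir / "TOC.txt.tmp";  // assumed temporary name
    const fs::path toc = dir / "TOC.txt";      // assumed final name
    {
        std::ofstream out(tmp, std::ios::trunc);
        for (const auto& c : components) {
            out << c << '\n';
        }
        out.flush();
        if (!out) {
            throw std::runtime_error("failed to write temporary TOC");
        }
    }
    // A real implementation would fsync the file and its directory here.
    fs::rename(tmp, toc);
    return toc;
}

// An sstable counts as sealed only if the final TOC exists and no temporary
// TOC was left behind.
inline bool is_sealed(const fs::path& dir) {
    return fs::exists(dir / "TOC.txt") && !fs::exists(dir / "TOC.txt.tmp");
}
```

A loader built on this idea would skip any directory entry for which is_sealed() is false and delete leftover temporary TOC files.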
Amnon Heiman
d97f9ea4c9 sstable add a getter for the sstable stats
This adds a getter function for the statistics of the sstable.

Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
2015-08-12 13:10:18 +03:00
Amnon Heiman
13b6b0ce02 Cleaning the metadata_collector
This changes the constructor initialization of the metadata_collector: it
now calls the constructor directly, without the Java-like assignment.

Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
2015-08-12 13:10:18 +03:00
Amnon Heiman
c2bb3f1c00 Cleaning the estimated_histogram
This makes the following changes in the estimated_histogram. It uses
int64_t instead of unsigned to be compatible with origin and the API.

It adds a getter for the buckets and changes the getter for the
bucket_offset to be const.

It adds get min and max functions similar to origin, and it adds a merge
function to merge estimated histograms.

Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
2015-08-12 13:10:06 +03:00
Amnon Heiman
0240080527 streaming_histogram modify the default constructor
The default constructor needs to set the max_bin size, so it was
combined with the non-default one, using a default value.

Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
2015-08-12 09:41:19 +03:00
Glauber Costa
799a6b5962 sstables: change summary_la to summary_ka
What we implement is ka, not la. Since the summary is the one element that
actually changed in the 2.2 implementation, it is particularly important that
we get this one right. I have previously missed this.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-08-11 17:47:48 +03:00
Raphael S. Carvalho
18c792c174 compaction: fix throughput calculation
(endsize / (1024*1024)) is an integer calculation, so if endsize is
lower than 1024^2, the result would be 0.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-08-10 13:18:11 +03:00
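The truncation described in this commit is easy to reproduce. The sketch below contrasts the integer expression with a floating-point one; the function names are made up for the example, and the actual fix in the commit may differ in detail.

```cpp
#include <cassert>
#include <cstdint>

// Integer division: any size below 1 MiB truncates to 0.
inline std::uint64_t megabytes_truncated(std::uint64_t bytes) {
    return bytes / (1024 * 1024);
}

// Converting to double before dividing keeps the fractional part.
inline double megabytes(std::uint64_t bytes) {
    return static_cast<double>(bytes) / (1024.0 * 1024.0);
}

// Hypothetical throughput helper in MB/s, built on the floating-point form.
inline double throughput_mb_per_sec(std::uint64_t endsize, double seconds) {
    assert(seconds > 0);
    return megabytes(endsize) / seconds;
}
```

For a 512 KiB sstable, megabytes_truncated() yields 0 while megabytes() yields 0.5, which is why the logged throughput was 0 for small outputs.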
Avi Kivity
fee3a9513b sstables: add yet another variant of filter_has_key()
This time public, for use when preloading the cache.
2015-08-09 22:03:01 +03:00
Avi Kivity
416d8f7799 sstables: don't pass temporary string to regex
Since the regex match returns views into that string, it must not be
a temporary. gcc 5.1's libstdc++ won't accept it, either.
2015-08-07 21:46:55 +03:00
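The lifetime issue can be illustrated with std::regex. A std::smatch holds iterators into the matched string, so that string must outlive the match object; since C++14 the standard library deletes the regex_match overload that takes an rvalue std::string together with a match_results. The file name pattern and the function generation_of below are assumptions for the example, not the project's parser.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical parser: extract the generation from a name such as
// "la-42-big-Data.db" (a simplified, assumed naming scheme).
inline std::string generation_of(const std::string& name) {
    static const std::regex re("([a-z]+)-(\\d+)-(\\w+)-(.*)");
    std::smatch m;
    // `name` is an lvalue that outlives `m`, so the sub-matches stay valid.
    // Passing a temporary, e.g. regex_match(make_name(), m, re), would leave
    // `m` pointing into a destroyed string and is rejected at compile time.
    if (!std::regex_match(name, m, re)) {
        return {};
    }
    return m[2].str();  // copy out of the match before returning
}

inline void generation_of_demo() {
    const std::string name = "la-42-big-Data.db";
    assert(generation_of(name) == "42");
}
```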
Glauber Costa
7bbf8c2a6f sstable types: correctly state version of metadata field
Don't let the current name fool you: Having this listed as "la" here
was just lack of discipline on my part. I meant by it "the format from
which we are importing" - which was named la for Origin. I wasn't
really thinking at the time that it would be dangerous to stop between
versions.

This should read ka, not la.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-08-07 08:31:55 -05:00
Glauber Costa
a7f88004be sstables: build a descriptor from filename
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-08-07 08:31:55 -05:00
Glauber Costa
859dc58511 sstables: construct filename for ka sstables
A helper struct - entry_descriptor - is introduced to aid in this goal.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-08-07 08:31:55 -05:00
Glauber Costa
976de6f6f4 sstables: get cf and ks strings for filename
We will need them to properly build names in some situations.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-08-07 08:31:55 -05:00
Glauber Costa
d5a5ee98f0 sstables: add new version
We'll keep the old one around. Eventually we'll need it.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-08-07 08:31:55 -05:00
Glauber Costa
cd8c9ad288 sstables: add ks and cf name to sstable constructor
When a schema is available, we use it. However, we have, by now, way too many
tests. Some of them use tables for which we don't even know the schema. It would
have been a massive amount of work to require a schema for all of them - so I am
keeping both constructors around.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-08-07 08:31:55 -05:00
Glauber Costa
2cbfe261e3 sstables: reuse code for filename
We have currently two versions of filename: one static, where the caller has to
pass all parameters, and an internal one where those parameters are derived
from the sstable attributes. Implement the latter in terms of the former so
making changes gets easier.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-08-07 08:31:55 -05:00
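The refactoring described here is a common pattern: the member overload forwards to the static one, so the name format lives in a single place. The class sstable_sketch and the name layout below are hypothetical and only illustrate the shape of the change.

```cpp
#include <cassert>
#include <string>
#include <utility>

class sstable_sketch {
public:
    sstable_sketch(std::string dir, std::string version, long generation)
        : _dir(std::move(dir)), _version(std::move(version)), _generation(generation) {
        assert(_generation >= 0);
    }

    // Static form: the caller passes every parameter explicitly.
    static std::string filename(const std::string& dir, const std::string& version,
                                long generation, const std::string& component) {
        // Assumed layout, for illustration only.
        return dir + "/" + version + "-" + std::to_string(generation) + "-big-" + component;
    }

    // Member form: derives the parameters from the object and forwards,
    // so a format change only touches the static function.
    std::string filename(const std::string& component) const {
        return filename(_dir, _version, _generation, component);
    }

private:
    std::string _dir;
    std::string _version;
    long _generation;
};
```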
Glauber Costa
8a3c935c21 sstables: component_from_sstring
Analogous to version and format

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-08-07 08:18:09 -05:00
Raphael S. Carvalho
9916bff975 sstables: remove sstable::store
It was initially created to be the function to write all sstable
components, but later on, its purpose was only to write a few
components for testing. A similar function was created in the
tests, so now it can be removed.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-08-06 17:39:05 +03:00
Raphael S. Carvalho
004af400de tests: add method in sstables::test to write components
Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-08-06 17:39:05 +03:00
Avi Kivity
48a1ce28fc Merge "Switch to log-structured allocator" from Tomasz
2015-08-06 15:45:39 +03:00
Tomasz Grabiec
cda31eccf7 db: Use LSA to allocate data inside memtable
2015-08-06 14:05:16 +02:00
Tomasz Grabiec
1046ee6e80 memtable: Remove all_partitions()
Preferred way to access the memtable is via reader.
2015-08-06 14:05:16 +02:00
Glauber Costa
c2eca19737 sstable_test: fix check_toc_func
We are currently failing the sstable test. The reason is that we use the store()
function for test purposes, and that function does not store the TOC component.
It was removed by Aviccident in 3a5e3c88.

Because that function is only used for testing purposes, it doesn't need to write
the Index and Data components: we can then remove them from the list.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-08-06 10:11:55 +03:00
Raphael S. Carvalho
3ddb9be984 db: fix compaction on an empty column family
When forcing a compaction on a column family with no sstables, an
assert will fail because there are no sstables to be compacted.
This problem is fixed by ignoring a compaction request when no
sstable is provided.

Fixes #61.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
Reviewed-by: Nadav Har'El <nyh@cloudius-systems.com>
2015-08-05 14:04:22 +03:00
Raphael S. Carvalho
1a3604f3c2 sstables: add a comment describing some sstable fields.
The reason is that the reader may think that these fields store
some statistics information about a sstable just loaded, but
they are only used when writing a new sstable.
Now I'm starting to see the value of having a sstable class for
a sstable loaded and another one for a sstable being created
(that's what Origin does).

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-08-05 10:24:02 +03:00
Avi Kivity
3a5e3c8829 sstables: de-futurize write path
The sstables write path has been partially de-futurized, but now creates a
ton of threads, and yet does not exploit this as everything is serialized.

Remove those extra threads and futures and use a single thread to write
everything.  If needed, we'll employ write-behind in output_stream to
increase parallelism.

Reviewed-by: Nadav Har'El <nyh@cloudius-systems.com>
2015-08-03 20:33:59 +03:00
Avi Kivity
ad443e4771 sstable: add accessor for first/last partition keys
2015-08-03 20:17:41 +03:00
Avi Kivity
6ca6f0c3a4 sstables: add conversion function from sstable key to partition key
2015-08-03 20:17:40 +03:00
Raphael S. Carvalho
477a3586d7 compaction: add missing information to compaction log
duration and throughput weren't being calculated.

closes #54.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-08-02 19:15:57 +03:00
Avi Kivity
98ec451d6a Extract range<> into its own header
It's not just for queries any more.
2015-08-02 16:07:42 +03:00
Paweł Dziepak
430f74a8bb sstables: read expired or expiring row marker
Signed-off-by: Paweł Dziepak <pdziepak@cloudius-systems.com>
2015-07-30 14:10:06 +02:00
Paweł Dziepak
f5e3764570 sstables: properly write expiring row marker
Signed-off-by: Paweł Dziepak <pdziepak@cloudius-systems.com>
2015-07-30 14:10:06 +02:00
Raphael S. Carvalho
c9fdc7dc5d compaction: get rid of invalid FIXME comment
Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-07-28 19:22:26 +03:00
Avi Kivity
2e745bebad Merge "use compaction strategy options" from Raphael
2015-07-27 17:06:43 +03:00
Tomasz Grabiec
e5feff5d71 dht: ring_position: Switch to total ordering
range::is_wrap_around() and range::contains() rely on total ordering
on values to work properly. Current ring_position_comparator was only
imposing a weak ordering (token positions equal to all key positions
with that token).

range::before() and range::after() can't work for weak ordering. If
the bound is exclusive, we don't know if user-provided token position
is inside or outside.

Also, is_wrap_around() can't properly detect wrap around in all
cases. Consider this case:

 (1) ]A; B]
 (2) [A; B]

For A = (tok1) and B = (tok1, key1), (1) is a wrap around and (2) is
not. Without total ordering between A and B, range::is_wrap_around() can't
tell that.

I think the simplest solution is to define a total ordering on
ring_position by placing token positions either before or
after all keys with that token.
2015-07-24 16:08:41 +02:00
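One way to picture the total ordering proposed in this commit: give every token-only position a weight of -1 or +1 within its token, and every key position a weight of 0. The type ring_position_sketch, its fields, and the compare function below are assumptions for illustration and are not the project's ring_position.

```cpp
#include <cassert>
#include <optional>
#include <string>

struct ring_position_sketch {
    enum class token_bound : int { start = -1, end = 1 };

    long token;
    std::optional<std::string> key;          // empty: token-only position
    token_bound bound = token_bound::start;  // used only when key is empty

    // Position within the token: -1 before all keys, 0 a key, +1 after all keys.
    int weight() const { return key ? 0 : static_cast<int>(bound); }
};

// Three-way comparison: by token, then by weight, then by key.
inline int compare(const ring_position_sketch& a, const ring_position_sketch& b) {
    if (a.token != b.token) {
        return a.token < b.token ? -1 : 1;
    }
    if (a.weight() != b.weight()) {
        return a.weight() < b.weight() ? -1 : 1;
    }
    if (a.key && b.key) {
        const int c = a.key->compare(*b.key);
        return (c > 0) - (c < 0);
    }
    return 0;  // two token-only positions with the same token and bound
}

inline void ring_position_demo() {
    // A = (tok1) placed before all keys, B = (tok1, key1): A now sorts before B.
    ring_position_sketch a{1, std::nullopt, ring_position_sketch::token_bound::start};
    ring_position_sketch b{1, std::string("key1")};
    assert(compare(a, b) < 0);
}
```

With this ordering, a token-only bound is never equal to a key position of the same token, so a wrap-around check can compare the two ends of a range directly.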
Raphael S. Carvalho
70770c261b sstables: remove double percentage symbol from compaction log message
Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-07-24 10:21:38 +02:00
Raphael S. Carvalho
634d00511b compaction: use compaction options in strategy
Support for compaction strategy options was recently added.
Previously, we were using default values in compaction strategy for
options, but now we can use the options defined in the schema.
Currently, we only support size-tiered strategy, so let's start
with it.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-07-23 15:26:47 -03:00
Glauber Costa
4cd143de87 filter_tracker: define and call a stop method
All sharded services "should" define a stop method. Calling them is also
a good practice. For this one specifically, though, we will not call stop.
We miss a good way to add a Deleter to a shared_ptr class, and that would
be the only reliable way to tie into its lifetime.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-07-23 11:11:57 -04:00
Glauber Costa
96f7c77a04 sstables: write dense tables
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-07-22 23:10:22 -04:00
Glauber Costa
2757cc595a sstable partition: read dense tables
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-07-22 23:10:22 -04:00
Glauber Costa
87c77acbac sstables: correctly write column names for non compound types
This can happen for COMPACT STORAGE.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-07-22 23:10:21 -04:00
Glauber Costa
3383c619ad partition: handle reads of non-composite types
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-07-22 23:10:21 -04:00
Glauber Costa
e9094db7ef sstable partition: remove dead code
This is no longer used

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-07-22 23:10:21 -04:00
Glauber Costa
5b7c749310 sstables: simplified version of write_column_name for non-clustered columns
We still want to wrap it instead of writing the column name directly, so we are
able to update the statistics.

It is better to have a separate function for this, because write_column_name
doesn't have enough information to decide when to do what. Giving it that
information would require passing the schema, or an extra parameter, which would
then spread to all callers.

Keep in mind that testing for an empty clustering key is not enough, since
composite types will serialize the empty clustering key in this case.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-07-22 23:10:21 -04:00
Raphael S. Carvalho
e57fe36249 compaction: get compaction threshold from schema instead
Get values from cf->schema instead of using hardcoded threshold
values. In addition, move DEFAULT_MIN_COMPACTION_THRESHOLD and
DEFAULT_MAX_COMPACTION_THRESHOLD to schema.hh so as not to have
knowledge duplicated.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-07-22 18:03:23 +03:00
Avi Kivity
6a9d0495f8 sstables: fix memory corruption in metadata parsing
Since parsing involves a unique_ptr<metadata> holding a pointer to a
subclass of metadata, it must define a virtual destructor, or it can
cause memory leaks when deleted, or, with C++14 sized deallocators, it
can cause the wrong memory pool to be used for deleting the object.

Seen on EC2.

Define a virtual destructor to tell the compiler how to destroy
and free the object.
2015-07-22 17:46:37 +03:00
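The rule behind this fix can be shown in isolation: when a std::unique_ptr to a base class owns a derived object, the base needs a virtual destructor so that the derived destructor runs and the deallocation uses the derived object's size. The type names below are made up for the example.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>

struct metadata_sketch {
    // Without `virtual`, deleting a derived object through this base is
    // undefined behaviour.
    virtual ~metadata_sketch() = default;
};

struct stats_metadata_sketch : metadata_sketch {
    explicit stats_metadata_sketch(int& counter) : destroyed(counter) {}
    ~stats_metadata_sketch() override { ++destroyed; }

    int& destroyed;
    char payload[256] = {};  // makes the derived object larger than the base
};

static_assert(std::has_virtual_destructor<metadata_sketch>::value,
              "base class must have a virtual destructor");

// Destroys a derived object through a base-class unique_ptr and reports how
// many times the derived destructor ran.
inline int destroy_through_base() {
    int destroyed = 0;
    {
        std::unique_ptr<metadata_sketch> p =
            std::make_unique<stats_metadata_sketch>(destroyed);
    }
    assert(destroyed == 1);
    return destroyed;
}
```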
Avi Kivity
69a94732df Merge "logging compaction activity" from Raphael
2015-07-22 16:00:57 +03:00