Commit Graph

10 Commits

Tomasz Grabiec
7aea858108 sstables: Make data_consume_rows(0, 0) return no rows
data_consume_rows(0, 0) was returning all partitions instead of no
partitions, because -1 was passed as the count in that case, which was
then cast to uint64_t.

Special-casing it that way is problematic for code which calculates
the bounds: when the key is not found, we simply end up with 0 as the
upper bound. Instead of convoluting the range lookup code to special-case
0, let's simplify the interface so that (0, 0) returns no rows,
the same as (1, 1). There is a new overload of data_consume_rows()
without bounds, which returns all data.
2015-07-22 13:10:01 +02:00
Nadav Har'El
d42c05b6ad sstable: Pull-style read interface
This patch changes the sstable read APIs from "push" style
to "pull" style.

The sstable read code has two APIs:
 1. An API for sequentially consuming low-level sstable items - sstable
    row's beginning and end, cells, tombstones, etc.
 2. An API for sequentially consuming entire sstable rows in our "mutation"
    format.

Before this patch, both APIs were in "push style": The user supplies
callback functions, and the sstable read code "pushes" to these functions
the desired items (low-level sstable parts, or whole mutations).
However, a push API is very inconvenient for users like the query
processing code or the compaction code, which both iterate over mutations.
Such code wants to control its own progression through the iteration -
the user prefers to "pull" the next mutation when it wants it. Moreover,
the user wants to be able to *stop* pulling mutations at any point, without
worrying about various continuations that are still scheduled in the
background (the latter concern was especially problematic in the "push"
design).

The modified APIs are:

1. The functions for iterating over mutations, sstable::read_rows() et al.,
   now return a "mutation_reader" object which can be used for iterating
   over the mutations: mutation_reader::read() asks for the next mutation,
   and returns a future to it (or an unassigned value on EOF).
   You can see an example on how it is used in sstable_mutation_test.cc.

2. The functions for consuming low-level sstable items (row begin, cell,
   etc.) are still partially push-style - the items are still fed into
   the consumer object - but consumption now *stops* (instead of deferring
   and continuing later, as in the old code) when the consumer asks it to.
   The caller can resume the consumption later when it wishes (in
   this sense, this is a "pull" API, because the user asks for more
   input when it wants to).

This patch does *not* remove input_stream's feature of a consumer
function returning a non-ready future. However, this feature is no longer
used anywhere in our code - the new sstable reader code stops the
consumption where the old sstable reader code paused it temporarily with
a non-ready future.

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
2015-06-03 10:55:34 +03:00
Nadav Har'El
33270efc39 sstables: make consume_row_end() a future
After commit 3ae81e68a0, we already support
in input_stream::consume() the possibility of the consumer blocking by
returning a future. But the code for sstable consumption had no way to
use this capability. This patch gives consume_row_end() a future<>
return type, allowing the consumer to pause after reading each
sstable row (but not, currently, after each cell in the row).

We also need to use this capability in read_range_rows(), which wrongly
ignored the future<> returned by the "walker" function - now this future<>
is returned to the sstable reader, and causes it to pause reading until
the future is fulfilled.

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
2015-05-28 18:53:32 +03:00
Avi Kivity
3ae81e68a0 Merge seastar upstream
Updated sstables::data_consume_rows_context for input_stream::consume()
API change.
2015-05-19 19:57:09 +03:00
Raphael S. Carvalho
68d76cb915 sstables: fix a bug in data_consume_rows_context::read_64
When the temporary buffer has enough data for a uint64 to be
consumed, we readily consume it.
The problem is that we were wrongly storing the uint64 into
a uint32 variable.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-05-08 11:16:42 +02:00
Tomasz Grabiec
e4ef356cc3 Revert "sstables: fix a bug in data_consume_rows_context::read_64"
This reverts commit f80f00476c.

This is the wrong version of the patch.
2015-05-08 11:16:10 +02:00
Raphael S. Carvalho
f80f00476c sstables: fix a bug in data_consume_rows_context::read_64
When the temporary buffer has enough data for a uint64 to be
consumed, we readily consume it.
The problem is that we were wrongly storing the uint64 into
a uint32 variable.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-05-08 11:05:20 +02:00
Nadav Har'El
8e6d11df1b sstable read: support deleted cells
This patch adds support for reading deleted cells (a.k.a. cell tombstones)
from the SSTable.

The way deleted cells are encoded in the sstable is explained in the
"Cell tombstone" section of
https://github.com/cloudius-systems/urchin/wiki/SSTables-interpretation-in-Urchin

This more-or-less completes the low-level SSTable row reading code - the
only remaining untreated case is counters, which we agreed to leave for
later. If counters are found in the SSTable, we'll throw an exception.

This patch adds a new callback, consume_deleted_cell, taking the name of
the cell and its deletion_time (as usual, deletion_time includes both a
64-bit timestamp, for ordering events, and a 32-bit "local_deletion_time"
used to schedule gc of old tombstones).

This patch also adds a test SSTable with deleted cell, created by the
following Cassandra commands:

	CREATE TABLE deleted (
		name text,
		age int,
		PRIMARY KEY (name)
	);
	INSERT INTO deleted (name, age) VALUES ('nadav', 40);
	<flush table - the second table is what we're after>
	DELETE age FROM deleted WHERE name = 'nadav';

We test our ability to read this sstable, and see the deleted cell
and its expected deletion time.

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
2015-04-28 14:56:04 +03:00
Nadav Har'El
0adc4812ba sstable read: support cell expiration time
This patch adds support for reading sstable cells with an expiration time.

It adds two more parameters to row_consumer::consume_cell(): "ttl" and
"expiration". The "ttl" is the original TTL set on the cell in seconds,
and the "expiration" is the absolute time (in seconds since the Unix epoch)
when this cell is set to expire. I don't know why both values are needed...

When a cell has no expiration time set (most cells will be like that), the
callback will be called with expiration==0 (and ttl==0).

This patch also adds a test SSTable with cells with set TTL, created by
the following Cassandra commands:

	CREATE TABLE ttl (
		name text,
		age int,
		PRIMARY KEY (name)
	);
	INSERT INTO ttl (name, age) VALUES ('nadav', 40) USING TTL 3600;

We then test our ability to read the resulting sstable and get the
expected expiration time.

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
2015-04-28 14:56:01 +03:00
Nadav Har'El
486e6271a1 sstables: data file row reading and streaming
The previous implementation could read either one sstable row or several,
but only when all the data was read in advance into a contiguous memory
buffer.

This patch changes the row read implementation into a state machine,
which can work on either a pre-read buffer, or data streamed via the
input_stream::consume() function:

The sstable::data_consume_rows_at_once() method reads the given byte range
into memory and then processes it, while the sstable::data_consume_rows()
method reads the data piecemeal, not trying to fit all of it into
memory. The first function is (or will be...) optimized for reading one
row, and the second for iterating over all rows - although both
can be used to read any number of rows.

The state-machine implementation is unfortunately a bit ugly (and much
longer than the code it replaces), and could probably be improved in the
future. But the focus was parsing performance: when we use large buffers
(the default is 8192 bytes), most of the time we don't need to read
byte-by-byte, and can efficiently read entire integers at once, or even larger
chunks. For strings (like column names and values), we even avoid copying
them if they don't cross a buffer boundary.

To test the rare boundary-crossing case despite having a small sstable,
the code includes in "#if 0" a hack to split one buffer into many tiny
buffers (1 byte, or any other number) and process them one by one.
The tests still pass with this hack turned on.

This implementation of sstable reading also adds a feature not present
in the previous version: reading range tombstones. An sstable with an
INSERT of a collection always has a range tombstone (to delete all old
items from the collection), so we need this feature to read collections.
A test for this is included in this patch.

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
2015-04-13 17:40:46 +03:00