Commit Graph

10 Commits

Tomasz Grabiec
7aea858108 sstables: Make data_consume_rows(0, 0) return no rows
data_consume_rows(0, 0) was returning all partitions instead of no
partitions, because -1 was passed as the count in that case, which was
then cast to uint64_t.

Special-casing it that way is problematic for code which calculates
the bounds: when the key is not found, we simply end up with 0 as the
upper bound. Instead of convoluting the range lookup code to special-case
0, let's simplify the interface so that (0, 0) returns no rows,
the same as (1, 1). There is a new overload of data_consume_rows()
without bounds, which returns all data.
2015-07-22 13:10:01 +02:00
Nadav Har'El
d42c05b6ad sstable: Pull-style read interface
This patch changes the sstable read APIs from "push" style
to "pull" style.

The sstable read code has two APIs:
 1. An API for sequentially consuming low-level sstable items - sstable
    row's beginning and end, cells, tombstones, etc.
 2. An API for sequentially consuming entire sstable rows in our "mutation"
    format.

Before this patch, both APIs were in "push style": The user supplies
callback functions, and the sstable read code "pushes" to these functions
the desired items (low-level sstable parts, or whole mutations).
However, a push API is very inconvenient for users like the query
processing code or the compaction code, which both iterate over mutations.
Such code wants to control its own progression through the iteration -
the user prefers to "pull" the next mutation when it wants it. Moreover,
the user wants to be able to *stop* pulling mutations at any point, without
worrying about various continuations that are still scheduled in the
background (the latter concern was especially problematic in the "push"
design).

The modified APIs are:

1. The functions for iterating over mutations, sstable::read_rows() et al.,
   now return a "mutation_reader" object which can be used for iterating
   over the mutations: mutation_reader::read() asks for the next mutation,
   and returns a future to it (or an unassigned value on EOF).
   You can see an example on how it is used in sstable_mutation_test.cc.

2. The functions for consuming low-level sstable items (row begin, cell,
   etc.) are still partially push-style - the items are still fed into
   the consumer object - but consumption now *stops* (instead of deferring
   and continuing later, as in the old code) when the consumer asks it to.
   The caller can resume the consumption later when it wishes (in
   this sense, this is a "pull" API, because the user asks for more
   input when it wants to).

This patch does *not* remove input_stream's feature of a consumer
function returning a non-ready future. However, this feature is no longer
used anywhere in our code - the new sstable reader code stops the
consumption where the old sstable reader code paused it temporarily with
a non-ready future.

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
2015-06-03 10:55:34 +03:00
Nadav Har'El
33270efc39 sstables: make consume_row_end() a future
After commit 3ae81e68a0, we already support
in input_stream::consume() the possibility of the consumer blocking by
returning a future. But the code for sstable consumption had no way to
use this capability. This patch gives consume_row_end() a future<>
return type, allowing the consumer to pause after reading each
sstable row (but not, currently, after each cell in the row).

We also need to use this capability in read_range_rows(), which wrongly
ignored the future<> returned by the "walker" function - now this future<>
is returned to the sstable reader, and causes it to pause reading until
the future is fulfilled.

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
2015-05-28 18:53:32 +03:00
Avi Kivity
3ae81e68a0 Merge seastar upstream
Updated sstables::data_consume_rows_context for input_stream::consume()
API change.
2015-05-19 19:57:09 +03:00
Raphael S. Carvalho
68d76cb915 sstables: fix a bug in data_consume_rows_context::read_64
When the temporary buffer has enough data for a uint64 to be
consumed, we readily consume it.
The problem is that we were wrongly storing the uint64 into
a uint32 variable.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-05-08 11:16:42 +02:00
Tomasz Grabiec
e4ef356cc3 Revert "sstables: fix a bug in data_consume_rows_context::read_64"
This reverts commit f80f00476c.

This is the wrong version of the patch.
2015-05-08 11:16:10 +02:00
Raphael S. Carvalho
f80f00476c sstables: fix a bug in data_consume_rows_context::read_64
When the temporary buffer has enough data for a uint64 to be
consumed, we readily consume it.
The problem is that we were wrongly storing the uint64 into
a uint32 variable.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-05-08 11:05:20 +02:00
Nadav Har'El
8e6d11df1b sstable read: support deleted cells
This patch adds support for reading deleted cells (a.k.a. cell tombstones)
from the SSTable.

The way deleted cells are encoded in the sstable is explained in the
"Cell tombstone" section of
https://github.com/cloudius-systems/urchin/wiki/SSTables-interpretation-in-Urchin

This more-or-less completes the low-level SSTable row reading code - the
only remaining untreated case is counters, which we agreed to leave for
later. If counters are found in the SSTable, we'll throw an exception.

This patch adds a new callback, consume_deleted_cell, taking the name of
the cell and its deletion_time (as usual, deletion_time includes both a
64-bit timestamp, for ordering events, and a 32-bit "local_deletion_time"
used to schedule gc of old tombstones).

This patch also adds a test SSTable with deleted cell, created by the
following Cassandra commands:

	CREATE TABLE deleted (
		name text,
		age int,
		PRIMARY KEY (name)
	);
	INSERT INTO deleted (name, age) VALUES ('nadav', 40);
	<flush table - the second table is what we're after>
	DELETE age FROM deleted WHERE name = 'nadav';

We test our ability to read this sstable, and see the deleted cell
and its expected deletion time.

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
2015-04-28 14:56:04 +03:00
Nadav Har'El
0adc4812ba sstable read: support cell expiration time
This patch adds support for reading sstable cells with an expiration time.

It adds two more parameters to row_consumer::consume_cell(): "ttl" and
"expiration". The "ttl" is the original TTL set on the cell in seconds,
and the "expiration" is the absolute time (in seconds since the Unix epoch)
when this cell is set to expire. I don't know why both values are needed...

When a cell has no expiration time set (most cells will be like that), the
callback will be called with expiration==0 (and ttl==0).

This patch also adds a test SSTable with cells with set TTL, created by
the following Cassandra commands:

	CREATE TABLE ttl (
		name text,
		age int,
		PRIMARY KEY (name)
	);
	INSERT INTO ttl (name, age) VALUES ('nadav', 40) USING TTL 3600;

We then test our ability to read the resulting sstable and get the
expected expiration time.

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
2015-04-28 14:56:01 +03:00
Nadav Har'El
486e6271a1 sstables: data file row reading and streaming
The previous implementation could read either one sstable row or several,
but only when all the data was read in advance into a contiguous memory
buffer.

This patch changes the row read implementation into a state machine,
which can work on either a pre-read buffer, or data streamed via the
input_stream::consume() function:

The sstable::data_consume_rows_at_once() method reads the given byte range
into memory and then processes it, while the sstable::data_consume_rows()
method reads the data piecemeal, not trying to fit all of it into
memory. The first function is (or will be...) optimized for reading one
row, and the second for iterating over all rows - although both
can be used to read any number of rows.

The state-machine implementation is unfortunately a bit ugly (and much
longer than the code it replaces), and could probably be improved in the
future. But the focus was parsing performance: when we use large buffers
(the default is 8192 bytes), most of the time we don't need to read
byte-by-byte, and can efficiently read entire integers at once, or even larger
chunks. For strings (like column names and values), we even avoid copying
them if they don't cross a buffer boundary.

To test the rare boundary-crossing case despite having a small sstable,
the code includes in "#if 0" a hack to split one buffer into many tiny
buffers (1 byte, or any other number) and process them one by one.
The tests still pass with this hack turned on.

This implementation of sstable reading also adds a feature not present
in the previous version: reading range tombstones. An sstable with an
INSERT of a collection always has a range tombstone (to delete all old
items from the collection), so we need this feature to read collections.
A test for this is included in this patch.

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
2015-04-13 17:40:46 +03:00