mirror of
https://github.com/scylladb/scylladb.git
synced 2026-04-21 17:10:35 +00:00
The previous implementation could read either one sstable row or several, but only when all the data had been read in advance into a contiguous memory buffer. This patch changes the row-read implementation into a state machine, which can work either on a pre-read buffer or on data streamed via the input_stream::consume() function: the sstable::data_consume_rows_at_once() method reads the given byte range into memory and then processes it, while the sstable::data_consume_rows() method reads the data piecemeal, without trying to fit all of it into memory. The first function is (or will be...) optimized for reading one row, and the second for iterating over all rows, although either can be used to read any number of rows.

The state-machine implementation is unfortunately a bit ugly (and much longer than the code it replaces), and could probably be improved in the future. The focus, however, was parsing performance: when we use large buffers (the default is 8192 bytes), most of the time we don't need to read byte by byte, and can efficiently read entire integers, or even larger chunks, at once. For strings (such as column names and values), we even avoid copying them when they don't cross a buffer boundary. To test the rare boundary-crossing case despite having a small sstable, the code includes (under "#if 0") a hack that splits one buffer into many tiny buffers (1 byte, or any other size) and processes them one by one. The tests still pass with this hack turned on.

This implementation of sstable reading also adds a feature not present in the previous version: reading range tombstones. An sstable with an INSERT of a collection always has a range tombstone (to delete all old items from the collection), so we need this feature to read collections. A test for this is included in this patch.

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>