scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-26 19:35:12 +00:00

Author	SHA1	Message	Date
Piotr Jastrzebski	7fd222e639	Pass schema to data_consume_context It will be needed to obtain column_translation that will be added to data_consume_context in the next patch. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-05-23 19:54:16 +02:00
Piotr Jastrzebski	54ef775501	Pass serialization_header to data_consume_rows_context* This header is needed to parse data for SSTable 3.0 format Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-05-23 16:39:52 +02:00
Piotr Jastrzebski	9fad5831df	Make data_consume_context a template Parametrize it with the type of data consume rows context. There will be different implementations used for different sstable file formats. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-04-26 12:49:37 +02:00
Piotr Jastrzebski	e2b393df13	Move data_consume_rows_context from row.cc to row.hh It will be used as a template parameter for sstable_mutation_reader once it's turned into a template. This means the definition has to be accessible. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-04-26 12:49:37 +02:00
Piotr Jastrzebski	bcf5717753	Reduce visibility of sstable::data_consume_* They are used just in partition.cc, row.cc and sstables_test.cc so it is useful to cut their scope by moving them to data_consume_context.hh. This will make it much easier to turn data_consume_context into a template. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-04-26 12:49:37 +02:00
Piotr Jastrzebski	578aa6826f	Move data_consume_context to separate header It's used only in row.cc, partition.cc and sstables_test.cc so it's better to reduce the dependency just to those files. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-04-26 12:49:37 +02:00
Piotr Jastrzebski	9ad00b8207	data_consume_rows_context: Mark RANGE_TOMBSTONE_5 as nonconsuming This state does not read any data and is used only to perform an action after a primitive type has been read. According to the comment on continuous_data_consumer::non_consuming, such states should be marked as non_consuming. Tests: units (release) Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Message-Id: <55a5c9b76268b50312ecd044291f28dcd8179a22.1523005293.git.piotr@scylladb.com>	2018-04-08 15:16:13 +03:00
Vladimir Krivopalov	0a7a56edd5	Simplify continuous_data_consumer::consume_input() interface. Remove redundant input parameter as continuous_data_consumer derivatives would only use themselves as a context. So take it internally and make the function regular (non-template) and having no parameters. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-01-29 11:57:26 -08:00
Vladimir Krivopalov	5dca3100ed	Support skipping over bytes from input stream in parsers based on continuous_data_consumer Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-01-29 11:56:55 -08:00
Glauber Costa	f0391bf9a0	sstables: enhance data consumer with a position tracker Callers, like compactions, will be able to know at any time the current progress of a read. As we do that, the currently unimplemented position() method of data_consume_context becomes redundant and is removed. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-01-02 18:43:07 -05:00
Raphael S. Carvalho	f699cf17ae	sstables: fix data_consume_context's move operator and ctor after `7f8b62bc0b`, its move operator and ctor broke. That potentially leads to error because data_consume_context dtor moves sstable ref to continuation when waiting for in-flight reads from input stream. Otherwise, sstable can be destroyed meanwhile and file descriptor would be invalid, leading to EBADF. Fixes #3020. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20171129014917.11841-1-raphaelsc@scylladb.com>	2017-11-29 09:53:47 +01:00
Piotr Jastrzebski	8afbe0ead0	Create data_consume_context_opt. This will be used in sstable_mutation_reader before first fill_buffer is called and a proper data_consume_context is created. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2017-11-22 15:24:22 +01:00
Piotr Jastrzebski	7f8b62bc0b	Merge data_consume_context::impl into data_consume_context There's no reason to use pimpl in data_consume_context Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2017-11-21 20:22:38 +01:00
Botond Dénes	47e07b787e	restricted_mutation_reader: restrict based-on memory consumption Restrict readers based on their memory consumption, instead of the count of the top-level readers. To do this an interposer is installed at the input_stream level which tracks buffers emitted by the stream. This way we can have an accurate picture of the readers' actual memory consumption. New readers will consume 16k units from the semaphore up-front. This is to account for their own memory consumption, apart from the buffers they will allocate. Creating the reader will be deferred to when there are enough resources to create it. As before only new readers will be blocked on an exhausted semaphore, existing readers can continue to work.	2017-10-03 12:44:12 +03:00
Avi Kivity	78eae8bf48	Revert "Merge "Make restricting_mutation_reader more accurate" from Botond" This reverts commit `c6e5dcc556`, reversing changes made to `19b21a0ab2`. Fails to build, plus author has more changes.	2017-10-03 11:58:59 +03:00
Botond Dénes	33e97e7457	restricted_mutation_reader: restrict based-on memory consumption Restrict readers based on their memory consumption, instead of the count of the top-level readers. To do this an interposer is installed at the input_stream level which tracks buffers emitted by the stream. This way we can have an accurate picture of the readers' actual memory consumption. New readers will consume 16k units from the semaphore up-front. This is to account for their own memory consumption, apart from the buffers they will allocate. Creating the reader will be deferred to when there are enough resources to create it. As before only new readers will be blocked on an exhausted semaphore, existing readers can continue to work.	2017-09-20 11:14:35 +03:00
Tomasz Grabiec	6baad2c2e6	sstables: Introduce data_consume_context::eof()	2017-08-28 09:19:43 +02:00
Nadav Har'El	3018df11b5	Allow reading exactly desired byte ranges and fast_forward_to In commit `c63e88d556`, support was added for fast_forward_to() in data_consume_rows(). Because an input stream's end cannot be changed after creation, that patch ignores the specified end byte, and uses the end of file as the end position of the stream. As a result of this, even when we want to read a specific byte range (e.g., in the repair code to checksum the partitions in a given range), the code reads an entire 128K buffer around the end byte, or significantly more, with read-ahead enabled. This causes repair to do more than 10 times the amount of I/O it really has to do in the checksumming phase (which in the current implementation, reads small ranges of partitions at a time). This patch has two levels: 1. In the lower level, sstable::data_consume_rows(), which reads all partitions in a given disk byte range, now gets another byte position, "last_end". That can be the range's end, the end of the file, or anything in between the two. It opens the disk stream until last_end, which means 1. we will never read-ahead beyond last_end, and 2. fast_forward_to() is not allowed beyond last_end. 2. In the upper level, we add to the various layers of sstable readers, mutation readers, etc., a boolean flag mutation_reader::forwarding, which says whether fast_forward_to() is allowed on the stream of mutations to move the stream to a different partition range. Note that this flag is separate from the existing boolean flag streamed_mutation::forwarding - that one talks about skipping inside a single partition, while the flag we are adding is about switching the partition range being read. Most of the functions that previously accepted streamed_mutation::forwarding now accept also the option mutation_reader::forwarding. The exception are functions which are known to read only a single partition, and not support fast_forward_to() a different partition range. 
We note that if mutation_reader::forwarding::no is requested, and fast_forward_to() is forbidden, there is no point in reading anything beyond the range's end, so data_consume_rows() is called with last_end as the range's end. But if forwarding::yes is requested, we use the end of the file as last_end, exactly like the code before this patch did. Importantly, we note that the repair's partition reading code, column_family::make_streaming_reader, uses mutation_reader::forwarding::no, while the other existing reading code will use the default forwarding::yes. In the future, we can further optimize the amount of bytes read from disk by replacing forwarding::yes by an actual last partition that may ever be read, and use its byte position as the last_end passed to data_consume_rows. But we don't do this yet, and it's not a regression from the existing code, which also opened the file input stream until the end of the file, and not until the end of the range query. Moreover, such an improvement will not improve anything if the overall range is always very large, in which case not over-reading at its end will not improve performance. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20170619152629.11703-1-nyh@scylladb.com>	2017-06-19 18:31:32 +03:00
Avi Kivity	6e2c9ef9fb	Revert "Allow reading exactly desired byte ranges and fast_forward_to" This reverts commit `317d7fc253` (and also the related `2c57ab84b2`). It causes crashes during range scans, reported by Gleb: "To reproduce I run SELECT * FROM keyspace1.standard1; on typical c-s dataset and 3 node cluster. Backtrace: at /home/gleb/work/seastar/seastar/core/apply.hh:36 rvalue=<unknown type in /home/gleb/work/seastar/build/release/scylla, CU 0x54cf307, DIE 0x55ebf2a>) at /home/gleb/work/seastar/seastar/core/do_with.hh:57 range=std::vector of length 6, capacity 8 = {...}) at /home/gleb/work/seastar/seastar/core/future-util.hh:142 at ./seastar/core/future.hh:890 at /home/gleb/work/seastar/seastar/core/future-util.hh:119 at /home/gleb/work/seastar/seastar/core/future-util.hh:142	2017-06-18 16:10:21 +03:00
Nadav Har'El	317d7fc253	Allow reading exactly desired byte ranges and fast_forward_to In commit `c63e88d556`, support was added for fast_forward_to() in data_consume_rows(). Because an input stream's end cannot be changed after creation, that patch ignores the specified end byte, and uses the end of file as the end position of the stream. As a result of this, even when we want to read a specific byte range (e.g., in the repair code to checksum the partitions in a given range), the code reads an entire 128K buffer around the end byte, or significantly more, with read-ahead enabled. This causes repair to do more than 10 times the amount of I/O it really has to do in the checksumming phase (which in the current implementation, reads small ranges of partitions at a time). This patch has two levels: 1. In the lower level, sstable::data_consume_rows(), which reads all partitions in a given disk byte range, now gets another byte position, "last_end". That can be the range's end, the end of the file, or anything in between the two. It opens the disk stream until last_end, which means 1. we will never read-ahead beyond last_end, and 2. fast_forward_to() is not allowed beyond last_end. 2. In the upper level, we add to the various layers of sstable readers, mutation readers, etc., a boolean flag mutation_reader::forwarding, which says whether fast_forward_to() is allowed on the stream of mutations to move the stream to a different partition range. Note that this flag is separate from the existing boolean flag streamed_mutation::forwarding - that one talks about skipping inside a single partition, while the flag we are adding is about switching the partition range being read. Most of the functions that previously accepted streamed_mutation::forwarding now accept also the option mutation_reader::forwarding. The exception are functions which are known to read only a single partition, and not support fast_forward_to() a different partition range. 
We note that if mutation_reader::forwarding::no is requested, and fast_forward_to() is forbidden, there is no point in reading anything beyond the range's end, so data_consume_rows() is called with last_end as the range's end. But if forwarding::yes is requested, we use the end of the file as last_end, exactly like the code before this patch did. Importantly, we note that the repair's partition reading code, column_family::make_streaming_reader, uses mutation_reader::forwarding::no, while the other existing reading code will use the default forwarding::yes. In the future, we can further optimize the amount of bytes read from disk by replacing forwarding::yes by an actual last partition that may ever be read, and use its byte position as the last_end passed to data_consume_rows. But we don't do this yet, and it's not a regression from the existing code, which also opened the file input stream until the end of the file, and not until the end of the range query. Moreover, such an improvement will not improve anything if the overall range is always very large, in which case not over-reading at its end will not improve performance. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20170614072122.13473-1-nyh@scylladb.com>	2017-06-15 13:22:46 +01:00
Jesse Haber-Kucharsky	376c661823	Eliminate duplicate definition of sstable column mask values The column mask identifies the kind of atom in a row in an sstable. Two definitions of these values were present: one as a C-style enumeration and one as a C++11-style enumeration. The C++11-style definition is used elsewhere in `sstables.cc`. It also offers additional type-safety. Therefore, this commit removes the inlined C-style enumeration. Fixes #2214. Signed-off-by: Jesse Haber-Kucharsky <jhaberku@scylladb.com> Message-Id: <c525b4ae7fad3b54480e133921aa4ffe0dd5d9ce.1496352711.git.jhaberku@scylladb.com>	2017-06-02 00:06:31 +02:00
Tomasz Grabiec	a1dea3c4fc	sstables: Fix verify_end_state() to tolerate ATOM_START_2 state We would be in that state if consume_row_start() returns proceed::yes and the stream ends after that. This can happen if slicing using promoted index determined that there are no cells in the partition in the range.	2017-05-16 13:31:01 +02:00
Tomasz Grabiec	92dba05f0d	sstables: Fix malformed_sstable_exception from single-key reads After `4742008b70`, _read_partial_row is never set, and we will fail here in case the consumer will exhaust the range. That would be the case if the end bound of the slice aligns with the end of the index page. Fix by assuming that if we're out of range in the middle of partition, we sliced. Message-Id: <1493121249-18847-1-git-send-email-tgrabiec@scylladb.com>	2017-04-25 14:59:08 +03:00
Duarte Nunes	d45596ae8e	sstables: Read and write shadowable tombstones This patch serializes shadowable tombstones to sstables by adding a new, incompatible atom's mask. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-04-25 11:46:33 +02:00
Tomasz Grabiec	3472a74de4	sstables: Remove unused code	2017-04-20 11:23:05 +02:00
Tomasz Grabiec	a2a8312c78	sstables: Add trace-level logging for sstable consumption	2017-03-28 18:34:55 +02:00
Tomasz Grabiec	27d86dfe18	sstables: Enable skipping to cells at data_consume_context level	2017-03-28 18:10:39 +02:00
Tomasz Grabiec	d5e704ca1e	sstables: Make key_view constructor from bytes_view explicit	2017-03-28 18:10:39 +02:00
Paweł Dziepak	5905729c4a	sstables: read counter cells	2017-02-02 10:35:14 +00:00
Paweł Dziepak	19ad35610b	sstables: do not discard future returned by fast_forward_to() continuous_data_consumer::fast_forward_to() returns a future which was later ignored by data_consume_context::fast_forward_to(). With the current implementation, the future in question is always ready and that's why the problem didn't manifest itself in the form of crashes or invalid results. Message-Id: <20170120105746.7300-1-pdziepak@scylladb.com>	2017-01-20 12:22:17 +01:00
Gleb Natapov	ae0a2935b4	sstables: fix ad-hoc summary creation If sstable Summary is not present Scylla does not refuse to boot but instead creates summary information on the fly. There is a bug in this code though. The Summary file is a map between keys and offsets into the Index file, but the code creates a map between keys and Data file offsets instead. Fix it by keeping offset of an index entry in index_entry structure and use it during Summary file creation. Reviewed-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20161116165421.GA22296@scylladb.com>	2016-11-17 11:05:23 +02:00
Paweł Dziepak	ab0eeae82d	sstables: keep separate stream history for single and range reads Single partition and partition range reads are expected to behave considerably differently, so it is worth having them use separate file stream histories. This also makes reads use different history for each sstable which is also a good thing. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-10-19 15:29:08 +01:00
Paweł Dziepak	c63e88d556	sstables: implement mutation_reader::impl::fast_forward_to() This patch allows sstable readers to be fast forwarded without making it necessary to recreate the reader (and dropping all buffers in the process). It is built on top of index_reader and ability of data_consume_context to be fast forwarded. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-10-19 15:29:08 +01:00
Paweł Dziepak	0bc873ace5	sstables: add fast_forward_to() to continuous_data_consumer Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-10-19 15:29:08 +01:00
Paweł Dziepak	25b91c51e2	sstables: add data_consume_rows_context::reset() reset() is going to be used to restore valid state after fast forwarding the reader. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-10-19 15:29:08 +01:00
Nadav Har'El	022a69caea	sstables: promoted index read support This patch adds support for more efficiently reading small parts of a large partition, without reading the entire partition as we had to do so far. This is done using the "promoted index". The "promoted index" is stored in the sstable index file, and provides for each large sstable row ("partition" in CQL nomenclature) a sample of the column names at (for example) 64KB intervals. This means that when we read a slice of columns (e.g., cql rows), or page through a large partition, we do not have to read the entire partition from disk. This patch only implements the read side of promoted index - a later patch will add the write-side support (i.e., writing the promoted index to the index file while saving the sstable). Nevertheless this patch can already be tested by reading existing sstables from Cassandra which include a promoted index - such as the one included in the test in the previous patch. The use of the promoted index currently has two limitations: 1. It is only used when reading a single partition with sstable::read_row(), not when scanning through many partitions with sstable::read_range_rows() or sstable::read_rows(). 2. It is only used when filtering a single clustering-key range, rather than a list of disjoint ranges. A single range is the common case. These two issues will be improved later. In the meantime, in those unsupported cases we simply continue to read entire partitions, so we're not worse-off than before. Also note that this patch only helps when sstable::read_row() is used with a clustering-key prefix (i.e., a slice). Our higher-level request handling code may decide to read an entire partition into the cache, and not use a clustering-key prefix at all when reading. We will need to independently improve the high-level code to use read_row()'s slicing capabilities when paging through large partitions, for example. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2016-08-07 17:47:09 +03:00
Nadav Har'El	1975abbfd6	sstables: disk_read_range Currently, the main sstable data parsing entry point data_consume_rows() takes a contiguous range of bytes to read from disk and parse. This range is supposed to be an entire partition or contiguous group of partitions, and is self-contained (can be parsed without extra information about the identity of these partitions). For the promoted index feature (which we will add in a following patch) we will want the range to span only a part of a partition, and will need the caller to provide some information not available to the parser (such as the partition's key). In the future, we will also want to support a vector of byte ranges, instead of just one. So in preparation for this, this patch simply replaces the start/end pair by a new class disk_read_range, which can be easily extended in later patches. No new functionality is introduced in this patch. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2016-08-07 17:47:02 +03:00
Nadav Har'El	6fed716bd3	sstables: STOP_THEN_ATOM_START parser state In a later patch adding "promoted index" read support, we would like to parse only part of an sstable row. In that case, the parser should start not at the usual ROW_START state, but rather at the ATOM_START state. But there's a problem: The sstable parser consumer currently assumes that the parser stops after the start of the row, before reading any atoms. So in the partial row case too, we must stop parsing before reading the first atom. For this, this patch adds the new "STOP_THEN_ATOM_START" parser state. When starting in this state, the parser stops immediately (with row_consumer::proceed::no), and when restarted again it will be in the ATOM_START case. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2016-08-07 17:47:01 +03:00
Paweł Dziepak	02ffc28f0d	sstables: extend sstable life until reader is fully closed data_consume_rows_context needs to have close() called and the returned future waited for before it can be destroyed. data_consume_context::impl does that in the background upon its destruction. However, it is possible that the sstable is removed before data_consume_rows_context::close() completes in which case EBADF may happen. The solution is to make data_consume_context::impl keep a reference to the sstable and extend its life time until closing of data_consume_rows_context (which is performed in the background) completes. Side effect of this change is also that data_consume_context no longer requires its user to make sure that the sstable exists as long as it is in use since it owns its own reference to it. Fixes #1537. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com> Message-Id: <1470222225-19948-1-git-send-email-pdziepak@scylladb.com>	2016-08-03 13:19:08 +02:00
Nadav Har'El	c647d917e0	sstables: move to_bytes_view to header file Move the to_bytes_view(temporary_buffer<char>) function from source file to header file where it can be used in more places. This saves one use of reinterpret_cast (which we are now re-evaluating), and moreover, we want to use this function also in the promoted index code (to return a bytes_view from the promoted index which was saved as a temporary_buffer). Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <1468761437-27046-1-git-send-email-nyh@scylladb.com>	2016-07-17 16:29:26 +03:00
Paweł Dziepak	55a6911d7a	sstables: close input_stream<> properly If read ahead is going to be enabled it is important to close input_stream<> properly (and wait for completion) before destroying it. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-06-30 11:39:01 +01:00
Paweł Dziepak	9e8db53c46	sstables: allow row consumer to stop at any point Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-06-20 21:29:50 +01:00
Pekka Enberg	38a54df863	Fix pre-ScyllaDB copyright statements People keep tripping over the old copyrights and copy-pasting them to new files. Search and replace "Cloudius Systems" with "ScyllaDB". Message-Id: <1460013664-25966-1-git-send-email-penberg@scylladb.com>	2016-04-08 08:12:47 +03:00
Nadav Har'El	b7e29691c2	sstables: avoid index and data file over-reads When we do a streaming read that knows the expected end position of the read, we can use a large read-ahead buffer, and at the same time, stop reading at exactly the intended end (or small rounding of it to the DMA block size) and not waste resources blindly reading a large amount of data after the end just to fill the read-ahead buffer. The sstable reading code, both for reading the data file and the index file, created a file input stream without specifying its end, thereby losing this optimization - so when a large buffer was used, we would get a large over-read. This patch fixes this, so sstable data file and index file are read using a file input stream which is aware of its end. Fixes #964. Note that this patch does not change the behavior when reading a compressed data file. For compressed read, we did not have the problem of over-read in the first place, because chunks are read one by one. But we do have other sources of inefficiencies there (stemming, again, from the fact that the compressed chunks are read one by one), and I opened a separate issue #992 for that. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <1457219304-12680-1-git-send-email-nyh@scylladb.com>	2016-03-08 17:26:10 +02:00
Glauber Costa	8e4bf025ae	sstables: wire priority for read path All the SSTable read path can now take an io_priority. The public functions will take a default parameter which is Seastar's default priority. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-01-25 15:20:38 -05:00
Tomasz Grabiec	657841922a	Mark move constructors noexcept when possible	2015-12-07 09:50:27 +01:00
Avi Kivity	d5cf0fb2b1	Add license notices	2015-09-20 10:43:39 +03:00
Raphael S. Carvalho	6f07379646	row consumer: don't fallthrough if mask cannot be consumed When the row consumer falls through from ATOM_NAME_BYTES to ATOM_MASK, we assume that the mask can be consumed, but data.size() may equal zero, in which case the mask cannot be consumed. The solution is to add read_8 so that the code only falls through if the mask can be consumed right away. Fixes #197. Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>	2015-09-04 12:26:03 +03:00
Avi Kivity	22f87e4252	sstables: avoid passing zero-sized buffers to make_file_input_stream With new seastar, it crashes.	2015-08-31 19:40:59 +03:00
Glauber Costa	2623362d20	continuous_data_consumer: do not pass reference to child Since the child is a base class, we don't need to pass a reference: we can just cast our 'this' pointer. By doing that, the move constructor can come back. Welcome back, move constructor. Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>	2015-08-29 20:32:56 +03:00

71 Commits