scylladb

Author	SHA1	Message	Date
Nadav Har'El	3018df11b5	Allow reading exactly desired byte ranges and fast_forward_to In commit `c63e88d556`, support was added for fast_forward_to() in data_consume_rows(). Because an input stream's end cannot be changed after creation, that patch ignores the specified end byte, and uses the end of file as the end position of the stream. As result of this, even when we want to read a specific byte range (e.g., in the repair code to checksum the partitions in a given range), the code reads an entire 128K buffer around the end byte, or significantly more, with read-ahead enabled. This causes repair to do more than 10 times the amount of I/O it really has to do in the checksumming phase (which in the current implementation, reads small ranges of partitions at a time). This patch has two levels: 1. In the lower level, sstable::data_consume_rows(), which reads all partitions in a given disk byte range, now gets another byte position, "last_end". That can be the range's end, the end of the file, or anything in between the two. It opens the disk stream until last_end, which means 1. we will never read-ahead beyond last_end, and 2. fast_fordward_to() is not allowed beyond last_end. 2. In the upper level, we add to the various layers of sstable readers, mutation readers, etc., a boolean flag mutation_reader::forwarding, which says whether fast_forward_to() is allowed on the stream of mutations to move the stream to a different partition range. Note that this flag is separate from the existing boolean flag streamed_mutation::fowarding - that one talks about skipping inside a single partition, while the flag we are adding is about switching the partition range being read. Most of the functions that previously accepted streamed_mutation::forwarding now accept also the option mutation_reader::forwarding. The exception are functions which are known to read only a single partition, and not support fast_forward_to() a different partition range. We note that if mutation_reader::forwarding::no is requested, and fast_forward_to() is forbidden, there is no point in reading anything beyond the range's end, so data_consume_rows() is called with last_end as the range's end. But if forwarding::yes is requested, we use the end of the file as last_end, exactly like the code before this patch did. Importantly, we note that the repair's partition reading code, column_family::make_streaming_reader, uses mutation_reader::forwarding::no, while the other existing reading code will use the default forwarding::yes. In the future, we can further optimize the amount of bytes read from disk by replacing forwarding::yes by an actual last partition that may ever be read, and use its byte position as the last_end passed to data_consume_rows. But we don't do this yet, and it's not a regression from the existing code, which also opened the file input stream until the end of the file, and not until the end of the range query. Moreover, such an improvement will not improve of anything if the overall range is always very large, in which case not over-reading at its end will not improve performance. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20170619152629.11703-1-nyh@scylladb.com>	2017-06-19 18:31:32 +03:00
Avi Kivity	6e2c9ef9fb	Revert "Allow reading exactly desired byte ranges and fast_forward_to" This reverts commit `317d7fc253` (and also the related `2c57ab84b2`). It causes crashes during range scans, reported by Gleb: "To reproduce I run SELECT * FROM keyspace1.standard1; on typical c-s dataset and 3 node cluster. Backtrace: at /home/gleb/work/seastar/seastar/core/apply.hh:36 rvalue=<unknown type in /home/gleb/work/seastar/build/release/scylla, CU 0x54cf307, DIE 0x55ebf2a>) at /home/gleb/work/seastar/seastar/core/do_with.hh:57 range=std::vector of length 6, capacity 8 = {...}) at /home/gleb/work/seastar/seastar/core/future-util.hh:142 at ./seastar/core/future.hh:890 at /home/gleb/work/seastar/seastar/core/future-util.hh:119 at /home/gleb/work/seastar/seastar/core/future-util.hh:142	2017-06-18 16:10:21 +03:00
Nadav Har'El	317d7fc253	Allow reading exactly desired byte ranges and fast_forward_to In commit `c63e88d556`, support was added for fast_forward_to() in data_consume_rows(). Because an input stream's end cannot be changed after creation, that patch ignores the specified end byte, and uses the end of file as the end position of the stream. As result of this, even when we want to read a specific byte range (e.g., in the repair code to checksum the partitions in a given range), the code reads an entire 128K buffer around the end byte, or significantly more, with read-ahead enabled. This causes repair to do more than 10 times the amount of I/O it really has to do in the checksumming phase (which in the current implementation, reads small ranges of partitions at a time). This patch has two levels: 1. In the lower level, sstable::data_consume_rows(), which reads all partitions in a given disk byte range, now gets another byte position, "last_end". That can be the range's end, the end of the file, or anything in between the two. It opens the disk stream until last_end, which means 1. we will never read-ahead beyond last_end, and 2. fast_fordward_to() is not allowed beyond last_end. 2. In the upper level, we add to the various layers of sstable readers, mutation readers, etc., a boolean flag mutation_reader::forwarding, which says whether fast_forward_to() is allowed on the stream of mutations to move the stream to a different partition range. Note that this flag is separate from the existing boolean flag streamed_mutation::fowarding - that one talks about skipping inside a single partition, while the flag we are adding is about switching the partition range being read. Most of the functions that previously accepted streamed_mutation::forwarding now accept also the option mutation_reader::forwarding. The exception are functions which are known to read only a single partition, and not support fast_forward_to() a different partition range. We note that if mutation_reader::forwarding::no is requested, and fast_forward_to() is forbidden, there is no point in reading anything beyond the range's end, so data_consume_rows() is called with last_end as the range's end. But if forwarding::yes is requested, we use the end of the file as last_end, exactly like the code before this patch did. Importantly, we note that the repair's partition reading code, column_family::make_streaming_reader, uses mutation_reader::forwarding::no, while the other existing reading code will use the default forwarding::yes. In the future, we can further optimize the amount of bytes read from disk by replacing forwarding::yes by an actual last partition that may ever be read, and use its byte position as the last_end passed to data_consume_rows. But we don't do this yet, and it's not a regression from the existing code, which also opened the file input stream until the end of the file, and not until the end of the range query. Moreover, such an improvement will not improve of anything if the overall range is always very large, in which case not over-reading at its end will not improve performance. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20170614072122.13473-1-nyh@scylladb.com>	2017-06-15 13:22:46 +01:00
Tomasz Grabiec	892d4a2165	db: Enable creating forwardable readers via mutation_source Right now all mutation source implementations will use make_forwardable() wrapper.	2017-02-23 18:50:44 +01:00
Avi Kivity	b740aff777	tests: adjust mutation_query_test for partition and row limits Won't build otherwise.	2016-12-19 11:37:25 +02:00
Asias He	e5485f3ea6	Get rid of query::partition_range Use dht::partition_range instead	2016-12-19 08:09:25 +08:00
Avi Kivity	7faf2eed2f	build: support for linking statically with boost Remove assumptions in the build system about dynamically linked boost unit tests. Includes seastar update which would have otherwise broken the build.	2016-10-26 08:51:21 +03:00
Piotr Jastrzebski	3607d99269	Remove clustering_key_filtering_context. Remove clustering_key_filter_factory and clustering_key_filtering_context. Use partition_slice directly with a static get_ranges method. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2016-08-30 20:31:55 +02:00
Duarte Nunes	84e3969014	mutation_query_test: Add test for partition limit Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-08-02 22:08:39 +00:00
Duarte Nunes	69798df95e	query: Limit number of partitions returned This is required to implement a thrift verb. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-06-22 09:48:13 +02:00
Paweł Dziepak	494c6fa9c1	tests/mutation_query_test: make sure mutations are sliced properly Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-06-20 21:29:52 +01:00
Duarte Nunes	91aac30f12	mutations: Row tombstones are now a set of ranges This patch changes the type of the mutation partition's row_tombstones to be a range_tombstone_list, so that they are now represented as a set of disjoint ranges. All of its usages are updated accordingly. Fixes #1155 Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-06-02 16:21:59 +02:00
Duarte Nunes	dc8319ed91	keys: Remove schema argument from make_empty An empty key is independent of the schema. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-06-02 16:21:36 +02:00
Gleb Natapov	f1cd52ff3f	tests: test for result row counting Message-Id: <1462377579-2419-2-git-send-email-gleb@scylladb.com>	2016-05-04 18:18:17 +02:00
Pekka Enberg	38a54df863	Fix pre-ScyllaDB copyright statements People keep tripping over the old copyrights and copy-pasting them to new files. Search and replace "Cloudius Systems" with "ScyllaDB". Message-Id: <1460013664-25966-1-git-send-email-penberg@scylladb.com>	2016-04-08 08:12:47 +03:00
Benoît Canet	1fb9a48ac5	exception: Optionally shutdown communication on I/O errors. I/O errors cannot be fixed by Scylla the only solution is to shutdown the database communications. Signed-off-by: Benoît Canet <benoit@scylladb.com> Message-Id: <1458154098-9977-1-git-send-email-benoit@scylladb.com>	2016-03-17 15:02:52 +02:00
Tomasz Grabiec	c591157755	tests: mutation_query: Add test for dropping partitions with expired tombstones	2016-02-22 20:23:29 +01:00
Glauber Costa	58fdae33bd	mutation_source: turn it into a class Its definition as a lambda function is inconvenient, because it does not allow us to use default values for parameters. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-01-25 15:20:38 -05:00
Tomasz Grabiec	4e5a52d6fa	db: Make read interface schema version aware The intent is to make data returned by queries always conform to a single schema version, which is requested by the client. For CQL queries, for example, we want to use the same schema which was used to compile the query. The other node expects to receive data conforming to the requested schema. Interface on shard level accepts schema_ptr, across nodes we use table_schema_version UUID. To transfer schema_ptr across shards, we use global_schema_ptr. Because schema is identified with UUID across nodes, requestors must be prepared for being queried for the definition of the schema. They must hold a live schema_ptr around the request. This guarantees that schema_registry will always know about the requested version. This is not an issue because for queries the requestor needs to hold on to the schema anyway to be able to interpret the results. But care must be taken to always use the same schema version for making the request and parsing the results. Schema requesting across nodes is currently stubbed (throws runtime exception).	2016-01-11 10:34:52 +01:00
Avi Kivity	47499dcf18	data_value: make conversion from bytes explicit Since bytes is a very generic value that is returned from many calls, it is easy to pass it by mistake to a function expecting a data_value, and to get a wrong result. It is impossible for the data_value constructor to know if the argument is a genuine bytes variable, a data_value of another type, but serialized, or some other serialized data type. To prevent misuse, make the data_value(bytes) constructor (and complementary data_value(optional<bytes>) explicit.	2015-11-13 17:12:29 +02:00
Tomasz Grabiec	f1306d3771	tests: Add tests for reversed mutation queries	2015-10-22 10:32:08 +02:00
Avi Kivity	d5cf0fb2b1	Add license notices	2015-09-20 10:43:39 +03:00
Avi Kivity	c720cddc5c	tests: mv tests/urchin/* -> tests/ Now that seastar is in a separate repository, we can use the tests/ directory.	2015-08-05 14:16:52 +03:00

23 Commits