scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-22 01:20:39 +00:00

Author	SHA1	Message	Date
Avi Kivity	725949e8bf	Merge "Fixes for intentional short reads" from Paweł "This patchset contains fixes for the changes introduced in "Query result size limiting". It also improves handling of short data reads. In order to minimise the chance of a digest mismatch, during data queries replicas that were asked just to return a digest also keep track of the size of the data (in the IDL representation), so that they stop at the same point as nodes doing full data queries. Moreover, data queries are not affected by the per-shard memory limit, and the coordinator sends individual result size limits to replicas in order not to depend on hardcoded values. It is still possible to get digest mismatches if the IDL changes (e.g. a new field is added), but, hopefully, that won't be a serious problem." * 'pdziepak/short-read-fixes/v4' of github.com:cloudius-systems/seastar-dev: query: introduce result_memory_accounter::foreign_state storage_proxy: fix short reads in parallel range queries storage_proxy: pass maximum result size to replicas mutation_partition: use result limiter for digest reads query: make result_memory_limiter constants available for linker result_memory_limiter: add accounter for digest reads idl: allow writers to use any output stream result_memory_limiter: split new_read() to new_{data, mutation}_read() idl: is_short_read() was added in 1.6 mutation_partition: honour allowed_short_read for static rows storage_proxy: fix _is_short_read computation storage_proxy: disallow short reads if got no live rows storage_proxy: don't stop after result with no live rows (cherry picked from commit `868b4d110c`)	2016-12-27 19:16:15 +02:00
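A minimal C++ sketch of the digest-read idea in this merge, assuming a simplified row model: a replica asked only for a digest still accounts the serialized size of every row, so it stops on the same row as a replica building full data. The names `row`, `serialized_size()` and `read_until_limit()` are hypothetical, the 4-byte length prefix only stands in for the IDL encoding, and the rolling hash is a placeholder rather than a real digest.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical row model: only the serialized payload matters here.
struct row {
    std::string payload;
};

// Stand-in for the IDL-encoded size (assumption: 4-byte length prefix).
inline std::size_t serialized_size(const row& r) {
    return 4 + r.payload.size();
}

struct read_outcome {
    std::size_t rows_consumed = 0;  // rows included in the result
    bool short_read = false;        // stopped early because of the size limit
    std::string data;               // filled only by data reads
    std::size_t digest = 0;         // filled only by digest reads
};

// Data reads and digest reads perform the same size accounting, so both
// stop on the same row. At least one row is always included.
inline read_outcome read_until_limit(const std::vector<row>& rows,
                                     std::size_t max_result_size,
                                     bool digest_only) {
    read_outcome out;
    std::size_t used = 0;
    for (const auto& r : rows) {
        used += serialized_size(r);
        if (used > max_result_size && out.rows_consumed > 0) {
            out.short_read = true;
            break;
        }
        if (digest_only) {
            // Placeholder rolling hash, not a real digest algorithm.
            out.digest = out.digest * 31 + std::hash<std::string>{}(r.payload);
        } else {
            out.data += r.payload;
        }
        ++out.rows_consumed;
    }
    return out;
}
```

Because both modes run the same size check, a data read and a digest read over the same rows and limit stop after the same number of rows.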
Avi Kivity	8ad0e96025	Merge "storage_proxy: Enforce row limit" from Duarte "This patchset ensures the row limit is enforced at the storage_proxy level. Upper layers like the pager may already be depending on this behavior." * 'enforce-row-limit/v3' of https://github.com/duarten/scylla: query_pagers: Don't trim returned rows select_statement: Don't always trim result set query_result_merger: Limit rows mutation_query: to_data_query_result enforces row limit (cherry picked from commit `3421ebe8be`)	2016-12-27 16:36:41 +02:00
Avi Kivity	1a2a63787a	Merge "storage_proxy: Enforce partition limit" from Duarte "This patchset ensures the partition limit is enforced at the storage_proxy level. To achieve this, we add the partition count to query::result, and allow the result_merger to trim excess partitions." * 'enforce-partition-limit/v3' of https://github.com/duarten/scylla: storage_proxy: Decrease limits when retrying command storage_proxy: Don't fetch superfluous partitions query::result: Add partition count column_family: Use counters in query::result::builder query_result_builder: Use the underlying counters mutation_partition: Count partitions in query_compacted mutation_partition: Remove tabs in query_compacted query::result::builder: Add partition count query_result_merger: Limit partitions (cherry picked from commit `6bb875bdb7`)	2016-12-27 16:36:27 +02:00
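The two merges above make the result merger trim excess rows and partitions. A sketch under simplifying assumptions: a partition is an integer key plus a list of row values, and `partial_result` and `merge_with_limits()` are hypothetical names, not Scylla's `query_result_merger`.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical, simplified shapes: a partition is a key plus its live rows.
struct partition {
    int key;
    std::vector<int> rows;
};

struct partial_result {
    std::vector<partition> partitions;
};

// Concatenate partial results in order and stop once either limit is hit.
// Rows past the row limit are trimmed from the last partition as well.
inline partial_result merge_with_limits(const std::vector<partial_result>& parts,
                                        std::uint32_t row_limit,
                                        std::uint32_t partition_limit) {
    partial_result merged;
    std::uint32_t rows = 0;
    for (const auto& part : parts) {
        for (const auto& p : part.partitions) {
            if (rows >= row_limit || merged.partitions.size() >= partition_limit) {
                return merged;
            }
            partition out{p.key, {}};
            for (int r : p.rows) {
                if (rows >= row_limit) {
                    break;
                }
                out.rows.push_back(r);
                ++rows;
            }
            merged.partitions.push_back(std::move(out));
        }
    }
    return merged;
}
```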
Paweł Dziepak	ee89d80d5c	query: add result size limiter This patch introduces an infrastructure for limiting result size. There is a shard-local limit which makes sure that all results combined do not use more than 10% of the shard memory. There is also an individual limit which restricts a result to 4 MB. In order to avoid sending tiny results, there is a minimum guaranteed size (4 kB), which the query needs to reserve before it starts producing the result. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-12-14 14:10:02 +00:00
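A synchronous sketch of the limits this commit describes: a shard-wide budget of 10% of shard memory, a 4 MB cap per result, and a 4 kB minimum that must be reserved before a query produces output. `shard_result_budget` and `result_accounter` are hypothetical names; the real limiter's interface is not reproduced here.

```cpp
#include <cassert>
#include <cstddef>

// Constants mirroring the commit message.
constexpr std::size_t min_result_size = 4 * 1024;         // 4 kB guaranteed minimum
constexpr std::size_t max_result_size = 4 * 1024 * 1024;  // 4 MB per result

// Shard-wide budget shared by all in-flight results on one shard.
struct shard_result_budget {
    explicit shard_result_budget(std::size_t shard_memory)
        : limit(shard_memory / 10) {}  // 10% of shard memory
    std::size_t limit;
    std::size_t used = 0;
};

// Per-result accounting against both limits.
class result_accounter {
public:
    explicit result_accounter(shard_result_budget& budget) : _budget(budget) {}

    // Must succeed before the query starts producing output.
    bool reserve_minimum() {
        if (_budget.used + min_result_size > _budget.limit) {
            return false;
        }
        _budget.used += min_result_size;
        _reserved = min_result_size;
        return true;
    }

    // Accounts n more bytes; returns true when the result must stop here.
    bool add(std::size_t n) {
        _size += n;
        if (_size > max_result_size) {
            return true;  // individual limit reached
        }
        if (_size > _reserved) {
            std::size_t extra = _size - _reserved;
            if (_budget.used + extra > _budget.limit) {
                return true;  // shard-wide limit reached
            }
            _budget.used += extra;
            _reserved += extra;
        }
        return false;
    }

    // Returns everything this result reserved to the shard budget.
    void release() {
        assert(_budget.used >= _reserved);
        _budget.used -= _reserved;
        _reserved = 0;
    }

private:
    shard_result_budget& _budget;
    std::size_t _reserved = 0;
    std::size_t _size = 0;
};
```

Reserving the minimum up front means a query either gets a usefully sized result or waits, instead of starting and producing only a handful of bytes.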
Paweł Dziepak	da7ca85040	query: allow short reads When paging is used, the cluster is allowed to return fewer rows than the client asked for. However, if this possibility is used, we need a way of telling that to the coordinator and the paging implementation, so that they can differentiate between short reads caused by the replica running out of data to send and short reads caused by anything else. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-12-14 14:10:01 +00:00
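One way a pager could use the short-read flag, as a hypothetical sketch (`page_result` and `has_more_pages()` are illustrative names): a page with fewer rows than requested means the data is exhausted only when the replica did not mark the result as a short read.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical page result as seen by a pager.
struct page_result {
    std::uint32_t row_count;
    bool is_short_read;  // replica stopped early for a reason other than end of data
};

// A page with fewer rows than requested means "no more data" only when the
// result was not marked as a short read.
inline bool has_more_pages(const page_result& r, std::uint32_t page_size) {
    if (r.is_short_read) {
        return true;
    }
    return r.row_count >= page_size;
}
```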
Raphael S. Carvalho	768aced741	partition_slice: introduce key-independent function to get ranges That will be important for sstable code that will rule out an sstable if it doesn't cover a given clustering key range. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-09-02 10:50:56 -03:00
Paweł Dziepak	3fe5ed3cd9	query: use result_view::consume() where appropriate Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-08-22 09:31:33 +01:00
Duarte Nunes	aaa76d58ba	query: Move to_partition_range to dht namespace This patch moves to_partition_range from the query namespace to the dht namespace, where it is a more natural fit. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <1468498060-19251-1-git-send-email-duarte@scylladb.com>	2016-07-15 10:41:52 +02:00
Paweł Dziepak	3c08ffb275	query: add full_slice query::full_slice is a partition slice which has full clustering row ranges for all partition keys and no per-partition row limit. Options and columns are not set. It is used as a helper object in cases when a reference to partition_slice is needed but the user code just needs all the data there is (an example of such a case would be sstable compaction). Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-06-30 11:37:54 +01:00
Duarte Nunes	69798df95e	query: Limit number of partitions returned This is required to implement a thrift verb. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-06-22 09:48:13 +02:00
Duarte Nunes	01b18063ea	query: Add per-partition row limit This patch adds a per-partition row limit. It ensures both local queries and the reconciliation logic abide by this limit. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-06-22 09:46:51 +02:00
Gleb Natapov	db322d8f74	query: put live row count into query::result The patch calculates the row count during result building and while merging. If one of the results being merged does not have a row count, the merged result will not have one either.	2016-05-02 15:10:15 +03:00
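The merging rule in this commit (a merged result has a row count only if every input has one) maps naturally onto `std::optional`. A C++17 sketch; `merge_row_counts()` is a hypothetical helper, not part of `query::result`.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// A merged result carries a live row count only if every input carries one.
inline std::optional<std::uint32_t>
merge_row_counts(const std::vector<std::optional<std::uint32_t>>& counts) {
    std::uint32_t total = 0;
    for (const auto& c : counts) {
        if (!c) {
            return std::nullopt;  // one unknown count makes the total unknown
        }
        total += *c;
    }
    return total;
}
```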
Tomasz Grabiec	61435108a5	query: Do not take arguments via ... in the visitor Amnon reports that current code fails to compile on gcc 4.9: distcc[9700] ERROR: compile /home/amnon/.ccache/tmp/query.tmp.localhost.localdomain.9673.ii on localhost failed In file included from query.cc:30:0: query-result-reader.hh: In instantiation of ‘void query::result_view::consume(const query::partition_slice&, ResultVisitor&&) [with ResultVisitor = query::result::calculate_row_count(const query::partition_slice&)::<anonymous struct>&]’: query.cc:196:32: required from here query-result-reader.hh:184:21: error: cannot pass objects of non-trivially-copyable type ‘class clustering_key_prefix’ through ‘...’ visitor.accept_new_row(*row.key(), static_row, view); ^ query-result-reader.hh:184:21: error: cannot pass objects of non-trivially-copyable type ‘class query::result_row_view’ through ‘...’ query-result-reader.hh:184:21: error: cannot pass objects of non-trivially-copyable type ‘class query::result_row_view’ through ‘...’ query-result-reader.hh:186:21: error: cannot pass objects of non-trivially-copyable type ‘class query::result_row_view’ through ‘...’ visitor.accept_new_row(static_row, view); ^ query-result-reader.hh:186:21: error: cannot pass objects of non-trivially-copyable type ‘class query::result_row_view’ through ‘...’ Work around the problem by not using '...'. Message-Id: <1460964042-2867-1-git-send-email-tgrabiec@scylladb.com>	2016-04-26 14:50:35 +03:00
Gleb Natapov	15ebe5e4e5	query: add calculate_row_count function to query::result	2016-04-14 19:26:00 +03:00
Gleb Natapov	f47b2dad18	query: add lazy printer to query::result Transforming query::result to printable form is a very heavy operation that allocates memory and thus can fail. Add a class to query::result that can be used with the logger to defer the string conversion until output is actually performed.	2016-04-14 19:26:00 +03:00
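The lazy-printer pattern can be sketched as a small wrapper that holds only a reference and does the formatting inside `operator<<`, so nothing is formatted or allocated unless a log line is actually written. `result_sketch`, `printer` and `pretty_printer()` are hypothetical names, and the result is modeled as a list of strings.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

struct result_sketch {
    std::vector<std::string> rows;

    // Holds only a reference; formatting happens in operator<<.
    class printer {
    public:
        explicit printer(const result_sketch& r) : _r(r) {}

        friend std::ostream& operator<<(std::ostream& os, const printer& p) {
            os << '{';
            for (std::size_t i = 0; i < p._r.rows.size(); ++i) {
                if (i != 0) {
                    os << ", ";
                }
                os << p._r.rows[i];
            }
            return os << '}';
        }

    private:
        const result_sketch& _r;
    };

    // Cheap to call: no formatting until the printer is streamed.
    printer pretty_printer() const { return printer(*this); }
};
```

A logger that skips disabled levels never streams the printer, so the expensive conversion is never performed in that case.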
Pekka Enberg	38a54df863	Fix pre-ScyllaDB copyright statements People keep tripping over the old copyrights and copy-pasting them to new files. Search and replace "Cloudius Systems" with "ScyllaDB". Message-Id: <1460013664-25966-1-git-send-email-penberg@scylladb.com>	2016-04-08 08:12:47 +03:00
Paweł Dziepak	72970c9c90	query: add query::result::_digest to pretty printer Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-03-11 18:27:17 +00:00
Paweł Dziepak	bdc23ae5b5	remove db/serializer.hh includes Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-03-02 09:07:09 +00:00
Tomasz Grabiec	f4a86729f9	query: Move implementation of result_merger to .cc file Message-Id: <1456855396-1563-1-git-send-email-tgrabiec@scylladb.com>	2016-03-01 20:06:42 +02:00
Tomasz Grabiec	6cec131432	query: Switch to IDL-generated views and writers The query result footprint for cassandra-stress mutation as reported by tests/memory-footprint increased by 18% from 285 B to 337 B. perf_simple_query shows slight regression in throughput (-8%): build/release/tests/perf/perf_simple_query -c4 -m1G --partitions 100000 Before: ~433k tps After: ~400k tps	2016-02-26 12:26:13 +01:00
Tomasz Grabiec	5f756fcbe5	query: Add cql_format property to partition_slice It will specify in which format CQL values should be serialized. This will allow rolling out new CQL binary protocol versions without stalling reads.	2016-02-15 17:05:55 +01:00
Gleb Natapov	ab6703f9bc	Remove old query::result serializer	2016-01-24 12:45:41 +02:00
Gleb Natapov	043d132ba9	Remove no longer used serializers.	2016-01-24 12:45:41 +02:00
Gleb Natapov	7357b1ddfe	Move specific_ranges to .hh and un-nest it. The serializer requires the class to be defined, so it has to be in a header file. It also does not support nested types yet, so move it outside of the containing class.	2016-01-24 12:45:41 +02:00
Gleb Natapov	9ae7dc70da	Prepare partition_slice to be used by serializer. Add missing _specific_ranges getter and setter.	2016-01-24 12:45:41 +02:00
Calle Wilund	8de95cdee8	paging bugfix: Allow reset/removal of "specific ck range" Refs #752 Paged aggregate queries will re-use the partition_slice object, thus when setting a specific ck range for "last pk", we will hit an exception case. Allow removing entries (actually only the one), and overwriting (using schema equality for keys), so we maintain the interface while allowing the pager code to re-set the ck range for previous page pass. [tgrabiec: commit log cleanup, fixed issue ref] Message-Id: <1452616259-23751-1-git-send-email-calle@scylladb.com>	2016-01-12 17:45:57 +01:00
Tomasz Grabiec	04eb58159a	query: Add schema_version field to read_command	2016-01-11 10:34:51 +01:00
Nadav Har'El	faa87b31a8	fix to_partition_range() inclusiveness A cut-and-paste accident in query::to_partition_range caused the wrong end's inclusiveness to be tested. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2016-01-05 15:38:40 +02:00
Tomasz Grabiec	d64db98943	query: Convert serialization of query::result to use db::serializer<> That's what we're trying to standardize on. This patch also fixes an issue with the current query::result::serialize() not being const-qualified because it modifies the buffer. messaging_service did a const cast to work around this, which is not safe.	2015-12-03 09:19:11 +01:00
Calle Wilund	284b10cabe	Make partition_slice::row_ranges multiplex on partition Allows for having more than one clustering row range set, depending on the PK queried (although right now limited to one - which happens to be exactly the number paging multiplexing needs... What a coincidence...) Encapsulates the row_ranges member in a query function, and if needed holds ranges outside the default one in an extra object. Query result::builder::add_partition now fetches the correct row range for the partition, and this is the range used in subsequent iteration.	2015-11-10 13:12:33 +01:00
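A sketch of the multiplexing described here: default clustering ranges plus at most one partition-specific set. `set_specific_ranges()` overwrites instead of throwing, in the spirit of the later paging fix (Refs #752). `slice_sketch`, `ck_range` and the string partition keys are hypothetical simplifications, not the real `partition_slice` API.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical clustering range [start, end] over integer clustering keys.
struct ck_range {
    int start;
    int end;
};

class slice_sketch {
public:
    explicit slice_sketch(std::vector<ck_range> default_ranges)
        : _default(std::move(default_ranges)) {}

    // Ranges to use for the given partition key.
    const std::vector<ck_range>& row_ranges(const std::string& pk) const {
        if (_specific && _specific->first == pk) {
            return _specific->second;
        }
        return _default;
    }

    // Overwrites any existing entry instead of failing.
    void set_specific_ranges(std::string pk, std::vector<ck_range> ranges) {
        _specific.emplace(std::move(pk), std::move(ranges));
    }

    void clear_specific_ranges() { _specific.reset(); }

private:
    std::vector<ck_range> _default;
    // At most one partition gets its own ranges, which is what paging needs.
    std::optional<std::pair<std::string, std::vector<ck_range>>> _specific;
};
```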
Avi Kivity	d5cf0fb2b1	Add license notices	2015-09-20 10:43:39 +03:00
Tomasz Grabiec	d9e6f0d1da	query: Introduce query::result::pretty_print()	2015-07-28 11:31:08 +02:00
Tomasz Grabiec	4d06c2aa1d	Move to_partition_range() adaptor to global scope It should be moved to i_partitioner.hh, but to do that range<> has to be first moved out of query-request.hh to break cyclic dependency. I didn't want to cause conflicts with in-flight patches to range<>.	2015-07-24 16:08:41 +02:00
Tomasz Grabiec	9bea6aa0a3	db: Introduce mutation query interface Mutation query differs from data query in that it returns information needed to reconcile a data slice with those returned by other data sources. There is a generic mutation_query() algorithm introduced, which can work with any mutation_source. database::query_mutations() is a shard-local interface for mutation queries. The reconcilable_result is introduced as a medium for mutation query results. It piggybacks on frozen_mutation as a medium for reconcilable data.	2015-07-12 12:51:38 +02:00
Paweł Dziepak	290a7ca1bf	query: add timestamp to read_command Read command needs a timestamp in order to determine which cells have already expired. Signed-off-by: Paweł Dziepak <pdziepak@cloudius-systems.com>	2015-07-02 17:01:19 +02:00
Gleb Natapov	730170ff1a	serialize data structures needed for read clustering	2015-07-01 13:36:28 +03:00
Tomasz Grabiec	9525464f74	Move query::ring_position to dht::ring_position	2015-06-25 18:45:12 +02:00
Tomasz Grabiec	51cae834e3	db: Put all sstables behind single reader This change abstracts reading from on-disk data sources behind a single reader which is then composed with memtable readers. This change also abstracts all data sources behind a single reader obtained via column_family::make_reader(). That reader is then used by algorithms like column_family::for_all_partitions() or column_family::query(). Having those abstractions will make it easier to add a row cache, because it will be encapsulated in a single place.	2015-06-18 16:33:33 +02:00
Tomasz Grabiec	3779506990	db: query: Make partition_range hold ring_position The current model was not really correct because Origin doesn't support querying partition ranges by their value. We can query slices according to dht::decorated_key ordering, which orders partitions first by token, then by key value. ring_position encapsulates this range constraint. The key value is optional; when it is absent, only the token is constrained.	2015-06-18 15:47:40 +02:00
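A sketch of the ordering this commit relies on: decorated keys compare by token first and then by key, and a ring_position without a key constrains only the token. Tokens are modeled as `int64_t` and keys as strings; these types and the `at_or_after()` helper are simplifying assumptions, and exclusive bounds are not modeled.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <tuple>

struct decorated_key {
    std::int64_t token;
    std::string key;
};

// Partitions are ordered first by token, then by key value.
inline bool operator<(const decorated_key& a, const decorated_key& b) {
    return std::tie(a.token, a.key) < std::tie(b.token, b.key);
}

// Hypothetical ring_position: a token plus an optional key.
struct ring_position {
    std::int64_t token;
    std::optional<std::string> key;  // absent: only the token is constrained
};

// Inclusive lower-bound check of a partition against a ring_position.
inline bool at_or_after(const decorated_key& dk, const ring_position& pos) {
    if (dk.token != pos.token) {
        return dk.token > pos.token;
    }
    return !pos.key || dk.key >= *pos.key;
}
```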
Gleb Natapov	b7155ad862	pass partitions_ranges separately from read_command partitions_ranges will be split for different destinations, so provide it separately from read_command to avoid copying the latter for each destination.	2015-06-11 15:18:07 +03:00
Tomasz Grabiec	00f99cefd4	db: split query.hh to reduce header dependencies	2015-04-15 20:44:59 +02:00
Tomasz Grabiec	878a740b9d	db: Write query results in serialized form This gives about a 30% increase in tps in: build/release/tests/perf/perf_simple_query -c1 --query-single-key This patch switches the query result format from a structured one to a serialized one. The problems with the structured format are: - high level of indirection (vector of vectors of vectors of blobs), which is not CPU cache friendly - high allocation rate due to fine-grained object structure On the replica side, the query results are probably going to be serialized in the transport layer anyway, so this change only subtracts work. There is no processing of the query results on the replica other than concatenation in the case of range queries. If query results are collected in serialized form from different cores, we can concatenate them without copying by simply appending the fragments into the packet. This optimization is not implemented yet. On the coordinator side, the query results would have to be parsed from the transport layer buffers anyway, so this also doesn't add work, but again saves allocations and copying. The CQL server doesn't need complex data structures to process the results; it just goes over them linearly, consuming them. This patch provides views, iterators and visitors for consuming query results in serialized form. Currently the iterators assume that the buffer is contiguous, but we could easily relax this in the future so that we can avoid linearization of data received from seastar sockets. The coordinator side could be optimized even further for CQL queries which do not need processing (e.g. select * from cf where ...): we could make the replica send the query results in the format which is expected by the CQL binary protocol client. So in the typical case the coordinator would just pass the data using zero-copy to the client, prepending a header. 
We do need structure for prefetched rows (needed by list manipulations), and this change adds query result post-processing which converts a serialized query result into a structured one, tailored particularly to the needs of prefetched rows. This change also introduces partition_slice options. In some queries (maybe even in typical ones), we don't need to send partition or clustering keys back to the client, because they are already specified in the query request, and not queried for. The query results now hold keys as optional elements. Also, metadata like cell timestamp and ttl is now also optional. It is only needed if the query has writetime() or ttl() functions in it, which it typically won't have.	2015-04-15 20:44:50 +02:00
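A minimal sketch of consuming a serialized result linearly with a visitor, as this commit describes. The layout (a 32-bit cell count per row, then length-prefixed cells, in native byte order), `result_writer`, and the visitor methods `start_row()`, `accept_cell()` and `end_row()` are hypothetical. The buffer is assumed contiguous and well-formed; bounds are only checked by `assert()`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <string_view>
#include <vector>

// Appends rows into one contiguous buffer instead of nested containers.
class result_writer {
public:
    void add_row(const std::vector<std::string>& cells) {
        put_u32(static_cast<std::uint32_t>(cells.size()));
        for (const auto& cell : cells) {
            put_u32(static_cast<std::uint32_t>(cell.size()));
            _buf.append(cell);
        }
    }
    const std::string& buffer() const { return _buf; }

private:
    void put_u32(std::uint32_t v) {
        char bytes[sizeof v];
        std::memcpy(bytes, &v, sizeof v);  // native byte order
        _buf.append(bytes, sizeof v);
    }
    std::string _buf;
};

// Walks the buffer once, handing out views into it (no per-cell copies).
template <typename Visitor>
void consume(std::string_view buf, Visitor&& visitor) {
    std::size_t pos = 0;
    auto get_u32 = [&] {
        assert(pos + 4 <= buf.size());
        std::uint32_t x;
        std::memcpy(&x, buf.data() + pos, 4);
        pos += 4;
        return x;
    };
    while (pos < buf.size()) {
        std::uint32_t cells = get_u32();
        visitor.start_row();
        for (std::uint32_t i = 0; i < cells; ++i) {
            std::uint32_t len = get_u32();
            visitor.accept_cell(buf.substr(pos, len));
            pos += len;
        }
        visitor.end_row();
    }
}
```

Concatenating two such buffers yields a valid buffer for the combined rows, which is the property that makes copy-free concatenation of per-core results possible.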
Tomasz Grabiec	7ebc7830b7	db: Optimize column family lookup in query path	2015-04-15 20:33:48 +02:00
Tomasz Grabiec	cae5565e06	Switch query.cc to use join() from to_string.hh	2015-03-23 11:05:03 +01:00
Tomasz Grabiec	ebfc1ffb20	query: Add facilities for printing query request	2015-03-20 18:59:29 +01:00

45 Commits