scylladb

Author	SHA1	Message	Date
Avi Kivity	9421cfded4	reconcilable_result_builder: don't aggrevate out-of-memory condition during recovery Consider an unpaged query that consumes all of available memory, despite `fea5067dfa` which limits them (perhaps the user raised the limit, or this is a system query). Eventually we will see a bad_alloc which will abort the query and destroy this reconcilable_result_builder. During destruction, we first destroy _memory_accounter, and then _result. Destroying _memory_accounter resumes some continuations which can then allocate memory synchronously when increasing the task queue to accomodate them. We will then crash. Had we not crashed, we would immediately afterwards release _result, freeing all the memory that we would ever need. Fix by making _result the last member, so it is freed first. Fixes #7240.	2020-09-15 19:53:05 +02:00
Wojciech Mitros	45215746fe	increase the maximum size of query results to 2^64 Currently, we cannot select more than 2^32 rows from a table because we are limited by types of variables containing the numbers of rows. This patch changes these types and sets new limits. The new limits take effect while selecting all rows from a table - custom limits of rows in a result stay the same (2^32-1). In classes which are being serialized and used in messaging, in order to be able to process queries originating from older nodes, the top 32 bits of new integers are optional and stay at the end of the class - if they're absent we assume they equal 0. The backward compatibility was tested by querying an older node for a paged selection, using the received paging_state with the same select statement on an upgraded node, and comparing the returned rows with the result generated for the same query by the older node, additionally checking if the paging_state returned by the upgraded node contained new fields with correct values. Also verified if the older node simply ignores the top 32 bits of the remaining rows number when handling a query with a paging_state originating from an upgraded node by generating and sending such a query to an older node and checking the paging_state in the reply(using python driver). Fixes #5101.	2020-08-03 17:32:49 +02:00
Botond Dénes	f7a4d19fb1	mutation_partition: abort read when hard limit is exceeded for non-paged reads If the read is not paged (short read is not allowed) abort the query if the hard memory limit is reached. On reaching the soft memory limit a warning is logged. This should allow users to adjust their application code while at the same time protecting the database from the really bad queries. The enforcement happens inside the memory accounter and doesn't require cooperation from the result builders. This ensures memory limit set for the query is respected for all kind of reads. Previously non-paged reads simply ignored the memory accounter requesting the read to stop and consumed all the memory they wanted.	2020-07-29 08:32:31 +03:00
Botond Dénes	6660a5df51	result_memory_accounter: remove default constructor If somebody wants to bypass proper memory accounting they should at the very least be forced to consider if that is indeed wise and think a second about the limit they want to apply.	2020-07-28 18:00:29 +03:00
Botond Dénes	517a941feb	query_class_config: move into the query namespace It belongs there, its name even starts with "query".	2020-07-28 18:00:29 +03:00
Juliusz Stasiewicz	e04fd9f774	counters: Read the state under timeout Counter update is a RMW operation. Until now the "Read" part was not guarded by a timeout, which is changed in this patch. Fixes #5069	2020-06-02 15:10:43 +03:00
Botond Dénes	639bbefcd3	database: use valid permit for counter read-before-write Counter writes involve a read-before-write, which will soon require a valid permit to be passed to it, so make sure we create and pass a valid permit to this read. We use `database::make_query_class_config()` to obtain the semaphore for the read which selects the appropriate user/system semaphore based on the scheduling group the counter write is running in.	2020-05-28 11:34:35 +03:00
Botond Dénes	14743c4412	data_query, mutation_query: use query_class_config We want to move away from the current practice of selecting the relevant read concurrency semaphore inside `table` and instead want to pass it down from `database` so that we can pass down a semaphore that is appropriate for the class of the query. Use the recently created `query_class_config` struct for this. This is added as a parameter to `data_query`, `mutation_query` and propagated down to the point where we create the `querier` to execute the read. We are already propagating down a parameter down the same route -- max_memory_reverse_query -- which also happens to be part of `query_class_config`, so simply replace this parameter with a `query_class_config` one. As the lower layers are not prepared for a semaphore passed from above, make sure this semaphore is the same that is selected inside `table`. After the lower layers are prepared for a semaphore arriving from above, we will switch it to be the appropriate one for the class of the query.	2020-05-28 11:34:35 +03:00
Avi Kivity	157fe4bd19	Merge "Remove default timeouts" from Botond " Timeouts defaulted to `db::no_timeout` are dangerous. They allow any modifications to the code to drop timeouts and introduce a source of unbounded request queue to the system. This series removes the last such default timeouts from the code. No problems were found, only test code had to be updated. tests: unit(dev) " * 'no-default-timeouts/v1' of https://github.com/denesb/scylla: database: database::query(), database::apply(): remove default timeouts database: table::query(): remove default timeout mutation_query: data_query(): remove default timeout mutation_query: mutation_query(): remove default timeout multishard_mutation_query: query_mutations_on_all_shards(): remove default timeout reader_concurrency_semaphore: wait_admission(): remove default timeout utils/logallog: run_when_memory_available(): remove default timeout	2020-03-01 17:29:17 +02:00
Botond Dénes	8da88e6cb9	mutation_query: data_query(): remove default timeout	2020-02-27 19:02:40 +02:00
Botond Dénes	fdb45d16de	mutation_query: mutation_query(): remove default timeout	2020-02-27 18:56:30 +02:00
Botond Dénes	7bdeec4b00	flat_mutation_reader: make_reversing_reader(): add memory limit If the reversing requires more memory than the limit, the read is aborted. All users are updated to get a meaningful limit, from the respective table object, with the exception of tests of course.	2020-02-27 18:11:54 +02:00
Vladimir Davydov	e0b31dd273	query: add flag to return static row on partition with no rows A SELECT statement that has clustering key restrictions isn't supposed to return static content if no regular rows matches the restrictions, see #589. However, for the CAS statement we do need to return static content on failure so this patch adds a flag that allows the caller to override this behavior.	2019-10-28 21:50:44 +03:00
Avi Kivity	093d2cd7e5	reconcilable_result: use chunked_vector to hold partitions Usually, a reconcilable_result holds very few partitions (1 is common), since the page size is limited by 1MB. But if we have paging disabled or if we are reconciling a range full of tombstones, we may see many more. This can cause large allocations. Change to chunked_vector to prevent those large allocations, as they can be quite expensive. Fixes #4780.	2019-08-01 18:49:13 +03:00
Duarte Nunes	fa2b0384d2	Replace std::experimental types with C++17 std version. Replace stdx::optional and stdx::string_view with the C++ std counterparts. Some instances of boost::variant were also replaced with std::variant, namely those that called seastar::visit. Scylla now requires GCC 8 to compile. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20190108111141.5369-1-duarte@scylladb.com>	2019-01-08 13:16:36 +02:00
Botond Dénes	3bcd577907	Move reconcilable_result_builder declaration to mutation_query.hh It will be used by code outside of mutation_partition.cc so it needs to be public. The definition remains in mutation_partition.cc.	2018-09-03 10:31:44 +03:00
Botond Dénes	5f726e9a89	querier: move all to query namespace To avoid name clashes.	2018-09-03 10:31:44 +03:00
Avi Kivity	ebff1cfc37	database: make database::_mutation_query_stage inherit the scheduling group Like the preceeding patch and for the same reasons, adjust database::_mutation_query_stage to inherit the scheduling group from its caller.	2018-08-24 19:04:49 +03:00
Glauber Costa	9188059427	database: group statements in their own scheduling group When we introduced the CPU scheduler, we have also introduced a group for commitlog - but never used it. There is also doubtful value in separating reads from writes, since they are often part of the same workload. To accomodate for that, let's rename the query group to "statement" (query is not incorrect, just confusing), and move the write path, currently ungrouped, inside it. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-03-20 16:58:36 -04:00
Botond Dénes	ff808d9ce6	Save and restore queriers in mutation_query() and data_query() Use the querier_cache (represented by the passed-in querier_cache_context) object to lookup saved queriers at the start of the page and save them at the end of it if it is likely that there will be more page requests.	2018-03-13 10:34:34 +02:00
Duarte Nunes	6b4b429883	query-result: Introduce class result_options Introduce class result_options to carry result options through the request pipeline, which at this point mean the result type and the digest algorithm. This class allows us to encapsulate the concrete digest algorithm to use. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-02-01 00:22:50 +00:00
Glauber Costa	8433702c90	mutation_query: add a timeout to the mutation query path data_query and mutation_query are patched so that they start accepting a per-query timeout. We will default to no timeout, and then no callers will be changed yet. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-01-11 12:07:41 -05:00
Gleb Natapov	3b7d8c8767	storage_proxy: add capability to read data/digest for non singular ranges Currently only mutation_data read supports non singular ranges. This patch extends data/digest reads to support them too.	2017-08-03 10:35:09 +03:00
Paweł Dziepak	a78501c206	mutation_query: add an execution stage	2017-03-09 09:27:43 +00:00
Paweł Dziepak	582d397c41	introduce counter_write_query() Counter write path involves read-modify-write. That read is guaranteed to query only a single partition, does not care about dead cells and expects to receive an unserialized mutation as a result. Standard mutation queries can are able to produce results fit for counter updates, but the logic involved is much more general (i.e. slower), hence the addition of new, counter-specific kind of query.	2017-03-01 16:33:36 +00:00
Tomasz Grabiec	f46ae8128d	database: Fix mutation_source created by as_mutation_source() to not ignore trace_state_ptr It was using the state passed via as_mutation_source() instead. Let's respect mutation_source contract instead, and use the state passed via mutation_source invocation. Technically just a cleanup. Alse prerequisite for more cleanup.	2017-02-23 18:23:52 +01:00
Asias He	e5485f3ea6	Get rid of query::partition_range Use dht::partition_range instead	2016-12-19 08:09:25 +08:00
Duarte Nunes	efc986d548	mutation_query: to_data_query_result enforces row limit This patch changes mutation_query::to_data_query_result() so that it enforces the row limit alongside the partition limit and the per-partition limit. In the following patch, we'll enforce the row limit in an upper layer, but this lets us optimize the case where only when replica replies. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-12-15 10:56:40 +00:00
Duarte Nunes	781cd82cb8	column_family: Use counters in query::result::builder This patch changes column_family::query() to use the counters in the builder to determine how many partitions and rows to ask for and also to implement the stop condition. This saves a continuation to do the bookkeeping, and allows us to remove data_query_result. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-12-15 10:27:46 +00:00
Paweł Dziepak	6c33a4f177	db: create result_memory_accounters when starting query This pach ensures than when we start executing a query a minimum result size is reserved from result_memory_limiter. Moreover, range queries need a way of merging memory usage information from different shards. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-12-14 14:10:02 +00:00
Paweł Dziepak	15de8de9e5	reconcilable_result: keep result_memory_tracker object Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-12-14 14:10:02 +00:00
Paweł Dziepak	43fe3439ca	reconcilable_result: properly propagate short_read flag reconcilable_result can be merged with another or transformed into query::result. Make sure that short_read information is never lost. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-12-14 14:10:02 +00:00
Duarte Nunes	69798df95e	query: Limit number of partitions returned This is required to implement a thrift verb. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-06-22 09:48:13 +02:00
Paweł Dziepak	6f25533f4e	mutation_query: drop querying_reader Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-06-20 21:31:52 +01:00
Paweł Dziepak	0828c88b25	mutation_partition: implement streaming-friendly data_query() Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-06-20 21:31:19 +01:00
Avi Kivity	db03295c8a	Merge "Fix query digest mismatch" from Tomasz "Currently data query digest includes cells and tombstones which may have expired or be covered by higher-level tombstones. This causes digest mismatch between replicas if some elements are compacted on one of the nodes and not on others. This mismatch triggers read-repair which doesn't resolve because mutations received by mutation queries are not differing, they are compacted already. The fix adds compacting step before writing and digesting query results by reusing the algorithm used by mutation query. This is not the most optimal way to fix this. The compaction step could be folded with the query writing, there is redundancy in both steps. However such change carries more risk, and thus was postponed. perf_simple_query test (cassandra-stress-like partitions) shows regression from 83k to 77k (7%) ops/s. Fixes #1165."	2016-04-08 12:13:29 +03:00
Pekka Enberg	38a54df863	Fix pre-ScyllaDB copyright statements People keep tripping over the old copyrights and copy-pasting them to new files. Search and replace "Cloudius Systems" with "ScyllaDB". Message-Id: <1460013664-25966-1-git-send-email-penberg@scylladb.com>	2016-04-08 08:12:47 +03:00
Tomasz Grabiec	e4e8acc946	mutation_query: Extract main part of mutation_query() into more generic querying_reader So that it can be reused in query()	2016-04-07 19:03:04 +02:00
Paweł Dziepak	bdc23ae5b5	remove db/serializer.hh includes Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-03-02 09:07:09 +00:00
Gleb Natapov	63a5aa6122	prevent superfluous frozen_mutation copying Sometimes frozen_mutation is copied while it can be moved instead. Fix those cases. Message-Id: <20160204165708.GI6705@scylladb.com>	2016-02-07 10:54:16 +02:00
Gleb Natapov	b9b6f703c3	Remove old serializer for frozen_mutation and reconcilable_result	2016-01-24 12:45:41 +02:00
Tomasz Grabiec	4e5a52d6fa	db: Make read interface schema version aware The intent is to make data returned by queries always conform to a single schema version, which is requested by the client. For CQL queries, for example, we want to use the same schema which was used to compile the query. The other node expects to receive data conforming to the requested schema. Interface on shard level accepts schema_ptr, across nodes we use table_schema_version UUID. To transfer schema_ptr across shards, we use global_schema_ptr. Because schema is identified with UUID across nodes, requestors must be prepared for being queried for the definition of the schema. They must hold a live schema_ptr around the request. This guarantees that schema_registry will always know about the requested version. This is not an issue because for queries the requestor needs to hold on to the schema anyway to be able to interpret the results. But care must be taken to always use the same schema version for making the request and parsing the results. Schema requesting across nodes is currently stubbed (throws runtime exception).	2016-01-11 10:34:52 +01:00
Tomasz Grabiec	09188bccde	mutation_query: Make reconcilable_result printable	2015-12-16 18:06:54 +01:00
Avi Kivity	d5cf0fb2b1	Add license notices	2015-09-20 10:43:39 +03:00
Tomasz Grabiec	9bea6aa0a3	db: Introduce mutation query interface Mutation query differs from data query in that returns information needed to reconcile data slice with that retruned by other data sources. There is a generic mutation_query() algorithm introduced, which can work with any mutation_source. database::query_mutations() is a shard-local interface for mutation queries. The reconcilable_result is introduced as a medium for mutation query results. It piggy backs on frozen_mutation as a medium for reconcilable data.	2015-07-12 12:51:38 +02:00

45 Commits