scylladb

Author	SHA1	Message	Date
Karol Nowacki	086c6992f5	vector_search: Fix ANN query abort on CQL timeout When a CQL vector search request timed out, the underlying ANN query was not aborted and continued to run. This happened because the abort source was not being signaled upon request expiration. This commit ensures the ANN query is aborted when the CQL request times out, preventing unnecessary resource consumption.	2025-12-02 01:17:01 +01:00
Karol Nowacki	9f1fd7f5a0	cql3: Rename indexed_table_select_statement To align with `vector_indexed_table_select_statement`, this commit renames `indexed_table_select_statement` to `view_indexed_table_select_statement` to clarify its usage with materialized views.	2025-10-29 08:37:25 +01:00
Karol Nowacki	357c0a8218	cql3: Move vector search select to dedicated class The execution of SELECT statements with ANN ordering (vector search) was previously implemented within `indexed_table_select_statement`. This was not ideal, as vector search logic is independent of secondary index selects. This resulted in unnecessary complexity because vector search queries don't use features like aggregates or paging. More importantly, `indexed_table_select_statement` assumed a non-null `view_schema` pointer, which doesn't hold for vector indexes (where `view_ptr` is null). This caused null pointer dereferences during ANN ordered selects, leading to crashes (VECTOR-179). Other parts of the class still dereference `view_schema` without null checks. Moving the vector search select logic out of `indexed_table_select_statement` simplifies the code and prevents these null pointer dereferences.	2025-10-29 08:37:21 +01:00
Nadav Har'El	921d07a26b	cql: make SELECT's "internal page size" configurable In some uses of SELECT, such as aggregation (sum() et al.), GROUP BY or secondary index, it needs to perform internal scans. It uses an "internal page size" which before this patch was always DEFAULT_COUNT_PAGE_SIZE = 10000. There was an ad-hoc and undocumented way to override this default in C++ tests, using functions in test/lib/select_statement_utils.hh, but it was so non-obvious that the test that most needed to override this default - the very slow test test_indexing_paging_and_aggregation which would have been much faster with a lower setting - never used it. So in this patch we replace the ad-hoc configuration functions by a bona-fide Scylla configuration option named "select_internal_page_size". The few C++ tests that used the old configuration functions were modified to use the new configuration parameters. The slow test test_indexing_paging_and_aggregation still doesn't use the new configuration to become faster - we'll do this in the next patch. Another benefit of having this "internal page size" as a configuration option is that one day a user might realize that the default choice 10,000 is bad for some reason (which I can't envision right now), so having it configurable might come in handy. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2025-10-15 18:42:09 +03:00
Michał Hudobski	b09d1f0a98	index, metrics: add per-index metrics This patch adds the possibility to track metrics per secondary index. Currently, only a histogram of query latencies is tracked, but more metrics can be added in the future. To add a new metric, it needs to be added to the index_metrics struct in index/secondary_index_manager.hh and then initialized in index/secondary_index_manager.cc in the constructor of the index_metrics struct. The metrics are created when the index is created and removed when the index is dropped. First lines of the new metric: # HELP scylla_index_query_latencies Index query latencies # TYPE scylla_index_query_latencies histogram scylla_index_query_latencies_sum{idx="test_i_idx",ks="test"} 640 scylla_index_query_latencies_count{idx="test_i_idx",ks="test"} 1 scylla_index_query_latencies_bucket{idx="test_i_idx",ks="test",le="640.000000"} 1 scylla_index_query_latencies_bucket{idx="test_i_idx",ks="test",le="768.000000"} 1	2025-09-16 14:03:43 +02:00
Jan Łakomy	5fecad0ec8	cql3/statements: add `ANN OF` queries support to select statements Add parsing of `ANN OF` queries to the `select_statement` and `indexed_table_select_statement` classes. Add a placeholder for the implementation of external ANN queries. Rename `should_create_view` to `view_should_exist` as it is used not only to check if the view should be created but also if the view has been created. Co-authored-by: Dawid Pawlik <dawid.pawlik@scylladb.com>	2025-08-01 12:08:50 +02:00
Pawel Pery	5bfce5290e	cql3: refactor primary_key as a top-level class This patch is a part of vector_store_client sharded service implementation for a communication with vector-store service. There is a need for forward declaration of primary_key class. This patch moves a nested definition of select_statement::primary_key (from a cql3::statements namespace) into a standalone class in a cql3::statements namespace. Reference: VS-47	2025-07-09 11:54:51 +02:00
Petr Gusev	3d262d2be8	LWT: create cas_shard in select_statement In this commit we create cas_shard in select_statement and pass it to the sp::query_result function.	2025-06-30 10:37:33 +02:00
Paweł Zakrzewski	9e7f79d1ab	cql3/select_statement: require LIMIT and PER PARTITION LIMIT to be strictly positive LIMIT and PER PARTITION LIMIT limit the number of rows returned or taken into consideration by a query. It makes no logical sense to have this value at less than 1. Cassandra also has this requirement. This patch ensures that the limit value is strictly positive and adds an explicit test for it - it was only tested in a test ported from Cassandra, that is disabled due to other issues. Closes scylladb/scylladb#23013	2025-03-03 08:13:27 +02:00
Avi Kivity	f3eade2f62	treewide: relicense to ScyllaDB-Source-Available-1.0 Drop the AGPL license in favor of a source-available license. See the blog post [1] for details. [1] https://www.scylladb.com/2024/12/18/why-were-moving-to-a-source-available-license/	2024-12-18 17:45:13 +02:00
Paweł Zakrzewski	cb1483037c	cql3: show different messages for LIMIT and PER PARTITION LIMIT in get_limit select_statement::get_limit is used to evaluate the LIMIT value for both LIMIT and PER PARTITION LIMIT. This change fixes the error message for incorrect values passed by the user.	2024-11-18 17:56:53 +01:00
Botond Dénes	46563d719f	replica/mutation_dump: enforce pinning of effective replication map By making it a required argument, we make sure the topology version is pinned for the duration of the query. This is needed because mutation dump queries bypass the storage proxy, where this pinning usually takes place. So it has to be enforced here.	2024-08-22 06:24:06 -04:00
Avi Kivity	3de4e8f91b	Merge 'cql: process LIMIT for GROUP BY select queries' from Paweł Zakrzewski This change fixes #17237, fixes #5361 and fixes #5362 by passing the limit value down the call chain in cql3. A test is also added. fixes #17237 fixes #5361 fixes #5362 The regression happened in 5.4 as we changed the way GROUP BY is processed in `432cb02` - to force aggregation when it is used. The LIMIT value was not passed to aggregations and thus we failed to adhere to it. We want to backport this fix to 5.4 and 6.0 to have continuous correct results for the test case from #17237. This patch consists of 4 commits: - fa4225ea0fac2057b7a9976f57dc06bcbd900cd4 - cql3: respect the user-defined page size in aggregate queries - a precondition for this patch to be implementable - 8fbe69e74dca16ed8832d9a90489ca47ba271d0b - cql3/select_statement: simplify the get_limit function - the `do_get_limit()` function did a lot of legwork that should not be associated with it. This change makes it trivial and makes its callers do additional checks (for unset guards, or for an aggregate query) - 162828194a2b88c22fbee335894ff045dcc943c9 - cql3: process LIMIT for GROUP BY queries - pass the limit value down the chain and make use of it. This is the actual fix to #17237 - b3dc6de6d6cda8f5c09b01463bb52f827a6a00b4 - test/cql-pytest: Add test for GROUP BY queries with LIMIT - tests Closes scylladb/scylladb#18842 * github.com:scylladb/scylladb: test/cql-pytest: Add test for GROUP BY queries with LIMIT cql3: process LIMIT for GROUP BY queries cql3/select_statement: simplify the get_limit function cql3: respect the user-defined page size in aggregate queries	2024-08-14 17:54:59 +03:00
Łukasz Paszkowski	309ba68692	select_statement: Execute reversed query in native format Use a reversed schema and a native reversed slice when constructing a read_command and executing a reversed select statement. The read_command created this way is passed further down to query_pagers::pager and storage::proxy::query_result, which transform it to the format they accept/know, i.e. legacy.	2024-08-13 10:03:46 +02:00
Paweł Zakrzewski	e7ae7f3662	cql3: process LIMIT for GROUP BY queries Currently LIMIT is not passed to the query executor at all, and it was just an accident that it worked for the case referenced in #17237. This change passes the limit value down the chain.	2024-08-11 09:08:43 +02:00
Paweł Zakrzewski	3838ad64b3	cql3/select_statement: simplify the get_limit function The get_limit() function performed tasks outside of its scope - for example checked if the statement was an aggregate. This change moves the onus of the check to the caller.	2024-08-11 09:08:43 +02:00
Kefu Chai	2dbf044b91	cql3: do not include unused headers these unused includes were identified by clangd. see https://clangd.llvm.org/guides/include-cleaner#unused-include-warning for more details on the "Unused include" warning. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#16791	2024-01-16 16:43:17 +02:00
sylwiaszunejko	54f22927a3	cql3: send tablet if wrong node/shard is used during select statement	2023-11-22 09:23:43 +01:00
Botond Dénes	23898581d5	cql3: mutation_fragments_select_statement: use host_id instead of node The statement only uses the node to get its host_id later. Simpler to obtain and store only the host_id in the first place.	2023-10-24 03:12:58 -04:00
Gleb Natapov	4ffc39d885	cql3: Extend the scope of group0_guard during DDL statement execution Currently we hold group0_guard only during DDL statement's execute() function, but unfortunately some statements access underlying schema state also during check_access() and validate() calls which are called by the query_processor before it calls execute. We need to cover those calls with group0_guard as well and also move retry loop up. This patch does it by introducing new function to cql_statement class take_guard(). Schema altering statements return group0 guard while others do not return any guard. Query processor takes this guard at the beginning of a statement execution and retries if service::group0_concurrent_modification is thrown. The guard is passed to the execute in query_state structure. Fixes: #13942 Message-ID: <ZNsynXayKim2XAFr@scylladb.com>	2023-08-17 15:52:48 +03:00
Avi Kivity	d57a951d48	Revert "cql3: Extend the scope of group0_guard during DDL statement execution" This reverts commit `70b5360a73`. It generates a failure in group0_test .test_concurrent_group0_modifications in debug mode with about 4% probability. Fixes #15050	2023-08-15 00:26:45 +03:00
Gleb Natapov	70b5360a73	cql3: Extend the scope of group0_guard during DDL statement execution Currently we hold group0_guard only during DDL statement's execute() function, but unfortunately some statements access underlying schema state also during check_access() and validate() calls which are called by the query_processor before it calls execute. We need to cover those calls with group0_guard as well and also move retry loop up. This patch does it by introducing new function to cql_statement class take_guard(). Schema altering statements return group0 guard while others do not return any guard. Query processor takes this guard at the beginning of a statement execution and retries if service::group0_concurrent_modification is thrown. The guard is passed to the execute in query_state structure. Fixes: #13942 Message-ID: <ZNSWF/cHuvcd+g1t@scylladb.com>	2023-08-13 14:19:39 +03:00
Botond Dénes	0b6b00178e	cql3/statements/select_statement: add mutation_fragments_select_statement Not wired in yet. SELECT * FROM MUTATION_FRAGMENTS($table) is a new select statement sub-type, which allows dumping the underlying mutations making up the data of a given table. The output of this statement is mutation-fragments presented as CQL rows. Each row corresponds to a mutation-fragment. Consequently, the output of this statement has a schema that is different than that of the underlying table. Data is always read from the local replica, on which the query is executed. Migrating queries between coordinators is not allowed.	2023-07-19 01:28:28 -04:00
Gleb Natapov	45ce608117	cql3: remove empty statement::validate functions There are a lot of empty overloads for the function so lets remove them and use the one in the parent class instead.	2023-06-22 13:57:33 +03:00
Avi Kivity	f54049322d	cql3: select_statement: split do_execute into fast-path and slow/slower paths select_statement::do_execute() has a fast path where it forwards to execute_without_checking_exception_message_non_aggregate_unpaged(). In this fast path, we aren't paging (a good reason for that is reading a partition without clustering keys) and in the slow/slower paths we page and/or perform complex processing like aggregation. The fast path doesn't need any continuations, but the slow/slower paths do. Split them off so that the slow/slower paths can be coroutinized without impacting the fast path.	2023-06-14 14:24:41 +03:00
Avi Kivity	096e569054	cql3: select_statement: disambiguate execute() overloads There are two execute() overloads, but they don't do the same thing - one is a partial implementation of the other. The same is true of two execute_without_checking_exception_message() overloads. Change the name of the subordinate overload to indicate its role. Overloads should be used when the only difference between overloads is the argument type, not when one does a subset of the other.	2023-06-13 19:28:29 +03:00
Avi Kivity	0b418fa7cf	cql3, transport, tests: remove "unset" from value type system The CQL binary protocol introduced "unset" values in version 4 of the protocol. Unset values can be bound to variables, which cause certain CQL fragments to be skipped. For example, the fragment `SET a = :var` will not change the value of `a` if `:var` is bound to an unset value. Unsets, however, are very limited in where they can appear. They can only appear at the top-level of an expression, and any computation done with them is invalid. For example, `SET list_column = [3, :var]` is invalid if `:var` is bound to unset. This causes the code to be littered with checks for unset, and there are plenty of tests dedicated to catching unsets. However, a simpler way is possible - prevent the infiltration of unsets at the point of entry (when evaluating a bind variable expression), and introduce guards to check for the few cases where unsets are allowed. This is what this long patch does. It performs the following: (general) 1. unset is removed from the possible values of cql3::raw_value and cql3::raw_value_view. (external->cql3) 2. query_options is fortified with a vector of booleans, unset_bind_variable_vector, where each boolean corresponds to a bind variable index and is true when it is unset. 3. To avoid churn, two compatibility structs are introduced: cql3::raw_value{,_view}_vector_with_unset, which can be constructed from a std::vector<raw_value{,_view/}>, which is what most callers have. They can also be constructed with explicit unset vectors, for the few cases they are needed. (cql3->variables) 4. query_options::get_value_at() now throws if the requested bind variable is unset. This replaces all the throwing checks in expression evaluation and statement execution, which are removed. 5. A new query_options::is_unset() is added for the users that can tolerate unset; though it is not used directly. 6. A new cql3::unset_operation_guard class guards against unsets. It accepts an expression, and can be queried whether an unset is present. Two conditions are checked: the expression must be a singleton bind variable, and at runtime it must be bound to an unset value. 7. The modification_statement operations are split into two, via two new subclasses of cql3::operation. cql3::operation_no_unset_support ignores unsets completely. cql3::operation_skip_if_unset checks if an operand is unset (luckily all operations have at most one operand that tolerates unset) and applies unset_operation_guard to it. 8. The various sites that accept expressions or operations are modified to check for should_skip_operation(). These are the loops around operations in update_statement and delete_statement, and the checks for unset in attributes (LIMIT and PER PARTITION LIMIT) (tests) 9. Many unset tests are removed. It's now impossible to enter an unset value into the expression evaluation machinery (there's just no unset value), so it's impossible to test for it. 10. Other unset tests now have to be invoked via bind variables, since there's no way to create an unset cql3::expr::constant. 11. Many tests have their exception message match strings relaxed. Since unsets are now checked very early, we don't know the context where they happen. It would be possible to reintroduce it (by adding a format string parameter to cql3::unset_operation_guard), but it seems not to be worth the effort. Usage of unsets is rare, and it is explicit (at least with the Python driver, an unset cannot be introduced by omission). I tried as an alternative to wrap cql3::raw_value{,_view} (that doesn't recognize unsets) with cql3::maybe_unset_value (that does), but that caused huge amounts of churn, so I abandoned that in favor of the current approach. Closes #12517	2023-01-16 21:10:56 +02:00
Avi Kivity	7f29efa0ad	cql3: select_statement: split process_results() into fast path and complex path This will allow us to coroutinize the complex path without adding an allocation to the fast path.	2022-12-04 21:30:45 +02:00
Jan Ciolek	76bf75a9d3	cql3: Use expression for index restrictions Restrictions that might be used by an index are currently being kept as shared_ptr<restrictions>. This stands in the way of replacing _partition_key_restrictions with an expression, as an expression can't be cast to shared_ptr<restriction>. Change shared_ptr<restriction> to expression everywhere where necessary in index operations. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2022-07-01 16:29:11 +02:00
Pavel Emelyanov' via ScyllaDB development	a78af050fd	cql: Constify select_statement restrictions It is in fact immutable (both the pointer and the object it points to), so is the pointer copy returned by get_restrictions() method, so are those propagated to filtering stuff. tests: https://jenkins.scylladb.com/job/releng/job/Scylla-CI/1028 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20220624083351.24970-1-xemul@scylladb.com>	2022-06-24 12:27:36 +03:00
Avi Kivity	5937b1fa23	treewide: remove empty comments in top-of-files After `fcb8d040` ("treewide: use Software Package Data Exchange (SPDX) license identifiers"), many dual-licensed files were left with empty comments on top. Remove them to avoid visual noise. Closes #10562	2022-05-13 07:11:58 +02:00
Avi Kivity	8ab20bae68	Merge 'prepared_statements: Invalidate batch statement too' from Eliran Sinvani It seems that batch prepared statements always return false for depends_on_keyspace and depends_on_column_family. This in turn renders the removal criteria from the cache always false, which results in the queries not being evicted. Here we change the functions to return the true state, meaning they will return true if any of the sub queries is dependent upon the keyspace or column family. In this fix we first make the API more coherent and then use this new API to implement the batch statement's dependency test. Fixes #10129 Signed-off-by: Eliran Sinvani <eliransin@scylladb.com> Closes #10132 * github.com:scylladb/scylla: prepared_statements: Invalidate batch statement too cql3 statements: Change dependency test API to express better its purpose	2022-03-07 14:00:05 +02:00
Nadav Har'El	fa7a302130	cross-tree: split coordinator_result from exceptions.hh Recently, coordinator_result was introduced as an alternative for exceptions. It was placed in the main "exceptions/exceptions.hh" header, which virtually every single source file in Scylla includes. But unfortunately, it brings in some heavy header files and templates, leading to a lot of wasted build time - ClangBuildAnalyzer measured that we include exceptions.hh in 323 source files, taking almost two seconds each on average. In this patch, we split the coordinator_result feature into a separate header file, "exceptions/coordinator_result", and only the few places which need it include the header file. Unfortunately, some of these few places are themselves headers, so the new header file ends up being included in 100 source files - but 100 is still much less than 323 and perhaps we can reduce this number later. After this patch, the total Scylla object-file size is reduced by 6.5% (the object size is a proxy for build time, which I didn't directly measure). ClangBuildAnalyzer reports that now each of the 323 includes of exceptions.hh only takes 80ms, coordinator_result.hh is only included 100 times, and virtually all the cost to include it comes from Boost's result.hh (400ms per inclusion). Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20220228204323.1427012-1-nyh@scylladb.com>	2022-03-02 10:12:57 +02:00
Eliran Sinvani	bf50dbd35b	cql3 statements: Change dependency test API to express better its purpose Cql statements used to have two API functions, depends_on_keyspace and depends_on_column_family. The latter took as a parameter only a table name, which makes no sense. There could be multiple tables with the same name, each in a different keyspace, and it doesn't make sense to generalize the test - i.e. to ask "Does a statement depend on any table named XXX?" In this change we unify the two calls into one - depends_on, which takes a keyspace name and optionally also a table name. That way every logical dependency test that makes sense is supported by a single API call.	2022-02-27 11:48:03 +02:00
Piotr Dulikowski	ddf049738d	indexed_table_select_statement: return some exceptions as exception messages Adjusts the indexed_table_select_statement so that it uses the result-aware methods in storage_proxy and propagates failed results as result_message::exception.	2022-02-22 16:25:21 +01:00
Piotr Dulikowski	3a4d3f3175	select_statement: implement execute_without_checking_exception_message The select_statement will be able to propagate coordinator failures without throwing, so it's important to override the default implementations of execute and execute_without... so that the first calls the latter and not the other way around.	2022-02-22 16:25:21 +01:00
Piotr Dulikowski	df7668797b	select_statement: introduce helpers for working with failed results Adds: - Includes for result-related helper methods (to be used in later commits), - Alias for coordinator_result, - The wrap_result_to_error_message function - a bit similar to utils::result_wrap. Adapts a callable T -> shared_ptr<result_message> to take result<T> -> shared_ptr<result_message>. If the result is failed, it converts it into result_message::exception and returns.	2022-02-22 16:25:21 +01:00
Avi Kivity	fcb8d040e8	treewide: use Software Package Data Exchange (SPDX) license identifiers Instead of lengthy blurbs, switch to single-line, machine-readable standardized (https://spdx.dev) license identifiers. The Linux kernel switched long ago, so there is strong precedent. Three cases are handled: AGPL-only, Apache-only, and dual licensed. For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0), reasoning that our changes are extensive enough to apply our license. The changes were applied mechanically with a script, except to licenses/README.md. Closes #9937	2022-01-18 12:15:18 +01:00
Pavel Emelyanov	da4c29105d	select_statement: Replace all proxy-s with query_processor This is the largest user of proxy argument. Fix them all and their callers (all sit in the same .cc file). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-12-23 10:54:28 +03:00
Pavel Emelyanov	bce2ed9c6c	cql3: Make execution stages carry query_processor over The batch_ , modification_ and select_ statements get proxy from query processor just to push it through execution stage. Simplify that by pushing the query processor itself. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-12-23 10:53:44 +03:00
Pavel Emelyanov	b990ca5550	cql3: Make .validate() and .check_access() accept query_processor This is mostly a sed script that replaces methods' first argument plus fixes of compiler-generated errors. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-12-23 10:53:44 +03:00
Avi Kivity	d768e9fac5	cql3, related: switch to data_dictionary Stop using database (and including database.hh) for schema related purposes and use data_dictionary instead. data_dictionary::database::real_database() is called from several places, for these reasons: - calling yet-to-be-converted code - callers with a legitimate need to access data (e.g. system_keyspace) but with the ::database accessor removed from query_processor. We'll need to find another way to supply system_keyspace with data access. - to gain access to the wasm engine for testing whether user-defined functions compile. We'll have to find another way to do this as well. The change is a straightforward replacement. One case in modification_statement had to change a capture, but everything else was just a search-and-replace. Some files that lost "database.hh" gained "mutation.hh", which they previously had access to through "database.hh".	2021-12-15 13:54:23 +02:00
Pavel Emelyanov	b0a8c153f7	select_statement: Remove unused proxy args and captures The generate_view_paging_state_from_base_query_results() has unused proxy argument that's carried over quite a long stack for nothing. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20211210175203.26197-1-xemul@scylladb.com>	2021-12-10 20:39:55 +02:00
Jan Ciolek	075b3a45fd	select_statement: Store whether restrictions need filtering in a variable Instead of calculating _restrictions->need_filtering() we can calculate it only once and then use this computed variable. It turns out that _restrictions->need_filtering() is called during execution of prepared statements and it has to scan through the whole AST, so doing it only once gives us a performance gain. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2021-12-03 17:01:09 +01:00
Jan Ciolek	a24d06c195	cql3: Remove term in select_statement Replace all uses of term with expression in cql3/statements/select_statement Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2021-10-28 20:55:09 +02:00
Avi Kivity	daf028210b	build: enable -Winconsistent-missing-override warning This warning can catch a virtual function that thinks it overrides another, but doesn't, because the two functions have different signatures. This isn't very likely since most of our virtual functions override pure virtuals, but it's still worth having. Enable the warning and fix numerous violations. Closes #9347	2021-09-15 12:55:54 +03:00
Piotr Sarna	7506f44c77	cql3: use existing constant for max result in indexed statements Original code which introduced enforcing page limits for indexed statements created a new constant for max result size in bytes. Botond reported that we already have such a constant, so it's now used instead of reinventing it from scratch. Closes #8839	2021-06-10 11:08:54 +03:00
Avi Kivity	3e3003fcc1	Merge 'cql3: limit the concurrency of indexed statements' from Piotr Sarna Indexed select statements fetch primary key information from their internal materialized views and then use it to query the base table. Unfortunately, the current mechanism for retrieving base table rows makes it easy to overwhelm the replicas with unbounded concurrency - the number of concurrent ops is increased exponentially until a short read is encountered, but it's not enough to cap the concurrency - if data is fetched row-by-row, then short reads usually don't occur and as a result it's easy to see concurrency of 1M or higher. In order to avoid overloading the replicas, the concurrency of indexed queries is now capped at 4096 and additionally throttled if enough results are already fetched. For paged queries it means that the query returns as soon as 1MB of data is ready, and for unpaged ones the concurrency will no longer be doubled as soon as the previous iteration fetched 1MB of results. The fixed 4096 value can be subject to debate, its reasoning is as follows: for 2KiB rows, so moderately large but not huge, they result in fetching 10MB of data, which is the granularity used by replicas. For 200B rows, which is rather small, the result would still be around 1MB. At the same time, 4096 separate tasks also means 4096 allocations, so increasing the number also strains the allocator. Fixes #8799 Tests: unit(release), manual: observing metrics of modified index_paging_test Closes #8814 * github.com:scylladb/scylla: cql3: limit the transitional result size for indexed queries cql3: return indexed pages after 1MB worth of data cql3: limit the concurrency of indexed statements	2021-06-07 18:00:51 +03:00
Piotr Sarna	60e55b6c7f	cql3: return indexed pages after 1MB worth of data Currently there's no practical limit of the resulting page size for an indexed query, because it simply translates a page worth of base primary keys into base rows. In order to avoid sending too large pages, the result is returned after hitting a 1MB limit.	2021-06-07 16:05:50 +02:00
Piotr Sarna	8eeac10ded	cql3: limit the concurrency of indexed statements Indexed select statements fetch primary key information from their internal materialized views and then use it to query the base table. Unfortunately, the current mechanism for retrieving base table rows makes it easy to overwhelm the replicas with unbounded concurrency - the number of concurrent ops is increased exponentially until a short read is encountered, but it's not enough to cap the concurrency - if data is fetched row-by-row, then short reads usually don't occur and as a result it's easy to see concurrency of 1M or higher. In order to avoid overloading the replicas, the concurrency of indexed queries is now capped at 4096. The number can be subject to debate, its reasoning is as follows: for 2KiB rows, so moderately large but not huge, they result in fetching 10MB of data, which is the granularity used by replicas. For 200B rows, which is rather small, the result would still be around 1MB. At the same time, 4096 separate tasks also means 4096 allocations, so increasing the number also strains the allocator. Fixes #8799 Tests: unit(release), manual: observing metrics of modified index_paging_test	2021-06-07 15:56:15 +02:00
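
The throttling policy described in the two indexed-statement concurrency entries above (3e3003fcc1 and 8eeac10ded) can be sketched roughly as follows. This is only an illustration of the policy as the commit messages state it, not the code from the repository: the constant and function names are hypothetical, and only the numbers (a cap of 4096 and a 1 MB result threshold) come from the messages.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical names; the values are the ones quoted in the commit messages.
constexpr std::size_t concurrency_cap = 4096;            // max concurrent base-table reads
constexpr std::size_t result_size_threshold = 1u << 20;  // 1 MB of fetched results

// Concurrency for the next iteration: keep doubling until the cap is reached,
// but stop doubling once the previous iteration already fetched >= 1 MB.
constexpr std::size_t next_concurrency(std::size_t current, std::size_t bytes_fetched) {
    if (bytes_fetched >= result_size_threshold) {
        return current;
    }
    return std::min(current * 2, concurrency_cap);
}

// Simulates an unpaged indexed query over `total_keys` base-table keys, assuming
// every row has the same size. Returns the concurrency used in each iteration.
std::vector<std::size_t> simulate_schedule(std::size_t total_keys, std::size_t row_bytes) {
    std::vector<std::size_t> schedule;
    std::size_t concurrency = 1;
    std::size_t done = 0;
    while (done < total_keys) {
        const std::size_t batch = std::min(concurrency, total_keys - done);
        schedule.push_back(batch);
        done += batch;
        concurrency = next_concurrency(concurrency, batch * row_bytes);
    }
    return schedule;
}
```

With uniform 2 KiB rows, the simulated concurrency stops growing at 512, because 512 rows already amount to 1 MB per iteration; with very small rows it keeps doubling until it reaches the 4096 cap.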


133 Commits