scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-31 12:06:44 +00:00

Author	SHA1	Message	Date
Kefu Chai	fa8eaab62b	build: remove duplicated test this change has no impact on `build.ninja` generated by `configure.py`. as we are using a `set` for tracking the tests to be built. but it's still an improvement, as we should not add duplicated entries in a set when initializing it. there are two occurrences of `test/boost/double_decker_test`, the one which is in the club of the local cluster of collections tests - bptree, btree, radix_tree and double_decker are preserved. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #14478	2023-07-05 15:43:04 +03:00
Kefu Chai	e4697e2bd2	sstable: remove stale comment this comment should have been removed in `f014ccf369`. but better late than never. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #14497	2023-07-05 15:42:11 +03:00
Pavel Emelyanov	e91f95a629	Merge 's3/test: restructure object_store test into a pytest based test suite' from Kefu Chai in this series, test/object_storage is restructured into a pytest based test. this paves the road to a test suite that covers more use cases, so we can add some more lower-level tests for tiered/caching-store. Closes #14165 * github.com:scylladb/scylladb: s3/test: do not return ip in managed_cluster() s3/test: verify the behavior with asserts s3/test: restructure object_store/run into a pytest s3/test: extract get_scylla_with_s3_cmd() out s3/test: s/restart_with_dir/kill_with_dir/ s3/test: vendor run_with_dir() and friends s3/test: remove get_tempdir() s3/test: extract managed_cluster() out	2023-07-05 15:40:43 +03:00
Gleb Natapov	c42a91ec72	cql3: Extend the scope of group0_guard during DDL statement execution Currently we hold group0_guard only during DDL statement's execute() function, but unfortunately some statements access underlying schema state also during check_access() and validate() calls which are called by the query_processor before it calls execute. We need to cover those calls with group0_guard as well and also move retry loop up. This patch does it by introducing new function to cql_statement class take_guard(). Schema altering statements return group0 guard while others do not return any guard. Query processor takes this guard at the beginning of a statement execution and retries if service::group0_concurrent_modification is thrown. The guard is passed to the execute in query_state structure. Fixes: #13942 Message-Id: <ZJ2aeNIBQCtnTaE2@scylladb.com>	2023-07-05 14:38:34 +02:00
Pavel Emelyanov	dfff5f2f2e	Merge 'test/pylib: retry if minio_server is not ready and define a name for alias' from Kefu Chai there is a chance that minio_server is not ready to serve after launching the server executable process. so we need to retry until the first "mc" command is able to talk to it. in this change, method `mc()` is added to run minio client, so we can retry the command before it times out. and it allows us to ignore the failure or specify the timeout. this should ready the minio server before tests start to connect to it. also, in this change, instead of hardwiring the alias of "local" in the code, define a variable for it. less repeating this way. Fixes https://github.com/scylladb/scylladb/issues/1719 Closes #14517 * github.com:scylladb/scylladb: test/pylib: do not hardwire alias to "local" test/pylib: retry if minio_server is not ready	2023-07-05 12:32:58 +03:00
Kefu Chai	9080f8842b	s3/test: do not return ip in managed_cluster() let's just use cluster.contact_points for retrieving the IP address of the scylla node in this single-node cluster. so the name of managed_cluster() is less weird. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-07-05 17:07:39 +08:00
Kefu Chai	ec6410653f	s3/test: verify the behavior with asserts instead of assigning to "success", let's use assert for this purpose. simpler this way. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-07-05 17:07:21 +08:00
Kefu Chai	471d75c6c6	s3/test: restructure object_store/run into a pytest instead of using a single run to perform the test, restructure it into a pytest based test suite with a single test case. this should allow us to add more tests exercising the object-storage and cached/tiered storage in future. * add fixtures so they can be reused by tests * use tmpdir fixture for managing the tmpdir, see https://docs.pytest.org/en/6.2.x/tmpdir.html#the-tmpdir-fixture * perform part of the teardown in the "test_tempdir()" fixture * change the type of test from "Run" to "Python" * rename "run" to "test_basic.py" * optionally start the minio server if the settings are not found in command line or env variables, so that the tests are self-contained without the fixture setup by test.py. * instead of sys.exit(), use assert statement, as this is what pytest uses. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-07-05 17:05:13 +08:00
Kefu Chai	bffaf84395	s3/test: extract get_scylla_with_s3_cmd() out * define a dedicated S3_server class which duck types MinioServer. it will be used to represent S3 server in place of MinioServer if S3 is used for testing * prepare object_storage.yaml in get_scylla_with_s3(), so it is more clear that we are using the same set of settings for launching scylla Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-07-05 16:49:04 +08:00
Kefu Chai	f74218f434	s3/test: s/restart_with_dir/kill_with_dir/ replace the restart_with_dir() with kill_with_dir(), so that we can simplify the usage of managed_cluster() by enabling it to start and stop the single-node cluster. with this change, the caller does not need to run the scylla and pass its pid to this function any more. since the restart_with_dir() call is superseded by managed_cluster(), which tears down the cluster, teardown() is now only responsible to print out the log file. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-07-05 16:48:25 +08:00
Kefu Chai	a6bb5864ff	s3/test: vendor run_with_dir() and friends so we don't need to mess with cql-pytest/run.py, which is used by cql-pytest. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-07-05 16:48:04 +08:00
Kefu Chai	b45049c968	s3/test: remove get_tempdir() to match with another call of managed_cluster(), so it's clear that we are just reusing test_tempdir. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-07-05 16:45:14 +08:00
Kefu Chai	a5a87d81c6	s3/test: extract managed_cluster() out for setting up the cluster and tearing it down. this helps to indent the code so that the lifecycle of the cluster is visually explicit. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-07-05 16:45:14 +08:00
Kefu Chai	1faf50fc05	test/pylib: do not hardwire alias to "local" define a variable for it. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-07-05 15:58:41 +08:00
Kefu Chai	d55cfdc152	test/pylib: retry if minio_server is not ready there is a chance that minio_server is not ready to serve after launching the server executable process. so we need to retry until the first "mc" command is able to talk to it. in this change, method `mc()` is added to run minio client, so we can retry the command before it times out. and it allows us to ignore the failure or specify the timeout. this should ready the minio server before tests start to connect to it. Fixes #1719 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-07-05 15:57:59 +08:00
Konstantin Osipov	b9c2b326bc	raft: do not update raft address map with obsolete gossip data It is possible that a gossip message from an old node is delivered out of order during a slow boot and the raft address map overwrites a new IP address with an obsolete one, from the previous incarnation of this node. Take into account the node restart counter when updating the address map. A test case requires a parameterized error injection, which we don't support yet. Will be added as a separate commit. Fixes #14257 Refs #14357 Closes #14329	2023-07-05 00:16:28 +02:00
Avi Kivity	0f59b17056	cql3: select_statement: don't copy metadata object needlessly It's a shared_ptr<const metadata>, so it's safe to pass around. perf-simple-query: before: 211989.40 tps ( 62.1 allocs/op, 13.1 tasks/op, 43812 insns/op, 0 errors) 217889.09 tps ( 62.1 allocs/op, 13.1 tasks/op, 43713 insns/op, 0 errors) 211418.75 tps ( 62.1 allocs/op, 13.1 tasks/op, 43782 insns/op, 0 errors) 217388.46 tps ( 62.1 allocs/op, 13.1 tasks/op, 43733 insns/op, 0 errors) 211528.74 tps ( 62.1 allocs/op, 13.1 tasks/op, 43766 insns/op, 0 errors) after: 215241.86 tps ( 61.1 allocs/op, 13.1 tasks/op, 43563 insns/op, 0 errors) 216172.41 tps ( 61.1 allocs/op, 13.1 tasks/op, 43562 insns/op, 0 errors) 212591.73 tps ( 61.1 allocs/op, 13.1 tasks/op, 43586 insns/op, 0 errors) 212217.28 tps ( 61.1 allocs/op, 13.1 tasks/op, 43553 insns/op, 0 errors) 215863.47 tps ( 61.1 allocs/op, 13.1 tasks/op, 43559 insns/op, 0 errors) About 200 instructions saved. Closes #14499	2023-07-04 16:41:51 +03:00
Marcin Maliszkiewicz	6424dd5ec4	alternator: close output_stream when exception is thrown during response streaming When exception occurs and we omit closing output_stream then the whole process is brought down by an assertion in ~output_stream. Fixes https://github.com/scylladb/scylladb/issues/14453 Relates https://github.com/scylladb/scylladb/issues/14403 Closes #14454	2023-07-04 16:15:08 +03:00
Pavel Emelyanov	3679792f49	Merge 'test/pylib: allow run minio_server.py as a stand-alone tool' from Kefu Chai this would allow developer to run a minio server for testing, for instance, s3_test. Closes #14485 * github.com:scylladb/scylladb: test/pylib: chmod +x minio_server.py test/pylib: allow run minio_server.py as a stand-alone tool	2023-07-04 13:41:51 +03:00
Kefu Chai	c005b6dce0	test/pylib: chmod +x minio_server.py add a shebang line. so we can just launch a minio_server using ```console test/pylib/minio_server.py --host 127.0.0.1 ``` Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-07-04 13:19:34 +08:00
Kefu Chai	2bae0b9aa8	test/pylib: allow run minio_server.py as a stand-alone tool this would allow developer to run a minio server for testing, for instance, s3_test, using something like: ```console $ python3 test/pylib/minio_server.py --host 127.0.0.1 tempdir='/tmp/tmpfoobar-minio' export S3_SERVER_ADDRESS_FOR_TEST=127.0.0.1 export S3_SERVER_PORT_FOR_TEST=900 export S3_PUBLIC_BUCKET_FOR_TEST=testbucket ``` and developer is supposed to copy-and-paste the `export` commands to prepare the environmental variables for the test using the minio server. the tempdir is used for the rundir of minio, and it is also used for holding the log file of this tool. one might want to check it when necessary. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-07-04 13:14:42 +08:00
Tomasz Grabiec	7d35cf8657	Merge 'migration_manager: disable schema pulls when schema is Raft-managed' from Kamil Braun We want to disable `migration_manager` schema pulls and make schema managed only by Raft group 0 if Raft is enabled. This will be important with Raft-based topology, when schema will depend on topology (e.g. for tablets). We solved the problem partially in PR #13695. However, it's still possible for a bootstrapping node to pull schema in the early part of bootstrap procedure, before it sets up group 0, because of how the currently used `_raft_gr.using_raft()` check is implemented. Here's the list of cases: - If a node is bootstrapping in non-Raft mode, schema pulls must remain enabled. - If a node is bootstrapping in Raft mode, it should never perform a schema pull. - If a bootstrapped node is restarting in non-Raft mode but with Raft feature enabled (which means we should start upgrading to use Raft), or restarting in the middle of Raft upgrade procedure, schema pulls must remain enabled until the Raft upgrade procedure finishes. This is also the case of restarting after RECOVERY. - If a bootstrapped node is restarting in Raft mode, it should never perform a schema pull. The `raft_group0` service is responsible for setting up Raft during boot and for the Raft upgrade procedure. So this is the most natural place to make the decision that schema pulls should be disabled. Instead of trying to come up with a correct condition that fully covers the above list of cases, store a `bool` inside `migration_manager` and set it from `raft_group0` function at the right moment - when we decide that we should boot in Raft mode, or restart with Raft, or upgrade. Most of the conditions are already checked in `setup_group0_if_exist`, we just need to set the bool. Also print a log message when schema pulls are disabled. 
Fix a small bug in `migration_manager::get_schema_for_write` - it was possible for the function to mark schema as synced without actually syncing it if it was running concurrently to the Raft upgrade procedure. Correct some typos in comments and update the comments. Fixes #12870 Closes #14428 * github.com:scylladb/scylladb: raft_group_registry: remove `has_group0()` raft_group0_client: remove `using_raft()` migration_manager: disable schema pulls when schema is Raft-managed	2023-07-03 23:54:34 +02:00
Nadav Har'El	ec77172b4b	Merge 'cql3: convert the SELECT clause evaluation phase to expressions' from Avi Kivity SELECT clause components (selectors) are currently evaluated during query execution using a stateful class hierarchy. This state is needed to hold intermediate state while aggregating over multiple rows. Because the selectors are stateful, we must re-create them each query using a selector_factory hierarchy. We'd like to convert all of this to the unified expression evaluation machinery, so we can have just one grammar for expressions, and just one way to evaluate expressions, but the statefulness makes this complex. In commit `59ab9aac44` "(Merge 'functions: reframe aggregate functions in terms of scalar functions' from Avi Kivity)", we made aggregate functions stateless, moving their state to aggregate_function_selector::_accumulator, and therefore into the class hierarchy we're addressing now. Another reason for keeping state is that selectors that aren't aggregated capture the first value they see in a GROUP BY group. Since expressions can't contain state directly, we break apart expressions that contain aggregate functions into two: an inner expression that processes incoming rows within a group, and an outer expression that generates the group's output. The two expressions communicate via a newly introduced expression element: a temporary. The problem of non-aggregated columns requiring state is solved by encapsulating those columns in an internal aggregate function, called the "first" function. In terms of performance, this series has little effect, since the common case of selectors that only contain direct column references without transformations is evaluated via a fast path (`simple_selection`). This fast-path is preserved with almost no changes. While the series makes it possible to start to extend the grammar and unify expression syntaxes, it does not do so. The grammar is unchanged. 
There is just one breaking change: the `SELECT JSON` statement generates json object field names based on the input selectors. In one case the name of the field has changed, but it is an esoteric case (where a function call is selected as part of `SELECT JSON`), and the new behavior is compatible with Cassandra. Closes #14467 * github.com:scylladb/scylladb: cql3: selection: drop selector_factories, selectables, and selectors cql3: select_statement: stop using selector_factories in SELECT JSON cql3: selection: don't create selector_factories any more cql3: selection: collect column_definitions using expressions cql3: selection: reimplement selection::is_aggregate() cql3: selection: evaluate aggregation queries via expr::evaluate() cql3: selection, select_statement: fine tune add_column_for_post_processing() usage cql3: selection: evaluate non-aggregating complex selections using expr::evaluate() cql3: selection: store primary key in result_set_builder cql3: expression: fix field_selection::type interpretation by evaluate() cql3: selection: make result_set_builder::current non-optional<> cql3: selection: simplify row/group processing cql3: selection: convert requires_thread to expressions cql: selection: convert used_functions() to expressions cql3: selection: convert is_reducible/get_reductions to expressions cql3: selection: convert is_count() to expressions cql3: selection convert contains_ttl/contains_writetime to work on expressions cql3: selection: make simple_selectors stateless cql3: expression: add helper to split expressions with aggregate functions cql3: selection: short-circuit non-aggregations cql3: selection: drop validate_selectors cql3: select_statement: force aggregation if GROUP BY is used cql3: select_statement: levellize aggregation depth cql3: selection: skip first_function when collecting metadata cql3: select_statement: explicitly disable automatic parallelization with no aggregates cql3: expression: introduce temporaries cql3: 
select_statement: use prepared selectors cql3: selection: avoid selector_factories in collect_metadata() cql3: expressions: add "metadata mode" formatter for expressions cql3: selection: convert collect_metadata() to the prepared expression domain cql3: selection: convert processes_selection to work on prepared expressions cql3: selection: prepare selectors earlier cql3: raw_selector: deinline cql3: expression: reimplement verify_no_aggregate_functions() cql3: expression: add helpers to manage an expression's aggregation depth cql3: expression: improve printing of prepared function calls cql3: functions: add "first" aggregate function	2023-07-03 23:21:33 +03:00
Avi Kivity	66c47d40e6	cql3: selection: drop selector_factories, selectables, and selectors The whole class hierarchy is no longer used by anything and we can just delete it.	2023-07-03 19:45:17 +03:00
Avi Kivity	d9cf81f1a6	cql3: select_statement: stop using selector_factories in SELECT JSON SELECT JSON uses selector_factories to obtain the names of the fields to insert into the json object, and we want to drop selector_factories entirely. Switch instead to the ":metadata" mode of printing expressions, which does what we want. Unfortunately, the switch changes how system functions are converted into field names. A function such as unixtimestampof() is now rendered as "system.unixtimestampof()"; before it did not have the keyspace prefix. This is a compatibility problem, albeit an obscure one. Since the new behavior matches Cassandra, and the odds of hitting this are very low, I think we can allow the change.	2023-07-03 19:45:17 +03:00
Avi Kivity	039472ffb9	cql3: selection: don't create selector_factories any more We no longer use selector_factories for anything, so we can drop them.	2023-07-03 19:45:17 +03:00
Avi Kivity	e521557ce5	cql3: selection: collect column_definitions using expressions The replica needs to know which columns we're interested in. Iterate and recurse into all selector expressions to collect all mentioned columns. We use the same algorithm that create_factories_and_collect_column_definitions() uses, even though it is quadratic, to avoid causing surprises.	2023-07-03 19:45:17 +03:00
Avi Kivity	7bd317ace4	cql3: selection: reimplement selection::is_aggregate() We can get rid of the last use of selector_factories by reimplementing is_aggregate(). It's simple - if we have an inner loop, we're aggregating.	2023-07-03 19:45:17 +03:00
Avi Kivity	91cdaa72bd	cql3: selection: evaluate aggregation queries via expr::evaluate() When constructing a selection_with_processing, split the selectors into an inner loop and an outer loop with split_aggregation(). We can then reimplement add_input_row() and get_output_row() as follows: - add_input_row(): evaluate the inner loop expressions and store the results in temporaries - get_output_row(): evaluate the outer loop expressions, pulling in values from those temporaries. reset(), which is called between groups, simply copies the initial values gathered by split_aggregation() into the temporaries. The only complexity comes from add_column_for_post_query_processing(), which essentially re-does the work of split_aggregation(). It would be much better if we added the column before split_aggregation() was called, but some refactoring has to take place before that happens.	2023-07-03 19:45:17 +03:00
Avi Kivity	27254c4f50	cql3: selection, select_statement: fine tune add_column_for_post_processing() usage In three cases we need to consult a column that's possibly not explicitly selected: - for the WHERE clause - for GROUP BY - for ORDER BY The return value of the function is the index where the newly-added column can be found. Currently, the index is correct for both the internal column vector and the result set, but soon it won't be. In the first two cases (WHERE clause and GROUP BY), we're interested in the column before grouping, in the last case (ORDER BY) we're interested in the column after grouping, so we need to distinguish between the two. Since we already have selection::index_of() that returns the pre-grouping index, choose the post-grouping index for the return value of selection::add_column_for_post_processing(), and change the GROUP BY code to use index_of(). Comments are added.	2023-07-03 19:45:17 +03:00
Avi Kivity	6bf1bd7130	cql3: selection: evaluate non-aggregating complex selections using expr::evaluate() Now that everything is in place, implement the fast-path transform_input_row() for selection_with_processing. It's a straightforward call to evaluate() in a loop. We adjust add_column_for_post_processing() to also update _selectors, otherwise ORDER BY clauses that require an additional column will not see that column. Since every sub-class implements transform_input_row(), mark the base class declaration as pure virtual.	2023-07-03 19:45:17 +03:00
Avi Kivity	f5eb7fd6dc	cql3: selection: store primary key in result_set_builder expr::evaluate() expects an exploded primary key in its evaluation_inputs structure (this dates back from the conversion of filtering to expressions). But right now, the exploded primary key is only available in the filter. That's easy to fix however: move the primary key containers to result_set_builder and just keep references in the filter. After this, we can evaluate column_value expressions that reference the primary key.	2023-07-03 19:45:17 +03:00
Avi Kivity	0021f77e30	cql3: expression: fix field_selection::type interpretation by evaluate() field_selection::type refers to the type of the selection operation, not the type of the structure being selected. This is what prepare_expression() generates and how all other expression elements work, but evaluate() for field_selection thinks it's the type of the structure, and so fails when it gets an expression from prepare_expression(). Fix that, and adjust the tests.	2023-07-03 19:45:17 +03:00
Avi Kivity	aed01018a3	cql3: selection: make result_set_builder::current non-optional<> Previously, we used the engagedness of result_set_builder::current as a flag, but the previous patch eliminated that and it's always engaged. Remove the optional wrapper to reduce noise.	2023-07-03 19:45:17 +03:00
Avi Kivity	44c8507075	cql3: selection: simplify row/group processing Processing a result set relies on calling result_set_builder::new_row(). This function is quite complex as it has several roles: - complete processing of the previously computed row, if any - determine if GROUP BY grouping has changed, and flush the previous group if so - flush the last group if that's the case This works now, but won't work with expr::evaluate. The reason is that new_row() is called after the partition key and clustering key of the new row have been evaluated, so processing of the previous row will see incorrect data. It works today because we copy the partition key and clustering key into result_set_builder::current, but expr::evaluate uses the exploded partition key and clustering key, which have been clobbered. The solution is to separate the roles. Instead of new_row() that's responsible for completing the previous row and starting a new one, we have start_new_row() that's responsible for what its name says, and complete_row() that's responsible for completing the row and checking for group change. The responsibility for flushing the final group is moved to result_set_builder::build(). This removes the awkward "more_rows_coming" parameter that makes everything more complicated. result_set_builder::current is still optional, but it's always engaged. The next patch will clean that up.	2023-07-03 19:45:17 +03:00
Avi Kivity	877f4f86d2	cql3: selection: convert requires_thread to expressions If any function requires a thread to execute (due to running in Lua or wasm), then the entire selection needs to run in a thread.	2023-07-03 19:45:17 +03:00
Avi Kivity	cbd68abde8	cql: selection: convert used_functions() to expressions used_functions() is used to check whether prepared statements need to be invalidated when user-defined functions change. We need to skip over empty scalar components of aggregates, since these can be defined by users (with the same meaning as if the identity function was used).	2023-07-03 19:45:17 +03:00
Avi Kivity	bfb1acc6d3	cql3: selection: convert is_reducible/get_reductions to expressions The current version of automatic query parallelization works when all selectors are reducible (e.g. have a state_reduction_function member), and all the inputs to the aggregates are direct column selectors without further transformation. The actual column names and reductions need to be packed up for forward_service to be used. Convert is_reducible()/get_reductions() to the expression world. The conversion is fairly straightforward.	2023-07-03 19:45:17 +03:00
Avi Kivity	d99fc29e2d	cql3: selection: convert is_count() to expressions Early versions of automatic query parallelization only supported `SELECT count(*)` with one selector. Convert the check to expressions.	2023-07-03 19:45:17 +03:00
Avi Kivity	d36eb8cea6	cql3: selection convert contains_ttl/contains_writetime to work on expressions contains_ttl/contains_writetime are two attributes of a selection. If a selection contains them, we must ask the replica to send them over; otherwise we don't have data to process. Not sending ttl/writetime saves some effort. The implementation is a straightforward recursive descent using expr::find_in_expression.	2023-07-03 19:45:17 +03:00
Avi Kivity	6c2bb5e1ed	cql3: selection: make simple_selectors stateless Now that we push all GROUP BY queries to selection_with_processing, we always process rows via transform_input_row() and there's no reason to keep any state in simple_selectors. Drop the state and raise an internal error if we're ever called for aggregation.	2023-07-03 19:45:17 +03:00
Avi Kivity	a26516ef65	cql3: expression: add helper to split expressions with aggregate functions Aggregate functions cannot be evaluated directly, since they implicitly refer to state (the accumulator). To allow for evaluation, we split the expression into two: an inner expression that is evaluated over the input vector (once per element). The inner expression calls the aggregation function, with an extra input parameter (the accumulator). The outer expression is evaluated once per input vector; it calls the final function, and its input is just the accumulator. The outer expression also contains any expressions that operate on the result of the aggregate function. The accumulator is stored in a temporary. Simple example: sum(x) is transformed into an inner expression: t1 = (t1 + x) // really sum.aggregation_function and an outer expression: result = t1 // really sum.state_to_result_function Complicated example: scalar_func(agg1(x, f1(y)), agg2(x, f2(y))) is transformed into two inner expressions: t1 = agg1.aggregation_function(t1, x, f1(y)) t2 = agg2.aggregation_function(t2, x, f2(y)) and an outer expression output = scalar_func(agg1.state_to_result_function(t1), agg2.state_to_result_function(t2)) There's a small wart: automatically parallelized queries can generate "reducible" aggregates that have no state_to_result function, since we want to pass the state back to the coordinator. Detect that and short circuit evaluation to pass the accumulator directly.	2023-07-03 19:45:17 +03:00
Avi Kivity	f48ecb5049	cql3: selection: short-circuit non-aggregations Currently, selector evaluation assumes the most complex case where we aggregate, so multiple input rows combine into one output row. In effect the query either specifies an outer loop (for the group) and an inner loop (for input rows), or it only specifies the inner loop; but we always perform the outer and inner loop. Prepare to have a separate path for the non-aggregation case by introducing transform_input_row().	2023-07-03 19:45:17 +03:00
Avi Kivity	4a2428e4ec	cql3: selection: drop validate_selectors It's unused. It dates from the (perhaps better) time when regularity of aggregation across selectors was enforced.	2023-07-03 19:45:17 +03:00
Avi Kivity	432cb02d64	cql3: select_statement: force aggregation if GROUP BY is used GROUP BY is typically used with aggregation. In one case the aggregation is implicit: SELECT a, b, c FROM tab GROUP BY x, y, z One row will appear from each group, even though no aggregation was specified. To avoid this irregularity, rewrite this query as SELECT first(a), first(b), first(c) FROM tab GROUP BY x, y, z This allows us to have different paths for aggregations and non-aggregations, without worrying about this special case.	2023-07-03 19:45:17 +03:00
Avi Kivity	bc6c64e13c	cql3: select_statement: levellize aggregation depth Avoid mixed aggregate/non-aggregate queries by inserting calls to the first() function. This allows us to avoid internal state (simple_selector::_current) and make selector evaluation stateless apart from explicit temporaries.	2023-07-03 19:45:17 +03:00
Avi Kivity	ecdded90cd	cql3: selection: skip first_function when collecting metadata We plan to rewrite aggregation queries that have a non-aggregating selector using the first function, so that all selectors are aggregates (or none are). Prevent the first function from affecting metadata (the auto-generated column names), by skipping over the first function if detected. The input and output types are unchanged so this only affects the name.	2023-07-03 19:45:17 +03:00
Avi Kivity	996e02f5bf	cql3: select_statement: explicitly disable automatic parallelization with no aggregates A query of the form `SELECT foo, count(foo) FROM tab` returns the first value of the foo column along with the count. This can't be parallelized today since the first selector isn't an aggregate. We plan to rewrite the query internally as `SELECT first(foo), count(foo) FROM tab`, in order to make the query more regular (no mixing of aggregates and non-aggregates). However, this will defeat the current check since after the rewrite, all selectors are aggregates. Prepare for this by performing the check on a pre-rewrite variable, so it won't be affected by the query rewrite in the next patch. Note that even though we could add support for running first() in parallel, it's not possible to get the correct results, since first() is not commutative and we don't reduce in order. It's also not a particularly interesting query.	2023-07-03 19:45:17 +03:00
Avi Kivity	778ae2b461	cql3: expression: introduce temporaries Temporaries are similar to bind variables - they are values provided from outside the expression. While bind variables are provided by the user, temporaries are generated internally. The intended use is for aggregate accumulator storage. Currently aggregates store the accumulator in aggregate_function_selector::_accumulator, which means the entire selector hierarchy must be cloned for every query. With expressions, we can have a single expression object reused for many computations, but we need a way to inject the accumulator into an aggregation, which this new expression element provides.	2023-07-03 19:45:17 +03:00
Avi Kivity	7c3ceb6473	cql3: select_statement: use prepared selectors Change one more layer of processing to work on prepared rather than raw selectors. This moves the call to prepare the selectors early in select_statement processing. In turn this changes maybe_jsonize_select_clause() and forward_service's mock_selection() to work in the prepared realm as well. This moves us one step closer to using evaluate() to process the select clause, as the prepared selectors are now available in select_statement. We can't use them yet since we can't evaluate aggregations.	2023-07-03 19:45:17 +03:00
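
As a rough illustration of the inner/outer split described in commit a26516ef65 above, the Python sketch below evaluates aggregates over one group using explicit temporaries. All names in it (`SplitAggregate`, `run_group`, `sum_x`, `count_rows`) are hypothetical and are not ScyllaDB's C++ API. It only mirrors the shape of the transformation, assuming an aggregate can be described by an initial value, an aggregation function and a state-to-result function.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, List


@dataclass(frozen=True)
class SplitAggregate:
    """Hypothetical stand-in for one aggregate after splitting."""
    initial: Any                      # initial accumulator value
    inner: Callable[[Any, Any], Any]  # t = aggregation_function(t, row)
    outer: Callable[[Any], Any]       # result = state_to_result_function(t)


def run_group(aggs: List[SplitAggregate], rows: Iterable[Any]) -> List[Any]:
    # Reset step: copy the initial values into the temporaries.
    temporaries = [a.initial for a in aggs]
    # Inner expressions: evaluated once per input row.
    for row in rows:
        for i, a in enumerate(aggs):
            temporaries[i] = a.inner(temporaries[i], row)
    # Outer expressions: evaluated once per group, reading the temporaries.
    return [a.outer(t) for a, t in zip(aggs, temporaries)]


# sum(x): inner is t1 = t1 + x, outer is result = t1
sum_x = SplitAggregate(initial=0, inner=lambda t, row: t + row["x"], outer=lambda t: t)
# A row counter in the same shape (illustrative only).
count_rows = SplitAggregate(initial=0, inner=lambda t, row: t + 1, outer=lambda t: t)
```

In this sketch the `SplitAggregate` objects hold no state of their own, so the same objects can be reused across groups; only the `temporaries` list changes between calls to `run_group`.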

37710 Commits