Commit Graph

37710 Commits

Author SHA1 Message Date
Kefu Chai
fa8eaab62b build: remove duplicated test
this change has no impact on `build.ninja` generated by `configure.py`.
as we are using a `set` for tracking the tests to be built. but it's
still an improvement, as we should not add duplicated entries in a set
when initializing it.

there are two occurrences of `test/boost/double_decker_test`, the one
which is in the club of the local cluster of collections tests - bptree,
btree, radix_tree and double_decker are preserved.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #14478
2023-07-05 15:43:04 +03:00
Kefu Chai
e4697e2bd2 sstable: remove stale comment
this comment should have been removed in
f014ccf369. but better late than never.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #14497
2023-07-05 15:42:11 +03:00
Pavel Emelyanov
e91f95a629 Merge 's3/test: restructure object_store test into a pytest based test suite' from Kefu Chai
in this series, test/object_storage is restructured into a pytest based test. this paves the road to a test suites covers more use cases. so we can some more lower-level tests for tiered/caching-store.

Closes #14165

* github.com:scylladb/scylladb:
  s3/test: do not return ip in managed_cluster()
  s3/test: verify the behavior with asserts
  s3/test: restructure object_store/run into a pytest
  s3/test: extract get_scylla_with_s3_cmd() out
  s3/test: s/restart_with_dir/kill_with_dir/
  s3/test: vendor run_with_dir() and friends
  s3/test: remove get_tempdir()
  s3/test: extract managed_cluster() out
2023-07-05 15:40:43 +03:00
Gleb Natapov
c42a91ec72 cql3: Extend the scope of group0_guard during DDL statement execution
Currently we hold group0_guard only during DDL statement's execute()
function, but unfortunately some statements access underlying schema
state also during check_access() and validate() calls which are called
by the query_processor before it calls execute. We need to cover those
calls with group0_guard as well and also move retry loop up. This patch
does it by introducing new function to cql_statement class take_guard().
Schema altering statements return group0 guard while others do not
return any guard. Query processor takes this guard at the beginning of a
statement execution and retries if service::group0_concurrent_modification
is thrown. The guard is passed to the execute in query_state structure.

Fixes: #13942
Message-Id: <ZJ2aeNIBQCtnTaE2@scylladb.com>
2023-07-05 14:38:34 +02:00
Pavel Emelyanov
dfff5f2f2e Merge 'test/pylib: retry if minio_server is not ready and define a name for alias' from Kefu Chai
there is chance that minio_server is not ready to serve after
launching the server executable process. so we need to retry until
the first "mc" command is able to talk to it.

in this change, add method `mc()` is added to run minio client,
so we can retry the command before it timeouts. and it allows us to
ignore the failure or specify the timeout. this should ready the
minio server before tests start to connect to it.

also, in this change, instead of hardwiring the alias of "local" in the code,
define a variable for it. less repeating this way.

Fixes https://github.com/scylladb/scylladb/issues/1719

Closes #14517

* github.com:scylladb/scylladb:
  test/pylib: do not hardwire alias to "local"
  test/pylib: retry if minio_server is not ready
2023-07-05 12:32:58 +03:00
Kefu Chai
9080f8842b s3/test: do not return ip in managed_cluster()
let's just use cluster.contact_points for retrieving the IP address
of the scylla node in this single-node cluster. so the name of
managed_cluster() is less weird.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-07-05 17:07:39 +08:00
Kefu Chai
ec6410653f s3/test: verify the behavior with asserts
instead of assigning to "success", let's use assert for this purpose.
simpler this way.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-07-05 17:07:21 +08:00
Kefu Chai
471d75c6c6 s3/test: restructure object_store/run into a pytest
instead of using a single run to perform the test, restructure
it into a pytest based test suite with a single test case.
this should allow us to add more tests exercising the object-storage
and cached/tierd storage in future.

* add fixtures so they can be reused by tests
* use tmpdir fixture for managing the tmpdir, see
  https://docs.pytest.org/en/6.2.x/tmpdir.html#the-tmpdir-fixture
* perform part of the teardown in the "test_tempdir()" fixture
* change the type of test from "Run" to "Python"
* rename "run" to "test_basic.py"
* optionally start the minio server if the settings are not
  found in command line or env variables, so that the tests are
  self-contained without the fixture setup by test.py.
* instead of sys.exit(), use assert statement, as this is
  what pytest uses.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-07-05 17:05:13 +08:00
Kefu Chai
bffaf84395 s3/test: extract get_scylla_with_s3_cmd() out
* define a dedicated S3_server class which duck types MinioServer.
  it will be used to represent S3 server in place of MinioServer if
  S3 is used for testing
* prepare object_storage.yaml in get_scylla_with_s3(), so it is more
  clear that we are using the same set of settings for launching
  scylla

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-07-05 16:49:04 +08:00
Kefu Chai
f74218f434 s3/test: s/restart_with_dir/kill_with_dir/
replace the restart_with_dir() with kill_with_dir(), so
that we can simplify the usage of managed_cluster() by enabling it
to start and stop the single-node cluster. with this change, the caller
does not need to run the scylla and pass its pid to this function
any more.

since the restart_with_dir() call is superseded by managed_cluster(),
which tears down the cluster, teardown() is now only responsible to
print out the log file.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-07-05 16:48:25 +08:00
Kefu Chai
a6bb5864ff s3/test: vendor run_with_dir() and friends
so we don't need to mess up with cql-pytest/run.py, which is
use by cql-pytest.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-07-05 16:48:04 +08:00
Kefu Chai
b45049c968 s3/test: remove get_tempdir()
to match with another call of managed_cluster(), so it's clear that
we are just reusing test_tempdir.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-07-05 16:45:14 +08:00
Kefu Chai
a5a87d81c6 s3/test: extract managed_cluster() out
for setting up the cluster and tearing down it.
this helps to indent the code so that it is visually explicit
the lifecycle of the cluster.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-07-05 16:45:14 +08:00
Kefu Chai
1faf50fc05 test/pylib: do not hardwire alias to "local"
define a variable for it.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-07-05 15:58:41 +08:00
Kefu Chai
d55cfdc152 test/pylib: retry if minio_server is not ready
there is chance that minio_server is not ready to serve after
launching the server executable process. so we need to retry until
the first "mc" command is able to talk to it.

in this change, add method `mc()` is added to run minio client,
so we can retry the command before it timeouts. and it allows us to
ignore the failure or specify the timeout. this should ready the
minio server before tests start to connect to it.

Fixes #1719
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-07-05 15:57:59 +08:00
Konstantin Osipov
b9c2b326bc raft: do not update raft address map with obsolete gossip data
It is possible that a gossip message from an old node is delivered
out of order during a slow boot and the raft address map overwrites
a new IP address with an obsolete one, from the previous incarnation
of this node. Take into account the node restart counter when updating
the address map.

A test case requires a parameterized error injection, which
we don't support yet. Will be added as a separate commit.

Fixes #14257
Refs #14357

Closes #14329
2023-07-05 00:16:28 +02:00
Avi Kivity
0f59b17056 cql3: select_statement: don't copy metadata object needlessly
It's a shared_ptr<const metadata>, so it's safe to pass around.

perf-simple-query:

before:
211989.40 tps ( 62.1 allocs/op,  13.1 tasks/op,   43812 insns/op,        0 errors)
217889.09 tps ( 62.1 allocs/op,  13.1 tasks/op,   43713 insns/op,        0 errors)
211418.75 tps ( 62.1 allocs/op,  13.1 tasks/op,   43782 insns/op,        0 errors)
217388.46 tps ( 62.1 allocs/op,  13.1 tasks/op,   43733 insns/op,        0 errors)
211528.74 tps ( 62.1 allocs/op,  13.1 tasks/op,   43766 insns/op,        0 errors)

after:
215241.86 tps ( 61.1 allocs/op,  13.1 tasks/op,   43563 insns/op,        0 errors)
216172.41 tps ( 61.1 allocs/op,  13.1 tasks/op,   43562 insns/op,        0 errors)
212591.73 tps ( 61.1 allocs/op,  13.1 tasks/op,   43586 insns/op,        0 errors)
212217.28 tps ( 61.1 allocs/op,  13.1 tasks/op,   43553 insns/op,        0 errors)
215863.47 tps ( 61.1 allocs/op,  13.1 tasks/op,   43559 insns/op,        0 errors)

About 200 instructions saved.

Closes #14499
2023-07-04 16:41:51 +03:00
Marcin Maliszkiewicz
6424dd5ec4 alternator: close output_stream when exception is thrown during response streaming
When exception occurs and we omit closing output_stream then the whole process is brought down
by an assertion in ~output_stream.

Fixes https://github.com/scylladb/scylladb/issues/14453
Relates https://github.com/scylladb/scylladb/issues/14403

Closes #14454
2023-07-04 16:15:08 +03:00
Pavel Emelyanov
3679792f49 Merge 'test/pylib: allow run minio_server.py as a stand-alone tool' from Kefu Chai
this would allow developer to run a minio server for testing, for instance, s3_test.

Closes #14485

* github.com:scylladb/scylladb:
  test/pylib: chmod +x minio_server.py
  test/pylib: allow run minio_server.py as a stand-alone tool
2023-07-04 13:41:51 +03:00
Kefu Chai
c005b6dce0 test/pylib: chmod +x minio_server.py
add a shebang line. so we can just launch
a minio_server using

```console
test/pylib/minio_server.py --host 127.0.0.1
```

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-07-04 13:19:34 +08:00
Kefu Chai
2bae0b9aa8 test/pylib: allow run minio_server.py as a stand-alone tool
this would allow developer to run a minio server for testing, for
instance, s3_test, using something like:

```console
$ python3 test/pylib/minio_server.py --host 127.0.0.1
tempdir='/tmp/tmpfoobar-minio'
export S3_SERVER_ADDRESS_FOR_TEST=127.0.0.1
export S3_SERVER_PORT_FOR_TEST=900
export S3_PUBLIC_BUCKET_FOR_TEST=testbucket
```

and developer is supposed to copy-and-paste the `export` commands
to prepare the environmental variables for the test using the
minio server. the tempdir is used for the rundir of minio, and it
is also used for holding the log file of this tool. one might want
to check it when necessary.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-07-04 13:14:42 +08:00
Tomasz Grabiec
7d35cf8657 Merge 'migration_manager: disable schema pulls when schema is Raft-managed' from Kamil Braun
We want to disable `migration_manager` schema pulls and make schema
managed only by Raft group 0 if Raft is enabled. This will be important
with Raft-based topology, when schema will depend on topology (e.g. for
tablets).

We solved the problem partially in PR #13695. However, it's still
possible for a bootstrapping node to pull schema in the early part of
bootstrap procedure, before it setups group 0, because of how the
currently used `_raft_gr.using_raft()` check is implemented.

Here's the list of cases:
- If a node is bootstrapping in non-Raft mode, schema pulls must remain
  enabled.
- If a node is bootstrapping in Raft mode, it should never perform a
  schema pull.
- If a bootstrapped node is restarting in non-Raft mode but with Raft
  feature enabled (which means we should start upgrading to use Raft),
  or restarting in the middle of Raft upgrade procedure, schema pulls must
  remain enabled until the Raft upgrade procedure finishes.
  This is also the case of restarting after RECOVERY.
- If a bootstrapped node is restarting in Raft mode, it should never
  perform a schema pull.

The `raft_group0` service is responsible for setting up Raft during boot
and for the Raft upgrade procedure. So this is the most natural place to
make the decision that schema pulls should be disabled. Instead of
trying to come up with a correct condition that fully covers the above
list of cases, store a `bool` inside `migration_manager` and set it from
`raft_group0` function at the right moment - when we decide that we
should boot in Raft mode, or restart with Raft, or upgrade. Most of the
conditions are already checked in `setup_group0_if_exist`, we just need
to set the bool. Also print a log message when schema pulls are
disabled.

Fix a small bug in `migration_manager::get_schema_for_write` - it was
possible for the function to mark schema as synced without actually
syncing it if it was running concurrently to the Raft upgrade procedure.

Correct some typos in comments and update the comments.

Fixes #12870

Closes #14428

* github.com:scylladb/scylladb:
  raft_group_registry: remove `has_group0()`
  raft_group0_client: remove `using_raft()`
  migration_manager: disable schema pulls when schema is Raft-managed
2023-07-03 23:54:34 +02:00
Nadav Har'El
ec77172b4b Merge 'cql3: convert the SELECT clause evaluation phase to expressions' from Avi Kivity
SELECT clause components (selectors) are currently evaluated during query execution
using a stateful class hierarchy. This state is needed to hold intermediate state while
aggregating over multiple rows. Because the selectors are stateful, we must re-create
them each query using a selector_factory hierarchy.

We'd like to convert all of this to the unified expression evaluation machinery, so we can
have just one grammar for expressions, and just one way to evaluate expressions, but
the statefulness makes this complex.

In commit 59ab9aac44 "(Merge 'functions: reframe aggregate functions in terms
of scalar functions' from Avi Kivity)", we made aggregate functions stateless, moving
their state to aggregate_function_selector::_accumulator, and therefore into the
class hierarchy we're addressing now. Another reason for keeping state is that selectors
that aren't aggregated capture the first value they see in a GROUP BY group.

Since expressions can't contain state directly, we break apart expressions that contain
aggregate functions into two: an inner expression that processes incoming rows within
a group, and an outer expression that generates the group's output. The two expressions
communicate via a newly introduced expression element: a temporary.

The problem of non-aggregated columns requiring state is solved by encapsulating
those columns in an internal aggregate function, called the "first" function.

In terms of performance, this series has little effect, since the common case of selectors
that only contain direct column references without transformations is evaluated via a fast
path (`simple_selection`). This fast-path is preserved with almost no changes.

While the series makes it possible to start to extend the grammar and unify expression
syntaxes, it does not do so. The grammar is unchanged. There is just one breaking change:
the `SELECT JSON` statement generates json object field names based on the input selectors.
In one case the name of the field has changed, but it is an esoteric case (where a function call
is selected as part of `SELECT JSON`), and the new behavior is compatible with Cassandra.

Closes #14467

* github.com:scylladb/scylladb:
  cql3: selection: drop selector_factories, selectables, and selectors
  cql3: select_statement: stop using selector_factories in SELECT JSON
  cql3: selection: don't create selector_factories any more
  cql3: selection: collect column_definitions using expressions
  cql3: selection: reimplement selection::is_aggregate()
  cql3: selection: evaluate aggregation queries via expr::evaluate()
  cql3: selection, select_statement: fine tune add_column_for_post_processing() usage
  cql3: selection: evaluate non-aggregating complex selections using expr::evaluate()
  cql3: selection: store primary key in result_set_builder
  cql3: expression: fix field_selection::type interpretation by evaluate()
  cql3: selection: make result_set_builder::current non-optional<>
  cql3: selection: simplify row/group processing
  cql3: selection: convert requires_thread to expressions
  cql: selection: convert used_functions() to expressions
  cql3: selection: convert is_reducible/get_reductions to expressions
  cql3: selection: convert is_count() to expressions
  cql3: selection convert contains_ttl/contains_writetime to work on expressions
  cql3: selection: make simple_selectors stateless
  cql3: expression: add helper to split expressions with aggregate functions
  cql3: selection: short-circuit non-aggregations
  cql3: selection: drop validate_selectors
  cql3: select_statement: force aggregation if GROUP BY is used
  cql3: select_statement: levellize aggregation depth
  cql3: selection: skip first_function when collecting metadata
  cql3: select_statement: explicitly disable automatic parallelization with no aggregates
  cql3: expression: introduce temporaries
  cql3: select_statement: use prepared selectors
  cql3: selection: avoid selector_factories in collect_metadata()
  cql3: expressions: add "metadata mode" formatter for expressions
  cql3: selection: convert collect_metadata() to the prepared expression domain
  cql3: selection: convert processes_selection to work on prepared expressions
  cql3: selection: prepare selectors earlier
  cql3: raw_selector: deinline
  cql3: expression: reimplement verify_no_aggregate_functions()
  cql3: expression: add helpers to manage an expression's aggregation depth
  cql3: expression: improve printing of prepared function calls
  cql3: functions: add "first" aggregate function
2023-07-03 23:21:33 +03:00
Avi Kivity
66c47d40e6 cql3: selection: drop selector_factories, selectables, and selectors
The whole class hierarchy is no longer used by anything and we can just
delete it.
2023-07-03 19:45:17 +03:00
Avi Kivity
d9cf81f1a6 cql3: select_statement: stop using selector_factories in SELECT JSON
SELECT JSON uses selector_factories to obtain the names of the
fields to insert into the json object, and we want to drop
selector_factories entirely. Switch instead to the ":metadata" mode
of printing expressions, which does what we want.

Unfortunately, the switch changes how system functions are converted
into field names. A function such as unixtimestampof() is now rendered
as "system.unixtimestampof()"; before it did not have the keyspace
prefix.

This is a compatiblity problem, albeit an obscure one. Since the new
behavior matches Cassandra, and the odds of hitting this are very low,
I think we can allow the change.
2023-07-03 19:45:17 +03:00
Avi Kivity
039472ffb9 cql3: selection: don't create selector_factories any more
We no longer use selector_factories for anything, so we can drop them.
2023-07-03 19:45:17 +03:00
Avi Kivity
e521557ce5 cql3: selection: collect column_definitions using expressions
The replica needs to know which columns we're interested in. Iterate
and recurse into all selector expressions to collect all mentioned columns.

We use the same algorithm that create_factories_and_collect_column_definitions()
uses, even though it is quadratic, to avoid causing surprises.
2023-07-03 19:45:17 +03:00
Avi Kivity
7bd317ace4 cql3: selection: reimplement selection::is_aggregate()
We can get rid of the last use of selector_factories by reimplementing
is_aggregate(). It's simple - if we have an inner loop, we're
aggregating.
2023-07-03 19:45:17 +03:00
Avi Kivity
91cdaa72bd cql3: selection: evaluate aggregation queries via expr::evaluate()
When constructing a selection_with_processing, split the
selectors into an inner loop and an outer loop with split_aggregation().
We can then reimplement add_input_row() and get_output_row() as follows:

 - add_input_row(): evaluate the inner loop expressions and store
   the results in temporaries
 - get_output_row(): evaluate the outer loop expressions, pulling in
   values from those temporaries.

reset(), which is called between groups, simply copies the initial
values rathered by split_aggregation() into the temporaries.

The only complexity comes from add_column_for_post_query_processing(),
which essentially re-does the work of split_aggregation(). It would
be much better if we added the column before split_aggregation() was
called, but some refactoring has to take place before that happens.
2023-07-03 19:45:17 +03:00
Avi Kivity
27254c4f50 cql3: selection, select_statement: fine tune add_column_for_post_processing() usage
In three cases we need to consult a column that's possibly not explicitly
selected:
 - for the WHERE clause
 - for GROUP BY
 - for ORDER BY

The return value of the function is the index where the newly-added
column can be found. Currently, the index is correct for both
the internal column vector and the result set, but soon in won't
be.

In the first two cases (WHERE clause and ORDER BY), we're interested
in the column before grouping, in the last case (ORDER BY) we're interested
in the column after grouping, so we need to distinguish between the two.

Since we already have selection::index_of() that returns the pre-grouping
index, choose the post-grouping index for the return value of
selection::add_column_for_post_processing(), and change the GROUP BY
code to use index_of(). Comments are added.
2023-07-03 19:45:17 +03:00
Avi Kivity
6bf1bd7130 cql3: selection: evaluate non-aggregating complex selections using expr::evaluate()
Now that everything is in place, implement the fast-path
transform_input_row() for selection_with_processing. It's a
straightforward call to evaluate() in a loop.

We adjust add_column_for_post_processing() to also update _selectors,
otherwise ORDER BY clauses that require an additional column will not
see that column.

Since every sub-class implements transform_input_row(), mark
the base class declaration as pure virtual.
2023-07-03 19:45:17 +03:00
Avi Kivity
f5eb7fd6dc cql3: selection: store primary key in result_set_builder
expr::evaluate() expects an exploded primary key in its
evaluation_inputs structure (this dates back from the conversion
of filtering to expressions). But right now, the exploded primary
key is only available in the filter.

That's easy to fix however: move the primary key containers
to result_set_builder and just keep references in the filter.

After this, we can evaluate column_value expressions that
reference the primary key.
2023-07-03 19:45:17 +03:00
Avi Kivity
0021f77e30 cql3: expression: fix field_selection::type interpretation by evaluate()
field_selection::type refers to the type of the selection operation,
not the type of the structure being selected. This is what
prepare_expression() generates and how all other expression elements
work, but evaluate() for field_selection thinks it's the type
of the structure, and so fails when it gets an expression
from prepare_expression().

Fix that, and adjust the tests.
2023-07-03 19:45:17 +03:00
Avi Kivity
aed01018a3 cql3: selection: make result_set_builder::current non-optional<>
Previously, we used the engagedness of result_set_builder::optional
as a flag, but the previous patch eliminated that and it's always
engaged. Remove the optional wrapper to reduce noise.
2023-07-03 19:45:17 +03:00
Avi Kivity
44c8507075 cql3: selection: simplify row/group processing
Processing a result set relies on calling result_set_builder::new_row().
This function is quite complex as it has several roles:

 - complete processing of the previously computed row, if any
 - determine if GROUP BY grouping has changed, and flush the previous group
   if so
 - flush the last group if that's the case

This works now, but won't work with expr::evaluate. The reason is that
new_row() is called after the partition key and clustering key of the
new row have been evaluated, so processing of the previous row will see
incorrect data. It works today because we copy the partition key and
clustering key into result_set_builder::current, but expr::evaluate
uses the exploded partition key and clustering key, which have been
clobbered.

The solution is to separate the roles. Instead of new_row() that's
responsible for completing the previous row and starting a new one,
we have start_new_row() that's responsible for what its name says,
and complete_row() that's responsible for completing the row and
checking for group change. The responsibity for flushing the final
group is moved to result_set_builder::build(). This removes the
awkward "more_rows_coming" parameter that makes everything more
complicated.

result_set_builder::current is still optional, but it's always
engaged. The next patch will clean that up.
2023-07-03 19:45:17 +03:00
Avi Kivity
877f4f86d2 cql3: selection: convert requires_thread to expressions
If any function requires a thread to execute (due to running in Lua
or wasm), then the entire selection needs to run in a thread.
2023-07-03 19:45:17 +03:00
Avi Kivity
cbd68abde8 cql: selection: convert used_functions() to expressions
used_functions() is used to check whether prepared statements need
to be invalidated when user-defined functions change.

We need to skip over empty scalar components of aggregates, since
these can be defined by users (with the same meaning as if the
identity function was used).
2023-07-03 19:45:17 +03:00
Avi Kivity
bfb1acc6d3 cql3: selection: convert is_reducible/get_reductions to expressions
The current version of automatic query parallelization works when all
selectors are reducible (e.g. have a state_reduction_function member),
and all the inputs to the aggregates are direct column selectors without
further transformation. The actual column names and reductions need to
be packed up for forward_service to be used.

Convert is_reducible()/get_reductions() to the expression world. The
conversion is fairly straightforward.
2023-07-03 19:45:17 +03:00
Avi Kivity
d99fc29e2d cql3: selection: convert is_count() to expressions
Early versions of automatic query parallelization only
supported `SELECT count(*)` with one selector. Convert the
check to expressions.
2023-07-03 19:45:17 +03:00
Avi Kivity
d36eb8cea6 cql3: selection convert contains_ttl/contains_writetime to work on expressions
contains_ttl/contains_writetime are two attributes of a selection. If a selection
contains them, we must ask the replica to send them over; otherwise we don't
have data to process. Not sending ttl/writetime saves some effort.

The implementation is a straightforward recursive descent using expr::find_in_expression.
2023-07-03 19:45:17 +03:00
Avi Kivity
6c2bb5e1ed cql3: selection: make simple_selectors stateless
Now that we push all GROUP BY queries to selection_with_processing,
we always process rows via transform_input_row() and there's no
reason to keep any state in simple_selectors.

Drop the state and raise an internal error if we're ever
called for aggregation.
2023-07-03 19:45:17 +03:00
Avi Kivity
a26516ef65 cql3: expression: add helper to split expressions with aggregate functions
Aggregate functions cannot be evaluated directly, since they implicitly
refer to state (the accumulator). To allow for evaluation, we
split the expression into two: an inner expression that is evaluated
over the input vector (once per element). The inner expression calls
the aggregation function, with an extra input parameter (the accumulator).

The outer expression is evaluated once per input vector; it calls
the final function, and its input is just the accumulator. The outer
expression also contains any expressions that operate on the result
of the aggregate function.

The acculator is stored in a temporary.

Simple example:

   sum(x)

is transformed into an inner expression:

   t1 = (t1 + x)   // really sum.aggregation_function

and an outer expression:

   result = t1     // really sum.state_to_result_function

Complicated example:

    scalar_func(agg1(x, f1(y)), agg2(x, f2(y)))

is transformed into two inner expressions:

    t1 = agg1.aggregation_function(t1, x, f1(y))
    t2 = agg2.aggregation_function(t2, x, f2(y))

and an outer expression

    output = scalar_func(agg1.state_to_result_function(t1),
                         agg2.state_to_result_function(t2))

There's a small wart: automatically parallelized queries can generate
"reducible" aggregates that have no state_to_result function, since we
want to pass the state back to the coordinator. Detect that and short
circuit evaluation to pass the accumulator directly.
2023-07-03 19:45:17 +03:00
Avi Kivity
f48ecb5049 cql3: selection: short-circuit non-aggregations
Currently, selector evaluation assumes the most complex case
where we aggregate, so multiple input rows combine into one output row.

In effect the query either specifies an outer loop (for the group)
and an inner loop (for input rows), or it only specifies the inner loop;
but we always perform the outer and inner loop.

Prepare to have a separate path for the non-aggregation case by
introducing transform_input_row().
2023-07-03 19:45:17 +03:00
Avi Kivity
4a2428e4ec cql3: selection: drop validate_selectors
It's unused. It dates from the (perhaps better) time when
regularity of aggregation across selectors was enforced.
2023-07-03 19:45:17 +03:00
Avi Kivity
432cb02d64 cql3: select_statement: force aggregation if GROUP BY is used
GROUP BY is typically used with aggregation. In one case the aggregation
is implicit:

    SELECT a, b, c
    FROM tab
    GROUP BY x, y, z

One row will appear from each group, even though no aggregation
was specified. To avoid this irregularity, rewrite this query as

    SELECT first(a), first(b), first(c)
    FROM tab
    GROUP BY x, y, z

This allows us to have different paths for aggregations and
non-aggregations, without worrying about this special case.
2023-07-03 19:45:17 +03:00
Avi Kivity
bc6c64e13c cql3: select_statement: levellize aggregation depth
Avoid mixed aggregate/non-aggregate queries by inserting
calls to the first() function. This allows us to avoid internal
state (simple_selector::_current) and make selector evaluation
stateless apart from explicit temporaries.
2023-07-03 19:45:17 +03:00
Avi Kivity
ecdded90cd cql3: selection: skip first_function when collecting metadata
We plan to rewrite aggregation queries that have a non-aggregating
selector using the first function, so that all selectors are
aggregates (or none are). Prevent the first function from affecting
metadata (the auto-generated column names), by skipping over the
first function if detected. They input and output types are unchanged
so this only affects the name.
2023-07-03 19:45:17 +03:00
Avi Kivity
996e02f5bf cql3: select_statement: explicitly disable automatic parallelization with no aggregates
A query of the form `SELECT foo, count(foo) FROM tab` returns the first
value of the foo column along with the count. This can't be parallized
today since the first selector isn't an aggregate.

We plan to rewrite the query internally as `SELECT first(foo), count(foo)
FROM tab`, in order to make the query more regular (no mixing of aggregates
and non-aggregates). However, this will defeat the current check since
after the rewrite, all selectors are aggregates.

Prepare for this by performing the check on a pre-rewrite variable, so
it won't be affected by the query rewrite in the next patch.

Note that although even though we could add support for running
first() in parallel, it's not possible to get the correct results,
since first() is not commutative and we don't reduce in order. It's
also not a particularly interesting query.
2023-07-03 19:45:17 +03:00
Avi Kivity
778ae2b461 cql3: expression: introduce temporaries
Temporaries are similar to bind variables - they are values provided from
outside the expression. While bind variables are provided by the user, temporaries
are generated internally.

The intended use is for aggregate accumulator storage. Currently aggregates
store the accumulator in aggregate_function_selector::_accumulator, which
means the entire selector hierarchy must be cloned for every query. With
expressions, we can have a single expression object reused for many computations,
but we need a way to inject the accumulator into an aggregation, which this
new expression element provides.
2023-07-03 19:45:17 +03:00
Avi Kivity
7c3ceb6473 cql3: select_statement: use prepared selectors
Change one more layer of processing to work on prepared
rather than raw selectors. This moves the call to prepare
the selectors early in select_statement processing. In turn
this changes maybe_jsonize_select_clause() and forward_service's
mock_selection() to work in the prepared realm as well.

This moves us one step closer to using evaluate() to process
the select clause, as the prepared selectors are now available
in select_statement. We can't use them yet since we can't evaluate
aggregations.
2023-07-03 19:45:17 +03:00