Replace index_supports_some_column(expression, ...) with
index_supports_some_column(single_column_restrictions_map, ...) to
eliminate get_single_column_restrictions_map() tree walks when checking
index support. The three call sites now use the maps already built
incrementally in the constructor loop:
_single_column_nonprimary_key_restrictions,
_single_column_clustering_key_restrictions, and
_single_column_partition_key_restrictions.
Also replace contains_multi_column_restriction() tree walk in
clustering_columns_restrictions_have_supporting_index() with
_has_multi_column.
Replace the extract_clustering_prefix_restrictions() tree walk with
incremental collection during the main loop. Two new locals --
mc_ck_preds and sc_ck_preds -- accumulate multi-column and single-column
clustering key predicates respectively. A short post-loop block
computes the longest contiguous prefix from sc_ck_preds (or uses
mc_ck_preds directly for multi-column), replacing the removed function.
Also remove the now-unused to_predicate_on_clustering_key_prefix(),
with_current_binary_operator() helper, and the
visitor_with_binary_operator_context concept.
Replace the extract_partition_range() tree walk with incremental
collection during the main loop. Two new locals before the loop --
token_pred and pk_range_preds -- accumulate token and single-column
EQ/IN partition key predicates respectively. A short post-loop block
materializes _partition_range_restrictions from these locals, replacing
the removed function.
This removes the last tree walk over partition-key restrictions.
Instead of accumulating all clustering-key restrictions into a
conjunction tree and then decomposing it by column via
get_single_column_restrictions_map() post-loop, build the
per-column map incrementally as each single-column clustering-key
predicate is processed.
The post-loop guard (!has_mc_clustering) is no longer needed:
multi-column predicates go through the is_multi_column branch
and never insert into this map, and mixing multi with single-column
is rejected with an exception.
This eliminates a post-loop tree walk over
_clustering_columns_restrictions.
Instead of accumulating all partition-key restrictions into a
conjunction tree and then decomposing it by column via
get_single_column_restrictions_map() post-loop, build the
per-column map incrementally as each single-column partition-key
predicate is processed.
The post-loop guard (!has_token_restrictions()) is no longer needed:
token predicates go through the on_partition_key_token branch and
never insert into this map, and mixing token with non-token is
rejected with an exception.
This eliminates a post-loop tree walk over
_partition_key_restrictions.
Instead of accumulating all non-primary-key restrictions into a
conjunction tree and then decomposing it by column via
get_single_column_restrictions_map() post-loop, build the
per-column map incrementally as each non-primary-key predicate
is processed.
This eliminates a post-loop tree walk over _nonprimary_key_restrictions.
Replace the two post-loop find_binop(_clustering_columns_restrictions,
is_multi_column) tree walks and the contains_multi_column_restriction()
tree walk with the already-tracked local has_mc_clustering.
The redundant second assignment inside the _check_indexes block is
removed entirely.
Replace the two in-loop calls to has_token_restrictions() (which
walks the _partition_key_restrictions expression tree looking for
token function calls) with a local bool has_token, set to true
when a token predicate is processed.
The member function is retained since it's used outside the
constructor.
With this change, the constructor loop's non-error control flow
performs zero expression tree scanning. The only remaining tree
walks are on error paths (get_sorted_column_defs,
get_columns_in_commons for formatting exception messages) and
structural (make_conjunction for building accumulated expressions).
Replace the in-loop call to partition_key_restrictions_is_empty()
(which walks the _partition_key_restrictions expression tree via
is_empty_restriction()) with a local bool pk_is_empty, set to false
at the two sites where partition key restrictions are added.
The member function is retained since it's used outside the
constructor.
Replace find_in_expression<binary_operator>(_clustering_columns_restrictions,
always_true), which walks the accumulated expression tree to find the
first binary_operator, with a tracked pointer first_mc_pred set when
the first multi-column predicate is added. This eliminates the tree
scan, the null check, and the is_lower_bound/is_upper_bound lambdas,
replacing them with direct predicate field accesses: first_mc_pred->order,
first_mc_pred->is_lower_bound, first_mc_pred->is_upper_bound, and
first_mc_pred->filter for error messages.
Replace get_last_column_def(_clustering_columns_restrictions), which
walks the entire accumulated expression tree to collect and sort all
column definitions, with a local pointer ck_last_column that tracks
the column with the highest schema position as single-column
clustering restrictions are added.
Replace has_slice(_clustering_columns_restrictions), which walks the
accumulated expression tree looking for slice operators, with a local
bool ck_has_slice set when any clustering predicate with is_slice is
added. Updated at all three clustering insertion points: multi-column
first assignment, multi-column slice conjunction, and single-column
conjunction.
Replace find_binop(_clustering_columns_restrictions, is_tuple_constructor),
which walks the accumulated expression tree looking for multi-column
restrictions, with a local bool has_mc_clustering set when a multi-column
predicate is first added. This serves both the multi-column branch
(checking existing restrictions are also multi-column) and the
single-column branch (checking no multi-column restrictions exist).
Replace is_empty_restriction(_clustering_columns_restrictions), which
recursively walks the accumulated expression tree, with a local bool
ck_is_empty that is set to false when a clustering restriction is
first added. Updated at both insertion points: multi-column first
assignment and single-column make_conjunction.
The constructor loop no longer needs to extract a binary_operator
reference from each predicate. All remaining uses (make_conjunction,
get_columns_in_commons, assignment to accumulated restriction members,
_where.push_back, and error formatting) accept expression directly,
which is what pred.filter already is. This eliminates the unnecessary
as<binary_operator> cast at the top of the loop.
In the single-column partition-key and clustering-key sub-branches,
replace direct binary_operator field inspections with pre-computed
predicate booleans: !pred.equality && !pred.is_in instead of
restr.op != EQ && restr.op != IN, pred.is_in instead of
find(restr, IN), and pred.is_slice instead of has_slice(restr).
Also fix a leftover restr.order in the multi-column branch error
message.
Replace direct operator comparisons with predicate boolean fields:
pred.equality, pred.is_in, pred.is_slice, pred.is_lower_bound,
pred.is_upper_bound, and pred.order.
Convert the constructor loop to first build predicates from the
prepared where clause, then iterate over the predicates.
The IS_NOT branch now uses pred.is_not_null_single_column and pred.on
instead of inspecting the expression directly. The branch conditions
for multi-column (pred.is_multi_column), token
(on_partition_key_token), and single-column (on_column) now use
predicate properties instead of expression helpers.
Remove extract_column_from_is_not_null_restriction() which is no
longer needed.
Add boolean fields to struct predicate that describe the operator:
equality, is_in, is_slice, is_upper_bound, is_lower_bound, and
comparison_order. Populate them in all to_predicates() return sites.
These fields will allow the constructor loop to inspect predicate
properties directly instead of re-examining the expression.
We want to move away from the unprepared domain to the prepared
domain to avoid confusion. Ideally we'd receive prepared expressions
via the constructor, but that is left for later.
Currently, possible_lhs_values accepts a column_definition parameter
that tells it which column we are interested in. This works
because callers pre-analyze the expression and only pass a
subexpression that contains the specified columns.
We wish to convert expressions to predicates early, and so won't
have the benefit of knowing which columns we're interested in.
Generally, this is simple: a binary operator contains a column on the
left-hand side, so use that. If the expression is on a token, use that.
When the expression is a boolean constant (not expressible by
the grammar, but somehow found its way into the code). We invent
a new `on_row` designator meaning it's not about a specific column.
It will be useful one day when we allow things like
`WHERE some_boolean_function(c1, c2)` that aren't specific to any
single column.
Finally, we introduce helpers that, given such an expression decomposed
into predicates and a column_definition, extract the predicate related
to the given column. This mimics the possible_lhs_values API and allows
us to make minimal changes to callers, deferring that until later.
possible_lhs_values() is renamed to to_predicates() and loses the
column_definition parameter to indicate its new role.
Currently, we are careful to call possible_lhs_values() for a token
function only when slice/equality operators are used. We wish to relax
this, so return nullptr (must filter) for the other cases instead of
raising an internal error.
Currently, possible_lhs_values() for a function call expression will
only be called when we're sure it's the token() function. But soon this
will no longer be the case. Return nullptr for non-token functions to
indicate we can't solve for a column value instead of an internal
error.
Since we're a first-resort call now, and there's a last-restort (evaluate)
Logically should be part of previous patch, but the rest of the code is still
careful enough not to call here when not expecting a solution, so the split
is not breaking bisectability.
Convert from an execute-time function to a prepare-time function
by returning a solver function instead of directly solving.
When not possible to solve, but still possible to evaluate (filter),
return nullptr.
Expressions are a tree-like structure so a single expression is sufficient
(for complicated ones, a conjunction is used), but predicates are flat.
Prepare for conversion to predicates by storing the expressions that
will correspond to predicates, namely the boolean factors of the WHERE
clause.
The goal is to simplify flow-control where the order in which
variables are updated depends on their location in the source.
With functions, this is difficult.
The goal is to simplify flow-control where the order in which
variables are updated depends on their location in the source.
With functions, this is difficult.
The goal is to simplify flow-control where the order in which
variables are updated depends on their location in the source.
With functions, this is difficult.
The goal is to simplify flow-control where the order in which
variables are updated depends on their location in the source.
With functions, this is difficult.
The goal is to simplify flow-control where the order in which
variables are updated depends on their location in the source.
With functions, this is difficult.
The goal is to simplify flow-control where the order in which
variables are updated depends on their location in the source.
With functions, this is difficult.
The goal is to simplify flow-control where the order in which
variables are updated depends on their location in the source.
With functions, this is difficult.
This unifies the signature with possible_lhs_values(), paving the way
to deduplicating the two functions. We always have the schema and may as
well pass it.
All query plans that try to solve for the possible values a column
(or token, or column-tuple) can take have been converted to set
analyzed_column::solve_for. Recognize that by removing the
fallback path.
This removes the last possible_column_values() call that isn't bound
(using std::bind_front), and will allow moving it to prepare time.
By moving query_options to the end, we can use std::bind_front to
convert it from a build-time to a run-time function that depends
only on the query_options.
Multi-column restrictions (a, b) > (:v1, :v2) do not obey normal
comparison rules. For example, given
(a, b) > (5, 1) AND a <= 5
We see that (a, b) = (5, 2) satisfies the constraint, but if we tried
to solve for the interval
( (5, 1), (5) ]
We'd have to conclude that (5,1) <= (5).
It's possible to extend the CQL type system to support this, but
that would be a lot of work, and in fact the current code doesn't
depend on it (by solving these intersections in its own code path
(multi_column_range_accumulator_builder's prefix3cmp).
So, we just mark such solvers as non-comparable, and generate an
internal error if we try to compare them in make_conjunction.
In statement_restriction's constructor, we check that all the boolean factors
are relations. This means the code to handle a constant here is dead code.
Remove it; while it's good to handle it, it should be handled at the top
level, not in multi-column restriction processing.
range_from_raw_bound processes restrictions of the form
(a, b) > SCYLLA_CLUSTERING_BOUND(?, ?)
indicating that comparisons respect whether columns are reversed or not.
Iterate over expressions during the prepare phase only; generating
"builder" functions to be executed during the query phase.
The get_clustering_bounds() family works in terms of vectors of
clustering ranges (to support IN) and in fact the only caller converts
it to a vector. Converting it immediately simplifies later patching.
multi_column_range_accumulator analyzes an expression containing multi-column
restrictions of the form (a, b) > (?, ?) and simultaneously analyzes
them and solves for the set of intervals that satisfy those restrictions.
Split this into prepare-time phase (that generates "builders", functions
that operator on the accumulator), and a query phase that executes
the builders. Importantly, the expression visitor ends up on the prepare
phase, so it can be merged with other parts of the analysis.
Helper functions of the visitor are made static, since they need to
run during the query phase but the visitor only exists during the
prepare phase.
Lay the groundwork for analyzing multi column clustering bounds by
splitting the function into prepare-time and execute-time parts.
To start with, all of the work is done at query time, but later
patches will move bits into prepare time.