Commit Graph

53404 Commits

Author SHA1 Message Date
Avi Kivity
d584bd7358 cql3: statement_restrictions: replace has_eq_restriction_on_column with precomputed set
has_eq_restriction_on_column() walked expression trees at prepare time to
find binary_operators with op==EQ that mention a given column on the LHS.
Its only caller is ORDER BY validation in select_statement, which checks
that clustering columns without an explicit ordering have an EQ restriction.

Replace the 50-line expression-walking free function with a precomputed
unordered_set<const column_definition*> (_columns_with_eq) populated during
the main predicate loop in analyze_statement_restrictions.  For single-column
EQ predicates the column is taken from on_column; for multi-column EQ like
(ck1, ck2) = (1, 2), all columns in on_clustering_key_prefix are included.

The member function becomes a single set::contains() call.
2026-04-19 20:57:09 +03:00
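As an illustration of the precomputed-set approach in d584bd7358, here is a minimal sketch. The `column_definition` and `predicate` structs and the `restrictions_sketch` class are hypothetical stand-ins, not the real cql3 types, and `count()` is used in place of `contains()` only so the sketch also compiles before C++20.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical stand-ins; the real cql3 types carry far more state.
struct column_definition { std::string name; };

struct predicate {
    // One column for a single-column predicate, several for a tuple
    // such as (ck1, ck2) = (1, 2).
    std::vector<const column_definition*> columns;
    bool equality = false;
};

class restrictions_sketch {
    std::unordered_set<const column_definition*> _columns_with_eq;
public:
    // The set is filled once, during the single pass over predicates.
    explicit restrictions_sketch(const std::vector<predicate>& preds) {
        for (const auto& p : preds) {
            if (p.equality) {
                _columns_with_eq.insert(p.columns.begin(), p.columns.end());
            }
        }
    }

    // The lookup replaces an expression-tree walk with a set membership test.
    bool has_eq_restriction_on_column(const column_definition& c) const {
        return _columns_with_eq.count(&c) != 0;
    }
};
```

Under these assumptions, ORDER BY validation only needs one hash lookup per clustering column.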
Avi Kivity
b7f86eaabc cql3: statement_restrictions: replace multi_column_range_accumulator_builder with direct predicate iteration
build_get_multi_column_clustering_bounds_fn() used expr::visit() to dispatch
each restriction through a 15-handler visitor struct.  Only the
binary_operator handler did real work; the conjunction handler just
recursed, and the remaining 13 handlers were dead-code on_internal_error
calls (the filter expression of each predicate is always a binary_operator).

Replace the visitor with a loop over predicates that does
as<binary_operator>(pred.filter) directly, building the same query-time
lambda inline.

Promote intersect_all() and process_in_values() from static methods of
the deleted struct to free functions in the anonymous namespace -- they
are still called from the query-time lambda.
2026-04-19 20:57:09 +03:00
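The shape of the change in b7f86eaabc (a visitor replaced by a loop that reads each filter as a binary_operator) might look roughly like the sketch below. The `expression` variant, the integer right-hand sides, and `build_matcher` are simplifications invented for illustration; `std::get` stands in for `as<binary_operator>()`.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical, heavily simplified stand-ins for the cql3 expression types.
enum class oper_t { EQ, LT, GT };
struct binary_operator { oper_t op; int rhs; };
struct constant { bool value; };
using expression = std::variant<binary_operator, constant>;

struct predicate { expression filter; };

// Builds a "query-time" callable by iterating predicates directly,
// without dispatching each filter through a multi-handler visitor.
std::function<bool(int)> build_matcher(const std::vector<predicate>& preds) {
    std::vector<binary_operator> ops;
    for (const auto& pred : preds) {
        // Invariant from the commit message: the filter is always a binary_operator.
        assert(std::holds_alternative<binary_operator>(pred.filter));
        ops.push_back(std::get<binary_operator>(pred.filter));
    }
    return [ops = std::move(ops)](int value) {
        for (const auto& b : ops) {
            switch (b.op) {
            case oper_t::EQ: if (!(value == b.rhs)) { return false; } break;
            case oper_t::LT: if (!(value < b.rhs)) { return false; } break;
            case oper_t::GT: if (!(value > b.rhs)) { return false; } break;
            }
        }
        return true;
    };
}
```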
Avi Kivity
ece9af229d cql3: statement_restrictions: use predicate fields in build_get_clustering_bounds_fn
Replace find_binop(..., is_multi_column) with pred.is_multi_column in
build_get_clustering_bounds_fn() and add_clustering_restrictions_to_idx_ck_prefix().

Replace is_clustering_order(binop) with pred.order == comparison_order::clustering
and iterate predicates directly instead of extracting filter expressions.

Remove the now-dead is_multi_column() free function.
2026-04-19 20:57:09 +03:00
Avi Kivity
72da1207d7 cql3: statement_restrictions: remove extract_single_column_restrictions_for_column
The previous commit made prepare_indexed_local() use the pre-built
predicate vectors instead of calling extract_single_column_restrictions_for_column().
That was the last production caller.

Remove the function definition (65 lines of expression-walking visitor)
and its declaration/doc-comment from the header.

Replace the unit test (expression_extract_column_restrictions) which
directly called the removed function with synthetic column_definitions,
with per_column_restriction_routing which exercises the same routing
logic through the public analyze_statement_restrictions() API.  The new
test verifies not just factor counts but the exact (column_name, oper_t)
pairs in each per-column entry, catching misrouted restrictions that a
count-only check would miss.
2026-04-19 20:57:09 +03:00
Avi Kivity
b093477cf7 cql3: statement_restrictions: use predicate vectors in prepare_indexed_local
Replace the extract_single_column_restrictions_for_column(_where, ...) call
in prepare_indexed_local() with a direct lookup in the pre-built predicate
vectors.

The old code walked the entire WHERE expression tree to extract binary
operators mentioning the indexed column, wrapped them in a conjunction,
translated column definitions to the index schema, then called
to_predicate_on_column() which walked the expression *again* to convert
back to predicates.

The new code selects the appropriate predicate vector map (PK, CK, or
non-PK) based on the indexed column's kind, looks up the column's
predicates directly, applies replace_column_def to each, and folds them
with make_conjunction -- producing the same result without any expression
tree walks.

This removes the last production caller of
extract_single_column_restrictions_for_column (unit tests in
statement_restrictions_test.cc still exercise it).
2026-04-19 20:57:09 +03:00
Avi Kivity
a725e39218 cql3: statement_restrictions: use predicate vector size for clustering prefix length
Replace the body of num_clustering_prefix_columns_that_need_not_be_filtered()
with a single return of _clustering_prefix_restrictions.size().

The old implementation called get_single_column_restrictions_map() to rebuild
a per-column map from the clustering expression tree, then iterated it in
schema order counting columns until it hit a gap, a needs-filtering predicate,
or a slice.  But _clustering_prefix_restrictions is already built with exactly
that same logic during the constructor (lines 1234-1248): it iterates CK
columns in schema order, appending predicates until it encounters a gap in
column_id, a predicate that needs_filtering, or a slice -- at which point it
stops.  So the vector's size is, by construction, the answer to the same
question the old code was re-deriving at query time.

This makes four helper functions dead code:

- get_single_column_restrictions_map(): walked the expression tree to build
  a map<column_definition*, expression> of per-column restrictions.  Was a
  ~15-line function that called get_sorted_column_defs() and
  extract_single_column_restrictions_for_column() for each column.

- get_the_only_column(): extracted the single column_value from a restriction
  expression, asserting it was single-column.  Called by the old loop body.

- is_single_column_restriction(): thin wrapper around
  get_single_column_restriction_column().

- get_single_column_restriction_column(): ~25-line function that walked an
  expression tree with for_each_expression<column_value> to determine whether
  all column_value nodes refer to the same column.  Called by the above two.

Remove all four functions and their forward declarations (-95 lines).
2026-04-19 20:57:08 +03:00
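To make the "size is the answer by construction" argument in a725e39218 concrete, here is a small sketch of the prefix-building rule. The `predicate` struct is hypothetical, and two details are assumptions the message does not spell out: the input holds at most one predicate per column, and a slice predicate is itself kept in the prefix while ending it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified predicate: one per restricted clustering column.
struct predicate {
    std::size_t column_id;  // position of the column within the clustering key
    bool needs_filtering;
    bool is_slice;
};

// Input is assumed sorted by column_id (schema order), one entry per column.
std::vector<predicate> build_clustering_prefix(const std::vector<predicate>& sorted) {
    std::vector<predicate> prefix;
    for (const auto& p : sorted) {
        assert(prefix.empty() || prefix.back().column_id < p.column_id);
        if (p.column_id != prefix.size() || p.needs_filtering) {
            break;  // gap in column_id, or a predicate that needs filtering
        }
        prefix.push_back(p);
        if (p.is_slice) {
            break;  // assumption: the slice is kept, nothing after it is
        }
    }
    return prefix;
}

// With the prefix built this way, the query-time question is just its size.
std::size_t num_prefix_columns_that_need_not_be_filtered(const std::vector<predicate>& prefix) {
    return prefix.size();
}
```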
Avi Kivity
68c2e292ac cql3: statement_restrictions: replace do_find_idx and is_supported_by with predicate-based versions
Convert do_find_idx() from a member function that walks expression trees
via index_restrictions()/for_each_expression/extract_single_column_restrictions
to a static free function that iterates index_search_group spans using
are_predicates_supported_by().

Convert calculate_column_defs_for_filtering_and_erase_restrictions_used_for_index()
to use predicate vectors instead of expression-based is_supported_by().

Remove now-dead code: is_supported_by(), is_supported_by_helper(), score()
member function, and do_find_idx() member function.
2026-04-19 20:57:08 +03:00
Avi Kivity
c42397e995 cql3: statement_restrictions: remove expression-based has_supporting_index and index_supports_some_column
These functions are no longer called now that all index support checks
in the constructor use predicate-based alternatives. The expression-based
is_supported_by and is_supported_by_helper are still needed by choose_idx()
and calculate_column_defs_for_filtering_and_erase_restrictions_used_for_index().
2026-04-19 20:57:08 +03:00
Avi Kivity
1aafe0708a cql3: statement_restrictions: replace multi-column and PK index support checks with predicate-based versions
Replace clustering_columns_restrictions_have_supporting_index(),
multi_column_clustering_restrictions_are_supported_by(),
get_clustering_slice(), and partition_key_restrictions_have_supporting_index()
with predicate-based equivalents that use the already-accumulated mc_ck_preds
and sc_pk_pred_vectors locals.

The new multi_column_predicates_have_supporting_index() checks each
multi-column predicate's columns list directly against indexes, avoiding
expression tree walks through find_in_expression and bounds_slice.
2026-04-19 20:57:08 +03:00
Avi Kivity
fa6f239cc7 cql3: statement_restrictions: add predicate-based index support checking
Add `op` and `is_subscript` fields to `struct predicate` and populate them
in all predicate creation sites in `to_predicates()`. These fields record the
binary operator and whether the LHS is a subscript (map element access), which
are the two pieces of information needed to query index support.

Add `is_predicate_supported_by()` which mirrors `is_supported_by_helper()`
but operates on a single predicate's fields instead of walking the expression
tree.

Add a predicate-vector overload of `index_supports_some_column()` and use it
in the constructor to replace expression-based index support checks for
single-column partition key, clustering key, and non-primary-key restrictions.
The multi-column clustering key case still uses the existing expression-based
path.
2026-04-19 20:57:08 +03:00
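The idea in fa6f239cc7, answering "does this index support this predicate?" from two stored fields instead of an expression walk, can be sketched as follows. `toy_index` is an invented model, not the real secondary-index interface, and the support rules shown (EQ only) are placeholders chosen to keep the example small.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

enum class oper_t { EQ, LT, CONTAINS };

// Hypothetical subset of the predicate fields named in the commit message.
struct predicate {
    std::string column;
    oper_t op;          // the binary operator
    bool is_subscript;  // true when the LHS is a map element access, e.g. m['k']
};

// Toy index model (an assumption, not the real index API): it targets one
// column and indexes either whole values or individual map entries.
struct toy_index {
    std::string column;
    bool on_map_entries;
};

// Decides support from the predicate's fields alone.
bool is_predicate_supported_by(const predicate& p, const toy_index& idx) {
    if (p.column != idx.column || p.op != oper_t::EQ) {
        return false;
    }
    return p.is_subscript == idx.on_map_entries;
}

// Predicate-vector overload: true if any predicate in the vector is supported.
bool index_supports_some_column(const std::vector<predicate>& preds, const toy_index& idx) {
    return std::any_of(preds.begin(), preds.end(), [&](const predicate& p) {
        return is_predicate_supported_by(p, idx);
    });
}
```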
Avi Kivity
25ba3bd649 cql3: statement_restrictions: use pre-built single-column maps for index support checks
Replace index_supports_some_column(expression, ...) with
index_supports_some_column(single_column_restrictions_map, ...) to
eliminate get_single_column_restrictions_map() tree walks when checking
index support.  The three call sites now use the maps already built
incrementally in the constructor loop:
_single_column_nonprimary_key_restrictions,
_single_column_clustering_key_restrictions, and
_single_column_partition_key_restrictions.

Also replace contains_multi_column_restriction() tree walk in
clustering_columns_restrictions_have_supporting_index() with
_has_multi_column.
2026-04-19 20:57:08 +03:00
Avi Kivity
fab90224b3 cql3: statement_restrictions: build clustering-prefix restrictions incrementally
Replace the extract_clustering_prefix_restrictions() tree walk with
incremental collection during the main loop.  Two new locals --
mc_ck_preds and sc_ck_preds -- accumulate multi-column and single-column
clustering key predicates respectively.  A short post-loop block
computes the longest contiguous prefix from sc_ck_preds (or uses
mc_ck_preds directly for multi-column), replacing the removed function.

Also remove the now-unused to_predicate_on_clustering_key_prefix(),
with_current_binary_operator() helper, and the
visitor_with_binary_operator_context concept.
2026-04-19 20:57:08 +03:00
Avi Kivity
3bd308986a cql3: statement_restrictions: build partition-range restrictions incrementally
Replace the extract_partition_range() tree walk with incremental
collection during the main loop.  Two new locals before the loop --
token_pred and pk_range_preds -- accumulate token and single-column
EQ/IN partition key predicates respectively.  A short post-loop block
materializes _partition_range_restrictions from these locals, replacing
the removed function.

This removes the last tree walk over partition-key restrictions.
2026-04-19 20:57:08 +03:00
Avi Kivity
db28411548 cql3: statement_restrictions: build clustering-key single-column restrictions map incrementally
Instead of accumulating all clustering-key restrictions into a
conjunction tree and then decomposing it by column via
get_single_column_restrictions_map() post-loop, build the
per-column map incrementally as each single-column clustering-key
predicate is processed.

The post-loop guard (!has_mc_clustering) is no longer needed:
multi-column predicates go through the is_multi_column branch
and never insert into this map, and mixing multi with single-column
is rejected with an exception.

This eliminates a post-loop tree walk over
_clustering_columns_restrictions.
2026-04-19 20:57:08 +03:00
Avi Kivity
a4608804d8 cql3: statement_restrictions: build partition-key single-column restrictions map incrementally
Instead of accumulating all partition-key restrictions into a
conjunction tree and then decomposing it by column via
get_single_column_restrictions_map() post-loop, build the
per-column map incrementally as each single-column partition-key
predicate is processed.

The post-loop guard (!has_token_restrictions()) is no longer needed:
token predicates go through the on_partition_key_token branch and
never insert into this map, and mixing token with non-token is
rejected with an exception.

This eliminates a post-loop tree walk over
_partition_key_restrictions.
2026-04-19 20:57:08 +03:00
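A rough sketch of the incremental map building described in a4608804d8 follows. The string-keyed map, the `predicate` fields, and the `pk_restrictions` struct are hypothetical simplifications (the commit describes a per-column map; its exact key and value types are not shown here).

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical, simplified predicate.
struct predicate {
    std::string column;           // unused when the predicate is on token(...)
    bool on_partition_key_token;
    std::string text;             // printable form, for illustration only
};

struct pk_restrictions {
    // Filled as each predicate is processed; no post-loop decomposition.
    std::map<std::string, std::vector<predicate>> single_column;
    bool has_token = false;

    void add(const predicate& p) {
        if (p.on_partition_key_token) {
            if (!single_column.empty()) {
                throw std::invalid_argument("cannot mix token and non-token restrictions");
            }
            has_token = true;
            return;  // token predicates never enter the per-column map
        }
        if (has_token) {
            throw std::invalid_argument("cannot mix token and non-token restrictions");
        }
        single_column[p.column].push_back(p);
    }
};
```

Because mixing is rejected at insertion time, a guard such as `!has_token_restrictions()` after the loop has nothing left to check in this model.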
Avi Kivity
e9b16a11ba cql3: statement_restrictions: build non-primary-key single-column restrictions map incrementally
Instead of accumulating all non-primary-key restrictions into a
conjunction tree and then decomposing it by column via
get_single_column_restrictions_map() post-loop, build the
per-column map incrementally as each non-primary-key predicate
is processed.

This eliminates a post-loop tree walk over _nonprimary_key_restrictions.
2026-04-19 20:57:08 +03:00
Avi Kivity
701366a8d1 cql3: statement_restrictions: use tracked has_mc_clustering for _has_multi_column
Replace the two post-loop find_binop(_clustering_columns_restrictions,
is_multi_column) tree walks and the contains_multi_column_restriction()
tree walk with the already-tracked local has_mc_clustering.

The redundant second assignment inside the _check_indexes block is
removed entirely.
2026-04-19 20:57:08 +03:00
Avi Kivity
da438507d0 cql3: statement_restrictions: track has-token state incrementally
Replace the two in-loop calls to has_token_restrictions() (which
walks the _partition_key_restrictions expression tree looking for
token function calls) with a local bool has_token, set to true
when a token predicate is processed.

The member function is retained since it's used outside the
constructor.

With this change, the constructor loop's non-error control flow
performs zero expression tree scanning.  The only remaining tree
walks are on error paths (get_sorted_column_defs,
get_columns_in_commons for formatting exception messages) and
structural (make_conjunction for building accumulated expressions).
2026-04-19 20:57:07 +03:00
Avi Kivity
1344278a19 cql3: statement_restrictions: track partition-key-empty state incrementally
Replace the in-loop call to partition_key_restrictions_is_empty()
(which walks the _partition_key_restrictions expression tree via
is_empty_restriction()) with a local bool pk_is_empty, set to false
at the two sites where partition key restrictions are added.

The member function is retained since it's used outside the
constructor.
2026-04-19 20:57:07 +03:00
Avi Kivity
14812ea1e0 cql3: statement_restrictions: track first multi-column predicate incrementally
Replace find_in_expression<binary_operator>(_clustering_columns_restrictions,
always_true), which walks the accumulated expression tree to find the
first binary_operator, with a tracked pointer first_mc_pred set when
the first multi-column predicate is added. This eliminates the tree
scan, the null check, and the is_lower_bound/is_upper_bound lambdas,
replacing them with direct predicate field accesses: first_mc_pred->order,
first_mc_pred->is_lower_bound, first_mc_pred->is_upper_bound, and
first_mc_pred->filter for error messages.
2026-04-19 20:57:07 +03:00
Avi Kivity
ef005c10ba cql3: statement_restrictions: track last clustering column incrementally
Replace get_last_column_def(_clustering_columns_restrictions), which
walks the entire accumulated expression tree to collect and sort all
column definitions, with a local pointer ck_last_column that tracks
the column with the highest schema position as single-column
clustering restrictions are added.
2026-04-19 20:57:07 +03:00
Avi Kivity
88bd5ea1b7 cql3: statement_restrictions: track clustering-has-slice incrementally
Replace has_slice(_clustering_columns_restrictions), which walks the
accumulated expression tree looking for slice operators, with a local
bool ck_has_slice set when any clustering predicate with is_slice is
added. Updated at all three clustering insertion points: multi-column
first assignment, multi-column slice conjunction, and single-column
conjunction.
2026-04-19 20:57:07 +03:00
Avi Kivity
1071c39f17 cql3: statement_restrictions: track has-multi-column-clustering incrementally
Replace find_binop(_clustering_columns_restrictions, is_tuple_constructor),
which walks the accumulated expression tree looking for multi-column
restrictions, with a local bool has_mc_clustering set when a multi-column
predicate is first added. This serves both the multi-column branch
(checking existing restrictions are also multi-column) and the
single-column branch (checking no multi-column restrictions exist).
2026-04-19 20:57:07 +03:00
Avi Kivity
aa6a0ad326 cql3: statement_restrictions: track clustering-empty state incrementally
Replace is_empty_restriction(_clustering_columns_restrictions), which
recursively walks the accumulated expression tree, with a local bool
ck_is_empty that is set to false when a clustering restriction is
first added. Updated at both insertion points: multi-column first
assignment and single-column make_conjunction.
2026-04-19 20:57:07 +03:00
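The clustering-related trackers from this entry and the three entries above it (ck_is_empty, has_mc_clustering, ck_has_slice, ck_last_column) all follow the same pattern: update a local when a predicate is added instead of rescanning the accumulated expression. A minimal sketch, with hypothetical `column_definition`, `predicate`, and `clustering_state` types:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins for the real types.
struct column_definition { std::string name; std::size_t position; };

struct predicate {
    std::vector<const column_definition*> columns;
    bool is_multi_column = false;
    bool is_slice = false;
};

struct clustering_state {
    bool ck_is_empty = true;
    bool has_mc_clustering = false;
    bool ck_has_slice = false;
    const column_definition* ck_last_column = nullptr;

    // Called at every point where a clustering predicate is added.
    void add(const predicate& p) {
        ck_is_empty = false;
        ck_has_slice = ck_has_slice || p.is_slice;
        if (p.is_multi_column) {
            has_mc_clustering = true;
            return;
        }
        // Single-column case: remember the column with the highest schema position.
        for (const auto* c : p.columns) {
            if (!ck_last_column || c->position > ck_last_column->position) {
                ck_last_column = c;
            }
        }
    }
};
```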
Avi Kivity
d4ff613c0a cql3: statement_restrictions: replace restr bridge variable with pred.filter
The constructor loop no longer needs to extract a binary_operator
reference from each predicate. All remaining uses (make_conjunction,
get_columns_in_commons, assignment to accumulated restriction members,
_where.push_back, and error formatting) accept expression directly,
which is what pred.filter already is. This eliminates the unnecessary
as<binary_operator> cast at the top of the loop.
2026-04-19 20:57:07 +03:00
Avi Kivity
44b18f3399 cql3: statement_restrictions: convert single-column branch to use predicate properties
In the single-column partition-key and clustering-key sub-branches,
replace direct binary_operator field inspections with pre-computed
predicate booleans: !pred.equality && !pred.is_in instead of
restr.op != EQ && restr.op != IN, pred.is_in instead of
find(restr, IN), and pred.is_slice instead of has_slice(restr).
Also fix a leftover restr.order in the multi-column branch error
message.
2026-04-19 20:57:07 +03:00
Avi Kivity
b0c5eed384 cql3: statement_restrictions: convert multi-column branch to use predicate properties
Replace direct operator comparisons with predicate boolean fields:
pred.equality, pred.is_in, pred.is_slice, pred.is_lower_bound,
pred.is_upper_bound, and pred.order.
2026-04-19 20:57:07 +03:00
Avi Kivity
afd68187ea cql3: statement_restrictions: convert constructor loop to iterate over predicates
Convert the constructor loop to first build predicates from the
prepared where clause, then iterate over the predicates.

The IS_NOT branch now uses pred.is_not_null_single_column and pred.on
instead of inspecting the expression directly. The branch conditions
for multi-column (pred.is_multi_column), token
(on_partition_key_token), and single-column (on_column) now use
predicate properties instead of expression helpers.

Remove extract_column_from_is_not_null_restriction() which is no
longer needed.
2026-04-19 20:57:07 +03:00
Avi Kivity
440d9f2d82 cql3: statement_restrictions: annotate predicates with operator properties
Add fields to struct predicate that describe the operator: the booleans
equality, is_in, is_slice, is_upper_bound, and is_lower_bound, plus
order (a comparison_order). Populate them in all to_predicates() return sites.

These fields will allow the constructor loop to inspect predicate
properties directly instead of re-examining the expression.
2026-04-19 20:57:07 +03:00
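One way to picture the annotation in 440d9f2d82 is a function that derives the boolean properties from the operator once, so later code reads flags instead of comparing operators. The `oper_t` subset, the `operator_properties` struct, and the mapping of LT/LTE to upper bounds and GT/GTE to lower bounds are assumptions made for this sketch.

```cpp
#include <cassert>

// Hypothetical subset of operators.
enum class oper_t { EQ, IN, LT, LTE, GT, GTE };

struct operator_properties {
    bool equality = false;
    bool is_in = false;
    bool is_slice = false;
    bool is_upper_bound = false;
    bool is_lower_bound = false;
};

// Computed once, when the predicate is created.
operator_properties classify(oper_t op) {
    operator_properties p;
    switch (op) {
    case oper_t::EQ:
        p.equality = true;
        break;
    case oper_t::IN:
        p.is_in = true;
        break;
    case oper_t::LT:
    case oper_t::LTE:
        p.is_slice = true;
        p.is_upper_bound = true;
        break;
    case oper_t::GT:
    case oper_t::GTE:
        p.is_slice = true;
        p.is_lower_bound = true;
        break;
    }
    return p;
}
```

A check such as `restr.op != EQ && restr.op != IN` then becomes `!p.equality && !p.is_in`.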
Avi Kivity
e0eb3bde8d cql3: statement_restrictions: annotate predicates with is_not_null and is_multi_column
To avoid having to dig deep into the expression, compute is_not_null
and is_multi_column early and store them in the predicate.
2026-04-19 20:57:06 +03:00
Avi Kivity
6892642176 cql3: statement_restrictions: complete preparation early
We want to move away from the unprepared domain to the prepared
domain to avoid confusion. Ideally we'd receive prepared expressions
via the constructor, but that is left for later.
2026-04-19 20:57:06 +03:00
Avi Kivity
ed5dd645e8 cql3: statement_restrictions: convert expressions to predicates without being directed at a specific column
Currently, possible_lhs_values accepts a column_definition parameter
that tells it which column we are interested in. This works
because callers pre-analyze the expression and only pass a
subexpression that contains the specified columns.

We wish to convert expressions to predicates early, and so won't
have the benefit of knowing which columns we're interested in.

Generally, this is simple: a binary operator contains a column on the
left-hand side, so use that. If the expression is on a token, use that.

When the expression is a boolean constant (not expressible by
the grammar, but somehow found its way into the code), we invent
a new `on_row` designator meaning it's not about a specific column.
It will be useful one day when we allow things like
`WHERE some_boolean_function(c1, c2)` that aren't specific to any
single column.

Finally, we introduce helpers that, given such an expression decomposed
into predicates and a column_definition, extract the predicate related
to the given column. This mimics the possible_lhs_values API and allows
us to make minimal changes to callers, deferring that until later.

possible_lhs_values() is renamed to to_predicates() and loses the
column_definition parameter to indicate its new role.
2026-04-19 20:57:06 +03:00
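The "what is this predicate about" designator and the per-column extraction helper from ed5dd645e8 could be modelled as below. The designator names come from these commit messages, but their layout as a `std::variant`, the `text` field, and the helper name `predicates_on_column` are assumptions for illustration.

```cpp
#include <cassert>
#include <string>
#include <variant>
#include <vector>

// Hypothetical stand-ins for the real types.
struct column_definition { std::string name; };

struct on_row {};                  // not about any specific column
struct on_column { const column_definition* column; };
struct on_partition_key_token {};  // token(pk...) on the left-hand side
struct on_clustering_key_prefix { std::vector<const column_definition*> columns; };

struct predicate {
    std::variant<on_row, on_column, on_partition_key_token, on_clustering_key_prefix> on;
    std::string text;              // printable form, for illustration only
};

// Given a flat list of predicates, pick those that target one column,
// mimicking the old "solve for this column" call shape.
std::vector<const predicate*> predicates_on_column(const std::vector<predicate>& preds,
                                                   const column_definition& col) {
    std::vector<const predicate*> out;
    for (const auto& p : preds) {
        if (const auto* oc = std::get_if<on_column>(&p.on); oc && oc->column == &col) {
            out.push_back(&p);
        }
    }
    return out;
}
```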
Avi Kivity
bfd1302311 cql3: statement_restrictions: refine possible_lhs_values() function_call processing
Currently, we are careful to call possible_lhs_values() for a token
function only when slice/equality operators are used. We wish to relax
this, so return nullptr (must filter) for the other cases instead of
raising an internal error.
2026-04-19 20:57:06 +03:00
Avi Kivity
736011b663 cql3: statement_restrictions: return nullptr for function solver if not token
Currently, possible_lhs_values() for a function call expression will
only be called when we're sure it's the token() function. But soon this
will no longer be the case. Return nullptr for non-token functions to
indicate we can't solve for a column value instead of an internal
error.
2026-04-19 20:57:06 +03:00
Avi Kivity
8faf62a1aa cql3: statement_restrictions: refine possible_lhs_values() subscript solving
Do more work at prepare time.
2026-04-19 20:57:06 +03:00
Avi Kivity
a28689a99a cql3: statement_restrictions: return nullptr from possible_lhs_values instead of on_internal_error
Since we're a first-resort call now, and there's a last-resort (evaluate),
return nullptr instead of raising an internal error.

Logically should be part of previous patch, but the rest of the code is still
careful enough not to call here when not expecting a solution, so the split
is not breaking bisectability.
2026-04-19 20:57:06 +03:00
Avi Kivity
370f3fd2e8 cql3: statement_restrictions: convert possible_lhs_values into a solver
Convert from an execute-time function to a prepare-time function
by returning a solver function instead of directly solving.

When not possible to solve, but still possible to evaluate (filter),
return nullptr.
2026-04-19 20:57:06 +03:00
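The prepare-time versus execute-time split in 370f3fd2e8 can be sketched as a function that returns a callable, or nullptr when only filtering is possible. Everything here is a simplified assumption: `query_options` holding integer bind values, the `solver` alias, and the rule that only EQ is solvable.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

enum class oper_t { EQ, LIKE };

// Hypothetical execute-time input: the values bound to the statement's markers.
struct query_options { std::vector<int> bound_values; };

// Hypothetical restriction of the form "column <op> ?" with a bind-marker index.
struct binary_operator { oper_t op; std::size_t bind_index; };

// A solver maps execute-time options to the set of possible column values.
using solver = std::function<std::vector<int>(const query_options&)>;

// Prepare time: decide once how to solve. Returns nullptr when the
// restriction can only be evaluated as a filter, not solved.
solver to_solver(const binary_operator& b) {
    if (b.op == oper_t::EQ) {
        return [i = b.bind_index](const query_options& o) {
            return std::vector<int>{o.bound_values.at(i)};
        };
    }
    return nullptr;
}
```

Callers check the returned solver for null once at prepare time and fall back to filtering, rather than hitting an error at execute time.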
Avi Kivity
92a43557dc cql3: statement_restrictions: split _where to boolean factors in preparation for predicates conversion
Expressions are a tree-like structure so a single expression is sufficient
(for complicated ones, a conjunction is used), but predicates are flat.
Prepare for conversion to predicates by storing the expressions that
will correspond to predicates, namely the boolean factors of the WHERE
clause.
2026-04-19 20:57:06 +03:00
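Splitting a WHERE clause into boolean factors, as 92a43557dc describes, amounts to flattening nested conjunctions into a list. A minimal sketch with a hypothetical two-shape `expression` (leaf or conjunction); the function names are chosen for this example only:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical minimal expression tree: either a leaf (printable text)
// or a conjunction (AND) of child expressions.
struct expression {
    std::string leaf;                  // meaningful only when children is empty
    std::vector<expression> children;  // non-empty means a conjunction
};

// Recursively appends every non-conjunction subexpression to out.
void collect_boolean_factors(const expression& e, std::vector<std::string>& out) {
    if (e.children.empty()) {
        out.push_back(e.leaf);
        return;
    }
    for (const auto& child : e.children) {
        collect_boolean_factors(child, out);
    }
}

// Flat list of factors; each one would later become a predicate.
std::vector<std::string> boolean_factors_of(const expression& where) {
    std::vector<std::string> out;
    collect_boolean_factors(where, out);
    return out;
}
```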
Avi Kivity
694c1aed98 cql3: statement_restrictions: refactor IS NOT NULL processing
Move some code to a helper, but don't let it mutate state.
2026-04-19 20:57:06 +03:00
Avi Kivity
35f14544dc cql3: statement_restrictions: fold add_single_column_nonprimary_key_restriction() into its caller
The goal is to simplify flow-control where the order in which
variables are updated depends on their location in the source.
With functions, this is difficult.
2026-04-19 20:57:06 +03:00
Avi Kivity
1965741914 cql3: statement_restrictions: fold add_single_column_clustering_key_restriction() into its caller
The goal is to simplify flow-control where the order in which
variables are updated depends on their location in the source.
With functions, this is difficult.
2026-04-19 20:57:06 +03:00
Avi Kivity
1d631f7bac cql3: statement_restrictions: fold add_single_column_partition_key_restriction() into its caller
The goal is to simplify flow-control where the order in which
variables are updated depends on their location in the source.
With functions, this is difficult.
2026-04-19 20:57:05 +03:00
Avi Kivity
24cd98e454 cql3: statement_restrictions: fold add_token_partition_key_restriction() into its caller
The goal is to simplify flow-control where the order in which
variables are updated depends on their location in the source.
With functions, this is difficult.
2026-04-19 20:57:05 +03:00
Avi Kivity
be3239fc58 cql3: statement_restrictions: fold add_multi_column_clustering_key_restriction() into its caller
The goal is to simplify flow-control where the order in which
variables are updated depends on their location in the source.
With functions, this is difficult.
2026-04-19 20:57:05 +03:00
Avi Kivity
8990346c75 cql3: statement_restrictions: avoid early return in add_multi_column_clustering_key_restrictions
Prepare for inlining it into its caller, which doesn't work easily if there's
an early return.
2026-04-19 20:57:05 +03:00
Avi Kivity
fa130051a6 cql3: statement_restrictions: fold add_is_not_restriction() into its caller
The goal is to simplify flow-control where the order in which
variables are updated depends on their location in the source.
With functions, this is difficult.
2026-04-19 20:57:05 +03:00
Avi Kivity
63f9362c89 cql3: statement_restrictions: fold add_restriction() into its caller
The goal is to simplify flow-control where the order in which
variables are updated depends on their location in the source.
With functions, this is difficult.
2026-04-19 20:57:05 +03:00
Avi Kivity
9cbb1b851e cql3: statement_restrictions: remove possible_partition_token_values()
It's just a call to possible_lhs_values() with a different signature.

Now possible_lhs_values() is our only solver.
2026-04-19 20:57:05 +03:00
Avi Kivity
c1fc596203 cql3: statement_restrictions: remove possible_column_values
Replace it with the now-identical possible_lhs_values(). This paves the way
to having only one solver function (after we remove
possible_partition_token_values()).
2026-04-19 20:57:05 +03:00
Avi Kivity
b26e6f7330 cql3: statement_restrictions: pass schema to possible_column_values()
This unifies the signature with possible_lhs_values(), paving the way
to deduplicating the two functions. We always have the schema and may as
well pass it.
2026-04-19 20:57:05 +03:00