Commit Graph

53380 Commits

Author SHA1 Message Date
Avi Kivity
d4ff613c0a cql3: statement_restrictions: replace restr bridge variable with pred.filter
The constructor loop no longer needs to extract a binary_operator
reference from each predicate. All remaining uses (make_conjunction,
get_columns_in_commons, assignment to accumulated restriction members,
_where.push_back, and error formatting) accept expression directly,
which is what pred.filter already is. This eliminates the unnecessary
as<binary_operator> cast at the top of the loop.
2026-04-19 20:57:07 +03:00
Avi Kivity
44b18f3399 cql3: statement_restrictions: convert single-column branch to use predicate properties
In the single-column partition-key and clustering-key sub-branches,
replace direct binary_operator field inspections with pre-computed
predicate booleans: !pred.equality && !pred.is_in instead of
restr.op != EQ && restr.op != IN, pred.is_in instead of
find(restr, IN), and pred.is_slice instead of has_slice(restr).
Also fix a leftover restr.order in the multi-column branch error
message.
2026-04-19 20:57:07 +03:00
Avi Kivity
b0c5eed384 cql3: statement_restrictions: convert multi-column branch to use predicate properties
Replace direct operator comparisons with predicate boolean fields:
pred.equality, pred.is_in, pred.is_slice, pred.is_lower_bound,
pred.is_upper_bound, and pred.order.
2026-04-19 20:57:07 +03:00
Avi Kivity
afd68187ea cql3: statement_restrictions: convert constructor loop to iterate over predicates
Convert the constructor loop to first build predicates from the
prepared where clause, then iterate over the predicates.

The IS_NOT branch now uses pred.is_not_null_single_column and pred.on
instead of inspecting the expression directly. The branch conditions
for multi-column (pred.is_multi_column), token
(on_partition_key_token), and single-column (on_column) now use
predicate properties instead of expression helpers.

Remove extract_column_from_is_not_null_restriction() which is no
longer needed.
2026-04-19 20:57:07 +03:00
Avi Kivity
440d9f2d82 cql3: statement_restrictions: annotate predicates with operator properties
Add boolean fields to struct predicate that describe the operator:
equality, is_in, is_slice, is_upper_bound, is_lower_bound, and
comparison_order. Populate them in all to_predicates() return sites.

These fields will allow the constructor loop to inspect predicate
properties directly instead of re-examining the expression.
2026-04-19 20:57:07 +03:00
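A rough illustration of the operator annotations described in 440d9f2d82 follows. The `oper` enum, the `operator_properties` struct, and the mapping from operators to flags are assumptions made for this sketch, not the actual ScyllaDB definitions; only the flag names come from the commit message.

```cpp
#include <cassert>

// Hypothetical operator set; the real code has more operators.
enum class oper { eq, in, lt, lte, gt, gte };

// Flag names follow the commit message; the struct itself is a stand-in.
struct operator_properties {
    bool equality = false;
    bool is_in = false;
    bool is_slice = false;
    bool is_upper_bound = false;
    bool is_lower_bound = false;
};

// Computed once, so later code can test flags instead of
// re-examining the expression. The mapping below is assumed.
inline operator_properties annotate(oper op) {
    operator_properties p;
    switch (op) {
    case oper::eq:
        p.equality = true;
        break;
    case oper::in:
        p.is_in = true;
        break;
    case oper::lt:
    case oper::lte:
        p.is_slice = true;
        p.is_upper_bound = true;
        break;
    case oper::gt:
    case oper::gte:
        p.is_slice = true;
        p.is_lower_bound = true;
        break;
    }
    return p;
}
```

With flags like these, a check such as `restr.op != EQ && restr.op != IN` can be written as `!p.equality && !p.is_in`.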
Avi Kivity
e0eb3bde8d cql3: statement_restrictions: annotate predicates with is_not_null and is_multi_column
To avoid having to dig deep into the expression, compute is_not_null
and is_multi_column early and store them in the predicate.
2026-04-19 20:57:06 +03:00
Avi Kivity
6892642176 cql3: statement_restrictions: complete preparation early
We want to move away from the unprepared domain to the prepared
domain to avoid confusion. Ideally we'd receive prepared expressions
via the constructor, but that is left for later.
2026-04-19 20:57:06 +03:00
Avi Kivity
ed5dd645e8 cql3: statement_restrictions: convert expressions to predicates without being directed at a specific column
Currently, possible_lhs_values accepts a column_definition parameter
that tells it which column we are interested in. This works
because callers pre-analyze the expression and only pass a
subexpression that contains the specified columns.

We wish to convert expressions to predicates early, and so won't
have the benefit of knowing which columns we're interested in.

Generally, this is simple: a binary operator contains a column on the
left-hand side, so use that. If the expression is on a token, use that.

When the expression is a boolean constant (not expressible by
the grammar, but somehow found its way into the code), we invent
a new `on_row` designator meaning it's not about a specific column.
It will be useful one day when we allow things like
`WHERE some_boolean_function(c1, c2)` that aren't specific to any
single column.

Finally, we introduce helpers that, given such an expression decomposed
into predicates and a column_definition, extract the predicate related
to the given column. This mimics the possible_lhs_values API and allows
us to make minimal changes to callers, deferring that until later.

possible_lhs_values() is renamed to to_predicates() and loses the
column_definition parameter to indicate its new role.
2026-04-19 20:57:06 +03:00
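The column-lookup helpers mentioned in ed5dd645e8 might look roughly like the sketch below. The `predicate` struct here is a simplified stand-in (a string instead of an expression), and `find_predicate_for_column` is a hypothetical name; only `on_row`, `on_column`, and `on_partition_key_token` are taken from the commit messages.

```cpp
#include <cassert>
#include <string>
#include <variant>
#include <vector>

struct on_row {};                      // not about any specific column
struct on_column { std::string name; };
struct on_partition_key_token {};

// Simplified stand-in: the real predicate carries an expression and more.
struct predicate {
    std::string filter;
    std::variant<on_row, on_column, on_partition_key_token> on;
};

// Hypothetical helper: given predicates decomposed from an expression,
// return the first one that is about the named column, or nullptr.
inline const predicate* find_predicate_for_column(
        const std::vector<predicate>& preds, const std::string& column) {
    for (const auto& p : preds) {
        const auto* c = std::get_if<on_column>(&p.on);
        if (c != nullptr && c->name == column) {
            return &p;
        }
    }
    return nullptr;
}
```

Callers that previously passed a column to the solver can instead decompose once and then look up the predicate for the column they care about.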
Avi Kivity
bfd1302311 cql3: statement_restrictions: refine possible_lhs_values() function_call processing
Currently, we are careful to call possible_lhs_values() for a token
function only when slice/equality operators are used. We wish to relax
this, so return nullptr (must filter) for the other cases instead of
raising an internal error.
2026-04-19 20:57:06 +03:00
Avi Kivity
736011b663 cql3: statement_restrictions: return nullptr for function solver if not token
Currently, possible_lhs_values() for a function call expression will
only be called when we're sure it's the token() function. But soon this
will no longer be the case. Return nullptr for non-token functions to
indicate we can't solve for a column value instead of an internal
error.
2026-04-19 20:57:06 +03:00
Avi Kivity
8faf62a1aa cql3: statement_restrictions: refine possible_lhs_values() subscript solving
Do more work at prepare time.
2026-04-19 20:57:06 +03:00
Avi Kivity
a28689a99a cql3: statement_restrictions: return nullptr from possible_lhs_values instead of on_internal_error
Since we're a first-resort call now, and there's a last-resort (evaluate),
return nullptr rather than raising an internal error.

Logically should be part of previous patch, but the rest of the code is still
careful enough not to call here when not expecting a solution, so the split
is not breaking bisectability.
2026-04-19 20:57:06 +03:00
Avi Kivity
370f3fd2e8 cql3: statement_restrictions: convert possible_lhs_values into a solver
Convert from an execute-time function to a prepare-time function
by returning a solver function instead of directly solving.

When not possible to solve, but still possible to evaluate (filter),
return nullptr.
2026-04-19 20:57:06 +03:00
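The conversion in 370f3fd2e8 from "solve now" to "return a solver" can be sketched as follows. All types here (`oper`, `restriction`, `query_options`, `solver`) are simplified, hypothetical stand-ins; the only behavior taken from the commit message is that a solver function is built at prepare time and that nullptr means "cannot solve, must filter".

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

enum class oper { eq, in, like };  // hypothetical operator set

struct query_options {
    std::vector<int> bound_values;  // values for bind markers, known at execute time
};

struct restriction {
    oper op;
    std::vector<std::size_t> markers;  // indexes into bound_values
};

using solver = std::function<std::vector<int>(const query_options&)>;

// Prepare time: inspect the operator once. Return a solver when the
// possible values can be computed, or nullptr when we can only filter.
inline solver make_solver(const restriction& r) {
    switch (r.op) {
    case oper::eq:
    case oper::in:
        return [markers = r.markers](const query_options& opts) {
            std::vector<int> values;
            for (auto m : markers) {
                values.push_back(opts.bound_values.at(m));
            }
            return values;
        };
    case oper::like:
        return nullptr;
    }
    return nullptr;
}
```

At execute time the caller only checks whether the solver is non-null and, if so, invokes it with the query options.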
Avi Kivity
92a43557dc cql3: statement_restrictions: split _where to boolean factors in preparation for predicates conversion
Expressions are a tree-like structure so a single expression is sufficient
(for complicated ones, a conjunction is used), but predicates are flat.
Prepare for conversion to predicates by storing the expressions that
will correspond to predicates, namely the boolean factors of the WHERE
clause.
2026-04-19 20:57:06 +03:00
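Splitting a WHERE clause into its boolean factors, as 92a43557dc describes, amounts to flattening nested conjunctions into a list. The sketch below uses a toy expression type (`relation`, `conjunction`, `expr`) invented for illustration; it is not the cql3 expression representation.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <variant>
#include <vector>

struct expr;
struct relation { std::string text; };             // e.g. "a = 1"
struct conjunction { std::vector<expr> children; };  // AND of sub-expressions
struct expr { std::variant<relation, conjunction> v; };

inline expr make_relation(std::string text) {
    return expr{relation{std::move(text)}};
}

inline expr make_and(std::vector<expr> children) {
    return expr{conjunction{std::move(children)}};
}

// Recursively collect the leaves of nested conjunctions.
inline void collect_factors(const expr& e, std::vector<std::string>& out) {
    if (const auto* c = std::get_if<conjunction>(&e.v)) {
        for (const auto& child : c->children) {
            collect_factors(child, out);
        }
    } else {
        out.push_back(std::get<relation>(e.v).text);
    }
}

// The tree becomes a flat list, one entry per future predicate.
inline std::vector<std::string> boolean_factors(const expr& e) {
    std::vector<std::string> out;
    collect_factors(e, out);
    return out;
}
```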
Avi Kivity
694c1aed98 cql3: statement_restrictions: refactor IS NOT NULL processing
Move some code to a helper, but don't let it mutate state.
2026-04-19 20:57:06 +03:00
Avi Kivity
35f14544dc cql3: statement_restrictions: fold add_single_column_nonprimary_key_restriction() into its caller
The goal is to simplify flow-control where the order in which
variables are updated depends on their location in the source.
With functions, this is difficult.
2026-04-19 20:57:06 +03:00
Avi Kivity
1965741914 cql3: statement_restrictions: fold add_single_column_clustering_key_restriction() into its caller
The goal is to simplify flow-control where the order in which
variables are updated depends on their location in the source.
With functions, this is difficult.
2026-04-19 20:57:06 +03:00
Avi Kivity
1d631f7bac cql3: statement_restrictions: fold add_single_column_partition_key_restriction() into its caller
The goal is to simplify flow-control where the order in which
variables are updated depends on their location in the source.
With functions, this is difficult.
2026-04-19 20:57:05 +03:00
Avi Kivity
24cd98e454 cql3: statement_restrictions: fold add_token_partition_key_restriction() into its caller
The goal is to simplify flow-control where the order in which
variables are updated depends on their location in the source.
With functions, this is difficult.
2026-04-19 20:57:05 +03:00
Avi Kivity
be3239fc58 cql3: statement_restrictions: fold add_multi_column_clustering_key_restriction() into its caller
The goal is to simplify flow-control where the order in which
variables are updated depends on their location in the source.
With functions, this is difficult.
2026-04-19 20:57:05 +03:00
Avi Kivity
8990346c75 cql3: statement_restrictions: avoid early return in add_multi_column_clustering_key_restrictions
Prepare for inlining it into its caller, which doesn't work easily if there's
an early return.
2026-04-19 20:57:05 +03:00
Avi Kivity
fa130051a6 cql3: statement_restrictions: fold add_is_not_restriction() into its caller
The goal is to simplify flow-control where the order in which
variables are updated depends on their location in the source.
With functions, this is difficult.
2026-04-19 20:57:05 +03:00
Avi Kivity
63f9362c89 cql3: statement_restrictions: fold add_restriction() into its caller
The goal is to simplify flow-control where the order in which
variables are updated depends on their location in the source.
With functions, this is difficult.
2026-04-19 20:57:05 +03:00
Avi Kivity
9cbb1b851e cql3: statement_restrictions: remove possible_partition_token_values()
It's just a call to possible_lhs_values() with a different signature.

Now possible_lhs_values() is our only solver.
2026-04-19 20:57:05 +03:00
Avi Kivity
c1fc596203 cql3: statement_restrictions: remove possible_column_values
Replace with the now-identical possible_lhs_values. This paves the way
to have only one solver function (after we remove
possible_partition_token_values).
2026-04-19 20:57:05 +03:00
Avi Kivity
b26e6f7330 cql3: statement_restrictions: pass schema to possible_column_values()
This unifies the signature with possible_lhs_values(), paving the way
to deduplicating the two functions. We always have the schema and may as
well pass it.
2026-04-19 20:57:05 +03:00
Avi Kivity
c6f6e81fe5 cql3: statement_restrictions: remove fallback path in solve()
All query plans that try to solve for the possible values a column
(or token, or column-tuple) can take have been converted to set
analyzed_column::solve_for. Recognize that by removing the
fallback path.

This removes the last possible_column_values() call that isn't bound
(using std::bind_front), and will allow moving it to prepare time.
2026-04-19 20:57:05 +03:00
Avi Kivity
e0445269e5 cql3: statement_restrictions: reorder possible_lhs_column parameters
By moving query_options to the end, we can use std::bind_front to
convert it from a build-time to a run-time function that depends
only on the query_options.
2026-04-19 20:57:05 +03:00
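The parameter reordering in e0445269e5 works because `std::bind_front` binds leading arguments and leaves trailing ones open. A minimal sketch, assuming simplified stand-in types (`column`, `query_options`) and a hypothetical `possible_values` function:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

struct query_options {
    std::vector<int> bound_values;  // only known at execute time
};

struct column {
    std::string name;
    std::size_t marker_index;  // which bind marker the column is compared to
};

// Prepare-time arguments come first, query_options comes last.
inline std::vector<int> possible_values(const column& col, const query_options& opts) {
    return {opts.bound_values.at(col.marker_index)};
}

using solver = std::function<std::vector<int>(const query_options&)>;

// Bind everything except query_options at prepare time.
inline solver make_solver(column col) {
#ifdef __cpp_lib_bind_front
    return std::bind_front(possible_values, std::move(col));  // C++20
#else
    // Equivalent lambda for pre-C++20 compilers.
    return [col = std::move(col)](const query_options& opts) {
        return possible_values(col, opts);
    };
#endif
}
```

The resulting callable depends only on `query_options`, so it can be built once and invoked for every execution.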
Avi Kivity
e42ad62561 cql3: statement_restrictions: prepare solver for multi-column restrictions
Multi-column restrictions (a, b) > (:v1, :v2) do not obey normal
comparison rules. For example, given

 (a, b) > (5, 1) AND a <= 5

We see that (a, b) = (5, 2) satisfies the constraint, but if we tried
to solve for the interval

 ( (5, 1), (5) ]

We'd have to conclude that (5,1) <= (5).

It's possible to extend the CQL type system to support this, but
that would be a lot of work, and in fact the current code doesn't
depend on it (it solves these intersections in its own code path,
multi_column_range_accumulator_builder's prefix3cmp).

So, we just mark such solvers as non-comparable, and generate an
internal error if we try to compare them in make_conjunction.
2026-04-19 20:57:05 +03:00
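The example in e42ad62561 can be checked with ordinary lexicographic tuple comparison. The sketch below hard-codes the restriction `(a, b) > (5, 1) AND a <= 5` from the commit message; `satisfies` is a name made up for this illustration.

```cpp
#include <cassert>
#include <tuple>

// (a, b) > (5, 1) AND a <= 5, with the tuple compared lexicographically.
inline bool satisfies(int a, int b) {
    return std::make_tuple(a, b) > std::make_tuple(5, 1) && a <= 5;
}
```

`satisfies(5, 2)` holds, as the commit message says, yet the two bounds `(5, 1)` and `(5)` have different lengths and so have no ordinary ordering between them, which is why such solvers are marked non-comparable.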
Avi Kivity
96e8414963 cql3: statement_restrictions: add solver for token restriction on index
possible_column_values() knows how to find the values that the token can
take, so add a solve_for implementation for tokens.
2026-04-19 20:57:04 +03:00
Avi Kivity
135809d97b cql3: statement_restrictions: pre-analyze column in value_for()
Since we pre-analyze the column, return a built function, and remove
the corresponding lambda from the caller.
2026-04-19 20:57:04 +03:00
Avi Kivity
0a16d90acb cql3: statement_restrictions: don't handle boolean constants in multi_column_range_accumulator_builder
In statement_restrictions' constructor, we check that all the boolean factors
are relations. This means the code to handle a constant here is dead code.

Remove it; while it's good to handle it, it should be handled at the top
level, not in multi-column restriction processing.
2026-04-19 20:57:04 +03:00
Avi Kivity
56ae02d8a3 cql3: statement_restrictions: split range_from_raw_bounds into prepare phase and query phase
range_from_raw_bounds processes restrictions of the form

   (a, b) > SCYLLA_CLUSTERING_BOUND(?, ?)

indicating that comparisons respect whether columns are reversed or not.

Iterate over expressions during the prepare phase only, generating
"builder" functions to be executed during the query phase.
2026-04-19 20:57:04 +03:00
Avi Kivity
2c75123bbd cql3: statement_restrictions: adjust signature of range_from_raw_bounds
The get_clustering_bounds() family works in terms of vectors of
clustering ranges (to support IN) and in fact the only caller converts
it to a vector. Converting it immediately simplifies later patching.
2026-04-19 20:57:04 +03:00
Avi Kivity
e646b763e7 cql3: statement_restrictions: split multi_column_range_accumulator into prepare-time and query-time phases
multi_column_range_accumulator takes an expression containing multi-column
restrictions of the form (a, b) > (?, ?) and simultaneously analyzes
them and solves for the set of intervals that satisfy those restrictions.

Split this into a prepare-time phase (that generates "builders", functions
that operate on the accumulator), and a query phase that executes
the builders. Importantly, the expression visitor ends up on the prepare
phase, so it can be merged with other parts of the analysis.

Helper functions of the visitor are made static, since they need to
run during the query phase but the visitor only exists during the
prepare phase.
2026-04-19 20:57:04 +03:00
Avi Kivity
ea26186043 cql3: statement_restrictions: make get_multi_column_clustering_bounds a builder
Lay the groundwork for analyzing multi column clustering bounds by
splitting the function into prepare-time and execute-time parts.
To start with, all of the work is done at query time, but later
patches will move bits into prepare time.
2026-04-19 20:57:04 +03:00
Avi Kivity
c60e3d5cf7 cql3: statement_restrictions: multi-key clustering restrictions one layer deeper
For the multi column binary operator case, perform more of the work at
prepare time in preparation for consolidating the analysis.
2026-04-19 20:57:04 +03:00
Avi Kivity
b520e74128 cql3: statement_restrictions: push multi-column post-processing into get_multi_column_clustering_bounds()
Doing this splits the multi-column processing code into a preparation
phase and an evaluation phase in a single call, making it easier to
further split prepare/evaluate.
2026-04-19 20:57:04 +03:00
Avi Kivity
c4ab0ddb85 cql3: statement_restrictions: pre-analyze single-column clustering key restrictions
Change _clustering_prefix_restrictions and _idx_tbl_ck_prefix
(the latter is the equivalent of the former, for indexed queries)
to use predicates instead of expressions. This lets us do
more of the work of solving restrictions during prepare time.

We only handle single-column restrictions here. Multi-column
restrictions use the existing path.

We introduce two helpers:
 - value_set_to_singleton() converts a restriction solution to a singleton
   when we know that's the only possible answer
 - replace_column_def() overload for predicate, similar to the
   existing overload for expressions

There is a wart in get_single_column_clustering_bounds(): we arrive at
this point with the two vectors possibly pointing at different
columns. Previously, possible_lhs_values() did this check while solving.
We now check for it here.

The predicate::on variant gets another member, for clustering key prefixes.
Since everything is still handled by the legacy paths, we mostly
error out.
2026-04-19 20:57:04 +03:00
Avi Kivity
201ed53837 cql3: statement_restrictions: wrap value_for_index_partition_key()
To allow more work to be carried out during prepare time, wrap
the body in an std::function, which will be called at execution time.

Currently we actually do the work during execution time, but the
way is prepared.
2026-04-19 20:57:04 +03:00
Avi Kivity
325497d460 cql3: statement_restrictions: hide value_for()
value_for() is a general function that solves for values that
satisfy an expression set to TRUE. This goes against our goal to
prepare solvers for all the expressions we use. Fortunately, it's only
called with one expression, which comes from statement_restrictions, so
we can add an accessor that provides the expression from our own state.
Later, we'll be able to do prepare-time work on it.
2026-04-19 20:57:04 +03:00
Avi Kivity
dcdd2f7e72 cql3: statement_restrictions: push down clustering prefix wrapper one level
This allows us to tackle each case separately.
2026-04-19 20:57:03 +03:00
Avi Kivity
1039ed9ed2 cql3: statement_restrictions: wrap functions that return clustering ranges
During prepare time, build functions for use during execution time.

Currently, the wrappers are very shallow, and practically all the
work is done at execution time. But the stage is set for more peeling.

The index clustering ranges had on_internal_error()s if an index
was not used. They're converted to returning a null function. If
executed (which is never supposed to happen), it will throw
a bad_function_call.
2026-04-19 20:57:03 +03:00
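The null-function behavior mentioned in 1039ed9ed2 is standard `std::function` semantics: invoking an empty `std::function` throws `std::bad_function_call`. A small sketch, where `range_fn`, `make_index_ranges_fn`, and the string stand-in for clustering ranges are hypothetical:

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Stand-in for "a function that returns clustering ranges".
using range_fn = std::function<std::vector<std::string>()>;

// Prepare time: when no index is used, hand back a null function
// instead of raising an error immediately.
inline range_fn make_index_ranges_fn(bool uses_index) {
    if (!uses_index) {
        return nullptr;
    }
    return [] { return std::vector<std::string>{"[1, 5)"}; };
}

// Execute time: calling a null std::function throws bad_function_call.
inline bool throws_bad_function_call(const range_fn& fn) {
    try {
        fn();
    } catch (const std::bad_function_call&) {
        return true;
    }
    return false;
}
```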
Avi Kivity
620df7103f cql3: statement_restrictions: do not pass view schema back and forth
For indexed queries, statement_restrictions calculates _view_schema,
which is passed via get_view_schema() to indexed_select_statement(),
which passes it right back to statement_restrictions via one of three
functions to calculate clustering ranges.

Avoid the back-and-forth and use the stored value. Using a different
value would be broken.

This change allows unifying the signatures of the four functions that
get clustering ranges.
2026-04-19 20:57:03 +03:00
Avi Kivity
6fce090e30 cql3: statement_restrictions: pre-analyze token range restrictions
Convert token range restrictions to the predicate format we
introduced earlier, where we have a function to solve for the token
range rather than running the analysis at runtime. Again the truth is
that the function will delegate to possible_partition_token_values()
which actually will do the analysis at runtime, but it's one step closer.

We add a new variant element for predicate::on, since it doesn't
fit the existing element (the token isn't a column).
2026-04-19 20:57:03 +03:00
Avi Kivity
941011bb4a cql3: statement_restrictions: pre-analyze partition key columns
The expression tree for partition keys is analyzed during runtime:
in partition_range_from_singles() (for example), we call find_binop
and get_subscripted_column() to understand the expression structure.

This analysis is problematic because it has to match the analysis
during prepare time; and they have to evolve in lock step.

Here, we move the analysis to the prepare stage. This is done
by augmenting the expression into a new predicate struct. It
contains the original expression (as a fallback for paths not yet
converted), as well as a solve_for function which contains
a function built at prepare time that embeds all the necessary analysis.

We introduce the `predicate` type which is an augmentation
of boolean expressions. In addition to the expression, we remember
what column the expression is on, and a function that computes
what values the column can take on that would make the expression
true.

The field that says what column the predicate is about is typed
as a variant since later on we will have predicates on non-columns
(the token, or a clustering prefix).

Note that currently the function engages in some run-time analysis of
its own, since it calls possible_lhs_values that itself does analysis,
but this is a step in the right direction.
2026-04-19 20:57:03 +03:00
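The shape of the `predicate` type introduced in 941011bb4a might be sketched as below. The member names `filter`, `on`, and `solve_for` appear in the commit messages, but the member types, the `query_options` stand-in, and `make_equality_predicate` are assumptions made for this example rather than the real definitions.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <variant>
#include <vector>

struct query_options { std::vector<int> bound_values; };

struct on_column { std::string name; };
struct on_partition_key_token {};
struct on_clustering_key_prefix { std::vector<std::string> columns; };

struct predicate {
    // The original expression, kept as a fallback (a string stands in here).
    std::string filter;
    // What the predicate is about: a variant, since it may not be a column.
    std::variant<on_column, on_partition_key_token, on_clustering_key_prefix> on;
    // Built at prepare time: the values that would make the expression true.
    std::function<std::vector<int>(const query_options&)> solve_for;
};

// Hypothetical factory for "column = ?" with a given bind marker index.
inline predicate make_equality_predicate(std::string column, std::size_t marker) {
    predicate p;
    p.filter = column + " = ?";
    p.on = on_column{std::move(column)};
    p.solve_for = [marker](const query_options& opts) {
        return std::vector<int>{opts.bound_values.at(marker)};
    };
    return p;
}
```

All structural analysis happens inside the factory, so execute-time code only calls `solve_for` with the query options.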
Avi Kivity
c73f3ac55f cql3: statement_restrictions: do not collect subscripted partition key columns
An indexed SELECT of the form

SELECT ...
WHERE pk['sub'] = ?

is impossible because our indexes do not support frozen maps, and
partition key collections must be frozen. Stop collecting such constructs
for the purpose of determining the partition range. This reduces having
to deal with combinations of restrictions on the column and its entries
later on.

In case we start supporting indexes on frozen maps, leave an
on_internal_error to remind us.
2026-04-19 20:57:03 +03:00
Avi Kivity
531f137ed3 cql3: statement_restrictions: split _partition_range_restrictions into three cases
_partition_range_restrictions is a vector of expressions, one per
partition key column, except that it can be empty if there is no
restriction on the partition that can be translated to a read command,
and if the restriction is on a token range, only the first element
is used.

Separate the three cases into distinct structs. After this, additional
work can be done utilizing the specialization.
2026-04-19 20:57:03 +03:00
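One way to picture the three-way split in 531f137ed3 is a variant over three distinct structs. The struct names, their members, and `expression_count` are invented for this sketch; the commit message only states that the three cases become distinct structs.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <variant>
#include <vector>

// Case 1: nothing that can be translated to a read command.
struct no_partition_range_restrictions {};

// Case 2: a restriction on the token; a single expression.
struct token_range_restrictions { std::string token_restriction; };

// Case 3: one expression per partition key column.
struct single_column_partition_range_restrictions {
    std::vector<std::string> per_column;
};

using partition_range_restrictions = std::variant<
    no_partition_range_restrictions,
    token_range_restrictions,
    single_column_partition_range_restrictions>;

// Each case is handled by its own overload, so no code has to guess
// what an empty vector or a lone first element means.
struct count_expressions {
    std::size_t operator()(const no_partition_range_restrictions&) const { return 0; }
    std::size_t operator()(const token_range_restrictions&) const { return 1; }
    std::size_t operator()(const single_column_partition_range_restrictions& r) const {
        return r.per_column.size();
    }
};

inline std::size_t expression_count(const partition_range_restrictions& r) {
    return std::visit(count_expressions{}, r);
}
```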
Avi Kivity
fcf7c4c90d cql3: statement_restrictions: move value_list, value_set to header file
They don't really need to be public, but will be used in intermediate
storage.
2026-04-19 20:57:03 +03:00
Avi Kivity
926886fcfb cql3: statement_restrictions: wrap get_partition_key_ranges
statement_restrictions::get_partition_key_ranges() re-interprets
the expressions used to specify the partition key. This means that
the analysis phase (determining what those expressions are and how
they are to be used) and the execution phase (using them) are in separate
places. This makes it very hard to refactor while preserving correctness.

As a first step in unifying the two phases, we move the selection
of the strategy (using token, cartesian product, or single partition)
from execution to analysis, by making the if-tree return a function to
be executed at execution time, rather than running the if-tree itself
at execution time.
2026-04-19 20:57:03 +03:00