Add a function, which given an expression and a column,
extracts all restrictions involving this column.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
The combination of the new types and these functions cannot happen yet,
but as they are generic functions it is better to implement them in
case it becomes possible later.
Now that all selectable::raw subclasses have been converted to
cql3::selectable::with_expression::raw, the class structure is
just a wrapper around expressions. Peel it, converting the
virtual member functions to free functions, and replacing
object instances with expression or nested_expression as the
case allows.
Add a field_selection variant element to expression. Like function_call
and cast, the structure from which a field is selectewd cannot yet be
an expression, since not all seletable::raw:s are converted. This will
be done in a later pass. This is also why printing a field selection now
does not print the selected expression; this will also be corrected later.
Add a cast variant element to expression. Like function_call, the
argument being converted cannot yet be an expression, since not
all seletable::raw:s are converted. This will be done in a later
pass. This is also why printing a cast now does not print the
casted expression; this will also be corrected later.
Rather than creating a new variant element in expression, we extend
function_call to handle both named and anonymous functions, since
most of the processing is the same.
Add a function_call variant element to hold function calls. Note
that because not all selectables are yet converted, function call
arguments are still of type selectable::raw. They will be converted
to expressions later. This is also why printing a function now
does not print its arguments; this will also be corrected later.
Create a new element in the expression variant, column_mutation_attribute,
signifying we're picking up an attribute of a column mutation (not a
column value!). We use an enum rather than a bool to choose between
writetime and ttl (the two mutation attributes) for increased
explicitness.
Although there can only be one type for the column we're operating
on (it must be an unresolved_identifer), we use a nested_expression.
This is because we'll later need to also support a column_value
as the column type after we prepare it. This is somewhat similar
to the address of operator in C, which syntactically takes any
expression but semantically operates only on lvalues.
It is convenient to initialize a nested_expression variable from
one of the types that compose the expression variant, but C++ doesn't
allow it. Add a constructor that does this. Use the new variant_element
concept to constrain the input to be one of the variant's elements.
The exression type cannot be a member of a struct that is an
element of the expression variant. This is because it would then
be required to contain itself. So introduce a nested_expression
type to indirectly hold an expression, but keep the value semantics
we expect from expressions: it is copyable and a copy has separate
identity and storage.
In fact binary_operator had to resort to this trick, so it's converted
to nested_expression in the next patch.
Introduce unresolved_identifer as an unprepared counterpart to column_value.
column_identifier_raw no longer inherits from selectable::raw, but
methods for now to reduce churn.
Adds replace_token function which takes an expression
and replaces all left hand side occurences of token()
with the given column definition.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Currently, column names can only appear in a boolean binary expression,
but not on their own. This means that in the statement
SELECT a FROM tab WHERE a > 3;
We can represent the WHERE clause as an expression, but not the selector.
To pave the way for using expressions in selector contexts, we promote
the elements of binary_operator::lhs (column_value, column_value_tuple,
token) to be expressions in their own right. binary_operator::lhs
becomes an expression (wrapped in unique_ptr, because variants can't
contain themselves).
Note that all three new possibilities make sense in a selector:
SELECT column FROM tab
SELECT token(pk) FROM tab
SELECT function_that_accepts_a_tuple((col1, col2)) FROM tab
There is some fallout from this:
- because binary_operator contains a unique_ptr, it is no longer
copyable. We add a copy constructor and assignment operator to
compensate.
- often, the new elements don't make sense when evaluating a boolean
expression, which is the only context we had before. We call
on_internal_error in these cases. The parser right now prevents such
cases from being constructed in the first place (this is equivalent to
if (some_struct_value) in C).
- in statement_restrictions.cc, we need to evalute the lhs in the context
of the full binary operator. I introduced with_current_binary_operator()
for this; an alternative approach is to create a new sub-visitor.
Closes#8797
std::bind() copies the bound parameters for safekeeping. Here this
includes expr, which can be quite heavyweight. Use std::ref() to
prevent copying. This is safe since the bound expression is executed
and discarded before has_supporting_index() returns.
Closes#8791
Right now, binary_operator::lhs is a variant<column_value,
std::vector<column_value>, token>. The role of the second branch
(a vector of column values) is to represent a tuple of columns
e.g. "WHERE (a, b, c) = ?"), but this is not clear from the type
name.
Inroduce a wrapper type around the vector, column_value_tuple, to
make it clear we're dealing with tuples of CQL references (a
column_value is really a column_ref, since it doesn't actually
contain any value).
Closes#8208
Make to_range able to handle rvalues. We will pass managed_bytes&& to it
in the next patch to avoid pointless copying.
The public declaration of to_range is changed to a concrete function to avoid
having to explicitly instantiate to_range for all possible reference types of
clustering_key_prefix.
This patch switches the type used to store collection elements inside the
intermediate form used in lists::value, tuples::value etc. from bytes
to managed_bytes. After this patch, tuple and list elements are only linearized
in from_serialized, which will be corrected soon.
This commit introduces some additional copies in expression.cc, which
will be dealt with in a future commit.
We want to change the internals of cql3::raw_value{_view}.
However, users of cql3::raw_value and cql3::raw_value_view often
use them by extracting the internal representation, which will be different
after the planned change.
This commit prepares us for the change by making all accesses to the value
inside cql3::raw_value(_view) be done through helper methods which don't expose
the internal representation publicly.
After this commit we are free to change the internal representation of
raw_value_{view} without messing up their users.
Previously, we crashed when the IN marker is bound to null. Throw
invalid_request_exception instead.
Fixes#8265
Tests: unit (dev)
Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
Closes#8287
Before this change, we would print an expression like this:
((ColumnDefinition{name=c, type=org.apache.cassandra.db.marshal.Int32Type, kind=CLUSTERING_COLUMN, componentIndex=0, droppedAt=-9223372036854775808}) = 0000007b)
Now, we print the same expression like this:
(c = 0000007b)
Tests: unit (dev)
Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
Closes#8285
Omitting these operators didn't cause bugs, because needs_filtering()
is never invoked on them. But that will likely change in the future,
so add them now to prevent problems down the road.
Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
Instead of defining this enum in multi_column_restriction::slice, put
it in the expr namespace and add it to binary_operator. We will need
it when we switch bounds calculation from multi_column_restriction to
expr classes.
Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
It will be used in statement_restrictions for calculating clustering
bounds. And it will come in handy elsewhere in the future, I'm sure.
Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
Commit aab6b0ee27 introduced the
controversial new IMR format, which relied on a very template-heavy
infrastructure to generate serialization and deserialization code via
template meta-programming. The promise was that this new format, beyond
solving the problems the previous open-coded representation had (working
on linearized buffers), will speed up migrating other components to this
IMR format, as the IMR infrastructure reduces code bloat, makes the code
more readable via declarative type descriptions as well as safer.
However, the results were almost the opposite. The template
meta-programming used by the IMR infrastructure proved very hard to
understand. Developers don't want to read or modify it. Maintainers
don't want to see it being used anywhere else. In short, nobody wants to
touch it.
This commit does a conceptual revert of
aab6b0ee27. A verbatim revert is not
possible because related code evolved a lot since the merge. Also, going
back to the previous code would mean we regress as we'd revert the move
to fragmented buffers. So this revert is only conceptual, it changes the
underlying infrastructure back to the previous open-coded one, but keeps
the fragmented buffers, as well as the interface of the related
components (to the extent possible).
Fixes: #5578
This is a revival of #7490.
Quoting #7490:
The managed_bytes class now uses implicit linearization: outside LSA, data is never fragmented, and within LSA, data is linearized on-demand, as long as the code is running within with_linearized_managed_bytes() scope.
We would like to stop linearizing managed_bytes and keep it fragmented at all times, since linearization can require large contiguous chunks. Large contiguous allocations are hard to satisfy and cause latency spikes.
As a first step towards that, we remove all implicitly linearizing accessors and replace them with an explicit linearization accessor, with_linearized().
Some of the linearization happens long before use, by creating a bytes_view of the managed_bytes object and passing it onwards, perhaps storing it for later use. This does not work with with_linearized(), which creates a temporary linearized view, and does not work towards the longer term goal of never linearizing. As a substitute a managed_bytes_view class is introduced that acts as a view for managed_bytes (for interoperability it can also be a view for bytes and is compatible with bytes_view).
By the end of the series, all linearizations are temporary, within the scope of a with_linearized() call and can be converted to fragmented consumption of the data at leisure.
This has limited practical value directly, as current uses of managed_bytes are limited to keys (which are limited to 64k). However, it enables converting the atomic_cell layer back to managed_bytes (so we can remove IMR) and the CQL layer to managed_bytes/managed_bytes_view, removing contiguous allocations from the coordinator.
Closes#7820
* github.com:scylladb/scylla:
test: add hashers_test
memtable: fix accounting of managed_bytes in partition_snapshot_accounter
test: add managed_bytes_test
utils: fragment_range: add a fragment iterator for FragmentedView
keys: update comments after changes and remove an unused method
mutation_test: use the correct preferred_max_contiguous_allocation in measuring_allocator
row_cache: more indentation fixes
utils: remove unused linearization facilities in `managed_bytes` class
misc: fix indentation
treewide: remove remaining `with_linearized_managed_bytes` uses
memtable, row_cache: remove `with_linearized_managed_bytes` uses
utils: managed_bytes: remove linearizing accessors
keys, compound: switch from bytes_view to managed_bytes_view
sstables: writer: add write_* helpers for managed_bytes_view
compound_compat: transition legacy_compound_view from bytes_view to managed_bytes_view
types: change equal() to accept managed_bytes_view
types: add parallel interfaces for managed_bytes_view
types: add to_managed_bytes(const sstring&)
serializer_impl: handle managed_bytes without linearizing
utils: managed_bytes: add managed_bytes_view::operator[]
utils: managed_bytes: introduce managed_bytes_view
utils: fragment_range: add serialization helpers for FragmentedMutableView
bytes: implement std::hash using appending_hash
utils: mutable_view: add substr()
utils: fragment_range: add compare_unsigned
utils: managed_bytes: make the constructors from bytes and bytes_view explicit
utils: managed_bytes: introduce with_linearized()
utils: managed_bytes: constrain with_linearized_managed_bytes()
utils: managed_bytes: avoid internal uses of managed_bytes::data()
utils: managed_bytes: extract do_linearize_pure()
thrift: do not depend on implicit conversion of keys to bytes_view
clustering_bounds_comparator: do not depend on implicit conversion of keys to bytes_view
cql3: expression: linearize get_value_from_mutation() eariler
bytes: add to_bytes(bytes)
cql3: expression: mark do_get_value() as static
Replace two methods for unreversal (`as` and `self_or_reversed`) with
a new one (`without_reversed`). More flexible and better named.
Tests: unit (dev)
Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
Closes#7889
When the right-hand side of IN is an unset value, we must report an
error, like Cassandra does.
This fixes testListWithUnsetValues, so re-enable it.
Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
do_get_value() is careful to return a fragmented view, but
its only caller get_value_from_mutation() linearizes it immediately
afterwards. Linearize it sooner; this prevents mixing in
fragmented values from cells (now via IMR) and fragmented values
from partition/clustering keys. It only works now because
keys are not fragmented outside LSA, and value_view has a special
case for single-fragment values.
This helps when keys become fragmented.
Citing #6138: > In the past few years we have converted most of our codebase to
work in terms of fragmented buffers, instead of linearised ones, to help avoid
large allocations that put large pressure on the memory allocator. > One
prominent component that still works exclusively in terms of linearised buffers
is the types hierarchy, more specifically the de/serialization code to/from CQL
format. Note that for most types, this is the same as our internal format,
notable exceptions are non-frozen collections and user types. > > Most types
are expected to contain reasonably small values, but texts, blobs and especially
collections can get very large. Since the entire hierarchy shares a common
interface we can either transition all or none to work with fragmented buffers.
This series gets rid of intermediate linearizations in deserialization. The next
steps are removing linearizations from serialization, validation and comparison
code.
Series summary:
- Fix a bug in `fragmented_temporary_buffer::view::remove_prefix`. (Discovered
while testing. Since it wasn't discovered earlier, I guess it doesn't occur in
any code path in master.)
- Add a `FragmentedView` concept to allow uniform handling of various types of
fragmented buffers (`bytes_view`, `temporary_fragmented_buffer::view`,
`ser::buffer_view` and likely `managed_bytes_view` in the future).
- Implement `FragmentedView` for relevant fragmented buffer types.
- Add helper functions for reading from `FragmentedView`.
- Switch `deserialize()` and all its helpers from `bytes_view` to
`FragmentedView`.
- Remove `with_linearized()` calls which just became unnecessary.
- Add an optimization for single-fragment cases.
The addition of `FragmentedView` might be controversial, because another concept
meant for the same purpose - `FragmentRange` - is already used. Unfortunately,
it lacks the functionality we need. The main (only?) thing we want to do with a
fragmented buffer is to extract a prefix from it and `FragmentRange` gives us no
way to do that, because it's immutable by design. We can work around that by
wrapping it into a mutable view which will track the offset into the immutable
`FragmentRange`, and that's exactly what `linearizing_input_stream` is. But it's
wasteful. `linearizing_input_stream` is a heavy type, unsuitable for passing
around as a view - it stores a pair of fragment iterators, a fragment view and a
size (11 words) to conform to the iterator-based design of `FragmentRange`, when
one fragment iterator (4 words) already contains all needed state, just hidden.
I suggest we replace `FragmentRange` with `FragmentedView` (or something
similar) altogether.
Refs: #6138Closes#7692
* github.com:scylladb/scylla:
types: collection: add an optimization for single-fragment buffers in deserialize
types: add an optimization for single-fragment buffers in deserialize
cql3: tuples: don't linearize in in_value::from_serialized
cql3: expr: expression: replace with_linearize with linearized
cql3: constants: remove unneeded uses of with_linearized
cql3: update_parameters: don't linearize in prefetch_data_builder::add_cell
cql3: lists: remove unneeded use of with_linearized
query-result-set: don't linearize in result_set_builder::deserialize
types: remove unneeded collection deserialization overloads
types: switch collection_type_impl::deserialize from bytes_view to FragmentedView
cql3: sets: don't linearize in value::from_serialized
cql3: lists: don't linearize in value::from_serialized
cql3: maps: don't linearize in value::from_serialized
types: remove unused deserialize_aux
types: deserialize: don't linearize tuple elements
types: deserialize: don't linearize collection elements
types: switch deserialize from bytes_view to FragmentedView
types: deserialize tuple types from FragmentedView
types: deserialize set type from FragmentedView
types: deserialize map type from FragmentedView
types: deserialize list type from FragmentedView
types: add FragmentedView versions of read_collection_size and read_collection_value
types: deserialize varint type from FragmentedView
types: deserialize floating point types from FragmentedView
types: deserialize decimal type from FragmentedView
types: deserialize duration type from FragmentedView
types: deserialize IP address types from FragmentedView
types: deserialize uuid types from FragmentedView
types: deserialize timestamp type from FragmentedView
types: deserialize simple date type from FragmentedView
types: deserialize time type from FragmentedView
types: deserialize boolean type from FragmentedView
types: deserialize integer types from FragmentedView
types: deserialize string types from FragmentedView
types: remove unused read_simple_opt
types: implement read_simple* versions for FragmentedView
utils: fragmented_temporary_buffer: implement FragmentedView for view
utils: fragment_range: add single_fragmented_view
serializer: implement FragmentedView for buffer_view
utils: fragment_range: add linearized and with_linearized for FragmentedView
utils: fragment_range: add FragmentedView
utils: fragmented_temporary_buffer: fix view::remove_prefix
with_linearized creates an additional internal `bytes` when the input is
fragmented. linearized copies the data directly to the output `bytes`, so it's
more efficient.
Clang does not implement P1814R0 (class template argument deduction
for alias templates), so it can't deduce the template arguments
for range_bound, but it can for interval_bound, so switch to that.
Using the modern name rather than the compatibility alias is preferred
anyway.
Closes#7422
In bd6855e, we reverted to Boost ranges and commented out the concept
check. But Boost has its own concept check, which this patch enables.
Tests: unit (dev)
Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
Closes#7471
Replace operator_type with the nicer-behaved oper_t in CQL parser and,
consequently, in the relation hierarchy and column_condition.
After this, no references to operator_type remain in live code.
Signed-off-by: Dejan Mircevski <dejan@scylladb.com>