Commit Graph

26 Commits

Author SHA1 Message Date
Dejan Mircevski
8db24fc03b cql3/expr: Handle IN ? bound to null
Previously, we crashed when the IN marker is bound to null.  Throw
invalid_request_exception instead.

Fixes #8265

Tests: unit (dev)

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>

Closes #8287
2021-03-17 09:59:22 +02:00
Dejan Mircevski
992d5c6184 cql3/expr: Improve column printing
Before this change, we would print an expression like this:

((ColumnDefinition{name=c, type=org.apache.cassandra.db.marshal.Int32Type, kind=CLUSTERING_COLUMN, componentIndex=0, droppedAt=-9223372036854775808}) = 0000007b)

Now, we print the same expression like this:

(c = 0000007b)

Tests: unit (dev)

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>

Closes #8285
2021-03-17 09:59:22 +02:00
Dejan Mircevski
8dac132581 cql3/expr: Add is_multi_column()
It will come in handy when we start using expressions to calculate the
clustering slice.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2021-03-10 21:25:43 -05:00
Dejan Mircevski
1f591bd16e cql3/expr: Add more operators to needs_filtering
Omitting these operators didn't cause bugs, because needs_filtering()
is never invoked on them.  But that will likely change in the future,
so add them now to prevent problems down the road.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2021-03-10 21:25:43 -05:00
Dejan Mircevski
c0c93982d0 cql3: Replace CK-bound mode with comparison_order
Instead of defining this enum in multi_column_restriction::slice, put
it in the expr namespace and add it to binary_operator.  We will need
it when we switch bounds calculation from multi_column_restriction to
expr classes.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2021-03-10 21:25:43 -05:00
Dejan Mircevski
7dfe471b5a cql3/expr: Make to_range globally visible
It will be used in statement_restrictions for calculating clustering
bounds.  And it will come in handy elsewhere in the future, I'm sure.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2021-03-10 21:25:43 -05:00
Botond Dénes
ba7a9d2ac3 imr: switch back to open-coded description of structures
Commit aab6b0ee27 introduced the
controversial new IMR format, which relied on a very template-heavy
infrastructure to generate serialization and deserialization code via
template meta-programming. The promise was that this new format, beyond
solving the problems the previous open-coded representation had (working
on linearized buffers), will speed up migrating other components to this
IMR format, as the IMR infrastructure reduces code bloat, makes the code
more readable via declarative type descriptions as well as safer.
However, the results were almost the opposite. The template
meta-programming used by the IMR infrastructure proved very hard to
understand. Developers don't want to read or modify it. Maintainers
don't want to see it being used anywhere else. In short, nobody wants to
touch it.

This commit does a conceptual revert of
aab6b0ee27. A verbatim revert is not
possible because related code evolved a lot since the merge. Also, going
back to the previous code would mean we regress as we'd revert the move
to fragmented buffers. So this revert is only conceptual, it changes the
underlying infrastructure back to the previous open-coded one, but keeps
the fragmented buffers, as well as the interface of the related
components (to the extent possible).

Fixes: #5578
2021-02-16 23:43:07 +01:00
Avi Kivity
60f5ec3644 Merge 'managed_bytes: switch to explicit linearization' from Michał Chojnowski
This is a revival of #7490.

Quoting #7490:

The managed_bytes class now uses implicit linearization: outside LSA, data is never fragmented, and within LSA, data is linearized on-demand, as long as the code is running within with_linearized_managed_bytes() scope.

We would like to stop linearizing managed_bytes and keep it fragmented at all times, since linearization can require large contiguous chunks. Large contiguous allocations are hard to satisfy and cause latency spikes.

As a first step towards that, we remove all implicitly linearizing accessors and replace them with an explicit linearization accessor, with_linearized().

Some of the linearization happens long before use, by creating a bytes_view of the managed_bytes object and passing it onwards, perhaps storing it for later use. This does not work with with_linearized(), which creates a temporary linearized view, and does not work towards the longer term goal of never linearizing. As a substitute a managed_bytes_view class is introduced that acts as a view for managed_bytes (for interoperability it can also be a view for bytes and is compatible with bytes_view).

By the end of the series, all linearizations are temporary, within the scope of a with_linearized() call and can be converted to fragmented consumption of the data at leisure.

This has limited practical value directly, as current uses of managed_bytes are limited to keys (which are limited to 64k). However, it enables converting the atomic_cell layer back to managed_bytes (so we can remove IMR) and the CQL layer to managed_bytes/managed_bytes_view, removing contiguous allocations from the coordinator.

Closes #7820

* github.com:scylladb/scylla:
  test: add hashers_test
  memtable: fix accounting of managed_bytes in partition_snapshot_accounter
  test: add managed_bytes_test
  utils: fragment_range: add a fragment iterator for FragmentedView
  keys: update comments after changes and remove an unused method
  mutation_test: use the correct preferred_max_contiguous_allocation in measuring_allocator
  row_cache: more indentation fixes
  utils: remove unused linearization facilities in `managed_bytes` class
  misc: fix indentation
  treewide: remove remaining `with_linearized_managed_bytes` uses
  memtable, row_cache: remove `with_linearized_managed_bytes` uses
  utils: managed_bytes: remove linearizing accessors
  keys, compound: switch from bytes_view to managed_bytes_view
  sstables: writer: add write_* helpers for managed_bytes_view
  compound_compat: transition legacy_compound_view from bytes_view to managed_bytes_view
  types: change equal() to accept managed_bytes_view
  types: add parallel interfaces for managed_bytes_view
  types: add to_managed_bytes(const sstring&)
  serializer_impl: handle managed_bytes without linearizing
  utils: managed_bytes: add managed_bytes_view::operator[]
  utils: managed_bytes: introduce managed_bytes_view
  utils: fragment_range: add serialization helpers for FragmentedMutableView
  bytes: implement std::hash using appending_hash
  utils: mutable_view: add substr()
  utils: fragment_range: add compare_unsigned
  utils: managed_bytes: make the constructors from bytes and bytes_view explicit
  utils: managed_bytes: introduce with_linearized()
  utils: managed_bytes: constrain with_linearized_managed_bytes()
  utils: managed_bytes: avoid internal uses of managed_bytes::data()
  utils: managed_bytes: extract do_linearize_pure()
  thrift: do not depend on implicit conversion of keys to bytes_view
  clustering_bounds_comparator: do not depend on implicit conversion of keys to bytes_view
  cql3: expression: linearize get_value_from_mutation() eariler
  bytes: add to_bytes(bytes)
  cql3: expression: mark do_get_value() as static
2021-01-18 11:01:28 +02:00
Dejan Mircevski
3aa80f47fe abstract_type: Rework unreversal methods
Replace two methods for unreversal (`as` and `self_or_reversed`) with
a new one (`without_reversed`).  More flexible and better named.

Tests: unit (dev)

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>

Closes #7889
2021-01-10 19:30:12 +02:00
Dejan Mircevski
4515a49d4d cql3: Fix IN ? for unset values
When the right-hand side of IN is an unset value, we must report an
error, like Cassandra does.

This fixes testListWithUnsetValues, so re-enable it.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2021-01-07 13:22:20 +02:00
Avi Kivity
1dd6d7029a cql3: expression: linearize get_value_from_mutation() eariler
do_get_value() is careful to return a fragmented view, but
its only caller get_value_from_mutation() linearizes it immediately
afterwards. Linearize it sooner; this prevents mixing in
fragmented values from cells (now via IMR) and fragmented values
from partition/clustering keys. It only works now because
keys are not fragmented outside LSA, and value_view has a special
case for single-fragment values.

This helps when keys become fragmented.
2020-12-20 15:14:44 +01:00
Avi Kivity
28126257c2 cql3: expression: mark do_get_value() as static
It is used only later in this file.
2020-12-20 15:14:44 +01:00
Piotr Sarna
2015988373 Merge 'types: get rid of linearization in deserialize()' from Michał Chojnowski
Citing #6138: > In the past few years we have converted most of our codebase to
work in terms of fragmented buffers, instead of linearised ones, to help avoid
large allocations that put large pressure on the memory allocator.  > One
prominent component that still works exclusively in terms of linearised buffers
is the types hierarchy, more specifically the de/serialization code to/from CQL
format. Note that for most types, this is the same as our internal format,
notable exceptions are non-frozen collections and user types.  > > Most types
are expected to contain reasonably small values, but texts, blobs and especially
collections can get very large. Since the entire hierarchy shares a common
interface we can either transition all or none to work with fragmented buffers.

This series gets rid of intermediate linearizations in deserialization. The next
steps are removing linearizations from serialization, validation and comparison
code.

Series summary:
- Fix a bug in `fragmented_temporary_buffer::view::remove_prefix`. (Discovered
  while testing. Since it wasn't discovered earlier, I guess it doesn't occur in
  any code path in master.)
- Add a `FragmentedView` concept to allow uniform handling of various types of
  fragmented buffers (`bytes_view`, `temporary_fragmented_buffer::view`,
  `ser::buffer_view` and likely `managed_bytes_view` in the future).
- Implement `FragmentedView` for relevant fragmented buffer types.
- Add helper functions for reading from `FragmentedView`.
- Switch `deserialize()` and all its helpers from `bytes_view` to
  `FragmentedView`.
- Remove `with_linearized()` calls which just became unnecessary.
- Add an optimization for single-fragment cases.

The addition of `FragmentedView` might be controversial, because another concept
meant for the same purpose - `FragmentRange` - is already used. Unfortunately,
it lacks the functionality we need. The main (only?) thing we want to do with a
fragmented buffer is to extract a prefix from it and `FragmentRange` gives us no
way to do that, because it's immutable by design. We can work around that by
wrapping it into a mutable view which will track the offset into the immutable
`FragmentRange`, and that's exactly what `linearizing_input_stream` is. But it's
wasteful. `linearizing_input_stream` is a heavy type, unsuitable for passing
around as a view - it stores a pair of fragment iterators, a fragment view and a
size (11 words) to conform to the iterator-based design of `FragmentRange`, when
one fragment iterator (4 words) already contains all needed state, just hidden.
I suggest we replace `FragmentRange` with `FragmentedView` (or something
similar) altogether.

Refs: #6138

Closes #7692

* github.com:scylladb/scylla:
  types: collection: add an optimization for single-fragment buffers in deserialize
  types: add an optimization for single-fragment buffers in deserialize
  cql3: tuples: don't linearize in in_value::from_serialized
  cql3: expr: expression: replace with_linearize with linearized
  cql3: constants: remove unneeded uses of with_linearized
  cql3: update_parameters: don't linearize in prefetch_data_builder::add_cell
  cql3: lists: remove unneeded use of with_linearized
  query-result-set: don't linearize in result_set_builder::deserialize
  types: remove unneeded collection deserialization overloads
  types: switch collection_type_impl::deserialize from bytes_view to FragmentedView
  cql3: sets: don't linearize in value::from_serialized
  cql3: lists: don't linearize in value::from_serialized
  cql3: maps: don't linearize in value::from_serialized
  types: remove unused deserialize_aux
  types: deserialize: don't linearize tuple elements
  types: deserialize: don't linearize collection elements
  types: switch deserialize from bytes_view to FragmentedView
  types: deserialize tuple types from FragmentedView
  types: deserialize set type from FragmentedView
  types: deserialize map type from FragmentedView
  types: deserialize list type from FragmentedView
  types: add FragmentedView versions of read_collection_size and read_collection_value
  types: deserialize varint type from FragmentedView
  types: deserialize floating point types from FragmentedView
  types: deserialize decimal type from FragmentedView
  types: deserialize duration type from FragmentedView
  types: deserialize IP address types from FragmentedView
  types: deserialize uuid types from FragmentedView
  types: deserialize timestamp type from FragmentedView
  types: deserialize simple date type from FragmentedView
  types: deserialize time type from FragmentedView
  types: deserialize boolean type from FragmentedView
  types: deserialize integer types from FragmentedView
  types: deserialize string types from FragmentedView
  types: remove unused read_simple_opt
  types: implement read_simple* versions for FragmentedView
  utils: fragmented_temporary_buffer: implement FragmentedView for view
  utils: fragment_range: add single_fragmented_view
  serializer: implement FragmentedView for buffer_view
  utils: fragment_range: add linearized and with_linearized for FragmentedView
  utils: fragment_range: add FragmentedView
  utils: fragmented_temporary_buffer: fix view::remove_prefix
2020-12-04 09:46:20 +01:00
Michał Chojnowski
68177a6721 cql3: expr: expression: replace with_linearize with linearized
with_linearized creates an additional internal `bytes` when the input is
fragmented. linearized copies the data directly to the output `bytes`, so it's
more efficient.
2020-12-04 09:19:39 +01:00
Dejan Mircevski
7f8ed811c1 cql3/expr: Clarify multi-column doesn't use indexing
Although not currently used, the old code was wrong and confusing to
readers.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2020-11-25 10:59:13 -05:00
Avi Kivity
6be9f49380 cql3: expression: switch from range_bound to interval_bound to avoid clang class template argument deduction woes
Clang does not implement P1814R0 (class template argument deduction
for alias templates), so it can't deduce the template arguments
for range_bound, but it can for interval_bound, so switch to that.
Using the modern name rather than the compatibility alias is preferred
anyway.

Closes #7422
2020-11-01 13:19:44 +02:00
Dejan Mircevski
40adf38915 cql3/expr: Use Boost concept assert
In bd6855e, we reverted to Boost ranges and commented out the concept
check.  But Boost has its own concept check, which this patch enables.

Tests: unit (dev)

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>

Closes #7471
2020-10-22 17:24:49 +03:00
Avi Kivity
bd6855ed62 cql3: expression: drop <ranges>
Clang has trouble with some parts of <ranges>. Replace with
boost range adaptors for now.
2020-10-19 10:23:30 +03:00
Dejan Mircevski
df3ea2443b cql3: Drop all uses_function methods
No one seems to call them except for other uses_function methods.

Tests: unit (dev)

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2020-09-04 17:27:30 +02:00
Dejan Mircevski
cbf8186a12 cql3/expr: Drop make_column_op()
Instantiating binary_operator directly is more readable.

Tests: unit (dev)

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2020-08-25 11:10:36 +03:00
Dejan Mircevski
fb6c011b52 everywhere: Insert space after switch
Quoth @avikivity: "switch is not a function, and we celebrate that by
putting a space after it like other control-flow keywords."

https://github.com/scylladb/scylla/pull/7052#discussion_r471932710

Tests: unit (dev)

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2020-08-18 14:31:04 +03:00
Dejan Mircevski
1aa326c93b cql3: Drop operator_type entirely
Since no live code uses it anymore, it can be safely removed.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2020-08-18 12:27:01 +02:00
Dejan Mircevski
d97605f4f8 cql3: Drop operator_type from the parser
Replace operator_type with the nicer-behaved oper_t in CQL parser and,
consequently, in the relation hierarchy and column_condition.

After this, no references to operator_type remain in live code.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2020-08-18 12:27:00 +02:00
Dejan Mircevski
71c921111d cql3/expr: Replace operator_type with an enum
operator_type is awkward because it's not copyable or assignable.
Replace it in expression representation with a new enum class, oper_t.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2020-08-18 12:27:00 +02:00
Dejan Mircevski
8cae61ee6b cql3: Move #include from .hh to .cc
restrictions.hh included fmt/ostream.h, which is expensive due to its
transitive #includes.  Replace it with fmt/core.h, which transitively
includes only standard C++ headers.

As requested by #5763 feedback:
https://github.com/scylladb/scylla/pull/5763#discussion_r443210634

Tests: unit (dev)

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2020-08-08 21:37:08 +03:00
Dejan Mircevski
df20854963 cql3: Move expressions to their own namespace
Move the classes representing CQL expressions (and utility functions
on them) from the `restrictions` namespace to a new namespace `expr`.

Most of the restriction.hh content was moved verbatim to
expression.hh.  Similarly, all expression-related code was moved from
statement_restrictions.cc verbatim to expression.cc.

As suggested in #5763 feedback
https://github.com/scylladb/scylla/pull/5763#discussion_r443210498

Tests: dev (unit)

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2020-08-08 21:03:26 +03:00