Commit Graph

409 Commits

Author SHA1 Message Date
Nadav Har'El
27ab560abd cql: fix hang during certain SELECT statements
The function intersection(r1,r2) in statement_restrictions.cc is used
when several WHERE restrictions were applied to the same column.
For example, for "WHERE b<1 AND b<2" the intersection of the two ranges
is calculated to be b<1.

As noted in issue #18690, Scylla is inconsistent in where it allows or
doesn't allow these intersecting restrictions. But where they are
allowed they must be implemented correctly. And it turns out the
function intersection() had a bug that caused it to sometimes enter
an infinite loop - when the intent was only to call itself once with
swapped parameters.

This patch includes a test reproducing this bug, and a fix for the
bug. The test hangs before the fix, and passes after the fix.

While at it, I carefully reviewed the entire code used to implement
the intersection() function to try to make sure that the bug we found
was the only one. I also added a few more comments where I thought they
were needed to understand complicated logic of the code.

The bug, the fix and the test were originally discovered by
Michał Chojnowski.

Fixes #18688
Refs #18690

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#18694
2024-05-16 11:25:44 +03:00
Nadav Har'El
f604269f0a cql3, secondary index: consistently choose index to use in a query
When a table has secondary indexes on *multiple* columns, and several
such columns are used for filtering in a query, Scylla chooses one
of these indexes as the main driver of the query, and the second
column's restriction is implemented as filtering.

Before this patch, the index to use was chosen fairly randomly, based on
the order of the indexes in the schema. This order may be different in
different coordinators, and may even change across restarts on the same
coordinators. This is not only inconsistent, it can cause outright wrong
results when using *paging* and switching (or restarting) coordinates
in the middle of a paged scan... One coordinator saves one index's key
in the paging state, and then the other coordinator gets this paging
state and wrongly believes it is supposed to be a key of a *different*
index.

The fix in this patch is to pick the index suitable for the first
indexed column mentioned in the query. This has two benefits over
the situation before the patch:

1. The decision of which index to use no longer changes between
   coordinators or across restarts - it just depends on the schema
   and the specific query.

2. Different indexes can have different "specificity" so using one
   or the other can change the query's performance. After this patch,
   the user is in control over which index is used by changing the
   order of terms in the query. A curious user can use tracing to
   check which index was used to implement a particular query.

An xfailing test we had for this issue no longer fails, so the "xfail"
marker is removed.

Fixes #7969

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#14450
2024-05-02 19:52:42 +02:00
Kefu Chai
168ade72f8 treewide: replace formatter<std::string_view> with formatter<string_view>
in in {fmt} before v10, it provides the specialization of `fmt::formatter<..>`
for `std::string_view` as well as the specialization of `fmt::formatter<..>`
for `fmt::string_view` which is an implementation builtin in {fmt} for
compatibility of pre-C++17. and this type is used even if the code is
compiled with C++ stadandard greater or equal to C++17. also, before v10,
the `fmt::formatter<std::string_view>::format()` is defined so it accepts
`std::string_view`. after v10, `fmt::formatter<std::string_view>` still
exists, but it is now defined using `format_as()` machinery, so it's
`format()` method does not actually accept `std::string_view`, it
accepts `fmt::string_view`, as the former can be converted to
`fmt::string_view`.

this is why we can inherit from `fmt::formatter<std::string_view>` and
use `formatter<std::string_view>::format(foo, ctx);` to implement the
`format()` method with {fmt} v9, but we cannot do this with {fmt} v10,
and we would have following compilation failure:

```
FAILED: service/CMakeFiles/service.dir/RelWithDebInfo/topology_state_machine.cc.o
/home/kefu/.local/bin/clang++ -DFMT_DEPRECATED_OSTREAM -DFMT_SHARED -DSCYLLA_BUILD_MODE=release -DSEASTAR_API_LEVEL=7 -DSEASTAR_LOGGER_COMPILE_TIME_FMT -DSEASTAR_LOGGER_TYPE_STDOUT -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_SSTRING -DXXH_PRIVATE_API -DCMAKE_INTDIR=\"RelWithDebInfo\" -I/home/kefu/dev/scylladb -I/home/kefu/dev/scylladb/build/gen -I/home/kefu/dev/scylladb/seastar/include -I/home/kefu/dev/scylladb/build/seastar/gen/include -I/home/kefu/dev/scylladb/build/seastar/gen/src -ffunction-sections -fdata-sections -O3 -g -gz -std=gnu++20 -fvisibility=hidden -Wall -Werror -Wextra -Wno-error=deprecated-declarations -Wimplicit-fallthrough -Wno-c++11-narrowing -Wno-deprecated-copy -Wno-mismatched-tags -Wno-missing-field-initializers -Wno-overloaded-virtual -Wno-unsupported-friend -Wno-enum-constexpr-conversion -Wno-unused-parameter -ffile-prefix-map=/home/kefu/dev/scylladb=. -march=westmere -mllvm -inline-threshold=2500 -fno-slp-vectorize -U_FORTIFY_SOURCE -Werror=unused-result -MD -MT service/CMakeFiles/service.dir/RelWithDebInfo/topology_state_machine.cc.o -MF service/CMakeFiles/service.dir/RelWithDebInfo/topology_state_machine.cc.o.d -o service/CMakeFiles/service.dir/RelWithDebInfo/topology_state_machine.cc.o -c /home/kefu/dev/scylladb/service/topology_state_machine.cc
/home/kefu/dev/scylladb/service/topology_state_machine.cc:254:41: error: no matching member function for call to 'format'
  254 |     return formatter<std::string_view>::format(it->second, ctx);
      |            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~
/usr/include/fmt/core.h:2759:22: note: candidate function template not viable: no known conversion from 'seastar::basic_sstring<char, unsigned int, 15>' to 'const fmt::basic_string_view<char>' for 1st argument
 2759 |   FMT_CONSTEXPR auto format(const T& val, FormatContext& ctx) const
      |                      ^      ~~~~~~~~~~~~
```

because the inherited `format()` method actually comes from
`fmt::formatter<fmt::string_view>`. to reduce the confusion, in this
change, we just inherit from `fmt::format<string_view>`, where
`string_view` is actually `fmt::string_view`. this follows
the document at
https://fmt.dev/latest/api.html#formatting-user-defined-types,
and since there is less indirection under the hood -- we do not
use the specialization created by `FMT_FORMAT_AS` which inherit
from `formatter<fmt::string_view>`, hopefully this can improve
the compilation speed a little bit. also, this change addresses
the build failure with {fmt} v10.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18299
2024-04-19 07:44:07 +03:00
Avi Kivity
51df8b9173 interval: rename nonwrapping_interval to interval
Our interval template started life as `range`, and was supported
wrapping to follow Cassandra's convention of wrapping around the
maximum token.

We later recognized that an interval type should usually be non-wrapping
and split it into wrapping_range and nonwrapping_range, with `range`
aliasing wrapping_range to preserve compatibility.

Even later, we realized the name was already taken by C++ ranges and
so renamed it to `interval`. Given that intervals are usually non-wrapping,
the default `interval` type is non-wrapping.

We can now simplify it further, recognizing that everyone assumes
that an interval is non-wrapping and so doesn't need the
nonwrapping_interval_designation. We just rename nonwrapping_interval
to `interval` and remove the type alias.
2024-02-21 19:43:17 +02:00
Avi Kivity
605bf6e221 range.hh: retire
range.hh was deprecated in bd794629f9 (2020) since its names
conflict with the C++ library concept of an iterator range. The name
::range also mapped to the dangerous wrapping_interval rather than
nonwrapping_interval.

Complete the deprecation by removing range.hh and replacing all the
aliases by the names they point to from the interval library. Note
this now exposes uses of wrapping intervals as they are now explicit.

The unit tests are renamed and range.hh is deleted.

Closes scylladb/scylladb#17428
2024-02-21 00:24:25 +02:00
Kurashkin Nikita
d90eeb5c4f cql3:statement_restrictions.cc: multi-column relation null check
Before this patch we received internal server error
"Attempted to create key component from empty optional" when used null in
multi-column relations.
This patch adds a null check for each element of each tuple in the
expression and generates an invalid request error if it finds such an element.

Modified cassandra test and added a new one that checks the occurrence of null values in tuples.
Added a test that checks whether the wrong number of items is entered in tuples.

Fixes #13217

Closes scylladb/scylladb#16415
2024-01-25 14:17:43 +02:00
Kefu Chai
2dbf044b91 cql3: do not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16791
2024-01-16 16:43:17 +02:00
Kurashkin Nikita
c071cd92b5 cql3:statement_restrictions.cc add more conditions to prevent "allow filtering" error to pop up in delete/update statements
Modified Cassandra tests to check for Scylla's error messages
Fixes #12474

Closes scylladb/scylladb#15811
2023-12-07 21:25:18 +02:00
Kurashkin Nikita
1438e531f8 cql3: statement_restrictions: cartesian product size error message fix.
This commit fixes:
1.The error message will be specific about what type of keys
exceeds the limit (e.g clustering keys or partition keys).
2.Error message will be more general about what causes it, cartesian product
or simple list.
3.Error message will advise to use --max-partition-key-restrictions-per-query
or --max-clustering-key-restrictions-per-query configuration options to
override current (100) limit.

Fixes #15627

Closes scylladb/scylladb#16226
2023-12-05 07:27:03 +02:00
Yaniv Kaul
c658bdb150 Typos: fix typos in comments
Fixes some typos as found by codespell run on the code.
In this commit, I was hoping to fix only comments, not user-visible alerts, output, etc.
Follow-up commits will take care of them.

Refs: https://github.com/scylladb/scylladb/issues/16255
Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>
2023-12-02 22:37:22 +02:00
Benny Halevy
a1acf6854b everywhere: reduce dependencies on i_partitioner.hh
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-11-05 20:47:44 +02:00
Kefu Chai
d43afd576e cql3/restrictions/statement_restrictions: s/allow filtering/ALLOW FILTERING/
use the captalized "ALLOW FILTERING" in the error message, because the
error message is a part of the user interface, it would be better to
keep it aligned with our document, where "ALLOW FILTERING" is used.

so, in this change, the lower-cased "allow filtering" error message is
changed to "ALLOW FILTERING", and the tests are updated accordingly.

see also a0ffbf3291

Refs #14321
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#15718
2023-10-26 10:00:37 +03:00
Pavel Emelyanov
ea9db1b35c Merge 'cql3: expr: remove the default constructor' from Avi Kivity
`expression`'s default constructor is dangerous as an it can leak
into computations and generate surprising results. Fix that by
removing the default constructor.

This is made somewhat difficult by the parser generator's reliance
on default construction, and we need to expand our workaround
(`uninitialized<>`) capabilities to do so.

We also remove some incidental uses of default-constructed expressions.

Closes #14706

* github.com:scylladb/scylladb:
  cql3: expr: make expression non-default-constructible
  cql3: grammar: don't default-construct expressions
  cql3: grammar: improve uninitialized<> flexibility
  cql3: grammar: adjust uninitialized<> wrapper
  test: expr_test: don't invoke expression's default constructor
  cql3: statement_restrictions: explicitly initialize expressions in index match code
  cql3: statement_restrictions: explicitly intitialize some expression fields
  cql3: statement_restrictions: avoid expression's default constructor when classifying restrictions
  cql3: expr: prepare_expression: avoid default-constructed expression
  cql3: broadcast_tables: prepare new_value without relying on expression default constructor
2023-07-19 21:46:03 +03:00
Botond Dénes
81175b5ffc cql3/restrictions/statement_restrictions: fix indentation
Left broken in the previous patch.
2023-07-19 01:28:28 -04:00
Botond Dénes
c7b3faccd2 cql3/restrictions/statement_restrictions: add check_indexes flag
Allowing caller to turn off checking for indexes. Useful if the
restrictions are applied on a pseudo-table, which has no corresponding
table object, and therefore no index manager (or indexes for that
matter).
2023-07-19 01:28:28 -04:00
Avi Kivity
4b6e38e704 cql3: statement_restrictions: explicitly initialize expressions in index match code
The index match code has some default-initialized expressions. These won't
compile when we remove expression's default constructor, so replace them
with the current default value, an empty conjunction.

An empty conjunction doesn't make any special sense here; the code
should be refactored not to rely on this random initial value. But this
is delicate code and the refactoring shouldn't be done in the middle of
an unrelated series.
2023-07-14 16:03:14 +03:00
Avi Kivity
a5921e4923 cql3: statement_restrictions: explicitly intitialize some expression fields
_partition_key_restrictions, _clustering_columns_restrictions, and
_nonprimary_key_restrictions are currently default-initialized. As
we're about to remove expression's default constructor, we need
to initialize them with something.

Use conjunction({}). Not only is this what the default constructor does,
that's what those fields' manipulators assume - they adjust field x
using make_conjunction(y, x). This dates to expression's roots as
a replacement for restrictions.
2023-07-14 15:57:41 +03:00
Avi Kivity
f94eb708e9 cql3: statement_restrictions: avoid expression's default constructor when classifying restrictions
We have some gnarly code that classifies restrictions by the column
they restrict. This uses std::unordered_map::operatorp[], which uses
the value's default constructor. This happens to be "expression", and
as we're about to remove the default constructor, this won't do.

Fix by using try_emplace(), which makes the code nicer and more
efficient. It could be further improved, but it's better to demolish it
instead.
2023-07-14 15:52:03 +03:00
Avi Kivity
1f9a999c26 cql3: statement_restrictions: clean up dead code
We have plenty of code marked with #if 0. Once it was an indication
of missing functionality, but the code has evolved so much it's
useless as an indication and only a distraction.

Delete it.

Closes #14511
2023-07-07 11:08:10 +02:00
Avi Kivity
778ae2b461 cql3: expression: introduce temporaries
Temporaries are similar to bind variables - they are values provided from
outside the expression. While bind variables are provided by the user, temporaries
are generated internally.

The intended use is for aggregate accumulator storage. Currently aggregates
store the accumulator in aggregate_function_selector::_accumulator, which
means the entire selector hierarchy must be cloned for every query. With
expressions, we can have a single expression object reused for many computations,
but we need a way to inject the accumulator into an aggregation, which this
new expression element provides.
2023-07-03 19:45:17 +03:00
Avi Kivity
b858a4669d cql3: expr: break up expression.hh header
Adding a function declaration to expression.hh causes many
recompilations. Reduce that by:

 - moving some restrictions-related definitions to
   the existing expr/restrictions.hh
 - moving evaluation related names to a new header
   expr/evaluate.hh
 - move utilities to a new header
   expr/expr-utilities.hh

expression.hh contains only expression definitions and the most
basic and common helpers, like printing.
2023-06-22 14:21:03 +03:00
Avi Kivity
6c0f8a73c5 cql3: statement_restrictions: deinline
Reduce future header fan-in by deinlining functions. These are
all on the prepare path.
2023-06-22 14:19:43 +03:00
Jan Ciolek
9223cd8c3a statement_restrictions: add get_not_null_columns()
The IS NOT NULL restrictions are handled in a special way.
Instead of putting them together with other restrictions,
statement_restrictions collects all columns restricted
by IS NOT NULL and puts them in the _not_null_columns field.

Add a getter to access this set of columns.
The field is private, so it can't be accessed without
a function that explictily exposes it.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2023-05-17 16:12:10 +02:00
Kefu Chai
0a3a254284 cql3: do not capture reference to temporary value
`data_dictionary::database::find_column_family()` return a temporary value,
and `data_dictionary::table::get_index_manager()` returns a reference in
this temporary value, so we cannot capture this reference and use it after
the expression is evaluated. in this change, we keep the return value
of `find_column_family()` by value, to extend the lifecycle of the return
value of `get_index_manager()`.

this should address the warning from GCC-13, like:

```
/home/kefu/dev/scylladb/cql3/restrictions/statement_restrictions.cc:519:15: error: possibly dangling reference to a temporary [-Werror=dangling-reference]
  519 |         auto& sim = db.find_column_family(_schema).get_index_manager();
      |               ^~~
```

Fixes #13727
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13728
2023-05-01 22:39:48 +03:00
Jan Ciolek
be8ef63bf5 cql3: remove expr::token
Let's remove expr::token and replace all of its functionality with expr::function_call.

expr::token is a struct whose job is to represent a partition key token.
The idea is that when the user types in `token(p1, p2) < 1234`,
this will be internally represented as an expression which uses
expr::token to represent the `token(p1, p2)` part.

The situation with expr::token is a bit complicated.
On one hand side it's supposed to represent the partition token,
but sometimes it's also assumed that it can represent a generic
call to the token() function, for example `token(1, 2, 3)` could
be a function_call, but it could also be expr::token.

The query planning code assumes that each occurence of expr::token
represents the partition token without checking the arguments.
Because of this allowing `token(1, 2, 3)` to be represented
as expr::token is dangerous - the query planning
might think that it is `token(p1, p2, p3)` and plan the query
based on this, which would be wrong.

Currently expr::token is created only in one specific case.
When the parser detects that the user typed in a restriction
which has a call to `token` on the LHS it generates expr::token.
In all other cases it generates an `expr::function_call`.
Even when the `function_call` represents a valid partition token,
it stays a `function_call`. During preparation there is no check
to see if a `function_call` to `token` could be turned into `expr::token`.
This is a bit inconsistent - sometimes `token(p1, p2, p3)` is represented
as `expr::token` and the query planner handles that, but sometimes it might
be represented as `function_call`, which the query planner doesn't handle.

There is also a problem because there's a lot of duplication
between a `function_call` and `expr::token`. All of the evaluation
and preparation is the same for `expr::token` as it's for a `function_call`
to the token function. Currently it's impossible to evaluate `expr::token`
and preparation has some flaws, but implementing it would basically
consist of copy-pasting the corresponding code from token `function_call`.

One more aspect is multi-table queries. With `expr::token` we turn
a call to the `token()` function into a struct that is schema-specific.
What happens when a single expression is used to make queries to multiple
tables? The schema is different, so something that is representad
as `expr::token` for one schema would be represented as `function_call`
in the context of a different schema.
Translating expressions to different tables would require careful
manipulation to convert `expr::token` to `function_call` and vice versa.
This could cause trouble for index queries.

Overall I think it would be best to remove expr::token.

Although having a clear marker for the partition token
is sometimes nice for query planning, in my opinion
the pros are outweighted by the cons.
I'm a big fan of having a single way to represent things,
having two separate representations of the same thing
without clear boundaries between them causes trouble.

Instead of having expr::token and function_call we can
just have the function_call and check if it represents
a partition token when needed.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2023-04-29 13:11:31 +02:00
Jan Ciolek
6e0ae59c5a cql3: keep a schema in visitor for extract_clustering_prefix_restrictions
The schema will be needed once we remove expr::token
and switch to using expr::is_partition_token_for_schema,
which requires a schema arguments.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2023-04-29 13:11:31 +02:00
Jan Ciolek
551135e83f cql3: keep a schema inside the visitor for extract_partition_range
The schema will be needed once we remove expr::token
and switch to using expr::is_partition_token_for_schema,
which requires a schema arguments.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2023-04-29 13:11:30 +02:00
Jan Ciolek
096efc2f38 cql3/expr: split possible_lhs_values into column and token variants
The possible_lhs_values takes an expression and a column
and finds all possible values for the column that make
the expression true.

Apart from finding column values it's also capable of finding
all matching values for the partition key token.
When a nullptr column is passed, possible_lhs_values switches
into token values mode and finds all values for the token.

This interface isn't ideal.
It's confusing to pass a nullptr column when one wants to
find values for the token. It would be better to have a flag,
or just have a separate function.

Additionally in the future expr::token will be removed
and we will use expr::is_partition_token_for_schema
to find all occurences of the partition token.
expr::is_partition_token_for_schema takes a schema
as an argument, which possible_lhs_values doesn't have,
so it would have to be extended to get the schema from
somewhere.

To fix these two problems let's split possible_lhs_values
into two functions - one that finds possible values for a column,
which doesn't require a schema, and one that finds possible values
for the partition token and requires a schema:

value_set possible_column_values(const column_definition* col, const expression& e, const query_options& options);
value_set possible_partition_token_values(const expression& e, const query_options& options, const schema& table_schema);

This will make the interface cleaner and enable smooth transition
once expr::token is removed.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2023-04-29 13:04:53 +02:00
Jan Ciolek
ad5c931102 cql3/expr: add a schema argument to expr::replace_token
Just like has_token, replace_token will use
expr::is_partition_token_for_schema to find all instance
of the partition token to replace.

Let's prepare for this change by adding a schema argument
to the function before making the big change.

It's unsued at the moment, but having a separate commit
should make it easier to review.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2023-04-29 13:04:52 +02:00
Jan Ciolek
18879aad6f cql3/expr: add a schema argument to expr::has_token
In the future expr::token will be removed and checking
whether there is a partition token inside an expression
will be done using expr::is_partition_token_for_schema.

This function takes a schema as an argument,
so all functions that will call it also need
to get the schema from somewhere.

Right now it's an unused argument, but in the future
it will be used. Adding it in a separate commit
makes it easier to review.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2023-04-29 13:04:52 +02:00
Jan Ciolek
90b3b85bd0 cql3: use statement_restrictions::has_token_restrictions() wherever possible
The statement_restrictions class has a method called has_token_restriction().
This method checks whether the partition key restrictions contain expr::token.

Let's use this function in all applicable places instead of manually calling has_token().

In the future has_token() will have an additional schema argument,
so eliminating calls to has_token() will simplify the transition.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2023-04-29 13:04:52 +02:00
Kefu Chai
c37f4e5252 treewide: use fmt::join() when appropriate
now that fmtlib provides fmt::join(). see
https://fmt.dev/latest/api.html#_CPPv4I0EN3fmt4joinE9join_viewIN6detail10iterator_tI5RangeEEN6detail10sentinel_tI5RangeEEERR5Range11string_view
there is not need to revent the wheel. so in this change, the homebrew
join() is replaced with fmt::join().

as fmt::join() returns an join_view(), this could improve the
performance under certain circumstances where the fully materialized
string is not needed.

please note, the goal of this change is to use fmt::join(), and this
change does not intend to improve the performance of existing
implementation based on "operator<<" unless the new implementation is
much more complicated. we will address the unnecessarily materialized
strings in a follow-up commit.

some noteworthy things related to this change:

* unlike the existing `join()`, `fmt::join()` returns a view. so we
  have to materialize the view if what we expect is a `sstring`
* `fmt::format()` does not accept a view, so we cannot pass the
  return value of `fmt::join()` to `fmt::format()`
* fmtlib does not format a typed pointer, i.e., it does not format,
  for instance, a `const std::string*`. but operator<<() always print
  a typed pointer. so if we want to format a typed pointer, we either
  need to cast the pointer to `void*` or use `fmt::ptr()`.
* fmtlib is not able to pick up the overload of
  `operator<<(std::ostream& os, const column_definition* cd)`, so we
  have to use a wrapper class of `maybe_column_definition` for printing
  a pointer to `column_definition`. since the overload is only used
  by the two overloads of
  `statement_restrictions::add_single_column_parition_key_restriction()`,
  the operator<< for `const column_definition*` is dropped.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-16 20:34:18 +08:00
Avi Kivity
6aa91c13c5 Merge 'Optimize topology::compare_endpoints' from Benny Halevy
The code for compare_endpoints originates at the dawn of time (bc034aeaec)
and is called on the fast path from storage_proxy via `sort_by_proximity`.

This series considerably reduces the function's footprint by:
1. carefully coding the many comparisons in the function so to reduce the number of conditional banches (apparently the compiler isn't doing a good enough job at optimizing it in this case)
2. avoid sstring copy in topology::get_{datacenter,rack}

Closes #12761

* github.com:scylladb/scylladb:
  topology: optimize compare_endpoints
  to_string: add print operators for std::{weak,partial}_ordering
  utils: to_sstring: deinline std::strong_ordering print operator
  move to_string.hh to utils/
  test: network_topology: add test_topology_compare_endpoints
2023-03-07 15:17:19 +02:00
Benny Halevy
25ebc63b82 move to_string.hh to utils/
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-02-15 11:09:04 +02:00
Avi Kivity
69a385fd9d Introduce schema/ module
Schema related files are moved there. This excludes schema files that
also interact with mutations, because the mutation module depends on
the schema. Those files will have to go into a separate module.

Closes #12858
2023-02-15 11:01:50 +02:00
Avi Kivity
0b418fa7cf cql3, transport, tests: remove "unset" from value type system
The CQL binary protocol introduced "unset" values in version 4
of the protocol. Unset values can be bound to variables, which
cause certain CQL fragments to be skipped. For example, the
fragment `SET a = :var` will not change the value of `a` if `:var`
is bound to an unset value.

Unsets, however, are very limited in where they can appear. They
can only appear at the top-level of an expression, and any computation
done with them is invalid. For example, `SET list_column = [3, :var]`
is invalid if `:var` is bound to unset.

This causes the code to be littered with checks for unset, and there
are plenty of tests dedicated to catching unsets. However, a simpler
way is possible - prevent the infiltration of unsets at the point of
entry (when evaluating a bind variable expression), and introduce
guards to check for the few cases where unsets are allowed.

This is what this long patch does. It performs the following:

(general)

1. unset is removed from the possible values of cql3::raw_value and
   cql3::raw_value_view.

(external->cql3)

2. query_options is fortified with a vector of booleans,
   unset_bind_variable_vector, where each boolean corresponds to a bind
   variable index and is true when it is unset.
3. To avoid churn, two compatiblity structs are introduced:
   cql3::raw_value{,_view}_vector_with_unset, which can be constructed
   from a std::vector<raw_value{,_view/}>, which is what most callers
   have. They can also be constructed with explicit unset vectors, for
   the few cases they are needed.

(cql3->variables)

4. query_options::get_value_at() now throws if the requested bind variable
   is unset. This replaces all the throwing checks in expression evaluation
   and statement execution, which are removed.
5. A new query_options::is_unset() is added for the users that can tolerate
   unset; though it is not used directly.
6. A new cql3::unset_operation_guard class guards against unsets. It accepts
   an expression, and can be queried whether an unset is present. Two
   conditions are checked: the expression must be a singleton bind
   variable, and at runtime it must be bound to an unset value.
7. The modification_statement operations are split into two, via two
   new subclasses of cql3::operation. cql3::operation_no_unset_support
   ignores unsets completely. cql3::operation_skip_if_unset checks if
   an operand is unset (luckily all operations have at most one operand that
   tolerates unset) and applies unset_operation_guard to it.
8. The various sites that accept expressions or operations are modified
   to check for should_skip_operation(). This are the loops around
   operations in update_statement and delete_statement, and the checks
   for unset in attributes (LIMIT and PER PARTITION LIMIT)

(tests)

9. Many unset tests are removed. It's now impossible to enter an
   unset value into the expression evaluation machinery (there's
   just no unset value), so it's impossible to test for it.
10. Other unset tests now have to be invoked via bind variables,
   since there's no way to create an unset cql3::expr::constant.
11. Many tests have their exception message match strings relaxed.
   Since unsets are now checked very early, we don't know the context
   where they happen. It would be possible to reintroduce it (by adding
   a format string parameter to cql3::unset_operation_guard), but it
   seems not to be worth the effort. Usage of unsets is rare, and it is
   explicit (at least with the Python driver, an unset cannot be
   introduced by ommission).

I tried as an alternative to wrap cql3::raw_value{,_view} (that doesn't
recognize unsets) with cql3::maybe_unset_value (that does), but that
caused huge amounts of churn, so I abandoned that in favor of the
current approach.

Closes #12517
2023-01-16 21:10:56 +02:00
Avi Kivity
ea901fdb9d cql3: expr: fold null into untyped_constant/constant
Our `null` expression, after the prepare stage, is redundant with a
`constant` expression containing the value NULL.

Remove it. Its role in the unprepared stage is taken over by
untyped_constant, which gains a new type_class enumeration to
represent it.

Some subtleties:
 - Usually, handling of null and untyped_constant, or null and constant
   was the same, so they are just folded into each other
 - LWT "like" operator now has to discriminate between a literal
   string and a literal NULL
 - prepare and test_assignment were folded into the corresponing
   untyped_constant functions. Some care had to be taken to preserve
   error messages.

Closes #12118
2022-11-29 11:02:18 +02:00
Petr Gusev
f9936bb0cb cql: DELETE with null value for IN parameter should be forbidden
If a DELETE statement contains an IN operator and the
parameter value for it is NULL, this should also trigger
an error. This is in line with how Cassandra
behaves in this case.
2022-11-23 21:39:23 +04:00
Petr Gusev
c123f94110 cql: add column name to the error message in case of null primary key component
It's more user-friendly and the error message
corresponds to what Cassandra provides in this case.
2022-11-23 21:39:23 +04:00
Petr Gusev
89a5397d7c cql: wrong column name in error messages
If INSERT or DELETE statement is missing a non-last element of
the primary key, the error message generated contains
an invalid column name.

The problem occurs if the query contains a column with the list type,
otherwise
statement_restrictions::process_clustering_columns_restrictions
checks that all the components of the key are specified.

Fixes: #12046
2022-11-23 21:39:16 +04:00
Jan Ciolek
52bbc1065c cql3: allow lists of IN elements to be NULL
Requests like `col IN NULL` used to cause
an error - Invalid null value for colum col.

We would like to allow NULLs everywhere.
When a NULL occurs on either side
of a binary operator, the whole operation
should just evaluate to NULL.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>

Closes #11775
2022-10-13 15:11:32 +02:00
Michał Radwański
2babee2cdc column_computation.hh, schema.cc: compute_value interface refactor
The compute_value function of column_computation has had previously the
following signature:
    virtual bytes_opt compute_value(const schema& schema, const partition_key& key, const clustering_row& row) const override;

This is superfluous, since never in the history of Scylla, the last
parameter (row) was used in any implentation, and never did it happen
that it returned bytes_opt. The absurdity of this interface can be seen
especially when looking at call sites like following, where dummy empty
row was created:
```
    token_column.get_computation().compute_value(
            *_schema, pkv_linearized, clustering_row(clustering_key_prefix::make_empty()));
```
2022-08-14 10:29:13 +03:00
Avi Kivity
a5dd588465 cql3: statement_restrictions: accept a single expression rather than a vector
Move closer to the goal of accepting a generic expression for WHERE
clause by accepting a generic expression in statement_restrictions. The
various callers will synthesize it from a vector of terms.
2022-07-22 20:14:48 +03:00
Avi Kivity
43aca25496 cql3: statement_restrictions: merge if and for
A `for` loop does nothing on an empty container, so no need for an
extra `if` for that condition. Drop the `if`.
2022-07-22 20:14:48 +03:00
Jan Ciolek
599bcd6ea7 cql3: Remove all remaining restrictions code
The classes restriction, restrictions and its children
aren't used anywhere now and can be safely removed.

Some includes need to be modified for the code to compile.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2022-07-20 09:10:31 +02:00
Jan Ciolek
b269e5a24d cql3: Remove initial_key_restrictions
initial_key restrictions was a class used by statement_restrictions
to represent empty restrictions of different types and simplify
restriction merging logic. They are not used anymore and can
be removed.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2022-07-20 09:10:31 +02:00
Jan Ciolek
d7e954307f cql3: Remove _new from _new_nonprimary_key_restrictions
The _new prefix was used to distinguish the new field
from the old represenation.

Now the new field has fully replaced the old one
and _new can be removed from its name.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2022-07-20 09:10:31 +02:00
Jan Ciolek
b6ae72f095 cql3: Remove _nonprimary_key_restrictions field
All code that made use of _nonprimary_key_restrictions
has been modified to use _new_nonprimary_key_restrictions
instead.

The field can be removed.

Additionally the old code responsible for adding new restrictions
can be fully removed, everything is now done using add_restriction.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2022-07-20 09:10:31 +02:00
Jan Ciolek
9d1ba07471 cql3: Reimplement uses of _nonprimary_key_restrictions using expression
All parts of the code that use _nonprimary_key_restrictions
are changed to use _new_nonprimary_key_restrictions instead.
I decided not to split this into multiple commits,
as there isn't a lot of changes and they are
analogous to the ones done before for partition
and clustering columns.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2022-07-20 09:10:30 +02:00
Jan Ciolek
2c28554390 cql3: Keep a map of single column nonprimary key restrictions
Keep a map of extracted restrictions for each restricted nonprimar column.
This map will be useful, just like the ones for clustering and partition
columns.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2022-07-20 09:10:30 +02:00