Commit Graph

114 Commits

Author SHA1 Message Date
Avi Kivity
fcb8d040e8 treewide: use Software Package Data Exchange (SPDX) license identifiers
Instead of lengthy blurbs, switch to single-line, machine-readable
standardized (https://spdx.dev) license identifiers. The Linux kernel
switched long ago, so there is strong precedent.

Three cases are handled: AGPL-only, Apache-only, and dual licensed.
For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0),
reasoning that our changes are extensive enough to apply our license.

The changes we applied mechanically with a script, except to
licenses/README.md.

Closes #9937
2022-01-18 12:15:18 +01:00
Jan Ciolek
e458340821 cql3: Remove term
term isn't used anywhere now. We can remove it and all classes that derive from it.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2021-11-04 15:56:45 +01:00
Jan Ciolek
890c8f4026 cql3: Remove term in operations
Replace term with expression in cql3/operation and its children.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2021-11-04 15:56:45 +01:00
Jan Ciolek
a82351dc79 cql3: expr: Add constant::view() method
Add a method that returns raw_value_view to expr::constant.

It's added for convenience - without it in many places
we would have to write my_value.value.to_view().

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2021-10-28 15:14:52 +02:00
Jan Ciolek
9c40516071 cql3: expr: Add receiver to expr::bind_variable
bind_variable used to have only the type of bound value.
Now this type is replaced with receiver, which describes information about column corresponding to this value.
A receiver contains type, column name, etc.

Receiver is needed in order to implement fill_prepare_context in the next commit.
It's an argument of prepare_context::add_variable_specification.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2021-10-28 15:14:52 +02:00
Jan Ciolek
d61b2dbf8a cql3: Implement term::to_expression for collections
Each collection delayed_value can now be converted to expression.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2021-09-24 11:05:53 +02:00
Jan Ciolek
c40f227c14 cql3: Implement term::to_expression for marker classes
Implement to_expression for non terminals that represent a bind marker.
For now each bind marker has a shape describing where it is used, but hopefully this can be removed in the future.

In order to evaluate a bind_variable we need to know its type.
The type is needed to pass to constant and to validate the value.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2021-09-24 11:05:53 +02:00
Jan Ciolek
f86a1270b0 cql3: Add term::to_expression method
Add a method that converts given term to the matching expression.
It will be used as an intermediate step when implementing evaluate(expression).
evaluate(term) will convert the term to the expression and then call evaluate(expression).

For terminals this is simply calling get() to serialize the value.
For non-terminals the implementation is more complicated and will be implemeted in the following commits.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2021-09-24 11:05:53 +02:00
Jan Ciolek
3d23d6f9dd cql3: term: Remove get_elements and multi_item_terminal from terminals
terminal now isn't used as a final value anywhere.
Remove things that are no longer needed.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2021-09-21 16:33:00 +02:00
Jan Ciolek
2523c9ba48 cql3: Replace most uses of terminal with expr::constant
constant is now ready to replace terminal as a final value representation.
Replace bind() with evaluate and shared_ptr<terminal> with constant.

We can't get rid of terminal yet. Sometimes terminal is converted back
to term, which constant can't do. This won't be a problem once we
replace term with expression.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2021-09-21 16:28:15 +02:00
Jan Ciolek
221ed38e94 cql3: Replace all uses of bind_and_get with evaluate_to_raw_view
Start using evaluate_to_raw_value instead of bind_and_get.
This is a step towards using only evaluate.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2021-09-21 16:20:30 +02:00
Jan Ciolek
adaf6e5eec cql3: expr: Add evaluate_IN_list
A list representing IN values might contain NULLs before evaluation.
We can remove them during evaluation, because nothing equals NULL.
If we don't remove them, there are gonna be errors, because a list can't contain NULLs.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2021-09-21 16:20:29 +02:00
Jan Ciolek
2936adc570 cql3: Move data_type to terminal, make get_value_type non-virtual
Every class now has implementation of get_value_type().
We can simply make base class keep the data_type.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2021-09-21 16:20:28 +02:00
Jan Ciolek
6bf6b03d12 cql3: lists: Implement get_value_type in lists.hh
To convert a terminal to expr::constant we need know the value type.
Implement getting value type for terminals in lists.hh.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2021-09-21 16:13:36 +02:00
Jan Ciolek
a964827696 cql3: expr: Add expr::evaluate
Adds the functions:
constant evaluate(term*, const query_options&);
raw_value_view evaluate(term*, const query_options&);

These functions take a term, bind it and convert the terminal
to constant or raw_value_view.

In the future these functions will take expression instead of term.
For that to happen bind() has to be implemented on expression,
this will be done later.

Also introduces terminal::get_value_type().
In order to construct a constant from terminal we need to know the type.
It will be implemented in the following commits.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2021-09-21 16:13:34 +02:00
Jan Ciolek
561e6b0a59 cql3: Make collection term get() use the internal serialization format
A term should always be serialized using the internal cql serialization format.
A term represents a value received from the driver,
but for every use we are going to need it in the internal serialization format.

Other places in the code already do this, for example see list_prepare_term,
it calls value.bind(query_options::DEFAULT) to evaluate a collection_constructor.
query_options::DEFAULT has the latest cql serialization format.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2021-09-21 16:05:09 +02:00
Avi Kivity
8c0f2f9e3d Revert "Merge 'cql3: Add expr::constant to replace terminal' from Jan Ciołek"
This reverts commit e9343fd382, reversing
changes made to 27138b215b. It causes a
regression in v2 serialization_format support:

collection_serialization_with_protocol_v2_test fails with: marshaling error: read_simple_bytes - not enough bytes (requested 1627390306, got 3)

Fixes #9360
2021-09-20 15:15:09 +03:00
Jan Ciolek
fd98d40b75 cql3: term: Remove get_elements and multi_item_terminal from terminals
terminal now isn't used as a final value anywhere.
Remove things that are no longer needed.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2021-09-13 17:47:17 +02:00
Jan Ciolek
a0ec2113ae cql3: Replace most uses of terminal with expr::constant
constant is now ready to replace terminal as a final value representation.
Replace bind() with evaluate and shared_ptr<terminal> with constant.

We can't get rid of terminal yet. Sometimes terminal is converted back
to term, which constant can't do. This won't be a problem once we
replace term with expression.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2021-09-13 17:47:17 +02:00
Jan Ciolek
c3fb2f2b57 cql3: Replace all uses of bind_and_get with evaluate_to_raw_view
Start using evaluate_to_raw_value instead of bind_and_get.
This is a step towards using only evaluate.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2021-09-13 17:44:06 +02:00
Jan Ciolek
25caa1950d cql3: expr: Add evaluate_IN_list
A list representing IN values might contain NULLs before evaluation.
We can remove them during evaluation, because nothing equals NULL.
If we don't remove them, there are gonna be errors, because a list can't contain NULLs.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2021-09-13 17:03:23 +02:00
Jan Ciolek
9b6b2899ed cql3: Move data_type to terminal, make get_value_type non-virtual
Every class now has implementation of get_value_type().
We can simply make base class keep the data_type.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2021-09-13 17:03:23 +02:00
Jan Ciolek
0b3436598a cql3: lists: Implement get_value_type in lists.hh
To convert a terminal to expr::constant we need know the value type.
Implement getting value type for terminals in lists.hh.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2021-09-13 17:03:23 +02:00
Jan Ciolek
844bf2d472 cql3: expr: Add expr::evaluate
Adds the functions:
constant evaluate(term*, const query_options&);
raw_value_view evaluate(term*, const query_options&);

These functions take a term, bind it and convert the terminal
to constant or raw_value_view.

In the future these functions will take expression instead of term.
For that to happen bind() has to be implemented on expression,
this will be done later.

Also introduces terminal::get_value_type().
In order to construct a constant from terminal we need to know the type.
It will be implemented in the following commits.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2021-09-13 17:03:23 +02:00
Avi Kivity
4defb42c86 cql3: lists, expr: move list raw functions around
Move them closer to prepare related functions for modification.
2021-08-26 15:08:14 +03:00
Avi Kivity
51f62d5953 cql3: constants: extricate cql3::constants::null_literal::null_value from null_literal
null_literal (which is in the term::raw domain) will be converted to an
expression, so unnest the nested class null_value (which is in the term
domain and is not converted now).
2021-08-26 14:44:21 +03:00
Avi Kivity
660be97028 cql3: term::raw, multi_column_raw: unify prepare() signatures
In order to replace the term::raw hierarchy with expressions,
we need to unify the signatures of term::raw::prepare() and
term::multi_column_raw::prepare(). This is because we'll only have
one expression type to represent both single values and tuples
(although, different subexpression types will may used).

The difference in the two prepare() signatures is the
`receiver` parameter - which is a (type, name) pair used
to perfom type inference on the expression being prepared,
with the name used to report errors. In a perfect world, this
would just be an expression - a tuple or a singular expression
as the case requires. But we don't have the needed expression
infrastructure yet - general tuples or name-annotated expressions.

Resolve the problem by introducing a variant for the single-value
and tuple. This is more or less creating a mini-expression type
used just for this. Once our expression type grows the needed
capabilities, it can replace this type.

Note that for some cases, this replaces compile-time checks by
runtime checks (which should never trigger). In other cases
the classes really needed both interfaces, so the new variant
is a better fit.
2021-08-26 14:11:42 +03:00
Pavel Solodovnikov
49ddd269ea cql3: rename variable_specifications to prepare_context
The class is repurposed to be more generic and also be able
to hold additional metadata related to function calls within
a CQL statement. Rename all methods appropriately.

Visitor functions in AST nodes (`collect_marker_specification`)
are also renamed to a more generic `fill_prepare_context`.

The name `prepare_context` designates that this metadata
structure is a byproduct of `stmt::raw::prepare()` call and
is needed only for "prepare" step of query execution.

Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
2021-07-24 14:33:33 +03:00
Avi Kivity
cb8ef1489b cql3: lists: catch polymorphic exceptions by reference
gcc 11 notes that catching polymorphic exceptions is a bad idea;
the resulting copy can slice the exception object. Fix by
capturing by reference.
2021-07-11 17:34:43 +03:00
Avi Kivity
a55b434a2b treewide: extent copyright statements to present day 2021-06-06 19:18:49 +03:00
Michał Chojnowski
dcbc053ecd cql3: change the internal type of lists::value from std::vector to chunked_vector
Lists can grow very big. Let's use a chunked vector to prevent large contiguous
allocations.
2021-05-17 17:09:55 +02:00
Michał Chojnowski
4baeea0199 cql3: in multi_item_terminal, return the vector of items by value
Returning by reference requires that the elements are internally stored in in
the multi_item_terminal as a std::vector, but in the next patch we will change
the internal type of lists::value from std::vector to utils::chunked_vector.

The copy is not a problem because all users of multi_item_terminal were copying
the returned vector.
2021-05-17 16:46:28 +02:00
Konstantin Osipov
c83cf1f965 uuid: switch the API to use std::chrono
A follow up for the patch for #7611. This change was requested
during review and moved out of #7611 to reduce its scope.

The patch switches UUID_gen API from using plain integers to
hold time units to units from std::chrono.

For one, we plan to switch the entire code base to std::chrono units,
to ensure type safety. Secondly, using std::chrono units allows to
increase code reuse with template metaprogramming and remove a few
of UUID_gen functions that beceme redundant as a result.

* switch  get_time_UUID(), unix_timestamp(), get_time_UUID_raw(), switch
  min_time_UUID(), max_time_UUID(), create_time_safe() to
  std::chrono
* remove unused variant of from_unix_timestamp()
* remove unused get_time_UUID_bytes(), create_time_unsafe(),
  redundant get_adjusted_timestamp()
* inline get_raw_UUID_bytes()
* collapse to similar implementations of get_time_UUID()
* switch internal constants to std::chrono
* remove unnecessary unique_ptr from UUID_gen::_instance
Message-Id: <20210406130152.3237914-2-kostja@scylladb.com>
2021-04-06 17:12:54 +03:00
Michał Chojnowski
458878a414 cql3: optimize the deserialization of collections
Before this patch, deserializing a collection from a (prepared) CQL request
involved deserializing every element and serializing it again. Originally this
was a hacky method of validation, and it was also needed to reserialize nested
frozen collections from the CQLv2 format to the CQLv3 format.

But since then we started doing validation separately (before calls to
from_serialized) and CQLv2 became irrelevant, making reserialization of
elements (which, among other things, involves a memory alocation for every
element) pure waste.

This patch adds a faster path for collections in the v3 format, which does not
involve linearizing or reserializing the elements (since v3 is the same as
our internal format).

After this patch, the path from prepared CQL statements to
atomic_cell_or_collection is almost completely linearization-free. The last
remaining place is collection_mutation_description, where map keys are
linearized.
2021-04-01 10:44:21 +02:00
Michał Chojnowski
0bb959e890 cql3: don't linearize elements of lists, tuples, and user types
This patch switches the type used to store collection elements inside the
intermediate form used in lists::value, tuples::value etc. from bytes
to managed_bytes. After this patch, tuple and list elements are only linearized
in from_serialized, which will be corrected soon.
This commit introduces some additional copies in expression.cc, which
will be dealt with in a future commit.
2021-04-01 10:44:21 +02:00
Michał Chojnowski
8927aaf225 cql3: output managed_bytes instead of bytes in get_with_protocol_version 2021-04-01 10:44:21 +02:00
Michał Chojnowski
b9322a6b71 cql3: switch users of cql3::raw_value_view to internals-independent API
We want to change the internals of cql3::raw_value{_view}.
However, users of cql3::raw_value and cql3::raw_value_view often
use them by extracting the internal representation, which will be different
after the planned change.

This commit prepares us for the change by making all accesses to the value
inside cql3::raw_value(_view) be done through helper methods which don't expose
the internal representation publicly.

After this commit we are free to change the internal representation of
raw_value_{view} without messing up their users.
2021-04-01 10:42:04 +02:00
Konstantin Osipov
b4f875f08e uuid: reduce code dependency on UUID_gen.hh
Do not include UUID_gen.hh in trace_state.hh and lists.hh
to reduce header level dependency on it.

Message-Id: <20210127173114.725761-2-kostja@scylladb.com>
2021-01-27 20:08:29 +02:00
Konstantin Osipov
232ce6f611 lists: rewrite list prepend to use append machinery
Rewrite list prepend to use the same machinery
as append, and thus produce correct results when used in LWT.

After this patch, list prepend begins to honor user supplied timestamps.

If a user supplied timestamp for prepend is less than 2010-01-01 00:00:00
an exception is thrown.

Fixes #7611
2021-01-21 13:03:59 +03:00
Konstantin Osipov
2b8ce83eea lists: use query timestamp for list cell values during append
Scylla list cells are represented internally as a map of
timeuuid => value. To append a new value to a list
the coordinator generates a timeuuid reflecting the current time as key
and adds a value to the map using this key.

Before this patch, Scylla always generated a timeuuid for a new
value, even if the query had a user supplied or LWT timestamp.
This could break LWT linearizability. User supplied timestamps were
ignored.

This is reported as https://github.com/scylladb/scylla/issues/7611

A statement which appended multiple values to a list or a BATCH
generated an own microsecond-resolution timeuuid for each value:

BEGIN BATCH
  UPDATE ... SET a = a + [3]
  UPDATE ... SET a = a + [4]
APPLY BATCH

UPDATE ... SET a = a + [3, 4]

To fix the bug, it's necessary to preserve monotonicity of
timeuuids within a batch or multi-value append, but make sure
they all use the microsecond time, as is set by LWT or user.

To explain the fix, it's first necessary to recall the structure
of time-based UUIDs:

60 bits: time since start of GMT epoch, year 1582, represented
         in 100-nanosecond units
4 bits:  version
14 bits: clock sequence, a random number to avoid duplicates
         in case system clock is adjusted
2 bits:  type
48 bits: MAC address (or other hardware address)

The purpose of clockseq bits is as defined in
https://tools.ietf.org/html/rfc4122#section-4.1.5
is to reduce the probability of UUID collision in case clock
goes back in time or node id changes. The implementation should reset it
whenever one of these events may occur.

Since LWT microsecond time is guaranteed to be
unique by Paxos, the RFC provisioning for clockseq and MAC
slots becomes excessive.

The fix thus changes timeuuid slot content in the following way:
- time component now contains the same microsecond time for all
  values of a statement or a batch. The time is unique and monotonic in
  case of LWT. Otherwise it's most always monotonic, but may not be
  unique if two timestamps are created on different coordinators.
- clockseq component is used to store a sequence number which is
  unique and monotonic for all values within the statement/batch.
- to protect against time back-adjustments and duplicates
  if time is auto-generated, MAC component contains a random (spoof)
  MAC address, re-created on each restart. The address is different
  at each shard.

The change is made for all sources of time: user, generated, LWT.
Conditioning the list key generation algorithm on the source of
time would unnecessarily complicate the code while not increase
quality (uniqueness) of created list keys.

Since 14 bits of clockseq provide us with only 16383 distinct slots
per statement or batch, 3 extra bits in nanosecond part of the time
are used to extend the range to 131071 values per statement/batch.
If the rang is exceeded beyond the limit, an exception is produced.

A twist on the use of clockseq to extend timeuuid uniqueness is
that Scylla, like Cassandra, uses int8 compare to compare lower
bits of timeuuid for ordering. The patch takes this into account
and sign-complements the clockseq value to make it monotonic
according to the legacy compare function.

Fixes #7611

test: unit (dev)
2021-01-21 13:03:59 +03:00
Dejan Mircevski
3aa80f47fe abstract_type: Rework unreversal methods
Replace two methods for unreversal (`as` and `self_or_reversed`) with
a new one (`without_reversed`).  More flexible and better named.

Tests: unit (dev)

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>

Closes #7889
2021-01-10 19:30:12 +02:00
Dejan Mircevski
6bb10fcf36 cql3: Fix handling of reverse-order lists
When the clustering order is reversed on a list column, the column type
is reversed_type_impl, not list_type_impl.  Therefore, we have to check
for both reversed type and list type in some places.

This patch handles reverse types in enough places to make
test_clustering_key_reverse_frozen_list pass.  However, it leaves
other places (invocations of is_list() and *_cast<list_type_impl>())
as they currently are; some are protected by callers from being
invoked on reverse types, but some are quite possibly bugs untriggered
by existing tests.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2021-01-07 13:22:20 +02:00
Michał Chojnowski
3f3a10c588 cql3: lists: remove unnecessary use of with_linearized
We can validate directly from fragmented buffers now.
2020-12-11 09:53:07 +01:00
Michał Chojnowski
c43ef3951b cql3: lists: remove unneeded use of with_linearized
We can deserialize directly from fragmented buffers now.
2020-12-04 09:19:39 +01:00
Michał Chojnowski
64e64fd2b3 cql3: lists: don't linearize in value::from_serialized
We can deserialize directly from fragmented buffers now.
2020-12-03 10:57:07 +01:00
Piotr Jastrzebski
49fd17a4ef cql3: Improve error messages for markers binding
When binding prepared statement it is possible that values being binded
are not correct. Unfortunately before this patch, the error message
was only saying what type got a wrong value. This was not very helpful
because there could be multiple columns with the same type in the table.
We also support collections so sometimes error was saying that there
is a wrong value for a type T but the affected column was actually of
type collection<T>.

This patch adds information about a column name that got the wrong
value so that it's easier to find and fix the problem.

Tests: unit(dev)

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <90b70a7e5144d7cb3e53876271187e43fd8eee25.1597832048.git.piotr@scylladb.com>
2020-08-20 12:51:36 +03:00
Rafael Ávila de Espíndola
ad6d65dbbd Everywhere: Explicitly instantiate make_shared
seastar::make_shared has a constructor taking a T&&. There is no such
constructor in std::make_shared:

https://en.cppreference.com/w/cpp/memory/shared_ptr/make_shared

This means that we have to move from

    make_shared(T(...)

to

    make_shared<T>(...)

If we don't want to depend on the idiosyncrasies of
seastar::make_shared.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2020-07-21 10:33:49 -07:00
Calle Wilund
3b74b9585f cql3::lists: Fix setter_by_uuid not handing null value
Fixes #6828

When using the scylla list index from UUID extension,
null values were not handled properly causing throws
from underlying layer.
2020-07-15 13:52:09 +02:00
Pavel Solodovnikov
c4bbeb80db cql3: pass column_specification by ref to cql3::assignment_testable functions
This patch changes the signatures of `test_assignment` and
`test_all` functions to accept `cql3::column_specification` by
const reference instead of shared pointer.

Mostly a cosmetic change reducing overall shared_ptr bloat in
cql3 code.

Tests: unit(dev, debug)

Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Message-Id: <20200529195249.767346-1-pa.solodovnikov@scylladb.com>
2020-05-30 09:49:29 +03:00
Pavel Solodovnikov
f6e765b70f cql3: pass column_specification via lw_shared_ptr
`column_specification` class is marked as "final": it's safe
to use non-polymorphic pointer "lw_shared_ptr" instead of a
more generic "shared_ptr".

tests: unit(dev, debug)

Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Message-Id: <20200427084016.26068-1-pa.solodovnikov@scylladb.com>
2020-04-27 12:47:42 +03:00