This commit eliminates unused boost header includes from the tree.
Removing these unnecessary includes reduces dependencies on the
external Boost.Adapters library, leading to faster compile times
and a slightly cleaner codebase.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#22997
assert() is traditionally disabled in release builds, but not in
scylladb. This hasn't caused problems so far, but the latest abseil
release includes a commit [1] that causes a 1000 insn/op regression when
NDEBUG is not defined.
Clearly, we must move towards a build system where NDEBUG is defined in
release builds. But we can't just define it blindly without vetting
all the assert() calls, as some were written with the expectation that
they are enabled in release mode.
To solve the conundrum, change all assert() calls to a new SCYLLA_ASSERT()
macro in utils/assert.hh. This macro is always defined and is not conditional
on NDEBUG, so we can later (after vetting Seastar) enable NDEBUG in release
mode.
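A minimal sketch of an assertion macro that ignores NDEBUG is shown here. The names ALWAYS_ASSERT, always_assert_fail and checked_increment are hypothetical and used only for illustration; the real SCYLLA_ASSERT() in utils/assert.hh may be defined differently.

```cpp
#include <cstdio>
#include <cstdlib>

// Hypothetical helper: report the failed condition and abort.
[[noreturn]] inline void always_assert_fail(const char* expr, const char* file, int line) {
    std::fprintf(stderr, "%s:%d: assertion failed: %s\n", file, line, expr);
    std::abort();
}

// Unlike assert(), this macro does not look at NDEBUG, so the condition
// is evaluated and checked in every build mode.
#define ALWAYS_ASSERT(cond) \
    do { \
        if (!(cond)) { \
            always_assert_fail(#cond, __FILE__, __LINE__); \
        } \
    } while (0)

// Example: the side effect inside the condition happens regardless of NDEBUG.
inline int checked_increment(int n) {
    ALWAYS_ASSERT(++n > 0);
    return n;
}
```

With plain assert() and NDEBUG defined, the increment inside the condition would be compiled out. That is the kind of call that has to be vetted before NDEBUG can be enabled.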
[1] 66ef711d68
Closes scylladb/scylladb#20006
Adding a function declaration to expression.hh causes many
recompilations. Reduce that by:
- moving some restrictions-related definitions to
the existing expr/restrictions.hh
- moving evaluation-related names to a new header
expr/evaluate.hh
- moving utilities to a new header
expr/expr-utilities.hh
expression.hh contains only expression definitions and the most
basic and common helpers, like printing.
When we start allowing NULL in lists in some contexts, the exact
location where an error is raised (when it's disallowed) will
change. To prepare for that, relax the exception check to just
ensure the word NULL is there, without caring about the exact
wording.
Lists allow NULL in some contexts (bind variables for LWT "IN ?"
conditions), but not in most others. Currently, the implementation
just disallows NULLs in list values, and the cases where it is allowed
are hacked around. To reduce the special cases, we'll allow lists
to have NULLs, and just restrict them for storage. This is similar
to how scalar values can be NULL, but not when they are part of a
partition key.
To prepare for the transition, identify the locations where lists
(and sets, which share the same storage) are stored as frozen
values and add a NULL check there. Non-frozen lists already have the
check. Since sets share the same format as lists, apply the same to
them.
No actual checks are done yet, since NULLs are impossible. This
is just a stub.
The CQL binary protocol introduced "unset" values in version 4
of the protocol. Unset values can be bound to variables, which
cause certain CQL fragments to be skipped. For example, the
fragment `SET a = :var` will not change the value of `a` if `:var`
is bound to an unset value.
Unsets, however, are very limited in where they can appear. They
can only appear at the top-level of an expression, and any computation
done with them is invalid. For example, `SET list_column = [3, :var]`
is invalid if `:var` is bound to unset.
This causes the code to be littered with checks for unset, and there
are plenty of tests dedicated to catching unsets. However, a simpler
way is possible - prevent the infiltration of unsets at the point of
entry (when evaluating a bind variable expression), and introduce
guards to check for the few cases where unsets are allowed.
This is what this long patch does. It performs the following:
(general)
1. unset is removed from the possible values of cql3::raw_value and
cql3::raw_value_view.
(external->cql3)
2. query_options is fortified with a vector of booleans,
unset_bind_variable_vector, where each boolean corresponds to a bind
variable index and is true when it is unset.
3. To avoid churn, two compatibility structs are introduced:
cql3::raw_value{,_view}_vector_with_unset, which can be constructed
from a std::vector<raw_value{,_view}>, which is what most callers
have. They can also be constructed with explicit unset vectors, for
the few cases they are needed.
(cql3->variables)
4. query_options::get_value_at() now throws if the requested bind variable
is unset. This replaces all the throwing checks in expression evaluation
and statement execution, which are removed.
5. A new query_options::is_unset() is added for the users that can tolerate
unset; though it is not used directly.
6. A new cql3::unset_operation_guard class guards against unsets. It accepts
an expression, and can be queried whether an unset is present. Two
conditions are checked: the expression must be a singleton bind
variable, and at runtime it must be bound to an unset value.
7. The modification_statement operations are split into two, via two
new subclasses of cql3::operation. cql3::operation_no_unset_support
ignores unsets completely. cql3::operation_skip_if_unset checks if
an operand is unset (luckily all operations have at most one operand that
tolerates unset) and applies unset_operation_guard to it.
8. The various sites that accept expressions or operations are modified
to check for should_skip_operation(). These are the loops around
operations in update_statement and delete_statement, and the checks
for unset in attributes (LIMIT and PER PARTITION LIMIT).
(tests)
9. Many unset tests are removed. It's now impossible to enter an
unset value into the expression evaluation machinery (there's
just no unset value), so it's impossible to test for it.
10. Other unset tests now have to be invoked via bind variables,
since there's no way to create an unset cql3::expr::constant.
11. Many tests have their exception message match strings relaxed.
Since unsets are now checked very early, we don't know the context
where they happen. It would be possible to reintroduce it (by adding
a format string parameter to cql3::unset_operation_guard), but it
seems not to be worth the effort. Usage of unsets is rare, and it is
explicit (at least with the Python driver, an unset cannot be
introduced by omission).
I tried as an alternative to wrap cql3::raw_value{,_view} (that doesn't
recognize unsets) with cql3::maybe_unset_value (that does), but that
caused huge amounts of churn, so I abandoned that in favor of the
current approach.
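The approach can be sketched roughly as follows. This is a simplified illustration, not the actual ScyllaDB code: sketch_value, sketch_options, sketch_unset_guard and apply_set are hypothetical stand-ins for cql3::raw_value, query_options, cql3::unset_operation_guard and a `SET a = :var` operation.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Stand-in for a raw value: nullopt is NULL, and there is no "unset" state.
using sketch_value = std::optional<std::string>;

// Stand-in for query_options: unset-ness lives in a side vector.
class sketch_options {
    std::vector<sketch_value> _values;
    std::vector<bool> _unset; // true where the bind variable is unset
public:
    sketch_options(std::vector<sketch_value> values, std::vector<bool> unset)
        : _values(std::move(values)), _unset(std::move(unset)) {
        assert(_values.size() == _unset.size());
    }
    bool is_unset(std::size_t i) const { return _unset.at(i); }
    // Throws for unset variables, so evaluation code never sees them.
    const sketch_value& get_value_at(std::size_t i) const {
        if (_unset.at(i)) {
            throw std::invalid_argument("unset value for bind variable " + std::to_string(i));
        }
        return _values.at(i);
    }
};

// Guard for an operand: it reports unset only when the operand is a lone
// bind variable (index present) that is bound to unset at run time.
class sketch_unset_guard {
    std::optional<std::size_t> _bind_index;
public:
    explicit sketch_unset_guard(std::optional<std::size_t> bind_index)
        : _bind_index(bind_index) {}
    bool is_unset(const sketch_options& opts) const {
        return _bind_index && opts.is_unset(*_bind_index);
    }
};

// Applies "SET a = <bind variable>"; returns false when the operation is skipped.
inline bool apply_set(sketch_value& column, const sketch_unset_guard& guard,
                      std::size_t operand_index, const sketch_options& opts) {
    if (guard.is_unset(opts)) {
        return false; // unset: leave the column untouched
    }
    column = opts.get_value_at(operand_index);
    return true;
}
```

In this sketch only callers that consult the guard can tolerate an unset; every other path goes through get_value_at() and fails early.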
Closes #12517
An expr::constant is an expression that happens to represent a constant,
so it's too heavyweight to be used for evaluation. Right now the extra
weight is just a type (which causes extra work by having to maintain
the shared_ptr reference count), but it will grow in the future to include
source location (for error reporting) and maybe other things.
Prior to e9b6171b5 ("Merge 'cql3: expr: unify left-hand-side and
right-hand-side of binary_operator prepares' from Avi Kivity"), we had
to use expr::constant since there was not enough type information in
expressions. But now every expression carries its type (in programming
language terms, expressions are now statically typed), so carrying types
in values is not needed.
So change evaluate() to return cql3::raw_value. The majority of the
patch just changes that. The rest deals with some fallout:
- cql3::raw_value gains a view() helper to convert to a raw_value_view,
and is_null_or_unset() to match with expr::constant and reduce further
churn.
- some helpers that worked on expr::constant and now receive a
raw_value now need the type passed via an additional argument. The
type is computed from the expression by the caller.
- many type checks during expression evaluation were dropped. This is
a consequence of static typing - we must trust the expression prepare
phase to perform full type checking since values no longer carry type
information.
Closes #10797
Instead of lengthy blurbs, switch to single-line, machine-readable
standardized (https://spdx.dev) license identifiers. The Linux kernel
switched long ago, so there is strong precedent.
Three cases are handled: AGPL-only, Apache-only, and dual licensed.
For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0),
reasoning that our changes are extensive enough to apply our license.
The changes were applied mechanically with a script, except to
licenses/README.md.
Closes #9937
Add a method that returns raw_value_view to expr::constant.
It's added for convenience - without it in many places
we would have to write my_value.value.to_view().
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
bind_variable used to have only the type of bound value.
Now this type is replaced with a receiver, which describes the column corresponding to this value.
A receiver contains type, column name, etc.
Receiver is needed in order to implement fill_prepare_context in the next commit.
It's an argument of prepare_context::add_variable_specification.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Implement to_expression for non terminals that represent a bind marker.
For now each bind marker has a shape describing where it is used, but hopefully this can be removed in the future.
In order to evaluate a bind_variable we need to know its type.
The type is needed to pass to constant and to validate the value.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Add a method that converts given term to the matching expression.
It will be used as an intermediate step when implementing evaluate(expression).
evaluate(term) will convert the term to the expression and then call evaluate(expression).
For terminals this is simply calling get() to serialize the value.
For non-terminals the implementation is more complicated and will be implemented in the following commits.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
constant is now ready to replace terminal as a final value representation.
Replace bind() with evaluate and shared_ptr<terminal> with constant.
We can't get rid of terminal yet. Sometimes terminal is converted back
to term, which constant can't do. This won't be a problem once we
replace term with expression.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Start using evaluate_to_raw_value instead of bind_and_get.
This is a step towards using only evaluate.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
A list representing IN values might contain NULLs before evaluation.
We can remove them during evaluation, because nothing equals NULL.
If we don't remove them, there will be errors, because a list can't contain NULLs.
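As a rough illustration of the idea, assuming a simplified model where each IN value is a std::optional<std::string> and nullopt stands for NULL (the function name drop_nulls is hypothetical):

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Returns the IN-list elements with NULLs removed. Dropping them does not
// change the result of "x IN (...)", because no value compares equal to NULL.
inline std::vector<std::string> drop_nulls(
        const std::vector<std::optional<std::string>>& in_values) {
    std::vector<std::string> result;
    result.reserve(in_values.size());
    for (const auto& v : in_values) {
        if (v) {
            result.push_back(*v);
        }
    }
    assert(result.size() <= in_values.size());
    return result;
}
```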
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Every class now has implementation of get_value_type().
We can simply make base class keep the data_type.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
To convert a terminal to expr::constant we need to know the value type.
Implement getting value type for terminals in lists.hh.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Adds the functions:
constant evaluate(term*, const query_options&);
raw_value_view evaluate(term*, const query_options&);
These functions take a term, bind it and convert the terminal
to constant or raw_value_view.
In the future these functions will take expression instead of term.
For that to happen, bind() has to be implemented on expression;
this will be done later.
Also introduces terminal::get_value_type().
In order to construct a constant from terminal we need to know the type.
It will be implemented in the following commits.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
A term should always be serialized using the internal cql serialization format.
A term represents a value received from the driver,
but for every use we are going to need it in the internal serialization format.
Other places in the code already do this, for example see list_prepare_term,
it calls value.bind(query_options::DEFAULT) to evaluate a collection_constructor.
query_options::DEFAULT has the latest cql serialization format.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
This reverts commit e9343fd382, reversing
changes made to 27138b215b. It causes a
regression in v2 serialization_format support:
collection_serialization_with_protocol_v2_test fails with: marshaling error: read_simple_bytes - not enough bytes (requested 1627390306, got 3)
Fixes #9360
null_literal (which is in the term::raw domain) will be converted to an
expression, so unnest the nested class null_value (which is in the term
domain and is not converted now).
In order to replace the term::raw hierarchy with expressions,
we need to unify the signatures of term::raw::prepare() and
term::multi_column_raw::prepare(). This is because we'll only have
one expression type to represent both single values and tuples
(although different subexpression types may be used).
The difference in the two prepare() signatures is the
`receiver` parameter - which is a (type, name) pair used
to perform type inference on the expression being prepared,
with the name used to report errors. In a perfect world, this
would just be an expression - a tuple or a singular expression
as the case requires. But we don't have the needed expression
infrastructure yet - general tuples or name-annotated expressions.
Resolve the problem by introducing a variant for the single-value
and tuple. This is more or less creating a mini-expression type
used just for this. Once our expression type grows the needed
capabilities, it can replace this type.
Note that for some cases, this replaces compile-time checks by
runtime checks (which should never trigger). In other cases
the classes really needed both interfaces, so the new variant
is a better fit.
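A minimal sketch of such a variant, under the assumption that a receiver is a plain (type, name) pair. The names sketch_column_spec, sketch_receiver, prepare_single and prepare_tuple_arity are hypothetical and do not reflect the actual types in the tree.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <variant>
#include <vector>

// Hypothetical stand-in for a (type, name) column specification.
struct sketch_column_spec {
    std::string type;
    std::string name;
};

// Either one receiver (single value) or several (tuple).
using sketch_receiver = std::variant<sketch_column_spec, std::vector<sketch_column_spec>>;

// A single-value prepare(): a tuple receiver is rejected at run time,
// where separate signatures used to reject it at compile time.
inline std::string prepare_single(const sketch_receiver& r) {
    const auto* spec = std::get_if<sketch_column_spec>(&r);
    if (!spec) {
        throw std::logic_error("single-value expression prepared with a tuple receiver");
    }
    return spec->name + ": " + spec->type;
}

// A tuple prepare(): returns the number of receivers it was given.
inline std::size_t prepare_tuple_arity(const sketch_receiver& r) {
    const auto* specs = std::get_if<std::vector<sketch_column_spec>>(&r);
    if (!specs) {
        throw std::logic_error("tuple expression prepared with a single receiver");
    }
    return specs->size();
}
```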
The class is repurposed to be more generic and also be able
to hold additional metadata related to function calls within
a CQL statement. Rename all methods appropriately.
Visitor functions in AST nodes (`collect_marker_specification`)
are also renamed to a more generic `fill_prepare_context`.
The name `prepare_context` designates that this metadata
structure is a byproduct of `stmt::raw::prepare()` call and
is needed only for "prepare" step of query execution.
Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Returning by reference requires that the elements are internally stored in
the multi_item_terminal as a std::vector, but in the next patch we will change
the internal type of lists::value from std::vector to utils::chunked_vector.
The copy is not a problem because all users of multi_item_terminal were copying
the returned vector.
A follow up for the patch for #7611. This change was requested
during review and moved out of #7611 to reduce its scope.
The patch switches UUID_gen API from using plain integers to
hold time units to units from std::chrono.
For one, we plan to switch the entire code base to std::chrono units,
to ensure type safety. Secondly, using std::chrono units allows us to
increase code reuse with template metaprogramming and remove a few
UUID_gen functions that became redundant as a result.
* switch get_time_UUID(), unix_timestamp(), get_time_UUID_raw(),
min_time_UUID(), max_time_UUID(), create_time_safe() to
std::chrono
* remove unused variant of from_unix_timestamp()
* remove unused get_time_UUID_bytes(), create_time_unsafe(),
redundant get_adjusted_timestamp()
* inline get_raw_UUID_bytes()
* collapse two similar implementations of get_time_UUID()
* switch internal constants to std::chrono
* remove unnecessary unique_ptr from UUID_gen::_instance
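To illustrate why std::chrono helps here, the sketch below converts any duration since the Unix epoch into 100-nanosecond ticks since the UUID epoch with a single template. The names decimicroseconds, uuid_epoch_offset and to_uuid_ticks are assumptions made for this example and are not taken from the UUID_gen API.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <ratio>

// 100-nanosecond ticks, the unit used by time-based UUIDs.
using decimicroseconds = std::chrono::duration<std::int64_t, std::ratio<1, 10'000'000>>;

// Distance between the UUID epoch (1582-10-15) and the Unix epoch
// (1970-01-01), expressed in 100ns ticks.
constexpr decimicroseconds uuid_epoch_offset{0x01b21dd213814000LL};

// One template covers milliseconds, microseconds, seconds, etc.; the unit
// conversion is done by the type system instead of by hand.
template <typename Rep, typename Period>
constexpr decimicroseconds to_uuid_ticks(std::chrono::duration<Rep, Period> since_unix_epoch) {
    return std::chrono::duration_cast<decimicroseconds>(since_unix_epoch) + uuid_epoch_offset;
}
```

With plain integers, a separate overload (or a comment stating the unit) would be needed for each time unit; here passing the wrong unit is not possible without an explicit cast.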
Message-Id: <20210406130152.3237914-2-kostja@scylladb.com>
Before this patch, deserializing a collection from a (prepared) CQL request
involved deserializing every element and serializing it again. Originally this
was a hacky method of validation, and it was also needed to reserialize nested
frozen collections from the CQLv2 format to the CQLv3 format.
But since then we started doing validation separately (before calls to
from_serialized) and CQLv2 became irrelevant, making reserialization of
elements (which, among other things, involves a memory allocation for every
element) pure waste.
This patch adds a faster path for collections in the v3 format, which does not
involve linearizing or reserializing the elements (since v3 is the same as
our internal format).
After this patch, the path from prepared CQL statements to
atomic_cell_or_collection is almost completely linearization-free. The last
remaining place is collection_mutation_description, where map keys are
linearized.
This patch switches the type used to store collection elements inside the
intermediate form used in lists::value, tuples::value etc. from bytes
to managed_bytes. After this patch, tuple and list elements are only linearized
in from_serialized, which will be corrected soon.
This commit introduces some additional copies in expression.cc, which
will be dealt with in a future commit.
We want to change the internals of cql3::raw_value{_view}.
However, users of cql3::raw_value and cql3::raw_value_view often
use them by extracting the internal representation, which will be different
after the planned change.
This commit prepares us for the change by making all accesses to the value
inside cql3::raw_value{_view} be done through helper methods which don't expose
the internal representation publicly.
After this commit we are free to change the internal representation of
raw_value{_view} without messing up their users.
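A simplified sketch of the encapsulation idea follows. The class sketch_raw_value and its methods are hypothetical and only mirror the general shape of such helpers; the real accessors on cql3::raw_value and cql3::raw_value_view may differ.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <string_view>
#include <utility>

class sketch_raw_value {
    // Private storage: callers cannot depend on this being a std::string,
    // so it can later be replaced by a fragmented buffer.
    std::optional<std::string> _data;
public:
    static sketch_raw_value make_null() { return sketch_raw_value(); }
    static sketch_raw_value make_value(std::string bytes) {
        sketch_raw_value v;
        v._data = std::move(bytes);
        return v;
    }
    bool is_null() const { return !_data.has_value(); }
    // Runs func on a view of the bytes; the storage type never leaks out.
    template <typename Func>
    decltype(auto) with_value(Func&& func) const {
        assert(!is_null());
        return std::forward<Func>(func)(std::string_view(*_data));
    }
};
```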
Rewrite list prepend to use the same machinery
as append, and thus produce correct results when used in LWT.
After this patch, list prepend begins to honor user supplied timestamps.
If a user supplied timestamp for prepend is less than 2010-01-01 00:00:00
an exception is thrown.
Fixes #7611
Scylla list cells are represented internally as a map of
timeuuid => value. To append a new value to a list
the coordinator generates a timeuuid reflecting the current time as key
and adds a value to the map using this key.
Before this patch, Scylla always generated a timeuuid for a new
value, even if the query had a user supplied or LWT timestamp.
This could break LWT linearizability. User supplied timestamps were
ignored.
This is reported as https://github.com/scylladb/scylla/issues/7611
A statement which appended multiple values to a list or a BATCH
generated its own microsecond-resolution timeuuid for each value:
BEGIN BATCH
UPDATE ... SET a = a + [3]
UPDATE ... SET a = a + [4]
APPLY BATCH
UPDATE ... SET a = a + [3, 4]
To fix the bug, it's necessary to preserve monotonicity of
timeuuids within a batch or multi-value append, but make sure
they all use the microsecond time, as is set by LWT or user.
To explain the fix, it's first necessary to recall the structure
of time-based UUIDs:
60 bits: time since start of GMT epoch, year 1582, represented
in 100-nanosecond units
4 bits: version
14 bits: clock sequence, a random number to avoid duplicates
in case system clock is adjusted
2 bits: type
48 bits: MAC address (or other hardware address)
The purpose of the clockseq bits, as defined in
https://tools.ietf.org/html/rfc4122#section-4.1.5,
is to reduce the probability of UUID collision in case the clock
goes back in time or the node id changes. The implementation should reset it
whenever one of these events may occur.
Since LWT microsecond time is guaranteed to be
unique by Paxos, the RFC provisioning for clockseq and MAC
slots becomes excessive.
The fix thus changes timeuuid slot content in the following way:
- time component now contains the same microsecond time for all
values of a statement or a batch. The time is unique and monotonic in
case of LWT. Otherwise it's almost always monotonic, but may not be
unique if two timestamps are created on different coordinators.
- clockseq component is used to store a sequence number which is
unique and monotonic for all values within the statement/batch.
- to protect against time back-adjustments and duplicates
if time is auto-generated, MAC component contains a random (spoof)
MAC address, re-created on each restart. The address is different
at each shard.
The change is made for all sources of time: user, generated, LWT.
Conditioning the list key generation algorithm on the source of
time would unnecessarily complicate the code while not increasing
the quality (uniqueness) of created list keys.
Since 14 bits of clockseq provide us with only 16383 distinct slots
per statement or batch, 3 extra bits in nanosecond part of the time
are used to extend the range to 131071 values per statement/batch.
If the range is exceeded, an exception is thrown.
A twist on the use of clockseq to extend timeuuid uniqueness is
that Scylla, like Cassandra, uses int8 compare to compare lower
bits of timeuuid for ordering. The patch takes this into account
and sign-complements the clockseq value to make it monotonic
according to the legacy compare function.
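The split of a per-statement sequence number across the clockseq bits and the spare sub-microsecond bits can be sketched as below. This is an illustration under stated assumptions, not the actual implementation: the names are hypothetical, the way the 3 extra bits are placed in the time field is a guess at one workable layout, and the sign-complement step for the legacy compare function is not shown.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

constexpr std::uint32_t clockseq_bits = 14;
constexpr std::uint32_t extra_bits = 3;
// Largest sequence number representable in 14 + 3 bits: 131071.
constexpr std::uint32_t max_sequence = (1u << (clockseq_bits + extra_bits)) - 1;

struct sketch_list_key {
    std::uint64_t ticks;    // timestamp in 100ns units (60 bits in a real UUID)
    std::uint16_t clockseq; // 14-bit clock sequence
};

// micros: statement timestamp in microseconds (same for all values).
// sequence: index of the value within the statement or batch.
inline sketch_list_key make_list_key(std::uint64_t micros, std::uint32_t sequence) {
    if (sequence > max_sequence) {
        throw std::out_of_range("too many list values in one statement or batch");
    }
    sketch_list_key key;
    // One microsecond is ten 100ns ticks, so the sub-microsecond digit is
    // free; the high 3 bits of the sequence (0..7) are stored there.
    key.ticks = micros * 10 + (sequence >> clockseq_bits);
    // The low 14 bits of the sequence go into the clockseq field.
    key.clockseq = static_cast<std::uint16_t>(sequence & ((1u << clockseq_bits) - 1));
    return key;
}
```

Comparing keys by (ticks, clockseq) then preserves the order of sequence numbers within one statement or batch, while the microsecond part of the time stays the one set by the user or LWT.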
Fixes #7611
test: unit (dev)
Replace two methods for unreversal (`as` and `self_or_reversed`) with
a new one (`without_reversed`). More flexible and better named.
Tests: unit (dev)
Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
Closes #7889