Fixes some typos as found by codespell run on the code.
In this commit, I was hoping to fix only comments, not user-visible alerts, output, etc.
Follow-up commits will take care of them.
Refs: https://github.com/scylladb/scylladb/issues/16255
Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>
first(x) returns the first x it sees in the group. This is useful
for SELECT clauses that return a mix of aggregates and non-aggregates,
for example
SELECT max(x), x
with inputs of x = { 1, 2, 3 } is expected to return (3, 1).
Currently, this behavior is handled by individual selectors,
which means they need to contain extra state for this, which
cannot be easily translated to expressions. The new first function
allows translating the SELECT clause above to
SELECT max(x), first(x)
so all selectors are aggregations and can be handled in the same
way.
The first() function is not exposed to users.
count(col), unlike count(*), does not count rows for which col is NULL.
However, if col's data type is not a scalar (e.g. a collection, tuple,
or user-defined type) it behaves like count(*), counting NULLs too.
The cause is that get_dynamic_aggregate() converts count() to
the count(*) version. It works for scalars because get_dynamic_aggregate()
intentionally fails to match scalar arguments, and functions::get() then
matches the arguments against the pre-declared count functions.
As we can only pre-declare count(scalar) (there's an infinite number
of non-scalar types), we change the approach to be the same as min/max:
we make count() a generic function. In fact count(col) is much better
as a generic function, as it only examines its input to see if it is
NULL.
A unit test is added. It passes with Cassandra as well.
Fixes#14198.
Closes#14199
this is a part of a series to migrating from `operator<<(ostream&, ..)`
based formatting to fmtlib based formatting. the goal here is to enable
fmtlib to print `function_name` without the help of `operator<<`.
the corresponding `operator<<()` are dropped dropped in this change,
as all its callers are now using fmtlib for formatting now.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes#13608
Database functions currently receive their arguments as an std::vector. This
is inflexible (for example, one cannot use small_vector to reduce allocations).
This series adapts the function signature to accept parameters using std::span.
Some changes in the keys interface are needed to support this. Lastly, one call
site is migrated to small_vector.
This is in support of changing selectors to use expressions.
Closes#13581
* github.com:scylladb/scylladb:
cql3: abstract_function_selector: use small_vector for argument buffer
db, cql3: functions: pass function parameters as a span instead of a vector
keys: change from_optional_exploded to accept a span instead of a vector
Spans are more flexible and can be constructed from any contiguous
container (such as small_vector), or a subrange of such a container.
This can save allocations, so change the signature to accept a span.
Spans cannot be constructed from std::initializer_list, so one such
call site is changed to use construct a span directly from the single
argument.
When floating-point data contains +Inf and -Inf, the sum is NaN.
Our SUM() aggregation calculated this sum correctly, but then instead
of returning it, complained that the sum overflowed by narrowing.
This was a false positive: The sum() finalizer wanted to test that no
precision was lost when casting the accumulator to the result type,
so checked that the result before and after the cast are the same.
But specifically for NaN, it is never equal to anything - not even
to itself. This check is wrong for floating point, but moreover -
isn't even necessary when the two types (accumulator type and result
type) are identical so in this patch we skip it in this case.
Note that in the current code, a different accumulator and result type
is only used in the case of integer types; When accumulating floating
point sums, the same type is used, so the broken check will be avoided.
The test for this issue starts to pass with this patch, so the xfail
tag is removed.
Fixes#13551
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Now that all aggregate functions are derived from
stateless_aggregate_function_adapter, we can just fold its functionality
into the base class. This exposes stateless_aggregate_function to
all users of aggregate_function, so they can begin to benefit from
the transformation, though this patch doesn't touch those users.
The aggregate_function base class is partiallly devirtualized since
there is just a single implementation now.
The accumulator_for template is used to select the accumulator
type for aggregates. After refactoring, all that is needed from
it is to select the native type, so remove all the excess code.
Currently, we use __int128, but this has no direct counterpart
in CQL, so we can't express the accumulator type as part of a
CQL scalar function. Switch to varint which is a superset, although
slower.
Currently it works without this, but later unreversing will
be removed from another part of the stack, causing min/max
on reversed types to return incorrect results. Anticipate
that an unreverse the types during construction.
min() and max() had two implementations: one static (for each type in
a select list) and one dynamic (for compound types). Since the
dynamic implementation is sufficient, we only reimplement that. This
means we don't use the automarshalling helpers, since we don't do any
arithemetic on values apart from comparison, which is conveniently
provided by abstract_type.
Add a helper function to consolidate the internal native function
class and the automatic marshalling introduced in previous patches.
Since decaying a lambda into a function pointer (in order to
infer its signature) there are two overloads: one accepts a lambda
and decays it into a function pointer, the second accepts a function
pointer, infers its argument, and constructs the function object.
Add a helper that, given a C++ function, deduces its arguument types
and wraps the function in marshalling/unmarshalling code.
The native function expects non-null inputs, so an additional helper is
called to decide what to do if nulls are encountered. One such
helper is return_accumulator_on_null (since that's the default
behavior of aggregates), and the other is return_any_nonnull(),
useful for reductions.
We'll need many scalar functions to implement aggregates in terms
of scalars, so we add an internal_scalar_function class to reduce
boilerplate. The new class proxies the scalar function into a
native noncopyable_function provided by the constructor.
they are part of the CQL type system, and are "closer" to types.
let's move them into "types" directory.
the building systems are updated accordingly.
the source files referencing `types.hh` were updated using following
command:
```
find . -name "*.{cc,hh}" -exec sed -i 's/\"types.hh\"/\"types\/types.hh\"/' {} +
```
the source files under sstables include "types.hh", which is
indeed the one located under "sstables", so include "sstables/types.hh"
instea, so it's more explicit.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes#12926
Now that we don't accept cql protocol version 1 or 2, we can
drop cql_serialization format everywhere, except when in the IDL
(since it's part of the inter-node protocol).
A few functions had duplicate versions, one with and one without
a cql_serialization_format parameter. They are deduplicated.
Care is taken that `partition_slice`, which communicates
the cql_serialization_format across nodes, still presents
a valid cql_serialization_format to other nodes when
transmitting itself and rejects protocol 1 and 2 serialization\
format when receiving. The IDL is unchanged.
One test checking the 16-bit serialization format is removed.
After fcb8d040 ("treewide: use Software Package Data Exchange
(SPDX) license identifiers"), many dual-licensed files were
left with empty comments on top. Remove them to avoid visual
noise.
Closes#10562
Makes final function and initial condition to be optional while
creating UDA. No final function means UDA returns final state
and defeult initial condition is `null`.
Fixes: #10324
Instead of lengthy blurbs, switch to single-line, machine-readable
standardized (https://spdx.dev) license identifiers. The Linux kernel
switched long ago, so there is strong precedent.
Three cases are handled: AGPL-only, Apache-only, and dual licensed.
For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0),
reasoning that our changes are extensive enough to apply our license.
The changes we applied mechanically with a script, except to
licenses/README.md.
Closes#9937
A user-defined aggregate is represented as an aggregate
which calls its state function on each input row
and then finalizes its execution by calling its final function
on the final state, after all rows were already processed.
The min/max aggregators use aggregate_type_for comparators, and the
aggregate_type_for<timeuuid> is regular uuid. But that yields wrong
results; timeuuids should be compared as timestamps.
Fix it by changing aggregate_type_for<timeuuid> from uuid to timeuuid,
so aggregators can distinguish betwen the two. Then specialize the
aggregation utilities for timeuuid.
Add a cql-pytest and change some unit tests, which relied on naive
uuid comparators.
Fixes#7729.
Tests: unit (dev, debug)
Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
Closes#7910
impl_count_function doesn't explicitly initialize _count, so its correctness
depends on default initialization. Let's explicitly initialize _count to
make the code future proof.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20200714162604.64402-1-raphaelsc@scylladb.com>
For collections and UDTs the `MIN()` and `MAX()` functions are
generated on the fly. Until now they worked by comparing just the
byte representations of arguments.
This patch uses specific per-type comparators to provide semantically
sensible, dynamically created aggregates.
Fixes#6768
The goal is to forward-declare utils::multiprecision_int, something
beyond my capabilities for boost::multiprecision::cpp_int, to reduce
compile time bloat.
The patch is mostly search-and-replace, with a few casts added to
disambiguate conversions the compiler had trouble with.
Other types do not have a wider accumulator at the moment.
And static_cast<accumulator_type>(ret) != _sum evaluates as
false for NaN/Inf floating point values.
Fixes#5586
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20200112183436.77951-1-bhalevy@scylladb.com>
We had a lot of code in a .hh file, that while using templeates, was
only used from creating functions during startup.
This moves it to a new .cc file.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200101002158.246736-1-espindola@scylladb.com>