scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-02 06:05:53 +00:00

Author	SHA1	Message	Date
Jan Ciolek	15ed83adbc	cql3/functions: make column family argument optional in functions::get The method `functions::get` is used to get the `functions::function` object of the CQL function called using `expr::function_call`. Until now `functions::get` required the caller to pass both the keyspace and the column family. The keyspace argument is always needed, as every CQL function belongs to some keyspace, but the column family isn't used in most cases. The only case where having the column family is really required is the `token()` function. Each variant of the `token()` function belongs to some table, as the arguments to the function are the consecutive partition key columns. Let's make the column family argument optional. In most cases the function will work without information about column family. In case of the `token()` function there's gonna be a check and it will throw an exception if the argument is nullopt. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2023-04-29 13:00:01 +02:00
Kefu Chai	ca6ebbd1f0	cql3, db: sstable: specialize fmt::formatter<function_name> this is part of a series migrating from `operator<<(ostream&, ..)` based formatting to fmtlib based formatting. the goal here is to enable fmtlib to print `function_name` without the help of `operator<<`. the corresponding `operator<<()` overloads are dropped in this change, as all their callers now use fmtlib for formatting. Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #13608	2023-04-21 10:07:28 +03:00
Kefu Chai	ecb5380638	treewide: s/boost::lexical_cast<std::string>/fmt::to_string()/ this change replaces all occurrences of `boost::lexical_cast<std::string>` in the source tree with `fmt::to_string()`, for a couple of reasons: * `boost::lexical_cast<std::string>` is longer than `fmt::to_string()`, so the latter is easier to parse and read. * `boost::lexical_cast<std::string>` creates a stringstream under the hood, so it can use the `operator<<` to stringify the given object. but stringstream is known to be less performant than fmtlib. * we are migrating to fmtlib based formatting, see #13245. so using `fmt::to_string()` helps us to remove yet another dependency on `operator<<`. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #13611	2023-04-21 09:43:53 +03:00
Botond Dénes	d828cfcb23	Merge 'db, cql3: functions: switch argument passing to std::span' from Avi Kivity Database functions currently receive their arguments as an std::vector. This is inflexible (for example, one cannot use small_vector to reduce allocations). This series adapts the function signature to accept parameters using std::span. Some changes in the keys interface are needed to support this. Lastly, one call site is migrated to small_vector. This is in support of changing selectors to use expressions. Closes #13581 * github.com:scylladb/scylladb: cql3: abstract_function_selector: use small_vector for argument buffer db, cql3: functions: pass function parameters as a span instead of a vector keys: change from_optional_exploded to accept a span instead of a vector	2023-04-21 06:49:07 +03:00
Avi Kivity	3e0aacc8b5	db, cql3: functions: pass function parameters as a span instead of a vector Spans are more flexible and can be constructed from any contiguous container (such as small_vector), or a subrange of such a container. This can save allocations, so change the signature to accept a span. Spans cannot be constructed from std::initializer_list, so one such call site is changed to construct a span directly from the single argument.	2023-04-19 20:38:55 +03:00
Nadav Har'El	81e0f5b581	cql3: allow SUM() aggregation to result in a NaN When floating-point data contains +Inf and -Inf, the sum is NaN. Our SUM() aggregation calculated this sum correctly, but then instead of returning it, complained that the sum overflowed by narrowing. This was a false positive: The sum() finalizer wanted to test that no precision was lost when casting the accumulator to the result type, so checked that the result before and after the cast are the same. But specifically for NaN, it is never equal to anything - not even to itself. This check is wrong for floating point, but moreover - isn't even necessary when the two types (accumulator type and result type) are identical so in this patch we skip it in this case. Note that in the current code, a different accumulator and result type is only used in the case of integer types; When accumulating floating point sums, the same type is used, so the broken check will be avoided. The test for this issue starts to pass with this patch, so the xfail tag is removed. Fixes #13551 Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2023-04-19 09:31:41 +03:00
Nadav Har'El	59ab9aac44	Merge 'functions: reframe aggregate functions in terms of scalar functions' from Avi Kivity Currently, aggregate functions are implemented in a stateful manner. The accumulator is stored internally in an aggregate_function::aggregate, requiring each query to instantiate new instances (see aggregate_function_selector's constructor, and note how it's called from selector::new_instance()). This makes aggregates hard to use in expressions, since expressions are stateless (with state only provided to evaluate()). To facilitate migration towards stateless expressions, we define a stateless_aggregate_function (modeled after user-defined aggregates, which are already stateless). This new struct defines the aggregate in terms of three scalar functions: one to aggregate a new input into an accumulator (provided in the first parameter), one to finalize an accumulator into a result, and one to reduce two accumulators for parallelized aggregation. All existing native aggregate functions are converted to the new model, and the old interface is removed. This series does not yet convert selectors to expressions, but it does remove one of the obstacles. Performance evaluation: I created a table with a million ints on a single-node cluster, and ran the avg() function on them. I measured the number of instructions executed with `perf stat -p $(pgrep scylla) -e instructions` while the query was running. The query executed from cache, memtables were flushed beforehand. The instruction count per row increased from roughly 49k to roughly 52k, indicating 3k extra instructions per row. While 3k instructions to execute a function is huge, it is currently dwarfed by other overhead (and will be even less important in a cluster where CL>1 will cause non-coordinator code to run multiple times).
Closes #13105 * github.com:scylladb/scylladb: cql3/selection, forward_service: use stateless_aggregate_function directly db: functions: fold stateless_aggregate_function_adapter into aggregate_function cql3: functions: simplify accumulator_for template cql3: functions: base user-defined aggregates on stateless aggregates cql3: functions: drop native_aggregate_function cql3: functions: reimplement count(column) statelessly cql3: functions: reimplement avg() statelessly cql3: functions: reimplement sum() statelessly cql3: functions: change wide accumulator type to varint cql3: functions: unreverse types for min/max cql3: functions: rename make_{min,max}_dynamic_function cql3: functions: reimplement min/max statelessly cql3: functions: reimplement count(*) statelessly cql3: functions: simplify creating native functions even more cql3: functions: add helpers for automating marshalling for scalar functions types: fix big_decimal constructor from literal 0 cql3: functions: add helper class for internal scalar functions db: functions: add stateless aggregate functions db, cql3: move scalar_function from cql3/functions to db/functions	2023-03-30 13:58:47 +03:00
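
The three-function model described in 59ab9aac44 can be illustrated with a small standalone sketch. Everything here is an assumption made for illustration: `stateless_aggregate_sketch`, `make_avg`, and `run_partial` are hypothetical names, not ScyllaDB's actual types or signatures, and returning 0.0 for an empty input is an arbitrary choice of this example. The point is only that the aggregate holds no state itself; the accumulator is passed in and out of plain functions, so partial results from different shards can be merged with the reduce function.

```cpp
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Illustrative stand-in for a stateless aggregate: three scalar functions
// plus an initial accumulator value. No state lives inside the object.
template <typename Acc, typename In, typename Out>
struct stateless_aggregate_sketch {
    Acc initial;
    std::function<Acc(Acc, In)> aggregate;  // fold one input into the accumulator
    std::function<Out(Acc)> finalize;       // turn an accumulator into the result
    std::function<Acc(Acc, Acc)> reduce;    // merge two partial accumulators
};

using avg_acc = std::pair<std::int64_t, std::int64_t>;  // (sum, count)

// Hypothetical avg() expressed in the three-function form.
inline stateless_aggregate_sketch<avg_acc, std::int64_t, double> make_avg() {
    return {
        avg_acc{0, 0},
        [](avg_acc a, std::int64_t v) { return avg_acc{a.first + v, a.second + 1}; },
        [](avg_acc a) { return a.second ? static_cast<double>(a.first) / a.second : 0.0; },
        [](avg_acc a, avg_acc b) { return avg_acc{a.first + b.first, a.second + b.second}; },
    };
}

// Runs the aggregation step over one partition of the input rows.
template <typename Acc, typename In, typename Out>
Acc run_partial(const stateless_aggregate_sketch<Acc, In, Out>& f, const std::vector<In>& rows) {
    Acc acc = f.initial;
    for (const In& r : rows) {
        acc = f.aggregate(acc, r);
    }
    return acc;
}
```

Two partitions can be aggregated independently with `run_partial`, merged with `reduce`, and only then passed to `finalize`, which mirrors the parallelized aggregation the commit message mentions.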
Avi Kivity	58eb21aa5d	db: functions: fold stateless_aggregate_function_adapter into aggregate_function Now that all aggregate functions are derived from stateless_aggregate_function_adapter, we can just fold its functionality into the base class. This exposes stateless_aggregate_function to all users of aggregate_function, so they can begin to benefit from the transformation, though this patch doesn't touch those users. The aggregate_function base class is partially devirtualized since there is just a single implementation now.	2023-03-28 23:47:11 +03:00
Avi Kivity	68529896aa	cql3: functions: simplify accumulator_for template The accumulator_for template is used to select the accumulator type for aggregates. After refactoring, all that is needed from it is to select the native type, so remove all the excess code.	2023-03-28 23:47:11 +03:00
Avi Kivity	4ea3136026	cql3: functions: base user-defined aggregates on stateless aggregates Since the model for stateless aggregates was taken from user defined aggregates, the conversion is trivial.	2023-03-28 23:47:11 +03:00
Avi Kivity	f2715b289a	cql3: functions: drop native_aggregate_function Now that all aggregates are implemented statelessly, native_aggregate_function no longer has subclasses, so drop it.	2023-03-28 23:47:11 +03:00
Avi Kivity	6bceb25982	cql3: functions: reimplement count(column) statelessly Note that we don't use the automarshalling helper for the aggregation function, since it doesn't work for compound types.	2023-03-28 23:47:11 +03:00
Avi Kivity	4f2cdace9a	cql3: functions: reimplement avg() statelessly	2023-03-28 23:47:11 +03:00
Avi Kivity	b0a8fd3287	cql3: functions: reimplement sum() statelessly	2023-03-28 23:47:11 +03:00
Avi Kivity	d21d11466a	cql3: functions: change wide accumulator type to varint Currently, we use __int128, but this has no direct counterpart in CQL, so we can't express the accumulator type as part of a CQL scalar function. Switch to varint which is a superset, although slower.	2023-03-28 23:47:11 +03:00
Avi Kivity	3252dc0172	cql3: functions: unreverse types for min/max Currently it works without this, but later unreversing will be removed from another part of the stack, causing min/max on reversed types to return incorrect results. Anticipate that and unreverse the types during construction.	2023-03-28 23:47:09 +03:00
Avi Kivity	ed466b7e68	cql3: functions: rename make_{min,max}_dynamic_function There's no longer a statically-typed variant, so no need to distinguish the dynamically-typed one.	2023-03-28 23:37:49 +03:00
Avi Kivity	bfd70c192e	cql3: functions: reimplement min/max statelessly min() and max() had two implementations: one static (for each type in a select list) and one dynamic (for compound types). Since the dynamic implementation is sufficient, we only reimplement that. This means we don't use the automarshalling helpers, since we don't do any arithmetic on values apart from comparison, which is conveniently provided by abstract_type.	2023-03-26 15:18:22 +03:00
Avi Kivity	e6342d476b	cql3: functions: reimplement count(*) statelessly Note we have to explicitly decay lambdas to functions using unary operator +.	2023-03-26 15:18:22 +03:00
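
The note in e6342d476b about decaying lambdas with unary `+` can be shown with a short sketch. The names `arity` and `count_step` are hypothetical and exist only for this example. A captureless lambda has a unique closure type, so a template that pattern-matches on a function pointer cannot deduce its signature from the lambda directly; applying unary `+` triggers the conversion to a plain function pointer first.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical helper that infers a signature from a plain function pointer
// and reports how many parameters it takes.
template <typename Ret, typename... Args>
constexpr std::size_t arity(Ret (*)(Args...)) {
    return sizeof...(Args);
}

// A captureless lambda standing in for a count(*) style aggregation step.
inline constexpr auto count_step = [](long acc) { return acc + 1; };

// arity(count_step) would fail template deduction, because a closure type is
// not a function pointer. Unary + forces the conversion to long (*)(long).
static_assert(std::is_same_v<decltype(+count_step), long (*)(long)>);
```

Calling `arity(+count_step)` then deduces `Ret = long` and `Args = long`. This only works for lambdas without captures, since only those convert to function pointers.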
Avi Kivity	9291ec5ed1	cql3: functions: simplify creating native functions even more Add a helper function to consolidate the internal native function class and the automatic marshalling introduced in previous patches. Since a lambda must be decayed into a function pointer (in order to infer its signature), there are two overloads: one accepts a lambda and decays it into a function pointer, the second accepts a function pointer, infers its argument, and constructs the function object.	2023-03-26 15:15:36 +03:00
Kefu Chai	c37f4e5252	treewide: use fmt::join() when appropriate now that fmtlib provides fmt::join(). see https://fmt.dev/latest/api.html#_CPPv4I0EN3fmt4joinE9join_viewIN6detail10iterator_tI5RangeEEN6detail10sentinel_tI5RangeEEERR5Range11string_view there is no need to reinvent the wheel. so in this change, the homebrew join() is replaced with fmt::join(). as fmt::join() returns a join_view, this could improve the performance under certain circumstances where the fully materialized string is not needed. please note, the goal of this change is to use fmt::join(), and this change does not intend to improve the performance of existing implementation based on "operator<<" unless the new implementation is much more complicated. we will address the unnecessarily materialized strings in a follow-up commit. some noteworthy things related to this change: * unlike the existing `join()`, `fmt::join()` returns a view. so we have to materialize the view if what we expect is a `sstring` * `fmt::format()` does not accept a view, so we cannot pass the return value of `fmt::join()` to `fmt::format()` * fmtlib does not format a typed pointer, i.e., it does not format, for instance, a `const std::string*`. but operator<<() always prints a typed pointer. so if we want to format a typed pointer, we either need to cast the pointer to `void*` or use `fmt::ptr()`. * fmtlib is not able to pick up the overload of `operator<<(std::ostream& os, const column_definition* cd)`, so we have to use a wrapper class of `maybe_column_definition` for printing a pointer to `column_definition`. since the overload is only used by the two overloads of `statement_restrictions::add_single_column_parition_key_restriction()`, the operator<< for `const column_definition*` is dropped. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-16 20:34:18 +08:00
Avi Kivity	7a5e609d8d	cql3: functions: add helpers for automating marshalling for scalar functions Add a helper that, given a C++ function, deduces its argument types and wraps the function in marshalling/unmarshalling code. The native function expects non-null inputs, so an additional helper is called to decide what to do if nulls are encountered. One such helper is return_accumulator_on_null (since that's the default behavior of aggregates), and the other is return_any_nonnull(), useful for reductions.	2023-03-15 22:28:41 +02:00
Avi Kivity	6c8d942fa1	cql3: functions: add helper class for internal scalar functions We'll need many scalar functions to implement aggregates in terms of scalars, so we add an internal_scalar_function class to reduce boilerplate. The new class proxies the scalar function into a native noncopyable_function provided by the constructor.	2023-03-15 22:22:02 +02:00
Avi Kivity	82c4341e0e	db, cql3: move scalar_function from cql3/functions to db/functions Previously, we moved cql3::functions::function to the db::functions namespace, since functions are a part of the data dictionary, which is independent of cql3. We do the same now for scalar_function, since we wish to make use of it in a new db::functions::stateless_aggregate_function. A stub remains in cql3/functions to avoid churn.	2023-03-15 20:37:25 +02:00
Avi Kivity	6aa91c13c5	Merge 'Optimize topology::compare_endpoints' from Benny Halevy The code for compare_endpoints originates at the dawn of time (`bc034aeaec`) and is called on the fast path from storage_proxy via `sort_by_proximity`. This series considerably reduces the function's footprint by: 1. carefully coding the many comparisons in the function so as to reduce the number of conditional branches (apparently the compiler isn't doing a good enough job at optimizing it in this case) 2. avoiding an sstring copy in topology::get_{datacenter,rack} Closes #12761 * github.com:scylladb/scylladb: topology: optimize compare_endpoints to_string: add print operators for std::{weak,partial}_ordering utils: to_sstring: deinline std::strong_ordering print operator move to_string.hh to utils/ test: network_topology: add test_topology_compare_endpoints	2023-03-07 15:17:19 +02:00
Kefu Chai	79d2eb1607	cql3: functions: validate arguments for 'token()' also since "token()" computes the token for a given partition key, if we pass the key of the wrong type, it should reject. in this change, * we validate the keys before returning the "token()" function. * drop the "xfail" decorator from two of the tests. they pass now after this fix. * change the tests which previously passed the wrong number of arguments containing null to "token()" and expect it to return null, so they verify that "token()" should reject these arguments with the expected error message. Fixes #10448 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #12991	2023-02-26 19:01:58 +02:00
Kefu Chai	df63e2ba27	types: move types.{cc,hh} into types they are part of the CQL type system, and are "closer" to types. let's move them into "types" directory. the building systems are updated accordingly. the source files referencing `types.hh` were updated using following command: ``` find . -name "*.{cc,hh}" -exec sed -i 's/\"types.hh\"/\"types\/types.hh\"/' {} + ``` the source files under sstables include "types.hh", which is indeed the one located under "sstables", so include "sstables/types.hh" instead, so it's more explicit. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #12926	2023-02-19 21:05:45 +02:00
Kefu Chai	0cb842797a	treewide: do not define/capture unused variables these warnings are found by Clang-17 after removing `-Wno-unused-lambda-capture` and '-Wno-unused-variable' from the list of disabled warnings in `configure.py`. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-02-15 22:57:18 +02:00
Benny Halevy	25ebc63b82	move to_string.hh to utils/ Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-02-15 11:09:04 +02:00
Wojciech Mitros	02bfac0c66	uda: change the UDF used in a UDA if it's replaced Currently, if a UDA uses a UDF that's being replaced, the UDA will still keep using the old UDF until the node is restarted. This patch fixes this behavior by checking all UDAs when replacing a UDF and updating them if necessary. Fixes #12709	2023-02-07 12:17:52 +01:00
Wojciech Mitros	58987215dc	functions: add helper same_signature method When deciding whether two functions have the same signature, we have to check if they have the same name and parameter types. Additionally, if they're represented by pointers, we need to check if any of them is a nullptr. This logic is used multiple times, so it's extracted to a separate function. To use this function, the `used_by_user_aggregate` method takes now a function instead of name and types list - we can do it because we always use it with an existing user function (that we're trying to drop). The method will also be useful when we'll be not dropping, but replacing a user function.	2023-02-07 10:15:12 +01:00
Wojciech Mitros	20069372e7	uda: return aggregate functions as shared pointers We will want to reuse the functions that we get from an aggregate without making a deep copy, and it's only possible if we get pointers from the aggregate instead of actual values.	2023-02-07 10:15:09 +01:00
Wojciech Mitros	ef1dac813b	udf: also check reducefunc to confirm that a UDF is not used in a UDA When dropping a UDF we're checking if it's not being used in any UDAs and fail otherwise. However, we're only checking its state function and final function, and it may also be used as its reduce function. This patch adds the missing checks and a test for them.	2023-02-06 13:02:54 +01:00
Wojciech Mitros	49077dd144	udf: fix dropping UDFs that share names with other UDFs used in UDAs Currently, when dropping a function, we only check if there exists an aggregate that uses a function with the same name as its state function or final function. This may cause the drop to fail even when it's just another UDF with the same name that's used in the aggregate, even when the actual dropped function is not used there. This patch fixes this by checking not only the names of the UDA's sfunc and finalfunc, but also their argument types.	2023-02-06 13:02:53 +01:00
Wojciech Mitros	86c61828e6	udt: disallow dropping a user type used in a user function Currently, nothing prevents us from dropping a user type used in a user function, even though doing so may make us unable to use the function correctly. This patch prevents this behavior by checking all function argument and return types when executing a drop type statement and preventing it from completing if the type is referenced by any of them. Closes #12680	2023-02-01 18:53:29 +02:00
Avi Kivity	2739ac66ed	treewide: drop cql_serialization_format Now that we don't accept cql protocol version 1 or 2, we can drop cql_serialization_format everywhere, except in the IDL (since it's part of the inter-node protocol). A few functions had duplicate versions, one with and one without a cql_serialization_format parameter. They are deduplicated. Care is taken that `partition_slice`, which communicates the cql_serialization_format across nodes, still presents a valid cql_serialization_format to other nodes when transmitting itself and rejects protocol 1 and 2 serialization format when receiving. The IDL is unchanged. One test checking the 16-bit serialization format is removed.	2023-01-03 19:54:13 +02:00
Michał Jadwiszczak	06cd03d3cd	cql3:functions: `get_user_functions()` and `get_user_aggregates()` Helper functions to obtain UDFs/UDAs for certain keyspace.	2022-12-10 12:36:59 +01:00
Michał Jadwiszczak	29ad5a08a8	implement `keyspace_element` interface This patch implements `data_dictionary::keyspace_element` interface in: `keyspace_metadata`, `user_type_impl`, `user_function`, `user_aggregate` and schema.	2022-12-10 12:34:09 +01:00
Wojciech Mitros	9281ba3919	wasm: reuse UDF instances When executing a wasm UDF, most of the time is spent on setting up the instance. To minimize its cost, we reuse the instance using wasm::instance_cache. This patch adds a wasm instance cache, that stores a wasmtime instance for each UDF and scheduling group. The instances are evicted using LRU strategy. The cache may store some entries for the UDF after evicting the instance, but they are evicted when the corresponding UDF is dropped, which greatly limits their number. The size of stored instances is estimated using the size of their WASM memories. In order to be able to read the size of memory, we require that the memory is exported by the client. Signed-off-by: Wojciech Mitros <wojciech.mitros@scylladb.com>	2022-07-20 18:19:22 +02:00
Jadw1	a0a6d87c1b	cql3:functions: Add cql3::functions::functions::mock_get() `mock_get` was created only for forward_service use, thus it only checks for aggregate functions if no declared function was found. The reason for this function is that there is no serialization of `cql3::selection::selection`, so functions lying underneath these selections have to be found again. Most of this code is copied from `functions::get()`, however `functions::get()` is not used because it requires mocking or serializing expressions, and `functions::find()` is not enough, because it does not search for dynamic aggregate functions	2022-07-18 15:25:41 +02:00
Jadw1	59498caeca	db,cql3: Move part of cql3's function into db Moving `function`, `function_name` and `aggregate_function` into db namespace to avoid including cql3 namespace into query-request. For now, only minimal subset of cql3 function was moved to db.	2022-07-18 15:25:41 +02:00
Jadw1	0f08c8e099	cql3: reducible aggregates Introduces reducible aggregates which don't return final result but accumulator, that can be later reduced.	2022-07-18 15:25:41 +02:00
Jadw1	d13f347621	DB: Add `scylla_aggregates` system table Saving information about UDA's reduce function to `scylla_aggregates` table and distributing it across cluster.	2022-07-18 15:25:37 +02:00
Jadw1	d8f3461147	CQL3: Add reduce function to UDA Add optional field to UDA, that describes reduce function to allow parallelization of UDA aggregates.	2022-07-18 14:18:48 +02:00
Aleksandra Martyniuk	bf34589fc1	cql3: create tokens out of null values properly Method responsible for creating a token of given values is not meant to be used with empty optionals. Thus, having requested a token of the columns containing null values resulted in an exception being thrown. This kind of behaviour was not compatible with the one applied in cassandra. To fix this, before the computation of a token, it is checked whether any null value is contained. If any value in the processed vector is null, null value is returned. Fixes: #10594 Closes #10942	2022-07-04 10:42:23 +02:00
Avi Kivity	5937b1fa23	treewide: remove empty comments in top-of-files After `fcb8d040` ("treewide: use Software Package Data Exchange (SPDX) license identifiers"), many dual-licensed files were left with empty comments on top. Remove them to avoid visual noise. Closes #10562	2022-05-13 07:11:58 +02:00
Jadw1	c921efd1b3	cql3: allow no final_func and no initcond in UDA Makes final function and initial condition optional while creating UDA. No final function means UDA returns final state and default initial condition is `null`. Fixes: #10324	2022-04-06 09:08:50 +02:00
Wojciech Mitros	56c5459c50	wasm: add null handling for wasm udf As the name suggests, for UDFs defined as RETURNS NULL ON NULL INPUT, we sometimes want to return nulls. However, currently we do not return nulls. Instead, we fail on the null check in init_arg_visitor. Fix by adding null handling before passing arguments, same as in lua. Signed-off-by: Wojciech Mitros <wojciech.mitros@scylladb.com> Closes #10298	2022-03-31 12:27:38 +03:00
Avi Kivity	fcb8d040e8	treewide: use Software Package Data Exchange (SPDX) license identifiers Instead of lengthy blurbs, switch to single-line, machine-readable standardized (https://spdx.dev) license identifiers. The Linux kernel switched long ago, so there is strong precedent. Three cases are handled: AGPL-only, Apache-only, and dual licensed. For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0), reasoning that our changes are extensive enough to apply our license. The changes were applied mechanically with a script, except to licenses/README.md. Closes #9937	2022-01-18 12:15:18 +01:00
Avi Kivity	d768e9fac5	cql3, related: switch to data_dictionary Stop using database (and including database.hh) for schema related purposes and use data_dictionary instead. data_dictionary::database::real_database() is called from several places, for these reasons: - calling yet-to-be-converted code - callers with a legitimate need to access data (e.g. system_keyspace) but with the ::database accessor removed from query_processor. We'll need to find another way to supply system_keyspace with data access. - to gain access to the wasm engine for testing whether user defined functions compile. We'll have to find another way to do this as well. The change is a straightforward replacement. One case in modification_statement had to change a capture, but everything else was just a search-and-replace. Some files that lost "database.hh" gained "mutation.hh", which they previously had access to through "database.hh".	2021-12-15 13:54:23 +02:00


262 Commits