scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-23 16:22:15 +00:00

Author	SHA1	Message	Date
Jan Ciolek	464437ef90	types/user: modify idx_of_field to use bytes_view Let's change the argument type from `bytes` to `bytes_view`. Sometimes it's possible to get an instance of `bytes_view`, but getting `bytes` would require a copy, which is wasteful. `bytes_view` allows us to avoid such copies. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2023-06-16 01:11:31 +02:00
Jan Ciolek	ab1ba497b5	types: add read_nth_user_type_field() Add a function which can be used to read the nth field of a serialized UDT value. We could deserialize the whole value and then choose one of the deserialized fields, but that would be wasteful. Sometimes we only need the value of one field, not all of them. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2023-06-16 01:11:30 +02:00
Jan Ciolek	5fce4d9675	types: add read_nth_tuple_element() Add a function which retrieves the value of the nth field from a serialized tuple value. I tried to make it as efficient as possible. Other functions, like evaluate(subscript), tend to deserialize the whole structure and put all of its elements in a vector. Then they select a single element from this vector. This is wasteful, as we only need a single element's value. This function goes over the serialized fields and directly returns the one that is needed. No allocations are needed. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2023-06-14 07:22:39 +02:00
Avi Kivity	42a1ced73b	cql3: result_set: switch cell data type from bytes_opt to managed_bytes_opt The expression system uses managed_bytes_opt for values, but result_set uses bytes_opt. This means that processing values from the result set in expressions requires a copy. Out of the two, managed_bytes_opt is the better choice, since it prevents large contiguous allocations for large blobs. So we switch result_set to use managed_bytes_opt. Users of the result_set API are adjusted. The db::function interface is not modified to limit churn; instead we convert the types on entry and exit. This will be adjusted in a following patch.	2023-05-07 17:17:36 +03:00
Avi Kivity	d3e9fd49a3	types: abstract_type: add mixed-type versions of compare() and equal() compare() and equal() can compare two unfragmented values or two fragmented values, but a mix of a fragmented value and an unfragmented value runs afoul of C++ conversion rules. Add more overloads to make it simpler for users.	2023-05-07 17:17:36 +03:00
Benny Halevy	935ff0fcbb	types: timestamp_from_string: print current_exception on error We may catch exceptions that are not `marshal_exception`. Print std::current_exception() in this case to provide some context about the marshalling error. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes #13693	2023-04-27 22:30:55 +03:00
Kefu Chai	f5b05cf981	treewide: use defaulted operator!=() and operator==() in C++20, the compiler generates operator!=() if the corresponding operator==() is already defined; the language now understands that equality comparison is symmetric. fortunately, our operator!=() is always equivalent to `! operator==()`, which matches the behavior of the generated operator!=(). so, in this change, all `operator!=` are removed. in addition to the generated operator!=, C++20 also brings us the defaulted operator==(), which performs a member-wise lexicographical comparison. under some circumstances, this is exactly what we need. so, in this change, if the operator==() is also implemented as a lexicographical comparison of all member variables of the class/struct in question, it is replaced with the default generated one by removing its body and marking the function as `default`. moreover, if the class happens to have other comparison operators which are implemented using lexicographical comparison, the default generated `operator<=>` is used in place of the defaulted `operator==`. sometimes, we failed to mark the operator== with the `const` specifier; in this change, to fulfill the requirements of the C++ standard, and to be more correct, the `const` specifier is added. also, to generate the defaulted operator==, the operand should be `const class_name&`, but this is not always the case: in the class `version`, we use `version` as the parameter type. to fulfill the requirements of the C++ standard, the parameter type is changed to `const version&` instead. this does not change the semantics of the comparison operator, and it is a more idiomatic way to pass a non-trivial struct as a function parameter. please note, because in C++20 both operator== and operator<=> are symmetric, some of the operators in `multiprecision` are removed: they are the symmetric form of another variant. if they were not removed, the compiler would, for instance, find an ambiguous overloaded operator '=='. this change is a cleanup to modernize the code base with C++20 features. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #13687	2023-04-27 10:24:46 +03:00
Kefu Chai	a2aa133822	treewide: use std::lexicographical_compare_three_way the standard library offers `std::lexicographical_compare_three_way()`, and we never use the last two additional parameters which are not provided by `std::lexicographical_compare_three_way()`, so there is no need to have a homebrew version of a trichotomic compare function. in this change, * all occurrences of `lexicographical_tri_compare()` are replaced with `std::lexicographical_compare_three_way()`. * `lexicographical_tri_compare()` is dropped. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #13615	2023-04-21 14:28:18 +03:00
Kefu Chai	6bb32efac0	utils: big_decimal: replace compare() with <=> operator now that we are using C++20, it'd be more convenient if we could use the <=> operator for comparing. once the <=> operator is defined, the compiler derives <, <=, > and >= from it, so the code is more compact. in this change, `big_decimal::compare()` is replaced with `operator<=>`, and its caller is updated accordingly. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-04-15 12:52:30 +08:00
Nadav Har'El	d26bb8c12d	Merge 'tree: migrate from std::regex to boost::regex' from Botond Dénes Except for where usage of `std::regex` is required by 3rd party library interfaces. As demonstrated countless times, std::regex's practice of using recursion for pattern matching can result in stack overflow, especially on AARCH64. The most recent incident happened after merging https://github.com/scylladb/scylladb/pull/13075, which (indirectly) uses `sstables::make_entry_descriptor()` to test whether a certain path is a valid scylla table path in a trial-and-error manner. This resulted in stacks blowing up in AARCH64. To prevent this, use the already tried and tested method of switching from `std::regex` to `boost::regex`. Don't wait until each of the `std::regex` sites explode, replace them all preemptively. Refs: https://github.com/scylladb/scylladb/issues/13404 Closes #13452 * github.com:scylladb/scylladb: test: s/std::regex/boost::regex/ utils: s/std::regex/boost::regex/ db/commitlog: s/std::regex/boost::regex/ types: s/std::regex/boost::regex/ index: s/std::regex/boost::regex/ duration.cc: s/std::regex/boost::regex/ cql3: s/std::regex/boost::regex/ thrift: s/std::regex/boost::regex/ sstables: use s/std::regex/boost::regex/	2023-04-09 18:47:41 +03:00
Botond Dénes	712889c99f	types: s/std::regex/boost::regex/ The former is prone to producing stack overflows, as it uses recursion in its match implementation. The migration is mechanical for the most part. escape() needs some special treatment: it looks like boost::regex wants a double-escaped backslash.	2023-04-06 09:50:45 -04:00
Botond Dénes	00f06522c2	types/user: add get_name() accessor For the raw name (bytes).	2023-03-27 01:44:00 -04:00
Kefu Chai	e796525f23	types: remove unused header <iterator> was introduced back in `1cf02cb9d8`, but lexicographical_compare.hh was extracted in `bdfc0aa748`. since we don't have any users of <iterator> in types.hh anymore, let's remove it. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #13327	2023-03-26 16:55:16 +03:00
Avi Kivity	bdfc0aa748	utils, types, test: extract lexicographical compare utilities UUID_test uses lexicographical_compare from the types module. This is a layering violation, since UUIDs are at a much lower level than the database type system. In practical terms, this causes link failures with gcc due to some thread-local-storage variables defined in types.hh but not provided by any object, since we don't link with types.o in this test. Fix by extracting the relevant functions into a new header.	2023-03-21 15:42:53 +02:00
Kefu Chai	c37f4e5252	treewide: use fmt::join() when appropriate now that fmtlib provides fmt::join(), there is no need to reinvent the wheel; see https://fmt.dev/latest/api.html#_CPPv4I0EN3fmt4joinE9join_viewIN6detail10iterator_tI5RangeEEN6detail10sentinel_tI5RangeEEERR5Range11string_view. so in this change, the homebrew join() is replaced with fmt::join(). as fmt::join() returns a join_view, this could improve the performance under certain circumstances where the fully materialized string is not needed. please note, the goal of this change is to use fmt::join(), and this change does not intend to improve the performance of the existing implementation based on "operator<<" unless the new implementation is much more complicated. we will address the unnecessarily materialized strings in a follow-up commit. some noteworthy things related to this change: * unlike the existing `join()`, `fmt::join()` returns a view. so we have to materialize the view if what we expect is a `sstring` * `fmt::format()` does not accept a view, so we cannot pass the return value of `fmt::join()` to `fmt::format()` * fmtlib does not format a typed pointer, i.e., it does not format, for instance, a `const std::string*`. but operator<<() always prints a typed pointer. so if we want to format a typed pointer, we either need to cast the pointer to `void*` or use `fmt::ptr()`. * fmtlib is not able to pick up the overload of `operator<<(std::ostream& os, const column_definition* cd)`, so we have to use a wrapper class of `maybe_column_definition` for printing a pointer to `column_definition`. since the overload is only used by the two overloads of `statement_restrictions::add_single_column_parition_key_restriction()`, the operator<< for `const column_definition*` is dropped. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-16 20:34:18 +08:00
Avi Kivity	6aa91c13c5	Merge 'Optimize topology::compare_endpoints' from Benny Halevy The code for compare_endpoints originates at the dawn of time (`bc034aeaec`) and is called on the fast path from storage_proxy via `sort_by_proximity`. This series considerably reduces the function's footprint by: 1. carefully coding the many comparisons in the function so as to reduce the number of conditional branches (apparently the compiler isn't doing a good enough job at optimizing it in this case) 2. avoiding sstring copies in topology::get_{datacenter,rack} Closes #12761 * github.com:scylladb/scylladb: topology: optimize compare_endpoints to_string: add print operators for std::{weak,partial}_ordering utils: to_sstring: deinline std::strong_ordering print operator move to_string.hh to utils/ test: network_topology: add test_topology_compare_endpoints	2023-03-07 15:17:19 +02:00
Avi Kivity	3042deb930	types: reimplement in terms of a variable template data_type_for() is a function template that converts a C++ type to a database dynamic type (data_type object). Instead of implementing a function per type, implement a variable template instance. This is shorter and nicer. Since the original type variables (e.g. long_type) are defined separately, use a reference instead of copying to avoid initialization order problems. To catch misuses of data_type_for, the general data_type_for_v variable template maps to some unused tag type which will cause a build error when instantiated. The original motivation for this was to allow for partial specialization of data_type_for() for tuple types, but this isn't really workable since the native type for tuples is std::vector<data_value>, not std::tuple, and I only checked this after getting the work done, so this isn't helping anything; it's just a little nicer. Closes #13043	2023-03-01 11:25:39 +02:00
Botond Dénes	ef548e654d	types: unserialize_value for multiprecision_int,bool: don't read uninitialized memory Check the first fragment before dereferencing it, the fragment might be empty, in which case move to the next one. Found by running range scan tests with random schema and random data. Fixes: #12821 Fixes: #12823 Fixes: #12708 Closes #12824	2023-02-21 17:39:18 +02:00
Kefu Chai	df63e2ba27	types: move types.{cc,hh} into types they are part of the CQL type system, and are "closer" to types. let's move them into the "types" directory. the build systems are updated accordingly. the source files referencing `types.hh` were updated using the following command: ``` find . \( -name "*.cc" -o -name "*.hh" \) -exec sed -i 's/\"types.hh\"/\"types\/types.hh\"/' {} + ``` the source files under sstables include "types.hh", which is indeed the one located under "sstables", so they now include "sstables/types.hh" instead, to be more explicit. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #12926	2023-02-19 21:05:45 +02:00
Avi Kivity	69a385fd9d	Introduce schema/ module Schema related files are moved there. This excludes schema files that also interact with mutations, because the mutation module depends on the schema. Those files will have to go into a separate module. Closes #12858	2023-02-15 11:01:50 +02:00
Avi Kivity	390a0ca47b	types: allow lists with NULL Allow transient lists that contain NULL throughout the evaluation machinery. This makes it possible to evaluate things like `IF col IN (1, 2, NULL)` without hacks, once LWT conditions are converted to expressions. A few tests are relaxed to accommodate the new behavior: - cql_query_test's test_null_and_unset_in_collections is relaxed to allow `WHERE col IN ?`, with the variable bound to a list containing NULL; now it's explicitly allowed - expr_test's evaluate_bind_variable_validates_no_null_in_list was checking generic lists for NULLs, and was similarly relaxed (and renamed) - expr_test's evaluate_bind_variable_validates_null_in_lists_recursively was similarly relaxed to allow NULLs.	2023-01-18 10:38:24 +02:00
Avi Kivity	00145f9ada	test: relax NULL check test predicate When we start allowing NULL in lists in some contexts, the exact location where an error is raised (when it's disallowed) will change. To prepare for that, relax the exception check to just ensure the word NULL is there, without caring about the exact wording.	2023-01-18 10:38:24 +02:00
Avi Kivity	5f8540ecfa	cql3, types: validate listlike collections (sets, lists) for storage Lists allow NULL in some contexts (bind variables for LWT "IN ?" conditions), but not in most others. Currently, the implementation just disallows NULLs in list values, and the cases where it is allowed are hacked around. To reduce the special cases, we'll allow lists to have NULLs, and just restrict them for storage. This is similar to how scalar values can be NULL, but not when they are part of a partition key. To prepare for the transition, identify the locations where lists (and sets, which share the same storage) are stored as frozen values and add a NULL check there. Non-frozen lists already have the check. Since sets share the same format as lists, apply the same to them. No actual checks are done yet, since NULLs are impossible. This is just a stub.	2023-01-18 10:38:24 +02:00
Avi Kivity	2739ac66ed	treewide: drop cql_serialization_format Now that we don't accept cql protocol version 1 or 2, we can drop cql_serialization_format everywhere, except in the IDL (since it's part of the inter-node protocol). A few functions had duplicate versions, one with and one without a cql_serialization_format parameter. They are deduplicated. Care is taken that `partition_slice`, which communicates the cql_serialization_format across nodes, still presents a valid cql_serialization_format to other nodes when transmitting itself and rejects protocol 1 and 2 serialization format when receiving. The IDL is unchanged. One test checking the 16-bit serialization format is removed.	2023-01-03 19:54:13 +02:00
Michał Jadwiszczak	29ad5a08a8	implement `keyspace_element` interface This patch implements the `data_dictionary::keyspace_element` interface in: `keyspace_metadata`, `user_type_impl`, `user_function`, `user_aggregate` and schema.	2022-12-10 12:34:09 +01:00
Karol Baryła	aa47f4a15c	types/tuple: Use std::begin() instead of .begin() in tuple_type_impl::build_value_fragmented Using std::begin in the concept for build_value_fragmented's parameter allows building the value from an array	2022-08-14 10:29:52 +03:00
Michael Livshin	632b4e5a9a	fix "ninja dev-headers" Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>	2022-05-31 23:42:34 +03:00
cvybhu	345e89756b	cql3: Add null and unset checks in collection validation Validating a collection should ensure that there are no null or unset values inside the collection. The validation already fails in case of such values, but it does so in an ugly way. The length of a null or unset value is negative, but it is cast to an unsigned size_t. Then it tries to read a really large value and fails with a marshalling error. The new checks are a better way to handle this. Signed-off-by: cvybhu <jan.ciolek@scylladb.com>	2022-05-18 11:05:14 +02:00
Piotr Grabowski	d3673f2b29	types/map.hh: add missing const qualifiers Add missing const qualifiers in serialize_to_bytes and serialize_to_managed_bytes. Lack of those qualifiers caused a GCC compilation error: ./types/map.hh: In instantiation of ‘static bytes map_type_impl::serialize_to_bytes(const Range&) [with Range = std::map<seastar::basic_sstring<signed char, unsigned int, 31, false>, seastar::basic_sstring<signed char, unsigned int, 31, false>, serialized_compare>; bytes = seastar::basic_sstring<signed char, unsigned int, 31, false>]’: cql3/type_json.cc:138:45: required from here ./types/map.hh:72:41: error: loop variable ‘elem’ of type ‘const std::pair<seastar::basic_sstring<signed char, unsigned int, 31, false>, seastar::basic_sstring<signed char, unsigned int, 31, false> >&’ binds to a temporary constructed from type ‘const std::pair<const seastar::basic_sstring<signed char, unsigned int, 31, false>, seastar::basic_sstring<signed char, unsigned int, 31, false> >’ [-Werror=range-loop-construct] 72 | for (const std::pair<bytes, bytes>& elem : map_range) { | ^~~~ ./types/map.hh:72:41: note: use non-reference type ‘const std::pair<seastar::basic_sstring<signed char, unsigned int, 31, false>, seastar::basic_sstring<signed char, unsigned int, 31, false> >’ to make the copy explicit or ‘const std::pair<const seastar::basic_sstring<signed char, unsigned int, 31, false>, seastar::basic_sstring<signed char, unsigned int, 31, false> >&’ to prevent copying Adding those const qualifiers there is correct, as the definition of those functions specifies that the range is of std::pair<const bytes, bytes> elements, not std::pair<bytes, bytes> (before the change): requires std::convertible_to<std::ranges::range_value_t<Range>, std::pair<const bytes, bytes>> Note that there are some GCC compilation problems still left apart from this one. Closes #10157	2022-03-03 14:24:05 +02:00
Avi Kivity	fcb8d040e8	treewide: use Software Package Data Exchange (SPDX) license identifiers Instead of lengthy blurbs, switch to single-line, machine-readable standardized (https://spdx.dev) license identifiers. The Linux kernel switched long ago, so there is strong precedent. Three cases are handled: AGPL-only, Apache-only, and dual licensed. For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0), reasoning that our changes are extensive enough to apply our license. The changes were applied mechanically with a script, except to licenses/README.md. Closes #9937	2022-01-18 12:15:18 +01:00
Jan Ciolek	e5391f1eed	types: Add map_type_impl::serialize(range of <bytes, bytes>) Adds two functions that take a range over pairs of serialized values and return a serialized map value. There are 2 functions - one operating on bytes and one operating on managed_bytes. The managed_bytes version is used in expression.cc, where it used to be a local static function. The bytes version will be used in type_json.cc in the next commit. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2021-10-28 15:14:52 +02:00
Jan Ciolek	e9f24edc9b	cql3: types: Optimize abstract_type::contains_collection contains_collection() and contains_set_or_map() used to be calculated on each call. Now the result is calculated only once during type creation. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2021-09-24 13:45:38 +02:00
Avi Kivity	e52ebe2da5	types: convert abstract_type::compare and related to std::strong_ordering Change comparators around types to std::strong_ordering. Ref #1449.	2021-07-28 13:19:24 +03:00
Avi Kivity	a55b434a2b	treewide: extend copyright statements to present day	2021-06-06 19:18:49 +03:00
Pavel Solodovnikov	e0749d6264	treewide: some random header cleanups Eliminate not used includes and replace some more includes with forward declarations where appropriate. Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2021-06-06 19:18:49 +03:00
Pavel Solodovnikov	fff7ef1fc2	treewide: reduce boost headers usage in scylla header files `dev-headers` target is also ensured to build successfully. Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2021-05-20 01:33:18 +03:00
Avi Kivity	14a4173f50	treewide: make headers self-sufficient In preparation for some large header changes, fix up any headers that aren't self-sufficient by adding needed includes or forward declarations.	2021-04-20 21:23:00 +03:00
Michał Chojnowski	ba53c85829	cdc: log: rewrite collection merge to use managed_bytes instead of bytes	2021-04-08 10:16:21 +02:00
Michał Chojnowski	42acdc4d09	cdc: log: don't linearize collections in get_preimage_col_value	2021-04-08 10:16:21 +02:00
Michał Chojnowski	472f0eb932	types: collection: remove an unused version of pack_fragmented It was made unused by previous patches in this series.	2021-04-01 10:44:21 +02:00
Michał Chojnowski	0bb959e890	cql3: don't linearize elements of lists, tuples, and user types This patch switches the type used to store collection elements inside the intermediate form used in lists::value, tuples::value etc. from bytes to managed_bytes. After this patch, tuple and list elements are only linearized in from_serialized, which will be corrected soon. This commit introduces some additional copies in expression.cc, which will be dealt with in a future commit.	2021-04-01 10:44:21 +02:00
Michał Chojnowski	aab9509775	types: collection: add versions of pack for fragmented buffers We will need them to port the representation of collection types in cql3/ from bytes to managed_bytes. The version which takes an iterator of `bytes` as an argument will be removed after that transition is complete.	2021-04-01 10:44:21 +02:00
Michał Chojnowski	3387d43a34	cql3: tuples, user_types: avoid linearization in from_serialized() and get() Deserialize from raw_value_view without linearizing and output managed_bytes instead of bytes.	2021-04-01 10:44:20 +02:00
Michał Chojnowski	a10a82da30	types: tuple: add build_value_fragmented A version of build_value which produces fragmented output. We will use it to avoid linearization in tuples::value and user_types::value.	2021-04-01 10:42:07 +02:00
Wojciech Mitros	f57fa935a2	types: remove linearization from abstract_type::compare To avoid high latencies caused by large contiguous allocations needed by linearizing, work on fragmented buffers instead. Signed-off-by: Wojciech Mitros <wojciech.mitros@scylladb.com>	2021-03-31 06:35:10 +02:00
Wojciech Mitros	daa31be37f	types: replace buffers in tuple_deserializing_iterator with fragmented ones In preparation for removing linearization from abstract_type::compare, add options to avoid linearization in tuple_deserializing_iterator. Signed-off-by: Wojciech Mitros <wojciech.mitros@scylladb.com>	2021-03-31 06:35:09 +02:00
Wojciech Mitros	823d4c7529	types: make tuple_type_impl::split work with any FragmentedViews We may want to store a tuple in a fragmented buffer. To split it into a vector of optional bytes, tuple_type_impl::split can be used. To split a contiguous buffer (bytes_view), simply pass single_fragmented_view(bytes_view). Signed-off-by: Wojciech Mitros <wojciech.mitros@scylladb.com>	2021-03-31 06:34:37 +02:00
Wojciech Mitros	b152dc8c86	types: move read_collection_size/value specialization to header file The template method needs to be specialized in each file that is using it. To avoid rewriting the specialization into multiple files, move it to the header file. Signed-off-by: Wojciech Mitros <wojciech.mitros@scylladb.com>	2021-03-25 12:18:38 +01:00
Calle Wilund	bba43ce31a	listlike_partial_deserializing_iterator: expose templated collection routines To allow using fragmented types as input.	2021-03-03 10:19:46 +00:00
Botond Dénes	ba7a9d2ac3	imr: switch back to open-coded description of structures Commit `aab6b0ee27` introduced the controversial new IMR format, which relied on a very template-heavy infrastructure to generate serialization and deserialization code via template meta-programming. The promise was that this new format, beyond solving the problems the previous open-coded representation had (working on linearized buffers), will speed up migrating other components to this IMR format, as the IMR infrastructure reduces code bloat, makes the code more readable via declarative type descriptions as well as safer. However, the results were almost the opposite. The template meta-programming used by the IMR infrastructure proved very hard to understand. Developers don't want to read or modify it. Maintainers don't want to see it being used anywhere else. In short, nobody wants to touch it. This commit does a conceptual revert of `aab6b0ee27`. A verbatim revert is not possible because related code evolved a lot since the merge. Also, going back to the previous code would mean we regress as we'd revert the move to fragmented buffers. So this revert is only conceptual, it changes the underlying infrastructure back to the previous open-coded one, but keeps the fragmented buffers, as well as the interface of the related components (to the extent possible). Fixes: #5578	2021-02-16 23:43:07 +01:00
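The read_nth_tuple_element() commit (5fce4d9675) describes walking the serialized fields of a tuple and returning only the requested one, with no allocations. A minimal C++ sketch of that idea follows, assuming the CQL encoding in which each element carries a 4-byte big-endian signed length prefix and a negative length marks NULL. `read_nth_element()` and `read_length_prefix()` are hypothetical names, not ScyllaDB's API, and a contiguous `std::string_view` stands in for Scylla's (possibly fragmented) buffers; in this sketch an out-of-range index is treated as an error.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <string_view>

// Reads a 4-byte big-endian signed length prefix and advances `in` past it.
inline std::int32_t read_length_prefix(std::string_view& in) {
    if (in.size() < 4) {
        throw std::out_of_range("truncated length prefix");
    }
    std::uint32_t v = 0;
    for (std::size_t i = 0; i < 4; ++i) {
        v = (v << 8) | static_cast<unsigned char>(in[i]);
    }
    in.remove_prefix(4);
    return static_cast<std::int32_t>(v);
}

// Hypothetical stand-in for read_nth_tuple_element(): skips over the first n
// serialized elements and returns a view of the n-th one, or std::nullopt if
// that element is NULL. Nothing is copied and nothing is allocated.
inline std::optional<std::string_view>
read_nth_element(std::string_view serialized, std::size_t n) {
    for (std::size_t i = 0; !serialized.empty(); ++i) {
        const std::int32_t len = read_length_prefix(serialized);
        if (len < 0) {                      // negative length encodes NULL
            if (i == n) {
                return std::nullopt;
            }
            continue;                       // NULL has no payload to skip
        }
        const auto size = static_cast<std::size_t>(len);
        if (size > serialized.size()) {
            throw std::out_of_range("truncated element");
        }
        if (i == n) {
            return serialized.substr(0, size);
        }
        serialized.remove_prefix(size);     // skip an element before the n-th
    }
    throw std::out_of_range("tuple has fewer elements than requested");
}
```

Because the result is a view into the input, the caller has to keep the serialized buffer alive while using it. UDT values use the same per-field layout in CQL, so the same walk applies to the read_nth_user_type_field() case (ab1ba497b5).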

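The collection-validation commit (345e89756b) explains that a negative length, once cast to an unsigned size_t, turns into an enormous read that fails with an unhelpful marshalling error. A minimal sketch of checking the sign first, assuming the CQL collection layout of a 4-byte element count followed by length-prefixed elements, and the native-protocol convention that a length of -1 denotes null and -2 denotes unset. `validate_list()`, `read_int32()` and `marshal_error` are illustrative names only, not ScyllaDB's code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string_view>

// Illustrative error type for malformed serialized values.
struct marshal_error : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Reads a 4-byte big-endian signed integer and advances `in` past it.
inline std::int32_t read_int32(std::string_view& in) {
    if (in.size() < 4) {
        throw marshal_error("truncated integer");
    }
    std::uint32_t v = 0;
    for (std::size_t i = 0; i < 4; ++i) {
        v = (v << 8) | static_cast<unsigned char>(in[i]);
    }
    in.remove_prefix(4);
    return static_cast<std::int32_t>(v);
}

// Validates a serialized list: an element count followed by length-prefixed
// elements. Negative lengths are rejected before being used as sizes.
inline void validate_list(std::string_view in) {
    const std::int32_t count = read_int32(in);
    if (count < 0) {
        throw marshal_error("negative element count");
    }
    for (std::int32_t i = 0; i < count; ++i) {
        const std::int32_t len = read_int32(in);
        if (len == -1) {
            throw marshal_error("null is not allowed inside a collection");
        }
        if (len == -2) {
            throw marshal_error("unset value is not allowed inside a collection");
        }
        if (len < 0) {
            throw marshal_error("invalid negative element length");
        }
        const auto size = static_cast<std::size_t>(len);  // safe: len >= 0
        if (size > in.size()) {
            throw marshal_error("truncated element");
        }
        in.remove_prefix(size);
    }
    if (!in.empty()) {
        throw marshal_error("trailing bytes after the last element");
    }
}
```

Since the checks run before the cast, the error names the actual problem instead of reporting an impossible read length.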

127 Commits
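The variable-template commit (3042deb930) replaces one function per type with specializations of a `data_type_for_v` variable template bound by reference to separately defined type objects. A rough sketch, with a hypothetical `type_object` standing in for ScyllaDB's `data_type` and illustrative CQL type names. One deliberate difference: the commit maps the general case to an unused tag type, whereas this sketch makes the primary template refer to an intentionally undefined helper, which likewise turns a lookup for an unmapped type into a build error.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for a database type object.
struct type_object {
    std::string name;
};

// Type objects defined separately, in the spirit of long_type and friends.
inline const type_object long_type{"bigint"};
inline const type_object int32_type{"int"};
inline const type_object utf8_type{"text"};

// Intentionally never defined: naming ::value on it fails to compile.
template <typename T>
struct unsupported_native_type;

// Primary variable template: only instantiated for unmapped C++ types,
// which is exactly when compilation should fail.
template <typename T>
inline const type_object& data_type_for_v = unsupported_native_type<T>::value;

// One explicit specialization per supported type. Each one is a reference,
// so the type object is not copied, and binding the reference does not
// depend on the type object having been initialized first.
template <>
inline const type_object& data_type_for_v<std::int64_t> = long_type;
template <>
inline const type_object& data_type_for_v<std::int32_t> = int32_type;
template <>
inline const type_object& data_type_for_v<std::string> = utf8_type;

// Function-template interface kept on top of the variable template.
template <typename T>
const type_object& data_type_for() {
    return data_type_for_v<T>;
}
```

With this sketch, `data_type_for<double>()` would not compile, since `double` has no specialization and the primary template names a member of an incomplete type.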