scylladb

Author	SHA1	Message	Date
Dawid Mędrek	ac9062644f	cql3: Represent create_statement using managed_string When describing a table, we need to do it carefully: if some columns were dropped, we must specify that explicitly by ``` ALTER TABLE {table} DROP {column} USING TIMESTAMP ... ``` in the result of the DESCRIBE statement. Failing to do so could lead to data resurrection. However, if a table has been altered many, many times, we might end up with a huge create statement. Constructing it could, in turn, trigger an oversized allocation. Some tests ran into that very problem in fact. In this commit, we want to mitigate the problem: instead of allocating a contiguous chunk of memory for the create statement, we use `fragmented_ostringstream` and `managed_string` to possibly keep data scattered in memory. It makes handling `cql3::description` less convenient in the code, but since the struct is pretty much immediately serialized after creating it, it's a very good trade-off. We provide a reproducer. It consistently passes with this commit, while having about 50% chance of failure before it (based on my own experiments). Playing with the parameters of the test doesn't seem to improve that chance, so let's keep it as-is. Fixes scylladb/scylladb#24018	2025-07-01 12:58:02 +02:00
Alexander Turetskiy	3ac533251a	allow "UTC" and "GMT" in string format of timestamp Fix a problem with statements like: INSERT INTO tbl (pk, time) VALUES (1, '2016-09-27 16:10:00 UTC'); fixes #20501 Closes scylladb/scylladb#22426	2025-02-12 09:38:28 +02:00
Jan Łakomy	9561ae5fc8	types: implement vector_type_impl The vector is a fixed-length array of non-null specified type elements. Implement serialization, deserialization, comparison, JSON and Lua support, and other functionalities. Co-authored-by: Dawid Pawlik <501149991dp@gmail.com>	2025-01-26 19:36:41 +01:00
Avi Kivity	de8253b98a	types: explicitly instantiate map_type_impl::deserialize() The definition of the template is in a source translation unit, but there are also uses outside the translation unit. Without lto/pgo it worked due to the definition in the translation unit, but with lto/pgo we can presume the definition was inlined, so callers outside the translation unit did not have anything to link with. Fix by explicitly instantiating the template function. Closes scylladb/scylladb#22136	2025-01-08 11:52:11 +02:00
Avi Kivity	f3eade2f62	treewide: relicense to ScyllaDB-Source-Available-1.0 Drop the AGPL license in favor of a source-available license. See the blog post [1] for details. [1] https://www.scylladb.com/2024/12/18/why-were-moving-to-a-source-available-license/	2024-12-18 17:45:13 +02:00
Nadav Har'El	766ee56536	type: change from_sstring() to from_string_view() All CQL type implementations have a from_sstring(sstring_view) method. The "sstring_view" type is just a historic alias for std::string_view, so this patch switches to the standard type as suggested in #4062, and also renames these functions from_string_view() to emphasize that they can take any string view, not necessarily a "sstring" as their old name suggested. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2024-11-18 15:33:04 +02:00
Kefu Chai	00810e6a01	treewide: include seastar/core/format.hh instead of seastar/core/print.hh The latter includes the former: in addition to `seastar::format()`, `print.hh` also provides helpers like `seastar::fprint()` and `seastar::print()`, which are deprecated and not used by scylladb. Previously, we included `seastar/core/print.hh` for using `seastar::format()`, and in seastar 5b04939e, we extracted `seastar::format()` into `seastar/core/format.hh`. this allows us to include a much smaller header. In this change, we just include `seastar/core/format.hh` in place of `seastar/core/print.hh`. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#21574	2024-11-14 17:45:07 +02:00
Avi Kivity	1bbd1436b4	types: move from boost ranges to standard ranges Reduce dependency load. tuple_deserializing_iterator gained a default constructor so it matches iterator constraints. Closes scylladb/scylladb#21029	2024-10-18 11:00:49 +03:00
Dawid Mędrek	b357307406	data_dictionary: Remove keyspace_element.hh The interface is not used anywhere anymore, so we can remove it safely. It has been replaced by custom functions for each keyspace element and `cql3::description`.	2024-09-20 14:24:54 +02:00
Dawid Mędrek	df94e92b06	treewide: Fix indentation in describe functions After modifying new functions for generating `cql3::description`, we fix indentation in them in this commit.	2024-09-20 14:24:54 +02:00
Dawid Mędrek	86722e4cea	treewide: Return create statement optionally in describe functions We add a new parameter in functions used to generate instances of `cql3::description` for types related to situations where we might not need a create statement. An example of such a scenario could be `DESCRIBE TYPES`.	2024-09-20 14:24:54 +02:00
Dawid Mędrek	0702e93e32	treewide: Add new describe overloads to implementations of data_dictionary::keyspace_element We're removing `data_dictionary::keyspace_element`. Before we can do that, we need to substitute the existing methods used for describing keyspace elements with their new versions returning `cql3::description`. That's what happens in this commit.	2024-09-20 14:24:53 +02:00
Dawid Mędrek	35a92d189e	types: Introduce a function `cql3_type_name_without_frozen()` The introduced function returns the actual name of the type represented by `abstract_type`. It circumvents name processing like wrapping a type within `frozen<>` or using Cassandra's syntax. We add the function to be able to describe UDFs in the upcoming commits that require that their arguments not be `frozen<>`. We also test the implementation.	2024-09-20 14:24:53 +02:00
Kefu Chai	3e84d43f93	treewide: use seastar::format() or fmt::format() explicitly before this change, we relied on `using namespace seastar` to use `seastar::format()` without qualifying `format()` with its namespace. this worked fine until we changed the type of the format string parameter of `seastar::format()` from `const char*` to `fmt::format_string<...>`. this change practically invited `seastar::format()` to the club of `std::format()` and `fmt::format()`, all of which accept a templated parameter as their `fmt` parameter, so `seastar::format()` is not the best candidate anymore. although argument-dependent lookup (ADL for short) brings in the function from the namespace of the arguments, `using namespace` makes `seastar::format()` visible as well, so both `std::format()` and `seastar::format()` are considered as candidates. that is what happens in scylladb at quite a few call sites of `format()`, where the compiler is not able to tell which function is the winner in overload resolution: ``` /__w/scylladb/scylladb/mutation/mutation_fragment_stream_validator.cc:265:12: error: call to 'format' is ambiguous 265 | return format("{} ({}.{} {})", _name_view, s.ks_name(), s.cf_name(), s.id()); | ^~~~~~ /usr/bin/../lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/format:4290:5: note: candidate function [with _Args = <const std::basic_string_view<char> &, const seastar::basic_sstring<char, unsigned int, 15> &, const seastar::basic_sstring<char, unsigned int, 15> &, const utils::tagged_uuid<table_id_tag> &>] 4290 | format(format_string<_Args...> __fmt, _Args&&... __args) | ^ /__w/scylladb/scylladb/seastar/include/seastar/core/print.hh:143:1: note: candidate function [with A = <const std::basic_string_view<char> &, const seastar::basic_sstring<char, unsigned int, 15> &, const seastar::basic_sstring<char, unsigned int, 15> &, const utils::tagged_uuid<table_id_tag> &>] 143 | format(fmt::format_string<A...> fmt, A&&... a) { | ^ ``` in this change, we change all `format()` calls to either `fmt::format()` or `seastar::format()` with the following rules: - if the caller expects an `sstring` or `std::string_view`, change to `seastar::format()` - if the caller expects a `std::string`, change to `fmt::format()`, because `sstring::operator std::basic_string` would incur a deep copy. we will need another change to enable scylladb to compile with the latest seastar, namely, to pass the format string as a templated parameter down to helper functions which format their parameters. to minimize the scope of this change, let's include that change when bumping up the seastar submodule, as that change will depend on the seastar change. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-09-11 23:21:40 +03:00
Avi Kivity	aa1270a00c	treewide: change assert() to SCYLLA_ASSERT() assert() is traditionally disabled in release builds, but not in scylladb. This hasn't caused problems so far, but the latest abseil release includes a commit [1] that causes a 1000 insn/op regression when NDEBUG is not defined. Clearly, we must move towards a build system where NDEBUG is defined in release builds. But we can't just define it blindly without vetting all the assert() calls, as some were written with the expectation that they are enabled in release mode. To solve the conundrum, change all assert() calls to a new SCYLLA_ASSERT() macro in utils/assert.hh. This macro is always defined and is not conditional on NDEBUG, so we can later (after vetting Seastar) enable NDEBUG in release mode. [1] `66ef711d68` Closes scylladb/scylladb#20006	2024-08-05 08:23:35 +03:00
Kefu Chai	7b10cc8079	treewide: include seastar headers with brackets this change was created in the same spirit as `ebff5f5d`. although we include Seastar as a submodule, Seastar is not a part of the scylla project, so we'd better include its headers using brackets. `ebff5f5d` addressed this cosmetic issue a while back, but clangd's header-insertion probably helped some contributors insert the missing headers with `"`, so this style of `include` returned to the tree with these new changes. unfortunately, clangd does not allow us to configure the style of `include` at the time of writing. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19406	2024-06-21 19:20:27 +03:00
Kefu Chai	fd0de02b81	types: remove unused operator<< since we've switched almost all callers of the operator<< to {fmt}, let's drop the unused operator<<:s. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-06-18 15:55:22 +08:00
Michał Jadwiszczak	8157d260f2	types: add a method to get all referenced user types The method allows collecting all UDTs used to create a type. This is required to sort UDTs in a topological order.	2024-05-16 13:30:03 +02:00
Kefu Chai	e2d5054c53	types: do not include unused headers these unused includes were identified by clangd. see https://clangd.llvm.org/guides/include-cleaner#unused-include-warning for more details on the "Unused include" warning. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18326	2024-04-23 12:08:23 +03:00
Kefu Chai	1b859e484f	treewide: use fmt::to_string() to transform a UUID to std::string without `FMT_DEPRECATED_OSTREAM` macro, `UUID::to_sstring()` is implemented using its `fmt::formatter`, which is not available at the end of this header file where `UUID` is defined. at this moment, we still use `FMT_DEPRECATED_OSTREAM` and {fmt} v9, so we can still use `UUID::to_sstring()`, but in {fmt} v10, we cannot. so, in this change, we change all callers of `UUID::to_sstring()` to `fmt::to_string()`, so that we don't depend on `FMT_DEPRECATED_OSTREAM` and {fmt} v9 anymore. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-03-26 13:38:37 +08:00
Kurashkin Nikita	7ce9a3e9e5	cql: add limits for integer values when creating date type Added a simple check that prevents entering int values that lead to overflow when creating a date type. Fixes #17066 Closes scylladb/scylladb#17102	2024-02-08 00:08:01 +02:00
Botond Dénes	53a11cba62	Merge 'types/types.cc: move stringstream content instead of copying it' from Patryk Wróbel C++20 introduced a new overload of std::ostringstream::str() that is selected when the member function is called on an r-value. The new overload returns a string that is move-constructed from the underlying string instead of being copy-constructed. This change applies std::move() on stringstream objects before calling the str() member function to avoid copying the underlying buffer. It also removes the helper function `inet_addr_type_impl::to_sstring()`, which was used in only two places; it was replaced with `fmt::to_string()`. Closes scylladb/scylladb#16991 * github.com:scylladb/scylladb: use fmt::to_string() for seastar::net::inet_address types/types.cc: move stringstream content instead of copying it	2024-02-06 13:11:41 +02:00
Kefu Chai	6f07d9edaa	types: use {fmt} to format boolean {fmt} has formatted booleans as "true" / "false" since v2.0.1, so there is no need to reinvent the wheel. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-02-06 10:40:02 +08:00
Kefu Chai	be29556955	types: use {fmt} to format time so we can tighten our dependencies a little bit. there are only three places where we are using the `date` library. the outputs of these two ways are identical: see https://wandbox.org/permlink/Lo9NUrQNUEqyiMEa and https://godbolt.org/z/YEha9ah7v to compare their outputs. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-02-06 10:39:30 +08:00
Patryk Wrobel	cc186c1798	use fmt::to_string() for seastar::net::inet_address This change removes inet_addr_type_impl::to_sstring() and replaces its usages with fmt::to_string(). The removed helper performed an unneeded copy via std::ostringstream::str(). Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>	2024-02-05 16:56:40 +01:00
Patryk Wrobel	8c0d30cd88	types/types.cc: move stringstream content instead of copying it C++20 introduced a new overload of std::ostringstream::str() that is selected when the member function is called on an r-value. The new overload returns a string that is move-constructed from the underlying string instead of being copy-constructed. This change applies std::move() on stringstream objects before calling the str() member function to avoid copying the underlying buffer. Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>	2024-02-05 16:35:27 +01:00
Kefu Chai	f5d1836a45	types: fix indent `f344e130` failed to get the indent right, so fix it. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#16834	2024-01-18 09:14:39 +02:00
Kefu Chai	f344e13066	types: add formatter for data_value before this change, we relied on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter. in this change, we define a formatter for data_value, but its operator<<() is preserved as we are still using the generic homebrew formatter for formatting std::vector, which in turn uses operator<< of the element type. Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#16767	2024-01-15 13:18:23 +02:00
Lakshmi Narayanan Sreethar	cd9e027047	types: fix ambiguity in align_up call Compilation fails with recent boost versions (>=1.79.0) due to an ambiguity with the align_up function call. Fix that by adding type inference to the function call. Fixes #16746 Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com> Closes scylladb/scylladb#16747	2024-01-12 10:50:31 +02:00
Kefu Chai	80c656a08b	types: use more readable error message when serializing non-ASCII string before this change, we print marshaling error: Value not compatible with type org.apache.cassandra.db.marshal.AsciiType: '...' but the wording is not quite user friendly, it is a mapping of the underlying implementation, user would have difficulty understanding "marshaling" and/or "org.apache.cassandra.db.marshal.AsciiType" when reading this error message. so, in this change 1. change the error message to: Invalid ASCII character in string literal: '...' which should be more straightforward, and easier to digest. 2. update the test accordingly please note, the quoted non-ASCII string is preserved instead of being printed in hex, as otherwise user would not be able to map it with his/her input. Refs #14320 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#15678	2023-10-20 09:25:44 +03:00
Raphael S. Carvalho	2a81b2e49a	types: Avoid unneeded copy in simple_date_type_impl::from_sstring() Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes scylladb/scylladb#15645	2023-10-06 11:05:27 +03:00
Nadav Har'El	d9c2cd3024	cql: implement missing type functions for "counters" type types.cc had eight of its functions unimplemented for the "counters" types, throwing an "unimplemented::cause::COUNTERS" when used. A ninth function (validate) was unimplemented for counters but did not even throw. Many code paths did not use any of these functions so didn't care, but some do - e.g., the silly do-nothing "SELECT CAST(c AS counter)" when c is already a counter column, which causes this operation to fail. When the types.cc code encounters a counter value, it is (if I understand it correctly) already a single uint64_t ("long_type") value, so we fall back to the long_type implementation of all the functions. To avoid mistakes, I simply copied the reversed_type implementation for all these functions - whereas the reversed_type implementation falls back to using the underlying type, the counter_type implementation always falls back to long_type. After this patch, "SELECT CAST(c AS counter)" for a counter column works. We'll introduce a test that verifies this (and other things) in a later patch in this series. The following patches will also need more of these functions to be implemented correctly (e.g., blobascounter() fails to validate the size of the input blob if the validate function isn't implemented for the counter type). Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2023-07-30 20:16:25 +03:00
Alexey Novikov	ff721ec3e3	make timestamp string format cassandra compatible when we convert a timestamp into a string, it must look like: '2017-12-27T11:57:42.500Z'. this concerns any conversion except the JSON timestamp format: a JSON string uses a space as the time separator and must look like: '2017-12-27 11:57:42.500Z'. both formats always contain milliseconds and a timezone specification. Fixes #14518 Fixes #7997 Closes #14726	2023-07-27 12:01:09 +03:00
Kefu Chai	bab16eb30e	treewide: remove #includes not use directly for faster build times and clear inter-module dependencies, we should not #include headers that are not directly used. instead, we should only #include the headers directly used by a certain compilation unit. in this change, the source files under the "/compaction" directory are checked using clangd, which identifies the cases where we have an #include which is not directly used. all the #includes identified by clangd are removed. because some source files rely on the incorrectly included header file, those are updated to #include the header file they directly use. if a forward declaration suffices, the declaration is added instead. see also https://clangd.llvm.org/guides/include-cleaner#unused-include-warning Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-07-18 17:36:31 +08:00
Jan Ciolek	464437ef90	types/user: modify idx_of_field to use bytes_view Let's change the argument type from `bytes` to `bytes_view`. Sometimes it's possible to get an instance of `bytes_view`, but getting `bytes` would require a copy, which is wasteful. `bytes_view` allows us to avoid copies. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2023-06-16 01:11:31 +02:00
Jan Ciolek	ab1ba497b5	types: add read_nth_user_type_field() Add a function which can be used to read the nth field of a serialized UDT value. We could deserialize the whole value and then choose one of the deserialized fields, but that would be wasteful. Sometimes we only need the value of one field, not all of them. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2023-06-16 01:11:30 +02:00
Jan Ciolek	5fce4d9675	types: add read_nth_tuple_element() Add a function which retrieves the value of the nth field from a serialized tuple value. I tried to make it as efficient as possible. Other functions, like evaluate(subscript), tend to deserialize the whole structure and put all of its elements in a vector, then select a single element from this vector. This is wasteful, as we only need a single element's value. This function goes over the serialized fields and directly returns the one that is needed. No allocations are needed. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2023-06-14 07:22:39 +02:00
Avi Kivity	d3e9fd49a3	types: abstract_type: add mixed-type versions of compare() and equal() compare() and equal() can compare two unfragmented values or two fragmented values, but a mix of a fragmented value and an unfragmented value runs afoul of C++ conversion rules. Add more overloads to make it simpler for users.	2023-05-07 17:17:36 +03:00
Benny Halevy	935ff0fcbb	types: timestamp_from_string: print current_exception on error We may catch exceptions that are not `marshal_exception`. Print std::current_exception() in this case to provide some context about the marshalling error. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes #13693	2023-04-27 22:30:55 +03:00
Kefu Chai	a2aa133822	treewide: use std::lexicographical_compare_three_way the standard library offers `std::lexicographical_compare_three_way()`, and we never use the last two additional parameters, which are not provided by `std::lexicographical_compare_three_way()`, so there is no need to have a homebrew version of a trichotomic compare function. in this change, * all occurrences of `lexicographical_tri_compare()` are replaced with `std::lexicographical_compare_three_way()`. * `lexicographical_tri_compare()` is dropped. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #13615	2023-04-21 14:28:18 +03:00
Kefu Chai	6bb32efac0	utils: big_decimal: replace compare() with <=> operator now that we are using C++20, it'd be more convenient to use the <=> operator for comparing. the compiler rewrites the relational operators (<, <=, >, >=) in terms of <=> (and == / != as well if <=> is defaulted), so the code is more compact. in this change, `big_decimal::compare()` is replaced with `operator<=>`, and its caller is updated accordingly. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-04-15 12:52:30 +08:00
Botond Dénes	712889c99f	types: s/std::regex/boost::regex/ The former is prone to stack overflow as it uses recursion in its match implementation. The migration is, for the most part, entirely mechanical. escape() needs some special treatment: it looks like boost::regex wants a double-escaped backslash.	2023-04-06 09:50:45 -04:00
Kefu Chai	c37f4e5252	treewide: use fmt::join() when appropriate now that fmtlib provides fmt::join() (see https://fmt.dev/latest/api.html#_CPPv4I0EN3fmt4joinE9join_viewIN6detail10iterator_tI5RangeEEN6detail10sentinel_tI5RangeEEERR5Range11string_view), there is no need to reinvent the wheel. so in this change, the homebrew join() is replaced with fmt::join(). as fmt::join() returns a join_view, this could improve the performance under certain circumstances where the fully materialized string is not needed. please note, the goal of this change is to use fmt::join(), and this change does not intend to improve the performance of the existing implementation based on "operator<<" unless the new implementation is much more complicated. we will address the unnecessarily materialized strings in a follow-up commit. some noteworthy things related to this change: * unlike the existing `join()`, `fmt::join()` returns a view, so we have to materialize the view if what we expect is a `sstring` * `fmt::format()` does not accept a view, so we cannot pass the return value of `fmt::join()` to `fmt::format()` * fmtlib does not format a typed pointer, i.e., it does not format, for instance, a `const std::string*`, but operator<<() always prints a typed pointer. so if we want to format a typed pointer, we either need to cast the pointer to `void*` or use `fmt::ptr()`. * fmtlib is not able to pick up the overload of `operator<<(std::ostream& os, const column_definition* cd)`, so we have to use a wrapper class of `maybe_column_definition` for printing a pointer to `column_definition`. since the overload is only used by the two overloads of `statement_restrictions::add_single_column_parition_key_restriction()`, the operator<< for `const column_definition*` is dropped. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-16 20:34:18 +08:00
Botond Dénes	ef548e654d	types: unserialize_value for multiprecision_int,bool: don't read uninitialized memory Check the first fragment before dereferencing it, the fragment might be empty, in which case move to the next one. Found by running range scan tests with random schema and random data. Fixes: #12821 Fixes: #12823 Fixes: #12708 Closes #12824	2023-02-21 17:39:18 +02:00
Kefu Chai	df63e2ba27	types: move types.{cc,hh} into types they are part of the CQL type system, and are "closer" to types. let's move them into the "types" directory. the build systems are updated accordingly. the source files referencing `types.hh` were updated using the following command: ``` find . -name "*.{cc,hh}" -exec sed -i 's/\"types.hh\"/\"types\/types.hh\"/' {} + ``` the source files under sstables include "types.hh", which is indeed the one located under "sstables", so they now include "sstables/types.hh" instead, to be more explicit. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #12926	2023-02-19 21:05:45 +02:00

45 Commits
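
Commit ac9062644f replaces a single contiguous create-statement string with fragmented storage. The sketch below shows only the general idea. `fragmented_builder` and `max_fragment` are hypothetical names, not ScyllaDB's `fragmented_ostringstream` or `managed_string` API. Appended text is split across chunks of bounded size, so no single string buffer grows with the total length of the DESCRIBE output.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical illustration (not ScyllaDB's fragmented_ostringstream):
// text is appended into chunks of at most max_fragment bytes, so no single
// string buffer grows with the total size of the output.
class fragmented_builder {
public:
    explicit fragmented_builder(std::size_t max_fragment) : _max(max_fragment) {
        assert(max_fragment > 0);  // with 0, append() could never make progress
    }

    void append(std::string_view s) {
        while (!s.empty()) {
            // Start a new chunk when there is none yet or the last one is full.
            if (_fragments.empty() || _fragments.back().size() == _max) {
                _fragments.emplace_back();
                _fragments.back().reserve(_max);
            }
            std::string& tail = _fragments.back();
            std::size_t n = std::min(s.size(), _max - tail.size());
            tail.append(s.substr(0, n));
            s.remove_prefix(n);
        }
    }

    // The chunks, in order; a serializer can write them out one after another.
    const std::vector<std::string>& fragments() const { return _fragments; }

private:
    std::size_t _max;
    std::vector<std::string> _fragments;
};
```

The trade-off matches the one the commit describes. Consumers must walk the fragments instead of reading one string, which is acceptable when the value is serialized right after it is built.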
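
Commit aa1270a00c routes checks through a `SCYLLA_ASSERT()` macro that does not depend on `NDEBUG`. Its real definition lives in utils/assert.hh and does not appear in this log. The following is a minimal sketch of an always-enabled assertion; `ALWAYS_ASSERT` and `always_assert_fail` are hypothetical names.

```cpp
#include <cstdio>
#include <cstdlib>

// Reports a failed ALWAYS_ASSERT() check and aborts the process.
[[noreturn]] inline void always_assert_fail(const char* expr, const char* file,
                                            int line, const char* func) {
    std::fprintf(stderr, "%s:%d: %s: Assertion `%s' failed.\n", file, line, func, expr);
    std::abort();
}

// Unlike assert(), this macro never consults NDEBUG: the expression, including
// any side effects it has, is evaluated in every build mode.
#define ALWAYS_ASSERT(expr) \
    ((expr) ? static_cast<void>(0) : always_assert_fail(#expr, __FILE__, __LINE__, __func__))
```

The macro never expands to nothing, so code such as `ALWAYS_ASSERT(++calls == 1)` behaves the same in debug and release builds. That is why every existing `assert()` had to be converted or vetted before `NDEBUG` could safely be defined.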
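
The `std::move()` idiom from commits 8c0d30cd88 and 53a11cba62 looks like this in isolation. `int_to_string` is a hypothetical helper, not code from types.cc. With a C++20 standard library, the call selects the rvalue overload `str() &&`, which can move the buffer out. With an older library the same expression still compiles and simply copies.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <utility>

// Builds a string through an ostringstream and returns its contents.
// std::move(oss).str() selects the C++20 rvalue overload str() &&, which can
// move the internal buffer out instead of copying it.
std::string int_to_string(int value) {
    std::ostringstream oss;
    oss << "value=" << value;
    return std::move(oss).str();  // oss should not be relied on after this line
}
```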
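
Commit ff721ec3e3 specifies two textual forms for a timestamp. Both always carry milliseconds and a `Z` zone suffix; the general form separates date and time with `T`, while JSON uses a space. The sketch below produces both forms from milliseconds since the Unix epoch (UTC). `format_timestamp` is a hypothetical helper, and the date conversion uses Howard Hinnant's well-known `civil_from_days` algorithm rather than anything taken from ScyllaDB.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Converts a day count relative to 1970-01-01 into a proleptic Gregorian
// (year, month, day), following Howard Hinnant's civil_from_days algorithm.
void civil_from_days(std::int64_t z, std::int64_t& y, unsigned& m, unsigned& d) {
    z += 719468;  // shift the epoch to 0000-03-01
    const std::int64_t era = (z >= 0 ? z : z - 146096) / 146097;
    const unsigned doe = static_cast<unsigned>(z - era * 146097);                // [0, 146096]
    const unsigned yoe = (doe - doe / 1460 + doe / 36524 - doe / 146096) / 365;  // [0, 399]
    const unsigned doy = doe - (365 * yoe + yoe / 4 - yoe / 100);                // [0, 365]
    const unsigned mp = (5 * doy + 2) / 153;                                     // [0, 11], March = 0
    d = doy - (153 * mp + 2) / 5 + 1;                                            // [1, 31]
    m = mp < 10 ? mp + 3 : mp - 9;                                               // [1, 12]
    y = static_cast<std::int64_t>(yoe) + era * 400 + (m <= 2 ? 1 : 0);
}

// Formats milliseconds since the Unix epoch (UTC) as
// "YYYY-MM-DD<sep>HH:MM:SS.mmmZ"; sep is 'T' normally and ' ' for JSON.
std::string format_timestamp(std::int64_t ms, char sep) {
    constexpr std::int64_t ms_per_day = 86'400'000;
    std::int64_t days = ms / ms_per_day;
    std::int64_t rem = ms % ms_per_day;
    if (rem < 0) {  // floor division, so pre-1970 instants land on the right day
        rem += ms_per_day;
        --days;
    }
    std::int64_t y;
    unsigned m, d;
    civil_from_days(days, y, m, d);
    char buf[48];
    std::snprintf(buf, sizeof buf, "%04lld-%02u-%02u%c%02lld:%02lld:%02lld.%03lldZ",
                  static_cast<long long>(y), m, d, sep,
                  static_cast<long long>(rem / 3'600'000),
                  static_cast<long long>(rem / 60'000 % 60),
                  static_cast<long long>(rem / 1'000 % 60),
                  static_cast<long long>(rem % 1'000));
    return buf;
}
```

For example, `format_timestamp(1514375862500, 'T')` yields the commit's sample value `2017-12-27T11:57:42.500Z`, and passing `' '` yields the JSON form.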
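
Commit 5fce4d9675 describes a single-pass lookup of one tuple element. The sketch below assumes the CQL native-protocol tuple encoding: each element is a 4-byte big-endian signed length followed by that many bytes, and a negative length denotes null. `nth_tuple_element` is a hypothetical helper, not ScyllaDB's `read_nth_tuple_element()`, and throwing on an out-of-range index is an assumption. Earlier elements are skipped by their length prefix, and the result is a view into the input, so nothing is deserialized or allocated.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <string_view>

// Hypothetical sketch: returns a view of the n-th element of a serialized
// tuple, or std::nullopt if that element is null. Throws std::out_of_range if
// the buffer is truncated or has fewer than n + 1 elements.
std::optional<std::string_view> nth_tuple_element(std::string_view buf, std::size_t n) {
    std::size_t pos = 0;
    for (std::size_t i = 0;; ++i) {
        if (buf.size() - pos < 4) {
            throw std::out_of_range("tuple too short");
        }
        // Decode the 4-byte big-endian length prefix.
        std::uint32_t raw = 0;
        for (int k = 0; k < 4; ++k) {
            raw = (raw << 8) | static_cast<std::uint8_t>(buf[pos + k]);
        }
        const auto len = static_cast<std::int32_t>(raw);  // modular conversion (C++20)
        pos += 4;
        if (len < 0) {  // null element: no payload bytes follow
            if (i == n) {
                return std::nullopt;
            }
            continue;
        }
        if (buf.size() - pos < static_cast<std::size_t>(len)) {
            throw std::out_of_range("element truncated");
        }
        if (i == n) {
            return buf.substr(pos, len);  // a view into buf: no copy
        }
        pos += len;  // skip this element without deserializing it
    }
}
```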
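
The fix in commit ef548e654d is to check the first fragment before dereferencing it and to move on if it is empty. It reduces to the pattern below. `first_byte` is a hypothetical helper operating on a vector of `std::string_view` fragments, used as a stand-in for ScyllaDB's fragmented buffers. It returns the first byte of the value, as a deserializer for a one-byte `bool` would need.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string_view>
#include <vector>

// Returns the first byte of a value stored as a sequence of fragments.
// Empty fragments are skipped instead of being dereferenced; std::nullopt
// means the whole value is empty.
std::optional<std::uint8_t> first_byte(const std::vector<std::string_view>& fragments) {
    for (std::string_view frag : fragments) {
        if (!frag.empty()) {
            return static_cast<std::uint8_t>(frag.front());
        }
    }
    return std::nullopt;
}
```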