In the current code, support for case-sensitive (quoted) user-defined type
names is broken. For example, a test doing:
CREATE TYPE "PHone" (country_code int, number text)
CREATE TABLE cf (pk blob, pn "PHone", PRIMARY KEY (pk))
Fails - the first line creates the type with the case-sensitive name PHone,
but the second line wrongly ends up looking for the lowercased name phone,
and fails with an exception "Unknown type ks.phone".
The problem is in cql3_type_name_impl. This class is used to convert a
type object into its proper CQL syntax - for example frozen<list<int>>.
The problem is that for a user-defined type, we forgot to quote its name
if not lowercase, and the result is wrong CQL; For example, a list of
PHone will be written as list<PHone> - but this is wrong because the CQL
parser, when it sees this expression, lowercases the unquoted type name
PHone and it becomes just phone. It should be list<"PHone">, not list<PHone>.
The solution is for cql3_type_name_impl to use for a user-defined type
its get_name_as_cql_string() method instead of get_name_as_string().
get_name_as_cql_string() is a new method which prints the name of the
user type as it should be in a CQL expression, i.e., quoted if necessary.
The bug in the above test was apparently caused when our code serialized
the type name to disk as the string PHone (without any quoting), and then
later deserialized it using the CQL type parser, which converted it into
a lowercase phone. With this patch, the type's name is serialized as
"PHone", with the quotes, and deserialized properly as the type PHone.
While the extra quotes may seem excessive, they are necessary for the
correct CQL type expression - remember that the type expression may be
significantly more complex, e.g., frozen<list<"PHone">> and all of this,
including the quotes, is necessary for our parser to be able to translate
this string back into a type object.
This patch may cause breakage to existing databases which used case-
sensitive user-defined types, but I argue that these use cases were
already broken (as demonstrated by this test) so we won't break anything
that actually worked before.
Fixes#5544
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200101160805.15847-1-nyh@scylladb.com>
I found these mismatched types while converting some member functions
to standalone functions, since they have to use the public API that
has more type checks.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191120181213.111758-4-espindola@scylladb.com>
Comparing user types after adding new fields was bugged.
In the following scenario:
create type ut (a int);
create table cf (a int primary key, b frozen<ut>);
insert into cf (a, b) values (0, (0));
alter type ut add b int;
select * from cf where b = {a:0,b:null};
the row with a = 0 should be returned, even though the value stored in the database is shorter
(by one null) than the value given by the user. Until now it wouldn't
have.
is_value_compatible_with_internal and update_user_type were generalized
to the non-frozen case.
For now, all user_type_impls in the code are non-multi-cell (frozen).
This will be changed in future commits.
These functions are used to translate field indices, which are used to
identify fields inside UDTs, from/to a serialized representation to be
stored inside sstables and mutations.
They do it in a way that is compatible with C*.
The purpose of collection_type_impl::to_value was to serialize a
collection for sending over CQL. The corresponding function in origin
is called serializeForNativeProtocol, but the name is a bit lengthy,
so I settled for serialize_for_cql.
The method now became a free-standing function, using the visit
function to perform a dispatch on the collection type instead
of a virtual call. This also makes it easier to generalize it to UDTs
in future commits.
Remove the old serialize_for_native_protocol with a FIXME: implement
inside. It was already implemented (to_value), just called differently.
remove dead methods: enforce_limit and serialized_values. The
corresponding methods in C* are auxiliary methods used inside
serializeForNativeProtocol. In our case, the entire algorithm
is wholly written in serialize_for_cql.
`collection_type_impl::serialize_mutation_form`
became `collection_mutation(_view)_description::serialize`.
Previously callers had to cast their data_type down to collection_type
to use serialize_mutation_form. Now it's done inside `serialize`.
In the future `serialize` will be generalized to handle UDTs.
`collection_type_impl::deserialize_mutation_form`
became a free standing function `deserialize_collection_mutation`
with similiar benefits. Actually, noone needs to call this function
manually because of the next paragraph.
A common pattern consisting of linearizing data inside a `collection_mutation_view`
followed by calling `deserialize_mutation_form` has been abstracted out
as a `with_deserialized` method inside collection_mutation_view.
serialize_mutation_form_only_live was removed,
because it hadn't been used anywhere.
collection_type_impl::mutation became collection_mutation_description.
collection_type_impl::mutation_view became collection_mutation_view_description.
These classes now reside inside collection_mutation.hh.
Additional documentation has been written for these classes.
Related function implementations were moved to collection_mutation.cc.
This makes it easier to generalize these classes to non-frozen UDTs in future commits.
The new names (together with documentation) better describe their purpose.
The classes 'collection_mutation' and 'collection_mutation_view'
were moved to a separate header, collection_mutation.hh.
Implementations of functions that operate on these classes,
including some methods of collection_type_impl, were moved
to a separate compilation unit, collection_mutation.cc.
This makes it easier to modify these structures in future commits
in order to generalize them for non-frozen User Defined Types.
Some additional documentation has been written for collection_mutation.
Multi-cell lists and maps may be stored in different formats: as sorted
vectors of pairs of values, when retreived from storage, or as sorted
vectors of values, when created from parser literals or supplied as
parameter values.
Implement a specialized compare for use when receiver and paramter
representation don't match.
Add helpers.
This simplifies the implementation of from_varint_to_integer and
avoids using the fact that a static_cast from cpp_int to uint64_t
seems to just keep the low 64 bits.
The boost release notes
(https://www.boost.org/users/history/version_1_67_0.html) implies that
the conversion function should return the maximum value a uint64_t can
hold if the original value is too large.
The idea of using a & with ~0 is a suggestion from the boost release
notes.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
"
The release notes for boost 1.67.0 includes:
Breaking Change: When converting a multiprecision integer to a narrower type, if the value is too large (or negative) to fit in the smaller type, then the result is either the maximum (or minimum) value of the target
Since we just moved out of boost 1.66, we have to update our code.
This fixes issue #4960
"
* 'espindola/fix-4960' of https://github.com/espindola/scylla:
types: fix varint to integer conversion
types: extract a from_varint_to_integer from make_castas_fctn_from_decimal_to_integer
types: fix decimal to integer conversion
types: extract helper for converting a decimal to a cppint
types: rename and detemplate make_castas_fctn_from_decimal_to_integer
According to the comments, the only different between date_type_impl
and timestamp_type_impl is the comparison function.
This patch makes that explicit by merging all code paths except:
* The warning when converting between the two
* The compare function
The date_type_impl type can still be user visible via very old
sstables or via the thrift protocol. It is not clear if we still need
to support either, but with this patch it is easy to do so.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
In these cases it is pretty clear that the original code wanted to
create a timestamp_type data_value but was creating a date_type one
because of the old defaults.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Now that we know that anything expecting a date_type has been
converted to date_type_native_type, switch to using
db_clock::time_point when we want a timestamp_type.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
date_type was replaced with timestamp_type, but it was very easy to
create a date_type instead of a timestamp_type by accident.
This patch changes the code so that a date_type is no longer
implicitly used when constructing a data_value. All existing code that
was depending on this is converted to explicitly using
date_type_native_type. A followup patch will convert to timestamp_type
when appropriate.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
The previous code was using the boost::multiprecision::cpp_int to
integer conversion, but that doesn't have the same semantics an cql
for signed numbers.
This fixes the dtest cql_cast_test.py:CQLCastTest.cast_varint_test.
Fixes#4960
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
The previous code was using the boost::multiprecision::cpp_rational to
integer conversion, but that doesn't have the same semantics an cql.
This patch avoids creating a cpp_rational in the first place and works
just with integers.
This fixes the dtest cql_cast_test.py:CQLCastTest.cast_decimal_test.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
This turns find into a template so there is only one switch over the
kind of each type in the search.
To evaluate the change in code size sizes, I added [[noinline]] to
find and obtained the following results.
The release columns for release in the before case have an extra column
because the functions are sufficiently complex to trigger gcc to split
them in hot + cold.
before:
dev release (hot + cold split)
find 0x35f = 863 0x3d5 + 0x112 = 1255
references_duration 0x62 + 0x22 + 0x8 = 140 0x55 + 0x1f + 0x2a + 0x8 = 166
references_user_type 0x6b + 0x26 + 0x111 = 418 0x65 + 0x1f + 0x32 + 0x11b = 465
after:
dev release
find 0xd6 + 0x1b4 = 650 0xd2 + 0x1f5 = 711
references_duration 0x13 = 19 0x13 = 19
references_user_type 0x1a = 26 0x21 = 33
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
This was broken since the type refactoring. It was checking the static
type, which is always abstract_type. Unfortunately we only had dtests
for this.
This can probably be optimized to avoid the double switch over kind,
but it is probably better to do the simple fix first.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190821155354.47704-1-espindola@scylladb.com>