Commit Graph

23 Commits

Author SHA1 Message Date
Michał Hudobski
5c957e83cb vector_search: remove dependence on cql3
This patch removes the dependence of vector search module
on the cql3 module by moving the contents of cql3/type_json.hh
to types/json_utils.hh and removing the usage of cql3 primary_key
object in vector_store_client. We also make the needed adjustments
to files that were previously using the afformentioned type_json.hh
file.

This fixes the circular dependency cql3 <-> vector_search.

Closes scylladb/scylladb#26482
2025-10-21 17:41:55 +03:00
Michał Chojnowski
bf90018b8e types/comparable_bytes: add a missing implementation for date_type_impl
date_type_impl is like timestamp_type_impl, but unsigned.
2025-09-17 12:22:40 +02:00
Lakshmi Narayanan Sreethar
4547f6f188 types/comparable_bytes: update compatibility testcase to support collection types
The `abstract_type::from_string()` method used to parse the input data
doesn't support collections yet. So the collection testdata will be
passed as JSON strings to the testcase. This patch updates the testcase
to adapt to this workaround.

Also, extended the testcase to verify that Scylla's implementation can
successfully decode the byte comparable output encoded by Cassandra.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2025-08-29 12:26:22 +05:30
Lakshmi Narayanan Sreethar
0997b3533c types/comparable_bytes: support empty type
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2025-08-29 12:26:22 +05:30
Lakshmi Narayanan Sreethar
b799101a09 types/comparable_bytes: support reversed types
A reversed type is first encoded using the underlying type and then all
the bits are flipped to ensure that the lexicographical sort order is
reversed. During decode, the bytes are flipped first and then decoded
using the underlying type.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2025-08-29 12:26:22 +05:30
Lakshmi Narayanan Sreethar
6c2a3e2c51 types/comparable_bytes: support vector cql3 type
The CQL vector type encoding is similar to the lists, where each element
is transformed into a byte-comparable format and prefixed with a
component marker. The sequence is terminated with a terminator marker to
indicate the end of the collection.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2025-08-29 12:26:22 +05:30
Lakshmi Narayanan Sreethar
1ccfe522f1 types/comparable_bytes: support tuple and UDT cql3 type
The CQL tuple and UDT types share the same internal implementation and
therefore use the same byte comparable encoding. The encoding is similar
to lists, where each element is transformed into a byte-comparable
format and prefixed with a component marker. The sequence is terminated
with a terminator marker to indicate the end of the collection.

TODO: Add duplicate test items to maps, lists and sets
      For maps, add more entries that share keys
      ex map1 : key1 : value1, key2 : value2
         map2 : key1 : value4
         map3 : key2 : value5 etc

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2025-08-29 12:26:22 +05:30
Lakshmi Narayanan Sreethar
ca38c15a97 types/comparable_bytes: support map cql3 type
The CQL map type is encoded as a sequence of key-value pairs. Each key
and each value is individually prefixed with a component marker, and the
sequence is terminated with a terminator marker to indicate the end of
the collection.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2025-08-29 12:26:22 +05:30
Lakshmi Narayanan Sreethar
4d5e5f0c84 types/comparable_bytes: support set and list cql3 types
The CQL set and list types are encoded as a sequence of elements, where
each element is transformed into a byte-comparable format and prefixed
with a component marker. The sequence is terminated with a terminator
marker to indicate the end of the collection.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2025-08-29 12:26:22 +05:30
Lakshmi Narayanan Sreethar
8e46e8be01 types/comparable_bytes: introduce encode/decode_component
The components of a collection, such as an element from a list, set, or
vector; a key or value from a map; or a field from a tuple, share the
same encode and decode logic. During encode, the component is transformed
into the byte comparable format and is prefixed with the `NEXT_COMPONENT`
marker. During decode, the component is transformed back into its
serialized form and is prefixed with the serialized size.

A null component is encoded as a single `NEXT_COMPONENT_NULL` marker and
during decode, a `-1` is written to the serialized output.

This commit introduces few helper methods that implement the above
mentioned encode and decode logics.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2025-08-29 12:26:21 +05:30
Lakshmi Narayanan Sreethar
5f5a8cf54c types/comparable_bytes: add testcase to verify compatibility with cassandra 2025-07-01 22:19:08 +05:30
Lakshmi Narayanan Sreethar
6c1853a830 types/comparable_bytes: support variable-length natively byte-ordered data types
The following cql3 data types - ascii, blob, duration, inet, and text -
are natively byte-ordered in their serialized forms. To encode them into
a byte-comparable format, zeros are escaped, and since these types have
variable lengths, the encoded form is terminated in an escaped state to
mark its end.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2025-07-01 22:19:08 +05:30
Lakshmi Narayanan Sreethar
5c77d17834 types/comparable_bytes: support decimal cql3 types
The decimal cql3 type is internally stored as a scale and an unscaled
integer. To convert them into a byte comparable format, they are first
normalized into a base-100 exponent and a mantissa that lies in [0.01, 1)
and then encoded into a byte sequence that preserves the numerical order.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2025-07-01 22:19:08 +05:30
Lakshmi Narayanan Sreethar
832236d044 types/comparable_bytes: introduce count_digits() method
Implemented a method `count_digits()` to return the number of significant
digits in a given boost::multiprecision:cpp_int. This is required to
convert big_decimal to a byte comparable format.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2025-07-01 22:19:08 +05:30
Lakshmi Narayanan Sreethar
a00c5d3899 types/comparable_bytes: support uuid and timeuuid cql3 types
The uuid type values are composed of two fixed-length unsigned integers:
an msb and an lsb. The msb contains a version digit, which must be
pulled first in a byte-comparable representation. For version 1 uuids,
in addition to extracting the version digit first, the msb must be
rearranged to make it byte comparable. The lsb is written as is.

For the timeuuid type, the msb is handled simliar to the version 1 uuid
values. The lsb however is treated differently - the sign bits of all
bytes are inverted to preserve the legacy comparison order, which
compared individual bytes as signed values.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2025-07-01 22:19:08 +05:30
Lakshmi Narayanan Sreethar
4592b9764c types/comparable_bytes: support varint cql3 type
Any varint value less than 7 bytes is encoded using the signed long
encoding format and remaining values are all encoded using the full form
encoding :

  <signbyte><length as unsigned integer - 7><7 or more bytes>,

where <signbyte> is 00 for negative numbers and FF for positive ones,
and the length's bytes are inverted if the number is negative (so that
longer length sorts smaller).

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2025-07-01 22:19:07 +05:30
Lakshmi Narayanan Sreethar
ad45a19373 types/comparable_bytes: introduce encode/decode_varint_length
The length of a varint value is encoded separately as an unsigned
variable-length integer. For negative varint values, the encoded bytes
are flipped to ensure that longer lengths sort smaller. This patch
implements both encoding and decoding logic for varint lengths and will
be used by the subsequent patch.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2025-07-01 22:19:07 +05:30
Lakshmi Narayanan Sreethar
7af153c237 types/comparable_bytes: support float and double cql3 types
The sign bit is flipped for positive values to ensure that they are
ordered after negative values. For negative values, all the bytes are
inverted, allowing larger negative values to be ordered before smaller
ones.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2025-07-01 22:19:07 +05:30
Lakshmi Narayanan Sreethar
0145c1d705 types/comparable_bytes: support date, time and timestamp cql3 types
Both the date and time cql3 types are internally unsigned fixed length
integers. Their serialized form is already byte comparable, so the
encoder and decoder return the serialized bytes as it is.

The timestamp type is encoded using the fixed length signed integer
encoding.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2025-07-01 22:19:07 +05:30
Lakshmi Narayanan Sreethar
b6ff3f5304 types/comparable_bytes: support bigint cql3 type
The bigint type, internally implemented as a long data type, is encoded
using a variable-length encoding similar to UTF-8. This enables a
significant amount of space to be saved when smaller numbers are
frequently used, while still permitting large values to be efficiently
encoded.

The first bit of the encoding represents the inverted sign (i.e., 1 for
positive, 0 for negative), followed by length encoded as a sequence of
bits matching the inverted sign. This is then followed by a differing
bit (except for 9-byte encodings) and the bits of the number's two's
complement.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2025-07-01 22:19:07 +05:30
Lakshmi Narayanan Sreethar
c0d25060bd types/comparable_bytes: support fixed length signed integers
To encode fixed-length signed integers in a byte-comparable format, the
first bit of each value is inverted. This ensures that negative numbers
are ordered before positive ones during comparison. This patch adds
support for the data types : byte_type (tinyint), short_type (smallint),
and int32_type (int). Although long_type (bigint) is a fixed length
integer type, it has different byte comparable encoding and will be
handled separately in another patch.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2025-07-01 22:19:07 +05:30
Lakshmi Narayanan Sreethar
8572afca2b types/comparable_bytes: support boolean cql3 type
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2025-07-01 22:19:07 +05:30
Lakshmi Narayanan Sreethar
74c556a33d types: introduce comparable_bytes class
This patch implements a new class, `comparable_bytes`, designed to
implement methods for converting data values to and from byte-comparable
formats. The class stores the comparable bytes as `managed_bytes` and
currently provides the structure for all required methods. The actual
logic for converting various data types will be implemented in subsequent
patches.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2025-07-01 22:19:07 +05:30