Commit Graph

11 Commits

Author SHA1 Message Date
Szymon Malewski
15493872b2 vector_search: fix decimal/varint precision loss in filter value_to_json()
value_to_json() converts CQL values to JSON for vector search filters.
For decimal and varint types, it used rjson::parse() on the JSON string,
which parses through a double and silently loses precision for values
exceeding ~15 significant digits — producing wrong filter results.

Additionally, for decimal type we need an exact string representation
that preserves the original (unscaled, scale) pair, because partition
keys use byte-level identity: different serialized representations of
the same numeric value are distinct rows, so the filter must reproduce
the exact representation stored in the key.

Add big_decimal::to_string_canonical() which follows the Java BigDecimal
toString() spec (JDK 8+), producing a bijective string representation
that uses exponential notation for extreme scales instead of expanding
trailing zeros (which could cause OOM). This could replace to_string(),
but doing so has wider consequences (e.g. hash/equality contract for
decimal_type) described in SCYLLADB-1574. Use it in value_to_json() for
decimal_type, and use rjson::from_string() for varint_type, both
bypassing the lossy double parse path.

Tests cover the new to_string_canonical() and the filter fix, as well as
existing decimal type behavior (key representation, clustering order,
toJson) that we rely on and must not break. The CQL decimal type tests
(test_type_decimal.py) also pass against Cassandra.

Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-1583
Refs: https://scylladb.atlassian.net/browse/SCYLLADB-1574

Closes scylladb/scylladb#29505
2026-05-18 17:07:26 +03:00
Avi Kivity
0ae22a09d4 LICENSE: Update to version 1.1
Updated terms of non-commercial use (must be a never-customer).
2026-04-12 19:46:33 +03:00
Lakshmi Narayanan Sreethar
279253ffd0 utils/big_decimal: fix scale overflow when parsing values with large exponents
The exponent of a big decimal string is parsed as an int32, adjusted for
the removed fractional part, and stored as an int32. When parsing values
like `1.23E-2147483647`, the unscaled value becomes `123`, and the scale
is adjusted to `2147483647 + 2 = 2147483649`. This exceeds the int32
limit, and since the scale is stored as an int32, it overflows and wraps
around, losing the value.

This patch fixes that the by parsing the exponent as an int64 value and
then adjusting it for the fractional part. The adjusted scale is then
checked to see if it is still within int32 limits before storing. An
exception is thrown if it is not within the int32 limits.

Note that strings with exponents that exceed the int32 range, like
`0.01E2147483650`, were previously not parseable as a big decimal. They
are now accepted if the final adjusted scale fits within int32 limits.
For the above value, unscaled_value = 1 and scale = -2147483648, so it
is now accepted. This is in line with how Java's `BigDecimal` parses
strings.

Fixes: #24581

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>

Closes scylladb/scylladb#24640
2025-06-26 15:29:28 +03:00
Avi Kivity
f3eade2f62 treewide: relicense to ScyllaDB-Source-Available-1.0
Drop the AGPL license in favor of a source-available license.
See the blog post [1] for details.

[1] https://www.scylladb.com/2024/12/18/why-were-moving-to-a-source-available-license/
2024-12-18 17:45:13 +02:00
Botond Dénes
b6a9c79af3 utils/big_decimal: add fast paths to operator <=>
Currently, the tri-compare operator for big_decimal (operator <=>), uses
a precise but potentially very expensive algorithm for comparing the
numbers: it first brings them to the same scale, then compares the
normalized unscaled values. big_decimal has abritrary precisions,
therefore the stored numbers can be arbitrarily large.
In extreme cases, comparing two numbers can result in huge amount of
memory allocated and stalls. If this type is used int he primary key of
a table, these comparisons can make the node completely unresponsive.

This patch adds the following fast-paths to operator <=>:
* An early return for the case of equal scales.
* An early return for different signs.
* An early return for the case where one or both of the numbers are 0.
* A fast algorithm for detecting the case where the there is a big
  difference between the two numbers. This algorithm works only with the
  scales and is able to compare the two numbers by using only one division
  and some additions and substractions. This algorithm is imprecise and
  when the numbers are closer than its confidence window, it will
  fall-back to the current slow but precise tri-compare.

All but the last case should have been fast before as well, but the
scale-compare algorithm makes a huge difference. Numbers, which would
previously make the node unresponsive, now compare in constant-time.

Fixes: scylladb/scylladb#21716

Closes scylladb/scylladb#21715
2024-12-03 14:56:51 +02:00
Avi Kivity
6572b297a2 treewide: clean up stray license blurbs
After the mechanical change in fcb8d040e8
("treewide: use Software Package Data Exchange (SPDX) license identifiers"),
a few stray license blurbs or fragments thereof remain. In two cases
these were extra blurbs in code generators intended for the generated code,
in others they were just missed by the script.

Clean them up, adding an SPDX license identifier where needed.

Closes #10072
2022-02-13 14:16:16 +02:00
Avi Kivity
fcb8d040e8 treewide: use Software Package Data Exchange (SPDX) license identifiers
Instead of lengthy blurbs, switch to single-line, machine-readable
standardized (https://spdx.dev) license identifiers. The Linux kernel
switched long ago, so there is strong precedent.

Three cases are handled: AGPL-only, Apache-only, and dual licensed.
For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0),
reasoning that our changes are extensive enough to apply our license.

The changes we applied mechanically with a script, except to
licenses/README.md.

Closes #9937
2022-01-18 12:15:18 +01:00
Avi Kivity
a55b434a2b treewide: extent copyright statements to present day 2021-06-06 19:18:49 +03:00
Rafael Ávila de Espíndola
85bb7ff743 big_decimal: Add a test for a corner case
This behavior is different from cassandra, but without arithmetic
operations it doesn't seem possible to notice the difference from
CQL. Using avg produces the same results, since we use an initial
value of 0 (scale = 0).

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2020-06-25 15:37:23 -07:00
Piotr Sarna
ecc4a87a24 test: add test cases to big_decimal_test
Test cases for big decimals were quite complete, but since the
implementation was recently changed, some corner cases are added:
 - incorrect strings
 - numbers not fitting into uint64_t
 - numbers less than uint64_t::max themselves, but with the unscaled
   value exceeding the maximum
2020-06-01 16:11:49 +02:00
Konstantin Osipov
1c8736f998 tests: move all test source files to their new locations
1. Move tests to test (using singular seems to be a convention
   in the rest of the code base)
2. Move boost tests to test/boost, other
   (non-boost) unit tests to test/unit, tests which are
   expected to be run manually to test/manual.

Update configure.py and test.py with new paths to tests.
2019-12-16 17:47:42 +03:00