Fixes some typos as found by codespell run on the code.
In this commit, I was hoping to fix only comments, not user-visible alerts, output, etc.
Follow-up commits will take care of them.
Refs: https://github.com/scylladb/scylladb/issues/16255
Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>
DynamoDB limits the allowed magnitude and precision of numbers - valid
decimal exponents are between -130 and 125 and up to 38 significant
decimal digits are allowed. In contrast, Scylla uses the CQL "decimal"
type which offers unlimited precision. This can cause two problems:
1. Users might get used to this "unofficial" feature and start relying
on it, not allowing us to switch to a more efficient limited-precision
implementation later.
2. If huge exponents are allowed, e.g., 1e-1000000, summing such a
number with 1.0 will result in a huge number, huge allocations and
stalls. This is highly undesirable.
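
As an illustration only, the limits described above can be sketched as a
standalone check. The helper name within_number_limits() and its parsing
rules are assumptions made for this example, not the code added by this
patch; it treats the exponent limit as applying to the number's leading
significant digit.

```cpp
#include <cctype>
#include <string>
#include <string_view>

// Hypothetical helper (not Alternator's actual code): returns true if the
// decimal string `s`, e.g. "-12.5e3", fits the limits quoted above.
bool within_number_limits(std::string_view s) {
    constexpr long max_digits = 38, min_exponent = -130, max_exponent = 125;
    std::size_t i = 0;
    if (i < s.size() && (s[i] == '+' || s[i] == '-')) {
        ++i;
    }
    std::string digits;   // mantissa digits with the decimal point removed
    long point_pos = -1;  // how many digits precede the decimal point
    for (; i < s.size(); ++i) {
        if (s[i] == '.' && point_pos < 0) {
            point_pos = static_cast<long>(digits.size());
        } else if (std::isdigit(static_cast<unsigned char>(s[i]))) {
            digits.push_back(s[i]);
        } else {
            break;
        }
    }
    if (digits.empty()) {
        return false;
    }
    if (point_pos < 0) {
        point_pos = static_cast<long>(digits.size());
    }
    long exponent = 0;
    if (i < s.size() && (s[i] == 'e' || s[i] == 'E')) {
        ++i;
        bool negative = false;
        if (i < s.size() && (s[i] == '+' || s[i] == '-')) {
            negative = (s[i] == '-');
            ++i;
        }
        if (i == s.size()) {
            return false;  // "1e" or "1e-": no exponent digits
        }
        for (; i < s.size() && std::isdigit(static_cast<unsigned char>(s[i])); ++i) {
            if (exponent < 1000000) {  // saturate instead of overflowing
                exponent = exponent * 10 + (s[i] - '0');
            }
        }
        if (negative) {
            exponent = -exponent;
        }
    }
    if (i != s.size()) {
        return false;  // trailing characters that are not part of a number
    }
    const auto first = digits.find_first_not_of('0');
    if (first == std::string::npos) {
        return true;  // the value is zero
    }
    const auto last = digits.find_last_not_of('0');
    const long significant = static_cast<long>(last - first) + 1;
    // Exponent of the leading significant digit: value = d.ddd * 10^magnitude
    const long magnitude = exponent + point_pos - 1 - static_cast<long>(first);
    return significant <= max_digits
        && magnitude >= min_exponent && magnitude <= max_exponent;
}
```

With such a check, an input like "1e-1000000" is rejected up front, before
any arithmetic can trigger the huge allocations mentioned in item 2.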
After this patch, all tests in test/alternator/test_number.py now
pass. The various failing tests which verify magnitude and precision
limitations in different places (key attributes, non-key attributes,
and arithmetic expressions) now pass - so their "xfail" tags are removed.
Fixes #6794
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
These warnings are found by Clang-17 after removing
`-Wno-unused-lambda-capture` and `-Wno-unused-variable` from
the list of disabled warnings in `configure.py`.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Move mutation-related files to a new mutation/ directory. The names
are kept in the global namespace to reduce churn; the names are
unambiguous in any case.
mutation_reader remains in the readers/ module.
mutation_partition_v2.cc was missing from CMakeLists.txt; it's added in this
patch.
This is a step forward towards librarization or modularization of the
source base.
Closes #12788
data
We'll try to distinguish the case when data comes from the storage rather
than from a user request. Such an attribute can be used in expressions, and
when it can't be decoded it should make the expression evaluate as
false, simply excluding the row during a filter query or scan.
Note that this change focuses on binary type, for other types we
may have some inconsistencies in the implementation.
Now that we don't accept cql protocol version 1 or 2, we can
drop cql_serialization_format everywhere, except in the IDL
(since it's part of the inter-node protocol).
A few functions had duplicate versions, one with and one without
a cql_serialization_format parameter. They are deduplicated.
Care is taken that `partition_slice`, which communicates
the cql_serialization_format across nodes, still presents
a valid cql_serialization_format to other nodes when
transmitting itself and rejects protocol 1 and 2 serialization
format when receiving. The IDL is unchanged.
One test checking the 16-bit serialization format is removed.
Recently, we added full position-in-partition support to alternator's
paging cookie, so it can support stopping at arbitrary positions. This
support however is only really needed when tables have range tombstones
and alternator tables never have them. So to avoid having to make the
new fields in 'ExclusiveStartKey' reserved, we avoid filling these in
when reading an alternator table, as in this case it is safe to assume
the position is `after_key($clustering_key)`. We do include these new
members however when reading CQL tables through alternator. As this is
only supported for system tables, we can also be sure that the elaborate
names we used for these fields are enough to avoid naming clashes.
The condition in the code implementing this is actually even more
general: it only includes the region/weight members when the position
differs from that of a normal alternator one.
The former allows for expressing more positions, like a position
before/after a clustering key. This practically enables the coordinator
side paging logic, for a query to be stopped at a tombstone (which can
have said positions).
Instead of lengthy blurbs, switch to single-line, machine-readable
standardized (https://spdx.dev) license identifiers. The Linux kernel
switched long ago, so there is strong precedent.
Three cases are handled: AGPL-only, Apache-only, and dual licensed.
For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0),
reasoning that our changes are extensive enough to apply our license.
The changes were applied mechanically with a script, except for
licenses/README.md.
Closes #9937
We have an unwrap_number() function which in case of data errors (such
as the value not being a number) throws an exception with a given
string used in the message.
In this patch we add a variant of unwrap_number() - try_unwrap_number() -
which doesn't take a message, and doesn't throw exceptions - instead it
returns an empty std::optional if the given value is not a number.
This function is useful in places where we need to know whether we got a
number or not, and neither case is an error. We'll use it in a
following patch to parse expiration times for the TTL feature.
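
A minimal sketch of the two variants follows. It is only an illustration:
the typed_value struct and the use of double are stand-ins assumed for this
example, whereas the real functions operate on JSON values.

```cpp
#include <cstdlib>
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a JSON value such as {"N": "123"}.
struct typed_value {
    std::string type;
    std::string value;
};

// Non-throwing variant: an empty optional means "not a number".
// Note: strtod() is more lenient than a real DynamoDB number parser
// (it accepts "inf", hex, etc.); it is used here only to keep the sketch short.
std::optional<double> try_unwrap_number(const typed_value& v) {
    if (v.type != "N" || v.value.empty()) {
        return std::nullopt;
    }
    char* end = nullptr;
    const double d = std::strtod(v.value.c_str(), &end);
    if (end != v.value.c_str() + v.value.size()) {
        return std::nullopt;  // trailing characters, so not a number
    }
    return d;
}

// Throwing variant, expressed in terms of the non-throwing one.
double unwrap_number(const typed_value& v, const std::string& diagnostic) {
    if (auto n = try_unwrap_number(v)) {
        return *n;
    }
    throw std::invalid_argument(diagnostic + ": expected a number");
}
```

A caller that only wants to know "is this a number?" can then test the
returned optional instead of catching an exception.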
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
In the DynamoDB API, UpdateItem's AttributeUpdates parameter (the older
syntax, which was superseded by UpdateExpression) has a DELETE operation
that can do two different things: It can delete an attribute, or it can
delete elements from a set. Before this patch we only implemented the
first feature, and this patch implements the second.
Note that unlike the ordinary delete, the second feature - set subtraction -
is a read-modify-write operation. This is not only because of Alternator's
serialization (as JSON strings, not CRDTs) - but also fundamentally because
of the API's guarantees - e.g., the operation is supposed to fail if the
attribute's existing value is *not* a set of the correct type, so it
needs to read the old value.
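
To illustrate why the old value must be read first, here is a small sketch
of set subtraction. The attribute type and delete_from_set() are
hypothetical names for this example, and removing the attribute when the
set becomes empty is an assumption of the sketch, not something stated
above.

```cpp
#include <optional>
#include <set>
#include <stdexcept>
#include <string>
#include <utility>
#include <variant>

using string_set = std::set<std::string>;
// Hypothetical attribute: either a plain string or a set of strings.
using attribute = std::variant<std::string, string_set>;

// Read-modify-write: the result depends on the previously stored value.
std::optional<attribute> delete_from_set(const std::optional<attribute>& old_value,
                                         const string_set& to_remove) {
    if (!old_value) {
        return std::nullopt;  // nothing stored, nothing to subtract
    }
    const auto* old_set = std::get_if<string_set>(&*old_value);
    if (!old_set) {
        // The operation must fail if the existing value is not a set.
        throw std::invalid_argument("existing value is not a set of the expected type");
    }
    string_set result;
    for (const auto& element : *old_set) {
        if (to_remove.count(element) == 0) {
            result.insert(element);
        }
    }
    if (result.empty()) {
        return std::nullopt;  // assumption: an emptied set removes the attribute
    }
    return attribute{std::move(result)};
}
```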
The test for this feature begins to pass, so its "xfail" mark is
removed. After this, all tests in test/alternator/test_item.py pass :-)
Fixes #5864.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20211103151206.157184-1-nyh@scylladb.com>
In the DynamoDB API, a number is encoded in JSON requests as something
like: {"N": "123"} - the type is "N" and the value "123". Note that the
value of the number is encoded as a string, because the floating-point
range and accuracy of DynamoDB differs from what various JSON libraries
may support.
We have a function unwrap_number() which supported the value of the
number being encoded as an actual number, not a string. But we should
NOT support this case - DynamoDB doesn't. In this patch we add a test
that confirms that DynamoDB doesn't, and remove the unnecessary case
from unwrap_number(). The unnecessary case also had a FIXME, so it's
a good opportunity to get rid of a FIXME.
When writing the test, I noticed that the error which DynamoDB returns
in this case is SerializationException instead of the more usual
ValidationException. I don't know why, but let's also change the error
type in this patch.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20211115125738.197099-1-nyh@scylladb.com>
The rjson::set() *sounds* like it can set any member of a JSON object
(i.e., map), but that's not true :-( It calls the RapidJson function
AddMember() so it can only add a member to an object which doesn't have
a member with the same name (i.e., key). If it is called with a key
that already has a value, the result may have two values for the same
key, which is ill-formed and can cause bugs like issue #9542.
So in this patch we begin by renaming rjson::set() and its variant to
rjson::add() - to suggest to its user that this function only adds
members, without checking if they already exist.
After this rename, I was left with dozens of calls to the set() functions
that need to be changed to either add() - if we're sure that the object
cannot already have a member with the same name - or to replace() if
it might.
The vast majority of the set() calls were starting with an empty item
and adding members with fixed (string constant) names, so these can
be trivially changed to add().
It turns out that *all* other set() calls - except the one fixed in
issue #9542 - can also use add() because there are various "excuses"
why we know the member names will be unique. A typical example is
a map with column-name keys, where we know that the column names
are unique. I added comments in front of such non-obvious uses of
add() which are safe.
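
The difference between the two operations can be shown with a toy model.
The object type below is a simplified stand-in assumed for this example
(a flat list of members), not the rjson implementation.

```cpp
#include <string>
#include <utility>
#include <vector>

// Toy JSON object: a flat list of (key, value) members.
using object = std::vector<std::pair<std::string, std::string>>;

// Like AddMember(): appends unconditionally, so calling it twice with the
// same key leaves two members with that key.
void add(object& o, std::string key, std::string value) {
    o.emplace_back(std::move(key), std::move(value));
}

// Overwrites an existing member with this key, or appends if there is none.
void replace(object& o, const std::string& key, std::string value) {
    for (auto& member : o) {
        if (member.first == key) {
            member.second = std::move(value);
            return;
        }
    }
    o.emplace_back(key, std::move(value));
}
```

In this model add() is cheaper because it never searches, which is fine
whenever the caller already knows the key is new.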
Almost all uses of rjson except a handful are in Alternator, so I
verified that all Alternator test cases continue to pass after this
patch.
Fixes #9583
Refs #9542
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20211104152540.48900-1-nyh@scylladb.com>
The list_concatenate() function was only used for UpdateExpression's
ADD operation, so we made it a static function in the source file where
it was used. In the next patch, we'll want to use it in another place
(AttributeUpdates' ADD operation), so let's move it to the same file
where similar functions for sets exist.
This patch is almost entirely a code move, but also makes one small
change: list_concatenate() used to throw an exception if one of the
arguments wasn't a list, but the text of this exception was specific to
UpdateExpression. So in the new version, we return a null value in this
case - and the caller checks for it and throws the right exception.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
base64.hh pulls in the huge rjson.hh, so if someone just wants
a base64 codec they have to pull in the entire rapidjson library.
Move the json related parts of base64.hh to rjson.hh and adjust
includes and namespaces.
In practice it doesn't make much difference, as all users of base64
appear to want json too. But it's cleaner not to mix the two.
Closes #9433
The base64 encoding/decoding functions will be used for serialization of
hint sync point descriptions. Base64 format is not specific to
Alternator, so it can be moved to utils.
C++20 introduced `contains` member functions for maps and sets for
checking whether an element is present in the collection. Previously, the
`count` function was often used for this in various ways.
`contains` not only expresses the intent of the code better but also
does so in a more unified way.
This commit replaces all the occurrences of `count` with
`contains`.
Tests: unit(dev)
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <b4ef3b4bc24f49abe04a2aba0ddd946009c9fcb2.1597314640.git.piotr@scylladb.com>
All the places in conditions.cc, expressions.cc and serialization.cc where
we constructed an api_error, we always used the ValidationException type
string, which the code repeated dozens of times.
This patch converts all these places to use the factory function
api_error::validation().
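
The pattern looks roughly like the sketch below. The api_error struct here
is a simplified stand-in assumed for this example; only the factory name
validation() and the ValidationException type string come from the text
above.

```cpp
#include <string>
#include <utility>

// Simplified stand-in for an API error carrying a type string and a message.
struct api_error {
    std::string type;
    std::string msg;

    api_error(std::string t, std::string m)
        : type(std::move(t)), msg(std::move(m)) {}

    // Factory function: the type string is spelled once, here, instead of
    // being repeated at every call site.
    static api_error validation(std::string m) {
        return api_error("ValidationException", std::move(m));
    }
};
```

A call site then shrinks from api_error("ValidationException", "...") to
api_error::validation("...").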
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
In order to eventually switch to a single JSON library,
most of the libjsoncpp usage is dropped in favor of rjson.
Unfortunately, one usage still remains:
test/utils/test_repl utility heavily depends on the *exact textual*
format of its output JSON files, so replacing a library results
in all tests failing because of differences in formatting.
It is possible to force rjson to print its documents in the exact
matching format, but that's left for later, since the issue is not
critical. It would be nice though if our test suite compared
JSON documents with a real JSON parser, since there are more
differences - e.g. libjsoncpp keeps children of the object
sorted, while rapidjson uses an unordered data structure.
This change should cause no change in semantics, it strives
just to replace all usage of libjsoncpp with rjson.
In a couple of places, where we already have a std::string_view, there
is no need to convert it to a std::string (which requires allocation).
One cool observation (by Piotr Sarna) is that map over std::string_view
is fine, when the strings in the map are always string constants.
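
A small sketch of that observation follows. The attr_type enum and
type_from_name() are hypothetical names chosen for this example.

```cpp
#include <string_view>
#include <unordered_map>

enum class attr_type { S, B, BOOL, N, unknown };

attr_type type_from_name(std::string_view name) {
    // The keys are views of string literals, which live for the whole
    // program, so the views stored in the map can never dangle.
    static const std::unordered_map<std::string_view, attr_type> names = {
        {"S", attr_type::S},
        {"B", attr_type::B},
        {"BOOL", attr_type::BOOL},
        {"N", attr_type::N},
    };
    // The lookup takes the view directly; no std::string is allocated.
    const auto it = names.find(name);
    return it == names.end() ? attr_type::unknown : it->second;
}
```

The same map would be unsafe if its keys were views of temporary
std::string objects, since those views would outlive the strings.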
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
The source file alternator/executor.cc has grown too much, reaching almost
4,000 lines. In this patch I move about 400 lines out of executor.cc:
1. Some functions related to serialization of sets and lists were moved to
serialization.cc,
2. Functions related to evaluating parsed expressions were moved to
expressions.cc.
The header file expressions_eval.hh was also removed - the calculate_value()
functions now live in expressions.cc, so we can just define them in
expressions.hh, no need for a separate header file.
This patch just moves code around. It doesn't make any functional changes.
Refs #5783.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
While most types (e.g. boolean) are not valid key types for alternator users,
system tables derived from Scylla may still use these types for keys,
e.g. system_auth.roles. Note that types which are not directly
supported by alternator (e.g. double) will not be representable
out-of-the-box - instead, they simply fall back to string, which is both
human-readable and supported by alternator.
The default serialization path for items was subtly broken -
instead of parsing JSON string representation of objects,
it tried to parse a regular string implementation - which is often
also a valid JSON, but nothing guarantees that it actually is.
Tests: alternator-test(local)
Message-Id: <e1668bf4e9029f2675a4ac28bb4598714575efeb.1586096732.git.sarna@scylladb.com>
Our rjson::find() convenience function used RapidJson's "StringRef" type,
which is almost exactly like std::string_view. If we switch to use
string_view as we do in this patch, a lot of call sites become much simpler.
Moreover, there was an even more important motivation for this patch:
the RapidJson FindMember() function we used in rjson::find() has a bug when
given a StringRef - although a StringRef contains a length, the FindMember()
code ignores it and expects the string to be null-terminated (see:
https://github.com/Tencent/rapidjson/issues/1649). In this patch, we wrap
the pointer and length of a std::string_view in an rjson::value, a code path
which bypasses the FindMember bug, and yet does not require copying the
string.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200303141814.26929-1-nyh@scylladb.com>
A couple of places in executor code leaked raw JSON errors to the user
instead of formulating a proper ValidationException message.
These places are now fixed, and the next patch in this series will
act as a regression checker, since all JSON errors will be returned
as SerializationException, not ValidationException instances.
We had a get_key_from_typed_value() utility function to decode a
JSON-encoded value with a known type (the JSON encoding is a map whose
key is the type, the value always a string because all possible key types -
string, bytes and number, are encoded as strings).
However, the function was less useful than it could have been - it was
missing one check for a malformed object (a check which only appeared in
one of its callers), it unnecessarily received the column's expected type
(all the callers passed it the given key column's type).
The cleaned up function will be more useful for the following patch
to support KeyConditionExpression, which wants to reuse it.
While at it, this patch also uses rjson::to_string_view(it->value)
instead of the less correct it->value.GetString() (the latter relies
on null-termination, which is actually true for JSON strings, but there
is no reason to rely on it).
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200213192509.32685-3-nyh@scylladb.com>
In 1ca9dc5d47, it was established that the correct way to
base64-decode a JSON value is via string_view, rather than directly
from GetString().
This patch adds a base64_decode(rjson::value) overload, which
automatically uses the correct procedure. It saves typing, ensures
correctness (fixing one incorrect call found), and will come in handy
for future EXPECTED comparisons.
Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
unwrap_number() is now a public function in serialization.hh instead
of a static function visible only in executor.cc.
Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
String views used in JSON serialization should use not only the pointer
returned by rapidjson, but also the string length, as it may contain
\0 characters.
Additionally, one unnecessary copy is elided.
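
The difference is easy to reproduce without any JSON library. The two
helpers below are hypothetical names for this example; with rapidjson the
pointer and length would come from GetString() and GetStringLength().

```cpp
#include <cstddef>
#include <string_view>

// Pointer only: the length is found by scanning for the first '\0',
// so anything after an embedded '\0' is silently dropped.
std::string_view view_from_pointer(const char* p) {
    return std::string_view(p);
}

// Pointer and explicit length: embedded '\0' characters are preserved.
std::string_view view_from_pointer_and_length(const char* p, std::size_t n) {
    return std::string_view(p, n);
}
```

For the five bytes "ab\0cd", the first helper yields a view of length 2
and the second a view of length 5.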
It turns out that recent rjson patches introduced some buggy
tabs instead of spaces due to bad IDE configuration. The indentation
is restored to spaces.
Currently the only utility function for getting key bytes
from JSON was to parse a document with the following format:
"key_column_name" : { "key_column_type" : VALUE }.
However, it's also useful to parse only the inner document, i.e.:
{ "key_column_type" : VALUE }.
Profiling alternator implied that JSON parsing takes up a fair amount
of CPU, and as such should be optimized. libjsoncpp is a standard
library for handling JSON objects, but it also proves slower than
rapidjson, which is hereby used instead.
The results indicated that libjsoncpp used roughly 30% of CPU
for a single-shard alternator instance under stress, while rapidjson
dropped that usage to 18% without optimizations.
Future optimizations should include eliding object copying, string copying
and perhaps experimenting with different JSON allocators.
The CQL type singletons like utf8_type et al. are separate for separate
shards and cannot be used across shards. So whatever hash tables we use
to find them also need to be per-shard. If we fail to do this, we
get errors running the debug build with multiple shards.
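
One common way to get a per-shard table is a thread_local one, assuming
each shard runs on its own thread. The sketch below illustrates the idea
with hypothetical names (type_stub, lookup_type); it is not the code
changed by this patch.

```cpp
#include <memory>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a per-shard type singleton.
struct type_stub {
    std::string name;
};

std::shared_ptr<type_stub> lookup_type(const std::string& name) {
    // thread_local: every thread (shard) gets its own table, so an object
    // created on one shard is never handed out on another.
    thread_local std::unordered_map<std::string, std::shared_ptr<type_stub>> table;
    auto& slot = table[name];
    if (!slot) {
        slot = std::make_shared<type_stub>(type_stub{name});
    }
    return slot;
}
```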
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190804165904.14204-1-nyh@scylladb.com>
Attributes used to be written into the database in raw JSON format,
which is far from optimal. This patch introduces more robust
serialization routines for simple alternator types: S, B, BOOL, N.
Serialization uses the first byte to encode attribute type
and follows with serializing data in binary form.
More complex types (sets, lists, etc.) are currently still
serialized in raw JSON and will be optimized in follow-up patches.
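
The layout described above (one type byte followed by the binary payload)
can be sketched as follows. The numeric tag values and the function names
are assumptions made for this example, not the values used by the patch.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>
#include <string_view>
#include <utility>

// Tag values are arbitrary choices for this sketch.
enum class alternator_type : std::uint8_t { S = 0, B = 1, BOOL = 2, N = 3 };

// First byte encodes the attribute type; the rest is the payload.
std::string serialize(alternator_type type, std::string_view payload) {
    std::string out;
    out.reserve(1 + payload.size());
    out.push_back(static_cast<char>(type));
    out.append(payload);
    return out;
}

// Splits the serialized form back into its type tag and payload view.
// The returned view points into `bytes`, which must outlive it.
std::pair<alternator_type, std::string_view> deserialize(std::string_view bytes) {
    if (bytes.empty()) {
        throw std::invalid_argument("serialized attribute has no type byte");
    }
    const auto tag = static_cast<std::uint8_t>(bytes[0]);
    return {static_cast<alternator_type>(tag), bytes.substr(1)};
}
```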
Message-Id: <10955606455bbe9165affb8ac8fba4d9e7c3705f.1559646761.git.sarna@scylladb.com>