scylladb

Author	SHA1	Message	Date
Kefu Chai	a0e5c14c55	alternator: not include unused headers these unused includes were identified by clangd. see https://clangd.llvm.org/guides/include-cleaner#unused-include-warning for more details on the "Unused include" warning. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#16736	2024-01-12 10:53:32 +02:00
Yaniv Kaul	c658bdb150	Typos: fix typos in comments Fixes some typos as found by codespell run on the code. In this commit, I was hoping to fix only comments, not user-visible alerts, output, etc. Follow-up commits will take care of them. Refs: https://github.com/scylladb/scylladb/issues/16255 Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>	2023-12-02 22:37:22 +02:00
Nadav Har'El	3c0603558c	alternator: add validation of numbers' magnitude and precision DynamoDB limits the allowed magnitude and precision of numbers - valid decimal exponents are between -130 and 125 and up to 38 significant decimal digits are allowed. In contrast, Scylla uses the CQL "decimal" type which offers unlimited precision. This can cause two problems: 1. Users might get used to this "unofficial" feature and start relying on it, not allowing us to switch to a more efficient limited-precision implementation later. 2. If huge exponents are allowed, e.g., 1e-1000000, summing such a number with 1.0 will result in a huge number, huge allocations and stalls. This is highly undesirable. After this patch, all tests in test/alternator/test_number.py now pass. The various failing tests which verify magnitude and precision limitations in different places (key attributes, non-key attributes, and arithmetic expressions) now pass - so their "xfail" tags are removed. Fixes #6794 Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2023-05-02 11:04:05 +03:00
Kefu Chai	0cb842797a	treewide: do not define/capture unused variables these warnings are found by Clang-17 after removing `-Wno-unused-lambda-capture` and '-Wno-unused-variable' from the list of disabled warnings in `configure.py`. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-02-15 22:57:18 +02:00
Avi Kivity	c5e4bf51bd	Introduce mutation/ module Move mutation-related files to a new mutation/ directory. The names are kept in the global namespace to reduce churn; the names are unambiguous in any case. mutation_reader remains in the readers/ module. mutation_partition_v2.cc was missing from CMakeLists.txt; it's added in this patch. This is a step forward towards librarization or modularization of the source base. Closes #12788	2023-02-14 11:19:03 +02:00
Marcin Maliszkiewicz	6f055ca5f9	alternator: evaluate expressions as false for stored malformed binary data We'll try to distinguish the case when data comes from the storage rather than user request. Such attribute can be used in expressions and when it can't be decoded it should make expression evaluate as false to simply exclude the row during filter query or scan. Note that this change focuses on binary type, for other types we may have some inconsistencies in the implementation.	2023-01-16 15:15:27 +01:00
Marcin Maliszkiewicz	bcbaccc143	rjson: avoid copy constructors in from_string calls when possible This function anyway copies the value so no need to do extra copy.	2023-01-16 15:15:26 +01:00
Avi Kivity	2739ac66ed	treewide: drop cql_serialization_format Now that we don't accept cql protocol version 1 or 2, we can drop cql_serialization_format everywhere, except when in the IDL (since it's part of the inter-node protocol). A few functions had duplicate versions, one with and one without a cql_serialization_format parameter. They are deduplicated. Care is taken that `partition_slice`, which communicates the cql_serialization_format across nodes, still presents a valid cql_serialization_format to other nodes when transmitting itself and rejects protocol 1 and 2 serialization format when receiving. The IDL is unchanged. One test checking the 16-bit serialization format is removed.	2023-01-03 19:54:13 +02:00
Botond Dénes	f1a039fc2b	treewide: use ::for_partition_start() instead of ::partition_start_tag_t{} We just added a convenience static factory method for partition start, change the present users of the clunky constructor+tag to use it instead.	2022-11-11 09:58:18 +02:00
Botond Dénes	2b6eeadc07	alternator: use position-in-partition in paging cookie only when reading CQL tables Recently, we added full position-in-partition support to alternator's paging cookie, so it can support stopping at arbitrary positions. This support however is only really needed when tables have range tombstones and alternator tables never have them. So to avoid having to make the new fields in 'ExclusiveStartKey' reserved, we avoid filling these in when reading an alternator table, as in this case it is safe to assume the position is `after_key($clustering_key)`. We do include these new members however when reading CQL tables through alternator. As this is only supported for system tables, we can also be sure that the elaborate names we used for these fields are enough to avoid naming clashes. The condition in the code implementing this is actually even more general: it only includes the region/weight members when the position differs from that of a normal alternator one.	2022-06-30 15:10:30 +03:00
Botond Dénes	2b0bc11f2e	service/paging: use position_in_partition instead of clustering_key for last row The former allows for expressing more positions, like a position before/after a clustering key. This practically enables the coordinator side paging logic, for a query to be stopped at a tombstone (which can have said positions).	2022-06-23 13:36:20 +03:00
Botond Dénes	adabe3b5a3	alternator/serialization: extract value object parsing logic To make it reusable by a method added by the next patch.	2022-06-23 11:33:18 +03:00
Avi Kivity	fcb8d040e8	treewide: use Software Package Data Exchange (SPDX) license identifiers Instead of lengthy blurbs, switch to single-line, machine-readable standardized (https://spdx.dev) license identifiers. The Linux kernel switched long ago, so there is strong precedent. Three cases are handled: AGPL-only, Apache-only, and dual licensed. For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0), reasoning that our changes are extensive enough to apply our license. The changes were applied mechanically with a script, except to licenses/README.md. Closes #9937	2022-01-18 12:15:18 +01:00
Nadav Har'El	f7e984110d	alternator: add another unwrap_number() variant We have an unwrap_number() function which in case of data errors (such as the value not being a number) throws an exception with a given string used in the message. In this patch we add a variant of unwrap_number() - try_unwrap_number() - which doesn't take a message, and doesn't throw exceptions - instead it returns an empty std::optional if the given value is not a number. This function is useful in places where we need to know if we got a number or not, and both outcomes are fine rather than errors. We'll use it in a following patch to parse expiration times for the TTL feature. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2021-11-25 22:01:37 +02:00
Nadav Har'El	253387ea07	alternator: implement AttributeUpdates DELETE operation with Value In the DynamoDB API, UpdateItem's AttributeUpdates parameter (the older syntax, which was superseded by UpdateExpression) has a DELETE operation that can do two different things: It can delete an attribute, or it can delete elements from a set. Before this patch we only implemented the first feature, and this patch implements the second. Note that unlike the ordinary delete, the second feature - set subtraction - is a read-modify-write operation. This is not only because of Alternator's serialization (as JSON strings, not CRDTs) - but also fundamentally because of the API's guarantees - e.g., the operation is supposed to fail if the attribute's existing value is not a set of the correct type, so it needs to read the old value. The test for this feature begins to pass, so its "xfail" mark is removed. After this, all tests in test/alternator/test_item.py pass :-) Fixes #5864. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20211103151206.157184-1-nyh@scylladb.com>	2021-11-23 08:51:06 +01:00
Nadav Har'El	6e1344eb4f	alternator: better error handling for wrongly-encoded numbers In the DynamoDB API, a number is encoded in JSON requests as something like: {"N": "123"} - the type is "N" and the value "123". Note that the value of the number is encoded as a string, because the floating-point range and accuracy of DynamoDB differs from what various JSON libraries may support. We have a function unwrap_number() which supported the value of the number being encoded as an actual number, not a string. But we should NOT support this case - DynamoDB doesn't. In this patch we add a test that confirms that DynamoDB doesn't, and remove the unnecessary case from unwrap_number(). The unnecessary case also had a FIXME, so it's a good opportunity to get rid of a FIXME. When writing the test, I noticed that the error which DynamoDB returns in this case is SerializationException instead of the more usual ValidationException. I don't know why, but let's also change the error type in this patch. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20211115125738.197099-1-nyh@scylladb.com>	2021-11-15 14:47:49 +01:00
Nadav Har'El	5e52858295	rjson, alternator: rename set() functions to add() The rjson::set() sounds like it can set any member of a JSON object (i.e., map), but that's not true :-( It calls the RapidJson function AddMember() so it can only add a member to an object which doesn't have a member with the same name (i.e., key). If it is called with a key that already has a value, the result may have two values for the same key, which is ill-formed and can cause bugs like issue #9542. So in this patch we begin by renaming rjson::set() and its variant to rjson::add() - to suggest to its user that this function only adds members, without checking if they already exist. After this rename, I was left with dozens of calls to the set() functions that need to be changed to either add() - if we're sure that the object cannot already have a member with the same name - or to replace() if it might. The vast majority of the set() calls were starting with an empty item and adding members with fixed (string constant) names, so these can be trivially changed to add(). It turns out that all other set() calls - except the one fixed in issue #9542 - can also use add() because there are various "excuses" why we know the member names will be unique. A typical example is a map with column-name keys, where we know that the column names are unique. I added comments in front of such non-obvious uses of add() which are safe. Almost all uses of rjson except a handful are in Alternator, so I verified that all Alternator test cases continue to pass after this patch. Fixes #9583 Refs #9542 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20211104152540.48900-1-nyh@scylladb.com>	2021-11-04 16:35:38 +01:00
Nadav Har'El	7e6c5394f3	alternator: move list_concatenate() function The list_concatenate() function was only used for UpdateExpression's ADD operation, so we made it a static function in the source file where it was used. In the next patch, we'll want to use it in another place (AttributeUpdates' ADD operation), so let's move it to the same file where similar functions for sets exist. This patch is almost entirely a code move, but also makes one small change: list_concatenate() used to throw an exception if one of the arguments wasn't a list, but the text of this exception was specific to UpdateExpression. So in the new version, we return a null value in this case - and the caller checks for it and throws the right exception. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2021-11-03 10:19:26 +02:00
Avi Kivity	d3f8148807	utils: untie rjson.hh from base64.hh base64.hh pulls in the huge rjson.hh, so if someone just wants a base64 codec they have to pull in the entire rapidjson library. Move the json related parts of base64.hh to rjson.hh and adjust includes and namespaces. In practice it doesn't make much difference, as all users of base64 appear to want json too. But it's cleaner not to mix the two. Closes #9433	2021-10-05 12:57:54 +02:00
Piotr Dulikowski	5a0942a0f8	utils,alternator: move base64 code from alternator to utils The base64 encoding/decoding functions will be used for serialization of hint sync point descriptions. Base64 format is not specific to Alternator, so it can be moved to utils.	2021-08-09 09:24:36 +02:00
Avi Kivity	a55b434a2b	treewide: extend copyright statements to present day	2021-06-06 19:18:49 +03:00
Piotr Jastrzebski	c001374636	codebase wide: replace count with contains C++20 introduced `contains` member functions for maps and sets for checking whether an element is present in the collection. Previously `count` function was often used in various ways. `contains` does not only express the intent of the code better but also does it in a more unified way. This commit replaces all the occurrences of the `count` with the `contains`. Tests: unit(dev) Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Message-Id: <b4ef3b4bc24f49abe04a2aba0ddd946009c9fcb2.1597314640.git.piotr@scylladb.com>	2020-08-15 20:26:02 +03:00
Nadav Har'El	bca88521ba	alternator: use api_error::validation() All the places in conditions.cc, expressions.cc and serialization.cc where we constructed an api_error, we always used the ValidationException type string, which the code repeated dozens of times. This patch converts all these places to use the factory function api_error::validation(). Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2020-07-23 15:36:39 +03:00
Piotr Sarna	7ae3b25d8e	alternator: cleanup raw GetString() calls Instead of using raw GetString() from rapidjson, it's neater to use a helper for creating string views: rjson::to_string_view(). Message-Id: <3afda97403d4601c9600f6838f2028bfabd2f2f9.1594289250.git.sarna@scylladb.com>	2020-07-09 13:58:40 +03:00
Piotr Sarna	96426df72e	alternator: translate number errors to ValidationException In order to be consistent with returned error types, marshaling exceptions thrown from parsing big decimals are translated to ValidationException. Message-Id: <1446878cd63ad8291327a399cf700e4f402d108c.1594289250.git.sarna@scylladb.com>	2020-07-09 13:58:25 +03:00
Piotr Sarna	4cb79f04b0	treewide: replace libjsoncpp usage with rjson In order to eventually switch to a single JSON library, most of the libjsoncpp usage is dropped in favor of rjson. Unfortunately, one usage still remains: test/utils/test_repl utility heavily depends on the exact textual format of its output JSON files, so replacing a library results in all tests failing because of differences in formatting. It is possible to force rjson to print its documents in the exact matching format, but that's left for later, since the issue is not critical. It would be nice though if our test suite compared JSON documents with a real JSON parser, since there are more differences - e.g. libjsoncpp keeps children of the object sorted, while rapidjson uses an unordered data structure. This change should cause no change in semantics, it strives just to replace all usage of libjsoncpp with rjson.	2020-07-03 10:27:23 +02:00
Nadav Har'El	493d7e6716	alternator: avoid unnecessary conversion to string In a couple of places, where we already have a std::string_view, there is no need to convert to a std::string (which requires allocation). One cool observation (by Piotr Sarna) is that map over std::string_view is fine, when the strings in the map are always string constants. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2020-06-14 12:16:26 +03:00
Nadav Har'El	8c026b9f10	alternator: move some code out of executor.cc The source file alternator/executor.cc has grown too much, reaching almost 4,000 lines. In this patch I move about 400 lines out of executor.cc: 1. Some functions related to serialization of sets and lists were moved to serialization.cc, 2. Functions related to evaluating parsed expressions were moved to expressions.cc. The header file expressions_eval.hh was also removed - the calculate_value() functions now live in expressions.cc, so we can just define them in expressions.hh, no need for a separate header file. This patch just moves code around. It doesn't make any functional changes. Refs #5783. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2020-06-14 12:16:26 +03:00
Piotr Sarna	7006389f69	alternator: refuse empty strings/binary blobs in keys In order to be compatible with DynamoDB, we should refuse items whose keys contain empty strings or byte blobs.	2020-05-19 11:32:18 +02:00
Piotr Sarna	df02fc6b06	alternator: add fallback serialization for all types While most types (e.g. boolean) are not valid key types for alternator users, system tables derived from Scylla may still use this type for keys, e.g. system_auth.roles. Note that types which are not directly supported by alternator (e.g. double) will not be representable out-of-the-box - instead, they simply fall back to string, which is both human-readable and supported by alternator.	2020-04-09 09:41:30 +02:00
Piotr Sarna	0bb211a65f	alternator: defuse a serialization path time bomb The default serialization path for items was subtly broken - instead of parsing JSON string representation of objects, it tried to parse a regular string implementation - which is often also a valid JSON, but nothing guarantees that it actually is. Tests: alternator-test(local) Message-Id: <e1668bf4e9029f2675a4ac28bb4598714575efeb.1586096732.git.sarna@scylladb.com>	2020-04-05 18:55:54 +03:00
Nadav Har'El	0fcb226412	alternator: switch rjson::find() to use std::string_view Our rjson::find() convenience function used RapidJson's "StringRef" type, which is almost exactly like std::string_view. If we switch to use string_view as we do in this patch, a lot of call sites become much simpler. Moreover, there was an even more important motivation for this patch: the RapidJson FindMember() function we used in rjson::find() has a bug when given a StringRef - although a StringRef contains a length, the FindMember() code ignores it and expects the string to be null-terminated (see: https://github.com/Tencent/rapidjson/issues/1649). In this patch, we wrap the pointer and length of a std::string_view in an rjson::value, a code path which bypasses the FindMember bug, and yet does not require copying the string. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200303141814.26929-1-nyh@scylladb.com>	2020-03-03 16:35:41 +01:00
Piotr Sarna	6f8c70d54b	alternator: fix returning raw JSON errors A couple of places in executor code leaked raw JSON errors to the user instead of formulating a proper ValidationException message. These places are now fixed, and the next patch in this series will act as a regression checker, since all JSON errors will be returned as SerializationException, not ValidationException instances.	2020-02-28 07:57:12 +02:00
Piotr Sarna	0af8516675	alternator: remove rjson::parse_raw With parse() being based on std::string_view, there's not much sense in keeping a separate parse_raw function, so it's deleted.	2020-02-28 07:57:12 +02:00
Nadav Har'El	15515b2cc1	alternator: more useful get_key_from_typed_value() utility function We had a get_key_from_typed_value() utility function to decode a JSON-encoded value with a known type (the JSON encoding is a map whose key is the type, the value always a string because all possible key types - string, bytes and number, are encoded as strings). However, the function was less useful than it could have been - it was missing one check for a malformed object (a check which only appeared in one of its callers), it unnecessarily received the column's expected type (all the callers passed it the given key column's type). The cleaned up function will be more useful for the following patch to support KeyConditionExpression, which wants to reuse it. While at it, this patch also uses rjson::to_string_view(it->value) instead of the less correct it->value.GetString() (the latter relies on null-termination, which is actually true for JSON strings, but there is no reason to rely on it). Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200213192509.32685-3-nyh@scylladb.com>	2020-02-16 11:22:30 +02:00
Piotr Sarna	9504bbf5a4	alternator: move unwrap_set to serialization header The utility function for unwrapping a set is going to be useful across source files, so it's moved to serialization.hh/serialization.cc.	2019-12-10 15:08:47 +01:00
Rafael Ávila de Espíndola	786b1ec364	types: Move json code to its own file Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20191120181213.111758-7-espindola@scylladb.com>	2019-11-21 12:08:49 +02:00
Dejan Mircevski	ceae3c182f	alternator: Overload base64_decode on rjson::value In `1ca9dc5d47`, it was established that the correct way to base64-decode a JSON value is via string_view, rather than directly from GetString(). This patch adds a base64_decode(rjson::value) overload, which automatically uses the correct procedure. It saves typing, ensures correctness (fixing one incorrect call found), and will come in handy for future EXPECTED comparisons. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-10-31 15:56:03 -04:00
Dejan Mircevski	9955f0342f	alternator: Make unwrap_number() visible unwrap_number() is now a public function in serialization.hh instead of a static function visible only in executor.cc. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-10-31 10:46:30 -04:00
Piotr Sarna	f922d6d771	alternator: Add 'mismatch' to serialization error message In order to match the tests and origin more properly, the error message for mismatched types is updated so it contains the word 'mismatch'.	2019-09-11 18:01:05 +03:00
Nadav Har'El	c9eb9d9c76	alternator: update license blurbs Update all the license blurbs to the one we use in the open-source Scylla project, licensed under the AGPL. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190825160321.10016-1-nyh@scylladb.com>	2019-09-11 18:01:05 +03:00
Piotr Sarna	1ca9dc5d47	alternator: use correct string views in serialization String views used in JSON serialization should use not only the pointer returned by rapidjson, but also the string length, as it may contain \0 characters. Additionally, one unnecessary copy is elided.	2019-09-11 18:01:05 +03:00
Piotr Sarna	ab25472034	alternator: migrate to visitor pattern in serialization Types can now be processed with a visitor pattern, which is more neat than a chain of if statements. Message-Id: <256429b7593d8ad8dff737d8ddb356991fb2a423.1566386758.git.sarna@scylladb.com>	2019-09-11 18:01:05 +03:00
Piotr Sarna	88eed415bd	alternator: fix indentation It turns out that recent rjson patches introduced some buggy tabs instead of spaces due to bad IDE configuration. The indentation is restored to spaces.	2019-09-11 18:01:05 +03:00
Piotr Sarna	9c05051b59	alternator: extract getting key value subfunction Currently the only utility function for getting key bytes from JSON was to parse a document with the following format: "key_column_name" : { "key_column_type" : VALUE }. However, it's also useful to parse only the inner document, i.e.: { "key_column_type" : VALUE }.	2019-09-11 18:01:05 +03:00
Piotr Sarna	cb29d6485e	alternator: migrate to rapidjson library Profiling alternator implied that JSON parsing takes up a fair amount of CPU, and as such should be optimized. libjsoncpp is a standard library for handling JSON objects, but it also proves slower than rapidjson, which is hereby used instead. The results indicated that libjsoncpp used roughly 30% of CPU for a single-shard alternator instance under stress, while rapidjson dropped that usage to 18% without optimizations. Future optimizations should include eliding object copying, string copying and perhaps experimenting with different JSON allocators.	2019-09-11 18:01:04 +03:00
Nadav Har'El	1b1ede9288	alternator: fix cross-shard use of CQL type objects The CQL type singletons like utf8_type et al. are separate for separate shards and cannot be used across shards. So whatever hash tables we use to find them also need to be per-shard. If we fail to do this, we get errors running the debug build with multiple shards. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190804165904.14204-1-nyh@scylladb.com>	2019-09-11 16:05:39 +03:00
Piotr Sarna	b67f22bfc6	alternator: move related functions to serialization.cc Existing functions related to serialization and deserialization are moved to serialization.cc source file. Message-Id: <fb49a08b05fdfcf7473e6a7f0ac53f6eaedc0144.1559646761.git.sarna@scylladb.com>	2019-09-11 15:06:05 +03:00
Piotr Sarna	b3fd4b5660	alternator: add simple attribute serialization routines Attributes used to be written into the database in raw JSON format, which is far from optimal. This patch introduces more robust serialization routines for simple alternator types: S, B, BOOL, N. Serialization uses the first byte to encode attribute type and follows with serializing data in binary form. More complex types (sets, lists, etc.) are currently still serialized in raw JSON and will be optimized in follow-up patches. Message-Id: <10955606455bbe9165affb8ac8fba4d9e7c3705f.1559646761.git.sarna@scylladb.com>	2019-09-11 15:01:07 +03:00

49 Commits