scylladb

Author	SHA1	Message	Date
Avi Kivity	f3eade2f62	treewide: relicense to ScyllaDB-Source-Available-1.0 Drop the AGPL license in favor of a source-available license. See the blog post [1] for details. [1] https://www.scylladb.com/2024/12/18/why-were-moving-to-a-source-available-license/	2024-12-18 17:45:13 +02:00
Yaniv Kaul	c658bdb150	Typos: fix typos in comments Fixes some typos as found by codespell run on the code. In this commit, I was hoping to fix only comments, not user-visible alerts, output, etc. Follow-up commits will take care of them. Refs: https://github.com/scylladb/scylladb/issues/16255 Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>	2023-12-02 22:37:22 +02:00
Avi Kivity	fcb8d040e8	treewide: use Software Package Data Exchange (SPDX) license identifiers Instead of lengthy blurbs, switch to single-line, machine-readable standardized (https://spdx.dev) license identifiers. The Linux kernel switched long ago, so there is strong precedent. Three cases are handled: AGPL-only, Apache-only, and dual-licensed. For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0), reasoning that our changes are extensive enough to apply our license. The changes were applied mechanically with a script, except to licenses/README.md. Closes #9937	2022-01-18 12:15:18 +01:00
Botond Dénes	cd6bbd37a4	utils/utf8.c: move includes outside of namespaces Including in the middle of a namespace is not a good practice. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20210528142502.962947-1-bdenes@scylladb.com>	2021-05-30 23:23:20 +03:00
Avi Kivity	3c292e31af	utils: utf8: fix validate_partial() on non-SIMD-optimized architectures validate_partial() is declared in the internal namespace, but defined outside it. This causes calls to validate_partial() to be ambiguous on architectures that haven't been SIMD-optimized yet (e.g. s390x). Fix by defining it in the internal namespace. Closes #8268	2021-03-23 09:21:14 +02:00
Avi Kivity	3d1be9286f	utils: utf8: expose validate_partial() in a header Since fragmented buffers are templates, we'll need access to validate_partial() in a header. Move it there.	2020-10-21 11:14:44 +03:00
Avi Kivity	22a0c457e2	utils: utf8: introduce validate_partial() The current validators expect the buffer to contain a full UTF-8 string. This won't be the case for fragmented buffers, since a codepoint can straddle two (or more) buffers. To prepare for that, convert the existing validators to validate_partial(), which returns either an error, or success with an indication of the size of the tail that was not validated and how many bytes it is missing. This is natural since the SIMD validators already cannot process a tail in SIMD mode if it's smaller than the vector size, so only minor rearrangements are needed. In addition, we now have validate_partial() for non-SIMD architectures, since we'll need it for fragmented buffer validation.	2020-10-21 11:14:44 +03:00
Avi Kivity	900699f1b5	utils: utf8: extract a function to evaluate a single codepoint Our SIMD-optimized validators cannot process a codepoint that spans multiple buffers, and adapting them to do so would slow them down. So our strategy is to special-case any codepoint that spans two buffers. To do that, extract an evaluate_codepoint() function from the current validate_naive() function. It returns one of three results: if a codepoint was successfully decoded from the buffer, how many bytes were consumed; if not enough bytes were in the buffer, how many more are needed; otherwise, an indication that an error happened. The new function uses a table to calculate a codepoint's size from its first byte, similar to the SIMD variants. validate_naive() is now implemented in terms of evaluate_codepoint().	2020-10-21 11:14:43 +03:00
Avi Kivity	31a5378a82	utils: utf8: avoid harmless integer overflow 240 doesn't fit in char without overflow, so cast it explicitly to avoid a clang warning.	2020-09-22 17:24:33 +03:00
Piotr Grabowski	ffd8c8c505	utf8: Print invalid UTF-8 character position Add a new validate_with_error_position function, which returns -1 if the data is a valid UTF-8 string and otherwise the byte position of the first invalid character. The position is added to the exception messages of all UTF-8 parsing errors in Scylla. validate_with_error_position works in two passes in order to preserve the same performance in the common case, when the string is valid.	2020-09-07 18:11:21 +03:00
Yibo Cai (Arm Technology China)	6fadba56cc	utils: optimize UTF-8 validation UTF-8 strings are currently validated by boost::locale::conv::utf_to_utf, which actually performs a string conversion, more work than necessary. As observed on an Arm server, UTF-8 validation can become a bottleneck under heavy load. This patch introduces a brand-new SIMD implementation supporting both NEON and SSE, as well as a naive approach to handle short strings. The naive approach is 3x faster than boost utf_to_utf, while the SIMD method outperforms the naive approach by 3x to 5x on Arm and x86. Details at https://github.com/cyb70289/utf8/. A UTF-8 unit test is added to check various corner cases. Signed-off-by: Yibo Cai <yibo.cai@arm.com> Message-Id: <1543978498-12123-1-git-send-email-yibo.cai@arm.com>	2018-12-05 21:51:01 +02:00

11 Commits
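The evaluate_codepoint() and validate_partial() commits above describe two pieces:

- A scalar decoder with a three-way result for each codepoint: bytes consumed, bytes still needed, or an error.
- A partial validator that reports the unvalidated tail of a buffer and how many bytes that tail is missing.

Below is a minimal C++ sketch of that idea under stated assumptions. It is not ScyllaDB's actual code. The names `status`, `codepoint_status`, `partial_result`, the nibble-indexed length table and the `validate_fragments()` helper, which carries a straddling codepoint into the next fragment, are all hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>
#include <vector>

// Hypothetical three-way result for one codepoint.
enum class status { ok, need_more, error };
struct codepoint_status {
    status kind;
    size_t n; // ok: bytes consumed; need_more: bytes still missing; error: unused
};

// Sequence length indexed by the high nibble of the lead byte; 0 marks a
// byte that cannot start a codepoint (a continuation byte 0x80..0xBF).
constexpr uint8_t len_by_high_nibble[16] = {1, 1, 1, 1, 1, 1, 1, 1,
                                            0, 0, 0, 0, 2, 2, 3, 4};

inline codepoint_status evaluate_codepoint(const uint8_t* p, size_t avail) {
    const uint8_t b0 = p[0];
    const size_t len = len_by_high_nibble[b0 >> 4];
    // Reject continuation bytes, overlong 2-byte leads (0xC0, 0xC1) and
    // leads above 0xF4 (codepoints beyond U+10FFFF).
    if (len == 0 || (len == 2 && b0 < 0xC2) || (len == 4 && b0 > 0xF4)) {
        return {status::error, 0};
    }
    // Allowed range of the second byte, narrowed after some leads to reject
    // overlong forms, surrogates (U+D800..U+DFFF) and values above U+10FFFF.
    uint8_t lo = 0x80, hi = 0xBF;
    if (b0 == 0xE0) lo = 0xA0;
    else if (b0 == 0xED) hi = 0x9F;
    else if (b0 == 0xF0) lo = 0x90;
    else if (b0 == 0xF4) hi = 0x8F;
    // Check whatever part of the sequence is present, so errors surface early.
    const size_t have = std::min(avail, len);
    for (size_t i = 1; i < have; ++i) {
        const uint8_t first = (i == 1) ? lo : 0x80;
        const uint8_t last = (i == 1) ? hi : 0xBF;
        if (p[i] < first || p[i] > last) return {status::error, 0};
    }
    if (avail < len) return {status::need_more, len - avail};
    return {status::ok, len};
}

struct partial_result {
    bool valid;       // false if an invalid sequence was found
    size_t tail_size; // trailing bytes of an incomplete codepoint, not yet validated
    size_t missing;   // bytes that tail still needs from the next fragment
};

inline partial_result validate_partial(const uint8_t* data, size_t size) {
    size_t pos = 0;
    while (pos < size) {
        const codepoint_status st = evaluate_codepoint(data + pos, size - pos);
        switch (st.kind) {
        case status::ok: pos += st.n; break;
        case status::need_more: return {true, size - pos, st.n};
        case status::error: return {false, 0, 0};
        }
    }
    return {true, 0, 0};
}

// Hypothetical helper: validates a buffer split into fragments by carrying a
// codepoint that straddles a boundary over into the next fragment.
inline bool validate_fragments(const std::vector<std::string_view>& frags) {
    uint8_t carry[4];   // bytes of a codepoint split across fragments
    size_t carry_len = 0;
    size_t missing = 0; // bytes the carried codepoint still needs
    for (std::string_view f : frags) {
        auto p = reinterpret_cast<const uint8_t*>(f.data());
        size_t n = f.size();
        while (missing > 0 && n > 0) { // move completing bytes into `carry`
            carry[carry_len++] = *p++;
            --n;
            --missing;
        }
        if (missing > 0) continue; // fragment exhausted; wait for more bytes
        if (carry_len > 0) {
            if (!validate_partial(carry, carry_len).valid) return false;
            carry_len = 0;
        }
        const partial_result r = validate_partial(p, n);
        if (!r.valid) return false;
        assert(r.tail_size <= 3); // an incomplete codepoint is at most 3 bytes
        std::copy(p + (n - r.tail_size), p + n, carry);
        carry_len = r.tail_size;
        missing = r.missing;
    }
    return missing == 0; // input must not end in the middle of a codepoint
}
```

Only the short tail of an incomplete codepoint, at most three bytes, is ever copied. The bulk of each fragment can therefore still go through a fast validator. This sketch simply uses the scalar `validate_partial()` for both roles.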
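The validate_with_error_position commit above keeps the common, valid case at full speed. It runs an ordinary yes/no validation first and rescans for the offset only when that check fails.

The following is a hypothetical C++ sketch of that two-pass structure, with these assumptions:

- The signature is inferred from the commit description: it returns -1 for valid input, otherwise the byte offset of the first invalid sequence.
- `sequence_length()` and `is_valid_utf8()` are illustrative names.
- The 8-byte ASCII skip in the first pass only stands in for ScyllaDB's SIMD validator.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string_view>

// Length of the well-formed UTF-8 sequence at p[0], or 0 if it is invalid or
// truncated by the end of the buffer.
inline size_t sequence_length(const uint8_t* p, size_t avail) {
    const uint8_t b0 = p[0];
    if (b0 < 0x80) return 1;
    size_t len;
    uint8_t lo = 0x80, hi = 0xBF; // allowed range of the second byte
    if (b0 >= 0xC2 && b0 <= 0xDF) {
        len = 2;
    } else if (b0 >= 0xE0 && b0 <= 0xEF) {
        len = 3;
        if (b0 == 0xE0) lo = 0xA0; // reject overlong forms
        if (b0 == 0xED) hi = 0x9F; // reject surrogates
    } else if (b0 >= 0xF0 && b0 <= 0xF4) {
        len = 4;
        if (b0 == 0xF0) lo = 0x90; // reject overlong forms
        if (b0 == 0xF4) hi = 0x8F; // reject values above U+10FFFF
    } else {
        return 0;
    }
    if (avail < len || p[1] < lo || p[1] > hi) return 0;
    for (size_t i = 2; i < len; ++i) {
        if (p[i] < 0x80 || p[i] > 0xBF) return 0;
    }
    return len;
}

// First pass: yes/no only. The 8-byte ASCII skip stands in for a SIMD validator.
inline bool is_valid_utf8(std::string_view s) {
    const auto* p = reinterpret_cast<const uint8_t*>(s.data());
    const size_t n = s.size();
    size_t pos = 0;
    while (pos < n) {
        if (n - pos >= 8) {
            uint64_t word;
            std::memcpy(&word, p + pos, 8);
            if ((word & 0x8080808080808080ULL) == 0) { // no byte has its top bit set
                pos += 8;
                continue;
            }
        }
        const size_t len = sequence_length(p + pos, n - pos);
        if (len == 0) return false;
        pos += len;
    }
    return true;
}

// Returns -1 for valid input, otherwise the offset of the first invalid sequence.
inline int64_t validate_with_error_position(std::string_view s) {
    if (is_valid_utf8(s)) {
        return -1; // common case: a single fast pass
    }
    // Rare case: rescan sequence by sequence, this time tracking the offset.
    const auto* p = reinterpret_cast<const uint8_t*>(s.data());
    size_t pos = 0;
    while (pos < s.size()) {
        const size_t len = sequence_length(p + pos, s.size() - pos);
        if (len == 0) return static_cast<int64_t>(pos);
        pos += len;
    }
    return -1; // not reached: both passes accept exactly the same inputs
}
```

Valid strings never reach the second loop, so the offset bookkeeping costs nothing on the hot path. Only invalid input pays for a second scan, and that input is about to produce an error message anyway.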