Values usually come in a single fragment, but we pay the cost of fragmented
deserialization nevertheless: bigger view objects (4 words instead of 2 words)
more state to keep updated (i.e. total view size in addition to current fragment
size) and more branches.
This patch adds a special case for single-fragment buffers to
abstract_type::deserialize. They are converted to a single_fragmented_view
before doing anything else. Templates instantiated with single_fragmented_view
should compile to better code than their multi-fragmented counterparts. If
abstract_type::deserialize is inlined, this patch should completely prevent any
performance penalties for switching from with_linearized to fragmented
deserialization.
Devirtualizes collection_type_impl::deserialize (so it can be templated) and
adds a FragmentedView overload. This will allow us to deserialize collections
with explicit cql_serialization_format directly from fragmented buffers.
The final part of the transition of deserialize from bytes_view to
FragmentedView.
Adds a FragmentedView overload to abstract_type::deserialize and
switches deserialize_visitor from bytes_view to FragmentedView, allowing
deserialization of all types with no intermediate linearization.
Although the code for it existed already, the validation function
hasn't been invoked properly. This change fixes that, adding
a validating check when converting from text to specific value
type and throwing a marshal exception if some characters
are not ASCII.
Fixes#5421Closes#7532
Currently, we linearize large UTF8 cells in order to validate them.
This can cause large latency spikes if the cell is large.
This series changes UTF8 validation to work on fragmented buffers.
This is somewhat tricky since the validation routines are optimized
for single-instruction-multiple-data (SIMD) architectures.
The unit tests are expanded to cover the new functionality.
Fixes#7448.
Closes#7449
* github.com:scylladb/scylla:
types: don't linearize utf8 for validation
test: utf8: add fragmented buffer validation tests
utils: utf8: add function to validate fragmented buffers
utils: utf8: expose validate_partial() in a header
utils: utf8: introduce validate_partial()
utils: utf8: extract a function to evaluate a single codepoint
Use the new non-linearizing validator, avoiding linearization.
Linearization can cause large contiguous memory allocations, which
in turn causes latency spikes.
Fixes#7448.
Clang does not implement P0960R3, parenthesized initialization of
aggregates, so we have to use brace initialization in
validation_visitor. It also does not implement class template
argument deduction for aggregates (P1816r0), so we have to
specify the template parameters explicity.
When the number of nanosecond digits is greater than 9, the std::pow()
expression that corrects the nanosecond value becomes infinite. This
is because sstring::length() is unsigned, and so negative values
underflow and become large.
Following Cassandra, fix by forbidding more than 9 digits of
nanosecond precision.
Found by clang's ubsan.
Closes#7371
Following 5d249a8e27, apply the same
fix for user_type_impl.
This works around https://bugs.llvm.org/show_bug.cgi?id=47747
Depending on this might be unstable, as the bug bug can show up at any
corner, but this is sufficient right now to get
test_user_function_disabled to pass.
Closes#7370
Instead of eagerly linearizing all values as they are passed to
validate(), defer linearization to those validators that actually need
linearized values. Linearizing large values puts pressure on the memory
allocator with large contiguous allocation requests. This is something
we are trying to actively avoid, especially if it is not really neaded.
Turns out the types, whose validators really want linearized values are
a minority, as most validators just look at the size of the value, and
some like bytes don't need validation at all, while usually having large
values.
This is achieved by templating the validator struct on the view and
using the FragmentedRange concept to treat all passed in views
(`bytes_view` and `fragmented_temporary_buffer_view`) uniformly.
This patch makes no attempt at converting existing validators to work
with fragmented buffers, only trivial cases are converted. The major
offenders still left are ascii/utf8 and collections.
Fixes: #7318
Tests: unit(dev)
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20201007054524.909420-1-bdenes@scylladb.com>
Due to #7175, microseconds are stored in a db_clock::time_point
as if they were milliseconds.
std::chrono::duration_cast<std::chrono::nanoseconds> may cause overflow
and end up with invalid/negative nanos.
This change specializes time_point_to_string to std::chrono::milliseconds
since it's currently only called to print db_clock::time_point
and uses boost::posix_time::milliseconds to print the count.
This would generate an exception in today's time stamps
and the output will look like:
1599493018559873 milliseconds (Year is out of valid range: 1400..9999)
instead of:
1799-07-16T19:57:52.175010
It is preferrable to print the numeric value annotated as out of valid range
than to print a bogus date in the past.
Test: unit(dev), commitlog_test:TestCommitLog.test_mixed_mode_commitlog_same_partition_smp_1
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20200907162845.147477-1-bhalevy@scylladb.com>
Add new validate_with_error_position function
which returns -1 if data is a valid UTF-8 string
or otherwise a byte position of first invalid
character. The position is added to exception
messages of all UTF-8 parsing errors in Scylla.
validate_with_error_position is done in two
passes in order to preserve the same performance
in common case when the string is valid.
As seen in https://github.com/scylladb/scylla/issues/7175,
1e676cd845
that was merged in bc77939ada
exposed a preexisting problem in time_point_to_string
where it tried printing a timestamp that was in microseconds
(taken from an api::timestamp_type instead of db_clock::time_point)
and hit `boost::wrapexcept<boost::gregorian::bad_year> (Year is out of valid range: 1400..9999)`
If hit, this patch with print the offending time_stamp in
nanoseconds and the error message.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20200907083303.33229-1-bhalevy@scylladb.com>
T& tp may have other period than milliseconds.
Cast the time_point duration to nanoseconds (or microseconds
if boost doesn't supports it) so it is printed in the best possible
resolution.
Note that we presume that the time_point epoch is the
Unix epoch of 1970-01-01, but the c++ standard doesn't guwarntee
that. See https://github.com/scylladb/scylla/issues/5498
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20200906171106.690872-1-bhalevy@scylladb.com>
Seastar recently lost support for the experimental Concepts Technical
Specification (TS) and gained support for C++20 concepts. Re-enable
concepts in Scylla by updating our use of concepts to the C++20
standard.
This change:
- peels off uses of the GCC6_CONCEPT macro
- removes inclusions of <seastar/gcc6-concepts.hh>
- replaces function-style concepts (no longer supported) with
equation-style concepts
- semicolons added and removed as needed
- deprecated std::is_pod replaced by recommended replacement
- updates return type constraints to use concepts instead of
type names (either std::same_as or std::convertible_to, with
std::same_as chosen when possible)
No attempt is made to improve the concepts; this is a specification
update only.
Message-Id: <20200531110254.2555854-1-avi@scylladb.com>
Fixed-size integer types are legal varints - both are serialized as
two's complement in network byte order. So there's tinyint, shortint,
int, and bigint can be interpreted as varints.
Change is_compatible_with() to reflect that.
Message-Id: <20200516115143.28690-2-avi@scylladb.com>
The short and byte types are two's complement network byte order,
just like varint (except fixed size) and so varint can read them
just fine.
Mark them as value compatible like int32_type and long_type.
A unit test is added.
Message-Id: <20200516115143.28690-1-avi@scylladb.com>
So that nested exceptions are not lost. Also, marshal exceptions, the
ones we have in these places, already have a backtrace, so might as well
use that, instead of creating a new one, loosing unwound frames.
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20200507091405.244544-1-bdenes@scylladb.com>
Queries with `ALLOW FILTERING` and constraints on counter
values used to be rejected as "unimplemented". The reason
was a missing tri-comparator, which is added in this patch.
Fixes#5635
* jul-stas-5635-filtering-on-counters:
cql/tests: Added test for filtering on counter columns
counters: add comparator and remove `unimplemented` from restrictions