Commit Graph

440 Commits

Author SHA1 Message Date
Avi Kivity
a11ecfe231 Merge 'types: don't linearize in validate()' from Michał Chojnowski
A sequel to #7692.

This series gets rid of linearization when validating collections and tuple types. (Other types were already validated without linearizing).
The necessary helpers for reading from fragmented buffers were introduced in #7692. All this series does is put them to use in `validate()`.

Refs: #6138

Closes #7770

* github.com:scylladb/scylla:
  types: add single-fragment optimization in validate()
  utils: fragment_range: add with_simplified()
  cql3: statements: select_statement: remove unnecessary use of with_linearized
  cql3: maps: remove unnecessary use of with_linearized
  cql3: lists: remove unnecessary use of with_linearized
  cql3: tuples: remove unnecessary use of with_linearized
  cql3: sets: remove unnecessary use of with_linearized
  cql3: tuples: remove unnecessary use of with_linearized
  cql3: attributes: remove unnecessary uses of with_linearized
  types: validate lists without linearizing
  types: validate tuples without linearizing
  types: validate sets without linearizing
  types: validate maps without linearizing
  types: template abstract_type::validate on FragmentedView
  types: validate_visitor: transition from FragmentRange to FragmentedView
  utils: fragmented_temporary_buffer: add empty() to FragmentedView
  utils: fragmented_temporary_buffer: don't add to null pointer
2020-12-11 17:33:59 +02:00
Michał Chojnowski
150473f074 types: add single-fragment optimization in validate()
Manipulating fragmented views is costlier that manipulating contiguous views,
so let's detect the common situation when the fragmented view is actually
contiguous underneath, and make use of that.

Note: this optimization is only useful for big types. For trivial types,
validation usually only checks the size of the view.
2020-12-11 09:53:07 +01:00
Michał Chojnowski
0581b3ff31 types: validate lists without linearizing
We can validate collections directly from fragmented buffers now.
2020-12-11 09:53:07 +01:00
Michał Chojnowski
4fe41b69fd types: validate tuples without linearizing
We can validate tuples directly from fragmented buffers now.
2020-12-11 09:53:07 +01:00
Michał Chojnowski
a7dd736d03 types: validate sets without linearizing
We can validate collections directly from fragmented buffers now.
2020-12-11 09:53:07 +01:00
Michał Chojnowski
1459608375 types: validate maps without linearizing
We can validate collections directly from fragmented buffers now.
2020-12-11 09:53:07 +01:00
Michał Chojnowski
82befbe8c0 types: template abstract_type::validate on FragmentedView
This is primarily a stylistic change. It makes the interface more consistent
with deserialize(). It will also allow us to call `validate()` for collection
elements in `validate_aux()`.
2020-12-11 09:53:07 +01:00
Michał Chojnowski
15dbe00e8a types: validate_visitor: transition from FragmentRange to FragmentedView
This will allow us to easily get rid of linearizations when validating
collections and tuples, because the helpers used in validate_aux() already
have FragmentedView overloads.
2020-12-11 09:53:07 +01:00
Michał Chojnowski
d43fd456cd types: switch serialize_for_cql from bytes to bytes_ostream
Now we can serialize collections from collection_mutation_view_description
without linearizations.
2020-12-07 17:55:36 +01:00
Michał Chojnowski
81a55b032d types: switch serialize_for_cql_aux from bytes to bytes_ostream
We will switch serialize_for_cql itself to bytes_ostream soon.
2020-12-07 17:55:35 +01:00
Michał Chojnowski
71183cf0bd types: serialize user types to bytes_ostream
Avoids linearization by serializing to a fragmented type.
It's still linearized at the very end, this will be changed in the near future.
2020-12-07 17:52:06 +01:00
Michał Chojnowski
41b889d0c8 types: serialize lists to bytes_ostream
Avoids linearization by serializing to a fragmented type.
It's still linearized at the very end, this will be changed in the near future.
2020-12-07 17:49:21 +01:00
Michał Chojnowski
2b3d2c193d types: serialize sets to bytes_ostream
Avoids linearization by serializing to a fragmented type.
It's still linearized at the very end, this will be changed in the near future.
2020-12-07 17:47:49 +01:00
Michał Chojnowski
35823d12db types: serialize maps to bytes_ostream
Avoids linearization by serializing to a fragmented type.
It's still linearized at the very end, this will be changed in the near future.
2020-12-07 17:47:12 +01:00
Michał Chojnowski
1fe7490970 types: add write_collection_value() overload for bytes_ostream and value_view
We will use it to serialize collections to bytes_ostream in serialize_for_cql().
2020-12-07 08:48:31 +01:00
Michał Chojnowski
a1f7fabb3d types: collection: add an optimization for single-fragment buffers in deserialize
Helpers parametrized with single_fragmented_view should compile to better code,
so let's use them when possible.
2020-12-04 09:21:05 +01:00
Michał Chojnowski
08c394726e types: add an optimization for single-fragment buffers in deserialize
Values usually come in a single fragment, but we pay the cost of fragmented
deserialization nevertheless: bigger view objects (4 words instead of 2 words)
more state to keep updated (i.e. total view size in addition to current fragment
size) and more branches.

This patch adds a special case for single-fragment buffers to
abstract_type::deserialize. They are converted to a single_fragmented_view
before doing anything else. Templates instantiated with single_fragmented_view
should compile to better code than their multi-fragmented counterparts. If
abstract_type::deserialize is inlined, this patch should completely prevent any
performance penalties for switching from with_linearized to fragmented
deserialization.
2020-12-04 09:19:39 +01:00
Michał Chojnowski
04786dee30 types: remove unneeded collection deserialization overloads
Inherit the method from base class rather than reimplementing it in every child.
2020-12-04 09:19:39 +01:00
Michał Chojnowski
c08419e28d types: switch collection_type_impl::deserialize from bytes_view to FragmentedView
Devirtualizes collection_type_impl::deserialize (so it can be templated) and
adds a FragmentedView overload. This will allow us to deserialize collections
with explicit cql_serialization_format directly from fragmented buffers.
2020-12-04 09:19:37 +01:00
Michał Chojnowski
58d9f52363 types: remove unused deserialize_aux
Dead code.
2020-12-03 10:57:07 +01:00
Michał Chojnowski
8440279130 types: deserialize: don't linearize tuple elements
We can deserialize directly from fragmented buffers now.
2020-12-03 10:57:07 +01:00
Michał Chojnowski
a216b0545f types: deserialize: don't linearize collection elements
We can deserialize directly from fragmented buffers now.
2020-12-03 10:57:06 +01:00
Michał Chojnowski
1ccdfc7a90 types: switch deserialize from bytes_view to FragmentedView
The final part of the transition of deserialize from bytes_view to
FragmentedView.
Adds a FragmentedView overload to abstract_type::deserialize and
switches deserialize_visitor from bytes_view to FragmentedView, allowing
deserialization of all types with no intermediate linearization.
2020-12-03 10:57:06 +01:00
Michał Chojnowski
898cea4cde types: deserialize tuple types from FragmentedView
A part of the transition of deserialize from bytes_view to FragmentedView.
2020-12-03 10:57:06 +01:00
Michał Chojnowski
507883f808 types: deserialize set type from FragmentedView
A part of the transition of deserialize from bytes_view to FragmentedView.
2020-12-03 10:57:06 +01:00
Michał Chojnowski
9b211a7285 types: deserialize map type from FragmentedView
A part of the transition of deserialize from bytes_view to FragmentedView.
2020-12-03 10:57:06 +01:00
Michał Chojnowski
5f1939554c types: deserialize list type from FragmentedView
A part of the transition of deserialize from bytes_view to FragmentedView.
2020-12-03 10:57:06 +01:00
Michał Chojnowski
ad7ab73cd0 types: add FragmentedView versions of read_collection_size and read_collection_value
We will need those to deserialize collections from FragmentedView.
2020-12-03 10:57:06 +01:00
Michał Chojnowski
495bf5c431 types: deserialize varint type from FragmentedView
A part of the transition of deserialize from bytes_view to FragmentedView.
2020-12-03 10:57:06 +01:00
Michał Chojnowski
0f8ad89740 types: deserialize floating point types from FragmentedView
A part of the transition of deserialize from bytes_view to FragmentedView.
2020-12-03 10:57:06 +01:00
Michał Chojnowski
0bb0291e50 types: deserialize decimal type from FragmentedView
A part of the transition of deserialize from bytes_view to FragmentedView.
2020-12-03 10:57:06 +01:00
Michał Chojnowski
760bc5fd60 types: deserialize duration type from FragmentedView
A part of the transition of deserialize from bytes_view to FragmentedView.
2020-12-03 10:57:06 +01:00
Michał Chojnowski
75a56f439b types: deserialize IP address types from FragmentedView
A part of the transition of deserialize from bytes_view to FragmentedView.
2020-12-03 10:57:06 +01:00
Michał Chojnowski
9f668929db types: deserialize uuid types from FragmentedView
A part of the transition of deserialize from bytes_view to FragmentedView.
2020-12-03 10:57:06 +01:00
Michał Chojnowski
3e1a24ca0d types: deserialize timestamp type from FragmentedView
A part of the transition of deserialize from bytes_view to FragmentedView.
2020-12-03 10:57:06 +01:00
Michał Chojnowski
a4bc43ab19 types: deserialize simple date type from FragmentedView
A part of the transition of deserialize from bytes_view to FragmentedView.
2020-12-03 10:57:06 +01:00
Michał Chojnowski
24bd986aea types: deserialize time type from FragmentedView
A part of the transition of deserialize from bytes_view to FragmentedView.
2020-12-03 10:57:06 +01:00
Michał Chojnowski
c03ad52513 types: deserialize boolean type from FragmentedView
A part of the transition of deserialize from bytes_view to FragmentedView.
2020-12-03 10:57:06 +01:00
Michał Chojnowski
2f351928e2 types: deserialize integer types from FragmentedView
A part of the transition of deserialize from bytes_view to FragmentedView.
2020-12-03 10:57:06 +01:00
Michał Chojnowski
28b727082f types: deserialize string types from FragmentedView
A part of the transition of deserialize from bytes_view to FragmentedView.
2020-12-03 10:57:06 +01:00
Piotr Wojtczak
caa3c471c0 Validate ascii values when creating from CQL
Although the code for it existed already, the validation function
hasn't been invoked properly. This change fixes that, adding
a validating check when converting from text to specific value
type and throwing a marshal exception if some characters
are not ASCII.

Fixes #5421

Closes #7532
2020-11-02 16:47:32 +02:00
Nadav Har'El
6740907f3d Merge 'utf8: don't linearize cells for validation' from Avi Kivity
Currently, we linearize large UTF8 cells in order to validate them.
This can cause large latency spikes if the cell is large.

This series changes UTF8 validation to work on fragmented buffers.
This is somewhat tricky since the validation routines are optimized
for single-instruction-multiple-data (SIMD) architectures.

The unit tests are expanded to cover the new functionality.

Fixes #7448.

Closes #7449

* github.com:scylladb/scylla:
  types: don't linearize utf8 for validation
  test: utf8: add fragmented buffer validation tests
  utils: utf8: add function to validate fragmented buffers
  utils: utf8: expose validate_partial() in a header
  utils: utf8: introduce validate_partial()
  utils: utf8: extract a function to evaluate a single codepoint
2020-10-21 20:51:15 +03:00
Avi Kivity
c0ca54395a types: don't linearize utf8 for validation
Use the new non-linearizing validator, avoiding linearization.
Linearization can cause large contiguous memory allocations, which
in turn causes latency spikes.

Fixes #7448.
2020-10-21 11:14:44 +03:00
Avi Kivity
ed6775c585 types: adjust validation_visitor construction for clang
Clang does not implement P0960R3, parenthesized initialization of
aggregates, so we have to use brace initialization in
validation_visitor. It also does not implement class template
argument deduction for aggregates (P1816r0), so we have to
specify the template parameters explicity.
2020-10-19 14:53:00 +03:00
Avi Kivity
affa234151 types: don't linearize ascii during validation
ascii has no inter-byte dependencies and so can
be validated fragment by fragment, reducing large
contiguous allocations.

Fixes #7393.

Closes #7394
2020-10-12 13:15:24 +03:00
Avi Kivity
4c63723ead types: tighten digit count requirement on time nanoseconds components
When the number of nanosecond digits is greater than 9, the std::pow()
expression that corrects the nanosecond value becomes infinite. This
is because sstring::length() is unsigned, and so negative values
underflow and become large.

Following Cassandra, fix by forbidding more than 9 digits of
nanosecond precision.

Found by clang's ubsan.

Closes #7371
2020-10-11 14:13:46 +03:00
Rafael Ávila de Espíndola
a3bd546197 types: Work around a clang thread-local code generation bug (user_type)
Following 5d249a8e27, apply the same
fix for user_type_impl.

This works around https://bugs.llvm.org/show_bug.cgi?id=47747

Depending on this might be unstable, as the bug bug can show up at any
corner, but this is sufficient right now to get
test_user_function_disabled to pass.

Closes #7370
2020-10-11 12:36:38 +03:00
Botond Dénes
db56ae695c types: validate(): linearize values lazily
Instead of eagerly linearizing all values as they are passed to
validate(), defer linearization to those validators that actually need
linearized values. Linearizing large values puts pressure on the memory
allocator with large contiguous allocation requests. This is something
we are trying to actively avoid, especially if it is not really neaded.
Turns out the types, whose validators really want linearized values are
a minority, as most validators just look at the size of the value, and
some like bytes don't need validation at all, while usually having large
values.

This is achieved by templating the validator struct on the view and
using the FragmentedRange concept to treat all passed in views
(`bytes_view` and `fragmented_temporary_buffer_view`) uniformly.
This patch makes no attempt at converting existing validators to work
with fragmented buffers, only trivial cases are converted. The major
offenders still left are ascii/utf8 and collections.

Fixes: #7318

Tests: unit(dev)
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20201007054524.909420-1-bdenes@scylladb.com>
2020-10-07 11:00:18 +03:00
Rafael Ávila de Espíndola
5d249a8e27 types: Work around a clang thread-local code generation bug
This works around https://bugs.llvm.org/show_bug.cgi?id=47747

Depending on this might be unstable, as the bug bug can show up at any
corner, but this is sufficient right now to get
test_user_function_disabled to pass.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20201007000713.1503302-1-espindola@scylladb.com>
2020-10-07 09:49:53 +03:00
Benny Halevy
f8d9e81bdb types: time_point_to_string: prevent overflow of nanoseconds
Due to #7175, microseconds are stored in a db_clock::time_point
as if they were milliseconds.

std::chrono::duration_cast<std::chrono::nanoseconds> may cause overflow
and end up with invalid/negative nanos.

This change specializes time_point_to_string to std::chrono::milliseconds
since it's currently only called to print db_clock::time_point
and uses boost::posix_time::milliseconds to print the count.

This would generate an exception in today's time stamps
and the output will look like:
    1599493018559873 milliseconds (Year is out of valid range: 1400..9999)
instead of:
    1799-07-16T19:57:52.175010

It is preferrable to print the numeric value annotated as out of valid range
than to print a bogus date in the past.

Test: unit(dev), commitlog_test:TestCommitLog.test_mixed_mode_commitlog_same_partition_smp_1
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20200907162845.147477-1-bhalevy@scylladb.com>
2020-09-08 10:02:02 +03:00