Commit Graph

240 Commits

Author SHA1 Message Date
Avi Kivity
24caf0824d Merge "Complete the LIKE operator" from Dejan
"
Implement LIKE parsing, intermediate representation, and query processing. Add tests
for this implementation (leaving the LIKE functionality tests in
tests/like_matcher_test.cc).

Refs #4477.
"

* 'finish-like' of https://github.com/dekimir/scylla:
  cql3: Add LIKE operator to CQL grammar
  cql3: Ensure LIKE filtering for partition columns
  cql3: Add LIKE restriction
  cql3: Add LIKE relation
2019-07-06 12:26:08 +03:00
kbr-
8995945052 Implement tuple_type_impl::to_string_impl. (#4645)
Resolves #4633.

Signed-off-by: Kamil Braun <kbraun@scylladb.com>
2019-07-06 12:26:08 +03:00
Dejan Mircevski
21d7722594 cql3: Add LIKE relation
Add a new type of relation with operator LIKE.  Handle it in
relation::to_restriction by introducing a new virtual method for it.
The temporary implementation of this method returns null; that will be
replaced in a subsequent patch.

Add abstract_type::is_string() to recognize string columns and
disallow the LIKE operator on non-string columns.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-07-04 10:54:30 +02:00
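The LIKE relation above only adds parsing and restriction plumbing; the matching semantics live elsewhere (tests/like_matcher_test.cc is mentioned in the merge). As a rough illustration of what "matching a LIKE pattern" means, here is a minimal sketch assuming standard SQL wildcard semantics: `%` matches any sequence and `_` matches exactly one byte. `like_match` is a hypothetical name, escapes and UTF-8 code points are not handled, and this is not Scylla's like_matcher.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: true if `text` matches the LIKE `pattern`.
// '%' matches any sequence (including empty), '_' matches exactly one byte.
// Escape sequences and multi-byte UTF-8 characters are not handled here.
bool like_match(std::string_view text, std::string_view pattern) {
    std::size_t t = 0, p = 0;
    std::size_t star_p = std::string_view::npos;  // index of the last '%' seen
    std::size_t star_t = 0;                       // text index when it was seen
    while (t < text.size()) {
        if (p < pattern.size() && pattern[p] == '%') {
            star_p = p++;  // remember the wildcard; first try matching empty
            star_t = t;
        } else if (p < pattern.size() && (pattern[p] == '_' || pattern[p] == text[t])) {
            ++t;
            ++p;
        } else if (star_p != std::string_view::npos) {
            p = star_p + 1;  // backtrack: let the last '%' absorb one more byte
            t = ++star_t;
        } else {
            return false;
        }
    }
    while (p < pattern.size() && pattern[p] == '%') {
        ++p;  // trailing '%' can match the empty remainder
    }
    return p == pattern.size();
}
```

The greedy-with-backtracking loop keeps only the most recent `%` position, which is enough because an earlier `%` can never need to absorb more once a later one has been reached.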
Tomasz Grabiec
3e30a33e31 Merge "Introduce tests::random_schema" from Botond
Most of our tests use overly simplistic schemas (`simple_schema`) or
very specialized ones that focus on exercising a specific area of the
tested code. This is fine in most places, as not all code is schema
dependent; however, practice has shown that nasty bugs can hide in
dark corners and only appear with a schema that has a
specific combination of types.

This series introduces `tests::random_schema`, a utility class for
generating random schemas and random data for them. An important goal is
to make using random schemas in tests as simple and convenient as
possible, thereby encouraging more tests to use random
schemas.

Random schema was developed to help test code I'm currently working
on, which segregates data by time windows. As I wasn't confident in my
ability to think of every possible combination of types that can break
my code, I came up with random_schema to help me find these corner
cases. So far I consider it a success: it has already found bugs in my code
that I'm not sure I would have found if I had relied on specific
schemas. It also found bugs in unrelated areas of the code, which proves
my point in the first paragraph.

* https://github.com/denesb/scylla.git random_schema/v5:
  tests/data_model: approximate to the modeled data structures
  data_value: add ascii constructor
  tests/random-utils.hh: add stepped_int_distribution
  tests/random-utils.hh: get_int() add overloads that accept external
    rand engine
  tests/random-utils.hh: add get_real()
  tests: introduce random_schema
2019-06-26 18:10:20 +02:00
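The random_schema merge argues that randomly combined types surface bugs that hand-written schemas miss. The sketch below illustrates only that idea: a seeded generator of nested CQL type names. `random_cql_type` and its type list are hypothetical and do not reflect the actual `tests::random_schema` API.

```cpp
#include <cstddef>
#include <random>
#include <string>
#include <vector>

// Hypothetical generator: returns a random CQL type name, nesting frozen
// collections and tuples up to `max_depth` levels deep.
std::string random_cql_type(std::mt19937& rng, int depth = 0, int max_depth = 2) {
    static const std::vector<std::string> simple = {
        "int", "bigint", "text", "ascii", "timestamp", "decimal", "varint", "blob"};
    // At the depth limit only simple types are allowed.
    std::uniform_int_distribution<int> kind(0, depth < max_depth ? 3 : 0);
    switch (kind(rng)) {
    case 1:
        return "frozen<list<" + random_cql_type(rng, depth + 1, max_depth) + ">>";
    case 2:
        return "frozen<set<" + random_cql_type(rng, depth + 1, max_depth) + ">>";
    case 3: {
        // Generate sequentially so the result is reproducible for a given seed.
        std::string first = random_cql_type(rng, depth + 1, max_depth);
        std::string second = random_cql_type(rng, depth + 1, max_depth);
        return "frozen<tuple<" + first + ", " + second + ">>";
    }
    default: {
        std::uniform_int_distribution<std::size_t> pick(0, simple.size() - 1);
        return simple[pick(rng)];
    }
    }
}
```

Seeding the engine explicitly matters for tests: a failing combination can be replayed by logging and reusing the seed.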
Botond Dénes
572a738777 collection: use chunked_vector to store cells
This is a quick fix for the immediate problem of large collections causing
large allocations, which trigger stalls or OOM. The proper fix is to
use IMR for storing the cells, but that is a complex change that will
take time, so let's avoid stalls and OOMs in the meantime.
2019-06-26 11:40:44 +03:00
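The chunked_vector fix works because storage is split into fixed-size chunks, so no single allocation grows with the collection. A minimal sketch of that idea follows; `simple_chunked_vector` is a hypothetical class and not Scylla's `utils::chunked_vector`, whose interface may differ.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical chunked container: elements live in chunks of ChunkSize, so
// each allocation is bounded regardless of how many elements are stored.
// Only the small outer vector of chunk headers grows contiguously.
template <typename T, std::size_t ChunkSize = 128>
class simple_chunked_vector {
    std::vector<std::vector<T>> _chunks;
    std::size_t _size = 0;
public:
    void push_back(T v) {
        if (_chunks.empty() || _chunks.back().size() == ChunkSize) {
            _chunks.emplace_back();
            _chunks.back().reserve(ChunkSize);  // one bounded allocation per chunk
        }
        _chunks.back().push_back(std::move(v));
        ++_size;
    }
    T& operator[](std::size_t i) { return _chunks[i / ChunkSize][i % ChunkSize]; }
    std::size_t size() const { return _size; }
    std::size_t chunk_count() const { return _chunks.size(); }
};
```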
Botond Dénes
c68ffc330e types: don't copy collection_type_impl::mutation_view
Despite being a view, it is not cheap to copy.
2019-06-26 11:39:41 +03:00
Botond Dénes
a3f9932a2f data_value: add ascii constructor
To allow a `data_value` with `ascii_type` to be constructed.
2019-06-25 12:01:33 +03:00
Rafael Ávila de Espíndola
65ac0a831c Add to_string_impl that takes a data_value
Currently to_string takes raw bytes. This means that to print a
data_value, it first has to be serialized so it can be passed to to_string,
which then deserializes it.

This patch adds a virtual to_string_impl that takes a data_value and
implements a now non-virtual to_string on top of it.

I don't expect this to have a performance impact. It mostly documents
how to access a data_value without converting it to bytes.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190620183449.64779-3-espindola@scylladb.com>
2019-06-23 16:03:06 +03:00
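The change follows the non-virtual interface pattern: public non-virtual `to_string` overloads funnel into one virtual hook that receives an already-deserialized value. The sketch below shows that shape with simplified, hypothetical stand-ins (`value` for data_value, a toy `long_type` that decodes 8 big-endian bytes); it is not Scylla's actual class hierarchy.

```cpp
#include <cstdint>
#include <string>

// Hypothetical stand-in for data_value: an already-deserialized value.
struct value {
    int64_t v;
};

class abstract_type {
public:
    virtual ~abstract_type() = default;
    // Non-virtual entry point: printing a value needs no serialization round-trip.
    std::string to_string(const value& v) const { return to_string_impl(v); }
    // Raw bytes are deserialized once, then printed through the same hook.
    std::string to_string(const std::string& raw) const { return to_string_impl(deserialize(raw)); }
protected:
    virtual value deserialize(const std::string& raw) const = 0;
    virtual std::string to_string_impl(const value& v) const = 0;
};

class long_type : public abstract_type {
protected:
    value deserialize(const std::string& raw) const override {
        uint64_t u = 0;
        for (unsigned char c : raw) {
            u = (u << 8) | c;  // big-endian byte order
        }
        return value{static_cast<int64_t>(u)};
    }
    std::string to_string_impl(const value& v) const override { return std::to_string(v.v); }
};
```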
Piotr Sarna
f50f418066 types: isolate deserializing iterator to separate file
In order to be used outside types.cc, the list-like deserializing iterator
is moved to a separate header.

Message-Id: <d9416e6a8d170aa4936826b54ca7be4acb4ec8e6.1559745816.git.sarna@scylladb.com>
2019-06-05 17:46:51 +03:00
Piotr Sarna
b3396dbb57 types: migrate to_json_string to use bytes view
The to_json_string utility implementation was based on const references
instead of views, which can be a source of unnecessary memory copying.
This patch migrates all to_json_string implementations to use bytes_view and
keeps the const reference version as a thin wrapper.

Message-Id: <2bf9f1951b862f8e8a2211cb4e83852e7ac70c67.1559654014.git.sarna@scylladb.com>
2019-06-04 19:17:46 +03:00
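The view-plus-thin-wrapper arrangement described above can be sketched as follows. This assumes `std::string_view` as a stand-in for Scylla's `bytes_view` and a toy text-to-JSON conversion that escapes only `"` and `\`. Both overloads are hypothetical illustrations, not the real to_json_string.

```cpp
#include <string>
#include <string_view>

// Core implementation works on a non-owning view, so callers holding any
// buffer can use it without copying the input first.
// Sketch only: control characters are not escaped.
std::string to_json_string(std::string_view v) {
    std::string out = "\"";
    for (char c : v) {
        if (c == '"' || c == '\\') {
            out += '\\';
        }
        out += c;
    }
    out += '"';
    return out;
}

// Thin wrapper kept for existing callers that pass an owning buffer.
std::string to_json_string(const std::string& v) {
    return to_json_string(std::string_view(v));
}
```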
Paweł Dziepak
49b4aeca4d Merge "hinted handoff: prevent sending attempts" from Vlad
"
Fix the broken logic that is meant to prevent sending hints when a node is
in the DOWN NORMAL (DN) state.
"

* 'hinted_handoff_stop_sending_to_down_node-v2' of https://github.com/vladzcloudius/scylla:
  hints_manager: rename the state::ep_state_is_not_normal enum value
  hinted handoff: fix the logic that detects that the destination node is in DN state
  hinted_handoff: sender::can_send(): optimize gossiper::is_alive(ep) check
  hinted handoff: end_point_hints_manager::sender: use _gossiper instead of _shard_manager.local_gossiper()
  types.cc: fix the compilation with fmt v5.3.0
2019-05-09 15:18:57 +01:00
Avi Kivity
43867fe618 types: fix pessimizing moves
Remove pessimizing moves, as reported by gcc 9.
2019-05-07 10:01:36 +03:00
Vlad Zolotarov
fe82437dea types.cc: fix the compilation with fmt v5.3.0
Compilation fails with fmt release 5.3.0 when we print a bytes_view
using "{}" formatter.

The compiler's complaint is: "error: static assertion failed: mismatch
between char-types of context and argument"

Fix this by explicitly using to_hex() converter.

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2019-04-25 23:04:02 -04:00
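The fmt fix sidesteps the char-type mismatch by converting the bytes to a plain hex string before formatting. A minimal sketch of such a converter follows, assuming `std::string_view` as a stand-in for `bytes_view`. This `to_hex` is a hypothetical reimplementation, not Scylla's function of the same name.

```cpp
#include <string>
#include <string_view>

// Hypothetical converter: renders each byte as two lowercase hex digits,
// producing an ordinary char string that any "{}" formatter accepts.
std::string to_hex(std::string_view bytes) {
    static const char digits[] = "0123456789abcdef";
    std::string out;
    out.reserve(bytes.size() * 2);
    for (unsigned char b : bytes) {
        out += digits[b >> 4];
        out += digits[b & 0x0f];
    }
    return out;
}
```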
Paweł Dziepak
85409c1a16 Merge "Validate elements of collections" from Piotr
"
Previously we weren't validating the elements of collections, so it
was possible to add a non-UTF-8 string to a column of type
list<text>.

Tests: unit(release)

Fixes #4009
"

* 'haaawk/4009/v5' of github.com:scylladb/seastar-dev:
  types: Test correct map validation
  types: Test correct in clause validation
  types: Test correct tuple validation
  types: Test correct set validation
  types: Test correct list validation
  types: Add test_tuple_elements_validation
  types: Add test_in_clause_validation
  types: Add test_map_elements_validation
  types: Add test_set_elements_validation
  types: Add test_list_elements_validation
  types: Validate input when tuples
  types: Validate input when parsing a set
  types: Validate input when parsing a map
  types: Validate input when parsing a list
  types: Implement validation for tuple
  types: Implement validation for set
  types: Implement validation for map
  types: Implement validation for list
  types: Add cql_serialization_format parameter to validate
2019-04-18 19:07:14 +03:00
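Validating a collection means walking its serialized form and checking every element with the element type's validator. The sketch below assumes the CQL native protocol v3+ list layout (a 4-byte big-endian element count, then each element as a 4-byte big-endian length followed by that many bytes). `validate_list` is a hypothetical helper that takes the element validator as a callback. For brevity, the usage below checks ASCII rather than UTF-8.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string_view>

// Reads a 4-byte big-endian int and advances `in`; false if truncated.
static bool read_i32(std::string_view& in, int32_t& out) {
    if (in.size() < 4) {
        return false;
    }
    uint32_t v = 0;
    for (int i = 0; i < 4; ++i) {
        v = (v << 8) | static_cast<unsigned char>(in[i]);
    }
    out = static_cast<int32_t>(v);
    in.remove_prefix(4);
    return true;
}

// Hypothetical validator for a serialized list: checks the framing and runs
// `validate_element` on every element. Negative (null) lengths are rejected.
bool validate_list(std::string_view in,
                   const std::function<bool(std::string_view)>& validate_element) {
    int32_t count;
    if (!read_i32(in, count) || count < 0) {
        return false;
    }
    for (int32_t i = 0; i < count; ++i) {
        int32_t len;
        if (!read_i32(in, len) || len < 0 || static_cast<std::size_t>(len) > in.size()) {
            return false;  // malformed length or truncated element
        }
        if (!validate_element(in.substr(0, len))) {
            return false;  // e.g. invalid string bytes inside list<text>
        }
        in.remove_prefix(len);
    }
    return in.empty();  // trailing bytes are also an error
}
```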
Botond Dénes
6e85d1e8c1 date_type_impl: add notice explaining why its not used
...and why it is still in the code. The note has been copied from Origin.

Refs: #4419
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <c7790a898c331a7f58014d82a10cbc9ee7ad3265.1555483620.git.bdenes@scylladb.com>
2019-04-18 19:07:14 +03:00
Botond Dénes
f201f8abab types: fix date_type_impl::less() (timestamp cql type)
date_type_impl::less() invokes `compare_unsigned()` to compare the
underlying raw byte values. `compare_unsigned()` is a tri-comparator
(it returns a negative, zero, or positive value); however,
`date_type_impl::less()` implicitly converted the returned
value to bool. In effect, `date_type_impl::less()` would *always* return
`true` when the two compared values were not equal.

Found while working on a unit test that employs a randomly generated
schema to test a component.

Fixes #4419.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <8a17c81bad586b3772bf3d1d1dae0e3dc3524e2d.1554907100.git.bdenes@scylladb.com>
2019-04-10 21:01:25 +03:00
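The bug pattern is easy to reproduce in isolation: converting a three-way comparison result to `bool` turns every non-zero result, including "greater", into `true`. The sketch uses a hypothetical `compare_unsigned` over `std::string_view` in place of Scylla's bytes_view version.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <string_view>

// Three-way comparison of raw bytes as unsigned values: <0, 0 or >0.
int compare_unsigned(std::string_view a, std::string_view b) {
    std::size_t n = std::min(a.size(), b.size());
    int r = n ? std::memcmp(a.data(), b.data(), n) : 0;  // memcmp compares as unsigned char
    if (r != 0) {
        return r;
    }
    return a.size() < b.size() ? -1 : (a.size() > b.size() ? 1 : 0);
}

// Buggy: any non-zero result converts to true, so less(a, b) and less(b, a)
// are both true whenever a != b.
bool less_buggy(std::string_view a, std::string_view b) { return compare_unsigned(a, b); }

// Fixed: only a negative tri-compare result means "a < b".
bool less_fixed(std::string_view a, std::string_view b) { return compare_unsigned(a, b) < 0; }
```

A `less` that returns true in both directions violates strict weak ordering, so sorted containers and merges built on it silently misbehave rather than fail loudly.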
Piotr Jastrzebski
8482764003 types: Implement validation for tuple
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-04-09 16:58:22 +02:00
Piotr Jastrzebski
bd2823b623 types: Implement validation for set
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-04-09 16:58:22 +02:00
Piotr Jastrzebski
086d8abf89 types: Implement validation for map
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-04-09 16:58:22 +02:00
Piotr Jastrzebski
4a51ee6e34 types: Implement validation for list
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-04-09 16:58:22 +02:00
Piotr Jastrzebski
f5f6367674 types: Add cql_serialization_format parameter to validate
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-04-09 16:58:22 +02:00
Avi Kivity
a77762b02a Merge "Optimise vint deserialisation" from Paweł
"

Variable length integers are used extensively by the SSTables mc
format. The current deserialisation routine is quite naive in that
it reads each byte separately. Since those vints usually appear inside
much larger buffers, we optimise for such cases: read 8 bytes at once,
then mask out the unneeded parts (and fix the byte order, since vints
are stored big-endian).

Tests: unit(dev).

perf_vint (average time per element when deserializing 1000 vints):

before:
vint.deserialize                            69442000    14.400ns     0.000ns    14.399ns    14.400ns

after:
vint.deserialize                           241502000     4.140ns     0.000ns     4.140ns     4.140ns

perf_fast_forward (data on /tmp):
large-partition-single-key-slice on dataset large-part-ds1:

before:
   range            time (s)   iterations     frags     frag/s    mad f/s    max f/s    min f/s    aio      (KiB) blocked dropped  idx hit idx miss  idx blk    c hit   c miss    c blk    cpu
-> [0, 1]           0.000278         8792         2       7190        119       7367       1960      3        104       2       0        0        1        1        0        0        1 100.0%
-> [1, 100)         0.000344           96        99     288100       4335     307689     193809      2        108       2       0        0        1        1        0        0        1 100.0%
-> (100, 200]       0.000339        13254       100     295263       2824     301734     222725      2        108       2       0        0        1        1        0        0        1 100.0%

after:
   range            time (s)   iterations     frags     frag/s    mad f/s    max f/s    min f/s    aio      (KiB) blocked dropped  idx hit idx miss  idx blk    c hit   c miss    c blk    cpu
-> [0, 1]           0.000236        10001         2       8461         59       8718       2261      3        104       2       0        0        1        1        0        0        1 100.0%
-> [1, 100)         0.000285           89        99     347500       2441     355826     215745      2        108       2       0        0        1        1        0        0        1 100.0%
-> (100, 200]       0.000293        14369       100     341302       1512     350123     222049      2        108       2       0        0        1        1        0        0        1 100.0%
"

* tag 'optimise-vint/v2' of https://github.com/pdziepak/scylla:
  sstable: pass full length of buffer to vint deserialiser
  vint: optimise deserialisation routine
  vint: drop deserialize_type structure
  tests/vint: reduce test dependencies
  tests/perf: add performance test for vint serialisation
2019-03-26 16:41:44 +02:00
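The optimisation can be sketched for an unsigned vint, assuming the Cassandra-style layout: the number of leading 1-bits in the first byte gives the number of extra bytes (0 to 8), and the remaining bits are stored big-endian. Both decoders below are hypothetical illustrations, not Scylla's code. The fast path assumes a little-endian host and uses the GCC/Clang builtin `__builtin_bswap64`.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Number of extra bytes = number of leading 1-bits in the first byte (0..8).
inline unsigned vint_extra_bytes(uint8_t first) {
    unsigned n = 0;
    while (n < 8 && (first & (0x80u >> n))) {
        ++n;
    }
    return n;
}

// Naive decoder: consumes the value one byte at a time.
uint64_t vint_decode_bytewise(const uint8_t* p) {
    unsigned n = vint_extra_bytes(p[0]);
    uint64_t v = p[0] & (0xffu >> n);  // strip the length prefix from byte 0
    for (unsigned i = 1; i <= n; ++i) {
        v = (v << 8) | p[i];
    }
    return v;
}

// Word-at-a-time decoder: one 8-byte load, then shift and mask.
// Needs `avail >= 8` readable bytes; otherwise falls back to the naive path.
uint64_t vint_decode_fast(const uint8_t* p, std::size_t avail) {
    unsigned n = vint_extra_bytes(p[0]);
    if (n == 8 || avail < 8) {
        return vint_decode_bytewise(p);  // 9-byte values don't fit in one word
    }
    uint64_t word;
    std::memcpy(&word, p, sizeof(word));  // unaligned-safe load
    word = __builtin_bswap64(word);       // big-endian bytes -> little-endian host
    unsigned len = n + 1;                 // encoded length in bytes (1..8)
    uint64_t v = word >> (64 - 8 * len);  // drop bytes belonging to what follows
    unsigned value_bits = 8 * len - n;    // drop the n leading length bits
    return v & ((uint64_t(1) << value_bits) - 1);
}
```

The fast path replaces a data-dependent loop with a fixed sequence of load, byte swap, shift and mask, which is where the per-element savings in the perf_vint numbers plausibly come from.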
Piotr Sarna
287a02dc05 types: fix varint and decimal serialization
Varint and decimal types serialization did not update the output
iterator after generating a value, which may lead to corrupted
sstables: variable-length integers were properly serialized,
but if anything followed them directly in the buffer (e.g. in a tuple),
their value would be overwritten.

Fixes #4348

Tests: unit (dev)
dtest: json_test.FromJsonUpdateTests.complex_data_types_test
       json_test.FromJsonInsertTests.complex_data_types_test
       json_test.ToJsonSelectTests.complex_data_types_test

Note that dtests still do not succeed 100% due to formatting differences
in compared results (e.g. 1.0e+07 vs 1.0E7), but it's no longer a query
correctness issue.
2019-03-26 11:02:43 +01:00
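The varint/decimal bug is an instance of a general rule for serializers that share one output cursor: each writer must advance the cursor it was given. The sketch below uses a hypothetical 4-byte integer writer in place of the real varint encoder to show what goes wrong when the advance is lost.

```cpp
#include <cstdint>

// Hypothetical serializer: writes a 4-byte big-endian int at `out` and
// advances `out` past the bytes it wrote, so the next field lands after it.
void serialize_i32(int32_t v, char*& out) {
    uint32_t u = static_cast<uint32_t>(v);
    for (int i = 0; i < 4; ++i) {
        *out++ = static_cast<char>((u >> (24 - 8 * i)) & 0xff);
    }
}

// Buggy variant: the cursor is taken by value, so the caller's position never
// moves and the next field overwrites this one (the symptom described above).
void serialize_i32_buggy(int32_t v, char* out) {
    serialize_i32(v, out);  // advances only the local copy
}
```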
Rafael Ávila de Espíndola
53ab298957 Turn cql3_type into a trivial wrapper over data_type
Both cql3_type and abstract_type are normally used inside
shared_ptr. This creates a problem when an abstract_type needs to refer
to a cql3_type as that creates a cycle.

To avoid warnings from asan, we were using a std::unordered_map to
store one of the edges of the cycle. This avoids the warning, but
wastes even more memory.

Even before this patch cql3_type was a fairly lightweight
structure. This patch pushes in that direction and now cql3_type is a
struct with a single member variable, a data_type.

This avoids the reference cycle and is easier to understand IMHO.

Tests: unit (dev)

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-03-20 14:10:28 -07:00
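With cql3_type reduced to a wrapper around a data_type, an abstract_type no longer needs to store a cql3_type at all. It can build one on demand, so no type -> cql3_type -> type reference cycle is ever held. The sketch below uses simplified, hypothetical stand-ins with `std::shared_ptr` (Scylla uses seastar::shared_ptr). It shows the shape of the design, not the real classes.

```cpp
#include <memory>
#include <string>
#include <utility>

class abstract_type;  // hypothetical, simplified stand-in
using data_type = std::shared_ptr<const abstract_type>;

// cql3_type holds nothing but the data_type it describes.
struct cql3_type {
    data_type type;
    explicit cql3_type(data_type t) : type(std::move(t)) {}
    const data_type& get_type() const { return type; }
};

class abstract_type : public std::enable_shared_from_this<abstract_type> {
    std::string _name;
public:
    explicit abstract_type(std::string name) : _name(std::move(name)) {}
    const std::string& name() const { return _name; }
    // Created on demand instead of cached, so no shared_ptr cycle is stored.
    cql3_type as_cql3_type() const { return cql3_type(shared_from_this()); }
};
```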
Paweł Dziepak
57de2c26b3 vint: drop deserialize_type structure
Deserialisation function returns a structure containing both the value
and its length in the input buffer. In the vast majority of the cases
the caller will already know the length and having this structure will
make it harder for the compiler to emit good code, especially if the
function is not inlined.

In practice I've seen the structure causing register pressure problems
that led to spilling variables to memory.
2019-03-14 13:37:06 +00:00
Piotr Sarna
ebf0eb92bb types: add JSON support to UDT
User defined types can now be serialized to and deserialized from JSON.

Fixes #3708
2019-03-05 16:08:05 +01:00
Piotr Sarna
aa0cc8a8a2 types: add JSON support for tuples
Tuples can now be serialized to and deserialized from JSON.

Refs #3708
2019-03-05 16:08:04 +01:00
Piotr Jastrzebski
5a5201a50b Move collection_type_impl out of types.hh to types/collection.hh
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-01-24 09:56:38 +01:00
Piotr Jastrzebski
ad016a732b Move set_type_impl out of types.hh to types/set.hh
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-01-24 09:56:38 +01:00
Piotr Jastrzebski
b1e1b66732 Move list_type_impl out of types.hh to types/list.hh
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-01-24 09:56:38 +01:00
Piotr Jastrzebski
147cc031db Move map_type_impl out of types.hh to types/map.hh
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-01-24 09:56:38 +01:00
Piotr Jastrzebski
b6b2fdc5be Move tuple_type_impl from types.hh to types/tuple.hh
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-01-24 09:56:38 +01:00
Piotr Jastrzebski
e92b4c3dbc Move user_type_impl out of types.hh to types/user.hh
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-01-24 09:04:04 +01:00
Paweł Dziepak
14757d8a83 types: collection_type: drop tombstone if covered by higher-level one
At the moment there are inefficiencies in how
collection_type_impl::mutation::compact_and_expire() handles tombstones.
If there is a higher-level tombstone that covers the collection one
(including cases where there is no collection tombstone), it will be
applied to the collection tombstone and will be present in the compaction
output. This also means that the collection tombstone is never dropped,
even if it is fully covered by a higher-level one.

This patch fixes both those problems. After the compaction the
collection tombstone is either unchanged or removed if covered by a
higher-level one.

Fixes #4092.

Message-Id: <20190118174244.15880-1-pdziepak@scylladb.com>
2019-01-20 15:32:34 +02:00
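The fixed rule can be sketched as: keep the collection tombstone unchanged if it is newer than the covering one, otherwise drop it, and never copy the covering tombstone into the collection. This sketch simplifies tombstones to a timestamp only (real ones also carry a deletion time). `compact_collection_tombstone` is a hypothetical name.

```cpp
#include <cstdint>
#include <optional>

struct tombstone {  // hypothetical, timestamp-only simplification
    int64_t timestamp;
};

// Returns the collection tombstone to keep after compaction. It is returned
// unchanged if it is newer than the covering (row/partition) tombstone and
// dropped otherwise. A missing collection tombstone is never materialized.
std::optional<tombstone> compact_collection_tombstone(std::optional<tombstone> collection,
                                                      std::optional<tombstone> covering) {
    if (collection && covering && collection->timestamp <= covering->timestamp) {
        return std::nullopt;  // fully covered by the higher-level tombstone
    }
    return collection;
}
```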
Piotr Jastrzebski
96b880f81c Add comment explaining tuple type name creation
To keep format compatibility, we never wrap the tuple type name
in "org.apache.cassandra.db.marshal.FrozenType(...)",
even when the tuple is frozen.
This patch adds a comment in tuple_type_impl::make_name that
explains the situation.

For more details see #4087

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-01-11 12:14:26 +01:00
Piotr Jastrzebski
57e655d716 Add "FrozenType(...)" to UDT name only when it's frozen
At the moment Scylla supports only frozen UDTs but
the code should be able to handle non-frozen UDTs as well.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-01-11 12:08:02 +01:00
Piotr Jastrzebski
fc17bd376b Move "FrozenType(...)" addition to UDT name to user_type_impl
This logic belongs in the types.hh/types.cc layer.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-01-11 12:07:47 +01:00
Piotr Jastrzebski
1fdfc461b8 Add "frozen<...>" to tuple CQL name only when it's frozen
At the moment Scylla supports only frozen tuples but
the code should be able to handle non-frozen tuples as well.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-01-11 11:14:30 +01:00
Piotr Jastrzebski
749eee2711 Move "frozen<...>" addition to tuple CQL name to tuple_type_impl
This logic belongs in the types.hh/types.cc layer.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-01-11 11:14:30 +01:00
Piotr Jastrzebski
7aba17de2c Merge make_cql3_tuple_type into tuple_type_impl::as_cql3_type
This logic belongs in the types.hh/types.cc layer.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-01-11 11:14:30 +01:00
Piotr Jastrzebski
56060573bb Add "frozen<...>" to UDT CQL name only when it's frozen
At the moment Scylla supports only frozen UDTs but
the code should be able to handle non-frozen UDTs as well.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-01-11 11:14:30 +01:00
Piotr Jastrzebski
a928c103c2 Move "frozen<...>" addition to UDT CQL name to user_type_impl
This logic belongs in the types.hh/types.cc layer.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-01-11 11:09:00 +01:00
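The commits in this series make the CQL name depend on whether the type is actually frozen, instead of always adding "frozen<...>". A minimal sketch of that rule for tuples follows. `tuple_cql_name` is a hypothetical helper, not `tuple_type_impl::as_cql3_type`.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: builds a tuple's CQL name from its element names,
// wrapping it in frozen<...> only when the type is actually frozen.
std::string tuple_cql_name(const std::vector<std::string>& elements, bool is_frozen) {
    std::string name = "tuple<";
    for (std::size_t i = 0; i < elements.size(); ++i) {
        if (i) {
            name += ", ";
        }
        name += elements[i];
    }
    name += ">";
    return is_frozen ? "frozen<" + name + ">" : name;
}
```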
Duarte Nunes
fa2b0384d2 Replace std::experimental types with C++17 std version.
Replace stdx::optional and stdx::string_view with their C++17 std
counterparts.

Some instances of boost::variant were also replaced with std::variant,
namely those that called seastar::visit.

Scylla now requires GCC 8 to compile.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20190108111141.5369-1-duarte@scylladb.com>
2019-01-08 13:16:36 +02:00
Yibo Cai (Arm Technology China)
422987ab04 utils: add fast ascii string validation
Validate an ASCII string by ORing all bytes together and checking that bit 7
(the most significant bit) is 0. Compared with the original std::any_of(),
which checks the string byte by byte, this new approach validates input
8 bytes at a time in two independent streams. Performance is much higher
for typical inputs, though slightly lower for very short strings. See the table below.

Speed (MB/s) of ASCII string validation
+---------------+-------------+---------+
| String length | std::any_of | u64 x 2 |
+---------------+-------------+---------+
| 9 bytes       | 1691        | 1635    |
+---------------+-------------+---------+
| 31 bytes      | 2923        | 3181    |
+---------------+-------------+---------+
| 129 bytes     | 3377        | 15110   |
+---------------+-------------+---------+
| 1039 bytes    | 3357        | 31815   |
+---------------+-------------+---------+
| 16385 bytes   | 3448        | 47983   |
+---------------+-------------+---------+
| 1048576 bytes | 3394        | 31391   |
+---------------+-------------+---------+

Signed-off-by: Yibo Cai <yibo.cai@arm.com>
Message-Id: <1544669646-31881-1-git-send-email-yibo.cai@arm.com>
2018-12-24 09:58:08 +02:00
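The OR-accumulate technique described above can be sketched as follows: OR the input into two independent 64-bit accumulators 16 bytes per iteration, finish the tail byte by byte, then test the high bit of every byte lane with one mask. `validate_ascii` is a hypothetical name, and this is not necessarily identical to the merged utils code.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Returns true if every byte has its most significant bit (bit 7) clear.
bool validate_ascii(const uint8_t* data, std::size_t len) {
    uint64_t acc1 = 0, acc2 = 0;  // two independent streams hide OR latency
    std::size_t i = 0;
    for (; i + 16 <= len; i += 16) {
        uint64_t w1, w2;
        std::memcpy(&w1, data + i, 8);  // unaligned-safe 8-byte loads
        std::memcpy(&w2, data + i + 8, 8);
        acc1 |= w1;
        acc2 |= w2;
    }
    uint8_t tail = 0;
    for (; i < len; ++i) {
        tail |= data[i];  // leftover bytes, one at a time
    }
    // The mask tests bit 7 of every byte lane, so host byte order is irrelevant.
    return ((acc1 | acc2) & 0x8080808080808080ULL) == 0 && (tail & 0x80) == 0;
}
```

Short inputs never enter the wide loop, which is consistent with the table showing no gain at 9 bytes.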
Yibo Cai (Arm Technology China)
6fadba56cc utils: optimize UTF-8 validation
UTF-8 strings are currently validated with boost::locale::conv::utf_to_utf,
which actually performs a string conversion, more work than validation
needs. As observed on an Arm server, UTF-8 validation can become a bottleneck
under heavy load.

This patch introduces a brand new SIMD implementation supporting both
NEON and SSE, as well as a naive approach to handle short strings.
The naive approach is 3x faster than boost utf_to_utf, while the SIMD
method outperforms the naive approach by 3x to 5x on Arm and x86. Details at
https://github.com/cyb70289/utf8/.

A UTF-8 unit test is added to check various corner cases.

Signed-off-by: Yibo Cai <yibo.cai@arm.com>
Message-Id: <1543978498-12123-1-git-send-email-yibo.cai@arm.com>
2018-12-05 21:51:01 +02:00
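For reference, a scalar validator in the spirit of the "naive approach" can be sketched as below. It follows the RFC 3629 rules: it rejects stray continuation bytes, truncated sequences, overlong encodings, UTF-16 surrogates, and code points above U+10FFFF. `validate_utf8` is a hypothetical name and this is neither the SIMD code nor the exact naive routine from the patch.

```cpp
#include <cstddef>
#include <cstdint>

// Scalar UTF-8 validation: decodes each sequence and checks its code point.
bool validate_utf8(const uint8_t* s, std::size_t len) {
    std::size_t i = 0;
    while (i < len) {
        uint8_t c = s[i];
        if (c < 0x80) {  // ASCII fast path
            ++i;
            continue;
        }
        std::size_t n;  // number of continuation bytes
        uint32_t cp;
        if ((c & 0xE0) == 0xC0) { n = 1; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { n = 2; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { n = 3; cp = c & 0x07; }
        else return false;  // stray continuation byte or 0xF8..0xFF
        if (len - i < n + 1) {
            return false;  // truncated sequence
        }
        for (std::size_t k = 1; k <= n; ++k) {
            uint8_t cc = s[i + k];
            if ((cc & 0xC0) != 0x80) {
                return false;  // expected a 10xxxxxx continuation byte
            }
            cp = (cp << 6) | (cc & 0x3F);
        }
        static const uint32_t min_cp[4] = {0, 0x80, 0x800, 0x10000};
        if (cp < min_cp[n]) return false;                // overlong encoding
        if (cp > 0x10FFFF) return false;                 // beyond Unicode range
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // UTF-16 surrogates
        i += n + 1;
    }
    return true;
}
```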
Avi Kivity
775b7e41f4 Update seastar submodule
* seastar d59fcef...b924495 (2):
  > build: Fix protobuf generation rules
  > Merge "Restructure files" from Jesse

Includes fixup patch from Jesse:

"
Update Seastar `#include`s to reflect restructure

All Seastar header files are now prefixed with "seastar" and the
configure script reflects the new locations of files.

Signed-off-by: Jesse Haber-Kucharsky <jhaberku@scylladb.com>
Message-Id: <5d22d964a7735696fb6bb7606ed88f35dde31413.1542731639.git.jhaberku@scylladb.com>
"
2018-11-21 00:01:44 +02:00
Avi Kivity
a71ab365e3 toplevel: convert sprint() to format()
sprint() recently became more strict, throwing on sprint("%s", 5). Replace
with the more modern format().

Mechanically converted with https://github.com/avikivity/unsprint.
2018-11-01 13:16:17 +00:00
Avi Kivity
8db8c01fbe types: get rid of PRId64 formatting
It's not needed for our sprint() implementation, and gets in the way of
converting all formatting to fmt.
2018-11-01 13:16:16 +00:00
Piotr Sarna
37a5c38471 types: enable deserializing varint from JSON string
Previously deserialization failed because the JSON string
representing a number was unnecessarily quoted.

Fixes #3666
Message-Id: <a0a100dbac7c151d627522174303657d1da05c27.1534845398.git.sarna@scylladb.com>
2018-08-21 11:20:11 +01:00
Piotr Sarna
b3f438bfec types: enable parsing numeric JSON values from string
In order to be Cassandra-compatible, values for numeric types (int,
double, etc.) in an INSERT JSON statement should also be accepted when
passed as JSON strings.

Fixes #3666
Message-Id: <4da9a2f68de31492a2e9432493663a62b138c2f2.1534153955.git.sarna@scylladb.com>
2018-08-13 23:57:37 +01:00
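Accepting both `42` and `"42"` for a numeric column amounts to stripping one pair of surrounding quotes before parsing, while still rejecting anything that is not a complete number. The sketch below operates on the raw JSON token text. `parse_json_bigint` is a hypothetical helper, not Scylla's JSON code path, and uses `std::from_chars` for strict whole-token parsing.

```cpp
#include <charconv>
#include <cstdint>
#include <optional>
#include <string_view>
#include <system_error>

// Hypothetical parser for a bigint given as raw JSON token text: accepts a
// bare number (42) or a quoted one ("42"); rejects anything else.
std::optional<int64_t> parse_json_bigint(std::string_view raw) {
    if (raw.size() >= 2 && raw.front() == '"' && raw.back() == '"') {
        raw = raw.substr(1, raw.size() - 2);  // "42" -> 42
    }
    int64_t value = 0;
    auto [ptr, ec] = std::from_chars(raw.data(), raw.data() + raw.size(), value);
    if (ec != std::errc() || ptr != raw.data() + raw.size()) {
        return std::nullopt;  // not a number, out of range, or trailing junk
    }
    return value;
}
```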