scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-12 19:02:12 +00:00

Author	SHA1	Message	Date
Avi Kivity	24caf0824d	Merge "Complete the LIKE operator" from Dejan " Implement LIKE parsing, intermediate representation, and query processing. Add tests for this implementation (leaving the LIKE functionality tests in tests/like_matcher_test.cc). Refs #4477. " * 'finish-like' of https://github.com/dekimir/scylla: cql3: Add LIKE operator to CQL grammar cql3: Ensure LIKE filtering for partition columns cql3: Add LIKE restriction cql3: Add LIKE relation	2019-07-06 12:26:08 +03:00
kbr-	8995945052	Implement tuple_type_impl::to_string_impl. (#4645 ) Resolves #4633. Signed-off-by: Kamil Braun <kbraun@scylladb.com>	2019-07-06 12:26:08 +03:00
Dejan Mircevski	21d7722594	cql3: Add LIKE relation Add a new type of relation with operator LIKE. Handle it in relation::to_restriction by introducing a new virtual method for it. The temporary implementation of this method returns null; that will be replaced in a subsequent patch. Add abstract_type::is_string() to recognize string columns and disallow LIKE operator on non-string columns. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-07-04 10:54:30 +02:00
Tomasz Grabiec	3e30a33e31	Merge "Introduce tests::random_schema" from Botond Most of our tests use overly simplistic schemas (`simple_schema`) or very specialized ones that focus on exercising a specific area of the tested code. This is fine in most places as not all code is schema dependent; however, practice has shown that there can be nasty bugs hiding in dark corners that only appear with a schema that has a specific combination of types. This series introduces `tests::random_schema`, a utility class for generating random schemas and random data for them. An important goal is to make using random schemas in tests as simple and convenient as possible, therefore fostering the appearance of tests using random schemas. Random schema was developed to help test code I'm currently working on, which segregates data by time-windows. As I wasn't confident in my ability to think of every possible combination of types that can break my code, I came up with random-schema to help me find these corner cases. So far I consider it a success: it already found bugs in my code that I'm not sure I would have found if I had relied on specific schemas. It also found bugs in unrelated areas of the code, which proves my point in the first paragraph. * https://github.com/denesb/scylla.git random_schema/v5: tests/data_model: approximate to the modeled data structures data_value: add ascii constructor tests/random-utils.hh: add stepped_int_distribution tests/random-utils.hh: get_int() add overloads that accept external rand engine tests/random-utils.hh: add get_real() tests: introduce random_schema	2019-06-26 18:10:20 +02:00
Botond Dénes	572a738777	collection: use chunked_vector to store cells This is a quick fix to the immediate problem of large collections causing large allocations, triggering stalls or OOM. The proper fix is to use IMR for storing the cells, but that is a complex change that will require time, so let's not stall/OOM in the meanwhile.	2019-06-26 11:40:44 +03:00
Botond Dénes	c68ffc330e	types: don't copy collection_type_impl::mutation_view Just because it's a view doesn't mean it's cheap to copy.	2019-06-26 11:39:41 +03:00
Botond Dénes	a3f9932a2f	data_value: add ascii constructor To allow a `data_value` with `ascii_type` to be constructed.	2019-06-25 12:01:33 +03:00
Rafael Ávila de Espíndola	65ac0a831c	Add to_string_impl that takes a data_value Currently to_string takes raw bytes. This means that to print a data_value it has to first be serialized to be passed to to_string, which will then deserialize it. This patch adds a virtual to_string_impl that takes a data_value and implements a now non-virtual to_string on top of it. I don't expect this to have a performance impact. It mostly documents how to access a data_value without converting it to bytes. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20190620183449.64779-3-espindola@scylladb.com>	2019-06-23 16:03:06 +03:00
Piotr Sarna	f50f418066	types: isolate deserializing iterator to separate file In order to be used outside types.cc, listlike deserializing iterator is moved to a separate header. Message-Id: <d9416e6a8d170aa4936826b54ca7be4acb4ec8e6.1559745816.git.sarna@scylladb.com>	2019-06-05 17:46:51 +03:00
Piotr Sarna	b3396dbb57	types: migrate to_json_string to use bytes view The to_json_string utility implementation was based on const references instead of views, which can be a source of unnecessary memory copying. This patch migrates all to_json_string to use bytes_view and leaves the const reference version as a thin wrapper. Message-Id: <2bf9f1951b862f8e8a2211cb4e83852e7ac70c67.1559654014.git.sarna@scylladb.com>	2019-06-04 19:17:46 +03:00
Paweł Dziepak	49b4aeca4d	Merge "hinted handoff: prevent sending attempts" from Vlad " Fix the broken logic that is meant to prevent sending hints when node is in a DOWN NORMAL state. " * 'hinted_handoff_stop_sending_to_down_node-v2' of https://github.com/vladzcloudius/scylla: hints_manager: rename the state::ep_state_is_not_normal enum value hinted handoff: fix the logic that detects that the destination node is in DN state hinted_handoff: sender::can_send(): optimize gossiper::is_alive(ep) check hinted handoff: end_point_hints_manager::sender: use _gossiper instead of _shard_manager.local_gossiper() types.cc: fix the compilation with fmt v5.3.0	2019-05-09 15:18:57 +01:00
Avi Kivity	43867fe618	types: fix pessimizing moves Remove pessimizing moves, as reported by gcc 9.	2019-05-07 10:01:36 +03:00
Vlad Zolotarov	fe82437dea	types.cc: fix the compilation with fmt v5.3.0 Compilation fails with fmt release 5.3.0 when we print a bytes_view using "{}" formatter. The compiler's complaint is: "error: static assertion failed: mismatch between char-types of context and argument" Fix this by explicitly using to_hex() converter. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2019-04-25 23:04:02 -04:00
Paweł Dziepak	85409c1a16	Merge "Validate elements of collections" from Piotr " Previously we weren't validating elements of collections so it was possible to add non-UTF-8 string to a column with type list<text>. Tests: unit(release) Fixes #4009 " * 'haaawk/4009/v5' of github.com:scylladb/seastar-dev: types: Test correct map validation types: Test correct in clause validation types: Test correct tuple validation types: Test correct set validation types: Test correct list validation types: Add test_tuple_elements_validation types: Add test_in_clause_validation types: Add test_map_elements_validation types: Add test_set_elements_validation types: Add test_list_elements_validation types: Validate input when tuples types: Validate input when parsing a set types: Validate input when parsing a map types: Validate input when parsing a list types: Implement validation for tuple types: Implement validation for set types: Implement validation for map types: Implement validation for list types: Add cql_serialization_format parameter to validate	2019-04-18 19:07:14 +03:00
Botond Dénes	6e85d1e8c1	date_type_impl: add notice explaining why it's not used And why it is still in the code. The note has been copied from Origin. Refs: #4419 Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <c7790a898c331a7f58014d82a10cbc9ee7ad3265.1555483620.git.bdenes@scylladb.com>	2019-04-18 19:07:14 +03:00
Botond Dénes	f201f8abab	types: fix date_type_impl::less() (timestamp cql type) date_type_impl::less() invokes `compare_unsigned()` to compare the underlying raw byte values. `compare_unsigned()` is a tri comparator, however `date_type_impl::less()` implicitly converted the returned value to bool. In effect, `date_type_impl::less()` would always return `true` when the two compared values were not equal. Found while working on a unit test which employs a randomly generated schema to test a component. Fixes #4419. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <8a17c81bad586b3772bf3d1d1dae0e3dc3524e2d.1554907100.git.bdenes@scylladb.com>	2019-04-10 21:01:25 +03:00
Piotr Jastrzebski	8482764003	types: Implement validation for tuple Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-04-09 16:58:22 +02:00
Piotr Jastrzebski	bd2823b623	types: Implement validation for set Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-04-09 16:58:22 +02:00
Piotr Jastrzebski	086d8abf89	types: Implement validation for map Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-04-09 16:58:22 +02:00
Piotr Jastrzebski	4a51ee6e34	types: Implement validation for list Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-04-09 16:58:22 +02:00
Piotr Jastrzebski	f5f6367674	types: Add cql_serialization_format parameter to validate Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-04-09 16:58:22 +02:00
Avi Kivity	a77762b02a	Merge "Optimise vint deserialisation" from Paweł " Variable length integers are used extensively by SSTables mc format. The current deserialisation routine is quite naive in a way that it reads each byte separately. Since those vints usually appear inside much larger buffers, we optimise for such cases, read 8-bytes at once and then mask out the unneeded parts (as well as fix their order because big-endian). Tests: unit(dev). perf_vint (average time per element when deserializing 1000 vints): before: vint.deserialize 69442000 14.400ns 0.000ns 14.399ns 14.400ns after: vint.deserialize 241502000 4.140ns 0.000ns 4.140ns 4.140ns perf_fast_forward (data on /tmp): large-partition-single-key-slice on dataset large-part-ds1: before: range time (s) iterations frags frag/s mad f/s max f/s min f/s aio (KiB) blocked dropped idx hit idx miss idx blk c hit c miss c blk cpu -> [0, 1] 0.000278 8792 2 7190 119 7367 1960 3 104 2 0 0 1 1 0 0 1 100.0% -> [1, 100) 0.000344 96 99 288100 4335 307689 193809 2 108 2 0 0 1 1 0 0 1 100.0% -> (100, 200] 0.000339 13254 100 295263 2824 301734 222725 2 108 2 0 0 1 1 0 0 1 100.0% after: range time (s) iterations frags frag/s mad f/s max f/s min f/s aio (KiB) blocked dropped idx hit idx miss idx blk c hit c miss c blk cpu -> [0, 1] 0.000236 10001 2 8461 59 8718 2261 3 104 2 0 0 1 1 0 0 1 100.0% -> [1, 100) 0.000285 89 99 347500 2441 355826 215745 2 108 2 0 0 1 1 0 0 1 100.0% -> (100, 200] 0.000293 14369 100 341302 1512 350123 222049 2 108 2 0 0 1 1 0 0 1 100.0% " * tag 'optimise-vint/v2' of https://github.com/pdziepak/scylla: sstable: pass full length of buffer to vint deserialiser vint: optimise deserialisation routine vint: drop deserialize_type structure tests/vint: reduce test dependencies tests/perf: add performance test for vint serialisation	2019-03-26 16:41:44 +02:00
Piotr Sarna	287a02dc05	types: fix varint and decimal serialization Varint and decimal types serialization did not update the output iterator after generating a value, which may lead to corrupted sstables - variable-length integers were properly serialized, but if anything followed them directly in the buffer (e.g. in a tuple), their value would be overwritten. Fixes #4348 Tests: unit (dev) dtest: json_test.FromJsonUpdateTests.complex_data_types_test json_test.FromJsonInsertTests.complex_data_types_test json_test.ToJsonSelectTests.complex_data_types_test Note that dtests still do not succeed 100% due to formatting differences in compared results (e.g. 1.0e+07 vs 1.0E7), but it's no longer a query correctness issue.	2019-03-26 11:02:43 +01:00
Rafael Ávila de Espíndola	53ab298957	Turn cql3_type into a trivial wrapper over data_type Both cql3_type and abstract_type are normally used inside shared_ptr. This creates a problem when an abstract_type needs to refer to a cql3_type as that creates a cycle. To avoid warnings from asan, we were using a std::unordered_map to store one of the edges of the cycle. This avoids the warning, but wastes even more memory. Even before this patch cql3_type was a fairly light weight structure. This patch pushes in that direction and now cql3_type is a struct with a single member variable, a data_type. This avoids the reference cycle and is easier to understand IMHO. Tests: unit (dev) Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-03-20 14:10:28 -07:00
Paweł Dziepak	57de2c26b3	vint: drop deserialize_type structure Deserialisation function returns a structure containing both the value and its length in the input buffer. In the vast majority of the cases the caller will already know the length and having this structure will make it harder for the compiler to emit good code, especially if the function is not inlined. In practice I've seen the structure causing register pressure problems that lead to spilling variables to memory.	2019-03-14 13:37:06 +00:00
Piotr Sarna	ebf0eb92bb	types: add JSON support to UDT User defined types can now be serialized to and deserialized from JSON. Fixes #3708	2019-03-05 16:08:05 +01:00
Piotr Sarna	aa0cc8a8a2	types: add JSON support for tuples Tuples can now be serialized to and deserialized from JSON. Refs #3708	2019-03-05 16:08:04 +01:00
Piotr Jastrzebski	5a5201a50b	Move collection_type_impl out of types.hh to types/collection.hh Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-01-24 09:56:38 +01:00
Piotr Jastrzebski	ad016a732b	Move set_type_impl out of types.hh to types/set.hh Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-01-24 09:56:38 +01:00
Piotr Jastrzebski	b1e1b66732	Move list_type_impl out of types.hh to types/list.hh Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-01-24 09:56:38 +01:00
Piotr Jastrzebski	147cc031db	Move map_type_impl out of types.hh to types/map.hh Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-01-24 09:56:38 +01:00
Piotr Jastrzebski	b6b2fdc5be	Move tuple_type_impl from types.hh to types/tuple.hh Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-01-24 09:56:38 +01:00
Piotr Jastrzebski	e92b4c3dbc	Move user_type_impl out of types.hh to types/user.hh Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-01-24 09:04:04 +01:00
Paweł Dziepak	14757d8a83	types: collection_type: drop tombstone if covered by higher-level one At the moment there are inefficiencies in how collection_type_impl::mutation::compact_and_expire() handles tombstones. If there is a higher-level tombstone that covers the collection one (including cases where there is no collection tombstone) it will be applied to the collection tombstone and present in the compaction output. This also means that the collection tombstone is never dropped if fully covered by a higher-level one. This patch fixes both those problems. After the compaction the collection tombstone is either unchanged or removed if covered by a higher-level one. Fixes #4092. Message-Id: <20190118174244.15880-1-pdziepak@scylladb.com>	2019-01-20 15:32:34 +02:00
Piotr Jastrzebski	96b880f81c	Add comment explaining tuple type name creation To keep format compatibility we never wrap tuple type name into "org.apache.cassandra.db.marshal.FrozenType(...)". Even when the tuple is frozen. This patch adds a comment in tuple_type_impl::make_name that explains the situation. For more details see #4087 Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-01-11 12:14:26 +01:00
Piotr Jastrzebski	57e655d716	Add "FrozenType(...)" to UDT name only when it's frozen At the moment Scylla supports only frozen UDTs but the code should be able to handle non-frozen UDTs as well. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-01-11 12:08:02 +01:00
Piotr Jastrzebski	fc17bd376b	Move "FrozenType(...)" addition to UDT name to user_type_impl This logic belongs in types.hh/types.cc layer. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-01-11 12:07:47 +01:00
Piotr Jastrzebski	1fdfc461b8	Add "frozen<...>" to tuple CQL name only when it's frozen At the moment Scylla supports only frozen tuples but the code should be able to handle non-frozen tuples as well. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-01-11 11:14:30 +01:00
Piotr Jastrzebski	749eee2711	Move "frozen<...>" addition to tuple CQL name to tuple_type_impl This logic belongs in types.hh/types.cc layer. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-01-11 11:14:30 +01:00
Piotr Jastrzebski	7aba17de2c	Merge make_cql3_tuple_type into tuple_type_impl::as_cql3_type This logic belongs in types.hh/types.cc layer. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-01-11 11:14:30 +01:00
Piotr Jastrzebski	56060573bb	Add "frozen<...>" to UDT CQL name only when it's frozen At the moment Scylla supports only frozen UDTs but the code should be able to handle non-frozen UDTs as well. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-01-11 11:14:30 +01:00
Piotr Jastrzebski	a928c103c2	Move "frozen<...>" addition to UDT CQL name to user_type_impl This logic belongs in types.hh/types.cc layer. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-01-11 11:09:00 +01:00
Duarte Nunes	fa2b0384d2	Replace std::experimental types with C++17 std version. Replace stdx::optional and stdx::string_view with the C++ std counterparts. Some instances of boost::variant were also replaced with std::variant, namely those that called seastar::visit. Scylla now requires GCC 8 to compile. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20190108111141.5369-1-duarte@scylladb.com>	2019-01-08 13:16:36 +02:00
Yibo Cai (Arm Technology China)	422987ab04	utils: add fast ascii string validation Validate ascii string by ORing all bytes and check if 7-th bit is 0. Compared with original std::any_of(), which checks ascii string byte by byte, this new approach validates input in 8 bytes and two independent streams. Performance is much higher for normal cases, though slightly slower when string is very short. See table below. Speed(MB/s) of ascii string validation +---------------+-------------+---------+ \| String length \| std::any_of \| u64 x 2 \| +---------------+-------------+---------+ \| 9 bytes \| 1691 \| 1635 \| +---------------+-------------+---------+ \| 31 bytes \| 2923 \| 3181 \| +---------------+-------------+---------+ \| 129 bytes \| 3377 \| 15110 \| +---------------+-------------+---------+ \| 1039 bytes \| 3357 \| 31815 \| +---------------+-------------+---------+ \| 16385 bytes \| 3448 \| 47983 \| +---------------+-------------+---------+ \| 1048576 bytes \| 3394 \| 31391 \| +---------------+-------------+---------+ Signed-off-by: Yibo Cai <yibo.cai@arm.com> Message-Id: <1544669646-31881-1-git-send-email-yibo.cai@arm.com>	2018-12-24 09:58:08 +02:00
Yibo Cai (Arm Technology China)	6fadba56cc	utils: optimize UTF-8 validation UTF-8 string is now validated by boost::locale::conv::utf_to_utf, it actually does string conversions which is more than necessary. As observed on Arm server, UTF-8 validation can become bottleneck under heavy loads. This patch introduces a brand new SIMD implementation supporting both NEON and SSE, as well as a naive approach to handle short strings. The naive approach is 3x faster than boost utf_to_utf, whilst SIMD method outperforms naive approach 3x ~ 5x on Arm and x86. Details at https://github.com/cyb70289/utf8/. UTF-8 unit test is added to check various corner cases. Signed-off-by: Yibo Cai <yibo.cai@arm.com> Message-Id: <1543978498-12123-1-git-send-email-yibo.cai@arm.com>	2018-12-05 21:51:01 +02:00
Avi Kivity	775b7e41f4	Update seastar submodule * seastar d59fcef...b924495 (2): > build: Fix protobuf generation rules > Merge "Restructure files" from Jesse Includes fixup patch from Jesse: " Update Seastar `#include`s to reflect restructure All Seastar header files are now prefixed with "seastar" and the configure script reflects the new locations of files. Signed-off-by: Jesse Haber-Kucharsky <jhaberku@scylladb.com> Message-Id: <5d22d964a7735696fb6bb7606ed88f35dde31413.1542731639.git.jhaberku@scylladb.com> "	2018-11-21 00:01:44 +02:00
Avi Kivity	a71ab365e3	toplevel: convert sprint() to format() sprint() recently became more strict, throwing on sprint("%s", 5). Replace with the more modern format(). Mechanically converted with https://github.com/avikivity/unsprint.	2018-11-01 13:16:17 +00:00
Avi Kivity	8db8c01fbe	types: get rid of PRId64 formatting It's not needed for our sprint() implementation, and gets in the way of converting all formatting to fmt.	2018-11-01 13:16:16 +00:00
Piotr Sarna	37a5c38471	types: enable deserializing varint from JSON string Previously deserialization failed because the JSON string representing a number was unnecessarily quoted. Fixes #3666 Message-Id: <a0a100dbac7c151d627522174303657d1da05c27.1534845398.git.sarna@scylladb.com>	2018-08-21 11:20:11 +01:00
Piotr Sarna	b3f438bfec	types: enable parsing numeric JSON values from string In order to be Cassandra-compatible, JSON values passed in INSERT JSON statement should accept string parameters for numeric types - int, double, etc. Fixes #3666 Message-Id: <4da9a2f68de31492a2e9432493663a62b138c2f2.1534153955.git.sarna@scylladb.com>	2018-08-13 23:57:37 +01:00
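
The bug described in commit f201f8abab (a three-way comparator result implicitly converted to bool) is easy to reproduce in isolation. The following is a minimal sketch, not the Scylla code: `compare_unsigned_sketch`, `less_buggy`, and `less_fixed` are hypothetical names invented for illustration, and the comparator is assumed to behave like `memcmp` with a length tie-break.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string_view>

// Hypothetical stand-in for a three-way ("tri") comparator over raw bytes.
// Returns a negative value, zero, or a positive value.
inline int compare_unsigned_sketch(std::string_view a, std::string_view b) {
    const std::size_t n = a.size() < b.size() ? a.size() : b.size();
    if (n != 0) {
        const int r = std::memcmp(a.data(), b.data(), n);
        if (r != 0) {
            return r;
        }
    }
    if (a.size() == b.size()) {
        return 0;
    }
    return a.size() < b.size() ? -1 : 1;
}

// Buggy pattern: any non-zero int converts to true, so this actually
// computes "a != b" rather than "a < b".
inline bool less_buggy(std::string_view a, std::string_view b) {
    return compare_unsigned_sketch(a, b);
}

// Fixed pattern: compare the three-way result against zero explicitly.
inline bool less_fixed(std::string_view a, std::string_view b) {
    return compare_unsigned_sketch(a, b) < 0;
}

// Small demonstration of the difference between the two variants.
inline void demo_tri_comparator() {
    assert(less_buggy("b", "a"));   // wrong: "b" is not less than "a"
    assert(!less_fixed("b", "a"));  // correct
    assert(less_fixed("a", "b"));
}
```

With the buggy variant, both `less(a, b)` and `less(b, a)` hold for any two unequal values, which violates the strict weak ordering that sorted containers and algorithms rely on.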


240 Commits
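
The technique summarized in commit 422987ab04 (OR all bytes together and check the top bit, processing 8 bytes at a time in two independent streams) can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the code from that commit: the name `is_ascii_sketch` is hypothetical, and the tail is handled byte by byte for simplicity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical sketch: returns true if no byte in [data, data + len)
// has its most significant bit set.
inline bool is_ascii_sketch(const char* data, std::size_t len) {
    std::uint64_t acc0 = 0;  // first independent accumulator
    std::uint64_t acc1 = 0;  // second independent accumulator
    std::size_t i = 0;

    // Consume 16 bytes per iteration as two 8-byte words. memcpy avoids
    // unaligned-access and aliasing problems.
    for (; i + 16 <= len; i += 16) {
        std::uint64_t w0;
        std::uint64_t w1;
        std::memcpy(&w0, data + i, sizeof(w0));
        std::memcpy(&w1, data + i + 8, sizeof(w1));
        acc0 |= w0;
        acc1 |= w1;
    }

    // Remaining 0..15 bytes are ORed one at a time.
    std::uint8_t tail = 0;
    for (; i < len; ++i) {
        tail |= static_cast<std::uint8_t>(data[i]);
    }

    // A set high bit in any byte of any accumulator means non-ASCII input.
    const std::uint64_t high_bits = 0x8080808080808080ULL;
    return ((acc0 | acc1) & high_bits) == 0 && (tail & 0x80) == 0;
}

// Small demonstration on a valid and an invalid buffer.
inline void demo_is_ascii() {
    const char ok[] = "abcdefghijklmnopqrstuvwxyz";
    assert(is_ascii_sketch(ok, 26));
    char bad[26];
    std::memcpy(bad, ok, 26);
    bad[3] = static_cast<char>(0xC3);
    assert(!is_ascii_sketch(bad, 26));
}
```

Because the check defers the verdict until the whole buffer has been ORed together, it never exits early. The commit message's own table reports this approach as slightly slower than `std::any_of()` at 9 bytes and faster at 31 bytes and above.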