scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-28 12:17:02 +00:00

Author	SHA1	Message	Date
Benny Halevy	64a4ffc579	large_data_handler: do not delete records in the absence of large_data_stats The previous way of deleting records based on the whole sstable data_size causes overzealous deletions (#7668) and inefficiency in the rows cache due to the large number of range tombstones created. Therefore we'd be better off just letting the records expire using the 30-day TTL. Test: unit(dev) Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20201206083725.1386249-1-bhalevy@scylladb.com>	2020-12-06 11:34:37 +02:00
Avi Kivity	dc77d128e9	Revert "Merge "raft: fix replication if existing log on leader" from Gleb" This reverts commit `0aa1f7c70a`, reversing changes made to `72c59e8000`. The diff is strange, including unrelated commits. There is no understanding of the cause, so to be safe, revert and try again.	2020-12-06 11:34:19 +02:00
Piotr Sarna	2015988373	Merge 'types: get rid of linearization in deserialize()' from Michał Chojnowski Citing #6138: > In the past few years we have converted most of our codebase to work in terms of fragmented buffers, instead of linearised ones, to help avoid large allocations that put large pressure on the memory allocator. > One prominent component that still works exclusively in terms of linearised buffers is the types hierarchy, more specifically the de/serialization code to/from CQL format. Note that for most types, this is the same as our internal format, notable exceptions are non-frozen collections and user types. > > Most types are expected to contain reasonably small values, but texts, blobs and especially collections can get very large. Since the entire hierarchy shares a common interface we can either transition all or none to work with fragmented buffers. This series gets rid of intermediate linearizations in deserialization. The next steps are removing linearizations from serialization, validation and comparison code. Series summary: - Fix a bug in `fragmented_temporary_buffer::view::remove_prefix`. (Discovered while testing. Since it wasn't discovered earlier, I guess it doesn't occur in any code path in master.) - Add a `FragmentedView` concept to allow uniform handling of various types of fragmented buffers (`bytes_view`, `temporary_fragmented_buffer::view`, `ser::buffer_view` and likely `managed_bytes_view` in the future). - Implement `FragmentedView` for relevant fragmented buffer types. - Add helper functions for reading from `FragmentedView`. - Switch `deserialize()` and all its helpers from `bytes_view` to `FragmentedView`. - Remove `with_linearized()` calls which just became unnecessary. - Add an optimization for single-fragment cases. The addition of `FragmentedView` might be controversial, because another concept meant for the same purpose - `FragmentRange` - is already used. Unfortunately, it lacks the functionality we need. 
The main (only?) thing we want to do with a fragmented buffer is to extract a prefix from it and `FragmentRange` gives us no way to do that, because it's immutable by design. We can work around that by wrapping it into a mutable view which will track the offset into the immutable `FragmentRange`, and that's exactly what `linearizing_input_stream` is. But it's wasteful. `linearizing_input_stream` is a heavy type, unsuitable for passing around as a view - it stores a pair of fragment iterators, a fragment view and a size (11 words) to conform to the iterator-based design of `FragmentRange`, when one fragment iterator (4 words) already contains all needed state, just hidden. I suggest we replace `FragmentRange` with `FragmentedView` (or something similar) altogether. Refs: #6138 Closes #7692 * github.com:scylladb/scylla: types: collection: add an optimization for single-fragment buffers in deserialize types: add an optimization for single-fragment buffers in deserialize cql3: tuples: don't linearize in in_value::from_serialized cql3: expr: expression: replace with_linearize with linearized cql3: constants: remove unneeded uses of with_linearized cql3: update_parameters: don't linearize in prefetch_data_builder::add_cell cql3: lists: remove unneeded use of with_linearized query-result-set: don't linearize in result_set_builder::deserialize types: remove unneeded collection deserialization overloads types: switch collection_type_impl::deserialize from bytes_view to FragmentedView cql3: sets: don't linearize in value::from_serialized cql3: lists: don't linearize in value::from_serialized cql3: maps: don't linearize in value::from_serialized types: remove unused deserialize_aux types: deserialize: don't linearize tuple elements types: deserialize: don't linearize collection elements types: switch deserialize from bytes_view to FragmentedView types: deserialize tuple types from FragmentedView types: deserialize set type from FragmentedView types: deserialize map type from 
FragmentedView types: deserialize list type from FragmentedView types: add FragmentedView versions of read_collection_size and read_collection_value types: deserialize varint type from FragmentedView types: deserialize floating point types from FragmentedView types: deserialize decimal type from FragmentedView types: deserialize duration type from FragmentedView types: deserialize IP address types from FragmentedView types: deserialize uuid types from FragmentedView types: deserialize timestamp type from FragmentedView types: deserialize simple date type from FragmentedView types: deserialize time type from FragmentedView types: deserialize boolean type from FragmentedView types: deserialize integer types from FragmentedView types: deserialize string types from FragmentedView types: remove unused read_simple_opt types: implement read_simple* versions for FragmentedView utils: fragmented_temporary_buffer: implement FragmentedView for view utils: fragment_range: add single_fragmented_view serializer: implement FragmentedView for buffer_view utils: fragment_range: add linearized and with_linearized for FragmentedView utils: fragment_range: add FragmentedView utils: fragmented_temporary_buffer: fix view::remove_prefix	2020-12-04 09:46:20 +01:00
Michał Chojnowski	a1f7fabb3d	types: collection: add an optimization for single-fragment buffers in deserialize Helpers parametrized with single_fragmented_view should compile to better code, so let's use them when possible.	2020-12-04 09:21:05 +01:00
Michał Chojnowski	08c394726e	types: add an optimization for single-fragment buffers in deserialize Values usually come in a single fragment, but we pay the cost of fragmented deserialization nevertheless: bigger view objects (4 words instead of 2 words) more state to keep updated (i.e. total view size in addition to current fragment size) and more branches. This patch adds a special case for single-fragment buffers to abstract_type::deserialize. They are converted to a single_fragmented_view before doing anything else. Templates instantiated with single_fragmented_view should compile to better code than their multi-fragmented counterparts. If abstract_type::deserialize is inlined, this patch should completely prevent any performance penalties for switching from with_linearized to fragmented deserialization.	2020-12-04 09:19:39 +01:00
Michał Chojnowski	f75db1fcf5	cql3: tuples: don't linearize in in_value::from_serialized We can deserialize directly from fragmented buffers now.	2020-12-04 09:19:39 +01:00
Michał Chojnowski	68177a6721	cql3: expr: expression: replace with_linearize with linearized with_linearized creates an additional internal `bytes` when the input is fragmented. linearized copies the data directly to the output `bytes`, so it's more efficient.	2020-12-04 09:19:39 +01:00
Michał Chojnowski	5ffe40d5a2	cql3: constants: remove unneeded uses of with_linearized We can deserialize directly from fragmented buffers now.	2020-12-04 09:19:39 +01:00
Michał Chojnowski	3c98806df9	cql3: update_parameters: don't linearize in prefetch_data_builder::add_cell We can deserialize directly from fragmented buffers now.	2020-12-04 09:19:39 +01:00
Michał Chojnowski	c43ef3951b	cql3: lists: remove unneeded use of with_linearized We can deserialize directly from fragmented buffers now.	2020-12-04 09:19:39 +01:00
Michał Chojnowski	0d5c5b8645	query-result-set: don't linearize in result_set_builder::deserialize We can deserialize directly from fragmented buffers now.	2020-12-04 09:19:39 +01:00
Michał Chojnowski	04786dee30	types: remove unneeded collection deserialization overloads Inherit the method from base class rather than reimplementing it in every child.	2020-12-04 09:19:39 +01:00
Michał Chojnowski	c08419e28d	types: switch collection_type_impl::deserialize from bytes_view to FragmentedView Devirtualizes collection_type_impl::deserialize (so it can be templated) and adds a FragmentedView overload. This will allow us to deserialize collections with explicit cql_serialization_format directly from fragmented buffers.	2020-12-04 09:19:37 +01:00
Michał Chojnowski	d731b34d95	cql3: sets: don't linearize in value::from_serialized We can deserialize directly from fragmented buffers now.	2020-12-03 10:57:07 +01:00
Michał Chojnowski	64e64fd2b3	cql3: lists: don't linearize in value::from_serialized We can deserialize directly from fragmented buffers now.	2020-12-03 10:57:07 +01:00
Michał Chojnowski	536a2f8c8d	cql3: maps: don't linearize in value::from_serialized We can deserialize directly from fragmented buffers now.	2020-12-03 10:57:07 +01:00
Michał Chojnowski	58d9f52363	types: remove unused deserialize_aux Dead code.	2020-12-03 10:57:07 +01:00
Michał Chojnowski	8440279130	types: deserialize: don't linearize tuple elements We can deserialize directly from fragmented buffers now.	2020-12-03 10:57:07 +01:00
Michał Chojnowski	a216b0545f	types: deserialize: don't linearize collection elements We can deserialize directly from fragmented buffers now.	2020-12-03 10:57:06 +01:00
Michał Chojnowski	1ccdfc7a90	types: switch deserialize from bytes_view to FragmentedView The final part of the transition of deserialize from bytes_view to FragmentedView. Adds a FragmentedView overload to abstract_type::deserialize and switches deserialize_visitor from bytes_view to FragmentedView, allowing deserialization of all types with no intermediate linearization.	2020-12-03 10:57:06 +01:00
Michał Chojnowski	898cea4cde	types: deserialize tuple types from FragmentedView A part of the transition of deserialize from bytes_view to FragmentedView.	2020-12-03 10:57:06 +01:00
Michał Chojnowski	507883f808	types: deserialize set type from FragmentedView A part of the transition of deserialize from bytes_view to FragmentedView.	2020-12-03 10:57:06 +01:00
Michał Chojnowski	9b211a7285	types: deserialize map type from FragmentedView A part of the transition of deserialize from bytes_view to FragmentedView.	2020-12-03 10:57:06 +01:00
Michał Chojnowski	5f1939554c	types: deserialize list type from FragmentedView A part of the transition of deserialize from bytes_view to FragmentedView.	2020-12-03 10:57:06 +01:00
Michał Chojnowski	ad7ab73cd0	types: add FragmentedView versions of read_collection_size and read_collection_value We will need those to deserialize collections from FragmentedView.	2020-12-03 10:57:06 +01:00
Michał Chojnowski	495bf5c431	types: deserialize varint type from FragmentedView A part of the transition of deserialize from bytes_view to FragmentedView.	2020-12-03 10:57:06 +01:00
Michał Chojnowski	0f8ad89740	types: deserialize floating point types from FragmentedView A part of the transition of deserialize from bytes_view to FragmentedView.	2020-12-03 10:57:06 +01:00
Michał Chojnowski	0bb0291e50	types: deserialize decimal type from FragmentedView A part of the transition of deserialize from bytes_view to FragmentedView.	2020-12-03 10:57:06 +01:00
Michał Chojnowski	760bc5fd60	types: deserialize duration type from FragmentedView A part of the transition of deserialize from bytes_view to FragmentedView.	2020-12-03 10:57:06 +01:00
Michał Chojnowski	75a56f439b	types: deserialize IP address types from FragmentedView A part of the transition of deserialize from bytes_view to FragmentedView.	2020-12-03 10:57:06 +01:00
Michał Chojnowski	9f668929db	types: deserialize uuid types from FragmentedView A part of the transition of deserialize from bytes_view to FragmentedView.	2020-12-03 10:57:06 +01:00
Michał Chojnowski	3e1a24ca0d	types: deserialize timestamp type from FragmentedView A part of the transition of deserialize from bytes_view to FragmentedView.	2020-12-03 10:57:06 +01:00
Michał Chojnowski	a4bc43ab19	types: deserialize simple date type from FragmentedView A part of the transition of deserialize from bytes_view to FragmentedView.	2020-12-03 10:57:06 +01:00
Michał Chojnowski	24bd986aea	types: deserialize time type from FragmentedView A part of the transition of deserialize from bytes_view to FragmentedView.	2020-12-03 10:57:06 +01:00
Michał Chojnowski	c03ad52513	types: deserialize boolean type from FragmentedView A part of the transition of deserialize from bytes_view to FragmentedView.	2020-12-03 10:57:06 +01:00
Michał Chojnowski	2f351928e2	types: deserialize integer types from FragmentedView A part of the transition of deserialize from bytes_view to FragmentedView.	2020-12-03 10:57:06 +01:00
Michał Chojnowski	28b727082f	types: deserialize string types from FragmentedView A part of the transition of deserialize from bytes_view to FragmentedView.	2020-12-03 10:57:06 +01:00
Michał Chojnowski	426308f526	types: remove unused read_simple_opt Dead code.	2020-12-03 10:57:06 +01:00
Michał Chojnowski	e1145fe410	types: implement read_simple* versions for FragmentedView We will need those to switch deserialize() from bytes_view to FragmentedView.	2020-12-03 10:57:06 +01:00
Benny Halevy	c7311d1080	docs: sstable-scylla-format: document large_data_type in more details This adds details about large_data_type on top of `ca5184052d` and introduces structured indentation. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20201202110539.634880-1-bhalevy@scylladb.com>	2020-12-02 13:25:49 +02:00
Avi Kivity	a95c2a946c	Merge 'mutation_reader: introduce clustering_order_reader_merger' from Kamil Braun This abstraction is used to merge the output of multiple readers, each opened for a single partition query, into a non-decreasing stream of mutation_fragments. It is similar to `mutation_reader_merger`, but an important difference is that the new merger may select new readers in the middle of a partition after it already returned some fragments from that partition. It uses the new `position_reader_queue` abstraction to select new readers. It doesn't support multi-partition (ring range) queries. The new merger will be later used when reading from sstable sets created by TimeWindowCompactionStrategy. This strategy creates many sstables that are mostly disjoint w.r.t the contained clustering keys, so we can delay opening sstable readers when querying a partition until after we have processed all mutation fragments with positions before the keys contained by these sstables. A microbenchmark was added that compares the existing combining reader (which uses `mutation_reader_merger` underneath) with a new combining reader built using the new `clustering_order_reader_merger` and a simple queue of readers that returns readers from some supplied set. The used set of readers is built from the following ranges of keys (each range corresponds to a single reader): `[0, 31]`, `[30, 61]`, `[60, 91]`, `[90, 121]`, `[120, 151]`. The microbenchmark runs the reader and divides the result by the number of mutation fragments. 
The results on my laptop were: ``` $ build/release/test/perf/perf_mutation_readers -t clustering_combined.* -r 10 single run iterations: 0 single run duration: 1.000s number of runs: 10 test iterations median mad min max clustering_combined.ranges_generic 2911678 117.598ns 0.685ns 116.175ns 119.482ns clustering_combined.ranges_specialized 3005618 111.015ns 0.349ns 110.063ns 111.840ns ``` `ranges_generic` denotes the existing combining reader, `ranges_specialized` denotes the new reader. Split from https://github.com/scylladb/scylla/pull/7437. Closes #7688 * github.com:scylladb/scylla: tests: mutation_source_test for clustering_order_reader_merger perf: microbenchmark for clustering_order_reader_merger mutation_reader_test: test clustering_order_reader_merger in memory test: generalize `random_subset` and move to header mutation_reader: introduce clustering_order_reader_merger	2020-12-02 12:15:35 +02:00
Kamil Braun	502ed2e9f7	tests: mutation_source_test for clustering_order_reader_merger	2020-12-02 11:13:58 +01:00
Nadav Har'El	fae2ba60e9	cql-pytest: start to port Cassandra's CQL unit tests In issue #7722, it was suggested that we should port Cassandra's CQL unit tests into our own repository, by translating the Java tests into Python using the new cql-pytest framework. Cassandra's CQL unit test framework is orders of magnitude faster than dtest, and in-tree, so Cassandra have been moving many CQL correctness tests there, and we can also benefit from their test cases. In this patch, we take the first step in a long journey: 1. I created a subdirectory, test/cql-pytest/cassandra_tests, where all the translated Cassandra tests will reside. The structure of this directory will mirror that of the test/unit/org/apache/cassandra/cql3 directory in the Cassandra repository. pytest conveniently looks for test files recursively, so when all the cql-pytest are run, the cassandra_tests files will be run as well. As usual, one can also run only a subset of all the tests, e.g., "test/cql-pytest/run -vs cassandra_tests" runs only the tests in the cassandra_tests subdirectory (and its subdirectories). 2. I translated into Python two of the smallest test files - validation/entities/{TimeuuidTest,DataTypeTest}.java - containing just three test functions. The plan is to translate entire Java test files one by one, and to mirror their original location in our own repository, so it will be easier to remember what we already translated and what remains to be done. 3. I created a small library, porting.py, of functions which resemble the common functions of the Java tests (CQLTester.java). These functions aim to make porting the tests easier. Despite the resemblance, the ported code is not 100% identical (of course) and some effort is still required in this porting. As we continue this porting effort, we'll probably need more of these functions, and can also continue to improve them to reduce the porting effort. Refs #7722. 
Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20201201192142.2285582-1-nyh@scylladb.com>	2020-12-02 09:29:22 +01:00
Avi Kivity	77466177ab	Merge 'Use large_data_counters in scylla_metadata to decide when to delete large_data records' from Benny Halevy This series introduces a `large_data_counters` element to `scylla_metadata` component to explicitly count the number of `large_{partitions,rows,cells}` and `too_many_rows` in the sstable. These are accounted for in the sstable writer whenever the respective large data entry is encountered. It is taken into account in `large_data_handler::maybe_delete_large_data_entries`, when engaged. Otherwise, if deleting a legacy sstable that has no such entry in `scylla_metadata`, just revert to using the current method of comparing the sstable's `data_size` to the various thresholds. Fixes #7668 Test: unit(dev) Dtest: wide_rows_test.py (in progress) Closes #7669 * github.com:scylladb/scylla: docs: sstable-scylla-format: add large_data_stats subcomponent large_data_handler: maybe_delete_large_data_entries: use sstable large data stats large_data_handler: maybe_delete_large_data_entries: accept shared_sstable large_data_handler: maybe_delete_large_data_entries: move out of line sstables: load large_data_stats from scylla_metadata sstables: store large_data_stats in scylla_metadata sstables: writer: keep track of large data stats large_data_handler: expose methods to get threshold sstables: kl/writer: never record too many rows large_data_handler: indicate recording of large data entries large_data_handler: move constructor out of line	2020-12-02 10:08:18 +02:00
Nadav Har'El	5c08489569	cql-pytest: don't run tests if Scylla boot timed out In test/cql-pytest/run.py we have a 200 second timeout to boot Scylla. I never expected to reach this timeout - it normally takes (in dev build mode) around 2 seconds, but in one run on Jenkins we did reach it. It turns out that the code did not recognize this timeout correctly and assumed that Scylla had booted - and then all the subtests failed when they could not connect to Scylla. This patch fixes the timeout logic. After the timeout, if Scylla's CQL port is still not responsive, the test run is failed - without trying to run many individual tests. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20201201150927.2272077-1-nyh@scylladb.com>	2020-12-02 08:48:44 +02:00
Kamil Braun	2da723b9c8	cdc: produce postimage when inserting with no regular columns When a row was inserted into a table with no regular columns, and no such row existed in the first place, postimage would not be produced. Fix this. Fixes #7716. Closes #7723	2020-12-01 18:01:23 +02:00
Benny Halevy	ca5184052d	docs: sstable-scylla-format: add large_data_stats subcomponent Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-12-01 15:19:42 +02:00
Benny Halevy	4406a2514e	large_data_handler: maybe_delete_large_data_entries: use sstable large data stats If the sstable has scylla_metadata::large_data_stats use them to determine whether to delete the corresponding large data records. Otherwise, defer to the current method of comparing the sstable data_size to the respective thresholds. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-12-01 15:19:42 +02:00
Benny Halevy	8cebe7776f	large_data_handler: maybe_delete_large_data_entries: accept shared_sstable Since the actual deletion of the large data entries is done in the background, and we don't capture the shared_sstable, we can safely pass it to maybe_delete_large_data_entries when deleting the sstable in sstable::unlink and it will be released as soon as maybe_delete_large_data_entries returns. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-12-01 15:19:42 +02:00
Benny Halevy	f7d0ae3d10	large_data_handler: maybe_delete_large_data_entries: move out of line It is called on the cold path, when the sstable is deleted. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-12-01 15:19:42 +02:00
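
The 'types: get rid of linearization in deserialize()' series listed above reads values by extracting prefixes from a fragmented buffer instead of first copying it into one contiguous allocation. The following is a minimal Python sketch of that idea, not Scylla's C++ code: the `FragmentedView` class, its method names other than `remove_prefix`, and the helper functions are hypothetical, and the list layout (a 4-byte big-endian element count followed by 4-byte length-prefixed elements) is an assumption made for illustration.

```python
import struct
from typing import List


class FragmentedView:
    """Hypothetical view over a sequence of byte fragments (illustration only)."""

    def __init__(self, fragments: List[bytes]):
        # Skip empty fragments so the first fragment is never empty.
        self._fragments = [memoryview(f) for f in fragments if len(f) > 0]
        self._size = sum(len(f) for f in self._fragments)

    def size_bytes(self) -> int:
        return self._size

    def remove_prefix(self, n: int) -> None:
        """Advance the view by n bytes; no payload bytes are copied."""
        if n > self._size:
            raise ValueError("prefix longer than view")
        self._size -= n
        while n > 0:
            head = self._fragments[0]
            if n < len(head):
                self._fragments[0] = head[n:]  # slicing a memoryview does not copy
                return
            n -= len(head)
            self._fragments.pop(0)

    def read_prefix(self, n: int) -> bytes:
        """Copy out only the first n bytes, then consume them."""
        if n > self._size:
            raise ValueError("not enough bytes")
        out = bytearray()
        for frag in self._fragments:
            if len(out) == n:
                break
            out += frag[: n - len(out)]
        self.remove_prefix(n)
        return bytes(out)


def read_int32(view: FragmentedView) -> int:
    """Read a big-endian signed 32-bit integer from the front of the view."""
    return struct.unpack(">i", view.read_prefix(4))[0]


def deserialize_int_list(view: FragmentedView) -> List[int]:
    """Decode the assumed layout: count, then (length, value) pairs."""
    count = read_int32(view)
    values = []
    for _ in range(count):
        length = read_int32(view)
        values.append(int.from_bytes(view.read_prefix(length), "big", signed=True))
    return values


if __name__ == "__main__":
    raw = struct.pack(">i", 2) + struct.pack(">ii", 4, 7) + struct.pack(">ii", 4, -1)
    # Split at arbitrary offsets so that integers straddle fragment boundaries.
    print(deserialize_int_list(FragmentedView([raw[:3], raw[3:10], raw[10:]])))
```

Only the few bytes of each scalar are copied; the buffer as a whole is never joined, which is the property the series is after for large texts, blobs and collections.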

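The 'mutation_reader: introduce clustering_order_reader_merger' entry above describes delaying the opening of readers until all fragments positioned before their keys have been processed. A rough Python analogy of that behaviour is sketched below, under the assumption that each reader is just an inclusive range of integer keys; the function name, the `on_open` callback and the heap-based structure are hypothetical and do not mirror the C++ `position_reader_queue` interface.

```python
import heapq
from typing import Callable, Iterator, List, Optional, Tuple


def merge_in_clustering_order(
    ranges: List[Tuple[int, int]],
    on_open: Optional[Callable[[int], None]] = None,
) -> Iterator[Tuple[int, int]]:
    """Yield (key, reader_index) in non-decreasing key order.

    Each (lo, hi) pair stands in for one reader covering keys lo..hi.
    A reader is opened only once the smallest pending key reaches its
    lower bound, so readers for later ranges stay closed at first.
    """
    # Queue of not-yet-opened readers, smallest lower bound at the end.
    pending = sorted(((lo, idx) for idx, (lo, _hi) in enumerate(ranges)), reverse=True)
    heap: list = []  # entries are (current_key, reader_index, iterator)

    while pending or heap:
        # Open every reader that could produce a key <= the current minimum.
        while pending and (not heap or pending[-1][0] <= heap[0][0]):
            lo, idx = pending.pop()
            if on_open is not None:
                on_open(idx)
            it = iter(range(lo, ranges[idx][1] + 1))
            first = next(it, None)
            if first is not None:
                heapq.heappush(heap, (first, idx, it))
        if not heap:
            continue  # only empty readers were opened; look at the next one
        key, idx, it = heapq.heappop(heap)
        yield key, idx
        nxt = next(it, None)
        if nxt is not None:
            heapq.heappush(heap, (nxt, idx, it))


if __name__ == "__main__":
    # The key ranges quoted in the microbenchmark description above.
    ranges = [(0, 31), (30, 61), (60, 91), (90, 121), (120, 151)]
    opened: List[int] = []
    keys = [k for k, _ in merge_in_clustering_order(ranges, opened.append)]
    print(len(keys), opened)
```

Because the ranges overlap only slightly, few readers are open at any one time in this sketch, which is the situation the commit message associates with sstables written by TimeWindowCompactionStrategy.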

24523 Commits