scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-28 12:17:02 +00:00

Author	SHA1	Message	Date
Paweł Dziepak	ffed8a5603	mutation_partition: fix iterator invalidation in trim_rows Reversed iterators are adaptors for 'normal' iterators. These underlying iterators point to different objects than the reversed iterators themselves. The consequence of this is that removing an element pointed to by a reversed iterator may invalidate a reversed iterator which points to a completely different object. This is what happens in trim_rows for reversed queries. Erasing a row can invalidate the end iterator and the loop would fail to stop. The solution is to introduce the reversal_traits::erase_dispose_and_update_end() function, which erases and disposes of the object pointed to by a given iterator but also takes a reference to an end iterator and updates it if necessary to make sure that it stays valid. Fixes #1609. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com> Message-Id: <1472080609-11642-1-git-send-email-pdziepak@scylladb.com> (cherry picked from commit `6012a7e733`)	2016-08-25 17:31:46 +03:00
Gleb Natapov	5fef0717cc	query: find latest modification timestamp while calculating result digest	2016-05-24 13:27:34 +03:00
Piotr Jastrzebski	23c23abe53	Make memtable mutation_reader slice using clustering ranges. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2016-05-16 11:46:41 +02:00
Gleb Natapov	b75475de80	query: fix result row counting for results with multiple partitions Message-Id: <1462377579-2419-1-git-send-email-gleb@scylladb.com>	2016-05-04 18:18:15 +02:00
Gleb Natapov	db322d8f74	query: put live row count into query::result The patch calculates row count during result building and while merging. If one of results that are being merged does not have row count the merged result will not have one either.	2016-05-02 15:10:15 +03:00
Tomasz Grabiec	c69d0a8e87	mutation_partition: Fix collection emptiness check Broken by `f15c380a4f`. This resulted in empty collection being returned in the results instead of no collection. Fixes org.apache.cassandra.cql3.validation.entities.CollectionsTest from cassandra-unit-tests.	2016-04-15 18:14:05 +02:00
Tomasz Grabiec	c2b955d40b	mutation_partition: Fix static row being returned when paginating Reproduced by dtest paging_test.py:TestPagingData.static_columns_paging_test. Broken by `f15c380a4f`, where the calculation of has_ck_selector got broken, in such a way that present clustering restrictions were treated as if not present, which resulted in static row being returned when it shouldn't. While at it, unify the check between query_compacted() and do_compact() by extracting it to a function.	2016-04-08 20:53:33 +02:00
Tomasz Grabiec	a1539fed95	mutation_partition: Fix reversed trim_rows() The first erase_and_dispose(), which removes rows between last position and beginning of the next range, can invalidate end() iterator of the range. Fix by looking up end after erasing. mutation_partition::range() was split into lower_bound() and upper_bound() to allow for that. This affects for example queries with descending order where the selected clustering range is empty and falls before all rows. Exposed by `f15c380a4f`, which is now calling do_compact() during query. Reproduced by dtest paging_test.py:TestPagingData.static_columns_paging_test	2016-04-08 20:53:33 +02:00
Avi Kivity	db03295c8a	Merge "Fix query digest mismatch" from Tomasz "Currently data query digest includes cells and tombstones which may have expired or be covered by higher-level tombstones. This causes digest mismatch between replicas if some elements are compacted on one of the nodes and not on others. This mismatch triggers read-repair which doesn't resolve because mutations received by mutation queries are not differing, they are compacted already. The fix adds compacting step before writing and digesting query results by reusing the algorithm used by mutation query. This is not the most optimal way to fix this. The compaction step could be folded with the query writing, there is redundancy in both steps. However such change carries more risk, and thus was postponed. perf_simple_query test (cassandra-stress-like partitions) shows regression from 83k to 77k (7%) ops/s. Fixes #1165."	2016-04-08 12:13:29 +03:00
Pekka Enberg	38a54df863	Fix pre-ScyllaDB copyright statements People keep tripping over the old copyrights and copy-pasting them to new files. Search and replace "Cloudius Systems" with "ScyllaDB". Message-Id: <1460013664-25966-1-git-send-email-penberg@scylladb.com>	2016-04-08 08:12:47 +03:00
Tomasz Grabiec	f15c380a4f	database: Compact mutations when executing data queries Currently data query digest includes cells and tombstones which may have expired or be covered by higher-level tombstones. This causes digest mismatch between replicas if some elements are compacted on one of the nodes and not on others. This mismatch triggers read-repair which doesn't resolve because mutations received by mutation queries are not differing, they are compacted already. The fix adds compacting step before writing and digesting query results by reusing the algorithm used by mutation query. This is not the most optimal way to fix this. The compaction step could be folded with the query writing, there is redundancy in both steps. However such change carries more risk, and thus was postponed. perf_simple_query test (cassandra-stress-like partitions) shows regression from 83k to 77k (7%) ops/s. Fixes #1165.	2016-04-07 19:56:58 +02:00
Tomasz Grabiec	dc290f0af7	mutation_partition: Make apply() atomic even in case of exception We cannot leave partially applied mutation behind when the write fails. It may fail if memory allocation fails in the middle of apply(). This for example would violate write atomicity, readers should either see the whole write or none at all. This fix makes apply() revert partially applied data upon failure, by the means of ReversiblyMergeable concept. In a nutshell the idea is to store old state in the source mutation as we apply it and swap back in case of exception. At cell level this swapping is inexpensive, just rewiring pointers. For this to work, the source mutation needs to be brought into mutable form, so frozen mutations need to be unfrozen. In practice this doesn't increase the amount of cell allocations in the memtable apply path because incoming data will usually be newer and we will have to copy it into LSA anyway. There are extra allocations though for the data structures which hold cells. I didn't see significant change in performance of: build/release/tests/perf/perf_simple_query -c1 -m1G --write --duration 13 The score fluctuates around ~77k ops/s. Fixes #283.	2016-03-21 21:49:52 +01:00
Tomasz Grabiec	e09d186c7c	mutation_partition: Make intrusive sets ReversiblyMergeable	2016-03-21 21:49:52 +01:00
Tomasz Grabiec	e4a576a90f	mutation_partition: Make rows_entry ReversiblyMergeable	2016-03-21 19:26:24 +01:00
Tomasz Grabiec	aadcd75d89	mutation_partition: Make row_marker ReversiblyMergeable	2016-03-21 19:26:24 +01:00
Tomasz Grabiec	ea7c2dd085	mutation_partition: Make row ReversiblyMergeable	2016-03-21 19:26:24 +01:00
Tomasz Grabiec	d5e66a5b0d	mutation_partition: row: Allow storing empty cells internally Currently only "set" storage could store empty cells, but not the "vector" one because there empty cell has the meaning of being missing. To implement rollback, we need to be able to distinguish empty cells from missing ones. Solve by making vector storage use a bitmap for presence checking instead of emptiness. This adds 4 bytes to vector storage.	2016-03-21 18:41:27 +01:00
Tomasz Grabiec	ed1e6515db	mutation_partition: Make row::merge() tolerate empty row The row may be empty and still have a set storage, in which case rbegin() dereference is undefined behavior.	2016-03-21 18:41:27 +01:00
Tomasz Grabiec	518e956736	mutation_partition: Make row::vector_to_set() exception-safe Currently allocation failure can leave the old row in a half-moved-from state and leak cell_entry objects.	2016-03-18 22:30:04 +01:00
Tomasz Grabiec	c91eefa183	mutation_partition: Unmark cell_entry's copy constructor as noexcept It was a mistake, it certainly may throw because it copies cells.	2016-03-18 22:30:04 +01:00
Paweł Dziepak	21e2ebcf8c	query: build only result, only digest or both Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-03-11 18:27:13 +00:00
Paweł Dziepak	46079f763b	query: add keys and tombstones to result digest Query result digest is used to verify that all replicas have the same data. Therefore, it needs to contain more information than the query result itself in order to ensure proper detection of disagreements. Generally, adding clustering keys to the digest regardless of whether the client asked for them will guarantee correctness. However, adding tombstones as well improves the chances of early detection of nodes containing stale data. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-03-11 18:27:13 +00:00
Paweł Dziepak	c1f7f11d54	mutation_partition: do not add ck to result when not asked to Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-03-11 18:27:13 +00:00
Amnon Heiman	1c7bc28d35	idl-compiler: change optional vector implementation This patch changes the way optional vectors are implemented. Now a vector of optionals is handled like any other non-primitive type, with a single method add() that returns a writer to the optional. The writer to the optional has skip and write methods like a simple optional field. For basic types the write method gets the value as a parameter; for composite types, it returns a writer to the type. Signed-off-by: Amnon Heiman <amnon@scylladb.com> Message-Id: <1456796143-3366-2-git-send-email-amnon@scylladb.com>	2016-03-01 09:41:30 +02:00
Tomasz Grabiec	6cec131432	query: Switch to IDL-generated views and writers The query result footprint for cassandra-stress mutation as reported by tests/memory-footprint increased by 18% from 285 B to 337 B. perf_simple_query shows slight regression in throughput (-8%): build/release/tests/perf/perf_simple_query -c4 -m1G --partitions 100000 Before: ~433k tps After: ~400k tps	2016-02-26 12:26:13 +01:00
Tomasz Grabiec	4284715ddf	Relax includes	2016-02-26 12:26:13 +01:00
Tomasz Grabiec	a921479e71	Merge tag '807-v3' from https://github.com/avikivity/scylla From Avi: This patchset introduces a linearization context for managed_bytes objects. Within this context, any scattered managed_bytes (found only in lsa regions, so limited to memtable and cache) are auto-linearized for the lifetime of the context. This ensures that key and value lookups can use fast contiguous iterators instead of using slow discontiguous iterators (or crashing, as is the case now).	2016-02-16 14:29:48 +01:00
Avi Kivity	13144ea9eb	managed_bytes: get rid of explicit linearize/scatter Now that everything is in a linearization context, we don't need to explicitly gather data.	2016-02-16 14:37:46 +02:00
Tomasz Grabiec	63006e5dd2	query: Serialize collection cells using CQL format We want the format of query results to be eventually defined in the IDL and be independent of the format we use in memory to represent collections. This change is a step in this direction. The change decouples format of collection cells in query results from our in-memory representation. We currently use collection_mutation_view, after the change we will use CQL binary protocol format. We use that because it requires less transformations on the coordinator side. One complication is that some list operations need to retrieve keys used in list cells, not only values. To satisfy this need, new query option was added called "collections_as_maps" which will cause lists and sets to be reinterpreted as maps matching their underlying representation. This allows the coordinator to generate mutations referencing existing items in lists.	2016-02-15 17:05:55 +01:00
Tomasz Grabiec	036974e19b	Make mutation interfaces support multiple versions Schema is tracked in memtable and cache per-entry. Entries are upgraded lazily on access. Incoming mutations are upgraded to table's current schema on given shard. Mutating nodes need to keep schema_ptr alive in case schema version is requested by target node.	2016-01-11 10:34:51 +01:00
Tomasz Grabiec	f59ec59abc	mutation: Implement upgrade() Converts mutation to a new schema.	2016-01-08 21:10:26 +01:00
Tomasz Grabiec	ade5cf1b4b	mutation_partition: Make visitable with mutation_partition_visitor	2016-01-08 21:10:25 +01:00
Tomasz Grabiec	6f955e1290	mutation_partition: Make equal() work with different schemas	2016-01-08 21:10:25 +01:00
Tomasz Grabiec	ff3a2e1239	mutation_partition: Drop row tombstones in do_compact()	2016-01-08 21:10:25 +01:00
Raphael S. Carvalho	03eee06784	remove empty rows in mutation_partition::do_compact do_compact() wasn't removing an empty row that is covered by a tombstone. As a result, an empty partition could be written to a sstable. To solve this problem, let's make trim_rows remove a row that is considered to be empty. A row is empty if it has no tombstone, no marker and no cells. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-01-05 15:19:21 +01:00
Calle Wilund	8c17e9e26c	mutation_partition: Do not return static row if CK range does not match Fixes #589 If we got no rows, but have live static columns, we should only give them back IFF we did not have any CK restrictions. If ck:s exist, and we have a restriction on them, we either have matching rows, or return nothing, since cql does not allow "is null".	2015-12-21 10:38:48 +00:00
Paweł Dziepak	71f92c4d14	mutation_partition: do not move rows_entry::_link Apparently, link hook copy constructor is a no-op and move constructor doesn't exist so the code is correct, but that explicit move makes the code needlessly confusing. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2015-12-15 13:22:23 +01:00
Avi Kivity	fd14cb3743	mutation_partition: fix leak in move assignment operator The default move assignment operator calls boost::intrusive::set's move assignment operator, which leaks, because it does not believe it owns the data. Fix by providing a custom implementation.	2015-12-14 10:33:19 +01:00
Paweł Dziepak	64f50a4f40	db: make clustering_key a prefix Schemas using compact storage can have clustering keys with the trailing components not set, effectively being clustering key prefixes instead of full clustering keys. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2015-12-10 05:46:47 +01:00
Paweł Dziepak	5f1e9fd88f	mutation_partition: remove unused find_entry() Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2015-12-10 05:46:26 +01:00
Avi Kivity	f9e2a9a086	mutation_partition: work on linearized atomic_cell_or_mutation objects Ensure that when we examine atomic_cell_or_mutation objects for merging, that they are contiguous in memory. When we are done we scatter them again.	2015-12-08 15:17:09 +02:00
Calle Wilund	284b10cabe	Make partition_slice::row_ranges multiplex on partition Allows for having more than one clustering row range set, depending on PK queried (although right now limited to one - which happens to be exactly the number of multiplexing paging needs... What a coincidence...). Encapsulates the row_ranges member in a query function, and if needed holds ranges outside the default one in an extra object. Query result::builder::add_partition now fetches the correct row range for the partition, and this is the range used in subsequent iteration.	2015-11-10 13:12:33 +01:00
Tomasz Grabiec	5bbc902eec	mutation_partition: Drop now unnecessary unconst() usage This change was actually promised by `f74c665671`.	2015-10-22 17:12:03 +02:00
Tomasz Grabiec	c7be350961	mutation_partition: Rename reversion_traits to reversal_traits As pointed out by Nadav, 'reversion' is from 'revert', 'reversal' is from 'reverse'.	2015-10-22 18:09:07 +03:00
Tomasz Grabiec	f74c665671	mutation_partition: Add non-const-qualified version of range() and use it	2015-10-22 18:09:07 +03:00
Avi Kivity	0129e42b06	Merge "Mutation diff" from Paweł "This series adds code for computing mutation_partition difference. For mutations A and B: diffA = A.difference(B); diffB = B.difference(A); AB = A.apply(B); diffA is the minimal mutation that when applied to B makes it equal to AB and diffB is the minimal mutation that applied to A results in AB. Fixes #430."	2015-10-22 16:38:25 +03:00
Paweł Dziepak	f78a80dfa3	mutation_partition: add method for computing difference Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2015-10-22 12:08:53 +02:00
Paweł Dziepak	85edc3de07	mutation_partition: compute row difference Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2015-10-22 12:08:53 +02:00
Paweł Dziepak	2aa96eb00f	mutation_partition: add insert_row() Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2015-10-22 12:08:53 +02:00
Paweł Dziepak	a064181d7c	mutation_partition: add row::with_both_ranges() Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2015-10-22 12:08:53 +02:00
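
The reversed-iterator invalidation described in `ffed8a5603` (and earlier in `a1539fed95`) can be illustrated with a small standalone sketch. It is not the ScyllaDB code: it assumes a plain `std::set<int>` in place of the intrusive row set, and `erase_and_update_end()` and `trim_reversed()` are hypothetical names chosen for this illustration. The point it shows is that a reverse iterator refers to the element just before its `base()`, so erasing one element can invalidate the `base()` of an end iterator that refers to a different element.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <set>

using rows = std::set<int>;
using reverse_it = rows::reverse_iterator;

// Hypothetical helper (illustration only): erases the element `it` refers to
// and returns a reverse iterator to the next element in reverse order.
// If the forward iterator underlying `end` pointed at the erased element,
// `end` is re-seated so that it stays valid.
reverse_it erase_and_update_end(rows& c, reverse_it it, reverse_it& end) {
    assert(it != end);
    // A reverse iterator refers to the element just before its base().
    auto victim = std::prev(it.base());
    // end.base() would be invalidated if it points at the victim itself.
    const bool end_hit = (end.base() == victim);
    // erase() returns the forward iterator following the erased element.
    auto next = c.erase(victim);
    if (end_hit) {
        end = reverse_it(next);
    }
    return reverse_it(next);
}

// Erases every element >= bound while walking in reverse order.
// Returns the number of erased elements.
std::size_t trim_reversed(rows& c, int bound) {
    reverse_it it = c.rbegin();
    // base() of `end` points at the first element that will be erased last.
    reverse_it end(c.lower_bound(bound));
    std::size_t erased = 0;
    while (it != end) {
        it = erase_and_update_end(c, it, end);
        ++erased;
    }
    return erased;
}
```

With the set `{1, 2, 3, 4, 5}` and a bound of 3, the last erase removes the element that `end.base()` points at. Without re-seating `end`, the loop condition would compare against an invalidated iterator, which is the failure mode the commit message describes.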


103 Commits
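
The idea behind `dc290f0af7` (store the old state in the source while applying, then swap back on failure) can be sketched as follows. This is a minimal illustration under stated assumptions, not the ReversiblyMergeable implementation: `cell`, `row`, and `apply_atomically()` are hypothetical names, the merge rule is simply "newer timestamp wins", and the `fail_at` parameter exists only to simulate an allocation failure part-way through.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <string>
#include <utility>
#include <vector>

// Hypothetical cell type for illustration: the newer timestamp wins a merge.
struct cell {
    long timestamp;
    std::string value;
};

using row = std::vector<cell>;

// Merges src into dst cell by cell. A winning source cell is swapped into
// dst, so the old destination state ends up stored in src. If a failure
// occurs part-way through, the cells merged so far are swapped back and
// both rows are left as they were before the call.
// `fail_at` simulates an allocation failure just before that index is merged;
// pass dst.size() for no failure.
void apply_atomically(row& dst, row& src, std::size_t fail_at) {
    assert(dst.size() == src.size());
    // Bookkeeping is allocated up front, before any state is modified.
    std::vector<bool> swapped(dst.size(), false);
    std::size_t i = 0;
    try {
        for (; i < dst.size(); ++i) {
            if (i == fail_at) {
                throw std::bad_alloc();
            }
            if (src[i].timestamp > dst[i].timestamp) {
                std::swap(dst[i], src[i]);
                swapped[i] = true;
            }
        }
    } catch (...) {
        // Roll back: undo every swap performed before the failure.
        for (std::size_t j = 0; j < i; ++j) {
            if (swapped[j]) {
                std::swap(dst[j], src[j]);
            }
        }
        throw;
    }
}
```

Swapping a cell moves only a few pointers, which matches the commit's remark that the rollback is inexpensive at cell level. The cost is that the source must be mutable, as the commit message also notes.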