scylladb

Author	SHA1	Message	Date
Paweł Dziepak	e95f4eaee4	Merge "partition_limit: Don't count dead partitions" from Duarte "This patch series ensures we don't count dead partitions (i.e., partitions with no live rows) towards the partition_limit. We also enforce the partition limit at the storage_proxy level, so that limits with smp > 1 work correctly." (cherry picked from commit `5f11a727c9`)	2016-08-03 12:44:32 +03:00
Duarte Nunes	21d0a2c764	query: Optionally send cell ttl This patch adds support to send a cell's ttl as part of a query's result. This is needed for thrift support. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-07-14 15:36:23 +02:00
Paweł Dziepak	93cc4454a6	streamed_mutation: emit range_tombstones directly Originally, streamed_mutations guaranteed that emitted tombstones are disjoint. In order to achieve that two separate objects were produced for each range tombstone: range_tombstone_begin and range_tombstone_end. Unfortunately, this forced sstable writer to accumulate all clustering rows between range_tombstone_begin and range_tombstone_end. However, since there is no need to write disjoint tombstones to sstables (see #1153 "Write range tombstones to sstables like Cassandra does") it is also not necessary for streamed_mutations to produce disjoint range tombstones. This patch changes that by making streamed_mutation produce range_tombstone objects directly. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-07-13 09:51:18 +01:00
Tomasz Grabiec	8c4b5e4283	db: Avoiding checking bloom filters during compaction Checking bloom filters of sstables to compute max purgeable timestamp for compaction is expensive in terms of CPU time. We can avoid calculating it if we're not about to GC any tombstone. This patch changes compacting functions to accept a function instead of ready value for max_purgeable. I verified that bloom filter operations no longer appear on flame graphs during compaction-heavy workload (without tombstones). Refs #1322.	2016-07-10 09:54:20 +02:00
Paweł Dziepak	23d0bfd065	mutation_partition: add row::memory_usage() Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-07-07 12:17:25 +01:00
Paweł Dziepak	7a95847014	mutation_compactor: prepare for sstable compaction compact_mutation code is going to be shared among queries and sstable compaction. There are some differences though. Queries don't provide _max_purgeable and sstable compaction don't need any limits. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-06-30 11:39:01 +01:00
Paweł Dziepak	4133cc7a53	mutation_reader: make consume_flattened() produce decorated keys Since decorated keys are already computed it is better to pass more information than less. Consumers interested just in partition key can just drop token and the ones requiring full decorated key don't need to recompute it. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-06-30 11:39:00 +01:00
Paweł Dziepak	3e86f9ab73	mutation_partition: extract compact_for_query to a separate header The compacting logic inside compact_for_query is going to be shared with sstable compaction. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-06-30 11:37:54 +01:00
Paweł Dziepak	b70bf086b7	frozen_mutation: handle reversed streams properly Freezing streamed_mutations assumed that mutation fragments are streamed in the order they appear in the frozen mutation. That is not true for reversed streams. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com> Message-Id: <1467277069-18702-1-git-send-email-pdziepak@scylladb.com>	2016-06-30 11:26:45 +02:00
Duarte Nunes	69798df95e	query: Limit number of partitions returned This is required to implement a thrift verb. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-06-22 09:48:13 +02:00
Duarte Nunes	594e43a60a	compact_query: Rename partition_limit This patch renames compact_query::_partition_limit to _current_partition_limit for clarity, as the next patch adds a partition limit that limits the number of partitions. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-06-22 09:47:29 +02:00
Duarte Nunes	e9ebd87991	compact_query: Rename limit to row_limit This patch renames compact_query::_limit to _row_limit for clarity, as a subsequent patch introduces yet another limit. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-06-22 09:47:28 +02:00
Duarte Nunes	01b18063ea	query: Add per-partition row limit This patch adds a per-partition row limit. It ensures both local queries and the reconciliation logic abide by this limit. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-06-22 09:46:51 +02:00
Paweł Dziepak	ed12c164f8	mutation_query: make mutation queries streaming-friendly Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-06-20 21:31:28 +01:00
Paweł Dziepak	0828c88b25	mutation_partition: implement streaming-friendly data_query() Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-06-20 21:31:19 +01:00
Paweł Dziepak	67ae9457e3	mutation_partition: introduce mutation_querier mutation_querier is a streamed_mutation consumer that adds the mutation content to query::result. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-06-20 21:29:53 +01:00
Paweł Dziepak	f54e604a16	mutation_partition: introduce compact_for_query compact_for_query is an intermediate stage used to compact data in a flattened stream of mutations before they are consumed by query building consumers. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-06-20 21:29:53 +01:00
Paweł Dziepak	f95c5542dc	mutation_partition: allow slicing moved mutation_partition Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-06-20 21:29:51 +01:00
Paweł Dziepak	5a60f6d1ec	range_tombstone: extract is_single_clustering_row_tombstone() Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-06-20 21:29:50 +01:00
Paweł Dziepak	847bf878ec	mutation_partition: add more row::apply() overloads Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-06-20 21:29:48 +01:00
Duarte Nunes	70083efee2	sstables: Read and write range tombstone bounds This patch uses the composite_marker to add inclusiveness information to the prefixes of a range tombstone. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-06-02 16:21:59 +02:00
Duarte Nunes	7628e403a3	sstables: Drop code for tombstone merging Since Scylla now supports proper range tombstones, the code for reading ranges from sstables and converting them to overlapping tombstones is no longer necessary, and is, in fact, wasteful as the internal representation converts overlapping tombstones back to ranges. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-06-02 16:21:59 +02:00
Duarte Nunes	95594b8171	mutations: Encapsulate row tombstones difference This patch moves the difference between two mutation_partition's row_tombstones inside the range_tombstone_list. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-06-02 16:21:59 +02:00
Duarte Nunes	91aac30f12	mutations: Row tombstones are now a set of ranges This patch changes the type of the mutation partition's row_tombstones to be a range_tombstone_list, so that they are now represented as a set of disjoint ranges. All of its usages are updated accordingly. Fixes #1155 Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-06-02 16:21:59 +02:00
Gleb Natapov	5fef0717cc	query: find latest modification timestamp while calculating result digest	2016-05-24 13:27:34 +03:00
Piotr Jastrzebski	23c23abe53	Make memtable mutation_reader slice using clustering ranges. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2016-05-16 11:46:41 +02:00
Gleb Natapov	b75475de80	query: fix result row counting for results with multiple partitions Message-Id: <1462377579-2419-1-git-send-email-gleb@scylladb.com>	2016-05-04 18:18:15 +02:00
Gleb Natapov	db322d8f74	query: put live row count into query::result The patch calculates row count during result building and while merging. If one of results that are being merged does not have row count the merged result will not have one either.	2016-05-02 15:10:15 +03:00
Tomasz Grabiec	c69d0a8e87	mutation_partition: Fix collection emptiness check Broken by `f15c380a4f`. This resulted in empty collection being returned in the results instead of no collection. Fixes org.apache.cassandra.cql3.validation.entities.CollectionsTest from cassandra-unit-tests.	2016-04-15 18:14:05 +02:00
Tomasz Grabiec	c2b955d40b	mutation_partition: Fix static row being returned when paginating Reproduced by dtest paging_test.py:TestPagingData.static_columns_paging_test. Broken by `f15c380a4f`, where the calculation of has_ck_selector got broken, in such a way that present clustering restrictions were treated as if not present, which resulted in static row being returned when it shouldn't. While at it, unify the check between query_compacted() and do_compact() by extracting it to a function.	2016-04-08 20:53:33 +02:00
Tomasz Grabiec	a1539fed95	mutation_partition: Fix reversed trim_rows() The first erase_and_dispose(), which removes rows between last position and beginning of the next range, can invalidate end() iterator of the range. Fix by looking up end after erasing. mutation_partition::range() was split into lower_bound() and upper_bound() to allow for that. This affects for example queries with descending order where the selected clustering range is empty and falls before all rows. Exposed by `f15c380a4f`, which is now calling do_compact() during query. Reproduced by dtest paging_test.py:TestPagingData.static_columns_paging_test	2016-04-08 20:53:33 +02:00
Avi Kivity	db03295c8a	Merge "Fix query digest mismatch" from Tomasz "Currently data query digest includes cells and tombstones which may have expired or be covered by higher-level tombstones. This causes digest mismatch between replicas if some elements are compacted on one of the nodes and not on others. This mismatch triggers read-repair which doesn't resolve because mutations received by mutation queries are not differing, they are compacted already. The fix adds compacting step before writing and digesting query results by reusing the algorithm used by mutation query. This is not the most optimal way to fix this. The compaction step could be folded with the query writing, there is redundancy in both steps. However such change carries more risk, and thus was postponed. perf_simple_query test (cassandra-stress-like partitions) shows regression from 83k to 77k (7%) ops/s. Fixes #1165."	2016-04-08 12:13:29 +03:00
Pekka Enberg	38a54df863	Fix pre-ScyllaDB copyright statements People keep tripping over the old copyrights and copy-pasting them to new files. Search and replace "Cloudius Systems" with "ScyllaDB". Message-Id: <1460013664-25966-1-git-send-email-penberg@scylladb.com>	2016-04-08 08:12:47 +03:00
Tomasz Grabiec	f15c380a4f	database: Compact mutations when executing data queries Currently data query digest includes cells and tombstones which may have expired or be covered by higher-level tombstones. This causes digest mismatch between replicas if some elements are compacted on one of the nodes and not on others. This mismatch triggers read-repair which doesn't resolve because mutations received by mutation queries are not differing, they are compacted already. The fix adds compacting step before writing and digesting query results by reusing the algorithm used by mutation query. This is not the most optimal way to fix this. The compaction step could be folded with the query writing, there is redundancy in both steps. However such change carries more risk, and thus was postponed. perf_simple_query test (cassandra-stress-like partitions) shows regression from 83k to 77k (7%) ops/s. Fixes #1165.	2016-04-07 19:56:58 +02:00
Tomasz Grabiec	dc290f0af7	mutation_partition: Make apply() atomic even in case of exception We cannot leave partially applied mutation behind when the write fails. It may fail if memory allocation fails in the middle of apply(). This for example would violate write atomicity, readers should either see the whole write or none at all. This fix makes apply() revert partially applied data upon failure, by the means of ReversiblyMergeable concept. In a nutshell, the idea is to store old state in the source mutation as we apply it and swap back in case of exception. At cell level this swapping is inexpensive, just rewiring pointers. For this to work, the source mutation needs to be brought into mutable form, so frozen mutations need to be unfrozen. In practice this doesn't increase amount of cell allocations in the memtable apply path because incoming data will usually be newer and we will have to copy it into LSA anyway. There are extra allocations though for the data structures which hold cells. I didn't see significant change in performance of: build/release/tests/perf/perf_simple_query -c1 -m1G --write --duration 13 The score fluctuates around ~77k ops/s. Fixes #283.	2016-03-21 21:49:52 +01:00
Tomasz Grabiec	e09d186c7c	mutation_partition: Make intrusive sets ReversiblyMergeable	2016-03-21 21:49:52 +01:00
Tomasz Grabiec	e4a576a90f	mutation_partition: Make rows_entry ReversiblyMergeable	2016-03-21 19:26:24 +01:00
Tomasz Grabiec	aadcd75d89	mutation_partition: Make row_marker ReversiblyMergeable	2016-03-21 19:26:24 +01:00
Tomasz Grabiec	ea7c2dd085	mutation_partition: Make row ReversiblyMergeable	2016-03-21 19:26:24 +01:00
Tomasz Grabiec	d5e66a5b0d	mutation_partition: row: Allow storing empty cells internally Currently only "set" storage could store empty cells, but not the "vector" one because there empty cell has the meaning of being missing. To implement rollback, we need to be able to distinguish empty cells from missing ones. Solve by making vector storage use a bitmap for presence checking instead of emptiness. This adds 4 bytes to vector storage.	2016-03-21 18:41:27 +01:00
Tomasz Grabiec	ed1e6515db	mutation_partition: Make row::merge() tolerate empty row The row may be empty and still have a set storage, in which case rbegin() dereference is undefined behavior.	2016-03-21 18:41:27 +01:00
Tomasz Grabiec	518e956736	mutation_partition: Make row::vector_to_set() exception-safe Currently allocation failure can leave the old row in a half-moved-from state and leak cell_entry objects.	2016-03-18 22:30:04 +01:00
Tomasz Grabiec	c91eefa183	mutation_partition: Unmark cell_entry's copy constructor as noexcept It was a mistake, it certainly may throw because it copies cells.	2016-03-18 22:30:04 +01:00
Paweł Dziepak	21e2ebcf8c	query: build only result, only digest or both Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-03-11 18:27:13 +00:00
Paweł Dziepak	46079f763b	query: add keys and tombstones to result digest Query result digest is used to verify that all replicas have the same data. Therefore, it needs to contain more information than the query result itself in order to ensure proper detection of disagreements. Generally, adding clustering keys to the digest regardless of whether the client asked for them will guarantee correctness. However, adding tombstones as well improves the chances of early detection of nodes containing stale data. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-03-11 18:27:13 +00:00
Paweł Dziepak	c1f7f11d54	mutation_partition: do not add ck to result when not asked to Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-03-11 18:27:13 +00:00
Amnon Heiman	1c7bc28d35	idl-compiler: change optional vector implementation This patch changes the way optional vectors are implemented. Now a vector of optionals is handled like any other non-primitive type, with a single add() method that returns a writer to the optional. The writer to the optional has skip and write methods, like a simple optional field. For basic types, the write method takes the value as a parameter; for composite types, it returns a writer to the type. Signed-off-by: Amnon Heiman <amnon@scylladb.com> Message-Id: <1456796143-3366-2-git-send-email-amnon@scylladb.com>	2016-03-01 09:41:30 +02:00
Tomasz Grabiec	6cec131432	query: Switch to IDL-generated views and writers The query result footprint for cassandra-stress mutation as reported by tests/memory-footprint increased by 18% from 285 B to 337 B. perf_simple_query shows slight regression in throughput (-8%): build/release/tests/perf/perf_simple_query -c4 -m1G --partitions 100000 Before: ~433k tps After: ~400k tps	2016-02-26 12:26:13 +01:00
Tomasz Grabiec	4284715ddf	Relax includes	2016-02-26 12:26:13 +01:00
Tomasz Grabiec	a921479e71	Merge tag '807-v3' from https://github.com/avikivity/scylla From Avi: This patchset introduces a linearization context for managed_bytes objects. Within this context, any scattered managed_bytes (found only in lsa regions, so limited to memtable and cache) are auto-linearized for the lifetime of the context. This ensures that key and value lookups can use fast contiguous iterators instead of using slow discontiguous iterators (or crashing, as is the case now).	2016-02-16 14:29:48 +01:00


126 Commits