scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-21 17:10:35 +00:00

Author	SHA1	Message	Date
Paweł Dziepak	582d397c41	introduce counter_write_query() Counter write path involves read-modify-write. That read is guaranteed to query only a single partition, does not care about dead cells and expects to receive an unserialized mutation as a result. Standard mutation queries are able to produce results fit for counter updates, but the logic involved is much more general (i.e. slower), hence the addition of a new, counter-specific kind of query.	2017-03-01 16:33:36 +00:00
Duarte Nunes	7e150a18eb	mutation_partition: Introduce shadowable tombstone This patch introduces shadowable row tombstones. A shadowable row tombstone is valid only if the row has no live marker. In other words, the row tombstone is only valid as long as no newer insert is done (thus setting a live row marker; note that if the row timestamp set is lower than the tombstone's, then the tombstone remains in effect as usual). If a row has a shadowable tombstone with timestamp Ti and that row is updated with a timestamp Tj, such that Tj > Ti (and that update sets the row marker), then the shadowable tombstone is shadowed by that update. A concrete consequence is that if the update has cells with timestamp lower than Ti, then those cells are preserved (since the deletion is removed), and this is contrary to a regular, non-shadowable row tombstone where the tombstone is preserved and such cells are removed. Currently, only Materialized Views require shadowable row tombstones, which solve a problem with view row deletions. Consider a base row with columns p, v1, v2, PRIMARY KEY (p) denormalized into a view row consisting of columns p, v1, v2 PRIMARY KEY (p, v1), and the following operations: 1) INSERT INTO base (p, v1, v2) VALUES (0, 0, 1) USING TIMESTAMP 0; 2) UPDATE base SET v1 = 1 USING TIMESTAMP 1 WHERE p = 0; 3) UPDATE base SET v1 = 0 USING TIMESTAMP 2 WHERE p = 0; Without shadowable tombstones, the view contains: At 1), pk = (0, 0), row_marker@T0, v2=1@T0 At 2), pk = (0, 0), row_marker@T0, row_tombstone@T1, v2=1@T0 pk = (0, 1), row_marker@T1, v2=1@T0 At 3), pk = (0, 0), row_marker@T2, row_tombstone@T1, v2=1@T0 pk = (0, 1), row_marker@T1, row_tombstone@T2, v2=1@T0 Notice how, if we read row (0, 0), the value of v2 will be shadowed by the row tombstone we previously inserted. 
With a view's row tombstone becoming shadowable, at 3) the row (0, 0) will look like pk = (0, 0), row_marker@T2, shadowable_tombstone@T1, v2=1@T0, which is equivalent to pk = (0, 0), row_marker@T2, v2=1@T0. Since the shadowable tombstone is shadowed by the new row marker (T0 < T2), now v2 would be taken into account. Finally, note that this patch doesn't generalize the idea of shadowable tombstone, instead taking advantage of the fact that they are only needed by Materialized Views. This saves changing the tombstone representation to account for an extra flag, the bits such representation would require, and also avoids changes to the storage format. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-02-06 13:36:45 +01:00
Paweł Dziepak	b6564651e4	mutation_partition: make for_each_cell() accessible outside source file for_each_cell() const already can be used from any place in the code, allow the same with non-const version.	2017-02-02 10:35:14 +00:00
Piotr Jastrzebski	15cc8460bd	mutation_partition: make rows_entry constructors explicit All converting constructors should be explicit, otherwise they can create confusion. I got myself into such a situation when a clustering key got implicitly converted into rows_entry when I was not expecting it. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Message-Id: <c3f19719760f6dc7cf5e858b9c452506faedf521.1485950529.git.piotr@scylladb.com>	2017-02-01 17:57:50 +01:00
Tomasz Grabiec	ddfee57c97	Replace iostream include with iosfwd in headers Message-Id: <1484656119-8386-4-git-send-email-tgrabiec@scylladb.com>	2017-01-17 14:52:44 +02:00
Piotr Jastrzebski	041b0a65ac	Implement intrusive set using rbtree_algorithms This new implementation takes less memory because it does not store comparator. It also uses tree nodes optimized for size. This means that instead of storing an enum field |color| they embed this information inside pointer to parent. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2017-01-05 11:46:58 +01:00
Piotr Jastrzebski	4bbe05dd47	mutation_partition: take schema in find_row and clustered_row This will allow intrusive set implementation that does not store schema. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2017-01-05 11:26:03 +01:00
Piotr Jastrzebski	fe3c91db90	mutation_partition: Extract intrusive set logic to a class. It will make it easier to change the implementation of the intrusive set. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2017-01-05 11:26:03 +01:00
Avi Kivity	1d9ee358f1	Revert "Merge "Reduce the size of mutation_partition" from Piotr" This reverts commit `aa392810ff`, reversing changes made to a24ff47c637e6a5fd158099b8a65f1191fc2d023; it uses boost::intrusive::detail directly, which it must not, and doesn't compile on all boost versions as a consequence.	2016-12-25 16:07:48 +02:00
Piotr Jastrzebski	671affc36c	Implement intrusive set using rbtree_algorithms This new implementation takes less memory because it does not store comparator. It also uses tree nodes optimized for size. This means that instead of storing an enum field |color| they embed this information inside pointer to parent. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2016-12-23 11:32:13 +01:00
Piotr Jastrzebski	2af6ff68d9	mutation_partition: take schema in find_row and clustered_row This will allow intrusive set implementation that does not store schema. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2016-12-23 11:29:07 +01:00
Piotr Jastrzebski	b3b924dec9	mutation_partition: Extract intrusive set logic to a class. It will make it easier to change the implementation of the intrusive set. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2016-12-23 11:29:07 +01:00
Paweł Dziepak	ef57b9a26f	rename memory_usage() to external_memory_usage() where applicable Renaming the function to external_memory_usage() makes it clear that sizeof(T) is not included, something that was a source of confusion in the past. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-11-18 11:25:36 +00:00
Piotr Jastrzebski	b05b90b3a5	Introduce clustering_key_filter_ranges. This fixes the problem of multiple concurrent get_ranges calls. Previously each call was invalidating the result of the previous call. Now they don't step on each other's toes. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2016-08-30 19:46:38 +02:00
Duarte Nunes	5161ea283f	query: query::clustering_range can't wrap around This patch changes the type of query::clustering_range to express that ranges that wrap around are not allowed, and ranges that have the start bound after the end bound are considered empty. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-08-15 14:50:20 +00:00
Paweł Dziepak	27fea7bf2c	mutation_partition: add non-const rows and tombstones accessors Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-07-13 09:50:07 +01:00
Tomasz Grabiec	8c4b5e4283	db: Avoid checking bloom filters during compaction Checking bloom filters of sstables to compute max purgeable timestamp for compaction is expensive in terms of CPU time. We can avoid calculating it if we're not about to GC any tombstone. This patch changes compacting functions to accept a function instead of a ready value for max_purgeable. I verified that bloom filter operations no longer appear on flame graphs during compaction-heavy workload (without tombstones). Refs #1322.	2016-07-10 09:54:20 +02:00
Paweł Dziepak	23d0bfd065	mutation_partition: add row::memory_usage() Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-07-07 12:17:25 +01:00
Paweł Dziepak	f95c5542dc	mutation_partition: allow slicing moved mutation_partition Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-06-20 21:29:51 +01:00
Paweł Dziepak	22160ae6d5	mutation_partition: make rows_type public Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-06-20 21:29:49 +01:00
Paweł Dziepak	847bf878ec	mutation_partition: add more row::apply() overloads Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-06-20 21:29:48 +01:00
Duarte Nunes	70083efee2	sstables: Read and write range tombstone bounds This patch uses the composite_marker to add inclusiveness information to the prefixes of a range tombstone. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-06-02 16:21:59 +02:00
Duarte Nunes	7628e403a3	sstables: Drop code for tombstone merging Since Scylla now supports proper range tombstones, the code for reading ranges from sstables and converting them to overlapping tombstones is no longer necessary, and is, in fact, wasteful as the internal representation converts overlapping tombstones back to ranges. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-06-02 16:21:59 +02:00
Duarte Nunes	91aac30f12	mutations: Row tombstones are now a set of ranges This patch changes the type of the mutation partition's row_tombstones to be a range_tombstone_list, so that they are now represented as a set of disjoint ranges. All of its usages are updated accordingly. Fixes #1155 Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-06-02 16:21:59 +02:00
Piotr Jastrzebski	23c23abe53	Make memtable mutation_reader slice using clustering ranges. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2016-05-16 11:46:41 +02:00
Tomasz Grabiec	a1539fed95	mutation_partition: Fix reversed trim_rows() The first erase_and_dispose(), which removes rows between last position and beginning of the next range, can invalidate end() iterator of the range. Fix by looking up end after erasing. mutation_partition::range() was split into lower_bound() and upper_bound() to allow for that. This affects for example queries with descending order where the selected clustering range is empty and falls before all rows. Exposed by `f15c380a4f`, which is now calling do_compact() during query. Reproduced by dtest paging_test.py:TestPagingData.static_columns_paging_test	2016-04-08 20:53:33 +02:00
Avi Kivity	db03295c8a	Merge "Fix query digest mismatch" from Tomasz "Currently data query digest includes cells and tombstones which may have expired or be covered by higher-level tombstones. This causes digest mismatch between replicas if some elements are compacted on one of the nodes and not on others. This mismatch triggers read-repair which doesn't resolve because mutations received by mutation queries are not differing, they are compacted already. The fix adds compacting step before writing and digesting query results by reusing the algorithm used by mutation query. This is not the most optimal way to fix this. The compaction step could be folded with the query writing, there is redundancy in both steps. However such change carries more risk, and thus was postponed. perf_simple_query test (cassandra-stress-like partitions) shows regression from 83k to 77k (7%) ops/s. Fixes #1165."	2016-04-08 12:13:29 +03:00
Pekka Enberg	38a54df863	Fix pre-ScyllaDB copyright statements People keep tripping over the old copyrights and copy-pasting them to new files. Search and replace "Cloudius Systems" with "ScyllaDB". Message-Id: <1460013664-25966-1-git-send-email-penberg@scylladb.com>	2016-04-08 08:12:47 +03:00
Tomasz Grabiec	f15c380a4f	database: Compact mutations when executing data queries Currently data query digest includes cells and tombstones which may have expired or be covered by higher-level tombstones. This causes digest mismatch between replicas if some elements are compacted on one of the nodes and not on others. This mismatch triggers read-repair which doesn't resolve because mutations received by mutation queries are not differing, they are compacted already. The fix adds compacting step before writing and digesting query results by reusing the algorithm used by mutation query. This is not the most optimal way to fix this. The compaction step could be folded with the query writing, there is redundancy in both steps. However such change carries more risk, and thus was postponed. perf_simple_query test (cassandra-stress-like partitions) shows regression from 83k to 77k (7%) ops/s. Fixes #1165.	2016-04-07 19:56:58 +02:00
Tomasz Grabiec	a7966e9b71	mutation_partition: Fix friend declarations Missing "class" confuses CLion IDE.	2016-03-21 21:49:53 +01:00
Tomasz Grabiec	dc290f0af7	mutation_partition: Make apply() atomic even in case of exception We cannot leave partially applied mutation behind when the write fails. It may fail if memory allocation fails in the middle of apply(). This for example would violate write atomicity, readers should either see the whole write or none at all. This fix makes apply() revert partially applied data upon failure, by the means of ReversiblyMergeable concept. In a nutshell the idea is to store old state in the source mutation as we apply it and swap back in case of exception. At cell level this swapping is inexpensive, just rewiring pointers. For this to work, the source mutation needs to be brought into mutable form, so frozen mutations need to be unfrozen. In practice this doesn't increase amount of cell allocations in the memtable apply path because incoming data will usually be newer and we will have to copy it into LSA anyway. There are extra allocations though for the data structures which hold cells. I didn't see significant change in performance of: build/release/tests/perf/perf_simple_query -c1 -m1G --write --duration 13 The score fluctuates around ~77k ops/s. Fixes #283.	2016-03-21 21:49:52 +01:00
Tomasz Grabiec	e09d186c7c	mutation_partition: Make intrusive sets ReversiblyMergeable	2016-03-21 21:49:52 +01:00
Tomasz Grabiec	f1a4feb1fc	mutation_partition: Make row_tombstones_entry ReversiblyMergeable	2016-03-21 19:26:24 +01:00
Tomasz Grabiec	e4a576a90f	mutation_partition: Make rows_entry ReversiblyMergeable	2016-03-21 19:26:24 +01:00
Tomasz Grabiec	aadcd75d89	mutation_partition: Make row_marker ReversiblyMergeable	2016-03-21 19:26:24 +01:00
Tomasz Grabiec	ea7c2dd085	mutation_partition: Make row ReversiblyMergeable	2016-03-21 19:26:24 +01:00
Tomasz Grabiec	9fc7f8a5ed	mutation_partition: row: Add empty()	2016-03-21 18:41:27 +01:00
Tomasz Grabiec	d5e66a5b0d	mutation_partition: row: Allow storing empty cells internally Currently only "set" storage could store empty cells, but not the "vector" one because there an empty cell has the meaning of being missing. To implement rollback, we need to be able to distinguish empty cells from missing ones. Solve by making vector storage use a bitmap for presence checking instead of emptiness. This adds 4 bytes to vector storage.	2016-03-21 18:41:27 +01:00
Tomasz Grabiec	8134992024	mutation_partition: Add cell_entry constructor which makes an empty cell	2016-03-18 22:30:04 +01:00
Tomasz Grabiec	c91eefa183	mutation_partition: Unmark cell_entry's copy constructor as noexcept It was a mistake, it certainly may throw because it copies cells.	2016-03-18 22:30:04 +01:00
Tomasz Grabiec	6cec131432	query: Switch to IDL-generated views and writers The query result footprint for cassandra-stress mutation as reported by tests/memory-footprint increased by 18% from 285 B to 337 B. perf_simple_query shows slight regression in throughput (-8%): build/release/tests/perf/perf_simple_query -c4 -m1G --partitions 100000 Before: ~433k tps After: ~400k tps	2016-02-26 12:26:13 +01:00
Tomasz Grabiec	4284715ddf	Relax includes	2016-02-26 12:26:13 +01:00
Avi Kivity	1f245e3bcb	mutation_partition: fix use of boost::intrusive::set<>::comp() Seems like boost::intrusive::set<>::comp() is not accessible on some versions of boost. Replace by the equivalent boost::intrusive::set<>::key_comp(). Fixes #858. Message-Id: <1454326483-29780-1-git-send-email-avi@scylladb.com>	2016-02-01 13:54:52 +01:00
Tomasz Grabiec	036974e19b	Make mutation interfaces support multiple versions Schema is tracked in memtable and cache per-entry. Entries are upgraded lazily on access. Incoming mutations are upgraded to table's current schema on given shard. Mutating nodes need to keep schema_ptr alive in case schema version is requested by target node.	2016-01-11 10:34:51 +01:00
Tomasz Grabiec	f59ec59abc	mutation: Implement upgrade() Converts mutation to a new schema.	2016-01-08 21:10:26 +01:00
Tomasz Grabiec	2cfdfe261d	Introduce converting_mutation_partition_applier	2016-01-08 21:10:26 +01:00
Tomasz Grabiec	a6084ee007	mutation: Make hashable The computed hash is independent of any internal representation thus can be used as a digest across nodes and versions.	2016-01-08 21:10:26 +01:00
Tomasz Grabiec	ade5cf1b4b	mutation_partition: Make visitable with mutation_partition_visitor	2016-01-08 21:10:25 +01:00
Tomasz Grabiec	bc9ee083dd	db: Move atomic_cell_or_collection to separate header To break future cyclic dependency: atomic_cell.hh -> schema.hh (new) -> types.hh -> atomic_cell.hh	2016-01-08 21:10:25 +01:00
Tomasz Grabiec	6f955e1290	mutation_partition: Make equal() work with different schemas	2016-01-08 21:10:25 +01:00
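
The shadowable tombstone rule described in commit 7e150a18eb can be illustrated with a small model. This is only a sketch under stated assumptions: `view_row`, `cell`, `effective_tombstone()` and `is_v2_visible()` are hypothetical names invented for illustration and are not Scylla's actual types or functions. It encodes just the rule from the commit message, namely that a shadowable tombstone with timestamp Ti stops applying once the row has a live marker with timestamp Tj > Ti.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

using timestamp = std::int64_t;

// Hypothetical, simplified model of a view row; not Scylla's real layout.
struct cell {
    int value;
    timestamp ts;
};

struct view_row {
    std::optional<timestamp> marker;     // live row marker, if any
    std::optional<timestamp> tombstone;  // row tombstone, if any
    bool tombstone_is_shadowable = false;
    cell v2;
};

// Returns the tombstone that is actually in effect for the row.
// A shadowable tombstone is ignored when a newer live row marker exists.
std::optional<timestamp> effective_tombstone(const view_row& r) {
    if (!r.tombstone) {
        return std::nullopt;
    }
    if (r.tombstone_is_shadowable && r.marker && *r.marker > *r.tombstone) {
        return std::nullopt;
    }
    return r.tombstone;
}

// A cell is visible when no effective tombstone covers its timestamp.
bool is_v2_visible(const view_row& r) {
    auto t = effective_tombstone(r);
    return !t || r.v2.ts > *t;
}
```

With row (0, 0) from step 3 of that commit message (marker at T2, tombstone at T1, v2 written at T0), the regular tombstone hides v2, while the shadowable one does not.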

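
The rollback idea from commit dc290f0af7 (store the old state in the source while applying, swap back on exception) can be sketched roughly as follows. Everything here is an assumption made for illustration: `versioned_cell`, the `swapped` flag, `apply_reversibly()`, `revert()` and the `fail_at` knob that simulates an allocation failure are hypothetical and do not reproduce Scylla's ReversiblyMergeable implementation. Both rows are assumed to have the same length.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <utility>
#include <vector>

// Hypothetical cell: newer timestamp wins on merge.
struct versioned_cell {
    long timestamp;
    int value;
    bool swapped = false;  // true on a source cell that holds displaced old state
};

// Keep the newer cell in dst and park the displaced older cell in src.
void apply_reversibly(versioned_cell& dst, versioned_cell& src) {
    if (src.timestamp > dst.timestamp) {
        std::swap(dst, src);
        src.swapped = true;
    }
}

// Undo apply_reversibly() for one cell pair.
void revert(versioned_cell& dst, versioned_cell& src) {
    if (src.swapped) {
        src.swapped = false;
        std::swap(dst, src);
    }
}

using row = std::vector<versioned_cell>;

// Applies src to dst cell by cell. If a failure occurs part-way through,
// every already-merged cell is reverted, so dst and src end up unchanged.
// fail_at is a hypothetical test knob: the index at which to throw.
void apply(row& dst, row& src, std::size_t fail_at) {
    std::size_t i = 0;
    try {
        for (; i < dst.size(); ++i) {
            if (i == fail_at) {
                throw std::bad_alloc();
            }
            apply_reversibly(dst[i], src[i]);
        }
    } catch (...) {
        while (i-- > 0) {
            revert(dst[i], src[i]);
        }
        throw;
    }
}
```

The swap moves state instead of copying it, which mirrors the commit's remark that reverting at cell level is cheap; the price in this sketch is the extra flag carried by each source cell.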

117 Commits