scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-25 02:50:33 +00:00

Author	SHA1	Message	Date
Glauber Costa	d49ecae201	mutation_partition: estimate size of partition In the memtable flusher, we account for the size of a partition as we read them. However, there are other points in the architecture where we would like to calculate the size of a partition in a point in which we are not reading it. One such example is the cache update process. This patch enhances the mutation_partition adding a method that returns the total size for this partition. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2017-11-08 16:21:44 -05:00
Tomasz Grabiec	749f5770df	mutation: Introduce apply(mutation_fragment)	2017-11-02 12:16:17 +01:00
Tomasz Grabiec	72028bb048	mutation_partition: Allow creating rows_entry at any clustered position_in_partition In preparation for supporting setting continuity of arbitrary clustering range.	2017-11-02 11:05:19 +01:00
Tomasz Grabiec	409adc045a	mutation_partition: Remove delegating_compare() It can't work with rows_entry at any position_in_partition, so we need to drop it.	2017-11-02 11:05:19 +01:00
Tomasz Grabiec	455a1b0d24	mutation_partition: Introduce range continuity checking methods	2017-09-13 17:47:04 +02:00
Tomasz Grabiec	abc489e99d	mutation_partition: Enable rows_entry::compare() on position_in_partition_views For full symmetry with existing overloads.	2017-09-13 17:47:04 +02:00
Tomasz Grabiec	b6ae5783cd	mvcc: Introduce partition_entry::evict() The operation frees as much memory as possible, marking affected mutation elements as discontinuous.	2017-09-13 17:47:03 +02:00
Paweł Dziepak	43cce6c2f4	rows_entry: make position() inlineable	2017-07-26 14:38:27 +01:00
Tomasz Grabiec	0770845a23	mutation_partition: Introduce r-value accepting deletable_row::apply()	2017-06-24 18:06:11 +02:00
Piotr Jastrzebski	efc75b0bc3	mutation_partition: Add rows_entry constructor which accepts full contents [tgrabiec: Extracted from different patch]	2017-06-24 18:06:11 +02:00
Tomasz Grabiec	dce293e11c	tests: row_cache: Apply only fully continuous mutations to underlying mutation source Cache currently assumes that mutations coming from outside are fully continuous.	2017-06-24 18:06:11 +02:00
Piotr Jastrzebski	05b56fcfb0	mutation_partition: Add support for specifying continuity
This will allow expressing lack of information about certain ranges of rows (including the static row), which will be used in cache to determine if information in cache is complete or not.
Continuity is represented internally using flags on row entries. The key range between two consecutive entries is continuous iff rows_entry::continuous() is true for the later entry. The range starting after the last entry is assumed to be continuous. The range corresponding to the key of the entry is continuous iff rows_entry::dummy() is false.
[tgrabiec:
- based on the following commits:
    4a5bf75 - Piotr Jastrzebski : mutation_partition: introduce dummy rows_entry
    773070e - Piotr Jastrzebski : mutation_partition: add continuity flag to rows_entry
- documented that partition tombstone is always complete
- require specifying the partition tombstone when creating an incomplete entry
- replaced rows_entry(dummy_tag, ...) constructor with more general rows_entry(position_in_partition, ...)
- documented continuity semantics on mutation_partition
- fixed _static_row_cached being lost by mutation_partition copy constructors
- fixed conversion to streamed_mutation to ignore dummy entries
- fixed mutation_partition serializer to drop dummy entries
- documented semantics of continuity on mutation_partition level
- dropped assumptions that dummy entries can be only at the last position
- changed equality to ignore continuity completely, rather than partially (it was not ignoring dummy entries, but ignoring continuity flag)
- added printout of continuity information in mutation_partition
- fixed handling of empty entries in apply_reversibly() with regards to continuity; we no longer can remove empty entries before merging, since that may affect continuity of the right-hand mutation. Added _erased flag.
- fixed mutation_partition::clustered_row() with dummy==true to not ignore the key
- fixed partition_builder to not ignore continuity
- renamed dummy_tag_t to dummy_tag. _t suffix is reserved.
- standardized all APIs on is_dummy and is_continuous bool_class:es
- replaced add_dummy_entry() with ensure_last_dummy() with safer semantics
- dropped unused remove_dummy_entry()
- simplified and inlined cache_entry::add_dummy_entry()
- fixed mutation_partition(incomplete_tag) constructor to mark all row ranges as discontinuous
]	2017-06-24 18:06:11 +02:00
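The continuity rules stated in the commit above can be illustrated with a minimal sketch. It is not the Scylla implementation: `entry_sketch` and `is_continuous_at` are hypothetical names, an `int` stands in for `position_in_partition`, and treating the first entry's flag as covering everything before it is an assumption made here.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified stand-in for rows_entry. An int replaces
// position_in_partition; the two flags follow the commit message.
struct entry_sketch {
    int position;
    bool continuous; // is the gap between the previous entry and this one known?
    bool dummy;      // a dummy entry carries no information about its own key
};

// Returns true iff `key` lies in a continuous part of the partition.
// `entries` must be sorted by position and contain no duplicates.
inline bool is_continuous_at(const std::vector<entry_sketch>& entries, int key) {
    for (const auto& e : entries) {
        if (key == e.position) {
            // The range covering the entry's own key is continuous
            // iff the entry is not a dummy.
            return !e.dummy;
        }
        if (key < e.position) {
            // key falls in the gap before e: the later entry's flag decides.
            // Assumption: for the first entry the gap starts at the
            // beginning of the partition.
            return e.continuous;
        }
    }
    // Past the last entry: assumed continuous.
    return true;
}
```

With entries at positions 10, 20 (not continuous) and a dummy at 30, key 15 is reported as discontinuous and key 30 as discontinuous, while key 35 falls after the last entry and is treated as continuous.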
Tomasz Grabiec	a77734952d	mutation_partition: Make rows_entry comparable with position_in_partition	2017-06-24 18:06:11 +02:00
Piotr Jastrzebski	65b3123516	mutation_partition: Use rows_entry::position() in comparators key() will not be valid for dummy entries, but position() is always valid. [tgrabiec: Extracted from other commits] [tgrabiec: Added missing change to range_tombstone_stream::get_next]	2017-06-24 18:06:11 +02:00
Tomasz Grabiec	660f3127a6	mutation_partition: Introduce rows_entry::position() In preparation for enabling dummy entries with position past all clustering rows.	2017-06-24 18:06:11 +02:00
Gleb Natapov	f5679e0416	database: remove remnants of no longer existing db::serializer. Message-Id: <20170604100552.GD8248@scylladb.com>	2017-06-04 13:07:17 +03:00
Duarte Nunes	9e88b60ef5	mutation: Set cell using clustering_key_prefix Change the clustering key argument in mutation::set_cell from exploded_clustering_prefix to clustering_key_prefix, which allows for some overall code simplification and fewer copies. This mostly affects the cql3 layer. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-05-04 15:59:50 +02:00
Duarte Nunes	db63ffdbb4	mutation_partition: Harmonize apply_delete overloads This patch ensures the different mutation_partition::apply_delete() overloads behave similarly, so that, for example, an empty clustering key is treated the same way as an empty exploded_clustering_key_prefix. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-05-04 15:59:50 +02:00
Duarte Nunes	4e693383f7	mutation_partion: Use row_tombstone This patch replaces the current row tombstone representation by a row_tombstone. The intent of the patch is thus to reify the idea of shadowable tombstones, that up until now we considered all materialized view row tombstones to be. We need to distinguish shadowable from non-shadowable row tombstones to support scenarios such as, when inserting to a table with a materialized view: 1. insert into base (p, v1, v2) values (3, 1, 3) using timestamp 1 2. delete from base using timestamp 2 where p = 3 3. insert into base (p, v1) values (3, 1) using timestamp 3 These should yield a view row where v2 is definitely null, but with the current implementation, v2 will pop back with its value v2=3@TS=1, even though it's dead in the base row. This is because the row tombstone inserted at 2) is a shadowable one. This patch only addresses the memory representation of such row_tombstones. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-04-25 11:46:33 +02:00
Duarte Nunes	6a2bccd4ae	mutation_partion: Introduce row_tombstone This patch introduces the row_tombstone class, which represents a tombstone made up of a regular tombstone and a shadowable one. The rules for row_tombstones are as follows: - The shadowable tombstone is always >= the regular one; - The regular tombstone works as expected; - The shadowable tombstone doesn't erase or compact away the regular row tombstone, nor dead cells; - The shadowable tombstone can erase live cells, but only provided they can be recovered (e.g., by including all cells in a MV update, both updated cells and pre-existing ones); - The shadowable tombstone can be erased or compacted away by a newer row marker. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-04-25 11:46:28 +02:00
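The row_tombstone rules listed in the commit above can be sketched roughly as follows. This is an illustration under stated assumptions, not the class from the patch: `row_tombstone_sketch`, its member names, and the use of bare timestamps (no deletion time) are all hypothetical simplifications.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>

using timestamp_sketch = std::int64_t;
// Assumption: the minimum timestamp stands for "no tombstone".
constexpr timestamp_sketch no_timestamp = std::numeric_limits<timestamp_sketch>::min();

// Hypothetical model of a tombstone made of a regular and a shadowable part.
class row_tombstone_sketch {
    timestamp_sketch _regular = no_timestamp;
    timestamp_sketch _shadowable = no_timestamp;
public:
    // A regular deletion also raises the shadowable part, which keeps the
    // invariant "shadowable >= regular".
    void apply_regular(timestamp_sketch t) {
        _regular = std::max(_regular, t);
        _shadowable = std::max(_shadowable, _regular);
    }
    // A shadowable deletion never touches the regular part.
    void apply_shadowable(timestamp_sketch t) {
        _shadowable = std::max(_shadowable, t);
    }
    // A newer live row marker erases the shadowable part; it falls back to
    // the regular tombstone, so the invariant still holds.
    void maybe_shadow(timestamp_sketch live_marker) {
        if (live_marker > _shadowable) {
            _shadowable = _regular;
        }
    }
    timestamp_sketch regular() const { return _regular; }
    // The deletion currently in effect for the row.
    timestamp_sketch effective() const { return _shadowable; }
};
```

For instance, after `apply_shadowable(5)` and `apply_regular(3)`, a live marker at timestamp 4 leaves the effective tombstone at 5, while a marker at timestamp 6 drops it back to the regular tombstone at 3.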
Duarte Nunes	3d49c1da01	mutation_partition: Introduce shadowable tombstones A shadowable tombstone is a tombstone that can be replaced by a smaller one if provided a row_marker with a bigger timestamp than the shadowable tombstone. In the context of a row, it is only valid as long as no newer insert is done (thus setting a live row marker; note that if the row timestamp set is lower than the tombstone's, then the tombstone remains in effect as usual). If a row has a shadowable tombstone with timestamp Ti and that row is updated with a timestamp Tj, such that Tj > Ti (and that update sets the row marker), then the shadowable tombstone is shadowed by that update. A concrete consequence is that if the update has cells with timestamp lower than Ti, then those cells are preserved (since the deletion is removed), and this is contrary to a regular, non-shadowable row tombstone where the tombstone is preserved and such cells are removed. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-04-25 11:46:22 +02:00
Duarte Nunes	392403b5b3	row_marker: Mark constructors explicit Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-04-25 11:43:04 +02:00
Paweł Dziepak	582d397c41	introduce counter_write_query() Counter write path involves read-modify-write. That read is guaranteed to query only a single partition, does not care about dead cells and expects to receive an unserialized mutation as a result. Standard mutation queries are able to produce results fit for counter updates, but the logic involved is much more general (i.e. slower), hence the addition of new, counter-specific kind of query.	2017-03-01 16:33:36 +00:00
Duarte Nunes	7e150a18eb	mutation_partition: Introduce shadowable tombstone
This patch introduces shadowable row tombstones. A shadowable row tombstone is valid only if the row has no live marker. In other words, the row tombstone is only valid as long as no newer insert is done (thus setting a live row marker; note that if the row timestamp set is lower than the tombstone's, then the tombstone remains in effect as usual).
If a row has a shadowable tombstone with timestamp Ti and that row is updated with a timestamp Tj, such that Tj > Ti (and that update sets the row marker), then the shadowable tombstone is shadowed by that update. A concrete consequence is that if the update has cells with timestamp lower than Ti, then those cells are preserved (since the deletion is removed), and this is contrary to a regular, non-shadowable row tombstone where the tombstone is preserved and such cells are removed.
Currently, only Materialized Views require shadowable row tombstones, which solve a problem with view row deletions. Consider a base row with columns p, v1, v2, PRIMARY KEY (p) denormalized into a view row consisting of columns p, v1, v2 PRIMARY KEY (p, v1), and the following operations:
1) INSERT INTO base (p, v1, v2) VALUES (0, 0, 1) USING TIMESTAMP 0;
2) UPDATE base SET v1 = 1 USING TIMESTAMP 1 WHERE p = 0;
3) UPDATE base SET v1 = 0 USING TIMESTAMP 2 WHERE p = 0;
Without shadowable tombstones, the view contains:
At 1), pk = (0, 0), row_marker@T0, v2=1@T0
At 2), pk = (0, 0), row_marker@T0, row_tombstone@T1, v2=1@T0
       pk = (0, 1), row_marker@T1, v2=1@T0
At 3), pk = (0, 0), row_marker@T2, row_tombstone@T1, v2=1@T0
       pk = (0, 1), row_marker@T1, row_tombstone@T2, v2=1@T0
Notice how, if we read row (0, 0), the value of v2 will be shadowed by the row tombstone we previously inserted.
With a view's row tombstone becoming shadowable, at 3) the row (0, 0) will look like pk = (0, 0), row_marker@T2, shadowable_tombstone@T1, v2=1@T0, which is equivalent to pk = (0, 0), row_marker@T2, v2=1@T0. Since the shadowable tombstone is shadowed by the new row marker (T0 < T2), now v2 would be taken into account.
Finally, note that this patch doesn't generalize the idea of shadowable tombstone, instead taking advantage of the fact that they are only needed by Materialized Views. This saves changing the tombstone representation to account for an extra flag, the bits such representation would require, and also avoids changes to the storage format.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-02-06 13:36:45 +01:00
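The view row (0, 0) at step 3) of the commit above can be replayed in a small sketch. The types and the `cell_is_live` helper are hypothetical and reduce a row to a marker, one tombstone, and the single cell v2; they only illustrate the shadowing rule described in the message.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, minimal view of a row: one marker, one tombstone, one cell.
struct view_row_sketch {
    std::int64_t marker_ts;      // timestamp of the live row marker
    std::int64_t tombstone_ts;   // timestamp of the row tombstone
    bool tombstone_shadowable;   // is the tombstone a shadowable one?
    std::int64_t cell_ts;        // timestamp of cell v2
};

// Returns true iff the cell survives the row tombstone.
inline bool cell_is_live(const view_row_sketch& r) {
    // A shadowable tombstone is ignored once a newer live marker exists.
    const bool shadowed = r.tombstone_shadowable && r.marker_ts > r.tombstone_ts;
    if (shadowed) {
        return true;
    }
    // Otherwise the tombstone deletes every cell that is not newer than it.
    return r.cell_ts > r.tombstone_ts;
}
```

With `row_marker@T2`, a tombstone at T1 and `v2=1@T0`, the cell is dead when the tombstone is regular and live when it is shadowable, which matches the two outcomes described in the message.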
Paweł Dziepak	b6564651e4	mutation_partition: make for_each_cell() accessible outside source file for_each_cell() const already can be used from any place in the code, allow the same with non-const version.	2017-02-02 10:35:14 +00:00
Piotr Jastrzebski	15cc8460bd	mutation_partition: make rows_entry constructors explicit All converting constructors should be explicit otherwise they can create a confusion. I got myself in such a situation when clustering key got implicitly converted into rows_entry when I was not expecting it. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Message-Id: <c3f19719760f6dc7cf5e858b9c452506faedf521.1485950529.git.piotr@scylladb.com>	2017-02-01 17:57:50 +01:00
Tomasz Grabiec	ddfee57c97	Replace iostream include with iosfwd in headers Message-Id: <1484656119-8386-4-git-send-email-tgrabiec@scylladb.com>	2017-01-17 14:52:44 +02:00
Piotr Jastrzebski	041b0a65ac	Implement intrusive set using rbtree_algorithms This new implementation takes less memory because it does not store comparator. It also uses tree nodes optimized for size. This means that instead of storing an enum field \|color\| they embed this information inside pointer to parent. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2017-01-05 11:46:58 +01:00
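The size optimization mentioned in the commit above, keeping the node color inside the parent pointer, can be sketched as follows. This is not the Boost.Intrusive or Scylla code: `compact_node_sketch` is a hypothetical type, and the sketch assumes that node addresses are at least 2-byte aligned so the lowest pointer bit is free.

```cpp
#include <cassert>
#include <cstdint>

enum class color_sketch : std::uintptr_t { red = 0, black = 1 };

// Hypothetical red-black tree node that stores its color in the lowest bit
// of the parent pointer instead of a separate enum field.
struct compact_node_sketch {
    compact_node_sketch* left = nullptr;
    compact_node_sketch* right = nullptr;
    std::uintptr_t parent_and_color = 0; // parent address | color bit

    compact_node_sketch* parent() const {
        return reinterpret_cast<compact_node_sketch*>(
            parent_and_color & ~std::uintptr_t(1));
    }
    color_sketch color() const {
        return static_cast<color_sketch>(parent_and_color & std::uintptr_t(1));
    }
    // Replaces the parent while preserving the stored color bit.
    void set_parent(compact_node_sketch* p) {
        parent_and_color = reinterpret_cast<std::uintptr_t>(p)
                         | (parent_and_color & std::uintptr_t(1));
    }
    // Replaces the color while preserving the stored parent address.
    void set_color(color_sketch c) {
        parent_and_color = (parent_and_color & ~std::uintptr_t(1))
                         | static_cast<std::uintptr_t>(c);
    }
};

// The trick only works if the lowest address bit of a node is always zero.
static_assert(alignof(compact_node_sketch) >= 2, "low pointer bit must be free");
```

On typical platforms the node then occupies three pointer-sized words, with no extra field (and padding) for the color.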
Piotr Jastrzebski	4bbe05dd47	mutation_partition: take schema in find_row and clustered_row This will allow intrusive set implementation that does not store schema. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2017-01-05 11:26:03 +01:00
Piotr Jastrzebski	fe3c91db90	mutation_partition: Extract intrusive set logic to a class. It will make it easier to change the implementation of the intrusive set. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2017-01-05 11:26:03 +01:00
Avi Kivity	1d9ee358f1	Revert "Merge "Reduce the size of mutation_partition" from Piotr" This reverts commit `aa392810ff`, reversing changes made to a24ff47c637e6a5fd158099b8a65f1191fc2d023; it uses boost::intrusive::detail directly, which it must not, and doesn't compile on all boost versions as a consequence.	2016-12-25 16:07:48 +02:00
Piotr Jastrzebski	671affc36c	Implement intrusive set using rbtree_algorithms This new implementation takes less memory because it does not store comparator. It also uses tree nodes optimized for size. This means that instead of storing an enum field \|color\| they embed this information inside pointer to parent. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2016-12-23 11:32:13 +01:00
Piotr Jastrzebski	2af6ff68d9	mutation_partition: take schema in find_row and clustered_row This will allow intrusive set implementation that does not store schema. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2016-12-23 11:29:07 +01:00
Piotr Jastrzebski	b3b924dec9	mutation_partition: Extract intrusive set logic to a class. It will make it easier to change the implementation of the intrusive set. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2016-12-23 11:29:07 +01:00
Paweł Dziepak	ef57b9a26f	rename memory_usage() to external_memory_usage() where applicable Renaming the function to external_memory_usage() makes it clear that sizeof(T) is not included, something that was a source of confusion in the past. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-11-18 11:25:36 +00:00
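The distinction drawn in the commit above, that external_memory_usage() excludes sizeof(T), can be illustrated with a small sketch. `blob_sketch` and `total_memory_usage` are hypothetical names, and counting only the vector's capacity ignores allocator overhead.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical object that owns heap memory through a vector.
struct blob_sketch {
    std::vector<char> data;

    // Only memory owned outside of the object itself: the vector's buffer.
    // sizeof(blob_sketch) is deliberately not included.
    std::size_t external_memory_usage() const {
        return data.capacity() * sizeof(char);
    }
};

// A caller that knows where the object lives adds the object's own size.
inline std::size_t total_memory_usage(const blob_sketch& b) {
    return sizeof(blob_sketch) + b.external_memory_usage();
}
```

Keeping sizeof(T) out of the member function lets a container that embeds the object account for the object's footprint once, without double counting.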
Piotr Jastrzebski	b05b90b3a5	Introduce clustering_key_filter_ranges. This fixes the problem of multiple concurrent get_ranges calls. Previously each call was invalidating the result of the previous call. Now they don't step on each other's toes. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2016-08-30 19:46:38 +02:00
Duarte Nunes	5161ea283f	query: query::clustering_range can't wrap around This patch changes the type of query::clustering_range to express that ranges that wrap around are not allowed, and ranges that have the start bound after the end bound are considered empty. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-08-15 14:50:20 +00:00
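The semantics described in the commit above (no wrap-around, and a start bound after the end bound means an empty range) can be sketched as follows. `nonwrapping_range_sketch` is a hypothetical type with inclusive integer bounds; the real query::clustering_range works on clustering key prefixes with optional, inclusive or exclusive bounds.

```cpp
#include <cassert>

// Hypothetical non-wrapping range over ints with inclusive bounds.
struct nonwrapping_range_sketch {
    int start;
    int end;

    // A start bound after the end bound denotes an empty range
    // rather than a range that wraps around.
    bool empty() const { return start > end; }

    bool contains(int v) const { return start <= v && v <= end; }
};

// For contrast only: how a wrapping range would interpret the same bounds.
inline bool wrapping_contains(int start, int end, int v) {
    if (start <= end) {
        return start <= v && v <= end;
    }
    return v >= start || v <= end;
}
```

With bounds (5, 2), the non-wrapping range is empty and contains nothing, whereas a wrapping interpretation of the same bounds would contain both 1 and 6.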
Paweł Dziepak	27fea7bf2c	mutation_partition: add non-const rows and tombstones accessors Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-07-13 09:50:07 +01:00
Tomasz Grabiec	8c4b5e4283	db: Avoiding checking bloom filters during compaction Checking bloom filters of sstables to compute max purgeable timestamp for compaction is expensive in terms of CPU time. We can avoid calculating it if we're not about to GC any tombstone. This patch changes compacting functions to accept a function instead of ready value for max_purgeable. I verified that bloom filter operations no longer appear on flame graphs during compaction-heavy workload (without tombstones). Refs #1322.	2016-07-10 09:54:20 +02:00
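The idea in the commit above, passing a function instead of a ready value for max_purgeable, can be sketched as lazy evaluation. The names below are hypothetical, the item model is heavily simplified, and caching the result after the first call is an assumption of this sketch rather than something the commit message states.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <optional>
#include <vector>

using ts_sketch = std::int64_t;

// Hypothetical, simplified element of a partition being compacted.
struct item_sketch {
    ts_sketch timestamp;
    bool is_tombstone;
    bool expired; // tombstone is past its grace period and is a GC candidate
};

// Drops expired tombstones older than the max purgeable timestamp.
// The potentially expensive get_max_purgeable() is invoked only when a GC
// candidate is actually seen, and at most once per call.
inline std::vector<item_sketch> compact_sketch(
        const std::vector<item_sketch>& in,
        const std::function<ts_sketch()>& get_max_purgeable) {
    std::optional<ts_sketch> max_purgeable;
    std::vector<item_sketch> out;
    for (const auto& i : in) {
        if (i.is_tombstone && i.expired) {
            if (!max_purgeable) {
                max_purgeable = get_max_purgeable();
            }
            if (i.timestamp < *max_purgeable) {
                continue; // purge this tombstone
            }
        }
        out.push_back(i);
    }
    return out;
}
```

For a workload without tombstones the callback is never invoked, which corresponds to the case the commit reports as no longer showing bloom filter operations on flame graphs.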
Paweł Dziepak	23d0bfd065	mutation_partition: add row::memory_usage() Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-07-07 12:17:25 +01:00
Paweł Dziepak	f95c5542dc	mutation_partition: allow slicing moved mutation_partition Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-06-20 21:29:51 +01:00
Paweł Dziepak	22160ae6d5	mutation_partition: make rows_type public Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-06-20 21:29:49 +01:00
Paweł Dziepak	847bf878ec	mutation_partition: add more row::apply() overloads Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-06-20 21:29:48 +01:00
Duarte Nunes	70083efee2	sstables: Read and write range tombstone bounds This patch uses the composite_marker to add inclusiveness information to the prefixes of a range tombstone. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-06-02 16:21:59 +02:00
Duarte Nunes	7628e403a3	sstables: Drop code for tombstone merging Since Scylla now supports proper range tombstones, the code for reading ranges from sstables and converting them to overlapping tombstones is no longer necessary, and is, in fact, wasteful as the internal representation converts overlapping tombstones back to ranges. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-06-02 16:21:59 +02:00
Duarte Nunes	91aac30f12	mutations: Row tombstones are now a set of ranges This patch changes the type of the mutation partition's row_tombstones to be a range_tombstone_list, so that they are now represented as a set of disjoint ranges. All of its usages are updated accordingly. Fixes #1155 Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-06-02 16:21:59 +02:00
Piotr Jastrzebski	23c23abe53	Make memtable mutation_reader slice using clustering ranges. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2016-05-16 11:46:41 +02:00
Tomasz Grabiec	a1539fed95	mutation_partition: Fix reversed trim_rows() The first erase_and_dispose(), which removes rows between last position and beginning of the next range, can invalidate end() iterator of the range. Fix by looking up end after erasing. mutation_partition::range() was split into lower_bound() and upper_bound() to allow for that. This affects for example queries with descending order where the selected clustering range is empty and falls before all rows. Exposed by `f15c380a4f`, which is now calling do_compact() during query. Reproduced by dtest paging_test.py:TestPagingData.static_columns_paging_test	2016-04-08 20:53:33 +02:00
Avi Kivity	db03295c8a	Merge "Fix query digest mismatch" from Tomasz "Currently data query digest includes cells and tombstones which may have expired or be covered by higher-level tombstones. This causes digest mismatch between replicas if some elements are compacted on one of the nodes and not on others. This mismatch triggers read-repair which doesn't resolve because mutations received by mutation queries are not differing, they are compacted already. The fix adds compacting step before writing and digesting query results by reusing the algorithm used by mutation query. This is not the most optimal way to fix this. The compaction step could be folded with the query writing, there is redundancy in both steps. However such change carries more risk, and thus was postponed. perf_simple_query test (cassandra-stress-like partitions) shows regression from 83k to 77k (7%) ops/s. Fixes #1165."	2016-04-08 12:13:29 +03:00
Pekka Enberg	38a54df863	Fix pre-ScyllaDB copyright statements People keep tripping over the old copyrights and copy-pasting them to new files. Search and replace "Cloudius Systems" with "ScyllaDB". Message-Id: <1460013664-25966-1-git-send-email-penberg@scylladb.com>	2016-04-08 08:12:47 +03:00


139 Commits