scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-27 20:05:10 +00:00

Author	SHA1	Message	Date
Vladimir Krivopalov	ed62b9a667	Add mutation_partition::apply_insert() overload that accepts TTL and expiry for row marker. For #1969. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-04-26 13:27:42 -07:00
Duarte Nunes	c8baba4e3a	mutation_partition: Clarify comment about emptiness empty() doesn't distinguish between live and dead data, so clarify that in its comment. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-04-23 09:32:03 +01:00
Duarte Nunes	67dac67c46	mutation_partition: Regular base column in view determines row liveness When views contain a primary key column that is not part of the base table primary key, that column determines whether the row is live or not. We need to ensure that when that cell is dead, and thus the derived row marker, either by normal deletion or by TTL, so is the rest of the row. This patch introduces the idea of a shadowing row marker. We map the status of the regular base column in the view's PK to the view row's marker. If this marker is dead, so is that cell in the base table, and so should the view row become. To enforce that, a view row's dead marker shadows the whole row if that view includes a base regular column in its PK. Fixes #3360 Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-04-23 09:32:02 +01:00
Duarte Nunes	b0cb5480d5	mutation_fragment: Allow querying if row is live For clustering_row and static_row, allow querying whether they are live or not. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-04-23 09:32:02 +01:00
Tomasz Grabiec	381bf02f55	cache: Evict with row granularity Instead of evicting whole partitions, evict whole rows. As part of this, invalidation of partition entries was changed to not evict from snapshots right away, but unlink them and let them be evicted by the reclaimer.	2018-03-06 11:50:29 +01:00
Tomasz Grabiec	5320705300	cache: Propagate cache_tracker to places manipulating evictable entries cache_tracker reference will be needed to link/unlink row entries. No change of behavior in this patch.	2018-03-06 11:50:27 +01:00
Tomasz Grabiec	3dc9000c51	mutation_partition: Introduce rows_entry::is_last_dummy() Will be needed by row evictor, which needs to treat last dummies specially (not evict them).	2018-03-06 11:50:26 +01:00
Tomasz Grabiec	d9a38c1c85	mutation_partition: Add API to walk from rows_entry to cache_entry Will be needed on row eviction, to unlink containers when they become fully evicted.	2018-03-06 11:50:26 +01:00
Tomasz Grabiec	9893e8e5f7	mvcc: Make each version have independent continuity This change is a preparation for introducing row-level eviction, such that entries can be evicted from older versions without having to touch other versions. Currently continuity flags on entries are interpreted relative to the combined view merged from all entries. For example: v2: <key=2, cont=1> v1: <key=1, cont=1> In v2, the flag on entry key=2 marks the range (1, 2) as continuous. This is problematic because if the old version is evicted, continuity will change in an incorrect way: v2: <key=2, cont=1> Here, the range (-inf, 1) would be marked as continuous, which is not true. To solve this problem, we change the rules for continuity interpretation in MVCC. Each version will have its own continuity, fully specified in that version, independent of continuity of other versions. Continuity of the snapshot will be a union of continuous ranges in each version. It is assumed that continuous intervals in different versions are non-overlapping, except for points corresponding to complete rows, in which case a later version may overlap with an older version (overwrite). We make use of this assumption to make calculation of the union of intervals on merging easier. I make use of the above assumption in mutation_partition::apply_monotonically(). MVCC population of incomplete entries already almost maintains the non-overlapping invariant, because population intervals correspond to intervals which are incomplete in the old snapshot. The only change needed is to ensure that both population bounds will have entries in the latest version. Population from memtables doesn't mark any intervals as continuous, so also conforms. The only change needed there is to not inherit continuity flags from the old snapshot, effectively making the new version internally discontinuous except for row points. 
The example from the beginning will become: v2: <key=1, cont=0> <key=2, cont=1> v1: <key=1, cont=1> When marking a range as continuous with some rows present only in older versions, we need to insert entries in the latest version, so that we can mark the range as continuous. The easiest solution is to copy the entry from the old version. Another option would be to add support for incomplete rows and insert such instead. This way we would avoid duplicating row contents. This optimization is deferred.	2018-03-06 11:50:25 +01:00
Duarte Nunes	99a3e3aa76	mutation_partition: Allow caching cell hashes We add storage to a row to hold the cached hashes of each individual cell. We don't store the hash in each cell because that would a) change the cell equality function, and b) require us to change a cell in a potentially fragmented buffer. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-02-01 01:02:47 +00:00
Duarte Nunes	71ba99d53e	mutation_partition: Force vector_storage internal storage size This patch forces the size of vector_storage's internal storage to 5, meaning that the underlying managed_vector will ensure it doesn't need to externally allocate a buffer to hold the row, if only its first 5 cells are set. We define this size explicitly so we can change the vector's value type in upcoming patches without affecting the optimization. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-02-01 00:22:51 +00:00
Tomasz Grabiec	da0c48a987	mutation_partition: Add rows_entry::set_dummy()	2018-01-18 11:32:49 +01:00
Duarte Nunes	83e983d4d0	mutation_partition: Remove unused operator==() Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20180115013546.67260-1-duarte@scylladb.com>	2018-01-15 11:16:35 +02:00
Duarte Nunes	9d1d9883ff	mutation_partition: Remove unused for_each_cell() overload Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20180115013618.67351-1-duarte@scylladb.com>	2018-01-15 11:16:34 +02:00
Tomasz Grabiec	8e8ece5dec	mutation_partition: Introduce deletable_row::apply() from a clustering_row fragment	2017-12-08 17:50:47 +01:00
Tomasz Grabiec	b3709047b0	mutation_partition: Extract sliced() from mutation into mutation_partition So that we can call it on mutation_partition.	2017-12-08 17:50:47 +01:00
Tomasz Grabiec	bde050835f	mutation_partition: Make check_continuity() const-qualified	2017-12-08 12:01:27 +01:00
Tomasz Grabiec	f9257886cb	mutation_partition: Make check_continuity() public	2017-12-08 12:01:27 +01:00
Tomasz Grabiec	865bd8a594	mutation_partition: Introduce mutation_partition::get_continuity() Intended to be used in tests.	2017-12-08 12:01:27 +01:00
Tomasz Grabiec	22138554e6	mutation_partition: Leave moved-from row in an empty state Needed by apply_monotonically(). Fixes SIGSEGV in mutation_test_g.	2017-12-08 12:01:27 +01:00
Tomasz Grabiec	70e14f78a7	mutation_partition: Drop apply_reversibly()	2017-11-28 13:03:06 +01:00
Tomasz Grabiec	091e10fc70	mutation_partition: Relax exception guarantees of apply() The uses which needed strong or weak exception guarantees were switched to a solution involving apply_monotonically(). All remaining uses don't need any exception guarantees.	2017-11-28 13:03:06 +01:00
Tomasz Grabiec	988d3c67b4	mutation_partition: Introduce apply_weak() Intended to be used by code which doesn't need any exception guarantees. Currently just delegates to apply_monotonically().	2017-11-28 13:03:03 +01:00
Tomasz Grabiec	97ebf51d3a	mutation_partition: Introduce apply_monotonically() Has weaker exception guarantees than apply(), which allows for simpler implementation. Intended to replace the apply() with strong exception guarantees.	2017-11-28 12:28:51 +01:00
Tomasz Grabiec	978b874065	mutation_partition: Introduce row::consume_with()	2017-11-28 11:20:03 +01:00
Glauber Costa	d49ecae201	mutation_partition: estimate size of partition In the memtable flusher, we account for the size of a partition as we read it. However, there are other points in the architecture where we would like to calculate the size of a partition at a point at which we are not reading it. One such example is the cache update process. This patch enhances the mutation_partition adding a method that returns the total size for this partition. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2017-11-08 16:21:44 -05:00
Tomasz Grabiec	749f5770df	mutation: Introduce apply(mutation_fragment)	2017-11-02 12:16:17 +01:00
Tomasz Grabiec	72028bb048	mutation_partition: Allow creating rows_entry at any clustered position_in_partition In preparation for supporting setting continuity of arbitrary clustering range.	2017-11-02 11:05:19 +01:00
Tomasz Grabiec	409adc045a	mutation_partition: Remove delegating_compare() It can't work with rows_entry at any position_in_partition, so we need to drop it.	2017-11-02 11:05:19 +01:00
Tomasz Grabiec	455a1b0d24	mutation_partition: Introduce range continuity checking methods	2017-09-13 17:47:04 +02:00
Tomasz Grabiec	abc489e99d	mutation_partition: Enable rows_entry::compare() on position_in_partition_views For full symmetry with existing overloads.	2017-09-13 17:47:04 +02:00
Tomasz Grabiec	b6ae5783cd	mvcc: Introduce partition_entry::evict() The operation frees as much memory as possible, marking affected mutation elements as discontinuous.	2017-09-13 17:47:03 +02:00
Paweł Dziepak	43cce6c2f4	rows_entry: make position() inlineable	2017-07-26 14:38:27 +01:00
Tomasz Grabiec	0770845a23	mutation_partition: Introduce r-value accepting deletable_row::apply()	2017-06-24 18:06:11 +02:00
Piotr Jastrzebski	efc75b0bc3	mutation_partition: Add rows_entry constructor which accepts full contents [tgrabiec: Extracted from different patch]	2017-06-24 18:06:11 +02:00
Tomasz Grabiec	dce293e11c	tests: row_cache: Apply only fully continuous mutations to underlying mutation source Cache currently assumes that mutations coming from outside are fully continuous.	2017-06-24 18:06:11 +02:00
Piotr Jastrzebski	05b56fcfb0	mutation_partition: Add support for specifying continuity This will allow expressing lack of information about certain ranges of rows (including the static row), which will be used in cache to determine if information in cache is complete or not. Continuity is represented internally using flags on row entries. The key range between two consecutive entries is continuous iff rows_entry::continuous() is true for the later entry. The range starting after the last entry is assumed to be continuous. The range corresponding to the key of the entry is continuous iff rows_entry::dummy() is false. [tgrabiec: - based on the following commits: 4a5bf75 - Piotr Jastrzebski : mutation_partition: introduce dummy rows_entry 773070e - Piotr Jastrzebski : mutation_partition: add continuity flag to rows_entry - documented that partition tombstone is always complete - require specifying the partition tombstone when creating an incomplete entry - replaced rows_entry(dummy_tag, ...) constructor with more general rows_entry(position_in_partition, ...) - documented continuity semantics on mutation_partition - fixed _static_row_cached being lost by mutation_partition copy constructors - fixed conversion to streamed_mutation to ignore dummy entries - fixed mutation_partition serializer to drop dummy entries - documented semantics of continuity on mutation_partition level - dropped assumptions that dummy entries can be only at the last position - changed equality to ignore continuity completely, rather than partially (it was not ignoring dummy entries, but ignoring continuity flag) - added printout of continuity information in mutation_partition - fixed handling of empty entries in apply_reversibly() with regards to continuity; we no longer can remove empty entries before merging, since that may affect continuity of the right-hand mutation. Added _erased flag. 
- fixed mutation_partition::clustered_row() with dummy==true to not ignore the key - fixed partition_builder to not ignore continuity - renamed dummy_tag_t to dummy_tag. _t suffix is reserved. - standardized all APIs on is_dummy and is_continuous bool_class:es - replaced add_dummy_entry() with ensure_last_dummy() with safer semantics - dropped unused remove_dummy_entry() - simplified and inlined cache_entry::add_dummy_entry() - fixed mutation_partition(incomplete_tag) constructor to mark all row ranges as discontinuous ]	2017-06-24 18:06:11 +02:00
Tomasz Grabiec	a77734952d	mutation_partition: Make rows_entry comparable with position_in_partition	2017-06-24 18:06:11 +02:00
Piotr Jastrzebski	65b3123516	mutation_partition: Use rows_entry::position() in comparators key() will not be valid for dummy entries, but position() is always valid. [tgrabiec: Extracted from other commits] [tgrabiec: Added missing change to range_tombstone_stream::get_next]	2017-06-24 18:06:11 +02:00
Tomasz Grabiec	660f3127a6	mutation_partition: Introduce rows_entry::position() In preparation for enabling dummy entries with position past all clustering rows.	2017-06-24 18:06:11 +02:00
Gleb Natapov	f5679e0416	database: remove remnants of no longer existing db::serializer. Message-Id: <20170604100552.GD8248@scylladb.com>	2017-06-04 13:07:17 +03:00
Duarte Nunes	9e88b60ef5	mutation: Set cell using clustering_key_prefix Change the clustering key argument in mutation::set_cell from exploded_clustering_prefix to clustering_key_prefix, which allows for some overall code simplification and fewer copies. This mostly affects the cql3 layer. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-05-04 15:59:50 +02:00
Duarte Nunes	db63ffdbb4	mutation_partition: Harmonize apply_delete overloads This patch ensures the different mutation_partition::apply_delete() overloads behave similarly, so that, for example, an empty clustering key is treated the same way as an empty exploded_clustering_key_prefix. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-05-04 15:59:50 +02:00
Duarte Nunes	4e693383f7	mutation_partition: Use row_tombstone This patch replaces the current row tombstone representation by a row_tombstone. The intent of the patch is thus to reify the idea of shadowable tombstones, that up until now we considered all materialized view row tombstones to be. We need to distinguish shadowable from non-shadowable row tombstones to support scenarios such as, when inserting to a table with a materialized view: 1. insert into base (p, v1, v2) values (3, 1, 3) using timestamp 1 2. delete from base using timestamp 2 where p = 3 3. insert into base (p, v1) values (3, 1) using timestamp 3 These should yield a view row where v2 is definitely null, but with the current implementation, v2 will pop back with its value v2=3@TS=1, even though it's dead in the base row. This is because the row tombstone inserted at 2) is a shadowable one. This patch only addresses the memory representation of such row_tombstones. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-04-25 11:46:33 +02:00
Duarte Nunes	6a2bccd4ae	mutation_partition: Introduce row_tombstone This patch introduces the row_tombstone class, which represents a tombstone made up of a regular tombstone and a shadowable one. The rules for row_tombstones are as follows: - The shadowable tombstone is always >= the regular one; - The regular tombstone works as expected; - The shadowable tombstone doesn't erase or compact away the regular row tombstone, nor dead cells; - The shadowable tombstone can erase live cells, but only provided they can be recovered (e.g., by including all cells in a MV update, both updated cells and pre-existing ones); - The shadowable tombstone can be erased or compacted away by a newer row marker. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-04-25 11:46:28 +02:00
Duarte Nunes	3d49c1da01	mutation_partition: Introduce shadowable tombstones A shadowable tombstone is a tombstone that can be replaced by a smaller one if provided a row_marker with a bigger timestamp than the shadowable tombstone. In the context of a row, it is only valid as long as no newer insert is done (thus setting a live row marker; note that if the row timestamp set is lower than the tombstone's, then the tombstone remains in effect as usual). If a row has a shadowable tombstone with timestamp Ti and that row is updated with a timestamp Tj, such that Tj > Ti (and that update sets the row marker), then the shadowable tombstone is shadowed by that update. A concrete consequence is that if the update has cells with timestamp lower than Ti, then those cells are preserved (since the deletion is removed), and this is contrary to a regular, non-shadowable row tombstone where the tombstone is preserved and such cells are removed. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-04-25 11:46:22 +02:00
Duarte Nunes	392403b5b3	row_marker: Mark constructors explicit Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-04-25 11:43:04 +02:00
Paweł Dziepak	582d397c41	introduce counter_write_query() Counter write path involves read-modify-write. That read is guaranteed to query only a single partition, does not care about dead cells and expects to receive an unserialized mutation as a result. Standard mutation queries are able to produce results fit for counter updates, but the logic involved is much more general (i.e. slower), hence the addition of new, counter-specific kind of query.	2017-03-01 16:33:36 +00:00
Duarte Nunes	7e150a18eb	mutation_partition: Introduce shadowable tombstone This patch introduces shadowable row tombstones. A shadowable row tombstone is valid only if the row has no live marker. In other words, the row tombstone is only valid as long as no newer insert is done (thus setting a live row marker; note that if the row timestamp set is lower than the tombstone's, then the tombstone remains in effect as usual). If a row has a shadowable tombstone with timestamp Ti and that row is updated with a timestamp Tj, such that Tj > Ti (and that update sets the row marker), then the shadowable tombstone is shadowed by that update. A concrete consequence is that if the update has cells with timestamp lower than Ti, then those cells are preserved (since the deletion is removed), and this is contrary to a regular, non-shadowable row tombstone where the tombstone is preserved and such cells are removed. Currently, only Materialized Views require shadowable row tombstones, which solve a problem with view row deletions. Consider a base row with columns p, v1, v2, PRIMARY KEY (p) denormalized into a view row consisting of columns p, v1, v2 PRIMARY KEY (p, v1), and the following operations: 1) INSERT INTO base (p, v1, v2) VALUES (0, 0, 1) USING TIMESTAMP 0; 2) UPDATE base SET v1 = 1 USING TIMESTAMP 1 WHERE p = 0; 3) UPDATE base SET v1 = 0 USING TIMESTAMP 2 WHERE p = 0; Without shadowable tombstones, the view contains: At 1), pk = (0, 0), row_marker@T0, v2=1@T0 At 2), pk = (0, 0), row_marker@T0, row_tombstone@T1, v2=1@T0 pk = (0, 1), row_marker@T1, v2=1@T0 At 3), pk = (0, 0), row_marker@T2, row_tombstone@T1, v2=1@T0 pk = (0, 1), row_marker@T1, row_tombstone@T2, v2=1@T0 Notice how, if we read row (0, 0), the value of v2 will be shadowed by the row tombstone we previously inserted. 
With a view's row tombstone becoming shadowable, at 3) the row (0, 0) will look like pk = (0, 0), row_marker@T2, shadowable_tombstone@T1, v2=1@T0, which is equivalent to pk = (0, 0), row_marker@T2, v2=1@T0. Since the shadowable tombstone is shadowed by the new row marker (T0 < T2), now v2 would be taken into account. Finally, note that this patch doesn't generalize the idea of shadowable tombstone, instead taking advantage of the fact that they are only needed by Materialized Views. This saves changing the tombstone representation to account for an extra flag, the bits such representation would require, and also avoids changes to the storage format. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-02-06 13:36:45 +01:00
Paweł Dziepak	b6564651e4	mutation_partition: make for_each_cell() accessible outside source file for_each_cell() const already can be used from any place in the code, allow the same with non-const version.	2017-02-02 10:35:14 +00:00


164 Commits
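
The continuity rules stated in commit 05b56fcfb0 (the range between two consecutive entries is continuous iff the later entry's flag is set, the range after the last entry is assumed continuous, and an entry's own key is continuous iff the entry is not a dummy) can be sketched as a lookup over a sorted list. The following Python is a toy model under stated assumptions: `ToyEntry`, `is_continuous_at`, and integer keys are hypothetical and stand in for `rows_entry` and `position_in_partition`; it covers a single partition version only and does not model the per-version union introduced in commit 9893e8e5f7.

```python
from bisect import bisect_left
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ToyEntry:
    """Hypothetical stand-in for rows_entry, with integer keys for brevity."""
    key: int
    continuous: bool     # covers the open range between the previous entry and this one
    dummy: bool = False  # a dummy entry carries no row at its own key


def is_continuous_at(entries: List[ToyEntry], key: int) -> bool:
    """Apply the three rules to one key; `entries` must be sorted by key."""
    index = bisect_left([e.key for e in entries], key)
    if index == len(entries):
        return True             # past the last entry: assumed continuous
    entry = entries[index]
    if entry.key == key:
        return not entry.dummy  # the entry's own key
    return entry.continuous     # gap leading up to the next entry


partition = [
    ToyEntry(10, continuous=False),
    ToyEntry(20, continuous=True),
    ToyEntry(30, continuous=False, dummy=True),
]

for probe in (5, 10, 15, 25, 30, 35):
    print(probe, is_continuous_at(partition, probe))
```

Here key 15 falls in the gap before the entry at 20, whose flag is set, so it counts as continuous; key 25 falls before the dummy at 30, whose flag is clear, so it does not.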