scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-27 11:55:15 +00:00

Author	SHA1	Message	Date
Botond Dénes	ff808d9ce6	Save and restore queriers in mutation_query() and data_query() Use the querier_cache (represented by the passed-in querier_cache_context) object to lookup saved queriers at the start of the page and save them at the end of it if it is likely that there will be more page requests.	2018-03-13 10:34:34 +02:00
Tomasz Grabiec	da901b93fc	cache: Track number of rows and row invalidations	2018-03-06 11:50:29 +01:00
Tomasz Grabiec	381bf02f55	cache: Evict with row granularity Instead of evicting whole partitions, evicts whole rows. As part of this, invalidation of partition entries was changed to not evict from snapshots right away, but unlink them and let them be evicted by the reclaimer.	2018-03-06 11:50:29 +01:00
Tomasz Grabiec	ab407d99cc	mvcc: Store complete rows in each version in evictable entries For row-level eviction we need to ensure that each version has complete rows so that eviction from older versions doesn't affect the value of the row in newer snapshots. This is achieved by copying the row from an older version before applying the increment in the new version. Only affects evictable entries, memtables are not affected.	2018-03-06 11:50:28 +01:00
Tomasz Grabiec	bee875fa7d	cache: Ensure all evictable partition_versions have a dummy after all rows Every evictable version will have a dummy entry at the end so that it can be tracked in the LRU. It is also needed to allow old versions to stay around (with tombstones and static rows) after all rows are evicted. Such versions must be fully discontinuous, and we need some entry to mark that.	2018-03-06 11:50:27 +01:00
Tomasz Grabiec	5320705300	cache: Propagate cache_tracker to places manipulating evictable entries cache_tracker reference will be needed to link/unlink row entries. No change of behavior in this patch.	2018-03-06 11:50:27 +01:00
Tomasz Grabiec	3dc9000c51	mutation_partition: Introduce rows_entry::is_last_dummy() Will be needed by row evictor, which needs to treat last dummies specially (not evict them).	2018-03-06 11:50:26 +01:00
Tomasz Grabiec	9893e8e5f7	mvcc: Make each version have independent continuity This change is a preparation for introducing row-level eviction, such that entries can be evicted from older versions without having to touch other versions. Currently continuity flags on entries are interpreted relative to the combined view merged from all entries. For example: v2: <key=2, cont=1> v1: <key=1, cont=1> In v2, the flag on entry key=2 marks the range (1, 2) as continuous. This is problematic because if the old version is evicted, continuity will change in an incorrect way: v2: <key=2, cont=1> Here, the range (-inf, 1) would be marked as continuous, which is not true. To solve this problem, we change the rules for continuity interpretation in MVCC. Each version will have its own continuity, fully specified in that version, independent of continuity of other versions. Continuity of the snapshot will be a union of continuous ranges in each version. It is assumed that continuous intervals in different versions are non-overlapping, except for points corresponding to complete rows, in which case a later version may overlap with an older version (overwrite). We make use of this assumption to make calculation of the union of intervals on merging easier. I make use of the above assumption in mutation_partition::apply_monotonically(). MVCC population of incomplete entries already almost maintains the non-overlapping invariant, because population intervals correspond to intervals which are incomplete in the old snapshot. The only change needed is to ensure that both population bounds will have entries in the latest version. Population from memtables doesn't mark any intervals as continuous, so also conforms. The only change needed there is to not inherit continuity flags from the old snapshot, effectively making the new version internally discontinuous except for row points. The example from the beginning will become: v2: <key=1, cont=0> <key=2, cont=1> v1: <key=1, cont=1> When marking a range as continuous with some rows present only in older versions, we need to insert entries in the latest version, so that we can mark the range as continuous. The easiest solution is to copy the entry from the old version. Another option would be to add support for incomplete rows and insert such instead. This way we would avoid duplicating row contents. This optimization is deferred.	2018-03-06 11:50:25 +01:00
Duarte Nunes	42f407ad9e	row: Use cached hash for hash calculation This entails doing the cell hash calculation slightly differently, where the cell is hashed individually, the resulting hash being added to the running one. Instead of propagating a flag all through the call chain, we detect whether we are in the new mode by the employed hash algorithm. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-02-01 01:02:49 +00:00
Duarte Nunes	d773e4b9d4	mutation_partition: Replace hash_row_slice with appending_hash This enables us to only branch once per row on the actual hash algorithm, instead of once per row data item. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-02-01 01:02:49 +00:00
Duarte Nunes	99a3e3aa76	mutation_partition: Allow caching cell hashes We add storage to a row to hold the cached hashes of each individual cell. We don't store the hash in each cell because that would a) change the cell equality function, and b) require us to change a cell in a potentially fragmented buffer. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-02-01 01:02:47 +00:00
Duarte Nunes	b2e1a91f4d	query-result: Use digester instead of md5_hasher Use the digester class instead of md5_hasher to encapsulate the decision of which hash algorithm to use. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-02-01 00:22:50 +00:00
Piotr Jastrzebski	96c97ad1db	Rename streamed_mutation* files to mutation_fragment* Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-01-24 20:56:49 +01:00
Avi Kivity	c743d1258d	Merge "Reverse order of version merging in MVCC" from Tomasz "Changes merging in MVCC to apply newer version to older instead of older to newer. Before (v0 = oldest): (((v3 + v2) + v1) + v0) After: (v0 + (v1 + (v2 + v3))) or: (((v0 + v1) + v2) + v3) There are several reasons to do this: 1) When continuity merging will change semantics to support eviction from older versions, it will be easier to implement apply() if we can assume that we merge newer to older instead of older to newer, since newer version may have entries falling into a continuous interval in older, but not the other way around. If we didn't revert the order, apply() would have to keep track of lower bound of a continuous interval in the right-hand side argument (older version) as it is applied and update continuity flags in the left hand side by scanning all entries overlapping with it. If order is reversed, merging only needs to deal with the current entry. Also, if we were to keep the old order, we cannot simply move entries from the left hand side as we merge because we need to keep track of the lower bound of a continuous interval, and we need to provide monotonic exception guarantees. So merging would be both more complicated and slower. 2) With large partitions older versions are typically larger than newer versions, and since merging is O(N_right*(1 + log(N_left))), it's better to merge newer into older. This fixes latency spikes seen in perf_cache_eviction. Fixes #2715." tag 'tgrabiec/reverse-order-of-mvcc-version-merging-v1' of github.com:scylladb/seastar-dev: mvcc: Reverse order of version merging anchorless_list: Introduce last() mvcc: Implement partition_entry::upgrade() using squashed() mvcc: Extract version merging functions mutation_partition: Add rows_entry::set_dummy() position_in_partition: Introduce after_key()	2018-01-21 13:56:57 +02:00
José Guilherme Vanz	380bc0aa0d	Swap arguments order of mutation constructor Swap arguments in the mutation constructor keeping the same standard from the constructor variants. Refs #3084 Signed-off-by: José Guilherme Vanz <guilherme.sft@gmail.com> Message-Id: <20180120000154.3823-1-guilherme.sft@gmail.com>	2018-01-21 12:58:42 +02:00
Piotr Jastrzebski	d266eaa01e	mutation_source: rename make_flat_mutation_reader to make_reader Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-01-19 09:30:12 +01:00
Tomasz Grabiec	60d3c25c02	mvcc: Reverse order of version merging Change merging to apply newer version to older instead of older to newer. Before: (((v3 + v2) + v1) + v0) After: (v0 + (v1 + (v2 + v3))) or equivalent: (((v0 + v1) + v2) + v3) There are several reasons to do this: 1) When continuity merging will change semantics to support eviction from older versions, it will be easier to implement apply() if we can assume that we merge newer to older instead of older to newer, since newer version may have entries falling into a continuous interval in older, but not the other way around. If we didn't revert the order, apply() would have to keep track of lower bound of a continuous interval in the right-hand side argument (older version) as it is applied and update continuity flags in the left hand side by scanning all entries overlapping with it. If order is reversed, merging only needs to deal with the current entry. Also, if we were to keep the old order, we cannot simply move entries from the left hand side as we merge because we need to keep track of the lower bound of a continuous interval, and we need to provide monotonic exception guarantees. So merging would be both more complicated and slower. 2) With large partitions older versions are typically larger than newer versions, and since merging is O(N_right*(1 + log(N_left))), it's better to merge newer into older. Fixes #2715.	2018-01-18 13:52:08 +01:00
Duarte Nunes	83e983d4d0	mutation_partition: Remove unused operator==() Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20180115013546.67260-1-duarte@scylladb.com>	2018-01-15 11:16:35 +02:00
Duarte Nunes	9d1d9883ff	mutation_partition: Remove unused for_each_cell() overload Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20180115013618.67351-1-duarte@scylladb.com>	2018-01-15 11:16:34 +02:00
Glauber Costa	54d3ebde4e	flat_mutation_reader: pass timeout down to consume() We pass the timeout that we received from data_query/mutation_query down to consume, which is responsible for actually reading the data. To make those timeouts actionable, though, we'll have to patch fill_buffer(). This will happen in the next patch. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-01-11 12:07:41 -05:00
Glauber Costa	8433702c90	mutation_query: add a timeout to the mutation query path data_query and mutation_query are patched so that they start accepting a per-query timeout. We will default to no timeout, and then no callers will be changed yet. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-01-11 12:07:41 -05:00
Tomasz Grabiec	8e8ece5dec	mutation_partition: Introduce deletable_row::apply() from a clustering_row fragment	2017-12-08 17:50:47 +01:00
Tomasz Grabiec	b3709047b0	mutation_partition: Extract sliced() from mutation into mutation_partition So that we can call it on mutation_partition.	2017-12-08 17:50:47 +01:00
Tomasz Grabiec	5541c9fd63	mutation_partition: Define equal_continuity() using get_continuity() This fixes the problem of equal_continuity() being prone to false positives due to redundant information (extra dummy rows) present in one of the partitions. get_continuity() is minified, so is not prone to this.	2017-12-08 12:01:27 +01:00
Tomasz Grabiec	bde050835f	mutation_partition: Make check_continuity() const-qualified	2017-12-08 12:01:27 +01:00
Tomasz Grabiec	865bd8a594	mutation_partition: Introduce mutation_partition::get_continuity() Intended to be used in tests.	2017-12-08 12:01:27 +01:00
Tomasz Grabiec	22138554e6	mutation_partition: Leave moved-from row in an empty state Needed by apply_monotonically(). Fixes SIGSEGV in mutation_test_g.	2017-12-08 12:01:27 +01:00
Tomasz Grabiec	a305a28574	mutation_partition: Fix upgrade() not preserving static row continuity We do not rely on this yet, but will.	2017-12-08 12:01:27 +01:00
Tomasz Grabiec	05a6c67804	mutation_partition: Don't print absent elements Makes printout shorter and thus easier to parse.	2017-12-01 10:52:37 +01:00
Tomasz Grabiec	d8b54a57aa	mutation_partition: Make row_marker printout similar to other partition elements	2017-12-01 10:52:37 +01:00
Tomasz Grabiec	fd7ab5fe99	database: Move operator<<() overloads to appropriate source files	2017-12-01 10:52:37 +01:00
Tomasz Grabiec	7bde3090b4	mutation_partition: Use multi-line printout Convert to a multi line output, which is easier to read for a human. After: {ks.cf key {key: pk{000c706b30303030303030303030}, token:-2018791535786252460} data {mutation_partition: {tombstone: none}, range_tombstones: {}, static: cont=1 {row: }, clustered: { {rows_entry: cont=true dummy=false {position: clustered,ckp{000c636b30303030303030303030},0} {deletable_row: {row: }}}, {rows_entry: cont=true dummy=true {position: clustered,ckp{000c636b30303030303030303031},0} {deletable_row: {row: }}}}}}	2017-12-01 10:52:37 +01:00
Tomasz Grabiec	70e14f78a7	mutation_partition: Drop apply_reversibly()	2017-11-28 13:03:06 +01:00
Tomasz Grabiec	091e10fc70	mutation_partition: Relax exception guarantees of apply() The uses which needed strong or weak exception guarantees were switched to a solution involving apply_monotonically(). All remaining uses don't need any exception guarantees.	2017-11-28 13:03:06 +01:00
Tomasz Grabiec	988d3c67b4	mutation_partition: Introduce apply_weak() Intended to be used by code which doesn't need any exception guarantees. Currently just delegates to apply_monotonically().	2017-11-28 13:03:03 +01:00
Tomasz Grabiec	97ebf51d3a	mutation_partition: Introduce apply_monotonically() Has weaker exception guarantees than apply(), which allows for simpler implementation. Intended to replace the apply() with strong exception guarantees.	2017-11-28 12:28:51 +01:00
Tomasz Grabiec	978b874065	mutation_partition: Introduce row::consume_with()	2017-11-28 11:20:03 +01:00
Paweł Dziepak	48c3db54c9	mutation_partition: convert queries to flat_mutation_readers	2017-11-21 11:37:04 +00:00
Glauber Costa	d49ecae201	mutation_partition: estimate size of partition In the memtable flusher, we account for the size of a partition as we read them. However, there are other points in the architecture where we would like to calculate the size of a partition in a point in which we are not reading it. One such example is the cache update process. This patch enhances the mutation_partition adding a method that returns the total size for this partition. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2017-11-08 16:21:44 -05:00
Tomasz Grabiec	ca3e72266f	mutation_partition: Fix abort in case range tombstone copying fails If exception is thrown from _row_tombstones.apply(), _rows will be left uncleared. This will trigger assertion in bi::set_member_hook destructor, which asserts that the hook is not linked. Always clear _rows.	2017-11-07 15:33:24 +01:00
Tomasz Grabiec	749f5770df	mutation: Introduce apply(mutation_fragment)	2017-11-02 12:16:17 +01:00
Tomasz Grabiec	72028bb048	mutation_partition: Allow creating rows_entry at any clustered position_in_partition In preparation for supporting setting continuity of arbitrary clustering range.	2017-11-02 11:05:19 +01:00
Tomasz Grabiec	409adc045a	mutation_partition: Remove delegating_compare() It can't work with rows_entry at any position_in_partition, so we need to drop it.	2017-11-02 11:05:19 +01:00
Tomasz Grabiec	65ca8eebb8	mutation_partition: Print rows_entry's position instead of key For dummy rows, _key doesn't reflect the right position. Message-Id: <1505317040-6783-1-git-send-email-tgrabiec@scylladb.com>	2017-09-13 20:49:28 +03:00
Tomasz Grabiec	455a1b0d24	mutation_partition: Introduce range continuity checking methods	2017-09-13 17:47:04 +02:00
Tomasz Grabiec	b6ae5783cd	mvcc: Introduce partition_entry::evict() The operation frees as much memory as possible, marking affected mutation elements as discontinuous.	2017-09-13 17:47:03 +02:00
Duarte Nunes	c7aa3ea069	mutation_partition: Remove obsolete short read detection When compacting a partition for querying we would read an extra row, to include any tombstones between that one and the previous row. This is no longer needed since we have a general mechanism to detect short reads in the storage_proxy. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20170811103031.22866-1-duarte@scylladb.com>	2017-08-15 12:01:55 +01:00
Paweł Dziepak	43cce6c2f4	rows_entry: make position() inlineable	2017-07-26 14:38:27 +01:00
Tomasz Grabiec	136d205855	mutation_partition: Always mark static row as continuous when no static columns To avoid unnecessary cache misses after static columns are added. Message-Id: <1500650057-26036-1-git-send-email-tgrabiec@scylladb.com>	2017-07-24 10:23:35 +03:00
Tomasz Grabiec	0770845a23	mutation_partition: Introduce r-value accepting deletable_row::apply()	2017-06-24 18:06:11 +02:00
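
The per-version continuity rule described in commit 9893e8e5f7 above can be illustrated with a small sketch. This is not ScyllaDB code: the list-of-tuples model of a version and the helper names `version_intervals` and `snapshot_continuity` are assumptions made for illustration only. Each entry is a `(key, cont)` pair, and `cont` is taken to mean that the range from the previous entry in the same version (or -inf) up to this key is continuous.

```python
import math

def version_intervals(entries):
    """Intervals (lo, hi] that one version marks as continuous.

    Hypothetical model: cont=True on an entry covers the range between
    the previous entry of the SAME version (or -inf) and this key.
    """
    intervals = []
    prev = -math.inf
    for key, cont in sorted(entries):
        if cont:
            intervals.append((prev, key))
        prev = key
    return intervals

def snapshot_continuity(versions):
    """Union of the continuous intervals of every version in a snapshot."""
    merged = []
    all_intervals = sorted(iv for v in versions for iv in version_intervals(v))
    for lo, hi in all_intervals:
        if merged and lo <= merged[-1][1]:
            # Touching or overlapping (lo, hi] intervals collapse into one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged

v1 = [(1, True)]                  # v1: <key=1, cont=1>
v2_new = [(1, False), (2, True)]  # v2: <key=1, cont=0> <key=2, cont=1>
v2_old = [(2, True)]              # v2 under the old encoding: <key=2, cont=1>

print(snapshot_continuity([v2_new, v1]))  # both versions present
print(snapshot_continuity([v2_new]))      # after v1 is evicted
print(snapshot_continuity([v2_old]))      # old encoding read on its own
```

With both versions present, the union covers everything up to key 2. Once v1 is gone, the new encoding still claims only the range (1, 2], whereas the old single-entry v2, read on its own, would claim the whole range below key 2, which is the incorrect widening the commit message warns about.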

229 Commits