scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-26 19:35:12 +00:00

Author	SHA1	Message	Date
Duarte Nunes	83e983d4d0	mutation_partition: Remove unused operator==() Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20180115013546.67260-1-duarte@scylladb.com>	2018-01-15 11:16:35 +02:00
Duarte Nunes	9d1d9883ff	mutation_partition: Remove unused for_each_cell() overload Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20180115013618.67351-1-duarte@scylladb.com>	2018-01-15 11:16:34 +02:00
Glauber Costa	54d3ebde4e	flat_mutation_reader: pass timeout down to consume() We pass the timeout that we received from data_query/mutation_query down to consume, which is responsible for actually reading the data. To make those timeouts actionable, though, we'll have to patch fill_buffer(). This will happen in the next patch. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-01-11 12:07:41 -05:00
Glauber Costa	8433702c90	mutation_query: add a timeout to the mutation query path data_query and mutation_query are patched so that they start accepting a per-query timeout. We will default to no timeout, and then no callers will be changed yet. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-01-11 12:07:41 -05:00
Tomasz Grabiec	8e8ece5dec	mutation_partition: Introduce deletable_row::apply() from a clustering_row fragment	2017-12-08 17:50:47 +01:00
Tomasz Grabiec	b3709047b0	mutation_partition: Extract sliced() from mutation into mutation_partition So that we can call it on mutation_partition.	2017-12-08 17:50:47 +01:00
Tomasz Grabiec	5541c9fd63	mutation_partition: Define equal_continuity() using get_continuity() This fixes the problem of equal_continuity() being prone to false positives due to redundant information (extra dummy rows) present in one of the partitions. get_continuity() is minified, so is not prone to this.	2017-12-08 12:01:27 +01:00
Tomasz Grabiec	bde050835f	mutation_partition: Make check_continuity() const-qualified	2017-12-08 12:01:27 +01:00
Tomasz Grabiec	865bd8a594	mutation_partition: Introduce mutation_partition::get_continuity() Intended to be used in tests.	2017-12-08 12:01:27 +01:00
Tomasz Grabiec	22138554e6	mutation_partition: Leave moved-from row in an empty state Needed by apply_monotonically(). Fixes SIGSEGV in mutation_test_g.	2017-12-08 12:01:27 +01:00
Tomasz Grabiec	a305a28574	mutation_partition: Fix upgrade() not preserving static row continuity We do not rely on this yet, but will.	2017-12-08 12:01:27 +01:00
Tomasz Grabiec	05a6c67804	mutation_partition: Don't print absent elements Makes printout shorter and thus easier to parse.	2017-12-01 10:52:37 +01:00
Tomasz Grabiec	d8b54a57aa	mutation_partition: Make row_marker printout similar to other partition elements	2017-12-01 10:52:37 +01:00
Tomasz Grabiec	fd7ab5fe99	database: Move operator<<() overloads to appropriate source files	2017-12-01 10:52:37 +01:00
Tomasz Grabiec	7bde3090b4	mutation_partition: Use multi-line printout Convert to a multi line output, which is easier to read for a human. After: {ks.cf key {key: pk{000c706b30303030303030303030}, token:-2018791535786252460} data {mutation_partition: {tombstone: none}, range_tombstones: {}, static: cont=1 {row: }, clustered: { {rows_entry: cont=true dummy=false {position: clustered,ckp{000c636b30303030303030303030},0} {deletable_row: {row: }}}, {rows_entry: cont=true dummy=true {position: clustered,ckp{000c636b30303030303030303031},0} {deletable_row: {row: }}}}}}	2017-12-01 10:52:37 +01:00
Tomasz Grabiec	70e14f78a7	mutation_partition: Drop apply_reversibly()	2017-11-28 13:03:06 +01:00
Tomasz Grabiec	091e10fc70	mutation_partition: Relax exception guarantees of apply() The uses which needed strong or weak exception guarantees were switched to a solution involving apply_monotonically(). All remaining uses don't need any exception guarantees.	2017-11-28 13:03:06 +01:00
Tomasz Grabiec	988d3c67b4	mutation_partition: Introduce apply_weak() Intended to be used by code which doesn't need any exception guarantees. Currently just delegates to apply_monotonically().	2017-11-28 13:03:03 +01:00
Tomasz Grabiec	97ebf51d3a	mutation_partition: Introduce apply_monotonically() Has weaker exception guarantees than apply(), which allows for simpler implementation. Intended to replace the apply() with strong exception guarantees.	2017-11-28 12:28:51 +01:00
Tomasz Grabiec	978b874065	mutation_partition: Introduce row::consume_with()	2017-11-28 11:20:03 +01:00
Paweł Dziepak	48c3db54c9	mutation_partition: convert queries to flat_mutation_readers	2017-11-21 11:37:04 +00:00
Glauber Costa	d49ecae201	mutation_partition: estimate size of partition In the memtable flusher, we account for the size of a partition as we read them. However, there are other points in the architecture where we would like to calculate the size of a partition in a point in which we are not reading it. One such example is the cache update process. This patch enhances the mutation_partition adding a method that returns the total size for this partition. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2017-11-08 16:21:44 -05:00
Tomasz Grabiec	ca3e72266f	mutation_partition: Fix abort in case range tombstone copying fails If an exception is thrown from _row_tombstones.apply(), _rows will be left uncleared. This will trigger an assertion in the bi::set_member_hook destructor, which asserts that the hook is not linked. Always clear _rows.	2017-11-07 15:33:24 +01:00
Tomasz Grabiec	749f5770df	mutation: Introduce apply(mutation_fragment)	2017-11-02 12:16:17 +01:00
Tomasz Grabiec	72028bb048	mutation_partition: Allow creating rows_entry at any clustered position_in_partition In preparation for supporting setting continuity of arbitrary clustering range.	2017-11-02 11:05:19 +01:00
Tomasz Grabiec	409adc045a	mutation_partition: Remove delegating_compare() It can't work with rows_entry at any position_in_partition, so we need to drop it.	2017-11-02 11:05:19 +01:00
Tomasz Grabiec	65ca8eebb8	mutation_partition: Print rows_entry's position instead of key For dummy rows, _key doesn't reflect the right position. Message-Id: <1505317040-6783-1-git-send-email-tgrabiec@scylladb.com>	2017-09-13 20:49:28 +03:00
Tomasz Grabiec	455a1b0d24	mutation_partition: Introduce range continuity checking methods	2017-09-13 17:47:04 +02:00
Tomasz Grabiec	b6ae5783cd	mvcc: Introduce partition_entry::evict() The operation frees as much memory as possible, marking affected mutation elements as discontinuous.	2017-09-13 17:47:03 +02:00
Duarte Nunes	c7aa3ea069	mutation_partition: Remove obsolete short read detection When compacting a partition for querying we would read an extra row, to include any tombstones between that one and the previous row. This is no longer needed since we have a general mechanism to detect short reads in the storage_proxy. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20170811103031.22866-1-duarte@scylladb.com>	2017-08-15 12:01:55 +01:00
Paweł Dziepak	43cce6c2f4	rows_entry: make position() inlineable	2017-07-26 14:38:27 +01:00
Tomasz Grabiec	136d205855	mutation_partition: Always mark static row as continuous when no static columns To avoid unnecessary cache misses after static columns are added. Message-Id: <1500650057-26036-1-git-send-email-tgrabiec@scylladb.com>	2017-07-24 10:23:35 +03:00
Tomasz Grabiec	0770845a23	mutation_partition: Introduce r-value accepting deletable_row::apply()	2017-06-24 18:06:11 +02:00
Tomasz Grabiec	dce293e11c	tests: row_cache: Apply only fully continuous mutations to underlying mutation source Cache currently assumes that mutations coming from outside are fully continuous.	2017-06-24 18:06:11 +02:00
Piotr Jastrzebski	05b56fcfb0	mutation_partition: Add support for specifying continuity This will allow expressing lack of information about certain ranges of rows (including the static row), which will be used in cache to determine if information in cache is complete or not. Continuity is represented internally using flags on row entries. The key range between two consecutive entries is continuous iff rows_entry::continuous() is true for the later entry. The range starting after the last entry is assumed to be continuous. The range corresponding to the key of the entry is continuous iff rows_entry::dummy() is false. [tgrabiec: - based on the following commits: 4a5bf75 - Piotr Jastrzebski : mutation_partition: introduce dummy rows_entry 773070e - Piotr Jastrzebski : mutation_partition: add continuity flag to rows_entry - documented that partition tombstone is always complete - require specifying the partition tombstone when creating an incomplete entry - replaced rows_entry(dummy_tag, ...) constructor with more general rows_entry(position_in_partition, ...) - documented continuity semantics on mutation_partition - fixed _static_row_cached being lost by mutation_partition copy constructors - fixed conversion to streamed_mutation to ignore dummy entries - fixed mutation_partition serializer to drop dummy entries - documented semantics of continuity on mutation_partition level - dropped assumptions that dummy entries can be only at the last position - changed equality to ignore continuity completely, rather than partially (it was not ignoring dummy entries, but ignoring continuity flag) - added printout of continuity information in mutation_partition - fixed handling of empty entries in apply_reversibly() with regards to continuity; we no longer can remove empty entries before merging, since that may affect continuity of the right-hand mutation. Added _erased flag. 
- fixed mutation_partition::clustered_row() with dummy==true to not ignore the key - fixed partition_builder to not ignore continuity - renamed dummy_tag_t to dummy_tag. _t suffix is reserved. - standardized all APIs on is_dummy and is_continuous bool_class:es - replaced add_dummy_entry() with ensure_last_dummy() with safer semantics - dropped unused remove_dummy_entry() - simplified and inlined cache_entry::add_dummy_entry() - fixed mutation_partition(incomplete_tag) constructor to mark all row ranges as discontinuous ]	2017-06-24 18:06:11 +02:00
Piotr Jastrzebski	65b3123516	mutation_partition: Use rows_entry::position() in comparators key() will not be valid for dummy entries, but position() is always valid. [tgrabiec: Extracted from other commits] [tgrabiec: Added missing change to range_tombstone_stream::get_next]	2017-06-24 18:06:11 +02:00
Tomasz Grabiec	660f3127a6	mutation_partition: Introduce rows_entry::position() In preparation for enabling dummy entries with position past all clustering rows.	2017-06-24 18:06:11 +02:00
Paweł Dziepak	b2b78158f6	mutation_partition: restore formatting No functional change. Message-Id: <20170526104119.22075-2-pdziepak@scylladb.com>	2017-06-06 11:20:57 +03:00
Paweł Dziepak	d9dd798c4f	counter_write_query: avoid use-after-free on partition range Message-Id: <20170526104119.22075-1-pdziepak@scylladb.com>	2017-05-28 11:41:30 +03:00
Tomasz Grabiec	804f46f684	mutation: Make compare_*_for_merge() consistent with equals() equals() considers expiring cells to be different from non-expiring cells, but compare_row_marker_for_merge() considers them equal. Fix the latter to pick expiring cells. The choice was arbitrary.	2017-05-23 13:35:03 +02:00
Duarte Nunes	9e88b60ef5	mutation: Set cell using clustering_key_prefix Change the clustering key argument in mutation::set_cell from exploded_clustering_prefix to clustering_key_prefix, which allows for some overall code simplification and fewer copies. This mostly affects the cql3 layer. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-05-04 15:59:50 +02:00
Duarte Nunes	db63ffdbb4	mutation_partition: Harmonize apply_delete overloads This patch ensures the different mutation_partition::apply_delete() overloads behave similarly, so that, for example, an empty clustering key is treated the same way as an empty exploded_clustering_key_prefix. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-05-04 15:59:50 +02:00
Duarte Nunes	4e693383f7	mutation_partion: Use row_tombstone This patch replaces the current row tombstone representation by a row_tombstone. The intent of the patch is thus to reify the idea of shadowable tombstones, that up until now we considered all materialized view row tombstones to be. We need to distinguish shadowable from non-shadowable row tombstones to support scenarios such as, when inserting to a table with a materialized view: 1. insert into base (p, v1, v2) values (3, 1, 3) using timestamp 1 2. delete from base using timestamp 2 where p = 3 3. insert into base (p, v1) values (3, 1) using timestamp 3 These should yield a view row where v2 is definitely null, but with the current implementation, v2 will pop back with its value v2=3@TS=1, even though it's dead in the base row. This is because the row tombstone inserted at 2) is a shadowable one. This patch only addresses the memory representation of such row_tombstones. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-04-25 11:46:33 +02:00
Duarte Nunes	392403b5b3	row_marker: Mark constructors explicit Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-04-25 11:43:04 +02:00
Raphael S. Carvalho	a6f8f4fe24	compaction: do not write expired cell as dead cell if it can be purged right away When compacting a fully expired sstable, we're not allowing that sstable to be purged because expired cell is unconditionally converted into a dead cell. Why not check if the expired cell can be purged instead using gc before and max purgeable timestamp? Currently, we need two compactions to get rid of a fully expired sstable which cells could have always been purged. look at this sstable with expired cell: { "partition" : { "key" : [ "2" ], "position" : 0 }, "rows" : [ { "type" : "row", "position" : 120, "liveness_info" : { "tstamp" : "2017-04-09T17:07:12.702597Z", "ttl" : 20, "expires_at" : "2017-04-09T17:07:32Z", "expired" : true }, "cells" : [ { "name" : "country", "value" : "1" }, ] now this sstable data after first compaction: [shard 0] compaction - Compacted 1 sstables to [...]. 120 bytes to 79 (~65% of original) in 229ms = 0.000328997MB/s. { ... "rows" : [ { "type" : "row", "position" : 79, "cells" : [ { "name" : "country", "deletion_info" : { "local_delete_time" : "2017-04-09T17:07:12Z" }, "tstamp" : "2017-04-09T17:07:12.702597Z" }, ] now another compaction will actually get rid of data: compaction - Compacted 1 sstables to []. 79 bytes to 0 (~0% of original) in 1ms = 0MB/s. ~2 total partitions merged to 0 NOTE: It's a waste of time to wait for second compaction because the expired cell could have been purged at first compaction because it satisfied gc_before and max purgeable timestamp. Fixes #2249, #2253 Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20170413001049.9663-1-raphaelsc@scylladb.com>	2017-04-13 10:59:19 +03:00
Avi Kivity	27c42359bc	Merge seastar upstream * seastar 6b21197...2ebe842 (6): > Merge "Various improvements to execution stages" from Paweł > app-template: allow apps to specify a name for help message > bool_class: avoid initializing object of incomplete type > app-template: make sure we can still get help with required options > prometheus: Http handler that returns prometheus 0.4 protobuf or text format > Update DPDK to 17.02 Includes patch from Pawel to adjust to updated execution_stage interface.	2017-03-26 10:50:21 +03:00
Paweł Dziepak	a78501c206	mutation_query: add an execution stage	2017-03-09 09:27:43 +00:00
Avi Kivity	439b38f5ab	Merge "Improvements to counter implementation" from Paweł "This series adds various optimisations to counter implementation (nothing extreme, mostly just avoiding unnecessary operations) as well as some missing features such as tracing and dropping timed out queries. Performance was tested using: perf-simple-query -c4 --counters --duration 60 The following results are medians. before after diff write 18640.41 33156.81 +77.9% read 58002.32 62733.93 +8.2%" * tag 'pdziepak/optimise-counters/v3' of github.com:cloudius-systems/seastar-dev: (30 commits) cell_locker: add metrics for lock acquisition storage_proxy: count counter updates for which the node was a leader storage_proxy: use counter-specific timeout for writes storage_proxy: transform counter timeouts to mutation_write_timeout_exception db: avoid allocations in do_apply_counter_update() tests/counters: add test for apply reversability counters: attempt to apply in place atomic_cell: add COUNTER_IN_PLACE_REVERT flag counters: add equality operators counters: implement decrement operators for shard_iterator counters: allow using both views and mutable_views atomic_cell: introduce atomic_cell_mutable_view managed_bytes: add cast to mutable_view bytes: add bytes_mutable_view utils: introduce mutable_view db: add more tracing events for counter writes db: propagate tracing state for counter writes tests/cell_locker: add test for timing out lock acquisition counter_cell_locker: allow setting timeouts db: propagate timeout for counter writes ...	2017-03-07 11:48:13 +02:00
Tomasz Grabiec	4b6e77e97e	db: Fix overflow of gc_clock time point If query_time is time_point::min(), which is used by to_data_query_result(), the result of subtraction of gc_grace_seconds() from query_time will overflow. I don't think this bug would currently have user-perceivable effects. This affects which tombstones are dropped, but in case of to_data_query_result() uses, tombstones are not present in the final data query result, and mutation_partition::do_compact() takes tombstones into consideration while compacting before expiring them. Fixes the following UBSAN report: /usr/include/c++/5.3.1/chrono:399:55: runtime error: signed integer overflow: -2147483648 - 604800 cannot be represented in type 'int' Message-Id: <1488385429-14276-1-git-send-email-tgrabiec@scylladb.com>	2017-03-01 18:49:56 +02:00
Paweł Dziepak	582d397c41	introduce counter_write_query() Counter write path involves read-modify-write. That read is guaranteed to query only a single partition, does not care about dead cells and expects to receive an unserialized mutation as a result. Standard mutation queries are able to produce results fit for counter updates, but the logic involved is much more general (i.e. slower), hence the addition of a new, counter-specific kind of query.	2017-03-01 16:33:36 +00:00

212 Commits