When views contain a primary key column that is not part of the base
table primary key, that column determines whether the row is live or
not. We need to ensure that when that cell is dead, and thus the
derived row marker, either by normal deletion of by TTL, so is the
rest of the row.
This patch introduces the idea of shawdowing row marker. We map the
status of the regular base column in the view's PK to the view row's
marker. If this marker is dead, so is that cell in the base table, and
so should the view row become. To enforce that, a view row's dead
marker shadows the whole row if that view includes a base regular
column in its PK.
Fixes#3360
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
The querier encapsulates all objects needed to serve queries, except
result builders. It is designed to be suspendable, savable and
resumable. It contains all logic needed to suspend, resume and determine
whether the querier can be resumed or not.
It is the foundation upon which the "reader-reuse" mechanism is built.
are_limits_reached() allows querying whether the compactor reached
the page's limits. This is needed to determine whether there will be
more pages and thus whether the compact_mutation_state has to be kept
around.
start_new_page() resets the limits to the current page's ones and
sets the _empty_partition flag so that the partition header (if the last
page finished inside a partition) will be reemitted.
Make a copy of the current decorated-key in consume_end_of_stream() so
that it persists while the compaction state is suspended.
Also add current_partition() to allow client code to query the partition
the compaction is positioned in. This is needed to determine whether
the start position of the next page matches that of the
compact_mutation_state.
Currently compact_mutation is used as a use-once-then-throw-away object.
After it satisfies its consumer it's destroyed together with the
consumer. This conflicts with the effort to save and reuse readers and
associated infrastructure between pages of a query.
To resolve this conflict compact_mutation is split into two classes:
(1) compact_mutation_state
(2) compact_mutation
compact_mutation_state encapsulates all the compaction logic and state,
while compact_mutation continues to provide the same API using
compact_mutation_state behind the scenes.
compact_mutation_state doesn't store the consumer, instead its
consume_* methods are templated on the consumer and take it as an
argument. This allows compact_mutation_state to be independent of the
consumer's type.
Additionally compact_mutation can now be constructed from a shared
pointer to compact_mutation_state. This allows client code to
pre-construct a compaction state and retain it after the
compact_mutation object is destroyed.
These changes allow the state of a compaction to be saved and restored
later while code that is only interested in storing the saved state
can stay independent of the consumer's type.
This patch only contains the splitting of compact_mutation into
compact_mutation and compact_mutation_state. The next patches will add
the missing functionality that is needed to make compact_mutation_state
truly reusable across pages.
query::full_slice doesn't select any regular or static columns, which
is at odds with the expectations of its users. This patch replaces it
with the schema::full_slice() version.
Refs #2885
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1507732800-9448-2-git-send-email-duarte@scylladb.com>
When compacting a partition for querying we would read an extra row,
to include any tombstones between that one and the previous row.
This is no longer needed since we have a general mechanism to detect
short reads in the storage_proxy.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20170811103031.22866-1-duarte@scylladb.com>
This patch replaces the current row tombstone representation by a
row_tombstone.
The intent of the patch is thus to reify the idea of shadowable
tombstones, that up until now we considered all materialized view row
tombstones to be.
We need to distinguish shadowable from non-shadowable row tombstones
to support scenarios such as, when inserting to a table with a
materialzied view:
1. insert into base (p, v1, v2) values (3, 1, 3) using timestamp 1
2. delete from base using timestamp 2 where p = 3
3. insert into base (p, v1) values (3, 1) using timestamp 3
These should yield a view row where v2 is definitely null, but with
the current implementation, v2 will pop back with its value v2=3@TS=1,
even though its dead in the base row. This is because the row
tombstone inserted at 2) is a shadowable one.
This patch only addresses the memory representation of such
row_tombstones.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
If query_time is time_point::min(), which is used by
to_data_query_result(), the result of subtraction of
gc_grace_seconds() from query_time will overflow.
I don't think this bug would currently have user-perceivable
effects. This affects which tombstones are dropped, but in case of
to_data_query_result() uses, tombstones are not present in the final
data query result, and mutation_partition::do_compact() takes
tombstones into consideration while compacting before expiring them.
Fixes the following UBSAN report:
/usr/include/c++/5.3.1/chrono:399:55: runtime error: signed integer overflow: -2147483648 - 604800 cannot be represented in type 'int'
Message-Id: <1488385429-14276-1-git-send-email-tgrabiec@scylladb.com>
With this patch we stop counting dead partitions (i.e., partitions
containing only tombstones) towards the partition limit, which should
apply only to partitions with live rows.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Originally, streamed_mutations guaranteed that emitted tombstones are
disjoint. In order to achieve that two separate objects were produced
for each range tombstone: range_tombstone_begin and range_tombstone_end.
Unfortunately, this forced sstable writer to accumulate all clustering
rows between range_tombstone_begin and range_tombstone_end.
However, since there is no need to write disjoint tombstones to sstables
(see #1153 "Write range tombstones to sstables like Cassandra does") it
is also not necessary for streamed_mutations to produce disjoint range
tombstones.
This patch changes that by making streamed_mutation produce
range_tombstone objects directly.
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Checking bloom filters of sstables to compute max purgeable timestamp
for compaction is expensive in terms of CPU time. We can avoid
calculating it if we're not about to GC any tombstone.
This patch changes compacting functions to accept a function instead
of ready value for max_purgeable.
I verified that bloom filter operations no longer appear on flame
graphs during compaction-heavy workload (without tombstones).
Refs #1322.
With so many consumer concepts out there, it is confusing to name
parameters using genering "Consumer" name, let's name them after
(already defined) concepts: CompactedMutationsConsumer, FlattenedConsumer.
compact_mutation code is going to be shared among queries and sstable
compaction. There are some differences though. Queries don't provide
_max_purgeable and sstable compaction don't need any limits.
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
_max_perguable can be different for each partition, since it is computed
using sstables in which that partition is present (or likely to be
present).
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Since decorated keys are already computed it is better to pass more
information than less. Consumers interested just in partition key can
just drop token and the ones requiring full decorated key don't need to
recompute it.
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>