The following patches convert sstable writers to use flat mutation
readers instead of the legacy mutation_reader interface.
Writers were already using the flat consumer interface through
consume_flattened_in_thread(), so most of the work was limited to
providing an appropriate equivalent for flat mutation readers.
* https://github.com/pdziepak/scylla.git flat_mutation_reader-sstable-write/v1:
flat_mutation_reader: move consumer_adapter out of consume()
flat_mutation_reader: introduce consume_in_thread()
tests/flat_mutation_reader: test consume_in_thread()
sstables: switch write_components() to flat_mutation_reader
streamed_mutation: drop streamed_mutation_returning()
sstables: convert compaction to flat_mutation_reader
mutation_reader: drop consume_flattened_in_thread()
This series mainly fixes issues with the serialization of promoted
index entries for non-compound schemas and with the serialization of
range tombstones, also for non-compound schemas.
We lift the correct cell name writing code into its own function,
and direct all users to it. We also ensure backward compatibility with
incorrectly generated promoted indexes and range tombstones.
Fixes #2995
Fixes #2986
Fixes #2979
Fixes #2992
Fixes #2993
* git@github.com:duarten/scylla.git promoted-index-serialization/v3:
sstables/sstables: Unify column name writers
sstables/sstables: Don't write index entry for a missing row marker
sstables/sstables: Reuse write_range_tombstone() for row tombstones
sstables/sstables: Lift index writing for row tombstones
sstables/sstables: Leverage index code upon range tombstone consume
sstables/sstables: Move out tombstone check in write_range_tombstone()
sstables/sstables: A schema with static columns is always compound
sstables/sstables: Lift column name writing logic
sstables/sstables: Use schema-aware write_column_name() for
collections
sstables/sstables: Use schema-aware write_column_name() for row marker
sstables/sstables: Use schema-aware write_column_name() for static row
sstables/sstables: Writing promoted index entry leverages
column_name_writer
sstables/sstables: Add supported feature list to sstables
sstables/sstables: Don't use incorrectly serialized promoted index
cql3/single_column_primary_key_restrictions: Implement is_inclusive()
cql3/delete_statement: Constrain range deletions for non-compound
schemas
tests/cql_query_test: Verify range deletion constraints
sstables/sstables: Correctly deserialize range tombstones
service/storage_service: Add feature for correct non-compound RTs
tests/sstable_*: Start the storage service for some cases
sstables/sstable_writer: Prepare to control range tombstone
serialization
sstables/sstables: Correctly serialize range tombstones
tests/sstable_assertions: Fix monotonicity check for promoted indexes
tests/sstable_assertions: Assert a promoted index is empty
tests/sstable_mutation_test: Verify promoted index serializes
correctly
tests/sstable_mutation_test: Verify promoted index repeats tombstones
tests/sstable_mutation_test: Ensure range tombstone serializes
correctly
tests/sstable_datafile_test: Add test for incorrect promoted index
tests/sstable_datafile_test: Verify reading of incorrect range
tombstones
sstables/sstable: Rename schema-oblivious write_column_name() function
sstables/sstables: No promoted index without clustering keys
tests/sstable_mutation_test: Verify promoted index is not generated
sstables/sstables: Optimize column name writing and indexing
compound_compat: Don't assume compoundness
A TTL of 1 second may cause the cell to expire right after we write it,
if the seconds component of the current time changes right after the
write. Use a larger TTL to avoid spurious failures due to this.
Message-Id: <1511463392-1451-1-git-send-email-tgrabiec@scylladb.com>
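The race described above can be illustrated with a minimal sketch. It
assumes, for illustration only, that expiry is tracked in whole seconds and
that a cell counts as expired once the current second reaches write second +
TTL; is_expired() and to_seconds() are hypothetical helpers, not Scylla
functions.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: timestamps are truncated to whole seconds before comparison.
inline std::int64_t to_seconds(std::int64_t ms) { return ms / 1000; }

// Hypothetical expiry check: the cell expires once the current second
// reaches (write second + ttl).
inline bool is_expired(std::int64_t write_ms, std::int64_t ttl_s, std::int64_t read_ms) {
    assert(ttl_s > 0);
    return to_seconds(read_ms) >= to_seconds(write_ms) + ttl_s;
}
```

Under these assumptions, a cell written at 10.999 s with a TTL of 1 is
already expired when read at 11.001 s, only 2 ms later, while a TTL of 100
leaves ample margin.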
This patch changes some factory functions so that they don't assume
the schema is compound.
This enables some code simplification in
sstables::write_column_name().
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Instead of serializing the column name twice, serialize it once into a
buffer which gets used for index bookkeeping and to write to disk.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
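As a rough sketch of the single-serialization idea above: the name is
encoded once into a buffer, and that same buffer feeds both the index
bookkeeping and the data file. The length-prefixed encoding, sketch_writer
and its members are assumptions made for illustration and do not mirror the
real sstable format or writer.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

using bytes = std::vector<std::uint8_t>;

// Hypothetical encoding: 2-byte big-endian length followed by the raw name.
inline bytes serialize_column_name(const std::string& name) {
    assert(name.size() <= 0xffff);
    bytes out;
    out.push_back(static_cast<std::uint8_t>((name.size() >> 8) & 0xff));
    out.push_back(static_cast<std::uint8_t>(name.size() & 0xff));
    out.insert(out.end(), name.begin(), name.end());
    return out;
}

struct sketch_writer {
    bytes data_file;          // stands in for the on-disk data file
    bytes last_indexed_name;  // stands in for index bookkeeping
    int serializations = 0;   // counts how often the name was encoded

    void write_column_name(const std::string& name) {
        bytes buf = serialize_column_name(name);  // serialize exactly once
        ++serializations;
        last_indexed_name = buf;                  // reuse for the index
        data_file.insert(data_file.end(), buf.begin(), buf.end());  // and for disk
    }
};
```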
flat_mutation_reader provides a replacement for the old
consume_flattened*() interface and therefore an 'in-thread' variant is
also necessary. It expects to be executed in a seastar::thread context
and guarantees that the consumer member functions will be invoked inside
that thread as well (which is why it cannot easily be replaced by a
non-thread version).
In addition, just like the old consume_flattened_in_thread(), its
replacement allows specifying a filter function that causes selected
partitions to be skipped entirely and never reach the consumer.
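The filtering behaviour can be sketched without Seastar as a plain
synchronous loop. Everything below (partition, counting_consumer,
consume_in_thread_sketch) is a simplified, hypothetical stand-in; the real
function operates on a flat_mutation_reader inside a seastar::thread and its
signature may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for a partition: a key plus its rows.
struct partition {
    std::string key;
    std::vector<std::string> rows;
};

// Hypothetical consumer that records which partitions it saw.
struct counting_consumer {
    std::vector<std::string> seen_partitions;
    std::size_t rows = 0;
    void consume_new_partition(const std::string& key) { seen_partitions.push_back(key); }
    void consume(const std::string&) { ++rows; }
    void consume_end_of_partition() {}
    std::size_t consume_end_of_stream() { return rows; }
};

// Drives the consumer synchronously. Partitions rejected by the filter are
// skipped entirely, so the consumer never sees them.
template <typename Consumer>
auto consume_in_thread_sketch(const std::vector<partition>& input, Consumer& consumer,
                              const std::function<bool(const std::string&)>& filter) {
    assert(filter);
    for (const auto& p : input) {
        if (!filter(p.key)) {
            continue;
        }
        consumer.consume_new_partition(p.key);
        for (const auto& r : p.rows) {
            consumer.consume(r);
        }
        consumer.consume_end_of_partition();
    }
    return consumer.consume_end_of_stream();
}
```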
This function is now called write_compound_non_dense_column_name() so
callers are aware of the cases where it can be called.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Add a test to verify that we can still read range tombstones for
non-compound schemas that were incorrectly written by previous Scylla
versions.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
This patch ensures we correctly serialize range tombstones for dense
non-compound schemas, which until now assumed the bounds were compound
composite. We also fix the reading function, which assumed the same
thing. This affected Apache Cassandra compatibility.
Fixes #2986
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
This patch adds support to sstable_writer to be able to control
correct range tombstone serialization.
Once range tombstone serialization is fixed in subsequent patches, the
fix will only be enabled when the whole cluster supports the feature,
to allow for rollbacks.
The feature needs to be enabled for an sstable as a whole, to prevent
problems with it being enabled during an sstable write.
Thus, the sstable writer will pass on this information to the sstable
methods that carry out the actual file writing.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
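One way to picture the "per sstable as a whole" requirement is a writer that
snapshots the flag when it is constructed, so a feature flipping mid-write
cannot change how the rest of the file is serialized. The types and member
names below are hypothetical and only sketch the idea.

```cpp
#include <cassert>

// Hypothetical view of the cluster feature state.
struct cluster_features_sketch {
    bool correct_non_compound_range_tombstones = false;
};

// Hypothetical writer: the decision is captured once, at construction,
// and then passed on to whatever performs the actual file writing.
class sstable_writer_sketch {
    const bool _correct_rt;
public:
    explicit sstable_writer_sketch(const cluster_features_sketch& f)
        : _correct_rt(f.correct_non_compound_range_tombstones) {}

    bool write_range_tombstones_correctly() const { return _correct_rt; }
};
```

A writer created before the feature is enabled keeps the old behaviour for
its whole sstable; only writers created afterwards pick up the new one.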
This patch adds a cluster feature to enable correct serialization of
non-compound range tombstones. We thus support rollbacks during an
upgrade, as we will only change range tombstone serialization when the
cluster is fully upgraded and all nodes are capable of reading the new
format.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
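The gating rule can be sketched as "enabled only if every node advertises
the feature". The helper and the representation of a node's features as a
set of strings are assumptions for illustration; the real feature service
works differently in detail.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <string>
#include <vector>

using feature_set = std::set<std::string>;

// Hypothetical check: the feature is usable only when all known nodes
// advertise it, so a partially upgraded cluster keeps the old format.
inline bool cluster_supports(const std::vector<feature_set>& nodes, const std::string& feature) {
    assert(!feature.empty());
    return !nodes.empty() &&
           std::all_of(nodes.begin(), nodes.end(),
                       [&](const feature_set& fs) { return fs.count(feature) > 0; });
}
```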
This patch changes the range tombstone read path to deal with
correctly written non-compound range tombstones, while also
maintaining backward compatibility and reading old Scylla-generated
range tombstones.
The fix for the write path will activate an sstable feature which will
connect with this patch.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
We cannot represent ranged deletions with non-inclusive bounds on our
current storage format for schemas that are non-compound, since the
clustering key won't include the EOC byte.
Refs #2986
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
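The resulting constraint on range deletions can be sketched as a simple
predicate. bound_sketch and range_deletion_allowed() are hypothetical names;
the actual check lives in the CQL statement code and may be structured
differently.

```cpp
#include <cassert>

// Hypothetical description of one end of a clustering range.
struct bound_sketch {
    bool inclusive;
};

// For non-compound schemas there is no EOC byte to encode exclusivity,
// so only ranges with inclusive bounds on both ends are representable.
inline bool range_deletion_allowed(bool schema_is_compound,
                                   bound_sketch start, bound_sketch end) {
    return schema_is_compound || (start.inclusive && end.inclusive);
}
```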
Promoted indexes generated before this patch by Scylla are considered
incorrect if they belong to a non-compound schema, due to #2993.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
This patch adds additional metadata to the scylla sstable component.
Namely, it adds a list of features that the current sstable supports.
The upcoming usages of the feature list are meant for backward
compatibility, but the implementation makes no such assumptions.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
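A feature list of this kind is commonly stored as a bitmask. The sketch
below assumes such a representation; the enum values and type names are
hypothetical. An sstable written before the metadata existed would simply
have no bits set.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical feature identifiers; the values are bit positions.
enum class sstable_feature_sketch : std::uint8_t {
    non_compound_pi_entries = 0,
    non_compound_range_tombstones = 1,
};

// Hypothetical per-sstable feature list stored as a bitmask.
struct sstable_enabled_features_sketch {
    std::uint64_t enabled = 0;

    void enable(sstable_feature_sketch f) {
        enabled |= std::uint64_t{1} << static_cast<unsigned>(f);
    }
    bool is_enabled(sstable_feature_sketch f) const {
        return (enabled & (std::uint64_t{1} << static_cast<unsigned>(f))) != 0;
    }
};
```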
This patch refactors writing a promoted index entry to leverage the
column_name_writer. It not only reduces code duplication, but also
solves two important bugs:
1) Column names for schema types other than compound non-dense were
not correctly serialized, as the wrong overload of
write_column_name() was being called, which assumed the specified
composite to be compound.
2) Before, for some schema types we were passing an empty
clustering_key to maybe_flush_pi_block(), which caused it to bypass
appending open range tombstones to the data file, causing wrong
query results to be returned.
Fixes #2979
Fixes #2992
Fixes #2993
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
This patch lifts the logic to write a column name depending on the
schema's denseness and compoundness into a function, so that it may
later be reused in other places. We still duplicate the same logic
when writing a clustered row because the index writer requires it for
now.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
A schema can only have static columns if it has at least one
clustering column. A schema with a clustering column is always
compound, unless it is created with compact storage. A schema created
with compact storage cannot have static columns, so we can remove dead
code from the sstable write path.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
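The argument above can be checked mechanically over a small model. The
sketch encodes only the rules stated in this message; schema_sketch and the
helper names are hypothetical, and is_known_compound() is a sufficient
condition, not a full definition of compoundness.

```cpp
#include <cassert>
#include <initializer_list>

// Hypothetical, minimal schema description.
struct schema_sketch {
    int clustering_columns;
    bool compact_storage;
    bool has_static_columns;
};

// Rules from the message: static columns need at least one clustering
// column and are not allowed with compact storage.
inline bool is_valid(const schema_sketch& s) {
    if (!s.has_static_columns) {
        return true;
    }
    return s.clustering_columns >= 1 && !s.compact_storage;
}

// Sufficient condition from the message: a clustering column without
// compact storage implies a compound schema.
inline bool is_known_compound(const schema_sketch& s) {
    return s.clustering_columns >= 1 && !s.compact_storage;
}

// Enumerates small schemas and confirms that every valid schema with
// static columns is compound.
inline bool static_columns_imply_compound() {
    for (int ck = 0; ck <= 3; ++ck) {
        for (bool compact : {false, true}) {
            for (bool statics : {false, true}) {
                schema_sketch s{ck, compact, statics};
                if (is_valid(s) && s.has_static_columns && !is_known_compound(s)) {
                    return false;
                }
            }
        }
    }
    return true;
}
```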
Encapsulate the decision to write the row_marker and to write a
corresponding entry in the promoted index. We now avoid writing the
index entry if there is no row marker, and just start indexing the row
at the first cell.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Making consumer_adapter a member of flat_mutation_reader::impl instead
of being a local class in consume() will make it possible to reuse that
helper in other functions.
It's hard to make sense of the metric transport.requests_blocked_memory
because it shows a queue size. Especially in production setups that
scrape every 15 seconds, that doesn't tell us much.
We solve that in other layers that record blocking by providing both a
requests_blocked_memory and a requests_blocked_memory_current metric.
Fixes #3010
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20171123033329.32596-1-glauber@scylladb.com>
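The split into two metrics might look roughly like the sketch below. It
assumes that requests_blocked_memory becomes a monotonically increasing
total and that requests_blocked_memory_current keeps the instantaneous queue
size; the struct and its methods are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bookkeeping for requests blocked on memory.
struct blocked_request_stats {
    std::uint64_t requests_blocked_memory = 0;          // total ever blocked (assumed counter)
    std::uint64_t requests_blocked_memory_current = 0;  // blocked right now (assumed gauge)

    void on_block() {
        ++requests_blocked_memory;
        ++requests_blocked_memory_current;
    }
    void on_unblock() {
        assert(requests_blocked_memory_current > 0);
        --requests_blocked_memory_current;
    }
};
```

With a counter, blocking that starts and ends between two scrapes still
shows up as an increase, whereas a queue-size gauge alone would read zero at
both scrapes.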
Prometheus histograms have 3 embedded metrics: count, buckets, and sum.
Currently we fill up count and buckets but sum is left at 0. This is
particularly bad, since according to the prometheus documentation, the
best way to calculate histogram averages is to write:
rate(metric_sum[5m]) / rate(metric_count[5m])
One way of keeping track of the sum is to add the value we sampled
every time we sample. However, the interface for the estimated histogram
has a method, add_nano(), that adds a sample while also adjusting the
count to account for samples that were not recorded.
That makes accumulating a sum inaccurate, as we have no values for
the points that were added. To overcome that, when we call add_nano(),
we pretend we are introducing new_count - _count samples, all with the
same value.
Long term, doing away with sampling may help us provide more accurate
results.
After this patch, we are able to correctly calculate latency averages
through the data exported in prometheus.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20171122144558.7575-1-glauber@scylladb.com>
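The sum adjustment described above can be sketched as follows.
sketch_histogram and add_sampled() are hypothetical stand-ins for the
estimated histogram and add_nano(); buckets are omitted, and the sketch
assumes new_count is always greater than the current count.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical histogram that only tracks count and sum.
struct sketch_histogram {
    std::uint64_t _count = 0;
    std::uint64_t _sum = 0;

    // Records a single, fully observed sample.
    void add(std::uint64_t value) {
        ++_count;
        _sum += value;
    }

    // Analogue of add_nano(): new_count is the total number of events so
    // far. The events not sampled are assumed to share the sampled value.
    void add_sampled(std::uint64_t value, std::uint64_t new_count) {
        assert(new_count > _count);
        _sum += (new_count - _count) * value;
        _count = new_count;
    }
};
```

Because sum and count now grow together, the ratio of their rates yields a
meaningful average.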
* seastar-dev.git haaawk/flat_reader_remove_read_rows:
sstable_mutation_test: use read_rows_flat instead of read_rows
perf_sstable: use read_rows_flat instead of read_rows
Remove sstable::read_rows