34 Commits

Author SHA1 Message Date
Tomasz Grabiec
08f4a3664e sstables: mc: writer: Avoid large allocations for maintaining promoted index
Currently, we keep the entries in a circular_buffer, which uses
a contiguous storage. For large partitions with many promoted index
entries this can cause OOM and sstable compaction failure.

A similar problem exists for the offset vector built
in write_promoted_index().

This change solves the problem by serializing promoted index entries
and the offset vector on the fly directly into a bytes_ostream, which
uses fragmented storage.

The serialization of the first entry is deferred, so that
serialization is avoided if there are fewer than 2 entries.
The promoted index is not added for such partitions.
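
The approach described above can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: fragmented_buffer is a simplified stand-in for bytes_ostream, and index_entry, promoted_index_builder and the fixed 8-byte encoding are hypothetical names and choices made for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <optional>
#include <vector>

// Simplified stand-in for a fragmented output stream: bytes are stored in
// fixed-size chunks, so no single large contiguous allocation is needed.
class fragmented_buffer {
    static constexpr size_t chunk_size = 4096;
    std::deque<std::vector<uint8_t>> _chunks;
    size_t _size = 0;
public:
    void write_u64(uint64_t v) {
        for (int i = 7; i >= 0; --i) {
            if (_chunks.empty() || _chunks.back().size() == chunk_size) {
                _chunks.emplace_back();
                _chunks.back().reserve(chunk_size);
            }
            _chunks.back().push_back(static_cast<uint8_t>(v >> (i * 8)));
            ++_size;
        }
    }
    size_t size() const { return _size; }
};

// Hypothetical, simplified index entry.
struct index_entry {
    uint64_t offset;
    uint64_t width;
};

class promoted_index_builder {
    std::optional<index_entry> _first; // held back until a second entry arrives
    fragmented_buffer _entries;        // serialized entries
    fragmented_buffer _offsets;        // offset of each entry within _entries
    size_t _count = 0;

    void serialize(const index_entry& e) {
        _offsets.write_u64(_entries.size());
        _entries.write_u64(e.offset);
        _entries.write_u64(e.width);
    }
public:
    void add(index_entry e) {
        if (++_count == 1) {
            _first = e; // defer: nothing is serialized for a single entry
            return;
        }
        if (_first) {
            serialize(*_first);
            _first.reset();
        }
        serialize(e);
    }
    // An index is only produced when there are at least two entries.
    bool has_index() const { return _count >= 2; }
    size_t serialized_bytes() const { return _entries.size() + _offsets.size(); }
};
```

With this sketch, adding a single entry serializes nothing, while adding a second entry flushes both entries and their offsets into the chunked buffers.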

There still remains a problem that large-enough promoted index can cause OOM.

Refs #4217
2019-02-18 16:03:07 +01:00
Tomasz Grabiec
4e093bc3a4 sstables: mc: writer: Avoid double-serialization of the promoted index 2019-02-18 16:03:07 +01:00
Paweł Dziepak
bc61471132 sstables/mc/writer: don't assume all schema columns are present
The writer constructor prepares lists of present static and regular
columns, those should be used for any further checks.
2019-02-07 10:16:50 +00:00
Rafael Ávila de Espíndola
625080b414 Rename large_partition_handler
Now that it also handles large rows, rename it to large_data_handler.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-01-28 15:03:14 -08:00
Rafael Ávila de Espíndola
1185138a34 Print a warning if a row is too large
Tests: unit (release)

Refs #3988.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-01-28 15:03:10 -08:00
Benny Halevy
93270dd8e0 gc_clock: make 64 bit
Fixes: #3353

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-01-22 15:34:32 +02:00
Benny Halevy
1ccd72f115 sstables: mc: use int64_t for local_deletion_time and ttl
In preparation for changing gc_clock::duration::rep to int64_t.

Refs #3353

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-01-22 15:34:32 +02:00
Benny Halevy
427d6e6090 sstables: add capped_tombstone_deletion_time stats counter
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-01-22 15:34:32 +02:00
Benny Halevy
0ec46924bf sstables: mc: cap partition tombstone local_deletion_time to max
The deletion_time struct has an int32_t local_deletion_time that cannot hold
long time values. Cap local_deletion_time to max_local_deletion_time and
log a warning about that.
This corresponds to Cassandra's MAX_DELETION_TIME.
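
A minimal sketch of such capping, assuming a 64-bit input in seconds that is non-negative. The cap value used here (the largest int32_t minus one) is an assumption and is not taken from the commit, and capped_time and cap_local_deletion_time are hypothetical names.

```cpp
#include <cstdint>
#include <limits>

// Assumed cap value; the commit only says it corresponds to Cassandra's
// MAX_DELETION_TIME and does not state the number.
constexpr int64_t max_local_deletion_time =
    std::numeric_limits<int32_t>::max() - 1;

// Hypothetical result type: the value that fits in 32 bits, plus a flag the
// caller can use to log a warning or bump a stats counter.
struct capped_time {
    int32_t value;
    bool was_capped;
};

// Assumes ldt is non-negative (seconds since the epoch).
inline capped_time cap_local_deletion_time(int64_t ldt) {
    if (ldt > max_local_deletion_time) {
        return {static_cast<int32_t>(max_local_deletion_time), true};
    }
    return {static_cast<int32_t>(ldt), false};
}
```

A caller would check was_capped and emit the warning itself, which keeps the conversion free of logging dependencies.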

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-01-22 15:34:32 +02:00
Benny Halevy
bd6861989d sstables: mc: use proper gc_clock types for local_deletion_time and ttl
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-01-22 15:34:32 +02:00
Benny Halevy
33314cec3f sstables: safely convert ttl and local_deletion_time to int32_t
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-01-22 15:34:32 +02:00
Benny Halevy
c4c2133e3e sstables: mc: change write_delta_deletion_time to receive tombstone rather than deletion_time
mc format only writes delta local_deletion_time of tombstones.
Conventional deletion_time is written only for the partition header.

Restructure the code to pass a tombstone to write_delta_deletion_time
rather than struct deletion_time to prepare for using 64-bit deletion times.

The tombstone uses gc_clock::time_point while struct
deletion_time is limited to int32_t local_deletion_time.

Note that for "live" tombstones we encode <api::missing_timestamp,
no_deletion_time> as was previously evaluated by to_deletion_time().

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-01-22 13:36:35 +02:00
Benny Halevy
820906b794 sstables: mc: use gc_clock types for writing delta ttl and local_deletion_time
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-01-22 13:36:35 +02:00
Benny Halevy
844a2de263 sstables: mc: prevent signed integer overflow
Fix runtime error: signed integer overflow
introduced by 2dc3776407

Delta-encoded values may wrap around if the encoded value is
less than the base value.  This could happen in two places:
In the mc-format serialization header itself, where the base values are implicit
Cassandra epoch time, and in the sstables data files, where the base values
are taken from the encoding_stats (later written to the serialization_header).

In these cases, when the calculation is done using signed integer/long we may see
"runtime error: signed integer overflow" messages in debug mode
(with -fsanitize=undefined / -fsanitize=signed-integer-overflow).

Overflow here is expected and harmless since we guarantee neither that
the base values in the serialization header are greater than or equal
to Cassandra's epoch, nor that the delta-encoded values are always
greater than or equal to the respective base values in the
serialization header.

To prevent these warnings, the subtraction/addition should be done with unsigned
(two's complement) arithmetic and the result converted to the signed type.

Note that to keep the code simple where possible, we also rely on implicit
conversion of signed integers to unsigned when one of the added values is
unsigned and the other is signed.
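
As an illustration of the technique, here is a minimal sketch of delta encoding done in unsigned arithmetic. The helper names encode_delta and decode_delta are hypothetical and are not the functions changed by this commit.

```cpp
#include <cstdint>

// Hypothetical helper: subtract in unsigned arithmetic, where wrap-around
// modulo 2^64 is well defined, instead of in signed arithmetic, where
// overflow is undefined behaviour and is reported by the sanitizer.
inline uint64_t encode_delta(int64_t value, int64_t base) {
    return static_cast<uint64_t>(value) - static_cast<uint64_t>(base);
}

// Hypothetical helper: add in unsigned arithmetic, then convert the result
// to the signed type. The conversion is modular on two's complement
// targets (and guaranteed to be so since C++20).
inline int64_t decode_delta(uint64_t delta, int64_t base) {
    return static_cast<int64_t>(static_cast<uint64_t>(base) + delta);
}
```

A value smaller than its base produces a large unsigned delta that decodes back to the original value, so the round trip is lossless even when the intermediate result wraps.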

Fixes: #4098

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20190120142950.15776-1-bhalevy@scylladb.com>
2019-01-20 16:59:46 +02:00
Paweł Dziepak
635873639b Merge "Encoding stats enhancements" from Benny
"
Clean up various cases related to updating metadata stats and encoding
stats in preparation for 64-bit gc_clock (#3353).

Fixes #4026
Fixes #4033
Fixes #4035
Fixes #4041

Refs #3353
"

* 'projects/encoding-stats-fixes/v6' of https://github.com/bhalevy/scylla:
  sstables: remove duplicated code in data_consume_rows_context CELL_VALUE_BYTES
  sstables: mc: use api::timestamp_type in write_liveness_info
  sstables: mc: sstable_write encoding_stats are const
  mp_row_consumer_k_l::consume_deleted_cell rename ttl param to local_deletion_time
  memtable: don't use encoding_stats epochs as default
  memtable: mc: udpate min_ttl encoding stats for dead row marker
  memtable: mc: add comment regarding updating encoding stats of collection tombstones
  sstables: metadata_collector: add update tombstone stats
  sstables: assert that delete_time is not live when updating stats
  sstables: move update_deletion_time_stats to metadata collector
  sstables: metadata_collector: introduce update_local_deletion_time_and_tombstone_histogram
  sstables: mc: write_liveness_info and write_collection should update tombstone_histogram
  sstables: update_local_deletion_time for row marker deletion_time and expiration
2019-01-15 16:53:36 +02:00
Avi Kivity
f5ee466a1c Merge "Cleanup UDT and tuple names creation" from Piotr
"
Currently the logic is scattered between types.*, cql3_types.* and
sstables/mc/writer.cc.

This patchset places all the logic in types.* and makes sure we
correctly add "frozen<...>" and "FrozenType(...)" to the names of
tuples and UDTs.

Fixes #4087

Tests: unit(release)
"

* 'haaawk/4087_v1' of github.com:scylladb/seastar-dev:
  Add comment explaining tuple type name creation
  Add "FrozenType(...)" to UDT name only when it's frozen
  Move "FrozenType(...)" addition to UDT name to user_type_impl
  Add "frozen<...>" to tuple CQL name only when it's frozen
  Move "frozen<...>" addition to tuple CQL name to tuple_type_impl
  Merge make_cql3_tuple_type into tuple_type_impl::as_cql3_type
  Add "frozen<...>" to UDT CQL name only when it's frozen
  Move "frozen<...>" addition to UDT CQL name to user_type_impl
2019-01-13 15:34:24 +02:00
Benny Halevy
d9e2aa65fc sstables: mc: use api::timestamp_type in write_liveness_info
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-01-13 14:17:45 +02:00
Benny Halevy
7ea96aa778 sstables: mc: sstable_write encoding_stats are const
Encoding stats are immutable once statistics are sealed.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-01-13 14:17:45 +02:00
Benny Halevy
75ccd29b6a sstables: metadata_collector: add update tombstone stats
Conditionally update timestamp and local_deletion_time stats based on tombstone

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-01-13 14:17:45 +02:00
Benny Halevy
12e6b503c9 sstables: move update_deletion_time_stats to metadata collector
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-01-13 14:17:45 +02:00
Benny Halevy
2989b986ef sstables: metadata_collector: introduce update_local_deletion_time_and_tombstone_histogram
Refs #4026
Refs #4033

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-01-13 14:17:45 +02:00
Benny Halevy
bcb1fcd402 sstables: mc: write_liveness_info and write_collection should update tombstone_histogram
Fixes #4033

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-01-13 14:17:45 +02:00
Rafael Ávila de Espíndola
cd9ce18874 sstable: rename the is_boundary predicate
The new name makes it clear what is on either side of the boundary.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190110221324.33618-1-espindola@scylladb.com>
2019-01-11 14:36:49 +02:00
Piotr Jastrzebski
fc17bd376b Move "FrozenType(...)" addition to UDT name to user_type_impl
This logic belongs in types.hh/types.cc layer.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-01-11 12:07:47 +01:00
Benny Halevy
2dc3776407 sstables: mc: sign-extend serialization_header min_local_deletion_time_base and min_ttl_base
Refs #4074
Refs #3353

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20190110141439.1324-1-bhalevy@scylladb.com>
2019-01-10 16:23:20 +02:00
Avi Kivity
4a6aeced59 Merge "Fix UDTs representation in serialization header" from Piotr
"
Tests: unit(release)
"

Fixes #4073.

* commit 'FETCH_HEAD~1':
  Add test for serialization header with UDT
  Fix UDT names in serialization header
2019-01-10 12:57:11 +02:00
Piotr Jastrzebski
3de85aebc9 Fix UDT names in serialization header
Serialization header stores type names of all
columns in a table. Including partition key columns,
clustering key columns, static columns and regular columns.

If one of those types is a user defined type then we need to
wrap its name into
"org.apache.cassandra.db.marshal.FrozenType(...)".
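
The wrapping rule can be sketched as below. The function name serialization_header_type_name and the is_udt flag are hypothetical; only the FrozenType prefix comes from the commit message.

```cpp
#include <string>

// Hypothetical helper: returns the type name as it would appear in the
// serialization header. User defined types get wrapped in FrozenType(...);
// every other type name is passed through unchanged.
inline std::string serialization_header_type_name(const std::string& name,
                                                  bool is_udt) {
    if (is_udt) {
        return "org.apache.cassandra.db.marshal.FrozenType(" + name + ")";
    }
    return name;
}
```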

Fixes #4073

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-01-10 10:58:30 +01:00
Benny Halevy
60323b79d1 sstables: mc: sign-extend delta local_deletion_time and delta ttl
Follow Cassandra's encoding so that values that are less than the
baseline encoding_stats will wrap around in 64 bits rather than 32.
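
The difference between the two wrap-around widths can be shown with a small sketch. Both function names are hypothetical, and the code assumes a platform where int is 32 bits wide.

```cpp
#include <cstdint>

// 32-bit wrap-around: the subtraction wraps modulo 2^32 and the result is
// then zero-extended to 64 bits.
inline uint64_t delta_wrapped_32(int32_t value, int32_t base) {
    return static_cast<uint32_t>(value) - static_cast<uint32_t>(base);
}

// 64-bit wrap-around: both operands are sign-extended to 64 bits first, so
// a value below the base yields a delta with all upper bits set.
inline uint64_t delta_sign_extended(int32_t value, int32_t base) {
    return static_cast<uint64_t>(static_cast<int64_t>(value))
         - static_cast<uint64_t>(static_cast<int64_t>(base));
}
```

For value 5 and base 10, the first function yields 0x00000000FFFFFFFB and the second yields 0xFFFFFFFFFFFFFFFB. A reader that adds the delta to a 64-bit base recovers 5 only from the second form.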

Fixes #4074
Refs #3353

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20190109192703.18371-1-bhalevy@scylladb.com>
2019-01-09 21:43:30 +02:00
Duarte Nunes
fa2b0384d2 Replace std::experimental types with C++17 std version.
Replace stdx::optional and stdx::string_view with the C++ std
counterparts.

Some instances of boost::variant were also replaced with std::variant,
namely those that called seastar::visit.

Scylla now requires GCC 8 to compile.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20190108111141.5369-1-duarte@scylladb.com>
2019-01-08 13:16:36 +02:00
Rafael Ávila de Espíndola
51a08c3240 sstable: remove constexpr from run time predicates
We never check these predicates at compile time.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190108010055.92042-1-espindola@scylladb.com>
2019-01-08 12:28:42 +02:00
Rafael Ávila de Espíndola
3c9178d122 sstables: Refactor predicates on bound_kind_m
This moves the predicate functions to the start of the file, renames
is_in_bound_kind to is_bound_kind for consistency with to_bound_kind
and defines all predicates in a similar fashion.

It also uses the predicates to reduce code duplication.

Tests: unit (release)

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-01-02 17:50:44 -08:00
Tomasz Grabiec
62a1afaac9 sstables: mc: writer: Avoid calling unsigned_vint::serialized_size()
Rather than adding serialized_size() to the body size before
serializing the field, we can serialize the field to _tmp_bufs at the
beginning and have the body size automatically account for it.
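
A rough sketch of the idea, under simplifying assumptions: the variable-length integer below is a generic 7-bits-per-byte encoding and not the actual vint encoding of the mc format, and write_varint, write_record and the record layout are hypothetical.

```cpp
#include <cstdint>
#include <vector>

using buffer = std::vector<uint8_t>;

// Simplified variable-length integer (7 payload bits per byte, low bits
// first). This is an illustration only, not the mc-format vint encoding.
inline void write_varint(buffer& out, uint64_t v) {
    while (v >= 0x80) {
        out.push_back(static_cast<uint8_t>((v & 0x7f) | 0x80));
        v >>= 7;
    }
    out.push_back(static_cast<uint8_t>(v));
}

// Hypothetical record layout: [body size][field][payload].
// The field is serialized into the temporary buffer first, so the body size
// is simply tmp.size() and no separate serialized_size() call is needed.
inline buffer write_record(uint64_t field, const buffer& payload) {
    buffer tmp;
    write_varint(tmp, field);
    tmp.insert(tmp.end(), payload.begin(), payload.end());

    buffer out;
    write_varint(out, tmp.size()); // body size accounts for the field
    out.insert(out.end(), tmp.begin(), tmp.end());
    return out;
}
```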
2018-12-18 11:11:36 +01:00
Tomasz Grabiec
a14633c6d0 sstables: Extract MC format writer to mc/writer.cc
This moves all MC-related writing code to mc/writer.cc:

  - m_format_write_helpers.hh is dropped
  - m_format_write_helpers_impl.hh is dropped
  - sstable_writer_m is moved out of sstables.cc

sstable_writer_m is renamed to sstables::mc::writer
2018-12-12 12:07:31 +01:00
Tomasz Grabiec
bd7e9ad3ab sstables: mc: Extract bound_kind_m related stuff into mc/types.hh 2018-12-12 12:06:46 +01:00