Commit Graph

156 Commits

Author SHA1 Message Date
Michael Livshin
c96708d262 add support for the ME sstable format
The ME format has been introduced in Cassandra 3.11.11:

11952fae77/src/java/org/apache/cassandra/io/sstable/format/big/BigFormat.java (L123)
d84c6e9810

It adds originating host id to sstable metadata in support of fixing
loss of commit log data when moving sstables between nodes:

https://issues.apache.org/jira/browse/CASSANDRA-16619

In Scylla:

* The supported way to ingest sstables is via upload/, where stored
  commit log replay position should be disregarded (but see
  https://github.com/scylladb/scylla/issues/10080).

* A later commit in this series implements originating host id
  validation for native ME sstables.

Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2022-02-16 18:21:24 +02:00
Avi Kivity
fcb8d040e8 treewide: use Software Package Data Exchange (SPDX) license identifiers
Instead of lengthy blurbs, switch to single-line, machine-readable
standardized (https://spdx.dev) license identifiers. The Linux kernel
switched long ago, so there is strong precedent.

Three cases are handled: AGPL-only, Apache-only, and dual licensed.
For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0),
reasoning that our changes are extensive enough to apply our license.

The changes we applied mechanically with a script, except to
licenses/README.md.

Closes #9937
2022-01-18 12:15:18 +01:00
Avi Kivity
a55b434a2b treewide: extent copyright statements to present day 2021-06-06 19:18:49 +03:00
Avi Kivity
42e3f33722 sstables: convert write() to concepts
There are three variants: integral, enum, and self-describing
(currently expressed as not integral and not enum). Convert to
concepts by using the standard concepts or the new self_describing
concept.
2021-03-18 19:26:43 +02:00
Avi Kivity
bba9c1c616 sstables: add concept for self-describing type
Our sstable parsing and writing code contains a self-describing
type concept, where a type can advertise its members via a
describe_types() member function with a specific protocol.

Formalize that into a C++ concept. This is a little tricky, since
describe_type() accepts a parameter that is itself a template, and
requires clauses only work with concrete type. To handle this problem,
create such a concrete example type and use it in the concept.
2021-03-18 17:52:54 +02:00
Benny Halevy
77328a936a sstables: scylla_metadata: add support for sstable_origin
Add new scylla_metadata_type::SSTableOrigin.
Store and retrive a sstring to the scylla metadata component.
Pass sstable_writer_config::origin from the mx sstable writer
and ignore it in the k_l writer.

Add unit test to verify the sstable_origin extension
using both empty and a random string.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-02-01 16:45:52 +02:00
Benny Halevy
92443ed71c sstables: store large_data_stats in scylla_metadata
Store the large data statistics in the scylla_metadata component.

These will be retrieved when loading the sstable and be
used for determining whether to delete the corresponding
large data entries upon sstable deletion.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-12-01 15:19:42 +02:00
Benny Halevy
79c19a166c sstables: writer: keep track of large data stats
In the next patch, this is will be written to the
sstable's scylla_metadata component.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-12-01 15:19:41 +02:00
Pekka Enberg
a37eaaa022 sstables: Add support for the "md" format enum value
Add the sstable_version_types::md enum value
and logically extend sstable_version_types comparisons to cover
also the > sstable_version_types::mc cases.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-08-10 18:53:04 +03:00
Benny Halevy
25c1a16f8e sstables: move column_name_helper to metadata_collector.cc
It is used only for updating the metadata_collector {min,max}_column_names.
Implement metadata_collector::do_update_min_max_components in
sstables/metadata_collector.cc that will be used to host some other
metadata_collector methods in following patches that need not be
implemented in the header file.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-08-10 13:27:26 +03:00
Tomasz Grabiec
c1df00859e sstables: Make deletion_time printable
Message-Id: <1591387901-7974-12-git-send-email-tgrabiec@scylladb.com>
2020-06-07 13:55:34 +03:00
Avi Kivity
a4c44cab88 treewide: update concepts language from the Concepts TS to C++20
Seastar recently lost support for the experimental Concepts Technical
Specification (TS) and gained support for C++20 concepts. Re-enable
concepts in Scylla by updating our use of concepts to the C++20
standard.

This change:
 - peels off uses of the GCC6_CONCEPT macro
 - removes inclusions of <seastar/gcc6-concepts.hh>
 - replaces function-style concepts (no longer supported) with
   equation-style concepts
 - semicolons added and removed as needed
 - deprecated std::is_pod replaced by recommended replacement
 - updates return type constraints to use concepts instead of
   type names (either std::same_as or std::convertible_to, with
   std::same_as chosen when possible)

No attempt is made to improve the concepts; this is a specification
update only.
Message-Id: <20200531110254.2555854-1-avi@scylladb.com>
2020-06-02 09:12:21 +03:00
Kamil Braun
3d811e2f95 sstables: freeze types nested in collection types in legacy sstables
Some legacy `mc` SSTables (created in Scylla 3.0) may contain incorrect
serialization headers, which don't wrap frozen UDTs nested inside collections
with the FrozenType<...> tag. When reading such SSTable,
Scylla would detect a mismatch between the schema saved in schema
tables (which correctly wraps UDTs in the FrozenType<...> tag) and the schema
from the serialization header (which doesn't have these tags).

SSTables created in Scylla versions 3.1 and above, in particular in
Scylla versions that contain this commit, create correct serialization
headers (which wrap UDTs in the FrozenType<...> tag).

This commit does two things:
1. for all SSTables created after this commit, include a new feature
   flag, CorrectUDTsInCollections, presence of which implies that frozen
   UDTs inside collections have the FrozenType<...> tag.
2. when reading a Scylla SSTable without the feature flag, we assume that UDTs
   nested inside collections are always frozen, even if they don't have
   the tag. This assumption is safe to be made, because at the time of
   this commit, Scylla does not allow non-frozen (multi-cell) types inside
   collections or UDTs, and because of point 1 above.

There is one edge case not covered: if we don't know whether the SSTable
comes from Scylla or from C*. In that case we won't make the assumption
described in 2. Therefore, if we get a mismatch between schema and
serialization headers of a table which we couldn't confirm to come from
Scylla, we will still reject the table. If any user encounters such an
issue (unlikely), we will have to use another solution, e.g. using a
separate tool to rewrite the SSTable.

Fixes #6130.
2020-04-16 18:44:56 +03:00
Piotr Jastrzebski
b569d127a0 token: change data to array<uint8_t, 8>
It is save to do such change because we support only
Murmur3Partitioner which uses only tokens that are
8 bytes long.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-02-05 09:30:46 +01:00
Avi Kivity
adb64dc72f treewide: tighten concepts syntax
gcc 10 requires a semicolon after every compound requirement,
as per the standard. Add missing semicolons where necessary.
Message-Id: <20200129205805.20928-1-avi@scylladb.com>
2020-01-30 14:10:18 +02:00
Pavel Solodovnikov
2f442f28af treewide: add const qualifiers throughout the code base 2019-11-26 02:24:49 +03:00
Piotr Jastrzebski
a962696e44 sstables: Add a feature for empty counters in Scylla.db.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-05-23 10:10:24 +02:00
Tomasz Grabiec
47ca280e57 sstables: mc: Write static compact tables the same way as Cassandra
Static compact tables are tables with compact storage and no
clustering columns.

Before this patch, Scylla was writing rows of static compact tables as
clustered rows instead of static rows. That's because in our in-memory
model such tables have regular rows and no static row. In Cassandra's
schema (since 3.x), those tables have columns which are marked as
static and there are no regular columns.

This worked fine as long as Scylla was writing and reading those
sstables. But when importing sstables from Cassandra, our reader was
skipping the static row, since it's not present in the schema, and
returning no rows as a result. Also, Cassandra, and Scylla tools,
would have problems reading those sstables.

Fix this by writing rows for such tables the same way as Cassandra
does. In order to support rolling downgrade, we do that only when all
nodes are upgraded.

Fixes #4139.
2019-03-18 11:18:33 +01:00
Tomasz Grabiec
b68df143a1 sstables: Add sstable::features() 2019-03-18 11:18:33 +01:00
Benny Halevy
1ccd72f115 sstables: mc: use int64_t for local_deletion_time and ttl
In preparation for changing gc_clock::duration::rep to int64_t.

Refs #3353

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-01-22 15:34:32 +02:00
Benny Halevy
156f9ffa11 sstables: add capped_local_deletion_time stats counter
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-01-22 15:34:32 +02:00
Benny Halevy
7609a04565 sstables: mc: metadata collector: cap local_deletion_time at max
max local_deletion_time_tracker in stats is int32_t so just track the limit
of (max int32_t - 1) if time_point is greater than the limit.
This corresponds to Cassandra's MAX_DELETION_TIME.

Refs #3353

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-01-22 15:34:32 +02:00
Benny Halevy
bd6861989d sstables: mc: use proper gc_clock types for local_deletion_time and ttl
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-01-22 15:34:32 +02:00
Benny Halevy
6465a673f5 sstables: mc: define expired_liveness_ttl as signed int32_t
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-01-22 13:36:35 +02:00
Benny Halevy
c4c2133e3e sstables: mc: change write_delta_deletion_time to receive tombstone rather than deletion_time
mc format only writes delta local_deletion_time of tombstones.
Conventional deletion_time is written only for the partition header.

Restructure the code to pass a tombstone to write_delta_deletion_time
rather than struct deletion_time to prepare for using 64-bit deletion times.

The tombstone uses gc_clock::time_point while struct
deletion_time is limited to int32_t local_deletion_time.

Note that for "live" tombstones we encode <api::missing_timestamp,
no_deletion_time> as was previously evaluated by to_deletion_time().

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-01-22 13:36:35 +02:00
Benny Halevy
844a2de263 sstables: mc: prevent signed integer overflow
Fix runtime error: signed integer overflow
introduced by 2dc3776407

Delta-encoded values may wrap around if the encoded value is
less than the base value.  This could happen in two places:
In the mc-format serialization header itself, where the base values are implicit
Cassandra epoch time, and in the sstables data files, where the base values
are taken from the encoding_stats (later written to the serialization_header).

In these cases, when the calculation is done using signed integer/long we may see
"runtime error: signed integer overflow" messages in debug mode
(with -fsanitize=undefined / -fsanitize=signed-integer-overflow).

Overflow here is expected and harmless since we do not gurantee that
neither the base values in the serialization header are greater than
or equal to Cassandra's epoch now that the delta-encoded values are
always greater than or equal to the respective base values in
the serialization header.

To prevent these warnings, the subtraction/addition should be done with unsigned
(two's complement) arithmetic and the result converted to the signed type.

Note that to keep the code simple where possible, when also rely on implicit
conversion of signed integers to unsigned when either one of added value is unsigned
and the other is signed.

Fixes: #4098

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20190120142950.15776-1-bhalevy@scylladb.com>
2019-01-20 16:59:46 +02:00
Benny Halevy
2dc3776407 sstables: mc: sign-extend serialization_header min_local_deletion_time_base and min_ttl_base
Refs #4074
Refs #3353

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20190110141439.1324-1-bhalevy@scylladb.com>
2019-01-10 16:23:20 +02:00
Benny Halevy
60323b79d1 sstables: mc: sign-extend delta local_deletion_time and delta ttl
Follow Cassandra's encoding so that values that are less than the
baseline encoding_stats will wrap-around in 64-bits rather tham 32.

Fixes #4074
Refs #3353

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20190109192703.18371-1-bhalevy@scylladb.com>
2019-01-09 21:43:30 +02:00
Rafael Ávila de Espíndola
26ac2c23ef Change *_row_* names that refer to partitions
This renames some variables and functions to make it clear that they
refer to partitions and not rows.

Old versions of sstablemetadata used to refer to a row histogram, but
current versions now mention a partition histogram instead.

This patch doesn't change the exposed API names.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20181229223311.4184-2-espindola@scylladb.com>
2019-01-09 14:53:42 +02:00
Duarte Nunes
fa2b0384d2 Replace std::experimental types with C++17 std version.
Replace stdx::optional and stdx::string_view with the C++ std
counterparts.

Some instances of boost::variant were also replaced with std::variant,
namely those that called seastar::visit.

Scylla now requires GCC 8 to compile.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20190108111141.5369-1-duarte@scylladb.com>
2019-01-08 13:16:36 +02:00
Benny Halevy
40410465d7 sstables: mc: expired_liveness_ttl should be max int32_t rather than max uint32_t
Corresponding to Cassandra's EXPIRED_LIVENESS_TTL = Integer.MAX_VALUE;

Fixes #4060

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20190107172457.20430-1-bhalevy@scylladb.com>
2019-01-07 18:41:37 +01:00
Tomasz Grabiec
a4721b4d50 sstables: types: Extract sstable_enabled_features::all() 2018-12-12 12:06:45 +01:00
Tomasz Grabiec
fad4fba4bc sstables: Templetize write() functions on the writer
Will allow writing to both a file_writer, or an in-memory writer like
a bytes_ostream.
2018-12-10 20:08:16 +01:00
Tomasz Grabiec
aa19f98d18 sstables: Write Statistics.db offset map entries in the same order as Cassandra
Before this patch we were writing offset map enteies in unspecified
order, the one returned by std::unorderd_map. Cassandra writes them
sorted by metadata_type. Use the same order for improved
compatibility.

Fixes #3955.

Message-Id: <1543846649-22861-1-git-send-email-tgrabiec@scylladb.com>
2018-12-03 16:40:24 +02:00
Raphael S. Carvalho
a66b1954cc sstables: use a random uuid for sstables without run identifier
Older sstables must have an identifier for them to be associated
with their own run.

Reviewed-by: Nadav Har'El <nyh@scylladb.com>

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2018-11-24 18:53:01 -02:00
Raphael S. Carvalho
62025fa52c sstables: add run identifier to scylla metadata
It identifies a run which a particular sstable belongs to.
Existing sstables will have a random uuid associated with it
in memory.

UUID is the correct choice because it allows sstables to be
exported without having conflicts when using identifier generated
by different nodes.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2018-11-24 18:52:44 -02:00
Raphael S. Carvalho
d29482dce8 sstables: deprecate sstable metadata's ancestors
The reason for that is that it's not available in sstable format mc,
so we can no longer rely on it in common code for the currently
supported formats.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20181121170057.20900-1-raphaelsc@scylladb.com>
2018-11-23 19:38:32 +01:00
Avi Kivity
775b7e41f4 Update seastar submodule
* seastar d59fcef...b924495 (2):
  > build: Fix protobuf generation rules
  > Merge "Restructure files" from Jesse

Includes fixup patch from Jesse:

"
Update Seastar `#include`s to reflect restructure

All Seastar header files are now prefixed with "seastar" and the
configure script reflects the new locations of files.

Signed-off-by: Jesse Haber-Kucharsky <jhaberku@scylladb.com>
Message-Id: <5d22d964a7735696fb6bb7606ed88f35dde31413.1542731639.git.jhaberku@scylladb.com>
"
2018-11-21 00:01:44 +02:00
Vladimir Krivopalov
759d36a26e sstables: Support Scylla-specific extension for writing shadowable tombstones.
The original SSTables 'mc' format, as defined in Cassandra, does not provide
a way to store shadowable deletion in addition to regular row deletion
for materialized views.
It is essential to store it because of known corner-case issues that
otherwise appear.

For this to work, we introduce a Scylla-specific extended flag to be set
in SSTables in 'mc' format that indicates a shadowable tombstone is
written after the regular row tombstone.

This is deemed to be safe because shadowable tombstones are specific to
materialized views and MV tables are not supposed to be imported or
exported.

Note that a shadowable tombstone can be written without a regular
tombstone as well as along with it.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-10-23 16:30:42 -07:00
Vladimir Krivopalov
e168433945 sstables: Introduce a feature for shadowable tombstones in Scylla.db.
This is used to indicate that the SSTables being read may contain a
Scylla-specific HAS_SCYLLA_SHADOWABLE_TOMBSTONE extended flag set.

If feature is not disabled, we should not honour this flag.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-10-23 16:30:42 -07:00
Vladimir Krivopalov
8f79f76116 sstables: Support checking row extension flags for Cassandra shadowable deletion.
This flag can be only used in MV tables that are not supposed to be
imported to Scylla.
Since Scylla representation of shadowable tombstones differs from that
of Cassandra, such SSTables are rejected on read and Scylla never sets
this flag on writing.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-10-23 16:30:42 -07:00
Vladimir Krivopalov
e71cc5ab20 sstables: Introduce TTL limitation and special 'expired TTL' value.
This allows to store expired liveness info in SSTables 3.x format
without introducing a possible conflict with real TTL values.

As per Cassandra, TTL cannot exceed 20 years so taking the maximum value
as a special value for indicating expired liveness info is safe.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-10-10 11:44:14 -07:00
Vladimir Krivopalov
bdca27ae41 sstables: Always store only min bases in serialization_header.
There previously was an inconsistency in treating min values stored in a
serialization_header. They are written to or read from a Statistics.db
as deltas against fixed bases, but when we parse timeouts from the data
file, we need the full bases, not just deltas.

This inconsistency causes wrong timestamp values if we write an sstable
and then read from it using one and the same sstable object because we
turn min values into bases on write and then don't adjust them back
because we already have them in memory.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-09-25 17:23:40 -07:00
Vladimir Krivopalov
48fa088ec6 sstables: Do not parse ancestors from compaction metadata for SSTables 3.x
Ancestors array has been removed starting from 'ma' format
(CASSANDRA-7066).

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-09-19 17:11:43 -07:00
Vladimir Krivopalov
4bf1e9de3f sstables: Support resetting data_consume_rows_context_m to indexable_element::cell.
Set the proper parsing state when resetting to indexable_element::cell.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-08-17 10:09:19 -07:00
Vladimir Krivopalov
a497edcbda sstables: Move promoted_index_block from types.hh to index_entry.hh.
It is only being used by index_reader internally and never exposed so
should not be listed in commonly used types.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-06-28 12:28:59 -07:00
Piotr Jastrzebski
a3683d6e0f sstables 3: add serialization_header::adjust
In SSTables 3, min timestamp and min deletion time in serialization
header are not stored normally but instead the difference between
their value and the cassandra "epoch" is stored.
This is supposed to make SSTables smaller. As a consequence, we have
to add the "epoch" after reading the values to obtain the actual
values of min timestamp and min deletion time.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-06-15 09:10:48 +02:00
Piotr Jastrzebski
2b8ff15f9f column_flags_m: add HAS_COMPLEX_DELETION
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-06-07 22:47:19 +02:00
Piotr Jastrzebski
f6e1c38486 Introduce column_flags_m
This will be used for reading columns from data file.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-05-23 19:54:16 +02:00
Piotr Jastrzebski
d8cd8e04ed Add unfiltered_flags_m::has_all_columns
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-05-23 16:39:52 +02:00