scylladb/docs/dev/sstable-scylla-format.md
Botond Dénes a9c86fc2e4 docs: document schema subcomponent in sstable-scylla-format.md
Commit 234f905 (sstables: scylla_metadata: add schema member) added a
new Schema subcomponent (tag 11) to scylla_metadata. Document it in the
sstable Scylla format reference:

- Add schema to the subcomponent grammar enumeration
- Add a summary entry describing the subcomponent (tag 11) and its purpose
- Add a detailed ## schema subcomponent section with the binary grammar,
  covering table_id, table_schema_version, keyspace_name, table_name and
  the column_description array (column_kind, column_name, column_type)

Fixes https://github.com/scylladb/scylladb/issues/27960

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Closes scylladb/scylladb#28983
2026-04-15 14:40:15 +03:00


File format of the Scylla.db sstable component

The Scylla.db component (present in a file named like mc-223-big-Scylla.db) contains assorted Scylla-only metadata. Its presence indicates the sstable was created by Scylla (or some Scylla-aware creator). Non-Scylla consumers will ignore it.

The file is small and intended to be processed in-memory.

Main structure

The main structure is that of an unordered set of subcomponents. Each subcomponent is prefixed with a be32 tag that indicates its type, followed by its serialized size (so unknown subcomponents can be skipped).

scylla_db = subcomponent_count (tag serialized_size subcomponent)*
subcomponent_count = be32
serialized_size = be32
tag = be32
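
As a minimal sketch of the grammar above, the following Python splits an in-memory Scylla.db image into its raw subcomponents without interpreting them. The function name `split_subcomponents` and the sample bytes are made up for illustration; only the field order and widths come from the grammar.

```python
import struct

def split_subcomponents(buf: bytes) -> dict[int, bytes]:
    """Return {tag: raw subcomponent bytes} for a Scylla.db image."""
    (count,) = struct.unpack_from(">I", buf, 0)   # subcomponent_count
    pos = 4
    found = {}
    for _ in range(count):
        tag, size = struct.unpack_from(">II", buf, pos)  # tag, serialized_size
        pos += 8
        if pos + size > len(buf):
            raise ValueError(f"subcomponent with tag {tag} overruns the buffer")
        # The size prefix lets us slice out the payload even for unknown tags.
        found[tag] = buf[pos:pos + size]
        pos += size
    return found

# Synthetic input: a features payload (tag 2) and a hypothetical unknown tag 999.
sample = (
    struct.pack(">I", 2)
    + struct.pack(">II", 2, 8) + struct.pack(">Q", 5)
    + struct.pack(">II", 999, 3) + b"abc"
)
parts = split_subcomponents(sample)
```

Because every payload is length-prefixed, a reader that only understands some tags can still locate the ones it cares about.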

Subcomponents and tag values

The following subcomponents are recognized. They are described in more detail in individual sections.

subcomponent = sharding_metadata
    | features
    | extension_attributes
    | run_identifier
    | large_data_stats
    | sstable_origin
    | scylla_build_id
    | scylla_version
    | ext_timestamp_stats
    | sstable_identifier
    | schema
    | components_digests

sharding_metadata (tag 1): describes what token sub-ranges are included in this sstable. This is used, when loading the sstable, to determine which shard(s) it occupies.

features (tag 2): a set of boolean flags that describe the sstable.

extension_attributes (tag 3): a map<string, string> with additional attributes.

run_identifier (tag 4): a uuid that is the same for all sstables in the same run (and different for sstables in different runs).

large_data_stats (tag 5): a map<large_data_type, large_data_stats_entry> with statistics about large data entities in the sstable.

sstable_origin (tag 6): a string describing the origin of the sstable ("memtable" for memtable flush, "garbage collection" for compaction, etc.).

scylla_build_id (tag 7): a string containing the build id of the Scylla executable that created the sstable.

scylla_version (tag 8): a string containing the version of the Scylla executable that created the sstable.

ext_timestamp_stats (tag 9): a map<ext_timestamp_stats_type, int64_t> with statistics about timestamps in the sstable, such as min_live_timestamp and min_live_row_marker_timestamp.

sstable_identifier (tag 10): a uuid identifying the sstable for its whole lifetime. It is derived from the sstable's uuid generation upon creation (or uniquely generated if the sstable has a numerical generation). Unlike the sstable generation, which may change if the sstable is migrated to a different shard or node, the sstable identifier is stable and is copied along with the rest of the scylla metadata.

schema (tag 11): the schema of the table the sstable belongs to. It stores the most important fields: the table id and version (as UUIDs), keyspace name, table name, and a list of all columns with their kind, name and type. It is not a complete schema equivalent to the one stored in the system schema tables, but it contains enough information for tools like scylla-sstable to parse an sstable in a self-sufficient manner.

components_digests (tag 12): a map<component_type, uint32_t> with CRC32 digests of all SSTable component files that are checksummed during write. Each entry maps a component type (e.g., Data, Index, Filter, Statistics) to its CRC32 checksum. This allows verifying the integrity of individual component files.

The scylla sstable dump-scylla-metadata tool can be used to dump the scylla metadata in JSON format.

Trailing digest

When the components_digests subcomponent is present, the Scylla.db file contains a trailing CRC32 digest appended after the serialized subcomponents data. This digest covers the entire serialized data section (i.e., all subcomponents) and can be used to verify the integrity of the scylla metadata itself.
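
The check described above could look roughly like the sketch below. It rests on several assumptions that this document does not state: that the digest is stored as a be32 in the last four bytes of the file, that it is the common zlib-style CRC-32, and that it covers every byte preceding it. The function name `check_trailing_digest` is hypothetical.

```python
import struct
import zlib

def check_trailing_digest(buf: bytes) -> bool:
    # ASSUMPTION: the last 4 bytes are a be32 holding the zlib-style CRC-32
    # of everything that precedes them. Verify against the real writer
    # before relying on this layout.
    if len(buf) < 4:
        raise ValueError("buffer too short to hold a digest")
    body, trailer = buf[:-4], buf[-4:]
    (stored,) = struct.unpack(">I", trailer)
    return zlib.crc32(body) == stored

# Synthetic data, not a real Scylla.db file.
body = b"example serialized subcomponents"
good = body + struct.pack(">I", zlib.crc32(body))
bad = body + struct.pack(">I", zlib.crc32(body) ^ 1)
```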

sharding_metadata subcomponent

sharding_metadata = token_range_count token_range*
token_range_count = be32
token_range = left_token_bound right_token_bound
left_token_bound = token_bound
right_token_bound = token_bound
token_bound = exclusive_flag token
exclusive_flag = byte          // 0=inclusive, 1=exclusive
token = token_size byte*
token_size = be16

Sharding metadata is a sorted list of disjoint token ranges. Each token range consists of a left bound and a right bound; either bound may be inclusive or exclusive. The tokens are interpreted according to the partitioner.

The sstable contains no partitions whose token is outside the ranges described by sharding_metadata.
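
A minimal sketch of decoding this grammar in Python follows. Tokens are kept as raw bytes because their interpretation depends on the partitioner; the function name and the sample bytes are illustrative only.

```python
import struct

def parse_sharding_metadata(buf: bytes):
    """Return a list of ((left_exclusive, left_token), (right_exclusive, right_token))."""
    (count,) = struct.unpack_from(">I", buf, 0)        # token_range_count
    pos = 4
    ranges = []
    for _ in range(count):
        bounds = []
        for _side in ("left", "right"):
            # exclusive_flag (byte) followed by token_size (be16)
            exclusive, size = struct.unpack_from(">BH", buf, pos)
            pos += 3
            token = buf[pos:pos + size]
            if len(token) != size:
                raise ValueError("truncated token")
            pos += size
            bounds.append((bool(exclusive), token))
        ranges.append(tuple(bounds))
    return ranges

# Synthetic input: one range with an inclusive left and an exclusive right bound.
sample = (
    struct.pack(">I", 1)
    + struct.pack(">BH", 0, 2) + b"\x00\x01"
    + struct.pack(">BH", 1, 2) + b"\x00\xff"
)
token_ranges = parse_sharding_metadata(sample)
```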

features subcomponent

features = be64      // interpreted as a set of bits

bit 0: NonCompoundPIEntries (if set, indicates the sstable was generated by Scylla with issue #2993 fixed)

bit 1: NonCompoundRangeTombstones (if set, indicates the sstable was generated by Scylla with issue #2986 fixed)

bit 2: ShadowableTombstones (if set, indicates the sstable was generated by Scylla with issue #3885 fixed)

bit 3: CorrectStaticCompact (if set, indicates the sstable was generated by Scylla with issue #4139 fixed)

bit 4: CorrectEmptyCounters (if set, indicates the sstable was generated by Scylla with issue #4363 fixed)

bit 5: CorrectUDTsInCollections (if set, indicates that the sstable was generated by Scylla with issue #6130 fixed)

bit 6: CorrectLastPiBlockWidth (if set, indicates that the width of the last promoted index block never includes the partition end marker)
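
As an illustration, the bit set can be decoded into feature names as sketched below. The sketch assumes that "bit 0" refers to the least significant bit of the be64 value; the helper name `decode_features` is made up.

```python
import struct

# Bit positions and names as listed above.
FEATURE_BITS = {
    0: "NonCompoundPIEntries",
    1: "NonCompoundRangeTombstones",
    2: "ShadowableTombstones",
    3: "CorrectStaticCompact",
    4: "CorrectEmptyCounters",
    5: "CorrectUDTsInCollections",
    6: "CorrectLastPiBlockWidth",
}

def decode_features(buf: bytes):
    """Return (list of known feature names, mask of unrecognized bits)."""
    (mask,) = struct.unpack(">Q", buf[:8])
    # ASSUMPTION: bit 0 is the least significant bit.
    named = [name for bit, name in FEATURE_BITS.items() if mask & (1 << bit)]
    known_mask = sum(1 << bit for bit in FEATURE_BITS)
    return named, mask & ~known_mask

names, unknown = decode_features(struct.pack(">Q", 0b1000101 | (1 << 40)))
```

Returning the unrecognized bits separately lets a reader notice features written by a newer version instead of silently dropping them.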

extension_attributes subcomponent

extension_attributes = extension_attribute_count extension_attribute*
extension_attribute_count = be32
extension_attribute = extension_attribute_key extension_attribute_value
extension_attribute_key = string32
extension_attribute_value = string32
string32 = string32_size byte*
string32_size = be32

There are currently no defined attributes.
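
Even without defined attributes, the map layout can be read generically. The sketch below follows the grammar and keeps keys and values as raw bytes, since no character encoding is specified here; the key used in the sample is hypothetical.

```python
import struct

def read_string32(buf: bytes, pos: int):
    """Read a string32 at pos; return (payload bytes, position after it)."""
    (size,) = struct.unpack_from(">I", buf, pos)
    pos += 4
    data = buf[pos:pos + size]
    if len(data) != size:
        raise ValueError("truncated string32")
    return data, pos + size

def parse_extension_attributes(buf: bytes) -> dict[bytes, bytes]:
    (count,) = struct.unpack_from(">I", buf, 0)   # extension_attribute_count
    pos = 4
    attrs = {}
    for _ in range(count):
        key, pos = read_string32(buf, pos)
        value, pos = read_string32(buf, pos)
        attrs[key] = value
    return attrs

def _s32(b: bytes) -> bytes:
    return struct.pack(">I", len(b)) + b

# Synthetic input with a made-up attribute.
sample = struct.pack(">I", 1) + _s32(b"example_key") + _s32(b"example_value")
attrs = parse_extension_attributes(sample)
```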

run_identifier subcomponent

run_identifier = uuid
uuid = uuid_high_bits uuid_low_bits
uuid_high_bits = be64
uuid_low_bits = be64

If the run_identifier subcomponent is present, the sstable is part of a run. All sstables with the same run_identifier belong to the same run. They are guaranteed to be disjoint (non-overlapping) in their partition keys.
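
A uuid in this layout can be turned into a Python `uuid.UUID` as sketched below, assuming uuid_high_bits holds the most significant 64 bits of the canonical 128-bit value. The helper name `parse_uuid` is illustrative.

```python
import struct
import uuid

def parse_uuid(buf: bytes, pos: int = 0) -> uuid.UUID:
    # ASSUMPTION: uuid_high_bits are the most significant 64 bits.
    high, low = struct.unpack_from(">QQ", buf, pos)
    return uuid.UUID(int=(high << 64) | low)

# Round-trip a known UUID through the on-disk layout.
expected = uuid.UUID("12345678-1234-5678-1234-567812345678")
raw = struct.pack(">QQ", expected.int >> 64, expected.int & (2**64 - 1))
run_id = parse_uuid(raw)
```

Two sstables belong to the same run exactly when their parsed run identifiers compare equal.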

large_data_stats subcomponent

large_data_stats = large_data_count large_data_pair*
large_data_count = be32
large_data_pair = large_data_type large_data_stats_entry
large_data_type = partition_size | row_size | cell_size | rows_in_partition | elements_in_collection
    partition_size = be32(1)    // partition size, in bytes
    row_size = be32(2)          // row size, in bytes
    cell_size = be32(3)         // cell size, in bytes
    rows_in_partition = be32(4) // number of rows in a partition
    elements_in_collection = be32(5) // number of elements in a collection
large_data_stats_entry = max_value threshold above_threshold
    max_value = be64
    threshold = be64
    above_threshold = be32

The large_data_stats subcomponent holds statistics about partition, row, and cell sizes, and about the number of rows in a partition and of elements in a collection. For each entry type, it keeps the largest observed value, the corresponding large_data threshold, and the number of entities above that threshold.
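
A minimal sketch of decoding these entries follows. Each pair is a fixed 24 bytes (be32 type, two be64 values, be32 counter), so one `struct` format covers it. The function name and sample values are made up.

```python
import struct

# Type identifiers as listed in the grammar above.
LARGE_DATA_TYPES = {
    1: "partition_size",
    2: "row_size",
    3: "cell_size",
    4: "rows_in_partition",
    5: "elements_in_collection",
}

PAIR = struct.Struct(">IQQI")  # large_data_type, max_value, threshold, above_threshold

def parse_large_data_stats(buf: bytes) -> dict:
    (count,) = struct.unpack_from(">I", buf, 0)   # large_data_count
    pos = 4
    stats = {}
    for _ in range(count):
        type_id, max_value, threshold, above = PAIR.unpack_from(buf, pos)
        pos += PAIR.size
        # Unrecognized type ids are kept under a placeholder name.
        name = LARGE_DATA_TYPES.get(type_id, f"unknown_{type_id}")
        stats[name] = {
            "max_value": max_value,
            "threshold": threshold,
            "above_threshold": above,
        }
    return stats

# Synthetic input with two entries.
sample = (
    struct.pack(">I", 2)
    + PAIR.pack(1, 1000, 500, 3)
    + PAIR.pack(4, 20, 100, 0)
)
stats = parse_large_data_stats(sample)
```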

schema subcomponent

schema = table_id table_schema_version keyspace_name table_name column_count column_description*
table_id = uuid
table_schema_version = uuid
uuid = uuid_high_bits uuid_low_bits
uuid_high_bits = be64
uuid_low_bits = be64
keyspace_name = string32
table_name = string32
column_count = be32
column_description = column_kind column_name column_type
column_kind = byte    // 1=partition_key, 2=clustering_key, 3=static_column, 4=regular_column
column_name = string32
column_type = string32    // type name (e.g. "org.apache.cassandra.db.marshal.UTF8Type")
string32 = string32_size byte*
string32_size = be32

The schema subcomponent stores the most important schema fields of the table the sstable belongs to. It serves as an alternative schema source to the one stored in the statistics component, which lacks column names and other metadata. Unlike the full schema stored in the system schema tables, it is not intended to be comprehensive, but it contains enough information for tools like scylla-sstable to parse an sstable in a self-sufficient manner.
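
To make the layout concrete, here is a sketch of a reader for the schema grammar. It is not how scylla-sstable implements this; it only follows the field order given above. Names are kept as raw bytes because no encoding is specified, uuid_high_bits is assumed to be the most significant half, and the keyspace, table, and column names in the sample are invented.

```python
import struct
import uuid

COLUMN_KINDS = {1: "partition_key", 2: "clustering_key",
                3: "static_column", 4: "regular_column"}

def _uuid(buf, pos):
    # ASSUMPTION: uuid_high_bits are the most significant 64 bits.
    high, low = struct.unpack_from(">QQ", buf, pos)
    return uuid.UUID(int=(high << 64) | low), pos + 16

def _string32(buf, pos):
    (size,) = struct.unpack_from(">I", buf, pos)
    pos += 4
    data = buf[pos:pos + size]
    if len(data) != size:
        raise ValueError("truncated string32")
    return data, pos + size

def parse_schema(buf: bytes) -> dict:
    table_id, pos = _uuid(buf, 0)
    version, pos = _uuid(buf, pos)
    keyspace, pos = _string32(buf, pos)
    table, pos = _string32(buf, pos)
    (column_count,) = struct.unpack_from(">I", buf, pos)
    pos += 4
    columns = []
    for _ in range(column_count):
        kind = buf[pos]                      # column_kind is a single byte
        pos += 1
        name, pos = _string32(buf, pos)
        type_name, pos = _string32(buf, pos)
        columns.append((COLUMN_KINDS.get(kind, kind), name, type_name))
    return {"table_id": table_id, "table_schema_version": version,
            "keyspace_name": keyspace, "table_name": table, "columns": columns}

def _s32(b: bytes) -> bytes:
    return struct.pack(">I", len(b)) + b

# Synthetic input: a hypothetical table ks.tbl with two columns.
sample = (
    struct.pack(">QQ", 0, 1) + struct.pack(">QQ", 0, 2)
    + _s32(b"ks") + _s32(b"tbl")
    + struct.pack(">I", 2)
    + bytes([1]) + _s32(b"pk") + _s32(b"org.apache.cassandra.db.marshal.UTF8Type")
    + bytes([4]) + _s32(b"v") + _s32(b"org.apache.cassandra.db.marshal.Int32Type")
)
schema = parse_schema(sample)
```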