scylladb/docs/dev/sstable-scylla-format.md
Benny Halevy 54ab038825 sstables: mx/writer: add large_data_type::elements_in_collection
Add a new large_data_stats type and entry for keeping
the collection_elements_count_threshold and the maximum value
of collection_elements.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-10-04 08:41:56 +03:00

# File format of the Scylla.db sstable component
The `Scylla.db` component (present in a file named like `mc-223-big-Scylla.db`)
contains assorted Scylla-only metadata. Its presence indicates the sstable was
created by Scylla (or some Scylla-aware creator). Non-Scylla consumers will ignore it.
The file is small and intended to be processed in-memory.
## Main structure
The main structure is that of an unordered set of subcomponents. Each subcomponent
is prefixed with a be32 tag that indicates its type, and its serialized size
(so unknown subcomponents can be skipped).

    scylla_db = subcomponent_count (tag serialized_size subcomponent)*
    subcomponent_count = be32
    serialized_size = be32
    tag = be32

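
The top-level framing can be sketched in Python as follows. This is an illustrative reader, not Scylla's implementation: the function name `split_subcomponents` is hypothetical, and it assumes that `serialized_size` counts only the payload bytes that follow it (not the tag or the size field itself).

```python
import struct

def split_subcomponents(data: bytes) -> dict:
    """Split a Scylla.db image into {tag: raw payload bytes}.

    Assumption: serialized_size covers only the payload that follows it.
    """
    (count,) = struct.unpack_from(">I", data, 0)   # subcomponent_count (be32)
    offset = 4
    subcomponents = {}
    for _ in range(count):
        # tag (be32) followed by serialized_size (be32)
        tag, size = struct.unpack_from(">II", data, offset)
        offset += 8
        payload = data[offset:offset + size]
        if len(payload) != size:
            raise ValueError(f"subcomponent with tag {tag} is truncated")
        subcomponents[tag] = payload
        offset += size                             # skip to the next subcomponent
    return subcomponents
```

Because every payload is length-prefixed, a reader built this way can carry unknown tags along as opaque bytes, or drop them, without understanding their contents.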
## Subcomponents and tag values
The following subcomponents are recognized. They are described in more detail
in individual sections.

    subcomponent = sharding_metadata
                 | features
                 | extension_attributes
                 | run_identifier
                 | large_data_stats
                 | sstable_origin
                 | scylla_build_id
                 | scylla_version

`sharding_metadata` (tag 1): describes what token sub-ranges are included in this
sstable. This is used, when loading the sstable, to determine which shard(s)
it occupies.

`features` (tag 2): a set of boolean flags that describe the sstable.

`extension_attributes` (tag 3): a `map<string, string>` with additional attributes.

`run_identifier` (tag 4): a uuid that is the same for all sstables in the same run
(and different for sstables in different runs).

`large_data_stats` (tag 5): a `map<large_data_type, large_data_stats_entry>` with statistics
about large data entities in the sstable.

`sstable_origin` (tag 6): a string describing the origin of the
sstable ("memtable" for memtable flush, "garbage collection" for
compaction, etc.).

`scylla_build_id` (tag 7): a string containing the build id of the
Scylla executable that created the sstable.

`scylla_version` (tag 8): a string containing the version of the
Scylla executable that created the sstable.
## sharding_metadata subcomponent
    sharding_metadata = token_range_count token_range*
    token_range_count = be32
    token_range = left_token_bound right_token_bound
    left_token_bound = token_bound
    right_token_bound = token_bound
    token_bound = exclusive_flag token
    exclusive_flag = byte    // 0=inclusive, 1=exclusive
    token = token_size byte*
    token_size = be16

Sharding metadata is a sorted list of disjoint token ranges. Each token range
consists of a left bound and a right bound; either bound may be inclusive or
exclusive. The tokens are interpreted according to the partitioner.
The sstable contains no partitions whose token is outside the ranges described by
sharding_metadata.
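
A minimal Python sketch of decoding this payload, following the grammar above. The function name `parse_sharding_metadata` is hypothetical, and token bytes are returned uninterpreted, since their meaning depends on the partitioner.

```python
import struct

def parse_sharding_metadata(payload: bytes) -> list:
    """Return a list of ((left_exclusive, left_token), (right_exclusive, right_token))."""

    def read_bound(offset):
        # exclusive_flag (1 byte) followed by token_size (be16)
        exclusive, size = struct.unpack_from(">BH", payload, offset)
        offset += 3
        token = payload[offset:offset + size]
        if len(token) != size:
            raise ValueError("truncated token")
        return (bool(exclusive), token), offset + size

    (count,) = struct.unpack_from(">I", payload, 0)   # token_range_count (be32)
    offset = 4
    ranges = []
    for _ in range(count):
        left, offset = read_bound(offset)
        right, offset = read_bound(offset)
        ranges.append((left, right))
    return ranges
```

A loader could then check each shard's token ownership against the returned ranges to decide which shards need to open the sstable.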
## features subcomponent
    features = be64    // interpreted as a set of bits

    bit 0: NonCompoundPIEntries (if set, indicates the sstable was generated by
           Scylla with issue #2993 fixed)
    bit 1: NonCompoundRangeTombstones (if set, indicates the sstable was generated by
           Scylla with issue #2986 fixed)
    bit 2: ShadowableTombstones (if set, indicates the sstable was generated by
           Scylla with issue #3885 fixed)
    bit 3: CorrectStaticCompact (if set, indicates the sstable was generated by
           Scylla with issue #4139 fixed)
    bit 4: CorrectEmptyCounters (if set, indicates the sstable was generated by
           Scylla with issue #4363 fixed)
    bit 5: CorrectUDTsInCollections (if set, indicates the sstable was generated by
           Scylla with issue #6130 fixed)

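
As an illustration, the bit set can be decoded in Python like this. The names `FEATURE_BITS` and `decode_features` are hypothetical; the bit positions are the ones listed above, with bit 0 taken to be the least significant bit of the be64 value (an assumption).

```python
import struct

# Bit position -> feature name, as listed in this section.
FEATURE_BITS = {
    0: "NonCompoundPIEntries",
    1: "NonCompoundRangeTombstones",
    2: "ShadowableTombstones",
    3: "CorrectStaticCompact",
    4: "CorrectEmptyCounters",
    5: "CorrectUDTsInCollections",
}

def decode_features(payload: bytes) -> set:
    """Return the names of the feature bits that are set in the be64 mask."""
    (mask,) = struct.unpack(">Q", payload[:8])
    return {name for bit, name in FEATURE_BITS.items() if mask & (1 << bit)}
```

Bits that are set but not present in the table are silently ignored by this sketch.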
## extension_attributes subcomponent
    extension_attributes = extension_attribute_count extension_attribute*
    extension_attribute_count = be32
    extension_attribute = extension_attribute_key extension_attribute_value
    extension_attribute_key = string32
    extension_attribute_value = string32
    string32 = string32_size byte*
    string32_size = be32

There are currently no defined attributes.
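
Even without defined attributes, the map layout is simple to read. The sketch below is a hypothetical helper (`parse_extension_attributes`) that follows the grammar above and keeps keys and values as raw bytes, since this document does not specify a text encoding.

```python
import struct

def parse_extension_attributes(payload: bytes) -> dict:
    """Return {key_bytes: value_bytes} from an extension_attributes payload."""

    def read_string32(offset):
        (size,) = struct.unpack_from(">I", payload, offset)   # string32_size (be32)
        offset += 4
        value = payload[offset:offset + size]
        if len(value) != size:
            raise ValueError("truncated string32")
        return value, offset + size

    (count,) = struct.unpack_from(">I", payload, 0)   # extension_attribute_count
    offset = 4
    attributes = {}
    for _ in range(count):
        key, offset = read_string32(offset)
        value, offset = read_string32(offset)
        attributes[key] = value
    return attributes
```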
## run_identifier subcomponent
    run_identifier = uuid
    uuid = uuid_high_bits uuid_low_bits
    uuid_high_bits = be64
    uuid_low_bits = be64

If the run_identifier subcomponent is present, the sstable is part of a run.
All sstables with the same run_identifier belong to the same run. They are
guaranteed to be disjoint (non-overlapping) in their partition keys.
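
For illustration, the two be64 halves can be combined into a standard UUID in Python. The helper name `parse_run_identifier` is hypothetical, and the sketch assumes that `uuid_high_bits` holds the most significant 64 bits of the UUID.

```python
import struct
import uuid

def parse_run_identifier(payload: bytes) -> uuid.UUID:
    """Combine uuid_high_bits and uuid_low_bits into a uuid.UUID."""
    # Assumption: high bits are the most significant half of the 128-bit value.
    high, low = struct.unpack(">QQ", payload[:16])
    return uuid.UUID(int=(high << 64) | low)
```

Sstables whose parsed identifiers compare equal can then be grouped into the same run.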
## large_data_stats subcomponent
    large_data_stats = large_data_count large_data_pair*
    large_data_count = be32
    large_data_pair = large_data_type large_data_stats_entry
    large_data_type = partition_size | row_size | cell_size | rows_in_partition | elements_in_collection
    partition_size = be32(1)            // partition size, in bytes
    row_size = be32(2)                  // row size, in bytes
    cell_size = be32(3)                 // cell size, in bytes
    rows_in_partition = be32(4)         // number of rows in a partition
    elements_in_collection = be32(5)    // number of elements in a collection
    large_data_stats_entry = max_value threshold above_threshold
    max_value = be64
    threshold = be64
    above_threshold = be32

The large_data_stats subcomponent holds statistics about partition,
row, and cell sizes, and about the number of rows in a partition and
the number of elements in a collection.
For each entry type, it keeps the largest value observed,
the corresponding large_data threshold, and the number of entities
that are above the threshold.
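
A minimal Python sketch of reading this map, following the grammar in this section. The names `LARGE_DATA_TYPES`, `LargeDataStatsEntry`, and `parse_large_data_stats` are hypothetical, and unknown type ids are kept under a placeholder name rather than rejected (a design choice of this sketch, not something the format mandates).

```python
import struct
from typing import NamedTuple

# large_data_type ids, as listed in the grammar.
LARGE_DATA_TYPES = {
    1: "partition_size",
    2: "row_size",
    3: "cell_size",
    4: "rows_in_partition",
    5: "elements_in_collection",
}

class LargeDataStatsEntry(NamedTuple):
    max_value: int        # largest value seen for this type
    threshold: int        # the large_data threshold for this type
    above_threshold: int  # number of entities above the threshold

def parse_large_data_stats(payload: bytes) -> dict:
    """Return {type_name: LargeDataStatsEntry} from a large_data_stats payload."""
    (count,) = struct.unpack_from(">I", payload, 0)   # large_data_count (be32)
    offset = 4
    stats = {}
    for _ in range(count):
        # be32 type, be64 max_value, be64 threshold, be32 above_threshold = 24 bytes
        type_id, max_value, threshold, above = struct.unpack_from(">IQQI", payload, offset)
        offset += 24
        name = LARGE_DATA_TYPES.get(type_id, f"unknown({type_id})")
        stats[name] = LargeDataStatsEntry(max_value, threshold, above)
    return stats
```

With the result in hand, a check such as `stats["row_size"].above_threshold > 0` indicates that at least one row in the sstable was above the row size threshold.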