Add a new large_data_stats type and entry for keeping the collection_elements_count_threshold and the maximum value of collection_elements. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
# File format of the Scylla.db sstable component

The `Scylla.db` component (present in a file named like `mc-223-big-Scylla.db`)
contains assorted Scylla-only metadata. Its presence indicates the sstable was
created by Scylla (or some Scylla-aware creator). Non-Scylla consumers will ignore it.

The file is small and intended to be processed in-memory.

## Main structure

The main structure is that of an unordered set of subcomponents. Each subcomponent
is prefixed with a be32 tag that indicates its type, followed by its be32 serialized
size (so unknown subcomponents can be skipped).

    scylla_db = subcomponent_count (tag serialized_size subcomponent)*
    subcomponent_count = be32
    serialized_size = be32
    tag = be32

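As an illustration of the framing above, the following is a minimal sketch of a reader that splits an in-memory `Scylla.db` image into its raw subcomponents. The function name `split_subcomponents` is hypothetical, and the sketch assumes that `serialized_size` counts only the bytes of the subcomponent body that follows it.

```python
import struct


def split_subcomponents(data: bytes) -> dict[int, bytes]:
    """Split a Scylla.db image into {tag: raw subcomponent bytes}.

    Assumption: serialized_size covers only the subcomponent body,
    not the tag or the size field itself.
    """
    (count,) = struct.unpack_from(">I", data, 0)  # subcomponent_count (be32)
    offset = 4
    subcomponents = {}
    for _ in range(count):
        # tag and serialized_size are both be32
        tag, size = struct.unpack_from(">II", data, offset)
        offset += 8
        payload = data[offset:offset + size]
        if len(payload) != size:
            raise ValueError(f"subcomponent {tag} is truncated")
        subcomponents[tag] = payload
        offset += size  # the size lets us skip bodies we do not understand
    return subcomponents
```

Because every body is returned as raw bytes keyed by tag, a caller can decode the tags it knows and leave the rest untouched.
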
## Subcomponents and tag values

The following subcomponents are recognized. They are described in more detail
in individual sections.

    subcomponent = sharding_metadata
                 | features
                 | extension_attributes
                 | run_identifier
                 | large_data_stats
                 | sstable_origin
                 | scylla_build_id
                 | scylla_version

`sharding_metadata` (tag 1): describes what token sub-ranges are included in this
sstable. This is used, when loading the sstable, to determine which shard(s)
it occupies.

`features` (tag 2): a set of boolean flags that describe the sstable

`extension_attributes` (tag 3): a `map<string, string>` with additional attributes

`run_identifier` (tag 4): a uuid that is the same for all sstables in the same run
(and different for sstables in different runs).

`large_data_stats` (tag 5): a `map<large_data_type, large_data_stats_entry>` with statistics
about large data entities in the sstable.

`sstable_origin` (tag 6): a string describing the origin of the
sstable ("memtable" for memtable flush, "garbage collection" for
compaction, etc.).

`scylla_build_id` (tag 7): a string containing the build id of the
Scylla executable that created the sstable.

`scylla_version` (tag 8): a string containing the version of the
Scylla executable that created the sstable.

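The tag values listed above can be captured in a small lookup table. The sketch below is illustrative only; the names `ScyllaTag` and `tag_name` are hypothetical and not taken from the Scylla sources.

```python
from enum import IntEnum


class ScyllaTag(IntEnum):
    """Tag values as listed in this document (hypothetical Python names)."""
    SHARDING_METADATA = 1
    FEATURES = 2
    EXTENSION_ATTRIBUTES = 3
    RUN_IDENTIFIER = 4
    LARGE_DATA_STATS = 5
    SSTABLE_ORIGIN = 6
    SCYLLA_BUILD_ID = 7
    SCYLLA_VERSION = 8


def tag_name(tag: int) -> str:
    """Return the subcomponent name for a tag, or a placeholder if unknown."""
    try:
        return ScyllaTag(tag).name.lower()
    except ValueError:
        # Unknown tags are expected: readers should skip them, not fail.
        return f"unknown({tag})"
```
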
## sharding_metadata subcomponent

    sharding_metadata = token_range_count token_range*
    token_range_count = be32
    token_range = left_token_bound right_token_bound
    left_token_bound = token_bound
    right_token_bound = token_bound
    token_bound = exclusive_flag token
    exclusive_flag = byte    // 0=inclusive, 1=exclusive
    token = token_size byte*
    token_size = be16

Sharding metadata is a sorted list of disjoint token ranges. Each token range
consists of a left bound and a right bound; either bound may be inclusive or
exclusive. The tokens are interpreted according to the partitioner.

The sstable contains no partitions whose token is outside the ranges described by
sharding_metadata.

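A minimal sketch of decoding this grammar, given the raw bytes of a `sharding_metadata` subcomponent, might look as follows. The function name `parse_sharding_metadata` is hypothetical, and tokens are kept as opaque bytes because their interpretation depends on the partitioner.

```python
import struct


def parse_sharding_metadata(payload: bytes):
    """Return a list of ((left_exclusive, left_token), (right_exclusive, right_token))."""
    (count,) = struct.unpack_from(">I", payload, 0)  # token_range_count (be32)
    offset = 4

    def read_bound():
        nonlocal offset
        # exclusive_flag is one byte, token_size is be16
        exclusive, size = struct.unpack_from(">BH", payload, offset)
        offset += 3
        token = payload[offset:offset + size]
        offset += size
        return (bool(exclusive), token)

    ranges = []
    for _ in range(count):
        left = read_bound()
        right = read_bound()
        ranges.append((left, right))
    return ranges
```
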
## features subcomponent

    features = be64 // interpreted as a set of bits

bit 0: NonCompoundPIEntries (if set, indicates the sstable was generated by
Scylla with issue #2993 fixed)

bit 1: NonCompoundRangeTombstones (if set, indicates the sstable was generated by
Scylla with issue #2986 fixed)

bit 2: ShadowableTombstones (if set, indicates the sstable was generated by
Scylla with issue #3885 fixed)

bit 3: CorrectStaticCompact (if set, indicates the sstable was generated by
Scylla with issue #4139 fixed)

bit 4: CorrectEmptyCounters (if set, indicates the sstable was generated by
Scylla with issue #4363 fixed)

bit 5: CorrectUDTsInCollections (if set, indicates the sstable was generated by
Scylla with issue #6130 fixed)

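The bit assignments above can be decoded with a simple mask test. The following sketch assumes bit 0 is the least significant bit of the be64 value; `FEATURE_BITS` and `decode_features` are hypothetical names.

```python
import struct

# Bit positions as listed in this document.
FEATURE_BITS = {
    0: "NonCompoundPIEntries",
    1: "NonCompoundRangeTombstones",
    2: "ShadowableTombstones",
    3: "CorrectStaticCompact",
    4: "CorrectEmptyCounters",
    5: "CorrectUDTsInCollections",
}


def decode_features(payload: bytes) -> set[str]:
    """Return the names of the feature bits set in a features subcomponent."""
    (bits,) = struct.unpack(">Q", payload)  # be64; assumes bit 0 is the LSB
    return {name for bit, name in FEATURE_BITS.items() if bits & (1 << bit)}
```

Bits that are set but not listed in the table are silently ignored by this sketch.
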
## extension_attributes subcomponent

    extension_attributes = extension_attribute_count extension_attribute*
    extension_attribute_count = be32
    extension_attribute = extension_attribute_key extension_attribute_value
    extension_attribute_key = string32
    extension_attribute_value = string32
    string32 = string32_size byte*
    string32_size = be32

There are currently no defined attributes.

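For illustration, a length-prefixed map like this one could be read as sketched below. The name `parse_extension_attributes` is hypothetical, and keys and values are returned as raw bytes since this document does not specify a character encoding.

```python
import struct


def parse_extension_attributes(payload: bytes) -> dict[bytes, bytes]:
    """Decode an extension_attributes subcomponent into {key: value} (raw bytes)."""
    (count,) = struct.unpack_from(">I", payload, 0)  # extension_attribute_count
    offset = 4

    def read_string32() -> bytes:
        nonlocal offset
        (size,) = struct.unpack_from(">I", payload, offset)  # string32_size (be32)
        offset += 4
        value = payload[offset:offset + size]
        offset += size
        return value

    attributes = {}
    for _ in range(count):
        key = read_string32()
        attributes[key] = read_string32()
    return attributes
```
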
## run_identifier subcomponent

    run_identifier = uuid
    uuid = uuid_high_bits uuid_low_bits
    uuid_high_bits = be64
    uuid_low_bits = be64

If the run_identifier subcomponent is present, the sstable is part of a run.
All sstables with the same run_identifier belong to the same run. They are
guaranteed to be disjoint (non-overlapping) in their partition keys.

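As a sketch, the two be64 halves can be combined into a standard 128-bit UUID value. The function name `parse_run_identifier` is hypothetical; the sketch assumes the high bits map to the most significant 64 bits of the UUID.

```python
import struct
import uuid


def parse_run_identifier(payload: bytes) -> uuid.UUID:
    """Combine uuid_high_bits and uuid_low_bits (both be64) into a UUID."""
    high, low = struct.unpack(">QQ", payload)
    # Assumption: high bits are the most significant half of the 128-bit value.
    return uuid.UUID(int=(high << 64) | low)
```
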
## large_data_stats subcomponent

    large_data_stats = large_data_count large_data_pair*
    large_data_count = be32
    large_data_pair = large_data_type large_data_stats_entry
    large_data_type = partition_size | row_size | cell_size | rows_in_partition | elements_in_collection
    partition_size = be32(1)          // partition size, in bytes
    row_size = be32(2)                // row size, in bytes
    cell_size = be32(3)               // cell size, in bytes
    rows_in_partition = be32(4)       // number of rows in a partition
    elements_in_collection = be32(5)  // number of elements in a collection
    large_data_stats_entry = max_value threshold above_threshold
    max_value = be64
    threshold = be64
    above_threshold = be32

The large_data_stats component holds statistics about partition,
row, and cell sizes, about the number of rows in a partition, and
about the number of elements in a collection.
For each entry, it keeps the largest value seen for the entry type,
the respective large_data threshold, and the number of entities
that are above the threshold.

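The grammar above could be decoded as in the following sketch. `LARGE_DATA_TYPES`, `LargeDataStatsEntry`, and `parse_large_data_stats` are hypothetical names chosen for illustration; unknown type values are kept under a placeholder key rather than rejected.

```python
import struct
from typing import NamedTuple

# large_data_type values as listed in this document.
LARGE_DATA_TYPES = {
    1: "partition_size",
    2: "row_size",
    3: "cell_size",
    4: "rows_in_partition",
    5: "elements_in_collection",
}

PAIR_FORMAT = ">IQQI"  # large_data_type, max_value, threshold, above_threshold


class LargeDataStatsEntry(NamedTuple):
    max_value: int
    threshold: int
    above_threshold: int


def parse_large_data_stats(payload: bytes) -> dict[str, LargeDataStatsEntry]:
    """Decode a large_data_stats subcomponent into {type name: entry}."""
    (count,) = struct.unpack_from(">I", payload, 0)  # large_data_count (be32)
    offset = 4
    stats = {}
    for _ in range(count):
        type_id, max_value, threshold, above = struct.unpack_from(
            PAIR_FORMAT, payload, offset
        )
        offset += struct.calcsize(PAIR_FORMAT)  # 24 bytes per pair
        name = LARGE_DATA_TYPES.get(type_id, f"unknown({type_id})")
        stats[name] = LargeDataStatsEntry(max_value, threshold, above)
    return stats
```

A caller could then, for example, compare `stats["elements_in_collection"].max_value` against its `threshold` to see whether any collection in the sstable exceeded the configured limit.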