mirror of
https://github.com/scylladb/scylladb.git
synced 2026-04-20 16:40:35 +00:00
sstables: add LargeDataRecords metadata type (tag 13)
Add a new scylla metadata component LargeDataRecords (tag 13) that stores per-SSTable top-N large data records. Each record carries: - large_data_type (partition_size, row_size, cell_size, etc.) - binary serialized partition key and clustering key - column name (for cell records) - value (size in bytes) - element count (rows or collection elements, type-dependent) - range tombstones and dead rows (partition records only) The struct uses disk_string<uint32_t> for key/name fields and is serialized via the existing describe_type framework into the SSTable Scylla metadata component. Add JSON support in scylla-sstable and format documentation.
This commit is contained in:
@@ -33,6 +33,7 @@ in individual sections
|
||||
| ext_timestamp_stats
|
||||
| schema
|
||||
| components_digests
|
||||
| large_data_records
|
||||
|
||||
`sharding_metadata` (tag 1): describes what token sub-ranges are included in this
|
||||
sstable. This is used, when loading the sstable, to determine which shard(s)
|
||||
@@ -80,6 +81,11 @@ all SSTable component files that are checksummed during write. Each entry maps a
|
||||
type (e.g., Data, Index, Filter, Statistics, etc.) to its CRC32 checksum. This allows
|
||||
verifying the integrity of individual component files.
|
||||
|
||||
`large_data_records` (tag 13): an `array<large_data_record>` with the top-N individual large
|
||||
data entries (partitions, rows, cells) found during the sstable write. Unlike `large_data_stats`
|
||||
which only stores aggregate statistics, this records the actual keys and sizes so they survive
|
||||
tablet/shard migration.
|
||||
|
||||
The [scylla sstable dump-scylla-metadata](https://github.com/scylladb/scylladb/blob/master/docs/operating-scylla/admin-tools/scylla-sstable.rst#dump-scylla-metadata) tool
|
||||
can be used to dump the scylla metadata in JSON format.
|
||||
|
||||
@@ -203,3 +209,35 @@ in the statistics component, which lacks column names and other metadata. Unlike
|
||||
the full schema stored in the system schema tables, it is not intended to be
|
||||
comprehensive, but it contains enough information for tools like scylla-sstable
|
||||
to parse an sstable in a self-sufficient manner.
|
||||
|
||||
## large_data_records subcomponent
|
||||
|
||||
large_data_records = record_count large_data_record*
|
||||
record_count = be32
|
||||
large_data_record = large_data_type partition_key clustering_key column_name value elements_count range_tombstones dead_rows
|
||||
large_data_type = be32 // same enum as in large_data_stats
|
||||
partition_key = string32 // binary serialized partition key (sstables::key::get_bytes())
|
||||
clustering_key = string32 // binary serialized clustering key (clustering_key_prefix::representation()), empty if N/A
|
||||
column_name = string32 // column name as text, empty for partition/row entries
|
||||
value = be64 // size in bytes (partition, row, or cell size depending on type)
|
||||
elements_count = be64 // type-dependent element count (see below)
|
||||
range_tombstones = be64 // number of range tombstones (partition_size records only, 0 otherwise)
|
||||
dead_rows = be64 // number of dead rows (partition_size records only, 0 otherwise)
|
||||
string32 = string32_size byte*
|
||||
string32_size = be32
|
||||
|
||||
The large_data_records component holds individual top-N large data entries
|
||||
(partitions, rows, cells) found during the sstable write. Unlike large_data_stats,
|
||||
which only stores aggregate per-type statistics (max value, threshold, count above
|
||||
threshold), large_data_records preserves the actual partition key, clustering key,
|
||||
column name, and size for each above-threshold entry. This information is embedded
|
||||
in the sstable file itself and therefore survives tablet/shard migration.
|
||||
|
||||
The elements_count field carries a type-dependent element count:
|
||||
|
||||
- For partition_size and rows_in_partition records: number of rows in the partition
|
||||
- For cell_size and elements_in_collection records: number of elements in the collection (0 for non-collection cells)
|
||||
- For row_size records: 0
|
||||
|
||||
The range_tombstones and dead_rows fields are meaningful only for
|
||||
partition_size records and are zero for all other record types.
|
||||
|
||||
Reference in New Issue
Block a user