
System Keyspace Layout

This section describes the layout and usage of the system.* tables.

System Table Categories

| Category | System Tables |
| --- | --- |
| Core Cluster Information | local, peers, scylla_local |
| Topology and Cluster Management | topology, topology_requests, tablets, tablet_sizes, cluster_status, token_ring, load_per_node, commitlog_cleanups, discovery |
| Raft Consensus | raft, raft_state, raft_snapshots, raft_snapshot_config, group0_history |
| CDC (Change Data Capture) | cdc_local, cdc_streams, cdc_streams_history, cdc_streams_state, cdc_generations_v3 |
| Materialized Views | views_builds_in_progress, scylla_views_builds_in_progress, built_views, view_build_status_v2, view_building_tasks |
| Lightweight Transactions (Paxos) | paxos |
| Compaction | compaction_history |
| Repair | repair_history, repair_tasks |
| Diagnostic and Monitoring | large_partitions, large_rows, large_cells, clients, client_routes, corrupt_data, protocol_servers, runtime_info |
| Configuration | config, versions |
| Batchlog | batchlog, batchlog_v2 |
| Truncation | truncated |
| Service Levels (QoS) | service_levels_v2 |
| Role-Based Access Control (RBAC) | roles, role_members, role_attributes, role_permissions |
| Storage | snapshots, size_estimates, sstables, sstables_registry, dicts |
| Legacy/Compatibility | IndexInfo |

system.IndexInfo

Legacy table used for secondary index metadata. This table exists for Cassandra compatibility but is largely unused in modern ScyllaDB versions.

Schema:

CREATE TABLE system."IndexInfo" (
    table_name text,
    index_name text,
    PRIMARY KEY (table_name, index_name)
) WITH CLUSTERING ORDER BY (index_name ASC);

Note: This table name must be quoted in CQL because it contains uppercase letters.
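
For example, selecting from this table requires double quotes around the name (unquoted identifiers are lowercased by CQL):

```cql
-- The mixed-case name must be double-quoted
SELECT table_name, index_name FROM system."IndexInfo";
```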


system.batchlog

Stores batches that have been written locally but not yet replicated to other nodes. Used to ensure batch atomicity across replicas.

Schema:

CREATE TABLE system.batchlog (
    id uuid PRIMARY KEY,
    data blob,
    version int,
    written_at timestamp
);

Columns:

  • id: Unique identifier for the batch
  • data: Serialized batch mutations
  • version: Protocol version used to serialize the batch
  • written_at: Timestamp when the batch was written locally

system.batchlog_v2

Stores batches that have been written locally but not yet replicated to other nodes. This is the newer version of batchlog with improved clustering and sharding. Used to ensure batch atomicity across replicas.

Schema:

CREATE TABLE system.batchlog_v2 (
    version int,
    stage tinyint,
    shard int,
    written_at timestamp,
    id uuid,
    data blob,
    PRIMARY KEY ((version, stage, shard), written_at, id)
) WITH CLUSTERING ORDER BY (written_at ASC, id ASC);

Columns:

  • version: Protocol version (part of composite partition key)
  • stage: Batch processing stage (part of composite partition key)
  • shard: Shard number (part of composite partition key)
  • written_at: Timestamp when the batch was written locally (part of composite clustering key)
  • id: Unique identifier for the batch (part of composite clustering key)
  • data: Serialized batch mutations

Related tables: batchlog


system.built_views

Tracks which materialized views have completed their initial build process.

Schema:

CREATE TABLE system.built_views (
    keyspace_name text,
    view_name text,
    PRIMARY KEY (keyspace_name, view_name)
) WITH CLUSTERING ORDER BY (view_name ASC);

Columns:

  • keyspace_name: Name of the keyspace containing the view
  • view_name: Name of the materialized view

Related tables: views_builds_in_progress, scylla_views_builds_in_progress, view_build_status_v2


system.cdc_generations_v3

Stores CDC generation data. Each CDC generation defines which streams are active during a particular time period. This is the current version of CDC generation storage, replacing earlier versions.

Schema:

CREATE TABLE system.cdc_generations_v3 (
    key text,
    id timeuuid,
    range_end bigint,
    streams frozen<set<blob>>,
    ignore_msb tinyint,
    PRIMARY KEY (key, id, range_end)
) WITH CLUSTERING ORDER BY (id ASC, range_end ASC);

Columns:

  • key: Partition key (typically "cdc_generations")
  • id: Unique identifier for this CDC generation
  • range_end: End of the token range described by this row
  • streams: Set of stream IDs mapped to this token range
  • ignore_msb: Number of most significant bits to ignore in token-to-shard mapping (sharding parameter of the node that owned this range when the generation was created)

How it works: This is a single-partition table. Each generation describes a mapping from all tokens in the token ring to stream IDs. The mapping is built from smaller range-local mappings, where each row describes how tokens in a subrange map to stream IDs. The range_end is the end of the range for this row; the start is the range_end of the previous row (or wraps around for the first row with a given id).

Related tables: cdc_local, cdc_streams, system_distributed.cdc_timestamps
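
For example, reading one generation back in clustering order yields the range-local mappings that together cover the whole ring. The generation id below is a placeholder for an actual timeuuid from this table:

```cql
-- List the token subranges and their stream sets for one generation;
-- <generation-id> is a placeholder, not literal syntax
SELECT range_end, streams, ignore_msb
FROM system.cdc_generations_v3
WHERE key = 'cdc_generations' AND id = <generation-id>;
```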


system.cdc_local

Stores local CDC state information for this node.

Schema:

CREATE TABLE system.cdc_local (
    key text PRIMARY KEY,
    streams_timestamp timestamp,
    uuid uuid
);

Columns:

  • key: Always set to "cdc_local"
  • streams_timestamp: Timestamp of the currently active streams generation on this node
  • uuid: UUID of the CDC generation (may be null for older generations that only have timestamp)

Related tables: cdc_generations_v3, cdc_streams


system.cdc_streams

Virtual table containing CDC stream mappings for tablet-based tables. Each stream corresponds to a tablet and is used for distributing CDC log entries.

Schema:

CREATE TABLE system.cdc_streams (
    keyspace_name text,
    table_name text,
    timestamp timestamp,
    stream_state tinyint,
    stream_id blob,
    PRIMARY KEY ((keyspace_name, table_name), timestamp, stream_state, stream_id)
) WITH CLUSTERING ORDER BY (timestamp ASC, stream_state ASC, stream_id ASC);

Columns:

  • keyspace_name: Name of the keyspace
  • table_name: Name of the table
  • timestamp: Timestamp identifying the CDC generation
  • stream_state: State of the stream (0=current, 1=closed, 2=opened)
  • stream_id: CDC stream identifier

Related tables: cdc_streams_history, cdc_streams_state, cdc_generations_v3


system.cdc_streams_history

Historical record of CDC stream configurations for tablet-based tables, described as differences from the previous state.

Schema:

CREATE TABLE system.cdc_streams_history (
    table_id uuid,
    timestamp timestamp,
    stream_state tinyint,
    last_token bigint,
    stream_id blob,
    PRIMARY KEY (table_id, timestamp, stream_state, last_token)
) WITH CLUSTERING ORDER BY (timestamp ASC, stream_state ASC, last_token ASC);

Columns:

  • table_id: UUID of the table
  • timestamp: Timestamp of this stream configuration change
  • stream_state: State of the stream change (e.g., added, removed)
  • last_token: Last token of the tablet
  • stream_id: CDC stream identifier for this tablet

Related tables: cdc_streams


system.cdc_streams_state

Tracks the state of CDC streams for tablet-based tables; it holds the oldest CDC stream set retained for tablets.

Schema:

CREATE TABLE system.cdc_streams_state (
    table_id uuid,
    last_token bigint,
    stream_id blob,
    timestamp timestamp STATIC,
    PRIMARY KEY (table_id, last_token)
) WITH CLUSTERING ORDER BY (last_token ASC);

Columns:

  • table_id: UUID of the table
  • last_token: Last token of the tablet
  • stream_id: CDC stream identifier for this tablet
  • timestamp: (static) Timestamp of the CDC stream set

Related tables: cdc_streams, cdc_generations_v3

Note: CDC timestamp tracking is handled by the system_distributed.cdc_timestamps table, not in the system keyspace.


system.client_routes

Stores client connection routing information for each connection ID.

Schema:

CREATE TABLE system.client_routes (
    connection_id text,
    host_id uuid,
    address text,
    port int,
    tls_port int,
    alternator_port int,
    alternator_https_port int,
    PRIMARY KEY (connection_id, host_id)
) WITH CLUSTERING ORDER BY (host_id ASC);

Columns:

  • connection_id: Unique identifier for the connection
  • host_id: UUID of the host node (clustering key)
  • address: IP address of the host
  • port: CQL port number
  • tls_port: TLS-enabled CQL port number
  • alternator_port: Alternator (DynamoDB-compatible) port number
  • alternator_https_port: HTTPS port for Alternator

system.clients

Virtual table that provides information about currently connected CQL clients.

Schema:

CREATE TABLE system.clients (
    address inet,
    port int,
    client_type text,
    connection_stage text,
    driver_name text,
    driver_version text,
    hostname text,
    protocol_version int,
    shard_id int,
    ssl_cipher_suite text,
    ssl_enabled boolean,
    ssl_protocol text,
    username text,
    scheduling_group text,
    client_options map<text, text>,
    PRIMARY KEY (address, port, client_type)
) WITH CLUSTERING ORDER BY (port ASC, client_type ASC);

Columns:

  • address: Client IP address
  • port: Client connection port
  • client_type: Type of client connection (currently only CQL clients are tracked)
  • connection_stage: Current stage of the connection
  • driver_name: Name of the client driver being used
  • driver_version: Version of the client driver
  • hostname: Client hostname if available
  • protocol_version: CQL native protocol version used
  • shard_id: ScyllaDB shard handling this connection
  • ssl_cipher_suite: SSL cipher suite if SSL is enabled
  • ssl_enabled: Whether SSL/TLS is enabled for this connection
  • ssl_protocol: SSL/TLS protocol version
  • username: Authenticated username
  • scheduling_group: Workload prioritization group
  • client_options: Map of additional client options and configuration parameters, containing all data sent by the client in the STARTUP request. Useful for debugging and monitoring.

Note: This is a virtual table. In version 4.5 and earlier, it was persisted to disk (in the data directory).
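
For example, to see which drivers are currently connected and over what kind of connection:

```cql
-- One row per connected CQL client
SELECT address, port, driver_name, driver_version, username, ssl_enabled
FROM system.clients;
```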


system.cluster_status

Virtual table containing information about the status of each endpoint in the cluster. Equivalent to the nodetool status command.

Schema:

CREATE TABLE system.cluster_status (
    peer inet PRIMARY KEY,
    dc text,
    up boolean,
    status text,
    load text,
    tokens int,
    owns float,
    host_id uuid
);

Columns:

  • peer: IP address of the node
  • dc: Data center name
  • up: Whether the node is currently up
  • status: Node status (e.g., UN for Up/Normal)
  • load: Amount of data stored on the node
  • tokens: Number of tokens owned
  • owns: Percentage of the ring owned by this node
  • host_id: Unique identifier for the node

Implemented by cluster_status_table in db/system_keyspace.cc.
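
For example, the following query produces roughly the same view as nodetool status:

```cql
SELECT peer, dc, up, status, load, tokens, owns
FROM system.cluster_status;
```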


system.commitlog_cleanups

Tracks commitlog cleanup operations, used to ensure durability guarantees are met.

Schema:

CREATE TABLE system.commitlog_cleanups (
    shard int,
    position frozen<tuple<bigint, int>>,
    table_uuid uuid,
    start_token_exclusive bigint,
    end_token_inclusive bigint,
    PRIMARY KEY (shard, position, table_uuid, start_token_exclusive, end_token_inclusive)
) WITH CLUSTERING ORDER BY (position ASC, table_uuid ASC, start_token_exclusive ASC, end_token_inclusive ASC);

Columns:

  • shard: The shard (CPU) that performed the cleanup
  • position: Position in the commitlog (tuple of segment ID and position within segment)
  • table_uuid: UUID of the table being cleaned up
  • start_token_exclusive: Start of the token range (exclusive)
  • end_token_inclusive: End of the token range (inclusive)

system.compaction_history

Records historical information about compaction operations.

Schema:

CREATE TABLE system.compaction_history (
    id uuid PRIMARY KEY,
    bytes_in bigint,
    bytes_out bigint,
    columnfamily_name text,
    started_at timestamp,
    compacted_at timestamp,
    compaction_type text,
    keyspace_name text,
    rows_merged map<int, bigint>,
    shard_id int,
    sstables_in list<frozen<sstableinfo>>,
    sstables_out list<frozen<sstableinfo>>,
    total_tombstone_purge_attempt bigint,
    total_tombstone_purge_failure_due_to_overlapping_with_memtable bigint,
    total_tombstone_purge_failure_due_to_overlapping_with_uncompacting_sstable bigint
) WITH default_time_to_live = 604800;

User-Defined Type:

CREATE TYPE system.sstableinfo (
    generation int,
    level int,
    size bigint,
    version text
);

Columns:

  • id: Unique identifier for the compaction
  • bytes_in: Total bytes read during compaction
  • bytes_out: Total bytes written during compaction
  • columnfamily_name: Table that was compacted
  • started_at: Timestamp when compaction started
  • compacted_at: Timestamp when compaction completed
  • compaction_type: Type of compaction (e.g., COMPACTION, MAJOR, RESHAPE)
  • keyspace_name: Keyspace containing the compacted table
  • rows_merged: Map showing how many rows were merged at each level
  • shard_id: Shard that performed the compaction
  • sstables_in: List of input SSTables
  • sstables_out: List of output SSTables
  • total_tombstone_purge_attempt: Number of tombstones attempted to be purged
  • total_tombstone_purge_failure_due_to_overlapping_with_memtable: Tombstones not purged due to memtable overlap
  • total_tombstone_purge_failure_due_to_overlapping_with_uncompacting_sstable: Tombstones not purged due to SSTable overlap

Note: Entries in this table have a TTL of 7 days (604800 seconds).
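
For example, recent compactions of one table can be inspected as below. Since the partition key is id, filtering on keyspace_name and columnfamily_name requires ALLOW FILTERING ('ks1' and 'standard1' are placeholder names):

```cql
SELECT started_at, compacted_at, compaction_type, bytes_in, bytes_out
FROM system.compaction_history
WHERE keyspace_name = 'ks1' AND columnfamily_name = 'standard1'
ALLOW FILTERING;
```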

system.config

Virtual table holding all configuration variables currently in use. Values can be modified at runtime using the UPDATE statement for configuration options that support live updates. Updates are not durable. Restarting the node will reset options to their previous values unless the physical configuration file has also been updated.

Schema:

CREATE TABLE system.config (
    name text PRIMARY KEY,
    source text,
    type text,
    value text
);

Columns:

  • name: Name of the configuration parameter
  • source: Origin of the configuration value, one of:
    • default: The default value (not explicitly configured)
    • config: Set via configuration file
    • cli: Set via command-line argument
    • cql: Set via updating this table
    • internal: Deliberately configured by ScyllaDB internals
  • type: Data type of the configuration parameter (e.g., string, bool, integer)
  • value: Current value in JSON format (as it would appear in the config file)

Usage:

-- View a specific setting
SELECT * FROM system.config WHERE name = 'compaction_throughput_mb_per_sec';

-- Update a live-updateable setting
UPDATE system.config SET value = '64' WHERE name = 'compaction_throughput_mb_per_sec';

Related tables: versions


system.corrupt_data

Stores data found to be corrupt during internal operations. This data cannot be written to SSTables because it would be spread around by repair and compaction and might cause failures in SSTable parsing. The table preserves corrupt data so it can be inspected and possibly restored by the database operator.

Schema:

CREATE TABLE system.corrupt_data (
    keyspace_name text,
    table_name text,
    id timeuuid,
    partition_key blob,
    clustering_key text,
    mutation_fragment_kind text,
    frozen_mutation_fragment blob,
    origin text,
    sstable_name text,
    PRIMARY KEY ((keyspace_name, table_name), id)
) WITH CLUSTERING ORDER BY (id ASC)
    AND gc_grace_seconds = 0;

Columns:

  • keyspace_name: Keyspace name of the source table
  • table_name: Table name of the source table
  • id: Unique identifier assigned when the corrupt data entry is created
  • partition_key: Partition key of the partition in the source table (can be incomplete or null due to corruption)
  • clustering_key: Clustering key of the mutation fragment (can be null for some mutation fragment kinds, or incomplete due to corruption)
  • mutation_fragment_kind: Kind of the mutation fragment, one of:
    • partition start
    • partition end
    • static row
    • clustering row
    • range tombstone change (only clustering row and range tombstone change can have clustering_key set)
  • frozen_mutation_fragment: The serialized mutation fragment itself
  • origin: The name of the process that found the corruption (e.g., sstable-writer)
  • sstable_name: The name of the SSTable that contains the corrupt data, if known (SSTable may have been compacted or deleted)
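
Because (keyspace_name, table_name) is the partition key, corrupt entries recorded for one table can be listed directly ('ks1' and 'standard1' are placeholder names):

```cql
SELECT id, mutation_fragment_kind, origin, sstable_name
FROM system.corrupt_data
WHERE keyspace_name = 'ks1' AND table_name = 'standard1';
```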

system.dicts

Stores shared compression dictionaries used for inter-node traffic compression when internode_compression_enable_advanced is enabled.

Schema:

CREATE TABLE system.dicts (
    name text PRIMARY KEY,
    timestamp timestamp,
    origin uuid,
    data blob
);

Columns:

  • name: Unique name/identifier for the dictionary
  • timestamp: When the dictionary was created
  • origin: UUID of the node that created the dictionary
  • data: Compressed dictionary data

Note: Dictionary training can leak unencrypted data to disk. See internode_compression_train_dictionaries configuration option.


system.discovery

Used during cluster discovery to find peer nodes. This table is part of the Raft-based topology management.

Schema:

CREATE TABLE system.discovery (
    key text,
    ip_addr inet,
    raft_server_id uuid,
    PRIMARY KEY (key, ip_addr)
) WITH CLUSTERING ORDER BY (ip_addr ASC);

Columns:

  • key: Partition key (typically "peers")
  • ip_addr: IP address of a discovered peer (clustering column)
  • raft_server_id: The ID of the group 0 Raft server on that peer (may be UUID 0 if unknown during discovery)

Related tables: topology


system.group0_history

Records the history of Raft Group 0 state changes. Group 0 is the central Raft group used for topology and schema coordination.

Schema:

CREATE TABLE system.group0_history (
    key text,
    state_id timeuuid,
    description text,
    PRIMARY KEY (key, state_id)
) WITH CLUSTERING ORDER BY (state_id DESC);

Columns:

  • key: Partition key (typically "history")
  • state_id: Identifier for this state change (in descending order)
  • description: Human-readable description of the state change

Related tables: raft, raft_state
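
Because state_id clusters in descending order, a LIMIT query returns the most recent state changes first (assuming the usual "history" partition key):

```cql
SELECT state_id, description
FROM system.group0_history
WHERE key = 'history'
LIMIT 10;
```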


The system.large_* tables

ScyllaDB performs better when partitions, rows, and cells are not too large. To help diagnose cases where these grow too large, ScyllaDB keeps three tables that record large partitions (including those with too many rows), rows, and cells, respectively.

An entry in any of these tables has the same general meaning: a particular SSTable contains a large partition, row, or cell, or a partition with too many rows. In particular, this implies that:

  • There is no entry until compaction aggregates enough data in a single SSTable.
  • The entry stays around until the SSTable is deleted.

The entries also have a TTL of 30 days.


system.large_cells

Tracks large cells detected in SSTables. Used to help diagnose performance issues caused by oversized cells.

Schema:

CREATE TABLE system.large_cells (
    keyspace_name text,
    table_name text,
    sstable_name text,
    cell_size bigint,
    partition_key text,
    clustering_key text,
    column_name text,
    compaction_time timestamp,
    collection_elements bigint,
    PRIMARY KEY ((keyspace_name, table_name), sstable_name, cell_size, partition_key, clustering_key, column_name)
) WITH CLUSTERING ORDER BY (sstable_name ASC, cell_size DESC, partition_key ASC, clustering_key ASC, column_name ASC);

Columns:

  • keyspace_name: Keyspace containing the large cell
  • table_name: Table containing the large cell
  • sstable_name: SSTable where the cell was detected
  • cell_size: Size of the cell in bytes
  • partition_key: Partition key of the row
  • clustering_key: Clustering key of the row
  • column_name: Name of the column containing the large cell
  • compaction_time: When the large cell was detected during compaction
  • collection_elements: Number of elements in the collection (if the cell contains a collection type)

Note: A collection is just one cell. There is no information about the size of each collection element. This table is currently only used with the MC SSTable format (issue #4868). Entries have a TTL of 30 days.

Related tables: large_partitions, large_rows

Example usage

-- View all large cells
SELECT * FROM system.large_cells;

-- View large cells for a specific table
SELECT * FROM system.large_cells WHERE keyspace_name = 'ks1' AND table_name = 'standard1';

system.large_partitions

Tracks large partitions detected in SSTables, including partitions with too many rows. ScyllaDB performs better if partitions are not too large.

Schema:

CREATE TABLE system.large_partitions (
    keyspace_name text,
    table_name text,
    sstable_name text,
    partition_size bigint,
    partition_key text,
    range_tombstones bigint,
    dead_rows bigint,
    rows bigint,
    compaction_time timestamp,
    PRIMARY KEY ((keyspace_name, table_name), sstable_name, partition_size, partition_key)
) WITH CLUSTERING ORDER BY (sstable_name ASC, partition_size DESC, partition_key ASC);

Columns:

  • keyspace_name: Keyspace containing the large partition
  • table_name: Table containing the large partition
  • sstable_name: SSTable where the partition was detected
  • partition_size: Size of the partition in bytes
  • partition_key: The partition key value
  • range_tombstones: Number of range tombstones in the partition
  • dead_rows: Number of tombstoned (deleted) rows
  • rows: Total number of rows in the partition
  • compaction_time: When the large partition was detected during compaction

Important notes:

  • There is no entry until compaction aggregates enough data in a single SSTable
  • The entry stays around until the SSTable is deleted
  • Entries have a TTL of 30 days

Related tables: large_rows, large_cells

Example usage

-- View all large partitions
SELECT * FROM system.large_partitions;

-- View large partitions for a specific table
SELECT * FROM system.large_partitions WHERE keyspace_name = 'ks1' AND table_name = 'standard1';

system.large_rows

Tracks large clustering rows and static rows detected in SSTables.

Schema:

CREATE TABLE system.large_rows (
    keyspace_name text,
    table_name text,
    sstable_name text,
    row_size bigint,
    partition_key text,
    clustering_key text,
    compaction_time timestamp,
    PRIMARY KEY ((keyspace_name, table_name), sstable_name, row_size, partition_key, clustering_key)
) WITH CLUSTERING ORDER BY (sstable_name ASC, row_size DESC, partition_key ASC, clustering_key ASC);

Columns:

  • keyspace_name: Keyspace containing the large row
  • table_name: Table containing the large row
  • sstable_name: SSTable where the row was detected
  • row_size: Size of the row in bytes
  • partition_key: Partition key of the row
  • clustering_key: Clustering key of the row (empty for static rows)
  • compaction_time: When the large row was detected during compaction

Note: This table is currently only used with the MC SSTable format (issue #4868). Entries have a TTL of 30 days.

Related tables: large_partitions, large_cells

Example usage

-- View all large rows
SELECT * FROM system.large_rows;

-- View large rows for a specific table
SELECT * FROM system.large_rows WHERE keyspace_name = 'ks1' AND table_name = 'standard1';

system.load_per_node

Virtual table containing information about the current tablet load at node granularity. It can be queried on any node, but the data comes from the group0 leader; reads wait for a group0 leader to be elected and for load balancer statistics to become available.

Schema:

CREATE TABLE system.load_per_node (
    node uuid PRIMARY KEY,
    dc text,
    effective_capacity bigint,
    rack text,
    ip inet,
    up boolean,
    tablets_allocated bigint,
    tablets_allocated_per_shard double,
    storage_capacity bigint,
    storage_allocated_load bigint,
    storage_allocated_utilization double,
    storage_load bigint,
    storage_utilization double
);

Columns:

  • node: Host ID of the node
  • dc: Data center name
  • effective_capacity: Sum of available disk space and tablet sizes on a node. Used to compute load on a node for size based balancing
  • rack: Rack name
  • ip: IP address of the node
  • up: Whether the node is alive according to gossiper
  • tablets_allocated: Number of tablet replicas on the node (migrating tablets are counted as if migration already finished)
  • tablets_allocated_per_shard: tablets_allocated divided by shard count on the node
  • storage_capacity: Total disk capacity in bytes (by default equals file system capacity)
  • storage_allocated_load: Estimated disk space for tablets based on the configured target_tablet_size (not actual current size)
  • storage_allocated_utilization: Fraction of node's disk capacity used for storage_allocated_load (1.0 means full utilization)
  • storage_load: Disk space allocated for tablets, computed with actual tablet sizes. Can be null if some of the tablet sizes are not known
  • storage_utilization: Fraction of node's disk capacity taken for storage_load (with actual tablet sizes), where 1.0 means full utilization. Can be null if some of the tablet sizes are not known

Related tables: tablets, tablet_sizes
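
For example, to compare tablet and storage load across nodes:

```cql
SELECT node, dc, rack, up, tablets_allocated, storage_utilization
FROM system.load_per_node;
```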


system.local

Stores information about the local node. This is one of the most important system tables, containing essential cluster and node metadata.

Schema:

CREATE TABLE system.local (
    key text PRIMARY KEY,
    bootstrapped text,
    broadcast_address inet,
    cluster_name text,
    cql_version text,
    data_center text,
    gossip_generation int,
    host_id uuid,
    listen_address inet,
    native_protocol_version text,
    partitioner text,
    rack text,
    release_version text,
    rpc_address inet,
    schema_version uuid,
    supported_features text,
    tokens set<text>,
    truncated_at map<uuid, blob>
);

Key Columns:

  • key: Always set to "local"
  • bootstrapped: Bootstrap status (COMPLETED, IN_PROGRESS, NEEDS_BOOTSTRAP)
  • broadcast_address: Address this node broadcasts to other nodes
  • cluster_name: Name of the cluster
  • cql_version: CQL version supported
  • data_center: Data center this node belongs to
  • host_id: Unique identifier for this node
  • native_protocol_version: Native CQL protocol version
  • partitioner: Partitioner class being used
  • rack: Rack this node belongs to
  • release_version: ScyllaDB release version
  • schema_version: Current schema version UUID
  • tokens: Set of tokens owned by this node
  • truncated_at: Legacy truncation records (see truncated for current storage)

Related tables: peers, scylla_local
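
For example, to read this node's identity and cluster metadata:

```cql
SELECT cluster_name, host_id, data_center, rack, release_version
FROM system.local
WHERE key = 'local';
```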


system.paxos

Stores state for Paxos-based lightweight transactions (LWT). Each row represents the Paxos state for a particular partition in a user table.

Schema:

CREATE TABLE system.paxos (
    row_key blob,
    cf_id uuid,
    promise timeuuid,
    most_recent_commit blob,
    most_recent_commit_at timeuuid,
    proposal blob,
    proposal_ballot timeuuid,
    PRIMARY KEY (row_key, cf_id)
) WITH CLUSTERING ORDER BY (cf_id ASC);

Columns:

  • row_key: Partition key of the user table row
  • cf_id: UUID of the user table
  • promise: Ballot of the most recent promise (previously named in_progress_ballot in older versions)
  • most_recent_commit: The most recently committed value (serialized frozen_mutation)
  • most_recent_commit_at: Ballot of the most recent commit
  • proposal: The current proposal value (serialized frozen_mutation)
  • proposal_ballot: Ballot of the current proposal

system.peers

Stores information about other nodes in the cluster as seen by this node.

Schema:

CREATE TABLE system.peers (
    peer inet PRIMARY KEY,
    data_center text,
    host_id uuid,
    preferred_ip inet,
    rack text,
    release_version text,
    rpc_address inet,
    schema_version uuid,
    supported_features text,
    tokens set<text>
);

Columns:

  • peer: IP address of the peer node (primary key)
  • data_center: Data center of the peer
  • host_id: Unique identifier for the peer node
  • preferred_ip: Preferred IP address for inter-node communication
  • rack: Rack of the peer
  • release_version: ScyllaDB version running on the peer
  • rpc_address: Address for client RPC connections
  • schema_version: Schema version on the peer
  • supported_features: Features supported by this peer
  • tokens: Set of tokens owned by this peer

Related tables: local


system.protocol_servers

Virtual table listing all client-facing data-plane protocol servers and their listen addresses. Equivalent of nodetool statusbinary plus the "Native Transport active" fields from nodetool info.

Schema:

CREATE TABLE system.protocol_servers (
    name text PRIMARY KEY,
    listen_addresses frozen<list<text>>,
    protocol text,
    protocol_version text
);

Columns:

  • name: Name/alias of the server (e.g., "native" for the CQL server)
  • listen_addresses: List of addresses this server listens on
  • protocol: Name of the protocol this server serves (e.g., "cql")
  • protocol_version: Version of the protocol this server understands

Implemented by protocol_servers_table in db/system_keyspace.cc.
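
For example:

```cql
-- Which client-facing servers are running and where they listen
SELECT name, protocol, protocol_version, listen_addresses
FROM system.protocol_servers;
```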


system.raft

Core Raft consensus state table. Holds information about Raft groups including the log entries.

Schema:

CREATE TABLE system.raft (
    group_id timeuuid,
    index bigint,
    term bigint,
    data blob,
    vote_term bigint STATIC,
    vote uuid STATIC,
    snapshot_id uuid STATIC,
    commit_idx bigint STATIC,
    PRIMARY KEY (group_id, index)
) WITH CLUSTERING ORDER BY (index ASC);

Columns:

  • group_id: Identifier for the Raft group
  • index: Log entry index
  • term: Raft term for this log entry
  • data: Serialized log entry data
  • vote_term: (static) Term of the last vote
  • vote: (static) Server ID that received the vote
  • snapshot_id: (static) ID of the current snapshot
  • commit_idx: (static) Index of the last committed entry

Related tables: raft_state, raft_snapshots, raft_snapshot_config, group0_history


system.raft_snapshot_config

Stores configuration for Raft snapshots.

Schema:

CREATE TABLE system.raft_snapshot_config (
    group_id timeuuid,
    disposition text,
    server_id uuid,
    can_vote boolean,
    PRIMARY KEY (group_id, disposition, server_id)
) WITH CLUSTERING ORDER BY (disposition ASC, server_id ASC);

Columns:

  • group_id: Identifier for the Raft group
  • disposition: Configuration disposition (can be CURRENT or PREVIOUS)
  • server_id: ID of the server in the snapshot configuration
  • can_vote: Whether this server can vote in the Raft group

Related tables: raft, raft_snapshots


system.raft_snapshots

Stores Raft snapshot data.

Schema:

CREATE TABLE system.raft_snapshots (
    group_id timeuuid PRIMARY KEY,
    snapshot_id uuid,
    idx bigint,
    term bigint
);

Columns:

  • group_id: Identifier for the Raft group
  • snapshot_id: Snapshot identifier
  • idx: Log index of the snapshot
  • term: Raft term of the snapshot

Related tables: raft, raft_snapshot_config


system.raft_state

Tracks the state of Raft groups.

Schema:

CREATE TABLE system.raft_state (
    group_id timeuuid,
    disposition text,
    server_id uuid,
    can_vote boolean,
    PRIMARY KEY (group_id, disposition, server_id)
) WITH CLUSTERING ORDER BY (disposition ASC, server_id ASC);

Columns:

  • group_id: Identifier for the Raft group
  • disposition: Current disposition/state of the configuration ("CURRENT" or "PREVIOUS")
  • server_id: ID of the server in this Raft group
  • can_vote: Whether this server can vote in this configuration

Related tables: raft, group0_history


system.repair_history

Records the history of repair operations.

Schema:

CREATE TABLE system.repair_history (
    table_uuid uuid,
    repair_time timestamp,
    repair_uuid uuid,
    range_start bigint,
    range_end bigint,
    keyspace_name text STATIC,
    table_name text STATIC,
    PRIMARY KEY (table_uuid, repair_time, repair_uuid, range_start, range_end)
) WITH CLUSTERING ORDER BY (repair_time DESC, repair_uuid ASC, range_start ASC, range_end ASC);

Columns:

  • table_uuid: UUID of the repaired table
  • repair_time: When the repair was completed
  • repair_uuid: Unique identifier for the repair operation
  • range_start: Start of the repaired token range (as bigint)
  • range_end: End of the repaired token range (as bigint)
  • keyspace_name: (static) Keyspace containing the repaired table
  • table_name: (static) Name of the repaired table

system.repair_tasks

Records tablet repair tasks and their progress.

Schema:

CREATE TABLE system.repair_tasks (
    task_uuid uuid,
    operation text,
    first_token bigint,
    last_token bigint,
    timestamp timestamp,
    table_uuid uuid STATIC,
    PRIMARY KEY (task_uuid, operation, first_token, last_token)
) WITH CLUSTERING ORDER BY (operation ASC, first_token ASC, last_token ASC);

Columns:

  • task_uuid: Unique identifier for the repair task
  • operation: Type of repair operation (clustering key)
  • first_token: First token of the tablet (clustering key)
  • last_token: Last token of the tablet (clustering key)
  • timestamp: When the repair task was recorded
  • table_uuid: (static) UUID of the table being repaired

system.role_attributes

Stores additional attributes for roles (e.g., service level associations).

Schema:

CREATE TABLE system.role_attributes (
    role text,
    name text,
    value text,
    PRIMARY KEY (role, name)
) WITH CLUSTERING ORDER BY (name ASC);

Columns:

  • role: Name of the role
  • name: Attribute name
  • value: Attribute value

Related tables: roles, role_members, role_permissions


system.role_members

Stores role membership information (which roles are members of other roles).

Schema:

CREATE TABLE system.role_members (
    role text,
    member text,
    PRIMARY KEY (role, member)
) WITH CLUSTERING ORDER BY (member ASC);

Columns:

  • role: Name of the parent role
  • member: Name of the member role

Related tables: roles, role_attributes, role_permissions


system.role_permissions

Stores permissions granted to roles.

Schema:

CREATE TABLE system.role_permissions (
    role text,
    resource text,
    permissions set<text>,
    PRIMARY KEY (role, resource)
) WITH CLUSTERING ORDER BY (resource ASC);

Columns:

  • role: Name of the role
  • resource: Resource the permissions apply to (e.g., data/keyspace/table)
  • permissions: Set of permissions granted (e.g., SELECT, MODIFY, CREATE)

Related tables: roles, role_members, role_attributes


system.roles

Stores role definitions for authentication and authorization.

Schema:

CREATE TABLE system.roles (
    role text PRIMARY KEY,
    can_login boolean,
    is_superuser boolean,
    member_of set<text>,
    salted_hash text
);

Columns:

  • role: Name of the role (primary key)
  • can_login: Whether this role can be used for authentication
  • is_superuser: Whether this role has superuser privileges
  • member_of: Set of roles this role is a member of
  • salted_hash: Hashed password for authentication

Related tables: role_members, role_permissions, role_attributes


system.runtime_info

Virtual table containing runtime-specific information such as memory stats, memtable stats, cache stats, and more. Data is grouped so that related items stay together and are easily queried. Roughly equivalent to nodetool info, nodetool gettraceprobability, and nodetool statusgossip commands.

Schema:

CREATE TABLE system.runtime_info (
    group text,
    item text,
    value text,
    PRIMARY KEY (group, item)
) WITH CLUSTERING ORDER BY (item ASC);

Columns:

  • group: Category/group of the runtime information
  • item: Name of the metric or setting
  • value: Current value

Implemented by runtime_info_table in db/system_keyspace.cc.
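
Because rows are keyed by (group, item), a full scan can be folded into a per-group mapping for easy lookup. A minimal Python sketch; the sample rows below are made up, not actual runtime_info output:

```python
from collections import defaultdict

# Fold (group, item, value) rows into {group: {item: value}}.
# The sample rows are illustrative only, not real runtime_info data.
rows = [
    ("memory", "total", "17179869184"),
    ("memory", "used", "1073741824"),
    ("gossip", "active", "true"),
]

def by_group(rows):
    grouped = defaultdict(dict)
    for group, item, value in rows:
        grouped[group][item] = value
    return dict(grouped)

info = by_group(rows)
print(info["memory"]["used"])  # 1073741824
```

Note that every value in the table is text; any numeric interpretation is up to the reader of the row.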


system.scylla_local

Stores ScyllaDB-specific local node information that extends beyond the Cassandra-compatible system.local table.

Schema:

CREATE TABLE system.scylla_local (
    key text PRIMARY KEY,
    value text
);

Columns:

  • key: Setting or property name
  • value: The value (typically JSON or plain text)

Common keys:

  • cluster_supports_upgrading_from_version: Minimum version the cluster supports upgrade from
  • enabled_features: Features enabled on this node
  • raft_group0_id: ID of the Raft Group 0
  • topology_state_machine_version: Version of the topology state machine

Related tables: local


system.scylla_views_builds_in_progress

ScyllaDB-specific table tracking materialized view build progress, extending the Cassandra-compatible version.

Schema:

CREATE TABLE system.scylla_views_builds_in_progress (
    keyspace_name text,
    view_name text,
    cpu_id int,
    next_token text,
    generation_number int,
    first_token text,
    PRIMARY KEY (keyspace_name, view_name, cpu_id)
) WITH CLUSTERING ORDER BY (view_name ASC, cpu_id ASC);

Columns:

  • keyspace_name: Keyspace containing the view
  • view_name: Name of the materialized view
  • cpu_id: Shard (CPU) that owns this portion of the build
  • next_token: Next token to process
  • generation_number: Build generation number
  • first_token: First token in the range being built

Related tables: views_builds_in_progress, built_views


system.service_levels_v2

Stores service level definitions for Quality of Service (QoS) workload prioritization.

Schema:

CREATE TABLE system.service_levels_v2 (
    service_level text PRIMARY KEY,
    timeout duration,
    workload_type text,
    shares int
);

Columns:

  • service_level: Name of the service level
  • timeout: Default timeout for operations in this service level
  • workload_type: Type of workload (e.g., batch, interactive)
  • shares: CPU shares allocated to this service level for scheduling priority

system.size_estimates

Virtual table containing size estimates for individual token ranges of each keyspace/table. Useful for capacity planning and understanding data distribution.

Schema:

CREATE TABLE system.size_estimates (
    keyspace_name text,
    table_name text,
    range_start text,
    range_end text,
    mean_partition_size bigint,
    partitions_count bigint,
    PRIMARY KEY (keyspace_name, table_name, range_start, range_end)
) WITH CLUSTERING ORDER BY (table_name ASC, range_start ASC, range_end ASC);

Columns:

  • keyspace_name: Name of the keyspace
  • table_name: Name of the table
  • range_start: Start of the token range
  • range_end: End of the token range
  • mean_partition_size: Average partition size in bytes for this range (bigint)
  • partitions_count: Estimated number of partitions in this range (bigint)

Implemented by size_estimates_mutation_reader in db/size_estimates_virtual_reader.{hh,cc}.
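
Since each row carries a mean partition size and a partition count for one token range, summing their product over all ranges gives a rough local size estimate for a table. A sketch with made-up sample values:

```python
# Rough local size estimate from size_estimates-style rows: for each
# token range, mean_partition_size * partitions_count approximates the
# bytes in that range; the sum approximates the table's local size.
# Sample values are illustrative only.
rows = [
    {"range_start": "-9223372036854775808", "range_end": "0",
     "mean_partition_size": 4096, "partitions_count": 1500},
    {"range_start": "0", "range_end": "9223372036854775807",
     "mean_partition_size": 2048, "partitions_count": 3000},
]

def estimated_table_bytes(rows):
    return sum(r["mean_partition_size"] * r["partitions_count"] for r in rows)

print(estimated_table_bytes(rows))  # 12288000
```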


system.snapshots

Virtual table listing snapshots on the node. Equivalent to the nodetool listsnapshots command.

Schema:

CREATE TABLE system.snapshots (
    keyspace_name text,
    table_name text,
    snapshot_name text,
    live bigint,
    total bigint,
    PRIMARY KEY (keyspace_name, table_name, snapshot_name)
) WITH CLUSTERING ORDER BY (table_name ASC, snapshot_name ASC);

Columns:

  • keyspace_name: Keyspace of the snapshot
  • table_name: Table of the snapshot
  • snapshot_name: Name of the snapshot
  • live: Live data size in bytes (data that isn't also in the current data files)
  • total: Total size of snapshot files in bytes

Implemented by snapshots_table in db/system_keyspace.cc.


system.sstables

The "ownership" table for non-local SSTables when using object storage (S3).

Schema:

CREATE TABLE system.sstables (
    owner uuid,
    generation timeuuid,
    status text,
    state text,
    version text,
    format text,
    PRIMARY KEY (owner, generation)
) WITH CLUSTERING ORDER BY (generation ASC);

Columns:

  • owner: UUID of the owning table
  • generation: SSTable generation identifier
  • status: Current status of the SSTable
  • state: State of the SSTable
  • version: SSTable format version
  • format: SSTable format

Note: When a user keyspace is created with S3 storage options, SSTables are put on remote object storage and the information about them is kept in this table.


system.sstables_registry

Tracks SSTable ownership information.

Schema:

CREATE TABLE system.sstables_registry (
    owner uuid,
    generation timeuuid,
    status text,
    state text,
    version text,
    format text,
    PRIMARY KEY (owner, generation)
) WITH CLUSTERING ORDER BY (generation ASC);

Columns:

  • owner: UUID of the SSTable owner
  • generation: SSTable generation identifier (clustering key)
  • status: Current status of the SSTable
  • state: State of the SSTable
  • version: SSTable format version
  • format: SSTable format

system.tablet_sizes

Virtual table containing information about current tablet disk sizes. The table can contain incomplete data, in which case missing_replicas will contain the host IDs of replicas for which the tablet size is not known. Can be queried on any node, but the data comes from the group0 leader. Reads wait for group0 leader to be elected and load balancer stats to become available.

Schema:

CREATE TABLE system.tablet_sizes (
    table_id uuid,
    last_token bigint,
    missing_replicas frozen<set<uuid>>,
    replicas frozen<map<uuid, bigint>>,
    PRIMARY KEY (table_id, last_token)
) WITH CLUSTERING ORDER BY (last_token ASC);

Columns:

  • table_id: UUID of the table
  • last_token: Last token owned by the tablet
  • missing_replicas: Set of host IDs for replicas where tablet size is unknown
  • replicas: Map of replica host IDs to disk size of the tablet replica in bytes
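
Since replicas maps host IDs to per-replica bytes for each tablet, summing across tablets approximates each node's on-disk load for the table. A small illustrative sketch (host names and sizes are made up):

```python
# Aggregate per-host disk usage from tablet_sizes-style rows: each row
# maps replica host ids to the tablet replica's size in bytes.
# Sample data is made up for illustration.
rows = [
    {"last_token": -100, "replicas": {"host-a": 500, "host-b": 520}},
    {"last_token": 2**63 - 1, "replicas": {"host-a": 480, "host-c": 510}},
]

def bytes_per_host(rows):
    totals = {}
    for row in rows:
        for host, size in row["replicas"].items():
            totals[host] = totals.get(host, 0) + size
    return totals

print(bytes_per_host(rows))  # {'host-a': 980, 'host-b': 520, 'host-c': 510}
```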

Related tables: tablets, load_per_node


system.tablets

Holds information about all tablets in the cluster. Only tables that use the tablet-based replication strategy have entries here.

Each partition (table_id) represents a tablet map of a given table.

Schema:

CREATE TABLE system.tablets (
    table_id uuid,
    last_token bigint,
    base_table uuid STATIC,
    keyspace_name text STATIC,
    repair_scheduler_config frozen<repair_scheduler_config> STATIC,
    resize_seq_number bigint STATIC,
    resize_task_info frozen<tablet_task_info> STATIC,
    resize_type text STATIC,
    table_name text STATIC,
    tablet_count int STATIC,
    migration_task_info frozen<tablet_task_info>,
    new_replicas frozen<list<frozen<tuple<uuid, int>>>>,
    repair_task_info frozen<tablet_task_info>,
    repair_time timestamp,
    replicas frozen<list<frozen<tuple<uuid, int>>>>,
    session uuid,
    stage text,
    transition text,
    sstables_repaired_at bigint,
    repair_incremental_mode text,
    PRIMARY KEY (table_id, last_token)
) WITH CLUSTERING ORDER BY (last_token ASC);

User-Defined Types:

CREATE TYPE system.repair_scheduler_config (
    auto_repair_enabled boolean,
    auto_repair_threshold bigint
);

CREATE TYPE system.tablet_task_info (
    request_type text,
    tablet_task_id uuid,
    request_time timestamp,
    sched_nr bigint,
    sched_time timestamp,
    repair_hosts_filter text,
    repair_dcs_filter text
);

Static Columns (per table):

  • tablet_count: Number of tablets in the map
  • table_name: Name of the table (provided for convenience)
  • keyspace_name: Name of the keyspace
  • base_table: Optionally set with the table_id of another table that this table is co-located with, meaning they always have the same tablet count and tablet replicas, and are migrated and resized together as a group. When base_table is set, the rest of the tablet map is empty, and the tablet map of base_table should be read instead. When base_table is NULL, this table has its own independent tablet map stored in the remaining columns.
  • resize_type: Resize decision type that spans all tablets of a given table (merge, split, or none)
  • resize_seq_number: Sequence number (>= 0) of the resize decision that globally identifies it. It's monotonically increasing, incremented by one for every new decision, so a higher value means it came later in time.
  • repair_scheduler_config: Configuration for the repair scheduler containing:
    • auto_repair_enabled: When set to true, auto repair is enabled. Disabled by default.
    • auto_repair_threshold: If the time since last repair is longer than auto_repair_threshold seconds, the tablet is eligible for auto repair.

Per-Tablet Columns:

  • last_token: Last token owned by the tablet (clustering key). The i-th tablet, where i = 0, 1, ..., tablet_count-1, owns the token range:

   (-inf, last_token(0)]            for i = 0
   (last_token(i-1), last_token(i)] for i > 0

  • replicas: List of tuples (host_id, shard_id) representing the shard-replicas of the tablet
  • new_replicas: What will be put in replicas after transition is done
  • stage: Current stage of tablet transition
  • transition: Type of transition (see below)
  • repair_time: Last time the tablet was repaired
  • sstables_repaired_at: Tablet-level reference repaired_at value. When an SSTable's on-disk repaired_at value is less than or equal to sstables_repaired_at, the SSTable is considered repaired.
  • repair_incremental_mode: Mode for incremental repair (see below)
  • repair_task_info: Metadata for the task manager containing:
    • request_type: Type of the request (user_repair or auto_repair)
    • tablet_task_id: UUID of the task
    • request_time: Time the request was created
    • sched_nr: Number of times the request has been scheduled by the repair scheduler
    • sched_time: Time the request was scheduled by the repair scheduler
    • repair_hosts_filter: Repair replicas listed in the comma-separated host_id list
    • repair_dcs_filter: Repair replicas listed in the comma-separated DC list
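
The token-range rule above can be sketched as a small helper that derives each tablet's (exclusive start, inclusive end] range from the sorted last_token values. This is an illustrative model only, with the minimum token standing in for -inf:

```python
# Derive per-tablet token ranges from sorted last_token values:
# tablet 0 owns (-inf, last_token(0)], and tablet i owns
# (last_token(i-1), last_token(i)] for i > 0.
MIN_TOKEN = -2**63  # stand-in for -inf (exclusive lower bound)

def tablet_ranges(last_tokens):
    """Return (start_exclusive, end_inclusive) for each tablet."""
    ranges = []
    prev = MIN_TOKEN
    for token in last_tokens:
        ranges.append((prev, token))
        prev = token
    return ranges

# A table with tablet_count = 3 (the tokens are made up)
print(tablet_ranges([-100, 0, 2**63 - 1]))
```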

Transition types:

  • migration: One tablet replica is moving from one shard to another
  • rebuild: New tablet replica is created from the remaining replicas
  • repair: Tablet replicas are being repaired

Incremental repair modes (repair_incremental_mode):

  • regular: The incremental repair logic is enabled. Unrepaired SSTables will be included for repair. Repaired SSTables will be skipped. The incremental repair states will be updated after repair.
  • full: The incremental repair logic is enabled. Both repaired and unrepaired SSTables will be included for repair. The incremental repair states will be updated after repair.
  • disabled: The incremental repair logic is disabled completely. The incremental repair states (e.g., repaired_at in SSTables and sstables_repaired_at in this table) will not be updated after repair.

Related tables: tablet_sizes, load_per_node, topology


system.token_ring

Virtual table containing the ring description for each keyspace. Equivalent of the nodetool describe_ring $KEYSPACE command (when filtered for a specific keyspace). Overlaps with the output of nodetool ring.

Schema:

CREATE TABLE system.token_ring (
    keyspace_name text,
    table_name text,
    start_token text,
    endpoint inet,
    end_token text,
    dc text,
    rack text,
    PRIMARY KEY (keyspace_name, table_name, start_token, endpoint)
) WITH CLUSTERING ORDER BY (table_name ASC, start_token ASC, endpoint ASC);

Columns:

  • keyspace_name: Name of the keyspace
  • table_name: Name of the table
  • start_token: Start of the token range
  • endpoint: IP address of the node owning this range
  • end_token: End of the token range
  • dc: Data center of the endpoint
  • rack: Rack of the endpoint

Implemented by token_ring_table in db/system_keyspace.cc.


system.topology

Stores cluster topology state managed by Raft. This is the central table for Raft-based topology management.

Schema:

CREATE TABLE system.topology (
    key text,
    host_id uuid,
    cleanup_status text,
    datacenter text,
    ignore_msb int,
    node_state text,
    num_tokens int,
    rack text,
    rebuild_option text,
    release_version text,
    replaced_id uuid,
    request_id timeuuid,
    shard_count int,
    supported_features frozen<set<text>>,
    tokens frozen<set<text>>,
    tokens_string text,
    topology_request text,
    committed_cdc_generations frozen<set<tuple<bigint, timeuuid>>> STATIC,
    enabled_features frozen<set<text>> STATIC,
    fence_version bigint STATIC,
    global_requests frozen<set<timeuuid>> STATIC,
    global_topology_request text STATIC,
    global_topology_request_id timeuuid STATIC,
    ignore_nodes frozen<set<uuid>> STATIC,
    new_cdc_generation_data_uuid timeuuid STATIC,
    new_keyspace_rf_change_data frozen<map<text, text>> STATIC,
    new_keyspace_rf_change_ks_name text STATIC,
    paused_rf_change_requests frozen<set<timeuuid>> STATIC,
    session uuid STATIC,
    tablet_balancing_enabled boolean STATIC,
    transition_state text STATIC,
    unpublished_cdc_generations frozen<set<tuple<bigint, timeuuid>>> STATIC,
    upgrade_state text STATIC,
    version bigint STATIC,
    PRIMARY KEY (key, host_id)
) WITH CLUSTERING ORDER BY (host_id ASC);

Partition Key:

  • key: Partition key (typically "topology")

Clustering Column:

  • host_id: Host ID of the node (each row represents one node)

Per-Node Regular Columns:

  • datacenter: Data center of the node
  • rack: Rack of the node
  • tokens: Set of token strings owned by the node
  • node_state: Current state of the node in topology
  • release_version: ScyllaDB version running on the node
  • topology_request: Current topology request for this node
  • replaced_id: If this node is replacing another, the replaced node's host ID
  • rebuild_option: Rebuild configuration for the node
  • num_tokens: Number of tokens for the node
  • tokens_string: Alternative representation of tokens
  • shard_count: Number of shards on the node
  • ignore_msb: MSB bits to ignore for token calculation
  • cleanup_status: Status of cleanup operations
  • supported_features: Features supported by this node
  • request_id: ID of the current topology request for this node

Static Columns (cluster-wide state):

  • ignore_nodes: Set of node IDs to ignore
  • new_cdc_generation_data_uuid: UUID for new CDC generation data
  • new_keyspace_rf_change_ks_name: (deprecated) Keyspace name for RF change
  • new_keyspace_rf_change_data: (deprecated) RF change data
  • version: Topology version number
  • fence_version: Version used for fencing operations
  • transition_state: Current state of topology transition
  • committed_cdc_generations: Set of committed CDC generations (generation timestamp, ID)
  • unpublished_cdc_generations: Set of unpublished CDC generations
  • global_topology_request: Current global topology request
  • global_topology_request_id: ID of the global topology request
  • enabled_features: Features enabled across the cluster
  • session: Current topology session ID
  • tablet_balancing_enabled: Whether tablet load balancing is enabled
  • upgrade_state: State of cluster upgrade process
  • global_requests: Set of global request IDs
  • paused_rf_change_requests: Set of paused replication factor change requests

Related tables: topology_requests, tablets, discovery


system.topology_requests

Records topology change requests and their completion status. The topology coordinator processes pending requests from this table and records the outcome here.

Schema:

CREATE TABLE system.topology_requests (
    id timeuuid PRIMARY KEY,
    initiating_host uuid,
    request_type text,
    start_time timestamp,
    done boolean,
    error text,
    end_time timestamp,
    truncate_table_id uuid,
    new_keyspace_rf_change_ks_name text,
    new_keyspace_rf_change_data map<text, text>
);

Columns:

  • id: Unique identifier for the request
  • initiating_host: Host ID of the node that initiated the request
  • request_type: Type of topology request (e.g., join, leave, remove, replace)
  • start_time: When the request was started
  • done: Whether the request has been completed
  • error: Error message if the request failed
  • end_time: When the request completed
  • truncate_table_id: Table ID for truncate operations
  • new_keyspace_rf_change_ks_name: Keyspace name for replication factor changes
  • new_keyspace_rf_change_data: Data for replication factor changes

Related tables: topology


system.truncated

Holds truncation replay positions per table and shard. When a table is truncated, SSTables are removed and the current replay position for each shard is recorded here. During commitlog replay (in case of a crash), this data is used to filter mutations and ensure truncated data is not resurrected.

Schema:

CREATE TABLE system.truncated (
    table_uuid uuid,
    shard int,
    position int,
    segment_id bigint,
    truncated_at timestamp STATIC,
    PRIMARY KEY (table_uuid, shard)
) WITH CLUSTERING ORDER BY (shard ASC);

Columns:

  • table_uuid: ID of the truncated table
  • shard: Shard number
  • position: Replay position within the segment
  • segment_id: Commitlog segment ID
  • truncated_at: (static) Timestamp when the truncation occurred

Note: Prior to this table being added, truncation records were kept in the truncated_at map column in system.local. When booting up, ScyllaDB merges data from the legacy store with data from this table. Until the whole cluster agrees on the feature TRUNCATION_TABLE, truncation will write both new and legacy records. When the feature is agreed upon, the legacy map is removed.
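
The replay-time filtering can be modeled as comparing each mutation's replay position against the recorded truncation point, ordering first by segment ID and then by position within the segment. A simplified illustration (names and values are hypothetical, not the actual ScyllaDB implementation):

```python
# Simplified model of truncation-aware commitlog replay: a mutation is
# replayed only if its replay position is strictly after the recorded
# truncation point for its (table, shard). All identifiers below are
# made up for illustration.
truncation_points = {
    # (table_uuid, shard) -> (segment_id, position)
    ("tbl-1", 0): (42, 1000),
}

def should_replay(table_uuid, shard, segment_id, position):
    point = truncation_points.get((table_uuid, shard))
    if point is None:
        return True  # table was never truncated on this shard
    # tuple comparison orders by segment id first, then offset within it
    return (segment_id, position) > point

print(should_replay("tbl-1", 0, 42, 999))  # False: before the truncation point
print(should_replay("tbl-1", 0, 43, 0))    # True: a later segment
```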


system.versions

Virtual table containing all version-related information. Equivalent of the nodetool version command, but contains more detailed version information.

Schema:

CREATE TABLE system.versions (
    key text PRIMARY KEY,
    version text,
    build_mode text,
    build_id text
);

Columns:

  • key: Version category identifier (typically "local")
  • version: Full version string
  • build_mode: Build mode (e.g., release, debug, dev)
  • build_id: Build identifier/hash

Implemented by versions_table in db/system_keyspace.cc.

Related tables: config


system.view_build_status_v2

Tracks the build status of materialized views across the cluster.

Schema:

CREATE TABLE system.view_build_status_v2 (
    keyspace_name text,
    view_name text,
    host_id uuid,
    status text,
    PRIMARY KEY ((keyspace_name, view_name), host_id)
) WITH CLUSTERING ORDER BY (host_id ASC);

Columns:

  • keyspace_name: Keyspace containing the view
  • view_name: Name of the materialized view
  • host_id: Host ID of the node reporting status
  • status: Build status (e.g., STARTED, SUCCESS)

Related tables: built_views, views_builds_in_progress


system.view_building_tasks

Tracks tasks related to building materialized views.

Schema:

CREATE TABLE system.view_building_tasks (
    key text,
    id timeuuid,
    type text,
    aborted boolean,
    base_id uuid,
    view_id uuid,
    last_token bigint,
    host_id uuid,
    shard int,
    PRIMARY KEY (key, id)
) WITH CLUSTERING ORDER BY (id ASC);

Columns:

  • key: Partition key for grouping tasks
  • id: Unique identifier for the build task (timeuuid)
  • type: Type of the build task
  • aborted: Whether the task has been aborted
  • base_id: UUID of the base table
  • view_id: UUID of the materialized view
  • last_token: Last token processed
  • host_id: Host ID of the node handling this task
  • shard: Shard ID handling this task

Related tables: views_builds_in_progress


system.views_builds_in_progress

Tracks materialized view builds that are currently in progress. This is the Cassandra-compatible version.

Schema:

CREATE TABLE system.views_builds_in_progress (
    keyspace_name text,
    view_name text,
    generation_number int,
    last_token text,
    PRIMARY KEY (keyspace_name, view_name)
) WITH CLUSTERING ORDER BY (view_name ASC);

Columns:

  • keyspace_name: Keyspace containing the view
  • view_name: Name of the materialized view
  • generation_number: Build generation number
  • last_token: Last token that was processed

Related tables: scylla_views_builds_in_progress, built_views, view_build_status_v2


Virtual Tables

Virtual tables behave just like regular tables from the user's point of view; the difference lies in how they are implemented. While regular tables are backed by memtables, a commitlog, and SSTables, virtual tables translate some in-memory structure into CQL result format.

Tables described above as virtual include runtime_info, size_estimates, snapshots, tablet_sizes, token_ring, and versions.

For more details see virtual-tables.md.


Notes

  1. Read-Only Tables: Most system tables should be considered read-only. Direct modification can cause cluster instability.

  2. Internal Use: Many tables (especially Raft, CDC, and topology tables) are for internal ScyllaDB use. Modifying them directly is not recommended.

  3. Monitoring: Tables like large_partitions, large_rows, and large_cells are useful for performance monitoring and troubleshooting.

  4. Version Suffixes: Tables with _v2 or _v3 suffixes represent schema evolution across ScyllaDB versions.

  5. TTL on Diagnostic Tables: The large_* tables have entries with a 30-day TTL.

Verification

To verify table schemas in your cluster:

-- Describe a specific table
DESCRIBE TABLE system.<table_name>;

-- List all tables in the system keyspace
SELECT * FROM system_schema.tables WHERE keyspace_name = 'system';

-- Get column details for a specific table
SELECT * FROM system_schema.columns 
WHERE keyspace_name = 'system' AND table_name = '<table_name>';