scylladb/docs/dev/system_schema_keyspace.md
Tomasz Grabiec ba692d1805 schema_tables: Keep "replication" column backwards-compatible by expanding rack lists to numeric RF
In 380f243986 we added support for rack
lists in replication options. Drivers which are not prepared to parse
that (as of now, all of them), will not create metadata object for
that keyspace. This breaks, for example, the "copy to/from" cqlsh
command. Potentially other things too.

To fix that, keep the "replication" column in the old format, and
store numeric RF there, which corresponds to the number of
replicas. Accurate options in the new format are put in
"replication_v2".

We set replication_v2 in the schema only when it differs from the old
"replication" so that the new column is not set during upgrade,
otherwise downgrade would fail. Partition tombstone is added to ensure
that pre-alter replication_v2 value is deleted on alters which change
replication to a value which is the same as the post-alter
"replication" value.

Fixes #26415

Closes scylladb/scylladb#26429
2025-10-21 09:11:25 +03:00


System schema keyspace layout

This section describes layouts and usage of system_schema.* tables.

system_schema.keyspaces

This table contains one row per keyspace.

Schema:

CREATE TABLE system_schema.keyspaces (
    keyspace_name text PRIMARY KEY,
    durable_writes boolean,
    replication frozen<map<text, text>>,
    replication_v2 frozen<map<text, text>>
)

Columns:

  • keyspace_name - name of the keyspace

  • durable_writes - whether writes to the keyspace use the commitlog.

  • replication - Deprecated replication settings for the keyspace. Contains the same options as the replication_v2 column, but rack lists are replaced with replica count (numeric RF) for backwards compatibility with drivers which don't recognize rack lists here.

  • replication_v2 - replication settings for the keyspace. This column is set only when its value differs from the replication column. The value for the "class" key determines the replication strategy name. The structure of the other options depends on the replication strategy.

    For NetworkTopologyStrategy the other options specify replication factors for datacenters, stored as a flattened map of the extended options map (see below).

    For SimpleStrategy there is a single option "replication_factor" specifying the replication factor.

Extended options map used by NetworkTopologyStrategy is a map where values can be either strings or lists of strings.

For example:

   {
      'dc1': '3',
      'dc2': ['rack1', 'rack2'],
      'dc3': []
   }

The options above mean that the replication factor for datacenter dc1 is 3, and for datacenter dc2 it is 2, with replicas placed on racks rack1 and rack2. For datacenter dc3 the replication factor is 0, expressed as an empty list of racks.
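The relationship between an extended options map and the numeric replication factors kept for backwards compatibility can be sketched in Python. This is only an illustration based on the description above, not ScyllaDB's implementation; the function name numeric_rf is hypothetical, and it assumes one replica per listed rack.

```python
def numeric_rf(extended_options):
    """Replace each rack list with its length; keep string values as-is.

    Assumption: a rack list implies one replica per listed rack, so the
    numeric RF for that datacenter is the number of racks in the list.
    """
    result = {}
    for dc, value in extended_options.items():
        if isinstance(value, list):
            result[dc] = str(len(value))
        else:
            result[dc] = value
    return result


extended = {"dc1": "3", "dc2": ["rack1", "rack2"], "dc3": []}
print(numeric_rf(extended))  # {'dc1': '3', 'dc2': '2', 'dc3': '0'}
```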

The extended map is stored in the "replication_v2" column in a flattened form: each value which is a list is represented as multiple entries in the map, with the list index appended to the key, using : as the separator. A negative index indicates that the list is empty.

The example extended options map from above has a flattened representation of:

  {
    'dc1': '3',
    'dc2:0': 'rack1',
    'dc2:1': 'rack2',
    'dc3:-1': ''
  }
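The flattening rule can be illustrated with a short Python sketch. The helper names flatten and unflatten are hypothetical and this is not ScyllaDB's code; the sketch assumes that option keys themselves never contain the : separator and that any negative index marks an empty list.

```python
def flatten(extended_options):
    """Turn list values into 'key:index' entries; keep strings unchanged."""
    flat = {}
    for key, value in extended_options.items():
        if isinstance(value, list):
            if not value:
                flat[f"{key}:-1"] = ""  # negative index marks an empty list
            for i, item in enumerate(value):
                flat[f"{key}:{i}"] = item
        else:
            flat[key] = value
    return flat


def unflatten(flat):
    """Rebuild the extended options map from its flattened form."""
    extended = {}
    indexed = {}  # option name -> {list index: item}
    for key, value in flat.items():
        name, sep, index = key.rpartition(":")
        if not sep:
            extended[key] = value  # plain string option
            continue
        items = indexed.setdefault(name, {})
        if int(index) >= 0:
            items[int(index)] = value  # negative index adds no item
    for name, items in indexed.items():
        extended[name] = [items[i] for i in sorted(items)]
    return extended


extended = {"dc1": "3", "dc2": ["rack1", "rack2"], "dc3": []}
assert unflatten(flatten(extended)) == extended
```

Sorting by index on the way back keeps rack order stable even if the map entries are read in a different order.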

system_schema.computed_columns

Computed columns are a special kind of column. Rather than having their value provided directly by the user, they are computed, possibly from other column values. Examples of such computed columns could be:

  • token column generated from the base partition key for secondary indexes
  • map value column, generated as the extraction of a single value from a map stored in a different column

Computed columns in many ways act as regular columns, so they are also present in the system_schema.columns table - system_schema.computed_columns is an additional mapping that marks the column as computed and provides its computation.

Schema:

CREATE TABLE system_schema.computed_columns (
    keyspace_name text,
    table_name text,
    column_name text,
    computation blob,
    PRIMARY KEY (keyspace_name, table_name, column_name)
) WITH CLUSTERING ORDER BY (table_name ASC, column_name ASC);

computation is stored as a blob, and its contents are assumed to be a JSON representation of the computation's type and any custom fields it needs.

Example representations:

{"type":"token"}

{"type":"map_value","map":"my_map_column_name","key":"AF$^GESHHgge6yhf"}

The token computation does not need additional arguments, as it returns the token of the base table's partition key. To compute a map value, the computation additionally needs the name of the column that stores the map and the key at which the value is expected.
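As a rough illustration, a reader of this table could decode the computation blob as follows. This is a sketch under assumptions: it presumes the blob holds UTF-8 encoded standard JSON (double-quoted strings) with the fields named above, and describe_computation is a hypothetical helper, not part of ScyllaDB or any driver.

```python
import json


def describe_computation(blob: bytes) -> str:
    """Decode a computation blob and summarize what it computes."""
    spec = json.loads(blob.decode("utf-8"))
    kind = spec["type"]
    if kind == "token":
        # No extra fields: the token is derived from the base partition key.
        return "token of the base partition key"
    if kind == "map_value":
        # Needs the map column name and the key to extract.
        return f"value of map column {spec['map']!r} at key {spec['key']!r}"
    raise ValueError(f"unknown computation type: {kind}")


print(describe_computation(b'{"type":"token"}'))
print(describe_computation(
    b'{"type":"map_value","map":"my_map_column_name","key":"AF$^GESHHgge6yhf"}'
))
```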