scylladb/docs/design-notes/secondary_index.md
Nadav Har'El f76f6dbccb secondary index: avoid special characters in default index names
In CQL, table names are limited to so-called word characters (letters,
numbers and underscores), but column names don't have such a limitation.
When we create a secondary index, its default name is constructed from
the column name - so can contain problematic characters. It can include
even the "/" character. The problem is that the index name is then used,
like a table name, to create a directory with that name.

The test included in this patch demonstrates that before this patch, this
can be misused to create subdirectories anywhere in the filesystem, or to
crash Scylla when it fails to create a directory (which it considers an
unrecoverable I/O error).

In this patch we do what Cassandra does - remove all non-word
characters from the indexed column name before constructing the default
index name. In the included test - which can run on both Scylla and
Cassandra - we verify that the constructed index name is the same as
in Cassandra, which is useful to know (e.g., because knowing the index
name is needed to DROP the index).

Also, this patch adds a second line of defense against the security problem
described above: It is now an error to create a schema with a slash or
null (the two characters not allowed in Unix filenames) in the keyspace
or table names. So if the first line of defense (CQL checking the validity
of its commands) fails, we'll have that second line of defense. I verified
that if I revert the default-index-name fix, the second line of defense
kicks in, and the index creation is aborted and cannot create files in
the wrong place to crash Scylla.

Fixes #3403

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220320162543.3091121-1-nyh@scylladb.com>
2022-03-20 18:33:48 +02:00


Secondary indexes in Scylla

Secondary indexes can currently be either global (the default) or local. A global index uses the indexed column as its partition key, while a local index shares its partition key with its base table, which ensures that lookups stay local to the base partition.

The distinction is stored in the index target, a string kept in the index's options map under the key "target". Example of a global and a local index on the same table and column:

SELECT * FROM system_schema.indexes;
 keyspace_name | table_name | index_name | kind       | options
---------------+------------+------------+------------+----------------------------------------
        demodb |          t | local_t_v1 | COMPOSITES | {'target': '{"pk":["p"],"ck":["v1"]}'}
        demodb |          t |   t_v1_idx | COMPOSITES | {'target': 'v1'}

Default naming

By default, an index name is generated from the table name, the column name and an "_idx" suffix. Because index names, like table names, must consist of word characters (letters, digits, or underscores), any characters in the column name that aren't word characters are dropped before constructing the index name.

If the name is taken (e.g. because somebody already created a named index with the exact same name), "_X" is appended, where X is the smallest number that ensures name uniqueness.

The default name for an index created on table t and column v1 is thus t_v1_idx, but it becomes t_v1_idx_1 if t_v1_idx is already taken. Global and local indexes share the same default naming convention.

When in doubt, use the DESCRIBE index_name or SELECT * FROM system_schema.indexes commands to see more details about an index's target and type.
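The naming rule above can be sketched in a few lines of Python. This is only an illustration, not Scylla's implementation: the function name default_index_name and the taken parameter are hypothetical, and the sketch assumes that the "_X" suffix starts counting at 1, as the t_v1_idx_1 example suggests.

```python
import re

def default_index_name(table, column, taken=()):
    """Illustrative sketch of the default naming rule (not Scylla's code)."""
    # Drop every character that is not a letter, digit, or underscore.
    cleaned = re.sub(r"[^A-Za-z0-9_]", "", column)
    base = f"{table}_{cleaned}_idx"
    if base not in taken:
        return base
    # Assumption: the numeric suffix starts at 1 and grows until the name is free.
    n = 1
    while f"{base}_{n}" in taken:
        n += 1
    return f"{base}_{n}"

print(default_index_name("t", "v1"))
print(default_index_name("t", "v1", taken={"t_v1_idx"}))
print(default_index_name("t", "a/b c"))
```

The last call shows why the stripping matters: a column named "a/b c" yields an index name without the slash or the space, so it is safe to use as a directory name.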

Global index

A global index's target is usually just the indexed column name, unless the index has a specific type. The supported types are:

  • regular index: v
  • full collection index: FULL(v)
  • index on map keys: KEYS(v)
  • index on map entries (key-value pairs): ENTRIES(v)

Their serialization is just their string representation, so "v", "FULL(v)", "KEYS(v)" and "ENTRIES(v)" are all valid targets.
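As a rough illustration of how such a target string could be taken apart, the Python sketch below splits a global target into an index type and a column name. The function parse_global_target and the label "regular" are hypothetical names chosen for this example, and the sketch assumes the type wrapper appears exactly as written above (upper case, no quoting of the column name); the real parser in Scylla may handle more cases.

```python
import re

# Matches the three wrapped forms listed above: FULL(v), KEYS(v), ENTRIES(v).
_WRAPPED = re.compile(r"^(FULL|KEYS|ENTRIES)\((.*)\)$", re.DOTALL)

def parse_global_target(target):
    """Return (index_type, column) for a global index target string."""
    m = _WRAPPED.match(target)
    if m:
        return m.group(1).lower(), m.group(2)
    # No wrapper: the whole string is the indexed column name.
    return "regular", target

for t in ["v", "FULL(v)", "KEYS(v)", "ENTRIES(v)"]:
    print(t, "->", parse_global_target(t))
```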

Local index

A local index's target consists of an explicit partition key followed by the indexed column definition. Currently the partition key must match the partition key of the base table.

Its serialization is a string representing the index's primary key in JSON. Examples:

{ "pk": ["p1", "p2", "p3"], "ck": ["v"] }

{ "pk": ["p"], "ck": ["v"] }
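Because a local target is JSON while a global target is a plain string, the two can be told apart by attempting to parse the target. The Python sketch below shows one way to build and recognize a local target; local_target and is_local_target are hypothetical helper names, and the compact separator style is an assumption taken from the system_schema.indexes output shown earlier.

```python
import json

def local_target(pk, ck):
    """Serialize a local index target as compact JSON with "pk" and "ck" keys."""
    return json.dumps({"pk": list(pk), "ck": list(ck)}, separators=(",", ":"))

def is_local_target(target):
    """Return True if the target string is a JSON object with "pk" and "ck"."""
    try:
        parsed = json.loads(target)
    except json.JSONDecodeError:
        # Plain column names and wrapped forms such as KEYS(v) are not JSON.
        return False
    return isinstance(parsed, dict) and "pk" in parsed and "ck" in parsed

print(local_target(["p"], ["v1"]))
print(is_local_target('{"pk":["p"],"ck":["v1"]}'))
print(is_local_target("v1"))
```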