Merge 'Docs: document how scylla-sstable obtains its schema' from Botond Dénes

This is a very important aspect of the tool that was completely missing from the document before. Also add a comparison with SStableDump.
Fixes: https://github.com/scylladb/scylladb/issues/11363

Closes #11390

* github.com:scylladb/scylladb:
  docs: scylla-sstable.rst: add comparison with SStableDump
  docs: scylla-sstable.rst: add section about providing the schema
This commit is contained in:
Avi Kivity
2022-08-31 14:28:52 +03:00
2 changed files with 83 additions and 0 deletions

View File

@@ -11,6 +11,16 @@ generating a histogram, validating the content of SStables, and more. See `Suppo
Run ``scylla-sstable --help`` for additional information about the tool and the operations.
This tool is similar to SStableDump_, with notable differences:
* Built on the ScyllaDB C++ codebase, it supports all SStable formats and components that ScyllaDB supports.
* Expanded scope: this tool supports much more than dumping SStable data components (see `Supported Operations`_).
* More flexible on how schema is obtained and where SStables are located: SStableDump_ only supports dumping SStables located in their native data directory. To dump an SStable, one has to clone the entire ScyllaDB data directory tree, including system table directories and even config files. scylla-sstable can dump sstables from any path with multiple choices on how to obtain the schema, see Schema_.
Currently, SStableDump_ works better on production systems as it automatically loads the schema from the system tables, unlike scylla-sstable, which has to be provided with the schema explicitly. On the other hand scylla-sstable works better for off-line investigations, as it can be used with as little as just a schema definition file and a single sstable. In the future we plan on closing this gap -- adding support for automatic schema-loading for scylla-sstable too -- and completely supplant SStableDump_ with scylla-sstable.
.. _SStableDump: /operating-scylla/admin-tools/sstabledump
Usage
------
@@ -26,6 +36,77 @@ The command syntax is as follows:
You can specify more than one SStable.
Schema
^^^^^^
All operations need a schema to interpret the SStables with.
Currently, there are two ways to obtain the schema:
* ``--schema-file FILENAME`` - Read the schema definition from a file.
* ``--system-schema KEYSPACE.TABLE`` - Use the known definition of built-in tables (only works for system tables).
By default, the tool uses the first method: ``--schema-file schema.cql``; i.e. it assumes there is a schema file named ``schema.cql`` in the working directory.
If this fails, it will exit with an error.
The schema file should contain all definitions needed to interpret data belonging to the table.
Example ``schema.cql``:
.. code-block:: cql
CREATE KEYSPACE ks WITH replication = {'class': 'NetworkTopologyStrategy', 'mydc1': 1, 'mydc2': 4};
CREATE TYPE ks.mytype (
f1 int,
f2 text
);
CREATE TABLE ks.cf (
pk int,
ck text,
v1 int,
v2 mytype,
PRIMARY KEY (pk, ck)
);
Note:
* In addition to the table itself, the definition also has to includes any user defined types the table uses.
* The keyspace definition is optional, if missing one will be auto-generated.
* The schema file doesn't have to be called ``schema.cql``, this is just the default name. Any file name is supported (with any extension).
Dropped columns
***************
The examined sstable might have columns which were dropped from the schema definition. In this case providing the up-do-date schema will not be enough, the tool will fail when attempting to process a cell for the dropped column.
Dropped columns can be provided to the tool in the form of insert statements into the ``system_schema.dropped_columns`` system table, in the schema definition file. Example:
.. code-block:: cql
INSERT INTO system_schema.dropped_columns (
keyspace_name,
table_name,
column_name,
dropped_time,
type
) VALUES (
'ks',
'cf',
'v1',
1631011979170675,
'int'
);
CREATE TABLE ks.cf (pk int PRIMARY KEY, v2 int);
System tables
*************
If the examined table is a system table -- it belongs to one of the system keyspaces (``system``, ``system_schema``, ``system_distributed`` or ``system_distributed_everywhere``) -- you can just tell the tool to use the known built-in definition of said table. This is possible with the ``--system-schema`` flag. Example:
.. code-block:: console
scylla-sstable dump-data --system-schema system.local ./path/to/md-123456-big-Data.db
Supported Operations
^^^^^^^^^^^^^^^^^^^^^^^
The ``dump-*`` operations output JSON. For ``dump-data``, you can specify another output format.

View File

@@ -4,8 +4,10 @@ SSTabledump
This tool allows you to converts SSTable into a JSON format file.
SSTabledump supported when using Scylla 3.0, Scylla Enterprise 2019.1, and newer versions.
In older versions, the tool is named SSTable2json_.
If you need more flexibility or want to dump more than just the data-component, see scylla-sstable_.
.. _SSTable2json: /operating-scylla/admin-tools/sstable2json
.. _scylla-sstable: /operating-scylla/admin-tools/scylla-sstable
Use the full path to the data file when executing the command.