From 65da6a26a392e9f31a22dab165c226998b481465 Mon Sep 17 00:00:00 2001
From: Botond Dénes
Date: Fri, 26 Aug 2022 09:19:55 +0300
Subject: [PATCH 1/2] docs: scylla-sstable.rst: add section about providing the schema

Providing the schema for the scylla-sstable tool is an important topic
that was completely missing from the description so far.
---
 .../admin-tools/scylla-sstable.rst | 71 +++++++++++++++++++
 1 file changed, 71 insertions(+)

diff --git a/docs/operating-scylla/admin-tools/scylla-sstable.rst b/docs/operating-scylla/admin-tools/scylla-sstable.rst
index 6b4f32c3b4..0ca1cf9986 100644
--- a/docs/operating-scylla/admin-tools/scylla-sstable.rst
+++ b/docs/operating-scylla/admin-tools/scylla-sstable.rst
@@ -26,6 +26,77 @@ The command syntax is as follows:
 
 You can specify more than one SStable.
 
+Schema
+^^^^^^
+All operations need a schema to interpret the SStables.
+Currently, there are two ways to obtain the schema:
+
+* ``--schema-file FILENAME`` - Read the schema definition from a file.
+* ``--system-schema KEYSPACE.TABLE`` - Use the known definition of built-in tables (only works for system tables).
+
+By default, the tool uses the first method, ``--schema-file schema.cql``; that is, it assumes there is a schema file named ``schema.cql`` in the working directory.
+If this file cannot be loaded, the tool exits with an error.
+
+The schema file should contain all definitions needed to interpret data belonging to the table.
+
+Example ``schema.cql``:
+
+.. code-block:: cql
+
+    CREATE KEYSPACE ks WITH replication = {'class': 'NetworkTopologyStrategy', 'mydc1': 1, 'mydc2': 4};
+
+    CREATE TYPE ks.mytype (
+        f1 int,
+        f2 text
+    );
+
+    CREATE TABLE ks.cf (
+        pk int,
+        ck text,
+        v1 int,
+        v2 mytype,
+        PRIMARY KEY (pk, ck)
+    );
+
+Note:
+
+* In addition to the table itself, the definition also has to include any user-defined types the table uses.
+* The keyspace definition is optional; if it is missing, one will be auto-generated.
+* The schema file doesn't have to be called ``schema.cql``; this is just the default name. Any file name is supported (with any extension).
+
+Dropped columns
+***************
+
+The examined SStable might contain columns that were dropped from the schema definition. In this case, providing the up-to-date schema is not enough: the tool will fail when it attempts to process a cell of a dropped column.
+Dropped columns can be described to the tool by adding ``INSERT`` statements for the ``system_schema.dropped_columns`` system table to the schema definition file. Example:
+
+.. code-block:: cql
+
+    INSERT INTO system_schema.dropped_columns (
+        keyspace_name,
+        table_name,
+        column_name,
+        dropped_time,
+        type
+    ) VALUES (
+        'ks',
+        'cf',
+        'v1',
+        1631011979170675,
+        'int'
+    );
+
+    CREATE TABLE ks.cf (pk int PRIMARY KEY, v2 int);
+
+System tables
+*************
+
+If the examined table is a system table, that is, it belongs to one of the system keyspaces (``system``, ``system_schema``, ``system_distributed`` or ``system_distributed_everywhere``), you can tell the tool to use the known built-in definition of that table with the ``--system-schema`` flag. Example:
+
+.. code-block:: console
+
+    scylla-sstable dump-data --system-schema system.local ./path/to/md-123456-big-Data.db
+
 Supported Operations
 ^^^^^^^^^^^^^^^^^^^^^^^
 The ``dump-*`` operations output JSON. For ``dump-data``, you can specify another output format.

From 0f4666010a19a7d9dbc80b0024f10bdcac388c33 Mon Sep 17 00:00:00 2001
From: Botond Dénes
Date: Fri, 26 Aug 2022 09:40:01 +0300
Subject: [PATCH 2/2] docs: scylla-sstable.rst: add comparison with SStableDump

The two tools have very similar goals, so users might wonder when to use
one or the other.
Also add a link to scylla-sstable in sstabledump.rst.
---
 docs/operating-scylla/admin-tools/scylla-sstable.rst | 10 ++++++++++
 docs/operating-scylla/admin-tools/sstabledump.rst    |  2 ++
 2 files changed, 12 insertions(+)

diff --git a/docs/operating-scylla/admin-tools/scylla-sstable.rst b/docs/operating-scylla/admin-tools/scylla-sstable.rst
index 0ca1cf9986..211bf1c949 100644
--- a/docs/operating-scylla/admin-tools/scylla-sstable.rst
+++ b/docs/operating-scylla/admin-tools/scylla-sstable.rst
@@ -11,6 +11,16 @@ generating a histogram, validating the content of SStables, and more. See `Suppo
 
 Run ``scylla-sstable --help`` for additional information about the tool and the operations.
 
+This tool is similar to SStableDump_, with notable differences:
+
+* Built on the ScyllaDB C++ codebase, it supports all SStable formats and components that ScyllaDB supports.
+* Expanded scope: this tool supports much more than dumping SStable data components (see `Supported Operations`_).
+* More flexible in how the schema is obtained and where SStables are located: SStableDump_ only supports dumping SStables located in their native data directory. To dump an SStable, one has to clone the entire ScyllaDB data directory tree, including system table directories and even config files. scylla-sstable can dump SStables from any path and offers multiple ways to obtain the schema (see Schema_).
+
+Currently, SStableDump_ works better on production systems, as it automatically loads the schema from the system tables, whereas scylla-sstable has to be provided with the schema explicitly. On the other hand, scylla-sstable works better for offline investigations, as it can be used with as little as a schema definition file and a single SStable. In the future, we plan to close this gap by adding automatic schema loading to scylla-sstable as well, and then to completely supplant SStableDump_ with scylla-sstable.
+
+.. _SStableDump: /operating-scylla/admin-tools/sstabledump
+
 Usage
 ------
 
diff --git a/docs/operating-scylla/admin-tools/sstabledump.rst b/docs/operating-scylla/admin-tools/sstabledump.rst
index 2775043a32..fa9a397ce1 100644
--- a/docs/operating-scylla/admin-tools/sstabledump.rst
+++ b/docs/operating-scylla/admin-tools/sstabledump.rst
@@ -4,8 +4,10 @@ SSTabledump
 This tool allows you to converts SSTable into a JSON format file.
 SSTabledump supported when using Scylla 3.0, Scylla Enterprise 2019.1, and newer versions.
 In older versions, the tool is named SSTable2json_.
+If you need more flexibility or want to dump more than just the data component, see scylla-sstable_.
 
 .. _SSTable2json: /operating-scylla/admin-tools/sstable2json
+.. _scylla-sstable: /operating-scylla/admin-tools/scylla-sstable
 
 Use the full path to the data file when executing the command.
 