scylladb/tools/schema_loader.hh
Tomasz Grabiec 573ef87245 Merge ' tool/scylla-sstable: more flexibility in obtaining the schema' from Botond Dénes
scylla-sstable currently has two ways to obtain the schema:

    * via a `schema.cql` file.
    * load schema definition from memory (only works for system tables).

This meant that in most cases it was necessary to export the schema in CQL format and write it to a file. This is very flexible: the sstable can be inspected anywhere, it doesn't have to be on the same host where it originates from. Yet in many cases the sstable is inspected on the same host where it originates from. In these cases, the schema is readily available in the schema tables on disk and it is plain annoying to have to export it into a file just to quickly inspect an sstable file.
This series solves this annoyance by providing a mechanism to load schemas from the on-disk schema tables. Furthermore, an auto-detect mechanism is provided to detect the location of these schema tables based on the path of the sstable. If that fails, the tool checks the usual locations of the scylla data dir and the scylla configuration file, and even looks for environment variables that tell the location of these. The old methods are still supported. In fact, if a schema.cql is present in the working directory of the tool, it is preferred over any other method, allowing for an easy force-override.
If the auto-detection magic fails, an error is printed to the console, advising the user to turn on debug level logging to see what went wrong.
A comprehensive test is added which checks all the different schema loading mechanisms. The documentation is also updated to reflect the changes.

This change breaks the backward-compatibility of the command-line API of the tool, as `--system-schema` is now just a flag, and the keyspace and table names are supplied separately via the new `--keyspace` and `--table` options. I don't think this will break anybody's workflow as this tool is still lightly used, exactly because of the annoying way the schema has to be provided. Hopefully after this series, this will change.

Example:

```
$ ./build/dev/scylla sstable dump-data /var/lib/scylla/data/ks/tbl2-d55ba230b9a811ed9ae8495671e9e4f8/quarantine/me-1-big-Data.db
{"sstables":{"/var/lib/scylla/data/ks/tbl2-d55ba230b9a811ed9ae8495671e9e4f8/quarantine//me-1-big-Data.db":[{"key":{"token":"-3485513579396041028","raw":"000400000000","value":"0"},"clustering_elements":[{"type":"clustering-row","key":{"raw":"","value":""},"marker":{"timestamp":1677837047297728},"columns":{"v":{"is_live":true,"type":"regular","timestamp":1677837047297728,"value":"0"}}}]}]}}
```

As seen above, subdirectories like quarantine, staging, etc. are also supported.
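
The path-based part of the auto-detection can be pictured with a minimal sketch. It is not the code from this series: `sstable_location` and `guess_location()` are hypothetical names, and the sketch assumes the usual `<data dir>/<keyspace>/<table>-<32 hex digit id>/...` layout seen in the example path above.

```cpp
#include <algorithm>
#include <cctype>
#include <filesystem>
#include <optional>
#include <string>

namespace fs = std::filesystem;

// Hypothetical result type, not part of scylla-sstable.
struct sstable_location {
    fs::path data_dir;     // e.g. /var/lib/scylla/data
    std::string keyspace;  // e.g. ks
    std::string table;     // e.g. tbl2
};

// Walk up from the sstable's directory until a directory named
// "<table>-<32 hex digits>" is found. Its parent is taken as the keyspace
// directory and its grandparent as the data directory. Intermediate
// directories such as quarantine/ or staging/ are simply skipped.
std::optional<sstable_location> guess_location(const fs::path& sstable_path) {
    for (fs::path dir = sstable_path.parent_path(); dir.has_filename(); dir = dir.parent_path()) {
        const std::string name = dir.filename().string();
        const auto dash = name.rfind('-');
        if (dash == std::string::npos || dash == 0) {
            continue; // not of the form <table>-<id>
        }
        const std::string id = name.substr(dash + 1);
        const bool is_hex = std::all_of(id.begin(), id.end(),
                [](unsigned char c) { return std::isxdigit(c) != 0; });
        if (id.size() != 32 || !is_hex) {
            continue; // the part after the last dash is not a table id
        }
        const fs::path ks_dir = dir.parent_path();
        if (!ks_dir.has_filename()) {
            return std::nullopt; // table directory has no keyspace directory above it
        }
        return sstable_location{ks_dir.parent_path(), ks_dir.filename().string(), name.substr(0, dash)};
    }
    return std::nullopt; // no table directory found on the way up
}
```

For the path in the example above, this sketch would yield `/var/lib/scylla/data` as the data directory, `ks` as the keyspace and `tbl2` as the table. A path with no table directory above it yields no result, which is where the fallbacks described earlier would take over.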

Fixes: https://github.com/scylladb/scylladb/issues/10126

Closes #13448

* github.com:scylladb/scylladb:
  test/cql-pytest: test_tools.py: add tests for schema loading
  test/cql-pytest: add no_autocompaction_context
  docs: scylla-sstable.rst: remove accidentally added copy-pasta
  docs: scylla-sstable.rst: remove paragraph with schema limitations
  docs: scylla-sstable.rst: update schema section
  test/cql-pytest: nodetool.py: add flush_keyspace()
  tools/scylla-sstable: reform schema loading mechanism
  tools/schema_loader: add load_schema_from_schema_tables()
  db/schema_tables: expose types schema

(cherry picked from commit 952b455310)

Closes #15386
2023-11-02 17:25:18 +02:00

/*
* Copyright (C) 2021-present ScyllaDB
*/
/*
* SPDX-License-Identifier: AGPL-3.0-or-later
*/
#pragma once
#include <filesystem>
#include <seastar/core/future.hh>
#include "seastarx.hh"
#include "schema.hh"
namespace tools {
/// Load the schema(s) from the specified string
///
/// The schema string is expected to contain everything that is needed to
/// create the table(s): keyspace, UDTs, etc. Definitions are expected to be
/// separated by `;`. A keyspace will be automatically generated if missing.
/// Tables whose name ends in "_scylla_cdc_log" are interpreted as CDC tables,
/// meaning they will be configured with the CDC partitioner.
/// Loading the schema(s) has no side-effect [1]. Nothing is written to disk,
/// it is all in memory, kept alive by the returned `schema_ptr`.
/// This is intended to be used by tools, which don't want to meddle with the
/// scylla home directory.
///
/// [1] Currently some global services have to be instantiated (snitch) to
/// be able to load the schema(s), these survive the call.
future<std::vector<schema_ptr>> load_schemas(std::string_view schema_str);
/// Load exactly one schema from the specified string
///
/// If the string contains more or less than one schema, an exception will be
/// thrown. See \ref load_schemas().
future<schema_ptr> load_one_schema(std::string_view schema_str);
/// Load the schema(s) from the specified path
///
/// Same as \ref load_schemas() except it loads the schema from
/// the file at the specified path.
future<std::vector<schema_ptr>> load_schemas_from_file(std::filesystem::path path);
/// Load exactly one schema from the specified path
///
/// Same as \ref load_one_schema() except it loads the schema from
/// the file at the specified path.
future<schema_ptr> load_one_schema_from_file(std::filesystem::path path);
/// Load the system schema, with the given keyspace and table
///
/// Note that only schemas from builtin system tables are supported, i.e.,
/// from the following keyspaces:
/// * system
/// * system_schema
/// * system_distributed
/// * system_distributed_everywhere
///
/// Any table from said keyspaces can be loaded. The keyspaces are created with
/// all schema and experimental features enabled.
schema_ptr load_system_schema(std::string_view keyspace, std::string_view table);
/// Load the schema of the table with the designated keyspace and table name,
/// from the system schema table sstables.
///
/// The schema table sstables are accessed for read only. In general this method
/// tries very hard to have no side-effects.
/// The \p scylla_data_path parameter is expected to point to the scylla data
/// directory, which is usually /var/lib/scylla/data.
future<schema_ptr> load_schema_from_schema_tables(std::filesystem::path scylla_data_path, std::string_view keyspace, std::string_view table);
} // namespace tools
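
The two name-based rules in the comments above (the CDC log suffix and the list of supported system keyspaces) can be illustrated with a small standalone sketch. The helpers `is_supported_system_keyspace()` and `has_cdc_log_suffix()` are hypothetical and are not declared in this header; how the loader itself performs these checks is not shown here.

```cpp
#include <algorithm>
#include <array>
#include <string_view>

// Hypothetical helpers for illustration only; not part of schema_loader.hh.

// The keyspaces listed in the load_system_schema() comment.
constexpr std::array<std::string_view, 4> system_keyspaces = {
    "system", "system_schema", "system_distributed", "system_distributed_everywhere"};

// True if the keyspace is one of the builtin keyspaces named above.
bool is_supported_system_keyspace(std::string_view ks) {
    return std::find(system_keyspaces.begin(), system_keyspaces.end(), ks) != system_keyspaces.end();
}

// True if the table name ends in the suffix named in the load_schemas() comment.
bool has_cdc_log_suffix(std::string_view table) {
    constexpr std::string_view suffix = "_scylla_cdc_log";
    return table.size() >= suffix.size()
        && table.substr(table.size() - suffix.size()) == suffix;
}
```

A caller could use a check like the first one to decide up front whether `load_system_schema()` is applicable to a given keyspace, or whether one of the other loading methods is needed.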