mirror of https://github.com/scylladb/scylladb.git synced 2026-05-12 19:02:12 +00:00

Nadav Har'El 32fff17e19 Merge 'tool/scylla-sstable: more flexibility in obtaining the schema' from Botond Dénes

`scylla-sstable` currently has two ways to obtain the schema:
* via a `schema.cql` file.
* load schema definition from memory (only works for system tables).

This meant that in most cases the schema had to be exported in `CQL` format and written to a file. This is very flexible: the sstable can be inspected anywhere, it doesn't have to be on the same host where it originates from. Yet in many cases the sstable *is* inspected on the same host where it originates from. In these cases, the schema is readily available in the schema tables on disk, and it is plain annoying to have to export it into a file just to quickly inspect an sstable file.
This series solves this annoyance by providing a mechanism to load schemas from the on-disk schema tables. Furthermore, an auto-detect mechanism is provided to detect the location of these schema tables based on the path of the sstable. If that fails, the tool checks the usual locations of the scylla data dir and the scylla configuration file, and even looks for environment variables that tell the location of these. The old methods are still supported. In fact, if a `schema.cql` is present in the working directory of the tool, it is preferred over any other method, allowing for an easy force-override.
If the auto-detection magic fails, an error is printed to the console, advising the user to turn on debug level logging to see what went wrong.
A comprehensive test is added which checks all the different schema loading mechanisms. The documentation is also updated to reflect the changes.
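The precedence described above can be modelled roughly as follows. This is only an illustrative Python sketch, not the tool's implementation (which is C++): `choose_schema_source`, `SchemaNotFound`, and the detector names are hypothetical, and the order of the fallback detectors is an assumption. Only the `schema.cql` override and the final error hint come from the description.

```python
from pathlib import Path


class SchemaNotFound(Exception):
    """Raised when no schema source could be found (hypothetical name)."""


def choose_schema_source(working_dir, detectors):
    """Return a (method, location) pair saying where the schema would come from.

    `detectors` is an ordered list of (name, callable) pairs; each callable
    returns a location or None. Names and order are illustrative assumptions.
    """
    # A schema.cql in the working directory overrides every other method.
    override = Path(working_dir) / "schema.cql"
    if override.is_file():
        return ("schema-file", str(override))
    # Otherwise try each auto-detection step in turn.
    for name, detect in detectors:
        location = detect()
        if location is not None:
            return (name, location)
    # Nothing worked: point the user at debug-level logging.
    raise SchemaNotFound(
        "could not auto-detect the schema; enable debug-level logging for details"
    )
```

A caller would pass detectors for the sstable path, the data dir, the configuration file, and the environment, in whatever order the real tool uses.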

This change breaks the backward-compatibility of the command-line API of the tool: `--system-schema` is now just a flag, and the keyspace and table names are supplied separately via the new `--keyspace` and `--table` options. I don't think this will break anybody's workflow, as this tool is still lightly used, exactly because of the annoying way the schema has to be provided. Hopefully this will change after this series.
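To make the new option shape concrete, here is a small `argparse` model of it. It is a hypothetical stand-in written for illustration only: the real tool is C++ and its parser is not shown here. Only the three option names come from the description above; the program name, the positional argument, and the sample values are assumptions.

```python
import argparse


def build_parser():
    """Build a toy parser mirroring the described options (illustrative only)."""
    parser = argparse.ArgumentParser(prog="sstable-schema-options")
    # --system-schema is now a plain on/off flag that takes no value.
    parser.add_argument("--system-schema", action="store_true")
    # Keyspace and table names are supplied separately.
    parser.add_argument("--keyspace")
    parser.add_argument("--table")
    # Hypothetical positional argument for the sstable paths.
    parser.add_argument("sstables", nargs="*")
    return parser


args = build_parser().parse_args(
    ["--system-schema", "--keyspace", "system", "--table", "local", "me-1-big-Data.db"]
)
print(args.system_schema, args.keyspace, args.table)
```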

Example:
```
$ ./build/dev/scylla sstable dump-data /var/lib/scylla/data/ks/tbl2-d55ba230b9a811ed9ae8495671e9e4f8/quarantine/me-1-big-Data.db
{"sstables":{"/var/lib/scylla/data/ks/tbl2-d55ba230b9a811ed9ae8495671e9e4f8/quarantine//me-1-big-Data.db":[{"key":{"token":"-3485513579396041028","raw":"000400000000","value":"0"},"clustering_elements":[{"type":"clustering-row","key":{"raw":"","value":""},"marker":{"timestamp":1677837047297728},"columns":{"v":{"is_live":true,"type":"regular","timestamp":1677837047297728,"value":"0"}}}]}]}}
```
As seen above, subdirectories like `quarantine`, `staging`, etc. are also supported.
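One way to picture path-based detection that tolerates such subdirectories is to walk up from the sstable file until a directory named `<table>-<id>` is found. The Python sketch below is an assumption-laden illustration, not the tool's code: `locate_table_dir` is a hypothetical helper, and the directory-name pattern (a 32-digit hex id after the table name) is inferred solely from the example path above.

```python
import re
from pathlib import PurePosixPath

# Table directories in the example look like "<table>-<32 hex digits>".
TABLE_DIR_RE = re.compile(r"^(?P<table>.+)-(?P<id>[0-9a-f]{32})$")


def locate_table_dir(sstable_path):
    """Walk up from the sstable file until a "<table>-<id>" directory is found.

    Returns (data_dir, keyspace, table), or None if no such directory exists.
    """
    path = PurePosixPath(sstable_path)
    for parent in path.parents:
        match = TABLE_DIR_RE.match(parent.name)
        if match and parent.parent.name:
            keyspace_dir = parent.parent
            return (str(keyspace_dir.parent), keyspace_dir.name, match.group("table"))
    return None


example = (
    "/var/lib/scylla/data/ks/tbl2-d55ba230b9a811ed9ae8495671e9e4f8"
    "/quarantine/me-1-big-Data.db"
)
print(locate_table_dir(example))
```

Because the walk starts at the file and moves upward, an intermediate `quarantine` or `staging` directory is simply skipped.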

Fixes: https://github.com/scylladb/scylladb/issues/10126

Closes #13075

* github.com:scylladb/scylladb:
  docs/operating-scylla/admin-tools: scylla-sstable.rst: update schema section
  test/cql-pytest: test_tools.py: add test for schema loading
  test/cql-pytest: nodetool.py: add flush_keyspace()
  tools/scylla-sstable: reform schema loading mechanism
  tools/schema_loader: add load_schema_from_schema_tables()
  db/schema_tables: expose types schema

2023-03-30 09:35:59 +03:00

.github

docs: Separate conf.py

2023-03-27 13:42:58 +03:00

alternator

Merge 'treewide: improve compatibility with gcc 13' from Avi Kivity

2023-03-24 15:16:05 +02:00

api

Merge 'bytes, gms: replace operator<<(..) with fmt formatter' from Kefu Chai

2023-03-28 08:25:41 +03:00

auth

Merge 'config: make query timeouts live update-able' from Kefu Chai

2023-03-29 19:38:26 +03:00

cdc

build: cmake: extract more subsystem out into its own CMakeLists.txt

2023-03-02 10:15:25 +08:00

cmake

build: cmake: port more cxxflags from configure.py

2023-03-26 14:01:21 +08:00

compaction

Merge 'Allow each compaction group to have its own compaction strategy state' from Raphael "Raph" Carvalho

2023-03-29 18:57:11 +03:00

conf

conf: enable consistent_cluster_management by default

2023-01-20 13:29:06 +01:00

cql3

cql3: do not use operator<< to print authenticated_user

2023-03-29 16:02:29 +08:00

data_dictionary

table: Keep storage options lw-shared-ptr

2023-03-16 17:30:45 +03:00

db

Merge 'tool/scylla-sstable: more flexibility in obtaining the schema' from Botond Dénes

2023-03-30 09:35:59 +03:00

debug

…

dht

bootstrapper: Add get_random_bootstrap_tokens function

2023-03-21 16:06:43 +02:00

direct_failure_detector

…

dist

scylla_kernel_check: suppress verbose iotune messages

2023-03-30 07:30:07 +03:00

docs

Merge 'tool/scylla-sstable: more flexibility in obtaining the schema' from Botond Dénes

2023-03-30 09:35:59 +03:00

exceptions

…

gms

bytes, gms: s/format_to/fmt::format_to/

2023-03-29 14:47:28 +03:00

idl

build: cmake: add missing source files to idl and service

2023-03-26 14:01:21 +08:00

index

build: cmake: extract index, repair and data_dictionary out

2023-03-08 22:53:42 +08:00

interface

build: cmake: expose scylla_gen_build_dir from "interface"

2023-02-28 21:28:46 +08:00

lang

cql: renice the wasm compilation alien thread

2023-03-26 18:38:23 +03:00

licenses

scripts: remove git-archive-all

2023-03-29 18:59:23 +03:00

locator

Merge 'Optimize topology::compare_endpoints' from Benny Halevy

2023-03-07 15:17:19 +02:00

message

treewide: use fmtlib to format gms::inet_address

2023-03-27 20:06:45 +08:00

mutation

mutation/mutation_compactor: consume_partition_end(): reset _stop

2023-03-29 17:48:45 +03:00

mutation_writer

reader_permit: keep trace_state pointer on permit

2023-03-22 04:58:01 -04:00

raft

raft: include boost header using <path/to/header> not "path/to/header"

2023-03-26 14:07:50 +08:00

readers

readers/multishard: shard_reader: fast-forward created reader to current range

2023-03-24 08:43:03 -04:00

redis

redis,thrift,transport: make timeout_config live-updateable

2023-03-29 20:17:45 +08:00

reloc

…

repair

repair: rename repair_module

2023-03-27 16:33:39 +02:00

replica

Merge 'Break the proxy -> database -> [views] -> proxy loop' from Pavel Emelyanov

2023-03-30 08:29:29 +03:00

rust

rust: update dependencies

2023-03-16 13:45:53 +02:00

schema

Merge 'Raft, use schema commit log' from Gusev Petr

2023-03-27 13:27:30 +02:00

scripts

scripts: remove git-archive-all

2023-03-29 18:59:23 +03:00

seastar @ 1204efbc5e

Update seastar submodule

2023-03-22 21:21:04 +08:00

service

client_state: split the param list of ctor into multi lines

2023-03-29 20:17:45 +08:00

sstables

sstables: do not use operator<< to print composite_view

2023-03-29 16:13:59 +08:00

streaming

reader_permit: keep trace_state pointer on permit

2023-03-22 04:58:01 -04:00

swagger-ui @ 12f1da1082

…

tasks

repair: rename repair_module

2023-03-27 16:33:39 +02:00

test

Merge 'tool/scylla-sstable: more flexibility in obtaining the schema' from Botond Dénes

2023-03-30 09:35:59 +03:00

thrift

redis,thrift,transport: make timeout_config live-updateable

2023-03-29 20:17:45 +08:00

tools

Merge 'tool/scylla-sstable: more flexibility in obtaining the schema' from Botond Dénes

2023-03-30 09:35:59 +03:00

tracing

Merge 'Optimize topology::compare_endpoints' from Benny Halevy

2023-03-07 15:17:19 +02:00

transport

redis,thrift,transport: make timeout_config live-updateable

2023-03-29 20:17:45 +08:00

types

types: remove unused header

2023-03-26 16:55:16 +03:00

unified

Repackaging cqlsh

2023-03-12 20:22:33 +02:00

utils

utils: config_file: add a space after =

2023-03-29 19:22:21 +08:00

.dockerignore

…

.gitattributes

…

.gitignore

git: remove Cargo.lock from .gitignore

2023-02-14 08:51:53 +02:00

.gitmodules

Repackaging cqlsh

2023-03-12 20:22:33 +02:00

.gitorderfile

…

.mailmap

…

absl-flat_hash_map.cc

…

absl-flat_hash_map.hh

…

amplify.yml

…

backlog_controller.hh

…

build_mode.hh

release: correct a typo in comment

2023-03-29 13:42:38 +03:00

bytes_ostream.hh

utils: move hashing related files to utils/ module

2023-02-17 07:19:52 +02:00

bytes.cc

bytes: implement formatting helpers using formatter

2023-03-27 20:06:45 +08:00

bytes.hh

bytes, gms: s/format_to/fmt::format_to/

2023-03-29 14:47:28 +03:00

cache_flat_mutation_reader.hh

Introduce mutation/ module

2023-02-14 11:19:03 +02:00

cache_temperature.hh

…

cartesian_product.hh

…

cell_locking.hh

treewide: prevent redefining names

2023-03-21 13:42:49 +02:00

checked-file-impl.hh

…

client_data.cc

…

client_data.hh

…

clocks-impl.cc

…

clocks-impl.hh

…

clustering_bounds_comparator.hh

Introduce schema/ module

2023-02-15 11:01:50 +02:00

clustering_interval_set.hh

Introduce schema/ module

2023-02-15 11:01:50 +02:00

clustering_key_filter.hh

Introduce schema/ module

2023-02-15 11:01:50 +02:00

clustering_ranges_walker.hh

Introduce schema/ module

2023-02-15 11:01:50 +02:00

CMakeLists.txt

build: cmake: drop unnecessary linkages

2023-03-16 12:14:21 +08:00

collection_mutation.cc

Introduce mutation/ module

2023-02-14 11:19:03 +02:00

collection_mutation.hh

Introduce schema/ module

2023-02-15 11:01:50 +02:00

column_computation.hh

column_computation: adjust to use clustering_or_static_row

2022-12-06 11:21:16 +01:00

combine.hh

…

compatible_ring_position.hh

…

compound_compat.hh

compound_compat: remove operator<<(ostream, composite)

2023-03-29 16:13:59 +08:00

compound.hh

types: move types.{cc,hh} into types

2023-02-19 21:05:45 +02:00

compress.cc

compress, transport: do not detect LZ4_compress_default()

2023-02-23 14:39:20 +02:00

compress.hh

…

concrete_types.hh

types: move types.{cc,hh} into types

2023-02-19 21:05:45 +02:00

configure.py

raft topology: add RAFT_TOPOLOGY_CMD verb that will be used by topology coordinator to communicated with nodes

2023-03-23 16:29:56 +02:00

CONTRIBUTING.md

Replacing user-group with community forum, added link to U. lesson on Spring Boot Fixed author/email details

2023-02-23 19:05:26 +02:00

converting_mutation_partition_applier.cc

Introduce schema/ module

2023-02-15 11:01:50 +02:00

converting_mutation_partition_applier.hh

Introduce schema/ module

2023-02-15 11:01:50 +02:00

counters.cc

treewide: use fmtlib when printing UUID

2023-03-20 15:38:45 +08:00

counters.hh

types: move types.{cc,hh} into types

2023-02-19 21:05:45 +02:00

cql_serialization_format.hh

treewide: drop cql_serialization_format

2023-01-03 19:54:13 +02:00

db_clock.hh

…

debug.cc

test: extract debug::the_database out

2023-01-19 17:42:23 +08:00

debug.hh

…

default.nix

build: nix: switch to non-static zstd

2023-02-17 10:29:34 +02:00

Doxyfile

…

duration.cc

…

duration.hh

…

encoding_stats.hh

…

enum_set.hh

…

fix_system_distributed_tables.py

…

flake.lock

build: bump Lua version (5.3 -> 5.4) in Nix devenv

2023-01-19 15:53:49 +01:00

flake.nix

build: fix Nix devenv

2022-12-19 20:53:07 +02:00

frozen_schema.cc

Introduce mutation/ module

2023-02-14 11:19:03 +02:00

frozen_schema.hh

Introduce schema/ module

2023-02-15 11:01:50 +02:00

full_position.hh

Introduce mutation/ module

2023-02-14 11:19:03 +02:00

gc_clock.hh

utils: move hashing related files to utils/ module

2023-02-17 07:19:52 +02:00

gdbinit

gdbinit: add ignore clause for SIG35

2023-01-12 12:13:04 +02:00

gen_segmented_compress_params.py

…

generic_server.cc

Merge 'Optimize topology::compare_endpoints' from Benny Halevy

2023-03-07 15:17:19 +02:00

generic_server.hh

…

HACKING.md

…

hashing_partition_visitor.hh

utils: move hashing related files to utils/ module

2023-02-17 07:19:52 +02:00

idl-compiler.py

idl-compiler: mark captured this used

2023-02-28 21:56:55 +08:00

inet_address_vectors.hh

…

init.cc

treewide: use fmtlib to format gms::inet_address

2023-03-27 20:06:45 +08:00

init.hh

configurables: Add optional service lookup to init callback

2023-03-14 17:13:52 +02:00

install-dependencies.sh

build: add wasm compilation target for rust

2023-03-21 10:30:08 +02:00

install.sh

configure.py: build and use libseastar.so in debug and dev modes

2023-02-27 21:08:34 +02:00

interval.hh

…

keys.cc

…

keys.hh

keys: disambiguate construction from initializer_list<bytes>

2023-03-21 13:42:49 +02:00

LICENSE.AGPL

…

log.hh

…

main.cc

view: Add view_builder -> view_update_generator dependency

2023-03-29 14:08:47 +03:00

map_difference.hh

…

marshal_exception.hh

…

multishard_mutation_query.cc

readers/multishard: reader_lifecycle_policy: add get_read_range()

2023-03-24 08:40:11 -04:00

multishard_mutation_query.hh

Introduce schema/ module

2023-02-15 11:01:50 +02:00

mutation_query.cc

Introduce mutation/ module

2023-02-14 11:19:03 +02:00

mutation_query.hh

Introduce mutation/ module

2023-02-14 11:19:03 +02:00

noexcept_traits.hh

…

NOTICE.txt

…

ORIGIN

…

partition_builder.hh

Introduce schema/ module

2023-02-15 11:01:50 +02:00

partition_range_compat.hh

…

partition_slice_builder.cc

treewide: drop cql_serialization_format

2023-01-03 19:54:13 +02:00

partition_slice_builder.hh

Introduce schema/ module

2023-02-15 11:01:50 +02:00

partition_snapshot_reader.hh

Introduce mutation/ module

2023-02-14 11:19:03 +02:00

partition_snapshot_row_cursor.hh

Introduce mutation/ module

2023-02-14 11:19:03 +02:00

protocol_server.hh

…

querier.cc

treewide: do not define/capture unused variables

2023-02-15 22:57:18 +02:00

querier.hh

treewide: do not define/capture unused variables

2023-02-15 22:57:18 +02:00

query_class_config.hh

…

query_id.hh

query_id: extract into new header

2023-03-01 10:25:25 +02:00

query_ranges_to_vnodes.cc

query_ranges_to_vnodes_generator: fix for exclusive boundaries

2023-02-07 16:02:31 +02:00

query_ranges_to_vnodes.hh

Introduce schema/ module

2023-02-15 11:01:50 +02:00

query_result_merger.hh

…

query-request.hh

query_id: extract into new header

2023-03-01 10:25:25 +02:00

query-result-reader.hh

utils: move hashing related files to utils/ module

2023-02-17 07:19:52 +02:00

query-result-set.cc

Introduce mutation/ module

2023-02-14 11:19:03 +02:00

query-result-set.hh

types: move types.{cc,hh} into types

2023-02-19 21:05:45 +02:00

query-result-writer.hh

types: move types.{cc,hh} into types

2023-02-19 21:05:45 +02:00

query-result.hh

utils: move hashing related files to utils/ module

2023-02-17 07:19:52 +02:00

query.cc

treewide: use fmtlib when printing UUID

2023-03-20 15:38:45 +08:00

range.hh

…

read_context.hh

Introduce schema/ module

2023-02-15 11:01:50 +02:00

reader_concurrency_semaphore.cc

reader_permit: set_trace_state(): emit trace message linking to previous page

2023-03-26 18:41:21 +03:00

reader_concurrency_semaphore.hh

reader_permit: keep trace_state pointer on permit

2023-03-22 04:58:01 -04:00

reader_permit.hh

reader_permit: refresh trace_state on new pages

2023-03-22 04:58:10 -04:00

README.md

Replacing user-group with community forum, added link to U. lesson on Spring Boot Fixed author/email details

2023-02-23 19:05:26 +02:00

real_dirty_memory_accounter.hh

dirty_memory_manager: move to replica module

2022-12-06 22:24:17 +02:00

release.cc

…

release.hh

…

reversibly_mergeable.hh

…

row_cache.cc

treewide: use fmt::join() when appropriate

2023-03-16 20:34:18 +08:00

row_cache.hh

row_cache: pass "const cache_entry" to operator<<

2023-03-16 07:46:11 +08:00

schema_mutations.cc

utils: move hashing related files to utils/ module

2023-02-17 07:19:52 +02:00

schema_mutations.hh

Introduce schema/ module

2023-02-15 11:01:50 +02:00

schema_upgrader.hh

Introduce mutation/ module

2023-02-14 11:19:03 +02:00

scylla_post_install.sh

scylla_coredump_setup: fix coredump timeout settings

2023-02-16 10:23:20 +02:00

scylla-gdb.py

Merge 'reader_concurrency_semaphore: handle read blocked on memory being registered as inactive' from Botond Dénes

2023-03-15 20:10:19 +02:00

SCYLLA-VERSION-GEN

release: prepare for 5.3.0-dev

2023-01-18 16:22:41 +02:00

seastarx.hh

…

serialization_visitors.hh

…

serializer_impl.hh

…

serializer.cc

…

serializer.hh

…

service_permit.hh

…

setup.py

…

shell.nix

…

sstables_loader.cc

reader_permit: keep trace_state pointer on permit

2023-03-22 04:58:01 -04:00

sstables_loader.hh

Introduce schema/ module

2023-02-15 11:01:50 +02:00

supervisor.hh

…

table_helper.cc

…

table_helper.hh

…

test.py

test: drop our "pytest" wrapper script

2023-03-08 07:31:37 +02:00

timeout_config.cc

timeout_config: remove unused make_timeout_config()

2023-03-29 20:17:45 +08:00

timeout_config.hh

timeout_config: remove unused make_timeout_config()

2023-03-29 20:17:45 +08:00

timestamp.hh

…

tombstone_gc_extension.hh

Introduce schema/ module

2023-02-15 11:01:50 +02:00

tombstone_gc_options.cc

…

tombstone_gc_options.hh

…

tombstone_gc.cc

Introduce schema/ module

2023-02-15 11:01:50 +02:00

tombstone_gc.hh

Introduce schema/ module

2023-02-15 11:01:50 +02:00

tox.ini

…

ubsan-suppressions.supp

…

unimplemented.cc

…

unimplemented.hh

…

validation.cc

validation: Avoid throwing schema lookup

2023-03-24 08:43:48 +02:00

validation.hh

Introduce schema/ module

2023-02-15 11:01:50 +02:00

version.hh

version: Reverse version increase

2022-12-12 18:45:32 +02:00

view_info.hh

Introduce schema/ module

2023-02-15 11:01:50 +02:00

vint-serialization.cc

…

vint-serialization.hh

…

zstd.cc

…

README.md

Scylla

What is Scylla?

Scylla is the real-time big data database that is API-compatible with Apache Cassandra and Amazon DynamoDB. Scylla embraces a shared-nothing approach that increases throughput and storage capacity to realize order-of-magnitude performance improvements and reduce hardware costs.

For more information, please see the ScyllaDB web site.

Build Prerequisites

Scylla is fairly fussy about its build environment, requiring a very recent C++20 compiler and very recent versions of many libraries to build. The document HACKING.md includes detailed information on building and developing Scylla, but to get Scylla building quickly on (almost) any build machine, Scylla offers a frozen toolchain. This is a pre-configured Docker image which includes recent versions of all the required compilers, libraries and build tools. Using the frozen toolchain allows you to avoid changing anything in your build machine to meet Scylla's requirements - you just need to meet the frozen toolchain's prerequisites (mostly, Docker or Podman being available).

Building Scylla

Building Scylla with the frozen toolchain dbuild is as easy as:

$ git submodule update --init --force --recursive
$ ./tools/toolchain/dbuild ./configure.py
$ ./tools/toolchain/dbuild ninja build/release/scylla

For further information, please see:

Developer documentation for more information on building Scylla.
Build documentation on how to build Scylla binaries, tests, and packages.
Docker image build documentation for information on how to build Docker images.

Running Scylla

To start Scylla server, run:

$ ./tools/toolchain/dbuild ./build/release/scylla --workdir tmp --smp 1 --developer-mode 1

This will start a Scylla node with one CPU core allocated to it and data files stored in the tmp directory. The --developer-mode option is needed to disable the various checks Scylla performs at startup to ensure the machine is configured for maximum performance (these checks are not relevant on development workstations). Please note that you need to run Scylla with dbuild if you built it with the frozen toolchain.

For more run options, run:

$ ./tools/toolchain/dbuild ./build/release/scylla --help

Testing

See test.py manual.

Scylla APIs and compatibility

By default, Scylla is compatible with Apache Cassandra and its APIs - CQL and Thrift. There is also support for the API of Amazon DynamoDB™, which needs to be enabled and configured in order to be used. For more information on how to enable the DynamoDB™ API in Scylla, and the current compatibility of this feature as well as Scylla-specific extensions, see Alternator and Getting started with Alternator.

Documentation

Documentation can be found here. Seastar documentation can be found here. User documentation can be found here.

Training

Training material and online courses can be found at Scylla University. The courses are free, self-paced and include hands-on examples. They cover a variety of topics including Scylla data modeling, administration, architecture, basic NoSQL concepts, using drivers for application development, Scylla setup, failover, compactions, multi-datacenters and how Scylla integrates with third-party applications.

Contributing to Scylla

If you want to report a bug or submit a pull request or a patch, please read the contribution guidelines.

If you are a developer working on Scylla, please read the developer guidelines.

Contact

The community forum and Slack channel are for users to discuss configuration, management, and operations of the ScyllaDB open source.
The developers mailing list is for developers and people interested in following the development of ScyllaDB to discuss technical topics.

Languages

C++ 72.5%

Python 26.2%

CMake 0.4%

GAP 0.3%

Shell 0.3%