mirror of https://github.com/scylladb/scylladb.git synced 2026-05-29 11:10:40 +00:00


Botond Dénes 05b381bfa2 Merge 'Simple S3 storage for sstables' from Pavel Emelyanov

The PR adds an sstables storage backend that keeps all component files as S3 objects, and a system.sstables_registry ownership table that keeps track of which sstable objects belong to the local node and what their names are.

When a keyspace is configured with 'STORAGE = { 'type': 'S3' }', the respective class table object eventually gets a storage_options instance pointing to the target S3 endpoint and bucket. All the sstables created for that table attach the S3 storage implementation, which maintains component files as S3 objects. Writing to and reading from components is handled by the S3 client facilities from utils/. Changing the sstable state, that is, moving it between the normal, staging and quarantine states, is not yet implemented, but would eventually happen by updating entries in the sstables registry.

To keep track of which node owns which objects, to provide bucket-wide uniqueness of object names, and to maintain sstable state, the storage driver keeps records in the system.sstables_registry ownership table. The table maps sstable location and generation to the object format, version, status and state (*), and a unique identifier (some time soon this identifier is supposed to be replaced with UUID sstable generations). The component object name is thus s3://bucket/uuid/component_basename. The registry is also used on boot: the distributed loader picks up sstables from all the tables found in the schema, and for S3-backed keyspaces it lists entries in the registry to a) identify those sstables and b) get their unique S3-side identifiers to open them by name.
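The mapping and the naming scheme described above can be sketched as follows. This is a minimal illustration in Python, not the actual table schema or driver code: the `RegistryEntry` class, its field names and the `component_object_name` helper are all hypothetical, and only the `s3://bucket/uuid/component_basename` layout is taken from the description.

```python
from dataclasses import dataclass
from uuid import UUID


@dataclass(frozen=True)
class RegistryEntry:
    """Hypothetical model of one ownership-table record (field names are illustrative)."""
    location: str    # sstable location, first half of the key
    generation: int  # sstable generation, second half of the key
    format: str      # object format
    version: str     # sstable version
    status: str      # 'creating', 'sealed' or 'removing'
    state: str       # e.g. 'normal', 'staging', 'quarantine'
    uuid: UUID       # unique S3-side identifier


def component_object_name(bucket: str, entry: RegistryEntry, component_basename: str) -> str:
    """Compose the object name as s3://bucket/uuid/component_basename."""
    return f"s3://{bucket}/{entry.uuid}/{component_basename}"


# Example values below are made up for illustration.
entry = RegistryEntry(
    location="ks/table",
    generation=1,
    format="big",
    version="me",
    status="sealed",
    state="normal",
    uuid=UUID("12345678-1234-5678-1234-567812345678"),
)
name = component_object_name("sstables", entry, "Data.db")
```

Because the object name depends only on the bucket and the unique identifier, two nodes writing sstables with the same generation into one bucket would still produce distinct object names.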

(*) About sstable's status and state.

The state field corresponds to the part of today's on-disk sstable path: staging, quarantine, normal (root table data dir), etc. Since S3 has no rename facility, moving an sstable between those states is only possible by updating its entry in the registry. This is not yet implemented in this set (#13017).

The status field tracks the sstable's transition through creation and deletion. It starts with the 'creating' status, which corresponds to today's TemporaryTOC file. After being created and written to, the sstable moves into the 'sealed' status, which corresponds to today's normal sstable with a TOC file. To delete an sstable atomically, it first moves into the 'removing' status, which is equivalent to being in the deletion log for an on-disk sstable. Once the objects are removed from the bucket, the entry is removed from the registry.
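The status life cycle described above can be read as a small state machine. The sketch below is an assumption-laden illustration in Python, not the driver's implementation: the `Status` enum and the `advance` helper are hypothetical names, `None` stands for "entry removed from the registry", and only the transitions named in the text are allowed (whether other transitions exist is not stated here).

```python
from enum import Enum
from typing import Dict, Optional, Set


class Status(Enum):
    CREATING = "creating"  # analogous to having a TemporaryTOC file
    SEALED = "sealed"      # analogous to a normal sstable with a TOC file
    REMOVING = "removing"  # analogous to being in the deletion log


# Transitions named in the description; None means the registry entry is gone.
_ALLOWED: Dict[Status, Set[Optional[Status]]] = {
    Status.CREATING: {Status.SEALED},
    Status.SEALED: {Status.REMOVING},
    Status.REMOVING: {None},
}


def advance(current: Status, target: Optional[Status]) -> Optional[Status]:
    """Return the new status, or raise if the transition is not one described above."""
    if target not in _ALLOWED[current]:
        raise ValueError(f"transition {current} -> {target} is not allowed in this sketch")
    return target
```

Marking the entry as 'removing' before deleting any objects is what makes the deletion recoverable: if the node stops midway, the registry still records that the remaining objects belong to an sstable that was being removed.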

To play with:

1. Start minio (installed by install-dependencies.sh)
```
export MINIO_ROOT_USER=${root_user}
export MINIO_ROOT_PASSWORD=${root_pass}
mkdir -p ${root_directory}
minio server ${root_directory}
```

2. Configure minio CLI, create anonymous bucket
```
mc config host rm local
mc config host add local http://127.0.0.1:9000 ${root_user} ${root_pass}
mc mb local/sstables
mc anonymous set public local/sstables
```

3. Start Scylla with object-storage feature enabled
```
scylla ... --experimental-features=keyspace-storage-options --workdir ${as_usual}
```

4. Create KS with S3 storage
```
create keyspace ... storage = { 'type': 'S3', 'endpoint': '127.0.0.1:9000', 'bucket': 'sstables' };
```

The S3 client has a logger named "s3"; it is useful to turn it on with `trace` verbosity.

Closes #12523

* github.com:scylladb/scylladb:
  test: Add object-storage test
  distributed_loader: Print storage type when populating
  sstable_directory: Add ownership table components lister
  sstable_directory: Make components_lister and API
  sstable_directory: Create components lister based on storage options
  sstables: Add S3 storage implementation
  system_keyspace: Add ownership table
  system_keyspace: Plug to user sstables manager too
  sstable: Make storage instance based on storage options
  sstable_directory: Keep storage_options aboard
  sstable: Virtualize the helper that gets on-disk stats for sstable
  sstable, storage: Virtualize data sink making for small components
  sstable, storage: Virtualize data sink making for Data and Index
  sstable/writer: Shuffle writer::init_file_writers()
  sstable: Make storage an API
  utils: Add S3 readable file impl for random reads
  utils: Add S3 data sink for multipart upload
  utils: Add S3 client with basic ops
  cql-pytest: Add option to run scylla over stable directory
  test.py: Equip it with minio server
  sstables: Detach write_toc() helper

2023-04-11 08:17:25 +03:00

.github

docs: Separate conf.py

2023-03-27 13:42:58 +03:00

alternator

alternator,util: Move aws4-hmac-sha256 signature generator to util

2023-04-04 18:24:48 +03:00

api

Merge 'Topology: introduce nodes' from Benny Halevy

2023-04-06 13:47:22 +03:00

auth

auth: remove unused operator<<(.., resource_kind)

2023-04-07 20:32:28 +08:00

cdc

build: cmake: extract more subsystem out into its own CMakeLists.txt

2023-03-02 10:15:25 +08:00

cmake

build: cmake: set stack frame limits

2023-04-04 15:33:20 +08:00

compaction

Merge 'Compaction reevaluation bug fixes' from Raphael "Raph" Carvalho

2023-04-05 13:51:21 +03:00

conf

commitlog: use separate directory for schema commitlog

2023-03-30 21:55:50 +04:00

cql3

cql3: s/std::regex/boost::regex/

2023-04-06 09:50:32 -04:00

data_dictionary

table: Keep storage options lw-shared-ptr

2023-03-16 17:30:45 +03:00

Merge 'Simple S3 storage for sstables' from Pavel Emelyanov

2023-04-11 08:17:25 +03:00

debug

…

dht

bootstrapper: Add get_random_bootstrap_tokens function

2023-03-21 16:06:43 +02:00

direct_failure_detector

direct_failure_detector: Avoid throwing exceptions in the success path

2023-03-31 12:40:43 +02:00

dist

commitlog: use separate directory for schema commitlog

2023-03-30 21:55:50 +04:00

docs

system_keyspace: Add ownership table

2023-04-10 16:44:28 +03:00

exceptions

exception: fix the error code used for rate_limit_exception

2022-09-13 11:46:15 +02:00

gms

bytes, gms: s/format_to/fmt::format_to/

2023-03-29 14:47:28 +03:00

idl

build: cmake: add missing source files to idl and service

2023-03-26 14:01:21 +08:00

index

index: s/std::regex/boost::regex/

2023-04-06 09:50:41 -04:00

interface

build: cmake: expose scylla_gen_build_dir from "interface"

2023-02-28 21:28:46 +08:00

lang

wasm: add noexcept specifier for alien::run_on()

2023-04-03 08:19:00 +03:00

licenses

scripts: remove git-archive-all

2023-03-29 18:59:23 +03:00

locator

topology: add node state

2023-04-02 20:18:31 +03:00

message

treewide: use fmtlib to format gms::inet_address

2023-03-27 20:06:45 +08:00

mutation

mutation_partition_v2: add sentinel to the tracker *after* adding it to the tree

2023-04-05 09:52:44 +02:00

mutation_writer

reader_permit: keep trace_state pointer on permit

2023-03-22 04:58:01 -04:00

raft

raft: include boost header using <path/to/header> not "path/to/header"

2023-03-26 14:07:50 +08:00

readers

readers/multishard: shard_reader: fast-forward created reader to current range

2023-03-24 08:43:03 -04:00

redis

redis,thrift,transport: make timeout_config live-updateable

2023-03-29 20:17:45 +08:00

reloc

…

repair

topology: add node state

2023-04-02 20:18:31 +03:00

replica

distributed_loader: Print storage type when populating

2023-04-10 16:44:29 +03:00

rust

build: cmake: include cxx.h with relative path

2023-04-04 15:33:20 +08:00

schema

Merge 'Raft, use schema commit log' from Gusev Petr

2023-03-27 13:27:30 +02:00

scripts

scripts/refresh-submodules.sh: include all commits in summary

2023-04-06 11:27:14 +03:00

seastar @ 1204efbc5e

Update seastar submodule

2023-03-22 21:21:04 +08:00

service

Merge 'Standardize node ops sync_nodes selection' from Benny Halevy

2023-04-10 13:14:55 +02:00

sstables

Merge 'Simple S3 storage for sstables' from Pavel Emelyanov

2023-04-11 08:17:25 +03:00

streaming

reader_permit: keep trace_state pointer on permit

2023-03-22 04:58:01 -04:00

swagger-ui @ 12f1da1082

…

tasks

repair: rename repair_module

2023-03-27 16:33:39 +02:00

test

Merge 'Simple S3 storage for sstables' from Pavel Emelyanov

2023-04-11 08:17:25 +03:00

thrift

Merge 'tree: migrate from std::regex to boost::regex' from Botond Dénes

2023-04-09 18:47:41 +03:00

tools

sstable: Make storage instance based on storage options

2023-04-10 16:43:01 +03:00

tracing

Merge 'Optimize topology::compare_endpoints' from Benny Halevy

2023-03-07 15:17:19 +02:00

transport

redis,thrift,transport: make timeout_config live-updateable

2023-03-29 20:17:45 +08:00

types

Merge 'tree: migrate from std::regex to boost::regex' from Botond Dénes

2023-04-09 18:47:41 +03:00

unified

Repackaging cqlsh

2023-03-12 20:22:33 +02:00

utils

Merge 'Simple S3 storage for sstables' from Pavel Emelyanov

2023-04-11 08:17:25 +03:00

.dockerignore

…

.gitattributes

…

.gitignore

git: remove Cargo.lock from .gitignore

2023-02-14 08:51:53 +02:00

.gitmodules

Repackaging cqlsh

2023-03-12 20:22:33 +02:00

.gitorderfile

…

.mailmap

…

absl-flat_hash_map.cc

…

absl-flat_hash_map.hh

…

amplify.yml

docs: automatic previews configuration

2022-11-04 15:44:22 +02:00

backlog_controller.hh

…

build_mode.hh

release: correct a typo in comment

2023-03-29 13:42:38 +03:00

bytes_ostream.hh

utils: move hashing related files to utils/ module

2023-02-17 07:19:52 +02:00

bytes.cc

bytes: implement formatting helpers using formatter

2023-03-27 20:06:45 +08:00

bytes.hh

bytes, gms: s/format_to/fmt::format_to/

2023-03-29 14:47:28 +03:00

cache_flat_mutation_reader.hh

Introduce mutation/ module

2023-02-14 11:19:03 +02:00

cache_temperature.hh

…

cartesian_product.hh

…

cell_locking.hh

treewide: prevent redefining names

2023-03-21 13:42:49 +02:00

checked-file-impl.hh

…

client_data.cc

…

client_data.hh

…

clocks-impl.cc

…

clocks-impl.hh

…

clustering_bounds_comparator.hh

Introduce schema/ module

2023-02-15 11:01:50 +02:00

clustering_interval_set.hh

Introduce schema/ module

2023-02-15 11:01:50 +02:00

clustering_key_filter.hh

Introduce schema/ module

2023-02-15 11:01:50 +02:00

clustering_ranges_walker.hh

Introduce schema/ module

2023-02-15 11:01:50 +02:00

CMakeLists.txt

build: cmake: set stack frame limits

2023-04-04 15:33:20 +08:00

collection_mutation.cc

Introduce mutation/ module

2023-02-14 11:19:03 +02:00

collection_mutation.hh

Introduce schema/ module

2023-02-15 11:01:50 +02:00

column_computation.hh

column_computation: adjust to use clustering_or_static_row

2022-12-06 11:21:16 +01:00

combine.hh

…

compatible_ring_position.hh

compatible_ring_position_or_view: make it cheap to copy

2022-10-04 12:00:21 +03:00

compound_compat.hh

compound_compat: remove operator<<(ostream, composite)

2023-03-29 16:13:59 +08:00

compound.hh

types: move types.{cc,hh} into types

2023-02-19 21:05:45 +02:00

compress.cc

compress, transport: do not detect LZ4_compress_default()

2023-02-23 14:39:20 +02:00

compress.hh

…

concrete_types.hh

types: move types.{cc,hh} into types

2023-02-19 21:05:45 +02:00

configure.py

Merge 'Simple S3 storage for sstables' from Pavel Emelyanov

2023-04-11 08:17:25 +03:00

CONTRIBUTING.md

Replacing user-group with community forum, added link to U. lesson on Spring Boot Fixed author/email details

2023-02-23 19:05:26 +02:00

converting_mutation_partition_applier.cc

Introduce schema/ module

2023-02-15 11:01:50 +02:00

converting_mutation_partition_applier.hh

Introduce schema/ module

2023-02-15 11:01:50 +02:00

counters.cc

treewide: use fmtlib when printing UUID

2023-03-20 15:38:45 +08:00

counters.hh

types: move types.{cc,hh} into types

2023-02-19 21:05:45 +02:00

cql_serialization_format.hh

treewide: drop cql_serialization_format

2023-01-03 19:54:13 +02:00

db_clock.hh

…

debug.cc

test: extract debug::the_database out

2023-01-19 17:42:23 +08:00

debug.hh

…

default.nix

build: nix: switch to non-static zstd

2023-02-17 10:29:34 +02:00

Doxyfile

…

duration.cc

duration.cc: s/std::regex/boost::regex/

2023-04-06 09:50:37 -04:00

duration.hh

…

encoding_stats.hh

…

enum_set.hh

…

fix_system_distributed_tables.py

…

flake.lock

build: bump Lua version (5.3 -> 5.4) in Nix devenv

2023-01-19 15:53:49 +01:00

flake.nix

build: fix Nix devenv

2022-12-19 20:53:07 +02:00

frozen_schema.cc

Introduce mutation/ module

2023-02-14 11:19:03 +02:00

frozen_schema.hh

Introduce schema/ module

2023-02-15 11:01:50 +02:00

full_position.hh

Introduce mutation/ module

2023-02-14 11:19:03 +02:00

gc_clock.hh

utils: move hashing related files to utils/ module

2023-02-17 07:19:52 +02:00

gdbinit

gdbinit: add ignore clause for SIG35

2023-01-12 12:13:04 +02:00

gen_segmented_compress_params.py

…

generic_server.cc

Merge 'Optimize topology::compare_endpoints' from Benny Halevy

2023-03-07 15:17:19 +02:00

generic_server.hh

…

HACKING.md

commitlog: use separate directory for schema commitlog

2023-03-30 21:55:50 +04:00

hashing_partition_visitor.hh

utils: move hashing related files to utils/ module

2023-02-17 07:19:52 +02:00

idl-compiler.py

idl-compiler: mark captured this used

2023-02-28 21:56:55 +08:00

inet_address_vectors.hh

…

init.cc

treewide: use fmtlib to format gms::inet_address

2023-03-27 20:06:45 +08:00

init.hh

configurables: Add optional service lookup to init callback

2023-03-14 17:13:52 +02:00

install-dependencies.sh

build: add wasm compilation target for rust

2023-03-21 10:30:08 +02:00

install.sh

configure.py: build and use libseastar.so in debug and dev modes

2023-02-27 21:08:34 +02:00

interval.hh

…

keys.cc

add utf8:validate to operator<< partition_key with_schema.

2022-09-22 16:42:31 +03:00

keys.hh

keys: disambiguate construction from initializer_list<bytes>

2023-03-21 13:42:49 +02:00

LICENSE.AGPL

…

log.hh

…

main.cc

Merge 'Topology: introduce nodes' from Benny Halevy

2023-04-06 13:47:22 +03:00

map_difference.hh

…

marshal_exception.hh

…

multishard_mutation_query.cc

readers/multishard: reader_lifecycle_policy: add get_read_range()

2023-03-24 08:40:11 -04:00

multishard_mutation_query.hh

Introduce schema/ module

2023-02-15 11:01:50 +02:00

mutation_query.cc

Introduce mutation/ module

2023-02-14 11:19:03 +02:00

mutation_query.hh

Introduce mutation/ module

2023-02-14 11:19:03 +02:00

noexcept_traits.hh

…

NOTICE.txt

…

ORIGIN

…

partition_builder.hh

Introduce schema/ module

2023-02-15 11:01:50 +02:00

partition_range_compat.hh

…

partition_slice_builder.cc

treewide: drop cql_serialization_format

2023-01-03 19:54:13 +02:00

partition_slice_builder.hh

Introduce schema/ module

2023-02-15 11:01:50 +02:00

partition_snapshot_reader.hh

Introduce mutation/ module

2023-02-14 11:19:03 +02:00

partition_snapshot_row_cursor.hh

partition_snapshot_row_cursor: do not use operator<< when printing position

2023-03-31 19:03:14 +08:00

protocol_server.hh

…

querier.cc

treewide: do not define/capture unused variables

2023-02-15 22:57:18 +02:00

querier.hh

treewide: do not define/capture unused variables

2023-02-15 22:57:18 +02:00

query_class_config.hh

…

query_id.hh

query_id: extract into new header

2023-03-01 10:25:25 +02:00

query_ranges_to_vnodes.cc

query_ranges_to_vnodes_generator: fix for exclusive boundaries

2023-02-07 16:02:31 +02:00

query_ranges_to_vnodes.hh

Introduce schema/ module

2023-02-15 11:01:50 +02:00

query_result_merger.hh

…

query-request.hh

query_id: extract into new header

2023-03-01 10:25:25 +02:00

query-result-reader.hh

utils: move hashing related files to utils/ module

2023-02-17 07:19:52 +02:00

query-result-set.cc

Introduce mutation/ module

2023-02-14 11:19:03 +02:00

query-result-set.hh

types: move types.{cc,hh} into types

2023-02-19 21:05:45 +02:00

query-result-writer.hh

types: move types.{cc,hh} into types

2023-02-19 21:05:45 +02:00

query-result.hh

utils: move hashing related files to utils/ module

2023-02-17 07:19:52 +02:00

query.cc

treewide: use fmtlib when printing UUID

2023-03-20 15:38:45 +08:00

range.hh

…

read_context.hh

Introduce schema/ module

2023-02-15 11:01:50 +02:00

reader_concurrency_semaphore.cc

reader_permit: set_trace_state(): emit trace message linking to previous page

2023-03-26 18:41:21 +03:00

reader_concurrency_semaphore.hh

reader_permit: keep trace_state pointer on permit

2023-03-22 04:58:01 -04:00

reader_permit.hh

reader_permit: refresh trace_state on new pages

2023-03-22 04:58:10 -04:00

README.md

Replacing user-group with community forum, added link to U. lesson on Spring Boot Fixed author/email details

2023-02-23 19:05:26 +02:00

real_dirty_memory_accounter.hh

dirty_memory_manager: move to replica module

2022-12-06 22:24:17 +02:00

release.cc

release: define SCYLLA_BUILD_MODE_STR by stringifying SCYLLA_BUILD_MODE

2022-08-25 16:50:42 +02:00

release.hh

release: define SCYLLA_BUILD_MODE_STR by stringifying SCYLLA_BUILD_MODE

2022-08-25 16:50:42 +02:00

reversibly_mergeable.hh

…

row_cache.cc

Fix use-after-move when initializing row cache with dummy entry

2023-03-31 19:46:53 +03:00

row_cache.hh

row_cache: pass "const cache_entry" to operator<<

2023-03-16 07:46:11 +08:00

schema_mutations.cc

utils: move hashing related files to utils/ module

2023-02-17 07:19:52 +02:00

schema_mutations.hh

Introduce schema/ module

2023-02-15 11:01:50 +02:00

schema_upgrader.hh

Introduce mutation/ module

2023-02-14 11:19:03 +02:00

scylla_post_install.sh

scylla_coredump_setup: fix coredump timeout settings

2023-02-16 10:23:20 +02:00

scylla-gdb.py

compaction: rename compaction::task

2023-03-29 15:23:18 +02:00

SCYLLA-VERSION-GEN

release: prepare for 5.3.0-dev

2023-01-18 16:22:41 +02:00

seastarx.hh

…

serialization_visitors.hh

…

serializer_impl.hh

serializer_impl.hh: add reverse vector serializer

2022-11-14 16:06:24 +01:00

serializer.cc

…

serializer.hh

…

service_permit.hh

…

setup.py

…

shell.nix

build: improvements & upgrades to Nix dev environment

2022-10-02 11:47:16 +03:00

sstables_loader.cc

reader_permit: keep trace_state pointer on permit

2023-03-22 04:58:01 -04:00

sstables_loader.hh

Introduce schema/ module

2023-02-15 11:01:50 +02:00

supervisor.hh

…

table_helper.cc

…

table_helper.hh

…

test.py

test.py: Equip it with minio server

2023-04-10 16:43:01 +03:00

timeout_config.cc

timeout_config: remove unused make_timeout_config()

2023-03-29 20:17:45 +08:00

timeout_config.hh

timeout_config: remove unused make_timeout_config()

2023-03-29 20:17:45 +08:00

timestamp.hh

…

tombstone_gc_extension.hh

Introduce schema/ module

2023-02-15 11:01:50 +02:00

tombstone_gc_options.cc

…

tombstone_gc_options.hh

…

tombstone_gc.cc

Introduce schema/ module

2023-02-15 11:01:50 +02:00

tombstone_gc.hh

Introduce schema/ module

2023-02-15 11:01:50 +02:00

tox.ini

…

ubsan-suppressions.supp

…

unimplemented.cc

…

unimplemented.hh

…

validation.cc

validation: Avoid throwing schema lookup

2023-03-24 08:43:48 +02:00

validation.hh

Introduce schema/ module

2023-02-15 11:01:50 +02:00

version.hh

version: Reverse version increase

2022-12-12 18:45:32 +02:00

view_info.hh

Introduce schema/ module

2023-02-15 11:01:50 +02:00

vint-serialization.cc

…

vint-serialization.hh

…

zstd.cc

…

README.md

Scylla

What is Scylla?

Scylla is the real-time big data database that is API-compatible with Apache Cassandra and Amazon DynamoDB. Scylla embraces a shared-nothing approach that increases throughput and storage capacity to realize order-of-magnitude performance improvements and reduce hardware costs.

For more information, please see the ScyllaDB web site.

Build Prerequisites

Scylla is fairly fussy about its build environment, requiring very recent versions of the C++20 compiler and of many libraries to build. The document HACKING.md includes detailed information on building and developing Scylla, but to get Scylla building quickly on (almost) any build machine, Scylla offers a frozen toolchain. This is a pre-configured Docker image which includes recent versions of all the required compilers, libraries and build tools. Using the frozen toolchain allows you to avoid changing anything in your build machine to meet Scylla's requirements - you just need to meet the frozen toolchain's prerequisites (mostly, Docker or Podman being available).

Building Scylla

Building Scylla with the frozen toolchain dbuild is as easy as:

$ git submodule update --init --force --recursive
$ ./tools/toolchain/dbuild ./configure.py
$ ./tools/toolchain/dbuild ninja build/release/scylla

For further information, please see:

Developer documentation for more information on building Scylla.
Build documentation on how to build Scylla binaries, tests, and packages.
Docker image build documentation for information on how to build Docker images.

Running Scylla

To start Scylla server, run:

$ ./tools/toolchain/dbuild ./build/release/scylla --workdir tmp --smp 1 --developer-mode 1

This will start a Scylla node with one CPU core allocated to it and data files stored in the tmp directory. The --developer-mode is needed to disable the various checks Scylla performs at startup to ensure the machine is configured for maximum performance (not relevant on development workstations). Please note that you need to run Scylla with dbuild if you built it with the frozen toolchain.

For more run options, run:

$ ./tools/toolchain/dbuild ./build/release/scylla --help

Testing

See test.py manual.

Scylla APIs and compatibility

By default, Scylla is compatible with Apache Cassandra and its APIs - CQL and Thrift. There is also support for the API of Amazon DynamoDB™, which needs to be enabled and configured in order to be used. For more information on how to enable the DynamoDB™ API in Scylla, and the current compatibility of this feature as well as Scylla-specific extensions, see Alternator and Getting started with Alternator.

Documentation

Documentation can be found here. Seastar documentation can be found here. User documentation can be found here.

Training

Training material and online courses can be found at Scylla University. The courses are free, self-paced and include hands-on examples. They cover a variety of topics including Scylla data modeling, administration, architecture, basic NoSQL concepts, using drivers for application development, Scylla setup, failover, compactions, multi-datacenters and how Scylla integrates with third-party applications.

Contributing to Scylla

If you want to report a bug or submit a pull request or a patch, please read the contribution guidelines.

If you are a developer working on Scylla, please read the developer guidelines.

Contact

The community forum and Slack channel are for users to discuss configuration, management, and operations of ScyllaDB open source.
The developers mailing list is for developers and people interested in following the development of ScyllaDB to discuss technical topics.

Languages

C++ 72.3%

Python 26.5%

CMake 0.3%

GAP 0.3%

Shell 0.3%