mirror of https://github.com/scylladb/scylladb.git synced 2026-05-12 19:02:12 +00:00

Kamil Braun 3155cde9c8 sys_dist_ks: new table for exchanging CDC generations

Currently when a node wants to create and broadcast a new CDC generation
it performs the following steps:
1. choose the generation's stream IDs and mapping (how this is done is
   irrelevant for the current discussion)
2. choose the generation's timestamp by taking the current time
   (according to its local clock) and adding 2 * ring_delay
3. insert the generation's data (mapping and stream IDs) into
   system_distributed.cdc_generation_descriptions, using the
   generation's timestamp as the partition key (we call this table
   the "old internal table" below)
4. insert the generation's timestamp into the "CDC_STREAMS_TIMESTAMP"
   application state.

The timestamp spreads epidemically through the gossip protocol. When
nodes see the timestamp, they retrieve the generation data from the
old internal table.
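
The old procedure described above can be sketched roughly as follows. This is only an illustration, not Scylla code: the dictionaries standing in for the old internal table and for the gossip application state, the function names, and the default ring_delay value are all assumptions made for the example.

```python
import time

# Hypothetical default; the real ring_delay comes from the node's configuration.
RING_DELAY_SECONDS = 30.0


def old_broadcast(generation_data, table, gossip_state,
                  now=None, ring_delay=RING_DELAY_SECONDS):
    """Sketch of steps 2-4 of the old method (step 1 produces generation_data)."""
    now = time.time() if now is None else now
    # Step 2: the timestamp is chosen *before* anything is written.
    timestamp = now + 2 * ring_delay
    # Step 3: the whole generation is stored as a single value,
    # keyed by the timestamp (the partition key of the old table).
    table[timestamp] = generation_data
    # Step 4: only the timestamp is published through gossip.
    gossip_state["CDC_STREAMS_TIMESTAMP"] = timestamp
    return timestamp


def on_gossip(table, gossip_state):
    """What a node does when it sees the timestamp: look the data up by it."""
    return table[gossip_state["CDC_STREAMS_TIMESTAMP"]]
```

Note that in this sketch any delay in step 3 eats into the 2 * ring_delay margin, because the timestamp was already fixed in step 2.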

Unfortunately, due to the schema of the old internal table, where
the entire generation data is stored in a single cell, step 3 may fail for
sufficiently large generations (above a certain size threshold step 3
will always fail; retrying the operation won't help). Also, the old
internal table lies in the system_distributed keyspace, which uses
SimpleStrategy with replication factor 3. This is also problematic: for
example, when nodes restart, they must reach at least 2 out of these 3
specific replicas in order to retrieve the current generation (we write
and read the generation data with QUORUM, unless we're a single-node
cluster, where we use ONE). Until this happens, a restarting
node can't coordinate writes to CDC-enabled tables. It would be better
if the node could access the last known generation locally.
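
The "2 out of 3" requirement follows from the usual majority rule for QUORUM. A minimal sketch of that arithmetic, with hypothetical helper names and a deliberately simplified view of the cluster (it only distinguishes the single-node case mentioned above):

```python
def quorum(replication_factor: int) -> int:
    """Smallest majority of the replicas."""
    return replication_factor // 2 + 1


def replicas_needed(cluster_size: int, replication_factor: int = 3) -> int:
    """Replicas a restarting node must reach to read the generation."""
    # A single-node cluster reads and writes with ONE.
    if cluster_size == 1:
        return 1
    return quorum(replication_factor)


def can_fetch_generation(reachable_replicas: int, cluster_size: int,
                         replication_factor: int = 3) -> bool:
    """True if enough of the specific replicas are reachable."""
    return reachable_replicas >= replicas_needed(cluster_size, replication_factor)
```

With replication factor 3, a restarting node that can reach only one of the three replicas cannot retrieve the generation, no matter how many other nodes are up.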

This commit introduces a new table for broadcasting generation data with
the following properties:
-  it uses a better schema that stores the data in multiple rows, each
   of manageable size
-  it resides in the `system_distributed_everywhere` keyspace so the
   data will be written to every node in the cluster that has a token in
   the token ring
-  the data will be written using CL=ALL and read using CL=ONE; thanks
   to this, a restarting node won't have to communicate with other nodes
   to retrieve the data of the last known generation. Note that writing
   with CL=ALL does not reduce availability: creating a new generation
   *requires* all nodes to be available anyway, because they must learn
   about the generation before their clocks go past the generation's
   timestamp; if they don't, partitions won't be mapped to stream IDs
   consistently across the cluster
-  the partition key is no longer the generation's timestamp. Because it
   was that way in the old internal table, it forced the algorithm to
   choose the timestamp *before* the generation data was inserted into
   the table. If the insertion took a long time, it increased the
   chance that nodes would learn about the generation too late (after
   their clocks moved past its timestamp). With the new schema we will
   first insert the generation data using a randomly generated UUID as
   the partition key, *then* choose the timestamp, then gossip both the
   timestamp and the UUID. The timestamp and the UUID form the
   "generation identifier" of this new generation; this should explain
   why we introduced the generation_id_v2 type in previous commits.
   Observe that after a node learns through gossip about a generation
   broadcast using this new method, it will retrieve its data very
   quickly, since it's one of the replicas and it can use CL=ONE because
   the data was written using CL=ALL.
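
The new ordering (insert first, then choose the timestamp, then gossip both) might look like the sketch below. As before, this is an illustration under assumptions rather than the actual implementation: the dictionaries, the "generation_id" key, the function name and the injectable clock are all hypothetical.

```python
import time
import uuid


def new_broadcast(generation_rows, table, gossip_state,
                  clock=time.time, ring_delay=30.0):
    """Sketch of the new method: data first, timestamp afterwards."""
    # The partition key is a random UUID, so no timestamp is needed yet.
    generation_uuid = uuid.uuid4()
    # Insert the data as multiple rows (modelled here as a list).
    # However long this takes, the timestamp has not been chosen yet.
    table[generation_uuid] = list(generation_rows)
    # Only now choose the timestamp, so the full 2 * ring_delay margin
    # is available for gossip to spread the identifier.
    timestamp = clock() + 2 * ring_delay
    # The (timestamp, UUID) pair is the generation identifier.
    gossip_state["generation_id"] = (timestamp, generation_uuid)
    return timestamp, generation_uuid
```

Passing the clock in as a parameter makes the effect easy to observe: if the insertion is slow, the chosen timestamp moves later with it instead of the safety margin shrinking.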

Note that the node is still using the old method - the actual switch
will be done in a later commit.

2021-05-25 16:07:23 +02:00

.github

docs: added multiversion_regex_builder

2021-01-13 11:07:29 +02:00

abseil @ 9c6a50fdd8

Update abseil submodule

2021-02-08 15:41:46 +02:00

alternator

Merge "treewide: various header cleanups" from Pavel S

2021-05-24 14:24:20 +03:00

api

Merge "treewide: various header cleanups" from Pavel S

2021-05-24 14:24:20 +03:00

auth

Merge "treewide: various header cleanups" from Pavel S

2021-05-24 14:24:20 +03:00

cdc

tree-wide: introduce cdc::generation_id_v2

2021-05-24 17:50:21 +02:00

conf

config: relax batch size warning and failure thresholds

2021-04-06 20:56:06 +03:00

cql3

Merge "treewide: various header cleanups" from Pavel S

2021-05-24 14:24:20 +03:00

db

sys_dist_ks: new table for exchanging CDC generations

2021-05-25 16:07:23 +02:00

debug

…

dht

Merge 'token_metadata: Fix get_all_endpoints to return nodes in the ring' from Asias He

2021-05-11 18:39:10 +03:00

dist

install.sh: Setup aio-max-nr upon installation

2021-05-24 14:24:20 +03:00

docs

docs: add a paragraph describing service level timeouts

2021-05-10 12:39:41 +02:00

exceptions

cql: fix error return from execution of fromJson() and other functions

2021-01-21 15:21:13 +01:00

gms

tree-wide: introduce cdc::generation_id_v2

2021-05-24 17:50:21 +02:00

idl

Merge 'Switch to use NODE_OPS_CMD for decommission and bootstrap operation' from Asias He

2021-05-06 17:28:19 +03:00

index

treewide: reduce boost headers usage in scylla header files

2021-05-20 01:33:18 +03:00

interface

…

libdeflate @ e7e54eab42

…

licenses

…

locator

treewide: reduce boost headers usage in scylla header files

2021-05-20 01:33:18 +03:00

message

gossip: Introduce direct failure detector

2021-05-24 10:47:06 +03:00

mutation_writer

mutation_writer: add segregate_by_partition

2021-05-05 12:03:42 +03:00

raft

Merge "raft: fsm cleanups" from Gleb

2021-05-14 17:24:59 +02:00

redis

redis: drop unused fields _storage_proxy and _requests_blocked_memory

2021-05-21 20:58:32 +03:00

reloc

reloc: Remove "build_reloc.sh" script as obsolete

2020-11-20 22:41:26 +02:00

repair

repair: drop unused _nr_peer_nodes field

2021-05-21 20:59:23 +03:00

scripts

scripts: introduce coverage.py

2021-05-07 15:54:49 +03:00

seastar @ 28dddd2683

Update seastar submodule

2021-05-20 20:14:15 +03:00

service

sys_dist_ks: new table for exchanging CDC generations

2021-05-25 16:07:23 +02:00

sstables

Merge "treewide: various header cleanups" from Pavel S

2021-05-24 14:24:20 +03:00

streaming

streaming: drop unused fields

2021-05-21 21:03:23 +03:00

swagger-ui @ 12f1da1082

…

test

Merge "treewide: various header cleanups" from Pavel S

2021-05-24 14:24:20 +03:00

thrift

treewide: remove extraneous database.hh includes from headers

2021-05-20 01:59:14 +03:00

tools

Update tools/jmx submodule (toppartitions multi-sampler query)

2021-05-11 18:39:10 +03:00

tracing

treewide: remove inclusions of storage_proxy.hh from headers

2021-04-20 21:23:00 +03:00

transport

transport: remove extraneous qos/service_level_controller includes from headers

2021-05-20 02:32:15 +03:00

types

treewide: reduce boost headers usage in scylla header files

2021-05-20 01:33:18 +03:00

unified

unified/uninstall.sh: simplify uninstall.sh, delete all files correctly

2021-05-18 14:55:18 +02:00

utils

Merge "treewide: various header cleanups" from Pavel S

2021-05-24 14:24:20 +03:00

.dockerignore

…

.gitattributes

…

.gitignore

docs: added theme

2020-12-03 17:37:18 +01:00

.gitmodules

…

.gitorderfile

…

absl-flat_hash_map.cc

…

absl-flat_hash_map.hh

…

atomic_cell_hash.hh

imr: switch back to open-coded description of structures

2021-02-16 23:43:07 +01:00

atomic_cell_or_collection.hh

imr: switch back to open-coded description of structures

2021-02-16 23:43:07 +01:00

atomic_cell.cc

atomic_cell: fix operator<< for atomic_cell_or_collection

2021-02-22 14:45:34 +02:00

atomic_cell.hh

atomic_cell: get rid of is_value_fragments

2021-05-09 11:08:53 +03:00

backlog_controller.hh

…

bytes_ostream.hh

bytes_ostream: convert write_placeholder from enable_if to concepts

2021-03-22 12:00:07 +01:00

bytes.cc

…

bytes.hh

bytes: implement std::hash using appending_hash

2021-01-08 13:17:46 +01:00

cache_flat_mutation_reader.hh

row_cache: Avoid generating overlapping range tombstones

2021-05-12 00:10:24 +02:00

cache_temperature.hh

…

caching_options.cc

caching_options.hh: move code to .cc

2021-04-05 13:05:43 +03:00

caching_options.hh

caching_options.hh: move code to .cc

2021-04-05 13:05:43 +03:00

canonical_mutation.cc

canonical_mutation: make the data type non-contiguous

2021-02-15 10:24:47 +01:00

canonical_mutation.hh

canonical_mutation: make the data type non-contiguous

2021-02-15 10:24:47 +01:00

cartesian_product.hh

cartesian_product: Remove std::iterator from iterator

2020-11-17 16:53:20 +01:00

cell_locking.hh

…

checked-file-impl.hh

files: Construct file_impls properly

2021-03-26 00:22:11 +01:00

clocks-impl.cc

…

clocks-impl.hh

…

clustering_bounds_comparator.hh

clustering_bounds_comparator: do not depend on implicit conversion of keys to bytes_view

2020-12-20 15:14:44 +01:00

clustering_interval_set.hh

clustering_interval_set: Remove std::iterator from position_range_iterator

2020-11-17 16:53:20 +01:00

clustering_key_filter.hh

…

clustering_ranges_walker.hh

clustering_range_walker: fix false discontiguity detected after a static row

2021-02-01 19:32:07 +02:00

CMakeLists.txt

db: Add virtual tables interface

2021-05-12 17:05:34 +02:00

collection_mutation.cc

collection_mutation: don't linearize collection values

2021-05-23 12:16:56 +03:00

collection_mutation.hh

collection_mutation: don't linearize collection values

2021-05-23 12:16:56 +03:00

column_computation.hh

Reduce dependency on header utils/rjson.hh

2021-04-25 13:20:51 +03:00

combine.hh

…

compaction_garbage_collector.hh

…

compaction_strategy_type.hh

…

compaction_strategy.hh

…

compatible_ring_position.hh

dht: ring_position, decorated_key: convert tri_comparators to std::strong_ordering

2021-03-18 12:40:05 +02:00

compound_compat.hh

composite: replace enable_if with constraints

2021-04-04 13:56:51 +03:00

compound.hh

keys, compound: take the argument to from_single_value() by reference

2021-05-24 11:20:24 +03:00

compress.cc

…

compress.hh

…

concrete_types.hh

…

configure.py

build: enable -Wunused-private-field warning

2021-05-21 21:05:16 +03:00

connection_notifier.cc

treewide: remove inclusions of storage_proxy.hh from headers

2021-04-20 21:23:00 +03:00

connection_notifier.hh

code: Use qctx::evecute_cql methods, not global ones

2020-11-19 18:39:05 +03:00

CONTRIBUTING.md

CONTRIBUTING.md: add the requirement for self-contained headers

2021-05-05 15:10:46 +03:00

converting_mutation_partition_applier.cc

imr: switch back to open-coded description of structures

2021-02-16 23:43:07 +01:00

converting_mutation_partition_applier.hh

…

counters.cc

treewide: reduce boost headers usage in scylla header files

2021-05-20 01:33:18 +03:00

counters.hh

treewide: reduce boost headers usage in scylla header files

2021-05-20 01:33:18 +03:00

cql_serialization_format.hh

…

database_fwd.hh

…

database.cc

Merge "treewide: various header cleanups" from Pavel S

2021-05-24 14:24:20 +03:00

database.hh

utils: phased_barrier: advance_and_await: make noexcept

2021-05-12 01:36:11 +02:00

db_clock.hh

…

debug.hh

…

default.nix

build: add nix-shell support

2021-04-14 13:15:59 +02:00

digest_algorithm.hh

…

digester.hh

…

dirty_memory_manager.hh

…

distributed_loader.cc

treewide: reduce boost headers usage in scylla header files

2021-05-20 01:33:18 +03:00

distributed_loader.hh

distributed_loader: Add get_sstables_from_upload_dir

2021-01-16 20:03:17 +08:00

Doxyfile

…

duration.cc

…

duration.hh

…

encoding_stats.hh

…

enum_set.hh

…

fix_system_distributed_tables.py

tracing: add username to the session table

2020-10-01 04:46:40 +02:00

flat_mutation_reader.cc

mutation_fragment_stream_validator: add reset methods

2021-05-05 12:03:42 +03:00

flat_mutation_reader.hh

flat_mutation_reader: consume_mutation_fragments_until: maybe yield after each popped mutation_fragment

2021-05-03 14:06:26 +03:00

frozen_mutation.cc

flat_mutation_reader: make sure to close flat_mutation_reader_from_mutations

2021-04-25 11:25:47 +03:00

frozen_mutation.hh

Merge "lwt: store column_mapping's for each table schema version upon a DDL change" from Pavel Solodovnikov

2020-10-15 20:48:29 +02:00

frozen_schema.cc

frozen_schema: order idl implementations correctly

2020-10-03 19:56:28 +03:00

frozen_schema.hh

…

gc_clock.hh

…

gen_segmented_compress_params.py

…

generic_server.cc

generic_server: Rename "maybe_idle" to "maybe_stop"

2021-04-13 14:13:24 +03:00

generic_server.hh

generic_server: Rename "maybe_idle" to "maybe_stop"

2021-04-13 14:13:24 +03:00

HACKING.md

Merge "Improve coverage support" from Botond

2021-05-11 18:39:10 +03:00

hashers.cc

…

hashers.hh

…

hashing_partition_visitor.hh

…

hashing.hh

hashing: appending_hash: convert from enable_if to concepts

2021-03-17 09:59:22 +02:00

idl-compiler.py

idl-compiler: allow fields of type utils::chunked_vector

2021-01-13 04:09:18 +01:00

inet_address_vectors.hh

storage_proxy, treewide: use utils::small_vector inet_address_vector:s

2021-05-05 18:36:54 +03:00

init.cc

treewide: reduce boost headers usage in scylla header files

2021-05-20 01:33:18 +03:00

init.hh

cross-tree: reduce dependency on db/config.hh and database.hh

2021-05-05 13:23:00 +03:00

install-dependencies.sh

build: drop lld from install-dependencies.sh on s390x

2021-04-12 09:46:33 +03:00

install.sh

install.sh: apply correct file security context when copying files

2021-05-18 12:09:51 +03:00

interval.hh

interval: support C++20 three-way comparisons

2021-02-28 21:03:25 +02:00

intrusive_set_external_comparator.hh

…

keys.cc

keys: convert trichotomic comparators to return std::strong_ordering

2021-03-21 09:30:43 +02:00

keys.hh

keys, compound: take the argument to from_single_value() by reference

2021-05-24 11:20:24 +03:00

LICENSE.AGPL

…

lister.cc

…

lister.hh

…

log.hh

…

lua.cc

cross-tree: reduce dependency on db/config.hh and database.hh

2021-05-05 13:23:00 +03:00

lua.hh

cross-tree: reduce dependency on db/config.hh and database.hh

2021-05-05 13:23:00 +03:00

main.cc

Merge "treewide: various header cleanups" from Pavel S

2021-05-24 14:24:20 +03:00

map_difference.hh

…

marshal_exception.hh

…

memtable-sstable.hh

table: Add write_memtable_to_sstable variant which accepts flat_mutation_reader

2021-01-04 16:23:00 -03:00

memtable.cc

memtable: flush_reader: make sure to close partition reader

2021-04-25 11:35:07 +03:00

memtable.hh

memtable: Track min timestamp

2021-01-04 13:24:43 -03:00

multishard_mutation_query.cc

multishard_mutation_query: save_reader(): avoid round-trip for destroying rparts

2021-05-18 10:07:13 +03:00

multishard_mutation_query.hh

multishard_mutation_query: add query_data_on_all_shards()

2021-03-02 07:53:53 +02:00

mutation_cleaner.hh

…

mutation_compactor.hh

mutation compactor: query compaction: ignore purgeable tombstones

2021-01-22 15:27:48 +02:00

mutation_consumer_concepts.hh

flat_mutation_reader: move mutation consumer concepts to separate header

2021-01-22 15:27:48 +02:00

mutation_fragment_stream_validator.hh

mutation_fragment_stream_validator: add reset methods

2021-05-05 12:03:42 +03:00

mutation_fragment.cc

range_tombstone_stream: Remove unused methods

2021-03-16 12:08:18 +03:00

mutation_fragment.hh

clustering_row: Add new .apply() overload

2021-04-09 10:05:47 +03:00

mutation_partition_serializer.cc

imr: switch back to open-coded description of structures

2021-02-16 23:43:07 +01:00

mutation_partition_serializer.hh

…

mutation_partition_view.cc

mutation_fragment: add schema and permit

2020-09-28 11:27:23 +03:00

mutation_partition_view.hh

…

mutation_partition_visitor.hh

…

mutation_partition.cc

mutation_partition: counter_write_query: close reader when done

2021-04-25 11:35:07 +03:00

mutation_partition.hh

treewide: reduce boost headers usage in scylla header files

2021-05-20 01:33:18 +03:00

mutation_query.cc

mutation_query: move to_data_query_result() to mutation_partition.cc

2021-01-22 15:27:48 +02:00

mutation_query.hh

mutation_query: remove now unused mutation_query()

2021-04-09 13:40:27 +03:00

mutation_reader.cc

evictable_reader: remove _reader_created flag

2021-05-16 14:45:46 +03:00

mutation_reader.hh

mutation_reader: reader_lifecycle_policy: return future from destroy_reader

2021-04-25 11:35:07 +03:00

mutation_rebuilder.hh

mutation_rebuilder: drop unused field _remaining_limit

2021-05-21 20:57:33 +03:00

mutation_source_metadata.hh

…

mutation.cc

mutation: remove now unused query() and query_compacted()

2021-01-22 15:36:37 +02:00

mutation.hh

mutation: consume(): add reverse mode

2021-02-03 11:00:47 +02:00

noexcept_traits.hh

noexcept_traits: convert enable_if to concepts

2021-03-30 09:30:23 +02:00

NOTICE.txt

raft: etcd unit tests: initial boost tests

2021-01-18 12:33:12 -04:00

ORIGIN

…

partition_builder.hh

partition_builder: accept_row(): use append_clustering_row()

2020-12-02 15:08:49 +02:00

partition_range_compat.hh

treewide: reduce boost headers usage in scylla header files

2021-05-20 01:33:18 +03:00

partition_slice_builder.cc

…

partition_slice_builder.hh

…

partition_snapshot_reader.hh

treewide: reduce boost headers usage in scylla header files

2021-05-20 01:33:18 +03:00

partition_snapshot_row_cursor.hh

treewide: reduce boost headers usage in scylla header files

2021-05-20 01:33:18 +03:00

partition_version_list.hh

…

partition_version.cc

misc: fix indentation

2021-01-08 14:16:08 +01:00

partition_version.hh

row_cache: Zap dummy entries when populating or reading a range

2021-03-01 20:34:35 +02:00

position_in_partition.hh

position_in_partition: Convert tri_compare to strong_ordering

2021-04-09 18:20:39 +03:00

querier.cc

querier_cache: implement stop

2021-04-25 11:35:07 +03:00

querier.hh

querier_cache: implement stop

2021-04-25 11:35:07 +03:00

query_class_config.hh

query_class_config: add operator== for max_result_size

2021-05-05 11:20:22 +03:00

query_result_merger.hh

…

query-request.hh

query: partition_slice: add range_scan_data_variant option

2021-03-02 07:53:53 +02:00

query-result-reader.hh

treewide: reduce boost headers usage in scylla header files

2021-05-20 01:33:18 +03:00

query-result-set.cc

treewide: use query_mutations() instead of mutation::query()

2021-01-22 15:36:37 +02:00

query-result-set.hh

…

query-result-writer.hh

mutation_partition: mark query_result_builder constructor noexcept

2021-04-25 11:35:07 +03:00

query-result.hh

result_memory_accounter: abort unpaged queries hitting the global limit

2021-02-26 23:43:16 +02:00

query.cc

…

range_tombstone_list.cc

range_tombstone_list: Add new slice() helper

2021-03-16 11:55:28 +03:00

range_tombstone_list.hh

range_tombstone_list: Add new slice() helper

2021-03-16 11:55:28 +03:00

range_tombstone.cc

treewide: reduce boost headers usage in scylla header files

2021-05-20 01:33:18 +03:00

range_tombstone.hh

treewide: reduce boost headers usage in scylla header files

2021-05-20 01:33:18 +03:00

range.hh

…

read_context.hh

read_context: move_to_next_partition(): make reader creation atomic

2021-05-18 13:41:48 +03:00

reader_concurrency_semaphore.cc

reader_concurrency_semaphore: dump_reader_diagnostics(): print more information in the header

2021-05-10 10:15:47 +03:00

reader_concurrency_semaphore.hh

reader_concurrency_semaphore: dump_reader_diagnostics(): cap number of printed lines

2021-05-10 10:15:47 +03:00

reader_permit.hh

reader_permit: always forward resources

2021-04-26 15:56:56 +03:00

README.md

docs: fix invalid path in README.mds

2021-02-21 13:49:12 +02:00

real_dirty_memory_accounter.hh

…

release.cc

scylla: Add "--build-mode" command line option

2021-01-20 16:07:29 +02:00

release.hh

scylla: Add "--build-mode" command line option

2021-01-20 16:07:29 +02:00

reversibly_mergeable.hh

…

row_cache.cc

row_cache: create_underlying_reader: call read_context on_underlying_created only on success

2021-05-12 01:34:48 +02:00

row_cache.hh

row_cache: hold read_context as unique_ptr

2021-04-25 11:35:07 +03:00

schema_builder.hh

schema_tables: put schema tables on shard 0

2021-01-28 13:28:22 +02:00

schema_fwd.hh

…

schema_mutations.cc

uuid: reduce code dependency on UUID_gen.hh

2021-01-27 20:08:29 +02:00

schema_mutations.hh

…

schema_registry.cc

global_schema_ptr: add support for view's base table

2021-03-07 12:50:42 +02:00

schema_registry.hh

global_schema_ptr: add support for view's base table

2021-03-07 12:50:42 +02:00

schema_upgrader.hh

mutation_fragment: add schema and permit

2020-09-28 11:27:23 +03:00

schema.cc

Reduce dependency on header utils/rjson.hh

2021-04-25 13:20:51 +03:00

schema.hh

treewide: reduce boost headers usage in scylla header files

2021-05-20 01:33:18 +03:00

scylla_post_install.sh

…

scylla-gdb.py

scylla-gdb: scylla_io_queues: support io_group._max_bytes_count

2021-05-20 20:14:15 +03:00

SCYLLA-VERSION-GEN

version: prepare for the 4.6 cycle

2021-04-01 20:40:52 +03:00

seastarx.hh

…

serialization_visitors.hh

…

serializer_impl.hh

treewide: reduce boost headers usage in scylla header files

2021-05-20 01:33:18 +03:00

serializer.cc

serializer: add serializer<lw_shared_ptr<T>> specialization

2021-01-29 01:58:46 +03:00

serializer.hh

serializer: implement FragmentedView for buffer_view

2020-11-27 15:26:13 +01:00

service_permit.hh

service_permit: add a getter for the number of units held

2021-03-29 11:34:18 +02:00

setup.py

…

shell.nix

build: add nix-shell support

2021-04-14 13:15:59 +02:00

supervisor.hh

…

table_helper.cc

table_helper: Use query_processor::get_migration_manager()

2021-03-15 19:35:53 +03:00

table_helper.hh

table_helper: Require local query processor in calls

2020-10-06 15:44:20 +03:00

table.cc

Merge "treewide: various header cleanups" from Pavel S

2021-05-24 14:24:20 +03:00

test.py

test.py: refine test mode control

2021-05-11 18:39:10 +03:00

timeout_config.cc

…

timeout_config.hh

…

timestamp.hh

…

to_string.hh

…

tombstone.hh

…

tox.ini

…

types.cc

Merge "treewide: various header cleanups" from Pavel S

2021-05-24 14:24:20 +03:00

types.hh

Merge "treewide: various header cleanups" from Pavel S

2021-05-24 14:24:20 +03:00

ubsan-suppressions.supp

suppress ubsan error in boost::deque::clear()

2020-11-09 11:25:19 +02:00

unimplemented.cc

…

unimplemented.hh

…

user_types_metadata.hh

…

validation.cc

validation: Remove get_local_storage_proxy call

2020-12-11 18:52:42 +03:00

validation.hh

validation: Remove get_local_storage_proxy call

2020-12-11 18:52:42 +03:00

version.hh

…

view_info.hh

…

vint-serialization.cc

…

vint-serialization.hh

vint-serialization: Reference the correct spec

2021-01-05 18:54:09 +02:00

xx_hasher.hh

…

zstd.cc

…

README.md

Scylla

What is Scylla?

Scylla is the real-time big data database that is API-compatible with Apache Cassandra and Amazon DynamoDB. Scylla embraces a shared-nothing approach that increases throughput and storage capacity to realize order-of-magnitude performance improvements and reduce hardware costs.

For more information, please see the ScyllaDB web site.

Build Prerequisites

Scylla is fairly fussy about its build environment, requiring a very recent C++20 compiler and very recent versions of many libraries to build. The document HACKING.md includes detailed information on building and developing Scylla, but to get Scylla building quickly on (almost) any build machine, Scylla offers a frozen toolchain. This is a pre-configured Docker image which includes recent versions of all the required compilers, libraries and build tools. Using the frozen toolchain allows you to avoid changing anything in your build machine to meet Scylla's requirements - you just need to meet the frozen toolchain's prerequisites (mostly, Docker or Podman being available).

Building Scylla

Building Scylla with the frozen toolchain dbuild is as easy as:

$ git submodule update --init --force --recursive
$ ./tools/toolchain/dbuild ./configure.py
$ ./tools/toolchain/dbuild ninja build/release/scylla

For further information, please see:

Developer documentation for more information on building Scylla.
Build documentation on how to build Scylla binaries, tests, and packages.
Docker image build documentation for information on how to build Docker images.

Running Scylla

To start the Scylla server, run:

$ ./tools/toolchain/dbuild ./build/release/scylla --workdir tmp --smp 1 --developer-mode 1

This will start a Scylla node with one CPU core allocated to it and data files stored in the tmp directory. The --developer-mode option is needed to disable the various checks Scylla performs at startup to ensure the machine is configured for maximum performance (not relevant on development workstations). Please note that you need to run Scylla with dbuild if you built it with the frozen toolchain.

For more run options, run:

$ ./tools/toolchain/dbuild ./build/release/scylla --help

Testing

See test.py manual.

Scylla APIs and compatibility

By default, Scylla is compatible with Apache Cassandra and its APIs - CQL and Thrift. There is also support for the API of Amazon DynamoDB™, which needs to be enabled and configured in order to be used. For more information on how to enable the DynamoDB™ API in Scylla, and the current compatibility of this feature as well as Scylla-specific extensions, see Alternator and Getting started with Alternator.

Documentation

Documentation can be found here. Seastar documentation can be found here. User documentation can be found here.

Training

Training material and online courses can be found at Scylla University. The courses are free, self-paced and include hands-on examples. They cover a variety of topics including Scylla data modeling, administration, architecture, basic NoSQL concepts, using drivers for application development, Scylla setup, failover, compactions, multi-datacenters and how Scylla integrates with third-party applications.

Contributing to Scylla

If you want to report a bug or submit a pull request or a patch, please read the contribution guidelines.

If you are a developer working on Scylla, please read the developer guidelines.

Contact

The users mailing list and Slack channel are for users to discuss configuration, management, and operations of open-source ScyllaDB.
The developers mailing list is for developers and people interested in following the development of ScyllaDB to discuss technical topics.

Languages

C++ 72.5%

Python 26.2%

CMake 0.4%

GAP 0.3%

Shell 0.3%