Go to file

Andrzej Jackowski ea2bdae45a mapreduce: add tablet-aware dispatching algorithm

The primary goal of this change is to reduce the time during which the
Effective Replication Map (ERM) is retained by the mapreduce service.
This ensures that long aggregate queries do not block topology
operations. As ScyllaDB transitions towards tablets, which simplify
work dispatching, the new algorithm is designed specifically for
tablets.

The algorithm divides work so that each `tablet_replica` (a <host,
shard> pair) processes two tablets at a time. After processing of each
`tablet_replica`, the ERM is released and re-acquired.

The new algorithm can be summarized as follows:
1. Prepare a set of exclusive `partition_ranges`, where each range
   represents one tablet. This set is called `ranges_left`, because it
   contains ranges that still need processing.
2. Loop until `ranges_left` is empty:
   I.  Create `tablet_replica` -> `ranges` mapping for the current ERM
       and `ranges_left`. Store this mapping and the number
       representing current ERM version as `ranges_per_replica`.
   II. In parallel, for each tablet_replica, iterate through
       ranges_per_tablet_replica. Select independently up to two ranges
       that are still existing in ranges_left. Remove each range
       selected for processing from ranges_left. Before each iteration,
       verify that ERM version has not changed. If it has,
       return to Step I.

Steps I and II are exclusive to simplify maintaining `ranges_left` and
`ranges_per_replica`:
 - Step I iterates through `ranges_left` and creates
   `ranges_per_replica`
 - Step II iterates through `ranges_per_replica` and remove processed
   ranges from `ranges_left`

To maintain the exclusivity, the algorithm uses `parallel_for_each` in
Step II, requiring all ongoing `tablet_replica` processing to finish
before returning to Step I.

Currently, each node can handle any partition range, even if the
mapreduce supercoordinator does not retain the ERM and the range is
absent locally. This is because `execute_on_this_shard` creates a new
pager to coordinate the partition range read, including obtaining its
own ERM. However, absent ranges are handled by shard 0, so proper
routing is necessary to avoid overloading shard 0. Thus, in Step II,
the ERM is retained during each `tablet_replica` processing.

The tablet split scenario is not well-handled in this implementation.
After a split, the entire pre-split range is sent to a node hosting
the `tablet_replica` containing the range's `end_token`. The node
will typically not have other tablets in the range, and as
aforementioned, absent ranges are handled by shard 0. As a result,
in such scenario, shard 0 handles a significant portion of the range.
This issue is addressed later in this patch series by introducing
`shard_id` in `mapreduce_request`.

Ref. scylladb#21831

2025-06-25 10:18:02 +02:00

.github

.github/workflows/conflict_reminder: reduce the amount of conflict reminder for every push event

2025-05-28 11:01:44 +03:00

abseil @ d7aaad83b4

…

alternator

Merge 'Add Support for Per-Table Metrics in Alternator' from Amnon Heiman

2025-06-08 10:42:05 +03:00

api

Merge 'Replace container_to_vec with std::ranges' from Pavel Emelyanov

2025-06-06 10:57:06 +03:00

audit

audit: add semaphore to audit_syslog_storage_helper

2025-04-08 16:24:42 +02:00

auth

untyped_result_set: mark get_blob() as returning unfragmented data

2025-05-26 09:40:34 +02:00

bin

treewide: improve bash error reporting

2025-02-10 18:28:52 +03:00

cdc

cdc: add sanity check for generating an empty generation

2025-04-25 11:25:07 +02:00

cmake

build: when compiling without -g, don't leave debugging information

2025-05-12 15:42:17 +03:00

compaction

Merge 'Extend compaction_history table with additional compaction statistics' from Łukasz Paszkowski

2025-05-27 14:12:13 +03:00

conf

Merge 'Add tablet enforcing option' from Benny Halevy

2025-04-03 16:32:19 +03:00

cql3

Merge 'transport: Implement SCYLLA_USE_METADATA_ID support' from Andrzej Jackowski

2025-05-29 12:27:31 +03:00

data_dictionary

Merge 'scylla-sstable: add native S3 support' from Ernest Zaslavsky

2025-03-14 15:05:52 +02:00

Merge 'Extend compaction_history table with additional compaction statistics' from Łukasz Paszkowski

2025-05-27 14:12:13 +03:00

debug

…

dht

dht: fragment token_range_vector

2025-05-27 14:47:24 +03:00

dist

dist: rpm: override %_sbindir for Fedora 42

2025-05-16 12:05:29 +02:00

docs

doc: add the upgrade guide from 2025.1 to 2025.2

2025-06-04 14:00:05 +03:00

ent

encryption/utils: Move encryption httpclient to "general" REST client

2025-05-30 12:21:51 +03:00

exceptions

transport: storage_proxy: release ERM when waiting for query timeout

2025-04-23 09:29:47 +02:00

gms

load_and_stream: Add abortion flow to mutation streaming

2025-05-27 14:21:58 +03:00

idl

dht: fragment token_range_vector

2025-05-27 14:47:24 +03:00

index

vector_index: add custom index and vector index classes

2025-05-27 21:04:50 +02:00

lang

treewide: Reduce db/config.hh header fanout

2025-02-25 15:16:40 +01:00

licenses

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

locator

alternator: Add support for TTL when using tablets

2025-06-05 17:39:29 +03:00

message

dht: fragment token_range_vector

2025-05-27 14:47:24 +03:00

mutation

Merge 'Move mutation_fragment_v2::kind into mutation_fragment_v2::data, mutation_fragment::kind into mutation_fragment::data' from Radosław Cybulski

2025-06-02 10:57:17 +03:00

mutation_writer

readers/mutation_reader: s/reader_consumer_v2/mutation_reader_consumer/

2025-05-09 07:53:29 -04:00

node_ops

tasks: check whether a node is alive before rpc

2025-04-17 12:51:22 +02:00

pgo

Update pgo profiles - aarch64

2025-06-01 04:45:06 +03:00

raft

raft: server_impl: use named gate

2025-04-12 11:28:48 +03:00

readers

compacting_reader: Extend to accept tombstone purge statistics

2025-05-16 20:00:00 +02:00

redis

transport: generic_server: remove no longer used connection advertising code

2025-05-27 19:31:09 +02:00

reloc

treewide: improve bash error reporting

2025-02-10 18:28:52 +03:00

repair

readers/queue: drop v2 from reader and related names

2025-05-09 07:53:29 -04:00

replica

replica: Fix truncate assert failure

2025-06-08 15:59:15 +03:00

rust

rust: update dependencies

2025-03-04 09:45:23 +02:00

schema

Merge 'index: implement schema management layer for vector search indexes' from null

2025-05-22 12:19:36 +03:00

scripts

build: fix merge-compdb.py for CMake 'output' attributes

2025-05-20 08:43:09 +03:00

seastar @ 26badcb14c

Update seastar submodule

2025-06-03 13:47:05 +03:00

service

mapreduce: add tablet-aware dispatching algorithm

2025-06-25 10:18:02 +02:00

sstables

replica: Fix truncate assert failure

2025-06-08 15:59:15 +03:00

streaming

Merge 'token_range_vector: fragment' from Avi Kivity

2025-05-29 18:45:13 +02:00

swagger-ui @ 12f1da1082

…

tasks

test: add test for getting tasks children

2025-04-17 13:48:44 +02:00

test

mapreduce: remove _shared_token_metadata from mapreduce_service

2025-06-25 08:42:16 +02:00

tools

Add nodetool refresh --scope option

2025-05-29 16:12:09 +03:00

tracing

tracing: trace_keyspace_helper: use named gate

2025-04-12 11:29:48 +03:00

transport

Merge 'generic_server: transport: improve stats counting and shedding' from Marcin Maliszkiewicz

2025-05-29 12:49:58 +02:00

types

allow "UTC" and "GMT" in string format of timestamp

2025-02-12 09:38:28 +02:00

unified

treewide: improve bash error reporting

2025-02-10 18:28:52 +03:00

utils

disk_space_monitor: add space_source_registration

2025-06-04 16:25:24 +03:00

.clang-format

clang-format: argument and function packing

2024-10-04 14:52:41 +02:00

.dockerignore

…

.gitattributes

configure.py: prepare the build for a default PGO profile in version control

2024-12-27 16:16:04 +08:00

.gitignore

Add .idea folder to .gitignore

2024-09-20 11:49:41 +03:00

.gitmodules

build: replace tools/java submodule with packaged cassandra-stress

2025-04-15 10:11:28 +03:00

.gitorderfile

…

.mailmap

…

absl-flat_hash_map.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

absl-flat_hash_map.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

amplify.yml

…

backlog_controller.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

build_mode.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

bytes_fwd.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

bytes_ostream.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

bytes.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

bytes.hh

bytes: adapt fmt_hex to std::span<const std::byte>

2025-04-01 00:07:27 +02:00

cache_temperature.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

cartesian_product.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

cell_locking.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

client_data.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

client_data.hh

transport/server: use scheduling group assigned to current user

2025-01-02 07:13:34 +01:00

clocks-impl.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

clocks-impl.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

clustering_bounds_comparator.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

clustering_interval_set.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

clustering_key_filter.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

clustering_ranges_walker.hh

clustering_range_walker: drop boost iterator_range dependency

2025-02-17 11:34:46 +03:00

CMakeLists.txt

Move direct_failure_detector from root to service/

2025-04-08 13:03:24 +03:00

collection_mutation.cc

tree: Remove unused boost headers

2025-02-15 20:32:22 +02:00

collection_mutation.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

column_computation.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

combine.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

compound_compat.hh

utils: do not include unused headers

2025-01-14 07:56:39 -05:00

compound.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

compress.cc

compress: fix a use-after-free in dictionary_holder::get_recommended_dict()

2025-06-03 10:42:38 +03:00

compress.hh

compress: add some test-only APIs

2025-05-07 14:43:20 +02:00

concrete_types.hh

types: implement vector_type_impl

2025-01-26 19:36:41 +01:00

configure.py

encryption/utils: Move encryption httpclient to "general" REST client

2025-05-30 12:21:51 +03:00

CONTRIBUTING.md

Fix typos

2025-02-11 00:17:43 +02:00

converting_mutation_partition_applier.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

converting_mutation_partition_applier.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

counters.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

counters.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

coverage_excludes.txt

…

coverage_sources.list

…

cql_serialization_format.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

db_clock.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

debug.cc

gdb: protect debug::the_database from lto

2025-01-23 22:26:04 +02:00

debug.hh

gdb: protect debug::the_database from lto

2025-01-23 22:26:04 +02:00

default.nix

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

Doxyfile

…

duration.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

duration.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

encoding_stats.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

enum_set.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

fix_system_distributed_tables.py

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

flake.lock

…

flake.nix

…

frozen_schema.cc

schema_registry: store base info instead of base schema for view entries

2025-04-24 01:08:39 +02:00

frozen_schema.hh

schema_registry: store base info instead of base schema for view entries

2025-04-24 01:08:39 +02:00

full_position.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

gc_clock.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

gdbinit

…

gen_segmented_compress_params.py

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

generic_server.cc

generic_server: make shutdown() return void

2025-05-27 19:31:09 +02:00

generic_server.hh

generic_server: make shutdown() return void

2025-05-27 19:31:09 +02:00

HACKING.md

build: replace tools/java submodule with packaged cassandra-stress

2025-04-15 10:11:28 +03:00

hashing_partition_visitor.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

idl-compiler.py

idl, message: make with_timeout and cancellable verb attributes composable

2025-04-30 11:45:51 +03:00

inet_address_vectors.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

init.cc

compress: distribute compression dictionaries over shards

2025-05-07 14:43:18 +02:00

init.hh

Move object_storage.yaml endpoints to scylla.yaml

2025-03-31 13:39:39 +03:00

install-dependencies.sh

toolchain: set scylla-driver release based on tools/cqlsh

2025-05-15 06:08:14 +03:00

install.sh

install.sh: simplify check_usermode_support()

2025-02-24 11:29:30 +03:00

interval.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

keys.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

keys.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

LICENSE-ScyllaDB-Source-Available.md

Fix typos

2025-02-13 01:54:08 +02:00

main.cc

mapreduce: remove _shared_token_metadata from mapreduce_service

2025-06-25 08:42:16 +02:00

map_difference.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

marshal_exception.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

multishard_mutation_query.cc

readers/mutation_source: s/make_reader_v2/make_mutation_reader/

2025-05-09 07:53:29 -04:00

multishard_mutation_query.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

mutation_query.cc

schema: deinline some speculative_retry methods

2025-01-02 12:28:33 +01:00

mutation_query.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

NOTICE.txt

…

ORIGIN

…

partition_builder.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

partition_range_compat.hh

dht: fragment token_range_vector

2025-05-27 14:47:24 +03:00

partition_slice_builder.cc

tree: Remove unused boost headers

2025-02-25 10:32:32 +03:00

partition_slice_builder.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

partition_snapshot_reader.hh

moved cache files to db

2025-02-04 12:21:31 +03:00

protocol_server.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

querier.cc

querier_cache: use named gate

2025-04-12 11:28:48 +03:00

querier.hh

readers/mutation_source: s/make_reader_v2/make_mutation_reader/

2025-05-09 07:53:29 -04:00

query_id.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

query_ranges_to_vnodes.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

query_ranges_to_vnodes.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

query_result_merger.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

query-request.hh

db: cql3: add comments regarding unsafe interval<clustering_key_prefix>

2025-02-26 12:01:28 +01:00

query-result-reader.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

query-result-set.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

query-result-set.hh

db: prevent accidental copies of result_set_row by making it move-only

2025-02-17 09:48:08 +02:00

query-result-writer.hh

query-result-writer: reorder initialization to prevent use-after-move

2025-02-17 13:45:35 +03:00

query-result.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

query.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

reader_concurrency_semaphore_group.cc

treewide: fix misspellings

2025-01-05 16:13:09 +02:00

reader_concurrency_semaphore_group.hh

tree: Remove unused boost headers

2025-02-15 20:32:22 +02:00

reader_concurrency_semaphore.cc

reader_concurrency_semaphore: use named gate

2025-04-12 11:28:48 +03:00

reader_concurrency_semaphore.hh

reader_concurrency_semaphore: use named gate

2025-04-12 11:28:48 +03:00

reader_permit.hh

reader_permit: mark check_abort() as const

2025-02-07 01:32:35 -05:00

README.md

README: adjust to reflect license change

2025-01-30 10:28:32 +03:00

real_dirty_memory_accounter.hh

moved cache files to db

2025-02-04 12:21:31 +03:00

release.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

release.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

reversibly_mergeable.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

schema_mutations.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

schema_mutations.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

schema_upgrader.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

scylla_post_install.sh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

scylla-gdb.py

Update seastar submodule

2025-06-03 13:47:05 +03:00

SCYLLA-VERSION-GEN

Update ScyllaDB version to: 2025.3.0-dev

2025-05-07 11:43:11 +03:00

seastarx.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

serialization_visitors.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

serializer_impl.hh

serializer_impl.hh: add as_input_stream(managed_bytes_view) overload

2025-05-13 10:32:32 +02:00

serializer.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

serializer.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

service_permit.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

shell.nix

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

sstable_dict_autotrainer.cc

compress: distribute compression dictionaries over shards

2025-05-07 14:43:18 +02:00

sstable_dict_autotrainer.hh

dict_autotrainer: introduce sstable_dict_autotrainer

2025-04-01 00:07:30 +02:00

sstables_loader.cc

sstables_loader: Extend logging with recently added skip-cleanup

2025-05-28 11:20:27 +03:00

sstables_loader.hh

code: Push bool skip_cleanup flag around

2025-05-13 16:51:21 +03:00

supervisor.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

table_helper.cc

audit: Add service level support to CQL login process

2025-01-15 11:10:36 +01:00

table_helper.hh

audit: Add the audit subsystem

2025-01-15 11:10:35 +01:00

test.py

test.py: move setup cgroups to the generic method

2025-04-24 14:05:49 +02:00

timeout_config.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

timeout_config.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

timestamp.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

tombstone_gc_extension.hh

schema: deprecate schema_extension

2025-03-19 20:36:16 +02:00

tombstone_gc_options.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

tombstone_gc_options.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

tombstone_gc-internals.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

tombstone_gc.cc

Merge 'scylla sstable: Add standard extensions and propagate to schema load ' from Calle Wilund

2025-02-26 13:52:47 +02:00

tombstone_gc.hh

repair: Wire repair_time in system.tablets for tombstone gc

2025-01-17 16:12:05 +08:00

ubsan-suppressions.supp

…

unimplemented.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

unimplemented.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

validation.cc

tree: Make values mutable to enable move semantics

2025-03-03 13:53:02 +03:00

validation.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

version.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

view_info.hh

base_info: remove the lw_shared_ptr variant

2025-04-24 01:08:40 +02:00

vint-serialization.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

vint-serialization.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

README.md

Scylla

What is Scylla?

Scylla is the real-time big data database that is API-compatible with Apache Cassandra and Amazon DynamoDB. Scylla embraces a shared-nothing approach that increases throughput and storage capacity to realize order-of-magnitude performance improvements and reduce hardware costs.

For more information, please see the ScyllaDB web site.

Build Prerequisites

Scylla is fairly fussy about its build environment, requiring very recent versions of the C++23 compiler and of many libraries to build. The document HACKING.md includes detailed information on building and developing Scylla, but to get Scylla building quickly on (almost) any build machine, Scylla offers a frozen toolchain, This is a pre-configured Docker image which includes recent versions of all the required compilers, libraries and build tools. Using the frozen toolchain allows you to avoid changing anything in your build machine to meet Scylla's requirements - you just need to meet the frozen toolchain's prerequisites (mostly, Docker or Podman being available).

Building Scylla

Building Scylla with the frozen toolchain dbuild is as easy as:

$ git submodule update --init --force --recursive
$ ./tools/toolchain/dbuild ./configure.py
$ ./tools/toolchain/dbuild ninja build/release/scylla

For further information, please see:

Developer documentation for more information on building Scylla.
Build documentation on how to build Scylla binaries, tests, and packages.
Docker image build documentation for information on how to build Docker images.

Running Scylla

To start Scylla server, run:

$ ./tools/toolchain/dbuild ./build/release/scylla --workdir tmp --smp 1 --developer-mode 1

This will start a Scylla node with one CPU core allocated to it and data files stored in the tmp directory. The --developer-mode is needed to disable the various checks Scylla performs at startup to ensure the machine is configured for maximum performance (not relevant on development workstations). Please note that you need to run Scylla with dbuild if you built it with the frozen toolchain.

For more run options, run:

$ ./tools/toolchain/dbuild ./build/release/scylla --help

Testing

See test.py manual.

Scylla APIs and compatibility

By default, Scylla is compatible with Apache Cassandra and its API - CQL. There is also support for the API of Amazon DynamoDB™, which needs to be enabled and configured in order to be used. For more information on how to enable the DynamoDB™ API in Scylla, and the current compatibility of this feature as well as Scylla-specific extensions, see Alternator and Getting started with Alternator.

Documentation

Documentation can be found here. Seastar documentation can be found here. User documentation can be found here.

Training

Training material and online courses can be found at Scylla University. The courses are free, self-paced and include hands-on examples. They cover a variety of topics including Scylla data modeling, administration, architecture, basic NoSQL concepts, using drivers for application development, Scylla setup, failover, compactions, multi-datacenters and how Scylla integrates with third-party applications.

Contributing to Scylla

If you want to report a bug or submit a pull request or a patch, please read the contribution guidelines.

If you are a developer working on Scylla, please read the developer guidelines.

Contact

The community forum and Slack channel are for users to discuss configuration, management, and operations of ScyllaDB.
The developers mailing list is for developers and people interested in following the development of ScyllaDB to discuss technical topics.

Languages

C++ 73.5%

Python 25.3%

CMake 0.3%

GAP 0.3%

Shell 0.3%