mirror of https://github.com/scylladb/scylladb.git synced 2026-06-02 21:17:01 +00:00

Go to file

Piotr Dulikowski 62efe6616a Merge 'mapreduce: add tablet-aware dispatching algorithm' from Andrzej Jackowski

The primary motivation for this change is to reduce the time during which the Effective Replication Map (ERM) is retained by the mapreduce service. This ensures that long aggregate queries do not block topology operations. As ScyllaDB is generally transitioning towards tablets, and using tablets simplifies work dispatching, the decision was made to design the new algorithm specifically for tablets. The goal of the algorithm is to divide the work in such a way that each `tablet_replica` (that is <host, shard> pair) processes two tablets at a time.

The new algorithm can be summarized as follows:
 1. Prepare a tablet_replica -> partition_range mapping where the values     cover the entire space.
 2. For each tablet_replica, in parallel, take two partition ranges and dispatch them to the node hosting the replica. The ERM is released and re-acquired in each iteration, allowing the destination (i.e., tablet_replica) to change for each
artition range (in such cases, the partition range is assigned to the appropriate tablet_replica).

In step 1, the main difference compared to the old algorithm (dispatch_to_vnodes) is that partition ranges are assigned to a tablet_replica rather than just the host.

In step 2, the main difference is that the work is divided into smaller batches, and the ERM is released and re-acquired for each batch.

In the current implementation, each node can correctly handle every partition range, even if the mapreduce supercoordinator does not retain the ERM and the range is absent locally. This is because mapreduce_service::execute_on_this_shard creates a new pager that coordinates the partition range read, including obtaining its own ERM. However, every partition range that is absent locally is handled by shard 0. Therefore, proper routing of partition ranges is necessary to avoid shard 0 overload. This is why, in step 2, the ERM is retained during each batch processing, and the tablet_replica is refreshed for each processed range.

Additionally, shard_id is added to mapreduce request. When shard_id is set, the entire partition range is handled by the specified shard. As the new tablet-aware mapreduce algorithm balances the workload across shards, shard_id ensure that the balance is preserved, even during events such as tablet splits.

This patch series:
 - Refactors a bit mapreduce service, to facilitate having two algorithm versions (one for vnodes and one for tablets).
 - Implements tablet-aware dispatching algorithm.
 - Adds shard_id to mapreduce request and uses the information to handle requests entirely by selected shard.
 - Adds test_long_query_timeout_erm to verify the new functionality.

Fixes: scylladb#21831

No backport, as it is rather new feature than a bugfix.

Closes scylladb/scylladb#24383

* github.com:scylladb/scylladb:
  mapreduce: add missing comma and space in mapreduce_request operator<<
  mapreduce: add shard_id_hint to mapreduce request
  test: add test_long_query_timeout_erm
  mapreduce: add tablet-aware dispatching algorithm
  storage_proxy: make storage_proxy::is_alive public
  mapreduce: remove _shared_token_metadata from mapreduce_service
  mapreduce: move dispatching logic to dispatch_to_vnodes
  mapreduce: remove underscores from variable names
  mapreduce: move req_with_modified_pr handling to a new function
  mapreduce: change next_vnode lambda to get_next_partition_range function

2025-06-26 12:25:39 +02:00

.github

.github/workflows/conflict_reminder: reduce the amount of conflict reminder for every push event

2025-05-28 11:01:44 +03:00

abseil @ d7aaad83b4

…

alternator

alternator/stats.cc, metrics-config.yml: docs fix per-table metrics

2025-06-15 18:06:36 +03:00

api

api/failure_detector.cc: stream endpoints

2025-06-25 11:28:37 +03:00

audit

audit: add semaphore to audit_syslog_storage_helper

2025-04-08 16:24:42 +02:00

auth

Revert "Merge 'Atomic in-memory schema changes application' from Marcin Maliszkiewicz"

2025-06-16 22:38:12 +03:00

bin

treewide: improve bash error reporting

2025-02-10 18:28:52 +03:00

cdc

cdc: add sanity check for generating an empty generation

2025-04-25 11:25:07 +02:00

cmake

build: cmake: Use LINKER: prefix for consistent linker option handling

2025-06-25 11:17:15 +03:00

compaction

compaction_manager: cancel submission timer on drain

2025-06-20 11:33:49 +03:00

conf

Merge 'Add tablet enforcing option' from Benny Halevy

2025-04-03 16:32:19 +03:00

cql3

Merge 'Introduce a queue of global topology requests.' from Gleb Natapov

2025-06-23 16:08:09 +03:00

data_dictionary

Merge 'scylla-sstable: add native S3 support' from Ernest Zaslavsky

2025-03-14 15:05:52 +02:00

logging: Add row count to large partition warning message

2025-06-26 12:25:38 +02:00

debug

…

dht

interval: reduce sizeof

2025-06-14 21:29:43 +03:00

dist

Do not perform blkdiscard by default on the disks during RAID setup.

2025-06-26 12:25:38 +02:00

docs

Add repository layout dev documentation

2025-06-25 13:58:05 +03:00

ent

encryption/utils: Move encryption httpclient to "general" REST client

2025-05-30 12:21:51 +03:00

exceptions

transport: storage_proxy: release ERM when waiting for query timeout

2025-04-23 09:29:47 +02:00

gms

Merge 'Make uuid sstable generations mandatory' from Benny Halevy

2025-06-26 12:25:38 +02:00

idl

Merge 'mapreduce: add tablet-aware dispatching algorithm' from Andrzej Jackowski

2025-06-26 12:25:39 +02:00

index

interval: rename start() to start_ref() (and end() etc).

2025-06-14 21:26:16 +03:00

lang

treewide: Reduce db/config.hh header fanout

2025-02-25 15:16:40 +01:00

licenses

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

locator

storage_service: Use utils::chunked_vector to avoid big allocation

2025-06-19 16:51:01 +03:00

message

dht: fragment token_range_vector

2025-05-27 14:47:24 +03:00

mutation

interval: rename start_ref() back to start() (and end_ref() etc).

2025-06-14 21:26:16 +03:00

mutation_writer

readers/mutation_reader: s/reader_consumer_v2/mutation_reader_consumer/

2025-05-09 07:53:29 -04:00

node_ops

topology request: make it possible to hold global request types in request_type field

2025-06-09 13:38:49 +03:00

pgo

Update pgo profiles - aarch64

2025-06-23 19:20:50 +03:00

raft

raft: server_impl: use named gate

2025-04-12 11:28:48 +03:00

readers

interval: rename start_ref() back to start() (and end_ref() etc).

2025-06-14 21:26:16 +03:00

redis

transport: generic_server: remove no longer used connection advertising code

2025-05-27 19:31:09 +02:00

reloc

treewide: improve bash error reporting

2025-02-10 18:28:52 +03:00

repair

interval: rename start_ref() back to start() (and end_ref() etc).

2025-06-14 21:26:16 +03:00

replica

Merge 'cql, schema: Extend keyspace, table, views, indexes name length limit from 48 to 192 bytes' from Karol Nowacki

2025-06-22 17:41:10 +03:00

rust

rust: update dependencies

2025-03-04 09:45:23 +02:00

schema

cql, schema: Extend name length limit from 48 to 192 bytes

2025-06-18 14:08:38 +02:00

scripts

alternator/stats.cc, metrics-config.yml: docs fix per-table metrics

2025-06-15 18:06:36 +03:00

seastar @ 26badcb14c

Update seastar submodule

2025-06-03 13:47:05 +03:00

service

Merge 'mapreduce: add tablet-aware dispatching algorithm' from Andrzej Jackowski

2025-06-26 12:25:39 +02:00

sstables

Merge 'Make uuid sstable generations mandatory' from Benny Halevy

2025-06-26 12:25:38 +02:00

streaming

sstables: sstable_generation_generator: set last_generation=0 by default

2025-06-18 11:30:29 +03:00

swagger-ui @ 12f1da1082

…

tasks

test: add test for getting tasks children

2025-04-17 13:48:44 +02:00

test

Merge 'mapreduce: add tablet-aware dispatching algorithm' from Andrzej Jackowski

2025-06-26 12:25:39 +02:00

tools

Merge 'Make uuid sstable generations mandatory' from Benny Halevy

2025-06-26 12:25:38 +02:00

tracing

tracing: trace_keyspace_helper: use named gate

2025-04-12 11:29:48 +03:00

transport

Revert "Merge 'Atomic in-memory schema changes application' from Marcin Maliszkiewicz"

2025-06-16 22:38:12 +03:00

types

allow "UTC" and "GMT" in string format of timestamp

2025-02-12 09:38:28 +02:00

unified

treewide: improve bash error reporting

2025-02-10 18:28:52 +03:00

utils

utils/exceptions.cc: Added check for exceptions::request_timeout_exception in is_timeout_exception function.

2025-06-26 12:25:38 +02:00

.clang-format

clang-format: argument and function packing

2024-10-04 14:52:41 +02:00

.dockerignore

…

.gitattributes

configure.py: prepare the build for a default PGO profile in version control

2024-12-27 16:16:04 +08:00

.gitignore

…

.gitmodules

build: replace tools/java submodule with packaged cassandra-stress

2025-04-15 10:11:28 +03:00

.gitorderfile

…

.mailmap

…

absl-flat_hash_map.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

absl-flat_hash_map.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

amplify.yml

…

backlog_controller.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

build_mode.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

bytes_fwd.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

bytes_ostream.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

bytes.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

bytes.hh

bytes: adapt fmt_hex to std::span<const std::byte>

2025-04-01 00:07:27 +02:00

cache_temperature.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

cartesian_product.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

cell_locking.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

client_data.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

client_data.hh

transport/server: use scheduling group assigned to current user

2025-01-02 07:13:34 +01:00

clocks-impl.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

clocks-impl.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

clustering_bounds_comparator.hh

interval: rename start_ref() back to start() (and end_ref() etc).

2025-06-14 21:26:16 +03:00

clustering_interval_set.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

clustering_key_filter.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

clustering_ranges_walker.hh

interval: rename start_ref() back to start() (and end_ref() etc).

2025-06-14 21:26:16 +03:00

CMakeLists.txt

Move direct_failure_detector from root to service/

2025-04-08 13:03:24 +03:00

collection_mutation.cc

tree: Remove unused boost headers

2025-02-15 20:32:22 +02:00

collection_mutation.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

column_computation.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

combine.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

compound_compat.hh

utils: do not include unused headers

2025-01-14 07:56:39 -05:00

compound.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

compress.cc

transport/server: silence the oversized allocation warning in snappy_compress

2025-06-10 19:13:26 +03:00

compress.hh

db/config: add an option that disables dict-aware sstable compressors in DDL statements

2025-06-09 13:30:40 +03:00

concrete_types.hh

types: implement vector_type_impl

2025-01-26 19:36:41 +01:00

configure.py

build: add p11-kit's cflags to user_cflags instead of args.user_cflags

2025-06-25 11:24:09 +03:00

CONTRIBUTING.md

Fix typos

2025-02-11 00:17:43 +02:00

converting_mutation_partition_applier.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

converting_mutation_partition_applier.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

counters.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

counters.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

coverage_excludes.txt

…

coverage_sources.list

…

cql_serialization_format.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

db_clock.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

debug.cc

gdb: protect debug::the_database from lto

2025-01-23 22:26:04 +02:00

debug.hh

gdb: protect debug::the_database from lto

2025-01-23 22:26:04 +02:00

default.nix

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

Doxyfile

…

duration.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

duration.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

encoding_stats.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

enum_set.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

fix_system_distributed_tables.py

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

flake.lock

…

flake.nix

…

frozen_schema.cc

Revert "Merge 'Atomic in-memory schema changes application' from Marcin Maliszkiewicz"

2025-06-16 22:38:12 +03:00

frozen_schema.hh

Revert "Merge 'Atomic in-memory schema changes application' from Marcin Maliszkiewicz"

2025-06-16 22:38:12 +03:00

full_position.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

gc_clock.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

gdbinit

…

gen_segmented_compress_params.py

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

generic_server.cc

test: add test for live updates of generic server config

2025-06-23 17:56:26 +02:00

generic_server.hh

generic_server: fix connections semaphore config observer

2025-06-23 17:54:01 +02:00

HACKING.md

build: replace tools/java submodule with packaged cassandra-stress

2025-04-15 10:11:28 +03:00

hashing_partition_visitor.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

idl-compiler.py

idl, message: make with_timeout and cancellable verb attributes composable

2025-04-30 11:45:51 +03:00

inet_address_vectors.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

init.cc

compress: distribute compression dictionaries over shards

2025-05-07 14:43:18 +02:00

init.hh

Move object_storage.yaml endpoints to scylla.yaml

2025-03-31 13:39:39 +03:00

install-dependencies.sh

toolchain: set scylla-driver release based on tools/cqlsh

2025-05-15 06:08:14 +03:00

install.sh

install.sh: simplify check_usermode_support()

2025-02-24 11:29:30 +03:00

interval.hh

interval: reduce sizeof

2025-06-14 21:29:43 +03:00

keys.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

keys.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

LICENSE-ScyllaDB-Source-Available.md

Fix typos

2025-02-13 01:54:08 +02:00

main.cc

Merge 'mapreduce: add tablet-aware dispatching algorithm' from Andrzej Jackowski

2025-06-26 12:25:39 +02:00

map_difference.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

marshal_exception.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

multishard_mutation_query.cc

readers/mutation_source: s/make_reader_v2/make_mutation_reader/

2025-05-09 07:53:29 -04:00

multishard_mutation_query.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

mutation_query.cc

schema: deinline some speculative_retry methods

2025-01-02 12:28:33 +01:00

mutation_query.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

NOTICE.txt

…

ORIGIN

…

partition_builder.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

partition_range_compat.hh

interval: rename start_ref() back to start() (and end_ref() etc).

2025-06-14 21:26:16 +03:00

partition_slice_builder.cc

tree: Remove unused boost headers

2025-02-25 10:32:32 +03:00

partition_slice_builder.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

partition_snapshot_reader.hh

moved cache files to db

2025-02-04 12:21:31 +03:00

protocol_server.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

querier.cc

interval: rename start_ref() back to start() (and end_ref() etc).

2025-06-14 21:26:16 +03:00

querier.hh

readers/mutation_source: s/make_reader_v2/make_mutation_reader/

2025-05-09 07:53:29 -04:00

query_id.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

query_ranges_to_vnodes.cc

interval: rename start_ref() back to start() (and end_ref() etc).

2025-06-14 21:26:16 +03:00

query_ranges_to_vnodes.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

query_result_merger.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

query-request.hh

mapreduce: add shard_id_hint to mapreduce request

2025-06-25 19:23:07 +02:00

query-result-reader.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

query-result-set.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

query-result-set.hh

db: prevent accidental copies of result_set_row by making it move-only

2025-02-17 09:48:08 +02:00

query-result-writer.hh

query-result-writer: reorder initialization to prevent use-after-move

2025-02-17 13:45:35 +03:00

query-result.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

query.cc

mapreduce: add missing comma and space in mapreduce_request operator<<

2025-06-25 19:23:07 +02:00

reader_concurrency_semaphore_group.cc

treewide: fix misspellings

2025-01-05 16:13:09 +02:00

reader_concurrency_semaphore_group.hh

tree: Remove unused boost headers

2025-02-15 20:32:22 +02:00

reader_concurrency_semaphore.cc

reader_concurrency_semaphore: use named gate

2025-04-12 11:28:48 +03:00

reader_concurrency_semaphore.hh

reader_concurrency_semaphore: use named gate

2025-04-12 11:28:48 +03:00

reader_permit.hh

reader_permit: mark check_abort() as const

2025-02-07 01:32:35 -05:00

README.md

README: adjust to reflect license change

2025-01-30 10:28:32 +03:00

real_dirty_memory_accounter.hh

moved cache files to db

2025-02-04 12:21:31 +03:00

release.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

release.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

reversibly_mergeable.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

schema_mutations.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

schema_mutations.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

schema_upgrader.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

scylla_post_install.sh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

scylla-gdb.py

gdb: adjust unordered container accessors for libstdc++15

2025-06-18 09:15:03 +03:00

SCYLLA-VERSION-GEN

Update ScyllaDB version to: 2025.3.0-dev

2025-05-07 11:43:11 +03:00

seastarx.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

serialization_visitors.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

serializer_impl.hh

serializer_impl.hh: add as_input_stream(managed_bytes_view) overload

2025-05-13 10:32:32 +02:00

serializer.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

serializer.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

service_permit.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

shell.nix

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

sstable_dict_autotrainer.cc

compress: distribute compression dictionaries over shards

2025-05-07 14:43:18 +02:00

sstable_dict_autotrainer.hh

dict_autotrainer: introduce sstable_dict_autotrainer

2025-04-01 00:07:30 +02:00

sstables_loader.cc

Add support for nodetool refresh --skip-reshape

2025-06-10 12:52:13 +03:00

sstables_loader.hh

Add support for nodetool refresh --skip-reshape

2025-06-10 12:52:13 +03:00

supervisor.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

table_helper.cc

audit: Add service level support to CQL login process

2025-01-15 11:10:36 +01:00

table_helper.hh

audit: Add the audit subsystem

2025-01-15 11:10:35 +01:00

test.py

test.py: pytest c++ facades should respect saving logs on success

2025-06-24 20:53:32 +03:00

timeout_config.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

timeout_config.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

timestamp.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

tombstone_gc_extension.hh

schema: deprecate schema_extension

2025-03-19 20:36:16 +02:00

tombstone_gc_options.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

tombstone_gc_options.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

tombstone_gc-internals.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

tombstone_gc.cc

Merge 'scylla sstable: Add standard extensions and propagate to schema load ' from Calle Wilund

2025-02-26 13:52:47 +02:00

tombstone_gc.hh

repair: Wire repair_time in system.tablets for tombstone gc

2025-01-17 16:12:05 +08:00

ubsan-suppressions.supp

…

unimplemented.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

unimplemented.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

validation.cc

tree: Make values mutable to enable move semantics

2025-03-03 13:53:02 +03:00

validation.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

version.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

view_info.hh

base_info: remove the lw_shared_ptr variant

2025-04-24 01:08:40 +02:00

vint-serialization.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

vint-serialization.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

README.md

Scylla

What is Scylla?

Scylla is the real-time big data database that is API-compatible with Apache Cassandra and Amazon DynamoDB. Scylla embraces a shared-nothing approach that increases throughput and storage capacity to realize order-of-magnitude performance improvements and reduce hardware costs.

For more information, please see the ScyllaDB web site.

Build Prerequisites

Scylla is fairly fussy about its build environment, requiring very recent versions of the C++23 compiler and of many libraries to build. The document HACKING.md includes detailed information on building and developing Scylla, but to get Scylla building quickly on (almost) any build machine, Scylla offers a frozen toolchain, This is a pre-configured Docker image which includes recent versions of all the required compilers, libraries and build tools. Using the frozen toolchain allows you to avoid changing anything in your build machine to meet Scylla's requirements - you just need to meet the frozen toolchain's prerequisites (mostly, Docker or Podman being available).

Building Scylla

Building Scylla with the frozen toolchain dbuild is as easy as:

$ git submodule update --init --force --recursive
$ ./tools/toolchain/dbuild ./configure.py
$ ./tools/toolchain/dbuild ninja build/release/scylla

For further information, please see:

Developer documentation for more information on building Scylla.
Build documentation on how to build Scylla binaries, tests, and packages.
Docker image build documentation for information on how to build Docker images.

Running Scylla

To start Scylla server, run:

$ ./tools/toolchain/dbuild ./build/release/scylla --workdir tmp --smp 1 --developer-mode 1

This will start a Scylla node with one CPU core allocated to it and data files stored in the tmp directory. The --developer-mode is needed to disable the various checks Scylla performs at startup to ensure the machine is configured for maximum performance (not relevant on development workstations). Please note that you need to run Scylla with dbuild if you built it with the frozen toolchain.

For more run options, run:

$ ./tools/toolchain/dbuild ./build/release/scylla --help

Testing

See test.py manual.

Scylla APIs and compatibility

By default, Scylla is compatible with Apache Cassandra and its API - CQL. There is also support for the API of Amazon DynamoDB™, which needs to be enabled and configured in order to be used. For more information on how to enable the DynamoDB™ API in Scylla, and the current compatibility of this feature as well as Scylla-specific extensions, see Alternator and Getting started with Alternator.

Documentation

Documentation can be found here. Seastar documentation can be found here. User documentation can be found here.

Training

Training material and online courses can be found at Scylla University. The courses are free, self-paced and include hands-on examples. They cover a variety of topics including Scylla data modeling, administration, architecture, basic NoSQL concepts, using drivers for application development, Scylla setup, failover, compactions, multi-datacenters and how Scylla integrates with third-party applications.

Contributing to Scylla

If you want to report a bug or submit a pull request or a patch, please read the contribution guidelines.

If you are a developer working on Scylla, please read the developer guidelines.

Contact

The community forum and Slack channel are for users to discuss configuration, management, and operations of ScyllaDB.
The developers mailing list is for developers and people interested in following the development of ScyllaDB to discuss technical topics.

Languages

C++ 72.2%

Python 26.6%

CMake 0.3%

GAP 0.3%

Shell 0.3%