mirror of https://github.com/scylladb/scylladb.git synced 2026-05-13 11:22:01 +00:00


Kamil Braun 4d99cd2055 Merge 'raft: fast tombstone GC for group0-managed tables' from Emil Maskovsky

Add a gossip application state for broadcasting each node's group0 state_id.

Implement the group0 state broadcaster (based on gossip), which broadcasts the state_id of each node and checks the minimal state_id for tombstone GC.

When the minimal state_id used for tombstone GC changes, the state broadcaster updates the tombstone GC time for the group0-managed tables.

The main component of the change is the newly added `group0_state_id_handler`, which tracks, broadcasts, and receives the latest group0 state_ids across all nodes and sets the tombstone GC deletion time accordingly:
* on each applied group0 change, the handler broadcasts the state_id as a gossip state (only if the value has changed)
* the handler checks the nodes' state_ids every refresh period (configurable, 1h by default)
* on every check, the handler determines the lowest state_id (timeuuid), i.e. the state_id that all of the nodes already have
* the timestamp of this minimum state_id is then used to set the tombstone GC deletion time
* the tombstone GC calculation then uses that deletion time to provide the GC time back to the callers, e.g. during compaction
* (since the time used in the tombstone GC calculation has 1s granularity, we actually deduct 1s from the determined timestamp, because newer mutations may have been received within the same second and not yet distributed across the nodes)
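As a rough illustration of the last three bullets, the C++ sketch below decodes the timestamp embedded in a version-1 timeuuid (the standard RFC 4122 layout), picks the lowest one, and steps back one second. It is a sketch under stated assumptions, not the handler's code: each state_id is modeled as the raw most-significant 64 bits of the UUID, the input is assumed non-empty, and the names `timeuuid_timestamp` and `gc_deletion_time` are hypothetical.

```cpp
#include <algorithm>
#include <chrono>
#include <cstdint>
#include <vector>

// Hypothetical model: the most significant 64 bits of a version-1 timeuuid,
// laid out as time_low (32) | time_mid (16) | version (4) + time_hi (12).
using timeuuid_msb = uint64_t;

// Number of 100-ns intervals between the UUID epoch (1582-10-15)
// and the Unix epoch (1970-01-01).
constexpr uint64_t uuid_epoch_offset = 0x01B21DD213814000ULL;

// Decode the embedded timestamp as microseconds since the Unix epoch.
std::chrono::microseconds timeuuid_timestamp(timeuuid_msb msb) {
    uint64_t time_low = msb >> 32;
    uint64_t time_mid = (msb >> 16) & 0xFFFF;
    uint64_t time_hi = msb & 0x0FFF;  // drop the 4 version bits
    uint64_t ticks = (time_hi << 48) | (time_mid << 32) | time_low;
    int64_t since_unix = static_cast<int64_t>(ticks) - static_cast<int64_t>(uuid_epoch_offset);
    return std::chrono::microseconds(since_unix / 10);  // 100 ns -> 1 us
}

// Given the state_ids reported by all nodes (must be non-empty), return the
// tombstone GC deletion time: the lowest state_id timestamp minus one second.
std::chrono::microseconds gc_deletion_time(const std::vector<timeuuid_msb>& state_ids) {
    // Compare decoded timestamps: the raw msb integer order is not time order,
    // because time_low occupies the high bits.
    auto lowest = std::min_element(state_ids.begin(), state_ids.end(),
        [](timeuuid_msb a, timeuuid_msb b) { return timeuuid_timestamp(a) < timeuuid_timestamp(b); });
    // GC time has 1s granularity, so step back a full second to avoid covering
    // mutations from the same second that may not have reached every node yet.
    return timeuuid_timestamp(*lowest) - std::chrono::seconds(1);
}
```

The explicit decode matters because a plain integer comparison of the UUID bits would not order timeuuids by time.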

This change introduces a new flag in the static schema descriptor (`is_group0_table`), which is checked to select this newly added tombstone GC mode. We also add a check (in non-release builds only) on every group0 modification that the table has this flag set.
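A minimal sketch of such a non-release-only check, assuming a simplified, hypothetical `schema_descriptor` that carries only a table name and the `is_group0_table` flag. Whether the real check throws, asserts, or logs is not stated above, so raising an exception here is an illustrative choice; `NDEBUG` is used as the usual marker of a release build.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the static schema descriptor.
struct schema_descriptor {
    std::string name;
    bool is_group0_table = false;
};

// Called before applying a group0 modification to a table. The check is
// compiled in only when NDEBUG is not defined (i.e. non-release builds).
void check_group0_table(const schema_descriptor& s) {
#ifndef NDEBUG
    if (!s.is_group0_table) {
        throw std::logic_error("table " + s.name +
                               " is modified through group0 but is not flagged as a group0 table");
    }
#else
    (void)s;  // release builds skip the check entirely
#endif
}
```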

The group0 tombstone GC handling is similar to the "repair" tombstone GC mode in the sense that the tombstone GC time is determined by a reconciliation action; however, it is neither visible to nor editable by the user. The tombstone GC calculation is also much simpler than in the "repair" mode: for example, we always use the whole range (as opposed to the "repair" mode, which can have specific repair times set for specific ranges).
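The difference in lookup complexity can be sketched as follows. This is a simplified model with hypothetical names, not the actual tombstone GC code: the "repair" mode is reduced to a sorted map from a range's start token to its GC time, while the group0 mode returns a single value regardless of the token.

```cpp
#include <chrono>
#include <cstdint>
#include <iterator>
#include <map>
#include <optional>

using gc_time = std::chrono::microseconds;  // time since the Unix epoch
using token = int64_t;

// "repair"-style lookup (simplified): each entry holds the GC time for the
// token range that starts at its key and ends before the next key.
std::optional<gc_time> repair_gc_before(const std::map<token, gc_time>& ranges, token t) {
    auto it = ranges.upper_bound(t);  // first range starting after t
    if (it == ranges.begin()) {
        return std::nullopt;  // t precedes every repaired range
    }
    return std::prev(it)->second;
}

// group0-style lookup: one deletion time covers the whole range,
// so the token does not matter.
std::optional<gc_time> group0_gc_before(const std::optional<gc_time>& group0_time, token) {
    return group0_time;
}

// A tombstone may be purged only if it was deleted before the GC time.
bool can_purge(gc_time deletion_time, const std::optional<gc_time>& gc_before) {
    return gc_before.has_value() && deletion_time < *gc_before;
}
```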

We use the group0 configuration to determine the set of nodes (both current and previous in the case of a joint configuration). We need to make sure that we account for all group0 nodes: if any node has not provided its state_id yet, the current check round is skipped, i.e. no GC is done until all known nodes provide their state_id timestamp value.
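The membership rule can be sketched as below, assuming hypothetical types: `group0_config` stands in for the Raft configuration (with `previous` populated only during a joint configuration), and `reported` maps each node to the timestamp of the state_id it last published via gossip. Node ids are plain strings here purely for illustration.

```cpp
#include <chrono>
#include <map>
#include <optional>
#include <set>
#include <string>

using node_id = std::string;  // hypothetical; illustration only

// Hypothetical view of the group0 configuration.
struct group0_config {
    std::set<node_id> current;
    std::set<node_id> previous;  // non-empty only in a joint configuration
};

// Returns the lowest state_id timestamp over all group0 members, or
// std::nullopt if the round must be skipped because a member has not
// published its state_id yet. Entries for non-members are ignored.
std::optional<std::chrono::microseconds> min_member_timestamp(
        const group0_config& cfg,
        const std::map<node_id, std::chrono::microseconds>& reported) {
    std::set<node_id> members = cfg.current;
    members.insert(cfg.previous.begin(), cfg.previous.end());
    std::optional<std::chrono::microseconds> lowest;
    for (const auto& id : members) {
        auto it = reported.find(id);
        if (it == reported.end()) {
            return std::nullopt;  // missing state_id: no GC this round
        }
        if (!lowest || it->second < *lowest) {
            lowest = it->second;
        }
    }
    return lowest;
}
```

Taking the union of both configurations is the conservative choice: a node that is leaving or joining still holds back GC until it has reported.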

Also note that the group0 state_id handling works on all nodes independently, i.e. each node might have its own (possibly different) state depending on how the gossip application state propagates. This is not a problem, however: some nodes might be behind, but they will catch up eventually, and this solution has the benefit of being distributed (as opposed to having a central point handling the state, such as the topology coordinator, which was considered in the early stages of the design).

Fixes: scylladb/scylla#15607

New feature, should not be backported.

Closes scylladb/scylladb#20394

* github.com:scylladb/scylladb:
  raft: add the check for the group0 tables
  raft: fast tombstone GC for group0-managed tables
  tombstone_gc: refactor the repair map
  raft: flag the group0-managed tables
  gossip: broadcast the group0 state id
  raft/test: add test for the group0 tombstone GC
  treewide: code cleanup and refactoring

2024-10-11 11:52:27 +02:00

.github

.github: add db to iwyu's CLEANER_DIR

2024-10-04 20:48:18 +08:00

abseil @ d7aaad83b4

build: bring abseil submodule back

2024-05-05 23:31:09 +03:00

alternator

alternator: add "dc" and "rack" options to "/localnodes" request

2024-10-07 20:53:47 +03:00

api

api/storage_service: use ranges when handlging restore API

2024-10-07 10:54:37 +03:00

auth

auth: add "IWYU pragma: keep" to keep boost/regex_fwd.hpp

2024-10-07 20:08:05 +03:00

bin

scripts: fix bin/cqlsh shortcut

2024-09-16 09:52:29 +03:00

cdc

treewide: Prefer bytes_fwd.hh over bytes.hh

2024-10-02 07:29:30 +02:00

cmake

cmake/check_headers: correct typos

2024-10-08 09:38:16 +03:00

compaction

Merge 'raft: fast tombstone GC for group0-managed tables' from Emil Maskovsky

2024-10-11 11:52:27 +02:00

conf

config: drop reversed_reads_auto_bypass_cache

2024-08-13 10:02:42 +02:00

cql3

schema/schema: break circular dependency with replica::database

2024-10-10 10:07:26 +03:00

data_dictionary

treewide: accept list of sstables in "restore" API

2024-10-01 23:24:56 +08:00

Merge 'raft: fast tombstone GC for group0-managed tables' from Emil Maskovsky

2024-10-11 11:52:27 +02:00

debug

…

dht

treewide: replace boost::irange with std::views::iota where possible

2024-10-03 10:33:33 +03:00

direct_failure_detector

treewide: change assert() to SCYLLA_ASSERT()

2024-08-05 08:23:35 +03:00

dist

scylla_coredump_setup: fix typos in comment

2024-09-30 13:29:34 +03:00

docs

docs: Fix confgroup links

2024-10-09 20:16:15 +03:00

exceptions

treewide: Prefer bytes_fwd.hh over bytes.hh

2024-10-02 07:29:30 +02:00

gms

gossip: broadcast the group0 state id

2024-10-08 20:53:54 +02:00

idl

forward_service: rename to mapreduce_service

2024-07-03 19:29:47 +03:00

index

code-cleanup: add missing header guards

2024-07-09 18:31:35 +03:00

lang

treewide: use seastar::format() or fmt::format() explicitly

2024-09-11 23:21:40 +03:00

licenses

…

locator

Merge 'Node replace and remove operations: Add deprecate IP addresses usage warning.' from Sergey Zolotukhin

2024-10-03 11:08:28 +02:00

message

message/messaging_service: guard adding maintenance tenant under cluster feature

2024-09-16 15:34:36 +02:00

mutation

utils/unconst, mutation_partition: switch to ranges

2024-10-07 17:30:12 +03:00

mutation_writer

treewide: replace boost::irange with std::views::iota where possible

2024-10-03 10:33:33 +03:00

node_ops

cmake/check_headers: correct typos

2024-10-08 09:38:16 +03:00

raft

raft: add more information to start_read_barrier error

2024-10-09 16:24:34 +02:00

readers

Update seastar submodule

2024-09-18 13:59:22 +03:00

redis

treewide: replace boost::irange with std::views::iota where possible

2024-10-03 10:33:33 +03:00

reloc

reloc: create $BUILDDIR for getting its path

2024-05-01 09:52:17 +03:00

repair

Merge 'repair: Fix stall in repair_get_row_diff_with_rpc_stream_process_op_slow_path' from Asias He

2024-10-10 09:27:27 +03:00

replica

Merge 'raft: fast tombstone GC for group0-managed tables' from Emil Maskovsky

2024-10-11 11:52:27 +02:00

rust

rust: disable incremental build for release build

2024-06-20 12:01:14 +03:00

schema

Merge 'raft: fast tombstone GC for group0-managed tables' from Emil Maskovsky

2024-10-11 11:52:27 +02:00

scripts

[script/pull_github_pr.sh] Check Gating status before merging

2024-10-01 14:46:29 +03:00

seastar @ 3c9c2696a4

Update seastar submodule

2024-09-29 13:47:40 +03:00

service

Merge 'raft: fast tombstone GC for group0-managed tables' from Emil Maskovsky

2024-10-11 11:52:27 +02:00

sstables

sstable_set: Reserve vector of readers

2024-10-11 09:56:17 +03:00

streaming

view: check_needs_view_update_path: get token_metadata_ptr

2024-10-09 20:56:21 +03:00

swagger-ui @ 12f1da1082

…

tasks

treewide: replace boost::irange with std::views::iota where possible

2024-10-03 10:33:33 +03:00

test

Merge 'raft: fast tombstone GC for group0-managed tables' from Emil Maskovsky

2024-10-11 11:52:27 +02:00

tools

sstables: scylla_metadata: add sstable identifier

2024-10-10 08:52:46 +03:00

tracing

treewide: use seastar::format() or fmt::format() explicitly

2024-09-11 23:21:40 +03:00

transport

treewide: replace boost::irange with std::views::iota where possible

2024-10-03 10:33:33 +03:00

types

treewide: de-static namespace scope functions in headers

2024-10-01 14:02:50 +03:00

unified

dist: drop scylla-jmx

2024-09-13 07:59:45 +03:00

utils

Merge 'utils: replace dependency on boost ranges with <ranges>' from Avi Kivity

2024-10-09 16:04:48 +03:00

.clang-format

clang-format: argument and function packing

2024-10-04 14:52:41 +02:00

.dockerignore

…

.gitattributes

gitattributes: Mark swagger .js files as binary

2024-06-19 15:07:56 +03:00

.gitignore

Add .idea folder to .gitignore

2024-09-20 11:49:41 +03:00

.gitmodules

dist: drop scylla-jmx

2024-09-13 07:59:45 +03:00

.gitorderfile

…

.mailmap

…

absl-flat_hash_map.cc

…

absl-flat_hash_map.hh

…

amplify.yml

…

backlog_controller.hh

…

build_mode.hh

…

bytes_fwd.hh

cql3: Refactor description

2024-09-20 14:24:53 +02:00

bytes_ostream.hh

treewide: change assert() to SCYLLA_ASSERT()

2024-08-05 08:23:35 +03:00

bytes.cc

treewide: use seastar::format() or fmt::format() explicitly

2024-09-11 23:21:40 +03:00

bytes.hh

cql3: Refactor description

2024-09-20 14:24:53 +02:00

cache_mutation_reader.hh

tombstone: can_gc_fn: move declaration to compaction_garbage_collector.hh

2024-09-10 19:05:57 +03:00

cache_temperature.hh

…

cartesian_product.hh

treewide: de-static namespace scope functions in headers

2024-10-01 14:02:50 +03:00

cell_locking.hh

Merge 'cell_locker: maybe_rehash: ignore allocation failures' from Benny Halevy

2024-08-12 10:54:56 +03:00

checked-file-impl.hh

…

client_data.cc

…

client_data.hh

transport: do not return client_type from cql_server::connection::make_client_key()

2024-06-07 09:23:06 +08:00

clocks-impl.cc

…

clocks-impl.hh

treewide: de-static namespace scope functions in headers

2024-10-01 14:02:50 +03:00

clustering_bounds_comparator.hh

clustering_bounds_comparator: drop operator<< for bound_kind

2024-06-11 18:01:06 +02:00

clustering_interval_set.hh

treewide: change assert() to SCYLLA_ASSERT()

2024-08-05 08:23:35 +03:00

clustering_key_filter.hh

clustering_key_filter: unify get_ranges and get_native_ranges

2024-08-13 10:07:12 +02:00

clustering_ranges_walker.hh

treewide: change assert() to SCYLLA_ASSERT()

2024-08-05 08:23:35 +03:00

CMakeLists.txt

build: cmake: use the same options to configure seastar

2024-08-28 06:15:59 +03:00

collection_mutation.cc

compaction: get_max_purgeable_timestamp: use memtable and sstable extended timestamp stats

2024-09-10 19:05:57 +03:00

collection_mutation.hh

tombstone: can_gc_fn: move declaration to compaction_garbage_collector.hh

2024-09-10 19:05:57 +03:00

column_computation.hh

…

combine.hh

…

compound_compat.hh

treewide: de-static namespace scope functions in headers

2024-10-01 14:02:50 +03:00

compound.hh

treewide: change assert() to SCYLLA_ASSERT()

2024-08-05 08:23:35 +03:00

compress.cc

…

compress.hh

compress, auth: include used headers

2024-05-30 09:16:23 +03:00

concrete_types.hh

treewide: de-static namespace scope functions in headers

2024-10-01 14:02:50 +03:00

configure.py

Merge 'raft: fast tombstone GC for group0-managed tables' from Emil Maskovsky

2024-10-11 11:52:27 +02:00

CONTRIBUTING.md

…

converting_mutation_partition_applier.cc

treewide: change assert() to SCYLLA_ASSERT()

2024-08-05 08:23:35 +03:00

converting_mutation_partition_applier.hh

…

counters.cc

treewide: use std::ranges sort functions rather than boost

2024-10-01 14:19:05 +03:00

counters.hh

treewide: change assert() to SCYLLA_ASSERT()

2024-08-05 08:23:35 +03:00

coverage_excludes.txt

…

coverage_sources.list

…

cql_serialization_format.hh

…

db_clock.hh

treewide: de-static namespace scope functions in headers

2024-10-01 14:02:50 +03:00

debug.cc

…

debug.hh

…

default.nix

treewide: drop thrift support

2024-06-07 06:44:59 +08:00

Doxyfile

…

duration.cc

…

duration.hh

…

encoding_stats.hh

…

enum_set.hh

treewide: change assert() to SCYLLA_ASSERT()

2024-08-05 08:23:35 +03:00

fix_system_distributed_tables.py

…

flake.lock

…

flake.nix

…

frozen_schema.cc

…

frozen_schema.hh

…

full_position.hh

…

gc_clock.hh

…

gdbinit

…

gen_segmented_compress_params.py

…

generic_server.cc

generic_server: make server::stop() idempotent

2024-08-28 15:54:31 +02:00

generic_server.hh

generic_server: convert connection tracking to seastar::gate

2024-08-28 10:59:44 +02:00

HACKING.md

HACKIGN.md: clarify the use of dbuild when running test.py

2024-09-10 13:40:45 +03:00

hashing_partition_visitor.hh

…

idl-compiler.py

idl-compiler: generate async serialization functions for stub members

2024-05-02 19:27:56 +03:00

inet_address_vectors.hh

…

init.cc

Ignore seed name resolution errors on restart.

2024-08-28 14:01:04 +02:00

init.hh

Ignore seed name resolution errors on restart.

2024-08-28 14:01:04 +02:00

install-dependencies.sh

install-dependencies.sh: update node_exporter to 1.8.2

2024-09-25 18:42:25 +03:00

install.sh

install.sh: fix more incorrect permission on strict umask

2024-09-03 10:37:53 +03:00

interval.hh

treewide: change assert() to SCYLLA_ASSERT()

2024-08-05 08:23:35 +03:00

keys.cc

clustering_bounds_comparator: drop operator<< for bound_kind

2024-06-11 18:01:06 +02:00

keys.hh

treewide: replace formatter<std::string_view> with formatter<string_view>

2024-04-19 07:44:07 +03:00

LICENSE.AGPL

…

log.hh

…

main.cc

group0: Stop group0 if node initialization fails

2024-10-06 17:20:52 +03:00

map_difference.hh

…

marshal_exception.hh

./: not include unused headers

2024-03-20 09:16:46 +02:00

multishard_mutation_query.cc

treewide: replace boost::irange with std::views::iota where possible

2024-10-03 10:33:33 +03:00

multishard_mutation_query.hh

…

mutation_query.cc

readers: Use reversed schema and native reversed slices

2024-08-13 10:03:46 +02:00

mutation_query.hh

Fix comments refering to half-reversed (legacy) slices

2024-08-13 10:07:12 +02:00

noexcept_traits.hh

…

NOTICE.txt

…

ORIGIN

…

partition_builder.hh

…

partition_range_compat.hh

interval: rename nonwrapping_interval to interval

2024-02-21 19:43:17 +02:00

partition_slice_builder.cc

…

partition_slice_builder.hh

…

partition_snapshot_reader.hh

readers: Use reversed schema and native reversed slices

2024-08-13 10:03:46 +02:00

partition_snapshot_row_cursor.hh

treewide: change assert() to SCYLLA_ASSERT()

2024-08-05 08:23:35 +03:00

protocol_server.hh

protocol_server: Keep scheduling group on board

2024-05-24 17:54:29 +03:00

querier.cc

treewide: use seastar::format() or fmt::format() explicitly

2024-09-11 23:21:40 +03:00

querier.hh

querier: consume_page(): add rate-limiting to tombstone warnings

2024-08-06 08:56:11 -04:00

query_id.hh

…

query_ranges_to_vnodes.cc

./: not include unused headers

2024-03-20 09:16:46 +02:00

query_ranges_to_vnodes.hh

./: not include unused headers

2024-03-20 09:16:46 +02:00

query_result_merger.hh

…

query-request.hh

Fix comments refering to half-reversed (legacy) slices

2024-08-13 10:07:12 +02:00

query-result-reader.hh

treewide: change assert() to SCYLLA_ASSERT()

2024-08-05 08:23:35 +03:00

query-result-set.cc

…

query-result-set.hh

query-result-set: add formatter for query-result-set.hh types

2024-02-21 17:54:48 +08:00

query-result-writer.hh

./: not include unused headers

2024-03-20 09:16:46 +02:00

query-result.hh

query-result.hh: add formatter for query::result::printer

2024-02-21 17:57:18 +08:00

query.cc

query::trim_clustering_row_ranges_to: require reversed schema for native reversed ranges

2024-08-13 10:07:10 +02:00

read_context.hh

readers: Use reversed schema and native reversed slices

2024-08-13 10:03:46 +02:00

reader_concurrency_semaphore.cc

reader_concurrency_semaphore: in stats, fix swapped count_resources and memory_resources

2024-10-09 14:12:01 +03:00

reader_concurrency_semaphore.hh

reader_concurrency_semaphore: test constructor: don't ignore metrics param

2024-08-04 21:14:42 +03:00

reader_permit.hh

treewide: replace formatter<std::string_view> with formatter<string_view>

2024-04-19 07:44:07 +03:00

README.md

README: Update the version of C++ to C++23

2024-08-14 12:06:23 +03:00

real_dirty_memory_accounter.hh

…

release.cc

treewide: change assert() to SCYLLA_ASSERT()

2024-08-05 08:23:35 +03:00

release.hh

release: introduce doc_link()

2024-05-08 09:41:17 -04:00

reversibly_mergeable.hh

…

row_cache.cc

row_cache: coroutinize do_update()

2024-09-21 00:07:02 +02:00

row_cache.hh

treewide: rename flat_mutation_reader_v2 to mutation_reader

2024-06-21 07:12:06 +03:00

schema_mutations.cc

schema_mutations: add fmt::formatter for schema_mutations

2024-03-15 09:49:56 +02:00

schema_mutations.hh

treewide: replace formatter<std::string_view> with formatter<string_view>

2024-04-19 07:44:07 +03:00

schema_upgrader.hh

…

scylla_post_install.sh

…

scylla-gdb.py

scylla-gdb.py: drop compatibility code for EOL releases

2024-10-03 15:42:08 +03:00

SCYLLA-VERSION-GEN

Update ScyllaDB version to: 6.3.0-dev

2024-09-17 13:43:04 +03:00

seastarx.hh

treewide: remove dependency on boost asio address_v4

2024-10-01 14:00:50 +03:00

serialization_visitors.hh

…

serializer_impl.hh

treewide: de-static namespace scope functions in headers

2024-10-01 14:02:50 +03:00

serializer.cc

…

serializer.hh

treewide: change assert() to SCYLLA_ASSERT()

2024-08-05 08:23:35 +03:00

service_permit.hh

…

setup.py

…

shell.nix

…

sstables_loader.cc

sstable_loader: Remove unused _snapshot_name from download_task_impl

2024-10-07 10:43:13 +03:00

sstables_loader.hh

Merge 'Do not remove objects from backup storage after restore' from Pavel Emelyanov

2024-10-04 14:59:40 +03:00

supervisor.hh

./: not include unused headers

2024-03-20 09:16:46 +02:00

table_helper.cc

treewide: use seastar::format() or fmt::format() explicitly

2024-09-11 23:21:40 +03:00

table_helper.hh

cql3: introduce dialect infrastructure

2024-08-29 21:19:23 +03:00

test.py

test: add complete_multipart_upload completion tests

2024-10-01 09:06:24 +03:00

timeout_config.cc

…

timeout_config.hh

treewide: drop thrift support

2024-06-07 06:44:59 +08:00

timestamp.hh

treewide: de-static namespace scope functions in headers

2024-10-01 14:02:50 +03:00

tombstone_gc_extension.hh

./: not include unused headers

2024-03-20 09:16:46 +02:00

tombstone_gc_options.cc

treewide: replace formatter<std::string_view> with formatter<string_view>

2024-04-19 07:44:07 +03:00

tombstone_gc_options.hh

treewide: replace formatter<std::string_view> with formatter<string_view>

2024-04-19 07:44:07 +03:00

tombstone_gc.cc

raft: fast tombstone GC for group0-managed tables

2024-10-08 21:07:30 +02:00

tombstone_gc.hh

raft: fast tombstone GC for group0-managed tables

2024-10-08 21:07:30 +02:00

tox.ini

…

ubsan-suppressions.supp

…

unimplemented.cc

treewide: drop thrift support

2024-06-07 06:44:59 +08:00

unimplemented.hh

treewide: drop thrift support

2024-06-07 06:44:59 +08:00

validation.cc

…

validation.hh

…

version.hh

…

view_info.hh

mv: delete a partition in a single operation when applicable

2024-07-25 11:12:58 +03:00

vint-serialization.cc

…

vint-serialization.hh

…

zstd.cc

zstd: include external header with brackets

2024-07-04 10:42:29 +03:00

README.md

Scylla

What is Scylla?

Scylla is the real-time big data database that is API-compatible with Apache Cassandra and Amazon DynamoDB. Scylla embraces a shared-nothing approach that increases throughput and storage capacity to realize order-of-magnitude performance improvements and reduce hardware costs.

For more information, please see the ScyllaDB web site.

Build Prerequisites

Scylla is fairly fussy about its build environment, requiring a very recent C++23 compiler and recent versions of many libraries to build. The document HACKING.md includes detailed information on building and developing Scylla, but to get Scylla building quickly on (almost) any build machine, Scylla offers a frozen toolchain. This is a pre-configured Docker image which includes recent versions of all the required compilers, libraries and build tools. Using the frozen toolchain allows you to avoid changing anything on your build machine to meet Scylla's requirements; you just need to meet the frozen toolchain's prerequisites (mostly, Docker or Podman being available).

Building Scylla

Building Scylla with the frozen toolchain (dbuild) is as easy as:

$ git submodule update --init --force --recursive
$ ./tools/toolchain/dbuild ./configure.py
$ ./tools/toolchain/dbuild ninja build/release/scylla

For further information, please see:

Developer documentation for more information on building Scylla.
Build documentation on how to build Scylla binaries, tests, and packages.
Docker image build documentation for information on how to build Docker images.

Running Scylla

To start the Scylla server, run:

$ ./tools/toolchain/dbuild ./build/release/scylla --workdir tmp --smp 1 --developer-mode 1

This will start a Scylla node with one CPU core allocated to it and data files stored in the tmp directory. The --developer-mode option is needed to disable the various checks Scylla performs at startup to ensure the machine is configured for maximum performance (not relevant on development workstations). Please note that you need to run Scylla with dbuild if you built it with the frozen toolchain.

For more run options, run:

$ ./tools/toolchain/dbuild ./build/release/scylla --help

Testing

See the test.py manual.

Scylla APIs and compatibility

By default, Scylla is compatible with Apache Cassandra and its API, CQL. There is also support for the API of Amazon DynamoDB™, which needs to be enabled and configured in order to be used. For more information on how to enable the DynamoDB™ API in Scylla, and the current compatibility of this feature as well as Scylla-specific extensions, see Alternator and Getting started with Alternator.

Documentation

Documentation can be found here. Seastar documentation can be found here. User documentation can be found here.

Training

Training material and online courses can be found at Scylla University. The courses are free, self-paced and include hands-on examples. They cover a variety of topics including Scylla data modeling, administration, architecture, basic NoSQL concepts, using drivers for application development, Scylla setup, failover, compactions, multi-datacenters and how Scylla integrates with third-party applications.

Contributing to Scylla

If you want to report a bug or submit a pull request or a patch, please read the contribution guidelines.

If you are a developer working on Scylla, please read the developer guidelines.

Contact

The community forum and Slack channel are for users to discuss configuration, management, and operations of the ScyllaDB open source.
The developers mailing list is for developers and people interested in following the development of ScyllaDB to discuss technical topics.

Languages

C++ 72.5%

Python 26.2%

CMake 0.4%

GAP 0.3%

Shell 0.3%