mirror of https://github.com/scylladb/scylladb.git synced 2026-05-12 19:02:12 +00:00

Piotr Dulikowski 2e5eb92f21 Merge 'cdc: use CDC schema that is compatible with the base schema' from Michael Litvak

When generating CDC log mutations for some base mutation, use a CDC schema that is compatible with the base schema.

The compatible CDC schema has, for every base column, a corresponding CDC column with the same name. With a non-compatible schema we may encounter a situation, especially during ALTER, in which a mutation sets a value for a base column but the CDC schema has no column by that name. This would cause the user request to fail with an error.
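As a rough illustration of the compatibility rule above, the following Python sketch checks whether every base column has a same-named column in a CDC schema. It is not ScyllaDB's C++ implementation: the function names and the example column lists are hypothetical, and schemas are reduced to plain lists of column names.

```python
def missing_cdc_columns(base_columns, cdc_columns):
    """Return base column names that have no same-named column in the CDC schema."""
    cdc = set(cdc_columns)
    return [name for name in base_columns if name not in cdc]


def is_compatible(base_columns, cdc_columns):
    """A CDC schema is compatible when no base column is missing from it."""
    return not missing_cdc_columns(base_columns, cdc_columns)


# Hypothetical example: the base table has just gained column "v2" through
# ALTER, but the write path still holds a CDC schema from before the change.
base_after_alter = ["pk", "ck", "v1", "v2"]
stale_cdc = ["cdc$stream_id", "cdc$time", "pk", "ck", "v1"]
fresh_cdc = stale_cdc + ["v2"]

print(missing_cdc_columns(base_after_alter, stale_cdc))  # prints ['v2']
print(is_compatible(base_after_alter, fresh_cdc))        # prints True
```

A mutation that sets `v2` could not be turned into a log row using `stale_cdc`, which is the failure mode the paragraph above describes.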

We add to the schema object a schema_ptr that for CDC-enabled tables points to the schema object of the CDC table that is compatible with the schema. It is set by the schema merge algorithm when creating the schema for a table that is created or altered. We use the fact that a base table and its CDC table are created and altered in the same group0 operation, and this way we can find and set the cdc schema for a base table.
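The idea of attaching the CDC schema while building the base schema can be sketched in Python as follows. This is a simplified model under stated assumptions, not the actual schema merge code: `Schema`, `merge_schema_change`, and the input format (table name mapped to a version and a column list) are hypothetical, and the sketch assumes the CDC log table can be found by the `_scylla_cdc_log` name suffix.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

CDC_LOG_SUFFIX = "_scylla_cdc_log"  # assumed naming convention for CDC log tables


@dataclass(frozen=True)
class Schema:
    name: str
    version: int
    columns: Tuple[str, ...]
    # For CDC-enabled base tables: the CDC schema created in the same change.
    cdc_schema: Optional["Schema"] = None


def merge_schema_change(changed: Dict[str, Tuple[int, List[str]]]) -> Dict[str, Schema]:
    """Build schema objects for all tables touched by one atomic schema change."""
    # Build the CDC log schemas first so base tables can point at them.
    cdc = {
        name: Schema(name, version, tuple(cols))
        for name, (version, cols) in changed.items()
        if name.endswith(CDC_LOG_SUFFIX)
    }
    result = dict(cdc)
    for name, (version, cols) in changed.items():
        if name in cdc:
            continue
        # Base and CDC tables change together, so the matching CDC schema
        # (if any) is available in the same batch.
        result[name] = Schema(name, version, tuple(cols),
                              cdc_schema=cdc.get(name + CDC_LOG_SUFFIX))
    return result


schemas = merge_schema_change({
    "t": (2, ["pk", "v1", "v2"]),
    "t_scylla_cdc_log": (2, ["cdc$stream_id", "cdc$time", "pk", "v1", "v2"]),
    "plain": (1, ["pk", "x"]),
})
print(schemas["t"].cdc_schema.name)
print(schemas["plain"].cdc_schema)
```

Because the pointer is set from the same batch of changes, a holder of the base schema always reaches a CDC schema that was built alongside it, rather than looking up whatever CDC schema happens to be current.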

When transporting the base schema as a frozen schema between shards, we transport with it the frozen cdc schema as well.

The patch starts with a series of refactoring commits that make extending the frozen schema easier and clean up some duplicated frozen-schema code. We combine the two types `frozen_schema_with_base_info` and `view_schema_and_base_info` into a single type `extended_frozen_schema`, which holds a frozen schema together with additional data that is not part of the schema mutations but must be transported with it in order to unfreeze it: base_info, and the frozen cdc schema, which is added in a later commit.
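A minimal Python sketch of the bundling idea follows. It only mirrors the shape described above: the class and function names are hypothetical, JSON stands in for the real frozen (serialized) schema representation, and `base_info` is treated as an opaque optional payload.

```python
import json
from dataclasses import dataclass
from typing import Optional


def freeze(schema: dict) -> bytes:
    """Stand-in for schema serialization (JSON is an assumption of this sketch)."""
    return json.dumps(schema, sort_keys=True).encode()


def unfreeze(data: bytes) -> dict:
    return json.loads(data.decode())


@dataclass(frozen=True)
class ExtendedFrozenSchema:
    frozen_schema: bytes
    base_info: Optional[bytes] = None          # extra data needed to unfreeze a view
    frozen_cdc_schema: Optional[bytes] = None  # travels with CDC-enabled base tables


def extend(schema: dict, cdc_schema: Optional[dict] = None,
           base_info: Optional[dict] = None) -> ExtendedFrozenSchema:
    """Bundle a schema with the extra data that must be transported alongside it."""
    return ExtendedFrozenSchema(
        frozen_schema=freeze(schema),
        base_info=freeze(base_info) if base_info is not None else None,
        frozen_cdc_schema=freeze(cdc_schema) if cdc_schema is not None else None,
    )


def unfreeze_extended(efs: ExtendedFrozenSchema):
    """Recover the schema and, if present, its CDC schema on the receiving side."""
    cdc = unfreeze(efs.frozen_cdc_schema) if efs.frozen_cdc_schema is not None else None
    return unfreeze(efs.frozen_schema), cdc


base = {"name": "t", "columns": ["pk", "v1"]}
cdc = {"name": "t_scylla_cdc_log", "columns": ["cdc$stream_id", "pk", "v1"]}
shipped = extend(base, cdc_schema=cdc)
restored_base, restored_cdc = unfreeze_extended(shipped)
print(restored_base == base, restored_cdc == cdc)
```

Keeping the extras in one carrier type means a new field, such as the frozen CDC schema, can be added in one place instead of in every code path that ships a frozen schema.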

Fixes https://github.com/scylladb/scylladb/issues/26405

backport not needed - enhancement

Closes scylladb/scylladb#24960

* github.com:scylladb/scylladb:
  test: cdc: test cdc compatible schema
  cdc: use compatiable cdc schema
  db: schema_applier: create schema with pointer to CDC schema
  db: schema_applier: extract cdc tables
  schema: add pointer to CDC schema
  schema_registry: remove base_info from global_schema_ptr
  schema_registry: use extended_frozen_schema in schema load
  schema_registry: replace frozen_schema+base_info with extended_frozen_schema
  frozen_schema: extract info from schema_ptr in the constructor
  frozen_schema: rename frozen_schema_with_base_info to extended_frozen_schema

2025-11-13 10:11:54 +01:00

.github

auto-backport: Add support for JIRA issue references

2025-11-12 08:15:06 +02:00

abseil @ d7aaad83b4

…

alternator

Merge 'Support local primary-replica-only for native restore' from Robert Bindar

2025-11-13 12:11:18 +03:00

api

Support primary_replica_only for native restore API

2025-11-11 09:17:52 +02:00

audit

cql3: ks_prop_defs: Expand numeric RF to rack list

2025-10-29 23:32:59 +01:00

auth

Merge 'auth: implement vector store authorization' from Michał Hudobski

2025-10-20 17:32:00 +03:00

bin

…

cdc

Merge 'cdc: use CDC schema that is compatible with the base schema' from Michael Litvak

2025-11-13 10:11:54 +01:00

cmake

build: disable the -fextend-variable-liveness clang option

2025-10-21 10:47:34 +03:00

compaction

replica/table: do not stop major compaction when disabling auto compaction

2025-10-29 19:22:07 +05:30

conf

Fix comment for tablets_mode_for_new_keyspaces

2025-11-09 10:49:46 +02:00

cql3

cql3: Make abstract_type explicitly noncopyable

2025-11-12 09:11:56 +01:00

data_dictionary

schema_tables: Keep "replication" column backwards-compatible by expanding rack lists to numeric RF

2025-10-21 09:11:25 +03:00

db

Merge 'cdc: use CDC schema that is compatible with the base schema' from Michael Litvak

2025-11-13 10:11:54 +01:00

debug

…

dht

dht, sstables: replace vector with chunked_vector when computing sstable shards

2025-10-02 00:47:42 +02:00

dist

dist/docker: add configurable blocked-reactor-notify-ms parameter

2025-11-11 12:38:40 +02:00

docs

Merge 'Support local primary-replica-only for native restore' from Robert Bindar

2025-11-13 12:11:18 +03:00

ent

Merge 'encryption::kms_host: Add exponential backoff-retry for 503 errors' from Calle Wilund

2025-11-12 08:33:33 +02:00

exceptions

exceptions.hh: fix message argument passing

2025-08-13 13:39:52 +02:00

gms

cql3: allow counters with tablets

2025-11-03 16:04:37 +01:00

idl

load_stats: change data structure which contains tablet sizes

2025-10-24 14:37:00 +02:00

index

secondary_index: disallow multiple vector indexes on the same column

2025-10-29 11:55:38 +02:00

keys

api/storage_service: add GET 'natural_endpoints' v2 to support composite keys with ':'

2025-10-01 15:53:25 +02:00

lang

treewide: Move type related files to a type directory. As requested in #22110, moved the files and fixed other includes and build system.

2025-09-17 17:32:19 +03:00

licenses

…

locator

Merge 'Support local primary-replica-only for native restore' from Robert Bindar

2025-11-13 12:11:18 +03:00

message

message: move RPC compression from utils/ to message/

2025-09-30 17:03:09 +03:00

mutation

mutation/mutation_compactor: remove _can_gc member

2025-10-16 10:38:47 +03:00

mutation_writer

replica: Fix split compaction when tablet boundaries change

2025-09-07 05:20:23 -03:00

node_ops

storage_service: change node_ops_info::ignore_nodes to host id

2025-09-15 10:18:24 +02:00

pgo

pgo: enable counters with tablets

2025-11-03 16:04:37 +01:00

query

code: Replace distributed<> with sharded<>

2025-09-19 12:22:51 +02:00

raft

raft: small fixes for voters code

2025-10-16 18:41:08 +02:00

readers

treewide: Move query related files to a new query directory

2025-09-16 23:40:47 +03:00

reloc

…

repair

repair: Add metric for time spent on tablet repair

2025-11-06 10:00:20 +03:00

replica

storage_proxy: apply counter mutation on all write shards

2025-11-03 16:03:29 +01:00

rust

Revert "build: add precompiled headers to CMakeLists.txt"

2025-09-03 09:46:00 +03:00

schema

schema: add pointer to CDC schema

2025-10-21 14:13:43 +02:00

scripts

scripts: pull_github_pr.sh: Fix auth problem detection

2025-10-31 18:32:58 +03:00

seastar @ 63900e0307

Update seastar submodule

2025-10-22 11:26:40 +03:00

service

Merge 'cdc: use CDC schema that is compatible with the base schema' from Michael Litvak

2025-11-13 10:11:54 +01:00

sstables

sstables/trie: fix an assertion violation in bti_partition_index_writer_impl::write_last_key

2025-11-07 11:25:07 +02:00

streaming

sstables: make sstable::estimated_keys_for_range asynchronous

2025-09-29 13:01:21 +02:00

swagger-ui @ 12f1da1082

…

tasks

Merge 'compaction: handle exception in expected_total_workload' from Aleksandra Martyniuk

2025-09-17 15:10:19 +03:00

test

Merge 'cdc: use CDC schema that is compatible with the base schema' from Michael Litvak

2025-11-13 10:11:54 +01:00

tools

Merge 'cdc: use CDC schema that is compatible with the base schema' from Michael Litvak

2025-11-13 10:11:54 +01:00

tracing

Add max trace size output configuration variable

2025-10-28 13:29:15 +03:00

transport

code: Switch to seastar API level 9

2025-10-17 10:26:50 +03:00

types

cql3: Make abstract_type explicitly noncopyable

2025-11-12 09:11:56 +01:00

unified

…

utils

utils: stall_free: add dispose_gently

2025-11-11 12:20:18 +02:00

vector_search

vector_search: remove dependence on cql3

2025-10-21 17:41:55 +03:00

.clang-format

…

.dockerignore

…

.gitattributes

…

.gitignore

.gitignore: add rust target

2025-08-19 13:09:18 +03:00

.gitmodules

…

.gitorderfile

…

.mailmap

…

absl-flat_hash_map.cc

…

absl-flat_hash_map.hh

…

amplify.yml

…

backlog_controller.hh

…

build_mode.hh

…

bytes_fwd.hh

…

bytes_ostream.hh

…

bytes.cc

…

bytes.hh

…

cartesian_product.hh

…

client_data.cc

…

client_data.hh

…

clocks-impl.cc

treewide: Move mutation related files to a mutation directory

2025-09-24 13:23:38 +03:00

clocks-impl.hh

…

CMakeLists.txt

cmake: fix the seastar API level

2025-10-23 11:20:20 +03:00

configure.py

test::lib: Add azure mock/real server fixture

2025-11-05 10:22:22 +00:00

CONTRIBUTING.md

docs: fix typos and spelling errors

2025-09-30 13:16:49 +02:00

coverage_excludes.txt

…

coverage_sources.list

…

db_clock.hh

…

debug.cc

…

debug.hh

…

default.nix

…

Doxyfile

…

encoding_stats.hh

treewide: Move mutation related files to a mutation directory

2025-09-24 13:23:38 +03:00

enum_set.hh

auth: add possibilty to check for any permission in set

2025-10-03 16:55:57 +02:00

fix_system_distributed_tables.py

…

flake.lock

…

flake.nix

…

gc_clock.hh

…

gdbinit

…

gen_segmented_compress_params.py

…

HACKING.md

docs: fix typos and spelling errors

2025-09-30 13:16:49 +02:00

hashing_partition_visitor.hh

…

idl-compiler.py

…

inet_address_vectors.hh

…

init.cc

db: experimental consistent-tablets option

2025-10-15 11:27:10 +03:00

init.hh

code: Replace distributed<> with sharded<>

2025-09-19 12:22:51 +02:00

install-dependencies.sh

install-dependencies.sh: update node_exporter to 1.10.2

2025-11-11 11:36:13 +02:00

install.sh

…

LICENSE-ScyllaDB-Source-Available.md

…

main.cc

db/config: Change default SSTable compressor to LZ4WithDictsCompressor

2025-10-30 15:53:49 +02:00

marshal_exception.hh

…

mutation_query.cc

…

mutation_query.hh

treewide: Move query related files to a new query directory

2025-09-16 23:40:47 +03:00

NOTICE.txt

…

ORIGIN

…

partition_builder.hh

mutation: async_utils: add unfreeze_and_split_gently

2025-09-30 17:15:41 +03:00

partition_range_compat.hh

…

partition_slice_builder.cc

…

partition_slice_builder.hh

treewide: Move query related files to a new query directory

2025-09-16 23:40:47 +03:00

partition_snapshot_reader.hh

treewide: Move query related files to a new query directory

2025-09-16 23:40:47 +03:00

query_ranges_to_vnodes.cc

…

query_ranges_to_vnodes.hh

…

reader_concurrency_semaphore_group.cc

…

reader_concurrency_semaphore_group.hh

…

reader_concurrency_semaphore.cc

treewide: Move query related files to a new query directory

2025-09-16 23:40:47 +03:00

reader_concurrency_semaphore.hh

…

reader_permit.hh

…

README.md

docs: fix typos and spelling errors

2025-09-30 13:16:49 +02:00

real_dirty_memory_accounter.hh

…

release.cc

release: adjust doc_link() for the post source-available world

2025-09-29 17:02:55 +03:00

release.hh

…

reversibly_mergeable.hh

…

schema_upgrader.hh

treewide: Move mutation related files to a mutation directory

2025-09-24 13:23:38 +03:00

scylla_post_install.sh

…

scylla-gdb.py

gdb: simplify and future-proof looking up coroutine frame type

2025-10-20 12:38:53 +03:00

SCYLLA-VERSION-GEN

Update ScyllaDB version to: 2026.1.0-dev

2025-09-30 18:54:09 +03:00

seastarx.hh

…

serialization_visitors.hh

…

serializer_impl.hh

…

serializer.cc

…

serializer.hh

treewide: include boost headers as "system" headers

2025-08-22 17:21:24 +03:00

service_permit.hh

…

shell.nix

…

sstable_dict_autotrainer.cc

…

sstable_dict_autotrainer.hh

…

sstables_loader.cc

Improve choice distribution for primary replica

2025-11-11 09:18:01 +02:00

sstables_loader.hh

Support primary_replica_only for native restore API

2025-11-11 09:17:52 +02:00

supervisor.hh

…

table_helper.cc

schema: Allow configuring consistency setting for a keyspace

2025-10-16 13:34:49 +03:00

table_helper.hh

…

test.py

vector_store_client_test: Relocate to a dedicated directory

2025-09-25 14:04:28 +02:00

timeout_config.cc

…

timeout_config.hh

…

tombstone_gc_extension.hh

…

tombstone_gc_options.cc

…

tombstone_gc_options.hh

…

tombstone_gc-internals.hh

treewide: Add missing #pragma once

2025-09-01 14:58:21 +03:00

tombstone_gc.cc

tombstone_gc: add tombstone_gc_state factory methods for gc_all and no_gc

2025-10-16 10:38:47 +03:00

tombstone_gc.hh

tombstone_gc: add tombstone_gc_state factory methods for gc_all and no_gc

2025-10-16 10:38:47 +03:00

ubsan-suppressions.supp

…

unimplemented.cc

…

unimplemented.hh

…

validation.cc

…

validation.hh

…

version.hh

…

view_info.hh

treewide: Move query related files to a new query directory

2025-09-16 23:40:47 +03:00

vint-serialization.cc

…

vint-serialization.hh

…

README.md

Scylla

What is Scylla?

Scylla is the real-time big data database that is API-compatible with Apache Cassandra and Amazon DynamoDB. Scylla embraces a shared-nothing approach that increases throughput and storage capacity to realize order-of-magnitude performance improvements and reduce hardware costs.

For more information, please see the ScyllaDB web site.

Build Prerequisites

Scylla is fairly fussy about its build environment: it requires a very recent C++23-capable compiler and recent versions of many libraries. The document HACKING.md includes detailed information on building and developing Scylla, but to get Scylla building quickly on (almost) any build machine, Scylla offers a frozen toolchain. This is a pre-configured Docker image which includes recent versions of all the required compilers, libraries and build tools. Using the frozen toolchain allows you to avoid changing anything on your build machine to meet Scylla's requirements - you just need to meet the frozen toolchain's prerequisites (mostly, Docker or Podman being available).

Building Scylla

Building Scylla with the frozen toolchain dbuild is as easy as:

$ git submodule update --init --force --recursive
$ ./tools/toolchain/dbuild ./configure.py
$ ./tools/toolchain/dbuild ninja build/release/scylla

For further information, please see:

Developer documentation for more information on building Scylla.
Build documentation on how to build Scylla binaries, tests, and packages.
Docker image build documentation for information on how to build Docker images.

Running Scylla

To start the Scylla server, run:

$ ./tools/toolchain/dbuild ./build/release/scylla --workdir tmp --smp 1 --developer-mode 1

This will start a Scylla node with one CPU core allocated to it and data files stored in the tmp directory. The --developer-mode option is needed to disable the various checks Scylla performs at startup to ensure the machine is configured for maximum performance (these checks are not relevant on development workstations). Please note that you need to run Scylla with dbuild if you built it with the frozen toolchain.

For more run options, run:

$ ./tools/toolchain/dbuild ./build/release/scylla --help

Testing

See test.py manual.

Scylla APIs and compatibility

By default, Scylla is compatible with Apache Cassandra and its API - CQL. There is also support for the API of Amazon DynamoDB™, which needs to be enabled and configured in order to be used. For more information on how to enable the DynamoDB™ API in Scylla, and the current compatibility of this feature as well as Scylla-specific extensions, see Alternator and Getting started with Alternator.

Documentation

Documentation can be found here. Seastar documentation can be found here. User documentation can be found here.

Training

Training material and online courses can be found at Scylla University. The courses are free, self-paced and include hands-on examples. They cover a variety of topics including Scylla data modeling, administration, architecture, basic NoSQL concepts, using drivers for application development, Scylla setup, failover, compactions, multi-datacenters and how Scylla integrates with third-party applications.

Contributing to Scylla

If you want to report a bug or submit a pull request or a patch, please read the contribution guidelines.

If you are a developer working on Scylla, please read the developer guidelines.

Contact

The community forum and Slack channel are for users to discuss configuration, management, and operations of ScyllaDB.
The developers mailing list is for developers and people interested in following the development of ScyllaDB to discuss technical topics.

Languages

C++ 72.5%

Python 26.2%

CMake 0.4%

GAP 0.3%

Shell 0.3%