mirror of https://github.com/scylladb/scylladb.git synced 2026-06-04 22:13:19 +00:00

Nadav Har'El 31e0315710 Merge 'alternator: fix unnecesary cdc log entries' from Radosław Cybulski

Fix cdc writing unnecessary entries to its log, for example when Alternator deletes an item that does not actually exist.

Originally @wps0 tackled this issue, and this patch extends his work. His work added a `should_skip` function to cdc, which processes a `mutation` object and decides whether the changes in the object should be added to the cdc log.

The issue with his approach is that a `mutation` object might contain changes for more than one row. If, for example, the `mutation` object contains two changes, a delete of a non-existing row and a create of a non-existing row, the `should_skip` function will detect changes in the second item and allow the whole `mutation` (BOTH items) to be added. For example, running this (using Python's boto3) on an empty table:
```
with table.batch_writer() as batch:
    batch.put_item({'p': 'p', 'c': 'c0'})
    batch.delete_item(Key={'p': 'p', 'c': 'c1'})
```
will emit two events (a "put" event and a "delete" event), even though the item with `c` set to `c1` does not exist (and thus can't be deleted). Note that both entries in the batch write must use the same partition key; otherwise the upper layer will split them into separate `mutation` objects and the issue will not happen.
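
The difference between a mutation-level check and a per-change check can be illustrated with a small standalone model. This is only a sketch: the tuple representation of a change, the `is_noop` rule (a put is always treated as effective here) and all function names are assumptions made for illustration, not ScyllaDB code.

```python
# Toy model (not ScyllaDB code): a "mutation" is a list of (operation,
# clustering_key) changes sharing one partition key; `existing` is the set
# of clustering keys already stored in that partition.

def is_noop(change, existing):
    """Assumed rule: deleting a missing row changes nothing; a put always does."""
    op, ck = change
    return op == "delete" and ck not in existing

def logged_changes_per_mutation(mutation, existing):
    """All-or-nothing: the mutation is skipped only if every change is a no-op."""
    if all(is_noop(change, existing) for change in mutation):
        return []
    return list(mutation)

def logged_changes_per_row(mutation, existing):
    """Per-change: each change is judged separately from the others."""
    return [change for change in mutation if not is_noop(change, existing)]

# Same shape as the batch write above, run against an empty table.
batch = [("put", "c0"), ("delete", "c1")]
print(logged_changes_per_mutation(batch, existing=set()))
print(logged_changes_per_row(batch, existing=set()))
```

In this model the all-or-nothing check logs both changes, while the per-change check logs only the put.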

The solution is to do similar processing, but to consider each change separately from the others. This is tricky to implement because of the way cdc works. When cdc processes a `mutation` object (containing X changes), it emits cdc entries in phases. Phase 1: emit the `preimage` (old state) for each change (if requested). Phase 2: for each change, emit the actual "diff" (update, delete and so on). Phase 3: emit the `postimage` (new state).
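
The phase ordering described above can be sketched as follows. This is a toy model under stated assumptions: the entry tuples, the dictionary used as table state, the rule that an image is emitted only for rows that exist, and the name `emit_cdc_entries` are all hypothetical and do not mirror the real cdc implementation.

```python
# Toy model (not ScyllaDB code) of the three emission phases.
# `changes` is a list of (operation, clustering_key, value) tuples and
# `old_state` maps clustering keys to the values stored before the write.

def emit_cdc_entries(changes, old_state, with_preimage=True, with_postimage=True):
    entries = []

    # Phase 1: preimage (old state) for each change, if requested.
    if with_preimage:
        for _op, ck, _value in changes:
            if ck in old_state:
                entries.append(("preimage", ck, old_state[ck]))

    # Phase 2: the actual "diff" for each change.
    new_state = dict(old_state)
    for op, ck, value in changes:
        entries.append((op, ck, value))
        if op == "delete":
            new_state.pop(ck, None)
        else:
            new_state[ck] = value

    # Phase 3: postimage (new state) for each change, if requested.
    if with_postimage:
        for _op, ck, _value in changes:
            if ck in new_state:
                entries.append(("postimage", ck, new_state[ck]))

    return entries

for entry in emit_cdc_entries(
    [("put", "c0", "v1"), ("delete", "c1", None)], old_state={"c0": "v0"}
):
    print(entry)
```

The point of the sketch is that every preimage is already in the output before any diff is examined, which is why a decision made in phase 2 cannot simply prevent earlier entries from being written.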

We only learn whether a change needs to be skipped during phase 2. By that time phase 1 is complete and the preimage for the change has been emitted. At that moment we flag the change (identified by its clustering key value) to be skipped: we add the clustering key to an `ignore-rows` set (the `_alternator_clustering_keys_to_ignore` variable) and continue normally. Once all phases finish, we run an added `postprocess` phase (the `clean_up_noop_rows` function). It goes through the generated cdc mutations and skips all modifications whose clustering key is in the `ignore-rows` set. After skipping we need a "cleanup" operation: each generated cdc mutation contains an index (incremented by one), and if we skipped some parts the indexes are no longer consecutive, so we reindex the final changes.
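
A minimal sketch of the postprocess step, assuming each generated cdc entry can be modelled as a dictionary with a clustering key (`ck`) and an `index` field. The Python function below borrows the name `clean_up_noop_rows` from the patch but is only an illustrative stand-in; the dictionary layout and the choice to start indexes at 0 are assumptions.

```python
# Toy model (not ScyllaDB code) of the postprocess phase: drop every entry
# whose clustering key was flagged during phase 2, then renumber the rest.

def clean_up_noop_rows(entries, keys_to_ignore):
    # Remove all entries (preimage, diff, postimage) for flagged keys.
    kept = [entry for entry in entries if entry["ck"] not in keys_to_ignore]
    # Reindex so that the surviving entries are consecutive again.
    return [dict(entry, index=new_index) for new_index, entry in enumerate(kept)]

generated = [
    {"index": 0, "ck": b"c0", "kind": "put"},
    {"index": 1, "ck": b"c1", "kind": "delete"},  # delete of a missing row
    {"index": 2, "ck": b"c2", "kind": "put"},
]
keys_to_ignore = {b"c1"}  # filled in while phase 2 was running

for entry in clean_up_noop_rows(generated, keys_to_ignore):
    print(entry)
```

Filtering after the fact keeps the three emission phases unchanged; only the final list is trimmed and renumbered.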

There's a special case worth mentioning: Alternator tables without clustering keys. In that case the `mutation` object passed to cdc can contain exactly one change (since different partition keys are split by upper layers, and Alternator will never emit a `mutation` object containing two or more changes with the same primary key). Here, when we decide the change is to be skipped, we add an empty `bytes` object to the `ignore-rows` set. When checking the `ignore-rows` set, we only check whether it is empty or not (we don't check for the presence of the empty `bytes` object).
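
The special case can be modelled in a few lines. This is a sketch only: `mark_noop`, `should_drop` and the `has_clustering_key` parameter are hypothetical names introduced for illustration.

```python
# Toy model (not ScyllaDB code) of the ignore-rows check with and without
# a clustering key in the table schema.

EMPTY_KEY = b""  # marker used when the table has no clustering key

def mark_noop(keys_to_ignore, clustering_key=None):
    """Flag a change as a no-op; tables without clustering keys use the marker."""
    keys_to_ignore.add(EMPTY_KEY if clustering_key is None else clustering_key)

def should_drop(keys_to_ignore, has_clustering_key, clustering_key=None):
    if not has_clustering_key:
        # The mutation holds exactly one change, so a non-empty set is enough.
        return bool(keys_to_ignore)
    return clustering_key in keys_to_ignore

ignore = set()
print(should_drop(ignore, has_clustering_key=False))  # nothing flagged yet
mark_noop(ignore)                                     # flag the single change
print(should_drop(ignore, has_clustering_key=False))
```

Because such a mutation carries a single change, testing the set for emptiness is sufficient and no lookup of the marker is required.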

Note: there might be some confusion between this patch and the #28452 patch. Both started from the same error observation and use similar tests for validation, as both are easily triggered by BatchWrite commands (both need the `mutation` object passed to cdc to contain more than one change). This issue, though, is about wrong data written to the cdc log and is fixed in cdc, whereas #28452 is about incorrect parsing of correct cdc data and is fixed on the Alternator side. Note that we need #28452 to truly verify this fix (otherwise we will emit correct cdc entries, but Alternator will parse them incorrectly).

Note: to benefit from (or even notice) this patch you need the `alternator_streams_increased_compatibility` flag turned on.

Note: the rework is quite "broad" and covers a lot of ground: every operation that might result in no change to the database state should be tested. An additional test was added, covering an attempt to remove a column from a non-existing item as well as an attempt to remove a non-existing column from an existing item.

Fixes: #28368
Fixes: SCYLLADB-1528
Fixes: SCYLLADB-538

Closes scylladb/scylladb#28544

* github.com:scylladb/scylladb:
  alternator: remove unnecesary code
  alternator: fix Alternator writing unnecesary cdc entries
  alternator: add failing tests for Streams

2026-04-18 00:07:51 +03:00

.github

codeowners: add owner for the test framework

2026-04-16 17:57:21 +03:00

abseil @ 255c84dadd

abseil: update to lts_2026_01_07

2026-04-08 12:19:54 +03:00

alternator

Merge 'alternator: Add stream support for tablets' from Radosław Cybulski

2026-04-17 23:48:31 +03:00

api

Merge 'Replace CAS estimated histogram with estimated_histogram_with_max' from Amnon Heiman

2026-04-17 13:12:59 +03:00

audit

audit: restore static_cast for batch inspect

2026-04-17 23:11:18 +03:00

auth

Merge 'auth: sanitize {USER} substitution in LDAP URL template' from Piotr Smaron

2026-04-15 14:40:15 +03:00

bin

LICENSE: Update to version 1.1

2026-04-12 19:46:33 +03:00

cdc

Merge 'alternator: fix unnecesary cdc log entries' from Radosław Cybulski

2026-04-18 00:07:51 +03:00

cmake

LICENSE: Update to version 1.1

2026-04-12 19:46:33 +03:00

compaction

compaction: release GC'ed sstables incrementally during compaction

2026-04-17 18:20:47 +03:00

conf

Merge 'Introduce maintenance scheduling supergroup and do initial population' from Pavel Emelyanov

2026-04-12 00:34:48 +03:00

cql3

vector-store: fix creating local vector search indexes with a part of the partition key

2026-04-17 11:44:15 +02:00

data_dictionary

LICENSE: Update to version 1.1

2026-04-12 19:46:33 +03:00

treewide: add cdc helper functions to system_keyspace

2026-04-17 18:57:44 +02:00

debug

…

dht

locator: tablets: Support arbitrary tablet boundaries

2026-04-15 01:25:14 +02:00

dist

LICENSE: Update to version 1.1

2026-04-12 19:46:33 +03:00

docs

Merge 'alternator: fix unnecesary cdc log entries' from Radosław Cybulski

2026-04-18 00:07:51 +03:00

ent

encryption: cover system.raft table in system_info_encryption

2026-04-16 13:22:10 +02:00

exceptions

LICENSE: Update to version 1.1

2026-04-12 19:46:33 +03:00

gms

db: implement large_data virtual tables with feature flag gating

2026-04-16 08:49:02 +03:00

idl

logstor: split log record to header and data

2026-04-16 10:00:35 +03:00

index

Merge 'vector_index: allow recreating vector indexes on the same column' from Dawid Pawlik

2026-04-15 14:40:15 +03:00

keys

keys: move key_to_str() to keys/keys.hh

2026-04-16 08:42:54 +03:00

lang

LICENSE: Update to version 1.1

2026-04-12 19:46:33 +03:00

licenses

LICENSE: Update to version 1.1

2026-04-12 19:46:33 +03:00

locator

Merge 'Allow arbitrary tablet boundaries and count' from Tomasz Grabiec

2026-04-15 18:57:22 +03:00

message

LICENSE: Update to version 1.1

2026-04-12 19:46:33 +03:00

mutation

alternator: fix Alternator writing unnecesary cdc entries

2026-04-17 18:00:25 +02:00

mutation_writer

LICENSE: Update to version 1.1

2026-04-12 19:46:33 +03:00

node_ops

LICENSE: Update to version 1.1

2026-04-12 19:46:33 +03:00

pgo

Update pgo profiles - aarch64

2026-04-15 05:26:22 +03:00

query

Merge 'query: result_set: change row member to a chunked vector' from Benny Halevy

2026-04-15 14:40:15 +03:00

raft

LICENSE: Update to version 1.1

2026-04-12 19:46:33 +03:00

readers

LICENSE: Update to version 1.1

2026-04-12 19:46:33 +03:00

reloc

treewide: improve bash error reporting

2025-02-10 18:28:52 +03:00

repair

LICENSE: Update to version 1.1

2026-04-12 19:46:33 +03:00

replica

Merge 'Move ignore_component_digest_mismatch flag on sstables_manager' from Pavel Emelyanov

2026-04-17 12:54:17 +03:00

rust

LICENSE: Update to version 1.1

2026-04-12 19:46:33 +03:00

schema

db: implement large_data virtual tables with feature flag gating

2026-04-16 08:49:02 +03:00

scripts

LICENSE: Update to version 1.1

2026-04-12 19:46:33 +03:00

seastar @ 4d268e0ef5

Update seastar submodule

2026-03-10 22:06:58 +02:00

service

Merge 'Replace CAS estimated histogram with estimated_histogram_with_max' from Amnon Heiman

2026-04-17 13:12:59 +03:00

sstables

Merge 'Move ignore_component_digest_mismatch flag on sstables_manager' from Pavel Emelyanov

2026-04-17 12:54:17 +03:00

streaming

Merge 'Move ignore_component_digest_mismatch flag on sstables_manager' from Pavel Emelyanov

2026-04-17 12:54:17 +03:00

swagger-ui @ 12f1da1082

…

tasks

LICENSE: Update to version 1.1

2026-04-12 19:46:33 +03:00

test

Merge 'alternator: fix unnecesary cdc log entries' from Radosław Cybulski

2026-04-18 00:07:51 +03:00

tools

Merge 'Move ignore_component_digest_mismatch flag on sstables_manager' from Pavel Emelyanov

2026-04-17 12:54:17 +03:00

tracing

LICENSE: Update to version 1.1

2026-04-12 19:46:33 +03:00

transport

cql: expose stable result metadata for prepared LIST statements

2026-04-13 17:49:27 +03:00

types

Merge 'query: result_set: change row member to a chunked vector' from Benny Halevy

2026-04-15 14:40:15 +03:00

unified

LICENSE: Update to version 1.1

2026-04-12 19:46:33 +03:00

utils

Merge 'Replace CAS estimated histogram with estimated_histogram_with_max' from Amnon Heiman

2026-04-17 13:12:59 +03:00

vector_search

vector_search: decrease default connection timeout to 3s

2026-04-17 12:26:39 +03:00

.clang-format

clang-format: argument and function packing

2024-10-04 14:52:41 +02:00

.dockerignore

…

.gitattributes

configure.py: prepare the build for a default PGO profile in version control

2024-12-27 16:16:04 +08:00

.gitignore

.gitignore: add rust target

2025-08-19 13:09:18 +03:00

.gitmodules

build: replace tools/java submodule with packaged cassandra-stress

2025-04-15 10:11:28 +03:00

.gitorderfile

…

.mailmap

…

absl-flat_hash_map.cc

LICENSE: Update to version 1.1

2026-04-12 19:46:33 +03:00

absl-flat_hash_map.hh

LICENSE: Update to version 1.1

2026-04-12 19:46:33 +03:00

amplify.yml

…

backlog_controller_fwd.hh

LICENSE: Update to version 1.1

2026-04-12 19:46:33 +03:00

backlog_controller.hh

LICENSE: Update to version 1.1

2026-04-12 19:46:33 +03:00

build_mode.hh

LICENSE: Update to version 1.1

2026-04-12 19:46:33 +03:00

bytes_fwd.hh

LICENSE: Update to version 1.1

2026-04-12 19:46:33 +03:00

bytes_ostream.hh

LICENSE: Update to version 1.1

2026-04-12 19:46:33 +03:00

bytes.cc

LICENSE: Update to version 1.1

2026-04-12 19:46:33 +03:00

bytes.hh

LICENSE: Update to version 1.1

2026-04-12 19:46:33 +03:00

cartesian_product.hh

LICENSE: Update to version 1.1

2026-04-12 19:46:33 +03:00

client_data.cc

LICENSE: Update to version 1.1

2026-04-12 19:46:33 +03:00

client_data.hh

LICENSE: Update to version 1.1

2026-04-12 19:46:33 +03:00

clocks-impl.cc

LICENSE: Update to version 1.1

2026-04-12 19:46:33 +03:00

clocks-impl.hh

LICENSE: Update to version 1.1

2026-04-12 19:46:33 +03:00

CMakeLists.txt

Merge 'Introduce maintenance scheduling supergroup and do initial population' from Pavel Emelyanov

2026-04-12 00:34:48 +03:00

configure.py

Merge 'Alternator: Add vector search support' from Nadav Har'El

2026-04-17 10:25:45 +02:00

CONTRIBUTING.md

docs: fix typos and spelling errors

2025-09-30 13:16:49 +02:00

coverage_excludes.txt

test.py: support code coverage

2024-01-18 11:11:34 +02:00

coverage_sources.list

configure.py support coverage profiles on standrad build modes

2024-01-18 11:11:34 +02:00

db_clock.hh

LICENSE: Update to version 1.1

2026-04-12 19:46:33 +03:00

debug.cc

LICENSE: Update to version 1.1

2026-04-12 19:46:33 +03:00

debug.hh

LICENSE: Update to version 1.1

2026-04-12 19:46:33 +03:00

default.nix

LICENSE: Update to version 1.1

2026-04-12 19:46:33 +03:00

Doxyfile

…

encoding_stats.hh

LICENSE: Update to version 1.1

2026-04-12 19:46:33 +03:00

enum_set.hh

LICENSE: Update to version 1.1

2026-04-12 19:46:33 +03:00

exported_templates.cc

LICENSE: Update to version 1.1

2026-04-12 19:46:33 +03:00

exported_templates.hh

LICENSE: Update to version 1.1

2026-04-12 19:46:33 +03:00

fix_system_distributed_tables.py

LICENSE: Update to version 1.1

2026-04-12 19:46:33 +03:00

flake.lock

build: bump Lua version (5.3 -> 5.4) in Nix devenv

2023-01-19 15:53:49 +01:00

flake.nix

build: fix Nix devenv

2022-12-19 20:53:07 +02:00

gc_clock.hh

LICENSE: Update to version 1.1

2026-04-12 19:46:33 +03:00

gdbinit

gdbinit: add ignore clause for SIG35

2023-01-12 12:13:04 +02:00

gen_segmented_compress_params.py

LICENSE: Update to version 1.1

2026-04-12 19:46:33 +03:00

HACKING.md

docs: fix typos and spelling errors

2025-09-30 13:16:49 +02:00

hashing_partition_visitor.hh

LICENSE: Update to version 1.1

2026-04-12 19:46:33 +03:00

idl-compiler.py

LICENSE: Update to version 1.1

2026-04-12 19:46:33 +03:00

inet_address_vectors.hh

LICENSE: Update to version 1.1

2026-04-12 19:46:33 +03:00

init.cc

LICENSE: Update to version 1.1

2026-04-12 19:46:33 +03:00

init.hh

LICENSE: Update to version 1.1

2026-04-12 19:46:33 +03:00

install-dependencies.sh

build: add slirp4netns to dependencies

2026-03-05 17:44:17 +02:00

install.sh

LICENSE: Update to version 1.1

2026-04-12 19:46:33 +03:00

LICENSE-ScyllaDB-Source-Available.md

LICENSE: Update to version 1.1

2026-04-12 19:46:33 +03:00

main.cc

alternator: add system_keyspace reference

2026-04-17 18:57:43 +02:00

marshal_exception.hh

LICENSE: Update to version 1.1

2026-04-12 19:46:33 +03:00

mutation_query.cc

LICENSE: Update to version 1.1

2026-04-12 19:46:33 +03:00

mutation_query.hh

LICENSE: Update to version 1.1

2026-04-12 19:46:33 +03:00

NOTICE.txt

PowerPC: remove ppc stuff

2025-07-08 10:38:23 +03:00

ORIGIN

…

partition_builder.hh

LICENSE: Update to version 1.1

2026-04-12 19:46:33 +03:00

partition_range_compat.hh

LICENSE: Update to version 1.1

2026-04-12 19:46:33 +03:00

partition_slice_builder.cc

LICENSE: Update to version 1.1

2026-04-12 19:46:33 +03:00

partition_slice_builder.hh

LICENSE: Update to version 1.1

2026-04-12 19:46:33 +03:00

query_ranges_to_vnodes.cc

LICENSE: Update to version 1.1

2026-04-12 19:46:33 +03:00

query_ranges_to_vnodes.hh

LICENSE: Update to version 1.1

2026-04-12 19:46:33 +03:00

reader_concurrency_semaphore_group.cc

LICENSE: Update to version 1.1

2026-04-12 19:46:33 +03:00

reader_concurrency_semaphore_group.hh

LICENSE: Update to version 1.1

2026-04-12 19:46:33 +03:00

reader_concurrency_semaphore.cc

reader_concurrency_semaphore: drop unused stop_ext_{pre,post}()

2026-04-15 14:40:15 +03:00

reader_concurrency_semaphore.hh

reader_concurrency_semaphore: drop unused stop_ext_{pre,post}()

2026-04-15 14:40:15 +03:00

reader_permit.hh

LICENSE: Update to version 1.1

2026-04-12 19:46:33 +03:00

README.md

docs: fix link to docker build README.MD

2026-02-18 12:12:46 +01:00

real_dirty_memory_accounter.hh

LICENSE: Update to version 1.1

2026-04-12 19:46:33 +03:00

release.cc

LICENSE: Update to version 1.1

2026-04-12 19:46:33 +03:00

release.hh

LICENSE: Update to version 1.1

2026-04-12 19:46:33 +03:00

reversibly_mergeable.hh

LICENSE: Update to version 1.1

2026-04-12 19:46:33 +03:00

schema_upgrader.hh

LICENSE: Update to version 1.1

2026-04-12 19:46:33 +03:00

scylla_post_install.sh

LICENSE: Update to version 1.1

2026-04-12 19:46:33 +03:00

scylla-gdb.py

abseil: update to lts_2026_01_07

2026-04-08 12:19:54 +03:00

SCYLLA-VERSION-GEN

Update ScyllaDB version to: 2026.2.0-dev

2026-01-25 11:09:17 +02:00

seastarx.hh

LICENSE: Update to version 1.1

2026-04-12 19:46:33 +03:00

serialization_visitors.hh

LICENSE: Update to version 1.1

2026-04-12 19:46:33 +03:00

serializer_impl.hh

LICENSE: Update to version 1.1

2026-04-12 19:46:33 +03:00

serializer.cc

LICENSE: Update to version 1.1

2026-04-12 19:46:33 +03:00

serializer.hh

LICENSE: Update to version 1.1

2026-04-12 19:46:33 +03:00

service_permit.hh

LICENSE: Update to version 1.1

2026-04-12 19:46:33 +03:00

shell.nix

LICENSE: Update to version 1.1

2026-04-12 19:46:33 +03:00

sstable_dict_autotrainer.cc

LICENSE: Update to version 1.1

2026-04-12 19:46:33 +03:00

sstable_dict_autotrainer.hh

LICENSE: Update to version 1.1

2026-04-12 19:46:33 +03:00

sstables_loader.cc

sstables: Remove ignore_component_digest_mismatch from sstable_open_config

2026-04-16 13:49:14 +03:00

sstables_loader.hh

LICENSE: Update to version 1.1

2026-04-12 19:46:33 +03:00

stdafx.cc

LICENSE: Update to version 1.1

2026-04-12 19:46:33 +03:00

stdafx.hh

LICENSE: Update to version 1.1

2026-04-12 19:46:33 +03:00

supervisor.hh

LICENSE: Update to version 1.1

2026-04-12 19:46:33 +03:00

table_helper.cc

cql3: Add cql_config parameter to parsed_statement::prepare()

2026-04-16 07:57:25 +03:00

table_helper.hh

LICENSE: Update to version 1.1

2026-04-12 19:46:33 +03:00

test.py

test.py: delete dead code in test.py

2026-04-16 22:08:31 +02:00

timeout_config.cc

LICENSE: Update to version 1.1

2026-04-12 19:46:33 +03:00

timeout_config.hh

LICENSE: Update to version 1.1

2026-04-12 19:46:33 +03:00

tombstone_gc_extension.hh

LICENSE: Update to version 1.1

2026-04-12 19:46:33 +03:00

tombstone_gc_options.cc

LICENSE: Update to version 1.1

2026-04-12 19:46:33 +03:00

tombstone_gc_options.hh

LICENSE: Update to version 1.1

2026-04-12 19:46:33 +03:00

tombstone_gc-internals.hh

LICENSE: Update to version 1.1

2026-04-12 19:46:33 +03:00

tombstone_gc.cc

LICENSE: Update to version 1.1

2026-04-12 19:46:33 +03:00

tombstone_gc.hh

LICENSE: Update to version 1.1

2026-04-12 19:46:33 +03:00

ubsan-suppressions.supp

…

unimplemented.cc

LICENSE: Update to version 1.1

2026-04-12 19:46:33 +03:00

unimplemented.hh

LICENSE: Update to version 1.1

2026-04-12 19:46:33 +03:00

validation.cc

LICENSE: Update to version 1.1

2026-04-12 19:46:33 +03:00

validation.hh

LICENSE: Update to version 1.1

2026-04-12 19:46:33 +03:00

version.hh

LICENSE: Update to version 1.1

2026-04-12 19:46:33 +03:00

view_info.hh

LICENSE: Update to version 1.1

2026-04-12 19:46:33 +03:00

vint-serialization.cc

LICENSE: Update to version 1.1

2026-04-12 19:46:33 +03:00

vint-serialization.hh

LICENSE: Update to version 1.1

2026-04-12 19:46:33 +03:00

README.md

Scylla

What is Scylla?

Scylla is the real-time big data database that is API-compatible with Apache Cassandra and Amazon DynamoDB. Scylla embraces a shared-nothing approach that increases throughput and storage capacity to realize order-of-magnitude performance improvements and reduce hardware costs.

For more information, please see the ScyllaDB web site.

Build Prerequisites

Scylla is fairly fussy about its build environment, requiring very recent versions of the C++23 compiler and of many libraries to build. The document HACKING.md includes detailed information on building and developing Scylla, but to get Scylla building quickly on (almost) any build machine, Scylla offers a frozen toolchain. This is a pre-configured Docker image which includes recent versions of all the required compilers, libraries and build tools. Using the frozen toolchain allows you to avoid changing anything in your build machine to meet Scylla's requirements - you just need to meet the frozen toolchain's prerequisites (mostly, Docker or Podman being available).

Building Scylla

Building Scylla with the frozen toolchain dbuild is as easy as:

$ git submodule update --init --force --recursive
$ ./tools/toolchain/dbuild ./configure.py
$ ./tools/toolchain/dbuild ninja build/release/scylla

For further information, please see:

Developer documentation for more information on building Scylla.
Build documentation on how to build Scylla binaries, tests, and packages.
Docker image build documentation for information on how to build Docker images.

Running Scylla

To start the Scylla server, run:

$ ./tools/toolchain/dbuild ./build/release/scylla --workdir tmp --smp 1 --developer-mode 1

This will start a Scylla node with one CPU core allocated to it and data files stored in the tmp directory. The --developer-mode is needed to disable the various checks Scylla performs at startup to ensure the machine is configured for maximum performance (not relevant on development workstations). Please note that you need to run Scylla with dbuild if you built it with the frozen toolchain.

For more run options, run:

$ ./tools/toolchain/dbuild ./build/release/scylla --help

Testing

See test.py manual.

Scylla APIs and compatibility

By default, Scylla is compatible with Apache Cassandra and its API - CQL. There is also support for the API of Amazon DynamoDB™, which needs to be enabled and configured in order to be used. For more information on how to enable the DynamoDB™ API in Scylla, and the current compatibility of this feature as well as Scylla-specific extensions, see Alternator and Getting started with Alternator.

Documentation

Documentation can be found here. Seastar documentation can be found here. User documentation can be found here.

Training

Training material and online courses can be found at Scylla University. The courses are free, self-paced and include hands-on examples. They cover a variety of topics including Scylla data modeling, administration, architecture, basic NoSQL concepts, using drivers for application development, Scylla setup, failover, compactions, multi-datacenters and how Scylla integrates with third-party applications.

Contributing to Scylla

If you want to report a bug or submit a pull request or a patch, please read the contribution guidelines.

If you are a developer working on Scylla, please read the developer guidelines.

Contact

The community forum and Slack channel are for users to discuss configuration, management, and operations of ScyllaDB.
The developers mailing list is for developers and people interested in following the development of ScyllaDB to discuss technical topics.

Languages

C++ 72.1%

Python 26.7%

CMake 0.3%

GAP 0.3%

Shell 0.3%