mirror of https://github.com/scylladb/scylladb.git synced 2026-06-05 22:43:15 +00:00

Files

Nadav Har'El 31e0315710 Merge 'alternator: fix unnecesary cdc log entries' from Radosław Cybulski

Fix cdc writing unnecessary entries to its log, for example when Alternator deletes an item that does not actually exist.

This issue was originally tackled by @wps0, and this patch extends his work. His work added a `should_skip` function to cdc, which processes a `mutation` object and decides whether the changes in the object should be added to the cdc log.

The issue with his approach is that a `mutation` object might contain changes for more than one row. If, for example, the `mutation` object contains two changes, a delete of a non-existing row and a create of a non-existing row, the `should_skip` function will detect a change in the second item and allow the whole `mutation` (BOTH items) to be added. For example, running this (using Python's boto3) on an empty table:
```
with table.batch_writer() as batch:
    batch.put_item({'p': 'p', 'c': 'c0'})
    batch.delete_item(Key={'p': 'p', 'c': 'c1'})
```
will emit two events (a "put" event and a "delete" event), even though the item with `c` set to `c1` does not exist (and thus can't be deleted). Note that both entries in the batch write must use the same partition key; otherwise the upper layer will split them into separate `mutation` objects and the issue will not happen.
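The difference between a mutation-level decision and a per-change decision can be illustrated with a toy model. This is only a sketch: `Change`, `is_noop`, `log_whole_mutation` and `log_per_change` are hypothetical names, not ScyllaDB code, and the model assumes a put is always a real change and a delete is a no-op when the row is absent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    kind: str            # "put" or "delete" (toy model only)
    clustering_key: str


def is_noop(change, existing_keys):
    # Assumption: deleting a row that is not in the table changes nothing.
    return change.kind == "delete" and change.clustering_key not in existing_keys


def log_whole_mutation(changes, existing_keys):
    # Mutation-level decision: skip only if every change is a no-op,
    # otherwise log everything, including the no-op changes.
    if all(is_noop(c, existing_keys) for c in changes):
        return []
    return list(changes)


def log_per_change(changes, existing_keys):
    # Per-change decision: each change is judged on its own.
    return [c for c in changes if not is_noop(c, existing_keys)]


# The batch from the example above, run against an empty table.
batch = [Change("put", "c0"), Change("delete", "c1")]
whole = log_whole_mutation(batch, existing_keys=set())
per_change = log_per_change(batch, existing_keys=set())
```

In this model `whole` contains both changes, while `per_change` contains only the put of `c0`.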

The solution is to do similar processing, but to consider each change separately from the others. This is tricky to implement because of the way cdc works. When cdc processes a `mutation` object (containing X changes), it emits cdc entries in phases. Phase 1: emit the `preimage` (old state) for each change (if requested). Phase 2: for each change, emit the actual "diff" (update, delete and so on). Phase 3: emit the `postimage` (new state).

We only learn whether a change needs to be skipped during phase 2. By that time phase 1 is complete and the preimage for the change has already been emitted. At that moment we flag the change (identified by its clustering key value) as one to skip: we add the clustering key to an `ignore-rows` set (the `_alternator_clustering_keys_to_ignore` variable) and continue normally. Once all phases finish we run an additional `postprocess` phase (the `clean_up_noop_rows` function). It goes through the generated cdc mutations and skips all modifications whose clustering key is in the `ignore-rows` set. After skipping we need a "cleanup" step: each generated cdc mutation contains an index (incremented by one), and if we skipped some parts the indexes are no longer consecutive, so we reindex the final changes.
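The filter-then-reindex idea of the postprocess phase can be sketched in Python. This is an illustration, not the actual implementation: `LogRow` and its fields `batch_seq_no` and `operation` are hypothetical names, and the sketch assumes the index starts at 0.

```python
from dataclasses import dataclass


@dataclass
class LogRow:
    batch_seq_no: int      # hypothetical name for the per-row index
    clustering_key: bytes
    operation: str         # e.g. "preimage", "delete", "put", "postimage"


def clean_up_noop_rows(log_rows, keys_to_ignore):
    # Drop every generated row whose clustering key was flagged in phase 2.
    kept = [r for r in log_rows if r.clustering_key not in keys_to_ignore]
    # Skipping leaves gaps in the index, so renumber the survivors.
    for new_index, row in enumerate(kept):
        row.batch_seq_no = new_index
    return kept


rows = [
    LogRow(0, b"c1", "delete"),     # no-op: c1 never existed
    LogRow(1, b"c0", "put"),
    LogRow(2, b"c0", "postimage"),
]
cleaned = clean_up_noop_rows(rows, keys_to_ignore={b"c1"})
```

After the call, `cleaned` holds only the two `c0` rows, renumbered 0 and 1.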

There's a special case worth mentioning: Alternator tables without clustering keys. In that case the `mutation` object passed to cdc can contain exactly one change, since different partition keys are split by upper layers and Alternator will never emit a `mutation` object containing two (or more) changes with the same primary key. Here, when we decide the change is to be skipped, we add an empty `bytes` object to the `ignore-rows` set. When checking the `ignore-rows` set, we only check whether it's empty or not (we don't check for the presence of the empty `bytes` object).
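The two membership checks can be contrasted in a small sketch. `should_drop` and its parameters are hypothetical names chosen for illustration; only the logic follows the description above.

```python
def should_drop(row_clustering_key, keys_to_ignore, table_has_clustering_key):
    if table_has_clustering_key:
        # Regular case: drop only rows whose clustering key was flagged.
        return row_clustering_key in keys_to_ignore
    # No clustering key: the mutation holds exactly one change, so a
    # non-empty set (holding the empty bytes marker) means "skip it".
    return len(keys_to_ignore) > 0


# Table with a clustering key: only the flagged row is dropped.
flagged = should_drop(b"c1", {b"c1"}, table_has_clustering_key=True)
unflagged = should_drop(b"c0", {b"c1"}, table_has_clustering_key=True)

# Table without a clustering key: the marker's presence alone decides.
marked = should_drop(None, {b""}, table_has_clustering_key=False)
unmarked = should_drop(None, set(), table_has_clustering_key=False)
```

Here `flagged` and `marked` are `True`, while `unflagged` and `unmarked` are `False`.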

Note: there might be some confusion between this patch and the #28452 patch. Both started from the same error observation and use similar tests for validation, as both are easily triggered by BatchWrite commands (both need the `mutation` object passed to cdc to contain more than one change). This issue, though, is about wrong data being written to the cdc log and is fixed in cdc, whereas #28452 is about correct cdc data being parsed the wrong way and is fixed on the Alternator side. Note that we need #28452 to truly verify this fix (otherwise we will emit correct cdc entries, but Alternator will parse them incorrectly).

Note: to benefit from / notice this patch you need the `alternator_streams_increased_compatibility` flag turned on.

Note: the rework is quite "broad" and covers a lot of ground: every operation that might result in no change to the database state should be tested. An additional test was added: trying to remove a column from a non-existing item, as well as trying to remove a non-existing column from an existing item.

Fixes: #28368
Fixes: SCYLLADB-1528
Fixes: SCYLLADB-538

Closes scylladb/scylladb#28544

* github.com:scylladb/scylladb:
  alternator: remove unnecesary code
  alternator: fix Alternator writing unnecesary cdc entries
  alternator: add failing tests for Streams

2026-04-18 00:07:51 +03:00

_ext

docs: fix local build

2025-12-14 11:48:48 +02:00

_static

doc: add OS support for version 2025.4

2025-10-28 13:29:40 +03:00

_templates

docs: do not show any version warning for upgrade guide pages

2025-08-22 09:49:27 +03:00

_utils

doc: remove About Upgrade and redirect to Upgrade Policy

2026-04-07 13:44:10 +02:00

alternator

Merge 'alternator: fix unnecesary cdc log entries' from Radosław Cybulski

2026-04-18 00:07:51 +03:00

architecture

tablets: Introduce pow2_count per-table tablet option

2026-04-15 10:40:56 +02:00

cql

vector-store: fix creating local vector search indexes with a part of the partition key

2026-04-17 11:44:15 +02:00

dev

Merge 'db: store large data records in SSTable metadata and serve via virtual tables' from Benny Halevy

2026-04-16 14:03:31 +03:00

features

docs/cql: document the new CQL per-row TTL feature

2026-02-25 14:59:44 +02:00

getting-started

Merge 'doc: fix the installation section' from Anna Stuchlik

2026-03-19 17:13:53 +02:00

docs/cql: document the new CQL per-row TTL feature

2026-02-25 14:59:44 +02:00

operating-scylla

Merge 'Enable vnodes-to-tablets migrations with arbitrary tokens' from Nikos Dragazis

2026-04-17 00:46:35 +03:00

reference

docs: make the glossary more tablet inclusive

2026-01-19 11:50:13 +03:00

rst_include

doc: remove an oudated troubleshooting page

2026-04-14 15:14:32 +03:00

troubleshooting

db: implement large_data virtual tables with feature flag gating

2026-04-16 08:49:02 +03:00

upgrade

doc: add the 2026.x patch release upgrade guide-from-2025

2026-04-07 13:52:16 +02:00

using-scylla

cql: add Cassandra SAI (StorageAttachedIndex) compatibility

2026-04-09 17:20:03 +02:00

.gitignore

…

conf.py

docs: Makefile: drop redundant -t $(FLAG) from sphinx options

2026-04-15 14:40:15 +03:00

faq.rst

docs/faq.rst: Fixing small spelling mistake

2026-04-09 11:48:46 +03:00

index.rst

replace the Driver pages with a link to the new Drivers pages

2025-12-04 10:07:27 +02:00

Makefile

docs: Makefile: drop redundant -t $(FLAG) from sphinx options

2026-04-15 14:40:15 +03:00

pyproject.toml

build(deps): bump sphinx-scylladb-theme from 1.9.1 to 1.9.2 in /docs

2026-04-15 14:57:37 +03:00

README-metrics.md

docs: add metrics generation validation

2025-11-25 15:39:52 +03:00

README.md

Clarify documentation build instructions

2025-12-16 06:56:00 +02:00

robots.txt

…

uv.lock

build(deps): bump sphinx-scylladb-theme from 1.9.1 to 1.9.2 in /docs

2026-04-15 14:57:37 +03:00

README.md

ScyllaDB Documentation

This repository contains the source files for ScyllaDB documentation.

The dev folder contains developer-oriented documentation related to the ScyllaDB code base. It is not published and is only available via GitHub.
All other folders and files contain user-oriented documentation related to ScyllaDB and are sources for docs.scylladb.com/manual.

To report a documentation bug or suggest an improvement, open an issue in GitHub issues for this project.

To contribute to the documentation, open a GitHub pull request.

Key Guidelines for Contributors

The user documentation is written in reStructuredText (RST) - a plaintext markup language similar to Markdown. If you're not familiar with RST, see ScyllaDB RST Examples.
The developer documentation is written in Markdown. See Basic Markdown Syntax for reference.
Follow the ScyllaDB Style Guide.

To prevent the build from failing:

If you add a new file, ensure it's added to an appropriate toctree, for example:

 .. toctree::
    :maxdepth: 2
    :hidden:

    Page X </folder1/article1>
    Page Y </folder1/article2>
    Your New Page </folder1/your-new-article>

Make sure the link syntax is correct. See the guidelines on creating links.
Make sure the section headings are correct. See the guidelines on creating headings. Note that the markup must be at least as long as the text in the heading. For example:
```
----------------------
Prerequisites
----------------------
```
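The heading rule can be checked mechanically before building. The helper below is a minimal sketch, not part of the docs tooling; `markup_long_enough` is a hypothetical name, and it assumes the markup line is a single repeated punctuation character.

```python
def markup_long_enough(title: str, markup: str) -> bool:
    """Return True if an RST heading markup line is valid for the title."""
    title = title.rstrip()
    markup = markup.rstrip()
    # The markup must be one repeated non-alphanumeric character...
    is_single_char = len(set(markup)) == 1 and not markup[0].isalnum()
    # ...and at least as long as the heading text.
    return is_single_char and len(markup) >= len(title)


ok = markup_long_enough("Prerequisites", "----------------------")
too_short = markup_long_enough("Prerequisites", "-----")
```

With the heading above, `ok` is `True`; a five-character markup line for the same title gives `False`.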

Building User Documentation

Prerequisites

Python
poetry
make

See the ScyllaDB Sphinx Theme prerequisites to check which versions of the above are currently required.

Mac OS X

You must have a working Homebrew in order to install the needed tools.

You also need the standard utility make.

Check if you have these two items with the following commands:

brew help
make -h

Linux Distributions

Building the user docs should work out of the box on most Linux distributions.

Windows

Use "Bash on Ubuntu on Windows" for the same tools and capabilities as on Linux distributions.

Building the Docs

Run `make preview` in the docs/ directory to build the documentation.
Preview the built documentation locally at http://127.0.0.1:5500/.

Cleanup

You can clean up all the build products and auto-installed Python stuff with:

make pristine

Information for Contributors

If you are interested in contributing to Scylla docs, please read the Scylla open source page at http://www.scylladb.com/opensource/ and complete a Scylla contributor agreement if needed. We can only accept documentation pull requests if we have a contributor agreement on file for you.

Third-party Documentation

Do any copying as a separate commit. Always commit an unmodified version first and then do any editing in a separate commit.
We already have a copy of the Apache license in our tree, so you do not need to commit a copy of the license.
Include the copyright header from the source file in the edited version. If you are copying an Apache Cassandra document with no copyright header, use:

This document includes material from Apache Cassandra.
Apache Cassandra is Copyright 2009-2014 The Apache Software Foundation.