scylladb/docs/features/cdc/cdc-intro.rst
Anna Stuchlik b0ced64c88 doc: remove the limitation for disabling CDC
This commit removes the instruction to stop all writes before disabling CDC with ALTER.

Fixes https://github.com/scylladb/scylla-docs/issues/4020

Closes scylladb/scylladb#24406
2025-06-10 12:53:09 +03:00

============
CDC Overview
============
:abbr:`CDC (Change Data Capture)` is a feature that allows you to not only query the current state of a database's table, but also query the history of all changes made to the table.
As an example, suppose you made a sequence of changes to some table in the given order:

.. code-block:: cql

   UPDATE ks.t SET v = 0 WHERE pk = 0 AND ck = 0;
   UPDATE ks.t SET v = 1 WHERE pk = 0 AND ck = 0;
   UPDATE ks.t SET v = 2 WHERE pk = 0 AND ck = 0;
   UPDATE ks.t SET v = 2 WHERE pk = 0 AND ck = 1;
   UPDATE ks.t SET v = 1 WHERE pk = 0 AND ck = 1;
   UPDATE ks.t SET v = 0 WHERE pk = 0 AND ck = 1;

Normally, querying the table would return:

.. code-block:: cql

    pk | ck | v
   ----+----+---
     0 |  0 | 2
     0 |  1 | 0

   (2 rows)

With CDC, however, you can also learn the history of all changes:

.. code-block:: none

   change at 2020-01-29 14:37:32: UPDATE ks.t SET v = 0 WHERE pk = 0 AND ck = 0;
   change at 2020-01-29 14:37:33: UPDATE ks.t SET v = 1 WHERE pk = 0 AND ck = 0;
   change at 2020-01-29 14:37:35: UPDATE ks.t SET v = 2 WHERE pk = 0 AND ck = 0; <- latest change
   change at 2020-01-29 14:37:38: UPDATE ks.t SET v = 2 WHERE pk = 0 AND ck = 1;
   change at 2020-01-29 14:37:39: UPDATE ks.t SET v = 1 WHERE pk = 0 AND ck = 1;
   change at 2020-01-29 14:37:40: UPDATE ks.t SET v = 0 WHERE pk = 0 AND ck = 1; <- latest change

(This is not actual syntax; the example above only illustrates the general concept.)

Use cases for CDC
-----------------
Some examples where CDC may be beneficial:

* Heterogeneous database replication: applying captured changes to another database or table. The other database may use a different schema (or no schema at all) that is better suited for some specific workloads. An example is replication to Elasticsearch for efficient text searches.
* Implementing a notification system.
* In-flight analytics: looking for patterns in the changes in order to derive useful information, e.g. for fraud detection.

In ScyllaDB, CDC is optional and is enabled on a per-table basis. The history of changes made to a CDC-enabled table is stored in a separate associated table.

Terminology
-----------

* **Base Table** - the original table, where all changes are made.
* **Log Table** - the table associated with the base table, created when CDC is enabled. Read about it in the :doc:`log table document <./cdc-log-table>`.

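
To illustrate the relationship between the two tables, here is a minimal sketch of reading the change history with plain CQL. It assumes CDC is already enabled on the ``ks.t`` base table from the example above, and that the log table uses the ``<base table name>_scylla_cdc_log`` name. The columns returned are described in the log table document.

.. code-block:: cql

   -- Assumes CDC is enabled on ks.t; reads the change records
   -- from the log table associated with that base table.
   SELECT * FROM ks.t_scylla_cdc_log;
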
Enabling CDC
------------
You can enable CDC when creating or altering a table using the ``cdc`` option, for example:

.. code-block:: none

   CREATE TABLE ks.t (pk int, ck int, v int, PRIMARY KEY (pk, ck)) WITH cdc = {'enabled':true};

.. include:: /features/cdc/_common/cdc-params.rst

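
The same ``cdc`` option can be set on a table that already exists. The following is a short sketch that reuses the ``ks.t`` table from the examples above as the assumed target: it first turns CDC on with ``ALTER TABLE`` and then turns it off again.

.. code-block:: cql

   -- Enable CDC on an existing table (ks.t is the example table used above).
   ALTER TABLE ks.t WITH cdc = {'enabled': true};

   -- Disable CDC on the same table.
   ALTER TABLE ks.t WITH cdc = {'enabled': false};
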
Using CDC with Applications
---------------------------
You can use ScyllaDB's language-specific libraries to simplify writing applications that read from ScyllaDB CDC.
The following libraries are available:

* `Go <https://github.com/scylladb/scylla-cdc-go>`_
* `Java <https://github.com/scylladb/scylla-cdc-java>`_
* `Rust <https://github.com/scylladb/scylla-cdc-rust>`_

More information
----------------
`ScyllaDB University: Change Data Capture (CDC) lesson <https://university.scylladb.com/courses/data-modeling/lessons/change-data-capture-cdc/>`_ - Learn how to use CDC. The lesson covers the following topics:

* An overview of Change Data Capture: what it is, what it does, some common use cases, and how it works.
* How the data can be consumed: the different options for consuming data changes, including normal CQL, a layered approach, and integrators.
* How CDC works under the hood: an example of what happens in the database on different operations to allow CDC.
* A summary of CDC: it's easy to integrate and consume, it uses plain CQL tables, it's robust, it's replicated in the same way as normal data, it has a reasonable overhead, it does not overflow if the consumer fails to act, and its data is TTLed. The summary also includes a comparison with Cassandra, DynamoDB, and MongoDB.