============
CDC Overview
============

:abbr:`CDC (Change Data Capture)` is a feature that lets you query not only the current state of a table, but also the history of all changes made to it.

As an example, suppose you made the following changes to a table, in the given order:

.. code-block:: cql

    UPDATE ks.t SET v = 0 WHERE pk = 0 AND ck = 0;
    UPDATE ks.t SET v = 1 WHERE pk = 0 AND ck = 0;
    UPDATE ks.t SET v = 2 WHERE pk = 0 AND ck = 0;
    UPDATE ks.t SET v = 2 WHERE pk = 0 AND ck = 1;
    UPDATE ks.t SET v = 1 WHERE pk = 0 AND ck = 1;
    UPDATE ks.t SET v = 0 WHERE pk = 0 AND ck = 1;

Normally, querying the table would return only the current state:

.. code-block:: cql

     pk | ck | v
    ----+----+---
      0 |  0 | 2
      0 |  1 | 0

    (2 rows)

but with CDC, you can also learn the history of all changes:

.. code-block:: none

    change at 2020-01-29 14:37:32: UPDATE ks.t SET v = 0 WHERE pk = 0 AND ck = 0;
    change at 2020-01-29 14:37:33: UPDATE ks.t SET v = 1 WHERE pk = 0 AND ck = 0;
    change at 2020-01-29 14:37:35: UPDATE ks.t SET v = 2 WHERE pk = 0 AND ck = 0; <- latest change
    change at 2020-01-29 14:37:38: UPDATE ks.t SET v = 2 WHERE pk = 0 AND ck = 1;
    change at 2020-01-29 14:37:39: UPDATE ks.t SET v = 1 WHERE pk = 0 AND ck = 1;
    change at 2020-01-29 14:37:40: UPDATE ks.t SET v = 0 WHERE pk = 0 AND ck = 1; <- latest change

(This is not actual syntax; the example above only presents the general concept.)

Use cases for CDC
-----------------

Some examples where CDC may be beneficial:

* Heterogeneous database replication: applying captured changes to another database or table. The other database may use a different schema (or no schema at all) that is better suited to specific workloads. An example is replication to Elasticsearch for efficient text searches.
* Implementing a notification system.
* In-flight analytics: looking for patterns in the changes in order to derive useful information, e.g. for fraud detection.

In ScyllaDB, CDC is optional and is enabled on a per-table basis. The history of changes made to a CDC-enabled table is stored in a separate associated table.

Terminology
-----------

* **Base Table** - the original table, where all changes are made.
* **Log Table** - the table associated with the base table, created when CDC is enabled. Read about it in the :doc:`log table document <./cdc-log-table>`.

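As a minimal sketch of how the two tables relate, the query below reads the recorded history of ``ks.t`` from its log table. It assumes that CDC is enabled on ``ks.t``, that the log table follows the ``<base table name>_scylla_cdc_log`` naming convention, and that it exposes the ``cdc$``-prefixed metadata columns described in the log table document:

.. code-block:: cql

    -- Assumption: the log table of ks.t is named ks.t_scylla_cdc_log.
    -- Metadata column names contain '$', so they must be double-quoted.
    SELECT "cdc$time", "cdc$operation", pk, ck, v FROM ks.t_scylla_cdc_log;

Each returned row describes a change made to the base table: ``cdc$time`` is a timeuuid that orders the changes, and ``cdc$operation`` encodes the kind of change.
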
Enabling CDC
------------

You can enable CDC when creating or altering a table by using the ``cdc`` option, for example:

.. code-block:: cql

    CREATE TABLE ks.t (pk int, ck int, v int, PRIMARY KEY (pk, ck)) WITH cdc = {'enabled':true};

.. include:: /features/cdc/_common/cdc-params.rst

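For a table that already exists, the same ``cdc`` option can be passed to ``ALTER TABLE``. A minimal sketch, reusing the ``ks.t`` table name from the example above:

.. code-block:: cql

    -- Enable CDC on an existing table.
    ALTER TABLE ks.t WITH cdc = {'enabled': true};

    -- Disable CDC on the same table.
    ALTER TABLE ks.t WITH cdc = {'enabled': false};
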
Using CDC with Applications
---------------------------

You can use ScyllaDB's language-specific libraries to simplify writing applications that read from ScyllaDB CDC.
The following libraries are available:

* `Go <https://github.com/scylladb/scylla-cdc-go>`_
* `Java <https://github.com/scylladb/scylla-cdc-java>`_
* `Rust <https://github.com/scylladb/scylla-cdc-rust>`_

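Because the log is an ordinary CQL table, a consumer ultimately reads it with regular queries. The sketch below illustrates the kind of query a consumer might issue to fetch one stream's changes within a time window. It is an illustration only and is not taken from any of the libraries above. It assumes that the log table is named ``ks.t_scylla_cdc_log``, that ``cdc$stream_id`` is its partition key, and that ``cdc$time`` is its first clustering key; the ``?`` bind markers stand for values supplied by the application:

.. code-block:: cql

    -- Hypothetical consumer query: changes of a single stream in the window (start, end].
    SELECT * FROM ks.t_scylla_cdc_log
    WHERE "cdc$stream_id" = ?
      AND "cdc$time" > ?
      AND "cdc$time" <= ?;

Repeating such a query with a moving time window, once per stream, is one way to consume changes incrementally; the libraries take care of this bookkeeping.
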
More information
----------------

`ScyllaDB University: Change Data Capture (CDC) lesson <https://university.scylladb.com/courses/data-modeling/lessons/change-data-capture-cdc/>`_ - Learn how to use CDC. Some of the topics covered are:

* An overview of Change Data Capture: what it is, what it does, some common use cases, and how it works
* How the data can be consumed: different options for consuming the data changes, including normal CQL, a layered approach, and integrators
* How CDC works under the hood: an example of what happens in the database on different operations to allow CDC
* A summary of CDC: it’s easy to integrate and consume, it uses plain CQL tables, it’s robust, it’s replicated in the same way as normal data, it has a reasonable overhead, it does not overflow if the consumer fails to act, and its data is TTL’ed. The summary also includes a comparison with Cassandra, DynamoDB, and MongoDB.