scylladb/docs/features/cdc/_common/cdc-inserts.rst
Anna Stuchlik 360f7b3d33 doc: move Features to the top-level page
This commit moves the Features page from the section for developers
to the top level in the page tree. This involves:
- Moving the source files to the *features* folder from the *using-scylla* folder.
- Moving images into *features/images* folder.
- Updating references to the moved resources.
- Adding redirections to the moved pages.

Closes scylladb/scylladb#20401
2024-09-03 07:24:33 +03:00


Inserts
-------

Digression: the difference between inserts and updates
++++++++++++++++++++++++++++++++++++++++++++++++++++++

Inserts are not the same as updates, contrary to popular belief in the Cassandra and ScyllaDB communities. The following example illustrates the difference:

.. code-block:: cql

   CREATE TABLE ks.t (pk int, ck int, v int, PRIMARY KEY (pk, ck)) WITH cdc = {'enabled':'true'};
   UPDATE ks.t SET v = null WHERE pk = 0 AND ck = 0;
   SELECT * FROM ks.t WHERE pk = 0 AND ck = 0;

returns:

.. code-block:: none

    pk | ck | v
   ----+----+---

   (0 rows)

However:

.. code-block:: cql

   INSERT INTO ks.t (pk, ck, v) VALUES (0, 0, null);
   SELECT * FROM ks.t WHERE pk = 0 AND ck = 0;

returns:

.. code-block:: none

    pk | ck | v
   ----+----+------
     0 |  0 | null

   (1 rows)

.. _row-marker:

Each table has an additional invisible column called the *row marker*. It doesn't hold a value; it only holds *liveness information* (timestamp and time-to-live). If the row marker is alive, the row shows up when you query it, even if all its non-key columns are null. The difference between inserts and updates is that **updates don't affect the row marker**, while **inserts create an alive row marker**.
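
Because the row marker carries a time-to-live, a row that exists only through its row marker disappears when that TTL expires. The following is a minimal sketch, assuming the ``ks.t`` table from above and an arbitrary key ``(1, 1)`` that has not been written before; the 10-second TTL is an illustrative value:

.. code-block:: cql

   -- Create only a row marker for (1, 1); it expires after 10 seconds.
   INSERT INTO ks.t (pk, ck) VALUES (1, 1) USING TTL 10;

   -- Run immediately: the row shows up, with v = null.
   SELECT * FROM ks.t WHERE pk = 1 AND ck = 1;

   -- Run again once the 10 seconds have passed: (0 rows), because the
   -- expired row marker was the only live data in the row.
   SELECT * FROM ks.t WHERE pk = 1 AND ck = 1;
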

Here's another example:

.. code-block:: cql

   CREATE TABLE ks.t (pk int, ck int, v int, PRIMARY KEY (pk, ck)) WITH cdc = {'enabled':'true'};
   UPDATE ks.t SET v = 0 WHERE pk = 0 AND ck = 0;
   SELECT * FROM ks.t;

returns:

.. code-block:: none

    pk | ck | v
   ----+----+---
     0 |  0 | 0

   (1 rows)

The value in the ``v`` column keeps the ``(pk = 0, ck = 0)`` row alive, so the row shows up in the query. After we delete that value by setting ``v`` to ``null``, the row is gone:

.. code-block:: cql

   UPDATE ks.t SET v = null WHERE pk = 0 AND ck = 0;
   SELECT * FROM ks.t;

returns:

.. code-block:: none

    pk | ck | v
   ----+----+---

   (0 rows)

However, if we had used an ``INSERT`` instead of an ``UPDATE`` in the first place, the row would still show up even after deleting ``v``:

.. code-block:: cql

   INSERT INTO ks.t (pk, ck, v) VALUES (0, 0, 0);
   UPDATE ks.t SET v = null WHERE pk = 0 AND ck = 0;
   SELECT * FROM ks.t;

returns:

.. code-block:: none

    pk | ck | v
   ----+----+------
     0 |  0 | null

   (1 rows)

The row marker created by the ``INSERT`` keeps the row alive even though all of its non-key columns are ``null``, so the row shows up in the query.

We can create just the row marker, without setting any non-key columns, like this:

.. code-block:: cql

   INSERT INTO ks.t (pk, ck) VALUES (0, 0);

When an ``INSERT`` statement specifies both key and non-key columns, it says "create a row marker, *and* set cells for this row". We can split these two operations explicitly. The following statement:

.. code-block:: cql

   INSERT INTO ks.t (pk, ck, v) VALUES (0, 0, 0);

is equivalent to:

.. code-block:: cql

   BEGIN UNLOGGED BATCH
     INSERT INTO ks.t (pk, ck) VALUES (0, 0);
     UPDATE ks.t SET v = 0 WHERE pk = 0 AND ck = 0;
   APPLY BATCH;

The ``INSERT`` creates the row marker, and the ``UPDATE`` sets the cell in column ``v`` of the ``(pk, ck) = (0, 0)`` row.

Inserts in CDC
++++++++++++++

Inserts affect the CDC log very similarly to updates. If no collections or static columns are involved, the only difference is in the ``cdc$operation`` column:

#. Start with a basic table and perform some inserts:

   .. code-block:: cql

      CREATE TABLE ks.t (pk int, ck int, v1 int, v2 int, PRIMARY KEY (pk, ck)) WITH cdc = {'enabled':'true'};
      INSERT INTO ks.t (pk, ck, v1) VALUES (0, 0, 0);
      INSERT INTO ks.t (pk, ck, v2) VALUES (0, 0, NULL);

#. Confirm that the inserts were performed by displaying the contents of the table:

   .. code-block:: cql

      SELECT * FROM ks.t;

   returns:

   .. code-block:: none

       pk | ck | v1 | v2
      ----+----+----+------
        0 |  0 |  0 | null

      (1 rows)

#. Display the contents of the CDC log table:

   .. code-block:: cql

      SELECT "cdc$batch_seq_no", pk, ck, v1, "cdc$deleted_v1", v2, "cdc$deleted_v2", "cdc$operation" FROM ks.t_scylla_cdc_log;

   returns:

   .. code-block:: none

       cdc$batch_seq_no | pk | ck | v1   | cdc$deleted_v1 | v2   | cdc$deleted_v2 | cdc$operation
      ------------------+----+----+------+----------------+------+----------------+---------------
                      0 |  0 |  0 |    0 |           null | null |           null |             2
                      0 |  0 |  0 | null |           null | null |           True |             2

      (2 rows)

Delta rows corresponding to inserts are indicated by ``cdc$operation = 2``.
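
For comparison, here is a sketch of the same writes expressed as ``UPDATE`` statements. It assumes a fresh table with the same schema under the hypothetical name ``ks.u``. Because the only difference between the two cases lies in ``cdc$operation``, the delta rows should match the ones above except that ``cdc$operation`` is ``1`` (update):

.. code-block:: cql

   -- Hypothetical table with the same schema as ks.t above.
   CREATE TABLE ks.u (pk int, ck int, v1 int, v2 int, PRIMARY KEY (pk, ck)) WITH cdc = {'enabled':'true'};

   -- UPDATE counterparts of the two INSERT statements above.
   UPDATE ks.u SET v1 = 0 WHERE pk = 0 AND ck = 0;
   UPDATE ks.u SET v2 = null WHERE pk = 0 AND ck = 0;

   -- Expect the same two delta rows as for ks.t, but with cdc$operation = 1.
   SELECT "cdc$batch_seq_no", pk, ck, v1, "cdc$deleted_v1", v2, "cdc$deleted_v2", "cdc$operation" FROM ks.u_scylla_cdc_log;

In this particular case, the base table contents are the same too, because ``v1 = 0`` keeps the row alive. The missing row marker would only become visible if every non-key column were ``null``, as in the digression above.
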

If a static row update is performed within an ``INSERT``, it is separated from the ``INSERT`` into its own delta row, in the same way a clustered row update is separated from a static row update. Example:

.. code-block:: cql

   CREATE TABLE ks.t (pk int, ck int, s int static, c int, PRIMARY KEY (pk, ck)) WITH cdc = {'enabled': true};
   INSERT INTO ks.t (pk, ck, s, c) VALUES (0, 0, 0, 0);
   SELECT "cdc$batch_seq_no", pk, ck, s, c, "cdc$operation" FROM ks.t_scylla_cdc_log;

returns:

.. code-block:: none

    cdc$batch_seq_no | pk | ck   | s    | c    | cdc$operation
   ------------------+----+------+------+------+---------------
                   0 |  0 | null |    0 | null |             1
                   1 |  0 |    0 | null |    0 |             2

   (2 rows)

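
A consumer that only cares about clustered row inserts can pick them out by filtering on ``cdc$operation``. The following is a minimal sketch against the log table from the example above. ``ALLOW FILTERING`` is required because ``cdc$operation`` is a regular column of the log table, not part of its primary key. The query therefore scans the log and is better suited to exploration than to production reads:

.. code-block:: cql

   -- Keep only the delta rows produced by clustered row inserts (cdc$operation = 2).
   -- With the data above, the static row update (cdc$operation = 1) is filtered out,
   -- leaving only the row with cdc$batch_seq_no = 1.
   SELECT "cdc$batch_seq_no", pk, ck, s, c, "cdc$operation"
   FROM ks.t_scylla_cdc_log
   WHERE "cdc$operation" = 2
   ALLOW FILTERING;
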
There is no such thing as a "static row insert". Static rows don't have a row marker; the only way to make a static row show up is to set a static column to a non-null value. Therefore, the following statement (using the table from above):

.. code-block:: cql

   INSERT INTO ks.t (pk, s) VALUES (0, 0);

is equivalent to:

.. code-block:: cql

   UPDATE ks.t SET s = 0 WHERE pk = 0;

This is why ``cdc$operation`` is ``1``, not ``2``, for the static row update in the example above.
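
You can observe this in the CDC log. The following is a sketch that assumes a fresh table with the same schema under the hypothetical name ``ks.t2``, so that its log table, ``ks.t2_scylla_cdc_log``, contains only the delta rows from this statement:

.. code-block:: cql

   -- Hypothetical table with the same schema as the static-column example above.
   CREATE TABLE ks.t2 (pk int, ck int, s int static, c int, PRIMARY KEY (pk, ck)) WITH cdc = {'enabled': true};

   -- Only the partition key and a static column are set: no clustering key, no row marker.
   INSERT INTO ks.t2 (pk, s) VALUES (0, 0);

   -- Per the equivalence above, expect a single delta row with ck = null, s = 0,
   -- and cdc$operation = 1 (update) rather than 2 (insert).
   SELECT "cdc$batch_seq_no", pk, ck, s, c, "cdc$operation" FROM ks.t2_scylla_cdc_log;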