Inserts ------- Digression: the difference between inserts and updates ++++++++++++++++++++++++++++++++++++++++++++++++++++++ Inserts are not the same as updates, contrary to a popular belief in Cassandra/ScyllaDB communities. The following example illustrates the difference: .. code-block:: cql CREATE TABLE ks.t (pk int, ck int, v int, PRIMARY KEY (pk, ck)) WITH cdc = {'enabled':'true'}; UPDATE ks.t SET v = null WHERE pk = 0 AND ck = 0; SELECT * FROM ks.t WHERE pk = 0 AND ck = 0; returns: .. code-block:: cql pk | ck | v ----+----+--- (0 rows) However: .. code-block:: cql INSERT INTO ks.t (pk,ck,v) VALUES (0, 0, null); SELECT * FROM ks.t WHERE pk = 0 AND ck = 0; returns: .. code-block:: none pk | ck | v ----+----+------ 0 | 0 | null (1 rows) .. _row-marker: Each table has an additional invisible column called the *row marker*. It doesn't hold a value; it only holds *liveness information* (timestamp and time-to-live). If the row marker is alive, the row shows up when you query it, even if all its non-key columns are null. The difference between inserts and updates is that **updates don't affect the row marker**, while **inserts create an alive row marker**. Here's another example: .. code-block:: cql CREATE TABLE ks.t (pk int, ck int, v int, PRIMARY KEY (pk, ck)) WITH cdc = {'enabled':'true'}; UPDATE ks.t SET v = 0 WHERE pk = 0 AND ck = 0; SELECT * FROM ks.t; returns: .. code-block:: cql pk | ck | v ----+----+--- 0 | 0 | 0 (1 rows) The value in the ``v`` column keeps the ``(pk = 0, ck = 0)`` row alive, therefore it shows up in the query. After we delete it, the row will be gone: .. code-block:: cql UPDATE ks.t SET v = null WHERE pk = 0 AND ck = 0; SELECT * FROM ks.t; returns: .. code-block:: none pk | ck | v ----+----+--- (0 rows) However, if we had used an ``INSERT`` instead of an ``UPDATE`` in the first place, the row would still show up even after deleting ``v``: .. code-block:: cql INSERT INTO ks.t (pk, ck, v) VALUES (0, 0, 0); UPDATE ks.t set v = null where pk = 0 and ck = 0; SELECT * from ks.t; returns: .. code-block:: none pk | ck | v ----+----+------ 0 | 0 | null (1 rows) The row marker introduced by ``INSERT`` keeps the row alive, even if there are no other non-key columns that are not ``null``. Therefore the row shows up in the query. We can create just the row marker, without updating any columns, like this: .. code-block:: cql INSERT INTO ks.t (pk, ck) VALUES (0, 0); When specifying both key and non-key columns in an ``INSERT`` statement, we're saying "create a row marker, *and* set cells for this row". We can explicitly divide these two operations; the following: .. code-block:: cql INSERT INTO ks.t (pk, ck, v) VALUES (0, 0, 0); is equivalent to: .. code-block:: cql BEGIN UNLOGGED BATCH INSERT INTO ks.t (pk, ck) VALUES (0, 0); UPDATE ks.t SET v = 0 WHERE pk = 0 AND ck = 0; APPLY BATCH; The ``INSERT`` creates a row marker, the ``UPDATE`` sets the cell in the ``(pk, ck) = (0, 0)`` row and ``v`` column. Inserts in CDC ++++++++++++++ Inserts affect the CDC log very similarly to updates; if no collections or static columns are involved, the difference lies only in the ``cdc$operation`` column: #. Start with a basic table and perform some insert: .. code-block:: cql CREATE TABLE ks.t (pk int, ck int, v1 int, v2 int, PRIMARY KEY (pk, ck)) WITH cdc = {'enabled':'true'}; INSERT INTO ks.t (pk, ck, v1) VALUES (0, 0, 0); INSERT INTO ks.t (pk, ck, v2) VALUES (0, 0, NULL); #. Confirm that the insert was performed by displaying the contents of the table: .. code-block:: cql SELECT * FROM ks.t; returns: .. code-block:: none pk | ck | v1 | v2 ----+----+----+------ 0 | 0 | 0 | null (1 rows) #. Display the contents of the CDC log table: .. code-block:: cql SELECT "cdc$batch_seq_no", pk, ck, v1, "cdc$deleted_v1", v2, "cdc$deleted_v2", "cdc$operation" FROM ks.t_scylla_cdc_log; returns: .. code-block:: none cdc$batch_seq_no | pk | ck | v1 | cdc$deleted_v1 | v2 | cdc$deleted_v2 | cdc$operation ------------------+----+----+------+----------------+------+----------------+--------------- 0 | 0 | 0 | 0 | null | null | null | 2 0 | 0 | 0 | null | null | null | True | 2 (2 rows) Delta rows corresponding to inserts are indicated by ``cdc$operation = 2``. If a static row update is performed within an ``INSERT``, it is separated from the ``INSERT``, in the same way a clustered row update is separated from a static row update. Example: .. code-block:: cql CREATE TABLE ks.t (pk int, ck int, s int static, c int, PRIMARY KEY (pk, ck)) WITH cdc = {'enabled': true}; INSERT INTO ks.t (pk, ck, s, c) VALUES (0, 0, 0, 0); SELECT "cdc$batch_seq_no", pk, ck, s, c, "cdc$operation" FROM ks.t_scylla_cdc_log; returns: .. code-block:: none cdc$batch_seq_no | pk | ck | s | c | cdc$operation ------------------+----+------+------+------+--------------- 0 | 0 | null | 0 | null | 1 1 | 0 | 0 | null | 0 | 2 (2 rows) There is no such thing as a "static row insert". Indeed, static rows don't have a row marker; the only way to make a static row show up is to set a static column to a non-null value. Therefore, the following statement (using the table from above): .. code-block:: cql INSERT INTO ks.t (pk, s) VALUES (0, 0); is equivalent to: .. code-block:: cql UPDATE ks.t SET s = 0 WHERE pk = 0; This is the reason why ``cdc$operation`` is ``1``, not ``2``, in the example above for the static row update.