Commit Graph

18 Commits

Author SHA1 Message Date
Kefu Chai
6c06751640 cdc: not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16725
2024-01-11 09:13:37 +02:00
Avi Kivity
69a385fd9d Introduce schema/ module
Schema related files are moved there. This excludes schema files that
also interact with mutations, because the mutation module depends on
the schema. Those files will have to go into a separate module.

Closes #12858
2023-02-15 11:01:50 +02:00
Avi Kivity
fcb8d040e8 treewide: use Software Package Data Exchange (SPDX) license identifiers
Instead of lengthy blurbs, switch to single-line, machine-readable
standardized (https://spdx.dev) license identifiers. The Linux kernel
switched long ago, so there is strong precedent.

Three cases are handled: AGPL-only, Apache-only, and dual licensed.
For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0),
reasoning that our changes are extensive enough to apply our license.

The changes we applied mechanically with a script, except to
licenses/README.md.

Closes #9937
2022-01-18 12:15:18 +01:00
Avi Kivity
ae3a360725 database: Move database, keyspace, table classes to replica/ directory
The database, keyspace, and table classes represent the replica-only
part of the objects after which they are named. Reading from a table
doesn't give you the full data, just the replica's view, and it is not
consistent since reconciliation is applied on the coordinator.

As a first step in acknowledging this, move the related files to
a replica/ subdirectory.
2022-01-06 17:07:30 +02:00
Avi Kivity
a55b434a2b treewide: extent copyright statements to present day 2021-06-06 19:18:49 +03:00
Avi Kivity
14a4173f50 treewide: make headers self-sufficient
In preparation for some large header changes, fix up any headers
that aren't self-sufficient by adding needed includes or forward
declarations.
2021-04-20 21:23:00 +03:00
Calle Wilund
46ea8c9b8b cdc: Add an "end-of-record" column to
Fixes #7435

Adds an "eor" (end-of-record) column to cdc log. This is non-null only on
last-in-timestamp group rows, i.e. end of a singular source "event".

A client can use this as a shortcut to knowing whether or not he has a
full cdc "record" for a given source mutation (single row change).

Closes #7436
2020-10-26 09:39:27 +02:00
Piotr Dulikowski
20b236d27d cdc: don't update partition state when not needed
In some cases, tracking the state of processed rows inside `transformer`
is not needd at all. We don't need to do it if either:

- Preimage and postimage are disabled for the table,
- Only preimage is enabled and we are processing the last timestamp.

This commit disables updating the state in the cases listed above.
2020-07-08 15:36:41 +02:00
Piotr Dulikowski
24b50ffbc8 cdc: add interface for producing pre/postimages
Introduces new methods to the change_processor interface that will cause
it to produce pre/postimage rows for requested clustering key, or for
static row.

Introduces logic in split.cc responsible for calling pre/postimage
methods of the change_processor interface. This does not have any effect
on generated CDC log mutations yet, because the transformer class has
empty implementations in place of those methods.
2020-07-08 15:36:41 +02:00
Piotr Dulikowski
82ddeb1992 cdc: track batch_no inside transformer
Move tracking of batch_no inside the transformer.
2020-07-08 15:36:41 +02:00
Piotr Dulikowski
7b47f84965 cdc: move cdc$time generation to transformer
Generate the timeuuid on the transformer side, which allows to simplify
the change_processor interface.
2020-07-08 15:36:41 +02:00
Piotr Dulikowski
51d97be0b3 cdc: introduce change_processor interface
This allows for a more refined use of the transformer by the
for_each_change function (now named "process_changes_with_splitting).

The change_processor interface exposes two methods so far:
begin_timestamp, and process_change (previously named "transform").
By separating those two and exposing them, process_changes_with\
_splitting can cause the transformer to generate less CDC log mutations
- only one for each timestamp in the batch.
2020-07-08 15:36:40 +02:00
Piotr Dulikowski
f907cab156 cdc: remove redundant schema arguments from cdc functions
A `mutation` object already has a reference to its schema. It does not
make sense to call functions changed in this commit with a different
schema.
2020-07-08 15:36:40 +02:00
Botond Dénes
e0284bb9ee treewide: add missing headers and/or forward declarations 2020-03-23 09:29:45 +02:00
Kamil Braun
3200d415da cdc: use a single timeuuid value for a batch of changes
If a batch update is performed with a sequence of changes with a single
timestamp, they will now show up in CDC with a single timeuuid in the
`time` column, distinguished by different `batch_seq_no` values.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-03-05 12:32:57 +01:00
Kamil Braun
292eba9da0 cdc: replace split with for_each_change
`for_each_change` is like `split` but it doesn't return a vector of
mutations representing each change; instead, it takes as a parameter
a function which gets called on each mutation.

This reduced the memory usage and allows to preserve common context
when handling each change (will be useful in next commits).

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-03-05 12:05:08 +01:00
Kamil Braun
529d30ef66 cdc: add split function
This function takes a mutation and returns a set of mutations, each
representing a separate change with a single timestamp and ttl.
2020-03-03 13:17:51 +01:00
Kamil Braun
b5c944370e cdc: add should_split function
The function checks if there are multiple timestamps and/or ttls inside
a mutation, which means separate changes should be created for this
mutation in CDC.
2020-03-03 13:17:50 +01:00