scylladb

Author	SHA1	Message	Date
Piotr Wieczorek	a32e8091a9	alternator, cdc: Don't emit an event for equal items This commit adds a function that compares split mutations with the `row_state` that was selected as a preimage or propagated through cdc options by a caller. If the items are equal, the corresponding log row isn't generated. As a result, creating an item with BatchWriteItem, PutItem, or UpdateItem doesn't emit an INSERT/MODIFY event if an exactly identical item already exists. Comparing the items may be costly, so this logic is controlled by the `alternator_streams_compabitiblity` flag. This commit handles the following cases: - `PutItem/UpdateItem/BatchWriteItem.PutItem of an existing and equal item: nothing`	2025-10-30 08:38:30 +01:00
Piotr Wieczorek	e3fde8087a	cdc: Don't split a row marker away from row cells CDC log table records a mutation as a sequence of log rows that record an atomic change (i.e. a row marker, tombstones, etc.), whereas a mutation in Alternator Streams always appears as a single log row. The type of operation is determined based on the type of the last log row in CDC. As a result, updates that create a row always appeared to Alternator Streams as an update (row marker + data), rather than an insert. This commit makes them a single log row. Its operation type is insert if it contains a row marker, and an update otherwise, which gives results consistent with DynamoDB Streams.	2025-10-30 07:40:31 +01:00
Ernest Zaslavsky	5ba5aec1f8	treewide: Move mutation related files to a `mutation` directory As requested in #22104, moved the files and fixed other includes and build system. Moved files: - combine.hh - collection_mutation.hh - collection_mutation.cc - converting_mutation_partition_applier.hh - converting_mutation_partition_applier.cc - counters.hh - counters.cc - timestamp.hh Fixes: #22104 This is a cleanup, no need to backport Closes scylladb/scylladb#25085	2025-09-24 13:23:38 +03:00
Avi Kivity	f3eade2f62	treewide: relicense to ScyllaDB-Source-Available-1.0 Drop the AGPL license in favor of a source-available license. See the blog post [1] for details. [1] https://www.scylladb.com/2024/12/18/why-were-moving-to-a-source-available-license/	2024-12-18 17:45:13 +02:00
Kefu Chai	94e36d4af4	auth: do not include unused headers these unused includes were identified by clangd. see https://clangd.llvm.org/guides/include-cleaner#unused-include-warning for more details on the "Unused include" warning. this change addresses the leftover of 850ee7e170a. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19467	2024-06-25 12:11:28 +03:00
Kefu Chai	6c06751640	cdc: do not include unused headers these unused includes were identified by clangd. see https://clangd.llvm.org/guides/include-cleaner#unused-include-warning for more details on the "Unused include" warning. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#16725	2024-01-11 09:13:37 +02:00
Avi Kivity	69a385fd9d	Introduce schema/ module Schema related files are moved there. This excludes schema files that also interact with mutations, because the mutation module depends on the schema. Those files will have to go into a separate module. Closes #12858	2023-02-15 11:01:50 +02:00
Avi Kivity	fcb8d040e8	treewide: use Software Package Data Exchange (SPDX) license identifiers Instead of lengthy blurbs, switch to single-line, machine-readable standardized (https://spdx.dev) license identifiers. The Linux kernel switched long ago, so there is strong precedent. Three cases are handled: AGPL-only, Apache-only, and dual licensed. For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0), reasoning that our changes are extensive enough to apply our license. The changes were applied mechanically with a script, except to licenses/README.md. Closes #9937	2022-01-18 12:15:18 +01:00
Avi Kivity	ae3a360725	database: Move database, keyspace, table classes to replica/ directory The database, keyspace, and table classes represent the replica-only part of the objects after which they are named. Reading from a table doesn't give you the full data, just the replica's view, and it is not consistent since reconciliation is applied on the coordinator. As a first step in acknowledging this, move the related files to a replica/ subdirectory.	2022-01-06 17:07:30 +02:00
Avi Kivity	a55b434a2b	treewide: extend copyright statements to present day	2021-06-06 19:18:49 +03:00
Avi Kivity	14a4173f50	treewide: make headers self-sufficient In preparation for some large header changes, fix up any headers that aren't self-sufficient by adding needed includes or forward declarations.	2021-04-20 21:23:00 +03:00
Calle Wilund	46ea8c9b8b	cdc: Add an "end-of-record" column to Fixes #7435 Adds an "eor" (end-of-record) column to cdc log. This is non-null only on last-in-timestamp group rows, i.e. end of a singular source "event". A client can use this as a shortcut to knowing whether or not he has a full cdc "record" for a given source mutation (single row change). Closes #7436	2020-10-26 09:39:27 +02:00
Piotr Dulikowski	20b236d27d	cdc: don't update partition state when not needed In some cases, tracking the state of processed rows inside `transformer` is not needed at all. We don't need to do it if either: - Preimage and postimage are disabled for the table, - Only preimage is enabled and we are processing the last timestamp. This commit disables updating the state in the cases listed above.	2020-07-08 15:36:41 +02:00
Piotr Dulikowski	24b50ffbc8	cdc: add interface for producing pre/postimages Introduces new methods to the change_processor interface that will cause it to produce pre/postimage rows for requested clustering key, or for static row. Introduces logic in split.cc responsible for calling pre/postimage methods of the change_processor interface. This does not have any effect on generated CDC log mutations yet, because the transformer class has empty implementations in place of those methods.	2020-07-08 15:36:41 +02:00
Piotr Dulikowski	82ddeb1992	cdc: track batch_no inside transformer Move tracking of batch_no inside the transformer.	2020-07-08 15:36:41 +02:00
Piotr Dulikowski	7b47f84965	cdc: move cdc$time generation to transformer Generate the timeuuid on the transformer side, which allows simplifying the change_processor interface.	2020-07-08 15:36:41 +02:00
Piotr Dulikowski	51d97be0b3	cdc: introduce change_processor interface This allows for a more refined use of the transformer by the for_each_change function (now named "process_changes_with_splitting"). The change_processor interface exposes two methods so far: begin_timestamp, and process_change (previously named "transform"). By separating those two and exposing them, process_changes_with_splitting can cause the transformer to generate fewer CDC log mutations - only one for each timestamp in the batch.	2020-07-08 15:36:40 +02:00
Piotr Dulikowski	f907cab156	cdc: remove redundant schema arguments from cdc functions A `mutation` object already has a reference to its schema. It does not make sense to call functions changed in this commit with a different schema.	2020-07-08 15:36:40 +02:00
Botond Dénes	e0284bb9ee	treewide: add missing headers and/or forward declarations	2020-03-23 09:29:45 +02:00
Kamil Braun	3200d415da	cdc: use a single timeuuid value for a batch of changes If a batch update is performed with a sequence of changes with a single timestamp, they will now show up in CDC with a single timeuuid in the `time` column, distinguished by different `batch_seq_no` values. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-05 12:32:57 +01:00
Kamil Braun	292eba9da0	cdc: replace `split` with `for_each_change` `for_each_change` is like `split` but it doesn't return a vector of mutations representing each change; instead, it takes as a parameter a function which gets called on each mutation. This reduces memory usage and allows preserving common context when handling each change (will be useful in next commits). Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-05 12:05:08 +01:00
Kamil Braun	529d30ef66	cdc: add `split` function This function takes a mutation and returns a set of mutations, each representing a separate change with a single timestamp and ttl.	2020-03-03 13:17:51 +01:00
Kamil Braun	b5c944370e	cdc: add `should_split` function The function checks if there are multiple timestamps and/or ttls inside a mutation, which means separate changes should be created for this mutation in CDC.	2020-03-03 13:17:50 +01:00

23 Commits