Fixes some typos as found by codespell run on the code.
In this commit, I was hoping to fix only comments, not user-visible alerts, output, etc.
Follow-up commits will take care of them.
Refs: https://github.com/scylladb/scylladb/issues/16255
Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>
as this syntax is not supported by the standard, it seems clang
just silently constructs the value from the initializer list and
calls operator=, but GCC complains:
```
/home/kefu/dev/scylladb/cdc/split.cc:392:54: error: converting to ‘std::optional<partition_deletion>’ from initializer list would use explicit constructor ‘constexpr std::optional<_Tp>::optional(_Up&&) [with _Up = const tombstone&; typename std::enable_if<__and_v<std::__not_<std::is_same<std::optional<_Tp>, typename std::remove_cv<typename std::remove_reference<_Iter>::type>::type> >, std::__not_<std::is_same<std::in_place_t, typename std::remove_cv<typename std::remove_reference<_Iter>::type>::type> >, std::is_constructible<_Tp, _Up>, std::__not_<std::is_convertible<_Iter, _Iterator> > >, bool>::type <anonymous> = false; _Tp = partition_deletion]’
392 | _result[t.timestamp].partition_deletions = {t};
| ^
```
to silence the error, and to be more standards-compliant,
let's use emplace() instead.
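A minimal sketch of the emplace() form, using hypothetical stand-ins for `tombstone` and `partition_deletion` (the real definitions live in the Scylla tree and are not reproduced here). The stand-in is given an explicit constructor, which is an assumption of this sketch, so that emplace() also compiles before C++20:

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-ins for the real types used in cdc/split.cc.
struct tombstone {
    long timestamp = 0;
};

struct partition_deletion {
    tombstone t;
    // Assumption of this sketch: an explicit constructor, so that
    // emplace() works without relying on C++20 aggregate initialization.
    explicit partition_deletion(const tombstone& tomb) : t(tomb) {}
};

// Builds the optional by constructing the value in place.
inline std::optional<partition_deletion> record_deletion(const tombstone& t) {
    std::optional<partition_deletion> result;
    // result = {t};   // the assignment form this commit replaces
    result.emplace(t); // direct construction inside the optional
    assert(result.has_value());
    return result;
}
```

emplace() forwards its arguments straight to the constructor of the contained type, so no conversion from a braced list to std::optional is needed.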
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Schema related files are moved there. This excludes schema files that
also interact with mutations, because the mutation module depends on
the schema. Those files will have to go into a separate module.
Closes #12858
Move mutation-related files to a new mutation/ directory. The names
are kept in the global namespace to reduce churn; the names are
unambiguous in any case.
mutation_reader remains in the readers/ module.
mutation_partition_v2.cc was missing from CMakeLists.txt; it's added in this
patch.
This is a step forward towards librarization or modularization of the
source base.
Closes #12788
Instead of lengthy blurbs, switch to single-line, machine-readable
standardized (https://spdx.dev) license identifiers. The Linux kernel
switched long ago, so there is strong precedent.
Three cases are handled: AGPL-only, Apache-only, and dual licensed.
For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0),
reasoning that our changes are extensive enough to apply our license.
The changes were applied mechanically with a script, except for
licenses/README.md.
Closes #9937
Fixes #7435
Adds an "eor" (end-of-record) column to cdc log. This is non-null only on
last-in-timestamp group rows, i.e. end of a singular source "event".
A client can use this as a shortcut to know whether or not it has a
full cdc "record" for a given source mutation (single row change).
Closes #7436
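A rough client-side sketch of how the column might be used. The `log_row` struct and the `complete_record_length` helper are hypothetical and model only the fields needed here; they are not part of any driver API:

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical client-side view of one CDC log row.
struct log_row {
    int batch_seq_no;
    // Null on every row except the last one of a timestamp group.
    std::optional<bool> end_of_record;
};

// Returns how many leading rows form one complete record,
// or 0 if no end-of-record marker has been seen yet.
inline std::size_t complete_record_length(const std::vector<log_row>& rows) {
    for (std::size_t i = 0; i < rows.size(); ++i) {
        if (rows[i].end_of_record.has_value()) {
            return i + 1;
        }
    }
    return 0;
}
```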
Due to a bug, clang does not decay a type to a reference, failing
the concept evaluation on correct input. Add parentheses to force
it to decay the type.
In some cases, tracking the state of processed rows inside `transformer`
is not needed at all. We don't need to do it if either:
- Preimage and postimage are disabled for the table,
- Only preimage is enabled and we are processing the last timestamp.
This commit disables updating the state in the cases listed above.
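The two conditions above can be written as a small predicate. This is only an illustration; the function and parameter names are hypothetical and do not come from the patch:

```cpp
#include <cassert>

// Hypothetical helper: decides whether the state of processed rows
// has to be tracked while handling the current timestamp.
inline bool should_track_row_state(bool preimage, bool postimage, bool is_last_timestamp) {
    // Neither preimage nor postimage is enabled: the state is never read.
    if (!preimage && !postimage) {
        return false;
    }
    // Only preimage is enabled and no later timestamp will need the state.
    if (preimage && !postimage && is_last_timestamp) {
        return false;
    }
    return true;
}
```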
Introduces new methods to the change_processor interface that will cause
it to produce pre/postimage rows for a requested clustering key, or for
the static row.
Introduces logic in split.cc responsible for calling pre/postimage
methods of the change_processor interface. This does not have any effect
on generated CDC log mutations yet, because the transformer class has
empty implementations in place of those methods.
The function is no longer used in log.cc, so instead it is moved to
split.cc.
Removed the declaration of the function from the log.hh header, because
it is not used elsewhere - apart from testing code, which already
declares find_timestamp in the cdc_test.cc file.
This allows for a more refined use of the transformer by the
for_each_change function (now named "process_changes_with_splitting").
The change_processor interface exposes two methods so far:
begin_timestamp, and process_change (previously named "transform").
By separating those two and exposing them,
process_changes_with_splitting can cause the transformer to generate
fewer CDC log mutations - only one for each timestamp in the batch.
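A simplified sketch of the split between the two methods. Only the names change_processor, begin_timestamp, process_change and process_changes_with_splitting come from this commit; the signatures, the `change` struct and the `counting_processor` are assumptions made for illustration:

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

using timestamp_type = std::int64_t;

// Hypothetical payload of a single change.
struct change {
    int payload;
};

// Assumed shape of the interface; the real signatures may differ.
class change_processor {
public:
    virtual ~change_processor() = default;
    virtual void begin_timestamp(timestamp_type ts) = 0;
    virtual void process_change(const change& c) = 0;
};

// Calls begin_timestamp once per timestamp, then feeds every change
// that belongs to that timestamp to process_change.
inline void process_changes_with_splitting(
        const std::map<timestamp_type, std::vector<change>>& by_timestamp,
        change_processor& processor) {
    for (const auto& entry : by_timestamp) {
        processor.begin_timestamp(entry.first);
        for (const change& c : entry.second) {
            processor.process_change(c);
        }
    }
}

// Example processor that only counts the calls it receives.
struct counting_processor final : change_processor {
    int timestamps = 0;
    int changes = 0;
    void begin_timestamp(timestamp_type) override { ++timestamps; }
    void process_change(const change&) override { ++changes; }
};
```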
Overwriting a collection cell using timestamp T is a process with the
following steps:
1. inserting a row marker (if applicable) with timestamp T;
2. writing a collection tombstone with timestamp T-1;
3. writing the new collection value with timestamp T.
Since CDC does clustering of the operations by timestamp, this
would result in 3 separate calls to `transform` (in case of
INSERT, or 2 - in the case of UPDATE), which seems excessive,
especially when pre-/postimage is enabled. This patch treats
collection tombstones as if they had the same timestamp as
the base write, so they are processed in one call to `transform`
(as long as TTLs are not used).
Also, `cdc_test` had to be updated in places that relied on the former
splitting strategy.
Fixes #6084
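The grouping rule can be sketched as follows. The `op` model and both helpers are hypothetical, and TTLs are ignored, matching the "as long as TTLs are not used" caveat:

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

using timestamp_type = std::int64_t;

enum class op_kind { row_marker, collection_tombstone, collection_value };

// Hypothetical model of one operation inside a mutation.
struct op {
    op_kind kind;
    timestamp_type ts;
};

// Timestamp under which an operation is grouped: a collection tombstone
// written at T-1 is treated as if it had been written at T.
inline timestamp_type grouping_timestamp(const op& o) {
    return o.kind == op_kind::collection_tombstone ? o.ts + 1 : o.ts;
}

// Groups operations so that each group maps to one `transform` call.
inline std::map<timestamp_type, std::vector<op>> group_ops(const std::vector<op>& ops) {
    std::map<timestamp_type, std::vector<op>> groups;
    for (const op& o : ops) {
        groups[grouping_timestamp(o)].push_back(o);
    }
    return groups;
}
```

With this rule, the three steps of an INSERT that overwrites a collection at timestamp T fall into a single group instead of two.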
This patch fixes a bug in mutation splitting logic of CDC. In the part
that handles updates of non-atomic clustering columns, the column
definition was fetched from a static column of the same id instead of
the actual definition of the clustering column. It could cause the value
to be written to a wrong column.
Tests: unit(dev)
If a batch update is performed with a sequence of changes with a single
timestamp, they will now show up in CDC with a single timeuuid in the
`time` column, distinguished by different `batch_seq_no` values.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
`for_each_change` is like `split` but it doesn't return a vector of
mutations representing each change; instead, it takes as a parameter
a function which gets called on each mutation.
This reduces memory usage and allows common context to be preserved
when handling each change (will be useful in next commits).
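A toy contrast between the two styles. The `mutation` stand-in and both signatures are assumptions, and each write is treated as its own change to keep the sketch short:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in: a mutation reduced to the timestamps of its writes.
struct mutation {
    std::vector<std::int64_t> timestamps;
};

// split-style: materializes one mutation per change before returning.
inline std::vector<mutation> split(const mutation& m) {
    std::vector<mutation> result;
    for (std::int64_t ts : m.timestamps) {
        mutation single;
        single.timestamps.push_back(ts);
        result.push_back(single);
    }
    return result;
}

// for_each_change-style: hands each change to the callback as soon as
// it is built, so only one of them is alive at any time.
template <typename Func>
void for_each_change(const mutation& m, Func&& f) {
    for (std::int64_t ts : m.timestamps) {
        mutation single;
        single.timestamps.push_back(ts);
        f(single);
    }
}
```

The callback can capture state by reference, which is how common context can be carried from one change to the next.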
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
It is possible to produce an empty mutation using CQL. For example, the
following query:
DELETE FROM ks.tbl WHERE pk = 0 AND ck < 1 AND ck > 2;
will attempt to delete from an empty range of rows. This is translated
to the following mutation:
{ks.tbl {key: pk{000400000000}, token:-3485513579396041028}
{mutation_partition:
static: cont=1 {row: },
clustered: {}}}
Such a mutation does not contain any timestamp, therefore it is difficult
to determine what timestamp was used while making the query. This is
problematic for CDC, because an entry in CDC log should be written with
the same timestamp as a part of the mutation.
Because an empty mutation does not modify the table in any way, we can
safely skip logging such mutations in CDC and still preserve the
ability to reconstruct the current state of the base table from the full
CDC log.
Tests: unit(dev)
This commit introduces a bunch of new structs describing a change made
to a table, and an `extract_changes` function which takes a mutation and
returns the set of changes contained in this mutation, separated by
timestamp and ttl.
The function checks if there are multiple timestamps and/or ttls inside
a mutation, which means separate changes should be created for this
mutation in CDC.
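A simplified sketch of separating writes by timestamp and ttl. The `cell_write` struct, the `change_key` alias and `should_split` are hypothetical, and `extract_changes` here takes a flat list of writes rather than a real mutation:

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical flattened view of one cell write inside a mutation.
struct cell_write {
    std::int64_t timestamp;
    std::int32_t ttl;
    int column_id;
};

// Writes sharing both timestamp and ttl belong to the same change.
using change_key = std::pair<std::int64_t, std::int32_t>;

// Simplified stand-in for extract_changes: buckets writes by (timestamp, ttl).
inline std::map<change_key, std::vector<cell_write>> extract_changes(
        const std::vector<cell_write>& writes) {
    std::map<change_key, std::vector<cell_write>> changes;
    for (const cell_write& w : writes) {
        changes[change_key{w.timestamp, w.ttl}].push_back(w);
    }
    return changes;
}

// More than one bucket means the mutation needs separate changes in CDC.
inline bool should_split(const std::vector<cell_write>& writes) {
    return extract_changes(writes).size() > 1;
}
```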