Commit Graph

85 Commits

Author SHA1 Message Date
Kefu Chai
3e84d43f93 treewide: use seastar::format() or fmt::format() explicitly
before this change, we rely on `using namespace seastar` to use
`seastar::format()` without qualifying the `format()` with its
namespace. this worked fine until we changed the type of the format
string parameter of `seastar::format()` from `const char*` to
`fmt::format_string<...>`. this change practically invited
`seastar::format()` to the club of `std::format()` and `fmt::format()`,
where all members accept a templated parameter as their `fmt`
parameter, and `seastar::format()` is not the best candidate anymore.
argument-dependent lookup (ADL for short) favors the function which
is in the same namespace as its parameter, but `using namespace`
makes `seastar::format()` just as competitive, so both `std::format()`
and `seastar::format()` are considered as candidates.

that is what happens in scylladb at quite a few call sites of
`format()`, hence ADL is not able to tell which function is the winner
in the name lookup:

```
/__w/scylladb/scylladb/mutation/mutation_fragment_stream_validator.cc:265:12: error: call to 'format' is ambiguous
  265 |     return format("{} ({}.{} {})", _name_view, s.ks_name(), s.cf_name(), s.id());
      |            ^~~~~~
/usr/bin/../lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/format:4290:5: note: candidate function [with _Args = <const std::basic_string_view<char> &, const seastar::basic_sstring<char, unsigned int, 15> &, const seastar::basic_sstring<char, unsigned int, 15> &, const utils::tagged_uuid<table_id_tag> &>]
 4290 |     format(format_string<_Args...> __fmt, _Args&&... __args)
      |     ^
/__w/scylladb/scylladb/seastar/include/seastar/core/print.hh:143:1: note: candidate function [with A = <const std::basic_string_view<char> &, const seastar::basic_sstring<char, unsigned int, 15> &, const seastar::basic_sstring<char, unsigned int, 15> &, const utils::tagged_uuid<table_id_tag> &>]
  143 | format(fmt::format_string<A...> fmt, A&&... a) {
      | ^
```

in this change, we change all `format()` calls to either
`fmt::format()` or `seastar::format()` with the following rules:
- if the caller expects an `sstring` or `std::string_view`, change to
  `seastar::format()`
- if the caller expects an `std::string`, change to `fmt::format()`,
  because `sstring::operator std::basic_string` would incur a deep
  copy.
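
the ambiguity and the fix can be reproduced without seastar or {fmt}. the
sketch below is only an illustration: `std_like`, `seastar_like`,
`name_view` and `describe()` are hypothetical stand-ins, not names from
the tree, and the format string is returned unformatted to keep the
example small.

```cpp
#include <string>
#include <string_view>

// Hypothetical stand-in for namespace std (or fmt).
namespace std_like {
struct name_view { std::string value; };

template <typename... Args>
std::string format(std::string_view fmt, Args&&...) {
    return "std_like:" + std::string(fmt);
}
} // namespace std_like

// Hypothetical stand-in for namespace seastar.
namespace seastar_like {
template <typename... Args>
std::string format(std::string_view fmt, Args&&...) {
    return "seastar_like:" + std::string(fmt);
}
} // namespace seastar_like

using namespace seastar_like;

std::string describe(const std_like::name_view& name) {
    // return format("{}", name);
    //   ambiguous: the using-directive makes seastar_like::format()
    //   visible, and ADL on `name` adds std_like::format() with an
    //   identical signature.
    return seastar_like::format("{}", name);  // qualified: one candidate only
}
```

qualifying the call disables ADL, so only the named function is
considered, which is why the rules above always spell out the namespace.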

we will need another change to enable scylladb to compile with the
latest seastar. namely, to pass the format string as a templated
parameter down to helper functions which format their parameters.
to minimize the scope of this change, let's include that change when
bumping up the seastar submodule, as that change will depend on
the seastar change.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-09-11 23:21:40 +03:00
Avi Kivity
aa1270a00c treewide: change assert() to SCYLLA_ASSERT()
assert() is traditionally disabled in release builds, but not in
scylladb. This hasn't caused problems so far, but the latest abseil
release includes a commit [1] that causes a 1000 insn/op regression when
NDEBUG is not defined.

Clearly, we must move towards a build system where NDEBUG is defined in
release builds. But we can't just define it blindly without vetting
all the assert() calls, as some were written with the expectation that
they are enabled in release mode.

To solve the conundrum, change all assert() calls to a new SCYLLA_ASSERT()
macro in utils/assert.hh. This macro is always defined and is not conditional
on NDEBUG, so we can later (after vetting Seastar) enable NDEBUG in release
mode.
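
As a rough illustration of the idea, an assertion macro that ignores
NDEBUG could look like the sketch below. `ALWAYS_ASSERT` and
`checked_divide()` are hypothetical names; the actual definition of
SCYLLA_ASSERT() in utils/assert.hh is not shown here and may differ.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

// Hypothetical always-on assertion: it does not look at NDEBUG at all.
#define ALWAYS_ASSERT(cond)                                          \
    do {                                                             \
        if (!(cond)) {                                               \
            std::fprintf(stderr, "%s:%d: assertion failed: %s\n",    \
                         __FILE__, __LINE__, #cond);                 \
            std::abort();                                            \
        }                                                            \
    } while (0)

int checked_divide(int a, int b) {
    assert(b != 0);         // compiled out when NDEBUG is defined
    ALWAYS_ASSERT(b != 0);  // still checked when NDEBUG is defined
    return a / b;
}
```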

[1] 66ef711d68

Closes scylladb/scylladb#20006
2024-08-05 08:23:35 +03:00
Kefu Chai
ef0f4eaef2 test: do not use operator<< for std::optional
we don't provide it anymore. and even if an existing type provides a
constructor accepting an `optional<>`, so that the optional can be
formatted using operator<< after an implicit conversion, we should not
rely on this behavior, as it is fragile.

so, in this change, we switch to `fmt::print()` to use {fmt} to
print `optional<>`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-06-18 10:41:48 +08:00
Kefu Chai
372a4d1b79 treewide: do not define FMT_DEPRECATED_OSTREAM
since we do not rely on FMT_DEPRECATED_OSTREAM to define the
fmt::formatter for us anymore, let's stop defining `FMT_DEPRECATED_OSTREAM`.

in this change,

* utils: drop the range formatters in to_string.hh and to_string.c, as
  we don't use them anymore. and the tests for them in
  test/boost/string_format_test.cc are removed accordingly.
* utils: use fmt to print chunk_vector and small_vector. as
  we are not able to print the elements using operator<< anymore
  after switching to {fmt} formatters.
* test/boost: specialize fmt::detail::is_std_string_like<bytes>
  due to a bug in {fmt} v9, {fmt} fails to format a range whose
  element type is `basic_sstring<int8_t>`, as it considers it
  as a string-like type, but `basic_sstring<int8_t>`'s char type
  is signed char, not char. this issue does not exist in {fmt} v10,
  so, in this change, we add a workaround to explicitly specialize
  the type trait to ensure that {fmt} formats this type using its
  `fmt::formatter` specialization instead of trying to format it
  as a string. also, {fmt}'s generic ranges formatter calls the
  pair formatter's `set_brackets()` and `set_separator()` methods
  when printing the range, but the operator<< based formatter does not
  provide these methods, so we have to include this change in the
  change switching to {fmt}, otherwise the change specializing
  `fmt::detail::is_std_string_like<bytes>` won't compile.
* test/boost: in tests, we use `BOOST_REQUIRE_EQUAL()` and its friends
  for comparing values. but without the operator<< based formatters,
  Boost.Test would not be able to print them. after removing
  the homebrew formatters, we need to use the generic
  `boost_test_print_type()` helper to do this job. so we are
  including `test_utils.hh` in tests so that we can print
  the formattable types.
* treewide: add `#include "utils/to_string.hh"` where
  `fmt::formatter<optional<>>` is used.
* configure.py: do not define FMT_DEPRECATED_OSTREAM
* cmake: do not define FMT_DEPRECATED_OSTREAM

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-04-19 22:57:36 +08:00
Kefu Chai
a439ebcfce treewide: include fmt/ranges.h and/or fmt/std.h
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we include `fmt/ranges.h` and/or `fmt/std.h`
for formatting the container types, like vector, map,
optional and variant, using {fmt} instead of the homebrew
formatter based on operator<<.
with this change, the changes adding fmt::formatter and
the changes using ostream formatter explicitly, we are
allowed to drop `FMT_DEPRECATED_OSTREAM` macro.

Refs scylladb#13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-04-19 22:56:16 +08:00
Kefu Chai
97587a2ea4 test/boost: do not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17139
2024-02-06 13:22:16 +02:00
Avi Kivity
7cb1c10fed treewide: replace seastar::future::get0() with seastar::future::get()
get0() dates back to the days when Seastar futures carried tuples, and
get0() was a way to get the first (and usually only) element. Now
it's a distraction, and Seastar is likely to deprecate and remove it.

Replace with seastar::future::get(), which does the same thing.
2024-02-02 22:12:57 +08:00
Pavel Emelyanov
64c8a59e9b test: Open-code ks.cf name parse into cdc_test
The test uses qualified ks.cf name to find the schema, but it's the only
test case that does it. There's no point in maintaining a dedicated
helper on the cql_test_env just for that

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-08-12 11:46:36 +03:00
Pavel Emelyanov
b4c84f9174 test: Use BOOST_REQUIRE(!db.has_schema())
Surprisingly there's a dedicated helper for the check opposite to the
one fixed in the previous patch. Fix this one too

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-08-12 11:46:36 +03:00
Pavel Emelyanov
063baabaee test: Use BOOST_REQUIRE(db.has_schema())
Same as in previous patch, the cql_test_env::require_table_exists()
helper is exactly the same, but returns future and asserts on failures
for no gain

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-08-12 11:46:32 +03:00
Avi Kivity
42a1ced73b cql3: result_set: switch cell data type from bytes_opt to managed_bytes_opt
The expression system uses managed_bytes_opt for values, but result_set
uses bytes_opt. This means that processing values from the result set
in expressions requires a copy.

Out of the two, managed_bytes_opt is the better choice, since it prevents
large contiguous allocations for large blobs. So we switch result_set
to use managed_bytes_opt. Users of the result_set API are adjusted.

The db::function interface is not modified to limit churn; instead we
convert the types on entry and exit. This will be adjusted in a following
patch.
2023-05-07 17:17:36 +03:00
Kefu Chai
df63e2ba27 types: move types.{cc,hh} into types
they are part of the CQL type system, and are "closer" to types.
let's move them into "types" directory.

the building systems are updated accordingly.

the source files referencing `types.hh` were updated using following
command:

```
find . -name "*.{cc,hh}" -exec sed -i 's/\"types.hh\"/\"types\/types.hh\"/' {} +
```

the source files under sstables include "types.hh", which is
indeed the one located under "sstables", so include "sstables/types.hh"
instead, so it's more explicit.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #12926
2023-02-19 21:05:45 +02:00
Avi Kivity
69a385fd9d Introduce schema/ module
Schema related files are moved there. This excludes schema files that
also interact with mutations, because the mutation module depends on
the schema. Those files will have to go into a separate module.

Closes #12858
2023-02-15 11:01:50 +02:00
Raphael S. Carvalho
3c5afb2d5c test: Enable Scylla test command line options for boost tests
We have enabled the command line options without changing a
single line of code, we only had to replace old include
with scylla_test_case.hh.

Next step is to add x-log-compaction-groups options, which will
determine the number of compaction groups to be used by all
instantiations of replica::table.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-02-01 20:14:51 -03:00
Kamil Braun
3be376f6c5 test/boost: cdc_test: remove test_cdc_across_shards
The test checked if creating a table with CDC enabled on shard other
than 0 would create the CDC log table as well; it was a regression test
for #5582. However we will soon bounce all schema change requests to
shard 0, so the test's purpose is gone.

I need to remove this test because `cquery_nofail` does not handle the
bouncing correctly: it silently accepts the bounce message, assumes that
the query was successful and returns. So after we change the code to
start bouncing all requests to shard 0, if a query was run inside test
code using `cquery_nofail` on a shard different than 0 it would do
nothing and following queries executed on shard 0 would fail because they
depended on the effect of the aforementioned query.
2022-06-23 16:14:41 +02:00
Calle Wilund
adda43edc7 CDC - do not remove log table on CDC disable
Fixes #10489

Killing the CDC log table on CDC disable is unhelpful in many ways,
partly because it can cause random exceptions on nodes trying to
do a CDC-enabled write at the same time as log table is dropped,
but also because it makes it impossible to collect data generated
before CDC was turned off, but which is not yet consumed.

Since data should be TTL:ed anyway, retaining the table should not
really add any overhead beyond the compaction to eventually clear
it. And if the user did set TTL=0 (disabled), then he is already responsible
for clearing out the data.

This also has the nice feature of meshing with the alternator streams
semantics.

Closes #10601
2022-05-31 19:07:07 +03:00
Avi Kivity
fcb8d040e8 treewide: use Software Package Data Exchange (SPDX) license identifiers
Instead of lengthy blurbs, switch to single-line, machine-readable
standardized (https://spdx.dev) license identifiers. The Linux kernel
switched long ago, so there is strong precedent.

Three cases are handled: AGPL-only, Apache-only, and dual licensed.
For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0),
reasoning that our changes are extensive enough to apply our license.

The changes were applied mechanically with a script, except to
licenses/README.md.

Closes #9937
2022-01-18 12:15:18 +01:00
Avi Kivity
2d25705db0 cql3: deinline non-trivial methods in selection.hh
This allows us to forward-declare raw_selector, which in turn reduces
indirect inclusions of expression.hh from 147 to 58, reducing rebuilds
when anything in that area changes.

Includes that were lost due to the change are restored in individual
translation units.

Closes #9434
2021-10-05 12:58:55 +02:00
Benny Halevy
4439e5c132 everywhere: cleanup defer.hh includes
Get rid of unused includes of seastar/util/{defer,closeable}.hh
and add a few that are missing from source files.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-08-22 21:11:39 +03:00
Pavel Solodovnikov
76bea23174 treewide: reduce header interdependencies
Use forward declarations wherever possible.

Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>

Closes #8813
2021-06-07 15:58:35 +03:00
Avi Kivity
a55b434a2b treewide: extend copyright statements to present day 2021-06-06 19:18:49 +03:00
Kamil Braun
c948573398 sys_dist_ks: don't create old CDC generations table on service initialization
The old table won't be created in clusters that are bootstrapped after
this commit. It will stay in clusters that were upgraded from a version
before this commit.

Note that a fully upgraded cluster doesn't automatically create a new
generation in the new format. Even if the last generation was created
before the upgrade, the cluster will keep using it.
A new generation will be created in the new format when either:
1. a new node bootstraps (in the new version),
2. or the user runs checkAndRepairCdcStreams, which has a new check: if
   the current generation uses the old format, the command will decide
   that repair is needed, even if the generation is completely fine
   otherwise (also in the new version).

During upgrade, while the CDC_GENERATIONS_V2 feature is still not
enabled, the user may still bootstrap a node in the old version of
Scylla or run checkAndRepairCdcStreams on a not-yet-upgraded node. In
that case a new generation will be created in the old format,
using the old table definitions.
2021-05-25 16:07:23 +02:00
Kamil Braun
f25e77c202 test: cdc: include new generations table in permissions test 2021-05-25 16:07:23 +02:00
Piotr Grabowski
778fbb144f cdc: tests: check cdc$deleted_ columns in images
Add a test that checks whether the cdc$deleted_ columns are properly
filled in the pre/post-image rows.

This test checks tables with only atomic columns, tables with frozen
collections and non-frozen collections. The test is performed with
both 'true' pre-image mode and 'full' pre-image mode.
2021-05-04 12:33:15 +02:00
Kamil Braun
67d4e5576d sys_dist_ks: split CDC streams table partitions into clustered rows
Until now, the lists of streams in the `cdc_streams_descriptions` table
for a given generation were stored in a single collection. This solution
has multiple problems when dealing with large clusters (which produce
large lists of streams):
1. large allocations
2. reactor stalls
3. mutations too large to even fit in commitlog segments

This commit changes the schema of the table as described in issue #7993.
The streams are grouped according to token ranges, each token range
being represented by a separate clustering row. Rows are inserted in
reasonably large batches for efficiency.

The table is renamed to enable easy upgrade. On upgrade, the latest CDC
generation's list of streams will be (re-)inserted into the new table.

Yet another table is added: one that contains only the generation
timestamps clustered in a single partition. This makes it easy for CDC
clients to learn about new generations. It also enables an elegant
two-phase insertion procedure of the generation description: first we
insert the streams; only after ensuring that a quorum of replicas
contains them, we insert the timestamp. Thus, if any client observes a
timestamp in the timestamps table (even using a ONE query),
it means that a quorum of replicas must contain the list of streams.
2021-02-18 11:44:59 +01:00
Kamil Braun
2da723b9c8 cdc: produce postimage when inserting with no regular columns
When a row was inserted into a table with no regular columns, and no
such row existed in the first place, postimage would not be produced.
Fix this.

Fixes #7716.

Closes #7723
2020-12-01 18:01:23 +02:00
Piotr Jastrzebski
debd10cc55 cdc: Remove trailing whitespaces from cdc_tests
The change was performed automatically using vim and
:%s/\s\+$//e

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-11-19 16:25:22 +01:00
Piotr Jastrzebski
6bdbfbafb7 cdc: Remove mk_cdc_test_config from tests
Now that CDC is GA and enabled by default, there's no longer a need
for a specific config in CDC tests.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-11-19 16:21:32 +01:00
Piotr Jastrzebski
e9072542c1 Mark CDC as GA
Enable CDC by default.
Rename CDC experimental feature to UNUSED_CDC to keep accepting cdc
flag.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-11-12 12:36:13 +01:00
Calle Wilund
46ea8c9b8b cdc: Add an "end-of-record" column to
Fixes #7435

Adds an "eor" (end-of-record) column to cdc log. This is non-null only on
last-in-timestamp group rows, i.e. end of a singular source "event".

A client can use this as a shortcut to knowing whether or not he has a
full cdc "record" for a given source mutation (single row change).
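
A client-side check along these lines might be sketched as follows. The
`log_row` struct and `complete_records()` are hypothetical and reduce a
log row to just a timestamp and the "eor" value; they are not part of
any Scylla client API.

```cpp
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical, simplified view of a CDC log row.
struct log_row {
    std::int64_t ts;          // timestamp of the source mutation
    std::optional<bool> eor;  // non-null only on the last row of a record
};

// Splits rows (assumed to arrive in log order) into complete records.
// A trailing group that has not yet seen its "eor" row is held back.
std::vector<std::vector<log_row>> complete_records(const std::vector<log_row>& rows) {
    std::vector<std::vector<log_row>> records;
    std::vector<log_row> current;
    for (const log_row& r : rows) {
        current.push_back(r);
        if (r.eor.has_value()) {
            records.push_back(std::move(current));
            current.clear();  // reset the moved-from vector before reuse
        }
    }
    return records;
}
```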

Closes #7436
2020-10-26 09:39:27 +02:00
Kamil Braun
ff78a3c332 cdc: rename CDC description tables... again
Commit a6ad70d3da changed the format of
stream IDs: the lower 8 bytes were previously generated randomly, now
some of them have semantics. In particular, the least significant byte
contains a version (stream IDs might evolve with further releases).

This is a backward-incompatible change: the code won't properly handle
stream IDs with all lower 8 bytes generated randomly. To protect us from
subtle bugs, the code has an assertion that checks the stream ID's
version.

This means that if an experimental user used CDC before the change and
then upgraded, they might hit the assertion when a node attempts to
retrieve a CDC generation with old stream IDs from the CDC description
tables and then decode it.
In effect, the user won't even be able to start a node.

Similarly as with the case described in
d89b7a0548, the simplest fix is to rename
the tables. This fix must get merged in before CDC goes out of
experimental.

Now, if the user upgrades their cluster from a pre-rename version, the
node will simply complain that it can't obtain the CDC generation
instead of preventing the cluster from working. The user will be able to
use CDC after running checkAndRepairCDCStreams.

Since a new table is added to the system_distributed keyspace, the
cluster's schema has changed, so sstables and digests need to be
regenerated for schema_digest_test.
2020-08-31 11:33:14 +03:00
Calle Wilund
e50911e5b0 cdc: Do not generate pre/post image for non-existent rows
Fixes #7119
Fixes #7120

If preimage select came up empty - i.e. the row did not exist, either
because it was never created or because it was deleted - we should not
bother creating a log preimage row for it. Esp. since it makes it harder
to interpret the cdc log.

If an operation in a cdc batch did a row delete (ranged, ck, etc), do
not generate postimage data, since the row no longer exists.
Note that we differentiate deleting all (non-pk/ck) columns from actual
row delete.
2020-08-26 18:14:09 +00:00
Piotr Jastrzebski
c001374636 codebase wide: replace count with contains
C++20 introduced `contains` member functions for maps and sets for
checking whether an element is present in the collection. Previously
`count` function was often used in various ways.

`contains` not only expresses the intent of the code better but also
does it in a more unified way.

This commit replaces all the occurrences of `count` with
`contains`.

Tests: unit(dev)

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <b4ef3b4bc24f49abe04a2aba0ddd946009c9fcb2.1597314640.git.piotr@scylladb.com>
2020-08-15 20:26:02 +03:00
Calle Wilund
8cc5076033 cdc_test: Do small test of "full"
Not a huge test change, but at least verifies it works.
2020-08-12 16:04:52 +00:00
Piotr Jastrzebski
52ec0c683e codebase wide: replace erase + remove_if with erase_if
C++20 introduced std::erase_if which simplifies removal of elements
from the collection. Previously the code pattern looked like:

<collection>.erase(
        std::remove_if(<collection>.begin(), <collection>.end(), <predicate>),
        <collection>.end());

In C++20 the same can be expressed with:

std::erase_if(<collection>, <predicate>);

This commit replaces all the occurrences of the old pattern with the new
approach.

Tests: unit(dev)

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <6ffcace5cce79793ca6bd65c61dc86e6297233fd.1597064990.git.piotr@scylladb.com>
2020-08-10 18:17:38 +03:00
Piotr Dulikowski
246f8da6f6 cdc: implement pre/postimage persistence
Moves responsibility for generating pre/postimage rows from the
"process_change" method to "produce_preimage" and "produce_postimage".
This commit actually affects the contents of generated CDC log
mutations.

Added a unit test that verifies more complicated cases with CQL BATCH.
2020-07-08 15:36:41 +02:00
Piotr Dulikowski
f907cab156 cdc: remove redundant schema arguments from cdc functions
A `mutation` object already has a reference to its schema. It does not
make sense to call functions changed in this commit with a different
schema.
2020-07-08 15:36:40 +02:00
Piotr Dulikowski
027d20c654 cdc: always include preimage for affected rows
This changes the current algorithm so that the preimage row will not be
skipped if the corresponding row was not present in preimage query
results.
2020-07-08 15:36:40 +02:00
Kamil Braun
a1e235b1a4 CDC: Don't split collection tombstone away from base update
Overwriting a collection cell using timestamp T is a process with
following steps:
1. inserting a row marker (if applicable) with timestamp T;
2. writing a collection tombstone with timestamp T-1;
3. writing the new collection value with timestamp T.
Since CDC does clustering of the operations by timestamp, this
would result in 3 separate calls to `transform` (in case of
INSERT, or 2 - in the case of UPDATE), which seems excessive,
especially when pre-/postimage is enabled. This patch makes
collection tombstones be treated as if they had the same TS as
the base write, so they are processed in one call to `transform`
(as long as TTLs are not used).
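
The grouping idea can be sketched roughly as below. This is a simplified
model, not the actual splitting code: `op_kind`, `op` and
`group_by_effective_ts()` are hypothetical names, and TTLs are ignored
entirely.

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical, simplified model of the pieces of one base write.
enum class op_kind { row_marker, collection_tombstone, collection_value };

struct op {
    op_kind kind;
    std::int64_t ts;  // write timestamp
};

// Groups operations by timestamp, treating a collection tombstone written
// at T-1 as if it had timestamp T, so it lands in the same group as the
// base write it belongs to.
std::map<std::int64_t, std::vector<op>> group_by_effective_ts(const std::vector<op>& ops) {
    std::map<std::int64_t, std::vector<op>> groups;
    for (const op& o : ops) {
        const std::int64_t key =
            (o.kind == op_kind::collection_tombstone) ? o.ts + 1 : o.ts;
        groups[key].push_back(o);
    }
    return groups;
}
```

With an INSERT at timestamp 10, the row marker (10), the tombstone (9)
and the new value (10) all end up in the single group keyed by 10.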

Also, `cdc_test` had to be updated in places that relied on former
splitting strategy.

Fixes #6084
2020-06-07 17:09:05 +03:00
Kamil Braun
d89b7a0548 cdc: rename CDC description tables
Commit 968177da04 has changed the schema
of cdc_topology_description and cdc_description tables in the
system_distributed keyspace.

Unfortunately this was a backwards-incompatible change: these tables
would always be created, irrespective of whether or not "experimental"
was enabled. They just wouldn't be populated with experimental=off.

If the user now tries to upgrade Scylla from a version before this change
to a version after this change, it will work as long as CDC is protected
by the experimental flag and the flag is off.

However, if we drop the flag, or if the user turns experimental on,
weird things will happen, such as nodes refusing to start because they
try to populate cdc_topology_description while assuming a different schema
for this table.

The simplest fix for this problem is to rename the tables. This fix must
get merged in before CDC goes out of experimental.
If the user upgrades his cluster from a pre-rename version, he will simply
have two garbage tables that he is free to delete after upgrading.

sstables and digests need to be regenerated for schema_digest_test since
this commit effectively adds new tables to the system_distributed keyspace.
This doesn't result in schema disagreement because the table is
announced to all nodes through the migration manager.
2020-06-05 09:59:16 +02:00
Kamil Braun
7a98db2ab3 cdc: set ttl column in log rows which update only collections 2020-05-27 08:40:05 +03:00
Piotr Dulikowski
ff80b7c3e2 cdc: do not change frozen list type in cdc log table
For a column of type `frozen<list<T>>` in base table, a corresponding
column of type `frozen<map<timeuuid, T>>` is created in cdc log.

Although a similar change of type takes place in case of non-frozen
lists, this is unneeded in case of frozen lists - frozen collections are
atomic, therefore there is no need for complicated type that will be
able to represent a column update that depends on its previous value
(e.g. appending elements to the end of the list).

Moreover, only cdc log table creation logic performs this type change
for frozen lists. The logic of `transformer::transform`, which is
responsible for creation of mutations to cdc log, assumes that atomic
columns will have their types unchanged in cdc log table. It simply
copies new value of the column from original mutation to the cdc log
mutation. A serialized frozen list might be copied to a field that is of
frozen map type, which may cause the field to become impossible to
deserialize.

This patch causes frozen list base table columns to have a corresponding
column in cdc log with the same type.

A test is added which asserts that the type of cdc log columns is not
changed in the case of frozen base columns.

Tests: unit(dev)
Fixes #6172
2020-04-14 09:44:22 +02:00
Calle Wilund
65a6ebbd73 cdc: Postimage must check iff we have (pre-)image row data for non-touched columns
Fixes #6143

When doing post-image generation, we also write values for columns not
in delta (actual update), based on data selected in pre-image row.

However, if we are doing initial update/insert with only a subset of
columns, when the pre-image result set is nil, this cannot be done.

Adds check to non-touched column post-image code. Also uses the
pre-image value extractor to handle non-atomic sets properly.

Tests updated.
2020-04-08 13:48:54 +02:00
Calle Wilund
532a8634c6 cdc::log: Only generate pre/post-image when enabled
Fixes #6073

The logic with pre/post image was tangled into looking at "rs"
and would cause pre-image info to be stored even if only post-image
data was enabled.

Now only generate keys (and rows for them) iff explicitly enabled.
And only generate pre-image key iff we have pre-image data.
2020-03-24 15:32:30 +00:00
Calle Wilund
881ebe192b cdc::log: Handle non-atomic column assignments broken into two
Fixes #6070

When mutation splitting was added, non-atomic column assignments were broken
into two invocations of transform. This means the second (actual data assignment)
does not know about the tombstone in first one -> postimage is created as if
we were _adding_ to the collection, not replacing it.

While not pretty, we can handle this knowing that we always get
invoked in timestamp order -> tombstone first, then assign.
So we simply keep track of non-atomic columns deleted across calls
and filter out preimage data post this.

Added test cases for all non-atomics
2020-03-24 14:07:13 +00:00
Piotr Dulikowski
338e473946 cdc: fix non-atomic updates in splitting
This patch fixes a bug in mutation splitting logic of CDC. In the part
that handles updates of non-atomic clustering columns, the column
definition was fetched from a static column of the same id instead of
the actual definition of the clustering column. It could cause the value
to be written to a wrong column.

Tests: unit(dev)
2020-03-23 13:47:23 +01:00
Piotr Dulikowski
6c5c745e25 cdc: add cdc log schema test 2020-03-21 07:33:35 +01:00
Calle Wilund
0a3383c090 cdc: Add postimage implementation
Fixes #4992

Implements post-image support by synthesizing it from
pre-image + delta.

Post-image data differs from the delta data in two ways:

1.) It merges non-atomics into an actual result value
2.) It contains _all_ columns of the row, not just
    those affected by the update.

For a non-atomic field, the post-image value of a column
is either the pre-image or the delta (maybe null)
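
A rough sketch of the synthesis for atomic columns only, under the
assumption that a row can be modelled as a map from column name to an
optional value. `row` and `make_postimage()` are hypothetical, and the
merging of non-atomic values is left out.

```cpp
#include <map>
#include <optional>
#include <string>

// Hypothetical row model: column name -> value, nullopt meaning null.
using row = std::map<std::string, std::optional<std::string>>;

// Starts from the pre-image so that untouched columns are carried over,
// then overrides every column present in the delta (possibly with null).
row make_postimage(const row& preimage, const row& delta) {
    row postimage = preimage;
    for (const auto& [column, value] : delta) {
        postimage[column] = value;
    }
    return postimage;
}
```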

Tested by adding post-image checks to pre-image test
and collection/udt tests
2020-03-16 09:21:06 +00:00
Juliusz Stasiewicz
49f1a24472 tests/cdc: test preimage on row delete
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-03-08 13:27:49 +01:00
Piotr Dulikowski
38b7f1ad45 unit tests: register cdc extension before tests
In the following commits, using cdc in tests will require registering
cdc extension explicitly in db config.
2020-03-05 16:11:20 +01:00