Commit Graph

65 Commits

Author SHA1 Message Date
Michael Litvak
e7dbccd59e cdc: use chunked_vector instead of vector for stream ids
use utils::chunked_vector instead of std::vector to store cdc stream
sets for tablets.

a cdc stream set usually represents all streams for a specific table and
timestamp, and has a stream id per each tablet of the table. each stream
id is represented by 16 bytes. thus the vector could require quite large
contiguous allocations for a table that has many tablets. change it to
chunked_vector to avoid large contiguous allocations.

Fixes scylladb/scylladb#26791

Closes scylladb/scylladb#26792
2025-10-31 13:02:34 +01:00
Michael Litvak
440caeabcb cdc: helpers for garbage collecting old streams for tablets
introduce helper functions that can be used for garbage collecting old
cdc streams for tablets-based keyspaces.

- get_new_base_for_gc: finds a new base timestamp given a TTL, such that
  all older timestamps and streams can be removed.
- get_cdc_stream_gc_mutations: given new base timestamp and streams,
  builds mutations that update the internal cdc tables and remove the
  older streams.
- garbage_collect_cdc_streams_for_table: combines the two functions
  above to find a new base and build mutations to update it for a
  specific table
- garbage_collect_cdc_streams: builds gc mutations for all cdc tables
2025-10-26 11:01:20 +01:00
Michael Litvak
98de3f0e86 topology coordinator: change streams on tablet split/merge
on tablet split/merge finalization, generate a new CDC timestamp and
stream set for the table with a new stream for each tablet in the new
tablet map, in order to maintain synchronization of the CDC streams with
the tablets.

We pick a new timestamp for the streams with a small delay into the
future so that all nodes can learn about the new streams in time, in the
same way it's done for vnodes.

the new timestamp and streams are published by adding a mutation to the
cdc_streams_history table that contains the timestamp and the sets of
closed and opened streams from the current timestamp.
2025-09-17 14:47:12 +02:00
Michael Litvak
9ec4b6ccb1 cdc: load tablet streams metadata from tables
Read the CDC stream metadata from the internal system tables, and store
it in the cdc metadata data structures.

The metadata is stored in the tables as diffs which is more storage
efficient, but when in-memory we store it as full stream sets for each
timestamp. This is more useful because we need to be able to find a
stream given timestamp and token.
2025-09-17 14:47:12 +02:00
Michael Litvak
aa61f074b5 cdc: helper functions for reading metadata from tables
Define functions in system_keyspace that read from the internal cdc
tables and construct the data into internal cdc data structures.
2025-09-17 14:47:12 +02:00
Michael Litvak
9ef0862155 cdc: remove streams when dropping CDC table
When dropping a CDC log table in a tablets-enabled keyspace, remove all
metadata about the table's CDC streams from the internal CDC tables,
since the streams can't be read anymore.

Similarly, when dropping a tablets-enabled keyspace, remove metadata of
all streams belonging to tables in the keyspace.
2025-09-17 14:47:12 +02:00
Michael Litvak
ed25e420f8 cdc: create streams when allocating tablets
When allocating tablets for a CDC table, create the initial CDC stream
set. We create one stream per each tablet, each stream covering the
corresponding token range.
2025-09-17 14:47:12 +02:00
Dawid Mędrek
90a2e0d1cc cdc/generation: Delete copy constructors of topology_description
The object might be quite big and lead to reactor stalls. Using its
copy constructor is asking for trouble, so let's explicitly delete it.
2025-08-29 13:54:16 +02:00
Dawid Mędrek
508e00319b cdc/generation: Clone topology_description asynchronously
An instance of `cdc::topology_description` can be quite big. The vector
it consists of stores as many `token_range_description`s as there are
vnodes, and the size of each `token_range_description` is O(#shards).

Because of that, copying an instance of the type can lead to reactor
stalls. To prevent that, we introduce an asynchronous function copying
the contents on the object.

Reactor stalls were detected in the call to `map_reduce` in
`generation_service::legacy_do_handle_cdc_generation`, so let's start
using the new function there.

A similar scenario occurs in `generation_service::handle_cdc_generation`,
so we modify it too.

Unfortunately, it doesn't seem viable to provide a reproducer of said
problem.

Fixes scylladb/scylladb#24522
2025-08-29 13:54:00 +02:00
Avi Kivity
f3eade2f62 treewide: relicense to ScyllaDB-Source-Available-1.0
Drop the AGPL license in favor of a source-available license.
See the blog post [1] for details.

[1] https://www.scylladb.com/2024/12/18/why-were-moving-to-a-source-available-license/
2024-12-18 17:45:13 +02:00
Kefu Chai
48c8d24345 treewide: drop support for fmt < v10
since fedora 38 is EOL. and fedora 39 comes with fmt v10.0.0, also,
we've switched to the build image based on fedora 40, which ships
fmt-devel v10.2.1, there is no need to support fmt < 10.

in this change, we drop the support fmt < 10.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#21847
2024-12-09 20:42:38 +02:00
Kefu Chai
3e84d43f93 treewide: use seastar::format() or fmt::format() explicitly
before this change, we rely on `using namespace seastar` to use
`seastar::format()` without qualifying the `format()` with its
namespace. this works fine until we changed the parameter type
of format string `seastar::format()` from `const char*` to
`fmt::format_string<...>`. this change practically invited
`seastar::format()` to the club of `std::format()` and `fmt::format()`,
where all members accept a templated parameter as its `fmt`
parameter. and `seastar::format()` is not the best candidate anymore.
despite that argument-dependent lookup (ADT for short) favors the
function which is in the same namespace as its parameter, but
`using namespace` makes `seastar::format()` more competitive,
so both `std::format()` and `seastar::format()` are considered
as the condidates.

that is what is happening scylladb in quite a few caller sites of
`format()`, hence ADT is not able to tell which function the winner
in the name lookup:

```
/__w/scylladb/scylladb/mutation/mutation_fragment_stream_validator.cc:265:12: error: call to 'format' is ambiguous
  265 |     return format("{} ({}.{} {})", _name_view, s.ks_name(), s.cf_name(), s.id());
      |            ^~~~~~
/usr/bin/../lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/format:4290:5: note: candidate function [with _Args = <const std::basic_string_view<char> &, const seastar::basic_sstring<char, unsigned int, 15> &, const seastar::basic_sstring<char, unsigned int, 15> &, const utils::tagged_uuid<table_id_tag> &>]
 4290 |     format(format_string<_Args...> __fmt, _Args&&... __args)
      |     ^
/__w/scylladb/scylladb/seastar/include/seastar/core/print.hh:143:1: note: candidate function [with A = <const std::basic_string_view<char> &, const seastar::basic_sstring<char, unsigned int, 15> &, const seastar::basic_sstring<char, unsigned int, 15> &, const utils::tagged_uuid<table_id_tag> &>]
  143 | format(fmt::format_string<A...> fmt, A&&... a) {
      | ^
```

in this change, we

change all `format()` to either `fmt::format()` or `seastar::format()`
with following rules:
- if the caller expects an `sstring` or `std::string_view`, change to
  `seastar::format()`
- if the caller expects an `std::string`, change to `fmt::format()`.
  because, `sstring::operator std::basic_string` would incur a deep
  copy.

we will need another change to enable scylladb to compile with the
latest seastar. namely, to pass the format string as a templated
parameter down to helper functions which format their parameters.
to miminize the scope of this change, let's include that change when
bumping up the seastar submodule. as that change will depend on
the seastar change.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-09-11 23:21:40 +03:00
Kefu Chai
168ade72f8 treewide: replace formatter<std::string_view> with formatter<string_view>
in in {fmt} before v10, it provides the specialization of `fmt::formatter<..>`
for `std::string_view` as well as the specialization of `fmt::formatter<..>`
for `fmt::string_view` which is an implementation builtin in {fmt} for
compatibility of pre-C++17. and this type is used even if the code is
compiled with C++ stadandard greater or equal to C++17. also, before v10,
the `fmt::formatter<std::string_view>::format()` is defined so it accepts
`std::string_view`. after v10, `fmt::formatter<std::string_view>` still
exists, but it is now defined using `format_as()` machinery, so it's
`format()` method does not actually accept `std::string_view`, it
accepts `fmt::string_view`, as the former can be converted to
`fmt::string_view`.

this is why we can inherit from `fmt::formatter<std::string_view>` and
use `formatter<std::string_view>::format(foo, ctx);` to implement the
`format()` method with {fmt} v9, but we cannot do this with {fmt} v10,
and we would have following compilation failure:

```
FAILED: service/CMakeFiles/service.dir/RelWithDebInfo/topology_state_machine.cc.o
/home/kefu/.local/bin/clang++ -DFMT_DEPRECATED_OSTREAM -DFMT_SHARED -DSCYLLA_BUILD_MODE=release -DSEASTAR_API_LEVEL=7 -DSEASTAR_LOGGER_COMPILE_TIME_FMT -DSEASTAR_LOGGER_TYPE_STDOUT -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_SSTRING -DXXH_PRIVATE_API -DCMAKE_INTDIR=\"RelWithDebInfo\" -I/home/kefu/dev/scylladb -I/home/kefu/dev/scylladb/build/gen -I/home/kefu/dev/scylladb/seastar/include -I/home/kefu/dev/scylladb/build/seastar/gen/include -I/home/kefu/dev/scylladb/build/seastar/gen/src -ffunction-sections -fdata-sections -O3 -g -gz -std=gnu++20 -fvisibility=hidden -Wall -Werror -Wextra -Wno-error=deprecated-declarations -Wimplicit-fallthrough -Wno-c++11-narrowing -Wno-deprecated-copy -Wno-mismatched-tags -Wno-missing-field-initializers -Wno-overloaded-virtual -Wno-unsupported-friend -Wno-enum-constexpr-conversion -Wno-unused-parameter -ffile-prefix-map=/home/kefu/dev/scylladb=. -march=westmere -mllvm -inline-threshold=2500 -fno-slp-vectorize -U_FORTIFY_SOURCE -Werror=unused-result -MD -MT service/CMakeFiles/service.dir/RelWithDebInfo/topology_state_machine.cc.o -MF service/CMakeFiles/service.dir/RelWithDebInfo/topology_state_machine.cc.o.d -o service/CMakeFiles/service.dir/RelWithDebInfo/topology_state_machine.cc.o -c /home/kefu/dev/scylladb/service/topology_state_machine.cc
/home/kefu/dev/scylladb/service/topology_state_machine.cc:254:41: error: no matching member function for call to 'format'
  254 |     return formatter<std::string_view>::format(it->second, ctx);
      |            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~
/usr/include/fmt/core.h:2759:22: note: candidate function template not viable: no known conversion from 'seastar::basic_sstring<char, unsigned int, 15>' to 'const fmt::basic_string_view<char>' for 1st argument
 2759 |   FMT_CONSTEXPR auto format(const T& val, FormatContext& ctx) const
      |                      ^      ~~~~~~~~~~~~
```

because the inherited `format()` method actually comes from
`fmt::formatter<fmt::string_view>`. to reduce the confusion, in this
change, we just inherit from `fmt::format<string_view>`, where
`string_view` is actually `fmt::string_view`. this follows
the document at
https://fmt.dev/latest/api.html#formatting-user-defined-types,
and since there is less indirection under the hood -- we do not
use the specialization created by `FMT_FORMAT_AS` which inherit
from `formatter<fmt::string_view>`, hopefully this can improve
the compilation speed a little bit. also, this change addresses
the build failure with {fmt} v10.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18299
2024-04-19 07:44:07 +03:00
Avi Kivity
f0ca5e5a08 Merge 'treewide: add fmt::formatter for exception types' from Kefu Chai
before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, `fmt::formatter` is added for following types for backward compatibility with {fmt} < 10:

* `utils::bad_exception_container_access`
* `cdc::no_generation_data_exception`
* classes derived from `sstables::malformed_sstable_exception`
* classes derived from `cassandra_exception`

Refs https://github.com/scylladb/scylladb/issues/13245

Closes scylladb/scylladb#17944

* github.com:scylladb/scylladb:
  cdc: add fmt::formatter for exception types in data_dictionary.hh
  utils: add fmt::formatter for utils::bad_exception_container_access
  sstables: add fmt::formatter for classes derived from sstables::malformed_sstable_exception
  exceptions: add fmt::formatter for classes derived from cassandra_exception
  cdc: add fmt::formatter for cdc::no_generation_data_exception
2024-03-21 18:44:37 +02:00
Kefu Chai
f5e1f0ccc7 cdc: add fmt::formatter for cdc::no_generation_data_exception
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, `fmt::formatter<cdc::no_generation_data_exception>` is
added for backward compatibility with {fmt} < 10.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-21 12:48:19 +08:00
Benny Halevy
fceb1183d3 cdc: should_propose_first_generation: get my_host_id from caller
There is no need to map this node's inet_address to host_id.
The storage_service can easily just pass the local host_id.
While at it, get the other node's host_id directly
from their endpoint_state instead of looking it up
yet again in the gossiper, using the nodes' address.

Refs #12283

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-03-20 12:53:49 +02:00
Patryk Jędrzejczak
0470b721c2 cdc: generation: allow increasing generation_leeway through error injection
The increased `generation_leeway` is used in the next patch to
write a test. Since it's no longer a constant, we create a new
getter for it.
2024-02-12 10:14:00 +01:00
Michael Huang
62a8a31be7 cdc: use chunked_vector for topology_description entries
Lists can grow very big. Let's use a chunked vector to prevent large contiguous
allocations.
Fixes: #15302.

Closes scylladb/scylladb#15428
2023-09-18 23:17:01 +03:00
Patryk Jędrzejczak
fab066cffe cdc: generation: remove topology_description_generator
After moving the creation of uuid out of
make_new_generation_description, this function only calls the
topology_description_generator's constructor and its generate
method. We could remove this function, but we instead simplify
the code by removing the topology_description_generator class.
We can do this refactor because make_new_generation_description
is the only place using it. We inline its generate method into
make_new_generation_description and turn its private methods into
static functions.
2023-09-12 11:18:54 +02:00
Patryk Jędrzejczak
3bf4cac72e cdc: do not create uuid in make_new_generation_data
In the future commit, we change how we initialize uuid of the
new CDC generation in the Raft-based topology. It forces us to
move this initialization out of the make_new_generation_data
function shared between Raft-based and gossiper-based topologies.

We also rename make_new_generation_data to
make_new_generation_description since it only returns
cdc::topology_description now.
2023-09-12 11:18:38 +02:00
Patryk Jędrzejczak
2cd430ac80 system_kayspace: make CDC_GENERATIONS_V3 single-partition
We make CDC_GENERATIONS_V3 single-partition by adding the key
column and changing the clustering key from range_end to
(id, range_end). This is the first step to enabling the efficient
clearing of obsolete CDC generation data, which we need to prevent
Raft-topology snapshots from endlessly growing as we introduce new
generations over time. The next step is to change the type of the id
column to timeuuid. We do it in the following commits.

After making CDC_GENERATIONS_V3 single-partition, there is no easy
way of preserving the num_ranges column. As it is used only for
sanity checking, we remove it to simplify the implementation.
2023-09-12 09:51:45 +02:00
Patryk Jędrzejczak
ed1c1369d9 cdc: generation: rename get_cdc_generation_mutations
In the following commits, we modify the CDC_GENERATIONS_V3 schema
to enable efficient clearing of obsolete CDC generation data.
These modifications make the current get_cdc_generation_mutations
work only for the CDC_GENERATIONS_V2 schema, and we need a new
function for CDC_GENERATIONS_V3, so we add the "_v2" suffix.
2023-09-11 12:30:21 +02:00
Patryk Jędrzejczak
b05b4a352a cdc: implement the is_cdc_generation_optimal function
In the following commits, we add the CDC generation optimality
check to storage_service::raft_check_and_repair_cdc_streams so
that it doesn't create new CDC generations when unnecessary. Since
generation_service::check_and_repair_cdc_streams already has
this check, we extract it to the new is_cdc_generation_optimal
function to not duplicate the code.
2023-07-28 11:04:17 +02:00
Kamil Braun
3abe0f0ad6 cdc: generation: extract pure parts of make_new_generation outside
`cdc::generation_service::make_new_cdc_generation` would create a new
CDC generation and insert it into the `CDC_GENERATIONS_V2` table these
days. For Raft-based topology chnages we'll do the data insertion
somewhere else - in topology coordinator code. So extract the parts for
calculating the CDC generation to free-standing functions (these are
almost pure calculations, modulo accessing RNG).
2023-04-20 15:38:59 +02:00
Kamil Braun
2233d8f54d db: system_keyspace: add storage for CDC generations managed by group 0
The `CDC_GENERATIONS_V3` table schema is a copy-paste of the
`CDC_GENERATIONS_V2` schema. The difference is that V2 lives in
`system_distributed_keyspace` and writes to it are distributed using
regular `storage_proxy` replication mechanisms based on the token ring.
The V3 table lives in `system_keyspace` and any mutations written to it
will go through group 0.

Also extend the `TOPOLOGY` schema with new columns:
- `new_cdc_generation_data_uuid` will be stored as part of a bootstrapping
  node's `ring_slice`, it stores UUID of a newly introduced CDC
  generation which is used as partition key for the `CDC_GENERATIONS_V3`
  table to access this new generation's data. It's a regular column,
  meaning that every row (corresponding to a node) will have its own.
- `current_cdc_generation_uuid` and `current_cdc_generation_timestamp`
  together form the ID of the newest CDC generation in the cluster.
  (the uuid is the data key for `CDC_GENERATIONS_V3`, the timestamp is
  when the CDC generation starts operating). Those are static columns
  since there's a single newest CDC generation.
2023-04-20 15:38:58 +02:00
Kamil Braun
1e9cf3badd cdc: generation: get_cdc_generation_mutations: take timestamp as parameter
The function would generate a mutation timestamp for itself, take it as
parameter instead. We'll use timestamps provided by Group 0 APIs when
creating CDC generations during Group 0- based topology changes.
2023-04-20 15:38:37 +02:00
Kamil Braun
3e863d0e58 sys_dist_ks: make get_cdc_generation_mutations public
It was a `static` function inside system_distributed_keyspace. Later it
will be used for another table living in system_keyspace, so move it
outside, to the CDC generations module, and make it accessible from
other places.
2023-04-20 15:38:37 +02:00
Kefu Chai
aed681fa3c cdc: generation: schema_tables: use defaulted operator<=>
the default generated operator<=> is exactly the same as the
handcrafted one. so let compiler do its job. also, since
operator<=> is defaulted, there is no need to define operator==
anymore, so drop it as well.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-02-28 17:25:30 +08:00
Avi Kivity
fcb8d040e8 treewide: use Software Package Data Exchange (SPDX) license identifiers
Instead of lengthy blurbs, switch to single-line, machine-readable
standardized (https://spdx.dev) license identifiers. The Linux kernel
switched long ago, so there is strong precedent.

Three cases are handled: AGPL-only, Apache-only, and dual licensed.
For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0),
reasoning that our changes are extensive enough to apply our license.

The changes we applied mechanically with a script, except to
licenses/README.md.

Closes #9937
2022-01-18 12:15:18 +01:00
Avi Kivity
ae3a360725 database: Move database, keyspace, table classes to replica/ directory
The database, keyspace, and table classes represent the replica-only
part of the objects after which they are named. Reading from a table
doesn't give you the full data, just the replica's view, and it is not
consistent since reconciliation is applied on the coordinator.

As a first step in acknowledging this, move the related files to
a replica/ subdirectory.
2022-01-06 17:07:30 +02:00
Pavel Emelyanov
6fe31d8eac cdc: Move maybe_rewrite_streams_descriptions into after_join
The generation service already has all it needs to do it. This
keeps storage_service smaller and less aware about cdc internals.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-09-30 15:34:03 +03:00
Pavel Emelyanov
7a7a87f24a cdc: Turn make_new_cdc_generation a service method
It has everything needed onboard. Only two arguments are required -- the
booststrap tokens and whether or not to inject a delay.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-09-30 15:34:03 +03:00
Pavel Emelyanov
b867a19da1 cdc: Remove ring-delay arg from make_new_cdc_generation
It already has the db::config from where to get one (and even
this will change soon).

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-09-30 15:34:03 +03:00
Avi Kivity
a55b434a2b treewide: extent copyright statements to present day 2021-06-06 19:18:49 +03:00
Kamil Braun
337a4ef8ad cdc: when creating new generations, use format v2 if possible
A node with this commit, when creating a new CDC generation (during
bootstrap, upgrade, or when running checkAndRepairCdcStreams command)
will check for the CDC_GENERATIONS_V2 feature and:
- If the feature is enabled create the generation in the v2 format
  and insert it into the new internal table. This is safe because
  a node joins the feature only if it understands the new format.
- Otherwise create it in the v1 format, limiting its size as before,
  and insert it into the old table.

The second case should only happen if we perform bootstrap or run
checkAndRepairCdcStreams in the middle of an upgrade procedure. On fully
upgraded clusters the feature shall be enabled, causing all new
generations to use the new format.
2021-05-25 16:07:23 +02:00
Kamil Braun
4658adbe18 tree-wide: introduce cdc::generation_id_v2
This is a new type of CDC generation identifiers. Compared to old IDs,
additionally to the timestamp it contains an UUID.

These new identifiers will allow a safer and more efficient algorithm of
introducing new generations into a cluster (introduced in a later commit).

For now, nodes keep using the old identifier format when creating new
generations and whenever they learn about a new CDC generation from gossip
they assume that it also is stored in the v1 format. But they do know how
to (de)serialize the second format and how to persist new identifiers in
local tables.
2021-05-24 17:50:21 +02:00
Kamil Braun
99fd2244a3 tree-wide: introduce cdc::generation_id type
This is a follow-up to the previous commit.

Each CDC generation has a timestamp which denotes a logical point in time
when this generation starts operating. That same timestamp is
used to identify the CDC generation. We use this identification scheme
to exchange CDC generations around the cluster.

However, the fact that a generation's timestamp is used as an ID for
this generation is an implementation detail of the currently used method
of managing CDC generations.

Places in the code that deal with the timestamp, e.g. functions which
take it as an argument (such as handle_cdc_generation) are often
interested in the ID aspect, not the "when does the generation start
operating" aspect. They don't care that the ID is a `db_clock::time_point`.
They may sometimes want to retrieve the time point given the ID (such as
do_handle_cdc_generation when it calls `cdc::metadata::insert`),
but they don't care about the fact that the time point actually IS the ID.

In the future we may actually change the specific type of the ID if we
modify the generation management algorithms.

This commit is an intermediate step that will ease the transition in the
future. It introduces a new type, `cdc::generation_id`. Inside it contains
the timestamp, so:
1. if a piece of code doesn't care about the timestamp, it just passes
   the ID around
2. if it does care, it can simply access it using the `get_ts` function.
   The fact that `get_ts` simply accesses the ID's only field is an
   implementation detail.

Using the occasion, we change the `do_handle_cdc_generation_intercept...`
function to be a standard function, not a coroutine. It turns out that -
depending on the shape of the passed-in argument - the function would
sometimes miscompile (the compiled code would not copy the argument to the
coroutine frame).
2021-04-07 13:47:13 +02:00
Kamil Braun
e486e0f759 tree-wide: rename "cdc streams timestamp" to "cdc generation id"
Each CDC generation always has a timestamp, but the fact that the
timestamp identifies the generation is an implementation detail.
We abstract away from this detail by using a more generic naming scheme:
a generation "identifier" (whatever that is - a timestamp or something
else).

It's possible that a CDC generation will be identified by more than a
timestamp in the (near) future.

The actual string gossiped by nodes in their application state is left
as "CDC_STREAMS_TIMESTAMP" for backward compatibility.

Some stale comments have been updated.
2021-04-06 13:15:31 +02:00
Kamil Braun
0cb2f58514 cdc: remove some functions from generation.hh
They are not used outside of the generation module.
2021-04-06 13:15:31 +02:00
Kamil Braun
2e2d51cf2b cdc: get rid of "get_local_streams_timestamp"
This function retrieves the persisted timestamp of the last known CDC
generation (which this node is currently gossiping to other nodes).
It checks that the timestamp is present; if not, it throws an error.

The check is unnecessary. It's used only in a quite esoteric place
(start_gossiping, which implements an almost-never-used API call),
and it's fine if the timestamp is gone - in start_gossiping,
we can start gossiping the tokens without the CDC generation timestamp
(well, if the timestamp is not present in system tables, something
weird must have happened, but that doesn't mean we can't resume
gossiping - fixing CDC generation management in such a case is
a separate problem).
2021-04-06 13:15:31 +02:00
Kamil Braun
022d7773f4 cdc: coroutinize make_new_cdc_generation 2021-02-22 12:47:44 +01:00
Kamil Braun
26ca9d6c33 cdc: coroutinize update_streams_description 2021-02-22 12:46:53 +01:00
Kamil Braun
9bdd000e97 cdc: rewrite streams to the new description table
Nodes automatically ensure that the latest CDC generation's list of
streams is present in the streams description table. When a new
generation appears, we only need to update the table for this
generation; old generations are already inserted.

However, we've changed the description table (from
`cdc_streams_descriptions` to `cdc_streams_descriptions_v2`). The
existing mechanism only ensures that the latest generation appears in
the new description table. This commit adds an additional procedure that
rewrites the older generations as well, if we find that it is necessary
to do so (i.e. when some CDC log tables may contain data in these
generations).
2021-02-18 11:44:59 +01:00
Kamil Braun
7c91894ddf cdc: introduce no_generation_data_exception exception type 2021-02-18 11:44:59 +01:00
Kamil Braun
ba920361b3 cdc: use chunked_vector for streams in streams_version
The vector may get quite long (say... 1,6M stream IDs). We prevent a
large allocation by using utils::chunked_vector.
2021-02-18 11:44:59 +01:00
Kamil Braun
9ae4467970 cdc: remove streams_version::expired field
This field was not used anywhere.
2021-02-18 11:44:59 +01:00
Piotr Jastrzebski
649f254863 cdc: Limit size of topology description
Currently, whole topology description for CDC is stored in a single row.
This means that for a large cluster of strong machines (say 100 nodes 64
cpus each), the size of the topology description can reach 32MB.

This causes multiple problems. First of all, there's a hard limit on
mutation size that can be written to Scylla. It's related to commit log
block size which is 16MB by default. Mutations bigger than that can't be
saved. Moreover, such big partitions/rows cause reactor stalls and
negatively influence latency of other requests.

This patch limits the size of topology description to about 4MB. This is
done by reducing the number of CDC streams per vnode and can lead to CDC
data not being fully colocated with Base Table data on shards. It can
impact performance and consistency of data.

This is just a quick fix to make it easily backportable. A full solution
to the problem is under development.

For more details see #7961, #7993 and #7985.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2021-02-17 13:24:40 +01:00
Piotr Jastrzebski
390cef6a96 cdc: Extract create_stream_ids from topology_description_generator
This new function will be used in the following patches in additional
places.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2021-02-10 10:24:06 +01:00
Piotr Jastrzebski
3024795507 cdc: Change for_testing to add_delay in make_new_cdc_generation
The meaning of the parameter changes from defining whether the function
is called in testing environment to deciding whether a delay should be
added to a timestamp of a newly created CDC generation.

This is a preparation for improvement in the following patch that does
not always add delay to every node but only to non-first node.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-11-19 12:19:42 +01:00
Piotr Jastrzebski
2091408478 cdc: Make it possible for CDC generation creation to fail
Following patch enables CDC by default and this means CDC has to work
will all the clusters now.

There is a problematic case when existing cluster with no CDC support
is stopped, all the binaries are updated to newer version with
CDC enabled by default. In such case, nodes know that they are already
members of the cluster but they can't find any CDC generation so they
will try to create one. This creation may fail due to lack of QUORUM
for the write.

Before this patch such situation would lead to node failing to start.
After the change, the node will start but CDC generation will be
missing. This will mean CDC won't be able to work on such cluster before
nodetool checkAndRepairCdcStreams is run to fix the CDC generation.

We still fail to bootstrap if the creation of CDC generation fails.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-11-12 12:29:31 +01:00