system_keyspace: make CDC_GENERATIONS_V3 single-partition

We make CDC_GENERATIONS_V3 single-partition by adding the key
column and changing the clustering key from range_end to
(id, range_end). This is the first step to enabling the efficient
clearing of obsolete CDC generation data, which we need to prevent
Raft-topology snapshots from endlessly growing as we introduce new
generations over time. The next step is to change the type of the id
column to timeuuid. We do it in the following commits.

After making CDC_GENERATIONS_V3 single-partition, there is no easy
way of preserving the num_ranges column. As it is used only for
sanity checking, we remove it to simplify the implementation.
Patryk Jędrzejczak
2023-09-06 09:29:59 +02:00
parent 29f54836d0
commit 2cd430ac80
6 changed files with 45 additions and 34 deletions


@@ -165,16 +165,16 @@ When a node requests the cluster to join, the topology coordinator chooses token
 The generation data described by `cdc::topology_description` is then translated into mutations and committed to group 0 using Raft commands. When a node applies these commands (every node in the cluster eventually does that, being a member of group 0), it writes the data into a local table `system.cdc_generations_v3`. The table has the following schema:
 ```
 CREATE TABLE system.cdc_generations_v3 (
+    key text,
     id uuid,
     range_end bigint,
     ignore_msb tinyint,
-    num_ranges int static,
     streams frozen<set<blob>>,
-    PRIMARY KEY (id, range_end)
+    PRIMARY KEY (key, id, range_end)
 ) ...
 ```
-The table's partition key is the `id uuid` column. The UUID used to insert a new generation into this table is randomly generated by the coordinator.
+The table is single-partition where `key` always equals "cdc_generations". The UUID used to insert a new generation into this table is randomly generated by the coordinator.
 The committed commands also update the `system.topology` table, storing the UUID in the `new_cdc_generation_data_uuid` column in the row which describes the joining node. Thanks to this, if the coordinator manages to insert the data but then fails, the next coordinator can resume from where the previous coordinator left off - using `new_cdc_generation_data_uuid` to continue with the generation switch.
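
To illustrate why the single-partition layout helps with clearing obsolete generations, here is a minimal Python sketch of one partition whose rows are sorted by the clustering key `(id, range_end)`. It is a toy model, not Scylla code: the class `SinglePartitionModel` and its methods are hypothetical, and the ids are plain integers standing in for time-ordered ids. That ordering is an assumption that only holds once `id` becomes a `timeuuid`, which the commit message defers to later commits.

```python
import bisect


class SinglePartitionModel:
    """Toy model of one partition sorted by clustering key (id, range_end).

    Hypothetical helper for illustration only; it does not mirror any
    Scylla class.
    """

    def __init__(self):
        self._keys = []     # sorted clustering keys: (id, range_end)
        self._streams = {}  # clustering key -> set of stream ids

    def insert(self, gen_id, range_end, streams):
        ck = (gen_id, range_end)
        if ck not in self._streams:
            bisect.insort(self._keys, ck)
        self._streams[ck] = streams

    def read_generation(self, gen_id):
        # Rows of one generation are contiguous, so a single slice reads them.
        lo = bisect.bisect_left(self._keys, (gen_id,))
        rows = []
        for ck in self._keys[lo:]:
            if ck[0] != gen_id:
                break
            rows.append((ck[1], self._streams[ck]))
        return rows

    def delete_ids_below(self, bound_id):
        # Assumes ids sort by creation time (not true for random UUIDs).
        # Everything older than bound_id is one contiguous prefix.
        hi = bisect.bisect_left(self._keys, (bound_id,))
        for ck in self._keys[:hi]:
            del self._streams[ck]
        del self._keys[:hi]
        return hi  # number of rows removed


table = SinglePartitionModel()
table.insert(1, -100, {b"a"})
table.insert(1, 100, {b"b"})
table.insert(2, 0, {b"c"})
table.insert(3, 50, {b"d"})
removed = table.delete_ids_below(3)
```

In this model, dropping every generation older than a given id is a single prefix removal. With `id` as the partition key, the same cleanup would need one deletion per generation and prior knowledge of each id.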