Commit Graph

11801 Commits

Author SHA1 Message Date
Nadav Har'El
fc861742d7 cql: avoid undefined behavior in totimestamp() of extreme dates
This patch fixes a UBSAN-reported integer overflow during one of our
existing tests,

   test_native_functions.py::test_mintimeuuid_extreme_from_totimestamp

when attempting to convert an extreme "date" value, millions of years
in the past, into a "timestamp" value. When UBSAN crashing is enabled,
this test crashes before this patch, and succeeds after this patch.

The "date" CQL type is 32-bit count of *days* from the epoch, which can
span 2^31 days (5 million years) before or after the epoch. Meanwhile,
the "timestamp" type measures the number of milliseconds from the same
epoch, in 64 bits. Luckily (or intentionally), every "date", however
extreme, can be converted into a "timestamp": This is because 2^31 days
is 1.85e17 milliseconds, well below timestamp's limit of 2^63 milliseconds
(9.2e18).

But it turns out that our conversion function, date_to_time_point(),
used some boost::gregorian library code, which carried out these
calculations in **microsecond** resolution. The extra conversion to
microseconds wasn't just wasteful, it also caused an integer overflow
in the extreme case: 2^31 days is 1.85e20 microseconds, which does NOT
fit in a 64-bit integer. UBSAN notices this overflow, and complains
(plus, the conversion is incorrect).

The fix is to do the trivial conversion on our own (a day is, by
convention, exactly 86400 seconds - no fancy library is needed),
without the grace of Boost. The result is simpler, faster, correct
for the Pliocene-age dates, and fixes the UBSAN crash in the test.

Fixes #17516

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#17527
2024-02-27 17:04:18 +02:00
Botond Dénes
5dc145a93f test/topology_custom: test_read_repair.py: reduce run-time
This test needed a lot of data to ensure multiple pages when doing the
read repair. This change two key configuration items, allowing
for a drastic reduction of the data size and consequently a large
reduction in run-time.
* Changes query-tombstone-page-limit 1000 -> 10. Before f068d1a6fa,
  reducing this to a too small value would start killing internal
  queries. Now, after said commit, this is no longer a concern, as this
  limit no longer affects unpaged queries.
* Sets (the new) query-page-size-in-bytes 1MB (default) -> 1KB.

With this two changes, we can reduce the data size:
* total_rows: 20000 -> 100
* max_live_rows: 32 -> 8

The runtime of the test consequently drops from 62 seconds to 13.5
seconds (dev mode, on my build machine).
2024-02-27 02:27:55 -05:00
Botond Dénes
8213e66815 replica/database: use include page-size in max-result-size
This patch changes get_unlimited_query_max_result_size():
* Also set the page-size field, not just the soft/hard limits
* Renames it to get_query_max_result_size()
* Update callers, specifically storage_proxy::get_max_result_size(),
  which now has a much simpler common return path and has to drop the
  page size on one rare return path.

This is a purely mechanical change, no behaviour is changed.
2024-02-27 02:27:55 -05:00
Kefu Chai
1fe7a467e7 mutation: add fmt::formatter for mutation_fragment_v2::printer
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated formatter.

in this change, we define formatters for mutation_fragment_v2::printer

Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-26 17:47:05 +08:00
Kefu Chai
3d6948c13e tools/scylla-nodetool: implement info
Refs #15588

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-26 14:52:22 +08:00
Kefu Chai
4d8f74f301 test/nodetool: move format_size into utils.py
so that this helper can be shared across more tests. `test_info.py`
will be using it as well.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-26 14:52:22 +08:00
Raphael S. Carvalho
f07c233ad5 Fix potential data resurrection when another compaction type does cleanup work
Since commit f1bbf70, many compaction types can do cleanup work, but turns out
we forgot to invalidate cache on their completion.

So if a node regains ownership of token that had partition deleted in its previous
owner (and tombstone is already gone), data can be resurrected.

Tablet is not affected, as it explicitly invalidates cache during migration
cleanup stage.

Scylla 5.4 is affected.

Fixes #17501.
Fixes #17452.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#17502
2024-02-25 13:08:04 +02:00
Lakshmi Narayanan Sreethar
c7eab9329f test/topology_custom: add testcase to verify reshape with tablets
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-02-23 18:43:39 +05:30
Lakshmi Narayanan Sreethar
ed2d8529f3 test/pylib/rest_client: add get_sstable_info, enable/disable_autocompaction
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-02-23 18:43:39 +05:30
Botond Dénes
89efa89dd7 Merge 'test: add fmt::formatters' from Kefu Chai
before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter. in this change, we define formatters for some types used in testing.

Refs #13245

Closes scylladb/scylladb#17485

* github.com:scylladb/scylladb:
  test/unit: add fmt::formatter for tree_test_key_base
  test: add printer for type for BOOST_REQUIRE_EQUAL
  test: add fmt::formatters
  test/perf: add fmt::formatters for scheduling_latency_measurer and perf_result
2024-02-23 09:32:39 +02:00
Botond Dénes
1f363a876e Merge 'utils: add fmt::formatter for occupancy_stats, managed_bytes and friends ' from Kefu Chai
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for

* managed_bytes
* managed_bytes_view
* managed_bytes_opt
* occupancy_stats

and drop their operator<<:s

Refs https://github.com/scylladb/scylladb/issues/13245

Closes scylladb/scylladb#17462

* github.com:scylladb/scylladb:
  utils/managed_bytes: add fmt::formatters for managed_bytes and friends
  utils/logalloc: add fmt::formatter for occupancy_stats
2024-02-23 09:31:22 +02:00
Botond Dénes
d314ad2725 Merge 'sstables: close index_reader in has_partition_key' from Aleksandra Martyniuk
If index_reader isn't closed before it is destroyed, then ongoing
sstables reads won't be awaited and assertion will be triggered.

Close index_reader in has_partition_key before destroying it.

Fixes: #17232.

Closes scylladb/scylladb#17355

* github.com:scylladb/scylladb:
  test: add test to check if reader is closed
  sstables: close index_reader in has_partition_key
2024-02-23 09:27:55 +02:00
Kefu Chai
010fb5f323 tools/scylla-nodetool: make keyspace argument optional for "ring"
the "keyspace" argument of the "ring" command is optional. but before
this change, we considered it a mandatory option. it was wrong.

so, in this change, we make it optional, and print out the warning
message if the keyspace is not specified.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17472
2024-02-23 09:25:29 +02:00
Botond Dénes
a08d9ba2a4 Merge 'tools/scylla-nodetool: fixes to address test failures with dtest' from Kefu Chai
* tighten the param check for toppartitions
* add an extra empty line inbetween reports

Closes scylladb/scylladb#17486

* github.com:scylladb/scylladb:
  tools/scylla-nodetool: add an extra empty line inbetween reports
  tools/scylla-nodetool: tighten the param check for toppartitions
2024-02-23 09:05:30 +02:00
Botond Dénes
959d33ba39 Merge 'repair: streaming: handle no_such_column_family from remote node' from Aleksandra Martyniuk
RPC calls lose information about the type of returned exception.
Thus, if a table is dropped on receiver node, but it still exists
on a sender node and sender node streams the table's data, then
the whole operation fails.

To prevent that, add a method which synchronizes schema and then
checks, if the exception was caused by table drop. If so,
the exception is swallowed.

Use the method in streaming and repair to continue them when
the table is dropped in the meantime.

Fixes: #17028.
Fixes: #15370.
Fixes: #15598.

Closes scylladb/scylladb#17231

* github.com:scylladb/scylladb:
  repair: handle no_such_column_family from remote node gracefully
  test: test drop table on receiver side during streaming
  streaming: fix indentation
  streaming: handle no_such_column_family from remote node gracefully
  repair: add methods to skip dropped table
2024-02-23 08:25:45 +02:00
Kefu Chai
3574c22d73 test/nodetool/utils: print out unmatched output on test failure
would be more helpful if the matched could print out the unmatched
output on test failure. so, in this change, both stdout and stderr
are printed if they fail to match with the expected error.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17489
2024-02-23 08:20:30 +02:00
Kefu Chai
381c389b56 tools/scylla-nodetool: tighten the param check for toppartitions
the test cases of `test_any_of_required_parameters_is_missing`
considers that we should either pass all positional argument or
pass none of them, otherwise nodetool should fail. but `scylla nodetool`
supported partial positional argument.

to be more consistent with the expected behavior, in this change,
we enforce the sanity check so that we only accept either all
positional args or none of them. the corresponding test is added.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-23 12:57:51 +08:00
Kefu Chai
3d9054991b utils/logalloc: add fmt::formatter for occupancy_stats
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for `occupancy_stats`, and
drop its operator<<.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-23 11:32:41 +08:00
Avi Kivity
bf107dae84 test/unit: add fmt::formatter for tree_test_key_base
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for the classes derived from `tree_test_key_base`

(this change was extracted from a larger change at #15599)

Refs #13245
2024-02-23 10:52:12 +08:00
Kefu Chai
a70318e722 test: add printer for type for BOOST_REQUIRE_EQUAL
after dropping the operator<< for vector, we would not able to
use BOOST_REQUIRE_EQUAL to compare vector<>. to be prepared for this,
less defined the printer for Boost.test

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-23 10:52:12 +08:00
Kefu Chai
63396f780d test: add fmt::formatters
the operator<< for `cql3::expr::test_utils::mutation_column_value` is
preserved, as it used by test/lib/expr_test_utils.cc, which prints
std::map<sstring, cql3::expr::test_utils::mutation_column_value> using
the homebrew generic formatter for std::map<>. and the formatter uses
operator<< for printing the elements in map.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-23 10:52:12 +08:00
Kefu Chai
2ccd9e695d test/perf: add fmt::formatters for scheduling_latency_measurer and perf_result
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for

* scheduling_latency_measurer
* perf_result

and drop their operator<<:s

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-23 10:17:50 +08:00
Pavel Emelyanov
5afaa03241 test/object_store: Remove unused managed_cluster (and other stuff)
Now all test cases use pylib manager client to manipulate cluster
While at it -- drop more unused bits from suite .py files

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-02-22 17:40:25 +03:00
Pavel Emelyanov
95ed46e26a test/object_store: Use tmpdir fixture in flush-retry case
Now when the test case in question is not using ManagerCluster, there's
no point in using test_tempdir either and the temporary object-store
config can be generated in generic temporary directory

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-02-22 17:39:30 +03:00
Pavel Emelyanov
252688fe0c test/object_store: Turn flush-retry case to use ManagerClient
In the middle this test case needs to force scylla server reload its
configs. Currently manager API requires that some existing config option
is provided as an argument, but in this test case scylla.yaml remains
intact. So it satisfies the API with non-chaning option.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-02-22 17:32:34 +03:00
Pavel Emelyanov
e742906f1f test/object_store: Turn "misconfigured" case to use ManagerClient
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-02-22 17:32:34 +03:00
Pavel Emelyanov
857b48f950 test/object_store: Turn garbage-collect case to use ManagerClient
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-02-22 17:32:34 +03:00
Pavel Emelyanov
d27b91cfb4 test/object_store: Turn basic case to use ManagerClient
This case is a bit tricky, as it needs to know where scylla's workdir
is, so it replaces the use of test_tempdir with the call to manager to
get server's workdir.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-02-22 17:32:34 +03:00
Avi Kivity
67f8dc5a7c Merge 'mutation: add fmt::formatter for clustering_row, row_tombstone and friends' from Kefu Chai
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for

* row_tombstone
* row_marker
* deletable_row::printer
* row::printer
* clustering_row::printer
* static_row::printer
* partition_start
* partition_end
* mutation_fragment::printer

and drop their operator<<:s

Refs #13245

Closes scylladb/scylladb#17461

* github.com:scylladb/scylladb:
  mutation: add fmt::formatter for clustering_row and friends
  mutation: add fmt::formatter for row_tombstone and friends
2024-02-22 16:16:26 +02:00
Pavel Emelyanov
89d0704d9b test/object_store: Prepare to work with ManagerClient
This includes

- marking the suite as Topology
- import needed fixtures and options from topology conftest
- configuring the zero initial cluster size and anonymous auth
- marking all test cases as skipped, as they no longer work after above

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-02-22 17:02:05 +03:00
Aleksandra Martyniuk
4530be9e5b test: add test to check if reader is closed
Add test to check if reader is closed in sstable::has_partition_key.
2024-02-22 14:53:14 +01:00
Nadav Har'El
b0233c0833 Merge 'interval: rename nonwrapping_interval to interval' from Avi Kivity
Our interval template started life as `range`, and was supported wrapping to follow Cassandra's convention of wrapping around the maximum token.

We later recognized that an interval type should usually be non-wrapping and split it into wrapping_range and nonwrapping_range, with `range` aliasing wrapping_range to preserve compatibility.

Even later, we realized the name was already taken by C++ ranges and so renamed it to `interval`. Given that intervals are usually non-wrapping, the default `interval` type is non-wrapping.

We can now simplify it further, recognizing that everyone assumes that an interval is non-wrapping and so doesn't need the nonwrapping_interval_designation. We just rename nonwrapping_interval to `interval` and remove the type alias.

Closes scylladb/scylladb#17455

* github.com:scylladb/scylladb:
  interval: rename nonwrapping_interval to interval
  interval: rename interval_test to wrapping_interval_test
2024-02-22 14:03:43 +02:00
Pavel Emelyanov
027282ee07 perf_simple_query: Add --memtable-partitions option
There's the --partitions one that specifies how many partitions the test
would generate before measuring. When --bypass-cache option is in use,
thus making the test alway engage sstables readers, it makes sense to
add some control over sstables granularity. The new option suggests that
during population phase, memtable gets flushed every $this-number
partitions, not just once at the end (and unknown amount of times in the
middle because of dirty memory limit).

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-02-22 14:44:17 +03:00
Pavel Emelyanov
fd4c2e607e perf_simple_query: Disable auto compaction
Usually a perf test doesn't expect that some activity runs in the
background without controls. Compaction is one of a kind, so it makes
sense to keep it off while running the measurement.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-02-22 14:43:23 +03:00
Pavel Emelyanov
74899f71de perf_simple_query: Keep number of initial tablets in output json
When producing the output json file, keep how many initial tablets were
requested (if at all) next to other workload parameters

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-02-22 14:42:39 +03:00
Kamil Braun
3ee56e1936 Merge 'raft topology: enable writes to previous CDC generations' from Patryk Jędrzejczak
When we create a CDC generation and ring-delay is non-zero, the
timestamp of the new generation is in the future. Hence, we can
have multiple generations that can be written to. However, if we
add a new node to the cluster with the Raft-based topology, it
receives only the last committed generation. So, this node will
be rejecting writes considered correct by the other nodes until
the last committed generation starts operating.

In scylladb/scylladb#17134, we have allowed sending writes to the
previous CDC generations. So, the situation became even more
complicated. This PR adjusts the Raft-based topology
to ensure all required generations are loaded into memory and their
data isn't cleared too early.

To load all required generations into memory, we replace
`current_cdc_generation_{uuid, timestamp}` with the set containing
IDs of all committed generations - `committed_cdc_generations`.
To ensure this set doesn't grow endlessly, we remove an entry from
this set together with the data in CDC_GENERATIONS_V3.

Currently, we may clear a CDC generation's data from
CDC_GENERATIONS_V3 if it is not the last committed generation
and it is at least 24 hours old (according to the topology
coordinator's clock). However, after allowing writes to the
previous CDC generations, this condition became incorrect. We
might clear data of a generation that could still be written to.
The new solution introduced in this PR is to clear data of the
generations that finished operating more than 24 hours ago.

Apart from the changes mentioned above, this PR hardens
`test_cdc_generation_clearing.py`.

Fixes scylladb/scylladb#16916
Fixes scylladb/scylladb#17184
Fixes scylladb/scylladb#17288

Closes scylladb/scylladb#17374

* github.com:scylladb/scylladb:
  test: harden test_cdc_generation_clearing
  test: test clean-up of committed_cdc_generations
  raft topology: clean committed_cdc_generations
  raft topology: clean only obsolete CDC generations' data
  storage_service: topology_state_load: load all committed CDC generations
  system_keyspace: load_topology_state: fix indentation
  raft topology: store committed CDC generations' IDs in the topology
2024-02-22 11:41:25 +01:00
Kefu Chai
37c6073fd5 mutation: add fmt::formatter for clustering_row and friends
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for

* clustering_row::printer
* static_row::printer
* partition_start
* partition_end
* mutation_fragment::printer

and drop their operator<<:s

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-22 17:53:34 +08:00
Avi Kivity
51df8b9173 interval: rename nonwrapping_interval to interval
Our interval template started life as `range`, and was supported
wrapping to follow Cassandra's convention of wrapping around the
maximum token.

We later recognized that an interval type should usually be non-wrapping
and split it into wrapping_range and nonwrapping_range, with `range`
aliasing wrapping_range to preserve compatibility.

Even later, we realized the name was already taken by C++ ranges and
so renamed it to `interval`. Given that intervals are usually non-wrapping,
the default `interval` type is non-wrapping.

We can now simplify it further, recognizing that everyone assumes
that an interval is non-wrapping and so doesn't need the
nonwrapping_interval_designation. We just rename nonwrapping_interval
to `interval` and remove the type alias.
2024-02-21 19:43:17 +02:00
Avi Kivity
e338f0e009 interval: rename interval_test to wrapping_interval_test
As preparation for reclaiming the name `interval` for nonwrapping_interval,
rename interval_test to wrapping_interval_test.
2024-02-21 19:38:53 +02:00
Botond Dénes
ca585903b7 test/cql-pytest: remove skip_with_tablets fixture
All tests that used it are fixed, and we should not add any new tests
failing with tablets from now on, so remove.
2024-02-21 02:08:49 -05:00
Botond Dénes
8df82d4781 test/cql-pytest: test_select_from_mutation_fragments.py parameterize tests
To run with both vnodes and tablets. For this functionality, both
replication methods should be covered with tests, because it uses
different ways to produce partition lists, depending on the replication
method.

Also add scylla_only to those tests that were missing this fixture
before. All tests in this suite are scylla-only and with the
parameterization, this is even more apparent.
2024-02-21 02:08:49 -05:00
Botond Dénes
b09b949159 test/cql-pytest: test_select_from_mutation_fragments.py: remove skip_with_tablets
The underlying functionality was fixed, the tests should now pass with
tablets.
2024-02-21 02:08:49 -05:00
Botond Dénes
7bdd0c2cae locator: introduce tablet_range_spliter
Given a list of partition-ranges, yields the intersection of this
range-list, with that of that tablet-ranges, for tablets located on the
given host.
This will be used in multishard_mutation_query.cc, to obtain the ranges
to read from the local node: given the read ranges, obtain the ranges
belonging to tablets who have replicas on the local node.
2024-02-21 02:08:48 -05:00
Botond Dénes
239484f259 interval: add before() overload which takes another interval
The current point variant cannot take inclusiveness into account, when
said point comes from another interval bound.
This method had no tests at all, so add tests covering both overloads.
2024-02-21 02:08:48 -05:00
Avi Kivity
605bf6e221 range.hh: retire
range.hh was deprecated in bd794629f9 (2020) since its names
conflict with the C++ library concept of an iterator range. The name
::range also mapped to the dangerous wrapping_interval rather than
nonwrapping_interval.

Complete the deprecation by removing range.hh and replacing all the
aliases by the names they point to from the interval library. Note
this now exposes uses of wrapping intervals as they are now explicit.

The unit tests are renamed and range.hh is deleted.

Closes scylladb/scylladb#17428
2024-02-21 00:24:25 +02:00
Tomasz Grabiec
e63d8ae272 Merge 'Handle tablet migration failure while streaming' from Pavel Emelyanov
It can happen that a node is lost during tablet migration involving that node. Migration will be stuck, blocking topology state machine. To recover from this, the current procedure is for the admin to execute nodetool removenode or replacing the node. This marks the node as "ignored" and tablet state machine can pick this up and abort the migration.

This PR implements the handling for streaming stage only and adds a test for it. Checking other stages needs more work with failure injection to inject failures into specific barrier.

To handle streaming failure two new stages are introduced -- cleanup_target and revert_migration. The former is to clean the pending replica that could receive some data by the time streaming stopped working, the latter is like end_migration, but doesn't commit the new_replicas into replicas field.

refs: #16527

Closes scylladb/scylladb#17360

* github.com:scylladb/scylladb:
  test/topology: Add checking error paths for failed migration
  topology.tablets_migration: Handle failed streaming
  topology.tablets_migration: Add cleanup_target transition stage
  topology.tablets_migration: Add revert_migration transition stage
  storage_service: Rewrap cleanup stage checking in cleanup_tablet()
  test/topology: Move helpers to get tablet replicas to pylib
2024-02-20 18:50:55 +01:00
Botond Dénes
73a3a3faf3 Merge 'tools/scylla-nodetool: implement tablestats' from Kefu Chai
Refs #15588

Closes scylladb/scylladb#17387

* github.com:scylladb/scylladb:
  tools/scylla-nodetool: implement tablestats
  utils/rjson: add templated streaming_writer::Write()
2024-02-20 14:46:07 +02:00
Patryk Jędrzejczak
419354bc9f test: harden test_cdc_generation_clearing
In one of the previous patches, we fixed scylladb/scylladb#16916 as
a side effect. We removed
`system_keyspace::get_cdc_generations_cleanup_candidate`, which
contained the bug causing the issue.

Even though we didn't have to fix this issue directly, it showed us
that `test_cdc_generation_clearing` was too weak. If something went
wrong during/after the only clearing, the test still could pass
because the clearing was the last action in the test. In
scylladb/scylladb#16916, the CDC generation publisher was stuck
after the clearing because of a recurring error. The test wouldn't
detect it. Therefore, we harden the test by expecting two clearings
instead of one. If something goes wrong during the first clearing,
there is a high chance that the second clearing will fail. The new
test version wouldn't pass with the old bug in the code.
2024-02-20 12:35:18 +01:00
Patryk Jędrzejczak
2b724735d1 test: test clean-up of committed_cdc_generations
We extend `test_cdc_generation_clearing`. Now, it also tests the
clean-up of `TOPOLOGY.committed_cdc_generations` added in the
previous patch.

In the implementation, we harden the already existing
`check_system_topology_and_cdc_generations_v3_consistency`. After
the previous patch, data of every generation present in
`committed_cdc_generations` should be present in CDC_GENERATIONS_V3.
In other words, `committed_cdc_generations` should always be a
subset of a set containing generations in CDC_GENERATIONS_V3.
Before the previous patch, this wasn't true after the clearing, so
the new version of `test_cdc_generation_clearing` wouldn't pass
back then.
2024-02-20 12:35:18 +01:00
Patryk Jędrzejczak
b8aa74f539 raft topology: clean only obsolete CDC generations' data
Currently, we may clear a CDC generation's data from
CDC_GENERATIONS_V3 if it is not the last committed generation
and it is at least 24 hours old (according to the topology
coordinator's clock). However, after allowing writes to the
previous CDC generations, this condition became incorrect. We
might clear data of a generation that could still be written to.

The new solution is to clear data of the generations that
finished operating more than 24 hours ago. The rationale behind
it is in the new comment in
`topology_coordinator:clean_obsolete_cdc_generations`.

The previous solution used the clean-up candidate. After
introducing `committed_cdc_generations`, it became unneeded.
The last obsolete generation can be computed in
`topology_coordinator:clean_obsolete_cdc_generations`. Therefore,
we remove all the code that handles the clean-up candidate.

After changing how we clear CDC generations' data,
`test_current_cdc_generation_is_not_removed` became obsolete.
The tested feature is not present in the code anymore.

`test_dependency_on_timestamps` became the only test case covering
the CDC generation's data clearing. We adjust it after the changes.
2024-02-20 12:35:18 +01:00