Commit Graph

6603 Commits

Author SHA1 Message Date
Ferenc Szili
8bb7a18de2 test/cql-pytest: add --omit-scylla-output to Cassandra test runs
Currently, the tests in test/cql-pytest can be run against both ScyllaDB and Cassandra.
Running the test for either will first output the test results, and subsequently
print the stdout output of the process under test. Using the command line
option --omit-scylla-output it is possible to disable this print for Scylla,
but it is not possible for tests run against Cassandra.

This change adds the option to suppress output for Cassandra tests, too. By default,
the stdout of the Cassandra run will still be printed after the test results, but
this can now be disabled with --omit-scylla-output

Closes scylladb/scylladb#17996
2024-03-25 15:14:45 +02:00
Kamil Braun
69bf962522 Merge 'allow changing snitch with topology over raft' from Gleb
Fixes scylladb/scylladb#17513

* 'gleb/raft-snitch-change-v3' of github.com:scylladb/scylla-dev:
  doc: amend snitch changing procedure to work with raft
  test: add test to check that snitch change takes effect.
  raft topology: update rack/dc info in topology state on reboot if changed
2024-03-25 10:41:39 +01:00
Gleb Natapov
d7adf26a56 test: add test to check that snitch change takes effect.
The test creates a two-node cluster with the default snitch (SimpleSnitch) and
checks that dc and rack names are as expected. Then it changes the
config to use GossipingPropertyFileSnitch with different names, restarts
the nodes and checks that the peers table now has the new names.
2024-03-25 10:41:49 +02:00
Raphael S. Carvalho
6bdb456fad sstables_loader: Fix loader when write selector is previous during tablet migration
The loader is writing to pending replica even when write selector is set
to previous. If migration is reverted, then the writes won't be rolled
back as it assumes pending replicas weren't written to yet. That can
cause data resurrection if tablet is later migrated back into the same
replica.

NOTE: write selector is handled correctly when set to next, because
get_natural_endpoints() will return the next replica set, and none
of the replicas will be considered leaving. And of course, selector
set to both is also handled correctly.

Fixes #17892.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#17902
2024-03-24 01:20:50 +01:00
Kamil Braun
230f23004b Revert "test.py: adjust the test for topology upgrade to write to and read from CDC tables"
This reverts commit b4144d14c6.

The test is flaky and blocks next promotions.
2024-03-22 17:25:04 +01:00
Petr Gusev
2a5f5d1948 test_fencing: fix flakiness
To cause the stale topology exception the test reads
the version from the last bootstrapped host and assigns its decremented
value to version and fence_version fields of system.topology.
The test assumes that version == fence_version here; if version
is greater than fence_version we won't get the stale topology
exception in this setup. Tablet balancer can break
this -- it may increment the version after the last node is
bootstrapped.

Fix this by disabling the tablet balancer earlier.

fixes scylladb/scylladb#17807

Closes scylladb/scylladb#17940
2024-03-22 12:49:13 +01:00
Piotr Dulikowski
f23f8f81bf Merge 'Raft-based service levels' from Michał Jadwiszczak
This patch introduces raft-based service levels.

The difference to the current method of working is:
- service levels are stored in `system.service_levels_v2`
- reads are executed with `LOCAL_ONE`
- writes are done via raft group0 operation

Service levels are migrated to v2 in topology upgrade.
After the service levels are migrated, `key: service_level_v2_status; value: data_migrated` is written to `system.scylla_local` table. If this row is present, raft data accessor is created from the beginning and it handles recovery mode procedure (service levels will be read from v2 table even if consistent topology is disabled then)

Fixes #17926

Closes scylladb/scylladb#16585

* github.com:scylladb/scylladb:
  test: test service levels v2 works in recovery mode
  test: add test for service levels migration
  test: add test for service levels snapshot
  test:topology: extract `trigger_snapshot` to utils
  main: create raft dda if sl data was migrated
  service:qos: store information about sl data migration
  service:qos: service levels migration
  main: assign standard service level DDA before starting group0
  service:qos: fix `is_v2()` method
  service:qos: add a method to upgrade data accessor
  test: add unit_test_raft_service_levels_accessor
  service:storage_service: add support for service levels raft snapshot
  service:qos: add abort_source for group0 operations
  service:qos: raft service level distributed data accessor
  service:qos: use group0_guard in data accessor
  cql3:statements: run service level statements on shard0 with raft guard
  test: fix overrides in unit_test_service_levels_accessor
  service:qos: fix indentation
  service:qos: coroutinize some of the methods
  db:system_keyspace: add `SERVICE_LEVELS_V2` table
  service:qos: extract common service levels' table functions
2024-03-22 11:51:53 +01:00
Kamil Braun
9979adb670 Merge 'topology_coordinator: do not clear unpublished CDC generation's data' from Patryk Jędrzejczak
In this PR, we ensure unpublished CDC generation's data is
never removed, which was theoretically possible. If it happened,
it could cause problems. CDC generation publisher would then try
to publish the generation with its data removed. In particular, the
precondition of calling `_sys_ks.read_cdc_generation` wouldn't be
satisfied.

We also add a test that passes only after the fix. However, this test
needs to block execution of the CDC generation publisher's loop
twice. Currently, error injections with handlers do not allow it
because handlers always share received messages. Apart from the
first created handler, all handlers would be instantly unblocked by
a message from the past that has already unblocked the first
handler. This seems like a general limitation that could cause
problems in the future, so in this PR, we extend injections with
handlers to solve it once and for all. We add the `share_messages`
parameter to the `inject` (with handler) function. Depending on its
value, handlers will share messages (as before) or not.

Fixes scylladb/scylladb#17497

Closes scylladb/scylladb#17934

* github.com:scylladb/scylladb:
  topology_coordinator: clean_obsolete_cdc_generations: fix log
  topology_coordinator: do not clear unpublished CDC generation's data
  topology_coordinator: cdc_generation_publisher_fiber injection: make handlers share messages
  error_injection: allow injection handlers to not share messages
2024-03-22 11:20:26 +01:00
Kamil Braun
4359a1b460 Merge 'raft timeouts: better handling of lost quorum' from Petr Gusev
In this PR we add timeouts support to raft groups registry. We introduce
the `raft_server_with_timeouts` class, which wraps the `raft::server`
and exposes its interface with an additional `raft_timeout` parameter. If
it's set, the wrapper cancels the `abort_source` after certain amount of
time. The value of the timeout can be specified either in the
`raft_timeout` parameter, or the default value can be set in the
`raft_server_with_timeouts` class constructor.

The `raft_group_registry` interface is extended with
`group0_with_timeouts()` method. It returns an instance of
`raft_server_with_timeouts` for group0 raft server. The timeout value
for it is configured in `create_server_for_group0`. It's one minute by
default and can be overridden for tests with
`group0-raft-op-timeout-in-ms` parameter.

The new api allows the client to decide whether to use timeouts or not.
In this PR we are reviewing all the group0 call sites and add
`raft_timeout` if that makes sense. The general principle is that if the
code is handling a client request and the client expects a potential
error, we use timeouts. We don't use timeouts for background fibers
(such as topology coordinator), since they wouldn't add much value. The
only thing the background fiber can do with a timeout is to retry, and
this will have the same end effect as not having a timeout at all.

Fixes scylladb/scylladb#16604

Closes scylladb/scylladb#17590

* github.com:scylladb/scylladb:
  migration_manager: use raft_timeout{}
  storage_service::join_node_response_handler: use raft_timeout{}
  storage_service::start_upgrade_to_raft_topology: use raft_timeout{}
  storage_service::set_tablet_balancing_enabled: use raft_timeout{}
  storage_service::move_tablet: use raft_timeout{}
  raft_check_and_repair_cdc_streams: use raft_timeout{}
  raft_timeout: test that node operations fail properly
  raft_rebuild: use raft_timeout{}
  do_cluster_cleanup: use raft_timeout{}
  raft_initialize_discovery_leader: use raft_timeout{}
  update_topology_with_local_metadata: use with_timeout{}
  raft_decommission: use raft_timeout{}
  raft_removenode: use raft_timeout{}
  join_node_request_handler: add raft_timeout to make_nonvoters and add_entry
  raft_group0: make_raft_config_nonvoter: add raft_timeout parameter
  raft_group0: make_raft_config_nonvoter: add abort_source parameter
  manager_client: server_add with start=false shouldn't call driver_connect
  scylla_cluster: add seeds parameter to the add_server and servers_add
  raft_server_with_timeouts: report the lost quorum
  join_node_request_handler: add raft_timeout{} for start_operation
  skip_mode: add platform_key
  auth: use raft_timeout{}
  raft_group0_client: add raft_timeout parameter
  raft_group_registry: add group0_with_timeouts
  utils: add composite_abort_source.hh
  error_injection: move api registration to set_server_init
  error_injection: add inject_parameter method
  error_injection: move injection_name string into injection_shared_data
  error_injection: pass injection parameters at startup
2024-03-22 10:45:33 +01:00
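The wrapper idea described in the merge above (an optional per-call timeout that falls back to a default set in the constructor) can be sketched roughly as follows. This is only an illustrative Python model, not the C++ implementation: `RaftTimeout`, `FakeServer` and `ServerWithTimeouts` are hypothetical names, and `asyncio.wait_for` stands in for cancelling an `abort_source`.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional


@dataclass
class RaftTimeout:
    # Seconds to wait; None means "use the wrapper's default" (assumed model).
    value: Optional[float] = None


class FakeServer:
    """Hypothetical stand-in for a raft server whose operations take `delay` seconds."""

    def __init__(self, delay: float):
        self.delay = delay

    async def add_entry(self, command: str) -> str:
        await asyncio.sleep(self.delay)
        return f"applied:{command}"


class ServerWithTimeouts:
    """Wraps a server and exposes the same call with an optional timeout."""

    def __init__(self, server, default_timeout: float):
        self._server = server
        self._default_timeout = default_timeout

    async def add_entry(self, command: str, timeout: Optional[RaftTimeout] = None) -> str:
        operation = self._server.add_entry(command)
        if timeout is None:
            # The caller opted out of timeouts (e.g. a background fiber).
            return await operation
        limit = timeout.value if timeout.value is not None else self._default_timeout
        # Cancels the operation and raises asyncio.TimeoutError when the limit passes.
        return await asyncio.wait_for(operation, limit)
```

In this model a client-facing call site would pass `RaftTimeout()` and handle the timeout error, while a background task would omit the argument and simply wait.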
Botond Dénes
f02baef871 Merge 'test/lib: sstable::test_env consolidate and reduce header footprint' from Avi Kivity
Reduce the sprawl of sstables::test_env in .cc and .hh files, to ease
maintenance and reduce recompilations.

Closes scylladb/scylladb#17965

* github.com:scylladb/scylladb:
  test: sstables::test_env: complete pimplification
  test/lib: test_env: move test_env::reusable_sst() to test_services.cc
2024-03-22 11:26:12 +02:00
Patryk Wrobel
28ed20d65e scylla-nodetool: adjust effective ownership handling
When a keyspace uses tablets, then effective ownership
can be obtained per table. If the user passes only a
keyspace, then /storage_service/ownership/{keyspace}
returns an error.

This change:
 - adds an additional positional parameter to 'status'
   command that allows a user to query status for table
   in a keyspace
 - makes usage of /storage_service/ownership/{keyspace}
   optional to avoid errors when user tries to obtain
   effective ownership of a keyspace that uses tablets
 - implements new frontend tests in 'test_status.py'
   that verify the new logic

Refs: scylladb#17405
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>

Closes scylladb/scylladb#17827
2024-03-22 09:51:57 +02:00
Michał Jadwiszczak
c0853b461c test: test service levels v2 works in recovery mode 2024-03-21 23:14:57 +01:00
Michał Jadwiszczak
c551a85cda test: add test for service levels migration 2024-03-21 23:14:57 +01:00
Michał Jadwiszczak
5811f696be test: add test for service levels snapshot 2024-03-21 23:14:57 +01:00
Michał Jadwiszczak
bf3aed1ecb test:topology: extract trigger_snapshot to utils
The function was defined separately in a few tests.
2024-03-21 23:14:57 +01:00
Michał Jadwiszczak
2917ec5d51 service:qos: service levels migration
Migrate data from `system_distributed.service_levels` to
`system.service_levels_v2` during raft topology upgrade.

Migration process reads data from old table with CL ALL
and inserts the data to the new table via raft.
2024-03-21 23:14:57 +01:00
Michał Jadwiszczak
159a6a2169 service:qos: fix is_v2() method 2024-03-21 23:14:57 +01:00
Michał Jadwiszczak
fd32f5162a service:qos: add a method to upgrade data accessor 2024-03-21 23:14:57 +01:00
Michał Jadwiszczak
d403bdfdd5 test: add unit_test_raft_service_levels_accessor
Raft service level data accessor with logic similar to
`unit_test_service_levels_accessor` to avoid sleeps in boost tests.
2024-03-21 23:14:57 +01:00
Michał Jadwiszczak
d5fa0747d7 service:qos: add abort_source for group0 operations
Add mechanism to abort ongoing group0 operations while draining
service_level_controller or leaving the cluster.
2024-03-21 23:14:57 +01:00
Michał Jadwiszczak
71c07addb5 service:qos: use group0_guard in data accessor
Adjust service_level_controller and
service_level_controller::service_level_distributed_data_accessor
interfaces to take `group0_guard` while adding/altering/dropping a
service level.
2024-03-21 23:14:57 +01:00
Michał Jadwiszczak
674286b868 test: fix overrides in unit_test_service_levels_accessor 2024-03-21 23:14:57 +01:00
Avi Kivity
b530dc1e3b test: sstables::test_env: complete pimplification
sstables::test_env uses the pimpl idiom, but incompletely. This
prevents reaping some of the benefits.

Complete the pimplification:
 - the `impl` nested struct is moved out-of-line
 - all non-template member functions are moved out-of-line
 - a destructor is declared and defined out-of-line
 - the move constructor is also defined (necessary after the destructor is
   defined)

After this, we can forward-declare more components.
2024-03-21 22:29:01 +02:00
Avi Kivity
d745929b44 test/lib: test_env: move test_env::reusable_sst() to test_services.cc
test_env implementation is scattered across two .cc files; concentrate it
in test_services.cc, which happens to be the file that doesn't cause
link errors.

Move toc_filename with it, as it is its only caller and it is static.
2024-03-21 22:21:02 +02:00
Andrei Chekun
7de28729e7 test: change maintenance socket location to /tmp
Fixes #16912

By default, ScyllaDB stores the maintenance socket in the workdir. By default, test.py uses testlog/{mode}/scylla-# as the ScyllaDB workdir. The usual location for cloning the repo is the user's home folder. In some cases, this can make the socket path too long, and the test starts to fail. The simplest fix is to move the maintenance socket to the /tmp folder to eliminate that possibility.

Closes scylladb/scylladb#17941
2024-03-21 18:22:21 +02:00
Patryk Jędrzejczak
27465a00e0 topology_coordinator: do not clear unpublished CDC generation's data
In this commit, we ensure unpublished CDC generation's data is
never removed, which was theoretically possible. If it happened,
it could cause problems. CDC generation publisher would then try
to publish the generation with its data removed. In particular, the
precondition of calling `_sys_ks.read_cdc_generation` wouldn't be
satisfied.

We also add a test that passes only after the fix.
2024-03-21 14:35:38 +01:00
Patryk Jędrzejczak
f45aebeee2 topology_coordinator: cdc_generation_publisher_fiber injection: make handlers share messages
In the following commit, we add a test that needs to block the CDC
generation publisher's loop twice. We allow it in this commit by
making handlers of the `cdc_generation_publisher_fiber` injection
share messages. From now on, unblocking every step of the loop will
require sending a new message from the test.

This change breaks the test already using the
`cdc_generation_publisher_fiber` injection, so we adjust the test.
2024-03-21 14:35:38 +01:00
Patryk Jędrzejczak
c5c4cc7d00 error_injection: allow injection handlers to not share messages
For a single injection, all created injection handlers share all
received messages. In particular, it means that one received message
unblocks all handlers waiting for the first message. This behavior
is often desired, for example, if multiple fibers execute the
injected code and we want to unblock them all with a single message.
However, there is a problem if we want to block every execution
of the injected code. Apart from the first created handler, all
handlers will be instantly unblocked by messages from the past that
have already unblocked the first handler.

In one of the following commits, we add a test that needs to block
the CDC generation publisher's loop twice. Since it looks like there
are no good workarounds for this arguably general problem, we extend
injections with handlers in a way that solves it. We introduce the
new `share_messages` parameter. Depending on its value, handlers
will share messages or not. The details are described in the new
comments in `error_injection.hh`.

We also add some basic unit tests for the new functionality.
2024-03-21 14:35:38 +01:00
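The problem described in the commit above (a later handler being unblocked at once by a message from the past) can be illustrated with a small toy model. This is a sketch under assumptions, not the code in `error_injection.hh`: the `Injection` and `Handler` classes and the `replay_to_every_handler` flag are hypothetical, and the sketch deliberately does not claim which value of `share_messages` corresponds to which mode.

```python
class Injection:
    """Toy model of one named injection and the messages sent to it."""

    def __init__(self, replay_to_every_handler: bool):
        self.replay_to_every_handler = replay_to_every_handler
        self.messages = []      # every message ever sent to this injection
        self.common_cursor = 0  # used when each message unblocks only one handler

    def send(self, message):
        self.messages.append(message)

    def make_handler(self):
        return Handler(self)


class Handler:
    def __init__(self, injection: Injection):
        self.injection = injection
        self.own_cursor = 0  # used when every handler sees every message

    def try_receive(self):
        """Return the next message for this handler, or None if it would block."""
        inj = self.injection
        if inj.replay_to_every_handler:
            if self.own_cursor < len(inj.messages):
                self.own_cursor += 1
                return inj.messages[self.own_cursor - 1]
            return None
        if inj.common_cursor < len(inj.messages):
            inj.common_cursor += 1
            return inj.messages[inj.common_cursor - 1]
        return None
```

With `replay_to_every_handler=True`, a handler created after the first message was sent receives that old message immediately; with `False`, it stays blocked until a new message arrives, which is what a test that blocks a loop twice needs.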
Petr Gusev
ae0ec19537 migration_manager: use raft_timeout{}
Checking all the call sites of the migration manager shows
that all of them are initiated by user requests,
not background activities. Therefore, we add a global
raft_timeout{} here.
2024-03-21 16:35:48 +04:00
Petr Gusev
294e1ff464 storage_service::join_node_response_handler: use raft_timeout{}
This function is called as part of a node join procedure
initiated by the user, so having timeouts here makes sense.
2024-03-21 16:35:48 +04:00
Petr Gusev
ca21362ade raft_timeout: test that node operations fail properly 2024-03-21 16:35:48 +04:00
Petr Gusev
099c756ba1 join_node_request_handler: add raft_timeout to make_nonvoters and add_entry
We also add a specific test_quorum_lost_during_node_join. It
exercises the case when the quorum is lost after start_operation
but before these methods are called.
2024-03-21 16:35:48 +04:00
Petr Gusev
99ddffac32 manager_client: server_add with start=false shouldn't call driver_connect
If the server is not started there is no point
in starting the driver, it would fail because there
are no nodes to connect to. On the other hand, we
should connect the driver in server_start()
if it's not connected yet.
2024-03-21 16:35:48 +04:00
Petr Gusev
3f6cf38dd5 scylla_cluster: add seeds parameter to the add_server and servers_add
If this parameter is set, we use its value for
the scylla.yaml of the new node, otherwise we
use IPs of all running nodes as before.

We'll need this parameter in subsequent commits to
restrict the communication between nodes.

We remove default values for _create_server_add_data parameters
since they are redundant - in the two call sites we pass all
of them.
2024-03-21 16:35:48 +04:00
Petr Gusev
99419d5964 raft_server_with_timeouts: report the lost quorum
In this commit we extend the timeout error message with
additional context - if we see that there is no quorum of
available nodes, we report this as the most likely
cause of the error.

We adjust the test by adding this new information to the
expected_error. We need raft-group-registry-fd-threshold-in-ms
to make _direct_fd threshold less than
group0-raft-op-timeout-in-ms.
2024-03-21 16:35:48 +04:00
Petr Gusev
1a3fc58438 join_node_request_handler: add raft_timeout{} for start_operation
In the test, we use the group0-raft-op-timeout-in-ms parameter to
reduce the timeout to one second so as not to waste time.

The join_node_request_handler method contains other group0 calls
which should have timeouts (make_nonvoters and add_entry). They
will be handled in a separate commit.
2024-03-21 16:35:48 +04:00
Petr Gusev
854531ae8e skip_mode: add platform_key
In subsequent commits we are going to add test.py
tests for raft_timeout{} feature. The problem is that
aarch/debug configuration is infamously slow. Timeout
settings used in tests work for all platforms but aarch/debug.

In this commit we extend the skip_mode attribute with the
platform_key property. We'll use @skip_mode('debug', platform_key='aarch64')
to skip the tests for this specific configuration.
The tests will still be run for aarch64/release.
2024-03-21 16:35:43 +04:00
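The skip decision described in the `skip_mode` commit above can be sketched as a plain function. This is a hedged illustration only: `should_skip` is a hypothetical helper, and matching `platform_key` as a substring of the machine name is an assumption, not necessarily how test.py evaluates it.

```python
import platform
from typing import Optional


def should_skip(current_mode: str, skip_mode: str,
                platform_key: Optional[str] = None,
                machine: Optional[str] = None) -> bool:
    """Decide whether a test marked with skip_mode should be skipped.

    `machine` defaults to the local machine name and exists so the
    function can be exercised without running on a specific platform.
    """
    if machine is None:
        machine = platform.machine()
    if current_mode != skip_mode:
        return False
    # Without a platform_key the mode alone decides; with one, both must match.
    return platform_key is None or platform_key in machine
```

Under these assumptions, `should_skip("debug", "debug", "aarch64", machine="aarch64")` is true, while the same test still runs for `release` on `aarch64` and for `debug` on `x86_64`.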
Nadav Har'El
fdeb14b468 Merge 'scylla-nodetool: make command-line parsing fully compatible with the legacy nodetool' from Botond Dénes
There were two more things missing:
* Allow global options to be positioned before the operation/command option (https://github.com/scylladb/scylladb/issues/16695)
* Ignore JVM args (https://github.com/scylladb/scylladb/issues/16696)

This PR fixes both. With this, hopefully we are fully compatible with nodetool as far as command line parsing is concerned.
After this PR goes in, we will need another fix to tools/java/bin/nodetool-wrapper, to allow user to benefit from this fix. Namely, after this PR, we can just try to invoke scylla-nodetool first with all the command-line args as-is. If it returns with exit-code 100, we fall back to nodetool. We will not need the current trick with `--help $1`. In fact, this trick doesn't work currently, because `$1` is not guaranteed to be the command in the first place.

In addition to the above, this PR also introduces a new option, to help us in the switching process. This is `--rest-api-port`, which can also be provided as `-Dcom.scylladb.apiPort`. When provided, this option takes precedence over `--port|-p`. This is intended as a bridge for `scylla-ccm`, which currently provides the JMX port as `--port`. With this change, it can also provide the REST API port as `-Dcom.scylladb.apiPort`. The legacy nodetool will ignore this, while the native nodetool will use it to connect to the correct REST API address. After the switch we can ditch these options.

Fixes: https://github.com/scylladb/scylladb/issues/16695
Fixes: https://github.com/scylladb/scylladb/issues/16696
Refs: https://github.com/scylladb/scylladb/issues/16679
Refs: https://github.com/scylladb/scylladb/issues/15588

Closes scylladb/scylladb#17168

* github.com:scylladb/scylladb:
  tools/scylla-nodetool: add --rest-api-port option
  tools/scylla-nodetool: ignore JVM args
  tools/utils: make finding the operation command line option more flexible
  tools/utils: get_selected_operation(): remove alias param
  tools: add constant with current help command-line arguments
2024-03-21 14:06:45 +02:00
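The fallback planned for the wrapper in the merge above (invoke the native tool with all arguments as-is, and fall back to the legacy tool on exit code 100) could look roughly like this. It is a sketch, not the contents of tools/java/bin/nodetool-wrapper: the `run_nodetool` function, the default command lines and the injectable `run` parameter are assumptions made for illustration.

```python
import subprocess
from typing import Callable, Sequence

# Exit code that, per the description above, signals "fall back to legacy nodetool".
FALLBACK_EXIT_CODE = 100


def run_nodetool(args: Sequence[str],
                 run: Callable[[list], int] = subprocess.call,
                 native: Sequence[str] = ("scylla", "nodetool"),  # assumed command line
                 legacy: Sequence[str] = ("nodetool",)) -> int:    # assumed command line
    """Try the native tool first; run the legacy tool only if asked to fall back."""
    exit_code = run(list(native) + list(args))
    if exit_code == FALLBACK_EXIT_CODE:
        exit_code = run(list(legacy) + list(args))
    return exit_code
```

Because the arguments are passed through unchanged, the wrapper does not need to guess which argument is the command, which is the weakness of the `--help $1` trick mentioned above.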
Pavel Emelyanov
c8fc43d169 test: Update topology_custom/suite::run_first list
The recently added test_tablets_migration dominates with its run-time (10
minutes). Also update other tests, e.g. test_read_repair is not in top-7
for any mode, test_replace and test_raft_recovery_majority_loss are both
not notably slower than most of other tests (~40 sec both). On the other
hand, the test_raft_recovery_basic and test_group0_schema_versioning are
both 1+ minute

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#17927
2024-03-21 12:48:50 +01:00
Andrei Chekun
a5455460d8 test: fix flakiness of the multi_dc tests
The initial version used a redundant method, and it did not cover all
cases. So that leads to the flakiness of the test that used this method.
Switching to the cluster_con() method removes flakiness since it's
written more robustly.

Fixes scylladb/scylladb#17914

Closes scylladb/scylladb#17932
2024-03-21 11:17:22 +01:00
Kamil Braun
4dfb7e3051 Merge 'storage_service::merge_topology_snapshot: handle big mutations' from Petr Gusev
The group0 state machine calls `merge_topology_snapshot` from
`transfer_snapshot`. It feeds it with `raft_topology_snapshot` returned
from `raft_pull_topology_snapshot`. This snapshot includes the entire
`system.cdc_generations_v3` table. It can be huge and break the
commitlog `max_record_size` limit.

The `system.cdc_generations_v3` is a single-partition table, so all the
data is contained in one mutation object. To fit the commitlog limit we
split this mutation into many smaller ones and apply them in separate
`database::apply` calls. That means we give up the atomicity guarantee,
but we actually don't need it for `system.cdc_generations_v3` and
`system.topology_requests`.

This PR fixes the dtest
`update_cluster_layout_tests.py::TestLargeScaleCluster::test_add_many_nodes_under_load`

Fixes scylladb/scylladb#17545

Closes scylladb/scylladb#17632

* github.com:scylladb/scylladb:
  test_cdc_generation_data: test snapshot transfer
  storage_service::merge_topology_snapshot: handle big cdc_generations_v3 mutations
  mutation: add split_mutation function
  storage_service::merge_topology_snapshot: fix indentation
2024-03-21 10:50:03 +01:00
Avi Kivity
628017c810 test: sstables::test_env: mock sstables_registry
sstables::test_env is intended for sstable unit tests, but to satisfy its
dependency of an sstables_registry we instantiate an entire database.

Remove the dependency by having a mock implementation of sstables_registry
and using that instead.

Closes scylladb/scylladb#17895
2024-03-21 10:19:46 +01:00
Tomasz Grabiec
baf12b0b2f test: tablets: Avoid infinite loop in rebalance_tablets()
If there is a bug in the tablet scheduler which makes it never
converge for a given state of topology, rebalance_tablets() will never
complete and will generate a huge amount of logs. This patch adds a
sanity limit so that we fail earlier.

This was observed in one of the test_load_balancing_with_random_load runs in CI.

Fixes scylladb/scylladb#17894.

Closes scylladb/scylladb#17916
2024-03-21 10:19:46 +01:00
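The sanity limit described in the commit above follows a common pattern: bound the number of rounds of a loop that is expected to converge, and fail loudly if it does not. A minimal sketch, assuming hypothetical `make_plan` and `apply_plan` callables and an arbitrary default limit (the real test helper and its limit are not shown in this log):

```python
from typing import Callable, List


def rebalance(make_plan: Callable[[], List[str]],
              apply_plan: Callable[[List[str]], None],
              max_iterations: int = 1000) -> int:
    """Apply plans until the scheduler returns an empty one.

    Returns the number of plans applied, or raises if the scheduler
    has not converged within max_iterations rounds.
    """
    for iteration in range(max_iterations):
        plan = make_plan()
        if not plan:
            return iteration
        apply_plan(plan)
    raise RuntimeError(f"rebalancing did not converge in {max_iterations} iterations")
```

A scheduler bug that keeps producing non-empty plans then surfaces as a quick, explicit failure instead of an endless run that floods the logs.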
Kamil Braun
bc42a5a092 Merge 'make sure that address map entry is not dropped between join request placement and the request handling' from Gleb
The series marks nodes to be non expiring in the address map earlier, when
they are placed in the topology.

Fixes: scylladb/scylladb#16849

* 'gleb/16849-fix-v2' of github.com:scylladb/scylla-dev:
  test: add test to check that address cannot expire between join request placement and its processing
  topology_coordinator: set address map entry to nonexpiring when a node is added to the topology
  raft_group0: add modifiable_address_map() function
2024-03-21 10:19:46 +01:00
Avi Kivity
43bcaeb87f Merge 'test: randomized_nemesis_test: add fmt::formatter for some types' from Kefu Chai
before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter.

in this change, we define formatters for

* raft_call
* raft_read
* network_majority_grudge
* reconfiguration
* stop_crash
* operation::thread_id
* append_seq
* AppendReg::append
* AppendReg::ret
* operation::either_of<Ops...>
* operation::exceptional_result<Op>
* operation::completion<Op>
* operation::invocable<Op>

and drop their operator<<:s.

in which,

* `operator<<` for append_entry is never used. so it is removed.
* `operator<<` for `std::monostate` and `std::variant` are dropped. as we are now using their counterparts in {fmt}.
* stop_crash::result_type 's `fmt::formatter` is not added, as we cannot define a partial specialization of `fmt::formatter` for a nested class for a template class. we will tackle this struct in another change.

Refs #13245

Closes scylladb/scylladb#17884

* github.com:scylladb/scylladb:
  test: raft: generator: add fmt::formatter:s
  test: randomized_nemesis_test: add fmt::formatter for some types
  test: randomized_nemesis_test: add fmt::formatter for seastar::timed_out_error
  raft: add fmt::formatter for error classes
2024-03-21 10:19:46 +01:00
Petr Gusev
740b240e9d test_cdc_generation_data: test snapshot transfer
The test only looked at the initial cdc_generation
generation. It made the changes bigger to go
past the raft max_command_size limit.
It then made sure this large mutation set is saved
in several raft commands.

In this commit we enhance the test to check that the
mutations are properly handled during snapshot transfer.
The problem is that the entire system.cdc_generations_v3
table is read into the topology_snapshot and its total
size can exceed the commitlog max_record_size limit.

We need a separate injection since the compaction
could nullify the effects of the previous injection.

The test fails without the fix from the previous commit.
2024-03-20 22:40:03 +04:00
Petr Gusev
db1afa0aba mutation: add split_mutation function
The function splits the source mutation into multiple
mutations so that their size does not exceed the
max_size limit. The size of a mutation is calculated
as the sum of the memory_usage() of its constituent
mutation_fragments.

The implementation is taken from view_updating_consumer.
We use mutation_rebuilder_v2 to reconstruct mutations from
a stream of mutation fragments and recreate the output
mutation whenever we reach the limit.

We'll need this function in the next commit.
2024-03-20 22:39:51 +04:00
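The splitting described in the `split_mutation` commit above can be approximated with a size-bounded batching loop. This is a simplified Python sketch, not the C++ function: `split_by_size` is a hypothetical name, byte strings stand in for mutation fragments, `len` stands in for `memory_usage()`, and the exact point at which the real code starts a new mutation may differ.

```python
from typing import Callable, List, Sequence


def split_by_size(fragments: Sequence[bytes], max_size: int,
                  size_of: Callable[[bytes], int] = len) -> List[List[bytes]]:
    """Group fragments, in order, into batches whose summed size stays within max_size.

    A single fragment larger than max_size still gets a batch of its own,
    since a fragment is not split further in this sketch.
    """
    batches: List[List[bytes]] = []
    current: List[bytes] = []
    current_size = 0
    for fragment in fragments:
        fragment_size = size_of(fragment)
        # Close the current batch before it would grow past the limit.
        if current and current_size + fragment_size > max_size:
            batches.append(current)
            current, current_size = [], 0
        current.append(fragment)
        current_size += fragment_size
    if current:
        batches.append(current)
    return batches
```

Each resulting batch would then be applied separately, which is where the atomicity of a single large write is given up in exchange for staying under a per-record size limit.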
Kefu Chai
61424b615c test: raft: generator: add fmt::formatter:s
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for

* operation::either_of<Ops...>
* operation::exceptional_result<Op>
* operation::completion<Op>
* operation::invocable<Op>

and drop their operator<<:s.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-20 21:01:29 +08:00
Kefu Chai
72899f573e test: randomized_nemesis_test: add fmt::formatter for some types
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for

* raft_call
* raft_read
* network_majority_grudge
* reconfiguration
* stop_crash
* operation::thread_id
* append_seq
* append_entry
* AppendReg::append
* AppendReg::ret

and drop their operator<<:s.

in which,

* `operator<<` for `std::monostate` and `std::variant` are dropped.
  as we are now using their counterparts in {fmt}.
* stop_crash::result_type 's `fmt::formatter` is not added, as we
  cannot define a partial specialization of `fmt::formatter` for
  a nested class for a template class. we will tackle this struct
  in another change.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-20 21:01:29 +08:00
Kefu Chai
97b203b1af test: randomized_nemesis_test: add fmt::formatter for seastar::timed_out_error
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatter for `seastar::timed_out_error`,
which will be used by the `fmt::formatter` for  `std::variant<...>`.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-20 21:01:29 +08:00