Commit Graph

38512 Commits

Author SHA1 Message Date
Benny Halevy
672ec66769 test: rest_api: add test_gossiper
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-08-24 11:37:12 +03:00
Benny Halevy
8825828817 gossiper: add get_unreachable_members_synchronized
Modeled after get_live_members_synchronized,
get_unreachable_members_synchronized calls
replicate_live_endpoints_on_change to synchronize
the state of unreachable_members on all shards.

Fixes scylladb/scylladb#15088

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-08-24 11:36:55 +03:00
Kamil Braun
93be4c0cb0 Merge 'Base node liveliness consistently on gossiper::is_alive' from Benny Halevy
Currently he gossiper marks endpoint_state objects as alive/dead.
I some cases the endpoint_state::is_alive function is checked but in many other cases
gossiper::is_alive(endpoint) is used to determine if the endpoint is alive.

This series removed the endpoint_state::is_alive state and moves all the logic to gossiper::is_alive
that bases its decision on the endpoint having an endpoint_state and being in the _live_endpoints set.

For that, the _live_endpoints is made sure to be replicated to all shards when changed
and the endpoint_state changes are serialized under lock_endpoint, and also making sure that the
endpoint_state in the _endpoint_states_map is never updated in place, but rather a temporary copy is changed
and then safely replicated using gossiper::replicate

Refs https://github.com/scylladb/scylladb/issues/14794

Closes #14801

* github.com:scylladb/scylladb:
  gossiper: mark_alive: remove local_state param
  endpoint_state: get rid of _is_alive member and methods
  gossiper: is_alive: use _live_endpoints
  gossiper: evict_from_membership: erase endpoint from _live_endpoints
  gossiper: replicate_live_endpoints_on_change: use _live_endpoints_version to detect change
  gossiper: run: no need to replicate live_endpoints
  gossiper: fold update_live_endpoints_version into replicate_live_endpoints_on_change
  gossiper: add mutate_live_and_unreachable_endpoints
  gossiper: reset_endpoint_state_map: clear also shadow endpoint sets
  gossiper: reset_endpoint_state_map: clear live/unreachable endpoints on all shards
  gossiper: functions that change _live_endpoints must be called on shard 0
  gossiper: add lock_endpoint_update_semaphore
  gossiper: make _live_endpoints an unordered_set
  endpoint_state: use gossiper::is_alive externally
2023-08-23 17:18:05 +02:00
Gleb Natapov
d1654ccdda storage_service: register schema version observer before joining group0 and starting gossiper
The schema version is updated by group0, so if group0 starts before
schema version observer is registered some updates may be missed. Since
the observer is used to update node's gossiper state the gossiper may
contain wrong schema version.

Fix by registering the observer before starting group0 and even before
starting gossiper to avoid a theoretical case that something may pull
schema after start of gossiping and before the observer is registered.

Fixes: #15078

Message-Id: <ZOYZWhEh6Zyb+FaN@scylladb.com>
2023-08-23 17:11:51 +02:00
Patryk Jędrzejczak
ef2eac9941 raft topology: make every type in request_param a named struct
We make every alternative type in the request_param variant
a named struct to make the code more readable. Additionally, this
change will make extending request parameters easier if we decide
to do so in the future.

Closes #15132
2023-08-23 16:56:00 +02:00
Patryk Jędrzejczak
7eab9f8a02 raft_removenode: remove "raft topology" from errors
Some runtime errors thrown in storage_service::raft_removenode
start with the "raft topology " prefix. Since "raft topology" is
an implementation detail, we don't want to throw this information
through the user API. Only logs should contain it.

Closes #15136
2023-08-23 16:20:14 +02:00
Nadav Har'El
5530c529c2 test/cql-pytest: regression test for old bug with CAST(f AS TEXT) precision
When casting a float or double column to a string with `CAST(f AS TEXT)`,
Scylla is expected to print the number with enough digits so that reading
that string back to a float or double restores the original number
exactly. This expectation isn't documented anywhere, but makes sense,
and is what Cassandra does.

Before commit 71bbd7475c, this wasn't the
case in Scylla: `CAST(f AS TEXT)` always printed 6 digits of precision,
which was a bit under enough for a float (which can have 7 decimal digits
of precision), but very much not enough for a double (which can need 15
digits). The origin of this magic "6 digits" number was that Scylla uses
seastar::to_sstring() to print the float and double values, and before
the aforementioned commit those functions used sprintf with the "%g"
format - which always prints 6 decimal digits of precision! After that
commit, to_sstring() now uses a different approach (based on fmt) to
print the float and double values, that prints all significant digits.

This patch adds a regression test for this bug: We write float and double
values to the database, cast them to text, and then recover the float
or double number from that text - and check that we get back exactly the
same float or double object. The test *fails* before the aforementioned
commit, and passes after it. It also passes on Cassandra.

Refs #15127

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #15131
2023-08-23 16:06:52 +03:00
Botond Dénes
e7af2a7de8 Merge 'token_metadata::get_endpoint_to_host_id_map_for_reading: restrict to token owners' from Benny Halevy
And verify the they returned host_id isn't null.
Call on_internal_error_noexcept in that case
since all token owners are expected to have their
host_id set. Aborting in testing would help fix
issues in this area.

Fixes scylladb/scylladb#14843
Refs scylladb/scylladb#14793

Closes #14844

* github.com:scylladb/scylladb:
  api: storage_service: improve description of /storage_service/host_id
  token_metadata: get_endpoint_to_host_id_map_for_reading: restrict to token owners
2023-08-23 13:55:14 +03:00
Botond Dénes
139ba553b8 Merge 'sstable, test: log sstable name and pk when capping local_deletion_time ' from Kefu Chai
in this series, we also print the sstable name and pk when writing a tombstone whose local_deletion_time (ldt for short) is greater than INT32_MAX which cannot be represented by an uint32_t.

Fixes #15015

Closes #15107

* github.com:scylladb/scylladb:
  sstable/writer: log sstable name and pk when capping ldt
  test: sstable_compaction_test: add a test for capped tombstone ldt
2023-08-23 09:29:54 +03:00
Botond Dénes
f7505405f0 scylla-gdb.py: use for_each_table() everywhere
scylla-gdb.py has two methods for iterating over all tables:
* all_tables()
* for_each_table()

Despite this, many places in the code iterate over the column family map
directly. This patch leaves just a single method (for_each_table()) and
migrates all the codebase to use it, instead of iterating over the raw
map. While at it, the access to the map is made backward compatible with
pre 52afd9d42d code, said commit wrapped database::_column_families in
tables_metadata object. This broke scylla-gdb.py for older versions.

Closes #15121
2023-08-22 20:39:31 +03:00
Kamil Braun
169d19e5b0 Merge 'raft topology: support --ignore-dead-nodes in removenode and replace' from Patryk Jędrzejczak
We add support for `--ignore-dead-nodes` in `raft_removenode` and
`--ignore-dead-nodes-for-replace` in `raft_replace`. For now, we allow
passing only host ids of the ignored nodes. Supporting IPs is currently
impossible because `raft_address_map` doesn't provide a mapping from IP
to a host id.

The main steps of the implementation are as follows:
- add the `ignore_nodes` column to `system.topology`,
- set the `ignore_nodes` value of the topology mutation in `raft_removenode` and `raft_replace`,
- extend `service::request_param` with alternative types that allow storing a set of ids of the ignored nodes,
- load `ignore_nodes` from `system.topology` into `request_param` in `system_keyspace::load_topology_state`,
- add `ignore_nodes` to `exclude_nodes` in `topology_coordinator::exec_global_command`,
- pass `ignore_nodes` to `replace_with_repair` and `remove_with_repair` in `storage_service::raft_topology_cmd_handler`.

Additionally, we add `test_raft_ignore_nodes.py` with two tests that verify the added changes.

Fixes #15025

Closes #15113

* github.com:scylladb/scylladb:
  test: add test_raft_ignore_nodes
  test: ManagerClient.remove_node: allow List[HostId] for ignore_dead
  raft topology: pass ignore_nodes to {replace, remove}_with_repair
  raft topology: exec_global_command: add ignore_nodes to exclude_nodes
  raft topology: exec_global_command: change type of exclude_nodes
  topology_state_machine: extend request_param with a set of raft ids
  raft topology: set ignore_nodes in raft_removenode and raft_replace
  utils: introduce split_comma_separated_list
  raft topology: add the ignore_nodes column to system.topology
2023-08-22 18:04:59 +02:00
Kamil Braun
cdc3cd2b79 Merge 'raft: add fencing tests' from Petr Gusev
In this PR a simple test for fencing is added. It exercises the data
plane, meaning if it somehow happens that the node has a stale topology
version, then requests from this node will get an error 'stale
topology'. The test just decrements the node version manually through
CQL, so it's quite artificial. To test a more real-world scenario we
need to allow the topology change fiber to sometimes skip unavailable
nodes. Now the algorithm fails and retries indefinitely in this case.

The PR also adds some logs, and removes one seemingly redundant topology
version increment, see the commit messages for details.

Closes #14901

* github.com:scylladb/scylladb:
  test_fencing: add test_fence_hints
  test.py: output the skipped tests
  test.py: add skip_mode decorator and fixture
  test.py: add mode fixture
  hints: add debug log for dropped hints
  hints: send_one_hint: extend the scope of file_send_gate holder
  pylib: add ScyllaMetrics
  hints manager: add send_errors counter
  token_metadata: add debug logs
  fencing: add simple data plane test
  random_tables.py: add counter column type
  raft topology: don't increment version when transitioning to node_state::normal
2023-08-22 16:28:21 +02:00
Piotr Grabowski
17e3e367ca test: use more frequent reconnection policy
The default reconnection policy in Python Driver is an exponential
backoff (with jitter) policy, which starts at 1 second reconnection
interval and ramps up to 600 seconds.

This is a problem in tests (refs #15104), especially in tests that restart
or replace nodes. In such a scenario, a node can be unavailable for an
extended period of time and the driver will try to reconnect to it
multiple times, eventually reaching very long reconnection interval
values, exceeding the timeout of a test.

Fix the issue by using a exponential reconnection policy with a maximum
interval of 4 seconds. A smaller value was not chosen, as each retry
clutters the logs with reconnection exception stack trace.

Fixes #15104

Closes #15112
2023-08-22 15:40:39 +02:00
Avi Kivity
d944872d19 Merge 'Prevent reactor stalls in to_repair_rows_list' from Benny Halevy
This sort series deals with two stall sources in row-level repair `to_repair_rows_list`:
1. Freeing the input `repair_rows_on_wire` in one shot on return (as seen in https://github.com/scylladb/scylladb/issues/14537)
2. Freeing the result `row_list` in one shot on error.  this hasn't been seen in testing but I have no reason to believe it is not susceptible to stalls exactly like repair_rows_on_wire with the same number of rows and mutations.

Fixes https://github.com/scylladb/scylladb/issues/14537

Closes #15102

* github.com:scylladb/scylladb:
  repair: reindent to_repair_rows_list
  repair: to_repair_rows_list: clear_gently on error
  repair: to_repair_rows_list: consume frozen rows gently
2023-08-22 15:29:37 +03:00
Patryk Jędrzejczak
b044ee535f test: add test_raft_ignore_nodes
We add two tests verifying that --ignore-dead-nodes in
raft_removenode and --ignore-dead-nodes-for-replace in
raft_replace are handled correctly.

We need a 7-cluster to have a Raft majority. Therefore, these
tests are quite slow, and we want to run them only in the dev mode.
2023-08-22 14:19:21 +02:00
Patryk Jędrzejczak
6818d13f7d test: ManagerClient.remove_node: allow List[HostId] for ignore_dead
ManagerClient.remove_node allows passing ignore_dead only as
List[IPAddress]. However, raft_removenode currently supports
only host ids. To write a test that passes ignore_dead to
ManagerClient.remove_node in the Raft topology mode, we allow
passing ignore_dead as List[HostId].

Note that we don't want to use List[IPAddress | HostId] because
mixing IP addresses and host ids fails anyway. See
ss::remove_node.set(...) in api::set_storage_service.
2023-08-22 14:19:09 +02:00
Patryk Jędrzejczak
26ad527666 raft topology: pass ignore_nodes to {replace, remove}_with_repair
To properly stream ranges during the removenode or replace
operation in the Raft topology mode, we pass IPs of the ignored
nodes to replace_with_repair and remove_with_repair in
storage_service::raft_topology_cmd_handler.
2023-08-22 14:18:39 +02:00
Patryk Jędrzejczak
e685182290 raft topology: exec_global_command: add ignore_nodes to exclude_nodes
We add ignore_nodes to exclude_nodes in exec_global_command
to ignore nodes marked as dead by --ignore-dead-nodes for
raft_removenode and --ignore-dead-nodes-for-replace for
raft_replace.
2023-08-22 14:18:37 +02:00
Patryk Jędrzejczak
5ebee35f99 raft topology: exec_global_command: change type of exclude_nodes
We extend exclude_nodes in exec_global_command with ignore_nodes
in the next commit. Since we already use std::unordered_set to
store ids of the ignored nodes and their number is unknown, we
change the type of exclude_nodes from utils::small_vector to
std::unordered_set.
2023-08-22 14:17:55 +02:00
Patryk Jędrzejczak
1f57d80ba1 topology_state_machine: extend request_param with a set of raft ids
We add two new alternative types to service::request_param:
removenode_param and replace_param. They allow storing the list
of ignored nodes loaded from the ignore_nodes column of
system.topology. We also remove the raft::server_id type because
it has been only used by the replace operation.
2023-08-22 14:17:37 +02:00
Patryk Jędrzejczak
7d3dc306eb raft topology: set ignore_nodes in raft_removenode and raft_replace
To handle --ignore-dead-nodes in raft_removenode and
--ignore-dead-nodes-for-replace in raft_replace, we set the
ignore_nodes value of the topology mutation in these functions. In
the following commits, we ensure that the topology coordinator
properly makes use of it.
2023-08-22 14:13:51 +02:00
Petr Gusev
1ddc76ffd1 test_fencing: add test_fence_hints
The test makes a write through the first node with
the third node down, this causes a hint to be stored on the
first node for the second. We increment the version
and fence_version on the third node, restart it,
and expect to see a hint delivery failure
because of versions mismatch. Then we update the versions
of the first node and expect hint to be successfully
delivered.
2023-08-22 15:48:40 +04:00
Petr Gusev
3ccd2abad4 test.py: output the skipped tests
pytest option -rs forces it to print
all the skipped tests along with
the reasons. Without this option we
can't tell why certain tests were skipped,
maybe some of them shouldn't already.
2023-08-22 15:48:40 +04:00
Petr Gusev
c434d26b36 test.py: add skip_mode decorator and fixture
Syntactic sugar for marking tests to be
skipped in a particular mode.

There is skip_in_debug/skip_in_release in suite.yaml,
but they can be applied only on the entire file,
which is unnatural and inconvenient. Also, they
don't allow to specify a reason why the test is skipped.

Separate dictionary skipped_funcs is needed since
we can't use pytest fixtures in decorators.
2023-08-22 15:48:40 +04:00
Petr Gusev
a639d161e6 test.py: add mode fixture
Sometimes a test wants to know what mode
it is running in so that e.g. it can skip
itself in some of them.
2023-08-22 15:48:40 +04:00
Petr Gusev
439c91851f hints: add debug log for dropped hints
Dropping data is rather important event,
let's log it at least at the debug level.
It'll help in debugging tests.
2023-08-22 15:48:40 +04:00
Petr Gusev
9fd3df13a2 hints: send_one_hint: extend the scope of file_send_gate holder
The problem was that the holder in with_gate
call was released too early. This happened
before the possible call to on_hint_send_failure
in then_wrapped. As a result, the effects of
on_hint_send_failure (segment_replay_failed flag)
were not visible in send_one_file after
ctx_ptr->file_send_gate.close(), so we could decide
that the segment was sent in full and delete
it even if sending of some hints led to errors.

Fixes #15110
2023-08-22 15:48:40 +04:00
Petr Gusev
0b7a90dff6 pylib: add ScyllaMetrics
This patch adds facilities to work
with Scylla metrics from test.py tests.
The new metrics property was added to
ManagerClient, its query method
sends a request to Scylla metrics
endpoint and returns and object
to conveniently access the result.

ScyllaMetrics is copy-pasted from
test_shedding.py. It's difficult
to reuse code between 'new' and 'old'
styles of tests, we can't just import
pylib in 'old' tests because of some
problems with python search directories.
A past commit of mine that attempted
to solve this problem was rejected on review.
2023-08-22 14:31:04 +04:00
Petr Gusev
1b7603af23 hints manager: add send_errors counter
There was no indication of problems
in the hints manager metrics before.
We need this counter for fencing tests
in the later commit, but it seems to be
useful on its own.
2023-08-22 14:31:04 +04:00
Petr Gusev
fa25e6d63e token_metadata: add debug logs
We log the new version when the new token
metadata is set.

Also, the log for fence_version is moved
in shared_token_metadata from storage_service
for uniformity.
2023-08-22 14:31:04 +04:00
Petr Gusev
360453fd87 fencing: add simple data plane test
The test starts a three node cluster
and manually decrements the version on
the last node. It then tries to write
some data through the last node and
expects to get 'stale topology' exception.
2023-08-22 14:31:01 +04:00
Benny Halevy
801987ab19 gossiper: mark_alive: remove local_state param
It is not used anymore.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-08-22 12:06:45 +03:00
Benny Halevy
75d1dd3a76 endpoint_state: get rid of _is_alive member and methods
Now that gossiper bases its is_alive status on _live_endpoints.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-08-22 12:06:45 +03:00
Benny Halevy
8a92a1c699 gossiper: is_alive: use _live_endpoints
Use the presence of of the endpoint in _live_endpoints
as the authoritative source for is_alive
rather than the endpoint_state::is_alive status.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-08-22 12:06:45 +03:00
Benny Halevy
a79acbb643 gossiper: evict_from_membership: erase endpoint from _live_endpoints
Although it shouldn't be necessary, erase endpoint
from _live_endpoint, just in case.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-08-22 12:06:30 +03:00
Benny Halevy
ce2b8724ed gossiper: replicate_live_endpoints_on_change: use _live_endpoints_version to detect change
Rather than keeping an expensive copy of `_live_endpoint`
and `_unreachable_endpoints` in shadow members, while they aren't
currently used for their content anyhow, just to detect when
their corresponding members change.

With that, it is renamed to replicate_live_and_unreachable_endpoints.

This still doesn't provide strong exception safety guarantees,
but at least we don't "cheat" about shard state
and we don't mark shard 0 state as "replicated" by
updating the shadow members.
Also, we save some unneeded allocations.

Refs scylladb/scylladb#15089
Refs scylladb/scylladb#15088

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-08-22 12:04:25 +03:00
Benny Halevy
d666fbfe8f gossiper: run: no need to replicate live_endpoints
As asias@scylladb.com noticed, after the previous
patch that calls replicate_live_endpoints_on_change
in mutate_live_and_unreachable_endpoints, _live_endpoints
are always updated on all shards when they change,
so there's no need anymore to replicate them here.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-08-22 11:57:32 +03:00
Benny Halevy
2c27297dbd gossiper: fold update_live_endpoints_version into replicate_live_endpoints_on_change
We want to propagate any change to _live_endpoints
to all shards.  Currently we just update the `_live_endpoints_version`
and `replicate_live_endpoints_on_change` propagtes the
change some undetermined time in the future.

To rely on `_live_endpoints` for gossiper::is_alive,
that may be called on any shard, we want to propagate
the change to all shards as soon as it happens.

Use `mutate_live_and_unreachable_endpoints` to update
_live_endpoints and/or _unreachable_endpoints safely,
under `lock_endpoint_update_semaphore`. It is responsible
for incrementing _live_endpoints_version and
calling `replicate_live_endpoints_on_change` to
propagate the change to all shards.

Refs scylladb/scylladb#15089
Refs scylladb/scylladb#15088

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-08-22 11:56:41 +03:00
Benny Halevy
86ccc1f49b gossiper: add mutate_live_and_unreachable_endpoints
To be used for safely modifying _live_endpoints
and/or _unreachable_endpoints and replicating the
new version to all shards.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-08-22 11:56:37 +03:00
Patryk Jędrzejczak
0beabdc6ba utils: introduce split_comma_separated_list
Three places handle comma-separated lists similarly:
- ss::remove_node.set(...) in api::set_storage_service,
- storage_service::parse_node_list,
- storage_service::is_repair_based_node_ops_enabled.
In the next commit, the fourth place that needs the same logic
appears -- storage_service::raft_replace. It needs to load
and parse the --ignore-dead-nodes-for-replace param from config.

Moreover, the code in is_repair_based_node_ops_enabled is
different and doesn't seem right. We swap '\"' and '\'' with ' '
but don't do anything with it afterward.

To avoid code duplication and fix is_repair_based_node_ops_enabled,
we introduce the new function utils::split_comma_separated_list.

This change has a small side effect on logging. For example,
ignore_nodes_strs in storage_service::parse_node_list might be
printed in a slightly different form.
2023-08-22 10:30:36 +02:00
Patryk Jędrzejczak
16f5db8af2 raft topology: add the ignore_nodes column to system.topology
In the following commits, we add support for --ignore-dead-nodes
in raft_removenode and --ignore-dead-nodes-for-replace in
raft_replace. To make these request parameters accessible for the
topology coordinator, we store them in the new ignore_nodes
column of system.topology.
2023-08-22 10:30:12 +02:00
Benny Halevy
a14e5ab8a3 gossiper: reset_endpoint_state_map: clear also shadow endpoint sets
If we don't clear them, there is a slight chance
that the next update will make `_live_endpoints` or `_unreachable_endpoints`
equal to their shadow counterparts and prevent an update
in `replicate_live_endpoints_on_change`.

Fixes #15003

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-08-22 09:17:19 +03:00
Benny Halevy
0cc0a95543 gossiper: reset_endpoint_state_map: clear live/unreachable endpoints on all shards
Not only on the calling shard (shard 0).
Essentially this change folds `update_live_endpoints_version`
into `reset_endpoint_state_map`.

Acquire the _endpoint_update_semaphore to serialize
this with `replicate_live_endpoints_on_change`.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-08-22 09:17:19 +03:00
Benny Halevy
c45868e3bc gossiper: functions that change _live_endpoints must be called on shard 0
`update_live_endpoints_version` and functions that call it
must be called on shard 0, since it updates the authoritative
`_live_endpoints` and `_live_endpoints_version`.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-08-22 09:17:19 +03:00
Benny Halevy
b0b1c8ae6e gossiper: add lock_endpoint_update_semaphore
Add a private helper to acquire the _endpoint_update_semaphore
before calling replicate_live_endpoints_on_change.

Must be called on shard 0.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-08-22 09:17:19 +03:00
Benny Halevy
18881bc89d gossiper: make _live_endpoints an unordered_set
It is more efficient to maintain as an unrdered_set
and it will be used in a following patch
to determine is_alive(endpoint) in O(1) on average.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-08-22 09:17:19 +03:00
Benny Halevy
97061cc3b8 endpoint_state: use gossiper::is_alive externally
Before we remove endpoint_state:_is_alive to rely
solely on gossipper::_live_endpoints.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-08-22 09:06:09 +03:00
Benny Halevy
758dc252ff repair: reindent to_repair_rows_list 2023-08-22 08:46:26 +03:00
Benny Halevy
7406e9f99b repair: to_repair_rows_list: clear_gently on error
Prevent destroying of potentially large `rows` and `row_list`
in one shot on error as it might caused a reactor stall.

Instead, use utils::clear_gently on the error return path.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-08-22 08:45:59 +03:00
Benny Halevy
e55143148f repair: to_repair_rows_list: consume frozen rows gently
Although to_repair_rows_list may yield if needed
between rows and mutation fragments, the input
`repair_rows_on_wire` is freed in one shot
and that may cause stalls as seen in qa:
```
  |              bytes_ostream::free_chain at ././bytes_ostream.hh:163
  ++           - addr=0x4103be0:
  |              bytes_ostream::~bytes_ostream at ././bytes_ostream.hh:199
  |              (inlined by) frozen_mutation_fragment::~frozen_mutation_fragment at ././mutation/frozen_mutation.hh:273
  |              (inlined by) std::destroy_at<frozen_mutation_fragment> at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/stl_construct.h:88
  |              (inlined by) ?? at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/alloc_traits.h:537
  |              (inlined by) ?? at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/list.tcc:77
  |              (inlined by) std::__cxx11::_List_base<frozen_mutation_fragment, std::allocator<frozen_mutation_fragment> >::~_List_base at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/stl_list.h:575
  |              (inlined by) partition_key_and_mutation_fragments::~partition_key_and_mutation_fragments at ././repair/repair.hh:203
  |              (inlined by) std::destroy_at<partition_key_and_mutation_fragments> at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/stl_construct.h:88
  |              (inlined by) ?? at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/alloc_traits.h:537
  |              (inlined by) ?? at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/list.tcc:77
  |              (inlined by) std::__cxx11::_List_base<partition_key_and_mutation_fragments, std::allocator<partition_key_and_mutation_fragments> >::~_List_base at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/stl_list.h:575
  |              (inlined by) to_repair_rows_list at ./repair/row_level.cc:597
```

This change consumes the rows and frozen mutation fragments
incrementally, freeing each after being processed.

Fixes scylladb/scylladb#14537

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-08-22 08:45:54 +03:00