repair: resolve start-up deadlock
Repairs have to obtain a permit to the reader concurrency semaphore on each shard they have a presence on. This is prone to deadlocks: node1 node2 repair1_master (takes permit) repair1_follower (waits on permit) repair2_master (waits for permit) repair2_follower (takes permit) In lieu of strong central coordination, we solved this by making permits evictable: if repair2 can evict repair1's permit so it can obtain one and make progress. This is not efficient as evicting a permit usually means discarding already done work, but it prevents the deadlocks. We recently discovered that there is a window when deadlocks can still happen. The permit is made evictable when the disk reader is created. This reader is an evictable one, which effectively makes the permit evictable. But the permit is obtained when the repair constrol structrure -- repair meta -- is create. Between creating the repair meta and reading the first row from disk, the deadlock is still possible. And we know that what is possible, will happen (and did happen). Fix by making the permit evictable as soon as the repair meta is created. This is very clunky and we should have a better API for this (refs #17644), but for now we go with this simple patch, to make it easy to backport. Refs: #17644 Fixes: #17591
This commit is contained in:
@@ -788,6 +788,7 @@ private:
|
||||
repair_hasher _repair_hasher;
|
||||
gc_clock::time_point _compaction_time;
|
||||
bool _is_tablet;
|
||||
reader_concurrency_semaphore::inactive_read_handle _fake_inactive_read_handle;
|
||||
public:
|
||||
std::vector<repair_node_state>& all_nodes() {
|
||||
return _all_node_states;
|
||||
@@ -899,6 +900,13 @@ public:
|
||||
for (unsigned i = 0; i < all_live_peer_nodes.size(); i++) {
|
||||
_all_node_states.push_back(repair_node_state(all_live_peer_nodes[i], all_live_peer_shards[i].value_or(repair_unspecified_shard)));
|
||||
}
|
||||
// Mark the permit as evictable immediately. If there are multiple
|
||||
// repairs launched, they can deadlock in the initial phase, before
|
||||
// they start reading from disk (which is the point where they
|
||||
// become evictable normally).
|
||||
// Prevent this by marking the permit as evictable ASAP.
|
||||
// FIXME: provide a better API for this, this is very clunky
|
||||
_fake_inactive_read_handle = _db.local().get_reader_concurrency_semaphore().register_inactive_read(make_empty_flat_reader_v2(_schema, _permit));
|
||||
}
|
||||
|
||||
// follower constructor
|
||||
@@ -1137,6 +1145,9 @@ private:
|
||||
std::list<repair_row> cur_rows;
|
||||
std::exception_ptr ex;
|
||||
if (!_repair_reader) {
|
||||
// We are about to create a real evictable reader, so drop the fake
|
||||
// reader (evicted or not), we don't need it anymore.
|
||||
_db.local().get_reader_concurrency_semaphore().unregister_inactive_read(std::move(_fake_inactive_read_handle));
|
||||
_repair_reader.emplace(_db,
|
||||
_db.local().find_column_family(_schema->id()),
|
||||
_schema,
|
||||
|
||||
Reference in New Issue
Block a user