view: row_lock: lock_ck: serialize partition and row locking

The problematic scenario this patch fixes might happen due to
unfortunate serialization of locks/unlocks between lock_pk and lock_ck,
as follows:

    1. lock_pk acquires an exclusive lock on the partition.
    2.a lock_ck attempts to acquire shared lock on the partition
        and any lock on the row. both cases currently use a fiber
        returning a future<rwlock::holder>.
    2.b since the partition is locked, the lock_partition times out
        returning an exceptional future.  lock_row has no such problem
        and succeeds, returning a future holding a rwlock::holder,
        pointing to the row lock.
    3.a the lock_holder previously returned by lock_pk is destroyed,
        calling `row_locker::unlock`
    3.b row_locker::unlock sees that the partition is not locked
        and erases it, including the row locks it contains.
    4.a when_all_succeeds continuation in lock_ck runs.  Since
        the lock_partition future failed, it destroyes both futures.
    4.b the lock_row future is destroyed with the rwlock::holder value.
    4.c ~holder attempts to return the semaphore units to the row rwlock,
        but the latter was already destroyed in 3.b above.

Acquiring the partition lock and row lock in parallel
doesn't help anything, but it complicates error handling
as seen above,

This patch serializes acquiring the row lock in lock_ck
after locking the partition to prevent the above race.

This way, erasing the unlocked partition is never expected
to happen while any of its rows locks is held.

Fixes #12168

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes #12208
This commit is contained in:
Benny Halevy
2022-12-04 16:28:37 +02:00
committed by Avi Kivity
parent f017e9f1c6
commit 5007ded2c1

View File

@@ -10,8 +10,6 @@
#include "log.hh"
#include "utils/latency.hh"
#include <seastar/core/when_all.hh>
static logging::logger mylog("row_locking");
row_locker::row_locker(schema_ptr s)
@@ -102,9 +100,10 @@ row_locker::lock_ck(const dht::decorated_key& pk, const clustering_key_prefix& c
single_lock_stats.operations_currently_waiting_for_lock++;
utils::latency_counter waiting_latency;
waiting_latency.start();
future<lock_type::holder> lock_row = exclusive ? j->second.hold_write_lock(timeout) : j->second.hold_read_lock(timeout);
return when_all_succeed(std::move(lock_partition), std::move(lock_row))
.then_unpack([this, pk = &i->first, cpk = &j->first, exclusive, &single_lock_stats, waiting_latency = std::move(waiting_latency)] (auto lock1, auto lock2) mutable {
return lock_partition.then([this, pk = &i->first, cpk = &j->first, &row_lock = j->second, exclusive, &single_lock_stats, waiting_latency = std::move(waiting_latency), timeout] (auto lock1) mutable {
auto lock_row = exclusive ? row_lock.hold_write_lock(timeout) : row_lock.hold_read_lock(timeout);
return lock_row.then([this, pk, cpk, exclusive, &single_lock_stats, waiting_latency = std::move(waiting_latency), lock1 = std::move(lock1)] (auto lock2) mutable {
// FIXME: indentation
lock1.release();
lock2.release();
waiting_latency.stop();
@@ -112,6 +111,7 @@ row_locker::lock_ck(const dht::decorated_key& pk, const clustering_key_prefix& c
single_lock_stats.lock_acquisitions++;
single_lock_stats.operations_currently_waiting_for_lock--;
return lock_holder(this, pk, cpk, exclusive);
});
});
}