Files
scylladb/partition_version_list.hh
Tomasz Grabiec a459d9ab98 row_cache: Fix undefined behavior during eviction under some conditions
When the last non-dummy row is evicted from a partition, the partition
entry is evicted as well. The existing logic in on_evicted() leaves
the last dummy row in the partition version before evicting the
partition entry. This row may still be attached to the LRU. Eviction
of partition entry goes through mutation_cleaner::clear_gently(). If
this is preempted, the destruction may proceed in the background. If
evicition happens on the remaining row in that entry before it's
destroyed, the code will hit undefined behavior. on_evicted() calls
partition_version::is_referenced_from_entry(), which is unspecified
when the version is enqueued in the mutation_cleaner. It returns
incorrect value for the last item remaining in the LRU. In that case
eviction will try to access non-existent containing partition_entry,
causing undefined behavior.

Caught by debug-mode cql_query_test.test_clustering_filtering with
raft enabled. Where it manifested like this:

  partition_version.hh:328:16: runtime error: load of value 7, which is not a valid value for type 'bool'
  SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior partition_version.hh:328:16 in
  Aborting on shard 0.

Instances of this issue outside of the unit test environment are not
known as of yet.

This change makes is_referenced_from_entry() return the correct value
even for versions which are queued in the mutation cleaner.

Fixes #11140
2022-08-01 23:53:15 +02:00

91 lines
2.5 KiB
C++

/*
* Copyright (C) 2018-present ScyllaDB
*/
/*
* SPDX-License-Identifier: AGPL-3.0-or-later
*/
#pragma once
#include "partition_version.hh"
// Double-ended chained list of partition_version objects
// utilizing partition_version's intrinsic anchorless_list_base_hook.
class partition_version_list {
// All references have unique_owner set to true
// so that partition_version::is_referenced_from_entry() is false
// so that versions owned here (by mutation_cleaner) are not evicted-from.
partition_version_ref _head;
partition_version_ref _tail; // nullptr means _tail == _head.
public:
// Appends v to the tail of this deque.
// The version must not be already referenced.
void push_back(partition_version& v) noexcept {
if (!_tail) {
if (_head) {
v.insert_before(*_head);
_tail = std::move(_head);
}
_head = partition_version_ref(v, true);
#ifdef SEASTAR_DEBUG
assert(!_head->is_referenced_from_entry());
#endif
} else {
v.insert_after(*_tail);
_tail = partition_version_ref(v, true);
#ifdef SEASTAR_DEBUG
assert(!_tail->is_referenced_from_entry());
#endif
}
}
// Returns a reference to the first version in this deque.
// Call only if !empty().
partition_version& front() noexcept {
return *_head;
}
// Returns true iff contains any versions.
bool empty() const noexcept {
return !_head;
}
// Detaches the first version from the list.
// Assumes !empty().
void pop_front() noexcept {
if (_tail && _head->next() == &*_tail) {
_tail = {};
}
partition_version* next = _head->next();
_head->erase();
_head.release();
if (next) {
_head = partition_version_ref(*next, true);
#ifdef SEASTAR_DEBUG
assert(!_head->is_referenced_from_entry());
#endif
}
}
// Appends other to the tail of this deque.
// The other deque will be left empty.
void splice(partition_version_list& other) noexcept {
if (!other._head) {
return;
}
if (!_head) {
_head = std::move(other._head);
_tail = std::move(other._tail);
} else {
(_tail ? _tail : _head)->splice(*other._head);
if (other._tail) {
_tail = std::move(other._tail);
other._head = {};
} else {
_tail = std::move(other._head);
}
}
}
};