Files
scylladb/utils/neat-object-id.hh
Pavel Emelyanov 95f15ea383 utils: B+ tree implementation
// The story is at
// https://groups.google.com/forum/#!msg/scylladb-dev/sxqTHM9rSDQ/WqwF1AQDAQAJ

This is the B+ version which satisfies several specific requirements
to be suitable for row-cache usage.

1. Insert/Remove doesn't invalidate iterators
2. Elements should be LSA-compactable
3. Low overhead of data nodes (1 pointer)
4. External less-only comparator
5. As little actions on insert/delete as possible
6. Iterator walks the sorted keys

The design, briefly is:

There are 3 types of nodes: inner, leaf and data, inner and leaf
keep build-in array of N keys and N(+1) nodes. Leaf nodes sit in
a doubly linked list. Data nodes live separately from the leaf ones
and keep pointers on them. Tree handler keeps pointers on root and
left-most and right-most leaves. Nodes do _not_ keep pointers or
references on the tree (except 3 of them, see below).

changes in v9:

- explicitly marked keys/kids indices with type aliases
- marked the whole erase/clear stuff noexcept
- disposers now accept object pointer instead of reference
- clear tree in destructor
- added more comments
- style/readability review comments fixed

Prior changes

**
- Add noexcepts where possible
- Restrict Less-comparator constraint -- it must be noexcept
- Generalized node_id
- Packed code for beging()/cbegin()

**
- Unsigned indices everywhere
- Cosmetics changes

**
- Const iterators
- C++20 concepts

**
- The index_for() implmenetation is templatized the other way
  to make it possible for AVX key search specialization (further
  patching)

**
- Insertion tries to push kids to siblings before split

  Before this change insertion into full node resulted into this
  node being split into two equal parts. This behaviour for random
  keys stress gives a tree with ~2/3 of nodes half-filled.

  With this change before splitting the full node try to push one
  element to each of the siblings (if they exist and not full).
  This slows the insertion a bit (but it's still way faster than
  the std::set), but gives 15% less total number of nodes.

- Iterator method to reconstruct the data at the given position

  The helper creates a new data node, emplaces data into it and
  replaces the iterator's one with it. Needed to keep arrays of
  data in tree.

- Milli-optimize erase()
  - Return back an iterator that will likely be not re-validated
  - Do not try to update ancestors separation key for leftmost kid

  This caused the clear()-like workload work poorly as compared to
  std:set. In particular the row_cache::invalidate() method does
  exactly this and this change improves its timing.

- Perf test to measure drain speed
- Helper call to collect tree counters

**
- Fix corner case of iterator.emplace_before()
- Clean heterogenous lookup API
- Handle exceptions from nodes allocations
- Explicitly mark places where the key is copied (for future)
- Extend the tree.lower_bound() API to report back whether
  the bound hit the key or not
- Addressed style/cleanness review comments

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-07-14 16:29:43 +03:00

54 lines
1.5 KiB
C++

/*
* Copyright (C) 2020 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include <atomic>
namespace utils {
/*
* The neat_id class is purely a debugging thing -- when reading
* the logs with object IDs in it it's more handy to look at those
* consisting * of 1-3 digits, rather than 16 hex-digits of a printed
* pointer.
*
* Embed with [[no_unique_address]] tag for memory efficiency
*/
template <bool Debug>
struct neat_id {
unsigned int operator()() const noexcept { return reinterpret_cast<uintptr_t>(this); }
};
template <>
struct neat_id<true> {
unsigned int _id;
static unsigned int _next() noexcept {
static std::atomic<unsigned int> rover {1};
return rover.fetch_add(1);
}
neat_id() noexcept : _id(_next()) {}
unsigned int operator()() const noexcept { return _id; }
};
} // namespace