scylladb/test/unit/btree_stress_test.cc
Pavel Emelyanov 2f7c03d84c utils: Intrusive B-tree (with tests)
The design of the tree goes from the row-cache needs, which are

1. Insert/Remove do not invalidate iterators
2. Elements are LSA-manageable
3. Low key overhead
4. External tri-comparator
5. As few actions on insert/remove as possible
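As an illustration of requirement 4, here is a minimal sketch of an external tri-comparator driving a linear scan over a node's key pointers. The names int_key, int_tri_compare and find_slot are hypothetical and are not taken from utils/intrusive_btree.hh; the int return convention is also an assumption.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical key type; a real key would also embed the tree hook.
struct int_key {
    int value;
};

// External tri-comparator: the ordering lives outside the key type.
// Returns <0, 0 or >0, so a single call tells "less", "equal" and
// "greater" apart.
struct int_tri_compare {
    int operator()(const int_key& a, const int_key& b) const noexcept {
        return (a.value > b.value) - (a.value < b.value);
    }
};

// Linear scan over an array of key pointers. Returns the index of the
// first key that is not less than `k` and reports through `match`
// whether that key compares equal to `k`.
template <typename Compare>
std::size_t find_slot(const int_key* const* keys, std::size_t n,
                      const int_key& k, Compare cmp, bool& match) {
    match = false;
    for (std::size_t i = 0; i < n; i++) {
        int r = cmp(*keys[i], k);
        if (r >= 0) {
            match = (r == 0);
            return i;
        }
    }
    return n;
}
```

With a two-way "less" comparator the same scan would need a second call to detect equality; the tri-comparator gets both answers from one comparison.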

With the above the design is

Two types of nodes -- inner and leaf. Both types keep a pointer to the parent
node and N pointers to keys (not the keys themselves). Two differences: inner
nodes have an array of pointers to kids, leaf nodes keep a pointer to the tree
(to update the leftmost and rightmost tree pointers on node move).

Nodes do not keep pointers/references on trees, thus we have O(1) move of any
object, but O(logN) to get the tree size. Fortunately, with big keys-per-node
value this won't result in too many steps.

In turn, the tree has 3 pointers -- root, left- and rightmost leaves. The latter
two are for constant-time begin() and end().
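The layout described above can be sketched roughly as follows. This is a simplified model under stated assumptions, not the actual definitions: all type and field names (node, inner_node, leaf_node, owner, move_leaf, first_key) and the node_size value are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t node_size = 4;   // hypothetical keys-per-node value

struct key { int value; };             // stand-in for a user-owned key
struct tree;

// Common part: a parent pointer plus an array of pointers to keys.
struct node {
    node* parent = nullptr;
    std::size_t num_keys = 0;
    key* keys[node_size] = {};
};

// Inner nodes additionally hold pointers to kids.
struct inner_node : node {
    node* kids[node_size + 1] = {};
};

// Leaf nodes additionally hold a pointer to the tree.
struct leaf_node : node {
    tree* owner = nullptr;
};

struct tree {
    node* root = nullptr;
    leaf_node* leftmost = nullptr;
    leaf_node* rightmost = nullptr;
};

// Constant-time access to the first key via the leftmost leaf.
inline key* first_key(const tree& t) noexcept {
    return t.leftmost ? t.leftmost->keys[0] : nullptr;
}

// Moving a leaf to a new address: the tree pointer kept in the leaf is
// what allows fixing up the tree's leftmost/rightmost pointers.
inline void move_leaf(leaf_node& from, leaf_node& to) noexcept {
    to = from;                         // copies parent, keys and owner
    if (!to.owner) {
        return;
    }
    tree& t = *to.owner;
    if (t.root == &from) { t.root = &to; }
    if (t.leftmost == &from) { t.leftmost = &to; }
    if (t.rightmost == &from) { t.rightmost = &to; }
    // A full implementation would also update the parent's kids[] slot.
}
```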

Keys are managed by the user with the help of an embeddable member_hook
instance, which is 1 pointer in size.
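To show what a one-pointer hook means for the key object, here is a minimal sketch. The hook and row types below are hypothetical stand-ins, not the real member_hook; recovering the key from the hook address with offsetof is a common intrusive-container idiom and is assumed here for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

struct node;   // opaque tree node, not needed for the sketch

// Hypothetical hook: a single back-pointer to the node holding the key.
struct hook {
    node* owner = nullptr;
    bool linked() const noexcept { return owner != nullptr; }
};
static_assert(sizeof(hook) == sizeof(void*), "the hook costs one pointer per key");

// User-managed key object that embeds the hook as a plain member.
struct row {
    int id;
    hook link;
    explicit row(int i) : id(i) {}
};
static_assert(std::is_standard_layout_v<row>, "offsetof below relies on this");

// Recover the key from the address of its embedded hook.
inline row* row_of(hook* h) noexcept {
    return reinterpret_cast<row*>(reinterpret_cast<char*>(h) - offsetof(row, link));
}
```

Because the container stores only pointers to hooks, it never allocates or copies keys; the user decides where each key lives.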

The code was copied from the B+ tree one, then heavily reworked; the internal
algorithms turned out to differ quite significantly.

For the sake of mutation_partition::apply_monotonically(), which needs to move
an element from one tree into another, there's a key_grabber helper wrapper
that allows doing this move while respecting the exception-safety requirement.
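The idea behind such a wrapper can be sketched as follows: the destination performs everything that may throw first, and only then asks the wrapper to unlink the element from the source. This is a simplified model built on std::list and std::vector; the grabber class, its release() method and the fail_next_alloc knob are hypothetical and do not reflect the real key_grabber interface.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <new>
#include <vector>

struct item { int value; };

using source_list = std::list<item*>;

// Hypothetical grabber: remembers where the element sits in the source
// and unlinks it only when release() is called.
class grabber {
    source_list& _src;
    source_list::iterator _it;
public:
    grabber(source_list& src, source_list::iterator it) : _src(src), _it(it) {}

    // Unlinks the element from the source. Does not allocate or throw.
    item* release() noexcept {
        item* p = *_it;
        _it = _src.erase(_it);
        return p;
    }
};

struct destination {
    std::vector<item*> slots;
    bool fail_next_alloc = false;      // test knob simulating an allocation failure

    void insert(grabber g) {
        if (fail_next_alloc) {
            throw std::bad_alloc();
        }
        slots.reserve(slots.size() + 1);   // the only step that may throw
        slots.push_back(g.release());      // capacity is reserved, cannot throw
    }
};
```

If insert() throws, release() was never called, so the element is still linked into the source and nothing is lost.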

As measured by the perf_collections test, the B-tree with 8 keys is faster than
std::set, but slower than the B+ tree:

            vs set        vs b+tree
   fill:     +13%           -6%
   find:     +23%          -35%

Another neat thing is that 1-key insertion-removal is ~40% faster than
for BST (the same number of allocations, but the key object is smaller,
with fewer pointers to set up and fewer instructions to execute when
linking the node with the root).

v4:
- equip insertion methods with on_alloc_point() calls to catch
  potential exception-guarantee violations earlier

- add unlink_leftmost_without_rebalance. The method is borrowed from
  boost intrusive set, and is added to kill two birds -- provide it,
  as it turns out to be popular, and get a step-by-step tree
  destruction that is a bit faster than a plain begin+erase loop
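The destruction pattern mentioned above can be sketched on a plain unbalanced binary search tree. Only the method name is taken from the text; the bst and bst_node types and the clear_and_dispose helper are hypothetical, and the real B-tree implementation differs.

```cpp
#include <cassert>

struct bst_node {
    int value;
    bst_node* left = nullptr;
    bst_node* right = nullptr;
    explicit bst_node(int v) : value(v) {}
};

struct bst {
    bst_node* root = nullptr;

    void insert(bst_node* n) noexcept {
        bst_node** p = &root;
        while (*p) {
            p = (n->value < (*p)->value) ? &(*p)->left : &(*p)->right;
        }
        *p = n;
    }

    // Detach the smallest node and put its right subtree in its place.
    // No rebalancing is done, so the tree is only good for being
    // drained further. Returns nullptr when the tree is empty.
    bst_node* unlink_leftmost_without_rebalance() noexcept {
        if (!root) {
            return nullptr;
        }
        bst_node** p = &root;
        while ((*p)->left) {
            p = &(*p)->left;
        }
        bst_node* n = *p;
        *p = n->right;
        n->right = nullptr;
        return n;
    }
};

// Step-by-step destruction: no lookups and no rebalancing per element.
template <typename Disposer>
void clear_and_dispose(bst& t, Disposer dispose) {
    while (bst_node* n = t.unlink_leftmost_without_rebalance()) {
        dispose(n);
    }
}
```

A begin+erase loop would restore the tree invariants after every removal, which is wasted work when the whole tree is going away.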

v3:
- introduce "inline" root node that is embedded into the tree object and in
  which the 1st key is inserted. This greatly improves the 1-key-tree
  performance, which is a pretty common case for the row cache

v2:
- introduce "linear" root leaf that grows on demand

  This improves the memory consumption for small trees. This linear node may
  and should over-grow the NodeSize parameter. This comes from the fact that
  there are two big per-key memory spikes on small trees -- the 1-key root leaf
  and the first split, when the tree becomes a 1-key root with two half-filled
  leaves. If the linear extension goes above NodeSize it can flatten even the
  2nd peak

- mitigate the keys indirection a bit

  Prefetching the keys while doing the intra-node linear scan and the nodes
  while descending the tree gives ~+5% of fill and find

- generalize stress tests for B and B+ trees

- cosmetic changes
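The key prefetching mentioned in the v2 notes can be sketched as below: while the current key is being compared, the next key pointer in the node is prefetched, since keys are reached through one level of indirection. The key type and lower_slot function are hypothetical; __builtin_prefetch is a GCC/Clang builtin, it is only a hint and does not change the result.

```cpp
#include <cassert>
#include <cstddef>

struct key { int value; };   // stand-in for a user-owned key

// Linear scan over a node's key pointers. Returns the index of the
// first key whose value is not less than `wanted`, or `n` if none.
inline std::size_t lower_slot(key* const* keys, std::size_t n, int wanted) noexcept {
    if (n > 0) {
        __builtin_prefetch(keys[0]);
    }
    for (std::size_t i = 0; i < n; i++) {
        if (i + 1 < n) {
            // Start loading the next key while this one is compared.
            __builtin_prefetch(keys[i + 1]);
        }
        if (keys[i]->value >= wanted) {
            return i;
        }
    }
    return n;
}
```

Whether this pays off depends on the node size and on how scattered the keys are in memory, so it should be measured rather than assumed.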

TODO:

- fix a few inefficiencies in the core code (it sometimes walks the sub-tree twice)
- try to optimize the leaf nodes that are not left-/rightmost so that they do
  not carry an unused tree pointer on board

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-02-02 09:30:29 +03:00


/*
 * Copyright (C) 2021 ScyllaDB
 */

/*
 * This file is part of Scylla.
 *
 * Scylla is free software: you can redistribute it and/or modify
 * it under the terms of the GNU Affero General Public License as published by
 * the Free Software Foundation, either version 3 of the License, or
 * (at your option) any later version.
 *
 * Scylla is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with Scylla. If not, see <http://www.gnu.org/licenses/>.
 */
#include <seastar/core/app-template.hh>
#include <seastar/core/thread.hh>
#include <map>
#include <iostream>
#include <fmt/core.h>
#include <fmt/ostream.h>
constexpr int TEST_NODE_SIZE = 8;
constexpr int TEST_LINEAR_THRESH = 21;
#include "utils/intrusive_btree.hh"
#include "btree_validation.hh"
#include "test/unit/tree_test_key.hh"
#include "collection_stress.hh"
using namespace intrusive_b;
using namespace seastar;
class test_key : public tree_test_key_base {
public:
    member_hook _hook;
    test_key(int nr) noexcept : tree_test_key_base(nr) {}
    test_key(const test_key&) = delete;
    test_key(test_key&&) = delete;
};
using test_tree = tree<test_key, &test_key::_hook, test_key_tri_compare, TEST_NODE_SIZE, TEST_LINEAR_THRESH, key_search::both, with_debug::yes>;
using test_validator = validator<test_key, &test_key::_hook, test_key_tri_compare, TEST_NODE_SIZE, TEST_LINEAR_THRESH>;
using test_iterator_checker = iterator_checker<test_key, &test_key::_hook, test_key_tri_compare, TEST_NODE_SIZE, TEST_LINEAR_THRESH>;
int main(int argc, char **argv) {
    namespace bpo = boost::program_options;
    app_template app;
    app.add_options()
        ("count", bpo::value<int>()->default_value(4132), "number of keys to fill the tree with")
        ("iters", bpo::value<int>()->default_value(9), "number of iterations")
        ("keys", bpo::value<std::string>()->default_value("rand"), "how to generate keys (rand, asc, desc)")
        ("verb", bpo::value<bool>()->default_value(false), "be verbose");

    return app.run(argc, argv, [&app] {
        auto count = app.configuration()["count"].as<int>();
        auto iters = app.configuration()["iters"].as<int>();
        auto ks = app.configuration()["keys"].as<std::string>();
        auto verb = app.configuration()["verb"].as<bool>();

        return seastar::async([count, iters, ks, verb] {
            test_key_tri_compare cmp;
            auto t = std::make_unique<test_tree>();
            std::map<int, unsigned long> oracle;
            test_validator tv;
            auto* itc = new test_iterator_checker(tv, *t);

            stress_config cfg;
            cfg.count = count;
            cfg.iters = iters;
            cfg.keys = ks;
            cfg.verb = verb;
            auto itv = 0;

            stress_collection(cfg,
                /* insert */ [&] (int key) {
                    auto ir = t->insert(std::make_unique<test_key>(key), cmp);
                    assert(ir.second);
                    oracle[key] = key;
                    if (itv++ % 7 == 0) {
                        if (!itc->step()) {
                            delete itc;
                            itc = new test_iterator_checker(tv, *t);
                        }
                    }
                },
                /* erase */ [&] (int key) {
                    test_key k(key);
                    auto deleter = [] (test_key* k) noexcept { delete k; };

                    if (itc->here(k)) {
                        delete itc;
                        itc = nullptr;
                    }

                    t->erase_and_dispose(key, cmp, deleter);
                    oracle.erase(key);

                    if (itc == nullptr) {
                        itc = new test_iterator_checker(tv, *t);
                    }

                    if (itv++ % 5 == 0) {
                        if (!itc->step()) {
                            delete itc;
                            itc = new test_iterator_checker(tv, *t);
                        }
                    }
                },
                /* validate */ [&] {
                    if (verb) {
                        fmt::print("Validating\n");
                        tv.print_tree(*t, '|');
                    }
                    tv.validate(*t);
                },
                /* step */ [&] (stress_step step) {
                    if (step == stress_step::before_erase) {
                        auto sz = t->calculate_size();
                        if (sz != (size_t)count) {
                            fmt::print("Size {} != count {}\n", sz, count);
                            throw "size";
                        }

                        auto ti = t->begin();
                        for (auto oe : oracle) {
                            if ((unsigned long)*ti != oe.second) {
                                fmt::print("Data mismatch {} vs {}\n", oe.second, *ti);
                                throw "oracle";
                            }
                            ti++;
                        }
                    }
                }
            );

            delete itc;
        });
    });
}