mirror of
https://github.com/scylladb/scylladb.git
synced 2026-05-30 11:36:54 +00:00
The design of the tree follows from the row-cache needs, which are:
1. Insert/Remove do not invalidate iterators
2. Elements are LSA-manageable
3. Low key overhead
4. External tri-comparator
5. As few actions on insert/remove as possible
With the above in mind, the design is as follows.
Two types of nodes -- inner and leaf. Both types keep a pointer to the parent
node and N pointers to keys (not the keys themselves). There are two
differences: inner nodes have an array of pointers to kids, while leaf nodes
keep a pointer to the tree (to update the tree's leftmost and rightmost
pointers when a node moves).
Nodes do not keep pointers/references to trees, so moving any object is O(1),
but getting the tree size is O(logN). Fortunately, with a big keys-per-node
value this won't result in too many steps.
In turn, the tree has 3 pointers -- the root and the leftmost and rightmost
leaves. The latter two are for constant-time begin() and end().
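The layout described above can be sketched roughly as follows. This is only an illustration: all names (node_base, inner_node, leaf_node, corner_tree, depth_of) and the NodeSize value are assumptions made for the sketch, not the definitions from utils/intrusive_btree.hh.

```cpp
#include <array>
#include <cstddef>

// Hypothetical names throughout; this is not the code from the patch.
constexpr std::size_t NodeSize = 4;

struct hook;        // stands for the hook embedded into the user's key
struct tree;
struct inner_node;

// Parts shared by both node types: a parent pointer and N pointers to keys.
struct node_base {
    inner_node* parent = nullptr;            // nullptr for the root
    std::array<hook*, NodeSize> keys{};      // pointers to keys, not the keys themselves
    std::size_t num_keys = 0;
    bool leaf = true;
};

// Inner nodes additionally carry pointers to their children.
struct inner_node : node_base {
    std::array<node_base*, NodeSize + 1> kids{};
    inner_node() { leaf = false; }
};

// Leaf nodes additionally carry a pointer to the tree, so that the tree's
// leftmost/rightmost pointers can be fixed up when the leaf is moved.
struct leaf_node : node_base {
    tree* corner_tree = nullptr;
};

// The tree itself is just three pointers.
struct tree {
    node_base* root = nullptr;
    leaf_node* leftmost = nullptr;   // gives constant-time begin()
    leaf_node* rightmost = nullptr;  // gives constant-time end()
};

// Walking up the parent chain takes O(logN) steps.
inline std::size_t depth_of(const node_base* n) {
    std::size_t d = 0;
    for (; n->parent != nullptr; n = n->parent) {
        ++d;
    }
    return d;
}
```

With a large keys-per-node value the parent chain stays short, which is why a walk such as depth_of() is cheap in practice.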
Keys are managed by the user with the help of an embeddable member_hook
instance, which is 1 pointer in size.
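A minimal sketch of how a one-pointer hook can keep a key movable (as needed for LSA) is given below. It assumes that the hook points at the node referencing it and that the node re-points its slot when the key is move-constructed. The node, reattach and my_key names are hypothetical, and the real member_hook may work differently.

```cpp
#include <array>
#include <cstddef>
#include <utility>

// Hypothetical, simplified stand-ins for the real node and hook classes.
struct member_hook;

struct node {
    static constexpr std::size_t capacity = 4;
    std::array<member_hook*, capacity> keys{};

    // Re-point the slot that referenced `from` to `to`.
    // The scan is bounded by the node capacity, so it is O(1).
    void reattach(member_hook* from, member_hook* to) noexcept {
        for (auto& k : keys) {
            if (k == from) {
                k = to;
                return;
            }
        }
    }
};

// The hook is a single pointer to the node that currently references the key.
struct member_hook {
    node* owner = nullptr;

    member_hook() noexcept = default;
    member_hook(const member_hook&) = delete;
    member_hook(member_hook&& o) noexcept : owner(std::exchange(o.owner, nullptr)) {
        if (owner != nullptr) {
            owner->reattach(&o, this);
        }
    }
};

static_assert(sizeof(member_hook) == sizeof(void*), "hook is one pointer in size");

// A user key embeds the hook next to its payload.
struct my_key {
    int value;
    member_hook hook;

    explicit my_key(int v) noexcept : value(v) {}
    my_key(my_key&& o) noexcept : value(o.value), hook(std::move(o.hook)) {}
};
```

Moving a my_key leaves the node pointing at the new object, so iterators that refer to the node rather than to the key address stay valid.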
The code was copied from the B+ tree one and then heavily reworked; the
internal algorithms turned out to differ quite significantly.
For the sake of mutation_partition::apply_monotonically(), which needs to move
an element from one tree into another, there's a key_grabber helper wrapper
that allows doing this move while respecting the exception-safety requirement.
As measured by the perf_collections test, the B-tree with 8 keys is faster than
std::set, but slower than the B+ tree:
            vs set    vs b+tree
    fill:    +13%        -6%
    find:    +23%       -35%
Another neat thing is that 1-key insertion-removal is ~40% faster than
for a BST (the same number of allocations, but the key object is smaller,
with fewer pointers to set up and fewer instructions to execute when linking
the node with the root).
v4:
- equip insertion methods with on_alloc_point() calls to catch
  potential exception guarantee violations earlier
- add unlink_leftmost_without_rebalance. The method is borrowed from
  boost intrusive set, and is added to kill two birds with one stone:
  provide it, as it turns out to be popular, and use it for a step-by-step
  tree destruction that is a bit faster than a plain begin+erase loop
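The idea behind unlink_leftmost_without_rebalance can be illustrated on a plain binary search tree. The sketch below is not the B-tree code from this patch and not the boost implementation. The bst_node type and the helper functions are hypothetical, and the tree is walked from the root on every call for simplicity.

```cpp
#include <cstddef>

// Hypothetical minimal binary search tree node.
struct bst_node {
    int key;
    bst_node* left = nullptr;
    bst_node* right = nullptr;
    explicit bst_node(int k) : key(k) {}
};

// Plain, unbalanced insertion, only used to build a tree to tear down.
inline void insert(bst_node*& root, bst_node* n) {
    bst_node** link = &root;
    while (*link != nullptr) {
        link = n->key < (*link)->key ? &(*link)->left : &(*link)->right;
    }
    *link = n;
}

// Detach the smallest node and splice its right subtree into its place.
// No rebalancing is done, so afterwards the tree is only good for
// further unlinking. Returns nullptr once the tree is empty.
inline bst_node* unlink_leftmost_without_rebalance(bst_node*& root) noexcept {
    if (root == nullptr) {
        return nullptr;
    }
    bst_node** link = &root;
    while ((*link)->left != nullptr) {
        link = &(*link)->left;
    }
    bst_node* leftmost = *link;
    *link = leftmost->right;
    leftmost->right = nullptr;
    return leftmost;
}

// Step-by-step destruction; returns the number of disposed nodes.
inline std::size_t clear_and_dispose(bst_node*& root) noexcept {
    std::size_t disposed = 0;
    while (bst_node* n = unlink_leftmost_without_rebalance(root)) {
        delete n;
        ++disposed;
    }
    return disposed;
}
```

Compared with a begin+erase loop, each step skips the rebalancing work that a regular erase would do only for the tree to be destroyed anyway.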
v3:
- introduce "inline" root node that is embedded into the tree object and
  into which the 1st key is inserted. This greatly improves the 1-key-tree
  performance, which is a pretty common case for the row cache
v2:
- introduce "linear" root leaf that grows on demand
  This improves the memory consumption for small trees. This linear node may
  and should over-grow the NodeSize parameter. This comes from the fact that
  there are two big per-key memory spikes on small trees -- the 1-key root
  leaf and the first split, when the tree becomes a 1-key root with two
  half-filled leaves. If the linear extension goes above NodeSize it can
  flatten even the 2nd peak
- mitigate the keys indirection a bit
  Prefetching the keys while doing the intra-node linear scan and the nodes
  while descending the tree gives ~+5% on fill and find
- generalize stress tests for B and B+ trees
- cosmetic changes
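The key prefetching mentioned in the v2 notes can be sketched as follows. The key, key_tri_compare and scan_node names are made up for the illustration, the tri-comparator is assumed to return a negative, zero or positive int, and __builtin_prefetch is a GCC/Clang builtin that acts as a hint only.

```cpp
#include <cstddef>

// Hypothetical key and external tri-comparator; names are illustrative only.
struct key {
    int value;
};

struct key_tri_compare {
    // Negative if a < b, zero if equal, positive if a > b.
    int operator()(const key& a, const key& b) const noexcept {
        return (a.value > b.value) - (a.value < b.value);
    }
};

// Linear scan over an array of key pointers. While key i is being compared,
// key i + 1 is prefetched to hide part of the pointer-chasing latency.
// Returns the index of the first key that is not less than `k` (or `n` if
// there is none) and sets `match` if that key compares equal to `k`.
template <typename Compare>
std::size_t scan_node(const key* const* keys, std::size_t n, const key& k,
                      Compare cmp, bool& match) {
    match = false;
    if (n > 0) {
        __builtin_prefetch(keys[0]);
    }
    for (std::size_t i = 0; i < n; ++i) {
        if (i + 1 < n) {
            __builtin_prefetch(keys[i + 1]);
        }
        const int res = cmp(k, *keys[i]);
        if (res <= 0) {
            match = (res == 0);
            return i;
        }
    }
    return n;
}
```

The same trick applies when descending the tree: the child node to visit next can be prefetched while the current node is still being scanned.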
TODO:
- fix a few inefficiencies in the core code (it sometimes walks the sub-tree
  twice)
- try to optimize the leaf nodes that are not left-/rightmost so that they
  do not carry an unused tree pointer on board
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
97 lines
3.4 KiB
C++
/*
 * Copyright (C) 2021 ScyllaDB
 */

/*
 * This file is part of Scylla.
 *
 * Scylla is free software: you can redistribute it and/or modify
 * it under the terms of the GNU Affero General Public License as published by
 * the Free Software Foundation, either version 3 of the License, or
 * (at your option) any later version.
 *
 * Scylla is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with Scylla. If not, see <http://www.gnu.org/licenses/>.
 */

#include <seastar/core/app-template.hh>
#include <seastar/core/thread.hh>
#include <map>
#include <iostream>
#include <fmt/core.h>
#include "utils/logalloc.hh"

constexpr int TEST_NODE_SIZE = 7;
constexpr int TEST_LINEAR_THRESHOLD = 19;

#include "tree_test_key.hh"
#include "utils/intrusive_btree.hh"
#include "btree_validation.hh"
#include "collection_stress.hh"

using namespace intrusive_b;
using namespace seastar;

class test_key : public tree_test_key_base {
public:
    member_hook _hook;
    test_key(int nr) noexcept : tree_test_key_base(nr) {}
    test_key(const test_key&) = delete;
    test_key(test_key&& o) noexcept : tree_test_key_base(std::move(o)), _hook(std::move(o._hook)) {}
};

using test_tree = tree<test_key, &test_key::_hook, test_key_tri_compare, TEST_NODE_SIZE, TEST_LINEAR_THRESHOLD, key_search::both, with_debug::yes>;
using test_validator = validator<test_key, &test_key::_hook, test_key_tri_compare, TEST_NODE_SIZE, TEST_LINEAR_THRESHOLD>;

int main(int argc, char **argv) {
    namespace bpo = boost::program_options;
    app_template app;
    app.add_options()
        ("count", bpo::value<int>()->default_value(10000), "number of keys to fill the tree with")
        ("iters", bpo::value<int>()->default_value(13), "number of iterations")
        ("verb", bpo::value<bool>()->default_value(false), "be verbose");

    return app.run(argc, argv, [&app] {
        auto count = app.configuration()["count"].as<int>();
        auto rep = app.configuration()["iters"].as<int>();
        auto verb = app.configuration()["verb"].as<bool>();

        return seastar::async([count, rep, verb] {
            stress_config cfg;
            cfg.count = count;
            cfg.iters = rep;
            cfg.verb = verb;

            tree_pointer<test_tree> t;
            test_validator tv;

            stress_compact_collection(cfg,
                /* insert */ [&] (int key) {
                    test_key *k = current_allocator().construct<test_key>(key);
                    auto ti = t->insert(*k, test_key_tri_compare{});
                    assert(ti.second);
                },
                /* erase */ [&] (int key) {
                    auto deleter = current_deleter<test_key>();
                    t->erase_and_dispose(test_key(key), test_key_tri_compare{}, deleter);
                },
                /* validate */ [&] {
                    if (verb) {
                        fmt::print("Validating:\n");
                        tv.print_tree(*t, '|');
                    }
                    tv.validate(*t);
                },
                /* clear */ [&] {
                    t->clear_and_dispose(current_deleter<test_key>());
                }
            );
        });
    });
}