scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-26 11:30:36 +00:00

Author	SHA1	Message	Date
Avi Kivity	5f4bf18387	Revert "Merge 'sstables: add versioning to the sstable_set ' from Wojciech Mitros" This reverts commit `31909515b3`, reversing changes made to `ef97adc72a`. It shows many serious regressions in dtest. Fixes #8197.	2021-03-02 13:21:22 +02:00
Avi Kivity	31909515b3	Merge 'sstables: add versioning to the sstable_set ' from Wojciech Mitros Currently, the sstable_set in a table is copied before every change to allow accessing the unchanged version by existing sstable readers. This patch changes the sstable_set to a structure that keeps all its versions that are referenced somewhere and provides a way of getting a reference to an immutable version of the set. Each sstable in the set is associated with the versions it is alive in, and is removed when all such versions don't have references anymore. To avoid copying, the object holding all sstables in the set version is changed to a new structure, sstable_list, which was previously an alias for std::unordered_set<shared_sstable>, and which implements most of the methods of an unordered_set, but its iterator uses the actual set with all sstables from all referenced versions and iterates over those sstables that belong to the captured version. The methods that modify the sets contents give strong exception guarantee by trying to insert new sstables to its containers, and erasing them in the case of an caught exception. To release shared_sstables as soon as possible (i.e. when all references to versions that contain them die), each time a version is removed, all sstables that were referenced exclusively by this version are erased. We are able to find these sstables efficiently by storing, for each version, all sstables that were added and erased in it, and, when a version is removed, merging it with the next one. When a version that adds an sstable gets merged with a version that removes it, this sstable is erased. Fixes #2622 Signed-off-by: Wojciech Mitros wojciech.mitros@scylladb.com Closes #8111 * github.com:scylladb/scylla: sstables: add test for checking the latency of updating the sstable_set in a table sstables: move column_family_test class from test/boost to test/lib sstables: use fast copying of the sstable_set instead of rebuilding it sstables: replace the sstable_set with a versioned structure sstables: remove potential ub sstables: make sstable_set constructor less error-prone	2021-03-01 14:16:36 +02:00
Avi Kivity	d980f550d1	Merge 'row_cache: Make fill_buffer() preemptable when cursor leads with dummy rows' from Tomasz Grabiec fill_buffer() will keep scanning until _lower_bound_changed is true, even if preemption is signaled, so that the reader makes forward progress. Before the patch, we did not update _lower_bound on touching a dummy entry. The read will not respect preemption until we hit a non-dummy row. If there is a lot of dummy rows, that can cause reactor stalls. Fix that by updating _lower_bound on dummy entries as well. Refs #8153. Tested with perf_row_cache_reads: ``` $ build/release/test/perf/perf_row_cache_reads -c1 -m200M Rows in cache: 0 Populating with dummy rows Rows in cache: 373929 Scanning read: 183.658966 [ms], preemption: {count: 848, 99%: 0.545791 [ms], max: 0.519343 [ms]}, cache: 99/100 [MB] read: 120.951515 [ms], preemption: {count: 257, 99%: 0.545791 [ms], max: 0.518795 [ms]}, cache: 99/100 [MB] ``` Notice that max preemption latency is low in the second "read:" line. Closes #8167 * github.com:scylladb/scylla: row_cache: Make fill_buffer() preemptable when cursor leads with dummy rows tests: perf: Introduce perf_row_cache_reads row_cache: Add metric for dummy row hits	2021-02-28 21:00:20 +02:00
Tomasz Grabiec	52e411df36	tests: perf: Introduce perf_row_cache_reads Tests performance of various read patterns from the row cache. Example: $ build/release/test/perf/perf_row_cache_reads_g -c1 -m200M Filling memtable Rows in cache: 0 Populating with dummy rows Rows in cache: 373929 Scanning read: 156.288986 [ms], preemption: {count: 702, 99%: 0.545791 [ms], max: 0.537537 [ms]}, cache: 99/100 [MB] read: 106.480766 [ms], preemption: {count: 6, 99%: 0.006866 [ms], max: 106.496168 [ms]}, cache: 99/100 [MB]	2021-02-26 01:20:38 +01:00
Pavel Emelyanov	9baf1226dc	test/memory_footpring: Print radix tree node sizes After switching cells storage onto compact radix tree it becomes useful to know the tree nodes' sizes. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-02-15 20:41:09 +03:00
Pavel Emelyanov	aa85bc790b	test: Add tests for radix tree Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-02-15 20:27:00 +03:00
Pavel Emelyanov	0ad361b380	test/perf_collection: Add callback to check the speed of clone In some places scylla clones collections of objects, so it's sometimes needed to measure the speed of this operation. This patch adds a placeholder for it, but no implementations for any supported collections. It will be added soon for radix tree. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-02-15 17:46:37 +03:00
Pavel Emelyanov	767253fe24	test/perf_mutation: Add option to run with more than 1 columns The --column-count makes the test generate schema with the given numbers of columns and make mutation maker fill random column with the value on each iteration. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-02-15 17:45:42 +03:00
Pavel Emelyanov	fc84ab3418	test/perf_mutation: Prepare to have several regular columns Teach the schema builder and test itself to work on more than one regular column, but for now only use 1, as before. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-02-15 17:44:34 +03:00
Pavel Emelyanov	21adff2a41	test/perf_mutation: Use builder to build schema The test will be taught to use more than one regular column, so switch to builder in advance. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-02-15 17:44:06 +03:00
Wojciech Mitros	17634d141b	sstables: add test for checking the latency of updating the sstable_set in a table Added a test which measures the time it takes to replace sstables in a table's sstable_set, using the leveled compaction strategy. Signed-off-by: Wojciech Mitros <wojciech.mitros@scylladb.com>	2021-02-11 11:02:55 +01:00
Avi Kivity	37b41d7764	test: add missing #include <fstream> std::ofstream is used, but there is no direct include for it. This fails the build with libstdc++ 11. Closes #8050	2021-02-09 14:45:20 +02:00
Tomasz Grabiec	2e3f6a9622	tests: perf_fast_forward: Print outpout directory Message-Id: <20210203180053.230627-1-tgrabiec@scylladb.com>	2021-02-04 10:39:41 +02:00
Tomasz Grabiec	e0ceb454c0	tests: perf_fast_forward: Print error hints to stdout They point to lines printed to stdout, so should be aligned with them. Message-Id: <20210203180016.230547-1-tgrabiec@scylladb.com>	2021-02-04 10:39:41 +02:00
Pavel Emelyanov	a92eb2f7a9	perf-test : Print B-tree sizes After the switch from BST to B-tree the memory foorprint includes inner/leaf nodes from the B-tree, so it's useful to know their sizes too. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-02-02 09:30:30 +03:00
Pavel Emelyanov	2f7c03d84c	utils: Intrusive B-tree (with tests) The design of the tree goes from the row-cache needs, which are 1. Insert/Remove do not invalidate iterators 2. Elements are LSA-manageable 3. Low key overhead 4. External tri-comparator 5. As little actions on insert/remove as possible With the above the design is Two types of nodes -- inner and leaf. Both types keep pointer on parent nodes and N pointers on keys (not keys themselves). Two differences: inner nodes have array of pointers on kids, leaf nodes keep pointer on the tree (to update left- and rightmost tree pointers on node move). Nodes do not keep pointers/references on trees, thus we have O(1) move of any object, but O(logN) to get the tree size. Fortunately, with big keys-per-node value this won't result in too many steps. In turn, the tree has 3 pointers -- root, left- and rightmost leaves. The latter is for constant-time begin() and end(). Keys are managed by user with the help of embeddable member_hook instance, which is 1 pointer in size. The code was copied from the B+ tree one, then heavily reworked, the internal algorythms turned out to differ quite significantly. For the sake of mutation_partition::apply_monotonically(), which needs to move an element from one tree into another, there's a key_grabber helping wrapper that allows doing this move respecting the exception-safety requirement. As measured by the perf_collections test the B-tree with 8 keys is faster, than the std::set, but slower than the B+tree: vs set vs b+tree fill: +13% -6% find: +23% -35% Another neat thing is that 1-key insertion-removal is ~40% faster than for BST (the same number of allocations, but the key object is smaller, less pointers to set-up and less instructions to execute when linking node with root). v4: - equip insertion methods with on_alloc_point() calls to catch potential exception guarantees violations eariler - add unlink_leftmost_without_rebalance. The method is borrowed from boost intrusive set, and is added to kill two birds -- provide it, as it turns out to be popular, and use a bit faster step-by-step tree destruction than plain begin+erase loop v3: - introduce "inline" root node that is embedded into tree object and in which the 1st key is inserted. This greatly improves the 1-key-tree performance, which is pretty common case for rows cache v2: - introduce "linear" root leaf that grows on demand This improves the memory consumption for small trees. This linear node may and should over-grow the NodeSize parameter. This comes from the fact that there are two big per-key memory spikes on small trees -- 1-key root leaf and the first split, when the tree becomes 1-key root with two half-filled leaves. If the linear extention goes above NodeSize it can flatten even the 2nd peak - mitigate the keys indirection a bit Prefetching the keys while doing the intra-node linear scan and the nodes while descending the tree gives ~+5% of fill and find - generalize stress tests for B and B+ trees - cosmetic changes TODO: - fix few inefficincies in the core code (walks the sub-tree twice sometimes) - try to optimize the leaf nodes, that are not lef-/righmost not to carry unused tree pointer on board Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-02-02 09:30:29 +03:00
Avi Kivity	32cdcc0c8b	Merge "sstables: consolidate reader factory methods" from Botond " Currently there are three different methods for creating an sstable reader: * one for single key reads * one for ranged reads * and one nobody uses This patch-set consolidates all these into a single `make_reader()` method, which behind the scenes uses the same logic to dispatch to the right sstable reader constructor that `sstables::as_mutation_source()` uses. This patch-set is part of an effort to clean up the jungle that is the various reader creation methods. The next step is to clean up the sstable_set, which has even more methods. One very sad discovery I made while working on this patch-set is that we still default `mutation_reader::forwarding` to `yes` in the sstable range reader creator method and in the `mutation_source::make_reader()`. I couldn't assume that all callers are passing what they mean as the value for that parameter. I found many sites in tests that create forwardable single partition readers. This is also something we should address soon. Tests: unit(release, debug:v3) " * 'sstables-consolidate-reader-factory-methods-v4' of https://github.com/denesb/scylla: cql_query_test: add unit test covering the non-optimal TWCS sstable read path sstable_mutation_reader: consolidate constructors tests: don't pass temporary ranges to readers sstables: sstable_mutation_reader: remove now unused whole sstable constructor sstables: stats: remove now unused sstable_partition_reads counter sstable: remove read_.row._flat() methods tree-wide: use sstables::make_reader() instead of the read_.row._flat() methods sstables: pass partition_range to create_single_key_sstable_reader() sstables: sstable: add make_reader()	2021-01-28 12:05:06 +02:00
Konstantin Osipov	b4f875f08e	uuid: reduce code dependency on UUID_gen.hh Do not include UUID_gen.hh in trace_state.hh and lists.hh to reduce header level dependency on it. Message-Id: <20210127173114.725761-2-kostja@scylladb.com>	2021-01-27 20:08:29 +02:00
Botond Dénes	c3b4e990a2	tree-wide: use sstables::make_reader() instead of the read_.row._flat() methods	2021-01-27 17:38:17 +02:00
Avi Kivity	aec231ba2e	Merge "Unify query paths" from Botond " Currently we have two parallel query paths: * database::query() -> table::query() -> data_query() * mutation::query() The former is used by single partition queries, the latter by range scans, as mutation::query() is used to convert reconcilable_result to query::result (which means it is also used in single partition queries if it triggers read repair). This is a rather unfortunate situation as we have two parallel implementation of the query code, which means they are prone to diverge, and in fact they already have -- more on that later. This patchset aims to remedy this situation by retiring `mutation::query()` and migrating users to an implementation based on the "standard" query path, in other words one using the same building blocks as the `database::query()` path. This means using `compact_mutation` for compacting and `query_result_builder` for result building. These components however were created to work with `flat_mutation_reader`, however introducing a reader into this pipeline would mean that we'd have to make all the related APIs asynchronous, which would cause an insane amount of churn. To avoid this, this patchset adds an API compatible `consume()` method to `mutation`, which can accept a `compact_mutation` instance as-is. This allows an elegant and succinct reimplementation. So far so good. Like mentioned above, the two implementations have diverged in time, or have been different from the start. The difference manifest when calculating digests, more precisely in which tombstones are included in the digest. The retired `mutation::query()` path incorporates only non-purgeable tombstones in the digest. The standard query path however incorporates all tombstones, even those that can be purged. After some scrutiny however this difference proved to be completely theoretical, as the code path where this would matter -- converting reconcilable result to query result -- passes min timestamp as the query time to the compaction, so nothing is compacted and hence the difference has no chance to manifest. This patch-set was motivated by the desire to provide a single solution to #7434, instead of two, one for each path. Tests: unit(release:v2, debug:v2, dev:v3) " * 'unified-query-path/v3' of https://github.com/denesb/scylla: mutation: remove now unused query() and query_compacted() treewide: use query_mutations() instead of mutation::query() mutation_test: test_query_digest: ensure digest is produced consistently mutation_query: introduce query_mutation() mutation_query: to_data_query_result(): migrate to standard query code mutation_query: move to_data_query_result() to mutation_partition.cc mutation: add consume() flat_mutation_reader: move mutation consumer concepts to separate header mutation compactor: query compaction: ignore purgeable tombstones	2021-01-27 15:58:47 +02:00
Benny Halevy	1847d49971	test: test_env: pick the highest sstable version by default If possible, test the highest sstable format version, as it's the mostly used. If there pre-written sstables we need to load from the test directory from an older version, either specify their version explicitly, or use the new test_env::reusable_sst method that looks up the latest sstable version in the given directory and generation. Test: unit(release) Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20201210161822.2833510-1-bhalevy@scylladb.com>	2021-01-24 10:38:55 +02:00
Botond Dénes	1a3ee71b39	treewide: use query_mutations() instead of mutation::query() We want to retire the latter.	2021-01-22 15:36:37 +02:00
Benny Halevy	29002e3b48	flat_mutation_reader: return future from next_partition To allow it to asynchronously close underlying readers on next_partition(). Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-01-13 17:35:07 +02:00
Raphael S. Carvalho	198b87503f	row_cache: allow external updater to decouple preparation from execution External updater may do some preparatory work like constructing a new sstable list, and at the end atomically replace the old list by the new one. Decoupling the preparation from execution will give us the following benefits: - the preparation step can now yield if needed to avoid reactor stalls, as it's been futurized. - the execution step will now be able to provide strong exception guarantees, as it's now decoupled from the preparation step which can be non-exception-safe. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2020-12-28 13:17:45 -03:00
Kamil Braun	af49a95627	perf: microbenchmark for clustering_order_reader_merger	2020-11-30 11:55:44 +01:00
Tomasz Grabiec	5abddc8568	Merge "Testing performance of different collections" from Pavel Emelyanov There's a perf_bptree test that compares B+ tree collection with std::set and std::map ones. There will come more, also the "patterns" to compare are not just "fill with keys" and "drain to empty", so here's the perf_collection test, that measures timings of - fill with keys - drain key by key - empty with .clear() call - full scan with iterator - insert-and-remove of a single element for currently used collections - std::set - std::map - intrusive_set_external_comparator - bplus::tree * https://github.com/xemul/scylla/tree/br-perf-collection-test: test: Generalize perf_bptree into perf_collection perf_collection: Clear collection between itartions perf_collection: Add intrusive_set_external_comparator perf_collection: Add test for single element insertion perf_collection: Add test for destruction with .clear() perf_collection: Add test for full scan time	2020-11-03 13:42:54 +02:00
Pavel Emelyanov	8bceb916ea	test/perf/memory_footprint: Fix indentation after previous patch Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-10-06 11:08:09 +03:00
Pavel Emelyanov	3e4de0f748	test/perf/memory_footprint: Don't forget to close sstables::test_env after usage After recent sstables manager rework the sstables::test_env must be .close()d after usage, otherwise the ~sstables_mananger() hits the _closing assertion. Do it with the help of .do_with(). The execution context is already seastar::async in this place, so .get() it explicitly. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-10-06 11:06:35 +03:00
Pavel Emelyanov	8558339c63	perf_collection: Add test for full scan time Scan here means walking the collection forward using iterator. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-10-06 09:57:37 +03:00
Pavel Emelyanov	7284469b24	perf_collection: Add test for destruction with .clear() Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-10-06 09:57:37 +03:00
Pavel Emelyanov	72ccc43380	perf_collection: Add test for single element insertion In some cases a collection is used to keep several elements, so it's good to know this timing. For example, a mutation_partition keeps a set of rows, if used in cache it can grow large, if used in mutation to apply, it's typically small. Plain replacement of bst into b-tree caused performance degardation of mutation application because b-tree is only better at big sizes. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-10-06 09:57:37 +03:00
Pavel Emelyanov	207e1aa48f	perf_collection: Add intrusive_set_external_comparator This collection is widely used, any replacement should be compared against it to better understand pros-n-cons. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-10-06 09:57:37 +03:00
Pavel Emelyanov	2d09864627	perf_collection: Clear collection between itartions Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-10-06 09:57:37 +03:00
Pavel Emelyanov	c891f274dc	test: Generalize perf_bptree into perf_collection Rename into perf_collection and localize the B+ code. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-10-06 09:57:37 +03:00
Botond Dénes	6ca0464af5	mutation_fragment: add schema and permit We want to start tracking the memory consumption of mutation fragments. For this we need schema and permit during construction, and on each modification, so the memory consumption can be recalculated and pass to the permit. In this patch we just add the new parameters and go through the insane churn of updating all call sites. They will be used in the next patch.	2020-09-28 11:27:23 +03:00
Botond Dénes	4f5ccf82cb	mutation_fragment: s/as_mutable_clustering_row/mutate_as_clustering_row/ We will soon want to update the memory consumption of mutation fragment after each modification done to it, to do that safely we have to forbid direct access to the underlying data and instead have callers pass a lambda doing their modifications. Uses where this method was just used to move the fragment away are converted to use `as_clustering_row() &&`.	2020-09-28 10:53:56 +03:00
Botond Dénes	3fab83b3a1	flat_mutation_reader: impl: add reader_permit parameter Not used yet, this patch does all the churn of propagating a permit to each impl. In the next patch we will use it to track to track the memory consumption of `_buffer`.	2020-09-28 10:53:48 +03:00
Avi Kivity	5e292897df	test: perf/perf_sstable.hh: prepare for asynchronously closed sstables_manager Obtain a test_env using do_with_async_returning(); and pass it to column_family_test_config so it can stop using test_sstables_manager.	2020-09-23 20:55:14 +03:00
Piotr Jastrzebski	c001374636	codebase wide: replace count with contains C++20 introduced `contains` member functions for maps and sets for checking whether an element is present in the collection. Previously `count` function was often used in various ways. `contains` does not only express the intend of the code better but also does it in more unified way. This commit replaces all the occurences of the `count` with the `contains`. Tests: unit(dev) Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Message-Id: <b4ef3b4bc24f49abe04a2aba0ddd946009c9fcb2.1597314640.git.piotr@scylladb.com>	2020-08-15 20:26:02 +03:00
Avi Kivity	3530e80ce1	Merge "Support md format" from Benny " This series adds support for the "md" sstable format. Support is based on the following: * do not use clustering based filtering in the presence of static row, tombstones. * Disabling min/max column names in the metadata for formats older than "md". * When updating the metadata, reset and disable min/max in the presence of range tombstones (like Cassandra does and until we process them accurately). * Fix the way we maintain min/max column names by: keeping whole clustering key prefixes as min/max rather than calculating min/max independently for each component, like Cassandra does in the "md" format. Fixes #4442 Tests: unit(dev), cql_query_test -t test_clustering_filtering* (debug) md migration_test dtest from git@github.com:bhalevy/scylla-dtest.git migration_test-md-v1 " * tag 'md-format-v4' of github.com:bhalevy/scylla: (27 commits) config: enable_sstables_md_format by default test: cql_query_test: add test_clustering_filtering unit tests table: filter_sstable_for_reader: allow clustering filtering md-format sstables table: create_single_key_sstable_reader: emit partition_start/end for empty filtered results table: filter_sstable_for_reader: adjust to md-format table: filter_sstable_for_reader: include non-scylla sstables with tombstones table: filter_sstable_for_reader: do not filter if static column is requested table: filter_sstable_for_reader: refactor clustering filtering conditional expression features: add MD_SSTABLE_FORMAT cluster feature config: add enable_sstables_md_format database: add set_format_by_config test: sstable_3_x_test: test both mc and md versions test: Add support for the "md" format sstables: mx/writer: use version from sstable for write calls sstables: mx/writer: update_min_max_components for partition tombstone sstables: metadata_collector: support min_max_components for range tombstones sstable: validate_min_max_metadata: drop outdated logic sstables: rename mc folder to mx sstables: may_contain_rows: always true for old formats sstables: add may_contain_rows ...	2020-08-11 13:29:11 +03:00
Piotr Jastrzebski	80e3923b3c	codebase wide: replace find(...) != end() with contains C++20 introduced `contains` member functions for maps and sets for checking whether an element is present in the collection. Previously the code pattern looked like: <collection>.find(<element>) != <collection>.end() In C++20 the same can be expressed with: <collection>.contains(<element>) This is not only more concise but also expresses the intend of the code more clearly. This commit replaces all the occurences of the old pattern with the new approach. Tests: unit(dev) Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Message-Id: <f001bbc356224f0c38f06ee2a90fb60a6e8e1980.1597132302.git.piotr@scylladb.com>	2020-08-11 13:28:50 +03:00
Benny Halevy	e2340d0684	config: enable_sstables_md_format by default Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-08-10 19:19:32 +03:00
Benny Halevy	65239a6e50	config: add enable_sstables_md_format MD format is disabled by default at this point. The option extends enable_sstables_mc_format so that both are needed to be set for supporting the md format. The MD_FORMAT cluster feature will be added in a following patch. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-08-10 18:53:04 +03:00
Avi Kivity	257c17a87a	Merge "Don't depend on seastar::make_(lw_)?shared idiosyncrasies" from Rafael " While working on another patch I was getting odd compiler errors saying that a call to ::make_shared was ambiguous. The reason was that seastar has both: template <typename T, typename... A> shared_ptr<T> make_shared(A&&... a); template <typename T> shared_ptr<T> make_shared(T&& a); The second variant doesn't exist in std::make_shared. This series drops the dependency in scylla, so that a future change can make seastar::make_shared a bit more like std::make_shared. " * 'espindola/make_shared' of https://github.com/espindola/scylla: Everywhere: Explicitly instantiate make_lw_shared Everywhere: Add a make_shared_schema helper Everywhere: Explicitly instantiate make_shared cql3: Add a create_multi_column_relation helper main: Return a shared_ptr from defer_verbose_shutdown	2020-08-02 19:51:24 +03:00
Botond Dénes	6660a5df51	result_memory_accounter: remove default constructor If somebody wants to bypass proper memory accounting they should at the very least be forced to consider if that is indeed wise and think a second about the limit they want to apply.	2020-07-28 18:00:29 +03:00
Rafael Ávila de Espíndola	efeaded427	Everywhere: Add a make_shared_schema helper This replaces a lot of make_lw_shared(schema(...)) with make_shared_schema(...). This makes it easier to drop a dependency on the differences between seastar::make_shared and std::make_shared. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-07-21 10:33:49 -07:00
Pavel Emelyanov	f8ffc31218	test: Print more sizes in memory_footprint_test The row cache memory footprint changed after switch to B+ because we no longer have a sole cache_entry allocation, but also the bplus::data and bplus::node. Knowing their sizes helps analyzing the footprint changes. Also print the size of memtable_entry that's now also stored in B+'s data. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-07-14 16:30:02 +03:00
Pavel Emelyanov	174b101a49	row_cache: Switch partition tree onto B+ rails The row_cache::partitions_type is replaced from boost::intrusive::set to bplus::tree<Key = int64_t, T = array_trusted_bounds<cache_entry>> Where token is used to quickly locate the partition by its token and the internal array -- to resolve hashing conflicts. Summary of changes in cache_entry: - compare's goes away as the new collection needs tri-compare one which is provided by ring_position_comparator - when initialized the dummy entry is added with "after_all_keys" kind, not "before_all_keys" as it was by default. This is to make tree entries sorted by token - insertion and removing of cache_entries happens inside double_decker, most of the changes in row_cache.cc are about passing constructor args from current_allocator.construct into double_decker.empace_before() - the _flags is extended to keep array head/tail bits. There's a room for it, sizeof(cache_entry) remains unchanged The rest fits smothly into the double_decker API. Also, as was told in the previous patch, insertion and removal _may_ invalidate iterators, but may leave them intact. However, currently this doesn't seem to be a problem as the cache_tracker ::insert() and ::on_partition_erase do invalidate iterators unconditionally. Later this can be otimized, as iterators are invalidated by double-decker only in case of hash conflict, otherwise it doesn't change arrays and B+ tree doesn't invalidate its. tests: unit(dev), perf(dev) Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-07-14 16:30:02 +03:00
Pavel Emelyanov	95f15ea383	utils: B+ tree implementation // The story is at // https://groups.google.com/forum/#!msg/scylladb-dev/sxqTHM9rSDQ/WqwF1AQDAQAJ This is the B+ version which satisfies several specific requirements to be suitable for row-cache usage. 1. Insert/Remove doesn't invalidate iterators 2. Elements should be LSA-compactable 3. Low overhead of data nodes (1 pointer) 4. External less-only comparator 5. As little actions on insert/delete as possible 6. Iterator walks the sorted keys The design, briefly is: There are 3 types of nodes: inner, leaf and data, inner and leaf keep build-in array of N keys and N(+1) nodes. Leaf nodes sit in a doubly linked list. Data nodes live separately from the leaf ones and keep pointers on them. Tree handler keeps pointers on root and left-most and right-most leaves. Nodes do _not_ keep pointers or references on the tree (except 3 of them, see below). changes in v9: - explicitly marked keys/kids indices with type aliases - marked the whole erase/clear stuff noexcept - disposers now accept object pointer instead of reference - clear tree in destructor - added more comments - style/readability review comments fixed Prior changes - Add noexcepts where possible - Restrict Less-comparator constraint -- it must be noexcept - Generalized node_id - Packed code for beging()/cbegin() - Unsigned indices everywhere - Cosmetics changes - Const iterators - C++20 concepts - The index_for() implmenetation is templatized the other way to make it possible for AVX key search specialization (further patching) - Insertion tries to push kids to siblings before split Before this change insertion into full node resulted into this node being split into two equal parts. This behaviour for random keys stress gives a tree with ~2/3 of nodes half-filled. With this change before splitting the full node try to push one element to each of the siblings (if they exist and not full). This slows the insertion a bit (but it's still way faster than the std::set), but gives 15% less total number of nodes. - Iterator method to reconstruct the data at the given position The helper creates a new data node, emplaces data into it and replaces the iterator's one with it. Needed to keep arrays of data in tree. - Milli-optimize erase() - Return back an iterator that will likely be not re-validated - Do not try to update ancestors separation key for leftmost kid This caused the clear()-like workload work poorly as compared to std:set. In particular the row_cache::invalidate() method does exactly this and this change improves its timing. - Perf test to measure drain speed - Helper call to collect tree counters - Fix corner case of iterator.emplace_before() - Clean heterogenous lookup API - Handle exceptions from nodes allocations - Explicitly mark places where the key is copied (for future) - Extend the tree.lower_bound() API to report back whether the bound hit the key or not - Addressed style/cleanness review comments Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-07-14 16:29:43 +03:00
Pavel Emelyanov	9d38846ed2	test: Move perf measurement helpers into header To use the code in new perf tests in next patches. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-07-14 12:58:26 +03:00

1 2

91 Commits