scylladb

Author	SHA1	Message	Date
Benny Halevy	4476800493	flat_mutation_reader: get rid of timeout parameter Now that the timeout is taken from the reader_permit. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-08-24 16:30:51 +03:00
Benny Halevy	e9aff2426e	everywhere: make deferred actions noexcept Prepare for updating seastar submodule to a change that requires deferred actions to be noexcept (and return void). Test: unit(dev, debug) Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-08-22 21:11:52 +03:00
Michael Livshin	f364666d4a	row_cache: count read row tombstones Refs #7749. Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>	2021-08-01 19:41:11 +03:00
Pavel Emelyanov	6ef27c9fa1	btree: Make iterators not modify the tree itself The const_iterator cannot modify anything, but the plain iterator has public methods to remove the key from the tree. To control how the tree is modified this method must be marked private and modification by iterator should come from somewhere else. This somewhere else is the existing key_grabber that's already used to move keys between trees. Generalize this ability to move a key out of a tree (i.e. -- erase). Once done -- mark the iterator::erase_and_dispose private. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-07-27 20:06:53 +03:00
Piotr Sarna	e9d26dd7ed	utils/coroutine: wrap a helper in utils namespace The class name `coroutine` became problematic since seastar introduced it as a namespace for coroutine helpers. To avoid a clash, the class from scylla is wrapped in a separate namespace. Without this patch, Seastar submodule update fails to compile. Message-Id: <6cb91455a7ac3793bc78d161e2cb4174cf6a1606.1626949573.git.sarna@scylladb.com>	2021-07-22 13:28:43 +03:00
Tomasz Grabiec	e947fac74c	database: Fix cache metrics not being registered Introduced in `6a6403d`. The default constructor with dummy_app_stats is also used by production code. Fixes #9012 Message-Id: <20210712221447.71902-1-tgrabiec@scylladb.com>	2021-07-13 07:50:44 +03:00
Tomasz Grabiec	6a6403d19d	row_cache: cache_tracker: Do not register metrics when constructed for tests Some tests will create two cache_tracker instances because of one being embedded in the sstable test env. This would lead to double registration of metrics, which raises a run-time error. Avoid this by not registering metrics in prometheus in tests at all.	2021-07-02 19:02:14 +02:00
Tomasz Grabiec	7fa4e10aa0	row_cache: Use generic LRU for eviction In preparation for tracking different kinds of objects, not just rows_entry, in the LRU, switch to the LRU implementation from utils/lru.hh which can hold arbitrary element type.	2021-07-02 10:25:58 +02:00
Michael Livshin	9ef2317248	row_cache: count range tombstones processed during read Refs #7749. Signed-off-by: Michael Livshin <michael.livshin@scylladb.com> Message-Id: <20210602152210.17948-1-michael.livshin@scylladb.com>	2021-06-14 14:29:05 +02:00
Avi Kivity	a55b434a2b	treewide: extend copyright statements to present day	2021-06-06 19:18:49 +03:00
Benny Halevy	b4cbd46adb	row_cache: create_underlying_reader: call read_context on_underlying_created only on success ctx.on_underlying_created() mustn't be called if src.make_reader failed and a reader isn't created. Test: unit(dev) Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20210511054525.35090-1-bhalevy@scylladb.com>	2021-05-12 01:34:48 +02:00
Benny Halevy	0a2670c9ec	row_cache: hold read_context as unique_ptr Such that the holder, that is responsible for closing the read_context before destroying it, holds it uniquely. cache_flat_mutation_reader may be constructed either with a read_context&, where it knows that the read_context is owned externally, by the caller, or it could be constructed with a std::unique_ptr<read_context> in which case it assumes ownership of the read_context and it is now responsible for closing it. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-04-25 11:35:07 +03:00
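The ownership rule described in the row above (a reader built from a reference leaves closing to the caller, a reader built from a `std::unique_ptr` closes what it owns) can be sketched as follows. This is a minimal illustration, not the Scylla code: `context` and `reader` are hypothetical stand-ins for `read_context` and `cache_flat_mutation_reader`, and `close()` is synchronous here whereas the real one is asynchronous.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical stand-in for read_context; only tracks whether it was closed.
struct context {
    bool closed = false;
    void close() noexcept { closed = true; }
};

// Hypothetical stand-in for the reader. _owned is declared before _ctx so
// that _ctx can be bound to *_owned in the owning constructor.
class reader {
    std::unique_ptr<context> _owned; // non-null only when the reader owns the context
    context& _ctx;                   // always valid: external or owned context
public:
    // Context is owned externally; the caller is responsible for closing it.
    explicit reader(context& external) : _ctx(external) {}

    // Reader takes ownership and becomes responsible for closing the context.
    explicit reader(std::unique_ptr<context> owned)
        : _owned(std::move(owned)), _ctx(*_owned) {}

    bool owns_context() const noexcept { return static_cast<bool>(_owned); }

    // Close the context only if this reader owns it.
    void close() noexcept {
        if (_owned) {
            _ctx.close();
        }
    }
};
```

Holding the owned context in a `std::unique_ptr` makes the responsible party explicit: there is exactly one holder, and that holder is the one that must close before destroying.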
Benny Halevy	8531eaaacf	row_cache: make_reader: make read_context only when needed So we can have better control on who's responsible to close it. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-04-25 11:35:07 +03:00
Benny Halevy	9944586480	row_cache: make_reader: use range directly Not via ctx, so we can delay the making of the read_context, as needed. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-04-25 11:35:07 +03:00
Benny Halevy	4c969756ac	row_cache: scanning_and_populating_reader: make sure to close underlying readers Note that scanning_and_populating_reader::read_next_partition now closes the current reader unconditionally and before assigning a new reader. This should be an improvement since we want to release the reader's resources as early as possible, certainly before allocating new resources. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-04-25 11:35:07 +03:00
Benny Halevy	e34ed3d3e4	row_cache: range_populating_reader: add close method To close the underlying _reader. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-04-25 11:35:07 +03:00
Benny Halevy	c707ff27a4	row_cache: single_partition_populating_reader: add close method To close the optional underlying _reader and _read_context. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-04-25 11:35:07 +03:00
Piotr Jastrzebski	cb3dbb1a4b	row_cache: remove redundant check in make_reader This check is always true because a dummy entry is added at the end of each cache entry. If that wasn't true, the check in else-if would be UB. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2021-04-12 21:12:33 +02:00
Piotr Jastrzebski	b3b68dc662	read_context: remove skip_first_fragment arg from create_underlying All callers pass false for its value so no need to keep it around. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2021-04-12 19:51:06 +02:00
Tomasz Grabiec	cb0b8d1903	row_cache: Zap dummy entries when populating or reading a range This will prevent accumulation of unnecessary dummy entries. A single-partition populating scan with clustering key restrictions will insert dummy entries positioned at the boundaries of the clustering query range to mark the newly populated range as continuous. Those dummy entries may accumulate with time, increasing the cost of the scan, which needs to walk over them. In some workloads we could prevent this. If a populating query overlaps with dummy entries, we could erase the old dummy entry since it will not be needed: it will fall inside a broader continuous range. This will be the case for time series workloads which scan with a decreasing (newest) lower bound. Refs #8153. _last_row is now updated atomically with _next_row. Before, _last_row was moved first. If an exception was thrown and the section was retried, this could cause the wrong entry to be removed (new next instead of old last) by the new algorithm. I don't think this was causing problems before this patch. The problem is not solved for all the cases. After this patch, we remove dummies only when there is a single MVCC version. We could patch apply_monotonically() to also do it, so that dummies which are inside continuous ranges are eventually removed, but this is left for later. perf_row_cache_reads output after that patch shows that the second scan touches no dummies: $ build/release/test/perf/perf_row_cache_reads_g -c1 -m200M Rows in cache: 0 Populating with dummy rows Rows in cache: 265320 Scanning read: 142.621613 [ms], preemption: {count: 639, 99%: 0.545791 [ms], max: 0.526929 [ms]}, cache: 0/0 [MB] read: 0.023197 [ms], preemption: {count: 1, 99%: 0.035425 [ms], max: 0.032736 [ms]}, cache: 0/0 [MB] Message-Id: <20210226172801.800264-1-tgrabiec@scylladb.com>	2021-03-01 20:34:35 +02:00
Avi Kivity	d980f550d1	Merge 'row_cache: Make fill_buffer() preemptable when cursor leads with dummy rows' from Tomasz Grabiec fill_buffer() will keep scanning until _lower_bound_changed is true, even if preemption is signaled, so that the reader makes forward progress. Before the patch, we did not update _lower_bound on touching a dummy entry. The read will not respect preemption until we hit a non-dummy row. If there is a lot of dummy rows, that can cause reactor stalls. Fix that by updating _lower_bound on dummy entries as well. Refs #8153. Tested with perf_row_cache_reads: ``` $ build/release/test/perf/perf_row_cache_reads -c1 -m200M Rows in cache: 0 Populating with dummy rows Rows in cache: 373929 Scanning read: 183.658966 [ms], preemption: {count: 848, 99%: 0.545791 [ms], max: 0.519343 [ms]}, cache: 99/100 [MB] read: 120.951515 [ms], preemption: {count: 257, 99%: 0.545791 [ms], max: 0.518795 [ms]}, cache: 99/100 [MB] ``` Notice that max preemption latency is low in the second "read:" line. Closes #8167 * github.com:scylladb/scylla: row_cache: Make fill_buffer() preemptable when cursor leads with dummy rows tests: perf: Introduce perf_row_cache_reads row_cache: Add metric for dummy row hits	2021-02-28 21:00:20 +02:00
Tomasz Grabiec	f0a3272a5f	row_cache: Add metric for dummy row hits This will help to diagnose performance problems related to the read having to walk through a lot of dummy rows to fill the buffer. Refs #8153	2021-02-25 18:26:01 +01:00
Benny Halevy	4b46793c19	row_cache: scanning_and_populating_reader: add _read_next_partition flag Instead of resetting _reader in scanning_and_populating_reader::fill_buffer in the `reader_finished` case, use a gentler, _read_next_partition flag on which `read_next_partition` will be called in the next iteration. Then, read_next_partition can close _reader only before overwriting it with a new reader. Otherwise, if _reader is always closed in the `reader_finished` case, we end up hitting premature end_of_stream. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20210215101254.480228-30-bhalevy@scylladb.com>	2021-02-17 19:06:21 +02:00
Pavel Emelyanov	5c0f9a8180	mutation_partition: Switch cache of rows onto B-tree The switch is pretty straightforward, and consists of - change less-compare into tri-compare - rename insert/insert_check into insert_before_hint - use tree::key_grabber in mutation_partition::apply_monotonically to exception-safely transfer a row from one tree to another - explicitly erase the row from tree in rows_entry::on_evicted, there's a O(1) tree::iterator method for this - rewrite rows_entry -> cache_entry transformation in the on_evicted to fit the B-tree API - include the B-tree's external memory usage into stats That's it. The number of keys per node is set to 12 with linear search and linear extension of 20 because - experimenting with tree shows that numbers 8 through 10 keys with linear search show the best performance on stress tests for insert/find-s of keys that are memcmp-able arrays of bytes (which is an approximation of current clustering key compare). More keys work slower, but still better than any bigger value with any type of search up to 64 keys per node - having 12 keys per nodes is the threshold at which the memory footprint for B-tree becomes smaller than for boost::intrusive::set for partitions with 32+ keys - 20 keys for linear root eats the first-split peak and still performs well in linear search As a result the footprint for B tree is bigger than the one for BST only for trees filled with 21...32 keys by 0.1...0.7 bytes per key. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-02-02 09:30:30 +03:00
Tomasz Grabiec	94749b01eb	Merge "futurize flat_mutation_reader::next_partition" from Benny The main motivation for this patchset is to prepare for adding an async close() method to flat_mutation_reader. In order to close the reader before destroying it in all paths we need to make next_partition asynchronous so it can asynchronously close a current reader before destroying it, e.g. by reassignment of flat_mutation_reader_opt, as done in scanning_reader::next_partition. Test: unit(release, debug) * git@github.com:bhalevy/scylla.git futurize-next-partition-v1: flat_mutation_reader: return future from next_partition multishard_mutation_query: read_context: save_reader: destroy reader_meta from the calling shard mutation_reader: filtering_reader: fill_buffer: futurize inner loop flat_mutation_reader::impl: consumer_adapter: futurize handle_result flat_mutation_reader: consume_pausable/in_thread: futurize_invoke consumer flat_mutation_reader: FlatMutationReaderConsumer: support also async consumer flat_mutation_reader:impl: get rid of _consume_done member	2021-01-19 10:19:03 +02:00
Avi Kivity	60f5ec3644	Merge 'managed_bytes: switch to explicit linearization' from Michał Chojnowski This is a revival of #7490. Quoting #7490: The managed_bytes class now uses implicit linearization: outside LSA, data is never fragmented, and within LSA, data is linearized on-demand, as long as the code is running within with_linearized_managed_bytes() scope. We would like to stop linearizing managed_bytes and keep it fragmented at all times, since linearization can require large contiguous chunks. Large contiguous allocations are hard to satisfy and cause latency spikes. As a first step towards that, we remove all implicitly linearizing accessors and replace them with an explicit linearization accessor, with_linearized(). Some of the linearization happens long before use, by creating a bytes_view of the managed_bytes object and passing it onwards, perhaps storing it for later use. This does not work with with_linearized(), which creates a temporary linearized view, and does not work towards the longer term goal of never linearizing. As a substitute a managed_bytes_view class is introduced that acts as a view for managed_bytes (for interoperability it can also be a view for bytes and is compatible with bytes_view). By the end of the series, all linearizations are temporary, within the scope of a with_linearized() call and can be converted to fragmented consumption of the data at leisure. This has limited practical value directly, as current uses of managed_bytes are limited to keys (which are limited to 64k). However, it enables converting the atomic_cell layer back to managed_bytes (so we can remove IMR) and the CQL layer to managed_bytes/managed_bytes_view, removing contiguous allocations from the coordinator. 
Closes #7820 * github.com:scylladb/scylla: test: add hashers_test memtable: fix accounting of managed_bytes in partition_snapshot_accounter test: add managed_bytes_test utils: fragment_range: add a fragment iterator for FragmentedView keys: update comments after changes and remove an unused method mutation_test: use the correct preferred_max_contiguous_allocation in measuring_allocator row_cache: more indentation fixes utils: remove unused linearization facilities in `managed_bytes` class misc: fix indentation treewide: remove remaining `with_linearized_managed_bytes` uses memtable, row_cache: remove `with_linearized_managed_bytes` uses utils: managed_bytes: remove linearizing accessors keys, compound: switch from bytes_view to managed_bytes_view sstables: writer: add write_* helpers for managed_bytes_view compound_compat: transition legacy_compound_view from bytes_view to managed_bytes_view types: change equal() to accept managed_bytes_view types: add parallel interfaces for managed_bytes_view types: add to_managed_bytes(const sstring&) serializer_impl: handle managed_bytes without linearizing utils: managed_bytes: add managed_bytes_view::operator[] utils: managed_bytes: introduce managed_bytes_view utils: fragment_range: add serialization helpers for FragmentedMutableView bytes: implement std::hash using appending_hash utils: mutable_view: add substr() utils: fragment_range: add compare_unsigned utils: managed_bytes: make the constructors from bytes and bytes_view explicit utils: managed_bytes: introduce with_linearized() utils: managed_bytes: constrain with_linearized_managed_bytes() utils: managed_bytes: avoid internal uses of managed_bytes::data() utils: managed_bytes: extract do_linearize_pure() thrift: do not depend on implicit conversion of keys to bytes_view clustering_bounds_comparator: do not depend on implicit conversion of keys to bytes_view cql3: expression: linearize get_value_from_mutation() eariler bytes: add to_bytes(bytes) cql3: expression: mark 
do_get_value() as static	2021-01-18 11:01:28 +02:00
Benny Halevy	29002e3b48	flat_mutation_reader: return future from next_partition To allow it to asynchronously close underlying readers on next_partition(). Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-01-13 17:35:07 +02:00
Avi Kivity	d508a63d4b	row_cache: linearize key in cache_entry::do_read() do_read() does not linearize cache_entry::_key; this can cause a crash with keys larger than 13k. Fixes #7897. Closes #7898	2021-01-13 11:07:29 +02:00
Pavel Solodovnikov	907b73a652	row_cache: more indentation fixes Fixup indentation issues introduced in recent patches. Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2021-01-08 14:16:08 +01:00
Pavel Solodovnikov	8709844566	misc: fix indentation The patch fixes indentation issues introduced in previous patches related to removing `with_linearized_managed_bytes` uses from the code tree. Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2021-01-08 14:16:08 +01:00
Pavel Solodovnikov	bf8b138b42	memtable, row_cache: remove `with_linearized_managed_bytes` uses Since `managed_bytes::data()` is deleted as well as other public APIs of `managed_bytes` which would linearize stored values except for explicit `with_linearized`, there is no point invoking `with_linearized_managed_bytes` hack which would trigger automatic linearization under the hood of managed_bytes. Remove useless `with_linearized_managed_bytes` wrapper from memtable and row_cache code. Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2021-01-08 14:16:08 +01:00
Raphael S. Carvalho	198b87503f	row_cache: allow external updater to decouple preparation from execution External updater may do some preparatory work like constructing a new sstable list, and at the end atomically replace the old list by the new one. Decoupling the preparation from execution will give us the following benefits: - the preparation step can now yield if needed to avoid reactor stalls, as it's been futurized. - the execution step will now be able to provide strong exception guarantees, as it's now decoupled from the preparation step which can be non-exception-safe. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2020-12-28 13:17:45 -03:00
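The split between a preparation step that may fail and an execution step that cannot can be sketched like this. It is a simplified, synchronous illustration under stated assumptions: `two_phase_updater` and the `std::vector<int>` "sstable list" are hypothetical, and the real preparation step is futurized so that it can yield, which this sketch does not model.

```cpp
#include <cassert>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical two-phase updater over a list that readers can see.
class two_phase_updater {
    std::vector<int>& _live;   // state visible to readers
    std::vector<int> _staged;  // replacement built during prepare()
public:
    explicit two_phase_updater(std::vector<int>& live) : _live(live) {}

    // Preparation: may allocate and may throw. _live is never touched here,
    // so a failure leaves the visible state exactly as it was.
    void prepare(const std::vector<int>& additions) {
        std::vector<int> next = _live;
        for (int v : additions) {
            if (v < 0) {
                throw std::invalid_argument("negative id");
            }
            next.push_back(v);
        }
        _staged = std::move(next);
    }

    // Execution: a vector swap cannot throw, so the visible state is
    // replaced atomically with respect to exceptions.
    void execute() noexcept {
        _live.swap(_staged);
    }
};
```

Because everything that can fail happens before `execute()`, a caller sees either the old list or the complete new one, never a partially updated list.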
Botond Dénes	dd372c8457	flat_mutation_reader: de-virtualize buffer_size() The main user of this method, the one which required this method to return the collective buffer size of the entire reader tree, is now gone. The remaining two users just use it to check the size of the reader instance they are working with. So de-virtualize this method and reduce its responsibility to just returning the buffer size of the current reader instance.	2020-10-06 08:22:56 +03:00
Botond Dénes	3fab83b3a1	flat_mutation_reader: impl: add reader_permit parameter Not used yet, this patch does all the churn of propagating a permit to each impl. In the next patch we will use it to track the memory consumption of `_buffer`.	2020-09-28 10:53:48 +03:00
Tomasz Grabiec	a22645b7dd	Merge "Unfriend rows_entry, cache_tracker and mutation_partition" from Pavel Emelyanov The classes touch private data of each other for no real reason. Putting the interaction behind API makes it easier to track the usage. * xemul/br-unfriends-in-row-cache-2: row cache: Unfriend classes from each other rows_entry: Move container/hooks types declarations rows_entry: Simplify LRU unlink mutation_partition: Define .replace_with method for rows_entry mutation_partition: Use rows_entry::apply_monotonically	2020-09-22 21:18:14 +02:00
Pavel Emelyanov	7ed1e18a13	rows_entry: Simplify LRU unlink The cache_tracker tries to access private member of the rows_entry to unlink it, but the lru_type is auto_unlink and can unlink itself. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-09-11 16:35:51 +03:00
Pavel Emelyanov	fabf849fcb	row_cache: Save one key compare on direct hit The partitions_type::lower_bound() method can return a hint that saves info about the "lower-ness of the bound", in particular when the search key is found, this can be guessed from the hint without comparison. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-09-03 21:13:21 +03:00
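The idea of a `lower_bound()` that also reports whether it landed on an exact match can be sketched with a binary search over a sorted vector. This is an illustration only: `lower_bound_with_hint` and its `(index, match)` return value are hypothetical and do not reproduce the `partitions_type` API or its hint type.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Returns the index of the first element not less than `key`, plus a flag
// telling whether that element compares equal to `key`. `cmp(a, b)` is a
// tri-comparator returning <0, 0 or >0.
template <typename T, typename TriCompare>
std::pair<std::size_t, bool> lower_bound_with_hint(const std::vector<T>& sorted,
                                                   const T& key, TriCompare cmp) {
    std::size_t lo = 0;
    std::size_t hi = sorted.size();
    bool match = false;
    while (lo < hi) {
        std::size_t mid = lo + (hi - lo) / 2;
        int c = cmp(sorted[mid], key);
        if (c < 0) {
            lo = mid + 1;
        } else {
            // The tri-compare result already tells us about equality, so
            // remember it instead of comparing again after the search.
            if (c == 0) {
                match = true;
            }
            hi = mid;
        }
    }
    return {lo, match};
}
```

A caller that only has a less-than comparator would need one more comparison after the search to distinguish "found" from "insert position"; carrying the flag out of the search avoids it.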
Pavel Emelyanov	ada174c932	row_cache: Kill incomplete_tag The incomplete entry is created in one place. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-09-03 21:13:21 +03:00
Pavel Emelyanov	240b966695	row_cache: Do not copy partition tombstone when creating cache entry The row_cache::find_or_create is only used to put (or touch) an entry in cache having the partition_start mutation at hand. Thus, there's no point in carrying key reference and tombstone value through the calls, just the partition_start reference is enough. Since the new cache entry is created incomplete, rename the creation method to reflect this. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-09-03 21:13:21 +03:00
Pavel Emelyanov	84a6d439ad	test: Lookup an existing entry with its own helper The only caller of find_or_create() in tests works on already existing (.populate()-d) entry, so patch this place for explicitness and for the sake of next patching. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-09-03 21:13:21 +03:00
Pavel Emelyanov	3f33a71c0c	row_cache: Move missing entry creation into helper No functional changes, just move the code. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-09-03 21:13:21 +03:00
Pavel Emelyanov	4662082748	populating reader: Fix indentation after previous patch Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-09-03 21:13:21 +03:00
Pavel Emelyanov	e680bdc59c	populating reader: Less allocator switching on population Now when the key for new partition is copied inside do_find_or_create_entry we may call this function without allocator set, as it sets the allocator inside. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-09-03 21:13:21 +03:00
Pavel Emelyanov	449f9e1218	populating reader: Do not copy decorated key too early When the missing partition is created in cache the decorated key is copied from the ring position view too early -- to do the lookup. However, the read context has already entered the partition and already has the decorated key on board, so for lookup we can use the reference. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-09-03 21:13:21 +03:00
Pavel Emelyanov	5a29e17a5f	row_cache: Revive do_find_or_create_entry concepts Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-09-03 21:13:21 +03:00
Botond Dénes	5e9a7d2608	row_cache: remove unnecessary includes of partition_snapshot_reader.hh Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200820124447.2561477-1-bdenes@scylladb.com>	2020-08-20 15:19:42 +02:00
Piotr Sarna	29e2dc242a	row_cache: add tracing In order to improve tracing for the read path, cache is now also actively adding basic trace information. Example: select * from t where token(p) >= 42 and token(p) < 112; activity \| timestamp \| source \| source_elapsed \| client -----------------------------------------------------------------------------------------+----------------------------+-----------+----------------+----------- Execute CQL3 query \| 2020-08-07 13:10:34.694000 \| 127.0.0.1 \| 0 \| 127.0.0.1 Parsing a statement [shard 0] \| 2020-08-07 13:10:34.694307 \| 127.0.0.1 \| -- \| 127.0.0.1 Processing a statement [shard 0] \| 2020-08-07 13:10:34.694377 \| 127.0.0.1 \| 70 \| 127.0.0.1 read_data: querying locally [shard 0] \| 2020-08-07 13:10:34.694425 \| 127.0.0.1 \| 118 \| 127.0.0.1 Start querying token range [{42, start}, {112, start}] [shard 0] \| 2020-08-07 13:10:34.694432 \| 127.0.0.1 \| 125 \| 127.0.0.1 Creating shard reader on shard: 0 [shard 0] \| 2020-08-07 13:10:34.694446 \| 127.0.0.1 \| 139 \| 127.0.0.1 Scanning cache for range [{42, start}, {112, start}] and slice {(-inf, +inf)} [shard 0] \| 2020-08-07 13:10:34.694454 \| 127.0.0.1 \| 147 \| 127.0.0.1 Querying is done [shard 0] \| 2020-08-07 13:10:34.694494 \| 127.0.0.1 \| 187 \| 127.0.0.1 Done processing - preparing a result [shard 0] \| 2020-08-07 13:10:34.694520 \| 127.0.0.1 \| 213 \| 127.0.0.1 Request complete \| 2020-08-07 13:10:34.694221 \| 127.0.0.1 \| 221 \| 127.0.0.1 Example with cache miss: select * from t where p = 7; activity \| timestamp \| source \| source_elapsed \| client -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+-----------+----------------+----------- Execute CQL3 query \| 2020-08-07 13:25:04.363000 \| 127.0.0.1 \| 0 \| 127.0.0.1 Parsing a statement [shard 0] \| 2020-08-07 13:25:04.363310 \| 127.0.0.1 \| -- \| 
127.0.0.1 Processing a statement [shard 0] \| 2020-08-07 13:25:04.363384 \| 127.0.0.1 \| 74 \| 127.0.0.1 Creating read executor for token 1634052884888577606 with all: {127.0.0.1} targets: {127.0.0.1} repair decision: NONE [shard 0] \| 2020-08-07 13:25:04.363450 \| 127.0.0.1 \| 139 \| 127.0.0.1 read_data: querying locally [shard 0] \| 2020-08-07 13:25:04.363455 \| 127.0.0.1 \| 145 \| 127.0.0.1 Start querying singular range {{1634052884888577606, pk{000400000007}}} [shard 0] \| 2020-08-07 13:25:04.363461 \| 127.0.0.1 \| 151 \| 127.0.0.1 Querying cache for range {{1634052884888577606, pk{000400000007}}} and slice {(-inf, +inf)} [shard 0] \| 2020-08-07 13:25:04.363490 \| 127.0.0.1 \| 180 \| 127.0.0.1 Range {{1634052884888577606, pk{000400000007}}} not found in cache [shard 0] \| 2020-08-07 13:25:04.363494 \| 127.0.0.1 \| 183 \| 127.0.0.1 Reading key {{1634052884888577606, pk{000400000007}}} from sstable /home/sarna/.ccm/scylla-1/node1/data/ks/t-f7b7a9b0d89f11eab650000000000000/mc-1-big-Data.db [shard 0] \| 2020-08-07 13:25:04.363522 \| 127.0.0.1 \| 211 \| 127.0.0.1 /home/sarna/.ccm/scylla-1/node1/data/ks/t-f7b7a9b0d89f11eab650000000000000/mc-1-big-Index.db: scheduling bulk DMA read of size 16 at offset 0 [shard 0] \| 2020-08-07 13:25:04.363546 \| 127.0.0.1 \| 235 \| 127.0.0.1 /home/sarna/.ccm/scylla-1/node1/data/ks/t-f7b7a9b0d89f11eab650000000000000/mc-1-big-Index.db: finished bulk DMA read of size 16 at offset 0, successfully read 16 bytes [shard 0] \| 2020-08-07 13:25:04.364406 \| 127.0.0.1 \| 1095 \| 127.0.0.1 /home/sarna/.ccm/scylla-1/node1/data/ks/t-f7b7a9b0d89f11eab650000000000000/mc-1-big-Data.db: scheduling bulk DMA read of size 56 at offset 0 [shard 0] \| 2020-08-07 13:25:04.364445 \| 127.0.0.1 \| 1134 \| 127.0.0.1 /home/sarna/.ccm/scylla-1/node1/data/ks/t-f7b7a9b0d89f11eab650000000000000/mc-1-big-Data.db: finished bulk DMA read of size 56 at offset 0, successfully read 56 bytes [shard 0] \| 2020-08-07 13:25:04.364599 \| 127.0.0.1 \| 1288 \| 127.0.0.1 
Querying is done [shard 0] \| 2020-08-07 13:25:04.364685 \| 127.0.0.1 \| 1375 \| 127.0.0.1 Done processing - preparing a result [shard 0] \| 2020-08-07 13:25:04.364719 \| 127.0.0.1 \| 1408 \| 127.0.0.1 Request complete \| 2020-08-07 13:25:04.364421 \| 127.0.0.1 \| 1421 \| 127.0.0.1 Example without cache for verification: select * from t where token(p) >= 42 and token(p) < 112 bypass cache; activity \| timestamp \| source \| source_elapsed \| client ------------------------------------------------------------------+----------------------------+-----------+----------------+----------- Execute CQL3 query \| 2020-08-07 13:11:16.122000 \| 127.0.0.1 \| 0 \| 127.0.0.1 Parsing a statement [shard 0] \| 2020-08-07 13:11:16.122657 \| 127.0.0.1 \| -- \| 127.0.0.1 Processing a statement [shard 0] \| 2020-08-07 13:11:16.122742 \| 127.0.0.1 \| 85 \| 127.0.0.1 read_data: querying locally [shard 0] \| 2020-08-07 13:11:16.122806 \| 127.0.0.1 \| 149 \| 127.0.0.1 Start querying token range [{42, start}, {112, start}] [shard 0] \| 2020-08-07 13:11:16.122814 \| 127.0.0.1 \| 158 \| 127.0.0.1 Creating shard reader on shard: 0 [shard 0] \| 2020-08-07 13:11:16.122829 \| 127.0.0.1 \| 172 \| 127.0.0.1 Querying is done [shard 0] \| 2020-08-07 13:11:16.122895 \| 127.0.0.1 \| 239 \| 127.0.0.1 Done processing - preparing a result [shard 0] \| 2020-08-07 13:11:16.122928 \| 127.0.0.1 \| 271 \| 127.0.0.1 Request complete \| 2020-08-07 13:11:16.122280 \| 127.0.0.1 \| 280 \| 127.0.0.1 Message-Id: <3b31584c13f23f84af35660d0aa73ba56c30cf13.1596799589.git.sarna@scylladb.com>	2020-08-09 12:53:04 +03:00
Pavel Emelyanov	4d2f5f93a4	memtable: Switch onto B+ rails The change is the same as with row-cache -- use B+ with int64_t token as key and array of memtable_entry-s inside it. The changes are: Similar to those for row_cache: - compare() goes away, new collection uses ring_position_comparator - insertion and removal happens with the help of double_decker, most of the places are about slightly changed semantics of it - flags are added to memtable_entry, this makes its size larger than it could be, but still smaller than it was before Memtable-specific: - when the new entry is inserted into tree iterators _might_ get invalidated by double-decker inner array. This is easy to check when it happens, so the invalidation is avoided when possible - the size_in_allocator_without_rows() is now not very precise. This is because after the patch memtable_entries are not allocated individually as they used to. They can be squashed together with those having token conflict and asking allocator for the occupied memory slot is not possible. As the closest (lower) estimate the size of enclosing B+ data node is used Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-07-14 16:30:02 +03:00
Pavel Emelyanov	174b101a49	row_cache: Switch partition tree onto B+ rails The row_cache::partitions_type is replaced from boost::intrusive::set to bplus::tree<Key = int64_t, T = array_trusted_bounds<cache_entry>> Where token is used to quickly locate the partition by its token and the internal array -- to resolve hashing conflicts. Summary of changes in cache_entry: - compare's goes away as the new collection needs tri-compare one which is provided by ring_position_comparator - when initialized the dummy entry is added with "after_all_keys" kind, not "before_all_keys" as it was by default. This is to make tree entries sorted by token - insertion and removing of cache_entries happens inside double_decker, most of the changes in row_cache.cc are about passing constructor args from current_allocator.construct into double_decker.emplace_before() - the _flags is extended to keep array head/tail bits. There's a room for it, sizeof(cache_entry) remains unchanged The rest fits smoothly into the double_decker API. Also, as was told in the previous patch, insertion and removal _may_ invalidate iterators, but may leave them intact. However, currently this doesn't seem to be a problem as the cache_tracker ::insert() and ::on_partition_erase do invalidate iterators unconditionally. Later this can be optimized, as iterators are invalidated by double-decker only in case of hash conflict, otherwise it doesn't change arrays and B+ tree doesn't invalidate its. tests: unit(dev), perf(dev) Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-07-14 16:30:02 +03:00
Pavel Emelyanov	dff5eb6f25	memtable: Count partitions separately The B+ will not have constant-time .size() call, so count them by hand Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-07-14 16:30:02 +03:00
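The two-level layout described in the row above (an outer tree keyed by the `int64_t` token, with a small sorted array per token to resolve token conflicts) can be sketched with standard containers. This is an illustration under loud assumptions: `std::map` stands in for `bplus::tree`, `std::vector` for `array_trusted_bounds`, and `token_index`, `entry`, `insert`, `find` and `bucket_size` are hypothetical names, not the `double_decker` API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical cache entry: a partition key plus some payload.
struct entry {
    std::string key;
    int value;
};

// Outer level: ordered by token. Inner level: entries sharing a token,
// kept sorted by key.
class token_index {
    std::map<int64_t, std::vector<entry>> _tree;
public:
    // Inserts the entry; returns false if the (token, key) pair already exists.
    bool insert(int64_t token, std::string key, int value) {
        auto& bucket = _tree[token];
        auto it = bucket.begin();
        while (it != bucket.end() && it->key < key) {
            ++it;
        }
        if (it != bucket.end() && it->key == key) {
            return false;
        }
        bucket.insert(it, entry{std::move(key), value});
        return true;
    }

    // Locates the bucket by token first, then scans the (usually
    // single-element) array for the exact key.
    const entry* find(int64_t token, const std::string& key) const {
        auto t = _tree.find(token);
        if (t == _tree.end()) {
            return nullptr;
        }
        for (const auto& e : t->second) {
            if (e.key == key) {
                return &e;
            }
        }
        return nullptr;
    }

    std::size_t bucket_size(int64_t token) const {
        auto t = _tree.find(token);
        return t == _tree.end() ? 0 : t->second.size();
    }
};
```

In the common case a token maps to exactly one entry, so the outer lookup compares only integers and the inner array is touched once; the array grows past one element only when two keys share a token.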


326 Commits