In preparation for tracking different kinds of objects in the LRU,
not just rows_entry, switch to the LRU implementation from
utils/lru.hh, which can hold arbitrary element types.
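A minimal sketch of the idea, assuming an intrusive-list LRU similar
in spirit to utils/lru.hh (the names and interface here are
illustrative, not the actual header's):

    #include <boost/intrusive/list.hpp>
    namespace bi = boost::intrusive;

    // Any element type can participate in the LRU by deriving from
    // evictable; the LRU itself no longer knows about rows_entry.
    class evictable {
        friend class lru;
        bi::list_member_hook<bi::link_mode<bi::auto_unlink>> _lru_link;
    public:
        virtual void on_evicted() = 0;
    protected:
        ~evictable() = default;
    };

    class lru {
        using list_type = bi::list<evictable,
            bi::member_hook<evictable, decltype(evictable::_lru_link),
                            &evictable::_lru_link>,
            bi::constant_time_size<false>>;
        list_type _list; // most recently used element at the front
    public:
        void add(evictable& e) { _list.push_front(e); }
        void touch(evictable& e) {
            e._lru_link.unlink();
            _list.push_front(e);
        }
        void evict_one() {
            if (!_list.empty()) {
                evictable& e = _list.back();
                e._lru_link.unlink();
                e.on_evicted();
            }
        }
    };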
The holder that is responsible for closing the
read_context before destroying it now holds it uniquely.
cache_flat_mutation_reader may be constructed either
with a read_context&, in which case it knows that the read_context
is owned externally by the caller, or
with a std::unique_ptr<read_context>, in
which case it assumes ownership of the read_context
and becomes responsible for closing it.
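A minimal sketch of the two ownership modes, assuming a stub
read_context with a close() returning future<> (names simplified,
not the actual class layout):

    #include <memory>
    #include <seastar/core/future.hh>

    struct read_context {
        seastar::future<> close() { return seastar::make_ready_future<>(); }
    };

    class cache_flat_mutation_reader {
        read_context& _read_context;              // always valid to use
        std::unique_ptr<read_context> _owned_ctx; // engaged only when owning
    public:
        // Caller keeps ownership and closes the context itself.
        explicit cache_flat_mutation_reader(read_context& ctx)
            : _read_context(ctx) { }
        // Reader takes ownership and becomes responsible for closing.
        explicit cache_flat_mutation_reader(std::unique_ptr<read_context> ctx)
            : _read_context(*ctx), _owned_ctx(std::move(ctx)) { }
        seastar::future<> close() {
            return _owned_ctx ? _owned_ctx->close()
                              : seastar::make_ready_future<>();
        }
    };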
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
This is a revival of #7490.
Quoting #7490:
The managed_bytes class now uses implicit linearization: outside LSA, data is never fragmented, and within LSA, data is linearized on-demand, as long as the code is running within a with_linearized_managed_bytes() scope.
We would like to stop linearizing managed_bytes and keep it fragmented at all times, since linearization can require large contiguous chunks. Large contiguous allocations are hard to satisfy and cause latency spikes.
As a first step towards that, we remove all implicitly linearizing accessors and replace them with an explicit linearization accessor, with_linearized().
Some of the linearization happens long before use, by creating a bytes_view of the managed_bytes object and passing it onwards, perhaps storing it for later use. This does not work with with_linearized(), which creates a temporary linearized view, and does not work towards the longer term goal of never linearizing. As a substitute, a managed_bytes_view class is introduced that acts as a view over managed_bytes (for interoperability it can also be a view over bytes and is compatible with bytes_view).
By the end of the series, all linearizations are temporary, within the scope of a with_linearized() call and can be converted to fragmented consumption of the data at leisure.
This has limited practical value directly, as current uses of managed_bytes are limited to keys (which are limited to 64k). However, it enables converting the atomic_cell layer back to managed_bytes (so we can remove IMR) and the CQL layer to managed_bytes/managed_bytes_view, removing contiguous allocations from the coordinator.
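To illustrate the direction with a self-contained stand-in (this is
not Scylla's managed_bytes API; the names and shapes below are
assumptions):

    #include <cstdint>
    #include <span>
    #include <vector>

    using byte_view = std::span<const uint8_t>;

    class fragmented_bytes {
        std::vector<std::vector<uint8_t>> _fragments;
    public:
        explicit fragmented_bytes(std::vector<std::vector<uint8_t>> frags)
            : _fragments(std::move(frags)) { }
        // Explicit, temporary linearization: the contiguous buffer only
        // lives for the duration of the call, so nothing can retain it.
        template <typename Func>
        auto with_linearized(Func&& func) const {
            std::vector<uint8_t> linear;
            for (const auto& f : _fragments) {
                linear.insert(linear.end(), f.begin(), f.end());
            }
            return func(byte_view(linear));
        }
        // Fragmented consumption: iterate fragments, no big allocation.
        const std::vector<std::vector<uint8_t>>& fragments() const {
            return _fragments;
        }
    };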
Closes #7820
* github.com:scylladb/scylla:
test: add hashers_test
memtable: fix accounting of managed_bytes in partition_snapshot_accounter
test: add managed_bytes_test
utils: fragment_range: add a fragment iterator for FragmentedView
keys: update comments after changes and remove an unused method
mutation_test: use the correct preferred_max_contiguous_allocation in measuring_allocator
row_cache: more indentation fixes
utils: remove unused linearization facilities in `managed_bytes` class
misc: fix indentation
treewide: remove remaining `with_linearized_managed_bytes` uses
memtable, row_cache: remove `with_linearized_managed_bytes` uses
utils: managed_bytes: remove linearizing accessors
keys, compound: switch from bytes_view to managed_bytes_view
sstables: writer: add write_* helpers for managed_bytes_view
compound_compat: transition legacy_compound_view from bytes_view to managed_bytes_view
types: change equal() to accept managed_bytes_view
types: add parallel interfaces for managed_bytes_view
types: add to_managed_bytes(const sstring&)
serializer_impl: handle managed_bytes without linearizing
utils: managed_bytes: add managed_bytes_view::operator[]
utils: managed_bytes: introduce managed_bytes_view
utils: fragment_range: add serialization helpers for FragmentedMutableView
bytes: implement std::hash using appending_hash
utils: mutable_view: add substr()
utils: fragment_range: add compare_unsigned
utils: managed_bytes: make the constructors from bytes and bytes_view explicit
utils: managed_bytes: introduce with_linearized()
utils: managed_bytes: constrain with_linearized_managed_bytes()
utils: managed_bytes: avoid internal uses of managed_bytes::data()
utils: managed_bytes: extract do_linearize_pure()
thrift: do not depend on implicit conversion of keys to bytes_view
clustering_bounds_comparator: do not depend on implicit conversion of keys to bytes_view
cql3: expression: linearize get_value_from_mutation() earlier
bytes: add to_bytes(bytes)
cql3: expression: mark do_get_value() as static
Since `managed_bytes::data()` is deleted, as well as the other public
APIs of `managed_bytes` that would linearize stored values (except
for the explicit `with_linearized`), there is no point in
invoking the `with_linearized_managed_bytes` hack, which would trigger
automatic linearization under the hood of managed_bytes.
Remove the now-useless `with_linearized_managed_bytes` wrappers from
the memtable and row_cache code.
Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
An external updater may do some preparatory work, like constructing a new sstable list,
and at the end atomically replace the old list with the new one.
Decoupling the preparation from the execution gives us the following benefits:
- the preparation step can now yield if needed to avoid reactor stalls, as it's
been futurized.
- the execution step can now provide strong exception guarantees, as
it's decoupled from the preparation step, which can be non-exception-safe.
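A sketch of the resulting split, with hypothetical names (the real
updater interface may differ):

    #include <seastar/core/future.hh>

    struct sstable_list_updater {
        virtual ~sstable_list_updater() = default;
        // Preparation: futurized, may yield and may throw; builds the
        // new sstable list off to the side without touching shared state.
        virtual seastar::future<> prepare() = 0;
        // Execution: atomically swaps the old list for the new one;
        // noexcept, so the overall update keeps strong exception guarantees.
        virtual void execute() noexcept = 0;
    };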
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
The classes touch each other's private data for no real
reason. Putting the interaction behind an API makes it easier
to track the usage.
* xemul/br-unfriends-in-row-cache-2:
row cache: Unfriend classes from each other
rows_entry: Move container/hooks types declarations
rows_entry: Simplify LRU unlink
mutation_partition: Define .replace_with method for rows_entry
mutation_partition: Use rows_entry::apply_monotonically
Define container types near the containing elements' hook
members, so that they can be private without the need
to friend classes with each other.
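For example, a sketch of the pattern with assumed names:

    #include <boost/intrusive/set.hpp>
    namespace bi = boost::intrusive;

    class rows_entry_like {
        // Hook and its container type live side by side; the hook stays
        // private and no other class needs friendship to name the container.
        bi::set_member_hook<> _link;
        int _key = 0;
    public:
        explicit rows_entry_like(int key) : _key(key) { }
        bool operator<(const rows_entry_like& o) const { return _key < o._key; }
        using container_type = bi::set<rows_entry_like,
            bi::member_hook<rows_entry_like, bi::set_member_hook<>,
                            &rows_entry_like::_link>>;
    };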
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The cache_tracker tries to access a private member of the
rows_entry to unlink it, but the lru_type hook is auto_unlink,
so the entry can unlink itself.
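A minimal, self-contained demonstration of the auto_unlink behavior
(illustrative only, not Scylla code):

    #include <boost/intrusive/list.hpp>
    namespace bi = boost::intrusive;

    struct node : bi::list_base_hook<bi::link_mode<bi::auto_unlink>> {
        int v = 0;
    };

    void demo() {
        node a, b;
        // auto_unlink hooks require constant_time_size<false>.
        bi::list<node, bi::constant_time_size<false>> lst;
        lst.push_back(a);
        lst.push_back(b);
        // The element detaches itself; neither the list nor anyone's
        // private members need to be touched by the caller.
        a.unlink();
    }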
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The row_cache::find_or_create is only used to put (or touch) an entry in cache
when the partition_start mutation is at hand. Thus, there's no point in carrying
a key reference and a tombstone value through the calls; the partition_start
reference is enough.
Since the new cache entry is created incomplete, rename the creation method
to reflect this.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The only caller of find_or_create() in tests works on an already existing (.populate()-d) entry,
so patch this place for explicitness and for the sake of the next patches.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
If the key type is int64_t and the less-comparator is "natural" (i.e. it's
literally 'a < b'), we may use SIMD instructions to search for the key
on a node. Before doing so, the maybe_key and the searcher should be prepared
for that; in particular:
1. maybe_key should set unused keys to the minimal value
2. the searcher for this case should call the gt() helper with
primitive types -- an int64_t search key and an array of int64_t values
To tell the B+ code that the key/less-comparator pair is such, the less-comparator
should define the simplify_key() method converting search keys to int64_t.
This searcher is selected automatically; if any mismatch happens, it silently
falls back to the default one. Thus, also add a static assertion to the row cache
to mitigate this.
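A hypothetical comparator shape that opts in; only simplify_key() is
the mechanism described above, everything else is assumed:

    #include <cstdint>

    struct token_less {
        // "Natural" less on the primitive key: literally a < b.
        bool operator()(int64_t a, int64_t b) const noexcept {
            return a < b;
        }
        // Opt-in hook: convert any accepted search-key type to int64_t
        // so the node search can run the SIMD gt() over a plain array.
        static int64_t simplify_key(int64_t token) noexcept {
            return token;
        }
    };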
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
All of them can live with a forward declaration of the f._m._r. plus a
seastar header in the commitlog code.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The change is the same as with the row cache -- use a B+ tree with the
int64_t token as the key and an array of memtable_entry-s inside it.
The changes are:
Similar to those for row_cache:
- compare() goes away, new collection uses ring_position_comparator
- insertion and removal happens with the help of double_decker, most
of the places are about slightly changed semantics of it
- flags are added to memtable_entry, this makes its size larger than
it could be, but still smaller than it was before
Memtable-specific:
- when a new entry is inserted into the tree, iterators _might_ get
invalidated by the double-decker inner array. It is easy to check
when this happens, so the invalidation is avoided when possible
- the size_in_allocator_without_rows() is now not very precise. This
is because after the patch memtable_entries are not allocated
individually as they used to be. They can be squashed together with
entries having a token conflict, and asking the allocator for the
occupied memory slot is not possible. The size of the enclosing B+
data node is used as the closest (lower) estimate
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The row_cache::partitions_type is changed from boost::intrusive::set
to bplus::tree<Key = int64_t, T = array_trusted_bounds<cache_entry>>,
where the token is used to quickly locate the partition and
the internal array resolves hashing conflicts.
Summary of changes in cache_entry:
- compare() goes away, as the new collection needs a tri-compare one,
which is provided by ring_position_comparator
- on initialization, the dummy entry is added with the "after_all_keys"
kind, not "before_all_keys" as it was by default. This keeps tree
entries sorted by token
- insertion and removal of cache_entries happens inside double_decker;
most of the changes in row_cache.cc are about passing constructor args
from current_allocator.construct into double_decker.emplace_before()
- the _flags is extended to keep the array head/tail bits. There's room
for it, so sizeof(cache_entry) remains unchanged
The rest fits smoothly into the double_decker API.
Also, as noted in the previous patch, insertion and removal _may_
invalidate iterators, but may also leave them intact. However, currently
this doesn't seem to be a problem, as cache_tracker::insert() and
::on_partition_erase() invalidate iterators unconditionally.
Later this can be optimized: iterators are invalidated by double-decker
only in case of a hash conflict; otherwise it doesn't change the arrays
and the B+ tree doesn't invalidate its iterators.
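An illustrative model of the resulting two-level lookup (token
less-compare to find the bucket, key tri-compare inside it); the real
code uses bplus::tree and double_decker, not std::map and std::vector:

    #include <compare>
    #include <cstdint>
    #include <map>
    #include <vector>

    struct entry {
        int64_t token;
        int key; // stands in for the full ring position / key
    };

    using partitions_model = std::map<int64_t, std::vector<entry>>;

    const entry* find(const partitions_model& parts, int64_t token, int key) {
        auto bucket = parts.find(token);        // less-compare on the token
        if (bucket == parts.end()) {
            return nullptr;
        }
        for (const entry& e : bucket->second) { // tri-compare resolves conflicts
            auto cmp = e.key <=> key;
            if (cmp == 0) {
                return &e;
            }
            if (cmp > 0) {
                break;                          // bucket is kept sorted by key
            }
        }
        return nullptr;
    }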
tests: unit(dev), perf(dev)
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The row cache (and memtable) code uses its own comparators built on top
of the ring_position_comparator for collections of partitions. These
collections will be switched from the key less-compare to the pair
of token less-compare + key tri-compare.
Prepare for the switch by generalizing the ring_position_comparator
and by patching all the non-collection usage of less-compare to use
it.
The memtable code doesn't use it outside of collections, but patch it
anyway as part of the preparations.
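For example, a less-comparator can be derived from the tri-comparator
so non-collection call sites share a single comparison routine (a
generic sketch, not the actual Scylla helper):

    // Wraps a tri-comparator (returning <0, 0, >0) as a less-comparator.
    template <typename TriCompare>
    struct less_from_tri {
        TriCompare tri;
        template <typename A, typename B>
        bool operator()(const A& a, const B& b) const {
            return tri(a, b) < 0;
        }
    };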
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
All but a few are trivially such.
The clear_continuity() calls cache_entry::set_continuous(), which became
noexcept a patch ago.
The allocator() calls region.allocator(), which was marked noexcept a few
patches back.
The on_partition_erase() calls allocator().invalidate_references(); both
were marked noexcept a few patches back.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
All but one are trivially such; position() calls is_dummy_entry(),
which has just become noexcept.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
All readers are soon going to require a valid permit, so make sure we
have a valid permit that we can pass to the underlying reader when
creating it. This means `row_cache::make_reader()` now also requires
a permit to be passed to it.
This patch has two goals -- speed up the total partitions
calculation (walking databases is faster than walking tables),
and get rid of the row_cache._partitions.size() call, which will
not be available with the new _partitions collection implementation.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20200423133900.27818-1-xemul@scylladb.com>
Adds per-table metrics for counting partition and row reuse
in memtables. New metrics are as follows:
- memtable_partition_writes - number of write operations performed
on partitions in memtables,
- memtable_partition_hits - number of write operations performed
on partitions that previously existed in a memtable,
- memtable_row_writes - number of row write operations performed
in memtables,
- memtable_row_hits - number of row write operations that overwrote
rows previously present in a memtable.
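Registering such counters typically looks like this with the seastar
metrics API (the stats struct, group name and descriptions are
assumptions):

    #include <cstdint>
    #include <seastar/core/metrics.hh>

    namespace sm = seastar::metrics;

    struct memtable_stats {
        uint64_t partition_writes = 0;
        uint64_t partition_hits = 0;
        uint64_t row_writes = 0;
        uint64_t row_hits = 0;
    };

    void register_metrics(sm::metric_groups& mg, memtable_stats& st) {
        mg.add_group("memtable", {
            sm::make_counter("partition_writes", st.partition_writes,
                    sm::description("Write operations on memtable partitions")),
            sm::make_counter("partition_hits", st.partition_hits,
                    sm::description("Writes that hit an existing partition")),
            sm::make_counter("row_writes", st.row_writes,
                    sm::description("Row write operations in memtables")),
            sm::make_counter("row_hits", st.row_hits,
                    sm::description("Row writes that overwrote an existing row")),
        });
    }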
Tests: unit(release)
invalidate_unwrapped() calls cache_entry::evict(), which cannot be
called concurrently with cache update. invalidate() serializes it
properly by calling do_update(), but evict() doesn't. The purpose of
evict() is to stress eviction in tests, which can happen concurrently
with cache update. Switch it to use the memory reclaimer, so that it's
both correct and more realistic.
evict() is used only in tests.
dht::ring_position cannot represent all ring_position_view instances,
in particular those obtained from
dht::ring_position_view::for_range_start(). To allow using the latter,
switch to views.
Replace stdx::optional and stdx::string_view with the C++ std
counterparts.
Some instances of boost::variant were also replaced with std::variant,
namely those that called seastar::visit.
Scylla now requires GCC 8 to compile.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20190108111141.5369-1-duarte@scylladb.com>
* seastar d59fcef...b924495 (2):
> build: Fix protobuf generation rules
> Merge "Restructure files" from Jesse
Includes fixup patch from Jesse:
"
Update Seastar `#include`s to reflect restructure
All Seastar header files are now prefixed with "seastar" and the
configure script reflects the new locations of files.
Signed-off-by: Jesse Haber-Kucharsky <jhaberku@scylladb.com>
Message-Id: <5d22d964a7735696fb6bb7606ed88f35dde31413.1542731639.git.jhaberku@scylladb.com>
"
The worker is responsible for merging MVCC snapshots, which is similar
to merging sstables, but in memory. The new scheduling group will
therefore be called "memory compaction".
We should run it in a separate scheduling group instead of
main/memtables, so that it doesn't disrupt writes and other system
activities. It's also nice for monitoring how much CPU time we spend
on this.
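Creating such a group with seastar's API looks roughly like this (the
shares value is an arbitrary assumption):

    #include <seastar/core/future.hh>
    #include <seastar/core/scheduling.hh>

    seastar::future<seastar::scheduling_group> make_memory_compaction_group() {
        // A dedicated group lets the scheduler isolate and account for
        // the CPU time spent merging MVCC snapshots.
        return seastar::create_scheduling_group("memory_compaction", 1000);
    }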
The row cache tracker has numerous implicit dependencies on other objects
(e.g. LSA migrators for data held by mutation_cleaner). The fact that
both the cache tracker and some of those dependencies are thread-local
objects makes it hard to guarantee a correct destruction order.
Let's deglobalise the cache tracker and put it in the database class.
Now all snapshots will have a mutation_cleaner which they will use to
gently destroy freed partition_version objects.
Destruction of memtable entries during cache update is also using the
gentle cleaner now. We need to have a separate cleaner for memtable
objects even though they're owned by the cache's region, because memtable
versions must be cleared without a cache_tracker.
Each memtable will have its own cleaner, which will be merged with the
cache's cleaner when memtable is merged into cache.
Fixes some sources of reactor stalls on cache update when there are
large partition entries in memtables.
Instead of destroying whole partition_versions at once, we will do that
gently using mutation_cleaner to avoid reactor stalls.
Large deletions could happen when a large partition gets invalidated,
upgraded to a new schema, or when it's abandoned by a detached snapshot.
Refs #3289.
Partitions can get very large. Destroying them all at once can stall
the reactor for a significant amount of time. We want to avoid that by
doing the destruction incrementally, deferring in between. A new API is
added for that at various levels:
stop_iteration clear_gently() noexcept;
It returns stop_iteration::yes when the object is fully cleared and
can now be destroyed quickly. So a deferring destruction can look like
this:
return repeat([this] { return clear_gently(); });
The reason why clear_gently() doesn't return a future<> itself is that some
contexts cannot defer, like memory reclamation.
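For example, an incremental clear for a container of heavy elements
might look like this (a self-contained sketch; the real code uses
seastar's stop_iteration):

    #include <deque>
    #include <vector>

    enum class stop_iteration { no, yes }; // stand-in for seastar::stop_iteration

    struct heavy_partition {
        std::vector<char> data;
    };

    struct partition_list {
        std::deque<heavy_partition> _parts;

        // Destroy at most one element per call; the caller defers (or a
        // reclaimer retries) between steps instead of stalling the reactor.
        stop_iteration clear_gently() noexcept {
            if (_parts.empty()) {
                return stop_iteration::yes; // fully cleared, cheap to destroy
            }
            _parts.pop_front();
            return stop_iteration::no;
        }
    };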
Will be used in row_cache_alloc_stress to unlink partitions which we
don't want to get evicted, instead of repeatedly calling touch() on
them after each subsequent population. After switching to row-level
LRU, doing so greatly increases run time of the test due to quadratic
behavior.
Instead of evicting whole partitions, evict whole rows.
As part of this, invalidation of partition entries was changed to not
evict from snapshots right away, but unlink them and let them be
evicted by the reclaimer.
We will need to propagate a cache_tracker reference to evict(). Instead
of evicting from destructor, do so before cache_entry gets unlinked
from the tree. Entries which are not linked don't need to be
explicitly evicted.