Commit Graph

200 Commits

Author SHA1 Message Date
Botond Dénes
5e97fb9fc4 row_cache: update reader implementations to v2
cache_flat_mutation_reader gets a native v2 implementation. The
underlying mutation representation is not changed: range deletions are
still stored as v1 range_tombstones in mutation_partition. These are
converted to range tombstone changes during reading.
This allows for separating the change of a native v2 reader
implementation and a native v2 in-memory storage format, enabling the
two to be done at separate times and incrementally.
2022-04-21 14:57:04 +03:00
Botond Dénes
2a0d7e8a1d row_cache: cache_entry::read(): return v2 reader
Push the conversion down one level. Soon we will make cache flat
mutation reader a v2 reader, this keeps the related noise separate.
2022-04-20 10:59:09 +03:00
Botond Dénes
0b035c9099 row_cache: return v2 readers from make_reader*()
And adjust callers. The factory functions just sprinkle upgrade_to_v2()
on returned readers for now.
One test in row_cache_test.cc had to be disabled, because the upgrade to
v2 wrapper we now have over cache readers doesn't allow it to directly
control the reader's buffer size and so the test fails. There is a FIXME
left in the test code and the test will be re-enabled once a native v2
reader implementation allows us to get rid of the upgrade wrapper.
2022-04-20 10:59:09 +03:00
Botond Dénes
11109f4c45 mutation_reader: move mutation source into readers/ 2022-03-30 15:42:51 +03:00
Avi Kivity
e7fb71020b Merge 'replica: Optimize empty_flat_reader out of the hot path' from Michał Chojnowski
When row_cache::make_reader() and memtable::make_flat_reader() see that the query result is empty, they return empty_flat_reader, which is a trivial implementation of flat_mutation_reader.
Even though empty_flat_reader doesn't do anything meaningful, it still needs to be created, handled in merging_reader and destroyed. Turns out this is costly.

This patch series replaces hot path uses of empty_flat_reader with an empty optional.

Performance effects:

`perf_simple_query --smp 1`
TPS: 138k -> 168k
allocs/op: 80.2 -> 71.1
insns/op: 49.9k -> 45.1k

`perf_simple_query --smp 1 --enable-cache=1 --flush`
TPS: 125k -> 150k
allocs/op: 79.2 -> 71.1
insns/op: 51.7k -> 47.2k

For a cassandra-stress benchmark (localhost, 100% cache reads) this translates to a TPS increase from ~42k to ~48k per hyperthread.

Note that this optimization is effective for single-partition reads where the queried partition is only in cache/sstables or only in memtables. Other queries (e.g. where the partition is in both cache in memtables and needs to be merged) are unaffected.

Closes #10204

* github.com:scylladb/scylla:
  replica: Prefer row_cache::make_reader_opt() to row_cache::make_reader()
  row_cache: Add row_cache::make_reader_opt()
  replica: Prefer memtable::make_flat_reader_opt() to memtable::make_flat_reader()
  memtable: Add memtable::make_flat_reader_opt()

[avi: adjust #include for readers/ split]
2022-03-14 14:07:00 +02:00
Michał Chojnowski
6c6519a909 row_cache: Add row_cache::make_reader_opt() 2022-03-14 12:02:49 +01:00
Avi Kivity
cbba80914d memtable: move to replica module and namespace
Memtables are a replica-side entity, and so are moved to the
replica module and namespace.

Memtables are also used outside the replica, in two places:
 - in some virtual tables; this is also in some way inside the replica,
   (virtual readers are installed at the replica level, not the
   cooordinator), so I don't consider it a layering violation
 - in many sstable unit tests, as a convenient way to create sstables
   with known input. This is a layering violation.

We could make memtables their own module, but I think this is wrong.
Memtables are deeply tied into replica memory management, and trying
to make them a low-level primitive (at a lower level than sstables) will
be difficult. Not least because memtables use sstables. Instead, we
should have a memtable-like thing that doesn't support merging and
doesn't have all other funky memtable stuff, and instead replace
the uses of memtables in sstable tests with some kind of
make_flat_mutation_reader_from_unsorted_mutations() that does
the sorting that is the reason for the use of memtables in tests (and
live with the layering violation meanwhile).

Test: unit (dev)

Closes #10120
2022-02-23 09:05:16 +02:00
Avi Kivity
fcb8d040e8 treewide: use Software Package Data Exchange (SPDX) license identifiers
Instead of lengthy blurbs, switch to single-line, machine-readable
standardized (https://spdx.dev) license identifiers. The Linux kernel
switched long ago, so there is strong precedent.

Three cases are handled: AGPL-only, Apache-only, and dual licensed.
For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0),
reasoning that our changes are extensive enough to apply our license.

The changes we applied mechanically with a script, except to
licenses/README.md.

Closes #9937
2022-01-18 12:15:18 +01:00
Tomasz Grabiec
63351483f0 row_cache: Support reverse reads natively
Some implementation notes below.

When iterating in reverse, _last_row is after the current entry
(_next_row) in table schema order, not before like in the forward
mode.

Since there is no dummy row before all entries, reverse iteration must
be now prepared for the fact that advancing _next_row may land not
pointing at any row. The partition_snapshot_row_cursor maintains
continuity() correctly in this case, and positions the cursor before
all rows, so most of the code works unchanged. The only excpetion is
in move_to_next_entry(), which now cannot assume that failure to
advance to an entry means it can end a read.

maybe_drop_last_entry() is not implemented in reverse mode, which may
expose reverse-only workload to the problem of accumulating dummy
entries.

ensure_population_lower_bound() was not updating _last_row after
inserting the entry in latets version. This was not a problem for
forward reads because they do not modify the row in the partition
snapshot represented by _last_row. They only need the row to be there
in the latest version after the call. It's different for reveresed
reads, which change the continuity of the entry represented by
_last_row, hence _last_row needs to have the iterator updated to point
to the entry from the latest version, otherwise we'd set the
continuity of the previous version entry which would corrupt the
continuity.
2021-12-19 22:41:35 +01:00
Botond Dénes
41facb3270 treewide: move reversing to the mutation sources
Push down reversing to the mutation-sources proper, instead of doing it
on the querier level. This will allow us to test reverse reads on the
mutation source level.
The `max_size` parameter of `consume_page()` is now unused but is not
removed in this patch, it will be removed in a follow-up to reduce
churn.
2021-09-29 12:15:45 +03:00
Tomasz Grabiec
4b51e0bf30 row_cache: Move cache_tracker to a separate header
It will be needed by the sstable layer to get the to the LRU and the
LSA region. Split to avoid inclusion of whole row_cache.hh
2021-07-02 10:25:58 +02:00
Tomasz Grabiec
7fa4e10aa0 row_cache: Use generic LRU for eviction
In preparation for tracking different kinds of objects, not just
rows_entry, in the LRU, switch to the LRU implementation form
utils/lru.hh which can hold arbitrary element type.
2021-07-02 10:25:58 +02:00
Michael Livshin
9ef2317248 row_cache: count range tombstones processed during read
Refs #7749.

Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
Message-Id: <20210602152210.17948-1-michael.livshin@scylladb.com>
2021-06-14 14:29:05 +02:00
Pavel Solodovnikov
76bea23174 treewide: reduce header interdependencies
Use forward declarations wherever possible.

Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>

Closes #8813
2021-06-07 15:58:35 +03:00
Avi Kivity
a55b434a2b treewide: extent copyright statements to present day 2021-06-06 19:18:49 +03:00
Benny Halevy
0a2670c9ec row_cache: hold read_context as unique_ptr
Such that the holder, that is responsible for closing the
read_context before destroying it, holds it uniquely.

cache_flat_mutation_reader may be constructed either
with a read_context&, where it knows that the read_context
is owned externally, by the caller, or it could
be constructed with a std::unique_ptr<read_context> in
which case it assumes ownership of the read_context
and it is now responsible for closing it.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-04-25 11:35:07 +03:00
Pavel Emelyanov
ccc1f24097 row_cache: Remove mentionings of cache_streamed_mutation
This class was replaced by cache_flat_mutation_reader long ago
and doesn't exist.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20210330153942.27222-1-xemul@scylladb.com>
2021-04-01 12:54:45 +03:00
Tomasz Grabiec
f0a3272a5f row_cache: Add metric for dummy row hits
This will help to diagnose performance problems related to the read
having to walk through a lot of dummy rows to fill the buffer.

Refs #8153
2021-02-25 18:26:01 +01:00
Avi Kivity
60f5ec3644 Merge 'managed_bytes: switch to explicit linearization' from Michał Chojnowski
This is a revival of #7490.

Quoting #7490:

The managed_bytes class now uses implicit linearization: outside LSA, data is never fragmented, and within LSA, data is linearized on-demand, as long as the code is running within with_linearized_managed_bytes() scope.

We would like to stop linearizing managed_bytes and keep it fragmented at all times, since linearization can require large contiguous chunks. Large contiguous allocations are hard to satisfy and cause latency spikes.

As a first step towards that, we remove all implicitly linearizing accessors and replace them with an explicit linearization accessor, with_linearized().

Some of the linearization happens long before use, by creating a bytes_view of the managed_bytes object and passing it onwards, perhaps storing it for later use. This does not work with with_linearized(), which creates a temporary linearized view, and does not work towards the longer term goal of never linearizing. As a substitute a managed_bytes_view class is introduced that acts as a view for managed_bytes (for interoperability it can also be a view for bytes and is compatible with bytes_view).

By the end of the series, all linearizations are temporary, within the scope of a with_linearized() call and can be converted to fragmented consumption of the data at leisure.

This has limited practical value directly, as current uses of managed_bytes are limited to keys (which are limited to 64k). However, it enables converting the atomic_cell layer back to managed_bytes (so we can remove IMR) and the CQL layer to managed_bytes/managed_bytes_view, removing contiguous allocations from the coordinator.

Closes #7820

* github.com:scylladb/scylla:
  test: add hashers_test
  memtable: fix accounting of managed_bytes in partition_snapshot_accounter
  test: add managed_bytes_test
  utils: fragment_range: add a fragment iterator for FragmentedView
  keys: update comments after changes and remove an unused method
  mutation_test: use the correct preferred_max_contiguous_allocation in measuring_allocator
  row_cache: more indentation fixes
  utils: remove unused linearization facilities in `managed_bytes` class
  misc: fix indentation
  treewide: remove remaining `with_linearized_managed_bytes` uses
  memtable, row_cache: remove `with_linearized_managed_bytes` uses
  utils: managed_bytes: remove linearizing accessors
  keys, compound: switch from bytes_view to managed_bytes_view
  sstables: writer: add write_* helpers for managed_bytes_view
  compound_compat: transition legacy_compound_view from bytes_view to managed_bytes_view
  types: change equal() to accept managed_bytes_view
  types: add parallel interfaces for managed_bytes_view
  types: add to_managed_bytes(const sstring&)
  serializer_impl: handle managed_bytes without linearizing
  utils: managed_bytes: add managed_bytes_view::operator[]
  utils: managed_bytes: introduce managed_bytes_view
  utils: fragment_range: add serialization helpers for FragmentedMutableView
  bytes: implement std::hash using appending_hash
  utils: mutable_view: add substr()
  utils: fragment_range: add compare_unsigned
  utils: managed_bytes: make the constructors from bytes and bytes_view explicit
  utils: managed_bytes: introduce with_linearized()
  utils: managed_bytes: constrain with_linearized_managed_bytes()
  utils: managed_bytes: avoid internal uses of managed_bytes::data()
  utils: managed_bytes: extract do_linearize_pure()
  thrift: do not depend on implicit conversion of keys to bytes_view
  clustering_bounds_comparator: do not depend on implicit conversion of keys to bytes_view
  cql3: expression: linearize get_value_from_mutation() eariler
  bytes: add to_bytes(bytes)
  cql3: expression: mark do_get_value() as static
2021-01-18 11:01:28 +02:00
Pavel Solodovnikov
bf8b138b42 memtable, row_cache: remove with_linearized_managed_bytes uses
Since `managed_bytes::data()` is deleted as well as other public
APIs of `managed_bytes` which would linearize stored values except
for explicit `with_linearized`, there is no point
invoking `with_linearized_managed_bytes` hack which would trigger
automatic linearization under the hood of managed_bytes.

Remove useless `with_linearized_managed_bytes` wrapper from
memtable and row_cache code.

Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
2021-01-08 14:16:08 +01:00
Raphael S. Carvalho
198b87503f row_cache: allow external updater to decouple preparation from execution
External updater may do some preparatory work like constructing a new sstable list,
and at the end atomically replace the old list by the new one.

Decoupling the preparation from execution will give us the following benefits:
- the preparation step can now yield if needed to avoid reactor stalls, as it's
been futurized.
- the execution step will now be able to provide strong exception guarantees, as
it's now decoupled from the preparation step which can be non-exception-safe.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2020-12-28 13:17:45 -03:00
Tomasz Grabiec
a22645b7dd Merge "Unfriend rows_entry, cache_tracker and mutation_partition" from Pavel Emelyanov
The classes touche private data of each other for no real
reason. Putting the interaction behind API makes it easier
to track the usage.

* xemul/br-unfriends-in-row-cache-2:
  row cache: Unfriend classes from each other
  rows_entry: Move container/hooks types declarations
  rows_entry: Simplify LRU unlink
  mutation_partition: Define .replace_with method for rows_entry
  mutation_partition: Use rows_entry::apply_monotonically
2020-09-22 21:18:14 +02:00
Pavel Emelyanov
7a1265a338 rows_entry: Move container/hooks types declarations
Define container types near the containing elements' hook
members, so that they could be private without the need
to friend classes with each other.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-09-11 16:35:51 +03:00
Pavel Emelyanov
7ed1e18a13 rows_entry: Simplify LRU unlink
The cache_tracker tries to access private member of the
rows_entry to unlink it, but the lru_type is auto_unlink
and can unlink itself.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-09-11 16:35:51 +03:00
Pavel Emelyanov
ada174c932 row_cache: Kill incomplete_tag
The incomplete entry is created in one place.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-09-03 21:13:21 +03:00
Pavel Emelyanov
240b966695 row_cache: Do not copy partition tombstone when creating cache entry
The row_cache::find_or_create is only used to put (or touch) an entry in cache
having the partition_start mutation at hands. Thus, theres no point in carrying
key reference and tombstone value through the calls, just the partition_start
reference is enough.

Since the new cache entry is created incomplete, rename the creation method
to reflect this.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-09-03 21:13:21 +03:00
Pavel Emelyanov
84a6d439ad test: Lookup an existing entry with its own helper
The only caller of find_or_create() in tests works on already existing (.populate()-d) entry,
so patch this place for explicity and for the sake of next patching.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-09-03 21:13:21 +03:00
Pavel Emelyanov
3f33a71c0c row_cache: Move missing entry creation into helper
No functional changes, just move the code.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-09-03 21:13:21 +03:00
Pavel Emelyanov
5a29e17a5f row_cache: Revive do_find_or_create_entry concepts
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-09-03 21:13:21 +03:00
Pavel Emelyanov
35a22ac48a bptree: Special intra-node key search when possible
If the key type is int64_t and the less-comparator is "natural" (i.e. it's
literally 'a < b') we may use the SIMD instructions to search for the key
on a node. Before doing so, the maybe_key and the searcher should be prepared
for that, in particular:

1. maybe_key should set unused keys to the minimal value
2. the searcher for this case should call the gt() helper with
   primitive types -- int64_t search key and array of int64_t values

To tell to B+ code that the key-less pair is such the less-er should define
the simplify_key() method converting search keys to int64_t-s.

This searcher is selected automatically, if any mismatch happens it silently
falls back to default one. Thus also add a static assertion to the row-cache
to mitigate this.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-08-06 15:41:31 +03:00
Pavel Emelyanov
92f58f62f2 headers:: Remove flat_mutation_reader.hh from several other headers
All they can live with forward declaration of the f._m._r. plus a
seastar header in commitlog code.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-07-17 17:54:47 +03:00
Pavel Emelyanov
4d2f5f93a4 memtable: Switch onto B+ rails
The change is the same as with row-cache -- use B+ with int64_t token
as key and array of memtable_entry-s inside it.

The changes are:

Similar to those for row_cache:

- compare() goes away, new collection uses ring_position_comparator

- insertion and removal happens with the help of double_decker, most
  of the places are about slightly changed semantics of it

- flags are added to memtable_entry, this makes its size larger than
  it could be, but still smaller than it was before

Memtable-specific:

- when the new entry is inserted into tree iterators _might_ get
  invalidated by double-decker inner array. This is easy to check
  when it happens, so the invalidation is avoided when possible

- the size_in_allocator_without_rows() is now not very precise. This
  is because after the patch memtable_entries are not allocated
  individually as they used to. They can be squashed together with
  those having token conflict and asking allocator for the occupied
  memory slot is not possible. As the closest (lower) estimate the
  size of enclosing B+ data node is used

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-07-14 16:30:02 +03:00
Pavel Emelyanov
174b101a49 row_cache: Switch partition tree onto B+ rails
The row_cache::partitions_type is replaced from boost::intrusive::set
to bplus::tree<Key = int64_t, T = array_trusted_bounds<cache_entry>>

Where token is used to quickly locate the partition by its token and
the internal array -- to resolve hashing conflicts.

Summary of changes in cache_entry:

- compare's goes away as the new collection needs tri-compare one which
  is provided by ring_position_comparator

- when initialized the dummy entry is added with "after_all_keys" kind,
  not "before_all_keys" as it was by default. This is to make tree
  entries sorted by token

- insertion and removing of cache_entries happens inside double_decker,
  most of the changes in row_cache.cc are about passing constructor args
  from current_allocator.construct into double_decker.empace_before()

- the _flags is extended to keep array head/tail bits. There's a room
  for it, sizeof(cache_entry) remains unchanged

The rest fits smothly into the double_decker API.

Also, as was told in the previous patch, insertion and removal _may_
invalidate iterators, but may leave them intact. However, currently
this doesn't seem to be a problem as the cache_tracker ::insert() and
::on_partition_erase do invalidate iterators unconditionally.

Later this can be otimized, as iterators are invalidated by double-decker
only in case of hash conflict, otherwise it doesn't change arrays and
B+ tree doesn't invalidate its.

tests: unit(dev), perf(dev)

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-07-14 16:30:02 +03:00
Pavel Emelyanov
7b2754cf5f row-cache: Use ring_position_comparator in some places
The row cache (and memtable) code uses own comparators built on top
of the ring_position_comparator for collections of partitions. These
collections will be switched from the key less-compare to the pair
of token less-compare + key tri-compare.

Prepare for the switch by generalizing the ring_partition_comparator
and by patching all the non-collections usage of less-compare to use
one.

The memtable code doesn't use it outside of collections, but patch it
anyway as a part of preparations.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-07-14 16:30:02 +03:00
Pavel Emelyanov
1346289151 cache_tracker: Mark methods noexcept
All but few are trivially such.

The clear_continuity() calls cache_entry::set_continuous() that had become noexcept
a patch ago.

The allocator() calls region.allocator() which had been marked noexcept few patches
back.

The on_partition_erase() calls allocator().invalidate_references(), both had
been marked noexcept few patches back.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-07-09 14:44:17 +03:00
Pavel Emelyanov
d4ef845136 cache_entry: Mark methods noexcept
All but one are trivially such, the position() one calls is_dummy_entry()
which has become noexcept right now.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-07-09 14:41:43 +03:00
Botond Dénes
fe024cecdc row_cache: pass a valid permit to underlying read
All reader are soon going to require a valid permit, so make sure we
have a valid permit which we can pass to the underlying reader when
creating it. This means `row_cache::make_reader()` now also requires
a permit to be passed to it.
2020-05-28 11:34:35 +03:00
Pavel Emelyanov
83fe0427d2 api/cache_service: Relax getting partitions count
This patch has two goals -- speed up the total partitions
calculations (walking databases is faster than walking tables),
and get rid og row_cache._partitions.size() call, which will
not be available on new _partitions collection implementation.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20200423133900.27818-1-xemul@scylladb.com>
2020-04-23 17:47:58 +02:00
Pavel Emelyanov
d3b6f66f50 row_cache: Remove unused invalidate_unwrapped()
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20200423133557.27053-1-xemul@scylladb.com>
2020-04-23 17:04:31 +03:00
Piotr Dulikowski
59fbbb993f memtables: add partition/row hit/miss counters
Adds per-table metrics for counting partition and row reuse
in memtables. New metrics are as follows:
    - memtable_partition_writes - number of write operations performed
          on partitions in memtables,
    - memtable_partition_hits - number of write operations performed
          on partitions that previously existed in a memtable,
    - memtable_row_writes - number of row write operations performed
          in memtables,
    - memtable_row_hits - number of row write operations that ovewrote
          rows previously present in a memtable.

Tests: unit(release)
2019-11-12 13:35:41 +01:00
Tomasz Grabiec
57a93513bd row_cache: Make evict() not use invalidate_unwrapped()
invalidate_unwrapped() calls cache_entry::evict(), which cannot be
called concurrently with cache update. invalidate() serializes it
properly by calling do_update(), but evict() doesn't. The purpose of
evict() is to stress eviction in tests, which can happen concurrently
with cache update. Switch it to use memory reclaimer, so that it's
both correct and more realistic.

evict() is used only in tests.
2019-10-03 22:03:28 +02:00
Tomasz Grabiec
595e1a540e row_cache: Switch _prev_snapshot_pos to be a ring_position_ext
dht::ring_position cannot represent all ring_position_view instances,
in particular those obtained from
dht::ring_position_view::for_range_start(). To allow using the latter,
switch to views.
2019-05-13 19:30:50 +02:00
Duarte Nunes
fa2b0384d2 Replace std::experimental types with C++17 std version.
Replace stdx::optional and stdx::string_view with the C++ std
counterparts.

Some instances of boost::variant were also replaced with std::variant,
namely those that called seastar::visit.

Scylla now requires GCC 8 to compile.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20190108111141.5369-1-duarte@scylladb.com>
2019-01-08 13:16:36 +02:00
Avi Kivity
775b7e41f4 Update seastar submodule
* seastar d59fcef...b924495 (2):
  > build: Fix protobuf generation rules
  > Merge "Restructure files" from Jesse

Includes fixup patch from Jesse:

"
Update Seastar `#include`s to reflect restructure

All Seastar header files are now prefixed with "seastar" and the
configure script reflects the new locations of files.

Signed-off-by: Jesse Haber-Kucharsky <jhaberku@scylladb.com>
Message-Id: <5d22d964a7735696fb6bb7606ed88f35dde31413.1542731639.git.jhaberku@scylladb.com>
"
2018-11-21 00:01:44 +02:00
Tomasz Grabiec
074be4d4e8 memtable, cache: Run mutation_cleaner worker in its own scheduling group
The worker is responsible for merging MVCC snapshots, which is similar
to merging sstables, but in memory. The new scheduling group will be
therefore called "memory compaction".

We should run it in a separate scheduling group instead of
main/memtables, so that it doesn't disrupt writes and other system
activities. It's also nice for monitoring how much CPU time we spend
on this.
2018-06-27 21:51:04 +02:00
Paweł Dziepak
96b0577343 row_cache: deglobalise row cache tracker
Row cache tracker has numerous implicit dependencies on ohter objects
(e.g. LSA migrators for data held by mutation_cleaner). The fact that
both cache tracker and some of those dependencies are thread local
objects makes it hard to guarantee correct destruction order.

Let's deglobalise cache tracker and put in in the database class.
2018-06-25 09:37:43 +01:00
Paweł Dziepak
27014a23d7 treewide: require type info for copying atomic_cell_or_collection 2018-05-31 15:51:11 +01:00
Tomasz Grabiec
70c72773be cache: Defer during partition merging 2018-05-30 14:41:41 +02:00
Tomasz Grabiec
3f19f76c67 mvcc: Destroy memtable partition versions gently
Now all snapshots will have a mutation_cleaner which they will use to
gently destroy freed partition_version objects.

Destruction of memtable entries during cache update is also using the
gentle cleaner now. We need to have a separate cleaner for memtable
objects even though they're owned by cache's region, because memtable
versions must be cleared without a cache_tracker.

Each memtable will have its own cleaner, which will be merged with the
cache's cleaner when memtable is merged into cache.

Fixes some sources of reactor stalls on cache update when there are
large partition entries in memtables.
2018-05-30 14:41:40 +02:00
Tomasz Grabiec
f0c1edd672 cache: Destroy partition versions incrementally
Instead of destroying whole partition_versions at once, we will do that
gently using mutation_cleaner to avoid reactor stalls.

Large deletions could happen when large partition gets invalidated,
upgraded to a new schema, or when it's abandaned by a detached snapshot.

Refs #3289.
2018-05-30 14:41:40 +02:00