Presence checker is constructed and destroyed in the standard
allocator context, but the presence check was invoked in the LSA
context. If the presence checker allocates and caches some managed
objects, there will be alloc-dealloc mismatch.
That is the case with LeveledCompactionStrategy, which uses
incremental_selector.
Fix by invoking the presence check in the standard allocator context.
Fixes#4063.
Message-Id: <1547547700-16599-1-git-send-email-tgrabiec@scylladb.com>
Replace stdx::optional and stdx::string_view with the C++ std
counterparts.
Some instances of boost::variant were also replaced with std::variant,
namely those that called seastar::visit.
Scylla now requires GCC 8 to compile.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20190108111141.5369-1-duarte@scylladb.com>
Implementing intra-partition fast-forwarding adds more complexity to
already very-much-not-trivial cache readers and isn't really critical in
any way since it is not used outside of the tests. Let's use the generic
adapter instead of natively implementing it.
* seastar d59fcef...b924495 (2):
> build: Fix protobuf generation rules
> Merge "Restructure files" from Jesse
Includes fixup patch from Jesse:
"
Update Seastar `#include`s to reflect restructure
All Seastar header files are now prefixed with "seastar" and the
configure script reflects the new locations of files.
Signed-off-by: Jesse Haber-Kucharsky <jhaberku@scylladb.com>
Message-Id: <5d22d964a7735696fb6bb7606ed88f35dde31413.1542731639.git.jhaberku@scylladb.com>
"
After the new in-memory representation of cells was introduced there was
a regression in atomic_cell_or_collection::operator<< which stopped
printing the content of the cell. This makes debugging more incovenient
are time-consuming. This patch fixes the problem. Schema is propagated
to the atomic_cell_or_collection printer and the full content of the
cell is printed.
Fixes#3571.
Message-Id: <20181024095413.10736-1-pdziepak@scylladb.com>
Currently timeout is opt-in, that is, all methods that even have it
default it to `db::no_timeout`. This means that ensuring timeout is used
where it should be is completely up to the author and the reviewrs of
the code. As humans are notoriously prone to mistakes this has resulted
in a very inconsistent usage of timeout, many clients of
`flat_mutation_reader` passing the timeout only to some members and only
on certain call sites. This is small wonder considering that some core
operations like `operator()()` only recently received a timeout
parameter and others like `peek()` didn't even have one until this
patch. Both of these methods call `fill_buffer()` which potentially
talks to the lower layers and is supposed to propagate the timeout.
All this makes the `flat_mutation_reader`'s timeout effectively useless.
To make order in this chaos make the timeout parameter a mandatory one
on all `flat_mutation_reader` methods that need it. This ensures that
humans now get a reminder from the compiler when they forget to pass the
timeout. Clients can still opt-out from passing a timeout by passing
`db::no_timeout` (the previous default value) but this will be now
explicit and developers should think before typing it.
There were suprisingly few core call sites to fix up. Where a timeout
was available nearby I propagated it to be able to pass it to the
reader, where I couldn't I passed `db::no_timeout`. Authors of the
latter kind of code (view, streaming and repair are some of the notable
examples) should maybe consider propagating down a timeout if needed.
In the test code (the wast majority of the changes) I just used
`db::no_timeout` everywhere.
Tests: unit(release, debug)
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <1edc10802d5eb23de8af28c9f48b8d3be0f1a468.1536744563.git.bdenes@scylladb.com>
boost::intrusive::set::insert() may throw if keys require
linearization and that fails, in which case we will leak the entry.
When this happens in cache, we will also violate the invariant for
entry eviction, which assumes all tracked entries are linked, and
cause a SEGFAULT.
Use the non-throwing and faster insert_before() instead. Where we
can't use insert_before(), use alloc_strategy_unique_ptr<> to ensure
that entry is deallocated on insert failure.
Fixes#3585.
The worker is responsible for merging MVCC snapshots, which is similar
to merging sstables, but in memory. The new scheduling group will be
therefore called "memory compaction".
We should run it in a separate scheduling group instead of
main/memtables, so that it doesn't disrupt writes and other system
activities. It's also nice for monitoring how much CPU time we spend
on this.
If memtable snapshot goes away after memtable started merging to
cache, it would enqueue the snapshots for cleaning on the memtable's
cleaner, which will have to clean without deferrring when the memtable
is destroyed. That may stall the reactor. To avoid this, make merge()
cause the old instance of the cleaner to redirect to the new instance
(owned by cache), like we do for regions. This way the snapshots
mentioned earlier can be cleaned after memtable is destroyed,
gracefully.
Row cache tracker has numerous implicit dependencies on ohter objects
(e.g. LSA migrators for data held by mutation_cleaner). The fact that
both cache tracker and some of those dependencies are thread local
objects makes it hard to guarantee correct destruction order.
Let's deglobalise cache tracker and put in in the database class.
Memtable entries should be cleaned using memtable cleaner, which
unlike the cache' cleaner is not associated with the cache
tracker. It's an error to clean a snapshot using tracker which doesn't
own the entries. This will corrupt cache tracker's row counter.
Fixes failure of test_exception_safety_of_update_from_memtable from
row_cache.cc in debug mode and with allocation failure injection
enabled.
Introduce in "cache: Defer during partition merging"
(70c72773be).
Message-Id: <1528988256-20578-1-git-send-email-tgrabiec@scylladb.com>
Incremental merging will be implemented by the means of resumable
functions, which return stop_iteration::no when not yet
finished. We're not using futures, so that the caller can do work
around preemption points as well.
Now all snapshots will have a mutation_cleaner which they will use to
gently destroy freed partition_version objects.
Destruction of memtable entries during cache update is also using the
gentle cleaner now. We need to have a separate cleaner for memtable
objects even though they're owned by cache's region, because memtable
versions must be cleared without a cache_tracker.
Each memtable will have its own cleaner, which will be merged with the
cache's cleaner when memtable is merged into cache.
Fixes some sources of reactor stalls on cache update when there are
large partition entries in memtables.
Instead of destroying whole partition_versions at once, we will do that
gently using mutation_cleaner to avoid reactor stalls.
Large deletions could happen when large partition gets invalidated,
upgraded to a new schema, or when it's abandaned by a detached snapshot.
Refs #3289.
buffer_size() exposes the collective size of the external memory
consumed by the mutattion-fragments in the flat reader's buffer. This
provides a basis to build basic memory accounting on. Altought this is
not the entire memory consumption of any given reader it is the most
volatile component and usually by far the largest one too.
"
This series switches granularity of memory-pressure-induced eviction in cache
from a partition to a row.
Since 9b21a9b cache can store partial partitions with row granularity but they
were still evicted as a unit. This is problematic for the following reasons:
- more is evicted than necessary, which decreases cache efficiency. In the
worst case, whole cache gets evicted at once
- evicting large amounts of memory (large partitions) at once may impact
latency badly
Fixes#2576.
See the documentation added in patch titled "doc: Document row cache eviction"
for details on how eviction works.
Open issues to be fixed incrementally:
- range tombstones are not evictable
- cache update still has partition granularity, which
causes bad latency on memtable flush with large partitions
"
* tag 'tgrabiec/row-level-eviction-v3' of github.com:scylladb/seastar-dev: (43 commits)
doc: Document row cache eviction
tests: cache: Add tests for row-level eviction
tests: cache: Check that data is evictable after schema change
tests: cache: Move definitions to the top
tests: perf_cache_eviction: Switch eviction counter to row granularity
tests: row_cache_alloc_stress: Avoid quadratic behavior
cache: Introduce unlink_from_lru()
cache: Add row-level stats about cache update from memtable
mvcc: Propagate information if insertion happened from ensure_entry_if_complete()
cache: Track number of rows and row invalidations
cache: Evict with row granularity
cache: Track static row insertions separately from regular rows
tests: mvcc: Use apply_to_incomplete() to create versions
tests: mvcc: Fix test_apply_to_incomplete()
tests: cache: Do not depend on particular granularity of eviction
tests: cache: Make sure readers touch rows in test_eviction()
mvcc: Store complete rows in each version in evictable entries
mvcc: Introduce partition_snapshot_row_cursor::ensure_entry_in_latest()
tests: cache: Invoke partial eviction in test_concurrent_reads_and_eviction
cache: Ensure all evictable partition_versions have a dummy after all rows
...
Will be used in row_cache_alloc_stress to unlink partitions which we
don't want to get evicted, instead of reapeatedly calling touch() on
them after each subsequent population. After switching to row-level
LRU, doing so greatly increases run time of the test due to quadratic
behavior.
Instead of evicting whole partitions, evicts whole rows.
As part of this, invalidation of partition entries was changed to not
evict from snapshots right away, but unlink them and let them be
evicted by the reclaimer.
We will need to propagate a cache_tracker reference to evict(). Instead
of evicting from destructor, do so before cache_entry gets unlinked
from the tree. Entries which are not linked, don't need to be
explicitly evicted.
Commit 6ccd317 introduced a bug in partition_entry::evict() where a
partition entry may be partially evicted if there are non-evictable
snapshots in it. Partially evicting some of the versions may violate
consistency of a snapshot which includes evicted versions. For one,
continuity flags are interpreted realtive to the merged view, not
within a version, so evicting from some of the versions may mark
reanges as continuous when before they were discontinuous. Also, range
tombtsones of the snapshot are taken from all versions, so we can't
partially evict some of them without marking all affected ranges as
discontinuous.
The fix is to revert back to full eviciton, and avoid moving
non-evictable snapshots to cache. When moving whole partition entry to
cache, we first create a neutral empty partition entry and then merge
the memtable entry into it just like we would if the entry already
existed.
Fixes#3215.
Tests: unit (release)
Message-Id: <1518710592-21925-2-git-send-email-tgrabiec@scylladb.com>
We have had a quota of partitions to process in clear_gently /
update_cache, so that we don't overwork. However, with those things now
being in their own task group there is no harm in allowing it to run
until we reach a natural preemption point.
While we are at it, clear_gently did not check for need_preempt()
before, so this patch fixes it.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
We have moved clear_gently from using a seastar::thread's scheduling_group to
using the CPU scheduler's. However, update_cache was forgotten.
This patch fixes that and gets rid of the old group just in case.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
In the last patch, we enabled per-request timeouts, we enable timeouts
in fill_buffer. There are many places, though, in which we
fast_forward_to before we fill_buffer, so in order to make that
effective we need to propagate the timeouts to fast_forward_to as well.
In the same way as fill_buffer, we make the argument optional wherever
possible in the high level callers, making them mandatory in the
implementations.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
As part of the work to enable per-request timeouts, we enable timeouts
in fill_buffer.
The argument is made optional at the main classes, but mandatory in all
the ::impl versions. This way we'll make sure we didn't forget anything.
At this point we're still mostly passing that information around and
don't have any entity that will act on those timeouts. In the next patch
we will wire that up.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
compiler: gcc (GCC) 6.3.1 20161221 (Red Hat 6.3.1-1)
Problems introduced in f6a461c7a4
and 37b19ae6ba, respectively.
They both fail to compile due to use of method in lambda without
explicit mention of this. Some of failure is fixed by not using
auto in lambda parameter.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20171218222144.12297-1-raphaelsc@scylladb.com>