Replace stdx::optional and stdx::string_view with the C++ std
counterparts.
Some instances of boost::variant were also replaced with std::variant,
namely those that called seastar::visit.
Scylla now requires GCC 8 to compile.
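The change is mechanical; a minimal sketch (the variable names are
illustrative, not from the tree):

#include <optional>
#include <string_view>

// before: stdx::optional<int> x = stdx::nullopt;
//         stdx::string_view v = "foo";
std::optional<int> x = std::nullopt;
std::string_view v = "foo";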
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20190108111141.5369-1-duarte@scylladb.com>
The single-partition reader is less expensive than the one that accepts
an arbitrary range of partitions, but it doesn't properly support
fast-forwarding to another partition range and therefore cannot be used
if that option is enabled.
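A hedged sketch of the resulting selection logic (the helper functions
are hypothetical, not the tree's):

flat_mutation_reader make_best_reader(schema_ptr s,
                                      const dht::partition_range& range,
                                      mutation_reader::forwarding fwd_mr) {
    // The cheap single-partition reader is only safe when fast-forwarding
    // between partition ranges is disabled.
    if (range.is_singular() && fwd_mr == mutation_reader::forwarding::no) {
        return make_single_partition_reader(s, range);  // hypothetical
    }
    return make_range_reader(s, range, fwd_mr);         // hypothetical
}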
After the new in-memory representation of cells was introduced there was
a regression in atomic_cell_or_collection::operator<< which stopped
printing the content of the cell. This made debugging more inconvenient
and time-consuming. This patch fixes the problem: the schema is
propagated to the atomic_cell_or_collection printer and the full content
of the cell is printed.
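The pattern, as a generic illustration (the types are hypothetical
stand-ins for the tree's):

#include <cstddef>
#include <iostream>
#include <vector>

struct cell { std::vector<std::byte> data; };

// The printer bundles the decoding context (standing in for the schema's
// column type) with the value, so operator<< can print the full content.
struct cell_printer {
    const char* type_name;
    const cell& c;
};

std::ostream& operator<<(std::ostream& os, const cell_printer& p) {
    return os << p.type_name << " cell, " << p.c.data.size() << " bytes";
}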
Fixes #3571.
Message-Id: <20181024095413.10736-1-pdziepak@scylladb.com>
Currently the timeout is opt-in, that is, all methods that even have it
default it to `db::no_timeout`. This means that ensuring the timeout is
used where it should be is completely up to the author and the reviewers
of the code. As humans are notoriously prone to mistakes, this has
resulted in very inconsistent usage of the timeout, with many clients of
`flat_mutation_reader` passing the timeout only to some members and only
at certain call sites. This is hardly surprising considering that some core
operations like `operator()()` only recently received a timeout
parameter and others like `peek()` didn't even have one until this
patch. Both of these methods call `fill_buffer()` which potentially
talks to the lower layers and is supposed to propagate the timeout.
All this makes the `flat_mutation_reader`'s timeout effectively useless.
To bring order to this chaos, make the timeout parameter mandatory on
all `flat_mutation_reader` methods that need it. This ensures that
humans now get a reminder from the compiler when they forget to pass the
timeout. Clients can still opt out of passing a timeout by passing
`db::no_timeout` (the previous default value), but this is now explicit
and developers should think before typing it.
There were surprisingly few core call sites to fix up. Where a timeout
was available nearby I propagated it to be able to pass it to the
reader; where I couldn't, I passed `db::no_timeout`. Authors of the
latter kind of code (view, streaming and repair are some of the notable
examples) should consider propagating down a timeout if needed.
In the test code (the vast majority of the changes) I just used
`db::no_timeout` everywhere.
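For illustration, a call site after this change could look like the
following sketch (the consuming loop is hypothetical):

future<> consume_all(flat_mutation_reader& reader,
                     db::timeout_clock::time_point timeout) {
    return repeat([&reader, timeout] {
        // operator()() no longer compiles without a timeout argument
        return reader(timeout).then([] (mutation_fragment_opt mf) {
            return stop_iteration(!mf);
        });
    });
}
// Opting out remains possible, but is now a visible decision:
//   consume_all(reader, db::no_timeout);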
Tests: unit(release, debug)
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <1edc10802d5eb23de8af28c9f48b8d3be0f1a468.1536744563.git.bdenes@scylladb.com>
boost::intrusive::set::insert() may throw if keys require
linearization and that fails, in which case we will leak the entry.
When this happens in cache, we will also violate the invariant for
entry eviction, which assumes all tracked entries are linked, and
cause a SEGFAULT.
Use the non-throwing and faster insert_before() instead. Where we
can't use insert_before(), use alloc_strategy_unique_ptr<> to ensure
that the entry is deallocated on insert failure.
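The ownership pattern, illustrated as a self-contained sketch with
std::unique_ptr standing in for alloc_strategy_unique_ptr<>:

#include <boost/intrusive/set.hpp>
#include <memory>

namespace bi = boost::intrusive;

struct entry : bi::set_base_hook<> {
    int key;
    explicit entry(int k) : key(k) {}
    friend bool operator<(const entry& a, const entry& b) {
        return a.key < b.key;
    }
};

void insert_guarded(bi::set<entry>& s, int key) {
    auto e = std::make_unique<entry>(key);
    s.insert(*e);  // if this throws, e still owns the entry: no leak
    e.release();   // success: the set now links the entry
}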
Fixes #3585.
The worker is responsible for merging MVCC snapshots, which is similar
to merging sstables, but in memory. The new scheduling group will
therefore be called "memory compaction".
We should run it in a separate scheduling group instead of
main/memtables, so that it doesn't disrupt writes and other system
activities. It's also nice for monitoring how much CPU time we spend
on this.
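A minimal sketch of the setup, assuming the seastar scheduling API (the
share count is illustrative):

#include <seastar/core/future.hh>
#include <seastar/core/scheduling.hh>

seastar::future<> setup_memory_compaction() {
    return seastar::create_scheduling_group("memory_compaction", 200)
            .then([] (seastar::scheduling_group sg) {
        // snapshot-merging work then runs under sg, e.g. via
        // seastar::with_scheduling_group(sg, [] { /* merge */ });
        return seastar::make_ready_future<>();
    });
}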
If the memtable snapshot goes away after the memtable has started
merging into cache, it would enqueue the snapshots for cleaning on the
memtable's cleaner, which will have to clean without deferring when the
memtable is destroyed. That may stall the reactor. To avoid this, make
merge() cause the old instance of the cleaner to redirect to the new
instance (owned by cache), like we do for regions. This way the
snapshots mentioned earlier can be cleaned gracefully after the memtable
is destroyed.
Before this patch, maybe_merge_versions() had to be called manually
before a partition snapshot went away. That is error-prone and makes
client code more complicated. Delegate that task to a new
partition_snapshot_ptr object, through which all snapshots are
published now.
Now all snapshots will have a mutation_cleaner which they will use to
gently destroy freed partition_version objects.
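A simplified sketch of the handle (not the tree's exact code):

// The owning handle performs the merge step on destruction, so callers
// cannot forget it.
class partition_snapshot_ptr {
    lw_shared_ptr<partition_snapshot> _snp;
public:
    explicit partition_snapshot_ptr(lw_shared_ptr<partition_snapshot> snp)
        : _snp(std::move(snp)) {}
    partition_snapshot_ptr(partition_snapshot_ptr&&) noexcept = default;
    ~partition_snapshot_ptr() {
        if (_snp) {
            _snp->maybe_merge_versions();  // previously a manual call
        }
    }
    partition_snapshot* operator->() const { return _snp.get(); }
};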
Destruction of memtable entries during cache update is also using the
gentle cleaner now. We need to have a separate cleaner for memtable
objects even though they're owned by cache's region, because memtable
versions must be cleared without a cache_tracker.
Each memtable will have its own cleaner, which will be merged with the
cache's cleaner when memtable is merged into cache.
Fixes some sources of reactor stalls on cache update when there are
large partition entries in memtables.
Partitions can get very large. Destroying them all at once can stall
the reactor for a significant amount of time. We want to avoid that by
doing destruction incrementally, deferring in between. A new API is
added for that at various levels:
stop_iteration clear_gently() noexcept;
It returns stop_iteration::yes when the object is fully cleared and
can now be destroyed quickly. So a deferring destruction can look like
this:
return repeat([this] { return clear_gently(); });
The reason why clear_gently() doesn't return a future<> itself is that some
contexts cannot defer, like memory reclamation.
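For example, a clear_gently() for a node-based container could look
like this sketch (batch size and types are illustrative):

stop_iteration partition_rows::clear_gently() noexcept {
    static constexpr int batch = 64;
    for (int i = 0; i < batch && !_rows.empty(); ++i) {
        _rows.pop_front();  // frees one row
    }
    return _rows.empty() ? stop_iteration::yes : stop_iteration::no;
}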
We keep track of all updates and store the minimum values of timestamps,
TTLs and local deletion times across all the inserted data.
These values are written as a part of serialization_header for
Statistics.db and used for delta-encoding values when writing Data.db
file in SSTables 3.0 (mc) format.
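A hedged sketch of the bookkeeping (the field names only approximate
the tree's; api::timestamp_type is the tree's timestamp type):

#include <algorithm>
#include <cstdint>
#include <limits>

struct encoding_stats_collector {
    api::timestamp_type min_timestamp = api::max_timestamp;
    uint32_t min_ttl = std::numeric_limits<uint32_t>::max();
    uint32_t min_local_deletion_time = std::numeric_limits<uint32_t>::max();

    void update(api::timestamp_type ts, uint32_t ttl, uint32_t ldt) {
        min_timestamp = std::min(min_timestamp, ts);
        min_ttl = std::min(min_ttl, ttl);
        min_local_deletion_time = std::min(min_local_deletion_time, ldt);
    }
};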
For #1969.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
Previously it was also possible to pass a frozen_mutation to it.
Now we deserialize frozen mutations at the call site.
This is a pre-requisite for collecting memtable statistics needed for
writing into the SSTables 3.0 format.
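A minimal sketch of the new calling convention, assuming
frozen_mutation's unfreeze() helper:

void apply_from_wire(memtable& mt, const frozen_mutation& fm, schema_ptr s) {
    // deserialize at the call site; memtable::apply() takes a mutation
    mt.apply(fm.unfreeze(std::move(s)));
}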
For #1969.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
buffer_size() exposes the collective size of the external memory
consumed by the mutation fragments in the flat reader's buffer. This
provides a basis on which to build basic memory accounting. Although
this is not the entire memory consumption of any given reader, it is the
most volatile component and usually by far the largest one too.
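A hedged sketch of the accounting, assuming a memory_usage() accessor
on mutation_fragment (the tree may track this incrementally instead):

#include <deque>

size_t buffer_size(const std::deque<mutation_fragment>& buffer) {
    size_t total = 0;
    for (const mutation_fragment& mf : buffer) {
        total += mf.memory_usage();  // external memory held by the fragment
    }
    return total;
}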
In this patchset I am resubmitting Avi's enablement of the CPU scheduler
on his behalf. I've done a ton of testing on the series and there are
some improvements / changes that I had previously sent as a separate series.
What you see here is the result of merging that work.
After this patchset is applied, workloads are smoother and we are able to
uphold the pre-defined shares among the various actors.
We also finally have everything we need to merge the CPU and I/O controllers.
With that done, the code is much simpler. But also, as a bonus,
controllers that were previously available for I/O only (compactions) are
enabled for CPU as well.
* git@github.com:glommer/scylla.git cpusched-v7:
Avi Kivity (4):
database, sstables, compaction: convert use of thread_scheduling_group
to seastar cpu scheduler
memtable, database: make memtable::clear_gently() inherit
scheduling_group
config: mark background_writer_scheduling_quota as Unused
database: place data_query execution stage into scheduling_group
Glauber Costa (9):
database, main: set up scheduling_groups for our main tasks
row_cache: actually use the scheduling group for update_cache
allow update_cache and clear_gently to use the entire task quota.
database: remove cpu_flush_quota metric
controllers: retire auto_adjust_flush_quota
controllers: allow memtable I/O controller to have shares statically
set
controllers: update control points for memtable I/O controller
controllers: allow a static priority to override the controller output
controllers: unify the I/O and CPU controllers
We have had a quota of partitions to process in clear_gently /
update_cache, so that we don't do too much work at once. However, with
those things now being in their own task group there is no harm in
allowing them to run until we reach a natural preemption point.
While we are at it, clear_gently did not check for need_preempt()
before, so this patch fixes it.
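A minimal sketch of the new loop shape (the helpers are hypothetical):

stop_iteration cache_updater::do_update() {
    while (!_pending.empty()) {
        process_one_partition();        // hypothetical helper
        if (seastar::need_preempt()) {
            return stop_iteration::no;  // yield; resume in a later task
        }
    }
    return stop_iteration::yes;
}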
Signed-off-by: Glauber Costa <glauber@scylladb.com>
When digest is requested, pre-calculate the cell's hash. A downside of
this approach is that more work will be done when there are multiple
versions of a row that contain values for the same cell, but we expect
these cases to be rare and the upside of caching the cell's hash to more
than compensate for the extra work.
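A simplified sketch of the caching (names only approximate):

struct cell_and_hash {
    atomic_cell_or_collection cell;
    mutable std::optional<bytes> hash;  // filled on first digest request
};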
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Retryable code that has side effects is a recipe for bugs. This patch
reworks the snapshot reader so that the amount of logic run with
reclamation disabled is minimal and has very limited side effects.
In the last patch we enabled per-request timeouts in fill_buffer. There
are many places, though, in which we
fast_forward_to before we fill_buffer, so in order to make that
effective we need to propagate the timeouts to fast_forward_to as well.
As with fill_buffer, we make the argument optional wherever possible in
the high-level callers, while making it mandatory in the
implementations.
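A sketch of a call site after this change (the range and flow are
illustrative):

future<> skip_to(flat_mutation_reader& reader,
                 const dht::partition_range& pr,
                 db::timeout_clock::time_point timeout) {
    return reader.fast_forward_to(pr, timeout).then([&reader, timeout] {
        return reader.fill_buffer(timeout);
    });
}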
Signed-off-by: Glauber Costa <glauber@scylladb.com>
As part of the work to enable per-request timeouts, we enable timeouts
in fill_buffer.
The argument is made optional in the main classes, but mandatory in all
the ::impl versions. This way we'll make sure we didn't forget anything.
At this point we're still mostly passing that information around and
don't have any entity that will act on those timeouts. In the next patch
we will wire that up.
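The shape of the change, as a simplified sketch:

class reader {
public:
    // public entry point: the timeout is optional
    future<> fill_buffer(db::timeout_clock::time_point timeout = db::no_timeout) {
        return do_fill_buffer(timeout);
    }
private:
    // impl hook: the timeout is mandatory, so implementations can't forget it
    virtual future<> do_fill_buffer(db::timeout_clock::time_point timeout) = 0;
};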
Signed-off-by: Glauber Costa <glauber@scylladb.com>
The wrapper is no longer needed because
read_range_rows returns ::mutation_reader instead of
sstables::mutation_reader and the reader returned from
it keeps a pointer to the shared_sstable that was used to
create it.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Since we started accounting virtual dirty memory we no longer have a cap
on real dirty memory. In most situations that is not needed, since real
dirty will just be at most twice as much as virtual dirty (current
flushing memtable plus new memtable).
However, due to things like cache updates and component flushing we can
end up having a lot of memtables that are virtually freed but not yet
fully released, leading real dirty memory to explode and use up all of
the box's memory.
This patch adds a cap on real dirty memory as well. Because of the
hierarchical nature of region_group, if the parent blocks due to memory
depletion, so will the child (virtual dirty region group).
After that is done, we need to make sure that dirty memory is not seen
as freed until the cache update is done. Until a particular partition is
moved to the cache it is not evictable. As a result we can OOM the
system if we have a lot of pending cache updates as the writes will not
be throttled and memory won't be made available.
This patch pins the memory used by the region as real dirty before the
cache update starts, and unpins it when it is over. In the meantime it
gradually releases memory of the partitions that are being moved to
cache.
I have verified in a couple of workloads that the amount of memory
accounted through this is the same amount of memory accounted through
the memtable flush procedure.
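A hedged sketch of the pinning (the manager's method names are
hypothetical):

class real_dirty_pin {
    dirty_memory_manager& _mgr;
    size_t _pinned;
public:
    real_dirty_pin(dirty_memory_manager& mgr, size_t bytes)
        : _mgr(mgr), _pinned(bytes) {
        _mgr.pin_real_dirty_memory(_pinned);        // before the update
    }
    void release(size_t bytes) {                    // a partition moved to cache
        bytes = std::min(bytes, _pinned);
        _mgr.unpin_real_dirty_memory(bytes);
        _pinned -= bytes;
    }
    ~real_dirty_pin() {
        _mgr.unpin_real_dirty_memory(_pinned);      // the update is over
    }
};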
Fixes #1942
* git@github.com:glommer/scylla.git glommer/update-cache-v4:
row_cache: modernize use of seastar threads
mutation_partition: estimate size of partition
memtable: factor out calculation of memtable_entry memory size
memtable: add a method to export memtable's dirty memory manager
dirty_memory_manager: block if we hit the real dirty limit
dirty_memory_manager: add functions to manipulate real dirty
partition: add method to calculate memory size of a partition
row cache: pin real dirty during cache updates.
The total size is the sum of two components. Add a method that
does that sum so this code gets easier to reuse.
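The helper is trivial; a sketch with illustrative member names:

size_t memtable_entry::size_in_allocator() const {
    return _key_memory + _partition_memory;  // the two components
}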
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Rename flat_mutation_reader_from_mutation to
flat_mutation_reader_from_mutations.
Make it work with std::vector<mutation> instead of a single
mutation.
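A sketch of a test-side call after the rename (m1 and m2 are
hypothetical mutations):

flat_mutation_reader example_reader(mutation m1, mutation m2) {
    std::vector<mutation> ms;
    ms.push_back(std::move(m1));
    ms.push_back(std::move(m2));
    return flat_mutation_reader_from_mutations(std::move(ms));
}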
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
This method creates a flat_mutation_reader
instead of mutation_reader. All users will be gradually
converted to the new interface. make_reader is implemented
using make_flat_reader and will be removed once all users
are migrated.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
query::full_slice doesn't select any regular or static columns, which
is at odds with the expectations of its users. This patch replaces it
with the schema::full_slice() version.
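A sketch of the substitution at a call site (the surrounding function
is illustrative):

mutation_reader make_full_reader(column_family& cf, schema_ptr s,
                                 const dht::partition_range& range) {
    // before: cf.make_reader(s, range, query::full_slice);
    return cf.make_reader(s, range, s->full_slice());
}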
Refs #2885
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1507732800-9448-2-git-send-email-duarte@scylladb.com>
"This changeset is the first step to flatten mutation_reader.
Then it introduces new mutation_fragment types for partition header and end of partition.
Using those a new flat_mutation_reader is defined.
Finally it introduces converters between new flat_mutation_reader and
old mutation_reader."
* 'haaawk/flattened_mutation_reader_v12' of github.com:scylladb/seastar-dev:
Add tests for flat_mutation_reader
Introduce conversion from flat_mutation_reader to mutation_reader
Introduce conversion from mutation_reader to flat_mutation_reader
Introduce flat_mutation_reader
Extract FlattenedConsumer concept using GCC6_CONCEPT
Introduce partition_end mutation_fragment
Introduce a position for end of partition
Introduce partition_start mutation_fragment
Introduce FragmentConsumer
Introduce a position for partition start
streamed_mutation: Extract concepts using GCC6_CONCEPT macro
This type of mutation_fragment will be used in new mutation_reader
to signal the end of the current partition.
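For illustration, the fragment order a consumer now sees per partition
(the consumer shape is hypothetical):

// partition_start, (static_row)?, (clustering_row | range_tombstone)*,
// partition_end
void consume(mutation_fragment mf) {
    if (mf.is_end_of_partition()) {
        // the current partition is complete; the next fragment, if any,
        // starts a new partition
    }
}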
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>