Since we started accounting virtual dirty memory we no longer have a cap
on real dirty memory. In most situations that is not needed, since real
dirty will just be at most twice as much as virtual dirty (current
flushing memtable plus new memtable).
However, due to things like cache updates and component flushing we can
end up having a lot of memtables that are virtually freed but not yet
fully released, leading real dirty memory to explode using all the box'
memory.
This patch adds a cap on real dirty memory as well. Because of the
hierarchical nature of region_group, if the parent blocks due to memory
depletion, so will the child (virtual dirty region group).
After that is done, we need to make sure that dirty memory is not seen
as freed until the cache update is done. Until a particular partition is
moved to the cache it is not evictable. As a result we can OOM the
system if we have a lot of pending cache updates as the writes will not
be throttled and memory won't be made available.
This patch pins the memory used by the region as real dirty before the
cache update starts, and unpins it when it is over. In the mean time it
gradually releases memory of the partitions that are being moved to
cache.
I have verified in a couple of workloads that the amount of memory
accounted through this is the same amount of memory accounted through
the memtable flush procedure.
Fixes#1942
* git@github.com:glommer/scylla.git glommer/update-cache-v4:
row_cache: modernize use of seastar threads
mutation_partition: estimate size of partition
memtable: factor out calculation of memtable_entry memory size
memtable: add a method to export memtable's dirty memory manager
dirty_memory_manager: block if we hit the real dirty limit
dirty_memory_manager: add functions to manipulate real dirty
partition: add method to calculate memory size of a partition
row cache: pin real dirty during cache updates.
This patch fixes 'DROP INDEX' CQL statement to also drop the underlying
index view automatically so that we don't leave unused materialized
views behind.
Message-Id: <1510303421-15945-1-git-send-email-penberg@scylladb.com>
Right now, once a region is moved to the cache is no longer visible to
the dirty memory system. Not as real dirty nor virtual dirty.
The problem is that until a particular partition is moved to the cache
it is not evictable. As a result we can OOM the system if we have a lot
of pending cache updates as the writes will not be throttled and memory
won't be made available.
This patch pins the memory used by the region as real dirty before the
cache update starts, and unpins it when it is over. In the mean time it
gradually releases memory of the partitions that are being moved to
cache.
I have verified in a couple of workloads that the amount of memory
accounted through this is the same amount of memory accounted through
the memtable flush procedure.
Fixes#1942
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Once that is added, also add a method to a memtable entry to calculate
the entire size of a memtable entry. Right now we only have one method
to calculate the size minus rows.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
There are times in which we want to add and remove real dirty memory
without impacting virtual dirty. One such example is the cache update
process, where real dirty is the limiting factor.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Since we started accounting virtual dirty memory we no longer have a cap
on real dirty memory. In most situations that is not needed, since real
dirty will just be at most twice as much as virtual dirty (current
flushing memtable plus new memtable).
However, due to things like cache updates and component flushing we can
end up having a lot of memtables that are virtually freed but not yet
fully released, leading real dirty memory to explode using all the box'
memory.
This patch adds a cap on real dirty memory as well. Because of the
hierarchical nature of region_group, if the parent blocks due to memory
depletion, so will the child (virtual dirty region group).
A next step is to add a controller that will increase the priority of
the tasks involving in releasing real dirty memory if we get dangerously
close to the threshold.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
The total size is the sum of two components. Add a method that
does that sum so this code gets easier to reuse.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
In the memtable flusher, we account for the size of a partition as we
read them. However, there are other points in the architecture where we
would like to calculate the size of a partition in a point in which we
are not reading it. One such example is the cache update process.
This patch enhances the mutation_partition adding a method that returns
the total size for this partition.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
For a while now we have an async() function, that simplifies the code by not
needing to issue an explicit join. This patch converts the row cache to use
async() as well, which most of our code already does. Doing so will make
it easier to make changes to update_cache.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
"Implement flat_mutation_reader::consume and add tests for it.
For that implement flat_mutation_reader_from_mutations and
read_mutation_from_flat_mutation_reader."
* 'haaawk/flat_reader_consume_v3' of github.com:scylladb/seastar-dev:
Add tests for flat_mutation_reader::consume
Add tests for flat_mutation_reader utils
Introduce read_mutation_from_flat_mutation_reader
Make mutation_rebuilder streamed_mutation independent
flat_mutation_reader_from_mutation: support multiple mutations
Introduce flat_mutation_reader::consume
Move FlattenedConsumer concept to flat_mutation_reader.hh
"This patchset introduces memtable::make_flat_reader that returns
flat_mutation_reader and converts internal memtable readers into
flat_mutation_readers.
It also introduces some utility methods like make_forwardable and
make_partition_snapshot_flat_reader."
* 'haaawk/flat_reader_memtable_v4' of github.com:scylladb/seastar-dev:
Turn scanning_reader into flat_mutation_reader
Change memtable_entry::read to return flat_mutation_reader
Make iterator_reader independent from mutation_reader
Introduce make_partition_snapshot_flat_reader
Prepare partition_snapshot_flat_reader
Introduce flat_mutation_reader_from_mutation
Prepare flat_mutation_reader_from_mutation
Introduce make_forwardable
Prepare make_forwardable for flat_mutation_reader
Introduce empty_flat_reader
memtable: Introduce make_flat_reader
mutation_rebuilder will be used not only with streamed_mutations
but also with flat_mutation_readers so it's better for it to be
independent from streamed_mutation.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Rename flat_mutation_reader_from_mutation to
flat_mutation_reader_from_mutations.
Make it work with std::vector<mutation> instead of a single
mutation.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
This concept will be used both in flat_mutation_reader.hh
and mutation_reader.hh. mutation_reader.hh includes
flat_mutation_reader.hh so we have to move the concept to
make it accessible in both files.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Make it possible for a mutation_source to be created both for sources
that use old mutation_reader and new flat_mutation_reader.
Add tests for flat_mutation_reader::next_partition to run_mutation_source_tests.
* seastar-dev.git 'dev/haaawk/flat_reader_mutation_source_v3':
Remove mutation_reader.hh dependency from flat_mutation_reader.hh
Prepare mutation_source for more than one implementation
Add flat reader mutation source implementation
Add mutation_source::make_flat_mutation_reader
Use mutation_source::make_flat_mutation_reader in tests
Add flat_mutation_reader_assertions
Add test for flat_mutation_reader::next_partition
This commit creates a copy of partition_snapshot_reader
and names it partition_snapshot_flat_reader.
This new class will be turned into a flat_mutation_reader
in the next commit.
The purpose of this commit is to make it easier to review the next
commit.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
This is a utility method that will be handy in conversion
from mutation_reader to flat_mutation_reader.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
This commit copies streamed_mutation_from_mutation
from streamed_mutation to flat_mutation_reader
and renames it to streamed_mutation_from_mutation_copy.
This copy will be used as a base for
flat_mutation_reader_from_mutation.
The purpose of this commit is to make it easier to review the next
commit.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
It will add the ability to fast_forward_to on position_range
to flat_mutation_reader that does not have this ability.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
This commit copies make_forwardable from streamed_mutation
to flat_mutation_reader and renames it to make_forwardable_copy.
This copy will be used as a base for make_forwardable implementation
for flat_mutation_reader.
The purpose of this commit is to make it easier to review the next
commit.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
This method creates a flat_mutation_reader
instead of mutation_reader. All users will be gradually
converted to the new interface. make_reader is implemented
using make_flat_reader and will be removed once all users
are migrated.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
This will be used as an intermediate state of migration
from mutation_reader to flat_mutation_reader.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
There will be a second implementation that will be used by
sources that are converted to flat_mutation_reader.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
It's not needed and causes cyclic dependency when we need
flat_mutation_reader in mutation_source.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
We don't support non-PK restrictions correctly as explained in commit
3c90607 ("tests/cql_query_test: Fix view creation in
test_duration_restrictions()") and Apache Cassandra doesn't support them
for MVs either. Change some test cases to not rely on them.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20171107165138.3176-1-duarte@scylladb.com>
f44131226a introduced a regression where for some verbs we would
return partitions in their natural sort order, but since thrift
partition ranges can wrap-around, what we need to preserve is query
order.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20171103201118.18175-1-duarte@scylladb.com>
Fixes#2938.
* 'tgrabiec/fix-range-tombstone-list-exception-safety-v1' of github.com:scylladb/seastar-dev:
tests: range_tombstone_list: Add test for exception safety of apply()
tests: Introduce range_tombstone_list assertions
cache: Make range tombstone merging exception-safe
range_tombstone_list: Introduce apply_monotonically()
range_tombstone_list: Make reverter::erase() exception-safe
range_tombstone_list: Fix memory leaks in case of bad_alloc
mutation_partition: Fix abort in case range tombstone copying fails
managed_bytes: Declare copy constructor as allocation point
Integrate with allocation failure injection framework
We don't support non-PK restrictions correctly as explained in commit
3c90607 ("tests/cql_query_test: Fix view creation in
test_duration_restrictions()") and Apache Cassandra doesn't support them
for MVs either. Disable the tests, but don't remove them because they
will be resurrected once CASSANDRA-13832 is fixed.
Message-Id: <1510052422-3478-1-git-send-email-penberg@scylladb.com>