Commit Graph

552 Commits

Author SHA1 Message Date
Avi Kivity
2c9b886b6d logalloc: reindent
No functional changes.
Message-Id: <20180731125116.32009-1-avi@scylladb.com>
2018-08-01 00:35:54 +01:00
Avi Kivity
0fc54aab98 logalloc: run releaser() in user-provided scheduling group
Let the user specify which scheduling group should run the
releaser, since it is running functions on the user's behalf.

Perhaps a cleaner interface is to require the user to call
a long-running function for the releaser, and so we'd just
inherit its scheduling group, but that's a much bigger change.
2018-07-31 11:57:58 +03:00
Paweł Dziepak
b485deb124 utils: uuid: don't use std::random_device()
std::random_device() is extremely slow. This patch modifies
make_rand_uuid() so that it requires only two invocations of the PRNG.
2018-07-26 12:02:32 +01:00
Avi Kivity
761931659a Merge "Do not linearise incoming CQL3 requests" from Paweł
"
This series changes the native CQL3 protocol layer so that it works with
fragmented buffers instead of a single temporary_buffer per request.
The main part is fragmented_temporary_buffer which represents a
fragmented buffer consisting of multiple temporary_buffers. It provides
helpers for reading a fragmented buffer from an input_stream, interpreting
the data in the fragmented buffer, as well as views that satisfy the
FragmentRange concept.

There are still situations where a fragmented buffer is linearised. That
includes decompressing client requests (this uses reusable buffers in a
similar way to the code that sends compressed responses), CQL statement
restrictions and values that are hard-coded in prepared statements
(hopefully, the values in those cases will be small), value validation
in some cases (blobs are not validated, irrelevant for many fixed-size
small types, but may be a problem for large text cells) as well as
operations on collections.

Tests: unit(release), dtests(cql_prepared_test.py, cql_tests.py, cql_additional_tests.py)
"

* tag 'fragmented-cql3-receive/v1' of https://github.com/pdziepak/scylla: (23 commits)
  types: bytes_view: override fragmented validate()
  cql3: value_view: switch to fragmented_temporary_buffer::view
  types: add validate that accepts fragmented_temporary_buffer::view
  cql3 query_options: add linearize()
  cql3: query_options: use bytes_ostream for temporaries
  cql3: operation: make make_cell accept fragmented_temporary_buffer::view
  atomic_cell: accept fragmented_temporary_buffer::view values
  cql3: avoid ambiguity in a call to update_parameters::make_cell()
  transport: switch to fragmented_temporary_buffer
  transport: extract compression buffers from response class
  tests/reusable_buffer: test fragmented_temporary_buffer support
  utils: reusable_buffer: support fragmented_temporary_buffer
  tests: add test for fragmented_temporary_buffer
  util fragment_range: add general linearisation functions
  utils: add fragmented_temporary_buffer
  tests: add basic test for transport requests and responses
  tests/random-utils: print seed
  tests/random-utils: generate sstrings
  cql3: add value_view printer and equality comparison
  transport: move response outside of cql_server class
  ...
2018-07-22 19:40:37 +03:00
Vladimir Krivopalov
79c2f0095c utils: Add overloaded_functor helper.
The overloaded_functor class template can be used to encompass multiple
lambdas accepting different types into a single callable object that can
be used with any of those types.

One application is visitors for std::variant where different handling is
required for different types.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-07-20 13:50:17 -07:00
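The pattern described in the commit above is commonly written as follows. This is a hedged sketch: the name overloaded_functor comes from the commit, but the body is a reconstruction of the standard C++17 idiom and the helper function describe() is invented for illustration.

```cpp
#include <cassert>
#include <string>
#include <variant>

// Inherit from each lambda and pull all their call operators into one
// overload set (C++17 pack expansion of using-declarations).
template <typename... Ts>
struct overloaded_functor : Ts... {
    using Ts::operator()...;
};
// Deduction guide so the lambdas can be listed without template arguments.
template <typename... Ts>
overloaded_functor(Ts...) -> overloaded_functor<Ts...>;

// Example: a visitor over std::variant, one lambda per alternative.
inline std::string describe(const std::variant<int, std::string>& v) {
    return std::visit(overloaded_functor{
        [] (int)                { return std::string("int"); },
        [] (const std::string&) { return std::string("string"); }
    }, v);
}
```

Each lambda handles one alternative of the variant, so std::visit dispatches to the matching overload.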
Paweł Dziepak
32ba47fb87 utils: reusable_buffer: support fragmented_temporary_buffer
reusable_buffer already supports bytes_ostream which is often used for
handling data sent from Scylla. This patch adds support for
fragmented_temporary_buffer which is going to be mainly used for data
received by Scylla.
2018-07-18 12:28:06 +01:00
Paweł Dziepak
b152aafd67 util fragment_range: add general linearisation functions
All FragmentRange implementations can be linearised in the same way, so
let's provide linearized() and with_linearized() functions for all of
them.
2018-07-18 12:28:06 +01:00
Paweł Dziepak
fc484f0819 utils: add fragmented_temporary_buffer
Seastar output_streams produce temporary_buffer<char>s.
fragmented_temporary_buffer represents a single fragmented buffer that
consists of, possibly multiple, temporary_buffer<char>s.
2018-07-18 12:28:06 +01:00
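As a rough illustration of the idea, here is a toy model of such a buffer. toy_fragmented_buffer and its members are invented for this sketch; the real fragmented_temporary_buffer stores temporary_buffer<char> fragments and provides non-copying views, which are omitted here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// A buffer made of several non-contiguous fragments.
struct toy_fragmented_buffer {
    std::vector<std::string> fragments;

    std::size_t size_bytes() const {
        std::size_t n = 0;
        for (const auto& f : fragments) {
            n += f.size();
        }
        return n;
    }

    // Copies all fragments into one contiguous buffer -- the operation
    // the series above tries to avoid on the hot path.
    std::string linearize() const {
        std::string out;
        out.reserve(size_bytes());
        for (const auto& f : fragments) {
            out += f;
        }
        return out;
    }
};
```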
Tomasz Grabiec
612b223819 managed_bytes: Mark read_linearize() as an allocation point 2018-07-17 16:39:43 +02:00
Tomasz Grabiec
d94c7c07a3 lsa: Disable alloc failure injector inside the LSA sanitizer
Message-Id: <1531814822-30259-1-git-send-email-tgrabiec@scylladb.com>
2018-07-17 11:27:56 +01:00
Duarte Nunes
63b63b0461 utils/loading_cache: Avoid using invalidated iterators
When periodically reloading the values in the loading_cache, we would
iterate over the list of entries and call the load() function for
those which need to be reloaded.

For some concrete caches, load() can remove the entry from the LRU set,
and can be executed inline from the parallel_for_each(). This means we
could potentially keep iterating using an invalidated iterator.

Fix this by using a temporary container to hold those entries to be
reloaded.

Spotted when reading the code.

Also use if constexpr and fix the comment in the function containing
the changes.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20180712124143.13638-1-duarte@scylladb.com>
2018-07-12 13:59:09 +01:00
Botond Dénes
2e7bf9c6f9 loading_cache::reload(): obtain key before calling _load()
The continuation attached to _load() needs the key of the loaded entry
to check whether it was disposed during the load. However, if _load()
invalidates the entry, the continuation's capture will access
invalid memory while trying to obtain the key.
To avoid this save a copy of the key before calling _load() and pass it
to both _load() and the continuation.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <b571b73076ca863690f907fbd3fb4ff54e597b28.1531393608.git.bdenes@scylladb.com>
2018-07-12 13:42:42 +01:00
Avi Kivity
a4a2f743a8 Merge "Avoid large allocations when reading sstable index pages" from Tomasz
"
If there are a lot of partitions in the index page, index_list may grow large
and require large contiguous blocks of memory, because it's based on
std::vector. That puts pressure on the memory allocator, and if memory is
fragmented, may not be possible to satisfy without a lot of eviction. Switch
to chunked_vector to avoid this.

Refs #3597
"

* 'tgrabiec/avoid-large-alloc-in-index-reader' of github.com:tgrabiec/scylla:
  sstables: Switch index_list to chunked_vector to avoid large allocations
  utils: chunked_vector: Do not require T to be default-constructible for clear()
  utils: chunked_vector: Implement front()
2018-07-12 15:30:18 +03:00
Duarte Nunes
1fb3b924f4 utils/loading_cache: Remove superfluous continuation
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20180712122031.13424-1-duarte@scylladb.com>
2018-07-12 15:22:35 +03:00
Tomasz Grabiec
b0f5df10d2 utils: chunked_vector: Do not require T to be default-constructible for clear()
resize(), used by clear(), requires T to be default-constructible in
case the vector is expanded. It's not actually needed for clearing,
and there will be users which use clear() with
non-default-constructible T, so implement clear() without using
resize().
2018-07-11 16:55:20 +02:00
Tomasz Grabiec
03832dab97 utils: chunked_vector: Implement front()
std::vector<> has it, so should this, for easy migration.
2018-07-11 16:55:20 +02:00
Avi Kivity
28621066e6 observable: allow an observable to disconnect() twice without penalty
Message-Id: <20180711070754.13286-1-avi@scylladb.com>
2018-07-11 10:15:01 +01:00
Avi Kivity
1895483781 observable: add comments explaining the purpose and use of the mechanism
Message-Id: <20180710133706.8791-1-avi@scylladb.com>
2018-07-11 10:15:01 +01:00
Avi Kivity
7db394ce50 observable: switch to noncopyable_function
std::function's move constructor is not noexcept, so observer's move
constructor and assignment operator also cannot be. Switch to Seastar's
noncopyable_function which provides better guarantees.

Tests: observer_tests (release)
Message-Id: <20180710073628.30702-1-avi@scylladb.com>
2018-07-10 09:42:49 +01:00
Avi Kivity
96737d140f utils: add observer/observable templates
An observable is used to decouple an information producer from a consumer
(in the same way as a callback), while allowing multiple consumers (called
observers) to coexist and to manage their lifetime separately.

Two classes are introduced:

 observable: a producer class; when an observable is invoked all observers
        receive the information
 observer: a consumer class; receives information from an observable

Modelled after boost::signals2, with the following changes:
 - all signals return void; information is passed from the producer to
   the consumer but not back
 - thread-unsafe
 - modern C++ without preprocessor hacks
 - connection lifetime is always managed rather than leaked by default
 - renamed to avoid the funky "slot" name
Message-Id: <20180709172726.5079-1-avi@scylladb.com>
2018-07-09 18:48:44 +01:00
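A minimal single-shard sketch of the mechanism follows. toy_observable is a hypothetical simplification: the real classes use noncopyable_function (see the commit above) and RAII observer objects that disconnect themselves on destruction, both omitted here for brevity.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Producer side: invoking the observable notifies every registered consumer.
template <typename... Args>
class toy_observable {
    std::vector<std::function<void (Args...)>> _observers;
public:
    // Register a consumer (in the real code this returns an RAII observer).
    void observe(std::function<void (Args...)> f) {
        _observers.push_back(std::move(f));
    }
    // All signals return void: information flows producer -> consumers only.
    void operator()(Args... args) const {
        for (const auto& f : _observers) {
            f(args...);
        }
    }
};
```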
Avi Kivity
f3da043230 Merge "Make in-memory partition version merging preemptable" from Tomasz
"
Partition snapshots go away when the last read using the snapshot is done.
Currently we will synchronously attempt to merge partition versions on this event.
If partitions are large, that may stall the reactor for a significant amount of time,
depending on the size of newer versions. Cache update on memtable flush can
create especially large versions.

The solution implemented in this series is to allow merging to be preemptable,
and continue in the background. Background merging is done by the mutation_cleaner
associated with the container (memtable, cache). There is a single merging process
per mutation_cleaner. The merging worker runs in a separate scheduling group,
introduced here, called "mem_compaction".

When the last user of a snapshot goes away, the snapshot is slid to the
oldest unreferenced version first so that the version is no longer reachable
from partition_entry::read(). The cleaner will then keep merging preceding
(newer) versions into it, until it merges a version which is referenced. The
merging is preemptable. If the initial merging is preempted, the snapshot is
enqueued into the cleaner, the worker woken up, and merging will continue
asynchronously.

When a memtable is merged with the cache, its cleaner is merged with the cache's cleaner,
so any outstanding background merges will be continued by the cache cleaner
without disruption.

This reduces scheduling latency spikes in tests/perf_row_cache_update
for the case of large partition with many rows. For -c1 -m1G I saw
them dropping from >23ms to 1-2ms. System-level benchmark using scylla-bench
shows a similar improvement.
"

* tag 'tgrabiec/merge-snapshots-gradually-v4' of github.com:tgrabiec/scylla:
  tests: perf_row_cache_update: Test with an active reader surviving memtable flush
  memtable, cache: Run mutation_cleaner worker in its own scheduling group
  mutation_cleaner: Make merge() redirect old instance to the new one
  mvcc: Use RAII to ensure that partition versions are merged
  mvcc: Merge partition versions gradually in the background
  mutation_partition: Make merging preemptable
  tests: mvcc: Use the standard maybe_merge_versions() to merge snapshots
2018-07-01 15:32:51 +03:00
Calle Wilund
054514a47a sstables::compress: Ensure unqualified compressor name if possible
Fixes #3546

Both older origin and Scylla write "known" compressor names (i.e. those
in the origin namespace) unqualified (e.g. LZ4Compressor).

This behaviour was not preserved in the virtualization change, but
probably should be.

Message-Id: <20180627110930.1619-1-calle@scylladb.com>
2018-06-27 14:16:50 +03:00
Tomasz Grabiec
4d3cc2867a mutation_partition: Make merging preemptable 2018-06-27 12:48:30 +02:00
Avi Kivity
9a7ecdb3b9 Merge "Deglobalise cache_tracker" from Paweł
"
Cache tracker is a thread-local global object that indirectly depends on
the lifetimes of other objects. In particular, a member of
cache_tracker: mutation_cleaner may extend the lifetime of a
mutation_partition until the cleaner is destroyed. The
mutation_partition itself depends on LSA migrators which are
thread-local objects. Since, there is no direct dependency between
LSA-migrators and cache_tracker it is not guarantee that the former
won't be destroyed before the latter. The easiest (barring some unit
tests that repeat the same code several billion times) solution is to
stop using globals.

This series also improves the part of LSA sanitiser that deals with
migrators.

Fixes #3526.

Tests: unit(release)
"

* tag 'deglobalise-cache-tracker/v1-rebased' of https://github.com/pdziepak/scylla:
  mutation_cleaner: add disclaimer about mutation_partition lifetime
  lsa: enhance sanitizer for migrators
  lsa: formalise migrator id requirements
  row_cache: deglobalise row cache tracker
2018-06-26 16:38:12 +01:00
Paweł Dziepak
55bf9d78a6 lsa: enhance sanitizer for migrators
The current LSA sanitizer performs only basic checks on the use of migrators,
without doing any additional reporting in case an error is detected. This
patch enhances it so that when a problem is detected the relevant stack
traces get printed.
2018-06-25 09:37:43 +01:00
Paweł Dziepak
fcd9b1f821 lsa: formalise migrator id requirements
object_descriptor uses special encoding for migrator ids which assumes
that the valid ones are in a range smaller than uint32_t. Let's add some
static asserts that make this fact more visible.
2018-06-25 09:37:43 +01:00
Paweł Dziepak
b4c5e1a6d4 utils: add reusable_buffer
This commit adds a helper class reusable_buffer which can be used to
avoid excessive memory allocations of large buffers when bytes_ostream
needs to be linearised. The idea is that reusable_buffer in most cases
is going to be thread local so that multiple continuation chains can
reuse the same large buffer.
2018-06-25 09:21:47 +01:00
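The idea can be sketched as follows. toy_reusable_buffer and its method names are illustrative, not the actual Scylla API: the point is a scratch buffer, typically declared thread_local, that grows to its high-water mark and is then reused instead of allocating a fresh large buffer for every linearisation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

class toy_reusable_buffer {
    std::vector<char> _storage;
public:
    // Return a contiguous region of at least n bytes, growing (but never
    // shrinking) the backing storage.
    char* get_temporary_buffer(std::size_t n) {
        if (_storage.size() < n) {
            _storage.resize(n);
        }
        return _storage.data();
    }
    std::size_t capacity_bytes() const { return _storage.size(); }
};

// Typical use: one buffer per shard, shared by multiple continuation chains.
// thread_local toy_reusable_buffer scratch;
```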
Gleb Natapov
b38ced0fcd Configure logalloc memory size during initialization 2018-06-11 15:34:14 +03:00
Paweł Dziepak
ba5e64383a utils: add metaprogramming helper functions 2018-05-31 10:09:01 +01:00
Paweł Dziepak
c41b9fc7ec utils: add fragment range
This patch introduces a FragmentRange concept which is the minimal interface all
classes representing a fragmented buffer should satisfy.
2018-05-31 10:09:01 +01:00
Tomasz Grabiec
db36ff0643 utils: Extract small_vector.hh 2018-05-30 14:41:41 +02:00
Tomasz Grabiec
a19c5cbc16 Introduce a coroutine wrapper
Represents a deferring operation which defers cooperatively with the caller.

The operation is started and resumed by calling run(), which returns
with stop_iteration::no whenever the operation defers and is not
completed yet. When the operation is finally complete, run() returns
with stop_iteration::yes.

This allows the caller to:

 1) execute some post-defer and pre-resume actions atomically

 2) have control over when the operation is resumed and in which context,
    in particular the caller can cancel the operation at deferring points.

It will be used to implement deferring partition_version::apply_to_incomplete().
2018-05-30 14:41:40 +02:00
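The run() protocol described above can be sketched like this. stop_iteration here is a plain enum standing in for the Seastar type, and toy_coroutine is an invented stand-in for a real deferring operation.

```cpp
#include <cassert>

enum class stop_iteration { no, yes };

struct toy_coroutine {
    int steps_left;
    // Perform one unit of work; return yes when complete, no when the
    // operation has deferred and should be resumed later by the caller.
    stop_iteration run() {
        if (steps_left <= 0) {
            return stop_iteration::yes;
        }
        --steps_left;
        return steps_left == 0 ? stop_iteration::no  // still deferring? no:
             , stop_iteration::no;
    }
};
```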
Tomasz Grabiec
e5aa02efeb mvcc: Introduce partition_version_list 2018-05-30 12:18:56 +02:00
Vlad Zolotarov
3114cef42c loading_shared_values: introduce the templated find() overload
This overload allows searching for elements by an arbitrary key, as long as it is "hashable"
to the same values as the default key and there is a comparator for
the new key.

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2018-05-22 20:15:00 -04:00
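The standard library offers an analogous facility, transparent comparators, which illustrates the same idea of looking up by an alternative key type without materialising the stored key type. The helper contains() below is invented for the example.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <string_view>

// With a transparent comparator (std::less<>), std::set::find accepts any
// type comparable with the stored key, so no temporary std::string is
// constructed for the lookup.
inline bool contains(const std::set<std::string, std::less<>>& s,
                     std::string_view key) {
    return s.find(key) != s.end();  // heterogeneous lookup
}
```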
Vlad Zolotarov
34620deee4 utils::loading_cache: add remove(key)/remove(iterator) methods
remove(key): removes the entry with the given key if exists, otherwise does nothing.
remote(iterator): removes an entry by a given iterator (returned from loading_cache::find()).

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2018-05-22 20:05:00 -04:00
Tomasz Grabiec
498a4132c5 lsa: Add use for debug::static_migrators
Otherwise GDB complains about it being optimized out, breaking our
debug scripts.
2018-05-17 14:22:14 +02:00
Avi Kivity
05cec4a265 Merge "Reduce LSA memory reclamation overhead" from Tomasz
"
Main optimization is in the patch titled "lsa: Reduce amount of segment compactions".

I measured 50% reduction of cache update run time in a steady state for an
append-only workload with large partition, in perf_row_cache_update version from:

  c3f9e6ce1f/tests/perf_row_cache_update.cc

Other workloads and other allocation sites could probably also see the
improvement.
"

* tag 'tgrabiec/reduce-lsa-segment-compactions-v1' of github.com:tgrabiec/scylla:
  lsa: Expose counters for allocation and compaction throughput
  lsa: Reduce amount of segment compactions
  lsa: Avoid the call to segment_pool::descriptor() in compact()
  lsa: Make reclamation on reserve refill more efficient
2018-05-16 10:24:20 +03:00
Tomasz Grabiec
4fdd61f1b0 lsa: Expose counters for allocation and compaction throughput
Allow observing amplification induced by segment compaction.
2018-05-15 21:49:01 +02:00
Tomasz Grabiec
3775a9ecec lsa: Reduce amount of segment compactions
Reclaiming memory through segment compaction is expensive. For
occupancy of 85%, in order to reclaim one free segment, we need to
compact 7 segments, by migrating 6 segments worth of data. This results
in significant amplification. Compaction involves moving objects,
which in some cases is expensive in itself as well
(See https://github.com/scylladb/scylla/issues/3247).

This patch reduces the amount of segment compaction in favor of doing
more eviction. It especially helps workloads in which LRU order
matches allocation order, in which case there will be no segment
compaction, just eviction.

In the perf_row_cache_update test case for a large partition with lots of
rows, which simulates an appending workload, I measured that before the
patch, for each new object allocated, 2 needed to be migrated. After the
patch, only 0.003 objects are migrated. This reduces the run time of
the cache update part by 50%.
2018-05-15 21:49:01 +02:00
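The amplification figures quoted above can be checked with a small calculation; segments_to_compact is an illustrative helper, not Scylla code. At occupancy f, freeing one segment's worth of memory by compaction requires compacting ceil(1 / (1 - f)) segments, migrating roughly f times that many segments of data.

```cpp
#include <cassert>
#include <cmath>

// Number of segments that must be compacted to net one free segment,
// given a fractional occupancy in (0, 1).
inline int segments_to_compact(double occupancy) {
    return static_cast<int>(std::ceil(1.0 / (1.0 - occupancy)));
}
```

At 85% occupancy this gives 7 segments compacted, migrating about 6 segments worth of data, matching the numbers in the merge description above.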
Glauber Costa
2ba08178ca large_bitset: be more accurate with memory usage
We are slightly underestimating the amount of memory we use. Now that
the chunked vector can export its internal memory usage we can use that
directly.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2018-05-15 11:22:21 -04:00
Glauber Costa
7190bb4f95 chunked_vector: exports its current memory usage
There are times in which we would like to estimate how much memory
a chunked_vector is using. We have two strategies to do it:

1) multiply the size by the size of the elements. That is wrong, because
the chunked_vector can allocate larger chunks in anticipation of more
elements to come.

2) multiply the number of chunks by 128kB. That is also wrong, because
the chunked_vector will not always allocate the entire chunk if there are
only a few elements in it.

The best way to deal with this is to allow the chunked_vector to export
its current memory usage.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2018-05-15 11:22:21 -04:00
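The accounting argument above can be illustrated with std::vector<char> standing in for a chunk; memory_used is an invented helper, not the actual chunked_vector API. Summing each chunk's actual capacity avoids both the underestimate (size times element size) and the overestimate (chunk count times the maximum chunk size).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sum the bytes actually allocated per chunk, rather than inferring the
// total from element count or chunk count.
inline std::size_t memory_used(const std::vector<std::vector<char>>& chunks) {
    std::size_t total = 0;
    for (const auto& c : chunks) {
        total += c.capacity();
    }
    return total;
}
```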
Tomasz Grabiec
8faafdaae5 lsa: Avoid the call to segment_pool::descriptor() in compact() 2018-05-11 19:07:23 +02:00
Tomasz Grabiec
19edf3970e lsa: Make reclamation on reserve refill more efficient
Currently reserve refill allocates segments repeatedly until the
reserve threshold is met. If a single segment allocation needs to
reclaim memory, it will ask the reclaimer for one segment. The
reclaimer could make better decisions if it knew the total number of
segments we try to allocate. In particular, it would not attempt to
compact any segment until it has first evicted the total amount of memory,
which may reduce the total amount of segment compactions during
refill.

This patch changes refill to increase reclamation step used by
allocate_segment() so that it matches the total amount of memory we
refill.
2018-05-11 19:07:23 +02:00
Vladimir Krivopalov
e5477c6c6c utils: Use dedicated enum for Bloom filter format instead of a boolean.
It better reflects the purpose of the parameter and provides better type-safety.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
Message-Id: <10a4fc16dafa0fb3234969041f68f9e7bfc61312.1525899669.git.vladimir@scylladb.com>
2018-05-10 09:47:41 +03:00
Paweł Dziepak
c6c5accd19 lsa: provide migrator with the object size
While the migration function should have enough information to obtain
the object size itself, the LSA logic needs to compute it as well.
IMR is going to make calculating object sizes more expensive, so by
providing the information to the migrator we can avoid some needless
operations.
2018-05-09 16:52:26 +01:00
Paweł Dziepak
884888dc11 lsa: add free() that does not require object size
It is non-trivial to get the size of an IMR object. However, the
standard allocator doesn't really need it and LSA can compute it itself
by asking the migrator.
2018-05-09 16:52:26 +01:00
Paweł Dziepak
f7438a8b96 mutable_view: add default constructor and const_iterator
Makes the interface more consistent with bytes_view.
2018-05-09 16:52:26 +01:00
Paweł Dziepak
b1bec336b3 lsa: sanitize use of migrators
Having migrators dynamically registered and deregistered opens a new
class of bugs. This patch adds some additional checks in debug mode
in the hope of catching any misuse early.
2018-05-09 16:52:26 +01:00
Paweł Dziepak
cca9f8c944 lsa: reuse registered migrator ids
With the introduction of the new in-memory representation we will get
type- and schema-dependent migrators. Since there is no bound on how many
times they can be created and destroyed, it is better to be safe and
reuse registered migrator ids.
2018-05-09 16:52:20 +01:00
Paweł Dziepak
b3699f286d lsa: make migrators table thread-local
Migrators can be registered and deregistered at any time. If the table
is not thread-local we risk race conditions.
2018-05-09 16:10:46 +01:00