Commit Graph

15891 Commits

Author SHA1 Message Date
Tomasz Grabiec
0a1aec2bd6 tests: perf_row_cache_update: Test with an active reader surviving memtable flush
Exposes latency issues caused by mutation_cleaner life time issues,
fixed by eralier commits.
2018-06-27 21:51:04 +02:00
Tomasz Grabiec
074be4d4e8 memtable, cache: Run mutation_cleaner worker in its own scheduling group
The worker is responsible for merging MVCC snapshots, which is similar
to merging sstables, but in memory. The new scheduling group will be
therefore called "memory compaction".

We should run it in a separate scheduling group instead of
main/memtables, so that it doesn't disrupt writes and other system
activities. It's also nice for monitoring how much CPU time we spend
on this.
2018-06-27 21:51:04 +02:00
Tomasz Grabiec
6c6ffaee71 mutation_cleaner: Make merge() redirect old instance to the new one
If memtable snapshot goes away after memtable started merging to
cache, it would enqueue the snapshots for cleaning on the memtable's
cleaner, which will have to clean without deferrring when the memtable
is destroyed. That may stall the reactor. To avoid this, make merge()
cause the old instance of the cleaner to redirect to the new instance
(owned by cache), like we do for regions. This way the snapshots
mentioned earlier can be cleaned after memtable is destroyed,
gracefully.
2018-06-27 21:51:04 +02:00
Tomasz Grabiec
450985dfee mvcc: Use RAII to ensure that partition versions are merged
Before this patch, maybe_merge_versions() had to be manually called
before partition snapshot goes away. That is error prone and makes
client code more complicated. Delegate that task to a new
partition_snapshot_ptr object, through which all snapshots are
published now.
2018-06-27 21:51:04 +02:00
Tomasz Grabiec
c26a304fbb mvcc: Merge partition version versions gradually in the background
When snapshots go away, typically when the last reader is destroyed,
we used to merge adjacent versions atomically. This could induce
reactor stalls if partitions were large. This is especially true for
versions created on cache update from memtables.

The solution is to allow this process to be preempted and move to the
background. mutation_cleaner keeps a linked list of such unmerged
snapshots and has a worker fiber which merges them incrementally and
asynchronously with regards to reads.

This reduces scheduling latency spikes in tests/perf_row_cache_update
for the case of large partition with many rows. For -c1 -m1G I saw
them dropping from 23ms to 2ms.
2018-06-27 12:48:30 +02:00
Tomasz Grabiec
4d3cc2867a mutation_partition: Make merging preemtable 2018-06-27 12:48:30 +02:00
Tomasz Grabiec
4995a8c568 tests: mvcc: Use the standard maybe_merge_versions() to merge snapshots
Preparation for switching to background merging.
2018-06-27 12:48:30 +02:00
Avi Kivity
9a7ecdb3b9 Merge "Deglobalise cache_tracker" from Paweł
"
Cache tracker is a thread-local global object that indirectly depends on
the lifetimes of other objects. In particular, a member of
cache_tracker: mutation_cleaner may extend the lifetime of a
mutation_partition until the cleaner is destroyed. The
mutation_partition itself depends on LSA migrators which are
thread-local objects. Since, there is no direct dependency between
LSA-migrators and cache_tracker it is not guarantee that the former
won't be destroyed before the latter. The easiest (barring some unit
tests that repeat the same code several billion times) solution is to
stop using globals.

This series also improves the part of LSA sanitiser that deals with
migrators.

Fixes #3526.

Tests: unit(release)
"

* tag 'deglobalise-cache-tracker/v1-rebased' of https://github.com/pdziepak/scylla:
  mutation_cleaner: add disclaimer about mutation_partition lifetime
  lsa: enhance sanitizer for migrators
  lsa: formalise migrator id requirements
  row_cache: deglobalise row cache tracker
2018-06-26 16:38:12 +01:00
Asias He
c3b5a2ecd5 gossip: Fix tokens assignment in assassinate_endpoint
The tokens vector is defined a few lines above and is needed outsie the
if block.

Do not redefine it again in the if block, otherwise the tokens will be empty.

Found by code inspection.

Fixes #3551.

Message-Id: <c7a06375c65c950e94236571127f533e5a60cbfd.1530002177.git.asias@scylladb.com>
2018-06-26 16:38:12 +01:00
Tomasz Grabiec
6d6b93d1e7 flat_mutation_reader: Move field initialization to initializer list
This works around a problem of std::terminate() being called in debug
mode build if initialization of _current throws.

Backtrace:

Thread 2 "row_cache_test_" received signal SIGABRT, Aborted.
0x00007ffff17ce9fb in raise () from /lib64/libc.so.6
(gdb) bt
  #0  0x00007ffff17ce9fb in raise () from /lib64/libc.so.6
  #1  0x00007ffff17d077d in abort () from /lib64/libc.so.6
  #2  0x00007ffff5773025 in __gnu_cxx::__verbose_terminate_handler() () from /lib64/libstdc++.so.6
  #3  0x00007ffff5770c16 in ?? () from /lib64/libstdc++.so.6
  #4  0x00007ffff576fb19 in ?? () from /lib64/libstdc++.so.6
  #5  0x00007ffff5770508 in __gxx_personality_v0 () from /lib64/libstdc++.so.6
  #6  0x00007ffff3ce4ee3 in ?? () from /lib64/libgcc_s.so.1
  #7  0x00007ffff3ce570e in _Unwind_Resume () from /lib64/libgcc_s.so.1
  #8  0x0000000003633602 in reader::reader (this=0x60e0001160c0, r=...) at flat_mutation_reader.cc:214
  #9  0x0000000003655864 in std::make_unique<make_forwardable(flat_mutation_reader)::reader, flat_mutation_reader>(flat_mutation_reader &&) (__args#0=...)
    at /usr/include/c++/7/bits/unique_ptr.h:825
  #10 0x0000000003649a63 in make_flat_mutation_reader<make_forwardable(flat_mutation_reader)::reader, flat_mutation_reader>(flat_mutation_reader &&) (args#0=...)
    at flat_mutation_reader.hh:440
  #11 0x000000000363565d in make_forwardable (m=...) at flat_mutation_reader.cc:270
  #12 0x000000000303f962 in memtable::make_flat_reader (this=0x61300001d540, s=..., range=..., slice=..., pc=..., trace_state_ptr=..., fwd=..., fwd_mr=...)
    at memtable.cc:592

Message-Id: <1528792447-13336-1-git-send-email-tgrabiec@scylladb.com>
2018-06-25 20:03:23 +03:00
Avi Kivity
31eeae0126 Merge "Avoid buffer linearisation in read path" from Paweł
"
The read path on coordinator involves a lot of passing around buffers
and some occasional processing. We start with query::result obtained
from the storage_proxy which is then transformed into a
cql3::result_set, which is then used to write a response. Buffers are
copied and linearised quite excessively.

This series attempts to remedy that by using view of fragmented buffers
as much as possible. The first part deals with reading from
query::result. ser::buffer_view is introduced which enables the IDL
infrastructure to read a buffer without copying or linearising it.
The second part is switching native protocol layer to use bytes_ostream
instead of std::vector<char> to hold the generated response to the
client. The last part introduces cql3::result_generator which is an
alternative to cql3::result_set that passes buffer views without copying
or linearising anything from query::result to the native protocl layer
(or Thrift). It is only used in simple cases, when no processing at the
CQL layer is required, except for paged queries which require some
simple interpretation of the results and are supported by the result
generator.

Tests: unit(release), dtests(paging_test.py paging_additional_test.py
  cql_additional_tests.py cql_tracing_test.py cql_prepared_test.py
  cql_cast_test.py cql_tests.py)
"

* tag 'buffer-views-query-result/v2' of https://github.com/pdziepak/scylla: (34 commits)
  cql3: select_statement: use fetch_page_generator() if possible
  pager: add fetch_page_generator()
  pager: make the visitor handle_result() accepts a template parameter
  pager: make query_result_visitor base class a template parameter
  pager: make myvistor a member class of query_pager
  pager: make shared pointers to selection constant
  pager: merge query_pager and query_pagers::impl
  cql3: select_statement: use result_generator if possible
  cql3: selection: add is_trivial()
  cql3: result: support result_generator
  cql3: add lazy result_generator
  cql3: add result class
  cql3::result_set: fix encapsulation
  thrift: use cql3::result_set visiting interface
  transport: use cql3::result_set visiting interface
  cql3::result_set: add visit()
  transport: response: add write_int_placeholder()
  transport: steal response buffers and make send zero-copy
  transport: use reusable_buffer for compression
  transport: response: use bytes_ostream
  ...
2018-06-25 17:37:50 +03:00
Paweł Dziepak
bdc299cc38 mutation_cleaner: add disclaimer about mutation_partition lifetime
mutation_cleaner has already caused problems by extending lifetime of
mutation_partition past the lifetime of LSA migrators that it uses (due
to the fact that both the cleaner and migrators where thread-local
globals). Since, the long term goal is to make mutation_partition
internal representation depend more and more on schema that lifetime
extension may again cause problems in the future, so let's add a
disclaimer that hopefuly, will help avoiding them.
2018-06-25 09:37:43 +01:00
Paweł Dziepak
55bf9d78a6 lsa: enhance sanitizer for migrators
Current LSA sanitizer performs only basic checks on the migrators use,
without doing any additonal reporting in case an error is detected. This
patch enhances it so that when a problem is detected relevant stack
traces get printed.
2018-06-25 09:37:43 +01:00
Paweł Dziepak
fcd9b1f821 lsa: formalise migrator id requirements
object_descriptor uses special encoding for migrator ids which assumes
that the valid ones are in a range smaller than uint32_t. Let's add some
static asserts that make this fact more visible.
2018-06-25 09:37:43 +01:00
Paweł Dziepak
96b0577343 row_cache: deglobalise row cache tracker
Row cache tracker has numerous implicit dependencies on ohter objects
(e.g. LSA migrators for data held by mutation_cleaner). The fact that
both cache tracker and some of those dependencies are thread local
objects makes it hard to guarantee correct destruction order.

Let's deglobalise cache tracker and put in in the database class.
2018-06-25 09:37:43 +01:00
Paweł Dziepak
2b1fcfe019 cql3: select_statement: use fetch_page_generator() if possible 2018-06-25 09:21:47 +01:00
Paweł Dziepak
1cf3cb285f pager: add fetch_page_generator()
fetch_page_generator() is an equivalent of fetch_page(), but instead of
building a cql3::result_set it returns a cql3::result_generator().
2018-06-25 09:21:47 +01:00
Paweł Dziepak
f6fe831d49 pager: make the visitor handle_result() accepts a template parameter 2018-06-25 09:21:47 +01:00
Paweł Dziepak
fc87ca5926 pager: make query_result_visitor base class a template parameter
So far query_result_visitor was tied to result_set_builder. The goal is
to enable result_generator to work with paged queries as well so we need
to decouple them.
2018-06-25 09:21:47 +01:00
Paweł Dziepak
dc9a65ea76 pager: make myvistor a member class of query_pager
It is going to be come a class template.
2018-06-25 09:21:47 +01:00
Paweł Dziepak
319b2cde7e pager: make shared pointers to selection constant
Shared pointers make code harder to reason about, it is not easy to get
rid of them in this piece of the code, but we can restore at least a bit
of sanity by adding consts.
2018-06-25 09:21:47 +01:00
Paweł Dziepak
327d3de51e pager: merge query_pager and query_pagers::impl
There is just a single implementation of query_pager and there is no
reason to make anything virtual. Devirtualising this code will allow
higher layers to pass visitors via templates.
2018-06-25 09:21:47 +01:00
Paweł Dziepak
fa5dea91e7 cql3: select_statement: use result_generator if possible 2018-06-25 09:21:47 +01:00
Paweł Dziepak
3f1184d16d cql3: selection: add is_trivial()
cql3::result_generator supports only trivial selections.
2018-06-25 09:21:47 +01:00
Paweł Dziepak
adad31ba6b cql3: result: support result_generator
cql3::result can now hold either a result_set or a result_generator.
Some code that is not performance critical expects to get result_set so
a way of converting the result_generator to a result_set is added.
2018-06-25 09:21:47 +01:00
Paweł Dziepak
02443d10db cql3: add lazy result_generator
result_generator is a restricted alternative of result_set. It supports
only the simples cases, but is much cheaper as it passes data almost
directly from query::result to its visitor bypassing much of the CQL
layer.
2018-06-25 09:21:47 +01:00
Paweł Dziepak
dca68afce6 cql3: add result class
So far the only way of returing a result of a CQL query was to build a
result_set. An alternative lazy result generator is going to be
introduced for the simple cases when no transformations at CQL layer are
needed. To do that we need to hide the fact that there are going to be
multiple representations of a cql results from the users.
2018-06-25 09:21:47 +01:00
Paweł Dziepak
29cc4a4c0b cql3::result_set: fix encapsulation 2018-06-25 09:21:47 +01:00
Paweł Dziepak
8f26d9c03f thrift: use cql3::result_set visiting interface 2018-06-25 09:21:47 +01:00
Paweł Dziepak
54d5dc414d transport: use cql3::result_set visiting interface 2018-06-25 09:21:47 +01:00
Paweł Dziepak
2e4234ab63 cql3::result_set: add visit()
This visiting interface for result_set satisfies most of its users (at
least all of those which are in the hot path). It will allow having an
alternative of result_set (i.e. lazy result generator) which would
provide exaclty the same interface.
2018-06-25 09:21:47 +01:00
Paweł Dziepak
c0e7160625 transport: response: add write_int_placeholder()
This allows the response writer to defer writing integers until later
time. It will be used by lazy response generator which will know the
number of rows in the response only after they are all written.
2018-06-25 09:21:47 +01:00
Paweł Dziepak
88aff8eda8 transport: steal response buffers and make send zero-copy
Each response is sent only once, so we can safely steal its buffers and
pass them to the output_stream using the zero-copy interface.
2018-06-25 09:21:47 +01:00
Paweł Dziepak
821e6683e3 transport: use reusable_buffer for compression
Compression algorithms require us to linearise bytes_ostream. This may
cause an excessive number of large allocations. Using reusable_buffers
can avoid that.
2018-06-25 09:21:47 +01:00
Paweł Dziepak
a7c4d407ce transport: response: use bytes_ostream
std::vector<char> is not a very good container for incrementally
building a response. It may cause excessive copies and allocations. If
the response is large it will put more pressure on the memory allocator
by requiring the buffer to be contiguous.

We already have bytes_ostream which avoids all of these problems, so
let's use it.
2018-06-25 09:22:43 +01:00
Paweł Dziepak
c04d38b76b transport: drop response::make_message() 2018-06-25 09:22:35 +01:00
Paweł Dziepak
444acf49af transport: use std::unique_ptr for the response
So far cql_server::response was passed around using shared pointers.
They have very big cost of making it hard to reason about the code. All
that is not necessary and we can easily switch to using much more
sensible std::unique_ptr.
2018-06-25 09:22:24 +01:00
Paweł Dziepak
12f89299b2 transport: move response to a separate header
There are some other translation units which right now are satisfied
with the response being an incomplete type. This means that
std::unique_ptr can't be used for it. Let's move the class declaration
to a header that can be included where needed.
2018-06-25 09:21:47 +01:00
Paweł Dziepak
3b9ba30497 tests: add test for reusable buffers 2018-06-25 09:21:47 +01:00
Paweł Dziepak
b4c5e1a6d4 utils: add reusable_buffer
This commit adds a helper class reusable_buffer which can be used to
avoid excessive memory allocations of large buffers when bytes_ostream
needs to be linearised. The idea is that reusable_buffer in most cases
is going to be thread local so that multiple continuation chains can
reuse the same large buffer.
2018-06-25 09:21:47 +01:00
Paweł Dziepak
8feab33cf4 query::result: use std::optional instead of experimental version 2018-06-25 09:21:47 +01:00
Paweł Dziepak
9d140488bd tests/perf: add performance test for IDL 2018-06-25 09:21:47 +01:00
Paweł Dziepak
4704c4efab query::result: avoid copying and linearising cell value
query::result_view already operates on views of a serialised
query::result. However, until now the value of a cell was always
linearised and copied. This patch makes use of ser::buffer_view to avoid
that.
2018-06-25 09:21:47 +01:00
Paweł Dziepak
982f71a804 query::result_view: add concept 2018-06-25 09:21:47 +01:00
Paweł Dziepak
2914f64b2d serializer: user buffer_view in bytes deserialiser 2018-06-25 09:21:47 +01:00
Paweł Dziepak
19caf709e1 serializer: add view of a fragmented stream
ser::buffer_view is a view of a fragmented buffer in a stream od
IDL-serialised data. It can be used to deserialise IDL objects without
needless copying and linearisation of large blobs.
2018-06-25 09:21:47 +01:00
Paweł Dziepak
fe8dc1fa5c bytes_ostream: add remove_suffix() 2018-06-25 09:21:47 +01:00
Paweł Dziepak
969219d5bc tests/random-utils: add missing include 2018-06-25 09:21:47 +01:00
Paweł Dziepak
a85197a7b5 bytes_ostream: make fragment_iterator default constructible 2018-06-25 09:21:47 +01:00
Piotr Sarna
828497ad19 hints: amend a comment in device limits
To make the comment less confusing, 'group of managers'
is used instead of 'device'.

Refs #3516

Reported-by: Vlad Zolotarov <vladz@scylladb.com>
Signed-off-by: Piotr Sarna <sarna@scylladb.com>
Message-Id: <60c9ab6b47195570f7ce7dff9556e3739b7ae00f.1529862547.git.sarna@scylladb.com>
2018-06-24 19:14:59 +01:00