Commit Graph

1063 Commits

Author SHA1 Message Date
Botond Dénes
d1209c548a Fix -Wreturn-type warnings
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <99f7a006daaa78eb87720ac51c394093398bc868.1504013915.git.bdenes@scylladb.com>
2017-08-29 16:41:09 +03:00
Paweł Dziepak
d5fa07f6df Merge "sstables: switch from deque<> to a custom container" from Avi
Large deques require contiguous storage, which may not be available (or may
be expensive to obtain).  Switch to new custom container instead, which allocates
less contiguous storage.

Allocation problems were observed with the summary and compression info. While
there is work to reduce compression info contiguous space use, this solves
all std::deque problems (and should not conflict with that work).

Fixes #2708

* tag '2708/v6' of https://github.com/avikivity/scylla:
  sstables: switch std::deque to chunked_vector
  tests: add test for chunked_vector
  utils: add a new container type chunked_vector
2017-08-29 11:11:01 +01:00
Botond Dénes
eec451bcf8 segmented_offsets: use _current_bucket_segment_index consistently
Previously _current_bucket_segment_index was used differently depending on
whether update_position_trackers() was called for random or sequential
access. In the former case it was used as the absolute index of the segment
(independent of the buckets) and in the latter as the relative index of
the segment within its bucket. This caused problems when there was a
switch between random and sequential access, meaning one could get different
results for an at() call depending on what the previous at() call was.
Fix this by consistently using _current_bucket_segment_index as, like its
name suggests, the bucket-relative segment index.

Ref #1946.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <7f68ac1d32c80e8dea6dfa11be02acaa961bce2a.1503924927.git.bdenes@scylladb.com>
2017-08-28 16:14:25 +03:00
Avi Kivity
fa8d0fe4d0 Revert "Revert "Revert "Revert "Merge "Compress in-memory compression-info" from Botond""""
This reverts commit 238877a0c6.  A fix was found and will be committed
shortly.
2017-08-28 16:14:13 +03:00
Tomasz Grabiec
65e488c150 sstables: Fix abort in mutation reader for certain skip pattern
The problem happens for the following sequence of events:

 1) reader stops in the middle of some partition before it
    skips to another partition range

 2) reader is fast forwarded to a partition range which has no data in
    the sstable. There are some partitions between the previous
    partition range and the one we skip to

 3) the reader is asked for next partition

The problem was that mutation_reader::fast_forward_to() was putting
the reader in _read_enabled == false state in step 2, but
data_consume_context was not fast forwarded to the range. When in step
3 we were asked for the next partition, we attempted to skip using
index (because of 1). The result of the skip was some position which
is outside of the current range of data_consume_context, which causes
it to abort. To fix, add a check for _read_enabled before we try to
skip.
2017-08-28 10:28:15 +02:00
Tomasz Grabiec
dc3c8863f3 sstables: Fix reader returning partition past the query range in some cases
If index was used to skip to the next partition (because the current
partition wasn't consumed in full) and reader's partition range ends
before the data file ends, we did not detect that we're out of range
before returning a streamed_mutation. Fix by checking _context.eof()
before doing that.

Refs #2733.
2017-08-28 10:16:27 +02:00
Tomasz Grabiec
6baad2c2e6 sstables: Introduce data_consume_context::eof()
2017-08-28 09:19:43 +02:00
Avi Kivity
238877a0c6 Revert "Revert "Revert "Merge "Compress in-memory compression-info" from Botond"""
This reverts commit 9d27455744. It's still broken.

To reproduce:

  ./tools/bin/cassandra-stress write -schema compression=LZ4Compressor

(on a clean database)

.0  0x00007ffff32aa69b in raise () from /lib64/libc.so.6
.1  0x00007ffff32ac4a0 in abort () from /lib64/libc.so.6
.2  0x000000000054a0e8 in seastar::memory::abort_on_underflow (size=<optimized out>) at core/memory.cc:1189
.3  seastar::memory::allocate_large (size=<optimized out>) at core/memory.cc:1194
.4  0x000000000054b305 in seastar::memory::allocate (size=size@entry=18446744073702885265) at core/memory.cc:1227
.5  0x000000000054b45e in malloc (n=n@entry=18446744073702885265) at core/memory.cc:1452
.6  0x00000000006013e4 in seastar::temporary_buffer<char>::temporary_buffer (this=0x6010195fc800, size=18446744073702885265) at /home/avi/urchin/seastar/core/temporary_buffer.hh:72
.7  0x0000000000a3908b in seastar::input_stream<char>::read_exactly (this=0x6010053d0248, n=18446744073702885265) at /home/avi/urchin/seastar/core/iostream-impl.hh:189
.8  0x0000000000a9c77f in compressed_file_data_source_impl::get (this=0x6010053d0240) at sstables/compress.cc:499
.9  0x0000000000aa1b01 in seastar::data_source::get (this=<optimized out>) at /home/avi/urchin/seastar/core/iostream.hh:63
.10 seastar::future<> seastar::input_stream<char>::consume<sstables::data_consume_rows_context>(sstables::data_consume_rows_context&)::{lambda()#1}::operator()() const (__closure=__closure@entry=0x6010195fcab0) at /home/avi/urchin/seastar/core/iostream-impl.hh:204
.11 0x0000000000aa22f0 in seastar::futurize<seastar::future<seastar::bool_class<seastar::stop_iteration_tag> > >::apply<seastar::future<> seastar::input_stream<char>::consume<sstables::data_consume_rows_context>(sstables::data_consume_rows_context&)::{lambda()#1}&>(sstables::data_consume_rows_context&&) (func=...) at /home/avi/urchin/seastar/core/future.hh:1312
.12 seastar::repeat<seastar::future<> seastar::input_stream<char>::consume<sstables::data_consume_rows_context>(sstables::data_consume_rows_context&)::{lambda()#1}>(sstables::data_consume_rows_context&&) (action=...) at /home/avi/urchin/seastar/core/future-util.hh:203
.13 0x0000000000a9e730 in seastar::input_stream<char>::consume<sstables::data_consume_rows_context> (consumer=..., this=<optimized out>) at /home/avi/urchin/seastar/core/iostream-impl.hh:237
.14 data_consumer::continuous_data_consumer<sstables::data_consume_rows_context>::consume_input<sstables::data_consume_rows_context> (c=..., this=<optimized out>) at sstables/consumer.hh:226
.15 sstables::data_consume_context::impl::read (this=<optimized out>) at sstables/row.cc:411
.16 sstables::data_consume_context::read (this=<optimized out>) at sstables/row.cc:437
.17 0x0000000000aafbae in sstables::sstable_streamed_mutation::fill_buffer()::{lambda()#2}::operator()() const::{lambda()#1}::operator()() const (__closure=<optimized out>) at sstables/partition.cc:843
.18 seastar::apply_helper<sstables::sstable_streamed_mutation::fill_buffer()::{lambda()#2}::operator()() const::{lambda()#1}, std::tuple<>&&, std::integer_sequence<unsigned long> >::apply({lambda()#2}&&, std::tuple) (args=..., func=...) at ./seastar/core/apply.hh:36
.19 seastar::apply<sstables::sstable_streamed_mutation::fill_buffer()::{lambda()#2}::operator()() const::{lambda()#1}>(sstables::sstable_streamed_mutation::fill_buffer()::{lambda()#2}::operator()() const::{lambda()#1}&&, std::tuple<>&&) (args=..., func=...)
    at ./seastar/core/apply.hh:44
.20 seastar::futurize<seastar::future<> >::apply<sstables::sstable_streamed_mutation::fill_buffer()::{lambda()#2}::operator()() const::{lambda()#1}>(sstables::sstable_streamed_mutation::fill_buffer()::{lambda()#2}::operator()() const::{lambda()#1}&&, std::tuple<>&&) (args=...,
    func=...) at ./seastar/core/future.hh:1302
.21 seastar::future<>::then<sstables::sstable_streamed_mutation::fill_buffer()::{lambda()#2}::operator()() const::{lambda()#1}, seastar::future<> >(sstables::sstable_streamed_mutation::fill_buffer()::{lambda()#2}::operator()() const::{lambda()#1}&&) (
    this=this@entry=0x6010195fcbb0, func=...) at ./seastar/core/future.hh:890
.22 0x0000000000ac273f in sstables::sstable_streamed_mutation::fill_buffer()::{lambda()#2}::operator()() const (__closure=0x6010195fcc28) at sstables/partition.cc:843
.23 seastar::do_until_continued<sstables::sstable_streamed_mutation::fill_buffer()::{lambda()#2}, sstables::sstable_streamed_mutation::fill_buffer()::{lambda()#1}>(sstables::sstable_streamed_mutation::fill_buffer()::{lambda()#1}&&, sstables::sstable_streamed_mutation::fill_buffer()::{lambda()#2}&&, seastar::promise<>) (stop_cond=..., action=..., p=...) at /home/avi/urchin/seastar/core/future-util.hh:155
.24 0x0000000000ac29c3 in seastar::do_until<sstables::sstable_streamed_mutation::fill_buffer()::{lambda()#2}, sstables::sstable_streamed_mutation::fill_buffer()::{lambda()#1}>(sstables::sstable_streamed_mutation::fill_buffer()::{lambda()#1}&&, sstables::sstable_streamed_mutation::fill_buffer()::{lambda()#2}&&) (action=..., stop_cond=..., this=<optimized out>) at /home/avi/urchin/seastar/core/future-util.hh:330
.25 sstables::sstable_streamed_mutation::fill_buffer (this=<optimized out>) at sstables/partition.cc:844
.26 0x0000000000ad3d2b in streamed_mutation::fill_buffer (this=0x6010195fcd10) at ./streamed_mutation.hh:489
.27 consume_flattened_in_thread<stable_flattened_mutations_consumer<compact_for_compaction<sstables::compacting_sstable_writer> >, std::function<bool (streamed_mutation const&)> >(mutation_reader&, stable_flattened_mutations_consumer<compact_for_compaction<sstables::compacting_sstable_writer> >&, std::function<bool (streamed_mutation const&)>&&) (

(gdb) p addr
$1 = {
  chunk_start = 13330037,
  chunk_len = 18446744073702885265,
  offset = 0
}
2017-08-27 13:32:37 +03:00
Avi Kivity
1f66940134 sstables: switch std::deque to chunked_vector
Reduce susceptibility to memory fragmentation.
2017-08-26 16:44:47 +03:00
Botond Dénes
839d1db4d3 parse(compression): add missing reinterpret_cast<char*>
std::copy_n was using value as uint64_t*, smashing the stack.
Also remove unused variable.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <4e2d71fc74326965dfd98bed2347100fb6ebe43b.1503568210.git.bdenes@scylladb.com>
2017-08-24 13:38:03 +03:00
Avi Kivity
9d27455744 Revert "Revert "Merge "Compress in-memory compression-info" from Botond""
This reverts commit 9656fd79a0. A fix is now
available.
2017-08-24 13:37:35 +03:00
Tomasz Grabiec
9656fd79a0 Revert "Merge "Compress in-memory compression-info" from Botond"
This reverts commit ef85cf1cb3, reversing
changes made to de011ece52.

Vlad reports that this causes SIGSEGV on cluster restarts.

seastar::backtrace_buffer::append_backtrace() at /home/vladz/work/urchin/seastar/core/reactor.cc:274
 (inlined by) print_with_backtrace at /home/vladz/work/urchin/seastar/core/reactor.cc:289
seastar::print_with_backtrace(char const*) at /home/vladz/work/urchin/seastar/core/reactor.cc:296
sigsegv_action at /home/vladz/work/urchin/seastar/core/reactor.cc:3512
 (inlined by) operator() at /home/vladz/work/urchin/seastar/core/reactor.cc:3498
 (inlined by) _FUN at /home/vladz/work/urchin/seastar/core/reactor.cc:3494
?? ??:0
operator()<seastar::temporary_buffer<char> > at /home/vladz/work/urchin/sstables/sstables.cc:870
 (inlined by) apply at /home/vladz/work/urchin/seastar/core/apply.hh:36
 (inlined by) apply<sstables::parse(sstables::random_access_reader&, sstables::compression&)::<lambda()>::<lambda()>::<lambda()>::<lambda(auto:104)>, seastar::temporary_buffer<char> > at /home/vladz/work/urchin/seastar/core/apply.hh:44
 (inlined by) do_void_futurize_apply_tuple<sstables::parse(sstables::random_access_reader&, sstables::compression&)::<lambda()>::<lambda()>::<lambda()>::<lambda(auto:104)>, seastar::temporary_buffer<char> > at /home/vladz/work/urchin/seastar/core/future.hh:1270
 (inlined by) apply<sstables::parse(sstables::random_access_reader&, sstables::compression&)::<lambda()>::<lambda()>::<lambda()>::<lambda(auto:104)>, seastar::temporary_buffer<char> > at /home/vladz/work/urchin/seastar/core/future.hh:1290
 (inlined by) then<sstables::parse(sstables::random_access_reader&, sstables::compression&)::<lambda()>::<lambda()>::<lambda()>::<lambda(auto:104)> > at /home/vladz/work/urchin/seastar/core/future.hh:890
 (inlined by) operator() at /home/vladz/work/urchin/sstables/sstables.cc:873
 (inlined by) do_until_continued<sstables::parse(sstables::random_access_reader&, sstables::compression&)::<lambda()>::<lambda()>::<lambda()>, sstables::parse(sstables::random_access_reader&, sstables::compression&)::<lambda()>::<lambda()>::<lambda()>&> at /home/vladz/work/urchin/seastar/core/future-util.hh:155
do_until<sstables::parse(sstables::random_access_reader&, sstables::compression&)::<lambda()>::<lambda()>::<lambda()>, sstables::parse(sstables::random_access_reader&, sstables::compression&)::<lambda()>::<lambda()>::<lambda()>&> at /home/vladz/work/urchin/seastar/core/future-util.hh:330
 (inlined by) operator() at /home/vladz/work/urchin/sstables/sstables.cc:874
 (inlined by) apply at /home/vladz/work/urchin/seastar/core/apply.hh:36
 (inlined by) apply<sstables::parse(sstables::random_access_reader&, sstables::compression&)::<lambda()>::<lambda()> > at /home/vladz/work/urchin/seastar/core/apply.hh:44
 (inlined by) apply<sstables::parse(sstables::random_access_reader&, sstables::compression&)::<lambda()>::<lambda()> > at /home/vladz/work/urchin/seastar/core/future.hh:1302
then<sstables::parse(sstables::random_access_reader&, sstables::compression&)::<lambda()>::<lambda()> > at /home/vladz/work/urchin/seastar/core/future.hh:890
 (inlined by) operator() at /home/vladz/work/urchin/sstables/sstables.cc:875
 (inlined by) apply at /home/vladz/work/urchin/seastar/core/apply.hh:36
 (inlined by) apply<sstables::parse(sstables::random_access_reader&, sstables::compression&)::<lambda()> > at /home/vladz/work/urchin/seastar/core/apply.hh:44
 (inlined by) apply<sstables::parse(sstables::random_access_reader&, sstables::compression&)::<lambda()> > at /home/vladz/work/urchin/seastar/core/future.hh:1302
operator()<seastar::future_state<> > at /home/vladz/work/urchin/seastar/core/future.hh:900
 (inlined by) run at /home/vladz/work/urchin/seastar/core/future.hh:395
seastar::reactor::run_tasks(seastar::circular_buffer<std::unique_ptr<seastar::task, std::default_delete<seastar::task> >, std::allocator<std::unique_ptr<seastar::task, std::default_delete<seastar::task> > > >&) at /home/vladz/work/urchin/seastar/core/reactor.cc:2317
seastar::reactor::run() at /home/vladz/work/urchin/seastar/core/reactor.cc:2775
seastar::app_template::run_deprecated(int, char**, std::function<void ()>&&) at /home/vladz/work/urchin/seastar/core/app-template.cc:142
2017-08-24 11:44:14 +02:00
Paweł Dziepak
31afc2f242 shared_index_lists: restore indentation
Message-Id: <20170821162934.25386-4-pdziepak@scylladb.com>
2017-08-22 12:09:42 +02:00
Paweł Dziepak
93eaa95378 sstables: make shared_index_lists::get_or_load exception safe
Message-Id: <20170821162934.25386-3-pdziepak@scylladb.com>
2017-08-22 12:09:42 +02:00
Avi Kivity
ef85cf1cb3 Merge "Compress in-memory compression-info" from Botond
"Overly large metadata can hog memory which especially hurts in setups
with bad disk/memory ratio. To ease the pain compress the in-memory
compression-info.

The compression is implemented based on Avi's idea which is to group n
offsets together into segments, where each segment stores a base
absolute offset into the file, the other offsets in the segments being
relative offsets (and thus of reduced size).  Also offsets are allocated
only just enough bits to store their maximum value. The offsets are thus
packed in a buffer like so:
    arrrarrrarrr...
where n is 4, a is an absolute offset and r are offsets relative to a.
This of course means that stored offsets will not be aligned, not even
on a byte boundary, but the size reduction is pretty convincing.
In addition, segments are stored in buckets, where each bucket has its
own base offset. Segments in a bucket are also optimized to
address as large a chunk of the data as possible for a given chunk
size."

Ref #1946.

* 'bdenes/compress-compression-v3' of https://github.com/denesb/scylla:
  Add unit test for compress::offsets
  Optimise the storage of compression chunk offsets
  Add script to precompute segmented compression parameters
2017-08-22 10:30:58 +03:00
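The "arrrarrr..." layout described in the merge message above can be sketched as follows. This is only an illustration under simplifying assumptions: offsets are kept as plain Python integers rather than bit-packed, buckets are left out, and the names encode_segments and offset_at are hypothetical and do not come from the Scylla sources.

```python
def encode_segments(offsets, n):
    """Group offsets into segments of n entries (hypothetical helper).

    Each segment stores its first offset as an absolute base and the
    remaining n - 1 offsets relative to that base.
    """
    segments = []
    for start in range(0, len(offsets), n):
        group = offsets[start:start + n]
        base = group[0]
        segments.append((base, [o - base for o in group[1:]]))
    return segments


def offset_at(segments, n, i):
    """Recover the i-th original offset from the segmented form."""
    base, relative = segments[i // n]
    k = i % n
    return base if k == 0 else base + relative[k - 1]


offsets = [0, 4100, 8150, 12300, 16000, 20111]
segments = encode_segments(offsets, n=4)
recovered = [offset_at(segments, 4, i) for i in range(len(offsets))]
```

The relative offsets stay small (bounded by roughly n times the chunk size), which is why they need fewer bits than an absolute file offset.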
Botond Dénes
028c7a0888 Optimise the storage of compression chunk offsets
To reduce the memory footprint of compression-info, n offsets are
grouped together into segments, where each segment stores a base
absolute offset into the file, the other offsets in the segments being
relative offsets (and thus of reduced size). Also offsets are
allocated only just enough bits to store their maximum value. The
offsets are thus packed in a buffer like so:
     arrrarrrarrr...
where n is 4, a is an absolute offset and r are offsets relative to a.

The optimal value of n can be calculated for a given file_size (f) and
chunk_size (c), by finding the minima of the following function:

f(n) = (f/c)/n * (log2(f) + (n - 1)*log2((n-1)*(c + 64)))

This is done in an empirical way, using a script (see below).

Furthermore segments are stored in buckets, where each bucket has its
own base offset. Each bucket therefore can address an equal chunk of the
file and furthermore each segment in a bucket can address an equal
sub-chunk of this area.
The value of a given offset i is thus:
    bucket_base_offset_for(i) + segment_base_offset_for(i) + offset(i)

To account for the bucketed storage we calculate a local_f, which is
optimized so that a bucketful of segmented offsets can address the
largest possible chunk of f. As the value of this local_f only depends
on the bucket_size (b) and c, the value of n can be made independent of
f and therefore only depends on one dynamic value, c. This makes life
much simpler as we don't need to know the size of the file up-front; we
can just append buckets to the storage on demand, while the required
storage is still less than a third [1] of the original storage
requirements (std::deque<uint64>).

The table with the minima(f(n)) for different f and c values is
pre-computed by gen_segmented_compress_params.py and
stored in sstables/segmented_compress_params.hh. This script also
creates a table with the best values of local_f for the given
bucket_size. At runtime we only select the best params based on c.

[1] This was calculated for c=4K and b=4K
2017-08-21 17:06:12 +03:00
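The empirical search for the best n in the commit above can be sketched by evaluating the cost function from the message over a range of candidates. This is not the actual gen_segmented_compress_params.py: the names segment_bits and best_segment_size, the max_n search bound, and the use of fractional (unrounded) bit counts are all assumptions made for illustration.

```python
import math


def segment_bits(n, file_size, chunk_size):
    """Total bits needed for all offsets with segments of n offsets.

    Mirrors the formula in the commit message:
    (f/c)/n * (log2(f) + (n - 1) * log2((n - 1) * (c + 64)))
    """
    base_bits = math.log2(file_size)
    if n == 1:
        # No relative offsets: every offset is an absolute one.
        return (file_size / chunk_size) * base_bits
    relative_bits = math.log2((n - 1) * (chunk_size + 64))
    return (file_size / chunk_size) / n * (base_bits + (n - 1) * relative_bits)


def best_segment_size(file_size, chunk_size, max_n=256):
    """Brute-force the n in [1, max_n] that minimizes segment_bits()."""
    return min(range(1, max_n + 1),
               key=lambda n: segment_bits(n, file_size, chunk_size))


f, c = 2**40, 4096          # example values: 1 TiB file, 4 KiB chunks
n = best_segment_size(f, c)
```

A brute-force scan is enough here because the candidate range is small and the result is computed offline into a table.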
Avi Kivity
9f415ef870 sstables: accurate summary entry size calculation
Calculate the summary entry size correctly, so we don't end up with oversize
summaries.
Message-Id: <20170819184255.14181-2-avi@scylladb.com>
2017-08-21 14:28:57 +02:00
Avi Kivity
17c372bf0e sstables: get rid of 64kB minimum index advance to generate summary
Limiting summary entry generation to at most one summary entry
per 64k of index data can lead to large index pages, with thousands
of index entries per summary entry. These are slow to parse, and there
is no real gain from the limit, since we already enforce a size
limit on the summary.

Remove the limit and allow summary entry generation based solely on
spanned data size.

Fixes #2711.
Message-Id: <20170819184255.14181-1-avi@scylladb.com>
2017-08-21 14:26:44 +02:00
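One possible reading of "summary entry generation based solely on spanned data size", combined with the ratio of 1 summary byte per 2000 data bytes mentioned in 8726ee937d, is sketched below. The function name, the entry-size estimate (8-byte offset, 2-byte key length, key bytes) and the cumulative budget are assumptions for illustration, not the actual sstable writer code.

```python
SUMMARY_BYTE_COST = 2000  # assumed: data bytes "paid" per summary byte


def summary_sample_points(partitions, byte_cost=SUMMARY_BYTE_COST):
    """Pick which partitions get a summary entry (hypothetical sketch).

    partitions: iterable of (key: bytes, data_offset: int), in file order.
    A summary entry is emitted only once the data file has advanced far
    enough to pay for the previous entries.
    """
    next_data_offset = 0
    sampled = []
    for key, data_offset in partitions:
        if data_offset >= next_data_offset:
            entry_size = 8 + 2 + len(key)  # assumed on-disk entry size
            next_data_offset += entry_size * byte_cost
            sampled.append((key, data_offset))
    return sampled


# 100 partitions of 1000 data bytes each, with 10-byte keys.
partitions = [(b"k%09d" % i, i * 1000) for i in range(100)]
sampled = summary_sample_points(partitions)
```

With 10-byte keys each entry is assumed to take 20 bytes, so one entry is emitted per 40000 bytes of data regardless of how many index entries fall in between.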
Raphael S. Carvalho
10eaa2339e compaction: Make resharding go through compaction manager
Two reasons for this change:
1) Every compaction should be multiplexed through the manager, which in
turn decides when to schedule it. Improvements to the manager will
immediately benefit every existing compaction type.
2) The active tasks metric will now track ongoing reshard jobs.

Fixes #2671.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20170817224334.6402-1-raphaelsc@scylladb.com>
2017-08-20 11:35:14 +03:00
Paweł Dziepak
784dcbf1ca sstables: initialise index metrics on all shards
Fixes #2702.

Message-Id: <20170816085454.21554-1-pdziepak@scylladb.com>
2017-08-16 15:44:26 +03:00
Botond Dénes
611774b1d9 Use the incremental reader for compaction
Leveled compaction strategy stands to gain the most from
incrementally opening sstables.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <292648d3fa4ea97376c0b4360754a20132194f63.1502822066.git.bdenes@scylladb.com>
2017-08-15 21:38:04 +03:00
Duarte Nunes
7fb6a74302 combined_mutation_reader: Drop exhausted readers if not in FF mode
Exhausted readers can be fast forwarded, so we have to keep them
around. However, if the current reader is not fast forwardable, then
we can drop those readers and their buffers.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2017-08-14 14:37:27 +02:00
Raphael S. Carvalho
050a7019b8 sstables/index_reader: fix index reader for summary entry spanning lots of keys
Using quantity prevents index_reader from reading all index entries of a
summary entry that spans more than min_index_interval entries. That can
happen after the introduction of size-based sampling, and consequently the
sstable will not be able to return a key whose logical position in the
summary entry is beyond min_index_interval. It's ok to not use quantity
because index_reader will read all index entries until either the next
summary entry or the end of file is reached.

Fixes test_sstable_conforms_to_mutation_source

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20170812045821.25269-1-raphaelsc@scylladb.com>
2017-08-12 09:44:16 +03:00
Raphael S. Carvalho
872412d31a db/config: introduce sstable_summary_ratio option
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2017-08-11 01:36:21 -03:00
Raphael S. Carvalho
8726ee937d sstables: introduce size-based sampling for sstable summary
Currently, a summary entry is added after min_index_interval index
entries were written. Not taking into account size of index entries
becomes a problem with large partitions which may create big index
entries due to promoted indexes. Read performance is affected as a
consequence because index entries spanned by summary are all read
from disk to serve request.

What we want to do is to also add a summary entry after the index reaches
a boundary. To deal with oversampling, we want to write 1 byte to the
summary for every 2000 bytes written to the data file (this will
eventually be made into an option in the config file).
Both conditions must be met to avoid under- or oversampling.
That way, the amount of data needed from the index file to satisfy the
request is drastically reduced.

Fixes #1842.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2017-08-11 00:30:12 -03:00
Raphael S. Carvalho
da7489720b sstables: make components_writer::offset const qualified and uint64_t
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2017-08-10 21:48:11 -03:00
Raphael S. Carvalho
881c479be8 sstables: make writer::offset const qualified and uint64_t
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2017-08-10 21:46:39 -03:00
Botond Dénes
94fc550e68 sstable_set::incremental_selector: select() now returns a selection
A selection contains, in addition to the list of sstables, a next_token
which is a hint as to what is the next best token to call select() with.
This should be the smallest token such that at the next call to
select() the least number of new sstables will be returned, without
skipping any.
2017-08-09 16:27:33 +03:00
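The idea of a selection carrying a next_token hint, as described in the commit above, might be sketched like this. It is a simplified interpretation, not the sstable_set implementation: sstables are modelled as (name, first_token, last_token) tuples with inclusive integer token ranges, and the hint is taken to be the smallest first_token beyond the queried token.

```python
def select(sstables, token):
    """Return (selected sstable names, next_token hint).

    Hypothetical sketch: next_token is the smallest token at which an
    sstable not yet seen starts, or None if there is none.
    """
    selected = [name for name, first, last in sstables
                if first <= token <= last]
    later_starts = [first for _, first, _ in sstables if first > token]
    next_token = min(later_starts) if later_starts else None
    return selected, next_token


sstables = [("a", 0, 10), ("b", 5, 20), ("c", 30, 40)]
first_selection = select(sstables, 0)
second_selection = select(sstables, first_selection[1])
```

Calling select() again with the returned hint walks through the token space without skipping the start of any sstable.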
Raphael S. Carvalho
dddbd34b52 sstables: close index file when sstable writer fails
The index's file output stream uses write-behind, but it's not closed
when the sstable write fails, and that may lead to a crash.
The same happened before for the data file (where it is obviously easier
to reproduce) and was fixed by 0977f4fdf8.

Fixes #2673.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20170807171146.10243-1-raphaelsc@scylladb.com>
2017-08-08 09:53:14 +03:00
Duarte Nunes
569bbf2edd sstables/sstables: Use per-cpu noop_write_monitor
We employ a thread-per-core architecture, so don't go about sharing
seastar::shared_ptrs across cpus.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20170801144153.17354-1-duarte@scylladb.com>
2017-08-01 18:10:49 +03:00
Avi Kivity
db7329b1cb Merge "Ensure correct EOC for PI block cell names" from Duarte
"This series ensures that we always write correct cell names to promoted
index cell blocks, taking into account the eoc of range tombstones.

Fixes #2333"

* 'pi-cell-name/v1' of github.com:duarten/scylla:
  tests/sstable_mutation_test: Test promoted index blocks are monotonic
  sstables: Consider eoc when flushing pi block
  sstables: Extract out converting bound_kind to eoc
2017-08-01 18:09:07 +03:00
Avi Kivity
1e8bb972b6 compaction: fix iteration in leveled compaction droppable tombstones loop
Since get_level_count() is unsigned, it will never be negative, and
the loop may never terminate.

Message-Id: <20170719133502.13316-1-avi@scylladb.com>
2017-08-01 13:40:36 +03:00
Avi Kivity
ba2e170e4b compaction: fix return in leveled compaction droppable tombstones loop
If the loop ever terminates, we need to return something.

Message-Id: <20170719133508.13374-1-avi@scylladb.com>
2017-08-01 13:33:02 +03:00
Duarte Nunes
1a33cc6847 sstables: Release the flush permit before fsyncing
This allows a queued flush to start while we fsync the current
sstable, which helps reduce the overall time new writes are blocked on
dirty memory.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2017-07-31 12:40:19 +02:00
Duarte Nunes
784a078e72 sstables: Introduce write_monitor
The write_monitor provides callbacks to inform an observer of the
state of the ongoing sstable write.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2017-07-31 12:40:19 +02:00
Avi Kivity
e855a28fae Revert "Merge "memtable flush: Fixes and improvements" from Duarte"
This reverts commit 733a64a1df, reversing
changes made to e11e66723a.

Breaks sstable_test and perf_fast_forward.
2017-07-31 12:44:28 +03:00
Duarte Nunes
5e64839e85 sstables: Release the flush permit before fsyncing
This allows a queued flush to start while we fsync the current
sstable, which helps reduce the overall time new writes are blocked on
dirty memory.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2017-07-27 21:09:18 +02:00
Duarte Nunes
a737577881 sstables: Introduce write_monitor
The write_monitor provides callbacks to inform an observer of the
state of the ongoing sstable write.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2017-07-27 21:09:18 +02:00
Duarte Nunes
06728bdfe9 sstables: Consider eoc when flushing pi block
When flushing a promoted index block using a range tombstone cell name
as a bound, use the right eoc value instead of always writing
composite::eoc::none.

Fixes #2333

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2017-07-27 18:23:58 +02:00
Duarte Nunes
718517ed91 sstables: Extract out converting bound_kind to eoc
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2017-07-27 18:23:58 +02:00
Paweł Dziepak
7b0f75c0d1 sstables: avoid indirect calls to abstract_type::is_multi_cell()
2017-07-26 14:38:27 +01:00
Paweł Dziepak
28c105e4a7 sstables: avoid copying key components
2017-07-26 14:38:27 +01:00
Paweł Dziepak
960a140880 index_reader: advance_and_check_if_present() use index_comparator
2017-07-26 14:36:37 +01:00
Paweł Dziepak
dc7bad9a50 sstables: cache token in index entries
When a sstable reader is fast forwarded some index entries may be read
(and compared) multiple times. This patch makes sure that once a token
is computed we keep it around and reuse if the entry is accessed again.
2017-07-26 14:36:37 +01:00
Paweł Dziepak
bfb7b56c74 sstable: keep a pre-computed token in summary_entry
Each sstable index lookup involves a binary search in the summary and
each time a partition key of summary entry is compared with anything its
token needs to be calculated.
Since we keep summary in the memory all the time it is better to also
keep the tokens around.
2017-07-26 14:36:36 +01:00
Paweł Dziepak
31d7cfdefb sstables: introduce decorated_key_view
2017-07-26 14:36:36 +01:00
Paweł Dziepak
e0a04cb7fe sstables: make sure that fill_buffer() actually fills buffer
streamed_mutation::impl::fill_buffer() is supposed to either push
mutation fragments to the buffer or set EOS flag. However, it was
possible that mp_row_consumer would return proceed::no if a skip was
needed without satisfying any of these conditions.
2017-07-26 14:36:36 +01:00
Avi Kivity
c5ee62a6a4 Merge "restrict background writers with scheduling groups" from Glauber
"This patchset restricts background writers - such as compactions,
streaming flushes and memtable flushes to a maximum amount of CPU usage
through a seastar::thread_scheduling_group.

The recommended maximum is 50%. It is disabled by default, but can be
adjusted through a configuration option until we are able to auto-tune
this.

The second patch in this series provides a preview of what such
auto-tuning would look like. By implementing a simple controller we
automatically adjust the quota for the memtable writer processes, so that
the rate at which bytes come in is equal to the rate at which bytes are
flushed.

Tail latencies are greatly reduced by this series, and heavy spikes that
previously appeared on CPU-bound workloads are no more."

* 'memtable-controller-v5' of https://github.com/glommer/scylla:
  simple controller for memtable/streaming writer shares.
  restrict background writers to 50 % of CPU.
2017-07-20 10:58:53 +03:00
Tomasz Grabiec
a9237c1666 schema: Revert back to the 1.7 layout of static compact tables in memory
We are using C* 3.x compatible layout in schema tables but want to
keep using the 1.7 layout in memory for compatibility during rolling
upgrade. This patch switches the schema and schema_builder classes
back to the old layout. Translation of layout happens when converting
to/from schema mutations.

Notable changes:

 1) Includes a revert of commit 6260f31e08
    "thrift: Update CQL mapping of static CFs".

 2) Brings back the "default_validation_class" schema attribute. In v3
    it can be derived from column definitions, but in v2 it can't, so
    we have to store it.

 3) legacy_schema_migrator and schema_builder don't have to do
    conversions to v3, this is now handled by the v3_columns
    class. schema_builder works with the same layout as schema, that
    is v2.

 4) Includes a revert of commit 66991a7ccb
    "v3 schema test fixes"

Fixes #2555.
2017-07-19 09:52:15 +02:00
Raphael S. Carvalho
7ecedac222 compaction: wire up time window compaction strategy
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2017-07-19 02:58:37 -03:00