Commit Graph

12947 Commits

Author SHA1 Message Date
Tomasz Grabiec
dc3c8863f3 sstables: Fix reader returning partition past the query range in some cases
If index was used to skip to the next partition (because the current
partition wasn't consumed in full) and reader's partition range ends
before the data file ends, we did not detect that we're out of range
before returning a streamed_mutation. Fix by checking _context.eof()
before doing that.

Refs #2733.
2017-08-28 10:16:27 +02:00
Tomasz Grabiec
6baad2c2e6 sstables: Introduce data_consume_context::eof() 2017-08-28 09:19:43 +02:00
Avi Kivity
238877a0c6 Revert "Revert "Revert "Merge "Compress in-memory compression-info" from Botond"""
This reverts commit 9d27455744. It's still broken.

To reproduce:

  ./tools/bin/cassandra-stress write -schema compression=LZ4Compressor

(on a clean database)

.0  0x00007ffff32aa69b in raise () from /lib64/libc.so.6
.1  0x00007ffff32ac4a0 in abort () from /lib64/libc.so.6
.2  0x000000000054a0e8 in seastar::memory::abort_on_underflow (size=<optimized out>) at core/memory.cc:1189
.3  seastar::memory::allocate_large (size=<optimized out>) at core/memory.cc:1194
.4  0x000000000054b305 in seastar::memory::allocate (size=size@entry=18446744073702885265) at core/memory.cc:1227
.5  0x000000000054b45e in malloc (n=n@entry=18446744073702885265) at core/memory.cc:1452
.6  0x00000000006013e4 in seastar::temporary_buffer<char>::temporary_buffer (this=0x6010195fc800, size=18446744073702885265) at /home/avi/urchin/seastar/core/temporary_buffer.hh:72
.7  0x0000000000a3908b in seastar::input_stream<char>::read_exactly (this=0x6010053d0248, n=18446744073702885265) at /home/avi/urchin/seastar/core/iostream-impl.hh:189
.8  0x0000000000a9c77f in compressed_file_data_source_impl::get (this=0x6010053d0240) at sstables/compress.cc:499
.9  0x0000000000aa1b01 in seastar::data_source::get (this=<optimized out>) at /home/avi/urchin/seastar/core/iostream.hh:63
.10 seastar::future<> seastar::input_stream<char>::consume<sstables::data_consume_rows_context>(sstables::data_consume_rows_context&)::{lambda()#1}::operator()() const (__closure=__closure@entry=0x6010195fcab0) at /home/avi/urchin/seastar/core/iostream-impl.hh:204
.11 0x0000000000aa22f0 in seastar::futurize<seastar::future<seastar::bool_class<seastar::stop_iteration_tag> > >::apply<seastar::future<> seastar::input_stream<char>::consume<sstables::data_consume_rows_context>(sstables::data_consume_rows_context&)::{lambda()#1}&>(sstables::data_consume_rows_context&&) (func=...) at /home/avi/urchin/seastar/core/future.hh:1312
.12 seastar::repeat<seastar::future<> seastar::input_stream<char>::consume<sstables::data_consume_rows_context>(sstables::data_consume_rows_context&)::{lambda()#1}>(sstables::data_consume_rows_context&&) (action=...) at /home/avi/urchin/seastar/core/future-util.hh:203
.13 0x0000000000a9e730 in seastar::input_stream<char>::consume<sstables::data_consume_rows_context> (consumer=..., this=<optimized out>) at /home/avi/urchin/seastar/core/iostream-impl.hh:237
.14 data_consumer::continuous_data_consumer<sstables::data_consume_rows_context>::consume_input<sstables::data_consume_rows_context> (c=..., this=<optimized out>) at sstables/consumer.hh:226
.15 sstables::data_consume_context::impl::read (this=<optimized out>) at sstables/row.cc:411
.16 sstables::data_consume_context::read (this=<optimized out>) at sstables/row.cc:437
.17 0x0000000000aafbae in sstables::sstable_streamed_mutation::fill_buffer()::{lambda()#2}::operator()() const::{lambda()#1}::operator()() const (__closure=<optimized out>) at sstables/partition.cc:843
.18 seastar::apply_helper<sstables::sstable_streamed_mutation::fill_buffer()::{lambda()#2}::operator()() const::{lambda()#1}, std::tuple<>&&, std::integer_sequence<unsigned long> >::apply({lambda()#2}&&, std::tuple) (args=..., func=...) at ./seastar/core/apply.hh:36
.19 seastar::apply<sstables::sstable_streamed_mutation::fill_buffer()::{lambda()#2}::operator()() const::{lambda()#1}>(sstables::sstable_streamed_mutation::fill_buffer()::{lambda()#2}::operator()() const::{lambda()#1}&&, std::tuple<>&&) (args=..., func=...)
    at ./seastar/core/apply.hh:44
.20 seastar::futurize<seastar::future<> >::apply<sstables::sstable_streamed_mutation::fill_buffer()::{lambda()#2}::operator()() const::{lambda()#1}>(sstables::sstable_streamed_mutation::fill_buffer()::{lambda()#2}::operator()() const::{lambda()#1}&&, std::tuple<>&&) (args=...,
    func=...) at ./seastar/core/future.hh:1302
.21 seastar::future<>::then<sstables::sstable_streamed_mutation::fill_buffer()::{lambda()#2}::operator()() const::{lambda()#1}, seastar::future<> >(sstables::sstable_streamed_mutation::fill_buffer()::{lambda()#2}::operator()() const::{lambda()#1}&&) (
    this=this@entry=0x6010195fcbb0, func=...) at ./seastar/core/future.hh:890
.22 0x0000000000ac273f in sstables::sstable_streamed_mutation::fill_buffer()::{lambda()#2}::operator()() const (__closure=0x6010195fcc28) at sstables/partition.cc:843
.23 seastar::do_until_continued<sstables::sstable_streamed_mutation::fill_buffer()::{lambda()#2}, sstables::sstable_streamed_mutation::fill_buffer()::{lambda()#1}>(sstables::sstable_streamed_mutation::fill_buffer()::{lambda()#1}&&, sstables::sstable_streamed_mutation::fill_buffer()::{lambda()#2}&&, seastar::promise<>) (stop_cond=..., action=..., p=...) at /home/avi/urchin/seastar/core/future-util.hh:155
.24 0x0000000000ac29c3 in seastar::do_until<sstables::sstable_streamed_mutation::fill_buffer()::{lambda()#2}, sstables::sstable_streamed_mutation::fill_buffer()::{lambda()#1}>(sstables::sstable_streamed_mutation::fill_buffer()::{lambda()#1}&&, sstables::sstable_streamed_mutation::fill_buffer()::{lambda()#2}&&) (action=..., stop_cond=..., this=<optimized out>) at /home/avi/urchin/seastar/core/future-util.hh:330
.25 sstables::sstable_streamed_mutation::fill_buffer (this=<optimized out>) at sstables/partition.cc:844
.26 0x0000000000ad3d2b in streamed_mutation::fill_buffer (this=0x6010195fcd10) at ./streamed_mutation.hh:489
.27 consume_flattened_in_thread<stable_flattened_mutations_consumer<compact_for_compaction<sstables::compacting_sstable_writer> >, std::function<bool (streamed_mutation const&)> >(mutation_reader&, stable_flattened_mutations_consumer<compact_for_compaction<sstables::compacting_sstable_writer> >&, std::function<bool (streamed_mutation const&)>&&) (

(gdb) p addr
$1 = {
  chunk_start = 13330037,
  chunk_len = 18446744073702885265,
  offset = 0
}
2017-08-27 13:32:37 +03:00
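
Note on the chunk_len value printed by gdb in 238877a0c6 above: 18446744073702885265 equals 2^64 - 6666351, which is the kind of result an unsigned 64-bit subtraction produces when the second operand is the larger one. The commit message does not state the root cause, so the sketch below only illustrates the arithmetic. The helper names and the next_start value are hypothetical and were chosen to reproduce the printed number.

```cpp
#include <cstdint>

// Hypothetical helper: length of a chunk as the distance between two offsets.
// With unsigned operands, next_start < chunk_start wraps around modulo 2^64
// instead of producing a negative number.
std::uint64_t chunk_length(std::uint64_t chunk_start, std::uint64_t next_start) {
    return next_start - chunk_start;
}

// Distance of a wrapped value from 2^64, i.e. the magnitude of the
// "negative" result. Unsigned negation is well defined in C++.
std::uint64_t wrapped_magnitude(std::uint64_t v) {
    return std::uint64_t{0} - v;
}
```

A length like this, passed on as an allocation size, would be consistent with the abort_on_underflow frame in the backtrace.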
Avi Kivity
576e33149f Merge seastar upstream
* seastar 0083ee8...85ca12d (1):
  > Merge "Run-time logging configuration" from Jesse

Includes patch from Jesse:

"Switch to Seastar for logging option handling

In addition to updating the abstraction layer for Seastar logging in `log.hh`,
the configuration system (`db/config.{hh,cc}`) has been updated in two ways:

- The string-map type for Boost.Program_options is now defined in Seastar.

- A configuration value can be marked as `UsedFromSeastar`. This is like `Used`,
  except the option is expected to be defined in the Boost.Program_options
  description for Seastar. If the option is not defined in Seastar, or it is
  defined with a different type, then a run-time exception is thrown early in
  Scylla's initialization. This is necessary because logging options which are
  now defined in Seastar were previously defined in Scylla and support for these
  options in the YAML file cannot be dropped. In order to be able to verify that
  options marked `UsedFromSeastar` are actually defined in Seastar, the
  interface for adding options to `db::config` has changed from taking a
  `boost::program_options::options_description_easy_init` (which is a handle into
  a `boost::program_options::options_description` which only allows adding
  options) to taking a `boost::program_options::options_description`
  directly (which also allows querying existing options).

Scylla also fully defers to Seastar's support for run-time logging
configuration."

Signed-off-by: Jesse Haber-Kucharsky <jhaberku@scylladb.com>
Message-Id: <ef26cffb91bef1ae95d508187a6dd861a6c4fc84.1503344007.git.jhaberku@scylladb.com>
2017-08-27 13:11:33 +03:00
Avi Kivity
4f5b5bc8e6 Merge seastar upstream
* seastar b9f1eb7...0083ee8 (1):
  > http: Add MIME type support for JSON
2017-08-27 13:09:04 +03:00
Jesse Haber-Kucharsky
af95d3baa7 db/config.cc: Remove unused function
Signed-off-by: Jesse Haber-Kucharsky <jhaberku@scylladb.com>
Message-Id: <5a4e4e153c2d87e838d1cf6def7a494a92a72f63.1503344007.git.jhaberku@scylladb.com>
2017-08-27 13:08:19 +03:00
Vlad Zolotarov
9b9f19606f scylla_cpuset_setup: add the description near the perftune.yaml removing
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
Message-Id: <1503600250-25169-1-git-send-email-vladz@scylladb.com>
2017-08-27 12:51:12 +03:00
Asias He
68346f7e53 repair: Use with_semaphore for sp_parallelism_semaphore
Instead of calling semaphore.signal() manually.

Message-Id: <51b7ecdebac91763a2340fe00959742810614845.1503648936.git.asias@scylladb.com>
2017-08-27 12:50:38 +03:00
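
The point of 68346f7e53 above is that a scoped helper returns semaphore units on every exit path, which a manual signal() call can miss when the guarded code throws. The following is a simplified, synchronous sketch of that pattern. counting_sketch and with_units are hypothetical stand-ins, not Seastar's semaphore or with_semaphore, which operate on futures.

```cpp
#include <cstddef>
#include <stdexcept>

// Hypothetical non-blocking counter standing in for a semaphore.
class counting_sketch {
    std::size_t _units;
public:
    explicit counting_sketch(std::size_t units) : _units(units) {}
    bool try_wait() {
        if (_units == 0) {
            return false;
        }
        --_units;
        return true;
    }
    void signal() { ++_units; }
    std::size_t available() const { return _units; }
};

// Takes one unit, runs func, and returns the unit when the scope exits,
// whether func returns normally or throws.
template <typename Func>
bool with_units(counting_sketch& sem, Func func) {
    if (!sem.try_wait()) {
        return false;
    }
    struct releaser {
        counting_sketch& sem;
        ~releaser() { sem.signal(); }
    } guard{sem};
    func();
    return true;
}

// Usage: the unit is available again even though the job failed.
std::size_t units_after_failed_job() {
    counting_sketch sem(1);
    try {
        with_units(sem, [] { throw std::runtime_error("job failed"); });
    } catch (const std::runtime_error&) {
        // The error still reaches the caller; only the unit is restored.
    }
    return sem.available();
}
```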
Avi Kivity
2b3ee4b0a7 Merge "make cf drop more robust" from Glauber
"We have recently found two problems with the drop_column_family code
that need addressing. The first is that exceptions in truncate() may
lead to stop() being skipped, which can cause Scylla to crash.

The other is that a truncate() issued before drop_column_family may get
the chance to execute only after the column family is already dropped
and also crash (That is issue 2726).

The second problem is the classic problem of asynchronous execution on
an object that may terminate, which we have been traditionally solving
with a gate. We add a gate to the column family that will be closed
during CF stop(), and we will require all asynchronous operations to
enter it.

The immediate fix is for truncate(), where we have seen a real, concrete
problem. But it would be good to audit other code paths to make sure
that they are sane.

The most obvious ones, flush, compaction and sstable deletion are
already sane, since they are waited on explicitly during stop()."

Fixes #2726.

* 'issue-2726-v2-master' of github.com:glommer/scylla:
  database: add gate for generic async operations to column family
  database: make sure that column family is always stopped when dropped
2017-08-27 12:42:20 +03:00
Tomasz Grabiec
2ca99be27d ring_position_view: Print token instead of token pointer
Broken in e989d65539.
Message-Id: <1503667158-7544-1-git-send-email-tgrabiec@scylladb.com>
2017-08-25 14:25:21 +01:00
Glauber Costa
83323e155e database: add gate for generic async operations to column family
run_with_compaction_disabled(), which is called by truncate, has a
pretty large defer point in remove(). When the code gets to finally
execute, we can't guarantee that the column family will still be alive.

That is true in particular if we issued a drop table command following
truncate: by the time truncate gets to resume, the CF will be gone.
Before the column family is dropped, it will always call its stop()
method, which means we have an opportunity to do some waiting there. We
already wait for flushes and current compactions to end.

Traditionally, we have been solving similar problems by adding a gate
that will catch asynchronous operations and making sure that potentially
asynchronous operations will enter the gate before executing. Let's do
the same thing here. We will close() the gate during stop().

Fixes #2726

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2017-08-24 13:12:57 -04:00
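
The gate approach described in 83323e155e above can be sketched roughly as follows. This is a simplified, synchronous illustration under stated assumptions: gate_sketch and column_family_sketch are hypothetical names, and a real asynchronous gate would additionally let close() wait until every entered operation has left.

```cpp
#include <cstddef>
#include <stdexcept>

// Hypothetical, synchronous stand-in for a gate.
class gate_sketch {
    std::size_t _in_flight = 0;
    bool _closed = false;
public:
    // Registers an operation; refuses once the gate has been closed.
    void enter() {
        if (_closed) {
            throw std::runtime_error("gate closed");
        }
        ++_in_flight;
    }
    void leave() { --_in_flight; }
    void close() { _closed = true; }
    std::size_t in_flight() const { return _in_flight; }
};

// Hypothetical column family holding the gate.
class column_family_sketch {
    gate_sketch _async_gate;
    bool _stopped = false;
public:
    // Returns false when the operation is rejected because stop() already ran.
    bool truncate() {
        try {
            _async_gate.enter();
        } catch (const std::runtime_error&) {
            return false;
        }
        // ... the deferred work would run here, while the gate is held ...
        _async_gate.leave();
        return true;
    }
    void stop() {
        _async_gate.close();   // no new operations are admitted from here on
        _stopped = true;
    }
    bool stopped() const { return _stopped; }
};
```

In this sketch a truncate() that arrives after stop() is turned away at the gate instead of touching an object that is going away.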
Glauber Costa
d090e7be35 database: make sure that column family is always stopped when dropped
truncate can throw exceptions. If it does, cf->stop() will never be
called because it is contained in a .then clause instead of finally.

One of the things that truncate does - in a finally block of its own -
is initiate a final compaction. If it returns an exception nobody will
wait for that compaction to finish (since cf->stop() is the one doing
that) and we'll crash.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2017-08-24 13:01:47 -04:00
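
The difference between chaining stop() in a .then clause and in a finally clause, as described in d090e7be35 above, can be illustrated with plain exceptions. This is only an analogy in synchronous C++; cf_sketch and both drop functions are hypothetical and do not reproduce the futures-based code.

```cpp
#include <exception>
#include <stdexcept>

// Hypothetical column family; truncate() can be told to fail.
struct cf_sketch {
    bool stopped = false;
    void truncate(bool fail) {
        if (fail) {
            throw std::runtime_error("truncate failed");
        }
    }
    void stop() { stopped = true; }
};

// "then" style: stop() is chained after truncate() and is skipped on error.
void drop_then_style(cf_sketch& cf, bool fail) {
    cf.truncate(fail);
    cf.stop();
}

// "finally" style: stop() runs on both paths, then the error is rethrown.
void drop_finally_style(cf_sketch& cf, bool fail) {
    std::exception_ptr error;
    try {
        cf.truncate(fail);
    } catch (...) {
        error = std::current_exception();
    }
    cf.stop();
    if (error) {
        std::rethrow_exception(error);
    }
}

// Runs one of the two variants with a failing truncate() and reports
// whether the column family ended up stopped.
bool stopped_after_failure(bool use_finally) {
    cf_sketch cf;
    try {
        if (use_finally) {
            drop_finally_style(cf, true);
        } else {
            drop_then_style(cf, true);
        }
    } catch (const std::runtime_error&) {
        // Expected: truncate() failed in both variants.
    }
    return cf.stopped;
}
```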
Avi Kivity
40aeb00151 Merge "consider the pre-existing cpuset.conf when configuring networking mode" from Vlad
"Preserve the networking configuration mode during the upgrade by generating the /etc/scylla.d/perftune.yaml
file and using it."

Fixes #2725.

* 'dist_respect_cpuset_conf-v3' of https://github.com/vladzcloudius/scylla:
  scylla_prepare: respect the cpuset.conf when configuring the networking
  scylla_cpuset_setup: rm perftune.yaml
  scylla_cpuset_setup: add a missing "include" of scylla_lib.sh
2017-08-24 18:53:22 +03:00
Vlad Zolotarov
c72eb34b89 scylla_prepare: respect the cpuset.conf when configuring the networking
Choose the networking configuration mode according to the current contents of /etc/scylla.d/cpuset.conf.

If it doesn't exist - use the default mode.
If it exists - use the mode that has been used for generation of the CPU set.

Store the configuration into the /etc/scylla.d/perftune.yaml

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2017-08-24 09:09:40 -04:00
Vlad Zolotarov
89285a13ac scylla_cpuset_setup: rm perftune.yaml
scylla_setup resets our configuration and perftune.yaml is a part of it.
perftune.yaml is generated based on the contents of cpuset.conf therefore we should reset
these together.

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2017-08-24 09:09:40 -04:00
Vlad Zolotarov
d0ccfe34b9 scylla_cpuset_setup: add a missing "include" of scylla_lib.sh
The scylla_cpuset_setup uses a verify_args() function that is defined in the scylla_lib.sh.

Fixes #2716

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2017-08-24 09:09:40 -04:00
Paweł Dziepak
1006a946e8 mvcc: allow invoking maybe_merge_versions() inside allocating section
Message-Id: <20170823083544.4225-1-pdziepak@scylladb.com>
2017-08-24 14:30:38 +02:00
Botond Dénes
839d1db4d3 parse(compression): add missing reinterpret_cast<char*>
std::copy_n was using value as uint64_t*, smashing the stack.
Also remove unused variable.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <4e2d71fc74326965dfd98bed2347100fb6ebe43b.1503568210.git.bdenes@scylladb.com>
2017-08-24 13:38:03 +03:00
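
The pitfall named in 839d1db4d3 above is general: std::copy_n writes one destination element per source element, so copying bytes through a uint64_t* writes eight bytes per source byte. The log does not show the actual parsing code, so the sketch below is an assumption-based illustration of the corrected form only; read_raw_u64 and round_trips are hypothetical names.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstring>

// Copies sizeof(value) raw bytes from src into value.
// Writing through a char* makes std::copy_n advance one byte per element.
// Passing &value directly (a uint64_t*) would instead write one 8-byte
// element per source byte and run past the end of value.
std::uint64_t read_raw_u64(const char* src) {
    std::uint64_t value = 0;
    std::copy_n(src, sizeof(value), reinterpret_cast<char*>(&value));
    return value;
}

// Round trip used as a check: serialize with memcpy, read back with copy_n.
bool round_trips(std::uint64_t original) {
    char bytes[sizeof(original)];
    std::memcpy(bytes, &original, sizeof(original));
    return read_raw_u64(bytes) == original;
}
```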
Avi Kivity
9d27455744 Revert "Revert "Merge "Compress in-memory compression-info" from Botond""
This reverts commit 9656fd79a0. A fix is now
available.
2017-08-24 13:37:35 +03:00
Tomasz Grabiec
9656fd79a0 Revert "Merge "Compress in-memory compression-info" from Botond"
This reverts commit ef85cf1cb3, reversing
changes made to de011ece52.

Vlad reports that this causes SIGSEGV on cluster restarts.

seastar::backtrace_buffer::append_backtrace() at /home/vladz/work/urchin/seastar/core/reactor.cc:274
 (inlined by) print_with_backtrace at /home/vladz/work/urchin/seastar/core/reactor.cc:289
seastar::print_with_backtrace(char const*) at /home/vladz/work/urchin/seastar/core/reactor.cc:296
sigsegv_action at /home/vladz/work/urchin/seastar/core/reactor.cc:3512
 (inlined by) operator() at /home/vladz/work/urchin/seastar/core/reactor.cc:3498
 (inlined by) _FUN at /home/vladz/work/urchin/seastar/core/reactor.cc:3494
?? ??:0
operator()<seastar::temporary_buffer<char> > at /home/vladz/work/urchin/sstables/sstables.cc:870
 (inlined by) apply at /home/vladz/work/urchin/seastar/core/apply.hh:36
 (inlined by) apply<sstables::parse(sstables::random_access_reader&, sstables::compression&)::<lambda()>::<lambda()>::<lambda()>::<lambda(auto:104)>, seastar::temporary_buffer<char> > at /home/vladz/work/urchin/seastar/core/apply.hh:44
 (inlined by) do_void_futurize_apply_tuple<sstables::parse(sstables::random_access_reader&, sstables::compression&)::<lambda()>::<lambda()>::<lambda()>::<lambda(auto:104)>, seastar::temporary_buffer<char> > at /home/vladz/work/urchin/seastar/core/future.hh:1270
 (inlined by) apply<sstables::parse(sstables::random_access_reader&, sstables::compression&)::<lambda()>::<lambda()>::<lambda()>::<lambda(auto:104)>, seastar::temporary_buffer<char> > at /home/vladz/work/urchin/seastar/core/future.hh:1290
 (inlined by) then<sstables::parse(sstables::random_access_reader&, sstables::compression&)::<lambda()>::<lambda()>::<lambda()>::<lambda(auto:104)> > at /home/vladz/work/urchin/seastar/core/future.hh:890
 (inlined by) operator() at /home/vladz/work/urchin/sstables/sstables.cc:873
 (inlined by) do_until_continued<sstables::parse(sstables::random_access_reader&, sstables::compression&)::<lambda()>::<lambda()>::<lambda()>, sstables::parse(sstables::random_access_reader&, sstables::compression&)::<lambda()>::<lambda()>::<lambda()>&> at /home/vladz/work/urchin/seastar/core/future-util.hh:155
do_until<sstables::parse(sstables::random_access_reader&, sstables::compression&)::<lambda()>::<lambda()>::<lambda()>, sstables::parse(sstables::random_access_reader&, sstables::compression&)::<lambda()>::<lambda()>::<lambda()>&> at /home/vladz/work/urchin/seastar/core/future-util.hh:330
 (inlined by) operator() at /home/vladz/work/urchin/sstables/sstables.cc:874
 (inlined by) apply at /home/vladz/work/urchin/seastar/core/apply.hh:36
 (inlined by) apply<sstables::parse(sstables::random_access_reader&, sstables::compression&)::<lambda()>::<lambda()> > at /home/vladz/work/urchin/seastar/core/apply.hh:44
 (inlined by) apply<sstables::parse(sstables::random_access_reader&, sstables::compression&)::<lambda()>::<lambda()> > at /home/vladz/work/urchin/seastar/core/future.hh:1302
then<sstables::parse(sstables::random_access_reader&, sstables::compression&)::<lambda()>::<lambda()> > at /home/vladz/work/urchin/seastar/core/future.hh:890
 (inlined by) operator() at /home/vladz/work/urchin/sstables/sstables.cc:875
 (inlined by) apply at /home/vladz/work/urchin/seastar/core/apply.hh:36
 (inlined by) apply<sstables::parse(sstables::random_access_reader&, sstables::compression&)::<lambda()> > at /home/vladz/work/urchin/seastar/core/apply.hh:44
 (inlined by) apply<sstables::parse(sstables::random_access_reader&, sstables::compression&)::<lambda()> > at /home/vladz/work/urchin/seastar/core/future.hh:1302
operator()<seastar::future_state<> > at /home/vladz/work/urchin/seastar/core/future.hh:900
 (inlined by) run at /home/vladz/work/urchin/seastar/core/future.hh:395
seastar::reactor::run_tasks(seastar::circular_buffer<std::unique_ptr<seastar::task, std::default_delete<seastar::task> >, std::allocator<std::unique_ptr<seastar::task, std::default_delete<seastar::task> > > >&) at /home/vladz/work/urchin/seastar/core/reactor.cc:2317
seastar::reactor::run() at /home/vladz/work/urchin/seastar/core/reactor.cc:2775
seastar::app_template::run_deprecated(int, char**, std::function<void ()>&&) at /home/vladz/work/urchin/seastar/core/app-template.cc:142
2017-08-24 11:44:14 +02:00
Alexys Jacob
a133290694 scylla_io_setup: migrate away from deprecated string.atoi
Python 2.0 deprecated string.atoi and we should move away from it
as stated here: https://docs.python.org/2/library/string.html#string.atoi

Signed-off-by: Alexys Jacob <ultrabug@gentoo.org>
Message-Id: <20170817134002.28124-1-ultrabug@gentoo.org>
2017-08-24 12:36:34 +03:00
Avi Kivity
dcac7125fe Merge seastar upstream
* seastar e96881a...b9f1eb7 (9):
  > httpd: indentation patch
  > httpd: handle exception when shutting down
  > stall-detector: Allow backtrace throttling to be configured
  > stall-detector: Fix messages about suppresssion not appearing
  > scripts: posix_net_conf.sh: allow passing a perftune.py configuration file as a parameter
  > scripts: perftune.py: add the possibility to pass the parameters in a configuration file and print the YAML file with the current configuration
  > scripts: perftune.py: actually use the number of Rx queues when comparing to the number of CPU threads
  > core: make current_backtrace() noexcept
  > memory: add large allocation detector stubs for default allocator
2017-08-24 11:35:28 +03:00
Piotr Jastrzebski
477068d2c3 Make streamed_mutation more exception safe
Make sure that push_mutation_fragment leaves
_buffer_size with a correct value if an exception
is thrown from emplace_back.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <83398412aa78332d88d91336b79140aecc988602.1503474403.git.piotr@scylladb.com>
2017-08-23 09:37:04 +01:00
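
One common way to get the property described in 477068d2c3 above is to update the size counter only after emplace_back has succeeded. The log does not show how the actual patch does it, so the following is a sketch under that assumption; fragment_sketch and buffer_sketch are hypothetical types.

```cpp
#include <cstddef>
#include <deque>
#include <stdexcept>

// Hypothetical fragment whose construction can fail.
struct fragment_sketch {
    std::size_t bytes;
    explicit fragment_sketch(std::size_t b) : bytes(b) {
        if (b == 0) {
            throw std::invalid_argument("empty fragment");
        }
    }
};

// Hypothetical buffer that tracks the total size of its fragments.
class buffer_sketch {
    std::deque<fragment_sketch> _buffer;
    std::size_t _buffer_size = 0;
public:
    void push(std::size_t bytes) {
        _buffer.emplace_back(bytes);   // may throw; nothing has been counted yet
        _buffer_size += bytes;         // only reached when the insert succeeded
    }
    std::size_t buffer_size() const { return _buffer_size; }
    std::size_t fragments() const { return _buffer.size(); }
};

// Returns true when push() failed with the expected exception.
bool push_fails(buffer_sketch& b, std::size_t bytes) {
    try {
        b.push(bytes);
        return false;
    } catch (const std::invalid_argument&) {
        return true;
    }
}
```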
Avi Kivity
2f41ed8493 Merge "repair: Do not allow repair until node is in NORMAL status" from Asias
Fixes #2723.

* tag 'asias/repair_issue_2723_v1' of github.com:cloudius-systems/seastar-dev:
  repair: Do not allow repair until node is in NORMAL status
  gossip: Add is_normal helper
2017-08-23 09:44:45 +03:00
Asias He
69c81bcc87 repair: Do not allow repair until node is in NORMAL status
The following backtrace was reported by a user who was running repair while repeatedly restarting the node.

 #0  0x00007eff077281d7 in raise () from /lib64/libc.so.6
 #1  0x00007eff07729a08 in abort () from /lib64/libc.so.6
 #2  0x00007eff07721146 in __assert_fail_base () from /lib64/libc.so.6
 #3  0x00007eff077211f2 in __assert_fail () from /lib64/libc.so.6
 #4  0x00000000010ef2c2 in locator::token_metadata::first_token_index (this=0x641000214e98, start=...) at locator/token_metadata.cc:133
 #5  0x00000000010ef2d9 in locator::token_metadata::first_token (this=0x641000214e98, start=...) at locator/token_metadata.cc:143
 #6  0x00000000010e329d in locator::abstract_replication_strategy::get_natural_endpoints (this=0x641000494000, search_token=...)
     at locator/abstract_replication_strategy.cc:66
 #7  0x0000000001481186 in get_neighbors (hosts=std::vector of length 0, capacity 0, data_centers=std::vector of length 0, capacity 0,
     range=<error reading variable: access outside bounds of object referenced via synthetic pointer>, ksname=..., db=...) at repair/repair.cc:196
 #8  repair_range<nonwrapping_range<dht::token> > (range=..., ri=...) at repair/repair.cc:781
 #9  <lambda(auto:99&)>::<lambda(auto:100&&)>::<lambda(auto:101&)>::<lambda()>::operator() (__closure=0x7efec07f7460) at repair/repair.cc:1005
 #10 futurize<future<bool_class<stop_iteration_tag> > >::apply<repair_ranges(repair_info)::<lambda(auto:99&)>::

It is reproduced with

1) while true; do curl -X POST --header "Content-Type: application/json" --header "Accept: application/json" "http://127.0.0.1:10000/storage_service/repair_async/ks3"; done

2) start node 127.0.0.1, stop node 127.0.0.1 in a loop

The problem is, during boot up, the token_metadata is not replicated to all shards until
the node goes into NORMAL status.

To fix, check that the node is in NORMAL status before allowing repair.

Fixes #2723
2017-08-23 14:40:04 +08:00
Asias He
65912dd1ac gossip: Add is_normal helper
It will be used by repair to check if a node is in NORMAL status.
2017-08-23 14:40:04 +08:00
Amnon Heiman
abbd78367c Add configuration to disable per keyspace and column family metrics
The number of keyspace and column family metrics reported is
proportional to the number of shards times the number of keyspaces/column
families.

This can cause a performance issue both on the reporting system and on
the collecting system.

This patch adds a configuration flag (set to false by default) to enable
or disable those metrics.

Fixes #2701

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <20170821113843.1036-1-amnon@scylladb.com>
2017-08-22 19:19:54 +03:00
Botond Dénes
4f42acc956 abstract_marker::raw::prepare: add missing return statement
The function doesn't return a value in the all-false branch.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <3c1976682ffc190d741c066d942b83be4463cae8.1503402721.git.bdenes@scylladb.com>
2017-08-22 15:06:18 +03:00
Paweł Dziepak
9d82a1ebfd abstract_read_executor: make make_requests() exception safe
Message-Id: <20170821162934.25386-5-pdziepak@scylladb.com>
2017-08-22 12:09:42 +02:00
Paweł Dziepak
31afc2f242 shared_index_lists: restore indentation
Message-Id: <20170821162934.25386-4-pdziepak@scylladb.com>
2017-08-22 12:09:42 +02:00
Paweł Dziepak
93eaa95378 sstables: make shared_index_lists::get_or_load exception safe
Message-Id: <20170821162934.25386-3-pdziepak@scylladb.com>
2017-08-22 12:09:42 +02:00
Avi Kivity
ef85cf1cb3 Merge "Compress in-memory compression-info" from Botond
"Overly large metadata can hog memory which especially hurts in setups
with bad disk/memory ratio. To ease the pain compress the in-memory
compression-info.

The compression is implemented based on Avi's idea which is to group n
offsets together into segments, where each segment stores a base
absolute offset into the file, the other offsets in the segments being
relative offsets (and thus of reduced size).  Also offsets are allocated
only just enough bits to store their maximum value. The offsets are thus
packed in a buffer like so:
    arrrarrrarrr...
where n is 4, a is an absolute offset and r are offsets relative to a.
This of course means that stored offsets will not be aligned, not even
on a byte boundary, but the size reduction is pretty convincing.
In addition, segments are stored in buckets, where each bucket has its
own base offset. Segments in a bucket are also optimized to
address as large a chunk of the data as possible for a given chunk
size."

Ref #1946.

* 'bdenes/compress-compression-v3' of https://github.com/denesb/scylla:
  Add unit test for compress::offsets
  Optimise the storage of compression chunk offsets
  Add script to precompute segmented compression parameters
2017-08-22 10:30:58 +03:00
Botond Dénes
62c18da35c Add unit test for compress::offsets 2017-08-21 17:06:20 +03:00
Botond Dénes
028c7a0888 Optimise the storage of compression chunk offsets
To reduce the memory footprint of compression-info, n offsets are
grouped together into segments, where each segment stores a base
absolute offset into the file, the other offsets in the segments being
relative offsets (and thus of reduced size). Also offsets are
allocated only just enough bits to store their maximum value. The
offsets are thus packed in a buffer like so:
     arrrarrrarrr...
where n is 4, a is an absolute offset and r are offsets relative to a.

The optimal value of n can be calculated for a given file_size (f) and
chunk_size (c), by finding the minimum of the following function:

f(n) = (f/c)/n * (log2(f) + (n - 1)*log2((n-1)*(c + 64)))

This is done in an empirical way, using a script (see below).

Furthermore segments are stored in buckets, where each bucket has its
own base offset. Each bucket therefore can address an equal chunk of the
file and furthermore each segment in a bucket can address an equal
sub-chunk of this area.
The value of a given offset i is thus:
    bucket_base_offset_for(i) + segment_base_offset_for(i) + offset(i)

To account for the bucketed storage we calculate a local_f, which is
optimized so that a bucketful of segmented offsets can address the
largest possible chunk of f. As the value of this local_f depends only on
the bucket_size (b) and c the value of n can be made independent of f
and therefore only depend on one dynamic value, c. This makes life much
simpler as we don't need to know the size of the file up-front, we can
just append buckets to the storage on demand, while the required storage
is still less than a third [1] of the original storage requirements
(std::deque<uint64>).

The table with the minima(f(n)) for different f and c values is
pre-computed by gen_segmented_compress_params.py and
stored in sstables/segmented_compress_params.hh. This script also
creates a table with the best values of local_f for the given
bucket_size. At runtime we only select the best params based on c.

[1] This was calculated for c=4K and b=4K
2017-08-21 17:06:12 +03:00
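
The cost function from 028c7a0888 above can be evaluated directly to find the group size n with the smallest estimated storage. The sketch below is an assumption-based illustration, not the logic of gen_segmented_compress_params.py: the function names, the search bound max_n and the special case for n = 1 (where the formula would otherwise evaluate log2(0)) are all hypothetical.

```cpp
#include <cmath>

// Estimated number of bits needed to store all chunk offsets when n offsets
// are grouped per segment, following the formula in the commit message:
//   (f/c)/n * (log2(f) + (n - 1) * log2((n - 1) * (c + 64)))
double estimated_bits(double file_size, double chunk_size, unsigned n) {
    double chunks = file_size / chunk_size;
    double absolute_bits = std::log2(file_size);
    // With n == 1 there are no relative offsets, so the second term is zero.
    double relative_bits = 0.0;
    if (n > 1) {
        relative_bits = (n - 1) * std::log2((n - 1) * (chunk_size + 64.0));
    }
    return chunks / n * (absolute_bits + relative_bits);
}

// Brute-force search for the n in [1, max_n] with the smallest estimate.
unsigned best_group_size(double file_size, double chunk_size, unsigned max_n) {
    unsigned best = 1;
    double best_cost = estimated_bits(file_size, chunk_size, 1);
    for (unsigned n = 2; n <= max_n; ++n) {
        double cost = estimated_bits(file_size, chunk_size, n);
        if (cost < best_cost) {
            best = n;
            best_cost = cost;
        }
    }
    return best;
}
```

For example, best_group_size(1073741824.0, 4096.0, 64) searches group sizes up to 64 for a 1 GiB file with 4 KiB chunks; both inputs are arbitrary illustration values.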
Avi Kivity
de011ece52 main: deprecate non-murmur3 partitioners more forcefully
Some (most?) users don't read logs or release notes, so they won't notice
that the ByteOrdered and Random partitioners were deprecated in 2.0. Make
them notice by refusing to start with a deprecated partitioner, unless a
switch is explicitly enabled.
Message-Id: <20170820073424.8331-1-avi@scylladb.com>
2017-08-21 14:32:22 +02:00
Avi Kivity
9f415ef870 sstables: accurate summary entry size calculation
Calculate the summary entry size correctly, so we don't end up with oversize
summaries.
Message-Id: <20170819184255.14181-2-avi@scylladb.com>
2017-08-21 14:28:57 +02:00
Avi Kivity
17c372bf0e sstables: get rid of 64kB minimum index advance to generate summary
Limiting summary entry generation to at most one summary entry
per 64k of index data can lead to large index pages, with thousands
of index entries per summary entry. These are slow to parse, and there
is no real gain from the limit, since we already enforce a size
limit on the summary.

Remove the limit and allow summary entry generation based solely on
spanned data size.

Fixes #2711.
Message-Id: <20170819184255.14181-1-avi@scylladb.com>
2017-08-21 14:26:44 +02:00
Avi Kivity
81a33df25d dht: reduce split_range_to_single_shard contiguous memory demand
split_range_to_single_shard() returns a vector of size 4096, with
each element (a partition_range) of size 100. The total of 400k can
cause defragmentation if memory is fragmented.

Fix by using a deque.

Fixes #2707.
Message-Id: <20170819141017.28287-1-avi@scylladb.com>
2017-08-21 14:25:45 +02:00
Piotr Jastrzebski
c602ffd610 Make Scylla ttl expiration behave like in Cassandra
Fixes #2497

[tgrabiec: reworked the title]

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <2f5a99dce6ef11fe0ef135c9fa0592078fc9a056.1502886874.git.piotr@scylladb.com>
2017-08-21 14:25:45 +02:00
Botond Dénes
eae33a1f19 Add script to precompute segmented compression parameters
The script generates sstables/segmented_compress_params.hh which
contains a list with the optimal number of grouped offsets for
different data and chunk sizes as well as a list with the best
nominal data sizes for different chunk sizes, given a bucket size.
Data sizes are in the range of [2**4,2**50] and chunks in the
range of [2**4, 2**30]. Data sizes that are not used with the current
bucket_size are omitted.
See next commit for details of how the calculated values are used.
2017-08-21 10:44:08 +03:00
Avi Kivity
5a2439e702 main: check for large allocations
Large allocations can require cache evictions to be satisfied, and can
therefore induce long latencies. Enable the seastar large allocation
warning so we can hunt them down and fix them.

Message-Id: <20170819135212.25230-1-avi@scylladb.com>
2017-08-21 10:25:40 +03:00
Pekka Enberg
318423d50b Merge seastar upstream
* seastar 2d16aca...e96881a (4):
  > memory: add detector for large allocations
  > memory: reduce large allocations for small pools
  > net: Fix potential NULL pointer dereference in udp.cc
  > Update dpdk submodule
2017-08-21 10:24:08 +03:00
Tomasz Grabiec
8f2ca52740 tests: Run test_query_only_static_row test case on all mutation sources
The test checks behavior common to all mutation readers, so it's
better to run it against all mutation sources rather than only for
cache reader.

Message-Id: <1503072333-17995-1-git-send-email-tgrabiec@scylladb.com>
2017-08-20 12:23:28 +03:00
Raphael S. Carvalho
10eaa2339e compaction: Make resharding go through compaction manager
Two reasons for this change:
1) Every compaction should be multiplexed to the manager, which in turn
decides when to schedule it. Improvements to the manager will
immediately benefit every existing compaction type.
2) The active tasks metric will now track ongoing reshard jobs.

Fixes #2671.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20170817224334.6402-1-raphaelsc@scylladb.com>
2017-08-20 11:35:14 +03:00
Takuya ASADA
38b2ff617f dist/redhat: follow the change on libgcc/libstdc++ package name
Since we moved to an external 3rdparty repository, we added a '53' suffix to the gcc
packages, so follow that change.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20170819092039.1090-2-syuu@scylladb.com>
2017-08-19 16:01:28 +03:00
Takuya ASADA
f1b5401d1f dist/redhat: Change g++ command name on CentOS
We have added a '-5.3' suffix to the g++ command as of scylla-gcc53-c++-5.3.1-2.2;
follow that change in the Scylla build script.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20170819092039.1090-1-syuu@scylladb.com>
2017-08-19 16:01:27 +03:00
Avi Kivity
e428805ba5 Merge "Optimize query result partition and row counts" from Duarte
"Now that range queries go through the normal digest path, we rely on
query::result::calculate_counts() to count the number of partitions
and rows returned.

This series optimizes it, in case it is needed, and also changes the
result message to include the partition and row counts, avoiding the
calculation altogether."

* 'calculate-counts/v3' of github.com:duarten/scylla:
  query-result: Send row and partition count over the wire
  query::result: Optimize calculate_counts()
2017-08-17 13:41:21 +03:00
Alexys Jacob
e5ff8efea3 dist: Fix Gentoo Linux scylla-jmx and scylla-tools packages detection
These two admin related packages will be packaged under the "app-admin"
category and not the "dev-db" one.

This fixes the detection path of the packages for scylla_setup.

Signed-off-by: Alexys Jacob <ultrabug@gentoo.org>
Message-Id: <20170817094756.21550-1-ultrabug@gentoo.org>
2017-08-17 13:20:43 +03:00
Nadav Har'El
7832d8a883 get rid of unused part in configure.py
Scylla's configure.py contains code that we copied from Seastar's
configure.py but that is no longer used. Let's get rid of some of it.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20170813150842.12603-1-nyh@scylladb.com>
2017-08-17 12:05:44 +03:00
Duarte Nunes
1e7f0eab82 memtable: Created readers should be fast forwardable by default
mutation_reader::forwarding defaults to yes.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20170816180304.2121-1-duarte@scylladb.com>
2017-08-17 10:21:01 +03:00