Commit Graph

12103 Commits

Author SHA1 Message Date
Piotr Jastrzebski
6528f3a963 Make sure mutation_reader for sstables can be fast-forwarded
Fixes #2145.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
[tgrabiec: Extracted from a series, fixed title]
Message-Id: <1495639745-19387-1-git-send-email-tgrabiec@scylladb.com>
2017-05-24 16:36:24 +01:00
Tomasz Grabiec
6cf2841654 mvcc: Extract partition_snapshot_reader to separate header
Right know whole world includes it transitively, which results in
painful recompiles when the code changes.

Relax dependencies.
Message-Id: <1495620201-8046-1-git-send-email-tgrabiec@scylladb.com>
2017-05-24 12:13:15 +01:00
Asias He
f792c78c96 streaming: Do not abort session too early in idle detection
Streaming ususally takes long time to complete. Abort it on false
positive idle detection can be very wasteful.

Increase the abort timeout from 10 minutes to a very large timeout, 300
minutes. The real idle session will be aborted eventually if other
mechanisms, e.g., streaming manager has gossip callback for on_remove
and on_restart event to abort, do not abort the session.

Fixes #2197

Message-Id: <57f81bfebfdc6f42164de5a84733097c001b394e.1494552921.git.asias@scylladb.com>
2017-05-24 12:29:50 +03:00
Paweł Dziepak
3b9c0a6ae2 Merge "loading_cache: fix the known complexity issue in the shrink() method" from Vlad
Use the boost::intrusive containers in order to achieve a O(1) complexity
for both "LRU list" update and to minimize the memory overhead in the hash
table item to "LRU list" item connection:
   - Make the timestamped_val be both a bi::list and a bi::unordered_set
     item.
   - Make a bi::unordered_set be a cache backend instead of the
     std::unordered_map.

As a result dropping k LRU items becomes an O(k) operation instead of
O(N log N), where N is a total number of all cached items:
   - Every time a value is read - move it to the front of the "LRU list"
     (O(1)).
   - When we need to remove k LRU items:
      - Repeat k times:
         - Take an element from the back of the "LRU list". (O(1)).
         - Remove it from the bi::unordered_set and dispose. (O(1)).
           We use an auto-unlink configuration for bi::list, therefore
           disposing an item is going to auto unlink it from the list.

* 'permissions_cache_move_to_intrusive-v1' of github.com:scylladb/seastar-dev:
  utils::loading_cache: cleanup
  utils/loading_cache.hh: use intrusive list to store the lru entry
  utils::loading_cache: implement automatic rehashing
  utils::loading_cache: make the underlying map to be an intrusive unordered_set
2017-05-23 16:18:16 +01:00
Tomasz Grabiec
cec8d7f38c gdb: Fix error about gdb.Value not being convertible to int by %x format
Message-Id: <1495538843-27777-1-git-send-email-tgrabiec@scylladb.com>
2017-05-23 15:38:58 +03:00
Avi Kivity
fd0e1eb1e2 Merge "Fixes for mutation algebra" from Tomasz
"Enforces commutativity of addition:

 m1 + m2 == m2 + m1

and consistency of difference and addition with equality:

 m1 + (m2 - m1) == m1 + m2"

* tag 'tgrabiec/fix-range-tombstone-commutativity-v2' of github.com:cloudius-systems/seastar-dev:
  mutation: Make compare_*_for_merge() consistent with equals()
  tests: mutation: Improve assertion failure message
  tests: Use default equality in test_mutation_diff_with_random_generator
  mutation: Make counter cell difference consistent with apply
  tests: range_tombstone_list_test: Improve error message
  tests: range_tombstone_list: Check adjacent range merging
  range_tombstone_list: Merge adjacent range tombstones in apply()
  tests: mutation: Check commutativity of mutation addition
  range_tombstone_list: Avoid violating set invariant
  range_tombstone_list: Make tombstone merging commutative
  range_tombstone_list: Add erase() operation to the reverter
  range_tombstone_list: Make all undo operations ordered relative to each other
  utils: Extract to_boost_visitor() to a separate header
  allocating_strategy: Introduce alloc_strategy_unique_ptr<>
2017-05-23 15:20:38 +03:00
Tomasz Grabiec
804f46f684 mutation: Make compare_*_for_merge() consistent with equals()
equals() considers expiring cells to be different form non-expiring cells,
but compare_row_marker_for_merge() considers them equal. Fix the latter to
pick expiring cells. The choice was arbitrary.
2017-05-23 13:35:03 +02:00
Tomasz Grabiec
c1475a8eb2 tests: mutation: Improve assertion failure message 2017-05-23 13:16:03 +02:00
Tomasz Grabiec
d15880b3b7 tests: Use default equality in test_mutation_diff_with_random_generator 2017-05-23 13:16:03 +02:00
Tomasz Grabiec
9dbae279ad mutation: Make counter cell difference consistent with apply
The case when both cells are dead was not handled properly, the diff
was always empty, whereas the cell with higher timestamp should win.

Caused test_mutation_diff_with_random_generator to fail.
2017-05-23 13:16:03 +02:00
Tomasz Grabiec
951da421db tests: range_tombstone_list_test: Improve error message 2017-05-23 13:16:03 +02:00
Tomasz Grabiec
bee40b4628 tests: range_tombstone_list: Check adjacent range merging 2017-05-23 13:16:03 +02:00
Tomasz Grabiec
3c509308ab range_tombstone_list: Merge adjacent range tombstones in apply()
Needed for equivalence to work correctly with difference and addition:

  m1 + (m2 - m1) = m1 + m2

Fixes #2158.
2017-05-23 13:16:03 +02:00
Tomasz Grabiec
ef4c7c458c tests: mutation: Check commutativity of mutation addition 2017-05-23 12:11:12 +02:00
Tomasz Grabiec
1dea251ca2 range_tombstone_list: Avoid violating set invariant
The code was inserting an entry with the same key as its successor,
and only later adjusting the key of the old entry. This is violating
set's invariant of unique keys, and insertion may cause rebalancing. I
don't know if this violation actually causes problems currently, but
it's safer not to.

Fix by first updating the existing entry and then inserting the new
one.
2017-05-23 12:11:12 +02:00
Tomasz Grabiec
a2a22e5f00 range_tombstone_list: Make tombstone merging commutative
Example of non-commutative case:

 a = [1, 5]@t2
 b = {[2, 3]@t1, [4, 5]@t1}

 a + b = [1, 5]@t2
 b + a = [1, 4)@t2, [4, 5]@t2

After this patch, both will yield [1, 5]@t2.

The patch also changes the logic to handle overlaps of tombstones with
equal timestamps to be handled symmetrically. They are now merged
instead of split on either of the boundary.

Refs #2158.
2017-05-23 12:11:12 +02:00
Tomasz Grabiec
c4dac7c80f range_tombstone_list: Add erase() operation to the reverter 2017-05-23 12:11:12 +02:00
Tomasz Grabiec
935709cddc range_tombstone_list: Make all undo operations ordered relative to each other
Later operation may depend on the result of previous operation. Same dependency
is present when reverting the operations.

Fixes assertion failure in update reverter.
2017-05-23 12:11:12 +02:00
Vlad Zolotarov
2d4d198fb9 utils::loading_cache: cleanup
- Remove "_" at the beginning of the type names.
   - s/Pred/EqualPred/

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2017-05-22 23:02:18 -04:00
Vlad Zolotarov
fd59a548c0 utils/loading_cache.hh: use intrusive list to store the lru entry
Fix the shrink() O(n log n) complexity issue by constantly pushing the corresponding intrusive
list entry to the head of the list every time the values are read.

This will keep the list ordered by the last read time from the most recently read
to the least recently read entry.

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2017-05-22 23:00:18 -04:00
Vlad Zolotarov
0c4e9efce7 utils::loading_cache: implement automatic rehashing
- Start the cache with 256 buckets - the minimum number of buckets.
   - Limit the maximal number of buckets by 1M buckets.
   - Keep the load factor between 0.25 and 1.0 as long as the number of buckets is
     between the minimum and the maximum values mentioned above.
   - Grow and shrink the hash every "refresh" period if needed.
   - Enable bi::power_2_buckets and bi::compare_hash bi::unordered_set options.
   - Enable bi::unordered_set_base_hook's bi::store_hash option.

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2017-05-22 22:57:44 -04:00
Vlad Zolotarov
2be3596a4f utils::loading_cache: make the underlying map to be an intrusive unordered_set
Make the underlying map to be a boost::intrusive::unordered_set<timestamped_val>
instead of std::unordered_set<Key, timestamped_val>.

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2017-05-22 18:45:13 -04:00
Tomasz Grabiec
5aeb9eb70c utils: Extract to_boost_visitor() to a separate header 2017-05-22 19:30:02 +02:00
Tomasz Grabiec
69e2eccf68 allocating_strategy: Introduce alloc_strategy_unique_ptr<> 2017-05-22 19:30:02 +02:00
Raphael S. Carvalho
4b4a1883aa refresh: do not use default priority for loading new sstables
Metadata is read using default priority class, which can significantly
slow down the process under high load. Compaction class can be used,
and if it turns out to be a problem, we can switch to a special class
for it.

Fixes #1859.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20170517184546.17497-1-raphaelsc@scylladb.com>
2017-05-22 19:03:17 +03:00
Avi Kivity
ef428d008c Merge "reduce memory requirement for loading sstables" from Rapahel
"fixes a problem in which memory requirement for loading in-memory
components of sstables is very high due to unlimited parallelism."

* 'mem_requirement_sstable_load_v2_2' of github.com:raphaelsc/scylla:
  database: fix indentation of distributed_loader::open_sstable
  database: reduce memory requirement to load sstables
  sstables: loads components for a sstable in parallel
  sstables: enable read ahead for read of in-memory components
  sstables: make random_access_reader work with read ahead
2017-05-22 18:23:03 +03:00
Raphael S. Carvalho
28206993a4 database: fix indentation of distributed_loader::open_sstable
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2017-05-22 11:52:52 -03:00
Raphael S. Carvalho
a4e414cb3b database: reduce memory requirement to load sstables
SSTable load temporarily uses more space than needed to store metadata,
due to:
1) All components are read using read_simple() which uses 128k buffer.
file::dma_read_bulk() will allocate 128k, and may potentially allocate
another big buffer (128k - read) for file::read_maybe_eof().
2) read_filter() may use double the space it needs to.

Due to the fact that sstable loading parallelism is unlimited, Scylla
may require much more memory to load all sstables, and that may lead to
OOM. Higher the number of sstables higher the memory overhead.

To confirm this problem, I wrote a test[1] which loads 30k sstables in
parallel and reports the memory usage peak in the end.
When loading 30k sstables, each of which metadata is ~300kb, memory
usage peak was ~18G. When loading completed, only ~9GB were needed to
store all the metadata.
[1]: https://gist.github.com/raphaelsc/2db37b4fb34301833ab9eeed3b1a524d

To fix this problem, we need to set a limit on load parallelism (let's
start with a small number like 3 and adjust later if needed) and rely
on readahead so that the requirement drops considerably without
increasing boot time. Actually, boot time is improved by it.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Reviewed-by: Nadav Har'El <nyh@scylladb.com>
2017-05-22 11:52:51 -03:00
Raphael S. Carvalho
043fae2ef5 sstables: loads components for a sstable in parallel
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Reviewed-by: Nadav Har'El <nyh@scylladb.com>
2017-05-22 11:52:49 -03:00
Raphael S. Carvalho
0ac729fd57 sstables: enable read ahead for read of in-memory components
Read ahead 4 is used. Let's adjust it later if needed. File size is
used to prevent file_input_stream from issuing useless reads beyond
file size with read ahead enabled. We can switch to variant without
length once file_input_stream handles it properly.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2017-05-22 11:52:37 -03:00
Raphael S. Carvalho
77b8870cf3 sstables: make random_access_reader work with read ahead
Scylla crashes if read ahead is enabled by file_random_access_reader
because a call to seek() destroys the existing input stream without
closing it first.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2017-05-22 11:52:33 -03:00
Duarte Nunes
6ac73b57fb cql3/statements/select_statement: Remove dead code
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20170522100230.17393-1-duarte@scylladb.com>
2017-05-22 14:32:12 +03:00
Avi Kivity
5828ddcca4 Merge seastar upstream
* seastar 4af898c...68dbf60 (4):
  > dpdk: follow namespace changes to fix compile error
  > perftune.py: fix regression introduced in df5f74ac
  > doc: typo in README.md
  > posix_net: load-balance connections
2017-05-22 12:39:48 +03:00
Asias He
b56ba02335 gossip: Make bootstrap more robust
The bootstrapping node will be a gossip only member, until the streaming
finishes and the node becomes NORMAL state. If during this time, the
bootstrapping node is overwhelmed with streaming, it is possible the node will
delay the update the gossip heartbeat. Be forgiving for the bootstrapping node
and do not remove it from gossip too fast. Otherwise, streaming rpc verbs will
not be resent becasue the node is not in gossip membership anymore.

Fixes #2150

Message-Id: <286d7035d854f2a48abf4e1e2e3bfcb8b22b9ca2.1494553580.git.asias@scylladb.com>
2017-05-21 19:25:40 +03:00
Takuya ASADA
7777b558c4 dist/redhat: Use mock for CentOS/RHEL rpms
Enable mock for CentOS/RHEL, also support cross building by mock.

Fixes #630

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20170513171200.14926-1-syuu@scylladb.com>
2017-05-21 19:22:54 +03:00
Avi Kivity
2f23648b9e Revert "dist: add conflict with Cassandra"
This reverts commit da55aecca3.  Instead of an
install-time conflict, we'll add a run-time conflict.
2017-05-21 18:37:59 +03:00
Alexys Jacob
c8116b4252 scylla_raid_setup: fix typo on print_usage
Simple typo fix on the usage message output, the script name was not correct.

Signed-off-by: Alexys Jacob <ultrabug@gentoo.org>
Message-Id: <20170519145851.6205-1-ultrabug@gentoo.org>
2017-05-21 18:01:28 +03:00
Avi Kivity
5b182537db Merge seastar upstream
* seastar 8aef5f5...4af898c (4):
  > memory: fix debug build
  > tests: fix slab_test build
  > xen: fix fallouts from seastar namespace change
  > build: make swagger generated files depend on the code generator
2017-05-21 13:48:24 +03:00
Alexys Jacob
8dbad4f34a scylla_sysconfig_setup: fix typo on print_usage
Simple typo fix on the usage message output, the script name was not correct.

Signed-off-by: Alexys Jacob <ultrabug@gentoo.org>
Message-Id: <20170519143227.2741-1-ultrabug@gentoo.org>
2017-05-21 13:41:43 +03:00
Alexys Jacob
c0756d97b8 scylla_setup: fix typos on cpu scaling messages
This fixes typos on CPU scaling related messages.

Signed-off-by: Alexys Jacob <ultrabug@gentoo.org>
Message-Id: <20170519143703.3574-1-ultrabug@gentoo.org>
2017-05-21 13:41:42 +03:00
Glauber Costa
5f99158889 api: return correct values for bloom filter statistics
We are currently suspecting that the bloom filter false positive ratio
is not being respected. While trying to debug that, I found out that we
have a more basic problem:

The numbers are all meaningless, because the stats are wrong.  We are
accumulating by summing the ratios together. It's easy to see how this
doesn't work, if we look at an example where the ratio for some CFs is
zero:

SST1: false = 1, total = 2. ratio = 0.5
SST2: false = 0, total = 98 . ratio = 0.

The real ratio in this example is 1 / (98 + 2) = 1 %, but the displayed
ratio will be 0.5 + 0 = 0.5.

This patch will map reduce all the sstables together keeping both
numerator and denominator, yielding the right value at the end. To do
that, we'll reuse the existing ratio_holder class, which already does
exactly what we want.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20170518222333.16307-1-glauber@scylladb.com>
2017-05-21 13:11:22 +03:00
Avi Kivity
ebaeefa02b Merge seatar upstream (seastar namespace)
- introcduced "seastarx.hh" header, which does a "using namespace seastar";
 - 'net' namespace conflicts with seastar::net, renamed to 'netw'.
 - 'transport' namespace conflicts with seastar::transport, renamed to
   cql_transport.
 - "logger" global variables now conflict with logger global type, renamed
   to xlogger.
 - other minor changes
2017-05-21 12:26:15 +03:00
Avi Kivity
dab2783b58 Merge seastar upstream
* seastar 45b718b...f726938 (2):
  > memory: add --mbind option to supress warning message when running Seastar apps on container
  > Add support for Gentoo Linux irqbalance configuration detection.
2017-05-20 21:15:46 +03:00
Avi Kivity
c8cb3d6ff5 Merge "Materialized views: bug fixes and unit tests" from Duarte
"This series fixes bugs related to materialized views, most pertaining
to column filtering in the where clause."

* 'materialized-views/bug-fixes/v1' of https://github.com/duarten/scylla:
  tests/view_schema_test: Add more test cases
  tests/cql_assertions: Add assertion for row set equality
  single_column_relation: Correctly print IN relation
  statement_restrictions: Allow filtering regular columns for views
  statement_restrictions: Relax clustering restrictions for views
  statement_restrictions: Relax partition restrictions for views
  cql3/statements: Prevent setting default ttl on view
  cql3/restrictions: Complete implementation of is_satisfied_by()
  db/view: Re-implement clustering_prefix_matches()
  db/view: Re-implement partition_key_matches()
  db/view: Generate regular tombstone for base deletions
  db/view: Consider cell liveness when generating updates
  db/view: Don't generate view updates for static rows
2017-05-20 13:52:56 +03:00
Tomasz Grabiec
cd4d15672b utils: estimated_histogram: Fix clear()
It was a no-op. It doesn't seem currently used, but I will have a use
for it soon.
Message-Id: <1495198172-1969-1-git-send-email-tgrabiec@scylladb.com>
2017-05-19 14:34:34 +01:00
Paweł Dziepak
c560cf9d9d Merge "fixes and improvements in the permissions cache implementation" from Vlad
"There are numerous issues in the current implementation of permissions
cache starting from the logical errors and bugs and ending with the
suboptimal implementation described in the issue #2262."

* 'permissions_cache_fixes-v4' of github.com:scylladb/seastar-dev:
  utils::loading_cache: avoid the reads storm when the key is not in the cache
  utils::loading_cache: cleanup
  utils::loading_cache: align the constrains in the constructor with the parameters description
  utils::loading_cache: refresh in the background
  auth::auth: add operator<<() for a permission_cache key
  auth::auth::permissions_cache: use the values from the configuration - don't try to be smart
  db::config: define a saner default value for permissions_validity_in_ms
2017-05-18 13:33:05 +01:00
Vlad Zolotarov
6a63c87a9f utils::loading_cache: avoid the reads storm when the key is not in the cache
Use a mutex to serialize producers when the key is not present in the cache.

Fixes #2262

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2017-05-18 07:55:48 -04:00
Tomasz Grabiec
3fc1703ccf range: Fix SFINAE rule for picking the best do_lower_bound()/do_upper_bound() overload
mutation_partition has a slicing constructor which is supposed to copy
only the rows from the query range. The rows are located using
nonwrapping_range::lower_bound() and
nonwrapping_range::lower_bound(). Those two have two different
implementations chosen with SFINAE. One is using std::lower_bound(),
and one is using container's built in lower_bound() should it
exist. We're using intrusive tree in mutation_partition, so
container's lower_bound() is preferred. It's O(log N) whereas
std::lower_bound() is O(N), because tree's iterator is not random
access.

However, the current rule for picking container's lower_bound() never
triggers, because lower_bound() has two overloads in the container:

  ./range.hh:618:14: error: decltype cannot resolve address of overloaded function
              typename = decltype(&std::remove_reference<Range>::type::upper_bound)>
              ^~~~~~~~

As a result, the overload which uses std::lower_bound() is used.

Spotted when running perf_fast_forward with wide partition limit in
cache lifted off. It's so slow that I timeouted waiting for the result
(> 16 min).

Fixes #2395.

Message-Id: <1495048614-9913-1-git-send-email-tgrabiec@scylladb.com>
2017-05-18 13:28:10 +03:00
Avi Kivity
ba31619594 tests: fix partitioner_test for g++ 5
It can't make the leap from dht::ring_position to
stdx::optional<range_bound<dht::ring_position>> for some reason.
2017-05-18 13:09:41 +03:00
Pekka Enberg
30b5933db2 Merge "Add Gentoo Linux support to utility and setup scripts" from Alexys
"These patches add support to setup and operate ScyllaDB on Gentoo Linux.

 * scylla_setup and related scripts
 * node_health_check

 I have kept them as simple as possible and tested them to setup and operate
 succesfully a three nodes cluster running on Gentoo Linux."

* 'gentoo_linux_support' of github.com:ultrabug/scylla:
  scylla_setup: add gentoo linux installation detection
  prometheus node_exporter install: add support for gentoo linux
  raid setup: add support for gentoo linux
  ntp setup: add support for gentoo linux
  kernel check: add support for gentoo linux
  cpuscaling setup: add support for gentoo linux
  coredump setup: add support for gentoo linux
  detect gentoo linux on selinux setup
  add gentoo_variant detection and SYSCONFIG setup
2017-05-18 09:41:13 +03:00