Commit Graph

12077 Commits

Author SHA1 Message Date
Raphael S. Carvalho
28206993a4 database: fix indentation of distributed_loader::open_sstable
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2017-05-22 11:52:52 -03:00
Raphael S. Carvalho
a4e414cb3b database: reduce memory requirement to load sstables
SSTable load temporarily uses more space than needed to store metadata,
due to:
1) All components are read using read_simple() which uses 128k buffer.
file::dma_read_bulk() will allocate 128k, and may potentially allocate
another big buffer (128k - read) for file::read_maybe_eof().
2) read_filter() may use double the space it needs to.

Because sstable loading parallelism is unlimited, Scylla may require
much more memory to load all sstables, and that may lead to OOM. The
higher the number of sstables, the higher the memory overhead.

To confirm this problem, I wrote a test[1] which loads 30k sstables in
parallel and reports the memory usage peak in the end.
When loading 30k sstables, each with ~300 KB of metadata, peak memory
usage was ~18 GB. Once loading completed, only ~9 GB was needed to
store all the metadata.
[1]: https://gist.github.com/raphaelsc/2db37b4fb34301833ab9eeed3b1a524d

To fix this problem, we need to set a limit on load parallelism (let's
start with a small number like 3 and adjust later if needed) and rely
on readahead so that the requirement drops considerably without
increasing boot time. Actually, boot time is improved by it.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Reviewed-by: Nadav Har'El <nyh@scylladb.com>
2017-05-22 11:52:51 -03:00
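
The load-parallelism limit described in a4e414cb3b can be sketched as follows. This is a minimal Python illustration, not Scylla's C++ code: `load_all`, `load_one`, and the `asyncio.sleep(0)` stand-in for reading sstable components are assumptions made for the example; only the starting limit of 3 comes from the commit message.

```python
import asyncio

MAX_PARALLEL_LOADS = 3  # starting limit proposed in the commit message


async def load_all(sstable_ids, limit=MAX_PARALLEL_LOADS):
    """Load every sstable, never running more than `limit` loads at once."""
    gate = asyncio.Semaphore(limit)
    in_flight = 0
    peak = 0

    async def load_one(sstable_id):
        nonlocal in_flight, peak
        async with gate:  # waits here while `limit` loads are already running
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0)  # hypothetical stand-in for reading components
            in_flight -= 1
        return sstable_id

    loaded = await asyncio.gather(*(load_one(s) for s in sstable_ids))
    return loaded, peak


loaded, peak = asyncio.run(load_all(range(30)))
print(f"loaded {len(loaded)} sstables, peak parallelism {peak}")
```

Because the temporary buffers exist only while a load is in flight, bounding the number of concurrent loads also bounds the transient memory overhead.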
Raphael S. Carvalho
043fae2ef5 sstables: loads components for a sstable in parallel
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Reviewed-by: Nadav Har'El <nyh@scylladb.com>
2017-05-22 11:52:49 -03:00
Raphael S. Carvalho
0ac729fd57 sstables: enable read ahead for read of in-memory components
A read-ahead of 4 is used. Let's adjust it later if needed. The file size
is used to prevent file_input_stream from issuing useless reads beyond
the end of the file with read-ahead enabled. We can switch to the variant
without length once file_input_stream handles it properly.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2017-05-22 11:52:37 -03:00
Raphael S. Carvalho
77b8870cf3 sstables: make random_access_reader work with read ahead
Scylla crashes if read ahead is enabled by file_random_access_reader
because a call to seek() destroys the existing input stream without
closing it first.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2017-05-22 11:52:33 -03:00
Duarte Nunes
6ac73b57fb cql3/statements/select_statement: Remove dead code
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20170522100230.17393-1-duarte@scylladb.com>
2017-05-22 14:32:12 +03:00
Avi Kivity
5828ddcca4 Merge seastar upstream
* seastar 4af898c...68dbf60 (4):
  > dpdk: follow namespace changes to fix compile error
  > perftune.py: fix regression introduced in df5f74ac
  > doc: typo in README.md
  > posix_net: load-balance connections
2017-05-22 12:39:48 +03:00
Asias He
b56ba02335 gossip: Make bootstrap more robust
The bootstrapping node remains a gossip-only member until streaming
finishes and the node reaches the NORMAL state. If the bootstrapping node
is overwhelmed with streaming during this time, it may delay updating its
gossip heartbeat. Be forgiving toward the bootstrapping node and do not
remove it from gossip too quickly. Otherwise, streaming RPC verbs will not
be resent because the node is no longer a gossip member.

Fixes #2150

Message-Id: <286d7035d854f2a48abf4e1e2e3bfcb8b22b9ca2.1494553580.git.asias@scylladb.com>
2017-05-21 19:25:40 +03:00
Takuya ASADA
7777b558c4 dist/redhat: Use mock for CentOS/RHEL rpms
Enable mock for CentOS/RHEL and support cross-building with mock.

Fixes #630

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20170513171200.14926-1-syuu@scylladb.com>
2017-05-21 19:22:54 +03:00
Avi Kivity
2f23648b9e Revert "dist: add conflict with Cassandra"
This reverts commit da55aecca3.  Instead of an
install-time conflict, we'll add a run-time conflict.
2017-05-21 18:37:59 +03:00
Alexys Jacob
c8116b4252 scylla_raid_setup: fix typo on print_usage
Simple typo fix in the usage message output: the script name was incorrect.

Signed-off-by: Alexys Jacob <ultrabug@gentoo.org>
Message-Id: <20170519145851.6205-1-ultrabug@gentoo.org>
2017-05-21 18:01:28 +03:00
Avi Kivity
5b182537db Merge seastar upstream
* seastar 8aef5f5...4af898c (4):
  > memory: fix debug build
  > tests: fix slab_test build
  > xen: fix fallouts from seastar namespace change
  > build: make swagger generated files depend on the code generator
2017-05-21 13:48:24 +03:00
Alexys Jacob
8dbad4f34a scylla_sysconfig_setup: fix typo on print_usage
Simple typo fix in the usage message output: the script name was incorrect.

Signed-off-by: Alexys Jacob <ultrabug@gentoo.org>
Message-Id: <20170519143227.2741-1-ultrabug@gentoo.org>
2017-05-21 13:41:43 +03:00
Alexys Jacob
c0756d97b8 scylla_setup: fix typos on cpu scaling messages
This fixes typos on CPU scaling related messages.

Signed-off-by: Alexys Jacob <ultrabug@gentoo.org>
Message-Id: <20170519143703.3574-1-ultrabug@gentoo.org>
2017-05-21 13:41:42 +03:00
Glauber Costa
5f99158889 api: return correct values for bloom filter statistics
We are currently suspecting that the bloom filter false positive ratio
is not being respected. While trying to debug that, I found out that we
have a more basic problem:

The numbers are all meaningless, because the stats are wrong.  We are
accumulating by summing the ratios together. It's easy to see how this
doesn't work, if we look at an example where the ratio for some CFs is
zero:

SST1: false = 1, total = 2, ratio = 0.5
SST2: false = 0, total = 98, ratio = 0

The real ratio in this example is 1 / (98 + 2) = 1%, but the displayed
ratio will be 0.5 + 0 = 0.5.

This patch map-reduces all the sstables together, keeping both the
numerator and the denominator, which yields the right value at the end.
To do that, we'll reuse the existing ratio_holder class, which already
does exactly what we want.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20170518222333.16307-1-glauber@scylladb.com>
2017-05-21 13:11:22 +03:00
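
The arithmetic in 5f99158889 can be reproduced with a short Python sketch. The `RatioHolder` class below is a hypothetical stand-in written for this example; it only mirrors the idea of carrying numerator and denominator separately and is not the ratio_holder class from the Scylla source.

```python
from dataclasses import dataclass


@dataclass
class RatioHolder:
    """Hypothetical stand-in: keeps numerator and denominator apart."""
    num: int = 0
    den: int = 0

    def __add__(self, other):
        return RatioHolder(self.num + other.num, self.den + other.den)

    def ratio(self):
        return self.num / self.den if self.den else 0.0


# (false positives, total lookups) per sstable, taken from the commit message.
sstables = [(1, 2), (0, 98)]

# Old behaviour: add the per-sstable ratios together.
summed_ratios = sum(f / t for f, t in sstables)

# New behaviour: reduce numerators and denominators, divide once at the end.
combined = sum((RatioHolder(f, t) for f, t in sstables), RatioHolder())

print(summed_ratios, combined.ratio())
```

Summing ratios gives 0.5, while reducing the two counters first gives 1 / 100, matching the figures in the commit message.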
Avi Kivity
ebaeefa02b Merge seastar upstream (seastar namespace)
 - introduced "seastarx.hh" header, which does a "using namespace seastar";
 - 'net' namespace conflicts with seastar::net, renamed to 'netw'.
 - 'transport' namespace conflicts with seastar::transport, renamed to
   cql_transport.
 - "logger" global variables now conflict with logger global type, renamed
   to xlogger.
 - other minor changes
2017-05-21 12:26:15 +03:00
Avi Kivity
dab2783b58 Merge seastar upstream
* seastar 45b718b...f726938 (2):
  > memory: add --mbind option to suppress warning message when running Seastar apps on container
  > Add support for Gentoo Linux irqbalance configuration detection.
2017-05-20 21:15:46 +03:00
Avi Kivity
c8cb3d6ff5 Merge "Materialized views: bug fixes and unit tests" from Duarte
"This series fixes bugs related to materialized views, most pertaining
to column filtering in the where clause."

* 'materialized-views/bug-fixes/v1' of https://github.com/duarten/scylla:
  tests/view_schema_test: Add more test cases
  tests/cql_assertions: Add assertion for row set equality
  single_column_relation: Correctly print IN relation
  statement_restrictions: Allow filtering regular columns for views
  statement_restrictions: Relax clustering restrictions for views
  statement_restrictions: Relax partition restrictions for views
  cql3/statements: Prevent setting default ttl on view
  cql3/restrictions: Complete implementation of is_satisfied_by()
  db/view: Re-implement clustering_prefix_matches()
  db/view: Re-implement partition_key_matches()
  db/view: Generate regular tombstone for base deletions
  db/view: Consider cell liveness when generating updates
  db/view: Don't generate view updates for static rows
2017-05-20 13:52:56 +03:00
Tomasz Grabiec
cd4d15672b utils: estimated_histogram: Fix clear()
It was a no-op. It doesn't seem currently used, but I will have a use
for it soon.
Message-Id: <1495198172-1969-1-git-send-email-tgrabiec@scylladb.com>
2017-05-19 14:34:34 +01:00
Paweł Dziepak
c560cf9d9d Merge "fixes and improvements in the permissions cache implementation" from Vlad
"There are numerous issues in the current implementation of permissions
cache starting from the logical errors and bugs and ending with the
suboptimal implementation described in the issue #2262."

* 'permissions_cache_fixes-v4' of github.com:scylladb/seastar-dev:
  utils::loading_cache: avoid the reads storm when the key is not in the cache
  utils::loading_cache: cleanup
  utils::loading_cache: align the constrains in the constructor with the parameters description
  utils::loading_cache: refresh in the background
  auth::auth: add operator<<() for a permission_cache key
  auth::auth::permissions_cache: use the values from the configuration - don't try to be smart
  db::config: define a saner default value for permissions_validity_in_ms
2017-05-18 13:33:05 +01:00
Vlad Zolotarov
6a63c87a9f utils::loading_cache: avoid the reads storm when the key is not in the cache
Use a mutex to serialize producers when the key is not present in the cache.

Fixes #2262

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2017-05-18 07:55:48 -04:00
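
A minimal Python sketch of the idea in 6a63c87a9f, serializing producers on a cache miss so that concurrent requests for the same missing key trigger one load instead of a storm of reads. The `LoadingCache` class, `slow_loader`, and the single-lock layout are assumptions for illustration; the actual utils::loading_cache is C++ and may be structured differently.

```python
import asyncio


class LoadingCache:
    """Toy cache; a single lock serializes loads for missing keys."""

    def __init__(self, loader):
        self._loader = loader
        self._values = {}
        self._load_lock = asyncio.Lock()

    async def get(self, key):
        if key in self._values:          # fast path: no locking on a hit
            return self._values[key]
        async with self._load_lock:      # one producer at a time on a miss
            if key not in self._values:  # re-check: a peer may have loaded it
                self._values[key] = await self._loader(key)
            return self._values[key]


async def demo():
    calls = 0

    async def slow_loader(key):  # hypothetical stand-in for a system_auth read
        nonlocal calls
        calls += 1
        await asyncio.sleep(0.01)
        return f"permissions-for-{key}"

    cache = LoadingCache(slow_loader)
    results = await asyncio.gather(*(cache.get("alice") for _ in range(10)))
    return calls, results


loader_calls, results = asyncio.run(demo())
print(loader_calls, len(results))
```

The re-check inside the lock is what prevents the waiting callers from repeating the load once the first one has populated the entry.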
Tomasz Grabiec
3fc1703ccf range: Fix SFINAE rule for picking the best do_lower_bound()/do_upper_bound() overload
mutation_partition has a slicing constructor which is supposed to copy
only the rows from the query range. The rows are located using
nonwrapping_range::lower_bound() and
nonwrapping_range::upper_bound(). Each of those has two different
implementations chosen with SFINAE. One uses std::lower_bound(),
and the other uses the container's built-in lower_bound(), should it
exist. We're using an intrusive tree in mutation_partition, so the
container's lower_bound() is preferred. It's O(log N) whereas
std::lower_bound() is O(N), because the tree's iterator is not random
access.

However, the current rule for picking container's lower_bound() never
triggers, because lower_bound() has two overloads in the container:

  ./range.hh:618:14: error: decltype cannot resolve address of overloaded function
              typename = decltype(&std::remove_reference<Range>::type::upper_bound)>
              ^~~~~~~~

As a result, the overload which uses std::lower_bound() is used.

Spotted when running perf_fast_forward with the wide partition limit in
cache lifted. It was so slow that I gave up waiting for the result
(> 16 min).

Fixes #2395.

Message-Id: <1495048614-9913-1-git-send-email-tgrabiec@scylladb.com>
2017-05-18 13:28:10 +03:00
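
The dispatch rule that 3fc1703ccf repairs (prefer the container's own lower_bound() when it exists, otherwise fall back to a generic search) can be illustrated with a runtime analogue in Python. This is not SFINAE, which is a compile-time C++ mechanism; `SortedTree`, `linear_lower_bound`, and the returned label are hypothetical names used only for this sketch.

```python
import bisect


class SortedTree:
    """Hypothetical container that offers its own logarithmic lower_bound()."""

    def __init__(self, items):
        self._items = sorted(items)

    def __iter__(self):
        return iter(self._items)

    def lower_bound(self, key):
        return bisect.bisect_left(self._items, key)


def linear_lower_bound(container, key):
    """Generic fallback: walk the elements one by one, O(N)."""
    index = 0
    for item in container:
        if item >= key:
            break
        index += 1
    return index


def lower_bound(container, key):
    """Prefer the container's own lower_bound(); else use the generic scan."""
    own = getattr(container, "lower_bound", None)
    if callable(own):
        return own(key), "container"
    return linear_lower_bound(container, key), "generic"


print(lower_bound(SortedTree([5, 1, 3]), 3))
print(lower_bound([1, 3, 5], 4))
```

The bug described in the commit corresponds to the detection step always failing, so that the slow generic path was taken even for containers that had a faster method.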
Avi Kivity
ba31619594 tests: fix partitioner_test for g++ 5
It can't make the leap from dht::ring_position to
stdx::optional<range_bound<dht::ring_position>> for some reason.
2017-05-18 13:09:41 +03:00
Pekka Enberg
30b5933db2 Merge "Add Gentoo Linux support to utility and setup scripts" from Alexys
"These patches add support to setup and operate ScyllaDB on Gentoo Linux.

 * scylla_setup and related scripts
 * node_health_check

 I have kept them as simple as possible and tested them by successfully setting
 up and operating a three-node cluster running on Gentoo Linux."

* 'gentoo_linux_support' of github.com:ultrabug/scylla:
  scylla_setup: add gentoo linux installation detection
  prometheus node_exporter install: add support for gentoo linux
  raid setup: add support for gentoo linux
  ntp setup: add support for gentoo linux
  kernel check: add support for gentoo linux
  cpuscaling setup: add support for gentoo linux
  coredump setup: add support for gentoo linux
  detect gentoo linux on selinux setup
  add gentoo_variant detection and SYSCONFIG setup
2017-05-18 09:41:13 +03:00
Vlad Zolotarov
1ef22f84c1 utils::loading_cache: cleanup
   - Fix a callback signature: receive a const ref.
   - Whitespace.

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2017-05-17 15:03:14 -04:00
Vlad Zolotarov
87ce0b2d47 utils::loading_cache: align the constrains in the constructor with the parameters description
According to the description of permissions_validity_in_ms, the permissions_cache is enabled if this
value is set to a non-zero value. Otherwise the permissions_cache is disabled.

According to the permissions_update_interval_in_ms description, it must have a non-zero value if permissions_cache
is enabled.

The permissions_cache_max_entries description doesn't state it explicitly, but it makes no sense to allow it to be zero
if permissions_cache is enabled.

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2017-05-17 15:03:14 -04:00
Vlad Zolotarov
e286828472 utils::loading_cache: refresh in the background
This patch changes the way a loading_cache works.

Before this patch:
   1) If a permissions key is not in the cache it's loaded in the foreground and the original
      query is blocked till the permissions are loaded.
   2) Every _period the timer does the following:
      1) If a value was loaded more than _expiry time ago it is removed from the cache.
      2) If the cache is too big - the less recently loaded values are removed till the cache
         fits the requested size.

After this patch:
   1) If a permissions key is not in the cache it's loaded in the foreground and the original
      query is blocked till the permissions are loaded.
   2) Every _period the timer does the following:
      1) If a value in the cache was loaded or read for the last time more than _expiry time ago - it's removed from the cache.
      2) If the cache is too big - the less recently read values are removed till the cache fits the requested size.
      3) The values that were loaded more than _refresh time ago are re-read in the background.

The new implementation makes it possible to limit the foreground reads of a frequently used value to a single
event (when the value is loaded for the first time).

It also ensures we do not reload values we no longer need.

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2017-05-17 15:03:06 -04:00
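
The "after this patch" timer behaviour from e286828472 can be sketched in Python as a single function run on every tick. Everything here is an assumption for illustration: the `Entry` fields, the `on_timer` signature, and the reading that the initial load counts as the first read, so that expiry is driven by the last read. The real cache refreshes in the background; this sketch reloads synchronously to stay short.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    value: str
    loaded_at: float     # time of the most recent (re)load
    last_read_at: float  # time of the most recent read; set at first load


def on_timer(cache, now, expiry, refresh, max_entries, reload):
    """One timer tick: expire, shrink, then refresh what is left."""
    # 1) Drop values that nobody has read for longer than `expiry`.
    for key in [k for k, e in cache.items() if now - e.last_read_at > expiry]:
        del cache[key]
    # 2) If still too big, evict the least recently read values.
    while len(cache) > max_entries:
        del cache[min(cache, key=lambda k: cache[k].last_read_at)]
    # 3) Re-read values loaded more than `refresh` ago.
    for key, entry in cache.items():
        if now - entry.loaded_at > refresh:
            entry.value = reload(key)
            entry.loaded_at = now


cache = {
    "hot": Entry("v1", loaded_at=0, last_read_at=9),
    "cold": Entry("v1", loaded_at=0, last_read_at=0),
}
on_timer(cache, now=10, expiry=5, refresh=3, max_entries=10,
         reload=lambda key: "v2")
print(sorted(cache), cache["hot"].value)
```

Running expiry before refresh is what keeps the cache from reloading values that are no longer needed: the unread "cold" entry is dropped, and only the recently read "hot" entry is refreshed.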
Alexys Jacob
fa0944ac19 scylla_setup: add gentoo linux installation detection 2017-05-17 18:06:54 +02:00
Alexys Jacob
9bb1bda466 prometheus node_exporter install: add support for gentoo linux 2017-05-17 18:06:34 +02:00
Alexys Jacob
1d235e5012 raid setup: add support for gentoo linux 2017-05-17 18:06:14 +02:00
Alexys Jacob
fdd5944ab2 ntp setup: add support for gentoo linux 2017-05-17 18:05:59 +02:00
Alexys Jacob
412f96a1bf kernel check: add support for gentoo linux 2017-05-17 18:05:45 +02:00
Alexys Jacob
a198f2b1af cpuscaling setup: add support for gentoo linux 2017-05-17 18:05:24 +02:00
Alexys Jacob
6a1807a7d8 coredump setup: add support for gentoo linux 2017-05-17 18:05:08 +02:00
Alexys Jacob
bc63e501db detect gentoo linux on selinux setup 2017-05-17 18:04:20 +02:00
Vlad Zolotarov
4edb336ac5 auth::auth: add operator<<() for a permission_cache key
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2017-05-17 12:03:56 -04:00
Vlad Zolotarov
d780818cac auth::auth::permissions_cache: use the values from the configuration - don't try to be smart
Our configuration already has default values for the permission cache parameters.
Therefore, if the user supplies bad parameters, we'd rather fail the load and inform him/her
about the bad parameters instead of trying to silently "fix" them.

In addition the original code wasn't passing the parameters correctly: it switched the "expiry" and "refresh" parameters in
the utils::loaded_cache constructor.

Add to this that the original code was doing really strange things in the permission_cache::expiry(cfg) method.

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2017-05-17 12:03:56 -04:00
Vlad Zolotarov
ea1cfabe28 db::config: define a saner default value for permissions_validity_in_ms
It makes little sense to have the same value for permissions_update_interval_in_ms and permissions_validity_in_ms.
This may cause the values to be invalidated only because of some minor delays in the timer scheduling.

It makes a lot more sense to make the permissions_update_interval_in_ms value smaller than permissions_validity_in_ms.
This way we would minimize the chances of "false invalidation" due to some small delays in the timer scheduling.

In addition, 2s seems too small a value for permissions_validity_in_ms since our default read_request_timeout_in_ms is 5s.
This means that a single system_auth read failure would guarantee that the following queries are going to read system_auth data
in the foreground.

Setting it to 10s would allow a second read attempt before we enforce the foreground read.

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2017-05-17 12:03:56 -04:00
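
The reasoning behind the 10s default in ea1cfabe28 comes down to simple arithmetic, sketched below in Python. The helper name and the model (count how many reads that run to the full timeout fit inside one validity window) are simplifying assumptions for illustration; only the 2s, 5s, and 10s figures come from the commit message.

```python
READ_REQUEST_TIMEOUT_MS = 5_000  # default quoted in the commit message


def read_attempts_before_expiry(validity_ms, timeout_ms=READ_REQUEST_TIMEOUT_MS):
    """How many timed-out reads fit inside one validity window (simplified)."""
    return validity_ms // timeout_ms


old_default = read_attempts_before_expiry(2_000)
new_default = read_attempts_before_expiry(10_000)
print(old_default, new_default)
```

Under this model a 2s validity window cannot absorb even one timed-out read, whereas a 10s window leaves room for a second attempt before the value expires.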
Alexys Jacob
2ca0380d06 add gentoo_variant detection and SYSCONFIG setup 2017-05-17 18:03:53 +02:00
Avi Kivity
2aa5b3e20c Merge "Improve perf_fast_forward test" from Tomasz
"Notably:
  - add validation of the results (e.g. fragment count, expectations about disk activity)
  - add cache-specific tests"

* 'tgrabiec/add-cache-tests-to-perf-fast-forward' of github.com:cloudius-systems/seastar-dev:
  tests: perf_fast_forward: Report cache stats
  row_cache: Keep counters in a struct
  tests: perf_fast_forward: Add cache-specific tests
  tests: perf_fast_forward: Extract test_reading_all()
  tests: perf_fast_forward: Add validation of the results
  tests: perf_fast_forward: Fix partition scans to read the expected amount of fragments
  tests: perf_fast_forward: Allow the test to be interrupted
  tests: perf_fast_forward: Allow testing with cache enabled
  row_cache: Implement mutation_reader::fast_forward_to() for cache scanner
2017-05-17 18:06:02 +03:00
Calle Wilund
29b20d410a schema_tables: Remove "class" attribute from strategy options
Not 100% proper, but in line with how we still store the info.
Helps keep the schema loaded from tables and the schema from
the builder comparable.

Fixes schema_changes_test error.

Message-Id: <1495030581-2138-2-git-send-email-calle@scylladb.com>
2017-05-17 17:56:11 +03:00
Calle Wilund
6ca07f16c1 scylla: fix compilation errors on gcc 5
Message-Id: <1495030581-2138-1-git-send-email-calle@scylladb.com>
2017-05-17 17:56:06 +03:00
Paweł Dziepak
3ecceaee48 Merge "Fix fast_forward_to() on sstable reader being ignored in some cases" from Tomasz
"When mutation reader enters the partition using index,
streamed_mutation object is returned to the user before the row start
fragment is processed. In that case, when we process the row start, we
should ignore it and not call setup_for_partition() again, because a
second call may override the user's fast_forward_to() request."

* 'tgrabiec/fix-initial-fast-forward-to-for-single-key-sstable-readers' of github.com:scylladb/seastar-dev:
  tests: mutation_source_test: Test forwarding in single-key readers
  sstables: Remove unused code
  sstables: mutation_reader: Fix setup_for_partition() being called twice in some cases
  sstables: Fix verify_end_state() to tolerate ATOM_START_2 state
2017-05-17 15:35:30 +01:00
Avi Kivity
eb69fe78a4 Merge "Adding private repository to housekeeping" from Amnon
"This series adds private repository support to scylla-housekeeping"

* 'amnon/housekeeping_private_repo_v3' of github.com:cloudius-systems/seastar-dev:
  scylla-housekeeping service: Support private repositories
  scylla-housekeeping-upstart: Use repository id, when checking for version
  scylla-housekeeping: support private repositories
2017-05-17 15:56:46 +03:00
Tomasz Grabiec
777ffa3a27 tests: perf_fast_forward: Report cache stats 2017-05-17 14:15:14 +02:00
Tomasz Grabiec
d1bde3036e row_cache: Keep counters in a struct
So that taking a snapshot of all stats is easy.
2017-05-17 14:15:14 +02:00
Tomasz Grabiec
7a81f5e980 tests: perf_fast_forward: Add cache-specific tests 2017-05-17 14:15:14 +02:00
Tomasz Grabiec
1a7b03004a tests: perf_fast_forward: Extract test_reading_all() 2017-05-17 14:15:14 +02:00
Tomasz Grabiec
a38fd16f89 tests: perf_fast_forward: Add validation of the results 2017-05-17 14:15:14 +02:00
Tomasz Grabiec
3c3ea51657 tests: perf_fast_forward: Fix partition scans to read the expected amount of fragments
make_pkeys() needs to be invoked with n equal to the number of keys
with which the table was populated. Otherwise the extra keys, which
are missing from the table, may be placed anywhere in the vector due to
ring order sorting, and break the assumption that the table contains
all keys from the array up to index n. This resulted in the test
reading slightly fewer fragments than the desired count implies.

Another problem is that we should not skip the fast_forward_to() call
for the initial range (a workaround for a bug in the sstable mutation
reader), otherwise we will read slightly less than expected as well.
2017-05-17 14:15:14 +02:00
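
The first problem described in 3c3ea51657 can be reproduced with a toy model in Python. The `token` function below is a deliberately simple stand-in for a partitioner hash, and this `make_pkeys` is a hypothetical re-implementation for illustration, not the helper from the test itself.

```python
def token(key):
    """Toy stand-in for a partitioner hash; not Scylla's real token function."""
    return (key * 7) % 10


def make_pkeys(n):
    """Return keys 0..n-1 sorted in (toy) ring order."""
    return sorted(range(n), key=token)


populated = set(range(5))  # the table was populated with keys 0..4


def hits(n, count):
    """How many of the first `count` ring-ordered keys exist in the table."""
    return sum(1 for key in make_pkeys(n)[:count] if key in populated)


matching = hits(n=5, count=5)    # n equals the populated key count
oversized = hits(n=10, count=5)  # n is larger than the populated key count
print(matching, oversized)
```

With n equal to the populated count, every key in the prefix exists in the table. With a larger n, keys that were never written are interleaved by the ring-order sort, so a scan over the same prefix finds fewer rows than expected.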