1376 Commits

Author SHA1 Message Date
Duarte Nunes
392403b5b3 row_marker: Mark constructors explicit
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2017-04-25 11:43:04 +02:00
Tomasz Grabiec
f3609fc813 tests: log_histogram_test: Fix compilation on Ubuntu
Some gcc versions incorrectly complain:

  tests/log_histogram_test.cc:87:22: error: ‘opts1’ is not a valid template argument for type ‘const log_histogram_options&’ because object ‘opts1’ has not external linkage
 size_t hist_key<node<opts1>>(const node<opts1>& n) { return n.v; }

Apparently this is a bug in gcc:

   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52036

Fixes #2307.

Message-Id: <1493108791-11247-1-git-send-email-tgrabiec@scylladb.com>
2017-04-25 12:15:28 +03:00
Pekka Enberg
940c3f4330 Merge "Clang fixes (part 2)" from Avi
"This series fixes some more errors found by clang, with the aim of enabling
clang/zapcc as a supported compiler.  A single issue remains, but it's
probably in std::experimental::optional::swap(); not in our code."

* tag 'clang/2/v1' of https://github.com/avikivity/scylla:
  sstable_test: avoid passing negative non-type template arguments to unsigned parameters
  UUID: add more comparison operators
  sstable_datafile_test: avoid string_view user-defined literal conversion operator
  mutation_source_test: avoid template function without template keyword
  cql_query_test: define static variable
  cql_query_test: add braces for single-item collection initializers
  storage_service: don't use typeid(temporary)
  logalloc: remove unused max_occupancy_for_compaction
  storage_proxy: drop overzealous use of __int128_t in recently-modified-no-read-repair logic
  storage_proxy: drop unused member access from return value
  storage_proxy: fix reference bound to temporary in data_read_resolver::less_compare
  read_repair_decision: fix operator<<(std::ostream&, ...)
2017-04-24 20:32:16 +03:00
Avi Kivity
6d9e18fd61 logalloc: reduce descriptor overhead
Every lsa-allocated object is prefixed by a header that contains information
needed to free or migrate it.  This includes its size (for freeing) and
an 8-byte migrator (for migrating).  Together with some flags, the overhead
is 14 bytes (16 bytes if the default alignment is used).

This patch reduces the header size to 1 byte (8 bytes if the default alignment
is used).  It uses the following techniques:

 - ULEB128-like encoding (actually more like ULEB64) so a live object's header
   can typically be stored using 1 byte
 - indirection, so that migrators can be encoded in a small index pointing
   to a migrator table, rather than using an 8-byte pointer; this exploits
   the fact that only a small number of types are stored in LSA
 - moving the responsibility for determining an object's size to its
   migrator, rather than storing it in the header; this exploits the fact
   that the migrator stores type information, and object size is in fact
   information about the type
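
As a rough illustration of the first technique, the sketch below implements plain unsigned LEB128: 7 payload bits per byte, with the high bit marking that another byte follows. It is not the patch's actual header layout (the message only says the encoding is "ULEB128-like"), and the names uleb_encode and uleb_decode are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not the LSA descriptor format: append `value` to
// `out` using 7 payload bits per byte. The high bit of each byte is set
// when another byte follows (standard unsigned LEB128).
inline void uleb_encode(std::uint64_t value, std::vector<std::uint8_t>& out) {
    do {
        std::uint8_t byte = value & 0x7f;
        value >>= 7;
        if (value != 0) {
            byte |= 0x80; // continuation bit
        }
        out.push_back(byte);
    } while (value != 0);
}

// Decode one value starting at `pos` and advance `pos` past the bytes
// that were consumed.
inline std::uint64_t uleb_decode(const std::vector<std::uint8_t>& in, std::size_t& pos) {
    std::uint64_t value = 0;
    unsigned shift = 0;
    while (true) {
        assert(pos < in.size() && shift < 64);
        const std::uint8_t byte = in[pos++];
        value |= std::uint64_t(byte & 0x7f) << shift;
        if ((byte & 0x80) == 0) {
            return value;
        }
        shift += 7;
    }
}
```

Any value below 128 fits in a single byte, which is the property that lets a small descriptor (for example a small index into a migrator table) occupy 1 byte.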

The patch improves the results of memory_footprint_test as follows:

Before:

 - in cache:     976
 - in memtable:  947

After:

mutation footprint:
 - in cache:     880
 - in memtable:  858

A reduction of about 10%.  Further reductions are possible by reducing the
alignment of lsa objects.

logalloc_test was adjusted to free more objects, since with the lower
footprint, rounding errors (to full segments) are different and caused
false errors to be detected.

Missing: adjustments to scylla-gdb.py; will be done after we agree on the
new descriptor's format.
2017-04-24 12:23:12 +02:00
Duarte Nunes
cddf2f4d74 tests: Fix virtual_reader_test failure
This patch fixes a failure of virtual_reader_test, where both the test
itself and the cql_test_env initialize the messaging_service to listen
on the same address and port, triggering an assert in
posix_ap_server_socket_impl::accept().

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20170423104240.21275-1-duarte@scylladb.com>
2017-04-23 14:06:35 +03:00
Avi Kivity
566c094764 sstable_test: avoid passing negative non-type template arguments to unsigned parameters
Clang complains.  The test looks somewhat bogus, but that's for another patch.
2017-04-22 22:13:55 +03:00
Avi Kivity
5424aca745 sstable_datafile_test: avoid string_view user-defined literal conversion operator
Clang doesn't like it, perhaps because it isn't in the std namespace (it's
still in std::experimental).
2017-04-22 22:11:30 +03:00
Avi Kivity
705ac957a2 mutation_source_test: avoid template function without template keyword
This isn't (yet?) standard C++, and clang rejects it.
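
The log does not show the offending code, so the following is only a sketch of one common shape of this problem: calling a member template on an object whose type depends on a template parameter. The types counter and read_long are made up for illustration.

```cpp
#include <cassert>

// Hypothetical type with a member function template.
struct counter {
    int value = 0;

    template <typename T>
    T get_as() const { return static_cast<T>(value); }
};

template <typename Source>
long read_long(const Source& s) {
    // The type of `s` depends on Source, so the member template has to be
    // introduced with the `template` keyword. Without it, the `<` after
    // get_as would be parsed as a less-than operator.
    return s.template get_as<long>();
}
```

With a counter whose value is 7, read_long() returns 7.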
2017-04-22 22:10:21 +03:00
Avi Kivity
551fb03476 cql_query_test: define static variable
single_node_cql_env is declared but not defined; define it to make clang
happy.
2017-04-22 22:01:44 +03:00
Avi Kivity
eb700752d8 cql_query_test: add braces for single-item collection initializers
Clang complains that braces are missing; I didn't verify it but I'm sure
it's right.  Add braces to make it happy.
2017-04-22 22:00:49 +03:00
Raphael S. Carvalho
4a86dd473d tests: add tests/sstable_resharding_test.cc
Forgot to add file after resolving conflict.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20170422172053.3734-1-raphaelsc@scylladb.com>
2017-04-22 21:09:29 +03:00
Benoît Canet
f68049ef5d tests: Fix clang auto universal reference type deduction
Replace it by regular template type deduction.

Signed-off-by: Benoît Canet <benoit@scylladb.com>
Message-Id: <20170421204150.4626-2-benoit@scylladb.com>
2017-04-22 20:04:00 +03:00
Benoit Canet
b902f3b81b tests: Remove parentheses in variable declaration
The parentheses prevented clang from compiling these tests.

Signed-off-by: Benoît Canet <benoit@scylladb.com>
Message-Id: <20170421204150.4626-1-benoit@scylladb.com>
2017-04-22 20:04:00 +03:00
Raphael S. Carvalho
8a37b279ed tests: add test for new sstable resharding
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2017-04-21 17:11:34 -03:00
Raphael S. Carvalho
d82a8dfae0 lcs: restore invariant instead of sending overlapping sst to L0
An sstable spanning a large token range may find its way into a high level
due to resharding, which breaks the strategy invariant. The invariant is
restored by compacting the first set of overlapping sstables, so the
restoration is done incrementally across multiple overlapping sets.

The invariant is restored by regular compaction after resharding puts the new
unshared sstables into their original level, where level > 0.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2017-04-21 17:11:09 -03:00
Avi Kivity
fccbf2c51f Merge "Reduce memory reclamation latency" from Tomasz
"Currently eviction is performed until occupancy of the whole region
drops below the 85% threshold. This may take a while if region had
high occupancy and is large. We could improve the situation by only
evicting until occupancy of the sparsest segment drops below the
threshold, as is done by this change.

I tested this using a c-s read workload in which the condition
triggers in the cache region, with 1G per shard:

 lsa-timing - Reclamation cycle took 12.934 us.
 lsa-timing - Reclamation cycle took 47.771 us.
 lsa-timing - Reclamation cycle took 125.946 us.
 lsa-timing - Reclamation cycle took 144356 us.
 lsa-timing - Reclamation cycle took 655.765 us.
 lsa-timing - Reclamation cycle took 693.418 us.
 lsa-timing - Reclamation cycle took 509.869 us.
 lsa-timing - Reclamation cycle took 1139.15 us.

The 144ms pause is when large eviction is necessary.

Statistics for reclamation pauses for a read workload over
larger-than-memory data set:

Before:

 avg = 865.796362
 stdev = 10253.498038
 min = 93.891000
 max = 264078.000000
 sum = 574022.988000
 samples = 663

After:

 avg = 513.685650
 stdev = 275.270157
 min = 212.286000
 max = 1089.670000
 sum = 340573.586000
 samples = 663

Refs #1634."

* tag 'tgrabiec/lsa-reduce-reclaim-latency-v3' of github.com:cloudius-systems/seastar-dev:
  lsa: Reduce reclamation latency
  tests: Add test for log_histogram
  log_histogram: Allow non-power-of-two minimum values
  lsa: Use regular compaction threshold in on-idle compaction
  tests: row_cache_test: Induce update failure more reliably
  lsa: Add getter for region's eviction function
2017-04-21 17:47:06 +03:00
Tomasz Grabiec
20f4c9bf23 lsa: Reduce reclamation latency
Currently eviction is performed until occupancy of the whole region
drops below the 85% threshold. This may take a while if region had
high occupancy and is large. We could improve the situation by only
evicting until occupancy of the sparsest segment drops below the
threshold, as is done by this change.
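
The difference between the two stopping conditions can be seen in a toy model. This is only a sketch under stated assumptions: segments are reduced to used/capacity counters, eviction removes fixed-size objects round-robin, and all names (segment, region_below, sparsest_below, evict_until) are hypothetical rather than taken from logalloc.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Toy segment: only tracks how many bytes are in use.
struct segment {
    std::size_t used;
    std::size_t capacity;
};

// True if used/capacity < percent/100, in integer arithmetic.
inline bool below(std::size_t used, std::size_t capacity, std::size_t percent) {
    return used * 100 < capacity * percent;
}

// Old condition: occupancy of the whole region is below the threshold.
inline bool region_below(const std::vector<segment>& r, std::size_t percent) {
    std::size_t used = 0, capacity = 0;
    for (const auto& s : r) {
        used += s.used;
        capacity += s.capacity;
    }
    return below(used, capacity, percent);
}

// New condition: the sparsest segment is below the threshold.
inline bool sparsest_below(const std::vector<segment>& r, std::size_t percent) {
    return std::any_of(r.begin(), r.end(), [percent](const segment& s) {
        return below(s.used, s.capacity, percent);
    });
}

// Evict fixed-size objects round-robin (an assumption of this model) from a
// copy of the region until `done` holds or nothing more can be evicted.
// Returns the number of evictions performed.
template <typename Done>
std::size_t evict_until(std::vector<segment> region, std::size_t object_size, Done done) {
    assert(object_size > 0);
    std::size_t evicted = 0;
    std::size_t next = 0;
    std::size_t misses = 0; // consecutive segments too empty to evict from
    while (!region.empty() && !done(region) && misses < region.size()) {
        segment& s = region[next];
        next = (next + 1) % region.size();
        if (s.used >= object_size) {
            s.used -= object_size;
            ++evicted;
            misses = 0;
        } else {
            ++misses;
        }
    }
    return evicted;
}
```

For four segments of capacity 100 with 100, 100, 100 and 90 bytes used, and objects of size 10, the whole-region condition at 85% needs 6 evictions in this model while the sparsest-segment condition needs 4. The gap grows with region size, which is the effect the change targets.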

I tested this using a c-s read workload in which the condition
triggers in the cache region, with 1G per shard:

 lsa-timing - Reclamation cycle took 12.934 us.
 lsa-timing - Reclamation cycle took 47.771 us.
 lsa-timing - Reclamation cycle took 125.946 us.
 lsa-timing - Reclamation cycle took 144356 us.
 lsa-timing - Reclamation cycle took 655.765 us.
 lsa-timing - Reclamation cycle took 693.418 us.
 lsa-timing - Reclamation cycle took 509.869 us.
 lsa-timing - Reclamation cycle took 1139.15 us.

The 144ms pause is when large eviction is necessary.

Statistics for reclamation pauses for a read workload over
larger-than-memory data set:

Before:

 avg = 865.796362
 stdev = 10253.498038
 min = 93.891000
 max = 264078.000000
 sum = 574022.988000
 samples = 663

After:

 avg = 513.685650
 stdev = 275.270157
 min = 212.286000
 max = 1089.670000
 sum = 340573.586000
 samples = 663

Refs #1634.

Message-Id: <1484730859-11969-1-git-send-email-tgrabiec@scylladb.com>
2017-04-21 12:52:31 +02:00
Tomasz Grabiec
4313641c03 tests: Add test for log_histogram 2017-04-21 12:52:31 +02:00
Tomasz Grabiec
e054ccc037 tests: row_cache_test: Induce update failure more reliably
After changing the region evictability condition to be less strict, cache
update stopped failing because reclamation was able to compact the dense
region. Induce failure by installing an evictor which refuses to evict
from the cache beyond a few elements.
2017-04-20 14:51:47 +02:00
Tomasz Grabiec
4ed7e529db sstables: Move binary_search() to a header
There are instantiations of binary_search() used in sstables.cc, but
defined in partition.cc. The instantiations are explicitly declared in
partition.cc, but the types changed and the declarations became obsolete.
This still worked because partition.cc also instantiated it with the right
type. Once that code is removed it no longer will, and we would get a
linker error. To avoid such problems, define binary_search() in a header.
2017-04-20 10:54:38 +02:00
Tomasz Grabiec
7dc3fe7d3f tests: perf_fast_forward: Add test case for forwarding with clustering restrictions in a large partition 2017-04-20 10:54:36 +02:00
Tomasz Grabiec
eed864690b tests: perf_fast_forward: Add test case for slicing of large partition using a single-partition reader 2017-04-20 10:54:36 +02:00
Tomasz Grabiec
81fc7977a4 tests: perf_fast_forward: Add test for selecting few rows from large partition 2017-04-20 10:54:36 +02:00
Tomasz Grabiec
02da3ba316 tests: perf_fast_forward: Fix use-after-free in scan_with_stride_partitions()
partition_range must live as long as the reader is used.
2017-04-19 08:37:56 +02:00
Raphael S. Carvalho
11b74050a1 partitioned_sstable_set: fix quadratic space complexity
Streaming generates lots of small sstables with large token ranges,
which triggers O(N^2) space usage in the interval map.
Level 0 sstables will now be stored in a structure that has O(N)
space complexity and which will be included for every read.
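
To see where the quadratic growth comes from, the sketch below models an interval map that splits the token space at every range boundary and stores, for each resulting segment, one reference per sstable covering it. That storage model is an assumption about the interval map, and the names token_range, interval_map_entries and staggered_ranges are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Half-open token range [start, end) owned by one sstable.
struct token_range {
    std::int64_t start;
    std::int64_t end;
};

// Count the sstable references an interval map would hold if it keeps,
// for every elementary segment between adjacent boundaries, the set of
// sstables whose range covers that segment.
inline std::size_t interval_map_entries(const std::vector<token_range>& ranges) {
    std::vector<std::int64_t> bounds;
    for (const auto& r : ranges) {
        assert(r.start < r.end);
        bounds.push_back(r.start);
        bounds.push_back(r.end);
    }
    std::sort(bounds.begin(), bounds.end());
    bounds.erase(std::unique(bounds.begin(), bounds.end()), bounds.end());

    std::size_t entries = 0;
    for (std::size_t i = 0; i + 1 < bounds.size(); ++i) {
        for (const auto& r : ranges) {
            if (r.start <= bounds[i] && bounds[i + 1] <= r.end) {
                ++entries;
            }
        }
    }
    return entries;
}

// n wide ranges, each shifted by one token: [i, i + n).
inline std::vector<token_range> staggered_ranges(std::int64_t n) {
    std::vector<token_range> v;
    for (std::int64_t i = 0; i < n; ++i) {
        v.push_back({i, i + n});
    }
    return v;
}
```

For 100 staggered wide ranges this model holds 10000 references, where a flat list of the same sstables holds 100.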

Fixes #2287.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20170417185509.6633-1-raphaelsc@scylladb.com>
2017-04-18 13:04:38 +03:00
Benoît Canet
8f793905a3 perf_sstable: Change busy loop to futurized loop
The blocked task detector introduced in 113ed9e963 was seeing
the initialization phase of perf_sstable as a blocked task.

Transform this part of the code into a futurized loop
to make the blocked task detector happy.

Signed-off-by: Benoît Canet <benoit@scylladb.com>
Message-Id: <20170413132506.17806-1-benoit@scylladb.com>
2017-04-13 18:17:28 +03:00
Raphael S. Carvalho
a6f8f4fe24 compaction: do not write expired cell as dead cell if it can be purged right away
When compacting a fully expired sstable, we're not allowing that sstable
to be purged because an expired cell is *unconditionally* converted into a
dead cell. Why not check instead whether the expired cell can be purged,
using gc_before and the max purgeable timestamp?

Currently, we need two compactions to get rid of a fully expired sstable
whose cells could have been purged all along.

Look at this sstable with an expired cell:
  {
    "partition" : {
      "key" : [ "2" ],
      "position" : 0
    },
    "rows" : [
      {
        "type" : "row",
        "position" : 120,
        "liveness_info" : { "tstamp" : "2017-04-09T17:07:12.702597Z",
"ttl" : 20, "expires_at" : "2017-04-09T17:07:32Z", "expired" : true },
        "cells" : [
          { "name" : "country", "value" : "1" },
        ]

Now the sstable data after the first compaction:
[shard 0] compaction - Compacted 1 sstables to [...]. 120 bytes to 79
(~65% of original) in 229ms = 0.000328997MB/s.

  {
    ...
    "rows" : [
      {
        "type" : "row",
        "position" : 79,
        "cells" : [
          { "name" : "country", "deletion_info" :
{ "local_delete_time" : "2017-04-09T17:07:12Z" },
            "tstamp" : "2017-04-09T17:07:12.702597Z"
          },
        ]

Now another compaction will actually get rid of the data:
compaction - Compacted 1 sstables to []. 79 bytes to 0 (~0% of original)
in 1ms = 0MB/s. ~2 total partitions merged to 0

NOTE:
It's a waste of time to wait for second compaction because the expired
cell could have been purged at first compaction because it satisfied
gc_before and max purgeable timestamp.
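
The decision described above can be sketched as follows. This is a simplified model, not the compaction code: the cell_info fields, the action enum and compact_cell are hypothetical, and which point in time the real code compares against gc_before is not stated in the message.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified view of a cell during compaction.
struct cell_info {
    std::int64_t timestamp;     // write timestamp
    std::int64_t deletion_time; // time compared against gc_before (assumed)
    bool expired;               // TTL has already run out
};

enum class action { keep_live, write_dead_cell, purge };

// Instead of always turning an expired cell into a dead cell, first check
// whether it already satisfies both purge conditions.
inline action compact_cell(const cell_info& c,
                           std::int64_t gc_before,
                           std::int64_t max_purgeable_timestamp) {
    if (!c.expired) {
        return action::keep_live;
    }
    const bool purgeable = c.deletion_time < gc_before
                        && c.timestamp < max_purgeable_timestamp;
    return purgeable ? action::purge : action::write_dead_cell;
}
```

An expired cell that fails either check is still written as a dead cell, so it can be purged by a later compaction.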

Fixes #2249, #2253

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20170413001049.9663-1-raphaelsc@scylladb.com>
2017-04-13 10:59:19 +03:00
Avi Kivity
5b530aa464 Merge "Use promoted index for skipping in sstable mutation readers" from Tomasz
"sstable_streamed_mutation::fast_forward_to() is changed to use promoted index
(via index_reader) to optimize skipping in large partitions.

In addition to that, sstable mutation_reader is changed to use the index
to skip to the next partition.

Performance impact was evaluated using newly added tests/perf/perf_fast_forward

What's beyond this series:

  - Using index_reader for single-partition reads as well

  - Using index_reader for skipping across ranges in clustering restrictions"

* tag 'tgrabiec/skip-within-partition-using-index-v2' of github.com:cloudius-systems/seastar-dev: (47 commits)
  tests: Add performance test for fast forwarding of sstable readers
  tests: Allow starting cql_test_env on pre-existing data
  config: Allow specifying source when setting value
  tests: sstable: Add test for fast forwarding within partition using index
  sstables: sstable_streamed_mutation: use index in fast_forward_to()
  sstables: Store parsed promoted index in index_entry
  sstables: Add trace-level logging for sstable consumption
  sstables: Define deletion_time earlier
  sstables: Make parsing throw exception on malformed promoted index
  tests: Add tests for ordering of position_in_partition relative to composites
  position_range: Introduce all_clustered_rows() factory method
  position_in_partition: Introduce for_key()/after_key() factory methods
  position_in_partition: Add factory methods for positions around all rows
  position_in_partition: Introduce for_range_start()/for_range_end()
  position_in_partition: Fix friendship declaration
  keys: Introduce is_empty() for prefixes
  position_in_partition: Make comparable with composites
  types: Enhance lexicographical comparators
  compound_compat: Accept marker value in serialize_value()
  compound_compat: Add trichotomic comparator
  ...
2017-03-29 19:01:12 +03:00
Raphael S. Carvalho
023031b0c8 compaction: lcs: fix functionality to feed starved levels
Quick introduction to level starvation:
High levels may be left uncompacted (thus starved) for a long time if the user
does something that leaves them with little data, such as a cleanup or a change
of max sstable size (default 160M). Leveled strategy handles this problem as
follows: consider we're compacting L1 to L2. If L3 is starved, we look for one
of its sstables that is fully contained in the token range of the L1->L2
candidates, so that we won't end up with an overlap in L2.
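
A minimal sketch of that selection, assuming tokens are plain integers and each sstable is described only by its first and last token. The names sstable_span, covering_span and pick_from_starved_level are hypothetical and do not come from the strategy code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Inclusive token span [first_token, last_token] of one sstable.
struct sstable_span {
    std::int64_t first_token;
    std::int64_t last_token;
};

// Token range covered by all compaction candidates together:
// smallest first token to largest last token.
inline sstable_span covering_span(const std::vector<sstable_span>& candidates) {
    assert(!candidates.empty());
    sstable_span span = candidates.front();
    for (const auto& c : candidates) {
        span.first_token = std::min(span.first_token, c.first_token);
        span.last_token = std::max(span.last_token, c.last_token);
    }
    return span;
}

// Index of the first sstable in the starved level that lies entirely
// inside the candidates' range, if there is one.
inline std::optional<std::size_t> pick_from_starved_level(
        const std::vector<sstable_span>& candidates,
        const std::vector<sstable_span>& starved_level) {
    const sstable_span span = covering_span(candidates);
    for (std::size_t i = 0; i < starved_level.size(); ++i) {
        const auto& s = starved_level[i];
        if (span.first_token <= s.first_token && s.last_token <= span.last_token) {
            return i;
        }
    }
    return std::nullopt;
}
```

With candidates spanning tokens 10 to 90, a starved-level sstable covering 50 to 80 qualifies, while one covering 0 to 20 or 85 to 120 does not.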

Now the problem:
The functionality isn't working properly because the range of candidates is
being calculated incorrectly, due to an accident when converting the code to
C++. It won't cause an overlap, because it's actually being more restrictive
about which sstable from the starved level can be used.

A test case was added to confirm the problem.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20170328223753.15398-1-raphaelsc@scylladb.com>
2017-03-29 18:59:46 +03:00
Tomasz Grabiec
7fd724821b tests: Add performance test for fast forwarding of sstable readers 2017-03-28 18:34:55 +02:00
Tomasz Grabiec
543a484d78 tests: Allow starting cql_test_env on pre-existing data 2017-03-28 18:34:55 +02:00
Tomasz Grabiec
f1aca6d116 tests: sstable: Add test for fast forwarding within partition using index 2017-03-28 18:34:55 +02:00
Tomasz Grabiec
b40b20387a tests: Add tests for ordering of position_in_partition relative to composites 2017-03-28 18:10:39 +02:00
Tomasz Grabiec
18a057aa81 compound_compat: Return composite from serialize_value()
To make the code more type-safe. Also, mark constructor from bytes
explicit.
2017-03-28 18:10:39 +02:00
Tomasz Grabiec
27d86dfe18 sstables: Enable skipping to cells at data_consume_context level 2017-03-28 18:10:39 +02:00
Tomasz Grabiec
cd295e9926 sstables: Avoid moving an sstable
In preparation for adding non-movable members.
2017-03-28 18:10:39 +02:00
Tomasz Grabiec
5edb427873 sstables: Remove private constructor
To reduce duplication.
2017-03-28 18:10:39 +02:00
Tomasz Grabiec
a7301a702f tests: Add missing blocking on fast_forward_to() 2017-03-28 18:10:39 +02:00
Tomasz Grabiec
5fe14735e8 tests: dht: Test ring_position_comparator 2017-03-28 18:10:39 +02:00
Tomasz Grabiec
ff6cca6e9e tests: Add utility for checking total orders 2017-03-28 18:10:39 +02:00
Duarte Nunes
53014bd762 mutation_source_test: Ensure unique collection elements
Duplicate elements are illegal in collections, so we ensure they only
contain unique ones.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20170327161149.8938-4-duarte@scylladb.com>
2017-03-27 18:44:11 +02:00
Duarte Nunes
94d568924d mutation_source_test: Sort collection elements
Ref #1607

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20170327161149.8938-3-duarte@scylladb.com>
2017-03-27 18:43:58 +02:00
Duarte Nunes
4963902922 mutation_source_test: Remove extra randomness source
This patch ensures we generate UUIDs using the same randomness source
as all the other values we randomly generate, so that we can get a
deterministic run from the seeds we print.
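
The idea can be sketched with a single seeded engine that feeds every random value, including the bits used for UUIDs. The class value_generator and its members are hypothetical and are not the test's actual generator.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <random>

// Hypothetical generator: every random value comes from one seeded engine,
// so printing the seed is enough to reproduce a run.
class value_generator {
    std::mt19937_64 _engine;
public:
    explicit value_generator(std::uint64_t seed) : _engine(seed) {}

    std::int64_t random_int(std::int64_t lo, std::int64_t hi) {
        assert(lo <= hi);
        return std::uniform_int_distribution<std::int64_t>(lo, hi)(_engine);
    }

    // 128 random bits standing in for a UUID, drawn from the same engine
    // rather than from a separate, unseeded source.
    std::array<std::uint64_t, 2> random_uuid_bits() {
        const std::uint64_t hi = _engine();
        const std::uint64_t lo = _engine();
        return {hi, lo};
    }
};
```

Two generators constructed with the same seed produce the same sequence of integers and UUID bits, which would not hold if the UUIDs came from an independent source.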

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20170327161149.8938-2-duarte@scylladb.com>
2017-03-27 18:43:44 +02:00
Tomasz Grabiec
bb0ce5d8fe Merge "Ensure base and view schema versions match" from Duarte
The mapping between a base table update and a view update is schema
dependent, so we need to ensure the view schema versions match the
base schema version. For example, we match base columns to view
columns by name, so we need to ensure the base and view schemas we're
using for writing are isolated with respect to a previous alter
table statement.

We thus need to match base schema versions with view schema versions,
and we need to do so atomically to ensure that when one fiber sees a
schema, it also sees the complete set of corresponding view schemas.
This series ensures the schemas modified as a result of an alter
table statement are published atomically, under the schema lock. This
way, all the schemas referenced by the database are consistent with
each other when they are observed by other fibers.

Finally, we upgrade the mutation schema before generating the view
updates, to ensure it matches the most recent view schemas the base
replica knows about, registered in the database.

The db::view::view class was replaced by a set of non-member
functions, with its state, which used to reflect only the most recent
schema version, being moved to a new view_info class.
2017-03-17 12:40:00 +01:00
Tomasz Grabiec
cefb6b604a tests: lsa_async_eviction_test: Allocate objects under allocating section 2017-03-16 10:21:10 +01:00
Duarte Nunes
e215f25b11 migration_manager: Atomically migrate table and views
This patch changes the migration path for table updates such that the
base table mutations are sent and applied atomically with the view
schema mutations.

This ensures that after schema merging, we have a consistent mapping
of base table versions to view table versions, which will be used in
later patches.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2017-03-15 16:03:56 +01:00
Duarte Nunes
143136647a mutation_test: Add more test cases for difference()
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2017-03-15 14:34:01 +01:00
Duarte Nunes
005e4741e3 mutation_source_test: Randomly generate collection cells
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2017-03-15 14:34:01 +01:00
Tomasz Grabiec
ed530dfb3a tests: sstables: Add test for skipping within a compressed stream
Refs #2143.
2017-03-13 13:08:24 +01:00
Tomasz Grabiec
88ccc99017 tests: sstables: Add test for handling of repeated tombstones 2017-03-10 14:42:22 +01:00