2696 Commits

Author SHA1 Message Date
Avi Kivity
1ff6b8fb96 Merge "Don't binary compare compressed sstables in test_write_many_partitions_* tests" from Piotr
"
Compression is not deterministic so instead of binary comparing the sstable files we just read data back
and make sure everything that was written down is still present.

Tests: unit(release)
"

* 'haaawk/binary-compare-of-compressed-sstables/v3' of github.com:scylladb/seastar-dev:
  sstables: Remove compressed parameter from get_write_test_path
  sstables: Remove unused sstable test files
  sstables: Ensure compare_sstables isn't used for compressed files
  sstables: Don't binary compare compressed sstables
  sstables: Remove debug printout from test_write_many_partitions
2018-11-27 14:01:20 +02:00
Avi Kivity
5e759b0c07 Merge "Optimize checksum computation for the MC sstable format" from Tomek
"
One part of the improvement comes from replacing zlib's CRC32 with the one
from libdeflate, which is optimized for modern architecture and utilizes the
PCLMUL instruction.

perf_checksum test was introduced to measure performance of various
checksumming operations.

Results for 514 B (relevant for writing with compression enabled):

    test                                      iterations      median         mad         min         max
    crc_test.perf_deflate_crc32_combine            58414    16.711us     3.483ns    16.708us    16.725us
    crc_test.perf_adler_combine                165788278     6.059ns     0.031ns     6.027ns     7.519ns
    crc_test.perf_zlib_crc32_combine               59546    16.767us    26.191ns    16.741us    16.801us
    ---
    crc_test.perf_deflate_crc32_checksum        12705072    83.267ns     4.580ns    78.687ns    98.964ns
    crc_test.perf_adler_checksum                 3918014   206.701ns    23.469ns   183.231ns   258.859ns
    crc_test.perf_zlib_crc32_checksum            2329682   428.787ns     0.085ns   428.702ns   510.085ns

Results for 64 KB (relevant for writing with compression disabled):

    test                                      iterations      median         mad         min         max
    crc_test.perf_deflate_crc32_combine            25364    38.393us    17.683ns    38.375us    38.545us
    crc_test.perf_adler_combine                169797143     5.842ns     0.009ns     5.833ns     6.901ns
    crc_test.perf_zlib_crc32_combine               26067    38.663us    95.094ns    38.546us    40.523us
    ---
    crc_test.perf_deflate_crc32_checksum          202821     4.937us    14.426ns     4.912us     5.093us
    crc_test.perf_adler_checksum                   44684    22.733us   206.263ns    22.492us    25.258us
    crc_test.perf_zlib_crc32_checksum              18839    53.049us    36.117ns    53.013us    53.274us

The new CRC32 implementation (deflate_crc32) doesn't provide a fast
checksum_combine() yet; it delegates to zlib, so it's as slow as the latter.

Because for CRC32 checksum_combine() is several orders of magnitude slower
than checksum(), we avoid calling checksum_combine() completely for this
checksummer. We still do it for adler32, whose combine() is faster than
checksum().
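
As a rough illustration of the idea (not Scylla's code), a CRC32 over several chunks can be carried forward as a running value instead of computing a checksum per chunk and combining them afterwards. The sketch uses Python's `zlib.crc32`, whose optional second argument is the running checksum; the helper name `running_crc32` and the chunk contents are made up for the example.

```python
import zlib

def running_crc32(chunks):
    """Feed every chunk into one running CRC32 (no per-chunk combine step)."""
    crc = 0
    for chunk in chunks:
        # The second argument carries the checksum state from the previous chunk.
        crc = zlib.crc32(chunk, crc)
    return crc

# Hypothetical chunk contents, standing in for successive writes to a file.
chunks = [b"partition-1", b"partition-2", b"partition-3"]
whole = b"".join(chunks)

full_crc = running_crc32(chunks)
# The running value equals the checksum of the concatenated data.
matches_one_shot = (full_crc == zlib.crc32(whole))
```

The trade-off is that each byte is still fed through the checksum once; what is skipped is only the separate combine step.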

SStable write performance was evaluated by running:

  perf_fast_forward --populate --data-directory /tmp/perf-mc \
     --rows=10000000 -c1 -m4G --datasets small-part

Below is a summary of the average frag/s for a memtable flush. Each result is
an average of about 20 flushes with stddev of about 4k.

Before:

 [1] MC,lz4: 330'903
 [2] LA,lz4: 450'157
 [3] MC,checksum: 419'716
 [4] LA,checksum: 459'559

After:

 [1'] MC,lz4: 446'917 ([1] + 35%)
 [2'] LA,lz4: 456'046 ([2] + 1.3%)
 [3'] MC,checksum: 462'894 ([3] + 10%)
 [4'] LA,checksum: 467'508 ([4] + 1.7%)
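
For reference, the percentages above follow from the before/after averages as (after - before) / before. A small sketch that recomputes them from the figures quoted in this message (the dictionary keys and the helper name `improvement_pct` are only labels chosen for the example):

```python
# Average frag/s figures copied from the "Before" and "After" lists above.
before = {"MC,lz4": 330903, "LA,lz4": 450157, "MC,checksum": 419716, "LA,checksum": 459559}
after = {"MC,lz4": 446917, "LA,lz4": 456046, "MC,checksum": 462894, "LA,checksum": 467508}

def improvement_pct(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0

gains = {name: improvement_pct(before[name], after[name]) for name in before}

for name, pct in gains.items():
    print(f"{name}: {pct:+.1f}%")
```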

After this series, the performance of the MC format writer is similar to that
of the LA format before the series.

There seems to be a small but consistent improvement for LA too. I'm not sure
why.
"

* tag 'improve-mc-sstable-checksum-libdeflate-v3' of github.com:tgrabiec/scylla:
  tests: perf: Introduce perf_checksum
  tests: Add test for libdeflate CRC32 implementation
  sstables: compress: Use libdeflate for crc32
  sstables: compress: Rename crc32_utils to zlib_crc32_checksummer
  licenses: Add libdeflate license
  Integrate libdeflate with the build system
  Add libdeflate submodule
  sstables: Avoid checksum_combine() for the crc32 checksummer
  sstables: compress: Avoid unnecessary checksum_combine()
  sstables: checksum_utils: Add missing include
2018-11-26 20:10:46 +02:00
Tomasz Grabiec
f1a35b654a tests: perf: Introduce perf_checksum 2018-11-26 18:59:43 +01:00
Tomasz Grabiec
5b6e3fb5ed tests: Add test for libdeflate CRC32 implementation 2018-11-26 18:59:42 +01:00
Paweł Dziepak
62ea153629 Merge "Check for schema mismatch after dropping dead cells" from Piotr
"
Previously we were checking for schema incompatibility between the current schema and the sstable
serialization header before reading any data. This isn't the best approach because
data in the sstable may already be irrelevant, for example due to a column drop.

This patchset moves the check to after the actual data is read and verified to have
a timestamp new enough to classify it as non-obsolete.

Fixes #3924
"

* 'haaawk/3924/v3' of github.com:scylladb/seastar-dev:
  sstables: Enable test_schema_change for MC format
  sstables3: Throw error on schema mismatch only for live cells
  sstables: Pass column_info to consume_*_column
  sstables: Add schema_mismatch to column_info
  sstables: Store column data type in column_info
  sstables: Remove code duplication in column_translation
2018-11-26 13:10:18 +00:00
Piotr Jastrzebski
dec48dd1e2 sstables: Remove compressed parameter from get_write_test_path
This parameter is no longer used.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-11-26 13:46:23 +01:00
Piotr Jastrzebski
92ffccd636 sstables: Remove unused sstable test files
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-11-26 13:35:15 +01:00
Piotr Jastrzebski
a29c9189cb sstables: Ensure compare_sstables isn't used for compressed files
Binary comparing compressed sstables is wrong because compression
is not deterministic.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-11-26 13:35:15 +01:00
Piotr Jastrzebski
7e263208f0 sstables: Don't binary compare compressed sstables
This family of test_write_many_partitions_* tests writes
sstables down from memtable using different compressions.
Then it compares the resulting file with a blueprint file
and reads the data back to check everything is there.

Compression is not deterministic so this patch makes the
tests not compare resulting compressed sstable file with blueprint
file and instead only read data back.
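
A minimal sketch of why a byte-for-byte comparison is fragile while a read-back check is not. It uses Python's `zlib` purely as a stand-in (it is not the compressor used by these tests), and the payload is invented: the same input compressed with different settings yields different bytes, yet both decompress to the original data.

```python
import zlib

# Hypothetical payload standing in for the contents of an sstable data file.
payload = b"".join(b"key-%d:value-%d;" % (i, i) for i in range(1000))

# Same input, two compressor settings.
fast = zlib.compress(payload, 1)
small = zlib.compress(payload, 9)

# The compressed bytes are not identical, so comparing against a stored
# "blueprint" file would fail even though nothing is wrong.
bytes_differ = fast != small

# Reading the data back is the robust check: both round-trip to the input.
round_trip_ok = (zlib.decompress(fast) == payload
                 and zlib.decompress(small) == payload)
```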

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-11-26 13:35:03 +01:00
Piotr Jastrzebski
5c86294a56 sstables: Enable test_schema_change for MC format
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-11-26 13:25:23 +01:00
Duarte Nunes
2a371c2689 Merge 'Allow bypassing cache on a per-query basis' from Avi
"
Some queries are very unlikely to hit cache. Usually this includes
range queries on large tables, but other patterns are possible.

While the database should adapt to the query pattern, sometimes the
user has information the database does not have. By passing this
information along, the user helps the database manage its resources
more optimally.

To do this, this patch introduces a BYPASS CACHE clause to the
SELECT statement. A query thus marked will not attempt to read
from the cache, and instead will read from sstables and memtables
only. This reduces CPU time spent to query and populate the cache,
and will prevent the cache from being flooded with data that is
not likely to be read again soon. The existing cache disabled path
is engaged when the option is selected.

Tests: unit (release), manual metrics verification with ccm with and without the
    BYPASS CACHE clause.

Ref #3770.
"

* tag 'cache-bypass/v2' of https://github.com/avikivity/scylla:
  doc: document SELECT ... BYPASS CACHE
  tests: add test for SELECT ... BYPASS CACHE
  cql: add SELECT ... BYPASS CACHE clause
  db: add query option to bypass cache
2018-11-26 09:59:40 +00:00
Paweł Dziepak
13385778fd Merge "Measure performance of dataset population in perf_fast_forward" from Tomasz
* tag 'perf-ffwd-dataset-population-v2' of github.com:tgrabiec/scylla:
  tests: perf_fast_forward: Measure performance of dataset population
  tests: perf_fast_forward: Record the dataset on which test case was run
  tests: perf_fast_forward: Introduce the concept of a dataset
  tests: perf_fast_forward: Introduce make_compaction_disabling_guard()
  tests: perf_fast_forward: Initialize output manager before population
  tests: perf_fast_forward: Handle empty test parameter set
  tests: perf_fast_forward: Extract json_output_writer::write_common_test_group()
  tests: perf_fast_forward: Factor out access to cfg to a single place per function
  tests: perf_fast_forward: Extract result_collector
  tests: perf_fast_forward: Take writes into account in AIO statistics
  tests: perf_fast_forward: Reorder members
  tests: perf_fast_forward: Add --sstable-format command line option
2018-11-26 09:45:55 +00:00
Avi Kivity
f69401c609 tests: add test for SELECT ... BYPASS CACHE
The test verifies that cache read metrics are not incremented during a cache
bypass read.
2018-11-26 11:37:52 +02:00
Piotr Jastrzebski
c2561a2796 sstables: Remove debug printout from test_write_many_partitions
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-11-25 13:29:10 +01:00
Raphael S. Carvalho
3fa70d6b5f tests: add example compaction strategy for sstable run based approach
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2018-11-24 20:16:54 -02:00
Raphael S. Carvalho
baf89f0df3 tests/sstable_test: add test for compaction replacement of exhausted sstable
Make sure that compaction is capable of releasing exhausted sstable space
early in the procedure.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2018-11-24 18:53:26 -02:00
Raphael S. Carvalho
0085e8371d tests/sstables: add test for sstable run based compaction
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2018-11-24 18:53:23 -02:00
Raphael S. Carvalho
e5a0b05c15 sstables/compaction: release space earlier of exhausted input sstables
Currently, compaction only replaces input sstables at the end of compaction,
meaning compaction must finish before any of the space used by those sstables
is released.

What we can do instead is delete an input sstable earlier, provided
that:

1) The sstable's data has been committed to a new, sealed output sstable,
meaning it's exhausted.
2) The exhausted sstable doesn't overlap with a non-exhausted sstable,
because a tombstone in the exhausted sstable could have been purged and the
shadowed data in the non-exhausted one could be resurrected if the system
crashes.
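
The two conditions can be sketched as a simple filter. This is only an illustration under assumed names: the `SSTable` record, its integer key range, and `releasable_early` are hypothetical and do not mirror the actual compaction code.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SSTable:
    # Hypothetical model: an input sstable covering keys [first_key, last_key].
    name: str
    first_key: int
    last_key: int
    exhausted: bool  # all of its data is already in a sealed output sstable

def overlaps(a, b):
    """True if the key ranges of a and b intersect."""
    return a.first_key <= b.last_key and b.first_key <= a.last_key

def releasable_early(inputs):
    """Exhausted inputs that do not overlap any non-exhausted input."""
    pending = [s for s in inputs if not s.exhausted]
    return [s for s in inputs
            if s.exhausted and not any(overlaps(s, p) for p in pending)]

inputs = [
    SSTable("a", 0, 10, True),    # exhausted, overlaps nothing pending
    SSTable("b", 5, 20, True),    # exhausted, but overlaps "c"
    SSTable("c", 15, 30, False),  # still being compacted
]
early = [s.name for s in releasable_early(inputs)]
```

Here only "a" qualifies; "b" has to wait because a purged tombstone in it might be the only thing shadowing data that still lives in "c".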

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2018-11-24 18:53:07 -02:00
Raphael S. Carvalho
edc87014c1 tests/sstables: add run identifier correctness test
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2018-11-24 18:53:02 -02:00
Raphael S. Carvalho
a66b1954cc sstables: use a random uuid for sstables without run identifier
Older sstables must have an identifier for them to be associated
with their own run.

Reviewed-by: Nadav Har'El <nyh@scylladb.com>

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2018-11-24 18:53:01 -02:00
Tomasz Grabiec
8e93046abc tests: perf_fast_forward: Measure performance of dataset population 2018-11-23 19:22:50 +01:00
Tomasz Grabiec
2c95aa4d8d tests: perf_fast_forward: Record the dataset on which test case was run
Now any given test case can potentially run on many different datasets.
2018-11-23 19:22:12 +01:00
Tomasz Grabiec
470552b7ab tests: perf_fast_forward: Introduce the concept of a dataset
A dataset represents a table with data, populated in a certain way, with
certain characteristics of the schema and data.

Before this change, datasets were implicitly defined, with population
hard-coded inside the populate() function.

This change gathers logic related to datasets into classes, in order to:

  - make it easier to define new datasets.

  - be able to measure performance of dataset population in a
    standardized way.

  - be able to express constraints on datasets imposed by different
    test cases.  Test cases are matched with possible datasets based
    on the abstract interface they accept (e.g. clustered_ds,
    multipartition_ds), and which must be implemented by a compatible
    dataset. To facilitate this matching, test function is now wrapped
    into a dataset_acceptor object, with an automatically-generated can_run()
    virtual method, deduced by make_test_fn().

  - be able to select tests to run based on the dataset name.
    Only tests which are compatible with that dataset will be run.
2018-11-23 19:22:09 +01:00
Tomasz Grabiec
2746f78a9f tests: perf_fast_forward: Introduce make_compaction_disabling_guard() 2018-11-23 19:18:10 +01:00
Tomasz Grabiec
b00d360281 tests: perf_fast_forward: Initialize output manager before population 2018-11-23 19:18:10 +01:00
Tomasz Grabiec
25dc481030 tests: perf_fast_forward: Handle empty test parameter set 2018-11-23 19:18:10 +01:00
Tomasz Grabiec
38a1b7e87b tests: perf_fast_forward: Extract json_output_writer::write_common_test_group() 2018-11-23 19:18:10 +01:00
Tomasz Grabiec
a507ca8159 tests: perf_fast_forward: Factor out access to cfg to a single place per function
Preparatory change before making n_rows be determined through a
dataset object.
2018-11-23 19:18:09 +01:00
Tomasz Grabiec
3fc78a25bf tests: perf_fast_forward: Extract result_collector
Extracts the result collection and reporting logic out of
run_test_case(). Will be needed in population tests, for which we
don't want the looping logic.
2018-11-23 19:18:09 +01:00
Tomasz Grabiec
f4a70283ee tests: perf_fast_forward: Take writes into account in AIO statistics
Relevant for population tests. So far all tests were read tests.
2018-11-23 19:18:09 +01:00
Tomasz Grabiec
96f5bd2f46 tests: perf_fast_forward: Reorder members 2018-11-23 19:18:09 +01:00
Tomasz Grabiec
3ac5e8887e tests: perf_fast_forward: Add --sstable-format command line option 2018-11-23 19:18:09 +01:00
Paweł Dziepak
09439cd809 tests/sstable: add test for schema changes
for_each_schema_change() is used for testing reading an sstable that was
written with a different schema. Because of #3924, for now the mc format
is not verified this way.
2018-11-23 12:14:06 +00:00
Paweł Dziepak
dc7f9fea5b tests/mutation: add test for schema changes 2018-11-23 12:14:06 +00:00
Paweł Dziepak
35f9f424e9 tests: generate schema changes
This patch adds the for_each_schema_change() function, which generates
schemas and data before and after some modification to the schema (e.g.
adding a column, changing its type). It can be used to test schema
upgrades.
2018-11-23 12:14:06 +00:00
Paweł Dziepak
daee4bd3b8 tests: add models for schemas and data
This patch introduces a model of Scylla schemas and data, implemented
using simple standard library primitives. It can be used for testing the
actual schemas, mutation_partitions, etc. used by the schema by
comparing the results of various actions.

The initial use case for this model was to test schema changes, but
there is no reason why in the future it cannot be extended to test other
things as well.
2018-11-23 12:14:06 +00:00
Paweł Dziepak
2a0e929830 tests/random-utils: make functions and variables inline
random-utils.hh is a header which may be included in multiple
translation units so all members should be non-static inline to avoid
any duplication.
2018-11-22 11:30:31 +00:00
Benny Halevy
dcd18e2b62 remove exec permission from top_k source files
This was introduced by 32525f2694

Cc: Rafi Einstein <rafie@scylladb.com>
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20181121163352.13325-1-bhalevy@scylladb.com>
2018-11-21 18:38:50 +02:00
Paweł Dziepak
4aa5d83590 Merge "Optimize sstable writing of the MC format" from Tomasz
"
Tested with perf_fast_forward from:

  github.com/tgrabiec/scylla.git perf_fast_forward-for-sst3-opt-write-v1

Using the following command line:

  build/release/tests/perf/perf_fast_forward_g --populate --sstable-format=mc \
     --data-directory /tmp/perf-mc --rows=10000000 -c1 -m4G \
     --datasets small-part

The average reported flush throughput was (stdev for the averages is around 4k):
  - for mc before the series: 367848 frag/s
  - for la before the series: 463458 frag/s (= mc.before +25%)
  - for mc after the series: 429276 frag/s (= mc.before +16%)
  - for la after the series: 466495 frag/s (= mc.before +26%)

Refs #3874.
"

* tag 'sst3-opt-write-v2' of github.com:tgrabiec/scylla:
  sstables: mc: Avoid serialization of promoted index when empty
  sstables: mc: Avoid double serialization of rows
  tests: sstable 3.x: Do not compare Statistics component
  utils: Introduce memory_data_sink
  schema: Optimize column count getters
  sstables: checksummed_file_data_sink_impl: Bypass output_stream
2018-11-21 13:11:40 +00:00
Tomasz Grabiec
8f686af9af tests: sstable 3.x: Do not compare Statistics component
The Statistics component recorded in the test was generated using a
buggy version of Scylla, and is not correct. Exposed by fixing the bug
in the way statistics are generated.

Rather than comparing binary content, we should have explicit checks
for statistics.
2018-11-21 14:04:27 +01:00
Avi Kivity
bb85a21a8f Merge "compress: Restore lz4 as default compressor" from Duarte
"
Enables sstable compression with LZ4 by default, which was the
long-time behavior until a regression turned off compression by
default.

Fixes #3926
"

* 'restore-default-compression/v2' of https://github.com/duarten/scylla:
  tests/cql_query_test: Assert default compression options
  compress: Restore lz4 as default compressor
  tests: Be explicit about absence of compression
2018-11-21 14:20:39 +02:00
Michael Munday
360374cfde tests: fix compilation of partitioner_test with boost 1.68 on IBM Z
The boost multiprecision library that I am compiling against seems
to be missing an overload for the cast to a string. The easy
workaround seems to be to call str() directly instead.

This also fixes #3922.

Message-Id: <20181120215709.43939-1-mike.munday@ibm.com>
2018-11-21 11:43:42 +02:00
Duarte Nunes
9464fffc8c tests/cql_query_test: Assert default compression options
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-11-20 22:47:27 +00:00
Duarte Nunes
5f64e34fcc tests: Be explicit about absence of compression
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-11-20 22:47:26 +00:00
Avi Kivity
775b7e41f4 Update seastar submodule
* seastar d59fcef...b924495 (2):
  > build: Fix protobuf generation rules
  > Merge "Restructure files" from Jesse

Includes fixup patch from Jesse:

"
Update Seastar `#include`s to reflect restructure

All Seastar header files are now prefixed with "seastar" and the
configure script reflects the new locations of files.

Signed-off-by: Jesse Haber-Kucharsky <jhaberku@scylladb.com>
Message-Id: <5d22d964a7735696fb6bb7606ed88f35dde31413.1542731639.git.jhaberku@scylladb.com>
"
2018-11-21 00:01:44 +02:00
Vladimir Krivopalov
759fbbd5f6 random_mutation_generator: Add row_marker to rows regardless of whether they're deleted.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
Message-Id: <f55b91f1349f0e98def6b7ca9755b5ccf4f48a3e.1542308626.git.vladimir@scylladb.com>
2018-11-16 13:17:07 +01:00
Duarte Nunes
6fbf792777 db/view/view_builder: Don't timeout waiting for view to be built
Remove the timeout argument to
db::view::view_builder::wait_until_built(), a test-only function to
wait until a given materialized view has finished building.

This change is motivated by the fact that some tests running on slow
environments will time out. Instead of incrementally increasing the
timeout, remove it completely since tests are already run under an
exterior timeout.

Fixes #3920

Tests: unit release(view_build_test, view_schema_test)

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20181115173902.19048-1-duarte@scylladb.com>
2018-11-15 19:41:43 +02:00
Nadav Har'El
45f05b06d2 view_complex_test: fix another ttl
In a previous patch I fixed most TTLs in the view_complex_test.cc tests
from low numbers to 100 seconds. I missed one. This one never caused
problems in practice, but for good form, let's fix it too.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20181115160234.26478-1-nyh@scylladb.com>
2018-11-15 18:03:28 +02:00
Vladimir Krivopalov
51afb1d8bd tests: Generate deleted rows and shadowable tombstones in random_mutation_generator.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
Message-Id: <77e956890264023227e07cc6d295df870d0a5af2.1542295208.git.vladimir@scylladb.com>
2018-11-15 16:26:07 +01:00
Avi Kivity
0216f49bb0 Merge "Add filtering support for CONTAINS" from Piotr
"
This series enables filtering support for CONTAINS restriction.
"

* 'enable_filtering_for_contains_2' of https://github.com/psarna/scylla:
  tests: add CONTAINS test case to filtering tests
  cql3: enable filtering for CONTAINS restriction
  cql3: add is_satisfied_by(bytes_view) for CONTAINS
2018-11-15 16:49:29 +02:00