This makes the tests a bit stricter by also checking the message
returned by the what() function.
This shows that some of the tests are out of sync with the errors
they are supposed to check for. I will hopefully fix this in another pass.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
sstable_test.cc was already a bit too big, and there is potential for
having a lot of tests about broken sstables.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
"
This patchset addresses two recently discovered bugs both triggered by
summary regeneration:
Tests: unit {release}
+
Validated with debug build of Scylla (ASAN) that no use-after-free
occurs when re-generating Summary.db.
"
* 'projects/sstables-30/summary-regeneration/v1' of https://github.com/argenet/scylla:
tests: Add test reading SSTables in 'mc' format with missing summary.
sstables: When loading, read statistics before summary.
database: Capture io_priority_class by reference to avoid dangling ref.
As far as I can tell, the old sstable reading code required reading the
data into a contiguous buffer. The function data_consume_rows_at_once
implemented the old behavior, and code was incrementally moved away
from it.
Right now its only use is in two tests. The sstables used in those
tests are already used in other tests with data_consume_rows.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20181127024319.18732-2-espindola@scylladb.com>
"
Compression is not deterministic, so instead of binary-comparing the sstable files we just read the data back
and make sure everything that was written is still present.
Tests: unit(release)
"
* 'haaawk/binary-compare-of-compressed-sstables/v3' of github.com:scylladb/seastar-dev:
sstables: Remove compressed parameter from get_write_test_path
sstables: Remove unused sstable test files
sstables: Ensure compare_sstables isn't used for compressed files
sstables: Don't binary compare compressed sstables
sstables: Remove debug printout from test_write_many_partitions
"
One part of the improvement comes from replacing zlib's CRC32 with the one
from libdeflate, which is optimized for modern architecture and utilizes the
PCLMUL instruction.
perf_checksum test was introduced to measure performance of various
checksumming operations.
Results for 514 B (relevant for writing with compression enabled):
test                                   iterations      median        mad        min        max
crc_test.perf_deflate_crc32_combine         58414    16.711us    3.483ns   16.708us   16.725us
crc_test.perf_adler_combine             165788278     6.059ns    0.031ns    6.027ns    7.519ns
crc_test.perf_zlib_crc32_combine            59546    16.767us   26.191ns   16.741us   16.801us
---
crc_test.perf_deflate_crc32_checksum     12705072    83.267ns    4.580ns   78.687ns   98.964ns
crc_test.perf_adler_checksum              3918014   206.701ns   23.469ns  183.231ns  258.859ns
crc_test.perf_zlib_crc32_checksum         2329682   428.787ns    0.085ns  428.702ns  510.085ns
Results for 64 KB (relevant for writing with compression disabled):
test                                   iterations      median        mad        min        max
crc_test.perf_deflate_crc32_combine         25364    38.393us   17.683ns   38.375us   38.545us
crc_test.perf_adler_combine             169797143     5.842ns    0.009ns    5.833ns    6.901ns
crc_test.perf_zlib_crc32_combine            26067    38.663us   95.094ns   38.546us   40.523us
---
crc_test.perf_deflate_crc32_checksum       202821     4.937us   14.426ns    4.912us    5.093us
crc_test.perf_adler_checksum                44684    22.733us  206.263ns   22.492us   25.258us
crc_test.perf_zlib_crc32_checksum           18839    53.049us   36.117ns   53.013us   53.274us
The new CRC32 implementation (deflate_crc32) doesn't provide a fast
checksum_combine() yet; it delegates to zlib, so it's as slow as the latter.
Because checksum_combine() for CRC32 is several orders of magnitude slower
than checksum(), we avoid calling checksum_combine() entirely for this
checksummer. We still do it for adler32, whose combine() is faster
than checksum().
SSTable write performance was evaluated by running:
perf_fast_forward --populate --data-directory /tmp/perf-mc \
--rows=10000000 -c1 -m4G --datasets small-part
Below is a summary of the average frag/s for a memtable flush. Each result is
an average of about 20 flushes with stddev of about 4k.
Before:
[1] MC,lz4: 330'903
[2] LA,lz4: 450'157
[3] MC,checksum: 419'716
[4] LA,checksum: 459'559
After:
[1'] MC,lz4: 446'917 ([1] + 35%)
[2'] LA,lz4: 456'046 ([2] + 1.3%)
[3'] MC,checksum: 462'894 ([3] + 10%)
[4'] LA,checksum: 467'508 ([4] + 1.7%)
After this series, the performance of the MC format writer is similar to that
of the LA format before the series.
There seems to be a small but consistent improvement for LA too. I'm not sure
why.
"
* tag 'improve-mc-sstable-checksum-libdeflate-v3' of github.com:tgrabiec/scylla:
tests: perf: Introduce perf_checksum
tests: Add test for libdeflate CRC32 implementation
sstables: compress: Use libdeflate for crc32
sstables: compress: Rename crc32_utils to zlib_crc32_checksummer
licenses: Add libdeflate license
Integrate libdeflate with the build system
Add libdeflate submodule
sstables: Avoid checksum_combine() for the crc32 checksummer
sstables: compress: Avoid unnecessary checksum_combine()
sstables: checksum_utils: Add missing include
"
Previously, we were checking for schema incompatibility between the current schema and the sstable
serialization header before reading any data. This isn't the best approach because
the data in the sstable may already be irrelevant, for example due to a column drop.
This patchset moves the check to after the actual data is read and verified to have
a timestamp new enough to classify it as non-obsolete.
Fixes #3924
"
* 'haaawk/3924/v3' of github.com:scylladb/seastar-dev:
sstables: Enable test_schema_change for MC format
sstables3: Throw error on schema mismatch only for live cells
sstables: Pass column_info to consume_*_column
sstables: Add schema_mismatch to column_info
sstables: Store column data type in column_info
sstables: Remove code duplication in column_translation
This family of test_write_many_partitions_* tests writes
sstables out from a memtable using different compression algorithms,
then compares the resulting file with a blueprint file
and reads the data back to check that everything is there.
Compression is not deterministic, so this patch makes the
tests skip the binary comparison of the compressed sstable file
with the blueprint file and only read the data back.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
"
Some queries are very unlikely to hit cache. Usually this includes
range queries on large tables, but other patterns are possible.
While the database should adapt to the query pattern, sometimes the
user has information the database does not have. By passing this
information along, the user helps the database manage its resources
more optimally.
To do this, this patch introduces a BYPASS CACHE clause to the
SELECT statement. A query thus marked will not attempt to read
from the cache, and instead will read from sstables and memtables
only. This reduces CPU time spent to query and populate the cache,
and will prevent the cache from being flooded with data that is
not likely to be read again soon. The existing cache-disabled path
is engaged when the option is selected.
Tests: unit (release), manual metrics verification with ccm with and without the
BYPASS CACHE clause.
Ref #3770.
"
* tag 'cache-bypass/v2' of https://github.com/avikivity/scylla:
doc: document SELECT ... BYPASS CACHE
tests: add test for SELECT ... BYPASS CACHE
cql: add SELECT ... BYPASS CACHE clause
db: add query option to bypass cache
* tag 'perf-ffwd-dataset-population-v2' of github.com:tgrabiec/scylla:
tests: perf_fast_forward: Measure performance of dataset population
tests: perf_fast_forward: Record the dataset on which test case was run
tests: perf_fast_forward: Introduce the concept of a dataset
tests: perf_fast_forward: Introduce make_compaction_disabling_guard()
tests: perf_fast_forward: Initialize output manager before population
tests: perf_fast_forward: Handle empty test parameter set
tests: perf_fast_forward: Extract json_output_writer::write_common_test_group()
tests: perf_fast_forward: Factor out access to cfg to a single place per function
tests: perf_fast_forward: Extract result_collector
tests: perf_fast_forward: Take writes into account in AIO statistics
tests: perf_fast_forward: Reorder members
tests: perf_fast_forward: Add --sstable-format command line option
Make sure that compaction is capable of releasing exhausted sstable space
early in the procedure.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Currently, compaction only replaces input sstables at the end of compaction,
meaning compaction must finish before all the space of those sstables
can be released.
What we can do instead is delete some input sstables earlier, under the
following conditions:
1) The sstable's data must be committed to a new, sealed output sstable,
meaning it's exhausted.
2) An exhausted sstable mustn't overlap with a non-exhausted sstable,
because a tombstone in the exhausted sstable could have been purged and the
shadowed data in the non-exhausted one could be resurrected if the system
crashes.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Older sstables must have an identifier for them to be associated
with their own run.
Reviewed-by: Nadav Har'El <nyh@scylladb.com>
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
A dataset represents a table with data, populated in certain way, with
certain characteristics of the schema and data.
Before this change, datasets were implicitly defined, with population
hard-coded inside the populate() function.
This change gathers the logic related to datasets into classes, in order to:
- make it easier to define new datasets.
- be able to measure performance of dataset population in a
standardized way.
- be able to express constraints on datasets imposed by different
test cases. Test cases are matched with possible datasets based
on the abstract interface they accept (e.g. clustered_ds,
multipartition_ds), which must be implemented by a compatible
dataset. To facilitate this matching, the test function is now wrapped
in a dataset_acceptor object with an automatically generated can_run()
virtual method, deduced by make_test_fn().
- be able to select tests to run based on the dataset name.
Only tests which are compatible with that dataset will be run.
Extracts the result collection and reporting logic out of
run_test_case(). Will be needed in population tests, for which we
don't want the looping logic.
for_each_schema_change() is used for testing reading an sstable that was
written with a different schema. Because of #3924, for now the mc format
is not verified this way.
This patch adds a for_each_schema_change() function which generates
schemas and data before and after some modification to the schema (e.g.
adding a column, changing its type). It can be used to test schema
upgrades.
This patch introduces a model of Scylla schemas and data, implemented
using simple standard-library primitives. It can be used for testing the
actual schemas, mutation_partitions, etc., by
comparing the results of various actions.
The initial use case for this model was to test schema changes, but
there is no reason why in the future it cannot be extended to test other
things as well.
"
Tested with perf_fast_forward from:
github.com/tgrabiec/scylla.git perf_fast_forward-for-sst3-opt-write-v1
Using the following command line:
build/release/tests/perf/perf_fast_forward_g --populate --sstable-format=mc \
--data-directory /tmp/perf-mc --rows=10000000 -c1 -m4G \
--datasets small-part
The average reported flush throughput was (stdev for the averages is around 4k):
- for mc before the series: 367848 frag/s
- for lc before the series: 463458 frag/s (= mc.before +25%)
- for mc after the series: 429276 frag/s (= mc.before +16%)
- for lc after the series: 466495 frag/s (= mc.before +26%)
Refs #3874.
"
* tag 'sst3-opt-write-v2' of github.com:tgrabiec/scylla:
sstables: mc: Avoid serialization of promoted index when empty
sstables: mc: Avoid double serialization of rows
tests: sstable 3.x: Do not compare Statistics component
utils: Introduce memory_data_sink
schema: Optimize column count getters
sstables: checksummed_file_data_sink_impl: Bypass output_stream
The Statistics component recorded in the test was generated using a
buggy version of Scylla, and is not correct. Exposed by fixing the bug
in the way statistics are generated.
Rather than comparing binary content, we should have explicit checks
for statistics.
"
Enables sstable compression with LZ4 by default, which was the
long-time behavior until a regression turned off compression by
default.
Fixes #3926
"
* 'restore-default-compression/v2' of https://github.com/duarten/scylla:
tests/cql_query_test: Assert default compression options
compress: Restore lz4 as default compressor
tests: Be explicit about absence of compression
The boost multiprecision library that I am compiling against seems
to be missing an overload for the cast to a string. The easy
workaround seems to be to call str() directly instead.
This also fixes #3922.
Message-Id: <20181120215709.43939-1-mike.munday@ibm.com>