A dataset represents a table with data, populated in certain way, with
certain characteristics of the schema and data.
Before this change, datasets were implicitly defined, with population
hard-coded inside the populate() function.
This change gathers logic related to datasets into classes, in order to:
- make it easier to define new datasets.
- be able to measure performance of dataset population in a
standardized way.
- being able to express constraints on datasets imposed by different
test cases. Test cases are matched with possible datasets based
on the abstract interface they accept (e.g. clustered_ds,
multipartition_ds), and which must be implemented by a compatible
dataset. To facilitate this matching, test function is now wrapped
into a dataset_acceptor object, with an automatically-generated can_run()
virtual method, deduced by make_test_fn().
- be able to select tests to run based on the dataset name.
Only tests which are compatible with that dataset will be run.
Extracts the result collection and reporting logic out of
run_test_case(). Will be needed in population tests, for which we
don't want the looping logic.
Currently if hints directory contains unexpected directories Scylla fails to
start with unhandled std::invalid_argument exception. Make the manager
ignore malformed files instead and try to proceed anyway.
Message-Id: <20181121134618.29936-2-gleb@scylladb.com>
We scan hints directory in two places: to search for files to replay and
to search for directories to remove after resharding. The code that
translates directory name to a shard is duplicated. It is simple now, so
not a bit issue but in case it grows better have it in one place.
Message-Id: <20181121134618.29936-1-gleb@scylladb.com>
"
Tested with perf_fast_forward from:
github.com/tgrabiec/scylla.git perf_fast_forward-for-sst3-opt-write-v1
Using the following command line:
build/release/tests/perf/perf_fast_forward_g --populate --sstable-format=mc \
--data-directory /tmp/perf-mc --rows=10000000 -c1 -m4G \
--datasets small-part
The average reported flush throughput was (stdev for the avergages is around 4k):
- for mc before the series: 367848 frag/s
- for lc before the series: 463458 frag/s (= mc.before +25%)
- for mc after the series: 429276 frag/s (= mc.before +16%)
- for lc after the series: 466495 frag/s (= mc.before +26%)
Refs #3874.
"
* tag 'sst3-opt-write-v2' of github.com:tgrabiec/scylla:
sstables: mc: Avoid serialization of promoted index when empty
sstables: mc: Avoid double serialization of rows
tests: sstable 3.x: Do not compare Statistics component
utils: Introduce memory_data_sink
schema: Optimize column count getters
sstables: checksummed_file_data_sink_impl: Bypass output_stream
The old code was serializing the row twice. Once to get the size of
its block on disk, which is needed to write the block length, and then
to actually write the block.
This patch avoids this by serializing once into a temporary buffer and
then appending that buffer to the data file writer.
I measured about 10% improvement in memtable flush throughput with
this for the small-part dataset in perf_fast_forward.
The Statistics component recorded in the test was generated using a
buggy verion of Scylla, and is not correct. Exposed by fixing the bug
in the way statistics are generated.
Rather than comparing binary content, we should have explicit checks
for statistics.
"
Enables sstable compression with LZ4 by default, which was the
long-time behavior until a regression turned off compression by
default.
Fixes#3926
"
* 'restore-default-compression/v2' of https://github.com/duarten/scylla:
tests/cql_query_test: Assert default compression options
compress: Restore lz4 as default compressor
tests: Be explicit about absence of compression
Indicate the default scylla directories, rather than Cassandra's.
Provide links to Scylladocumentation where possible,
update links to Casandra documentation otherwise.
Clean up a few typos.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20181119141912.28830-1-bhalevy@scylladb.com>
This avoids a difference between little and big endian sytems. We
now also calculate a full murmur hash for tokens with less than 8
bytes, however in practice the token size is always 8.
Message-Id: <20181120214733.43800-1-mike.munday@ibm.com>
The boost multiprecision library that I am compiling against seems
to be missing an overload for the cast to a string. The easy
workaround seems to be to call str() directly instead.
This also fixes#3922.
Message-Id: <20181120215709.43939-1-mike.munday@ibm.com>
Fixes a regression introduced in
74758c87cd, where tables started to be
created without compression by default (before they were created with
lz4 by default).
Fixes#3926
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
* seastar d59fcef...b924495 (2):
> build: Fix protobuf generation rules
> Merge "Restructure files" from Jesse
Includes fixup patch from Jesse:
"
Update Seastar `#include`s to reflect restructure
All Seastar header files are now prefixed with "seastar" and the
configure script reflects the new locations of files.
Signed-off-by: Jesse Haber-Kucharsky <jhaberku@scylladb.com>
Message-Id: <5d22d964a7735696fb6bb7606ed88f35dde31413.1542731639.git.jhaberku@scylladb.com>
"
The build environment may not installed ninja-build before running
install-dependencies.sh, so do it after running the script.
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20181031110737.17755-1-syuu@scylladb.com>
* seastar a44cedf...d59fcef (10):
> dns: Set tcp output stream buffer size to zero explicitly
> tests: add libc-ares to travis dependencies
> tests: add dns_test to test suite
> build: drop bundled c-ares package
> prometheus: replace the instance label with an optional one
> build: Refactor C++ dialect detection
> build: add libatomic to install-depenencies.sh
> core: use std::underlying_type for open_flags
> core: introduce open_flags::operator&
> core: Fix build for `gnu++14`
Currently, when advance_and_await() fails to allocate the new gate
object, it will throw bad_alloc and leave the phased_barrier object in
an invalid state. Calling advance_and_await() again on it will result
in undefined behavior (typically SIGSEGV) beacuse _gate will be
disengaged.
One place affected by this is table::seal_active_memtable(), which
calls _flush_barrier.advance_and_await(). If this throws, subsequent
flush attempts will SIGSEGV.
This patch rearranges the code so that advance_and_await() has strong
exception guarantees.
Message-Id: <1542645562-20932-1-git-send-email-tgrabiec@scylladb.com>
In (almost) all SSTable write paths, we need to inform the monitor that
the write has failed as well. The monitor will remove the SSTable from
controller's tracking at that point.
Except there is one place where we are not doing that: streaming of big
mutations. Streaming of big mutations is an interesting use case, in
which it is done in 2 parts: if the writing of the SSTable fails right
away, then we do the correct thing.
But the SSTables are not commited at that point and the monitors are
still kept around with the SSTables until a later time, when they are
finally committed. Between those two points in time, it is possible that
the streaming code will detect a failure and manually call
fail_streaming_mutations(), which marks the SSTable for deletions. At
that point we should propagate that information to the monitor as well,
but we don't.
Fixes#3732 (hopefully)
Tests: unit (release)
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20181114213618.16789-1-glauber@scylladb.com>
In a homogeneous cluster this will reduce number of internal cross-shard hops
per request since RPC calls will arrive to correct shard.
Message-Id: <20181118150817.GF2062@scylladb.com>
In commit a33f0d6, we changed the way we handle arrays during the write
and parse code to avoid reactor stalls. Some potentially big loops were
transformed into futurized loops, and also some calls to vector resizes
were replaced by a reserve + push_back idiom.
The latter broke parsing of the estimated histogram. The reason being
that the vectors that are used here are already initialized internally
by the estimated_histogram object. Therefore, when we push_back, we
don't fill the array all the way from index 0, but end up with a zeroed
beginning and only push back some of the elements we need.
We could revert this array to a resize() call. After all, the reason we
are using reserve + push_back is to avoid calling the constructor member
for each element, but We don't really expect the integer specialization
to do any of that.
However, to avoid confusion with future developers that may feel tempted
to converted this as well for the sake of consistency, it is safer to
just make sure these arrays are zeroed.
Fixes#3918
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20181116130853.10473-1-glauber@scylladb.com>
Remove the timeout argument to
db::view::view_builder::wait_until_built(), a test-only function to
wait until a given materialized view has finished building.
This change is motivated by the fact that some tests running on slow
environments will timeout. Instead of incrementally increasing the
timeout, remove it completely since tests are already run under an
exterior timeout.
Fixes#3920
Tests: unit release(view_build_test, view_schema_test)
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20181115173902.19048-1-duarte@scylladb.com>
map_reduce_column_families_locally iterate over all tables (column
family) in a shard.
If the number of tables is big it can cause latency spikes.
This patch replaces the current loop with a do_for_each allowing
preepmtion inside the loop.
Fixes#3886
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <20181115154825.23430-1-amnon@scylladb.com>
In a previous patch I fixed most TTLs in the view_complex_test.cc tests
from low numbers to 100 seconds. I missed one. This one never caused
problems in practice, but for good form, let's fix it too.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20181115160234.26478-1-nyh@scylladb.com>
After this patch, the Materialized Views and Secondary Index features
are considered generally-available and no longer require passing an
explicit "--experimental=on" flag to Scylla.
The "--experimental=on" flag and the db::config::check_experimental()
function remain unused, as we graduated the only two features which used
this flag. However, we leave the support for experimental features in
the code, to make it easier to add new experimental features in the future.
Another reason to leave the command-line parameter behind is so existing
scripts that still use it will not break.
Fixes#3917
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20181115144456.25518-1-nyh@scylladb.com>
"
This series enables filtering support for CONTAINS restriction.
"
* 'enable_filtering_for_contains_2' of https://github.com/psarna/scylla:
tests: add CONTAINS test case to filtering tests
cql3: enable filtering for CONTAINS restriction
cql3: add is_satisfied_by(bytes_view) for CONTAINS