Commit Graph

778 Commits

Author SHA1 Message Date
Calle Wilund
9ba84e458a Commitlog: Handle partial writes in segment::cycle
* Fixes #247
* Re-introduce test_allocation_failure, but allow for the "failure" to not
  happen. I.e. if run with low memory settings, the test will check that
  allocation failure is graceful. With lots of memory it will check partial
  write.
2015-08-31 20:02:05 +03:00
Paweł Dziepak
78eb61b38e tests: add test for managed_vector
Signed-off-by: Paweł Dziepak <pdziepak@cloudius-systems.com>
2015-08-31 17:29:16 +02:00
Paweł Dziepak
f1167a594a tests/cql_env: make sure that value views are correct
Query options need to have correct _value_views in order for
get_value_at() to work. With this patch we switch to the constructor that
generates value views from the passed values and sets the remaining options
to their default values.

Signed-off-by: Paweł Dziepak <pdziepak@cloudius-systems.com>
2015-08-31 17:25:36 +02:00
Paweł Dziepak
4b9791230a tests/perf/simple_query: fix write mode
Signed-off-by: Paweł Dziepak <pdziepak@cloudius-systems.com>
2015-08-31 17:25:32 +02:00
Avi Kivity
f2a79aa7f6 Merge Prepare for closing sstables, part 1
Read-ahead will require that we close input_streams.  As part of that
we have to close sstables, and mutation_readers (which encapsulate
input_streams).  This is part 1 of a patchset series to do that.

(The overarching goal is to enable read-ahead for sstables, see #244)

Conflicts:
	sstables/compaction.cc
2015-08-31 16:15:18 +03:00
Avi Kivity
702de43ce3 Merge "Commit log replay" from Calle
"Initial implementation/transposition of commit log replay.

* Changes replay position to be shard aware
* Commit log segment IDs now follow basically the same scheme as origin:
  max(previous ID, wall clock time in ms) + shard info (for us)
* SStables now use the DB definition of replay_position.
* Stores and propagates (compaction) flush replay positions in sstables
* If CL segments are left over from a previous run, they, and existing
  sstables are inspected for high water mark, and then replayed from
  those marks to amend mutations potentially lost in a crash
* Note that a CPU count change is "handled" insofar as shard matching is
  per the _previous_ run's shards, not the current one's.

Known limitations:
* Mutations deserialized from old CL segments are _not_ fully validated
  against existing schemas.
* System::truncated_at (not currently used) does not handle sharding afaik,
  so watermark IDs coming from there are dubious.
* Mutations that fail to apply (invalid, broken) are not placed in blob files
  like origin. Partly because I am lazy, but also partly because our serial
  format differs, and we currently have no tools to do anything useful with it.
* No replay filtering (Origin allows a system property to designate a filter
  file, detailing which keyspaces/CFs to replay). Partly because we have no
  system properties.

There is no unit test for the commit log replayer (yet),
because I could not really come up with a good one given the existing test
infrastructure (it is tricky to kill stuff just "right").
The functionality is verified by manual testing, i.e. running scylla,
building up data (cassandra-stress), kill -9 + restart.
This of course does not fully validate whether the resulting DB is
100% valid compared to the one at the kill -9, but at least it verifies that replay
took place and that mutations were applied.
(Note that origin also lacks validity testing)"

Fixes #98.
2015-08-31 15:58:12 +03:00
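The segment ID scheme in the merge message ("max(previous ID, wall clock time in ms) + shard info") can be sketched as below. This is a hypothetical illustration, not the actual commitlog code: the bit layout (shard in the top 8 bits) and the helper names next_segment_id, id_counter and id_shard are assumptions, since the message does not say how the shard is encoded. The sketch also uses previous ID + 1 instead of the previous ID itself, so that two consecutive segments can never share an ID.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical layout (assumption): top 8 bits hold the shard, low 56 bits the counter.
constexpr unsigned shard_bits = 8;
constexpr unsigned counter_bits = 64 - shard_bits;
constexpr uint64_t counter_mask = (uint64_t(1) << counter_bits) - 1;

// Counter part of an ID, with the shard info stripped.
constexpr uint64_t id_counter(uint64_t id) { return id & counter_mask; }

// Shard encoded in the top bits of an ID.
constexpr unsigned id_shard(uint64_t id) { return unsigned(id >> counter_bits); }

// Next ID: max(previous counter + 1, wall clock ms), tagged with the shard.
// A wall clock that jumps backwards therefore never yields a smaller ID.
uint64_t next_segment_id(uint64_t prev_id, uint64_t now_ms, unsigned shard) {
    uint64_t counter = std::max(id_counter(prev_id) + 1, now_ms & counter_mask);
    return (uint64_t(shard) << counter_bits) | (counter & counter_mask);
}
```

Keeping the shard in the ID is what lets replay match segments to the shards of the previous run, as the message notes.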
Avi Kivity
7090dffe91 mutation_reader: switch to a class based implementation
Using a lambda for implementing a mutation_reader is nifty, but does not
allow us to add methods.

Switch to a class-based implementation in anticipation of adding a close()
method.
2015-08-31 15:53:53 +03:00
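A rough sketch of what a class-based reader with a close() method might look like. It is synchronous and returns std::optional, whereas the real reader works with Seastar futures; mutation, mutation_reader_impl and vector_reader are hypothetical stand-ins, not the project's actual types.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <optional>
#include <utility>
#include <vector>

struct mutation { int key; };  // hypothetical stand-in for the real mutation type

// Reader interface as a class: unlike a bare lambda it can grow extra
// methods, such as close().
class mutation_reader_impl {
public:
    virtual ~mutation_reader_impl() = default;
    virtual std::optional<mutation> operator()() = 0;  // next mutation, nullopt at end
    virtual void close() {}                            // release resources (e.g. input streams)
};

// Owning handle that callers pass around.
class mutation_reader {
    std::unique_ptr<mutation_reader_impl> _impl;
public:
    explicit mutation_reader(std::unique_ptr<mutation_reader_impl> impl) : _impl(std::move(impl)) {}
    std::optional<mutation> operator()() { return (*_impl)(); }
    void close() { _impl->close(); }
};

// Example implementation over an in-memory vector.
class vector_reader final : public mutation_reader_impl {
    std::vector<mutation> _data;
    std::size_t _pos = 0;
    bool _closed = false;
public:
    explicit vector_reader(std::vector<mutation> data) : _data(std::move(data)) {}
    std::optional<mutation> operator()() override {
        if (_closed || _pos == _data.size()) {
            return std::nullopt;
        }
        return _data[_pos++];
    }
    void close() override { _closed = true; }
    bool closed() const { return _closed; }
};
```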
Calle Wilund
e068ffb5a5 Commitlog: Make file reader provide replay_position for entries 2015-08-31 14:29:47 +02:00
Calle Wilund
4ac07fa87d Commitlog test: remove some hardcoded assumptions on segment IDs
To enable changing the ID generation scheme.
2015-08-31 14:29:45 +02:00
Calle Wilund
0fcf7e3e91 Commitlog: Make "position" type 32-bit to align replay_position with Origin

* Note: removed commitlog_test:test_allocation_failure because with
  segments limited to 4GB (and thus mutations to 2GB), actually forcing
  a failure is not guaranteed or even likely.
2015-08-31 14:29:44 +02:00
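A minimal sketch of a replay position whose in-segment offset is 32 bits wide, assuming a plain segment ID plus offset pair ordered lexicographically. The field names id and pos are assumptions for illustration. A 32-bit unsigned offset can address at most 2^32 bytes, which is where the 4GB segment limit mentioned above comes from.

```cpp
#include <cassert>
#include <cstdint>
#include <tuple>

// Which segment, and the byte offset inside it. The offset is 32-bit,
// so a single segment cannot exceed 4 GB.
struct replay_position {
    uint64_t id = 0;   // segment ID
    uint32_t pos = 0;  // offset within the segment

    // Positions order by segment first, then by offset within the segment.
    bool operator<(const replay_position& o) const {
        return std::tie(id, pos) < std::tie(o.id, o.pos);
    }
    bool operator==(const replay_position& o) const {
        return id == o.id && pos == o.pos;
    }
};
```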
Avi Kivity
8c69098c89 Merge "Optimize memtable's scanning_reader" from Tomasz
"I saw about a 4% improvement in perf_sstable write on muninn with this. The
decorated_key comparison is gone from the perf profile now. Now most of the
work inside the reader is for copying the mutation."
2015-08-31 15:07:27 +03:00
Tomasz Grabiec
110a55886c lsa: Introduce region::compaction_counter() 2015-08-31 13:58:42 +02:00
Glauber Costa
a9ab31dd9c index_entry: move its fields to private visibility
And provide accessors. This will give us the freedom to change their internal
storage.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-08-29 14:05:36 -05:00
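The pattern described here, private fields behind accessors so the storage can change later, might look like the sketch below. The field names (key, position, promoted index) and the std::string stand-in for bytes are assumptions, not the project's actual index_entry.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <string_view>
#include <utility>

using bytes = std::string;  // hypothetical stand-in for the project's bytes type

// Fields are private; callers only see accessors, so the internal
// representation can change without touching them.
class index_entry {
    bytes _key;
    uint64_t _position;
    bytes _promoted_index;
public:
    index_entry(bytes key, uint64_t position, bytes promoted_index)
        : _key(std::move(key)), _position(position), _promoted_index(std::move(promoted_index)) {}

    std::string_view get_key() const { return _key; }
    uint64_t position() const { return _position; }
    std::string_view get_promoted_index() const { return _promoted_index; }
};
```

Returning views rather than references to the members keeps callers independent of the concrete storage type.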
Glauber Costa
13d59c9618 index_entry: do away with the disk_string<> fields
Now that we are using the NSM, and not the general parser for the index, there
is no reason to keep using disk_string<>s in it. Since it is getting in the way
of further optimizations, let's get rid of it and use bytes directly.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-08-29 14:05:36 -05:00
Avi Kivity
c734ef2b72 Merge seastar upstream
* seastar 10e09b0...2e041c2 (7):
  > Merge "Change app_template::run() to terminate when callback is done" from Tomasz
  > resource: Fix compilation for hwloc version 1.8.0
  > memory: Fix infinite recursion when throwing std::bad_alloc
  > core/reactor: Throw the right error code when connect() fails
  > future: improve exception safety
  > xen: add missing virtual destructors
  > circular_buffer: do not destroy uninitialized object

app_template::run() users updated to call app_template::run_deprecated().
2015-08-28 23:52:49 +03:00
Glauber Costa
bd272fe6aa perf_sstable: test sequential reads from an sstable.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-08-27 09:02:11 -05:00
Glauber Costa
b194509a6d perf_write: test for full writes
It writes 5 columns (configurable) per row. This will exercise other paths
aside from the index.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-08-27 09:02:11 -05:00
Glauber Costa
dcd312a982 perf_sstable: more than just the index
My plan was originally to have two separate sets of tests: one for the index,
and one for the data. With most of the code having ended up in the .hh file anyway,
this distinction became a bit pointless.

Let's put everything here.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-08-27 09:02:11 -05:00
Glauber Costa
b3b0aff85e perf_sstable_index: add test for index_read
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-08-27 09:02:11 -05:00
Glauber Costa
93e55969f2 sstables: modify read_indexes so it no longer takes a quantity
read_indexes was one of the first functions coded in the sstable read path. At
the time, I made the (now so obviously) wrong decision to make it generic
enough that we could specify the number of items to be read, instead of an
upper bound in the file.

The main reason for that was that without the Summary, we have no way to know
where to stop reading, and the Summary is a relatively new addition to the C*
codebase: while I didn't really check when it got in, the code is full of tests
for its presence.

That turned out to be totally useless: we always read the indexes with the help
of the Summary. While the Summary is a relatively new addition to C*, it is
present in all versions we aim to support, meaning that reads without the
Summary will never happen in our codebase.

Even if, in the future, we happen to ditch the Summary file, we are very likely
to do so in favor of some other structure that also allows us to manipulate precise
borders in the Index.

The code as it is, however, would not be too big of a problem if it weren't
causing us performance problems. But it is, and the majority of it is caused by
the fact that our underlying read_indexes does not know in advance how many bytes
to read, forcing us to do an element-by-element read.

It's time for a change.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-08-27 16:44:25 +03:00
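A simplified sketch of reading index entries by byte range instead of by count: once the Summary supplies both borders, the whole range can be fetched at once and parsed from memory. The entry layout ([u16 key length][key][u64 position][u32 promoted index length][promoted index], big-endian) is an assumption modeled loosely on the Cassandra index format, and read_index_range is a hypothetical name, not the new read_indexes signature.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

struct index_entry {
    std::string key;
    uint64_t position;  // offset of the partition in the data file
};

// Reads an nbytes-wide big-endian integer at `off` and advances `off`.
static uint64_t read_be(const std::string& buf, std::size_t& off, std::size_t nbytes) {
    if (off + nbytes > buf.size()) {
        throw std::runtime_error("truncated index entry");
    }
    uint64_t v = 0;
    for (std::size_t i = 0; i < nbytes; ++i) {
        v = (v << 8) | uint8_t(buf[off++]);
    }
    return v;
}

// Parses all entries in [start, end) of the index file. Both borders are
// known up front, so the range is fetched in one bulk read and then parsed
// from memory, rather than read entry by entry.
std::vector<index_entry> read_index_range(const std::string& file, std::size_t start, std::size_t end) {
    const std::string buf = file.substr(start, end - start);  // stands in for one bulk I/O
    std::vector<index_entry> out;
    std::size_t off = 0;
    while (off < buf.size()) {
        std::size_t klen = read_be(buf, off, 2);
        if (off + klen > buf.size()) {
            throw std::runtime_error("truncated key");
        }
        std::string key = buf.substr(off, klen);
        off += klen;
        uint64_t pos = read_be(buf, off, 8);
        std::size_t promoted_len = read_be(buf, off, 4);
        if (off + promoted_len > buf.size()) {
            throw std::runtime_error("truncated promoted index");
        }
        off += promoted_len;  // promoted index skipped in this sketch
        out.push_back({std::move(key), pos});
    }
    return out;
}
```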
Avi Kivity
5f62f7a288 Revert "Merge "Commit log replay" from Calle"
Due to test breakage.

This reverts commit 43a4491043, reversing
changes made to 5dcf1ab71a.
2015-08-27 12:39:08 +03:00
Avi Kivity
0fff367230 Merge "test for compaction metadata's ancestors" from Raphael 2015-08-27 11:07:53 +03:00
Avi Kivity
43a4491043 Merge "Commit log replay" from Calle
"Initial implementation/transposition of commit log replay.

* Changes replay position to be shard aware
* Commit log segment IDs now follow basically the same scheme as origin:
  max(previous ID, wall clock time in ms) + shard info (for us)
* SStables now use the DB definition of replay_position.
* Stores and propagates (compaction) flush replay positions in sstables
* If CL segments are left over from a previous run, they, and existing
  sstables are inspected for high water mark, and then replayed from
  those marks to amend mutations potentially lost in a crash
* Note that a CPU count change is "handled" insofar as shard matching is
  per the _previous_ run's shards, not the current one's.

Known limitations:
* Mutations deserialized from old CL segments are _not_ fully validated
  against existing schemas.
* System::truncated_at (not currently used) does not handle sharding afaik,
  so watermark IDs coming from there are dubious.
* Mutations that fail to apply (invalid, broken) are not placed in blob files
  like origin. Partly because I am lazy, but also partly because our serial
  format differs, and we currently have no tools to do anything useful with it.
* No replay filtering (Origin allows a system property to designate a filter
  file, detailing which keyspaces/CFs to replay). Partly because we have no
  system properties.

There is no unit test for the commit log replayer (yet),
because I could not really come up with a good one given the existing test
infrastructure (it is tricky to kill stuff just "right").
The functionality is verified by manual testing, i.e. running scylla,
building up data (cassandra-stress), kill -9 + restart.
This of course does not fully validate whether the resulting DB is
100% valid compared to the one at the kill -9, but at least it verifies that replay
took place and that mutations were applied.
(Note that origin also lacks validity testing)"
2015-08-27 10:53:36 +03:00
Glauber Costa
873cf17cf4 sstable tests: allow for the creation of sstables of non-default buffer size.
This can now be used in the sstable_index_write performance test.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-08-25 18:31:50 -05:00
Glauber Costa
f4d8310d88 perf_sstable_index: calculate time spent before the map reduce operation.
Not doing that will include the smp communication costs in the total cost of
the operation. This will not very significant when comparing one run against
the other when the results clearly differ, but the proposed way yields error
figures that are much lower. So results are generally better.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-08-25 18:31:49 -05:00
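The idea of timing each shard's work before the results are gathered can be sketched as follows. Here std::thread workers stand in for Seastar shards and the final gathering stands in for the map-reduce step; run_timed is a hypothetical helper, not part of perf_sstable_index.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>
#include <vector>

// Each worker (standing in for a shard) times only its own work. The
// per-worker durations are collected afterwards, so the cost of gathering
// results across workers never enters the measurement.
std::vector<double> run_timed(unsigned workers, const std::function<void(unsigned)>& work) {
    std::vector<double> seconds(workers);
    std::vector<std::thread> threads;
    for (unsigned w = 0; w < workers; ++w) {
        threads.emplace_back([&, w] {
            auto start = std::chrono::steady_clock::now();
            work(w);
            auto stop = std::chrono::steady_clock::now();
            seconds[w] = std::chrono::duration<double>(stop - start).count();
        });
    }
    for (auto& t : threads) {
        t.join();
    }
    return seconds;  // the "reduce" over these values happens on the caller's side
}
```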
Glauber Costa
19d25130af perf_sstable_index: make parallelism an explicit option
As we have discussed recently, the sstable writer can't even handle intra-core
parallelism - it has only one writer thread per core, and for reads, it affects
the final throughput a lot.

We don't want to get rid of it, because in real scenarios intra-core
parallelism will be there, especially for reads. So let's make it a tunable so we
can easily test its effect on the final result.

The iterations are now all sequential, and we run x parallel invocations in
each of them.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-08-25 18:31:49 -05:00
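The execution shape described above (sequential iterations, each running x invocations concurrently and waiting for all of them) might be sketched like this. In Seastar the concurrency would come from futures on a single reactor thread; std::async threads are used here only as a stand-in, and run_rounds is a hypothetical name.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <future>
#include <vector>

// Runs `iterations` rounds strictly one after another. Within each round,
// `parallelism` invocations of `op` are in flight at once, and the round
// ends only when all of them have finished. Returns the number of completed
// invocations.
unsigned run_rounds(unsigned iterations, unsigned parallelism, const std::function<void()>& op) {
    std::atomic<unsigned> completed{0};
    for (unsigned i = 0; i < iterations; ++i) {
        std::vector<std::future<void>> inflight;
        for (unsigned p = 0; p < parallelism; ++p) {
            inflight.push_back(std::async(std::launch::async, [&] {
                op();
                ++completed;
            }));
        }
        for (auto& f : inflight) {
            f.get();  // barrier: wait for the whole round
        }
    }
    return completed.load();
}
```

Setting parallelism to 1 degenerates to a purely sequential run, which makes it easy to compare the two modes.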
Avi Kivity
b22a598efb mutation_reader: make noncopyable
Many mutation_reader implementations capture 'this', which, if copied,
becomes invalid.  Protect against this error by making mutation_reader
a non-copyable object.

Fix inadvertent copies around the code base.
2015-08-25 15:49:08 +03:00
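A small illustration of the hazard and the fix, using a hypothetical counting_reader rather than the real mutation_reader: the callback captures `this`, so a copy would hold a callback pointing at the original object. Deleting the copy operations turns such a copy into a compile error.

```cpp
#include <cassert>
#include <functional>
#include <type_traits>

// A reader whose callback captures `this`. A copy would carry a callback
// that still points at the original object, which may already be gone,
// so copying is disabled outright.
class counting_reader {
    int _next = 0;
    std::function<int()> _fn;
public:
    counting_reader() : _fn([this] { return _next++; }) {}
    counting_reader(const counting_reader&) = delete;
    counting_reader& operator=(const counting_reader&) = delete;
    int operator()() { return _fn(); }
};

// Any accidental copy now fails to compile instead of dangling at run time.
static_assert(!std::is_copy_constructible<counting_reader>::value,
              "readers must not be copyable");
```

Declaring the copy operations as deleted also suppresses the implicit move operations, which suits this sketch since a moved lambda would keep the old `this` as well. A reader that must stay movable can keep its state behind a std::unique_ptr, so the captured pointer remains valid across moves.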
Calle Wilund
fcb87471b9 Commitlog: Make file reader provide replay_position for entries 2015-08-25 09:40:53 +02:00
Calle Wilund
366263d866 Commitlog test: remove some hardcoded assumptions on segment IDs
To enable changing the ID generation scheme.
2015-08-25 09:14:40 +02:00
Raphael S. Carvalho
b2f76273bd tests: check correctness of sstable ancestor metadata
Add a testcase for that purpose.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-08-24 15:25:52 -03:00
Avi Kivity
8a4648761c tests: make test cql environment use volatile system keyspace
Prevents hangs due to the database not being able to persist a memtable.

Tested-by: Asias He <asias@cloudius-systems.com>
2015-08-24 13:50:22 +03:00
Pekka Enberg
6dee204db2 cql3/query_options: Store values as bytes view
Store values as bytes view when possible. This improves the CQL protocol
option parsing path by avoiding allocating memory and copying individual
values as "bytes" objects.

Please note that we retain the non-view version for internal queries
where performance is not as important.

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-08-24 09:06:13 +03:00
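The benefit of storing values as views can be shown with a sketch that parses [int32 length][payload] values, the layout of a [value] in the CQL native protocol, into views over the request frame. std::string_view stands in for the project's bytes_view, and parse_values is a hypothetical helper; negative lengths are simply treated as null here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <string_view>
#include <vector>

using bytes_view = std::string_view;  // hypothetical stand-in for bytes_view

// Reads a big-endian int32 at `off`.
static int32_t read_int32(bytes_view buf, std::size_t off) {
    uint32_t v = 0;
    for (std::size_t i = 0; i < 4; ++i) {
        v = (v << 8) | uint8_t(buf[off + i]);
    }
    return int32_t(v);
}

// Parses `count` values. The results are views into `frame`, so there is no
// per-value allocation or copy; the frame must outlive them.
std::vector<std::optional<bytes_view>> parse_values(bytes_view frame, std::size_t count) {
    std::vector<std::optional<bytes_view>> values;
    values.reserve(count);
    std::size_t off = 0;
    for (std::size_t i = 0; i < count; ++i) {
        if (off + 4 > frame.size()) {
            throw std::runtime_error("truncated value length");
        }
        int32_t len = read_int32(frame, off);
        off += 4;
        if (len < 0) {
            values.emplace_back(std::nullopt);  // null value
            continue;
        }
        if (off + std::size_t(len) > frame.size()) {
            throw std::runtime_error("truncated value");
        }
        values.emplace_back(frame.substr(off, std::size_t(len)));
        off += std::size_t(len);
    }
    return values;
}
```

The trade-off is lifetime: the views are only valid while the frame buffer is alive, which is why internal queries, as noted above, keep the owning version.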
Raphael S. Carvalho
32ce27f00d tests: fix possible failure on compaction manager test
If sleep time isn't enough for compaction manager to select the
submitted cf for compaction, then the test will fail because the
compaction will not take place and subsequent checks will fail.
A solution is to sleep until the required condition becomes true.

Problem and solution found by Shlomi.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-08-19 18:26:54 +03:00
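Sleeping until a condition holds, instead of for a fixed time, can be sketched as a polling loop. The timeout is an addition of this sketch (the commit message does not mention one) so that a broken test fails instead of hanging; wait_until is a hypothetical helper, and the real test would yield through Seastar rather than block a thread.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Polls `cond` until it holds or `timeout` expires, instead of sleeping a
// fixed amount and hoping the background work has finished.
bool wait_until(const std::function<bool()>& cond,
                std::chrono::milliseconds timeout,
                std::chrono::milliseconds poll = std::chrono::milliseconds(1)) {
    auto deadline = std::chrono::steady_clock::now() + timeout;
    while (!cond()) {
        if (std::chrono::steady_clock::now() >= deadline) {
            return false;
        }
        std::this_thread::sleep_for(poll);
    }
    return true;
}
```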
Avi Kivity
e7272d27cc tests: perf_mutation: convert to app_template
It won't work with LSA without it, because the default memory size is too small.
2015-08-19 11:18:07 +03:00
Avi Kivity
2354611920 Merge "storage_service update" from Asias 2015-08-18 12:34:14 +03:00
Asias He
63a577c34c tests/cql_test_env: Init system_keyspace's local_cache
storage_service will soon access it, so we need to init it.
2015-08-18 17:06:02 +08:00
Raphael S. Carvalho
820ba6f4d2 adapt compaction manager for column family removal
We need a way to remove a column family from the compaction manager
because when dropping a column family we need to make sure that the
compaction manager doesn't hold a reference to it anymore.

So the compaction manager queue now holds column families, allowing us
to cancel requests pertaining to a column family being dropped.
There may be an ongoing compaction for the column family being
dropped, so we also need to wait for its termination.

Testcase for compaction manager was also adapted and improved.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-08-18 11:38:06 +03:00
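A minimal sketch of a queue of column families from which pending requests can be cancelled when a family is dropped. compaction_queue, cancel and the column_family stand-in are hypothetical names; waiting for an already running compaction, which the commit also mentions, is not shown.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <iterator>
#include <string>

struct column_family { std::string name; };  // hypothetical stand-in

// Pending compaction requests, queued per column family so that requests
// for a family being dropped can be cancelled before they run.
class compaction_queue {
    std::deque<column_family*> _pending;
public:
    void submit(column_family* cf) { _pending.push_back(cf); }

    // Cancels every pending request for `cf`; returns how many were removed.
    std::size_t cancel(column_family* cf) {
        auto it = std::remove(_pending.begin(), _pending.end(), cf);
        auto n = static_cast<std::size_t>(std::distance(it, _pending.end()));
        _pending.erase(it, _pending.end());
        return n;
    }

    // Pops the next request, or nullptr when the queue is empty.
    column_family* next() {
        if (_pending.empty()) {
            return nullptr;
        }
        column_family* cf = _pending.front();
        _pending.pop_front();
        return cf;
    }
};
```

After cancel() returns, the queue holds no pointer to the dropped family, which is the property the drop path needs.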
Raphael S. Carvalho
2608427469 sstables: add support to range tombstone of a clustered row
Range tombstones for a clustered row weren't supported, so an assert
placed there as a reminder was being triggered.
A testcase was added.

Fixes #158.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-08-18 10:41:25 +03:00
Calle Wilund
8f0f4e7945 Commitlog: do more extensive dir entry probes to determine type
Since directory_entry "type" might not be set.
Ensuring that code does not remain future free or easy to read.

Fixes #157.
2015-08-17 16:56:31 +03:00
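The fallback described above (the directory entry type may be unset, so it has to be probed) can be illustrated with plain POSIX calls: readdir() may report DT_UNKNOWN in d_type, in which case stat() decides. This synchronous sketch is an assumption for illustration only; the actual fix goes through Seastar's asynchronous file API, and probe and regular_files are hypothetical names.

```cpp
#include <cassert>
#include <dirent.h>
#include <string>
#include <sys/stat.h>
#include <vector>

enum class entry_kind { regular, directory, other };

// Some filesystems report DT_UNKNOWN in d_type, so the type must be probed
// with stat() before deciding what an entry is.
entry_kind probe(const std::string& dir, const dirent& de) {
    switch (de.d_type) {
    case DT_REG: return entry_kind::regular;
    case DT_DIR: return entry_kind::directory;
    case DT_UNKNOWN: {
        struct stat st;
        if (::stat((dir + "/" + de.d_name).c_str(), &st) != 0) {
            return entry_kind::other;
        }
        if (S_ISREG(st.st_mode)) return entry_kind::regular;
        if (S_ISDIR(st.st_mode)) return entry_kind::directory;
        return entry_kind::other;
    }
    default: return entry_kind::other;
    }
}

// Lists the regular files in `dir` (empty if it cannot be opened).
std::vector<std::string> regular_files(const std::string& dir) {
    std::vector<std::string> out;
    DIR* d = ::opendir(dir.c_str());
    if (!d) {
        return out;
    }
    while (dirent* de = ::readdir(d)) {
        if (probe(dir, *de) == entry_kind::regular) {
            out.push_back(de->d_name);
        }
    }
    ::closedir(d);
    return out;
}
```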
Paweł Dziepak
4e3f81ee62 tests/cql3: test_tuples: test table creation as well
Signed-off-by: Paweł Dziepak <pdziepak@cloudius-systems.com>
2015-08-17 14:44:08 +02:00
Paweł Dziepak
498958878e tests/cql3: compare token() with bigints
The default partitioner is murmur3 for which correct token type is
bigint.

Signed-off-by: Paweł Dziepak <pdziepak@cloudius-systems.com>
2015-08-14 16:51:20 +02:00
Paweł Dziepak
d9f20ebbd1 tests/cql3: add tests for compact storage tables
Signed-off-by: Paweł Dziepak <pdziepak@cloudius-systems.com>
2015-08-14 14:53:35 +02:00
Paweł Dziepak
36bd11bf96 tests/cql3: add tests for IN restrictions
Signed-off-by: Paweł Dziepak <pdziepak@cloudius-systems.com>
2015-08-13 11:36:16 +02:00
Paweł Dziepak
9966a2eac6 cql3: sort and remove duplicates in multi-column IN restrictions
Values inside IN clause should be sorted and duplicates removed if the
restricted columns are part of the clustering key, which is always true
for multi column restrictions.

Signed-off-by: Paweł Dziepak <pdziepak@cloudius-systems.com>
2015-08-13 10:52:42 +02:00
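The normalization described above amounts to sort plus unique. In this sketch each multi-column IN value is modeled as a vector of ints compared lexicographically; that is an assumption, since real clustering order depends on the column types' comparators (and on descending columns). normalize_in_values is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

using tuple_value = std::vector<int>;  // hypothetical stand-in for a (c1, c2, ...) value

// Sorts the IN values into (assumed ascending) clustering order and drops
// duplicates, so IN ((1, 2), (0, 5), (1, 2)) becomes ((0, 5), (1, 2)).
std::vector<tuple_value> normalize_in_values(std::vector<tuple_value> values) {
    std::sort(values.begin(), values.end());  // lexicographic comparison
    values.erase(std::unique(values.begin(), values.end()), values.end());
    return values;
}
```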
Calle Wilund
562fa1a726 Disable allocation failure test in debug/sanitizer build
Since the sanitizer does not fail gracefully on over-allocation.
2015-08-12 20:00:44 +03:00
Avi Kivity
95847f86c3 Merge "locator: introduce i_endpoint_snitch::reset_snitch()" from Vlad
"This series introduces the i_endpoint_snitch::reset_snitch() static method
that allows replacing the current (global) snitch instance with a new one.
This is done atomically (per shard) and transparently to anyone holding a reference
to snitch_ptr.

This series starts with some cleanups, adds the above method and the unit test
that verifies its functionality."
2015-08-12 19:29:08 +03:00
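Why replacing the instance is transparent to holders of snitch_ptr can be sketched as follows: callers keep a reference to the wrapper, never to the instance inside it, so swapping the inner pointer is seen by all of them at once. On a single-threaded shard that swap cannot be observed half-done. The class names and the reset() member here are hypothetical; the real reset_snitch() also has to stop the old snitch properly, which is not shown.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

struct snitch_base {
    virtual ~snitch_base() = default;
    virtual std::string name() const = 0;
};
struct simple_snitch : snitch_base { std::string name() const override { return "SimpleSnitch"; } };
struct rack_snitch : snitch_base { std::string name() const override { return "RackInferringSnitch"; } };

// Wrapper that callers hold by reference. Replacing the instance inside it
// is visible to every holder without them having to re-fetch anything.
class snitch_ptr {
    std::unique_ptr<snitch_base> _ptr;
public:
    explicit snitch_ptr(std::unique_ptr<snitch_base> p) : _ptr(std::move(p)) {}
    snitch_base* operator->() const { return _ptr.get(); }
    void reset(std::unique_ptr<snitch_base> p) { _ptr = std::move(p); }  // old instance destroyed here
};
```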
Avi Kivity
517ceed515 Merge "sstable index write benchmark"
"I am currently looking at the performance of our index_read, since it was in
the past pinpointed as the source of problems.

While the read side is the one that is mostly interesting, I would like to test
both - besides anything else, it is easier to test reads after writes so we
don't have to create synthetic data with outside tools.

This patch introduces the write side benchmark (read side will hopefully come
tomorrow).  While the write side is, as mentioned, not the most interesting
part, I did see something standing out in the flamegraph that allowed me to optimize
one particular function, yielding an 8.6% improvement."
2015-08-12 18:33:11 +03:00
Calle Wilund
47b7314c78 Commitlog: add test for too large alloc 2015-08-12 16:20:12 +02:00
Glauber Costa
4ddef06ba6 perf tests: test sstables index reads and writes
This is a test that allows us to measure the performance of our sstable index
reads and writes (currently only writes are implemented). A lot of potentially
common code is put into a header, which will make writing new tests easier if
needed.

We don't want to take shortcuts for this, so all reading and writing is done
through public sstable interfaces.

For writing, there is no way to write the index without writing the datafile.
But because we are only writing the primary key, the datafile will not contain
anything else. This is the closest we can get to an index testing with the
public interfaces.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-08-12 09:18:37 -05:00
Glauber Costa
07eb98e799 tests: enhance _remove so it also removes directory structures
If a directory is found, recursively delete it. This will be useful for
allowing the creation of test structures like test/cpuX/sstable

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-08-12 09:17:37 -05:00
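The recursive removal described above can be sketched with std::filesystem::remove_all, which deletes a directory together with everything below it. This is a synchronous stand-in for the test helper, not the project's _remove; remove_recursively is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;

// Removes `path`; when it is a directory, everything below it is removed
// first. Returns the number of files and directories deleted (0 if `path`
// did not exist).
std::uintmax_t remove_recursively(const fs::path& path) {
    std::error_code ec;
    std::uintmax_t n = fs::remove_all(path, ec);
    if (ec) {
        throw fs::filesystem_error("remove_recursively", path, ec);
    }
    return n;
}
```

For a layout like test/cpuX/sstable, removing the top-level test directory also removes every per-CPU subdirectory.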