1555 Commits

Author SHA1 Message Date
Raphael S. Carvalho
050a7019b8 sstables/index_reader: fix index reader for summary entry spanning lots of keys
quantity prevents index_reader from reading all index entries of a summary
entry that spans more than min_index_interval entries. That can happen after
the introduction of size-based sampling, and consequently, the sstable will not be
able to return a key whose logical position in the summary entry is beyond
min_index_interval. It's ok to not use quantity because index_reader will
read all indexes until either next summary entry or end of file is reached.

Fixes test_sstable_conforms_to_mutation_source

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20170812045821.25269-1-raphaelsc@scylladb.com>
2017-08-12 09:44:16 +03:00
Avi Kivity
dbf8625ac9 Merge "size-based sampling for sstable summary" from Raphael
"Fixes #1842."

* 'size_based_sampling_v3' of github.com:raphaelsc/scylla:
  tests: test summary entry spanning more keys than min interval
  db/config: introduce sstable_summary_ratio option
  sstables: introduce size-based sampling for sstable summary
  sstables: make components_writer::offset const qualified and uint64_t
  sstables: make writer::offset const qualified and uint64_t
2017-08-11 18:41:45 +03:00
Duarte Nunes
20337053ad Don't use literal lambdas
These are only available in C++17. Fixes the build after b5460c2.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2017-08-11 13:08:42 +02:00
Duarte Nunes
b5460c2990 Merge "Support duration type" from Jesse
"This patch series adds support for the `duration` type in CQL, which
was added to Cassandra in 3.10.

As part of this work, it was necessary also to add support for the
`vint` and `unsigned vint` types to the native protocol implementation,
which are part of v5 of the specification.

To test interactively, it is necessary to use cqlsh distributed with
Cassandra, as the version we distribute does not yet support the
duration type."

* 'jhk/duration_protocol/v5' of https://github.com/hakuch/scylla:
  Support `duration` CQL native type
  CQL native protocol: Add support for `vint` serialization
  duration_test.cc: Add test for printing zero duration
  duration.cc: Remove nop `const` qualifier on return type
  Change `const` qualifier declaration order for `duration`
  duration.cc: Simplify range checking
  Rename `duration` to `cql_duration`
2017-08-11 10:56:55 +01:00
Raphael S. Carvalho
5124f94358 tests: test summary entry spanning more keys than min interval
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2017-08-11 01:37:06 -03:00
Raphael S. Carvalho
8726ee937d sstables: introduce size-based sampling for sstable summary
Currently, a summary entry is added after min_index_interval index
entries were written. Not taking into account size of index entries
becomes a problem with large partitions which may create big index
entries due to promoted indexes. Read performance is affected as a
consequence because index entries spanned by summary are all read
from disk to serve request.

What we want to do is to also add a summary entry after index reaches
a boundary. To deal with oversampling, we want to write 1 byte to
summary for every 2000 bytes written to data file (this will be
eventually made into an option in the config file).
Both conditions must be met to avoid under or oversampling.
That way, the amount of data needed from index file to satisfy the
request is drastically reduced.
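
The byte-budget half of this scheme can be sketched in Python as follows. This is only an illustration under stated assumptions: the `SummarySampler` class, its field names, and the assumed summary entry size (an 8-byte index offset plus the key bytes) are hypothetical, and the interaction with min_index_interval is left out.

```python
from dataclasses import dataclass, field

# From the commit message: 1 summary byte per 2000 data-file bytes.
SUMMARY_BYTE_COST = 2000


@dataclass
class SummarySampler:
    """Hypothetical sampler that keeps summary size proportional to data size."""
    byte_cost: int = SUMMARY_BYTE_COST
    next_data_offset: int = 0          # data offset at which a sample is next allowed
    entries: list = field(default_factory=list)

    def maybe_sample(self, key: bytes, data_offset: int, index_offset: int) -> bool:
        """Add a summary entry only if the data file has grown enough to pay for it."""
        if data_offset < self.next_data_offset:
            return False
        entry_size = 8 + len(key)      # assumed entry layout: index offset + key bytes
        # Each summary byte must be "paid for" by byte_cost bytes of data.
        self.next_data_offset += self.byte_cost * entry_size
        self.entries.append((key, index_offset))
        return True


def sample_uniform_partitions(count: int, partition_size: int) -> SummarySampler:
    """Feed `count` equally sized partitions with 4-byte keys through the sampler."""
    sampler = SummarySampler()
    for i in range(count):
        key = i.to_bytes(4, "big")
        sampler.maybe_sample(key, data_offset=i * partition_size, index_offset=i * 16)
    return sampler
```

With 100-byte partitions and 4-byte keys, this sketch samples one partition in 240, because each 12-byte entry has to be covered by 24000 bytes of data.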

Fixes #1842.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2017-08-11 00:30:12 -03:00
Jesse Haber-Kucharsky
509626fe08 Support duration CQL native type
`duration` is a new native type that was introduced in Cassandra 3.10 [1].

Support for parsing and the internal representation of the type was added in
8fa47b74e8.

Important note: The version of cqlsh distributed with Scylla does not have
support for durations included (it was added to Cassandra in [2]). To test this
change, you can use cqlsh distributed with Cassandra.

Duration types are useful when working with time-series tables, because they can
be used to manipulate date-time values in relative terms.

Two interesting applications are:

- Aggregation by time intervals [3]:

`SELECT * FROM my_table GROUP BY floor(time, 3h)`

- Querying on changes in date-times:

`SELECT ... WHERE last_heartbeat_time < now() - 3h`

(Note: neither of these is currently supported, though columns with duration
values are.)

Internally, durations are represented as three signed counters: one each for
months, days, and nanoseconds. Each of these counters is serialized using a
variable-length encoding which is described in version 5 of the CQL native
protocol specification.

The representation of a duration as three counters means that a semantic
ordering on durations doesn't exist: Is `1mo` greater than `1mo1d`? We cannot
know, because some months have more days than others. Durations can only have a
concrete absolute value when they are "attached" to absolute date-time
references. For example, `2015-04-30 at 12:00:00 + 1mo`.
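
A minimal Python sketch of why the three-counter representation has no semantic ordering. The `Duration` class and `add_duration` helper are hypothetical names, and the policy of clamping the day to the end of a shorter month is an assumption made for the example, not something taken from the patch.

```python
import calendar
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Duration:
    """Three independent signed counters, as described above."""
    months: int = 0
    days: int = 0
    nanoseconds: int = 0


def add_duration(start: datetime, d: Duration) -> datetime:
    """Attach a duration to an absolute date-time reference."""
    # Shift by whole months first, clamping the day if the target month is shorter.
    month_index = start.year * 12 + (start.month - 1) + d.months
    year, month_zero_based = divmod(month_index, 12)
    month = month_zero_based + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    shifted = start.replace(year=year, month=month, day=day)
    # datetime only has microsecond resolution, so nanoseconds are truncated here.
    return shifted + timedelta(days=d.days, microseconds=d.nanoseconds // 1000)


one_month = Duration(months=1)
thirty_days = Duration(days=30)
```

Attached to 2015-02-01, `one_month` lands on March 1 and `thirty_days` on March 3; attached to 2015-03-01, `one_month` lands on April 1 and `thirty_days` on March 31. Which of the two is "greater" depends on the reference point.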

That duration values are not comparable presents some difficulties for the
implementation, because most CQL types are. Like in Cassandra's implementation
[2], I adopted a similar strategy to the way restrictions on the `counter` type
are checked. A type "references" a duration if it is either a duration or it
contains a duration (like a `tuple<..., duration, ...>`, or a UDT with a
duration member).

The following restrictions apply on durations. Note that some of these contexts
are either experimental features (materialized views), or not currently
supported at run-time (though support exists in the parser and code, so it is
prudent to add the restrictions now):

- Durations cannot appear in any part of a primary key, either for tables or
  materialized views.

- Durations cannot be directly used as the element type of a `set`, nor can they
  be used as the key type of a `map`. Because internal ordering on durations is
  based on a byte-level comparison, this property of Cassandra was intended to
  help avoid user confusion around ordering of collection elements.

- Secondary indexes on durations are not supported.

- "Slice" relations (<=, <, >=, >) are not supported on durations with `WHERE`
   restrictions (like `SELECT ... WHERE span <= 3d`). Multi-column restrictions
   only work with clustering columns, which cannot be `duration` due to the
   first rule.

- "Slice" relations are not supported on durations with query conditions (like
  `UPDATE my_table ... IF span > 5us`).
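
The "references a duration" rule and two of the restrictions above can be sketched in Python as follows. The `CqlType` model and the function names are hypothetical and only illustrate the recursive check; they are not the types used in the actual patch.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class CqlType:
    """Toy type model: a name plus optional type parameters."""
    name: str
    params: Tuple["CqlType", ...] = ()


DURATION = CqlType("duration")
INT = CqlType("int")


def references_duration(t: CqlType) -> bool:
    """True if t is a duration or contains one at any nesting depth."""
    return t.name == "duration" or any(references_duration(p) for p in t.params)


def allowed_in_primary_key(t: CqlType) -> bool:
    """Durations cannot appear in any part of a primary key."""
    return not references_duration(t)


def is_valid_collection(t: CqlType) -> bool:
    """Durations cannot be the direct element type of a set or key type of a map."""
    if t.name in ("set", "map"):
        return t.params[0] != DURATION
    return True
```

Note the asymmetry in this sketch: the primary-key rule uses the recursive check, while the collection rule only looks at the direct element or key type, matching the word "directly" in the list above.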

Backwards incompatibility note:

As described in the documentation [4], duration literals take one of two
forms: either ISO 8601 formats (there are three), or a "standard" format. The ISO
8601 formats start with "P" (like "P5W"). Therefore, identifiers that have this
form are no longer supported.

Fixes #2240.

[1] https://issues.apache.org/jira/browse/CASSANDRA-11873

[2] bfd57d13b7

[3] https://issues.apache.org/jira/browse/CASSANDRA-11871

[4] http://cassandra.apache.org/doc/latest/cql/types.html#working-with-durations
2017-08-10 15:01:10 -04:00
Jesse Haber-Kucharsky
91dab1d998 CQL native protocol: Add support for vint serialization
Version 5 of the native protocol for CQL [1] adds the `vint` and `unsigned vint`
types.

An unsigned integer encoded as a `vint` has a variable size based on the
magnitude of the value. The first byte indicates the total number of bytes.

For signed integers, a "zig-zag" encoding scheme ensures that small negative
values are encoded as short-length `vint`s (0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, 2
-> 4, etc).
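
A short Python sketch of both encodings, for 64-bit values. The zig-zag functions follow the standard definition. The unsigned encoder reflects my reading of the v5 specification (the number of leading 1 bits in the first byte gives the number of extra bytes that follow), so treat it as an assumption rather than a reference implementation; the function names are made up for this example.

```python
MASK64 = (1 << 64) - 1


def zigzag_encode(n: int) -> int:
    """Interleave signed 64-bit values: 0, -1, 1, -2, 2 become 0, 1, 2, 3, 4."""
    return ((n << 1) ^ (n >> 63)) & MASK64


def zigzag_decode(u: int) -> int:
    """Inverse of zigzag_encode."""
    return (u >> 1) ^ -(u & 1)


def encode_unsigned_vint(value: int) -> bytes:
    """Encode an unsigned 64-bit integer in 1 to 9 bytes (assumed v5 layout)."""
    if not 0 <= value <= MASK64:
        raise ValueError("value does not fit in 64 bits")
    bits = max(value.bit_length(), 1)
    # Each byte below the ninth contributes 7 payload bits on average.
    size = min((bits + 6) // 7, 9)
    extra = size - 1
    raw = bytearray(value.to_bytes(size, "big"))
    # Mark the number of extra bytes with that many leading 1 bits.
    raw[0] |= (~(0xFF >> extra)) & 0xFF
    return bytes(raw)


def encode_vint(n: int) -> bytes:
    """Signed vint: zig-zag first, then the unsigned encoding."""
    return encode_unsigned_vint(zigzag_encode(n))
```

Because zig-zag maps -1 to 1, `encode_vint(-1)` fits in a single byte in this sketch instead of occupying the full nine bytes that a raw two's-complement value would need.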

[1] https://github.com/apache/cassandra/blob/trunk/doc/native_protocol_v5.spec
2017-08-10 14:11:30 -04:00
Jesse Haber-Kucharsky
77489f843f duration_test.cc: Add test for printing zero duration
It's somewhat counter-intuitive, but Cassandra also formats zero-valued
duration values as an empty string.
2017-08-10 14:11:30 -04:00
Botond Dénes
9ee9988097 Add combined_mutation_reader_test unit test 2017-08-10 12:38:10 +03:00
Jesse Haber-Kucharsky
352e9f60ba Rename duration to cql_duration
`std::chrono::duration` is a prolific enough name that it's best to
disambiguate.
2017-08-09 15:15:20 -04:00
Botond Dénes
94fc550e68 sstable_set::incremental_selector: select() now returns a selection
A selection contains - in addition to the list of sstables - a next_token
which is a hint as to what is the next best token to call select() with.
This should be the smallest token such that at the next call to
select() the least number of new sstables will be returned, without
skipping any.
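
One way to picture the next_token hint, as a Python sketch over integer tokens. The `SSTable` and `Selection` classes and the `select` function are simplified stand-ins invented for this example; the real selector works on token ranges inside sstable_set and may compute the hint differently.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SSTable:
    name: str
    first: int   # first token covered
    last: int    # last token covered (inclusive)


@dataclass(frozen=True)
class Selection:
    sstables: List[SSTable]
    next_token: Optional[int]   # None when no further sstable starts later


def select(all_sstables: List[SSTable], token: int) -> Selection:
    """Return the sstables covering `token` plus a hint for the next call."""
    selected = [s for s in all_sstables if s.first <= token <= s.last]
    # The next interesting token is the nearest point where a new sstable begins:
    # jumping there cannot skip an sstable, and it brings in as few new ones as possible.
    upcoming = [s.first for s in all_sstables if s.first > token]
    return Selection(selected, min(upcoming) if upcoming else None)


tables = [SSTable("a", 0, 10), SSTable("b", 5, 20), SSTable("c", 30, 40)]
```

A caller would start at the lowest token and keep calling `select` with the returned `next_token` until it is `None`.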
2017-08-09 16:27:33 +03:00
Duarte Nunes
4c9206ba2f tests/sstable_mutation_test: Don't use moved-from object
Fix a bug introduced in dbbb9e93d and exposed by gcc6 by not using a
moved-from object. Twice.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20170802161033.4213-1-duarte@scylladb.com>
2017-08-03 09:45:49 +03:00
Avi Kivity
db7329b1cb Merge "Ensure correct EOC for PI block cell names" from Duarte
"This series ensures that we always write correct cell names to promoted
index cell blocks, taking into account the eoc of range tombstones.

Fixes #2333"

* 'pi-cell-name/v1' of github.com:duarten/scylla:
  tests/sstable_mutation_test: Test promoted index blocks are monotonic
  sstables: Consider eoc when flushing pi block
  sstables: Extract out converting bound_kind to eoc
2017-08-01 18:09:07 +03:00
Paweł Dziepak
e970630272 tests/serialized_action: add missing forced defers
serialized_action_tests depends on the fact that first part of the
serialized_action is executed at certain points (in which it reads a
global variable that is later updated by the main thread).
This worked well in the release mode before ready continuations were
inlined and run immediately, but not in the debug mode since inlining
was not happening and the main seastar::thread was missing some yield
points.
Message-Id: <20170731103013.26542-1-pdziepak@scylladb.com>
2017-07-31 11:35:24 +01:00
Paweł Dziepak
e62403190b Merge "Introduce perf_cache_eviction test" from Tomasz
Runs appending writes to a single partition, at full speed, and a reader
which selects the head of the partition, with 100ms delay between reads.
Prints latency percentiles and some stats.

Intended to test performance at the transition from non-evicting to
evicting modes.

Currently we can see that after the transition, whole partition gets
evicted and reads constantly miss.

Sample output:

    rd/s: 10, wr/s: 135947, ev/s: 0, pmerge/s: 1, miss/s: 0, cache: 708/778 [MB], LSA: 820/910 [MB], std free: 82 [MB]

    reads : min: 149   , 50%: 179   , 90%: 1331  , 99%: 1331  , 99.9%: 1331  , max: 6866   [us]
    writes: min: 3     , 50%: 4     , 90%: 4     , 99%: 5     , 99.9%: 258   , max: 51012  [us]

    rd/s: 7, wr/s: 93354, ev/s: 9, pmerge/s: 1, miss/s: 3, cache: 0/0 [MB], LSA: 107/128 [MB], std free: 82 [MB]

    reads : min: 179   , 50%: 179   , 90%: 73457 , 99%: 73457 , 99.9%: 73457 , max: 105778 [us]
    writes: min: 3     , 50%: 4     , 90%: 4     , 99%: 5     , 99.9%: 258   , max: 105778 [us]

* tag 'tgrabiec/row-eviction-perf-test' of github.com:scylladb/seastar-dev:
  tests: Introduce perf_cache_eviction
  tests: simple_schema: Add getter for DDL statement
  estimated_histogram: Implement percentile()
  utils: estimated_histogram: Make printable
2017-07-28 09:49:22 +01:00
Tomasz Grabiec
6a3703944b utils: Introduce serialized_action 2017-07-27 20:08:21 +02:00
Duarte Nunes
dbbb9e93da tests/sstable_mutation_test: Test promoted index blocks are monotonic
Reproduces #2333

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2017-07-27 18:23:58 +02:00
Tomasz Grabiec
ac7e6ef1bc tests: Introduce perf_cache_eviction 2017-07-27 17:19:07 +02:00
Tomasz Grabiec
2d2e7ef6fb tests: simple_schema: Add getter for DDL statement 2017-07-27 17:19:07 +02:00
Paweł Dziepak
28c105e4a7 sstables: avoid copying key components 2017-07-26 14:38:27 +01:00
Duarte Nunes
1622847c1d perf/perf_fast_forward: Don't pass non-pod to varargs function
Passing a Non-POD object to variadic functions is unsupported.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20170726094756.22867-1-duarte@scylladb.com>
2017-07-26 11:48:22 +01:00
Duarte Nunes
472f32fb06 tests/schema_change_test: Add test case for add+drop notification
Reproduces #2616

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20170725170622.4380-2-duarte@scylladb.com>
2017-07-26 11:59:48 +02:00
Paweł Dziepak
79a1ad7a37 tests/row_cache: test queries with no clustering ranges
Reproducer for #2604.
Message-Id: <20170725131220.17467-3-pdziepak@scylladb.com>
2017-07-25 15:29:17 +02:00
Paweł Dziepak
1ea507d6ae tests: do not overload the meaning of empty clustering range
Empty clustering key range is perfectly valid and signifies that the
reader is not interested in anything but the static row. Let's not
make it mean anything else.
Message-Id: <20170725131220.17467-2-pdziepak@scylladb.com>
2017-07-25 15:28:12 +02:00
Avi Kivity
c21bb5ae05 tests: fix sstable_datafile_test build with boost 1.55
Boost 1.55 accidentally removed support for "range for" on
recursive_directory_iterator (previous and later versions do
support it). Use old-style iteration instead.

Message-Id: <20170724080128.8824-1-avi@scylladb.com>
2017-07-24 11:20:12 +03:00
Paweł Dziepak
823fb5e9d8 perf_fast_forward: use consumer interface for reading streamed_mutation
Using streamed_mutation::operator() is undesirable as it introduces an
indirect call and a continuation overhead for each emitted mutation
fragment. Consumer interface is the preferred method of reading streamed
mutations.
2017-07-20 11:02:53 +01:00
Paweł Dziepak
d184508d7b perf_fast_forward: allow running only selected test groups 2017-07-20 11:02:31 +01:00
Paweł Dziepak
a18a36c94b perf_fast_forward: move tests groups to separate functions 2017-07-20 09:26:42 +01:00
Paweł Dziepak
3fd4f9c1c7 perf_fast_forward: move global state to global scope
All perf_fast_forward test cases currently live in the main
function. This patch moves the state they rely on to a global scope
so that it will be easier to extract these tests to individual
functions.
2017-07-20 09:26:42 +01:00
Duarte Nunes
1daf1bc4bb Merge 'Revert back to 1.7 schema layout in memory' from Tomasz
"Fixes schema layout incompatibility in a mixed 1.7 and 2.0 cluster (#2555)
by reverting back to using the old layout in memory and thus also
in across-node requests. We still use the new v3 layout in schema
tables (needed by drivers and external tools). Translations happen
when converting to/from schema mutations."

* tag 'tgrabiec/use-v2-schema-layout-in-memory-v2' of github.com:scylladb/seastar-dev:
  schema: Revert back to the 1.7 layout of static compact tables in memory
  schema: Use v3 column layout when converting to/from schema mutations
  schema: Encapsulate column layout translations in the v3_columns class
2017-07-19 12:52:52 +02:00
Duarte Nunes
ab72132cb1 view_schema_test: Retry failed queries
Due to the asynchronous nature of view update propagation, results
might still be absent from views when we query them. To be able to
deterministically assert on view rows, this patch retries a query a
bounded number of times until it succeeds.
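
The bounded-retry idea can be sketched in Python as follows. The `eventually` helper, its parameters, and its defaults are hypothetical and only illustrate the pattern; the actual test is written in C++ and may structure the retry differently.

```python
import time


def eventually(check, attempts=10, delay=0.01):
    """Run `check` until it stops raising AssertionError, at most `attempts` times.

    The last failure is re-raised, so a genuinely missing row still fails the test.
    """
    for attempt in range(attempts):
        try:
            return check()
        except AssertionError:
            if attempt == attempts - 1:
                raise
            time.sleep(delay)   # give the asynchronous update time to land


def make_flaky_check(succeed_on_call):
    """Build a check that fails until it has been called `succeed_on_call` times."""
    calls = {"n": 0}

    def check():
        calls["n"] += 1
        assert calls["n"] >= succeed_on_call, "view row not visible yet"
        return calls["n"]

    return check
```

Bounding the number of attempts keeps the assertion deterministic: the test either observes the row within the budget or fails with the original assertion error.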

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20170718212646.2958-1-duarte@scylladb.com>
2017-07-19 09:59:44 +02:00
Tomasz Grabiec
a9237c1666 schema: Revert back to the 1.7 layout of static compact tables in memory
We are using C* 3.x compatible layout in schema tables but want to
keep using the 1.7 layout in memory for compatibility during rolling
upgrade. This patch switches the schema and schema_builder classes
back to the old layout. Translation of layout happens when converting
to/from schema mutations.

Notable changes:

 1) Includes a revert of commit 6260f31e08
    "thrift: Update CQL mapping of static CFs".

 2) Brings back the "default_validation_class" schema attribute. In v3
    it can be derived from column definitions, but in v2 it can't, so
    we have to store it.

 3) legacy_schema_migrator and schema_builder don't have to do
    conversions to v3, this is now handled by the v3_columns
    class. schema_builder works with the same layout as schema, that
    is v2.

 4) Includes a revert of commit 66991a7ccb
    "v3 schema test fixes"

Fixes #2555.
2017-07-19 09:52:15 +02:00
Raphael S. Carvalho
c55c63f213 tests: add tests for time window compaction strategy
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2017-07-19 02:58:37 -03:00
Avi Kivity
d9c64ef737 tests: move tmpdir to /tmp
Reduces view_schema_test runtime to 5 seconds, from 53 seconds on an NVMe disk
with write-back cache, and forever on a spinning disk.
Message-Id: <20170716081653.10018-1-avi@scylladb.com>
2017-07-16 11:55:08 +02:00
Avi Kivity
9116dd91cb tests: copy the sstable with an unknown component to the data directory
We will be creating links to those sstable's files, and those don't work
if the data directory and the test sstable are on different devices.

Copying the files to the same directory fixes the problem.
Message-Id: <20170716090405.14307-1-avi@scylladb.com>
2017-07-16 11:55:00 +02:00
Avi Kivity
162d9aa85d tests: fix view_schema_test with clang
Clang is happy to create a vector<data_value> from a {}, a {1, 2}, but not a {1}.
No doubt it is correct, but sheesh.

Make the data_value explicit to humor it.
Message-Id: <20170713074315.9857-1-avi@scylladb.com>
2017-07-14 12:24:27 +03:00
Jesse Haber-Kucharsky
8fa47b74e8 cql: Add definition of underlying type for durations
Cassandra 3.10 added the `duration` type [1], intended to manipulate date-time
values with offsets (for example, `now() - 2y3h`).

The full implementation of the `duration` type in Scylla requires support
for version 5 of the binary protocol, which is not yet available.

In the meantime, this patch adds the implementation of the underlying type
for the eventual `duration` type. Included is also the ported test suite from
the reference implementation and additional tests.

Related to #2240.

[1] https://issues.apache.org/jira/browse/CASSANDRA-11873

Signed-off-by: Jesse Haber-Kucharsky <jhaberku@scylladb.com>
Message-Id: <b1e481da103efee82106bf31f261c5a1f4f8d9ca.1499885803.git.jhaberku@scylladb.com>
2017-07-13 17:26:00 +03:00
Avi Kivity
4704a78332 tests: remove bad constexpr in sstable_datafile_test
std::ceil() is not constexpr.

Found by clang.
2017-07-12 17:14:13 +03:00
Avi Kivity
a397889c81 Merge "Preserve table schema digest on schema tables migration" from Tomasz
"Currently new nodes calculate digests based on v3 schema mutations,
which are very different from v2 mutations. As a result they will
use schemas with different table_schema_version than the old nodes.
The old nodes will not recognize the version and will try to request
its definition. That will fail, because old nodes don't understand
v3 schema mutations.

To fix this problem, let's preserve the digests during migration,
so that they're the same on new and old nodes. This will allow
requests to proceed as usual.

This does not solve the problem of schema being changed during
the rolling upgrade. This is not allowed, as it would bring the
same problem back.

Fixes #2549."

* tag 'tgrabiec/use-consistent-schema-table-digests-v2' of github.com:cloudius-systems/seastar-dev:
  tests: Add test for concurrent column addition
  legacy_schema_migrator: Set digest to one compatible with the old nodes
  schema_tables: Persist table_schema_version
  schema_tables: Introduce system_schema.scylla_tables
  schema_tables: Simplify read_table_mutations()
  schema_tables: Resurrect v2 read_table_mutations()
  system_keyspace: Forward-declare legacy schemas
  legacy_schema_migrator: Take storage_proxy as dependency
2017-07-11 17:22:42 +03:00
Tomasz Grabiec
6d53cb7ab5 tests: Add test for concurrent column addition 2017-07-11 14:52:23 +02:00
Raphael S. Carvalho
8334086441 lcs: remove quadratic behavior from L0 compaction
L0 compaction triggers quadratic behavior when many newly created
sstables are needed for promotion because their size is small relative to
the max sstable size parameter. So until L0 is worth promoting,
the strategy will compact every new sstable with all the existing
ones in L0. To fix it, let's do STCS on level 0 until it becomes
worth promoting.
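
A rough cost model in Python shows the difference. It assumes unit-size sstables, counts only bytes rewritten, and uses a size-tiered threshold of 4; both functions and the threshold are illustrative assumptions, not the strategy's actual code.

```python
from collections import Counter


def cost_compact_everything(n: int) -> int:
    """Every new unit-size sstable is compacted with everything already in L0."""
    total = 0
    accumulated = 0
    for _ in range(n):
        if accumulated:
            total += accumulated + 1   # rewrite the old data plus the new sstable
        accumulated += 1
    return total


def cost_size_tiered(n: int, threshold: int = 4) -> int:
    """Merge only when `threshold` sstables of the same size have piled up."""
    tiers = Counter()
    total = 0
    for _ in range(n):
        size = 1
        tiers[size] += 1
        # A merge may fill the next tier, so keep cascading upwards.
        while tiers[size] >= threshold:
            tiers[size] -= threshold
            size *= threshold
            total += size              # bytes written by this merge
            tiers[size] += 1
    return total
```

For 64 incoming sstables the first model rewrites 2079 units and the size-tiered one 192, which is the gap between roughly n^2/2 and n*log4(n) in this model.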

Fixes #2432.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2017-07-11 09:35:35 -03:00
Avi Kivity
7b4412c3ce Revert "Merge "improvements for leveled strategy manifest" from Raphael"
This reverts commit 43a3e718e6, reversing
changes made to 3813e94b0a. It contains some
unrelated commits.
2017-07-11 11:12:53 +03:00
Raphael S. Carvalho
28ebe1807f lcs: remove quadratic behavior from L0 compaction
L0 compaction triggers quadratic behavior when many newly created
sstables are needed for promotion because their size is small relative to
the max sstable size parameter. So until L0 is worth promoting,
the strategy will compact every new sstable with all the existing
ones in L0. To fix it, let's do STCS on level 0 until it becomes
worth promoting.

Fixes #2432.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2017-07-10 15:42:28 -03:00
Tomasz Grabiec
72e01b7fe8 tests: commitlog: Check there are no segments left on disk after clean shutdown
Reproduces #2550.

Message-Id: <1499358825-17855-2-git-send-email-tgrabiec@scylladb.com>
2017-07-09 19:25:27 +03:00
Raphael S. Carvalho
7f7758fb6f tests/sstable: make sstable_expired_data_ratio more robust
This change stresses the histogram's ability to return a good estimation
after merging keys such that it doesn't grow beyond size limit.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20170708205713.5958-1-raphaelsc@scylladb.com>
2017-07-09 10:33:10 +03:00
Piotr Jastrzebski
a4b6cfe8f0 row_cache: use continuity info in single partition queries
If a query requests a single partition that is inside
a range that has already been queried, use the continuity info
and don't go to disk when it's not needed.
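
A toy model of the idea in Python. The `ToyCache` class, the integer keys, and the list of continuous ranges are all simplifications invented for this sketch; the real row_cache tracks continuity per entry rather than as a list of ranges.

```python
class ToyCache:
    """Cache that remembers which key ranges it has fully populated."""

    def __init__(self, disk):
        self.disk = dict(disk)     # stand-in for sstables: key -> partition
        self.entries = {}          # cached partitions
        self.continuous = []       # (lo, hi) ranges known to be complete in cache
        self.disk_reads = 0

    def read_range(self, lo, hi):
        """Range query: populate the cache and mark the range as continuous."""
        self.disk_reads += 1
        for key, value in self.disk.items():
            if lo <= key <= hi:
                self.entries[key] = value
        self.continuous.append((lo, hi))
        return {k: v for k, v in sorted(self.entries.items()) if lo <= k <= hi}

    def read_partition(self, key):
        """Single-partition query: skip the disk when continuity answers it."""
        if key in self.entries:
            return self.entries[key]
        if any(lo <= key <= hi for lo, hi in self.continuous):
            return None            # range is complete, so the key does not exist
        self.disk_reads += 1
        value = self.disk.get(key)
        if value is not None:
            self.entries[key] = value
        return value
```

The interesting case is a miss inside a continuous range: the cache can answer "no such partition" without touching the disk.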

Fixes #2244.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <15bb3b5b03225e7402e3862da53b5e06d3f4fa74.1499345295.git.piotr@scylladb.com>
2017-07-07 10:29:19 +02:00
Piotr Jastrzebski
70f4b23876 row_cache_test: Add test to reproduce issue 2544
This test checks that the cache uses continuity information
for single partition queries inside a range that has already been
queried.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <2ebd03ff5366e554d520f86da8054e0b9eff4178.1499345295.git.piotr@scylladb.com>
2017-07-07 10:29:19 +02:00
Tomasz Grabiec
a5fdff2ac2 row_cache: Add partition_ prefix to current counters
In preparation for adding per-row counters.
2017-07-04 13:55:06 +02:00
Asias He
2a794db61b tests: Add test_selective_token_range_sharder 2017-07-04 18:46:19 +08:00