Commit Graph

1352 Commits

Author SHA1 Message Date
Raphael S. Carvalho
11b74050a1 partitioned_sstable_set: fix quadratic space complexity
Streaming generates lots of small sstables with large token ranges,
which triggers O(N^2) space usage in the interval map.
Level 0 sstables will now be stored in a structure with O(N) space
complexity, and they will be included in every read.

Fixes #2287.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20170417185509.6633-1-raphaelsc@scylladb.com>
2017-04-18 13:04:38 +03:00
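The sketch below shows why the change in 11b74050a1 matters. Staggered, wide ranges cost O(N^2) space when an interval map keeps a separate set of sstables for every elementary segment between range boundaries. The token_range type, the half-open ranges and the helper names are hypothetical, and this is not Scylla's partitioned_sstable_set code. With N ranges of the form [i, N + i), the modelled map stores exactly N^2 entries, while a flat list of level 0 sstables stores N.

```cpp
#include <cstddef>
#include <iterator>
#include <set>
#include <vector>

// Hypothetical model of an sstable's token range: half-open [first, last).
struct token_range {
    long first;
    long last;
};

// Emulates an interval map that stores, for every elementary segment between
// consecutive range boundaries, its own set of overlapping sstables.
// Returns the total number of (segment, sstable) entries such a map holds.
std::size_t interval_map_entries(const std::vector<token_range>& ssts) {
    std::set<long> bounds;
    for (const auto& r : ssts) {
        bounds.insert(r.first);
        bounds.insert(r.last);
    }
    std::size_t entries = 0;
    if (bounds.empty()) {
        return entries;
    }
    for (auto it = bounds.begin(); std::next(it) != bounds.end(); ++it) {
        const long seg_start = *it;
        for (const auto& r : ssts) {
            // Segments never straddle a boundary, so a segment lies inside r
            // iff r starts at or before the segment and ends after its start.
            if (r.first <= seg_start && seg_start < r.last) {
                ++entries;
            }
        }
    }
    return entries;
}

// N small-but-wide, heavily overlapping ranges [i, N + i).
std::vector<token_range> staggered_ranges(long n) {
    std::vector<token_range> v;
    for (long i = 0; i < n; ++i) {
        v.push_back({i, n + i});
    }
    return v;
}
```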
Benoît Canet
8f793905a3 perf_sstable: Change busy loop to futurized loop
The blocked task detector introduced in
113ed9e963 was seeing
the initialization phase of perf_sstable as a blocked
task.

Transform this part of the code into a futurized loop
to make the blocked task detector happy.

Signed-off-by: Benoît Canet <benoit@scylladb.com>
Message-Id: <20170413132506.17806-1-benoit@scylladb.com>
2017-04-13 18:17:28 +03:00
Raphael S. Carvalho
a6f8f4fe24 compaction: do not write expired cell as dead cell if it can be purged right away
When compacting a fully expired sstable, we don't allow that sstable
to be purged because an expired cell is *unconditionally* converted into a
dead cell. Instead, we should check whether the expired cell can be purged
right away, using gc_before and the max purgeable timestamp.

Currently, we need two compactions to get rid of a fully expired sstable
whose cells could have been purged by the first one.

Look at this sstable with an expired cell:
  {
    "partition" : {
      "key" : [ "2" ],
      "position" : 0
    },
    "rows" : [
      {
        "type" : "row",
        "position" : 120,
        "liveness_info" : { "tstamp" : "2017-04-09T17:07:12.702597Z",
"ttl" : 20, "expires_at" : "2017-04-09T17:07:32Z", "expired" : true },
        "cells" : [
          { "name" : "country", "value" : "1" },
        ]

Here is the sstable data after the first compaction:
[shard 0] compaction - Compacted 1 sstables to [...]. 120 bytes to 79
(~65% of original) in 229ms = 0.000328997MB/s.

  {
    ...
    "rows" : [
      {
        "type" : "row",
        "position" : 79,
        "cells" : [
          { "name" : "country", "deletion_info" :
{ "local_delete_time" : "2017-04-09T17:07:12Z" },
            "tstamp" : "2017-04-09T17:07:12.702597Z"
          },
        ]

Only a second compaction actually gets rid of the data:
compaction - Compacted 1 sstables to []. 79 bytes to 0 (~0% of original)
in 1ms = 0MB/s. ~2 total partitions merged to 0

NOTE:
Waiting for a second compaction is a waste of time, because the expired
cell already satisfied gc_before and the max purgeable timestamp, so it
could have been purged by the first compaction.

Fixes #2249, #2253

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20170413001049.9663-1-raphaelsc@scylladb.com>
2017-04-13 10:59:19 +03:00
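Here is a minimal sketch of the decision a6f8f4fe24 describes. It assumes a simplified cell with integer times (units abstracted away) and a gc_before already computed as "now minus the table's grace period". expiring_cell, outcome, can_purge() and compact_cell() are hypothetical names, not Scylla's API. The dead cell's deletion time is taken as expiry - ttl, which matches the local_delete_time in the dump above: it equals the write time.

```cpp
#include <cstdint>

// Hypothetical, simplified expiring cell (all times are plain integers).
struct expiring_cell {
    int64_t timestamp;  // write timestamp
    int64_t expiry;     // when the TTL runs out
    int64_t ttl;
};

enum class outcome { keep_live, write_dead_cell, purge };

// A dead cell may be dropped once its deletion time is older than gc_before and
// its timestamp is below the max purgeable timestamp (no older data elsewhere).
bool can_purge(int64_t timestamp, int64_t deletion_time,
               int64_t gc_before, int64_t max_purgeable) {
    return deletion_time < gc_before && timestamp < max_purgeable;
}

outcome compact_cell(const expiring_cell& c, int64_t now,
                     int64_t gc_before, int64_t max_purgeable) {
    if (c.expiry > now) {
        return outcome::keep_live;  // TTL has not run out yet
    }
    // Assumption of this sketch: deletion time of the converted cell is
    // expiry - ttl.
    const int64_t deletion_time = c.expiry - c.ttl;
    if (can_purge(c.timestamp, deletion_time, gc_before, max_purgeable)) {
        return outcome::purge;  // drop it in this compaction
    }
    // Before the fix, every expired cell ended up here.
    return outcome::write_dead_cell;
}
```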
Avi Kivity
5b530aa464 Merge "Use promoted index for skipping in sstable mutation readers" from Tomasz
"sstable_streamed_mutation::fast_forward_to() is changed to use promoted index
(via index_reader) to optimize skipping in large partitions.

In addition to that, sstable mutation_reader is changed to use the index
to skip to the next partition.

Performance impact was evaluated using the newly added tests/perf/perf_fast_forward

What's beyond this series:

  - Using index_reader for single-partition reads as well

  - Using index_reader for skipping across ranges in clustering restrictions"

* tag 'tgrabiec/skip-within-partition-using-index-v2' of github.com:cloudius-systems/seastar-dev: (47 commits)
  tests: Add performance test for fast forwarding of sstable readers
  tests: Allow starting cql_test_env on pre-existing data
  config: Allow specifying source when setting value
  tests: sstable: Add test for fast forwarding within partition using index
  sstables: sstable_streamed_mutation: use index in fast_forward_to()
  sstables: Store parsed promoted index in index_entry
  sstables: Add trace-level logging for sstable consumption
  sstables: Define deletion_time earlier
  sstables: Make parsing throw exception on malformed promoted index
  tests: Add tests for ordering of position_in_partition relative to composites
  position_range: Introduce all_clustered_rows() factory method
  position_in_partition: Introduce for_key()/after_key() factory methods
  position_in_partition: Add factory methods for positions around all rows
  position_in_partition: Introduce for_range_start()/for_range_end()
  position_in_partition: Fix friendship declaration
  keys: Introduce is_empty() for prefixes
  position_in_partition: Make comparable with composites
  types: Enhance lexicographical comparators
  compound_compat: Accept marker value in serialize_value()
  compound_compat: Add trichotomic comparator
  ...
2017-03-29 19:01:12 +03:00
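Skipping with a promoted index, as in the series merged by 5b530aa464, can be modelled as a binary search over a partition's index blocks. The reader finds the first block that can contain the target position and seeks straight to its offset instead of parsing everything before it. The sketch below models clustering positions as ints. index_block and seek_offset() are hypothetical, and the sketch ignores range tombstones and other details a real reader has to handle.

```cpp
#include <algorithm>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical, simplified promoted-index entry; positions are plain ints.
struct index_block {
    int first_pos;    // first clustering position in the block
    int last_pos;     // last clustering position in the block
    uint64_t offset;  // offset of the block within the partition's data
};

// Returns the offset to seek to in order to skip to `target`, or std::nullopt
// when every block ends before it (i.e. skip to the end of the partition).
std::optional<uint64_t> seek_offset(const std::vector<index_block>& blocks, int target) {
    // Blocks are sorted and non-overlapping, so last_pos is monotonic and we can
    // binary search for the first block whose last position is >= target.
    auto it = std::lower_bound(blocks.begin(), blocks.end(), target,
                               [](const index_block& b, int pos) { return b.last_pos < pos; });
    if (it == blocks.end()) {
        return std::nullopt;
    }
    return it->offset;
}
```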
Raphael S. Carvalho
023031b0c8 compaction: lcs: fix functionality to feed starved levels
Quick introduction to level starvation:
higher levels may be left uncompacted (thus starved) for a long time if the
user does something that leaves them with little data, such as running cleanup
or changing the max sstable size (default 160M). The leveled strategy handles
this problem as follows: say we're compacting L1 into L2. If L3 is starved, we
look for one of its sstables that is fully contained in the token range of the
L1->L2 candidates, so that we won't end up with an overlap in L2.

Now the problem:
this functionality isn't working properly because the range of the candidates
is calculated incorrectly, due to an accident when converting the code to C++.
It won't cause an overlap, because the miscalculation makes the code more
restrictive about which sstable from the starved level can be used.

A test case was added to confirm the problem.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20170328223753.15398-1-raphaelsc@scylladb.com>
2017-03-29 18:59:46 +03:00
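This sketch shows the selection that 023031b0c8 restores, assuming inclusive integer token bounds. sstable_info, candidates_range() and pick_starved() are hypothetical names. The candidates' range runs from the smallest first token to the largest last token across all of them. A starved-level sstable qualifies only if it lies entirely inside that range. A range that is too narrow still avoids overlaps, but it rejects sstables that would be safe to use.

```cpp
#include <algorithm>
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical sstable summary with inclusive token bounds.
struct sstable_info {
    int64_t first_token;
    int64_t last_token;
};

// Token range spanned by all L(n) -> L(n+1) candidates. Precondition: non-empty.
std::pair<int64_t, int64_t> candidates_range(const std::vector<sstable_info>& candidates) {
    int64_t lo = candidates.front().first_token;
    int64_t hi = candidates.front().last_token;
    for (const auto& s : candidates) {
        lo = std::min(lo, s.first_token);
        hi = std::max(hi, s.last_token);
    }
    return {lo, hi};
}

// Picks a starved-level sstable fully inside the candidates' range, so adding
// it to the compaction cannot create an overlap in the output level.
std::optional<sstable_info> pick_starved(const std::vector<sstable_info>& candidates,
                                         const std::vector<sstable_info>& starved_level) {
    if (candidates.empty()) {
        return std::nullopt;
    }
    auto [lo, hi] = candidates_range(candidates);
    for (const auto& s : starved_level) {
        if (lo <= s.first_token && s.last_token <= hi) {
            return s;
        }
    }
    return std::nullopt;
}
```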
Tomasz Grabiec
7fd724821b tests: Add performance test for fast forwarding of sstable readers 2017-03-28 18:34:55 +02:00
Tomasz Grabiec
543a484d78 tests: Allow starting cql_test_env on pre-existing data 2017-03-28 18:34:55 +02:00
Tomasz Grabiec
f1aca6d116 tests: sstable: Add test for fast forwarding within partition using index 2017-03-28 18:34:55 +02:00
Tomasz Grabiec
b40b20387a tests: Add tests for ordering of position_in_partition relative to composites 2017-03-28 18:10:39 +02:00
Tomasz Grabiec
18a057aa81 compound_compat: Return composite from serialize_value()
To make the code more type-safe. Also, mark constructor from bytes
explicit.
2017-03-28 18:10:39 +02:00
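This sketch shows what the explicit constructor in 18a057aa81 buys. It uses hypothetical stand-ins (bytes as std::string, a minimal composite class) rather than Scylla's types. For illustration only, serialize_value() assumes the classic CompositeType-style layout: a 2-byte big-endian length, the value, then an end-of-component byte. The static_asserts show that raw bytes no longer convert to a composite implicitly, while explicit construction still compiles.

```cpp
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration.
using bytes = std::string;

class composite {
    bytes _bytes;
public:
    // explicit: raw bytes can no longer silently become a composite.
    explicit composite(bytes b) : _bytes(std::move(b)) {}
    const bytes& get_bytes() const { return _bytes; }
};

// Returning the dedicated type documents what the buffer contains.
// Assumes each component is shorter than 64 KiB (2-byte length prefix).
composite serialize_value(const std::vector<bytes>& components) {
    bytes out;
    for (const auto& c : components) {
        out.push_back(static_cast<char>((c.size() >> 8) & 0xff));
        out.push_back(static_cast<char>(c.size() & 0xff));
        out += c;
        out.push_back('\0');  // end-of-component byte
    }
    return composite(std::move(out));
}

static_assert(!std::is_convertible_v<bytes, composite>,
              "implicit bytes -> composite conversion is rejected");
static_assert(std::is_constructible_v<composite, bytes>,
              "explicit construction still works");
```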
Tomasz Grabiec
27d86dfe18 sstables: Enable skipping to cells at data_consume_context level 2017-03-28 18:10:39 +02:00
Tomasz Grabiec
cd295e9926 sstables: Avoid moving an sstable
In preparation for adding non-movable members.
2017-03-28 18:10:39 +02:00
Tomasz Grabiec
5edb427873 sstables: Remove private constructor
To reduce duplication.
2017-03-28 18:10:39 +02:00
Tomasz Grabiec
a7301a702f tests: Add missing blocking on fast_forward_to() 2017-03-28 18:10:39 +02:00
Tomasz Grabiec
5fe14735e8 tests: dht: Test ring_position_comparator 2017-03-28 18:10:39 +02:00
Tomasz Grabiec
ff6cca6e9e tests: Add utility for checking total orders 2017-03-28 18:10:39 +02:00
Duarte Nunes
53014bd762 mutation_source_test: Ensure unique collection elements
Duplicate elements are illegal in collections, so we ensure they only
contain unique ones.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20170327161149.8938-4-duarte@scylladb.com>
2017-03-27 18:44:11 +02:00
Duarte Nunes
94d568924d mutation_source_test: Sort collection elements
Ref #1607

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20170327161149.8938-3-duarte@scylladb.com>
2017-03-27 18:43:58 +02:00
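The two commits above (53014bd762 and 94d568924d) can be sketched as a normalization step on a generated collection. The step sorts the elements by key, then keeps one element per key. The collection alias and normalize() are hypothetical, and plain std::string ordering stands in for the collection's real key comparator.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// Hypothetical generated collection: (element key, value) pairs.
using collection = std::vector<std::pair<std::string, std::string>>;

// Sorts elements by key (stable, so the first generated value wins) and then
// drops later elements with a duplicate key.
void normalize(collection& c) {
    std::stable_sort(c.begin(), c.end(),
                     [](const auto& a, const auto& b) { return a.first < b.first; });
    auto last = std::unique(c.begin(), c.end(),
                            [](const auto& a, const auto& b) { return a.first == b.first; });
    c.erase(last, c.end());
}
```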
Duarte Nunes
4963902922 mutation_source_test: Remove extra randomness source
This patch ensures we generate UUIDs using the same randomness source
as all the other values we randomly generate, so that we can get a
deterministic run from the seeds we print.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20170327161149.8938-2-duarte@scylladb.com>
2017-03-27 18:43:44 +02:00
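This sketch illustrates the idea in 4963902922. It assumes the test draws all its random values from one seeded std::mt19937_64 engine. uuid and random_uuid() are hypothetical names. Because the UUID halves come from the same engine, reseeding it with a printed seed reproduces the same UUIDs. The sketch does not set the UUID version or variant bits.

```cpp
#include <cstdint>
#include <random>

// Hypothetical 128-bit UUID as two 64-bit halves.
struct uuid {
    uint64_t msb;
    uint64_t lsb;
};

// Draws the UUID from the caller's engine instead of a separate, unseeded
// source, so a run is reproducible from the engine's seed. Braced
// initialization evaluates the two engine() calls left to right.
uuid random_uuid(std::mt19937_64& engine) {
    return uuid{engine(), engine()};
}
```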
Tomasz Grabiec
bb0ce5d8fe Merge "Ensure base and view schema versions match" from Duarte
The mapping between a base table update and a view update is schema
dependent, so we need to ensure the view schema versions match the
base schema version. For example, we match base columns to view
columns by name, so we need to ensure the base and view schemas we're
using for writing are isolated with respect to a previous alter
table statement.

We thus need to match base schema versions with view schema versions,
and we need to do so atomically to ensure that when one fiber sees a
schema, it also sees the complete set of corresponding view schemas.
This series ensures the schemas modified as a result of an alter
table statement are published atomically, under the schema lock. This
way, all the schemas referenced by the database are consistent with
each other when they are observed by other fibers.

Finally, we upgrade the mutation schema before generating the view
updates, to ensure it matches the most recent view schemas the base
replica knows about, registered in the database.

The db::view::view class was replaced by a set of non-member
functions, with its state, which used to reflect only the most recent
schema version, being moved to a new view_info class.
2017-03-17 12:40:00 +01:00
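This is a sketch of the publication pattern described in bb0ce5d8fe, under stated assumptions. The base schema version and its view schema versions are bundled into one immutable snapshot, and publishing replaces the whole snapshot in one step. schema_set and schema_registry are hypothetical. The single assignment is only atomic with respect to other fibers under Seastar-style cooperative scheduling on one shard. State shared across threads would additionally need a mutex or std::atomic<std::shared_ptr>.

```cpp
#include <memory>
#include <utility>
#include <vector>

// Hypothetical snapshot: a base table version plus the view schema versions
// built for it.
struct schema_set {
    int base_version;
    std::vector<int> view_versions;
};

// Readers obtain base and views through one pointer, so they see either the
// old set or the new set, never a mix of the two.
class schema_registry {
    std::shared_ptr<const schema_set> _current;
public:
    explicit schema_registry(schema_set initial)
        : _current(std::make_shared<const schema_set>(std::move(initial))) {}

    std::shared_ptr<const schema_set> get() const { return _current; }

    // Publishes base + views in a single step; holders of the old snapshot
    // keep a consistent view until they release it.
    void publish(schema_set next) {
        _current = std::make_shared<const schema_set>(std::move(next));
    }
};
```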
Tomasz Grabiec
cefb6b604a tests: lsa_async_eviction_test: Allocate objects under allocating section 2017-03-16 10:21:10 +01:00
Duarte Nunes
e215f25b11 migration_manager: Atomically migrate table and views
This patch changes the migration path for table updates such that the
base table mutations are sent and applied atomically with the view
schema mutations.

This ensures that after schema merging, we have a consistent mapping
of base table versions to view table versions, which will be used in
later patches.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2017-03-15 16:03:56 +01:00
Duarte Nunes
143136647a mutation_test: Add more test cases for difference()
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2017-03-15 14:34:01 +01:00
Duarte Nunes
005e4741e3 mutation_source_test: Randomly generate collection cells
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2017-03-15 14:34:01 +01:00
Tomasz Grabiec
ed530dfb3a tests: sstables: Add test for skipping within a compressed stream
Refs #2143.
2017-03-13 13:08:24 +01:00
Tomasz Grabiec
88ccc99017 tests: sstables: Add test for handling of repeated tombstones 2017-03-10 14:42:22 +01:00
Tomasz Grabiec
124dde30db sstables: Extract writer parameters into config objects
Also enables users to change the default promoted index block size.
2017-03-10 14:42:22 +01:00
Tomasz Grabiec
ad1e69c4c5 tests: Move as_mutation_source() helper to header 2017-03-10 14:42:22 +01:00
Tomasz Grabiec
6f409d367b tests: Extract ensure_monotonic_positions() to streamed_mutation_assertions 2017-03-10 14:42:22 +01:00
Tomasz Grabiec
06a964b3a0 tests: mutation_source_test: Add test case for forwarding to a full range 2017-03-10 14:42:22 +01:00
Tomasz Grabiec
929842ad3f tests: simple_schema: Add fragment factories 2017-03-10 14:42:22 +01:00
Tomasz Grabiec
d98f013b07 tests: Extract simple_schema 2017-03-10 14:42:22 +01:00
Paweł Dziepak
aaae8db033 loggers should not have external linkage
Message-Id: <20170309111034.20929-1-pdziepak@scylladb.com>
2017-03-09 12:27:20 +01:00
Paweł Dziepak
04b80272f2 cell_locker: add metrics for lock acquisition 2017-03-02 09:05:12 +00:00
Paweł Dziepak
8457f407ef tests/counters: add test for apply reversibility 2017-03-02 09:05:11 +00:00
Paweł Dziepak
2b5c4386b5 tests/cell_locker: add test for timing out lock acquisition 2017-03-02 09:05:10 +00:00
Paweł Dziepak
25173f8095 db: propagate timeout for counter writes 2017-03-02 09:05:10 +00:00
Paweł Dziepak
bdac487b5a do not use long_type for counter update 2017-03-01 16:33:37 +00:00
Paweł Dziepak
0198d8e470 Merge "Introduce streamed_mutation::fast_forward_to()" from Tomasz
"This introduces an API which allows forward navigation in a stream of mutation
fragments. It allows one to consume only a subset of the stream by iteratively
specifying sub-ranges from which fragments should be returned.

API outline:

  When in forwarding mode, the stream does not return all fragments right away,
  but only those belonging to the current range. Initially, the current range
  covers only the static row. The stream can be forwarded to a later range with
  fast_forward_to(), even before reaching end-of-stream for the current range.
  Forwarding doesn't change the initial restrictions of the stream; it can only
  be used to skip over data.

  Monotonicity of positions is preserved by forwarding. That is, fragments
  emitted after forwarding will have greater positions than any fragments
  emitted before forwarding.

  For any range, all range tombstones relevant for that range which are present
  in the original stream will be emitted. Range tombstones emitted before
  forwarding which overlap with the new range are not necessarily re-emitted.

  When not in forwarding mode, the stream acts as if the current range were equal
  to the full range. This implies that fast_forward_to() cannot be
  used.

  Whether the stream is in forwarding mode or not is specified when the stream
  is created, typically via the mutation_source interface.

What's left for later series:

  Optimization by providing specialized implementations. This series implements
  forwarding support in all mutation sources via generic wrapper which simply
  drops fragments."

* tag 'tgrabiec/clustering-fast-forward-to-v2' of github.com:scylladb/seastar-dev:
  tests: mutation_source_tests: Verify monotonicity of positions
  tests: random_mutation_generator: Spread the keys more
  tests: mutation_source_test: Make blobs more easily distinguishable
  tests: streamed_mutation: Test that merged stream passes mutation source tests
  tests: mutation_source_test: Add tests for forwarding of streamed_mutation
  tests: streamed_mutation_assertions: Add methods for navigating the stream
  tests: Add range generators to random_mutation_generator
  partition_slice_builder: Add with_ranges()
  query: Introduce full_clustering_range
  streamed_mutation: Add non-owning variant of mutation_from_streamed_mutation()
  db: Enable creating forwardable readers via mutation_source
  mutation_source: Document liveness requirements
  mutation_source: Cleanup
  db: Replace virtual_reader_type with mutation_source_opt
  partition_version: Refactor make_partition_snapshot_reader() overloads
  database: Fix mutation_source created by as_mutation_source() to not ignore trace_state_ptr
  memtable: Accept all mutation_source parameters
  streamed_mutation: Implement fast_forward_to() in stream merger
  streamed_mutation: Add generic implementation of forwardable streamed_mutation
  streamed_mutation: Add fast_forward_to() API
  position_in_partition: Introduce position_range
  position_in_partition: Introduce position constructor for right after the static row
  streamed_mutation: Make cast to view non-explicit
  streamed_mutation: Make schema() getter non-copying
2017-02-24 10:37:51 +00:00
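This is a minimal model of the forwarding API outlined in 0198d8e470. It follows the spirit of the generic wrapper that simply drops fragments. Positions are ints, the static row is assumed to sit at position -1, and range tombstones are ignored. forwardable_stream and position_range are hypothetical names, not Scylla's classes. Throwing on a backwards range is this sketch's own way of keeping emitted positions monotonic.

```cpp
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical half-open range of positions.
struct position_range {
    int start;  // inclusive
    int end;    // exclusive
};

// Forwarding wrapper over an ordered fragment stream: it returns only the
// fragments inside the current range and drops the ones it skips over.
class forwardable_stream {
    std::vector<int> _fragments;  // underlying stream, sorted by position
    std::size_t _next = 0;
    position_range _range{-1, 0};  // initially covers only the static row
public:
    explicit forwardable_stream(std::vector<int> fragments)
        : _fragments(std::move(fragments)) {}

    // Next fragment in the current range, or nullopt at the end of the range.
    std::optional<int> next() {
        while (_next < _fragments.size() && _fragments[_next] < _range.start) {
            ++_next;  // before the current range: drop it
        }
        if (_next < _fragments.size() && _fragments[_next] < _range.end) {
            return _fragments[_next++];
        }
        return std::nullopt;  // end of stream for the current range
    }

    // May be called before the current range is exhausted; leftover fragments
    // of the old range are dropped by next().
    void fast_forward_to(position_range r) {
        if (r.start < _range.end) {
            throw std::invalid_argument("fast_forward_to() must move forward");
        }
        _range = r;
    }
};
```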
Tomasz Grabiec
0798ea22c8 tests: mutation_source_tests: Verify monotonicity of positions 2017-02-23 18:50:54 +01:00
Tomasz Grabiec
d0421ba545 tests: random_mutation_generator: Spread the keys more
The deviation was very low, so most ranges were very close. Spread them
to test more cases.
2017-02-23 18:50:54 +01:00
Tomasz Grabiec
27ff169b6b tests: mutation_source_test: Make blobs more easily distinguishable
It's easier to compare them if they differ only by a few most
significant bits, than by all bits.
2017-02-23 18:50:53 +01:00
Tomasz Grabiec
182e3f981b tests: streamed_mutation: Test that merged stream passes mutation source tests 2017-02-23 18:50:53 +01:00
Tomasz Grabiec
122562c1cc tests: mutation_source_test: Add tests for forwarding of streamed_mutation 2017-02-23 18:50:53 +01:00
Tomasz Grabiec
1d7e84f770 tests: streamed_mutation_assertions: Add methods for navigating the stream 2017-02-23 18:50:53 +01:00
Tomasz Grabiec
f2feb54fb0 tests: Add range generators to random_mutation_generator 2017-02-23 18:50:53 +01:00
Tomasz Grabiec
892d4a2165 db: Enable creating forwardable readers via mutation_source
Right now all mutation source implementations will use
make_forwardable() wrapper.
2017-02-23 18:50:44 +01:00
Calle Wilund
0d87f3dd7d utils::UUID: operator< should behave as comparison of hex strings/bytes
I.e. it needs to be an unsigned comparison.
Message-Id: <1487683665-23426-1-git-send-email-calle@scylladb.com>
2017-02-22 09:19:22 +00:00
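This sketch shows the difference behind 0d87f3dd7d. It assumes the UUID is held as two 64-bit halves, as in Java's UUID, whose compareTo() is a signed comparison. uuid, operator< and signed_less() here are hypothetical. Comparing the halves as unsigned integers, most significant first, orders UUIDs the same way as comparing their 16 bytes (or lowercase hex strings) from left to right. A signed comparison puts anything with a leading byte of 0x80 or above first.

```cpp
#include <cstdint>
#include <tuple>

// Hypothetical UUID stored as two 64-bit halves, most significant first.
struct uuid {
    uint64_t msb;
    uint64_t lsb;
};

// Unsigned lexicographic comparison: matches byte-wise / hex-string order.
bool operator<(const uuid& a, const uuid& b) {
    return std::tie(a.msb, a.lsb) < std::tie(b.msb, b.lsb);
}

// The problematic variant: treating the halves as signed makes a leading
// byte >= 0x80 negative, so such UUIDs sort before e.g. 0x7f...
bool signed_less(const uuid& a, const uuid& b) {
    const auto am = static_cast<int64_t>(a.msb);
    const auto bm = static_cast<int64_t>(b.msb);
    if (am != bm) {
        return am < bm;
    }
    return static_cast<int64_t>(a.lsb) < static_cast<int64_t>(b.lsb);
}
```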
Paweł Dziepak
274bcd415a tests/cql_test_env: wait for storage service initialization
Message-Id: <20170221121130.14064-1-pdziepak@scylladb.com>
2017-02-21 17:05:45 +02:00
Tomasz Grabiec
9f63e172fb tests: compaction_manager_test: Fix abort on exception
Message-Id: <1487343901-12745-1-git-send-email-tgrabiec@scylladb.com>
2017-02-17 15:53:55 +00:00