1504 Commits

Author SHA1 Message Date
Raphael S. Carvalho
b350352e6c compaction: keep only one variant of size_tiered_most_interesting_bucket
Two variants of size_tiered_most_interesting_bucket existed to avoid a copy,
but subsequent work will make lcs use a vector for each level of sstables,
so let's keep only one variant.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2017-07-04 03:34:51 -03:00
Avi Kivity
5883e85da3 Merge "improve maintainability of compaction strategies" from Raphael
"compaction_strategy.cc keeps the full implementation of size tiered,
major, and null strategies, and partial implementation of leveled
and date tiered strategies. It's a mess. In the future, we will also
need space for time window strategy. The file is hard to read and
maintain.
My goal here is to improve maintainability of the strategies by
putting each of them into its own header.

NOTE: No semantic change is introduced here."

* 'improve_compaction_strategy_maintainability' of github.com:raphaelsc/scylla:
  compaction_strategy: move dtcs to its existing header
  compaction_strategy: move lcs implementation to its own header
  compaction_strategy: move stcs implementation to its own header
  compaction_strategy: move compaction_strategy_impl to its own header
2017-07-03 11:39:30 +03:00
Avi Kivity
6895f6e603 sstable_datafile_test: fix sstable_expired_data_ratio failure
A comment states that we want the file to be old enough, but sets
a timestamp of max(), which is in the future. This may have passed
because the conversion from numeric_limits<time_t>::max() to
db_clock::time_point is not well defined (their dynamic range is
different), so truncation may have converted the large number to a
low one.
Message-Id: <20170702082903.20879-1-avi@scylladb.com>
2017-07-02 20:22:51 +02:00
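
The range mismatch described in 6895f6e603 above can be illustrated with a checked conversion. This is a minimal sketch, not Scylla code: it assumes a time point that counts milliseconds in a signed 64-bit integer (the actual representation of db_clock::time_point is not stated in the commit), and the names millis_time_point and to_millis_time_point are hypothetical.

```cpp
#include <chrono>
#include <cstdint>
#include <ctime>
#include <limits>
#include <optional>

// Hypothetical stand-in for a clock whose time_point counts milliseconds
// in a signed 64-bit integer. This is an assumption made for illustration.
using millis = std::chrono::duration<std::int64_t, std::milli>;
using millis_time_point = std::chrono::time_point<std::chrono::system_clock, millis>;

// Converts seconds since the epoch to a millisecond time point, refusing
// values whose millisecond count would not fit in int64_t.
std::optional<millis_time_point> to_millis_time_point(std::time_t seconds) {
    constexpr std::int64_t limit = std::numeric_limits<std::int64_t>::max() / 1000;
    const auto s = static_cast<std::int64_t>(seconds);
    if (s > limit || s < -limit) {
        return std::nullopt;  // multiplying by 1000 would overflow
    }
    return millis_time_point(millis(s * 1000));
}
```

With a 64-bit time_t, passing std::numeric_limits<std::time_t>::max() yields an empty optional instead of a silently wrapped value. A test that needs an "old enough" file would pass a timestamp in the past rather than max().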
Raphael S. Carvalho
69a9ad468c compaction_strategy: move dtcs to its existing header
Goal is to improve maintainability.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2017-06-30 03:50:09 -03:00
Raphael S. Carvalho
ab335c8085 tests: more testing for tombstone compaction options
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2017-06-29 02:43:08 -03:00
Raphael S. Carvalho
ce4dc15a20 tests: basic tombstone compaction test for date tiered
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2017-06-29 02:43:08 -03:00
Raphael S. Carvalho
c400bf97b9 tests: basic test of tombstone compaction with lcs
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2017-06-29 02:43:08 -03:00
Raphael S. Carvalho
138fda468f tests: basic tombstone compaction test for size tiered
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2017-06-29 02:43:08 -03:00
Raphael S. Carvalho
ad24470972 tests: add test for estimation of droppable tombstone ratio
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2017-06-29 02:43:08 -03:00
Raphael S. Carvalho
6fb26d9f0c tests: add streaming_histogram_test
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2017-06-29 02:08:12 -03:00
Raphael S. Carvalho
c01c659594 tests: add test for sstable with bad tombstone histogram
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2017-06-29 02:08:12 -03:00
Raphael S. Carvalho
7b532867ce tests: add sstable tombstone histogram test
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2017-06-29 01:17:28 -03:00
Raphael S. Carvalho
fb9bc609c6 streaming_histogram: do not limit it to be used by sstables
streaming histogram will later be placed in /utils, so we want
it to use std::unordered_map<> instead of disk_hash<>.
That also requires implementing serialization/deserialization
functions for it.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2017-06-27 16:51:52 -03:00
Tomasz Grabiec
eb844a10e9 tests: streamed_mutation_test: Avoid using boost::size() on row ranges
Fails to compile with libboost 1.55.
2017-06-27 15:27:13 +02:00
Tomasz Grabiec
e68925595c tests: row_cache: Remove unused method 2017-06-27 14:10:37 +02:00
Avi Kivity
9b21a9bfb6 Merge "Implement partial cache" from Tomasz and Piotr
"This series enables cache to keep partial partitions.
Reads no longer have to read whole partition from sstables
in order to cache the result.

The 10MB threshold for partition size in cache is lifted.

Known issues:

 - There is no partial eviction yet, whole partitions are still evicted,
   and partition snapshots held by active reads are not evictable at all
 - Information about range continuity is not recorded if that
   would require inserting a dummy entry, or if previous entry
   doesn't belong to the latest snapshot
 - Cache update after memtable flush happening concurrently with reads
   may inhibit those reads' ability to populate cache (new issue)
 - Cache update from flushed memtables has partition granularity,
   so may cause latency problems with large partitions
 - Schema is still tracked per-partition, so after schema changes
   reads may induce high latency due to whole partition needing
   to be converted atomically
 - Range tombstones are repeated in the stream for every range between
   cache entries they cover (new issue)
 - Populating scans for both small and large partitions (perf_fast_forward)
   experienced a 40% reduction of throughput, CPU bound

How was this tested:

 - test.py --mode release
 - row_cache_stress_test -c1 -m1G
 - perf_fast_forward, passes except for the test case checking range continuity population
   which would require inserting a dummy entry (mentioned above)
 - perf_simple_query (-c1 -m1G --duration 32):
     before: 90k [ops/s] stdev: 4k [ops/s]
     after:  94k [ops/s] stdev: 2k [ops/s]"

* tag 'tgrabiec/introduce-partial-cache-v8' of github.com:cloudius-systems/seastar-dev: (130 commits)
  tests: row_cache: Add test_tombstone_merging_in_partial_partition test case
  tests: Introduce row_cache_stress_test
  utils: Add helpers for dealing with nonwrapping_range<int>
  tests: simple_schema: Allow passing the tombstone to make_range_tombstone()
  tests: simple_schema: Accept value by reference
  tests: simple_schema: Make add_row() accept optional timestamp
  tests: simple_schema: Make new_timestamp() public
  tests: simple_schema: Introduce make_ckeys()
  tests: simple_schema: Introduce get_value(const clustered_row&) helper
  tests: simple_schema: Fix comment
  tests: simple_schema: Add missing include
  row_cache: Introduce evict()
  tests: Add cache_streamed_mutation_test
  tests: mutation_assertions: Allow expecting fragments
  mutation_fragment: Implement equality check
  tests: row_cache: Add test for population of random partitions
  tests: row_cache: Add test for partition tombstone population
  tests: row_cache: Test reading randomly populated partition
  tests: row_cache: Add test_single_partition_update()
  tests: row_cache: Add test_scan_with_partial_partitions
  ...
2017-06-26 14:54:37 +03:00
Avi Kivity
555621b537 Disentangle memtables from sstables
Remove sstable::write_components(memtable), replacing it with a helper.

Fixes #2354
Message-Id: <20170624142639.16662-1-avi@scylladb.com>
2017-06-26 09:37:11 +02:00
Avi Kivity
236a8370e4 Remove use of std::random_shuffle()
It was removed in C++17. Replace with std::shuffle().
Message-Id: <20170626063809.7563-1-avi@scylladb.com>
2017-06-26 09:36:38 +02:00
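
The replacement described in 236a8370e4 above follows a common pattern, sketched here with standard library calls only. The helper names shuffle_in_place and shuffled_sequence are hypothetical and do not come from the commit.

```cpp
#include <algorithm>
#include <numeric>
#include <random>
#include <vector>

// std::random_shuffle(v.begin(), v.end()) relied on an unspecified source of
// randomness and no longer exists in C++17. std::shuffle takes the random
// engine explicitly.
template <typename T, typename Engine>
void shuffle_in_place(std::vector<T>& v, Engine& engine) {
    std::shuffle(v.begin(), v.end(), engine);
}

// Builds 0..n-1 and shuffles it with a seeded Mersenne Twister engine.
std::vector<int> shuffled_sequence(int n, unsigned seed) {
    std::vector<int> v(n);
    std::iota(v.begin(), v.end(), 0);
    std::mt19937 engine(seed);
    shuffle_in_place(v, engine);
    return v;
}
```

Passing the engine explicitly also makes a test reproducible when the seed is fixed, which std::random_shuffle did not guarantee.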
Tomasz Grabiec
b0bcf2be53 tests: row_cache: Add test_tombstone_merging_in_partial_partition test case 2017-06-24 18:06:11 +02:00
Tomasz Grabiec
23c6f517cb tests: Introduce row_cache_stress_test
Runs readers, updates and eviction concurrently and verifies the
following properties of reads:

  - reads see all past writes

  - reads see no partial writes within a single partition
2017-06-24 18:06:11 +02:00
Tomasz Grabiec
5c9f87fb27 tests: simple_schema: Allow passing the tombstone to make_range_tombstone() 2017-06-24 18:06:11 +02:00
Tomasz Grabiec
edf4a3494c tests: simple_schema: Accept value by reference 2017-06-24 18:06:11 +02:00
Tomasz Grabiec
5f70df472f tests: simple_schema: Make add_row() accept optional timestamp 2017-06-24 18:06:11 +02:00
Tomasz Grabiec
53867c4328 tests: simple_schema: Make new_timestamp() public 2017-06-24 18:06:11 +02:00
Tomasz Grabiec
51b5814ec2 tests: simple_schema: Introduce make_ckeys() 2017-06-24 18:06:11 +02:00
Tomasz Grabiec
074c67fe4d tests: simple_schema: Introduce get_value(const clustered_row&) helper 2017-06-24 18:06:11 +02:00
Tomasz Grabiec
8ffc776e06 tests: simple_schema: Fix comment 2017-06-24 18:06:11 +02:00
Tomasz Grabiec
ecacd2e84a tests: simple_schema: Add missing include 2017-06-24 18:06:11 +02:00
Piotr Jastrzebski
c4e8effffa tests: Add cache_streamed_mutation_test
[tgrabiec:
  - extracted from a larger commit
  - removed coupling with how cache_streamed_mutation is created (the
    code went out of sync), used more stable make_reader(). it's simpler too.
  - replaced false/true literals with is_continuous/is_dummy where appropriate
  - dropped tests for cache::underlying (class is gone)
  - reused streamed_mutation_assertions, it has better error messages
  - fixed the tests to not create tombstones with missing timestamps
  - relaxed range tombstone assertions to only check information relevant for the query range
  - print cache on failure for improved debuggability
]
2017-06-24 18:06:11 +02:00
Tomasz Grabiec
44fdee3f2e tests: mutation_assertions: Allow expecting fragments 2017-06-24 18:06:11 +02:00
Tomasz Grabiec
116bcb8b30 tests: row_cache: Add test for population of random partitions 2017-06-24 18:06:11 +02:00
Tomasz Grabiec
930a1415fe tests: row_cache: Add test for partition tombstone population 2017-06-24 18:06:11 +02:00
Tomasz Grabiec
9bfece6f82 tests: row_cache: Test reading randomly populated partition 2017-06-24 18:06:11 +02:00
Piotr Jastrzebski
0358334579 tests: row_cache: Add test_single_partition_update()
[tgrabiec: Extracted from "row_cache: Introduce cache_streamed_mutation"]
2017-06-24 18:06:11 +02:00
Tomasz Grabiec
8bb76e2f12 tests: row_cache: Add test_scan_with_partial_partitions 2017-06-24 18:06:11 +02:00
Tomasz Grabiec
5a0ae55f6d Introduce schema_upgrader 2017-06-24 18:06:11 +02:00
Tomasz Grabiec
7ae40d7045 tests: Add test for update_invalidating() 2017-06-24 18:06:11 +02:00
Tomasz Grabiec
fb62dfab02 tests: mvcc: Introduce test_schema_upgrade_preserves_continuity 2017-06-24 18:06:11 +02:00
Tomasz Grabiec
164989a574 tests: mvcc: Add test for partition_entry::apply_to_incomplete() 2017-06-24 18:06:11 +02:00
Tomasz Grabiec
bbfa52822e row_cache: Switch readers to use per-entry snapshots
Currently readers always use the latest snapshot. This is fine
for respecting write atomicity as long as partitions are fully continuous
in cache (as they are now), but it will break write atomicity once
partial population is allowed.

Consider the following case:

  flush write(ck=1), write(ck=2) -> snapshot_1
  cache reader 1 reads and inserts ck=1 @snapshot_1
  flush write(ck=1), write(ck=2) -> snapshot_2
  cache reader 2 reads and inserts ck=2 @snapshot_2

Because cache update is not atomic, it can happen that reader 2 will
complete while the partition hasn't been updated yet for snapshot_2.
In such case, after read 2 the partition would contain ck=1 from
snapshot_1 and ck=2 from snapshot_2. It will match neither of the
snapshots, and this could violate write atomicity.

To solve this problem we conceptually associate each partition key in
the ring with the snapshot it currently reflects. The update process
gradually converts entries in ring order to the new snapshot. Reads
will not be using the latest snapshot, but rather the current snapshot
for the position in the ring they are at.

There is a race between the update process and populating reads. Since
after the update all entries must reflect the new snapshot, reads
using the old snapshot cannot be allowed to insert data which can no
longer be reached by the update process. Before this patch this race
was prevented by the use of a phased_barrier, where readers would keep
phased_barrier::operation alive between starting a read of a partition
and inserting it into cache. Cache update was waiting for all prior
operations before starting the update. Any later read which was not
waited for would use the latest snapshot for reads, so the update
process didn't have to fix anything up for such reads.

After this change, later reads cannot always use the latest snapshot,
they have to use the snapshot corresponding to given entry. So it's
not enough for update() to wait for prior reads in order to prevent
stale populations. The (simple) solution implemented in this patch is
to detect the conflict and abandon population of given sub-range. In
general, reads are allowed to populate given range only if it belongs
to a single snapshot.

Note that the range here is not the whole query range. For population
of continuity, it is the range starting after the previous key and
ending after the key being inserted. When populating a partition
entry, the range is a singular range containing only the partition
key. Readers switch to new snapshots automatically as they move across
the ring. It's possible that the insertion of the partition doesn't
conflict, but continuity does. In such case the entry will be inserted
but continuity will not be set.
2017-06-24 18:06:11 +02:00
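
The population rule described in bbfa52822e above can be modelled with a toy ring. This is a sketch under simplifying assumptions, not the row_cache implementation: ring positions are plain integers, snapshots are numbered phases, and the class ring_snapshots and its methods are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

// Toy model: positions [0, _front) already reflect snapshot _phase, and
// positions [_front, _size) still reflect snapshot _phase - 1. When no
// update is in progress, _front == _size.
class ring_snapshots {
    std::size_t _size;
    std::size_t _front;
    std::uint64_t _phase = 0;
public:
    explicit ring_snapshots(std::size_t size) : _size(size), _front(size) {}

    // Starts converting the ring to a new snapshot, beginning at position 0.
    void begin_update() { ++_phase; _front = 0; }

    // Converts the next n positions, in ring order, to the new snapshot.
    void advance(std::size_t n) { _front = std::min(_size, _front + n); }

    // Snapshot that the given position currently reflects.
    std::uint64_t snapshot_at(std::size_t pos) const {
        return pos < _front ? _phase : _phase - 1;
    }

    // A reader may populate the inclusive range [first, last] only if the
    // whole range belongs to the single snapshot the reader is using.
    // Snapshots are monotonic along the ring, so checking both ends suffices.
    bool can_populate(std::size_t first, std::size_t last,
                      std::uint64_t reader_snapshot) const {
        return snapshot_at(first) == reader_snapshot
            && snapshot_at(last) == reader_snapshot;
    }
};
```

In this model, a reader on the old snapshot that inserts key 7 after previous key 2 checks [7, 7] for the entry and [3, 7] for continuity. If the update front sits at position 5, the entry check passes but the continuity check fails, which matches the case where the entry is inserted but continuity is not set.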
Tomasz Grabiec
8ba6366610 row_cache: Switch to using snapshot_source
Currently every time cache needs to create reader for missing data it
obtains a reader which is most up to date. That reader includes writes
from later populate phases, for which update() was not yet
called. This will be problematic once we allow partitions to be
partially populated, because different parts of a partition could be
populated by readers that use different sets of writes, which would
break write atomicity.

The solution will be to always populate given partition using the same
set of writes, using reader created from the current snapshot. The
snapshot changes only on update(), with update() gradually converting
each partition to the new snapshot.
2017-06-24 18:06:11 +02:00
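
The idea in 8ba6366610 above, that missing data is always read from the snapshot captured at the last update(), can be sketched as follows. This is a simplified model, not the actual snapshot_source interface: the types data_set, snapshot, snapshot_source_fn and toy_cache are hypothetical, and update() here switches the whole cache at once rather than converting partitions gradually.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <optional>
#include <utility>

using data_set = std::map<int, int>;
using snapshot = std::shared_ptr<const data_set>;
// Hypothetical snapshot source: each call returns an immutable view of the
// writes visible at the time of the call.
using snapshot_source_fn = std::function<snapshot()>;

class toy_cache {
    snapshot_source_fn _source;
    snapshot _current;
public:
    explicit toy_cache(snapshot_source_fn source)
        : _source(std::move(source)), _current(_source()) {}

    // Reads for data missing from cache use the current snapshot, so all
    // parts of a partition are populated from the same set of writes.
    std::optional<int> read_missing(int key) const {
        auto it = _current->find(key);
        if (it == _current->end()) {
            return std::nullopt;
        }
        return it->second;
    }

    // The snapshot changes only here.
    void update() { _current = _source(); }
};
```

A write made to the underlying data after the cache was constructed stays invisible to read_missing() until update() is called, which is the property that keeps partial population consistent in this model.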
Tomasz Grabiec
e23c7e2f34 row_cache: Rework invalidate() implementation
 1) Reduce duplication by delegating to more general overloads

 2) Improve documentation to not mention effects in terms of
    population (detail) but rather write visibility

 3) Rename clear() to invalidate() and merge with the range variant,
    it has the same semantics
2017-06-24 18:06:11 +02:00
Tomasz Grabiec
bd023b6161 tests: Introduce memtable_snapshot_source
Snapshottable in-memory mutation source for use in row_cache tests.
2017-06-24 18:06:11 +02:00
Tomasz Grabiec
7f8620d4a7 tests: mutation_source: Relax expectations about range tombstones
In preparation for having partial cache which trims range tombstones
to the lower bound of the query.
2017-06-24 18:06:11 +02:00
Tomasz Grabiec
3a9212e0f2 tests: mutation_assertions: Add ability to limit verification to given clustering_row_ranges
Currently mutation sources are free to return range tombstones
covering range which is larger than the query range. The cache
mutation source will soon become more eager about trimming such
tombstones. To cover up for such differences, allow telling the
restrictions to only care about differences relevant for given
clustering ranges.
2017-06-24 18:06:11 +02:00
Tomasz Grabiec
f925b26241 tests: mutation_reader_assertions: Simplify 2017-06-24 18:06:11 +02:00
Piotr Jastrzebski
9380dd1ee3 mutation_source: make sure we never ignore fast forwarding
Mutation sources sometimes ignore the fast-forwarding parameter, so
this change adds an assertion to check that the parameter
can be safely ignored.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2017-06-24 18:06:11 +02:00
Piotr Jastrzebski
ac03331490 row_cache_test: improve test_sliced_read_row_presence
Remove unused parameter and add checks to make sure
all expected rows have been received.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2017-06-24 18:06:11 +02:00
Tomasz Grabiec
db053ef902 tests: Add test for continuity merging rules 2017-06-24 18:06:11 +02:00
Tomasz Grabiec
2edf08d36a tests: random_mutation_generator: Generate random continuity 2017-06-24 18:06:11 +02:00