Commit Graph

157 Commits

Author SHA1 Message Date
Piotr Jastrzebski
ea449c9cce Replace sstables::mutation_reader with ::mutation_reader
This will make migration to flat_mutation_reader much
easier and sstables::mutation_reader is going away with
this migration anyway.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2017-11-15 10:40:01 +01:00
Raphael S. Carvalho
1f478d5daa tests: enable twcs test that relied on size-tiered properties
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2017-11-14 13:27:27 -02:00
Raphael S. Carvalho
8165af1d08 twcs: respect stcs options by forwarding them to stcs method
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2017-11-14 13:27:27 -02:00
Raphael S. Carvalho
9cdc047a4c lcs: forward stcs options to respect them
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2017-11-14 13:27:27 -02:00
Raphael S. Carvalho
d8ec913c34 stcs: make most_interesting_bucket respect thresholds
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2017-11-14 13:26:04 -02:00
Raphael S. Carvalho
cb6d060d8e compaction: make size_tiered_most_interesting_bucket static method of stcs class
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2017-11-14 13:24:03 -02:00
Duarte Nunes
baeec0935f Replace query::full_slice with schema::full_slice()
query::full_slice doesn't select any regular or static columns, which
is at odds with the expectations of its users. This patch replaces it
with the schema::full_slice() version.

Refs #2885

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1507732800-9448-2-git-send-email-duarte@scylladb.com>
2017-10-17 11:25:53 +02:00
Raphael S. Carvalho
67c5c8dc67 sstables: do not recompute shards for all tables after each compaction
For every finished compaction, we were calculating shards for all
existing tables. With ignore_msb set to 0, it's probably not a big
deal, but if ignore_msb is like 12 and LCS is used (meaning thousands
of tables possibly), the operation may stall the reactor for a
considerable amount of time. That's fixed by caching shards.

Fixes #2875.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20171011053424.22308-1-raphaelsc@scylladb.com>
2017-10-11 11:45:01 +03:00
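The caching idea in 67c5c8dc67 can be illustrated with a minimal sketch. Every name here (`table_stub`, `compute_shards`, the modulo-based token-to-shard mapping) is hypothetical and is not Scylla's actual code; the sketch only shows the owning shards being computed once and then reused, rather than recomputed after every compaction.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <utility>
#include <vector>

// Hypothetical stand-in for an sstable; not Scylla's real class.
class table_stub {
    std::vector<uint64_t> _tokens;          // tokens of the keys it holds
    unsigned _shard_count;
    mutable std::vector<unsigned> _cached_shards;
    mutable bool _cache_valid = false;
    mutable unsigned _computations = 0;     // counts the expensive calls

    // Assumed, simplified token-to-shard mapping (illustration only).
    std::vector<unsigned> compute_shards() const {
        ++_computations;
        std::set<unsigned> owners;
        for (auto t : _tokens) {
            owners.insert(static_cast<unsigned>(t % _shard_count));
        }
        return std::vector<unsigned>(owners.begin(), owners.end());
    }

public:
    table_stub(std::vector<uint64_t> tokens, unsigned shard_count)
        : _tokens(std::move(tokens)), _shard_count(shard_count) {
        assert(shard_count > 0);
    }

    // Computes the shard list on first use and serves the cached copy afterwards.
    const std::vector<unsigned>& shards() const {
        if (!_cache_valid) {
            _cached_shards = compute_shards();
            _cache_valid = true;
        }
        return _cached_shards;
    }

    unsigned computations() const { return _computations; }
};
```

With thousands of tables, repeated calls to `shards()` then cost a lookup each instead of a full recomputation.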
Botond Dénes
046a1f9b05 sstables: Get rid of [[deprecated]] index_reader::get_index_entries()
Change test code (the only consumers) to read index by partitions.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <b6111e92b5e0729bfa2e76fd848215804174067a.1507297154.git.bdenes@scylladb.com>
2017-10-08 12:18:52 +03:00
Raphael S. Carvalho
e34c1db642 db: update compaction history outside the sstable write lock
This is needed because compaction can deadlock if refresh disables
writes, which waits for compaction, while compaction in turn
waits for dirty memory[1] that would be released by a memtable write.

Dirty memory manager for non-system cfs was being used for system cfs,
which was useful for exposing this problem.

[1]: when updating compaction history.

Fixes #2769.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20170918215238.9810-2-raphaelsc@scylladb.com>
2017-09-26 19:51:12 +02:00
Raphael S. Carvalho
1524426deb sstables: Fix compaction correctness of higher-level tables
When incremental_reader_selector is used for compaction, it will
first call incremental selector of partitioned sstable set with
minimum token that will result in first interval being skipped,
which means not everything being compacted. The interval is
skipped because iterator is incorrectly advanced when token
lies before it.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20170918021446.15920-1-raphaelsc@scylladb.com>
2017-09-19 09:59:30 +03:00
Avi Kivity
f7023501d6 treewide: use shared_sstable, make_sstable in place of lw_shared_ptr<sstable>
Since shared_sstable is going to be its own type soon, we can't use the old alias.
2017-09-12 10:43:05 +03:00
Paweł Dziepak
2b614201a7 tests/sstables: add storage_service_for_tests to counter write test
Writing counters to an sstable is going to require cluster feature
information, which requires accessing some singletons.
2017-09-05 10:32:48 +01:00
Paweł Dziepak
5007c9290a tests/sstables: add test for reading wrong-order counter cells
2017-09-05 10:32:48 +01:00
Raphael S. Carvalho
050a7019b8 sstables/index_reader: fix index reader for summary entry spanning lots of keys
quantity prevents index_reader from reading all index entries of a summary
entry that span more than min_index_interval entries. That can happen after
introduction of size-based sampling, and consequently, sstable will not be
able to return a key whose logical position in summary entry is beyond
min_index_interval. It's ok to not use quantity because index_reader will
read all indexes until either next summary entry or end of file is reached.

Fixes test_sstable_conforms_to_mutation_source

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20170812045821.25269-1-raphaelsc@scylladb.com>
2017-08-12 09:44:16 +03:00
Raphael S. Carvalho
5124f94358 tests: test summary entry spanning more keys than min interval
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2017-08-11 01:37:06 -03:00
Raphael S. Carvalho
8726ee937d sstables: introduce size-based sampling for sstable summary
Currently, a summary entry is added after min_index_interval index
entries were written. Not taking into account size of index entries
becomes a problem with large partitions which may create big index
entries due to promoted indexes. Read performance is affected as a
consequence because index entries spanned by summary are all read
from disk to serve request.

What we want to do is to also add a summary entry after index reaches
a boundary. To deal with oversampling, we want to write 1 byte to
summary for every 2000 bytes written to data file (this will be
eventually made into an option in the config file).
Both conditions must be met to avoid under or oversampling.
That way, the amount of data needed from index file to satisfy the
request is drastically reduced.

Fixes #1842.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2017-08-11 00:30:12 -03:00
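The byte-budget half of the rule in 8726ee937d (1 byte of summary per 2000 bytes of data) might be sketched as follows. This is an illustration under assumptions, not the code from the patch: `summary_sampler`, `maybe_sample` and the way the budget is advanced are hypothetical, and the interaction with min_index_interval is deliberately left out.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sampling state; names are illustrative, not Scylla's.
struct summary_sampler {
    // Data bytes that must be written per summary byte (ratio from the commit message).
    static constexpr uint64_t summary_byte_cost = 2000;

    uint64_t next_data_offset = 0;  // data-file offset at which the next entry is affordable
    uint64_t entries = 0;           // summary entries emitted so far

    // Called once per partition written to the data file.
    // Returns true if a summary entry of `entry_size` bytes is emitted.
    bool maybe_sample(uint64_t data_offset, uint64_t entry_size) {
        assert(entry_size > 0);
        if (data_offset < next_data_offset) {
            return false;  // emitting now would exceed the 1:2000 budget
        }
        // Each summary byte "costs" 2000 data bytes before the next entry.
        next_data_offset += summary_byte_cost * entry_size;
        ++entries;
        return true;
    }
};
```

In this model a 20-byte summary entry pushes the next possible sample 40000 data bytes further on, so large partitions with big index entries no longer force the reader to scan a long run of index data per summary entry.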
Botond Dénes
9ee9988097 Add combined_mutation_reader_test unit test
2017-08-10 12:38:10 +03:00
Botond Dénes
94fc550e68 sstable_set::incremental_selector: select() now returns a selection
A selection contains - in addition to the list of sstables - a next_token
which is a hint as to what is the next best token to call select() with.
This should be the smallest token such that at the next call to
select() the least number of new sstables will be returned, without
skipping any.
2017-08-09 16:27:33 +03:00
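One way to picture the selection described in 94fc550e68 is the toy model below. It assumes integer tokens and sstables described by closed token intervals; `interval_stub`, `selection` and `select_at` are hypothetical names, and the real incremental_selector works on a partitioned sstable set rather than a linear scan.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical sstable description: covers tokens in [first, last].
struct interval_stub {
    int64_t first;
    int64_t last;
    int id;
};

// Result of a select call: the matching sstables plus a hint for the next call.
struct selection {
    std::vector<int> sstables;
    bool has_next = false;   // false when no sstable starts after the token
    int64_t next_token = 0;  // smallest token at which a new sstable begins
};

selection select_at(const std::vector<interval_stub>& all, int64_t token) {
    selection sel;
    for (const auto& s : all) {
        assert(s.first <= s.last);
        if (s.first <= token && token <= s.last) {
            sel.sstables.push_back(s.id);
        } else if (s.first > token &&
                   (!sel.has_next || s.first < sel.next_token)) {
            // Calling again at this token adds the fewest new sstables
            // without skipping any that start in between.
            sel.next_token = s.first;
            sel.has_next = true;
        }
    }
    return sel;
}
```

A caller can jump straight to `next_token` rather than probing every token in between.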
Avi Kivity
c21bb5ae05 tests: fix sstable_datafile_test build with boost 1.55
Boost 1.55 accidentally removed support for "range for" on
recursive_directory_iterator (earlier and later versions do
support it). Use old-style iteration instead.

Message-Id: <20170724080128.8824-1-avi@scylladb.com>
2017-07-24 11:20:12 +03:00
Tomasz Grabiec
a9237c1666 schema: Revert back to the 1.7 layout of static compact tables in memory
We are using C* 3.x compatible layout in schema tables but want to
keep using the 1.7 layout in memory for compatibility during rolling
upgrade. This patch switches the schema and schema_builder classes
back to the old layout. Translation of layout happens when converting
to/from schema mutations.

Notable changes:

 1) Includes a revert of commit 6260f31e08
    "thrift: Update CQL mapping of static CFs".

 2) Brings back the "default_validation_class" schema attribute. In v3
    it can be derived from column definitions, but in v2 it can't, so
    we have to store it.

 3) legacy_schema_migrator and schema_builder don't have to do
    conversions to v3, this is now handled by the v3_columns
    class. schema_builder works with the same layout as schema, that
    is v2.

 4) Includes a revert of commit 66991a7ccb
    "v3 schema test fixes"

Fixes #2555.
2017-07-19 09:52:15 +02:00
Raphael S. Carvalho
c55c63f213 tests: add tests for time window compaction strategy
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2017-07-19 02:58:37 -03:00
Avi Kivity
9116dd91cb tests: copy the sstable with an unknown component to the data directory
We will be creating links to those sstable's files, and those don't work
if the data directory and the test sstable are on different devices.

Copying the files to the same directory fixes the problem.
Message-Id: <20170716090405.14307-1-avi@scylladb.com>
2017-07-16 11:55:00 +02:00
Avi Kivity
4704a78332 tests: remove bad constexpr in sstable_datafile_test
std::ceil() is not constexpr.

Found by clang.
2017-07-12 17:14:13 +03:00
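The issue behind 4704a78332 can be shown in isolation. The commit only removes the `constexpr`; the `ceil_div` helper below is a hypothetical alternative that is not part of the commit, included to show a ceiling computation that is a genuine constant expression. `std::ceil` only became `constexpr` in C++23, although some compilers accept it earlier as an extension.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Portable: evaluate std::ceil at run time, which is what dropping the
// constexpr from the variable amounts to.
inline double runtime_ceil_ratio(double a, double b) {
    assert(b != 0.0);
    return std::ceil(a / b);
}

// Hypothetical alternative (not from the commit): integer ceiling division
// usable in constant expressions. Assumes b > 0 and that a + b - 1 does not
// overflow uint64_t.
constexpr uint64_t ceil_div(uint64_t a, uint64_t b) {
    return (a + b - 1) / b;
}

static_assert(ceil_div(10, 4) == 3, "evaluated at compile time");
```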
Raphael S. Carvalho
8334086441 lcs: remove quadratic behavior from L0 compaction
L0 compaction triggers quadratic behavior when many newly created
sstables are needed for promotion due to their size being relatively
low to max sstable size parameter. So until L0 is worth promoting,
the strategy will compact every new sstable with all the existing
ones in L0. To fix it, let's do STCS on level 0 until it becomes
worth promoting.

Fixes #2432.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2017-07-11 09:35:35 -03:00
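The quadratic cost named in 8334086441 can be made concrete with a little arithmetic. The sketch below assumes equally sized sstables and that compaction drops no data, so compacting the k-th new sstable with everything already in L0 writes k times the sstable size; `naive_l0_bytes_written` is a hypothetical helper, not Scylla code.

```cpp
#include <cassert>
#include <cstdint>

// Total bytes written if every new L0 sstable is compacted together with
// all data already in L0. The first sstable has nothing to compact with,
// so the sum starts at k = 2.
uint64_t naive_l0_bytes_written(uint64_t sstable_count, uint64_t sstable_size) {
    uint64_t total = 0;
    for (uint64_t k = 2; k <= sstable_count; ++k) {
        total += k * sstable_size;
    }
    // Closed form: size * (n(n+1)/2 - 1), which grows as n squared.
    assert(sstable_count < 2 ||
           total == sstable_size * (sstable_count * (sstable_count + 1) / 2 - 1));
    return total;
}
```

Doubling the number of small sstables roughly quadruples the bytes rewritten, which is the behavior that running STCS on level 0 is meant to avoid.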
Avi Kivity
7b4412c3ce Revert "Merge "improvements for leveled strategy manifest" from Raphael"
This reverts commit 43a3e718e6, reversing
changes made to 3813e94b0a. It contains some
unrelated commits.
2017-07-11 11:12:53 +03:00
Raphael S. Carvalho
28ebe1807f lcs: remove quadratic behavior from L0 compaction
L0 compaction triggers quadratic behavior when many newly created
sstables are needed for promotion due to their size being relatively
low to max sstable size parameter. So until L0 is worth promoting,
the strategy will compact every new sstable with all the existing
ones in L0. To fix it, let's do STCS on level 0 until it becomes
worth promoting.

Fixes #2432.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2017-07-10 15:42:28 -03:00
Raphael S. Carvalho
7f7758fb6f tests/sstable: make sstable_expired_data_ratio more robust
This change stresses the histogram's ability to return a good estimate
after merging keys so that it doesn't grow beyond the size limit.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20170708205713.5958-1-raphaelsc@scylladb.com>
2017-07-09 10:33:10 +03:00
Raphael S. Carvalho
b350352e6c compaction: keep only one variant of size_tiered_most_interesting_bucket
two variants of size_tiered_most_interesting_bucket existed to avoid copy,
but subsequent work will make lcs use vector for each level of sstables,
so let's only keep one variant.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2017-07-04 03:34:51 -03:00
Avi Kivity
5883e85da3 Merge "improve maintainability of compaction strategies" from Raphael
"compaction_strategy.cc keeps the full implementation of size tiered,
major, and null strategies, and partial implementation of leveled
and date tiered strategies. It's a mess. In the future, we will also
need space for time window strategy. The file is hard to read and
maintain.
My goal here is to improve maintainability of the strategies by
putting each of them into its own header.

NOTE: No semantic change is introduced here."

* 'improve_compaction_strategy_maintainability' of github.com:raphaelsc/scylla:
  compaction_strategy: move dtcs to its existing header
  compaction_strategy: move lcs implementation to its own header
  compaction_strategy: move stcs implementation to its own header
  compaction_strategy: move compaction_strategy_impl to its own header
2017-07-03 11:39:30 +03:00
Avi Kivity
6895f6e603 sstable_datafile_test: fix sstable_expired_data_ratio failure
A comment states that we want the file to be old enough, but sets
a timestamp of max(), which is in the future. This may have passed
because the conversion from numeric_limits<time_t>::max() to
db_clock::time_point is not well defined (their dynamic range is
different), so truncation may have converted the large number to a
low one.
Message-Id: <20170702082903.20879-1-avi@scylladb.com>
2017-07-02 20:22:51 +02:00
Raphael S. Carvalho
69a9ad468c compaction_strategy: move dtcs to its existing header
Goal is to improve maintainability.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2017-06-30 03:50:09 -03:00
Raphael S. Carvalho
ab335c8085 tests: more testing for tombstone compaction options
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2017-06-29 02:43:08 -03:00
Raphael S. Carvalho
ce4dc15a20 tests: basic tombstone compaction test for date tiered
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2017-06-29 02:43:08 -03:00
Raphael S. Carvalho
c400bf97b9 tests: basic test of tombstone compaction with lcs
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2017-06-29 02:43:08 -03:00
Raphael S. Carvalho
138fda468f tests: basic tombstone compaction test for size tiered
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2017-06-29 02:43:08 -03:00
Raphael S. Carvalho
ad24470972 tests: add test for estimation of droppable tombstone ratio
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2017-06-29 02:43:08 -03:00
Raphael S. Carvalho
c01c659594 tests: add test for sstable with bad tombstone histogram
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2017-06-29 02:08:12 -03:00
Raphael S. Carvalho
7b532867ce tests: add sstable tombstone histogram test
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2017-06-29 01:17:28 -03:00
Raphael S. Carvalho
fb9bc609c6 streaming_histogram: do not limit it to be used by sstables
streaming histogram will later be placed in /utils, so we want
it to use std::unordered_map<> instead of disk_hash<>.
That also requires implementing serialization/deserialization
functions for it.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2017-06-27 16:51:52 -03:00
Avi Kivity
555621b537 Disentangle memtables from sstables
Remove sstable::write_components(memtable), replacing it with a helper.

Fixes #2354
Message-Id: <20170624142639.16662-1-avi@scylladb.com>
2017-06-26 09:37:11 +02:00
Raphael S. Carvalho
4bb27cbd6f lcs: actually prefer oldest sstables of L0 when it falls behind
Strategy prefers promoting oldest sstables in L0. Because sort
procedure is incorrectly sorting elements in descending order,
newest sstables will be promoted first *if and only if* L0 falls
behind (more than 32 sstables). If L0 doesn't fall behind, we'll
have all L0 sstables compacted with overlapping ones in L1.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2017-06-19 20:45:39 -03:00
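The ordering fix in 4bb27cbd6f comes down to the direction of a comparator. The sketch below is illustrative only: `l0_sstable_stub`, `oldest_candidates` and the use of a maximum timestamp as the age key are assumptions, not taken from the patch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical L0 sstable description; smaller max_timestamp means older.
struct l0_sstable_stub {
    int id;
    int64_t max_timestamp;
};

// Returns at most `limit` sstables, oldest first.
std::vector<l0_sstable_stub> oldest_candidates(std::vector<l0_sstable_stub> l0,
                                               std::size_t limit) {
    assert(limit > 0);
    // Ascending order puts the oldest sstables at the front. Sorting in
    // descending order here is the kind of bug the commit describes: the
    // newest sstables would be picked once L0 exceeds the limit.
    std::sort(l0.begin(), l0.end(),
              [](const l0_sstable_stub& a, const l0_sstable_stub& b) {
                  return a.max_timestamp < b.max_timestamp;
              });
    if (l0.size() > limit) {
        l0.erase(l0.begin() + static_cast<std::ptrdiff_t>(limit), l0.end());
    }
    return l0;
}
```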
Nadav Har'El
3018df11b5 Allow reading exactly desired byte ranges and fast_forward_to
In commit c63e88d556, support was added for
fast_forward_to() in data_consume_rows(). Because an input stream's end
cannot be changed after creation, that patch ignores the specified end
byte, and uses the end of file as the end position of the stream.

As a result of this, even when we want to read a specific byte range (e.g.,
in the repair code to checksum the partitions in a given range), the code
reads an entire 128K buffer around the end byte, or significantly more, with
read-ahead enabled. This causes repair to do more than 10 times the amount
of I/O it really has to do in the checksumming phase (which in the current
implementation, reads small ranges of partitions at a time).

This patch has two levels:

1. In the lower level, sstable::data_consume_rows(), which reads all
   partitions in a given disk byte range, now gets another byte position,
   "last_end". That can be the range's end, the end of the file, or anything
   in between the two. It opens the disk stream until last_end, which means
   1. we will never read-ahead beyond last_end, and 2. fast_forward_to() is
   not allowed beyond last_end.

2. In the upper level, we add to the various layers of sstable readers,
   mutation readers, etc., a boolean flag mutation_reader::forwarding, which
   says whether fast_forward_to() is allowed on the stream of mutations to
   move the stream to a different partition range.

   Note that this flag is separate from the existing boolean flag
   streamed_mutation::forwarding - that one talks about skipping inside a
   single partition, while the flag we are adding is about switching the
   partition range being read. Most of the functions that previously
   accepted streamed_mutation::forwarding now accept *also* the option
   mutation_reader::forwarding. The exception are functions which are known
   to read only a single partition, and not support fast_forward_to() a
   different partition range.

   We note that if mutation_reader::forwarding::no is requested, and
   fast_forward_to() is forbidden, there is no point in reading anything
   beyond the range's end, so data_consume_rows() is called with last_end as
   the range's end. But if forwarding::yes is requested, we use the end of the
   file as last_end, exactly like the code before this patch did.

Importantly, we note that the repair's partition reading code,
column_family::make_streaming_reader, uses mutation_reader::forwarding::no,
while the other existing reading code will use the default forwarding::yes.

In the future, we can further optimize the amount of bytes read from disk
by replacing forwarding::yes by an actual last partition that may ever be
read, and use its byte position as the last_end passed to data_consume_rows.
But we don't do this yet, and it's not a regression from the existing code,
which also opened the file input stream until the end of the file, and not
until the end of the range query. Moreover, such an improvement will not
improve anything if the overall range is always very large, in which
case not over-reading at its end will not improve performance.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20170619152629.11703-1-nyh@scylladb.com>
2017-06-19 18:31:32 +03:00
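The choice of last_end described in 3018df11b5 reduces to a small decision, sketched below. The `forwarding` enum is a simplified stand-in for mutation_reader::forwarding, and `choose_last_end` is a hypothetical helper rather than a function from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Simplified stand-in for the mutation_reader::forwarding flag.
enum class forwarding { no, yes };

// Picks the byte position up to which the data file stream is opened.
uint64_t choose_last_end(forwarding fwd, uint64_t range_end, uint64_t file_size) {
    assert(range_end <= file_size);
    // Without fast_forward_to() nothing past the range is ever needed, so
    // read-ahead can stop at the range's end. With forwarding allowed, the
    // stream may later be moved anywhere, so it is opened to the end of file.
    return fwd == forwarding::yes ? file_size : range_end;
}
```

For a repair-style read with `forwarding::no`, a 100-byte range in a 1000-byte file yields a last_end of 100, so read-ahead never goes past the requested range.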
Avi Kivity
6e2c9ef9fb Revert "Allow reading exactly desired byte ranges and fast_forward_to"
This reverts commit 317d7fc253 (and also the
related 2c57ab84b2).  It causes crashes
during range scans, reported by Gleb:

"To reproduce I run SELECT * FROM keyspace1.standard1; on typical c-s
dataset and 3 node cluster.

Backtrace:
    at /home/gleb/work/seastar/seastar/core/apply.hh:36
    rvalue=<unknown type in /home/gleb/work/seastar/build/release/scylla, CU 0x54cf307, DIE 0x55ebf2a>) at /home/gleb/work/seastar/seastar/core/do_with.hh:57
    range=std::vector of length 6, capacity 8 = {...}) at /home/gleb/work/seastar/seastar/core/future-util.hh:142
    at ./seastar/core/future.hh:890
    at /home/gleb/work/seastar/seastar/core/future-util.hh:119
    at /home/gleb/work/seastar/seastar/core/future-util.hh:142
2017-06-18 16:10:21 +03:00
Nadav Har'El
317d7fc253 Allow reading exactly desired byte ranges and fast_forward_to
In commit c63e88d556, support was added for
fast_forward_to() in data_consume_rows(). Because an input stream's end
cannot be changed after creation, that patch ignores the specified end
byte, and uses the end of file as the end position of the stream.

As a result of this, even when we want to read a specific byte range (e.g.,
in the repair code to checksum the partitions in a given range), the code
reads an entire 128K buffer around the end byte, or significantly more, with
read-ahead enabled. This causes repair to do more than 10 times the amount
of I/O it really has to do in the checksumming phase (which in the current
implementation, reads small ranges of partitions at a time).

This patch has two levels:

1. In the lower level, sstable::data_consume_rows(), which reads all
   partitions in a given disk byte range, now gets another byte position,
   "last_end". That can be the range's end, the end of the file, or anything
   in between the two. It opens the disk stream until last_end, which means
   1. we will never read-ahead beyond last_end, and 2. fast_forward_to() is
   not allowed beyond last_end.

2. In the upper level, we add to the various layers of sstable readers,
   mutation readers, etc., a boolean flag mutation_reader::forwarding, which
   says whether fast_forward_to() is allowed on the stream of mutations to
   move the stream to a different partition range.

   Note that this flag is separate from the existing boolean flag
   streamed_mutation::forwarding - that one talks about skipping inside a
   single partition, while the flag we are adding is about switching the
   partition range being read. Most of the functions that previously
   accepted streamed_mutation::forwarding now accept *also* the option
   mutation_reader::forwarding. The exception are functions which are known
   to read only a single partition, and not support fast_forward_to() a
   different partition range.

   We note that if mutation_reader::forwarding::no is requested, and
   fast_forward_to() is forbidden, there is no point in reading anything
   beyond the range's end, so data_consume_rows() is called with last_end as
   the range's end. But if forwarding::yes is requested, we use the end of the
   file as last_end, exactly like the code before this patch did.

Importantly, we note that the repair's partition reading code,
column_family::make_streaming_reader, uses mutation_reader::forwarding::no,
while the other existing reading code will use the default forwarding::yes.

In the future, we can further optimize the amount of bytes read from disk
by replacing forwarding::yes by an actual last partition that may ever be
read, and use its byte position as the last_end passed to data_consume_rows.
But we don't do this yet, and it's not a regression from the existing code,
which also opened the file input stream until the end of the file, and not
until the end of the range query. Moreover, such an improvement will not
improve anything if the overall range is always very large, in which
case not over-reading at its end will not improve performance.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20170614072122.13473-1-nyh@scylladb.com>
2017-06-15 13:22:46 +01:00
Tomasz Grabiec
f3a6d94398 sstables: Introduce sstable::as_mutation_source()
Adaptors extracted from existing testing code.
Message-Id: <1495729508-30081-1-git-send-email-tgrabiec@scylladb.com>
2017-05-25 19:30:20 +03:00
Calle Wilund
66991a7ccb v3 schema test fixes
2017-05-10 16:44:48 +00:00
Duarte Nunes
65d96421da tests/sstable_datafile_test: Fix regression
This patch fixes a regression introduced in 9e88b60, where the wrong
clustering key was being specified.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20170509091621.2682-1-duarte@scylladb.com>
2017-05-09 12:18:47 +03:00
Duarte Nunes
9e88b60ef5 mutation: Set cell using clustering_key_prefix
Change the clustering key argument in mutation::set_cell from
exploded_clustering_prefix to clustering_key_prefix, which allows for
some overall code simplification and fewer copies. This mostly affects
the cql3 layer.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2017-05-04 15:59:50 +02:00
Raphael S. Carvalho
8b0e358d73 tests/sstable_test: fix release-mode compaction_manager_test
In release mode, the compaction task is active after submitting the request
because ready future may be scheduled immediately.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20170502171925.9893-1-raphaelsc@scylladb.com>
2017-05-02 20:48:30 +03:00