Commit Graph

39 Commits

Author SHA1 Message Date
Raphael S. Carvalho
cb6d060d8e compaction: make size_tiered_most_interesting_bucket static method of stcs class
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2017-11-14 13:24:03 -02:00
Raphael S. Carvalho
1524426deb sstables: Fix compaction correctness of higher-level tables
When incremental_reader_selector is used for compaction, it will
first call incremental selector of partitioned sstable set with
minimum token that will result in first interval being skipped,
which means not everything being compacted. The interval is
skipped because iterator is incorrectly advanced when token
lies before it.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20170918021446.15920-1-raphaelsc@scylladb.com>
2017-09-19 09:59:30 +03:00
Avi Kivity
f7023501d6 treewide: use shared_sstable, make_sstable in place of lw_shared_ptr<sstable>
Since shared_sstable is going to be its own type soon, we can't use the old alias.
2017-09-12 10:43:05 +03:00
Botond Dénes
94fc550e68 sstable_set::incremental_selector: select() now returns a selection
A seletion contains - in addition to the list of sstables - a next_token
which is a hint as to what is the next best token to call select() with.
This should be the smallest token such that at the next call to
select() the least number of new sstables will be returned, without
skipping any.
2017-08-09 16:27:33 +03:00
Raphael S. Carvalho
7ecedac222 compaction: wire up time window compaction strategy
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2017-07-19 02:58:37 -03:00
Raphael S. Carvalho
b350352e6c compaction: keep only one variant of size_tiered_most_interesting_bucket
two variants of size_tiered_most_interesting_bucket existed to avoid copy,
but subsequent work will make lcs use vector for each level of sstables,
so let's only keep one variant.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2017-07-04 03:34:51 -03:00
Raphael S. Carvalho
69a9ad468c compaction_strategy: move dtcs to its existing header
Goal is to improve maintainability.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2017-06-30 03:50:09 -03:00
Raphael S. Carvalho
4d387475fe compaction_strategy: move lcs implementation to its own header
Goal is to improve maintainability.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2017-06-30 03:50:07 -03:00
Raphael S. Carvalho
4b46d286fd compaction_strategy: move stcs implementation to its own header
Goal is to improve maintainability.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2017-06-30 03:50:06 -03:00
Raphael S. Carvalho
0d9bb0da39 compaction_strategy: move compaction_strategy_impl to its own header
compaction_strategy.cc keeps the full implementation of size tiered,
major, and null strategies, and partial implementation of leveled
and date tiered strategies. It's a mess. In the future, we will also
need space for time window strategy. The file is hard to read and
maintain.
My goal here is to eventually improve maintainability of the
strategies by putting each of them into its own header.
This is the first step towards that goal.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2017-06-30 03:50:04 -03:00
Raphael S. Carvalho
9fa855e105 compaction_strategy: use duration type for default tombstone compaction interval
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20170630041838.20604-1-raphaelsc@scylladb.com>
2017-06-30 08:56:22 +03:00
Raphael S. Carvalho
f76ece5349 compaction/dtcs: add support for tombstone compaction
Unlike other strategies, dtcs has tombstone compaction disabled by
default due to:
- deletion shouldn't be used with DTCS; rather data is deleted through TTL.
- with time series workloads, it's usually better to wait for whole sstable
to be expired rather than compacting a single sstable when it's more than
20% (default value) expired.
See CASSANDRA-9234 for more details.

For tombstone compaction, unworthy sstables are filtered out and the oldest
one is chosen because it's the one less likely to shadow data and it's also
relatively big.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2017-06-29 02:43:08 -03:00
Raphael S. Carvalho
70e54cfe6e compaction/lcs: add support for tombstone compaction
LCS will choose its candidate by starting from highest level and
getting sstable which has highest droppable tombstone ratio.
Unlike STCS which needs to choose oldest sstable from biggest tier,
LCS can choose the one with highest d__t__r because sstables in
a given level don't overlap.
Sstable picked up for tombstone removal compaction won't be demoted
or promoted.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2017-06-29 02:43:08 -03:00
Raphael S. Carvalho
8fd80ac22c compaction/stcs: add support for tombstone compaction
Larger sstables are hard to find sstable peers and therefore are
left uncompacted for a long time. Expired data and tombstones which
can be purged will waste disk space meanwhile.

sstable tracks droppable tombstone from which ratio can be calculated.
If ratio is greater than threshold (0.2 by default), sstable will
be eligible for compaction. Oldest sstables from biggest tiers are
preferrable because droppable data in them are more likely to satisfy
the conditions for purge, like not shadowing data in another sstable.

Subsequent patches will add support in leveled and date tiered
strategies.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2017-06-29 02:43:08 -03:00
Raphael S. Carvalho
b26dc6db1a lcs: make logger static
otherwise, there will be one instance per shard.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2017-06-19 21:01:21 -03:00
Avi Kivity
ebaeefa02b Merge seatar upstream (seastar namespace)
- introcduced "seastarx.hh" header, which does a "using namespace seastar";
 - 'net' namespace conflicts with seastar::net, renamed to 'netw'.
 - 'transport' namespace conflicts with seastar::transport, renamed to
   cql_transport.
 - "logger" global variables now conflict with logger global type, renamed
   to xlogger.
 - other minor changes
2017-05-21 12:26:15 +03:00
Raphael S. Carvalho
13477075e2 compaction_strategy: implement resharding strategy for compaction strategies
Strategies other than leveled will reshard one shared sstable at
a time, and the target shard, shard at which job will run, for each
job will be chosen in a round-robin fashion.

For leveled strategy, we will reshard together smp::count adjacent
sstables that belong to same level.
The reason for that is because resharding one sstable at a time
may result in creation of file for each shard, meaning after
resharding we could end up with NO_SSTABLES*NO_SHARDS.

These resharding strategies will be used for our new resharding
algorithm.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2017-04-21 17:11:24 -03:00
Raphael S. Carvalho
11b74050a1 partitioned_sstable_set: fix quadratic space complexity
streaming generates lots of small sstables with large token range,
which triggers O(N^2) in space in interval map.
level 0 sstables will now be stored in a structure that has O(N)
in space complexity and which will be included for every read.

Fixes #2287.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20170417185509.6633-1-raphaelsc@scylladb.com>
2017-04-18 13:04:38 +03:00
Asias He
e5485f3ea6 Get rid of query::partition_range
Use dht::partition_range instead
2016-12-19 08:09:25 +08:00
Asias He
d1178fa299 Convert to use dht::token_range 2016-12-19 08:04:29 +08:00
Raphael S. Carvalho
02541e15c1 sstable_set: introduce incremental selector
Incrementally select sstables from sstable set using token
in ascending order.
For leveled strategy, it returns all sstables that belong
to current interval. For other strategies, it just return
all sstables from the set.
Useful for compaction which needs all sstables that overlap
with key being currently compacted to calculate maximum
purgeable timestamp.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-12-09 16:17:16 -02:00
Raphael S. Carvalho
b30a2cb21a lcs: generate info that preserves token distribution in higher levels
The information (last compacted keys) is lost after node is restarted
or schema is updated, which causes strategy to be rebuilt.
We need it for strategy to guarantee uniform distribution of token
range across sstables, or we could end up with 1 sstable of level L
overlapping with lots of sstables of level L+1, and that results in
a compaction of undesired length.
That information can be generated from scratch by getting last key
of newest sstable in each level > 0.

Fixes #1906.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <35ebd15977d5a8418239febb160c796cdc0e98fa.1480533805.git.raphaelsc@scylladb.com>
2016-12-01 11:19:58 +02:00
Raphael S. Carvalho
a16425833c size_tiered: do not recreate bucket when it goes beyond max threshold
Problem will cause size tiered to return small jobs when there are
more than max_threshold sstables of similar size. For example, if
max_threshold is 32, and there are 36 sstables of similar size,
strategy will only return 4 sstables to be compacted. That's because
we incorrectly create a new bucket when it meets the max threshold.
What we should do is to allow buckets to grow beyond max threshold
and trim them when selecting the most suitable one for compaction.

Important to mention that estimation for size tiered will now
work better when there are more than max_threshold sstables of
similar size.

Fixes #1901.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <080bad70d6cb86eaf52ac1bdd6765ac47aab5b03.1478316140.git.raphaelsc@scylladb.com>
2016-11-29 16:56:02 +02:00
Raphael S. Carvalho
a8ab4b8f37 lcs: fix starvation at higher levels
When max sstable size is increased, higher levels are suffering from
starvation because we decide to compact a given level if the following
calculation results in a number greater than 1.001:
level_size(L) / max_size_for_level_l(L)

Fixes #1720.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-09-30 14:09:49 -03:00
Raphael S. Carvalho
a3bf7558f2 lcs: fix broken token range distribution at higher levels
Uniform token range distribution across sstables in a level > 1 was broken,
because we were only choosing sstable with lowest first key, when compacting
a level > 0. This resulted in performance problem because L1->L2 may have a
huge overlap over time, for example.
Last compacted key will now be stored for each level to ensure sort of
"round robin" selection of sstables for compactions at level >= 1.
That's also done by C*, and they were once affected by it as described in
https://issues.apache.org/jira/browse/CASSANDRA-6284.

Fixes #1719.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-09-30 14:09:16 -03:00
Raphael S. Carvalho
dffb41f9d8 sstables: remove schema parameter from some sstable methods
schema can now be found in the sstable object itself.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <0fa44fedbe784d924522d7eeca77c16294479c6e.1473959677.git.raphaelsc@scylladb.com>
2016-09-19 13:25:58 +02:00
Raphael S. Carvalho
a2dc88889d db: enable clustering optimization only on dtcs
Leveled strategy will not benefit from this strategy because
there's only a few sstables that will contain a given partition
key, which means that a clustering key that belongs to a specific
partition key can only be in a few sstables as well.

Date tiered strategy is the one that will actually benefit the
most from this optimization. Size tiered may benefit from it too
if clustering key isn't overwritten, but it will not use the
clustering optimization.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-09-02 11:31:07 -03:00
Avi Kivity
7a140a306e Revert "sstables: optimize selection of sstables for leveled strategy"
This reverts commit c75b07fc34f0e7267a8e49276b96bbd4686cb78d; does not
deduplicate the sstable list.
2016-09-01 18:34:08 +03:00
Raphael S. Carvalho
c75b07fc34 sstables: optimize selection of sstables for leveled strategy
It's possible to copy sstables directly into vector, and that will
improve performance. my benchmark tool[1] shows that new version
reduces running time of *copy procedure* by factor of two after
1024^2 calls.
Switching to back_inserter improves throughput even further.
[1]: gist.github.com/raphaelsc/a4b27290f362cdecdef399770dda759c

Refs #1632.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <7153514a9b5f5eb24dff518ee9fa3680e0881dae.1472741401.git.raphaelsc@scylladb.com>
2016-09-01 18:08:53 +03:00
Raphael S. Carvalho
b699ef2de3 compaction: wire up date tiered compaction strategy
After this commit, date tiered compaction strategy is supported
on Scylla.

To understand how it works, take a look at our wiki page:
https://github.com/scylladb/scylla/wiki/SSTable-compaction#date-tiered-compaction

Fixes #511.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-07-06 02:11:47 -03:00
Raphael S. Carvalho
43926026c3 compaction: introduce compaction strategy method to estimate pending compaction
At the moment, it's not possible to know how many compaction are needed for
compaction strategy to be satisfied. It's not possible to know exactly the
number of pending compaction, but the strategy can provide an estimation.

For size tiered, it's based on number of sstables in each bucket. By dividing
bucket size by max threshold, we get number of compaction needed to compact
that single bucket.

For leveled, it's about the number of sstables that exceeds the limit in
each level.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <e209e52f6159ee274a8358b69961a7c0ce357f7d.1467667054.git.raphaelsc@scylladb.com>
2016-07-05 19:03:11 +03:00
Avi Kivity
c8237fc262 compaction_strategy: introduce make_sstable_set()
Allow compaction_strategy to create a container for sstables that is
optimized for the strategy.

Most compaction_strategies return bag_sstable_set; leveled compaction
returns the specialized partitioned_sstable_set.
2016-07-03 10:27:01 +03:00
Avi Kivity
168696c558 Introduce partitioned_sstable_set
partitioned_sstable_set assumes that sstable are mostly partitioned along
the token range: only a few sstables will be needed to access a particular
token.  It is implemented as an interval_map.
2016-07-03 10:27:00 +03:00
Avi Kivity
64e4357461 Introduce bag_sstable_set
bag_sstable_set is a generic sstable_set implementation: it assumes nothing
about the sstables.  It is implemented as a vector, and any select will
return the entire sstable set.
2016-07-03 10:27:00 +03:00
Avi Kivity
85e9cf4616 Introduce sstable_set
sstable_set abstracts the notion of a container of sstables, allowing
different compaction strategies to supply their own implementation.  The
intended user is leveled compaction strategy; since it partitions sstables,
it can quickly restrict the number of sstables that participate in a query
by looking at the min/max partition key.

sstable_set also maintains an internal lw_shared_ptr<sstable_list>,
in parallel with the abstract container.  This is to support
column_family::get_sstable(), which returns a lw_shared_ptr<sstable_list>
which must be anchored somewhere if it is not saved at the caller side,
as it isn't in most current callers.
2016-07-03 10:27:00 +03:00
Avi Kivity
2a46410f4a Change sstable_list from a map to a set
sstable_list is now a map<generation, sstable>; change it to a set
in preparation for replacing it with sstable_set.  The change simplifies
a lot of code; the only casualty is the code that computes the highest
generation number.
2016-07-03 10:26:57 +03:00
Raphael S. Carvalho
588ce915d6 compaction: disable parallel compaction for leveled strategy
It was discussed that leveled strategy may not benefit from parallel
compaction feature because almost all compaction jobs will have similar
size. It was also found that leveled strategy wasn't working correctly
with it because two overlapping sstable (targetting the same level)
could be created in parallel by two ongoing compaction.

Fixes #1293.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <60fe165d611c0283ca203c6d3aa2662ab091e363.1464883077.git.raphaelsc@scylladb.com>
2016-06-05 18:20:00 +03:00
Raphael S. Carvalho
bf03cd1ea6 sstables: kill unused code from size tiered strategy
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <485b1e49419cb052218ab4558f27270ce3bd03b4.1460761821.git.raphaelsc@scylladb.com>
2016-04-19 08:46:06 +03:00
Raphael S. Carvalho
29db5f5e1f sstables: move compaction strategy code to a new source file
Moving compaction strategy code from sstables/compaction.cc to
sstables/compaction_strategy.cc
That improves readability. Strategy code should be separated
from the generic compaction code.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <5af6fc8f7321351a071fc0ce03c80ffea21f8396.1460761821.git.raphaelsc@scylladb.com>
2016-04-19 08:45:43 +03:00