35 Commits

Author SHA1 Message Date
Glauber Costa
e40aa042a7 distributed_loader: reshard before the node is made online
This patch moves the resharding process to use the new
directory_with_sstables_handler infrastructure. There is no longer
a clear reshard step, and that just becomes a natural part of
populate_column_family.

In main.cc, a couple of changes are necessary to make that happen.
The first one obviously is to stop calling reshard. We also need to
make sure that:
 - The compaction manager is started much earlier, so we can register
   resharding jobs with it.
 - auto compactions are disabled in the populate method, so resharding
   doesn't have to fight for bandwidth with auto compactions.

Now that we are resharding through the sstable_directory, the old
resharding code can be deleted. There is also no need to deal with
the resharding backlog, because the SSTables are not yet
added to the sstable set at this point.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2020-06-18 09:37:18 -04:00
Glauber Costa
3c254dd49d compaction_strategy: add method to reshape SSTables
Some SSTable sets are considered to be off-strategy: they are in a shape
that is at best not optimal and at worst adversarial to the current
compaction strategy.

This patch introduces the compaction strategy-specific method
get_reshaping_job(). Given an SSTable set, it returns one compaction
that can be done to bring the table closer to being in-strategy. The
caller can then call this repeatedly until the table is fully
in-strategy.

As an example of how this is supposed to work, consider TWCS: some
SSTables will belong to a single window, in which case they are
already in-strategy and don't need to be compacted, while others span
multiple windows, in which case they are considered off-strategy and
have to be compacted.
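
The TWCS case above can be sketched roughly as follows. This is only an
illustration, not the code in the patch: sstable_info, window_of,
is_off_strategy and reshaping_candidates are hypothetical names, and
timestamps are assumed to be non-negative integers in the same unit as
the window size.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the timestamp metadata of one SSTable.
struct sstable_info {
    int64_t min_timestamp;
    int64_t max_timestamp;
};

// Index of the time window that contains the timestamp.
// Assumes non-negative timestamps, so integer division rounds down.
int64_t window_of(int64_t timestamp, int64_t window_size) {
    assert(window_size > 0 && timestamp >= 0);
    return timestamp / window_size;
}

// An SSTable is off-strategy when its data spans more than one window.
bool is_off_strategy(const sstable_info& sst, int64_t window_size) {
    return window_of(sst.min_timestamp, window_size) !=
           window_of(sst.max_timestamp, window_size);
}

// Collect the SSTables that a reshaping job would have to rewrite.
std::vector<sstable_info> reshaping_candidates(const std::vector<sstable_info>& all,
                                               int64_t window_size) {
    std::vector<sstable_info> out;
    for (const auto& sst : all) {
        if (is_off_strategy(sst, window_size)) {
            out.push_back(sst);
        }
    }
    return out;
}
```

A caller would repeat this selection after each reshaping compaction
until no candidates remain.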

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2020-06-18 09:37:18 -04:00
Raphael S. Carvalho
097a5e9e07 compaction: Disable garbage collected writer if interposer consumer is used
GC writer, used for incremental compaction, cannot currently be used if interposer
consumer is used. That's because compaction assumes that GC writer will be operated
only by a single compaction writer at a given point in time.
With interposer consumer, multiple writers will concurrently operate on the same
GC writer, leading to a race condition which can potentially result in use-after-free.

Let's disable GC writer if interposer consumer is enabled. We're not losing anything
because GC writer is currently only needed on strategies which don't implement an
interposer consumer. Resharding will always disable GC writer, which is the expected
behavior because it doesn't support incremental compaction yet.
The proper fix, which allows GC writer and interposer consumer to work together,
will require more time to implement and test, and for that reason, I am postponing
it as #6472 is a showstopper for the current release.

Fixes #6472.

tests: mode(dev).

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Reviewed-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20200526195428.230472-1-raphaelsc@scylladb.com>
2020-05-29 08:26:43 +02:00
Glauber Costa
44a0e40cb2 compaction: move compaction_strategy_type to its own header
I just hit a circularity in header inclusion that I traced back to the
fact that schema.hh includes compaction_strategy.hh. schema.hh is in
turn included in lots of places, so a circularity is not hard to come
by.

The schema header really only needs to know about the compaction_type,
so it can inform schema users about it. Following the trend in header
cleanups, I am moving that to a separate header which will both break
the circularity and make sure we include less stuff that is not
needed.

With this change, Scylla fails to compile due to a new missing forward
declaration at index/secondary_index_manager.hh, so this is fixed.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20200527172203.915936-1-glauber@scylladb.com>
2020-05-29 08:14:27 +03:00
Raphael S. Carvalho
76cde84540 sstables/compaction_manager: Fix logic for filtering out partial sstable runs
ignore_partial_runs() brings confusion because i__p__r() equal to true
doesn't mean filter out partial runs from compaction. It actually means
not caring about compaction of a partial run.

The logic was wrong because any compaction strategy that chooses not to ignore
partial sstable run[1] would have any fragment composing it incorrectly
becoming a candidate for compaction.
This problem could make compaction include only a subset of fragments composing
the partial run or even make the same fragment be compacted twice due to
parallel compaction.

[1]: a partial sstable run is an sstable that is still being generated by
compaction and as a result cannot be selected as a candidate at all.

The fix makes sure that a partial sstable run has none of its fragments
selected for compaction. It also renames i__p__r.
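
A minimal sketch of the intended filtering, under the assumption that
each fragment carries the identifier of the run it belongs to. The
fragment struct and filter_partial_runs are hypothetical names used only
for illustration; they are not taken from the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical fragment descriptor: each fragment belongs to exactly one run.
struct fragment {
    uint64_t generation;
    uint64_t run_id;
};

// Drop every fragment whose run is still being written by an ongoing
// compaction, so that no part of a partial run becomes a candidate.
std::vector<fragment> filter_partial_runs(const std::vector<fragment>& candidates,
                                          const std::unordered_set<uint64_t>& partial_runs) {
    std::vector<fragment> out;
    for (const auto& f : candidates) {
        if (partial_runs.count(f.run_id) == 0) {
            out.push_back(f);
        }
    }
    // Filtering can only shrink the candidate list.
    assert(out.size() <= candidates.size());
    return out;
}
```

Because the decision is made per run rather than per fragment, a subset
of a partial run can never be picked on its own.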

Fixes #4729.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20190807022814.12567-1-raphaelsc@scylladb.com>
2019-08-08 14:11:35 +03:00
Botond Dénes
a280dcfe4c compaction_strategy: add add_interposer_consumer()
This will be the customization point for compaction strategies, used to
inject a specific interposer consumer that can manipulate the fragment
stream so that it satisfies the requirements of the compaction strategy.
For now the only candidate for injecting such an interposer is
time-window compaction strategy, which needs to write sstables that
only contain atoms belonging to the same time-window. By default no
interposer is injected.
Also add an accompanying customization point
`adjust_partition_estimate()` which returns the estimated number of
partitions per sstable that the interposer will produce.
2019-06-26 15:45:59 +03:00
Botond Dénes
20d9d18ab3 compaction_strategy.hh: use schema_fwd.hh 2019-05-14 13:27:30 +03:00
Raphael S. Carvalho
3d9566e40d compaction: introduce notion of compaction-strategy-aware major compaction
That's only the very first step, which introduces the machinery for making
major compaction aware of all strategies. For the time being, the default
implementation, which only suits size tiered, is used for them all.

Refs #1431.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2018-12-06 18:22:30 -02:00
Raphael S. Carvalho
e88d1d54b9 sstables/compaction_manager: prevent partial run from being selected for compaction
Filter out sstables belonging to a partial run being generated by an ongoing
compaction. Otherwise, that could lead to wrong decisions by the compaction
strategy.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2018-11-24 18:53:22 -02:00
Avi Kivity
a71ab365e3 toplevel: convert sprint() to format()
sprint() recently became more strict, throwing on sprint("%s", 5). Replace
with the more modern format().

Mechanically converted with https://github.com/avikivity/unsprint.
2018-11-01 13:16:17 +00:00
Avi Kivity
cb549c767a database: rename column_family to table
The name "column_family" is both awkward and obsolete. Rename to
the modern and accurate "table".

An alias is kept to avoid huge code churn.

To prevent a One Definition Rule violation, a preexisting "table"
type is moved to a new namespace row_cache_stress_test.

Tests: unit (release)
Message-Id: <20180624065238.26481-1-avi@scylladb.com>
2018-06-24 14:54:46 +03:00
Glauber Costa
ca284174d0 infrastructure for backlog estimator for compaction work.
This patch adds infrastructure in various points in the system to allow
us to determine the amount of work present as backlog from compactions.

What needs to be done can be explained in three major pieces:

1) Add hooks in the points where sstables are added or inserted to a
   column family (or more precisely, to a compaction_strategy object).

2) Add hooks in read and write monitors that allow a compaction
   backlog estimator (tracker) to become aware of bytes that are
   partially written and compacted away.

3) Add a per-column family class (compaction_backlog_tracker) that
   can be used to track work that is done and relevant to compactions
   (like the two above), and a compaction manager to provide a
   system-wide backlog based on the response of the individual trackers.

The definition of how much backlog one has is strategy-specific. The
Null strategy is easy, as it never really has any backlog, and so is the
major strategy, since what really matters is the backlog of the
underlying compaction strategy.

Although backlogs are strategy-specific, they should be "compatible", in
the sense that if a particular strategy has more work to do, it should
yield a higher number than its counterparts.

All the others are presented in this patch as unimplemented: they will
always advertise a mild backlog that should yield a constant
CPU-utilization if used alone.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2018-01-02 18:43:07 -05:00
Glauber Costa
ecad1be161 compaction_strategy: add missing header
compaction_strategy.hh throws an exception, but it doesn't add the
exception header. It is working in-tree because of inclusion order,
but it broke one of my yet-out-of-tree changes.

In any case, it is best to add the headers we will need to the files,
and that is what this patch does.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20170912233326.26114-1-glauber@scylladb.com>
2017-09-13 08:40:15 +02:00
Avi Kivity
07feaf9c4c sstables: use support for lw_shared_ptr with incomplete type for shared_sstable
Use the lw_shared_ptr deleter support to define shared_sstable without
pulling the definition of class sstable, reducing compile time and
dependencies if only shared_sstable is needed.
2017-09-12 10:43:05 +03:00
Avi Kivity
f7023501d6 treewide: use shared_sstable, make_sstable in place of lw_shared_ptr<sstable>
Since shared_sstable is going to be its own type soon, we can't use the old alias.
2017-09-12 10:43:05 +03:00
Raphael S. Carvalho
7ecedac222 compaction: wire up time window compaction strategy
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2017-07-19 02:58:37 -03:00
Raphael S. Carvalho
13477075e2 compaction_strategy: implement resharding strategy for compaction strategies
Strategies other than leveled will reshard one shared sstable at
a time, and the target shard (the shard on which the job will run) for
each job will be chosen in a round-robin fashion.

For leveled strategy, we will reshard together smp::count adjacent
sstables that belong to the same level.
The reason is that resharding one sstable at a time
may result in the creation of a file for each shard, meaning that after
resharding we could end up with NO_SSTABLES*NO_SHARDS files.

These resharding strategies will be used for our new resharding
algorithm.
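
The job planning described above could look roughly like this. It is a
sketch under simplifying assumptions, not the patch's code: reshard_job
and plan_resharding are hypothetical names, sstables are identified by
their index only, and for the leveled case all inputs are assumed to
come from one level and to be already ordered so that neighbours are
adjacent.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// One resharding job: the input sstables (by index) and the shard that runs it.
struct reshard_job {
    std::vector<std::size_t> sstables;
    unsigned shard;
};

// Build jobs of `group_size` adjacent sstables each and hand them out to
// shards round-robin. group_size == 1 models the non-leveled strategies,
// group_size == shard_count models the leveled one.
std::vector<reshard_job> plan_resharding(std::size_t sstable_count,
                                         unsigned shard_count,
                                         std::size_t group_size) {
    assert(shard_count > 0 && group_size > 0);
    std::vector<reshard_job> jobs;
    unsigned next_shard = 0;
    for (std::size_t i = 0; i < sstable_count; i += group_size) {
        reshard_job job;
        job.shard = next_shard;
        for (std::size_t j = i; j < i + group_size && j < sstable_count; ++j) {
            job.sstables.push_back(j);
        }
        jobs.push_back(std::move(job));
        next_shard = (next_shard + 1) % shard_count;
    }
    return jobs;
}
```

Grouping inputs reduces the number of jobs, and with it the number of
per-shard output files, compared with one job per sstable.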

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2017-04-21 17:11:24 -03:00
Raphael S. Carvalho
a3bf7558f2 lcs: fix broken token range distribution at higher levels
Uniform token range distribution across sstables in a level > 1 was broken,
because we were only choosing the sstable with the lowest first key when compacting
a level > 0. This resulted in a performance problem because L1->L2 may have a
huge overlap over time, for example.
Last compacted key will now be stored for each level to ensure sort of
"round robin" selection of sstables for compactions at level >= 1.
That's also done by C*, and they were once affected by it as described in
https://issues.apache.org/jira/browse/CASSANDRA-6284.
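
One possible reading of the "round robin" selection is sketched below.
The exact rule used by the patch is not spelled out in this message, so
the choice of "first sstable whose first key is greater than the last
compacted key, wrapping around to the start" is an assumption, as are
the name pick_next and the modeling of keys as integers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Pick the index of the next sstable to compact within one level.
// `first_keys` holds the first key of each sstable in the level (modeled
// as an integer token), sorted in ascending order.
// `last_compacted_key` is null when nothing was compacted in this level yet.
std::size_t pick_next(const std::vector<int64_t>& first_keys,
                      const int64_t* last_compacted_key) {
    assert(!first_keys.empty());
    if (last_compacted_key != nullptr) {
        for (std::size_t i = 0; i < first_keys.size(); ++i) {
            if (first_keys[i] > *last_compacted_key) {
                return i;
            }
        }
    }
    // No previous key, or we walked past the end of the level: wrap around.
    return 0;
}
```

Compared with always taking the lowest first key, this walks through the
whole token range of the level before returning to its start.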

Fixes #1719.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-09-30 14:09:16 -03:00
Raphael S. Carvalho
a2dc88889d db: enable clustering optimization only on dtcs
Leveled strategy will not benefit from this optimization because
there are only a few sstables that will contain a given partition
key, which means that a clustering key that belongs to a specific
partition key can only be in a few sstables as well.

Date tiered strategy is the one that will actually benefit the
most from this optimization. Size tiered may benefit from it too
if clustering key isn't overwritten, but it will not use the
clustering optimization.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-09-02 11:31:07 -03:00
Raphael S. Carvalho
b699ef2de3 compaction: wire up date tiered compaction strategy
After this commit, date tiered compaction strategy is supported
on Scylla.

To understand how it works, take a look at our wiki page:
https://github.com/scylladb/scylla/wiki/SSTable-compaction#date-tiered-compaction

Fixes #511.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-07-06 02:11:47 -03:00
Raphael S. Carvalho
43926026c3 compaction: introduce compaction strategy method to estimate pending compaction
At the moment, it's not possible to know how many compactions are needed for
the compaction strategy to be satisfied. The exact number of pending
compactions cannot be known, but the strategy can provide an estimate.

For size tiered, it's based on the number of sstables in each bucket. By dividing
the bucket size by the max threshold, we get the number of compactions needed to
compact that single bucket.

For leveled, it's about the number of sstables that exceed the limit in
each level.
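
The two estimates can be illustrated with a small sketch. The function
names are hypothetical, buckets and levels are reduced to sstable
counts, and rounding the size-tiered division up is an assumption of
this example rather than something stated above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Size tiered: each bucket needs about bucket_size / max_threshold compactions.
// Rounding up is an assumption of this sketch.
std::size_t estimate_size_tiered(const std::vector<std::size_t>& bucket_sizes,
                                 std::size_t max_threshold) {
    assert(max_threshold > 0);
    std::size_t pending = 0;
    for (std::size_t size : bucket_sizes) {
        pending += (size + max_threshold - 1) / max_threshold;
    }
    return pending;
}

// Leveled: count the sstables above the limit of each level.
// sstables_per_level[i] and limit_per_level[i] both describe level i.
std::size_t estimate_leveled(const std::vector<std::size_t>& sstables_per_level,
                             const std::vector<std::size_t>& limit_per_level) {
    assert(sstables_per_level.size() == limit_per_level.size());
    std::size_t excess = 0;
    for (std::size_t i = 0; i < sstables_per_level.size(); ++i) {
        if (sstables_per_level[i] > limit_per_level[i]) {
            excess += sstables_per_level[i] - limit_per_level[i];
        }
    }
    return excess;
}
```

For example, with a max threshold of 32, buckets holding 4 and 33
sstables would contribute 1 and 2 pending compactions respectively.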

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <e209e52f6159ee274a8358b69961a7c0ce357f7d.1467667054.git.raphaelsc@scylladb.com>
2016-07-05 19:03:11 +03:00
Avi Kivity
c8237fc262 compaction_strategy: introduce make_sstable_set()
Allow compaction_strategy to create a container for sstables that is
optimized for the strategy.

Most compaction_strategies return bag_sstable_set; leveled compaction
returns the specialized partitioned_sstable_set.
2016-07-03 10:27:01 +03:00
Raphael S. Carvalho
588ce915d6 compaction: disable parallel compaction for leveled strategy
It was discussed that leveled strategy may not benefit from parallel
compaction feature because almost all compaction jobs will have similar
size. It was also found that leveled strategy wasn't working correctly
with it because two overlapping sstables (targeting the same level)
could be created in parallel by two ongoing compactions.

Fixes #1293.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <60fe165d611c0283ca203c6d3aa2662ab091e363.1464883077.git.raphaelsc@scylladb.com>
2016-06-05 18:20:00 +03:00
Pekka Enberg
38a54df863 Fix pre-ScyllaDB copyright statements
People keep tripping over the old copyrights and copy-pasting them to
new files. Search and replace "Cloudius Systems" with "ScyllaDB".

Message-Id: <1460013664-25966-1-git-send-email-penberg@scylladb.com>
2016-04-08 08:12:47 +03:00
Raphael S. Carvalho
d44a5d1e94 compaction: filter out compacting sstables
The implementation stores the generations of compacting sstables
in an unordered set per column family, so that before the strategy is
called, the compaction manager can filter out compacting sstables.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-01-12 01:18:29 -02:00
Raphael S. Carvalho
9c13c1c738 compaction: move compaction execution from strategy to manager
Currently, the compaction strategy is responsible for both getting the
sstables selected for compaction and running compaction.
Moving the code that runs compaction from strategy to manager is a big
improvement, which will also make it possible for the compaction manager
to keep track of which sstables are being compacted at any given moment.
This change will also be needed for cleanup and concurrent compaction
on the same column family.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-01-12 00:04:27 -02:00
Amnon Heiman
1b369be663 compaction_strategy should accept both class name and full class name
For compatibility reasons, compaction_strategy should accept both the
strategy class name and the full class name that includes the package name.

In Origin the resulting name depends on the configuration; we cannot mimic
that as we are using an enum for the type.

So currently the returned class name remains the class itself; we can
consider changing it in the future.

If the name is org.apache.cassandra.db.compaction.Name then it will be
compared as Name.

The error message was modified to report the name it was given.
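
The name handling can be sketched as below. Only the package prefix
comes from the message above; strategy_type, short_class_name and
parse_strategy are hypothetical names, the two strategies listed are
just examples, and std::invalid_argument stands in for whatever
exception type the real code throws.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical enum standing in for the strategy type.
enum class strategy_type { size_tiered, leveled };

// Strip the package prefix so that the full and the short class name
// compare equal.
std::string short_class_name(const std::string& name) {
    static const std::string prefix = "org.apache.cassandra.db.compaction.";
    if (name.compare(0, prefix.size(), prefix) == 0) {
        return name.substr(prefix.size());
    }
    return name;
}

// Map a class name (short or full) to the enum. The error reports the
// name exactly as it was given by the caller.
strategy_type parse_strategy(const std::string& name) {
    const std::string short_name = short_class_name(name);
    if (short_name == "SizeTieredCompactionStrategy") {
        return strategy_type::size_tiered;
    }
    if (short_name == "LeveledCompactionStrategy") {
        return strategy_type::leveled;
    }
    throw std::invalid_argument("Unknown compaction strategy: " + name);
}
```

Both "LeveledCompactionStrategy" and
"org.apache.cassandra.db.compaction.LeveledCompactionStrategy" resolve
to the same enum value in this sketch.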

Signed-off-by: Amnon Heiman <amnon@scylladb.com>

Fixes #545
2015-11-11 15:31:39 +02:00
Raphael S. Carvalho
06e9836a66 wire up support for leveled compaction strategy
Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-10-16 01:57:04 -03:00
Avi Kivity
d5cf0fb2b1 Add license notices 2015-09-20 10:43:39 +03:00
Amnon Heiman
520e96c634 compaction strategy: Return the compaction type
The compaction strategy was modified to return its compaction type.
The type method calls the virtual impl type method. Each of the
implementations returns its type.

A name method was added to the compaction strategy that returns the name
according to the strategy type.

And the static type method was modified to receive a const reference to
the string.

Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
2015-09-13 13:19:52 +03:00
Pekka Enberg
6b42d0f9b1 compaction_strategy: Fix type() to throw configuration exception
Fix compaction_strategy::type() to throw configuration_exception which
is what Origin throws from CFMetaData.createCompactionStrategy(). This
ensures that the CQL error we send back to the client is the same.

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-08-27 14:20:55 +03:00
Raphael S. Carvalho
634d00511b compaction: use compaction options in strategy
Support for compaction strategy options was recently added.
Previously, we were using default values in compaction strategy for
options, but now we can use the options defined in the schema.
Currently, we only support size-tiered strategy, so let's start
with it.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-07-23 15:26:47 -03:00
Glauber Costa
d1496944d9 sstables: handle compaction strategy
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-07-23 00:02:11 -04:00
Raphael S. Carvalho
a99c92f1b6 sstable compaction: add initial support to size-tiered strategy
Size-tiered strategy basically consists of creating buckets with sstables
of nearly the same size.
Afterwards, it will find the most interesting bucket, whose size must be
between min threshold and max threshold. The bucket with the smallest average
size is the most interesting one.

Bucket hotness is also considered when finding the most interesting bucket,
but we don't support this yet.
We are also missing some code that discards an sstable based on its coldness,
i.e. when it is hardly read.
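
A rough sketch of the bucketing and selection described above, ignoring
hotness and coldness. The names bucket, average, make_buckets and
most_interesting_bucket are hypothetical, and the 0.5 and 1.5 bounds
that define "nearly the same size" are assumptions of this example, not
values taken from the patch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using bucket = std::vector<uint64_t>;  // sstable sizes in bytes

double average(const bucket& b) {
    assert(!b.empty());
    uint64_t total = 0;
    for (uint64_t s : b) {
        total += s;
    }
    return static_cast<double>(total) / static_cast<double>(b.size());
}

// Walk the sizes in ascending order. An sstable joins the current bucket
// when its size lies within [avg * low, avg * high] of that bucket's
// average, otherwise it starts a new bucket.
std::vector<bucket> make_buckets(std::vector<uint64_t> sizes,
                                 double low = 0.5, double high = 1.5) {
    std::sort(sizes.begin(), sizes.end());
    std::vector<bucket> buckets;
    for (uint64_t size : sizes) {
        if (!buckets.empty()) {
            const double avg = average(buckets.back());
            const double s = static_cast<double>(size);
            if (s >= avg * low && s <= avg * high) {
                buckets.back().push_back(size);
                continue;
            }
        }
        buckets.push_back(bucket{size});
    }
    return buckets;
}

// Among buckets holding between min_threshold and max_threshold sstables,
// return the index of the one with the smallest average size, or -1 if
// no bucket qualifies.
int most_interesting_bucket(const std::vector<bucket>& buckets,
                            std::size_t min_threshold, std::size_t max_threshold) {
    int best = -1;
    for (std::size_t i = 0; i < buckets.size(); ++i) {
        const std::size_t n = buckets[i].size();
        if (n == 0 || n < min_threshold || n > max_threshold) {
            continue;
        }
        if (best < 0 || average(buckets[i]) < average(buckets[best])) {
            best = static_cast<int>(i);
        }
    }
    return best;
}
```

Preferring the bucket with the smallest average size means the cheapest
qualifying group of sstables is compacted first.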

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-07-20 10:08:14 -03:00
Raphael S. Carvalho
719898d0e5 introduce automatic compaction
As the name implies, this patch introduces the concept of automatic
compaction for sstables.

Compaction task is triggered whenever a new sstable is written.
Concurrent compaction on the same column family isn't supported, so
compaction may be postponed if there is an ongoing compaction.
In addition, seastar::gate is used both to prevent a new compaction
from starting and to wait for an ongoing compaction to finish, when
the system is asked for a shutdown.

This patch also introduces an abstract class for compaction strategy,
which is really useful for supporting multiple strategies.
Currently, null and major compaction strategies are supported.
As the name implies, null compaction strategy does nothing.
Major compaction strategy is about compacting all sstables into one.
This strategy may end up being helpful when adding support for major
compaction via nodetool.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-07-16 12:00:12 +03:00