Commit Graph

34 Commits

Author SHA1 Message Date
Botond Dénes
86ed627fc4 compaction: move code to namespace compaction
The namespace usage in this directory is very inconsistent, with files
and classes scattered in:
* global namespace
* namespace compaction
* namespace sstables

There are cases where all three are used in the same file. This code used to
live in sstables/ and some of it still retains namespace sstables as a
heritage of that time. The mismatch between the dir (future module) and
the namespace used is confusing, so finish the migration and move all
code in compaction/ to namespace compaction too.

This patch, although large, is mechanical, and only the following kinds of
changes are made:
* replace namespace sstables {} with namespace compaction {}
* add namespace compaction {}
* drop/add sstables::
* drop/add compaction::
* move around forward-declarations so they are in the correct namespace
  context

This refactoring revealed some awkward leftover coupling between
sstables and compaction, in sstables/sstable_set.cc, where the
make_sstable_set() methods of compaction strategies are implemented.
2025-09-25 15:03:56 +03:00
Raphael S. Carvalho
9d3755f276 replica: Futurize retrieval of sstable sets in compaction_group_view
This will allow upcoming work to gently produce a sstable set for
each compaction group view. Example: repaired and unrepaired.

Locking strategy for compaction's sstable selection:
Since sstable retrieval path became futurized, tasks in compaction
manager will now hold the write lock (compaction_state::lock)
when retrieving the sstable list, feeding them into compaction
strategy, and finally registering selected sstables as compacting.
The last step prevents another concurrent task from picking the
same sstable. Previously, all those steps were atomic, but
we have seen stalls in that area in large installations, so
futurization of that area would come sooner or later.
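
A minimal sketch of the locking pattern described above, written in Python with asyncio rather than Seastar. The class `CompactionState` and the functions `retrieve_sstables` and `select_and_register` are hypothetical names for illustration, and a plain `asyncio.Lock` stands in for the write side of `compaction_state::lock`. It only shows why holding the lock from retrieval through registration keeps two tasks from picking the same sstable once retrieval can yield.

```python
import asyncio


class CompactionState:
    """Hypothetical stand-in for per-group compaction state."""

    def __init__(self, sstables):
        self.lock = asyncio.Lock()      # stands in for the write lock
        self.sstables = list(sstables)
        self.compacting = set()         # sstables already registered as compacting


async def retrieve_sstables(state):
    # Retrieval is asynchronous ("futurized"), so it may yield to other tasks.
    await asyncio.sleep(0)
    return [s for s in state.sstables if s not in state.compacting]


async def select_and_register(state, max_inputs=2):
    # Hold the lock across retrieval, selection and registration, so that
    # no concurrent task can observe the same candidates in between.
    async with state.lock:
        candidates = await retrieve_sstables(state)
        selected = candidates[:max_inputs]   # trivial placeholder "strategy"
        state.compacting.update(selected)
        return selected


async def main():
    state = CompactionState(["a", "b", "c", "d"])
    return await asyncio.gather(select_and_register(state),
                                select_and_register(state))


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Without the lock, both tasks could yield inside `retrieve_sstables` before either registers its selection, and both would then pick the same inputs.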

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2025-08-08 06:58:00 +03:00
Raphael S. Carvalho
20c3301a1a treewide: Futurize estimation of pending compaction tasks
This is to allow futurization of compaction_group_view method that
retrieves sstable set.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2025-08-08 06:51:29 +03:00
Raphael S. Carvalho
2c4a9ba70c treewide: Rename table_state to compaction_group_view
Since table_state is a view to a compaction group, it makes sense
to rename it as so.

With upcoming incremental repair, each replica::compaction_group
will be actually two compaction groups, so there will be two
views for each replica::compaction_group.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2025-08-08 06:51:28 +03:00
Botond Dénes
efc48caea5 readers/mutation_reader: s/reader_consumer_v2/mutation_reader_consumer/ 2025-05-09 07:53:29 -04:00
Raphael S. Carvalho
21d1e78457 compaction: Wire table_state into make_sstable_set()
This will be useful for feeding token range owned by compaction group
into sstable set.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2025-04-29 15:47:33 -03:00
Benny Halevy
88ae067ddb everywhere: add skeletal support for the in_memory_tables feature
Forward-ported from scylla-enterprise.
Note that the feature has been deprecated and the implementation
is provided only for backward compatibility with pre-existing
features and schema.

Tested manually after adding the following to feature_service:
```
    gms::feature workload_prioritization { *this, "WORKLOAD_PRIORITIZATION"sv };
```

Launched a single-node cluster running 2023.1.10
```
cqlsh> create KEYSPACE ks WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
cqlsh> create TABLE ks.test ( pk int PRIMARY KEY, val int ) WITH compaction = {'class': 'InMemoryCompactionStrategy'};
```

log:
```
Scylla version 2023.1.10-0.20241227.21cffccc1ccd with build-id bd65b8399cb13b713a87e57fe333cfcabfd50be7 starting ...
...
INFO  2024-12-27 19:45:16,563 [shard 0] migration_manager - Create new ColumnFamily: org.apache.cassandra.config.CFMetaData@0x600000f1b400[cfId=5529c630-c47a-11ef-bd1d-4295734ce5a8,ksName=ks,cfName=test,cfType=Standard,comparator=org.apache.cassandra.db.marshal.CompositeType(org.apache.cassandra.db.marshal.UTF8Type),comment=,readRepairChance=0,dcLocalReadRepairChance=0,tombstoneGcOptions={"mode":"timeout","propagation_delay_in_seconds":"3600"},gcGraceSeconds=864000,keyValidator=org.apache.cassandra.db.marshal.Int32Type,minCompactionThreshold=4,maxCompactionThreshold=32,columnMetadata=[ColumnDefinition{name=pk, type=org.apache.cassandra.db.marshal.Int32Type, kind=PARTITION_KEY, componentIndex=0, droppedAt=-9223372036854775808}, ColumnDefinition{name=val, type=org.apache.cassandra.db.marshal.Int32Type, kind=REGULAR, componentIndex=null, droppedAt=-9223372036854775808}],compactionStrategyClass=class org.apache.cassandra.db.compaction.InMemoryCompactionStrategy,compactionStrategyOptions={enabled=true},compressionParameters={sstable_compression=org.apache.cassandra.io.compress.LZ4Compressor},bloomFilterFpChance=0.01,memtableFlushPeriod=0,caching={"keys":"ALL","rows_per_partition":"ALL"},cdc={},defaultTimeToLive=0,minIndexInterval=128,maxIndexInterval=2048,speculativeRetry=99.0PERCENTILE,triggers=[],isDense=false,in_memory=false,version=5529c631-c47a-11ef-bd1d-4295734ce5a8,droppedColumns={},collections={},indices={}]
INFO  2024-12-27 19:45:16,564 [shard 0] schema_tables - Creating ks.test id=5529c630-c47a-11ef-bd1d-4295734ce5a8 version=ec88d510-6aff-344a-914d-541d37081440
```

Upgraded to this branch and started scylla.
Verified that ks.test was successfully loaded:

log:
```
INFO  2024-12-27 19:48:58,115 [shard 0:main] init - Scylla version 6.3.0~dev-0.20241227.a64c6dfc153e with build-id f9496134a09cf2e55d3865b9e9ff499f672aa7da starting ...
...
WARN  2024-12-27 19:53:02,948 [shard 1:main] CompactionStrategy - InMemoryCompactionStrategy is no longer supported. Defaulting to NullCompactionStrategy.
...
INFO  2024-12-27 19:53:02,948 [shard 0:main] database - Keyspace ks: Reading CF test id=5529c630-c47a-11ef-bd1d-4295734ce5a8 version=ec88d510-6aff-344a-914d-541d37081440 storage=/home/bhalevy/scylladb/data/ks/test-5529c630c47a11efbd1d4295734ce5a8
```

Then, tested:
```
cqlsh> describe KEYSPACE ks;

CREATE KEYSPACE ks WITH replication = {'class': 'org.apache.cassandra.locator.SimpleStrategy', 'replication_factor': '1'} AND durable_writes = true AND tablets = {'enabled': false};

CREATE TABLE ks.test (
    pk int,
    val int,
    PRIMARY KEY (pk)
) WITH bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'ALL'}
    AND comment = ''
    AND compaction = {'class': 'InMemoryCompactionStrategy'}
    AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND speculative_retry = '99.0PERCENTILE';

cqlsh> alter TABLE ks.test with compaction = {'class': 'SizeTieredCompactionStrategy'};
cqlsh> describe KEYSPACE ks;

CREATE KEYSPACE ks WITH replication = {'class': 'org.apache.cassandra.locator.SimpleStrategy', 'replication_factor': '1'} AND durable_writes = true AND tablets = {'enabled': false};

CREATE TABLE ks.test (
    pk int,
    val int,
    PRIMARY KEY (pk)
) WITH bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'ALL'}
    AND comment = ''
    AND compaction = {'class': 'SizeTieredCompactionStrategy'}
    AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND speculative_retry = '99.0PERCENTILE'
    AND tombstone_gc = {'mode': 'timeout', 'propagation_delay_in_seconds': '3600'};
```

log:
```
INFO  2024-12-27 19:56:40,465 [shard 0:stmt] migration_manager - Update table 'ks.test' From org.apache.cassandra.config.CFMetaData@0x60000362d800[cfId=5529c630-c47a-11ef-bd1d-4295734ce5a8,ksName==ks,cfName=test,cfType=Standard,comparator=org.apache.cassandra.db.marshal.CompositeType(org.apache.cassandra.db.marshal.UTF8Type),comment=,tombstoneGcOptions={"mode":"timeout","propagation_delay_in_seconds":"3600"},gcGraceSeconds=864000,minCompactionThreshold=4,maxCompactionThreshold=32,columnMetadata=[ColumnDefinition{name=pk, type=org.apache.cassandra.db.marshal.Int32Type, kind=PARTITION_KEY, componentIndex=0, droppedAt=-9223372036854775808}, ColumnDefinition{name=val, type=org.apache.cassandra.db.marshal.Int32Type, kind=REGULAR, componentIndex=null, droppedAt=-9223372036854775808}],compactionStrategyClass=class org.apache.cassandra.db.compaction.InMemoryCompactionStrategy,compactionStrategyOptions={enabled=true},compressionParameters={sstable_compression=org.apache.cassandra.io.compress.LZ4Compressor},bloomFilterFpChance=0.01,memtableFlushPeriod=0,caching={"keys":"ALL","rows_per_partition":"ALL"},cdc={},defaultTimeToLive=0,minIndexInterval=128,maxIndexInterval=2048,speculativeRetry=99.0PERCENTILE,triggers=[],isDense=false,version=ec88d510-6aff-344a-914d-541d37081440,droppedColumns={},collections={},indices={}] To org.apache.cassandra.config.CFMetaData@0x60000336e000[cfId=5529c630-c47a-11ef-bd1d-4295734ce5a8,ksName==ks,cfName=test,cfType=Standard,comparator=org.apache.cassandra.db.marshal.CompositeType(org.apache.cassandra.db.marshal.UTF8Type),comment=,tombstoneGcOptions={"mode":"timeout","propagation_delay_in_seconds":"3600"},gcGraceSeconds=864000,minCompactionThreshold=4,maxCompactionThreshold=32,columnMetadata=[ColumnDefinition{name=pk, type=org.apache.cassandra.db.marshal.Int32Type, kind=PARTITION_KEY, componentIndex=0, droppedAt=-9223372036854775808}, ColumnDefinition{name=val, type=org.apache.cassandra.db.marshal.Int32Type, kind=REGULAR, componentIndex=null, 
droppedAt=-9223372036854775808}],compactionStrategyClass=class org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy,compactionStrategyOptions={enabled=true},compressionParameters={sstable_compression=org.apache.cassandra.io.compress.LZ4Compressor},bloomFilterFpChance=0.01,memtableFlushPeriod=0,caching={"keys":"ALL","rows_per_partition":"ALL"},cdc={},defaultTimeToLive=0,minIndexInterval=128,maxIndexInterval=2048,speculativeRetry=99.0PERCENTILE,triggers=[],isDense=false,version=ecccf010-c47b-11ef-b52c-622f2f0e87c4,droppedColumns={},collections={},indices={}]
INFO  2024-12-27 19:56:40,466 [shard 0: gms] schema_tables - Altering ks.test id=5529c630-c47a-11ef-bd1d-4295734ce5a8 version=ecccf010-c47b-11ef-b52c-622f2f0e87c4
```

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#22068
2025-01-20 16:55:17 +02:00
Raphael S. Carvalho
c973254362 Introduce incremental compaction strategy (ICS)
ICS is a compaction strategy that inherits size tiered properties --
therefore it's write optimized too -- but fixes its space overhead of
100% due to input files being only released on completion. That's
achieved with the concept of sstable run (similar in concept to LCS
levels) which breaks a large sstable into fixed-size chunks (1G by
default), known as run fragments. ICS picks similar-sized runs
for compaction, and fragments of those runs can be released
incrementally as they're compacted, reducing the space overhead
to about (number_of_input_runs * 1G). This allows users to increase
storage density of nodes (from 50% to ~80%), reducing the cost of
ownership.
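
The space-overhead arithmetic above can be sketched as follows. This is an illustrative calculation only, not code from the patch: the function names are hypothetical, and the model simply restates the two claims in the message (size-tiered needs up to 100% of the input size until completion, ICS needs about one fragment per input run).

```python
GIB = 1 << 30  # default fragment size quoted in the message (1G)


def stcs_temporary_overhead(input_bytes: int) -> int:
    """Size-tiered: inputs are released only on completion, so the
    temporary overhead can reach the full size of the inputs."""
    return input_bytes


def ics_temporary_overhead(num_input_runs: int, fragment_bytes: int = GIB) -> int:
    """Incremental: roughly one fragment per input run is held at a time."""
    return num_input_runs * fragment_bytes


if __name__ == "__main__":
    runs = 4
    inputs = runs * 100 * GIB  # four runs of 100 GiB each (made-up numbers)
    print("STCS overhead (GiB):", stcs_temporary_overhead(inputs) // GIB)
    print("ICS overhead (GiB):", ics_temporary_overhead(runs) // GIB)
```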

NOTE: test_system_schema_version_is_stable adjusted to account for batchlog
using IncrementalCompactionStrategy

contains:

compaction/: added incremental_compaction_strategy.cc (.hh), incremental_backlog_tracker.cc (.hh)
compaction/CMakeLists.txt: include ICS cc files
configure.py: changes for ICS files, includes test
db/legacy_schema_migrator.cc / db/schema_tables.cc: fallback to ICS when strategy is not supported
db/system_keyspace: pick ICS for some system tables
schema/schema.hh: ICS becomes default
test/boost: Add incremental_compaction_test.cc
test/boost/sstable_compaction_test.cc: ICS related changes
test/cqlpy/test_compaction_strategy_validation.py: ICS related changes

docs/architecture/compaction/compaction-strategies.rst: changes to ICS section
docs/cql/compaction.rst: changes to ICS section
docs/cql/ddl.rst: adds reference to ICS options
docs/getting-started/system-requirements.rst: updates sentence mentioning ICS
docs/kb/compaction.rst: changes to ICS section
docs/kb/garbage-collection-ics.rst: add file
docs/kb/index.rst: add reference to <garbage-collection-ics>
docs/operating-scylla/procedures/tips/production-readiness.rst: add ICS section

some relevant commits throughout the ICS history:

commit 434b97699b39c570d0d849d372bf64f418e5c692
Merge: 105586f747 30250749b8
Author: Paweł Dziepak <pdziepak@scylladb.com>
Date:   Tue Mar 12 12:14:23 2019 +0000

    Merge "Introduce Incremental Compaction Strategy (ICS)" from Raphael

    "
    Introduce new compaction strategy which is essentially like size tiered
    but will work with the existing incremental compaction. Thus incremental
    compaction strategy.

    It works like size tiered, but each element composing a tier is a sstable
    run, meaning that the compaction strategy will look for N similar-sized
    sstable runs to compact, not just individual sstables.

    Parameters:
    * "sstable_size_in_mb": defines the maximum sstable (fragment) size
    composing a sstable run, which directly impacts the disk space requirement
    that incremental compaction improves.
    The lower the value, the lower the space requirement for compaction, because
    the fragments involved will be released more frequently.
    * all others available in size tiered compaction strategy

    HOWTO
    =====

    To change an existing table to use it, do:
         ALTER TABLE mykeyspace.mytable WITH compaction = {'class' : 'IncrementalCompactionStrategy'};

    Set fragment size:
         ALTER TABLE mykeyspace.mytable WITH compaction = {'class' : 'IncrementalCompactionStrategy', 'sstable_size_in_mb' : 1000 };

    "

commit 94ef3cd29a196bedbbeb8707e20fe78a197f30a1
Merge: dca89ce7a5 e08ef3e1a3
Author: Avi Kivity <avi@scylladb.com>
Date:   Tue Sep 8 11:31:52 2020 +0300

    Merge "Add feature to limit space amplification in Incremental Compaction" from Raphael

    "
    A new option, space_amplification_goal (SAG), is being added to ICS. This option
    will allow ICS users to set a goal on the space amplification (SA). It's not
    supposed to be an upper bound on the space amplification, but rather, a goal.
    This new option will be disabled by default as it doesn't benefit write-only
    (no overwrites) workloads and could severely hurt write performance.
    The strategy is free to delay triggering this new behavior, in order to
    increase overall compaction efficiency.

    The graph below shows how this feature works in practice for different values
    of space_amplification_goal:
    https://user-images.githubusercontent.com/1409139/89347544-60b7b980-d681-11ea-87ab-e2fdc3ecb9f0.png

    When the strategy finds that space amplification has crossed
    space_amplification_goal, it will work on reducing the SA by doing a
    cross-tier compaction on the two largest tiers. This feature works only on
    the two largest tiers, because taking others into account could hurt the
    compaction efficiency, which is based on the fact that the more similar-sized
    sstables are compacted together, the higher the compaction efficiency will be.

    With SAG enabled, min_threshold only plays an important role on the smallest
    tiers, given that the second-largest tier could be compacted into the largest
    tier for a space_amplification_goal value < 2.
    By making the options space_amplification_goal and min_threshold independent,
    users will be able to tune write amplification and space amplification based on
    their needs. The lower the space_amplification_goal the higher the write
    amplification, but by increasing the min threshold, the write amplification
    can be decreased to a desired amount.
    "
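
The SAG trigger described in the quoted message can be sketched roughly as below. The message does not say how space amplification is measured, so the formula here (total bytes divided by the bytes in the largest tier) is an assumption made for illustration, and the function names are hypothetical.

```python
def space_amplification(tier_sizes):
    # Assumption (not from the source): approximate SA as total bytes
    # divided by the bytes held in the largest tier.
    return sum(tier_sizes) / max(tier_sizes)


def tiers_for_sag_compaction(tier_sizes, goal):
    """Return indices of the two largest tiers if SA crossed the goal,
    or an empty list when nothing should be done."""
    if goal is None or len(tier_sizes) < 2:
        return []  # option disabled by default, or nothing to merge
    if space_amplification(tier_sizes) <= goal:
        return []
    by_size = sorted(range(len(tier_sizes)),
                     key=lambda i: tier_sizes[i], reverse=True)
    return sorted(by_size[:2])


if __name__ == "__main__":
    tiers = [10, 40, 100]  # made-up tier sizes
    print(tiers_for_sag_compaction(tiers, goal=1.25))
    print(tiers_for_sag_compaction(tiers, goal=2.0))
```

Only the two largest tiers are ever returned, mirroring the message's point that involving smaller tiers would hurt compaction efficiency.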

commit 7d90911c5fb3fa891ad64a62147c3a6ca26d61b1
Author: Raphael S. Carvalho <raphaelsc@scylladb.com>
Date:   Sat Oct 16 13:41:46 2021 -0300

    compaction: ICS: Add garbage collection

    Today, ICS lacks an approach to purge expired tombstones in a timely manner,
    which is a problem because accumulation of tombstones is known to affect
    latency considerably.

    For an expired tombstone to be purged, it has to reach the top of the LSM tree
    and hope that older overlapping data wasn't introduced at the bottom.
    These conditions are there and must be satisfied to avoid data resurrection.

    STCS, today, has an inefficient garbage collection approach because it only
    picks a single sstable, which satisfies the tombstone density threshold and
    file staleness. That's a problem because overlapping data either on same tier
    or smaller tiers will prevent tombstones from being purged. Also, nothing is
    done to push the tombstones to the top of the tree, for the conditions to be
    eventually satisfied.

    Due to incremental compaction, ICS can more easily have an efficient GC by
    doing cross-tier compaction of relevant tiers.

    The trigger will be file staleness and tombstone density, whose threshold
    values can be configured by tombstone_compaction_interval and
    tombstone_threshold, respectively.

    If ICS finds a tier which meets both conditions, then that tier and the
    larger[1] *and* closest-in-size[2] tier will be compacted together.
    [1]: A larger tier is picked because we want tombstones to eventually reach the
    top of the tree.
    [2]: It also has to be the closest-in-size tier as the smaller the size
    difference the higher the efficiency of the compaction. We want to minimize
    write amplification as much as possible.
    The staleness condition is there to prevent the same file from being picked
    over and over again in a short interval.

    With this approach, ICS will be continuously working to purge garbage while
    not hurting overall efficiency on a steady state, as same-tier compactions are
    prioritized.
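
A rough sketch of the tier selection rule described in this commit message, under stated assumptions: the `Tier` record, its fields, and `pick_gc_tiers` are hypothetical names, and each tier is reduced to a size, a tombstone ratio, and an age. Only `tombstone_threshold` and `tombstone_compaction_interval` come from the message.

```python
from dataclasses import dataclass


@dataclass
class Tier:
    size: int               # total bytes in the tier
    tombstone_ratio: float  # estimated fraction of droppable tombstones
    age_seconds: float      # time since the tier's files were written


def pick_gc_tiers(tiers, tombstone_threshold, tombstone_compaction_interval):
    """Return (stale_tier, partner) or None.

    The partner is the larger tier that is closest in size, so tombstones
    move toward the top of the tree at the lowest write amplification.
    """
    for tier in tiers:
        if tier.tombstone_ratio < tombstone_threshold:
            continue  # density condition not met
        if tier.age_seconds < tombstone_compaction_interval:
            continue  # staleness condition not met
        larger = [t for t in tiers if t.size > tier.size]
        if not larger:
            continue  # already the largest tier, no cross-tier partner
        partner = min(larger, key=lambda t: t.size - tier.size)
        return tier, partner
    return None
```

For example, with tiers of size 10, 50 and 200 where only the smallest one is both stale and tombstone-dense, the sketch pairs the size-10 tier with the size-50 tier rather than the size-200 one.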

    Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
    Message-Id: <20211016164146.38010-1-raphaelsc@scylladb.com>

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#22063
2025-01-04 15:43:52 +02:00
Avi Kivity
f3eade2f62 treewide: relicense to ScyllaDB-Source-Available-1.0
Drop the AGPL license in favor of a source-available license.
See the blog post [1] for details.

[1] https://www.scylladb.com/2024/12/18/why-were-moving-to-a-source-available-license/
2024-12-18 17:45:13 +02:00
Kefu Chai
e87b64b7bb compaction: not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-07-02 14:06:42 +08:00
Raphael S. Carvalho
0ce8ee03f1 compaction: wire storage free space into reshape procedure
After this, TWCS reshape procedure can be changed to limit job
to 10% of available space.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-06-13 12:53:27 -03:00
Ferenc Szili
b50a9f9bab removed forward declaration of resharding_descriptor
resharding_descriptor has been removed in e40aa042 in 2020
2024-03-22 11:35:10 +01:00
Raphael S. Carvalho
b551f4abd2 streaming: Improve partition estimation with TWCS
When off-strategy is disabled, data segregation is not postponed,
meaning that getting partition estimate right is important to
decrease the filter's false positives. With streaming, we don't
have min and max timestamps at the destination. We could have
extended the RPC verb to send them, but it turns out we can easily
deduce the number of windows using the default TTL. Given the partitioner's
random nature, it's not absurd to assume that a given range being
streamed may overlap with all windows, meaning that each range
will yield one sstable for each window when segregating incoming
data. Today, we assume the worst of 100 windows (which is the
max amount of sstables the input data can be segregated into)
due to the lack of metadata for estimating the window count.
But given that users are recommended to target a max of ~20
windows, it means partition estimate is being downsized 5x more
than needed. Let's improve it by using the default TTL when
estimating window count, so even in the absence of timestamp
metadata, the partition estimation won't be way off.
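
The estimation idea can be sketched as follows. This is an assumed formula for illustration, not the code from the patch: the window count is taken as the default TTL divided by the window size, capped at the 100-window worst case, with a fallback to that worst case when no TTL is set. All function and parameter names are hypothetical.

```python
import math

MAX_WINDOWS = 100  # worst case quoted in the commit message


def estimate_window_count(default_ttl_s: int, window_size_s: int) -> int:
    """Deduce how many time windows live data can span from the default TTL."""
    if default_ttl_s <= 0 or window_size_s <= 0:
        return MAX_WINDOWS  # no usable metadata: assume the worst case
    return min(MAX_WINDOWS, max(1, math.ceil(default_ttl_s / window_size_s)))


def partitions_per_sstable(total_partitions: int, window_count: int) -> int:
    """Each streamed range is assumed to yield one sstable per window."""
    return math.ceil(total_partitions / window_count)


if __name__ == "__main__":
    DAY = 86400
    windows = estimate_window_count(default_ttl_s=20 * DAY, window_size_s=DAY)
    print(windows, partitions_per_sstable(1_000_000, windows))
    print(MAX_WINDOWS, partitions_per_sstable(1_000_000, MAX_WINDOWS))
```

With about 20 windows instead of the assumed 100, the per-sstable estimate is 5x larger, which matches the downsizing factor mentioned in the message.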

Fixes #15704.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-11-08 12:10:03 +02:00
Raphael S. Carvalho
8997fe0625 compaction: Switch to strategy_control::candidates() for regular compaction
Now everything is prepared for the switch, let's do it.

Now let's wait for ICS to enjoy the set of changes.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-09-25 17:18:21 -03:00
Raphael S. Carvalho
d6029a195e Remove DateTieredCompactionStrategy
This is the last step of deprecation dance of DTCS.

In Scylla 5.1, users were warned that DTCS was deprecated.

In 5.2, altering or creation of tables with DTCS was forbidden.

5.3 branch was already created, so this is targeting 5.4.

Users that refused to move away from DTCS will have Scylla
falling back to the default strategy, either STCS or ICS.

See:
WARN  2023-07-14 09:49:11,857 [shard 0] schema_tables - Falling back to size-tiered compaction strategy after the problem: Unable to find compaction strategy class 'DateTieredCompactionStrategy

Then user can later switch to a supported strategy with
alter table.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes #14559
2023-07-14 16:20:48 +03:00
Pavel Emelyanov
66e43912d6 code: Switch to seastar API level 7
In that level no io_priority_class-es exist. Instead, all the IO happens
in the context of current sched-group. File API no longer accepts prio
class argument (and makes io_intent arg mandatory to impls).

So the change consists of
- removing all usage of io_priority_class
- patching file_impl's inheritors to updated API
- priority manager goes away altogether
- IO bandwidth update is performed on respective sched group
- tune-up scylla-gdb.py io_queues command

The first change is huge and was made semi-automatically by:
- grep io_priority_class | default_priority_class
- remove all calls, found methods' args and class' fields

Patching file_impl-s is smaller, but also mechanical:
- replace io_priority_class& argument with io_intent* one
- pass intent to lower file (if applicable)

Dropping the priority manager is:
- git-rm .cc and .hh
- sed out all the #include-s
- fix configure.py and cmakefile

The scylla-gdb.py update is a bit hairy -- it needs to use the task queues
list for IO class names and shares, but to detect whether it should, it
checks whether the "commitlog" group is present.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #13963
2023-06-06 13:29:16 +03:00
Raphael S. Carvalho
1ffe2f04ef compaction: add table_state param to compaction_strategy::notify_completion()
Once compaction_strategy is made stateless, the state must be retrieved
in notify_completion() through table_state.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-03-27 13:40:02 -03:00
Raphael S. Carvalho
232e71f2cf compaction: add const-qualifier to a few compaction_strategy methods
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-03-27 11:13:10 -03:00
Avi Kivity
69a385fd9d Introduce schema/ module
Schema related files are moved there. This excludes schema files that
also interact with mutations, because the mutation module depends on
the schema. Those files will have to go into a separate module.

Closes #12858
2023-02-15 11:01:50 +02:00
Raphael S. Carvalho
b88acffd66 replica: Allow one compaction_backlog_tracker for each compaction_group
Today, compaction_backlog_tracker is managed in each compaction_strategy
implementation. So every compaction strategy is managing its own
tracker and providing a reference to it through get_backlog_tracker().

But this prevents each group from having its own tracker, because
there's only a single compaction_strategy instance per table.
To remove this limitation, compaction_strategy impl will no longer
manage trackers but will instead provide an interface for trackers
to be created, such that each compaction group will be allowed to
have its own tracker, which will be managed by compaction manager.

On compaction strategy change, table will update each group with
the new tracker, which is created using the previously introduced
compaction_group_sstable_set_updater.

Now table's backlog will be the sum of all compaction_group backlogs.
The normalization factor is applied on the sum, so we don't have
to adjust each individual backlog to any factor.
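
The point about the normalization factor can be illustrated with a small sketch. It assumes the factor is a simple multiplier, which the message does not spell out, and the function name is hypothetical. Under that assumption, scaling the sum gives the same result as scaling each group's backlog individually, which is why no per-group adjustment is needed.

```python
def table_backlog(group_backlogs, normalization_factor):
    """Table backlog = sum of per-compaction-group backlogs, normalized once.

    Assumption: the normalization factor is applied as a multiplier.
    """
    return sum(group_backlogs) * normalization_factor


if __name__ == "__main__":
    groups = [1.0, 2.0, 4.0]  # made-up per-group backlogs
    factor = 0.5
    per_group = sum(b * factor for b in groups)
    print(table_backlog(groups, factor), per_group)
```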

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-11-11 09:22:51 -03:00
Benny Halevy
78d6f6a519 compaction: sanitize headers from flat_mutation_reader v1
flat_mutation_reader make_scrubbing_reader no longer exists
and there is no need to include flat_mutation_reader.hh
nor forward declare the class.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-04-28 17:23:04 +03:00
Raphael S. Carvalho
2a9bfa3e3f compaction_strategy: get_cleanup_compaction_jobs: accept candidates by value
Then caller can decide whether to copy or move candidate set into the
function. cleanup_sstables_compaction_task can move candidates as
it's no longer needed once it retrieves all descriptors.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-03-29 09:49:13 -03:00
Raphael S. Carvalho
44e9e10414 compaction_strategy: Allow strategies to define their own cleanup strategy
Today, all compaction strategies will clean up their files using the
incremental approach of one sstable being rewritten at a time.

Turns out that's not the best approach performance wise. Let's take
STCS for example. As cleanup finishes rewriting one file, the output
file is placed into the sstable set. Regular compaction can now compact that
file with another that was already there (e.g. produced by flush after
cleanup started). Inefficient compactions like this can keep happening
as cleanup incrementally places output file into the candidate list
for regular.

This method will allow strategies to clean up their files in batches.
For example, STCS can clean up all files in smallest tiers in single
round, allowing the output data to be added at once. So next compaction
rounds can be more efficient in terms of writeamp. Another benefit is
that deduplication and GC can happen more efficiently.

The drawback is the space requirement, as we no longer compact one file
at a time. However, the impact is minimized by cleaning up the smallest
tier first. With leveled strategy for example, even though 90% of data
is in highest level, the space requirement is not a problem because
we can apply the incremental compaction on its behalf. The same applies
to ICS. With STCS, the requirement is the size of the tier being
compacted, but that's already expected by its users anyway.

For the time being, all strategies have it unimplemented, so they still
use the old behavior where files are rewritten one at a time.
This will allow us to incrementally implement the cleanup method for
all compaction strategies.

Refs #10097.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-03-23 00:04:03 -03:00
Mikołaj Sielużycki
1d84a254c0 flat_mutation_reader: Split readers by file and remove unnecessary includes.
The flat_mutation_reader files were conflated and contained multiple
readers, which was not strictly necessary. Splitting improves
iterative compilation times, as touching rarely used readers doesn't
recompile large chunks of the codebase. Total compilation times are also
improved, as the sizes of flat_mutation_reader.hh and
flat_mutation_reader_v2.hh have been reduced and those files are
included by many files in the codebase.

With changes

real	29m14.051s
user	168m39.071s
sys	5m13.443s

Without changes

real	30m36.203s
user	175m43.354s
sys	5m26.376s

Closes #10194
2022-03-14 13:20:25 +02:00
Avi Kivity
fcb8d040e8 treewide: use Software Package Data Exchange (SPDX) license identifiers
Instead of lengthy blurbs, switch to single-line, machine-readable
standardized (https://spdx.dev) license identifiers. The Linux kernel
switched long ago, so there is strong precedent.

Three cases are handled: AGPL-only, Apache-only, and dual licensed.
For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0),
reasoning that our changes are extensive enough to apply our license.

The changes were applied mechanically with a script, except to
licenses/README.md.

Closes #9937
2022-01-18 12:15:18 +01:00
Botond Dénes
1ba19c2aa4 compaction/compaction_strategy: convert make_interposer_consumer() to v2
The underlying timestamp-based splitter is v2 already.
2022-01-07 13:51:59 +02:00
Raphael S. Carvalho
49f40c8791 compaction: Implement strategy control and wire it
This implements the strategy control interface for both manager and
tests, and wires it.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-12-13 16:05:23 -03:00
Raphael S. Carvalho
2f9f089eda compaction_strategy: kill unused compaction_strategy_type::major
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-12-03 12:27:10 -03:00
Raphael S. Carvalho
9725e5efa9 compaction_strategy: kill unused can_compact_partial_runs()
This strategy method was introduced unnecessarily. We assumed it was
going to be needed, but it turns out it was never needed, not even
for ICS. Also, it's built on a wrong assumption, as an output
sstable run being generated can never be compacted in parallel
because the non-overlapping requirement can be easily broken.
LCS, for example, can allow parallel compaction on different runs
(levels), but correctness cannot be guaranteed when the same runs
are compacted in parallel.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-12-03 12:20:51 -03:00
Raphael S. Carvalho
bb5a8682f3 compaction: stop including database.hh for compaction_strategy
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-11-09 11:29:47 -03:00
Raphael S. Carvalho
e2f6a47999 compaction: switch to table_state in estimated_pending_compactions()
Last method in compaction_strategy using table. From now on,
compaction strategy no longer works directly with table.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-11-09 11:25:28 -03:00
Raphael S. Carvalho
93ae9225f7 compaction: switch to table_state in compaction_strategy::get_major_compaction_job()
From now on, get_major_compaction_job() will use table_state instead of
a plain reference to table.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-11-09 11:25:22 -03:00
Raphael S. Carvalho
d881310b52 compaction: switch to table_state in compaction_strategy::get_sstables_for_compaction()
From now on, get_sstables_for_compaction() will use table_state.
With table_state, we avoid layer violations like strategy using
manager, and it also makes testing easier.

Compaction unit tests were temporarily disabled to avoid a giant
commit which is hard to parse.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-11-09 10:52:14 -03:00
Asias He
6350a19f73 compaction: Move compaction_strategy.hh to compaction dir
The top dir is a mess. Move compaction_strategy.hh and
compaction_strategy_type.hh to the new home.
2021-08-07 08:06:37 +08:00