Commit Graph

435 Commits

Author SHA1 Message Date
Taras Borodin
c155ae1182 add utf8:validate to operator<< partition_key with_schema. 2022-09-22 16:42:31 +03:00
Botond Dénes
05ef13a627 Merge 'Add support to split large partitions across SSTables' from Raphael "Raph" Carvalho
Introduces support to split large partitions during compaction. Today, compaction can only split input data at partition boundary, so a large partition is stored in a single file. But that can cause many problems, like memory pressure (e.g.: https://github.com/scylladb/scylladb/issues/4217), and incremental compaction can also not fulfill its promise as the file storing the large partition can only be released once exhausted.

The first step was to add clustering range metadata for first and last partition keys (retrieved from promoted index), which is crucial to determine disjointness at clustering level, and also the order at which the disjoint files should be opened for incremental reading.

The second step was to extend sstable_run to look at clustering dimension, so a set of files storing disjoint ranges for the same partition can live in the same sstable run.

The final step was to introduce the option for compaction to split large partition being written if it has exceeded the size threshold.

What's next? Following this series, a reader will be implemented for sstable_run that will incrementally open the readers. It can be safely built on the assumption of the disjoint invariant after the second step aforementioned.

Closes #11233

* github.com:scylladb/scylladb:
  test: Add test for large partition splitting on compaction
  compaction: Add support to split large partitions
  sstable: Extend sstable_run to allow disjointness on the clustering level
  sstables: simplify will_introduce_overlapping()
  test: move sstable_run_disjoint_invariant_test into sstable_datafile_test
  test: lib: Fix inefficient merging of mutations in make_sstable_containing()
  sstables: Keep track of first partition's first pos and last partition's last pos
  sstables: Rename min/max position_range to a descriptive name
  sstables_manager: Add sstable metadata reader concurrency semaphore
  sstables: Add ability to find first or last position in a partition
2022-09-15 16:08:56 +03:00
Raphael S. Carvalho
a04047f390 compaction: Properly handle stop request for off-strategy
If user stops off-strategy via API, compaction manager can decide
to give up on it completely, so data will sit unreshaped in
maintenance set, preventing it from being compacted with data
in the main set. That's problematic because it will probably lead
to a significant increase in read and space amplification until
off-strategy is triggered again, which cannot happen anytime
soon.

Let's handle it by moving data in maintenance set into main one,
even if unreshaped. Then regular compaction will be able to
continue from where off-strategy left off.

Fixes #11543.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes #11545
2022-09-15 09:21:22 +03:00
Raphael S. Carvalho
e2ccafbe38 compaction: Add support to split large partitions
Adds support for splitting large partitions during compaction.

Large partitions introduce many problems, like memory overhead and
breaks incremental compaction promise. We want to split large
partitions across fixed-size fragments. We'll allow a partition
to exceed size limit by 10%, as we don't want to unnecessarily split
partitions that just crossed the limit boundary.

To avoid having to open a minimal of 2 fragments in a read, partition
tombstone will be replicated to every fragment storing the
partition.

The splitting isn't enabled by default, and can be used by
strategies that are run aware like ICS. LCS still cannot support
it as it's still using physical level metadata, not run id.

An incremental reader for sstable runs will follow soon.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-09-14 13:23:16 -03:00
Nadav Har'El
8ece63c433 Merge 'Safemode - Introduce TimeWindowCompactionStrategy Guardrails'
This series introduces two configurable options when working with TWCS tables:

- `restrict_twcs_default_ttl` - a LiveUpdate-able tri_mode_restriction which defaults to WARN and will notify the user whenever a TWCS table is created without a `default_time_to_live` setting
- `twcs_max_window_count` - Which forbids the user from creating TWCS tables whose window count (buckets) are past a certain threshold. We default to 50, which should be enough for most use cases, and a setting of 0 effectively disables the check.

Refs: #6923
Fixes: #9029

Closes #11445

* github.com:scylladb/scylladb:
  tests: cql_query_test: add mixed tests for verifying TWCS guard rails
  tests: cql_query_test: add test for TWCS window size
  tests: cql_query_test: add test for TWCS tables with no TTL defined
  cql: add configurable restriction of default_time_to_live when for TimeWindowCompactionStrategy tables
  cql: add max window restriction for TimeWindowCompactionStrategy
  time_window_compaction_strategy: reject invalid window_sizes
  cql3 - create/alter_table_statement: Make check_restricted_table_properties accept a schema_ptr
2022-09-12 23:55:51 +03:00
Felipe Mendes
f1ffb501f0 time_window_compaction_strategy: reject invalid window_sizes
Scylla mistakenly allows an user to configure an invalid TWCS window_size <= 0, which effectively breaks the notion of compaction windows.
Interestingly enough, a <= 0 window size should be considered an undefined behavior as either we would create a new window every 0 duration (?) or the table would behave as STCS, the reader is encouraged to figure out which one of these is true. :-)

Cassandra, on the other hand, will properly throw a ConfigurationException when receiving such invalid window sizes and we now match the behavior to the same as Cassandra's.

Refs: #2336
2022-09-11 16:40:03 -03:00
Raphael S. Carvalho
44913ebbd0 compaction_manager: restore indentation
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-09-11 14:26:59 -03:00
Raphael S. Carvalho
888660fa44 compaction_manager: Make remove() and stop_ongoing_compactions() noexcept
stop_ongoing_compactions() is made noexcept too as it's called from
remove() and we want to make the latter noexcept, to allow compaction
group to qualify its stop function as noexcept too.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-09-11 14:26:59 -03:00
Benny Halevy
e9cfe9e572 tombstone_gc: deglobalize repair_history_maps
Move the thread-local instances of the
per-table repair history maps into compaction_manager.

Fixes #11208

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-09-07 07:43:15 +03:00
Benny Halevy
d86810d22c mutation_partition: compact_for_compaction_v2: get tombstone_gc_state
To be passed down to compact_mutation_state in a following patch.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-09-07 07:43:15 +03:00
Benny Halevy
7e4612d3aa mutation_readers: pass tombstone_gc_state to compating_reader
To be passed further done to `compact_mutation_state` in
a following patch.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-09-07 07:43:14 +03:00
Benny Halevy
572d534d0d sstables: get_gc_before_*: get tombstone_gc_state from caller
Pass the tombstone_gc_state from the compaction_strategy
to sstables get_gc_before_* functions using the table state
to get to the tombstone_gc_state.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-09-06 23:05:39 +03:00
Benny Halevy
2cd3fc2f36 compaction: table_state: add virtual get_tombstone_gc_state method
and override it in table::table_state to get the tombstone_gc_state
from the table's compaction_manager.

It is going to be used in the next patched to pass the gc state
from the compaction_strategy down to sstables and compaction.

table_state_for_test was modified to just keep a null
tombstone_gc_state.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-09-06 23:05:39 +03:00
Benny Halevy
8b841e1207 compaction_manager: add tombstone_gc_state
Add a tombstone_gc_state member and methods to get it.

Currently the tombstone_gc_state is default constructed,
but a following patch will move the thread-local
repair history maps into the compaction_manager as a member
and then the _tombstone_gc_state member will be initialized
from that member.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-09-06 23:02:54 +03:00
Botond Dénes
b89b84ad3c compaction: scrub/abort: be more verbose
Currently abort-mode scrub exits with a message which basically says
"some problem was found", with no details on what problem it found. Add
a detailed error report on the found problem before aborting the scrub.

Closes #11418
2022-09-06 11:42:34 +03:00
Benny Halevy
7747b8fa33 sstables: define run_identifier as a strong tagged_uuid type
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes #11321
2022-08-18 19:03:10 +03:00
Benny Halevy
14faa3b6f4 compaction_manager: perform_cleanup, perform_sstable_upgrade: use a lw_shared_ptr for owned token ranges
And completely get rid of the dependency on replica::database.

Also, add respective rest_api tests.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-02 08:08:11 +03:00
Benny Halevy
e1fe598760 compaction: cleanup, upgrade: use a lw_shared_ptr for owned token ranges
Currently they are copied for the get_sstables function
so this change reduces copies.

Also, it will allow further decoupling of compaction_manager
from replica::database, by letting the caller of
perform_cleanup and perform_sstable_upgrade get the
owned token ranges from db and pass it to the perform_*
functions in the following patch.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-02 07:57:41 +03:00
Benny Halevy
e4e92d44ae main: start compaction_manager as a sharded service
And pass a reference to it to the database rather
than having the database construct its own compaction_manager.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-02 07:50:15 +03:00
Benny Halevy
7f70949693 compaction_manager: keep config as member
Rather than keeping separate, duplicated members.

And define helpers to get those members.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-02 07:48:01 +03:00
Benny Halevy
450ecd60c6 backlog_controller: scheduling_group: define default member initializers
To prepare for the next patch, implement default initialization
of the scheduling_group and io_priority_class, to the default values.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-02 07:38:40 +03:00
Aleksandra Martyniuk
6ea5bc96d7 scrub compaction: return status indicating aborted operations
over the rest api

Performing compaction scrub user did not know whether an operation
was aborted.

If compaction scrub is aborted, return status the user gets over
rest api is set to 1.
2022-07-29 09:35:20 +02:00
Aleksandra Martyniuk
f1980f8dc6 scrub compaction: count validation errors and return status over the rest api
Performing compaction scrub user did not know whether any validation
errors were encountered.

The number of validation errors per given compaction scrub is gathered
and summed from each shard. Basing on that value return status over
the rest api is set to 3 if any validation errors were encountered.
2022-07-29 09:35:20 +02:00
Aleksandra Martyniuk
7d457cffb8 scrub compaction: count validation errors for specific scrub task
The number of validation errors per given compaction scrub on given
shard is passed up to perform_task() function.
2022-07-29 09:35:20 +02:00
Aleksandra Martyniuk
3a805a9d9b compaction: extract statistics in compaction_result
Statistics from compaction_result are extracted to new struct
compaction_stats and stored as a field of compaction_result.
2022-07-29 09:35:20 +02:00
Aleksandra Martyniuk
a80c187b20 scrub compaction: register validation errors in metrics
The number of validation errors is registered in metrics. Metrics
provide common counters for all scrub operation within a compaction
manager, though. Thus, to check the exact number of validation errors,
the comparison of counters before and after scrub operation needs
to be done.
2022-07-29 09:35:20 +02:00
Aleksandra Martyniuk
ab85dab05d scrub compaction: count validation errors
The number of validation errors encountered during scrub compaction
is counted.
2022-07-29 09:35:20 +02:00
Benny Halevy
f26e655646 compaction_manager: add maybe_wait_for_sstable_count_reduction
Called from try_flush_memtable_to_sstable,
maybe_wait_for_sstable_count_reduction will wait for
compaction to catch up with memtable flush if there
the bucket to compact is inflated, having too many
sstables.  In that case we don't want to add fuel
to the fire by creating yet another sstable.

Fixes #4116

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-07-28 14:43:30 +03:00
Benny Halevy
69d4a16908 time_window_compaction_strategy: get_sstables_for_compaction: clean up code
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-07-28 14:22:03 +03:00
Benny Halevy
c450f3ee11 time_window_compaction_strategy: make get_sstables_for_compaction idempotent
To make sure fully_expired sstables are not missed
if get_sstables_for_compaction is called just heuristically,
change the state by setting _last_expired_check
to the current time only when no fully_expired_sstables are found
among the candidates.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-07-28 14:22:03 +03:00
Benny Halevy
3d07882431 time_window_compaction_strategy: get_sstables_for_compaction: improve debug messages
Print the compaction_strategy `this` pointer
so we can distinguish between different instance of the
compaction_strategy object (some code paths copy it and
some may instantiate a branch new compaction_strategy object).

The motivation is detecting when the side effects of this function are
applied on the "master" instance, stored in the table shard.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-07-28 14:22:03 +03:00
Benny Halevy
a149022ed4 leveled_manifest: pass compaction_counter as const&
It is not modified by the leveld_manifest functions.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-07-28 14:22:03 +03:00
Igor Ribeiro Barbosa Duarte
8dd0f4672d compaction: Make compaction_static_shares liveupdateable
This patch makes compaction_static_shares liveupdateable
to avoid having to restart the cluster after updating
this config.

Signed-off-by: Igor Ribeiro Barbosa Duarte <igor.duarte@scylladb.com>
2022-07-19 10:10:46 -03:00
Igor Ribeiro Barbosa Duarte
c2ee6492e6 backlog_controller: Unify backlog_controller constructors
This patch adds the _static_shares variable to the backlog_controller so that
instead of having to use a separate constructor when controller is disabled,
we can use a single constructor and periodically check on the adjust method
if we should use the static shares or the controller. This will be useful on
the next patches to make compaction_static_shares and memtable_flush_static_shares
live updateable.

Signed-off-by: Igor Ribeiro Barbosa Duarte <igor.duarte@scylladb.com>
2022-07-19 10:06:12 -03:00
Raphael S. Carvalho
246e945086 compaction: remove forward declaration of replica::table
compaction_manager.cc still cannot stop including replica/database.hh
because upgrade and scrub still take replica::database as param,
but I'll remove it soon in another series.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-07-16 21:35:06 -03:00
Raphael S. Carvalho
a94d974835 compaction_manager: make add() and remove() switch to table_state
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-07-16 21:35:06 -03:00
Raphael S. Carvalho
31655acb5e compaction_manager: make run_custom_job() switch to table_state
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-07-16 21:35:06 -03:00
Raphael S. Carvalho
9a1efc69d0 compaction_manager: major: switch to table_state
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-07-16 21:35:06 -03:00
Raphael S. Carvalho
cebe6e22cb compaction_manager: scrub: switch to table_state
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-07-16 21:35:06 -03:00
Raphael S. Carvalho
d29f7070d9 compaction_manager: upgrade: switch to table_state
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-07-16 21:35:06 -03:00
Raphael S. Carvalho
c2678ca661 compaction: table_state: add get_sstables_manager()
That will be needed for retrieving sstable manager in
perform_sstable_upgrade().

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-07-16 21:35:06 -03:00
Raphael S. Carvalho
bdd049afd6 compaction_manager: cleanup: switch to table_state
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-07-16 21:35:06 -03:00
Raphael S. Carvalho
f547e0f2fb compaction_manager: offstrategy: switch to table_state()
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-07-16 21:35:06 -03:00
Raphael S. Carvalho
538d412fba compaction_manager: rewrite_sstables(): switch to table_state
rewrite_sstables() is used by maintenance compactions that perform
an operation on a single file at a time.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-07-16 21:35:06 -03:00
Raphael S. Carvalho
79e385057f compaction_manager: make run_with_compaction_disabled() switch to table_state
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-07-16 21:35:06 -03:00
Raphael S. Carvalho
79f91fe61e compaction_manager: compaction_reenabler: switch to table_state
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-07-16 21:35:06 -03:00
Raphael S. Carvalho
7c1d178f4e compaction_manager: make submit(T) switch to table_state
Now that submit() switched to table_state, compaction_reenabler
and friends can switch to table_state too.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-07-16 21:35:06 -03:00
Raphael S. Carvalho
a176022272 compaction_manager: task: switch to table_state
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-07-16 21:35:06 -03:00
Raphael S. Carvalho
43136a3ca7 compaction: table_state: Add is_auto_compaction_disabled_by_user()
auto_compaction_disabled_by_user is a configuration that can be enabled
or disabled on a particular table. We're adding this interface to
avoid having to push the configuration for every compaction_state,
which would result in redundant information as the configuration
value is the same for all table states.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-07-16 21:35:06 -03:00
Raphael S. Carvalho
1deeeff825 compaction: table_state: Add on_compaction_completion()
The idea is that we'll have a single on-completion interface for both
"in-strategy" and off-strategy compactions, so not to pollute table_state
with one interface for each.
replica::table::on_compaction_completion is being moved into private namespace.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-07-16 21:35:06 -03:00