3b424e391b introduced a loop
in `perform_cleanup` that waits until all sstables that require
cleanup are cleaned up.
However, with f1bbf705f9,
an sstable that is_eligible_for_compaction (i.e. it
is not in staging, awaiting view update generation),
may already be compacted by e.g. regular compaction.
And so perform_cleanup should interrupt that
by calling try_perform_cleanup, since the latter
reevaluates `update_sstable_cleanup_state` with
compaction disabled - that stops ongoing compactions.
Refs scylladb/scylladb#15673
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
The sstables replay_position in stats_metadata is
valid only on the originating node and shard.
Therefore, validate the originating host and shard
before using it in compaction or table truncate.
Fixes#10080
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closesscylladb/scylladb#16550
Make compaction tasks internal. Drop all internal tasks without parents
immediately after they are done.
Fixes: #16735
Refs: #16694.
Closesscylladb/scylladb#16698
* github.com:scylladb/scylladb:
compaction: make regular compaction tasks internal
tasks: don't keep internal root tasks after they complete
Regular compaction tasks are internal.
Adjust test_compaction_task accordingly: modify test_regular_compaction_task,
delete test_running_compaction_task_abort (relying on regular compaction)
which checks are already achived by test_not_created_compaction_task_abort.
Rename the latter.
If for some reason an exception is thrown in compaction_manager::remove,
it might leave behind stale table pointers in _compaction_state. Fix
that by setting up a deffered action to perform the cleanup.
Fixes#16635
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
Closesscylladb/scylladb#16632
The interface is fragile because the user may incorrectly use the
wrong "gc before". Given that sstable knows how to properly calculate
"gc before", let's do it in estimate__d__t__r(), leaving no room
for mistakes.
sstable_run's variant was also changed to conform to new interface,
allowing ICS to properly estimate droppable ratio, using GC before
that is calculated using each sstable's range. That's important for
upcoming tablets, as we want to query only the range that belongs
to a particular tablet in the repair history table.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closesscylladb/scylladb#15931
The task for splitting compaction will run until all sstables
in the main set are split. The only exceptions are shutdown
or user has explicitly asked for abort.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
reader_consumer_v2 being a noncopyable_function imposes a restriction
when stacking one interposer consumer on top of another.
Think for example of a token-based segregator on top of a timestamp
based one.
To achieve that, the interposer consumer creator must be reentrant,
such that the consumer can be created on each "channel", but today
the creator becomes unusable after first usage.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Currently, if a compaction function enters the table
or compaction_group async_gate, we can't stop it
on the table/compaction_group stop path as they co_await
their respective async_gate.close().
This series introduces a table_ptr smart pointer to guards
the table object by entering its async_gate, and
it also defers awaiting the gate.close future
till after stopping ongoing compaction so that
closing the gate will prevent starting new compactions
while ongoing compaction can be stopped and finally
awaiting the close() future will wait for them to
unwind and exit the gate after being stopped.
Fixes#16305Closesscylladb/scylladb#16351
* github.com:scylladb/scylladb:
compaction: run_on_table: skip compaction also on gate_closed_exception
compaction: run_on_table: hold table
table: add table_holder and hold method
table: stop: allow compactions to be stopped while closing async_gate
Similar to the no_such_column_family error,
gate_closed_exception indicates that the table
is stopped and we should skip compaction on it
gracefully.
Fixes#16305
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
As a preparation for asynchronous compaction api, from which we
cannot take values by reference, top level compaction tasks get
pointers which need to be set to nullptr when they are not needed
(like in async api).
run_on_existing_tables() is not used at all. and we have two of them.
in this change, let's drop them.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#16304
Fixes some typos as found by codespell run on the code.
In this commit, I was hoping to fix only comments, not user-visible alerts, output, etc.
Follow-up commits will take care of them.
Refs: https://github.com/scylladb/scylladb/issues/16255
Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>
For major compacting all tables in the database.
The advantage of this api is that `commitlog->force_new_active_segment`
happens only once in `database::flush_all_tables` rather than
once per keyspace (when `nodetool compact` translates to
a sequence of `/storage_service/keyspace_compaction` calls).
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Major compaction already flushes each table to make
sure it considers any mutations that are present in the
memtable for the purpose of tombstone purging.
See 64ec1c6ec6
However, tombstone purging may be inhibited by data
in commitlog segments based on `gc_time_min` in the
`tombstone_gc_state` (See f42eb4d1ce).
Flushing all sstables in the database release
all references to commitlog segments and there
it maximizes the potential for tombstone purging,
which is typically the reason for running major compaction.
However, flushing all tables too frequently might
result in tiny sstables. Since when flushing all
keyspaces using `nodetool flush` the `force_keyspace_compaction`
api is invoked for keyspace successively, we need a mechanism
to prevent too frequent flushes by major compaction.
Hence a `compaction_flush_all_tables_before_major_seconds` interval
configuration option is added (defaults to 24 hours).
In the case that not all tables are flushed prior
to major compaction, we revert to the old behavior of
flushing each table in the keyspace before major-compacting it.
Fixesscylladb/scylladb#15777
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
When flushing is done externally, e.g. by running
`nodetool flush` prior to `nodetool compact`,
flush_memtables=false can be passed to skip flushing
of tables right before they are major-compacted.
This is useful to prevent creation of small sstables
due to excessive memtable flushing.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Compaction tasks which do not have a parent are abortable
through task manager. Their children are aborted recursively.
Compaction tasks of the lowest level are aborted using existing
compaction task executors stopping mechanism.
Closesscylladb/scylladb#16177
* github.com:scylladb/scylladb:
test: test abort of compaction task that isn't started yet
test: test running compaction task abort
tasks: fail if a task was aborted
compaction: abort task manager compaction tasks
Set top level compaction tasks as abortable.
Compaction tasks which have no children, i.e. compaction task
executors, have abort method overriden to stop compaction data.
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define a formatter for
map<timestamp_type, vector<shared_sstable>>. since the operator<<
for this type is only used in the .cc file, and the only use case
of it is to provide the formatter for fmt, so the operator<< based
formatter is remove in this change.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#16163
This reverts commit 11cafd2fc8, reversing
changes made to 2bae14f743.
Reverting because this series causes frequent CI failures, and the
proposed quickfix causes other failures of its own.
Fixes: #16113
TWCS tables require partition estimation adjustment as incoming streaming data can be segregated into the time windows.
Turns out we had two problems in this area that leads to suboptimal bloom filters.
1) With off-strategy enabled, data segregation is postponed, but partition estimation was adjusted as if segregation wasn't postponed. Solved by not adjusting estimation if segregation is postponed.
2) With off-strategy disabled, data segregation is not postponed, but streaming didn't feed any metadata into partition estimation procedure, meaning it had to assume the max windows input data can be segregated into (100). Solved by using schema's default TTL for a precise estimation of window count.
For the future, we want to dynamically size filters (see https://github.com/scylladb/scylladb/issues/2024), especially for TWCS that might have SSTables that are left uncompacted until they're fully expired, meaning that the system won't heal itself in a timely manner through compaction on a SSTable that had partition estimation really wrong.
Fixes https://github.com/scylladb/scylladb/issues/15704.
Closesscylladb/scylladb#15938
* github.com:scylladb/scylladb:
streaming: Improve partition estimation with TWCS
streaming: Don't adjust partition estimate if segregation is postponed
Compaction tasks which do not have a parent are abortable
through task manager. Their children are aborted recursively.
Compaction tasks of the lowest level are aborted using existing
compaction task executors stopping mechanism.
Closesscylladb/scylladb#16050
* github.com:scylladb/scylladb:
test: test abort of compaction task that isn't started yet
test: test running compaction task abort
tasks: fail if a task was aborted
compaction: abort task manager compaction tasks
This is continuation of a34c8dc4 (Drop compaction_manager_for_testing).
There's one more wrapper over compaction_manager to access its private fields. All such access was recently moved to sstables::test_env's compaction manager, now it's time to drop the remaining legacy wrapper class.
Closesscylladb/scylladb#16017
* github.com:scylladb/scylladb:
test/utils: Drop compaction_manager_test
test/utils: Get compaction manager from test_env
test/sstables: Introduce test_env_compaction_manager::perform_compaction()
test/env: Add sstables::test_env& to compaction_manager_test::run()
test/utils: Add sstables::test_env& to compact_sstables()
test/utils: Simplify and unify compaction_manager_test::run()
test/utils: Squash two compact_sstables() helpers
test/compaction: Use shorter compact_sstables() helper
test/utils: Keep test task compaction gate on task itself
test/utils: Move compaction_manager_test::propagate_replacement()
Set top level compaction tasks as abortable.
Compaction tasks which have no children, i.e. compaction task
executors, have abort method overriden to stop compaction data.
This class only provides a .run() method which allocates a task and
calls sstables::test_env::perform_compaction(). This can be done in a
helper method, no need for the whole class for it.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Take it from compaction_manager_test::run() which is simplified overwite
of the compaction_manager::perform_compaction().
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The purpose of this method is to turn public the private
compaction_manager method of the same name. The caller of this method is
having sstable_test_env at hand with its test_env_compaction_manager, so
the de-private-isation call can be moved.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
to have feature parity with `configure.py`. we won't need this
once we migrate to C++20 modules. but before that day comes, we
need to stick with C++ headers.
we generate a rule for each .hh files to create a corresponding
.cc and then compile it, in order to verify the self-containness of
that header. so the number of rule is quite large, to avoid the
unnecessary overhead. the check-header target is enabled only if
`Scylla_CHECK_HEADERS` option is enabled.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#15913
The polling loop was intended to ignore
`condition_variable_timed_out` and check for progress
using a longer `max_idle_duration` timeout in the loop.
Fixes#15669
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closesscylladb/scylladb#15671
This reverts commit 2860d43309, reversing
changes made to a3621dbd3e.
Reverting because rest_api.test_compaction_task started failing after
this was merged.
Fixes: #16005
When off-strategy is disabled, data segregation is not postponed,
meaning that getting partition estimate right is important to
decrease filter's false positives. With streaming, we don't
have min and max timestamps at destination, well, we could have
extended the RPC verb to send them, but turns out we can deduce
easily the amount of windows using default TTL. Given partitioner
random nature, it's not absurd to assume that a given range being
streamed may overlap with all windows, meaning that each range
will yield one sstable for each window when segregating incoming
data. Today, we assume the worst of 100 windows (which is the
max amount of sstables the input data can be segregated into)
due to the lack of metadata for estimating the window count.
But given that users are recommended to target a max of ~20
windows, it means partition estimate is being downsized 5x more
than needed. Let's improve it by using default TTL when
estimating window count, so even on absence of timestamp
metadata, the partition estimation won't be way off.
Fixes#15704.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Compaction tasks which do not have a parent are abortable
through task manager. Their children are aborted recursively.
Compaction tasks of the lowest level are aborted using existing
compaction task executors stopping mechanism.
Closesscylladb/scylladb#15083
* github.com:scylladb/scylladb:
test: test abort of compaction task that isn't started yet
test: test running compaction task abort
tasks: fail if a task was aborted
compaction: abort task manager compaction tasks
default_compaction_progress_monitor returns a reference to a static
object. So, it should be read-only, but its users need to modify it.
Delete default_compaction_progress_monitor and use one's own
compaction_progress_monitor instance where it's needed.
Closesscylladb/scylladb#15800
After "repair: Get rid of the gc_grace_seconds", the sstable's schema (mode,
gc period if applicable, etc) is used to estimate the amount of droppable
data (or determine full expiration = max_deletion_time < gc_before).
It could happen that the user switched from timeout to repair mode, but
sstables will still use the old mode, despite the user asked for a new one.
Another example is when you play with value of grace period, to prevent
data resurrection if repair won't be able to run in a timely manner.
The problem persists until all sstables using old GC settings are recompacted
or node is restarted.
To fix this, we have to feed latest schema into sstable procedures used
for expiration purposes.
Fixes#15643.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closesscylladb/scylladb#15746