Commit Graph

796 Commits

Author SHA1 Message Date
Pavel Emelyanov
1f44a374b8 error_injection: Overload inject() instead of inject_with_handler()
The inject_with_handler() method accepts a coroutine that can be called
wiht injection_handler. With such function as an argument, there's no
need in distinctive inject_with_handler() name for a method, it can be
overload of all the existing inject()-s

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-03-11 19:30:19 +03:00
Botond Dénes
41424231f1 Merge 'compaction: reshape sstables within compaction groups' from Lakshmi Narayanan Sreethar
For tables using tablet based replication strategies, the sstables should be reshaped only within the compaction groups they belong to. The shard_reshaping_compaction_task_impl now groups the sstables based on their compaction groups before reshaping them.

Fixes https://github.com/scylladb/scylladb/issues/16966

Closes scylladb/scylladb#17395

* github.com:scylladb/scylladb:
  test/topology_custom: add testcase to verify reshape with tablets
  test/pylib/rest_client: add get_sstable_info, enable/disable_autocompaction
  replica/distributed_loader: enable reshape for sstables
  compaction: reshape sstables within compaction groups
  replica/table : add method to get compaction group id for an sstable
  compaction: reshape: update total reshaped size only on success
  compaction: simplify exception handling in shard_reshaping_compaction_task_impl::run
2024-03-06 10:33:56 +02:00
Raphael S. Carvalho
f07c233ad5 Fix potential data resurrection when another compaction type does cleanup work
Since commit f1bbf70, many compaction types can do cleanup work, but turns out
we forgot to invalidate cache on their completion.

So if a node regains ownership of token that had partition deleted in its previous
owner (and tombstone is already gone), data can be resurrected.

Tablet is not affected, as it explicitly invalidates cache during migration
cleanup stage.

Scylla 5.4 is affected.

Fixes #17501.
Fixes #17452.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#17502
2024-02-25 13:08:04 +02:00
Lakshmi Narayanan Sreethar
83fecc2f1f compaction: reshape sstables within compaction groups
For tables using tablet based replication strategies, the sstables
should be reshaped only within the compaction groups they belong to.
Updated shard_reshaping_compaction_task_impl to group the sstables based
on their compaction groups before reshaping them within the groups.

Fixes #16966

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-02-23 18:43:39 +05:30
Lakshmi Narayanan Sreethar
9fffd8905f compaction: reshape: update total reshaped size only on success
The total reshaped size should only be updated on reshape success and
not after reshape has been failed due to some exception.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-02-23 01:07:54 +05:30
Lakshmi Narayanan Sreethar
4fb099659a compaction: simplify exception handling in shard_reshaping_compaction_task_impl::run
Catch and handle the exceptions directly instead of rethrowing and
catching again.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-02-23 01:07:54 +05:30
Kefu Chai
6408834e33 compaction: add formatter for formatted_sstables_list
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for `formatted_sstables_list`,
and drop its operator<<.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-21 17:45:45 +08:00
Kefu Chai
9969d88d82 compaction: add fmt::formatter for compaction_type and friends
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for

* `sstables::compaction_type`
* `sstables::compaction_type_options::scrub::mode`
* `sstables::compaction_type_options::scrub::quarantine_mode`

and drop their operator<<:s.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-21 17:45:40 +08:00
Avi Kivity
605bf6e221 range.hh: retire
range.hh was deprecated in bd794629f9 (2020) since its names
conflict with the C++ library concept of an iterator range. The name
::range also mapped to the dangerous wrapping_interval rather than
nonwrapping_interval.

Complete the deprecation by removing range.hh and replacing all the
aliases by the names they point to from the interval library. Note
this now exposes uses of wrapping intervals as they are now explicit.

The unit tests are renamed and range.hh is deleted.

Closes scylladb/scylladb#17428
2024-02-21 00:24:25 +02:00
Avi Kivity
eedb997568 Merge 'compaction: upgrade: handle keyspaces that use tablets' from Lakshmi Narayanan Sreethar
Tables in keyspaces governed by replication strategy that uses tablets, have separate effective_replication_maps. Update the upgrade compaction task to handle this when getting owned key ranges for a keyspace.

Fixes #16848

Closes scylladb/scylladb#17335

* github.com:scylladb/scylladb:
  compaction: upgrade: handle keyspaces that use tablets
  replica/database: add an optional variant to get_keyspace_local_ranges
2024-02-15 21:31:54 +02:00
Lakshmi Narayanan Sreethar
7a98877798 compaction: upgrade: handle keyspaces that use tablets
Tables in keyspaces governed by replication strategy that uses tablets, have
separate effective_replication_maps. Update the upgrade compaction task to
handle this when getting owned key ranges for a keyspace.

Fixes #16848

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-02-15 17:47:39 +05:30
Kefu Chai
caa20c491f storage_service: pass non-empty keyspace when performing cleanup_all
this change addresses the regression introduced by 5e0b3671, which
fall backs to local cleanup in cleanup_all. but 5e0b3671 failed to
pass the keyspace to the `shard_cleanup_keyspace_compaction_task_impl`
is its constructor parameter, that's why the test fails like
```
error executing POST request to http://localhost:10000/storage_service/cleanup_all with parameters {}: remote replied with status code 400 Bad Request:
Can't find a keyspace

```

where the string after "Can't find a keyspace" is empty.

in this change, the keyspace name of the keyspace to be cleaned is passed to
`shard_cleanup_keyspace_compaction_task_impl`.

we always enable the topology coordinator when performing testing,
that's why this issue does not pop up until the longevity test.

Fixes #17302
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17320
2024-02-15 13:17:45 +02:00
Avi Kivity
784c2f8ad2 Merge 'treewide: replace calls to future::get0() by calls to future::get()' from Kefu Chai
get0() dates back from the days where Seastar futures carried tuples, and get0() was a way to get the first (and usually only) element. Now it's a distraction, and Seastar is likely to deprecate and remove it.

Replace with seastar::future::get(), which does the same thing.

Closes scylladb/scylladb#17130

* github.com:scylladb/scylladb:
  treewide: replace seastar::future::get0() with seastar::future::get()
  sstable: capture return value of get0() using auto
  utils: result_loop: define result_type with decayed type

[avi: add another one that snuck in while this was cooking]
2024-02-04 15:23:33 +02:00
Avi Kivity
7cb1c10fed treewide: replace seastar::future::get0() with seastar::future::get()
get0() dates back from the days where Seastar futures carried tuples, and
get0() was a way to get the first (and usually only) element. Now
it's a distraction, and Seastar is likely to deprecate and remove it.

Replace with seastar::future::get(), which does the same thing.
2024-02-02 22:12:57 +08:00
Lakshmi Narayanan Sreethar
e86965c272 compaction: run rewrite_sstables_compaction_task_executor tasks in maintenance group
Use maintenance group to run all the compaction tasks that use the
rewrite_sstables_compaction_task_executor.

Fixes #16699

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>

Closes scylladb/scylladb#17112
2024-02-02 11:18:49 +02:00
Kefu Chai
5e0b3671d3 storage_service: fall back to local cleanup in cleanup_all
before this change, if no keyspaces are specified,
scylla-nodetool just enumerate all non-local keyspaces, and
call "/storage_service/keyspace_cleanup" on them one after another.
this is not quite efficient, as each this RESTful API call
force a new active commitlog segment, and flushes all tables.
so, if the target node of this command has N non-local keyspaces,
it would repeat the steps above for N times. this is not necessary.
and after a topology change, we would like to run a global
"nodetool cleanup" without specifying the keyspace, so this
is a typical use case which we do care about.

to address this performance issue, in this change, we improve
an existing RESTful API call "/storage_service/cleanup_all", so
if the topology coordinator is not enabled, we fall back to
a local cleanup to cleanup all non-local keyspaces.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-01 11:25:53 +08:00
Kefu Chai
4f90a875f6 compaction: format flush_mode without the helper
since flush_mode is moved out of major_compaction_task_impl, let's
drop the helper hosted in that class as well, and implement the
formatter witout it.

please note, the `__builtin_unreachable()` is dropped. it should
not change the behavior of the formatter. we don't put it in the
`default` branch in hope that `-Wswitch` can warn us in the case
when another enum of `flush_mode` is added, but we fail to handle
it somehow.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-01 11:25:53 +08:00
Kefu Chai
b39cc01bb3 compaction_manager: flush all tables before cleanup
according to the document "nodetool cleanup"

> Triggers removal of data that the node no longer owns

currently, scylla performs cleanup by rewriting the sstables. but
commitlog segments may still contain the mutations to the tables
which are dropped during sstable rewriting. when scylla server
restarts, the dirty mutations are replayed to the memtable. if
any of these dirty mutations changes the tables cleaned up. the
stale data are reapplied. this would lead to data resurrection.

so, in this change we following the same model of major compaction:

1. force new active segment,
2. flush all tables
3. perform cleanup using compaction, which rewrites the sstables
   of specified tables

because we already `flush()` all tables in
`cleanup_keyspace_compaction_task_impl::run()`, there is no need to
call `flush()` again, in `table::perform_cleanup_compaction()`, so
the `flush()` call is dropped in this function, and the tests using
this function are updated to call `flush()` manually to preserve
the existing behavior.

there are two callers of `cleanup_keyspace_compaction_task_impl`,

* one is `storage_service::sstable_cleanup_fiber()`, which listens
  for the events fired by topology_state_machine, which is in turn
  driven by, for instance, "/storage_service/cleanup_all" API.
  which cleanup all keyspaces in one after another.
* another is "/storage_service/keyspace_cleanup", which cleans up
  the specified keyspace.

in the first use case, we can force a new active segment for a single
time, so another parameter to the ctor of
`cleanup_keyspace_compaction_task_impl` is introduced to specify if
the `db.flush_all_tables()` call should be skiped.

please note, there are two possible optimizations,

1. force new active segment only if the mutations in it touches the
   tables being cleaned up
2. after forcing new active segment, only flush the (mem)tables
   mutated by the non-active segments

but let's leave them for following-up changes. this change is a
minimal fix for data resurrection issue.

Fixes #16757
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-01 11:25:53 +08:00
Kefu Chai
9afec2e3e7 api, compaction: promote flush_mode
so that this enum type can be shared by other task(s) as well.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-01 11:25:53 +08:00
Botond Dénes
c67698ea06 compaction/compaction_manager: perform_cleanup(): hold the compaction gate
While the cleanup is ongoing. Otherwise, a concurrent table drop might
trigger a use-after-free, as we have seen in dtests recently.

Fixes: #16770

Closes scylladb/scylladb#16874
2024-01-25 14:52:50 +01:00
Benny Halevy
51a46aa83b compaction_manager: perform_task_on_all_files: return early when there are no sstables to compact
Prevent the creation of a compaction task when
the list of sstables is known to be empty ahead
of time.

Refs scylladb/scylladb#16694
Fixes scylladb/scylladb#16803

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-01-17 11:53:39 +02:00
Benny Halevy
bd1d65ec38 compaction_manager: perform_cleanup: use compaction_manager::eligible_for_compaction
3b424e391b introduced a loop
in `perform_cleanup` that waits until all sstables that require
cleanup are cleaned up.

However, with f1bbf705f9,
an sstable that is_eligible_for_compaction (i.e. it
is not in staging, awaiting view update generation),
may already be compacted by e.g. regular compaction.
And so perform_cleanup should interrupt that
by calling try_perform_cleanup, since the latter
reevaluates `update_sstable_cleanup_state` with
compaction disabled - that stops ongoing compactions.

Refs scylladb/scylladb#15673

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-01-17 11:53:39 +02:00
Benny Halevy
d6071945c8 compaction, table: ignore foreign sstables replay_position
The sstables replay_position in stats_metadata is
valid only on the originating node and shard.

Therefore, validate the originating host and shard
before using it in compaction or table truncate.

Fixes #10080

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#16550
2024-01-16 18:45:59 +02:00
Botond Dénes
697ebef149 Merge 'tasks: compaction: drop regular compaction tasks after they are finished' from Aleksandra Martyniuk
Make compaction tasks internal. Drop all internal tasks without parents
immediately after they are done.

Fixes: #16735
Refs: #16694.

Closes scylladb/scylladb#16698

* github.com:scylladb/scylladb:
  compaction: make regular compaction tasks internal
  tasks: don't keep internal root tasks after they complete
2024-01-11 12:10:44 +02:00
Kefu Chai
eb9216ef11 compaction: do not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16707
2024-01-10 11:07:36 +02:00
Aleksandra Martyniuk
6b87778ef2 compaction: make regular compaction tasks internal
Regular compaction tasks are internal.

Adjust test_compaction_task accordingly: modify test_regular_compaction_task,
delete test_running_compaction_task_abort (relying on regular compaction)
which checks are already achived by test_not_created_compaction_task_abort.
Rename the latter.
2024-01-09 13:13:54 +01:00
Aleksandra Martyniuk
6f13e55187 tasks: call release_resources when task is finished
Call task_manager::task::impl::release_resources when task is finished
instead of putting the responsibility on user.

Closes scylladb/scylladb#16660
2024-01-09 11:41:54 +02:00
Lakshmi Narayanan Sreethar
1d6eaf2985 compaction manager: remove: cleanup _compaction_state on exceptions
If for some reason an exception is thrown in compaction_manager::remove,
it might leave behind stale table pointers in _compaction_state. Fix
that by setting up a deffered action to perform the cleanup.

Fixes #16635

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>

Closes scylladb/scylladb#16632
2024-01-03 22:03:24 +02:00
Raphael S. Carvalho
d1e6dfadea sstables: Harden estimate_droppable_tombstone_ratio() interface
The interface is fragile because the user may incorrectly use the
wrong "gc before". Given that sstable knows how to properly calculate
"gc before", let's do it in estimate__d__t__r(), leaving no room
for mistakes.

sstable_run's variant was also changed to conform to new interface,
allowing ICS to properly estimate droppable ratio, using GC before
that is calculated using each sstable's range. That's important for
upcoming tablets, as we want to query only the range that belongs
to a particular tablet in the repair history table.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#15931
2023-12-20 19:04:41 +02:00
Raphael S. Carvalho
dd1a6d6309 compaction: Add splitting compaction task to manager
The task for splitting compaction will run until all sstables
in the main set are split. The only exceptions are shutdown
or user has explicitly asked for abort.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-12-17 11:40:09 -03:00
Raphael S. Carvalho
f87161e556 compaction: Prepare rewrite_sstables_compaction_task_executor to be reused for splitting
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-12-17 11:40:09 -03:00
Raphael S. Carvalho
c96938c49b compaction: remove scrub-specific code from rewrite_sstables_compaction_task_executor
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-12-17 11:40:09 -03:00
Raphael S. Carvalho
b1c5d5dd4e compaction: Add splitting compaction
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-12-17 11:40:08 -03:00
Raphael S. Carvalho
3dcb800a96 flat_mutation_reader: Allow interposer consumers to be stacked
reader_consumer_v2 being a noncopyable_function imposes a restriction
when stacking one interposer consumer on top of another.

Think for example of a token-based segregator on top of a timestamp
based one.

To achieve that, the interposer consumer creator must be reentrant,
such that the consumer can be created on each "channel", but today
the creator becomes unusable after first usage.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-12-17 11:26:32 -03:00
Botond Dénes
493b6bc65f Merge 'Guard tables in compaction tasks' from Benny Halevy
Currently, if a compaction function enters the table
or compaction_group async_gate, we can't stop it
on the table/compaction_group stop path as they co_await
their respective async_gate.close().

This series introduces a table_ptr smart pointer to guards
the table object by entering its async_gate, and
it also defers awaiting the gate.close future
till after stopping ongoing compaction so that
closing the gate will prevent starting new compactions
while ongoing compaction can be stopped and finally
awaiting the close() future will wait for them to
unwind and exit the gate after being stopped.

Fixes #16305

Closes scylladb/scylladb#16351

* github.com:scylladb/scylladb:
  compaction: run_on_table: skip compaction also on gate_closed_exception
  compaction: run_on_table: hold table
  table: add table_holder and hold method
  table: stop: allow compactions to be stopped while closing async_gate
2023-12-12 12:50:17 +02:00
Benny Halevy
7843025a53 compaction: run_on_table: skip compaction also on gate_closed_exception
Similar to the no_such_column_family error,
gate_closed_exception indicates that the table
is stopped and we should skip compaction on it
gracefully.

Fixes #16305

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-12 08:46:37 +02:00
Benny Halevy
92c718c60a compaction: run_on_table: hold table
To ensure the table will not be dropped while
the compaction task is ongoing.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-12 08:45:59 +02:00
Aleksandra Martyniuk
ceec5577d8 api: compaction: pass pointer to top level compaction tasks
As a preparation for asynchronous compaction api, from which we
cannot take values by reference, top level compaction tasks get
pointers which need to be set to nullptr when they are not needed
(like in async api).
2023-12-11 11:36:10 +01:00
Kefu Chai
f483309165 compaction, api: drop unused functions
run_on_existing_tables() is not used at all. and we have two of them.
in this change, let's drop them.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16304
2023-12-06 14:31:08 +02:00
Benny Halevy
0bcce35abd treewide: get rid of now unused fb_utilities
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-05 16:22:49 +02:00
Yaniv Kaul
c658bdb150 Typos: fix typos in comments
Fixes some typos as found by codespell run on the code.
In this commit, I was hoping to fix only comments, not user-visible alerts, output, etc.
Follow-up commits will take care of them.

Refs: https://github.com/scylladb/scylladb/issues/16255
Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>
2023-12-02 22:37:22 +02:00
Benny Halevy
b12b142232 api: add /storage_service/compact
For major compacting all tables in the database.
The advantage of this api is that `commitlog->force_new_active_segment`
happens only once in `database::flush_all_tables` rather than
once per keyspace (when `nodetool compact` translates to
a sequence of `/storage_service/keyspace_compaction` calls).

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-11-28 16:37:42 +02:00
Benny Halevy
66ba983fe0 compaction_manager: flush_all_tables before major compaction
Major compaction already flushes each table to make
sure it considers any mutations that are present in the
memtable for the purpose of tombstone purging.
See 64ec1c6ec6

However, tombstone purging may be inhibited by data
in commitlog segments based on `gc_time_min` in the
`tombstone_gc_state` (See f42eb4d1ce).

Flushing all sstables in the database release
all references to commitlog segments and there
it maximizes the potential for tombstone purging,
which is typically the reason for running major compaction.

However, flushing all tables too frequently might
result in tiny sstables.  Since when flushing all
keyspaces using `nodetool flush` the `force_keyspace_compaction`
api is invoked for keyspace successively, we need a mechanism
to prevent too frequent flushes by major compaction.

Hence a `compaction_flush_all_tables_before_major_seconds` interval
configuration option is added (defaults to 24 hours).

In the case that not all tables are flushed prior
to major compaction, we revert to the old behavior of
flushing each table in the keyspace before major-compacting it.

Fixes scylladb/scylladb#15777

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-11-28 16:37:42 +02:00
Benny Halevy
1fd85bd37b api: compaction: add flush_memtables option
When flushing is done externally, e.g. by running
`nodetool flush` prior to `nodetool compact`,
flush_memtables=false can be passed to skip flushing
of tables right before they are major-compacted.

This is useful to prevent creation of small sstables
due to excessive memtable flushing.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-11-28 16:37:42 +02:00
Botond Dénes
3ccf1e020b Merge ' compaction: abort compaction tasks' from Aleksandra Martyniuk
Compaction tasks which do not have a parent are abortable
through task manager. Their children are aborted recursively.

Compaction tasks of the lowest level are aborted using existing
compaction task executors stopping mechanism.

Closes scylladb/scylladb#16177

* github.com:scylladb/scylladb:
  test: test abort of compaction task that isn't started yet
  test: test running compaction task abort
  tasks: fail if a task was aborted
  compaction: abort task manager compaction tasks
2023-11-28 09:08:04 +02:00
Aleksandra Martyniuk
9c2c964b8e test: test abort of compaction task that isn't started yet
Test whether a task which parent was aborted has a proper status.
2023-11-24 19:25:27 +01:00
Aleksandra Martyniuk
8639eae0ce test: test running compaction task abort
Test whether a task which is aborted while running has a proper status.
2023-11-24 19:25:20 +01:00
Aleksandra Martyniuk
aa7bba2d8b compaction: abort task manager compaction tasks
Set top level compaction tasks as abortable.

Compaction tasks which have no children, i.e. compaction task
executors, have abort method overriden to stop compaction data.
2023-11-24 15:44:34 +01:00
Kefu Chai
f99223919a compaction: add formatter for map<timestamp_type, vector<shared_sstable>>
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define a formatter for
map<timestamp_type, vector<shared_sstable>>. since the operator<<
for this type is only used in the .cc file, and the only use case
of it is to provide the formatter for fmt, so the operator<< based
formatter is remove in this change.

Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16163
2023-11-24 11:56:28 +02:00
Raphael S. Carvalho
157a5c4b1b treewide: Avoid using namespace sstables in header to avoid conflicts
That's needed for compaction_group.hh to be included in headers.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-11-23 17:36:57 +02:00