Commit Graph

1265 Commits

Author SHA1 Message Date
Avi Kivity
7da3314deb Merge 'Integrated restore' from Ernest Zaslavsky
Handed over from https://github.com/scylladb/scylladb/pull/20149

This adds a minimal implementation of the start-restore API call.

The method starts a task that runs load-and-stream functionality against sstables from an S3 bucket. Arguments are:

```
endpoint -- the ID in object_store.yaml config file
bucket -- the target bucket to get objects from
keyspace -- the keyspace to work on
table -- the table to work on
snapshot -- the name of the snapshot from which the backup was taken
```
The task runs in the background. Its task_id is returned from the method once the task is spawned, and it should be used with the /task_manager API to track the task's execution and completion.
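
As a rough illustration of the call flow described above, the sketch below only builds the request URLs. The `/storage_service/restore` and `/task_manager/task_status/{task_id}` paths, the use of query parameters, and the `127.0.0.1:10000` address are assumptions not stated in this message, so check the API definition in the tree before relying on them.

```python
from urllib.parse import urlencode

# Assumed address of the node's REST API; adjust for the actual deployment.
API_BASE = "http://127.0.0.1:10000"


def restore_url(endpoint, bucket, keyspace, table, snapshot):
    """Build the URL for the start-restore call (the path is an assumption)."""
    query = urlencode({
        "endpoint": endpoint,
        "bucket": bucket,
        "keyspace": keyspace,
        "table": table,
        "snapshot": snapshot,
    })
    return f"{API_BASE}/storage_service/restore?{query}"


def task_status_url(task_id):
    """Build the URL used to poll the spawned task (the path is an assumption)."""
    return f"{API_BASE}/task_manager/task_status/{task_id}"
```

A client would issue the first request, read the returned task_id, and then poll the second URL until the task reports completion.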

Remote sstable components are scanned as if they were placed in the local upload/ directory. Then the collected sstables are fed into load-and-stream.

This branch has https://github.com/scylladb/scylladb/pull/19890 (Integrated backup), https://github.com/scylladb/scylladb/pull/20120 (S3 lister) and a few more minor PRs merged in. The restore branch itself starts with the [utils: Introduce abstract (directory) lister](29c867b54d) commit.

refs: https://github.com/scylladb/scylladb/issues/18392

Closes scylladb/scylladb#20305

* github.com:scylladb/scylladb:
  tools/scylla-nodetool: add restore integration
  test/object_store: Add simple restore test
  test/object_store: Generalize prepare_snapshot_for_backup()
  code: Introduce restore API method
  sstable_loader: Add sstables::storage_manager dependency
  sstable_loader: Maintain task manager module
  sstable_loader: Out-line constructor
  distributed_loader: Split get_sstables_from_upload_dir()
  sstables/storage: Compose uploaded sstable path simpler
  sstable_directory: Prepare FS lister to scan files on S3
  sstable_directory: Parse sstable component without full path
  s3-client: Add support for lister::filter
  utils: Introduce abstract (directory) lister
2024-08-29 18:25:30 +03:00
Pavel Emelyanov
11a04bfb66 code: Introduce restore API method
The method starts a task that uses sstables_loader load-and-stream
functionality to bring new sstables into the cluster. The existing
load-and-stream picks up sstables from the upload/ directory, while the newly
introduced task collects them from an S3 bucket under a given prefix (which
corresponds to the path where the backup API method put them).

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-08-28 15:42:49 +03:00
Avi Kivity
94d5507237 Merge 'select from mutation_fragments() + tablets: handle reads for non-owned partitions' from Botond Dénes
Attempting to read a partition via `SELECT * FROM MUTATION_FRAGMENTS()`, which the node doesn't own, from a table using tablets causes a crash.
This is because when using tablets, the replica side simply doesn't handle requests for un-owned tokens and this triggers a crash.
We should probably improve how this is handled (an exception is better than a crash), but this is outside the scope of this PR.
This PR fixes this and also adds a reproducer test.

Fixes: https://github.com/scylladb/scylladb/issues/18786

Fixes a regression introduced in 6.0, so needs backport to 6.0 and 6.1

Closes scylladb/scylladb#20109

* github.com:scylladb/scylladb:
  test/tablets: Test that reading tablets' mutations from MUTATION_FRAGMENTS works
  replica/mutation_dump: enforce pinning of effective replication map
  replica/mutation_dump: handle un-owned tokens (with tablets)
2024-08-27 20:46:10 +03:00
Pavel Emelyanov
6a006d2255 distributed_loader: Split get_sstables_from_upload_dir()
Next patches will need this method to initialize sstable_directory
differently and then do its regular processing. For that, split the
method into two, next patch will re-use the common part it needs.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-08-27 16:15:41 +03:00
Avi Kivity
0acfa4a00d Merge 'abstract_replication_strategy: make get_ranges async' from Benny Halevy
To prevent stalls due to a large number of tokens.
For example, a large cluster with, say, 70 nodes can have
more than 16K tokens.

Fixes #19757

Closes scylladb/scylladb#19758

* github.com:scylladb/scylladb:
  abstract_replication_strategy: make get_ranges async
  database: get_keyspace_local_ranges: get vnode_effective_replication_map_ptr param
  compaction: task_manager_module: open code maybe_get_keyspace_local_ranges
  alternator: ttl: token_ranges_owned_by_this_shard: let caller make the ranges_holder
  alternator: ttl: can pass const gms::gossiper& to ranges_holder
  alternator: ttl: ranges_holder_primary: unconstify _token_ranges member
  alternator: ttl: refactor token_ranges_owned_by_this_shard
2024-08-26 16:56:18 +03:00
Botond Dénes
b2c07c9b6f Merge 'compaction: change compaction stop reason ' from Aleksandra Martyniuk
Currently "table removal" is logged as a reason of compaction stop for table drop,
tablet cleanup and tablet split. Modify log to reflect the reason.

Closes scylladb/scylladb#20042

* github.com:scylladb/scylladb:
  test: add test to check compaction stop log
  compaction: fix compaction group stop reason
2024-08-26 13:40:07 +03:00
Benny Halevy
686a8f2939 abstract_replication_strategy: make get_ranges async
To prevent stalls due to a large number of tokens.
For example, a large cluster with, say, 70 nodes can have
more than 16K tokens.
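
The general idea of avoiding stalls by yielding during a long loop can be sketched as follows. This is a Python asyncio analogy, not the actual C++ code: the function body, the `yield_every` parameter and the range representation are all hypothetical.

```python
import asyncio


async def get_ranges(sorted_tokens, yield_every=1024):
    """Build ranges between consecutive tokens, yielding to the event
    loop periodically so that other tasks are not starved."""
    ranges = []
    prev = None
    for i, token in enumerate(sorted_tokens):
        ranges.append((prev, token))
        prev = token
        if (i + 1) % yield_every == 0:
            await asyncio.sleep(0)  # cooperative preemption point
    if prev is not None:
        ranges.append((prev, None))  # tail range after the last token
    return ranges
```

Callers have to await the result, which is why making such a function async tends to ripple through its call sites.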

Fixes #19757

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-08-25 10:57:34 +03:00
Benny Halevy
2bbbe2a8bc database: get_keyspace_local_ranges: get vnode_effective_replication_map_ptr param
Prepare for making the function async.
Then, it will need to hold on to the erm while getting
the token_ranges asynchronously.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-08-25 10:55:33 +03:00
Benny Halevy
ea5a0cca10 compaction: task_manager_module: open code maybe_get_keyspace_local_ranges
It is used only here and can be simplified by
checking if the keyspace replication strategy
is per table by the caller.

Prepare for making get_keyspace_local_ranges async.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-08-25 10:25:32 +03:00
Pavel Emelyanov
f7b380d53b database: Export parse_table_directory_name() helper
There's parse_table_directory_name() static helper in database.cc code
that is used by methods that parse table tree layout for snapshot.
Export this helper for external usage and rename to fit the format_...
one introduced by previous patch.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-08-22 14:57:48 +03:00
Pavel Emelyanov
33962946fc database: Introduce format_table_directory_name() helper
The one makes table directory (not full path) out of table name and
uuid. This is to be symmetrical with yet another helper that converts
directory name back to table name and uuid (next patch)
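
A minimal sketch of such a symmetric pair of helpers, written in Python for brevity. It assumes the directory name is the table name, a dash, and the uuid as 32 hex digits without dashes. That layout and the exact signatures are assumptions here, not taken from the patch.

```python
import uuid


def format_table_directory_name(name, table_id):
    """Return the directory name (not a full path) for a table."""
    return f"{name}-{table_id.hex}"


def parse_table_directory_name(dirname):
    """Inverse of format_table_directory_name()."""
    # Split on the last dash so that table names containing dashes survive.
    name, sep, hex_id = dirname.rpartition("-")
    if not sep or len(hex_id) != 32:
        raise ValueError(f"not a table directory name: {dirname!r}")
    return name, uuid.UUID(hex=hex_id)
```

Round-tripping a name through both helpers should return the original name and uuid.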

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-08-22 14:57:48 +03:00
Botond Dénes
46563d719f replica/mutation_dump: enforce pinning of effective replication map
By making it a required argument, making sure the topology version is
pinned for the duration of the query. This is needed because mutation
dump queries bypass the storage proxy, where this pinning usually takes
place. So it has to be enforced here.
2024-08-22 06:24:06 -04:00
Botond Dénes
de5329157c replica/mutation_dump: handle un-owned tokens (with tablets)
When using tablets, the replica-side doesn't handle un-owned tokens.
table::shard_for_reads() will just return 0 for un-owned tokens, and a
later attempt at calling table::storage_group_for_token() with said
un-owned token will cause a crash (std::terminate due to
std::out_of_range thrown in noexcept context).
The replicas rely on the coordinator to not send stray requests, but for
select from mutation_fragments(table) queries, there is no coordinator
side who could do the correct dispatching. So do this in
mutation_dump(), just creating empty readers for un-owned tokens.
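
The approach can be illustrated with a toy model. The `Table` class and `make_reader()` below are hypothetical stand-ins written in Python, not the real replica code. They only show the ownership check that replaces the failing lookup with an empty reader.

```python
class Table:
    """Toy stand-in for a tablet-based table on one replica."""

    def __init__(self, owned):
        # token -> list of rows, only for tokens this replica owns
        self._storage_groups = dict(owned)

    def owns(self, token):
        return token in self._storage_groups

    def storage_group_for_token(self, token):
        # Raises KeyError for un-owned tokens, loosely mirroring the
        # out_of_range case described above.
        return self._storage_groups[token]


def make_reader(table, token):
    """Return an iterator over rows; empty if the token is not owned."""
    if not table.owns(token):
        return iter(())  # empty reader instead of a failing lookup
    return iter(table.storage_group_for_token(token))
```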
2024-08-22 03:06:55 -04:00
Aleksandra Martyniuk
5005e19de7 compaction: fix compaction group stop reason
compaction_manager::remove passes "table removal" as a reason
of stopping ongoing compactions, but currently remove method
is also called when a tablet is migrated or split.

Pass the actual reason of compaction stop, so that logs aren't
misleading.
2024-08-21 12:42:09 +02:00
Benny Halevy
f40d06b766 table: calculate_tablet_count: use sg_manager storage_groups size
Now, when each shard storage_group_manager keeps
only the storage_groups for the tablet replica it owns,
we can simply return the storage_group map size
instead of counting the number of tablet replicas
mapped to this shard.

Add a unit test that sums the tablet count
on all shards and tests that the sum is equal
to the configured default `initial_tablets`.

Fixes #18909

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#20223
2024-08-21 11:01:58 +02:00
Aleksandra Martyniuk
9d9414a75d replica: add/remove table atomically
Currently, database::tables_metadata::add_table needs to hold a write
lock before adding a table. So, if we update other classes keeping
track of tables before calling add_table, and the method yields,
table's metadata will be inconsistent.

Set all table-related info in tables_metadata::add_table_helper (called
by add_table) so that the operation is atomic.

Analogously for remove_table.

Fixes: #19833.

Closes scylladb/scylladb#20064
2024-08-20 20:53:32 +03:00
Botond Dénes
3ee0d7f2d1 Merge 'tools: Enhance scylla sstable shard-of to support tablets' from Kefu Chai
before this change, `scylla sstable shard-of` didn't support tablets,
because:

- with tablets enabled, data distribution uses the scheduler
- this replaces the previous method of mapping based on vnodes and shard numbers
- as a result, we can no longer deduce sstable mapping from token ranges

in this change, we:
- read `system.tablets` table to retrieve tablet information
- print the tablet's replica set (list of <host, shard> pairs)
- this helps users determine where a given sstable is hosted

This approach provides the closest equivalent functionality of
`shard-of` in the tablet era.

Fixes scylladb/scylladb#16488

---

no need to backport, it's an improvement, not a critical fix.

Closes scylladb/scylladb#20002

* github.com:scylladb/scylladb:
  tools: enhance `scylla sstable shard-of` to support tablets
  replica/tablets: extract tablet_replica_set_from_cell()
  tools: extract get_table_directory() out
  tools: extract read_mutation out
  build: split the list of source file across multiple line
  tools/scylla-sstable: print warning when running shard-of with tablets
2024-08-20 13:51:12 +03:00
Tomasz Grabiec
c1de4859d8 Merge 'tablets: Fix race between repair and split' from Raphael "Raph" Carvalho
Consider the following:

```
T
0   split prepare starts
1                               repair starts
2   split prepare finishes
3                               repair adds unsplit sstables
4                               repair ends
5   split executes
```

If repair produces sstable after split prepare phase, the replica will not split that sstable later, as prepare phase is considered completed already. That causes split execution to fail as replicas weren't really prepared. This also can be triggered with load-and-stream which shares the same write (consumer) path.

The approach to fix this is the same employed to prevent a race between split and migration. If migration happens during prepare phase, it can happen source misses the split request, but the tablet will still be split on the destination (if needed). Similarly, the repair writer becomes responsible for splitting the data if underlying table is in split mode. That's implemented in replica::table for correctness, so if node crashes, the new sstable missing split is still split before added to the set.

Fixes #19378.
Fixes #19416.

Closes scylladb/scylladb#19427

* github.com:scylladb/scylladb:
  tablets: Fix race between repair and split
  compaction: Allow "offline" sstable to be split
2024-08-19 14:44:28 +02:00
Lakshmi Narayanan Sreethar
44583eed9e replica: fix copy constructor of tablet_sstable_set
Remove the existing copy constructor to enable the use of the implicit
copy constructor. This fixes the issue of `_sstable_set_ids` not being
copied in the current copy constructor.

Fixes #19519

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-08-17 23:37:58 +05:30
Kefu Chai
4291033b14 replica/tablets: extract tablet_replica_set_from_cell()
so it can be reused to implement a low-level tool which reads tablets
data from sstables

Refs scylladb/scylladb#16488
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-08-15 15:49:55 +08:00
Pavel Emelyanov
66d72e010c distributed_loader: Lock table via global table ptr
The lock_table() method needs database, ks and cf to find the table on
all shards. The same can be achieved with the help of global_table_ptr
thing that all the core callers already have at hand.

There's a test that doesn't have global table, but it can get one.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#20139
2024-08-14 20:53:21 +03:00
Łukasz Paszkowski
43221bbeed clustering_key_filter: unify get_ranges and get_native_ranges
When a reverse slice is provided, it is given in the native reverse
format. Thus the ranges will be returned in the same order as stored
in the slice.

Therefore there is no need to distinguish between get_ranges and
get_native_ranges. The latter one gets dropped and get_ranges returns
ranges in the same order as stored in the slice.
2024-08-13 10:07:12 +02:00
Łukasz Paszkowski
da95f44adc readers: Use reversed schema and native reversed slices
The reconcilable_result is built as it would be constructed for
forward read queries for tables with reversed order.

Mutations constructed for reversed queries are consumed forward.

Drop overloaded reversed functions that reverse read_command and
reconcilable_result directly and keep only those requiring smart
pointers. They are not used any more.
2024-08-13 10:03:46 +02:00
Łukasz Paszkowski
faa62310d9 database: accept reversed schema for reversed queries
Remove schema reversing in query() and query_mutations() methods.
Instead, a reversed schema shall be passed for reversed queries.
Rename a schema variable from s into query_schema for readability.
2024-08-13 10:03:46 +02:00
Łukasz Paszkowski
b270097f1f config: drop reversed_reads_auto_bypass_cache
Reverse reads have already been with us for a while, thus this back
door option to bypass in-memory data cache for reversed queries can
be retired.
2024-08-13 10:02:42 +02:00
Łukasz Paszkowski
80df313f49 config: drop enable_optimized_reversed_reads
Reverse reads have already been with us for a while, thus this back
door option to read entire partitions forward and reverse them
afterwards can be retired.
2024-08-13 10:02:42 +02:00
Raphael S. Carvalho
74612ad358 tablets: Fix race between repair and split
Consider the following:

T
0   split prepare starts
1                               repair starts
2   split prepare finishes
3                               repair adds unsplit sstables
4                               repair ends
5   split executes

If repair produces sstable after split prepare phase, the replica
will not split that sstable later, as prepare phase is considered
completed already. That causes split execution to fail as replicas
weren't really prepared. This also can be triggered with
load-and-stream which shares the same write (consumer) path.

The approach to fix this is the same employed to prevent a race
between split and migration. If migration happens during prepare
phase, it can happen source misses the split request, but the
tablet will still be split on the destination (if needed).
Similarly, the repair writer becomes responsible for splitting
the data if underlying table is in split mode. That's implemented
in replica::table for correctness, so if node crashes, the new
sstable missing split is still split before added to the set.

Fixes #19378.
Fixes #19416.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-08-12 17:28:51 -03:00
Avi Kivity
318278ff92 Merge 'tablets: reload only changed metadata' from Botond Dénes
Currently, each change to tablet metadata triggers a full metadata reload from disk. This is very wasteful, especially if the metadata change affects only a single row in the `system.tablets` table. This is the case when the tablet load balancer triggers a migration: it affects a single row in the table, but today it triggers a full reload.
We expect tablet count to potentially grow to thousands and beyond and the overhead of this full reload can become significant.
This PR makes tablet metadata reload partial: instead of reloading all metadata on topology or schema changes, it reloads only the partitions that are affected by the change and copies the rest from the in-memory state.
This is done in two passes: first the change mutations are scanned and a hint is produced. This hint is then passed down to the reload code, which uses it to reload only the parts (rows/partitions) of the metadata that have actually changed.

The performance difference between full reload and partial reload is quite drastic:
```
INFO  2024-07-25 05:06:27,347 [shard 0:stat] testlog - Tablet metadata reload:
full      616.39ms
partial     0.18ms
```
This was measured with the modified (by this PR) `perf_tablets`, which creates 100 tables, each with 2K tablets. The test was modified to change a single tablet, then do a full and partial reload respectively, measuring the time it takes for each.

Fixes: #15294

New feature, no backport needed.

Closes scylladb/scylladb#15541

* github.com:scylladb/scylladb:
  test/perf/perf_tablets: add tablet metadata reload perf measurement
  test/boost/tablets_test: add test for partial tablet metadata updates
  db/schema_tables: pass tablet hint to update_tablet_metadata()
  service/storage_service: load_tablet_metadata(): add hint parameter
  service/migration_listener: update_tablet_metadata(): add hint parameter
  service/raft/group0_state_machine: provide tablet change hint on topology change
  service/storage_service: topology_state_load(): allow providing change hint
  replica/tablets: add update_tablet_metadata()
  replica/tablets: fix indentation
  replica/tablets: extract tablet_metadata builder logic
  replica/tablets: add get_tablet_metadata_change_hint() and update_tablet_metadata_change_hint()
  locator/tablets: add tablet_map::clear_tablet_transition_info()
  locator/tablets: make tablet_metadata cheap to copy
  mutation/canonical_mutation: add key()
2024-08-11 21:27:18 +03:00
Botond Dénes
bb1e733fe0 replica/tablets: add update_tablet_metadata()
Allows updating tablet metadata in-place, according to the provided
hint, reading and updating only the parts that actually changed.
2024-08-11 09:52:37 -04:00
Botond Dénes
66292b4baa replica/tablets: fix indentation
Left broken from the previous patch.
2024-08-11 09:52:37 -04:00
Botond Dénes
aa378c458e replica/tablets: extract tablet_metadata builder logic
So it can be reused in a new method.
Indentation is left broken deliberately, to make the patch easier to
read.
2024-08-11 09:52:37 -04:00
Botond Dénes
f5976aa87b replica/tablets: add get_tablet_metadata_change_hint() and update_tablet_metadata_change_hint()
Extract a hint of what a tablet mutation changed. The hint can be later
used to selectively reload only the changed parts from disk.
Two variants are added:
* get_tablet_metadata_change_hint() - extracts a hint from a list of
  tablet mutations
* update_tablet_metadata_change_hint() - updates an existing hint based
  on a single mutation, allowing for incremental hint extraction
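
A simplified model of the two variants and of how a hint can drive a selective reload, written in Python. The data shapes (a mutation as a `(table_id, changed_tokens)` pair, metadata as nested dicts) and the `reload_partial()` helper are assumptions made for illustration only.

```python
from collections import defaultdict


def update_hint(hint, mutation):
    """Fold one mutation into an existing hint (incremental variant)."""
    table_id, changed_tokens = mutation
    hint[table_id].update(changed_tokens)
    return hint


def get_hint(mutations):
    """Build a hint from a list of mutations (batch variant)."""
    hint = defaultdict(set)
    for m in mutations:
        update_hint(hint, m)
    return hint


def reload_partial(in_memory, on_disk, hint):
    """Copy the in-memory state and re-read only the hinted rows."""
    result = {t: dict(rows) for t, rows in in_memory.items()}
    for table_id, tokens in hint.items():
        for token in tokens:
            result.setdefault(table_id, {})[token] = on_disk[table_id][token]
    return result
```

The batch variant is expressed in terms of the incremental one, so both produce the same hint for the same mutations.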
2024-08-11 09:52:37 -04:00
Botond Dénes
0254cfc7d3 locator/tablets: make tablet_metadata cheap to copy
Keep lw_shared_ptr<tablet_map> in the tablet map and use COW semantics.
To prevent accidental changes to shared tablet_map instances, all
modifications to a tablet_map have to go through a new
`mutate_tablet_map()` method, which implements the copy-modify-swap
idiom.
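
The copy-modify-swap idiom can be sketched in Python as below. The class layout is hypothetical and uses plain dicts instead of `lw_shared_ptr<tablet_map>`. The point is only that copies share map objects and that a mutation replaces the shared object rather than changing it in place.

```python
import copy


class TabletMetadata:
    def __init__(self, maps=None):
        # table id -> tablet map; map objects are shared between copies
        self._maps = dict(maps or {})

    def copy(self):
        # Cheap: copies only the table-id -> map references.
        return TabletMetadata(self._maps)

    def get(self, table_id):
        return self._maps[table_id]

    def mutate_tablet_map(self, table_id, func):
        # Copy-modify-swap: never touch a possibly shared map in place.
        new_map = copy.deepcopy(self._maps[table_id])
        func(new_map)
        self._maps[table_id] = new_map
```

After a mutation on one copy, other copies still see the old map, which is what makes sharing safe.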
2024-08-11 09:52:37 -04:00
Calle Wilund
e18a855abe extensions: Add exception types for IO extensions and handle in memtable write path
Fixes #19960

The write path for sstables/commitlog needs to handle the fact that IO extensions can
generate errors, some of which should be considered retry-able, and some of which should,
similar to system IO errors, cause the node to go into isolate mode.

One option would of course be for extensions to simply generate std::system_errors,
with system_category and appropriate codes. But this is probably a bad idea, since
it makes it more muddy at which level an error happened, as well as limits the
expressibility of the error.

This adds three distinct types (sharing a base) distinguishing permission, availability
and configuration errors. These are treated akin to EACCES, ENOENT and EINVAL in
disk error handler and memtable write loop.
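
A rough Python sketch of such a hierarchy and of an errno-style classification. All class names and the `classify()` helper are hypothetical, and the sketch does not show which errors are retried and which isolate the node.

```python
import errno


class ExtensionStorageError(Exception):
    """Hypothetical common base for IO-extension errors."""
    errno_equivalent = None


class ExtensionPermissionError(ExtensionStorageError):
    errno_equivalent = errno.EACCES


class ExtensionUnavailableError(ExtensionStorageError):
    errno_equivalent = errno.ENOENT


class ExtensionConfigurationError(ExtensionStorageError):
    errno_equivalent = errno.EINVAL


def classify(exc):
    """Map an error to the errno-style code a handler would act on."""
    if isinstance(exc, ExtensionStorageError):
        return exc.errno_equivalent
    if isinstance(exc, OSError):
        return exc.errno
    return None
```

Keeping distinct types still records that the error came from an extension, while the handler can reuse its existing errno-based decisions.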

Tests updated to use and verify behaviour.

Closes scylladb/scylladb#19961
2024-08-11 13:52:35 +03:00
Raphael S. Carvalho
75829d75ec replica: Fix race between split compaction and migration
After removal of rwlock (53a6ec05ed), a race was introduced because the order in which
the compaction groups of a tablet are closed is no longer deterministic.

Some background first:
Split compaction runs in main (unsplit) group, and adds sstable to left and right groups
on completion.

The race works as follows:
1) split compaction starts on main group of tablet X
2) tablet X reaches cleanup stage, so its compaction groups are closed in parallel
3) left or right group is closed before main (more likely when only main has flush work to do)
4) split compaction completes, and adds sstable to left and right
5) if e.g. left is closed, adjusting backlog tracker will trigger an exception, and since that
happens in row cache update's execute(), node crashes.

The problem manifested as follows:
[shard 0: gms] raft_topology - Initiating tablet cleanup of 5739b9b0-49d4-11ef-828f-770894013415:15 on 102a904a-0b15-4661-ba3f-f9085a5ad03c:0
...
[shard 0:strm] compaction - [Split keyspace1.standard1 009e2f80-49e5-11ef-85e3-7161200fb137] Splitting [/var/lib/scylla/data/keyspace1/...]
...
[shard 0:strm] cache - Fatal error during cache update: std::out_of_range (Compaction state for table [0x600007772740] not found),
at: ...
   --------
   seastar::continuation<seastar::internal::promise_base_with_type<void>, row_cache::do_update(...
   --------
   seastar::internal::do_with_state<std::tuple<row_cache::external_updater, std::function<seastar::future<void> ()> >, seastar::future<void> >
   --------
   seastar::internal::coroutine_traits_base<void>::promise_type
   --------
   seastar::internal::coroutine_traits_base<void>::promise_type
   --------
   seastar::(anonymous namespace)::thread_wake_task
   --------
   seastar::continuation<seastar::internal::promise_base_with_type<sstables::compaction_result>, seastar::async<sstables::compaction::run(...
   seastar::continuation<seastar::internal::promise_base_with_type<sstables::compaction_result>, seastar::future<sstables::compaction_resu...

From the log above, it can be seen that the cache update failure happens under the streaming
sched group and during compaction completion, which was good evidence of the cause.
Problem was reproduced locally with the help of tablet shuffling.

Fixes: #19873.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#19987
2024-08-11 11:00:19 +03:00
Botond Dénes
1f4b9a5300 Merge 'compaction: drop compaction executors' possibility to bypass task manager' from Aleksandra Martyniuk
If parent_info argument of compaction_manager::perform_compaction
is std::nullopt, then created compaction executor isn't tracked by task
manager. Currently, all compaction operations should be visible in task
manager.

Modify split methods to keep split executor in task manager. Get rid of
the option to bypass task manager.

Closes scylladb/scylladb#19995

* github.com:scylladb/scylladb:
  compaction: replace optional<task_info> with task_info param
  compaction: keep split executor in task manager
2024-08-11 10:26:43 +03:00
Calle Wilund
d6742e9bce distributed_loader: Remove load_prio_keyspaces
Fixes #13334

All required code paths (see enterprise) now use
extensions::is_extension_internal_keyspace.
The old mechanism can be removed. One less global var.

Closes scylladb/scylladb#20047
2024-08-08 12:10:27 +03:00
Piotr Dulikowski
a038a1fdef Merge 'db: coroutinize do_apply_counter_update' from Michael Litvak
Rewrite the function as a coroutine to make it easier to read and maintain, following lifetime issues we had and fixed in this function.

The second commit adds a test that drops a table while there is a counter update operation ongoing in the table.
The test reproduces issue https://github.com/scylladb/scylla-enterprise/issues/4475 and verifies it is fixed.

Follow-up to https://github.com/scylladb/scylladb/pull/19948
Doesn't require backport because the fix to the issue was already done and backported. This is just cleanup and a test.

Closes scylladb/scylladb#19982

* github.com:scylladb/scylladb:
  db: test counter update while table is dropped
  db: coroutinize do_apply_counter_update
2024-08-05 10:08:18 +02:00
Botond Dénes
76b6e8c5aa Merge 'Drop datadir from keyspace::config' from Pavel Emelyanov
Commit ad0e6b79 (replica: Remove all_datadir from keyspace config) removed all_datadirs from keyspace config, now it's datadir's turn. After this change keyspace no longer references any on-disk directories, only the sstables' storage driver attached to keyspace's tables does.

refs #12707

Closes scylladb/scylladb#19866

* github.com:scylladb/scylladb:
  replica: Remove keyspace::config::datadir
  sstables/storage: Evaluate path for keyspace directory in storage
  sstables/storage: Add sstables_manager arg to init_keyspace_storage()
2024-08-05 09:46:29 +03:00
Avi Kivity
aa1270a00c treewide: change assert() to SCYLLA_ASSERT()
assert() is traditionally disabled in release builds, but not in
scylladb. This hasn't caused problems so far, but the latest abseil
release includes a commit [1] that causes a 1000 insn/op regression when
NDEBUG is not defined.

Clearly, we must move towards a build system where NDEBUG is defined in
release builds. But we can't just define it blindly without vetting
all the assert() calls, as some were written with the expectation that
they are enabled in release mode.

To solve the conundrum, change all assert() calls to a new SCYLLA_ASSERT()
macro in utils/assert.hh. This macro is always defined and is not conditional
on NDEBUG, so we can later (after vetting Seastar) enable NDEBUG in release
mode.
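
Python has a comparable split: the built-in `assert` statement is removed when the interpreter runs with `-O`, much like `assert()` under NDEBUG. The sketch below is only an analogy for an always-on check. `always_assert()` is a hypothetical helper and says nothing about how the macro in utils/assert.hh is written.

```python
def always_assert(condition, message="assertion failed"):
    """Check that stays active even when run with `python -O`."""
    if not condition:
        raise AssertionError(message)
```

Code that relies on a check for correctness would call the always-on helper, leaving the optimizable `assert` for checks that are safe to drop.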

[1] 66ef711d68

Closes scylladb/scylladb#20006
2024-08-05 08:23:35 +03:00
Aleksandra Martyniuk
c456a43173 compaction: replace optional<task_info> with task_info param
compaction_manager::perform_compaction does not create task manager
task for compaction if parent_info is set to std::nullopt. Currently,
we always want to create task manager task for compaction.

Remove optional from task info parameters which start compaction.
Track all compactions with task manager.
2024-08-02 14:38:46 +02:00
Aleksandra Martyniuk
108d0344b8 compaction: keep split executor in task manager
If perform_compaction gets std::nullopt as a parent info then
the executor won't be tracked by task manager.

Modify storage_group::split call so that it passes empty task_info
instead of nullopt to track split.
2024-08-02 12:45:32 +02:00
Michael Litvak
0f5e8c52ad db: test counter update while table is dropped
Add a test that drops a table while there is a counter update operation
ongoing in the table.
The test reproduces issue scylladb/scylla-enterprise#4475 and verifies
it is fixed.
2024-08-01 22:23:17 +03:00
Michael Litvak
22b282f5c5 db: coroutinize do_apply_counter_update
Rewrite the function as a coroutine to make it easier to read and maintain,
following lifetime issues we had and fixed in this function.
2024-08-01 19:09:04 +03:00
Michael Litvak
c944e28e43 db: fix waiting for counter update operations on table stop
When a table is dropped it should wait for all pending operations in the
table before the table is destroyed, because the operations may use the
table's resources.
With counter update operations, currently this is not the case. The
table may be destroyed while there is a counter update operation in
progress, causing an assert to be triggered due to a resource being
destroyed while it's in use.
The reason the operation is not waited for is a mistake in the lifetime
management of the object representing the write in progress. The commit
fixes it so the object lives for the duration of the entire counter
update operation, by moving it to the `do_with` list.

Fixes scylladb/scylla-enterprise#4475

Closes scylladb/scylladb#19948
2024-08-01 09:39:49 +02:00
Pavel Emelyanov
6357755624 replica: Remove keyspace::config::datadir
It's finally no longer used. Now only sstables storage code "knows" that
keyspace may have its on-disk directory.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-07-24 17:45:51 +03:00
Pavel Emelyanov
f767e25c8b sstables/storage: Evaluate path for keyspace directory in storage
Currently the init_keyspace_storage() expects that the caller would
tell it where the ks directory is, but it's not nice as keyspace may
not necessarily keep its sstables in any directory.

This patch moves the directory path evaluation into storage code,
specifically to the lambda that is called for on-disk sstables. The
way directory is evaluated mirrors the one from make_keyspace_config()
that will be removed by next patch.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-07-24 17:45:50 +03:00
Pavel Emelyanov
b02d20d12d Merge 'Minor improvements around compaction groups' from Raphael "Raph" Carvalho
Minor changes, no backporting needed.

Closes scylladb/scylladb#19723

* github.com:scylladb/scylladb:
  replica: rename for_each_const_compaction_group()
  replica: Fix comment about compaction group
  replica: remove unused compaction_group_vector
2024-07-24 11:22:24 +03:00
Botond Dénes
d3135db457 Merge 'commitlog: Add optional max lifetime parameter to cl instance' from Calle Wilund
If set, any remaining segment that has data older than this threshold will request flushing, regardless of data pressure. I.e. even a system where nothing happens will, after X seconds, flush data to free up the commit log.
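
The rule can be sketched as a small selection function. This is a Python illustration under assumed data shapes (segments as `(segment_id, oldest_write_time_s)` pairs). The function name and parameters are hypothetical and do not reflect the commitlog's internals.

```python
def segments_to_flush(segments, now, max_data_lifetime_s=None):
    """Return ids of segments whose oldest data exceeds the lifetime.

    segments: iterable of (segment_id, oldest_write_time_s) pairs.
    """
    if max_data_lifetime_s is None:  # the parameter is optional
        return []
    return [sid for sid, oldest in segments
            if now - oldest > max_data_lifetime_s]
```

With the threshold unset, nothing is selected and flushing stays driven by data pressure alone.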

Related to #15820

The functionality here is to prevent pathological/test cases where a silent system cannot fully process stuff like compaction, GC etc due to things like CL forcing smaller GC windows etc.

Closes scylladb/scylladb#15971

* github.com:scylladb/scylladb:
  commitlog: Make max data lifetime runtime-configurable
  db::config: Expose commitlog_max_data_lifetime_in_s parameter
  commitlog: Add optional max lifetime parameter to cl instance
2024-07-22 17:21:33 +03:00
Lakshmi Narayanan Sreethar
6a3e7a5e7a sstables/sstables_manager: store abort_source in sstable_manager
Add a new member that stores the abort_source. This can later be used by
the sstables to check if an abort has been requested. Also implement
sstables_manager::get_abort_source() that returns a const reference to
the abort source.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-07-16 20:36:06 +05:30