Commit Graph

1212 Commits

Author SHA1 Message Date
Asias He
0204372156 repair: Reject repair requests where start and end tokens are equal
When a user calls the repair API with identical startToken and endToken
values, the code creates a wrapping interval (T, T]. This causes
unwrap() to split it into (-inf, T] and (T, +inf), covering the entire
token ring and triggering a full repair.

Reject such requests early with an error message matching
Cassandra's behavior: "Start and end tokens must be different."

Fixes: https://scylladb.atlassian.net/browse/CUSTOMER-358

Closes scylladb/scylladb#29821
2026-05-11 14:08:20 +03:00
Avi Kivity
0ae22a09d4 LICENSE: Update to version 1.1
Updated terms of non-commercial use (must be a never-customer).
2026-04-12 19:46:33 +03:00
Botond Dénes
3289928679 repair: fix quadratic complexity when loading repair history
shared_tombstone_gc_state::update_repair_time() uses copy-on-write
semantics: each call copies the entire per_table_history_maps and the
per-table repair_history_map.  repair_service::load_history() called
this once per history entry, making the load O(N²) in both time and
memory.

Introduce batch_update_repair_time() which performs a single
copy-on-write for any number of entries belonging to the same table.
Restructure load_history() to collect entries into batches of up to
1000 and flush each batch in one call, keeping peak memory bounded.
The batch size limit is intentional: the repair history table currently
has no bound on the number of entries and can grow large.  Note that
this does not cause a problem in the in-memory history map itself:
entries are coalesced internally and only the latest repair time is
kept per range.  The unbounded entry count only makes the batched
update during load expensive.

Fixes: SCYLLADB-104

Closes scylladb/scylladb#29326
2026-04-11 23:54:26 +03:00
Botond Dénes
fbbe2bdce8 Merge 'Introduce repair_service::config and cut dependency from db::config' from Pavel Emelyanov
Spreading db::config around and making all services depend on it is not nice. Most other service that need configuration provide their own config that's populated from db::config in main.cc/cql_test_env.cc and use it, not the global config.

This PR does the same for repair_service.

Enhancing components dependencies, not backporting

Closes scylladb/scylladb#29153

* github.com:scylladb/scylladb:
  repair: Remove db/config.hh from repair/*.cc files
  repair: Move repair_multishard_reader options onto repair_service::config
  repair: Move critical_disk_utilization_level onto repair_service::config
  repair: Move repair_partition_count_estimation_ratio onto repair_service::config
  repair: Move repair_hints_batchlog_flush_cache_time_in_ms onto repair_service::config
  repair: Move enable_small_table_optimization_for_rbno onto repair_service::config
  repair: Introduce repair_service::config
2026-04-09 11:44:25 +03:00
Raphael S. Carvalho
16e387d5f9 repair/replica: Fix race window where post-repair data is wrongly promoted to repaired
During incremental repair, each tablet replica holds three SSTable views:
UNREPAIRED, REPAIRING, and REPAIRED.  The repair lifecycle is:

  1. Replicas snapshot unrepaired SSTables and mark them REPAIRING.
  2. Row-level repair streams missing rows between replicas.
  3. mark_sstable_as_repaired() runs on all replicas, rewriting the
     SSTables with repaired_at = sstables_repaired_at + 1 (e.g. N+1).
  4. The coordinator atomically commits sstables_repaired_at=N+1 and
     the end_repair stage to Raft, then broadcasts
     repair_update_compaction_ctrl which calls clear_being_repaired().

The bug lives in the window between steps 3 and 4.  After step 3, each
replica has on-disk SSTables with repaired_at=N+1, but sstables_repaired_at
in Raft is still N.  The classifier therefore sees:

  is_repaired(N, sst{repaired_at=N+1}) == false
  sst->being_repaired == null   (lost on restart, or not yet set)

and puts them in the UNREPAIRED view.  If a new write arrives and is
flushed (repaired_at=0), STCS minor compaction can fire immediately and
merge the two SSTables.  The output gets repaired_at = max(N+1, 0) = N+1
because compaction preserves the maximum repaired_at of its inputs.

Once step 4 commits sstables_repaired_at=N+1, the compacted output is
classified REPAIRED on the affected replica even though it contains data
that was never part of the repair scan.  Other replicas, which did not
experience this compaction, classify the same rows as UNREPAIRED.  This
divergence is never healed by future repairs because the repaired set is
considered authoritative.  The result is data resurrection: deleted rows
can reappear after the next compaction that merges unrepaired data with the
wrongly-promoted repaired SSTable.

The fix has two layers:

Layer 1 (in-memory, fast path): mark_sstable_as_repaired() now also calls
mark_as_being_repaired(session) on the new SSTables it writes.  This keeps
them in the REPAIRING view from the moment they are created until
repair_update_compaction_ctrl clears the flag after step 4, covering the
race window in the normal (no-restart) case.

Layer 2 (durable, restart-safe): a new is_being_repaired() helper on
tablet_storage_group_manager detects the race window even after a node
restart, when being_repaired has been lost from memory.  It checks:

  sst.repaired_at == sstables_repaired_at + 1
  AND tablet transition kind == tablet_transition_kind::repair

Both conditions survive restarts: repaired_at is on-disk in SSTable
metadata, and the tablet transition is persisted in Raft.  Once the
coordinator commits sstables_repaired_at=N+1 (step 4), is_repaired()
returns true and the SSTable naturally moves to the REPAIRED view.

The classifier in make_repair_sstable_classifier_func() is updated to call
is_being_repaired(sst, sstables_repaired_at) in place of the previous
sst->being_repaired.uuid().is_null() check.

A new test, test_incremental_repair_race_window_promotes_unrepaired_data,
reproduces the bug by:
  - Running repair round 1 to establish sstables_repaired_at=1.
  - Injecting delay_end_repair_update to hold the race window open.
  - Running repair round 2 so all replicas complete mark_sstable_as_repaired
    (repaired_at=2) but the coordinator has not yet committed step 4.
  - Writing post-repair keys to all replicas and flushing servers[1] to
    create an SSTable with repaired_at=0 on disk.
  - Restarting servers[1] so being_repaired is lost from memory.
  - Waiting for autocompaction to merge the two SSTables on servers[1].
  - Asserting that the merged SSTable contains post-repair keys (the bug)
    and that servers[0] and servers[2] do not see those keys as repaired.

NOTE FOR MAINTAINER: Copilot initially only implemented Layer 1 (the
in-memory being_repaired guard), missing the restart scenario entirely.
I pointed out that being_repaired is lost on restart and guided Copilot
to add the durable Layer 2 check.  I also polished the implementation:
moving is_being_repaired into tablet_storage_group_manager so it can
reuse the already-held _tablet_map (avoiding an ERM lookup and try/catch),
passing sstables_repaired_at in from the classifier to avoid re-reading it,
and using compaction_group_for_sstable inside the function rather than
threading a tablet_id parameter through the classifier.

Fixes https://scylladb.atlassian.net/browse/SCYLLADB-1239.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Closes scylladb/scylladb#29244
2026-04-09 11:42:28 +03:00
Pavel Emelyanov
4bc8ec174c repair: Remove db/config.hh from repair/*.cc files
Now all the code uses repair_service::config and no longer needs global
config description.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2026-03-20 19:36:50 +03:00
Pavel Emelyanov
35f625e5c7 repair: Move repair_multishard_reader options onto repair_service::config
This actually uses two interconnected options:
repair_multishard_reader_buffer_hint_size and
repair_multishard_reader_enable_read_ahead.

Both are propagated through repair_service::config and pass their
values to repair_reader/make_reader at construction time.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-20 19:36:50 +03:00
Pavel Emelyanov
9bc0d27aae repair: Move critical_disk_utilization_level onto repair_service::config
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-20 19:23:47 +03:00
Pavel Emelyanov
80aa0fcdc2 repair: Move repair_partition_count_estimation_ratio onto repair_service::config
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-20 19:23:47 +03:00
Pavel Emelyanov
585cb0c718 repair: Move repair_hints_batchlog_flush_cache_time_in_ms onto repair_service::config
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-20 19:23:47 +03:00
Pavel Emelyanov
d8f7f86e10 repair: Move enable_small_table_optimization_for_rbno onto repair_service::config
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-20 19:23:47 +03:00
Pavel Emelyanov
38a23ff927 repair: Introduce repair_service::config
Most other services have their configs, rpair still uses global
db::config.

Add an empty config struct to repair_service to carry db::config options
the repair service needs.

Subsequent patches will populate the struct with options.

The config is created in main.cc as sharded_parameter because all future
options are live-updateable and should capture theirs source from
db::config on correct shard.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2026-03-20 19:23:47 +03:00
Asias He
6cb263bab0 repair: Prevent CPU stall during cross-shard row copy and destruction
When handling `repair_stream_cmd::end_of_current_rows`, passing the
foreign list directly to `put_row_diff_handler` triggered a massive
synchronous deep copy on the destination shard. Additionally, destroying
the list triggered a synchronous deallocation on the source shard. This
blocked the reactor and triggered the CPU stall detector.

This commit fixes the issue by introducing `clone_gently()` to copy the
list elements one by one, and leveraging the existing
`utils::clear_gently()` to destroy them. Both utilize
`seastar::coroutine::maybe_yield()` to allow the reactor to breathe
during large cross-shard transfers and cleanups.

Fixes SCYLLADB-403

Closes scylladb/scylladb#28979
2026-03-17 11:05:15 +02:00
Botond Dénes
475220b9c9 Merge 'Remove the rest of pre raft topology code' from Gleb Natapov
Remove the rest of the code that assumes that either group0 does not exist yet or a cluster is till not upgraded to raft topology. Both of those are not supported any more.

No need to backport since we remove functionality here.

Closes scylladb/scylladb#28841

* github.com:scylladb/scylladb:
  service level: remove version 1 service level code
  features: move GROUP0_SCHEMA_VERSIONING to deprecated features list
  migration_manager: remove unused forward definitions
  test: remove unused code
  auth: drop auth_migration_listener since it does nothing now
  schema: drop schema_registry_entry::maybe_sync() function
  schema: drop make_table_deleting_mutations since it should not be needed with raft
  schema: remove calculate_schema_digest function
  schema: drop recalculate_schema_version function and its uses
  migration_manager: drop check for group0_schema_versioning feature
  cdc: drop usage of cdc_local table and v1 generation definition
  storage_service: no need to add yourself to the topology during reboot since raft state loading already did it
  storage_service: remove unused functions
  group0: drop with_raft() function from group0_guard since it always returns true now
  gossiper: do not gossip TOKENS and CDC_GENERATION_ID any more
  gossiper: drop tokens from loaded_endpoint_state
  gossiper: remove unused functions
  storage_service: do not pass loaded_peer_features to join_topology()
  storage_service: remove unused fields from replacement_info
  gossiper: drop is_safe_for_restart() function and its use
  storage_service: remove unused variables from join_topology
  gossiper: remove the code that was only used in gossiper topology
  storage_service: drop the check for raft mode from recovery code
  cdc: remove legacy code
  test: remove unused injection points
  auth: remove legacy auth mode and upgrade code
  treewide: remove schema pull code since we never pull schema any more
  raft topology: drop upgrade_state and its type from the topology state machine since it is not used any longer
  group0: hoist the checks for an illegal upgrade into main.cc
  api: drop get_topology_upgrade_state and always report upgrade status as done
  service_level_controller: drop service level upgrade code
  test: drop run_with_raft_recovery parameter to cql_test_env
  group0: get rid of group0_upgrade_state
  storage_service: drop topology_change_kind as it is no longer needed
  storage_service: drop check_ability_to_perform_topology_operation since no upgrades can happen any more
  service_storage: remove unused functions
  storage_service: remove non raft rebuild code
  storage_service: set topology change kind only once
  group0: drop in_recovery function and its uses
  group0: rename use_raft to maintenance_mode and make it sync
2026-03-11 10:24:20 +02:00
Botond Dénes
81e214237f Merge 'Add digests for all sstable components in scylla metadata' from Taras Veretilnyk
This pull request adds support for calculation and storing CRC32 digests for all SSTable components.
This change replaces plain file_writer with crc32_digest_file_writer for all SSTable components that should be checksummed. The resulting component digests are stored in the sstable structure
and later persisted to disk as part of the Scylla metadata component during writer::consume_end_of_stream.
Several test cases where introduced to verify expected behaviour.

Additionally, this PR adds new rewrite component mechanism for safe sstable component rewriting.
Previously, rewriting an sstable component (e.g., via rewrite_statistics) created a temporary file that was renamed to the final name after sealing. This allowed crash recovery by simply removing the temporary file on startup.

However, with component digests stored in scylla_metadata (#20100),
replacing a component like Statistics requires atomically updating both the component
and scylla_metadata with the new digest - impossible with POSIX rename.

The new mechanism creates a clone sstable with a fresh generation:

- Hard-links all components from the source except the component being rewritten and scylla_metadata
- Copies original sstable components pointer and recognized components from the source
- Invokes a modifier callback to adjust the new sstable before rewriting
- Writes the modified component along with updated scylla_metadata containing the new digest
- Seals the new sstable with a temporary TOC
- Replaces the old sstable atomically, the same way as it is done in compaction

This is built on the rewrite_sstables compaction framework to support batch operations (e.g., following incremental repair).
In case of any failure durning the whole process, sstable will be automatically deleted on the node startup due to
temporary toc persistence.

Backport is not required, it is a new feature

Fixes https://github.com/scylladb/scylladb/issues/20100, https://github.com/scylladb/scylladb/issues/27453

Closes scylladb/scylladb#28338

* github.com:scylladb/scylladb:
  docs: document components_digests subcomponent and trailing digest in Scylla.db
  sstable_compaction_test: Add tests for perform_component_rewrite
  sstable_test: add verification testcases of SSTable components digests persistance
  sstables: store digest of all sstable components in scylla metadata
  sstables: replace rewrite_statistics with new rewrite component mechanism
  sstables: add new rewrite component mechanism for safe sstable component rewriting
  compaction: add compaction_group_view method to specify sstable version
  sstables: add null_data_sink and serialized_checksum for checksum-only calculation
  sstables: extract default write open flags into a constant
  sstables: Add write_simple_with_digest for component checksumming
  sstables: Extract file writer closing logic into separate methods
  sstables: Implement CRC32 digest-only writer
2026-03-10 16:02:53 +02:00
Gleb Natapov
02fc4ad0a9 treewide: remove schema pull code since we never pull schema any more
Schema pull was used by legacy schema code which is not supported for a
long time now and during legacy recovery which is no longer supported as
well. It can be dropped now.
2026-03-10 10:09:39 +02:00
Botond Dénes
509f2af8db Merge 'repair: Fix rwlock in compaction_state and lock holder lifecycle' from Raphael Raph Carvalho
Consider this:

- repair takes the lock holder
- tablet merge filber destories the compaction group and the compaction state
- repair fails
- repair destroy the lock holder

This is observed in the test:

```
repair - repair[5d73d094-72ee-4570-a3cc-1cd479b2a036] Repair 1 out of 1 tablets: table=sec_index.users range=(432345564227567615,504403158265495551] replicas=[0e9d51a5-9c99-4d6e-b9db-ad36a148b0ea:15, 498e354c-1254-4d8d-a565-2f5c6523845a:9, 5208598c-84f0-4526-bb7f-573728592172:28]

...

repair - repair[5d73d094-72ee-4570-a3cc-1cd479b2a036]: Started to repair 1 out of 1 tables in keyspace=sec_index, table=users, table_id=ea2072d0-ccd9-11f0-8dba-c5ab01bffb77, repair_reason=repair
repair - Enable incremental repair for table=sec_index.users range=(432345564227567615,504403158265495551]
table - Disabled compaction for range=(432345564227567615,504403158265495551] session_id=a13a72cc-cd2d-11f0-8e9b-76d54580ab09 for incremental repair
table - Got unrepaired compaction and repair lock for range=(432345564227567615,504403158265495551] session_id=a13a72cc-cd2d-11f0-8e9b-76d54580ab09 for incremental repair
table - Disabled compaction for range=(432345564227567615,504403158265495551] session_id=a13a72cc-cd2d-11f0-8e9b-76d54580ab09 for incremental repair
table - Got unrepaired compaction and repair lock for range=(432345564227567615,504403158265495551] session_id=a13a72cc-cd2d-11f0-8e9b-76d54580ab09 for incremental repair
repair - repair[5d73d094-72ee-4570-a3cc-1cd479b2a036]: get_sync_boundary: got error from node=0e9d51a5-9c99-4d6e-b9db-ad36a148b0ea, keyspace=sec_index, table=users, range=(432345564227567615,504403158265495551], error=seastar::rpc::remote_verb_error (Compaction state for table [0x60f008fa34c0] not found)
compaction_manager - Stopping 1 tasks for 1 ongoing compactions for table sec_index.users compaction_group=238 due to tablet merge
compaction_manager - Stopping 1 tasks for 1 ongoing compactions for table sec_index.users compaction_group=238 due to tablet merge

....

scylla[10793] Segmentation fault on shard 28, in scheduling group streaming
```

The rwlock in compaction_state could be destroyed before the lock holder
of the rwlock is destroyed. This causes user after free when the lock
the holder is destroyed.

To fix it, users of repair lock will now be waited when a compaction
group is being stopped.
That way, compaction group - which controls the lifetime of rwlock -
cannot be destroyed while the lock is held.
Additionally, the merge completion fiber - that might remove groups -
is properly serialized with incremental repair.

The issue can be reproduced using sanitize build consistently and can not
be reproduced after the fix.

Fixes #27365

Closes scylladb/scylladb#28823

* github.com:scylladb/scylladb:
  repair: Fix rwlock in compaction_state and lock holder lifecycle
  repair: Prevent repair lock holder leakage after table drop
2026-03-05 14:18:25 +02:00
Asias He
225b10b683 repair: Fix rwlock in compaction_state and lock holder lifecycle
Consider this:

- repair takes the lock holder
- tablet merge filber destories the compaction group and the compaction state
- repair fails
- repair destroy the lock holder

This is observed in the test:

```
repair - repair[5d73d094-72ee-4570-a3cc-1cd479b2a036] Repair 1 out of 1 tablets: table=sec_index.users range=(432345564227567615,504403158265495551] replicas=[0e9d51a5-9c99-4d6e-b9db-ad36a148b0ea:15, 498e354c-1254-4d8d-a565-2f5c6523845a:9, 5208598c-84f0-4526-bb7f-573728592172:28]

...

repair - repair[5d73d094-72ee-4570-a3cc-1cd479b2a036]: Started to repair 1 out of 1 tables in keyspace=sec_index, table=users, table_id=ea2072d0-ccd9-11f0-8dba-c5ab01bffb77, repair_reason=repair
repair - Enable incremental repair for table=sec_index.users range=(432345564227567615,504403158265495551]
table - Disabled compaction for range=(432345564227567615,504403158265495551] session_id=a13a72cc-cd2d-11f0-8e9b-76d54580ab09 for incremental repair
table - Got unrepaired compaction and repair lock for range=(432345564227567615,504403158265495551] session_id=a13a72cc-cd2d-11f0-8e9b-76d54580ab09 for incremental repair
table - Disabled compaction for range=(432345564227567615,504403158265495551] session_id=a13a72cc-cd2d-11f0-8e9b-76d54580ab09 for incremental repair
table - Got unrepaired compaction and repair lock for range=(432345564227567615,504403158265495551] session_id=a13a72cc-cd2d-11f0-8e9b-76d54580ab09 for incremental repair
repair - repair[5d73d094-72ee-4570-a3cc-1cd479b2a036]: get_sync_boundary: got error from node=0e9d51a5-9c99-4d6e-b9db-ad36a148b0ea, keyspace=sec_index, table=users, range=(432345564227567615,504403158265495551], error=seastar::rpc::remote_verb_error (Compaction state for table [0x60f008fa34c0] not found)
compaction_manager - Stopping 1 tasks for 1 ongoing compactions for table sec_index.users compaction_group=238 due to tablet merge
compaction_manager - Stopping 1 tasks for 1 ongoing compactions for table sec_index.users compaction_group=238 due to tablet merge

....

scylla[10793] Segmentation fault on shard 28, in scheduling group streaming
```

The rwlock in compaction_state could be destroyed before the lock holder
of the rwlock is destroyed. This causes user after free when the lock
the holder is destroyed.

To fix it, users of repair lock will now be waited when a compaction
group is being stopped.
That way, compaction group - which controls the lifetime of rwlock -
cannot be destroyed while the lock is held.
Additionally, the merge completion fiber - that might remove groups -
is properly serialized with incremental repair.

The issue can be reproduced using sanitize build consistently and can not
be reproduced after the fix.

Fixes #27365

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2026-03-03 21:05:15 -03:00
Raphael S. Carvalho
1d8903d9f7 repair: Prevent repair lock holder leakage after table drop
Prevent repair lock holder from being leaked in repair_service when table
is dropped midway.
The leakage might result in use-after-free later, since the repair lock
itself will be gone after table drop.
The RPC verb that removes the lock on success path will not be called
by coordinator after table was dropped.

Refs #27365.
Fixes https://scylladb.atlassian.net/browse/SCYLLADB-896.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2026-03-03 21:05:10 -03:00
Marcin Maliszkiewicz
a83ee6cf66 Merge 'db/batchlog_manager: re-add v1 support for mixed clusters' from Botond Dénes
3f7ee3ce5d introduced system.batchlog_v2, with a schema designed to speed up batchlog replays and make post-replay cleanups much more effective.
It did not introduce a cluster feature for the new table, because it is node local table, so the cluster can switch to the new table gradually, one node at a time.
However, https://github.com/scylladb/scylladb/issues/27886 showed that the switching causes timeouts during upgrades, in mixed clusters. Furthermore, switching to the new table unconditionally  on upgrades nodes, means that on rollback, the batches saved into the v2 table are lost.
This PR introduces re-introduces v1 (`system.batchlog`) support and guards the use of the v2 table with a cluster feature, so mixed clusters keep using v1 and thus be rollback-compatible.
The re-introduced v1 support doesn't support post-replay cleanups for simplicity. The cleanup in v1 was never particularly effective anyway and we ended up disabling it for heavy batchlog users, so I don't think the lack of support for cleanup is a problem.

Fixes: https://github.com/scylladb/scylladb/issues/27886

Needs backport to 2026.1, to fix upgrades for clusters using batches

Closes scylladb/scylladb#28736

* github.com:scylladb/scylladb:
  test/boost/batchlog_manager_test: add tests for v1 batchlog
  test/boost/batchlog_manager_test: make prepare_batches() work with both v1 and v2
  test/boost/batchlog_manager_test: fix indentation
  test/boost/batchlog_manager_test: extract prepare_batches() method
  test/lib/cql_assertions: is_rows(): add dump parameter
  tools/scylla-sstable: extract query result printers
  tools/scylla-sstable: add std::ostream& arg to query result printers
  repair/row_level: repair_flush_hints_batchlog_handler(): add all_replayed to finish log
  db/batchlog_manager: re-add v1 support
  db/batchlog_manager: return all_replayed from process_batch()
  db/batchlog_manager: process_bath() fix indentation
  db/batchlog_manager: make batch() a standalone function
  db/batchlog_manager: make structs stats public
  db/batchlog_manager: allocate limiter on the stack
  db/batchlog_manager: add feature_service dependency
  gms/feature_service: add batchlog_v2 feature
2026-03-02 12:09:10 +01:00
Taras Veretilnyk
5bbc44ed12 sstables: replace rewrite_statistics with new rewrite component mechanism
This commits migrates all callers that used rewrite_statistics to new
rewrite component mechanism.
2026-02-26 22:38:55 +01:00
Gleb Natapov
6173ea476b node_ops: remove topology over node ops code
The code is no longer called.
2026-02-25 10:08:32 +02:00
Botond Dénes
0549b61d55 repair/row_level: repair_flush_hints_batchlog_handler(): add all_replayed to finish log
Provides visibility into whether batchlog replay was successful or not.
2026-02-20 07:03:46 +02:00
Aleksandra Martyniuk
3fe596d556 service: pass topology guard to RBNO
Currently, raft-based node operations with streaming use topology
guards, but repair-based don't.

Topology guards ensure that if a respective session is closed
(the operation has finished), each leftover operation being a part
of this session fails. Thanks to that we won't incorrectly assume
that e.g. the old rpc received late belongs to the newly started
operation. This is especially important if the operation involves
writes.

Pass a topology_guard down from raft_topology_cmd_handler to repair
tasks. Repair tasks already support topology guards.

Fixes: https://github.com/scylladb/scylladb/issues/27759
2026-01-20 10:06:34 +01:00
Asias He
7ba7b25bdd repair: Implement auto repair for tablet repair
This patch implements the basic auto repair support for tablet repair.

It was decided to add no per table configuration for the initial
implementation, so two scylla yaml config options are introduced to set
the default auto repair configs for all the tablet tables.

- auto_repair_enabled_default

Set true to enable auto repair for tablet tables by default. The value
will be overridden by the per keyspace or per table configuration which
is not implemented yet.

- auto_repair_threshold_default_in_seconds

Set the default time in seconds for the auto repair threshold for tablet
tables. If the time since last repair is bigger than the configured
time, the tablet is eligible for auto repair. The value will be
overridden by the per keyspace or per table configuration which is not
implemented yet.

The following metrcis are added:

- auto_repair_needs_repair_nr

The number of tablets with auto repair enabled that needs repair

- auto_repair_enabled_nr

The number of tablets with auto repair enabled

The metrics are useful to tell if auto repair is falling behind.

In the future, more auto repair scheduling will be added, e.g.,
scheduling based on the repaired and unrepaired sstable set size,
tombstone ratio and so on, in addition to the time based scheduling.

Fixes SCYLLADB-99
2026-01-09 16:11:39 +08:00
Asias He
4f77dd058d repair: Add tablet repair progress report support
This patch adds tablet repair progress report support so that the user
could use the /task_manager/task_status API to query the progress.

In order to support this, a new system table is introduced to record the
user request related info, i.e, start of the request and end of the
request.

The progress is accurate when tablet split or merge happens in the
middle of the request, since the tokens of the tablet are recorded when
the request is started and when repair of each tablet is finished. The
original tablet repair is considered as finished when the finished
ranges cover the original tablet token ranges.

After this patch, the /task_manager/task_status API will report correct
progress_total and progress_completed.

Fixes #22564
Fixes #26896

Closes scylladb/scylladb#27679
2026-01-08 21:55:18 +02:00
Asias He
0aabf51380 repair: Fix sstable_list_to_mark_as_repaired with multishard writer
It was obseved:

```
test_repair_disjoint_row_2nodes_diff_shard_count was spuriously failing due to
segfault.

backtrace pointed to a failure when allocating an object from the chain of
freed objects, which indicates memory corruption.

(gdb) bt
    at ./seastar/include/seastar/core/shared_ptr.hh:275
    at ./seastar/include/seastar/core/shared_ptr.hh:430
Usual suspect is use-after-free, so ran the reproducer in the sanitize mode,
which indicated shared ptr was being copied into another cpu through the
multi shard writer:

seastar - shared_ptr accessed on non-owner cpu, at: ...
--------
seastar::smp_message_queue::async_work_item<mutation_writer::multishard_writer::make_shard_writer...

```

The multishard writer itself was fine, the problem was in the streaming consumer
for repair copying a shared ptr. It could work fine with same smp setting, since
there will be only 1 shard in the consumer path, from rpc handler all the way
to the consumer. But with mixed smp setting, the ptr would be copied into the
cpus involved, and since the shared ptr is not cpu safe, the refcount change
can go wrong, causing double free, use-after-free.

To fix, we pass a generic incremental repair handler to the streaming
consumer. The handler is safe to be copied to different shards. It will
be a no op if incremental repair is not enabled or on a different shard.

A reproducer test is added. The test could reproduce the crash
consistently before the fix and work well after the fix.

Fixes #27666

Closes scylladb/scylladb#27870
2026-01-08 21:55:18 +02:00
Asias He
3abda7d15e topology_coordinator: Ensure repair_update_compaction_ctrl is executed
Consider this:

- n1 is a coordinator and schedules tablet repair
- n1 detects tablet repair failed, so it schedules tablet transition to end_repair state
- n1 loses leadership and n2 becomes the new topology coordinator
- n2 runs end_repair on the tablet with session_id=00000000-0000-0000-0000-000000000000
- when a new tablet repair is scheduled, it hangs since the lock is already taken because it was not removed in previous step

To fix, we use the global_tablet_id to index the lock instead of the
session id.

In addition, we retry the repair_update_compaction_ctrl verb in case of
error to ensure the verb is eventually executed. The verb handler is
also updated to check if it is still in end_repair stage.

Fixes #26346

Closes scylladb/scylladb#27740
2025-12-31 13:17:18 +01:00
Botond Dénes
bfdd4f7776 Merge 'Synchronize incremental repair and tablet split' from Raphael Raph Carvalho
Split prepare can run concurrently with repair.

Consider this:

1) split prepare starts
2) incremental repair starts
3) split prepare finishes
4) incremental repair produces unsplit sstable
5) split is not happening on sstable produced by repair
        5.1) that sstable is not marked as repaired yet
        5.2) might belong to repairing set (has compaction disabled)
6) split executes
7) repairing or repaired set has unsplit sstable

If split was acked to coordinator (meaning prepare phase finished),
repair must make sure that all sstables produced by it are split.
It's not happening today with incremental repair because it disables
split on sstables belonging to repairing group. And there's a window
where sstables produced by repair belong to that group.

To solve the problem, we want the invariant where all sealed sstables
will be split.
To achieve this, streaming consumers are patched to produce unsealed
sstable, and the new variant add_new_sstable_and_update_cache() will
take care of splitting the sstable while it's unsealed.
If no split is needed, the new sstable will be sealed and attached.

This solution was also needed to interact nicely with out of space
prevention too. If disk usage is critical, split must not happen on
restart, and the invariant aforementioned allows for it, since any
unsplit sstable left unsealed will be discarded on restart.
The streaming consumer will fail if disk usage is critical too.

The reason interposer consumer doesn't fully solve the problem is
because incremental repair can start before split, and the sstable
being produced when split decision was emitted must be split before
attached. So we need a solution which covers both scenarios.

Fixes #26041.
Fixes #27414.

Should be backported to 2025.4 that contains incremental repair

Closes scylladb/scylladb#26528

* github.com:scylladb/scylladb:
  test: Add reproducer for split vs intra-node migration race
  test: Verify split failure on behalf of repair during critical disk utilization
  test: boost: Add failure_when_adding_new_sstable_test
  test: Add reproducer for split vs incremental repair race condition
  compaction: Fail split of new sstable if manager is disabled
  replica: Don't split in do_add_sstable_and_update_cache()
  streaming: Leave sstables unsealed until attached to the table
  replica: Wire add_new_sstables_and_update_cache() into intra-node streaming
  replica: Wire add_new_sstable_and_update_cache() into file streaming consumer
  replica: Wire add_new_sstable_and_update_cache() into streaming consumer
  replica: Document old add_sstable_and_update_cache() variants
  replica: Introduce add_new_sstables_and_update_cache()
  replica: Introduce add_new_sstable_and_update_cache()
  replica: Account for sstables being added before ACKing split
  replica: Remove repair read lock from maybe_split_new_sstable()
  compaction: Preserve state of input sstable in maybe_split_new_sstable()
  Rename maybe_split_sstable() to maybe_split_new_sstable()
  sstables: Allow storage::snapshot() to leave destination sstable unsealed
  sstables: Add option to leave sstable unsealed in the stream sink
  test: Verify unsealed sstable can be compacted
  sstables: Allow unsealed sstable to be loaded
  sstables: Restore sstable_writer_config::leave_unsealed
2025-12-23 07:28:56 +02:00
Calle Wilund
5f8f724d78 repair: Don't use off-strategy as repair destination with tablet tables
Fixes #17384

Bypasses enabling off-strategy storage/placement for repair streams
when table repaired is using tablets. Instead, the resulting sstable(s)
will be placed in the "normal" set of sstables, and bypass a post-repair
off-strategy compaction.

v2:
Bypass off-strat for whatever reason iff dest is tablets.

Closes scylladb/scylladb#27500
2025-12-16 06:54:07 +02:00
Raphael S. Carvalho
77a4f95eb8 test: Add reproducer for split vs incremental repair race condition
Refs #26041.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2025-12-12 17:01:16 -03:00
Avi Kivity
24264e24bb Revert "repair: Add tablet repair progress report support"
This reverts commit faad0167d7. It causes
a regression in

test_two_tablets_concurrent_repair_and_migration_repair_writer_level

in debug mode (with ~5%-10% probability).

Fixes #27510.

Closes scylladb/scylladb#27560
2025-12-11 12:18:11 +02:00
Asias He
faad0167d7 repair: Add tablet repair progress report support
This patch adds tablet repair progress report support so that the user
could use the /task_manager/task_status API to query the progress.

In order to support this, a new system table is introduced to record the
user request related info, i.e, start of the request and end of the
request.

The progress is accurate when tablet split or merge happens in the
middle of the request, since the tokens of the tablet are recorded when
the request is started and when repair of each tablet is finished. The
original tablet repair is considered as finished when the finished
ranges cover the original tablet token ranges.

After this patch, the /task_manager/task_status API will report correct
progress_total and progress_completed.

Fixes #22564
Fixes #26896

Closes scylladb/scylladb#26924
2025-12-08 13:35:19 +02:00
Asias He
e97a504775 repair: Allow min max range to be updated for repair history
It is observed that:

repair - repair[667d4a59-63fb-4ca6-8feb-98da49946d8b]: Failed to update
system.repair_history table of node d27de212-6f32-4649ad76-a9ef1165fdcb:
seastar::rpc::remote_verb_error (repair[667d4a59-63fb-4ca6-8feb-98da49946d8b]: range (minimum
token,maximum token) is not in the format of (start, end])

This is because repair checks the end of the range to be repaired needs
to be inclusive. When small_table_optimization is enabled for regular
repair, a (minimum token,maximum token) will be used.

To fix, we can relax the check of (start, end] for the min max range.

Fixes #27220

Closes scylladb/scylladb#27357
2025-12-05 10:41:25 +02:00
Aleksandra Martyniuk
e3e81a9a7a repair: throw if flush failed in get_flush_time
Currently, _flush_time was stored as a std::optional<gc_clock::time_point>
and std::nullopt indicates that the flush was needed but failed. It's confusing
for the caller and does not work as expected since the _flush_time is initialized
with value (not optional).

Change _flush_time type to gc_clock::time_point. If a flush is needed but failed,
get_flush_time() throws an exception.

This was suppose to be a part of https://github.com/scylladb/scylladb/pull/26319
but it was mistakenly overwritten during rebases.

Refs: https://github.com/scylladb/scylladb/issues/24415.

Closes scylladb/scylladb#26794
2025-12-04 11:45:53 +02:00
Asias He
da5cc13e97 repair: Fix deadlock when topology coordinator steps down in the middle
Consider this:

1) n1 is the topology coordinator
2) n1 schedules and executes a tablet repair with session id s1 for a
tablet on n3 an n4.
3) n3 and n4 take and store the in _rs._repair_compaction_locks[s1]
4) n1 steps down before it executes
locator::tablet_transition_stage::end_repair
5) n2 becomes the new topology coordinator
6) n2 runs locator::tablet_transition_stage::repair again
7) n3 and n4 try to take the lock again and hangs since the lock is
already taken.

To avoid the deadlock, we can throw in step 7 so that n2 will
proceed to end_repair stage and release the lock. After that, the
scheduler could schedule the tablet repair request again.

Fixes #26346

Closes scylladb/scylladb#27163
2025-11-28 15:14:39 +01:00
Radosław Cybulski
d589e68642 Add precompiled headers to CMakeLists.txt
Add precompiled header support to CMakeLists.txt and configure.py -
it improves compilation time by approximately 10%.

New header `stdafx.hh` is added, don't include it manually -
the compiler will include it for you. The header contains includes from
external libraries used by Scylla - seastar, standard library,
linux headers and zlib.

The feature is enabled by default, use CMake option `Scylla_USE_PRECOMPILED_HEADER`
or configure.py --disable-precompiled-header to disable.

The feature should be disabled, when trying to check headers - otherwise
you might get false negatives on missing includes from seastar / abseil and so on.

Note: following configuration needs to be added to ccache.conf:

    sloppiness = pch_defines,time_macros,include_file_mtime,include_file_ctime

Closes scylladb/scylladb#26617
2025-11-21 12:27:41 +02:00
Asias He
dbeca7c14d repair: Add metric for time spent on tablet repair
It is useful to check time spent on tablet repair. It can be used to
compare incremental repair and non-incremental repair. The time does not
include the time waiting for the tablet scheduler to schedule the tablet
repair task.

Fixes #26505

Closes scylladb/scylladb#26502
2025-11-06 10:00:20 +03:00
Aleksandra Martyniuk
d436233209 repair: fail tablet repair if any batch wasn't sent successfully
If any batch replay failed, we cannot update repair_time as we risk the
data resurrection.

If replay of any batch needs to be retried, run the whole repair but
fail at the very end, so that the repair_time for it won't be updated.
2025-10-23 10:39:42 +02:00
Aleksandra Martyniuk
7f20b66eff db: repair: throw if replay fails
Return a flag determining whether all the batches were sent successfully in
batchlog_manager::replay_all_failed_batches (batches skipped due to being
too fresh are not counted). Throw in repair_flush_hints_batchlog_handler
if not all batches were replayed, to ensure that repair_time isn't updated.
2025-10-23 10:38:31 +02:00
Asias He
33bc1669c4 repair: Fix uuid and nodes_down order in the log
Fixes #26536

Closes scylladb/scylladb#26547
2025-10-20 13:21:59 +03:00
Michał Chojnowski
55c4b89b88 sstables: make sstable::estimated_keys_for_range asynchronous
Currently, `sstable::estimated_keys_for_range` works by
checking what fraction of Summary is covered by the given
range, and multiplying this fraction to the number of all keys.
Since computing things on Summary doesn't involve I/O (because Summary
is always kept in RAM), this is synchronous.

In a later patch, we will modify `sstable::estimated_keys_for_range`
so that it can deal with sstables that don't have a Summary
(because they use BTI indexes instead of BIG indexes).
In that case, the function is going to compute the relevant fraction
by using the index instead of Summary. This will require making
the function asynchronous. This is what we do in this patch.

(The actual change to the logic of `sstable::estimated_keys_for_range`
will come in the next patch. In this one, we only make it asynchronous).
2025-09-29 13:01:21 +02:00
Asias He
b31e651657 repair: Always reset node ops progress to 100% upon completion
Always set the node ops progress to 100% when the operation finishes,
regardless of success or failure. This ensures the progress never
remains below 100%, which would otherwise indicates a pending node
operation in case of an error.

Fixes #26193

Closes scylladb/scylladb#26194
2025-09-25 11:05:52 +03:00
Pavel Emelyanov
a1ea553fe1 code: Replace distributed<> with sharded<>
The latter is recommended in seastar, and the former was left as
compatibility alias. Latest seastar explicitly marks it as deprecated so
once the submodule is updated, compilation logs will explode.

Most of the patch is generated with

    for f in $(git grep -l '\<distributed<[A-Za-z0-9:_]*>') ; do sed -e 's/\<distributed<\([A-Za-z0-9:_]*\)>/sharded<\1>/g' -i $f; done
    for f in $(git grep -l distributed.hh); do sed -e 's/distributed.hh/sharded.hh/' -i $f ; done

and a small manual change in test/perf/perf.hh

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#26136
2025-09-19 12:22:51 +02:00
Gleb Natapov
d3badf7406 storage_service: change node_ops_info::ignore_nodes to host id
It drop useless translation from id to ip during removenode through
topology coordinator.

Closes scylladb/scylladb#25958
2025-09-15 10:18:24 +02:00
Asias He
cb7db47ae1 repair: Add incremental_mode option for tablet repair
This patch introduces a new `incremental_mode` parameter to the tablet
repair REST API, providing more fine-grained control over the
incremental repair process.

Previously, incremental repair was on and could not be turned off. This
change allows users to select from three distinct modes:

- `regular`: This is the default mode. It performs a standard
  incremental repair, processing only unrepaired sstables and skipping
  those that are already repaired. The repair state (`repaired_at`,
  `sstables_repaired_at`) is updated.

- `full`: This mode forces the repair to process all sstables, including
  those that have been previously repaired. This is useful when a full
  data validation is needed without disabling the incremental repair
  feature. The repair state is updated.

- `disabled`: This mode completely disables the incremental repair logic
  for the current repair operation. It behaves like a classic
  (pre-incremental) repair, and it does not update any incremental
  repair state (`repaired_at` in sstables or `sstables_repaired_at` in
  the system.tablets table).

The implementation includes:

- Adding the `incremental_mode` parameter to the
  `/storage_service/repair/tablet` API endpoint.
- Updating the internal repair logic to handle the different modes.
- Adding a new test case to verify the behavior of each mode.
- Updating the API documentation and developer documentation.

Fixes #25605

Closes scylladb/scylladb#25693
2025-09-09 06:50:21 +03:00
Radosław Cybulski
c242234552 Revert "build: add precompiled headers to CMakeLists.txt"
This reverts commit 01bb7b629a.

Closes scylladb/scylladb#25735
2025-09-03 09:46:00 +03:00
Avi Kivity
600349e29a Merge 'tasks: return task::impl from make_and_start_task ' from Aleksandra Martyniuk
Currently, make_and_start_task returns a pointer to task_manager::task
that hides the implementation details. If we need to access
the implementation (e.g. because we want a task to "return" a value),
we need to make and start task step by step openly.

Return task_manager::task::impl from make_and_start_task. Use it
where possible.

Fixes: https://github.com/scylladb/scylladb/issues/22146.

Optimization; no backport

Closes scylladb/scylladb#25743

* github.com:scylladb/scylladb:
  tasks: return task::impl from make_and_start_task
  compaction: use current_task_type
  repair: add new param to tablet_repair_task_impl
  repair: add new params to shard_repair_task_impl
  repair: pass argument by value
2025-08-31 15:44:37 +03:00
Avi Kivity
bc5773f777 Merge 'Add out of space prevention mechanisms' from Łukasz Paszkowski
When a scaling out is delayed or fails, it is crucial to ensure that clusters remain operational
and recoverable even under extreme conditions. To achieve this, the following proactive measures
are implemented:
- reject writes
      - includes: inserts, updates, deletes, counter updates, hints, read+repair and lwt writes
      - applicable to: user tables, views, CDC log, audit, cql tracing
- stop running compactions/repairs and prevent from starting new ones
- reject incoming tablet migrations

The aforementioned mechanisms are automatically enabled when node's disk utilization reaches
the critical level (default: 98%) and disabled when the utilization drop below the threshold.

Apart from that, the series add tests that require mounted volumes to simulate out of space.
The paths to the volumes can be provided using the a pytest argument, i.e.  `--space-limited-dirs`.
When not provided, tests are skipped.

Test scenarios:

1. Start a cluster and write data until one of the nodes reaches 90% of the disk utilization
2. Perform an **operation** that would take the nodes over 100%
3. The nodes should not exceed the critical disk utilization (98% by default)
4. Scale out the cluster by adding one node per rack
5. Retry or wait for the **operation** from step 2

The **operation** is: writing data, running compactions, building materialized views, running repair,
migrating tablets (caused by RF change, decommission).

The test is successful, if no nodes run out of space, the **operation** from step 2 is
aborted/paused/timed out and the **operation** from step 5 is successful.

`perf-simple-query --smp 1 -m 1G` results obtained for fixed 400MHz frequency:

Read path (before)

```
instructions_per_op:
	mean=   39661.51 standard-deviation=34.53
	median= 39655.39 median-absolute-deviation=23.33
	maximum=39708.71 minimum=39622.61
```

Read path (after)

```
instructions_per_op:
	mean=   39691.68 standard-deviation=34.54
	median= 39683.14 median-absolute-deviation=11.94
	maximum=39749.32 minimum=39656.63
```

Write path (before):

```
instructions_per_op:
	mean=   50942.86 standard-deviation=97.69
	median= 50974.11 median-absolute-deviation=34.25
	maximum=51019.23 minimum=50771.60
```

Write path (after):

```
instructions_per_op:
	mean=   51000.15 standard-deviation=115.04
	median= 51043.93 median-absolute-deviation=52.19
	maximum=51065.81 minimum=50795.00
```

Fixes: https://github.com/scylladb/scylladb/issues/14067
Refs: https://github.com/scylladb/scylladb/issues/2871

No backport, as it is a new feature.

Closes scylladb/scylladb#23917

* github.com:scylladb/scylladb:
  tests/cluster: Add new storage tests
  test/scylla_cluster: Override workdir when passed via cmdline
  streaming: Reject incoming migrations
  storage_service: extend locator::load_stats to collect per-node critical disk utilization flag
  repair_service: Add a facility to disable the service
  compaction_manager: Subscribe to out of space controller
  compaction_manager: Replace enabled/disabled states with running state
  database: Add critical_disk_utilization mode database can be moved to
  disk_space_monitor: add subscription API for threshold-based disk space monitoring
  docs: Add feature documentation
  config: Add critical_disk_utilization_level option
  replica/exceptions: Add a new custom replica exception
2025-08-30 18:47:57 +03:00
Piotr Dulikowski
7ccb50514d Merge 'Introduce view building coordinator' from Michał Jadwiszczak
This patch introduces `view_building_coordinator`, a single entity within whole cluster responsible for building tablet-based views.

The view building coordinator takes slightly different approach than the existing node-local view builder. The whole process is split into smaller view building tasks, one per each tablet replica of the base table.
The coordinator builds one base table at a time and it can choose another when all views of currently processing base table are built.
The tasks are started by setting `STARTED` state and they are executed by node-local view building worker. The tasks are scheduled in a way, that each shard processes only one tablet at a time (multiple tasks can be started for a shard on a node because a table can have multiple views but then all tasks have the same base table and tablet (last_token)). Once the coordinator starts the tasks, it sends `work_on_view_building_tasks` RPC to start the tasks and receive their results.
This RPC is resilient to RPC failure or raft leader change, meaning if one RPC call started a batch of tasks but then failed (for instance the raft leader was changed and caller aborted waiting for the response), next RPC call will attach itself to the already started batch.

The coordinator plugs into handling tablet operations (migration/resize/RF change) and adjusts its tasks accordingly. At the start of each tablet operation, the coordinator aborts necessary view building tasks to prevent https://github.com/scylladb/scylladb/issues/21564. Then, new adjusted tasks are created at the end of the operation.
If the operation fails at any moment, aborted tasks are rollback.

The view building coordinator can also handle staging sstables using process_staging view building tasks. We do this because we don't want to start generating view updates from a staging sstable prematurely, before the writes are directed to the new replica (https://github.com/scylladb/scylladb/issues/19149).

For detailed description check: `docs/dev/view-building-coordinator.md`

Fixes https://github.com/scylladb/scylladb/issues/22288
Fixes https://github.com/scylladb/scylladb/issues/19149
Fixes https://github.com/scylladb/scylladb/issues/21564
Fixes https://github.com/scylladb/scylladb/issues/17603
Fixes https://github.com/scylladb/scylladb/issues/22586
Fixes https://github.com/scylladb/scylladb/issues/18826
Fixes https://github.com/scylladb/scylladb/issues/23930

---

This PR is reimplementation of https://github.com/scylladb/scylladb/pull/21942

Closes scylladb/scylladb#23760

* github.com:scylladb/scylladb:
  test/cluster: add view build status tests
  test/cluster: add view building coordinator tests
  utils/error_injection: allow to abort `injection_handler::wait_for_message()`
  test: adjust existing tests
  utils/error_injection: add injection with `sleep_abortable()`
  db/view/view_builder: ignore `no_such_keyspace` exception
  docs/dev: add view building coordinator documentation
  db/view/view_building_worker: work on `process_staging` tasks
  db/view/view_building_worker: register staging sstable to view building coordinator when needed
  db/view/view_building_worker: discover staging sstables
  db/view/view_building_worker: add method to register staging sstable
  db/view/view_update_generator: add method to process staging sstables instantly
  db/view/view_update_generator: extract generating updates from staging sstables to a method
  db/view/view_update_generator: ignore tablet-based sstables
  db/view/view_building_coordinator: update view build status on node join/left
  db/view/view_building_coordinator: handle tablet operations
  db/view: add view building task mutation builder
  service/topology_coordinator: run view building coordinator
  db/view: introduce `view_building_coordinator`
  db/view/view_building_worker: update built views locally
  db/view: introduce `view_building_worker`
  db/view: extract common view building functionalities
  db/view: prepare to create abstract `view_consumer`
  message/messaging_service: add `work_on_view_building_tasks` RPC
  service/topology_coordinator: make `term_changed_error` public
  db/schema_tables: create/cleanup tasks when an index is created/dropped
  service/migration_manager: cleanup view building state on drop keyspace
  service/migration_manager: cleanup view building state on drop view
  service/migration_manager: create view building tasks on create view
  test/boost: enable proxy remote in some tests
  service/migration_manager: pass `storage_proxy` to `prepare_keyspace_drop_announcement()`
  service/migration_manager: coroutinize `prepare_new_view_announcement()`
  service/storage_proxy: expose references to `system_keyspace` and `view_building_state_machine`
  service: reload `view_building_state_machine` on group0 apply()
  service/vb_coordinator: add currently processing base
  db/system_keyspace: move `get_scylla_local_mutation()` up
  db/system_keyspace: add `view_building_tasks` table
  db/view: add view_building_state and views_state
  db/system_keyspace: add method to get view build status map
  db/view: extract `system.view_build_status_v2` cql statements to system_keyspace
  db/system_keyspace: move `internal_system_query_state()` function earlier
  db/view: ignore tablet-based views in `view_builder`
  gms/feature_service: add VIEW_BUILDING_COORDINATOR feature
2025-08-29 17:28:44 +02:00