Commit Graph

805 Commits

Author SHA1 Message Date
Yaniv Michael Kaul
2fbba4a071 raft, service, locator: create raft_fwd.hh and reduce heavy header includes
Create raft/raft_fwd.hh with lightweight type aliases (server_id, group_id,
term_t, index_t) backed only by raft/internal.hh, avoiding the heavy
raft/raft.hh (832 lines with futures, abort_source, bytes_ostream).

Replace raft/raft.hh with raft/raft_fwd.hh in headers that only need the
basic ID types: tablets.hh, topology_state_machine.hh,
topology_coordinator.hh, storage_service.hh, group0_fwd.hh,
view_building_coordinator.hh, view_building_worker.hh.

Also remove gossiper.hh and tablet_allocator.hh from storage_service.hh
(forward declarations suffice), and remove unused reactor.hh from
tablets.hh. Add explicit includes in .cc files that lost transitive
availability.
2026-04-17 01:08:04 +03:00
Yaniv Michael Kaul
be5fa64d36 db: break gossiper.hh include from system_keyspace.hh
Extract loaded_endpoint_state into a standalone lightweight header to
avoid pulling the heavy gossiper.hh (and transitively query-result-set.hh)
into every includer of system_keyspace.hh. Add explicit includes where
the full definitions are actually needed.

Reduces clean dev build time by ~2 minutes (-8%).
2026-04-16 23:27:55 +03:00
Yaniv Michael Kaul
43e337a663 db, test: add explicit includes for storage_service.hh and system_keyspace.hh
Add explicit includes that were previously available transitively through
service/storage_proxy.hh -> service/storage_service.hh.

This prepares for removing the unused storage_service.hh include from
storage_proxy.hh in a follow-up commit.

Speedup: prerequisite for storage_proxy.hh include chain reduction
(measured -5.8% wall-clock combined with all changes in this series,
same-session A/B: 16m14s -> 15m17s at -j16).
2026-04-16 18:22:41 +03:00
Avi Kivity
0ae22a09d4 LICENSE: Update to version 1.1
Updated terms of non-commercial use (must be a never-customer).
2026-04-12 19:46:33 +03:00
Avi Kivity
8ccee6803e Merge 'Remove upgrade view builder' from Gleb Natapov
Since we do no longer support upgrade from versions that do not support
v2 of "view building status" code (building status is managed by raft) we can remove v1 code and upgrade code and make sure we do not boot with old "builder status" version.

v2 version was introduced by 8d25a4d678 which is included in scylla-2025.1.0.

No backport needed since this is code removal.

Closes scylladb/scylladb#29105

* github.com:scylladb/scylladb:
  view: drop unused v1 builder code
  view: remove upgrade to raft code
2026-04-12 00:39:26 +03:00
Piotr Dulikowski
ec0231c36c Merge 'db/view/view_building_worker: lock staging sstables mutex for all necessary shards when creating tasks' from Michał Jadwiszczak
To create `process_staging` view building tasks, we firstly need to collect informations about them on shard0, create necessary mutations, commit them to group0 and move staging sstables objects to their original shards.

But there is a possible race after committing the group0 command and before moving the staging sstables to their shards. Between those two events, the coordinator may schedule freshly created tasks and dispatch them to the worker but the worker won't have the sstables objects because they weren't moved yet.

This patch fixes the race by holding `_staging_sstables_mutex` locks from all necessary shards when executing `create_staging_sstable_tasks()`. With this, even if the task will be scheduled and dispatched quickly, the worker will wait with executing it until the sstables objects are moved and the locks are released.

Fixes SCYLLADB-816

This PR should be backported to all versions containing view building coordinator (2025.4 and newer).

Closes scylladb/scylladb#29174

* github.com:scylladb/scylladb:
  db/view/view_building_worker: fix indentation
  db/view/view_building_worker: lock staging sstables mutex for necessary shards when creating tasks
2026-04-09 08:37:51 +03:00
Michał Jadwiszczak
9cf94116c2 db/view/view_building_worker: fix indentation 2026-04-07 16:12:04 +02:00
Michał Jadwiszczak
c9aa5bb09c db/view/view_building_worker: lock staging sstables mutex for necessary shards when creating tasks
To create `process_staging` view building tasks, we firstly need to
collect informations about them on shard0, create necessary mutations,
commit them to group0 and move staging sstables objects to their
original shards.

But there is a possible race after committing the group0 command
and before moving the staging sstables to their shards.
Between those two events, the coordinator may schedule freshly created
tasks and dispatch them to the worker but the worker won't have the
sstables objects because they weren't moved yet.

This patch fixes the race by holding `_staging_sstables_mutex` locks
from necessary shards when executing `create_staging_sstable_tasks()`.
With this, even if the task will be scheduled and dispatched quickly,
the worker will wait with executing it until the sstables objects are
moved and the locks are released.

Fixes SCYLLADB-816
2026-04-07 16:11:45 +02:00
Michał Jadwiszczak
51c164c8d2 view_building_worker: extract starting a new batch to state's method
Following the previous commit, a new batch cannot be started if the
state was already drained.
This commit also adds a check that only one batch is running at a time.
2026-04-07 08:39:05 +02:00
Michał Jadwiszczak
639aa223f3 view_building_worker: distinguish between state's clear() and drain()
While both of this methods do the same (abort current batch, clear
data), we can clear the state multiple times during view_building_worker
lifetime (for instance when processing base table is changed) but
`view_building_worker::state::drain()` should be called only once and
after this no other work on the state should be done.
2026-04-07 08:39:05 +02:00
Michał Jadwiszczak
7aea524f52 view_building_worker: lock mutexes before breaking them in drain()
Not doing this may lead to races like SCYLLADB-844.
If some consumer is holding a lock of a mutex and `drain()`
is just braking the mutex without locking it beforehand,
then the consumer may process its code which should be aborted.

An example of the race is SCYLLADB-844, where `work_on_tasks()` is
holding `_state._mutex` while it is broken by `drain()`.
This causes a new batch is started after the `_state` is cleared.
2026-04-07 08:39:00 +02:00
Michał Jadwiszczak
91c7ac1fb2 view_building_worker: execute drain() once
Future changes will require that the view building worker is drained
only once per its lifetime.
2026-04-07 08:35:02 +02:00
Michael Litvak
399260a6c0 test: mv: fix flaky wait for commitlog sync
Previously the test test_interrupt_view_build_shard_registration stopped
the node ungracefully and used commitlog periodic mode to persist the
view build progress in a not very reliable way.

It can happen that due to timing issues, the view build progress is not
persisted, or some of it is persisted in a different ordering than
expected.

To make the test more reliable we change it to stop the node gracefully,
so the commitlog is persisted in a graceful and consistent way, without
using the periodic mode delay. We need to also change the injection for
the shutdown to not get stuck.

Fixes SCYLLADB-1005

Closes scylladb/scylladb#29008
2026-03-19 10:41:21 +01:00
Gleb Natapov
77d3245e02 view: remove upgrade to raft code
Since we do no longer support upgrade from versions that do not support
v2 of view building code we can remove upgrade code and make sure we do
not boot with old builder version.
2026-03-18 17:45:40 +02:00
Botond Dénes
475220b9c9 Merge 'Remove the rest of pre raft topology code' from Gleb Natapov
Remove the rest of the code that assumes that either group0 does not exist yet or a cluster is till not upgraded to raft topology. Both of those are not supported any more.

No need to backport since we remove functionality here.

Closes scylladb/scylladb#28841

* github.com:scylladb/scylladb:
  service level: remove version 1 service level code
  features: move GROUP0_SCHEMA_VERSIONING to deprecated features list
  migration_manager: remove unused forward definitions
  test: remove unused code
  auth: drop auth_migration_listener since it does nothing now
  schema: drop schema_registry_entry::maybe_sync() function
  schema: drop make_table_deleting_mutations since it should not be needed with raft
  schema: remove calculate_schema_digest function
  schema: drop recalculate_schema_version function and its uses
  migration_manager: drop check for group0_schema_versioning feature
  cdc: drop usage of cdc_local table and v1 generation definition
  storage_service: no need to add yourself to the topology during reboot since raft state loading already did it
  storage_service: remove unused functions
  group0: drop with_raft() function from group0_guard since it always returns true now
  gossiper: do not gossip TOKENS and CDC_GENERATION_ID any more
  gossiper: drop tokens from loaded_endpoint_state
  gossiper: remove unused functions
  storage_service: do not pass loaded_peer_features to join_topology()
  storage_service: remove unused fields from replacement_info
  gossiper: drop is_safe_for_restart() function and its use
  storage_service: remove unused variables from join_topology
  gossiper: remove the code that was only used in gossiper topology
  storage_service: drop the check for raft mode from recovery code
  cdc: remove legacy code
  test: remove unused injection points
  auth: remove legacy auth mode and upgrade code
  treewide: remove schema pull code since we never pull schema any more
  raft topology: drop upgrade_state and its type from the topology state machine since it is not used any longer
  group0: hoist the checks for an illegal upgrade into main.cc
  api: drop get_topology_upgrade_state and always report upgrade status as done
  service_level_controller: drop service level upgrade code
  test: drop run_with_raft_recovery parameter to cql_test_env
  group0: get rid of group0_upgrade_state
  storage_service: drop topology_change_kind as it is no longer needed
  storage_service: drop check_ability_to_perform_topology_operation since no upgrades can happen any more
  service_storage: remove unused functions
  storage_service: remove non raft rebuild code
  storage_service: set topology change kind only once
  group0: drop in_recovery function and its uses
  group0: rename use_raft to maintenance_mode and make it sync
2026-03-11 10:24:20 +02:00
Gleb Natapov
0b508c5f96 test: remove unused injection points
Also remove test_auth_raft_command_split test which is irrelevant since 5ba7d1b116
because it does not use the function that injects max sized command after the
commit.
2026-03-10 10:09:39 +02:00
Avi Kivity
01ddc17ab9 Merge 'mv: allow skipping view updates when a collection is unmodified' from Wojciech Mitros
When we generate view updates, we check whether we can skip the
entire view update if all columns selected by the view are unmodified.
However, for collection columns, we only check if they were unset
before and after the update.
In this patch we add a check for the actual collection contents.
We perform this check for both virtual and non-virtual selections.
When the column is only a virtual column in the view, it would be
enough to check the liveness of each collection cell, however for
that we'd need to deserialize the entire collection anyway, which
should be effectively as expensive as comparing all of its bytes.

Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-808

Closes scylladb/scylladb#28839

* github.com:scylladb/scylladb:
  mv: allow skipping view updates when a collection is unmodified
  mv: allow skipping view updates if an empty collection remains unset
2026-03-09 22:46:01 +02:00
Piotr Dulikowski
42d70baad3 db: view: mutate_MV: don't hold keyspace ref across preemption
Currently, the view_update_generator::mutate_MV function acquires a
reference to the keyspace relevant to the operation, then it calls
max_concurrent_for_each and uses that reference inside the lambda passed
to that function. max_concurrent_for_each can preempt and there is no
mechanism that makes sure that the keyspace is alive until the view
updates are generated, so it is possible that the keyspace is freed by
the time the reference is used.

Fix the issue by precomputing the necessary information based on the
keyspace reference right away, and then passing that information by
value to the other parts of the code. It turns out that we only need to
know whether the keyspace uses tablets and whether it uses a network
topology strategy.

Fixes: scylladb/scylladb#28925

Closes scylladb/scylladb#28928
2026-03-09 15:04:26 +02:00
Wojciech Mitros
0008976e2f mv: allow skipping view updates when a collection is unmodified
When we generate view updates, we check whether we can skip the
entire view update if all columns selected by the view are unmodified.
However, for collection columns, we only check if they were unset
before and after the update.
In this patch we add a check for the actual collection contents.
We perform this check for both virtual and non-virtual selections.
When the column is only a virtual column in the view, it would be
enough to check the liveness of each collection cell, however for
that we'd need to deserialize the entire collection anyway, which
should be effectively as expensive as comparing all of its bytes.

Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-808
2026-03-08 16:23:22 +01:00
Wojciech Mitros
7d1e0a2e4d mv: allow skipping view updates if an empty collection remains unset
Currently, when we generate view updates, we skip the view update if
all columns selected by the view are unchanged in the base table update.
However, this does not apply for collection columns - if the base table
has a collection regular column, we never allow skipping generating
view updates and the reason for that is missing implementation.
We can easily relax this for the case where the collection was missing
before and after the update - in this commit we move the check for
collections after the check for missing cells.
2026-03-08 16:22:27 +01:00
Dawid Mędrek
7564a56dc8 Merge 'tombstone_gc: allow using repair-mode tombstone gc with RF=1 tables' from Botond Dénes
Currently, repair-mode tombstone-gc cannot be used on tables with RF=1. We want to make repair-mode the default for all tablet tables (and more, see https://github.com/scylladb/scylladb/issues/22814), but currently a keyspace created with RF=1 and later altered to RF>1 will end up using timeout-mode tombstone gc. This is because the repair-mode tombstone-gc code relies on repair history to determine the gc-before time for keys/ranges. RF=1 tables cannot run repairs so they will have empty repair history and consequently won't be able to purge tombstones.
This PR solves this by keeping a registry of RF=1 tables and consulting this registry when creating `tombstone_gc_state` objects. If the table is RF=1, tombstone-gc will work as if the table used immediate-mode tombstone-gc. The registry is updated on each replication update. As soon as the table is not RF=1 anymore, the tombstone-gc reverts to the natural repair-mode behaviour.

After this PR, tombstone-gc defaults to repair-mode for all tables, regardless of RF and tablets/vnodes.

Fixes: SCYLLADB-106.

New feature, no backport required.

Closes scylladb/scylladb#22945

* github.com:scylladb/scylladb:
  test/{boost,cluster}: add test for tombstone gc mode=repair with RF=1
  tombstone_gc: allow use of repair-mode for RF=1 tables
  replica/table: update rf=1 table registry in shared tombstone-gc state
  tombstone_gc: tombstone_gc_before_getter: consider RF when getting gc before time
  tombstone_gc: unpack per_table_history_maps
  tombstone_gc: extract _group0_gc_time from per_table_history_map
  tombstone_gc: drop tombstone_gc_state(nullptr) ctor and operator bool()
  test/lib/random_schema: use timeout-mode tombstone_gc
  tombstone_gc_options: add C++ friendly constructor
  test: move away from tombstone_gc_state(nullptr) ctor
  treewide: move away from tombstone_gc_state(nullptr) ctor
  sstable: move away from tombstone_gc_mode::operator bool()
  replica/table: add get_tombstone_gc_state()
  compaction: use tombstone_gc_state with value semantics
  db/row_cache: use tombstone_gc_state with value semantics
  tombstone_gc: introduce tombstone_gc_state::for_tests()
2026-03-05 11:50:31 +01:00
Wojciech Mitros
38f02b8d76 mv: remove dead code in view_updates::can_skip_view_updates
When we create a materialized view, we consider 2 cases:
1. the view's primary key contains a column that is not
in the primary key of the base table
2. the view's primary key doesn't contain such a column

In the 2nd case, we add all columns from the base table
to the schema of the view (as virtual columns). As a result,
all of these columns are effectively "selected" in
view_updates::can_skip_view_updates. Same thing happens when
we add new columns to the base table using ALTER.
Because of this, we can never have !column_is_selected and
!has_base_non_pk_columns_in_view_pk at the same time. And
thus, the check (!column_is_selected
&& _base_info.has_base_non_pk_columns_in_view_pk) is always
the same as (!column_is_selected).
Because we immediately return after this check, the tail of
this function is also never reached - all checks after the
(column_is_selected) are affected by this. Also, the condition
(!column_is_selected && base_has_nonexpiring_marker) is always
false at the point it is called. And this in turn makes the
`base_has_nonexpiring_marker` unused, so we delete it as well.

It's worth considering, why did we even have
`base_has_nonexpiring_marker` if it's effectively unused. We
initially introduced it in bd52e05ae2 and we (incorrectly)
used it to allow skipping view updates even if the liveness of
virtual columns changed. Soon after, in 5f85a7a821, we
started categorizing virtual columns as column_is_selected == true
and we moved the liveness checks for virtual columns to the
`if (column_is_selected)` clause, before the `base_has_nonexpiring_marker`
check. We changed this because even if we have a nonexpiring marker
right now, it may be changed in the future, in which case the liveness
of the view row will depend on liveness of the virtual columns and
we'll need to have the view updates from the time the row marker was
nonexpiring.

Closes scylladb/scylladb#28838
2026-03-04 08:55:43 +02:00
Botond Dénes
3c34598d88 treewide: move away from tombstone_gc_state(nullptr) ctor
It is ambigous, use the appropriate no-gc or gc-all factories instead,
as appropriate.
A special note for mutation::compacted(): according to the comment above
it, it doesn't drop expired tombstones but as it is currently, it
actually does. Change the tombstone gc param for the underlying call to
compact_for_compaction() to uphold the comment. This is used in tests
mostly, so no fallout expected.

Tests are handled in the next commit, to reduce noise.

Two tests in mutation_test.cc have to be updated:

* test_compactor_range_tombstone_spanning_many_pages
  has to be updated in this commit, as it uses
  mutation_partition::compact_for_query() as well as
  compact_for_query(). The test passes default constructed
  tombstone_gc() to the latter while the former now uses no-gc
  creating a mismatch in tombstone gc behaviour, resulting in test
  failure. Update the test to also pass no-gc to compact_for_query().

* test_query_digest similarly uses mutation_partition::query_mutation()
  and another compaction method, having to match the no-gc now used in
  query_mutation().
2026-03-03 14:09:28 +02:00
Botond Dénes
6364e35403 replica/table: add get_tombstone_gc_state()
Shorthand for
get_compaction_manager().get_shared_tombstone_gc_state().get_tombstone_gc_state().
2026-03-03 14:09:28 +02:00
Alex
c44ad31d44 db/view: gate detached view-builder callbacks during shutdown
Detached migration callbacks (on_create_view, on_update_view, on_drop_view)
  can race with view_builder::drain() teardown.

  Add a lifetime gate to view_builder and wire callback launches through
  _ops_gate.hold() so each detached dispatch future is tracked until it
  completes (finally keeps the hold alive). During shutdown, drain()
  now waits for all tracked callback work with _ops_gate.close().

  This ensures drain does not proceed past callback lifetime while shutdown is in
  progress, and ignores only gate_closed_exception at callback entry as the
  expected shutdown path.
2026-02-18 11:56:41 +02:00
Alex
75e25493c1 db:view: refactor on_update_view to use coroutine dispatcher
on_update_view() currently runs its serialized logic inline via with_semaphore()
  from a detached callback path, while create/drop already use dedicated async
  dispatchers.

  Refactor update handling to follow the same pattern:

  - add dispatch_update_view(sstring ks_name, sstring view_name)
  - move update logic into that coroutine
  - acquire the existing view-builder lock via get_or_adopt_view_builder_lock()
  - keep existing behavior for missing base/view state
  - keep background invocation semantics from on_update_view()

  This aligns update/create/drop flow and keeps async lifecycle handling and a first step to fix shutdown issue.
2026-02-15 18:50:32 +02:00
Pavel Emelyanov
19ea05692c view_build_worker: Do not switch scheduling groups inside work_on_view_building_tasks
The handler appeared back in c9e710dca3. In this commit it performed the
"core" part of the task -- the do_build_range() method -- inside the
streaming sched group. The setup code looks seemingly was copied from the
view_builder::do_build_step() method and got the explicit switch of the
scheduling group.

The switch looks both -- justified and not. On one hand, it makes it
explict that the activity runs in the streaming scheduling group. On the
other hand, the verb already uses RPC index on 1, which is negotiated to
be run in streaming group anyway. On the "third hand", even though being
explicit the switch happens too late, as there exists a lot of other
activities performed by the handler that seems to also belong to the
same scheduling group, but which is not switched into explicitly.

By and large, it seems better to avoid the explicit switch and rely on
the RPC-level negotiation-based sched group switching.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#28397
2026-02-03 07:00:32 +02:00
Piotr Dulikowski
ec6a2661de Merge 'Keep view_builder background fiber in maintenance scheduling group' from Pavel Emelyanov
In fact, it's partially there already. When view_builder::start() is called is first calls initialization code (the start_in_background() method), then kicks do_build_step() that runs a background fiber to perform build steps. The starting code inherits scheduling group from main(). And the step fiber code needs to run itself in a maintenance scheduling group, so it explicitly grabs one via database->db_config.

This PR mainly gets rid of the call to database::get_streaming_scheduling_group() from do_build_step() as preparation to splitting the streaming scheduling group into parts (see SCYLLADB-351). To make it happen the do_build_step() is patched to inherit its scheduling group from view_builder::start() and the start() itself is called by main from maintenance scheduling group (like for other view building services).

New feature (nested scheduling group), not backporting

Closes scylladb/scylladb#28386

* github.com:scylladb/scylladb:
  view_builder: Start background in maintenance group
  view_builder: Wake-up step fiber with condition variable
2026-01-28 20:49:19 +01:00
Pavel Emelyanov
3ebd02513a view_builder: Start background in maintenance group
Currently view_builder::start() is called in default scheduling group.
Once it initializes itself, it wakes up the step fiber that explicitly
switches to maintenance scheduling group.

This explicit switch made sence before previous patch, when the fiber
was implemented as a serialized action. Now the fiber starts directly
from .start() method and can inherit scheduling group from it.

Said that, main code calls view_builder::start() in maintenance
scheduling group killing two birds with one stone. First, the step fiber
no longer needs borrow its scheduling group indirectly via database.
Second, the start_in_background() code itself runs in a more suitable
scheduling group.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2026-01-28 18:34:59 +03:00
Pavel Emelyanov
2439d27b60 view_builder: Wake-up step fiber with condition variable
View builder runs a background fiber that perform build steps. To kick
the fiber it uses serizlized action, but it's an overkill -- nobody
waits for the action to finish, but on stop, when it's joined.

This patch uses condition variable to kick the fiber, and starts it
instantly, in the place where serialized action was first kicked.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2026-01-28 18:34:58 +03:00
Botond Dénes
ee631f31a0 Merge 'Do not export system keyspace from raft_group0_client' from Pavel Emelyanov
There are few places that use raft_group0_client as a way to get to system_keyspace. Mostly they can live without it -- either the needed reference is already at hand, or it's (ab)used to get to the database reference. The only place that really needs the system keyspace is the state merger code that needs last state ID. For that, the explicit helper method is added to group0_client.

Refining API between components, not backporting

Closes scylladb/scylladb#28387

* github.com:scylladb/scylladb:
  raft_group0_client: Dont export system keyspace
  raft_group0_client: Add and use get_last_group0_state_id()
  group0_state_machine: Call ensure_group0_sched() with data_dictionary
  view_building_worker: Use its own system_keyspace& reference
2026-01-28 13:24:32 +02:00
Emil Maskovsky
834961c308 db/view: add missing include for coroutine::all to fix build without precompiled headers
When building with `--disable-precompiled-header`, view.cc failed to
compile due to missing <seastar/coroutine/all.hh> include, which provides
`coroutine::all`.

The problem doesn't manifest when precompiled headers are used, which is
the default. So that's likely why it was missed by the CI.

Adding the explicit include fixes the build.

Fixes: scylladb/scylladb#28378
Ref: scylladb/scylladb#28093

No backport: This problem is only present in master.

Closes scylladb/scylladb#28379
2026-01-27 18:56:56 +01:00
Pavel Emelyanov
20a2b944df view_building_worker: Use its own system_keyspace& reference
Some code in the worker need to mess with system_keyspace&. While
there's a reference on it from the worker object, it gets one via
group0 -> group0_client, which is a bit an overkill.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2026-01-27 14:46:48 +03:00
Piotr Dulikowski
5d5e829107 Merge 'db: view: refactor usage and building of semaphore in create and drop views plus change continuation to co routine style' from Alex Dathskovsky
db: view: refactor semaphore usage in create/drop view paths
Refactor the construction and usage of semaphore units in the create and drop view flows.
The previous semaphore handling was hard to follow (as noted while working on https://github.com/scylladb/scylladb/pull/27929), so this change restructures unit creation and movement to follow a clearer and symmetric pattern across shards.

The semaphore usage model is now documented with a detailed in-code comment to make the intended behavior and invariants explicit.

As part of the refactor, the control flow is modernized by replacing continuation-based logic with coroutine-style code, improving readability and maintainability.

Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-250

backport: not required, this is a refactor

Closes scylladb/scylladb#28093

* github.com:scylladb/scylladb:
  db: view: extend try/catch scope in handle_create_view_local The try/catch region is extended to cover step functions and inner helpers, which may throw or abort during view creation. This change is safe because we are just swolowing more parts that may throw due to semaphore abortion or any other abortion request, and doesnt change the logic
  db: view: refine create/drop coroutine signatures Refactor the create/drop coroutine interfaces to accept parameters as const references, enabling a clearer workflow and safer data flow.
  db: view: switch from continuations to coroutines Refactor the flow and style of create and drop view to use coroutines instead of continuations. This simplifies the logic, improves readability, and makes the code easier to maintain and extend. This commit also utilizes the get_view_builder_units function that was added in the previous commit. this commit also introduces a new alisasing for optional unit type for simpler and more readable functions that use this type
  db: view: introduce helper to acquire or reuse semaphore units Introduce a small helper that acquires semaphore units when needed or reuses units provided by the caller. This centralizes semaphore handling, simplifies the current logic, and enables refactoring the view create/drop path to a coroutine-based implementation instead of continuation-style code.
  db: view: add detailed comments on semaphore bookkeeping and serialized create/drop on shard 0
2026-01-26 17:16:01 +01:00
Alex
954d18903e db: view: extend try/catch scope in handle_create_view_local
The try/catch region is extended to cover step functions and inner helpers,
which may throw or abort during view creation.
This change is safe because we are just swolowing more parts that may throw due to semaphore abortion
or any other abortion request, and doesnt change the logic
2026-01-26 13:10:37 +02:00
Alex
2c3ab8490c db: view: refine create/drop coroutine signatures
Refactor the create/drop coroutine interfaces to accept parameters as
const references, enabling a clearer workflow and safer data flow.
2026-01-26 13:10:34 +02:00
Alex
114f88cb9b db: view: switch from continuations to coroutines
Refactor the flow and style of create and drop view to use coroutines instead of continuations.
This simplifies the logic, improves readability, and makes the code
easier to maintain and extend. This commit also utilizes the get_view_builder_units function that was added in the previous commit.
this commit also introduces a new alisasing for optional unit type for simpler and more readable functions that use this type
2026-01-26 13:08:24 +02:00
Alex
87c1c6f40f db: view: introduce helper to acquire or reuse semaphore units
Introduce a small helper that acquires semaphore units when needed or
reuses units provided by the caller.
This centralizes semaphore handling, simplifies the current logic, and
enables refactoring the view create/drop path to a coroutine-based
implementation instead of continuation-style code.
2026-01-26 13:03:26 +02:00
Alex
1aadedc596 db: view: add detailed comments on semaphore bookkeeping and serialized create/drop on shard 0 2026-01-25 14:29:09 +02:00
Michael Litvak
d5009882c6 locator: document the exception type of assert_rf_rack_valid_keyspace
The function assert_rf_rack_valid_keyspace uses the exception type
std::invalid_argument when the RF-rack validation fails. Document it and
change all callers to catch this specific exception type when checking
for RF-rack validation failures, so that other exception types can be
propagated properly.
2026-01-22 16:11:35 +01:00
Botond Dénes
e01041d3ee db/system_keyspace: move remining tables out of v3 keyspace
The last remining tables in the v3 keyspace are those that are genuinely
distinct -- added by Cassandra 3.0 or >= ScyllaDB 2.0.
Move these out of the v3 keyspace too, with this the v3 keyspace is
defunct and removed.
2026-01-19 12:32:21 +02:00
Avi Kivity
c6dfae5661 treewide: #include Seastar headers with angle brackets
Seastar is an external library from the point of view of
ScyllaDB, so should be included with angle brackets.

Closes scylladb/scylladb#27947
2026-01-13 14:56:15 +02:00
Alex
e430065c92 db: views: serialize create/drop view operations via shard 0
Create and drop view operations are currently performed on all shards, and their execution is not fully serialized. On slower processors this can lead to interleavings that leave stale entries in `system.scylla_views_build`

A problematic sequence looks like this:

* `on_create_view()` runs on shard 0 → entries for shard 0 and shard 1 are created
* `on_drop_view()` runs on shard 0 → entry for shard 0 is removed
* `on_create_view()` runs on shard 1 → entries for shard 0 and shard 1 are created again
* `on_drop_view()` runs on shard 1 → entry for shard 1 is removed, while the shard 0 entry remains

This results in a leftover row in `system.scylla_views_builds_in_progress`, causing `view_build_test.cc` to get stuck indefinitely in an eventual state and eventually be terminated by CI.

This patch fixes the issue by fully serializing all view create and drop operations through shard 0. Shard 0 becomes the single execution point and notifies other shards to perform their work in order. Requests originating.

new process:
  - view_builder::on_create_view(...) runs only on shard 0 and kicks off dispatch_create_view(...) in the background.
  - dispatch_create_view(...) (shard 0) first checks should_ignore_tablet_keyspace(...) and returns early if needed.
  - dispatch_create_view(...) calls handle_seed_view_build_progress(...) on shard 0. That:
      - writes the global “build progress” row across all shards via _sys_ks.register_view_for_building_for_all_shards(...).
  - After seeding, dispatch_create_view(...) broadcasts to all shards with container().invoke_on_all(...).
  - Each shard runs handle_create_view_local(...), which:
      - waits for pending base writes/streams, flushes the base,
      - resets the reader to the current token and adds the new view,
      - handles errors and triggers _build_step to continue processing.

  Drop view

  - view_builder::on_drop_view(...) runs only on shard 0 and kicks off dispatch_drop_view(...) in the background.
  - dispatch_drop_view(...) (shard 0) first checks should_ignore_tablet_keyspace(...) and returns early if needed.
  - It broadcasts handle_drop_view_local(...) to all shards with invoke_on_all(...).
  - Each shard runs handle_drop_view_local(...), which:
      - removes the view from local build state (_base_to_build_step and _built_views) by scanning existing steps,
      - ignores missing keyspace cases.
  - After all shards finish local cleanup, shard 0 runs handle_drop_view_global_cleanup(...), which:
      - removes global build progress, built‑view state, and view build status in system tables,

  Shutdown

  - drain() waits on _view_notification_sem before _sem so in‑flight dispatches finish before bookkeeping is halted.

In addition, the test is adjusted to remove the long eventual wait (596.52s / 30 iterations) and instead rely on the default wait of 17 iterations (~4.37 minutes), eliminating unnecessary delays while preserving correctness.

Fixes: https://github.com/scylladb/scylladb/issues/27898
Backport: not required as the problem happens on master

Closes scylladb/scylladb#27929
2026-01-12 09:23:22 +02:00
Botond Dénes
60570d7114 Merge 'topology coordinator: restrict node join/remove to preserve RF-rack validity' from Michael Litvak
Allow creating materialized views and secondary indexes in a tablets keyspace only if it's RF-rack-valid, and enforce RF-rack-validity while the keyspace has views by restricting some operations:
* Altering a keyspace's RF if it would make the keyspace RF-rack-invalid
* Adding a node in a new rack
* Removing / Decommissioning the last node in a rack

Previously the config option `rf_rack_valid_keyspaces` was required for creating views. We now remove this restriction - it's not needed because we always maintain RF-rack-validity for keyspaces with views.

The restrictions are relevant only for keyspaces with numerical RF. Keyspace with rack-list-based RF are always RF-rack-valid.

Fixes scylladb/scylladb#23345
Fixes https://github.com/scylladb/scylladb/issues/26820

backport to relevant versions for materialized views with tablets since it depends on rf-rack validity

Closes scylladb/scylladb#26354

* github.com:scylladb/scylladb:
  docs: update RF-rack restrictions
  cql3: don't apply RF-rack restrictions on vector indexes
  cql3: add warning when creating mv/index with tablets about rf-rack
  service/tablet_allocator: always allow tablet merge of tables with views
  locator: extend rf-rack validation for rack lists
  test: test rf-rack validity when creating keyspace during node ops
  locator: fix rf-rack validation during node join/remove
  test: test topology restrictions for views with tablets
  test: add test_topology_ops_with_rf_rack_valid
  topology coordinator: restrict node join/remove to preserve RF-rack validity
  topology coordinator: add validation to node remove
  locator: extend rf-rack validation functions
  view: change validate_view_keyspace to allow MVs if RF=Racks
  db: enforce rf-rack-validity for keyspaces with views
  replica/db: add enforce_rf_rack_validity_for_keyspace helper
  db: remove enforce parameter from check_rf_rack_validity
  test: adjust test to not break rf-rack validity
2026-01-09 10:01:23 +02:00
Michael Litvak
8f15c7a874 db/view/view_update_generator: move discover_staging_sstables to start
Call discover_staging_sstables in view_update_generator::start() instead
of in the constructor, because the constructor is called during
initialization before sstables are loaded.

The initialization order was changed in 5d1f74b86a and caused this
regression. It means the view update generator won't discover staging
sstables on startup and view updates won't be generated for them. It
also causes issues in sstable cleanup.

view_update_generator::start() is called in a later stage of the
initialization, after sstable loading, so do the discovery of staging
sstables there.

Fixes scylladb/scylladb#27956

Closes scylladb/scylladb#27970
2026-01-08 21:55:19 +02:00
Pavel Emelyanov
e304d912b4 Merge 'db/view/view_building_worker: follow-ups' from Michał Jadwiszczak
This patch consists of a few smaller follow-ups to the view building worker:
- catch general execption in staging task registrator
- remove unnecessary CV broadcast
- don't pollute function context with conditionally compiled variable
- avoid creating a copy of tasks map
- fix some typos

Refs https://github.com/scylladb/scylladb/issues/25929
Refs https://github.com/scylladb/scylladb/pull/26897

This PR doesn't fix any bugs but recently we're backporting some PRs to 2025.4, so let's also backport this one to avoid painful conflicts.

Closes scylladb/scylladb#26558

* github.com:scylladb/scylladb:
  docs/dev/view-building-coordinator: fix typos
  db/view/view_building_worker: remove unnnecessary empty lines
  db/view/view_building_worker: fix typo
  db/view/view_building_worker: avoid creating a copy of tasks map
  db/view/view_building_worker: wrap conditionally compiled code in a scope
  db/view/view_building_worker: remove unnecessary CV broadcast
  db/view/view_building_worker: catch general execption in staging task registrator
2025-12-22 20:02:40 +02:00
Michael Litvak
8df61f6d99 view: change validate_view_keyspace to allow MVs if RF=Racks
The function validate_view_keyspace checks if a keyspace is eligible for
having materialized views, and it is used for validation when creating a
MV or a MV-based index.

Previously, it was required that the rf_rack_valid_keyspaces option is
set in order for tablets-based keyspaces to be considered eligible, and
the RF-rack condition was enforced when the option is set.

Instead of this, we change the validation to allow MVs in a keyspace if
the RF-rack condition is satisfied for the keyspace - regardless of the
config option.

We remove the config validation for views on startup that validates the
option `rf_rack_valid_keyspaces` is set if there are any views with
tablets, since this is not required anymore.

We can do this without worrying about upgrades because this change will
be effective from 2025.4 where MVs with tablets are first out of
experimental phase.

We update the test for MV and index restrictions in tablets keyspaces
according to the new requirements.

* Create MV/index: previously the test checked that it's allowed only if
  the config option `rf_rack_valid_keyspaces` is set. This is changed
  now so it's always allowed to create MV/index if the keyspace is
  RF-rack-valid. Update the test to verify that we can create MV/index
  when the keyspace is RF-rack-valid, even if the rf_rack option is not
  set, and verify that it fails when the keyspace is RF-rack-invalid.
* Alter: Add a new test to verify that while a keyspace has views, it
  can't be altered to become RF-rack-invalid.
2025-12-22 09:14:29 +01:00
Nadav Har'El
95e303faf3 Merge 'Refactor get_view_natural_endpoint' from Wojciech Mitros
With the introduction of rack-lists and the reliance of materialized views on them, the `get_view_natural_endpoint` function can be greatly simplified. When using tablets, instead of doing any index-matching, we can now pair base tables with views only in the same rack.
In this series we remove no longer needed code and reorganize the needed code for better clarity.
After the changes, the `get_view_natural_endpoint` function goes down from 245 lines to 85 lines, while the whole pairing-related text goes down from 346 lines to 239 lines.

Fixes https://github.com/scylladb/scylladb/issues/26313

Closes scylladb/scylladb#27383

* github.com:scylladb/scylladb:
  mv: replace the simple/complex rack-aware pairing with exact rack matching
  mv: split out vnode pairing code from get_view_natural_endpoint
  mv: unify self-pairing and rack-aware pairing into one bool
  mv: remove the workaround for left nodes when sending view updates
2025-12-09 13:19:13 +02:00
Michał Jadwiszczak
529cd25c51 db/view/view_building_worker: remove unnnecessary empty lines 2025-12-04 12:52:42 +01:00
Michał Jadwiszczak
4fc5fcaec4 db/view/view_building_worker: fix typo 2025-12-04 12:52:42 +01:00