Commit Graph

49102 Commits

Author SHA1 Message Date
Michał Jadwiszczak
201c4fafec db/view/view_building_coordinator: update view build status on node join/left
Copy view build status for new node for tablet views and remove
relevant statuses when a node is leaving the cluster.
2025-08-27 10:23:03 +02:00
Michał Jadwiszczak
b855786804 db/view/view_building_coordinator: handle tablet operations
If the view building coordinator is running, adjust view_building_tasks
in case of tablet operations.

The mutations are generated in the same batch as tablet mutations.
At the start of tablet migration/resize/RF change, started
view building tasks are aborted (by setting ABORTED state) if needed.
Then, new adjusted tasks are created in group0 batch which ends the
tablet operation and aborted tasks are removed from the table.

In case the tablet operation fails or is revoked, aborted view building
tasks are rollback by creating new copies of them and aborted ones are
deleted from the table.

View building tasks are not aborted/changed during tablet repair,
because in this case, even if vb task is started, a staging sstable will
be generated.
2025-08-27 10:23:03 +02:00
Michał Jadwiszczak
56df5acd77 db/view: add view building task mutation builder 2025-08-27 10:23:03 +02:00
Michał Jadwiszczak
9312cd83c5 service/topology_coordinator: run view building coordinator
Run view building coordinator alongside topology coordinator
once the feature is available.
2025-08-27 10:23:03 +02:00
Michał Jadwiszczak
08c9e6b9bb db/view: introduce view_building_coordinator
The coordinator is responsible for building tablet-based views.
It schedules tasks for `view_building_worker` and updates views'
statuses.

The tasks are scheduled in a way that one shard is processing only one
tablet at most (there may be multiple tasks since a base table may have
multiple views).

Support for tablet operations will be added in next commits.
2025-08-27 10:23:03 +02:00
Michał Jadwiszczak
2b3e1682d7 db/view/view_building_worker: update built views locally
Because `system.built_views` is a node-local table, we cannot mark a
view as built directly from the view building coordinator.

Instead, view building worker looks at data from
`syste.view_build_status_v2` and updates `built_views` table
accordingly.
2025-08-27 10:23:03 +02:00
Michał Jadwiszczak
c9e710dca3 db/view: introduce view_building_worker
The worker is responsible for building tablet-based views by
executing tasks scheduled by the view building coordinator.

It observes view building state machine and wait on the machine's
conditional variable (so the worker is woken up when group0 state is
applied).
The tasks are executed in batches, all tasks in one batch need to have
the same: type, base_id, table_id. One shard can only execute one batch
at a time (at least for now, in the future we might want to change
that).

That worker keeps track of finished and failed tasks in its local state.
The state is cleared when `view_building_state::currently_processed_base_table`
is changed.
2025-08-27 10:22:59 +02:00
Michał Jadwiszczak
a59624c604 db/view: extract common view building functionalities
Extract common methods of view builder consumer to an abstract class
and `flush_base()` and `make_partition_slice()` functions,
so they can be used in view builder (vnode-based views) and view
building consumer (tablet-based views; introduced in the next commit).
2025-08-27 08:55:48 +02:00
Michał Jadwiszczak
f71594738e db/view: prepare to create abstract view_consumer
In next commit, I'm going to introduce `view_building_worker::consumer`,
with very similar functionalities to `view_builder::consumer` but it'll
only consume range of one tablet per execution.

Since most functions are very similar, I'll create abstract
`view_consumer` which will be base for both of the consumers.

In order to make the transition more readable, this commit prepares
the `view_builder::consumer` by making some functions virtual and next
commit will extract part of functions to the abstract class.
2025-08-27 08:55:48 +02:00
Michał Jadwiszczak
e901b6fde4 message/messaging_service: add work_on_view_building_tasks RPC
The RPC will be used by view building coordinator to attach to and wait
for tasks performed by view building worker (introduced in later
commit).

The RPC gets vector of tasks' ids and returns vector of
`view_task_result`s.
i-th task result reffers to i-th task id.
2025-08-27 08:55:47 +02:00
Michał Jadwiszczak
7c73792194 service/topology_coordinator: make term_changed_error public
View building coordinator may also throw `term_changed_error`.
2025-08-27 08:55:47 +02:00
Michał Jadwiszczak
6e3e287a39 db/schema_tables: create/cleanup tasks when an index is created/dropped
Similarly as in previous commits, create view building tasks when an
index is created and cleanup view building status when it's dropped.
2025-08-27 08:55:47 +02:00
Michał Jadwiszczak
76caaea3f1 service/migration_manager: cleanup view building state on drop keyspace
When a keyspace is dropped, remove all unfinished building tasks for all
views and remove their entries from `system.view_built_status_v2`
and `system.built_views`.
2025-08-27 08:55:47 +02:00
Michał Jadwiszczak
f10c5c4493 service/migration_manager: cleanup view building state on drop view
When a view is dropped, remove all unfinished building tasks, remove
entries from `system.view_built_status_v2` and `system.built_views`.

If the view is currently being built, removing its tasks means they
are also aborted.
Finished tasks are already removed from the table.
2025-08-27 08:55:47 +02:00
Michał Jadwiszczak
6d1fbf06ed service/migration_manager: create view building tasks on create view
Create view building tasks in the same batch as new view mutations.
The tasks are created only if `VIEW_BUILDING_COORDINATOR` feature is on
and the view is in tablet keyspace.
2025-08-27 08:55:47 +02:00
Michał Jadwiszczak
19651b4978 test/boost: enable proxy remote in some tests
After a few next patches, creating/dropping a view in tablet keyspace
will require a remote proxy to obtain references to system keyspace
and view building state.
Because of this, remote proxy needs to be explicitly enabled in boost
tests which create views.
2025-08-27 08:55:47 +02:00
Michał Jadwiszczak
204f61ffe1 service/migration_manager: pass storage_proxy to prepare_keyspace_drop_announcement()
The reference is needed to get `view_building_state_machine`.
2025-08-27 08:55:47 +02:00
Michał Jadwiszczak
76a6dd82fd service/migration_manager: coroutinize prepare_new_view_announcement() 2025-08-27 08:55:47 +02:00
Michał Jadwiszczak
d2e1b6d44a service/storage_proxy: expose references to system_keyspace and view_building_state_machine
Those references are needed to manage view building tasks while a view
is created/dropped.
2025-08-27 08:55:47 +02:00
Michał Jadwiszczak
f2e7051a84 service: reload view_building_state_machine on group0 apply()
The state may be also reloaded on `topology_change` or `mixed_change`
because topology coordinator may change view building tasks during
tablet operations.
2025-08-27 08:55:47 +02:00
Michał Jadwiszczak
d5d81591db service/vb_coordinator: add currently processing base
The view building coordinator will be building all views
of one base table at a time.
Select first available base table as currently processing base
and save this information to `system.scylla_local`.
2025-08-27 08:55:47 +02:00
Michał Jadwiszczak
e0184377ca db/system_keyspace: move get_scylla_local_mutation() up
So it can be used by all helper functions, keeping the logical order from
the header file.
2025-08-27 08:55:47 +02:00
Michał Jadwiszczak
46a24d960d db/system_keyspace: add view_building_tasks table
The table is managed by group0 and uses schema commitlog.
The commit also includes helper functions.
2025-08-27 08:55:47 +02:00
Michał Jadwiszczak
44890a52a8 db/view: add view_building_state and views_state
`view_building_state` holds mapping of `view_building_task`s for
tablet-based views. The structure is a memory representation of data
stored in group0 tables.

`views_state` holds information about tablet-based views and their
build status.
2025-08-27 08:55:47 +02:00
Michał Jadwiszczak
ce1890e512 db/system_keyspace: add method to get view build status map 2025-08-27 08:55:47 +02:00
Michał Jadwiszczak
f90dd522df db/view: extract system.view_build_status_v2 cql statements to system_keyspace
Until now, all changes to `system.view_build_status_v2` were made from
view.cc and the file contained all of the helper methods.

This commit introduces a `build_status` enum class to avoid using
hardcoded strings and extracts the helper methods to `system_keyspace`
class, so they can be later used by the view building coordinator.
2025-08-27 08:55:46 +02:00
Michał Jadwiszczak
18609adfce db/system_keyspace: move internal_system_query_state() function earlier
So it can be used in all system keyspace proxy methods while maintaining
the same order as in the header file.
2025-08-27 08:55:46 +02:00
Michał Jadwiszczak
d0826e7cb1 db/view: ignore tablet-based views in view_builder
View building of tablet-based views will be handled by
the view building coordinator later in this patch.
2025-08-27 08:55:46 +02:00
Michał Jadwiszczak
7dba3667c9 gms/feature_service: add VIEW_BUILDING_COORDINATOR feature 2025-08-27 08:55:46 +02:00
Avi Kivity
6dc2c42f8b alternator: streams: refactor std::views::transform with side effect
std::views::trasform()s should not have side effects since they could be
called several times, depending on the algorithm they're paired with.
For example, std::ranges::to() can run the algorithm once to measure
the resulting container size, and then a second time to copy the data
(avoiding reallocations). If that happens, then the side-effect happens
twice.

Avoid this be refactoring the code. Make the side-effect -- appending
to the `column` vector -- happen first, then use that result to generate
the `regular_column` vector.

In this case, the side effect did not happen twice because small_vector's
std::from_range_t constructor only reserves if the input range is sized
(and it is not), but better not have the weakness in the code.

Closes scylladb/scylladb#25011
2025-08-25 09:40:05 +03:00
Michael Litvak
25fb3b49fa dist/docker: add dc and rack arguments
add --dc and --rack commandline arguments to the scylla docker image, to
allow starting a node with a specified dc and rack names in a simple
way.

This is useful mostly for small examples and demonstrations of starting
multiple nodes with different racks, when we prefer not to bother with
editing configuration files. The ability to assign nodes to different
racks is especially important with RF=Rack enforcing.

The previous method to achieve this is to set the snitch to
GossipingPropertyFileSnitch and provide a configuration file in
/etc/scylla/cassandra-rackdc.properties with the name of the dc and
rack.

The new dc and rack parameters are implemented similarly by using the
snitch GossipingPropertyFileSnitch and writing the dc and rack values to
the rackdc properties file. We don't support passing the parameters
together with a different snitch, or when mounting a properties file
from the host, because we don't want to overwrite it.

Example:
docker run -d --name scylla1 scylladb/scylla --dc my_dc1 --rack my_rack1

Fixes scylladb/scylladb#23423

Closes scylladb/scylladb#25607
2025-08-24 17:48:07 +03:00
Nadav Har'El
87dd96f9a2 Merge ' Alternator: DynamoDB compatible WCU Calculation via Read-Before-Write Support' from Amnon Heiman
This series adds support for a DynamoDB-compatible Write Capacity Unit (WCU) calculation in Alternator by introducing an optional forced read-before-write mechanism.

Alternator's model differs from DynamoDB, and as a result, some write operations may report lower WCU usage compared to what DynamoDB would report. While this is acceptable in many cases, there are scenarios where users may require accurate WCU reporting that aligns more closely with DynamoDB's behavior.

To address this, a new configuration option, alternator_force_read_before_write, is introduced. When enabled, Alternator will perform a read before executing PutItem, UpdateItem, and DeleteItem operations. This allows it to take the existing item size into account when computing the WCU. BatchWriteItem support is also extended to use this mechanism. Because BatchWriteItem does not support returning old items directly, several internal changes were made to support reading previous item sizes with minimal overhead. Reads are performed at consistency level LOCAL_ONE for efficiency, and the WCU calculation is now done in multiple stages to accurately account for item size differences.

In addition to the implementation changes, test coverage was added to validate the new behavior. These tests confirm that WCU is calculated based on the larger of the old and new items when read-before-write is active, including for BatchWriteItem.

This feature comes with performance overhead and is therefore disabled by default. It can be enabled at runtime via the system.config table and should be used only when precise WCU tracking is necessary.
**New feature, no need to backport**

Closes scylladb/scylladb#24436

* github.com:scylladb/scylladb:
  alternator/test_returnconsumedcapacity.py: Test forced read before write
  alternator/executor.cc: DynamoDB WCU calculation in BatchWriteItem using read-before-write
  executor.cc: get_previous_item with consistency level
  executor: Extend API of put_or_delete_item
  alternator/executor.cc: Accurate WCU for put, update, delete
  config: add alternator_force_read_before_write
2025-08-24 11:38:24 +03:00
Avi Kivity
8815491085 treewide: include boost headers as "system" headers
Boost is external to the project so treat its headers as "system"
headers and include them with angle brackets.

Closes scylladb/scylladb#25619
2025-08-22 17:21:24 +03:00
Piotr Dulikowski
5709d94826 Merge 'cql3: Warn when creating RF-rack-invalid keyspace' from Dawid Mędrek
Although RF-rack-valid keyspaces are not universally enforced
yet (they're governed by the configuration option
`rf_rack_valid_keyspaces`), we'd like to encourage the user to
abide by the restriction.

To that end, we're introducing a warning when creating or
altering a keyspace. If the configuration option is disabled,
but the user is trying to create an RF-rack-invalid keyspace,
they'll receive a warning.

If the option is turned off, we will also log all of the
RF-rack-invalid keyspaces at start-up.

We provide validation tests.

Fixes scylladb/scylladb#23330

Backport: we'd like to encourage the user to abide by the restriction
even when they don't enforce it to make it easier in the future to
adjust the schema when there's no way to disable it anymore. Because
of that, we'd like to backport it to all relevant versions, starting with 2025.1.

Closes scylladb/scylladb#24785

* github.com:scylladb/scylladb:
  main: Log RF-rack-invalid keyspaces at startup
  cql3/statements: Fix indentation
  cql3: Warn when creating RF-rack-invalid keyspace
2025-08-22 11:33:32 +02:00
Evgeniy Naydanov
ab15c94a09 test.py: dtest/commitlog_test: add test_pinned_cl_segment_doesnt_resurrect_data
test_pinned_cl_segment_doesnt_resurrect_data was not moved in #24946 from
scylla-dtest to this repo, because it's marked as xfail (#14879), but
actually the issue is fixed and there is no reason to keep the test in
scylla-dtest.

Also remove unused imports.

Closes scylladb/scylladb#25592
2025-08-22 11:30:10 +03:00
Raphael S. Carvalho
149f9d8448 replica: Fix race between drop table and merge completion handling
Consider this:
1) merge finishes, wakes up fiber to merge compaction groups
2) drop table happens, which in turn invokes truncate underneath
3) merge fiber stops old groups
4) truncate disables compaction on all groups, but the ones stopped
5) truncate performs a check that compaction has been disabled on
all groups, including the ones stopped
6) the check fails because groups being stopped didn't have compaction
explicitly disabled on them

To fix it, the check on step 6 will ignore groups that have been
stopped, since those are not eligible for having compaction explicitly
disabled on them. The compaction check is there, so ongoing compaction
will not propagate data being truncated, but here it happens in the
context of drop table which doesn't leave anything behind. Also, a
group stopped is somewhat equivalent to compaction disabled on it,
since the procedure to stop a group stops all ongoing compaction
and eventually removes its state from compaction manager.

Fixes #25551.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#25563
2025-08-22 10:19:43 +03:00
kendrick-ren
d6e62aeb6a Update launch-on-gcp.rst
Add the missing '=' mark in --zone option. Otherwise the command complains.

Closes scylladb/scylladb#25471
2025-08-22 10:13:52 +03:00
Botond Dénes
3dcb596201 Merge 'test: properly unset recovery_leader in the recovery procedure tests' from Patryk Jędrzejczak
After changing the type of the `recovery_leader` config option from
`sstring` to `UUID` in #25032, setting `recovery_leader` to an empty
string became an incorrect way to unset it. The following error started
to appear in the recovery procedure tests:
```
init - marshaling error: UUID string size mismatch: '' : recovery_leader
```
We unset `recovery_leader` properly in this PR. To do it, we introduce
a simple way to remove config options in tests.

Backport is unneeded. This error was harmless, and Scylla ignored
`recovery_leader` after logging the error as expected by the tests.

Closes scylladb/scylladb#25365

* github.com:scylladb/scylladb:
  test: properly unset recovery_leader in the recovery procedure tests
  test: manager_client: allow removing a config option
  test: manager_client: add docstring to server_update_config
2025-08-22 10:09:37 +03:00
Benny Halevy
45c496c276 api: storage_service: fix token_range documentation
Note that the token_range type is used only by describe_ring.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#25609
2025-08-22 10:06:21 +03:00
Patryk Jędrzejczak
193a74576a test/cluster/conftest: cluster_con: provide default values for port and use_ssl
Some cluster tests use `cluster_con` when they need a different load
balancing policy or auth provider. However, no test uses a port
other than 9042 or enables SSL, but all tests must pass `9042, False`
because these parameters don't have default values. This makes the code
more verbose. Also, it's quite obvious that 9042 stands for port, but
it's not obvious what `False` is related to, so there is a need to check
the definition of `cluster_con` while reading any test that uses it.

No reason to backport, it's only a minor refactoring.

Closes scylladb/scylladb#25516
2025-08-22 09:51:24 +03:00
David Garcia
07d798a59d docs: fix sidebar on local preview
Closes scylladb/scylladb#25560
2025-08-22 09:50:07 +03:00
David Garcia
c3c70ba73f docs: expose alternator metrics
Renders in the docs some metrics introduced in https://github.com/scylladb/scylladb/pull/24046/files that were not being displayed in https://docs.scylladb.com/manual/stable/reference/metrics.html

Closes scylladb/scylladb#25561
2025-08-22 09:49:52 +03:00
David Garcia
461a0bad8a docs: do not show any version warning for upgrade guide pages
Closes scylladb/scylladb#25562
2025-08-22 09:49:27 +03:00
Avi Kivity
e140bd6355 Update seastar submodule
* seastar 1520326e6...0a90f7945 (13):
  > include: Keep linux-aio.hh deeper in internal
  > include: Move array_map.hh to util/internal/
  > io_tester: Add support for scheduling supergroups
  > Merge 'tls: Force buffer splitting into gnutls record block sized chunks' from Calle Wilund
  > Add iotune --get-best-iops-with-buffer-sizes option
  > noncopyable_function: use memcpy instead of bytewise copy loop
  > test: Make fair_queue_test validation code use BOOST_CHECK_...-s
  > test: Rework test_fair_queue_random_run
  > Merge 'Remove capacity configuration for fair_queue tests' from Pavel Emelyanov
  > reactor: replace boost::barrier with std::barrier<>
  > rpc: server::process(): reindent
  > test: Remove no-op dispatch from fair_queue ticker
  > Merge 'json: API level 8: use noncopyable_function in json_return_type' from Benny (#2921)

Closes scylladb/scylladb#25624
2025-08-22 09:41:02 +03:00
Andrzej Jackowski
86fc513bd9 auth: allow dropping roles in saslauthd_authenticator
Before this change, `saslauthd_authenticator` prevented dropping
roles. The current documentation instructs users to `Ensure Scylla has
the same users and roles as listed in the LDAP directory`. Therefore,
ScyllaDB should allow dropping roles so administrators can remove
obsolete roles from both LDAP and ScyllaDB.

The code change is minimal — dropping a role is a no-op, similar to the
existing no-op implementations for successful `create` and `alter`
operations.

`saslauthd_authenticator_test` is updated to verify that dropping
a role doesn't throw anymore.

Fixes: scylladb/scylladb#25571

Closes scylladb/scylladb#25574
2025-08-22 09:40:44 +03:00
Yaron Kaikov
c0fd3deeab github: Enhance label sync to support P0/P1 priority labels
Extend the existing label synchronization system to handle P0 and P1
priority labels in addition to backport/* labels:

- Add P0/P1 label syncing between issues and PRs bidirectionally
- Automatically add 'force_on_cloud' label to PRs when P0/P1 labels
  are present (either copied from issues or added directly)

The workflow now triggers on P0 and P1 label events in addition to
backport/* labels, ensuring priority labels are properly reflected
across the entire PR lifecycle.

Refs: https://github.com/scylladb/scylla-pkg/issues/5383

Closes scylladb/scylladb#25604
2025-08-22 06:50:13 +03:00
Dawid Mędrek
837d267cbf main: Log RF-rack-invalid keyspaces at startup
When the configuration option `rf_rack_valid_keyspaces` is enabled and there
is an RF-rack-invalid keyspace, starting a node fails. However, when the
configuration option is disabled, but there still is a keyspace that violates
the condition, we'd like Scylla to print a warning informing the user about
the fact. That's what happens in this commit.

We provide a validation test.
2025-08-21 19:35:33 +02:00
Dawid Mędrek
af8a3dd17b cql3/statements: Fix indentation 2025-08-21 19:29:36 +02:00
Dawid Mędrek
60ea22d887 cql3: Warn when creating RF-rack-invalid keyspace
Although RF-rack-valid keyspaces are not universally enforced
yet (they're governed by the configuration option
`rf_rack_valid_keyspaces`), we'd like to encourage the user to
abide by the restriction.

To that end, we're introducing a warning when creating or
altering a keyspace. If the configuration option is disabled,
but the user is trying to create an RF-rack-invalid keyspace,
they'll receive a warning.

We provide a validation test.
2025-08-21 19:29:33 +02:00
Evgeniy Naydanov
3a98331731 test.py: don't fail if use multiple tests from one dir in commandline
There is the stash item REPEATED_FILES for directory items which used to cut
recursion.  But if multiple tests from one directory added to ./test.py
commandline this solution prevents handling non-first tests well because
it was already collected for the first one.  Change behavior to not store
all repeated files in the stash but just files which are in the process
of repetition.  Rename the stash item to REPEATING_FILES to reflect this
change.

Closes scylladb/scylladb#25611
2025-08-21 19:43:13 +03:00