If the view building coordinator is running, adjust view_building_tasks
in case of tablet operations.
The mutations are generated in the same batch as tablet mutations.
At the start of tablet migration/resize/RF change, started
view building tasks are aborted (by setting ABORTED state) if needed.
Then, new adjusted tasks are created in group0 batch which ends the
tablet operation and aborted tasks are removed from the table.
In case the tablet operation fails or is revoked, aborted view building
tasks are rollback by creating new copies of them and aborted ones are
deleted from the table.
View building tasks are not aborted/changed during tablet repair,
because in this case, even if vb task is started, a staging sstable will
be generated.
The coordinator is responsible for building tablet-based views.
It schedules tasks for `view_building_worker` and updates views'
statuses.
The tasks are scheduled in a way that one shard is processing only one
tablet at most (there may be multiple tasks since a base table may have
multiple views).
Support for tablet operations will be added in next commits.
Because `system.built_views` is a node-local table, we cannot mark a
view as built directly from the view building coordinator.
Instead, view building worker looks at data from
`syste.view_build_status_v2` and updates `built_views` table
accordingly.
The worker is responsible for building tablet-based views by
executing tasks scheduled by the view building coordinator.
It observes view building state machine and wait on the machine's
conditional variable (so the worker is woken up when group0 state is
applied).
The tasks are executed in batches, all tasks in one batch need to have
the same: type, base_id, table_id. One shard can only execute one batch
at a time (at least for now, in the future we might want to change
that).
That worker keeps track of finished and failed tasks in its local state.
The state is cleared when `view_building_state::currently_processed_base_table`
is changed.
Extract common methods of view builder consumer to an abstract class
and `flush_base()` and `make_partition_slice()` functions,
so they can be used in view builder (vnode-based views) and view
building consumer (tablet-based views; introduced in the next commit).
In next commit, I'm going to introduce `view_building_worker::consumer`,
with very similar functionalities to `view_builder::consumer` but it'll
only consume range of one tablet per execution.
Since most functions are very similar, I'll create abstract
`view_consumer` which will be base for both of the consumers.
In order to make the transition more readable, this commit prepares
the `view_builder::consumer` by making some functions virtual and next
commit will extract part of functions to the abstract class.
The RPC will be used by view building coordinator to attach to and wait
for tasks performed by view building worker (introduced in later
commit).
The RPC gets vector of tasks' ids and returns vector of
`view_task_result`s.
i-th task result reffers to i-th task id.
When a keyspace is dropped, remove all unfinished building tasks for all
views and remove their entries from `system.view_built_status_v2`
and `system.built_views`.
When a view is dropped, remove all unfinished building tasks, remove
entries from `system.view_built_status_v2` and `system.built_views`.
If the view is currently being built, removing its tasks means they
are also aborted.
Finished tasks are already removed from the table.
Create view building tasks in the same batch as new view mutations.
The tasks are created only if `VIEW_BUILDING_COORDINATOR` feature is on
and the view is in tablet keyspace.
After a few next patches, creating/dropping a view in tablet keyspace
will require a remote proxy to obtain references to system keyspace
and view building state.
Because of this, remote proxy needs to be explicitly enabled in boost
tests which create views.
The state may be also reloaded on `topology_change` or `mixed_change`
because topology coordinator may change view building tasks during
tablet operations.
The view building coordinator will be building all views
of one base table at a time.
Select first available base table as currently processing base
and save this information to `system.scylla_local`.
`view_building_state` holds mapping of `view_building_task`s for
tablet-based views. The structure is a memory representation of data
stored in group0 tables.
`views_state` holds information about tablet-based views and their
build status.
Until now, all changes to `system.view_build_status_v2` were made from
view.cc and the file contained all of the helper methods.
This commit introduces a `build_status` enum class to avoid using
hardcoded strings and extracts the helper methods to `system_keyspace`
class, so they can be later used by the view building coordinator.
std::views::trasform()s should not have side effects since they could be
called several times, depending on the algorithm they're paired with.
For example, std::ranges::to() can run the algorithm once to measure
the resulting container size, and then a second time to copy the data
(avoiding reallocations). If that happens, then the side-effect happens
twice.
Avoid this be refactoring the code. Make the side-effect -- appending
to the `column` vector -- happen first, then use that result to generate
the `regular_column` vector.
In this case, the side effect did not happen twice because small_vector's
std::from_range_t constructor only reserves if the input range is sized
(and it is not), but better not have the weakness in the code.
Closesscylladb/scylladb#25011
add --dc and --rack commandline arguments to the scylla docker image, to
allow starting a node with a specified dc and rack names in a simple
way.
This is useful mostly for small examples and demonstrations of starting
multiple nodes with different racks, when we prefer not to bother with
editing configuration files. The ability to assign nodes to different
racks is especially important with RF=Rack enforcing.
The previous method to achieve this is to set the snitch to
GossipingPropertyFileSnitch and provide a configuration file in
/etc/scylla/cassandra-rackdc.properties with the name of the dc and
rack.
The new dc and rack parameters are implemented similarly by using the
snitch GossipingPropertyFileSnitch and writing the dc and rack values to
the rackdc properties file. We don't support passing the parameters
together with a different snitch, or when mounting a properties file
from the host, because we don't want to overwrite it.
Example:
docker run -d --name scylla1 scylladb/scylla --dc my_dc1 --rack my_rack1
Fixesscylladb/scylladb#23423Closesscylladb/scylladb#25607
This series adds support for a DynamoDB-compatible Write Capacity Unit (WCU) calculation in Alternator by introducing an optional forced read-before-write mechanism.
Alternator's model differs from DynamoDB, and as a result, some write operations may report lower WCU usage compared to what DynamoDB would report. While this is acceptable in many cases, there are scenarios where users may require accurate WCU reporting that aligns more closely with DynamoDB's behavior.
To address this, a new configuration option, alternator_force_read_before_write, is introduced. When enabled, Alternator will perform a read before executing PutItem, UpdateItem, and DeleteItem operations. This allows it to take the existing item size into account when computing the WCU. BatchWriteItem support is also extended to use this mechanism. Because BatchWriteItem does not support returning old items directly, several internal changes were made to support reading previous item sizes with minimal overhead. Reads are performed at consistency level LOCAL_ONE for efficiency, and the WCU calculation is now done in multiple stages to accurately account for item size differences.
In addition to the implementation changes, test coverage was added to validate the new behavior. These tests confirm that WCU is calculated based on the larger of the old and new items when read-before-write is active, including for BatchWriteItem.
This feature comes with performance overhead and is therefore disabled by default. It can be enabled at runtime via the system.config table and should be used only when precise WCU tracking is necessary.
**New feature, no need to backport**
Closesscylladb/scylladb#24436
* github.com:scylladb/scylladb:
alternator/test_returnconsumedcapacity.py: Test forced read before write
alternator/executor.cc: DynamoDB WCU calculation in BatchWriteItem using read-before-write
executor.cc: get_previous_item with consistency level
executor: Extend API of put_or_delete_item
alternator/executor.cc: Accurate WCU for put, update, delete
config: add alternator_force_read_before_write
Although RF-rack-valid keyspaces are not universally enforced
yet (they're governed by the configuration option
`rf_rack_valid_keyspaces`), we'd like to encourage the user to
abide by the restriction.
To that end, we're introducing a warning when creating or
altering a keyspace. If the configuration option is disabled,
but the user is trying to create an RF-rack-invalid keyspace,
they'll receive a warning.
If the option is turned off, we will also log all of the
RF-rack-invalid keyspaces at start-up.
We provide validation tests.
Fixesscylladb/scylladb#23330
Backport: we'd like to encourage the user to abide by the restriction
even when they don't enforce it to make it easier in the future to
adjust the schema when there's no way to disable it anymore. Because
of that, we'd like to backport it to all relevant versions, starting with 2025.1.
Closesscylladb/scylladb#24785
* github.com:scylladb/scylladb:
main: Log RF-rack-invalid keyspaces at startup
cql3/statements: Fix indentation
cql3: Warn when creating RF-rack-invalid keyspace
test_pinned_cl_segment_doesnt_resurrect_data was not moved in #24946 from
scylla-dtest to this repo, because it's marked as xfail (#14879), but
actually the issue is fixed and there is no reason to keep the test in
scylla-dtest.
Also remove unused imports.
Closesscylladb/scylladb#25592
Consider this:
1) merge finishes, wakes up fiber to merge compaction groups
2) drop table happens, which in turn invokes truncate underneath
3) merge fiber stops old groups
4) truncate disables compaction on all groups, but the ones stopped
5) truncate performs a check that compaction has been disabled on
all groups, including the ones stopped
6) the check fails because groups being stopped didn't have compaction
explicitly disabled on them
To fix it, the check on step 6 will ignore groups that have been
stopped, since those are not eligible for having compaction explicitly
disabled on them. The compaction check is there, so ongoing compaction
will not propagate data being truncated, but here it happens in the
context of drop table which doesn't leave anything behind. Also, a
group stopped is somewhat equivalent to compaction disabled on it,
since the procedure to stop a group stops all ongoing compaction
and eventually removes its state from compaction manager.
Fixes#25551.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closesscylladb/scylladb#25563
After changing the type of the `recovery_leader` config option from
`sstring` to `UUID` in #25032, setting `recovery_leader` to an empty
string became an incorrect way to unset it. The following error started
to appear in the recovery procedure tests:
```
init - marshaling error: UUID string size mismatch: '' : recovery_leader
```
We unset `recovery_leader` properly in this PR. To do it, we introduce
a simple way to remove config options in tests.
Backport is unneeded. This error was harmless, and Scylla ignored
`recovery_leader` after logging the error as expected by the tests.
Closesscylladb/scylladb#25365
* github.com:scylladb/scylladb:
test: properly unset recovery_leader in the recovery procedure tests
test: manager_client: allow removing a config option
test: manager_client: add docstring to server_update_config
Some cluster tests use `cluster_con` when they need a different load
balancing policy or auth provider. However, no test uses a port
other than 9042 or enables SSL, but all tests must pass `9042, False`
because these parameters don't have default values. This makes the code
more verbose. Also, it's quite obvious that 9042 stands for port, but
it's not obvious what `False` is related to, so there is a need to check
the definition of `cluster_con` while reading any test that uses it.
No reason to backport, it's only a minor refactoring.
Closesscylladb/scylladb#25516
Before this change, `saslauthd_authenticator` prevented dropping
roles. The current documentation instructs users to `Ensure Scylla has
the same users and roles as listed in the LDAP directory`. Therefore,
ScyllaDB should allow dropping roles so administrators can remove
obsolete roles from both LDAP and ScyllaDB.
The code change is minimal — dropping a role is a no-op, similar to the
existing no-op implementations for successful `create` and `alter`
operations.
`saslauthd_authenticator_test` is updated to verify that dropping
a role doesn't throw anymore.
Fixes: scylladb/scylladb#25571Closesscylladb/scylladb#25574
Extend the existing label synchronization system to handle P0 and P1
priority labels in addition to backport/* labels:
- Add P0/P1 label syncing between issues and PRs bidirectionally
- Automatically add 'force_on_cloud' label to PRs when P0/P1 labels
are present (either copied from issues or added directly)
The workflow now triggers on P0 and P1 label events in addition to
backport/* labels, ensuring priority labels are properly reflected
across the entire PR lifecycle.
Refs: https://github.com/scylladb/scylla-pkg/issues/5383Closesscylladb/scylladb#25604
When the configuration option `rf_rack_valid_keyspaces` is enabled and there
is an RF-rack-invalid keyspace, starting a node fails. However, when the
configuration option is disabled, but there still is a keyspace that violates
the condition, we'd like Scylla to print a warning informing the user about
the fact. That's what happens in this commit.
We provide a validation test.
Although RF-rack-valid keyspaces are not universally enforced
yet (they're governed by the configuration option
`rf_rack_valid_keyspaces`), we'd like to encourage the user to
abide by the restriction.
To that end, we're introducing a warning when creating or
altering a keyspace. If the configuration option is disabled,
but the user is trying to create an RF-rack-invalid keyspace,
they'll receive a warning.
We provide a validation test.
There is the stash item REPEATED_FILES for directory items which used to cut
recursion. But if multiple tests from one directory added to ./test.py
commandline this solution prevents handling non-first tests well because
it was already collected for the first one. Change behavior to not store
all repeated files in the stash but just files which are in the process
of repetition. Rename the stash item to REPEATING_FILES to reflect this
change.
Closesscylladb/scylladb#25611
Content of the HTTP error was logged in Scylla as literal
list of chars (default temporary buffer formatting).
Changed to print the sstring made out of temporary buffer,
which fixes the problem with formatting, making the output
clear and readable for humans.
Fixes: VECTOR-141
Closesscylladb/scylladb#25329