Commit Graph

1757 Commits

Author SHA1 Message Date
Sayanta Banerjee
6092520631 Small grammatical changes
Closes scylladb/scylladb#24667
2025-09-12 15:58:19 +03:00
Radosław Cybulski
436150eb52 treewide: fix spelling errors
Fix spelling errors reported by copilot on github.
Remove single use namespace alias.

Closes scylladb/scylladb#25960
2025-09-12 15:58:19 +03:00
Asias He
cb7db47ae1 repair: Add incremental_mode option for tablet repair
This patch introduces a new `incremental_mode` parameter to the tablet
repair REST API, providing more fine-grained control over the
incremental repair process.

Previously, incremental repair was on and could not be turned off. This
change allows users to select from three distinct modes:

- `regular`: This is the default mode. It performs a standard
  incremental repair, processing only unrepaired sstables and skipping
  those that are already repaired. The repair state (`repaired_at`,
  `sstables_repaired_at`) is updated.

- `full`: This mode forces the repair to process all sstables, including
  those that have been previously repaired. This is useful when a full
  data validation is needed without disabling the incremental repair
  feature. The repair state is updated.

- `disabled`: This mode completely disables the incremental repair logic
  for the current repair operation. It behaves like a classic
  (pre-incremental) repair, and it does not update any incremental
  repair state (`repaired_at` in sstables or `sstables_repaired_at` in
  the system.tablets table).

The implementation includes:

- Adding the `incremental_mode` parameter to the
  `/storage_service/repair/tablet` API endpoint.
- Updating the internal repair logic to handle the different modes.
- Adding a new test case to verify the behavior of each mode.
- Updating the API documentation and developer documentation.

Fixes #25605

Closes scylladb/scylladb#25693
2025-09-09 06:50:21 +03:00
Anna Stuchlik
f66580a28f doc: add support for i7i instances
This commit adds currently supported i7i and i7ie instances
to the list of instance recommendations.

Fixes https://github.com/scylladb/scylladb/issues/25808

Closes scylladb/scylladb#25817
2025-09-05 14:14:58 +02:00
dependabot[bot]
862f965196 build(deps): bump sphinx-scylladb-theme from 1.8.7 to 1.8.8 in /docs
Bumps [sphinx-scylladb-theme](https://github.com/scylladb/sphinx-scylladb-theme) from 1.8.7 to 1.8.8.
- [Release notes](https://github.com/scylladb/sphinx-scylladb-theme/releases)
- [Commits](https://github.com/scylladb/sphinx-scylladb-theme/compare/1.8.7...1.8.8)

---
updated-dependencies:
- dependency-name: sphinx-scylladb-theme
  dependency-version: 1.8.8
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Closes scylladb/scylladb#25823
2025-09-04 18:24:09 +03:00
Pavel Emelyanov
b0aa2d61d9 Merge 'cql3: add default replication factor to create_keyspace_statement' from Dario Mirovic
When creating a new keyspace, replication factor must be stated.
For example:
`CREATE KEYSPACE ks WITH REPLICATION { 'class': 'NetworkTopologyStrategy', 'replication_factor': 3 };`

This patch changes it in the following way - if there is no
replication factor specified, use default replication factor.
Default replication factor is equal to the number of racks that
are not arbiter-only, i.e. racks that have at least one non-arbiter node.
The following syntax is now valid:
`CREATE KEYSPACE ks WITH REPLICATION { 'class': 'NetworkTopologyStrategy' };`
`CREATE KEYSPACE ks WITH REPLICATION { };`

Fixes #16028

Backport is not needed. This is an enhancement for future releases.

Closes scylladb/scylladb#25570

* github.com:scylladb/scylladb:
  docs/cql: update documentation for default replication factor
  test/cqlpy: add keyspace creation default replication factor tests
  cql3: add default replication factor to `create_keyspace_statement`
2025-09-03 12:31:53 +03:00
Piotr Wieczorek
5add43e15c alternator: streams: Address minor incompatibilities with DynamoDB in GetRecords response.
This commit adds missing fields to GetRecords responses: `awsRegion` and
`eventVersion`. We also considered changing `eventSource` from
`scylladb:alternator` to `aws:dynamodb` and setting `SizeBytes` subfield
inside the `dynamodb` field.

We set `awsRegion` to the datacenter's name of the node that received
the request. This is in line with the AWS documentation, except that
Scylla has no direct equivalent of a region, so we use the datacenter's
name, which is analogous to DynamoDB's concept of region.

The field `eventVersion` determines the structure of a Record. It is
updated whenever the structure changes. We think that adding a field
`userIdentity` bumped the version from `1.0` to `1.1`. Currently, Scylla
doesn't support this field (#11523), hence we use the older 1.0 version.

We have decided to leave `eventSource` as is, since it's easy to modify
it in case of problems to `aws:dynamodb` used by DynamoDB.

Not setting `SizeBytes` subfield inside the `dynamodb` field was
dictated by the lack of apparent use cases. The documentation is unclear
about how `SizeBytes` is calculated and after experimenting a little
bit, I haven't found an obvious pattern.

Fixes: #6931

Closes scylladb/scylladb#24903
2025-08-31 14:55:47 +03:00
Avi Kivity
bc5773f777 Merge 'Add out of space prevention mechanisms' from Łukasz Paszkowski
When a scaling out is delayed or fails, it is crucial to ensure that clusters remain operational
and recoverable even under extreme conditions. To achieve this, the following proactive measures
are implemented:
- reject writes
      - includes: inserts, updates, deletes, counter updates, hints, read+repair and lwt writes
      - applicable to: user tables, views, CDC log, audit, cql tracing
- stop running compactions/repairs and prevent from starting new ones
- reject incoming tablet migrations

The aforementioned mechanisms are automatically enabled when node's disk utilization reaches
the critical level (default: 98%) and disabled when the utilization drop below the threshold.

Apart from that, the series add tests that require mounted volumes to simulate out of space.
The paths to the volumes can be provided using the a pytest argument, i.e.  `--space-limited-dirs`.
When not provided, tests are skipped.

Test scenarios:

1. Start a cluster and write data until one of the nodes reaches 90% of the disk utilization
2. Perform an **operation** that would take the nodes over 100%
3. The nodes should not exceed the critical disk utilization (98% by default)
4. Scale out the cluster by adding one node per rack
5. Retry or wait for the **operation** from step 2

The **operation** is: writing data, running compactions, building materialized views, running repair,
migrating tablets (caused by RF change, decommission).

The test is successful, if no nodes run out of space, the **operation** from step 2 is
aborted/paused/timed out and the **operation** from step 5 is successful.

`perf-simple-query --smp 1 -m 1G` results obtained for fixed 400MHz frequency:

Read path (before)

```
instructions_per_op:
	mean=   39661.51 standard-deviation=34.53
	median= 39655.39 median-absolute-deviation=23.33
	maximum=39708.71 minimum=39622.61
```

Read path (after)

```
instructions_per_op:
	mean=   39691.68 standard-deviation=34.54
	median= 39683.14 median-absolute-deviation=11.94
	maximum=39749.32 minimum=39656.63
```

Write path (before):

```
instructions_per_op:
	mean=   50942.86 standard-deviation=97.69
	median= 50974.11 median-absolute-deviation=34.25
	maximum=51019.23 minimum=50771.60
```

Write path (after):

```
instructions_per_op:
	mean=   51000.15 standard-deviation=115.04
	median= 51043.93 median-absolute-deviation=52.19
	maximum=51065.81 minimum=50795.00
```

Fixes: https://github.com/scylladb/scylladb/issues/14067
Refs: https://github.com/scylladb/scylladb/issues/2871

No backport, as it is a new feature.

Closes scylladb/scylladb#23917

* github.com:scylladb/scylladb:
  tests/cluster: Add new storage tests
  test/scylla_cluster: Override workdir when passed via cmdline
  streaming: Reject incoming migrations
  storage_service: extend locator::load_stats to collect per-node critical disk utilization flag
  repair_service: Add a facility to disable the service
  compaction_manager: Subscribe to out of space controller
  compaction_manager: Replace enabled/disabled states with running state
  database: Add critical_disk_utilization mode database can be moved to
  disk_space_monitor: add subscription API for threshold-based disk space monitoring
  docs: Add feature documentation
  config: Add critical_disk_utilization_level option
  replica/exceptions: Add a new custom replica exception
2025-08-30 18:47:57 +03:00
Piotr Dulikowski
7ccb50514d Merge 'Introduce view building coordinator' from Michał Jadwiszczak
This patch introduces `view_building_coordinator`, a single entity within whole cluster responsible for building tablet-based views.

The view building coordinator takes slightly different approach than the existing node-local view builder. The whole process is split into smaller view building tasks, one per each tablet replica of the base table.
The coordinator builds one base table at a time and it can choose another when all views of currently processing base table are built.
The tasks are started by setting `STARTED` state and they are executed by node-local view building worker. The tasks are scheduled in a way, that each shard processes only one tablet at a time (multiple tasks can be started for a shard on a node because a table can have multiple views but then all tasks have the same base table and tablet (last_token)). Once the coordinator starts the tasks, it sends `work_on_view_building_tasks` RPC to start the tasks and receive their results.
This RPC is resilient to RPC failure or raft leader change, meaning if one RPC call started a batch of tasks but then failed (for instance the raft leader was changed and caller aborted waiting for the response), next RPC call will attach itself to the already started batch.

The coordinator plugs into handling tablet operations (migration/resize/RF change) and adjusts its tasks accordingly. At the start of each tablet operation, the coordinator aborts necessary view building tasks to prevent https://github.com/scylladb/scylladb/issues/21564. Then, new adjusted tasks are created at the end of the operation.
If the operation fails at any moment, aborted tasks are rollback.

The view building coordinator can also handle staging sstables using process_staging view building tasks. We do this because we don't want to start generating view updates from a staging sstable prematurely, before the writes are directed to the new replica (https://github.com/scylladb/scylladb/issues/19149).

For detailed description check: `docs/dev/view-building-coordinator.md`

Fixes https://github.com/scylladb/scylladb/issues/22288
Fixes https://github.com/scylladb/scylladb/issues/19149
Fixes https://github.com/scylladb/scylladb/issues/21564
Fixes https://github.com/scylladb/scylladb/issues/17603
Fixes https://github.com/scylladb/scylladb/issues/22586
Fixes https://github.com/scylladb/scylladb/issues/18826
Fixes https://github.com/scylladb/scylladb/issues/23930

---

This PR is reimplementation of https://github.com/scylladb/scylladb/pull/21942

Closes scylladb/scylladb#23760

* github.com:scylladb/scylladb:
  test/cluster: add view build status tests
  test/cluster: add view building coordinator tests
  utils/error_injection: allow to abort `injection_handler::wait_for_message()`
  test: adjust existing tests
  utils/error_injection: add injection with `sleep_abortable()`
  db/view/view_builder: ignore `no_such_keyspace` exception
  docs/dev: add view building coordinator documentation
  db/view/view_building_worker: work on `process_staging` tasks
  db/view/view_building_worker: register staging sstable to view building coordinator when needed
  db/view/view_building_worker: discover staging sstables
  db/view/view_building_worker: add method to register staging sstable
  db/view/view_update_generator: add method to process staging sstables instantly
  db/view/view_update_generator: extract generating updates from staging sstables to a method
  db/view/view_update_generator: ignore tablet-based sstables
  db/view/view_building_coordinator: update view build status on node join/left
  db/view/view_building_coordinator: handle tablet operations
  db/view: add view building task mutation builder
  service/topology_coordinator: run view building coordinator
  db/view: introduce `view_building_coordinator`
  db/view/view_building_worker: update built views locally
  db/view: introduce `view_building_worker`
  db/view: extract common view building functionalities
  db/view: prepare to create abstract `view_consumer`
  message/messaging_service: add `work_on_view_building_tasks` RPC
  service/topology_coordinator: make `term_changed_error` public
  db/schema_tables: create/cleanup tasks when an index is created/dropped
  service/migration_manager: cleanup view building state on drop keyspace
  service/migration_manager: cleanup view building state on drop view
  service/migration_manager: create view building tasks on create view
  test/boost: enable proxy remote in some tests
  service/migration_manager: pass `storage_proxy` to `prepare_keyspace_drop_announcement()`
  service/migration_manager: coroutinize `prepare_new_view_announcement()`
  service/storage_proxy: expose references to `system_keyspace` and `view_building_state_machine`
  service: reload `view_building_state_machine` on group0 apply()
  service/vb_coordinator: add currently processing base
  db/system_keyspace: move `get_scylla_local_mutation()` up
  db/system_keyspace: add `view_building_tasks` table
  db/view: add view_building_state and views_state
  db/system_keyspace: add method to get view build status map
  db/view: extract `system.view_build_status_v2` cql statements to system_keyspace
  db/system_keyspace: move `internal_system_query_state()` function earlier
  db/view: ignore tablet-based views in `view_builder`
  gms/feature_service: add VIEW_BUILDING_COORDINATOR feature
2025-08-29 17:28:44 +02:00
Łukasz Paszkowski
c2de678a87 docs: Add feature documentation
1. Adds user-facing page in /docs/troubleshooting/error-messages
2025-08-28 18:06:37 +02:00
Ran Regev
515d9f3e21 docs: backup and restore feature
added backup and restore as a feature
to documentation

Signed-off-by: Ran Regev <ran.regev@scylladb.com>

Closes scylladb/scylladb#25608
2025-08-28 13:00:19 +03:00
Dario Mirovic
587a877718 docs/cql: update documentation for default replication factor
Update create-keyspace-statement section of ddl.rst since replication factor is no longer mandatory.
Add an example for keyspace creation without specifying replication factor.
Add an example for keyspace creation without specifying both `class` and replication factor.

Refs: #16028
2025-08-28 01:42:34 +02:00
Michał Jadwiszczak
c4288aa1f8 docs/dev: add view building coordinator documentation 2025-08-27 10:23:04 +02:00
Michael Litvak
25fb3b49fa dist/docker: add dc and rack arguments
add --dc and --rack commandline arguments to the scylla docker image, to
allow starting a node with a specified dc and rack names in a simple
way.

This is useful mostly for small examples and demonstrations of starting
multiple nodes with different racks, when we prefer not to bother with
editing configuration files. The ability to assign nodes to different
racks is especially important with RF=Rack enforcing.

The previous method to achieve this is to set the snitch to
GossipingPropertyFileSnitch and provide a configuration file in
/etc/scylla/cassandra-rackdc.properties with the name of the dc and
rack.

The new dc and rack parameters are implemented similarly by using the
snitch GossipingPropertyFileSnitch and writing the dc and rack values to
the rackdc properties file. We don't support passing the parameters
together with a different snitch, or when mounting a properties file
from the host, because we don't want to overwrite it.

Example:
docker run -d --name scylla1 scylladb/scylla --dc my_dc1 --rack my_rack1

Fixes scylladb/scylladb#23423

Closes scylladb/scylladb#25607
2025-08-24 17:48:07 +03:00
kendrick-ren
d6e62aeb6a Update launch-on-gcp.rst
Add the missing '=' mark in --zone option. Otherwise the command complains.

Closes scylladb/scylladb#25471
2025-08-22 10:13:52 +03:00
David Garcia
07d798a59d docs: fix sidebar on local preview
Closes scylladb/scylladb#25560
2025-08-22 09:50:07 +03:00
David Garcia
461a0bad8a docs: do not show any version warning for upgrade guide pages
Closes scylladb/scylladb#25562
2025-08-22 09:49:27 +03:00
Ran Regev
6eca083137 remove redis documentation
As part of removing redis from Scylla source tree.
This commit removes all related documentation.
Following commit remove the code itself.

Signed-off-by: Ran Regev <ran.regev@scylladb.com>
2025-08-20 17:53:23 +03:00
Avi Kivity
611918056a Merge 'repair: Add tablet incremental repair support' from Asias He
The central idea of incremental repair is to allow repair participants
to select and repair only a portion of the dataset to speed up the
repair process. All repair participants must utilize an identical
selection method to repair and synchronize the same selected dataset.
There are two primary selection methods: time-based and file-based. The
time-based method selects data within a specified time frame. It is
versatile but it is less efficient because it requires reading all of
the dataset and omitting data beyond the time frame. The file-based
method selects data from unrepaired SSTables and is more efficient
because it allows the entire SSTable to be omitted. This document patch
implements the file-based selection method.

Incremental repair will only be supported for tablet tables; it will not
be supported for vnode tables. On one hand, the legacy vnode is less
important to support. On the other hand, the incremental repair for
vnode is much harder to implement. With vnodes, a SSTalbe could contain
data for multiple vnode ranges. When a given vnode range is repaired,
only a portion of the SSTable is repaired. This complicates the
manipulation of SSTables significantly during both repair and
compaction. With tablets, an entire tablet is repaired so that a
sstable is either fully repaired or not repaired which is a huge
simplification.

This patch uses the repaired_at from sstables::statistics component to
mark a sstable as repaired. It uses a virtual clock as the repair
timestamp, i.e., using a monotonically increasing number for the
repaired_at field of a SSTable and sstables_repaired_at column in
system.tablets table. Notice that when a sstable is not repaired, the
repaired_at field will be set to the default value 0 by default. The
being_repaired in memory field of a SSTable is used to explicitly mark
that a SSTable is being selected. The following variables are used for
incremental repair:

The repaired_at on disk field of a SSTable is used.
   - A 64-bit number increases sequentially

The sstables_repaired_at is added to the system.tablets table.
   - repaired_at <= sstables_repaired_at means the sstable is repaired

The being_repaired in memory field of a SSTable is added.
   - A repair UUID tells which sstable has participated in the repair

Initial test results:

    1) Medium dataset results
    Node amount: 3
    Instance type: i4i.2xlarge
    Disk usage per node: ~500GB
    Cluster pre-populated with ~500GB of data before starting repairs job.
    Results for Repair Timings:
    The regular repair run took 210 mins.
    Incremental repair 1st run took 183 mins, 2nd and 3rd runs took around 48s
    The speedup is: 183 mins  / 48s = 228X

    2) Small dataset results
    Node amount: 3
    Instance type: i4i.2xlarge
    Disk usage per node: ~167GB
    Cluster pre-populated with ~167GB of data before starting the repairs job.
    Regular repair 1st run took 110s,  2nd and 3rd runs took 110s.
    Incremental repair 1st run took 110 seconds, 2nd and 3rd run took 1.5 seconds.
    The speedup is: 110s / 1.5s = 73X

    3) Large dataset results
    Node amount: 6
    Instance type: i4i.2xlarge, 3 racks
    50% of base load, 50% read/write
    Dataset == Sum of data on each node

    Dataset     Non-incremental repair (minutes)
    1.3 TiB     31:07
    3.5 TiB     25:10
    5.0 TiB     19:03
    6.3 TiB     31:42

    Dataset     Incremental repair (minutes)
    1.3 TiB     24:32
    3.0 TiB     13:06
    4.0 TiB     5:23
    4.8 TiB     7:14
    5.6 TiB     3:58
    6.3 TiB     7:33
    7.0 TiB     6:55

Fixes #22472

Closes scylladb/scylladb#24291

* github.com:scylladb/scylladb:
  replica: Introduce get_compaction_reenablers_and_lock_holders_for_repair
  compaction: Move compaction_reenabler to compaction_reenabler.hh
  topology_coordinator: Make rpc::remote_verb_error to warning level
  repair: Add metrics for sstable bytes read and skipped from sstables
  test.py: Disable incremental for test_tombstone_gc_for_streaming_and_repair
  test.py: Add tests for tablet incremental repair
  repair: Add tablet incremental repair support
  compaction: Add tablet incremental repair support
  feature_service: Add TABLET_INCREMENTAL_REPAIR feature
  tablet_allocator: Add tablet_force_tablet_count_increase and decrease
  repair: Add incremental helpers
  sstable: Add being_repaired to sstable
  sstables: Add set_repaired_at to metadata_collector
  mutation_compactor: Introduce add operator to compaction_stats
  tablet: Add sstables_repaired_at to system.tablets table
  test: Fix drain api in task_manager_client.py
2025-08-19 13:13:22 +03:00
Dawid Mędrek
6a71461e53 treewide: Fix spelling errors
The errors were spotted by our GitHub Actions.

Closes scylladb/scylladb#24822
2025-08-19 13:07:43 +03:00
Anna Stuchlik
841ba86609 doc: document support for new z3 instance types
This commit adds new z3 instances we now support to the list of GCP instance types.

Fixes https://github.com/scylladb/scylladb/issues/25438

Closes scylladb/scylladb#25446
2025-08-14 10:59:45 +02:00
Anna Stuchlik
1e5659ac30 doc: add the information about ScyllaDB C# Driver
This commit adds the driver to the list of ScyllaDB drivers,
including the information about:
- CDC integration (not available)
- Tablets (supported)

Fixes https://github.com/scylladb/scylladb/issues/25495

Closes scylladb/scylladb#25498
2025-08-14 11:29:52 +03:00
Pavel Emelyanov
eaec7c9b2e Merge 'cql3: add default replication strategy to create_keyspace_statement' from Dario Mirovic
When creating a new keyspace, both replication strategy and replication
factor must be stated. For example:
`CREATE KEYSPACE ks WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'replication_factor' : 3 };`

This syntax is verbose, and in all but some testing scenarios
`NetworkTopologyStrategy` is used.

This patch allows skipping replication strategy name, filling it with
`NetworkTopologyStrategy` when that happens. The following syntax is now
valid:
`CREATE KEYSPACE ks WITH REPLICATION = { 'replication_factor' : 3 };`
and will give the same result as the previous, more explicit one.

Fixes https://github.com/scylladb/scylladb/issues/16029

Backport is not needed. This is an enhancement for future releases.

Closes scylladb/scylladb#25236

* github.com:scylladb/scylladb:
  docs/cql: update documentation for default replication strategy
  test/cqlpy: add keyspace creation default strategy test
  cql3: add default replication strategy to `create_keyspace_statement`
2025-08-14 11:18:36 +03:00
Dario Mirovic
2ac37b4fde docs/cql: update documentation for default replication strategy
Update create-keyspace-statement section of ddl.rst since `class` is no longer mandatory.
Add an example for keyspace creation without specifying `class`.

Refs: #16029
2025-08-13 01:52:00 +02:00
Wojciech Przytuła
7600ccfb20 Fix link to ScyllaDB manual
The link would point to outdated OS docs. I fixed it to point to up-to-date Enterprise docs.

Closes scylladb/scylladb#25328
2025-08-12 10:33:06 +03:00
Tomasz Grabiec
9fd312d157 Merge 'row_cache: add memtable overlap checks elision optimization for tombstone gc' from Botond Dénes
https://github.com/scylladb/scylladb/issues/24962 introduced memtable overlap checks to cache tombstone GC. This was observed to be very strict and greatly reduce the effectiveness of tombstone GC in the cache, especially for MV workloads, which regularly recycle old timestamp into new writes, so the memtable often has smaller min live timestamp than the timestamp of the tombstones in the cache.

When creating a new memtable, save a snapshot of the tombstone gc state. This snapshot is used later to exclude this memtable from overlap checks for tombstones, whose token have an expiry time larger than that of the tombstone, meaning: all writes in this memtable were produced at a point in time when the current tombstone has already expired. This has the following implications:
* The partition the tombstone is part of was already repaired at the time the memtable was created.
* All writes in the memtable were produced *after* this tombstone's expiry time, these writes cannot be possibly relevant for this tombstone.

Based on this, such memtables are excluded from the overlap checks. With adequately frequent memtable flushes -- so that the tombstone gc state snapshot is refreshed -- most memtables should be excluded from overlap checks, greatly helping the cache's tombstone GC efficiency.

Fixes: https://github.com/scylladb/scylladb/issues/24962

Fixes a regression introduced by https://github.com/scylladb/scylladb/pull/23255 which was backported to all releases, needs backport to all releases as well

Closes scylladb/scylladb#25033

* github.com:scylladb/scylladb:
  docs/dev/tombstone.md: document the memtable overlap check elision optimization
  test/boost/row_cache_test: add test for memtable overlap check elision
  db/cache_mutation_reader: obtain gc-before and min-live-ts lazily
  mutation/mutation_compactor: use max_purgeable::can_purge and max_purgeable::purge_result
  db/cache_mutation_reader: use max_purgeable::can_purge()
  replica/table: get_max_purgeable_fn_for_cache_underlying_reader(): use max_purgable::combine()
  replica/database: memtable_list::get_max_purgeable(): set expiry-treshold
  compaction/compaction_garbage_collector: max_purgeable: add expiry_treshold
  replica/table: propagate gc_state to memtable_list
  replica/memtable_list: add tombstone_gc_state* member
  replica/memtable: add tombstone_gc_state_snapshot
  tombstone_gc: introduce tombstone_gc_state_snapshot
  tombstone_gc: extract shared state into shared_tombstone_gc_state
  tombstone_gc: per_table_history_maps::_group0_gc_time: make it a value
  tombstone_gc: fold get_group0_gc_time() into its caller
  tombstone_gc: fold get_or_create_group0_gc_time() into update_group0_refresh_time()
  tombstone_gc: fold get_or_create_repair_history_for_table() into update_repair_time()
  tombstone_gc: refactor get_or_greate_repair_history_for_table()
  replica/memtable_list: s/min_live_timestamp()/get_max_purgeable()/
  db/read_context: return max_purgeable from get_max_purgeable()
  compaction/compaction_garbage_collector: add formatter for max_purgeable
  mutation: move definition of gc symbols to compaction.cc
  compaction/compaction_garbage_collector: refactor max_purgeable into a class
  test/boost/row_cache_test: refactor test_populating_reader_tombstone_gc_with_data_in_memtable
  test: rewrite test_compacting_reader_tombstone_gc_with_data_in_memtable in C++
  test/boost/row_cache_test: refactor cache tombstone GC with memtable overlap tests
2025-08-11 23:54:59 +02:00
Botond Dénes
660ea9202a docs/dev/tombstone.md: document the memtable overlap check elision optimization 2025-08-11 17:20:12 +03:00
Anna Stuchlik
1322f301f6 doc: add support for RHEL 10
This commit adds RHEL 10 to the list of supported platforms.

Fixes https://github.com/scylladb/scylladb/issues/25436

Closes scylladb/scylladb#25437
2025-08-11 13:13:37 +02:00
Patryk Jędrzejczak
7b77c6cc4a docs: Raft recovery procedure: recommend verifying participation in Raft recovery
This instruction adds additional safety. The faster we notice that
a node didn't restart properly, the better.

The old gossip-based recovery procedure had a similar recommendation
to verify that each restarting node entered `RECOVERY` mode.

Fixes #25375

This is a documentation improvement. We should backport it to all
branches with the new recovery procedure, so 2025.2 and 2025.3.

Closes scylladb/scylladb#25376
2025-08-11 09:21:29 +03:00
Asias He
5377f87e5a tablet: Add sstables_repaired_at to system.tablets table
It is used to store the repaired_at for each tablet.
2025-08-11 10:10:07 +08:00
Anna Stuchlik
f3d9d0c1c7 doc: add new and removed metrics to the 2025.3 upgrade guide
This commit adds the list of new and removed metrics to the already existing upgrade guide
from 2025.2 to 2025.3.

Fixes https://github.com/scylladb/scylladb/issues/24697

Closes scylladb/scylladb#25385
2025-08-08 13:25:51 +02:00
Botond Dénes
70aa81990b Merge 'Alternator - add the ability to write, not just read, system tables' from Nadav Har'El
In commit 44a1daf we added the ability to read Scylla system tables with Alternator. This feature is useful, among other things, in tests that want to read Scylla's configuration through the system table system.config. But tests often want to modify system.config, e.g., to temporarily reduce some threshold to make tests shorter. Until now, this was not possible

This series add supports for writing to system tables through Alternator, and examples of tests using this capability (and utility functions to make it easy).

Because the ability to write to system tables may have non-obvious security consequences, it is turned off by default and needs to be enabled with a new configuration option "alternator_allow_system_table_write"

No backports are necessary - this feature is only intended for tests. We may later decide to backport if we want to backport new tests, but I think the probability we'll want to do this is low.

Fixes #12348

Closes scylladb/scylladb#19147

* github.com:scylladb/scylladb:
  test/alternator: utility functions for changing configuration
  alternator: add optional support for writing to system table
  test/alternator: reduce duplicated code
2025-08-08 09:13:15 +03:00
Pavel Emelyanov
0616407be5 Merge 'rest_api: add endpoint which drops all quarantined sstables' from Taras Veretilnyk
Added a new POST endpoint `/storage_service/drop_quarantined_sstables` to the REST API.
This endpoint allows dropping all quarantined SSTables either globally or
for a specific keyspace and tables.
Optional query parameters `keyspace` and `tables` (comma-separated table names) can be
provided to limit the scope of the operation.

Fixes scylladb/scylladb#19061

Backport is not required, it is new functionality

Closes scylladb/scylladb#25063

* github.com:scylladb/scylladb:
  docs: Add documentation for the nodetool dropquarantinedsstables command
  nodetool: add command for dropping quarantine sstables
  rest_api: add endpoint which drops all quarantined sstables
2025-08-06 11:55:15 +03:00
Taras Veretilnyk
bcb90c42e4 docs: Sort commands list in nodetool.rst
Fixes scylladb/scylladb#25330

Closes scylladb/scylladb#25331
2025-08-06 11:20:53 +03:00
Nadav Har'El
a896e2dbb9 alternator: add optional support for writing to system table
In commit 44a1daf we added the ability to read system tables through
the DynamoDB API (actually, the Scan and Query requests only).
This ability is useful for tests, and can also be useful to users who
want to read information that is only available through system tables.

This patch adds support also for *writing* into system tables. This will
be useful for Alternator tests, were we want to temporarily change
some live-updatable configuration option - and so far haven't been
able to do that like we did do in some cql-pytest tests.

For reasons explained in issue #23218, only superuser roles are allowed to
write to system tables - it is not enough for the role to be granted
MODIFY permissions on the system table or on ALL KEYSPACES. Moreover,
the ability to modify system tables carries special risks, so this
patch only allows writes to the system tables if a new configuration
option "alternator_allow_system_table_write" turned on. This option is
turned off by default.

This patch also includes a test for this new configuration-writing
capability. The test scripts test/alternator/run and test.py now
run Scylla with alternator_allow_system_table_write turned on, but
the new test can also run without this option, and will be skipped
in that case (to allow running the test suite against some manually-
run instance of Scylla).

Fixes: #12348

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2025-08-06 10:00:04 +03:00
Avi Kivity
4c785b31c7 Merge 'List Alternator clients in system.clients virtual table' from Nadav Har'El
Before this series, the "system.clients" virtual table lists active connections (and their various properties, like client address, logged in username and client version) only for CQL requests. This series adds also Alternator clients to system.clients. One of the interesting use cases of this new feature is understanding exactly which SDK a user is using -without inspecting their application code.  Different SDKs pass different "User-Agent" headers in requests, and that User-Agent will be visible in the system.clients entries for Alternator requests as the "driver_name" field.

Unlike CQL where logged in username, driver name, etc. applies to a complete connection, in the Alternator API, different requests can theoretically be signed by different users and carry different headers but still arrive over the same HTTP connection. So instead of listing the currently open Alternator *connections*, we will list the currently active *requests*.

The first three patches introduce utilities that will be useful in the implementation. The fourth patch is the implementation itself (which is quite simple with the utility introduced in the second patch), and the fifth patch a regression test for the new feature. The sixth patch adds documentation, the seventh patch refactors generic_server to use the newly introduced utility class and reduce code duplication, and the eighth patch adds a small check to an existing check of CQL's system.clients.

Fixes #24993

This patch adds a new feature, so doesn't require a backport. Nevertheless, if we want it to get to existing customers more quickly to allow us to better understand their use case by reading the system.clients table, we may want to consider backporting this patch to existing branches. There is some risk involved in this patch, because it adds code that gets run on every Alternator request, so a bug on it can cause problems for every Alternator request.

Closes scylladb/scylladb#25178

* github.com:scylladb/scylladb:
  test/cqlpy: slightly strengthen test for system.clients
  generic_server: use utils::scoped_item_list
  docs/alternator: document the system.clients system table in Alternator
  alternator: add test for Alternator clients in system.clients
  alternator: list active Alternator requests in system.clients
  utils: unit test for utils::scoped_item_list
  utils: add a scoped_item_list utility class
  utils: add "fatal" version of utils::on_internal_error()
2025-08-05 15:55:41 +03:00
Pavel Emelyanov
5fcdf948d9 doc: Update system.clients schema with scheduling_group cell
It was added by 9319d65971 (db/virtual_tables: add scheduling group
column to system.clients) recently.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#25294
2025-08-05 10:16:20 +03:00
Piotr Dulikowski
ec7832cc84 Merge 'Raft-based recovery procedure: simplify rolling restart with recovery_leader' from Patryk Jędrzejczak
The following steps are performed in sequence as part of the
Raft-based recovery procedure:
- set `recovery_leader` to the host ID of the recovery leader in
  `scylla.yaml` on all live nodes,
- send the `SIGHUP` signal to all Scylla processes to reload the config,
- perform a rolling restart (with the recovery leader being restarted
  first).

These steps are not intuitive and more complicated than they could be.

In this PR, we simplify these steps. From now on, we will be able to
simply set `recovery_leader` on each node just before restarting it.

Apart from making necessary changes in the code, we also update all
tests of the Raft-based recovery procedure and the user-facing
documentation.

Fixes scylladb/scylladb#25015

The Raft-based procedure was added in 2025.2. This PR makes the
procedure simpler and less error-prone, so it should be backported
to 2025.2 and 2025.3.

Closes scylladb/scylladb#25032

* github.com:scylladb/scylladb:
  docs: document the option to set recovery_leader later
  test: delay setting recovery_leader in the recovery procedure tests
  gossip: add recovery_leader to gossip_digest_syn
  db: system_keyspace: peers_table_read_fixup: remove rows with null host_id
  db/config, gms/gossiper: change recovery_leader to UUID
  db/config, utils: allow using UUID as a config option
2025-08-04 08:29:32 +02:00
Taras Veretilnyk
15e3980693 docs: Add documentation for the nodetool dropquarantinedsstables command
Fixes scylladb/scylladb#19061
2025-08-01 11:46:33 +02:00
Nadav Har'El
70c94ac9dd docs/alternator: document the system.clients system table in Alternator
Add to docs/alternator/new-apis.md a full description of the
`system.clients` support in Alternator that was added in the previous
patches.

Although arguably *all* Scylla system tables should work on Alternator
and do not need to be individually documented, I believe that this
specific table, is interesting to document. This is because some of
the attributes in this table have non-obvious and Alternator-specific
meanings. Moreover, there's even a diffence in what each individual
item in the table represents (it represents active requests, not entire
connections as in CQL).

While editing the system tables section of new-apis.md, this patch also slightly
improves its formatting.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2025-08-01 02:15:05 +03:00
Nadav Har'El
22f845b128 docs/alternator: mention missing ShardFilter support
Add in docs/alternator/compatibility.md a mention of the ShardFilter
option which we don't support in Alternator Streams. This option was
only introduced to DynamoDB a week ago, so it's not surprising we
don't yet support it :-)

Refs #25160

Closes scylladb/scylladb#25161
2025-07-29 14:37:24 +03:00
Andrei Chekun
a6a3d119e8 docs: update documentation with new way of running C++ tests
Documentation had outdated information how to run C++ test.
Additionally, some information added about gathered test metrics.

Closes scylladb/scylladb#25180
2025-07-29 14:36:19 +03:00
Anna Stuchlik
b67bb641bc doc: add OS support for ScyllaDB 2025.3
This commit adds the information about support for platforms in ScyllaDB version 2025.3.

Fixes https://github.com/scylladb/scylladb/issues/24698

Closes scylladb/scylladb#25220
2025-07-29 14:33:12 +03:00
Anna Stuchlik
8365219d40 doc: add the upgrade guide from 2025.2 to 2025.3
This PR adds the upgrade guide from version 2025.2 to 2025.3.
Also, it removes the upgrade guide existing for the previous version
that is irrelevant in 2025.2 (upgrade from 2025.1 to 2025.2).

Note that the new guide does not include the "Enable Consistent Topology Updates" page and note,
as users upgrading to 2025.3 have consistent topology updates already enabled.

Fixes https://github.com/scylladb/scylladb/issues/24696

Closes scylladb/scylladb#25219
2025-07-29 14:32:31 +03:00
Anna Stuchlik
18b4d4a77c doc: add tablets support information to the Drivers table
This commit:

- Extends the Drivers support table with information on which driver supports tablets
  and since which version.
- Adds the driver support policy to the Drivers page.
- Reorganizes the Drivers page to accommodate the updates.

In addition:
- The CPP-over-Rust driver is added to the table.
- The information about Serverless (which we don't support) is removed
  and replaced with tablets to correctly describe the contents of the table.

Fixes https://github.com/scylladb/scylladb/issues/19471

Refs https://github.com/scylladb/scylladb-docs-homepage/issues/69

Closes scylladb/scylladb#24635
2025-07-29 08:11:42 +03:00
Nadav Har'El
b4fc3578fc Merge 'LWT: enable for tablet-based tables' from Petr Gusev
This PR enables **LWT (Lightweight Transactions)** support for tablet-based tables by leveraging **colocated tables**.

Currently, storing Paxos state in system tables causes two major issues:
* **Loss of Paxos state during tablet migration or base table rebuilds**
  * When a tablet is migrated or the base table is rebuilt, system tables don't retain Paxos state.
  * This breaks LWT correctness in certain scenarios.
  * Failing test cases demonstrating this:
      * test_lwt_state_is_preserved_on_tablet_migration
      * test_lwt_state_is_preserved_on_rebuild
* **Shard misalignment and performance overhead**
  * Tablets may be placed on arbitrary shards by the tablet balancer.
  * Accessing Paxos state in system tables could require a shard jump, degrading performance.

We move Paxos state into a dedicated Paxos table, colocated with the base table:
  * Each base table gets its own Paxos state table.
  * This table is lazily created on the first LWT operation.
  * Its tablets are colocated with those of the base table, ensuring:
    * Co-migration during tablet movement
    * Co-rebuilding with the base table
    * Shard alignment for local access to Paxos state

Some reasoning for why this is sufficient to preserve LWT correctness is discussed in [2].

This PR addresses two issues from the "Why doesn't it work for tablets" section  in [1]:
  * Tablet migration vs LWT correctness
  * Paxos table sharding

Other issues ("bounce to shard" and "locking for intranode_migration") have already been resolved in previous PRs.

References
[1] - [LWT over tablets design](https://docs.google.com/document/d/1CPm0N9XFUcZ8zILpTkfP5O4EtlwGsXg_TU4-1m7dTuM/edit?tab=t.0#heading=h.goufx7gx24yu)
[2] - [LWT: Paxos state and tablet balancer](https://docs.google.com/document/d/1-xubDo612GGgguc0khCj5ukmMGgLGCLWLIeG6GtHTY4/edit?tab=t.0)
[3] - [Colocated tables PR](https://github.com/scylladb/scylladb/pull/22906#issuecomment-3027123886)
[4] - [Possible LWT consistency violations after a topology change](https://github.com/scylladb/scylladb/issues/5251)

Backport: not needed because this is a new feature.

Closes scylladb/scylladb#24819

* github.com:scylladb/scylladb:
  create_keyspace: fix warning for tablets
  docs: fix lwt.rst
  docs: fix tablets.rst
  alternator: enable LWT
  random_failures: enable execute_lwt_transaction
  test_tablets_lwt: add test_paxos_state_table_permissions
  test_tablets_lwt: add test_lwt_for_tablets_is_not_supported_without_raft
  test_tablets_lwt: test timeout creating paxos state table
  test_tablets_lwt: add test_lwt_concurrent_base_table_recreation
  test_tablets_lwt: add test_lwt_state_is_preserved_on_rebuild
  test_tablets_lwt: migrate test_lwt_support_with_tablets
  test_tablets_lwt: add test_lwt_state_is_preserved_on_tablet_migration
  test_tablets_lwt: add simple test for LWT
  check_internal_table_permissions: handle Paxos state tables
  client_state: extract check_internal_table_permissions
  paxos_store: handle base table removal
  database: get_base_table_for_tablet_colocation: handle paxos state table
  paxos_state: use node_local_only mode to access paxos state
  query_options: add node_local_only mode
  storage_proxy: handle node_local_only in query
  storage_proxy: handle node_local_only in mutate
  storage_proxy: introduce node_local_only flag
  abstract_replication_strategy: remove unused using
  storage_proxy: add coordinator_mutate_options
  storage_proxy: rename create_write_response_handler -> make_write_response_handler
  storage_proxy: simplify mutate_prepare
  paxos_state: lazily create paxos state table
  migration_manager: add timeout to start_group0_operation and announce
  paxos_store: use non-internal queries
  qp: make make_internal_options public
  paxos_store: conditional cf_id filter
  paxos_store: coroutinize
  feature_service: add LWT_WITH_TABLETS feature
  paxos_state: inline system_keyspace functions into paxos_store
  paxos_state: extract state access functions into paxos_store
2025-07-28 13:19:23 +03:00
Taras Veretilnyk
6b6622e07a docs: fix typo in command name enbleautocompaction -> enableautocompaction
Renamed the file and updated all references from 'enbleautocompaction' to the correct 'enableautocompaction'.

Fixes scylladb/scylladb#25172

Closes scylladb/scylladb#25175
2025-07-28 12:49:26 +03:00
Botond Dénes
837424f7bb Merge 'Add Azure Key Provider for Encryption at Rest' from Nikos Dragazis
This PR introduces a new Key Provider to support Azure Key Vault as a Key Management System (KMS) for Encryption at Rest. The core design principle is the same as in the AWS and GCP key providers - an externally provided Vault key that is used to protect local data encryption keys (a process known as "key wrapping").

In more detail, this patch series consists of:
* Multiple Azure credential sources, offering a variety of authentication options (Service Principals, Managed Identities, environment variables, Azure CLI).
* The Azure host - the Key Vault endpoint bridge.
* The Azure Key Provider - the interface for the Azure host.
* Unit tests using real Azure resources (credentials and Vault keys).
* Log filtering logic to not expose sensitive data in the logs (plaintext keys, credentials, access tokens).

This is part of the overall effort to support Azure deployments.

Testing done:
* Unit tests.
* Manual test on an Azure VM with a Managed Identity.
* Manual test with credentials from Azure CLI.
* Manual test of `--azure-hosts` cmdline option.
* Manual test of log filtering.

Remaining items:
- [x] Create necessary Azure resources for CI.
- [x] Merge pipeline changes (https://github.com/scylladb/scylla-pkg/pull/5201).

Closes https://github.com/scylladb/scylla-enterprise/issues/1077.

New feature. No backport is needed.

Closes scylladb/scylladb#23920

* github.com:scylladb/scylladb:
  docs: Document the Azure Key Provider
  test: Add tests for Azure Key Provider
  pylib: Add mock server for Azure Key Vault
  encryption: Define and enable Azure Key Provider
  encryption: azure: Delegate hosts to shard 0
  encryption: Add Azure host cache
  encryption: Add config options for Azure hosts
  encryption: azure: Add override options
  encryption: azure: Add retries for transient errors
  encryption: azure: Implement init()
  encryption: azure: Implement get_key_by_id()
  encryption: azure: Add id-based key cache
  encryption: azure: Implement get_or_create_key()
  encryption: azure: Add credentials in Azure host
  encryption: azure: Add attribute-based key cache
  encryption: azure: Add skeleton for Azure host
  encryption: Templatize get_{kmip,kms,gcp}_host()
  encryption: gcp: Fix typo in docstring
  utils: azure: Get access token with default credentials
  utils: azure: Get access token from Azure CLI
  utils: azure: Get access token from IMDS
  utils: azure: Get access token with SP certificate
  utils: azure: Get access token with SP secret
  utils: rest: Add interface for request/response redaction logic
  utils: azure: Declare all Azure credential types
  utils: azure: Define interface for Azure credentials
  utils: Introduce base64url_{encode,decode}
2025-07-25 10:45:32 +03:00
Petr Gusev
1f5d9ace93 docs: fix lwt.rst
Add a new section about Paxos state tables. Update all
references to system.paxos in the text to refer to this
section.
2025-07-24 20:04:43 +02:00
Petr Gusev
69017fb52a docs: fix tablets.rst
LWT and Alternator are now supported with tablets.
2025-07-24 20:04:43 +02:00