Commit Graph

2159 Commits

Author SHA1 Message Date
Avi Kivity
611918056a Merge 'repair: Add tablet incremental repair support' from Asias He
The central idea of incremental repair is to allow repair participants
to select and repair only a portion of the dataset to speed up the
repair process. All repair participants must utilize an identical
selection method to repair and synchronize the same selected dataset.
There are two primary selection methods: time-based and file-based. The
time-based method selects data within a specified time frame. It is
versatile but it is less efficient because it requires reading all of
the dataset and omitting data beyond the time frame. The file-based
method selects data from unrepaired SSTables and is more efficient
because it allows the entire SSTable to be omitted. This patch
implements the file-based selection method.

Incremental repair will only be supported for tablet tables; it will not
be supported for vnode tables. On one hand, the legacy vnode is less
important to support. On the other hand, the incremental repair for
vnode is much harder to implement. With vnodes, an SSTable could contain
data for multiple vnode ranges. When a given vnode range is repaired,
only a portion of the SSTable is repaired. This complicates the
manipulation of SSTables significantly during both repair and
compaction. With tablets, an entire tablet is repaired, so an
SSTable is either fully repaired or not repaired at all, which is a huge
simplification.

This patch uses the repaired_at from sstables::statistics component to
mark a sstable as repaired. It uses a virtual clock as the repair
timestamp, i.e., using a monotonically increasing number for the
repaired_at field of a SSTable and sstables_repaired_at column in
system.tablets table. Note that when an SSTable is not repaired, the
repaired_at field is set to its default value of 0. The in-memory
being_repaired field of an SSTable explicitly marks that the SSTable
has been selected for repair. The following variables are used for
incremental repair:

The repaired_at on disk field of a SSTable is used.
   - A 64-bit number that increases sequentially

The sstables_repaired_at is added to the system.tablets table.
   - repaired_at <= sstables_repaired_at means the sstable is repaired

The being_repaired in memory field of a SSTable is added.
   - A repair UUID tells which sstable has participated in the repair
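
The selection rule described above can be sketched as follows. This is a minimal illustration in Python, not ScyllaDB's C++ implementation: the `SSTable` dataclass and both helper functions are hypothetical, and only the names `repaired_at`, `sstables_repaired_at`, and `being_repaired` come from the description above. Treating a `repaired_at` of 0 as "never repaired" is an assumption based on the stated default value.

```python
from dataclasses import dataclass
from typing import List, Optional
from uuid import UUID, uuid4


@dataclass
class SSTable:
    # Hypothetical stand-in for an SSTable; field names follow the text above.
    name: str
    repaired_at: int = 0                   # on-disk field; 0 is the default (not repaired)
    being_repaired: Optional[UUID] = None  # in-memory marker set while a repair runs


def is_repaired(sst: SSTable, sstables_repaired_at: int) -> bool:
    # Rule from the text: repaired_at <= sstables_repaired_at means repaired.
    # Assumption: the default value 0 always means "not repaired".
    return sst.repaired_at != 0 and sst.repaired_at <= sstables_repaired_at


def select_for_repair(sstables: List[SSTable], sstables_repaired_at: int,
                      repair_id: UUID) -> List[SSTable]:
    # File-based selection: whole SSTables are either skipped or selected.
    selected = [s for s in sstables if not is_repaired(s, sstables_repaired_at)]
    for s in selected:
        s.being_repaired = repair_id  # mark participation in this repair
    return selected


if __name__ == "__main__":
    tablet_sstables = [SSTable("a", repaired_at=1), SSTable("b"), SSTable("c", repaired_at=3)]
    chosen = select_for_repair(tablet_sstables, sstables_repaired_at=2, repair_id=uuid4())
    print([s.name for s in chosen])
```

Because the decision is made per SSTable, already repaired files never need to be read, which is where the file-based method gets its efficiency.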

Initial test results:

    1) Medium dataset results
    Node amount: 3
    Instance type: i4i.2xlarge
    Disk usage per node: ~500GB
    Cluster pre-populated with ~500GB of data before starting repairs job.
    Results for Repair Timings:
    The regular repair run took 210 mins.
    Incremental repair 1st run took 183 mins, 2nd and 3rd runs took around 48s
    The speedup is: 183 mins  / 48s = 228X

    2) Small dataset results
    Node amount: 3
    Instance type: i4i.2xlarge
    Disk usage per node: ~167GB
    Cluster pre-populated with ~167GB of data before starting the repairs job.
    Regular repair 1st run took 110s,  2nd and 3rd runs took 110s.
    Incremental repair 1st run took 110 seconds, 2nd and 3rd run took 1.5 seconds.
    The speedup is: 110s / 1.5s = 73X

    3) Large dataset results
    Node amount: 6
    Instance type: i4i.2xlarge, 3 racks
    50% of base load, 50% read/write
    Dataset == Sum of data on each node

    Dataset     Non-incremental repair (minutes)
    1.3 TiB     31:07
    3.5 TiB     25:10
    5.0 TiB     19:03
    6.3 TiB     31:42

    Dataset     Incremental repair (minutes)
    1.3 TiB     24:32
    3.0 TiB     13:06
    4.0 TiB     5:23
    4.8 TiB     7:14
    5.6 TiB     3:58
    6.3 TiB     7:33
    7.0 TiB     6:55

Fixes #22472

Closes scylladb/scylladb#24291

* github.com:scylladb/scylladb:
  replica: Introduce get_compaction_reenablers_and_lock_holders_for_repair
  compaction: Move compaction_reenabler to compaction_reenabler.hh
  topology_coordinator: Make rpc::remote_verb_error to warning level
  repair: Add metrics for sstable bytes read and skipped from sstables
  test.py: Disable incremental for test_tombstone_gc_for_streaming_and_repair
  test.py: Add tests for tablet incremental repair
  repair: Add tablet incremental repair support
  compaction: Add tablet incremental repair support
  feature_service: Add TABLET_INCREMENTAL_REPAIR feature
  tablet_allocator: Add tablet_force_tablet_count_increase and decrease
  repair: Add incremental helpers
  sstable: Add being_repaired to sstable
  sstables: Add set_repaired_at to metadata_collector
  mutation_compactor: Introduce add operator to compaction_stats
  tablet: Add sstables_repaired_at to system.tablets table
  test: Fix drain api in task_manager_client.py
2025-08-19 13:13:22 +03:00
Dawid Mędrek
6a71461e53 treewide: Fix spelling errors
The errors were spotted by our GitHub Actions.

Closes scylladb/scylladb#24822
2025-08-19 13:07:43 +03:00
Sayanta Banerjee
eae1869d3a Update docs/features/cdc/cdc-streams.rst
Co-authored-by: Yaniv Kaul <yaniv.kaul@scylladb.com>
2025-08-19 15:06:30 +05:30
Anna Stuchlik
841ba86609 doc: document support for new z3 instance types
This commit adds new z3 instances we now support to the list of GCP instance types.

Fixes https://github.com/scylladb/scylladb/issues/25438

Closes scylladb/scylladb#25446
2025-08-14 10:59:45 +02:00
Anna Stuchlik
1e5659ac30 doc: add the information about ScyllaDB C# Driver
This commit adds the driver to the list of ScyllaDB drivers,
including the information about:
- CDC integration (not available)
- Tablets (supported)

Fixes https://github.com/scylladb/scylladb/issues/25495

Closes scylladb/scylladb#25498
2025-08-14 11:29:52 +03:00
Pavel Emelyanov
eaec7c9b2e Merge 'cql3: add default replication strategy to create_keyspace_statement' from Dario Mirovic
When creating a new keyspace, both replication strategy and replication
factor must be stated. For example:
`CREATE KEYSPACE ks WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'replication_factor' : 3 };`

This syntax is verbose, and in all but some testing scenarios
`NetworkTopologyStrategy` is used.

This patch allows skipping replication strategy name, filling it with
`NetworkTopologyStrategy` when that happens. The following syntax is now
valid:
`CREATE KEYSPACE ks WITH REPLICATION = { 'replication_factor' : 3 };`
and will give the same result as the previous, more explicit one.
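
The defaulting behaviour can be pictured with a small sketch. This is an illustration in Python only, assuming the replication options are available as a plain map; the function name `with_default_strategy` is hypothetical and is not the code in `create_keyspace_statement`.

```python
DEFAULT_STRATEGY = "NetworkTopologyStrategy"


def with_default_strategy(replication: dict) -> dict:
    """Return a copy of the replication options with 'class' filled in if absent."""
    options = dict(replication)  # do not mutate the caller's map
    options.setdefault("class", DEFAULT_STRATEGY)
    return options
```

With this rule, `{'replication_factor': 3}` and the explicit form with `'class': 'NetworkTopologyStrategy'` normalize to the same options, while an explicitly stated strategy is left untouched.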

Fixes https://github.com/scylladb/scylladb/issues/16029

Backport is not needed. This is an enhancement for future releases.

Closes scylladb/scylladb#25236

* github.com:scylladb/scylladb:
  docs/cql: update documentation for default replication strategy
  test/cqlpy: add keyspace creation default strategy test
  cql3: add default replication strategy to `create_keyspace_statement`
2025-08-14 11:18:36 +03:00
Dario Mirovic
2ac37b4fde docs/cql: update documentation for default replication strategy
Update create-keyspace-statement section of ddl.rst since `class` is no longer mandatory.
Add an example for keyspace creation without specifying `class`.

Refs: #16029
2025-08-13 01:52:00 +02:00
Wojciech Przytuła
7600ccfb20 Fix link to ScyllaDB manual
The link would point to outdated OS docs. I fixed it to point to up-to-date Enterprise docs.

Closes scylladb/scylladb#25328
2025-08-12 10:33:06 +03:00
Tomasz Grabiec
9fd312d157 Merge 'row_cache: add memtable overlap checks elision optimization for tombstone gc' from Botond Dénes
https://github.com/scylladb/scylladb/issues/24962 introduced memtable overlap checks to cache tombstone GC. This was observed to be very strict and to greatly reduce the effectiveness of tombstone GC in the cache, especially for MV workloads, which regularly recycle old timestamps into new writes, so the memtable often has a smaller min live timestamp than the timestamp of the tombstones in the cache.

When creating a new memtable, save a snapshot of the tombstone gc state. This snapshot is used later to exclude this memtable from overlap checks for tombstones whose token has an expiry time larger than that of the tombstone, meaning that all writes in this memtable were produced at a point in time when the current tombstone had already expired. This has the following implications:
* The partition the tombstone is part of was already repaired at the time the memtable was created.
* All writes in the memtable were produced *after* this tombstone's expiry time, so these writes cannot possibly be relevant for this tombstone.

Based on this, such memtables are excluded from the overlap checks. With adequately frequent memtable flushes -- so that the tombstone gc state snapshot is refreshed -- most memtables should be excluded from overlap checks, greatly helping the cache's tombstone GC efficiency.
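
A simplified model of the elision may help. The sketch below is Python, not the C++ in `max_purgeable`, and it makes several assumptions that are not in the text above: each memtable carries a single gc-before value captured at creation (the real snapshot is consulted per token), and all type and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Memtable:
    min_live_timestamp: int  # smallest write timestamp of live data in the memtable
    gc_before_snapshot: int  # assumed: gc-before time captured when the memtable was created


@dataclass
class Tombstone:
    timestamp: int      # write timestamp of the tombstone
    deletion_time: int  # deletion time, compared against gc-before


def can_skip_overlap_check(mt: Memtable, t: Tombstone) -> bool:
    # The tombstone had already expired when the memtable was created,
    # so nothing written into this memtable can be relevant to it.
    return t.deletion_time < mt.gc_before_snapshot


def can_purge(t: Tombstone, memtables: Iterable[Memtable]) -> bool:
    for mt in memtables:
        if can_skip_overlap_check(mt, t):
            continue  # elided: this memtable is excluded from the overlap check
        if mt.min_live_timestamp <= t.timestamp:
            return False  # memtable may hold data the tombstone still covers
    return True
```

In this model, a memtable that recycles old timestamps blocks the purge only if it was created before the tombstone expired; a memtable created afterwards is skipped regardless of its min live timestamp.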

Fixes: https://github.com/scylladb/scylladb/issues/24962

Fixes a regression introduced by https://github.com/scylladb/scylladb/pull/23255, which was backported to all releases, so this fix needs to be backported to all releases as well.

Closes scylladb/scylladb#25033

* github.com:scylladb/scylladb:
  docs/dev/tombstone.md: document the memtable overlap check elision optimization
  test/boost/row_cache_test: add test for memtable overlap check elision
  db/cache_mutation_reader: obtain gc-before and min-live-ts lazily
  mutation/mutation_compactor: use max_purgeable::can_purge and max_purgeable::purge_result
  db/cache_mutation_reader: use max_purgeable::can_purge()
  replica/table: get_max_purgeable_fn_for_cache_underlying_reader(): use max_purgable::combine()
  replica/database: memtable_list::get_max_purgeable(): set expiry-treshold
  compaction/compaction_garbage_collector: max_purgeable: add expiry_treshold
  replica/table: propagate gc_state to memtable_list
  replica/memtable_list: add tombstone_gc_state* member
  replica/memtable: add tombstone_gc_state_snapshot
  tombstone_gc: introduce tombstone_gc_state_snapshot
  tombstone_gc: extract shared state into shared_tombstone_gc_state
  tombstone_gc: per_table_history_maps::_group0_gc_time: make it a value
  tombstone_gc: fold get_group0_gc_time() into its caller
  tombstone_gc: fold get_or_create_group0_gc_time() into update_group0_refresh_time()
  tombstone_gc: fold get_or_create_repair_history_for_table() into update_repair_time()
  tombstone_gc: refactor get_or_greate_repair_history_for_table()
  replica/memtable_list: s/min_live_timestamp()/get_max_purgeable()/
  db/read_context: return max_purgeable from get_max_purgeable()
  compaction/compaction_garbage_collector: add formatter for max_purgeable
  mutation: move definition of gc symbols to compaction.cc
  compaction/compaction_garbage_collector: refactor max_purgeable into a class
  test/boost/row_cache_test: refactor test_populating_reader_tombstone_gc_with_data_in_memtable
  test: rewrite test_compacting_reader_tombstone_gc_with_data_in_memtable in C++
  test/boost/row_cache_test: refactor cache tombstone GC with memtable overlap tests
2025-08-11 23:54:59 +02:00
Botond Dénes
660ea9202a docs/dev/tombstone.md: document the memtable overlap check elision optimization
2025-08-11 17:20:12 +03:00
Anna Stuchlik
1322f301f6 doc: add support for RHEL 10
This commit adds RHEL 10 to the list of supported platforms.

Fixes https://github.com/scylladb/scylladb/issues/25436

Closes scylladb/scylladb#25437
2025-08-11 13:13:37 +02:00
Patryk Jędrzejczak
7b77c6cc4a docs: Raft recovery procedure: recommend verifying participation in Raft recovery
This instruction adds additional safety. The faster we notice that
a node didn't restart properly, the better.

The old gossip-based recovery procedure had a similar recommendation
to verify that each restarting node entered `RECOVERY` mode.

Fixes #25375

This is a documentation improvement. We should backport it to all
branches with the new recovery procedure, so 2025.2 and 2025.3.

Closes scylladb/scylladb#25376
2025-08-11 09:21:29 +03:00
Asias He
5377f87e5a tablet: Add sstables_repaired_at to system.tablets table
It is used to store the repaired_at for each tablet.
2025-08-11 10:10:07 +08:00
Anna Stuchlik
f3d9d0c1c7 doc: add new and removed metrics to the 2025.3 upgrade guide
This commit adds the list of new and removed metrics to the already existing upgrade guide
from 2025.2 to 2025.3.

Fixes https://github.com/scylladb/scylladb/issues/24697

Closes scylladb/scylladb#25385
2025-08-08 13:25:51 +02:00
Botond Dénes
70aa81990b Merge 'Alternator - add the ability to write, not just read, system tables' from Nadav Har'El
In commit 44a1daf we added the ability to read Scylla system tables with Alternator. This feature is useful, among other things, in tests that want to read Scylla's configuration through the system table system.config. But tests often want to modify system.config, e.g., to temporarily reduce some threshold to make tests shorter. Until now, this was not possible.

This series adds support for writing to system tables through Alternator, and examples of tests using this capability (and utility functions to make it easy).

Because the ability to write to system tables may have non-obvious security consequences, it is turned off by default and needs to be enabled with a new configuration option "alternator_allow_system_table_write".

No backports are necessary - this feature is only intended for tests. We may later decide to backport if we want to backport new tests, but I think the probability we'll want to do this is low.

Fixes #12348

Closes scylladb/scylladb#19147

* github.com:scylladb/scylladb:
  test/alternator: utility functions for changing configuration
  alternator: add optional support for writing to system table
  test/alternator: reduce duplicated code
2025-08-08 09:13:15 +03:00
Pavel Emelyanov
0616407be5 Merge 'rest_api: add endpoint which drops all quarantined sstables' from Taras Veretilnyk
Added a new POST endpoint `/storage_service/drop_quarantined_sstables` to the REST API.
This endpoint allows dropping all quarantined SSTables either globally or
for a specific keyspace and tables.
Optional query parameters `keyspace` and `tables` (comma-separated table names) can be
provided to limit the scope of the operation.
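
As a rough illustration of how a client might address the endpoint, the sketch below builds the request URL in Python. The endpoint path and the `keyspace` and `tables` parameters come from the description above; the helper name and the base URL `http://localhost:10000` are assumptions made for the example.

```python
from typing import Optional, Sequence
from urllib.parse import urlencode


def drop_quarantined_sstables_url(base_url: str,
                                  keyspace: Optional[str] = None,
                                  tables: Optional[Sequence[str]] = None) -> str:
    """Build the URL for the POST request; omitted parameters mean 'no restriction'."""
    params = {}
    if keyspace is not None:
        params["keyspace"] = keyspace
    if tables:
        params["tables"] = ",".join(tables)  # comma-separated table names
    url = f"{base_url.rstrip('/')}/storage_service/drop_quarantined_sstables"
    return f"{url}?{urlencode(params)}" if params else url


if __name__ == "__main__":
    # Assumed base URL; adjust to wherever the node's REST API listens.
    print(drop_quarantined_sstables_url("http://localhost:10000", "ks", ["t1", "t2"]))
```

The resulting URL would then be sent with the POST method, for example via `urllib.request.Request(url, method="POST")`.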

Fixes scylladb/scylladb#19061

Backport is not required, it is new functionality

Closes scylladb/scylladb#25063

* github.com:scylladb/scylladb:
  docs: Add documentation for the nodetool dropquarantinedsstables command
  nodetool: add command for dropping quarantine sstables
  rest_api: add endpoint which drops all quarantined sstables
2025-08-06 11:55:15 +03:00
Taras Veretilnyk
bcb90c42e4 docs: Sort commands list in nodetool.rst
Fixes scylladb/scylladb#25330

Closes scylladb/scylladb#25331
2025-08-06 11:20:53 +03:00
Nadav Har'El
a896e2dbb9 alternator: add optional support for writing to system table
In commit 44a1daf we added the ability to read system tables through
the DynamoDB API (actually, the Scan and Query requests only).
This ability is useful for tests, and can also be useful to users who
want to read information that is only available through system tables.

This patch also adds support for *writing* into system tables. This will
be useful for Alternator tests, where we want to temporarily change
some live-updatable configuration option - and so far haven't been
able to do that as we did in some cql-pytest tests.

For reasons explained in issue #23218, only superuser roles are allowed to
write to system tables - it is not enough for the role to be granted
MODIFY permissions on the system table or on ALL KEYSPACES. Moreover,
the ability to modify system tables carries special risks, so this
patch only allows writes to the system tables if a new configuration
option "alternator_allow_system_table_write" is turned on. This option is
turned off by default.

This patch also includes a test for this new configuration-writing
capability. The test scripts test/alternator/run and test.py now
run Scylla with alternator_allow_system_table_write turned on, but
the new test can also run without this option, and will be skipped
in that case (to allow running the test suite against some manually-
run instance of Scylla).

Fixes: #12348

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2025-08-06 10:00:04 +03:00
Avi Kivity
4c785b31c7 Merge 'List Alternator clients in system.clients virtual table' from Nadav Har'El
Before this series, the "system.clients" virtual table listed active connections (and their various properties, like client address, logged in username and client version) only for CQL requests. This series also adds Alternator clients to system.clients. One of the interesting use cases of this new feature is understanding exactly which SDK a user is using without inspecting their application code. Different SDKs pass different "User-Agent" headers in requests, and that User-Agent will be visible in the system.clients entries for Alternator requests as the "driver_name" field.

Unlike CQL where logged in username, driver name, etc. applies to a complete connection, in the Alternator API, different requests can theoretically be signed by different users and carry different headers but still arrive over the same HTTP connection. So instead of listing the currently open Alternator *connections*, we will list the currently active *requests*.
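
The idea of listing requests that are active only for the duration of their handling can be sketched with a scoped registration. The Python below is an illustration under stated assumptions: the class `ActiveRequests` and its methods are hypothetical and do not reproduce the interface of `utils::scoped_item_list`; only the `driver_name` field name comes from the text above.

```python
from contextlib import contextmanager
from typing import Dict, Iterator, List


class ActiveRequests:
    """Hypothetical registry of in-flight requests, keyed by an internal id."""

    def __init__(self) -> None:
        self._items: Dict[int, dict] = {}
        self._next_id = 0

    @contextmanager
    def track(self, address: str, username: str, driver_name: str) -> Iterator[None]:
        key = self._next_id
        self._next_id += 1
        self._items[key] = {
            "address": address,
            "username": username,
            "driver_name": driver_name,  # taken from the request's User-Agent header
        }
        try:
            yield
        finally:
            del self._items[key]  # the entry disappears when the request ends

    def snapshot(self) -> List[dict]:
        # What a reader of the virtual table would see at this moment.
        return list(self._items.values())
```

Two requests arriving over the same HTTP connection would each call `track()` with their own user and headers, so they appear as separate entries while they run.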

The first three patches introduce utilities that will be useful in the implementation. The fourth patch is the implementation itself (which is quite simple with the utility introduced in the second patch), and the fifth patch is a regression test for the new feature. The sixth patch adds documentation, the seventh patch refactors generic_server to use the newly introduced utility class and reduce code duplication, and the eighth patch adds a small check to an existing check of CQL's system.clients.

Fixes #24993

This patch adds a new feature, so it doesn't require a backport. Nevertheless, if we want it to get to existing customers more quickly to allow us to better understand their use case by reading the system.clients table, we may want to consider backporting this patch to existing branches. There is some risk involved in this patch, because it adds code that gets run on every Alternator request, so a bug in it can cause problems for every Alternator request.

Closes scylladb/scylladb#25178

* github.com:scylladb/scylladb:
  test/cqlpy: slightly strengthen test for system.clients
  generic_server: use utils::scoped_item_list
  docs/alternator: document the system.clients system table in Alternator
  alternator: add test for Alternator clients in system.clients
  alternator: list active Alternator requests in system.clients
  utils: unit test for utils::scoped_item_list
  utils: add a scoped_item_list utility class
  utils: add "fatal" version of utils::on_internal_error()
2025-08-05 15:55:41 +03:00
Pavel Emelyanov
5fcdf948d9 doc: Update system.clients schema with scheduling_group cell
It was added by 9319d65971 (db/virtual_tables: add scheduling group
column to system.clients) recently.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#25294
2025-08-05 10:16:20 +03:00
Piotr Dulikowski
ec7832cc84 Merge 'Raft-based recovery procedure: simplify rolling restart with recovery_leader' from Patryk Jędrzejczak
The following steps are performed in sequence as part of the
Raft-based recovery procedure:
- set `recovery_leader` to the host ID of the recovery leader in
  `scylla.yaml` on all live nodes,
- send the `SIGHUP` signal to all Scylla processes to reload the config,
- perform a rolling restart (with the recovery leader being restarted
  first).

These steps are not intuitive and more complicated than they could be.

In this PR, we simplify these steps. From now on, we will be able to
simply set `recovery_leader` on each node just before restarting it.

Apart from making necessary changes in the code, we also update all
tests of the Raft-based recovery procedure and the user-facing
documentation.

Fixes scylladb/scylladb#25015

The Raft-based procedure was added in 2025.2. This PR makes the
procedure simpler and less error-prone, so it should be backported
to 2025.2 and 2025.3.

Closes scylladb/scylladb#25032

* github.com:scylladb/scylladb:
  docs: document the option to set recovery_leader later
  test: delay setting recovery_leader in the recovery procedure tests
  gossip: add recovery_leader to gossip_digest_syn
  db: system_keyspace: peers_table_read_fixup: remove rows with null host_id
  db/config, gms/gossiper: change recovery_leader to UUID
  db/config, utils: allow using UUID as a config option
2025-08-04 08:29:32 +02:00
Nikos Dragazis
b186c48a65 encryption-at-rest.rst: add "Rotate Encryption Keys" section
Add a new section for key rotation, offering separate instructions per
key provider, organized in tabs.

The gist:

* Local Key Provider - Rotation requires creating a new key file per
  node. It's a manual procedure.

* Replicated Key Provider - Rotation is not supported.

* KMIP Key Provider - Rotation is transparent to Scylla, but it requires
  manually revoking the key in the server.

* {KMS,GCP} Key Provider - Rotation is transparent to Scylla and can be
  automated in the server.

* Azure Key Provider - Rotation is automatically supported by Scylla by
  keeping track of the key version along with the encrypted data. The
  rotation needs to be done at the Key Vault server, and can be
  automated.

Explain that, even after rotation, old keys may be still in use due to
caching, and that old SSTables will remain encrypted with the old key
until the next compaction. Provide instructions for users who prefer not
to wait.

Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>
2025-08-01 17:27:46 +03:00
Nikos Dragazis
3abacaa465 encryption-at-rest.rst: rewrite "Encrypt System Resources" section
- Mention all types of system data that fall under system encryption.

- Add "Before you Begin" section with requirements per key provider.
  The requirements are the same as in user encryption.

- Mention explicitly that the Replicated Key Provider cannot be used for
  system encryption.

- Provide separate instructions for each key provider. Explain all the
  configuration options.

- Provide an extra example for the Local Key Provider with a
  ``system_key_directory`` and ``key_name``.

- Highlight the code blocks as YAML. Make their indentation consistent
  with the rest of the doc (2 spaces).

Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>
2025-08-01 17:27:46 +03:00
Nikos Dragazis
c59f71b399 encryption-at-rest.rst: rewrite "Update Encryption Properties of Existing Tables" section
- Split the various scenarios into sub-sections, not just examples.

- Amend the example for changing cipher algorithm and key length. The
  algorithm used in the example was the same.

- Point out that disabling encryption through the table schema is not
  possible if a node has default encryption configured.

- Amend the `nodetool upgradesstables` command. The
  `--include-all-sstables` is necessary.

Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>
2025-08-01 17:27:46 +03:00
Nikos Dragazis
22f941b325 encryption-at-rest.rst: rewrite "Encrypt a Single Table" section
- Add a short intro.

- Add an early note about the fact that options from
  ``scylla_encryption_options`` cannot be mixed with options from
  ``user_info_encryption``.

- Add a new "Allow Per-Table Encryption" subsection to document the
  ``allow_per_table_encryption`` option.

- Move the top-level procedure into a new "Encrypt a New Table"
  subsection to differentiate it from the "Update Encryption Properties
  of Existing Tables".

- Add tabs for provider-dependent steps in "Before you Begin" and
  "Procedure".

- Amend "bytes" to "bits" (for the key length).

- Add examples for the replicated, KMIP, GCP, and Azure key providers.
  Use consistent keyspace and table names in all examples.

- Remove step for upgrading SSTables. The table is new - no SSTables
  exist yet.

Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>
2025-08-01 17:27:46 +03:00
Nikos Dragazis
bd83f3e672 encryption-at-rest.rst: rewrite "Encrypt Tables" section
- Provide separate requirements and instructions for each key provider,
  organized in tabs.
- Mention explicitly that the Replicated Key Provider cannot be used for
  default encryption.
- Fix indentation for code blocks in examples (2 spaces).
- For KMS, GCP, and Azure, add the `master_key` option in the list of
  options and remove the relevant example (not so common).
- Add steps for rolling restart.
- Amend "bytes" to "bits" (for the key length).

Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>
2025-08-01 17:27:46 +03:00
Nikos Dragazis
fb030b11c3 encryption-at-rest.rst: update "Set the Azure Host" section
- Mark the `master_key` as required. Technically, it's not, since it can
  be specified in the schema encryption options, but:
  - It's better to keep it simple. The common case is to have a default
    value that occasionally needs to be overridden.
  - No functionality is lost.
  - It is mentioned as required for AWS and GCP.
- Add a note about credential resolution.
- Make some minor formatting changes to be consistent with the AWS and
  GCP sections.

Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>
2025-08-01 17:27:45 +03:00
Nikos Dragazis
e25b283c8d encryption-at-rest.rst: update "Set the GCP Host" section
- Add list of requirements (KMS Key, credentials, permissions).
- Add a reference to "Create Encryption Keys" section.
- Amend description for `master_key`.
- Add one example per credential type.
- Explain how credentials are resolved if not explicitly specified in
  the configuration.
- Fix indentation of "restart" command.

Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>
2025-08-01 17:27:45 +03:00
Nikos Dragazis
d9242ba47f encryption-at-rest.rst: update "Set the KMS Host" section
- Add a list of requirements (KMS key, credentials, permissions).
- Add a reference to "Create Encryption Keys" section.
- Add one example per credential type.
- Explain how credentials are resolved from the environment, or the
  AWS credentials file.
- Fix indentation of "restart" command.

Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>
2025-08-01 17:27:45 +03:00
Nikos Dragazis
cf9301c573 encryption-at-rest.rst: update "Set the KMIP Host" section
- Uncomment the code block to match the other hosts.
- Remove the ``certficate_revocation_list`` option; it's not supported.
- Amend the default values for ``key_cache_expiry`` and
  ``key_cache_refresh``.
- Add an example with mutual TLS authentication.
- Fix indentation of "restart" command.

Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>
2025-08-01 17:27:45 +03:00
Nikos Dragazis
b777dd267d encryption-at-rest.rst: rewrite "Create Encryption Keys" section
- Provide separate instructions for each key provider, organized in tabs.
  Move the existing instructions with the key generator script under the
  "Local Key Provider" tab. Point to the cloud provider's documentation
  for AWS, GCP, and Azure keys. List the required attributes for KMIP
  keys. List the required keys for the Replicated Key Provider.

- In the example for the key generator script, use the same algorithm
  and key strength for both the secret key and the system key, since
  this is the recommended case.

- Reorder the usage list of arguments for the key generator script.

Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>
2025-08-01 17:27:45 +03:00
Nikos Dragazis
60df275197 encryption-at-rest.rst: rewrite "Key Providers" section
- Use monospace font for key provider factories.

- Add a sub-section for every key provider. Explain how they operate at
  a high level and highlight any possible limitations.

- Remove version availability notes. The version 2019.1.3 is old and
  unsupported.

Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>
2025-08-01 17:27:45 +03:00
Nikos Dragazis
3c2f4ed1e7 encryption-at-rest.rst: hoist and update "Cipher Algorithm Descriptors"
Turn an earlier reference to "algorithm descriptor" into a hyperlink.

Use monospace font in the table header for "cipher_algorithm" and
"secret_key_strength"; these are verbatim identifiers in "scylla.yaml"
and "scylla_encryption_options". Same for their supported values.

Restrict the Blowfish key size to 128 bits, due to
<https://github.com/scylladb/scylla-enterprise/issues/4848>.

Add notes on ECB vs. CBC, and on Blowfish's 64-bit block size. Emphasize
our recommendation more.

Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>
2025-08-01 17:27:45 +03:00
Laszlo Ersek
f07125cfea encryption-at-rest.rst: rewrite/replace section "Encryption Key Types"
- Referring to system info encryption vs. user info encryption as distinct
  "encryption key types" is confusing. The behavior of encryption is
  similar in both cases, only the sets of data that are subject to
  encryption differ. Rename the section to "Data Classes for
  Encryption".

- Introduce the two highest-level "scylla.yaml" stanzas,
  "system_info_encryption" and "user_info_encryption". We'll
  expand on their (common!) contents later.

- Remove the comment that, for the Local Key Provider, a keystore can be
  created either manually or automatically. This is stated / repeated
  elsewhere in the document.

- Remove the unused anchor "_Replicated".

- The notes on the Replicated Key Provider both lack nuance, and are
  ill-placed, here. Remove those notes. Add a dedicated description for
  Replicated later, elsewhere. Do mention
  "system_replicated_keys.encrypted_keys" here in passing, as a system
  table with sensitive contents.

- The short listing of key providers is ill-placed here. We have an entire
  section dedicated to those. Furthermore, the various key providers apply
  to system info encryption, too.

- Explain the two levels of configuration for SSTables of user tables.

- Move the note about preserving keys for restoring backups to Key
  Providers | About Local Key Storage, at least temporarily. When keys are
  stored on a key management server (KMIP, GCP, AWS, Azure), then
  backing those up is its own admin task / responsibility.

Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>
2025-08-01 17:27:45 +03:00
Laszlo Ersek
268f5b1564 encryption-at-rest.rst: About: describe high-level operation more precisely
Clarify some table vs. SSTable differences.

Spell out the SSTable metadata ("Scylla.db") component. Spell out commit
log metadata files. Explain that encryption settings are "snapshotted"
into those meta-files.

Highlight that encryption config may vary per table *and* per node. (For
example, a local file key provider under the same pathname on each node,
referenced by the table's "scylla_encryption_options" in the schema, may
provide different keys for different nodes.)

Introduce "algorithm descriptor" and "key provider" as generic concepts.

Touch up the grammar / vocabulary slightly.

Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>
2025-08-01 17:27:45 +03:00
Laszlo Ersek
8717102ae5 encryption-at-rest.rst: improve wording / formatting in About intro
- Remove the KMIP password from the list of system level data.
  Encrypting this would require the `configuration_encryptor`, which has
  been removed as part of the effort to decommission all our java tools.

- Provide an exhaustive list of system tables being encrypted.

- "Table level granularity" is redundant; either "table level" or "table
  granularity" should suffice. Pick the latter.

- Distinguish "block cipher" from "mode of operation" more precisely.

Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>
2025-08-01 17:27:45 +03:00
Laszlo Ersek
b45d7417ef encryption-at-rest.rst: users (plural) typo fix
ScyllaDB presumably stores data for multiple users.

Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
2025-08-01 17:27:45 +03:00
Laszlo Ersek
68dfa41e69 encryption-at-rest.rst: rewrap
Wrap long lines at 80 chars. Seastar coding style suggests 160 chars,
but 80 chars is more comfortable for side-by-side PR diffs on GitHub.
Exclude arg lists and code blocks. Set the limit at 160 chars for arg
lists to avoid too much wrapping that would hurt readability. Do not
wrap code blocks at all.
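
A minimal sketch of the wrapping rule above, for illustration only. It is not
the tool used for this commit. It assumes paragraphs are separated by blank
lines and treats any block that starts with whitespace as a code block to be
left alone. The separate 160-char limit for arg lists is not modeled. The
`rewrap` helper and `PROSE_WIDTH` name are hypothetical.

```python
import textwrap

PROSE_WIDTH = 80  # prose limit named in the commit message


def rewrap(text: str, width: int = PROSE_WIDTH) -> str:
    """Re-wrap unindented paragraphs; copy indented blocks through verbatim.

    Assumption: a block whose first character is whitespace is a code block.
    """
    out = []
    for block in text.split("\n\n"):
        if not block.strip() or block[0].isspace():
            out.append(block)  # blank or indented: do not wrap
        else:
            # Collapse existing line breaks, then wrap at the prose limit.
            out.append(textwrap.fill(" ".join(block.split()), width=width))
    return "\n\n".join(out)
```

Calling `rewrap(open("encryption-at-rest.rst").read())` would return the
re-wrapped text without modifying the file.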

Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>
2025-08-01 17:27:45 +03:00
Laszlo Ersek
54ad1fe35f encryption-at-rest.rst: strip trailing whitespace
Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
2025-08-01 17:27:45 +03:00
Taras Veretilnyk
15e3980693 docs: Add documentation for the nodetool dropquarantinedsstables command
Fixes scylladb/scylladb#19061
2025-08-01 11:46:33 +02:00
Nadav Har'El
70c94ac9dd docs/alternator: document the system.clients system table in Alternator
Add to docs/alternator/new-apis.md a full description of the
`system.clients` support in Alternator that was added in the previous
patches.

Although arguably *all* Scylla system tables should work on Alternator
and do not need to be individually documented, I believe that this
specific table is interesting to document. This is because some of
the attributes in this table have non-obvious and Alternator-specific
meanings. Moreover, there's even a difference in what each individual
item in the table represents (it represents active requests, not entire
connections as in CQL).

While editing the system tables section of new-apis.md, this patch also slightly
improves its formatting.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2025-08-01 02:15:05 +03:00
Nadav Har'El
22f845b128 docs/alternator: mention missing ShardFilter support
Add in docs/alternator/compatibility.md a mention of the ShardFilter
option which we don't support in Alternator Streams. This option was
only introduced to DynamoDB a week ago, so it's not surprising we
don't yet support it :-)

Refs #25160

Closes scylladb/scylladb#25161
2025-07-29 14:37:24 +03:00
Andrei Chekun
a6a3d119e8 docs: update documentation with new way of running C++ tests
Documentation had outdated information on how to run C++ tests.
Additionally, some information was added about gathered test metrics.

Closes scylladb/scylladb#25180
2025-07-29 14:36:19 +03:00
Anna Stuchlik
b67bb641bc doc: add OS support for ScyllaDB 2025.3
This commit adds the information about support for platforms in ScyllaDB version 2025.3.

Fixes https://github.com/scylladb/scylladb/issues/24698

Closes scylladb/scylladb#25220
2025-07-29 14:33:12 +03:00
Anna Stuchlik
8365219d40 doc: add the upgrade guide from 2025.2 to 2025.3
This PR adds the upgrade guide from version 2025.2 to 2025.3.
Also, it removes the upgrade guide existing for the previous version
that is irrelevant in 2025.2 (upgrade from 2025.1 to 2025.2).

Note that the new guide does not include the "Enable Consistent Topology Updates" page and note,
as users upgrading to 2025.3 have consistent topology updates already enabled.

Fixes https://github.com/scylladb/scylladb/issues/24696

Closes scylladb/scylladb#25219
2025-07-29 14:32:31 +03:00
Anna Stuchlik
18b4d4a77c doc: add tablets support information to the Drivers table
This commit:

- Extends the Drivers support table with information on which driver supports tablets
  and since which version.
- Adds the driver support policy to the Drivers page.
- Reorganizes the Drivers page to accommodate the updates.

In addition:
- The CPP-over-Rust driver is added to the table.
- The information about Serverless (which we don't support) is removed
  and replaced with tablets to correctly describe the contents of the table.

Fixes https://github.com/scylladb/scylladb/issues/19471

Refs https://github.com/scylladb/scylladb-docs-homepage/issues/69

Closes scylladb/scylladb#24635
2025-07-29 08:11:42 +03:00
Nadav Har'El
b4fc3578fc Merge 'LWT: enable for tablet-based tables' from Petr Gusev
This PR enables **LWT (Lightweight Transactions)** support for tablet-based tables by leveraging **colocated tables**.

Currently, storing Paxos state in system tables causes two major issues:
* **Loss of Paxos state during tablet migration or base table rebuilds**
  * When a tablet is migrated or the base table is rebuilt, system tables don't retain Paxos state.
  * This breaks LWT correctness in certain scenarios.
  * Failing test cases demonstrating this:
      * test_lwt_state_is_preserved_on_tablet_migration
      * test_lwt_state_is_preserved_on_rebuild
* **Shard misalignment and performance overhead**
  * Tablets may be placed on arbitrary shards by the tablet balancer.
  * Accessing Paxos state in system tables could require a shard jump, degrading performance.

We move Paxos state into a dedicated Paxos table, colocated with the base table:
  * Each base table gets its own Paxos state table.
  * This table is lazily created on the first LWT operation.
  * Its tablets are colocated with those of the base table, ensuring:
    * Co-migration during tablet movement
    * Co-rebuilding with the base table
    * Shard alignment for local access to Paxos state
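
The colocation idea above can be illustrated with a toy model. This is a sketch under stated assumptions, not ScyllaDB code: the `Cluster` and `Table` classes, the `_paxos_state` name suffix, and the `(node, shard)` placement map are all hypothetical. Sharing one placement map between the two tables stands in for colocation, so a migration of a base tablet also moves the matching Paxos state tablet.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    name: str
    # tablet id -> (node, shard); a hypothetical stand-in for tablet placement
    placement: dict = field(default_factory=dict)


class Cluster:
    """Toy model: a Paxos state table shares its base table's placement."""

    def __init__(self) -> None:
        self.tables = {}

    def create_table(self, name: str, placement: dict) -> None:
        self.tables[name] = Table(name, dict(placement))

    def paxos_table_for(self, base: str) -> Table:
        # Hypothetical naming scheme, chosen only for this sketch.
        paxos_name = f"{base}_paxos_state"
        if paxos_name not in self.tables:
            # Lazily created on first use; reuses the base table's placement
            # map (same object), which models colocation.
            self.tables[paxos_name] = Table(paxos_name, self.tables[base].placement)
        return self.tables[paxos_name]

    def migrate_tablet(self, base: str, tablet: int, node: str, shard: int) -> None:
        # Moving a base tablet moves the colocated Paxos tablet with it.
        self.tables[base].placement[tablet] = (node, shard)
```

In this model, the Paxos state table does not exist until `paxos_table_for` is first called, and after `migrate_tablet` both tables report the same node and shard for the tablet.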

Some reasoning for why this is sufficient to preserve LWT correctness is discussed in [2].

This PR addresses two issues from the "Why doesn't it work for tablets" section in [1]:
  * Tablet migration vs LWT correctness
  * Paxos table sharding

Other issues ("bounce to shard" and "locking for intranode_migration") have already been resolved in previous PRs.

References
[1] - [LWT over tablets design](https://docs.google.com/document/d/1CPm0N9XFUcZ8zILpTkfP5O4EtlwGsXg_TU4-1m7dTuM/edit?tab=t.0#heading=h.goufx7gx24yu)
[2] - [LWT: Paxos state and tablet balancer](https://docs.google.com/document/d/1-xubDo612GGgguc0khCj5ukmMGgLGCLWLIeG6GtHTY4/edit?tab=t.0)
[3] - [Colocated tables PR](https://github.com/scylladb/scylladb/pull/22906#issuecomment-3027123886)
[4] - [Possible LWT consistency violations after a topology change](https://github.com/scylladb/scylladb/issues/5251)

Backport: not needed because this is a new feature.

Closes scylladb/scylladb#24819

* github.com:scylladb/scylladb:
  create_keyspace: fix warning for tablets
  docs: fix lwt.rst
  docs: fix tablets.rst
  alternator: enable LWT
  random_failures: enable execute_lwt_transaction
  test_tablets_lwt: add test_paxos_state_table_permissions
  test_tablets_lwt: add test_lwt_for_tablets_is_not_supported_without_raft
  test_tablets_lwt: test timeout creating paxos state table
  test_tablets_lwt: add test_lwt_concurrent_base_table_recreation
  test_tablets_lwt: add test_lwt_state_is_preserved_on_rebuild
  test_tablets_lwt: migrate test_lwt_support_with_tablets
  test_tablets_lwt: add test_lwt_state_is_preserved_on_tablet_migration
  test_tablets_lwt: add simple test for LWT
  check_internal_table_permissions: handle Paxos state tables
  client_state: extract check_internal_table_permissions
  paxos_store: handle base table removal
  database: get_base_table_for_tablet_colocation: handle paxos state table
  paxos_state: use node_local_only mode to access paxos state
  query_options: add node_local_only mode
  storage_proxy: handle node_local_only in query
  storage_proxy: handle node_local_only in mutate
  storage_proxy: introduce node_local_only flag
  abstract_replication_strategy: remove unused using
  storage_proxy: add coordinator_mutate_options
  storage_proxy: rename create_write_response_handler -> make_write_response_handler
  storage_proxy: simplify mutate_prepare
  paxos_state: lazily create paxos state table
  migration_manager: add timeout to start_group0_operation and announce
  paxos_store: use non-internal queries
  qp: make make_internal_options public
  paxos_store: conditional cf_id filter
  paxos_store: coroutinize
  feature_service: add LWT_WITH_TABLETS feature
  paxos_state: inline system_keyspace functions into paxos_store
  paxos_state: extract state access functions into paxos_store
2025-07-28 13:19:23 +03:00
Taras Veretilnyk
6b6622e07a docs: fix typo in command name enbleautocompaction -> enableautocompaction
Renamed the file and updated all references from 'enbleautocompaction' to the correct 'enableautocompaction'.

Fixes scylladb/scylladb#25172

Closes scylladb/scylladb#25175
2025-07-28 12:49:26 +03:00
Botond Dénes
837424f7bb Merge 'Add Azure Key Provider for Encryption at Rest' from Nikos Dragazis
This PR introduces a new Key Provider to support Azure Key Vault as a Key Management System (KMS) for Encryption at Rest. The core design principle is the same as in the AWS and GCP key providers - an externally provided Vault key that is used to protect local data encryption keys (a process known as "key wrapping").

In more detail, this patch series consists of:
* Multiple Azure credential sources, offering a variety of authentication options (Service Principals, Managed Identities, environment variables, Azure CLI).
* The Azure host - the Key Vault endpoint bridge.
* The Azure Key Provider - the interface for the Azure host.
* Unit tests using real Azure resources (credentials and Vault keys).
* Log filtering logic to not expose sensitive data in the logs (plaintext keys, credentials, access tokens).
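
As a rough illustration of the log filtering item above, the following Python sketch redacts secret-looking fields before a log record is emitted. It is not the implementation in this series, which is written in C++. The field names `access_token` and `client_secret`, the `RedactingFilter` class, and the `<redacted>` placeholder are assumptions made for the example.

```python
import logging
import re

# Hypothetical field names; the real redaction rules are not listed here.
SENSITIVE = re.compile(
    r'("?\b(?:access_token|client_secret)"?\s*[:=]\s*)("[^"]*"|\S+)'
)


class RedactingFilter(logging.Filter):
    """Replace the values of sensitive fields with a placeholder."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Format the message first so that %-style arguments are covered too.
        message = record.getMessage()
        record.msg = SENSITIVE.sub(r"\1<redacted>", message)
        record.args = None
        return True  # keep the record, only its text is changed
```

A filter like this would be attached with `logger.addFilter(RedactingFilter())`, so every record passing through that logger is scrubbed before any handler sees it.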

This is part of the overall effort to support Azure deployments.

Testing done:
* Unit tests.
* Manual test on an Azure VM with a Managed Identity.
* Manual test with credentials from Azure CLI.
* Manual test of `--azure-hosts` cmdline option.
* Manual test of log filtering.

Remaining items:
- [x] Create necessary Azure resources for CI.
- [x] Merge pipeline changes (https://github.com/scylladb/scylla-pkg/pull/5201).

Closes https://github.com/scylladb/scylla-enterprise/issues/1077.

New feature. No backport is needed.

Closes scylladb/scylladb#23920

* github.com:scylladb/scylladb:
  docs: Document the Azure Key Provider
  test: Add tests for Azure Key Provider
  pylib: Add mock server for Azure Key Vault
  encryption: Define and enable Azure Key Provider
  encryption: azure: Delegate hosts to shard 0
  encryption: Add Azure host cache
  encryption: Add config options for Azure hosts
  encryption: azure: Add override options
  encryption: azure: Add retries for transient errors
  encryption: azure: Implement init()
  encryption: azure: Implement get_key_by_id()
  encryption: azure: Add id-based key cache
  encryption: azure: Implement get_or_create_key()
  encryption: azure: Add credentials in Azure host
  encryption: azure: Add attribute-based key cache
  encryption: azure: Add skeleton for Azure host
  encryption: Templatize get_{kmip,kms,gcp}_host()
  encryption: gcp: Fix typo in docstring
  utils: azure: Get access token with default credentials
  utils: azure: Get access token from Azure CLI
  utils: azure: Get access token from IMDS
  utils: azure: Get access token with SP certificate
  utils: azure: Get access token with SP secret
  utils: rest: Add interface for request/response redaction logic
  utils: azure: Declare all Azure credential types
  utils: azure: Define interface for Azure credentials
  utils: Introduce base64url_{encode,decode}
2025-07-25 10:45:32 +03:00
Petr Gusev
1f5d9ace93 docs: fix lwt.rst
Add a new section about Paxos state tables. Update all
references to system.paxos in the text to refer to this
section.
2025-07-24 20:04:43 +02:00