Compare commits

119 Commits

Author SHA1 Message Date
Jenkins Promoter
9dca28d2b8 Update ScyllaDB version to: 2025.1.0 2025-03-25 09:19:12 +02:00
Avi Kivity
bc98301783 Merge '[Backport 2025.1] repair: allow concurrent repair and migration of two different tablets' from Aleksandra Martyniuk
Do not hold erm during repair of a tablet that is started with tablet
repair scheduler. This way two different tablets can be repaired
and migrated concurrently. The same tablet won't be migrated while
being repaired as it is provided by topology coordinator.

Use topology_guard to maintain safety.

Fixes: https://github.com/scylladb/scylladb/issues/22408.

Needs backport to 2025.1 that introduces the tablet repair scheduler.

Closes scylladb/scylladb#23362

* github.com:scylladb/scylladb:
  test: add test to check concurrent tablets migration and repair
  repair: do not hold erm for repair scheduled by scheduler
  repair: get total rf based on current erm
  repair: make shard_repair_task_impl::erm private
  repair: do not pass erm to put_row_diff_with_rpc_stream when unnecessary
  repair: do not pass erm to flush_rows_in_working_row_buf when unnecessary
  repair: pass session_id to repair_writer_impl::create_writer
  repair: keep materialized topology guard in shard_repair_task_impl
  repair: pass session_id to repair_meta
2025-03-23 20:14:53 +02:00
Avi Kivity
220bbcf329 Merge '[Backport 2025.1] cql3: Introduce RF-rack-valid keyspaces' from Scylladb[bot]
This PR is an introductory step towards enforcing
RF-rack-valid keyspaces in Scylla.

The scope of changes:
* defining RF-rack-valid keyspaces,
* introducing a configuration option enforcing RF-rack-valid
  keyspaces,
* restricting the CREATE and ALTER KEYSPACE statements
  so that they never lead to RF-rack invalid keyspaces,
* during the initialization of a node, it verifies that all existing
  keyspaces are RF-rack-valid. If not, the initialization fails.

We provide tests verifying that the changes behave as intended.

---

Note that there are a number of things that still need to be implemented.
That includes, for instance, restricting topology operations too.

---

Implementation strategy (going beyond the scope of this PR):

1. Introduce the new configuration option `rf_rack_valid_keyspaces`.
2. Start enforcing RF-rack-validity in keyspaces if the option is enabled.
3. Adjust the tests: in the tree and out of it. Explicitly enable the option in all tests.
4. Once the tests have been adjusted, change the default value of the option to enabled.
5. Stop explicitly enabling the option in tests.
6. Get rid of the option.

---

Fixes scylladb/scylladb#20356
Fixes scylladb/scylladb#23276
Fixes scylladb/scylladb#23300

---

Backport: this is part of the requirements for releasing 2025.1.

- (cherry picked from commit 32879ec0d5)

- (cherry picked from commit 41f862d7ba)

- (cherry picked from commit 0e04a6f3eb)

Parent PR: #23138

Closes scylladb/scylladb#23398

* github.com:scylladb/scylladb:
  main: Refuse to start node when RF-rack-invalid keyspace exists
  cql3: Ensure that CREATE and ALTER never lead to RF-rack-invalid keyspaces
  db/config: Introduce RF-rack-valid keyspaces
2025-03-23 16:16:29 +02:00
Dawid Mędrek
ecdefe801c main: Refuse to start node when RF-rack-invalid keyspace exists
When a node is started with the option `rf_rack_valid_keyspaces`
enabled, the initialization will fail if there is an RF-rack-invalid
keyspace. We want to force the user to adjust their existing
keyspaces when upgrading to 2025.* so that the invariant that
every keyspace is RF-rack-valid is always satisfied.

Fixes scylladb/scylladb#23300

(cherry picked from commit 0e04a6f3eb)
2025-03-21 12:27:04 +00:00
Dawid Mędrek
af2215c2d2 cql3: Ensure that CREATE and ALTER never lead to RF-rack-invalid keyspaces
In this commit, we refuse to create or alter a keyspace when that operation
would make it RF-rack-invalid if the option `rf_rack_valid_keyspaces` is
enabled.

We provide two tests verifying that the changes work as intended.

Fixes scylladb/scylladb#23276

(cherry picked from commit 41f862d7ba)
2025-03-21 12:27:04 +00:00
Dawid Mędrek
864528eb9b db/config: Introduce RF-rack-valid keyspaces
We introduce a new term in the glossary: RF-rack-valid keyspace.

We also highlight in our user documentation that all keyspaces
must remain RF-rack-valid throughout their lifetime, and failing
to guarantee that may result in data inconsistencies or other
issues. We base that information on our experience with materialized
views in keyspaces using tablets, even though they remain
an experimental feature.

Along with the new term, we introduce a new configuration option
called `rf_rack_valid_keyspaces`, which, when enabled, will enforce
preserving all keyspaces RF-rack-valid. That functionality will be
implemented in upcoming commits. For now, we materialize the
restriction in form of a named requirement: a function verifying
that the passed keyspace is RF-rack-valid.
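
A minimal sketch of such a check, in Python for brevity. It assumes the
glossary definition of RF-rack-validity (for every data center, the
replication factor is 0, 1, or equal to the number of racks in that data
center); that definition is not spelled out in this commit message. The
function name and its dict-based inputs are hypothetical and do not
reflect the actual C++ interface.

```python
def is_rf_rack_valid(rf_per_dc, racks_per_dc):
    """Return True if every DC's RF is 0, 1, or the number of racks in that DC.

    rf_per_dc:    hypothetical mapping of DC name -> replication factor
    racks_per_dc: hypothetical mapping of DC name -> number of racks
    """
    for dc, rf in rf_per_dc.items():
        racks = racks_per_dc.get(dc, 0)
        # Assumed rule: RF must be 0, 1, or exactly the rack count.
        if rf not in (0, 1, racks):
            return False
    return True


if __name__ == "__main__":
    print(is_rf_rack_valid({"dc1": 3}, {"dc1": 3}))
    print(is_rf_rack_valid({"dc1": 2}, {"dc1": 3}))
```

Under this assumed rule, RF=3 with three racks passes, while RF=2 with
three racks is rejected.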

The option is disabled by default. That will change once we adjust
the existing tests to the new semantics. Once that is done, the option
will first be enabled by default, and then it will be removed.

Fixes scylladb/scylladb#20356

(cherry picked from commit 32879ec0d5)
2025-03-21 12:27:04 +00:00
Aleksandra Martyniuk
5153b91514 test: add test to check concurrent tablets migration and repair
Add a test to check whether a tablet can be migrated while another
tablet is repaired.

(cherry picked from commit 20f9d7b6eb)
2025-03-19 10:15:19 +01:00
Aleksandra Martyniuk
0a0347cb4e repair: do not hold erm for repair scheduled by scheduler
Do not hold erm for tablet repair scheduled by the scheduler. Thanks to
that, repairing one tablet won't block migration of other tablets.

Concurrent repair and migration of the same tablet isn't possible,
since a tablet can be in only one type of transition at a time.
Hence the change is safe.

Refs: https://github.com/scylladb/scylladb/issues/22408.
(cherry picked from commit 5b792bdc98)
2025-03-19 10:09:51 +01:00
Aleksandra Martyniuk
da64c02b92 repair: get total rf based on current erm
Get total rf based on erm. Currently, it does not change anything
because erm stays the same during the whole repair.

(cherry picked from commit a1375896df)
2025-03-19 10:09:30 +01:00
Aleksandra Martyniuk
39aabe5191 repair: make shard_repair_task_impl::erm private
Make shard_repair_task_impl::erm private. Access it with getter.

(cherry picked from commit 34cd485553)
2025-03-19 10:09:11 +01:00
Aleksandra Martyniuk
9eeff8573b repair: do not pass erm to put_row_diff_with_rpc_stream when unnecessary
When small_table_optimization isn't enabled, put_row_diff_with_rpc_stream
does not access erm. Pass small_table_optimization_params containing erm
only when small_table_optimization is enabled.

This is safe as erm is kept by shard_repair_task_impl.

(cherry picked from commit 444c7eab90)
2025-03-19 10:08:22 +01:00
Aleksandra Martyniuk
4115f6f367 repair: do not pass erm to flush_rows_in_working_row_buf when unnecessary
When small_table_optimization isn't enabled, flush_rows_in_working_row_buf
does not access erm. Add small_table_optimization_params containing erm and
pass it only when small_table_optimization is enabled.

This is safe as erm is kept by shard_repair_task_impl.

(cherry picked from commit e56bb5b6e2)
2025-03-19 10:07:45 +01:00
Aleksandra Martyniuk
fb2c46dfbe repair: pass session_id to repair_writer_impl::create_writer
(cherry picked from commit 09c74aa294)
2025-03-19 10:07:00 +01:00
Aleksandra Martyniuk
b4e37600d6 repair: keep materialized topology guard in shard_repair_task_impl
Keep materialized topology guard in shard_repair_task_impl and check
it in check_in_abort_or_shutdown and before each range repair.

(cherry picked from commit 47bb9dcf78)
2025-03-19 10:04:17 +01:00
Aleksandra Martyniuk
6bbf20a440 repair: pass session_id to repair_meta
Pass session_id of tablet repair down the stack from the repair request
to repair_meta.

The session_id will be utilized in the following patches.

(cherry picked from commit 928f92c780)
2025-03-19 10:02:24 +01:00
Botond Dénes
b8797551eb Merge '[Backport 2025.1] Rack aware tablet merge colocation migration ' from Tomasz Grabiec
service: Introduce rack-aware co-location migrations for tablet merge

Merge co-location can emit migrations across racks even when RF=#racks,
reducing availability and affecting consistency of base-view pairing.

Given replica set of sibling tablets T0 and T1 below:
[T0: (rack1,rack3,rack2)]
[T1: (rack2,rack1,rack3)]

Merge will co-locate T1:rack2 into T0:rack1, so T1 will temporarily be on
only a subset of racks, reducing availability.

This is the main problem fixed by this patch.

It also lays the ground for consistent base-view replica pairing,
which is rack-based. For tables on which views can be created we plan
to enforce the constraint that replicas don't move across racks and
that all tablets use the same set of racks (RF=#racks). This patch
avoids moving replicas across racks unless it's necessary, so if the
constraint is satisfied before merge, there will be no co-locating
migrations across racks. This constraint of RF=#racks is not enforced
yet, it requires more extensive changes.

Fixes #22994.
Refs #17265.

This patch is based on Raphael's work done in PR #23081. The main differences are:

1) Instead of sorting replicas by rack, we try to find
    replicas in sibling tablets which belong to the same rack.
    This is similar to how we match replicas within the same host.
    It reduces number of across-rack migrations even if RF!=#racks,
    which the original patch didn't handle.
    Unlike the original patch, it also avoids rack overload in case
    RF!=#racks

2) We emit across-rack co-locating migrations if we have no other choice
   in order to finalize the merge

   This is ok, since views are not supported with tablets yet. Later,
   we will disallow this for tables which have views, and we will
   allow creating views in the first place only when no such migrations
   can happen (RF=#racks).

3) Added boost unit test which checks that rack overload is avoided during merge
   in case RF<#racks

4) Moved logging of across-rack migration to debug level

5) Exposed metric for across-rack co-locating migrations
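
The matching order described in points 1 and 2 can be sketched roughly as
follows, in Python for brevity. This is an illustration under stated
assumptions, not the load balancer's code: `Replica`, `plan_colocation`
and the host names are hypothetical, and both sibling tablets are assumed
to have the same number of replicas.

```python
from typing import NamedTuple


class Replica(NamedTuple):
    host: str
    rack: str


def _take(pool, pred):
    """Remove and return the first item in pool matching pred, else None."""
    for i, item in enumerate(pool):
        if pred(item):
            return pool.pop(i)
    return None


def plan_colocation(t0, t1):
    """Return (src, dst) migrations that co-locate T1's replicas with T0's."""
    dst_pool = list(t0)
    migrations = []

    # Pass 1: replicas already on the same host need no migration.
    unmatched = [src for src in t1
                 if _take(dst_pool, lambda d: d.host == src.host) is None]

    # Pass 2: prefer a destination in the same rack.
    leftover = []
    for src in unmatched:
        dst = _take(dst_pool, lambda d: d.rack == src.rack)
        if dst is None:
            leftover.append(src)
        else:
            migrations.append((src, dst))

    # Pass 3: across-rack migration only when there is no other choice.
    for src in leftover:
        migrations.append((src, dst_pool.pop(0)))
    return migrations


if __name__ == "__main__":
    t0 = [Replica("a1", "rack1"), Replica("c1", "rack3"), Replica("b1", "rack2")]
    t1 = [Replica("b2", "rack2"), Replica("a2", "rack1"), Replica("c2", "rack3")]
    for src, dst in plan_colocation(t0, t1):
        print(src, "->", dst)
```

With the T0/T1 replica sets from the example above, every migration in
this sketch stays within its rack.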

(cherry picked from commit af949f3b6a)

Also backports dependent patches:
  - locator: network_topology_strategy: Fix SIGSEGV when creating a table when there is a rack with no normal nodes
  - locator: network_topology_startegy: Ignore leaving nodes when computing capacity for new tables
  - Merge 'test: tablets_test: Create proper schema in load balancer tests' from Tomasz Grabiec

Closes scylladb/scylladb#22657
Closes scylladb/scylladb#22652

Closes scylladb/scylladb#23297

* github.com:scylladb/scylladb:
  service: Introduce rack-aware co-location migrations for tablet merge
  Merge 'test: tablets_test: Create proper schema in load balancer tests' from Tomasz Grabiec
  locator: network_topology_startegy: Ignore leaving nodes when computing capacity for new tables
  locator: network_topology_strategy: Fix SIGSEGV when creating a table when there is a rack with no normal nodes
2025-03-18 16:22:29 +02:00
Nadav Har'El
b1cf1890a9 alternator: document the state of tablet support in Alternator
In commit c24bc3b we decided that creating a new table in Alternator
will by default use vnodes - not tablets - because of all the missing
features in our tablets implementation that are important for
Alternator, namely - LWT, CDC and Alternator TTL.

We never documented this, or the fact that we support a tag
`experimental:initial_tablets` which allows overriding this decision
and creating an Alternator table using tablets. We also never documented
what exactly doesn't work when Alternator uses tablets.

This patch adds the missing documentation in docs/alternator/new-apis.md
(which is a good place for describing the `experimental:initial_tablets`
tag). The patch also adds a new test file, test_tablets.py, which
includes tests for all the statements made in the document regarding
how `experimental:initial_tablets` works and what works or doesn't
work when tablets are enabled.

Two existing tests - for TTL and Streams non-support with tablets -
are moved to the new test file.

When the tablets feature is finally completed, both the document
and the tests will need to be modified (some of the tests should be
outright deleted). But it seems this will not happen for at least
several months, and that is too long to wait without accurate
documentation.

Fixes #21629

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#22462

(cherry picked from commit c0821842de)

Closes scylladb/scylladb#23298
2025-03-16 18:25:21 +02:00
Jenkins Promoter
2f0ebe9f49 Update pgo profiles - aarch64 2025-03-15 04:21:14 +02:00
Jenkins Promoter
3633fb9ff8 Update pgo profiles - x86_64 2025-03-15 04:13:25 +02:00
Raphael S. Carvalho
33b5f27057 service: Introduce rack-aware co-location migrations for tablet merge
Merge co-location can emit migrations across racks even when RF=#racks,
reducing availability and affecting consistency of base-view pairing.

Given replica set of sibling tablets T0 and T1 below:
[T0: (rack1,rack3,rack2)]
[T1: (rack2,rack1,rack3)]

Merge will co-locate T1:rack2 into T0:rack1, so T1 will temporarily be on
only a subset of racks, reducing availability.

This is the main problem fixed by this patch.

It also lays the ground for consistent base-view replica pairing,
which is rack-based. For tables on which views can be created we plan
to enforce the constraint that replicas don't move across racks and
that all tablets use the same set of racks (RF=#racks). This patch
avoids moving replicas across racks unless it's necessary, so if the
constraint is satisfied before merge, there will be no co-locating
migrations across racks. This constraint of RF=#racks is not enforced
yet, it requires more extensive changes.

Fixes #22994.
Refs #17265.

This patch is based on Raphael's work done in PR #23081. The main differences are:

1) Instead of sorting replicas by rack, we try to find
    replicas in sibling tablets which belong to the same rack.
    This is similar to how we match replicas within the same host.
    It reduces number of across-rack migrations even if RF!=#racks,
    which the original patch didn't handle.
    Unlike the original patch, it also avoids rack overload in case
    RF!=#racks

2) We emit across-rack co-locating migrations if we have no other choice
   in order to finalize the merge

   This is ok, since views are not supported with tablets yet. Later,
   we will disallow this for tables which have views, and we will
   allow creating views in the first place only when no such migrations
   can happen (RF=#racks).

3) Added boost unit test which checks that rack overload is avoided during merge
   in case RF<#racks

4) Moved logging of across-rack migration to debug level

5) Exposed metric for across-rack co-locating migrations

(cherry picked from commit af949f3b6a)

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Signed-off-by: Tomasz Grabiec <tgrabiec@scylladb.com>
2025-03-14 20:02:33 +01:00
Anna Stuchlik
11ecc886c3 doc: Remove "experimental" from ALTER KEYSPACE with Tablets
Altering a keyspace with tablets is no longer experimental.
This commit removes the "Experimental" label from the feature.

Fixes https://github.com/scylladb/scylladb/issues/23166

Closes scylladb/scylladb#23183

(cherry picked from commit 562b5db5b8)

Closes scylladb/scylladb#23274
2025-03-14 13:57:55 +01:00
Botond Dénes
eb147ec564 Merge 'test: tablets_test: Create proper schema in load balancer tests' from Tomasz Grabiec
This PR converts boost load balancer tests in preparation for load balancer changes
which add per-table tablet hints. After those changes, load balancer consults with the replication
strategy in the database, so we need to create proper schema in the
database. To do that, we need proper topology for replication
strategies which use RF > 1, otherwise keyspace creation will fail.

Topology is created in tests via group0 commands, which is abstracted by
the new `topology_builder` class.

Tests cannot modify token_metadata only in memory now as it needs to be
consistent with the schema and on-disk metadata. That's why modifications to
tablet metadata are now made under group0 guard and save back metadata to disk.

Closes scylladb/scylladb#22648

* github.com:scylladb/scylladb:
  test: tablets: Drop keyspace after do_test_load_balancing_merge_colocation() scenario
  tests: tablets: Set initial tablets to 1 to exit growing mode
  test: tablets_test: Create proper schema in load balancer tests
  test: lib: Introduce topology_builder
  test: cql_test_env: Expose topology_state_machine
  topology_state_machine: Introduce lock transition

(cherry picked from commit 51a273401c)
2025-03-13 14:08:30 +01:00
Tomasz Grabiec
637e5fc9b5 locator: network_topology_startegy: Ignore leaving nodes when computing capacity for new tables
For example, nodes which are being decommissioned should not be
considered as available capacity for new tables. We don't allocate
tablets on such nodes.

Counting them would result in higher per-shard load than planned.
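
A small sketch of the idea, in Python for brevity. The `Node` type, the
state names and `available_shards` are hypothetical stand-ins for the
real topology structures; the point is only that leaving nodes are
excluded before capacity is summed.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    shards: int
    state: str  # hypothetical states: "normal" or "leaving"


def available_shards(nodes):
    """Sum shard capacity, skipping nodes that are leaving the cluster."""
    return sum(n.shards for n in nodes if n.state != "leaving")


if __name__ == "__main__":
    nodes = [Node("n1", 8, "normal"), Node("n2", 8, "normal"),
             Node("n3", 8, "leaving")]
    # Only n1 and n2 count towards capacity for new tables.
    print(available_shards(nodes))
```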

Closes scylladb/scylladb#22657

(cherry picked from commit 3bb19e9ac9)
2025-03-13 14:08:27 +01:00
Tomasz Grabiec
0d77754c63 locator: network_topology_strategy: Fix SIGSEGV when creating a table when there is a rack with no normal nodes
In that case, new_racks will be used, but when we discover no
candidates, we try to pop from existing_racks.

Fixes #22625

Closes scylladb/scylladb#22652

(cherry picked from commit e22e3b21b1)
2025-03-13 14:00:48 +01:00
Benny Halevy
5481c9aedd docs: document the views-with-tablets experimental feature
Refs scylladb/scylladb#22217

Fixes scylladb/scylladb#22893

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#22896

(cherry picked from commit 55dbf5493c)

Closes scylladb/scylladb#23024
2025-03-10 13:26:36 +01:00
Botond Dénes
59db708cba Merge '[Backport 2025.1] tablets: repair: fix hosts and dcs filters behavior for tablet repair' from Scylladb[bot]
If hosts and/or dcs filters are specified for tablet repair and
some replicas match these filters, choose the replica that will
be the repair master according to round-robin principle
(currently it's always the first replica).

If hosts and/or dcs filters are specified for tablet repair and
no replica matches these filters, the repair succeeds and
the repair request is removed (currently an exception is thrown
and tablet repair scheduler reschedules the repair forever).
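
The two behaviors can be sketched as follows, in Python for brevity. This
assumes a replica is selected when it matches every filter that was
specified, and uses a caller-supplied counter as the round-robin key; the
names `Replica`, `select_repair_master` and `counter` are hypothetical,
and the actual key used for the round-robin choice is not stated here.

```python
from typing import NamedTuple, Optional


class Replica(NamedTuple):
    host: str
    dc: str


def select_repair_master(replicas, hosts=None, dcs=None,
                         counter=0) -> Optional[Replica]:
    """Pick the repair master among replicas that pass the filters.

    Returns None when no replica matches, which the caller should treat
    as a successful no-op repair rather than an error.
    """
    def selected(r):
        # An empty or missing filter places no restriction.
        return (not hosts or r.host in hosts) and (not dcs or r.dc in dcs)

    matching = [r for r in replicas if selected(r)]
    if not matching:
        return None
    # Round-robin instead of always taking the first matching replica.
    return matching[counter % len(matching)]


if __name__ == "__main__":
    replicas = [Replica("h1", "dc1"), Replica("h2", "dc1"), Replica("h3", "dc2")]
    for i in range(3):
        print(select_repair_master(replicas, dcs={"dc1"}, counter=i))
    print(select_repair_master(replicas, dcs={"dc3"}))
```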

Fixes: https://github.com/scylladb/scylladb/issues/23100.

Needs backport to 2025.1 that introduces hosts and dcs filters for tablet repair

- (cherry picked from commit 9bce40d917)

- (cherry picked from commit fe4e99d7b3)

- (cherry picked from commit 2b538d228c)

- (cherry picked from commit c40eaa0577)

- (cherry picked from commit c7c6d820d7)

Parent PR: #23101

Closes scylladb/scylladb#23109

* github.com:scylladb/scylladb:
  test: add new cases to tablet_repair tests
  test: extract repiar check to function
  locator: add round-robin selection of filtered replicas
  locator: add tablet_task_info::selected_by_filters
  service: finish repair successfully if no matching replica found
2025-03-10 12:49:01 +02:00
Botond Dénes
28690f8203 Merge '[Backport 2025.1] repair: Introduce Host and DC filter support' from Scylladb[bot]
Currently, the tablet repair scheduler repairs all replicas of a tablet. It does not support hosts or DCs selection. It should be enough for most cases. However, users might still want to limit the repair to certain hosts or DCs in production. https://github.com/scylladb/scylladb/pull/21985 added the preparation work to add the config options for the selection. This patch adds the hosts or DCs selection support.

Fixes https://github.com/scylladb/scylladb/issues/22417

New feature. No backport is needed.

- (cherry picked from commit 4c75701756)

- (cherry picked from commit 5545289bfa)

- (cherry picked from commit 1c8a41e2dd)

- (cherry picked from commit e499f7c971)

Parent PR: #22621

Closes scylladb/scylladb#23080

* github.com:scylladb/scylladb:
  test: add test to check dcs and hosts repair filter
  test: add repair dc selection to test_tablet_metadata_persistence
  repair: Introduce Host and DC filter support
  docs: locator: update the docs and formatter of tablet_task_info
2025-03-10 12:48:49 +02:00
Anna Stuchlik
235c859b98 doc: zero-token nodes and Arbiter DC
This commit adds documentation for zero-token nodes and an explanation
of how to use them to set up an arbiter DC to prevent a quorum loss
in multi-DC deployments.

The commit adds two documents:
- The one in Architecture describes zero-token nodes.
- The other in Cluster Management explains how to use them.

We need separate documents because zero-token nodes may be used
for other purposes in the future.

In addition, the documents are cross-linked, and the link is added
to the Create a ScyllaDB Cluster - Multi Data Centers (DC) document.

Refs https://github.com/scylladb/scylladb/pull/19684

Fixes https://github.com/scylladb/scylladb/issues/20294

Closes scylladb/scylladb#21348

(cherry picked from commit 9ac0aa7bba)

Closes scylladb/scylladb#23201
2025-03-10 10:59:07 +01:00
Anna Stuchlik
5453e85f39 doc: remove the reference to the 6.2 version
This commit removes the OSS version name, which is irrelevant
and confusing for 2025.1 and later users.
Also, it updates the warning to avoid specifying the release
when the deprecated feature will be removed.

Fixes https://github.com/scylladb/scylladb/issues/22839

Closes scylladb/scylladb#22936

(cherry picked from commit d0a48c5661)

Closes scylladb/scylladb#23022
2025-03-07 12:53:42 +02:00
Anna Stuchlik
7a6bcb3a3f doc: remove references to Enterprise
This commit removes the redundant references to Enterprise,
which are no longer valid.

Fixes https://github.com/scylladb/scylladb/issues/22927

Closes scylladb/scylladb#22930

(cherry picked from commit a28bbc22bd)

Closes scylladb/scylladb#22963
2025-03-07 12:53:22 +02:00
Anna Stuchlik
8b2a382eb6 doc: add support for Ubuntu 24.04 in 2024.1
Fixes https://github.com/scylladb/scylladb/issues/22841

Refs https://github.com/scylladb/scylla-enterprise/issues/4550

Closes scylladb/scylladb#22843

(cherry picked from commit 439463dbbf)

Closes scylladb/scylladb#23092
2025-03-07 12:51:13 +02:00
Dusan Malusev
cdd51d8b7a docs: add instruction for installing cassandra-stress
Signed-off-by: Dusan Malusev <dusan.malusev@scylladb.com>

Closes scylladb/scylladb#21723

(cherry picked from commit 4e6ea232d2)

Closes scylladb/scylladb#22947
2025-03-07 11:48:46 +02:00
Anna Stuchlik
88a8d140b3 doc: add information about tablets limitation to the CQL page
This commit adds a link to the Limitations section on the Tablets page
to the CQL page, under the tablets option.
This is actually the place where the user will need the information:
when creating a keyspace.

In addition, I've reorganized the section for better readability
(otherwise, the section about limitations was easy to miss)
and moved the section up on the page.

Note that I've moved the updated content from the `_common` folder
(which I deleted) to the .rst page - we no longer split OSS and Enterprise,
so there's no need to keep using the `scylladb_include_flag` directive
to include OSS- and Ent-specific content.

Fixes https://github.com/scylladb/scylladb/issues/22892

Fixes https://github.com/scylladb/scylladb/issues/22940

Closes scylladb/scylladb#22939

(cherry picked from commit 0999fad279)

Closes scylladb/scylladb#23091
2025-03-07 11:48:07 +02:00
Aleksandra Martyniuk
1957dac2b4 test: add new cases to tablet_repair tests
Add tests for tablet repair with host and dc filters that select
one or no replica.

(cherry picked from commit c7c6d820d7)
2025-03-05 10:59:00 +01:00
Aleksandra Martyniuk
1091ef89e1 test: extract repiar check to function
(cherry picked from commit c40eaa0577)
2025-03-05 10:59:00 +01:00
Aleksandra Martyniuk
b081e07ffa locator: add round-robin selection of filtered replicas
(cherry picked from commit 2b538d228c)
2025-03-05 10:58:59 +01:00
Aleksandra Martyniuk
1f102ca2f7 locator: add tablet_task_info::selected_by_filters
Extract dcs and hosts filters check to a method.

(cherry picked from commit fe4e99d7b3)
2025-03-05 10:36:51 +01:00
Aleksandra Martyniuk
8a98f0d5b6 service: finish repair successfully if no matching replica found
If hosts and/or dcs filters are specified for tablet repair and
no replica matches these filters, an exception is thrown. The repair
fails and tablet repair scheduler reschedules it forever.

Such a repair should actually succeed (as all specified replicas were
repaired) and the repair request should be removed.

Treat the repair as successful if the filters were specified and
selected no replica.

(cherry picked from commit 9bce40d917)
2025-03-05 10:36:50 +01:00
Anna Stuchlik
cdae92065b doc: add the 2025.1 upgrade guides and reorganize the upgrade section
This commit adds the upgrade guides relevant in version 2025.1:
- From 6.2 to 2025.1
- From 2024.x to 2025.1

It also removes the upgrade guides that are not relevant in 2025.1 source available:
- Open Source upgrade guides
- From Open Source to Enterprise upgrade guides
- Links to the Enterprise upgrade guides

Also, as part of this PR, the remaining relevant content has been moved to
the new About Upgrade page.

WHAT NEEDS TO BE REVIEWED
- Review the instructions in the 6.2-to-2025.1 guide
- Review the instructions in the 2024.x-to-2025.1 guide
- Verify that there are no references to Open Source and Enterprise.

The scope of this PR does not have to include metrics - the info can be added
in a follow-up PR.

Fixes https://github.com/scylladb/scylladb/issues/22208
Fixes https://github.com/scylladb/scylladb/issues/22209
Fixes https://github.com/scylladb/scylladb/issues/23072
Fixes https://github.com/scylladb/scylladb/issues/22346

Closes scylladb/scylladb#22352

(cherry picked from commit 850aec58e0)

Closes scylladb/scylladb#23106
2025-03-04 08:15:08 +02:00
Jenkins Promoter
4813c48d64 Update pgo profiles - aarch64 2025-03-01 04:23:19 +02:00
Jenkins Promoter
b623b108c3 Update pgo profiles - x86_64 2025-03-01 04:05:24 +02:00
Aleksandra Martyniuk
7fdc7bdc4b test: add test to check dcs and hosts repair filter
(cherry picked from commit e499f7c971)
2025-02-27 12:14:47 +01:00
Aleksandra Martyniuk
c2e926850d test: add repair dc selection to test_tablet_metadata_persistence
(cherry picked from commit 1c8a41e2dd)
2025-02-27 12:14:47 +01:00
Asias He
6d5b029812 repair: Introduce Host and DC filter support
Currently, the tablet repair scheduler repairs all replicas of a tablet.
It does not support hosts or DCs selection. It should be enough for most
cases. However, users might still want to limit the repair to certain
hosts or DCs in production. #21985 added the preparation work to add the
config options for the selection. This patch adds the hosts or DCs
selection support.

Fixes #22417

(cherry picked from commit 5545289bfa)
2025-02-27 12:14:44 +01:00
Aleksandra Martyniuk
ffeb55cf77 docs: locator: update the docs and formatter of tablet_task_info
(cherry picked from commit 4c75701756)
2025-02-26 23:49:50 +00:00
Jenkins Promoter
37aa7c216c Update ScyllaDB version to: 2025.1.0-rc4 2025-02-25 21:33:18 +02:00
Gleb Natapov
0b0e9f0c32 treewide: include build_mode.hh for SCYLLA_BUILD_MODE_RELEASE where it is missing
Fixes: #22914

Closes scylladb/scylladb#22915

(cherry picked from commit 914c9f1711)

Closes scylladb/scylladb#22962
2025-02-25 18:12:54 +03:00
Evgeniy Naydanov
871fabd60a test.py: test_random_failures: improve handling of hung node
In some cases the paused/unpaused node can hang not after 30s timeout.
This makes the test flaky.  Change the condition to always check the
coordinator's log if there is a hung node.

Add `stop_after_streaming` to the list of error injections which can
cause a node to hang.

Also add a wait for a new coordinator election in cluster events
which cause such elections.

Closes scylladb/scylladb#22825

(cherry picked from commit 99be9ac8d8)

Closes scylladb/scylladb#23007
2025-02-25 14:31:51 +03:00
Pavel Emelyanov
aa5cb15166 Merge 'Alternator: implement UpdateTable operation to add or delete GSI' from Nadav Har'El
In this series we implement the UpdateTable operation to add a GSI to an existing table, or remove a GSI from a table. As the individual commit messages will explain, this required changing how Alternator stores materialized view keys - instead of insisting that these keys must be real columns (that is **not** the case when adding a GSI to an existing table), the materialized view can now take as its key any Alternator attribute serialized inside the ":attrs" map holding all non-key attributes. Fixes #11567.

We also fix the IndexStatus and Backfilling attributes returned by DescribeTable - as DynamoDB API users use this API to discover when a newly added GSI completed its "backfilling" (what we call "view building") stage. Fixes #11471.
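
From the client side, the flow can be sketched as below, in Python. The
request and response shapes follow the DynamoDB API (UpdateTable with
`GlobalSecondaryIndexUpdates`, DescribeTable returning `IndexStatus` and
`Backfilling`); the helper names, the table name and the attribute name
are hypothetical, and the sketch only builds and inspects dictionaries
rather than contacting a server.

```python
def build_add_gsi_request(table, index, key_attr, key_type="S"):
    """Build UpdateTable parameters that add a GSI keyed on one attribute."""
    return {
        "TableName": table,
        "AttributeDefinitions": [
            {"AttributeName": key_attr, "AttributeType": key_type},
        ],
        "GlobalSecondaryIndexUpdates": [{
            "Create": {
                "IndexName": index,
                "KeySchema": [{"AttributeName": key_attr, "KeyType": "HASH"}],
                "Projection": {"ProjectionType": "ALL"},
            },
        }],
    }


def gsi_is_active(describe_table_response, index):
    """True once the named GSI reports ACTIVE and is no longer backfilling."""
    table = describe_table_response.get("Table", {})
    for gsi in table.get("GlobalSecondaryIndexes", []):
        if gsi.get("IndexName") == index:
            return (gsi.get("IndexStatus") == "ACTIVE"
                    and not gsi.get("Backfilling", False))
    return False


if __name__ == "__main__":
    print(build_add_gsi_request("users", "by_email", "email"))
```

With a DynamoDB client such as boto3, the request would be sent as
`client.update_table(**build_add_gsi_request(...))`, and the result of
`client.describe_table(TableName=...)` polled with `gsi_is_active` until
view building completes.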

This series should not be backported lightly - it's a new feature and required fairly large and intrusive changes that can introduce bugs to use cases that don't even use Alternator or its UpdateTable operations - every user of CQL materialized views or secondary indexes, as well as Alternator GSI or LSI, will use modified code. **It should be backported to 2025.1**, though - this version was actually branched long after this PR was sent, and it provides a feature that was promised for 2025.1.

Closes scylladb/scylladb#21989

* github.com:scylladb/scylladb:
  alternator: fix view build on oversized GSI key attribute
  mv: clean up do_delete_old_entry
  test/alternator: unflake test for IndexStatus
  test/alternator: work around unrelated bug causing test flakiness
  docs/alternator: adding a GSI is no longer an unimplemented feature
  test/alternator: remove xfail from all tests for issue 11567
  alternator: overhaul implementation of GSIs and support UpdateTable
  mv: support regular_column_transformation key columns in view
  alternator: add new materialized-view computed column for item in map
  build: in cmake build, schema needs alternator
  build: build tests with Alternator
  alternator: add function serialized_value_if_type()
  mv: introduce regular_column_transformation, a new type of computed column
  alternator: add IndexStatus/Backfilling in DescribeTable
  alternator: add "LimitExceededException" error type
  docs/alternator: document two more unimplemented Alternator features

(cherry picked from commit 529ff3efa5)

Closes scylladb/scylladb#22826
2025-02-18 19:05:21 +02:00
Jenkins Promoter
13d79ba990 Update ScyllaDB version to: 2025.1.0-rc3 2025-02-18 15:06:57 +02:00
Nadav Har'El
35b410326b test/topology_custom: fix very slow test test_localnodes_broadcast_rpc_address
The test
topology_custom/test_alternator::test_localnodes_broadcast_rpc_address
sets up nodes with a silly "broadcast rpc address" and checks that
Alternator's "/localnodes" request returns it correctly.

The problem is that although we don't use CQL in this test, the test
framework does open a CQL connection when the test starts, and closes
it when it ends. It turns out that when we set a silly "broadcast RPC
address", the driver tends to try to connect to it when shutting down,
I'm not even sure why. But the choice of 1.2.3.4 as the silly address
is unfortunate, because this IP address is actually routable - and
the driver hangs until it times out (in practice, in a bit over two
minutes). This trivial patch changes 1.2.3.4 to 127.0.0.0 - an equally
silly address but one to which connections fail immediately.

Before this patch, the test often takes more than 2 minutes to finish
on my laptop; after this patch, it always finishes in 4-5 seconds.

Fixes #22744

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#22746

(cherry picked from commit f89235517d)

Closes scylladb/scylladb#22875
2025-02-18 10:33:21 +02:00
Botond Dénes
12a3fcceae Merge '[Backport 2025.1] sstable_loader: fix cross-shard resource cleanup in download_task_impl ' from Scylladb[bot]
This PR addresses two related issues in our task system:

1. Prepares for asynchronous resource cleanup by converting release_resources() to a coroutine. This refactoring enables future improvements in how we handle resource cleanup.

2. Fixes a cross-shard resource cleanup issue in the SSTable loader where destruction of per-shard progress elements could trigger "shared_ptr accessed on non-owner cpu" errors in multi-shard environments. The fix uses coroutines to ensure resources are released on their owner shards.

Fixes #22759

---

This change addresses a regression introduced by d815d7013c, which is contained in the 2025.1 and master branches, so it should be backported to the 2025.1 branch.

- (cherry picked from commit 4c1f1baab4)

- (cherry picked from commit b448fea260)

Parent PR: #22791

Closes scylladb/scylladb#22871

* github.com:scylladb/scylladb:
  sstable_loader: fix cross-shard resource cleanup in download_task_impl
  tasks: make release_resources() a coroutine
2025-02-18 10:32:48 +02:00
Gleb Natapov
040c59674a api: initialize token metadata API after starting the gossiper
Token metadata API now depends on the gossiper to do IP to host ID mappings,
so initialize it after the gossiper is initialized and de-initialize
it before the gossiper is stopped.

Fixes: scylladb/scylladb#22743

Closes scylladb/scylladb#22760

(cherry picked from commit d288d79d78)

Closes scylladb/scylladb#22854
2025-02-18 10:32:24 +02:00
Asias He
b50a6657e8 repair: Add await_completion option for tablet_repair api
Set true to wait for the repair to complete. Set false to skip waiting
for the repair to complete. When the option is not provided, it defaults
to false.

It is useful for a management tool that wants the API to be async.

Fixes #22418

Closes scylladb/scylladb#22436

(cherry picked from commit fb318d0c81)

Closes scylladb/scylladb#22851
2025-02-18 10:31:53 +02:00
Botond Dénes
93479ffcf9 Merge '[Backport 2025.1] raft/group0_state_machine: load current RPC compression dict on startup' from Michał Chojnowski
We are supposed to be loading the most recent RPC compression dictionary on startup, but we forgot to port the relevant piece of logic during the source-available port. This causes a restarted node not to use the dictionary for RPC compression until the next dictionary update.

Fix that.

Fixes #22738

This is more of a bugfix than an improvement, so it should be backported to 2025.1.

* (cherry picked from commit [dd82b40](dd82b40186))

* (cherry picked from commit [8fb2ea6](8fb2ea61ba))

Additionally cherry picked https://github.com/scylladb/scylladb/pull/22836 to fix the timeout.

Parent PR: #22739

Closes scylladb/scylladb#22837

* github.com:scylladb/scylladb:
  test_rpc_compression.py: fix an overly-short timeout
  test_rpc_compression.py: test the dictionaries are loaded on startup
  raft/group0_state_machine: load current RPC compression dict on startup
2025-02-18 10:31:23 +02:00
Botond Dénes
38bd74b2d4 tools/scylla-nodetool: netstats: don't assume both senders and receivers
The code currently assumes that a session has both sender and receiver
streams, but it is possible to have just one or the other.
Change the test to include this scenario and remove this assumption from
the code.

Fixes: #22770

Closes scylladb/scylladb#22771

(cherry picked from commit 87e8e00de6)

Closes scylladb/scylladb#22874
2025-02-17 14:34:36 +02:00
Takuya ASADA
6ee1779578 dist: fix upgrade error from 2024.1
We need to allow replacing nodetool from scylla-enterprise-tools < 2024.2,
just like we did for scylla-tools < 5.5.
This is required to make packages able to upgrade from 2024.1.

Fixes #22820

Closes scylladb/scylladb#22821

(cherry picked from commit b5e306047f)

Closes scylladb/scylladb#22867
2025-02-16 14:47:48 +02:00
Kefu Chai
9fe2301647 sstable_loader: fix cross-shard resource cleanup in download_task_impl
Previously, download_task_impl's destructor would destroy per-shard progress
elements on whatever shard the task was destroyed on. In multi-shard
environments, this caused "shared_ptr accessed on non-owner cpu" errors when
attempting to free memory allocated on a different shard.

Fix by:
- Convert progress_per_shard into a sharded service
- Stop the service on owner shards during cleanup using coroutines
- Add operator+= to stream_progress to leverage seastar's built-in adder
  instead of a custom adder struct

Alternative approaches considered:

1. Using foreign_ptr: Rejected as it would require interface changes
   that complicate stream delegation. foreign_ptr manages the underlying
   pointee with another smart pointer but does not expose the smart
   pointer instance in its APIs, making it impossible to use
   shared_ptr<stream_progress> in the interface.
2. Using vector<stream_progress>: Rejected for similar interface
   compatibility reasons.

This solution maintains the existing interfaces while ensuring proper
cross-shard cleanup.

Fixes scylladb/scylladb#22759
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
(cherry picked from commit b448fea260)
2025-02-15 22:46:43 +00:00
Kefu Chai
6b27459de3 tasks: make release_resources() a coroutine
Convert tasks::task_manager::task::impl::release_resources() to a coroutine
to prepare for upcoming changes that will implement asynchronous resource
release.

This is a preparatory refactoring that enables future coroutine-based
implementation of resource cleanup logic.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
(cherry picked from commit 4c1f1baab4)
2025-02-15 22:46:43 +00:00
Jenkins Promoter
48130ca2e9 Update pgo profiles - aarch64 2025-02-15 04:20:15 +02:00
Jenkins Promoter
5054087f0b Update pgo profiles - x86_64 2025-02-15 04:05:06 +02:00
Botond Dénes
889fb9c18b Update tools/java submodule
* tools/java 807e991d...6dfe728a (1):
  > dist: support smooth upgrade from enterprise to source availalbe

Fixes: scylladb/scylladb#22820
2025-02-14 11:14:07 +02:00
Botond Dénes
c627aff5f7 Merge '[Backport 2025.1] reader_concurrency_semaphore: set_notify_handler(): disable timeout ' from Scylladb[bot]
`set_notify_handler()` is called after a querier was inserted into the querier cache. It has two purposes: set a callback for eviction and set a TTL for the cache entry. The latter was not disabling the pre-existing timeout of the permit (if any) and this would lead to premature eviction of the cache entry if the timeout was shorter than the TTL (which is typical).
Disable the timeout before setting the TTL to prevent premature eviction.

Fixes: https://github.com/scylladb/scylladb/issues/22629

Backport required to all active releases, they are all affected.

- (cherry picked from commit a3ae0c7cee)

- (cherry picked from commit 9174f27cc8)

Parent PR: #22701

Closes scylladb/scylladb#22752

* github.com:scylladb/scylladb:
  reader_concurrency_semaphore: set_notify_handler(): disable timeout
  reader_permit: mark check_abort() as const
2025-02-13 15:24:54 +02:00
Michał Chojnowski
ffca4a9f85 test_rpc_compression.py: fix an overly-short timeout
The timeout of 10 seconds is too small for CI.
I didn't mean to make it so short, it was an accident.

Fix that by changing the timeout to 10 minutes.
2025-02-13 10:03:13 +01:00
Michał Chojnowski
2c0ffdce31 pgo: disable tablets for training with secondary index, lwt and counters
As of right now, materialized views (and consequently secondary
indexes), lwt and counters are unsupported or experimental with tablets.
Since tablets are enabled by default, training cases using those
features are currently broken.

The right thing to do here is to disable tablets in those cases.

Fixes https://github.com/scylladb/scylladb/issues/22638

Closes scylladb/scylladb#22661

(cherry picked from commit bea434f417)

Closes scylladb/scylladb#22808
2025-02-13 09:42:09 +02:00
Botond Dénes
ff7e93ddd5 db/config: reader_concurrency_semaphore_cpu_concurrency: bump default to 2
This config item controls how many CPU-bound reads are allowed to run in
parallel. The effective concurrency of a single CPU core is 1, so
allowing more than one CPU-bound read to run concurrently will just
result in time-sharing and both reads having higher latency.
However, restricting concurrency to 1 means that a CPU-bound read that
takes a lot of time to complete can block other quick reads while it is
running. Increase this default setting to 2 as a compromise between not
over-using time-sharing, while not allowing such slow reads to block the
queue behind them.

Fixes: #22450

Closes scylladb/scylladb#22679

(cherry picked from commit 3d12451d1f)

Closes scylladb/scylladb#22722
2025-02-13 09:40:25 +02:00
Botond Dénes
1998733228 service: query_pager: fix last-position for filtering queries
On short-pages, cut short because of a tombstone prefix.
When page-results are filtered and the filter drops some rows, the
last-position is taken from the page visitor, which does the filtering.
This means that last partition and row position will be that of the last
row the filter saw. This will not match the last position of the
replica, when the replica cut the page due to tombstones.
When fetching the next page, this means that all the tombstone suffix of
the last page, will be re-fetched. Worse still: the last position of the
next page will not match that of the saved reader left on the replica, so
the saved reader will be dropped and a new one created from scratch.
This wasted work will show up as elevated tail latencies.
Fix by always taking the last position from raw query results.

Fixes: #22620

Closes scylladb/scylladb#22622

(cherry picked from commit 7ce932ce01)

Closes scylladb/scylladb#22719
2025-02-13 09:40:05 +02:00
Botond Dénes
e79ee2ddb0 reader_concurrency_semaphore: foreach_permit(): include _inactive_reads
So inactive reads show up in semaphore diagnostics dumps (currently the
only non-test user of this method).

Fixes: #22574

Closes scylladb/scylladb#22575

(cherry picked from commit e1b1a2068a)

Closes scylladb/scylladb#22611
2025-02-13 09:39:39 +02:00
Aleksandra Martyniuk
4c39943b3f replica: mark registry entry as synch after the table is added
When a replica gets a write request it performs get_schema_for_write,
which waits until the schema is synced. However, database::add_column_family
marks a schema as synced before the table is added. Hence, the write may
see the schema as synced, but hit no_such_column_family as the table
hasn't been added yet.

Mark schema as synced after the table is added to database::_tables_metadata.

Fixes: #22347.

Closes scylladb/scylladb#22348

(cherry picked from commit 328818a50f)

Closes scylladb/scylladb#22604
2025-02-13 09:39:13 +02:00
Calle Wilund
17c86f8b57 encryption: Fix encrypted components mask check in describe
Fixes #22401

In the fix for scylladb/scylla-enterprise#892, the extraction and check for sstable component encryption mask was copied
to a subroutine for description purposes, but a very important 1 << <value> shift was somehow
left on the floor.

Without this, the check for whether we actually contain a component encrypted can be wholly
broken for some components.
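
To illustrate why the missing shift matters, here is a minimal sketch of a bitmask membership check. The component names and their index values are hypothetical and chosen only for illustration; they are not ScyllaDB's actual sstable component identifiers.

```python
from enum import IntEnum


class Component(IntEnum):
    # Hypothetical component indices, for illustration only.
    DATA = 0
    INDEX = 1
    FILTER = 2
    SUMMARY = 3


def make_mask(components):
    """Build a bitmask with one bit set per encrypted component."""
    mask = 0
    for c in components:
        mask |= 1 << c
    return mask


def is_encrypted_buggy(mask, component):
    # Bug: tests the raw index against the mask instead of its bit.
    return (mask & component) != 0


def is_encrypted(mask, component):
    # Correct: shift 1 by the component index before testing.
    return (mask & (1 << component)) != 0


mask = make_mask([Component.DATA, Component.FILTER])  # bits 0 and 2 set
for c in Component:
    print(c.name, is_encrypted_buggy(mask, c), is_encrypted(mask, c))
```

With this mask the unshifted check gives the wrong answer for every component in the toy enum, which matches the description of the check being "wholly broken for some components".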

Closes scylladb/scylladb#22398

(cherry picked from commit 7db14420b7)

Closes scylladb/scylladb#22599
2025-02-13 09:38:41 +02:00
Botond Dénes
d05b3897a2 Merge '[Backport 2025.1] api: task_manager: do not unregister finish task when its status is queried' from Scylladb[bot]
Currently, when the status of a task is queried and the task is already finished,
it gets unregistered. Getting the status shouldn't be a one-time operation.

Stop removing the task after its status is queried. Adjust tests not to rely
on this behavior. Add task_manager/drain API and nodetool tasks drain
command to remove finished tasks in the module.

Fixes: https://github.com/scylladb/scylladb/issues/21388.

It's a fix to task_manager API, should be backported to all branches

- (cherry picked from commit e37d1bcb98)

- (cherry picked from commit 18cc79176a)

Parent PR: #22310

Closes scylladb/scylladb#22598

* github.com:scylladb/scylladb:
  api: task_manager: do not unregister tasks on get_status
  api: task_manager: add /task_manager/drain
2025-02-13 09:38:12 +02:00
Botond Dénes
9116fc635e Merge '[Backport 2025.1] split: run set_split_mode() on all storage groups during all_storage_groups_split()' from Scylladb[bot]
`tablet_storage_group_manager::all_storage_groups_split()` calls `set_split_mode()` for each of its storage groups to create split ready compaction groups. It does this by iterating through storage groups using `std::ranges::all_of()` which is not guaranteed to iterate through the entire range, and will stop iterating on the first occurrence of the predicate (`set_split_mode()`) returning false. `set_split_mode()` creates the split compaction groups and returns false if the storage group's main compaction group or merging groups are not empty. This means that in cases where the tablet storage group manager has non-empty storage groups, we could have a situation where split compaction groups are not created for all storage groups.

The missing split compaction groups are later created in `tablet_storage_group_manager::split_all_storage_groups()` which also calls `set_split_mode()`, and that is the reason why split completes successfully. The problem is that
`tablet_storage_group_manager::all_storage_groups_split()` runs under a group0 guard, but
`tablet_storage_group_manager::split_all_storage_groups()` does not. This can cause problems with operations that should be mutually exclusive with compaction group creation, e.g. DROP TABLE/DROP KEYSPACE.
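
The short-circuit pitfall is not specific to `std::ranges::all_of()`. The sketch below reproduces it with Python's built-in `all()`, which also stops at the first false result. `StorageGroup` here is a toy stand-in, not ScyllaDB's class, and the non-short-circuiting loop is only one possible way to visit every group; it is not necessarily how the actual patch is written.

```python
class StorageGroup:
    """Toy stand-in for a storage group; not ScyllaDB's real class."""

    def __init__(self, name, empty):
        self.name = name
        self.empty = empty
        self.split_mode = False

    def set_split_mode(self):
        # Side effect: create the split-ready compaction groups.
        self.split_mode = True
        # Result: True only if nothing is left to split in this group.
        return self.empty


def all_split_short_circuit(groups):
    # all() stops at the first False, so later groups never get the side effect.
    return all(g.set_split_mode() for g in groups)


def all_split_visit_every_group(groups):
    split_ready = True
    for g in groups:
        # Call first, then combine, so every group is visited.
        split_ready = g.set_split_mode() and split_ready
    return split_ready


def make_groups():
    return [StorageGroup("a", True), StorageGroup("b", False), StorageGroup("c", True)]


groups = make_groups()
print(all_split_short_circuit(groups), [g.split_mode for g in groups])

groups = make_groups()
print(all_split_visit_every_group(groups), [g.split_mode for g in groups])
```

Both variants report that the split is not complete, but only the second one leaves every group in split mode.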

Fixes #22431

This is a bugfix and should be back ported to versions with tablets: 6.1 6.2 and 2025.1

- (cherry picked from commit 24e8d2a55c)

- (cherry picked from commit 8bff7786a8)

Parent PR: #22330

Closes scylladb/scylladb#22560

* github.com:scylladb/scylladb:
  test: add reproducer and test for fix to split ready CG creation
  table: run set_split_mode() on all storage groups during all_storage_groups_split()
2025-02-13 09:36:23 +02:00
Raphael S. Carvalho
5f74b5fdff test: Use linux-aio backend again on seastar-based tests
Since mid December, tests started failing with ENOMEM while
submitting I/O requests.

Logs of failed tests show IO uring was used as backend, but we
never deliberately switched to IO uring. Investigation pointed
to it happening accidentally in commit 1bac6b75dc,
which turned on IO uring for allowing native tool in production,
and picked linux-aio backend explicitly when initializing Scylla.
But it missed that seastar-based tests would pick the default
backend, which is io_uring once enabled.

There's a reason we never made io_uring the default, which is
that it's not stable enough, and it turns out we made the right
choice back then: it apparently continues to be unstable,
causing flakiness in the tests.

Let's undo that accidental change in tests by explicitly
picking the linux-aio backend for seastar-based tests.
This should hopefully bring back stability.

Refs #21968.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#22695

(cherry picked from commit ce65164315)
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#22800
2025-02-12 20:50:51 +02:00
Michał Chojnowski
a746fd2bb8 test_rpc_compression.py: test the dictionaries are loaded on startup
Reproduces scylladb/scylladb#22738

(cherry picked from commit 8fb2ea61ba)
2025-02-11 15:52:34 +00:00
Michał Chojnowski
89a5889bed raft/group0_state_machine: load current RPC compression dict on startup
We are supposed to be loading the most recent RPC compression dictionary
on startup, but we forgot to port the relevant piece of logic during
the source-available port.

(cherry picked from commit dd82b40186)
2025-02-11 15:52:33 +00:00
Michael Litvak
8d1f6df818 test/test_view_build_status: fix flaky asserts
In a few test cases of test_view_build_status we create a view, wait for
it and then query the view_build_status table and expect it to have all
rows for each node and view.

But it may fail because it could happen that the wait_for_view query and
the following queries are done on different nodes, and some of the nodes
didn't apply all the table updates yet, so they have missing rows.

To fix it, we change the assert to work in the eventual consistency
sense, retrying until the number of rows is as expected.
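
An eventual-consistency assert of this kind can be sketched as a small polling helper. The name `wait_for`, its parameters, and the simulated row source below are assumptions made for illustration; the helper actually used by the test suite may differ.

```python
import time


def wait_for(predicate, timeout=60.0, period=0.1):
    """Poll predicate() until it returns something other than None.

    Raises TimeoutError if no result arrives within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = predicate()
        if result is not None:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met before the deadline")
        time.sleep(period)


# Simulated table whose rows only become visible on the third read,
# standing in for a node that has not applied all updates yet.
reads = {"count": 0}
EXPECTED_ROWS = 3


def all_rows_visible():
    reads["count"] += 1
    rows = ["row"] * (EXPECTED_ROWS if reads["count"] >= 3 else 1)
    return rows if len(rows) == EXPECTED_ROWS else None


rows = wait_for(all_rows_visible, timeout=5.0, period=0.0)
print(len(rows), reads["count"])
```

Returning `None` for "not yet" lets the same helper hand back the rows once the condition holds, so the test can assert on them without issuing another query.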

Fixes scylladb/scylladb#22644

Closes scylladb/scylladb#22654

(cherry picked from commit c098e9a327)

Closes scylladb/scylladb#22780
2025-02-11 10:21:54 +01:00
Avi Kivity
75320c9a13 Update tools/cqlsh submodule (driver update, upgradability)
* tools/cqlsh 52c6130...02ec7c5 (18):
  > chore(deps): update dependency scylla-driver to v3.28.2
  > dist: support smooth upgrade from enterprise to source availalbe
  > github action: fix downloading of artifacts
  > chore(deps): update docker/setup-buildx-action action to v3
  > chore(deps): update docker/login-action action to v3
  > chore(deps): update docker/build-push-action action to v6
  > chore(deps): update docker/setup-qemu-action action to v3
  > chore(deps): update peter-evans/dockerhub-description action to v4
  > upload actions: update the usage for multiple artifacts
  > chore(deps): update actions/download-artifact action to v4.1.8
  > chore(deps): update dependency scylla-driver to v3.28.0
  > chore(deps): update pypa/cibuildwheel action to v2.22.0
  > chore(deps): update actions/checkout action to v4
  > chore(deps): update python docker tag to v3.13
  > chore(deps): update actions/upload-artifact action to v4
  > github actions: update it to work
  > add option to output driver debug
  > Add renovate.json (#107)

Fixes: https://github.com/scylladb/scylladb/issues/22420
2025-02-09 18:07:55 +02:00
Yaron Kaikov
359af0ae9c dist: support smooth upgrade from enterprise to source availalbe
When upgrading for example from `2024.1` to `2025.1` the package name is
not identical, causing the upgrade command to fail:
```
Command: 'sudo DEBIAN_FRONTEND=noninteractive apt-get dist-upgrade scylla -y -o Dpkg::Options::="--force-confdef" -o Dpkg::Options::="--force-confold"'
Exit code: 100
Stdout:
Selecting previously unselected package scylla.
Preparing to unpack .../6-scylla_2025.1.0~dev-0.20250118.1ef2d9d07692-1_amd64.deb ...
Unpacking scylla (2025.1.0~dev-0.20250118.1ef2d9d07692-1) ...
Errors were encountered while processing:
/tmp/apt-dpkg-install-JbOMav/0-scylla-conf_2025.1.0~dev-0.20250118.1ef2d9d07692-1_amd64.deb
/tmp/apt-dpkg-install-JbOMav/1-scylla-python3_2025.1.0~dev-0.20250118.1ef2d9d07692-1_amd64.deb
/tmp/apt-dpkg-install-JbOMav/2-scylla-server_2025.1.0~dev-0.20250118.1ef2d9d07692-1_amd64.deb
/tmp/apt-dpkg-install-JbOMav/3-scylla-kernel-conf_2025.1.0~dev-0.20250118.1ef2d9d07692-1_amd64.deb
/tmp/apt-dpkg-install-JbOMav/4-scylla-node-exporter_2025.1.0~dev-0.20250118.1ef2d9d07692-1_amd64.deb
/tmp/apt-dpkg-install-JbOMav/5-scylla-cqlsh_2025.1.0~dev-0.20250118.1ef2d9d07692-1_amd64.deb
Stderr:
E: Sub-process /usr/bin/dpkg returned an error code (1)
```

Adding `Obsoletes` (for rpm) and `Replaces` (for deb)

Fixes: https://github.com/scylladb/scylladb/issues/22420

Closes scylladb/scylladb#22457

(cherry picked from commit 93f53f4eb8)

Closes scylladb/scylladb#22753
2025-02-09 18:06:52 +02:00
Avi Kivity
7f350558c2 Update tools/python3 (smooth upgrade from enterprise)
* tools/python3 8415caf...91c9531 (1):
  > dist: support smooth upgrade from enterprise to source availalbe

Ref #22420
2025-02-09 14:22:33 +02:00
Botond Dénes
fa9b1800b6 reader_concurrency_semaphore: set_notify_handler(): disable timeout
set_notify_handler() is called after a querier was inserted into the
querier cache. It has two purposes: set a callback for eviction and set
a TTL for the cache entry. This latter was not disabling the
pre-existing timeout of the permit (if any) and this would lead to
premature eviction of the cache entry if the timeout was shorter than
TTL (which is typical).
Disable the timeout before setting the TTL to prevent premature
eviction.

Fixes: scylladb/scylladb#22629
(cherry picked from commit 9174f27cc8)
2025-02-09 00:32:38 +00:00
Botond Dénes
c25d447b9c reader_permit: mark check_abort() as const
All it does is read one field, making it const makes using it easier.

(cherry picked from commit a3ae0c7cee)
2025-02-09 00:32:38 +00:00
Ferenc Szili
cf147d8f85 truncate: create session during request handling
Currently, the session ID under which the truncate for tablets request is
running is created during the request creation and queuing. This is a problem
because this could overwrite the session ID of any ongoing operation on
system.topology#session

This change moves the creation of the session ID for truncate from the request
creation to the request handling.

Fixes #22613

Closes scylladb/scylladb#22615

(cherry picked from commit a59618e83d)

Closes scylladb/scylladb#22705
2025-02-06 10:09:00 +02:00
Botond Dénes
319626e941 reader_concurrency_semaphore: with_permit(): proper clean-up after queue overload
with_permit() creates a permit, with a self-reference, to avoid
attaching a continuation to the permit's run function. This
self-reference is used to keep the permit alive, until the execution
loop processes it. This self reference has to be carefully cleared on
error-paths, otherwise the permit will become a zombie, effectively
leaking memory.
Instead of trying to handle all loose ends, get rid of this
self-reference altogether: ask caller to provide a place to save the
permit, where it will survive until the end of the call. This makes the
call-site a little bit less nice, but it gets rid of a whole class of
possible bugs.

Fixes: #22588

Closes scylladb/scylladb#22624

(cherry picked from commit f2d5819645)

Closes scylladb/scylladb#22704
2025-02-06 10:08:19 +02:00
Aleksandra Martyniuk
cca2d974b6 service: use read barrier in tablet_virtual_task::contains
Currently, when the tablet repair is started, info regarding
the operation is kept in the system.tablets. The new tablet states
are reflected in memory after load_topology_state is called.
Before that, the data in the table and the memory aren't consistent.

To check the supported operations, tablet_virtual_task uses in-memory
tablet_metadata. Hence, it may not see the operation, even though
its info is already kept in system.tablets table.

Run read barrier in tablet_virtual_task::contains to ensure it will
see the latest data. Add a test to check it.

Fixes: #21975.

Closes scylladb/scylladb#21995

(cherry picked from commit 610a761ca2)

Closes scylladb/scylladb#22694
2025-02-06 10:07:51 +02:00
Aleksandra Martyniuk
43f2e5f86b nodetool: tasks: print empty string for start_time/end_time if unspecified
If start_time/end_time is unspecified for a task, task_manager API
returns epoch. Nodetool prints the value in task status.

Fix nodetool tasks commands to print empty string for start_time/end_time
if it isn't specified.
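
The formatting rule can be sketched as follows. This assumes the API reports an unspecified time as a Unix timestamp of 0 and uses an ISO-like output format; both the function name and the format are illustrative assumptions, not nodetool's actual code.

```python
from datetime import datetime, timezone


def format_task_time(epoch_seconds):
    """Return an empty string for an unspecified (epoch) time.

    Assumption: an unspecified time arrives as 0 or None.
    """
    if not epoch_seconds:
        return ""
    moment = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return moment.strftime("%Y-%m-%dT%H:%M:%SZ")


print(repr(format_task_time(0)))
print(repr(format_task_time(1738828800)))
```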

Modify nodetool tasks status docs to show empty end_time.

Fixes: #22373.

Closes scylladb/scylladb#22370

(cherry picked from commit 477ad98b72)

Closes scylladb/scylladb#22601
2025-02-06 10:05:07 +02:00
Takuya ASADA
ad81d49923 dist: Support FIPS mode
- To make Scylla able to run on a FIPS-compliant system, add .hmac files for
  crypto libraries to the relocatable/rpm/deb packages.
- Currently we just write the hmac value to *.hmac files, but there is a new
  .hmac file format, something like this:
  ```
  [global]
  format-version = 1
  [lib.xxx.so.yy]
  path = /lib64/libxxx.so.yy
  hmac = <hmac>
  ```
  It seems that GnuTLS rejects the FIPS selftest on .libgnutls.so.30.hmac when
  the file format is the older one.
  Since we need an absolute path in the "path" directive, we need to generate
  .libgnutls.so.30.hmac in older format on create-relocatable-script.py,

Fixes scylladb/scylladb#22573

Signed-off-by: Takuya ASADA <syuu@scylladb.com>

Closes scylladb/scylladb#22384

(cherry picked from commit fb4c7dc3d8)

Closes scylladb/scylladb#22587
2025-02-06 10:01:12 +02:00
Wojciech Mitros
138c68d80e mv: forbid views with tablets by default
Materialized views with tablets are not stable yet, but we want
them available as an experimental feature, mainly for testing.

The feature was added in scylladb/scylladb#21833,
but currently it has no effect. All tests have been updated to use the
feature, so we should finally make it work.
This patch prevents users from creating materialized views in keyspaces
using tablets when the VIEWS_WITH_TABLETS feature is not enabled - such
requests will now get rejected.

Fixes scylladb/scylladb#21832

Closes scylladb/scylladb#22217

(cherry picked from commit 677f9962cf)

Closes scylladb/scylladb#22659
2025-02-04 08:06:23 +01:00
Avi Kivity
e0fb727f18 Update seastar submodule (hwloc failure on some AWS instances)
* seastar 1822136684...a350b5d70e (1):
  > resource: fallback to sysconf when failed to detect memory size from hwloc

Fixes #22382.
2025-02-03 22:47:39 +02:00
Jenkins Promoter
440833ae59 Update ScyllaDB version to: 2025.1.0-rc2 2025-02-03 13:23:18 +02:00
Michael Litvak
246635c426 test/test_view_build_status: fix wrong assert in test
The test expects and asserts that after wait_for_view is completed we
read the view_build_status table and get a row for each node and view.
But this is wrong because wait_for_view may have read the table on one
node, and then we query the table on a different node that didn't insert
all the rows yet, so the assert could fail.

To fix it we change the test to retry and check that eventually all
expected rows are found and then eventually removed on the same host.

Fixes scylladb/scylladb#22547

Closes scylladb/scylladb#22585

(cherry picked from commit 44c06ddfbb)

Closes scylladb/scylladb#22608
2025-02-03 09:24:17 +01:00
Michael Litvak
58eda6670f view_builder: fix loop in view builder when tokens are moved
The view builder builds a view by going over the entire token ring,
consuming the base table partitions, and generating view updates for
each partition.

A view is considered as built when we complete a full cycle of the
token ring. Suppose we start to build a view at a token F. We will
consume all partitions with tokens starting at F until the maximum
token, then go back to the minimum token and consume all partitions
until F, and then we detect that we pass F and complete building the
view. This happens in the view builder consumer in
`check_for_built_views`.

The problem is that we check if we pass the first token F with the
condition `_step.current_token() >= it->first_token` whenever we consume
a new partition or the current_token goes back to the minimum token.
But suppose that we don't have any partitions with a token greater than
or equal to the first token (this could happen if the partition with
token F was moved to another node for example), then this condition will never be
satisfied, and we don't detect correctly when we pass F. Instead, we
go back to the minimum token, building the same token ranges again,
in a possibly infinite loop.

To fix this we add another step when reaching the end of the reader's
stream. When this happens it means we don't have any more fragments to
consume until the end of the range, so we advance the current_token to
the end of the range, simulating a partition, and check for built views
in that range.
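
The loop and the fix can be modelled with a heavily simplified simulation. Everything below is an assumption made for illustration: tokens are small integers, `MIN_TOKEN`/`MAX_TOKEN` bound the ring, and `build_view` only mimics the "have we passed F after wrapping" check; it is not the view builder's real code.

```python
MIN_TOKEN = 0
MAX_TOKEN = 100


def build_view(partition_tokens, first_token, end_of_stream_step, max_passes=3):
    """Return the number of wrap-arounds needed to detect completion.

    Returns None if completion is never detected within max_passes,
    which stands in for the potentially infinite loop.
    """
    tokens = sorted(partition_tokens)
    wrapped = False
    current = first_token
    passes = 0
    while passes < max_passes:
        # Consume the partitions from the current token to the end of the ring.
        for t in tokens:
            if t < current:
                continue
            current = t
            if wrapped and current >= first_token:
                return passes
        if end_of_stream_step:
            # The fix: at end of stream, advance to the end of the range
            # and run the same completion check.
            current = MAX_TOKEN
            if wrapped and current >= first_token:
                return passes
        # Go back to the minimum token and start another pass.
        current = MIN_TOKEN
        wrapped = True
        passes += 1
        if current >= first_token:
            return passes
    return None


# The partition at token 50 was moved away, so no token is >= 50.
print(build_view([10, 20, 30], first_token=50, end_of_stream_step=False))
print(build_view([10, 20, 30], first_token=50, end_of_stream_step=True))
```

In this model the first call never detects completion, while the second detects it at the end of the first wrapped pass.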

Fixes scylladb/scylladb#21829

Closes scylladb/scylladb#22493

(cherry picked from commit 6d34125eb7)

Closes scylladb/scylladb#22607
2025-02-02 22:29:52 +02:00
Jenkins Promoter
28b8896680 Update pgo profiles - aarch64 2025-02-01 04:30:11 +02:00
Jenkins Promoter
e9cae4be17 Update pgo profiles - x86_64 2025-02-01 04:05:22 +02:00
Avi Kivity
daf1c96ad3 seatar: point submodule at scylla-seastar.git
This allows backporting commits to seastar.
2025-01-31 19:47:30 +02:00
Botond Dénes
1a1893078a Merge '[Backport 2025.1] encrypted_file_impl: Check for reads on or past actual file length in transform' from Scylladb[bot]
Fixes #22236

If reading a file and not stopping on block bounds returned by `size()`, we could allow reading from (_file_size+<1-15>) (if crossing block boundary) and try to decrypt this buffer (last one).

Simplest example:
Actual data size: 4095
Physical file size: 4095 + key block size (typically 16)
Read from 4096: -> 15 bytes (padding) -> transform return `_file_size` - `read offset` -> wraparound -> rather larger number than we expected (not to mention the data in question is junk/zero).

Check on last block in `transform` would wrap around size due to us being >= file size (l).
Just do an early bounds check and return zero if we're past the actual data limit.
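
The wraparound in the example above can be reproduced with a short sketch that emulates unsigned 64-bit subtraction in Python. The function names are hypothetical, and the checked variant only illustrates the idea of an early bounds check; it is not the code in `transform`.

```python
MASK64 = (1 << 64) - 1  # emulate an unsigned 64-bit integer


def remaining_unchecked(file_size, offset):
    # Unsigned subtraction: wraps around when offset > file_size.
    return (file_size - offset) & MASK64


def remaining_checked(file_size, offset):
    # Early bounds check: nothing to return at or past the data limit.
    if offset >= file_size:
        return 0
    return file_size - offset


file_size = 4095    # actual data size from the example
read_offset = 4096  # a read that starts in the padding

print(remaining_unchecked(file_size, read_offset))
print(remaining_checked(file_size, read_offset))
```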

- (cherry picked from commit e96cc52668)

- (cherry picked from commit 2fb95e4e2f)

Parent PR: #22395

Closes scylladb/scylladb#22583

* github.com:scylladb/scylladb:
  encrypted_file_test: Test reads beyond decrypted file length
  encrypted_file_impl: Check for reads on or past actual file length in transform
2025-01-31 11:38:50 +02:00
Aleksandra Martyniuk
8cc5566a3c api: task_manager: do not unregister tasks on get_status
Currently, /task_manager/task_status_recursive/{task_id} and
/task_manager/task_status/{task_id} unregister the queried task if it
has already finished.

The status should not disappear after being queried. Do not unregister
finished task when its status or recursive status is queried.

(cherry picked from commit 18cc79176a)
2025-01-31 08:21:03 +00:00
Aleksandra Martyniuk
1f52ced2ff api: task_manager: add /task_manager/drain
In the following patches, get_status won't be unregistering finished
tasks. However, tests need a functionality to drop a task, so that
they can manipulate only the tasks for operations that were
invoked by these tests.

Add /task_manager/drain/{module} to unregister all finished tasks
from the module. Add respective nodetool command.

(cherry picked from commit e37d1bcb98)
2025-01-31 08:21:03 +00:00
Avi Kivity
d7e3ab2226 Merge '[Backport 2025.1] truncate: trigger truncate logic from a transition state instead of global topology request' from Ferenc Szili
This is a manual backport of #22452

Truncate table for tablets is implemented as a global topology operation. However, it does not have a transition state associated with it, and performs the truncate logic in topology_coordinator::handle_global_request() while topology::tstate remains empty. This creates problems because topology::is_busy() uses transition_state to determine if the topology state machine is busy, and will return false even though a truncate operation is ongoing.

This change introduces a new topology transition topology::transition_state::truncate_table and moves the truncate logic to a new method topology_coordinator::handle_truncate_table(). This method is now called as a handler of the truncate_table transition state instead of a handler of the truncate_table global topology request.

Fixes #22552

Closes scylladb/scylladb#22557

* github.com:scylladb/scylladb:
  truncate: trigger truncate logic from transition state instead of global request handler
  truncate: add truncate_table transition state
2025-01-30 22:49:17 +02:00
Anna Stuchlik
cf589222a0 doc: update the Web Installer docs to remove OSS
Fixes https://github.com/scylladb/scylladb/issues/22292

Closes scylladb/scylladb#22433

(cherry picked from commit 2a6445343c)

Closes scylladb/scylladb#22581
2025-01-30 13:04:16 +02:00
Anna Stuchlik
156800a3dd doc: add SStable support in 2025.1
This commit adds the information about SStable version support in 2025.1
by replacing "2022.2" with "2022.2 and above".

In addition, this commit removes information about versions that are
no longer supported.

Fixes https://github.com/scylladb/scylladb/issues/22485

Closes scylladb/scylladb#22486

(cherry picked from commit caf598b118)

Closes scylladb/scylladb#22580
2025-01-30 13:03:47 +02:00
Nikos Dragazis
d1e8b02260 encrypted_file_test: Test reads beyond decrypted file length
Add a test to reproduce a bug in the read DMA API of
`encrypted_file_impl` (the file implementation for Encryption-at-Rest).

The test creates an encrypted file that contains padding, and then
attempts to read from an offset within the padding area. Although this
offset is invalid on the decrypted file, the `encrypted_file_impl` makes
no checks and proceeds with the decryption of padding data, which
eventually leads to bogus results.

Refs #22236.

Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>
(cherry picked from commit 8f936b2cbc)
(cherry picked from commit 2fb95e4e2f)
2025-01-30 09:17:31 +00:00
Calle Wilund
a51888694e encrypted_file_impl: Check for reads on or past actual file length in transform
Fixes #22236

If reading a file and not stopping on block bounds returned by `size()`, we could
allow reading from (_file_size+1-15) (block boundary) and try to decrypt this
buffer (last one).
Check on last block in `transform` would wrap around size due to us being >=
file size (l).

Simplest example:
Actual data size: 4095
Physical file size: 4095 + key block size (typically 16)
Read from 4096: -> 15 bytes (padding) -> transform return _file_size - read offset
-> wraparound -> rather larger number than we expected
(not to mention the data in question is junk/zero).

Just do an early bounds check and return zero if we're past the actual data limit.
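
The wraparound in the example above can be reproduced with plain unsigned arithmetic. This is a minimal sketch, not the actual `encrypted_file_impl` code: the function names and parameters are hypothetical, and the "checked" variant only illustrates the idea of clamping with a min expression.

```cpp
#include <algorithm>
#include <cstdint>

// Unchecked: when offset > data_size the unsigned subtraction wraps
// around to a huge value, so the result is limited only by buf_len.
inline std::uint64_t readable_unchecked(std::uint64_t data_size,
                                        std::uint64_t offset,
                                        std::uint64_t buf_len) {
    return std::min(buf_len, data_size - offset);
}

// Checked: clamp the offset to the data size first, so a read that
// starts at or past the end of the real data yields zero bytes.
inline std::uint64_t readable_checked(std::uint64_t data_size,
                                      std::uint64_t offset,
                                      std::uint64_t buf_len) {
    std::uint64_t remaining = data_size - std::min(offset, data_size);
    return std::min(buf_len, remaining);
}
```

With the numbers from the example (data size 4095, read offset 4096, 15 bytes of padding in the buffer), the unchecked variant reports 15 readable bytes and the checked variant reports 0.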

v2:
* Moved check to a min expression instead
* Added lengthy comment
* Added unit test

v3:
* Fixed read_dma_bulk handling of short, unaligned read
* Added test for unaligned read

v4:
* Added another unaligned test case

(cherry picked from commit e96cc52668)
2025-01-30 09:17:31 +00:00
Botond Dénes
68f134ee23 Merge '[Backport 2025.1] Do not update topology on address change' from Scylladb[bot]
Since the topology no longer contains IP addresses, there is no need to
create topology on an IP address change. Only the peers table has to be
updated. The series factors out the peers table update code from
sync_raft_topology_nodes() and calls it on topology and IP address
updates. As a side effect it fixes #22293: topology loading no longer
requires the IP to be present, so the assert that is triggered in
this bug is removed.

Fixes: scylladb/scylladb#22293

- (cherry picked from commit ef929c5def)

- (cherry picked from commit fbfef6b28a)

Parent PR: #22519

Closes scylladb/scylladb#22543

* github.com:scylladb/scylladb:
  topology coordinator: do not update topology on address change
  topology coordinator: split out the peer table update functionality from raft state application
2025-01-30 11:14:19 +02:00
Jenkins Promoter
b623c237bc Update ScyllaDB version to: 2025.1.0-rc1 2025-01-30 01:25:18 +02:00
Calle Wilund
8379d545c5 docs: Remove configuration_encryptor
Fixes #21993

Removes the configuration_encryptor mention from the docs.
The tool itself (Java) is not included in the main branch
Java tools, so nothing needs to be removed there; only the text changes.

Closes scylladb/scylladb#22427

(cherry picked from commit bae5b44b97)

Closes scylladb/scylladb#22556
2025-01-29 20:17:36 +02:00
Michael Litvak
58d13d0daf cdc: fix handling of new generation during raft upgrade
During raft upgrade, a node may gossip about a new CDC generation that
was propagated through raft. The node that receives the generation by
gossip may not have applied the raft update yet, and it will not find
the generation in the system tables. We should consider this error
non-fatal and retry the read until it succeeds or becomes obsolete.

Another issue is that when we fail with a "fatal" exception and do not
retry the read, the CDC metadata is left in an inconsistent state that causes
further attempts to insert this CDC generation to fail.

What happens is we complete preparing the new generation by calling `prepare`,
we insert an empty entry for the generation's timestamp, and then we fail. The
next time we try to insert the generation, we skip inserting it because we see
that it already has an entry in the metadata and we determine that
there's nothing to do. But this is wrong, because the entry is empty,
and we should continue to insert the generation.

To fix it, we change `prepare` to return `true` when the entry already
exists but it's empty, indicating we should continue to insert the
generation.
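
The retry behaviour described above can be sketched as follows. This is a simplified model, not the real CDC metadata class: `generation_metadata`, its members, and the use of a string for the generation payload are all hypothetical.

```cpp
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <utility>

// Hypothetical stand-in for the CDC metadata: maps a generation
// timestamp to its data, where an empty optional is a reserved
// but not yet filled entry.
class generation_metadata {
    std::map<std::int64_t, std::optional<std::string>> _gens;
public:
    // Reserve an entry for `ts`. Returns true if the caller should go
    // on to insert the generation: either the entry is new, or it
    // exists but is still empty (an earlier attempt failed midway).
    bool prepare(std::int64_t ts) {
        auto [it, inserted] = _gens.try_emplace(ts, std::nullopt);
        return inserted || !it->second.has_value();
    }

    // Fill in the generation data for a previously prepared entry.
    void insert(std::int64_t ts, std::string data) {
        _gens[ts] = std::move(data);
    }

    bool has_data(std::int64_t ts) const {
        auto it = _gens.find(ts);
        return it != _gens.end() && it->second.has_value();
    }
};
```

The point of the sketch is the second `prepare` call after a failure: it still returns true because the entry is empty, so the insertion is retried instead of being skipped.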

Fixes scylladb/scylladb#21227

Closes scylladb/scylladb#22093

(cherry picked from commit 4f5550d7f2)

Closes scylladb/scylladb#22546
2025-01-29 20:06:18 +02:00
Anna Stuchlik
4def507b1b doc: add OS support for 2025.1 and reorganize the page
This commit adds the OS support information for version 2025.1.
In addition, the OS support page is reorganized so that:
- The content is moved from the include page _common/os-support-info.rst
  to the regular os-support.rst page. The include page was necessary
  to document different support for OSS and Enterprise versions, so
  we don't need it anymore.
- I skipped the entries for versions that won't be supported when 2025.1
  is released: 6.1 and 2023.1.
- I moved the definition of "supported" to the end of the page for better
  readability.
- I've renamed the index entry to "OS Support" to be shorter on the left menu.

Fixes https://github.com/scylladb/scylladb/issues/22474

Closes scylladb/scylladb#22476

(cherry picked from commit 61c822715c)

Closes scylladb/scylladb#22538
2025-01-29 19:48:32 +02:00
Anna Stuchlik
69ad9350cc doc: remove Enterprise labels and directives
This PR removes the now redundant Enterprise labels and directives
from the ScyllaDB documentation.

Fixes https://github.com/scylladb/scylladb/issues/22432

Closes scylladb/scylladb#22434

(cherry picked from commit b2a718547f)

Closes scylladb/scylladb#22539
2025-01-29 19:48:11 +02:00
Anna Stuchlik
29e5f5f54d doc: enable the FIPS note in the ScyllaDB docs
This commit moves the information about FIPS out of the '.. only:: enterprise' directive.
As a result, the information will now show in the doc in the ScyllaDB repo
(previously, the directive included the note in the Enterprise docs only).

Refs https://github.com/scylladb/scylla-enterprise/issues/5020

Closes scylladb/scylladb#22374

(cherry picked from commit 1d5ef3dddb)

Closes scylladb/scylladb#22550
2025-01-29 19:47:37 +02:00
Avi Kivity
379b3fa46c Merge '[Backport 2025.1] repair: handle no_such_keyspace in repair preparation phase' from null
Currently, data sync repair handles most no_such_keyspace exceptions,
but it omits the preparation phase, where the exception could be thrown
during make_global_effective_replication_map.

Skip the keyspace repair if no_such_keyspace is thrown during preparations.
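
A minimal sketch of the skip-on-exception pattern described above, under stated assumptions: `no_such_keyspace` is modelled as a plain exception type, and `prepare_repair` and `repair_keyspaces` are hypothetical helpers, not the real repair code.

```cpp
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Simplified model of the exception thrown for a dropped keyspace.
struct no_such_keyspace : std::runtime_error {
    explicit no_such_keyspace(const std::string& ks)
        : std::runtime_error("no such keyspace: " + ks) {}
};

// Hypothetical preparation step: throws if the keyspace was dropped
// before the repair got to it.
inline void prepare_repair(const std::set<std::string>& existing,
                           const std::string& ks) {
    if (existing.count(ks) == 0) {
        throw no_such_keyspace(ks);
    }
}

// Repairs every requested keyspace that still exists and skips the
// ones that disappeared, instead of failing the whole operation.
inline std::vector<std::string> repair_keyspaces(
        const std::set<std::string>& existing,
        const std::vector<std::string>& requested) {
    std::vector<std::string> repaired;
    for (const auto& ks : requested) {
        try {
            prepare_repair(existing, ks);
        } catch (const no_such_keyspace&) {
            continue;  // keyspace was dropped: skip it
        }
        repaired.push_back(ks);  // the actual repair would run here
    }
    return repaired;
}
```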

Fixes: #22073.

Requires backport to 6.1 and 6.2 as they contain the bug

- (cherry picked from commit bfb1704afa)

- (cherry picked from commit 54e7f2819c)

Parent PR: #22473

Closes scylladb/scylladb#22542

* github.com:scylladb/scylladb:
  test: add test to check if repair handles no_such_keyspace
  repair: handle keyspace dropped
2025-01-29 14:09:23 +02:00
Ferenc Szili
fe869fd902 test: add reproducer and test for fix to split ready CG creation
This adds a reproducer for #22431

In cases where a tablet storage group manager had more than one storage
group, it was possible to create compaction groups outside the group0
guard, which could create problems with operations that must be mutually
exclusive with compaction group creation.

(cherry picked from commit 8bff7786a8)
2025-01-29 10:10:28 +00:00
Ferenc Szili
dc55a566fa table: run set_split_mode() on all storage groups during all_storage_groups_split()
tablet_storage_group_manager::all_storage_groups_split() calls set_split_mode()
for each of its storage groups to create split ready compaction groups. It does
this by iterating through storage groups using std::ranges::all_of() which is
not guaranteed to iterate through the entire range, and will stop iterating on
the first occurrence of the predicate (set_split_mode()) returning false.
set_split_mode() creates the split compaction groups and returns false if the
storage group's main compaction group or merging groups are not empty. This
means that in cases where the tablet storage group manager has non-empty
storage groups, we could have a situation where split compaction groups are not
created for all storage groups.

The missing split compaction groups are later created in
tablet_storage_group_manager::split_all_storage_groups() which also calls
set_split_mode(), and that is the reason why split completes successfully. The
problem is that tablet_storage_group_manager::all_storage_groups_split() runs
under a group0 guard, and tablet_storage_group_manager::split_all_storage_groups()
does not. This can cause problems with operations that must be mutually
exclusive with compaction group creation, e.g. DROP TABLE or DROP KEYSPACE.
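
The short-circuit problem can be shown with a small sketch. The `storage_group` struct below is a hypothetical stand-in, not ScyllaDB's type, and the sketch uses `std::all_of` rather than `std::ranges::all_of` so that it compiles as C++17; both are allowed to stop at the first element for which the predicate returns false, and common implementations do.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-in for a storage group.
struct storage_group {
    bool empty;                // are the main/merging groups empty?
    bool split_ready = false;  // set once split compaction groups exist

    // Side effect: creates the split-ready groups. Returns false if
    // the group still has data that needs to be split.
    bool set_split_mode() {
        split_ready = true;
        return empty;
    }
};

// Problematic shape: all_of may stop at the first false result, so
// later groups never get their split-ready groups created here.
inline bool all_split_short_circuit(std::vector<storage_group>& groups) {
    return std::all_of(groups.begin(), groups.end(),
                       [] (storage_group& g) { return g.set_split_mode(); });
}

// Safer shape: visit every group, then combine the results.
inline bool all_split_visit_all(std::vector<storage_group>& groups) {
    bool all = true;
    for (auto& g : groups) {
        bool ok = g.set_split_mode();  // always runs the side effect
        all = all && ok;
    }
    return all;
}
```

Both functions return the same boolean; they differ only in how many groups have had `set_split_mode()` applied when the first non-empty group is encountered.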

(cherry picked from commit 24e8d2a55c)
2025-01-29 10:10:28 +00:00
Ferenc Szili
3bb8039359 truncate: trigger truncate logic from transition state instead of global
request handler

Before this change, the logic of truncate for tablets was triggered from
topology_coordinator::handle_global_request(). This was done without
using a topology transition state which remained empty throughout the
truncate handler's execution.

This change moves the truncate logic to a new method
topology_coordinator::handle_truncate_table(). This method is now called
as a handler of the truncate_table topology transition state instead of
a handler of the truncate_table global topology request.
2025-01-29 10:48:34 +01:00
Ferenc Szili
9f3838e614 truncate: add truncate_table transition state
Truncate table for tablets is implemented as a global topology operation.
However, it does not have a transition state associated with it, and
performs the truncate logic in handle_global_request() while
topology::tstate remains empty. This creates problems because
topology::is_busy() uses transition_state to determine if the topology
state machine is busy, and will return false even though a truncate
operation is ongoing.

This change adds a new transition state: truncate_table
2025-01-29 10:47:15 +01:00
Gleb Natapov
366212f997 topology coordinator: do not update topology on address change
Since the topology no longer contains IP addresses, there is no need to
create topology on an IP address change. Only the peers table has to be
updated, so call a function that updates only the peers table.

(cherry picked from commit fbfef6b28a)
2025-01-28 21:51:11 +00:00
Gleb Natapov
c0637aff81 topology coordinator: split out the peer table update functionality from raft state application
Raft topology state application does two things: it re-creates token metadata
and updates the peers table if needed. The code for both tasks is currently
intermixed. This patch splits it into separate functions, which will be needed
in the next patch.

(cherry picked from commit ef929c5def)
2025-01-28 21:51:11 +00:00
Aleksandra Martyniuk
dcf436eb84 test: add test to check if repair handles no_such_keyspace
(cherry picked from commit 54e7f2819c)
2025-01-28 21:50:35 +00:00
Aleksandra Martyniuk
8e754e9d41 repair: handle keyspace dropped
Currently, data sync repair handles most no_such_keyspace exceptions,
but it omits the preparation phase, where the exception could be thrown
during make_global_effective_replication_map.

Skip the keyspace repair if no_such_keyspace is thrown during preparations.

(cherry picked from commit bfb1704afa)
2025-01-28 21:50:35 +00:00
Yaron Kaikov
f407799f25 Update ScyllaDB version to: 2025.1.0-rc0 2025-01-27 11:29:45 +02:00
198 changed files with 5338 additions and 3461 deletions

.gitmodules vendored

@@ -1,6 +1,6 @@
[submodule "seastar"]
path = seastar
url = ../seastar
url = ../scylla-seastar
ignore = dirty
[submodule "swagger-ui"]
path = swagger-ui


@@ -78,7 +78,7 @@ fi
# Default scylla product/version tags
PRODUCT=scylla
VERSION=2025.1.0-dev
VERSION=2025.1.0
if test -f version
then


@@ -88,6 +88,9 @@ public:
static api_error table_not_found(std::string msg) {
return api_error("TableNotFoundException", std::move(msg));
}
static api_error limit_exceeded(std::string msg) {
return api_error("LimitExceededException", std::move(msg));
}
static api_error internal(std::string msg) {
return api_error("InternalServerError", std::move(msg), http::reply::status_type::internal_server_error);
}


@@ -7,6 +7,7 @@
*/
#include <fmt/ranges.h>
#include <seastar/core/on_internal_error.hh>
#include "alternator/executor.hh"
#include "alternator/consumed_capacity.hh"
#include "auth/permission.hh"
@@ -55,6 +56,9 @@
#include "utils/error_injection.hh"
#include "db/schema_tables.hh"
#include "utils/rjson.hh"
#include "alternator/extract_from_attrs.hh"
#include "types/types.hh"
#include "db/system_keyspace.hh"
using namespace std::chrono_literals;
@@ -215,7 +219,7 @@ static void validate_table_name(const std::string& name) {
// instead of each component individually as DynamoDB does.
// The view_name() function assumes the table_name has already been validated
// but validates the legality of index_name and the combination of both.
static std::string view_name(const std::string& table_name, std::string_view index_name, const std::string& delim = ":") {
static std::string view_name(std::string_view table_name, std::string_view index_name, const std::string& delim = ":") {
if (index_name.length() < 3) {
throw api_error::validation("IndexName must be at least 3 characters long");
}
@@ -223,7 +227,7 @@ static std::string view_name(const std::string& table_name, std::string_view ind
throw api_error::validation(
fmt::format("IndexName '{}' must satisfy regular expression pattern: [a-zA-Z0-9_.-]+", index_name));
}
std::string ret = table_name + delim + std::string(index_name);
std::string ret = std::string(table_name) + delim + std::string(index_name);
if (ret.length() > max_table_name_length) {
throw api_error::validation(
fmt::format("The total length of TableName ('{}') and IndexName ('{}') cannot exceed {} characters",
@@ -232,7 +236,7 @@ static std::string view_name(const std::string& table_name, std::string_view ind
return ret;
}
static std::string lsi_name(const std::string& table_name, std::string_view index_name) {
static std::string lsi_name(std::string_view table_name, std::string_view index_name) {
return view_name(table_name, index_name, "!:");
}
@@ -469,7 +473,90 @@ static rjson::value generate_arn_for_index(const schema& schema, std::string_vie
schema.ks_name(), schema.cf_name(), index_name));
}
static rjson::value fill_table_description(schema_ptr schema, table_status tbl_status, service::storage_proxy const& proxy)
// The following function checks if a given view has finished building.
// We need this for describe_table() to know if a view is still backfilling,
// or active.
//
// Currently we don't have in view_ptr the knowledge whether a view finished
// building long ago - so checking this involves a somewhat inefficient, but
// still node-local, process:
// We need a table that can accurately tell that all nodes have finished
// building this view. system.built_views is not good enough because it only
// knows the view building status in the current node. In recent versions,
// after PR #19745, we have a local table system.view_build_status_v2 with
// global information, replacing the old system_distributed.view_build_status.
// In theory, there can be a period during upgrading an old cluster when this
// table is not yet available. However, since the IndexStatus is a new feature
// too, it is acceptable that it doesn't yet work in the middle of the update.
static future<bool> is_view_built(
view_ptr view,
service::storage_proxy& proxy,
service::client_state& client_state,
tracing::trace_state_ptr trace_state,
service_permit permit) {
auto schema = proxy.data_dictionary().find_table(
"system", db::system_keyspace::VIEW_BUILD_STATUS_V2).schema();
// The table system.view_build_status_v2 has "keyspace_name" and
// "view_name" as the partition key, and each clustering row has
// "host_id" as clustering key and a string "status". We need to
// read a single partition:
partition_key pk = partition_key::from_exploded(*schema,
{utf8_type->decompose(view->ks_name()),
utf8_type->decompose(view->cf_name())});
dht::partition_range_vector partition_ranges{
dht::partition_range(dht::decorate_key(*schema, pk))};
auto selection = cql3::selection::selection::wildcard(schema); // only for get_query_options()!
auto partition_slice = query::partition_slice(
{query::clustering_range::make_open_ended_both_sides()},
{}, // static columns
{schema->get_column_definition("status")->id}, // regular columns
selection->get_query_options());
auto command = ::make_lw_shared<query::read_command>(
schema->id(), schema->version(), partition_slice,
proxy.get_max_result_size(partition_slice),
query::tombstone_limit(proxy.get_tombstone_limit()));
service::storage_proxy::coordinator_query_result qr =
co_await proxy.query(
schema, std::move(command), std::move(partition_ranges),
db::consistency_level::LOCAL_ONE,
service::storage_proxy::coordinator_query_options(
executor::default_timeout(), std::move(permit), client_state, trace_state));
query::result_set rs = query::result_set::from_raw_result(
schema, partition_slice, *qr.query_result);
std::unordered_map<locator::host_id, sstring> statuses;
for (auto&& r : rs.rows()) {
auto host_id = r.get<utils::UUID>("host_id");
auto status = r.get<sstring>("status");
if (host_id && status) {
statuses.emplace(locator::host_id(*host_id), *status);
}
}
// A view is considered "built" if all nodes reported SUCCESS in having
// built this view. Note that we need this "SUCCESS" for all nodes in the
// cluster - even those that are temporarily down (their success is known
// by this node, even if they are down). Conversely, we don't care what is
// the recorded status for any node which is no longer in the cluster - it
// is possible we forgot to erase the status of nodes that left the
// cluster, but here we just ignore them and look at the nodes actually
// in the topology.
bool all_built = true;
auto token_metadata = proxy.get_token_metadata_ptr();
token_metadata->get_topology().for_each_node(
[&] (const locator::node& node) {
// Note: we could skip nodes in DCs which have no replication of
// this view. However, in practice even those nodes would run
// the view building (and just see empty content) so we don't
// need to bother with this skipping.
auto it = statuses.find(node.host_id());
if (it == statuses.end() || it->second != "SUCCESS") {
all_built = false;
}
});
co_return all_built;
}
static future<rjson::value> fill_table_description(schema_ptr schema, table_status tbl_status, service::storage_proxy& proxy, service::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit)
{
rjson::value table_description = rjson::empty_object();
auto tags_ptr = db::get_tags_of_table(schema);
@@ -548,7 +635,22 @@ static rjson::value fill_table_description(schema_ptr schema, table_status tbl_s
// FIXME: we have to get ProjectionType from the schema when it is added
rjson::add(view_entry, "Projection", std::move(projection));
// Local secondary indexes are marked by an extra '!' sign occurring before the ':' delimiter
rjson::value& index_array = (delim_it > 1 && cf_name[delim_it-1] == '!') ? lsi_array : gsi_array;
bool is_lsi = (delim_it > 1 && cf_name[delim_it-1] == '!');
// Add IndexStatus and Backfilling flags, but only for GSIs -
// LSIs can only be created with the table itself and do not
// have a status. Alternator schema operations are synchronous
// so only two combinations of these flags are possible: ACTIVE
// (for a built view) or CREATING+Backfilling (if view building
// is in progress).
if (!is_lsi) {
if (co_await is_view_built(vptr, proxy, client_state, trace_state, permit)) {
rjson::add(view_entry, "IndexStatus", "ACTIVE");
} else {
rjson::add(view_entry, "IndexStatus", "CREATING");
rjson::add(view_entry, "Backfilling", rjson::value(true));
}
}
rjson::value& index_array = is_lsi ? lsi_array : gsi_array;
rjson::push_back(index_array, std::move(view_entry));
}
if (!lsi_array.Empty()) {
@@ -572,7 +674,7 @@ static rjson::value fill_table_description(schema_ptr schema, table_status tbl_s
executor::supplement_table_stream_info(table_description, *schema, proxy);
// FIXME: still missing some response fields (issue #5026)
return table_description;
co_return table_description;
}
bool is_alternator_keyspace(const sstring& ks_name) {
@@ -591,11 +693,11 @@ future<executor::request_return_type> executor::describe_table(client_state& cli
tracing::add_table_name(trace_state, schema->ks_name(), schema->cf_name());
rjson::value table_description = fill_table_description(schema, table_status::active, _proxy);
rjson::value table_description = co_await fill_table_description(schema, table_status::active, _proxy, client_state, trace_state, permit);
rjson::value response = rjson::empty_object();
rjson::add(response, "Table", std::move(table_description));
elogger.trace("returning {}", response);
return make_ready_future<executor::request_return_type>(make_jsonable(std::move(response)));
co_return make_jsonable(std::move(response));
}
// Check CQL's Role-Based Access Control (RBAC) permission_to_check (MODIFY,
@@ -656,7 +758,7 @@ future<executor::request_return_type> executor::delete_table(client_state& clien
auto& p = _proxy.container();
schema_ptr schema = get_table(_proxy, request);
rjson::value table_description = fill_table_description(schema, table_status::deleting, _proxy);
rjson::value table_description = co_await fill_table_description(schema, table_status::deleting, _proxy, client_state, trace_state, permit);
co_await verify_permission(_enforce_authorization, client_state, schema, auth::permission::DROP);
co_await _mm.container().invoke_on(0, [&, cs = client_state.move_to_other_shard()] (service::migration_manager& mm) -> future<> {
// FIXME: the following needs to be in a loop. If mm.announce() below
@@ -704,7 +806,7 @@ future<executor::request_return_type> executor::delete_table(client_state& clien
co_return make_jsonable(std::move(response));
}
static data_type parse_key_type(const std::string& type) {
static data_type parse_key_type(std::string_view type) {
// Note that keys are only allowed to be string, blob or number (S/B/N).
// The other types: boolean and various lists or sets - are not allowed.
if (type.length() == 1) {
@@ -719,7 +821,7 @@ static data_type parse_key_type(const std::string& type) {
}
static void add_column(schema_builder& builder, const std::string& name, const rjson::value& attribute_definitions, column_kind kind) {
static void add_column(schema_builder& builder, const std::string& name, const rjson::value& attribute_definitions, column_kind kind, bool computed_column=false) {
// FIXME: Currently, the column name ATTRS_COLUMN_NAME is not allowed
// because we use it for our untyped attribute map, and we can't have a
// second column with the same name. We should fix this, by renaming
@@ -731,7 +833,16 @@ static void add_column(schema_builder& builder, const std::string& name, const r
const rjson::value& attribute_info = *it;
if (attribute_info["AttributeName"].GetString() == name) {
auto type = attribute_info["AttributeType"].GetString();
builder.with_column(to_bytes(name), parse_key_type(type), kind);
data_type dt = parse_key_type(type);
if (computed_column) {
// Computed column for GSI (doesn't choose a real column as-is
// but rather extracts a single value from the ":attrs" map)
alternator_type at = type_info_from_string(type).atype;
builder.with_computed_column(to_bytes(name), dt, kind,
std::make_unique<extract_from_attrs_column_computation>(to_bytes(name), at));
} else {
builder.with_column(to_bytes(name), dt, kind);
}
return;
}
}
@@ -1072,6 +1183,87 @@ static std::unordered_set<std::string> validate_attribute_definitions(const rjso
return seen_attribute_names;
}
// The following "extract_from_attrs_column_computation" implementation is
// what allows Alternator GSIs to use in a materialized view's key a member
// from the ":attrs" map instead of a real column in the schema:
const bytes extract_from_attrs_column_computation::MAP_NAME = executor::ATTRS_COLUMN_NAME;
column_computation_ptr extract_from_attrs_column_computation::clone() const {
return std::make_unique<extract_from_attrs_column_computation>(*this);
}
// Serialize the *definition* of this column computation into a JSON
// string with a unique "type" string - TYPE_NAME - which then causes
// column_computation::deserialize() to create an object from this class.
bytes extract_from_attrs_column_computation::serialize() const {
rjson::value ret = rjson::empty_object();
rjson::add(ret, "type", TYPE_NAME);
rjson::add(ret, "attr_name", rjson::from_string(to_string_view(_attr_name)));
rjson::add(ret, "desired_type", represent_type(_desired_type).ident);
return to_bytes(rjson::print(ret));
}
// Construct an extract_from_attrs_column_computation object based on the
// saved output of serialize(). Calls on_internal_error() if the string
// doesn't match the expected output format of serialize(). "type" is not
// checked - we assume the caller (column_computation::deserialize()) won't
// call this constructor if "type" doesn't match.
extract_from_attrs_column_computation::extract_from_attrs_column_computation(const rjson::value &v) {
const rjson::value* attr_name = rjson::find(v, "attr_name");
if (attr_name->IsString()) {
_attr_name = bytes(to_bytes_view(rjson::to_string_view(*attr_name)));
const rjson::value* desired_type = rjson::find(v, "desired_type");
if (desired_type->IsString()) {
_desired_type = type_info_from_string(rjson::to_string_view(*desired_type)).atype;
switch (_desired_type) {
case alternator_type::S:
case alternator_type::B:
case alternator_type::N:
// We're done
return;
default:
// Fall through to on_internal_error below.
break;
}
}
}
on_internal_error(elogger, format("Improperly formatted alternator::extract_from_attrs_column_computation computed column definition: {}", v));
}
regular_column_transformation::result extract_from_attrs_column_computation::compute_value(
const schema& schema,
const partition_key& key,
const db::view::clustering_or_static_row& row) const
{
const column_definition* attrs_col = schema.get_column_definition(MAP_NAME);
if (!attrs_col || !attrs_col->is_regular() || !attrs_col->is_multi_cell()) {
on_internal_error(elogger, "extract_from_attrs_column_computation::compute_value() on a table without an attrs map");
}
// Look for the desired attribute _attr_name in the attrs_col map in row:
const atomic_cell_or_collection* attrs = row.cells().find_cell(attrs_col->id);
if (!attrs) {
return regular_column_transformation::result();
}
collection_mutation_view cmv = attrs->as_collection_mutation();
return cmv.with_deserialized(*attrs_col->type, [this] (const collection_mutation_view_description& cmvd) {
for (auto&& [key, cell] : cmvd.cells) {
if (key == _attr_name) {
return regular_column_transformation::result(cell,
std::bind(serialized_value_if_type, std::placeholders::_1, _desired_type));
}
}
return regular_column_transformation::result();
});
}
// extract_from_attrs_column_computation needs the whole row to compute
// its value; it can't use just the partition key.
bytes extract_from_attrs_column_computation::compute_value(const schema&, const partition_key&) const {
on_internal_error(elogger, "extract_from_attrs_column_computation::compute_value called without row");
}
static future<executor::request_return_type> create_table_on_shard0(service::client_state&& client_state, tracing::trace_state_ptr trace_state, rjson::value request, service::storage_proxy& sp, service::migration_manager& mm, gms::gossiper& gossiper, bool enforce_authorization) {
SCYLLA_ASSERT(this_shard_id() == 0);
@@ -1110,67 +1302,15 @@ static future<executor::request_return_type> create_table_on_shard0(service::cli
schema_ptr partial_schema = builder.build();
// Parse GlobalSecondaryIndexes parameters before creating the base
// table, so if we have parse errors we can fail without creating
// Parse Local/GlobalSecondaryIndexes parameters before creating the
// base table, so if we have parse errors we can fail without creating
// any table.
const rjson::value* gsi = rjson::find(request, "GlobalSecondaryIndexes");
std::vector<schema_builder> view_builders;
std::unordered_set<std::string> index_names;
if (gsi) {
if (!gsi->IsArray()) {
co_return api_error::validation("GlobalSecondaryIndexes must be an array.");
}
for (const rjson::value& g : gsi->GetArray()) {
const rjson::value* index_name_v = rjson::find(g, "IndexName");
if (!index_name_v || !index_name_v->IsString()) {
co_return api_error::validation("GlobalSecondaryIndexes IndexName must be a string.");
}
std::string_view index_name = rjson::to_string_view(*index_name_v);
auto [it, added] = index_names.emplace(index_name);
if (!added) {
co_return api_error::validation(fmt::format("Duplicate IndexName '{}', ", index_name));
}
std::string vname(view_name(table_name, index_name));
elogger.trace("Adding GSI {}", index_name);
// FIXME: read and handle "Projection" parameter. This will
// require the MV code to copy just parts of the attrs map.
schema_builder view_builder(keyspace_name, vname);
auto [view_hash_key, view_range_key] = parse_key_schema(g);
if (partial_schema->get_column_definition(to_bytes(view_hash_key)) == nullptr) {
// A column that exists in a global secondary index is upgraded from being a map entry
// to having a regular column definition in the base schema
add_column(builder, view_hash_key, attribute_definitions, column_kind::regular_column);
}
add_column(view_builder, view_hash_key, attribute_definitions, column_kind::partition_key);
unused_attribute_definitions.erase(view_hash_key);
if (!view_range_key.empty()) {
if (partial_schema->get_column_definition(to_bytes(view_range_key)) == nullptr) {
// A column that exists in a global secondary index is upgraded from being a map entry
// to having a regular column definition in the base schema
if (partial_schema->get_column_definition(to_bytes(view_hash_key)) == nullptr) {
// FIXME: this is alternator limitation only, because Scylla's materialized views
// we use underneath do not allow more than 1 base regular column to be part of the MV key
elogger.warn("Only 1 regular column from the base table should be used in the GSI key in order to ensure correct liveness management without assumptions");
}
add_column(builder, view_range_key, attribute_definitions, column_kind::regular_column);
}
add_column(view_builder, view_range_key, attribute_definitions, column_kind::clustering_key);
unused_attribute_definitions.erase(view_range_key);
}
// Base key columns which aren't part of the index's key need to
// be added to the view nonetheless, as (additional) clustering
// key(s).
if (hash_key != view_hash_key && hash_key != view_range_key) {
add_column(view_builder, hash_key, attribute_definitions, column_kind::clustering_key);
}
if (!range_key.empty() && range_key != view_hash_key && range_key != view_range_key) {
add_column(view_builder, range_key, attribute_definitions, column_kind::clustering_key);
}
// GSIs have no tags:
view_builder.add_extension(db::tags_extension::NAME, ::make_shared<db::tags_extension>());
view_builders.emplace_back(std::move(view_builder));
}
}
// Remember the attributes used for LSI keys. Since LSI must be created
// with the table, we make these attributes real schema columns, and need
// to remember this below if the same attributes are used as GSI keys.
std::unordered_set<std::string> lsi_range_keys;
const rjson::value* lsi = rjson::find(request, "LocalSecondaryIndexes");
if (lsi) {
@@ -1228,9 +1368,68 @@ static future<executor::request_return_type> create_table_on_shard0(service::cli
std::map<sstring, sstring> tags_map = {{db::SYNCHRONOUS_VIEW_UPDATES_TAG_KEY, "true"}};
view_builder.add_extension(db::tags_extension::NAME, ::make_shared<db::tags_extension>(tags_map));
view_builders.emplace_back(std::move(view_builder));
lsi_range_keys.emplace(view_range_key);
}
}
const rjson::value* gsi = rjson::find(request, "GlobalSecondaryIndexes");
if (gsi) {
if (!gsi->IsArray()) {
co_return api_error::validation("GlobalSecondaryIndexes must be an array.");
}
for (const rjson::value& g : gsi->GetArray()) {
const rjson::value* index_name_v = rjson::find(g, "IndexName");
if (!index_name_v || !index_name_v->IsString()) {
co_return api_error::validation("GlobalSecondaryIndexes IndexName must be a string.");
}
std::string_view index_name = rjson::to_string_view(*index_name_v);
auto [it, added] = index_names.emplace(index_name);
if (!added) {
co_return api_error::validation(fmt::format("Duplicate IndexName '{}', ", index_name));
}
std::string vname(view_name(table_name, index_name));
elogger.trace("Adding GSI {}", index_name);
// FIXME: read and handle "Projection" parameter. This will
// require the MV code to copy just parts of the attrs map.
schema_builder view_builder(keyspace_name, vname);
auto [view_hash_key, view_range_key] = parse_key_schema(g);
// If an attribute is already a real column in the base table
// (i.e., a key attribute) or we already made it a real column
// as an LSI key above, we can use it directly as a view key.
// Otherwise, we need to add it as a "computed column", which
// extracts and deserializes the attribute from the ":attrs" map.
bool view_hash_key_real_column =
partial_schema->get_column_definition(to_bytes(view_hash_key)) ||
lsi_range_keys.contains(view_hash_key);
add_column(view_builder, view_hash_key, attribute_definitions, column_kind::partition_key, !view_hash_key_real_column);
unused_attribute_definitions.erase(view_hash_key);
if (!view_range_key.empty()) {
bool view_range_key_real_column =
partial_schema->get_column_definition(to_bytes(view_range_key)) ||
lsi_range_keys.contains(view_range_key);
add_column(view_builder, view_range_key, attribute_definitions, column_kind::clustering_key, !view_range_key_real_column);
if (!partial_schema->get_column_definition(to_bytes(view_range_key)) &&
!partial_schema->get_column_definition(to_bytes(view_hash_key))) {
// FIXME: This warning should go away. See issue #6714
elogger.warn("Only 1 regular column from the base table should be used in the GSI key in order to ensure correct liveness management without assumptions");
}
unused_attribute_definitions.erase(view_range_key);
}
// Base key columns which aren't part of the index's key need to
// be added to the view nonetheless, as (additional) clustering
// key(s).
if (hash_key != view_hash_key && hash_key != view_range_key) {
add_column(view_builder, hash_key, attribute_definitions, column_kind::clustering_key);
}
if (!range_key.empty() && range_key != view_hash_key && range_key != view_range_key) {
add_column(view_builder, range_key, attribute_definitions, column_kind::clustering_key);
}
// GSIs have no tags:
view_builder.add_extension(db::tags_extension::NAME, ::make_shared<db::tags_extension>());
view_builders.emplace_back(std::move(view_builder));
}
}
if (!unused_attribute_definitions.empty()) {
co_return api_error::validation(fmt::format(
"AttributeDefinitions defines spurious attributes not used by any KeySchema: {}",
@@ -1371,12 +1570,37 @@ future<executor::request_return_type> executor::create_table(client_state& clien
});
}
// When UpdateTable adds a GSI, the type of its key columns must be specified
// in AttributeDefinitions. If one of these key columns is *already* a key
// column of the base table or any of its prior GSIs or LSIs, the type
// given in AttributeDefinitions must match the type of the existing key -
// otherwise Alternator will not know which type to enforce in new writes.
// This function checks for such conflicts. It assumes that the structure of
// the given attribute_definitions was already validated (with
// validate_attribute_definitions()).
// This function should be called multiple times - once for the base schema
// and once for each of its views (existing GSIs and LSIs on this table).
static void check_attribute_definitions_conflicts(const rjson::value& attribute_definitions, const schema& schema) {
for (auto& def : schema.primary_key_columns()) {
std::string def_type = type_to_string(def.type);
for (auto it = attribute_definitions.Begin(); it != attribute_definitions.End(); ++it) {
const rjson::value& attribute_info = *it;
if (attribute_info["AttributeName"].GetString() == def.name_as_text()) {
auto type = attribute_info["AttributeType"].GetString();
if (type != def_type) {
throw api_error::validation(fmt::format("AttributeDefinitions redefined {} to {} already a key attribute of type {} in this table", def.name_as_text(), type, def_type));
}
break;
}
}
}
}
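The conflict check above can be illustrated outside of Alternator. The following is a minimal sketch, assuming a hypothetical `attribute_definition` struct and a plain map of existing key names to type strings in place of `rjson::value` and `schema`; it returns an error message instead of throwing.

```cpp
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for one AttributeDefinitions entry.
struct attribute_definition {
    std::string name;
    std::string type; // "S", "B" or "N"
};

// Returns an error message for the first definition whose type disagrees
// with an existing key column of the same name, or std::nullopt if no
// definition conflicts. Definitions that name no existing key are ignored.
std::optional<std::string> find_definition_conflict(
        const std::vector<attribute_definition>& definitions,
        const std::map<std::string, std::string>& existing_key_types) {
    for (const auto& [key_name, key_type] : existing_key_types) {
        for (const auto& def : definitions) {
            if (def.name != key_name) {
                continue;
            }
            if (def.type != key_type) {
                return "AttributeDefinitions redefined " + def.name + " to " +
                       def.type + " already a key attribute of type " + key_type;
            }
            break; // Only the first matching definition is compared.
        }
    }
    return std::nullopt;
}
```

As in the function above, the check would be run once against the base table's keys and once per existing view.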
future<executor::request_return_type> executor::update_table(client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value request) {
_stats.api_operations.update_table++;
elogger.trace("Updating table {}", request);
static const std::vector<sstring> unsupported = {
"GlobalSecondaryIndexUpdates",
"ProvisionedThroughput",
"ReplicaUpdates",
"SSESpecification",
@@ -1388,11 +1612,14 @@ future<executor::request_return_type> executor::update_table(client_state& clien
}
}
bool empty_request = true;
if (rjson::find(request, "BillingMode")) {
empty_request = false;
verify_billing_mode(request);
}
co_return co_await _mm.container().invoke_on(0, [&p = _proxy.container(), request = std::move(request), gt = tracing::global_trace_state_ptr(std::move(trace_state)), enforce_authorization = bool(_enforce_authorization), client_state_other_shard = client_state.move_to_other_shard()]
co_return co_await _mm.container().invoke_on(0, [&p = _proxy.container(), request = std::move(request), gt = tracing::global_trace_state_ptr(std::move(trace_state)), enforce_authorization = bool(_enforce_authorization), client_state_other_shard = client_state.move_to_other_shard(), empty_request]
(service::migration_manager& mm) mutable -> future<executor::request_return_type> {
// FIXME: the following needs to be in a loop. If mm.announce() below
// fails, we need to retry the whole thing.
@@ -1412,6 +1639,7 @@ future<executor::request_return_type> executor::update_table(client_state& clien
rjson::value* stream_specification = rjson::find(request, "StreamSpecification");
if (stream_specification && stream_specification->IsObject()) {
empty_request = false;
add_stream_options(*stream_specification, builder, p.local());
// Alternator Streams doesn't yet work when the table uses tablets (#16317)
auto stream_enabled = rjson::find(*stream_specification, "StreamEnabled");
@@ -1423,8 +1651,162 @@ future<executor::request_return_type> executor::update_table(client_state& clien
}
auto schema = builder.build();
std::vector<view_ptr> new_views;
std::vector<std::string> dropped_views;
rjson::value* gsi_updates = rjson::find(request, "GlobalSecondaryIndexUpdates");
if (gsi_updates) {
if (!gsi_updates->IsArray()) {
co_return api_error::validation("GlobalSecondaryIndexUpdates must be an array");
}
if (gsi_updates->Size() > 1) {
// Although UpdateTable takes an array of operations and could
// support multiple Create and/or Delete operations in one
// command, DynamoDB doesn't actually allow this, and throws
// a LimitExceededException if this is attempted.
co_return api_error::limit_exceeded("GlobalSecondaryIndexUpdates only allows one index creation or deletion");
}
if (gsi_updates->Size() == 1) {
empty_request = false;
if (!(*gsi_updates)[0].IsObject() || (*gsi_updates)[0].MemberCount() != 1) {
co_return api_error::validation("GlobalSecondaryIndexUpdates array must contain one object with a Create, Delete or Update operation");
}
auto it = (*gsi_updates)[0].MemberBegin();
const std::string_view op = rjson::to_string_view(it->name);
if (!it->value.IsObject()) {
co_return api_error::validation("GlobalSecondaryIndexUpdates entries must be objects");
}
const rjson::value* index_name_v = rjson::find(it->value, "IndexName");
if (!index_name_v || !index_name_v->IsString()) {
co_return api_error::validation("GlobalSecondaryIndexUpdates operation must have IndexName");
}
std::string_view index_name = rjson::to_string_view(*index_name_v);
std::string_view table_name = schema->cf_name();
std::string_view keyspace_name = schema->ks_name();
std::string vname(view_name(table_name, index_name));
if (op == "Create") {
const rjson::value* attribute_definitions = rjson::find(request, "AttributeDefinitions");
if (!attribute_definitions) {
co_return api_error::validation("GlobalSecondaryIndexUpdates Create needs AttributeDefinitions");
}
std::unordered_set<std::string> unused_attribute_definitions =
validate_attribute_definitions(*attribute_definitions);
check_attribute_definitions_conflicts(*attribute_definitions, *schema);
for (auto& view : p.local().data_dictionary().find_column_family(tab).views()) {
check_attribute_definitions_conflicts(*attribute_definitions, *view);
}
if (p.local().data_dictionary().has_schema(keyspace_name, vname)) {
// Surprisingly, DynamoDB uses validation error here, not resource_in_use
co_return api_error::validation(fmt::format(
"GSI {} already exists in table {}", index_name, table_name));
}
if (p.local().data_dictionary().has_schema(keyspace_name, lsi_name(table_name, index_name))) {
co_return api_error::validation(fmt::format(
"LSI {} already exists in table {}, can't use same name for GSI", index_name, table_name));
}
elogger.trace("Adding GSI {}", index_name);
// FIXME: read and handle "Projection" parameter. This will
// require the MV code to copy just parts of the attrs map.
schema_builder view_builder(keyspace_name, vname);
auto [view_hash_key, view_range_key] = parse_key_schema(it->value);
// If an attribute is already a real column in the base
// table (i.e., a key attribute in the base table or LSI),
// we can use it directly as a view key. Otherwise, we
// need to add it as a "computed column", which extracts
// and deserializes the attribute from the ":attrs" map.
bool view_hash_key_real_column =
schema->get_column_definition(to_bytes(view_hash_key));
add_column(view_builder, view_hash_key, *attribute_definitions, column_kind::partition_key, !view_hash_key_real_column);
unused_attribute_definitions.erase(view_hash_key);
if (!view_range_key.empty()) {
bool view_range_key_real_column =
schema->get_column_definition(to_bytes(view_range_key));
add_column(view_builder, view_range_key, *attribute_definitions, column_kind::clustering_key, !view_range_key_real_column);
if (!schema->get_column_definition(to_bytes(view_range_key)) &&
!schema->get_column_definition(to_bytes(view_hash_key))) {
// FIXME: This warning should go away. See issue #6714
elogger.warn("Only 1 regular column from the base table should be used in the GSI key in order to ensure correct liveness management without assumptions");
}
unused_attribute_definitions.erase(view_range_key);
}
// Surprisingly, although DynamoDB checks for unused
// AttributeDefinitions in CreateTable, it does not
// check it in UpdateTable. We decided to check anyway.
if (!unused_attribute_definitions.empty()) {
co_return api_error::validation(fmt::format(
"AttributeDefinitions defines spurious attributes not used by any KeySchema: {}",
unused_attribute_definitions));
}
// Base key columns which aren't part of the index's key need to
// be added to the view nonetheless, as (additional) clustering
// key(s).
for (auto& def : schema->primary_key_columns()) {
if (def.name_as_text() != view_hash_key && def.name_as_text() != view_range_key) {
view_builder.with_column(def.name(), def.type, column_kind::clustering_key);
}
}
// GSIs have no tags:
view_builder.add_extension(db::tags_extension::NAME, ::make_shared<db::tags_extension>());
// Note below we don't need to add virtual columns, as all
// base columns were copied to view. TODO: reconsider the need
// for virtual columns when we support Projection.
for (const column_definition& regular_cdef : schema->regular_columns()) {
if (!view_builder.has_column(*cql3::to_identifier(regular_cdef))) {
view_builder.with_column(regular_cdef.name(), regular_cdef.type, column_kind::regular_column);
}
}
const bool include_all_columns = true;
view_builder.with_view_info(*schema, include_all_columns, ""/*where clause*/);
new_views.emplace_back(view_builder.build());
} else if (op == "Delete") {
elogger.trace("Deleting GSI {}", index_name);
if (!p.local().data_dictionary().has_schema(keyspace_name, vname)) {
co_return api_error::resource_not_found(fmt::format("No GSI {} in table {}", index_name, table_name));
}
dropped_views.emplace_back(vname);
} else if (op == "Update") {
co_return api_error::validation("GlobalSecondaryIndexUpdates Update not yet supported");
} else {
co_return api_error::validation(fmt::format("GlobalSecondaryIndexUpdates supports a Create, Delete or Update operation, saw '{}'", op));
}
}
}
if (empty_request) {
co_return api_error::validation("UpdateTable requires one of GlobalSecondaryIndexUpdates, StreamSpecification or BillingMode to be specified");
}
co_await verify_permission(enforce_authorization, client_state_other_shard.get(), schema, auth::permission::ALTER);
auto m = co_await service::prepare_column_family_update_announcement(p.local(), schema, std::vector<view_ptr>(), group0_guard.write_timestamp());
auto m = co_await service::prepare_column_family_update_announcement(p.local(), schema, std::vector<view_ptr>(), group0_guard.write_timestamp());
for (view_ptr view : new_views) {
auto m2 = co_await service::prepare_new_view_announcement(p.local(), view, group0_guard.write_timestamp());
std::move(m2.begin(), m2.end(), std::back_inserter(m));
}
for (const std::string& view_name : dropped_views) {
auto m2 = co_await service::prepare_view_drop_announcement(p.local(), schema->ks_name(), view_name, group0_guard.write_timestamp());
std::move(m2.begin(), m2.end(), std::back_inserter(m));
}
// If a role is allowed to create a GSI, we should give it permissions
// to read the GSI it just created. This is known as "auto-grant".
// Also, when we delete a GSI we should revoke any permissions set on
// it - so if it's ever created again the old permissions wouldn't be
// remembered for the new GSI. This is known as "auto-revoke"
if (client_state_other_shard.get().user() && (!new_views.empty() || !dropped_views.empty())) {
service::group0_batch mc(std::move(group0_guard));
mc.add_mutations(std::move(m));
for (view_ptr view : new_views) {
auto resource = auth::make_data_resource(view->ks_name(), view->cf_name());
co_await auth::grant_applicable_permissions(
*client_state_other_shard.get().get_auth_service(), *client_state_other_shard.get().user(), resource, mc);
}
for (const auto& view_name : dropped_views) {
auto resource = auth::make_data_resource(schema->ks_name(), view_name);
co_await auth::revoke_all(*client_state_other_shard.get().get_auth_service(), resource, mc);
}
std::tie(m, group0_guard) = co_await std::move(mc).extract();
}
co_await mm.announce(std::move(m), std::move(group0_guard), format("alternator-executor: update {} table", tab->cf_name()));
@@ -1546,7 +1928,7 @@ public:
struct delete_item {};
struct put_item {};
put_or_delete_item(const rjson::value& key, schema_ptr schema, delete_item);
put_or_delete_item(const rjson::value& item, schema_ptr schema, put_item);
put_or_delete_item(const rjson::value& item, schema_ptr schema, put_item, std::unordered_map<bytes, std::string> key_attributes);
// put_or_delete_item doesn't keep a reference to schema (so it can be
// moved between shards for LWT) so it needs to be given again to build():
mutation build(schema_ptr schema, api::timestamp_type ts) const;
@@ -1578,7 +1960,75 @@ static inline const column_definition* find_attribute(const schema& schema, cons
return cdef;
}
put_or_delete_item::put_or_delete_item(const rjson::value& item, schema_ptr schema, put_item)
// Get a list of all attributes that serve as key attributes for any of the
// GSIs or LSIs of this table, and the declared type for each (can be only
// "S", "B", or "N"). The implementation below will also list the base table's
// key columns (they are the views' clustering keys).
std::unordered_map<bytes, std::string> si_key_attributes(data_dictionary::table t) {
std::unordered_map<bytes, std::string> ret;
for (const view_ptr& v : t.views()) {
for (const column_definition& cdef : v->partition_key_columns()) {
ret[cdef.name()] = type_to_string(cdef.type);
}
for (const column_definition& cdef : v->clustering_key_columns()) {
ret[cdef.name()] = type_to_string(cdef.type);
}
}
return ret;
}
// When an attribute is a key (hash or sort) of one of the GSIs on a table,
// DynamoDB refuses an update to that attribute with an unsuitable value.
// Unsuitable values are:
// 1. An empty string (those are normally allowed as values, but not allowed
// as keys, including GSI keys).
// 2. A value with a type different than that declared for the GSI key.
// Normally non-key attributes can take values of any type (DynamoDB is
// schema-less), but as soon as an attribute is used as a GSI key, it
// must be set only to the specific type declared for that key.
// (Note that a missing value for a GSI key attribute is fine - the update
// will happen on the base table, but won't reach the view table. In this
// case, this function simply won't be called for this attribute.)
//
// This function checks if the given attribute update is an update to some
// GSI's key, and if the value is unsuitable, an api_error::validation is
// thrown. The checking here is similar to the checking done in
// get_key_from_typed_value() for the base table's key columns.
//
// validate_value_if_gsi_key() should only be called after validate_value()
// already validated that the value itself has a valid form.
static inline void validate_value_if_gsi_key(
std::unordered_map<bytes, std::string> key_attributes,
const bytes& attribute,
const rjson::value& value) {
if (key_attributes.empty()) {
return;
}
auto it = key_attributes.find(attribute);
if (it == key_attributes.end()) {
// Given attribute is not a key column with a fixed type, so no
// more validation to do.
return;
}
const std::string& expected_type = it->second;
// We assume that validate_value() was previously called on this value,
// so value is known to be of the proper format (an object with one
// member, whose key and value are strings)
std::string_view value_type = rjson::to_string_view(value.MemberBegin()->name);
if (expected_type != value_type) {
throw api_error::validation(fmt::format(
"Type mismatch: expected type {} for GSI key attribute {}, got type {}",
expected_type, to_string_view(attribute), value_type));
}
std::string_view value_content = rjson::to_string_view(value.MemberBegin()->value);
if (value_content.empty()) {
throw api_error::validation(fmt::format(
"GSI key attribute {} cannot be set to an empty string", to_string_view(attribute)));
}
}
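The two rules enforced by validate_value_if_gsi_key() (matching type, non-empty content) can be sketched in isolation. This is an illustrative sketch only: `typed_value` and `check_gsi_key_value` are hypothetical names, std::string replaces `bytes`, and the function returns an optional error message rather than throwing `api_error::validation`.

```cpp
#include <optional>
#include <string>
#include <string_view>
#include <unordered_map>

// Hypothetical simplified value: a DynamoDB-style typed scalar such as
// {"S": "hello"}, reduced to its type tag and its string content.
struct typed_value {
    std::string type;
    std::string content;
};

// Returns an error message if `attribute` is a secondary-index key and
// `value` is unsuitable for it, or std::nullopt otherwise. The map is taken
// by const reference, so this sketch does not copy it on each call.
std::optional<std::string> check_gsi_key_value(
        const std::unordered_map<std::string, std::string>& key_attributes,
        std::string_view attribute,
        const typed_value& value) {
    auto it = key_attributes.find(std::string(attribute));
    if (it == key_attributes.end()) {
        // Not a key attribute: any type and any content is acceptable.
        return std::nullopt;
    }
    if (it->second != value.type) {
        return "Type mismatch: expected type " + it->second +
               " for GSI key attribute " + std::string(attribute) +
               ", got type " + value.type;
    }
    if (value.content.empty()) {
        return "GSI key attribute " + std::string(attribute) +
               " cannot be set to an empty string";
    }
    return std::nullopt;
}
```

With `key_attributes = {{"x", "S"}}`, writing `{"N": "1"}` or `{"S": ""}` to `x` is rejected, while any value written to an unrelated attribute `y` passes.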
put_or_delete_item::put_or_delete_item(const rjson::value& item, schema_ptr schema, put_item, std::unordered_map<bytes, std::string> key_attributes)
: _pk(pk_from_json(item, schema)), _ck(ck_from_json(item, schema)) {
_cells = std::vector<cell>();
_cells->reserve(item.MemberCount());
@@ -1588,6 +2038,9 @@ put_or_delete_item::put_or_delete_item(const rjson::value& item, schema_ptr sche
const column_definition* cdef = find_attribute(*schema, column_name);
_length_in_bytes += column_name.size();
if (!cdef) {
// This attribute may be a key column of one of the GSIs, in which
// case there are some limitations on the value.
validate_value_if_gsi_key(key_attributes, column_name, it->value);
bytes value = serialize_item(it->value);
if (value.size()) {
// ScyllaDB uses one extra byte compared to DynamoDB for the bytes length
@@ -1595,7 +2048,7 @@ put_or_delete_item::put_or_delete_item(const rjson::value& item, schema_ptr sche
}
_cells->push_back({std::move(column_name), serialize_item(it->value)});
} else if (!cdef->is_primary_key()) {
// Fixed-type regular column can be used for GSI key
// Fixed-type regular column can be used for LSI key
bytes value = get_key_from_typed_value(it->value, *cdef);
_cells->push_back({std::move(column_name),
value});
@@ -1954,7 +2407,8 @@ public:
parsed::condition_expression _condition_expression;
put_item_operation(service::storage_proxy& proxy, rjson::value&& request)
: rmw_operation(proxy, std::move(request))
, _mutation_builder(rjson::get(_request, "Item"), schema(), put_or_delete_item::put_item{}) {
, _mutation_builder(rjson::get(_request, "Item"), schema(), put_or_delete_item::put_item{},
si_key_attributes(proxy.data_dictionary().find_table(schema()->ks_name(), schema()->cf_name()))) {
_pk = _mutation_builder.pk();
_ck = _mutation_builder.ck();
if (_returnvalues != returnvalues::NONE && _returnvalues != returnvalues::ALL_OLD) {
@@ -2315,7 +2769,8 @@ future<executor::request_return_type> executor::batch_write_item(client_state& c
const rjson::value& put_request = r->value;
const rjson::value& item = put_request["Item"];
mutation_builders.emplace_back(schema, put_or_delete_item(
item, schema, put_or_delete_item::put_item{}));
item, schema, put_or_delete_item::put_item{},
si_key_attributes(_proxy.data_dictionary().find_table(schema->ks_name(), schema->cf_name()))));
auto mut_key = std::make_pair(mutation_builders.back().second.pk(), mutation_builders.back().second.ck());
if (used_keys.contains(mut_key)) {
co_return api_error::validation("Provided list of item keys contains duplicates");
@@ -2859,6 +3314,10 @@ public:
// them by top-level attribute, and detects forbidden overlaps/conflicts.
attribute_path_map<parsed::update_expression::action> _update_expression;
// Saved list of GSI keys in the table being updated, used for
// validate_value_if_gsi_key()
std::unordered_map<bytes, std::string> _key_attributes;
parsed::condition_expression _condition_expression;
update_item_operation(service::storage_proxy& proxy, rjson::value&& request);
@@ -2950,6 +3409,9 @@ update_item_operation::update_item_operation(service::storage_proxy& proxy, rjso
if (expression_attribute_values) {
_consumed_capacity._total_bytes += estimate_value_size(*expression_attribute_values);
}
_key_attributes = si_key_attributes(proxy.data_dictionary().find_table(
_schema->ks_name(), _schema->cf_name()));
}
// These are the cases where update_item_operation::apply() needs to use
@@ -3247,6 +3709,9 @@ update_item_operation::apply(std::unique_ptr<rjson::value> previous_item, api::t
bytes column_value = get_key_from_typed_value(json_value, *cdef);
row.cells().apply(*cdef, atomic_cell::make_live(*cdef->type, ts, column_value));
} else {
// This attribute may be a key column of one of the GSIs, in which
// case there are some limitations on the value.
validate_value_if_gsi_key(_key_attributes, column_name, json_value);
attrs_collector.put(std::move(column_name), serialize_item(json_value), ts);
}
};

@@ -0,0 +1,73 @@
/*
* Copyright 2024-present ScyllaDB
*/
/*
* SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
*/
#pragma once
#include <string>
#include <string_view>
#include "utils/rjson.hh"
#include "serialization.hh"
#include "column_computation.hh"
#include "db/view/regular_column_transformation.hh"
namespace alternator {
// An implementation of a "column_computation" which extracts a specific
// non-key attribute from the big map (":attrs") of all non-key attributes,
// and deserializes it if it has the desired type. GSI will use this computed
// column as a materialized-view key when the view key attribute isn't a
// full-fledged CQL column but rather stored in ":attrs".
class extract_from_attrs_column_computation : public regular_column_transformation {
// The name of the CQL column holding the attribute map. It is a
// constant defined in executor.cc (as ":attrs"), so doesn't need
// to be specified when constructing the column computation.
static const bytes MAP_NAME;
// The top-level attribute name to extract from the ":attrs" map.
bytes _attr_name;
// The type we expect for the value stored in the attribute. If the type
// matches the expected type, it is decoded (from the serialized format
// we store in the map's values) into the raw CQL type value that we use
// for keys, and returned by compute_value(). Only the types "S" (string),
// "B" (bytes) and "N" (number) are allowed as keys in DynamoDB, and
// therefore in _desired_type.
alternator_type _desired_type;
public:
virtual column_computation_ptr clone() const override;
// TYPE_NAME is a unique string that distinguishes this class from other
// column_computation subclasses. column_computation::deserialize() will
// construct an object of this subclass if it sees a "type" TYPE_NAME.
static inline const std::string TYPE_NAME = "alternator_extract_from_attrs";
// Serialize the *definition* of this column computation into a JSON
// string with a unique "type" string - TYPE_NAME - which then causes
// column_computation::deserialize() to create an object from this class.
virtual bytes serialize() const override;
// Construct this object based on the previous output of serialize().
// Calls on_internal_error() if the string doesn't match the output format
// of serialize(). "type" is not checked - column_computation::deserialize()
// won't call this constructor if "type" doesn't match.
extract_from_attrs_column_computation(const rjson::value &v);
extract_from_attrs_column_computation(bytes_view attr_name, alternator_type desired_type)
: _attr_name(attr_name), _desired_type(desired_type)
{}
// Implement regular_column_transformation's compute_value() that
// accepts the full row:
result compute_value(const schema& schema, const partition_key& key,
const db::view::clustering_or_static_row& row) const override;
// But do not implement column_computation's compute_value() that
// accepts only a partition key - that's not enough so our implementation
// of this function does on_internal_error().
bytes compute_value(const schema& schema, const partition_key& key) const override;
// This computed column does depend on a non-primary key column, so
// its result may change in the update and we need to compute it
// before and after the update.
virtual bool depends_on_non_primary_key_column() const override {
return true;
}
};
} // namespace alternator

@@ -245,6 +245,27 @@ rjson::value deserialize_item(bytes_view bv) {
return deserialized;
}
// This function takes a bytes_view created earlier by serialize_item(), and
// if it has the type "expected_type", the function returns the value as a
// raw Scylla type. If the type doesn't match, returns an unset optional.
// This function only supports the key types S (string), B (bytes) and N
// (number) - serialize_item() serializes those types as a single-byte type
// followed by the serialized raw Scylla type, so all this function needs to
// do is to remove the first byte. This makes this function much more
// efficient than deserialize_item() above because it avoids transformation
// to/from JSON.
std::optional<bytes> serialized_value_if_type(bytes_view bv, alternator_type expected_type) {
if (bv.empty() || alternator_type(bv[0]) != expected_type) {
return std::nullopt;
}
// Currently, serialize_item() output for types in alternator_type (notably
// S, B and N) is nothing more than Scylla's raw format for these types
// preceded by a type byte. So we just need to skip that byte and we are
// left with exactly what we need to return.
bv.remove_prefix(1);
return bytes(bv);
}
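The "check the type byte, then strip it" idea used by serialized_value_if_type() can be tried on its own. The sketch below makes several assumptions: `type_tag` and its character values are hypothetical (the real `alternator_type` values are not shown in this diff), and std::string / std::string_view stand in for `bytes` / `bytes_view`.

```cpp
#include <optional>
#include <string>
#include <string_view>

// Hypothetical one-byte type tags, chosen only for readability.
enum class type_tag : char { string = 'S', bytes = 'B', number = 'N' };

// If `serialized` starts with the tag `expected`, returns the remaining
// bytes (the raw payload). Otherwise, including for empty input, returns
// std::nullopt.
std::optional<std::string> payload_if_type(std::string_view serialized, type_tag expected) {
    if (serialized.empty() || serialized.front() != static_cast<char>(expected)) {
        return std::nullopt;
    }
    serialized.remove_prefix(1); // Skip the type byte; no other decoding is needed.
    return std::string(serialized);
}
```

For example, `payload_if_type("Shello", type_tag::string)` yields `"hello"`, while the same input checked against `type_tag::number` yields an empty optional.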
std::string type_to_string(data_type type) {
static thread_local std::unordered_map<data_type, std::string> types = {
{utf8_type, "S"},

@@ -43,6 +43,7 @@ type_representation represent_type(alternator_type atype);
bytes serialize_item(const rjson::value& item);
rjson::value deserialize_item(bytes_view bv);
std::optional<bytes> serialized_value_if_type(bytes_view bv, alternator_type expected_type);
std::string type_to_string(data_type type);

@@ -2864,6 +2864,30 @@
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"hosts_filter",
"description":"Repair replicas listed in the comma-separated host_id list.",
"required":false,
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"dcs_filter",
"description":"Repair replicas listed in the comma-separated DC list",
"required":false,
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"await_completion",
"description":"Set true to wait for the repair to complete. Set false to skip waiting for the repair to complete. When the option is not provided, it defaults to false.",
"required":false,
"allowMultiple":false,
"type":"string",
"paramType":"query"
}
]
}

@@ -253,6 +253,30 @@
]
}
]
},
{
"path":"/task_manager/drain/{module}",
"operations":[
{
"method":"POST",
"summary":"Drain finished local tasks",
"type":"void",
"nickname":"drain_tasks",
"produces":[
"application/json"
],
"parameters":[
{
"name":"module",
"description":"The module to drain",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"path"
}
]
}
]
}
],
"models":{

@@ -6,6 +6,8 @@
* SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
*/
#include "build_mode.hh"
#ifndef SCYLLA_BUILD_MODE_RELEASE
#include <seastar/core/coroutine.hh>

@@ -1543,6 +1543,11 @@ void set_storage_service(http_context& ctx, routes& r, sharded<service::storage_
}
auto ks = req->get_query_param("ks");
auto table = req->get_query_param("table");
bool await_completion = false;
auto await = req->get_query_param("await_completion");
if (!await.empty()) {
await_completion = validate_bool(await);
}
validate_table(ctx, ks, table);
auto table_id = ctx.db.local().find_column_family(ks, table).schema()->id();
std::variant<utils::chunked_vector<dht::token>, service::storage_service::all_tokens_tag> tokens_variant;
@@ -1551,8 +1556,22 @@ void set_storage_service(http_context& ctx, routes& r, sharded<service::storage_
} else {
tokens_variant = tokens;
}
auto hosts = req->get_query_param("hosts_filter");
auto dcs = req->get_query_param("dcs_filter");
auto res = co_await ss.local().add_repair_tablet_request(table_id, tokens_variant);
std::unordered_set<locator::host_id> hosts_filter;
if (!hosts.empty()) {
std::string delim = ",";
hosts_filter = std::ranges::views::split(hosts, delim) | std::views::transform([](auto&& h) {
try {
return locator::host_id(utils::UUID(std::string_view{h}));
} catch (...) {
throw httpd::bad_param_exception(fmt::format("Wrong host_id format {}", h));
}
}) | std::ranges::to<std::unordered_set>();
}
auto dcs_filter = locator::tablet_task_info::deserialize_repair_dcs_filter(dcs);
auto res = co_await ss.local().add_repair_tablet_request(table_id, tokens_variant, hosts_filter, dcs_filter, await_completion);
co_return json::json_return_type(res);
});
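The handler above turns the comma-separated `hosts_filter` query parameter into a set. A minimal sketch of just the splitting step, without C++23 ranges, might look as follows. `parse_comma_separated` is a hypothetical helper; it keeps elements as strings and leaves validation (such as the UUID parsing done above) to the caller.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <unordered_set>

// Splits `list` on ',' and returns the distinct elements. An empty input
// yields an empty set. Empty elements (as in "a,,b") are kept, so a caller
// that validates each element can still reject them.
std::unordered_set<std::string> parse_comma_separated(std::string_view list) {
    std::unordered_set<std::string> result;
    if (list.empty()) {
        return result;
    }
    std::size_t start = 0;
    while (true) {
        const std::size_t end = list.find(',', start);
        const std::size_t count =
            (end == std::string_view::npos) ? std::string_view::npos : end - start;
        result.emplace(list.substr(start, count));
        if (end == std::string_view::npos) {
            break;
        }
        start = end + 1;
    }
    return result;
}
```

Because the result is a set, a repeated host in the parameter collapses to a single entry.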

@@ -232,6 +232,32 @@ void set_task_manager(http_context& ctx, routes& r, sharded<tasks::task_manager>
uint32_t user_ttl = cfg.user_task_ttl_seconds();
co_return json::json_return_type(user_ttl);
});
tm::drain_tasks.set(r, [&tm] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {
co_await tm.invoke_on_all([&req] (tasks::task_manager& tm) -> future<> {
tasks::task_manager::module_ptr module;
try {
module = tm.find_module(req->get_path_param("module"));
} catch (...) {
throw bad_param_exception(fmt::format("{}", std::current_exception()));
}
const auto& local_tasks = module->get_local_tasks();
std::vector<tasks::task_id> ids;
ids.reserve(local_tasks.size());
std::transform(begin(local_tasks), end(local_tasks), std::back_inserter(ids), [] (const auto& task) {
return task.second->is_complete() ? task.first : tasks::task_id::create_null_id();
});
for (auto&& id : ids) {
if (id) {
module->unregister_task(id);
}
co_await maybe_yield();
}
});
co_return json_void();
});
}
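The drain handler above first collects the IDs of completed tasks and only then unregisters them, so the task container is never modified while it is being iterated. The same collect-then-remove pattern is sketched below with a hypothetical `task` struct and a std::map in place of the task manager's module; the yielding between removals is omitted.

```cpp
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical minimal task: only its completion state matters here.
struct task {
    bool complete;
};

// Removes all completed tasks from `tasks` and returns how many were removed.
std::size_t drain_complete(std::map<int, task>& tasks) {
    // Pass 1: snapshot the IDs to remove without touching the container.
    std::vector<int> ids;
    ids.reserve(tasks.size());
    for (const auto& [id, t] : tasks) {
        if (t.complete) {
            ids.push_back(id);
        }
    }
    // Pass 2: remove by ID; iteration is now over the snapshot, not the map.
    for (int id : ids) {
        tasks.erase(id);
    }
    return ids.size();
}
```

Unlike the handler, this sketch skips incomplete tasks while collecting instead of recording a null ID for them; the effect on the container is the same.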
void unset_task_manager(http_context& ctx, routes& r) {
@@ -243,6 +269,7 @@ void unset_task_manager(http_context& ctx, routes& r) {
tm::get_task_status_recursively.unset(r);
tm::get_and_update_ttl.unset(r);
tm::get_ttl.unset(r);
tm::drain_tasks.unset(r);
}
}

@@ -6,6 +6,9 @@
* SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
*/
#include "build_mode.hh"
#ifndef SCYLLA_BUILD_MODE_RELEASE
#include <seastar/core/coroutine.hh>

View File

@@ -1112,7 +1112,9 @@ future<bool> generation_service::legacy_do_handle_cdc_generation(cdc::generation
auto sys_dist_ks = get_sys_dist_ks();
auto gen = co_await retrieve_generation_data(gen_id, _sys_ks.local(), *sys_dist_ks, { _token_metadata.get()->count_normal_token_owners() });
if (!gen) {
throw std::runtime_error(fmt::format(
// This may happen during raft upgrade when a node gossips about a generation that
// was propagated through raft and we didn't apply it yet.
throw generation_handling_nonfatal_exception(fmt::format(
"Could not find CDC generation {} in distributed system tables (current time: {}),"
" even though some node gossiped about it.",
gen_id, db_clock::now()));

View File

@@ -186,7 +186,7 @@ bool cdc::metadata::prepare(db_clock::time_point tp) {
}
auto ts = to_ts(tp);
auto emplaced = _gens.emplace(to_ts(tp), std::nullopt).second;
auto [it, emplaced] = _gens.emplace(to_ts(tp), std::nullopt);
if (_last_stream_timestamp != api::missing_timestamp) {
auto last_correct_gen = gen_used_at(_last_stream_timestamp);
@@ -201,5 +201,5 @@ bool cdc::metadata::prepare(db_clock::time_point tp) {
}
}
return emplaced;
return !it->second;
}
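
Reviewer note: the change to `cdc::metadata::prepare` swaps the return value from "was the placeholder inserted by this call" to "does the entry at this timestamp still lack data". The sketch below contrasts the two on a plain `std::map`; `gen_map`, `prepare_old` and `prepare_new` are hypothetical names used only for illustration, and the real function does more work between the `emplace` and the `return`.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical stand-in for the generation map: timestamp -> optional data.
using gen_map = std::map<long, std::optional<std::string>>;

// Old behaviour: true only when this call inserted the placeholder.
bool prepare_old(gen_map& gens, long ts) {
    return gens.emplace(ts, std::nullopt).second;
}

// New behaviour: true whenever the entry at ts has no data yet,
// whether or not this call was the one that inserted it.
bool prepare_new(gen_map& gens, long ts) {
    auto [it, emplaced] = gens.emplace(ts, std::nullopt);
    (void)emplaced; // unused here; kept to mirror the structured binding
    return !it->second;
}

// Small self-check showing where the two versions disagree.
void check_prepare_difference() {
    gen_map gens;
    assert(prepare_new(gens, 10));   // placeholder inserted, no data
    assert(prepare_new(gens, 10));   // still no data: true again
    assert(!prepare_old(gens, 10));  // old version: not newly inserted
}
```

Under these assumptions, a second call for the same timestamp keeps returning true until data is stored in the entry, which the old version could not express.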

View File

@@ -15,6 +15,7 @@
#include "sstables/sstables_manager.hh"
#include <memory>
#include <fmt/ranges.h>
#include <seastar/core/future.hh>
#include <seastar/core/metrics.hh>
#include <seastar/core/coroutine.hh>
#include <seastar/coroutine/switch_to.hh>
@@ -503,7 +504,7 @@ public:
virtual ~sstables_task_executor() = default;
virtual void release_resources() noexcept override;
virtual future<> release_resources() noexcept override;
virtual future<tasks::task_manager::task::progress> get_progress() const override {
return compaction_task_impl::get_progress(_compaction_data, _progress_monitor);
@@ -788,9 +789,10 @@ compaction::compaction_state::~compaction_state() {
compaction_done.broken();
}
void sstables_task_executor::release_resources() noexcept {
future<> sstables_task_executor::release_resources() noexcept {
_cm._stats.pending_tasks -= _sstables.size() - (_state == state::pending);
_sstables = {};
return make_ready_future();
}
future<compaction_manager::compaction_stats_opt> compaction_task_executor::run_compaction() noexcept {
@@ -1565,10 +1567,10 @@ public:
, _can_purge(can_purge)
{}
virtual void release_resources() noexcept override {
virtual future<> release_resources() noexcept override {
_compacting.release_all();
_owned_ranges_ptr = nullptr;
sstables_task_executor::release_resources();
co_await sstables_task_executor::release_resources();
}
protected:
@@ -1846,11 +1848,12 @@ public:
virtual ~cleanup_sstables_compaction_task_executor() = default;
virtual void release_resources() noexcept override {
virtual future<> release_resources() noexcept override {
_cm._stats.pending_tasks -= _pending_cleanup_jobs.size();
_pending_cleanup_jobs = {};
_compacting.release_all();
_owned_ranges_ptr = nullptr;
return make_ready_future();
}
virtual future<tasks::task_manager::task::progress> get_progress() const override {

View File

@@ -689,3 +689,6 @@ maintenance_socket: ignore
# Note that creating keyspaces with tablets enabled or disabled is irreversible.
# The `tablets` option cannot be changed using `ALTER KEYSPACE`.
enable_tablets: true
# Enforce RF-rack-valid keyspaces.
rf_rack_valid_keyspaces: false

View File

@@ -1564,7 +1564,7 @@ deps['test/boost/linearizing_input_stream_test'] = [
"test/boost/linearizing_input_stream_test.cc",
"test/lib/log.cc",
]
deps['test/boost/expr_test'] = ['test/boost/expr_test.cc', 'test/lib/expr_test_utils.cc'] + scylla_core
deps['test/boost/expr_test'] = ['test/boost/expr_test.cc', 'test/lib/expr_test_utils.cc'] + scylla_core + alternator
deps['test/boost/rate_limiter_test'] = ['test/boost/rate_limiter_test.cc', 'db/rate_limiter.cc']
deps['test/boost/exceptions_optimized_test'] = ['test/boost/exceptions_optimized_test.cc', 'utils/exceptions.cc']
deps['test/boost/exceptions_fallback_test'] = ['test/boost/exceptions_fallback_test.cc', 'utils/exceptions.cc']
@@ -1581,8 +1581,8 @@ deps['test/raft/many_test'] = ['test/raft/many_test.cc', 'test/raft/replication.
deps['test/raft/fsm_test'] = ['test/raft/fsm_test.cc', 'test/raft/helpers.cc', 'test/lib/log.cc'] + scylla_raft_dependencies
deps['test/raft/etcd_test'] = ['test/raft/etcd_test.cc', 'test/raft/helpers.cc', 'test/lib/log.cc'] + scylla_raft_dependencies
deps['test/raft/raft_sys_table_storage_test'] = ['test/raft/raft_sys_table_storage_test.cc'] + \
scylla_core + scylla_tests_generic_dependencies
deps['test/boost/address_map_test'] = ['test/boost/address_map_test.cc'] + scylla_core
scylla_core + alternator + scylla_tests_generic_dependencies
deps['test/boost/address_map_test'] = ['test/boost/address_map_test.cc'] + scylla_core + alternator
deps['test/raft/discovery_test'] = ['test/raft/discovery_test.cc',
'test/raft/helpers.cc',
'test/lib/log.cc',

View File

@@ -13,6 +13,7 @@
#include <seastar/core/on_internal_error.hh>
#include <stdexcept>
#include "alter_keyspace_statement.hh"
#include "locator/tablets.hh"
#include "prepared_statement.hh"
#include "service/migration_manager.hh"
#include "service/storage_proxy.hh"
@@ -25,6 +26,7 @@
#include "create_keyspace_statement.hh"
#include "gms/feature_service.hh"
#include "replica/database.hh"
#include "db/config.hh"
static logging::logger mylogger("alter_keyspace");
@@ -193,9 +195,9 @@ cql3::statements::alter_keyspace_statement::prepare_schema_mutations(query_proce
event::schema_change::target_type target_type = event::schema_change::target_type::KEYSPACE;
auto ks = qp.db().find_keyspace(_name);
auto ks_md = ks.metadata();
const auto& tm = *qp.proxy().get_token_metadata_ptr();
const auto tmptr = qp.proxy().get_token_metadata_ptr();
const auto& feat = qp.proxy().features();
auto ks_md_update = _attrs->as_ks_metadata_update(ks_md, tm, feat);
auto ks_md_update = _attrs->as_ks_metadata_update(ks_md, *tmptr, feat);
std::vector<mutation> muts;
std::vector<sstring> warnings;
bool include_tablet_options = _attrs->get_map(_attrs->KW_TABLETS).has_value();
@@ -246,6 +248,36 @@ cql3::statements::alter_keyspace_statement::prepare_schema_mutations(query_proce
muts.insert(muts.begin(), schema_mutations.begin(), schema_mutations.end());
}
// If `rf_rack_valid_keyspaces` is enabled, it's forbidden to perform a schema change that
// would lead to an RF-rack-invalid keyspace. Verify that this change does not.
// For more context, see: scylladb/scylladb#23071.
if (qp.db().get_config().rf_rack_valid_keyspaces()) {
auto rs = locator::abstract_replication_strategy::create_replication_strategy(
ks_md_update->strategy_name(),
locator::replication_strategy_params(ks_md_update->strategy_options(), ks_md_update->initial_tablets()));
try {
// There are two things to note here:
// 1. We hold a group0_guard, so it's correct to check this here.
// The topology or schema cannot change while we're performing this query.
// 2. The replication strategy we use here does NOT represent the actual state
// we will arrive at after applying the schema change. For instance, if the user
// did not specify the RF for some of the DCs, it's equal to 0 in the replication
// strategy we pass to this function, while in reality that means that the RF
// will NOT change. That is not a problem:
// - RF=0 is valid for all DCs, so it won't trigger an exception on its own,
// - the keyspace must've been RF-rack-valid before this change. We check that
// condition for all keyspaces at startup.
// The second point above is not strictly true, because topological changes can
// currently disturb it (see scylladb/scylladb#23345), but we ignore that.
locator::assert_rf_rack_valid_keyspace(_name, tmptr, *rs);
} catch (const std::exception& e) {
// There's no guarantee what the type of the exception will be, so we need to
// wrap it manually here in a type that can be passed to the user.
throw exceptions::invalid_request_exception(e.what());
}
}
auto ret = ::make_shared<event::schema_change>(
event::schema_change::change_type::UPDATED,
target_type,

View File

@@ -87,6 +87,9 @@ std::vector<::shared_ptr<index_target>> create_index_statement::validate_while_e
"Secondary indexes are not supported on COMPACT STORAGE tables that have clustering columns");
}
if (!db.features().views_with_tablets && db.find_keyspace(keyspace()).get_replication_strategy().uses_tablets()) {
throw exceptions::invalid_request_exception(format("Secondary indexes are not supported on base tables with tablets (keyspace '{}')", keyspace()));
}
validate_for_local_index(*schema);
std::vector<::shared_ptr<index_target>> targets;

View File

@@ -11,6 +11,8 @@
#include <seastar/core/coroutine.hh>
#include "cql3/statements/create_keyspace_statement.hh"
#include "cql3/statements/ks_prop_defs.hh"
#include "exceptions/exceptions.hh"
#include "locator/tablets.hh"
#include "prepared_statement.hh"
#include "data_dictionary/data_dictionary.hh"
#include "data_dictionary/keyspace_metadata.hh"
@@ -90,14 +92,14 @@ void create_keyspace_statement::validate(query_processor& qp, const service::cli
future<std::tuple<::shared_ptr<cql_transport::event::schema_change>, std::vector<mutation>, cql3::cql_warnings_vec>> create_keyspace_statement::prepare_schema_mutations(query_processor& qp, const query_options&, api::timestamp_type ts) const {
using namespace cql_transport;
const auto& tm = *qp.proxy().get_token_metadata_ptr();
const auto tmptr = qp.proxy().get_token_metadata_ptr();
const auto& feat = qp.proxy().features();
const auto& cfg = qp.db().get_config();
std::vector<mutation> m;
std::vector<sstring> warnings;
try {
auto ksm = _attrs->as_ks_metadata(_name, tm, feat, cfg);
auto ksm = _attrs->as_ks_metadata(_name, *tmptr, feat, cfg);
m = service::prepare_new_keyspace_announcement(qp.db().real_database(), ksm, ts);
// If the new keyspace uses tablets, as long as there are features
// which aren't supported by tablets we want to warn the user that
@@ -116,6 +118,21 @@ future<std::tuple<::shared_ptr<cql_transport::event::schema_change>, std::vector
"without tablets by adding AND TABLETS = {'enabled': false} "
"to the CREATE KEYSPACE statement.");
}
// If `rf_rack_valid_keyspaces` is enabled, it's forbidden to create an RF-rack-invalid keyspace.
// Verify that it's RF-rack-valid.
// For more context, see: scylladb/scylladb#23071.
if (cfg.rf_rack_valid_keyspaces()) {
try {
// We hold a group0_guard, so it's correct to check this here.
// The topology or schema cannot change while we're performing this query.
locator::assert_rf_rack_valid_keyspace(_name, tmptr, *rs);
} catch (const std::exception& e) {
// There's no guarantee what the type of the exception will be, so we need to
// wrap it manually here in a type that can be passed to the user.
throw exceptions::invalid_request_exception(e.what());
}
}
} catch (const exceptions::already_exists_exception& e) {
if (!_if_not_exists) {
co_return coroutine::exception(std::current_exception());
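
Reviewer note: both keyspace statements wrap whatever the RF-rack check throws into a single request-level exception type, since the type of the original exception is not guaranteed. A minimal sketch of that wrapping pattern, with hypothetical names (`invalid_request`, `validate_rf`, `checked_validate`) standing in for the real ScyllaDB types and for the validity rule itself, which this diff does not show:

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for exceptions::invalid_request_exception.
struct invalid_request : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Hypothetical validator; it may throw any std::exception subtype.
void validate_rf(int rf) {
    if (rf < 0) {
        throw std::out_of_range("negative replication factor");
    }
}

// Run the validator and convert any failure into the user-facing type,
// preserving the original message.
void checked_validate(int rf) {
    try {
        validate_rf(rf);
    } catch (const std::exception& e) {
        throw invalid_request(e.what());
    }
}

// Self-check: the caller only ever sees invalid_request.
void demo_wrapping() {
    bool wrapped = false;
    try {
        checked_validate(-1);
    } catch (const invalid_request& e) {
        wrapped = std::string(e.what()) == "negative replication factor";
    }
    assert(wrapped);
}
```

The design choice is that callers only need to handle one exception type, at the cost of losing the original type information.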

View File

@@ -140,6 +140,9 @@ std::pair<view_ptr, cql3::cql_warnings_vec> create_view_statement::prepare_view(
schema_ptr schema = validation::validate_column_family(db, _base_name.get_keyspace(), _base_name.get_column_family());
if (!db.features().views_with_tablets && db.find_keyspace(keyspace()).get_replication_strategy().uses_tablets()) {
throw exceptions::invalid_request_exception(format("Materialized views are not supported on base tables with tablets"));
}
if (schema->is_counter()) {
throw exceptions::invalid_request_exception(format("Materialized views are not supported on counter tables"));
}

View File

@@ -1201,7 +1201,7 @@ db::config::config(std::shared_ptr<db::extensions> exts)
"Start serializing reads after their collective memory consumption goes above $normal_limit * $multiplier.")
, reader_concurrency_semaphore_kill_limit_multiplier(this, "reader_concurrency_semaphore_kill_limit_multiplier", liveness::LiveUpdate, value_status::Used, 4,
"Start killing reads after their collective memory consumption goes above $normal_limit * $multiplier.")
, reader_concurrency_semaphore_cpu_concurrency(this, "reader_concurrency_semaphore_cpu_concurrency", liveness::LiveUpdate, value_status::Used, 1,
, reader_concurrency_semaphore_cpu_concurrency(this, "reader_concurrency_semaphore_cpu_concurrency", liveness::LiveUpdate, value_status::Used, 2,
"Admit new reads while there are less than this number of requests that need CPU.")
, view_update_reader_concurrency_semaphore_serialize_limit_multiplier(this, "view_update_reader_concurrency_semaphore_serialize_limit_multiplier", liveness::LiveUpdate, value_status::Used, 2,
"Start serializing view update reads after their collective memory consumption goes above $normal_limit * $multiplier.")
@@ -1364,6 +1364,9 @@ db::config::config(std::shared_ptr<db::extensions> exts)
, disk_space_monitor_high_polling_interval_in_seconds(this, "disk_space_monitor_high_polling_interval_in_seconds", value_status::Used, 1, "Disk-space polling interval at or above polling threshold")
, disk_space_monitor_polling_interval_threshold(this, "disk_space_monitor_polling_interval_threshold", value_status::Used, 0.9, "Disk-space polling threshold. Polling interval is increased when disk utilization is greater than or equal to this threshold")
, enable_create_table_with_compact_storage(this, "enable_create_table_with_compact_storage", liveness::LiveUpdate, value_status::Used, false, "Enable the deprecated feature of CREATE TABLE WITH COMPACT STORAGE. This feature will eventually be removed in a future version.")
, rf_rack_valid_keyspaces(this, "rf_rack_valid_keyspaces", liveness::MustRestart, value_status::Used, false,
"Enforce RF-rack-valid keyspaces. Additionally, if there are existing RF-rack-invalid "
"keyspaces, attempting to start a node with this option ON will fail.")
, default_log_level(this, "default_log_level", value_status::Used, seastar::log_level::info, "Default log level for log messages")
, logger_log_level(this, "logger_log_level", value_status::Used, {}, "Map of logger name to log level. Valid log levels are 'error', 'warn', 'info', 'debug' and 'trace'")
, log_to_stdout(this, "log_to_stdout", value_status::Used, true, "Send log output to stdout")

View File

@@ -535,6 +535,8 @@ public:
named_value<bool> enable_create_table_with_compact_storage;
named_value<bool> rf_rack_valid_keyspaces;
static const sstring default_tls_priority;
private:
template<typename T>

View File

@@ -0,0 +1,127 @@
/*
* Copyright (C) 2024-present ScyllaDB
*/
/*
* SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
*/
#pragma once
#include "column_computation.hh"
#include "mutation/atomic_cell.hh"
#include "timestamp.hh"
#include <type_traits>
class row_marker;
// In a basic column_computation defined in column_computation.hh, the
// compute_value() method is only based on the partition key, and it must
// return a value. That API has very limited applications - basically the
// only thing we can implement with it is token_column_computation which
// we used to create the token column in secondary indexes.
// The regular_column_transformation base class here is more powerful, but
// still is not a completely general computation: Its compute_value() virtual
// method can transform the value read from a single cell of a regular column
// into a new cell stored in a structure regular_column_transformation::result.
//
// In more detail, the assumptions of regular_column_transformation are:
// 1. compute_value() computes the value based on a *single* column in a
// row passed to compute_value().
// This assumption means that the value or deletion of the value always
// has a single known timestamp (and the value can't be half-missing)
// and single TTL information. That would not have been possible if we
// allowed the computation to depend on multiple columns.
// 2. compute_value() computes the value based on a *regular* column in the
// base table. This means that an update can modify this value (unlike a
// base-table key column that can't change in an update), so the view
// update code needs to compute the value before and after the update,
// and potentially delete and create view rows.
// 3. compute_value() returns a regular_column_transformation::result which includes
// a value and its liveness information (timestamp and ttl/expiry) or
// is missing a value.
class regular_column_transformation : public column_computation {
public:
struct result {
// We can use "bytes" instead of "managed_bytes" here because we know
// that a column_computation is only used for generating a key value,
// and that is limited to 64K. This limitation is enforced below -
// we never linearize a cell's value if its size is more than 64K.
std::optional<bytes> _value;
// _ttl and _expiry are only defined if _value is set.
// The default values below are used when the source cell does not
// expire, and are the same values that row_marker uses for a non-
// expiring marker. This is useful when creating a row_marker from
// get_ttl() and get_expiry().
gc_clock::duration _ttl { 0 };
gc_clock::time_point _expiry { gc_clock::duration(0) };
// _ts may be set even if _value is missing, in which case it records the
// timestamp of a tombstone. Note that the current view-update code
// that uses this class doesn't use _ts when _value is missing.
api::timestamp_type _ts = api::missing_timestamp;
api::timestamp_type get_ts() const {
return _ts;
}
bool has_value() const {
return _value.has_value();
}
// Should only be called if has_value() is true:
const bytes& get_value() const {
return *_value;
}
gc_clock::duration get_ttl() const {
return _ttl;
}
gc_clock::time_point get_expiry() const {
return _expiry;
}
// A missing computation result
result() { }
// Construct a computation result by copying a given atomic_cell -
// including its value, timestamp, and ttl - or deletion timestamp.
// The second parameter is an optional transformation function f -
// taking a bytes and returning an optional<bytes> - that transforms
// the value of the cell but keeps its other liveness information.
// If f returns nullopt, the view row should be deleted.
template<typename Func=std::identity>
requires std::invocable<Func, bytes> && std::convertible_to<std::invoke_result_t<Func, bytes>, std::optional<bytes>>
result(atomic_cell_view cell, Func f = {}) {
_ts = cell.timestamp();
if (cell.is_live()) {
// If the cell is larger than what a key can hold (64KB),
// return a missing value. This lets us skip this item during
// view building and avoid hanging the view build as described
// in #8627. But it doesn't prevent later inserting such an item
// to the base table, nor does it implement front-end specific
// limits (such as Alternator's 1K or 2K limits - see #10347).
// Those stricter limits should be validated in the base-table
// write code, not here - deep inside the view update code.
// Note also we assume that f() doesn't grow the value further.
if (cell.value().size() >= 65536) {
return;
}
_value = f(to_bytes(cell.value()));
if (_value) {
if (cell.is_live_and_has_ttl()) {
_ttl = cell.ttl();
_expiry = cell.expiry();
}
}
}
}
};
virtual ~regular_column_transformation() = default;
virtual result compute_value(
const schema& schema,
const partition_key& key,
const db::view::clustering_or_static_row& row) const = 0;
};

View File

@@ -36,6 +36,7 @@
#include "db/view/view_builder.hh"
#include "db/view/view_updating_consumer.hh"
#include "db/view/view_update_generator.hh"
#include "db/view/regular_column_transformation.hh"
#include "db/system_keyspace_view_types.hh"
#include "db/system_keyspace.hh"
#include "db/system_distributed_keyspace.hh"
@@ -506,79 +507,6 @@ size_t view_updates::op_count() const {
return _op_count;
}
row_marker view_updates::compute_row_marker(const clustering_or_static_row& base_row) const {
/*
* We need to compute both the timestamp and expiration for view rows.
*
* Below there are several distinct cases depending on how many new key
* columns the view has - i.e., how many of the view's key columns were
* regular columns in the base. base_regular_columns_in_view_pk.size():
*
* Zero new key columns:
* The view rows key is composed only from base key columns, and those
* cannot be changed in an update, so the view row remains alive as
* long as the base row is alive. We need to return the same row
* marker as the base for the view - to keep an empty view row alive
* for as long as an empty base row exists.
* Note that in this case, if there are *unselected* base columns, we
* may need to keep an empty view row alive even without a row marker
* because the base row (which has additional columns) is still alive.
* For that we have the "virtual columns" feature: In the zero new
* key columns case, we put unselected columns in the view as empty
* columns, to keep the view row alive.
*
* One new key column:
* In this case, there is a regular base column that is part of the
* view key. This regular column can be added or deleted in an update,
* or its expiration be set, and those can cause the view row -
* including its row marker - to need to appear or disappear as well.
* So the liveness of cell of this one column determines the liveness
* of the view row and the row marker that we return.
*
* Two or more new key columns:
* This case is explicitly NOT supported in CQL - one cannot create a
* view with more than one base-regular columns in its key. In general
* picking one liveness (timestamp and expiration) is not possible
* if there are multiple regular base columns in the view key, as
* those can have different liveness.
* However, we do allow this case for Alternator - we need to allow
* the case of two (but not more) because the DynamoDB API allows
* creating a GSI whose two key columns (hash and range key) were
* regular columns.
* We can support this case in Alternator because it doesn't use
* expiration (the "TTL" it does support is different), and doesn't
* support user-defined timestamps. But, the two columns can still
* have different timestamps - this happens if an update modifies
* just one of them. In this case the timestamp of the view update
* (and that of the row marker we return) is the later of these two
* updated columns.
*/
const auto& col_ids = base_row.is_clustering_row()
? _base_info->base_regular_columns_in_view_pk()
: _base_info->base_static_columns_in_view_pk();
if (!col_ids.empty()) {
auto& def = _base->column_at(base_row.column_kind(), col_ids[0]);
// Note: multi-cell columns can't be part of the primary key.
auto cell = base_row.cells().cell_at(col_ids[0]).as_atomic_cell(def);
auto ts = cell.timestamp();
if (col_ids.size() > 1){
// As explained above, this case only happens in Alternator,
// and we may need to pick a higher ts:
auto& second_def = _base->column_at(base_row.column_kind(), col_ids[1]);
auto second_cell = base_row.cells().cell_at(col_ids[1]).as_atomic_cell(second_def);
auto second_ts = second_cell.timestamp();
ts = std::max(ts, second_ts);
// Alternator isn't supposed to have TTL or more than two col_ids!
if (col_ids.size() != 2 || cell.is_live_and_has_ttl() || second_cell.is_live_and_has_ttl()) [[unlikely]] {
utils::on_internal_error(format("Unexpected col_ids length {} or has TTL", col_ids.size()));
}
}
return cell.is_live_and_has_ttl() ? row_marker(ts, cell.ttl(), cell.expiry()) : row_marker(ts);
}
return base_row.marker();
}
namespace {
// The following struct is identical to view_key_with_action, except the key
// is stored as a managed_bytes_view instead of bytes.
@@ -654,8 +582,8 @@ public:
return {_update.key()->get_component(_base, base_col->position())};
default:
if (base_col->kind != _update.column_kind()) {
on_internal_error(vlogger, format("Tried to get a {} column from a {} row update, which is impossible",
to_sstring(base_col->kind), _update.is_clustering_row() ? "clustering" : "static"));
on_internal_error(vlogger, format("Tried to get a {} column {} from a {} row update, which is impossible",
to_sstring(base_col->kind), base_col->name_as_text(), _update.is_clustering_row() ? "clustering" : "static"));
}
auto& c = _update.cells().cell_at(base_col->id);
auto value_view = base_col->is_atomic() ? c.as_atomic_cell(cdef).value() : c.as_collection_mutation().data;
@@ -676,6 +604,22 @@ private:
return handle_collection_column_computation(collection_computation);
}
// TODO: we already calculated this computation in updatable_view_key_cols,
// so perhaps we should pass it here and not re-compute it. But this will
// mean computed columns will only work for view key columns (currently
// we assume that anyway)
if (auto* c = dynamic_cast<const regular_column_transformation*>(&computation)) {
regular_column_transformation::result after =
c->compute_value(_base, _base_key, _update);
if (after.has_value()) {
return {managed_bytes_view(linearized_values.emplace_back(after.get_value()))};
}
// We only get to this function when we know the _update row
// exists and call it to read its key columns, so we don't expect
// to see a missing value for any of those columns
on_internal_error(vlogger, fmt::format("unexpected call to handle_computed_column {} missing in update", cdef.name_as_text()));
}
auto computed_value = computation.compute_value(_base, _base_key);
return {managed_bytes_view(linearized_values.emplace_back(std::move(computed_value)))};
}
@@ -727,7 +671,6 @@ view_updates::get_view_rows(const partition_key& base_key, const clustering_or_s
if (partition.partition_tombstone() && partition.partition_tombstone() == row_delete_tomb.tomb()) {
return;
}
ret.push_back({&partition.clustered_row(*_view, std::move(ckey)), action});
};
@@ -934,13 +877,12 @@ static void add_cells_to_view(const schema& base, const schema& view, column_kin
* Creates a view entry corresponding to the provided base row.
* This method checks that the base row does match the view filter before applying anything.
*/
void view_updates::create_entry(data_dictionary::database db, const partition_key& base_key, const clustering_or_static_row& update, gc_clock::time_point now) {
void view_updates::create_entry(data_dictionary::database db, const partition_key& base_key, const clustering_or_static_row& update, gc_clock::time_point now, row_marker update_marker) {
if (!matches_view_filter(db, *_base, _view_info, base_key, update, now)) {
return;
}
auto view_rows = get_view_rows(base_key, update, std::nullopt, {});
auto update_marker = compute_row_marker(update);
const auto kind = update.column_kind();
for (const auto& [r, action]: view_rows) {
if (auto rm = std::get_if<row_marker>(&action)) {
@@ -958,48 +900,28 @@ void view_updates::create_entry(data_dictionary::database db, const partition_ke
* Deletes the view entry corresponding to the provided base row.
* This method checks that the base row does match the view filter before bothering.
*/
void view_updates::delete_old_entry(data_dictionary::database db, const partition_key& base_key, const clustering_or_static_row& existing, const clustering_or_static_row& update, gc_clock::time_point now) {
void view_updates::delete_old_entry(data_dictionary::database db, const partition_key& base_key, const clustering_or_static_row& existing, const clustering_or_static_row& update, gc_clock::time_point now, api::timestamp_type deletion_ts) {
// Before deleting an old entry, make sure it was matching the view filter
// (otherwise there is nothing to delete)
if (matches_view_filter(db, *_base, _view_info, base_key, existing, now)) {
do_delete_old_entry(base_key, existing, update, now);
do_delete_old_entry(base_key, existing, update, now, deletion_ts);
}
}
void view_updates::do_delete_old_entry(const partition_key& base_key, const clustering_or_static_row& existing, const clustering_or_static_row& update, gc_clock::time_point now) {
void view_updates::do_delete_old_entry(const partition_key& base_key, const clustering_or_static_row& existing, const clustering_or_static_row& update, gc_clock::time_point now, api::timestamp_type deletion_ts) {
auto view_rows = get_view_rows(base_key, existing, std::nullopt, update.tomb());
const auto kind = existing.column_kind();
for (const auto& [r, action] : view_rows) {
const auto& col_ids = existing.is_clustering_row()
? _base_info->base_regular_columns_in_view_pk()
: _base_info->base_static_columns_in_view_pk();
if (_view_info.has_computed_column_depending_on_base_non_primary_key()) {
if (auto ts_tag = std::get_if<view_key_and_action::shadowable_tombstone_tag>(&action)) {
r->apply(ts_tag->into_shadowable_tombstone(now));
}
} else if (!col_ids.empty()) {
// We delete the old row using a shadowable row tombstone, making sure that
// the tombstone deletes everything in the row (or it might still show up).
// Note: multi-cell columns can't be part of the primary key.
auto& def = _base->column_at(kind, col_ids[0]);
auto cell = existing.cells().cell_at(col_ids[0]).as_atomic_cell(def);
auto ts = cell.timestamp();
if (col_ids.size() > 1) {
// This is the Alternator-only support for two regular base
// columns that become view key columns. See explanation in
// view_updates::compute_row_marker().
auto& second_def = _base->column_at(kind, col_ids[1]);
auto second_cell = existing.cells().cell_at(col_ids[1]).as_atomic_cell(second_def);
auto second_ts = second_cell.timestamp();
ts = std::max(ts, second_ts);
// Alternator isn't supposed to have more than two col_ids!
if (col_ids.size() != 2) [[unlikely]] {
utils::on_internal_error(format("Unexpected col_ids length {}", col_ids.size()));
}
}
if (cell.is_live()) {
r->apply(shadowable_tombstone(ts, now));
}
if (!col_ids.empty() || _view_info.has_computed_column_depending_on_base_non_primary_key()) {
// The view key could have been modified because it contains or
// depends on a non-primary-key. The fact that this function was
// called instead of update_entry() means the caller knows it
// wants to delete the old row (with the given deletion_ts) and
// will create a different one. So let's honor this.
r->apply(shadowable_tombstone(deletion_ts, now));
} else {
// "update" caused the base row to have been deleted, and !col_id
// means view row is the same - so it needs to be deleted as well
@@ -1100,15 +1022,15 @@ bool view_updates::can_skip_view_updates(const clustering_or_static_row& update,
* This method checks that the base row (before and after) matches the view filter before
* applying anything.
*/
void view_updates::update_entry(data_dictionary::database db, const partition_key& base_key, const clustering_or_static_row& update, const clustering_or_static_row& existing, gc_clock::time_point now) {
void view_updates::update_entry(data_dictionary::database db, const partition_key& base_key, const clustering_or_static_row& update, const clustering_or_static_row& existing, gc_clock::time_point now, row_marker update_marker) {
// While we know update and existing correspond to the same view entry,
// they may not match the view filter.
if (!matches_view_filter(db, *_base, _view_info, base_key, existing, now)) {
create_entry(db, base_key, update, now);
create_entry(db, base_key, update, now, update_marker);
return;
}
if (!matches_view_filter(db, *_base, _view_info, base_key, update, now)) {
do_delete_old_entry(base_key, existing, update, now);
do_delete_old_entry(base_key, existing, update, now, update_marker.timestamp());
return;
}
@@ -1117,7 +1039,7 @@ void view_updates::update_entry(data_dictionary::database db, const partition_ke
}
auto view_rows = get_view_rows(base_key, update, std::nullopt, {});
auto update_marker = compute_row_marker(update);
const auto kind = update.column_kind();
for (const auto& [r, action] : view_rows) {
if (auto rm = std::get_if<row_marker>(&action)) {
@@ -1133,6 +1055,8 @@ void view_updates::update_entry(data_dictionary::database db, const partition_ke
_op_count += view_rows.size();
}
// Note: despite the general-sounding name of this function, it is used
// just for the case of collection indexing.
void view_updates::update_entry_for_computed_column(
const partition_key& base_key,
const clustering_or_static_row& update,
@@ -1155,30 +1079,72 @@ void view_updates::update_entry_for_computed_column(
}
}
// view_updates::generate_update() is the main function for taking an update
// to a base table row - consisting of existing and updated versions of row -
// and creating from it zero or more updates to a given materialized view.
// These view updates may consist of updating an existing view row, deleting
// an old view row, and/or creating a new view row.
// There are several distinct cases depending on how many of the view's key
// columns are "new key columns", i.e., were regular key columns in the base
// or are a computed column based on a regular column (these computed columns
// are used by, for example, Alternator's GSI):
//
// Zero new key columns:
// The view row's key is composed only of base key columns, and those can't
// be changed in an update, so the view row remains alive as long as the
// base row is alive. The row marker for the view needs to be set to the
// same row marker in the base - to keep an empty view row alive for as long
// as an empty base row exists.
// Note that in this case, if there are *unselected* base columns, we may
// need to keep an empty view row alive even without a row marker because
// the base row (which has additional columns) is still alive. For that we
// have the "virtual columns" feature: In the zero new key columns case, we
// put unselected columns in the view as empty columns, to keep the view
// row alive.
//
// One new key column:
// In this case, there is a regular base column that is part of the view
// key. This regular column can be added or deleted in an update, or its
// expiration be set, and those can cause the view row - including its row
// marker - to need to appear or disappear as well. So the liveness of the
// cell of this one column determines the liveness of the view row and the row
// marker that we set for it.
//
// Two or more new key columns:
// This case is explicitly NOT supported in CQL - one cannot create a view
// with more than one base-regular column in its key. In general picking
// one liveness (timestamp and expiration) is not possible if there are
// multiple regular base columns in the view key, as those can have different
// liveness.
// However, we do allow this case for Alternator - we need to allow the case
// of two (but not more) because the DynamoDB API allows creating a GSI
// whose two key columns (hash and range key) were regular columns. We can
// support this case in Alternator because it doesn't use expiration (the
// "TTL" it does support is different), and doesn't support user-defined
// timestamps. But, the two columns can still have different timestamps -
// this happens if an update modifies just one of them. In this case the
// timestamp of the view update (and that of the row marker) is the later
// of these two updated columns.
void view_updates::generate_update(
data_dictionary::database db,
const partition_key& base_key,
const clustering_or_static_row& update,
const std::optional<clustering_or_static_row>& existing,
gc_clock::time_point now) {
// Note that the base PK columns in update and existing are the same, since we're intrinsically dealing
// with the same base row. So we have to check 3 things:
// 1) that the clustering key doesn't have a null, which can happen for compact tables. If that's the case,
// there are no corresponding entries.
// 2) if there is a column not part of the base PK in the view PK, whether it is changed by the update.
// 3) whether the update actually matches the view SELECT filter
// FIXME: The following if() is old code which may be related to COMPACT
// STORAGE. If this is a real case, refer to a test that demonstrates it.
// If it's not a real case, remove this if().
if (update.is_clustering_row()) {
if (!update.key()->is_full(*_base)) {
return;
}
}
if (_view_info.has_computed_column_depending_on_base_non_primary_key()) {
return update_entry_for_computed_column(base_key, update, existing, now);
}
if (!_base_info->has_base_non_pk_columns_in_view_pk) {
// If the view key depends on any regular column in the base, the update
// may change the view key and may require deleting an old view row and
// inserting a new row. The other case, which we'll handle here first,
// is easier and requires just modifying one view row.
if (!_base_info->has_base_non_pk_columns_in_view_pk &&
!_view_info.has_computed_column_depending_on_base_non_primary_key()) {
if (update.is_static_row()) {
// TODO: support static rows in views with pk only including columns from base pk
return;
@@ -1186,85 +1152,186 @@ void view_updates::generate_update(
// The view key is necessarily the same pre and post update.
if (existing && existing->is_live(*_base)) {
if (update.is_live(*_base)) {
update_entry(db, base_key, update, *existing, now);
update_entry(db, base_key, update, *existing, now, update.marker());
} else {
delete_old_entry(db, base_key, *existing, update, now);
delete_old_entry(db, base_key, *existing, update, now, api::missing_timestamp);
}
} else if (update.is_live(*_base)) {
create_entry(db, base_key, update, now);
create_entry(db, base_key, update, now, update.marker());
}
return;
}
const auto& col_ids = update.is_clustering_row()
? _base_info->base_regular_columns_in_view_pk()
: _base_info->base_static_columns_in_view_pk();
// The view has a non-primary-key column from the base table as its primary key.
// That means it's either a regular or static column. If we are currently
// processing an update which does not correspond to the column's kind,
// just stop here.
if (col_ids.empty()) {
// Find the view key columns that may be changed by an update.
// This case is interesting because a change to the view key means that
// we may need to delete an old view row and/or create a new view row.
// The columns we look for are view key columns that are neither base key
// columns nor computed columns based just on key columns. In other words,
// we look here for columns which were regular columns or static columns
// in the base table, or computed columns based on regular columns.
struct updatable_view_key_col {
column_id view_col_id;
regular_column_transformation::result before;
regular_column_transformation::result after;
};
std::vector<updatable_view_key_col> updatable_view_key_cols;
for (const column_definition& view_col : _view->primary_key_columns()) {
if (view_col.is_computed()) {
const column_computation& computation = view_col.get_computation();
if (computation.depends_on_non_primary_key_column()) {
// Column is a computed column that does not depend just on
// the base key, so it may change in the update.
if (auto* c = dynamic_cast<const regular_column_transformation*>(&computation)) {
updatable_view_key_cols.emplace_back(view_col.id,
existing ? c->compute_value(*_base, base_key, *existing) : regular_column_transformation::result(),
c->compute_value(*_base, base_key, update));
} else {
// The only other column_computation we have which has
// depends_on_non_primary_key_column is
// collection_column_computation, and we have a special
// function to handle that case:
return update_entry_for_computed_column(base_key, update, existing, now);
}
}
} else {
const column_definition* base_col = _base->get_column_definition(view_col.name());
if (!base_col) {
on_internal_error(vlogger, fmt::format("Column {} in view {}.{} was not found in the base table {}.{}",
view_col.name(), _view->ks_name(), _view->cf_name(), _base->ks_name(), _base->cf_name()));
}
// If the view key column was also a base primary key column, then
// it can't possibly change in this update. But if the column was
// not a primary key column - i.e., a regular column or static
// column - the update might have changed it and we need to list it
// in updatable_view_key_cols.
// We check base_col->kind == update.column_kind() instead of just
// !base_col->is_primary_key() because when update is a static row
// we know it can't possibly update a regular column (and vice
// versa).
if (base_col->kind == update.column_kind()) {
// This is view key, so we know it is atomic
std::optional<atomic_cell_view> after;
auto afterp = update.cells().find_cell(base_col->id);
if (afterp) {
after = afterp->as_atomic_cell(*base_col);
}
std::optional<atomic_cell_view> before;
if (existing) {
auto beforep = existing->cells().find_cell(base_col->id);
if (beforep) {
before = beforep->as_atomic_cell(*base_col);
}
}
updatable_view_key_cols.emplace_back(view_col.id,
before ? regular_column_transformation::result(*before) : regular_column_transformation::result(),
after ? regular_column_transformation::result(*after) : regular_column_transformation::result());
}
}
}
// If we reached here, the view has a non-primary-key column from the base
// table as its primary key. That means it's either a regular or static
// column. If we are currently processing an update which does not
// correspond to the column's kind, updatable_view_key_cols will be empty
// and we can just stop here.
if (updatable_view_key_cols.empty()) {
return;
}
const auto kind = update.column_kind();
// If one of the key columns is missing, set has_new_row = false
// meaning that after the update there will be no view row.
// If one of the key columns is missing in the existing value,
// set has_old_row = false meaning we don't have an old row to
// delete.
// Use updatable_view_key_cols - the before and after values of the
// view key columns that may have changed - to determine if the update
// changes an existing view row, deletes an old row or creates a new row.
bool has_old_row = true;
bool has_new_row = true;
bool same_row = true;
for (auto col_id : col_ids) {
auto* after = update.cells().find_cell(col_id);
auto& cdef = _base->column_at(kind, col_id);
if (existing) {
auto* before = existing->cells().find_cell(col_id);
// Note that this cell is necessarily atomic, because col_ids are
// view key columns, and keys must be atomic.
if (before && before->as_atomic_cell(cdef).is_live()) {
if (after && after->as_atomic_cell(cdef).is_live()) {
// We need to compare just the values of the keys, not
// metadata like the timestamp. This is because below,
// if the old and new view row have the same key, we need
// to be sure to reach the update_entry() case.
auto cmp = compare_unsigned(before->as_atomic_cell(cdef).value(), after->as_atomic_cell(cdef).value());
if (cmp != 0) {
same_row = false;
}
bool same_row = true; // undefined if either has_old_row or has_new_row are false
for (const auto& u : updatable_view_key_cols) {
if (u.before.has_value()) {
if (u.after.has_value()) {
if (compare_unsigned(u.before.get_value(), u.after.get_value()) != 0) {
same_row = false;
}
} else {
has_old_row = false;
has_new_row = false;
}
} else {
has_old_row = false;
}
if (!after || !after->as_atomic_cell(cdef).is_live()) {
has_new_row = false;
if (!u.after.has_value()) {
has_new_row = false;
}
}
}
// If has_new_row, calculate a row marker for this view row - i.e., a
// timestamp and ttl - based on those of the updatable view key column
// (or, in an Alternator-only extension, more than one).
row_marker new_row_rm; // only set if has_new_row
if (has_new_row) {
// Note:
// 1. By reaching here we know that updatable_view_key_cols has at
// least one member (in CQL, it's always one, in Alternator it
// may be two).
// 2. Because has_new_row, we know all elements in that array have
// after.has_value() true, so we can use after.get_ts() et al.
api::timestamp_type new_row_ts = updatable_view_key_cols[0].after.get_ts();
// This is the Alternator-only support for *two* regular base columns
// that become view key columns. The timestamp we use is the *maximum*
// of the two key columns, as explained in pull-request #17172.
if (updatable_view_key_cols.size() > 1) {
auto second_ts = updatable_view_key_cols[1].after.get_ts();
new_row_ts = std::max(new_row_ts, second_ts);
// Alternator isn't supposed to have more than two updatable view key columns!
if (updatable_view_key_cols.size() != 2) [[unlikely]] {
utils::on_internal_error(format("Unexpected updatable_view_key_col length {}", updatable_view_key_cols.size()));
}
}
// We assume that either updatable_view_key_cols has just one column
// (the only situation allowed in CQL) or if there is more than one
// they have the same expiry information (in Alternator, there is
// never a CQL TTL set).
new_row_rm = row_marker(new_row_ts, updatable_view_key_cols[0].after.get_ttl(), updatable_view_key_cols[0].after.get_expiry());
}
if (has_old_row) {
// As explained in #19977, when there is one updatable_view_key_cols
// (the only case allowed in CQL) the deletion timestamp is before's
// timestamp. As explained in #17119, if there are two of them (only
// possible in Alternator), we take the maximum.
// Note:
// 1. By reaching here we know that updatable_view_key_cols has at
// least one member (in CQL, it's always one, in Alternator it
// may be two).
// 2. Because has_old_row, we know all elements in that array have
// before.has_value() true, so we can use before.get_ts().
auto old_row_ts = updatable_view_key_cols[0].before.get_ts();
if (updatable_view_key_cols.size() > 1) {
// This is the Alternator-only support for two regular base
// columns that become view key columns. See explanation in
// view_updates::compute_row_marker().
auto second_ts = updatable_view_key_cols[1].before.get_ts();
old_row_ts = std::max(old_row_ts, second_ts);
// Alternator isn't supposed to have more than two updatable view key columns!
if (updatable_view_key_cols.size() != 2) [[unlikely]] {
utils::on_internal_error(format("Unexpected updatable_view_key_col length {}", updatable_view_key_cols.size()));
}
}
if (has_new_row) {
if (same_row) {
update_entry(db, base_key, update, *existing, now);
update_entry(db, base_key, update, *existing, now, new_row_rm);
} else {
// This code doesn't work if the old and new view row have the
// same key, because if they do we get both data and tombstone
// for the same timestamp (now) and the tombstone wins. This
// is why we need the "same_row" case above - it's not just a
// performance optimization.
delete_old_entry(db, base_key, *existing, update, now);
create_entry(db, base_key, update, now);
// The following code doesn't work if the old and new view row
// have the same key, because if they do we can get both data
// and tombstone for the same timestamp and the tombstone
// wins. This is why we need the "same_row" case above - it's
// not just a performance optimization.
delete_old_entry(db, base_key, *existing, update, now, old_row_ts);
create_entry(db, base_key, update, now, new_row_rm);
}
} else {
delete_old_entry(db, base_key, *existing, update, now);
delete_old_entry(db, base_key, *existing, update, now, old_row_ts);
}
} else if (has_new_row) {
create_entry(db, base_key, update, now);
create_entry(db, base_key, update, now, new_row_rm);
}
}
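The final decision in `view_updates::generate_update()` (update in place, delete the old view row, create a new one, or both) can be summarized in a small standalone sketch. This is not the ScyllaDB code: `key_col_state`, `view_action`, `decision` and `classify` are hypothetical names, column values are plain strings, TTL and expiry are left out, and the sketch takes the maximum timestamp over all columns instead of enforcing the two-column limit that the real function checks.

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for one updatable view key column:
// its live value (if any) before and after the base-table update.
struct key_col_state {
    std::optional<std::string> before;  // empty: no live value before the update
    std::optional<std::string> after;   // empty: no live value after the update
    std::int64_t before_ts = 0;         // write timestamp of the "before" cell
    std::int64_t after_ts = 0;          // write timestamp of the "after" cell
};

enum class view_action { none, create, remove, update, remove_and_create };

struct decision {
    view_action action;
    std::optional<std::int64_t> deletion_ts;  // timestamp for deleting the old view row
    std::optional<std::int64_t> new_row_ts;   // timestamp for the new row marker
};

// Assumption: mirrors the has_old_row / has_new_row / same_row logic of the diff,
// using the latest timestamp among the key columns in each direction.
decision classify(const std::vector<key_col_state>& cols) {
    if (cols.empty()) {
        return {view_action::none, std::nullopt, std::nullopt};
    }
    bool has_old_row = true;
    bool has_new_row = true;
    bool same_row = true;
    std::int64_t old_ts = std::numeric_limits<std::int64_t>::min();
    std::int64_t new_ts = std::numeric_limits<std::int64_t>::min();
    for (const auto& c : cols) {
        if (c.before) {
            old_ts = std::max(old_ts, c.before_ts);
            // Compare only the key values, never the timestamps.
            if (c.after && *c.before != *c.after) {
                same_row = false;
            }
        } else {
            has_old_row = false;
        }
        if (c.after) {
            new_ts = std::max(new_ts, c.after_ts);
        } else {
            has_new_row = false;
        }
    }
    if (has_old_row && has_new_row) {
        if (same_row) {
            return {view_action::update, std::nullopt, new_ts};
        }
        return {view_action::remove_and_create, old_ts, new_ts};
    }
    if (has_old_row) {
        return {view_action::remove, old_ts, std::nullopt};
    }
    if (has_new_row) {
        return {view_action::create, std::nullopt, new_ts};
    }
    return {view_action::none, std::nullopt, std::nullopt};
}
```

For example, one column whose value changes from `x` to `y` yields `remove_and_create`, while an unchanged value with a newer timestamp yields `update`, which is the case the `same_row` comment says is needed for correctness and not only for performance.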
bool view_updates::is_partition_key_permutation_of_base_partition_key() const {
@@ -2995,6 +3062,12 @@ public:
_step.build_status.pop_back();
}
}
// before going back to the minimum token, advance current_key to the end
// and check for built views in that range.
_step.current_key = {_step.prange.end().value_or(dht::ring_position::max()).value().token(), partition_key::make_empty()};
check_for_built_views();
_step.current_key = {dht::minimum_token(), partition_key::make_empty()};
for (auto&& vs : _step.build_status) {
vs.next_token = dht::minimum_token();

@@ -240,10 +240,10 @@ private:
};
std::vector<view_row_entry> get_view_rows(const partition_key& base_key, const clustering_or_static_row& update, const std::optional<clustering_or_static_row>& existing, row_tombstone update_tomb);
bool can_skip_view_updates(const clustering_or_static_row& update, const clustering_or_static_row& existing) const;
void create_entry(data_dictionary::database db, const partition_key& base_key, const clustering_or_static_row& update, gc_clock::time_point now);
void delete_old_entry(data_dictionary::database db, const partition_key& base_key, const clustering_or_static_row& existing, const clustering_or_static_row& update, gc_clock::time_point now);
void do_delete_old_entry(const partition_key& base_key, const clustering_or_static_row& existing, const clustering_or_static_row& update, gc_clock::time_point now);
void update_entry(data_dictionary::database db, const partition_key& base_key, const clustering_or_static_row& update, const clustering_or_static_row& existing, gc_clock::time_point now);
void create_entry(data_dictionary::database db, const partition_key& base_key, const clustering_or_static_row& update, gc_clock::time_point now, row_marker update_marker);
void delete_old_entry(data_dictionary::database db, const partition_key& base_key, const clustering_or_static_row& existing, const clustering_or_static_row& update, gc_clock::time_point now, api::timestamp_type deletion_ts);
void do_delete_old_entry(const partition_key& base_key, const clustering_or_static_row& existing, const clustering_or_static_row& update, gc_clock::time_point now, api::timestamp_type deletion_ts);
void update_entry(data_dictionary::database db, const partition_key& base_key, const clustering_or_static_row& update, const clustering_or_static_row& existing, gc_clock::time_point now, row_marker update_marker);
void update_entry_for_computed_column(const partition_key& base_key, const clustering_or_static_row& update, const std::optional<clustering_or_static_row>& existing, gc_clock::time_point now);
};

@@ -12,15 +12,16 @@ Architecture: any
Description: Scylla database main configuration file
Scylla is a highly scalable, eventually consistent, distributed,
partitioned row DB.
Replaces: %{product}-server (<< 1.1)
Replaces: %{product}-server (<< 1.1), scylla-enterprise-conf (<< 2025.1.0~)
Conflicts: %{product}-server (<< 1.1)
Breaks: scylla-enterprise-conf (<< 2025.1.0~)
Package: %{product}-server
Architecture: any
Depends: ${misc:Depends}, %{product}-conf (= ${binary:Version}), %{product}-python3 (= ${binary:Version})
Replaces: %{product}-tools (<<5.5)
Breaks: %{product}-tools (<<5.5)
Description: Scylla database server binaries
Replaces: %{product}-tools (<<5.5), scylla-enterprise-tools (<< 2024.2.0~), scylla-enterprise-server (<< 2025.1.0~)
Breaks: %{product}-tools (<<5.5), scylla-enterprise-tools (<< 2024.2.0~), scylla-enterprise-server (<< 2025.1.0~)
Description: Scylla database server binaries
Scylla is a highly scalable, eventually consistent, distributed,
partitioned row DB.
@@ -29,6 +30,8 @@ Section: debug
Priority: extra
Architecture: any
Depends: %{product}-server (= ${binary:Version}), ${misc:Depends}
Replaces: scylla-enterprise-server-dbg (<< 2025.1.0~)
Breaks: scylla-enterprise-server-dbg (<< 2025.1.0~)
Description: debugging symbols for %{product}-server
Scylla is a highly scalable, eventually consistent, distributed,
partitioned row DB.
@@ -37,13 +40,17 @@ Description: debugging symbols for %{product}-server
Package: %{product}-kernel-conf
Architecture: any
Depends: procps
Replaces: scylla-enterprise-kernel-conf (<< 2025.1.0~)
Breaks: scylla-enterprise-kernel-conf (<< 2025.1.0~)
Description: Scylla kernel tuning configuration
Scylla is a highly scalable, eventually consistent, distributed,
partitioned row DB.
Package: %{product}-node-exporter
Architecture: any
Replaces: scylla-enterprise-node-exporter (<< 2025.1.0~)
Conflicts: prometheus-node-exporter
Breaks: scylla-enterprise-node-exporter (<< 2025.1.0~)
Description: Prometheus exporter for machine metrics
Prometheus exporter for machine metrics, written in Go with pluggable metric collectors.
@@ -54,6 +61,49 @@ Depends: %{product}-server (= ${binary:Version})
, %{product}-kernel-conf (= ${binary:Version})
, %{product}-node-exporter (= ${binary:Version})
, %{product}-cqlsh (= ${binary:Version})
Replaces: scylla-enterprise (<< 2025.1.0~)
Breaks: scylla-enterprise (<< 2025.1.0~)
Description: Scylla database metapackage
Scylla is a highly scalable, eventually consistent, distributed,
partitioned row DB.
Package: scylla-enterprise-conf
Depends: %{product}-conf (= ${binary:Version})
Architecture: all
Priority: optional
Section: oldlibs
Description: transitional package
This is a transitional package. It can safely be removed.
Package: scylla-enterprise-server
Depends: %{product}-server (= ${binary:Version})
Architecture: all
Priority: optional
Section: oldlibs
Description: transitional package
This is a transitional package. It can safely be removed.
Package: scylla-enterprise
Depends: %{product} (= ${binary:Version})
Architecture: all
Priority: optional
Section: oldlibs
Description: transitional package
This is a transitional package. It can safely be removed.
Package: scylla-enterprise-kernel-conf
Depends: %{product}-kernel-conf (= ${binary:Version})
Architecture: all
Priority: optional
Section: oldlibs
Description: transitional package
This is a transitional package. It can safely be removed.
Package: scylla-enterprise-node-exporter
Depends: %{product}-node-exporter (= ${binary:Version})
Architecture: all
Priority: optional
Section: oldlibs
Description: transitional package
This is a transitional package. It can safely be removed.

@@ -11,6 +11,8 @@ endif
product := $(subst -server,,$(DEB_SOURCE))
libreloc_list := $(shell find scylla/libreloc/ -maxdepth 1 -type f -not -name .*.hmac -and -not -name gnutls.config -printf '-X%f ')
libexec_list := $(shell find scylla/libexec/ -maxdepth 1 -type f -not -name scylla -and -not -name iotune -printf '-X%f ')
override_dh_auto_configure:
override_dh_auto_build:
@@ -38,7 +40,7 @@ endif
override_dh_strip:
# The binaries (ethtool...patchelf) don't pass dh_strip after going through patchelf. Since they are
# already stripped, nothing is lost if we exclude them, so that's what we do.
dh_strip -Xlibprotobuf.so.15 -Xld.so -Xethtool -Xgawk -Xgzip -Xhwloc-calc -Xhwloc-distrib -Xifconfig -Xlscpu -Xnetstat -Xpatchelf --dbg-package=$(product)-server-dbg
dh_strip $(libreloc_list) $(libexec_list) --dbg-package=$(product)-server-dbg
find $(CURDIR)/debian/$(product)-server-dbg/usr/lib/debug/.build-id/ -name "*.debug" -exec objcopy --decompress-debug-sections {} \;
override_dh_makeshlibs:

@@ -21,6 +21,7 @@ opt/scylladb/scyllatop/*
opt/scylladb/scripts/libexec/*
opt/scylladb/bin/*
opt/scylladb/libreloc/*
opt/scylladb/libreloc/.*.hmac
opt/scylladb/libexec/*
usr/lib/scylla/*
var/lib/scylla/data

@@ -13,7 +13,8 @@ Requires: %{product}-python3 = %{version}-%{release}
Requires: %{product}-kernel-conf = %{version}-%{release}
Requires: %{product}-node-exporter = %{version}-%{release}
Requires: %{product}-cqlsh = %{version}-%{release}
Obsoletes: scylla-server < 1.1
Provides: scylla-enterprise = %{version}-%{release}
Obsoletes: scylla-enterprise < 2025.1.0
%global _debugsource_template %{nil}
%global _debuginfo_subpackages %{nil}
@@ -73,6 +74,10 @@ Requires: %{product}-python3 = %{version}-%{release}
AutoReqProv: no
Provides: %{product}-tools:%{_bindir}/nodetool
Provides: %{product}-tools:%{_sysconfigdir}/bash_completion.d/nodetool-completion
Provides: scylla-enterprise-tools:%{_bindir}/nodetool
Provides: scylla-enterprise-tools:%{_sysconfigdir}/bash_completion.d/nodetool-completion
Provides: scylla-enterprise-server = %{version}-%{release}
Obsoletes: scylla-enterprise-server < 2025.1.0
%description server
This package contains ScyllaDB server.
@@ -132,6 +137,7 @@ ln -sfT /etc/scylla /var/lib/scylla/conf
/opt/scylladb/scyllatop/*
/opt/scylladb/bin/*
/opt/scylladb/libreloc/*
/opt/scylladb/libreloc/.*.hmac
/opt/scylladb/libexec/*
%{_prefix}/lib/scylla/*
%attr(0755,scylla,scylla) %dir %{_sharedstatedir}/scylla/
@@ -156,6 +162,8 @@ ln -sfT /etc/scylla /var/lib/scylla/conf
Group: Applications/Databases
Summary: Scylla configuration package
Obsoletes: scylla-server < 1.1
Provides: scylla-enterprise-conf = %{version}-%{release}
Obsoletes: scylla-enterprise-conf < 2025.1.0
%description conf
This package contains the main scylla configuration file.
@@ -176,6 +184,8 @@ Summary: Scylla configuration package for the Linux kernel
Requires: kmod
# tuned overwrites our sysctl settings
Obsoletes: tuned >= 2.11.0
Provides: scylla-enterprise-kernel-conf = %{version}-%{release}
Obsoletes: scylla-enterprise-kernel-conf < 2025.1.0
%description kernel-conf
This package contains Linux kernel configuration changes for the Scylla database. Install this package
@@ -212,6 +222,8 @@ Group: Applications/Databases
Summary: Prometheus exporter for machine metrics
License: ASL 2.0
URL: https://github.com/prometheus/node_exporter
Provides: scylla-enterprise-node-exporter = %{version}-%{release}
Obsoletes: scylla-enterprise-node-exporter < 2025.1.0
%description node-exporter
Prometheus exporter for machine metrics, written in Go with pluggable metric collectors.

@@ -187,8 +187,8 @@ ATTACH SERVICE_LEVEL oltp TO bob;
Note that `alternator_enforce_authorization` has to be enabled in Scylla configuration.
See [Authorization](##Authorization) section to learn more about roles and authorization.
See <https://enterprise.docs.scylladb.com/stable/using-scylla/workload-prioritization.html>
to read about **Workload Prioritization** in detail.
See [Workload Prioritization](../features/workload-prioritization)
to read about Workload Prioritization in detail.
## Metrics
@@ -272,12 +272,6 @@ behave the same in Alternator. However, there are a few features which we have
not implemented yet. Unimplemented features return an error when used, so
they should be easy to detect. Here is a list of these unimplemented features:
* Currently in Alternator, a GSI (Global Secondary Index) can only be added
to a table at table creation time. DynamoDB allows adding a GSI (but not an
LSI) to an existing table using an UpdateTable operation, and similarly it
allows removing a GSI from a table.
<https://github.com/scylladb/scylla/issues/11567>
* GSI (Global Secondary Index) and LSI (Local Secondary Index) may be
configured to project only a subset of the base-table attributes to the
index. This option is not yet respected by Alternator - all attributes
@@ -319,7 +313,7 @@ they should be easy to detect. Here is a list of these unimplemented features:
RestoreTableToPointInTime
* DynamoDB's encryption-at-rest settings are not supported. The Encryption-
at-rest feature is available in Scylla Enterprise, but needs to be
at-rest feature is available in ScyllaDB, but needs to be
enabled and configured separately, not through the DynamoDB API.
* No support for throughput accounting or capping. As mentioned above, the
@@ -378,3 +372,14 @@ they should be easy to detect. Here is a list of these unimplemented features:
that can be used to forbid table deletion. This table option was added to
DynamoDB in March 2023.
<https://github.com/scylladb/scylla/issues/14482>
* Alternator does not support the table option WarmThroughput that can be
used to check or guarantee that the database has "warmed" to handle a
particular throughput. This table option was added to DynamoDB in
November 2024.
<https://github.com/scylladb/scylladb/issues/21853>
* Alternator does not support the table option MultiRegionConsistency
that can be used to achieve consistent reads on global (multi-region) tables.
This table option was added as a preview to DynamoDB in December 2024.
<https://github.com/scylladb/scylladb/issues/21852>

@@ -144,3 +144,46 @@ If a certain data center or rack has no functional nodes, or doesn't even
exist, an empty list (`[]`) is returned by the `/localnodes` request.
A client should be prepared to consider expanding the node search to an
entire data center, or other data centers, in that case.
## Tablets
"Tablets" are ScyllaDB's new approach to replicating data across a cluster.
It replaces the older approach which was named "vnodes". Compared to vnodes,
tablets are smaller pieces of tables that are easier to move between nodes,
and allow for faster growing or shrinking of the cluster when needed.
In this version, tablet support is incomplete and not all of the features
which Alternator needs are supported with tablets. So currently, new
Alternator tables default to using vnodes - not tablets.
However, if you do want to create an Alternator table which uses tablets,
you can do this by specifying the `experimental:initial_tablets` tag in
the CreateTable operation. The value of this tag can be:
* Any valid integer as the value of this tag enables tablets.
Typically the number "0" is used - which tells ScyllaDB to pick a reasonable
number of initial tablets. But any other number can be used, and this
number overrides the default choice of initial number of tablets.
* Any non-integer value - e.g., the string "none" - creates the table
without tablets - i.e., using vnodes.
The `experimental:initial_tablets` tag only has any effect while creating
a new table with CreateTable - changing it later has no effect.
Because the tablets support is incomplete, when tablets are enabled for an
Alternator table, the following features will not work for this table:
* The table must have one of the write isolation modes which does
not use LWT, because it's not supported with tablets. The allowed write
isolation modes are `forbid_rmw` or `unsafe_rmw`.
Setting the isolation mode to `always_use_lwt` will succeed, but the writes
themselves will fail with an InternalServerError. At that point you can
still change the write isolation mode of the table to a supported mode.
See <https://github.com/scylladb/scylladb/issues/18068>.
* Enabling TTL with UpdateTimeToLive doesn't work (results in an error).
See <https://github.com/scylladb/scylla/issues/16567>.
* Enabling Streams with CreateTable or UpdateTable doesn't work
(results in an error).
See <https://github.com/scylladb/scylla/issues/16317>.
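
The rule for the `experimental:initial_tablets` tag described above (an integer value enables tablets, any other value selects vnodes) can be illustrated with a minimal sketch. It is not Alternator's actual parsing code: the function name `parse_initial_tablets_tag` is hypothetical, and treating "valid integer" as "the whole string parses as a base-10 `int`" is an assumption.

```cpp
#include <charconv>
#include <optional>
#include <string_view>
#include <system_error>

// Hypothetical helper: returns the requested initial tablet count when the
// tag value is an integer (tablets enabled), or std::nullopt when it is not
// (table created with vnodes).
std::optional<int> parse_initial_tablets_tag(std::string_view value) {
    int n = 0;
    const char* first = value.data();
    const char* last = first + value.size();
    auto [ptr, ec] = std::from_chars(first, last, n);
    // Assumption: the entire string must be consumed, so "4x" is not an integer.
    if (ec != std::errc{} || ptr != last) {
        return std::nullopt;
    }
    return n;
}
```

Under these assumptions, the value `"0"` enables tablets and leaves the initial count to ScyllaDB, `"8"` requests eight initial tablets, and `"none"` falls back to vnodes.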

@@ -70,8 +70,6 @@ Set the parameters for :ref:`Leveled Compaction <leveled-compaction-strategy-lcs
Incremental Compaction Strategy (ICS)
=====================================
.. versionadded:: 2019.1.4 Scylla Enterprise
ICS principles of operation are similar to those of STCS, merely replacing the increasingly larger SSTables in each tier, by increasingly longer SSTable runs, modeled after LCS runs, but using larger fragment size of 1 GB, by default.
Compaction is triggered when there are two or more runs of roughly the same size. These runs are incrementally compacted with each other, producing a new SSTable run, while incrementally releasing space as soon as each SSTable in the input run is processed and compacted. This method eliminates the high temporary space amplification problem of STCS by limiting the overhead to twice the (constant) fragment size, per shard.

@@ -12,6 +12,7 @@ ScyllaDB Architecture
SSTable <sstable/index/>
Compaction Strategies <compaction/compaction-strategies>
Raft Consensus Algorithm in ScyllaDB </architecture/raft>
Zero-token Nodes </architecture/zero-token-nodes>
* :doc:`Data Distribution with Tablets </architecture/tablets/>` - Tablets in ScyllaDB
@@ -22,5 +23,6 @@ ScyllaDB Architecture
* :doc:`SSTable </architecture/sstable/index/>` - ScyllaDB SSTable 2.0 and 3.0 Format Information
* :doc:`Compaction Strategies </architecture/compaction/compaction-strategies>` - High-level analysis of different compaction strategies
* :doc:`Raft Consensus Algorithm in ScyllaDB </architecture/raft>` - Overview of how Raft is implemented in ScyllaDB.
* :doc:`Zero-token Nodes </architecture/zero-token-nodes>` - Nodes that do not replicate any data.
Learn more about these topics in the `ScyllaDB University: Architecture lesson <https://university.scylladb.com/courses/scylla-essentials-overview/lessons/architecture/>`_.

@@ -15,7 +15,7 @@ SSTable Version Support
- ScyllaDB Enterprise Version
- ScyllaDB Open Source Version
* - 3.x ('me')
- 2022.2
- 2022.2 and above
- 5.1 and above
* - 3.x ('md')
- 2021.1

@@ -9,11 +9,7 @@ ScyllaDB SSTable Format
.. include:: _common/sstable_what_is.rst
* In ScyllaDB 6.0 and above, *me* format is enabled by default.
* In ScyllaDB Enterprise 2021.1, ScyllaDB 4.3 and above, *md* format is enabled by default.
* In ScyllaDB 3.1 and above, *mc* format is enabled by default.
In ScyllaDB 6.0 and above, *me* format is enabled by default.
For more information on each of the SSTable formats, see below:

View File

@@ -12,17 +12,7 @@ ScyllaDB SSTable - 3.x
.. include:: ../_common/sstable_what_is.rst
* In ScyllaDB 6.0 and above, the ``me`` format is mandatory, and ``md`` format is used only when upgrading from an existing cluster using ``md``. The ``sstable_format`` parameter is ignored if it is set to ``md``.
* In ScyllaDB 5.1 and above, the ``me`` format is enabled by default.
* In ScyllaDB 4.3 to 5.0, the ``md`` format is enabled by default.
* In ScyllaDB 3.1 to 4.2, the ``mc`` format is enabled by default.
* In ScyllaDB 3.0, the ``mc`` format is disabled by default. You can enable it by adding the ``enable_sstables_mc_format`` parameter set to ``true`` in the ``scylla.yaml`` file. For example:
.. code-block:: shell
enable_sstables_mc_format: true
.. REMOVE IN FUTURE VERSIONS - Remove the note above in version 5.2.
In ScyllaDB 6.0 and above, the ``me`` format is mandatory, and ``md`` format is used only when upgrading from an existing cluster using ``md``. The ``sstable_format`` parameter is ignored if it is set to ``md``.
Additional Information
-------------------------

View File

@@ -75,15 +75,7 @@ to a new node.
File-based Streaming
========================
:label-tip:`ScyllaDB Enterprise`
File-based streaming is a ScyllaDB Enterprise-only feature that optimizes
tablet migration.
In ScyllaDB Open Source, migrating tablets is performed by streaming mutation
fragments, which involves deserializing SSTable files into mutation fragments
and re-serializing them back into SSTables on the other node.
In ScyllaDB Enterprise, migrating tablets is performed by streaming entire
Migrating tablets is performed by streaming entire
SSTables, which does not require (de)serializing or processing mutation fragments.
As a result, less data is streamed over the network, and less CPU is consumed,
especially for data models that contain small cells.
@@ -143,9 +135,17 @@ You can create a keyspace with tablets enabled with the ``tablets = {'enabled':
the keyspace schema with ``tablets = { 'enabled': false }`` or
``tablets = { 'enabled': true }``.
.. _tablets-limitations:
Limitations and Unsupported Features
--------------------------------------
.. warning::
If a keyspace has tablets enabled, it must remain :term:`RF-rack-valid <RF-rack-valid keyspace>`
throughout its lifetime. Failing to keep that invariant satisfied may result in data inconsistencies,
performance problems, or other issues.
The following ScyllaDB features are not supported if a keyspace has tablets
enabled:
@@ -157,6 +157,15 @@ enabled:
If you plan to use any of the above features, CREATE your keyspace
:ref:`with tablets disabled <tablets-enable-tablets>`.
The following ScyllaDB features are disabled by default when used with a keyspace
that has tablets enabled:
* Materialized Views (MV)
* Secondary indexes (SI, as it depends on MV)
To enable MV and SI for tablet keyspaces, use the ``--experimental-features=views-with-tablets``
configuration option. See :ref:`Views with tablets <admin-views-with-tablets>` for details.
Resharding in keyspaces with tablets enabled has the following limitations:
* ScyllaDB does not support reducing the number of shards after node restart.

View File

@@ -0,0 +1,28 @@
=========================
Zero-token Nodes
=========================
By default, all nodes in a cluster own a set of token ranges and are used to
replicate data. In certain circumstances, you may choose to add a node that
doesn't own any token. Such nodes are referred to as zero-token nodes. They
do not have a copy of the data but only participate in Raft quorum voting.
To configure a zero-token node, set the ``join_ring`` parameter to ``false``.
You can use zero-token nodes in multi-DC deployments to reduce the risk of
losing a quorum of nodes.
See :doc:`Preventing Quorum Loss in Symmetrical Multi-DC Clusters </operating-scylla/procedures/cluster-management/arbiter-dc>` for details.
Note that:
* Zero-token nodes are ignored by drivers, so there is no need to change
the load balancing policy on the clients after adding zero-token nodes
to the cluster.
* Zero-token nodes never store replicated data, so running ``nodetool rebuild``,
``nodetool repair``, and ``nodetool cleanup`` can be skipped as these operations do not
affect zero-token nodes.
* Racks consisting solely of zero-token nodes are not taken into consideration
when deciding whether a keyspace is :term:`RF-rack-valid <RF-rack-valid keyspace>`.
However, an RF-rack-valid keyspace must have the replication factor equal to 0
in an :doc:`arbiter DC </operating-scylla/procedures/cluster-management/arbiter-dc>`.
Otherwise, it is RF-rack-invalid.

View File

@@ -1,3 +0,0 @@
By default, a keyspace is created with tablets enabled. The ``tablets`` option
is used to opt out a keyspace from tablets-based distribution; see :ref:`Enabling Tablets <tablets-enable-tablets>`
for details.

View File

@@ -170,8 +170,6 @@ LCS options
Incremental Compaction Strategy (ICS)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. versionadded:: 2019.1.4 Scylla Enterprise
When using ICS, SSTable runs are put in different buckets depending on their size.
When an SSTable run is bucketed, the average size of the runs in the bucket is compared to the new run, as well as the ``bucket_high`` and ``bucket_low`` levels.

View File

@@ -203,18 +203,6 @@ An example that excludes a datacenter while using ``replication_factor``::
DESCRIBE KEYSPACE excalibur
CREATE KEYSPACE excalibur WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': '3'} AND durable_writes = true;
.. only:: opensource
Keyspace storage options :label-caution:`Experimental`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
By default, SSTables of a keyspace are stored locally.
As an alternative, you can configure your keyspace to be stored
on Amazon S3 or another S3-compatible object store.
See :ref:`Keyspace storage options <admin-keyspace-storage-options>` for details.
.. _tablets:
The ``tablets`` property
@@ -232,7 +220,15 @@ sub-option type description
``'initial'`` int The number of tablets to start with
===================================== ====== =============================================
.. scylladb_include_flag:: tablets-default.rst
By default, a keyspace is created with tablets enabled. You can use the ``tablets`` option
to opt out a keyspace from tablets-based distribution.
You may want to opt out if you plan to use features that are not supported for keyspaces
with tablets enabled. Keyspaces using tablets must also remain :term:`RF-rack-valid <RF-rack-valid keyspace>`
throughout their lifetime. See :ref:`Limitations and Unsupported Features <tablets-limitations>`
for details.
**The ``initial`` sub-option (deprecated)**
A good rule of thumb to calculate initial tablets is to divide the expected total storage used
by tables in this keyspace by (``replication_factor`` * 5GB). For example, if you expect a 30TB
@@ -253,6 +249,14 @@ An example that creates a keyspace with 2048 tablets per table::
See :doc:`Data Distribution with Tablets </architecture/tablets>` for more information about tablets.
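The rule of thumb for the ``initial`` sub-option described above can be sketched as a small calculation. This is a minimal illustration, not part of ScyllaDB: the function names and the sample figures (30,000 GB, replication factor 3) are hypothetical, and rounding the estimate up to a power of two is an assumption of this sketch rather than something stated in this section.

```python
import math

# Per-replica target size taken from the rule of thumb above (5 GB).
TARGET_TABLET_SIZE_GB = 5


def estimate_initial_tablets(expected_storage_gb: float, replication_factor: int) -> int:
    """Divide the expected total storage by (replication_factor * 5 GB)."""
    if expected_storage_gb <= 0 or replication_factor <= 0:
        raise ValueError("storage and replication factor must be positive")
    return math.ceil(expected_storage_gb / (replication_factor * TARGET_TABLET_SIZE_GB))


def round_up_to_power_of_two(n: int) -> int:
    """Assumption of this sketch: round the estimate up to a power of two."""
    return 1 if n <= 1 else 1 << (n - 1).bit_length()


# Hypothetical input: 30,000 GB of expected storage, replication factor 3.
estimate = estimate_initial_tablets(30_000, 3)
print(estimate, round_up_to_power_of_two(estimate))
```

The result is only a starting point for sizing; treat it as an estimate to review, not a value to copy verbatim.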
Keyspace storage options :label-caution:`Experimental`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
By default, SSTables of a keyspace are stored locally.
As an alternative, you can configure your keyspace to be stored
on Amazon S3 or another S3-compatible object store.
See :ref:`Keyspace storage options <admin-keyspace-storage-options>` for details.
.. _use-statement:
USE
@@ -285,8 +289,8 @@ For instance::
The supported options are the same as :ref:`creating a keyspace <create-keyspace-statement>`.
ALTER KEYSPACE with Tablets :label-caution:`Experimental`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
ALTER KEYSPACE with Tablets
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Modifying a keyspace with tablets enabled is possible and doesn't require any special CQL syntax. However, there are some limitations:
@@ -295,6 +299,7 @@ Modifying a keyspace with tablets enabled is possible and doesn't require any sp
- If there's any other ongoing global topology operation, executing the ``ALTER`` statement will fail (with an explicit and specific error) and needs to be repeated.
- The ``ALTER`` statement may take longer than the regular query timeout, and even if it times out, it will continue to execute in the background.
- The replication strategy cannot be modified, as keyspaces with tablets only support ``NetworkTopologyStrategy``.
- The ``ALTER`` statement will fail if it would make the keyspace :term:`RF-rack-invalid <RF-rack-valid keyspace>`.
.. _drop-keyspace-statement:

View File

@@ -225,7 +225,9 @@ CREATE TYPE system.tablet_task_info (
tablet_task_id uuid,
request_time timestamp,
sched_nr bigint,
sched_time timestamp
sched_time timestamp,
repair_hosts_filter text,
repair_dcs_filter text,
)
~~~
@@ -255,6 +257,8 @@ Only tables which use tablet-based replication strategy have an entry here.
* `request_time` - The time the request is created.
* `sched_nr` - Number of times the request has been scheduled by the repair scheduler.
* `sched_time` - The time the request has been scheduled by the repair scheduler.
* `repair_hosts_filter` - Repair replicas listed in the comma-separated host_id list.
* `repair_dcs_filter` - Repair replicas listed in the comma-separated DC list.
`repair_scheduler_config` contains configuration for the repair scheduler. It contains the following values:
* `auto_repair_enabled` - When set to true, auto repair is enabled. Disabled by default.

View File

@@ -64,18 +64,20 @@ Briefly:
- `/task_manager/list_module_tasks/{module}` -
lists (by default non-internal) tasks in the module;
- `/task_manager/task_status/{task_id}` -
gets the task's status, unregisters the task if it's finished;
gets the task's status;
- `/task_manager/abort_task/{task_id}` -
aborts the task if it's abortable;
- `/task_manager/wait_task/{task_id}` -
waits for the task and gets its status;
- `/task_manager/task_status_recursive/{task_id}` -
gets statuses of the task and all its descendants in BFS
order, unregisters the task;
order;
- `/task_manager/ttl` -
gets or sets new ttl.
- `/task_manager/user_ttl` -
gets or sets new user ttl.
- `/task_manager/drain/{module}` -
unregisters all finished local tasks in the module.
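
Since querying a task's status no longer unregisters it, finished tasks can be removed explicitly through the drain endpoint listed above. The following is a minimal sketch of such a call. The helper names are hypothetical, and two details are assumptions not stated in this document: that the node's REST API is reachable at `http://localhost:10000`, and that the endpoint is invoked with `POST`.

```python
import urllib.request
from urllib.parse import quote

# Assumption: the node's REST API listens on localhost:10000.
DEFAULT_API_BASE = "http://localhost:10000"


def drain_url(module: str, base: str = DEFAULT_API_BASE) -> str:
    """Build the URL of the drain endpoint for one task manager module."""
    return f"{base}/task_manager/drain/{quote(module, safe='')}"


def drain_finished_tasks(module: str, base: str = DEFAULT_API_BASE) -> int:
    """Ask the node to unregister all finished local tasks in `module`.

    Assumption: the endpoint is invoked with POST.
    Returns the HTTP status code of the response.
    """
    request = urllib.request.Request(drain_url(module, base), method="POST")
    with urllib.request.urlopen(request) as response:
        return response.status
```

For example, `drain_finished_tasks("repair")` would target the `repair` module on the local node. Only local tasks are affected, so the call has to be repeated against each node that should be drained.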
# Virtual tasks

View File

@@ -124,6 +124,9 @@ Additionally to specific node states, there entire topology can also be in a tra
it from group 0. We also use this state to rollback a failed bootstrap or decommission.
- `rollback_to_normal` - the decommission or removenode operation failed. Rollback the operation by
moving the node we tried to decommission/remove back to the normal state.
- `lock` - the topology stays in this state until externally changed (to null state), preventing topology
requests from starting. Intended to be used in tests which want to prevent internally-triggered topology
operations during the test.
When a node bootstraps, we create new tokens for it and a new CDC generation
and enter the `commit_cdc_generation` state. Once the generation is committed,

View File

@@ -193,6 +193,8 @@ ScyllaDB comes with its own version of the Apache Cassandra client tools, in the
We recommend uninstalling Apache Cassandra before installing :code:`scylla-tools`.
.. TODO Update the example below when a patch release for 2025.1 is available
.. _faq-pinning:
Can I install or upgrade to a patch release other than latest on Debian or Ubuntu?

View File

@@ -18,7 +18,7 @@ For example, consider the following two workloads:
- Slow queries
- In essence - Latency agnostic
Using Service Level CQL commands, database administrators (working on Scylla Enterprise) can set different workload prioritization levels (levels of service) for each workload without sacrificing latency or throughput.
Using Service Level CQL commands, database administrators (working on ScyllaDB) can set different workload prioritization levels (levels of service) for each workload without sacrificing latency or throughput.
By assigning each service level to the different roles within your organization, DBAs ensure that each role_ receives the level of service the role requires.
.. _`role` : /operating-scylla/security/rbac_usecase/
@@ -425,7 +425,7 @@ In order for workload prioritization to take effect, application users need to b
Limits
======
Scylla Enterprise is limited to 8 service levels, including the default one; this means you can create up to 7 service levels.
ScyllaDB is limited to 8 service levels, including the default one; this means you can create up to 7 service levels.
Additional References

View File

@@ -1,21 +0,0 @@
You can `build ScyllaDB from source <https://github.com/scylladb/scylladb#build-prerequisites>`_ on other x86_64 or aarch64 platforms, without any guarantees.
+----------------------------+--------------------+-------+---------------+
| Linux Distributions |Ubuntu | Debian|Rocky / CentOS |
| | | |/ RHEL |
+----------------------------+------+------+------+-------+-------+-------+
| ScyllaDB Version / Version |20.04 |22.04 |24.04 | 11 | 8 | 9 |
+============================+======+======+======+=======+=======+=======+
| 6.2 | |v| | |v| | |v| | |v| | |v| | |v| |
+----------------------------+------+------+------+-------+-------+-------+
| 6.1 | |v| | |v| | |v| | |v| | |v| | |v| |
+----------------------------+------+------+------+-------+-------+-------+
* The recommended OS for ScyllaDB Open Source is Ubuntu 22.04.
* All releases are available as a Docker container and EC2 AMI, GCP, and Azure images.
Supported Architecture
-----------------------------
ScyllaDB Open Source supports x86_64 for all versions and AArch64 starting from ScyllaDB 4.6 and nightly build.
In particular, aarch64 support includes AWS EC2 Graviton.

View File

@@ -110,7 +110,7 @@ Google Compute Engine (GCE)
-----------------------------------
Pick a zone where Haswell CPUs are found. Local SSD performance offers, according to Google, less than 1 ms of latency and up to 680,000 read IOPS and 360,000 write IOPS.
Image with NVMe disk interface is recommended, CentOS 7 for ScyllaDB Enterprise 2020.1 and older, and Ubuntu 20 for 2021.1 and later.
Image with NVMe disk interface is recommended.
(`More info <https://cloud.google.com/compute/docs/disks/local-ssd>`_)
Recommended instances types are `n1-highmem <https://cloud.google.com/compute/docs/general-purpose-machines#n1_machines>`_ and `n2-highmem <https://cloud.google.com/compute/docs/general-purpose-machines#n2_machines>`_

View File

@@ -4,7 +4,7 @@ ScyllaDB Web Installer for Linux
ScyllaDB Web Installer is a platform-agnostic installation script you can run with ``curl`` to install ScyllaDB on Linux.
See `ScyllaDB Download Center <https://www.scylladb.com/download/#core>`_ for information on manually installing ScyllaDB with platform-specific installation packages.
See :doc:`Install ScyllaDB Linux Packages </getting-started/install-scylla/install-on-linux/>` for information on manually installing ScyllaDB with platform-specific installation packages.
Prerequisites
--------------
@@ -20,44 +20,50 @@ To install ScyllaDB with Web Installer, run:
curl -sSf get.scylladb.com/server | sudo bash
By default, running the script installs the latest official version of ScyllaDB Open Source. You can use the following
options to install a different version or ScyllaDB Enterprise:
.. list-table::
:widths: 20 25 55
:header-rows: 1
* - Option
- Acceptable values
- Description
* - ``--scylla-product``
- ``scylla`` | ``scylla-enterprise``
- Specifies the ScyllaDB product to install: Open Source (``scylla``) or Enterprise (``scylla-enterprise``). The default is ``scylla``.
* - ``--scylla-version``
- ``<version number>``
- Specifies the ScyllaDB version to install. You can specify the major release (``x.y``) to install the latest patch for that version or a specific patch release (``x.y.z``). The default is the latest official version.
By default, running the script installs the latest official version of ScyllaDB.
You can run the command with the ``-h`` or ``--help`` flag to print information about the script.
Examples
===========
Installing a Non-default Version
---------------------------------------
Installing ScyllaDB Open Source 6.0.1:
You can install a version other than the default.
Versions 2025.1 and Later
==============================
Run the command with the ``--scylla-version`` option to specify the version
you want to install.
**Example**
.. code:: console
curl -sSf get.scylladb.com/server | sudo bash -s -- --scylla-version 2025.1.1
curl -sSf get.scylladb.com/server | sudo bash -s -- --scylla-version 6.0.1
Installing the latest patch release for ScyllaDB Open Source 6.0:
Versions Earlier than 2025.1
================================
To install a supported version of *ScyllaDB Enterprise*, run the command with:
* ``--scylla-product scylla-enterprise`` to specify that you want to install
ScyllaDB Enterprise.
* ``--scylla-version`` to specify the version you want to install.
For example:
.. code:: console
curl -sSf get.scylladb.com/server | sudo bash -s -- --scylla-product scylla-enterprise --scylla-version 2024.1
curl -sSf get.scylladb.com/server | sudo bash -s -- --scylla-version 6.0
To install a supported version of *ScyllaDB Open Source*, run the command with
the ``--scylla-version`` option to specify the version you want to install.
Installing ScyllaDB Enterprise 2024.1:
For example:
.. code:: console
curl -sSf get.scylladb.com/server | sudo bash -s -- --scylla-product scylla-enterprise --scylla-version 2024.1
curl -sSf get.scylladb.com/server | sudo bash -s -- --scylla-version 6.2.1
.. include:: /getting-started/_common/setup-after-install.rst

View File

@@ -1,13 +1,38 @@
OS Support by Linux Distributions and Version
==============================================
The following matrix shows which Linux distributions, containers, and images are supported with which versions of ScyllaDB.
The following matrix shows which Linux distributions, containers, and images
are :ref:`supported <os-support-definition>` with which versions of ScyllaDB.
Where *supported* in this scope means:
+-------------------------------+--------------------------+-------+------------------+---------------+
| Linux Distributions |Ubuntu | Debian| Rocky / CentOS / | Amazon Linux |
| | | | RHEL | |
+-------------------------------+------+------+------------+-------+-------+----------+---------------+
| ScyllaDB Version / OS Version |20.04 |22.04 |24.04 | 11 | 8 | 9 | 2023 |
+===============================+======+======+============+=======+=======+==========+===============+
| Enterprise 2025.1 | |v| | |v| | |v| | |v| | |v| | |v| | |v| |
+-------------------------------+------+------+------------+-------+-------+----------+---------------+
| Enterprise 2024.2 | |v| | |v| | |v| | |v| | |v| | |v| | |v| |
+-------------------------------+------+------+------------+-------+-------+----------+---------------+
| Enterprise 2024.1 | |v| | |v| | |v| ``*`` | |v| | |v| | |v| | |x| |
+-------------------------------+------+------+------------+-------+-------+----------+---------------+
| Open Source 6.2 | |v| | |v| | |v| | |v| | |v| | |v| | |v| |
+-------------------------------+------+------+------------+-------+-------+----------+---------------+
``*`` 2024.1.9 and later
All releases are available as a Docker container, EC2 AMI, GCP, and Azure images.
.. _os-support-definition:
By *supported*, it is meant that:
- A binary installation package is available to `download <https://www.scylladb.com/download/>`_.
- The download and install procedures are tested as part of ScyllaDB release process for each version.
- An automated install is included from :doc:`ScyllaDB Web Installer for Linux tool </getting-started/installation-common/scylla-web-installer>` (for latest versions)
- The download and install procedures are tested as part of the ScyllaDB release process for each version.
- An automated install is included from :doc:`ScyllaDB Web Installer for Linux tool </getting-started/installation-common/scylla-web-installer>` (for the latest versions).
You can `build ScyllaDB from source <https://github.com/scylladb/scylladb#build-prerequisites>`_
on other x86_64 or aarch64 platforms, without any guarantees.
.. scylladb_include_flag:: os-support-info.rst

View File

@@ -8,7 +8,7 @@ ScyllaDB Requirements
:hidden:
system-requirements
os-support
OS Support <os-support>
Cloud Instance Recommendations <cloud-instance-recommendations>
scylla-in-a-shared-environment

View File

@@ -8,7 +8,6 @@
* :doc:`cassandra-stress </operating-scylla/admin-tools/cassandra-stress/>` A tool for benchmarking and load testing ScyllaDB and Cassandra clusters.
* :doc:`SSTabledump </operating-scylla/admin-tools/sstabledump>`
* :doc:`SSTableMetadata </operating-scylla/admin-tools/sstablemetadata>`
* configuration_encryptor - :doc:`encrypt at rest </operating-scylla/security/encryption-at-rest>` sensitive scylla configuration entries using system key.
* scylla local-file-key-generator - Generate a local file (system) key for :doc:`encryption at rest </operating-scylla/security/encryption-at-rest>`, with the provided length, Key algorithm, Algorithm block mode and Algorithm padding method.
* `scyllatop <https://www.scylladb.com/2016/03/22/scyllatop/>`_ - A terminal-based, top-like tool for ScyllaDB collectd/Prometheus metrics.
* :doc:`scylla_dev_mode_setup</getting-started/installation-common/dev-mod>` - run ScyllaDB in Developer Mode.

View File

@@ -3,275 +3,6 @@ Cassandra Stress
The cassandra-stress tool is used for benchmarking and load testing both ScyllaDB and Cassandra clusters. The cassandra-stress tool also supports testing arbitrary CQL tables and queries to allow users to benchmark their data model.
This documentation focuses on user mode as this allows the testing of your actual schema.
Usage
-----
There are several operation types:
* write-only, read-only, and mixed workloads of standard data
* write-only and read-only workloads for counter columns
* user configured workloads, running custom queries on custom schemas
* The syntax is cassandra-stress <command> [options]. If you want more information on a given command or options, just run cassandra-stress help
Commands:
read: Multiple concurrent reads - the cluster must first be populated by a write test.
write: Multiple concurrent writes against the cluster.
mixed: Interleaving of any basic commands, with configurable ratio and distribution - the cluster must first be populated by a write test.
counter_write: Multiple concurrent updates of counters.
counter_read: Multiple concurrent reads of counters. The cluster must first be populated by a counter_write test.
user: Interleaving of user provided queries, with configurable ratio and distribution.
help: Print help for a command or option.
print: Inspect the output of a distribution definition.
legacy: Legacy support mode.
Primary Options:
-pop: Population distribution and intra-partition visit order.
-insert: Insert specific options relating to various methods for batching and splitting partition updates.
-col: Column details such as size and count distribution, data generator, names, comparator and if super columns should be used.
-rate: Thread count, rate limit or automatic mode (default is auto).
-mode: CQL with options.
-errors: How to handle errors when encountered during stress.
-sample: Specify the number of samples to collect for measuring latency.
-schema: Replication settings, compression, compaction, etc.
-node: Nodes to connect to.
-log: Where to log progress to, and the interval at which to do it.
-transport: Custom transport factories.
-port: The port to connect to cassandra nodes on.
-sendto: Specify a stress server to send this command to.
-graph: Graph recorded metrics.
-tokenrange: Token range settings.
User mode
---------
User mode allows you to stress your own schemas. This can save time in the long run compared with building an application and then realizing that your schema doesn't scale.
Profile
.......
User mode requires a profile defined in YAML. Multiple YAML files may be specified in which case operations in the ops argument are referenced as specname.opname.
An identifier for the profile:
.. code-block:: cql
specname: staff_activities
The keyspace for the test:
.. code-block:: cql
keyspace: staff
CQL for the keyspace. Optional if the keyspace already exists:
.. code-block:: cql
keyspace_definition: |
CREATE KEYSPACE stresscql WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};
The table to be stressed:
.. code-block:: cql
table: staff_activities
CQL for the table. Optional if the table already exists:
.. code-block:: cql
table_definition: |
CREATE TABLE staff_activities (
name text,
when timeuuid,
what text,
PRIMARY KEY(name, when, what)
)
Optional meta information on the generated columns in the above table. The min and max only apply to text and blob types. The distribution field represents the total unique population distribution of that column across rows:
.. code-block:: cql
columnspec:
- name: name
size: uniform(5..10) # The names of the staff members are between 5-10 characters
population: uniform(1..10) # 10 possible staff members to pick from
- name: when
cluster: uniform(20..500) # Staff members do between 20 and 500 events
- name: what
size: normal(10..100,50)
Supported types are:
An exponential distribution over the range [min..max]:
.. code-block:: cql
EXP(min..max)
An extreme value (Weibull) distribution over the range [min..max]:
.. code-block:: cql
EXTREME(min..max,shape)
A gaussian/normal distribution, where mean=(min+max)/2, and stdev is (mean-min)/stdvrng:
.. code-block:: cql
GAUSSIAN(min..max,stdvrng)
A gaussian/normal distribution, with explicitly defined mean and stdev:
.. code-block:: cql
GAUSSIAN(min..max,mean,stdev)
A uniform distribution over the range [min, max]:
.. code-block:: cql
UNIFORM(min..max)
A fixed distribution, always returning the same value:
.. code-block:: cql
FIXED(val)
If preceded by ~, the distribution is inverted
Defaults for all columns are size: uniform(4..8), population: uniform(1..100B), cluster: fixed(1)
Insert distributions:
.. code-block:: cql
insert:
# How many partitions to insert per batch
partitions: fixed(1)
# How many rows to update per partition
select: fixed(1)/500
# UNLOGGED or LOGGED batch for insert
batchtype: UNLOGGED
Currently all inserts are done inside batches.
Read statements to use during the test:
.. code-block:: cql
queries:
events:
cql: select * from staff_activities where name = ?
fields: samerow
latest_event:
cql: select * from staff_activities where name = ? LIMIT 1
fields: samerow
Running a user mode test:
.. code-block:: cql
cassandra-stress user profile=./example.yaml duration=1m "ops(insert=1,latest_event=1,events=1)" truncate=once
This will create the schema then run tests for 1 minute with an equal number of inserts, latest_event queries and events queries. Additionally the table will be truncated once before the test.
The full example can be found here yaml
Running a user mode test with multiple yaml files:
.. code-block::
cassandra-stress user profile=./example.yaml,./example2.yaml duration=1m "ops(ex1.insert=1,ex1.latest_event=1,ex2.insert=2)" truncate=once
This will run operations as specified in both the example.yaml and example2.yaml files. example.yaml and example2.yaml can reference the same table
although care must be taken that the table definition is identical (data generation specs can be different).
.. Lightweight transaction support
.. ...............................
.. cassandra-stress supports lightweight transactions. In this it will first read current data from Cassandra and then uses read value(s) to fulfill lightweight transaction condition(s).
.. Lightweight transaction update query:
.. .. code-block:: cql
.. queries:
.. regularupdate:
.. cql: update blogposts set author = ? where domain = ? and published_date = ?
.. fields: samerow
.. updatewithlwt:
.. cql: update blogposts set author = ? where domain = ? and published_date = ? IF body = ? AND url = ?
.. fields: samerow
Graphing
........
Graphs can be generated for each run of stress.
.. image:: example-stress-graph.png
To create a new graph:
.. code-block:: cql
cassandra-stress user profile=./stress-example.yaml "ops(insert=1,latest_event=1,events=1)" -graph file=graph.html title="Awesome graph"
To add a new run to an existing graph point to an existing file and add a revision name:
.. code-block:: cql
cassandra-stress user profile=./stress-example.yaml duration=1m "ops(insert=1,latest_event=1,events=1)" -graph file=graph.html title="Awesome graph" revision="Second run"
FAQ
...
How do you use NetworkTopologyStrategy for the keyspace?
Use the schema option making sure to either escape the parenthesis or enclose in quotes:
.. code-block:: cql
cassandra-stress write -schema "replication(strategy=NetworkTopologyStrategy,datacenter1=3)"
How do you use SSL?
Use the transport option:
.. code-block:: cql
cassandra-stress "write n=100k cl=ONE no-warmup" -transport "truststore=$HOME/jks/truststore.jks truststore-password=cassandra"
Cassandra Stress is not part of ScyllaDB and is no longer distributed alongside it. It has its own separate repository and release cycle. More information about it can be found on `GitHub <https://github.com/scylladb/cassandra-stress>`_ or on `DockerHub <https://hub.docker.com/r/scylladb/cassandra-stress>`_.
.. include:: /rst_include/apache-copyrights.rst

View File

@@ -5,7 +5,7 @@ Bulk loads SSTables from a directory to a ScyllaDB cluster via the **CQL API**.
.. warning::
SSTableLoader is deprecated since ScyllaDB 6.2 and will be removed in the next release.
SSTableLoader is deprecated and will be removed in a future release.
Please consider switching to :doc:`nodetool refresh --load-and-stream </operating-scylla/nodetool-commands/refresh>`.
.. note::

View File

@@ -74,13 +74,13 @@ API calls
- *keyspace* - if set, tasks are filtered to contain only the ones working on this keyspace;
- *table* - if set, tasks are filtered to contain only the ones working on this table;
* ``/task_manager/task_status/{task_id}`` - gets the task's status, unregisters the task if it's finished;
* ``/task_manager/task_status/{task_id}`` - gets the task's status;
* ``/task_manager/abort_task/{task_id}`` - aborts the task if it's abortable, otherwise 403 status code is returned;
* ``/task_manager/wait_task/{task_id}`` - waits for the task and gets its status (does not unregister the tasks); query params:
* ``/task_manager/wait_task/{task_id}`` - waits for the task and gets its status; query params:
- *timeout* - timeout in seconds; if set - 408 status code is returned if waiting times out;
* ``/task_manager/task_status_recursive/{task_id}`` - gets statuses of the task and all its descendants in BFS order, unregisters the root task;
* ``/task_manager/task_status_recursive/{task_id}`` - gets statuses of the task and all its descendants in BFS order;
* ``/task_manager/ttl`` - gets or sets new ttl; query params (if setting):
- *ttl* - new ttl value.
@@ -89,6 +89,8 @@ API calls
- *user_ttl* - new user ttl value.
* ``/task_manager/drain/{module}`` - unregisters all finished local tasks in the module.
Cluster tasks are not unregistered from task manager with API calls.
Tasks API

View File

@@ -257,8 +257,6 @@ ScyllaDB uses experimental flags to expose non-production-ready features safely.
In recent ScyllaDB versions, these features are controlled by the ``experimental_features`` list in scylla.yaml, allowing one to choose which experimental to enable.
Use ``scylla --help`` to get the list of experimental features.
ScyllaDB Enterprise and ScyllaDB Cloud do not officially support experimental Features.
.. _admin-keyspace-storage-options:
Keyspace storage options
@@ -286,6 +284,24 @@ Before creating keyspaces with object storage, you also need to
:ref:`configure <object-storage-configuration>` the object storage
credentials and endpoint.
.. _admin-views-with-tablets:
Views with tablets
------------------
By default, Materialized Views (MV) and Secondary Indexes (SI)
are disabled in keyspaces that use tablets.
Support for MV and SI with tablets is experimental and must be explicitly
enabled in the ``scylla.yaml`` configuration file by specifying
the ``views-with-tablets`` option:
.. code-block:: yaml
experimental_features:
- views-with-tablets
Monitoring
==========
ScyllaDB exposes interfaces for online monitoring, as described below.


@@ -0,0 +1,21 @@
Nodetool tasks drain
====================
**tasks drain** - Unregisters all finished local tasks from the module.
If a module is not specified, finished tasks in all modules are unregistered.
Syntax
-------
.. code-block:: console
nodetool tasks drain [--module <module>]
Options
-------
* ``--module`` - if set, only the specified module is drained.
For example:
.. code-block:: shell
> nodetool tasks drain --module repair


@@ -5,6 +5,7 @@ Nodetool tasks
:hidden:
abort <abort>
drain <drain>
user-ttl <user-ttl>
list <list>
modules <modules>
@@ -23,15 +24,12 @@ Task Status Retention
* When a task completes, its status is temporarily stored on the executing node
* Status information is retained for up to :confval:`task_ttl_in_seconds` seconds
* The status information of a completed task is automatically removed after being queried with ``tasks status`` or ``tasks tree``
* ``tasks wait`` returns the status, but it does not remove the task information of the queried task
.. note:: Multiple status queries using ``tasks status`` and ``tasks tree`` for the same completed task will only receive a response for the first query, since the status is removed after being retrieved.
Supported tasks suboperations
-----------------------------
* :doc:`abort </operating-scylla/nodetool-commands/tasks/abort>` - Aborts the task.
* :doc:`drain </operating-scylla/nodetool-commands/tasks/drain>` - Unregisters all finished local tasks.
* :doc:`user-ttl </operating-scylla/nodetool-commands/tasks/user-ttl>` - Gets or sets user_task_ttl value.
* :doc:`list </operating-scylla/nodetool-commands/tasks/list>` - Lists tasks in the module.
* :doc:`modules </operating-scylla/nodetool-commands/tasks/modules>` - Lists supported modules.


@@ -1,6 +1,6 @@
Nodetool tasks status
=========================
**tasks status** - Gets the status of a task manager task. If the task was finished it is unregistered.
**tasks status** - Gets the status of a task manager task.
Syntax
-------
@@ -23,10 +23,10 @@ Example output
type: repair
kind: node
scope: keyspace
state: done
state: running
is_abortable: true
start_time: 2024-07-29T15:48:55Z
end_time: 2024-07-29T15:48:55Z
end_time:
error:
parent_id: none
sequence_number: 5


@@ -1,7 +1,7 @@
Nodetool tasks tree
=======================
**tasks tree** - Gets the statuses of a task manager task and all its descendants.
The statuses are listed in BFS order. If the task was finished it is unregistered.
The statuses are listed in BFS order.
If task_id isn't specified, trees of all non-internal tasks are printed
(internal tasks are the ones that have a parent or cover an operation that


@@ -7,8 +7,8 @@ Even though ScyllaDB is a fault-tolerant system, it is recommended to regularly
* Backup is a per-node procedure. Make sure to back up each node in your
cluster. For cluster-wide backup and restore, see `ScyllaDB Manager <https://manager.docs.scylladb.com/stable/restore/>`_.
* Backup works the same for non-encrypted and encrypted SStables. You can use
`Encryption at Rest <https://enterprise.docs.scylladb.com/stable/operating-scylla/security/encryption-at-rest.html>`_
available in ScyllaDB Enterprise without affecting the backup procedure.
:doc:`Encryption at Rest </operating-scylla/security/encryption-at-rest>`
without affecting the backup procedure.
You can choose one of the following:


@@ -77,7 +77,7 @@ Procedure
.. note::
ScyllaDB Open Source 3.0 and later and ScyllaDB Enterprise 2019.1 and later support :doc:`Materialized View(MV) </features/materialized-views>` and :doc:`Secondary Index(SI) </features/secondary-indexes>`.
ScyllaDB supports :doc:`Materialized View(MV) </features/materialized-views>` and :doc:`Secondary Index(SI) </features/secondary-indexes>`.
When migrating data from Apache Cassandra with MV or SI, you can either:


@@ -1,10 +1,13 @@
.. Note::
Make sure to use the same ScyllaDB **patch release** on the new/replaced node, to match the rest of the cluster. It is not recommended to add a new node with a different release to the cluster.
For example, use the following for installing ScyllaDB patch release (use your deployed version)
For example, use the following for installing ScyllaDB patch release (use your deployed version):
.. code::
sudo yum install scylla-2025.1.0
* ScyllaDB Enterprise - ``sudo yum install scylla-enterprise-2018.1.9``
* ScyllaDB open source - ``sudo yum install scylla-3.0.3``


@@ -202,6 +202,7 @@ Add New DC
#. If you are using ScyllaDB Monitoring, update the `monitoring stack <https://monitoring.docs.scylladb.com/stable/install/monitoring_stack.html#configure-scylla-nodes-from-files>`_ to monitor it. If you are using ScyllaDB Manager, make sure you install the `Manager Agent <https://manager.docs.scylladb.com/stable/install-scylla-manager-agent.html>`_ and Manager can access the new DC.
.. _add-dc-to-existing-dc-not-connect-clients:
Configure the Client not to Connect to the New DC
-------------------------------------------------


@@ -0,0 +1,57 @@
=========================================================
Preventing Quorum Loss in Symmetrical Multi-DC Clusters
=========================================================
ScyllaDB requires at least a quorum (majority) of nodes in a cluster to be up
and communicate with each other. A cluster that loses a quorum can handle reads
and writes of user data, but cluster management operations, such as schema and
topology updates, are impossible.
In clusters that are symmetrical, i.e., have two datacenters (DCs) with the same number of
nodes, losing a quorum may occur if one of the DCs becomes unavailable.
For example, if one DC fails in a 2-DC cluster where each DC has three nodes,
only three out of six nodes are available, and the quorum is lost.
Adding another DC would mitigate the risk of losing a quorum, but it comes
with network and storage costs. To prevent the quorum loss with minimum costs,
you can configure an arbiter (tie-breaker) DC.
An arbiter DC is a datacenter with a :doc:`zero-token node </architecture/zero-token-nodes>`
-- a node that doesn't replicate any data but is only used for Raft quorum
voting. An arbiter DC maintains the cluster quorum if one of the other DCs
fails, while it doesn't incur extra network and storage costs as it has no
user data.
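
The arithmetic behind the example above can be sketched in a few lines of Python. This is an illustrative calculation only, assuming every node (including a zero-token node) counts as one voter; the function names and the datacenter labels are hypothetical.

```python
def majority(total_nodes: int) -> int:
    """Smallest number of nodes that forms a quorum (strict majority)."""
    return total_nodes // 2 + 1


def has_quorum_after_dc_failure(nodes_per_dc: dict, failed_dc: str) -> bool:
    """Check whether the surviving nodes still form a majority of the cluster."""
    total = sum(nodes_per_dc.values())
    alive = total - nodes_per_dc[failed_dc]
    return alive >= majority(total)


# Symmetrical 2-DC cluster: losing either DC leaves 3 of 6 nodes (majority is 4).
symmetric = {"dc1": 3, "dc2": 3}

# The same cluster plus an arbiter DC with a single zero-token node:
# losing dc1 leaves 4 of 7 nodes (majority is 4).
with_arbiter = {"dc1": 3, "dc2": 3, "arbiter": 1}

print(has_quorum_after_dc_failure(symmetric, "dc1"))        # False
print(has_quorum_after_dc_failure(with_arbiter, "dc1"))     # True
print(has_quorum_after_dc_failure(with_arbiter, "arbiter")) # True
```

A single extra voter is enough to break the tie, which is why one zero-token node usually suffices for an arbiter DC.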
Adding an Arbiter DC
-----------------------
To set up an arbiter DC, follow the procedure to
:doc:`add a new datacenter to an existing cluster </operating-scylla/procedures/cluster-management/add-dc-to-existing-dc/>`.
When editing the *scylla.yaml* file, set the ``join_ring`` parameter to
``false`` following these guidelines:
* Set ``join_ring=false`` before you start the node(s). If you set that
parameter on a node that has already been bootstrapped and owns a token
range, the node startup will fail. In such a case, you'll need to
:doc:`decommission </operating-scylla/procedures/cluster-management/decommissioning-data-center>`
the node, :doc:`wipe it clean </operating-scylla/procedures/cluster-management/clear-data>`,
and add it back to the arbiter DC properly following
the :doc:`procedure </operating-scylla/procedures/cluster-management/add-dc-to-existing-dc/>`.
* As a rule, one node is sufficient for an arbiter to serve as a tie-breaker.
In case you add more than one node to the arbiter DC, ensure that you set
``join_ring=false`` on all the nodes in that DC.
Follow-up steps:
^^^^^^^^^^^^^^^^^^^
* An arbiter DC has a replication factor of 0 (RF=0) for all keyspaces. You
need to ``ALTER`` the keyspaces to update their RF.
* Since zero-token nodes are ignored by drivers, you can skip
:ref:`configuring the client not to connect to the new DC <add-dc-to-existing-dc-not-connect-clients>`.
References
----------------
* :doc:`Zero-token Nodes </architecture/zero-token-nodes>`
* :doc:`Raft Consensus Algorithm in ScyllaDB </architecture/raft>`
* :doc:`Handling Node Failures </troubleshooting/handling-node-failures>`
* :doc:`Adding a New Data Center Into an Existing ScyllaDB Cluster </operating-scylla/procedures/cluster-management/add-dc-to-existing-dc/>`


@@ -209,6 +209,17 @@ In this example, we will show how to install a nine nodes cluster.
UN 54.187.142.201 109.54 KB 256 ? d99967d6-987c-4a54-829d-86d1b921470f RACK1
UN 54.187.168.20 109.54 KB 256 ? 2329c2e0-64e1-41dc-8202-74403a40f851 RACK1
See also:
--------------------------
Preventing Quorum Loss
--------------------------
If your cluster is symmetrical, i.e., it has an even number of datacenters
with the same number of nodes, consider adding an arbiter DC to mitigate
the risk of losing a quorum at a minimum cost.
See :doc:`Preventing Quorum Loss in Symmetrical Multi-DC Clusters </operating-scylla/procedures/cluster-management/arbiter-dc>`
for details.
------------
See also:
------------
:doc:`Create a ScyllaDB Cluster - Single Data Center (DC) </operating-scylla/procedures/cluster-management/create-cluster>`


@@ -26,6 +26,8 @@ Cluster Management Procedures
Safely Restart Your Cluster <safe-start>
Handling Membership Change Failures <handling-membership-change-failures>
repair-based-node-operation
Prevent Quorum Loss in Symmetrical Multi-DC Clusters <arbiter-dc>
.. panel-box::
:title: Cluster and DC Creation
@@ -84,6 +86,8 @@ Cluster Management Procedures
* :doc:`Repair Based Node Operations (RBNO) </operating-scylla/procedures/cluster-management/repair-based-node-operation>`
* :doc:`Preventing Quorum Loss in Symmetrical Multi-DC Clusters <arbiter-dc>`
.. panel-box::
:title: Topology Changes
:id: "getting-started"


@@ -5,7 +5,7 @@ Advanced Internode (RPC) Compression
Internode (RPC) compression controls whether traffic between nodes is
compressed. If enabled, it reduces network bandwidth usage.
To further reduce network traffic, you can configure ScyllaDB Enterprise to use
To further reduce network traffic, you can configure ScyllaDB to use
ZSTD-based compression and shared dictionary compression. You can enable one or
both of these features to limit network throughput and reduce network transfer costs.


@@ -181,7 +181,7 @@ Use Workload Prioritization
In a typical application there are operational workloads that require low latency.
Sometimes these run in parallel with analytic workloads that process high volumes of data and do not require low latency.
With workload prioritization, you can prevent analytic workloads from causing unwanted high latency in operational workloads.
`Workload prioritization <https://enterprise.docs.scylladb.com/stable/using-scylla/workload-prioritization.html>`_ is only available with `ScyllaDB Enterprise <https://enterprise.docs.scylladb.com/>`_.
See :doc:`Workload Prioritization </features/workload-prioritization/>`.
Bypass Cache
============
@@ -330,7 +330,7 @@ When records get updated or deleted, the old data eventually needs to be deleted
The compaction settings can make a huge difference.
* Use the following :ref:`Compaction Strategy Matrix <CSM1>` to use the correct compaction strategy for your workload.
* ICS is an incremental compaction strategy that combines the low space amplification of LCS with the low write amplification of STCS. It is **only** available with ScyllaDB Enterprise.
* ICS is an incremental compaction strategy that combines the low space amplification of LCS with the low write amplification of STCS.
* If you have time series data, the TWCS should be used.
Read more about :doc:`Compaction Strategies </architecture/compaction/compaction-strategies>`


@@ -28,7 +28,7 @@ Incremental Compaction Strategy (ICS)
We highly recommend using ICS (the default setting) for any table that you have.
You will have much less Space Amplification with ICS as it only requires 25% additional storage, as opposed to STCS which requires 50% more.
.. note:: ICS is the default compaction strategy setting for Scylla Enterprise versions 2020.1 and higher.
.. note:: ICS is the default compaction strategy.
* Refer to :ref:`Incremental Compaction Strategy <ICS1>` for an overview of the benefits.
* Refer to :ref:`Incremental Compaction Strategy Overview <incremental-compaction-strategy-ics>` for a description of how it works.


@@ -2,9 +2,6 @@
ScyllaDB Auditing Guide
========================
:label-tip:`ScyllaDB Enterprise`
Auditing allows the administrator to monitor activities on a Scylla cluster, including queries and data changes.
The information is stored in a Syslog or a Scylla table.
@@ -64,7 +61,7 @@ QUERY Logs all queries
--------- -----------------------------------------------------------------------------------------
ADMIN Logs service level operations: create, alter, drop, attach, detach, list.
For :ref:`service level <workload-priorization-service-level-management>`
auditing, this parameter is available in Scylla Enterprise 2019.1 and later.
auditing.
========= =========================================================================================
Note that audit for every DML or QUERY might impact performance and consume a lot of storage.


@@ -5,11 +5,11 @@ Encryption at Rest
Introduction
----------------------
ScyllaDB Enterprise protects your sensitive data with data-at-rest encryption.
ScyllaDB protects your sensitive data with data-at-rest encryption.
It protects the privacy of your users' data, reduces the risk of data breaches, and helps meet regulatory requirements.
In particular, it provides an additional level of protection for your data persisted in storage or its backups.
When ScyllaDB Enterprise Encryption at Rest is used together with Encryption in Transit (:doc:`Node to Node </operating-scylla/security/node-node-encryption>` and :doc:`Client to Node </operating-scylla/security/client-node-encryption>`), you benefit from end to end data encryption.
When ScyllaDB's Encryption at Rest is used together with Encryption in Transit (:doc:`Node to Node </operating-scylla/security/node-node-encryption>` and :doc:`Client to Node </operating-scylla/security/client-node-encryption>`), you benefit from end to end data encryption.
About Encryption at Rest
-----------------------------
@@ -143,8 +143,6 @@ Depending on your key provider, you will either have the option of allowing Scyl
* Replicated Key Provider - you must generate a system key yourself
* Local Key Provider - If you do not generate your own secret key, ScyllaDB will create one for you
When encrypting ScyllaDB config by ``configuration_encryptor``, you also need to generate a secret key and upload the key to all nodes.
Use the key generator script
================================
@@ -282,8 +280,6 @@ If you are using :term:`KMIP <Key Management Interoperability Protocol (KMIP)>`
Set the KMS Host
----------------------
.. note:: KMS support is available since ScyllaDB Enterprise **2023.1.1**.
If you are using AWS KMS to encrypt tables or system information, add the KMS information to the ``scylla.yaml`` configuration file.
#. Edit the ``scylla.yaml`` file located in ``/etc/scylla/`` to add the following in KMS host(s) section:
@@ -408,10 +404,6 @@ If you are using Google GCP KMS to encrypt tables or system information, add the
Encrypt Tables
-----------------------------
.. note::
This feature is available since ScyllaDB Enterprise 2023.1.2.
ScyllaDB allows you to enable or disable default encryption of tables.
When enabled, tables will be encrypted by default using the configuration
provided for the ``user_info_encryption`` option in the ``scylla.yaml`` file.
@@ -820,32 +812,6 @@ Once this encryption is enabled, it is used for all system data.
.. wasn't able to test this successfully
.. Encrypt and Decrypt Configuration Files
.. =======================================
.. Using the Configuration Encryption tool, you can encrypt parts of the scylla.yaml file which contain encryption configuration settings.
.. **Procedure**
.. 1. Run the Configuration Encryption script:
.. test code-block: none
.. /bin/configuration_encryptor [options] [key-path]
.. Where:
.. * ``-c, --config`` - the path to the configuration file (/etc/scylla/scylla.yaml, for example)
.. * ``-d, --decrypt`` - decrypts the configuration file at the specified path
.. * ``-o, --output`` - (optional) writes the configuration file to a specified target. This can be the same location as the source file.
.. * ``-h. --help`` - help for this command
.. For example:
.. test code-block: none
.. sudo -u scylla /bin/configuration_encryptor -c /etc/scylla/scylla.yaml /etc/scylla/encryption_keys/secret_key
.. end of test
When a Key is Lost
----------------------


@@ -7,10 +7,6 @@ LDAP Authentication
saslauthd
:label-tip:`ScyllaDB Enterprise`
.. versionadded:: 2021.1.2
Scylla supports user authentication via an LDAP server by leveraging the SaslauthdAuthenticator.
By configuring saslauthd correctly against your LDAP server, you enable Scylla to check the users' credentials through it.


@@ -2,11 +2,7 @@
LDAP Authorization (Role Management)
=====================================
:label-tip:`ScyllaDB Enterprise`
.. versionadded:: 2021.1.2
Scylla Enterprise customers can manage and authorize users privileges via an :abbr:`LDAP (Lightweight Directory Access Protocol)` server.
Scylla customers can manage and authorize user privileges via an :abbr:`LDAP (Lightweight Directory Access Protocol)` server.
LDAP is an open, vendor-neutral, industry-standard protocol for accessing and maintaining distributed user access control over a standard IP network.
If your users are already stored in an LDAP directory, you can now use the same LDAP server to regulate their roles in Scylla.


@@ -22,7 +22,7 @@ In the same manner, should someone leave the organization, all you would have to
Should someone change positions at the company, just assign the new employee to the new role and revoke roles no longer required for the new position.
To build an RBAC environment, you need to create the roles and their associated permissions and then assign or grant the roles to the individual users. Roles inherit the permissions of any other roles that they are granted. The hierarchy of roles can be either simple or extremely complex. This gives great flexibility to database administrators, where they can create specific permission conditions without incurring a huge administrative burden.
In addition to standard roles, ScyllaDB Enterprise users can implement :doc:`Workload Prioritization </features/workload-prioritization/>`, which allows you to attach roles to Service Levels, thus granting resources to roles as the role demands.
In addition to standard roles, ScyllaDB users can implement :doc:`Workload Prioritization </features/workload-prioritization/>`, which allows you to attach roles to Service Levels, thus granting resources to roles as the role demands.
.. _rbac-usecase-grant-roles-and-permissions:


@@ -31,11 +31,9 @@ Encryption on Transit, Client to Node and Node to Node
Encryption on Transit protects your communication against third-party interception on the network connection.
Configure ScyllaDB to use TLS/SSL for all the connections. Use TLS/SSL to encrypt communication between ScyllaDB nodes and client applications.
.. only:: enterprise
Starting with version 2023.1.1, you can run ScyllaDB Enterprise on FIPS-enabled Ubuntu,
which uses FIPS 140-2 certified libraries (such as OpenSSL, GnuTLS, and more) and Linux
kernel in FIPS mode.
You can run ScyllaDB on FIPS-enabled Ubuntu,
which uses FIPS 140-2 certified libraries (such as OpenSSL, GnuTLS, and more) and Linux
kernel in FIPS mode.
* :doc:`Encryption Data in Transit Client to Node </operating-scylla/security/client-node-encryption>`
@@ -43,7 +41,6 @@ Configure ScyllaDB to use TLS/SSL for all the connections. Use TLS/SSL to encryp
Encryption at Rest
~~~~~~~~~~~~~~~~~~
Encryption at Rest is available in a Scylla Enterprise 2019.1.1.
Encryption at Rest protects the privacy of your users' data, reduces the risk of data breaches, and helps meet regulatory requirements.
In particular, it provides an additional level of protection for your data persisted in storage or backup.


@@ -127,6 +127,12 @@ Glossary
RBNO is enabled by default for a subset of node operations.
See :doc:`Repair Based Node Operations </operating-scylla/procedures/cluster-management/repair-based-node-operation>` for details.
RF-rack-valid keyspace
A keyspace with :doc:`tablets </architecture/tablets>` enabled is RF-rack-valid if all of its data centers
have the :term:`Replication Factor (RF) <Replication Factor (RF)>` of 0, 1, or the number of racks in that data center.
Keyspaces with tablets disabled are always deemed RF-rack-valid, even if they do not satisfy the aforementioned condition.
Shard
Each ScyllaDB node is internally split into *shards*, each an independent thread bound to a dedicated core.
Each shard of data is allotted CPU, RAM, persistent storage, and networking resources which it uses as efficiently as possible.
@@ -187,8 +193,8 @@ Glossary
Cache dummy rows are entries in the row set, which have a clustering position, although they do not represent CQL rows written by users. ScyllaDB cache uses them to mark boundaries of population ranges, to represent the information that the whole range is complete, and there is no need to go to sstables to read the gaps between existing row entries when scanning.
Workload
A database category that allows you to manage different sources of database activities, such as requests or administrative activities. By defining workloads, you can specify how ScyllaDB will process those activities. For example, `ScyllaDB Enterprise <https://enterprise.docs.scylladb.com/>`_
ships with a feature that allows you to prioritize one workload over another (e.g., user requests over administrative activities). See `Workload Prioritization <https://enterprise.docs.scylladb.com/stable/using-scylla/workload-prioritization.html>`_.
A database category that allows you to manage different sources of database activities, such as requests or administrative activities. By defining workloads, you can specify how ScyllaDB will process those activities. For example, ScyllaDB
ships with a feature that allows you to prioritize one workload over another (e.g., user requests over administrative activities). See :doc:`Workload Prioritization </features/workload-prioritization/>`.
MurmurHash3
A hash function `created by Austin Appleby <https://en.wikipedia.org/wiki/MurmurHash>`_, and used by the :term:`Partitioner` to distribute the partitions between nodes.


@@ -15,4 +15,3 @@ Reference
* :doc:`Limits </reference/limits>`
* :doc:`API Reference </reference/api-reference>`
* :doc:`Metrics </reference/metrics>`
* .. scylladb_include_flag:: enterprise-vs-oss-matrix-link.rst


@@ -1,50 +0,0 @@
There are two alternative upgrade procedures:
* :ref:`Upgrading ScyllaDB and simultaneously updating 3rd party and OS packages <upgrade-image-recommended-procedure>`. It is recommended if you are running a ScyllaDB official image (EC2 AMI, GCP, and Azure images), which is based on Ubuntu 20.04.
* :ref:`Upgrading ScyllaDB without updating any external packages <upgrade-image-enterprise-upgrade-guide-regular-procedure>`.
.. _upgrade-image-recommended-procedure:
**To upgrade ScyllaDB and update 3rd party and OS packages (RECOMMENDED):**
Choosing this upgrade procedure allows you to upgrade your ScyllaDB version and update the 3rd party and OS packages using one command.
#. Update the |SCYLLA_REPO|_ to |NEW_VERSION|.
#. Load the new repo:
.. code:: sh
sudo apt-get update
#. Run the following command to update the manifest file:
.. code:: sh
cat scylla-enterprise-packages-<version>-<arch>.txt | sudo xargs -n1 apt-get install -y
Where:
* ``<version>`` - The ScyllaDB version to which you are upgrading ( |NEW_VERSION| ).
* ``<arch>`` - Architecture type: ``x86_64`` or ``aarch64``.
The file is included in the ScyllaDB packages downloaded in the previous step. The file location is ``http://downloads.scylladb.com/downloads/scylla-enterprise/aws/manifest/scylla-enterprise-packages-<version>-<arch>.txt``.
Example:
.. code:: console
cat scylla-enterprise-packages-2022.1.10-x86_64.txt | sudo xargs -n1 apt-get install -y
.. note::
Alternatively, you can update the manifest file with the following command:
``sudo apt-get install $(awk '{print $1}' scylla-enterprise-packages-<version>-<arch>.txt) -y``
.. _upgrade-image-enterprise-upgrade-guide-regular-procedure:


@@ -0,0 +1,27 @@
================
About Upgrade
================
ScyllaDB upgrade is a rolling procedure - it does not require a full cluster
shutdown and is performed without any downtime or disruption of service.
To ensure a successful upgrade, follow
the :doc:`documented upgrade procedures <upgrade-guides/index>` tested by
ScyllaDB. This means that:
* You should perform the upgrades consecutively - to each successive X.Y
version, **without skipping any major or minor version**, unless there is
a documented upgrade procedure to bypass a version.
* Before you upgrade to the next version, the whole cluster (each node) must
be upgraded to the previous version.
* You cannot perform an upgrade by replacing the nodes in the cluster with new
nodes with a different ScyllaDB version. You should never add a new node with
a different version to a cluster - if you
:doc:`add a node </operating-scylla/procedures/cluster-management/add-node-to-cluster>`,
it must have the same X.Y.Z (major.minor.patch) version as the other nodes in
the cluster.
Upgrading to each patch version by following the Maintenance Release Upgrade
Guide is optional. However, we recommend upgrading to the latest patch release
for your version before upgrading to a new version.
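
The rules above can be sketched as a small planning helper. This is an illustration under stated assumptions, not a ScyllaDB tool: the function names are hypothetical, the release list is an illustrative sequence of X.Y versions, and the sketch does not model documented procedures that allow bypassing a version.

```python
def upgrade_path(releases, current, target):
    """Return every X.Y version to upgrade through, in order, without skipping any.

    `releases` is the ordered list of available X.Y versions (oldest first).
    """
    start = releases.index(current)
    end = releases.index(target)
    if end < start:
        raise ValueError("target version is older than the current version")
    return releases[start + 1 : end + 1]


def ready_for_next_step(node_versions):
    """The whole cluster must run one version before moving to the next one."""
    return len(set(node_versions)) == 1


# Illustrative release sequence.
releases = ["4.3", "4.4", "4.5", "4.6"]

print(upgrade_path(releases, "4.3", "4.6"))        # ['4.4', '4.5', '4.6']
print(ready_for_next_step(["4.4", "4.4", "4.4"]))  # True
print(ready_for_next_step(["4.4", "4.4", "4.3"]))  # False
```

Each step in the returned path corresponds to one rolling upgrade of all nodes; the next step should start only once ``ready_for_next_step`` holds for the cluster.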


@@ -4,52 +4,13 @@ Upgrade ScyllaDB
.. toctree::
:titlesonly:
:hidden:
ScyllaDB Versioning <scylladb-versioning>
ScyllaDB Open Source Upgrade <upgrade-opensource/index>
ScyllaDB Open Source to ScyllaDB Enterprise Upgrade <upgrade-to-enterprise/index>
ScyllaDB Image <ami-upgrade>
ScyllaDB Enterprise <https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/index.html>
Overview
---------
ScyllaDB upgrade is a rolling procedure - it does not require a full cluster shutdown and is performed without any
downtime or disruption of service.
To ensure a successful upgrade, follow the :ref:`documented upgrade procedures <upgrade-upgrade-procedures>` tested by ScyllaDB. This means that:
* You should perform the upgrades consecutively - to each successive X.Y version, **without skipping any major or minor version**.
* Before you upgrade to the next version, the whole cluster (each node) must be upgraded to the previous version.
* You cannot perform an upgrade by replacing the nodes in the cluster with new nodes with a different ScyllaDB version. You should never add a new node with a different version to a cluster - if you :doc:`add a node </operating-scylla/procedures/cluster-management/add-node-to-cluster>`, it must have the same X.Y.Z (major.minor.patch) version as the other nodes in the cluster.
Example
========
The following example shows the upgrade path for a 3-node cluster from version 4.3 to version 4.6:
#. Upgrade all three nodes to version 4.4.
#. Upgrade all three nodes to version 4.5.
#. Upgrade all three nodes to version 4.6.
Upgrading to each patch version by following the Maintenance Release Upgrade Guide
is optional. However, we recommend upgrading to the latest patch release for your version before upgrading to a new version.
For example, upgrade to patch 4.4.8 before upgrading to version 4.5.
.. _upgrade-upgrade-procedures:
Procedures for Upgrading ScyllaDB
-----------------------------------
* :doc:`Upgrade ScyllaDB Open Source <upgrade-opensource/index>`
* :doc:`Upgrade from ScyllaDB Open Source to ScyllaDB Enterprise <upgrade-to-enterprise/index>`
* :doc:`Upgrade ScyllaDB Image <ami-upgrade>`
* `Upgrade ScyllaDB Enterprise <https://enterprise.docs.scylladb.com/stable/upgrade/upgrade-enterprise/index.html>`_
About Upgrade <about-upgrade>
Upgrade Guides <upgrade-guides/index>


@@ -1,61 +0,0 @@
============================
ScyllaDB Versioning
============================
ScyllaDB follows the ``MAJOR.MINOR.PATCH`` `semantic versioning <https://semver.org/>`_:
* ``MAJOR`` versions contain significant changes in the product and may introduce incompatible API changes.
* ``MINOR`` versions introduce new features and improvements in a backward-compatible manner.
* ``PATCH`` versions have backward-compatible bug fixes.
**Examples**
ScyllaDB Open Source:
* ``MAJOR`` versions: 4.y, 5.y
* ``MINOR`` versions: 5.2.z, 5.4.z
* ``PATCH`` versions: 5.2.1, 5.2.2
ScyllaDB Enterprise:
* ``MAJOR`` versions: 2021.y.z, 2022.y.z
* ``MINOR`` versions: 2022.1.z, 2022.2.z
* ``PATCH`` versions: 2022.1.1, 2022.1.2
.. only:: enterprise
ScyllaDB Enterprise Version Support Policy
----------------------------------------------------
ScyllaDB Enterprise supports the two latest ``MAJOR`` versions and the two latest ``MINOR`` versions. They are referred to as LTS (long-term support) and feature releases, respectively.
**Example**
Let's assume that the following versions are available as of today:
2021.1, 2022.1, 2022.2, 2022.3, 2022.4
The following versions would be supported:
* 2021.1 and 2022.1 - two latest ``MAJOR`` versions (LTS)
* 2022.3 and 2022.4 - two latest ``MINOR`` versions (feature releases)
Version 2022.2 would not be supported.
LTS vs. Feature Releases
-----------------------------
Long-Term Support (LTS) - Major Versions:
* Released approximately once a year.
Feature Releases - Minor Versions:
* 3-4 releases per year
* Closely follow ScyllaDB Open Source releases (see `ScyllaDB Enterprise vs. Open Source Matrix <https://enterprise.docs.scylladb.com/stable/reference/versions-matrix-enterprise-oss.html>`_)
* Introduce features added in ScyllaDB Open Source, as well as Enterprise-only premium features


@@ -0,0 +1,12 @@
====================
Upgrade ScyllaDB
====================
.. toctree::
ScyllaDB 6.2 to 2025.1 <upgrade-guide-from-6.2-to-2025.1/index>
ScyllaDB 2024.x to 2025.1 <upgrade-guide-from-2024.x-to-2025.1/index>
ScyllaDB Image <ami-upgrade>


@@ -0,0 +1,15 @@
==========================================================
Upgrade - ScyllaDB 2024.x to ScyllaDB Enterprise 2025.1
==========================================================
.. toctree::
:maxdepth: 2
:hidden:
ScyllaDB <upgrade-guide-from-2024.x-to-2025.1>
Metrics <metric-update-2024.x-to-2025.1>
* :doc:`Upgrade ScyllaDB from 2024.x.y to 2025.1.y <upgrade-guide-from-2024.x-to-2025.1>`
* :doc:`Metrics Update Between 2024.x and 2025.1 <metric-update-2024.x-to-2025.1>`


@@ -0,0 +1,74 @@
.. |SRC_VERSION| replace:: 2024.x
.. |NEW_VERSION| replace:: 2025.1
=======================================================================================
Metrics Update Between |SRC_VERSION| and |NEW_VERSION|
=======================================================================================
ScyllaDB Enterprise |NEW_VERSION| Dashboards are available as part of the latest |mon_root|.
New Metrics
------------
The following metrics are new in ScyllaDB |NEW_VERSION| compared to |SRC_VERSION|:
.. list-table::
:widths: 25 150
:header-rows: 1
* - Metric
- Description
* - scylla_alternator_batch_item_count
- The total number of items processed across all batches.
* - scylla_hints_for_views_manager_sent_bytes_total
- The total size of the sent hints (in bytes).
* - scylla_hints_manager_sent_bytes_total
- The total size of the sent hints (in bytes).
* - scylla_io_queue_activations
- The number of times the class was woken up from idle.
* - scylla_raft_apply_index
- The applied index.
* - scylla_raft_commit_index
- The commit index.
* - scylla_raft_log_last_index
- The index of the last log entry.
* - scylla_raft_log_last_term
- The term of the last log entry.
* - scylla_raft_snapshot_last_index
- The index of the snapshot.
* - scylla_raft_snapshot_last_term
- The term of the snapshot.
* - scylla_raft_state
- The current state: 0 - follower, 1 - candidate, 2 - leader.
* - scylla_rpc_client_delay_samples
- The total number of delay samples.
* - scylla_rpc_client_delay_total
- The total delay in seconds.
* - scylla_storage_proxy_replica_received_hints_bytes_total
- The total size of hints and MV hints received by this node.
* - scylla_storage_proxy_replica_received_hints_total
- The number of hints and MV hints received by this node.
Renamed Metrics
------------------
The following metrics are renamed in ScyllaDB |NEW_VERSION| compared to |SRC_VERSION|:
.. list-table::
:widths: 25 150
:header-rows: 1
* - 2024.2
- 2025.1
* - scylla_hints_for_views_manager_sent
- scylla_hints_for_views_manager_sent_total
* - scylla_hints_manager_sent
- scylla_hints_manager_sent_total
* - scylla_forward_service_requests_dispatched_to_other_nodes
- scylla_mapreduce_service_requests_dispatched_to_other_nodes
* - scylla_forward_service_requests_dispatched_to_own_shards
- scylla_mapreduce_service_requests_dispatched_to_own_shards
* - scylla_forward_service_requests_executed
- scylla_mapreduce_service_requests_executed
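Dashboards or alert rules that still query the old names may stop returning data after the upgrade. As a minimal sketch, and not part of the official procedure, the renames in the table above could be applied to a rule file or dashboard export with ``sed``. The function name ``rename_scylla_metrics`` is hypothetical, and the patterns cover only the five metrics listed here.

```shell
# Rewrite pre-2025.1 metric names in text read from stdin to their 2025.1
# names. Names that already carry the new form are left untouched, so the
# filter can be run more than once.
rename_scylla_metrics() {
    sed -E \
        -e 's/scylla_forward_service_requests_(dispatched_to_other_nodes|dispatched_to_own_shards|executed)/scylla_mapreduce_service_requests_\1/g' \
        -e 's/(scylla_hints_(for_views_)?manager_sent)([^A-Za-z0-9_]|$)/\1_total\3/g'
}
```

For example, ``rename_scylla_metrics < rules.yml > rules-2025.1.yml``, where both file names are placeholders. Review the result before deploying it.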


@@ -1,48 +1,26 @@
.. |SCYLLA_NAME| replace:: ScyllaDB
.. |SRC_VERSION| replace:: 6.0
.. |NEW_VERSION| replace:: 2024.2
.. |DEBIAN_SRC_REPO| replace:: Debian
.. _DEBIAN_SRC_REPO: https://www.scylladb.com/download/?platform=debian-11&version=scylla-6.0
.. |UBUNTU_SRC_REPO| replace:: Ubuntu
.. _UBUNTU_SRC_REPO: https://www.scylladb.com/download/?platform=ubuntu-22.04&version=scylla-6.0
.. |SCYLLA_DEB_SRC_REPO| replace:: ScyllaDB deb repo (|DEBIAN_SRC_REPO|_, |UBUNTU_SRC_REPO|_)
.. |SCYLLA_RPM_SRC_REPO| replace:: ScyllaDB rpm repo
.. _SCYLLA_RPM_SRC_REPO: https://www.scylladb.com/download/?platform=centos&version=scylla-6.0
.. |DEBIAN_NEW_REPO| replace:: Debian
.. _DEBIAN_NEW_REPO: https://www.scylladb.com/customer-portal/?product=ent&platform=debian-11&version=stable-release-2024.2
.. |UBUNTU_NEW_REPO| replace:: Ubuntu
.. _UBUNTU_NEW_REPO: https://www.scylladb.com/customer-portal/?product=ent&platform=ubuntu-22.04&version=stable-release-2024.2
.. |SCYLLA_DEB_NEW_REPO| replace:: ScyllaDB deb repo (|DEBIAN_NEW_REPO|_, |UBUNTU_NEW_REPO|_)
.. |SCYLLA_RPM_NEW_REPO| replace:: ScyllaDB rpm repo
.. _SCYLLA_RPM_NEW_REPO: https://www.scylladb.com/customer-portal/?product=ent&platform=centos7&version=stable-release-2024.2
.. |SRC_VERSION| replace:: 2024.x
.. |NEW_VERSION| replace:: 2025.1
.. |ROLLBACK| replace:: rollback
.. _ROLLBACK: ./#rollback-procedure
.. |SCYLLA_METRICS| replace:: ScyllaDB Enterprise Metrics Update - ScyllaDB Enterprise 6.0 to 2024.2
.. _SCYLLA_METRICS: ../metric-update-6.0-to-2024.2
.. |SCYLLA_METRICS| replace:: ScyllaDB Metrics Update - ScyllaDB 2024.x to 2025.1
.. _SCYLLA_METRICS: ../metric-update-2024.x-to-2025.1
=============================================================================
Upgrade Guide - |SCYLLA_NAME| |SRC_VERSION| to |NEW_VERSION|
Upgrade |SCYLLA_NAME| from |SRC_VERSION| to |NEW_VERSION|
=============================================================================
This document is a step-by-step procedure for upgrading from |SCYLLA_NAME| |SRC_VERSION|
to |SCYLLA_NAME| Enterprise |NEW_VERSION|, and rollback to version |SRC_VERSION| if required.
to |NEW_VERSION|, and rollback to version |SRC_VERSION| if required.
This guide covers upgrading ScyllaDB on Red Hat Enterprise Linux (RHEL), CentOS, Debian,
and Ubuntu. See :doc:`OS Support by Platform and Version </getting-started/os-support>`
for information about supported versions.
This guide also applies when you're upgrading ScyllaDB Enterprise official image on EC2,
This guide also applies when you're upgrading ScyllaDB official image on EC2,
GCP, or Azure.
@@ -68,34 +46,6 @@ We recommend upgrading the Monitoring Stack to the latest version.
See the ScyllaDB Release Notes for the latest updates. The Release Notes are published
at the `ScyllaDB Community Forum <https://forum.scylladb.com/>`_.
.. note::
Unlike ScyllaDB 6.0, ScyllaDB Enterprise 2024.2 has **tablets disabled by
default**. This means that after you upgrade to 2024.2:
* Keyspaces that had tablets enabled in 6.0 will continue to work with tablets.
* Keyspaces created with default settings after upgrading to 2024.2 will have
tablets disabled.
To use tablets, create a new keyspace with the ``tablets = { 'enabled': true }``
option. For example:
.. code::
CREATE KEYSPACE my_keyspace
WITH replication = {
'class': 'NetworkTopologyStrategy',
'replication_factor': 3
} AND tablets = {
'enabled': true
};
All tables created in this keyspace will use tablets.
Note that ``NetworkTopologyStrategy`` is required when tablets are enabled.
See :doc:`Data Distribution with Tablets </architecture/tablets/>` for more information
about tablets.
Upgrade Procedure
=================
@@ -136,12 +86,15 @@ procedure will fail if there is a schema disagreement between nodes.
nodetool describecluster
Drain the nodes and backup the data
Backup the data
-----------------------------------
Before any major procedure, like an upgrade, it is recommended to backup all
the data to an external device. In ScyllaDB, you can backup the data using
the ``nodetool snapshot`` command. For **each** node in the cluster, run
Before any major procedure, like an upgrade, it is recommended to backup all the data
to an external device.
We recommend using `ScyllaDB Manager <https://manager.docs.scylladb.com/stable/backup/index.html>`_
to create backups.
Alternatively, you can use the ``nodetool snapshot`` command. For **each** node in the cluster, run
the following command:
.. code:: sh
@@ -158,9 +111,24 @@ When the upgrade is completed on all nodes, remove the snapshot with the
Backup the configuration file
------------------------------
.. code:: sh
Back up the ``scylla.yaml`` configuration file and the ScyllaDB packages
in case you need to rollback the upgrade.
sudo cp -a /etc/scylla/scylla.yaml /etc/scylla/scylla.yaml.backup-src
.. tabs::
.. group-tab:: Debian/Ubuntu
.. code:: sh
sudo cp -a /etc/scylla/scylla.yaml /etc/scylla/scylla.yaml.backup
sudo cp /etc/apt/sources.list.d/scylla.list ~/scylla.list-backup
.. group-tab:: RHEL/CentOS
.. code:: sh
sudo cp -a /etc/scylla/scylla.yaml /etc/scylla/scylla.yaml.backup
sudo cp /etc/yum.repos.d/scylla.repo ~/scylla.repo-backup
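A rollback depends on these copies being intact, so it can be worth confirming each one right after it is made. The following is a minimal sketch under that assumption; ``backup_file`` is a hypothetical helper, not a ScyllaDB tool.

```shell
# Copy a file, preserving its attributes, and confirm that the copy is
# byte-identical to the source. Fails if the source does not exist.
backup_file() {
    src=$1
    dest=$2
    [ -f "$src" ] || { echo "missing: $src" >&2; return 1; }
    cp -a "$src" "$dest" && cmp -s "$src" "$dest"
}

# Usage (run as root, for example via "sudo sh"):
#   backup_file /etc/scylla/scylla.yaml /etc/scylla/scylla.yaml.backup
```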
Gracefully stop the node
------------------------
@@ -180,14 +148,11 @@ the upgrade.
.. group-tab:: Debian/Ubuntu
#. Update the |SCYLLA_DEB_NEW_REPO| to |NEW_VERSION|.
#. Configure Java 1.8:
#. Update the ScyllaDB deb repo to |NEW_VERSION|.
.. code-block:: console
sudo apt-get update
sudo apt-get install -y openjdk-8-jre-headless
sudo update-java-alternatives -s java-1.8.0-openjdk-amd64
sudo wget -O /etc/apt/sources.list.d/scylla.list https://downloads.scylladb.com/deb/debian/scylla-2025.1.list
#. Install the new ScyllaDB version:
@@ -195,23 +160,24 @@ the upgrade.
sudo apt-get clean all
sudo apt-get update
sudo apt-get remove scylla\*
sudo apt-get install scylla-enterprise
sudo systemctl daemon-reload
sudo apt-get dist-upgrade scylla
Answer y to the first two questions.
.. group-tab:: RHEL/CentOS
#. Update the |SCYLLA_RPM_NEW_REPO|_ to |NEW_VERSION|.
#. Update the ScyllaDB rpm repo to |NEW_VERSION|.
.. code-block:: console
sudo curl -o /etc/yum.repos.d/scylla.repo -L https://downloads.scylladb.com/rpm/centos/scylla-2025.1.repo
#. Install the new ScyllaDB version:
.. code:: sh
sudo yum clean all
sudo rm -rf /var/cache/yum
sudo yum remove scylla\*
sudo yum install scylla-enterprise
sudo yum update scylla\* -y
.. group-tab:: EC2/GCP/Azure Ubuntu Image
@@ -220,21 +186,24 @@ the upgrade.
own image and have installed ScyllaDB packages for Ubuntu or Debian,
you need to apply an extended upgrade procedure:
#. Update the ScyllaDB deb repo (see above).
#. Configure Java 1.8 (see above).
#. Update the ScyllaDB deb repo (see the **Debian/Ubuntu** tab).
#. Install the new ScyllaDB version with the additional
``scylla-enterprise-machine-image`` package:
``scylla-machine-image`` package:
.. code::
sudo apt-get clean all
sudo apt-get update
sudo apt-get dist-upgrade scylla-enterprise
sudo apt-get dist-upgrade scylla-enterprise-machine-image
sudo apt-get dist-upgrade scylla
sudo apt-get dist-upgrade scylla-machine-image
#. Run ``scylla_setup`` without running ``io_setup``.
#. Run ``sudo /opt/scylladb/scylla-machine-image/scylla_cloud_io_setup``.
If you need the JMX server, see
:doc:`Install scylla-jmx Package </getting-started/installation-common/install-jmx>`
to install the new version.
Start the node
--------------
@@ -278,7 +247,6 @@ For each of the nodes you rollback to |SRC_VERSION|, you will:
* Drain the node and stop ScyllaDB
* Retrieve the old ScyllaDB packages
* Restore the configuration file
* Restore system tables
* Reload systemd configuration
* Restart ScyllaDB
* Validate the rollback success
@@ -311,14 +279,22 @@ Download and install the old release
sudo rm -rf /etc/apt/sources.list.d/scylla.list
#. Update the |SCYLLA_DEB_SRC_REPO| to |SRC_VERSION|.
#. Restore the |SRC_VERSION| packages backed up during the upgrade.
.. code:: sh
sudo cp ~/scylla.list-backup /etc/apt/sources.list.d/scylla.list
sudo chown root:root /etc/apt/sources.list.d/scylla.list
sudo chmod 644 /etc/apt/sources.list.d/scylla.list
#. Install:
.. code-block::
sudo apt-get update
sudo apt-get remove scylla\* -y
sudo apt-get install scylla
sudo apt-get install scylla-enterprise
Answer y to the first two questions.
@@ -330,18 +306,42 @@ Download and install the old release
sudo rm -rf /etc/yum.repos.d/scylla.repo
#. Update the |SCYLLA_RPM_SRC_REPO|_ to |SRC_VERSION|.
#. Restore the |SRC_VERSION| packages backed up during the upgrade procedure.
.. code:: sh
sudo cp ~/scylla.repo-backup /etc/yum.repos.d/scylla.repo
sudo chown root:root /etc/yum.repos.d/scylla.repo
sudo chmod 644 /etc/yum.repos.d/scylla.repo
#. Install:
.. code:: console
sudo yum clean all
sudo yum remove scylla\*
sudo yum install scylla
sudo yum install scylla-enterprise
.. note::
If you are running a ScyllaDB Enterprise official image (for EC2 AMI, GCP, or Azure), follow the instructions for Ubuntu.
.. group-tab:: EC2/GCP/Azure Ubuntu Image
If you're using the ScyllaDB official image (recommended), see the **Debian/Ubuntu**
tab for upgrade instructions.
If you're using your own image and installed ScyllaDB packages for Ubuntu or Debian,
you need to additionally restore the ``scylla-machine-image`` package.
#. Restore the |SRC_VERSION| packages backed up during the upgrade
(see the **Debian/Ubuntu** tab).
#. Install:
.. code-block::
sudo apt-get update
sudo apt-get remove scylla\* -y
sudo apt-get install scylla-enterprise
sudo apt-get install scylla-enterprise-machine-image
Answer y to the first two questions.
Restore the configuration file
------------------------------
@@ -349,24 +349,7 @@ Restore the configuration file
.. code:: sh
sudo rm -rf /etc/scylla/scylla.yaml
sudo cp -a /etc/scylla/scylla.yaml.backup-src /etc/scylla/scylla.yaml
Restore system tables
---------------------
Restore all tables of **system** and **system_schema** from the previous snapshot because
|NEW_VERSION| uses a different set of system tables.
See :doc:`Restore from a Backup and Incremental Backup </operating-scylla/procedures/backup-restore/restore/>`
for reference.
.. code:: console
cd /var/lib/scylla/data/keyspace_name/table_name-UUID/
sudo find . -maxdepth 1 -type f -exec sudo rm -f "{}" +
cd /var/lib/scylla/data/keyspace_name/table_name-UUID/snapshots/<snapshot_name>/
sudo cp -r * /var/lib/scylla/data/keyspace_name/table_name-UUID/
sudo chown -R scylla:scylla /var/lib/scylla/data/keyspace_name/table_name-UUID/
sudo cp /etc/scylla/scylla.yaml.backup /etc/scylla/scylla.yaml
Reload systemd configuration
----------------------------


@@ -0,0 +1,13 @@
=====================================
ScyllaDB 6.2 to 2025.1 Upgrade Guide
=====================================
.. toctree::
:maxdepth: 2
:hidden:
Upgrade ScyllaDB <upgrade-guide-from-6.2-to-2025.1>
Metrics Update <metric-update-6.2-to-2025.1>
* :doc:`Upgrade ScyllaDB from 6.2.x to 2025.1.y <upgrade-guide-from-6.2-to-2025.1>`
* :doc:`Metrics Update Between 6.2 and 2025.1 <metric-update-6.2-to-2025.1>`


@@ -0,0 +1,54 @@
.. |SRC_VERSION| replace:: 6.2
.. |NEW_VERSION| replace:: 2025.1
Metrics Update Between |SRC_VERSION| and |NEW_VERSION|
================================================================
.. toctree::
:maxdepth: 2
:hidden:
ScyllaDB |NEW_VERSION| Dashboards are available as part of the latest |mon_root|.
New Metrics
------------
The following metrics are new in ScyllaDB |NEW_VERSION| compared to |SRC_VERSION|:
.. list-table::
:widths: 25 150
:header-rows: 1
* - Metric
- Description
* - scylla_alternator_rcu_total
- The total number of consumed read units, counted as half units.
* - scylla_alternator_wcu_total
- The total number of consumed write units, counted as half units.
* - scylla_rpc_compression_bytes_received
- The bytes read from RPC connections after decompression.
* - scylla_rpc_compression_bytes_sent
- The bytes written to RPC connections before compression.
* - scylla_rpc_compression_compressed_bytes_received
- The bytes read from RPC connections before decompression.
* - scylla_rpc_compression_compressed_bytes_sent
- The bytes written to RPC connections after compression.
* - scylla_rpc_compression_compression_cpu_nanos
- The nanoseconds spent on compression.
* - scylla_rpc_compression_decompression_cpu_nanos
- The nanoseconds spent on decompression.
* - scylla_rpc_compression_messages_received
- The RPC messages received.
* - scylla_rpc_compression_messages_sent
- The RPC messages sent.


@@ -1,35 +1,13 @@
.. |SCYLLA_NAME| replace:: ScyllaDB
.. |SRC_VERSION| replace:: 6.1
.. |NEW_VERSION| replace:: 6.2
.. |DEBIAN_SRC_REPO| replace:: Debian
.. _DEBIAN_SRC_REPO: https://www.scylladb.com/download/?platform=debian-11&version=scylla-6.1
.. |UBUNTU_SRC_REPO| replace:: Ubuntu
.. _UBUNTU_SRC_REPO: https://www.scylladb.com/download/?platform=ubuntu-22.04&version=scylla-6.1
.. |SCYLLA_DEB_SRC_REPO| replace:: ScyllaDB deb repo (|DEBIAN_SRC_REPO|_, |UBUNTU_SRC_REPO|_)
.. |SCYLLA_RPM_SRC_REPO| replace:: ScyllaDB rpm repo
.. _SCYLLA_RPM_SRC_REPO: https://www.scylladb.com/download/?platform=centos&version=scylla-6.1
.. |DEBIAN_NEW_REPO| replace:: Debian
.. _DEBIAN_NEW_REPO: https://www.scylladb.com/download/?platform=debian-11&version=scylla-6.2
.. |UBUNTU_NEW_REPO| replace:: Ubuntu
.. _UBUNTU_NEW_REPO: https://www.scylladb.com/download/?platform=ubuntu-22.04&version=scylla-6.2
.. |SCYLLA_DEB_NEW_REPO| replace:: ScyllaDB deb repo (|DEBIAN_NEW_REPO|_, |UBUNTU_NEW_REPO|_)
.. |SCYLLA_RPM_NEW_REPO| replace:: ScyllaDB rpm repo
.. _SCYLLA_RPM_NEW_REPO: https://www.scylladb.com/download/?platform=centos&version=scylla-6.1
.. |SRC_VERSION| replace:: 6.2
.. |NEW_VERSION| replace:: 2025.1
.. |ROLLBACK| replace:: rollback
.. _ROLLBACK: ./#rollback-procedure
.. |SCYLLA_METRICS| replace:: ScyllaDB Metrics Update - ScyllaDB 6.1 to 6.2
.. _SCYLLA_METRICS: ../metric-update-6.1-to-6.2
.. |SCYLLA_METRICS| replace:: ScyllaDB Metrics Update - ScyllaDB 6.2 to 2025.1
.. _SCYLLA_METRICS: ../metric-update-6.2-to-2025.1
=============================================================================
Upgrade |SCYLLA_NAME| from |SRC_VERSION| to |NEW_VERSION|
@@ -80,14 +58,17 @@ For each of the nodes in the cluster, serially (i.e., one node at a time), you w
* Start ScyllaDB
* Validate that the upgrade was successful
Apply the following procedure **serially** on each node. Do not move to the next
node before validating that the node you upgraded is up and running the new version.
.. caution::
Apply the procedure **serially** on each node. Do not move to the next node before
validating that the node you upgraded is up and running the new version.
**During** the rolling upgrade, it is highly recommended:
* Not to use the new |NEW_VERSION| features.
* Not to run administration functions, such as repairs, refresh, rebuild, or add
or remove nodes.
or remove nodes. See `sctool <https://manager.docs.scylladb.com/stable/sctool/>`_ for suspending
ScyllaDB Manager's scheduled or running repairs.
* Not to apply schema changes.
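The serial rule above (finish and validate one node before touching the next) can be sketched as a small driver loop. This is an illustration only: ``for_each_node_serially`` is a hypothetical helper, and the per-node step passed to it stands in for whatever upgrade-and-validate command you use.

```shell
# Run a per-node step on each host in the order given. Stop at the first
# failure so that no further node is touched until the problem is resolved.
for_each_node_serially() {
    step=$1
    shift
    for host in "$@"; do
        echo "=== $host ==="
        "$step" "$host" || {
            echo "stopping: $step failed on $host" >&2
            return 1
        }
    done
}

# Usage, where upgrade_and_validate is a function or script you provide:
#   for_each_node_serially upgrade_and_validate node1 node2 node3
```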
Upgrade Steps
@@ -103,12 +84,16 @@ procedure will fail if there is a schema disagreement between nodes.
nodetool describecluster
Drain the nodes and backup data
Backup the data
-----------------------------------
Before any major procedure, like an upgrade, it is recommended to backup all the data
to an external device. In ScyllaDB, backup is performed using the ``nodetool snapshot``
command. For **each** node in the cluster, run the following command:
to an external device.
We recommend using `ScyllaDB Manager <https://manager.docs.scylladb.com/stable/backup/index.html>`_
to create backups.
Alternatively, you can use the ``nodetool snapshot`` command.
For **each** node in the cluster, run the following:
.. code:: sh
@@ -161,7 +146,11 @@ You should take note of the current version in case you want to |ROLLBACK|_ the
.. group-tab:: Debian/Ubuntu
#. Update the |SCYLLA_DEB_NEW_REPO| to |NEW_VERSION|.
#. Update the ScyllaDB deb repo to |NEW_VERSION|.
.. code-block:: console
sudo wget -O /etc/apt/sources.list.d/scylla.list https://downloads.scylladb.com/deb/debian/scylla-2025.1.list
#. Install the new ScyllaDB version:
@@ -171,21 +160,16 @@ You should take note of the current version in case you want to |ROLLBACK|_ the
sudo apt-get update
sudo apt-get dist-upgrade scylla
#. Remove old scylla-jmx package since the package is not used anymore:
.. code-block:: console
sudo apt-get purge scylla-jmx
scylla-jmx becomes optional package from ScyllaDB 6.2.
If you still need JMX server, see :doc:`Install scylla-jmx Package </getting-started/installation-common/install-jmx>` and get new version.
Answer y to the first two questions.
.. group-tab:: RHEL/CentOS
#. Update the |SCYLLA_RPM_NEW_REPO|_ to |NEW_VERSION|.
#. Update the ScyllaDB rpm repo to |NEW_VERSION|.
.. code-block:: console
sudo curl -o /etc/yum.repos.d/scylla.repo -L https://downloads.scylladb.com/rpm/centos/scylla-2025.1.repo
#. Install the new ScyllaDB version:
.. code:: sh
@@ -193,16 +177,6 @@ You should take note of the current version in case you want to |ROLLBACK|_ the
sudo yum clean all
sudo yum update scylla\* -y
#. Remove old scylla-jmx package since the package is not used anymore:
.. code:: sh
sudo yum remove scylla-jmx
scylla-jmx becomes optional package from ScyllaDB 6.2.
If you still need JMX server, see :doc:`Install scylla-jmx Package </getting-started/installation-common/install-jmx>` and get new version.
.. group-tab:: EC2/GCP/Azure Ubuntu Image
If you're using the ScyllaDB official image (recommended), see the **Debian/Ubuntu**
@@ -211,7 +185,7 @@ You should take note of the current version in case you want to |ROLLBACK|_ the
If you're using your own image and installed ScyllaDB packages for Ubuntu or Debian,
you need to apply an extended upgrade procedure:
#. Update the |SCYLLA_DEB_NEW_REPO| to |NEW_VERSION|.
#. Update the ScyllaDB deb repo (see the **Debian/Ubuntu** tab).
#. Install the new ScyllaDB version with the additional ``scylla-machine-image`` package:
.. code-block:: console
@@ -221,19 +195,14 @@ You should take note of the current version in case you want to |ROLLBACK|_ the
sudo apt-get dist-upgrade scylla
sudo apt-get dist-upgrade scylla-machine-image
#. Remove old scylla-jmx package since the package is not used anymore:
.. code-block:: console
sudo apt-get purge scylla-jmx
scylla-jmx becomes optional package from ScyllaDB 6.2.
If you still need JMX server, see :doc:`Install scylla-jmx Package </getting-started/installation-common/install-jmx>` and get new version.
#. Run ``scylla_setup`` without running ``io_setup``.
#. Run ``sudo /opt/scylladb/scylla-machine-image/scylla_cloud_io_setup``.
If you need the JMX server, see
:doc:`Install scylla-jmx Package </getting-started/installation-common/install-jmx>`
to install the new version.
Start the node
--------------
@@ -250,7 +219,7 @@ Validate
to check the ScyllaDB version. Validate that the version matches the one you upgraded to.
#. Check scylla-server log (by ``journalctl _COMM=scylla``) and ``/var/log/syslog`` to
validate there are no new errors in the log.
#. Check again after two minutes, to validate no new issues are introduced.
#. Check again after two minutes to validate no new issues are introduced.
Once you are sure the node upgrade was successful, move to the next node in the cluster.
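The status and version checks can be scripted so that they produce a clear pass or fail. The sketch below assumes that ``nodetool status`` prints one line per node beginning with a two-letter status code, and that the release-version REST endpoint returns the version as a quoted string. Both function names are hypothetical.

```shell
# Succeed only if every node line in `nodetool status` output (read from
# stdin) reports UN (Up/Normal). Fails if no node lines are found at all.
all_nodes_up_normal() {
    awk '
        /^[UD][NLJM][ \t]/ { seen++; if ($1 != "UN") bad++ }
        END { if (seen > 0 && bad == 0) exit 0; exit 1 }
    '
}

# Succeed if the release string on stdin starts with the expected prefix.
# Double quotes are stripped on the assumption that the API returns a JSON
# string such as "2025.1.0-...".
release_matches() {
    expected_prefix=$1
    actual=$(tr -d '"' | head -n 1)
    case $actual in
        "$expected_prefix"*) return 0 ;;
        *) echo "unexpected release: $actual" >&2; return 1 ;;
    esac
}

# Usage on the upgraded node:
#   nodetool status | all_nodes_up_normal
#   curl -s -X GET "http://localhost:10000/storage_service/scylla_release_version" \
#       | release_matches 2025.1
```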
@@ -260,7 +229,7 @@ Rollback Procedure
.. warning::
The rollback procedure can be applied **only** if some nodes have not been
upgraded to |NEW_VERSION| yet.As soon as the last node in the rolling upgrade
upgraded to |NEW_VERSION| yet. As soon as the last node in the rolling upgrade
procedure is started with |NEW_VERSION|, rollback becomes impossible. At that
point, the only way to restore a cluster to |SRC_VERSION| is by restoring it
from backup.
@@ -290,13 +259,13 @@ running the old version.
Rollback Steps
==============
Drain and gracefully stop the node
----------------------------------
.. code:: sh
nodetool drain
nodetool snapshot
sudo service scylla-server stop
Restore and install the old release
@@ -306,6 +275,12 @@ Restore and install the old release
.. group-tab:: Debian/Ubuntu
#. Remove the old repo file.
.. code:: sh
sudo rm -rf /etc/apt/sources.list.d/scylla.list
#. Restore the |SRC_VERSION| packages backed up during the upgrade.
.. code:: sh
@@ -326,6 +301,12 @@ Restore and install the old release
.. group-tab:: RHEL/CentOS
#. Remove the old repo file.
.. code:: sh
sudo rm -rf /etc/yum.repos.d/scylla.repo
#. Restore the |SRC_VERSION| packages backed up during the upgrade procedure.
.. code:: sh
@@ -339,11 +320,30 @@ Restore and install the old release
.. code:: console
sudo yum clean all
sudo rm -rf /var/cache/yum
sudo yum downgrade scylla-\*cqlsh -y
sudo yum remove scylla-\*cqlsh -y
sudo yum downgrade scylla\* -y
sudo yum install scylla -y
sudo yum remove scylla\*
sudo yum install scylla
.. group-tab:: EC2/GCP/Azure Ubuntu Image
If you're using the ScyllaDB official image (recommended), see the **Debian/Ubuntu**
tab for upgrade instructions.
If you're using your own image and installed ScyllaDB packages for Ubuntu or Debian,
you need to additionally restore the ``scylla-machine-image`` package.
#. Restore the |SRC_VERSION| packages backed up during the upgrade
(see the **Debian/Ubuntu** tab).
#. Install:
.. code-block::
sudo apt-get update
sudo apt-get remove scylla\* -y
sudo apt-get install scylla
sudo apt-get install scylla-machine-image
Answer y to the first two questions.
Restore the configuration file
------------------------------


@@ -1,16 +0,0 @@
=============================
Upgrade ScyllaDB Open Source
=============================
.. toctree::
:hidden:
ScyllaDB 6.1 to 6.2 <upgrade-guide-from-6.1-to-6.2/index>
ScyllaDB 6.x Maintenance Upgrade <upgrade-guide-from-6.x.y-to-6.x.z>
Procedures for upgrading to a newer version of ScyllaDB Open Source.
* :doc:`ScyllaDB 6.1 to 6.2 <upgrade-guide-from-6.1-to-6.2/index>`
* :doc:`ScyllaDB 6.x Maintenance Upgrade <upgrade-guide-from-6.x.y-to-6.x.z>`


@@ -1,13 +0,0 @@
=====================================
ScyllaDB 6.1 to 6.2 Upgrade Guide
=====================================
.. toctree::
:maxdepth: 2
:hidden:
Upgrade ScyllaDB <upgrade-guide-from-6.1-to-6.2-generic>
Metrics Update <metric-update-6.1-to-6.2>
* :doc:`Upgrade ScyllaDB from 6.1.x to 6.2.y <upgrade-guide-from-6.1-to-6.2-generic>`
* :doc:`ScyllaDB Metrics Update - ScyllaDB 6.1 to 6.2 <metric-update-6.1-to-6.2>`


@@ -1,32 +0,0 @@
.. |SRC_VERSION| replace:: 6.1
.. |NEW_VERSION| replace:: 6.2
ScyllaDB Metric Update - ScyllaDB |SRC_VERSION| to |NEW_VERSION|
================================================================
.. toctree::
:maxdepth: 2
:hidden:
ScyllaDB |NEW_VERSION| Dashboards are available as part of the latest |mon_root|.
New Metrics
------------
The following metrics are new in ScyllaDB |NEW_VERSION|:
.. list-table::
:widths: 25 150
:header-rows: 1
* - Metric
- Description
* - scylla_alternator_batch_item_count
- The total number of items processed across all batches


@@ -1,252 +0,0 @@
.. |SCYLLA_NAME| replace:: ScyllaDB
.. |SRC_VERSION| replace:: 6.x.y
.. |NEW_VERSION| replace:: 6.x.z
.. |MINOR_VERSION| replace:: 6.x
.. |SCYLLA_DEB_NEW_REPO| replace:: ScyllaDB deb repo
.. _SCYLLA_DEB_NEW_REPO: https://www.scylladb.com/download/#open-source
.. |SCYLLA_RPM_NEW_REPO| replace:: ScyllaDB Enterprise rpm repo
.. _SCYLLA_RPM_NEW_REPO: https://www.scylladb.com/download/#open-source
=============================================================================
Upgrade Guide - |SCYLLA_NAME| |SRC_VERSION| to |NEW_VERSION|
=============================================================================
This document is a step-by-step procedure for upgrading from |SCYLLA_NAME| |SRC_VERSION|
to |SCYLLA_NAME| |NEW_VERSION| (where "z" is the latest available version).
Applicable Versions
===================
This guide covers upgrading ScyllaDB on Red Hat Enterprise Linux (RHEL), CentOS, Debian,
and Ubuntu. See :doc:`OS Support by Platform and Version </getting-started/os-support>` for
information about supported versions.
This guide also applies when you're upgrading ScyllaDB Enterprise official image on EC2, GCP,
or Azure.
Upgrade Procedure
=================
.. note::
Apply the following procedure **serially** on each node. Do not move to the next node
before validating the node is up and running the new version.
A ScyllaDB upgrade is a rolling procedure which does **not** require full cluster shutdown.
For each of the nodes in the cluster, you will:
* Drain node and backup the data.
* Check your current release.
* Backup configuration file.
* Stop ScyllaDB.
* Download and install new ScyllaDB packages.
* Start ScyllaDB.
* Validate that the upgrade was successful.
**Before** upgrading, check what version you are running now using ``scylla --version``.
You should use the same version in case you want to rollback the upgrade.
**During** the rolling upgrade it is highly recommended:
* Not to use new |NEW_VERSION| features.
* Not to run administration functions, like repairs, refresh, rebuild or add or remove nodes.
* Not to apply schema changes.
Upgrade steps
=============
Drain node and backup the data
------------------------------
Before any major procedure, like an upgrade, it is recommended to backup all the data to
an external device. In ScyllaDB, backup is done using the ``nodetool snapshot`` command.
For **each** node in the cluster, run the following command:
.. code:: sh
nodetool drain
nodetool snapshot
Take note of the directory name that nodetool gives you, and copy all
the directories having this name under ``/var/lib/scylla`` to a backup device.
When the upgrade is complete (all nodes), the snapshot should be removed by
``nodetool clearsnapshot -t <snapshot>``, or you risk running out of space.
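To locate the directories to copy, you can search the data tree for the snapshot name. This is a minimal sketch that assumes snapshots are stored in a ``snapshots/<snapshot>`` subdirectory of each table directory; ``list_snapshot_dirs`` is a hypothetical helper.

```shell
# Print every directory under the given root whose path ends in
# snapshots/<snapshot_name>, one per line.
list_snapshot_dirs() {
    data_root=$1
    snapshot_name=$2
    find "$data_root" -type d -path "*/snapshots/$snapshot_name"
}

# Usage, with the snapshot name reported by nodetool:
#   list_snapshot_dirs /var/lib/scylla <snapshot>
```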
Backup the configuration file
------------------------------
Back up the scylla.yaml configuration file and the ScyllaDB packages in case
you need to rollback the upgrade.
.. tabs::
.. group-tab:: Debian/Ubuntu
.. code:: sh
sudo cp -a /etc/scylla/scylla.yaml /etc/scylla/scylla.yaml.backup
sudo cp /etc/apt/sources.list.d/scylla.list ~/scylla.list-backup
.. group-tab:: RHEL/CentOS
.. code:: sh
sudo cp -a /etc/scylla/scylla.yaml /etc/scylla/scylla.yaml.backup
sudo cp /etc/yum.repos.d/scylla.repo ~/scylla.repo-backup
Gracefully stop the node
------------------------
.. code:: sh
sudo service scylla-server stop
Download and install the new release
------------------------------------
.. tabs::
.. group-tab:: Debian/Ubuntu
#. Update the |SCYLLA_DEB_NEW_REPO|_ to |MINOR_VERSION|.
#. Install:
.. code:: sh
sudo apt-get clean all
sudo apt-get update
sudo apt-get dist-upgrade scylla
Answer y to the first two questions.
.. group-tab:: RHEL/CentOS
#. Update the |SCYLLA_RPM_NEW_REPO|_ to |MINOR_VERSION|.
#. Install:
.. code:: sh
sudo yum clean all
sudo yum update scylla\* -y
.. group-tab:: EC2/GCP/Azure Ubuntu Image
If you're using the ScyllaDB official image (recommended), see
the **Debian/Ubuntu** tab for upgrade instructions.
If you're using your own image and installed ScyllaDB packages for
Ubuntu or Debian, you need to apply an extended upgrade procedure:
#. Update the |SCYLLA_DEB_NEW_REPO|_ to |MINOR_VERSION|.
#. Install the new ScyllaDB version with the additional ``scylla-machine-image`` package:
.. code-block:: console
sudo apt-get clean all
sudo apt-get update
sudo apt-get dist-upgrade scylla
sudo apt-get dist-upgrade scylla-machine-image
#. Run ``scylla_setup`` without running ``io_setup``.
#. Run ``sudo /opt/scylladb/scylla-machine-image/scylla_cloud_io_setup``.
Start the node
--------------
.. code:: sh
sudo service scylla-server start
Validate
--------
1. Check cluster status with ``nodetool status`` and make sure **all** nodes, including the one you just upgraded, are in UN status.
2. Use ``curl -X GET "http://localhost:10000/storage_service/scylla_release_version"`` to check the ScyllaDB version.
3. Use ``journalctl _COMM=scylla`` to check there are no new errors in the log.
4. Check again after 2 minutes, to validate no new issues are introduced.
Once you are sure the node upgrade is successful, move to the next node in the cluster.
Rollback Procedure
==================
The following procedure describes a rollback from ScyllaDB release |NEW_VERSION| to |SRC_VERSION|.
Apply this procedure if an upgrade from |SRC_VERSION| to |NEW_VERSION| failed before
completing on all nodes. Use this procedure only for nodes you upgraded to |NEW_VERSION|.
.. caution::
Apply the procedure **serially** on each node. Do not move to the next node
before validating the node is up and running with the new version.
ScyllaDB rollback is a rolling procedure which does **not** require full cluster shutdown.
For each of the nodes to rollback to |SRC_VERSION|, you will:
* Drain the node and stop ScyllaDB.
* Downgrade to previous release.
* Restore the configuration file.
* Restart ScyllaDB.
* Validate the rollback success.
Rollback steps
==============
Gracefully shutdown ScyllaDB
-----------------------------
.. code:: sh
nodetool drain
sudo service scylla-server stop
Downgrade to the previous release
----------------------------------
.. tabs::
.. group-tab:: Debian/Ubuntu
Install:
.. code-block:: console
:substitutions:
sudo apt-get install scylla=|SRC_VERSION|\* scylla-server=|SRC_VERSION|\* scylla-tools=|SRC_VERSION|\* scylla-tools-core=|SRC_VERSION|\* scylla-kernel-conf=|SRC_VERSION|\* scylla-conf=|SRC_VERSION|\*
sudo apt-get install scylla-machine-image=|SRC_VERSION|\* # only execute on AMI instance
Answer y to the first two questions.
.. group-tab:: RHEL/CentOS
Install:
.. code-block:: console
:substitutions:
sudo yum downgrade scylla\*-|SRC_VERSION|-\* -y
Restore the configuration file
------------------------------
.. code:: sh
sudo rm -rf /etc/scylla/scylla.yaml
sudo cp -a /etc/scylla/scylla.yaml.backup /etc/scylla/scylla.yaml
Start the node
--------------
.. code:: sh
sudo service scylla-server start
Validate
--------
Check upgrade instruction above for validation. Once you are sure the node
rollback is successful, move to the next node in the cluster.


@@ -1,18 +0,0 @@
=========================================================
Upgrade from ScyllaDB Open Source to ScyllaDB Enterprise
=========================================================
.. toctree::
:titlesonly:
:hidden:
ScyllaDB 6.0 to ScyllaDB Enterprise 2024.2 <upgrade-guide-from-6.0-to-2024.2/index>
ScyllaDB 5.4 to ScyllaDB Enterprise 2024.1 <upgrade-guide-from-5.4-to-2024.1/index>
ScyllaDB 5.2 to ScyllaDB Enterprise 2023.1 <upgrade-guide-from-5.2-to-2023.1/index>
Procedures for upgrading from ScyllaDB Open Source to ScyllaDB Enterprise:
* :doc:`ScyllaDB 6.0 to ScyllaDB Enterprise 2024.2 </upgrade/upgrade-to-enterprise/upgrade-guide-from-6.0-to-2024.2/index>`
* :doc:`ScyllaDB 5.4 to ScyllaDB Enterprise 2024.1 </upgrade/upgrade-to-enterprise/upgrade-guide-from-5.4-to-2024.1/index>`
* :doc:`ScyllaDB 5.2 to ScyllaDB Enterprise 2023.1 </upgrade/upgrade-to-enterprise/upgrade-guide-from-5.2-to-2023.1/index>`


@@ -1,20 +0,0 @@
======================================================
Upgrade - ScyllaDB 5.2 to ScyllaDB Enterprise 2023.1
======================================================
.. toctree::
:maxdepth: 2
:hidden:
ScyllaDB <upgrade-guide-from-5.2-to-2023.1-generic>
Metrics <metric-update-5.2-to-2023.1>
.. panel-box::
:title: Upgrade ScyllaDB
:id: "getting-started"
:class: my-panel
* :doc:`Upgrade ScyllaDB from 5.2.x to 2023.1.y <upgrade-guide-from-5.2-to-2023.1-generic>`
* :doc:`ScyllaDB Metrics Update - ScyllaDB 5.2 to 2023.1 <metric-update-5.2-to-2023.1>`


@@ -1,5 +0,0 @@
===================================================================
ScyllaDB Metric Update - ScyllaDB 5.2 to ScyllaDB Enterprise 2023.1
===================================================================
There are no metric updates in ScyllaDB Enterprise 2023.1 compared to ScyllaDB 5.2.


@@ -1,344 +0,0 @@
.. |SCYLLA_NAME| replace:: ScyllaDB
.. |SRC_VERSION| replace:: 5.2
.. |NEW_VERSION| replace:: 2023.1
.. |DEBIAN_SRC_REPO| replace:: Debian
.. _DEBIAN_SRC_REPO: http://www.scylladb.com/download/?platform=debian-10&version=scylla-5.2
.. |UBUNTU_SRC_REPO| replace:: Ubuntu
.. _UBUNTU_SRC_REPO: https://www.scylladb.com/download/?platform=ubuntu-20.04&version=scylla-5.2
.. |SCYLLA_DEB_SRC_REPO| replace:: ScyllaDB deb repo (|DEBIAN_SRC_REPO|_, |UBUNTU_SRC_REPO|_)
.. |SCYLLA_RPM_SRC_REPO| replace:: ScyllaDB rpm repo
.. _SCYLLA_RPM_SRC_REPO: https://www.scylladb.com/download/?platform=centos&version=scylla-5.2
.. |DEBIAN_NEW_REPO| replace:: Debian
.. _DEBIAN_NEW_REPO: https://www.scylladb.com/customer-portal/?product=ent&platform=debian-10&version=stable-release-2023.1
.. |UBUNTU_NEW_REPO| replace:: Ubuntu
.. _UBUNTU_NEW_REPO: https://www.scylladb.com/customer-portal/?product=ent&platform=ubuntu-20.04&version=stable-release-2023.1
.. |SCYLLA_DEB_NEW_REPO| replace:: ScyllaDB deb repo (|DEBIAN_NEW_REPO|_, |UBUNTU_NEW_REPO|_)
.. |SCYLLA_RPM_NEW_REPO| replace:: ScyllaDB rpm repo
.. _SCYLLA_RPM_NEW_REPO: https://www.scylladb.com/customer-portal/?product=ent&platform=centos7&version=stable-release-2023.1
.. |ROLLBACK| replace:: rollback
.. _ROLLBACK: ./#rollback-procedure
.. |SCYLLA_METRICS| replace:: ScyllaDB Metrics Update - ScyllaDB 5.2 to ScyllaDB Enterprise 2023.1
.. _SCYLLA_METRICS: ../metric-update-5.2-to-2023.1
=============================================================================
Upgrade Guide - |SCYLLA_NAME| |SRC_VERSION| to |NEW_VERSION|
=============================================================================
This document is a step-by-step procedure for upgrading from |SCYLLA_NAME| |SRC_VERSION| to |SCYLLA_NAME| Enterprise |NEW_VERSION|, and for rolling back to version |SRC_VERSION| if required.
This guide covers upgrading ScyllaDB on Red Hat Enterprise Linux (RHEL), CentOS, Debian, and Ubuntu. See :doc:`OS Support by Platform and Version </getting-started/os-support>` for information about supported versions.
This guide also applies when you're upgrading ScyllaDB Enterprise official image on EC2, GCP, or Azure. The image is based on Ubuntu 22.04.
Upgrade Procedure
=================
A ScyllaDB upgrade is a rolling procedure which does **not** require full cluster shutdown.
For each of the nodes in the cluster you will:
* Check that the cluster's schema is synchronized
* Drain the node and back up the data
* Back up the configuration file
* Stop ScyllaDB
* Download and install new ScyllaDB packages
* Start ScyllaDB
* Validate that the upgrade was successful
.. caution::
Apply the procedure **serially** on each node. Do not move to the next node before validating that the node you upgraded is up and running the new version.
**During** the rolling upgrade, it is highly recommended:
* Not to use the new |NEW_VERSION| features.
* Not to run administration functions, such as repair, refresh, or rebuild, or to add or remove nodes. See `sctool <https://manager.docs.scylladb.com/stable/sctool/>`_ for suspending ScyllaDB Manager's scheduled or running repairs.
* Not to apply schema changes.
.. note:: Before upgrading, make sure to use the latest `ScyllaDB Monitoring <https://monitoring.docs.scylladb.com/>`_ stack.
Upgrade Steps
=============
Check the cluster schema
-------------------------
Make sure that all nodes have the schema synchronized before upgrade. The upgrade procedure will fail if there is a schema disagreement between nodes.
.. code:: sh
nodetool describecluster
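The check can also be scripted. The following is a minimal sketch, not an official tool: it assumes that ``nodetool describecluster`` prints each schema version as ``<uuid>: [<node list>]``, and the helper names ``count_schema_versions`` and ``schema_in_agreement`` are hypothetical.

```shell
#!/bin/sh
# Count schema versions in `nodetool describecluster` output read from stdin.
# Assumption: each version is printed on its own line as "<uuid>: [<nodes>]".
count_schema_versions() {
    grep -cE '^[[:space:]]*[0-9a-fA-F-]{36}: \['
}

# Hypothetical helper: succeeds only when exactly one schema version is listed.
schema_in_agreement() {
    [ "$(count_schema_versions)" -eq 1 ]
}
```

For example, ``nodetool describecluster | schema_in_agreement && echo "schema synchronized"`` prints the message only when a single schema version is reported.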
Drain the nodes and backup the data
-----------------------------------
Before any major procedure, like an upgrade, it is recommended to back up all the data to an external device. In ScyllaDB, you can back up
the data using the ``nodetool snapshot`` command. For **each** node in the cluster, run the following command:
.. code:: sh
nodetool drain
nodetool snapshot
Take note of the directory name that nodetool gives you, and copy all the directories having that name under ``/var/lib/scylla`` to
a backup device.
When the upgrade is completed on all nodes, remove the snapshot with the ``nodetool clearsnapshot -t <snapshot>`` command to
prevent running out of space.
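Locating and copying the snapshot directories can be sketched as follows. This is an illustration only: it assumes snapshots are stored at ``<root>/<keyspace>/<table>-<UUID>/snapshots/<tag>``, and the function names and the ``backup_dest`` argument are hypothetical.

```shell
#!/bin/sh
# List every snapshot directory with the given tag under a data root.
# Assumption: layout is <root>/<keyspace>/<table>-<UUID>/snapshots/<tag>.
list_snapshot_dirs() {
    data_root=$1
    tag=$2
    find "$data_root" -type d -path "*/snapshots/$tag"
}

# Copy each snapshot directory to a backup destination, keeping relative paths.
# backup_dest is a hypothetical mount point for the external device.
copy_snapshots() {
    data_root=$1
    tag=$2
    backup_dest=$3
    list_snapshot_dirs "$data_root" "$tag" | while IFS= read -r dir; do
        rel=${dir#"$data_root"/}
        mkdir -p "$backup_dest/$rel"
        cp -a "$dir/." "$backup_dest/$rel/"
    done
}
```

A call such as ``copy_snapshots /var/lib/scylla <snapshot> /mnt/backup`` would mirror the snapshot directories under the destination; ``/mnt/backup`` is a placeholder.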
Backup the configuration file
------------------------------
.. code:: sh
sudo cp -a /etc/scylla/scylla.yaml /etc/scylla/scylla.yaml.backup-src
Gracefully stop the node
------------------------
.. code:: sh
sudo service scylla-server stop
.. _upgrade-debian-ubuntu-5.2-to-enterprise-2023.1:
Download and install the new release
------------------------------------
.. tabs::
.. group-tab:: Debian/Ubuntu
Before upgrading, check which version you are currently running with ``scylla --version``. Note this version: it is the one to reinstall if you need to |ROLLBACK|_ the upgrade. If you are not running a |SRC_VERSION|.x version, stop right here! This guide only covers |SRC_VERSION|.x to |NEW_VERSION|.y upgrades.
**To upgrade ScyllaDB:**
#. Update the |SCYLLA_DEB_NEW_REPO| to |NEW_VERSION|.
#. Configure Java 1.8:
.. code-block:: console
sudo apt-get update
sudo apt-get install -y openjdk-8-jre-headless
sudo update-java-alternatives -s java-1.8.0-openjdk-amd64
#. Install the new ScyllaDB version:
.. code-block:: console
sudo apt-get clean all
sudo apt-get update
sudo apt-get remove scylla\*
sudo apt-get install scylla-enterprise
sudo systemctl daemon-reload
Answer y to the first two questions.
.. group-tab:: RHEL/CentOS
Before upgrading, check which version you are currently running with ``scylla --version``. Note this version: it is the one to reinstall if you need to |ROLLBACK|_ the upgrade. If you are not running a |SRC_VERSION|.x version, stop right here! This guide only covers |SRC_VERSION|.x to |NEW_VERSION|.y upgrades.
**To upgrade ScyllaDB:**
#. Update the |SCYLLA_RPM_NEW_REPO|_ to |NEW_VERSION|.
#. Install the new ScyllaDB version:
.. code:: sh
sudo yum clean all
sudo rm -rf /var/cache/yum
sudo yum remove scylla\*
sudo yum install scylla-enterprise
.. group-tab:: EC2/GCP/Azure Ubuntu Image
Before upgrading, check which version you are currently running with ``scylla --version``. Note this version: it is the one to reinstall if you need to |ROLLBACK|_ the upgrade. If you are not running a |SRC_VERSION|.x version, stop right here! This guide only covers |SRC_VERSION|.x to |NEW_VERSION|.y upgrades.
If you're using the ScyllaDB official image (recommended), see
the **Debian/Ubuntu** tab for upgrade instructions. If you're using your
own image and have installed ScyllaDB packages for Ubuntu or Debian,
you need to apply an extended upgrade procedure:
#. Update the ScyllaDB deb repo (see above).
#. Configure Java 1.8 (see above).
#. Install the new ScyllaDB version with the additional
``scylla-enterprise-machine-image`` package:
.. code::
sudo apt-get clean all
sudo apt-get update
sudo apt-get dist-upgrade scylla-enterprise
sudo apt-get dist-upgrade scylla-enterprise-machine-image
#. Run ``scylla_setup`` without running ``io_setup``.
#. Run ``sudo /opt/scylladb/scylla-machine-image/scylla_cloud_io_setup``.
Start the node
--------------
.. code:: sh
sudo service scylla-server start
Validate
--------
#. Check cluster status with ``nodetool status`` and make sure **all** nodes, including the one you just upgraded, are in ``UN`` status.
#. Use ``curl -X GET "http://localhost:10000/storage_service/scylla_release_version"`` to check the ScyllaDB version. Validate that the version matches the one you upgraded to.
#. Check scylla-server log (using ``journalctl _COMM=scylla``) and ``/var/log/syslog`` to validate there are no new errors in the log.
#. Check again after two minutes, to validate no new issues are introduced.
Once you are sure the node upgrade was successful, move to the next node in the cluster.
See |SCYLLA_METRICS|_ for more information.
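The first two validation checks can be scripted. The sketch below makes two assumptions: node rows in ``nodetool status`` output begin with a two-letter status code (``U`` or ``D`` followed by ``N``, ``L``, ``J``, or ``M``), and the REST endpoint returns the release as a quoted string. The function names are hypothetical.

```shell
#!/bin/sh
# Succeeds when `nodetool status` output (stdin) lists at least one node
# and every node row starts with the status code UN.
# Assumption: node rows begin with a two-letter code such as UN or DN.
all_nodes_un() {
    awk '$1 ~ /^[UD][NLJM]$/ { total++; if ($1 != "UN") bad++ }
         END { exit !(total > 0 && bad == 0) }'
}

# Succeeds when the reported release string starts with "<expected>.".
# Assumption: the endpoint may wrap the version in double quotes.
version_matches() {
    expected=$1
    actual=$(printf '%s' "$2" | tr -d '"')
    case $actual in
        "$expected".*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, ``nodetool status | all_nodes_un`` checks the cluster state, and ``version_matches 2023.1 "$(curl -s -X GET "http://localhost:10000/storage_service/scylla_release_version")"`` checks the version on the upgraded node.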
Rollback Procedure
==================
.. include:: /upgrade/_common/warning_rollback.rst
The following procedure describes a rollback from |SCYLLA_NAME| |NEW_VERSION|.x to |SRC_VERSION|.y. Apply this procedure if an upgrade from |SRC_VERSION| to |NEW_VERSION| failed before completing on all nodes. Use this procedure only for nodes you upgraded to |NEW_VERSION|.
.. warning::
The rollback procedure can be applied **only** if some nodes have not been upgraded to |NEW_VERSION| yet.
As soon as the last node in the rolling upgrade procedure is started with |NEW_VERSION|, rollback becomes impossible.
At that point, the only way to restore a cluster to |SRC_VERSION| is by restoring it from backup.
ScyllaDB rollback is a rolling procedure which does **not** require a full cluster shutdown.
For each node you roll back to |SRC_VERSION|, you will:
* Drain the node and stop ScyllaDB
* Retrieve the old ScyllaDB packages
* Restore the configuration file
* Restore system tables
* Reload systemd configuration
* Restart ScyllaDB
* Validate the rollback success
Apply the following procedure **serially** on each node. Do not move to the next node before validating that the rollback was successful and the node is up and running the old version.
Rollback Steps
==============
Drain and gracefully stop the node
----------------------------------
.. code:: sh
nodetool drain
sudo service scylla-server stop
Download and install the old release
------------------------------------
..
TODO: downgrade for 3rd party packages in EC2/GCP/Azure - like in the upgrade section?
.. tabs::
.. group-tab:: Debian/Ubuntu
#. Remove the old repo file.
.. code:: sh
sudo rm -rf /etc/apt/sources.list.d/scylla.list
#. Update the |SCYLLA_DEB_SRC_REPO| to |SRC_VERSION|.
#. Install:
.. code-block::
sudo apt-get update
sudo apt-get remove scylla\* -y
sudo apt-get install scylla
Answer y to the first two questions.
.. group-tab:: RHEL/CentOS
#. Remove the old repo file.
.. code:: sh
sudo rm -rf /etc/yum.repos.d/scylla.repo
#. Update the |SCYLLA_RPM_SRC_REPO|_ to |SRC_VERSION|.
#. Install:
.. code:: console
sudo yum clean all
sudo yum remove scylla\*
sudo yum install scylla
.. group-tab:: EC2/GCP/Azure Ubuntu Image
#. Remove the old repo file.
.. code:: sh
sudo rm -rf /etc/apt/sources.list.d/scylla.list
#. Update the |SCYLLA_DEB_SRC_REPO| to |SRC_VERSION|.
#. Install:
.. code-block::
sudo apt-get update
sudo apt-get remove scylla\* -y
sudo apt-get install scylla
Answer y to the first two questions.
Restore the configuration file
------------------------------
.. code:: sh
sudo rm -rf /etc/scylla/scylla.yaml
sudo cp -a /etc/scylla/scylla.yaml.backup-src /etc/scylla/scylla.yaml
Restore system tables
---------------------
Restore all tables of **system** and **system_schema** from the previous snapshot because |NEW_VERSION| uses a different set of system tables. See :doc:`Restore from a Backup and Incremental Backup </operating-scylla/procedures/backup-restore/restore/>` for reference.
.. code:: console
cd /var/lib/scylla/data/keyspace_name/table_name-UUID/
sudo find . -maxdepth 1 -type f -exec sudo rm -f "{}" +
cd /var/lib/scylla/data/keyspace_name/table_name-UUID/snapshots/<snapshot_name>/
sudo cp -r * /var/lib/scylla/data/keyspace_name/table_name-UUID/
sudo chown -R scylla:scylla /var/lib/scylla/data/keyspace_name/table_name-UUID/
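Because the commands above must be repeated for every table, a loop can help. The following is a minimal sketch under stated assumptions: table directories are laid out as ``<data_root>/<keyspace>/<table>-<UUID>``, tables without the named snapshot are skipped, and the function names are hypothetical. It does not change ownership, so the ``chown`` step shown above still applies.

```shell
#!/bin/sh
# Replace a table directory's top-level files with those from one snapshot.
# Assumption: the snapshot is at <table_dir>/snapshots/<snapshot_name>.
restore_table_from_snapshot() {
    tdir=$1
    snap_src="$tdir/snapshots/$2"
    [ -d "$snap_src" ] || { echo "missing snapshot: $snap_src" >&2; return 1; }
    # Remove only regular files at the top level; subdirectories are kept.
    find "$tdir" -maxdepth 1 -type f -exec rm -f {} +
    cp -r "$snap_src"/. "$tdir"/
}

# Apply the restore to every table of the system and system_schema keyspaces.
restore_system_keyspaces() {
    data_root=$1
    snap_name=$2
    for ks in system system_schema; do
        for d in "$data_root/$ks"/*/; do
            # Skip tables that have no snapshot with this name.
            [ -d "${d}snapshots/$snap_name" ] || continue
            restore_table_from_snapshot "${d%/}" "$snap_name" || return 1
        done
    done
}
```

Run as a user with write access to the data directory, for example ``restore_system_keyspaces /var/lib/scylla/data <snapshot_name>``, then restore ownership as shown above.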
Reload systemd configuration
----------------------------
You must reload the unit file if the systemd unit file is changed.
.. code:: sh
sudo systemctl daemon-reload
Start the node
--------------
.. code:: sh
sudo service scylla-server start
Validate
--------
Check the upgrade instructions above for validation. Once you are sure the node rollback is successful, move to the next node in the cluster.
