Commit Graph

46469 Commits

Author SHA1 Message Date
Lakshmi Narayanan Sreethar
b7e72b3167 replica/table::do_apply : do not check for async gate's closure
The `table::do_apply()` method verifies if the compaction group's async
gate is open to determine if the compaction group is active. Closing
this async gate prevents any new operations but waits for existing
holders to exit, allowing their operations to complete. When holding a
gate, holders will observe the gate as closed when it is being closed,
but this is irrelevant as they are already inside the gate and are
allowed to complete. All the callers of `table::do_apply()` already
enter the gate before calling the method. So, the async gate check
inside `table::do_apply()` will erroneously throw an exception when the
compaction group is closing despite holding the gate. This commit
removes the check to prevent this from happening.

Fixes #23348

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>

Closes scylladb/scylladb#23579

(cherry picked from commit 750f4baf44)

Closes scylladb/scylladb#23645
2025-04-08 18:59:22 +03:00
Yaron Kaikov
98359dbfb1 .github: Make "make-pr-ready-for-review" workflow run in base repo
in 57683c1a50 we fixed the `token` error,
but removed the checkout part which causing now the following error
```
failed to run git: fatal: not a git repository (or any of the parent directories): .git
```
Adding the repo checkout stage to avoid such error

Fixes: https://github.com/scylladb/scylladb/issues/22765

Closes scylladb/scylladb#23641

(cherry picked from commit 2dc7ea366b)

Closes scylladb/scylladb#23654
2025-04-08 13:49:27 +03:00
Botond Dénes
1a896169dc Merge '[Backport 2025.1] repair: release erm in repair_writer_impl::create_writer when possible' from Scylladb[bot]
Currently, repair_writer_impl::create_writer keeps erm to ensure that a sharder is valid. If we repair a tablet, erm blocks the state machine and no operation on any tablet of this table might be performed.

Use auto_refreshing_sharder and topology_guard to ensure that the operation is safe and that tablet operations on the whole table aren't blocked.

Fixes: #23453.

Needs backport to 2025.1 that introduces the tablet repair scheduler.

- (cherry picked from commit 1dc29ddc86)

- (cherry picked from commit bae6711809)

Parent PR: #23455

Closes scylladb/scylladb#23580

* github.com:scylladb/scylladb:
  \test: add test to check concurrent migration and repair of two different tablets
  repair: release erm in repair_writer_impl::create_writer when possible
scylla-2025.1.1 scylla-2025.1.1-candidate-20250408065609
2025-04-07 10:10:20 +03:00
Kefu Chai
9ccad33e59 .github: Make "make-pr-ready-for-review" workflow run in base repo
The "make-pr-ready-for-review" workflow was failing with an "Input
required and not supplied: token" error.  This was due to GitHub Actions
security restrictions preventing access to the token when the workflow
is triggered in a fork:
```
    Error: Input required and not supplied: token
```

This commit addresses the issue by:

- Running the workflow in the base repository instead of the fork. This
  grants the workflow access to the required token with write permissions.
- Simplifying the workflow by using a job-level `if` condition to
  controlexecution, as recommended in the GitHub Actions documentation
  (https://docs.github.com/en/actions/writing-workflows/choosing-when-your-workflow-runs/using-conditions-to-control-job-execution).
  This is cleaner than conditional steps.
- Removing the repository checkout step, as the source code is not required for this workflow.

This change resolves the token error and ensures the
"make-pr-ready-for-review" workflow functions correctly.

Fixes scylladb/scylladb#22765

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#22766

(cherry picked from commit ca832dc4fb)

Closes scylladb/scylladb#23561
2025-04-07 08:10:10 +03:00
Piotr Smaron
a17dd4d4c9 [Backport 2025.1] auth: forbid modifying system ks by non-superusers
Before this patch, granting a user MODIFY permissions on ALL KEYSPACES allowed the user to write to system tables, where the user could also set himself to "superuser" granting him all other permissions. After this patch, MODIFY permissions on ALL KEYSPACES is limited only to non-system keyspaces.

Fixes: scylladb/scylladb#23218
(cherry picked from commit fee50f287c)

Parent PR: #23219

Closes scylladb/scylladb#23594
2025-04-06 15:10:06 +03:00
Nadav Har'El
a2a4c6e4b2 test/alternator: increase timeout in Alternator RBAC test
On our testing infrastructure, tests often run a hundred times (!)
slower than usual, for various reasons that we can't always avoid.
This is why all our test frameworks drastically increase the default
timeouts.

We forgot to increase the timeout in one place - where Alternator tests
use CQL. This is needed for the Alternator role-based access control
(RBAC) tests, which is configured via CQL and therefore the Alternator
test unusually uses CQL.

So in this patch we increase the timeout of CQL driver used by
Alternator tests to the same high timeouts (60-120 seconds) used by
the regular CQL tests. As the famous saying goes, these timeouts should
be enough for anyone.

Fixes #23569.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#23578

(cherry picked from commit a9a6f9eecc)

Closes scylladb/scylladb#23601
2025-04-06 11:49:46 +03:00
Avi Kivity
64182d9df6 Update seastar submodule (prefaulter leaving zombie threads)
* seastar a350b5d70e...6d8fccf14c (1):
  > smp: prefaulter: don't leave zombie worker threads

Fixes #23316
2025-04-05 22:28:53 +03:00
Pavel Emelyanov
8e85ef90d2 sstables_loader: Do not stop sharded<progress_monitor> unconditionally
The member in question is unconditionally .stop()-ed in task's
release_resources() method, however, it may happen that the thing wasn't
.start()-ed in the first place. Start happens in the middle of the
task's .run() method and there can be several reasons why it can be
skipped -- e.g. the task is aborted early, or collecting sstables from
S3 throws.

fixes: #23231

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#23483

(cherry picked from commit 832d83ae4b)

Closes scylladb/scylladb#23557
2025-04-04 17:46:20 +03:00
Aleksandra Martyniuk
b5b2ffa5df \test: add test to check concurrent migration and repair of two different tablets
(cherry picked from commit bae6711809)
2025-04-04 10:14:51 +02:00
Andrzej Jackowski
b7f067ce33 audit: fix empty query string in BATCH query
Function modification_statement::add_raw() is never called, which
makes query string in audit_info of batch queries empty. In enterprise
branch, add_raw is called in Cql.g and those changes were never merged
to master.

This changes:
 - Add missing call of add_raw() to Cql.g
 - Include other related changes (from PR#3228 in scylla-enterprise)

Fixes scylladb#23311

Closes scylladb/scylladb#23315

(cherry picked from commit b8adbcbc84)

Closes scylladb/scylladb#23495
2025-04-03 16:46:33 +03:00
Aleksandra Martyniuk
307f00a398 repair: release erm in repair_writer_impl::create_writer when possible
Currently, repair_writer_impl::create_writer keeps erm to ensure
that a sharder is valid. If we repair a tablet, erm blocks the state
machine and no operation on any tablet of this table might be performed.

Use auto_refreshing_sharder and topology_guard to ensure that the
operation is safe and that tablet operations on the whole table
aren't blocked.

Fixes: #23453.
(cherry picked from commit 1dc29ddc86)
2025-04-03 13:19:40 +00:00
Dawid Mędrek
c56e47f72f db/hints: Cancel draining when stopping node
Draining hints may occur in one of the two scenarios:

* a node leaves the cluster and the local node drains all of the hints
  saved for that node,
* the local node is being decommissioned.

Draining may take some time and the hint manager won't stop until it
finishes. It's not a problem when decommissioning a node, especially
because we want the cluster to retain the data stored in the hints.
However, it may become a problem when the local node started draining
hints saved for another node and now it's being shut down.

There are two reasons for that:

* Generally, in situations like that, we'd like to be able to shut down
  nodes as fast as possible. The data stored in the hints won't
  disappear from the cluster yet since we can restart the local node.
* Draining hints may introduce flakiness in tests. Replaying hints doesn't
  have the highest priority and it's reflected in the scheduling groups we
  use as well as the explicitly enforced throughput. If there are a large
  number of hints to be replayed, it might affect our tests.
  It's already happened, see: scylladb/scylladb#21949.

To solve those problems, we change the semantics of draining. It will behave
as before when the local node is being decommissioned. However, when the
local node is only being stopped, we will immediately cancel all ongoing
draining processes and stop the hint manager. To amend for that, when we
start a node and it initializes a hint endpoint manager corresponding to
a node that's already left the cluster, we will begin the draining process
of that endpoint manager right away.

That should ensure all data is retained, while possibly speeding up
the shutdown process.

There's a small trade-off to it, though. If we stop a node, we can then
remove it. It won't have a chance to replay hints it might've before
these changes, but that's an edge case. We expect this commit to bring
more benefit than harm.

We also provide tests verifying that the implementation works as intended.

Fixes scylladb/scylladb#21949

Closes scylladb/scylladb#22811

(cherry picked from commit 0a6137218a)

Closes scylladb/scylladb#23370
2025-04-03 09:09:05 +02:00
Tomasz Grabiec
51ee15f02d Merge '[Backport 2025.1] tablets: Make load balancing capacity-aware' from Tomasz Grabiec
Before this patch, the load balancer was equalizing tablet count per
shard, so it achieved balance assuming that:
 1) tablets have the same size
 2) shards have the same capacity

That can cause imbalance of utilization if shards have different
capacity, which can happen in heterogeneous clusters with different
instance types. One of the causes for capacity difference is that
larger instances run with fewer shards due to vCPUs being dedicated to
IRQ handling. This makes those shards have more disk capacity, and
more CPU power.

After this patch, the load balancer equalizes shard's storage
utilization, so it no longer assumes that shards have the same
capacity. It still assumes that each tablet has equal size. So it's a
middle step towards full size-aware balancing.

One consequence is that to be able to balance, the load balancer need
to know about every node's capacity, which is collected with the same
RPC which collects load_stats for average tablet size. This is not a
significant set back because migrations cannot proceed anyway if nodes
are down due to barriers. We could make intra-node migration
scheduling work without capacity information, but it's pointless due
to above, so not implemented.

Also, per-shard goal for tablet count is still the same for all nodes in the cluster,
so nodes with less capacity will be below limit and nodes with more capacity will
be slightly above limit. This shouldn't be a significant problem in practice, we could
compensate for this by increasing the limit.

Fixes #23042

* github.com:scylladb/scylladb:
  tablets: Make load balancing capacity-aware
  topology_coordinator: Fix confusing log message
  topology_coordinator: Refresh load stats after adding a new node
  topology_coordinator: Allow capacity stats to be refreshed with some nodes down
  topology_coordinator: Refactor load status refreshing so that it can be triggered from multiple places
  test: boost: tablets_test: Always provide capacity in load_stats
  test: perf_load_balancing: Set node capacity
  test: perf_load_balancing: Convert to topology_builder
  config, disk_space_monitor: Allow overriding capacity via config
  storage_service, tablets: Collect per-node capacity in load_stats
  test: tablets_test: Add support for auto-split mode
  test: cql_test_env: Expose db config

Closes scylladb/scylladb#23443

* github.com:scylladb/scylladb:
  Merge 'tablets: Make load balancing capacity-aware' from Tomasz Grabiec
  test: tablets_test: Add support for auto-split mode
  test: cql_test_env: Expose db config
scylla-2025.1.1-candidate-20250402022041
2025-04-01 20:31:05 +02:00
Jenkins Promoter
36bb089663 Update ScyllaDB version to: 2025.1.1 2025-04-01 14:18:18 +03:00
yangpeiyu2_yewu
1661b35050 mutation_writer/multishard_writer.cc: wrap writer into futurize_invoke
wrapped writer in seastar::futurize_invoke to make sure that the close() for the mutation_reader can be executed before destruction.

Fixes scylladb/scylladb#22790

Closes scylladb/scylladb#22812

(cherry picked from commit 0de232934a)

Closes scylladb/scylladb#22943
2025-04-01 13:46:27 +03:00
Asias He
8c93a331f7 repair: Enable small table optimization for system_replicated_keys
This enterprise-only system table is replicated and small. It should be
included for small table optimization.

Fixes scylladb/scylla-enterprise#5256

Closes scylladb/scylladb#23135

Closes scylladb/scylladb#23147
2025-04-01 13:36:51 +03:00
Calle Wilund
85c161b9f1 generic_server: Update conditions for is_broken_pipe_or_connection_reset
Refs scylla-enterprise#5185
Fixes #22901

If a tls socket gets EPIPE the error is not translated to a specific
gnutls error code, but only a generic ERROR_PULL/PUSH. Since we treat
EPIPE as ignorable for plain sockets, we need to unwind nested exception
here to detect that the error was in fact due to this, so we can suppress
log output for this.

Closes scylladb/scylladb#22888

(cherry picked from commit e49f2046e5)

Closes scylladb/scylladb#23045
2025-04-01 13:06:29 +03:00
Patryk Jędrzejczak
d088cc8a2d Merge '[Backport 2025.1] Fix a regression that sometimes causes an internal error and demote barrier_and_drain rpc error log to a warning ' from Scylladb[bot]
The series fixes a regression and demotes a barrier_and_drain logging error to a warning since this particular condition may happen during normal operation.

We want to backport both since one is a bug fix and another is trivial and reduces CI flakiness.

- (cherry picked from commit 1da7d6bf02)

- (cherry picked from commit fe45ea505b)

Parent PR: #22650

Closes scylladb/scylladb#22923

* https://github.com/scylladb/scylladb:
  topology_coordinator: demote barrier_and_drain rpc failure to warning
  topology_coordinator: read peers table only once during topology state application
2025-04-01 11:54:56 +02:00
Patryk Jędrzejczak
39c20144e5 Merge '[Backport 2025.1] raft topology: Add support for raft topology init to happen before group0 initialization' from Scylladb[bot]
In the current scenario, the problem discovered is that there is a time
gap between group0 creation and raft_initialize_discovery_leader call.
Because of that, the group0 snapshot/apply entry enters wrong values
from the disk(null) and updates the in-memory variables to wrong values.
During the above time gap, the in-memory variables have wrong values and
perform absurd actions.

This PR removes the variable `_manage_topology_change_kind_from_group0`
which was used earlier as a work around for correctly handling
`topology_change_kind` variable, it was brittle and had some bugs
(causing issues like scylladb/scylladb#21114). The reason for this bug
that _manage_topology_change_kind used to block reading from disk and
was enabled after group0 initialization and starting raft server for the
restart case. Similarly, it was hard to manage `topology_change_kind`
using `_manage_topology_change_kind_from_group0` correctly in bug free
anner.

Post `_manage_topology_change_kind_from_group0` removal, careful
management of `topology_change_kind` variable was needed for maintaining
correct `topology_change_kind` in all scenarios. So this PR also performs
a refactoring to populate all init data to system tables even before
group0 creation(via `raft_initialize_discovery_leader` function). Now
because `raft_initialize_discovery_leader` happens before the group 0
creation, we write mutations directly to system tables instead of a
group 0 command. Hence, post group0 creation, the node can read the
correct values from system tables and correct values are maintained
throughout.

Added a new function `initialize_done_topology_upgrade_state` which
takes care of updating the correct upgrade state to system tables before
starting group0 server. This ensures that the node can read the correct
values from system tables and correct values are maintained throughout.

By moving `raft_initialize_discovery_leader` logic to happen before
starting group0 server, and not as group0 command post server start, we
also get rid of the potential problem of init group0 command not being
the 1st command on the server. Hence ensuring full integrity as expected
by programmer.

This PR fixes a bug. Hence we need to backport it.

Fixes: scylladb/scylladb#21114

- (cherry picked from commit 4748125a48)

- (cherry picked from commit e491950c47)

- (cherry picked from commit 623e01344b)

- (cherry picked from commit d7884cf651)

Parent PR: #22484

Closes scylladb/scylladb#22966

* https://github.com/scylladb/scylladb:
  storage_service: Remove the variable _manage_topology_change_kind_from_group0
  storage_service: fix indentation after the previous commit
  raft topology: Add support for raft topology system tables initialization to happen before group0 initialization
  service/raft: Refactor mutation writing helper functions.
2025-04-01 11:46:15 +02:00
Jenkins Promoter
f1e7cee7a5 Update pgo profiles - aarch64 2025-04-01 04:20:56 +03:00
Jenkins Promoter
023b27312d Update pgo profiles - x86_64 2025-04-01 04:08:00 +03:00
Anna Stuchlik
2ffbc81e19 doc: remove the outdated info on seeds-info
This commit removes the outdated information about seed nodes.
We no longer need it in the docs, as a) the documentation is versioned,
and b) the ScyllaDB Open Source 4.3 and ScyllaDB Enterprise 2021.1 versions
mentioned in the docs are no longer supported.

In addition, some clarification has been added to the existing sections.

Fixes https://github.com/scylladb/scylladb/issues/22400

Closes scylladb/scylladb#23282

(cherry picked from commit dbbf9e19e4)

Closes scylladb/scylladb#23327
2025-03-31 12:33:59 +02:00
Yaron Kaikov
88e548ed72 .github: add action to make PR ready for review when conflicts label was removed
Moving a PR out of draft is only allowed to users with write access,
adding a github action to switch PR to `ready for review` once the
`conflicts` label was removed

Closes scylladb/scylladb#22446

(cherry picked from commit ed4bfad5c3)

Closes scylladb/scylladb#23023
2025-03-31 13:22:04 +03:00
Tomasz Grabiec
975882a489 test: tablets: Fix flakiness due to ungraceful shutdown
The test fails sporadically with:

cassandra.ReadFailure: Error from server: code=1300 [Replica(s) failed to execute read] message="Operation failed for test3.test2 - received 1 responses and 1 failures from 2 CL=QUORUM." info={'consistency': 'QUORUM', 'required_responses': 2, 'received_responses': 1, 'failures': 1}

That's becase a server is stopped in the middle of the workload.

The server is stopped ungracefully which will cause some requests to
time out. We should stop it gracefully to allow in-flight requests to
finish.

Fixes #20492

Closes scylladb/scylladb#23451

(cherry picked from commit 8e506c5a8f)

Closes scylladb/scylladb#23469
2025-03-28 14:56:02 +01:00
Evgeniy Naydanov
3653662099 test.py: random_failures: deselect topology ops for some injections
After recent changes #18640 and #19151 started to reproduce for
stop_after_sending_join_node_request and
stop_after_bootstrapping_initial_raft_configuration error injections too.

The solution is the same: deselect the tests.

Fixes #23302

Closes scylladb/scylladb#23405

(cherry picked from commit 574c81eac6)

Closes scylladb/scylladb#23460
2025-03-27 13:19:59 +02:00
Anna Stuchlik
7336bb38fa doc: fix product names in the 2025.1 upgrage guides
This commit fixes the product names in the upgrade 2025.1 guides so that:

- 6.2 is preceded with "ScyllaDB Open Source"
- 2024.x is preceded with "ScyllaDB Enterprise"
- 2025.1 is preceded with "ScyllaDB"

Fixes https://github.com/scylladb/scylladb/issues/23154

Closes scylladb/scylladb#23223

(cherry picked from commit cd61f60549)

Closes scylladb/scylladb#23328
2025-03-27 11:58:01 +02:00
Avi Kivity
cff90755d8 Merge 'tablets: Make load balancing capacity-aware' from Tomasz Grabiec
Before this patch, the load balancer was equalizing tablet count per
shard, so it achieved balance assuming that:
 1) tablets have the same size
 2) shards have the same capacity

That can cause imbalance of utilization if shards have different
capacity, which can happen in heterogeneous clusters with different
instance types. One of the causes for capacity difference is that
larger instances run with fewer shards due to vCPUs being dedicated to
IRQ handling. This makes those shards have more disk capacity, and
more CPU power.

After this patch, the load balancer equalizes shard's storage
utilization, so it no longer assumes that shards have the same
capacity. It still assumes that each tablet has equal size. So it's a
middle step towards full size-aware balancing.

One consequence is that to be able to balance, the load balancer need
to know about every node's capacity, which is collected with the same
RPC which collects load_stats for average tablet size. This is not a
significant set back because migrations cannot proceed anyway if nodes
are down due to barriers. We could make intra-node migration
scheduling work without capacity information, but it's pointless due
to above, so not implemented.

Also, per-shard goal for tablet count is still the same for all nodes in the cluster,
so nodes with less capacity will be below limit and nodes with more capacity will
be slightly above limit. This shouldn't be a significant problem in practice, we could
compensate for this by increasing the limit.

Refs #23042

Closes scylladb/scylladb#23079

* github.com:scylladb/scylladb:
  tablets: Make load balancing capacity-aware
  topology_coordinator: Fix confusing log message
  topology_coordinator: Refresh load stats after adding a new node
  topology_coordinator: Allow capacity stats to be refreshed with some nodes down
  topology_coordinator: Refactor load status refreshing so that it can be triggered from multiple places
  test: boost: tablets_test: Always provide capacity in load_stats
  test: perf_load_balancing: Set node capacity
  test: perf_load_balancing: Convert to topology_builder
  config, disk_space_monitor: Allow overriding capacity via config
  storage_service, tablets: Collect per-node capacity in load_stats

(cherry picked from commit b1d9f80d85)
2025-03-25 23:16:35 +01:00
Tomasz Grabiec
3be469da29 test: tablets_test: Add support for auto-split mode
rebalance_tablets() was performing migrations and merges automatically
but not splits, because splits need to be acked by replicas via
load_stats. It's inconvenient in tests which want to rebalance to the
equilibrium point. This patch changes rebalance_tablets() to split
automatically by default, can be disabled for tests which expect
differently.

shared_load_stats was introduced to provide a stable holder of
load_stats which can be reused across rebalance_tablets() calls.

(cherry picked from commit 5e471c6f1b)
2025-03-25 18:23:22 +01:00
Tomasz Grabiec
1895724465 test: cql_test_env: Expose db config
(cherry picked from commit f3b63bfeff)
2025-03-25 18:22:32 +01:00
Jenkins Promoter
9dca28d2b8 Update ScyllaDB version to: 2025.1.0 scylla-2025.1.0 scylla-2025.1.0-candidate-20250327024402 2025-03-25 09:19:12 +02:00
Avi Kivity
bc98301783 Merge '[Backport 2025.1] repair: allow concurrent repair and migration of two different tablets' from Aleksandra Martyniuk
Do not hold erm during repair of a tablet that is started with tablet
repair scheduler. This way two different tablets can be repaired
and migrated concurrently. The same tablet won't be migrated while
being repaired as it is provided by topology coordinator.

Use topology_guard to maintain safety.

Fixes: https://github.com/scylladb/scylladb/issues/22408.

Needs backport to 2025.1 that introduces the tablet repair scheduler.

Closes scylladb/scylladb#23362

* github.com:scylladb/scylladb:
  test: add test to check concurrent tablets migration and repair
  repair: do not hold erm for repair scheduled by scheduler
  repair: get total rf based on current erm
  repair: make shard_repair_task_impl::erm private
  repair: do not pass erm to put_row_diff_with_rpc_stream when unnecessary
  repair: do not pass erm to flush_rows_in_working_row_buf when unnecessary
  repair: pass session_id to repair_writer_impl::create_writer
  repair: keep materialized topology guard in shard_repair_task_impl
  repair: pass session_id to repair_meta
scylla-2025.1.0-rc4 scylla-2025.1.0-rc4-candidate-20250324121000
2025-03-23 20:14:53 +02:00
Avi Kivity
220bbcf329 Merge '[Backport 2025.1] cql3: Introduce RF-rack-valid keyspaces' from Scylladb[bot]
This PR is an introductory step towards enforcing
RF-rack-valid keyspaces in Scylla.

The scope of changes:
* defining RF-rack-valid keyspaces,
* introducing a configuration option enforcing RF-rack-valid
  keyspaces,
* restricting the CREATE and ALTER KEYSPACE statements
  so that they never lead to RF-rack invalid keyspaces,
* during the initialization of a node, it verifies that all existing
  keyspaces are RF-rack-valid. If not, the initialization fails.

We provide tests verifying that the changes behave as intended.

---

Note that there are a number of things that still need to be implemented.
That includes, for instance, restricting topology operations too.

---

Implementation strategy (going beyond the scope of this PR):

1. Introduce the new configuration option `rf_rack_valid_keyspaces`.
2. Start enforcing RF-rack-validity in keyspaces if the option is enabled.
3. Adjust the tests: in the tree and out of it. Explicitly enable the option in all tests.
4. Once the tests have been adjusted, change the default value of the option to enabled.
5. Stop explicitly enabling the option in tests.
6. Get rid of the option.

---

Fixes scylladb/scylladb#20356
Fixes scylladb/scylladb#23276
Fixes scylladb/scylladb#23300

---

Backport: this is part of the requirements for releasing 2025.1.

- (cherry picked from commit 32879ec0d5)

- (cherry picked from commit 41f862d7ba)

- (cherry picked from commit 0e04a6f3eb)

Parent PR: #23138

Closes scylladb/scylladb#23398

* github.com:scylladb/scylladb:
  main: Refuse to start node when RF-rack-invalid keyspace exists
  cql3: Ensure that CREATE and ALTER never lead to RF-rack-invalid keyspaces
  db/config: Introduce RF-rack-valid keyspaces
2025-03-23 16:16:29 +02:00
Dawid Mędrek
ecdefe801c main: Refuse to start node when RF-rack-invalid keyspace exists
When a node is started with the option `rf_rack_valid_keyspaces`
enabled, the initialization will fail if there is an RF-rack-invalid
keyspace. We want to force the user to adjust their existing
keyspaces when upgrading to 2025.* so that the invariant that
every keyspace is RF-rack-valid is always satisfied.

Fixes scylladb/scylladb#23300

(cherry picked from commit 0e04a6f3eb)
2025-03-21 12:27:04 +00:00
Dawid Mędrek
af2215c2d2 cql3: Ensure that CREATE and ALTER never lead to RF-rack-invalid keyspaces
In this commit, we refuse to create or alter a keyspace when that operation
would make it RF-rack-invalid if the option `rf_rack_valid_keyspaces` is
enabled.

We provide two tests verifying that the changes work as intended.

Fixes scylladb/scylladb#23276

(cherry picked from commit 41f862d7ba)
2025-03-21 12:27:04 +00:00
Dawid Mędrek
864528eb9b db/config: Introduce RF-rack-valid keyspaces
We introduce a new term in the glossary: RF-rack-valid keyspace.

We also highlight in our user documentation that all keyspaces
must remain RF-rack-valid throughout their lifetime, and failing
to guarantee that may result in data inconsistencies or other
issues. We base that information on our experience with materialized
views in keyspaces using tablets, even though they remain
an experimental feature.

Along with the new term, we introduce a new configuration option
called `rf_rack_valid_keyspaces`, which, when enabled, will enforce
preserving all keyspaces RF-rack-valid. That functionality will be
implemented in upcoming commits. For now, we materialize the
restriction in form of a named requirement: a function verifying
that the passed keyspace is RF-rack-valid.

The option is disabled by default. That will change once we adjust
the existing tests to the new semantics. Once that is done, the option
will first be enabled by default, and then it will be removed.

Fixes scylladb/scylladb#20356

(cherry picked from commit 32879ec0d5)
2025-03-21 12:27:04 +00:00
Aleksandra Martyniuk
5153b91514 test: add test to check concurrent tablets migration and repair
Add a test to check whether a tablet can be migrated while another
tablet is repaired.

(cherry picked from commit 20f9d7b6eb)
2025-03-19 10:15:19 +01:00
Aleksandra Martyniuk
0a0347cb4e repair: do not hold erm for repair scheduled by scheduler
Do not hold erm	for tablet repair scheduled by scheduler. Thanks to
that one tablet repair won't exclude migration of other tablets.

Concurrent repair and migration of the same tablet isn't possible,
since a tablet can be in one type of transition only at the time.
Hence the change is safe.

Refs: https://github.com/scylladb/scylladb/issues/22408.
(cherry picked from commit 5b792bdc98)
2025-03-19 10:09:51 +01:00
Aleksandra Martyniuk
da64c02b92 repair: get total rf based on current erm
Get total rf based on erm. Currently, it does not change anything
because erm stays the same during the whole repair.

(cherry picked from commit a1375896df)
2025-03-19 10:09:30 +01:00
Aleksandra Martyniuk
39aabe5191 repair: make shard_repair_task_impl::erm private
Make shard_repair_task_impl::erm private. Access it with getter.

(cherry picked from commit 34cd485553)
2025-03-19 10:09:11 +01:00
Aleksandra Martyniuk
9eeff8573b repair: do not pass erm to put_row_diff_with_rpc_stream when unnecessary
When small_table_optimization isn't enabled, put_row_diff_with_rpc_stream
does not access erm. Pass small_table_optimization_params containing erm
only when small_table_optimization is enabled.

This is safe as erm is kept by shard_repair_task_impl.

(cherry picked from commit 444c7eab90)
2025-03-19 10:08:22 +01:00
Aleksandra Martyniuk
4115f6f367 repair: do not pass erm to flush_rows_in_working_row_buf when unnecessary
When small_table_optimization isn't enabled, flush_rows_in_working_row_buf
does not access erm. Add small_table_optimization_params containing erm and
pass it only when small_table_optimization is enabled.

This is safe as erm is kept by shard_repair_task_impl.

(cherry picked from commit e56bb5b6e2)
2025-03-19 10:07:45 +01:00
Aleksandra Martyniuk
fb2c46dfbe repair: pass session_id to repair_writer_impl::create_writer
(cherry picked from commit 09c74aa294)
2025-03-19 10:07:00 +01:00
Aleksandra Martyniuk
b4e37600d6 repair: keep materialized topology guard in shard_repair_task_impl
Keep materialized topology guard in shard_repair_task_impl and check
it in check_in_abort_or_shutdown and before each range repair.

(cherry picked from commit 47bb9dcf78)
2025-03-19 10:04:17 +01:00
Aleksandra Martyniuk
6bbf20a440 repair: pass session_id to repair_meta
Pass session_id of tablet repair down the stack from the repair request
to repair_meta.

The session_id will be utiziled in the following patches.

(cherry picked from commit 928f92c780)
2025-03-19 10:02:24 +01:00
Botond Dénes
b8797551eb Merge '[Backport 2025.1] Rack aware tablet merge colocation migration ' from Tomasz Grabiec
service: Introduce rack-aware co-location migrations for tablet merge

Merge co-location can emit migrations across racks even when RF=#racks,
reducing availability and affecting consistency of base-view pairing.

Given replica set of sibling tablets T0 and T1 below:
[T0: (rack1,rack3,rack2)]
[T1: (rack2,rack1,rack3)]

Merge will co-locate T1:rack2 into T0:rack1, T1 will be temporarily only at
only a subset of racks, reducing availability.

This is the main problem fixed by this patch.

It also lays the ground for consistent base-view replica pairing,
which is rack-based. For tables on which views can be created we plan
to enforce the constraint that replicas don't move across racks and
that all tablets use the same set of racks (RF=#racks). This patch
avoids moving replicas across racks unless it's necessary, so if the
constraint is satisfied before merge, there will be no co-locating
migrations across racks. This constraint of RF=#racks is not enforced
yet, it requires more extensive changes.

Fixes #22994.
Refs #17265.

This patch is based on Raphael's work done in PR #23081. The main differences are:

1) Instead of sorting replicas by rack, we try to find
    replicas in sibling tablets which belong to the same rack.
    This is similar to how we match replicas within the same host.
    It reduces number of across-rack migrations even if RF!=#racks,
    which the original patch didn't handle.
    Unlike the original patch, it also avoids rack-overloaded in case
    RF!=#racks

2) We emit across-rack co-locating migrations if we have no other choice
   in order to finalize the merge

   This is ok, since views are not supported with tablets yet. Later,
   we will disallow this for tables which have views, and we will
   allow creating views in the first place only when no such migrations
   can happen (RF=#racks).

3) Added boost unit test which checks that rack overload is avoided during merge
   in case RF<#racks

4) Moved logging of across-rack migration to debug level

5) Exposed metric for across-rack co-locating migrations

(cherry picked from commit af949f3b6a)

Also backports dependent patches:
  - locator: network_topology_strategy: Fix SIGSEGV when creating a table when there is a rack with no normal nodes
  - locator: network_topology_startegy: Ignore leaving nodes when computing capacity for new tables
  - Merge 'test: tablets_test: Create proper schema in load balancer tests' from Tomasz Grabiec

Closes scylladb/scylladb#22657
Closes scylladb/scylladb#22652

Closes scylladb/scylladb#23297

* github.com:scylladb/scylladb:
  service: Introduce rack-aware co-location migrations for tablet merge
  Merge 'test: tablets_test: Create proper schema in load balancer tests' from Tomasz Grabiec
  locator: network_topology_startegy: Ignore leaving nodes when computing capacity for new tables
  locator: network_topology_strategy: Fix SIGSEGV when creating a table when there is a rack with no normal nodes
2025-03-18 16:22:29 +02:00
Nadav Har'El
b1cf1890a9 alternator: document the state of tablet support in Alternator
In commit c24bc3b we decided that creating a new table in Alternator
will by default use vnodes - not tablets - because of all the missing
features in our tablets implementation that are important for
Alternator, namely - LWT, CDC and Alternator TTL.

We never documented this, or the fact that we support a tag
`experimental:initial_tablets` which allows to override this decision
and create an Alternator table using tablets. We also never documented
what exactly doesn't work when Alternator uses tablet.

This patch adds the missing documentation in docs/alternator/new-apis.md
(which is a good place for describing the `experimental:initial_tablets`
tag). The patch also adds a new test file, test_tablets.py, which
includes tests for all the statements made in the document regarding
how `experimental:initial_tablets` works and what works or doesn't
work when tablets are enabled.

Two existing tests - for TTL and Streams non-support with tablets -
are moved to the new test file.

When the tablets feature will finally be completed, both the document
and the tests will need to be modified (some of the tests should be
outright deleted). But it seems this will not happen for at least
several months, and that is too long to wait without accurate
documentation.

Fixes #21629

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#22462

(cherry picked from commit c0821842de)

Closes scylladb/scylladb#23298
2025-03-16 18:25:21 +02:00
Jenkins Promoter
2f0ebe9f49 Update pgo profiles - aarch64 2025-03-15 04:21:14 +02:00
Jenkins Promoter
3633fb9ff8 Update pgo profiles - x86_64 2025-03-15 04:13:25 +02:00
Raphael S. Carvalho
33b5f27057 service: Introduce rack-aware co-location migrations for tablet merge
Merge co-location can emit migrations across racks even when RF=#racks,
reducing availability and affecting consistency of base-view pairing.

Given replica set of sibling tablets T0 and T1 below:
[T0: (rack1,rack3,rack2)]
[T1: (rack2,rack1,rack3)]

Merge will co-locate T1:rack2 into T0:rack1, T1 will be temporarily only at
only a subset of racks, reducing availability.

This is the main problem fixed by this patch.

It also lays the ground for consistent base-view replica pairing,
which is rack-based. For tables on which views can be created we plan
to enforce the constraint that replicas don't move across racks and
that all tablets use the same set of racks (RF=#racks). This patch
avoids moving replicas across racks unless it's necessary, so if the
constraint is satisfied before merge, there will be no co-locating
migrations across racks. This constraint of RF=#racks is not enforced
yet, it requires more extensive changes.

Fixes #22994.
Refs #17265.

This patch is based on Raphael's work done in PR #23081. The main differences are:

1) Instead of sorting replicas by rack, we try to find
    replicas in sibling tablets which belong to the same rack.
    This is similar to how we match replicas within the same host.
    It reduces number of across-rack migrations even if RF!=#racks,
    which the original patch didn't handle.
    Unlike the original patch, it also avoids rack-overloaded in case
    RF!=#racks

2) We emit across-rack co-locating migrations if we have no other choice
   in order to finalize the merge

   This is ok, since views are not supported with tablets yet. Later,
   we will disallow this for tables which have views, and we will
   allow creating views in the first place only when no such migrations
   can happen (RF=#racks).

3) Added boost unit test which checks that rack overload is avoided during merge
   in case RF<#racks

4) Moved logging of across-rack migration to debug level

5) Exposed metric for across-rack co-locating migrations

(cherry picked from commit af949f3b6a)

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Signed-off-by: Tomasz Grabiec <tgrabiec@scylladb.com>
2025-03-14 20:02:33 +01:00
Anna Stuchlik
11ecc886c3 doc: Remove "experimental" from ALTER KEYSPACE with Tablets
Altering a keyspace with tablets is no longer experimental.
This commit removes the "Experimental" label from the feature.

Fixes https://github.com/scylladb/scylladb/issues/23166

Closes scylladb/scylladb#23183

(cherry picked from commit 562b5db5b8)

Closes scylladb/scylladb#23274
2025-03-14 13:57:55 +01:00