Commit Graph

50325 Commits

Author SHA1 Message Date
copilot-swe-agent[bot]
309f1994db Squash commits into single clean commit
Squashed 7 commits into a single commit with a clean commit message
following repository conventions.

Co-authored-by: avikivity <1017210+avikivity@users.noreply.github.com>
2025-11-04 11:44:13 +00:00
copilot-swe-agent[bot]
70e83c7687 Remove temporary changes from .gitignore and tablets.cc
Removed leftover temporary comment and .gitignore entries that were
added during earlier push attempts.

Co-authored-by: mykaul <4655593+mykaul@users.noreply.github.com>
2025-11-01 10:27:52 +00:00
copilot-swe-agent[bot]
0f0d30cc34 Change process_one_row to return std::optional<tablet_id>
Per review feedback, returning std::optional is cleaner than using value_or.
Updated the function signature and call site to handle the optional properly.

Co-authored-by: mykaul <4655593+mykaul@users.noreply.github.com>
2025-11-01 09:18:49 +00:00
copilot-swe-agent[bot]
5faee478c4 Force push with updated commit message
User explicitly authorized force push to update commit 829b85e with Fixes line.

Co-authored-by: mykaul <4655593+mykaul@users.noreply.github.com>
2025-10-31 18:46:39 +00:00
copilot-swe-agent[bot]
88fb5130b3 Trigger push of updated commit message
Co-authored-by: mykaul <4655593+mykaul@users.noreply.github.com>
2025-10-31 17:44:17 +00:00
copilot-swe-agent[bot]
e5db893125 Temporary commit to trigger push
Co-authored-by: mykaul <4655593+mykaul@users.noreply.github.com>
2025-10-31 17:39:17 +00:00
copilot-swe-agent[bot]
829b85ee46 Fix tablet_id optional dereference for -O0 builds
Co-authored-by: mykaul <4655593+mykaul@users.noreply.github.com>
2025-10-31 08:18:56 +00:00
copilot-swe-agent[bot]
009ab048f1 Initial plan 2025-10-31 08:10:33 +00:00
Avi Kivity
04a289cae6 Merge 'Auto expand to rack list' from Tomasz Grabiec
We want to move towards rack-list based replication factor for tablets being the default mode, and in the future the only supported mode. This PR is a step towards that. We auto-expand numeric RF to rack list on keyspace creation and ALTER when rf_rack_valid_keyspaces option is enabled.

The PR is mostly about adjusting tests. The main logic change is in the last patch, which modifies option post-processing in ks_prop_defs.

Fixes #26397

Closes scylladb/scylladb#26692

* github.com:scylladb/scylladb:
  cql3: ks_prop_defs: Expand numeric RF to rack list
  locator: Move rack_list to topology.hh
  alternator: Do not set RF for zero-token DCs
  alternator: Switch keyspace creation to use ks_prop_defs
  test: alternator: Adjust for rack lists
  cql3: Move validation of invalid ALTER KEYSPACE earlier, to ks_prop_defs
  test: cqlpy: Mark tests using rack lists as scylla-only
  test: Switch to rack-list based RF
  test: Generalize tests to work with both numeric RF and rack lists
  test: cluster: test_zero_token_nodes_multidc: Adjust to rack list RF
  test: Prepare for handling errors specific to rack list path
  test: cluster: dtest: alternator: Force RF=1 in test_putitem_contention
  test: Create cluster with multiple racks in multi-dc setups
  test: boost: network_topology_strategy_test: Adjust to rack-list RF
  test: tablets: Adjust to rack list
  test: cluster: test_group0_schema_versioning: Use smaller RF to respect rf-rack-validness
  test: tablets_test: Convert test_per_shard_goal_mixed_dc_rf to be rack-valid
  test: object_store: test_backup: Adjust for rack lists
  test: cluster: tablets: Do not move tablet across racks in test_tablet_transition_sanity
  test: cluster: mv: Do not move tablets across racks
  test: cluster: util: Fix docstring for parse_replication_options()
  tablets, topology_coordinator: Skip tablet draining on replace
2025-10-30 21:54:08 +02:00
Avi Kivity
c0222e4d3c Merge 'replica/table: do not stop major compaction when disabling auto compaction' from Lakshmi Narayanan Sreethar
When auto compaction is disabled, all ongoing compactions, including
major compactions, are stopped. However, major compactions should not be
stopped, since the disable request applies only to regular auto
compactions.

This PR fixes the issue by tagging major compaction tasks with a newly
introduced `compaction_type::Major` enum. Since
`table::disable_auto_compaction()` already requests the compaction
manager to stop only tasks of type `compaction_type::Compaction`, major
compactions will no longer be stopped.

Fixes #24501

PR improves how the compactions are stopped when a disable auto compaction request is executed.
No need to backport

Closes scylladb/scylladb#26288

* github.com:scylladb/scylladb:
  replica/table: do not stop major compaction when disabling auto compaction
  compaction/compaction_descriptor: introduce compaction_type::Major
2025-10-30 21:45:57 +02:00
Pavel Emelyanov
395e275e03 Merge 'test/cluster/random_failures: Adjust to RF-rack-validity' from Dawid Mędrek
We adjust the test to RF-rack-validity and then re-enable
index random events, which requires the configuration option
`rf_rack_valid_keyspaces` to be enabled.

Fixes scylladb/scylladb#26422

Backport: I'd rather not backport these changes. They're almost a hack and poses too much risk for little gain.

Closes scylladb/scylladb#26591

* github.com:scylladb/scylladb:
  test/cluster/random_failures: Re-enable index events
  test/cluster/random_failures: Enable rf_rack_valid_keyspaces
  test/cluster/random_failures: Adjust to RF-rack-validity
2025-10-30 15:39:38 +03:00
Tomasz Grabiec
6cb14c7793 Revert "tests(lwt): new test for LWT testing during tablet resize"
This reverts commit 99dc31e71a.

The test is not stable due to #26801
2025-10-30 08:50:40 +01:00
Tomasz Grabiec
28f6bdc99b cql3: ks_prop_defs: Expand numeric RF to rack list
Auto-exands numeric RF in CREATE/ALTER KEYSPACE statements for
new DCs specified in the statement.

Doesn't auto-expand existing options, as the rack choice may not be in
line with current replica placement. This requires co-locating tablet
replicas, and tracking of co-location state, which is not implemented yet.

Signed-off-by: Tomasz Grabiec <tgrabiec@scylladb.com>
2025-10-29 23:32:59 +01:00
Tomasz Grabiec
35166809cb locator: Move rack_list to topology.hh
So that we can use it in locator/tablets.hh and avoid circular dependency
between that header and abstract_replication_strategy.hh
2025-10-29 23:32:58 +01:00
Tomasz Grabiec
f6dfea2fb1 alternator: Do not set RF for zero-token DCs
That will fail with tablets because it won't be able to allocate
replicas.
2025-10-29 23:32:58 +01:00
Tomasz Grabiec
21db21af7e alternator: Switch keyspace creation to use ks_prop_defs
So that we get the same validation and option post-processing as
during regular keyspace creation.
RF auto-expansion logic happens in ks_prop_defs, and we want that
for tablets.
2025-10-29 23:32:58 +01:00
Tomasz Grabiec
7f66f67d95 test: alternator: Adjust for rack lists
To achieve RF=3 with tablets and rf_rack_valid_keyspaces, we need 3
racks. So change the test to create 3 racks. Alternator was bypassing
standard keyspace creation path, so it escaped validation. But this
will change, and the test will stop wroking.

Also, after auto-expansion of RF to rack list, not all of 4 nodes
will host replicas. So need to adjust expectations.
2025-10-29 23:32:58 +01:00
Tomasz Grabiec
a88f70ce2c cql3: Move validation of invalid ALTER KEYSPACE earlier, to ks_prop_defs
Tests expect this failure in some scenarios, but later changes make us
fail ealier due to topology constraints.

As a rule, more general validation should come before more specific
validation. So syntax validation before topology validation.
2025-10-29 23:32:58 +01:00
Tomasz Grabiec
8e69c65124 test: cqlpy: Mark tests using rack lists as scylla-only
Those tests are intended to be also run against Cassandra, which
doesn't support rack lists.
2025-10-29 23:32:58 +01:00
Tomasz Grabiec
ba53f41f59 test: Switch to rack-list based RF
Have to do that before we enable auto-expansion of numeric RF to
rack-lists, because those tests alter the replication factor, and
altering from rack-list to numeric will not be allowed.
2025-10-29 23:32:58 +01:00
Tomasz Grabiec
d2e7d6fad2 test: Generalize tests to work with both numeric RF and rack lists 2025-10-29 23:32:58 +01:00
Tomasz Grabiec
aa05f0fad0 test: cluster: test_zero_token_nodes_multidc: Adjust to rack list RF
Two changes here:

1) Allocate nodes in dc2 in separeate racks to make the test stronger
- it invites bugs where RF==nr_racks succeeds despite there being
zero-token nodes, and not simply fail due to rack count.

2) Due to auto-expansion to rack list, scylla throws in keyspace
creation rather than table creation.
2025-10-29 23:32:58 +01:00
Benny Halevy
e8b9f13061 test: Prepare for handling errors specific to rack list path 2025-10-29 23:32:58 +01:00
Tomasz Grabiec
255f429a80 test: cluster: dtest: alternator: Force RF=1 in test_putitem_contention
With rf_rack_valid_keyspaces enabled, RF of alternator tables will be
equal to the number of racks (in this test: nodes). Prior to that, if
number of nodes is smaller than 3, alternator creates the keyspace
with RF=1. Turns out, with RF=2 the test fails with write timeouts due
to contention. Enforce RF=1 by creating the table with one node before
adding the second node.
2025-10-29 23:32:58 +01:00
Tomasz Grabiec
40e7543361 test: Create cluster with multiple racks in multi-dc setups
To allow auto-expansion of numeric RF to rack list. Otherwise,
keyspace creation will be rejected if rf-rack-valid keyspaces are
enforced.
2025-10-29 23:32:57 +01:00
Tomasz Grabiec
723622cf70 test: boost: network_topology_strategy_test: Adjust to rack-list RF 2025-10-29 23:32:57 +01:00
Tomasz Grabiec
19d0beff38 test: tablets: Adjust to rack list
test_decommission_rack_load_failure expects some tablets to land in
the rack which only has the decommissioning node. Since the table uses
RF=1, auto-expansion may choose the other rack and put all tablets
there, and the expected failure will not happen. Force placement by
using rack-list RF.
2025-10-29 23:32:57 +01:00
Tomasz Grabiec
7ccc2a3560 test: cluster: test_group0_schema_versioning: Use smaller RF to respect rf-rack-validness 2025-10-29 23:32:57 +01:00
Tomasz Grabiec
0f38f7185c test: tablets_test: Convert test_per_shard_goal_mixed_dc_rf to be rack-valid 2025-10-29 23:32:57 +01:00
Tomasz Grabiec
5962498983 test: object_store: test_backup: Adjust for rack lists
With rack lists, not all nodes in a rack will receive streams if RF=1.
Adjust expectations.
2025-10-29 23:32:57 +01:00
Tomasz Grabiec
3b8a3823db test: cluster: tablets: Do not move tablet across racks in test_tablet_transition_sanity
Choose old_replica and new_replica so that they're both in rack r1.

After later changes (rack list auto expansion), it's no longer
guaranteed that the first replica will be on r1.
2025-10-29 23:32:57 +01:00
Tomasz Grabiec
5bf7112fe6 test: cluster: mv: Do not move tablets across racks
It's illegal with rf-rack-valid keyspaces.
2025-10-29 23:32:57 +01:00
Tomasz Grabiec
e34548ccdb test: cluster: util: Fix docstring for parse_replication_options()
rack lists are now in replication_v2, which is also parsed with this
function.
2025-10-29 23:32:57 +01:00
Tomasz Grabiec
288e75fe22 tablets, topology_coordinator: Skip tablet draining on replace
Replace doesn't drain (rebuild) tablets during topology change. They
are rebuilt afterwards when the replaced node is in "left" state and
replacing node is in normal state. So there is no point in attempting
to drain, as nothing will be drained.

Not only that, doing so has a risk, because the load balancer is
invoked on a transitional topology state in which we can end up with
no normal nodes in a rack. That's the case if the replaced node was
the last one in the rack. This tripped one of the algorithms which
computes rack's shard count for the purpose of determining ideal
tablet count, it was not prepared to find an empty rack to which a
table is still repliacated. That was fixed separately, but to avoid
this, we better skip tablet draining here.
2025-10-29 23:32:57 +01:00
Nadav Har'El
aa34f0b875 alternator: fix CDC events for TTL expiration
In commit a3ec6c7d1d we supposedly
implemented the feature of telling TTL experation events from regular
user-sent deletions. However, that implementation did not actually work
at all... It had two bugs:

 1. It created an null rjson::value() instead of an empty dictionary
    rjson::empty_object(), so GetRecords failed every time such a
    TTL expiration event was generated.
 2. In the output, it used lowercase field names "type" and "principalId"
    instead of the uppercase "Type" and "PrincipalId". This is not the
    correct capitalization, and when boto3 recieves such incorrect
    fields it silently deletes them and never passes them to the user's
    get_records() call.

This patch fixes those two bugs, and importantly - enables a test for
this feature. We did already have such a test but it was marked as
"veryslow" so doesn't run in CI and apparently not even run once to
check the new feature. This test is not actually very long on Alternator
when the TTL period is set very low (as we do in our tests), so I replaced
the "veryslow" marker by "waits_for_expiration". The latter marker means
that the test is still very slow - as much as half an hour - on DynamoDB -
but runs quickly on Scylla in our test setup, and enabled in CI by
default.

The enabled test failed badly before this patch (a server error during
GetRecords), and passes with this patch.

Also, the aforementioned commit forgot to remove the paragraph in
Alternator's compatibility.md that claims we don't have that feature yet.
So we do it now.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#26633
2025-10-29 17:08:20 +01:00
Piotr Wieczorek
2812e67f47 cdc: Emit a preimage for non-clustered tables
Until this patch, CDC haven't fetched a preimage for mutations
containing only a partition tombstone. Therefore, single-row deletions
in a table witout a clustering key didn't include a preimage, which was
inconsistent with single-row clustered deletions. This commit addresses
this inconsistency.

Second reason is compatibility with DynamoDB Streams, which doesn't
support entire-partition deletes. Alternator uses partition tombstones
for single-row deletions, though, and in these cases the 'OldImage' was
missing from REMOVE records.

Fixes https://github.com/scylladb/scylladb/issues/26382

Closes scylladb/scylladb#26578
2025-10-29 17:54:58 +02:00
Nadav Har'El
29ed1f3de7 Merge 'cql3: Refactor vector search select impl into a dedicated class' from Karol Nowacki
cql3: Refactor vector search select impl into a dedicated class

The motivation for this change is crash fixed in https://github.com/scylladb/scylladb/pull/25500.

This commit refactors how ANN ordered select statements are handled to prevent a potential null pointer dereference and improve code organization.

Previously, vector search selects were managed by `indexed_table_select_statement`, which unconditionally dereferenced a `view_ptr`. This assumption is invalid for vector search indexes where no view exists, creating a risk of crashes.

To address this, the refactoring introduces the following changes:

- A new `vector_indexed_table_select_statement` class is created to specifically handle ANN-ordered selects. This class operates without a view_ptr, resolving the null pointer risk.
-  The `indexed_table_select_statement` is renamed to `view_indexed_table_select_statement` to more accurately reflect its function with view-based indexes.
- An assertion has been added to `indexed_table_select_statement` constructor to ensure view_ptr is not null, preventing similar issues in the future.

Fixes: VECTOR-162

No backport is needed, as this is refactoring.

Closes scylladb/scylladb#25798

* github.com:scylladb/scylladb:
  cql3: Rename indexed_table_select_statement
  cql3: Move vector search select to dedicated class
2025-10-29 17:49:24 +02:00
Lakshmi Narayanan Sreethar
7eac18229c replica/table: do not stop major compaction when disabling auto compaction
When auto compaction is disabled, all ongoing compactions, including
major compactions, are stopped. However, major compactions should not be
stopped, since the disable request applies only to regular auto
compactions.

This patch fixes the issue by tagging major compaction tasks with the
newly introduced `compaction_type::MajorCompaction`. Since
`table::disable_auto_compaction()` already requests the compaction
manager to stop only tasks of type `compaction_type::Compaction`, major
compactions will no longer be stopped.

Fixes #24501

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2025-10-29 19:22:07 +05:30
Lakshmi Narayanan Sreethar
4d442f48db compaction/compaction_descriptor: introduce compaction_type::Major
Introduce a new compaction_type enum : `Major`.
This type will be used by the next patches to differentiate between
major compaction and regular compaction (compaction_type::Compaction).

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2025-10-29 19:21:53 +05:30
Piotr Dulikowski
aba922ea65 Merge 'cdc: improve cdc metadata loading' from Michael Litvak
when loading CDC streams metadata for tablets from the tables, read only
new entries from the history table instead of reading all entries. This
improves the CDC metadata reloading, making it more efficient and
predictable.

the CDC metadata is loaded as part of group0 reload whenever the
internal CDC tables are modified. on tablet split / merge, we create a
new CDC timestamp and streams by writing them to the cdc_streams_history
table by group0 operation, and when it's applied we reload the in-memory
CDC streams map by reading from the tables and constructing the updated map.

Previously, on every update, we would read the entire
cdc_streams_history entries for the changed table, constructing all its
streams and creating a new map from scratch.

We improve this now by reading only new entries from cdc_streams_history
and append them to the existing map. we can do this because we only
append new entries to cdc_streams_history with higher timestamp than all
previous entries.

This makes this reloading more efficient and predictable, because
previously we would read a number of entries that depends on the number
of tablets splits and merges, which increases over time and is
unbounded, whereas now we read only a single stream set on each update.

Fixes https://github.com/scylladb/scylladb/issues/26732

backport to 2025.4 where cdc with tablets is introduced

Closes scylladb/scylladb#26160

* github.com:scylladb/scylladb:
  test: cdc: extend cdc with tablets tests
  cdc: improve cdc metadata loading
2025-10-29 11:07:48 +01:00
Michał Hudobski
46589bc64c secondary_index: disallow multiple vector indexes on the same column
We currently allow creating multiple vector indexes on one column.
This doesn't make much sense as we do not support picking one when
making ann queries.

To make this less confusing and to make our behavior similar
to Cassandra we disallow the creation of multiple vector indexes
on one column.

We also add a test that checks this behavior.

Fixes: VECTOR-254
Fixes: #26672

Closes scylladb/scylladb#26508
2025-10-29 11:55:38 +02:00
Patryk Jędrzejczak
7304afb75a Merge 'vnodes cleanup: renames and code comments fixes' from Petr Gusev
This is a follow-up for https://github.com/scylladb/scylladb/pull/26315. Fixes several review comments that were left unresolved in the original PR.

backport: not needed, this PR contains only renames and code comment fixes

Closes scylladb/scylladb#26745

* https://github.com/scylladb/scylladb:
  test_automatic_cleanup: fix comment
  storage_proxy: remove stale comment
  storage_proxy: improve run_fenceable_write comment
  topology_coordinator: rename start_cleanup_on_dirty_nodes -> start_vnodes_cleanup_on_dirty_nodes
  storage_service: rename is_cleanup_allowed -> is_vnodes_cleanup_allowed
  storage_service: rename do_cluster_cleanup -> do_clusterwide_vnodes_cleanup
2025-10-29 10:38:27 +01:00
Dawid Mędrek
48cbf6b37a test/cluster/test_tablets: Migrate dtest
We migrate `tablets_test.py::TestTablets::test_moving_tablets_replica_on_node`
from dtests to the repository of Scylla. We divide the test into two
steps to make testing easier and even possible with RF-rack-valid
keyspaces being enforced.

Closes scylladb/scylladb#26285
2025-10-29 11:09:48 +02:00
Karol Nowacki
9f1fd7f5a0 cql3: Rename indexed_table_select_statement
To align with `vector_indexed_table_select_statement`, this commit renames
`indexed_table_select_statement` to `view_indexed_table_select_statement`
to clarify its usage with materialized views.
2025-10-29 08:37:25 +01:00
Karol Nowacki
357c0a8218 cql3: Move vector search select to dedicated class
The execution of SELECT statements with ANN ordering (vector search) was
previously implemented within `indexed_table_select_statement`. This was
not ideal, as vector search logic is independent of secondary index selects.

This resulted in unnecessary complexity because vector search queries don't
use features like aggregates or paging. More importantly,
`indexed_table_select_statement` assumed a non-null `view_schema` pointer,
which doesn't hold for vector indexes (where `view_ptr` is null).
This caused null pointer dereferences during ANN ordered selects, leading
to crashes (VECTOR-179). Other parts of the class still dereference
`view_schema` without null checks.

Moving the vector search select logic out of
`indexed_table_select_statement` simplifies the code and prevents these
null pointer dereferences.
2025-10-29 08:37:21 +01:00
Petr Gusev
b6bcd062de test_automatic_cleanup: fix comment 2025-10-28 17:55:20 +01:00
Petr Gusev
d49be677d5 storage_proxy: remove stale comment 2025-10-28 17:55:20 +01:00
Petr Gusev
c60223f009 storage_proxy: improve run_fenceable_write comment 2025-10-28 17:55:20 +01:00
Petr Gusev
58d100a0cb topology_coordinator: rename start_cleanup_on_dirty_nodes -> start_vnodes_cleanup_on_dirty_nodes 2025-10-28 17:55:20 +01:00
Petr Gusev
fa9dc71f30 storage_service: rename is_cleanup_allowed -> is_vnodes_cleanup_allowed 2025-10-28 17:55:19 +01:00