Commit Graph

49933 Commits

Author SHA1 Message Date
Piotr Dulikowski
fe7ffc5e5d Merge 'service/qos: set long timeout for auth queries on SL cache update' from Michael Litvak
pass an appropriate query state for auth queries called from service
level cache reload. we use the function qos_query_state to select a
query_state based on caller context - for internal queries, we set a
very long timeout.

the service level cache reload is called from group0 reload. we want it
to have a long timeout instead of the default 5 seconds for auth
queries, because we don't have strict latency requirement on the one
hand, and on the other hand a timeout exception is undesired in the
group0 reload logic and can break group0 on the node.

Fixes https://github.com/scylladb/scylladb/issues/25290

backport possible to improve stability

Closes scylladb/scylladb#26180

* github.com:scylladb/scylladb:
  service/qos: set long timeout for auth queries on SL cache update
  auth: add query_state parameter to query functions
  auth: refactor query_all_directly_granted
2025-10-08 12:37:01 +02:00
Dawid Mędrek
a9577e4d52 replica/database: Fix description of validate_tablet_views_indexes
The current description is not accurate: the function doesn't throw
an exception if there's an invalid materialized view. Instead, it
simply logs the keyspaces that violate the requirement.

Furthermore, the experimental feature `views-with-tablets` is no longer
necessary for considering a materialized view as valid. It was dropped
in scylladb/scylladb@b409e85c20. The
replacement for it is the cluster feature `VIEWS_WITH_TABLETS`.

Fixes scylladb/scylladb#26420

Closes scylladb/scylladb#26421
2025-10-07 17:39:43 +02:00
Artsiom Mishuta
99455833bd test.py: reintroducing sudo in resource_gather.py
conditionally reintroducing sudo for resource gathering
when running under docker

related: https://github.com/scylladb/scylladb/pull/26294#issuecomment-3346968097

fixes: https://github.com/scylladb/scylladb/issues/26312

Closes scylladb/scylladb#26401
2025-10-07 14:42:15 +02:00
Piotr Dulikowski
264cf12b66 Merge 'view building coordinator - add missing tests' from Michał Jadwiszczak
This patch adds tests for:
- tablet migration during view building
- tablet merge during view building.

Those tests were missing from the original testing plan.

We want to backport it to 2025.4 to ensure the release is bug-free.

Closes scylladb/scylladb#26414

* github.com:scylladb/scylladb:
  test/cluster/test_view_building_coordinator: add test for tablet merge
  test/cluster/test_view_building_coordinator: add test for tablet migration
2025-10-07 14:25:04 +02:00
Botond Dénes
8b0bfb817e Merge 'Switch REST API server to use content-streaming' from Pavel Emelyanov
Seastar httpd recommended users to stop using contiguous requet.content string and read body they need from request's input_stream instead. However, "official" deprecation of request content had been only made recently.

This PR patches REST API server to turn this feature on and patches few handlers that mess with request bodies to read them from request stream.

Using newer seastar API, no need to backport

Closes scylladb/scylladb#26418

* github.com:scylladb/scylladb:
  api: Switch to request content streaming
  api: Fix indentation after previous patch
  api: Coroutinize set_relabel_config handler
  api: Coroutinize set_error_injection handler
2025-10-07 14:13:47 +03:00
Michał Chojnowski
3cf51cb9e8 sstables: fix some typos in comments
I added those typos recently, and spellcheckers complain.

Closes scylladb/scylladb#26376
2025-10-07 13:20:06 +03:00
Botond Dénes
8beea931be Merge 'Remove system_keyspace from column_family API' from Pavel Emelyanov
This dependency reference is carried into column_family handlers block to make get_built_views handler work. However, the handler in question should live in view_builder block, because it works with v.b. data. This PR moves the handler there, while at it, coroutinizes it, and removes the no longer needed sys.ks. reference from column_family.

API dependencies cleanup work, no need to backport

Closes scylladb/scylladb#26381

* github.com:scylladb/scylladb:
  api: Fix indentation after previous patch
  api: Coroutinize get_built_indexes handler code
  api: Remove system_keyspace ref from column_family API block
  api: Move get_built_indexes from column_family to view_builder
2025-10-07 13:07:46 +03:00
Pavel Emelyanov
ed1c049c3b scripts: Add usage to pull_github_pr script
If mis-used, the script says

   error: unrecognized option: ..., see ./scripts/pull_github_pr.sh -h for usage

but if using the suggested -h option it prints just the same.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#26378
2025-10-07 10:28:35 +03:00
Lakshmi Narayanan Sreethar
b8042b66e3 cmake: replace -fvisibility=hidden compiler flag with -fvisibility-inlines-hidden
The PR #26154 dropped the `-fvisibility=hidden` compiler flag and
replaced it with `-fvisibility-inlines-hidden` as the former caused
issues in how the `noncopyable_function::operator bool` method executed
leading to incorrect return values. Apply the same fix to cmake.

Fixes #26391

Closes scylladb/scylladb#26431
2025-10-07 10:10:47 +03:00
Pavel Emelyanov
127afd4da1 api: Switch to request content streaming
There are three handler that need to be patched all at once with the
server itself being marked with set_content_streaming

For two simple handler just get the content string with
read_entire_stream_contiguous helper. This is what httpd server did
anyway.

The "start_restore" handler used the contiguous contents to parse json
from using rjson utility. This handler is patched to use
read_entire_stream() that returns a vector of temporary buffers. The
rjson parser has a helper to pars from that vector, so the change is
also optimization.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-10-06 16:43:26 +03:00
Pavel Emelyanov
2cfccdac5c api: Fix indentation after previous patch
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-10-06 16:43:12 +03:00
Pavel Emelyanov
5668058cb0 api: Coroutinize set_relabel_config handler
Without the invoke_on_all lambda, for simplicity
Also keep indentation "broken" for the ease of review

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-10-06 16:42:30 +03:00
Pavel Emelyanov
5017a25c00 api: Coroutinize set_error_injection handler
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-10-06 16:42:14 +03:00
Patryk Jędrzejczak
67d48a459f raft topology: make the voter handler consider only group 0 members
In the Raft-based recovery procedure, we create a new group 0 and add
live nodes to it one by one. This means that for some time there are
nodes which belong to the topology, but not to the new group 0. The
voter handler running on the recovery leader incorrectly considers these
nodes while choosing voters.

The consequences:
- misleading logs, for example, "making servers {<ID of a non-member>}
  voters", where the non-member won't become a voter anyway,
- increased chance of majority loss during the recovery procedure, for
  example, all 3 nodes that first joined the new group 0 are in the same
  dc and rack, but only one of them becomes a voter because the voter
  handler tries to make non-members in other dcs/racks voters.

Fixes #26321

Closes scylladb/scylladb#26327
2025-10-06 16:27:47 +03:00
Pavel Emelyanov
8002ddf946 code: Use tls_options::bye_timeout instead of deprecated switch
Some code wants its TLS sockets to close immediately without sending BYE
message and waiting for the response. Recent seastar update changed the
way this functionality is requested (scylladb/seastar#2986)

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#26253
2025-10-06 16:25:35 +03:00
Michał Jadwiszczak
279a8cbba3 test/cluster/test_view_building_coordinator: add test for tablet merge
The test pauses processing of the view building task and triggers tablet
merge.
2025-10-06 15:06:11 +02:00
Michał Jadwiszczak
fc7e5370a1 test/cluster/test_view_building_coordinator: add test for tablet migration
The test pauses processing of the view building task and migrates it to
another node.
2025-10-06 15:02:42 +02:00
Michał Chojnowski
3b338e36c2 utils/config_file: fix a missing allowed_values propagation in one of named_value constructors
In one of the constructors of `named_value`, the `allowed_values`
argument isn't used.

(This means that if some config entry uses this constructor,
the values aren't validated on the config layer,
and might give some lower layer a bad surprise).

Fix that.

Fixes scyllladb/scylladb#26371

Closes scylladb/scylladb#26196
2025-10-06 15:33:11 +03:00
Michał Chojnowski
dbddba0794 sstables/trie: actually apply BYPASS CACHE to index reads
BYPASS CACHE is implemented for `bti_index_reader` by
giving it its own private `cached_file` wrappers over
Partitions.db and Rows.db, instead of passing it
the shared `cached_file` owned by the sstable.

But due to an oversight, the private `cached_file`s aren't
constructed on top of the raw Partitions.db and Rows.db
files, but on top of `cached_file_impl` wrappers around
those files. Which means that BYPASS CACHE doesn't
actually do its job.

Tests based on `scylla_index_page_cache_*` metrics
and on CQL tracing still see the reads from the private
files as "cache misses", but those misses are served
from the shared cached files anyway, so the tests don't see
the problem. In this commit we extend `test_bti_index.py`
with a check that looks at reactor's `io_queue` metrics
instead, and catches the problem.

Fixes scylladb/scylladb#26372

Closes scylladb/scylladb#26373
2025-10-06 15:32:05 +03:00
Andrzej Jackowski
c3dd383e9e test: add reproduction of name reuse bug to service level tests
This commit adds a reproduction test for scylladb/scylladb#26190 to the
service levels test suite. Although the bug was fixed internally in
Seastar, the corner-case service level name reuse scenario should be
covered by tests to prevent regressions.

Refs: https://github.com/scylladb/scylladb/issues/26190

Closes scylladb/scylladb#26379
2025-10-06 14:19:22 +02:00
Piotr Dulikowski
380f243986 Merge ' Support replication factor rack list for tablet-based keyspaces' from Tomasz Grabiec
This change extends the CQL replication options syntax so the replication factor can be stated as a list of rack names.
For example: { 'mydatacenter': [ 'myrack1', 'myrack2', 'myrack4' ] }

Rack-list based RF can coexist with the old numerical RF, even in the same keyspace for different DCs.

Specifying the rack list also allows to add replicas on the specified racks (increasing the replication factor), or decommissioning certain racks from their replicas (by omitting them from the current datacenter rack-list). This will allow us to keep the keyspace rf-rack-valid, maintaining guarantees, while allowing adding/removing racks. In particular, this will allow us to add a new DC, which happens by incrementally increasing RF in that DC to cover existing racks.

Migration from numerical RF to rack-list is not supported yet. Migration from rack-list to numerical RF is not planned to be supported.

New feature, no backport required.

Co-authored with @bhalevy

Fixes https://github.com/scylladb/scylladb/issues/25269
Fixes https://github.com/scylladb/scylladb/issues/23525

Closes scylladb/scylladb#26358

* github.com:scylladb/scylladb:
  tablets: load_balancer: Recognize that tablets are confined to racks when computing desired tablet count
  locator: Make hasher for endpoint_dc_rack globally accessible
  test: tablets: Add test for replica allocation on rack list changes
  test: lib: topology_builder: generate unique rack names
  test: Add tests for rack list RF
  doc: Document rack-list replication factor
  topology_coordinator: Restore formatting
  topology_coordinator: Cancel keyspace alter on broader set of errors
  topology_coordinator: Make keyspace alter process options through as_ks_metadata_update()
  cql3: ks_prop_defs: Preserve old options
  cql3: ks_prop_defs: Introduce flattened()
  locator: Recognize rack list RF as valid in assert_rf_rack_valid_keyspace()
  tablet_allocator: Respect binding replicas to racks
  locator: network_topology_strategy: Respect rack list when reallocating tablets
  cql3: ks_prop_defs: Fail with more information when options are not in expected format
  locator, cql3: Support rack lists in replication options
  cql3: Fail early on vnode/tablet flavor alter
  cql3: Extract convert_property_map() out of Cql.g
  schema: Use definition from the header instead of open-coding it
  locator: Abstract obtaining the number of replicas from replication_strategy_config_option
  cql3, locator: Use type aliases for option maps
  locator: Add debug logging
  locator: Pass topology to replication strategy constructor
  abstract_replication_strategy, network_topology_strategy: add replication_factor_data class
2025-10-06 14:14:09 +02:00
Piotr Dulikowski
e7907b173a Merge 'db/view: Require rf_rack_valid_keyspaces when creating materialized view' from Dawid Mędrek
Materialized views are currently in the experimental phase and using them
in tablet-based keyspaces requires starting Scylla with an experimental feature,
`views-with-tablets`. Any attempts to create a materialized view or secondary
index when it's not enabled will fail with an appropriate error.

After considerable effort, we're drawing close to bringing views out of the
experimental phase, and the experimental feature will no longer be needed.
However, materialized views in tablet-based keyspaces will still be restricted,
and creating them will only be possible after enabling the configuration option
`rf_rack_valid_keyspaces`. That's what we do in this PR.

In this patch, we adjust existing tests in the tree to work with the new
restriction. That shouldn't have been necessary because we've already seemingly
adjusted all of them to work with the configuration option, but some tests hid
well. We fix that mistake now.

After that, we introduce the new restriction. What's more, when starting Scylla,
we verify that there is no materialized view that would violate the contract.
If there are some that do, we list them, notify the user, and refuse to start.

High-level implementation strategy:

1. Name the restrictions in form of a function.
2. Adjust existing tests.
3. Restrict materialized views by both the experimental feature
   and the configuration option. Add validation test.
4. Drop the requirement for the experimental feature. Adjust the added test
   and add a new one.
5. Update the user documentation.

Fixes scylladb/scylladb#23030

Backport: 2025.4, as we are aiming to support materialized views for tablets from that version.

Closes scylladb/scylladb#25802

* github.com:scylladb/scylladb:
  view: Stop requiring experimental feature
  db/view: Verify valid configuration for tablet-based views
  db/view: Require rf_rack_valid_keyspaces when creating view
  test/cluster/random_failures: Skip creating secondary indexes
  test/cluster/mv: Mark test_mv_rf_change as skipped
  test/cluster: Adjust MV tests to RF-rack-validity
  test/boost/schema_loader_test.cc: Explicitly enable rf_rack_valid_keyspaces
  db/view: Name requirement for views with tablets
2025-10-06 12:46:46 +02:00
Pavel Emelyanov
6ad8dc4a44 Merge 'root,replica: mv querier to replica/' from Botond Dénes
The querier object is a confusing one. Based on its name it should be in the query/ module and it is already in the query namespace. The query namespace is used for symbols which span the coordinator and replica, or that are mostly coordinator side. The querier is mainly in this namespace due to its similar name and because at the time it was introduced, namespace replica didn't exist yet. But this is a mistake which confuses people.
The querier is actually a completely replica-side logic, implementing the caching of the readers on the replica. Move it to the replica module and namespace to make this more clear.

Code cleanup, no backport.

Closes scylladb/scylladb#26280

* github.com:scylladb/scylladb:
  replica: move querier code to replica namespace
  root,replica: mv querier to replica/
2025-10-06 08:26:05 +03:00
Pavel Emelyanov
5cf9043d74 Merge 'sstables/sstable_directory: don't forget to delete other components when deleting TemporaryHashes.db' from Michał Chojnowski
TemporaryHashes.db is a temporary sstable component used during ms
sstable writes. It's different from other sstable components in that
it's not included in the TOC. Because of this, it has a special case in
the logic that deletes unfinished sstables on boot.
(After Scylla dies in the middle of a sstable write).

But there's a bug in that special case,
which causes Scylla to forget to delete other components from the same unfinished sstable.

The code intends only to delete the TemporaryHashes.db file from the
`_state->generations_found` multimap, but it accidentally also deletes
the file's sibling components from the multimap. Fix that.

Also, extend a related test so that it would catch the problem before the fix.

Fixes scylladb/scylladb#26393

Bugfix, needs backport to 2025.4.

Closes scylladb/scylladb#26394

* github.com:scylladb/scylladb:
  sstables/sstable_directory: don't forget to delete other components when deleting TemporaryHashes.db
  test/boost/database_test: fix two no-op distributed loader tests
2025-10-06 08:23:03 +03:00
Andrzej Jackowski
3411089f5d treewide: seastar module update
The reason for this seastar update is fixing #26190 - a service
level bug caused by a problem in scheduling group in seastar
implementation (seastar#2992).

* ./seastar 9c07020a...270476e7 (10):
  > core: restore seastar_logger namespace in try_systemwide_memory_barrier
  > Merge 'coroutines: support coroutines that copy their captures into the coroutine frame' from Avi Kivity
    coroutines: advertise lambda-capture-by-value and test it
    future: invoke continuation functions as temporaries
    future: handle lvalue references in future continuations early
  > resource: Tune up some allocate_io_queues() arguments
  > Merge 'Add perf test hooks' from Travis Downs
    perf_tests:add tests to verify pre-run hooks
    per_tests: add pre-run hooks
    perf-tests.md: update on measurement overhead
    perf_tests_perf: a few more test variations
    remove vestigial register_test method
  > Add `touch` command to `rl` file processing
  > Merge 'execution_stage: update stage name on scheduling_group rename' from Andrzej Jackowski
    test: add sg_rename_recreate_with_the_same_name
    test: add test_renaming_execution_stage in metric_test
    test: add test_execution_stage_rename
    execution_stage: update stage name on scheduling_group rename
    execution_stage: reorganize per_group_stage_type
    execution_stage: add concrete_execution_stage_base
    execution_stage: move metrics setup to a separate method
  > iotune: Fix warmup calculation bug and botched rebase
  > Add missing `#pragma once` to ascii.rl
  > iotune: Ignore measurements during warmup period

Fixes: https://github.com/scylladb/scylladb/issues/26190

Closes scylladb/scylladb#26388
2025-10-06 08:13:37 +03:00
Michał Chojnowski
6efb807c1a sstables/sstable_directory: don't forget to delete other components when deleting TemporaryHashes.db
TemporaryHashes.db is a temporary sstable component used during ms
sstable writes. It's different from other sstable components in that
it's not included in the TOC. Because of this, it has a special case in
the logic that deletes unfinished sstables on boot.
(After Scylla dies in the middle of a sstable write).

But there's a bug in that special case,
which causes Scylla to forget to delete other components from the same unfinished sstable.

The code intends only to delete the TemporaryHashes.db file from the
`_state->generations_found` multimap, but it accidentally also deletes
the file's sibling components from the multimap. Fix that.

Fixes scylladb/scylladb#26393
2025-10-04 00:45:55 +02:00
Michał Chojnowski
16cb223d7f test/boost/database_test: fix two no-op distributed loader tests
There are two tests which effectively check nothing.

They intend to check that distributed loader removes "leftover" sstable
files. So they create some incomplete sstables, run the test env
on the directory, and the files disappeared.
But the test env completely clears the test directory before
the distributed loader looks at the files, so the tests succeed trivially.

Fix that by adding a config knob to the test env which instructs it
not to clear the directory before the test.
2025-10-04 00:44:49 +02:00
Ferenc Szili
20aeed1607 load balancing: extend locator::load_stats to collect tablet sizes
This commit extend the TABLE_LOAD_STATS RPC with data about the tablet
replica sizes and effective disk capacity.
Effective disk capacity of a node is computed as a sum of the sizes of
all tablet replicas on a node and available disk space.

This is the first change in the size based load balancing series.

Closes scylladb/scylladb#26035
2025-10-03 13:37:22 +02:00
Pavel Emelyanov
37f59cef04 Merge 'tools: fix documentation links after change to source-available' from Botond Dénes
Some tools commands have links to online documentation in their help output. These links were left behind in the source-available change, they still point to the old opensource docs. Furthermore, the links in the scylla-sstable help output always point to the latest stable release's documentation, instead of the appropriate one for the branch the tool was built from. Fix both of these.

Fixes: scylladb/scylladb#26320

Broken documentation link fix for the  tool help output, needs backport to all live source-available versions.

Closes scylladb/scylladb#26322

* github.com:scylladb/scylladb:
  tools/scylla-sstable: fix doc links
  release: adjust doc_link() for the post source-available world
  tools/scylla-nodetool: remove trailing " from doc urls
2025-10-03 13:53:19 +03:00
Pavel Emelyanov
7116e7dac6 api: Fix indentation after previous patch
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-10-03 13:51:25 +03:00
Pavel Emelyanov
42657105a3 api: Coroutinize get_built_indexes handler code
"While at it". It looks much simpler this way.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-10-03 13:50:58 +03:00
Pavel Emelyanov
f77f9db96c api: Remove system_keyspace ref from column_family API block
This reference was only needed to facilitate get_built_indexes handler
to work. Now it's gone and the sys.ks. reference is no longer needed.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-10-03 13:50:22 +03:00
Pavel Emelyanov
95b616d0e5 api: Move get_built_indexes from column_family to view_builder
The handler effectively works with the view_builder and should be
registerd in the block that has this service captured.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-10-03 13:49:33 +03:00
Tomasz Grabiec
9ebdeb261f tablets: load_balancer: Recognize that tablets are confined to racks when computing desired tablet count
The old logic assumes that replicas are spread across whole DC when
determining how many tablets we need to have at least 10 tablets per
shard. If replicas are actually confined to a subset of racks, that
will come up with a too high count and overshoot actual per-shard
count in this rack.

Similar problem happens for scaling-down of tablet count, when we try
to keep per-shard tablet count below the goal. It should be tracked
per-rack rather than per-DC, since racks can differ in how loaded they
are by RF if it's a rack-list.
2025-10-02 19:45:00 +02:00
Tomasz Grabiec
6962464be7 locator: Make hasher for endpoint_dc_rack globally accessible 2025-10-02 19:45:00 +02:00
Tomasz Grabiec
85ddb832b4 test: tablets: Add test for replica allocation on rack list changes 2025-10-02 19:45:00 +02:00
Benny Halevy
4955ca3ddd test: lib: topology_builder: generate unique rack names
Encode the dc identifier into each rack name so each dc will have its
own unique racks.

Just for easier distinction in logs.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-10-02 19:45:00 +02:00
Tomasz Grabiec
5fc617ecf5 test: Add tests for rack list RF 2025-10-02 19:45:00 +02:00
Tomasz Grabiec
1d34614421 doc: Document rack-list replication factor
Signed-off-by: Tomasz Grabiec <tgrabiec@scylladb.com>
2025-10-02 19:45:00 +02:00
Tomasz Grabiec
655f4ffa3c topology_coordinator: Restore formatting 2025-10-02 19:45:00 +02:00
Tomasz Grabiec
a21bbd4773 topology_coordinator: Cancel keyspace alter on broader set of errors
We now include keyspace metadata construction, which can throw if
validation fails. We want to fail the ALTER rather than keep retrying.
2025-10-02 19:44:59 +02:00
Tomasz Grabiec
d02f93e77e topology_coordinator: Make keyspace alter process options through as_ks_metadata_update()
There are several problems with how ALTER execution works with tablets.

1) Currently, option processing bypasses
   ks_prop_defs::prepare_options(), and will pass them directly to
   keyspace_metadata. This deviates from the vnode path, causing
   discrepancy in logic. But also there will be some non-trivial options
   post-processing added there - numeric RF will be replaced with a
   rack list. We should preserve it in the tablet path which alters
   the keyspace, otherwise it will fail when trying to construct
   network_topology_strategy.

2) Option merging happens on the flat version of the map, which won't
   work correctly with extended map which contains lists. We want
   the new list to replace the old list or numeric RF, not get its items
   merged. For example:

    We want:

      {'dc1': 3} + {'dc1': ['rack1', 'rack2']} = {'dc1': ['rack1', 'rack2']}

    If we merge flattened options, we would get incorrect flattened options:

      {'dc1': 3,
       'dc1:0', 'rack1'
       'dc1:1', 'rack2'}

3) We lose atomicity of update. Validation and merging which happens on the CQL
   coordinator is done in a different group0 transaction context than mutation
   generation inside topology coordinator later.

   Fixes https://github.com/scylladb/scylladb/issues/25269
2025-10-02 19:44:29 +02:00
Tomasz Grabiec
849ab5545f cql3: ks_prop_defs: Preserve old options
In 2d9b8f2, semantics of ALTER was changed for tablet-based keyspaces
which makes "replication" assignment act like +=, where replication
options are merged with the old options.

This merging is currently performed in the CQL statement level on
options map, before passing to topology coordinator. This will change
in later commit, so move merging here. Merging options of flattened
level will not be correct because it doesn't recognize nested
collections, like rack lists.

We want:

  {'dc1': 3} + {'dc1': ['rack1', 'rack2']} = {'dc1': ['rack1', 'rack2']}

If we merge flattened options, we would get incorrect flattened options:

  {'dc1': 3,
   'dc1:0', 'rack1'
   'dc1:1', 'rack2'}

Which cannot be parsed back into ks_prop_defs on the topology coordinator.

Refs https://github.com/scylladb/scylladb/pull/20208#issuecomment-3174728061
Refs #25549
2025-10-02 19:42:39 +02:00
Tomasz Grabiec
0d0c06da06 cql3: ks_prop_defs: Introduce flattened() 2025-10-02 19:42:39 +02:00
Tomasz Grabiec
6b7b0cb628 locator: Recognize rack list RF as valid in assert_rf_rack_valid_keyspace() 2025-10-02 19:42:39 +02:00
Tomasz Grabiec
e5b7452af2 tablet_allocator: Respect binding replicas to racks 2025-10-02 19:42:39 +02:00
Tomasz Grabiec
6de342ed3e locator: network_topology_strategy: Respect rack list when reallocating tablets 2025-10-02 19:42:39 +02:00
Tomasz Grabiec
8e9a58b89f cql3: ks_prop_defs: Fail with more information when options are not in expected format
Before, we would throw vague sstring_out_of_range from substr() when
the name doesn't have a nested key separate with ":", e.g "durable_writes"
instead of "durable_writes:durable_writes".
2025-10-02 19:42:39 +02:00
Tomasz Grabiec
66755db062 locator, cql3: Support rack lists in replication options
Allows per-DC replication factor to be either a string, holding a
numerical value, or a list of strings, holding a list of rack names.

The rack list is not respected yet by the tablet allocator, this is
achieved in subsequent commit.

This changes the format of options stored in the flattened map
in system_schema.keyspaces#replication. Values which are rack lists,
are converted into multiple entries, with the list index appended to
the key with ':' as the separator:

For example, this extended map:

   {
      'dc1': '3',
      'dc2': ['rack1', 'rack2']
   }

is stored as a flattened map:

  {
    'dc1': '3',
    'dc2:0': 'rack1',
    'dc2:1': 'rack2'
  }

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Signed-off-by: Tomasz Grabiec <tgrabiec@scylladb.com>
2025-10-02 19:42:39 +02:00
Botond Dénes
9310d3eff3 scylla-gdb.py: small-objects: don't cast free_object to void*
When walking the free-list of a pool or a span, the small-object code
casts the dereferenced `free_object*` to `void*`. This is unnecessary,
just use the `next` field of the `free_object` to look up the next free
object. I think this monkey business with `void*` was done to speed up
walking the free-list, but recently we've seen small-object --summarize
fail in CI, and it could be related.

Fixes: #25733

Closes scylladb/scylladb#26339
2025-10-02 13:36:49 +03:00