Commit Graph

48854 Commits

Asias He
76316f44a7 repair: Add metrics for sstable bytes read and skipped from sstables
scylla_repair_inc_sst_skipped_bytes: Total number of bytes skipped from
sstables for incremental repair on this shard.

scylla_repair_inc_sst_read_bytes: Total number of bytes read from
sstables for incremental repair on this shard.
2025-08-18 11:01:22 +08:00
Asias He
b0364fcba3 test.py: Disable incremental for test_tombstone_gc_for_streaming_and_repair
Disable incremental repair so that the second repair can still work on
the repaired data set.
2025-08-18 11:01:22 +08:00
Asias He
ad5275fd4c test.py: Add tests for tablet incremental repair
The following tests are added for tablet incremental repair:

- Basic incremental repair

- Basic incremental repair with error

- Minor compaction and incremental repair

- Major compaction and incremental repair

- Scrub compaction and incremental repair

- Cleanup/Upgrade compaction and incremental repair

- Tablet split and incremental repair

- Tablet merge and incremental repair
2025-08-18 11:01:21 +08:00
Asias He
0d7e518a26 repair: Add tablet incremental repair support
The central idea of incremental repair is to allow repair participants
to select and repair only a portion of the dataset to speed up the
repair process. All repair participants must utilize an identical
selection method to repair and synchronize the same selected dataset.
There are two primary selection methods: time-based and file-based. The
time-based method selects data within a specified time frame. It is
versatile but it is less efficient because it requires reading all of
the dataset and omitting data beyond the time frame. The file-based
method selects data from unrepaired SSTables and is more efficient
because it allows the entire SSTable to be omitted. This patch
implements the file-based selection method.

Incremental repair will only be supported for tablet tables; it will not
be supported for vnode tables. On one hand, the legacy vnode mode is less
important to support. On the other hand, incremental repair for
vnodes is much harder to implement. With vnodes, an SSTable could contain
data for multiple vnode ranges. When a given vnode range is repaired,
only a portion of the SSTable is repaired. This complicates the
manipulation of SSTables significantly during both repair and
compaction. With tablets, an entire tablet is repaired, so an
SSTable is either fully repaired or not repaired at all, which is a huge
simplification.

This patch uses the repaired_at field from the sstables::statistics
component to mark an SSTable as repaired. It uses a virtual clock as the
repair timestamp, i.e., a monotonically increasing number for the
repaired_at field of an SSTable and the sstables_repaired_at column in
the system.tablets table. Note that when an SSTable is not repaired, the
repaired_at field keeps its default value of 0. The in-memory
being_repaired field of an SSTable is used to explicitly mark that an
SSTable has been selected. The following variables are used for
incremental repair:

The on-disk repaired_at field of an SSTable is used.
   - A 64-bit number that increases sequentially

The sstables_repaired_at column is added to the system.tablets table.
   - repaired_at <= sstables_repaired_at means the SSTable is repaired

The in-memory being_repaired field of an SSTable is added.
   - A repair UUID that tells which SSTables have participated in the repair
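The selection rule above can be sketched in Python (an illustrative model only: the SSTable class, the helper names, and the treatment of repaired_at == 0 as never-repaired are assumptions, not the actual C++ implementation):

```python
from dataclasses import dataclass
from typing import Optional

UNREPAIRED = 0  # default repaired_at value for an sstable that was never repaired

@dataclass
class SSTable:
    repaired_at: int = UNREPAIRED         # on-disk field, virtual-clock timestamp
    being_repaired: Optional[str] = None  # in-memory field, repair UUID when selected

def is_repaired(sst: SSTable, sstables_repaired_at: int) -> bool:
    # repaired_at <= sstables_repaired_at means the sstable is repaired;
    # 0 is assumed to mean "never repaired" and is excluded.
    return sst.repaired_at != UNREPAIRED and sst.repaired_at <= sstables_repaired_at

def select_for_incremental_repair(ssts, sstables_repaired_at, repair_uuid):
    # File-based selection: repaired sstables are skipped as whole files;
    # unrepaired ones are marked as participating in this repair.
    selected = [s for s in ssts if not is_repaired(s, sstables_repaired_at)]
    for s in selected:
        s.being_repaired = repair_uuid
    return selected
```

The efficiency claim follows directly: a repaired sstable is excluded without reading any of its data.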

Initial test results:

    1) Medium dataset results
    Node amount: 3
    Instance type: i4i.2xlarge
    Disk usage per node: ~500GB
    Cluster pre-populated with ~500GB of data before starting repairs job.
    Results for Repair Timings:
    The regular repair run took 210 mins.
    Incremental repair 1st run took 183 mins, 2nd and 3rd runs took around 48s.
    The speedup is: 183 mins / 48s = 228X

    2) Small dataset results
    Node amount: 3
    Instance type: i4i.2xlarge
    Disk usage per node: ~167GB
    Cluster pre-populated with ~167GB of data before starting the repairs job.
    Regular repair 1st run took 110s, 2nd and 3rd runs took 110s.
    Incremental repair 1st run took 110s, 2nd and 3rd runs took 1.5s.
    The speedup is: 110s / 1.5s = 73X

    3) Large dataset results

    Node amount: 6
    Instance type: i4i.2xlarge, 3 racks
    50% of base load, 50% read/write
    Dataset == Sum of data on each node

    Dataset     Non-incremental repair (minutes)
    1.3 TiB     31:07
    3.5 TiB     25:10
    5.0 TiB     19:03
    6.3 TiB     31:42

    Dataset     Incremental repair (minutes)
    1.3 TiB     24:32
    3.0 TiB     13:06
    4.0 TiB     5:23
    4.8 TiB     7:14
    5.6 TiB     3:58
    6.3 TiB     7:33
    7.0 TiB     6:55

Fixes #22472
2025-08-18 11:01:21 +08:00
Asias He
f9021777d8 compaction: Add tablet incremental repair support
This patch adds incremental_repair support in compaction.

- The sstables are split into repaired and unrepaired sets.

- The repaired and unrepaired sets are compacted separately.

- The repaired_at from the sstable and sstables_repaired_at from the
  system.tablets table are used to decide whether an sstable is
  repaired or not.

- Different compaction tasks, e.g., minor, major, scrub, split, are
  serialized with tablet repair.
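A minimal sketch of the split, assuming the repaired_at <= sstables_repaired_at rule from the repair patch and representing sstables as (name, repaired_at) pairs; the function and variable names are invented for illustration:

```python
UNREPAIRED = 0  # assumed default repaired_at for a never-repaired sstable

def split_for_compaction(sstables, sstables_repaired_at):
    """Partition (name, repaired_at) pairs into repaired/unrepaired sets.

    The two sets are then compacted separately, so compaction never
    mixes repaired and unrepaired data.
    """
    repaired, unrepaired = [], []
    for name, repaired_at in sstables:
        if repaired_at != UNREPAIRED and repaired_at <= sstables_repaired_at:
            repaired.append(name)
        else:
            unrepaired.append(name)
    return repaired, unrepaired
```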
2025-08-18 11:01:21 +08:00
Asias He
2ecd42f369 feature_service: Add TABLET_INCREMENTAL_REPAIR feature 2025-08-11 10:10:08 +08:00
Asias He
b226ad2f11 tablet_allocator: Add tablet_force_tablet_count_increase and decrease
It is useful to increase and decrease the tablet count in tests for
tablet split and merge.
2025-08-11 10:10:08 +08:00
Asias He
1bf59ebba0 repair: Add incremental helpers
This adds the helpers which are needed by both repair and compaction to
add incremental repair support.
2025-08-11 10:10:08 +08:00
Asias He
b86f554760 sstable: Add being_repaired to sstable
This in-memory field is set by incremental repair when the sstable
participates in the repair.
2025-08-11 10:10:08 +08:00
Asias He
f50cd94429 sstables: Add set_repaired_at to metadata_collector 2025-08-11 10:10:08 +08:00
Asias He
ac9d33800a mutation_compactor: Introduce add operator to compaction_stats
It is needed to combine two compactions.
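As an illustration, combining two stats objects field-wise might look like the following sketch (the counter names are invented; the real compaction_stats fields differ):

```python
from dataclasses import dataclass

@dataclass
class CompactionStats:
    # Hypothetical counters standing in for the real compaction_stats fields.
    rows_processed: int = 0
    tombstones_purged: int = 0

    def __add__(self, other: "CompactionStats") -> "CompactionStats":
        # Combining two compactions is a field-wise sum of their counters.
        return CompactionStats(
            self.rows_processed + other.rows_processed,
            self.tombstones_purged + other.tombstones_purged,
        )
```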
2025-08-11 10:10:07 +08:00
Asias He
5377f87e5a tablet: Add sstables_repaired_at to system.tablets table
It is used to store the repaired_at for each tablet.
2025-08-11 10:10:07 +08:00
Asias He
8db18ac74e test: Fix drain api in task_manager_client.py
The POST method should be used.
2025-08-11 10:10:07 +08:00
Avi Kivity
6daa6178b1 scripts: pull_github_pr.sh: reject unintended submodule changes
It is easy for submodule changes to slip through during rebase (if
the developer uses the terrible `git add -u` command) and
for a maintainer to miss them (if they don't go over each change after
a rebase).

Protect against such mishaps by checking if a submodule was updated
(or .gitmodules itself was changed) and aborting the operation.

If the pull request title contains "submodule", assume the operation
was intended.

Allow bypassing the check with --allow-submodule.

Closes scylladb/scylladb#25418
2025-08-10 11:48:34 +03:00
Avi Kivity
c2a2e11c40 Merge 'Prepare the way for incremental repair' from Botond Dénes
With incremental repair, each replica::compaction_group will have 3 logical compaction groups: repaired, repairing and unrepaired. A group is defined as a set of sstables that can be compacted together. The logical groups will share the same instance of sstable_set, but each will have its own logical sstable set. The existing compaction::table_state is a view of a logical compaction group, so it makes sense that each replica::compaction_group will have multiple views. Each view will provide to the compaction layer only the sstables that belong to it. That way, we preserve the existing interface between the replica and compaction layers, where each compaction::table_state represents a single logical group.
The idea is that all the incremental repair knowledge is confined to the repair and replica layers; compaction doesn't want to know about it, it just works on logical groups, and what each represents doesn't matter from the perspective of the subsystem. This is the best way forward to avoid violating layers and to reduce the maintenance burden in the long run.
We also proceed to rename table_state to compaction_group_view, since it's a better description; working with multiple terms is confusing. The placeholder for implementing the sstable classifier is also left in tablet_storage_group_manager; for the time being, all sstables will go to the unrepaired logical set, which preserves the current behavior.

New functionality, no backport required

Closes scylladb/scylladb#25287

* github.com:scylladb/scylladb:
  test: Add test that compaction doesn't cross logical group boundary
  replica: Introduce views in compaction_group for incremental repair
  compaction: Allow view to be added with compaction disabled
  replica: Futurize retrieval of sstable sets in compaction_group_view
  treewide: Futurize estimation of pending compaction tasks
  replica: Allow compaction_group to have more than one view
  Move backlog tracker to replica::compaction_group
  treewide: Rename table_state to compaction_group_view
  tests: adjust for incremental repair
2025-08-09 17:21:17 +03:00
Anna Stuchlik
f3d9d0c1c7 doc: add new and removed metrics to the 2025.3 upgrade guide
This commit adds the list of new and removed metrics to the already existing upgrade guide
from 2025.2 to 2025.3.

Fixes https://github.com/scylladb/scylladb/issues/24697

Closes scylladb/scylladb#25385
2025-08-08 13:25:51 +02:00
Avi Kivity
ab45a0edb5 Update seastar submodule
* seastar 60b2e7da...1520326e (36):
  > Merge 'http/client: Fix content length body overflow check (and a bit more)' from Pavel Emelyanov
    test/http: Add test for http_content_length_data_sink
    test/http: Implement some missing methods for memory data sink
    http/client: Fix content length body overflow check
    http/client: Fix misprint in overflow exception message
  > dns: Use TCP connection data_sink directly
  > iostream: Update "used stream" check for output_stream::detach()
  > Update dpdk submodule
  > rpc: server::process: coroutinize
  > iostream: Remove deprecated constructor
  > Merge 'foreign_ptr: add unwrap_on_owner_shard method' from Benny Halevy
    foreign_ptr: add unwrap_on_owner_shard method
    foreign_ptr: release: check_shard with SEASTAR_DEBUG_SHARED_PTR
  > enum: Replace static_assert() with concept
  > rpc: reindent connection::negotiate()
  > rpc: client: use structured binding
  > rpc.cc: reindent
  > queue: Remove duplicating static assertion
  > Merge 'rpc: client: convert main loop to a coroutine' from Avi Kivity
    rpc: client::loop(): restore indentation
    rpc: client: coroutinize client::loop()
    rpc: client: split main loop function
  > Merge 'treewide: replace remaining std::enable_if with constraints' from Avi Kivity
    optimized_optional: replace std::enable_if with constraint
    log: replace std::enable_if with constraint
    rpc: replace std::enable_if with constraint
    when_all: replace std::enable_if with constraints
    transfer: replace std::enable_if with constraints
    sstring: replace std::enable_if with constraint
    simple-stream: replace std::enable_if with constraints
    shared_ptr: replace std::enable_if with constraints
    sharded: replace std::enable_if with constraints for sharded_has_stop
    sharded: replace std::enable_if with constraints for peering_sharded_service
    scollectd: replace std::enable_if with constraints for type inference
    scollectd: replace std::enable_if with constraints for ser/deser
    metrics: replace std::enable_if with constraints
    chunked_fifo: replace std::enable_if with constraint
    future: replace std::enable_if with constraints
  > websocket: Avoid sending scattered_message to output_stream
  > websocket: Remove unused scattered_message.hh inclusion
  > aio: Squash aio_nowait_supported into fs_info::nowait_works
  > Merge 'reactor: coroutinize spawn()' from Avi Kivity
    reactor: restore indentation for spawn()
    reactor: coroutinize spawn()
  > modules: export coroutine facilities
  > Merge 'reactor: coroutinize some file-related functions' from Avi Kivity
    reactor: adjust indentation
    reactor: coroutinize reactor::make_pipe()
    reactor: coroutinize reactor::inotify_add_watch()
    reactor: coroutinize reactor::read_directory()
    reactor: coroutinize reactor::file_type()
    reactor: coroutinize reactor::chmod()
    reactor: coroutinize reactor::link_file()
    reactor: coroutinize reactor::rename_file()
    reactor: coroutinize open_file_dma()
  > memory: inline disable_abort_on_alloc_failure_temporarily
  > Merge 'addr2line timing and optimizations' from Travis Downs
    addr2line: add basic timing support
    addr2line: do a quick check for 0x in the line
    addr2line: don't load entire file
    addr2line: typing fixing
  > posix: Replace static_assert with concept
  > tls: Push iovec with the help of put(vector<temporary_buffer>)
  > io_queue: Narrow down friendship with reactor
  > util: drop concepts.hh
  > reactor: Re-use posix::to_timespec() helper
  > Fix incorrect defaults for io queue iops/bandwidth
  > net: functions describing ssl connection
  > Add label values to the duplicate metrics exception
  > Merge 'Nested scheduling groups (CPU only)' from Pavel Emelyanov
    test: Add unit test for cross-sched-groups wakeups
    test: Add unit test for fair CPU scheduling
    test: Add unit test for basic supergrops manipulations
    test: Add perf test for context switch latency
    scheduling: Add an internal method to get group's supergroup
    reactor: Add supergroup get_shares() API
    reactor: Add supergroup::set_shares() API
    reactor: Create scheduling groups in supergroups
    reactor: Supergroups destroying API
    reactor: Supergroups creating API
    reactor: Pass parent pointer to task_queue from caller
    reactor: Wakeup queue group on child activation
    reactor: Add pure virtual sched_entity::run_tasks() method
    reactor: Make task_queue_group be sched_entity too
    reactor: Split task_queue_group::run_some_tasks()
    reactor: Count and limit supergroup children
    reactor: Link sched entity to its parent
    reactor: Switch activate(task_queue*) to work on sched_entity
    reactor: Move set_shares() to sched_entity()
    reactor: Make account_runtime() work with sched_entity
    reactor: Make insert_activating_task_queue() work on sched_entity
    reactor: Make pop_active_task_queue() work on sched_entity
    reactor: Make insert_active_task_queue() work on sched_entity
    reactor: Move timings to sched_entity
    reactor: Move active bit to sched_entity
    reactor: Move shares to sched_entity
    reactor: Move vruntime to sched_entity
    reactor: Introduce sched_entity
    reactor: Rename _activating_task_queues -> _activating
    reactor: Remove local atq* variable
    reactor: Rename _active_task_queues -> _active
    reactor: Move account_runtime() to task_queue_group
    reactor: Move vruntime update from task_queue into _group
    reactor: Simplify task_queue_group::run_some_tasks()
    reactor: Move run_some_tasks() into task_queue_group
    reactor: Move insert_activating_task_queues() into task_queue_group
    reactor: Move pop_active_task_queue() into task_queue_group
    reactor: Move insert_active_task_queue() into task_queue_group
    reactor: Introduce and use task_queue_group::activate(task_queue)
    reactor: Introduce task_queue_group::active()
    reactor: Wrap scheduling fields into task_queue_group
    reactor: Simplify task_queue::activate()
    reactor: Rename task_queue::activate() -> wakeup()
    reactor: Make activate() method of class task_queue
    reactor: Make task_queue::run_tasks() return bool
    reactor: Simplify task_queue::run_tasks()
    reactor: Make run_tasks() method of class task_queue
  > Fix hang in io_queue for big write ioproperties numbers
  > split random io buffer size in 2 options
  > reactor: document run_in_background
  > Merge 'Add io_queue unit test for checking request rates' from Robert Bindar
    Add unit test for validating computed params in io_queue
    Move `disk_params` and `disk_config_params` to their own unit
    Add an overload for `disk_config_params::generate_config`

Closes scylladb/scylladb#25404
2025-08-08 12:24:39 +03:00
Botond Dénes
70aa81990b Merge 'Alternator - add the ability to write, not just read, system tables' from Nadav Har'El
In commit 44a1daf we added the ability to read Scylla system tables with Alternator. This feature is useful, among other things, in tests that want to read Scylla's configuration through the system table system.config. But tests often want to modify system.config, e.g., to temporarily reduce some threshold to make tests shorter. Until now, this was not possible.

This series adds support for writing to system tables through Alternator, and examples of tests using this capability (and utility functions to make it easy).

Because the ability to write to system tables may have non-obvious security consequences, it is turned off by default and needs to be enabled with a new configuration option, "alternator_allow_system_table_write".

No backports are necessary - this feature is only intended for tests. We may later decide to backport if we want to backport new tests, but I think the probability we'll want to do this is low.

Fixes #12348

Closes scylladb/scylladb#19147

* github.com:scylladb/scylladb:
  test/alternator: utility functions for changing configuration
  alternator: add optional support for writing to system table
  test/alternator: reduce duplicated code
2025-08-08 09:13:15 +03:00
Raphael S. Carvalho
beaaf00fac test: Add test that compaction doesn't cross logical group boundary
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2025-08-08 06:58:01 +03:00
Raphael S. Carvalho
d351b0726b replica: Introduce views in compaction_group for incremental repair
Wired the unrepaired, repairing and repaired views into compaction_group.

Also, the repaired filter was wired up, so that tablet_storage_group_manager
can implement the procedure to classify an sstable.

Based on this classifier, we can decide which view a sstable belongs
to, at any given point in time.

Additionally, we made changes to compaction_group_view
to return only sstables that belong to the underlying view.

From this point on, repaired, repairing and unrepaired sets are
connected to compaction manager through their views. And that
guarantees sstables on different groups cannot be compacted
together.
The repairing view specifically has compaction disabled altogether;
we can revert this later if we want, to allow repairing sstables
to be compacted with one another.

The benefit of this logical approach is having the classifier
as the single source of truth. Otherwise, we'd need to keep the
sstable location consistent with global metadata, creating
complexity.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2025-08-08 06:58:00 +03:00
Raphael S. Carvalho
61cb02f580 compaction: Allow view to be added with compaction disabled
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2025-08-08 06:58:00 +03:00
Raphael S. Carvalho
9d3755f276 replica: Futurize retrieval of sstable sets in compaction_group_view
This will allow upcoming work to gently produce a sstable set for
each compaction group view. Example: repaired and unrepaired.

Locking strategy for compaction's sstable selection:
Since sstable retrieval path became futurized, tasks in compaction
manager will now hold the write lock (compaction_state::lock)
when retrieving the sstable list, feeding them into compaction
strategy, and finally registering selected sstables as compacting.
The last step prevents another concurrent task from picking the
same sstable. Previously, all those steps were atomic, but
we have seen stalls in that area in large installations, so the
futurization of that area was bound to come sooner or later.
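The locking protocol described above can be modeled with an asyncio sketch (the class and method names are hypothetical; the real code uses Seastar coroutines and compaction_state::lock):

```python
import asyncio

class CompactionState:
    """Hypothetical stand-in for compaction_state; names are illustrative."""
    def __init__(self, sstables):
        self.lock = asyncio.Lock()   # stands in for compaction_state::lock
        self.sstables = set(sstables)
        self.compacting = set()

    async def get_sstables(self):
        # Futurized retrieval: it may yield to the scheduler, which is why
        # the lock must be held across the whole select-and-register path.
        await asyncio.sleep(0)
        return self.sstables - self.compacting

    async def pick_for_compaction(self, strategy):
        # Hold the write lock from retrieval through registration, so a
        # concurrent task cannot pick the same sstables in between.
        async with self.lock:
            candidates = await self.get_sstables()
            selected = strategy(candidates)
            self.compacting |= set(selected)
            return selected
```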

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2025-08-08 06:58:00 +03:00
Raphael S. Carvalho
20c3301a1a treewide: Futurize estimation of pending compaction tasks
This is to allow futurization of compaction_group_view method that
retrieves sstable set.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2025-08-08 06:51:29 +03:00
Raphael S. Carvalho
af3592c658 replica: Allow compaction_group to have more than one view
In order to support incremental repair, we'll allow each
replica::compaction_group to have two logical compaction groups
(or logical sstable sets), one for repaired, another for unrepaired.

That means we have to adapt a few places to work with
compaction_group_view instead, such that no logical compaction
group is missed when doing table or tablet wide operations.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2025-08-08 06:51:29 +03:00
Raphael S. Carvalho
e78295bff1 Move backlog tracker to replica::compaction_group
Since there will be only one physical sstable set, it makes sense to move
the backlog tracker to replica::compaction_group. With incremental repair,
it still makes sense to compute backlog accounting across both logical sets,
since the compound backlog influences the overall read amplification,
and the total backlog across the repaired and unrepaired sets can help
drive decisions like giving up on incremental repair when the unrepaired
set is almost as large as the repaired set, causing an amplification
of 2.

It is also needed for correctness, because an sstable can move quickly
across the logical sets, and having one tracker for each logical
set could cause the sstable to not be erased from the old set it
belonged to.
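The give-up heuristic hinted at above could be sketched as follows (the function and the 0.9 threshold are invented for illustration; the commit only describes the idea):

```python
def should_use_incremental_repair(repaired_bytes, unrepaired_bytes, threshold=0.9):
    """Illustrative heuristic: when the unrepaired set approaches the size
    of the repaired set, the amplification of keeping two sets nears 2,
    so a regular (full) repair may be preferable to an incremental one.
    The threshold value is a hypothetical tuning parameter."""
    if repaired_bytes == 0:
        # Nothing repaired yet: the first incremental run is a full repair anyway.
        return True
    return unrepaired_bytes / repaired_bytes < threshold
```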

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2025-08-08 06:51:29 +03:00
Raphael S. Carvalho
2c4a9ba70c treewide: Rename table_state to compaction_group_view
Since table_state is a view of a compaction group, it makes sense
to rename it as such.

With the upcoming incremental repair, each replica::compaction_group
will actually be two compaction groups, so there will be two
views for each replica::compaction_group.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2025-08-08 06:51:28 +03:00
Asias He
acc367c522 tests: adjust for incremental repair
The separation of sstables into the logical repaired and unrepaired
virtual sets requires some adjustments for certain tests, in particular
for those that look at the number of compaction tasks or number of sstables.
The following tests need adjustment:
* test/cluster/tasks/test_tablet_tasks.py
* test/boost/memtable_test.cc

The adjustments are done in such a way that they accommodate both the
case where there are separate repaired/unrepaired states and the case
where there aren't.
2025-08-08 06:49:17 +03:00
Andrei Chekun
5c095558b1 test.py: add timeout option for the whole run
Add the ability to limit the execution time of a single test in pytest.
Add --session-timeout to limit execution of the test.py and/or pytest
session.

Closes scylladb/scylladb#25185
2025-08-07 21:06:14 +03:00
Avi Kivity
2b8f5d128a Merge 'GCP Key Provider: Fix authentication issues' from Nikos Dragazis
* Fix discovery of application default credentials by using fully expanded pathnames (no tildes).
* Fix grant type in token request with user credentials.

Fixes #25345.

Closes scylladb/scylladb#25351

* github.com:scylladb/scylladb:
  encryption: gcp: Fix the grant type for user credentials
  encryption: gcp: Expand tilde in pathnames for credentials file
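The tilde-expansion fix can be illustrated in Python (the credentials path and HOME value are illustrative; the actual fix is in the C++ GCP key provider):

```python
import os
import os.path

# An HTTP client or file API opening a credentials file will not expand
# "~" itself, so the path must be fully expanded up front.
os.environ["HOME"] = "/home/scylla"
raw = "~/.config/gcloud/application_default_credentials.json"
expanded = os.path.expanduser(raw)  # "~" replaced with the home directory
```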
2025-08-07 20:50:12 +03:00
Dani Tweig
0ade762654 Adding action call to update Jira issue status
Add actions that will change the relevant Jira issue status based on the linked PR changes.

Closes scylladb/scylladb#25397
2025-08-07 15:55:58 +03:00
Benny Halevy
3f44dba014 sstables: make_entry_descriptor: make regex non-greedy
With greedy matching, an sstable path in a snapshot
directory with a tag that resembles a name-<uuid>
would match the dir regular expression as the longest match,
while a non-greedy regular expression would correctly match
the real keyspace and table as the shortest match.

Also, add a regression unit test reproducing the issue and
validating the fix.
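The greedy-vs-non-greedy difference can be demonstrated with a Python sketch (the path layout and regex are simplified stand-ins for the real entry-descriptor regex):

```python
import re

# A table directory "tbl-<hex>" and a snapshot tag that also ends in "-<hex>".
path = "ks/tbl-abc123/snapshots/tag-def456/me-1-big-Data.db"

greedy = re.match(r"(.+)-([0-9a-f]{6})/", path)
lazy = re.match(r"(.+?)-([0-9a-f]{6})/", path)

# Greedy matching takes the longest prefix, landing on the snapshot tag:
assert greedy.group(2) == "def456"
# Non-greedy matching takes the shortest, the real table directory:
assert lazy.group(2) == "abc123"
```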

Fixes #25242

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#25323
2025-08-07 15:35:11 +03:00
Avi Kivity
8164f72f6e Merge 'Separate local_effective_replication_map from vnode_effective_replication_map' from Benny Halevy
Derive both vnode_effective_replication_map
and local_effective_replication_map from
static_effective_replication_map as both are static and per-keyspace.

However, local_effective_replication_map does not need vnodes
for the mapping of all tokens to the local node.

Refs #22733

* No backport required

Closes scylladb/scylladb#25222

* github.com:scylladb/scylladb:
  locator: abstract_replication_strategy: implement local_replication_strategy
  locator: vnode_effective_replication_map: convert clone_data_gently to clone_gently
  locator: abstract_replication_map: rename make_effective_replication_map
  locator: abstract_replication_map: rename calculate_effective_replication_map
  replica: database: keyspace: rename {create,update}_effective_replication_map
  locator: effective_replication_map_factory: rename create_effective_replication_map
  locator: abstract_replication_strategy: rename vnode_effective_replication_map_ptr et. al
  locator: abstract_replication_strategy: rename global_vnode_effective_replication_map
  keyspace: rename get_vnode_effective_replication_map
  dht: range_streamer: use naked e_r_m pointers
  storage_service: use naked e_r_m pointers
  alternator: ttl: use naked e_r_m pointers
  locator: abstract_replication_strategy: define is_local
2025-08-07 12:51:43 +03:00
Nadav Har'El
6f415b2f10 Merge 'test/cqlpy: Adjust test_describe.py to work against Cassandra' from Dawid Mędrek
We adjust most of the tests in `cqlpy/test_describe.py`
so that they work against both Scylla and Cassandra.
This PR doesn't cover all of them, just those I authored.

Refs scylladb/scylladb#11690

Backport: not needed. This is effectively a code cleanup.

Closes scylladb/scylladb#25060

* github.com:scylladb/scylladb:
  test/cqlpy/test_describe.py: Adjust test_create_role_with_hashed_password_authorization to work with Cassandra
  test/cqlpy/test_describe.py: Adjust test_desc_restore to work with Cassandra
  test/cqlpy/test_describe.py: Mark Scylla-only tests as such
2025-08-07 12:43:04 +03:00
Avi Kivity
90eb6e6241 Merge 'sstables/trie: implement BTI node format serialization and traversal' from Michał Chojnowski
This is the next part in the BTI index project.

Overarching issue: https://github.com/scylladb/scylladb/issues/19191
Previous part: https://github.com/scylladb/scylladb/pull/25154
Next part: implementing a trie cursor (the "set to key, step forwards, step backwards" thing) on top of the `node_reader` added here.

The new code added here is not used for anything yet, but it's posted as a separate PR
to keep things reviewably small.

This part implements the BTI trie node encoding, as described in https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/io/sstable/format/bti/BtiFormat.md#trie-nodes.
It contains the logic for encoding the abstract in-memory `writer_node`s (added in the previous PR)
into the on-disk format, and the logic for traversing the on-disk nodes during a read.

New functionality, no backporting needed.

Closes scylladb/scylladb#25317

* github.com:scylladb/scylladb:
  sstables/trie: add tests for BTI node serialization and traversal
  sstables/trie: implement BTI node traversal
  sstables/trie: implement BTI serialization
  utils/cached_file: add get_shared_page()
  utils/cached_file: replace a std::pair with a named struct
2025-08-07 12:15:42 +03:00
Nadav Har'El
d632599a92 Merge 'test.py: native pytest repeats' from Andrei Chekun
Previously, each repeat was executed by launching a separate pytest run.
That was resource-consuming, since pytest re-ran test discovery
each time. Now all repeats are done inside one pytest process.

A backport to 2025.3 is needed, since this functionality is framework only, and 2025.3 is affected by these slow repeats as well.

Closes scylladb/scylladb#25073

* github.com:scylladb/scylladb:
  test.py: add repeats in pytest
  test.py: add directories and filename to the log files
  test.py: rename log sink file for boost tests
  test.py: better error handling in boost facade
2025-08-06 18:18:03 +03:00
Dawid Pawlik
b284961a95 scripts: fetch the name of the author of the PR
The `pull_github_pr.sh` script has been fetching the username
from the owner of the source branch.
The owner of the branch is not always the author of the PR.
For example, the branch might come from a fork managed by an organization
or a group of people.
This led to the author in merge commits being referred to as `null`
(if the name was not set for the group), or to a name being mentioned
that did not belong to the author of the patch.

Instead of looking for the owner of the source branch, the script should
look for the name of the PR's author.

Closes scylladb/scylladb#25363
2025-08-06 16:45:38 +03:00
Benny Halevy
5e5e63af10 scylla-sstable: print_query_results_json: continue loop if row is disengaged
Otherwise it is accessed right after exiting the if block.
Add a unit test reproducing the issue and validating the fix.
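A Python analogue of the control-flow fix (the row structure is illustrative; the real code deals with a disengaged std::optional in C++):

```python
def collect_rows(rows):
    """Skip disengaged (None) rows instead of falling through to code
    that dereferences them."""
    collected = []
    for row in rows:
        if row is None:
            # Without this `continue`, the code below would still
            # access the disengaged row after the check.
            continue
        collected.append(row["pk"])
    return collected
```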

Fixes #25325

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#25326
2025-08-06 16:44:51 +03:00
Szymon Malewski
eb11485969 test/alternator: enable more relevant logs in CI.
This patch sets, for the alternator test suite, all 'alternator-*' loggers and the 'paxos' logger to trace level. This should significantly ease debugging of failed tests, while having no effect on test time and increasing log size by only 7%.
This affects running alternator tests only with `test.py`, not with `test/alternator/run`.

Closes #24645

Closes scylladb/scylladb#25327
2025-08-06 16:37:25 +03:00
Benny Halevy
6dbbb80aae locator: abstract_replication_strategy: implement local_replication_strategy
Derive both vnode_effective_replication_map
and local_effective_replication_map from
static_effective_replication_map as both are static and per-keyspace.

However, local_effective_replication_map does not need vnodes
for the mapping of all tokens to the local node.

Note that everywhere_replication_strategy is not abstracted in a similar
way, although it could be, since the plan is to get rid of it
once all system keyspaces are converted to local or tablets replication
(and propagated everywhere if needed using raft group0).

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-08-06 16:05:11 +03:00
Benny Halevy
8bde507232 locator: vnode_effective_replication_map: convert clone_data_gently to clone_gently
create_effective_replication_map need not know about the internals of
vnode_effective_replication_map.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-08-06 16:03:53 +03:00
Benny Halevy
8d4ac97435 locator: abstract_replication_map: rename make_effective_replication_map
to make_vnode_effective_replication_map_ptr since
it is specific to vnode_effective_replication_map.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-08-06 16:03:53 +03:00
Benny Halevy
babb4a41a8 locator: abstract_replication_map: rename calculate_effective_replication_map
to calculate_vnode_effective_replication_map since
it is specific to vnode-based range calculations.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-08-06 16:03:53 +03:00
Benny Halevy
34b223f6f9 replica: database: keyspace: rename {create,update}_effective_replication_map
to *_static_effective_replication_map, in preparation
for separating local_effective_replication_map from
vnode_effective_replication_map.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-08-06 16:03:53 +03:00
Benny Halevy
688bd4fd43 locator: effective_replication_map_factory: rename create_effective_replication_map
to create_static_effective_replication_map, in preparation
for separating local_effective_replication_map from
vnode_effective_replication_map.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-08-06 16:03:53 +03:00
Benny Halevy
cbad497859 locator: abstract_replication_strategy: rename vnode_effective_replication_map_ptr et. al
to static_effective_replication_map_ptr, in preparation
for separating local_effective_replication_map from
vnode_effective_replication_map.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-08-06 16:03:53 +03:00
Benny Halevy
2ab44e871b locator: abstract_replication_strategy: rename global_vnode_effective_replication_map
to global_static_effective_replication_map, in preparation
for separating local_effective_replication_map from
vnode_effective_replication_map.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-08-06 16:03:49 +03:00
Benny Halevy
bd62421c05 keyspace: rename get_vnode_effective_replication_map
to get_static_effective_replication_map, in preparation
for separating local_effective_replication_map from
vnode_effective_replication_map (both are per-keyspace).

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-08-06 13:40:43 +03:00
Benny Halevy
33f34c8c32 dht: range_streamer: use naked e_r_m pointers
Prepare for the following patch that will separate
the local effective replication map from
vnode_effective_replication_map.

The caller is responsible for keeping the
effective_replication_map_ptr alive while
in use by low-level async functions.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-08-06 13:34:23 +03:00
Benny Halevy
d6d434b1c2 storage_service: use naked e_r_m pointers
Prepare for the following patch that will separate
the local effective replication map from
vnode_effective_replication_map.

The caller is responsible for keeping the
effective_replication_map_ptr alive while
in use by low-level async functions.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-08-06 13:34:23 +03:00
Benny Halevy
59375e4751 alternator: ttl: use naked e_r_m pointers
Prepare for the following patch that will separate
the local effective replication map from
vnode_effective_replication_map.

The caller is responsible for keeping the
effective_replication_map_ptr alive while
in use by low-level async functions.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-08-06 13:34:23 +03:00