Commit Graph

50823 Commits

Author SHA1 Message Date
copilot-swe-agent[bot]
ea26e4b3a5 Use tablet_task_info fields for tablet task timing
For tablet_virtual_task, set creation_time to tablet_task_info::request_time
and start_time to tablet_task_info::sched_time, matching the actual semantics
of when the request was created vs when it was scheduled for execution.

Co-authored-by: Deexie <56607372+Deexie@users.noreply.github.com>
2025-12-08 15:05:50 +00:00
copilot-swe-agent[bot]
3b793ef09f Merge topology_request_tracking_mutation_builder calls for keyspace_rf_change
Combined the two separate topology_request_tracking_mutation_builder calls
into one, setting both start_time and done status in the same mutation builder
to reduce redundancy.

Co-authored-by: Deexie <56607372+Deexie@users.noreply.github.com>
2025-12-08 14:53:23 +00:00
copilot-swe-agent[bot]
df0a59ba03 Fix start_time setting to be together with operation start
Refactored start_time setting for global requests to be included in the same
mutation batch that starts the actual operation, matching the pattern used for
node operations. This avoids an extra update_topology_state call and ensures
start_time is set atomically with the operation start.

Also updated nodetool tasks documentation to include creation_time field in
example outputs for status, list, and tree commands.

Co-authored-by: Deexie <56607372+Deexie@users.noreply.github.com>
2025-12-08 14:40:01 +00:00
copilot-swe-agent[bot]
69024a09b2 Add creation_time field to nodetool tasks subcommands
Extended scylla-nodetool.cc to display creation_time in all task-related outputs:
- tasks_print_status: Added creation_time to time field formatting
- tasks_print_trees: Added creation_time column to task tree display
- tasks_print_stats_list: Added creation_time column to task stats list

Co-authored-by: Deexie <56607372+Deexie@users.noreply.github.com>
2025-12-08 14:28:01 +00:00
copilot-swe-agent[bot]
bb8f28a1ab Fix start_time setting: remove from request creation, add to execution start
Remove incorrect start_time setting from request creation sites for:
- cleanup requests
- new_cdc_generation requests
- truncate_table requests
- keyspace_rf_change requests

Add start_time setting in topology_coordinator::handle_global_request
when execution begins, matching the pattern for node operations.

Co-authored-by: tgrabiec <283695+tgrabiec@users.noreply.github.com>
2025-12-08 14:09:22 +00:00
copilot-swe-agent[bot]
32bc7e3a1c Add creation_time field to task status and stats API
Co-authored-by: tgrabiec <283695+tgrabiec@users.noreply.github.com>
2025-12-08 13:26:09 +00:00
copilot-swe-agent[bot]
fb4e37248d Initial plan 2025-12-08 13:03:16 +00:00
Avi Kivity
45c16553eb Revert "Update tools/cqlsh submodule"
This reverts commit ff1b212319. In this
commit, the python driver was updated to 3.29.6. That version has a
serious flaw - it rejects compression=None settings [1] which
cqlsh (legitimately) uses in copyutil.py.

The reason this hasn't caused numerous continuous integration failures
is that the submodule update commit did not update the frozen toolchain,
so the build was effectively running with an older version of the driver.

Fix by reverting the change. This allows us to regenerate the frozen
toolchain when we need to.

Reverted changes:

* tools/cqlsh 2240122...6badc99 (2):
  > Update scylla-driver version to 3.29.6
  > Revert "Migrate workflows to Blacksmith"

[1] 78f554236f

Closes scylladb/scylladb#27473
2025-12-08 08:50:52 +02:00
Nadav Har'El
c984f557ef Merge 'alternator: eliminate cross shard ::free for do_batch_write' from Petr Gusev
This is an optimization follow-up [for this PR](https://github.com/scylladb/scylladb/pull/27396#issuecomment-3611410774): avoiding destruction of foreign objects on the wrong shard. Releasing objects allocated on a different shard causes their ::free calls to be executed remotely, which adds unnecessary load to the SMP subsystem.

Before this PR, a `std::vector<put_or_delete_item>` could be moved to another shard. When the vector was eventually destroyed, its ::free had to be marshalled back to the shard where the memory had originally been allocated. This change avoids that overhead by passing the vector by const reference instead.

backport: not needed, this is an optimization

Closes scylladb/scylladb#27432

* github.com:scylladb/scylladb:
  alternator/executor.cc: avoid cross-shard free
  storage_proxy: cas: take cas_request by raw reference
2025-12-07 22:54:36 +02:00
Andrei Chekun
5e83311305 test.py: switch to ThreadPoolExecutor
With python 3.14, the Process fails due to pickling issue with nodes objects.
This will eliminate this issue, so we can bump up the python version.

Closes scylladb/scylladb#27456
2025-12-07 17:37:25 +02:00
Petr Gusev
f00f7976c1 alternator/executor.cc: avoid cross-shard free
This commit is an optimization: avoiding destruction of
foreign objects on the wrong shard. Releasing objects allocated on a
different shard causes their ::free calls to be executed remotely,
which adds unnecessary load to the SMP subsystem.

Before this patch, a std::vector could be moved
to another shard. When the vector was eventually destroyed,
its ::free had to be marshalled back to the shard where the memory had
originally been allocated. This change avoids that overhead by passing
the vector by const reference instead.

The referenced objects lifetime correctness reasoning:
* the put_or_delete_item refs usages in put_or_delete_item_cas_request
are bound to its lifetime
* cas_request lifetime is bound to storage_proxy::cas future
* we don't release put_or_delete_item-s untill all storage_proxy::cas
calls are done.
2025-12-07 16:14:56 +01:00
Petr Gusev
c428645d16 storage_proxy: cas: take cas_request by raw reference
In the next commit we want to add an optimization that relies on
precise control over the lifetime of cas_request. In particular, we
want the implementation of this interface in Alternator to operate on
raw references that are guaranteed to remain valid only until the
cas() future is resolved. We already depend on the same lifetime
assumptions in cas_request when used by modification_statement.
However, these assumptions are not clearly expressed in the current
interface: cas_request is taken by shared_ptr, and nothing prevents
cas() from storing that pointer inside paxos_response_handler, which
may outlive the cas() future.

This commit fixes that by taking cas_request by raw reference. This
makes it explicit that cas() does not assume ownership of the object.
Callers must ensure that the referenced object remains valid until
the returned future is resolved.
2025-12-07 16:14:56 +01:00
Tomasz Grabiec
082342ecad Attach names to allocating sections for better debuggability
Large reserves in allocating_section can cause stalls. We already log
reserve increase, but we don't know which table it belongs to:

  lsa - LSA allocation failure, increasing reserve in section 0x600009f94590 to 128 segments;

Allocating sections used for updating row cache on memtable flush are
notoriously problematic. Each table has its own row_cache, so its own
allocating_section(s). If we attached table name to those sections, we
could identify which table is causing problems. In some issues we
suspected system.raft, but we can't be sure.

This patch allows naming allocating_sections for the purpose of
identifying them in such log messages. I use abstract_formatter for
this purpose to avoid the cost of formatting strings on the hot path
(e.g. index_reader). And also to avoid duplicating strings which are
already stored elsewhere.

Fixes #25799

Closes scylladb/scylladb#27470
2025-12-07 14:14:25 +02:00
Avi Kivity
47efbdffbc Merge 'cache, mvcc: Preempt cache update when applying range tombstone from memtable' from Tomasz Grabiec
Range tombstones are represented as entry attributes, which applies to
the interval between entries. So if a range tombstone covers many
rows, to apply it we have to update all covered entries.  In some
workloads that could be many entries, even the whole cache.  Before
the patch, we did this update without preemption, which can cause
reactor stalls in such workloads.

This scenario is already covered by mvcc_tests,
e.g. test_apply_to_incomplete_respects_continuity. And I verified that
the new preemption point is hit in the test.

perf-row-cache-update results show no significant stalls anymore (max
2ms scheduling delay, instead of previous 1.5 s):

    Generated 1124195 rows
    Memtable fill took 4179.457520 [ms], {count: 8295, 99%: 0.654949 [ms], max: 32.817176 [ms]}
    Draining...
    took 0.000616 [ms]
    cache: 2506/2948 [MB], memtable: 781/1024 [MB], alloc/comp: 1051/662 [MB] (amp: 0.630)
    update: 2874.157471 [ms], preemption: {count: 26650, 99%: 1.131752 [ms], max: 2.068762 [ms]}, cache: 3027/3973 [MB], alloc/comp: 3951/2424 [MB] (amp: 0.614), pr/me/dr 1124195/0/0

Fixes #23479
Fixes #2578

Closes scylladb/scylladb#27469

* github.com:scylladb/scylladb:
  cache, mvcc: Preempt cache update when applying range tombstone from memtable
  partition_snapshot_row_cursor: Clarify non-obvious semantic difference of range_tombstone()
  perf-row-cache-update: Add scenario with large tombstone covering many rows
2025-12-07 11:54:15 +02:00
Avi Kivity
d811eeb4ca Merge 'Make direct failure detector verb handler more efficient' from Gleb Natapov
We saw that in large clusters direct failure detector may cause large task queues to be accumulated. The series address this issue and also moves the code into the correct scheduling group.

Fixes https://github.com/scylladb/scylladb/issues/27142

Backport to all version where 60f1053087 was backported to since it should improve performance in large clusters.

Closes scylladb/scylladb#27387

* github.com:scylladb/scylladb:
  direct_failure_detector: run direct failure detector in the gossiper scheduling group
  raft: drop invoke_on from the pinger verb handler
  direct_failure_detector: pass timeout to direct_fd_ping verb
2025-12-07 11:40:26 +02:00
Marcin Maliszkiewicz
4784e39665 auth: fix ctor signature of certificate_authenticator
In b9199e8b24 we
added cache argument to constructor of authenticators
but certificate_authenticator was ommited. Class
registrator sadly only fails in runtime for such
cases.

Fixes https://github.com/scylladb/scylladb/issues/27431

Closes scylladb/scylladb#27434
2025-12-07 11:18:42 +02:00
Tomasz Grabiec
d4014b7970 Drop legacy schema support
We switched to using v3 schema tables (in system_schema keyspace) in
2017, in 9eb91bc30b.

So no system should have the old schema any more.

No need to run legacy_schema_migrator on boot.

Closes scylladb/scylladb#27420
2025-12-07 00:09:13 +02:00
Tomasz Grabiec
92b5e4d63d cache, mvcc: Preempt cache update when applying range tombstone from memtable
Range tombstones are represented as entry attributes, which applies to
the interval between entries. So if a range tombstone covers many
rows, to apply it we have to update all covered entries.  In some
workloads that could be many entries, even the whole cache.  Before
the patch, we did this update without preemption, which can cause
reactor stalls in such workloads.

This scenario is already covered by mvcc_tests,
e.g. test_apply_to_incomplete_respects_continuity. And I verified that
the new preemption point is hit in the test.

perf-row-cache-update results show no significant stalls anymore (max
2ms scheduling delay, instead of previous 1.5 s):

Generated 1124195 rows
Memtable fill took 4179.457520 [ms], {count: 8295, 99%: 0.654949 [ms], max: 32.817176 [ms]}
Draining...
took 0.000616 [ms]
cache: 2506/2948 [MB], memtable: 781/1024 [MB], alloc/comp: 1051/662 [MB] (amp: 0.630)
update: 2874.157471 [ms], preemption: {count: 26650, 99%: 1.131752 [ms], max: 2.068762 [ms]}, cache: 3027/3973 [MB], alloc/comp: 3951/2424 [MB] (amp: 0.614), pr/me/dr 1124195/0/0

Fixes #23479
Fixes #2578
2025-12-06 13:45:35 +01:00
Tomasz Grabiec
e546143fd9 partition_snapshot_row_cursor: Clarify non-obvious semantic difference of range_tombstone() 2025-12-06 01:03:10 +01:00
Tomasz Grabiec
721434054b perf-row-cache-update: Add scenario with large tombstone covering many rows
Fills memtable with rows and a tombstone which deletes all rows which
are already in cache.

Similar to raft log workload, but more extreme.

With -c1 -m4G, observed really bad performance:

update: 1711.976196 [ms], preemption: {count: 22603, 99%: 0.943127 [ms], max: 1494.571776 [ms]}, cache: 2148/2906 [MB], alloc/comp: 1334/869 [MB] (amp: 0.651), pr/me/dr 1062186/0/1062187
cache: 2148/2906 [MB], memtable: 738/1024 [MB], alloc/comp: 993/0 [MB] (amp: 0.000)

Which means that max reactor stall during cache update was 1.5 [s]
0.7 GB memtables. 2.1 GB in cache.
2025-12-06 01:03:09 +01:00
Nadav Har'El
350cbd1d66 alternator: fix typo of BatchWriteItem in comments
The DynamoDB API's "BatchWriteItem" operation is spelled like this, in
singular. Some comments incorrectly referred to as BatchWriteItems - in
plural. This patch fixes those mistakes.

There are no functional changes here or changes to user-facing documents -
these mistakes were only in code comments.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#27446
2025-12-05 15:08:58 +02:00
Botond Dénes
866c96f536 Merge 'Add digests for all sstable components in scylla metadata' from Taras Veretilnyk
This pull request adds support for calculation and storing CRC32 digests for all SSTable components.
This change replaces plain file_writer with crc32_digest_file_writer for all SSTable components that should be checksummed. The resulting component digests are stored in the sstable structure
and later persisted to disk as part of the Scylla metadata component during writer::consume_end_of_stream.
All important SSTable components (Index, Partitions, Rows, Summary, Filter, CompressionInfo, and TOC) are covered.
Several test cases where introduced to verify expected behaviour.

Backport is not required, it is a new feature

Fixes #20100

Closes scylladb/scylladb#27287

* github.com:scylladb/scylladb:
  sstable_test: add verification testcases of SSTable components digests persistance
  sstables: store digest of all sstable components in scylla metadata
  sstables: Add TemporaryScylla metadata component type
  sstables: Extract file writer closing logic into separate methods
  sstables: Add components_digests to scylla metadata components
  sstables: Implement CRC32 digest-only writer
2025-12-05 11:36:50 +02:00
Botond Dénes
367633270a Merge 'EAR: handle IPV6 hosts in KMIP and use shared (improved) http parser in AWS/Azure' from Calle Wilund
Fixes #27367
Fixes #27362
Fixes #27366

Makes http URL parser handle IPv6.
Makes KMIP host setup handle IPv6 hosts + use system trust if no truststore set
Moves Azure/KMS code to use shared http URL parser to avoid same regex everywhere.

Closes scylladb/scylladb#27368

* github.com:scylladb/scylladb:
  ear::kms/ear::azure: Use utils::http URL parsing
  ear::kmip_host: Handle ipv6 hosts + use system trust when not specified
  utils::http: Handle ipv6 numeric host part in URL:s
2025-12-05 10:43:07 +02:00
Asias He
e97a504775 repair: Allow min max range to be updated for repair history
It is observed that:

repair - repair[667d4a59-63fb-4ca6-8feb-98da49946d8b]: Failed to update
system.repair_history table of node d27de212-6f32-4649ad76-a9ef1165fdcb:
seastar::rpc::remote_verb_error (repair[667d4a59-63fb-4ca6-8feb-98da49946d8b]: range (minimum
token,maximum token) is not in the format of (start, end])

This is because repair checks the end of the range to be repaired needs
to be inclusive. When small_table_optimization is enabled for regular
repair, a (minimum token,maximum token) will be used.

To fix, we can relax the check of (start, end] for the min max range.

Fixes #27220

Closes scylladb/scylladb#27357
2025-12-05 10:41:25 +02:00
Anna Stuchlik
a5c971d21c doc: update the upgrade policy to cover non-consecutive minor upgrades
Fixes https://github.com/scylladb/scylladb/issues/27308

Closes scylladb/scylladb#27319
2025-12-05 10:31:53 +02:00
Guy Shtub
a0809f0032 Update integration-jaeger.rst
Fixing broken link in Jaeger Docs to ScyllaDB

Closes scylladb/scylladb#26406
2025-12-05 10:23:07 +02:00
Piotr Dulikowski
bb6e41f97a index: allow vector indexes without rf_rack_valid_keyspces
The rf_rack_valid_keyspaces option needs to be turned on in order to
allow creating materialized views in tablet keyspaces with numeric RF
per DC. This is also necessary for secondary indexes because they use
materialized views underneath. However, this option is _not_ necessary
for vector store indexes because those use the external vector store
service for querying the list of keys to fetch from the main table, they
do not create a materialized view. The rf_rack_valid_keyspaces was, by
accident, required for vector indexes, too.

Remove the restriction for vector store indexes as it is completely
unnecessary.

Fixes: SCYLLADB-81

Closes scylladb/scylladb#27447
2025-12-05 09:26:26 +02:00
Marcin Maliszkiewicz
4df6b51ac2 auth: fix cache::prune_all roles iteration
During b9199e8b24
reivew it was suggested to use standard for loop
but when erasing element it causes increment on
invalid iterator, as role could have been erased
before.

This change brings back original code.

Fixes: https://github.com/scylladb/scylladb/issues/27422
Backport: no, offending commit not released yet

Closes scylladb/scylladb#27444
2025-12-04 23:35:54 +01:00
Taras Veretilnyk
0c8730ba05 sstable_test: add verification testcases of SSTable components digests persistance
Adds a generic test helper that writes a random SSTable, reloads it, and
verifies that the persisted CRC32 digest for each component matches the
digest computed from disk. Those covers all checksummed components test cases.
2025-12-04 21:09:01 +01:00
Taras Veretilnyk
bc2e83bc1f sstables: store digest of all sstable components in scylla metadata
This change replaces plain file_writer with crc32_digest_file_writer
for all SSTable components that should be checksummed. The resulting component
digests are stored in the sstable structure and later persisted to disk
as part of the Scylla metadata component during writer::consume_end_of_stream.
2025-12-04 21:00:09 +01:00
Patryk Jędrzejczak
f4c3d5c1b7 Merge 'fix test_coordinator_queue_management flakiness' from Gleb Natapov
After 39cec4ae45 node join may fail with either "request canceled" notification or (very rarely) because it was banned. Depend on timing. The series fixes the test to check for both possibilities.

Fixes #27320

No need to backport since the flakiness is in the mater only.

Closes scylladb/scylladb#27408

* https://github.com/scylladb/scylladb:
  test: fix test_coordinator_queue_management flakiness
  test/pylib: allow expected_error in server_start to contain regular expression
2025-12-04 16:08:02 +01:00
Tomasz Grabiec
e54abde3e8 Merge 'main: delay setup of storage_service REST API' from Andrzej Jackowski
The storage_service REST API uses `group0` internally. Before this
patch, it was possible to send an HTTP request before `group0` was
initialized, which resulted in a segmentation fault. Therefore,
this patch delays the setup of the storage_service REST API.

Additionally, `test_rest_api_on_startup` is added to reproduce the problem.

Fixes: https://github.com/scylladb/scylladb/issues/27130

No backport. It's a crash fix but possible only if a request is sent in a very specific phase of a node start.

Closes scylladb/scylladb#27410

* github.com:scylladb/scylladb:
  test: add test_rest_api_on_startup
  main: delay setup of storage_service REST API
2025-12-04 14:56:49 +01:00
Avi Kivity
9696ee64d0 database: fix overflow when computing data distribution over shards
We store the per-shard chunk count in a uint64_t vector
global_offset, and then convert the counts to offsets with
a prefix sum:

```c++
        // [1, 2, 3, 0] --> [0, 1, 3, 6]
        std::exclusive_scan(global_offset.begin(), global_offset.end(), global_offset.begin(), 0, std::plus());
```

However, std::exclusive_scan takes the accumulator type from the
initial value, 0, which is an int, instead of from the range being
iterated, which is of uint64_t.

As a result, the prefix sum is computed as a 32-bit integer value. If
it exceeds 0x8000'0000, it becomes negative. It is then extended to
64 bits and stored. The result is a huge 64-bit number. Later on
we try to find an sstable with this chunk and fail, crashing on
an assertion.

An example of the failure can be seen here: https://godbolt.org/z/6M8aEbo57

The fix is simple: the initial value is passed as uint64_t instead of int.

Fixes https://github.com/scylladb/scylladb/issues/27417

Closes scylladb/scylladb#27418
2025-12-04 14:10:53 +01:00
Calle Wilund
8dd69f02a8 ear::kms/ear::azure: Use utils::http URL parsing
Fixes #27367

Move to reuse shared code.
2025-12-04 11:38:41 +00:00
Calle Wilund
d000fa3335 ear::kmip_host: Handle ipv6 hosts + use system trust when not specified
Fixes #27362

The KMIP host connector should handle ipv4 connections (named or numeric).
It also should fall back to system trust when truststore is not specified.
2025-12-04 11:38:41 +00:00
Calle Wilund
4e289e8e6a utils::http: Handle ipv6 numeric host part in URL:s
Fixes #27366

A URL with numeric host part formats special in case of ipv6,
to avoid confusion with port part.
The parser should handle this.

I.e.
http://[2001:db8:4006:812::200e]:8080

v2:
* Include scheme agnostic parse + case insensitive scheme matching
2025-12-04 11:38:41 +00:00
Botond Dénes
9d2f7c3f52 Merge 'mv: allow setting concurrency in PRUNE MATERIALIZED VIEW' from Wojciech Mitros
The PRUNE MATERALIZED VIEW statement is performed as follows:
1. Perform a range scan of the view table from the view replicas based
on the ranges specified in the statement.
2. While reading the paged scan above, for each view row perform a read
from all base replicas at the corresponding primary key. If a discrepancy
is detected, delete the row in the view table.

When reading multiple rows, this is very slow because for each view row
we need to performe a single row query on multiple replicas.
In this patch we add an option to speed this up by performing many of the
single base row reads concurrently, at the concurrency specified in the
USING CONCURRENCY clause.

Aside from the unit test, I checked manually on a 3-node cluster with 10M rows, using vnodes. There were actually no ghost rows in the test, but we still had to iterate over all view rows and read the corresponding base rows. And actual ghost rows, if there are any, should be a tiny fraction of all rows. I compared concurrencies 1,2,10,100 and the results were:
* Pruning with concurrency 1 took total 1416 seconds
* Pruning with concurrency 2 took total 731 seconds
* Pruning with concurrency 10 took total 234 seconds
* Pruning with concurrency 100 took total 171 seconds
So after a concurrency of 10 or so we're hitting diminishing returns (at least in this setup). At that point we may be no longer bottlenecked by the reads, but by CPU on the shard that's handling the PRUNE

Fixes https://github.com/scylladb/scylladb/issues/27070

Closes scylladb/scylladb#27097

* github.com:scylladb/scylladb:
  mv: allow setting concurrency in PRUNE MATERIALIZED VIEW
  cql: add CONCURRENCY to the USING clause
2025-12-04 11:47:41 +02:00
Aleksandra Martyniuk
e3e81a9a7a repair: throw if flush failed in get_flush_time
Currently, _flush_time was stored as a std::optional<gc_clock::time_point>
and std::nullopt indicates that the flush was needed but failed. It's confusing
for the caller and does not work as expected since the _flush_time is initialized
with value (not optional).

Change _flush_time type to gc_clock::time_point. If a flush is needed but failed,
get_flush_time() throws an exception.

This was suppose to be a part of https://github.com/scylladb/scylladb/pull/26319
but it was mistakenly overwritten during rebases.

Refs: https://github.com/scylladb/scylladb/issues/24415.

Closes scylladb/scylladb#26794
2025-12-04 11:45:53 +02:00
Gleb Natapov
86dde50c0d direct_failure_detector: run direct failure detector in the gossiper scheduling group
When direct failure detector was introduces the idea was that it will
run on the same connection raft group0 verbs are running, but in
60f1053087 raft verbs were moved to run on the gossiper connection
while DIRECT_FD_PING was left where it was. This patch move it to
gossiper connection as well and fix the pinger code to run in gossiper
scheduling group.
2025-12-04 11:35:43 +02:00
Gleb Natapov
6a6bbbf1a6 raft: drop invoke_on from the pinger verb handler
Currently raft direct pinger verb jumps to shard 0 to check if group0 is
alive before replying. The verb runs relatively often, so it is not very
efficient. The patch distributes group0 liveness information (as it
changes) to all shard instead, so that the handler itself does not need
to jump to shard 0.
2025-12-04 11:35:43 +02:00
Avi Kivity
b82f92b439 main: replace p11-kit hack for trust paths override with gnutls hack
p11-kit has hardcoded paths for the trust paths. Of course, each
Linux distribution hardcodes those paths differently. As a result,
our relocatable gnutls, which uses p11-kit-trust.so to process the
trust paths, needs some overrides to select the right paths.

Currently, we use p11_kit_override_system_files(), a p11-kit API
intended for testing, but which worked well enough for our purpose,
to override the trust module configuration.

Unfortunately, starting (presumably [1]) in gnutls 3.8.11, gnutls
changed how it works with p11-kit and our override is now ignored.

This was likely unintentional, but there appears to be a better way:
instead of letting gnutls auto-load the trust module from a hacked
configuration, we load the modules outselves using
gnutls_pkcs11_init(GNUTLS_PKCS11_FLAG_MANUAL) and
gnutls_pkcs11_add_provider(). These appear to be intended for the purpose.

We communicate the paths to the scylla executable using an environment
variable. This isn't optimal, but is much easier than adding a command
line variable since there are multiple levels of command line parsing due
to the subtool mechanism.

With this, we unlock the possibility to upgrade gnutls to newer versions.

[1] aa5f15a872

Closes scylladb/scylladb#27348
2025-12-04 11:33:51 +02:00
Gleb Natapov
f00e00fde0 test: fix test_coordinator_queue_management flakiness
After 39cec4ae45 node join may fail with either "request canceled"
notification or (very rarely) because it was banned. Depend on timing.
The patch fixes the test to check for both possibilities.
2025-12-04 11:06:20 +02:00
Gleb Natapov
b0727d3f2a test/pylib: allow expected_error in server_start to contain regular expression
Currently expected_error parameter to server_start can only work with
exact matches. Change it to support regular expressions.
2025-12-04 11:06:20 +02:00
Calle Wilund
4169bdb7a6 encryption::gcp_host: Add exponential retry for server errors
Fixes #27242

Similar to AWS, google services may at times simply return a 503,
more or less meaning "busy, please retry". We rely for most cases
higher up layers to handle said retry, but we cannot fully do so,
because both we reach this code sometimes through paths that do
no such thing, and also because it would be slightly inefficient,
since we'd like to for example control the back-off for auth etc.

This simply changes the existing retry loop in gcp_host to
be a little more forgiving, special case 503 errors and extend
the retry to the auth part, as well as re-use the
exponential_backoff_retry primitive.

v2:
* Avoid backoff if refreshing credentials. Should not add latency due to this.
* Only allow re-auth once per (non-service-failure-backoff) try.
* Add abort source to both request and retry
v3:
* Include timeout and other server errors in retry-backoff
v4:
* Reorder error code handling correctly

Closes scylladb/scylladb#27267
2025-12-04 10:13:37 +02:00
Anna Stuchlik
c5580399a8 replace the Driver pages with a link to the new Drivers pages
This commit removes the now redundant driver pages from
the Scylla DB documentation. Instead, the link to the pages
where we moved the diver information is added.
Also, the links are updated across the ScyllaDB manual.

Redirections are added for all the removed pages.

Fixes https://github.com/scylladb/scylladb/issues/26871

Closes scylladb/scylladb#27277
2025-12-04 10:07:27 +02:00
Tomasz Grabiec
1d42770936 Merge 'topology_coordinator: Add barrier to cleanup_target' from Łukasz Paszkowski
Consider the following scenario:
1. A table has RF=3 and writes use CL=QUORUM
2. One node is down
3. There is a pending tablet migration from the unavailable node
   that is reverted

During the revert, there can be a time window where the pending replica
being cleaned up still accepts writes. This leads to write failures,
as only two nodes (out of four) are able to acknowledge writes.

This patch fixes the issue by adding a barrier to the cleanup_target
tablet transition state, ensuring that the coordinator switches back to
the previous replica set before cleanup is triggered.

Fixes https://github.com/scylladb/scylladb/issues/26512

It's a pre existing issue. Backport is required to all recent 2025.x versions.

Closes scylladb/scylladb#27413

* github.com:scylladb/scylladb:
  topology_coordinator: Fix the indentation for the cleanup_target case
  topology_coordinator: Add barrier to cleanup_target
  test_node_failure_during_tablet_migration: Increase RF from 2 to 3
2025-12-03 23:57:45 +01:00
Taras Veretilnyk
d287b054b9 sstables: Add TemporaryScylla metadata component type
Add TemporaryScylla component type to make atomic updates of SSTable Scylla metadata using temporary files
and atomic rename operations possible. This will be needed in further commit to rewrite metadata together with
the statistics component.
2025-12-03 23:40:10 +01:00
Szymon Wasik
4f803aad22 Improve documentation of vector search configuration parameters.
This patch adds separate group for vector search parameters in the
documentation and fixes small typos and formatting.

Fixes: SCYLLADB-77.

Closes scylladb/scylladb#27385
2025-12-03 21:02:59 +02:00
Karol Nowacki
a54bf50290 vector_search: Fix requests hanging on unreachable nodes
When a vector store node becomes unreachable, a client request sent
before the keep-alive timer fires would hang until the CQL query
timeout was reached.

This occurred because the HTTP request writes to the TCP buffer and then
waits for a response. While data is in the buffer, TCP retransmissions
prevent the keep-alive timer from detecting the dead connection.

This patch resolves the issue by setting the `TCP_USER_TIMEOUT` socket
option, which applies an effective timeout to TCP retransmissions,
allowing the connection to fail faster.

Closes scylladb/scylladb#27388
2025-12-03 21:01:43 +02:00
Nadav Har'El
06dd3b2e64 install-dependencies.sh: add zlib
Scylla uses zlib, through the header <zlib.h>, in sstable compression.
We also want to use it in Alternator for gzip-compressed requests.

We never actually required zlib explicltly in install-dependencies.sh,
we only get it through transitive dependencies. But it's better to
require it explicitly so this is what we do in this patch.

In Fedora, we use the newer, more efficient, zlib-ng which is API-
compatible with the classic zlib. Unfortunately, the Debian zlib-ng
package is *not* drop-in compatible with zlib (you need to include
a different header file <zlib-ng.h>) so we use the classic zlib.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#27238
2025-12-03 19:30:36 +02:00