Commit Graph

11801 Commits

Author SHA1 Message Date
Petr Gusev
33e9ea4a0f test.py: add universalasync_typed_wrap
The universalasync.wrap function doesn't preserve the
type information, which confuses the VS Code Pylance
plugin and makes code navigation hard.

In this commit we fix the problem by adding a typed
wrapped around universalasync.wrap.

Fixes: scylladb/scylladb#26639
2025-10-22 11:32:37 +02:00
Petr Gusev
e1667afa50 topology_coordinator: fix log message 2025-10-22 11:32:37 +02:00
Nadav Har'El
7c9f5ef59e Merge 'alternator/executor: instantly mark view as built when creating it with base table' from Michał Jadwiszczak
`CreateTable` request creates GSI/LSI together with the base table,
the base table is empty and we don't need to actually build the view.

In tablet-based keyspaces we can just don't create view building tasks
and mark the view build status as SUCCESS on all nodes. Then, the view
building worker on each node will mark the view as built in
`system.built_views` (`view_building_worker::update_built_views()`).

Vnode-based keyspaces will use the "old" logic of view builder, which
will process the view and mark it as built.

Fixes scylladb/scylladb#26615

This fix should be backported to 2025.4.

Closes scylladb/scylladb#26657

* github.com:scylladb/scylladb:
  test/alternator/test_tablets: add test for GSI backfill with tablets
  test/alternator/test_tablets: add reproducer for GSI with tablets
  alternator/executor: instantly mark view as built when creating it with base table
2025-10-22 10:44:28 +03:00
Calle Wilund
31cc1160b4 test::pylib::dockerized_service: Add helper for running docker/podman
While there is a docker interface for python, need to deal with
the docker-in-docker issues etc. This uses pure subprocess and
stream parse. Meant to provide enough flexibility for all our
docker mock server needs.
2025-10-21 23:26:50 +00:00
Avi Kivity
ab488fbb3f Merge 'Switch to seastar API level 9 (no more packet-s in output_stream/data_sink API)' from Pavel Emelyanov
Other than patching Scylla sinks to implement new data_sink_impl::put(std::span<temporary_buffer>) overload, the PR changes transport write_response() method to stop using output_stream::write(scattered_message) because it's also gone.

Using newer seastar API, no need to backport

Closes scylladb/scylladb#26592

* github.com:scylladb/scylladb:
  code: Fix indentation after previous patch
  code: Switch to seastar API level 9
  transport: Open-code invoke_with_counting into counting_data_sink::put
  transport: Don't use scattered_message
  utils: Implement memory_data_sink::put(net::packet)
2025-10-22 01:51:43 +03:00
Michał Jadwiszczak
34503f43a1 test/alternator/test_tablets: add test for GSI backfill with tablets
The test should pass without the fix for scylladb/scylladb#26615,
because the `executor::updata_table()` uses
`service::prepare_new_view_announcement()`, which creates view building
tasks for the view.

But it's better to add this test.
2025-10-22 00:34:49 +02:00
Michał Jadwiszczak
bdab455cbb test/alternator/test_tablets: add reproducer for GSI with tablets 2025-10-22 00:34:10 +02:00
Andrei Chekun
24d17c3ce5 test.py: rewrite the wait_for_first_completed
Rewrite wait_for first_completed to return only first completed task guarantee
of awaiting(disappearing) all cancelled and finished tasks
Use wait_for_first_completed to avoid false pass tests in the future and issues
like #26148
Use gather_safely to await tasks and removing warning that coroutine was
not awaited

Closes scylladb/scylladb#26435
2025-10-22 01:13:43 +03:00
Avi Kivity
029513bee9 Merge 'storage_proxy: wait for write handlers destruction' from Petr Gusev
`shared_ptr<abstract_write_response_handler>` instances are captured in the `lmutate` and `rmutate` lambdas of `send_to_live_endpoints()`. As a result, an `abstract_write_response_handler` object may outlive its removal from the `storage_proxy::_response_handlers` map -> `cancel_all_write_response_handlers()` doesn't actually wait for requests completion -> `sp::drain_on_shutdown()` doesn't guarantee all requests are drained -> `sp::stop_remote()` completes too early and `paxos_store` is destroyed while LWT local writes might still be in progress. In this PR we introduce a `write_handler_destroy_promise` to wait for such pending instances in `cancel_write_handlers()` and `cancel_all_write_response_handlers()` to prevent the `use-after-free`.

A better long-term solution might be to replace `shared_ptr` with `unique_ptr` for `abstract_write_response_handler` and use a separate gate to track the `lmutate/rmutate` lambdas. We do not actually need to wait for these lambdas to finish before sending a timeout or error response to the client, as we currently do in `~abstract_write_response_handler`.

Fixes scylladb/scylladb#26355

backport: need to be backported to 2025.4 since #26355 is reproduced on LWT over tablets

Closes scylladb/scylladb#26408

* github.com:scylladb/scylladb:
  test_tablets_lwt: add test_lwt_shutdown
  storage_proxy: wait for write handler destruction
  storage_proxy: coroutinize cancel_write_handlers
  storage_proxy: cancel_write_handlers: don't hold a strong pointer to handler
2025-10-22 00:02:08 +03:00
Michał Hudobski
5c957e83cb vector_search: remove dependence on cql3
This patch removes the dependence of vector search module
on the cql3 module by moving the contents of cql3/type_json.hh
to types/json_utils.hh and removing the usage of cql3 primary_key
object in vector_store_client. We also make the needed adjustments
to files that were previously using the afformentioned type_json.hh
file.

This fixes the circular dependency cql3 <-> vector_search.

Closes scylladb/scylladb#26482
2025-10-21 17:41:55 +03:00
Michael Litvak
35711a4400 test: cdc: test cdc compatible schema
Add a simple test verifying our changes for the compatible CDC schema.
The test checks we can write to a table with CDC enabled after ALTER and
after node restart.
2025-10-21 14:14:34 +02:00
Michael Litvak
448e14a3b7 cdc: use compatiable cdc schema
in the CDC log transformer, when augmenting a base mutation, use the CDC
log schema that is compatible with the base schema, if set.

Now that the base schema has a pointer to its CDC schema, we can use it
instead of getting the current schema from the db, which may not be
compatible with the base schema.

The compatible CDC schema may not be set if the cluster is not using
raft mode for schema. In this case, we maintain the previous behavior.
2025-10-21 14:14:33 +02:00
Michael Litvak
ac96e40f13 schema: add pointer to CDC schema
Add to the schema object a member that points to the CDC schema object
that is compatible with this schema, if any.

The compatible CDC schema is created and altered with its base schema in
the same group0 operation.

When generating CDC log mutations for some base mutation we want them to
be created using a compatible schema thas has a CDC column corresponding
to each base column. This change will allow us to find the right CDC
schema given a base mutation.

We also update the relevant structures in the schema registry that are
related to learning about schemas and transporting schemas across
shards or nodes.

When transporting a schema as frozen_schema, we need to transport the
frozen cdc schema as well, and set it again when unfreezing and
reconstructing the schema.

When adding a schema to the registry, we need to ensure its CDC schema
is added to the registry as well.

Currently we always set the CDC schema to nullptr and maintain the
previous behavior. We will change it in a later commit. Until then, we
mark all places where CDC schema is passed clearly so we don't forget
it.
2025-10-21 14:13:43 +02:00
Michael Litvak
085abef05d schema_registry: use extended_frozen_schema in schema load
Change the schema loader type in the schema_registry to return a
extended_frozen_schema instead of view_schema_and_base_info, and
remove view_schema_and_base_info which is not used anymore.

The casting between them is trivial.
2025-10-21 14:13:43 +02:00
Emil Maskovsky
cf93820c0a test/cluster: fix missing await in test_group0_tombstone_gc
The recursive call to alter_system_schema() was missing the await
keyword, which meant the coroutine was never actually executed and
the test wasn't doing what it was supposed to do.

Not backporting: Test fix only.

Closes scylladb/scylladb#26623
2025-10-21 11:22:39 +02:00
Calle Wilund
91db8583f8 test::pylib::kmip_wrapper: Modify to be usable by pytest fixtures
Add `serve` impl that does not mess with signals, and shutdown
that does not mess with threads. Also speed up standalone shutdown
to make boost tests less slow.
2025-10-21 09:01:55 +00:00
Calle Wilund
772bd856e2 test::boost::kmip_wrapper: Move python script for PyKMIP to pylib
Prepare for re-use in python tests as well as boost ones.
2025-10-21 09:01:54 +00:00
Botond Dénes
fbceb8c16b Merge 's3_client: handle failures which require http::request updating' from Ernest Zaslavsky
Apply two main changes to the s3_client error handling
1. Add a loop to s3_client's `make_request` for the case whe the retry strategy will not help since the request itself have to be updated. For example, authentication token expiration or timestamp on the request header
2. Refine the way we handle exceptions in the `chunked_download_source` background fiber, now we carry the original `exception_ptr` and also we wrap EVERY exception in `filler_exception` to prevent retry strategy trying to retry the request altogether

Fixes: https://github.com/scylladb/scylladb/issues/26483

Should be ported back to 2025.3 and 2025.4 to prevent deadlocks and failures in these versions

Closes scylladb/scylladb#26527

* github.com:scylladb/scylladb:
  s3_client: tune logging level
  s3_client: add logging
  s3_client: improve exception handling for chunked downloads
  s3_client: fix indentation
  s3_client: add max for client level retries
  s3_client: remove `s3_retry_strategy`
  s3_client: support high-level request retries
  s3_client: just reformat `make_request`
  s3_client: unify `make_request` implementation
2025-10-21 10:40:38 +03:00
Botond Dénes
c543059f86 Merge 'Synchronize tablet split and load-and-stream' from Raphael Raph Carvalho
Load-and-stream is broken when running concurrently to the finalization step of tablet split.

Consider this:
1) split starts
2) split finalization executes barrier and succeed
3) load-and-stream runs now, starts writing sstable (pre-split)
4) split finalization publishes changes to tablet metadata
5) load-and-stream finishes writing sstable
6) sstable cannot be loaded since it spans two tablets

two possible fixes (maybe both):

1) load-and-stream awaits for topology to quiesce
2) perform split compaction on sstable that spans both sibling tablets

This patch implements # 1. By awaiting for topology to quiesce,
we guarantee that load-and-stream only starts when there's no
chance coordinator is handling some topology operation like
split finalization.

Fixes https://github.com/scylladb/scylladb/issues/26455.

Closes scylladb/scylladb#26456

* github.com:scylladb/scylladb:
  test: Add reproducer for l-a-s and split synchronization issue
  sstables_loader: Synchronize tablet split and load-and-stream
2025-10-21 09:43:38 +03:00
Tomasz Grabiec
ba692d1805 schema_tables: Keep "replication" column backwards-compatible by expanding rack lists to numeric RF
In 380f243986 we added support for rack
lists in replication options. Drivers which are not prepared to parse
that (as of now, all of them), will not create metadata object for
that keyspace. This breaks, for example, the "copy to/from" cqlsh
command. Potentially other things too.

To fix that, keep the "replication" column in the old format, and
store numeric RF there, which corresponds to the number of
replicas. Accurate options in the new format are put in
"replication_v2".

We set replication_v2 in the schema only when it differs from the old
"replication" so that the new column is not set during upgrade,
otherwise downgrade would fail. Partition tombstone is added to ensure
that pre-alter replication_v2 value is deleted on alters which change
replication to a value which is the same as the post-alter
"replication" value.

Fixes #26415

Closes scylladb/scylladb#26429
2025-10-21 09:11:25 +03:00
Tomasz Grabiec
e4e79be295 Merge 'tablet_allocator: allow merges in base tables if rf-rack-valid=true' from Piotr Dulikowski
Tablet merge of base tables is only safe if there is at most one replica in each rack. For more details on why it is the case please see scylladb/scylladb#17265. If the rf-rack-valid-keyspaces is turned on, this condition is satisfied, so allow it in that case.

Fixes: scylladb/scylladb#26273

Marked for backport to 2025.4 as MVs are getting un-experimentaled there.

Closes scylladb/scylladb#26278

* github.com:scylladb/scylladb:
  test: mv: add a test for tablet merge
  tablet_allocator, tests: remove allow_tablet_merge_with_views injection
  tablet_allocator: allow merges in base tables if rf-rack-valid=true
2025-10-21 00:18:30 +02:00
Raphael S. Carvalho
4654cdc6fd test: Add reproducer for l-a-s and split synchronization issue
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2025-10-20 19:17:25 -03:00
Petr Gusev
8925f31596 test_tablets_lwt: add test_lwt_shutdown 2025-10-20 20:16:09 +02:00
Piotr Wieczorek
a3ec6c7d1d alternator/streams: Support userIdentity field for TTL deletions
UserIdentity is a map of two fields in GetRecords responses, which
always has the same value. It may be missing, or contain a constant
object with value `{"type": "Service", "principalId":
"dynamodb.amazonaws.com"}`. Currently, the latter is set only for
`REMOVE`s triggered by TTL.

This commit introduces two new CDC operation types: `service_row_delete`
and `service_partition_delete`, emitted in place of `row_delete` and
`partition_delete`. Alternator Streams treats them as regular `REMOVE`s,
but in addition adds the `userIdentity` field to the record.

This change may break existing Scylla libraries for reading raw CDC
tables, but we doubt that anybody has this use case.

Refs https://github.com/scylladb/scylladb/pull/26149
Refs https://github.com/scylladb/scylladb/pull/26121
Fixes https://github.com/scylladb/scylladb/issues/11523

Closes scylladb/scylladb#26460
2025-10-20 17:15:59 +02:00
Nadav Har'El
eb06ace944 Merge 'auth: implement vector store authorization' from Michał Hudobski
This patch implements the changes required by the Vector Store authorization, as described in https://scylladb.atlassian.net/wiki/spaces/RND/pages/107085899/Vector+Store+Authentication+And+Authorization+To+ScyllaDB, that is:

- adding a new permission VECTOR_SEARCH_INDEXING, grantable only on ALL KEYSPACES
- allowing users with that permission to perform SELECT queries, but only on tables with a vector index
- increasing the number of scheduling groups by one to allow users to create a service level for a vector store user
- adjusting the tests and documentation

These changes are needed, as the vector indexes are managed by the external service, Vector Store, which needs to read the tables to create the indexes in its memory. We would like to limit the privileges of that service to a minimum to maintain the principle of least privilege, therefore a new permission, one that allows the SELECTs conditional on the existence of a vector_index on the table.

Fixes: VECTOR-201

Backport reasoning:
Backport to 2025.4 required as this can make upgrading clusters more difficult if we add it in 2026.1. As for now Scylla Cloud requires version 2025.4 to enable vector search and permission is set by orchestrator so there is no chance that someone will try to add this permission during upgrade. In 2026.1 it will be more difficult.

Closes scylladb/scylladb#25976

* github.com:scylladb/scylladb:
  docs: adjust docs for VS auth changes
  test: add tests for VECTOR_SEARCH_INDEXING permission
  cql: allow VECTOR_SEARCH_INDEXING users to select
  auth: add possibilty to check for any permission in set
  auth: add a new permission VECTOR_SEARCH_INDEXING
2025-10-20 17:32:00 +03:00
Ernest Zaslavsky
1d34657b14 s3_client: improve exception handling for chunked downloads
Refactor the wrapping exception used in `chunked_download_source` to
prevent the retry strategy from reattempting failed requests. The new
implementation preserves the original `exception_ptr`, making the root
cause clearer and easier to diagnose.
2025-10-20 17:12:59 +03:00
Dawid Mędrek
7e201eea1a index: Set tombstone_gc when creating secondary index
Before this commit, when the underlying materialized view was created,
it didn't have the property `tombstone_gc` set to any value. That
was a bug and we fix it now.

Two reproducer tests is added for validation. They reproduce the problem
and don't pass before this commit.

Fixes scylladb/scylladb#26542
2025-10-20 14:04:45 +02:00
Łukasz Paszkowski
7ec369b900 database: Log message after critical_disk_utilization mode is set
This is a follow-up of the previous fix: https://github.com/scylladb/scylladb/pull/26030

The test test_user_writes_rejection starts a 3-node cluster and
creates a large file on one of the nodes, to trigger the out-of-space
prevention mechanism, which should reject writes on that node.

It waits for the log message 'Setting critical disk utilization mode: true'
and then executes a write expecting the node to reject it.

Currently, the message is logged before the `_critical_disk_utilization`
variable is actually updated. This causes the test to fail sporadically
if it runs quickly enough.

The fix splits the logging into two steps:
1. "Asked to set critical disk utilization mode" - logged before any action
2) "Set critical disk utilization mode" - logged after `_critical_disk_utilization` has been updated

The tests are updated to wait for the second message.

Fixes https://github.com/scylladb/scylladb/issues/26004

Closes scylladb/scylladb#26392
2025-10-20 13:24:10 +03:00
Pavel Emelyanov
44ed3bbb7c Merge 'RFC: Initial GCP storage backend for scylla (sstables + backup)' from Calle Wilund
Integrates GCP object storage as a working storage backend for scylla sstables as well as backup storage.

Adds an abstraction layer (atm very heavily designed around the s3 client interface and usage) to allow the "storage" etc layers of sstable management to pick transparently between "s3" and "gs" providers.

This modifies the scylla config such that endpoints can optionally (through a "type" param) ref a GS backend.
Similarly with storage_options.

Also adds some IO wrapping primitives to make it more feasible to place some logic at a mid level of the implementation stack (such as making networked storage files, ranged reading etc).

Test s3 fixture is replaced (where appropriate) with an `object_storage` fixture that multiplexes the test across both backends.
Unit tests are duplicated and for the GS versions use a boost test fixture for GCS, default local fake.

Fixes #25359
Fixes #26453

Closes scylladb/scylladb#26186

* github.com:scylladb/scylladb:
  docs::dev::object_storage: Add some initial info on GS storage
  docs/dev: Add mention of (nested) docker usage in testing.md
  sstables::object_storage_client: Forward memory limit semaphore to GS instance
  utils::gcp::object_storage: Add optional memory limits to up/download
  sstables::object_storage_client: Add multi-upload support for GS
  utils::gcp::storage: Add merge objects operation
  test_backup/test_basic: Make tests multiplex both s3 and gs backends
  test::cluster::conftest: Add support for multiple object storage backends
  boost::gcs_storage_test: reindent
  boost::gcs_storage_test: Convert to use fixture
  tests::boost: Add GS object storage cases to mirror S3 ones
  tests::lib::gcs_fixture: Add a reusable test fixture for real/fake GS/GCS
  tests::lib::test_utils: Add overloads/helpers for reading and (temp) writing env
  sstables::object_storage_client: Add google storage implementation
  test_services: Allow testing with GS object storage parameters
  utils::gcp::gcp_credentials: Add option to create uninitialized credentials
  utils::gcp::object_storage: Make create_download_source return seekable_data_source
  utils::gcp::object_storage: Add defensive copies of string_view params
  utils::gcp::object_storage: Add missing retry backoff increate
  utils::gcp::object_storage: Add timestamp to object listing
  utils::gcp::object_storage: Add paging support to list_objects
  object_storage_client: Add object_name wrapper type
  utils::gcp::object_storage: Add optional abort_source
  utils::rest::client: Add abort_source support
  sstables: Use object_storage_client for remote storage
  sstables::object_storage_client: Add abstraction layer for OS cliens (s3 initial)
  s3::upload_progress: Promote to general util type
  storage_options: Abstract s3 to "object_storage" and add gs as option
  sstables::file_io_extension: Change "creator" callback to just data_source
  utils::io-wrappers: Add ranged data_source
  utils::io-wrappers: Add file wrapper type for seekable_source
  utils::seekable_source: Add a seekable IO source type
  object_storage_endpoint_param: Add gs storage as option
  config: break out object_storage_endpoint_param preparing for multi storage
2025-10-20 13:14:53 +03:00
Patryk Jędrzejczak
c57f097630 test: test group0 tombstone GC in the Raft-based recovery procedure
We add a regression test for the bug fixed in the previous commits.
2025-10-20 12:05:11 +02:00
Botond Dénes
1ab697693f Merge 'compaction/twcs: fix use after free issues' from Lakshmi Narayanan Sreethar
The `compaction_strategy_state` class holds strategy specific state via
a `std::variant` containing different state types. When a compaction
strategy performs compaction, it retrieves a reference to its state from
the `compaction_strategy_state` object. If the table's compaction
strategy is ALTERed while a compaction is in progress, the
`compaction_strategy_state` object gets replaced, destroying the old
state. This leaves the ongoing compaction holding a dangling reference,
resulting in a use after free.

Fix this by using `seastar::shared_ptr` for the state variant
alternatives(`leveled_compaction_strategy_state_ptr` and
`time_window_compaction_strategy_state_ptr`). The compaction strategies
now hold a copy of the shared_ptr, ensuring the state remains valid for
the duration of the compaction even if the strategy is altered.

The `compaction_strategy_state` itself is still passed by reference and
only the variant alternatives use shared_ptrs. This allows ongoing
compactions to retain ownership of the state independently of the
wrapper's lifetime.

The method `maybe_wait_for_sstable_count_reduction()`, when retrieving
the list of sstables for a possible compaction, holds a reference to the
compaction strategy. If the strategy is updated during execution, it can
cause a use after free issue. To prevent this, hold a copy of the
compaction strategy so it isn’t yanked away during the method’s
execution.

Fixes #25913

Issue probably started after 9d3755f276, so backport to 2025.4

Closes scylladb/scylladb#26593

* github.com:scylladb/scylladb:
  compaction: fix use after free when strategy is altered during compaction
  compaction/twcs: pass compaction_strategy_state to internal methods
  compaction_manager: hold a copy to compaction strategy in maybe_wait_for_sstable_count_reduction
2025-10-20 10:45:47 +03:00
Michael Litvak
b808d84d63 storage_service: improve colocated repair error to show table names
When requesting repair for tablets of a colocated table, the request
fails with an error. Improve the error message to show the table names
instead of table IDs, because the table names are more useful for users.

Fixes scylladb/scylladb#26567

Closes scylladb/scylladb#26568
2025-10-20 10:03:31 +03:00
Piotr Dulikowski
70b0cfb13e Merge 'test: cluster: Replica exceptions tests' from Dario Mirovic
This patch series introduces several tests that check number of exceptions that happens during various replica operations. The goal is to have a set of tests that can catch situations where number of exceptions per operation increases. It makes exception throw regressions easier to catch.

The tests cover apply counter update and apply functionalities in the database layer.

There are more paths that can be checked, like various semaphore wait timeouts located deeper in the code. This set of tests does not cover all code paths.

Fixes #18164

This is an improvement. No backport needed.

Closes scylladb/scylladb#25992

* github.com:scylladb/scylladb:
  test: cluster: test replica write timeout
  database: parameterize apply_counter_update_delay_5s injector value
  test: cluster: test replica exceptions - test rate limit exceptions
2025-10-20 10:03:31 +03:00
Piotr Dulikowski
a716fab125 Merge 'alternator/metrics: Log operation sizes to histograms' from Piotr Wieczorek
This PR adds operation per-table histograms to Alternator with item sizes involved in an operation, for each of the operations: `GetItem`, `PutItem`, `DeleteItem`, `UpdateItem`, `BatchGetItem`, `BatchWriteItem`. If read-before-write wasn't performed (i.e. it was not needed by the operation and the flag `alternator_force_read_before_write` was disabled), then we log sizes of the items that are in the request. Also, `UpdateItem` logs the maximum of the update size and the existing item size. We'll change it in a next PR.

Fixes: #25143

Closes scylladb/scylladb#25529

* github.com:scylladb/scylladb:
  alternator: Add UpdateItem and BatchWriteItem response size metrics
  alternator: Add PutItem and DeleteItem response size metrics
  alternator: Add BatchGetItem response size metrics
  alternator: Add GetItem response size metrics
  alternator/test: Add more context to test_metrics.py asserts
2025-10-20 10:03:31 +03:00
Lakshmi Narayanan Sreethar
18c071c94b compaction: fix use after free when strategy is altered during compaction
The `compaction_strategy_state` class holds strategy specific state via
a `std::variant` containing different state types. When a compaction
strategy performs compaction, it retrieves a reference to its state from
the `compaction_strategy_state` object. If the table's compaction
strategy is ALTERed while a compaction is in progress, the
`compaction_strategy_state` object gets replaced, destroying the old
state. This leaves the ongoing compaction holding a dangling reference,
resulting in a use after free.

Fix this by using `seastar::shared_ptr` for the state variant
alternatives(`leveled_compaction_strategy_state_ptr` and
`time_window_compaction_strategy_state_ptr`). The compaction strategies
now hold a copy of the shared_ptr, ensuring the state remains valid for
the duration of the compaction even if the strategy is altered.

The `compaction_strategy_state` itself is still passed by reference and
only the variant alternatives use shared_ptrs. This allows ongoing
compactions to retain ownership of the state independently of the
wrapper's lifetime.

Fixes #25913

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2025-10-17 22:57:05 +05:30
Dario Mirovic
1d93f342f9 test: cluster: test replica write timeout
This patch introduces test `test_replica_database_apply_timeout`.
It tests timeout on database write. The test uses error injection
that returns timeout error if the injection `database_apply_force_timeout`
is enabled.

Refs #18164
2025-10-17 11:52:11 +02:00
Dario Mirovic
ff88fe2d76 database: parameterize apply_counter_update_delay_5s injector value
Parameterize `apply_counter_update_delay_5s` injector value. Instead of
sleeping 5s when the injection is active, read parameter value that
specifies sleep duration. To reflect these changes, it is renamed to
`apply_counter_update_delay_ms` and the sleep duration is specified in
milliseconds.

Refs #18164
2025-10-17 11:52:10 +02:00
Dario Mirovic
7dc0ff2152 test: cluster: test replica exceptions - test rate limit exceptions
This patch introduces two tests for `replica::rate_limit_exception`.
One test is for write/apply limit, the other one for read/query limit.

The tests check the number of rate limit errors reported and the
number of cpp exceptions reported. If somebody adds an exception
throw on the rate limit paths, this test will catch it and fail.

Refs #18164
2025-10-17 10:54:43 +02:00
Botond Dénes
01bcafbe24 Merge 'test: make various improvements in the recovery procedure tests' from Patryk Jędrzejczak
This PR contains various improvements in the recovery procedure
tests, mostly `test_raft_recovery_user_data`:
- decreasing the running time,
- some simplifications,
- making sure group 0 majority is lost when expected.

These are not critical test changes, so no need to backport.

Closes scylladb/scylladb#26442

* github.com:scylladb/scylladb:
  test: assert that majority is lost in some tests of the recovery procedure
  test: rest_client: add timeout support for read_barrier
  test: test_raft_recovery_user_data: lose majority when killing one dc
  test: test_raft_recovery_user_data: shutdown driver sessions
  test: test_raft_recovery_user_data: use a separate driver connection for the write workload
  test: test_raft_recovery_user_data: send ALTER KEYSPACE to any node
  test: test_raft_recovery_user_data: bring failure_detector_timeout_in_ms back to 20 s
  test: test_raft_recovery_user_data: speed up replace operations
  test: stop/start servers concurrently in the recovery procedure tests
2025-10-17 10:54:05 +03:00
Pawel Pery
10208c83ca vector_search: fix flaky dns_refresh_aborted test
The test process like that:
- run long dns refresh process
- request for the resolve hostname with short abort_source timer - result
  should be empty list, because of aborted request

The test sometimes finishes long dns refresh before abort_source fired and the
result list is not empty.

There are two issues. First, as.reset() changes the abort_source timeout. The
patch adds a get() method to the abort_source_timeout class, so there is no
change in the abort_source timeout. Second, a sleep could be not reliable. The
patch changes the long sleep inside a dns refresh lambda into
condition_variable handling, to properly signal the end of the dns refresh
process.

Fixes: #26561
Fixes: VECTOR-268

It needs to be backported to 2025.4

Closes scylladb/scylladb#26566
2025-10-17 09:33:17 +02:00
Pavel Emelyanov
068d788084 transport: Don't use scattered_message
The API to put scattered_message into output_stream() is gone in seastar
API level 9, transport is the only place in Scylla that still uses it.

The change is to put the response as a sequence of temporary_buffer-s.
This preserves the zero-copy-ness of the reply, but needs few things to
care about.

First, the response header frame needs to be put as zero-copy buffer
too. Despite output_stream() supports semi-mixed mode, where z.c.
buffers can follow the buffered writes, it won't apply here. The socket
is flushed() in batched mode, so even if the first reply populates the
stream with data and flushes it, the next response may happen to start
putting the header frame before delayed flush took place.

Second, because socket is flushed in batch-flush poller, the temporary
buffers that are put into it must hold the foreigh_ptr with the response
object. With scattered message this was implemented with the help of a
delter that was attached to the message, now the deleter is shared
between all buffers.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-10-17 10:17:08 +03:00
Tomasz Grabiec
c4a87453a2 Merge 'Add experimental feature flag for strongly consistent tables and extend kesypace creation syntax to allow specifying consistency mode.' from Gleb Natapov
The series adds an experimental flag for strongly consistent tables  and extends "CREATE KEYSPACE" ddl with `consistency` option that allows specifying the consistency mode for the keyspace.

Closes scylladb/scylladb#26116

* github.com:scylladb/scylladb:
  schema: Allow configuring consistency setting for a keyspace
  db: experimental consistent-tablets option
2025-10-16 21:48:06 +02:00
Piotr Wieczorek
caa522a29d alternator: Add UpdateItem and BatchWriteItem response size metrics
This commit bundle introduces metrics on item sizes for Alternator operations.

The new metrics are:
- `operation_size_kib op=UpdateItem`: Tracks the size of an `UpdateItem`
  operation. This is calculated as the sum of the existing item's size
  plus the estimated size of the updated fields.
- `operation_size_kib op=BatchWriteItem`: Tracks the total size of items
  within a `BatchWriteItem` request, aggregated on a per-table basis. If
  an item already exists, the logged size is the maximum of the old and
  the new item size.

NOTE: Both metrics rely on read-before-write, so if the
`alternator_force_read_before_write` option is disabled, these metrics
may be incomplete and report inaccurate sizes.
2025-10-16 19:17:27 +02:00
Piotr Wieczorek
5ca42b3baf alternator: Add PutItem and DeleteItem response size metrics
This commit bundle introduces metrics on item sizes for Alternator
operations. Specifically, this commit adds `operation_size_kb`
histograms for sizes of items created or replaced by the `PutItem`
operation, and sizes of items deleted by `DeleteItem` requests. The
latter needs a read-before-write, so the metrics may be incomplete if
`alternator_force_read_before_write` is disabled.
2025-10-16 19:17:26 +02:00
Piotr Wieczorek
5c72fd9ea3 alternator: Add BatchGetItem response size metrics
This commit bundle introduces metrics on item sizes for Alternator
operations. Specifically, this commit adds a `operation_size_kb`
per-table histogram, which contains item sizes in BatchGetItem requests.

A size of a BatchGetItem is the sum of the sizes of all items in the
operation grouped by table. In other words, a single BatchGetItem, and
BatchWriteItem for that matter, updates the histograms for each table
that it has items in.
2025-10-16 19:16:57 +02:00
Piotr Wieczorek
1aa3819b57 alternator: Add GetItem response size metrics
This commit bundle introduces metrics on item sizes for Alternator
operations. Specifically, this commit adds a per-table
`operation_size_kb` histogram, recording the sizes of the items
contained in GetItem responses.
2025-10-16 19:04:55 +02:00
Piotr Dulikowski
44257f4961 Merge 'raft topology: disable schema pulls in the Raft-based recovery procedure' from Patryk Jędrzejczak
Schema pulls should always be disabled when group 0 is used. However,
`migration_manager::disable_schema_pulls()` is never called during
a restart with `recovery_leader` set in the Raft-based recovery
procedure, which causes schema pulls to be re-enabled on all live nodes
(excluding the nodes replacing the dead nodes). Moreover, schema pulls
remain enabled on each node until the node is restarted, which could
be a very long time.

We fix this issue and add a regression test in this PR.

Fixes #26569

This is an important bug fix, so it should be backported to all branches
with the Raft-based recovery procedure (2025.2 and newer branches).

Closes scylladb/scylladb#26572

* github.com:scylladb/scylladb:
  test: test_raft_recovery_entry_loss: fix the typo in the test case name
  test: verify that schema pulls are disabled in the Raft-based recovery procedure
  raft topology: disable schema pulls in the Raft-based recovery procedure
2025-10-16 18:48:38 +02:00
Emil Maskovsky
6769c313c2 raft: small fixes for voters code
Minor cleanups and improvements to voter-related code.

No backport: cleanup only, no functional changes.

Closes scylladb/scylladb#26559
2025-10-16 18:41:08 +02:00
Piotr Wieczorek
1559021c4e alternator/test: Add more context to test_metrics.py asserts
This commit adds more information to the assert messages to ease in
debugging. The semantics of the asserts remains the same.
2025-10-16 14:41:19 +02:00
Piotr Dulikowski
a8d92f2abd test: mv: add a test for tablet merge
The test test_mv_tablets_replace verifies that merging tablets of both a
view and its base table is allowed if rf-rack-valid-keyspaces option is
enabled (and it is enabled by default in the test suite).
2025-10-16 14:07:37 +02:00