Commit Graph

53948 Commits

Author SHA1 Message Date
Avi Kivity
287625b297 auth_test: coroutinize test_alter_with_workload_type
Use co_await instead of return for improved readability.
2026-04-05 18:26:30 +03:00
Avi Kivity
4eeb5ef54d auth_test: coroutinize test_alter_with_timeouts
Use co_await instead of return for improved readability.
2026-04-05 18:26:30 +03:00
Avi Kivity
170c71b25d auth_test: coroutinize role_permissions_table_is_protected
Use co_await for improved readability.
2026-04-05 18:26:30 +03:00
Avi Kivity
13eccf519f auth_test: coroutinize role_members_table_is_protected
Use co_await for improved readability.
2026-04-05 18:26:30 +03:00
Avi Kivity
43ff3798ad auth_test: coroutinize roles_table_is_protected
Use co_await for improved readability.
2026-04-05 18:26:30 +03:00
Avi Kivity
c586eeb003 auth_test: coroutinize test_password_authenticator_operations
Flatten continuation chains (.then()) into linear thread-style code
with .get() calls for improved readability. Remove the now-unused
require_throws helper template.
2026-04-05 18:26:25 +03:00
Avi Kivity
fbccfe5c9d auth_test: coroutinize test_password_authenticator_attributes
Use co_await instead of return+do_with_cql_env+make_ready_future
for improved readability.
2026-04-05 17:28:09 +03:00
Avi Kivity
e3dee64003 auth_test: coroutinize test_default_authenticator
Use co_await instead of return+do_with_cql_env+make_ready_future
for improved readability.
2026-04-05 17:27:45 +03:00
Jenkins Promoter
ab4a2cdde2 Update pgo profiles - aarch64 2026-04-05 16:58:02 +03:00
Jenkins Promoter
b97cf0083c Update pgo profiles - x86_64 2026-04-05 16:00:15 +03:00
Nikos Dragazis
6d50e67bd2 scylla_swap_setup: Remove Before=swap.target dependency from swap unit
When a Scylla node starts, the scylla-image-setup.service invokes the
`scylla_swap_setup` script to provision swap. This script allocates a
swap file and creates a swap systemd unit to delegate control to
systemd. By default, systemd injects a Before=swap.target dependency
into every swap unit, allowing other services to use swap.target to wait
for swap to be enabled.

On Azure, this doesn't work so well because we store the swap file on
the ephemeral disk [1] which has network dependencies (`_netdev` mount
option, configured by cloud-init [2]). This makes the swap.target
indirectly depend on the network, leading to dependency cycles such as:

swap.target -> mnt-swapfile.swap -> mnt.mount -> network-online.target
-> network.target -> systemd-resolved.service -> tmp.mount -> swap.target

This patch breaks the cycle by removing the swap unit from swap.target
using DefaultDependencies=no. The swap unit will still be activated via
WantedBy=multi-user.target, just not during early boot.

Although this problem is specific to Azure, this patch applies the fix
to all clouds to keep the code simple.

Fixes #26519.
Fixes SCYLLADB-1257

[1] https://github.com/scylladb/scylla-machine-image/pull/426
[2] https://github.com/canonical/cloud-init/pull/1213#issuecomment-1026065501

Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>

Closes scylladb/scylladb#28504
2026-04-05 15:07:50 +03:00
Tomasz Grabiec
74542be5aa test: pylib: Ignore exceptions in wait_for()
ManagerClient::get_ready_cql() calls server_sees_others(), which waits
for servers to see each other as alive in gossip. If one of the
servers is still early in boot, RESTful API call to
"gossiper/endpoint/live" may fail. It throws an exception, which
currently terminates the wait_for() and propagates up, failing the test.

Fix this by ignoring errors when polling inside wait_for. In case of
timeout, we log the last exception. This should fix the problem not
only in this case, for all uses of wait_for().

Example output:

```
pred = <function ManagerClient.server_sees_others.<locals>._sees_min_others at 0x7f022af9a140>
deadline = 1775218828.9172852, period = 1.0, before_retry = None
backoff_factor = 1.5, max_period = 1.0, label = None

    async def wait_for(
            pred: Callable[[], Awaitable[Optional[T]]],
            deadline: float,
            period: float = 0.1,
            before_retry: Optional[Callable[[], Any]] = None,
            backoff_factor: float = 1.5,
            max_period: float = 1.0,
            label: Optional[str] = None) -> T:
        tag = label or getattr(pred, '__name__', 'unlabeled')
        start = time.time()
        retries = 0
        last_exception: Exception | None = None
        while True:
            elapsed = time.time() - start
            if time.time() >= deadline:
                timeout_msg = f"wait_for({tag}) timed out after {elapsed:.2f}s ({retries} retries)"
                if last_exception is not None:
                    timeout_msg += (
                        f"; last exception: {type(last_exception).__name__}: {last_exception}"
                    )
                    raise AssertionError(timeout_msg) from last_exception
                raise AssertionError(timeout_msg)

            try:
>               res = await pred()

test/pylib/util.py:80:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

    async def _sees_min_others():
>       raise Exception("asd")
E       Exception: asd

test/pylib/manager_client.py:802: Exception

The above exception was the direct cause of the following exception:

manager = <test.pylib.manager_client.ManagerClient object at 0x7f022af7e7b0>

    @pytest.mark.asyncio
    async def test_auth_after_reset(manager: ManagerClient) -> None:
        servers = await manager.servers_add(3, config=auth_config, auto_rack_dc="dc1")
>       cql, _ = await manager.get_ready_cql(servers)

test/cluster/auth_cluster/test_auth_after_reset.py:33:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
test/pylib/manager_client.py:137: in get_ready_cql
    await self.servers_see_each_other(servers)
test/pylib/manager_client.py:820: in servers_see_each_other
    await asyncio.gather(*others)
test/pylib/manager_client.py:806: in server_sees_others
    await wait_for(_sees_min_others, time() + interval, period=.5)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

pred = <function ManagerClient.server_sees_others.<locals>._sees_min_others at 0x7f022af9a140>
deadline = 1775218828.9172852, period = 1.0, before_retry = None
backoff_factor = 1.5, max_period = 1.0, label = None

    async def wait_for(
            pred: Callable[[], Awaitable[Optional[T]]],
            deadline: float,
            period: float = 0.1,
            before_retry: Optional[Callable[[], Any]] = None,
            backoff_factor: float = 1.5,
            max_period: float = 1.0,
            label: Optional[str] = None) -> T:
        tag = label or getattr(pred, '__name__', 'unlabeled')
        start = time.time()
        retries = 0
        last_exception: Exception | None = None
        while True:
            elapsed = time.time() - start
            if time.time() >= deadline:
                timeout_msg = f"wait_for({tag}) timed out after {elapsed:.2f}s ({retries} retries)"
                if last_exception is not None:
                    timeout_msg += (
                        f"; last exception: {type(last_exception).__name__}: {last_exception}"
                    )
>                   raise AssertionError(timeout_msg) from last_exception
E                   AssertionError: wait_for(_sees_min_others) timed out after 45.30s (46 retries); last exception: Exception: asd

test/pylib/util.py:76: AssertionError
```

Fixes a failure observed in test_auth_after_reset:

```
manager = <test.pylib.manager_client.ManagerClient object at 0x7fb3740e1630>

    @pytest.mark.asyncio
    async def test_auth_after_reset(manager: ManagerClient) -> None:
        servers = await manager.servers_add(3, config=auth_config, auto_rack_dc="dc1")
        cql, _ = await manager.get_ready_cql(servers)
        await cql.run_async("ALTER ROLE cassandra WITH PASSWORD = 'forgotten_pwd'")

        logging.info("Stopping cluster")
        await asyncio.gather(*[manager.server_stop_gracefully(server.server_id) for server in servers])

        logging.info("Deleting sstables")
        for table in ["roles", "role_members", "role_attributes", "role_permissions"]:
            await asyncio.gather(*[manager.server_wipe_sstables(server.server_id, "system", table) for server in servers])

        logging.info("Starting cluster")
        # Don't try connect to the servers yet, with deleted superuser it will be possible only after
        # quorum is reached.
        await asyncio.gather(*[manager.server_start(server.server_id, connect_driver=False) for server in servers])

        logging.info("Waiting for CQL connection")
        await repeat_until_success(lambda: manager.driver_connect(auth_provider=PlainTextAuthProvider(username="cassandra", password="cassandra")))
>       await manager.get_ready_cql(servers)

test/cluster/auth_cluster/test_auth_after_reset.py:50:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
test/pylib/manager_client.py:137: in get_ready_cql
    await self.servers_see_each_other(servers)
test/pylib/manager_client.py:819: in servers_see_each_other
    await asyncio.gather(*others)
test/pylib/manager_client.py:805: in server_sees_others
    await wait_for(_sees_min_others, time() + interval, period=.5)
test/pylib/util.py:71: in wait_for
    res = await pred()
test/pylib/manager_client.py:802: in _sees_min_others
    alive_nodes = await self.api.get_alive_endpoints(server_ip)
test/pylib/rest_client.py:243: in get_alive_endpoints
    data = await self.client.get_json(f"/gossiper/endpoint/live", host=node_ip)
test/pylib/rest_client.py:99: in get_json
    ret = await self._fetch("GET", resource_uri, response_type = "json", host = host,
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <test.pylib.rest_client.TCPRESTClient object at 0x7fb2404a0650>
method = 'GET', resource = '/gossiper/endpoint/live', response_type = 'json'
host = '127.15.252.8', port = 10000, params = None, json = None, timeout = None
allow_failed = False

    async def _fetch(self, method: str, resource: str, response_type: Optional[str] = None,
                     host: Optional[str] = None, port: Optional[int] = None,
                     params: Optional[Mapping[str, str]] = None,
                     json: Optional[Mapping] = None, timeout: Optional[float] = None, allow_failed: bool = False) -> Any:
        # Can raise exception. See https://docs.aiohttp.org/en/latest/web_exceptions.html
        assert method in ["GET", "POST", "PUT", "DELETE"], f"Invalid HTTP request method {method}"
        assert response_type is None or response_type in ["text", "json"], \
                f"Invalid response type requested {response_type} (expected 'text' or 'json')"
        # Build the URI
        port = port if port else self.default_port if hasattr(self, "default_port") else None
        port_str = f":{port}" if port else ""
        assert host is not None or hasattr(self, "default_host"), "_fetch: missing host for " \
                "{method} {resource}"
        host_str = host if host is not None else self.default_host
        uri = self.uri_scheme + "://" + host_str + port_str + resource
        logging.debug(f"RESTClient fetching {method} {uri}")

        client_timeout = ClientTimeout(total = timeout if timeout is not None else 300)
        async with request(method, uri,
                           connector = self.connector if hasattr(self, "connector") else None,
                           params = params, json = json, timeout = client_timeout) as resp:
            if allow_failed:
                return await resp.json()
            if resp.status != 200:
                text = await resp.text()
>               raise HTTPError(uri, resp.status, params, json, text)
E               test.pylib.rest_client.HTTPError: HTTP error 404, uri: http://127.15.252.8:10000/gossiper/endpoint/live, params: None, json: None, body:
E               {"message": "Not found", "code": 404}

test/pylib/rest_client.py:77: HTTPError
```

Fixes: SCYLLADB-1367

Closes scylladb/scylladb#29323
2026-04-05 13:52:26 +03:00
Ernest Zaslavsky
c7a74237b3 compaction_test: fix formatting after previous patches 2026-04-05 11:07:17 +03:00
Ernest Zaslavsky
101b4ad7fa compaction_test: add S3/GCS variations to tests
Add S3 and GCS variants of the compaction tests to expand coverage for
keyspaces configured to use object_storage backends.
2026-04-05 11:07:17 +03:00
Ernest Zaslavsky
03bd3010bf compaction_test: extract test_env-based tests into functions
Move all test code that relies on test_env into standalone free
functions so they can be reused by upcoming S3 and GCS test suites.
2026-04-05 11:07:17 +03:00
Ernest Zaslavsky
b18528e97e compaction_test: replace file_exists with storage::exists
Replace direct filesystem checks (file_exists) with the
storage-agnostic exists() method in unsealed_sstable_compaction,
sstable_clone_leaving_unsealed_dest_sstable, and
failure_when_adding_new_sstable tests, making them compatible
with object-storage backends (S3, GCS).
2026-04-05 11:07:17 +03:00
Ernest Zaslavsky
98492e4ea8 compaction_test: initialize tables with schema via make_table_for_tests
Start using `table_for_tests::make_default_schema` so test tables are
created with a real schema. This is required for object-storage
backends, which cannot operate correctly without proper schema
initialization.
2026-04-05 11:07:17 +03:00
Ernest Zaslavsky
5ba79e2ed4 compaction_test: use sstable APIs to manipulate component files
Switch tests to use sstable member functions for file manipulation
instead of opening files directly on the filesystem. This affects the
helpers that emulate sstable corruption: we now overwrite the entire
component file rather than just the first few kilobytes, which is
sufficient for producing a corrupted sstable.
2026-04-05 11:07:17 +03:00
Ernest Zaslavsky
405c032f48 compaction_test: fix use-after-move issue
We were moving `compaction_type_options` inside a loop, so on the
second iteration the test received an already moved-from instance.
2026-04-05 11:07:17 +03:00
Ernest Zaslavsky
437a581b04 sstable_utils: add get_storage and open_file helpers
Add a non-const `get_storage` accessor to expose underlying storage,
and an `open_file` helper to access sstable component files directly.
These are needed so compaction tests can read and write sstable
components.
2026-04-05 11:07:17 +03:00
Ernest Zaslavsky
2ad2dbae03 test_env: delay unplugging sstable registry
Unplugging the mock sstable_registry happened too early in the test
environment. During sstable destruction, components may still need
access to the registry, so the unplugging is moved to a later stage.
2026-04-05 11:07:17 +03:00
Ernest Zaslavsky
8f6630e9cd storage: add exists method to storage abstraction
Add an `exists` method to the storage abstraction to allow S3, GCS,
and local storage implementations to check whether an sstable
component is present.
2026-04-05 11:07:17 +03:00
Ernest Zaslavsky
ba785f6cab s3_client: use lowres_system_clock for aws_sigv4
Switch aws_sigv4 to lowres_system_clock since it is not affected by
time offsets often introduced in tests, which can skew db_clock. S3
requests cannot represent time shifts greater than 15 minutes from
server time, so a stable clock is required.
2026-04-05 11:07:17 +03:00
Ernest Zaslavsky
e08d779922 s3_client: add object_exists helper
Introduce `object_exists` to the S3 client to check whether an object
exists. This is primarily useful for test scenarios.
2026-04-05 11:07:16 +03:00
Ernest Zaslavsky
016b344a8a gcs_client: add object_exists helper
Introduce `object_exists` to the GCS client to check whether an object
exists. This is primarily useful for test scenarios.
2026-04-05 11:07:16 +03:00
Andrzej Jackowski
8c0920202b test: protect populate_range in row_cache_test from bad_alloc
When test_exception_safety_of_update_from_memtable was converted from
manual fail_after()/catch to with_allocation_failures() in 74db08165d,
the populate_range() call ended up inside the failure injection scope
without a scoped_critical_alloc_section guard. The other two tests
converted in the same commit (test_exception_safety_of_transitioning...
and test_exception_safety_of_partition_scan) were correctly guarded.

Without the guard, the allocation failure injector can sometimes
target an allocation point inside the cleanup path of populate_range().
In a rare corner case, this triggers a bad_alloc in a noexcept context
(reader_concurrency_semaphore::stop()), causing std::terminate.

Fixes SCYLLADB-1346

Closes scylladb/scylladb#29321
2026-04-04 21:13:26 +03:00
Andrzej Jackowski
ec274cf7b6 test: add test_upgrade_preserves_ddl_audit_for_tables
Verify that upgrading from 2025.1 to master does not silently drop DDL
auditing for table-scoped audit configurations (SCYLLADB-1155).

Test time in dev: 4s

Refs: SCYLLADB-1155
Fixes: SCYLLADB-1305
2026-04-03 13:53:28 +02:00
Andrzej Jackowski
9c7b7ac3e3 test: audit: split validate helper so callers need not pass audit_settings
The old execute_and_validate_audit_entry required every caller to
pass audit_settings so it could decide internally whether to expect
an entry. A test added later in this series needs to simply assert
an entry was produced, without specifying audit_settings at all.

Split into two methods:
- execute_and_validate_new_audit_entry: unconditionally expects an
  audit entry.
- execute_and_validate_if_category_enabled: checks audit_settings
  to decide whether to expect an entry or assert absence.

Local wrapper functions and **kwargs forwarding are removed in favor
of explicit arguments at each call site, and expected-error cases are
handled inline with assert_invalid + assert_entries_were_added.
2026-04-03 13:52:47 +02:00
Andrzej Jackowski
189bff1d5c test: audit: declare manager attribute in AuditTester base class
AuditTester uses self.manager throughout but never declares it.
The attribute is only assigned in the CQLAuditTester subclass
__init__, so the type checker reports 'Attribute "manager" is
unknown' on every self.manager reference in the base class.

Add an __init__ to AuditTester that accepts and stores the manager
instance, and update CQLAuditTester to forward it via super().__init__
instead of assigning self.manager directly.
2026-04-03 13:52:47 +02:00
Botond Dénes
2c22d69793 Merge 'Pytest: fix variable handling in GSServer (mock) and ensure docker service logs go to test log as well' from Calle Wilund
Fixes: SCYLLADB-1106

* Small fix in scylla_cluster - remove debug print
* Fix GSServer::unpublish so it does not except if publish was not called beforehand
* Improve dockerized_server so mock server logs echo to the test log to help diagnose CI failures (because we don't collect log files from mocks etc, and in any case correlation will be much easier).

No backport needed.

Closes scylladb/scylladb#29112

* github.com:scylladb/scylladb:
  dockerized_service: Convert log reader to pipes and push to test log
  test::cluster::conftest::GSServer: Fix unpublish for when publish was not called
  scylla_cluster: Use thread safe future signalling
  scylla_cluster: Remove left-over debug printout
2026-04-03 06:38:05 +03:00
Raphael S. Carvalho
b6ebbbf036 test/cluster/test_tablets2: Fix test_split_stopped_on_shutdown race with stale log messages
The test was failing because the call to:

    await log.wait_for('Stopping.*ongoing compactions')

was missing the 'from_mark=log_mark' argument. The log mark was updated
(line: log_mark = await log.mark()) immediately after detecting
'splitting_mutation_writer_switch_wait: waiting', and just before
launching the shutdown task. However, the wait_for call on the following
line was scanning from the beginning of the log, not from that mark.

As a result, the search immediately matched old 'Stopping N tasks for N
ongoing compactions for table system.X due to table removal' messages
emitted during initial server bootstrap (for system.large_partitions,
system.large_rows, system.large_cells), rather than waiting for the
shutdown to actually stop the user-table split compaction.

This caused the test to prematurely send the message to the
'splitting_mutation_writer_switch_wait' injection. The split compaction
was unblocked before the shutdown had aborted it, so it completed
successfully. Since the split succeeded, 'Failed to complete splitting
of table' was never logged.

Meanwhile, 'storage_service_drain_wait' was blocking do_drain() waiting
for a message. With the split already done, the test was stuck waiting
for the expected failure log that would never come (600s timeout). At
the same time, after 60s the 'storage_service_drain_wait' injection
timed out internally, triggering on_internal_error() which -- with
--abort-on-internal-error=1 -- crashed the server (exit code -6).

Fix: pass from_mark=log_mark to the wait_for('Stopping.*ongoing
compactions') call so it only matches messages that appear after the
shutdown has started, ensuring the test correctly synchronizes with the
shutdown aborting the user-table split compaction before releasing the
injection.

Fixes https://scylladb.atlassian.net/browse/SCYLLADB-1319.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Closes scylladb/scylladb#29311
2026-04-03 06:28:51 +03:00
Andrei Chekun
6526a78334 test.py: fix nodetool mock server port collision
Replace the random port selection with an OS-assigned port. We open
a temporary TCP socket, bind it to (ip, 0) with SO_REUSEADDR, read back
the port number the OS selected, then close the socket before launching
rest_api_mock.py.
Add reuse_address=True and reuse_port=True to TCPSite in rest_api_mock.py
so the server itself can also reclaim a TIME_WAIT port if needed.

Fixes: SCYLLADB-1275

Closes scylladb/scylladb#29314
2026-04-02 16:24:07 +02:00
Botond Dénes
eb78498e07 test: fix flaky test_timeout_is_applied_on_lookup by using eventually_true
On slow/overloaded CI machines the lowres_clock timer may not have
fired after the fixed 2x sleep, causing the assertion on
get_abort_exception() to fail. Replace the fixed sleep with
sleep(1x) + eventually_true() which retries with exponential backoff,
matching the pattern already used in test_time_based_cache_eviction.

Fixes: SCYLLADB-1311

Closes scylladb/scylladb#29299
2026-04-01 18:20:11 +03:00
Marcin Maliszkiewicz
a74665b300 transport: add per-service-level pending response memory metric
Track the total memory consumed by responses waiting to be
written to the socket, exposed as a per-scheduling-group gauge
(cql_pending_response_memory). This complements the response
memory accounting added in the previous commits by giving
visibility into how much memory each service level is holding
in unsent response buffers.
2026-04-01 17:15:28 +02:00
Robert Bindar
e7527392c4 test: close clients if cluster teardown throws
make sure the driver is stopped even though cluster
teardown throws and avoid potential stale driver
connections entering infinite reconnect loops which
exhaust cpu resources.

Fixes: SCYLLADB-1189

Signed-off-by: Robert Bindar <robert.bindar@scylladb.com>

Closes scylladb/scylladb#29230
2026-04-01 17:22:19 +03:00
Tomasz Grabiec
2ec47a8a21 tests: address_map_test: Fix flakiness in debug mode due to task reordering
Debug mode shuffles task position in the queue. So the following is possible:
 1) shard 1 calls manual_clock::advance(). This expires timers on shard 1 and queues a background smp call to shard 0 which will expire timers there
 2) the smp::submit_to(0, ...) from shard 1 called by the test sumbits the call
 3) shard 0 creates tasks for both calls, but (2) is run first, and preempts the reactor
 4) shard 1 sees the completion, completes m_svc.invoke_on(1, ..)
 5) shard 0 inserts the completion from (4) before task from (1)
 6) the check on shard 0: m.find(id1) fails because the timer is not expired yet

To fix that, wait for timer expiration on shard 0, so that the test
doesn't depend on task execution order.

Note: I was not able to reproduce the problem locally using test.py --mode
debug --repeat 1000.

It happens in jenkins very rarely. Which is expected as the scenario which
leads to this is quite unlikely.

Fixes SCYLLADB-1265

Closes scylladb/scylladb#29290
2026-04-01 17:17:35 +03:00
Aleksandra Martyniuk
4d4ce074bb test: node_ops_tasks_tree: reconnect driver after topology changes
The test exercises all five node operations (bootstrap, replace, rebuild,
removenode, decommission) and by the end only one node out of four
remains alive. The CQL driver session, however, still holds stale
references to the dead hosts in its connection pool and load-balancing
policy state.

When the new_test_keyspace context manager exits and attempts
DROP KEYSPACE, the driver routes the query to the dead hosts first,
gets ConnectionShutdown from each, and throws NoHostAvailable before
ever trying the single live node.

Fix by calling driver_connect() after the decommission step, which
closes the old session and creates a fresh one connected only to the
servers the test manager reports as running.

Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-1313.

Closes scylladb/scylladb#29306
2026-04-01 17:13:11 +03:00
Dario Mirovic
85127fded8 test: boost: test null data value to_parsable_string
Add tests for null value in data_type::to_parsable_string().
We now explicitly return "null".

Refs SCYLLADB-1350
2026-04-01 14:15:25 +02:00
Dario Mirovic
fc705dfb4b cql3: fix null handling in data_value formatting
data_value::to_parsable_string() crashed with a null pointer
dereference when called on a null data_value. Return "null" instead.

Fixes SCYLLADB-1350
2026-04-01 14:15:18 +02:00
Andrzej Jackowski
cccb014747 test: ldap: add regression test for double-free on unregistered message ID
Sends a search via the raw LDAP handle (bypassing _msgid_to_promise
registration), then triggers poll_results() through the public API
to exercise the unregistered-ID branch.

Refs: SCYLLADB-1344
2026-04-01 12:57:50 +02:00
Botond Dénes
0351756b15 Merge 'test: fix fuzzy_test timeout in release mode' from Piotr Smaron
The multishard_query_test/fuzzy_test was timing out (SIGKILL after
15 minutes) in release mode CI.

In release mode the test generates up to 64 partitions with up to
1000 clustering rows and 1000 range tombstones each.  With deeply
nested randomly-generated types (e.g. frozen<map<varint,
frozen<map<frozen<tuple<...>>>>>>), this volume of data can exceed
the 15-minute CI timeout.

Reduce the release-mode clustering-row and range-tombstone
distributions from 0-1000 to 0-200.  This caps the worst case at
~12,800 rows -- still 2x the devel-mode maximum (0-100) and
sufficient to exercise multi-partition paged scanning with many
pages.

Fixes: SCYLLADB-1270

No need to backport for now, only appeared on master.

Closes scylladb/scylladb#29293

* github.com:scylladb/scylladb:
  test: clean up fuzzy_test_config and add comments
  test: fix fuzzy_test timeout in release mode
2026-04-01 11:50:15 +03:00
Andrzej Jackowski
f0028c06dc ldap: fix double-free of LDAPMessage in poll_results()
In the unregistered-ID branch, ldap_msgfree() was called on a result
already owned by an RAII ldap_msg_ptr, causing a double-free on scope
exit. Remove the redundant manual free.

Fixes: SCYLLADB-1344
2026-04-01 10:35:13 +02:00
Andrei Chekun
18f41dcd71 test.py: introduce new scheduler for choosing job count
This commit improves how test.py chohoses the default number of
parallele jobs.
This update keeps logic of selecting number of jobs from memory and cpu limits
but simplifies the heuristic so it is smoother, easier to reason about.
This avoids discontinuities such as neighboring machine sizes producing
unexpectedly different job counts, and behaves more predictably on asymmetric
machines where CPU and RAM do not scale together.

Compared to the current threshold-based version, this approach:
- avoids hard jumps around memory cutoffs
- avoids bucketed debug scaling based on CPU count
- keeps CPU and memory as separate constraints and combines them in one place
- avoids double-penalizing debug mode
- is easier to tune later by adjusting a few constants instead of rewriting branching logic

Closes scylladb/scylladb#28904
2026-04-01 11:11:15 +03:00
Avi Kivity
d438e35cdd test/cluster: fix race in test_insert_failure_standalone audit log query
get_audit_partitions_for_operation() returns None when no audit log
rows are found. In _test_insert_failure_doesnt_report_success_assign_nodes,
this None is passed to set(), causing TypeError: 'NoneType' object is
not iterable.

The audit log entry may not yet be visible immediately after executing
the INSERT, so use wait_for() from test.pylib.util with exponential
backoff to poll until the entry appears. Import it as wait_for_async
to avoid shadowing the existing wait_for from test.cluster.dtest.dtest_class,
which has a different signature (timeout vs deadline).

Fixes SCYLLADB-1330

Closes scylladb/scylladb#29289
2026-04-01 10:59:02 +03:00
Michael Litvak
35547bfb6e test: logstor: additional logstor tests 2026-03-31 18:45:08 +02:00
Michael Litvak
5b3e2a4ca2 docs/dev: add logstor on-disk format section 2026-03-31 18:45:08 +02:00
Michael Litvak
39baa573d2 logstor: add version and crc to buffer header
add basic crc and validation to the buffer header. add also a version
field that indicates the version of the on-disk format.
2026-03-31 18:45:08 +02:00
Michael Litvak
6ace823ee4 test: logstor: tablet split/merge and migration
add basic logstor tests for tablet split/merge and migration to verify
it works as expected
2026-03-31 18:45:08 +02:00
Michael Litvak
996d623ab4 logstor: enable tablet balancing
enable tablet balancing with the logstor feature now that it works
2026-03-31 18:45:08 +02:00
Michael Litvak
b02349d755 logstor: streaming of logstor segments using stream_blob
implement tablet migration for logstor tables by streaming segments
using stream_blob, similar to file streaming of sstables.

take a snapshot of the logstor segments and create a stream_blob_info
vector with entry for each segment with the input stream that reads the
segment and an op of type file_ops::stream_logstor_segments.

the stream_blob_handler creates a logstor sink that allocates a segment
on the target shard and creates an output stream that writes to it. when
the sink is closed it loads the segment.
2026-03-31 18:45:08 +02:00