Commit Graph

47870 Commits

Author SHA1 Message Date
Michał Chojnowski
185a032044 utils/stream_compressor: allocate memory for zstd compressors externally
The default and recommended way to use zstd compressors is to let
zstd allocate and free memory for compressors on its own.

That's what we did for zstd compressors used in RPC compression.
But it turns out that it generates allocation patterns we dislike.

We expected zstd not to generate allocations after the context object
is initialized, but it turns out that it tries to downsize the context
sometimes (by reallocation). We don't want that because the allocations
generated by zstd are large (1 MiB with the parameters we use),
so repeating them periodically stresses the reclaimer.

We can avoid this by using the "static context" API of zstd,
in which the memory for context is allocated manually by the user
of the library. In this mode, zstd doesn't allocate anything
on its own.

The implementation details of this patch adds a consideration for
forward compatibility: later versions of Scylla can't use a
window size greater than the one we hardcoded in this patch
when talking to the old version of the decompressor.

(This is not a problem, since those compressors are only used
for RPC compression at the moment, where cross-version communication
can be prevented by bumping COMPRESSOR_NAME. But it's something
that the developer who changes the window size must _remember_ to do).

Fixes #24160
Fixes #24183

Closes scylladb/scylladb#24161
2025-05-27 12:43:11 +03:00
Jenkins Promoter
76dddb758e Update pgo profiles - x86_64 2025-05-27 12:02:49 +03:00
Pavel Emelyanov
bd3bd089e1 sstables_loader: Fix load-and-stream vs skip-cleanup check
The intention was to fail the REST API call in case --skip-cleanup is
requested for --load-and-stream loading. The corresponding if expression
is checking something else :( despite log message is correct.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#24208
2025-05-27 12:01:01 +03:00
Jenkins Promoter
de9d9c9ece Update pgo profiles - aarch64 2025-05-27 11:59:56 +03:00
Andrzej Jackowski
555d897a15 test: wait for normal state propagation in test_auth_v2_migration
By default, cluster tests have skip_wait_for_gossip_to_settle=0 and
ring_delay_ms=0. In tests with gossip topology, it may lead to a race,
where nodes see different state of each other.

In case of test_auth_v2_migration, there are three nodes. If the first
node already knows that the third node is NORMAL, and the second node
does not, the system_auth tables can return incomplete results.

To avoid such a race, this commit adds a check that all nodes see other
nodes as NORMAL before any writes are done.

Refs: #24163

Closes scylladb/scylladb#24185
2025-05-27 11:41:09 +03:00
Nikos Dragazis
eaa2ce1bb5 sstables: Fix race when loading checksum component
`read_checksum()` loads the checksum component from disk and stores a
non-owning reference in the shareable components. To avoid loading the
same component twice, the function has an early return statement.
However, this does not guarantee atomicity - two fibers or threads may
load the component and update the shareable components concurrently.
This can lead to use-after-free situations when accessing the component
through the shareable components, since the reference stored there is
non-owning. This can happen when multiple compaction tasks run on the
same SSTable (e.g., regular compaction and scrub-validate).

Fix this by not updating the reference in shareable components, if a
reference is already in place. Instead, create an owning reference to
the existing component for the current fiber. This is less efficient
than using a mutex, since the component may be loaded multiple times
from disk before noticing the race, but no locks are used for any other
SSTable component either. Also, this affects uncompressed SSTables,
which are not that common.

Fixes #23728.

Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>

Closes scylladb/scylladb#23872
2025-05-27 11:26:35 +03:00
Botond Dénes
2739eb49fd Merge 'docs: remove API reference redirect' from David Garcia
Fix for https://github.com/scylladb/scylladb/pull/24097

The stable branch does not contain the split API reference yet. This change fixes the 404 error raised when accessing the API reference on the stable branch due to the redirect.

Closes scylladb/scylladb#24259

* github.com:scylladb/scylladb:
  docs: fix typo
  docs: remove API reference redirect
2025-05-27 11:24:27 +03:00
Nadav Har'El
8487d81c6e Merge 'test: mark difference in handling IFs in LWT as scylla_only' from Andrzej Jackowski
There is a difference how ScyllaDB and Cassandra handle conditional
batches with different IF statements (such as "IF EXISTS" and "IF NOT
EXISTS"). Cassandra tries to detect condition conflicts, and prints
an error instead of silently failing the batch, but in ScyllaDB
we considered this check to be inconsistent and unhelpful, and
decided not to implement it.

In this series, we extend the documentation of the ScyllaDB behaviour
by extending the documents and improving relevant LWT tests.

Fixes: https://github.com/scylladb/scylladb/issues/13011

Backport not needed, only docs and minor tests changes.

Closes scylladb/scylladb#24086

* github.com:scylladb/scylladb:
  test: mark difference in handling IFs in LWT as scylla_only
  docs: cql: add explicit explanation how mixing IFs works in LWT
  docs: lwt: add two missing spaces
2025-05-27 09:35:41 +03:00
Andrzej Jackowski
7dc0c4cf4f test: close logfile/socket_dir for stopped servers in recycle_cluster
PythonTestSuite::recycle_cluster is a function that releases resources
of an old, dirty cluster to make it reusable. It closes log_file and
maintenance_socket_dir for running nodes in a dirty cluster, however it
doesn't do the same for stopped nodes. It leads to leakage of file
descriptors of stopped nodes, which in turn can lead to hitting ulimit
of open files (that is often 1024) if the leaking test is repeated with
`./test.py --repeat ...`. The problem was detected when tests from
`test/cluster/dtest/` directory were executed with high `repeat` value.

This commit extends `recycle_cluster` to close and cleanup logfile and
`socket_dir` for nodes that are stopped (because self.servers in
ScyllaCluster is ChainMap of self.running and self.stopped).

Closes scylladb/scylladb#24243
2025-05-27 08:37:43 +03:00
David Garcia
d99d1c315c docs: remove [erno X] prefix from metrics logger
Closes scylladb/scylladb#24246
2025-05-27 08:37:11 +03:00
David Garcia
3e331cfbbe docs: fix typo 2025-05-26 21:34:23 +02:00
David Garcia
eefc9c33e8 docs: remove API reference redirect
The stable branch does not contain the split API reference yet.
This change fixes the 404 error raised when accessing the API reference on the stable branch.
2025-05-26 21:32:07 +02:00
Andrzej Jackowski
ea6ef5d0aa test: mark difference in handling IFs in LWT as scylla_only
There is a difference how ScyllaDB and Cassandra handle conditional
batches with different IF statements (such as "IF EXISTS" and "IF NOT
EXISTS"). Cassandra tries to detect condition conflicts, and prints
an error instead of silently failing the batch, but in ScyllaDB
we considered this check to be inconsistent and unhelpful, and
decided not to implement it.

This commit:
 - Make test_lwt_with_batch_conflict_1 scylla_only instead of xfail,
   change the scenario to pass with the current implementation.
 - Add test_lwt_with_batch_conflict_3 that shows how Cassandra fails
   batch statement with different conditions, even when the conditions
   are not contradictory.
 - Add test_lwt_with_batch_conflict_4/5 that shows how static rows
   are handled in conditional batches.

Fixes: #13011
2025-05-26 15:47:11 +02:00
Andrzej Jackowski
2d4acb623e docs: cql: add explicit explanation how mixing IFs works in LWT
There is a difference how ScyllaDB and Cassandra handle conditional
batches with different IF statements (such as "IF EXISTS" and "IF NOT
EXISTS").

This commit explicitly documents the differences in the behavior.

Refs: #13011
2025-05-26 15:13:01 +02:00
Piotr Dulikowski
4508823294 Merge 'test.py: dtest: few fixes missed in the initial implementation' from Evgeniy Naydanov
There are few problems found in the dtest shim code after scylladb/scylladb#21580 was merged:

- The call of `init_default_config()` method was missed in scylladb/scylladb#21580.  It is required to handle dtest options and markers.
- The implementation of dtest shim uses `server_id` to format a name of a node in a cluster. This is a difference in behavior with dtest. Some of dtests use code like `cluster.nodes()["node1"]` to get access to a node object.
- Default timeout was missed in `ScyllaNode.wait_until_stopped()` method. Set it to 600 for debug mode or to 127 otherwise.

Closes scylladb/scylladb#24225

* github.com:scylladb/scylladb:
  test.py: dtest: set default wait_seconds based on build mode
  test.py: dtest: name nodes in cluster using index starting from 1
  test.py: dtest: initialize default config in dtest setup fixture
2025-05-26 13:37:12 +02:00
Yaron Kaikov
89ace09c18 [workflow]: add conflict_reminder to PRs based against master
Today we send a reminder to PR's author when backport PRs has conflicts.
Often, PR authors wait for their PR to be reviewed/merged, but the merge is not happening because the PR now conflicts with master and so maintainers won't merge it. This can lead to a stall, where maintainers wait for the author to rebase and authors are waiting for merge.

In this PR we added the ability to notify the PR author as soon as base
branch moved forward and rebase is requried

Fixes: https://github.com/scylladb/scylla-pkg/issues/4955

Closes scylladb/scylladb#24209
2025-05-26 14:30:06 +03:00
David Garcia
6f722e8bc0 docs: split api reference in smaller files
Closes scylladb/scylladb#24097
2025-05-26 12:06:59 +03:00
David Garcia
bf9534e2b5 docs: fix \t (tab) is not rendered correctly
Closes scylladb/scylladb#24096
2025-05-26 12:06:03 +03:00
Avi Kivity
29932a5af1 pgo: drop Java configuration
Since 5e1cf90a51
("build: replace tools/java submodule with packaged cassandra-stress")
we run pre-packaged cassandra-stress. As such, we don't need to look for
a Java runtime (which is missing on the frozen toolchain) and can
rely on the cassandra-stress package finding its own Java runtime.

Fix by just dropping all the Java-finding stuff.

Note: Java 11 is in fact present on the frozen toolchain, just
not in a way that pgo.py can find it.

Fixes #24176.

Closes scylladb/scylladb#24178
2025-05-26 10:16:03 +02:00
Avi Kivity
f195c05b0d untyped_result_set: mark get_blob() as returning unfragmented data
Blobs can be large, and unfragmented blobs can easily exceed 128k
(as seen in #23903). Rename get_blob() to get_blob_unfragmented()
to warn users.

Note that most uses are fine as the blobs are really short strings.

Closes scylladb/scylladb#24102
2025-05-26 09:40:34 +02:00
Michał Chojnowski
ff8a119f26 test/boost/sstable_compressor_factory_test: define a test suite name
It seems that tests in test/boost/combined_tests have to define a test
suite name, otherwise they aren't picked up by test.py.

Fixes #24199

Closes scylladb/scylladb#24200
2025-05-26 09:35:30 +02:00
Anna Stuchlik
d303edbc39 doc: remove copyright from Cassandra Stress
This commit removes the Apache copyright note from the Cassandra Stress page.

It's a follow up to https://github.com/scylladb/scylladb/pull/21723, which missed
that update (see https://github.com/scylladb/scylladb/pull/21723#discussion_r1944357143).

Cassandra Stress is a separate tool with separate repo with the docs, so the copyright
information on the page is incorrect.

Fixes https://github.com/scylladb/scylladb/issues/23240

Closes scylladb/scylladb#24219
2025-05-26 09:35:30 +02:00
Pavel Emelyanov
2a253ace5e Merge 'test.py: add coverage for boost with pytest execution' from Andrei Chekun
This PR adds the possibility to gather coverage for the boost tests when they're executed with pytest. Since the pytest will be used as the main runner for boost tests as well, we need this before switching the runners.

Closes scylladb/scylladb#24236

* github.com:scylladb/scylladb:
  test.py: add support for coverage for boost test
  test.py: get the temp dir from facade
2025-05-26 10:18:53 +03:00
Andrei Chekun
537054bfad test.py: add support for coverage for boost test
This PR adds the possibility to gather coverage for the boost tests when they're executed with pytest. Since the pytest will be used as the main runner for boost tests as well, we need this before switching the runners.
2025-05-23 12:54:54 +02:00
Andrei Chekun
c5a7f3415c test.py: get the temp dir from facade
No need to get the temp dir from the options when facade has this information already.
2025-05-23 12:54:48 +02:00
Nadav Har'El
d2844055ad Merge 'index: implement schema management layer for vector search indexes' from null
This pull request adds support for creating custom indexes (at a metadata level) as long as a supported custom class is provided (currently only vector search).

The patch contains:

- a change in CREATE INDEX statement that allows for the USING keyword to be present as long as one of the supported classes is used
-  support for describing custom indexes in the DESCRIBE statement
- unit tests

Co-authored by: @Balwancia

Closes scylladb/scylladb#23720

* github.com:scylladb/scylladb:
  test/cqlpy: add custom index tests
  index: support storing metadata for custom indices
2025-05-22 12:19:36 +03:00
Pavel Emelyanov
a0d2e63303 Merge 'test.py: add the possibility to gather resource metrics for C++ tests' from Andrei Chekun
Move the run_process method to resource gather instance, since we need to start a monitor to check memory consumption in the cgroup. Pytest has concept of the test, but it is completely different from test.py. Resource gather instance take test instance to save and extract information about the test. Additional method emulating test.py test instance added not to rewrite the resource gather instance. Finally, combining all these changes to have ability to get metrics for test in both runners: test.py and pytest.

Closes scylladb/scylladb#24091

* github.com:scylladb/scylladb:
  test.py: add missing parameter for boost tests for pytest runner
  test.py: add support for boost_data_test_case in combined tests
  test.py: clean log files after a successful run
  test.py: attach output of the boost test to the report
  test.py: fix metrics DB location
  test.py: move run_process to resource_gather.py
  test.py: unify using constant for finding repo root directory
  test.py: refactor run_process in facade.py
  test.py: add the possibility to create a test alike object
2025-05-22 10:34:34 +03:00
Evgeniy Naydanov
8dc5413f54 test.py: dtest: set default wait_seconds based on build mode
Default timeout was missed in `ScyllaNode.wait_until_stopped()` method.
Set it to 600 for debug mode or to 127 otherwise.
2025-05-22 06:39:03 +00:00
Evgeniy Naydanov
eca5d52f1d test.py: dtest: name nodes in cluster using index starting from 1
The current implementation of dtest shim use `server_id` to format a
name of a node in a cluster. This is a difference in behavior with dtest.
Some of dtests use code like `cluster.nodes()["node1"]` to get access
to a node object.  This commit changes it to be more consistent with
dtest.
2025-05-22 06:34:03 +00:00
Evgeniy Naydanov
91e29a302a test.py: dtest: initialize default config in dtest setup fixture
The call of `init_default_config()` method was missed in #21580.
It is required to handle dtest options and markers.
2025-05-22 06:22:04 +00:00
Andrei Chekun
8812b14078 test.py: add missing parameter for boost tests for pytest runner
Since we are running tests with a pytest, we don't need a report at the end of the run.
2025-05-21 19:41:41 +02:00
Andrei Chekun
66b014621e test.py: add support for boost_data_test_case in combined tests
Change the parsing logic of combined tests to support a case when boost_data_test_case used that produced additional lines in the output.
2025-05-21 19:41:41 +02:00
Andrei Chekun
88d24d8ad5 test.py: clean log files after a successful run
Clean different output files from the boost and unit tests.
Move logs for boost test to the testlog directory instead of having additional directory pytest
2025-05-21 19:41:41 +02:00
Andrei Chekun
a956dd8770 test.py: attach output of the boost test to the report
Added attaching the output of the test in case of fail to the Allure report
2025-05-21 19:41:39 +02:00
Andrei Chekun
ac86cc9f6d test.py: fix metrics DB location
Fix the issue introduced with scylladb/scylladb#22960. Suite log dir was changed, and the path for metrics DB was relying on it. As a result, DB is now located in the mode directory instead of the root of the testlog.
2025-05-21 15:37:15 +02:00
Andrei Chekun
b5b69710bd test.py: move run_process to resource_gather.py
Move the run_process method to the resource gather instance, since we need to start monitor to check memory consumption in the cgroup. Since resource_gather needs test.py test object, and pytest has no clue about it, adding a simple namespace object to emulate such a test object. It needed only to gather some information regarding the test to be able to add records to the DB.
Since we have two facades that can share the same run process procedure, adding a common method to handle this to avoid code duplication.
2025-05-21 15:34:34 +02:00
Andrei Chekun
3bcd6db718 test.py: unify using constant for finding repo root directory
Instead of finding dynamically the repo root directory relatively to the temp dir, that's in most cases in the repo, will fail if a non-default temp dir parameter is used. Additionally, to have the single source of truth of finding the repo root directory switching to the constants.
2025-05-21 15:34:34 +02:00
Andrei Chekun
4e18444831 test.py: refactor run_process in facade.py
Add injecting environment variables to the process
Switch from print to propper logger
Set buffer size to 1 to avoid losing any data from the boost test if the test collapsed.
Currently, run process logs and return stdout and stderr, but boost tests are using stderr only. So stderr redirected to stdout. This helps with Jenkins as well, since we are reducing the number of files to store.
2025-05-21 15:34:34 +02:00
Andrei Chekun
38310975c5 test.py: add the possibility to create a test alike object
resource_gather.py needs test.py test object to work. It needs some information about the test to be able to write down this information to the DB with metrics. When running with pytest, there's no such test object, that's why adding make_test_object to mimic the test.py's test object.
Switching the getting the mode for constructing path to chgroup to test
instead of suite. They are the same, but this helps to have emulate less
in make_test_object method.
2025-05-21 15:34:34 +02:00
Pavel Emelyanov
dac7589cef Revert "encryption_test: Catch exact exception"
This reverts commit 2d5c0f0cfd.

KMS tests became flaky after it: #24218
Need to revisit.
2025-05-20 13:52:14 +03:00
Petr Gusev
0443081b0d build: fix merge-compdb.py for CMake 'output' attributes
compile_commands.json is used by LSPs (e.g. `clangd` in VS Code) for
code navigation. `merge-compdb.py`, called by `configure.py`, merges
these files from Scylla, Seastar, and Abseil. The script filters
entries by checking the output attribute against a given prefix. This
is needed because Scylla’s compile_commands.json is generated by Ninja
and includes all build modes, in case the user specified multiple
ones in the call to configure.py. Seastar and Abseil databases,
generated by CMake, used to omit the output attribute, so filtering
did not apply. Starting with `CMake 3.20+`, output attributes are now
included and do not match the expected prefix. For example, they
could be of the form
`absl/synchronization/CMakeFiles/synchronization.dir/internal/futex_waiter.cc.o`.
This causes relevant entries from Seastar and Abseil to be filtered out.

This patch refactors `merge-compdb.py` to allow specifying an
optional prefix per input file, preserving the intent of applying
the output filtering logic only for ninja-generated
Scylla compdb file.

Closes scylladb/scylladb#24211
2025-05-20 08:43:09 +03:00
Piotr Dulikowski
c15cf54e3d Merge 'test.py: migrate alternator_tests.py from dtest suite' from Evgeniy Naydanov
We have a significant amount of tests in scylla-dtest repository and I believe most of them can be just copied to test.py framework with adding a relatively small shim code. In this PR I done that for 2 tests: [alternator_tests.py](https://github.com/scylladb/scylla-dtest/blob/next/alternator_tests.py) and [error_example_test.py](https://github.com/scylladb/scylla-dtest/blob/next/error_example_test.py)

One of the problems is async nature of test.py framework and synchronous of scylla-dtest. It was resolved by using universalasync third-party library. Other problem is ccmlib and it's resolved by adding a shim code (`test/dtest/ccmlib`)

ccmlib has a lot of dead code and not all it's features used by scylla-dtest, in this PR I added checks that we will not accidentally use some of them or miss something. And when we'll done the migration we can easily remove all unused parameters and these checks.

`error_example_test.py` copied as is (just license preamble added), `alternator_tests.py` has small changes:

1. License preamble
2. Remove unused imports
3. Remove unneeded `skip_if` marker (I think it can be backported to dtest, or we can remove the test from dtest after merging this PR)

```diff
--- ../../../scylla-dtest/alternator_tests.py
+++ alternator_tests.py
@@ -1,17 +1,20 @@
+#
+# Copyright (C) 2025-present ScyllaDB
+#
+# SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
+#
+
 import logging
 import operator
 import os
 import random
-import shutil
 import string
-import subprocess
 import tempfile
 import time
 from ast import literal_eval
 from concurrent.futures.thread import ThreadPoolExecutor
 from copy import deepcopy
 from decimal import Decimal
-from pathlib import Path
 from pprint import pformat

 import boto3.dynamodb.types
@@ -46,7 +49,6 @@
 )
 from dtest_class import get_ip_from_node, wait_for
 from tools.cluster import new_node
-from tools.marks import issue_open, with_feature
 from tools.misc import set_trace_probability
 from tools.retrying import retrying

@@ -168,7 +170,6 @@
         read_and_delete_set_elements_thread.join()

     @pytest.mark.next_gating
-    @pytest.mark.skip_if(with_feature("tablets") & issue_open("#18002"))
     def test_decommission_during_dynamo_load(self):
         self.prepare_dynamodb_cluster(num_of_nodes=3)
         node1, node2, node3 = self.cluster.nodelist()
```

Because all tests in this repo are considered to be "gating",  I removed all not next_gating tests and all dtest's suites markers as a separate commit.

To reduce tests execution time run the tests in dev mode only and made some sleeps smaller.

In result, 23 tests added in total (22 in `test_alternator.py` and 1 in `test_error_example`.)  The added tests will increase CI time by ~2х4 =8 minutes.

Closes scylladb/scylladb#21580

* github.com:scylladb/scylladb:
  test.py: dtest/alternator_tests.py: make sleep intervals smaller
  test.py: dtest/alternator_tests.py: remove not next_gating tests
  test.py: migrate alternator_tests.py from dtest
  test.py: initial implementation of dtest/ccm shim
  test.py: manager: add server_get_returncode() method
  test.py: manager: change CLI and env options on a node start
  test.py: REST API: add set_trace_probability() method
  test.py: REST API: add get_tokens() method
  test.py: rework log_browsing for dtest migration
2025-05-20 00:13:16 +02:00
Evgeniy Naydanov
e456f0ed7b test.py: dtest/alternator_tests.py: make sleep intervals smaller 2025-05-19 12:27:32 +00:00
Evgeniy Naydanov
8dd86818a0 test.py: dtest/alternator_tests.py: remove not next_gating tests
Remove all not next_gating tests and remove any dtest suites markers
because all tests in this repo are considered to be "gating".
2025-05-19 12:27:32 +00:00
Evgeniy Naydanov
57c1035146 test.py: migrate alternator_tests.py from dtest
The test almost unmodified except remove unneeded skipif mark
and unused imports.
2025-05-19 12:27:32 +00:00
Evgeniy Naydanov
ac1551892b test.py: initial implementation of dtest/ccm shim
Use universalasync library to make test.py async code compatible
with synchronous code of dtest/ccm

Also, copied unmodified error_example_test.py from dtest as an example.

Run the test in `dev` mode only.
2025-05-19 12:27:31 +00:00
Evgeniy Naydanov
2cb640f95c test.py: manager: add server_get_returncode() method
The method return None if Scylla process is still running or returncode.
If there is no Scylla process launched then raise NoSuchProcess exception.
2025-05-19 11:50:55 +00:00
Evgeniy Naydanov
d874beb17f test.py: manager: change CLI and env options on a node start
Add parameters to server_start() method to provide ability to
change Scylla' CLI and env options on a node start.

Also, add `expected_server_up_state` parameter as we have for
server_add() method.
2025-05-19 11:50:55 +00:00
Evgeniy Naydanov
5d3b54aa9b test.py: REST API: add set_trace_probability() method 2025-05-19 11:50:55 +00:00
Evgeniy Naydanov
a16a4b6171 test.py: REST API: add get_tokens() method
Get a list of the tokens for the specified node.
Optional `endpoint` parameter can be provided.
2025-05-19 11:50:55 +00:00