scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-21 23:32:15 +00:00

Author	SHA1	Message	Date
Nadav Har'El	cd61a44ab8	test/alternator: test response compression of tiny responses This patch adds to the existing collection of tests for Alternator response compression another test with a tiny response being compressed. This test serves two purposes: 1. It verifies that setting alternator_response_compression_threshold_in_bytes to a tiny number like 1 really means that tiny responses are compressed. 2. It verifies that our compression code, which has a special code path for the small chunk at the end of the compression, works correctly. The original motivation for writing this test was a false alarm by Claude Code, which claimed that Alternator's response compression code has a serious, exploitable memory overrun bug because it set the wrong size limit on that last chunk. Claude was wrong: there is no such bug. We did set an oversized limit on the last chunk (so this patch fixes this typo), but it didn't matter, because the code used deflateBound - the guaranteed maximum size of the compressed data - for the buffer's size. The buffer was therefore unconditionally big enough: no matter which avail_out limit we passed to deflate(), it could never overflow. The included test passes even before this patch, even with ASAN enabled to detect memory overflows - no overflow was happening. It also passes after the typo correction in this patch. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#29718	2026-05-19 10:02:26 +03:00
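The threshold behaviour described in cd61a44ab8 can be illustrated with a minimal Python sketch. It is not Alternator's code: `maybe_compress` and its `threshold` argument are hypothetical stand-ins for the `alternator_response_compression_threshold_in_bytes` setting, and the "at least threshold bytes" comparison is an assumption. The sketch also shows why a worst-case bound matters: for a tiny body, the compressed output is larger than the input.

```python
import gzip
from typing import Tuple


def maybe_compress(body: bytes, threshold: int) -> Tuple[bytes, bool]:
    """Compress `body` only if it is at least `threshold` bytes long.

    Hypothetical helper; the real option name and comparison may differ.
    """
    if len(body) < threshold:
        return body, False
    return gzip.compress(body), True


tiny = b"{}"
# With a threshold of 1, even a two-byte response is compressed.
payload, compressed = maybe_compress(tiny, threshold=1)
```

Here `payload` round-trips through `gzip.decompress`, yet it is longer than `tiny`, so an output buffer sized from the input length alone would be too small.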
Avi Kivity	85374207ca	Merge 'test.py: rewrite gather metrics' from Andrei Chekun Rewrite gather metrics to be able to gather metrics for Python tests correctly. Python tests require different handling of metrics gathering from cgroups than C++ tests. pytest does not execute each Python test in a separate process, so we can't put each test into its own cgroup and read the metrics from there. The idea is to put the whole pytest process into the cgroup and get the metrics. This works because pytest runs its threads as completely separate processes, and inside each thread it runs tests consecutively. Additionally, to simplify things, the system resource monitor was moved to the pytest main thread. Change the behavior of metrics gathering. Starting with this PR, some data is collected even with `--no-gather-metrics`. This data needs no configuration and is just test metadata: test name, execution time, and test status. When `--gather-metrics` is provided, the data gathered from the cgroups about memory for each specific test and the system CPU/RAM utilization is written as well. Backport is not needed, because it's a framework change only. Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-575 ~Blocked by: https://github.com/scylladb/scylladb/pull/27618~ Now Python tests with their own Scylla instances have metrics gathered from the cgroups as well. 
```bash
$ sqlite3 --header testlog/sqlite_af8cb.db 'select tst.path, tst.file, tst.test_name, user_sec,system_sec,usage_sec,memory_peak /1024/1024 as memory_peak_mb from test_metrics join tests as tst where tst.id = test_metrics.test_id order by memory_peak_mb desc limit 10;'
path|file|test_name|user_sec|system_sec|usage_sec|memory_peak_mb
test/cluster/dtest|limits_test.py|test_max_cells|489.468174|27.6638949999999|517.132069|4241
test/cluster/dtest|rebuild_test.py|test_rebuild_stream_abort_repro|93.6400869999998|28.9843249999999|122.624412|4241
test/cluster/dtest|schema_management_test.py|test_prepared_statements_work_after_node_restart_after_altering_schema_without_changing_columns|6.8933219999999|3.63569899999993|10.5290209999994|4241
test/cluster/dtest|schema_management_test.py|test_dropping_keyspace_with_many_columns|1.31770999999981|0.754742999999962|2.07245299999977|4241
test/cluster/dtest|schema_management_test.py|test_multiple_create_table_in_parallel|5.48435300000028|2.72915200000011|8.21350499999971|4241
test/cluster/dtest|schema_management_test.py|test_alter_table_in_parallel_to_read_and_write[write]|80.687293|18.5562|99.2434920000005|4241
test/cluster/dtest|schema_management_test.py|test_alter_table_in_parallel_to_read_and_write[read]|79.1984790000001|18.0969829999999|97.2954609999997|4241
test/cluster/dtest|schema_management_test.py|test_alter_table_in_parallel_to_read_and_write[mixed]|85.332915|18.9321070000001|104.265022|4241
test/cluster/dtest|schema_management_test.py|test_update_schema_while_node_is_killed[create_table]|10.5875369999999|5.67954400000008|16.267081|4241
test/cluster/dtest|schema_management_test.py|test_update_schema_while_node_is_killed[alter_table]|11.3801709999998|6.54689099999996|17.9270630000001|4241
```
Closes scylladb/scylladb#28206 * github.com:scylladb/scylladb: test.py: Add host hardware info test.py: rewrite resource gather	2026-05-18 20:35:14 +03:00
Dawid Pawlik	c2d27d1a50	index: remove Chinese, Japanese, and Korean language analyzers Remove "chinese", "japanese", and "korean" from the list of accepted full-text search analyzer options. Exposing these options commits ScyllaDB to supporting them long-term — if we ever switch from one backend search engine to another, CJK analyzers are the most likely to lose out-of-the-box support, unlike the popular European languages that are broadly available across text analysis libraries. Restrict the accepted set now, while FTS is still new, to avoid a future compatibility burden. Add a test to check if the CJK language analyzer options are rejected. Fixes: VECTOR-672 Closes scylladb/scylladb#29877	2026-05-18 18:20:47 +03:00
Szymon Malewski	15493872b2	vector_search: fix decimal/varint precision loss in filter value_to_json() value_to_json() converts CQL values to JSON for vector search filters. For decimal and varint types, it used rjson::parse() on the JSON string, which parses through a double and silently loses precision for values exceeding ~15 significant digits — producing wrong filter results. Additionally, for decimal type we need an exact string representation that preserves the original (unscaled, scale) pair, because partition keys use byte-level identity: different serialized representations of the same numeric value are distinct rows, so the filter must reproduce the exact representation stored in the key. Add big_decimal::to_string_canonical() which follows the Java BigDecimal toString() spec (JDK 8+), producing a bijective string representation that uses exponential notation for extreme scales instead of expanding trailing zeros (which could cause OOM). This could replace to_string(), but doing so has wider consequences (e.g. hash/equality contract for decimal_type) described in SCYLLADB-1574. Use it in value_to_json() for decimal_type, and use rjson::from_string() for varint_type, both bypassing the lossy double parse path. Tests cover the new to_string_canonical() and the filter fix, as well as existing decimal type behavior (key representation, clustering order, toJson) that we rely on and must not break. The CQL decimal type tests (test_type_decimal.py) also pass against Cassandra. Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-1583 Refs: https://scylladb.atlassian.net/browse/SCYLLADB-1574 Closes scylladb/scylladb#29505	2026-05-18 17:07:26 +03:00
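The two problems described in 15493872b2 (precision loss through a double, and distinct serialized forms of the same decimal value) can be reproduced in a short Python sketch. Python's `decimal.Decimal` is used only as a stand-in for `big_decimal`; its string form is not claimed to match `to_string_canonical()` exactly, and the sample values are made up.

```python
from decimal import Decimal

text = "12345678901234567891"   # a varint with 20 significant digits

via_double = int(float(text))    # lossy: routed through an IEEE-754 double
exact = int(text)                # exact: arbitrary-precision integer parse

# The same numeric value can have different (unscaled, scale) pairs.
a, b = Decimal("1.0"), Decimal("1.00")
same_value = (a == b)                       # numerically equal
same_repr = (a.as_tuple() == b.as_tuple())  # but stored differently

# Exponential notation keeps an extreme scale compact instead of
# expanding a thousand trailing zeros.
huge = str(Decimal((0, (1,), 1000)))        # '1E+1000'
```

Because `same_value` is true while `same_repr` is false, a filter that must match a key byte for byte needs the exact representation, not merely the numeric value.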
Piotr Dulikowski	26671d4d5f	Merge 'Refactor view_update_builder' from Wojciech Mitros This series improves the readability and structure of view_update_builder, the component that generates materialized view updates from base-table mutations. The first four patches are pure renames and refactoring with no semantic changes: 1. Document that the builder operates on a single base partition. 2. Rename member fields to clearly distinguish readers (the mutation_reader streams) from the cached fragments (the last mutation_fragment_v2 read from each stream). 3. Rename advance/on_results methods to names that describe what they actually do: read the next fragment, or generate view updates. 4. Extract partition-start handling into its own method. The next two patches are minor optimizations: 5. Simplify clustering-row handling by moving the row out of the fragment before applying the tombstone, avoiding an unnecessary memory-usage recalculation in the reader permit. 6. Replace deep copies with moves in the existing-only tail path, matching the pattern used everywhere else. Finally, patch 7 deduplicates the fragment-consuming logic by extracting the three repeated blocks into consume_both_fragments(), consume_update_fragment(), and consume_existing_fragment(). Code reorganization - no backport needed Closes scylladb/scylladb#29497 * github.com:scylladb/scylladb: mv: deduplicate code for consuming fragments in view_update_builder mv: avoid unnecessary copies of existing rows in generate_updates() mv: simplify clustering row handling in generate_updates() mv: rename methods in view_update_builder for clarity mv: rename view_update_builder readers and cached fragments mv: drop redundant std::move from partition key extraction mv: document single-partition builder scope	2026-05-18 15:52:26 +02:00
Piotr Dulikowski	5efb43195e	Merge 'db/schema_tables: don't emit empty view_building_tasks mutation on ALTER TABLE' from Michał Jadwiszczak After a recent change (`1a32ccd`), `make_update_indices_mutations()` unconditionally adds a mutation for `system.view_building_tasks`, even when no indices are being dropped. In a mixed-version cluster, the older node may not have this table, causing the Raft schema applier to fail with 'Can't find a column family with UUID ...'. This patch fixes the bug by emitting the mutation only when indices are actually dropped (i.e., when the view building cleanup code path was entered). Fixes: SCYLLADB-2026 Refs: scylladb#26557 scylladb#26557 wasn't backported, so this patch doesn't need to be either. Closes scylladb/scylladb#29908 * github.com:scylladb/scylladb: db/schema_tables: don't emit empty view_building_tasks mutation on ALTER TABLE db/view_building_task_mutation_builder: add `empty()` method	2026-05-18 15:37:02 +02:00
Nadav Har'El	5dbd0d71d5	Merge 'test/pylib: test/pylib: Cached Scylla package resolver' from Alex Dathskovsky This series adds a shared helper for resolving, downloading, unpacking, and installing Scylla relocatable packages for test.py. The first patch introduces `version_fetch_utils`, which can resolve public Scylla artifacts from the downloads bucket by version, architecture, package variant, or direct URL. It also centralizes the local cache/install flow using retry handling, marker files, and file locking so repeated or concurrent test runs can safely reuse an existing installation. The second patch wires this helper into the existing Scylla executable setup paths. This removes the hard-coded 2025.1 package URL and replaces the local download/unpack/install logic in `scylla_cluster.py` with the shared resolver. It also makes `--exe-url` use the same cached installer path. Together, these changes make upgrade-test executable selection less brittle, avoid duplicated install logic, and provide a reusable foundation for fetching other Scylla versions in test.py. Closes scylladb/scylladb#29855 * github.com:scylladb/scylladb: test/pylib: use version fetcher for Scylla executable setup test/pylib: add cached Scylla package installer	2026-05-18 16:32:47 +03:00
Yaniv Michael Kaul	5d8d158bdd	call_backport_with_jira.yaml: add missing workflow permissions Add explicit permissions block matching the requirements of the called reusable workflow (contents: read, pull-requests: write, issues: write). Fixes code scanning alerts #181, #182, #183. Closes scylladb/scylladb#29182	2026-05-18 15:50:00 +03:00
Yaniv Michael Kaul	fbf5be5587	docs: update Python deps Ran 'make update' to get the latest version of all dependencies needed to build docs. Tested with 'make test' only. Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com> AI-Assisted: no, to my surprise. Backport: not sure. Closes scylladb/scylladb#29909	2026-05-18 15:45:59 +03:00
Andrei Chekun	a03c4fd754	test.py: Add host hardware info Gather additional information about the running host for better metrics analysis	2026-05-18 12:23:40 +02:00
Andrei Chekun	6414c48fc2	test.py: rewrite resource gather Python tests require different handling of metrics gathering from cgroups than C++ tests. pytest does not execute each Python test in a separate process, so we can't put each test into its own cgroup and read the metrics from there. The idea is to put the whole pytest process into the cgroup and get the metrics. This works because pytest runs its threads as completely separate processes, and inside each thread it runs tests consecutively. Additionally, to simplify things, the system resource monitor was moved to the pytest main thread.	2026-05-18 12:23:40 +02:00
Marcin Maliszkiewicz	628e1ef2de	Merge 'Introduce auth::config to decouple auth modules from db::config' from Pavel Emelyanov Auth modules (authenticators, role managers, and auth::service) access their configuration options by reaching into db::config through the query processor. This abuses the database as a proxy object for obtaining configuration. This series introduces a dedicated auth::config struct that carries the configuration options used by auth modules. The config is populated in main.cc and delivered to each shard via sharded_parameter. This makes the auth service conform to the overall design, where db::config is split into smaller per-service configs on start, thus decoupling individual components/services from global configuration. This cleans up component dependencies; no backport needed. Closes scylladb/scylladb#29870 * github.com:scylladb/scylladb: auth: Remove unused default_superuser() function auth: Switch role managers to use auth::config auth: Switch authenticators to use auth::config auth: Introduce auth::config and wire it through service	2026-05-18 11:32:11 +02:00
Patryk Jędrzejczak	c9592a495e	Merge 'cql: fix missing TABLETS_ROUTING_V1 payload after CAS shard bounce' from Petr Gusev After an internal CAS shard bounce, check_locality() was evaluating against this_shard_id() of the post-bounce shard — which is the correct tablet shard — so it returned nullopt, and LWT/SERIAL responses omitted the tablets-routing-v1 custom payload. The client never learned the correct tablet map. Fix by recording the original entry shard in client_state (initialized to this_shard_id() at construction, preserved across shard bounces via client_state_for_another_shard) and passing it to check_locality() so it compares against the client's actual routing decision. No host_id tracking or forwarded_client_state IDL changes are needed because CAS shard bounces are always intra-node. Fixes SCYLLADB-2041 backport: need to backport to all versions with LWT over tablets Closes scylladb/scylladb#29910 * https://github.com/scylladb/scylladb: cql: refactor add_tablet_info to take tablet_routing_info directly cql: fix UB dereference of nullopt tablet_info in execute_with_condition test/boost: add regression test for missing tablet routing after CAS bounce cql: fix missing TABLETS_ROUTING_V1 payload after CAS shard bounce	2026-05-18 11:19:04 +02:00
Yehuda Lebi	6307e17795	fix: raise scylla-helper.slice CPUWeight from 10 to 100 to prevent node_exporter CPU starvation Closes scylladb/scylladb#29839	2026-05-18 11:55:14 +03:00
Yaniv Michael Kaul	f047e6fd5c	trigger_jenkins.yaml: add missing permissions and fix script injection Add explicit empty permissions block (permissions: {}) since this workflow only triggers Jenkins and sends Slack notifications using its own secrets. Also move expression interpolations into env vars to prevent potential script injection. Fixes code scanning alert #147. Also remove the pre-existing 'permissions: contents: read' block, which would result in duplicate YAML keys (invalid per the YAML spec). Closes scylladb/scylladb#29186	2026-05-18 11:39:39 +03:00
Botond Dénes	cc210813c8	Merge 'cmake: add IDL comparison to build system tool and fix PCH propagation' from Ernest Zaslavsky This series adds IDL file comparison to the build system comparison tool and fixes CMake PCH propagation. 1. `scripts/compare_build_systems.py` only compared compilation flags, link targets, and linker settings — it did not compare IDL-generated file sets. This allowed PR #28843 to pass CI despite adding `strong_consistency/groups_manager.idl.hh` to `configure.py` but not to `idl/CMakeLists.txt`. 2. CMake's `scylla-main` target was not using the precompiled header (`stdafx.hh`), even though configure.py applies it to every source file via `-include-pch`. This caused compilation failures for files relying on transitive includes from the PCH — e.g., `sstables_loader.cc` failed with `no member named 'read_entire_stream' in namespace 'seastar::util'`. Add a 4th comparison check to the build system comparison script: extract IDL-generated file sets from both build systems' ninja files and compare them. The extractors parse ninja build statements — configure.py side filters by build mode, CMake side handles the `\|` separator for implicit outputs — and normalize to a canonical relative path for comparison. Add the missing `strong_consistency/groups_manager.idl.hh` to `idl/CMakeLists.txt`. Add `target_precompile_headers(scylla-main REUSE_FROM scylla-precompiled-header)` so that all sources compiled under `scylla-main` benefit from the PCH, matching configure.py's behavior. Update documentation to reflect the new IDL comparison check. Refs: https://github.com/scylladb/scylladb/pull/29901 Refs: https://github.com/scylladb/scylladb/pull/28843 No backport needed — these are build system improvements only. 
Closes scylladb/scylladb#29912 * github.com:scylladb/scylladb: cmake: reuse precompiled header in scylla-main target idl: add missing groups_manager.idl.hh to CMakeLists.txt scripts: add IDL-generated file comparison to compare_build_systems	2026-05-18 11:38:14 +03:00
Yaniv Michael Kaul	34aac2030c	paxos: enable paging for internal paxos state queries The paxos state queries (load_paxos_state, save_paxos_promise, etc.) were using page_size=-1 (no paging). While each query returns at most one row and paging never actually kicks in, the lack of paging causes these internal queries to be counted as non-paged reads in the metrics, which can be confusing to users monitoring their cluster. Add LIMIT 1 to the SELECT query so that may_need_paging() short-circuits to false (row_limit <= 1), avoiding pager allocation overhead entirely. Set page_size=1000 so these queries are no longer reported as non-paged reads. Refs: https://scylladb.atlassian.net/browse/CUSTOMER-372 Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com> Backport: no, improvement Closes scylladb/scylladb#29852	2026-05-18 11:35:55 +03:00
Michał Jadwiszczak	a9b2baf36b	db/schema_tables: don't emit empty view_building_tasks mutation on ALTER TABLE After a recent change (`1a32ccd`), `make_update_indices_mutations()` unconditionally adds a mutation for `system.view_building_tasks`, even when no indices are being dropped. In a mixed-version cluster, the older node may not have this table, causing the Raft schema applier to fail with 'Can't find a column family with UUID ...'. This patch fixes the bug by emitting the mutation only when indices are actually dropped (i.e., when the view building cleanup code path was entered). Fixes: SCYLLADB-2026 Refs: scylladb#26557	2026-05-18 10:01:21 +02:00
Michał Jadwiszczak	82eb5611ab	db/view_building_task_mutation_builder: add `empty()` method The method checks whether the builder contains any changes, which makes it possible to skip emitting an empty mutation.	2026-05-18 09:54:26 +02:00
Ernest Zaslavsky	834eed10d9	test: fix use-after-free in start_docker_service retry path start_docker_service is a coroutine that took docker_args and image_args by const reference. Its caller start_fake_gcs_server is a regular function that passes temporaries (initializer lists) and immediately returns a future. The temporaries are destroyed when the caller returns, leaving the coroutine holding dangling references. On the first loop iteration this works by luck (memory not yet reused), but on retry (after "address already in use") the params.append_range(image_args) reads freed memory, causing use-after-free that manifests as std::bad_alloc or broken_promise in non-sanitizer builds. Fix by taking docker_args and image_args by value so the coroutine frame owns the vectors for its entire lifetime. Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-2003 Closes scylladb/scylladb#29932	2026-05-18 10:50:19 +03:00
Szymon Malewski	cb8e11653f	test/alternator: Number normalization tests DynamoDB normalizes Number values, so different string representations of the same number (e.g., "1000" vs "1e3") should be treated as the same value in all contexts. In Alternator this is true in most cases, thanks to implicit normalization in the Decimal `to_string()` function. However, this is fragile, and in fact this function should be fixed due to an OOM vulnerability in CQL use (#8002). This patch adds tests that should prevent regressions in the cases that currently work. Unfortunately, not all contexts work currently: mainly, HASH keys are not normalized, and the backend handles them by byte representation. The added tests reproduce this incorrect behaviour. All added tests pass with DynamoDB, with one exception: oddly, DynamoDB doesn't recognise unnormalized numbers in BatchGetItem as duplicate keys. Ref SCYLLADB-1575 Closes scylladb/scylladb#29501	2026-05-18 09:42:33 +03:00
Evgeniy Naydanov	39a10d6d67	test: remove dead suite subclasses and legacy execution pipeline After all test suites migrated to test_config.yaml with type: Python, the specialized suite classes (Topology, CQLApproval, Run, Tool) and the legacy execution pipeline (find_tests, run_test, TestSuite.run, Test.run) became unreachable. Remove all this dead code. Deleted files: - suite/topology.py, suite/cql_approval.py, suite/run.py, suite/tool.py Simplified: - base.py: remove run_test(), read_log(), TestSuite.run(), add_test_list(), build_test_list(), all_tests(), test_count(), SUITE_CONFIG_FILENAME, disabled/flaky test tracking, and dead Test attributes (args, core_args, valid_exit_codes, allure_dir, is_flaky, is_cancelled, etc.) - python.py: remove PythonTestSuite.run(), PythonTest.run(), _prepare_pytest_params(), pattern, test_file_ext, xmlout, server_log, scylla_env setup, and shlex import. Simplify run_ctx() to take no parameters. - runner.py: remove --scylla-log-filename option, print_scylla_log_filename fixture, SUITE_CONFIG_FILENAME import, and suite.yaml probe in TestSuiteConfig.from_pytest_node(). - __init__.py: remove re-exports of deleted classes. - test_config.yaml: Topology -> Python, Approval -> Python. - conftest files: run_ctx(options=...) -> run_ctx(). - docs/dev/testing.md: update to reflect current pytest-based architecture, log paths, and removed features. Co-Authored-By: Claude Opus 4.6 (200K context) <noreply@anthropic.com> Closes scylladb/scylladb#29613	2026-05-17 22:16:31 +03:00
Alex	176dbf12c2	test/pylib: use version fetcher for Scylla executable setup Replace the hard-coded 2025.1 archive download and local install logic with the shared Scylla package fetch/install helper. This keeps upgrade-test executable resolution and `--exe-url` handling on the same cached installer path.	2026-05-17 17:43:56 +03:00
Alex	1efe9a7243	test/pylib: add cached Scylla package installer Add utilities to resolve relocatable Scylla artifacts from the public downloads bucket by version, architecture, package variant, or direct URL. Download, unpack, and install the selected archive into the test.py cache with retry handling, marker files, and file locking so repeated or concurrent test runs can reuse the same installation safely.	2026-05-17 17:43:56 +03:00
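The cache/install flow described in 1efe9a7243 (marker files plus file locking so concurrent runs reuse one installation) could look roughly like the sketch below. It is not the `version_fetch_utils` code: `ensure_installed`, the `.installed` marker name, and the lock-file layout are all hypothetical, and the download/unpack step is reduced to a caller-supplied callback. It relies on `fcntl`, so it assumes a Unix-like host.

```python
import fcntl
import tempfile
from pathlib import Path
from typing import Callable, List


def ensure_installed(cache_dir: Path, key: str,
                     install: Callable[[Path], None]) -> Path:
    """Install into cache_dir/key at most once, even under concurrency."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / key
    marker = target / ".installed"   # hypothetical marker file name
    with open(cache_dir / f"{key}.lock", "w") as lock:
        # Exclusive lock: a second process blocks here until the first
        # one has finished installing and written the marker.
        fcntl.flock(lock, fcntl.LOCK_EX)
        try:
            if not marker.exists():
                target.mkdir(parents=True, exist_ok=True)
                install(target)      # download + unpack would happen here
                marker.touch()       # written last, so it implies success
        finally:
            fcntl.flock(lock, fcntl.LOCK_UN)
    return target


# Usage: the second call finds the marker and skips the install step.
calls: List[Path] = []
with tempfile.TemporaryDirectory() as tmp:
    for _ in range(2):
        ensure_installed(Path(tmp), "scylla-example", calls.append)
```

Writing the marker only after `install` returns means an interrupted installation is retried on the next run rather than being mistaken for a complete one.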
Andrzej Jackowski	61e5ec9888	test: storage: retry fusermount3 unmount on teardown After stopping scylla server processes, the FUSE daemon (fuse2fs) may still be processing file handle closures. An immediate fusermount3 -u can fail with 'device busy', causing spurious test failures on teardown. Retry the unmount up to 10 times with 0.5s delay between attempts, and capture stderr for diagnostics. Fixes: SCYLLADB-2049 Closes scylladb/scylladb#29920	2026-05-16 19:36:48 +03:00
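The teardown retry described in 61e5ec9888 can be sketched as follows. The `fusermount3 -u` command, the 10 attempts, and the 0.5s delay come from the commit message; the function name, its parameters, and the injectable `run`/`sleep` hooks are hypothetical and exist only to keep the sketch testable without a real FUSE mount.

```python
import subprocess
import time
from typing import Callable


def unmount_with_retry(mountpoint: str,
                       attempts: int = 10,
                       delay: float = 0.5,
                       run: Callable = subprocess.run,
                       sleep: Callable[[float], None] = time.sleep) -> None:
    """Retry `fusermount3 -u` while the FUSE daemon is still busy."""
    last_stderr = ""
    for attempt in range(1, attempts + 1):
        result = run(["fusermount3", "-u", mountpoint],
                     capture_output=True, text=True)
        if result.returncode == 0:
            return
        last_stderr = result.stderr   # kept for diagnostics
        if attempt < attempts:
            sleep(delay)              # give fuse2fs time to close handles
    raise RuntimeError(
        f"failed to unmount {mountpoint} after {attempts} attempts: "
        f"{last_stderr.strip()}")
```

Capturing stderr and including it in the final error makes a persistent "device busy" failure distinguishable from other unmount errors.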
Piotr Dulikowski	460cb1656e	Merge 'test: limits: optimize test_max_cells to avoid large allocations and fragmentation' from Dario Mirovic The `test_max_cells` test was flaky due to `std::bad_alloc` caused by Seastar buddy allocator fragmentation. The root causes are: 1. The doubling loop with 24 iterations of CREATE/INSERT/DROP fragmented the allocator 2. The test built the whole batch as a single string that takes contiguous memory Also, some iterations inserted zero rows, but still did CREATE/DROP table which also contributed to the fragmentation. This patch series: - Skips iterations that insert zero rows - Creates the table once, truncates it after each test iteration - Switches to prepared statements Investigation results are presented in detail in https://scylladb.atlassian.net/browse/SCYLLADB-1645 Fixes SCYLLADB-1645 CI stability improvement. Backport to versions that have this test. Closes scylladb/scylladb#29759 * github.com:scylladb/scylladb: test: prepare max cells inserts test: reuse max cells schema test: limits: skip empty max cells iterations	2026-05-15 18:12:48 +02:00
Pavel Emelyanov	98bea152a8	auth: Remove unused default_superuser() function All callers have been migrated to read the superuser name from auth::config directly. Remove the now-unused helper that fetched it from db::config via the query processor. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-05-15 18:55:02 +03:00
Pavel Emelyanov	9b58d2213b	auth: Switch role managers to use auth::config Convert all role manager implementations to receive their configuration from auth::config instead of accessing db::config through the query processor: - standard_role_manager: reads superuser name from config - ldap_role_manager: reads LDAP URL template, attribute, bind credentials, and permissions update interval from config; passes config to inner standard_role_manager - maintenance_socket_role_manager: keeps a const reference to service's config and passes it directly when lazily constructing standard_role_manager Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-05-15 18:55:02 +03:00
Aleksandra Martyniuk	d874d355c2	service: skip load_sketch unload for excluded nodes on RF shrink When an RF change shrinks replicas on a DC and the node being shrunk is excluded, refresh_tablet_load_stats() only provides load_stats for that node if it has a cached snapshot from when the node was still up. If the snapshot is missing or predates the tables being shrunk (e.g. they were created after the node went down), stats stay incomplete. In that case load_sketch::unload() called from make_rf_change_plan() throws: Can't provide accurate load computation with incomplete load_stats for host: <uuid> Since an excluded node is not expected to come back, load_stats will never become complete, and the topology coordinator retries the plan infinitely, hanging ALTER KEYSPACE. Add a check for excluded nodes and skip unload() for them: we are removing the replica, so accurate load data for that node is not needed. For all other node states the throw-and-retry behavior is preserved. Modify test_excludenode_shrink_rf to always trigger the bug: a new error injection 'force_down_node_load_stats_invalid' forces the invalid-stats path in refresh_tablet_load_stats() for a down node, so the test does not depend on whether the load-stats refresher happened to cache the excluded node's stats while it was still up. Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-1702. Closes scylladb/scylladb#29622	2026-05-15 17:46:28 +02:00
Pavel Emelyanov	14b36b3db1	auth: Switch authenticators to use auth::config Convert all authenticator implementations to receive their configuration from auth::config instead of accessing db::config through the query processor: - password_authenticator: reads superuser name and salted password from config, stores them as members - saslauthd_authenticator: reads socket path from config - certificate_authenticator: reads role queries from config - transitional_authenticator: passes config to inner password_authenticator - maintenance_socket_authenticator: inherits new constructor via using declaration Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-05-15 18:45:01 +03:00
Pavel Emelyanov	07ed557a2f	auth: Introduce auth::config and wire it through service Add a dedicated auth::config struct that carries all configuration options needed by auth modules. The config is created per-shard using sharded_parameter to ensure updateable_value fields are shard-local. The config is stored as a member in auth::service and passed by const reference to factories so that each auth module can receive its configuration when constructed. The modules themselves are not yet converted — they still read from db::config via the query processor. The stored config is also used in describe_roles() to read the superuser name, eliminating the default_superuser() call that reached into db::config via the query processor. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-05-15 18:44:37 +03:00
Petr Gusev	9e3209e4a3	cql: refactor add_tablet_info to take tablet_routing_info directly Change add_tablet_info() to accept locator::tablet_routing_info instead of destructured (tablet_replica_set, token_range) pair. This simplifies all three call sites. Remove the empty-replicas guard inside add_tablet_info(): the only producer of tablet_routing_info is tablet ERM's check_locality(), which returns either nullopt (correctly routed) or info with replicas copied from tablet_info — a tablet always has replicas. All callers already check for nullopt before calling add_tablet_info(), so by the time we enter the function replicas are guaranteed non-empty.	2026-05-15 12:28:33 +02:00
Petr Gusev	738b7b4a86	cql: fix UB dereference of nullopt tablet_info in execute_with_condition When check_locality() returns nullopt (correctly routed LWT), the optional tablet_info was unconditionally dereferenced in the lambda capture list: tablet_info->tablet_replicas, tablet_info->token_range. The code previously masked this by initializing tablet_info with an empty-but-present value, so the dereference happened to work but only because the empty tablet_replicas made add_tablet_info() a no-op. After check_locality() overwrites it with nullopt, the dereference is UB. Fix by initializing tablet_info as empty (nullopt) and guarding the dereference.	2026-05-15 11:56:14 +02:00
Petr Gusev	8a76ec7e65	test/boost: add regression test for missing tablet routing after CAS bounce Add test_tablet_routing_info_after_cas_shard_bounce that verifies TABLETS_ROUTING_V1 payload is returned after an internal CAS shard bounce. The test simulates the transport-layer bounce: it creates a table whose single tablet replica lands on a shard different from the test thread, executes an LWT (which bounces), then transfers client_state via client_state_for_another_shard (preserving _original_shard) and re-executes on the tablet shard. The test asserts that check_locality() correctly detects the misrouting and returns tablet routing info. Refs SCYLLADB-2041	2026-05-15 11:56:14 +02:00
Petr Gusev	167a3c9c50	cql: fix missing TABLETS_ROUTING_V1 payload after CAS shard bounce After an internal CAS shard bounce, check_locality() was evaluating against this_shard_id() of the post-bounce shard — which is the correct tablet shard — so it returned nullopt, and LWT/SERIAL responses omitted the tablets-routing-v1 custom payload. The client never learned the correct tablet map. Fix by recording the original entry shard in client_state (initialized to this_shard_id() at construction, preserved across shard bounces via client_state_for_another_shard) and passing it to check_locality() so it compares against the client's actual routing decision. No host_id tracking or forwarded_client_state IDL changes are needed because CAS shard bounces are always intra-node. Fixes SCYLLADB-2041	2026-05-15 11:56:14 +02:00
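The reasoning in 167a3c9c50 can be followed with a toy model. This is not Scylla's `check_locality()`, which is C++ and returns tablet routing info with replicas and a token range. The Python function below is a hypothetical simplification that compares only shard numbers, and the shard values are invented.

```python
from typing import Optional


def check_locality(entry_shard: int, tablet_shard: int) -> Optional[dict]:
    """Return routing info if the request entered on the wrong shard."""
    if entry_shard == tablet_shard:
        return None                      # correctly routed: no payload
    return {"tablet_shard": tablet_shard}


tablet_shard = 3
original_shard = 0             # shard the client's request arrived on
current_shard = tablet_shard   # shard we run on after the CAS bounce

# Before the fix: comparing the post-bounce shard hides the misrouting.
before_fix = check_locality(current_shard, tablet_shard)
# After the fix: comparing the recorded entry shard exposes it.
after_fix = check_locality(original_shard, tablet_shard)
```

In this model `before_fix` is `None`, so no routing payload would be sent, while `after_fix` carries the correct shard back to the client.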
Jenkins Promoter	db3c44440b	Update pgo profiles - aarch64	2026-05-15 05:49:12 +03:00
Jenkins Promoter	a2fd608b7d	Update pgo profiles - x86_64	2026-05-15 05:10:51 +03:00
Ernest Zaslavsky	8d85382c55	cmake: reuse precompiled header in scylla-main target scylla-precompiled-header defines the PCH (stdafx.hh) with PRIVATE visibility, so targets linking to it do not inherit the PCH. scylla-main was missing the PCH entirely, causing files like sstables_loader.cc to fail with 'no member read_entire_stream' since that symbol comes from <seastar/util/short_streams.hh> which is included in stdafx.hh. PR #29901 worked around this by adding the missing #include directly, but the real fix is to propagate the PCH to scylla-main — matching the configure.py behavior where every source file is compiled with -include-pch stdafx.hh.pch. Add target_precompile_headers(scylla-main REUSE_FROM scylla-precompiled-header) so that all sources in scylla-main benefit from the precompiled header. Refs: https://github.com/scylladb/scylladb/pull/29901	2026-05-14 19:46:51 +03:00
Ernest Zaslavsky	d0ac01af2f	idl: add missing groups_manager.idl.hh to CMakeLists.txt PR #28843 added strong_consistency/groups_manager.idl.hh to configure.py but not to idl/CMakeLists.txt, causing the CMake build to fail with a missing generated header.	2026-05-14 19:46:51 +03:00
Ernest Zaslavsky	c36932d252	scripts: add IDL-generated file comparison to compare_build_systems Add a 4th check that compares IDL-generated file sets between configure.py and CMake. Previously only compilation flags, link targets, and linker settings were compared — a missing IDL entry (like strong_consistency/groups_manager.idl.hh in PR #28843) would go undetected. The extractors parse ninja build statements from both systems and normalize to a canonical relative path (e.g. cache_temperature.dist.hh) for comparison. configure.py outputs are filtered by mode; CMake outputs handle the \| separator for implicit outputs in ninja build lines. Also update the documentation to mention the new check.	2026-05-14 19:46:51 +03:00
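The IDL check described in c36932d252 boils down to extracting output paths from ninja build statements, normalizing them, and comparing the two sets. The following is a rough sketch of that idea only; the function name, the `idl/` path marker, the `.dist.hh` suffix filter and the sample ninja lines are all assumptions for illustration and are not taken from the actual script.

```python
def idl_outputs(ninja_text, marker="idl/", suffix=".dist.hh"):
    """Collect normalized IDL-generated outputs from ninja build statements."""
    outputs = set()
    for line in ninja_text.splitlines():
        if not line.startswith("build "):
            continue
        # Everything between "build " and the first ": " is the output list.
        targets = line[len("build "):].split(": ", 1)[0]
        for token in targets.split():
            if token == "|":
                continue  # separator between explicit and implicit outputs
            if token.endswith(suffix) and marker in token:
                # Keep only the path relative to the idl directory.
                outputs.add(token.split(marker, 1)[1])
    return outputs


# Hypothetical build statements from the two build systems.
configure_py_ninja = """\
build build/dev/gen/idl/cache_temperature.dist.hh: idl idl/cache_temperature.idl.hh
build build/dev/gen/idl/strong_consistency/groups_manager.dist.hh: idl idl/strong_consistency/groups_manager.idl.hh
"""
cmake_ninja = """\
build gen/idl/cache_temperature.dist.hh | gen/idl/cache_temperature.dist.impl.hh: CUSTOM_COMMAND idl/cache_temperature.idl.hh
"""

missing_in_cmake = idl_outputs(configure_py_ninja) - idl_outputs(cmake_ninja)
```

With these sample inputs the set difference contains only the `strong_consistency/groups_manager` entry, the kind of omission the new check is meant to surface.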
Marcin Maliszkiewicz	0574055b73	test: prepare max cells inserts Switch from raw CQL batch string to using a prepared statement. The old approach constructed the entire 50-row batch as a single CQL text string (~19.8 MiB with 32768 column names spelled out per row). This caused large contiguous allocations in the server. Fixes SCYLLADB-1645	2026-05-14 17:25:39 +02:00
Marcin Maliszkiewicz	0fd6f6f292	test: reuse max cells schema Extract table creation into _create_max_cell_count_table(). Call it once before the loop instead of creating and dropping the table on every iteration. Use TRUNCATE instead of DROP TABLE between iterations to clear data while keeping the schema. This avoids repeated schema operations that fragment the Seastar buddy allocator's address space with scattered small allocations. Refs SCYLLADB-1645	2026-05-14 17:24:53 +02:00
Marcin Maliszkiewicz	ec8f8e3a5b	Merge 'test: make test_vector_search_with_vector_store_mock 30 times faster!' from Nadav Har'El Before this patch, `test/cqlpy/run test_vector_search_with_vector_store_mock.py` took 34 seconds. After this patch, it takes 1 second. Look at the individual patches for how the magic happened. The first patch lowers the test duration from 34 to 5 seconds, the second patch lowers it further to 1 second. Closes scylladb/scylladb#29891 * github.com:scylladb/scylladb: test/cqlpy: make test_vector_search_with_vector_store_mock faster vector-search: reset DNS timeout after changing host	2026-05-14 17:12:47 +02:00
Marcin Maliszkiewicz	3debae9a37	test: limits: skip empty max cells iterations The doubling loop in test_max_cells started from cells=1. Since each row has MAX_CELLS_COLUMNS (32768) cells, iterations where cells < MAX_CELLS_COLUMNS produced zero rows (cells // columns = 0). Those iterations only did CREATE TABLE / DROP TABLE with no data inserted. Start the loop from MAX_CELLS_COLUMNS and use a while loop. Co-authored-by: Dario Mirovic <dario.mirovic@scylladb.com> Refs SCYLLADB-1645	2026-05-14 17:00:15 +02:00
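The arithmetic behind 3debae9a37 can be checked with a short sketch. Only `MAX_CELLS_COLUMNS = 32768` and the `cells // columns` row computation come from the commit message; the helper name and the upper bound of the loop are hypothetical.

```python
MAX_CELLS_COLUMNS = 32768  # cells per row, as stated in the commit message


def row_counts(start, limit):
    """Yield (cells, rows) for a doubling loop from start up to limit inclusive."""
    cells = start
    while cells <= limit:
        yield cells, cells // MAX_CELLS_COLUMNS
        cells *= 2


limit = 4 * MAX_CELLS_COLUMNS  # assumed upper bound, for illustration only

old_loop = list(row_counts(1, limit))                  # started from cells=1
new_loop = list(row_counts(MAX_CELLS_COLUMNS, limit))  # starts from one full row

# Iterations of the old loop that inserted no rows at all.
empty_iterations = [cells for cells, rows in old_loop if rows == 0]
```

Starting from 1, the fifteen powers of two below 32768 all yield zero rows, so the old loop spent those iterations on schema operations alone; the new starting point inserts at least one row in every iteration.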
Botond Dénes	8a305dd6c7	docs: expand OCI Object Storage configuration section The existing OCI section in admin.rst was a minimal stub that only showed a config snippet without explaining how to actually set up connectivity. Add documentation for: - The OCI S3-compatible endpoint URL format (namespace + region) - That credentials must be set explicitly via AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY using OCI Customer Secret Keys (unlike AWS, OCI has no instance metadata fallback compatible with STS/EC2) - A note that iam_role_arn is AWS-specific and should be omitted for OCI Fixes: SCYLLADB-501 Closes scylladb/scylladb#29689	2026-05-14 16:44:42 +02:00
Piotr Dulikowski	5b269be37b	Merge 'test/cluster/test_view_building_coordinator: migrate test from dtest' from Michał Jadwiszczak Move `materialized_views_test.py::TestMaterializedViews::test_do_not_finish_view_building_with_hints` test from dtest to test.py. The dtest was throttling down IO throughput in the hope that the view building won't be finished too soon. This introduces some unreliability, which can be solved by using error injection and pausing view building until we stop necessary nodes. This patch adds 2 tests: one for tablet-based view and one for vnode-based. Both of the tests use error injection to pause view building. Fixes [SCYLLADB-1261](https://scylladb.atlassian.net/browse/SCYLLADB-1261) The issue was seen in 2026.2, so we should backport this patch to this version. Closes scylladb/scylladb#29788 * github.com:scylladb/scylladb: test/cluster/mv/test_mv_building: add similar test for vnode-based view test/cluster/test_view_building_coordinator: migrate test from dtest db/view/view_building_worker: add more logs when flushing base table	2026-05-14 15:34:26 +02:00
Michał Jadwiszczak	25c176c1b4	sstables_loader: fix missing include Commit `c97232b` introduced use of `seastar::util::read_entire_stream()`, but it didn't include the relevant header, which causes a compilation error. It probably went silently through CI because of precompiled headers. Refs scylladb#28763 Closes scylladb/scylladb#29901	2026-05-14 15:16:34 +02:00
Piotr Szymaniak	ac3fff897a	alternator/doc: update Streams compatibility docs Alternator Streams graduated from experimental in #29604. Update the compatibility and FAQ docs accordingly: - Replace the "Experimental API features" section with a new "Alternator Streams" section that lists known differences without the experimental framing. - Expand the alternator_streams_increased_compatibility paragraph to explain both consequences of leaving it off (spurious no-op events and inaccurate INSERT/MODIFY distinction) and the performance cost of enabling it (LWT path for every write). - Drop the stale ShardFilter limitation (now implemented). - Replace the alternator-streams FAQ example with strongly-consistent-tables so the multi-feature syntax example remains useful. Fixes SCYLLADB-462 Closes scylladb/scylladb#29695	2026-05-14 15:06:19 +02:00
Michał Jadwiszczak	5c84cff78a	test/cluster/mv/test_mv_building: add similar test for vnode-based view In the dtest repo, the test runs for both vnode-based and tablet-based views. Since the test.py infra uses error injection to pause the view building process, we need separate tests for those two cases.	2026-05-14 10:52:44 +02:00
Piotr Dulikowski	0c016cecc3	Merge 'QOS: self-heal stale V1-to-V2 migration state on upgrade' from Alex Dathskovsky service_levels: self-heal stale v1 marker after raft topology upgrade This PR handles an upgrade corner case where a node may already be using raft topology, while `system.scylla_local` still marks service levels as v1. The problem was introduced by commit `2917ec5d51` ("service:qos: service levels migration"), which added the service-levels migration from `system_distributed.service_levels` to `system.service_levels_v2` as part of the raft topology upgrade. However, if the cluster had no service levels configured, there was no data to migrate. In that case, the migration path could leave the local version marker unchanged, so the node would later observe an inconsistent state: * raft topology is already enabled; * service levels are still marked as v1 in `system.scylla_local`. Such clusters can be left in a stale state and fail startup during upgrade to 2026.2. This PR makes the upgrade path self-healing. The first commit restores `service_level_controller::migrate_to_v2()`, giving us a group0-based path for writing the service-levels v2 state even after raft topology is already in use. The second commit wires this path into startup. When the node detects the stale raft-topology + service-levels-v1 state, it retries the migration a bounded number of times and updates the version marker to v2 instead of failing startup. With this change, clusters that were left in this stale state can recover automatically during upgrade to 2026. Fixes: SCYLLADB-1807 backport: 2026.2 2026.1 we need this functionality when we are upgrading older servers Closes scylladb/scylladb#29749 * github.com:scylladb/scylladb: test/auth_cluster: simulate v1 state in self-heal test When skip_service_levels_v2_initialization is used, write an explicit v1 service level version marker while skipping v2 initialization. This lets the restart test exercise self-healing from v1 to v2. 
qos: self-heal stale service levels version on startup qos: reintroduce service levels v2 migration self-heal	2026-05-14 10:32:43 +02:00
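The startup self-heal described in 0c016cecc3 (detect the stale raft-topology + service-levels-v1 state, retry the migration a bounded number of times, then set the marker to v2) can be sketched as follows. This is a toy Python model under stated assumptions, not the Scylla implementation: the dict keys, the `migrate` callable, the attempt limit, and the choice to raise once retries are exhausted are all hypothetical.

```python
def self_heal_service_levels(state, migrate, max_attempts=3):
    """Retry the v1-to-v2 migration a bounded number of times.

    `state` is a dict standing in for the node's local markers and
    `migrate` is a callable that raises RuntimeError on failure.
    Returns the number of attempts used (0 if nothing was stale).
    """
    stale = state["raft_topology"] and state["service_levels_version"] == 1
    if not stale:
        return 0
    for attempt in range(1, max_attempts + 1):
        try:
            migrate()
        except RuntimeError:
            continue  # transient failure, try again
        state["service_levels_version"] = 2  # update the version marker
        return attempt
    # Assumption: what happens after the last retry is not stated in the source.
    raise RuntimeError("service levels migration failed after retries")


calls = []


def flaky_migrate():
    # Hypothetical migration that fails on its first call only.
    calls.append(1)
    if len(calls) < 2:
        raise RuntimeError("transient failure")


node = {"raft_topology": True, "service_levels_version": 1}
attempts_used = self_heal_service_levels(node, flaky_migrate)
```

Here the first attempt fails, the second succeeds, and the marker ends up at v2 without aborting startup.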

53943 Commits