Fix a lifetime bug where `send_snapshot()` captured `abort_source` by reference and the referenced object could be destroyed before the continuation ran.
Use a gate-tracked background coroutine for each snapshot transfer:
- keep abort_source on the coroutine frame (stable lifetime)
- store a raw abort_source* in _snapshot_transfers for synchronous abort
- erase transfer slots immediately on abort to allow same-batch reuse
- close _snapshot_gate during abort() to wait for all in-flight transfers
This removes the need for extra aborted-transfer bookkeeping and makes snapshot transfer shutdown and ownership semantics explicit.
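To make the ownership model concrete, here is a minimal asyncio sketch of the same pattern. It is not the Raft code. `Gate` stands in for seastar::gate, an `asyncio.Event` stands in for abort_source, and `SnapshotSender`, `abort_transfer()` and the timed fake transfer are hypothetical. Python's garbage collector hides the dangling-reference problem itself, so the sketch only models the slot and shutdown semantics: slots are erased on abort so a new transfer to the same destination can reuse them, and `abort()` waits on the gate for every in-flight transfer.

```python
import asyncio


class Gate:
    """Minimal stand-in for seastar::gate: counts in-flight holders and
    lets close() wait until every holder has left."""

    def __init__(self):
        self._count = 0
        self._closed = False
        self._idle = asyncio.Event()
        self._idle.set()

    def enter(self):
        if self._closed:
            raise RuntimeError("gate closed")
        self._count += 1
        self._idle.clear()

    def leave(self):
        self._count -= 1
        if self._count == 0:
            self._idle.set()

    async def close(self):
        self._closed = True
        await self._idle.wait()


class SnapshotSender:
    """Hypothetical model: one gate-tracked background task per transfer."""

    def __init__(self, transfer_time=0.01):
        self._gate = Gate()
        self._transfers = {}   # dst -> abort signal owned by that transfer
        self._tasks = set()    # strong refs so asyncio does not drop tasks
        self._transfer_time = transfer_time
        self.results = []

    def send_snapshot(self, dst):
        self._gate.enter()
        abort = asyncio.Event()
        self._transfers[dst] = abort   # registered before any await
        task = asyncio.get_running_loop().create_task(self._transfer(dst, abort))
        self._tasks.add(task)
        task.add_done_callback(self._tasks.discard)

    async def _transfer(self, dst, abort):
        try:
            try:
                # Stand-in for the network transfer: it "completes" after
                # transfer_time unless the abort signal fires first.
                await asyncio.wait_for(abort.wait(), self._transfer_time)
                outcome = "aborted"
            except asyncio.TimeoutError:
                outcome = "done"
            self.results.append((dst, outcome))
        finally:
            # The slot may already hold a newer transfer to the same dst.
            if self._transfers.get(dst) is abort:
                del self._transfers[dst]
            self._gate.leave()

    def abort_transfer(self, dst):
        abort = self._transfers.pop(dst, None)   # free the slot right away
        if abort is not None:
            abort.set()

    async def abort(self):
        for dst in list(self._transfers):
            self.abort_transfer(dst)
        await self._gate.close()   # wait for every in-flight transfer


async def demo():
    s = SnapshotSender(transfer_time=0.01)
    s.send_snapshot("C")
    await asyncio.sleep(0.1)    # C completes normally
    s.send_snapshot("A")
    s.abort_transfer("A")       # slot erased immediately...
    s.send_snapshot("A")        # ...so it can be reused in the same batch
    await s.abort()
    return s
```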
Fixes: SCYLLADB-1234
Refs: https://github.com/scylladb/scylladb/pull/29092
No backport: the abort source parameter is currently not actually used, so this doesn't cause any problems in the current and older branches. Hence no backport is needed (use of the abort source parameter will eventually be implemented on master).
Closes scylladb/scylladb#29913
* https://github.com/scylladb/scylladb:
raft: fix send_snapshot abort_source lifetime
raft: fix parameter name mismatch in `send_snapshot()`
In Alternator's HTTP API, response headers can dominate bandwidth for
small payloads. The Server, Date, and Content-Type headers were sent on
every response but many clients never use them.
This patch introduces three Alternator config options:
- alternator_http_response_server_header,
- alternator_http_response_disable_date_header,
- alternator_http_response_disable_content_type_header,
which allow customizing or suppressing the respective HTTP response
headers. All three options support live update (no restart needed).
The Server header is no longer sent by default; the Date and
Content-Type defaults preserve the existing behavior.
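The option semantics can be sketched as a small header builder that consults the current option values on every response, which is what makes live updates possible. This is only a model: `HttpHeaderOptions` and `response_headers()` are hypothetical, an empty server header is assumed to mean "send no Server header", and the DynamoDB JSON content type is an assumption; the real implementation relies on the Seastar APIs described below.

```python
from dataclasses import dataclass
from email.utils import formatdate


@dataclass
class HttpHeaderOptions:
    # Hypothetical mirror of the three config options; an empty server
    # header is assumed to mean "do not send Server" (the new default).
    server_header: str = ""
    disable_date_header: bool = False
    disable_content_type_header: bool = False


def response_headers(opts, now=None, content_type="application/x-amz-json-1.0"):
    """Build the optional response headers from the *current* option values,
    so a live config update takes effect on the next response."""
    headers = {}
    if opts.server_header:
        headers["Server"] = opts.server_header
    if not opts.disable_date_header:
        headers["Date"] = formatdate(now, usegmt=True)   # IMF-fixdate, GMT
    if not opts.disable_content_type_header:
        headers["Content-Type"] = content_type
    return headers
```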
The Server and Date header suppression uses Seastar's
set_server_header() and set_generate_date_header() APIs added in
https://github.com/scylladb/seastar/pull/3217. This patch also
fixes deprecation warnings from older Seastar HTTP APIs.
Tests are in test/alternator/test_http_headers.py.
Fixes https://scylladb.atlassian.net/browse/SCYLLADB-70
Closes scylladb/scylladb#28288
Add more racks to dc2 to verify that the default replication factor
covers all available racks (rather than e.g. limited to 3).
With tablets and rf_rack_valid_keyspaces, verify also the automatically
selected rack list.
Restrict the extension to non-debug build modes to prevent running out
of memory with --repeat=100.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closes scylladb/scylladb#29931
This patch adds to the existing collection of tests for Alternator
response compression another test with a tiny response being compressed.
This test serves two purposes:
1. It verifies setting alternator_response_compression_threshold_in_bytes
to a tiny number like 1 really means that tiny responses would be
compressed.
2. It verifies that our compression code, which has a special code path
for the small chunk at the end of the compression, works correctly.
The original motivation for writing this test was a false alarm by
Claude Code, which claimed that Alternator's response compression code
has a serious, exploitable memory overrun bug because it set the
wrong size limit on that last chunk. Claude was wrong; there is no such
bug. We did set an oversized limit on the last chunk (so this patch
fixes this typo), but it didn't matter: the code used
deflateBound() (the guaranteed maximum size of the compressed data)
for the buffer's size, so the buffer was unconditionally big enough,
and no matter which avail_out limit we passed to deflate(), it could never
overflow.
The included test passes even before this patch, even with ASAN
enabled to detect memory overflows - no overflow was happening.
It also passes after the typo correction in this patch.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#29718
Fix a lifetime bug where `send_snapshot()` captured `abort_source` by
reference and the referenced object could be destroyed before the
continuation ran.
Use a gate-tracked background coroutine for each snapshot transfer:
- keep abort_source on the coroutine frame (stable lifetime)
- store a raw abort_source* in _snapshot_transfers for synchronous abort
- erase transfer slots immediately on abort to allow same-batch reuse
- close _snapshot_gate during abort() to wait for all in-flight transfers
This removes the need for extra aborted-transfer bookkeeping and makes
snapshot transfer shutdown and ownership semantics explicit.
Fixes: SCYLLADB-1234
Rewrite gather metrics to be able to gather metrics for python tests correctly.
Python tests require different handling of metrics gathering from cgroups than C++ tests. pytest does not execute each Python test in a separate process, so we can't put each test into its own cgroup and get its metrics.
The idea is to put the whole pytest process into the cgroup and get the metrics. This works because pytest runs its worker threads as completely separate processes, and each worker runs its tests sequentially.
Additionally, to simplify things, the system resource monitor was moved to the pytest main thread.
Change the behavior of metrics gathering: starting with this PR, some data is collected even with `--no-gather-metrics`. This data needs no configuration and is just test metadata: test name, execution time, and test status. When `--gather-metrics` is provided, the per-test memory data gathered from cgroups and the system CPU/RAM utilization are written as well.
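Once the pytest process runs inside its own cgroup, the per-cgroup CPU and memory figures can be read straight from the cgroup filesystem. A minimal sketch, assuming cgroup v2 (`cpu.stat` with `*_usec` fields and `memory.peak` in bytes); `read_cgroup_metrics()` is a hypothetical helper, not the test.py code.

```python
from pathlib import Path


def read_cgroup_metrics(cgroup_dir):
    """Read CPU and peak-memory figures from a cgroup v2 directory."""
    cgroup_dir = Path(cgroup_dir)
    stats = {}
    for line in (cgroup_dir / "cpu.stat").read_text().splitlines():
        key, _, value = line.partition(" ")
        if value:
            stats[key] = int(value)
    usec = 1_000_000
    return {
        "user_sec": stats["user_usec"] / usec,
        "system_sec": stats["system_usec"] / usec,
        "usage_sec": stats["usage_usec"] / usec,
        # memory.peak holds the high-water mark in bytes.
        "memory_peak": int((cgroup_dir / "memory.peak").read_text()),
    }
```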
Backport is not needed, because it's a framework change only.
Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-575
~Blocked by: https://github.com/scylladb/scylladb/pull/27618~
Python tests now have metrics gathered from cgroups as well, covering the Scylla instances they start.
```bash
$ sqlite3 --header testlog/sqlite_af8cb.db 'select tst.path, tst.file, tst.test_name, user_sec,system_sec,usage_sec,memory_peak /1024/1024 as memory_peak_mb from test_metrics join tests as tst where tst.id = test_metrics.test_id order by memory_peak_mb desc limit 10;'
path|file|test_name|user_sec|system_sec|usage_sec|memory_peak_mb
test/cluster/dtest|limits_test.py|test_max_cells|489.468174|27.6638949999999|517.132069|4241
test/cluster/dtest|rebuild_test.py|test_rebuild_stream_abort_repro|93.6400869999998|28.9843249999999|122.624412|4241
test/cluster/dtest|schema_management_test.py|test_prepared_statements_work_after_node_restart_after_altering_schema_without_changing_columns|6.8933219999999|3.63569899999993|10.5290209999994|4241
test/cluster/dtest|schema_management_test.py|test_dropping_keyspace_with_many_columns|1.31770999999981|0.754742999999962|2.07245299999977|4241
test/cluster/dtest|schema_management_test.py|test_multiple_create_table_in_parallel|5.48435300000028|2.72915200000011|8.21350499999971|4241
test/cluster/dtest|schema_management_test.py|test_alter_table_in_parallel_to_read_and_write[write]|80.687293|18.5562|99.2434920000005|4241
test/cluster/dtest|schema_management_test.py|test_alter_table_in_parallel_to_read_and_write[read]|79.1984790000001|18.0969829999999|97.2954609999997|4241
test/cluster/dtest|schema_management_test.py|test_alter_table_in_parallel_to_read_and_write[mixed]|85.332915|18.9321070000001|104.265022|4241
test/cluster/dtest|schema_management_test.py|test_update_schema_while_node_is_killed[create_table]|10.5875369999999|5.67954400000008|16.267081|4241
test/cluster/dtest|schema_management_test.py|test_update_schema_while_node_is_killed[alter_table]|11.3801709999998|6.54689099999996|17.9270630000001|4241
```
Closes scylladb/scylladb#28206
* github.com:scylladb/scylladb:
test.py: Add host hardware info
test.py: rewrite resource gather
Remove "chinese", "japanese", and "korean" from the list of accepted
full-text search analyzer options. Exposing these options commits
ScyllaDB to supporting them long-term — if we ever switch from one
backend search engine to another, CJK analyzers are the most likely
to lose out-of-the-box support, unlike the popular European languages
that are broadly available across text analysis libraries.
Restrict the accepted set now, while FTS is still new, to avoid a
future compatibility burden.
Add a test to check if the CJK language analyzer options are rejected.
Fixes: VECTOR-672
Closes scylladb/scylladb#29877
value_to_json() converts CQL values to JSON for vector search filters.
For decimal and varint types, it used rjson::parse() on the JSON string,
which parses through a double and silently loses precision for values
exceeding ~15 significant digits — producing wrong filter results.
Additionally, for decimal type we need an exact string representation
that preserves the original (unscaled, scale) pair, because partition
keys use byte-level identity: different serialized representations of
the same numeric value are distinct rows, so the filter must reproduce
the exact representation stored in the key.
Add big_decimal::to_string_canonical() which follows the Java BigDecimal
toString() spec (JDK 8+), producing a bijective string representation
that uses exponential notation for extreme scales instead of expanding
trailing zeros (which could cause OOM). This could replace to_string(),
but doing so has wider consequences (e.g. hash/equality contract for
decimal_type) described in SCYLLADB-1574. Use it in value_to_json() for
decimal_type, and use rjson::from_string() for varint_type, both
bypassing the lossy double parse path.
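For reference, the JDK `BigDecimal.toString()` rules that to_string_canonical() follows can be transcribed into a few lines of Python operating on the (unscaled, scale) pair. This is our reading of the Java specification, written as a standalone illustration; it is not the C++ big_decimal implementation.

```python
def to_string_canonical(unscaled: int, scale: int) -> str:
    """Python transcription of the JDK BigDecimal.toString() rules for the
    value unscaled * 10**(-scale)."""
    sign = "-" if unscaled < 0 else ""
    coeff = str(abs(unscaled))
    adjusted = -scale + (len(coeff) - 1)   # exponent in d.ddd * 10**adjusted
    if scale >= 0 and adjusted >= -6:
        # Plain notation: place the decimal point `scale` digits from the right.
        if scale == 0:
            return sign + coeff
        if len(coeff) > scale:
            return sign + coeff[:-scale] + "." + coeff[-scale:]
        return sign + "0." + "0" * (scale - len(coeff)) + coeff
    # Exponential notation: never expands zeros, so huge scales stay short.
    mantissa = coeff if len(coeff) == 1 else coeff[0] + "." + coeff[1:]
    exp_sign = "-" if adjusted < 0 else "+"
    return f"{sign}{mantissa}E{exp_sign}{abs(adjusted)}"
```

For instance, `to_string_canonical(1000, 0)` gives `1000` while `to_string_canonical(1, -3)` gives `1E+3`, so two encodings of the same numeric value stay distinguishable, and a scale of -1000000000 yields the short string `1E+1000000000` rather than a billion zeros.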
Tests cover the new to_string_canonical() and the filter fix, as well as
existing decimal type behavior (key representation, clustering order,
toJson) that we rely on and must not break. The CQL decimal type tests
(test_type_decimal.py) also pass against Cassandra.
Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-1583
Refs: https://scylladb.atlassian.net/browse/SCYLLADB-1574
Closes scylladb/scylladb#29505
This series improves the readability and structure of
view_update_builder, the component that generates materialized view
updates from base-table mutations.
The first four patches are pure renames and refactoring with no
semantic changes:
1. Document that the builder operates on a single base partition.
2. Rename member fields to clearly distinguish readers (the
mutation_reader streams) from the cached fragments (the last
mutation_fragment_v2 read from each stream).
3. Rename advance/on_results methods to names that describe what
they actually do: read the next fragment, or generate view
updates.
4. Extract partition-start handling into its own method.
The next two patches are minor optimizations:
5. Simplify clustering-row handling by moving the row out of the
fragment before applying the tombstone, avoiding an unnecessary
memory-usage recalculation in the reader permit.
6. Replace deep copies with moves in the existing-only tail path,
matching the pattern used everywhere else.
Finally, patch 7 deduplicates the fragment-consuming logic by
extracting the three repeated blocks into consume_both_fragments(),
consume_update_fragment(), and consume_existing_fragment().
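The three-way split that the last patch factors out can be pictured as a lockstep walk over two streams sorted by clustering key: the base-table update stream and the existing-data stream. In this simplified Python model the helper names mirror the ones above, but their bodies only record which case applied; tombstones, partition boundaries and reader permits are deliberately left out.

```python
def generate_updates(updates, existing):
    """Walk two key-sorted fragment streams in lockstep.

    Each stream yields (clustering_key, row) pairs; the result records
    which helper consumed each key as (case, key, existing_row, update_row).
    """
    out = []
    upd_it, ex_it = iter(updates), iter(existing)
    upd, ex = next(upd_it, None), next(ex_it, None)

    def consume_both_fragments(u, e):
        out.append(("both", u[0], e[1], u[1]))

    def consume_update_fragment(u):
        out.append(("update", u[0], None, u[1]))

    def consume_existing_fragment(e):
        out.append(("existing", e[0], e[1], None))

    while upd is not None or ex is not None:
        if ex is None or (upd is not None and upd[0] < ex[0]):
            consume_update_fragment(upd)
            upd = next(upd_it, None)
        elif upd is None or ex[0] < upd[0]:
            consume_existing_fragment(ex)
            ex = next(ex_it, None)
        else:  # same clustering key in both streams
            consume_both_fragments(upd, ex)
            upd, ex = next(upd_it, None), next(ex_it, None)
    return out
```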
Code reorganization - no backport needed
Closes scylladb/scylladb#29497
* github.com:scylladb/scylladb:
mv: deduplicate code for consuming fragments in view_update_builder
mv: avoid unnecessary copies of existing rows in generate_updates()
mv: simplify clustering row handling in generate_updates()
mv: rename methods in view_update_builder for clarity
mv: rename view_update_builder readers and cached fragments
mv: drop redundant std::move from partition key extraction
mv: document single-partition builder scope
After a recent change (1a32ccd), `make_update_indices_mutations()` unconditionally adds a mutation for `system.view_building_tasks`, even when no indices are being dropped.
In a mixed-version cluster, the older node may not have this table, causing the Raft schema applier to fail with 'Can't find a column family with UUID ...'.
This patch fixes the bug by emitting the mutation only when indices are actually dropped (i.e., when the view building cleanup code path was entered).
Fixes: SCYLLADB-2026
Refs: scylladb#26557
scylladb#26557 wasn't backported, so this patch also doesn't need to be.
Closes scylladb/scylladb#29908
* github.com:scylladb/scylladb:
db/schema_tables: don't emit empty view_building_tasks mutation on ALTER TABLE
db/view_building_task_mutation_builder: add `empty()` method
This series adds a shared helper for resolving, downloading, unpacking, and
installing Scylla relocatable packages for test.py.
The first patch introduces `version_fetch_utils`, which can resolve public
Scylla artifacts from the downloads bucket by version, architecture, package
variant, or direct URL. It also centralizes the local cache/install flow using
retry handling, marker files, and file locking so repeated or concurrent test
runs can safely reuse an existing installation.
The second patch wires this helper into the existing Scylla executable setup
paths. This removes the hard-coded 2025.1 package URL and replaces the local
download/unpack/install logic in `scylla_cluster.py` with the shared resolver.
It also makes `--exe-url` use the same cached installer path.
Together, these changes make upgrade-test executable selection less brittle,
avoid duplicated install logic, and provide a reusable foundation for fetching
other Scylla versions in test.py.
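The cache/install flow (retries, a marker file written only after a complete install, and a file lock for concurrent runs) can be sketched as follows. `ensure_installed()` and its `install` callback are hypothetical stand-ins for the download, unpack and install steps; this is not the `version_fetch_utils` API, and it is Unix-only because it uses `fcntl.flock`.

```python
import fcntl
import shutil
import time
from pathlib import Path


def ensure_installed(cache_dir, key, install, retries=3, delay=1.0):
    """Install `key` into cache_dir/key at most once, even across processes.

    `install(target_dir)` stands in for download + unpack + install.
    """
    cache_dir = Path(cache_dir)
    target = cache_dir / key
    marker = target / ".installed"     # written only after a full install
    if marker.exists():                # fast path, no locking needed
        return target
    cache_dir.mkdir(parents=True, exist_ok=True)
    with open(cache_dir / f"{key}.lock", "w") as lock:
        fcntl.flock(lock, fcntl.LOCK_EX)   # block until we own the install
        try:
            if marker.exists():            # another run finished meanwhile
                return target
            for attempt in range(1, max(retries, 1) + 1):
                try:
                    target.mkdir(parents=True, exist_ok=True)
                    install(target)
                    marker.touch()
                    return target
                except OSError:
                    shutil.rmtree(target, ignore_errors=True)  # drop partial state
                    if attempt >= retries:
                        raise
                    time.sleep(delay)
        finally:
            fcntl.flock(lock, fcntl.LOCK_UN)
```

Checking the marker again after taking the lock is what lets a second, concurrent run reuse the installation instead of repeating it.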
Closes scylladb/scylladb#29855
* github.com:scylladb/scylladb:
test/pylib: use version fetcher for Scylla executable setup
test/pylib: add cached Scylla package installer
Ran 'make update' to get the latest version of all dependencies needed to build docs.
Tested with 'make test' only.
Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>
AI-Assisted: no, to my surprise.
Backport: not sure.
Closes scylladb/scylladb#29909
Python tests require different handling of metrics gathering from
cgroups than C++ tests. pytest does not execute each Python test in
a separate process, so we can't put each test into its own cgroup and
get its metrics.
The idea is to put the whole pytest process into the cgroup and get the
metrics. This works because pytest runs its worker threads as completely
separate processes, and each worker runs its tests sequentially.
Additionally, to simplify things, the system resource monitor was moved
to the pytest main thread.
Auth modules (authenticators, role managers, and auth::service) access their configuration options by reaching into db::config through the query processor. This abuses the database as a proxy object for getting configuration.
This series introduces a dedicated auth::config struct that carries the configuration options used by auth modules. The config is populated in main.cc and delivered to each shard via sharded_parameter. This makes the auth service conform to the overall design, where db::config is split into smaller per-service configs on start, thus decoupling individual components/services from the global configuration.
Cleaning up component dependencies; not backporting.
Closes scylladb/scylladb#29870
* github.com:scylladb/scylladb:
auth: Remove unused default_superuser() function
auth: Switch role managers to use auth::config
auth: Switch authenticators to use auth::config
auth: Introduce auth::config and wire it through service
After an internal CAS shard bounce, check_locality() was evaluating
against this_shard_id() of the post-bounce shard — which is the correct
tablet shard — so it returned nullopt, and LWT/SERIAL responses omitted
the tablets-routing-v1 custom payload. The client never learned the
correct tablet map.
Fix by recording the original entry shard in client_state (initialized
to this_shard_id() at construction, preserved across shard bounces via
client_state_for_another_shard) and passing it to check_locality() so
it compares against the client's actual routing decision.
No host_id tracking or forwarded_client_state IDL changes are needed
because CAS shard bounces are always intra-node.
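The essence of the fix is which shard check_locality() compares against. A minimal Python model, where `ClientState`, `for_another_shard()` and the `check_locality()` signature are hypothetical simplifications rather than the ScyllaDB API: after the bounce the request runs on the tablet's own shard, so comparing against the current shard always looks "correctly routed", while comparing against the preserved entry shard reveals the misrouting.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class ClientState:
    original_shard: int   # shard the client originally sent the request to

    def for_another_shard(self):
        # A shard bounce copies the state; the entry shard is preserved.
        return replace(self)


def check_locality(tablet_shard: int, replicas: list, shard: int) -> Optional[dict]:
    """Return routing info for the client, or None if it routed correctly."""
    if shard == tablet_shard:
        return None
    return {"tablet_replicas": replicas}
```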
Fixes SCYLLADB-2041
backport: need to backport to all versions with LWT over tablets
Closes scylladb/scylladb#29910
* https://github.com/scylladb/scylladb:
cql: refactor add_tablet_info to take tablet_routing_info directly
cql: fix UB dereference of nullopt tablet_info in execute_with_condition
test/boost: add regression test for missing tablet routing after CAS bounce
cql: fix missing TABLETS_ROUTING_V1 payload after CAS shard bounce
Add explicit empty permissions block (permissions: {}) since this
workflow only triggers Jenkins and sends Slack notifications using its
own secrets. Also move expression interpolations into env vars to
prevent potential script injection. Fixes code scanning alert #147.
Also remove the pre-existing 'permissions: contents: read' block,
which would result in duplicate YAML keys (invalid per the YAML spec).
Closes scylladb/scylladb#29186
This series adds IDL file comparison to the build system comparison tool and fixes CMake PCH propagation.
1. `scripts/compare_build_systems.py` only compared compilation flags, link targets, and linker settings — it did not compare IDL-generated file sets. This allowed PR #28843 to pass CI despite adding `strong_consistency/groups_manager.idl.hh` to `configure.py` but not to `idl/CMakeLists.txt`.
2. CMake's `scylla-main` target was not using the precompiled header (`stdafx.hh`), even though configure.py applies it to every source file via `-include-pch`. This caused compilation failures for files relying on transitive includes from the PCH — e.g., `sstables_loader.cc` failed with `no member named 'read_entire_stream' in namespace 'seastar::util'`.
Add a 4th comparison check to the build system comparison script: extract IDL-generated file sets from both build systems' ninja files and compare them. The extractors parse ninja build statements — configure.py side filters by build mode, CMake side handles the `|` separator for implicit outputs — and normalize to a canonical relative path for comparison.
Add the missing `strong_consistency/groups_manager.idl.hh` to `idl/CMakeLists.txt`.
Add `target_precompile_headers(scylla-main REUSE_FROM scylla-precompiled-header)` so that all sources compiled under `scylla-main` benefit from the PCH, matching configure.py's behavior.
Update documentation to reflect the new IDL comparison check.
Refs: https://github.com/scylladb/scylladb/pull/29901
Refs: https://github.com/scylladb/scylladb/pull/28843
No backport needed — these are build system improvements only.
Closes scylladb/scylladb#29912
* github.com:scylladb/scylladb:
cmake: reuse precompiled header in scylla-main target
idl: add missing groups_manager.idl.hh to CMakeLists.txt
scripts: add IDL-generated file comparison to compare_build_systems
The paxos state queries (load_paxos_state, save_paxos_promise, etc.)
were using page_size=-1 (no paging). While each query returns at most
one row and paging never actually kicks in, the lack of paging causes
these internal queries to be counted as non-paged reads in the metrics,
which can be confusing to users monitoring their cluster.
Add LIMIT 1 to the SELECT query so that may_need_paging() short-circuits
to false (row_limit <= 1), avoiding pager allocation overhead entirely.
Set page_size=1000 so these queries are no longer reported as non-paged
reads.
Refs: https://scylladb.atlassian.net/browse/CUSTOMER-372
Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>
Backport: no, improvement
Closes scylladb/scylladb#29852
After a recent change (1a32ccd), `make_update_indices_mutations()` unconditionally
adds a mutation for `system.view_building_tasks`, even when no indices are being dropped.
In a mixed-version cluster, the older node may not have this table,
causing the Raft schema applier to fail with 'Can't find a column
family with UUID ...'.
This patch fixes the bug by emitting the mutation only when indices are actually
dropped (i.e., when the view building cleanup code path was entered).
Fixes: SCYLLADB-2026
Refs: scylladb#26557
start_docker_service is a coroutine that took docker_args and
image_args by const reference. Its caller start_fake_gcs_server
is a regular function that passes temporaries (initializer lists)
and immediately returns a future. The temporaries are destroyed
when the caller returns, leaving the coroutine holding dangling
references.
On the first loop iteration this works by luck (memory not yet
reused), but on retry (after "address already in use") the
params.append_range(image_args) reads freed memory, causing
a use-after-free that manifests as std::bad_alloc or broken_promise
in non-sanitizer builds.
Fix by taking docker_args and image_args by value so the coroutine
frame owns the vectors for its entire lifetime.
Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-2003
Closes scylladb/scylladb#29932
DynamoDB normalizes Number values, so different string representations
of the same number (e.g., "1000" vs "1e3") should be treated as the
same value in all contexts.
In Alternator this is true in most cases, thanks to implicit normalization
in the decimal `to_string()` function.
However, this is fragile, and in fact this function should be fixed
due to an OOM vulnerability in CQL use (#8002).
This patch adds tests that should prevent regressions in the cases
that currently work.
Unfortunately, not all contexts work currently: most notably, HASH keys
are not normalized, and the backend handles them by their byte representation.
The added tests reproduce this incorrect behavior.
All added tests pass with DynamoDB, with one exception: oddly,
DynamoDB doesn't recognize unnormalized numbers in BatchGetItem
as duplicate keys.
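The kind of normalization involved can be illustrated with Python's decimal module. This is only an analogy for the DynamoDB semantics, not Alternator's C++ code: `normalize_number()` is hypothetical, its output uses Python's formatting rather than DynamoDB's, and the 38-digit precision reflects DynamoDB's documented limit for Number values.

```python
from decimal import Decimal, localcontext


def normalize_number(text: str) -> str:
    """Canonical string for a DynamoDB-style Number: strip trailing zeros and
    pick a single exponent, so "1000" and "1e3" map to the same string."""
    with localcontext() as ctx:
        ctx.prec = 38   # DynamoDB numbers carry up to 38 significant digits
        return str(Decimal(text).normalize())
```

Comparing raw bytes, as the backend does for HASH keys, treats `"1000"` and `"1e3"` as different keys, while both normalize to `1E+3`.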
Ref SCYLLADB-1575
Closes scylladb/scylladb#29501
After all test suites migrated to test_config.yaml with type: Python,
the specialized suite classes (Topology, CQLApproval, Run, Tool) and
the legacy execution pipeline (find_tests, run_test, TestSuite.run,
Test.run) became unreachable. Remove all this dead code.
Deleted files:
- suite/topology.py, suite/cql_approval.py, suite/run.py, suite/tool.py
Simplified:
- base.py: remove run_test(), read_log(), TestSuite.run(),
add_test_list(), build_test_list(), all_tests(), test_count(),
SUITE_CONFIG_FILENAME, disabled/flaky test tracking, and dead
Test attributes (args, core_args, valid_exit_codes, allure_dir,
is_flaky, is_cancelled, etc.)
- python.py: remove PythonTestSuite.run(), PythonTest.run(),
_prepare_pytest_params(), pattern, test_file_ext, xmlout,
server_log, scylla_env setup, and shlex import.
Simplify run_ctx() to take no parameters.
- runner.py: remove --scylla-log-filename option,
print_scylla_log_filename fixture, SUITE_CONFIG_FILENAME import,
and suite.yaml probe in TestSuiteConfig.from_pytest_node().
- __init__.py: remove re-exports of deleted classes.
- test_config.yaml: Topology -> Python, Approval -> Python.
- conftest files: run_ctx(options=...) -> run_ctx().
- docs/dev/testing.md: update to reflect current pytest-based
architecture, log paths, and removed features.
Co-Authored-By: Claude Opus 4.6 (200K context) <noreply@anthropic.com>
Closes scylladb/scylladb#29613
Replace the hard-coded 2025.1 archive download and local install logic with the
shared Scylla package fetch/install helper. This keeps upgrade-test executable
resolution and `--exe-url` handling on the same cached installer path.
Add utilities to resolve relocatable Scylla artifacts from the public downloads
bucket by version, architecture, package variant, or direct URL. Download,
unpack, and install the selected archive into the test.py cache with retry
handling, marker files, and file locking so repeated or concurrent test runs can
reuse the same installation safely.
After stopping scylla server processes, the FUSE daemon
(fuse2fs) may still be processing file handle closures.
An immediate fusermount3 -u can fail with 'device busy',
causing spurious test failures on teardown.
Retry the unmount up to 10 times with 0.5s delay between
attempts, and capture stderr for diagnostics.
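The retry policy described above (up to 10 attempts, 0.5 s apart, stderr kept for diagnostics) looks roughly like this in Python. `unmount_with_retry()` is a hypothetical helper, and the injectable `run` parameter exists only so the sketch can be exercised without a real FUSE mount.

```python
import subprocess
import time


def unmount_with_retry(mountpoint, attempts=10, delay=0.5, run=subprocess.run):
    """Retry `fusermount3 -u` while the FUSE daemon is still busy."""
    last_error = ""
    for attempt in range(1, attempts + 1):
        result = run(["fusermount3", "-u", str(mountpoint)],
                     capture_output=True, text=True)
        if result.returncode == 0:
            return attempt                   # number of tries it took
        last_error = result.stderr.strip()   # keep stderr for diagnostics
        if attempt < attempts:
            time.sleep(delay)
    raise RuntimeError(
        f"unmounting {mountpoint} failed after {attempts} attempts: {last_error}")
```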
Fixes: SCYLLADB-2049
Closes scylladb/scylladb#29920
The `test_max_cells` test was flaky due to `std::bad_alloc` caused by Seastar buddy allocator fragmentation. The root causes are:
1. The doubling loop with 24 iterations of CREATE/INSERT/DROP fragmented the allocator
2. The test built the whole batch as a single string that takes contiguous memory
Also, some iterations inserted zero rows but still performed CREATE/DROP TABLE, which also contributed to the fragmentation.
This patch series:
- Skips iterations that insert zero rows
- Creates the table once, truncates it after each test iteration
- Switches to prepared statements
Investigation results are presented in detail in https://scylladb.atlassian.net/browse/SCYLLADB-1645
Fixes SCYLLADB-1645
CI stability improvement. Backport to versions that have this test.
Closes scylladb/scylladb#29759
* github.com:scylladb/scylladb:
test: prepare max cells inserts
test: reuse max cells schema
test: limits: skip empty max cells iterations
All callers have been migrated to read the superuser name from
auth::config directly. Remove the now-unused helper that fetched
it from db::config via the query processor.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Convert all role manager implementations to receive their
configuration from auth::config instead of accessing db::config
through the query processor:
- standard_role_manager: reads superuser name from config
- ldap_role_manager: reads LDAP URL template, attribute, bind
credentials, and permissions update interval from config;
passes config to inner standard_role_manager
- maintenance_socket_role_manager: keeps a const reference to
service's config and passes it directly when lazily
constructing standard_role_manager
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
When an RF change shrinks replicas on a DC and the node being shrunk is
excluded, refresh_tablet_load_stats() only provides load_stats for that
node if it has a cached snapshot from when the node was still up. If the
snapshot is missing or predates the tables being shrunk (e.g. they were
created after the node went down), stats stay incomplete. In that case
load_sketch::unload() called from make_rf_change_plan() throws:
Can't provide accurate load computation with incomplete load_stats
for host: <uuid>
Since an excluded node is not expected to come back, load_stats will
never become complete, and the topology coordinator retries the plan
infinitely, hanging ALTER KEYSPACE.
Add a check for excluded nodes and skip unload() for them: we are
removing the replica, so accurate load data for that node is not
needed. For all other node states the throw-and-retry behavior is
preserved.
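The control flow of the fix can be modelled in a few lines of Python. `LoadSketch`, `remove_replica()` and the node-state strings are hypothetical simplifications of load_sketch and make_rf_change_plan(): excluded nodes skip `unload()` entirely, while every other state still raises on incomplete stats so the coordinator retries.

```python
class IncompleteLoadStats(Exception):
    pass


class LoadSketch:
    """Toy per-host tablet counts; None means load_stats are missing."""

    def __init__(self, tablets_per_host):
        self.tablets_per_host = dict(tablets_per_host)

    def unload(self, host):
        count = self.tablets_per_host.get(host)
        if count is None:
            raise IncompleteLoadStats(
                "Can't provide accurate load computation with incomplete "
                f"load_stats for host: {host}")
        self.tablets_per_host[host] = count - 1


def remove_replica(sketch, host, node_state):
    if node_state == "excluded":
        # The node is not coming back, so its load is irrelevant; skipping
        # unload() avoids a plan that can never succeed.
        return False
    sketch.unload(host)   # other states keep throw-and-retry semantics
    return True
```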
Modify test_excludenode_shrink_rf to always trigger the bug: a new
error injection 'force_down_node_load_stats_invalid' forces the
invalid-stats path in refresh_tablet_load_stats() for a down node, so
the test does not depend on whether the load-stats refresher happened
to cache the excluded node's stats while it was still up.
Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-1702.
Closes scylladb/scylladb#29622
Convert all authenticator implementations to receive their
configuration from auth::config instead of accessing db::config
through the query processor:
- password_authenticator: reads superuser name and salted password
from config, stores them as members
- saslauthd_authenticator: reads socket path from config
- certificate_authenticator: reads role queries from config
- transitional_authenticator: passes config to inner
password_authenticator
- maintenance_socket_authenticator: inherits new constructor
via using declaration
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Add a dedicated auth::config struct that carries all configuration
options needed by auth modules. The config is created per-shard using
sharded_parameter to ensure updateable_value fields are shard-local.
The config is stored as a member in auth::service and passed by
const reference to factories so that each auth module can receive its
configuration when constructed. The modules themselves are not yet
converted — they still read from db::config via the query processor.
The stored config is also used in describe_roles() to read the
superuser name, eliminating the default_superuser() call that reached
into db::config via the query processor.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Change add_tablet_info() to accept locator::tablet_routing_info instead
of destructured (tablet_replica_set, token_range) pair. This simplifies
all three call sites.
Remove the empty-replicas guard inside add_tablet_info(): the only
producer of tablet_routing_info is tablet ERM's check_locality(), which
returns either nullopt (correctly routed) or info with replicas copied
from tablet_info — a tablet always has replicas. All callers already
check for nullopt before calling add_tablet_info(), so by the time we
enter the function replicas are guaranteed non-empty.
When check_locality() returns nullopt (correctly routed LWT), the
optional tablet_info was unconditionally dereferenced in the lambda
capture list: tablet_info->tablet_replicas, tablet_info->token_range.
The code previously masked this by initializing tablet_info with an
empty-but-present value, so the dereference happened to work but
only because the empty tablet_replicas made add_tablet_info() a no-op.
After check_locality() overwrites it with nullopt, the dereference
is UB.
Fix by initializing tablet_info as empty (nullopt) and guarding the
dereference.
Add test_tablet_routing_info_after_cas_shard_bounce that verifies
TABLETS_ROUTING_V1 payload is returned after an internal CAS shard
bounce.
The test simulates the transport-layer bounce: it creates a table whose
single tablet replica lands on a shard different from the test thread,
executes an LWT (which bounces), then transfers client_state via
client_state_for_another_shard (preserving _original_shard) and
re-executes on the tablet shard. The test asserts that check_locality()
correctly detects the misrouting and returns tablet routing info.
Refs SCYLLADB-2041
After an internal CAS shard bounce, check_locality() was evaluating
against this_shard_id() of the post-bounce shard — which is the correct
tablet shard — so it returned nullopt, and LWT/SERIAL responses omitted
the tablets-routing-v1 custom payload. The client never learned the
correct tablet map.
Fix by recording the original entry shard in client_state (initialized
to this_shard_id() at construction, preserved across shard bounces via
client_state_for_another_shard) and passing it to check_locality() so
it compares against the client's actual routing decision.
No host_id tracking or forwarded_client_state IDL changes are needed
because CAS shard bounces are always intra-node.
Fixes SCYLLADB-2041
scylla-precompiled-header defines the PCH (stdafx.hh) with PRIVATE
visibility, so targets linking to it do not inherit the PCH.
scylla-main was missing the PCH entirely, causing files like
sstables_loader.cc to fail with 'no member read_entire_stream' since
that symbol comes from <seastar/util/short_streams.hh> which is
included in stdafx.hh.
PR #29901 worked around this by adding the missing #include directly,
but the real fix is to propagate the PCH to scylla-main — matching
the configure.py behavior where every source file is compiled with
-include-pch stdafx.hh.pch.
Add target_precompile_headers(scylla-main REUSE_FROM
scylla-precompiled-header) so that all sources in scylla-main benefit
from the precompiled header.
Refs: https://github.com/scylladb/scylladb/pull/29901
PR #28843 added strong_consistency/groups_manager.idl.hh to
configure.py but not to idl/CMakeLists.txt, causing the CMake build
to fail with a missing generated header.
Add a 4th check that compares IDL-generated file sets between
configure.py and CMake. Previously only compilation flags, link
targets, and linker settings were compared — a missing IDL entry
(like strong_consistency/groups_manager.idl.hh in PR #28843) would
go undetected.
The extractors parse ninja build statements from both systems and
normalize to a canonical relative path (e.g. cache_temperature.dist.hh)
for comparison. configure.py outputs are filtered by mode; CMake
outputs handle the | separator for implicit outputs in ninja build
lines.
Also update the documentation to mention the new check.
Switch from raw CQL batch string to using a prepared statement.
The old approach constructed the entire 50-row batch as a single
CQL text string (~19.8 MiB with 32768 column names spelled out
per row). This caused large contiguous allocations in the server.
Fixes SCYLLADB-1645
Extract table creation into _create_max_cell_count_table(). Call
it once before the loop instead of creating and dropping the table
on every iteration. Use TRUNCATE instead of DROP TABLE between
iterations to clear data while keeping the schema.
This avoids repeated schema operations that fragment the Seastar
buddy allocator's address space with scattered small allocations.
Refs SCYLLADB-1645
Before this patch,
```
test/cqlpy/run test_vector_search_with_vector_store_mock.py
```
took 34 seconds.
After this patch, it takes **1 second**.
Look at the individual patches for how the magic happened. The first patch lowers the test duration from 34 to 5 seconds, the second patch lowers it further to 1 second.
Closes scylladb/scylladb#29891
* github.com:scylladb/scylladb:
test/cqlpy: make test_vector_search_with_vector_store_mock faster
vector-search: reset DNS timeout after changing host
The doubling loop in test_max_cells started from cells=1. Since
each row has MAX_CELLS_COLUMNS (32768) cells, iterations where
cells < MAX_CELLS_COLUMNS produced zero rows (cells // columns = 0).
Those iterations only did CREATE TABLE / DROP TABLE with no data
inserted.
Start the loop from MAX_CELLS_COLUMNS and use a while loop.
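The effect of the new starting point can be checked with a small generator. `max_cells_iterations()` is hypothetical, and the upper limit of 2**23 cells is an assumption inferred from "24 iterations" of a doubling loop that started at 1. With 32768 = 2**15 columns per row, the old loop had 15 zero-row iterations (cells from 2**0 to 2**14); starting at one full row leaves 9 iterations, each inserting at least one row.

```python
MAX_CELLS_COLUMNS = 32768   # cells per row in test_max_cells


def max_cells_iterations(limit):
    """Yield (cells, rows) for a doubling loop that starts at one full row,
    so every iteration inserts at least one row."""
    cells = MAX_CELLS_COLUMNS
    while cells <= limit:
        yield cells, cells // MAX_CELLS_COLUMNS
        cells *= 2
```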
Co-authored-by: Dario Mirovic <dario.mirovic@scylladb.com>
Refs SCYLLADB-1645
The existing OCI section in admin.rst was a minimal stub that only showed
a config snippet without explaining how to actually set up connectivity.
Add documentation for:
- The OCI S3-compatible endpoint URL format (namespace + region)
- That credentials must be set explicitly via AWS_ACCESS_KEY_ID /
AWS_SECRET_ACCESS_KEY using OCI Customer Secret Keys (unlike AWS,
OCI has no instance metadata fallback compatible with STS/EC2)
- A note that iam_role_arn is AWS-specific and should be omitted for OCI
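As a rough illustration of the two requirements, the endpoint can be derived from the namespace and region, and the credentials must come from the environment. The helper names are hypothetical and not part of ScyllaDB's configuration. The `<namespace>.compat.objectstorage.<region>.oraclecloud.com` host format is our understanding of Oracle's S3 compatibility endpoint and should be checked against Oracle's documentation.

```python
import os


def oci_s3_endpoint(namespace, region):
    """S3-compatible endpoint for OCI Object Storage (namespace + region)."""
    return f"https://{namespace}.compat.objectstorage.{region}.oraclecloud.com"


def oci_credentials(env=os.environ):
    """OCI has no EC2-style metadata fallback, so both variables must be set
    explicitly (values come from an OCI Customer Secret Key)."""
    missing = [k for k in ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")
               if not env.get(k)]
    if missing:
        raise RuntimeError(
            "missing OCI Customer Secret Key variables: " + ", ".join(missing))
    return env["AWS_ACCESS_KEY_ID"], env["AWS_SECRET_ACCESS_KEY"]
```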
Fixes: SCYLLADB-501
Closes scylladb/scylladb#29689