Commit Graph

53948 Commits

Author SHA1 Message Date
Botond Dénes
830d28a889 Merge 'Use standard helpers to create ks:cf and populate it in test_backup.py' from Pavel Emelyanov
The PR removed the create_and_ks() helper from backup test and patches all callers to create keyspace, table and populate them with standard explicit facilities. While patching it turned out that one test doesn't need to populate the table, so it even becomes tiny bit shorter and faster

Enhancing test, not backporting

Closes scylladb/scylladb#29417

* github.com:scylladb/scylladb:
  test_backup: Remove create_ks_and_cf helper Test
  test_backup: Replace create_ks_and_cf with async patterns Test
  test_backup: Add if-True blocks for indentation Test
2026-04-16 12:54:21 +03:00
Nikos Dragazis
7abcf94823 test: Use arbitrary tokens in vnodes->tablets migration tests
The migration tests used to start nodes with pre-computed power-of-two
tokens. This was required because the migration itself only supported
power-of-two aligned tokens. Now that arbitrary tokens are supported,
switch the tests to use Scylla's default random token assignment.

Switching to arbitrary tokens makes the tests non-deterministic, but the
migration aspects that are affected by the token distribution
(resharding, wrap-around vnode split) are out of scope for these tests
and covered by dedicated tests.

Add a `get_all_vnode_tokens()` helper that queries system.topology at
runtime to discover the actual token layout, and derive expected tablet
counts from that.

Also account for the possible extra wrap-around tablet when the last
vnode token does not coincide with MAX_TOKEN.

Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>
2026-04-16 12:47:27 +03:00
Nikos Dragazis
26f0c038af test: boost: Add test for wrap-around vnodes
Add a Boost test to verify that `prepare_for_tablets_migration()`
produces the correct tablet boundaries when a wrap-around vnode exists.

Tablets cannot wrap around the token ring as vnodes do; the last token
of the last tablet must always be MAX_TOKEN. When the last vnode token
does not coincide with MAX_TOKEN, the wrap-around vnode must be split
into two tablets.

The test is parameterized over both cases: unaligned (split expected)
and aligned (no split expected).

Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>
2026-04-16 12:47:16 +03:00
Botond Dénes
c355df4461 Merge 'test: Lower default log level from DEBUG to INFO' from Artsiom Mishuta
1. test.py — Removed --log-level=DEBUG flag from pytest args
2. test/pytest.ini — Changed log_level to INFO (that was set DEBUG in test.py), changed log_file_level from DEBUG to INFO, added clarifying comments

+minor fix

[test/pylib: save logs on success only during teardown phase](0ede308a04)
Previously, when --save-log-on-success was enabled, logs were saved
for every test phase (setup, call, teardown)in 3 files. Restrict it to only
the teardown phase, that contains all 3 in case of test success,
to avoid redundant log entries.

Closes scylladb/scylladb#29086

* github.com:scylladb/scylladb:
  test/pylib: save logs on success only during teardown phase
  test: Lower default  log level from DEBUG to INFO
2026-04-16 12:46:11 +03:00
Nikos Dragazis
098732ff76 storage_service: Support vnodes->tablets migrations w/ arbitrary tokens
The vnodes-to-tablets migration creates tablet maps that mirror the
vnode layout: one tablet per vnode, preserving token boundaries and
replica placement. However, due to tablet restrictions, the migration
requires vnode tokens to be a power of two and uniformly distributed
across the token ring.

In practice, this restriction is too limiting. Real clusters use
randomly generated tokens and a node's token assignment is immutable.

To solve this problem, prior work (01fb97ee78) has been done to relax
the tablet constraints by allowing arbitrary tablet boundaries, removing
the requirement for power-of-two sizing and uniform distribution.

This patch leverages the relaxed tablet constraints to enable tablet map
creation from arbitrary vnode tokens:

* Removes all token-related constraints.
* Handles wrap-around vnodes. If a vnode wraps (i.e., the highest vnode
  token is not `dht::token::last()`), it is split into two tablets:
  - (last_vnode_token, dht::token::last()]
  - [dht::token::first(), first_vnode_token]

The migration ops guide has been updated to remove the power-of-two
constraint.

Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>
2026-04-16 12:39:23 +03:00
Nikos Dragazis
8ea8c05120 storage_service: Hoist migration precondition
`prepare_for_tablets_migration()` is idempotent; it filters out tables
that already have tablet maps and returns early if no tablet maps need
to be created. However, this precondition is currently misplaced. Move
it higher to skip extra work.

Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>
2026-04-16 12:19:34 +03:00
Botond Dénes
9bfcc25cf7 Merge 'streaming: stream_blob: hold table for streaming' from Michael Litvak
When initializing streaming sources in tablet_stream_files_handler we
use a reference to the table. We should hold the table while doing so,
because otherwise the table may be dropped and destroyed when we yield.
Use the table.stream_in_progress() phaser to hold the table while we
access it.

For sstable file streaming we can release the table after the snapshot
is initialized, and the table may be dropped safely because the files
are held by the snapshot and we don't access the table anymore. There
was a single access to the table for logging but it is replaced by a
pre-calculated variable.

For logstor segment streaming, currently it doesn't support discarding
the segments while they are streamed - when the table is dropped it
discard the segments by overwriting and freeing them, so they shouldn't
be accessed after that. Therefore, in that case continue to hold the
table until streaming is completed.

Fixes [SCYLLADB-1533](https://scylladb.atlassian.net/browse/SCYLLADB-1533)

It's a pre-existing use-after-free issue in sstable file streaming so should be backported to all releases.
It's also made worse with the recent changes of logstor, and affects also non-logstor tables, so the logstor fixes should be in the same release (2026.2).

[SCYLLADB-1533]: https://scylladb.atlassian.net/browse/SCYLLADB-1533?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ

Closes scylladb/scylladb#29488

* github.com:scylladb/scylladb:
  test: test drop table during streaming
  streaming: stream_blob: hold table for streaming
2026-04-16 12:12:42 +03:00
Dario Mirovic
50e498ac0d test/lib: fix typos in proc_utils, gcs_fixture, and dockerized_service
Fix assorted typos in comments, strings, and identifiers:
- path_preprend -> path_prepend (proc_utils.hh, proc_utils.cc)
- laúnch -> launch (proc_utils.cc)
- hand/fail -> hang/fail (dockerized_service.py)
- inconvinient -> inconvenient (dockerized_service.py)
- priviledges -> privileges (gcs_fixture.hh)
- remove double semicolon (gcs_fixture.cc)

Refs SCYLLADB-1542
2026-04-16 10:58:55 +02:00
Dario Mirovic
11b5997eaf test: gcs_fixture: rename container from "local-kms" to "fake-gcs-server"
The GCS fixture's fake-gcs-server container was named "local-kms",
copy-pasted from the AWS KMS fixture. It happened when both were
refactored to use the shared start_docker_service helper (bc544eb08e).

Rename to "fake-gcs-server" to match the Python-side naming and avoid
confusion in logs.

Refs SCYLLADB-1542
2026-04-16 10:58:52 +02:00
Dario Mirovic
dc7f848bf8 test: fix proc_utils.cc formatting from previous commit
Fix indentation of lines moved inside the for-loop in
start_docker_service (lines 208-225).

Refs SCYLLADB-1542
2026-04-16 10:55:48 +02:00
Dario Mirovic
be4d32c474 test: lib: use unique container name per retry attempt
The container name is generated once before the retry loop, so
all retry attempts reuse the same name. Move the name generation
inside the loop so each attempt gets a fresh name via the
incrementing counter, consistent with the comment "publish port
ephemeral, allows parallel instances".

Formatting changes (indentation) of lines 208-225 in test/lib/proc_utils.cc
will be fixed in the next commit.

Refs SCYLLADB-1542
2026-04-16 10:55:04 +02:00
Botond Dénes
33682fd14e Merge 'sstables/storage_manager: fix race between object storage config update and keyspace creation' from Dimitrios Symonidis
Previously, config_updater used a serialized_action to trigger update_config() when object_storage_endpoints changed. Because serialized_action::trigger() always schedules the action as a new reactor task (via semaphore::wait().then()), there was a window between the config value becoming visible to the REST API and update_config() actually running. This allowed a concurrent CREATE KEYSPACE to see the new endpoint via is_known_endpoint() before storage_manager had registered it in _object_storage_endpoints.

Now config observers run synchronously in a reactor turn and must not suspend. Split the previous monolithic async update_config() coroutine  into two phases:

- Sync (in the observer, never suspends): storage_manager::_object_storage_endpoints is updated in place; for already-instantiated clients, update_config_sync swaps the new config atomically
- Async (per-client gate): background fibers finish the work that can't run in the observer — S3 refreshes credentials under _creds_sem; GCS drains and closes the replaced client.

Config reloads triggered by SIGHUP are applied on shard 0 and then broadcast to all other shards. An rwlock has been also introduced to make sure that the configuration has been propagated to all cores. This guarantees that a client requesting a config via the REST API will see a consistent snapshot

Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-757
Fixes: [28141](https://github.com/scylladb/scylladb/issues/28141)

Closes scylladb/scylladb#28950

* github.com:scylladb/scylladb:
  test/object_store: verify object storage client creation and live reconfiguration
  sstables/utils/s3: split config update into sync and async parts
  test_config: improve logging for wait_for_config API
  db: introduce read-write lock to synchronize config updates with REST API
2026-04-16 10:20:43 +03:00
Michael Litvak
43c76aaf2b logstor: split log record to header and data
Split the `log_record` to `log_record_header` type that has the record
metadata fields and the mutation as a separate field which is the actual
record data:

struct log_record {
    log_record_header header;
    canonical_mutation mut;
};

Both the header and mutation have variable serialized size. When a
record is serialized in a write_buffer, we first put a small
`record_header` that has the header size and data size, then the
serialized header and data follow. The `log_location` of a record points
to the beginning of the `record_header`, and the size includes the
`record_header`.

This allows us to read a record header without reading the data when
it's not needed and avoid deserializing it:
* on recovery, when scanning all segments, we read only the record
  headers.
* on compaction, we read the record header first to determine if the
  record is alive, if yes then we read the data.

Closes scylladb/scylladb#29457
2026-04-16 10:00:35 +03:00
Botond Dénes
8e7ba7efe2 Merge 'commitlog: fix segment replay order by using ordered map per shard' from Sergey Zolotukhin
The commitlog replayer groups segments by shard using a
std::unordered_multimap, then iterates per-shard segments via
equal_range(). However, equal_range() does not guarantee iteration
order for elements with the same key, so segments could be replayed
out of order within a shard.

Correct segment ordering is required for:
- Fragmented entry reconstruction, which accumulates fragments across
  segments and depends on ascending order for efficient processing.
- Commitlog-based storage used by the strongly consistent tables
  feature, which relies on replayed raft items being stored in order.

Fix by changing the data structure from
  std::unordered_multimap<unsigned, commitlog::descriptor>
to
  std::unordered_map<unsigned, utils::chunked_vector<commitlog::descriptor>>

Since the descriptors are inserted from a std::set ordered by ID, the
vector preserves insertion (and thus ID) order. The per-shard iteration
now simply iterates the vector, guaranteeing correct replay order.

Fixes: SCYLLADB-1411

Backport: It looks like this issue doesn't cause any trouble, and is required only by the strong consistent tables, so no backporting required.

Closes scylladb/scylladb#29372

* github.com:scylladb/scylladb:
  commitlog: add test to verify segment replay order
  commitlog: fix replay order by using ordered map per shard
2026-04-16 09:55:27 +03:00
Łukasz Paszkowski
61877e9dfb test/cluster/storage: Add a reproducer for load-and-stream out-of-space rejection
Add `test_load_and_stream_rejected_on_critical_disk` which verifies
that `nodetool refresh --load-and-stream` is rejected when the target
node reaches critical disk utilization during streaming. The test is
marked xfail because the stream_mutation_fragments handler does not
yet check whether the node is in the critical disk utilization mode
(introduced in the next patch).

The test sets up a 3-node cluster, writes data and snapshots SSTables
on one node, wipes another node's data, and copies the snapshot to its
upload directory. It then starts load-and-stream and uses the
`write_components_writer_created` error injection to pause SSTable writing.
While paused, the test fills the disk past the critical threshold, then
releases the injection. The next streamed mutation fragment is rejected
with critical_disk_utilization_exception.

The test verifies that:

- The operation fails with the expected error.
- No data is persisted on the target node.
- Partial SSTable files created during streaming are deleted (via the
  implicit mark-for-deletion mechanism in the SSTable lifecycle).
2026-04-16 08:38:34 +02:00
Łukasz Paszkowski
8d34127684 sstables: clean up TemporaryHashes file in wipe()
The TemporaryHashes.db.tmp file is created during SSTable writing to
store intermediate bloom filter hashes and is deleted before the SSTable
is sealed. Since it is not tracked in the TOC, it is also absent from
_recognized_components and all_components().

When an SSTable write fails before sealing (e.g. streaming rejected due
to critical disk utilization), wipe() is called to clean up the partial
SSTable. However, wipe() only iterates over all_components(), so the
TemporaryHashes file was left behind as an orphan.

Previously, the only cleanup mechanism for this file was the
startup-time directory scanner in sstable_directory, which would not
help when the orphan needs to be cleaned up at runtime.

Explicitly remove the TemporaryHashes file in wipe(), ignoring ENOENT
for the common case where the file was already removed before sealing.
2026-04-16 08:38:34 +02:00
Łukasz Paszkowski
159675e975 sstables: add error injection point in write_components
Add a `write_components_writer_created` error injection point in
`sstable::write_components()` between writer creation and fragment
consumption.

This injection is needed by the out-of-space streaming test (added in
the next patch) to reliably pause SSTable writing at the right moment:
after the SSTable writer has been created and files exist on disk, but
before mutation fragments are consumed.

Pausing earlier (before writer creation) would not work because there
are no files on disk yet, while pausing later (after consuming fragments)
would be too late to reliably push the node into critical disk utilization.
2026-04-16 08:38:34 +02:00
Łukasz Paszkowski
d1a24aa16a test/cluster/storage: extract validate_data_existence to module scope
Move validate_data_existence out of test_user_writes_rejection into
module scope so it can be reused by other tests in the file. No
functional change.
2026-04-16 08:38:33 +02:00
Łukasz Paszkowski
9c82b76755 test/cluster: enable suppress_disk_space_threshold_checks in tests using data_file_capacity
Tests that override disk capacity via the data_file_capacity config
option trigger the disk space monitor's critical utilization mode and
as a consequence activate out-of-space prevention mechanisms.

This will cause bootstrap failures with critical_disk_utilization_exception
during mutation-based streaming introduced later in the series.

Enable the `suppress_disk_space_threshold_checks` error injection at
startup in the affected tests to prevent the disk space monitor from
interfering with the test-configured capacity values.

Affected tests:
- test_balance_empty_tablets (test/cluster/test_size_based_load_balancing.py)
- test_load_stats_on_coordinator_failover (test/cluster/test_tablet_stats.py)
2026-04-16 08:38:33 +02:00
Łukasz Paszkowski
3726e31c03 utils/disk_space_monitor: add error injection to suppress threshold checks
Add the `suppress_disk_space_threshold_checks` error injection point
to the disk space monitor. When enabled, the threshold listener
short-circuits without evaluating disk utilization.

This is useful for tests that override disk capacity via `data_file_capacity`,
where the real disk usage causes the monitor to incorrectly report
critical utilization and activate out-of-space prevention mechanisms.
2026-04-16 08:38:33 +02:00
Pavel Emelyanov
335261f351 cql3: Move enable_create_table_with_compact_storage to cql_config
Move enable_create_table_with_compact_storage option from db::config to
cql_config. This improves separation of concerns by consolidating CQL-specific
table creation policies in the cql_config structure. Update the CREATE TABLE
statement prepare() function to use the new location for the configuration check.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-16 08:52:20 +03:00
Pavel Emelyanov
f20ede79f9 cql3: Move strict_is_not_null_in_views to cql_config
Move strict_is_not_null_in_views option from db::config to cql_config via
new view_restrictions sub-struct. This improves separation of concerns by
keeping view-specific validation policies with other CQL configuration.
Update prepare_view() to take view_restrictions reference instead of reaching
into db::config, and update all callsites to pass the sub-struct.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-16 08:52:19 +03:00
Pavel Emelyanov
027c91f45e cql3: Move restrict_future_timestamp to cql_config
Move restrict_future_timestamp option from db::config to cql_config. This
improves separation of concerns as timestamp validation is part of CQL query
execution behavior. Update validate_timestamp() function signature to take
cql_config reference instead of db::config, and update all callsites in
modification_statement and batch_statement to pass cql_config.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-16 08:51:53 +03:00
Pavel Emelyanov
7264581881 cql3: Move TWCS restriction options to cql_config
Move twcs_max_window_count and restrict_twcs_without_default_ttl options
from db::config to cql_config via new twcs_restrictions sub-struct. This
improves separation of concerns by keeping TWCS-specific validation policies
with other CQL configuration. Update check_restricted_table_properties()
to remove unused db parameter and take twcs_restrictions reference instead.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-16 08:51:52 +03:00
Pavel Emelyanov
8b853505cd cql3: Move keyspace restriction options to cql_config
Introduce replication_restrictions, a sub-struct of cql_config, to hold
the seven keyspace-level policy options that govern how CREATE/ALTER
KEYSPACE statements are validated:
  - restrict_replication_simplestrategy
  - replication_strategy_warn_list / replication_strategy_fail_list
  - minimum/maximum_replication_factor_warn/fail_threshold

Pass replication_restrictions into check_against_restricted_replication_strategies()
instead of having it reach into db::config directly (via both
qp.db().get_config() and qp.proxy().data_dictionary().get_config()).

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-16 08:51:24 +03:00
Benny Halevy
ce00d61917 db: implement large_data virtual tables with feature flag gating
Replace the physical system.large_partitions, system.large_rows, and
system.large_cells CQL tables with virtual tables that read from
LargeDataRecords stored in SSTable scylla metadata (tag 13).

The transition is gated by a new LARGE_DATA_VIRTUAL_TABLES cluster
feature flag:

- Before the feature is enabled: the old physical tables remain in
  all_tables(), CQL writes are active, no virtual tables are registered.
  This ensures safe rollback during rolling upgrades.

- After the feature is enabled: old physical tables are dropped from
  disk via legacy_drop_table_on_all_shards(), virtual tables are
  registered on all shards, and CQL writes are skipped via
  skip_cql_writes() in cql_table_large_data_handler.

Key implementation details:

- Three virtual table classes (large_partitions_virtual_table,
  large_rows_virtual_table, large_cells_virtual_table) extend
  streaming_virtual_table with cross-shard record collection.

- generate_legacy_id() gains a version parameter; virtual tables
  use version 1 to get different UUIDs than the old physical tables.

- compaction_time is derived from SSTable generation UUID at display
  time via UUID_gen::unix_timestamp().

- Legacy SSTables without LargeDataRecords emit synthetic summary
  rows based on above_threshold > 0 in LargeDataStats.

- The activation logic uses two paths: when the feature is already
  enabled (test env, restart), it runs as a coroutine; when not yet
  enabled, it registers a when_enabled callback that runs inside
  seastar::async from feature_service::enable().

- sstable_3_x_test updated to use a simplified large_data_test_handler
  and validate LargeDataRecords in SSTable metadata directly.
2026-04-16 08:49:02 +03:00
Benny Halevy
cb6004b625 db: call initialize_virtual_tables from shard 0 only
Move the smp::invoke_on_all dispatch from the callers into
initialize_virtual_tables() itself, so the function is called
once from shard 0 and internally distributes the per-shard
virtual table setup to all shards.

This simplifies the callers and allows a single place to add
cross-shard coordination logic (e.g. feature-gated table
registration) in future commits.
2026-04-16 08:49:02 +03:00
Benny Halevy
90d4ff34fb test: add LargeDataRecords round-trip unit tests
Add three new test cases to sstable_3_x_test.cc that verify the
LargeDataRecords metadata written by the SSTable writer can be read
back after open_data():

- test_large_data_records_round_trip: verifies partition_size, row_size,
  and cell_size records are written with correct field semantics when
  thresholds are exceeded
- test_large_data_records_top_n_bounded: verifies the bounded min-heap
  keeps only the top-N largest entries per type
- test_large_data_records_none_when_below_threshold: verifies no records
  are written when data is below all thresholds

Also wire large_data_records_per_sstable from db_config into the test
env's sstables_manager::config so that config changes propagate through
the updateable_value chain to configure_writer().
2026-04-16 08:49:02 +03:00
Benny Halevy
1f7faeef57 sstables: populate LargeDataRecords from writer
During compaction (SSTable writing), maintain bounded min-heaps (one per
large_data_type) that collect the top-N above-threshold records.  On
stream end, drain all five heaps into a single LargeDataRecords array
and write it into the SSTable's scylla metadata component.

Five separate heaps are used:
- partition_size, row_size, cell_size: ordered by value (size bytes)
- rows_in_partition, elements_in_collection: ordered by elements_count

A new config option 'compaction_large_data_records_per_sstable' (default
10) controls the maximum number of records kept per type.
2026-04-16 08:49:02 +03:00
Benny Halevy
8f4976f65d large_data_handler: return above_threshold_result from maybe_record_large_cells
Change maybe_record_large_cells to return above_threshold_result with
separate booleans for cell size (.size) and collection elements
(.elements) thresholds.  This allows the writer to track above_threshold
counts for cell_size and elements_in_collection independently.
2026-04-16 08:49:02 +03:00
Benny Halevy
c1b797f288 large_data_handler: rename partition_above_threshold to above_threshold_result
Rename partition_above_threshold to above_threshold_result and its
'rows' field to 'elements', making it a generic struct that can be
reused for other large data types (e.g., cells with collection
elements).

Use designated initializers for clarity.
2026-04-16 08:49:02 +03:00
Benny Halevy
d92cd42fe6 sstables: add LargeDataRecords metadata type (tag 13)
Add a new scylla metadata component LargeDataRecords (tag 13) that
stores per-SSTable top-N large data records.  Each record carries:
  - large_data_type (partition_size, row_size, cell_size, etc.)
  - binary serialized partition key and clustering key
  - column name (for cell records)
  - value (size in bytes)
  - element count (rows or collection elements, type-dependent)
  - range tombstones and dead rows (partition records only)

The struct uses disk_string<uint32_t> for key/name fields and is
serialized via the existing describe_type framework into the SSTable
Scylla metadata component.

Add JSON support in scylla-sstable and format documentation.
2026-04-16 08:49:01 +03:00
Benny Halevy
85e2c6f2a7 sstables: add fmt::formatter for large_data_type
Add a fmt::formatter specialization for sstables::large_data_type and
use it in scylla-sstable.cc instead of the local to_string() overload,
which is removed.
2026-04-16 08:42:54 +03:00
Benny Halevy
d4283d0ffc keys: move key_to_str() to keys/keys.hh
Move the key_to_str() template function from a file-local static in
db/large_data_handler.cc to keys/keys.hh so it can be reused by:
- large_data_handler.cc for log messages
- virtual tables (db/virtual_tables.cc) for converting binary keys
  to human-readable CQL display
- scylla-sstable for JSON output of LargeDataRecords

No functional change.
2026-04-16 08:42:54 +03:00
Pavel Emelyanov
1af26a1dd6 cql3: Move batch_size_fail_threshold_in_kb to cql_config
The batch_size_fail_threshold_in_kb option controls the batch size at
which an oversized batch error is returned to the client. It belongs in
cql_config rather than db::config as it directly governs CQL batch
statement behavior.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-16 07:57:27 +03:00
Pavel Emelyanov
4d255cf533 cql3: Move batch_size_warn_threshold_in_kb to cql_config
The batch_size_warn_threshold_in_kb option controls the batch size at
which a client warning is emitted during batch execution. It belongs in
cql_config rather than db::config as it directly governs CQL batch
statement behavior.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-16 07:57:27 +03:00
Pavel Emelyanov
a3f097f100 cql3: Move enable_parallelized_aggregation to cql_config
The enable_parallelized_aggregation option controls whether aggregation
queries are fanned out across shards for parallel execution. It belongs
in cql_config rather than db::config as it directly governs CQL query
behavior at prepare time.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-16 07:57:27 +03:00
Pavel Emelyanov
4314fc0642 cql3: Move strict_allow_filtering to cql_config
The strict_allow_filtering option controls whether queries that require
ALLOW FILTERING are silently accepted, warned about, or rejected. It
belongs in cql_config rather than db::config as it directly governs CQL
query behavior at prepare time.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-16 07:57:26 +03:00
Pavel Emelyanov
3411ed8bcc cql3: Move select_internal_page_size to cql_config
The select_internal_page_size option controls CQL query execution
behavior (internal paging for aggregate/filtered SELECTs) and belongs
in cql_config rather than being read directly from db::config at
execution time.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-16 07:57:26 +03:00
Pavel Emelyanov
728eb20b42 test: Fix cql_test_env to use updateable cql_config from db::config
The test environment was creating cql_config with hardcoded default values that
were never updated when system.config was modified via CQL. This broke tests
that dynamically change configuration values (e.g., TWCS tests).

Fix by creating cql_config from db::config using sharded_parameter, which
ensures updateable_value fields track the actual db::config sources and reflect
changes made during test execution.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2026-04-16 07:57:26 +03:00
Pavel Emelyanov
60a834d9fa cql3: Add cql_config parameter to parsed_statement::prepare()
Pass cql_config to prepare() so that statement preparation can use
CQL-specific configuration rather than reaching into db::config
directly.

Callers that use default_cql_config:
- db/view/view.cc: builds a SELECT statement internally to compute view
  restrictions, not in response to a user query
- cql3/statements/create_view_statement.cc: same -- parses the view's
  WHERE clause as a synthetic SELECT to extract restrictions
- tools/schema_loader.cc: offline schema loading tool, no runtime
  config available
- tools/scylla-sstable.cc: offline sstable inspection tool, no runtime
  config available

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-16 07:57:25 +03:00
Nadav Har'El
f0e9177130 Merge 'audit/alternator: Make Alternator requests audited' from Piotr Szymaniak
Each Alternator API call results in the request being audited, provided the auditing is enabled.
Both successful as well as the failed requests are audited, with few exceptions.

The chosen audit types for the operations:
- CreateTable - DDL
- DescribeTable - QUERY
- DeleteTable - DDL
- UpdateTable - DDL
- PutItem - DML
- UpdateItem - DML
- GetItem - QUERY
- DeleteItem - DML
- ListTables - QUERY
- Scan - QUERY
- DescribeEndpoints - QUERY
- BatchWriteItem - DML
- BatchGetItem - QUERY
- Query - QUERY
- TagResource - DDL
- UntagResource - DDL
- ListTagsOfResource - QUERY
- UpdateTimeToLive - DDL
- DescribeTimeToLive - QUERY
- ListStreams - QUERY
- DescribeStream - QUERY
- GetShardIterator - QUERY
- GetRecords - QUERY
- DescribeContinuousBackups - QUERY

FIXME: The tests are now covering the new functionality only partially.

Fixes: scylladb/scylla-enterprise#3796
Fixes: SCYLLADB-467

No need to backport, new functionality.

Closes scylladb/scylladb#27953

* github.com:scylladb/scylladb:
  audit/alternator: support audit_tables=alternator.<table> shorthand
  audit/alternator: Add negative audit tests
  audit/alternator: Add testing of auditing
  audit/alternator: Audit requests
  audit/alternator: Refactor in preparation for auditing Alternator
2026-04-15 22:17:57 +03:00
Nikos Dragazis
d38f44208a test/cqlpy: Harden mutation_fragments tests against background flushes
Several tests in test_select_from_mutation_fragments.py assume that all
mutations end up in a single SSTable. This assumption can be violated
by background memtable flushes triggered by commitlog disk pressure.

Since the Scylla node is taken from a pool, it may carry unflushed data
from prior tests that prevents closed segments from being recycled,
thereby increasing the commitlog disk usage. A main source of such
pressure is keyspace-level flushes from earlier tests in this module,
which rotate commitlog segments without flushing system tables (e.g.,
`system.compaction_history`), leaving closed segments dirty.
Additionally, prior tests in the same module may have left unflushed
data on the shared test table (`test_table` fixture), keeping commitlog
segments dirty on its behalf as well. When commitlog disk usage exceeds
its threshold, the system flushes the test table to reclaim those
segments, potentially splitting a running test's mutations across
multiple SSTables.

This was observed in CI, where test_paging failed because its data was
split across two SSTables, resulting in more mutation fragments than the
hardcoded expected count.

This patch fixes the affected tests in two ways:

1. Where possible, tests are reworked to not assume a single SSTable:
   - test_paging
   - test_slicing_rows
   - test_many_partition_scan

2. Where rework is impractical, major compaction is added after writes
   and before validation to ensure that only one SSTable will exist:
   - test_smoke
   - test_count
   - test_metadata_and_value
   - test_slicing_range_tombstone_changes

Fixes SCYLLADB-1375.

Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>

Closes scylladb/scylladb#29389
2026-04-15 21:46:00 +03:00
Michael Litvak
cc94467097 test: test drop table during streaming
Add a test that drops a table while tablet streaming is running for the
table. The table is dropped after taking the storage snapshot and
initializating streaming sources - after that streaming should be able
to complete or abort correctly if the table is dropped. We want to
verify there is no incorrect access to the destroyed table.

The test tests both types of streaming in stream_blob - sstables
and logstor segments.
2026-04-15 19:23:00 +02:00
Michael Litvak
69d2a90106 streaming: stream_blob: hold table for streaming
When initializing streaming sources in tablet_stream_files_handler we
use a reference to the table. We should hold the table while doing so,
because otherwise the table may be dropped and destroyed when we yield.
Use the table.stream_in_progress() phaser to hold the table while we
access it.

For sstable file streaming we can release the table after the snapshot
is initialized, and the table may be dropped safely because the files
are held by the snapshot and we don't access the table anymore. There
was a single access to the table for logging but it is replaced by a
pre-calculated variable.

For logstor segment streaming, currently it doesn't support discarding
the segments while they are streamed - when the table is dropped it
discard the segments by overwriting and freeing them, so they shouldn't
be accessed after that. Therefore, in that case continue to hold the
table until streaming is completed.

Fixes SCYLLADB-1533
2026-04-15 19:22:42 +02:00
Avi Kivity
59ec93b86b Merge 'Allow arbitrary tablet boundaries and count' from Tomasz Grabiec
There are several reasons we want to do that.

One is that it will give us more flexibility in distributing the
load. We can subdivide tablets at any token, and achieve more
evenly-sized tablets. In particular, we can isolate large partitions
into separate tablets.

We can also split and merge incrementally individual tablets.
Currently, we do it for the whole table or nothing, which makes
splits and merges take longer and cause wide swings of the count.
This is not implemented in this PR yet, we still split/merge the whole table.

Another reason is vnode to tablets migration. We now could construct a
tablet map which matches exactly the vnode boundaries, so migration
can happen transparently from CQL-coordinator point of view.

Tablet count is still a power-of-two by default for newly created tables.
It may be different if tablet map is created by non-standard means,
or if per-table tablet option "pow2_count" is set to "false".

build/release/scylla perf-tablets:

Memory footprint for 131k tablets increased from 56 MiB to 58.1 MiB (+3.5%)

Before:
```
Generating tablet metadata
Total tablet count: 131072
Size of tablet_metadata in memory: 57456 KiB
Copied in 0.014346 [ms]
Cleared in 0.002698 [ms]
Saved in 1234.685303 [ms]
Read in 445.577881 [ms]
Read mutations in 299.596313 [ms] 128 mutations
Read required hosts in 247.482742 [ms]
Size of canonical mutations: 33.945053 [MiB]
Disk space used by system.tablets: 1.456761 [MiB]
Tablet metadata reload:
full      407.69ms
partial     2.65ms
```

After:
```
Generating tablet metadata
Total tablet count: 131072
Size of tablet_metadata in memory: 59504 KiB
Copied in 0.032475 [ms]
Cleared in 0.002965 [ms]
Saved in 1093.877441 [ms]
Read in 387.027100 [ms]
Read mutations in 255.752121 [ms] 128 mutations
Read required hosts in 211.202805 [ms]
Size of canonical mutations: 33.954453 [MiB]
Disk space used by system.tablets: 1.450162 [MiB]
Tablet metadata reload:
full      354.50ms
partial     2.19ms
```

Closes scylladb/scylladb#28459

* github.com:scylladb/scylladb:
  test: boost: tablets: Add test for merge with arbitrary tablet count
  tablets, database: Advertise 'arbitrary' layout in snapshot manifest
  tablets: Introduce pow2_count per-table tablet option
  tablets: Prepare for non-power-of-two tablet count
  tablets: Implement merged tablet_map constructor on top of for_each_sibling_tablets()
  tablets: Prepare resize_decision to hold data in decisions
  tablets: table: Make storage_group handle arbitrary merge boundaries
  tablets: Make stats update post-merge work with arbitrary merge boundaries
  locator: tablets: Support arbitrary tablet boundaries
  locator: tablets: Introduce tablet_map::get_split_token()
  dht: Introduce get_uniform_tokens()
2026-04-15 18:57:22 +03:00
Andrzej Jackowski
78926d9c96 test/random_failures: remove gossip shadow round injection
Commit c17c4806a1 removed check_for_endpoint_collision() from
the fresh bootstrap path, which was the only code path that
called do_shadow_round() for new nodes. Since the gossip shadow
round is no longer executed during bootstrap, remove the
stop_during_gossip_shadow_round error injection from the test.

The entry is marked as REMOVED_ rather than deleted to preserve
the shuffle order for seed-based test reproducibility.

The injection point in gms/gossiper.cc is also removed since it
is no longer used by any test.

Fixes: SCYLLADB-1466

Closes scylladb/scylladb#29460
2026-04-15 16:30:55 +02:00
Dario Mirovic
336dab1eec test: lib: fix broken retry in start_docker_service
The retry loop in start_docker_service passes the parse callbacks
via std::move into create_handler on each iteration. After the
first iteration, the moved-from std::function objects are empty.
All subsequent retries skip output parsing entirely and
immediately treat the service as successfully started. This
defeats the entire purpose of the retry mechanism.

Fix by passing the callbacks by copy instead of move, so the
original callbacks remain valid across retries.

Fixes SCYLLADB-1542
2026-04-15 15:25:52 +02:00
Asias He
4137a4229c test: Stabilize tablet incremental repair error test
Use async tablet repair task flow to avoid a race where client timeout
returns while server-side repair continues after injections are
disabled.

Start repair with await_completion=false, assert it does not complete
within timeout under injection, abort/wait the task, then verify
sstables_repaired_at is unchanged.

Fixes SCYLLADB-1184

Closes scylladb/scylladb#29452
2026-04-15 16:24:43 +03:00
Gleb Natapov
b55748ec54 db/system_distributed_keyspace: remove unused code
system_distributed_keyspace::CDC_TOPOLOGY_DESCRIPTION table is not
created since scylla-4.6.0 but we still have code to mark it as sync.
2026-04-15 15:48:49 +03:00