Commit Graph

11801 Commits

Author SHA1 Message Date
Michael Litvak
778dec2630 test/cqlpy: adjust cdc tests for tablets
update cdc-related tests in test/cqlpy for cdc with tablets.

* test_cdc_log_entries_use_cdc_streams: this test depends on the
  implementation of the cdc tables, which is different for tablets, so
  it's changed to run for both vnodes and tablets keyspaces, and we add
  the implementation for tablets.

* some cdc-related tests are unskipped for tablets so they will be run with
  both tablets and vnodes keyspaces. these are tests where the
  implementation may be different between tablets and vnodes and we want
  to have coverage of both.

* other cdc-related tests do not depend on the implementation
  differences between tablets and vnodes, so we can just enable them to
  run with the default configuration. previously they were disabled for
  tablets keyspaces because it wasn't supported, so now we remove this.
2025-09-17 14:47:13 +02:00
Michael Litvak
5a87d0f6c9 test/cluster/test_cdc_with_tablets: introduce cdc with tablets tests
Introduce basic tests creating CDC tables in tablets-enabled keyspaces,
verifying we can create and drop CDC tables, write and consume CDC log
entries, and consume the log while splitting streams.
2025-09-17 14:47:13 +02:00
Michael Litvak
67410cac4d cdc: generate_stream_diff helper function
This helper function receives two sets of streams and constructs their
difference: the closed and the opened streams.
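The difference described above is plain set subtraction in both directions. A minimal Python sketch, assuming streams can be modelled as hashable ids; `stream_diff` is a hypothetical name, not the C++ helper's signature:

```python
def stream_diff(before, after):
    """Return (closed, opened) streams between two stream sets."""
    before, after = set(before), set(after)
    closed = before - after   # present in the old set, gone from the new one
    opened = after - before   # new in the later set
    return closed, opened
```

For example, `stream_diff({"s1", "s2"}, {"s2", "s3"})` yields `({"s1"}, {"s3"})`.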
2025-09-17 14:47:12 +02:00
Michael Litvak
9ec4b6ccb1 cdc: load tablet streams metadata from tables
Read the CDC stream metadata from the internal system tables, and store
it in the cdc metadata data structures.

The metadata is stored in the tables as diffs, which is more storage
efficient, but in memory we store it as full stream sets for each
timestamp. This is more useful because we need to be able to find a
stream given a timestamp and a token.
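To illustrate why full per-timestamp sets help, here is a toy Python model: diffs are replayed into complete sets once, and a lookup is then two binary searches. It assumes, hypothetically, that a stream is identified by the last token of its range and that ranges wrap around the ring. This is not how Scylla's cdc metadata is laid out.

```python
import bisect

def materialize(diffs):
    """Replay (timestamp, closed, opened) diffs into full stream sets.

    Returns a list of (timestamp, sorted_streams) in timestamp order.
    """
    current = set()
    history = []
    for ts, closed, opened in sorted(diffs, key=lambda d: d[0]):
        current = (current - set(closed)) | set(opened)
        history.append((ts, sorted(current)))
    return history

def find_stream(history, ts, token):
    """Return the stream owning `token` in the generation active at `ts`."""
    times = [t for t, _ in history]
    i = bisect.bisect_right(times, ts) - 1   # latest generation with time <= ts
    if i < 0 or not history[i][1]:
        raise LookupError("no streams at this timestamp")
    streams = history[i][1]
    j = bisect.bisect_left(streams, token)   # first stream whose range end >= token
    return streams[j % len(streams)]         # past the last end: wrap around
```

With `materialize([(10, [], [100, 200]), (20, [200], [150, 250])])`, token 120 maps to stream 200 at timestamp 15 but to stream 150 at timestamp 25. The trade-off is memory for lookup speed, since every generation is stored in full.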
2025-09-17 14:47:12 +02:00
Nadav Har'El
e322902506 Merge 'index, metrics: add per-index metrics' from Michał Hudobski
This patch adds the possibility to track metrics
per secondary index. Currently, only a histogram
of query latencies is tracked, but more metrics
can be added in the future. To add a new metric,
it needs to be added to the index_metrics struct
in index/secondary_index_manager.hh and then
initialized in index/secondary_index_manager.cc
in the constructor of the index_metrics struct.
The metrics are created when the index is created
and removed when the index is dropped.

First lines of the new metric:
```
# HELP scylla_index_query_latencies Index query latencies
# TYPE scylla_index_query_latencies histogram
scylla_index_query_latencies_sum{idx="test_i_idx",ks="test"} 640
scylla_index_query_latencies_count{idx="test_i_idx",ks="test"} 1
scylla_index_query_latencies_bucket{idx="test_i_idx",ks="test",le="640.000000"} 1
scylla_index_query_latencies_bucket{idx="test_i_idx",ks="test",le="768.000000"} 1
```

Fixes: https://github.com/scylladb/scylladb/issues/25970

Closes scylladb/scylladb#25995

* github.com:scylladb/scylladb:
  test: verify that the index metric is added
  index, metrics: add per-index metrics
2025-09-17 14:54:12 +03:00
Michał Chojnowski
b7afda5030 sstables/mx/reader: remove mx::make_reader_with_index_reader
When `mx::make_reader` is used to construct an sstable reader,
it constructs its own index reader internally.

`mx::make_reader_with_index_reader` was originally added
as a variant of `mx::make_reader` which can be used to inject
a custom `index_reader` for testing that the mx Data reader
tolerates inexact indexes.

But now we want the ability to choose between BIG index readers
and BTI index readers if both are present. And at this point,
it seems to me that it makes sense to just construct the index
reader in the caller and pass it via argument to `mx::make_reader`
instead of putting the index selection inside it.

So that's what we do in this patch. And we remove `mx::make_reader_with_index_reader`
because it's no longer different from `mx::make_reader`.
2025-09-17 12:22:41 +02:00
Michał Chojnowski
f7d7722baa test/boost/bti_index_test: fix indentation
Fix an indentation mishap. Purely cosmetic patch.
2025-09-17 12:22:41 +02:00
Michał Chojnowski
191405fc51 sstables/trie/bti_index_reader: in last_block_offset(), return offset from the beginning of partition, not file
Before this patch, `bti_index_reader::last_block_offset()` returns the
offset of the last block within the file.
But the old `index_reader::last_block_offset()` returns the offset
within the partition, and that's what the callers (i.e. reversed
sstable reader) expect.

Fix `bti_index_reader::last_block_offset()` (and the corresponding
comment and test) to match `index_reader::last_block_offset()`.
2025-09-17 12:22:40 +02:00
Michał Chojnowski
1f85069389 sstables/trie: support reader_permit and trace_state properly
Before this patch, the `reader_permit` taken by `bti_index_reader`
wasn't actually being passed down to disk reads. In this patch,
we fix this FIXME by propagating the permit down to the I/O
operations on the `cached_file`.

Also, it didn't take `trace_state_ptr` at all.
In this patch, we add a `trace_state_ptr` argument and propagate
it down to disk reads.

(We combine the two changes because the permit and the trace state
are passed together everywhere anyway).
2025-09-17 12:22:40 +02:00
Michał Chojnowski
98b7655d2b sstables/trie/bti_index_reader: support BYPASS CACHE
Before this patch, `bti_index_reader` doesn't have a good
way to implement BYPASS CACHE.

In this patch we add a way, similar to what `index_reader` does:
we allow the caller to pass in the `cached_file` via a shared pointer.

If the caller wants the loads done by the index reader to remain cached,
he can pass in the `cached_file` owned by the `sstable`, shared by all
caching index readers.

If the caller doesn't want the loads to remain cached, he can pass
in a fresh `cached_file` which will be privately owned by the index
reader, and will be evicted when the index reader dies.
2025-09-17 12:22:40 +02:00
Michał Chojnowski
95c93568f7 test/boost/bti_index_test: use read_bti_partitions_db_footer where appropriate
Use the helper instead of reading the needed footer field "manually".
2025-09-17 12:22:40 +02:00
Michał Chojnowski
5934639c4b sstables/trie: change the signature of bti_partition_index_writer::finish
Let's return `bti_partitions_db_footer` so that it can be directly
saved to `sstables::shareable_components` after the index write is
finished, without re-reading the footer from the file.

Let's take `const sstables::key&` arguments instead of
`disk_string_view<uint16_t>`, that's more natural.
2025-09-17 12:22:40 +02:00
Michał Chojnowski
bf90018b8e types/comparable_bytes: add a missing implementation for date_type_impl
date_type_impl is like timestamp_type_impl, but unsigned.
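To illustrate why signedness matters for a byte-comparable encoding, here is a generic Python sketch of a common technique: big-endian bytes, with the sign bit flipped for signed values so that plain byte-wise comparison follows numeric order. This is an assumption about the general approach, not a transcription of Scylla's `comparable_bytes` code. The 64-bit width and both function names are hypothetical.

```python
import struct

MASK64 = (1 << 64) - 1

def signed_comparable(v: int) -> bytes:
    """Encode a signed int64 so byte-wise order matches numeric order."""
    # Two's complement bit pattern with the sign bit flipped: negatives
    # start with 0x00..0x7F, non-negatives with 0x80..0xFF.
    return struct.pack(">Q", (v & MASK64) ^ (1 << 63))

def unsigned_comparable(v: int) -> bytes:
    """Encode an unsigned int64; big-endian order already matches."""
    return struct.pack(">Q", v)  # raises struct.error if v is out of range
```

A value whose top bit is set is negative (sorting below zero) when read as signed, but it is among the largest values when read as unsigned. That is why a signed encoding can't simply be reused for an unsigned type.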
2025-09-17 12:22:40 +02:00
Szymon Malewski
776f90e2f8 alternator/expressions.g: Fix antlr3 missing token leak
This patch overrides the antlr3 function that allocates the missing
tokens that would eventually leak. The override stores these tokens in
a vector, ensuring memory is freed whenever the parser is destroyed.
Solution is copied from CQL implementation.

A unit test reproducing the issue is added. The leak is reported by
ASAN when running this test in debug mode: the test itself passes, but
the leak is detected when the test file exits.

Fixes #25878

Closes scylladb/scylladb#25930
2025-09-17 13:05:24 +03:00
Abhinav Jha
43656371cf raft_topology: Modify the conditional logic in remove node operation to enhance concurrency for raft enabled clusters.
In the current scenario, the shard receiving the remove node REST API request
performs a conditional lock depending on whether raft is enabled or not. Since
non-zero shards return false for `raft_topology_change_enabled()`, requests
routed to non-zero shards are subject to this lock, which is unnecessary and
hampers concurrent operations, even though they are possible on
raft-enabled nodes.

This PR modifies the conditional lock logic and redirects the remove node
execution directly to shard0, so `raft_topology_change_enabled()` is
now checked on shard0 and execution proceeds accordingly.

A test is also added to confirm the new behaviour, in which concurrent remove
node operations are now performed seamlessly.

This PR doesn't fix a critical bug. No need to backport it.

Fixes: scylladb/scylladb#24737
2025-09-17 15:23:32 +05:30
Benny Halevy
3a6208b319 utils: stall_free: clear_gently: release wrapped objects
As discussed in https://github.com/scylladb/scylladb/pull/24606#discussion_r2281870939
clear_gently of shared pointers should release the wrapped
object reference, and when the object's use_count reaches 1,
the object itself should be cleared gently before it's destroyed.

This behavior is similar to the way we gently clear containers
like arrays or vectors, and so it is extended in this patch
to smart pointers like unique_ptr and foreign_ptr.

The unit tests are adjusted accordingly to expect the
smart pointers to be reset after clear_gently, plus
the use of `reset()` for `foreign_ptr<shared_ptr<>>` was
replaced by `clear_gently().get()` which now ensures the
reference to a shared object is released, and awaited for,
if it happens on a foreign owner shard, unlike reset of
a foreign_ptr that kicks off destroy of that shared object
in the background on the owner shard - causing flakiness.

Fixes #25723

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#25759
2025-09-17 11:44:26 +03:00
Patryk Jędrzejczak
454eb08cb4 Merge 'group0: remove obsolete "stop_before_becoming_raft_voter" error injection' from Emil Maskovsky
The Raft topology workflow was changed by the limited voters feature: nodes no longer request votership themselves. As a result, the "stop_before_becoming_raft_voter" error injection is now obsolete and has been removed.

Fixes: scylladb/scylladb#23418

No backport: This re-enables a test, only needed for master.

Closes scylladb/scylladb#26042

* https://github.com/scylladb/scylladb:
  group0: remove obsolete "stop_before_becoming_raft_voter" error injection
  test/random_failures: preserve test repeatability when removing error injections
2025-09-17 10:38:32 +02:00
Patryk Jędrzejczak
368d70ee15 Merge 'LWT: implement fencing' from Petr Gusev
This PR consists of three parts:
* Small refactoring of the fencing APIs in storage_proxy (renames + comments + some functions were extracted)
* Implement the fencing for LWT verbs itself. This includes checking the fencing token before and after local replica data accesses.
* Two new `test.py` tests in `test_fencing.py`, which check the fencing in some real-world scenarios.

Backport: no need -- fencing for LWT requests is needed primarily for LWT over tablets, which is not released yet.

Fixes scylladb/scylladb#22332

Closes scylladb/scylladb#25550

* https://github.com/scylladb/scylladb:
  test_tablets_lwt: eliminate redundant disable_tablet_balancing
  test_fencing: add test_lwt_fencing_upgrade
  pylib: extract upgrade helpers from test_sstable_compression_dictionaries_upgrade.py
  test_fencing: add test_fenced_out_on_tablet_migration_while_handling_paxos_verb
  test_fencing: test_fence_lwt_during_bootstap
  pylib/rest_client.py: encode injection name
  storage_proxy_stats: add fenced_out_requests metric
  storage_proxy: add fencing to Paxos verbs
  storage_proxy::apply_fence: add overload that throws on failure
  storage_proxy: extract apply_fence_result
  sp::apply_fence: rename to apply_fence_on_ready
  sp::apply_fence: rename to check_fence
  sp::apply_fence: make non-generic
2025-09-16 23:40:48 +03:00
Ernest Zaslavsky
d624413ddd treewide: Move query related files to a new query directory
As requested in #22120, moved the files and fixed other includes and build system.

Moved files:
- query.cc
- query-request.hh
- query-result.hh
- query-result-reader.hh
- query-result-set.cc
- query-result-set.hh
- query-result-writer.hh
- query_id.hh
- query_result_merger.hh

Fixes: #22120

This is a cleanup, no need to backport

Closes scylladb/scylladb#25105
2025-09-16 23:40:47 +03:00
Michał Chojnowski
68e6141211 scylla-gdb: add scylla prepared-statements
Add a helper which prints all prepared statements currently
present in the query processor.

Example output:
```
(gdb) scylla prepared-statements
(cql3::cql_statement*)(0x600003d71050): SELECT * FROM ks.ks WHERE pk = ?
(cql3::cql_statement*)(0x600003972b50): SELECT pk FROM ks.ks WHERE pk = ?
```

Closes scylladb/scylladb#26007
2025-09-16 23:40:47 +03:00
Botond Dénes
0cf6a648bb Merge 'Default create keyspace syntax' from Dario Mirovic
Allow for the following CQL syntax:

```
CREATE KEYSPACE [IF NOT EXISTS] <name>;
```
for example:
```
CREATE KEYSPACE test_keyspace;
```

With this syntax, all the keyspace's parameters are defaulted to:

- replication strategy = `NetworkTopologyStrategy`,
- replication factor = number of racks, excluding racks that only have arbiter nodes,
- storage options, durable writes = the defaults we would normally use,
- tablets enabled if they are enabled in the db configuration, e.g. scylla.yaml or db/config.cc by default.

Options besides `replication` already have defaults. `replication` had to be specified, but it could be an empty set, where defaults for sub-options (replication strategy and replication factor) would be used - `replication = {}`. Now there is no need for specifying an empty set - omitting `replication = {}` has the same effect as `replication = {}`.

Since all the options now have defaults, `WITH` is optional for `CREATE KEYSPACE` statement.
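The default replication factor rule above can be sketched in Python for a single datacenter. This assumes each rack is given as a list of node roles and uses a hypothetical `"arbiter"` role string. How Scylla actually represents arbiter nodes, or how it handles multiple datacenters, is not shown here.

```python
def default_replication_factor(racks):
    """Count racks that contain at least one non-arbiter node.

    `racks` maps a rack name to a list of node roles (hypothetical model).
    """
    return sum(
        1
        for roles in racks.values()
        if any(role != "arbiter" for role in roles)  # arbiter-only racks don't count
    )
```

For example, racks `{"r1": ["normal"], "r2": ["normal", "arbiter"], "r3": ["arbiter"]}` give a replication factor of 2.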

Fixes #25145

This is an improvement, no backport needed.

Closes scylladb/scylladb#25872

* github.com:scylladb/scylladb:
  docs: cql: default create keyspace syntax
  test: cqlpy: add test for create keyspace with no options specified
  cql: default `CREATE KEYSPACE` syntax
2025-09-16 23:40:47 +03:00
Emil Maskovsky
87bd328873 group0: remove obsolete "stop_before_becoming_raft_voter" error injection
The Raft topology workflow was changed by the limited voters feature:
nodes no longer request votership themselves. As a result, the
"stop_before_becoming_raft_voter" error injection is now obsolete and
has been removed.

Fixes: scylladb/scylladb#23418
2025-09-16 18:24:27 +02:00
Emil Maskovsky
0453052d66 test/random_failures: preserve test repeatability when removing error injections
The order of entries in the ERROR_INJECTIONS list determines test
repeatability for a given random seed.

To allow removing error injections without affecting the order of the
remaining ones, removed injections are now renamed with a "REMOVED_"
prefix instead of being deleted.

This ensures they are ignored by the tests, while the sequence of active
injections—and thus test reproducibility—remains unchanged.
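The reason renaming preserves reproducibility can be shown with a small Python model. A seeded draw picks an index whose value depends only on the seed and the list length, so keeping the slot keeps every later draw stable. The list contents and `pick_injections` are hypothetical; the real test's selection logic may differ.

```python
import random

# Hypothetical list; "REMOVED_" entries keep their slot so indices don't shift.
ERROR_INJECTIONS = (
    "inject_a",
    "REMOVED_inject_b",
    "inject_c",
)

def pick_injections(seed, count, injections=ERROR_INJECTIONS):
    """Draw `count` active injections reproducibly for a given seed."""
    if all(name.startswith("REMOVED_") for name in injections):
        raise ValueError("no active error injections")
    rng = random.Random(seed)
    picks = []
    while len(picks) < count:
        name = rng.choice(injections)  # index depends only on seed and list length
        if not name.startswith("REMOVED_"):
            picks.append(name)         # removed entries are skipped, not re-indexed
    return picks
```

With the same seed, the non-removed picks from a run made before the rename reappear, in order, at the start of a run made after it. Deleting the entry instead would change the list length and therefore every drawn index.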
2025-09-16 18:22:45 +02:00
Michał Hudobski
3364cc96f5 test: verify that the index metric is added
This commit adds a test that performs
a sanity check that the implemented metric
is actually being added to Scylla's metrics
and has the correct value.
2025-09-16 18:10:01 +02:00
Cezar Moise
e9be1e7b35 test: cleanup big mutation commitlog tests
- fix typos
- improve comments
- remove false and misleading comments
- remove `disableautocompaction` as it did nothing for the test
  and the comment with it was false
2025-09-16 15:33:23 +03:00
Cezar Moise
492b4cf71c test: fix test_one_big_mutation_corrupted_on_startup
The commitlog in the tests with big mutations was corrupted by
overwriting 10 chunks of 1KB with random data, which might not be enough
due to randomness and the large size of the commitlog (~65MB).

Change `corrupt_file` to overwrite based on a percentage
of the file's size instead of fixed number of chunks.
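A possible shape for a percentage-based corruption helper, sketched in Python. It assumes the helper overwrites fixed-size chunks at random offsets. The signature, the 1 KiB default chunk size and the `seed` parameter are hypothetical and do not describe the test's real `corrupt_file`.

```python
import os
import random

def corrupt_file(path, percentage, chunk_size=1024, seed=None):
    """Overwrite roughly `percentage`% of the file with random chunks.

    The number of chunks scales with the file size instead of being fixed.
    Returns the number of chunks written.
    """
    rng = random.Random(seed)
    size = os.path.getsize(path)
    n_chunks = max(1, size * percentage // (100 * chunk_size))
    with open(path, "r+b") as f:
        for _ in range(n_chunks):
            # Keep each chunk inside the file so its size never changes.
            offset = rng.randrange(0, max(1, size - chunk_size + 1))
            f.seek(offset)
            f.write(rng.randbytes(min(chunk_size, size)))  # Python 3.9+
    return n_chunks
```

For a ~65MB commitlog and 10%, this writes about 6.6k chunks instead of 10. Because offsets are random, chunks can overlap, so the result is "roughly" rather than exactly the requested share.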

refs: #25627
2025-09-16 15:32:44 +03:00
Lakshmi Narayanan Sreethar
7cdda510ee compaction/scrub: register sstables for compaction before validation
When `scrub --validate` runs, it collects all candidate sstables at the
start and validates them one by one in separate compaction tasks.
However, scrub in validate mode does not register these sstables for
compaction, which allows regular compaction to pick them up and
potentially compact them away before validation begins. This leads to
scrub failures because the sstables can no longer be found.

This patch fixes the issue by first disabling compaction, collecting the
sstables, and then registering them for compaction before starting
validation. This ensures that the enqueued sstables remain available for
the entire duration of the scrub validation task.

Fixes #23363

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2025-09-16 15:29:57 +05:30
Asias He
54162a026f scylla-nodetool: Add --incremental-mode option to cluster repair
The `--incremental-mode` option specifies the incremental repair mode.
Can be 'disabled', 'regular', or 'full'.

'regular': The incremental repair logic is enabled. Unrepaired sstables
will be included for repair.  Repaired sstables will be skipped. The
incremental repair states will be updated after repair.

'full': The incremental repair logic is enabled. Both repaired and
unrepaired sstables will be included for repair. The incremental repair
states will be updated after repair.

'disabled': The incremental repair logic is disabled completely. The
incremental repair states, e.g., repaired_at in sstables and
sstables_repaired_at in the system.tablets table, will not be updated
after repair.

When the option is not provided, it defaults to regular.
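The three modes amount to a selection rule plus a flag saying whether the repair state is updated. The Python sketch below models just that. The `SSTable` type and `plan_repair` name are hypothetical, and treating `disabled` as repairing all sstables is an assumption: the text only says the incremental logic is turned off.

```python
from dataclasses import dataclass

@dataclass
class SSTable:
    name: str
    repaired: bool

def plan_repair(sstables, mode="regular"):
    """Return (sstables_to_repair, update_repair_state) for a mode."""
    if mode == "regular":
        # Only unrepaired sstables; repaired ones are skipped.
        return [s for s in sstables if not s.repaired], True
    if mode == "full":
        # Both repaired and unrepaired sstables, state still updated.
        return list(sstables), True
    if mode == "disabled":
        # Assumption: no incremental filtering, and no state update.
        return list(sstables), False
    raise ValueError(f"unknown incremental mode: {mode!r}")
```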

Fixes #25931

Closes scylladb/scylladb#25969
2025-09-16 10:23:22 +03:00
Petr Gusev
1d270020f2 test_tablets_lwt: eliminate redundant disable_tablet_balancing
This is a refactoring commit.
2025-09-15 12:40:10 +02:00
Petr Gusev
7060265d5f test_fencing: add test_lwt_fencing_upgrade
This test verifies that upgrading to a Scylla version
with LWT fencing does not disrupt existing LWT workloads.
2025-09-15 12:34:45 +02:00
Petr Gusev
49b036cf2b pylib: extract upgrade helpers from test_sstable_compression_dictionaries_upgrade.py
We want to reuse them to test upgrade for LWT fencing.
2025-09-15 12:34:45 +02:00
Petr Gusev
82f0235e4b test_fencing: add test_fenced_out_on_tablet_migration_while_handling_paxos_verb
This test verifies that the fencing token is checked on replicas
after the local Paxos state is updated. This ensures that if we fail
to drain an LWT request during topology changes, the replicas
where Paxos verbs got stuck won't contribute to the target CLs.
2025-09-15 12:34:45 +02:00
Petr Gusev
0156850605 test_fencing: test_fence_lwt_during_bootstap
2025-09-15 12:09:08 +02:00
Petr Gusev
92b165b8c0 pylib/rest_client.py: encode injection name
Sometimes it's convenient to use slashes in injection names,
for example my_component/my_method/my_condition. Without quote()
we get 'handler not found' error from Scylla.
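A Python sketch of the encoding step, using the standard `urllib.parse.quote`. Note that a bare `quote(name)` leaves `/` untouched (its default is `safe='/'`), so the sketch passes `safe=''` to turn slashes into `%2F`. The commit does not say which form `rest_client.py` uses, and the base URL below is a hypothetical example.

```python
from urllib.parse import quote

def injection_url(base, name):
    """Build a per-injection REST URL with the name percent-encoded.

    safe='' makes '/' inside the name become %2F instead of extra path segments.
    """
    return f"{base}/{quote(name, safe='')}"
```

For example, `injection_url("http://127.0.0.1:10000/v2/error_injection/injection", "my_component/my_method/my_condition")` ends in `my_component%2Fmy_method%2Fmy_condition`.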
2025-09-15 11:24:53 +02:00
Michał Jadwiszczak
dc1ffd2c10 service/storage_service: drain view_building_worker earlier
Similarly to view builder, view building worker needs to be drained
in `storage_service::do_drain()`.

The storage service drain happens at the very beginning of the shutdown
procedure. Before this patch, the worker was still building views
after the storage service was drained, and this caused errors like:
`Error applying view update to (named_gate_closed_exception)` and
`locator::no_such_tablet_map`.

Fixes scylladb/scylladb#25908

Closes scylladb/scylladb#25984
2025-09-15 11:29:19 +03:00
Nadav Har'El
b4e3d4ac2f alternator: nicer error message for integer overflow in list index
In the DynamoDB API, when "a" is a list attribute, a[999] returns the
1000th element. But if the list isn't that long (e.g., it only has 5
elements), a[999] returns nothing - it's not an error.

But it turns out that when the index is so large that it can't even be
parsed as an integer, e.g., 99999999999999, DynamoDB does report an
error:

    Invalid ProjectionExpression: List index is not within the
    allowable range; index: [99999999999999]

Before this patch, Alternator also returned an error in this case,
with the right type (ValidationException), but with a strange low-level
error text:

    Failed parsing ProjectionExpression 'a[99999999999999]':
    std::out_of_range (stoi)

The problem was that the code (in alternator/expressions.g) ran stoi()
without converting its std::out_of_range exception to a better user-facing
message. We do this in this patch, and the error message now looks like:

    Failed parsing ProjectionExpression 'a[99999999999999]':
    list index out of integer range

This patch also includes a test reproducing this error. The test passes
on DynamoDB; on Alternator it fails before this patch and passes with
the patch.
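A Python sketch of the idea behind the fix: range-check the parsed index the way C++ `std::stoi()` does, and turn the failure into the user-facing message. It assumes a 32-bit `int`, as on typical Linux targets. Python's `int()` never overflows, so the bounds check is explicit here. `parse_list_index` is a hypothetical name.

```python
INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1

def parse_list_index(text):
    """Parse a list index, rejecting values outside a 32-bit int."""
    value = int(text)
    if not INT32_MIN <= value <= INT32_MAX:
        # User-facing message instead of a low-level out_of_range error.
        raise ValueError("list index out of integer range")
    return value
```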

Fixes #25947

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#25951
2025-09-15 08:43:00 +03:00
Aleksandra Martyniuk
75b772adfb db: optimize cache invalidation following repair/streaming
Currently, if a new sstable is created during repair/streaming,
we invalidate its whole token range in cache. If the sstable
is sparse, we unnecessarily clear too much data.

Modify cache invalidation, so that only the partitions present
in the sstable are cleared.

To check whether a partition is present in the sstable, we use bloom
filters. Bloom filters may return false positives and show that
an sstable contains a partition, even though it does not. Due to that
we may invalidate a bit more than we need to, but the cache will be
in valid state.

An issue arises when we do not invalidate two consecutive partitions
that are continuous. The sstable may contain a token that falls
between these partitions, breaking the continuity. To check that, we
would need to scan the sstable index. However, such a change would
noticeably complicate the invalidation, both in terms of performance and code.
In this change, sstable index reader isn't used. Instead, the continuity
flag is unset for all scanned partitions. This comes at a cost of
heavier reads, as we will need to verify continuity when reading more
than one partition from cache.
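A toy Python model of the two outcomes for each scanned partition. It assumes the cache is a dict keyed by token, with entries that carry a `continuous` flag, and that `might_contain` stands in for the bloom filter (false positives possible, no false negatives). It is meant only to show the strategy, not how Scylla's row cache is structured.

```python
def invalidate_range(cache, first, last, might_contain):
    """Invalidate cached partitions in [first, last] for a new sstable.

    Returns the number of scanned partitions.
    """
    scanned = [t for t in cache if first <= t <= last]
    for token in scanned:
        if might_contain(token):
            del cache[token]                     # may be stale: evict it
        else:
            cache[token]["continuous"] = False   # keep data, distrust continuity
    return len(scanned)
```

Partitions outside the sstable's range keep both their data and their continuity. Partitions inside it either go away (bloom positive) or stay with continuity cleared, so a later multi-partition read has to re-verify the gaps.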

Fixes: https://github.com/scylladb/scylladb/issues/9136.

Closes scylladb/scylladb#25996
2025-09-14 19:48:14 +03:00
Lakshmi Narayanan Sreethar
1d1e572962 sstables: skip bloom filter rebuilds with minimal savings
If a bloom filter was built with a bad partition estimate, it is rebuilt
right before the sstable is sealed. The rebuild is already skipped if
the current bitset size results in a false-positive rate within 75%–125%
of the configured value.
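For reference, the existing skip condition can be sketched with the textbook bloom filter estimate p ≈ (1 - e^(-kn/m))^k for m bits, n elements and k hash functions. The Python below is an assumption about how such a check could look. Only the 75%–125% window comes from the text, and the new conditions added by this patch are not modelled.

```python
import math

def false_positive_rate(bits, elements, hashes):
    """Classic bloom filter estimate: (1 - e^(-k*n/m))^k."""
    return (1.0 - math.exp(-hashes * elements / bits)) ** hashes

def rebuild_needed(bits, actual_elements, hashes, target_fp):
    """True when the current bitset misses the target FP rate by more than 25%."""
    fp = false_positive_rate(bits, actual_elements, hashes)
    return not (0.75 * target_fp <= fp <= 1.25 * target_fp)
```

A filter of 9586 bits with 7 hashes, sized for 1000 partitions at a 1% target, stays within the window when 1000 partitions are written. If only 100 are written, its false-positive rate drops far below the target, so it is oversized and a rebuild would be considered.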

This patch adds additional conditions to prevent rebuilds when the
savings are minimal. It also skips rebuilding for garbage collected
sstables, since they will be dropped soon anyway.

Also updated and added more test cases to cover these new criteria for
bloom filter rebuilds.

Fixes #25464
Fixes #25468

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>

Closes scylladb/scylladb#25968
2025-09-14 18:19:50 +03:00
Nadav Har'El
5307d1b9a8 Merge 'vector_index: add version to index options' from Dawid Pawlik
Since creating the vector index does not lead to creation of a view table [#24438] (whose version info had been logged in `system_schema.scylla_tables`) we lacked the information about the version of the index.

The solution we arrived at is to add the version as a field in options column of `system_schema.indexes`.
It requires few changes and seems unintruitive for existing infrastructure.

This patch implements the solution described above.

Refs: VECTOR-142

Closes scylladb/scylladb#25614

* github.com:scylladb/scylladb:
  cqlpy/test_vector_index: add vector index version test
  vector_index, index_prop_defs: add version to index options
  create_index_statement: rename `validator` to `custom_index_factory`
  custom index: rename `custom_index_option_name`
  vector_index: rename `supported_options` to `vector_index_options`
2025-09-14 15:35:53 +03:00
Avi Kivity
ef7babda3d Merge 'test: deflake test_restart_leaving_replica_during_cleanup' from Patryk Jędrzejczak
The test started hitting #21779 recently. We deflake it in this commit
by disabling the tablet load balancing before dropping the keyspace at
the end of the test.

We still have to understand why the test started hitting #21779, so we
keep #25938 open.

Refs #25938

The test was flaky only on master, so no backport needed.

Closes scylladb/scylladb#25975

* github.com:scylladb/scylladb:
  test: enable load balancing on a single node in test_restart_leaving_replica_during_cleanup
  test: deflake test_restart_leaving_replica_during_cleanup
2025-09-12 15:58:19 +03:00
Radosław Cybulski
436150eb52 treewide: fix spelling errors
Fix spelling errors reported by copilot on github.
Remove single use namespace alias.

Closes scylladb/scylladb#25960
2025-09-12 15:58:19 +03:00
Patryk Jędrzejczak
aaab71c14e test: enable load balancing on a single node in test_restart_leaving_replica_during_cleanup
Doing it on more than one node is redundant.
2025-09-11 13:19:56 +02:00
Patryk Jędrzejczak
4c9efc08d8 test: deflake test_restart_leaving_replica_during_cleanup
The test started hitting #21779 recently. We deflake it in this commit
by disabling the tablet load balancing before dropping the keyspace at
the end of the test.

We still have to understand why the test started hitting #21779, so we
keep #25938 open.

Refs #25938
2025-09-11 13:19:51 +02:00
Patryk Jędrzejczak
eae12c1717 test: cluster: add a test for restarts with no group 0 quorum
We don't have such a test, and we could add a group 0 quorum requirement
on the restart path by mistake.

A new test, no backport.

Closes scylladb/scylladb#25623
2025-09-11 08:56:34 +03:00
Raphael S. Carvalho
b607b1c284 compaction: Fix stop of sstable cleanup
The interface suggests the whole sstable cleanup is aborted with
'nodetool stop CLEANUP', but it is currently stopping only the
ongoing cleanup task, and the compaction manager will retry the
task since the error is not propagated all the way back to the
caller. With raft topology, the coordinator should retry it though
since cleanup became mandatory with automatic cleanup. So it's
only fixing the usage where cleanup is issued manually.

The stop exception is only propagated to the caller of cleanup.
When stopping tasks during shutdown, the exception is swallowed
and the error only returned to the caller.

Fixes #20823.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#24996
2025-09-11 08:55:10 +03:00
Cezar Moise
20ba8d4e8c test: skip flaky test test_one_big_mutation_corrupted_on_startup
The test is flaky since it tries to corrupt the commitlog
in a non-deterministic way that sometimes allows
the tested mutation to escape and be replayed anyhow.

refs: #25627

Closes scylladb/scylladb#25950
2025-09-11 08:39:24 +03:00
Avi Kivity
c91b326d5a Merge 'transport: replace throwing protocol_exception with returns' from Dario Mirovic
Replace throwing `protocol_exception` with returning it as a result or an exceptional future in the transport server module. The goal is to improve performance.

Most of the `protocol_exception` throws were made from `fragmented_temporary_buffer` module, by passing `exception_thrower()` to its `read*` methods. `fragmented_temporary_buffer` is changed so that it now accepts an exception creator, not exception thrower. `fragmented_temporary_buffer_concepts::ExceptionCreator` concept replaced `fragmented_temporary_buffer_concepts::ExceptionThrower` and all methods that have been throwing now return failed result of type `utils::result_with_eptr`. This change is then propagated to the callers.

The scope of this patch is `protocol_exception`, so commitlog just calls `.value()` method on the result. If the result failed, that will throw the exception from the result, as defined by `utils::result_with_eptr_throw_policy`. This means that the behavior of commitlog module stays the same.

transport server module handles results gracefully. All the caller functions that return non-future value `T` now return `utils::result_with_eptr<T>`. When the caller is a function that returns a future, and it receives failed result, `make_exception_future(std::move(failed_result).value())` is returned. The rest of the callstack up to the transport server `handle_error` function is already working without throwing, and that's how zero throws is achieved.

cql3 module changes do the same as transport server module.

Benchmark that is not yet merged has commit `67fbe35833e2d23a8e9c2dcb5e04580231d8ec96`, [GitHub diff view](https://github.com/scylladb/scylladb/compare/master...nuivall:scylladb:perf_cql_raw). It uses either read or write query.

Command line used:
```
./build/release/scylla perf-cql-raw --workdir ~/tmp/scylladir --smp 1 --developer-mode 1 --workload write --duration 300 --concurrency 1000 --username cassandra --password cassandra 2>/dev/null
```
The only thing changed across runs is `--workload write`/`--workload read`.

Built and run on `release` target.

<details>

```
throughput:
        mean=   36946.04 standard-deviation=1831.28
        median= 37515.49 median-absolute-deviation=1544.52
        maximum=39748.41 minimum=28443.36
instructions_per_op:
        mean=   108105.70 standard-deviation=965.19
        median= 108052.56 median-absolute-deviation=53.47
        maximum=124735.92 minimum=107899.00
cpu_cycles_per_op:
        mean=   70065.73 standard-deviation=2328.50
        median= 69755.89 median-absolute-deviation=1250.85
        maximum=92631.48 minimum=66479.36

⏱  real=5:11.08  user=2:00.20  sys=2:25.55  cpu=85%
```

```
throughput:
        mean=   40718.30 standard-deviation=2237.16
        median= 41194.39 median-absolute-deviation=1723.72
        maximum=43974.56 minimum=34738.16
instructions_per_op:
        mean=   117083.62 standard-deviation=40.74
        median= 117087.54 median-absolute-deviation=31.95
        maximum=117215.34 minimum=116874.30
cpu_cycles_per_op:
        mean=   58777.43 standard-deviation=1225.70
        median= 58724.65 median-absolute-deviation=776.03
        maximum=64740.54 minimum=55922.58

⏱  real=5:12.37  user=27.461  sys=3:54.53  cpu=83%
```

```
throughput:
        mean=   37107.91 standard-deviation=1698.58
        median= 37185.53 median-absolute-deviation=1300.99
        maximum=40459.85 minimum=29224.83
instructions_per_op:
        mean=   108345.12 standard-deviation=931.33
        median= 108289.82 median-absolute-deviation=55.97
        maximum=124394.65 minimum=108188.37
cpu_cycles_per_op:
        mean=   70333.79 standard-deviation=2247.71
        median= 69985.47 median-absolute-deviation=1212.65
        maximum=92219.10 minimum=65881.72

⏱  real=5:10.98  user=2:40.01  sys=1:45.84  cpu=85%
```

```
throughput:
        mean=   38353.12 standard-deviation=1806.46
        median= 38971.17 median-absolute-deviation=1365.79
        maximum=41143.64 minimum=32967.57
instructions_per_op:
        mean=   117270.60 standard-deviation=35.50
        median= 117268.07 median-absolute-deviation=16.81
        maximum=117475.89 minimum=117073.74
cpu_cycles_per_op:
        mean=   57256.00 standard-deviation=1039.17
        median= 57341.93 median-absolute-deviation=634.50
        maximum=61993.62 minimum=54670.77

⏱  real=5:12.82  user=4:10.79  sys=11.530  cpu=83%
```

This shows an increase of ~240 instructions per op for reads and ~180 instructions per op for writes.

Tests have been run multiple times, with almost identical results. Each run lasted 300 seconds. The number of operations executed is roughly 38k per second * 300 seconds = 11.4M ops.
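The quoted deltas can be reproduced from the median `instructions_per_op` values. The pairing below is inferred from the numbers, because the blocks are not labelled here: the first and third blocks are taken as reads and the second and fourth as writes, each measured before and after the patch.

```python
# Median instructions_per_op from the four result blocks above.
# Assumption: blocks 1/3 = reads, blocks 2/4 = writes, before/after the patch.
read_before, read_after = 108052.56, 108289.82
write_before, write_after = 117087.54, 117268.07

read_delta = read_after - read_before     # extra instructions per read op
write_delta = write_after - write_before  # extra instructions per write op
total_ops = 38_000 * 300                  # ~38k ops/s over a 300 s run

print(f"reads: +{read_delta:.0f}, writes: +{write_delta:.0f}, ops: {total_ops:,}")
```

This prints `reads: +237, writes: +181, ops: 11,400,000`.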

Update:

I repeated the benchmark from a clean state: I rebooted the computer, switched it to performance mode, rebuilt, and closed other apps that might affect CPU and disk usage.

run count: 5 times before and 5 times after the patch
duration: 300 seconds

Average write throughput median before patch: 41155.99
Average write throughput median after patch: 42193.22

Median absolute deviation is also lower now, with values in range 350-550, while the previous runs' values were in range 750-1350.

</details>

Built and run on `release` target.

<details>

./build/release/scylla perf-simple-query --smp 1 --duration 300 --concurrency 1000 --enable-cache false --bypass-cache 2>/dev/null

```
throughput:
        mean=   14910.90 standard-deviation=477.72
        median= 14956.73 median-absolute-deviation=294.16
        maximum=16061.18 minimum=13198.68
instructions_per_op:
        mean=   659591.63 standard-deviation=495.85
        median= 659595.46 median-absolute-deviation=324.91
        maximum=661184.94 minimum=658001.49
cpu_cycles_per_op:
        mean=   213301.49 standard-deviation=2724.27
        median= 212768.64 median-absolute-deviation=1403.85
        maximum=225837.15 minimum=208110.12

⏱  real=5:19.26  user=5:00.22  sys=15.827  cpu=98%
```

./build/release/scylla perf-simple-query --smp 1 --duration 300 --concurrency 1000 --enable-cache false 2>/dev/null

```
throughput:
        mean=   93345.45 standard-deviation=4499.00
        median= 93915.52 median-absolute-deviation=2764.41
        maximum=104343.64 minimum=79816.66
instructions_per_op:
        mean=   65556.11 standard-deviation=97.42
        median= 65545.11 median-absolute-deviation=71.51
        maximum=65806.75 minimum=65346.25
cpu_cycles_per_op:
        mean=   34160.75 standard-deviation=803.02
        median= 33927.16 median-absolute-deviation=453.08
        maximum=39285.19 minimum=32547.13

⏱  real=5:03.23  user=4:29.46  sys=29.255  cpu=98%
```

./build/release/scylla perf-simple-query --smp 1 --duration 300 --concurrency 1000 --enable-cache true 2>/dev/null

```
throughput:
        mean=   206982.18 standard-deviation=15894.64
        median= 208893.79 median-absolute-deviation=9923.41
        maximum=232630.14 minimum=127393.34
instructions_per_op:
        mean=   35983.27 standard-deviation=6.12
        median= 35982.75 median-absolute-deviation=3.75
        maximum=36008.24 minimum=35952.14
cpu_cycles_per_op:
        mean=   17374.87 standard-deviation=985.06
        median= 17140.81 median-absolute-deviation=368.86
        maximum=26125.38 minimum=16421.99

⏱  real=5:01.23  user=4:57.88  sys=0.124  cpu=98%
```

./build/release/scylla perf-simple-query --smp 1 --duration 300 --concurrency 1000 --enable-cache false --bypass-cache 2>/dev/null

```
throughput:
        mean=   16198.26 standard-deviation=902.41
        median= 16094.02 median-absolute-deviation=588.58
        maximum=17890.10 minimum=13458.74
instructions_per_op:
        mean=   659752.73 standard-deviation=488.08
        median= 659789.16 median-absolute-deviation=334.35
        maximum=660881.69 minimum=658460.82
cpu_cycles_per_op:
        mean=   216070.70 standard-deviation=3491.26
        median= 215320.37 median-absolute-deviation=1678.06
        maximum=232396.48 minimum=209839.86

⏱  real=5:17.33  user=4:55.87  sys=18.425  cpu=99%
```

./build/release/scylla perf-simple-query --smp 1 --duration 300 --concurrency 1000 --enable-cache false 2>/dev/null

```
throughput:
        mean=   97067.79 standard-deviation=2637.79
        median= 97058.93 median-absolute-deviation=1477.30
        maximum=106338.97 minimum=87457.60
instructions_per_op:
        mean=   65695.66 standard-deviation=58.43
        median= 65695.93 median-absolute-deviation=37.67
        maximum=65947.76 minimum=65547.05
cpu_cycles_per_op:
        mean=   34300.20 standard-deviation=704.66
        median= 34143.92 median-absolute-deviation=321.72
        maximum=38203.68 minimum=33427.46

⏱  real=5:03.22  user=4:31.56  sys=29.164  cpu=99%
```

./build/release/scylla perf-simple-query --smp 1 --duration 300 --concurrency 1000 --enable-cache true 2>/dev/null

```
throughput:
        mean=   223495.91 standard-deviation=6134.95
        median= 224825.90 median-absolute-deviation=3302.09
        maximum=234859.90 minimum=193209.69
instructions_per_op:
        mean=   35981.41 standard-deviation=3.16
        median= 35981.13 median-absolute-deviation=2.12
        maximum=35991.46 minimum=35972.55
cpu_cycles_per_op:
        mean=   17482.26 standard-deviation=281.82
        median= 17424.08 median-absolute-deviation=143.91
        maximum=19120.68 minimum=16937.43

⏱  real=5:01.23  user=4:58.54  sys=0.136  cpu=99%
```

</details>

Fixes: #24567

This PR is a continuation of #24738 [transport: remove throwing protocol_exception on connection start](https://github.com/scylladb/scylladb/pull/24738). This PR does not solve a burning issue, but is rather an improvement in the same direction. As it is just an enhancement, it should not be backported.

Closes scylladb/scylladb#25408

* github.com:scylladb/scylladb:
  test/cqlpy: add protocol exception tests
  test/cqlpy: `test_protocol_exceptions.py` refactor message frame building
  test/cqlpy: `test_protocol_exceptions.py` refactor duplicate code
  transport: replace `make_frame` throw with return result
  cql3: remove throwing `protocol_exception`
  transport: replace throw in validate_utf8 with result_with_exception_ptr return
  transport: replace throwing protocol_exception with returns
  utils: add result_with_exception_ptr
  test/cqlpy: add unknown compression algorithm test case
2025-09-10 21:54:15 +03:00
Avi Kivity
fc64333040 Merge 'sstables/trie: add BTI index readers and writers' from Michał Chojnowski
This is yet another part in the BTI index project.

Overarching issue: https://github.com/scylladb/scylladb/issues/19191
Previous part: https://github.com/scylladb/scylladb/pull/25506/
Next part: plugging the BTI index readers and writers into sstable readers and writers.

The new code added in this PR isn't used outside of tests yet, but it's posted as a separate PR for reviewability.

On top of the key translation logic and the abstract trie writing and traversal logic, this series implements a writer and a reader of sstable index files (which map primary keys to positions in Data.db), as described in f16fb6765b/src/java/org/apache/cassandra/io/sstable/format/bti/BtiFormat.md.

Caveats:
1. I think the added test has reasonable coverage, but that depends on running it multiple times (though it shouldn't need more than a few runs to catch any bug it covers). This makes it somewhat awkward as a test meant for running in CI; it's better suited to being run many times after a relevant change.
2. These readers and writers are intended to be compatible with Cassandra, but I did *NOT* do any compatibility testing. The writers and readers added here have only been tested against each other, not against Cassandra's readers and writers.
3. This didn't undergo any proper benchmarking or optimization work. I did some measurements in the past, but everything has been rewritten so much since then that my old measurements are effectively invalidated. Frankly, I have no idea what the performance of all this branchy-branchy logic is now.

No backports needed, new functionality.

Closes scylladb/scylladb#25626

* github.com:scylladb/scylladb:
  test/manual: add bti_cassandra_compatibility_test
  test/lib/random_schema: add some constraints for generated uuid and time/date values
  test/lib/random_utils: add a variant of get_bytes which takes an `engine&`
  test/boost: add bti_index_test
  sstables/writer: add an accessor for the current write position in Data.db
  sstables/trie: introduce bti_index_reader
  sstables/trie: add bti_partition_index_writer.cc
  sstables/trie: add bti_row_index_writer.cc
  utils/bit_cast: add a new overload of write_unaligned()
  sstables/trie: add trie_writer::add_partial()
  sstables/consumer: add read_56()
  sstables/trie: make bti_node_reader::page_ptr copy-constructible
  sstables: extract abstract_index_reader from index_reader.hh to its own header
  sstables/trie: add an accessor to the file_writer under bti_node_sink
  sstables/types: make `deletion_time::operator tombstone()` const
  sstables/types: add sstables::deletion_time::make_live()
  sstables/trie: fix a special case in max_offset_from_child
  sstables/trie: handle `partition_region`s other than `clustered` in BTI position encoding
  sstables/trie: rewrite lcb_mismatch to handle fragment invalidation
  test/boost/bti_key_translation_test: fix a compilation error hidden behind `if constexpr`
2025-09-10 21:48:52 +03:00
Nadav Har'El
ce4592d8fc Merge 'test: cluster: deflake consistency checks after decommission' from Patryk Jędrzejczak
In the Raft-based topology, a decommissioning node is removed from group
0 after the decommission request is considered finished (and the token
ring is updated). Therefore, `check_token_ring_and_group0_consistency`
called just after decommission might fail when the decommissioned node
is still in group 0 (as a non-voter). We deflake all tests that call
`check_token_ring_and_group0_consistency` after decommission in this PR.

Fixes #25809

This PR improves CI stability and changes only tests, so it should be
backported to all supported branches.

Closes scylladb/scylladb#25927

* github.com:scylladb/scylladb:
  test: cluster: deflake consistency checks after decommission
  test: cluster: util: handle group 0 changes after token ring changes in wait_for_token_ring_and_group0_consistency
2025-09-10 17:57:02 +03:00
Dawid Pawlik
1ce76a6ca2 cqlpy/test_vector_index: add vector index version test
Test if the index version is the same as the base table version before
the index was created.
Test if recreating the index with the same parameters changes the version.
Test if altering the base table does not change the version.
Test if the user cannot specify the index version option themselves.
2025-09-10 15:19:36 +02:00