4052 Commits

Author SHA1 Message Date
Botond Dénes
34473302b0 Merge 'docs: document existing guardrails' from Andrzej Jackowski
This patch series introduces a new documentation for exiting guardrails.

Moreover:
 - Warning / failure messages of recently added write CL guardrails (SCYLLADB-259) are rephrased, so all guardrails have similar messages.
 - Some new tests are added, to help verify the correctness of the documentation and avoid situations where the documentation and implementation diverge.

Fixes: [SCYLLADB-257](https://scylladb.atlassian.net/browse/SCYLLADB-257)

No backport, just new docs and tests.

[SCYLLADB-257]: https://scylladb.atlassian.net/browse/SCYLLADB-257?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ

Closes scylladb/scylladb#29011

* github.com:scylladb/scylladb:
  test: add new guardrail tests matching documentation scenarios
  test: add metric assertions to guardrail replication strategy tests
  test: use regex matching in guardrail replication strategy tests
  test: extract ks_opts helper in test_guardrail_replication_strategy
  docs: document CQL guardrails
  cql: improve write consistency level guardrail messages
2026-03-20 08:56:00 +02:00
Michael Litvak
9172cc172e schema: add logstor cf property
add a schema property for tables with logstor storage
2026-03-18 19:24:26 +01:00
Piotr Dulikowski
d8b283e1fb Merge 'Add CQL forwarding for strongly consistent tables' from Wojciech Mitros
In this series we add support for forwarding strongly consistent CQL requests to suitable replicas, so that clients can issue reads/writes to any node and have the request executed on an appropriate tablet replica (and, for writes, on the Raft leader). We return the same CQL response as what the user would get while sending the request to the correct replica and we perform the same logging/stats updates on the request coordinator as if the coordinator was the appropriate replica.

The core mechanism of forwarding a strongly consistent request is sending an RPC containing the user's cql request frame to the appropriate replica and returning back a ready, serialized `cql_transport::response`. We do this in the CQL server - it is most prepared for handling these types and forwarding a request containing a CQL frame allows us to reuse near-top-level methods for CQL request handling in the new RPC handler (such as the general `process`)

For sending the RPC, the CQL server needs to obtain the information about who should it forward the request to. This requires knowledge about the tablet raft group members and leader. We obtain this information during the execution of a `cql3/strong_consistency` statement, and we return this information back to the CQL server using the generalized `bounce_to_shard` `response_message`, where we now store the information about either a shard, or a specific replica to which we should forward to. Similarly to `bounce_to_shard`, we need to handle this `result_message` in a loop - a replica may move during statement execution, or the Raft leader can change. We also use it for forwarding strongly consistent writes when we're not a member of the affected tablet raft group - in that case we need to forward the statement twice - once to any replica of the affected tablet, then that replica can find the leader and return this information to the coordinator, which allows the second request to be directed to the leader.

This feature also allows passing through exception messages which happened on the target replica while executing the statement. For that, many methods of the `cql_transport::cql_server::connection` for creating error responses needed to be moved to `cql_transport::cql_server`. And for final exception handling on the coordinator, we added additional error info to the RPC response, so that the handling can be performed without having the `result_message::exception` or `exception_ptr` itself.

Fixes [SCYLLADB-71](https://scylladb.atlassian.net/browse/SCYLLADB-71)

[SCYLLADB-71]: https://scylladb.atlassian.net/browse/SCYLLADB-71?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ

Closes scylladb/scylladb#27517

* github.com:scylladb/scylladb:
  test: add tests for CQL forwarding
  transport: enable CQL forwarding for strong consistency statements
  transport: add remote statement preparation for CQL forwarding
  transport: handle redirect responses in CQL forwarding
  transport: add exception handling for forwarded CQL requests
  transport: add basic CQL request forwarding
  idl: add a representation of client_state for forwarding
  cql_server: handle query, execute, batch in one case
  transport: inline process_on_shard in cql_server::process
  transport: extract process() to cql_server
  transport: add messaging_service to cql_server
  transport: add response reconstruction helpers for forwarding
  transport: generalize the bounce result message for bouncing to other nodes
  strong consistency: redirect requests to live replicas from the same rack
  transport: pass foreign_ptr into sleep_until_timeout_passes and move it to cql_server
  transport: extract the error handling from process_request_one
  transport: move error response helpers from connection to cql_server
2026-03-13 15:03:10 +01:00
Andrzej Jackowski
60aaea8547 cql: improve write consistency level guardrail messages
Update warn and fail messages for the write_consistency_levels_warned
and write_consistency_levels_disallowed guardrails to include the
configuration option name and actionable guidance. The main motivation
is to make the messages follow the conventions of other guardrails.

Refs: SCYLLADB-257
2026-03-13 14:40:45 +01:00
Avi Kivity
03186ce60d Merge 'Cleanup after auth v1 and default superuser code removal' from Marcin Maliszkiewicz
This is short cleanup after recent removal of creating default cassandra superuser and auth-v1 code removal.

Fixes https://scylladb.atlassian.net/browse/SCYLLADB-1036

Backport: no, just code cleanup

Closes scylladb/scylladb#29004

* github.com:scylladb/scylladb:
  auth: remove DEFAULT_SUPERUSER_NAME constant and dead DEFAULT_USER_PASSWORD
  auth: use configurable default_superuser in describe_roles
  auth: move default_superuser to common, remove _superuser member
  auth: use LOCAL_ONE for all auth queries
  auth: remove get_auth_ks_name indirection
2026-03-12 23:44:32 +02:00
Avi Kivity
e2eeef3e01 Merge 'service level: remove remnants of version 1 service level' from Gleb Natapov
can_use_effective_service_level_cache() always returns true now, so the function can be dropped entirely and all the code that assumes it may return false can be dropped as well. Also drop async versions of find_effective_service_level and get_user_scheduling_group since they are unused.

No need to backport, code removal,

Closes scylladb/scylladb#29002

* github.com:scylladb/scylladb:
  service level: make maybe_update_per_service_level_params synchronous
  service level: remove unused get_user_scheduling_group function
  service level: drop async find_effective_service_level
  service level: remove remnants of version 1 service level
2026-03-12 23:39:41 +02:00
Wojciech Mitros
32974770b0 test: add tests for CQL forwarding
Add basic cluster tests for CQL forwarding.
The test cases include:
- basic reads and writes
- prepared statements with binds
- forwarding from a non-replica
- exception passthrough during forwarding (using an injection)
- re-preparing a statement on the target node, even if the user
  query is also an EXECUTE request on a prepared statement
- verification metric updates

The existing test_basic_write_read was modified so that a few extra
cases could be validated on the same cluster.
2026-03-12 19:43:35 +01:00
Wojciech Mitros
916a9995c1 transport: enable CQL forwarding for strong consistency statements
We enable CQL forwarding by starting to return the bounce_to_node
result message in redirect_statement() instead of throwing. The
forwarding code introduced in the preceding patches reacts to these
messages, allowing the requests to be forwarded.

With the update, some tests assuming that requests can't be forwarded
need to be adjusted, so we do that as well.
2026-03-12 19:43:35 +01:00
Avi Kivity
76b6784c1a Merge 'cql3: track CQL parsing memory cost and use it for admission control' from Marcin Maliszkiewicz
Use rolling_max_tracker to record gross bytes allocated during each
CQL parse.  The rolling maximum is then added to the memory estimate
for incoming QUERY and PREPARE requests so that the admission control
in the CQL transport layer accounts for parsing overhead.

The measured memory footprint serves as upper bound rather than
exact number but it's purpose is to prevent OOMs under unprepared
statements heavy load.

In benchmark 1G memory node shows decrease of non-LSA memory usage
from peak 320MB (our coordinator budget is 10% of 1G) to 96MB. While
tps drops from 1.2 kops to 0.8 kops. Drop in tps is expected as
memory admission kicks in trying to prevent OOM.

This is phase 1 of OOM prevention, potential next steps:
- add second admission in query_processor::get_statement trying to prevent potential thundering herd problem
- decrease cql_server memory pool size
- count reads in the memory pool
- add per service level memory pool and a shared one

Related https://scylladb.atlassian.net/browse/SCYLLADB-740
Fixes https://scylladb.atlassian.net/browse/SCYLLADB-938

Backport: no, new feature, but we may reconsider if some customer needs it

Closes scylladb/scylladb#28919

* github.com:scylladb/scylladb:
  cql3: track CQL parsing memory cost and use it for admission control
  utils: add rolling max tracker
2026-03-12 19:59:52 +02:00
Wojciech Mitros
e44820ba1f transport: generalize the bounce result message for bouncing to other nodes
In the following patches, we'll start allowing forwarding requests to strongly
consistent tables so that they'll get executed on the suitable tablet Raft group
members. For that we'll reuse the approach that we already have for bouncing
requests to other shards - we'll try to execute a request locally, and the
result of that will be a bounce message with another replica as the target.
In this patch we generalize the former bounce_to_shard result message so that
it will be able to specify the target of the bounce as another shard or specific
replica.
We also rename it to result_message::bounce so that it stops implying that only
another shard may be its target.
Aside from the host_id and the shard, the new message also includes the timeout,
because in the service handling the forwarding we won't have the access to it,
and it's needed for specifying how long we should wait for the forwarded
requests. It also includes an information whether this is a write request
to return correct timeout response in case the deadline is exceeded.
We will return other hosts in the new bounce message when executing requests to
strongly consistent tables when we can't handle the request because we aren't
a suitable replica. We can't handle this message yet, so we don't return it
anywhere and we still assume that every bounce message is a bounce to the same
host.
2026-03-12 17:48:57 +01:00
Gleb Natapov
a934d8391d service level: drop async find_effective_service_level
find_cached_effective_service_level does exactly same thing now and it
is synchronous.
2026-03-12 14:28:26 +02:00
Marcin Maliszkiewicz
b277d9d9aa cql3: track CQL parsing memory cost and use it for admission control
Use rolling_max_tracker to record gross bytes allocated during each
CQL parse.  The rolling maximum is then added to the memory estimate
for incoming QUERY and PREPARE requests so that the admission control
in the CQL transport layer accounts for parsing overhead.

The measured memory footprint serves as upper bound rather than
exact number but it's purpose is to prevent OOMs under unprepared
statements heavy load.

In benchmark 1G memory node shows decrease of non-LSA memory usage
from peak 320MB (our coordinator budget is 10% of 1G) to 96MB. While
tps drops from 1.2 kops to 0.8 kops. Drop in tps is expected as
memory admission kicks in trying to prevent OOM.
2026-03-12 10:16:10 +01:00
Marcin Maliszkiewicz
2d22eea2f9 Merge 'cql3: Replace SCYLLA_ASSERT and abort by throwing_assert' from Nadav Har'El
In this patch we replace every single use of SCYLLA_ASSERT(), abort() and assert() in the cql3/ directory by throwing_assert().

The problem with SCYLLA_ASSERT()/abort()/assert() is that when it fails, it crashes Scylla. This is almost always a bad idea (see #7871 discussing why), but it's even riskier in front-end code like cql3/: In front-end code, there is a risk that due to a bug in our code, a specific user request can cause Scylla to crash. A malicious user can send this query to all nodes and crash the entire cluster. When the user is not malicious, it causes a small problem (a failing request) to become a much worse crash - and worse, the user has no idea which request is causing this crash and the crash will repeat if the same request is tried again.

All of this is solved by using the new throwing_assert(), which is the same as SCYLLA_ASSERT() but throws an exception (using on_internal_error()) instead of crashing. The exception will prevent the code path with the invalid assumption from continuing, but will result in only the current user request being aborted, with a clear error message reporting the internal server error due to an assertion failure.

I reviewed all the changes that I did in these patches to check that (to the best of my understanding) none of the assertions in cql3/ involve the sort of serious corruption that might require crashing the Scylla node entirely.

throwing_assert() also improves logging of assertion failures compared to the original SCYLLA_ASSERT()/abort() - SCYLLA_ASSERT() printed a message to stderr which in many installations is lost, and abort() often prints no message at all. But throwing_assert() uses Scylla's standard logger, and also includes a backtrace in the log message.

Fixes #13970 (Exorcise assertions from CQL code paths)
Refs #7871 (Exorcise assertions from Scylla)

Closes scylladb/scylladb#28847

* github.com:scylladb/scylladb:
  cql3: remove unnecessary assert()
  cql3: replace abort() by throwing_assert()
  cql3: Replace SCYLLA_ASSERT by throwing_assert
2026-03-12 09:09:24 +01:00
Marcin Maliszkiewicz
6d1153687a auth: remove get_auth_ks_name indirection
Replace get_auth_ks_name(qp) with db::system_keyspace::NAME directly.
The function always returned the constant "system" and its qp
parameter was unused.
2026-03-11 16:26:47 +01:00
Piotr Dulikowski
d9a277453e Merge 'cql3: pin prepared cache entry in prepare() to avoid invalid weak handle race' from Alex Dathskovsky
query_processor::prepare() could race with prepared statement invalidation: after loading from the prepared cache, we converted the cached object to a checked weak pointer and then continued asynchronous work (including error-injection waitpoints). If invalidation happened in that window, the weak handle could no longer be promoted and the prepare path could fail nondeterministically.

This change keeps a strong cache entry reference alive across the whole critical section in prepare() by using a pinned cache accessor (get_pinned()), and only deriving the weak handle while the entry is pinned. This removes the lifetime gap without adding retry loops.

  Test coverage was extended in test/cluster/test_prepare_race.py:

  - reproduces the invalidation-during-prepare window with injection,
  - verifies prepare completes successfully,
  - then invalidates again and executes the same stale client prepared object,
  - confirms the driver transparently re-requests/re-prepares and execution succeeds.

  This change introduces:

  - no behavior change for normal prepare flow besides stronger lifetime guarantees,
  - no new protocol semantics,
  - preserves existing cache invalidation logic,
  - adds explicit cluster-level regression coverage for both the race and driver reprepare path.
  - pushes the re prepare operation twards the driver, the server will return unprepared error for the first time and the driver will have to re prepare during execution stage

Fixes: https://github.com/scylladb/scylladb/issues/27657

Backport to active branches recommended: No node crash, but user-visible PREPARE failures under rare schema-invalidation race; low-risk timeout-bounded retry improves robustness.

Closes scylladb/scylladb#28952

* github.com:scylladb/scylladb:
  transport/messages: hold pinned prepared entry in PREPARE result
  cql3: pin prepared cache entry in prepare() to avoid invalid weak handle race
2026-03-11 12:09:23 +01:00
Botond Dénes
475220b9c9 Merge 'Remove the rest of pre raft topology code' from Gleb Natapov
Remove the rest of the code that assumes that either group0 does not exist yet or a cluster is till not upgraded to raft topology. Both of those are not supported any more.

No need to backport since we remove functionality here.

Closes scylladb/scylladb#28841

* github.com:scylladb/scylladb:
  service level: remove version 1 service level code
  features: move GROUP0_SCHEMA_VERSIONING to deprecated features list
  migration_manager: remove unused forward definitions
  test: remove unused code
  auth: drop auth_migration_listener since it does nothing now
  schema: drop schema_registry_entry::maybe_sync() function
  schema: drop make_table_deleting_mutations since it should not be needed with raft
  schema: remove calculate_schema_digest function
  schema: drop recalculate_schema_version function and its uses
  migration_manager: drop check for group0_schema_versioning feature
  cdc: drop usage of cdc_local table and v1 generation definition
  storage_service: no need to add yourself to the topology during reboot since raft state loading already did it
  storage_service: remove unused functions
  group0: drop with_raft() function from group0_guard since it always returns true now
  gossiper: do not gossip TOKENS and CDC_GENERATION_ID any more
  gossiper: drop tokens from loaded_endpoint_state
  gossiper: remove unused functions
  storage_service: do not pass loaded_peer_features to join_topology()
  storage_service: remove unused fields from replacement_info
  gossiper: drop is_safe_for_restart() function and its use
  storage_service: remove unused variables from join_topology
  gossiper: remove the code that was only used in gossiper topology
  storage_service: drop the check for raft mode from recovery code
  cdc: remove legacy code
  test: remove unused injection points
  auth: remove legacy auth mode and upgrade code
  treewide: remove schema pull code since we never pull schema any more
  raft topology: drop upgrade_state and its type from the topology state machine since it is not used any longer
  group0: hoist the checks for an illegal upgrade into main.cc
  api: drop get_topology_upgrade_state and always report upgrade status as done
  service_level_controller: drop service level upgrade code
  test: drop run_with_raft_recovery parameter to cql_test_env
  group0: get rid of group0_upgrade_state
  storage_service: drop topology_change_kind as it is no longer needed
  storage_service: drop check_ability_to_perform_topology_operation since no upgrades can happen any more
  service_storage: remove unused functions
  storage_service: remove non raft rebuild code
  storage_service: set topology change kind only once
  group0: drop in_recovery function and its uses
  group0: rename use_raft to maintenance_mode and make it sync
2026-03-11 10:24:20 +02:00
Nadav Har'El
00a819bcd8 cql3: remove unnecessary assert()
In cql3/, there was one call to assert() (not SCYLLA_ASSERT or
throwing_assert), and it was:

        const auto shard_num = smp::count;
        assert(shard_num > 0)

Rather than converting this assert() to throwing_assert() as I did in
previous patches, I decided to outright remove it: Seastar guarantees
that smp::count is not zero. Many other places in the code use
smp::count assuming that it is correct, no other place bothers to assert
it isn't zero.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2026-03-11 09:43:24 +02:00
Nadav Har'El
34eec020b3 cql3: replace abort() by throwing_assert()
After the previous patch replaced all SCYLLA_ASSERT() calls by
throwing_assert(), this patch also replaces all calls to abort().

All these abort() calls are supposedly cases that can never happen,
but if they ever do happen because of a bug, in none of these places
we absolutely need to crash - and exception that aborts the current
operation should be enough.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2026-03-11 09:43:11 +02:00
Nadav Har'El
c87d6407ed cql3: Replace SCYLLA_ASSERT by throwing_assert
In this patch we replace every single use of SCYLLA_ASSERT() in the cql3/
directory by throwing_assert().

The problem with SCYLLA_ASSERT() is that when it fails, it crashes Scylla.
This is almost always a bad idea (see #7871 discussing why), but it's even
riskier in front-end code like cql3/: In front-end code, there is a risk
that due to a bug in our code, a specific user request can cause Scylla
to crash. A malicious user can send this query to all nodes and crash
the entire cluster. When the user is not malicious, it causes a small
problem (a failing request) to become a much worse crash - and worse,
the user has no idea which request is causing this crash and the crash
will repeat if the same request is tried again.

All of this is solved by using the new throwing_assert(), which is the
same as SCYLLA_ASSERT() but throws an exception (using on_internal_error())
instead of crashing. The exception will prevent the code path with the
invalid assumption from continuing, but will result in only the current
user request being aborted, with a clear error message reporting the
internal server error due to an assertion failure.

I reviewed all the changes that I did in this patch to check that (to the
best of my understanding) none of the assertions in cql3/ involve the
sort of serious corruption that might require crashing the Scylla node
entirely.

throwing_assert() also improves logging of assertion failures compared
to the original SCYLLA_ASSERT() - SCYLLA_ASSERT() printed a message to
stderr which in many installations is lost, whereas throwing_assert()
uses Scylla's standard logger, and also includes a backtrace in the
log message.

Fixes #13970 (Exorcise assertions from CQL code paths)
Refs #7871 (Exorcise assertions from Scylla)

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2026-03-11 09:41:20 +02:00
Dawid Mędrek
167feabe1a cql3: Reject user-provided timestamps for strongly consistent tables
Similarly to LWTs, we reject queries with user-provided timestamps
when they target strongly consistent tables.

Such statements could force us to rewrite history, and that contradicts
the philosophy of linearizability we aim for.

Fixes SCYLLADB-879

Closes scylladb/scylladb#28867
2026-03-10 22:11:39 +02:00
Alex
3ac4e258e8 transport/messages: hold pinned prepared entry in PREPARE result
result_message::prepared now owns a strong pinned prepared-cache entry instead of relying only on a weak pointer view. This closes the remaining lifetime gap after query_processor::prepare() returns, so users of the returned PREPARE message cannot observe an invalidated weak handle during subsequent
  processing.

  - update result_message::prepared::cql constructor to accept pinned entry
  - construct weak view from owned pinned entry inside the message
  - pass pinned cache entry from query_processor::prepare() into the message constructor
2026-03-10 14:17:57 +02:00
Alex
27051d9a7c cql3: pin prepared cache entry in prepare() to avoid invalid weak handle race
query_processor::prepare() could race with prepared statement invalidation: after loading from the prepared cache, we converted the cached object to a checked weak pointer and then continued asynchronous work (including error-injection waitpoints). If invalidation happened in that window, the weak
  handle could no longer be promoted and the prepare path could fail nondeterministically.

  This change keeps a strong cache entry reference alive across the whole critical section in prepare() by using a pinned cache accessor (get_pinned()), and only deriving the weak handle while the entry is pinned. This removes the lifetime gap without adding retry loops.

  Test coverage was extended in test/cluster/test_prepare_race.py:

  - reproduces the invalidation-during-prepare window with injection,
  - verifies prepare completes successfully,
  - then invalidates again and executes the same stale client prepared object,
  - confirms the driver transparently re-requests/re-prepares and execution succeeds.

  This change introduces:

  - no behavior change for normal prepare flow besides stronger lifetime guarantees,
  - no new protocol semantics,
  - preserves existing cache invalidation logic,
  - adds explicit cluster-level regression coverage for both the race and driver reprepare path.
  - pushes the re prepare operation twards the driver, the server will return unprepared error for the first time and the driver will have to re prepare during execution stage
2026-03-10 14:17:57 +02:00
Gleb Natapov
b59b3d4f8a service level: remove version 1 service level code 2026-03-10 10:46:48 +02:00
Gleb Natapov
1d188f0394 auth: remove legacy auth mode and upgrade code
A system needs to be upgraded to use v2 auth before moving to this
ScyllaDB version otherwise the boot will fail.
2026-03-10 10:09:39 +02:00
Michał Chojnowski
ff60a5f1e5 cql3: suggest ALTER MATERIALIZED VIEW to users trying to use ALTER TABLE on a view
When a user tries to use ALTER TABLE on a materialized view,
the resulting error message is `Cannot use ALTER TABLE on Materialized View`.

The intention behind this error is that ALTER MATERIALIZED VIEW should
be used instead.

But we observed that some users interpret this error message as a general
"You cannot do any ALTER on this thing".

This patch enhances the error message (and others similar to it)
to prevent the confusion.

Closes scylladb/scylladb#28831
2026-03-09 15:07:21 +01:00
Szymon Malewski
d817e56e87 vector_similarity_fcts.cc: fix strict aliasing violation in extract_float_vector
Previous code performed endian conversion by bulk-copying raw bytes
into a std::vector<float> and then iterating over it via a
reinterpret_cast<uint32_t*> pointer. Accessing float storage through a
uint32_t* violates C++ strict aliasing rules, giving the compiler
freedom to reorder or elide the stores, causing undefined behavior.

Replace the two-pass approach with a single-pass loop using
seastar::consume_be<uint32_t>() and std::bit_cast<float>(), which is
both well-defined and auto-vectorizable.

Follow-up #28754

Closes scylladb/scylladb#28912
2026-03-06 09:15:45 +01:00
Szymon Malewski
f9d213547f cql3: selection: fix add_column_for_post_processing for ORDER BY
The purpose of `add_column_for_post_processing` is to add columns that are required for processing of a query,
but are not part of SELECT clause and shouldn't be returned. They are added to the final result set, but later are not serialized.
Mainly it is used for filtering and grouping columns, with a special case of `WHERE primary_key IN ...  ORDER BY ...` when the whole result set needs additional final sorting,
and ordering columns must be added as well.
There was a bug that manifested in #9435, #8100 and was actually identified in #22061.
In case of selection with processing (e.g functions involved), result set row is formed in two stages.
Initially it is a list of columns fetched from replicas - on which filtering and grouping is performed.
After that the actual selection is resolved and the final number of columns can change.
Ordering is performed on this final shape, but the ordering column index returned by `add_column_for_post_processing` refereed to initial shape.
If selection refereed to the same column twice (e.g. `v, TTL(v)` as in #9435) final row was longer than initial and ordering refereed to incorrect column.
If a function in selection refereed to multiple columns (e.g. as_json(.., ..) which #8100 effectively uses) the final row was shorter
and ordering tried to use a non-existing column.

This patch fixes the problem by making sure that column index of the final result set is used for ordering.

The previously crashing test `cassandra_tests/validation/entities/json_test.py::testJsonOrdering` doesn't have to be skipped, but now it is failing on issue #28467.

Fixes #9435
Fixes #8100
Fixes #22061

Closes scylladb/scylladb#28472
2026-03-05 19:22:34 +02:00
Marcin Maliszkiewicz
c3f59e4fa1 Merge 'cql3: implement write_consistency_levels guardrails' from Andrzej Jackowski
This patch series implements `write_consistency_levels_warned` and `write_consistency_levels_disallowed` guardrails, allowing the configuration of which consistency levels are unwanted for writes. The motivation for these guardrails is to forbid writing with consistency levels that don't provide high durability guarantees (like CL=ANY, ONE, or LOCAL_ONE).

Neither guardrail is enabled by default, so as not to disrupt clusters that are currently using any of the CLs for writes. The warning guardrail may seem harmless, as it only adds a warning to the CQL response; however, enabling it can significantly increase network traffic (as a warning message is added to each response) and also decrease throughput due to additional allocations required to prepare the warning. Therefore, both guardrails should be enabled with care. The newly added `writes_per_consistency_level` metric, which is incremented unconditionally, can help decide whether a guardrail can be safely enabled in an existing cluster.

This commit adds additional `if` instructions on the critical path. However, based on the `perf_simple_query` benchmark for writes, the difference is marginal (~40 additional instructions, which is a relative difference smaller than 0.001).

BEFORE:
```
291443.35 tps ( 53.3 allocs/op,  16.0 logallocs/op,  14.2 tasks/op,   48067 insns/op,   18885 cycles/op,        0 errors)
throughput:
 mean=   289743.07 standard-deviation=6075.60
 median= 291424.69 median-absolute-deviation=1702.56
 maximum=292498.27 minimum=261920.06
instructions_per_op:
 mean=   48072.30 standard-deviation=21.15
 median= 48074.49 median-absolute-deviation=12.07
 maximum=48119.87 minimum=48019.89
cpu_cycles_per_op:
 mean=   18884.09 standard-deviation=56.43
 median= 18877.33 median-absolute-deviation=14.71
 maximum=19155.48 minimum=18821.57
```

AFTER:
```
290108.83 tps ( 53.3 allocs/op,  16.0 logallocs/op,  14.2 tasks/op,   48121 insns/op,   18988 cycles/op,        0 errors)
throughput:
 mean=   289105.08 standard-deviation=3626.58
 median= 290018.90 median-absolute-deviation=1072.25
 maximum=291110.44 minimum=274669.98
instructions_per_op:
 mean=   48117.57 standard-deviation=18.58
 median= 48114.51 median-absolute-deviation=12.08
 maximum=48162.18 minimum=48087.18
cpu_cycles_per_op:
 mean=   18953.43 standard-deviation=28.76
 median= 18945.82 median-absolute-deviation=20.84
 maximum=19023.93 minimum=18916.46
```

Fixes: SCYLLADB-259
Refs: SCYLLADB-739
No backport, it's a new feature

Closes scylladb/scylladb#28570

* github.com:scylladb/scylladb:
  scylla.yaml: add write CL guardrails to scylla.yaml
  scylla.yaml: reorganize guardrails config to be in one place
  test: add cluster tests for write CL guardrails
  test: implement test_guardrail_write_consistency_level
  cql3: start using write CL guardrails
  cql3/query_processor: implement metrics to track CL of writes
  db: cql3/query_processor: add write_consistency_levels enum_sets
  config: add write_consistency_levels_* guardrails configuration
2026-03-05 09:55:38 +01:00
Szymon Malewski
212bd6ae1a vector: Vectorize loops in similarity functions
The main loops iterating over vector components were not vectorized due to:
- "cannot prove it is safe to reorder floating-point operations"
- "Cannot vectorize early exit loop with more than one early exit"
The first issue is fixed with adding `#pragma clang fp contract(fast) reassociate(on)`, which allows compiler to optimize floating point operations.
The second issue is solved by refactoring the operations in the affected loop.
Additionally using float operations instead of double increases throughput and numerical accuracy is not the main consideration in vector search scenarios.

Performance measured:
- scylla built using dbuild
- using https://github.com/zilliztech/VectorDBBench (modified to call `SELECT id, similarity_cosine({vector<float, 1536>}, {vector<float, 1536>}) ...` without ANN search):
- client concurrency 20
before: ~2250 QPS
`float` operations: ~2350 QPS
`compute_cosine_similarity` vectorization: ~2500QPS
`extract_float_vector` vectorization: ~3000QPS

Follow-up https://github.com/scylladb/scylladb/pull/28615
Ref https://scylladb.atlassian.net/browse/SCYLLADB-764

Closes scylladb/scylladb#28754
2026-03-04 15:14:53 +01:00
Andrzej Jackowski
bb359b3b78 cql3: start using write CL guardrails
Enable verification of write consistency level guardrails in
`modification_statement` and `batch_statement`.

Neither guardrail is enabled by default, so as not to disrupt clusters
that are currently using any of the CLs for writes. The warning
guardrail may seem harmless, as it only adds a warning to the CQL
response; however, enabling it can significantly increase network
traffic (as a warning message is added to each response) and also
decrease throughput due to additional allocations required to prepare
the warning. Therefore, both guardrails should be enabled with care.
The newly added `writes_per_consistency_level` metric, which is
incremented unconditionally, can help decide whether a guardrail can
be safely enabled in an existing cluster.

This commit adds additional `if` instructions on the critical path.
However, based on the `perf_simple_query` benchmark for writes,
the difference is marginal (~40 additional instructions, which is
a relative difference smaller than 0.001).

BEFORE:
```
291443.35 tps ( 53.3 allocs/op,  16.0 logallocs/op,  14.2 tasks/op,   48067 insns/op,   18885 cycles/op,        0 errors)
throughput:
	mean=   289743.07 standard-deviation=6075.60
	median= 291424.69 median-absolute-deviation=1702.56
	maximum=292498.27 minimum=261920.06
instructions_per_op:
	mean=   48072.30 standard-deviation=21.15
	median= 48074.49 median-absolute-deviation=12.07
	maximum=48119.87 minimum=48019.89
cpu_cycles_per_op:
	mean=   18884.09 standard-deviation=56.43
	median= 18877.33 median-absolute-deviation=14.71
	maximum=19155.48 minimum=18821.57
```

AFTER:
```
290108.83 tps ( 53.3 allocs/op,  16.0 logallocs/op,  14.2 tasks/op,   48121 insns/op,   18988 cycles/op,        0 errors)
throughput:
	mean=   289105.08 standard-deviation=3626.58
	median= 290018.90 median-absolute-deviation=1072.25
	maximum=291110.44 minimum=274669.98
instructions_per_op:
	mean=   48117.57 standard-deviation=18.58
	median= 48114.51 median-absolute-deviation=12.08
	maximum=48162.18 minimum=48087.18
cpu_cycles_per_op:
	mean=   18953.43 standard-deviation=28.76
	median= 18945.82 median-absolute-deviation=20.84
	maximum=19023.93 minimum=18916.46
```

Fixes: SCYLLADB-259
2026-03-04 07:26:00 +01:00
Dario Mirovic
6a1edab2ac client_state: add has_superuser method
Encapsulate the superuser check in client_state so that it
respects _bypass_auth_checks. Connections that bypass auth
(internal callers and the maintenance socket) are always
considered superusers.

Migrate existing call sites from auth::has_superuser(service, user)
to client_state.has_superuser(). Also add _bypass_auth_checks
handling to ensure_not_anonymous().

Refs SCYLLADB-409
2026-03-03 22:31:35 +01:00
Andrzej Jackowski
371cdb3c81 cql3/query_processor: implement metrics to track CL of writes
Add `write_consistency_levels_disallowed_violations` and
`write_consistency_levels_warned_violations` metrics to track
violations of write_consistency_levels guardrails.

Add `writes_per_consistency_level` to track what CL is used by
writes, regardless of the guardrails configuration.
Data gathered by this metric can be used to decide whether enabling
a particular write consistency level guardrail in a particular
existing cluster is safe.

Refs: SCYLLADB-259
2026-03-03 21:18:11 +01:00
Andrzej Jackowski
3606934458 db: cql3/query_processor: add write_consistency_levels enum_sets
Add enum_sets to query_processor that track the configuration
values of `write_consistency_levels_warned` and
`write_consistency_levels_disallowed`.

Refs: SCYLLADB-259
2026-03-03 20:28:57 +01:00
Karol Nowacki
30487e8854 index: fix vector index with filtering target column
The secondary index mechanism is currently used to determine the target column.
This mechanism works incorrectly for vector indexes with filtering because
it returns the last specified column as the target (vectors) column.
However, the syntax for a vector index requires the first column to be the target:
```
CREATE CUSTOM INDEX ON t(vectors, users) USING 'vector_index';
```

This discrepancy eventually leads to the following exception when performing an
ANN search on a vector index with filtering columns:
````
ANN ordering by vector requires the column to be indexed using 'vector_index'
````

This commit fixes the issue by introducing dedicated logic for vector indexes
to correctly identify the target(vectors) column.

Fixes: SCYLLADB-635

Closes scylladb/scylladb#28740
2026-03-02 18:47:58 +02:00
Marcin Maliszkiewicz
a83ee6cf66 Merge 'db/batchlog_manager: re-add v1 support for mixed clusters' from Botond Dénes
3f7ee3ce5d introduced system.batchlog_v2, with a schema designed to speed up batchlog replays and make post-replay cleanups much more effective.
It did not introduce a cluster feature for the new table, because it is node local table, so the cluster can switch to the new table gradually, one node at a time.
However, https://github.com/scylladb/scylladb/issues/27886 showed that the switching causes timeouts during upgrades, in mixed clusters. Furthermore, switching to the new table unconditionally  on upgrades nodes, means that on rollback, the batches saved into the v2 table are lost.
This PR introduces re-introduces v1 (`system.batchlog`) support and guards the use of the v2 table with a cluster feature, so mixed clusters keep using v1 and thus be rollback-compatible.
The re-introduced v1 support doesn't support post-replay cleanups for simplicity. The cleanup in v1 was never particularly effective anyway and we ended up disabling it for heavy batchlog users, so I don't think the lack of support for cleanup is a problem.

Fixes: https://github.com/scylladb/scylladb/issues/27886

Needs backport to 2026.1, to fix upgrades for clusters using batches

Closes scylladb/scylladb#28736

* github.com:scylladb/scylladb:
  test/boost/batchlog_manager_test: add tests for v1 batchlog
  test/boost/batchlog_manager_test: make prepare_batches() work with both v1 and v2
  test/boost/batchlog_manager_test: fix indentation
  test/boost/batchlog_manager_test: extract prepare_batches() method
  test/lib/cql_assertions: is_rows(): add dump parameter
  tools/scylla-sstable: extract query result printers
  tools/scylla-sstable: add std::ostream& arg to query result printers
  repair/row_level: repair_flush_hints_batchlog_handler(): add all_replayed to finish log
  db/batchlog_manager: re-add v1 support
  db/batchlog_manager: return all_replayed from process_batch()
  db/batchlog_manager: process_bath() fix indentation
  db/batchlog_manager: make batch() a standalone function
  db/batchlog_manager: make structs stats public
  db/batchlog_manager: allocate limiter on the stack
  db/batchlog_manager: add feature_service dependency
  gms/feature_service: add batchlog_v2 feature
2026-03-02 12:09:10 +01:00
Marcin Maliszkiewicz
a03ebe1a29 Merge 'cql: implement a new per-row TTL feature' from Nadav Har'El
This series implements a new per-row TTL feature for CQL. The per-row TTL feature was requested in issue #13000. It is a feature that does not exist in Cassandra, and was inspired by DynamoDB's TTL feature - and under the hood uses the same implementation that we used in Alternator to implement this DynamoDB feature.

The new per-row TTL feature is completely separate from CQL's existing per-write (and per-cell) TTL, and both will be available to users.

In the per-row TTL feature, one column in the table is designated as the "TTL" column, and its value for a row is the expiration time for that row. The TTL column can be designated at table creation time, e.g.:

```cql
CREATE TABLE tab (
    id int PRIMARY KEY,
    t text,
    expiration timestamp TTL
);
```

Or after the table already exists with:

```cql
ALTER TABLE tab TTL expiration
```

Expiration can also be disabled, with:

```cql
ALTER TABLE tab TTL NULL
```

The new per-row TTL feature has two features that users have been asking for:

1. A user can change the value of just the TTL column - without rewriting the entire row - to change the expiration time of the entire row.
2. When an expired row is finally deleted, a CDC event about this deletion appears in the CDC log (if CDC is enabled), including - if a preimage is enabled - the content of the deleted row.

To achieve the second goal (CDC events), a row is not guaranteed to disappear at exactly its expiration time (as CQL's original TTL feature guarantees). Rather, the row is deleted some time later, depending on `alternator_ttl_period_in_seconds`; Until the actual deletion, the row is still readable (and even writable). But we are guaranteed that when the row is finally deleted, the CDC event will come too.

The implementation uses the same background thread used by Alternator to periodically scan for expired items and delete them.

The expiration thread keeps the same metrics as it did for Alternator:
* `scylla_expiration_scan_passes`
* `scylla_expiration_scan_table`
* `scylla_expiration_items_deleted`
* `scylla_expiration_secondary_ranges_scanned`

The series begins with a few small preparation patches, followed by the main part of the feature (which isn't big, since we are just enabling the pre-existing Alternator expiration machinary for CQL) and finally 30 tests (single-node and multi-node tests) and documentation.

This series is a new feature, so traditionally would not be backported. However, I wouldn't be surprised if we will be requested to backport it so that customers will not need to wait for a new major release.

Fixes #13000

Closes scylladb/scylladb#28320

* github.com:scylladb/scylladb:
  test/cqlpy: verify that a column can't be both STATIC and PRIMARY KEY
  docs/cql: document the new CQL per-row TTL feature
  test/cluster: tests for the new CQL per-row TTL feature
  test/cqlpy: tests for the new CQL per-row TTL feature
  test: set low alternator_ttl_period_in_seconds in CQL tests
  cql ttl: fix ALTER TABLE to disable TTL if column is dropped
  cql ttl: add setting/unsetting of TTL column to ALTER TABLE
  cql ttl: add TTL column support to CREATE TABLE and DESC TABLE
  ttl: add CQL support to Alternator's TTL expiration service
  alternator ttl: move TTL_TAG_KEY to a header file
  alternator ttl: remove unnecessary check of feature flag
  cql: add "cql_row_ttl" cluster feature
  alternator: fix error message if UpdateTimeToLive is not supported
2026-02-26 15:29:12 +01:00
Yaniv Michael Kaul
ead9961783 cql: vector: fix vector dimension type
Switch vector dimension handling to fixed-width `uint32_t` type,
update parsing/validation, and add boundary tests.

The dimension is parsed as `unsigned long` at first which is guaranteed
to be **at least** 32-bit long, which is safe to downcast to `uint32_t`.

Move `MAX_VECTOR_DIMENSION` from `cql3_type::raw_vector` to `cql3_type`
to ensure public visibility for checks outside the class.

Add tests to verify the type boundaries.

Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-223

Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>
Co-authored-by: Dawid Pawlik <dawid.pawlik@scylladb.com>

Closes scylladb/scylladb#28762
2026-02-26 14:46:53 +02:00
Nadav Har'El
eebd7b0fbc cql ttl: fix ALTER TABLE to disable TTL if column is dropped
If "ALTER TABLE tab DROP x" is done to delete column x, and column x was
the designated TTL column, then the per-row TTL feature should be disabled
on this table.

If we don't do this, the expiration scanner will continue to scan the
table trying to read the dropped column - which will be wasteful or
worse.

A test for this case is also included in test/cqlpy/test_ttl_row.py
in a later patch.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2026-02-25 14:59:43 +02:00
Nadav Har'El
acbdf637b6 cql ttl: add setting/unsetting of TTL column to ALTER TABLE
The previous patch added the ability in CREATE TABLE to designate one
of the regular columns as a "TTL column", to be used by the per-row TTL
feature (Refs #13000). In this patch we add to ALTER TABLE the ability
to enable per-row TTL on an existing table with a given column as the
TTL column:

     ALTER TABLE tab TTL colname

and also the ability to disable per-row TTL with

     ALTER TABLE tab TTL NULL

as in CREATE TABLE, the designated TTL column must be a regular column
(it can't be a primary key column or a static column), and must have
the types timestamp, bigint or int.

You can't enable per-row TTL if already enabled, or disable it if
already disabled. To change the TTL column on an existing table,
you must first disable TTL, and then re-enable it with the new column.

A large collection of functional tests (in test/cqlpy), for every detail
of this patch, will come in a later patch in this series.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2026-02-25 14:59:43 +02:00
Nadav Har'El
22c79b6af8 cql ttl: add TTL column support to CREATE TABLE and DESC TABLE
This patch enables the per-row TTL feature in CQL (Refs #13000).

This patch allows the user to create a new table with one of its columns
designated as the TTL column with a syntax like:

    CREATE TABLE tab (
        id int PRIMARY KEY,
        t text,
        expiration timestamp TTL
    );

The column marked "TTL" must have the "timestamp", "bigint" or "int"
types (the choice of these types was explained in the previous patch),
and there can only be one such column. We decided not to allow a column
to be both a primary key column and a TTL column - although it would
have worked (it's supported in Alternator), I considered this non-useful
and confusing, and decided not to allow it in CQL. A TTL column also
can't be a static column.

We save the information of which column is the TTL column in a tag which
is read by the "expiration service" - originally a part of Alternator's
TTL implementation. After the previous patch, the expiration service is
running and knows how to understand CQL tables, so the CQL per-row TTL
feature will start to work.

This patch also implements DESC TABLE, printing the word "TTL" in the
right place of the output.

This patch doesn't yet implement ALTER TABLE that should allow enabling
or disabling the TTL column setting on an existing table - we'll do that
in the next patch.

A large collection of functional tests (in test/cqlpy), for every detail
of this feature will be added in a later patch.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2026-02-25 14:59:42 +02:00
Marcin Maliszkiewicz
c5dc086baf Merge 'vector_search: return NaN for similarity_cosine with all-zero vectors' from Dawid Pawlik
The ANN vector queries with all-zero vectors are allowed even on vector indexes with similarity function set to cosine.
When enabling the rescoring option, those queries would fail as the rescoring calls `similarity_cosine` function underneath, causing an `InvalidRequest` exception as all-zero vectors were not allowed matching Cassandra's behaviour.

To eliminate the discrepancy we want the all-zero vector `similarity_cosine` calls to pass, but return the NaN as the cosine similarity for zero vectors is mathematically incorrect. We decided not to use arbitrary values contrary to USearch, for which the distance (not to be confused with similarity) is defined as cos(0, 0) = 0, cos(0, x) = 1 while supporting the range of values [0, 2].
If we wanted to convert that to similarity, that would mean sim_cos(0, x) = 0.5, which does not support mathematical reasoning why that would be more similar than for example vectors marking obtuse angles.
It's safe to assume that all-zero vectors for cosine similarity shouldn't make any impact, therefore we return NaN and eliminate them from best results.

Adjusted the tests accordingly to check both proper Cassandra and Scylla's behaviour.

Fixes: SCYLLADB-456

Backport to 2026.1 needed, as it fixes the bug for ANN vector queries using rescoring introduced there.

Closes scylladb/scylladb#28609

* github.com:scylladb/scylladb:
  test/vector_search: add reproducer for rescoring with zero vectors
  vector_search: return NaN for similarity_cosine with all-zero vectors
2026-02-23 13:10:44 +01:00
Botond Dénes
48e9b3d668 tools/scylla-sstable: extract query result printers
To cql3/query_result_printer.hh. Allowing for other users, outside of
tools.
2026-02-20 07:03:46 +02:00
Szymon Malewski
668d6fe019 vector: Improve similarity functions performance
Improves performance of deserialization of vector data for calculating similarity functions.
Instead of deserializing vector data into a std::vector<data_value>, we deserialize directly into a std::vector<float>
and then pass it to similarity functions as a std::span<const float>.
This avoids overhead of data_value allocations and conversions.
Example QPS of `SELECT id, similarity_cosine({vector<float, 1536>}, {vector<float, 1536>}) ...`:
client concurrency 1: before: ~135 QPS, after: ~1005 QPS
client concurrency 20: before: ~280 QPS, after: ~2097 QPS
Measured using https://github.com/zilliztech/VectorDBBench (modified to call above query without ANN search)

Fixes https://scylladb.atlassian.net/browse/SCYLLADB-471

Closes scylladb/scylladb#28615
2026-02-18 00:33:34 +02:00
Avi Kivity
ffde2414e8 cql3: grammar: remove special case for vector similarity functions in selectors
In b03d520aff ("cql3: introduce similarity functions syntax") we
added vector similarity functions to the grammar. The grammar had to
be modified because we wanted to support literals as vector similarity
function arguments, and the general function syntax in selectors
did not allow that.

In cc03f5c89d ("cql3: support literals and bind variables in
selectors") we extended the selector function call grammar to allow
literals as function arguments.

Here, we remove the special case for vector similarity functions as
the general case in function calls covers all the possibilities the
special case does.

As a side effect, the vector similarity function names are no longer
reserved.

Note: the grammar change fixes an inconsistency with how the vector
similarity functions were evaluated: typically, when a USE statement
is in effect, an unqualified function is first matched against functions
in the keyspace, and only if there is no match is the system keyspace
checked. But with the previous implementation vector similarity functions
ignored the USE keyspace and always matched only the system keyspace.

This small inconsistency doesn't matter in practice because user defined
functions are still experimental, and no one would name a UDF to conflict
with a system function, but it is still good to fix it.

Closes scylladb/scylladb#28481
2026-02-17 12:40:21 +01:00
Dawid Pawlik
af0889d194 vector_search: return NaN for similarity_cosine with all-zero vectors
The ANN vector queries with all-zero vectors are allowed even on vector
indexes with similarity function set to cosine.
When enabling the rescoring option, those queries would fail as the rescoring
calls `similarity_cosine` function underneath, causing an `InvalidRequest` exception
as all-zero vectors were not allowed matching Cassandra's behaviour.

To eliminate the discrepancy we want the all-zero vector `similarity_cosine` calls to pass,
but return the NaN as the cosine similarity for zero vectors is mathematically incorrect.
We decided not to use arbitrary values contrary to USearch, for which the distance
(not to be confused with similarity) is defined as cos(0, 0) = 0, cos(0, x) = 1 while
supporting the range of values [0, 2].
If we wanted to convert that to similarity, that would mean sim_cos(0, x) = 0.5,
which does not support mathematical reasoning why that would be more similar than
for example vectors marking obtuse angles.
It's safe to assume that all-zero vectors for cosine similarity shouldn't make any impact,
therefore we return NaN and eliminate them from best results.

Adjusted the tests accordingly to check both proper Cassandra and Scylla's behaviour.

Fixes: SCYLLADB-456
2026-02-11 12:31:47 +01:00
Nadav Har'El
df69dbec2a Merge ' cql3/statements/describe_statement: hide paxos state tables ' from Michał Jadwiszczak
Paxos state tables are internal tables fully managed by Scylla
and they shouldn't be exposed to the user nor they shouldn't be backed up.

This commit hides those kind of tables from all listings and if such table
is directly described with `DESC ks."tbl$paxos"`, the description is generated
withing a comment and a note for the user is added.

Fixes https://github.com/scylladb/scylladb/issues/28183

LWT on tablets and paxos state tables are present in 2025.4, so the patch should be backported to this version.

Closes scylladb/scylladb#28230

* github.com:scylladb/scylladb:
  test/cqlpy: add reproducer for hidden Paxos table being shown by DESC
  cql3/statements/describe_statement: hide paxos state tables
2026-02-02 21:22:59 +02:00
Avi Kivity
cc03f5c89d cql3: support literals and bind variables in selectors
Add support for literals in the SELECT clause. This allows
SELECT fn(column, 4) or SELECT fn(column, ?).

Note, "SELECT 7 FROM tab" becomes valid in the grammar, but is still
not accepted because of failed type inference - we cannot infer the
type of 7, and don't have a favored type for literals (like C favors
int). We might relax this later.

In the WHERE clause, and Cassandra in the SELECT clause, type hints
can also resolve type ambiguity: (bigint)7 or (text)?. But this is
deferred to a later patch.

A few changes to the grammar are needed on top of adding a `value`
alternative to `unaliasedSelector`:

 - vectorSimilarityArg gained access to `value` via `unaliasedSelector`,
   so it loses that alternate to avoid ambiguity. We may drop
   `vectorSimilarityArg` later.
 - COUNT(1) became ambiguous via the general function path (since
   function arguments can now be literals), so we remove this case
   from the COUNT special cases, remaining with count(*).
 - SELECT JSON and SELECT DISTINCT became "ambiguous enough" for
   ANTLR to complain, though as far as I can tell `value` does not
   add real ambiguity. The solution is to commit early (via "=>") to
   a parsing path.

Due to the loss of count(1) recognition in the parser, we have to
special-case it in prepare. We may relax it to count any expression
later, like modern Cassandra and SQL.

Testing is awkward because of the type inference problem in top-level.
We test via the set_intersection() function and via lua functions.

Example:

```
cqlsh> CREATE FUNCTION ks.sum(a int, b int) RETURNS NULL ON NULL INPUT RETURNS int  LANGUAGE LUA AS 'return a + b';
cqlsh> SELECT ks.sum(1, 2) FROM system.local;

 ks.sum(1, 2)
--------------
            3

(1 rows)
cqlsh>
```

(There are no suitable system functions!)

Fixes https://scylladb.atlassian.net/browse/SCYLLADB-296

Closes scylladb/scylladb#28256
2026-02-02 00:06:13 +02:00
Marcin Maliszkiewicz
e18b519692 cql3: remove find_schema call from select check_access
Schema is already a member of select statement, avoiding
the call saves around 400 cpu instructions on a select
request hot path.

Closes scylladb/scylladb#28328
2026-01-30 11:49:09 +01:00
Marcin Maliszkiewicz
a93ad3838f audit: cql: remove create_no_audit_info
We don't need a special guard value, it's
only being filled for batch statements for
which we can simply ignore the value.

Not having special value allows us to return
fast when audit is not enabled.
2026-01-26 10:18:38 +01:00
Piotr Dulikowski
3ec4f67407 Merge 'vector_index: Implement rescoring' from Szymon Malewski
This series implements rescoring algorithm.

Index options allowing to enable this functionality were introduced in earlier PR https://github.com/scylladb/scylladb/pull/28165.

When Vector Index has enabled quantization, Vector Store uses reduced vector representation to save memory, but it may degrade correctness of ANN queries. For quantized index we can enable rescoring algorithm, which recalculates similarity score from full vector representation stored in Scylla and reorder returned result set.
It works also with oversampling - we fetch more candidates from Vector Store, rescore them at Scylla and return only requested number of results.

Example:

Creating a Vector Index with Rescoring

```sql
-- Create a table with a vector column
CREATE TABLE ks.products (
    id int PRIMARY KEY,
    embedding vector<float, 128>
);

-- Create a vector index with rescoring enabled
CREATE INDEX products_embedding_idx ON ks.products (embedding)
    USING 'vector_index'
    WITH OPTIONS = {
        'similarity_function': 'cosine',
        'quantization': 'i8',
        'oversampling': '2.0',
        'rescoring': 'true'
    };
```

1. **Quantization** (`i8`) compresses vectors in the index, reducing memory usage but introducing precision loss in distance calculations
2. **Oversampling** (`2.0`) retrieves 2× more candidates than requested from the vector store (e.g., `LIMIT 10` fetches 20 candidates)
3. **Rescoring** (`true`) recalculates similarity scores using full-precision (`f32`) vectors from the base table and re-ranks results

Query example:

```sql
-- Find 10 most similar products
SELECT id, similarity_cosine(embedding, [0.1, 0.2, ...]) AS score
FROM ks.products
ORDER BY embedding ANN OF [0.1, 0.2, ...]
LIMIT 10;
```

With rescoring enabled, the query:
1. Fetches 20 candidates from the quantized index (due to oversampling=2.0)
2. Reads full-precision embeddings from the base table
3. Recalculates similarity scores with full precision
4. Re-ranks and returns the top 10 results

In this implementation we use CQL similarity function implementation to calculate new score values and use them in post query ordering. We add that column manually to selection, but it has to be removed from the final response.

Follow-up https://github.com/scylladb/scylladb/pull/28165
Fixes https://scylladb.atlassian.net/browse/SCYLLADB-83

New feature - doesn't need backport.

Closes scylladb/scylladb#27769

* github.com:scylladb/scylladb:
  vector_index: rescoring: Fetch oversampled rows
  vector_index: rescoring: Sort by similarity column
  select_statement: Modify `needs_post_query_ordering` condition
  vector_index: rescoring: Add hidden similarity score column
  vector_index: Refactor extracting ANN query information
2026-01-23 15:20:10 +01:00