Commit Graph

530 Commits

Author SHA1 Message Date
Karol Nowacki
086c6992f5 vector_search: Fix ANN query abort on CQL timeout
When a CQL vector search request timed out, the underlying ANN query was
not aborted and continued to run. This happened because the abort source
was not being signaled upon request expiration.
This commit ensures the ANN query is aborted when the CQL request times out
preventing unnecessary resource consumption.
2025-12-02 01:17:01 +01:00
Michał Hudobski
7646dde25b select_statement: add a warning about unsupported paging for vs queries
Currently we do not support paging for vector search queries.
When we get such a query with paging enabled we ignore the paging
and return the entire result. This behavior can be confusing for users,
as there is no warning about paging not working with vector search.
This patch fixes that by adding a warning to the result of ANN queries
with paging enabled.

Closes scylladb/scylladb#26384
2025-11-13 18:47:05 +02:00
Karol Nowacki
9f1fd7f5a0 cql3: Rename indexed_table_select_statement
To align with `vector_indexed_table_select_statement`, this commit renames
`indexed_table_select_statement` to `view_indexed_table_select_statement`
to clarify its usage with materialized views.
2025-10-29 08:37:25 +01:00
Karol Nowacki
357c0a8218 cql3: Move vector search select to dedicated class
The execution of SELECT statements with ANN ordering (vector search) was
previously implemented within `indexed_table_select_statement`. This was
not ideal, as vector search logic is independent of secondary index selects.

This resulted in unnecessary complexity because vector search queries don't
use features like aggregates or paging. More importantly,
`indexed_table_select_statement` assumed a non-null `view_schema` pointer,
which doesn't hold for vector indexes (where `view_ptr` is null).
This caused null pointer dereferences during ANN ordered selects, leading
to crashes (VECTOR-179). Other parts of the class still dereference
`view_schema` without null checks.

Moving the vector search select logic out of
`indexed_table_select_statement` simplifies the code and prevents these
null pointer dereferences.
2025-10-29 08:37:21 +01:00
Michał Hudobski
541b52cdbf cql: fail with a better error when null vector is passed to ann query
Currently when a null vector is passed to an ANN query we fail with a
quite confusing error ("NoHostAvailable: ('Unable to complete the
operation against any hosts', {<Host: 127.0.0.1:9042 datacenter1>:
<Error from server: code=0000 [Server error] message="to_bytes() called
on raw value that is null">})").

This patch fixes that by throwing an InvalidRequestException with an
appropriate message instead.
We also add a test case that validates this behavior.

Fixes: VECTOR-257

Closes scylladb/scylladb#26510
2025-10-27 16:09:08 +02:00
Michał Hudobski
5c957e83cb vector_search: remove dependence on cql3
This patch removes the dependence of vector search module
on the cql3 module by moving the contents of cql3/type_json.hh
to types/json_utils.hh and removing the usage of cql3 primary_key
object in vector_store_client. We also make the needed adjustments
to files that were previously using the afformentioned type_json.hh
file.

This fixes the circular dependency cql3 <-> vector_search.

Closes scylladb/scylladb#26482
2025-10-21 17:41:55 +03:00
Nadav Har'El
eb06ace944 Merge 'auth: implement vector store authorization' from Michał Hudobski
This patch implements the changes required by the Vector Store authorization, as described in https://scylladb.atlassian.net/wiki/spaces/RND/pages/107085899/Vector+Store+Authentication+And+Authorization+To+ScyllaDB, that is:

- adding a new permission VECTOR_SEARCH_INDEXING, grantable only on ALL KEYSPACES
- allowing users with that permission to perform SELECT queries, but only on tables with a vector index
- increasing the number of scheduling groups by one to allow users to create a service level for a vector store user
- adjusting the tests and documentation

These changes are needed, as the vector indexes are managed by the external service, Vector Store, which needs to read the tables to create the indexes in its memory. We would like to limit the privileges of that service to a minimum to maintain the principle of least privilege, therefore a new permission, one that allows the SELECTs conditional on the existence of a vector_index on the table.

Fixes: VECTOR-201

Backport reasoning:
Backport to 2025.4 required as this can make upgrading clusters more difficult if we add it in 2026.1. As for now Scylla Cloud requires version 2025.4 to enable vector search and permission is set by orchestrator so there is no chance that someone will try to add this permission during upgrade. In 2026.1 it will be more difficult.

Closes scylladb/scylladb#25976

* github.com:scylladb/scylladb:
  docs: adjust docs for VS auth changes
  test: add tests for VECTOR_SEARCH_INDEXING permission
  cql: allow VECTOR_SEARCH_INDEXING users to select
  auth: add possibilty to check for any permission in set
  auth: add a new permission VECTOR_SEARCH_INDEXING
2025-10-20 17:32:00 +03:00
Nadav Har'El
921d07a26b cql: make SELECT's "internal page size" configurable
In some uses of SELECT, such as aggregation (sum() et al.), GROUP BY or
secondary index, it needs to perform internal scans. It uses an "internal
page size" which before this patch was always DEFAULT_COUNT_PAGE_SIZE = 10000.

There was an ad-hoc and undocumented way to override this default in C++
tests, using functions in test/lib/select_statement_utils.hh, but it
was so non-obvious that the test that most needed to override this
default - the very slow test test_indexing_paging_and_aggregation which
would have been must faster with a lower setting - never used it.

So in this patch we replace the ad-hoc configuration functions by a
bona-fide Scylla configuration option named "select_internal_page_size".

The few C++ tests that used the old configuration functions were
modified to use the new configuration parameters. The slow test
test_indexing_paging_and_aggregation still doesn't use the new
configuration to become faster - we'll do this in the next patch.

Another benefit of having this "internal page size" as a configuration
option is that one day a user might realize that the default choice
10,000 is bad for some reason (which I can't envision right now), so
having it configurable might come it handy.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2025-10-15 18:42:09 +03:00
Michał Hudobski
6a69bd770a cql: allow VECTOR_SEARCH_INDEXING users to select
This patch allows users with the VECTOR_SEARCH_INDEXING permission
to perform SELECT queries on tables that have a vector index.
This is needed for the Vector Store service, which
reads the vector-indexed tables, but does not require
the full SELECT permission.
2025-10-03 16:55:57 +02:00
Szymon Wasik
ccfe80ab97 cql3: Update error messages to be in line with documentation.
ANN (aproximate nearest neighborhood) is just the name of the type
of algorithm used to perform vector search. For this reason the error
messages should refer to vector queries rather than ANN queries.
2025-09-26 17:01:10 +02:00
Karol Nowacki
eae71d3e91 vector_store_client: Move to vector_search module
Vector search related implementation moved to a new module vector_search.
As the vector search functionality is going to be extended, it is
better to keep it in a separate module.
2025-09-22 08:01:47 +02:00
Michał Hudobski
1690e5265a vector search: correct column name formatting
This patch corrects the column name formatting whenever
an "Undefined column name" exception is thrown.
Until now we used the `name()` function which
returns a bytes object. This resulted in a message
with a garbled ascii bytes column name instead of
a proper string. We switch to the `text()` function
that returns a sstring instead, making the message
readable.
Tests are adjusted to confirm this behavior.

Fixes: VECTOR-228

Closes scylladb/scylladb#26120
2025-09-20 07:02:53 +02:00
Nadav Har'El
e322902506 Merge 'index, metrics: add per-index metrics' from Michał Hudobski
This patch adds the possibility to track metrics
per secondary index. Currently, only a histogram
of query latencies is tracked, but more metrics
can be added in the future. To add a new metric,
it needs to be added to the index_metrics struct
in index/secondary_index_manager.hh and then
initialized in index/secondary_index_manager.cc
in the constructor of the index_metrics struct.
The metrics are created when the index is created
and removed when the index is dropped.

First lines of the new metric:
\# HELP scylla_index_query_latencies Index query latencies
\# TYPE scylla_index_query_latencies histogram
scylla_index_query_latencies_sum{idx="test_i_idx",ks="test"} 640
scylla_index_query_latencies_count{idx="test_i_idx",ks="test"} 1
scylla_index_query_latencies_bucket{idx="test_i_idx",ks="test",le="640.000000"} 1
scylla_index_query_latencies_bucket{idx="test_i_idx",ks="test",le="768.000000"} 1

Fixes: https://github.com/scylladb/scylladb/issues/25970

Closes scylladb/scylladb#25995

* github.com:scylladb/scylladb:
  test: verify that the index metric is added
  index, metrics: add per-index metrics
2025-09-17 14:54:12 +03:00
Ernest Zaslavsky
d624413ddd treewide: Move query related files to a new query directory
As requested in #22120, moved the files and fixed other includes and build system.

Moved files:
- query.cc
- query-request.hh
- query-result.hh
- query-result-reader.hh
- query-result-set.cc
- query-result-set.hh
- query-result-writer.hh
- query_id.hh
- query_result_merger.hh

Fixes: #22120

This is a cleanup, no need to backport

Closes scylladb/scylladb#25105
2025-09-16 23:40:47 +03:00
Michał Hudobski
b09d1f0a98 index, metrics: add per-index metrics
This patch adds the possibility to track metrics
per secondary index. Currently, only a histogram
of query latencies is tracked, but more metrics
can be added in the future. To add a new metric,
it needs to be added to the index_metrics struct
in index/secondary_index_manager.hh and then
initialized in index/secondary_index_manager.cc
in the constructor of the index_metrics struct.
The metrics are created when the index is created
and removed when the index is dropped.

First lines of the new metric:
\# HELP scylla_index_query_latencies Index query latencies
\# TYPE scylla_index_query_latencies histogram
scylla_index_query_latencies_sum{idx="test_i_idx",ks="test"} 640
scylla_index_query_latencies_count{idx="test_i_idx",ks="test"} 1
scylla_index_query_latencies_bucket{idx="test_i_idx",ks="test",le="640.000000"} 1
scylla_index_query_latencies_bucket{idx="test_i_idx",ks="test",le="768.000000"} 1
2025-09-16 14:03:43 +02:00
Nadav Har'El
5307d1b9a8 Merge 'vector_index: add version to index options' from Dawid Pawlik
Since creating the vector index does not lead to creation of a view table [#24438] (whose version info had been logged in `system_schema.scylla_tables`) we lacked the information about the version of the index.

The solution we arrived at is to add the version as a field in options column of `system_schema.indexes`.
It requires few changes and seems unintruitive for existing infrastructure.

This patch implements the solution described above.

Refs: VECTOR-142

Closes scylladb/scylladb#25614

* github.com:scylladb/scylladb:
  cqlpy/test_vector_index: add vector index version test
  vector_index, index_prop_defs: add version to index options
  create_index_statement: rename `validator` to `custom_index_factory`
  custom index: rename `custom_index_option_name`
  vector_index: rename `supported_options` to `vector_index_options`
2025-09-14 15:35:53 +03:00
Dawid Pawlik
be54346846 select_statement: check for access to CDC base table
Before the patch, user with CREATE access could create a table
with CDC or alter the table enabling CDC, but could not query
a SELECT on the CDC table they created.
It was due to the fact, the SELECT permission was checked on
the CDC log, and later it's "parent" - the keyspace,
but not thebase table, on which the user had SELECT permission
automatically granted on CREATE.

This patch matches the behaviour of querying the CDC log
to the one implemented for Materialized Views:
    1. No new permissions are granted on CREATE.
    2. When querying SELECT, the permissions on base table
SELECT are checked.

Fixes: #19798
2025-09-03 13:20:39 +02:00
Karol Nowacki
3086d15999 cql3: Fix crash on ANN OF query when TRACING ON is enabled
Executing a vector search (SELECT with ANN OF ordering) query with `TRACING ON` enabled
caused a node to crash due to a null pointer dereference.

This occurred because a vector index does not have an associated view
table, making its `_view_schema` member null. The implementation
attempted to enable tracing on this null view schema, leading to the
crash.

The fix adds a null check for `_view_schema` before attempting to
enable tracing on the view (index) table.

A regression test is included to prevent this from happening again.

Fixes: VECTOR-179

Closes scylladb/scylladb#25500
2025-09-01 17:26:54 +03:00
Dawid Pawlik
873d7dba5c custom index: rename custom_index_option_name
Renamed `custom_index_option_name` to `custom_class_option_name`
as the late was a bit misleading since we refactored our model
of custom indexes to be index class reliant.
2025-08-29 10:49:15 +02:00
Nadav Har'El
0a990d2a48 config: split tri_mode_restriction to a separate header
Today, any source file or header file that wants to use the
tri_mode_restriction type needs to include db/config.hh, which is a
large and frequently-changing header file. In this patch we split this
type into a separate header file, db/tri_mode_restriction.hh, and avoid
a few unnecessary inclusions of db/config.hh. However, a few source
files now need to explicitly include db/config.hh, after its
transitive inclusion is gone.

Note that the overwhelmingly common inclusion of db/config.hh continues
to be a problem after this patch - 128 source files include it directly.
So this patch is just the first step in long journey.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#25692
2025-08-27 13:47:04 +03:00
Avi Kivity
8164f72f6e Merge 'Separate local_effective_replication_map from vnode_effective_replication_map' from Benny Halevy
Derive both vnode_effective_replication_map
and local_effective_replication_map from
static_effective_replication_map as both are static and per-keyspace.

However, local_effective_replication_map does not need vnodes
for the mapping of all tokens to the local node.

Refs #22733

* No backport required

Closes scylladb/scylladb#25222

* github.com:scylladb/scylladb:
  locator: abstract_replication_strategy: implement local_replication_strategy
  locator: vnode_effective_replication_map: convert clone_data_gently to clone_gently
  locator: abstract_replication_map: rename make_effective_replication_map
  locator: abstract_replication_map: rename calculate_effective_replication_map
  replica: database: keyspace: rename {create,update}_effective_replication_map
  locator: effective_replication_map_factory: rename create_effective_replication_map
  locator: abstract_replication_strategy: rename vnode_effective_replication_map_ptr et. al
  locator: abstract_replication_strategy: rename global_vnode_effective_replication_map
  keyspace: rename get_vnode_effective_replication_map
  dht: range_streamer: use naked e_r_m pointers
  storage_service: use naked e_r_m pointers
  alternator: ttl: use naked e_r_m pointers
  locator: abstract_replication_strategy: define is_local
2025-08-07 12:51:43 +03:00
Benny Halevy
ec85678de1 locator: abstract_replication_strategy: define is_local
Prefer for specializing the local replication strategy,
local effective replication map, et. al byt defining
an is_local() predicate, similar to uses_tablets().

Note that is_vnode_based() still applies to local replication
strategy.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-08-06 13:34:23 +03:00
Jan Łakomy
447c66f4ec cql3/statements: implement external ANN OF queries
Implement execution of `ANN OF` queries using the vector_store service.

Throw invalid_request_exception with specific message using
the ann_error_visitor when ANN request returns no result.

Co-authored-by: Dawid Pawlik <dawid.pawlik@scylladb.com>
Co-authored-by: Michał Hudobski <michal.hudobski@scylladb.com>
2025-08-05 12:34:48 +02:00
Jan Łakomy
5fecad0ec8 cql3/statements: add ANN OF queries support to select statements
Add parsing of `ANN OF` queries to the `select_statement` and
`indexed_table_select_statement` classes.
Add a placeholder for the implementation of external ANN queries.

Rename `should_create_view` to `view_should_exist` as it is used
not only to check if the view should be created but also if
the view has been created.

Co-authored-by: Dawid Pawlik <dawid.pawlik@scylladb.com>
2025-08-01 12:08:50 +02:00
Petr Gusev
ff1caa9798 query_options: add node_local_only mode
We want to access the paxos state table only on the local node and
shard (or shards in case of intranode_migration). In this commit we
add a node_local_only flag to query_options, which allows to do that.
This flag can be set for a query via make_internal_options.

We handle this flag on the statements layer by forwarding it to
either coordinator_query_options or coordinator_mutate_options.
2025-07-24 19:48:08 +02:00
Pawel Pery
5bfce5290e cql3: refactor primary_key as a top-level class
This patch is a part of vector_store_client sharded service
implementation for a communication with vector-store service.

There is a need for forward declaration of primary_key class. This patch
moves a nested definition of select_statement::primary_key (from a
cql3::statements namespace) into a standalone class in a
cql3::statements namespace.

Reference: VS-47
2025-07-09 11:54:51 +02:00
Petr Gusev
3d262d2be8 LWT: create cas_shard in select_statement
In this commit we create cas_shard in select_statement
and pass it to the sp::query_result function.
2025-06-30 10:37:33 +02:00
Avi Kivity
f195c05b0d untyped_result_set: mark get_blob() as returning unfragmented data
Blobs can be large, and unfragmented blobs can easily exceed 128k
(as seen in #23903). Rename get_blob() to get_blob_unfragmented()
to warn users.

Note that most uses are fine as the blobs are really short strings.

Closes scylladb/scylladb#24102
2025-05-26 09:40:34 +02:00
Marcin Maliszkiewicz
e3f2ebd4fb cql3: remove not needed cmd copy in indexed_table_select_statement
It's not used variable. There should be a tiny perf increase as
it saves allocation.

Closes scylladb/scylladb#23473
2025-03-31 09:34:32 +03:00
Botond Dénes
39bcf99f8e Merge 'Apply hard limit to partition range vectors in secondary index queries' from Nikos Dragazis
Secondary index queries fetch partition keys from the index view and store them in an `std::vector`. The vector size is currently limited by the user's page size and the page memory limit (1MiB). These are not enough to prevent large contiguous allocations (which can lead to stalls).

This series introduces a hard limit to the vector size to ensure it does not exceed the allocator's preferred max contiguous allocation size (128KiB). With the size of each element being 120 bytes, this allows for 1092 partition keys. The limit was set to 1000. Any partitions above this limit are discarded.

Discarding partitions breaks the querier cache on the replicas, causing a performance regression, as can be seen from the following measurements:
```
* Cluster: 3 nodes (local Docker containers), 1 vCPU, 4GB memory, dev mode
* Schema:
  CREATE KEYSPACE ks WITH replication = {'class': 'org.apache.cassandra.locator.NetworkTopologyStrategy', 'datacenter1': '3'} AND durable_writes = true AND tablets = {'enabled': false};
  CREATE TABLE ks.t1 (pk1 int, pk2 int, ck int, value int, PRIMARY KEY ((pk1, pk2), ck));
  CREATE INDEX t1_pk2_idx ON ks.t1(pk2);
* Query: CONSISTENCY LOCAL_QUORUM; SELECT * FROM ks.t1 where pk2 = 1;

+------------+-------------------+-------------------+
|  Page Size |      Master       |   Vector Limit    |
+============+===================+===================+
|            |   Latency (sec)   |   Latency (sec)   |
+------------+-------------------+-------------------+
|     100    |  5.80 ± 0.13      |  5.64 ± 0.10      |
+------------+-------------------+-------------------+
|    1000    |  4.77 ± 0.07      |  4.62 ± 0.06      |
+------------+-------------------+-------------------+
|    2000    |  4.67 ± 0.07      |  5.13 ± 0.03      |
+------------+-------------------+-------------------+
|    5000    |  4.82 ± 0.09      |  6.25 ± 0.06      |
+------------+-------------------+-------------------+
|   10000    |  4.89 ± 0.36      |  7.52 ± 0.13      |
+------------+-------------------+-------------------+
|     -1     |  4.90 ± 0.67      |  4.79 ± 0.33      |
+------------+-------------------+-------------------+
```
We expect this to be fixed with adaptive paging in a future PR. Until then, users can avoid regressions by adjusting their page size.

Additionally, this series changes the `untyped_result_set` to store rows in a `chunked_vector` instead of an `std::vector`, similarly to the `result_set`. Secondary index queries use an `untyped_result_set` to store the raw result from the index view before processing. With 1MiB results, the `std::vector` would cause a large allocation of this magnitude.

Finally, a unit test is added to reproduce the bug.

Fixes #18536.

The PR fixes stalls of up to 100ms, but there is an easy workaround: adjust the page size. No need to backport.

Closes scylladb/scylladb#22682

* github.com:scylladb/scylladb:
  cql3: secondary index: Limit page size for single-row partitions
  cql3: secondary index: Limit the size of partition range vectors
  cql3: untyped_result_set: Store rows in chunked_vector
  test: Reproduce bug with large allocations from secondary index
2025-03-14 15:06:07 +02:00
Paweł Zakrzewski
d483051e44 cql3/select_statement: reject aggregate functions when PER PARTITION LIMIT is present
Before this patch we silently allowed and ignored PER PARTITION LIMIT.
While using aggregate functions in conjunction with PER PARTITION LIMIT
can make sense, we want to disable it until we can offer proper
implementation, see #9879 for discussion.

We want to match Cassandra, and for queries with aggregate functions it
behaves as follows:
- it silently ignores PER PARTITION LIMIT if GROUP BY is present, which
  matches our previous implementation.
- rejects PER PARTITION LIMIT when GROUP BY is *not* present.

This patch adds rejection of the second group.

Fixes #9879

Closes scylladb/scylladb#23086
2025-03-13 10:29:53 +02:00
Nikos Dragazis
7a6a4f54a5 cql3: secondary index: Limit page size for single-row partitions
The size of the partition range vector was constrained in the previous
patch. Any rows beyond the vector's capacity are discarded.

In the special case of single-row partitions, we know the size of each
partition, so we can enforce this limit on the query itself via the page
size.

Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>
2025-03-10 12:18:49 +02:00
Nikos Dragazis
76b31a3acc cql3: secondary index: Limit the size of partition range vectors
The partition range vector is an std::vector, which means it performs
contiguous allocations. Large allocations are known to cause problems
(e.g., reactor stalls).

For paged queries, limit the vector size to 1000. If more partition keys
are available in the query result, discard them. Ideally, we should not
be fetching them at all, but this is not possible without knowing the
size of each partition.

Currently, each vector element is 120 bytes and the standard allocator's
max preferred contiguous allocation is 128KiB. Therefore, the chosen
value of 1000 satisfies the constraint (128 KiB / 120 = 1092 > 1000).
This should be good enough for most cases. Since secondary index queries
involve one base table query per partition key, these queries are slow.
A higher limit would only make them slower and increase the probability
of a timeout. For the same reason, saving a follow-up paged request from
the client would not increase the efficiency much.

For unpaged queries, do not apply any limit. This means they remain
susceptible to stalls, but unpaged queries are considered unoptimized
anyway.

Finally, update the unit test reproducer since the bug is now fixed.

Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>
2025-03-10 12:18:42 +02:00
Paweł Zakrzewski
9e7f79d1ab cql3/select_statement: require LIMIT and PER PARTITION LIMIT to be strictly positive
LIMIT and PER PARTITION LIMIT limit the number of rows returned or taken
into consideration by a query. It makes no logical sense to have this
value at less than 1. Cassandra also has this requirement.

This patch ensures that the limit value is strictly positive and adds
an explicit test for it - it was only tested in a test ported from
Cassandra, that is disabled due to other issues.

Closes scylladb/scylladb#23013
2025-03-03 08:13:27 +02:00
Paweł Zakrzewski
854d2917a1 cql3/select_statement: reject PER PARTITION LIMIT with SELECT DISTINCT
Before this patch we silently allowed and ignored PER PARTITION LIMIT.
SELECT DISTINCT requires all the partition key columns, which means that
setting PER PARTITION LIMIT is redundant - only one result will be
returned from every partition anyway.

Cassandra behaves the same way, so this patch also ensures
compatibility.

Fixes scylladb/scylladb#15109

Closes scylladb/scylladb#22950
2025-02-24 14:50:18 +02:00
Kefu Chai
fd52b0a3cc cql3: fix false-positive "used-after-move" warning in clang-tidy
`slice.is_reversed()` was falsely flagged as accessing moved data, since
the underlying enum_set remains valid after move. However, to improve code
clarity and silence the warning, now reference `command->slice` directly
instead, which is guaranteed to be valid as the move target.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#22971
2025-02-23 18:58:35 +02:00
Kefu Chai
727d5637ab cql3: remove redundant std::move() in select_statement.cc
GCC-14 correctly flagged unnecessary use of std::move() where copy elision applies:
```
    return std::move(paging_state_copy);
```
This error occurs in indexed_table_select_statement::generate_view_paging_state_from_base_query_results
at line 1122. The C++17 standard guarantees copy elision for returning local variables, making
std::move() redundant in this context and potentially hindering compiler optimizations.

Fixes build failure with GCC-14 which treats redundant moves as errors
with -Werror=redundant-move.

The error message looks like:
```
/usr/lib64/ccache/g++ -DDEVEL -DSCYLLA_BUILD_MODE=dev -DSCYLLA_ENABLE_ERROR_INJECTION -DSCYLLA_ENABLE_PREEMPTION_SOURCE -DSEASTAR_ENABLE_ALLOC_FAILURE_INJECTION -DXXH_PRIVATE_API -DCMAKE_INTDIR=\"Dev\" -I/home/kefu/dev/scylladb -I/home/kefu/dev/scylladb/build -I/home/kefu/dev/scylladb/build/gen -isystem /home/kefu/dev/scylladb/build/rust -isystem /home/kefu/dev/scylladb/seastar/include -isystem /home/kefu/dev/scylladb/build/Dev/seastar/gen/include -isystem /home/kefu/dev/scylladb/abseil -I/usr/include/p11-kit-1  -O2 -std=gnu++23 -fvisibility=hidden -Wall -Werror -Wextra -Wno-error=deprecated-declarations -Wimplicit-fallthrough -Wno-deprecated-copy -Wno-mismatched-tags -Wno-missing-field-initializers -Wno-overloaded-virtual -Wno-unused-parameter -Wno-changes-meaning -Wno-ignored-attributes -Wno-dangling-pointer -Wno-array-bounds -Wno-narrowing -Wno-type-limits -ffile-prefix-map=/home/kefu/dev/scylladb/= -ffile-prefix-map=/home/kefu/dev/scylladb/build=. -ffile-prefix-map=/home/kefu/dev/scylladb/build/=build -march=westmere -Wstack-usage=21504 -std=gnu++23 -Wno-maybe-uninitialized -Werror=unused-result -fstack-clash-protection -DSEASTAR_P2581R1 -DSEASTAR_API_LEVEL=7 -DSEASTAR_BUILD_SHARED_LIBS -DSEASTAR_SSTRING -DSEASTAR_ENABLE_ALLOC_FAILURE_INJECTION -DSEASTAR_LOGGER_COMPILE_TIME_FMT -DSEASTAR_SCHEDULING_GROUPS_COUNT=19 -DSEASTAR_LOGGER_TYPE_STDOUT -DSEASTAR_TYPE_ERASE_MORE -DBOOST_PROGRAM_OPTIONS_NO_LIB -DBOOST_PROGRAM_OPTIONS_DYN_LINK -DBOOST_THREAD_NO_LIB -DBOOST_THREAD_DYN_LINK -DFMT_SHARED -MD -MT cql3/CMakeFiles/cql3.dir/Dev/statements/select_statement.cc.o -MF cql3/CMakeFiles/cql3.dir/Dev/statements/select_statement.cc.o.d -o cql3/CMakeFiles/cql3.dir/Dev/statements/select_statement.cc.o -c /home/kefu/dev/scylladb/cql3/statements/select_statement.cc
/home/kefu/dev/scylladb/cql3/statements/select_statement.cc: In member function ‘seastar::lw_shared_ptr<const service::pager::paging_state> cql3::statements::indexed_table_select_statement::generate_view_paging_state_from_base_query_results(seastar::lw_shared_ptr<const service::pager::paging_state>, const seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&, service::query_state&, const cql3::query_options&) const’:
/home/kefu/dev/scylladb/cql3/statements/select_statement.cc:1122:21: error: redundant move in return statement [-Werror=redundant-move]
 1122 |     return std::move(paging_state_copy);
      |            ~~~~~~~~~^~~~~~~~~~~~~~~~~~~
```

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#22903
2025-02-18 21:12:58 +02:00
Piotr Dulikowski
6aa962f5f4 Merge 'Add audit subsystem for database operations' from Paweł Zakrzewski
Introduces a comprehensive audit system to track database operations for security
and compliance purposes. This change includes:

Core Components:
- New audit subsystem for logging database operations
- Service level integration for proper resource management
- CQL statement tracking with operation categories
- Login process integration for tenant management

Key Features:
- Configurable audit logging (syslog/table)
- Operation categorization (QUERY/DML/DDL/DCL/AUTH/ADMIN)
- Selective auditing by keyspace/table
- Password sanitization in audit logs
- Service level shares support (1-1000) for workload prioritization
- Proper lifecycle management and cleanup

I ran the dtests for audit (manually enabled) and they pass.
The in-repo tests pass.

Notably, there should be no non-whitespace changes between this and scylla-enterprise

Fixes scylladb/scylla-enterprise#4999

Closes scylladb/scylladb#22147

* github.com:scylladb/scylladb:
  audit: Add shares support to service level management
  audit: Add service level support to CQL login process
  audit: Add support to CQL statements
  audit: Integrate audit subsystem into Scylla main process
  audit: Add documentation for the audit subsystem
  audit: Add the audit subsystem
2025-01-17 13:14:55 +01:00
Gleb Natapov
83d15b8e32 cql3: report host id instead of ip in error during SELECT FROM MUTATION_FRAGMENTS query
We want to drop ip from the topology::node.
2025-01-16 16:37:07 +02:00
Paweł Zakrzewski
98f5e49ea8 audit: Add support to CQL statements
Integrates audit functionality into CQL statement processing to enable tracking of database operations. Key changes:

- Add audit_info and statement_category to all CQL statements
- Implement audit categories for different statement types:
  - DDL: Schema altering statements (CREATE/ALTER/DROP)
  - DML: Data manipulation (INSERT/UPDATE/DELETE/TRUNCATE/USE)
  - DCL: Access control (GRANT/REVOKE/CREATE ROLE)
  - QUERY: SELECT statements
  - ADMIN: Service level operations

- Add audit inspection points in query processing:
  - Before statement execution
  - After access checks
  - After statement completion
  - On execution failures

- Add password sanitization for role management statements
  - Mask plaintext passwords in audit logs
  - Handle both direct password parameters and options maps
  - Preserve query structure while hiding sensitive data

- Modify prepared statement lifecycle to carry audit context
  - Pass audit info during statement preparation
  - Track audit info through statement execution
  - Support batch statement auditing

This change enables comprehensive auditing of CQL operations while ensuring sensitive data is properly masked in audit logs.
2025-01-15 11:10:36 +01:00
Michael Litvak
5ef7afb968 cql3: allow SELECT of specific collection key
This adds to the grammar the option to SELECT a specific key in a
collection column using subscript syntax.

For example:
SELECT map['key'] FROM table
SELECT map['key1']['key2'] FROM table

The key can also be parameterized in a prepared query. For this we need
to pass the query options to result_set_builder where we process the
selectors.

Fixes scylladb/scylladb#7751
2024-12-30 17:05:20 +02:00
Avi Kivity
f3eade2f62 treewide: relicense to ScyllaDB-Source-Available-1.0
Drop the AGPL license in favor of a source-available license.
See the blog post [1] for details.

[1] https://www.scylladb.com/2024/12/18/why-were-moving-to-a-source-available-license/
2024-12-18 17:45:13 +02:00
Kefu Chai
bab12e3a98 treewide: migrate from boost::adaptors::transformed to std::views::transform
now that we are allowed to use C++23. we now have the luxury of using
`std::views::transform`.

in this change, we:

- replace `boost::adaptors::transformed` with `std::views::transform`
- use `fmt::join()` when appropriate where `boost::algorithm::join()`
  is not applicable to a range view returned by `std::view::transform`.
- use `std::ranges::fold_left()` to accumulate the range returned by
  `std::view::transform`
- use `std::ranges::fold_left()` to get the maximum element in the
  range returned by `std::view::transform`
- use `std::ranges::min()` to get the minimal element in the range
  returned by `std::view::transform`
- use `std::ranges::equal()` to compare the range views returned
  by `std::view::transform`
- remove unused `#include <boost/range/adaptor/transformed.hpp>`
- use `std::ranges::subrange()` instead of `boost::make_iterator_range()`,
  to feed `std::views::transform()` a view range.

to reduce the dependency to boost for better maintainability, and
leverage standard library features for better long-term support.

this change is part of our ongoing effort to modernize our codebase
and reduce external dependencies where possible.

limitations:

there are still a couple places where we are still using
`boost::adaptors::transformed` due to the lack of a C++23 alternative
for `boost::join()` and `boost::adaptors::uniqued`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#21700
2024-12-03 09:41:32 +02:00
Botond Dénes
075ca6cc02 Merge 'cql3: respect PER PARTITION LIMIT for aggregate queries' from Paweł Zakrzewski
Currently, PER PARTITION LIMIT is not implemented for aggregates and queries can result in more rows than expected from the same partition.

Instrument the result_set_builder class so that it can enforce PER PARTITION LIMIT for aggregate queries, specifically:
- add per_partition_limit to the result_set_builder
- expose the number of input rows in the selector

result_set_builder gets two new functions handling partition start and end:
- accept_partition_end for notifying that a partition has been finished. This is also called when a page ends, so we cannot simply flush here, as a naive implementation could do.
- accept_new_partition, where we flush_selectors() if it's indeed a new partition (and not a continuation of the previous) and the query has a grouping: we don't want to flush on new partition in a query like SELECT COUNT(*) FROM foo;

Fixes #5363

Closes scylladb/scylladb#21125

* github.com:scylladb/scylladb:
  test: enable PER PARTIION LIMIT + GROUP BY tests
  cql3: respect PER PARTITION LIMIT for aggregates
  cql3: selection: count input rows in the selector
  cql3: selection: pass per partition limit to the result_set_builder
  cql3: show different messages for LIMIT and PER PARTITION LIMIT in get_limit
2024-11-20 09:54:28 +02:00
Paweł Zakrzewski
aea3c3851e cql3: selection: pass per partition limit to the result_set_builder
Aggregates require the limit to be applied from within the builder
class, so it needs to be passed to it.
2024-11-18 17:56:53 +01:00
Paweł Zakrzewski
cb1483037c cql3: show different messages for LIMIT and PER PARTITION LIMIT in get_limit
select_statement::get_limit is used to evaluate the LIMIT value for both
LIMIT and PER PARTITION LIMIT. This change fixes the error message for
incorrect values passed by the user.
2024-11-18 17:56:53 +01:00
Nadav Har'El
b778ce08a9 cql3: change sstring_view to std::string_view
Our "sstring_view" is an historic alias for the standard std::string_view.
The cql3/ directory used this old alias in a few of random places, let's
change them to use the standard type name.

Refs #4062.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-11-18 15:57:20 +02:00
Kefu Chai
59eb2ab119 treewide: s/boost::algorithm::any_of/std::ranges::any_of/
now that we are allowed to use C++23. we now have the luxury of using
`std::ranges::any_of`.

in this change, we replace `boost::algorithm::any_of` with
`std::ranges::any_of`

to reduce the dependency to boost for better maintainability, and
leverage standard library features for better long-term support.

this change is part of our ongoing effort to modernize our codebase
and reduce external dependencies where possible.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-11-05 14:06:09 +08:00
Kefu Chai
f8bb1c64f1 treewide: s/boost::algorithm::all_of/std::ranges::all_of/
now that we are allowed to use C++23. we now have the luxury of using
`std::ranges::all_of`.

in this change, we replace `boost::algorithm::all_of` with
`std::ranges::all_of`

to reduce the dependency to boost for better maintainability, and
leverage standard library features for better long-term support.

this change is part of our ongoing effort to modernize our codebase
and reduce external dependencies where possible.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-11-05 14:05:24 +08:00
Avi Kivity
820509026f schema: replace boost ranges with std ranges
To reduce dependency load, use std ranges instead of boost ranges.

The std::ranges::{lower,upper}_bound don't support heterogeneous lookup,
but a more natural solution is to use a projection to search for the name,
so we use that and the custom comparator is removed.

Many callers are converted as well due to poor interoperability between
boost ranges and std ranges.
2024-10-15 16:42:54 +03:00