The execution of SELECT statements with ANN ordering (vector search) was
previously implemented within `indexed_table_select_statement`. This was
not ideal, as vector search logic is independent of secondary index selects.
This resulted in unnecessary complexity because vector search queries don't
use features like aggregates or paging. More importantly,
`indexed_table_select_statement` assumed a non-null `view_schema` pointer,
which doesn't hold for vector indexes (where `view_ptr` is null).
This caused null pointer dereferences during ANN ordered selects, leading
to crashes (VECTOR-179). Other parts of the class still dereference
`view_schema` without null checks.
Moving the vector search select logic out of
`indexed_table_select_statement` simplifies the code and prevents these
null pointer dereferences.
Today, any source file or header file that wants to use the
tri_mode_restriction type needs to include db/config.hh, which is a
large and frequently-changing header file. In this patch we split this
type into a separate header file, db/tri_mode_restriction.hh, and avoid
a few unnecessary inclusions of db/config.hh. However, a few source
files now need to explicitly include db/config.hh, after its
transitive inclusion is gone.
Note that the overwhelmingly common inclusion of db/config.hh continues
to be a problem after this patch - 128 source files include it directly.
So this patch is just the first step in long journey.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closesscylladb/scylladb#25692
Add parsing of `ANN OF` queries to the `select_statement` and
`indexed_table_select_statement` classes.
Add a placeholder for the implementation of external ANN queries.
Rename `should_create_view` to `view_should_exist` as it is used
not only to check if the view should be created but also if
the view has been created.
Co-authored-by: Dawid Pawlik <dawid.pawlik@scylladb.com>
Before this patch we silently allowed and ignored PER PARTITION LIMIT.
SELECT DISTINCT requires all the partition key columns, which means that
setting PER PARTITION LIMIT is redundant - only one result will be
returned from every partition anyway.
Cassandra behaves the same way, so this patch also ensures
compatibility.
Fixesscylladb/scylladb#15109Closesscylladb/scylladb#22950
Integrates audit functionality into CQL statement processing to enable tracking of database operations. Key changes:
- Add audit_info and statement_category to all CQL statements
- Implement audit categories for different statement types:
- DDL: Schema altering statements (CREATE/ALTER/DROP)
- DML: Data manipulation (INSERT/UPDATE/DELETE/TRUNCATE/USE)
- DCL: Access control (GRANT/REVOKE/CREATE ROLE)
- QUERY: SELECT statements
- ADMIN: Service level operations
- Add audit inspection points in query processing:
- Before statement execution
- After access checks
- After statement completion
- On execution failures
- Add password sanitization for role management statements
- Mask plaintext passwords in audit logs
- Handle both direct password parameters and options maps
- Preserve query structure while hiding sensitive data
- Modify prepared statement lifecycle to carry audit context
- Pass audit info during statement preparation
- Track audit info through statement execution
- Support batch statement auditing
This change enables comprehensive auditing of CQL operations while ensuring sensitive data is properly masked in audit logs.
SELECT * FROM MUTATION_FRAGMENTS($table) is a new select statement
sub-type. More information will be provided in the patch which introduces
it. This patch adds only the Cql.g changes and what is further strictly
necessary.
Change one more layer of processing to work on prepared
rather than raw selectors. This moves the call to prepare
the selectors early in select_statement processing. In turn
this changes maybe_jsonize_select_clause() and forward_service's
mock_selection() to work in the prepared realm as well.
This moves us one step closer to using evaluate() to process
the select clause, as the prepared selectors are now available
in select_statement. We can't use them yet since we can't evaluate
aggregations.
The expression system uses managed_bytes_opt for values, but result_set
uses bytes_opt. This means that processing values from the result set
in expressions requires a copy.
Out of the two, managed_bytes_opt is the better choice, since it prevents
large contiguous allocations for large blobs. So we switch result_set
to use managed_bytes_opt. Users of the result_set API are adjusted.
The db::function interface is not modified to limit churn; instead we
convert the types on entry and exit. This will be adjusted in a following
patch.
In preparation of the relaxation of the grammar to return any expression,
change the whereClause production to return an expression rather than
terms. Note that the expression is still constrained to be a conjunction
of relations, and our filtering code isn't prepared for more.
Before the patch, if the WHERE clause was optional, the grammar would
pass an empty vector of expressions (which is exactly correct). After
the patch, it would pass a default-constructed expression. Now that
happens to be an empty conjunction, which is exactly what's needed, but
it is too accidental, so the patch changes optional WHERE clauses to
explicitly generate an empty conjunction if the WHERE clause wasn't
specified.
In order to expose the API for deleting ghost rows from a view,
a CQL statement is created. It is loosely based on select_statement,
as its first step is to select view table rows.
Right now is_json is used to decide if the statement needs to be treated
in a special way. For two types (regular statement and JSON statement),
a boolean is enough, but this series extends it for two more types,
so the flag is converted to an enum.
Functionality of the relation class has been replaced by
expr::to_restriction.
Relation and all classes deriving from it can now be removed.
Signed-off-by: cvybhu <jan.ciolek@scylladb.com>
Parser used to output the where clause as a vector of relations,
but now we can change it to a vector of expressions.
Cql.g needs to be modified to output expressions instead
of relations.
The WHERE clause is kept in a few places in the code that
need to be changed to vector<expression>.
Finally relation->to_restriction is replaced by expr::to_restriction
and the expressions are converted to restrictions where required.
The relation class isn't used anywhere now and can be removed.
Signed-off-by: cvybhu <jan.ciolek@scylladb.com>
After fcb8d040 ("treewide: use Software Package Data Exchange
(SPDX) license identifiers"), many dual-licensed files were
left with empty comments on top. Remove them to avoid visual
noise.
Closes#10562
Instead of lengthy blurbs, switch to single-line, machine-readable
standardized (https://spdx.dev) license identifiers. The Linux kernel
switched long ago, so there is strong precedent.
Three cases are handled: AGPL-only, Apache-only, and dual licensed.
For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0),
reasoning that our changes are extensive enough to apply our license.
The changes we applied mechanically with a script, except to
licenses/README.md.
Closes#9937
Stop using database (and including database.hh) for schema related
purposes and use data_dictionary instead.
data_dictionary::database::real_database() is called from several
places, for these reasons:
- calling yet-to-be-converted code
- callers with a legitimate need to access data (e.g. system_keyspace)
but with the ::database accessor removed from query_processor.
We'll need to find another way to supply system_keyspace with
data access.
- to gain access to the wasm engine for testing whether used
defined functions compile. We'll have to find another way to
do this as well.
The change is a straightforward replacement. One case in
modification_statement had to change a capture, but everything else
was just a search-and-replace.
Some files that lost "database.hh" gained "mutation.hh", which they
previously had access to through "database.hh".
Reorganized the code that handles column ordering (ASC or DESC).
I feel that it's now clearer and easier to understand.
Added an enum that describes column ordering.
It has two possible values: ascending or descending.
It used to be a bool that was sometimes called 'reversed',
which could mean multiple things.
Instead of column.type->is_reversed() != <ordering bool>
there is now a function called are_column_select_results_reversed.
Split checking if ordering is reversed and verifying whether it's correct into two functions.
Before all of this was done by is_reversed()
This is a preparation to later allow skipping ORDER BY restrictions on some columns.
Adding this to the existing code caused it to get quite complex,
but this new version is better suited for the task.
The diff is a bit messy because I moved all ordering functions to one place,
it's better to read select_statement.cc lines 1495-1651 directly.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Return the pre- 6773563d3 behavior of demanding ALLOW FILTERING when partition slice is requested but on potentially unlimited number of partitions. Put it on a flag defaulting to "off" for now.
Fixes#7608; see comments there for justification.
Tests: unit (debug, dev), dtest (cql_additional_test, paging_test)
Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
Closes#9126
* github.com:scylladb/scylla:
cql3: Demand ALLOW FILTERING for unlimited, sliced partitions
cql3: Track warnings in prepared_statement
test: Use ALLOW FILTERING more strictly
cql3: Add statement_restrictions::to_string
When a query requests a partition slice but doesn't limit the number
of partitions, require that it also says ALLOW FILTERING. Although
do_filter() isn't invoked for such queries, the performance can still
be unexpectedly slow, and we want to signal that to the user by
demanding they explicitly say ALLOW FILTERING.
Because we now reject queries that worked fine before, existing
applications can break. Therefore, the behavior is controlled by a
flag currently defaulting to off. We will default to "on" in the next
Scylla version.
Fixes#7608; see comments there for justification.
Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
The class is repurposed to be more generic and also be able
to hold additional metadata related to function calls within
a CQL statement. Rename all methods appropriately.
Visitor functions in AST nodes (`collect_marker_specification`)
are also renamed to a more generic `fill_prepare_context`.
The name `prepare_context` designates that this metadata
structure is a byproduct of `stmt::raw::prepare()` call and
is needed only for "prepare" step of query execution.
Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
First of all, select statement is extended with an 'attrs' field,
which keeps the per-query attributes. Currently, only TIMEOUT
parameter is legal to use, since TIMESTAMP and TTL bear no meaning
for reads.
Secondly, if TIMEOUT attribute is set, it will be used as the effective
timeout for a particular query.
* Pass raw::select_statement::parameters as lw_shared_ptr
* Some more const cleanups here and there
* lists,maps,sets::equals now accept const-ref to *_type_impl
instead of shared_ptr
* Remove unused `get_column_for_condition` from modification_statement.hh
* More methods now accept const-refs instead of shared_ptr
Every call site where a shared_ptr was required as an argument
has been inspected to be sure that no dangling references are
possible.
Tests: unit(dev, debug)
Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Message-Id: <20200220153204.279940-1-pa.solodovnikov@scylladb.com>
De-pointerize cql3 code APIs further: change some call sites
to pass `schema` as const-ref instead of `shared_ptr`.
Affected functions known to be expecting always non-null
pointer to schema and don't store or pass the pointer somewhere
else, assuming it's safe to give them just a reference.
Tests: unit(dev, debug)
Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Message-Id: <20200218142338.69824-1-pa.solodovnikov@scylladb.com>
Convert some more helper functions to accept const reference to
column_specification and column_identifier instead of shared_ptr.
Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
cql3 has cql_statement, parsed_statement and prepared_statement
classes, which, largely, stand for the same thing. prepared was
an alias for prepared_statement which only required an extra
tag jump in IDE and carried no meaning.
`parsed_statement::get_bound_variables` is assumed to always
return a nonnull pointer to `variable_specifications` instance.
In this case using a pointer is superfluous and can be safely
replaced by a plain reference.
Also add a default ctor and a utility method `set_bound_variables`
to the `variable_specifications` class to actually reset the
contents of the class instance.
Tests: unit(dev, debug)
Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Message-Id: <20200120195839.164296-1-pa.solodovnikov@scylladb.com>
Instances of `variable_specifications` are passed around as
shared_ptr's, which are redundant in this case since the class
is marked as `final`. Use `lw_shared_ptr` instead since we know
for sure it's not a polymorphic pointer.
Tests: unit(debug)
Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Message-Id: <20191225232853.45395-1-pa.solodovnikov@scylladb.com>
Extend the grammar file with GROUP BY, collect the column identifiers,
and store them in raw::select_statement.
Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
The BYPASS CACHE clause instructs the database not to read from or populate the
cache for this query. The new keywords (BYPASS and CACHE) are not reserved.
* seastar d59fcef...b924495 (2):
> build: Fix protobuf generation rules
> Merge "Restructure files" from Jesse
Includes fixup patch from Jesse:
"
Update Seastar `#include`s to reflect restructure
All Seastar header files are now prefixed with "seastar" and the
configure script reflects the new locations of files.
Signed-off-by: Jesse Haber-Kucharsky <jhaberku@scylladb.com>
Message-Id: <5d22d964a7735696fb6bb7606ed88f35dde31413.1542731639.git.jhaberku@scylladb.com>
"
When a query that needs filtering is executed, the columns
that the coordinator is filtering by have to be retrieved.The
columns should be retrieved even if they are not used for
ordering or named in the actual select clause.
If the columns are missing from the result set, then any
filtering that restricts the missing column will not take
place.
Fixes#3803
Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
Enables 'ALLOW FILTERING' queries by transfering control
to result_set_builder::filtering_visitor.
Both regular and primary key columns are allowed,
but some things are left unimplemented:
- multi-column restrictions
- CONTAINS queries
Fixes#2025
This commit adds the implementation of SELECT JSON clause
which returns rows in JSON format. Each returned row has a single
'[json]' column.
References #2058
Use seastar::checked_ptr<weak_ptr<pepared_statement>> instead of shared_ptr for passing prepared statements around.
This allows an easy tracking and handling of statements invalidation.
This implementation will throw an exception every time an invalidated
statement reference is dereferenced.
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
This patch propagates the for_view argument, used by
statement_restrictions to ensure IS NOT NULL can be used when creating
a materialized view.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>