Commit Graph

124 Commits

Author SHA1 Message Date
Piotr Dulikowski
bb6e41f97a index: allow vector indexes without rf_rack_valid_keyspces
The rf_rack_valid_keyspaces option needs to be turned on in order to
allow creating materialized views in tablet keyspaces with numeric RF
per DC. This is also necessary for secondary indexes because they use
materialized views underneath. However, this option is _not_ necessary
for vector store indexes because those use the external vector store
service for querying the list of keys to fetch from the main table, they
do not create a materialized view. The rf_rack_valid_keyspaces was, by
accident, required for vector indexes, too.

Remove the restriction for vector store indexes as it is completely
unnecessary.

Fixes: SCYLLADB-81

Closes scylladb/scylladb#27447
2025-12-05 09:26:26 +02:00
Amnon Heiman
68c7236acb vector_index: require tablets for vector indexes
This patch enforces that vector indexes can only be created on keyspaces
that use tablets. During index validation, `check_uses_tablets()` verifies
the base keyspace configuration and rejects creation otherwise.

To support this, the `custom_index::validate()` API now receives a
`const data_dictionary::database&` parameter, allowing index
implementations to access keyspace-level settings during DDL validation.

Fixes https://scylladb.atlassian.net/browse/VECTOR-322

Closes scylladb/scylladb#26786
2025-11-26 13:30:43 +02:00
Michael Litvak
868ac42a8b tombstone_gc: don't use 'repair' mode for colocated tables
For tables of special types that can be located: MV, CDC, and paxos
table, we should not use tombstone_gc=repair mode because colocated
tablets are never repaired, hence they will not have repair_time set and
will never be GC'd using 'repair' mode.
2025-11-25 09:15:46 +01:00
Michał Hudobski
46589bc64c secondary_index: disallow multiple vector indexes on the same column
We currently allow creating multiple vector indexes on one column.
This doesn't make much sense as we do not support picking one when
making ann queries.

To make this less confusing and to make our behavior similar
to Cassandra we disallow the creation of multiple vector indexes
on one column.

We also add a test that checks this behavior.

Fixes: VECTOR-254
Fixes: #26672

Closes scylladb/scylladb#26508
2025-10-29 11:55:38 +02:00
Dawid Mędrek
7e201eea1a index: Set tombstone_gc when creating secondary index
Before this commit, when the underlying materialized view was created,
it didn't have the property `tombstone_gc` set to any value. That
was a bug and we fix it now.

Two reproducer tests is added for validation. They reproduce the problem
and don't pass before this commit.

Fixes scylladb/scylladb#26542
2025-10-20 14:04:45 +02:00
Dawid Mędrek
e294b80615 index: Make create_view_for_index method of create_index_statement 2025-10-20 14:04:16 +02:00
Dawid Mędrek
fe00485491 index: Move code for creating MV of secondary index to cql3
We move the code responsible for creating the schema for the underlying
materialized view of a secondary index from `index/` to `cql3/` so that
it's close to that responsible for performing `CREATE INDEX`. That's in
line with how other CQL statements are designed.

Note that the moved method is still a method of `secondary_index_manager`.
We'll make it a method of `create_index_statement` in the following
commit.
2025-10-20 14:04:11 +02:00
Dawid Mędrek
20761b5f13 db, cql3: Move creation of underlying MV for index
The main goal of this patch is to give more control over the creation
of the underlying view on an index to `create_index_statement.cc`.
That goal is in line with how the other statements are executed:
the schema is built in the cql3 module and only the ready schema_ptr
is passed further. That should also make the code cleaner and easier
to understand.

There are a few important things to note here:

* A call to `service::prepare_new_view_announcement` appears out of nowhere.
  Aside from some validation checks and logging, that function does pretty
  much the same as the pre-existing code we remove:

  a. It creates Raft mutations based on the passed `view_ptr`.
  b. It creates Raft mutations responsible for view building tasks.
  c. It notifies about a new column family.

* We seemingly get rid of the code that creates view building tasks. That's not
  true: we still do that via `service::prepare_new_view_announcement`.

That should explain why the change doesn't remove any relevant logic.
On the other hand, it might be more difficult to explain why moving the
code is correct. I'll touch on it below.

Before that, it may also be important to highlight that this commit only
affects the logic responsible for creating an index. There should be no
effect on any other part of how Scylla behaves.

---

Proving the correctness of the solution would take quite a lot of space,
so I'll only summarize it. It relies on a few things:

1. Two schema changes cannot happen in one operation. We allow for more
   but only when those changes are dependent on each other and when
   the additional ones are internal for Scylla, e.g. creating an index
   leads to creating the underlying materialized view.
2. There are no entities or components that rely on indexes.
3. Each index is uniquely defined by the keyspace it belongs to
   and the name of the index.
4. There is a bijection between rows in `system_schema.indexes`
   and the currently existing indexes.
5. The name of an unnamed index depends on the name of the base table
   and the names of the indexed columns. The name of an unnamed index
   may have a number attached to it, but that number only depends on
   the state of the schema at the time of creation of the index, and
   it never changes later on. There are no other things the name of
   an unnamed index depends on.
6. Scylla doesn't allow for changing any column in the base table
   that has an index depending on it.

Based on that, we conclude that every existing index has exactly one
entry in `system_schema.indexes`, and the primary key of that entry
never changes.

The columns of `system_schema.indexes` that are not part of the primary
key are: `kind` and `options`. Both values are only decided at the time
of creation of an index, and currently there's no way to modify them.

That implies that there are only two events when an entry in the system
table can change: when creating an index and when dropping an index.

---

When we consider the previous place of the logic that this commit moves
to `cql3/statements/create_index_statement.cc`, it works like this:

1. We compare the sets of indexes defined on a specific table
   (in the form of a structure called `index_metadata`) before and
   after an operation.
2. We divide the entries into three sets: those present in both sets
   and those present in only one of them.
3. We handle each of those three sets separately.

The structure `index_metadata` is a reflection of entries in
`system_schema.indexes`. It stores one more parameter -- `local` --
but its value depends on the other values of an entry, so we can ignore
it in this reasoning.

Because an index cannot be modified -- it can only be created or dropped
-- there are at most two non-empty sets: the set of new indexes and the
set of dropped indexes. Those sets are only non-empty during an operation
like `CREATE INDEX`, `DROP INDEX`, `DROP TABLE (base table)`,
`DROP KEYSPACE`. Note that it's impossible to drop an index by dropping
the underlying materialized view -- Scylla doesn't allow for that.

However, the code in `migration_manager.cc` we call
(`prepare_column_family_update_announcement`) and the code that we call
in `schema_tables.cc` (`make_update_table_mutations`) is only triggered
by *updates* related to the base table. In the context of `DROP TABLE`
or `DROP KEYSPACE`, we'd call `prepare_column_family_drop_announcement`
instead. In other words, we're only concerned with `CREATE INDEX` and
`DROP INDEX`.

---

A conclusion from this reasoning is that we only need to consider those
two situations when talking about correctness of this change. The impact
of this commit is that we may have potentially reordered mutations in the
resulting vector that will be applied to the Raft log.

The only mutations we may have reordered are the mutations responsible for
creating the underlying view and the mutations responsible for updating
columns in the base table. It's clear then that this commit brings no change
at all: we only give `cql3/statements/create_index_statement.cc` more
control over creating the underlying view.

---

We leave a remnant of the code in `db/schema_tables.cc` responsible
for dropping an index along with its underlying view. It would require
changing a bit more of the logic, and we don't need it for the rest
of this sequence of changes.

Refs scylladb/scylladb#16454
2025-10-20 14:04:06 +02:00
Dawid Mędrek
a1254fb6f3 db/view: Name requirement for views with tablets
We add a named requirement, a function, for materialized views with tablets.
It decides whether we can create views and secondary indexes in a given
keyspace. It's a stepping stone towards modifying the requirements for it.

This way, we keep the code in one place, so it's not possible to forget
to modify it somewhere. It also makes it more organized and concise.
2025-09-29 13:07:08 +02:00
Dawid Pawlik
909a51e524 vector_index, index_prop_defs: add version to index options
Since creating the vector index does not lead to creation
of a view table [#24438] (whose version info had been logged in
`system_schema.scylla_tables`) we lack the information about
the version of the index.

The mentioned version is used to recognize the quick-drop-create
index with the same parameters that needs to be rebuild.

The case is mainly experienced while testing, benchmarking
or experimenting with Vector Search.
Nevertheless it is important to have it considered, as it is really
weird having seen that DROP and CREATE commands did not change
anything.
Although being nice "optimization" to use the same old index,
the rebuild feels more natural for the get-to-know-VS-users.
Should not change anything in a real production environment.

The solution we arrived at is to add the version as a field in
options column of `system_schema.indexes`.
The version of vector index is a base table's schema version
on which the index was created.
The table's schema version changes everytime a table is changed
meaning that CREATE INDEX or DROP INDEX statement also change it.
Every index has a different index version, so it allows to identify
them easily.

This patch implements the solution described above.
2025-09-10 15:16:54 +02:00
Dawid Pawlik
a70086c781 create_index_statement: rename validator to custom_index_factory
The change is motivated by the fact that indeed the result of
`get_custom_class_factory` is a `custom_index_factory`.
The name `validator` was a bit misleading as it does not validate
anything by itself. Furthermore if we wanted to use the custom index
produced by the factory in other operations than validate, the name
feels really off.
2025-08-29 10:49:15 +02:00
Benny Halevy
3feb759943 everywhere: use utils::chunked_vector for list of mutations
Currently, we use std::vector<*mutation> to keep
a list of mutations for processing.
This can lead to large allocation, e.g. when the vector
size is a function of the number of tables.

Use a chunked vector instead to prevent oversized allocations.

`perf-simple-query --smp 1` results obtained for fixed 400MHz frequency
and PGO disabled:

Before (read path):
```
enable-cache=1
Running test with config: {partitions=10000, concurrency=100, mode=read, query_single_key=no, counters=no}
Disabling auto compaction
Creating 10000 partitions...

89055.97 tps ( 66.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   39417 insns/op,   18003 cycles/op,        0 errors)
103372.72 tps ( 66.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   39380 insns/op,   17300 cycles/op,        0 errors)
98942.27 tps ( 66.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   39413 insns/op,   17336 cycles/op,        0 errors)
103752.93 tps ( 66.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   39407 insns/op,   17252 cycles/op,        0 errors)
102516.77 tps ( 66.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   39403 insns/op,   17288 cycles/op,        0 errors)
throughput:
	mean=   99528.13 standard-deviation=6155.71
	median= 102516.77 median-absolute-deviation=3844.59
	maximum=103752.93 minimum=89055.97
instructions_per_op:
	mean=   39403.99 standard-deviation=14.25
	median= 39406.75 median-absolute-deviation=9.30
	maximum=39416.63 minimum=39380.39
cpu_cycles_per_op:
	mean=   17435.81 standard-deviation=318.24
	median= 17300.40 median-absolute-deviation=147.59
	maximum=18002.53 minimum=17251.75
```

After (read path)
```
enable-cache=1
Running test with config: {partitions=10000, concurrency=100, mode=read, query_single_key=no, counters=no}
Disabling auto compaction
Creating 10000 partitions...
59755.04 tps ( 66.2 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   39466 insns/op,   22834 cycles/op,        0 errors)
71854.16 tps ( 66.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   39417 insns/op,   17883 cycles/op,        0 errors)
82149.45 tps ( 66.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   39411 insns/op,   17409 cycles/op,        0 errors)
49640.04 tps ( 66.1 allocs/op,   0.0 logallocs/op,  14.3 tasks/op,   39474 insns/op,   19975 cycles/op,        0 errors)
54963.22 tps ( 66.1 allocs/op,   0.0 logallocs/op,  14.3 tasks/op,   39474 insns/op,   18235 cycles/op,        0 errors)
throughput:
	mean=   63672.38 standard-deviation=13195.12
	median= 59755.04 median-absolute-deviation=8709.16
	maximum=82149.45 minimum=49640.04
instructions_per_op:
	mean=   39448.38 standard-deviation=31.60
	median= 39466.17 median-absolute-deviation=25.75
	maximum=39474.12 minimum=39411.42
cpu_cycles_per_op:
	mean=   19267.01 standard-deviation=2217.03
	median= 18234.80 median-absolute-deviation=1384.25
	maximum=22834.26 minimum=17408.67
```

`perf-simple-query --smp 1 --write` results obtained for fixed 400MHz frequency
and PGO disabled:

Before (write path):
```
enable-cache=1
Running test with config: {partitions=10000, concurrency=100, mode=write, query_single_key=no, counters=no}
Disabling auto compaction
63736.96 tps ( 59.4 allocs/op,  16.4 logallocs/op,  14.3 tasks/op,   49667 insns/op,   19924 cycles/op,        0 errors)
64109.41 tps ( 59.3 allocs/op,  16.0 logallocs/op,  14.3 tasks/op,   49992 insns/op,   20084 cycles/op,        0 errors)
56950.47 tps ( 59.3 allocs/op,  16.0 logallocs/op,  14.3 tasks/op,   50005 insns/op,   20501 cycles/op,        0 errors)
44858.42 tps ( 59.3 allocs/op,  16.0 logallocs/op,  14.3 tasks/op,   50014 insns/op,   21947 cycles/op,        0 errors)
28592.87 tps ( 59.3 allocs/op,  16.0 logallocs/op,  14.3 tasks/op,   50027 insns/op,   27659 cycles/op,        0 errors)
throughput:
	mean=   51649.63 standard-deviation=15059.74
	median= 56950.47 median-absolute-deviation=12087.33
	maximum=64109.41 minimum=28592.87
instructions_per_op:
	mean=   49941.18 standard-deviation=153.76
	median= 50005.24 median-absolute-deviation=73.01
	maximum=50027.07 minimum=49667.05
cpu_cycles_per_op:
	mean=   22023.01 standard-deviation=3249.92
	median= 20500.74 median-absolute-deviation=1938.76
	maximum=27658.75 minimum=19924.32
```

After (write path)
```
enable-cache=1
Running test with config: {partitions=10000, concurrency=100, mode=write, query_single_key=no, counters=no}
Disabling auto compaction
53395.93 tps ( 59.4 allocs/op,  16.5 logallocs/op,  14.3 tasks/op,   50326 insns/op,   21252 cycles/op,        0 errors)
46527.83 tps ( 59.3 allocs/op,  16.0 logallocs/op,  14.3 tasks/op,   50704 insns/op,   21555 cycles/op,        0 errors)
55846.30 tps ( 59.3 allocs/op,  16.0 logallocs/op,  14.3 tasks/op,   50731 insns/op,   21060 cycles/op,        0 errors)
55669.30 tps ( 59.3 allocs/op,  16.0 logallocs/op,  14.3 tasks/op,   50735 insns/op,   21521 cycles/op,        0 errors)
52130.17 tps ( 59.3 allocs/op,  16.0 logallocs/op,  14.3 tasks/op,   50757 insns/op,   21334 cycles/op,        0 errors)
throughput:
	mean=   52713.91 standard-deviation=3795.38
	median= 53395.93 median-absolute-deviation=2955.40
	maximum=55846.30 minimum=46527.83
instructions_per_op:
	mean=   50650.57 standard-deviation=182.46
	median= 50731.38 median-absolute-deviation=84.09
	maximum=50756.62 minimum=50325.87
cpu_cycles_per_op:
	mean=   21344.42 standard-deviation=202.86
	median= 21334.00 median-absolute-deviation=176.37
	maximum=21554.61 minimum=21060.24
```

Fixes #24815

Improvement for rare corner cases. No backport required

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#24919
2025-07-13 19:13:11 +03:00
Michał Hudobski
3ab643a5de vector_index: add custom index and vector index classes
In this patch we add an abstract class, "custom_index", with a validate() method.
Each CUSTOM INDEX class needs to implement a concrete subclass of custom_index
which is used to validate if this type of custom index class may be used,
and whether the optional parameters passed to it are valid.

We change the existing CUSTOM INDEX validation code to use this new mechanism.

Finally this patch implements one concrete subclass for vector index.
Before this patch, the custom index type "vector_index" was allowed,
but after this patch it gains more validation of its optional parameters
(we support 4 specific parameters, with some rules on their values).
Of course, the vector index isn't actually implemented in this patch,
we are just improving the validation of the index creation statement.
2025-05-27 21:04:50 +02:00
Michał Hudobski
05daa8dded index: support storing metadata for custom indices
Added function returning custom index class name.
Added printing custom index class name when using DESCRIBE.
Changed validation to reflect current support of indices.
2025-05-14 09:32:00 +02:00
Alexander Turetskiy
47011ab830 Materialized view name length should be limited
Oversized materialized view and index names are rejected;
Materialized view names with invalid symbols are rejected.

fixes: #20755

Closes scylladb/scylladb#21746
2025-02-11 22:16:09 +02:00
Wojciech Mitros
677f9962cf mv: forbid views with tablets by default
Materialized views with tablets are not stable yet, but we want
them available as an experimental feature, mainly for teseting.

The feature was added in scylladb/scylladb#21833,
but currently it has no effect. All tests have been updated to use the
feature, so we should finally make it work.
This patch prevents users from creating materialized views in keyspaces
using tablets when the VIEWS_WITH_TABLETS feature is not enabled - such
requests will now get rejected.

Fixes scylladb/scylladb#21832

Closes scylladb/scylladb#22217
2025-01-30 12:10:47 +01:00
Paweł Zakrzewski
98f5e49ea8 audit: Add support to CQL statements
Integrates audit functionality into CQL statement processing to enable tracking of database operations. Key changes:

- Add audit_info and statement_category to all CQL statements
- Implement audit categories for different statement types:
  - DDL: Schema altering statements (CREATE/ALTER/DROP)
  - DML: Data manipulation (INSERT/UPDATE/DELETE/TRUNCATE/USE)
  - DCL: Access control (GRANT/REVOKE/CREATE ROLE)
  - QUERY: SELECT statements
  - ADMIN: Service level operations

- Add audit inspection points in query processing:
  - Before statement execution
  - After access checks
  - After statement completion
  - On execution failures

- Add password sanitization for role management statements
  - Mask plaintext passwords in audit logs
  - Handle both direct password parameters and options maps
  - Preserve query structure while hiding sensitive data

- Modify prepared statement lifecycle to carry audit context
  - Pass audit info during statement preparation
  - Track audit info through statement execution
  - Support batch statement auditing

This change enables comprehensive auditing of CQL operations while ensuring sensitive data is properly masked in audit logs.
2025-01-15 11:10:36 +01:00
Avi Kivity
f3eade2f62 treewide: relicense to ScyllaDB-Source-Available-1.0
Drop the AGPL license in favor of a source-available license.
See the blog post [1] for details.

[1] https://www.scylladb.com/2024/12/18/why-were-moving-to-a-source-available-license/
2024-12-18 17:45:13 +02:00
Kefu Chai
bab12e3a98 treewide: migrate from boost::adaptors::transformed to std::views::transform
now that we are allowed to use C++23. we now have the luxury of using
`std::views::transform`.

in this change, we:

- replace `boost::adaptors::transformed` with `std::views::transform`
- use `fmt::join()` when appropriate where `boost::algorithm::join()`
  is not applicable to a range view returned by `std::view::transform`.
- use `std::ranges::fold_left()` to accumulate the range returned by
  `std::view::transform`
- use `std::ranges::fold_left()` to get the maximum element in the
  range returned by `std::view::transform`
- use `std::ranges::min()` to get the minimal element in the range
  returned by `std::view::transform`
- use `std::ranges::equal()` to compare the range views returned
  by `std::view::transform`
- remove unused `#include <boost/range/adaptor/transformed.hpp>`
- use `std::ranges::subrange()` instead of `boost::make_iterator_range()`,
  to feed `std::views::transform()` a view range.

to reduce the dependency to boost for better maintainability, and
leverage standard library features for better long-term support.

this change is part of our ongoing effort to modernize our codebase
and reduce external dependencies where possible.

limitations:

there are still a couple places where we are still using
`boost::adaptors::transformed` due to the lack of a C++23 alternative
for `boost::join()` and `boost::adaptors::uniqued`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#21700
2024-12-03 09:41:32 +02:00
Kefu Chai
3e84d43f93 treewide: use seastar::format() or fmt::format() explicitly
before this change, we rely on `using namespace seastar` to use
`seastar::format()` without qualifying the `format()` with its
namespace. this works fine until we changed the parameter type
of format string `seastar::format()` from `const char*` to
`fmt::format_string<...>`. this change practically invited
`seastar::format()` to the club of `std::format()` and `fmt::format()`,
where all members accept a templated parameter as its `fmt`
parameter. and `seastar::format()` is not the best candidate anymore.
despite that argument-dependent lookup (ADT for short) favors the
function which is in the same namespace as its parameter, but
`using namespace` makes `seastar::format()` more competitive,
so both `std::format()` and `seastar::format()` are considered
as the condidates.

that is what is happening scylladb in quite a few caller sites of
`format()`, hence ADT is not able to tell which function the winner
in the name lookup:

```
/__w/scylladb/scylladb/mutation/mutation_fragment_stream_validator.cc:265:12: error: call to 'format' is ambiguous
  265 |     return format("{} ({}.{} {})", _name_view, s.ks_name(), s.cf_name(), s.id());
      |            ^~~~~~
/usr/bin/../lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/format:4290:5: note: candidate function [with _Args = <const std::basic_string_view<char> &, const seastar::basic_sstring<char, unsigned int, 15> &, const seastar::basic_sstring<char, unsigned int, 15> &, const utils::tagged_uuid<table_id_tag> &>]
 4290 |     format(format_string<_Args...> __fmt, _Args&&... __args)
      |     ^
/__w/scylladb/scylladb/seastar/include/seastar/core/print.hh:143:1: note: candidate function [with A = <const std::basic_string_view<char> &, const seastar::basic_sstring<char, unsigned int, 15> &, const seastar::basic_sstring<char, unsigned int, 15> &, const utils::tagged_uuid<table_id_tag> &>]
  143 | format(fmt::format_string<A...> fmt, A&&... a) {
      | ^
```

in this change, we

change all `format()` to either `fmt::format()` or `seastar::format()`
with following rules:
- if the caller expects an `sstring` or `std::string_view`, change to
  `seastar::format()`
- if the caller expects an `std::string`, change to `fmt::format()`.
  because, `sstring::operator std::basic_string` would incur a deep
  copy.

we will need another change to enable scylladb to compile with the
latest seastar. namely, to pass the format string as a templated
parameter down to helper functions which format their parameters.
to miminize the scope of this change, let's include that change when
bumping up the seastar submodule. as that change will depend on
the seastar change.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-09-11 23:21:40 +03:00
Kefu Chai
720997d1de cql3/statements: mark format string as constexpr const
after switching over to the new `seastar::format()` which enables
the compile-time format check, the fmt string should be a constexpr,
otherwise `fmt::format()` is not able to perform the check at compile
time.

to prepare for bumping up the seastar module to a version which
contains the change of `seastar::format()`, let's mark the format
string with `constexpr const`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#20484
2024-09-09 08:35:45 +03:00
Kefu Chai
ad649be1bf treewide: drop thrift support
thrift support was deprecated since ScyllaDB 5.2

> Thrift API - legacy ScyllaDB (and Apache Cassandra) API is
> deprecated and will be removed in followup release. Thrift has
> been disabled by default.

so let's drop it. in this change,

* thrift protocol support is dropped
* all references to thrift support in document are dropped
* the "thrift_version" column in system.local table is
  preserved for backward compatibility, as we could load
  from an existing system.local table which still contains
  this clolumn, so we need to write this column as well.
* "/storage_service/rpc_server" is only preserved for
  backward compatibility with java-based nodetool.
* `rpc_port` and `start_rpc` options are preserved, but
  they are marked as "Unused". so that the new release
  of scylladb can consume existing scylla.yaml configurations
  which might contain these settings. by making them
  deprecated, user will be able get warned, and update
  their configurations before we actually remove them
  in the next major release.

Fixes #3811
Fixes #18416
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-06-07 06:44:59 +08:00
Michał Jadwiszczak
7dc0d068c0 cql3/statements: pass query_options to prepare_schema_mutations()
The object is needed to get timestamp from attributes (in a case when
the statement was prepared with parameter marker).
2024-04-25 21:27:40 +02:00
Botond Dénes
1f7b03672c cql3/statements/create_index_statement: build_index_schema(): include index metadata in returned value
Scylla's schema tables code determines which index was added, by diffing
index definitions with previous ones. This is clunky to use in
tools/schema_loader.cc, so also return the index metadata for the newly
created index.
2024-01-04 01:32:10 -05:00
Botond Dénes
039d41f5d4 cql3/statements/create_index_statement: relax some method's dependence on qp
The methods `validate_while_excuting()` and its only caller,
`build_index_schema()`, only use the query processor to get db from it.
So replace qp parameter with db one, relaxing requirements w.r.t.
callers.
2024-01-04 01:32:10 -05:00
Patryk Jędrzejczak
ffc3c1302e cql3: schema_altering_statement::prepare_schema_mutations: remove an unused parameter
After changing the prepare_ methods of migration_manager to
functions, the migration_manager& parameter of
schema_altering_statement::prepare_schema_mutations has been
unused by all classes inheriting from schema_altering_statement.
2023-08-01 10:07:31 +02:00
Patryk Jędrzejczak
3468cbd66b service: migration_manager: change the prepare_ methods to functions
The migration_manager service is responsible for schema convergence
in the cluster - pushing schema changes to other nodes and pulling
schema when a version mismatch is observed. However, there is also
a part of migration_manager that doesn't really belong there -
creating mutations for schema updates. These are the functions with
prepare_ prefix. They don't modify any state and don't exchange any
messages. They only need to read the local database.

We take these functions out of migration_manager and make them
separate functions to reduce the dependency of other modules
(especially query_processor and CQL statements) on
migration_manager. Since all of these functions only need access
to storage_proxy (or even only replica::database), doing such a
refactor is not complicated. We just have to add one parameter,
either storage_proxy or database and both of them are easily
accessible in the places where these functions are called.
2023-07-28 13:55:27 +02:00
Gleb Natapov
4bad482e4b cql3: move validation::validate_column_family from client_state::has_column_family_access
Checking keyspace/table presence should not be part of authorization code
and it is not done consistently today.  For instance keyspace presence
is not checked in "alter keyspace" during authorization, but during
statement execution. Make it consistent.
2023-06-22 13:57:36 +03:00
Kamil Braun
2606c190af cql3: statements: pass migration_manager& explicitly to prepare_schema_mutations
We want to stop relying on `qp.get_migration_manager()`, so we can make
the function private in the future. This in turn is a prerequisite for
splitting `query_processor` initialization into two phases, where the
first phase will only allow local queries (and won't require
`migration_manager`).
2023-06-15 09:48:54 +02:00
Jan Ciolek
a8cc5ed491 schema_altering_statement: return warnings from prepare_schema_mutations()
Validation of a CREATE MATERIALIZED VIEW statement takes place inside
the prepare_schema_mutations() method.
I would like to generate warnings during this validation, but there's
currently no way to pass them.

Let's add one more return value - a vector of CQL warnings generated
during the execution of this statement.

A new alias is added to make it clear what the function is returning:
```c++
// A vector of CQL warnings generated during execution of a statement.
using cql_warnings_vec = std::vector<sstring>;
```

Later the warnings will be sent to the user by the function
schema_altering_statment::execute(), which is the only caller
of prepare_schema_mutations().

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2023-06-07 02:30:07 +02:00
Avi Kivity
69a385fd9d Introduce schema/ module
Schema related files are moved there. This excludes schema files that
also interact with mutations, because the mutation module depends on
the schema. Those files will have to go into a separate module.

Closes #12858
2023-02-15 11:01:50 +02:00
Avi Kivity
c5e4bf51bd Introduce mutation/ module
Move mutation-related files to a new mutation/ directory. The names
are kept in the global namespace to reduce churn; the names are
unambiguous in any case.

mutation_reader remains in the readers/ module.

mutation_partition_v2.cc was missing from CMakeLists.txt; it's added in this
patch.

This is a step forward towards librarization or modularization of the
source base.

Closes #12788
2023-02-14 11:19:03 +02:00
Piotr Dulikowski
cc3af3190d create_index_statement: lift restriction on secondary indexes on static rows
Secondary indexes on static columns should work now. This commit lifts
the existing restriction after the cluster is fully upgraded to a
version which supports such indexes.
2022-12-06 11:21:16 +01:00
Piotr Dulikowski
9f14f0ac09 create_index_statement: disallow creation of local indexes with static columns
Local indexes on static columns don't make sense because there is only
one static row per partition. It's always better to just run SELECT
DISTINCT on the base table. Allowing for such an index would only make
such queries slower (due to double lookup), would take unnecessary space
and could pose potential consistency problems, so this commit explicitly
forbids them.
2022-12-06 11:21:16 +01:00
Avi Kivity
e25bf62871 cql3: create_index_statement: remove unused lambda
throw_exception is defined but not used.
2022-11-28 21:58:30 +02:00
Nadav Har'El
8b00c91c13 cql, index: make collection indexing a cluster feature
Prevent a user from creating a secondary index on a collection column if
the cluster has any nodes which don't support this feature. Such nodes
will not be able to correctly handle requests related to this index,
so better not allow creating one.

Attempting to create an index on a collection before the entire cluster
supports this feature will result in the error:

    Indexing of collection columns not supported by some older nodes
    in this cluster. Please upgrade them.

Tested by manually disabling this feature in feature_service.cc and
seeing this error message during collection indexing test.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2022-08-14 10:29:52 +03:00
Nadav Har'El
56204a3794 cql, index: improve error messages
Before this patch, trying to create an index on entries(x) where x is
not a map results in an error message:

  Cannot create index on index_keys_and_values of column x

The string "index_keys_and_values" is strange - Cassandra prints the
easier to understand string "entries()" - which better corresponds to
what the user actually did.

It turns out that this string "index_keys_and_values" comes from an
elaborate set of variables and functions spanning multiple source files,
used to convert our internal target_type variable into such a string.
But although this code was called "index_option" and sounded very
important, it was actually used just for one thing - error messages!

So in this patch we drop the entire "index_option" abstraction,
replacing it by a static trivial function defined exactly where
it's used (create_index_statement.cc), which prints a target type.
While at it, we print "entries()" instead of "index_keys_and_values" ;-)

After this patch, the
test_secondary_index.py::test_index_collection_wrong_type

finally passes (the previous patch fixed the default table names it
assumes, and this patch fixes the expected error messages), so its
"xfail" tag is removed.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2022-08-14 10:29:52 +03:00
Nadav Har'El
84461f1827 cql, index: fix default index name for collection index
When creating an index "CREATE INDEX ON tbl(keys(m))", the default name
of the index should be tbl_m_idx - with just "m". The current code
incorrectly used the default name tbl_m_keys_idx, so this patch adds
a test (which passes on Cassandra, and after this patch also on Scylla)
and fixes the default name.

It turns out that the default index name was based on a mysterious
index_target::as_string(), which printed the target "keys(m)" as
"m_keys" without explaining why it was so. This method was actually
used only in three places, and all of them wanted just the column
name, without the "_keys" suffix! So in this patch we rename the
mysterious as_string() to column_name(), and use this function instead.

Now that the default index name uses column_name() and gets just
column_name(), the correct default index name is generated, and the
test passes.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2022-08-14 10:29:52 +03:00
Michał Radwański
cbe33f8d7a cql3/statements/: validate CREATE INDEX for index over a collection
Allow CQL like this:
CREATE INDEX idx ON table(some_map);
CREATE INDEX idx ON table(KEYS(some_map));
CREATE INDEX idx ON table(VALUES(some_map));
CREATE INDEX idx ON table(ENTRIES(some_map));
CREATE INDEX idx ON table(some_set);
CREATE INDEX idx ON table(VALUES(some_set));
CREATE INDEX idx ON table(some_list);
CREATE INDEX idx ON table(VALUES(some_list));

This is needed to support creating indexes on collections.
2022-08-14 10:29:13 +03:00
Michał Radwański
997682ed72 cql3/statements/create_index_statement,index_target: rewrite index target for collection
The syntax used for creating indexes on collections that is present in
Cassandra is unintuitive from the internal representation point of view.

For instance, index on VALUES(some_set) indexes the set elements, which
in the internal representation are keys of collection. Rewrite the index
target after receiving it, so that the index targets are consistent with
the representation.
2022-08-14 10:29:13 +03:00
Michał Radwański
166afd46b5 Cql.g, treewide: support cql syntax INDEX ON table(VALUES(collection))
Brings support of cql syntax `INDEX ON table(VALUES(collection))`, even
though there is still no support for indexes over collections.
Previously, index_target::target_type::values was refering to values of
a regular (non-collection) column. Rename it to `regular_values`.

Fixes #8745.
2022-08-14 10:29:13 +03:00
Avi Kivity
5937b1fa23 treewide: remove empty comments in top-of-files
After fcb8d040 ("treewide: use Software Package Data Exchange
(SPDX) license identifiers"), many dual-licensed files were
left with empty comments on top. Remove them to avoid visual
noise.

Closes #10562
2022-05-13 07:11:58 +02:00
Kamil Braun
283ac7fefe treewide: pass mutation timestamp from call sites into migration_manager::prepare_* functions
The functions which prepare schema change mutations (such as
`prepare_new_column_family_announcement`) would use internally
generated timestamps for these mutations. When schema changes are
managed by group 0 we want to ensure that timestamps of mutations
applied through Raft are monotonic. We will generate these timestamps at
call sites and pass them into the `prepare_` functions. This commit
prepares the APIs.
2022-01-24 15:12:50 +01:00
Avi Kivity
fcb8d040e8 treewide: use Software Package Data Exchange (SPDX) license identifiers
Instead of lengthy blurbs, switch to single-line, machine-readable
standardized (https://spdx.dev) license identifiers. The Linux kernel
switched long ago, so there is strong precedent.

Three cases are handled: AGPL-only, Apache-only, and dual licensed.
For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0),
reasoning that our changes are extensive enough to apply our license.

The changes we applied mechanically with a script, except to
licenses/README.md.

Closes #9937
2022-01-18 12:15:18 +01:00
Pavel Emelyanov
00de5f4876 validation: Make validate_column_family use data_dictionary::database
And instantly convert the validate_keyspace() as it's not called
from anywhere but the validate_column_family().

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-01-14 13:00:53 +03:00
Pavel Emelyanov
b6bc7a9b29 client_state: Make has_column_family_access use data_dictionary::database
Straightforward replacement. Internals of the has_column_family_access()
temporarily get .real_database(), but it will be changed soon.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-01-14 12:55:15 +03:00
Pavel Emelyanov
d32de22ee8 cql3: Get data dictionary directly from query_processor
After previous patches there's a whole bunch of places that do

  qp.proxy().data_dictionary()

while the data_dictionary is present on the query processor itself
and there's a public method to get one. So use it everywhere.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-12-23 11:28:44 +03:00
Pavel Emelyanov
3d373597eb cql3: Make .validate_while_executing() accept query_processor
The schema_altering_statement declares this pure virtual method. This
patch changes its first argument from proxy into query processor and
fixes what compiler errors about.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-12-23 10:54:27 +03:00
Pavel Emelyanov
b990ca5550 cql3: Make .validate() and .check_access() accept query_processor
This is mostly a sed script that replaces methods' first argument
plus fixes of compiler-generated errors.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-12-23 10:53:44 +03:00
Avi Kivity
d768e9fac5 cql3, related: switch to data_dictionary
Stop using database (and including database.hh) for schema related
purposes and use data_dictionary instead.

data_dictionary::database::real_database() is called from several
places, for these reasons:

 - calling yet-to-be-converted code
 - callers with a legitimate need to access data (e.g. system_keyspace)
   but with the ::database accessor removed from query_processor.
   We'll need to find another way to supply system_keyspace with
   data access.
 - to gain access to the wasm engine for testing whether used
   defined functions compile. We'll have to find another way to
   do this as well.

The change is a straightforward replacement. One case in
modification_statement had to change a capture, but everything else
was just a search-and-replace.

Some files that lost "database.hh" gained "mutation.hh", which they
previously had access to through "database.hh".
2021-12-15 13:54:23 +02:00