before this change, we rely on `using namespace seastar` to use
`seastar::format()` without qualifying `format()` with its
namespace. this worked fine until we changed the type of the format
string parameter of `seastar::format()` from `const char*` to
`fmt::format_string<...>`. this change practically invited
`seastar::format()` to the club of `std::format()` and `fmt::format()`,
where all members accept a templated parameter as their `fmt`
parameter, and `seastar::format()` is not the best candidate anymore.
although argument-dependent lookup (ADL for short) favors the
function which is in the same namespace as its parameter,
`using namespace` makes `seastar::format()` more competitive,
so both `std::format()` and `seastar::format()` are considered
as candidates.
that is what is happening in scylladb at quite a few call sites of
`format()`, hence ADL is not able to tell which function is the winner
in the name lookup:
```
/__w/scylladb/scylladb/mutation/mutation_fragment_stream_validator.cc:265:12: error: call to 'format' is ambiguous
265 | return format("{} ({}.{} {})", _name_view, s.ks_name(), s.cf_name(), s.id());
| ^~~~~~
/usr/bin/../lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/format:4290:5: note: candidate function [with _Args = <const std::basic_string_view<char> &, const seastar::basic_sstring<char, unsigned int, 15> &, const seastar::basic_sstring<char, unsigned int, 15> &, const utils::tagged_uuid<table_id_tag> &>]
4290 | format(format_string<_Args...> __fmt, _Args&&... __args)
| ^
/__w/scylladb/scylladb/seastar/include/seastar/core/print.hh:143:1: note: candidate function [with A = <const std::basic_string_view<char> &, const seastar::basic_sstring<char, unsigned int, 15> &, const seastar::basic_sstring<char, unsigned int, 15> &, const utils::tagged_uuid<table_id_tag> &>]
143 | format(fmt::format_string<A...> fmt, A&&... a) {
| ^
```
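the ambiguity can be reproduced without seastar or `<format>`. below is a minimal sketch, assuming two hypothetical namespaces: `lib` stands in for `std` (it owns the argument type, so ADL finds `lib::format()`), and `app` stands in for `seastar` (made visible by a using-directive). none of these names exist in scylladb or seastar.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for `std`: it owns the argument type, so ADL finds lib::format().
namespace lib {
struct name { std::string value; };

template <typename... Args>
std::string format(const char* fmt, Args&&...) {
    return std::string("lib:") + fmt;
}
}

// Hypothetical stand-in for `seastar`: made visible by the using-directive below.
namespace app {
template <typename... Args>
std::string format(const char* fmt, Args&&...) {
    return std::string("app:") + fmt;
}
}

using namespace app;

inline std::string describe(const lib::name& n) {
    // return format("{}", n);    // error: call to 'format' is ambiguous
    return app::format("{}", n);  // qualifying the name leaves a single candidate
}
```

both templates have identical signatures, so neither is more specialized, and only the qualified call compiles.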
in this change, we change all `format()` to either `fmt::format()` or
`seastar::format()` with the following rules:
- if the caller expects an `sstring` or `std::string_view`, change to
`seastar::format()`
- if the caller expects an `std::string`, change to `fmt::format()`,
because `sstring::operator std::basic_string` would incur a deep
copy.
we will need another change to enable scylladb to compile with the
latest seastar. namely, to pass the format string as a templated
parameter down to helper functions which format their parameters.
to minimize the scope of this change, let's include that change when
bumping up the seastar submodule, as that change will depend on
the seastar change.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
after switching over to the new `seastar::format()` which enables
the compile-time format check, the fmt string should be a constexpr,
otherwise `fmt::format()` is not able to perform the check at compile
time.
to prepare for bumping up the seastar module to a version which
contains the change of `seastar::format()`, let's mark the format
string with `constexpr const`.
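to illustrate why the string has to be a constant expression, here is a minimal sketch. `count_placeholders()` and `validator_fmt` are hypothetical names, and the checker is far simpler than what `fmt` really does: it only counts `{}` pairs. the point is that a `static_assert` can only inspect a string which is `constexpr`.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical checker: counts "{}" placeholders in a format string.
constexpr std::size_t count_placeholders(std::string_view fmt) {
    std::size_t n = 0;
    for (std::size_t i = 0; i + 1 < fmt.size(); ++i) {
        if (fmt[i] == '{' && fmt[i + 1] == '}') {
            ++n;
            ++i;  // skip the closing brace
        }
    }
    return n;
}

// Because the string is `constexpr`, the check below runs at compile time.
constexpr const char* validator_fmt = "{} ({}.{} {})";
static_assert(count_placeholders(validator_fmt) == 4,
              "placeholder/argument count mismatch");
```

dropping `constexpr` from `validator_fmt` would make the `static_assert` fail to compile, which mirrors what happens with a compile-time-checked format string.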
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#20484
thrift support has been deprecated since ScyllaDB 5.2
> Thrift API - legacy ScyllaDB (and Apache Cassandra) API is
> deprecated and will be removed in followup release. Thrift has
> been disabled by default.
so let's drop it. in this change,
* thrift protocol support is dropped
* all references to thrift support in the documentation are dropped
* the "thrift_version" column in system.local table is
preserved for backward compatibility, as we could load
from an existing system.local table which still contains
this column, so we need to write this column as well.
* "/storage_service/rpc_server" is only preserved for
backward compatibility with java-based nodetool.
* `rpc_port` and `start_rpc` options are preserved, but
they are marked as "Unused", so that the new release
of scylladb can consume existing scylla.yaml configurations
which might contain these settings. by making them
deprecated, users will get warned, and can update
their configurations before we actually remove them
in the next major release.
Fixes #3811
Fixes #18416
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Scylla's schema tables code determines which index was added, by diffing
index definitions with previous ones. This is clunky to use in
tools/schema_loader.cc, so also return the index metadata for the newly
created index.
The method `validate_while_executing()` and its only caller,
`build_index_schema()`, use the query processor only to get the db from it.
So replace the qp parameter with a db one, relaxing the requirements on
callers.
After changing the prepare_ methods of migration_manager to
functions, the migration_manager& parameter of
schema_altering_statement::prepare_schema_mutations has been
unused by all classes inheriting from schema_altering_statement.
The migration_manager service is responsible for schema convergence
in the cluster - pushing schema changes to other nodes and pulling
schema when a version mismatch is observed. However, there is also
a part of migration_manager that doesn't really belong there -
creating mutations for schema updates. These are the functions with
prepare_ prefix. They don't modify any state and don't exchange any
messages. They only need to read the local database.
We take these functions out of migration_manager and make them
separate functions to reduce the dependency of other modules
(especially query_processor and CQL statements) on
migration_manager. Since all of these functions only need access
to storage_proxy (or even only replica::database), doing such a
refactor is not complicated. We just have to add one parameter,
either storage_proxy or database and both of them are easily
accessible in the places where these functions are called.
Checking keyspace/table presence should not be part of authorization code
and it is not done consistently today. For instance keyspace presence
is not checked in "alter keyspace" during authorization, but during
statement execution. Make it consistent.
We want to stop relying on `qp.get_migration_manager()`, so we can make
the function private in the future. This in turn is a prerequisite for
splitting `query_processor` initialization into two phases, where the
first phase will only allow local queries (and won't require
`migration_manager`).
Validation of a CREATE MATERIALIZED VIEW statement takes place inside
the prepare_schema_mutations() method.
I would like to generate warnings during this validation, but there's
currently no way to pass them.
Let's add one more return value - a vector of CQL warnings generated
during the execution of this statement.
A new alias is added to make it clear what the function is returning:
```c++
// A vector of CQL warnings generated during execution of a statement.
using cql_warnings_vec = std::vector<sstring>;
```
Later the warnings will be sent to the user by the function
schema_altering_statement::execute(), which is the only caller
of prepare_schema_mutations().
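A minimal sketch of the resulting shape, under stated assumptions: `std::string` stands in for `sstring`, `mutation` is a dummy type, and `prepare_view_mutations()` with its `suspicious_definition` flag is a hypothetical function, not the real signature of prepare_schema_mutations().

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Stand-ins (assumptions): std::string replaces sstring, `mutation` is a dummy type.
using cql_warnings_vec = std::vector<std::string>;
struct mutation { std::string description; };

// Hypothetical prepare step that returns warnings next to its mutations.
inline std::pair<std::vector<mutation>, cql_warnings_vec>
prepare_view_mutations(const std::string& view_name, bool suspicious_definition) {
    std::vector<mutation> mutations;
    mutations.push_back(mutation{"create view " + view_name});
    cql_warnings_vec warnings;
    if (suspicious_definition) {
        // Validation no longer has to choose between failing and staying silent.
        warnings.push_back("view " + view_name + " has a suspicious definition");
    }
    return {std::move(mutations), std::move(warnings)};
}
```

The caller can then apply the mutations and forward the second element to the client unchanged.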
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Schema related files are moved there. This excludes schema files that
also interact with mutations, because the mutation module depends on
the schema. Those files will have to go into a separate module.
Closes #12858
Move mutation-related files to a new mutation/ directory. The names
are kept in the global namespace to reduce churn; the names are
unambiguous in any case.
mutation_reader remains in the readers/ module.
mutation_partition_v2.cc was missing from CMakeLists.txt; it's added in this
patch.
This is a step forward towards librarization or modularization of the
source base.
Closes #12788
Secondary indexes on static columns should work now. This commit lifts
the existing restriction after the cluster is fully upgraded to a
version which supports such indexes.
Local indexes on static columns don't make sense because there is only
one static row per partition. It's always better to just run SELECT
DISTINCT on the base table. Allowing for such an index would only make
such queries slower (due to double lookup), would take unnecessary space
and could pose potential consistency problems, so this commit explicitly
forbids them.
Prevent a user from creating a secondary index on a collection column if
the cluster has any nodes which don't support this feature. Such nodes
will not be able to correctly handle requests related to this index,
so better not allow creating one.
Attempting to create an index on a collection before the entire cluster
supports this feature will result in the error:
Indexing of collection columns not supported by some older nodes
in this cluster. Please upgrade them.
Tested by manually disabling this feature in feature_service.cc and
seeing this error message during collection indexing test.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Before this patch, trying to create an index on entries(x) where x is
not a map results in an error message:
Cannot create index on index_keys_and_values of column x
The string "index_keys_and_values" is strange - Cassandra prints the
easier to understand string "entries()" - which better corresponds to
what the user actually did.
It turns out that this string "index_keys_and_values" comes from an
elaborate set of variables and functions spanning multiple source files,
used to convert our internal target_type variable into such a string.
But although this code was called "index_option" and sounded very
important, it was actually used just for one thing - error messages!
So in this patch we drop the entire "index_option" abstraction,
replacing it by a static trivial function defined exactly where
it's used (create_index_statement.cc), which prints a target type.
While at it, we print "entries()" instead of "index_keys_and_values" ;-)
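A sketch of what such a trivial function could look like. The enum below is a hypothetical simplification of the internal target type, and every string other than "entries()" is an assumption about the wording.

```cpp
#include <cassert>
#include <string>

// Hypothetical enum; the real target type lives in Scylla's index_target.
enum class target_type { values, keys, keys_and_values, full };

// Print the target type the way the user wrote it in CQL.
inline std::string target_type_name(target_type t) {
    switch (t) {
    case target_type::values:          return "values()";
    case target_type::keys:            return "keys()";
    case target_type::keys_and_values: return "entries()";
    case target_type::full:            return "full()";
    }
    return "unknown";
}
```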
After this patch, the
test_secondary_index.py::test_index_collection_wrong_type
finally passes (the previous patch fixed the default table names it
assumes, and this patch fixes the expected error messages), so its
"xfail" tag is removed.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
When creating an index "CREATE INDEX ON tbl(keys(m))", the default name
of the index should be tbl_m_idx - with just "m". The current code
incorrectly used the default name tbl_m_keys_idx, so this patch adds
a test (which passes on Cassandra, and after this patch also on Scylla)
and fixes the default name.
It turns out that the default index name was based on a mysterious
index_target::as_string(), which printed the target "keys(m)" as
"m_keys" without explaining why it was so. This method was actually
used only in three places, and all of them wanted just the column
name, without the "_keys" suffix! So in this patch we rename the
mysterious as_string() to column_name(), and use this function instead.
Now that the default index name uses column_name() and gets just
column_name(), the correct default index name is generated, and the
test passes.
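The naming rule can be sketched as follows. The `index_target` struct and `default_index_name()` are simplified, hypothetical stand-ins for the real classes; only the `<table>_<column>_idx` pattern comes from the description above.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified index target: a column plus an optional wrapper.
struct index_target {
    std::string column;
    std::string wrapper;  // e.g. "keys"; deliberately ignored for naming

    const std::string& column_name() const { return column; }
};

// The default name uses only the column name, never the keys()/values() wrapper.
inline std::string default_index_name(const std::string& table, const index_target& target) {
    return table + "_" + target.column_name() + "_idx";
}
```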
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Allow CQL like this:
CREATE INDEX idx ON table(some_map);
CREATE INDEX idx ON table(KEYS(some_map));
CREATE INDEX idx ON table(VALUES(some_map));
CREATE INDEX idx ON table(ENTRIES(some_map));
CREATE INDEX idx ON table(some_set);
CREATE INDEX idx ON table(VALUES(some_set));
CREATE INDEX idx ON table(some_list);
CREATE INDEX idx ON table(VALUES(some_list));
This is needed to support creating indexes on collections.
The syntax used for creating indexes on collections that is present in
Cassandra is unintuitive from the internal representation point of view.
For instance, index on VALUES(some_set) indexes the set elements, which
in the internal representation are keys of collection. Rewrite the index
target after receiving it, so that the index targets are consistent with
the representation.
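A minimal sketch of the rewrite step. Both enums and `rewrite_target()` are hypothetical names, and only the set case is taken from the description above; other collection kinds are passed through unchanged here, which is an assumption.

```cpp
#include <cassert>

// Hypothetical enums modelling the collection kind and the requested target.
enum class collection_kind { map, set, list };
enum class target_type { keys, values, keys_and_values };

// Translate the user-facing target into the internal one.
inline target_type rewrite_target(collection_kind kind, target_type requested) {
    // Set elements are stored as collection keys, so VALUES(some_set) indexes keys.
    if (kind == collection_kind::set && requested == target_type::values) {
        return target_type::keys;
    }
    return requested;
}
```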
Brings support of cql syntax `INDEX ON table(VALUES(collection))`, even
though there is still no support for indexes over collections.
Previously, index_target::target_type::values was referring to values of
a regular (non-collection) column. Rename it to `regular_values`.
Fixes #8745.
After fcb8d040 ("treewide: use Software Package Data Exchange
(SPDX) license identifiers"), many dual-licensed files were
left with empty comments on top. Remove them to avoid visual
noise.
Closes #10562
The functions which prepare schema change mutations (such as
`prepare_new_column_family_announcement`) would use internally
generated timestamps for these mutations. When schema changes are
managed by group 0 we want to ensure that timestamps of mutations
applied through Raft are monotonic. We will generate these timestamps at
call sites and pass them into the `prepare_` functions. This commit
prepares the APIs.
Instead of lengthy blurbs, switch to single-line, machine-readable
standardized (https://spdx.dev) license identifiers. The Linux kernel
switched long ago, so there is strong precedent.
Three cases are handled: AGPL-only, Apache-only, and dual licensed.
For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0),
reasoning that our changes are extensive enough to apply our license.
The changes were applied mechanically with a script, except for
licenses/README.md.
Closes #9937
And instantly convert the validate_keyspace() as it's not called
from anywhere but the validate_column_family().
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Straightforward replacement. Internals of the has_column_family_access()
temporarily get .real_database(), but it will be changed soon.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
After previous patches there's a whole bunch of places that do
qp.proxy().data_dictionary()
while the data_dictionary is present on the query processor itself
and there's a public method to get one. So use it everywhere.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The schema_altering_statement declares this pure virtual method. This
patch changes its first argument from proxy to query processor and
fixes the resulting compiler errors.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This is mostly a sed script that replaces methods' first argument
plus fixes of compiler-generated errors.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Stop using database (and including database.hh) for schema related
purposes and use data_dictionary instead.
data_dictionary::database::real_database() is called from several
places, for these reasons:
- calling yet-to-be-converted code
- callers with a legitimate need to access data (e.g. system_keyspace)
but with the ::database accessor removed from query_processor.
We'll need to find another way to supply system_keyspace with
data access.
- to gain access to the wasm engine for testing whether user
defined functions compile. We'll have to find another way to
do this as well.
The change is a straightforward replacement. One case in
modification_statement had to change a capture, but everything else
was just a search-and-replace.
Some files that lost "database.hh" gained "mutation.hh", which they
previously had access to through "database.hh".
To be able to confine raft to the execution time of a statement we need to
move all schema access to the execution time as well. Since the
validation code accesses the schema, let's run it during execution.
What should the following pair of statements do?
CREATE INDEX xyz ON tbl(a)
CREATE INDEX IF NOT EXISTS xyz ON tbl(b)
There are two reasonable choices:
1. An index with the name xyz already exists, so the second command should
do nothing, because of the "IF NOT EXISTS".
2. The index on tbl(b) does *not* yet exist, so the command should try to
create it. And when it can't (because the name xyz is already taken),
it should produce an error message.
Currently, Cassandra went with choice 1, and Scylla went with choice 2.
After some discussions on the mailing list, we agreed that Scylla's
choice is the better one and Cassandra's choice could be considered a
bug: The "IF NOT EXISTS" feature is meant to allow idempotent creation of
an index - and not to make it easy to make mistakes without noticing.
The second command listed above is most likely a mistake by the user,
not anything intentional: The command intended to ensure that an index
on column b exists, but after the silent success of the command, no such
index exists.
So this patch doesn't change any Scylla code (it just adds a comment),
and rather it adds a test which "enshrines" the current behavior.
The test passes on Scylla and fails on Cassandra so we tag it
"cassandra_bug", meaning that we consider this difference to be
intentional and we consider Cassandra's behavior in this case to be wrong.
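The chosen semantics can be modelled with a small sketch. Everything here is hypothetical: indexes are a plain map from name to indexed column, and `create_index()` only mimics the decision, not Scylla's actual code path.

```cpp
#include <cassert>
#include <map>
#include <string>

enum class create_result { created, skipped, error };

// Hypothetical model: existing indexes keyed by name, mapped to the indexed column.
using index_map = std::map<std::string, std::string>;

inline create_result create_index(index_map& indexes, const std::string& name,
                                  const std::string& column, bool if_not_exists) {
    for (const auto& entry : indexes) {
        if (entry.second == column) {
            // The requested index already exists: only this case is silenced.
            return if_not_exists ? create_result::skipped : create_result::error;
        }
    }
    if (indexes.count(name) != 0) {
        // The name is taken by a different index: an error even with IF NOT EXISTS.
        return create_result::error;
    }
    indexes.emplace(name, column);
    return create_result::created;
}
```

With this model, `CREATE INDEX xyz ON tbl(a)` followed by `CREATE INDEX IF NOT EXISTS xyz ON tbl(b)` yields `created` and then `error`, while repeating the first statement with IF NOT EXISTS yields `skipped`.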
Fixes #9182.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210811113906.2105644-1-nyh@scylladb.com>
The value of a frozen collection may only be indexed (using a secondary
index) in full - it is not allowed to index only the keys for example -
"CREATE INDEX idx ON table (keys(v))" is not allowed.
The error message referred to a frozen<map>, but the problem can happen
on any frozen collection (e.g., a frozen set), not just a frozen map,
so it can be confusing to a user who used a frozen set and got an
error about a frozen map.
So this patch fixes the error message to refer to a "frozen collection".
Note that the Cassandra error message in this case is different - it
reads: "Frozen collections are immutable and must be fully indexed".
Fixes #8744.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210529094056.825117-1-nyh@scylladb.com>
It is forbidden to create a secondary index on a column whose type includes,
in any way, the "duration" type. This includes a UDT which contains a duration.
Our code attempted to print in this case the message "Secondary indexes
are not supported on UDTs containing durations" - but because we tested
for tuples first, and UDTs are also tuples - we got the message about
tuples.
By changing the order of the tests, we get the most specific (and
useful) error message.
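A sketch of the reordered checks, assuming a hypothetical `column_type` with three flags. Only the UDT message is quoted from the description above; the tuple and plain-duration messages are placeholders.

```cpp
#include <cassert>
#include <string>

// Hypothetical type descriptor; a UDT sets both is_udt and is_tuple.
struct column_type {
    bool is_tuple;
    bool is_udt;
    bool contains_duration;
};

// Returns the rejection message, or an empty string if the column can be indexed.
inline std::string index_rejection(const column_type& t) {
    if (!t.contains_duration) {
        return "";
    }
    // UDTs are also tuples, so the more specific UDT test must come first.
    if (t.is_udt) {
        return "Secondary indexes are not supported on UDTs containing durations";
    }
    if (t.is_tuple) {
        return "Secondary indexes are not supported on tuples containing durations";
    }
    return "Secondary indexes are not supported on duration columns";
}
```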
Fixes #8724.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210526201042.642550-1-nyh@scylladb.com>
The recent commit 0ef0a4c78d added helpful
error messages in case an index cannot be created because the intended
name of its materialized view is already taken - but accidentally broke
the "CREATE INDEX IF NOT EXISTS" feature.
The checking code was correct, but in the wrong place: we need to first
check whether the index already exists and "IF NOT EXISTS" was chosen -
and only do this new error checking if this is not the case.
This patch also includes a cql-pytest test for reproducing this bug.
The bug is also reproduced by the translated Cassandra unit tests
cassandra_tests/validation/entities/secondary_index_test.py::
testCreateAndDropIndex
and this is how I found this bug. After this patch, all these tests
pass.
Fixes #8717.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210526143635.624398-1-nyh@scylladb.com>
When an index is created without an explicit name, a default name
is chosen. However, there was no check if a table with conflicting
name already exists. The check is now in place and if any conflicts
are found, a new index name is chosen instead.
When an index is created *with* an explicit name and a conflicting
regular table is found, index creation should simply fail.
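A sketch of picking a default name that avoids conflicts. The function name and the numeric-suffix scheme are assumptions made for illustration; the `_index` suffix of the backing table name follows the convention of the underlying materialized view.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical helper: choose a default index name whose backing table name is free.
inline std::string pick_default_index_name(const std::set<std::string>& existing_tables,
                                           const std::string& table,
                                           const std::string& column) {
    const std::string base = table + "_" + column + "_idx";
    std::string candidate = base;
    // Assumed scheme: append _1, _2, ... until "<candidate>_index" is not taken.
    for (int i = 1; existing_tables.count(candidate + "_index") != 0; ++i) {
        candidate = base + "_" + std::to_string(i);
    }
    return candidate;
}
```

For an explicitly named index the same lookup would be done once, with a conflict reported as an error instead of triggering a retry.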
This series comes with a test.
Fixes #8620
Tests: unit(release)
Closes #8632
* github.com:scylladb/scylla:
cql-pytest: add regression tests for index creation
cql3: fail to create an index if there is a name conflict
database: check for conflicting table names for indexes
When an index with an explicit name is created, its underlying
materialized view's name is set to <index-name>_index.
If there already exists a regular table with such a name,
the creation should fail with a proper error message.
In order to avoid needless schema disagreements, a way of announcing
a schema change with fixed timestamp is added.
That way, when nodes update schemas of their internal tables (e.g.
during updates), it's possible for all nodes to use an identical
timestamp for this operation, which in turn makes their digests
identical.
After previous patches some places in cql3 code take a
long path to get database reference:
query processor -> storage proxy -> database
The query processor can provide the database reference
by itself, so take this chance.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Most of the schema altering statements implementations can now
stop calling for global migration manager instance and get it
from the query processor.
Here are the trivial cases when the query processor is just
available at the place where it's needed.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Now that the only call to .announce_migration has the
query processor at hand -- pass it to the real statements.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
It looks like the history of the flag begins in Cassandra's
https://issues.apache.org/jira/browse/CASSANDRA-7327 where it is
introduced to speedup tests by not needing to start the gossiper.
The thing is, we always start the gossiper in our cql tests, so the flag only
introduces noise. And, of course, since we want to move schema to use raft,
it goes against the nature of raft to be able to apply modifications only
locally, so we had better get rid of the capability ASAP.
Tests: units(dev, debug)
Message-Id: <20201230111101.4037543-2-gleb@scylladb.com>