before this change, we rely on `using namespace seastar` to use
`seastar::format()` without qualifying the `format()` with its
namespace. this works fine until we changed the parameter type
of format string `seastar::format()` from `const char*` to
`fmt::format_string<...>`. this change practically invited
`seastar::format()` to the club of `std::format()` and `fmt::format()`,
where all members accept a templated parameter as its `fmt`
parameter. and `seastar::format()` is not the best candidate anymore.
despite that argument-dependent lookup (ADT for short) favors the
function which is in the same namespace as its parameter, but
`using namespace` makes `seastar::format()` more competitive,
so both `std::format()` and `seastar::format()` are considered
as the condidates.
that is what is happening scylladb in quite a few caller sites of
`format()`, hence ADT is not able to tell which function the winner
in the name lookup:
```
/__w/scylladb/scylladb/mutation/mutation_fragment_stream_validator.cc:265:12: error: call to 'format' is ambiguous
265 | return format("{} ({}.{} {})", _name_view, s.ks_name(), s.cf_name(), s.id());
| ^~~~~~
/usr/bin/../lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/format:4290:5: note: candidate function [with _Args = <const std::basic_string_view<char> &, const seastar::basic_sstring<char, unsigned int, 15> &, const seastar::basic_sstring<char, unsigned int, 15> &, const utils::tagged_uuid<table_id_tag> &>]
4290 | format(format_string<_Args...> __fmt, _Args&&... __args)
| ^
/__w/scylladb/scylladb/seastar/include/seastar/core/print.hh:143:1: note: candidate function [with A = <const std::basic_string_view<char> &, const seastar::basic_sstring<char, unsigned int, 15> &, const seastar::basic_sstring<char, unsigned int, 15> &, const utils::tagged_uuid<table_id_tag> &>]
143 | format(fmt::format_string<A...> fmt, A&&... a) {
| ^
```
in this change, we
change all `format()` to either `fmt::format()` or `seastar::format()`
with following rules:
- if the caller expects an `sstring` or `std::string_view`, change to
`seastar::format()`
- if the caller expects an `std::string`, change to `fmt::format()`.
because, `sstring::operator std::basic_string` would incur a deep
copy.
we will need another change to enable scylladb to compile with the
latest seastar. namely, to pass the format string as a templated
parameter down to helper functions which format their parameters.
to miminize the scope of this change, let's include that change when
bumping up the seastar submodule. as that change will depend on
the seastar change.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
A dialect is a different way to interpret the same CQL statement.
Examples:
- how duplicate bind variable names are handled (later in this series)
- whether `column = NULL` in LWT can return true (as is now) or
whether it always returns NULL (as in SQL)
Currently, dialect is an empty structure and will be filled in later.
It is passed to query_processor methods that also accept a CQL string,
and from there to the parser. It is part of the prepared statement cache
key, so that if the dialect is changed online, previous parses of the
statement are ignored and the statement is prepared again.
The patch is careful to pick up the dialect at the entry point (e.g.
CQL protocol server) so that the dialect doesn't change while a statement
is parsed, prepared, and cached.
By making it a required argument, making sure the topology version is
pinned for the duration of the query. This is needed because mutation
dump queries bypass the storage proxy, where this pinning usually takes
place. So it has to be enforced here.
This change fixes#17237, fixes#5361 and fixes#5362 by passing the limit value down the call chain in cql3. A test is also added.
fixes#17237fixes#5361fixes#5362
The regression happened in 5.4 as we changed the way GROUP BY is processed in 432cb02 - to force aggregation when it is used. The LIMIT value was not passed to aggregations and thus we failed to adhere to it.
W want to backport this fix to 5.4 and 6.0 to have continuous correct results for the test case from #17237
This patch consists of 4 commits:
- fa4225ea0fac2057b7a9976f57dc06bcbd900cd4 - cql3: respect the user-defined page size in aggregate queries - a precondition for this patch to be implementable
- 8fbe69e74dca16ed8832d9a90489ca47ba271d0b - cql3/select_statement: simplify the get_limit function - the `do_get_limit()` function did a lot of legwork that should not be associated with it. This change makes it trivial and makes its callers do additional checks (for unset guards, or for an aggregate query)
- 162828194a2b88c22fbee335894ff045dcc943c9 - cql3: process LIMIT for GROUP BY queries - pass the limit value down the chain and make use of it. This is the actual fix to #17237
- b3dc6de6d6cda8f5c09b01463bb52f827a6a00b4 - test/cql-pytest: Add test for GROUP BY queries with LIMIT - tests
Closesscylladb/scylladb#18842
* github.com:scylladb/scylladb:
test/cql-pytest: Add test for GROUP BY queries with LIMIT
cql3: process LIMIT for GROUP BY queries
cql3/select_statement: simplify the get_limit function
cql3: respect the user-defined page size in aggregate queries
Simplify implementation and for clustering key ranges in native
reversed format, require a reversed table schema.
Trimming native reversed clustering key ranges requires a reversed
schema to be passed in. Thus, the reverse flag is no longer required
as it would always be set to false.
Use a reversed schema and a native reversed slice when constructing
a read_command and executing a reversed select statement.
Such a created read_command is passed further down to query_pagers::pager
and storage::proxy::query_result that transform it to the format
they accept/know, i.e. lagacy.
Currently LIMIT not passed to the query executor at all and it was just
an accident that it worked for the case referenced in #17237. This
change passes the limit value down the chain.
The get_limit() function performed tasks outside of its scope - for
example checked if the statement was an aggregate. This change moves the
onus of the check to the caller.
The comment in the code already states that we should use the
user-defined page size if it's provided. To avoid OOM conditions we'll
use the internally defined limit as the upper bound or if no page size
is provided.
This change lays ground work for fixing #5362 and is necessary to pass
the test introduced in #19392 once it is implemented.
assert() is traditionally disabled in release builds, but not in
scylladb. This hasn't caused problems so far, but the latest abseil
release includes a commit [1] that causes a 1000 insn/op regression when
NDEBUG is not defined.
Clearly, we must move towards a build system where NDEBUG is defined in
release builds. But we can't just define it blindly without vetting
all the assert() calls, as some were written with the expectation that
they are enabled in release mode.
To solve the conundrum, change all assert() calls to a new SCYLLA_ASSERT()
macro in utils/assert.hh. This macro is always defined and is not conditional
on NDEBUG, so we can later (after vetting Seastar) enable NDEBUG in release
mode.
[1] 66ef711d68Closesscylladb/scylladb#20006
forward_service is nondescriptive and misnamed, as it does more than
forward requests. It's a classic map/reduce algorithm (and in fact one
of its parameters is "reducer"), so name it accordingly.
The name "forward" leaked into the wire protocol for the messaging
service RPC isolation cookie, so it's kept there. It's also maintained
in the name of the logger (for "nodetool setlogginglevel") for
compatibility with tests.
Closesscylladb/scylladb#19444
Currently reads with WHERE clause which limits them to be
single-partition reads, are unnecessarily parallelized.
This commit checks this condition and the query doesn't use
forward_service in single-partition reads.
Instead, use shard_for_reads(). The justification is that:
1) In cas_shard(), we need to pick a single request coordinator.
shard_for_reads() gives that, which is equivalent to shard_of()
if there is no intra-node migration.
2) In paxos handler for prepare(), the shard we execute it on is
the shard from which we read, so shard_for_reads() is the one.
3) Updates of paxos state are separate CQL requests, and use their
own sharding.
4) Handler for learn is executing updates using calls to
storage_proxy::mutate_locally() which will use the right sharder for writes
However, the code is still not prepared for intra-node migration, and
possibly regular migration too in case of abandoned requests, because
the locking of paxos state assumes that the shard is static. That
would have to be fixed separately, e.g. by locking both shards
(shard_for_writes()) during migration, so that the set of locked
shards always intersects during migration and local serialization of
paxos state updates is achieved. I left FIXMEs for that.
When constructing a vector with partition key data, the size of that
vector is known beforehand
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closesscylladb/scylladb#18239
we should include used header, to avoid compilation failures like:
```
cql3/statements/select_statement.cc:229:79: error: no member named 'filter' in namespace 'std::ranges::views'
for (const auto& used_function : used_functions | std::ranges::views::filter(not_native)) {
~~~~~~~~~~~~~~~~~~~~^
1 error generated.`
```
if some of the included header drops its own `#include <optional>`.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#18145
We've observed errors during shutdown like the following:
```
ERROR 2023-12-26 17:36:17,413 [shard 0:main] raft - [088f01a3-a18b-4821-b027-9f49e55c1926] applier fiber stopped because of the error: std::_Nested_exception<raft::state_machine_error> (State machine error at raft/server.cc:1230): std::runtime_error (forward_service is shutting down)
INFO 2023-12-26 17:36:17,413 [shard 0:strm] storage_service - raft_state_monitor_fiber aborted with raft::stopped_error (Raft instance is stopped)
ERROR 2023-12-26 17:36:17,413 [shard 0:strm] storage_service - raft topology: failed to fence previous coordinator raft::stopped_error (Raft instance is stopped, reason: "background error, std::_Nested_exception<raft::state_machine_error> (State machine error at raft/server.cc:1230): std::runtime_error (forward_service is shutting down)")
```
some CQL statement execution was trying to use `forward_service` during
shutdown.
It turns out that the statement is in
`system_keyspace::load_topology_state`:
```
auto gen_rows = co_await execute_cql(
format("SELECT count(range_end) as cnt FROM {}.{} WHERE key = '{}' AND id = ?",
NAME, CDC_GENERATIONS_V3, cdc::CDC_GENERATIONS_V3_KEY),
gen_uuid);
```
It's querying a table in the `system` keyspace.
Pushing local table queries through `forward_service` doesn't make sense
as the data is not distributed. Excluding local tables from this logic
also fixes the shutdown error.
Fixesscylladb/scylladb#16570Closesscylladb/scylladb#16662
Tablets metadata is quite expensive to generate (each data_value is
an allocation), so an old driver (without support for tablets) will
generate huge amounts of such notifications. This commit adds a way
to negotiate generation of the notification: a new driver will ask
for them, and an old driver won't get them. It uses the
OPTIONS/SUPPORTED/STARTUP protocol described in native_protocol_v4.spec.
Closesscylladb/scylladb#16611
Commit 62458b8e4f introduced the enforcement of EXECUTE permissions of functions in cql select. However, according to the reference in #12869, the permissions should be enforced only on UDFs and UDAs.
The code does not distinguish between the two so the permissions are also unintenionally enforced also on native function. This commit introduce the distinction and only enforces the permissions on non native functions.
Fixes#16526
Manually verified (before and after change) with the reproducer supplied in #16526 and also with some the `min` and `max` native functions.
Also added test that checks for regression on native functions execution and verified that it fails on authorization before
the fix and passes after the fix.
Closesscylladb/scylladb#16556
* github.com:scylladb/scylladb:
test.py: Add test for native functions permissions
select statement: verify EXECUTE permissions only for non native functions
Commit 62458b8e4f introduced the
enforcement of EXECUTE permissions of functions in cql select. However,
according to the reference in #12869, the permissions should be enforced
only on UDFs and UDAs.
The code does not distinguish between the two so the permissions are
also unintentionally enforced also on native function.
This commit introduce the distinction and only enforces the permissions
on non native functions.
Fixes#16526
Manually verified (before and after change) with the reproducer
supplied in #16526 and also with some the `min` and `max` native
functions.
Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
Recently, the expression-rewrite effort changed the way that GROUP BY is
implemented. Usually GROUP BY involves an aggregation function (e.g., if
you want a separate SUM per partition). But there's also a query like
SELECT p, c1, c2, v FROM tbl GROUP BY p
This query is supposed to return one row - the *first* row in clustering
order - per group (in this case, partition). The expression rewrite
re-implemented this feature by introducing a new internal aggregator,
first(), which returns the first aggregated value. The above query is
rewritten into:
SELECT first(p), first(c1), first(c2), first(v) FROM tbl GROUP BY p
This case works correctly, and we even have a regression test for it.
But unfortunately the rewrite broke the following query:
SELECT * FROM tbl GROUP BY p
Note the "*" instead of the explicit list of columns.
In our implementation, a selection of "*" is looks like an empty
selection, and it didn't get the "first()" treatment and it remained
a "SELECT *" - and wrongly returned all rows instead of just the first
one in each partition. This was a regression - it worked correctly in
Scylla 5.2 (and also in Cassandra) - see the next patch for a
regression test.
In this patch we fix this regression. When there is a GROUP BY, the "*"
is rewritten to the appropriate list of all visible columns and then
gets the first() treatment, so it will return only the first row as
expected. The next patch will be a test that confirms the bug and its
fix.
Fixes#16531
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Fixes some typos as found by codespell run on the code.
In this commit, I was hoping to fix only comments, not user-visible alerts, output, etc.
Follow-up commits will take care of them.
Refs: https://github.com/scylladb/scylladb/issues/16255
Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>
The purpose of `maybe_fix_legacy_secondary_index_mv_schema` was to deal
with legacy materialized view schemas used for secondary indexes,
schemas which were created before the notion of "computed columns" was
introduced. Back then, secondary index schemas would use a regular
"token" column. Later it became a computed column and old schemas would
be migrated during rolling upgrade.
The migration code was introduced in 2019
(db8d4a0cc6) and then fixed in 2020
(d473bc9b06).
The fix was present in Enterprise 2022.1 and in OSS 4.5. So, assuming
that users don't try crazy things like upgrading from 2021.X to 2023.X
(which we do not support), all clusters will have already executed the
migration code once they upgrade to 2023.X, meaning we can get rid of
it.
The main motivation of this PR is to get rid of the
`db::schema_tables::merge_schema` call in `parse_schema_tables`. In Raft
mode this was the only call to `merge_schema` outside "group 0 code" and
in fact it is unsafe -- it uses locally generated mutations with locally
generated timestamp (`api::new_timestamp()`), so if we actually did it,
we would permanently diverge the group 0 state machine across nodes
(the schema pulling code is disabled in Raft mode). Fortunately, this
should be dead code by now, as explained in the previous paragraph.
The migration code is now turned into a sanity check, if the users
try something crazy, they will get an error instead of silent data
corruption.
Closesscylladb/scylladb#15695
* github.com:scylladb/scylladb:
view: remove unused `_backing_secondary_index`
schema_tables: turn view schema fixing code into a sanity check
schema_tables: make comment more precise
feature_service: make COMPUTED_COLUMNS feature unconditionally true
The purpose of `maybe_fix_legacy_secondary_index_mv_schema` was to deal
with legacy materialized view schemas used for secondary indexes,
schemas which were created before the notion of "computed columns" was
introduced. Back then, secondary index schemas would use a regular
"token" column. Later it became a computed column and old schemas would
be migrated during rolling upgrade.
The migration code was introduced in 2019
(db8d4a0cc6) and then fixed in 2020
(d473bc9b06).
The fix was present in Enterprise 2022.1 and in OSS 4.5. So, assuming
that users don't try crazy things like upgrading from 2021.X to 2023.X
(which we do not support), all clusters will have already executed the
migration code once they upgrade to 2023.X, meaning we can get rid of
it.
The main motivation of this patch is to get rid of the
`db::schema_tables::merge_schema` call in `parse_schema_tables`. In Raft
mode this was the only call to `merge_schema` outside "group 0 code" and
in fact it is unsafe -- it uses locally generated mutations with locally
generated timestamp (`api::new_timestamp()`), so if we actually did it,
we would permanently diverge the group 0 state machine across nodes
(the schema pulling code is disabled in Raft mode). Fortunately, this
should be dead code by now, as explained in the previous paragraph.
The migration code is now turned into a sanity check, if the users
try something crazy, they will get an error instead of silent data
corruption.
This query bypasses the usual read-path in storage-proxy and therefore
also misses the erm pinning done by storage-proxy. To avoid a vnode
being pulled from under its feet, do the erm pinning in the statement
itself.
Currently we hold group0_guard only during DDL statement's execute()
function, but unfortunately some statements access underlying schema
state also during check_access() and validate() calls which are called
by the query_processor before it calls execute. We need to cover those
calls with group0_guard as well and also move retry loop up. This patch
does it by introducing new function to cql_statement class take_guard().
Schema altering statements return group0 guard while others do not
return any guard. Query processor takes this guard at the beginning of a
statement execution and retries if service::group0_concurrent_modification
is thrown. The guard is passed to the execute in query_state structure.
Fixes: #13942
Message-ID: <ZNsynXayKim2XAFr@scylladb.com>
This reverts commit 70b5360a73. It generates
a failure in group0_test .test_concurrent_group0_modifications in debug
mode with about 4% probability.
Fixes#15050
Currently we hold group0_guard only during DDL statement's execute()
function, but unfortunately some statements access underlying schema
state also during check_access() and validate() calls which are called
by the query_processor before it calls execute. We need to cover those
calls with group0_guard as well and also move retry loop up. This patch
does it by introducing new function to cql_statement class take_guard().
Schema altering statements return group0 guard while others do not
return any guard. Query processor takes this guard at the beginning of a
statement execution and retries if service::group0_concurrent_modification
is thrown. The guard is passed to the execute in query_state structure.
Fixes: #13942
Message-ID: <ZNSWF/cHuvcd+g1t@scylladb.com>
While in SQL DISTINCT applies to the result set, in CQL it applies
to the table being selected, and doesn't allow GROUP BY with clustering
keys. So reject the combination like Cassandra does.
While this is not an important issue to fix, it blocks un-xfailing
other issues, so I'm clearing it ahead of fixing those issues.
An issue is unmarked as xfail, and other xfails lose this issue
as a blocker.
Fixes#12479Closes#14970
Not wired in yet. SELECT * FROM MUTATION_FRAGMENTS($table) is a new
select statement sub-type, which allows dumping the underling mutations
making up the data of a given table. The output of this statement is
mutation-fragments presented as CQL rows. Each row corresponds to a
mutation-fragment. Subsequently, the output of this statement has a
schema that is different than that of the underlying table.
Data is always read from the local replica, on which the query is
executed. Migrating queries between coordinators is not allowed.
SELECT JSON uses selector_factories to obtain the names of the
fields to insert into the json object, and we want to drop
selector_factories entirely. Switch instead to the ":metadata" mode
of printing expressions, which does what we want.
Unfortunately, the switch changes how system functions are converted
into field names. A function such as unixtimestampof() is now rendered
as "system.unixtimestampof()"; before it did not have the keyspace
prefix.
This is a compatiblity problem, albeit an obscure one. Since the new
behavior matches Cassandra, and the odds of hitting this are very low,
I think we can allow the change.
In three cases we need to consult a column that's possibly not explicitly
selected:
- for the WHERE clause
- for GROUP BY
- for ORDER BY
The return value of the function is the index where the newly-added
column can be found. Currently, the index is correct for both
the internal column vector and the result set, but soon in won't
be.
In the first two cases (WHERE clause and ORDER BY), we're interested
in the column before grouping, in the last case (ORDER BY) we're interested
in the column after grouping, so we need to distinguish between the two.
Since we already have selection::index_of() that returns the pre-grouping
index, choose the post-grouping index for the return value of
selection::add_column_for_post_processing(), and change the GROUP BY
code to use index_of(). Comments are added.
GROUP BY is typically used with aggregation. In one case the aggregation
is implicit:
SELECT a, b, c
FROM tab
GROUP BY x, y, z
One row will appear from each group, even though no aggregation
was specified. To avoid this irregularity, rewrite this query as
SELECT first(a), first(b), first(c)
FROM tab
GROUP BY x, y, z
This allows us to have different paths for aggregations and
non-aggregations, without worrying about this special case.
Avoid mixed aggregate/non-aggregate queries by inserting
calls to the first() function. This allows us to avoid internal
state (simple_selector::_current) and make selector evaluation
stateless apart from explicit temporaries.
A query of the form `SELECT foo, count(foo) FROM tab` returns the first
value of the foo column along with the count. This can't be parallized
today since the first selector isn't an aggregate.
We plan to rewrite the query internally as `SELECT first(foo), count(foo)
FROM tab`, in order to make the query more regular (no mixing of aggregates
and non-aggregates). However, this will defeat the current check since
after the rewrite, all selectors are aggregates.
Prepare for this by performing the check on a pre-rewrite variable, so
it won't be affected by the query rewrite in the next patch.
Note that although even though we could add support for running
first() in parallel, it's not possible to get the correct results,
since first() is not commutative and we don't reduce in order. It's
also not a particularly interesting query.
Change one more layer of processing to work on prepared
rather than raw selectors. This moves the call to prepare
the selectors early in select_statement processing. In turn
this changes maybe_jsonize_select_clause() and forward_service's
mock_selection() to work in the prepared realm as well.
This moves us one step closer to using evaluate() to process
the select clause, as the prepared selectors are now available
in select_statement. We can't use them yet since we can't evaluate
aggregations.
Currently, each selector expression is individually prepared, then converted
into a selector object that is later executed. This is done (on a vector
of raw selectors) by cql3::selection::raw_selector::to_selectables().
Split that into two phases. The first phase converts raw_selector into
a new struct prepared_selector (a better name would be plain 'selector',
but it's taken for now). The second phase continues the process and
converts prepared_selector into selectables.
This gives us a full view of the prepared expressions while we're
preparing the select clause of the select statement.
The series contains mostly cleanups for query processor and no functional
change. The last patch is a small cleanup for the storage_proxy.
* 'qp-cleanup' of https://github.com/gleb-cloudius/scylla:
storage_proxy: remove unused variable
client_state: co-routinise has_column_family_access function
query_processor: get rid of internal_state and create individual query_satate for each request
cql3: move validation::validate_column_family from client_state::has_column_family_access
client_state: drop unneeded argument from has.*access functions
cql3: move check for dropping cdc tables from auth to the drop statement code itself
query_processor: co-routinise execute_prepared_without_checking_exception_message function
query_processor: co-routinize execute_direct_without_checking_exception_message function
cql3: remove empty statement::validate functions
cql3: remove empty function validate_cluster_support
cql3/statements: fix indentation and spurious white spaces
query_processor: move statement::validate call into execute_with_params function
query_processor: co-routinise execute_with_params function
query_processor: execute statement::validate before each execution of internal query instead of only during prepare
query_processor: get rid of shared internal_query_state
query_processor: co-routinize execute_paged_internal function
query_processor: co_routinize execute_batch_without_checking_exception_message function
query_processor: co-routinize process_authorized_statement function
Adding a function declaration to expression.hh causes many
recompilations. Reduce that by:
- moving some restrictions-related definitions to
the existing expr/restrictions.hh
- moving evaluation related names to a new header
expr/evaluate.hh
- move utilities to a new header
expr/expr-utilities.hh
expression.hh contains only expression definitions and the most
basic and common helpers, like printing.
Checking keyspace/table presence should not be part of authorization code
and it is not done consistently today. For instance keyspace presence
is not checked in "alter keyspace" during authorization, but during
statement execution. Make it consistent.