Commit Graph

1284 Commits

Author SHA1 Message Date
Nadav Har'El
8ece63c433 Merge 'Safemode - Introduce TimeWindowCompactionStrategy Guardrails'
This series introduces two configurable options when working with TWCS tables:

- `restrict_twcs_default_ttl` - a LiveUpdate-able tri_mode_restriction which defaults to WARN and will notify the user whenever a TWCS table is created without a `default_time_to_live` setting
- `twcs_max_window_count` - Which forbids the user from creating TWCS tables whose window count (buckets) are past a certain threshold. We default to 50, which should be enough for most use cases, and a setting of 0 effectively disables the check.

Refs: #6923
Fixes: #9029

Closes #11445

* github.com:scylladb/scylladb:
  tests: cql_query_test: add mixed tests for verifying TWCS guard rails
  tests: cql_query_test: add test for TWCS window size
  tests: cql_query_test: add test for TWCS tables with no TTL defined
  cql: add configurable restriction of default_time_to_live when for TimeWindowCompactionStrategy tables
  cql: add max window restriction for TimeWindowCompactionStrategy
  time_window_compaction_strategy: reject invalid window_sizes
  cql3 - create/alter_table_statement: Make check_restricted_table_properties accept a schema_ptr
2022-09-12 23:55:51 +03:00
Felipe Mendes
7fec4fcaa6 cql: add configurable restriction of default_time_to_live when for TimeWindowCompactionStrategy tables
TimeWindowCompactionStrategy (TWCS) tables are known for being used explicitly for time-series workloads. In particular, most of the time users should specify a default_time_to_live during table creation to ensure data is expired such as in a sliding window. Failure to do so may create unbounded windows - which - depending on the compaction window chosen, may introduce severe latency and operational problems, due to unbounded window growth.

However, there may be some use cases which explicitly ingest data by using the `USING TTL` keyword, which effectively has the same effect. Therefore, we can not simply forbid table creations without a default_time_to_live explicitly set to any value other than 0.

The new restrict_twcs_without_default_ttl option has three values: "true", "false", and "warn":

We default to "warn", which will notify the user of the consequences when creating a TWCS table without a default_time_to_live value set. However, users are encouraged to switch it to "true", as - ideally - a default_time_to_live value should always be expected to prevent applications failing to ingest data against the database ommitting the `USING TTL` keyword.
2022-09-11 16:50:42 -03:00
Felipe Mendes
a3356e866b cql: add max window restriction for TimeWindowCompactionStrategy
The number of potential compaction windows (or buckets) is defined by the default_time_to_live / sstable_window_size ratio. Every now and then we end up in a situation on where users of TWCS end up underestimating their window buckets when using TWCS. Unfortunately, scenarios on which one employs a default_time_to_live setting of 1 year but a window size of 30 minutes are not rare enough.

Such configuration is known to only make harm to a workload: As more and more windows are created, the number of SSTables will grow in the same pace, and the situation will only get worse as the number of shards increase.

This commit introduces the twcs_max_window_count option, which defaults to 50, and will forbid the Creation or Alter of tables which get past this threshold. A value of 0 will explicitly skip this check.

Note: this option does not forbid the creation of tables with a default_time_to_live=0 as - even though not recommended - it is perfectly possible for a TWCS table with default TTL=0 to have a bound window, provided any ingestion statements make use of 'USING TTL' within the CQL statement, in addition to it.
2022-09-11 16:50:22 -03:00
Mikołaj Grzebieluch
803115d061 raft: broadcast_tables: add returning query result
Intermediate language added new layer of abstraction between cql
statement and quering mutations, thus this commit adds new layer of
abstraction between mutations and returning query result.

Result can't be directly returned from `group0_state_machine::apply`, so
we decided to hold query results in map inside `raft_group0_client`. It can
be safely read after `add_entry_unguarded`, because this method waits
for applying raft command. After translating result to `result_message`
or in case of exception, map entry is erased.
2022-09-08 15:25:36 +02:00
Mikołaj Grzebieluch
db88525774 raft: broadcast_tables: add execution of intermediate language
Extended `group0_command` to enable transmission of `raft::broadcast_tables::query`.
Added `add_entry_unguarded` method in `raft_group0_client` for dispatching raft
commands without `group0_guard`.
Queries on group0_kv_store are executed in `group_0_state_machine::apply`,
but for now don't return results. They don't use previous state id, so they will
block concurrent schema changes, but these changes won't block queries.

In this version snapshots are ignored.
2022-09-08 15:25:36 +02:00
Mikołaj Grzebieluch
82df8a9905 raft: broadcast_tables: add compilation of cql to intermediate language
We decided to extend `cql_statement` hierarchy with `strongly_consistent_modification_statement`
and `strongly_consistent_select_statement`. Statements operating on
system.broadcast_kv_store will be compiled to these new subclasses if
BROADCAST_TABLES flag is enabled.

If the query is executed on a shard other than 0 it's bounced to that shard.
2022-09-08 15:25:36 +02:00
Felipe Mendes
7ccf8ed221 cql3 - create/alter_table_statement: Make check_restricted_table_properties accept a schema_ptr
As check_restricted_table_properties() is invoked both within CREATE TABLE and ALTER TABLE CQL statements,
we currently have no way to determine whether the operation was either a CREATE or ALTER. In many situations,
it is important to be able to distinguish among both operations, such as - for example - whether a table already has
a particular property set or if we are defining it within the statement.

This patch simply adds a std::optional<schema_ptr> to check_restricted_table_properties() and updates its caller.

Whenever a CREATE TABLE statement is issued, the method is called as a std::nullopt, whereas if an ALTER TABLE is
issued instead, we call it with a schema_ptr.
2022-09-07 21:27:32 -03:00
Piotr Sarna
cf30d4cbcf Merge 'Secondary index of collection columns' from Nadav Har'El
This pull request introduces global secondary-indexing for non-frozen collections.

The intent is to enable such queries:

```
CREATE TABLE test(int id, somemap map<int, int>, somelist<int>, someset<int>, PRIMARY KEY(id));
CREATE INDEX ON test(keys(somemap));
CREATE INDEX ON test(values(somemap));
CREATE INDEX ON test(entries(somemap));
CREATE INDEX ON test(values(somelist));
CREATE INDEX ON test(values(someset));

-- index on test(c) is the same as index on (values(c))
CREATE INDEX IF NOT EXISTS ON test(somelist);
CREATE INDEX IF NOT EXISTS ON test(someset);
CREATE INDEX IF NOT EXISTS ON test(somemap);

SELECT * FROM test WHERE someset CONTAINS 7;
SELECT * FROM test WHERE somelist CONTAINS 7;
SELECT * FROM test WHERE somemap CONTAINS KEY 7;
SELECT * FROM test WHERE somemap CONTAINS 7;
SELECT * FROM test WHERE somemap[7] = 7;
```

We use here all-familiar materialized views (MVs). Scylla treats all the
collections the same way - they're a list of pairs (key, value). In case
of sets, the value type is dummy one. In case of lists, the key type is
TIMEUUID. When describing the design, I will forget that there is more
than one collection type.  Suppose that the columns in the base table
were as follows:

```
pkey int, ckey1 int, ckey2 int, somemap map<int, text>, PRIMARY KEY(pkey, ckey1, ckey2)
```

The MV schema is as follows (the names of columns which are not the same
as in base might be different). All the columns here form the primary
key.

```
-- for index over entries
indexed_coll (int, text), idx_token long, pkey int, ckey1 int, ckey2 int
-- for index over keys
indexed_coll int, idx_token long, pkey int, ckey1 int, ckey2 int
-- for index over values
indexed_coll text, idx_token long, pkey int, ckey1 int, ckey2 int, coll_keys_for_values_index int
```

The reason for the last additional column is that the values from a collection might not be unique.

Fixes #2962
Fixes #8745
Fixes #10707

This patch does not implement **local** secondary indexes for collection columns: Refs #10713.

Closes #10841

* github.com:scylladb/scylladb:
  test/cql-pytest: un-xfail yet another passing collection-indexing test
  secondary index: fix paging in map value indexing
  test/cql-pytest: test for paging with collection values index
  cql, view: rename and explain bytes_with_action
  cql, index: make collection indexing a cluster feature
  test/cql-pytest: failing tests for oversized key values in MV and SI
  cql: fix secondary index "target" when column name has special characters
  cql, index: improve error messages
  cql, index: fix default index name for collection index
  test/cql-pytest: un-xfail several collecting indexing tests
  test/cql-pytest/test_secondary_index: verify that local index on collection fails.
  docs/design-notes/secondary_index: add `VALUES` to index target list
  test/cql-pytest/test_secondary_index: add randomized test for indexes on collections
  cql-pytest/cassandra_tests/.../secondary_index_test: fix error message in test ported from Cassandra
  cql-pytest/cassandra_tests/.../secondary_index_on_map_entries,select_test: test ported from Cassandra is expected to fail, since Scylla assumes that comparison with null doesn't throw error, just evaluates to false. Since it's not a bug, but expected behavior from the perspective of Scylla, we don't mark it as xfail.
  test/boost/secondary_index_test: update for non-frozen indexes on collections
  test/cql-pytest: Uncomment collection indexes tests that should be working now
  cql, index: don't use IS NOT NULL on collection column
  cql3/statements/select_statement: for index on values of collection, don't emit duplicate rows
  cql/expr/expression, index/secondary_index_manager: needs_filtering and index_supports_expression rewrite to accomodate for indexes over collections
  cql3, index: Use entries() indexes on collections for queries
  cql3, index: Use keys() and values() indexes on collections for queries.
  types/tuple: Use std::begin() instead of .begin() in tuple_type_impl::build_value_fragmented
  cql3/statements/index_target: throw exception to signalize that we didn't miss returning from function
  db/view/view.cc: compute view_updates for views over collections
  view info: has_computed_column_depending_on_base_non_primary_key
  column_computation: depends_on_non_primary_key_column
  schema, index/secondary_index_manager: make schema for index-induced mv
  index/secondary_index_manager: extract keys, values, entries types from collection
  cql3/statements/: validate CREATE INDEX for index over a collection
  cql3/statements/create_index_statement,index_target: rewrite index target for collection
  column_computation.hh, schema.cc: collection_column_computation
  column_computation.hh, schema.cc: compute_value interface refactor
  Cql.g, treewide: support cql syntax `INDEX ON table(VALUES(collection))`
2022-08-16 14:18:51 +02:00
Avi Kivity
fbaa280acd cql3: select_statement: improve loop termination condition in indexed_table_select_statement::do_execute_base_query()
Move the termination condition to the front of the loop so it's
clear why we're looping and when we stop.

It's less than perfectly clean since we widen the scope of some variables
(from loop-internal to loop-carried), but IMO it's clearer.
2022-08-14 15:40:45 +03:00
Avi Kivity
60c7c11c96 cql3: select_statement: reindent indexed_table_select_statement::do_execute_base_query()
Reindent after coroutinization. No functional changes.
2022-08-14 15:35:36 +03:00
Avi Kivity
492dc6879e cql3: select_statement: coroutinize indexed_table_select_statement::do_execute_base_query()
It's much easier to maintain this way. Since it uses ranges_to_vnodes,
it interacts with topology and needs integration into
effective_replication_map management.

The patch leaves bad indentation and an infinite-looking loop in
the interest of minimization, but that will be corrected later.

Note, the test for `!r.has_value()` was eliminated since it was
short-circuited by the test for `!rqr.has_value()` returning from
the coroutine rather than propagating an error.
2022-08-14 15:31:45 +03:00
Avi Kivity
973034978c cql3: select_statement: de-result_wrap indexed_table_select_statement::do_execute_base_query()
We use result_wrap() in two places, but that makes coroutinizing the
containing function a little harder, since it's composed of more lambdas.

Remove the wrappers, gaining a bit of performance in the error case.
2022-08-14 15:22:18 +03:00
Nadav Har'El
f6f18b187a secondary index: fix paging in map value indexing
When indexing a map column's *values*, if the same value appears more
than once, the same row will appear in the index more than once. We had
code that removed these duplicates, but this deduplication did not work
across page boundaries. We had two xfailing tests to demonstrate this bug.

In this patch we fix this bug by looking at the page's start and not
generating the same row again, thereby getting the same deduplication
we had inside pages - now across pages.

The previously-xfailing tests now pass, and their xfail tag is removed.
I also added another test, for the case where the base table has only
partition keys without clustering keys. This second test is important
because the code path for the partition-key-only case is different,
and the second test exposed a bug in it as well (which is also fixed
in this patch).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2022-08-14 10:29:52 +03:00
Nadav Har'El
8b00c91c13 cql, index: make collection indexing a cluster feature
Prevent a user from creating a secondary index on a collection column if
the cluster has any nodes which don't support this feature. Such nodes
will not be able to correctly handle requests related to this index,
so better not allow creating one.

Attempting to create an index on a collection before the entire cluster
supports this feature will result in the error:

    Indexing of collection columns not supported by some older nodes
    in this cluster. Please upgrade them.

Tested by manually disabling this feature in feature_service.cc and
seeing this error message during collection indexing test.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2022-08-14 10:29:52 +03:00
Nadav Har'El
2c244c6e09 cql: fix secondary index "target" when column name has special characters
Unfortunately, we encode the "target" of a secondary index in one of
three ways:

1. It can be just a column name
2. It can be a string like keys(colname) - for the new type of
   collection indexes introduced in this series.
3. It can be a JSON map ({ ... }). This form is used for local indexes.

The code parsing this target - target_parser::parse() - needs not to
confuse these different formats. Before this patch, if the column name
contains special characters like braces or parentheses (this is allowed
in CQL syntax, via quoting), we can confuse case 1, 2, and 3: A column
named "keys(colname)" will be confused for case 2, and a column named
"{123}" will be confused with case 3.

This problem can break indexing of some specially-crafted column names -
as reproduced by test_secondary_index.py::test_index_quoted_names.

The solution adopted in this patch is that the column name in case 1
should be escaped somehow so it cannot be possibly confused with either
cases 2 and 3. The way we chose is to convert the column name to CQL (with
column_definition::as_cql_name()). In other words, if the column name
contains non-alphanumeric characters, it is wrapped in quotes and also
quotes are doubled, as in CQL. The result of this can't be confused
with case 2 or 3, neither of which may begin with a quote.

This escaping is not the minimal we could have done, but incidentally it
is exactly what Cassandra does as well, so I used it as well.

This change is *mostly* backward compatible: Already-existing indexes will
still have unescaped column names stored for their "target" string,
and the unescaping code will see they are not wrapped in quotes, and
not change them. Backward compatibility will only fail on existing indexes
on columns whose name begin and end in the quote characters - but this
case is extremely unlikely.

This patch illustrates how un-ideal our index "target" encoding is,
but isn't what made it un-ideal. We should not have used three different
formats for the index target - the third representation (JSON) should
have sufficed. However, two two other representations are identical
to Cassandra's, so using them when we can has its compatibility
advantages.

The patch makes test_secondary_index.py::test_index_quoted_names pass.

Fixes #10707.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2022-08-14 10:29:52 +03:00
Nadav Har'El
56204a3794 cql, index: improve error messages
Before this patch, trying to create an index on entries(x) where x is
not a map results in an error message:

  Cannot create index on index_keys_and_values of column x

The string "index_keys_and_values" is strange - Cassandra prints the
easier to understand string "entries()" - which better corresponds to
what the user actually did.

It turns out that this string "index_keys_and_values" comes from an
elaborate set of variables and functions spanning multiple source files,
used to convert our internal target_type variable into such a string.
But although this code was called "index_option" and sounded very
important, it was actually used just for one thing - error messages!

So in this patch we drop the entire "index_option" abstraction,
replacing it by a static trivial function defined exactly where
it's used (create_index_statement.cc), which prints a target type.
While at it, we print "entries()" instead of "index_keys_and_values" ;-)

After this patch, the
test_secondary_index.py::test_index_collection_wrong_type

finally passes (the previous patch fixed the default table names it
assumes, and this patch fixes the expected error messages), so its
"xfail" tag is removed.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2022-08-14 10:29:52 +03:00
Nadav Har'El
84461f1827 cql, index: fix default index name for collection index
When creating an index "CREATE INDEX ON tbl(keys(m))", the default name
of the index should be tbl_m_idx - with just "m". The current code
incorrectly used the default name tbl_m_keys_idx, so this patch adds
a test (which passes on Cassandra, and after this patch also on Scylla)
and fixes the default name.

It turns out that the default index name was based on a mysterious
index_target::as_string(), which printed the target "keys(m)" as
"m_keys" without explaining why it was so. This method was actually
used only in three places, and all of them wanted just the column
name, without the "_keys" suffix! So in this patch we rename the
mysterious as_string() to column_name(), and use this function instead.

Now that the default index name uses column_name() and gets just
column_name(), the correct default index name is generated, and the
test passes.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2022-08-14 10:29:52 +03:00
Nadav Har'El
67990d2170 cql, index: don't use IS NOT NULL on collection column
When the secondary-index code builds a materialized view on column x, it
adds "x IS NOT NULL" to the where-clause of the view, as required.

However, when we index a collection column, we index individual pieces
of the collection (keys, values), the the entire collection, so checking
if the entire collection is null does not make sense. Moreover, for a
collection column x, "x IS NOT NULL" currently doesn't work and throws
errors when evaluating that expression when data is written to the table.

The solution used in this patch is to simply avoid adding the "x IS NOT
NULL" when creating the materialized view for a collection index.
Everything works just fine without it.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2022-08-14 10:29:52 +03:00
Michał Radwański
bd44bc3e35 cql3/statements/select_statement: for index on values of collection,
don't emit duplicate rows

The index on collection values is special in a way, as its' clustering key contains not only the
base primary key, but also a column that holds the keys of the cells in the collection, which
allows to distinguish cells with different keys but the same value.
This has an unwanted consequence, that it's possible to receive two identical base table primary
keys from indexed_table_select_statement::find_index_clustering_rows. Thankfully, the duplicate
primary keys are guaranteed to occur consequently.
2022-08-14 10:29:52 +03:00
Karol Baryła
7966841d37 cql3, index: Use keys() and values() indexes on collections for queries.
Previous commits added the possibility of creating GSI on non-frozen collections.
This (and next) commit allow those indexes to actually be used by queries.
This commit enables both keys() and values() indexes, as they are pretty similar.
2022-08-14 10:29:52 +03:00
Michał Radwański
e6521ff8ba cql3/statements/index_target: throw exception to signalize that we
didn't miss returning from function

GCC doesn't consider switches over enums to be exhaustive. Replace
bogous return value after a switch where each of the cases return, with
an exception.
2022-08-14 10:29:52 +03:00
Michał Radwański
cbe33f8d7a cql3/statements/: validate CREATE INDEX for index over a collection
Allow CQL like this:
CREATE INDEX idx ON table(some_map);
CREATE INDEX idx ON table(KEYS(some_map));
CREATE INDEX idx ON table(VALUES(some_map));
CREATE INDEX idx ON table(ENTRIES(some_map));
CREATE INDEX idx ON table(some_set);
CREATE INDEX idx ON table(VALUES(some_set));
CREATE INDEX idx ON table(some_list);
CREATE INDEX idx ON table(VALUES(some_list));

This is needed to support creating indexes on collections.
2022-08-14 10:29:13 +03:00
Michał Radwański
997682ed72 cql3/statements/create_index_statement,index_target: rewrite index target for collection
The syntax used for creating indexes on collections that is present in
Cassandra is unintuitive from the internal representation point of view.

For instance, index on VALUES(some_set) indexes the set elements, which
in the internal representation are keys of collection. Rewrite the index
target after receiving it, so that the index targets are consistent with
the representation.
2022-08-14 10:29:13 +03:00
Michał Radwański
2babee2cdc column_computation.hh, schema.cc: compute_value interface refactor
The compute_value function of column_computation has had previously the
following signature:
    virtual bytes_opt compute_value(const schema& schema, const partition_key& key, const clustering_row& row) const override;

This is superfluous, since never in the history of Scylla, the last
parameter (row) was used in any implentation, and never did it happen
that it returned bytes_opt. The absurdity of this interface can be seen
especially when looking at call sites like following, where dummy empty
row was created:
```
    token_column.get_computation().compute_value(
            *_schema, pkv_linearized, clustering_row(clustering_key_prefix::make_empty()));
```
2022-08-14 10:29:13 +03:00
Michał Radwański
166afd46b5 Cql.g, treewide: support cql syntax INDEX ON table(VALUES(collection))
Brings support of cql syntax `INDEX ON table(VALUES(collection))`, even
though there is still no support for indexes over collections.
Previously, index_target::target_type::values was refering to values of
a regular (non-collection) column. Rename it to `regular_values`.

Fixes #8745.
2022-08-14 10:29:13 +03:00
Piotr Sarna
1ab4c6aab3 Merge 'cql3: enable collections as UDA accumulators' from Wojciech Mitros
Currently, the initial values of UDA accumulators are converted
to strings using the to_string() method and from strings using the
from_string() method. The from_string() method is not implemented
for collections, and it can't be implemented without changing the
string format, because in that format, we cannot differentiate
whether a separator is a part of a value or is an actual separator
between values. In particular, the separators are not escaped
in the collection values.

Instead of from_string()/to_string() the cql parser is used
for creating a value from a string (the same , and to_parsable_string()
is used to converting a value into a string.

A test using a list as an accumulator is added to
cql-pytest/test_uda.py.

Signed-off-by: Wojciech Mitros <wojciech.mitros@scylladb.com>

Closes #11250

* github.com:scylladb/scylladb:
  cql3: enable collections as UDA accumulators
  cql3: extend implementation of to_bytes for raw_value
2022-08-12 12:51:17 +02:00
Wojciech Mitros
48bd752971 cql3: enable collections as UDA accumulators
Currently, the initial values of UDA accumulators are converted
to strings using the to_string() method and from strings using the
from_string() method. The from_string() method is not implemented
for collections, and it can't be implemented without changing the
string format, because in that format, we cannot differentiate
whether a separator is a part of a value or is an actual separator
between values. In particular, the separators are not escaped
in the collection values. For example, a list with string elements:
'a, b', 'c' would be represented as a string 'a, b, c', while now
it is represented as "['a, b', 'c']".
Some types that were parsable are now represented in a different
way. For example, a tuple ('a', null, 0) was represented as
"a:\@:0", and now it is "('a', null, 0)".

Instead of from_string()/to_string() the cql parser is used
for creating a value from a string (the same , and to_parsable_string()
is used to converting a value into a string.

A test using a list as an accumulator is added to
cql-pytest/test_uda.py.

Signed-off-by: Wojciech Mitros <wojciech.mitros@scylladb.com>
2022-08-11 16:23:57 +02:00
Botond Dénes
d1d53f1b84 query: add tombstone-limit to read-command
Propagate the tombstone-limit from coordinator to replicas, to make sure
all is using the same limit.
2022-08-10 06:01:47 +03:00
Benny Halevy
c71ef330b2 query-request, everywhere: define and use query_id as a strong type
Define query_id as a tagged_uuid
So it can be differentiated from other uuid-class types.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-08 08:13:28 +03:00
Benny Halevy
257d74bb34 schema, everywhere: define and use table_id as a strong type
Define table_id as a distinct utils::tagged_uuid modeled after raft
tagged_id, so it can be differentiated from other uuid-class types,
in particular from table_schema_version.

Fixes #11207

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-08 08:09:41 +03:00
Avi Kivity
268e4abe77 Merge 'wasm: reuse instances for wasm UDFs' from Wojciech Mitros
Calling WebAssembly UDFs requires wasmtime instance. Creating such an instance is expensive,
but these instances can be reused for subsequent calls of the same UDF on various inputs.

This patch introduces a way of reusing wasmtime instances: a wasm instance cache.
The cache stores a wasmtime instance for each UDF and scheduling group. The instances are
evicted using LRU strategy and their size is based on the size of their wasm memories.

The instances stored in the cache are also dropped when the UDF is dropped itself. For that reason,
the first patch modifies the current implementation of UDF dropping, so that the instance dropping may be added
later. The patch also removes the need of compiling the UDF again when dropping it.

The second patch contains the implementation and use of the new cache. The cache is implemented
in `lang/wasm_instance_cache.hh` and the main ways of using it are the `run_script` methods from `wasm.hh`

The third patch adds tests to `test_wasm.py` that check the correctness and performance of the new
cache. The tests confirm the instance reuse, size limits, instance eviction after timeout and after dropping the UDF.

Closes #10306

* github.com:scylladb/scylladb:
  wasm: test instances reuse
  wasm: reuse UDF instances
  schema_tables: simplify merge_functions and avoid extra compilation
2022-08-02 13:51:16 +03:00
Nadav Har'El
cb8a67dc98 Merge 'Allow materialized views to by synchronous' from Piotr Sarna
This pull request introduces a "synchronous mode" for global views. In this mode, all view updates are applied synchronously as if the view was local.

Marking view as a synchronous one can be done using `CREATE MATERIALIZED VIEW` and `ALTER MATERIALIZED VIEW`. E.g.:
```cql
ALTER MATERIALIZED VIEW ks.v WITH synchronous_updates = true;
```

Marking view as a synchronous one was done using tags (originally used by alternator). No big modifications in the view's code were needed.

Fixes: https://github.com/scylladb/scylla/issues/10545

Closes #11013

* github.com:scylladb/scylla:
  cql-pytest: extend synchronous mv test with new cases
  cql-pytest: allow extra parameters in new_materialized_view
  docs: add a paragraph on view synchronous updates
  test/boost/cql_query_test: add test setting synchronous updates property
  test: cql-pytest: add a test for synchronous mode materialized views
  db: view: react to synchronous updates tag
  cql3: statements: cf_prop_defs: apply synchronous updates tag
  alternator, db: move the tag code to db/tags
  cql3: statements: add a synchronous_updates property
2022-07-26 15:42:51 +03:00
Michał Sala
128806f022 cql3: statements: cf_prop_defs: apply synchronous updates tag
This commit defines a new tag key (SYNCHRONOUS_VIEW_UPDATES_TAG_KEY) to
be used for marking "synchronous mode" views. This key is used in
`cf_prop_defs::apply_to_builder` if the properties contain
KW_SYNCHRONOUS_UPDATES.
2022-07-25 09:53:33 +02:00
Michał Sala
494e7fc5f5 cql3: statements: add a synchronous_updates property
This property can be used with CREATE MATERIALIZED VIEW and ALTER
MATERIALIZED VIEW statements. Setting it allows global views to enter
"synchronous mode". In this mode, all view updates are also applied
synchronously as if the view was local. This may reduce their
availability, but has the benefit of propagating a potential
inconsistency risk (in form of a write error) to the user, who can
respond to it appropriately (e.g. retry the write or fix the view
later).
2022-07-25 09:53:33 +02:00
Avi Kivity
9823e75d16 cql3: grammar: make where clause return an expression
In preparation of the relaxation of the grammar to return any expression,
change the whereClause production to return an expression rather than
terms. Note that the expression is still constrained to be a conjunction
of relations, and our filtering code isn't prepared for more.

Before the patch, if the WHERE clause was optional, the grammar would
pass an empty vector of expressions (which is exactly correct). After
the patch, it would pass a default-constructed expression. Now that
happens to be an empty conjunction, which is exactly what's needed, but
it is too accidental, so the patch changes optional WHERE clauses to
explicitly generate an empty conjunction if the WHERE clause wasn't
specified.
2022-07-22 20:14:48 +03:00
Avi Kivity
fd663bcb94 cql3: util: change where clause utilities to accept a single expression rather than a vector of terms
Conversion to terms happens internally via boolean_factors().
2022-07-22 20:14:48 +03:00
Avi Kivity
a5dd588465 cql3: statement_restrictions: accept a single expression rather than a vector
Move closer to the goal of accepting a generic expression for WHERE
clause by accepting a generic expression in statement_restrictions. The
various callers will synthesize it from a vector of terms.
2022-07-22 20:14:48 +03:00
Avi Kivity
4aa0a03b7e cql3: select_statement: remove wrong but harmless std::move() in prepare_restrictions
std::move(_where_clause) is wrong, because _where_clause is used later
(when analyzing GROUP BY), but also harmless (because the
statement_restrictions constructor accepts it by const reference).

To avoid confusion in the next patch where we'll pass _where_clause
to a different function, remove the bad std::move() in advance here.
2022-07-22 20:14:48 +03:00
Wojciech Mitros
9281ba3919 wasm: reuse UDF instances
When executing a wasm UDF, most of the time is spent on
setting up the instance. To minimize its cost, we reuse
the instance using wasm::instance_cache.

This patch adds a wasm instance cache, that stores
a wasmtime instance for each UDF and scheduling group.
The instances are evicted using LRU strategy. The
cache may store some entries for the UDF after evicting
the instance, but they are evicted when the corresponding
UDF is dropped, which greatly limits their number.

The size of stored instances is estimated using the size
of their WASM memories. In order to be able to read the
size of memory, we require that the memory is exported
by the client.

Signed-off-by: Wojciech Mitros <wojciech.mitros@scylladb.com>
2022-07-20 18:19:22 +02:00
Avi Kivity
13a64d8ab2 Merge 'Remove all remaining restrictions classes' from Jan Ciołek
This PR removes all code that used classes `restriction`, `restrictions` and their children.

There were two fields in `statement_restrictions` that needed to be dealt with: `_clustering_columns_restrictions` and `_nonprimary_key_restrictions`.

Each function was reimplemented to operate on the new expression representaiion and eventually these fields weren't needed anymore.

After that the restriction classes weren't used anymore and could be deleted as well.

Now all of the code responsible for analyzing WHERE clause and planning a query works on expressions.

Closes #11069

* github.com:scylladb/scylla:
  cql3: Remove all remaining restrictions code
  cql3: Move a function from restrictions class to the test
  cql3: Remove initial_key_restrictions
  cql3: expr: Remove convert_to_restriction
  cql3: Remove _new from _new_nonprimary_key_restrictions
  cql3: Remove _nonprimary_key_restrictions field
  cql3: Reimplement uses of _nonprimary_key_restrictions using expression
  cql3: Keep a map of single column nonprimary key restrictions
  cql3: Remove _new from _new_clustering_columns_restrictions
  cql3: Remove _clustering_columns_restrictions from statement_restrictions
  cql3: Use a variable instead of dynamic cast
  cql3: Use the new map of single column clustering restrictions
  cql3: Keep a map of single column clustering key restrictions
  cql3: Return an expression in get_clustering_columns_restrctions()
  cql3: Reimplement _clustering_columns_restrictions->has_supporting_index()
  cql3: Don't create single element conjunction
  cql3: Add expr::index_supports_some_column
  cql3: Reimplement has_unrestricted_components()
  cql3: Reimplement _clustering_columns_restrictions->need_filtering()
  cql3: Reimplement num_prefix_columns_that_need_not_be_filtered
  cql3: Use the new clustering restrictions field instead of ->expression
  cql3: Reimplement _clustering_columns_restrictions->size() using expressions
  cql3: Reimplement _clustering_columns_restrictions->get_column_defs() using expressions
  cql3: Reimplement _clustering_columns_restrictions->is_all_eq() using expressions
  cql3: expr: Add has_only_eq_binops function
  cql3: Reimplement _clustering_columns_restrictions->empty() using expressions
2022-07-20 18:01:15 +03:00
Botond Dénes
014c5b56a3 query-result: move last_pos up to query::result
query_result was the wrong place to put last position into. It is only
included in data-responses, but not on digest-responses. If we want to
support empty pages from replicas, both data and digest responses have
to include the last position. So hoist up the last position to the
parent structure: query::result. This is a breaking change inter-node
ABI wise, but it is fine: the current code wasn't released yet.

Closes #11072
2022-07-20 13:28:09 +03:00
Jan Ciolek
599bcd6ea7 cql3: Remove all remaining restrictions code
The classes restriction, restrictions and its children
aren't used anywhere now and can be safely removed.

Some includes need to be modified for the code to compile.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2022-07-20 09:10:31 +02:00
Jan Ciolek
9d1ba07471 cql3: Reimplement uses of _nonprimary_key_restrictions using expression
All parts of the code that use _nonprimary_key_restrictions
are changed to use _new_nonprimary_key_restrictions instead.
I decided not to split this into multiple commits,
as there isn't a lot of changes and they are
analogous to the ones done before for partition
and clustering columns.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2022-07-20 09:10:30 +02:00
Jan Ciolek
2b7ffd57fb cql3: Return an expression in get_clustering_columns_restrctions()
get_clustering_columns_restrctions() used to return
a shared pointer to the clustering_restrictions class.

Now everything is being converted to expression,
so it should return an expression as well.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2022-07-19 16:02:01 +02:00
Jadw1
182438c5f8 forward_service: enable multiple selection
Enables parallelization of query like `SELECT MIN(x), MAX(x)`.
Compatibility is ensured under the same cluster feature as
UDA and native aggregates parallelization. (UDA_NATIVE_PARALLELIZED_AGGREGATION)
2022-07-18 15:25:41 +02:00
Jadw1
29a0be75da forward_service: support UDA and native aggregate parallelization
Enables parallelization of UDA and native aggregates. The way the
query is parallelized is the same as in #9209. Separate reduction
type for `COUNT(*)` is left for compatibility reason.
2022-07-18 15:25:41 +02:00
Jadw1
6d977fcf88 cql3: selection: detect parallelize reduction type
Detects type of reduction if it is possible. Separate case for
`COUNT(*)` is left for compatibility reason. By now only single
selection is supported.
2022-07-18 15:25:41 +02:00
Jadw1
59498caeca db,cql3: Move part of cql3's function into db
Moving `function`, `function_name` and `aggregate_function` into
db namespace to avoid including cql3 namespace into query-request.
For now, only minimal subset of cql3 function was moved to db.
2022-07-18 15:25:41 +02:00
Jadw1
d8f3461147 CQL3: Add reduce function to UDA
Add optional field to UDA, that describes reduce function to allow
parallelization of UDA aggregates.
2022-07-18 14:18:48 +02:00
Benny Halevy
acae3cc223 treewide: stop use of deprecated coroutine::make_exception
Convert most use sites from `co_return coroutine::make_exception`
to `co_await coroutine::return_exception{,_ptr}` where possible.

In cases this is done in a catch clause, convert to
`co_return coroutine::exception`, generating an exception_ptr
if needed.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes #10972
2022-07-07 15:02:16 +03:00