and set crs and rts only in the block where they are used,
so we can get rid of reversed_range_tombstones.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Rather than reversing the schema on every call
just keep the potentially reversed schema in cookie.
Othwerwise, cookie.schema was write only.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
In commit 7eda6b1e90, we increased the
request_timeout parameter used by cql-pytest tests from the default of
10 seconds to 120 seconds. 10 seconds was usually more than enough for
finishing any Scylla request, but it turned out that in some extreme
cases of a debug build running on an extremely over-committed machine,
the default timeout was not enough.
Recently, in issue #11289 we saw additional cases of timeouts which
the request_timeout setting did *not* solve. It turns out that the Python
CQL driver has two additional timeout settings - connect_timeout and
control_connection_timeout, which default to 5 seconds and 2 seconds
respectively. I believe that most of the timeouts in issue #11289
come from the control_connection_timeout setting - by changing it
to a tiny number (e.g., 0.0001) I got the same error messages as those
reported in #11289. The default of that timeout - 2 seconds - is
certainly low enough to be reached on an extremely over-committed
machine.
So this patch significantly increases both connect_timeout and
control_connection_timeout to 60 seconds. We don't care that this timeout
is ridiculously large - under normal operations it will never be reached.
There is no code which loops for this amount of time, for example.
Refs #11289 (perhaps even Fixes, we'll need to see that the test errors
go away).
NOTE: This patch only changes test/cql-pytest/util.py, which is only
used by the cql-pytest test suite. We have multiple other test suites which
copied this code, and those test suites might need fixing separately.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes#11295
Right now, if there's a node for which we don't know the features
supported by this node (they are neither persisted locally, nor gossiped
by that node), we would skip this node in calculating the set
of enabled features and potentially enable a feature which shouldn't be
enabled - because that node may not know it. We should only enable a
feature when we know that all nodes have upgraded and know the feature.
This bug caused us problems when we tried to move RAFT out of
experimental. There are dtests such as `partitioner_tests.py` in which
nodes would enable features prematurely, which caused the Raft upgrade
procedure to break (the procedure starts only when all nodes upgrade
and announce that they know the SUPPORTS_RAFT cluster feature).
Closes#11225
This pull request introduces global secondary-indexing for non-frozen collections.
The intent is to enable such queries:
```
CREATE TABLE test(int id, somemap map<int, int>, somelist<int>, someset<int>, PRIMARY KEY(id));
CREATE INDEX ON test(keys(somemap));
CREATE INDEX ON test(values(somemap));
CREATE INDEX ON test(entries(somemap));
CREATE INDEX ON test(values(somelist));
CREATE INDEX ON test(values(someset));
-- index on test(c) is the same as index on (values(c))
CREATE INDEX IF NOT EXISTS ON test(somelist);
CREATE INDEX IF NOT EXISTS ON test(someset);
CREATE INDEX IF NOT EXISTS ON test(somemap);
SELECT * FROM test WHERE someset CONTAINS 7;
SELECT * FROM test WHERE somelist CONTAINS 7;
SELECT * FROM test WHERE somemap CONTAINS KEY 7;
SELECT * FROM test WHERE somemap CONTAINS 7;
SELECT * FROM test WHERE somemap[7] = 7;
```
We use here all-familiar materialized views (MVs). Scylla treats all the
collections the same way - they're a list of pairs (key, value). In case
of sets, the value type is dummy one. In case of lists, the key type is
TIMEUUID. When describing the design, I will forget that there is more
than one collection type. Suppose that the columns in the base table
were as follows:
```
pkey int, ckey1 int, ckey2 int, somemap map<int, text>, PRIMARY KEY(pkey, ckey1, ckey2)
```
The MV schema is as follows (the names of columns which are not the same
as in base might be different). All the columns here form the primary
key.
```
-- for index over entries
indexed_coll (int, text), idx_token long, pkey int, ckey1 int, ckey2 int
-- for index over keys
indexed_coll int, idx_token long, pkey int, ckey1 int, ckey2 int
-- for index over values
indexed_coll text, idx_token long, pkey int, ckey1 int, ckey2 int, coll_keys_for_values_index int
```
The reason for the last additional column is that the values from a collection might not be unique.
Fixes#2962Fixes#8745Fixes#10707
This patch does not implement **local** secondary indexes for collection columns: Refs #10713.
Closes#10841
* github.com:scylladb/scylladb:
test/cql-pytest: un-xfail yet another passing collection-indexing test
secondary index: fix paging in map value indexing
test/cql-pytest: test for paging with collection values index
cql, view: rename and explain bytes_with_action
cql, index: make collection indexing a cluster feature
test/cql-pytest: failing tests for oversized key values in MV and SI
cql: fix secondary index "target" when column name has special characters
cql, index: improve error messages
cql, index: fix default index name for collection index
test/cql-pytest: un-xfail several collecting indexing tests
test/cql-pytest/test_secondary_index: verify that local index on collection fails.
docs/design-notes/secondary_index: add `VALUES` to index target list
test/cql-pytest/test_secondary_index: add randomized test for indexes on collections
cql-pytest/cassandra_tests/.../secondary_index_test: fix error message in test ported from Cassandra
cql-pytest/cassandra_tests/.../secondary_index_on_map_entries,select_test: test ported from Cassandra is expected to fail, since Scylla assumes that comparison with null doesn't throw error, just evaluates to false. Since it's not a bug, but expected behavior from the perspective of Scylla, we don't mark it as xfail.
test/boost/secondary_index_test: update for non-frozen indexes on collections
test/cql-pytest: Uncomment collection indexes tests that should be working now
cql, index: don't use IS NOT NULL on collection column
cql3/statements/select_statement: for index on values of collection, don't emit duplicate rows
cql/expr/expression, index/secondary_index_manager: needs_filtering and index_supports_expression rewrite to accomodate for indexes over collections
cql3, index: Use entries() indexes on collections for queries
cql3, index: Use keys() and values() indexes on collections for queries.
types/tuple: Use std::begin() instead of .begin() in tuple_type_impl::build_value_fragmented
cql3/statements/index_target: throw exception to signalize that we didn't miss returning from function
db/view/view.cc: compute view_updates for views over collections
view info: has_computed_column_depending_on_base_non_primary_key
column_computation: depends_on_non_primary_key_column
schema, index/secondary_index_manager: make schema for index-induced mv
index/secondary_index_manager: extract keys, values, entries types from collection
cql3/statements/: validate CREATE INDEX for index over a collection
cql3/statements/create_index_statement,index_target: rewrite index target for collection
column_computation.hh, schema.cc: collection_column_computation
column_computation.hh, schema.cc: compute_value interface refactor
Cql.g, treewide: support cql syntax `INDEX ON table(VALUES(collection))`
Commit 23acc2e848 broke the "--ssl" option of test/cql-pytest/run
(which makes Scylla - and cqlpytest - use SSL-encrypted CQL).
The problem was that there was a confusion between the "ssl" module
(Python's SSL support) and a new "ssl" variable. A rename and a missing
"import" solves the breakage.
We never noticed this because Jenkins does *not* run cql-pytest/run
with --ssl (actually, it no longer runs cql-pytest/run at all).
It is still a useful option for checking SSL-related problems in Scylla
and Seastar.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes#11292
Test schema changes when there was an underlying topology change.
- per test case checks of cluster health and cycling
- helper class to do cluster manager API requests
- tests can perform topology changes: stop/start/restart servers
- modified clusters are marked dirty and discarded after the test case
- cql connection is updated per topology change and per cluster change
Closes#11266
* github.com:scylladb/scylladb:
test.py: test topology and schema changes
test.py: ClusterManager API mark cluster dirty
test.py: call before/after_test for each test case
test.py: handle driver connection in ManagerClient
test.py: ClusterManager API and ManagerClient
test.py: improve topology docstring
Currently, when detaching the table from the database, we force-evict all queriers for said table. This series broadens the scope of this force-evict to include all inactive reads registered at the semaphore. This ensures that any regular inactive read "forgotten" for any reason in the semaphore, will not end up in said readers accessing a dangling table reference when destroyed later.
Fixes: https://github.com/scylladb/scylladb/issues/11264Closes#11273
* github.com:scylladb/scylladb:
querier: querier_cache: remove now unused evict_all_for_table()
database: detach_column_family(): use reader_concurrency_semaphore::evict_inactive_reads_for_table()
reader_concurrency_semaphore: add evict_inactive_reads_for_table()
It should have had one, derived instances are stored and destroyed via
the base-class. The only reason this haven't caused bugs yet is that
derived instances happen to not have any non-trivial members yet.
Closes#11293
A mixed bag of improvements developed as part of another PR (https://github.com/scylladb/scylladb/pull/10736). Said PR was closed so I'm submitting these improvements separately.
Closes#11294
* github.com:scylladb/scylladb:
test/lib: move convenience table config factory to sstable_test_env
test/lib/sstable_test_env: move members to impl struct
test/lib/sstable_utils: use test_env::do_with_async()
Instead of querier_cache::evict_all_for_table(). The new method cover
all queriers and in addition any other inactive reads registered on the
semaphore. In theory by the time we detach a table, no regular inactive
reads should be in the semaphore anymore, but if there is any still, we
better evict them before the table is destroyed, they might attempt to
access it in when destroyed later.
All users of `column_family_test_config()`, get the semaphore parameter
for it from `sstable_test_env`. It is clear that the latter serves as
the storage space for stable objects required by the table config. This
patch just enshrines this fact by moving the config factory method to
`sstable_test_env`, so it can just get what it needs from members.
All present members of sstable_test_env are std::unique_ptr<>:s because
they require stable addresses. This makes their handling somewhat
awkward. Move all of them into an internal `struct impl` and make that
member a unique ptr.
Fixes#11184Fixes#11237
In prev (broken) fix for https://github.com/scylladb/scylladb/issues/11184 we added the footprint for left-over
files (replay candidates) to disk footprint on commitlog init.
This effectively prevents us from creating segments iff we have tight limits. Since we nowadays do quite a bit of inserts _before_ commitlog replay (system.local, but...) we can end up in a situation where we deadlock start because we cannot get to the actual replay that will eventually free things.
Another, not thought through, consequence is that we add a single footprint to _all_ commitlog shard instances - even though only shard 0 will get to actually replay + delete (i.e. drop footprint).
So shards 1-X would all be either locked out or performance degraded.
Simplest fix is to add the footprint in delete call instead. This will lock out segment creation until delete call is done, but this is fast. Also ensures that only replay shard is involved.
To further emphasize this, don't store segments found on init scan in all shard instances,
instead retrieve (based on low time-pos for current gen) when required. This changes very little, but we at last don't store
pointless string lists in shards 1 to X, and also we can potentially ask for the list twice.
More to the point, goes better hand-in-hand with the semantics of "delete_segments", where any file sent in is
considered candidate for recycling, and included in footprint.
Closes#11251
* github.com:scylladb/scylladb:
commitlog: Make get_segments_to_replay on-demand
commitlog: Revert/modify fac2bc4 - do footprint add in delete
Fix https://github.com/scylladb/scylladb/issues/11197
This PR adds a new page where specifying workload attributes with service levels is described and adds it to the menu.
Also, I had to fix some links because of the warnings.
Closes#11209
* github.com:scylladb/scylladb:
doc: remove the reduntant space from index
doc: update the syntax for defining service level attributes
doc: rewording
doc: update the links to fix the warnings
doc: add the new page to the toctree
doc: add the descrption of specifying workload attributes with service levels
doc: add the definition of workloads to the glossary
In preparation for effective_replication_map hygiene, convert
some counter functions to coroutines to simplify the changes.
Closes#11291
* github.com:scylladb/scylladb:
storage_proxy: mutate_counters_on_leader: coroutinize
storage_proxy: mutate_counters: coroutinize
storage_proxy: mutate_counters: reorganize error handling
Simplify ahead of refactoring for consistent effective_replication_map.
This is probably a pessimization of the error case, but the error case
will be terrible in any case unless we resultify it.
Move the error handling function where it's used so the code
is more straightforward.
Due to some std::move()s later, we must still capture the schema early.
Move the termination condition to the front of the loop so it's
clear why we're looping and when we stop.
It's less than perfectly clean since we widen the scope of some variables
(from loop-internal to loop-carried), but IMO it's clearer.
It's much easier to maintain this way. Since it uses ranges_to_vnodes,
it interacts with topology and needs integration into
effective_replication_map management.
The patch leaves bad indentation and an infinite-looking loop in
the interest of minimization, but that will be corrected later.
Note, the test for `!r.has_value()` was eliminated since it was
short-circuited by the test for `!rqr.has_value()` returning from
the coroutine rather than propagating an error.
We use result_wrap() in two places, but that makes coroutinizing the
containing function a little harder, since it's composed of more lambdas.
Remove the wrappers, gaining a bit of performance in the error case.
The function `check_exists` checks whether a given table exists, giving
an error otherwise. It previously used `on_internal_error`.
`check_exists` is used in some old functions that insert CDC metadata to
CDC tables. These tables are no longer used in newer Scylla versions
(they were replaced with other tables with different schema), and this
function is no longer called. The table definitions were removed and
these tables are no longer created. They will only exists in clusters
that were upgraded from old versions of Scylla (4.3) through a sequence
of upgrades.
If you tried to upgrade from a very old version of Scylla which had
neither the old or the new tables to a modern version, say from 4.2 to
5.0, you would get `on_internal_error` from this `check_exists`
function. Fortunately:
1. we don't support such upgrade paths
2. `on_internal_error` in production clusters does not crash the system,
only throws. The exception would be catched, printed, and the system
would run (just without CDC - until you finished upgrade and called
the propoer nodetool command to fix the CDC module).
Unfortunately, there is a dtest (`partitioner_tests.py`) which performs
an unsupported upgrade scenario - it starts Scylla from Cassandra (!)
work directories, which is like upgrading from a very old version of
Scylla.
This dtest was not failing due to another bug which masked the problem.
When we try to fix the bug - see #11225 - the dtest starts hitting the
assertion in `check_exists`. Because it's a test, we configure
`on_internal_error` to crash the system.
The point of this commit is to not crash the system in this rare
scenario which happens only in some weird tests. We now throw
`std::runtime_error` instead of calling `on_internal_error`. In the
dtest, we already ignore the resulting CDC error appearing in the logs
(see scylladb/scylla-dtest#2804). Together with this change, we'll be
able to fix the #11225 bug and pass this test.
Closes#11287
After collection indexing has been implemented, yet another test which
failed because of #2962 now passes. So remove the "xfail" marker.
Refs #2962
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
When indexing a map column's *values*, if the same value appears more
than once, the same row will appear in the index more than once. We had
code that removed these duplicates, but this deduplication did not work
across page boundaries. We had two xfailing tests to demonstrate this bug.
In this patch we fix this bug by looking at the page's start and not
generating the same row again, thereby getting the same deduplication
we had inside pages - now across pages.
The previously-xfailing tests now pass, and their xfail tag is removed.
I also added another test, for the case where the base table has only
partition keys without clustering keys. This second test is important
because the code path for the partition-key-only case is different,
and the second test exposed a bug in it as well (which is also fixed
in this patch).
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
If a map has several keys with the same value, then the "values(m)" index
must remember all of them as matching the same row - because later we may
remove one of these keys from the map but the row would still need to
match the value because of the remaining keys.
We already had a test (test_index_map_values) that although the same row
appears more than once for this value, when we search for this value the
result only returns the row once. Under the hood, Scylla does find the
same value multiple times, but then eliminates the duplicate matched raw
and returns it only once.
But there is a complication, that this de-duplication does not easily
span *paging*. So in this patch we add a test that checks that paging
does not cause the same row to be returned more than once.
Unfortunately, this test currently fails on Scylla so marked "xfail".
It passes on Cassandra.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
The structure "bytes_with_action" was very hard to understand because of
its mysterious and general-sounding name, and no comments.
In this patch I add a large comment explaining its purpose, and rename
it to a more suitable name, view_key_and_action, which suggests that
each such object is about one view key (where to add a view row), and
an additional "action" that we need to take beyond adding the view row.
This is the best I can do to make this code easier to understand without
completely reorganizing it.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Prevent a user from creating a secondary index on a collection column if
the cluster has any nodes which don't support this feature. Such nodes
will not be able to correctly handle requests related to this index,
so better not allow creating one.
Attempting to create an index on a collection before the entire cluster
supports this feature will result in the error:
Indexing of collection columns not supported by some older nodes
in this cluster. Please upgrade them.
Tested by manually disabling this feature in feature_service.cc and
seeing this error message during collection indexing test.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
In issue #9013, we noticed that if a value larger than 64 KB is indexed,
the write fails in a bad way, and we fixed it. But the test we wrote
when fixing that issue already suggested that something was still wrong:
Cassandra failed the write cleanly, with an InvalidRequest, while Scylla
failed with a mysterious WriteFailure (with a relevant error message
only in the log).
This patch adds several xfailing tests which demonstrate what's still
wrong. This is also summarized in issue #8627:
1. A write of an oversized value to an indexed column returns the wrong
error message.
2. The same problem also exists when indexing a collection, and the indexed
key or value is oversized.
3. The situation is even less pleasant when adding an index to a table
with pre-existing data and an oversized value. In this case, the
view building will fail on the bad row, and never finish.
4. We have exactly the same bugs not just with indexes but also with
materialized views. Interestingly, Cassandra has similar bugs in
materialized views as well (but not in the secondary index case,
where Cassandra does behave as expected).
Refs #8627.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Unfortunately, we encode the "target" of a secondary index in one of
three ways:
1. It can be just a column name
2. It can be a string like keys(colname) - for the new type of
collection indexes introduced in this series.
3. It can be a JSON map ({ ... }). This form is used for local indexes.
The code parsing this target - target_parser::parse() - needs not to
confuse these different formats. Before this patch, if the column name
contains special characters like braces or parentheses (this is allowed
in CQL syntax, via quoting), we can confuse case 1, 2, and 3: A column
named "keys(colname)" will be confused for case 2, and a column named
"{123}" will be confused with case 3.
This problem can break indexing of some specially-crafted column names -
as reproduced by test_secondary_index.py::test_index_quoted_names.
The solution adopted in this patch is that the column name in case 1
should be escaped somehow so it cannot be possibly confused with either
cases 2 and 3. The way we chose is to convert the column name to CQL (with
column_definition::as_cql_name()). In other words, if the column name
contains non-alphanumeric characters, it is wrapped in quotes and also
quotes are doubled, as in CQL. The result of this can't be confused
with case 2 or 3, neither of which may begin with a quote.
This escaping is not the minimal we could have done, but incidentally it
is exactly what Cassandra does as well, so I used it as well.
This change is *mostly* backward compatible: Already-existing indexes will
still have unescaped column names stored for their "target" string,
and the unescaping code will see they are not wrapped in quotes, and
not change them. Backward compatibility will only fail on existing indexes
on columns whose name begin and end in the quote characters - but this
case is extremely unlikely.
This patch illustrates how un-ideal our index "target" encoding is,
but isn't what made it un-ideal. We should not have used three different
formats for the index target - the third representation (JSON) should
have sufficed. However, two two other representations are identical
to Cassandra's, so using them when we can has its compatibility
advantages.
The patch makes test_secondary_index.py::test_index_quoted_names pass.
Fixes#10707.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Before this patch, trying to create an index on entries(x) where x is
not a map results in an error message:
Cannot create index on index_keys_and_values of column x
The string "index_keys_and_values" is strange - Cassandra prints the
easier to understand string "entries()" - which better corresponds to
what the user actually did.
It turns out that this string "index_keys_and_values" comes from an
elaborate set of variables and functions spanning multiple source files,
used to convert our internal target_type variable into such a string.
But although this code was called "index_option" and sounded very
important, it was actually used just for one thing - error messages!
So in this patch we drop the entire "index_option" abstraction,
replacing it by a static trivial function defined exactly where
it's used (create_index_statement.cc), which prints a target type.
While at it, we print "entries()" instead of "index_keys_and_values" ;-)
After this patch, the
test_secondary_index.py::test_index_collection_wrong_type
finally passes (the previous patch fixed the default table names it
assumes, and this patch fixes the expected error messages), so its
"xfail" tag is removed.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
When creating an index "CREATE INDEX ON tbl(keys(m))", the default name
of the index should be tbl_m_idx - with just "m". The current code
incorrectly used the default name tbl_m_keys_idx, so this patch adds
a test (which passes on Cassandra, and after this patch also on Scylla)
and fixes the default name.
It turns out that the default index name was based on a mysterious
index_target::as_string(), which printed the target "keys(m)" as
"m_keys" without explaining why it was so. This method was actually
used only in three places, and all of them wanted just the column
name, without the "_keys" suffix! So in this patch we rename the
mysterious as_string() to column_name(), and use this function instead.
Now that the default index name uses column_name() and gets just
column_name(), the correct default index name is generated, and the
test passes.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
After the previous patches implemented collection indexing, several
tests in test/cql-pytest/test_secondary_index.py that were marked
with "xfail" started to pass - so here we remove the xfail.
Only three collection indexing tests continue to xfail:
test_secondary_index.py::test_index_collection_wrong_type
test_secondary_index.py::test_index_quoted_names (#10707)
test_secondary_index.py::test_local_secondary_index_on_collection (#10713)
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
collection fails.
Collection indexing is being tracked by #2962. Global secondary index
over collection is enabled by #10123. Leave this test to track this
behaviour.
Related issue: #10713
ported from Cassandra is expected to fail, since Scylla assumes that
comparison with null doesn't throw error, just evaluates to false. Since
it's not a bug, but expected behavior from the perspective of Scylla, we
don't mark it as xfail.
When the secondary-index code builds a materialized view on column x, it
adds "x IS NOT NULL" to the where-clause of the view, as required.
However, when we index a collection column, we index individual pieces
of the collection (keys, values), the the entire collection, so checking
if the entire collection is null does not make sense. Moreover, for a
collection column x, "x IS NOT NULL" currently doesn't work and throws
errors when evaluating that expression when data is written to the table.
The solution used in this patch is to simply avoid adding the "x IS NOT
NULL" when creating the materialized view for a collection index.
Everything works just fine without it.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>