`int_range::make_singular()` accepts a single `int` as its parameter,
so there is no need to brace the paramter with `{}`. this helps to silence
the warning from Clang, like:
```
/home/kefu/dev/scylladb/test/perf/perf_fast_forward.cc:1396:63: error: braces around scalar initializer [-Werror,-Wbraced-scalar-init]
check_no_disk_reads(test(int_range::make_singular({100}))),
^~~~~
```
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes#12903
Task ttl can be set with task manager test api, which is disabled
in release mode.
Move get_and_update_ttl from task manager test api to task manager
api, so that it can be used in release mode.
Closes#12894
The former is a convenience wrapper over the latter. There's no real benefit in using it, but having two test_env-s is worse than just one.
Closes#12794
* github.com:scylladb/scylladb:
sstable_utils: Move the test_setup to perf/
sstable_utils: Remove unused wrappers over test_env
sstable_test: Open-code do_with_cloned_tmp_directory
sstable_test: Asynchronize statistics_rewrite case
tests: Replace test_setup::do_with_tmp_directory with test_env::do_with(_async)?
This patch adds a reproducer for the bug described in issue #7964 -
The restriction `where k=1 and c=2` (when k,c are the key columns)
returns (at most) a single row so doesn't need ALLOW FILTERING,
but if we add a third restriction, say `v=2`, this still processes
at most a single row so doesn't need ALLOW FILTERING - and both
Scylla and Cassandra get it wrong - so it's marked with both xfail
and cassandra_bug.
The patch also adds another test that for longer partition slices,
e.g., `where k=1 and c>2`, although the slice itself doesn't need
filtering, if we add `v=2` here we suddenly do need ALLOW FILTERING,
because the slice itself may be a large number of rows, and adding
`v=2` may restrict it to just a few results. This test passes
on both Scylla and Cassandra.
Issue #7964 mentioned these scenarios and even had some example code,
but we never added it to the test suite, so we finally do it now.
Refs #7964
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes#12850
Tests of each module that is integrated with task manager use
calls to task manager api. Boilerplate to call, check status, and
get result may be reduced using functions.
task_manager_utils.py contains wrappers for task manager api
calls and helpers that may be reused by different tests.
Closes#12844
* github.com:scylladb/scylladb:
test: use functions from task_manager_utils.py in test_task_manager.py
test: add task_manager_utils.py
as an abstract base class `output_writer` is inherited by both
`json_output_writer` and `text_output_writer`. and `output_manager`
manages the lifecycles of used writers using
`std::unique_ptr<output_writer>`.
before this change, the dtor of `output_writer` is not marked as
virtual, so when its dtor is invoked, what gets called is the base
class's dtor. but the dtor of `json_output_writer` is non-trivial
in the sense that this class is aggregated by a bunch of member
variables. if we don't invoke its dtor when destroying this object,
leakage is expected.
so, in this change, the dtor of `output_writer` is marked as virtual,
this makes all of its derived classes' dtor virtual. and the right
dtor is always called.
test/perf is only designed for testing, and not used in production,
also, this feature was recently integrated into scylla executable in
228ccdc1c7.
so there is no need to backport this change.
change should also silence the warning from Clang 17:
```
/home/kefu/.local/bin/../lib/gcc/x86_64-pc-linux-gnu/13.0.1/../../../../include/c++/13.0.1/bits/unique_ptr.h:100:2: error: delete called on 'output_writer' that is abstract but has non-virtual destructor [-Werror,-Wdelete-abstract-non-virtual-dtor]
delete __ptr;
^
/home/kefu/.local/bin/../lib/gcc/x86_64-pc-linux-gnu/13.0.1/../../../../include/c++/13.0.1/bits/unique_ptr.h:405:4: note: in instantiation of member function 'std::default_delete<output_writer>::operator()' requested here
get_deleter()(std::move(__ptr));
^
/home/kefu/.local/bin/../lib/gcc/x86_64-pc-linux-gnu/13.0.1/../../../../include/c++/13.0.1/bits/stl_construct.h:88:15: note: in instantiation of member function 'std::unique_ptr<output_writer>::~unique_ptr' requested here
__location->~_Tp();
^
```
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes#12888
This patch adds a cql-pytest test for an old secondary-index bug
that was described three years ago in issue #5823. cql-pytest makes
it easy to run the same test against different versions of Scylla,
and it was used to check that the bug existed in Scylla 2.3.0 but
was gone by 2.3.5, and also not present in master or in 2021.1.
A bit about the bug itself:
A secondary index is useful for equality restrictions (a=2) but can't be
used for inequality restrictions (a>=2). In Scylla 3.2.0 we used to have a
bug that because the restriction a>=2 couldn't be used through the index,
it was ignored completely. This is of course a mistake.
Refs #5823
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes#12856
Since 5c0f9a8180 ("mutation_partition: Switch cache of
rows onto B-tree") it's no longer in use, except in some
performance test, so remove it.
Although scylla-gdb.py is sometimes used with older releases,
it's so outdated we can remove it from there too.
Closes#12868
This patch adds a reproducer to a static-column index lookup bug
described in issue #12829: The restriction `where pk=0 and s=1 and c=3`
where pk,c are the primary key and s is an indexed static column,
results in an internal error: "clustering column id 2 >= 2".
Unfortunately, because on_internal_error() crashes Scylla in debug
mode, we need to mark this failing test with skip instead of xfail.
Refs #12829
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes#12852
Schema related files are moved there. This excludes schema files that
also interact with mutations, because the mutation module depends on
the schema. Those files will have to go into a separate module.
Closes#12858
In issue #12828 it was noted that Scylla requires ALLOW FILTERING
for `where b=1 and c=1` where b is an indexed static column and
c is a clustering key, and it was suggested that this is a bug.
This patch adds a test that confirms that both Scylla and Cassandra
require ALLOW FILTERING in this case. We explain in a comment that
this requirement is expected (i.e., it's not a bug), as the `b=1`
may match a huge number of rows, and the `c=1` may further match just
a few of those - i.e., it is filtering.
This test is virtually identical to the test we already had for
`where a=1 and c=1` - when `a` is an indexed regular column.
There too, the ALLOW FILTERING is required.
Closes#12828 as "not a bug".
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes#12848
Task manager api will be used in many tests. Thus, to make it easier
api calls to task manager are wrapped into functions in task_manager_utils.py.
Some helpers that may be reused in other tests are moved there too.
Move mutation-related files to a new mutation/ directory. The names
are kept in the global namespace to reduce churn; the names are
unambiguous in any case.
mutation_reader remains in the readers/ module.
mutation_partition_v2.cc was missing from CMakeLists.txt; it's added in this
patch.
This is a step forward towards librarization or modularization of the
source base.
Closes#12788
Patch 55a8421e3d fixed an inefficiency when rebuilding
statistics with many compaction groups, but it incorrectly removed
the update for newly added SSTables. This patch restores it.
When a new SSTable is added to any of the groups, the stats are
incrementally updated (as before). On compaction completion,
statistics are still rebuilt by simply iterating through each
group, which keeps track of its own stats.
Unit tests are added to guarantee the stats are correct both after
compaction completion and memtable flush.
Fixes#12808.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closes#12834
The test test_scan.py::test_scan_long_partition_tombstone_string
checks that a full-table Scan operation ends a page in the middle of
a very long string of partition tombstones, and does NOT scan the
entire table in one page (if we did that, getting a single page could
take an unbounded amount of time).
The test is currently flaky, having failed in CI runs three times in
the past two months.
The reason for the flakiness is that we don't know exactly how long
we need to make the sequence of partition tombstones in the test before
we can be absolutely sure a single page will not read this entire sequence.
For single-partition scans we have the "query_tombstone_page_limit"
configuration parameter, which tells us exactly how long we need to
make the sequence of row tombstones. But for a full-table scan of
partition tombstones, the situation is more complicated - because the
scan is done in parallel on several vnodes in parallel and each of
them needs to read query_tombstone_page_limit before it stops.
In my experiments, using query_tombstone_limit * 4 consecutive tombstones
was always enough - I ran this test hundreds of times and it didn't fail
once. But since it did fail on Jenkins very rarely (3 times in the last
two months), maybe the multiplier 4 isn't enough. So this patch doubles
it to 8. Hopefully this would be enough for anyone (TM).
This makes this test even bigger and slower than it was. To make it
faster, I changed this test's write isolation mode from the default
always_use_lwt to forbid_rmw (not use LWT). This leaves the test's
total run time to be similar to what it was before this patch - around
0.5 seconds in dev build mode on my laptop.
Fixes#12817
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes#12819
Currently, because serialize_visitor::operator() is not implemented for counters, we cannot convert a counter returned by a WASM UDF to bytes when returning from wasm::run_script().
We could disallow using counters as WASM UDF return types, but an easier solution which we're already using in Lua UDFs is treating the returned counters as 64-bit integers when deserializing. This patch implements the latter approach and adds a test for it.
Closes#12806
* github.com:scylladb/scylladb:
wasm udf: deserialize counters as integers
test_wasm.py: add utility function for reading WASM UDF saved in files
This patch is based on #12681, only last 3 commits are relevant.
As described in #12709, currently, when a UDF used in a UDA is replaced, the UDA is not updated until the whole node is restarted.
This patch fixes the issue by updating all affected UDAs when a UDF is replaced.
Additionally, it includes a few convenience changes
Closes#12710
* github.com:scylladb/scylladb:
uda: change the UDF used in a UDA if it's replaced
functions: add helper same_signature method
uda: return aggregate functions as shared pointers
LWT `IF` (column_condition) duplicates the expression prepare and evaluation code. Annoyingly,
LWT IF semantics are a little different than the rest of CQL: a NULL equals NULL, whereas usually
NULL = NULL evaluates to NULL.
This series converts `IF` prepare and evaluate to use the standard expression code. We employ
expression rewriting to adjust for the slightly different semantics.
In a few places, we adjust LWT semantics to harmonize them with the rest of CQL. These are pointed
out in their own separate patches so the changes don't get lost in the flood.
Closes#12356
* github.com:scylladb/scylladb:
cql3: lwt: move IF clause expression construction to grammar
cql3: column_condition: evaluate column_condition as a single expression
cql3: lwt: allow negative list indexes in IF clause
cql3: lwt: do not short-circuit col[NULL] in IF clause
cql3: column_condition: convert _column to an expression
cql3: expr: generalize evaluation of subscript expressions
cql3: expr: introduce adjust_for_collection_as_maps()
cql3: update_parameters: use evaluation_inputs compatible row prefetch
cql3: expr: protect extract_column_value() from partial clustering keys
cql3: expr: extract extract_column_value() from evaluation machinery
cql3: selection: introduce selection_from_partition_slice
cql3: expr: move check for ordering on duration types from restrictions to prepare
cql3: expr: remove restrictions oper_is_slice() in favor of expr::is_slice()
cql3: column_condition: optimize LIKE with constant pattern after preparing
cql3: expr: add optimizer for LIKE with constant pattern
test: lib: add helper to evaluate an expression with bind variables but no table
cql3: column_condition: make the left-hand-side part of column_condition::raw
cql3: lwt: relax constraints on map subscripts and LIKE patterns
cql3: expr: fix search_and_replace() for subscripts
cql3: expr: fix function evaluation with NULL inputs
cql3: expr: add LWT IF clause variants of binary operators
cql3: expr: change evaluate_binop_sides to return more NULL information
In issue #12601, a dtest involving paging of ListStreams showed
incorrect results - the paged results had one duplicate stream and one
missing stream. We believe that the cause of this bug was that the
unsorted map of tables can change order between pages. In this patch
we add a test test_list_streams_paged_with_new_table which can
demonstrate this bug - by adding a lot of tables in mid-paging, we
cause the unsorted map to be reshufled and the paging to break.
This is not the same situation as in #12601 (which did not involve
new tables) but we believe it demonstrates the same bug - and check
its fix. Indeed this passes with the fix in pull request #12614 and
fails without it.
This patch also adds a second test, test_stream_arn_unchanging:
That test eliminates a guess we had for the cause of #12601. We
thought that maybe stream ARN changing on a table if its schema
version changes, but the new test confirms that it actually behaves
as expected (the stream ARN doesn't change).
Refs #12601
Refs #12614
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes#12616
This patch adds yet another reproducer for issue #10649, where a
the combination of filtering and LIMIT returns fewer results when
a secondary index is added to the table.
Whereas the previous tests we had for this issue involved a regular
(global) index, the new test uses a local index (a Scylla-only feature).
It shows that the same bug exists also for local indexes, as noticed
by a user in #12766.
Refs #10649
Refs #12766
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes#12783
Recently we enabled RBNO by default in all topology operations. This
made the operations a bit slower (repair-based topology ops are a bit
slower than classic streaming - they do more work), and in debug mode
with large number of concurrent tests running, they might timeout.
The timeout for bootstrap was already increased before, do the same for
decommission/removenode. The previously used timeout was 300 seconds
(this is the default used by aiohttp library when it makes HTTP
requests), now use the TOPOLOGY_TIMEOUT constant from ScyllaServer which
is 1000 seconds.
Closes#12765
* github.com:scylladb/scylladb:
test/pylib: use larger timeout for decommission/removenode
test/pylib: scylla_cluster: rename START_TIMEOUT to TOPOLOGY_TIMEOUT
Issue #7659, which we solved long ago, was about a query which included
a non-EQ restriction and wrongly picked up one of the indexes. It had
a short C++ regression test, but here we add a more elaborate Python
test for the same bug. The advantages of the Python test are:
1. The Python test can be run against any version of Scylla (e.g., to
whether a certain version contains a backport of the fix).
2. The Python test reproduces not only a "benign" query error, but also
an assertion-failed crash which happened when the non-EQ restriction
was an "IN".
3. The Python test reproduces the same bug not just for a regular
index, but also a local index.
I checked that, as expected, these tests pass on master, but fail
(and crash Scylla) in old branches before the fix for #7659.
Refs #7659.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes#12797
It's only used by the single test and apparently exists since the times
seastar was missing the future::discard_result() sugar
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes#12803
Currently, because serialize_visitor::operator() is not implemented
for counters, we cannot convert a counter returned by a WASM UDF
to bytes when returning from wasm::run_script().
We could disallow using counters as WASM UDF return types, but an
easier solution which we're already using in Lua UDFs is treating
the returned counters as 64-bit integers when deserializing. This
patch implements the latter approach and adds a test for it.
Currently, we're repeating the same os.path, open, read, replace
each time we read a WASM UDF from a file.
To reduce code bloat, this patch adds a utility function
"read_function_from_file" that finds the file and reads it given
a function name and an optional new name, for cases when we want
to use a different name in cql (mostly for unique_names).
Move long running topology tests out of `test_topology.py` and into their own files, so they can be run in parallel.
While there, merge simple schema tests.
Closes#12804
* github.com:scylladb/scylladb:
test/topology: rename topology test file
test/topology: lint and type for topology tests
test/topology: move topology ip tests to own file
test/topology: move topology test remove garbaje...
test/topology: move topology rejoin test to own file
test/topology: merge topology schema tests and...
test/topology: isolate topology smp params test
test/topology: move topology helpers to common file
Instead of the grammar passing expression bits to column_condition,
have the grammar construct an unprepared expression and pass it as
a whole. column_condition::raw then uses prepare_expression() to
prepare it.
The call to validate_operation_on_durations() is eliminated, since it's
already done be prepare_expression().
Some tests adjusted for slightly different wording.
LWT IF clause errors out on negative list index. This deviates
from non-LWT subscript evaluation, PostgresQL, and too-large index,
all of which evaluate the subscript operation to NULL.
Make things more consistent by also evaluating list[-1] to NULL.
A test is adjusted.
Currently if an LWT IF clause contains a subscript with NULL
as the key, then the entire IF clause is evaluated as FALSE.
This is incorrect, because col[NULL] = NULL would simplify
to NULL = NULL, which is interpreted as TRUE using the LWT
comparisons. Even with SQL NULL handling, "col[NULL] IS NULL"
should evaluate to true, but since we short-circuit as soon
as we encounter the NULL key, we cannot complete the evaluation.
Fix by setting cell_value to null instead of returning immediately.
Tests that check for this were adjusted. Since the test changed
behavior from not applying the statement to applying it, a new
statement is added that undoes the previous one, so downstream
statements are not affected.
After this change, all components of column_condition are expressions.
One LWT-specific hack was removed from the evaluation path:
- lists being represented as maps is made transparent by
converting during evaluation with adjust_for_collections_as_maps()
column_condition::applies_to() previously handled a missing row
by materializing a NULL for the column being evaluated; now it
materializes a NULL row instead, since evaluation of the column is
moved to common code.
A few more cases in lwt_test became legal, though I'm not sure
exactly why in this patch.
Both LWT IF clause and SELECT WHERE clause check that a duration type
isn't used in an ordered comparison, since duration types are unordered
(is 1mo more or less than 30d?). As a first step towards centralizing this
check, move the check from restrictions into prepare. When LWT starts using
prepare, the duplication will be removed.
The error message was changed: the word "slice" is an internal term, and
a comparison does not necessarily have to be in a restriction (which is
also an internal term).
Tests were adjusted.
Compiling a pattern is expensive and so we should try to do it
at prepare time, if the pattern is a constant. Add an optimizer
that looks for such cases and replaces them with a unary function
that embeds the compiled pattern.
This isn't integrated yet with prepare_expr(), since the filtering
code isn't ready for generic expressions. Its first user will be LWT,
which contains the optimization already (filtering had it as well,
but lost it sometime during the expression rewrite).
A unit test is added.
Sometimes we want to defeat the expression optimizer's ability to
fold constant expressions. A bind variable is a convenient way to
do this, without the complexity of faking a schema and row inputs.
Add a helper to evaluate an expression with bind variable parameters,
doing all the paperwork for us.
A companion make_bind_variable() is added to likewise simplify
creating bind variables for tests.
Previously, we rejected map subscripts that are NULL, as well as
LIKE patterns that are NULL. General SQL expression evaluation
allows NULL everywhere, and doesn't raise errors - an expression
involving NULL generally yields NULL. Change the behavior to
follow that. Since the new behavior was previously disallowed,
no one should have been relying on it and there is no compatibility
problem.
Update the tests and note it as a CQL extension.
LWT IF clause interprets equality differently from SQL (and the
rest of CQL): it thinks NULL equals NULL. Currently, it implements
binary operators all by itself so the fact that oper_t::EQ (and
friends) means something else in the rest of the code doesn't
bother it. However, we can't unify the code (in
column_condition.cc) with the rest of expression evaluation if
the meaning changes in different places.
To prepare for this, introduce a null_handling_style field to
binary_operator that defaults to `sql` but can be changed to
`lwt_nulls` to indicate this special semantic.
A few unit tests are added. LWT itself still isn't modified.
group0 members to own file
Move slow test for removenode with nodes not present in group0 to a
server after a sudden stop to a separate file.
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
`paxos_response_handler::learn_decision` was calling
`cdc_service::augment_mutation_call` concurrently with
`storage_proxy::mutate_internal`. `augment_mutation_call` was selecting
rows from the base table in order to create the preimage, while
`mutate_internal` was writing rows to the table. It was therefore
possible for the preimage to observe the update that it accompanied,
which doesn't make any sense, because the preimage is supposed to show
the state before the update.
Fix this by performing the operations sequentially. We can still perform
the CDC mutation write concurrently with the base mutation write.
`cdc_with_lwt_test` was sometimes failing in debug mode due to this bug
and was marked flaky. Unmark it.
Also fix a comment in `cdc_with_lwt_test`.
Fixes#12098Closes#12768
* github.com:scylladb/scylladb:
test/cql-pytest: test_cdc: regression test for #12098
test/cql: cdc_with_lwt_test: fix comment
service: storage_proxy: sequence CDC preimage select with Paxos learn
... move them to their own file.
Schema verification tests for restart, add, and hard stop of server can
be done with the same cluster. Merge them in the same test case.
While there, move them to a separate file to be run independently as
this is a slow test.
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
Improve logging by printing the cluster at the end of each test.
Stop performing operations like attempting queries or dropping keyspaces on dirty clusters. Dirty clusters might be completely dead and these operations would only cause more "errors" to happen after a failed test, making it harder to find the real cause of failure.
Mark cluster as dirty when a test that uses it fails - after a failed test, we shouldn't assume that the cluster is in a usable state, so we shouldn't reuse it for another test.
Rely on the `is_dirty` flag in `PythonTest`s and `CQLApprovalTest`s, similarly to what `TopologyTest`s do.
Closes#12652
* github.com:scylladb/scylladb:
test.py: rely on ScyllaCluster.is_dirty flag for recycling clusters
test/topology: don't drop random_tables keyspace after a failed test
test/pylib: mark cluster as dirty after a failed test
test: pylib, topology: don't perform operations after test on a dirty cluster
test/pylib: print cluster at the end of test
Recently we enabled RBNO by default in all topology operations. This
made the operations a bit slower (repair-based topology ops are a bit
slower than classic streaming - they do more work), and in debug mode
with large number of concurrent tests running, they might timeout.
The timeout for bootstrap was already increased before, do the same for
decommission/removenode. The previously used timeout was 300 seconds
(this is the default used by aiohttp library when it makes HTTP
requests), now use the TOPOLOGY_TIMEOUT constant from ScyllaServer which
is 1000 seconds.