There are situations where a node outside the current configuration is
the only node that can become a leader. We become candidates in such
cases. But there is an easy check for when we don't need to; a comment was
added explaining that.
* kbr/candidate-outside-config-v3:
raft: sometimes become a candidate even if outside the configuration
raft: fsm: update _commit_idx when applying snapshot
Adds a sync_point structure. A sync point is a (possibly incomplete)
mapping from hint queues to a replay position in it. Users will be able
to create sync points consisting of the last written positions of some
hint queues, so then they can wait until hint replay in all of the
queues reach that point.
The sync point supports serialization - first it is serialized with the
help of IDL to a binary form, and then converted to a hexadecimal
string. Deserialization is also possible.
The base64 encoding/decoding functions will be used for serialization of
hint sync point descriptions. Base64 format is not specific to
Alternator, so it can be moved to utils.
There are situations where a node outside the current configuration is
the only node that can become a leader. We become candidates in such
cases. But there is an easy check for when we don't need to; a comment was
added explaining that.
Before the fix introduced in the previous patch, the cluster would
forget its configuration when taking a snapshot, making it unable to
reelect a leader. This regression test catches that.
Calculating clustering ranges on a local index has been rewritten to use the new `expression` variant.
This allows us to finally remove the old `bounds_ranges` function.
Closes#9080
* github.com:scylladb/scylla:
cql3: Remove unused functions like bounds_ranges
cql3: Use expressions to calculate the local-index clustering ranges
statement_restrictions_test: tests for extracting column restrictions
expression: add a function to extract restrictions for a column
We must not apply remote snapshots with commit indexes smaller than our
local commit index; this could result in out-of-order command
application to the local state machine replica, leading to
serializability violations.
Message-Id: <20210805112736.35059-1-kbraun@scylladb.com>
In issue #9083 a user noted that whereas Cassandra's partition-count
estimation is accurate, Scylla's (rewritten in commit b93cc21) is very
inaccurate. The tests introduce here, which all xfail on Scylla, confirm
this suspicion.
The most important tests are the "simple" tests, involving a workload
which writes N *distinct* partitions and then asks for the estimated
partition count. Cassandra provides accurate estimates, which grow
more accurate with more partitions, so it passes these tests, while
Scylla provides bad estimates and fails them.
Additional tests demonstrate that neither Scylla nor Cassandra
can handle anything beyond the "simple" case of distinct partitions.
Two tests which xfail on both Cassandra and Scylla demonstrate that
if we write the same partitions to multiple sstables - or also delete
partitions - the estimated partition counts will be way off.
Refs #9083
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210726211315.1515856-1-nyh@scylladb.com>
In this patch we add another test case for a case where ALLOW FILTERING
should not be required (and Cassandra doesn't require it) but Scylla
does.
This problem was introduced by pull request #9122. The pull request
fixed an incorrect query (see issue #9085) involving both an index and
a multi-column restriction on a compound clustering key - and the fix is
using filtering. However, in one specific case involving a full prefix,
it shouldn't require filtering. This test reproduces this case.
The new test passes on Cassandra (and also theoretically, should pass),
but fails on Scylla - the check_af_optional() call fails because Scylla
made the ALLOW FILTERING mandatory for that case.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210803092046.1677584-1-nyh@scylladb.com>
We already have tests for Query's ExclusiveStartKey option, but we
only exercised it as a way for paging linearly through all the results.
Now we add a test that confirms that ExclusiveStartKey can be used not
just for paging through all the result - but also for jumping directly to
the middle of a partition after any clustering key (existing or non-
existing clustering key). The new test also for the first time verifies
that ExclusiveStartKey with a specific format works (previous tests just
copied LastEvaluatedKey to ExclusiveStartKey, so any opaque cookie could
have worked).
The test passes on both DynamoDB and Alternator so it did not find a new
bug. But it's useful to have as a regression test, in case in the future
we want to improve paging performance (see #6278) - and need to keep in
mind that ExclusiveStartKey is not just for paging.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210729114703.1609058-1-nyh@scylladb.com>
They were already correctly returned to the caller, but we had a
leftover discarded future that would sometimes end up with a
broken_promise exception. Ignore the exception explicitly.
Message-Id: <20210803122207.78406-1-kbraun@scylladb.com>
With commit 1924e8d2b6, compaction code was moved into a
top level dir as compaction is layered on top of sstables.
Let's continue this work by moving all compaction unit tests
into its own test file. This also makes things much more
organized.
sstable_datafile_test, as its name implies, will only contain
sstable data tests. Perhaps it should be renamed to only
sstable_data_test, as the test also contains tests involving
other components, not only the data one.
BEFORE
$ cat test/boost/sstable_datafile_test.cc | grep TEST_CASE | wc -l
105
AFTER
$ cat test/boost/sstable_compaction_test.cc | grep TEST_CASE | wc -l
57
$ cat test/boost/sstable_datafile_test.cc | grep TEST_CASE | wc -l
48
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20210802192120.148583-1-raphaelsc@scylladb.com>
We should not use the current term; we should use the term of the
snapshot's index, which may be lower.
* https://github.com/kbr-/scylla/tree/snapshot-right-term-fix:
test: raft: regression test for using the correct term when taking a snapshot
test: raft: randomized_nemesis_test: server configuration parameter
raft: use the correct term when storing a snapshot
When a WHERE clause contains a multi-column restriction and an indexed
regular column, we must filter the results. It is generally not
possible to craft the index-table query so it fetches only the
matching rows, because that table's clustering key doesn't match up
with the column tuple.
Fixes#9085.
Tests: unit (dev, debug)
Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
Closes#9122
It was observed that since fce124bd90 ('Merge "Introduce
flat_mutation_reader_v2" from Tomasz') database_test takes much longer.
This is expected since it now runs the upgrade/downgrade reader tests
on all existing tests. It was also observed that in a similar time frame
database_test sometimes times our on test machines, taking much
longer than usual, even with the extra work for testing reader
upgrade/downgrade.
In an attempt to reproduce, I noticed ti failing on EMFILE (too many
open file descriptors). I saw that tests usually use ~100 open file
descriptors, while the default limit is 1024.
I suspect we have runaway concurrency, but I was not able to pinpoint the
cause. It could be compaction lagging behind, or cleanup work for
deleting tables (the test
test_database_with_data_in_sstables_is_a_mutation_source creates and
deletes many tables).
As a stopgap solution to unblock the tests, this patch raises the file
descriptor limit in the way recommended by [1]. While tests shouldn't
use so many descriptors, I ran out of ideas about how to plug the hole.
Note that main() does something similar, through more elaborate since
it needs to communicate to users. See ec60f44b64 ("main: improve
process file limit handling").
[1] http://0pointer.net/blog/file-descriptor-limits.htmlCloses#9121
Restrict expected exception message to filter only relevant exception,
matching both for scylla and cassandra.
For example, the former has this message:
Cannot use query parameters in CREATE MATERIALIZED VIEW statements
While the latter throws this:
Bind variables are not allowed in CREATE MATERIALIZED VIEW statements
Also, place cleanup code in try-finally clause.
Tests: cql-pytest:test_materialized_view.py(dev)
Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Message-Id: <20210802083912.229886-1-pa.solodovnikov@scylladb.com>
This is a translation of Cassandra's CQL unit test source file
validation/entities/TimestampTest.java into our our cql-pytest framework.
This test file checks has a few tests (8) on various features of
cell timestamps. All these tests pass on Cassandra and on Scylla - i.e.,
these tests no new Scylla bug was detected :-)
Two of the new tests are very slow (6 seconds each) and check a trivial
feature that was already checked elsewhere more efficiently (the fact
that TTL expiration works), so I marked them "skip" after verifying they
really pass.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210801142738.1633126-1-nyh@scylladb.com>
"
Previously, the following functions were
incorrectly marked as pure, meaning that the
function is executed at "prepare" step:
* `currenttimestamp()`
* `currenttime()`
* `currentdate()`
* `currenttimeuuid()`
For functions that possibly depend on timing and random seed,
this is clearly a bug. Cassandra doesn't have a notion of pure
functions, so they are lazily evaluated.
Make Scylla to match Cassandra behavior for these functions.
Add a unit-test for a fix (excluding `currentdate()` function,
because there is no way to use synthetic clock with query
processor and sleeping for a whole day to demonstrate correct
behavior is clearly not an option).
Also, extend the cql-pytest for #8604 since there are now more
non-deterministic CQL functions, they are all subject to the test
now.
Fixes: #8816
"
* 'timeuuid_function_pure_annotation_v3' of https://github.com/ManManson/scylla:
test: test_non_deterministic_functions: test more non-pure functions
cql3: change `current*()` CQL functions to be non-pure
Check that all existing non-pure functions (except for
`currentdate()`) work correctly with or without prepared
statements.
Tests: cql-pytest/test_non_deterministic_functions.py(dev)
Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
These include the following:
* `currenttimestamp()`
* `currenttime()`
* `currentdate()`
* `currenttimeuuid()`
Previously, they were incorrectly marked as pure, meaning
that the function is executed at "prepare" step.
For functions that possibly depend on timing and random seed,
this is clearly a bug. Cassandra doesn't have a notion of pure
functions, so they are lazily evaluated.
Make Scylla to match Cassandra behavior for these functions.
Add a unit-test for a fix (excluding `currentdate()` function,
because there is no way to use synthetic clock with query
processor and sleeping for a whole day to demonstrate correct
behavior is clearly not an option).
Tests: unit(dev, debug)
Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
The request should fail with `InvalidRequest` exception and shouldn't
crash the database.
Don't check for actual error messages, because they are different
between Scylla and Cassandra.
The former has this message:
Cannot use query parameters in CREATE MATERIALIZED VIEW statements
While the latter throws this:
Bind variables are not allowed in CREATE MATERIALIZED VIEW statements
Tests: cql-pytest/test_materialized_view.py(scylla dev, cassandra trunk)
Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
"
`function_call` AST nodes are created for each function
with side effects in a CQL query, i.e. non-deterministic
functions (`uuid()`, `now()` and some others timeuuid-related).
These nodes are evaluated either when a query itself is executed
or query restrictions are computed (e.g. partition/clustering
key ranges for LWT requests).
We need to cache the calls since otherwise when handling a
`bounce_to_shard` request for an LWT query, we can possibly
enter an infinite bouncing loop (in case a function is used
to calculate partition key ranges for a query), since the
results can be different each time.
Furthermore, we don't support bouncing more than one time.
Returning `bounce_to_shard` message more than one time
will result in a crash.
Caching works only for LWT statements and only for the function
calls that affect partition key range computation for the query.
`variable_specifications` class is renamed to `prepare_context`
and generalized to record information about each `function_call`
AST node and modify them, as needed:
* Check whether a given function call is a part of partition key
statement restriction.
* Assign ids for caching if above is true and the call is a part
of an LWT statement.
There is no need to include any kind of statement identifier
in the cache key since `query_options` (which holds the cache)
is limited to a single statement, anyway.
Function calls are indexed by the order in which they appear
within a statement while parsing. There is no need to
include any kind of statement identifier to the cache key
since `query_options` (which holds the cache) is limited
to a single statement, anyway.
Note that `function_call::raw` AST nodes are not created
for selection clauses of a SELECT statement hence they
can only accept only one of the following things as parameters:
* Other function calls.
* Literal values.
* Parameter markers.
In other words, only parameters that can be immediately reduced
to a byte buffer are allowed and we don't need to handle
database inputs to non-pure functions separately since they
are not possible in this context. Anyhow, we don't even have
a single non-pure function that accepts arguments, so precautions
are not needed at the moment.
Add a test written in `cql-pytest` framework to verify
that both prepared and unprepared lwt statements handle
`bounce_to_shard` messages correctly in such scenario.
Fixes: #8604
Tests: unit(dev, debug)
NOTE: the patchset uses `query_options` as a container for
cached values. This doesn't look clean and `service::query_state`
seems to be a better place to store them. But it's not
forwarded to most of the CQL code and would mean that a huge number
of places would have to be amended.
The series presents a trade-off to avoid forwarding `query_state`
everywhere (but maybe it's the thing that needs to be done, nonetheless).
"
* 'lwt_bounce_to_shard_cached_fn_v6' of https://github.com/ManManson/scylla:
cql-pytest: add a test for non-pure CQL functions
cql3: cache function calls evaluation for non-deterministic functions
cql3: rename `variable_specifications` to `prepare_context`
Introduce a test using `cql-pytest` framework to assert that
both prepared an unprepared LWT statements (insert with
`IF NOT EXISTS`) with a non-deterministic function call
work correctly in case its evaluation affects partition
key range computation (hence the choice of `cas_shard()`
for lwt query).
Tests: cql-pytest/test_non_deterministic_functions.py
Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Convert all known tri-compares that return an int to return std::strong_ordering.
Returning an int is dangerous since the caller can treat it as a bool, and indeed
this series uncovered a minor bug (#9103).
Test: unit (dev)
Fixes#1449Closes#9106
* github.com:scylladb/scylla:
treewide: remove redundant "x <=> 0" compares
test: mutation_test: convert internal tri-compare to std::strong_ordering
utils: int_range: change to std::strong_ordering
test: change some internal comparators to std::strong_ordering
utils: big_decimal: change to std::strong_ordering
utils: fragment_range: change to std::strong_ordering
atomic_cell: change compare_atomic_cell_for_merge() to std::strong_ordering
types: drop scaffolding erected around lexicographical_tri_compare
sstables: keys: change to std::strong_ordering internally
bytes: compare_unsigned(): change to std::strong_ordering
uuid: change comparators to std::strong_ordering
types: convert abstract_type::compare and related to std::strong_ordering
types: reduce boilerplate when comparing empty value
serialized_tri_compare: change to std::strong_ordering
compound_compat: change to std::strong-ordering
types: change lexicographical_tri_compare, prefix_equality_tri_compare to std::strong_ordering
"
There are few places that call global storage service, but all
are easily fixable without significant changes.
1. alternator -- needs token metadata, switch to using proxy
2. api -- calls methods from storage service, all handlers are
registered in main and can capture storage service from there
3. thrift -- calls methods from storage service, can carry the
reference via controller
4. view -- needs tokens, switch to using (global) proxy
5. storage_service -- (surprisingly) can use "this"
tests: unit(dev), dtest(simple_boot_shutdown, dev)
"
* 'br-unglobal-storage-service' of https://github.com/xemul/scylla:
storage_service: Make it local
storage_service: Remove (de)?init_storage_service()
storage_service: Use container() in run_with(out)_api_lock
storage_service: Unmark update_topology static
storage_service: Capture this when appropriate
view: Use proxy to get token metadata from
thrift: Use local storage service in handlers
thrift: Carry sharded<storage_service>& down to handler
api: Capture and use sharded<storage_service>& in handlers
api: Carry sharded<storage_service>& down to some handlers
alternator: Take token metadata from server's storage_proxy
alternator: Keep storage_proxy on server
"
This is the 3rd slowest test in the set. There are 3 cases out
there that are hard-coded to be sequential. However, splitting
them into boost test cases helps running this test faster in
--parallel-cases mode. Timings for debug mode:
Total before the patch: 25 min
Sequential after the patch: 25 min
Basic case: 5 min
Evict-paused-readers case: 5 min
Single-mutation-buffer case: 15 min
tests: unit.multishard_combining_reader_as_mutation_source(debug)
"
* 'br-parallel-mcr-test' of https://github.com/xemul/scylla:
test: Split test_multishard_combining_reader_as_mutation_source into 3
test: Fix indentation after previous patch
test: Move out internals of test_multishard_combining_reader_as_mutation_source
There are 3 places that can now declare local instance:
- main
- cql_test_env
- boost gossiper test
The global pointer is saved in debug namespace for debugging.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
"
This series changes the behavior of the system when executing reads
annotated with "bypass cache" clause in CQL. Such reads will not
use nor populate the sstable partition index cache and sstable index page cache.
"
* 'bypass-cache-in-sstable-index-reads' of github.com:tgrabiec/scylla:
sstables: Do not populate page cache when searching in promoted index for "bypass cache" reads
sstables: Do not populate partition index cache for "bypass cache" reads
If x is of type std::strong_ordering, then "x <=> 0" is equivalent to
x. These no-ops were inserted during #1449 fixes, but are now unnecessary.
They have potential for harm, since they can hide an accidental of the
type of x to an arithmetic type, so remove them.
Ref #1449.
gcc 10.3.1 spews the following error:
```
_test_generator::generate_scenario(std::mt19937&) const’:
test/boost/mutation_reader_test.cc:3731:28: error: comparison of integer expressions of different signedness: ‘int’ and ‘long unsigned int’ [-Werror=sign-compare]
3731 | for (auto i = 0; i < num_ranges; ++i) {
| ~~^~~~~~~~~~~~
```
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20210728073538.2467040-1-bhalevy@scylladb.com>
Patch the .row_tombstones() to return the range_tombstone_list
wrapped into the immutable_collection<> so that callers are
guaranteed not to touch the collection itself, but still can
modify the tombstones.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Some callers of mutation_partition::row_tomstones() don't want
(and shouldn't) modify the list itself, while they may want to
modify the tombstones. This patch explicitly locates those that
need to modify the collection, because the next patch will
return immutable collection for the others.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Patch the .clustered_rows() method to return the btree of rows
wrapped into the immutable_collection<> so that callers are
guaranteed not to touch the collection itself, but still can
modify the elements in it.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Next patches will mark btree::iterator methods that modify
the tree itself as private, so stop using them in tests.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
"
This series fixes two issues which cause very poor efficiency of reads
when there is a lot of range tombstones per live row in a partition.
The first issue is in the row_cache reader. Before the patch, all range
tombstones up to the next row were copied into a vector, and then put
into the buffer until it's full. This would get quadratic if there is
much more range tombstones than fit in a buffer.
The fix is to avoid the accumulation of all tombstones in the vector
and invoke the callback instead, which stops the iteration as soon as
the buffer is full.
Fixes#2581.
The second, similar issue was in the memtable reader.
Tests:
- unit (dev)
- perf_row_cache_update (release)
"
* tag 'no-quadratic-rt-in-reads-v1' of github.com:tgrabiec/scylla:
test: perf_row_cache_update: Uncomment test case for lots of range tombstones
row_cache: Consume range tombstones incrementally
partition_snapshot_reader: Avoid quadratic behavior with lots of range tombstones
tests: mvcc: Relax monotonicity check
range_tombstone_stream: Introduce peek_next()
This patch follows #9002, further reducing the complexity of the sstable readers.
The split between row consumer interfaces and implementations has been first added in 2015, and there is no reason to create new implementations anymore. By merging those classes, we achieve a sizeable reduction in sstable reader length and complexity.
Refs #7952
Tests: unit(dev)
Closes#9073
* github.com:scylladb/scylla:
sstables: merge row_consumer into mp_row_consumer_k_l
sstables: move kl row_consumer
sstables: merge consumer_m into mp_row_consumer_m
sstables: move mp_row_consumer_m