This should interrupt all sleeps in component teardown.
Before this patch, there was a 1s sleep on gossiper shutdown whose
origin I could not determine. After the patch there is no such
sleep.
Breaks the file into individually tagged + crc:ed pages.
Each page (sized as disk write alignment) gets a trailing
12-byte metadata, including CRC of the first page-12 bytes,
and the ID of the segment being written.
When reading, each page read is CRC:ed and checked to be part
of the expected segment by comparing ID:s. If crc is broken,
we have broken data. If crc is ok, but ID does not match, we
have a prematurely terminated segment (truncated), which, depending
on whether we use batch mode or not, implies data loss.
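The page layout described above can be sketched roughly as follows. This is a minimal illustration, not the commitlog code: the 4096-byte page size, the split of the 12-byte trailer into a 4-byte CRC plus an 8-byte segment ID, and the use of zlib.crc32 are all assumptions made for the example.

```python
import struct
import zlib

PAGE_SIZE = 4096                         # assumed disk write alignment
TRAILER = struct.Struct("<IQ")           # assumed split: 4-byte CRC + 8-byte segment ID
PAYLOAD_SIZE = PAGE_SIZE - TRAILER.size  # the "page-12" bytes covered by the CRC


def make_page(payload: bytes, segment_id: int) -> bytes:
    """Pad payload to the page body size and append the 12-byte trailer."""
    if len(payload) > PAYLOAD_SIZE:
        raise ValueError("payload does not fit in one page")
    body = payload.ljust(PAYLOAD_SIZE, b"\0")
    return body + TRAILER.pack(zlib.crc32(body), segment_id)


def check_page(page: bytes, expected_segment_id: int) -> str:
    """Classify a page as 'ok', 'corrupt' (bad CRC) or 'truncated' (foreign ID)."""
    body, trailer = page[:PAYLOAD_SIZE], page[PAYLOAD_SIZE:]
    crc, segment_id = TRAILER.unpack(trailer)
    if zlib.crc32(body) != crc:
        return "corrupt"
    if segment_id != expected_segment_id:
        return "truncated"
    return "ok"
```

The CRC is checked before the ID, so a page left over from a recycled segment (valid CRC, wrong ID) is reported as truncation rather than corruption.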
Refs #11845
When replaying, differentiate between the two cases for failure we have:
- A broken actual entry - i.e. entry header/data does not hold up to
crc scrutiny
- Truncated file - i.e. a chunk header is broken or unreadable. This can
be due to either "corruption" (i.e. borked write, post-corruption, hw
whatever), or simply an unterminated segment.
The difference is that the former is recoverable, the latter is not.
We now signal and report the two separately. The end result for a user
is not much different: in either case there is data loss and a
need for repair. But there is some value in differentiating which
of the two we encountered.
Modifies and adds test cases.
reconcilable_result_builder passes range tombstone changes to _rt_assembler
using table schema, not query schema.
This means that a tombstone with bounds (a; b), where a < b in query schema
but a > b in table schema, will not be emitted from mutation_query.
This is a very serious bug, because it means that such tombstones in reverse
queries are not reconciled with data from other replicas.
If *any* queried replica has a row, but not the range tombstone which deleted
the row, the reconciled result will contain the deleted row.
In particular, range deletes performed while a replica is down will not
later be visible to reverse queries which select this replica, regardless of the
consistency level.
As far as I can see, this doesn't result in any persistent data loss.
Only in that some data might appear resurrected to reverse queries,
until the relevant range tombstone is fully repaired.
This series fixes the bug and adds a minimal reproducer test.
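The effect of comparing bounds under the wrong schema can be shown with a toy model. It assumes, purely for illustration, that the assembler emits a tombstone only when its start sorts before its end; the function names are hypothetical and integers stand in for clustering positions.

```python
def table_less(a, b):
    """Clustering order of the table schema (ascending here)."""
    return a < b


def query_less(a, b):
    """Order of the reversed query schema: the table order flipped."""
    return table_less(b, a)


def assemble(start, end, less):
    """Toy assembler: emit the tombstone only if start sorts before end."""
    return (start, end) if less(start, end) else None


# A reverse query walks the range deletion from 5 down to 2.
rt_start, rt_end = 5, 2
emitted_with_query_schema = assemble(rt_start, rt_end, query_less)
emitted_with_table_schema = assemble(rt_start, rt_end, table_less)
```

With the query schema the tombstone is emitted; with the table schema the same bounds look empty and the tombstone is dropped.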
Fixes #10598
Closes scylladb/scylladb#16003
* github.com:scylladb/scylladb:
mutation_query_test: test that range tombstones are sent in reverse queries
mutation_query: properly send range tombstones in reverse queries
Currently CREATE KEYSPACE ... WITH STORAGE = { 'type' = 'S3' ... } will create the keyspace even if the backend configuration is "invalid" in the sense that the requested endpoint is not known to scylla via the object_storage.yaml config file. The misconfiguration first reveals itself only when flushing a memtable (see #15635), but it's better to learn that the endpoint is not configured earlier than that.
fixes: #15074
Closes scylladb/scylladb#16038
* github.com:scylladb/scylladb:
test: Add validation of misconfigured storage creation
sstables: Throw early if endpoint for keyspace is not configured
replica: Move storage options validation to sstables manager
test/cql-pytest/test_keyspaces: Move DESCRIBE case to object store
sstables: Add has_endpoint_client() helper to manager
An attempt to create a non-local keyspace with an unknown endpoint
should raise a configuration exception.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
We're going to ban creation of a keyspace with S3 type in case the
requested endpoint is not configured. The problem is that this test case
of cql-pytest needs such a keyspace to be created, and in order to provide
the object storage configuration we'd need to touch the generic scylla
cluster management, which is overkill for a generic cql-pytest case.
A simpler solution is to make the object_store test suite perform all the
S3-related checks, including the way DESCRIBE for S3-backed ks works.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The wrapper just calls the test-only core write_memtable_to_sstable()
overload, tests can do it on their own.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This effectively reverts bc051387c5 (storage_service: Remove sys_dist_ks
from storage_service dependencies) since now storage service needs the
sys. dist. ks not only at cluster join time. Next patch will make more use
of it as well.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
It's now set via a dedicated call that happens after query processor is
started. Now query processor is started before storage service and the
latter can get the q.p. local reference via constructor.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Currently when the coordinator decides to move the fence it issues an
RPC to each node and each node locally advances fence version. This is
fine if there are no failures or failures are handled by retrying
fencing, but if we want to allow topology changes to progress even in
the presence of barrier failures it is easier to store the fence version
in the raft state. The nodes that missed fence rpc may easily catch up
to the latest fence version by simply executing a raft barrier.
before this change, in sstable_run_based_compaction_test, we check
every 4 sstables, to verify that we close the sstable to be replaced
in a batch of 4.
since the integer-based generation identifier is monotonically
incremental, we can assume that the identifiers of sstables are like
0, 1, 2, 3, .... so if the compaction consumes sstables in a
batch of 4, the identifier of the first one in the batch should
always be a multiple of 4. unfortunately, this test does not work
if we use uuid-based identifier.
but if we take a closer look at how we create the dataset, we can
observe the following facts:
1. the `compaction_descriptor` returned by
`sstable_run_based_compaction_strategy_for_tests` never
sets `owned_ranges` in the returned descriptor
2. in `compaction::setup_sstable_reader`, `mutation_reader::forward::no`
is used, if `_owned_ranges_checker` is empty
3. `mutation_reader_merger` respects the `fwd_mr` passed to its
ctor, so it closes current sstable immediately when the underlying
mutation reader reaches the end of stream.
in other words, we close every sstable once it is fully consumed in
sstable_compaction_test. and the reason why the existing test passes
is that we just sample the sstables whose generation id is a multiple
of 4. what happens when we perform compaction in this test is:
1. replace 5 with 33, closing 5
2. replace 6 with 34, closing 6
3. replace 7 with 35, closing 7
4. replace 8 with 36, closing 8 << let's check here.. good, go on!
5. replace 13 with 37, closing 13
...
8. replace 16 with 40, closing 16 << let's check here.. also, good, go on!
so, in this change, we just check all old sstables, to verify that
we close each of them once it is fully consumed.
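a toy model of why sampling every 4th sstable could not tell the two behaviours apart is sketched below. it is not the test code: `closed_after` and both check functions are hypothetical names, and the model only counts how many sstables are closed after a given number have been consumed.

```python
def closed_after(consumed: int, batch_size: int) -> int:
    """Sstables closed once `consumed` have been read, if closing happens
    in batches of `batch_size` (1 means: close each one immediately)."""
    return consumed - consumed % batch_size


def sampled_check(batch_size: int, total: int = 16, every: int = 4) -> bool:
    """Old style: only look when the consumed count is a multiple of `every`."""
    return all(closed_after(n, batch_size) == n
               for n in range(every, total + 1, every))


def full_check(batch_size: int, total: int = 16) -> bool:
    """New style: look after every single sstable."""
    return all(closed_after(n, batch_size) == n
               for n in range(1, total + 1))
```

in this model the sampled check passes for both batch sizes 1 and 4, while the full check passes only when each sstable is closed as soon as it is consumed.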
Fixes #16073
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
As a general rule, tests in test/cql-pytest shouldn't just pass on Scylla - they also should not fail on Cassandra. A test that fails on Cassandra may indicate that the test is wrong, or that Scylla's behavior is wrong and the test just enshrines that wrong behavior. Each time we see a test fail on Cassandra we need to check if this is not the case. We also have special markers scylla_only and cassandra_bug to put on tests that we know _should_ fail on Cassandra because it is missing some Scylla-only feature or there is a bug in Cassandra, respectively. Such tests will be xfailed/skipped when running on Cassandra, and not report failures.
Unfortunately, over time several tests that did not pass on Cassandra got into our suite. In this series I went over all of them, and fixed each to pass - or be skipped - on Cassandra, in a way that each patch explains.
Fixes #16027
Closes scylladb/scylladb#16033
* github.com:scylladb/scylladb:
test/cql-pytest: fix test_describe.py to not fail on Cassandra
test/cql-pytest: fix select_single_column_relation_test.py to not fail on Cassandra
test/cql-pytest: fix compact_storage_test.py to not fail on Cassandra
test/cql-pytest: fix test_secondary_index.py to not fail on Cassandra
test/cql-pytest: fix test_materialized_view.py to not fail on Cassandra
test/cql-pytest: fix test_keyspace.py to not fail on Cassandra
test/cql-pytest: test_guardrail_replication_strategy.py is Scylla-only
test/cql-pytest: partial fix for test_compaction_strategy_validation.py on Cassandra
test/cql-pytest: fix test_filtering.py to not fail on Cassandra
Yet another test file in cql-pytest which failed when run on Cassandra
(via test/cql-pytest/run-cassandra).
Some of the tests checked on Cassandra things that don't exist there
(namely local secondary indexes) and could skip that part. Other tests
need to be skipped completely ("scylla_only") because they rely on a
Scylla-only feature. We have a bit too many of those in this file, but
I don't want to fix this now.
Yet another test found a real bug in Cassandra 4.1.1 (CASSANDRA-17918)
but passes in Cassandra 4.1.2 and up, so there's nothing to fix except
a comment about the situation.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
In commit 52bbc1065c, we started to allow "IN NULL" - it started to
match nothing instead of being an error as it is in Cassandra. The
commit *incorrectly* "fixed" the existing translated Cassandra unit test
to match the new behavior - but after this "fix" the test started to
fail on Cassandra.
The appropriate fix is just to comment out this part of the test and
not do it. It's a small point where we deliberately decided to deviate
from Cassandra's behavior, so the test it had for this behavior is
irrelevant.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Some error-message checks in this test file (which was translated in
the past from Cassandra) try operations which actually have two errors,
and expect to see one error message - but recent Cassandra prints
the other one. This caused several tests to fail when running on
Cassandra 4.1. Both messages are fine, so let's accept both.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Fixed two tests which failed when running on Cassandra:
One test waited for a secondary index to appear, but in Cassandra, the
index can be broken (cause a read failure) for a short while and we
need to wait through this failure as well and not fail the entire test.
Another test was for local secondary index, which is a Scylla-only
feature, but we forgot the "scylla_only" tag.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
The test function test_mv_synchronous_updates checks the
synchronous_updates feature, which is a ScyllaDB extension and
doesn't exist in Cassandra. So it should be marked with "scylla_only"
so that it doesn't fail when running the tests on Cassandra.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Yet another test file in cql-pytest which failed when run on Cassandra
(via test/cql-pytest/run-cassandra).
When testing some invalid cases of ALTER TABLE, the test required
that you cannot choose SimpleStrategy without specifying a
replication_factor. As explained in Refs #16028, this isn't true
in Cassandra 4.1 and up - it now has a default value for
replication_factor and it's no longer required.
So in this patch we split that part of the test to a separate test
function and mark it scylla_only.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
The tests in test/cql-pytest/test_guardrail_replication_strategy.py
are for a Scylla-only feature that doesn't exist in Cassandra, so
obviously they all fail on Cassandra. Let's mark them all as
scylla_only.
We use an autouse fixture to automatically mark all tests in this file
as scylla-only, instead of marking each one separately.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Yet another test file in cql-pytest which failed when run on Cassandra
(via test/cql-pytest/run-cassandra).
This patch is only a partial fix - it fixes trivial differences in error
messages, but some potentially-real differences remain so three of the
tests still fail:
1. Trying to set tombstone_threshold to 5.5 is an error in ScyllaDB
("must be between 0.0 and 1.0") but allowed in Cassandra.
2. Trying to set bucket_low to 0.0 is an error in ScyllaDB, giving the
wrong-looking error message "must be between 0.0 and 1.0" (so 0.0 should
have been fine?!) but allowed in Cassandra.
3. Trying to set timestamp_resolution to SECONDS is an error in ScyllaDB
("invalid timestamp resolution SECONDS") but allowed in Cassandra.
I don't think anybody wants to actually use "SECONDS", but it seems
legal in Cassandra, so do we need to support it?
The patch also simplifies the test to use cql-pytest's util.py, instead
of cassandra_tests/porting.py. The latter was meant to make porting
existing Cassandra tests easier - not for writing new ones - and made
using a regular expression for testing error messages harder so I
switched to using pytest.raises() whose "match=" accepts a regular
expression.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Yet another test file in cql-pytest which failed when run on Cassandra
(via test/cql-pytest/run-cassandra).
It turns out that when the token() function is used with incorrect
parameters (it needs to be passed all partition-key columns), the
error message is different in ScyllaDB and Cassandra. Both are
reasonable error messages, so if we insist on checking the error
message - we should allow both.
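Accepting both wordings can be done with a single regular expression using alternation. The sketch below uses made-up placeholder strings, not the real ScyllaDB or Cassandra messages, and a hypothetical helper name; in a pytest test the same pattern could be passed as the "match=" argument of pytest.raises(), which applies re.search() to the exception text.

```python
import re

# Placeholder wordings, NOT the real server messages.
SCYLLA_WORDING = "hypothetical ScyllaDB wording"
CASSANDRA_WORDING = "hypothetical Cassandra wording"

# One pattern that accepts either wording.
ACCEPTED = f"{re.escape(SCYLLA_WORDING)}|{re.escape(CASSANDRA_WORDING)}"


def is_accepted(error_message: str) -> bool:
    """True if the message contains either of the two accepted wordings."""
    return re.search(ACCEPTED, error_message) is not None
```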
Also the same test called its second partition-key column "ck". This
is confusing, because we usually use the name "ck" to refer to a clustering
key. So just for clarity, we change this name to "pk2". This is not a
functional change in the test.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
TWCS tables require partition estimation adjustment as incoming streaming data can be segregated into the time windows.
Turns out we had two problems in this area that lead to suboptimal bloom filters.
1) With off-strategy enabled, data segregation is postponed, but partition estimation was adjusted as if segregation wasn't postponed. Solved by not adjusting estimation if segregation is postponed.
2) With off-strategy disabled, data segregation is not postponed, but streaming didn't feed any metadata into partition estimation procedure, meaning it had to assume the max windows input data can be segregated into (100). Solved by using schema's default TTL for a precise estimation of window count.
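The two fixes can be sketched as below. This is a simplified illustration, not the streaming code: the function names are hypothetical, and the assumptions are that the adjustment divides the estimate by the expected number of windows and that a missing TTL falls back to the maximum of 100 windows mentioned above.

```python
import math

MAX_WINDOWS = 100  # worst-case window count mentioned in the text


def estimated_window_count(default_ttl_s: int, window_size_s: int) -> int:
    """Windows the incoming data can span, derived from the default TTL."""
    if default_ttl_s <= 0:
        return MAX_WINDOWS  # no TTL information: assume the worst case
    return min(MAX_WINDOWS, max(1, math.ceil(default_ttl_s / window_size_s)))


def adjusted_partition_estimate(partitions: int, default_ttl_s: int,
                                window_size_s: int,
                                segregation_postponed: bool) -> int:
    """Per-sstable partition estimate used to size the bloom filter."""
    if segregation_postponed:
        return partitions  # off-strategy: data is not split yet, keep as-is
    windows = estimated_window_count(default_ttl_s, window_size_s)
    return max(1, math.ceil(partitions / windows))
```

For example, with a 7-day default TTL and 1-day windows, 7000 incoming partitions would be estimated at 1000 per sstable instead of 70 under the worst-case assumption.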
For the future, we want to dynamically size filters (see https://github.com/scylladb/scylladb/issues/2024), especially for TWCS, which might have SSTables that are left uncompacted until they're fully expired, meaning that the system won't heal itself in a timely manner through compaction of an SSTable whose partition estimate was badly wrong.
Fixes https://github.com/scylladb/scylladb/issues/15704.
Closes scylladb/scylladb#15938
* github.com:scylladb/scylladb:
streaming: Improve partition estimation with TWCS
streaming: Don't adjust partition estimate if segregation is postponed
Compaction tasks which do not have a parent are abortable
through task manager. Their children are aborted recursively.
Compaction tasks of the lowest level are aborted using existing
compaction task executors stopping mechanism.
Closes scylladb/scylladb#16050
* github.com:scylladb/scylladb:
test: test abort of compaction task that isn't started yet
test: test running compaction task abort
tasks: fail if a task was aborted
compaction: abort task manager compaction tasks
Both existing test cases and the upcoming third one create the very
same ks.cf pair with the very same sequence of steps. Generalize them.
For the basic test case also tune up the way "expected" rows are
calculated -- now they are SELECT-ed right after insertion and the size
is checked to be non zero. Not _exactly_ the same check, but it's good
enough for basic testing purposes.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#15986
Boost.Test prints the LHS and RHS when the predicate statement passed
to BOOST_REQUIRE_EQUAL() macro evaluates to false. so the error message
printed by Boost would be more developer friendly when the test fails.
in this test, we replace some BOOST_REQUIRE() with BOOST_REQUIRE_EQUAL()
when appropriate.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16047
This short series fixes test/cql-pytest/test_permissions.py to stop failing on Cassandra.
The second patch fixes these failures (and explains why). The first patch is a new test for UDFs, which helped me prove that one of the test_permissions.py failures in Cassandra is a Cassandra bug: an esoteric error path that prints the right message when no permissions are involved prints the wrong one when permissions are added.
Fixes #15969
Closes scylladb/scylladb#15979
* github.com:scylladb/scylladb:
test/cql-pytest: fix test_permissions.py to not fail on Cassandra
test/cql-pytest: add test for DROP FUNCTION
This is a continuation of a34c8dc4 (Drop compaction_manager_for_testing).
There's one more wrapper over compaction_manager to access its private fields. All such access was recently moved to sstables::test_env's compaction manager, now it's time to drop the remaining legacy wrapper class.
Closes scylladb/scylladb#16017
* github.com:scylladb/scylladb:
test/utils: Drop compaction_manager_test
test/utils: Get compaction manager from test_env
test/sstables: Introduce test_env_compaction_manager::perform_compaction()
test/env: Add sstables::test_env& to compaction_manager_test::run()
test/utils: Add sstables::test_env& to compact_sstables()
test/utils: Simplify and unify compaction_manager_test::run()
test/utils: Squash two compact_sstables() helpers
test/compaction: Use shorter compact_sstables() helper
test/utils: Keep test task compaction gate on task itself
test/utils: Move compaction_manager_test::propagate_replacement()
Having values of the duration type is not allowed for clustering
columns, because duration can't be ordered. This is correctly validated
when creating a table but is not validated when we alter the type.
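Why a duration cannot be ordered can be seen from a small example. A CQL duration keeps months, days and nanoseconds as separate components, so comparing "1mo" with "30d" depends on which calendar month is meant. The helper below is a hypothetical illustration, not code from the patch.

```python
import calendar


def compare_one_month_to_30_days(year: int, month: int) -> int:
    """Return -1, 0 or 1 for '1mo' vs '30d' when the month is year/month."""
    month_days = calendar.monthrange(year, month)[1]
    return (month_days > 30) - (month_days < 30)
```

The same pair of values compares differently for February, April and January, so no single total order exists for clustering.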
Fixes #12913
Closes scylladb/scylladb#16022
Propagate `exceptions::unavailable_exception` error message to the client such as cqlsh.
Fixes #2339
Closes scylladb/scylladb#15922
* github.com:scylladb/scylladb:
test: add the auth_cluster test suite
auth: fix error message when consistency level is not met
the "task" fixture is supposed to return a task for test, if it
fails to do so, it would be an issue not directly related to
the test. so let's fail it early.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16042
This commit adds the auth_cluster test suite to test a custom scenario
involving password authentication:
- create a cluster of 2 nodes with password authentication
- down one node
- the other node should refuse login stating that it couldn't reach
QUORUM
References ScyllaDB OSS #2339
This class only provides a .run() method which allocates a task and
calls sstables::test_env::perform_compaction(). This can be done in a
helper method, no need for the whole class for it.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Take it from compaction_manager_test::run(), which is a simplified rewrite
of compaction_manager::perform_compaction().
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The method is the simplified rewrite of the compaction_manager's
perform_compaction() one, but it does task registration and
unregistration the hard way. Keep it shorter and simpler, resembling the
compaction_manager's prototype.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Now the one sitting in utils is only called from its peer in compaction
test. Things get simpler if they get merged.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
There are several of them spread between the test and utils. One of the
test cases can use its local shorter overload for brevity. Also this
makes one of the next patches shorter.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
They both have the same scope, but keeping it on the task frees the
caller from the need to mess with its private fields. For now it's not a
problem, but it will be critical in one of the next patches.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The purpose of this method is to turn public the private
compaction_manager method of the same name. The caller of this method
has sstable_test_env at hand with its test_env_compaction_manager, so
the de-private-isation call can be moved.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
We shouldn't have cql-pytest tests that report failure when run on
Cassandra (with test/cql-pytest/run-cassandra): A test that passes
on Scylla but fails on Cassandra indicates a *difference* between
Scylla's behavior and Cassandra's, and this difference should always
be investigated:
1. It can be a Scylla bug, which should be fixed immediately
or reported as a bug and the test changed to fail on Scylla ("xfail").
2. It can be a minor difference in Scylla's and Cassandra's
behavior where both can be accepted. In this case the test should
be modified to accept both behaviors, and a comment added to
explain why we decided to do that.
3. It can be a Cassandra bug which causes a correct test to fail.
This case should not be taken lightly, and a serious effort
is needed to be convinced that this is really a Cassandra bug
and not our misunderstanding of what Cassandra does. In
this case the test should be marked "cassandra_bug" and a
detailed comment should explain why.
4. Or it can be an outright bug in the test that caused it to fail
on Cassandra.
This test had most of these cases :-) There was a test bug in one place
(in a Cassandra-specific Java UDF), a minor and (arguably) acceptable
difference between the error codes returned by Scylla and Cassandra
in one case, and two minor Cassandra bugs (in the error path). All
of these are fixed here, and after this patch test/cql-pytest/run-cassandra
no longer fails on this file.
Fixes #15969
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
We already have in test/cql-pytest various tests for UDF in the bigger
context of UDA (test_uda.py), WASM (test_wasm.py) and permissions, but
somehow we never had a file for simple tests only for UDF, so we
add one here, test/cql-pytest/test_udf.py
We add a test for checking something which was already assumed in
test_permissions.py - that it is possible to create two different
UDFs with the same name and different parameters, and then you must
specify the parameters when you want to DROP one of them. The test
confirms that ScyllaDB's and Cassandra's behavior is identical in
this, as hoped.
To allow the test to run on both ScyllaDB and Cassandra, it needs to
support both Lua (for ScyllaDB) and Java (for Cassandra), and we introduce
a fixture to make it easier to support both. This fixture can later
be used in more tests added to this file.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>