This is a translation of Cassandra's CQL unit test source file
validation/operations/AlterTest.java into our our cql-pytest framework.
This test file includes 24 tests for various types of ALTER operations
(of keyspaces, tables and types). Two additional tests which required
multiple data centers to test were dropped with a comment explaining why.
All 24 tests pass on Cassandra, with 8 failing on Scylla reproducing
one already known Scylla issue and 5 previously-unknown ones:
Refs #8948: Cassandra 3.11.10 uses "class" instead of
"sstable_compression" for compression settings by default
Refs #9929: Cassandra added "USING TIMESTAMP" to "ALTER TABLE",
we didn't.
Refs #9930: Forbid re-adding static columns as regular and vice versa
Refs #9935: Scylla stores un-expanded compaction class name in system
tables.
Refs #10036: Reject empty options while altering a keyspace
Refs #10037: If there are multiple values for a key, CQL silently
chooses last value
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220206163820.1875410-2-nyh@scylladb.com>
Implement the nodetool.compact() function, requesting a major compaction
of the given table. As usual for the nodetool.* functions, this is
implemented with the REST API if available (i.e., testing Scylla), or
with the external "nodetool" command if not (for testing Cassandra).
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220206163820.1875410-1-nyh@scylladb.com>
Seastar uses POSIX IO for output in addition to C++ iostreams,
e.g. in print_safe(), where it write()s directly to stdout.
Instead of manipulating C++ output streams to reset
stdout/log files, reopen the underlying file descriptors
to output/log files.
Fixes#9962 "cql_repl prints junk into the log"
Message-Id: <20220204205032.1313150-1-kostja@scylladb.com>
Snapshot-ctl methods fetch information about snapshots from
column family objects. The problem with this is that we get rid
of these objects once the table gets dropped, while the snapshots
might still be present (the auto_snapshot option is specifically
made to create this kind of situation). This commit switches from
relying on column family interface to scanning every datadir
that the database knows of in search for "snapshots" folders.
This PR is a rebased version of #9539 (and slightly cleaned-up, cosmetically)
and so it replaces the previous PR.
Fixes#3463Closes#7122Closes#9884
* github.com:scylladb/scylla:
snapshots: Fix snapshot-ctl to include snapshots of dropped tables
table: snapshot: add debug messages
This series adds methods to perform offstrategy compaction, if needed, returning a future<bool>
so the caller can wait on it until compaction completes.
The returned value is true iff offstrategy compaction was needed.
The added keyspace_offstrategy_compaction calls perform_offstrategy_compaction on the specified keyspace and tables, return the number of tables that required offstrategy compaction.
A respective unit test was added to the rest_api pytest.
This PR replaces https://github.com/scylladb/scylla/pull/9095 that suggested adding an option to `keyspace_compaction`
since offstrategy compaction triggering logic is different enough from major compaction meriting a new api.
Test: unit (dev)
Closes#9980
* github.com:scylladb/scylla:
test: rest_api: add unit tests for keyspace_offstrategy_compaction api
api: add keyspace_offstrategy_compaction
compaction_manager: get rid of submit_offstrategy
table: add perform_offstrategy_compaction
compaction_manager: perform_offstrategy: print ks.cf in log messages
compaction_manager: allow waiting on offstrategy compaction
Snapshot-ctl methods fetch information about snapshots from
column family objects. The problem with this is that we get rid
of these objects once the table gets dropped, while the snapshots
might still be present (the auto_snapshot option is specifically
made to create this kind of situation). This commit switches from
relying on column family interface to scanning every datadir
that the database knows of in search for "snapshots" folders.
Fixes#3463Closes#7122Closes#9884
Signed-off-by: Piotr Wojtczak <piotr.m.wojtczak@gmail.com>
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Added test that checks if a SELECT COUNT(*) query was transformed and
processed in a parallel way. Checking is done by looking at the cql
statistics and comparing subsequent counts of parallelized aggregation
SELECT query executions.
When forwarding a reconfiguration request from follower to a leader in
`modify_config`, there is no reason to wait for the follower's commit
index to be updated. The only useful information is that the leader
committed the configuration change - so `modify_config` should return as
soon as we know that.
There is a reason *not* to wait for the follower's commit index to be
updated: if the configuration change removes the follower, the follower
will never learn about it, so a local waiter will never be resolved.
`execute_modify_config` - the part of `modify_config` executed on the
leader - is thus modified to finish when the configuration change is
fully complete (including the dummy entry appended at the end), and
`modify_config` - which does the forwarding - no longer creates a local
waiter, but returns as soon as the RPC call to the leader confirms that
the entry was committed on the leader.
We still return an `entry_id` from `execute_modify_config` but that's
just an artifact of the implementation.
Fixes#9981.
A regression test was also added in randomized_nemesis_test.
* kbr/modify-config-finishes-v1:
test: raft: randomized_nemesis_test: regression test for #9981
raft: server: don't create local waiter in `modify_config`
DynamoDB protocol specifies that when getting items in a batch
failed only partially, unprocessed keys can be returned so that
the user can perform a retry.
Alternator used to fail the whole request if any of the reads failed,
but right now it instead produces the list of unprocessed keys
and returns them to the user, as long as at least 1 read was
successful.
This series comes with a test based on Scylla's error injection mechanism, and thus is only useful in modes which come with error injection compiled in. In release mode, expect to see the following message:
SKIPPED (Error injection not enabled in Scylla - try compiling in dev/debug/sanitize mode)
Fixes#9984Closes#9986
* github.com:scylladb/scylla:
test: add total failure case for GetBatchItem
test: add error injection case for GetBatchItem
test: add a context manager for error injection to alternator
alternator: add error injection to BatchGetItem
alternator: fill UnprocessedKeys for failed batch reads
The test verifies that if all reads from a batch operation
failed, the result is an error, and not a success response
with UnprocessedKeys parameter set to all keys.
With the new context manager it's now easier to request an error
to be injected via REST API. Note that error injection is only
enabled in certain build modes (dev, debug, sanitize)
and the test case will be skipped if it's not possible to use
this mechanism.
Schema changes on top of Raft do not allow concurrent changes.
If two changes are attempted concurrently, one of them gets
`group0_concurrent_modification` exception.
Catch the exception in CQL DDL statement execution function and retry.
In addition, improve the description of CQL DDL statements
in group 0 history table.
Add a test which checks that group 0 history grows iff a schema change does
not throw `group0_concurrent_modification`. Also check that the retry
mechanism works as expected.
* kbr/ddl-retry-v1:
test: unit test for group 0 concurrent change protection and CQL DDL retries
cql3: statements: schema_altering_statement: automatically retry in presence of concurrent changes
Raft randomized nemesis test was improved by adding some more
chaos: randomizing the network delay, server configuration,
ticking speed of servers.
This allowed to catch a serious bug, which is fixed in the first patch.
The patchset also fixes bugs in the test itself and adds quality of life
improvements such as better diagnostics when inconsistency is detected.
* kbr/nemesis-random-v1:
test: raft: randomized_nemesis_test: print state of each state machine when detecting inconsistency
test: raft: randomized_nemesis_test: print details when detecting inconsistency
test: raft: randomized_nemesis_test: print snapshot details when taking/loading snapshots in `impure_state_machine`
test: raft: randomized_nemesis_test: keep server id in impure_state_machine
test: raft: randomized_nemesis_test: frequent snapshotting configuration
test: raft: randomized_nemesis_test: tick servers at different speeds in generator test
test: raft: randomized_nemesis_test: simplify ticker
test: raft: randomized_nemesis_test: randomize network delay
test: raft: randomized_nemesis_test: fix use-after-free in `environment::crash()`
test: raft: randomized_nemesis_test: fix use-after-free in two-way rpc functions
test: raft: randomized_nemesis_test: rpc: don't propagate `gate_closed_exception` outside
test: raft: randomized_nemesis_test: fix obsolete comment
raft: fsm: print configuration entries appearing in the log
raft: `operator<<(ostream&, ...)` implementation for `server_address` and `configuration`
raft: server: abort snapshot applications before waiting for rpc abort
raft: server: logging fix
raft: fsm: don't advance commit index beyond matched entries
Fast forwarding is delegated to the underlying reader and assumes the
it's supported. The only corner case requiring special handling that has
shown up in the tests is producing partition start mutation in the
forwarding case if there are no other fragments.
compacting state keeps track of uncompacted partition start, but doesn't
emit it by default. If end of stream is reached without producing a
mutation fragment, partition start is not emitted. This is invalid
behaviour in the forwarding case, so I've added a public method to
compacting state to force marking partition as non-empty. I don't like
this solution, as it feels like breaking an abstraction, but I didn't
come across a better idea.
Tests: unit(dev, debug, release)
Message-Id: <20220128131021.93743-1-mikolaj.sieluzycki@scylladb.com>
Improve the comment that explains why we needed to use an explicitly
shared random sequence instead of the usual "random". We now understand
that we need this workaround to undo what the pytest-randomly plugin does.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220130155557.1181345-1-nyh@scylladb.com>
Some of the tests in test/cql-pytest share the same table but use
different keys to ensure they don't collide. Before this patch we used a
random key, which was usually fine, but we recently noticed that the
pytest-randomly plugin may cause different tests to run through the *same*
sequence of random numbers and ruin our intent that different tests use
different keys.
So instead of using a *random* key, let's use a *unique* key. We can
achieve this uniqueness trivially - using a counter variable - because
anyway the uniqueness is only needed inside a single temporary table -
which is different in every run.
Another benefit is that it will now be clearer that the tests are
deterministic and not random - the intent of a random_string() key
was never to randomly walk the entire key space (random_string()
anyway had a pretty narrow idea of what a random string looks like) -
it was just to get a unique key.
Refs #9988 (fixes it for cql-pytest, but not for test/alternator)
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
If memory reclamation is triggered inside _cache.emplace(), the _cache
btree can get corrupted. Reclaimers erase from it, and emplace()
assumes that the tree is not modified during its execution. It first
locates the target node and then does memory allocation.
Fix by running emplace() under allocating section, which disables
memory reclamation.
The bug manifests with assert failures, e.g:
./utils/bptree.hh:1699: void bplus::node<unsigned long, cached_file::cached_page, cached_file::page_idx_less_comparator, 12, bplus::key_search::linear, bplus::with_debug::no>::refill(Less) [Key = unsigned long, T = cached_file::cached_page, Less = cached_file::page_idx_less_comparator, NodeSize = 12, Search = bplus::key_search::linear, Debug = bplus::with_debug::no]: Assertion `p._kids[i].n == this' failed.
Fixes#9915
Message-Id: <20220130175639.15258-1-tgrabiec@scylladb.com>
It was observed (perhaps it depends on the Python implementation)
that an identical seed was used for multiple test cases,
which violated the assumption that generated values are in fact
unique. Using a global generator instead makes sure that it was
only seeded once.
Tests: unit(dev) # alternator tests used to fail for me locally
before this patch was applied
Message-Id: <315d372b4363f449d04b57f7a7d701dcb9a6160a.1643365856.git.sarna@scylladb.com>
Check that group 0 history grows iff a schema change does not throw
`group0_concurrent_modification`. Check that the CQL DDL statement retry
mechanism works as expected.
When performing a change through group 0 (which right now means schema
changes), clear entries from group 0 history table which are older
than one week.
This is done by including an appropriate range tombstone in the group 0
history table mutation.
* kbr/g0-history-gc-v2:
idl: group0_state_machine: fix license blurb
test: unit test for clearing old entries in group0 history
service: migration_manager: clear old entries from group 0 history when announcing
Previously all servers were ticked at the same moment, every 10
network/timer ticks.
Now we tick each server with probability 1/10 on each network/timer
tick. Thus, on average, every server is ticked once per 10 ticks.
But now we're able to obtain more interesting behaviors.
E.g. we can now observe servers which are stalling for as long as 10 ticks
and servers which temporarily speed up to tick once per each network tick.
Instead of taking a set of functions with different periods, take a
single function that is called on every tick. The periodicity can be
implemented easily on the user side.
The lambda attached to `_crash_fiber` was a coroutine. The coroutine
would use `this` captured by the lambda after the `co_await`, where the
lambda object (hence its captures) was already destroyed.
No idea why it worked before and sanitizers did not complain in debug
mode.
Two-way RPC functions such as `send_snapshot` had a guard object which
was captured in a lambda passed to `with_gate`. The guard object, on
destruction, accessed the `rpc` object. Unfortunately, the guard object
could outlive the `rpc` object. That's because the lambda, and hence the
guard object, was destroyed after `with_gate` finished (it lived in the
frame of the caller of `with_gate`, i.e. `send_snapshot` and others),
so it could be destroyed after `rpc` (the gate prevents `rpc` from being
destroyed).
Make sure that the guard object is destroyed before `with_gate` finishes
by creating it inside the lambda body - not capturing inside the object.
The `raft::rpc` interface functions are called by `raft::server_impl`
and the exceptions may be propagated outside the server, e.g. through
the `add_entry` API.
Translate the internal `gate_closed_exception` to an external
`raft::stopped_error`.
Useful for debugging.
Had to make `configuration` constructor explicit. Otherwise the
`operator<<` implementation for `configuration` would implicitly convert
the `server_address` to `configuration` when trying to output it, causing
infinite recursion.
Removed implicit uses of the constructor.
Raft does not need to persist the commit index since a restarted node will
either learn it from an append message from a leader or (if entire cluster
is restarted and hence there is no leader) new leader will figure it out
after contacting a quorum. But some users may want to be able to bring
their local state machine to a state as up-to-date as it was before restart
as soon as possible without any external communication.
For them this patch introduces new persistence API that allows saving
and restoring last seen committed index.
Message-Id: <YfFD53oS2j1My0p/@scylladb.com>
We perform a bunch of schema changes with different values of
`migration_manager::_group0_history_gc_duration` and check if entries
are cleared according to this setting.
The compactor recently acquired the ability to consume a v2 stream. The
v2 spec requires that all streams end with a null tombstone.
`range_tombstone_assembler`, the component the compactor uses for
converting the v2 input into its v1 output enforces this with a check on
`consume_end_of_partition()`. Normally the producer of the stream the
compactor is consuming takes care of closing the active tombstone before
the stream ends. The compactor however (or its consumer) can decide to
end the consume early, e.g. to cut the current page. When this happens
the compactor must take care of closing the tombstone itself.
Furthermore it has to keep this tombstone around to re-open it on the
next page.
This patch implements this mechanism which was left out of 134601a15e.
It also adds a unit test which reproduces the problems caused by the
missing mechanism.
The compactor now tracks the last clustering position emitted. When the
page ends, this position will be used as the position of the closing
range tombstone change. This ensures the range tombstone only covers the
actually emitted range.
Fixes: #9907
Tests: unit(dev), dtest(paging_test.py, paging_additional_test.py)
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20220114053215.481860-1-bdenes@scylladb.com>
`announce` now takes a `group0_guard` by value. `group0_guard` can only
be obtained through `migration_manager::start_group0_operation` and
moved, it cannot be constructed outside `migration_manager`.
The guard will be a method of ensuring linearizability for group 0
operations.
The functions which prepare schema change mutations (such as
`prepare_new_column_family_announcement`) would use internally
generated timestamps for these mutations. When schema changes are
managed by group 0 we want to ensure that timestamps of mutations
applied through Raft are monotonic. We will generate these timestamps at
call sites and pass them into the `prepare_` functions. This commit
prepares the APIs.
Without this fix, generate_random_content could generate 0 entries
and the expected exception would never be injected.
With it, we generate at least 1 entry and the test passes
with the offending random-seed:
```
random-seed=1898914316
Generated 1 dir entries
Aborting lister after 1 dir entries
test/boost/lister_test.cc(96): info: check 'exception "expected_exception" raised as expected' has passed
```
Fixes#9953
Test: lister_test.test_lister_abort --random-seed=1898914316(dev)
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20220123122921.14017-1-bhalevy@scylladb.com>
Take into account that get_reshaping_job selects only
buckets that have more than min_threashold sstables in them.
Therefore, with 256 disjoint sstables in different windows,
allow first or last windows to not be selected by get_reshaping_job
that will return at least disjoint_sstable_count - min_threshold + 1
sstables, and not more than disjoint_sstable_count.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20220123090044.38449-2-bhalevy@scylladb.com>
Base tables that use compact storage may have a special artificial
column that has an empty type.
c010cefc4d fixed the main CDC path to
handle such columns correctly and to not include them in the CDC Log
schema.
This patch makes sure that generation of preimage ignores such empty
column as well.
Fixes#9876Closes#9910
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
This pull request fixes two preexisting issues related to snapshot_ctl::true_snapshots_size
https://github.com/scylladb/scylla/issues/9897https://github.com/scylladb/scylla/issues/9898
And adds a couple unit tests to tests the snapshot_ctl functionality.
Test: unit(dev), database_test.{test_snapshot_ctl_details,test_snapshot_ctl_true_snapshots_size}(debug)
Closes#9899
* github.com:scylladb/scylla:
table: get_snapshot_details: count allocated_size
snapshot_ctl: cleanup true_snapshots_size
snpashot_ctl: true_snapshots_size: do not map_reduce across all shards
snapshot_ctl uses map_reduce over all database shards,
each counting the size of the snapshots directory,
which is shared, not per-shard.
So the total live size returned by it is multiples by the number of shards.
Add a unit test to test that.
Fixes#9897
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Fixes#9922
storage proxy uses is_timeout_exception to traverse different code paths.
a6202ae079 broke this (because bit rot and
intermixing), by wrapping exception for information purposes.
This adds check of nested types in exception handling, as well as a test
for the routine itself.
Closes#9932
* github.com:scylladb/scylla:
database/storage_proxy: Use "is_timeout_exception" instead of catch match
utils::is_timeout_exception: Ensure we handle nested exception types