To run with both vnodes and tablets. For this functionality, both
replication methods should be covered with tests, because it uses
different ways to produce partition lists, depending on the replication
method.
Also add scylla_only to those tests that were missing this fixture
before. All tests in this suite are scylla-only and with the
parameterization, this is even more apparent.
Given a list of partition-ranges, yields the intersection of this
range-list, with that of that tablet-ranges, for tablets located on the
given host.
This will be used in multishard_mutation_query.cc, to obtain the ranges
to read from the local node: given the read ranges, obtain the ranges
belonging to tablets who have replicas on the local node.
The current point variant cannot take inclusiveness into account, when
said point comes from another interval bound.
This method had no tests at all, so add tests covering both overloads.
range.hh was deprecated in bd794629f9 (2020) since its names
conflict with the C++ library concept of an iterator range. The name
::range also mapped to the dangerous wrapping_interval rather than
nonwrapping_interval.
Complete the deprecation by removing range.hh and replacing all the
aliases by the names they point to from the interval library. Note
this now exposes uses of wrapping intervals as they are now explicit.
The unit tests are renamed and range.hh is deleted.
Closesscylladb/scylladb#17428
It can happen that a node is lost during tablet migration involving that node. Migration will be stuck, blocking topology state machine. To recover from this, the current procedure is for the admin to execute nodetool removenode or replacing the node. This marks the node as "ignored" and tablet state machine can pick this up and abort the migration.
This PR implements the handling for streaming stage only and adds a test for it. Checking other stages needs more work with failure injection to inject failures into specific barrier.
To handle streaming failure two new stages are introduced -- cleanup_target and revert_migration. The former is to clean the pending replica that could receive some data by the time streaming stopped working, the latter is like end_migration, but doesn't commit the new_replicas into replicas field.
refs: #16527Closesscylladb/scylladb#17360
* github.com:scylladb/scylladb:
test/topology: Add checking error paths for failed migration
topology.tablets_migration: Handle failed streaming
topology.tablets_migration: Add cleanup_target transition stage
topology.tablets_migration: Add revert_migration transition stage
storage_service: Rewrap cleanup stage checking in cleanup_tablet()
test/topology: Move helpers to get tablet replicas to pylib
To allow to filter the returned keyspaces based by the replication they
use: tablets or vnodes.
The filter can be disabled by omitting the parameter or passing "all".
The default is "all".
Fixes: #16509Closesscylladb/scylladb#17319
so we exercise the cases where state and status are not "normal" and "up".
turns out the MBean is able to cache some objects. so the requets retrieving datacenter and rack are now marked `ANY`.
* filter out the requests whose `multiple` is `ANY`
* include the unconsumed requets in the raised `AssertionError`. this
should help with debugging.
Fixes#17401Closesscylladb/scylladb#17417
* github.com:scylladb/scylladb:
test/nodetool: parameterize test_ring
test/nodetool: fail a test only with leftover expected requests
For now only fail streaming stage and check that migration doesn't get
stuck and doesn't make tablet appear on dead node.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
so we exercise the cases where state and status are not "normal" and "up".
turns out the MBean is able to cache some objects. so the requets
retrieving datacenter and rack are now marked `ANY`.
Fixes#17401
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
if there are unconsumed requests whose `multiple` is -1, we should
not consider it a required, the test can consume it or not. but if
it does not, we should not consider the test a failure just because
these requests are sitting at the end of queue.
so, in this change, we
* filter out the requests whose `multiple` is `ANY`
* include the unconsumed requets in the raised `AssertionError`. this
should help with debugging.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
The mentioned test failed on CI. It sets up two nodes and performs
operations related to creation and dropping of tables as well as
moving tablets. Locally, the issue was not visible - also, the test
was passing on CI in majority of cases.
One of steps in the test case is intended to select the shard that
has some tablets on host_0 and then move them to (host_1, shard_3).
It contains also a precondition that requires the tablets count to
be greater than zero - to ensure, that move_tablets operation really
moves tablets.
The error message in the failed CI run comes from the precondition
related to tablets count on (host0, src_shard) - it was zero.
This indicated that there were no tablets on entire host_0.
The following commit removes the assumption about the existence of
tablets on host_0. In case when there are no tablets there, the
procedure is rerun for host_1.
Now the logic is as follows:
- find shard that has some tablets on host_0
- if such shard does not exist, then find such shard on host_1
- depending on the result of search set src/dest nodes
- verify that reported tablet count metric is changed when
move_tablet operation finishes
Refs: scylladb#17386
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
Closesscylladb/scylladb#17398
Before the patch if a decommissioned node tries
to restart, it calls _group0->discover_group0 first
in join_cluster, which hangs since decommissioned
nodes are banned and other nodes don't respond
to their discovering requests.
We fix the problem by checking was_decommissioned()
flag before calling discover_group0.
fixesscylladb/scylladb#17282Closesscylladb/scylladb#17358
This patch wires up tombstone_gc repair with tablet repair. The flush
hints logic from the vnode table repair is reused. The way to mark the
finish of the repair is also adjusted for tablet repair because it only
has one shard per tablet token range instead of smp::count shards.
Fixes: #17046
Tests: test_tablet_repair_history
Closesscylladb/scylladb#17047
* github.com:scylladb/scylladb:
repair: Update repair history for tablet repair
repair: Extract flush hints code
As usual, the new command is covered with tests, which pass with both the legacy and the new native implementation.
Refs: #15588Closesscylladb/scylladb#17368
* github.com:scylladb/scylladb:
tools/scylla-nodetool: implement the repair command
test/nodetool: utils: add check_nodetool_fails_with_error_contains()
test/nodetool: util: replace flags with custom matcher
This is mostly a refactoring commit to make the test
more readable, as a byproduct of
scylladb/scylladb#17369 investigation.
We add the check for specific type of exceptions that
can be thrown (bad_property_file_error).
We also fix the potential race - the test may write
to res from multiple cores with no locks.
Closesscylladb/scylladb#17371
This API endpoint was failing when tablets were enabled
because of usage of get_vnode_effective_replication_map().
Moreover, it was providing an error message that was not
user-friendly.
This change extends the handler to properly service the incoming requests.
Furthermore, it introduces two new test cases that verify the behavior of
storage_service/range_to_endpoint_map API. It also adjusts the test case
of this endpoint for vnodes to succeed when tablets are enabled by default.
The new logic is as follows:
- when tablets are disabled then users may query endpoints
for a keyspace or for a given table in a keyspace
- when tablets are enabled then users have to provide
table name, because effective replication map is per-table
When user does not provide table name when tablets are enabled
for a given keyspace, then BAD_REQUEST is returned with a
meaningful error message.
Fixes: scylladb#17343
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
Closesscylladb/scylladb#17372
Before this PR, writes to the previous CDC generations would
always be rejected. After this PR, they will be accepted if the
write's timestamp is greater than `now - generation_leeway`.
This change was proposed around 3 years ago. The motivation was
to improve user experience. If a client generates timestamps by
itself and its clock is desynchronized with the clock of the node
the client is connected to, there could be a period during
generation switching when writes fail. We didn't consider this
problem critical because the client could simply retry a failed
write with a higher timestamp. Eventually, it would succeed. This
approach is safe because these failed writes cannot have any side
effects. However, it can be inconvenient. Writing to previous
generations was proposed to improve it.
The idea was rejected 3 years ago. Recently, it turned out that
there is a case when the client cannot retry a write with the
increased timestamp. It happens when a table uses CDC and LWT,
which makes timestamps permanent. Once Paxos commits an entry
with a given timestamp, Scylla will keep trying to apply that entry
until it succeeds, with the same timestamp. Applying the entry
involves writing to the CDC log table. If it fails, we get stuck.
It's a major bug with an unknown perfect solution.
Allowing writes to previous generations for `generation_leeway` is
a probabilistic fix that should solve the problem in practice.
Apart from this change, this PR adds tests for it and updates
the documentation.
This PR is sufficient to enable writes to the previous generations
only in the gossiper-based topology. The Raft-based topology
needs some adjustments in loading and cleaning CDC generations.
These changes won't interfere with the changes introduced in this
PR, so they are left for a follow-up.
Fixesscylladb/scylladb#7251Fixesscylladb/scylladb#15260Closesscylladb/scylladb#17134
* github.com:scylladb/scylladb:
docs: using-scylla: cdc: remove info about failing writes to old generations
docs: dev: cdc: document writing to previous CDC generations
test: add test_writes_to_previous_cdc_generations
cdc: generation: allow increasing generation_leeway through error injection
cdc: metadata: allow sending writes to the previous generations
This patch wires up tombstone_gc repair with tablet repair. The flush
hints logic from the vnode table repair is reused. The way to mark the
finish of the repair is also adjusted for tablet repair because it only
has one shard per tablet token range instead of smp::count shards.
Fixes: #17046
Tests: test_tablet_repair_history
this change introduces a new exception which carries the status code
so that an operation can return a non-zero exit code without printing
any errors. this mimics the behavior of "viewbuildstatus" command of
C* nodetool.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#17359
_do_check_nodetool_fails_with() currently has a `match_all` flag to
control how the match is checked. Now we need yet another way to control
how matching is done. Instead of adding yet another flag (and who knows
how many more), jut replace the flag and the errors input with a matcher
functor, which gets the stdout and stderr and is delegated to do any
checks it wants. This method will scale much better going forward.
When a node changes IP address we need to remove its old IP from `system.peers` and gossiper.
We do this in `sync_raft_topology_nodes` when the new IP is saved into `system.peers` to avoid losing the mapping if the node crashes between deleting and saving the new IP. We also handle the possible duplicates in this case by dropping them on the read path when the node is restarted.
The PR also fixes the problem with old IPs getting resurrected when a node changes its IP address.
The following scenario is possible: a node `A` changes its IP from `ip1` to `ip2` with restart, other nodes are not yet aware of `ip2` so they keep gossiping `ip1`. After restart `A` receives `ip1` in a gossip message and calls `handle_major_state_change` since it considers it as a new node. Then `on_join` event is called on the gossiper notification handlers, we receive such event in `raft_ip_address_updater` and reverts the IP of the node A back to ip1.
To fix this we ensure that the new gossiper generation number is used when a node registers its IP address in `raft_address_map` at startup.
The `test_change_ip` is adjusted to ensure that the old IPs are properly removed in all cases, even if the node crashes.
Fixes#16886Fixes#16691Fixes#17199Closesscylladb/scylladb#17162
* github.com:scylladb/scylladb:
test_change_ip: improve the test
raft_ip_address_updater: remove stale IPs from gossiper
raft_address_map: add my ip with the new generation
system_keyspace::update_peer_info: check ep and host_id are not empty
system_keyspace::update_peer_info: make host_id an explicit parameter
system_keyspace::update_peer_info: remove any_set flag optimisation
system_keyspace: remove duplicate ips for host_id
system_keyspace: peers table: use coroutines
storage_service::raft_ip_address_updater: log gossiper event name
raft topology: ip change: purge old IP
on_endpoint_change: coroutinize the lambda around sync_raft_topology_nodes
This API endpoint currently returns with status 500 if attempted to be called for a table which uses tablets. This series adds tablet support. No change in usage semantics is required, the endpoint already has a table parameter.
This endpoint is the backend of `nodetool getendpoints` which should now work, after this PR.
Fixes: #17313Closesscylladb/scylladb#17316
* github.com:scylladb/scylladb:
service/storage_service: get_natural_endpoints(): add tablets support
replica/database: keyspace: add uses_tablets()
service/storage_service: remove token overload of get_natural_endpoints()
In this commit we refactor test_change_ip to improve
it in several ways:
* We inject failure before old IP is removed and verify
that after restart the node sees the proper peers - the
new IP for node2 and old IP for node3, which is not restarted
yet.
* We introduce the lambda wait_proper_ips, which checks not only the
system.peers table, but also gossiper and token_metadata.
* We call this lambda for all nodes, not only the first node;
this allows to validate that the node that has changed its
IP has the proper IP of itself in the data structures above.
Note that we need to inject an additional delay ip-change-raft-sync-delay
before old IP is removed. Otherwise the problem stop reproducing - other
nodes remove the old IP before it's send back to the just restarted node.
The following scenario is possible: a node A changes its IP
from ip1 to ip2 with restart, other nodes are not yet aware of ip2
so they keep gossiping ip1, after restart A receives
ip1 in a gossip message and calls handle_major_state_change
since it considers it as a new node. Then on_join event is
called on the gossiper notification handles, we receive
such event in raft_ip_address_updater and reverts the IP
of the node A back to ip1.
The essence of the problem is that we don't pass the proper
generation when we add ip2 as a local IP during initialization
when node A restarts, so the zero generation is used
in raft_address_map::add_or_update_entry and the gossiper
message owerwrites ip2 to ip1.
In this commit we fix this problem by passing the new generation.
To do that we move the increment_and_get_generation call
from join_token_ring to scylla_main, so that we have a new generation
value before init_address_map is called.
Also we remove the load_initial_raft_address_map function from
raft_group0 since it's redundant. The comment above its call site
says that it's needed to not miss gossiper updates, but
the function storage_service::init_address_map where raft_address_map
is now initialized is called before gossiper is started. This
function does both - it load the previously persisted host_id<->IP
mappings from system.local and subscribes to gossiper notifications,
so there is no room for races.
Note that this problem reproduces less likely with the
'raft topology: ip change: purge old IP' commit - other
nodes remove the old IP before it's send back to the
just restarted node. This is also the reason why this
problem doesn't occur in gossiper mode.
fixesscylladb/scylladb#17199
Alternator TTL doesn't yet work on tables using tablets (this is
issue #16567). Before this patch, it can be enabled on a table with
tablets, and the result is a lot of log spam and nothing will get expired.
So let's make the attempt to enable TTL on a table that uses tablets
into a clear error. The error message points to the issue, and also
suggests how to create a table that uses vnodes, not tablets.
This patch also adds a test that verifies that trying to enable TTL
with tablets is an error. Obviously, this test should be removed
once the issue is solved and TTL begins working with tablets.
Refs #16567
Refs #16807
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closesscylladb/scylladb#17306
A user asked on the ScyllaDB forum several questions on whether
tombstone_gc works on materialized views. This patch includes two
tests that confirm the following:
1. The tombstone_gc may be set on a view - either during its creation
with CREATE MATERIALIZED VIEW or later with ALTER MATERIALIZED VIEW.
2. The tombstone_gc setting is correctly shown - for both base tables
and views - by the "DESC" statement.
3. The tombstone_gc setting is NOT inherited from a base table to a new
view - if you want this option on a view, you need to set it
separately.
Unfortunately, this test could not be a single-node cql-pytest because
we forbid tombstone_gc=repair when RF=1, and since recently, we forbid
setting RF>1 on a single-node setup. So the new tests are written in
the test/topology framework - which may run multiple tests against
a single three-node cluster run multiple tests against it.
To write tests over a shared cluster, we need functions which create
temporary keyspaces, tables and views, which are deleted automatically
as soon as a test ends. The test/topology framework was lacking such
functions, so this tests includes them - currently inside the test
file, but if other people find them useful they can be moved to a more
central location.
The new functions, net_test_keyspace(), new_test_table() and
new_materialized_view() are inspired by the identically-named
functions in test/cql-pytest/util.py, but the implementation is
different: Importantly, the new functions here are *async*
context managers, used via "async with", to fit with the rest
of the asynchronous code used in the topology test framework.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closesscylladb/scylladb#17345
This series makes several changes to how ignored nodes list is treated
by the topology coordinator. First the series makes it global and not
part of a single topology operation, second it extends the list at the
time of removenode/replace invocation and third it bans all nodes in
the list from contacting the cluster ever again.
The main motivation is to have a way to unblock tablet migration in case
of a node failure. Tablet migration knows how to avoid nodes in ignored
nodes list and this patch series provides a way to extend it without
performing any topology operation (which is not possible while tables
migration runs).
Fixesscylladb/scylladb#16108
* 'gleb/ignore-nodes-handling-v2' of github.com:scylladb/scylla-dev:
test: add test for the new ignore nodes behaviour
topology coordinator: cleanup node_state::decommissioning state handling code
topology coordinator: ban ignored nodes just like we ban nodes that are left
storage_service: topology coordinator: validate ignore dead nodes parameters in removenode/replace
topology coordinator: add removed/replaced nodes to ignored_nodes list at the request invocation time
topology coordinator: make ignored_nodes list global and permanent
topology_coordinator: do not cancel rebuild just because some other nodes are dead
topology coordinator: throw more specific error from wait_for_ip() function in case of a timeout
raft_group0: add make_nonvoters function that can make multiple node non voters simultaneously
The test checks that once a node is specified in ignored node list by
one topology operation the information is carried over to the next
operation as well.
Since now a node that is at one point was marked as dead, either via
--ignore-dead-nodes parameter or by been a target for removenode or
replace, can no longer be made "undead" we need to make sure that they
cannot rejoin the cluster any longer. Do that by banning them on a
messaging layer just like we do for nodes that are left.
Not that the removenode failure test had to be altered since it restarted
a node after removenode failure (which now will not work). Also, since
the check for liveness was removed from the topology coordinator (because
the node is already banned by then), the test case that triggers the
removed code is removed as well.
All cql-pytest tests use one node, and unsuprisingly most use RF=1.
By default, as part of the "guardrails" feature, we print a warning
when creating a keyspace with RF=1. This warning gets printed on
every cql-pytest run, which creates a "boy who cried wolf" effect
whereby developers get used to seeing these warnings, and won't care
if new warnings start appearing.
The fix is easy - in run.py start Scylla with minimum-replication-factor-
warn-threshold set to -1 instead of the default 3.
Note that we do have cql-pytest tests for this guardrail, but those don't
rely on the default setting of this variable (they can't, cql-pytest
tests can also be run on a Scylla instance run manually by a developer).
Those tests temporarily set the threshold during the test.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closesscylladb/scylladb#17274
Adds a test reproducing https://github.com/scylladb/scylladb/issues/16759, and the instrumentation needed for it.
Closesscylladb/scylladb#17208
* github.com:scylladb/scylladb:
row_cache_test: test cache consistency during memtable-to-cache merge
row_cache: use preemption_source in update()
utils: preempt: add preemption_source
Alternator Streams doesn't yet work on tables using tablets (this is
issue #16317). Before this patch, an attempt to enable it results in
an unsightly InternalServerError, which isn't terrible - but we can
do better.
So in this patch, we make the attempt to enable Streams and tablets
together into a clear error. The error message points to the open issue,
and also suggests how to create a table that uses vnodes, not tablets.
Unfortunately, there are slightly two different code paths and error
messages for two cases: One case is the creation of a new table (where
the validation happens before the keyspace is actually created), and
the other case is an attempt to enable streams on an existing table
with an existing keyspace (which already might or might not be using
tablets).
This patch also adds a test that verifies that trying to enable Streams
with tablets is an error - in both cases (table creation and update).
Obviously, this test - and the validation code - should be removed once
the issue is solved and Alternator Streams begins working with tablets.
Fixes#16497
Refs #16807
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closesscylladb/scylladb#17311
Unfortunately, scylladb/python-driver#230 is not fixed yet, so it is
necessary for the sake of our CI's stability to re-create the driver
session after all nodes in the cluster are restarted.
There is one place in test_topology_recovery_basic where all nodes are
restarted but the driver session is not re-created. Even though nodes
are not restarted at once but rather sequentially, we observed a failure
with similar symptoms in a CI run for scylla-enterprise.
Add the missing driver reconnect as a workaround for the issue.
Fixes: scylladb/scylladb#17277Closesscylladb/scylladb#17278