In the following commit, we add a test that needs to block the CDC
generation publisher's loop twice. We allow it in this commit by
making handlers of the `cdc_generation_publisher_fiber` injection
share messages. From now on, unblocking every step of the loop will
require sending a new message from the test.
This change breaks the test already using the
`cdc_generation_publisher_fiber` injection, so we adjust the test.
For a single injection, all created injection handlers share all
received messages. In particular, it means that one received message
unblocks all handlers waiting for the first message. This behavior
is often desired, for example, if multiple fibers execute the
injected code and we want to unblock them all with a single message.
However, there is a problem if we want to block every execution
of the injected code. Apart from the first created handler, all
handlers will be instantly unblocked by messages from the past that
have already unblocked the first handler.
In one of the following commits, we add a test that needs to block
the CDC generation publisher's loop twice. Since it looks like there
are no good workarounds for this arguably general problem, we extend
injections with handlers in a way that solves it. We introduce the
new `share_messages` parameter. Depending on its value, handlers
will share messages or not. The details are described in the new
comments in `error_injection.hh`.
We also add some basic unit tests for the new functionality.
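The difference between the two modes can be illustrated with a toy model. This is a sketch only, not the code in `error_injection.hh`: `toy_injection`, `toy_handler` and `common_read_position` are hypothetical names, and how the real `share_messages` values map onto the two modes is defined by the comments in `error_injection.hh`.

```cpp
#include <cassert>
#include <cstddef>

// Toy model of one injection: counts messages sent by the test.
struct toy_injection {
    std::size_t received = 0;  // messages sent so far
    std::size_t consumed = 0;  // read position common to all handlers
    void send_message() { ++received; }
};

// Toy model of one handler created for the injection.
struct toy_handler {
    toy_injection& inj;
    bool common_read_position;  // hypothetical switch between the two modes
    std::size_t own_read = 0;   // private read position, starts at the beginning

    // Returns true if a message was available, i.e. the handler would be unblocked.
    bool try_consume() {
        std::size_t& pos = common_read_position ? inj.consumed : own_read;
        if (pos < inj.received) {
            ++pos;
            return true;
        }
        return false;
    }
};
```

With private read positions, a handler created after a message was sent still sees that message and is unblocked instantly. With a common read position, a message consumed by one handler is gone for the later ones, so every step has to be unblocked by a new message.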
sstables::test_env is intended for sstable unit tests, but to satisfy its
dependency of an sstables_registry we instantiate an entire database.
Remove the dependency by having a mock implementation of sstables_registry
and using that instead.
Closes scylladb/scylladb#17895
If there is a bug in the tablet scheduler which makes it never
converge for a given state of topology, rebalance_tablets() will never
complete and will generate a huge amount of logs. This patch adds a
sanity limit so that we fail earlier.
This was observed in one of the test_load_balancing_with_random_load runs in CI.
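A minimal sketch of such a sanity limit, assuming a generic convergence loop. `run_until_converged`, `step` and `max_iterations` are hypothetical names for illustration and are not taken from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <stdexcept>

// Repeats `step` until it reports convergence and returns the number of
// iterations used. Gives up with an exception after `max_iterations`
// instead of looping (and logging) forever.
inline std::size_t run_until_converged(const std::function<bool()>& step,
                                       std::size_t max_iterations) {
    for (std::size_t i = 1; i <= max_iterations; ++i) {
        if (step()) {
            return i;
        }
    }
    throw std::runtime_error("did not converge within the iteration limit");
}
```

Failing with an exception turns a silent hang into a test failure that points at the non-converging state.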
Fixes scylladb/scylladb#17894.
Closes scylladb/scylladb#17916
The series marks nodes to be non expiring in the address map earlier, when
they are placed in the topology.
Fixes: scylladb/scylladb#16849
* 'gleb/16849-fix-v2' of github.com:scylladb/scylla-dev:
test: add test to check that address cannot expire between join request placement and its processing
topology_coordinator: set address map entry to nonexpiring when a node is added to the topology
raft_group0: add modifiable_address_map() function
before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter.
in this change, we define formatters for
* raft_call
* raft_read
* network_majority_grudge
* reconfiguration
* stop_crash
* operation::thread_id
* append_seq
* AppendReg::append
* AppendReg::ret
* operation::either_of<Ops...>
* operation::exceptional_result<Op>
* operation::completion<Op>
* operation::invocable<Op>
and drop their operator<<:s.
in which,
* `operator<<` for append_entry is never used. so it is removed.
* `operator<<` for `std::monostate` and `std::variant` are dropped. as we are now using their counterparts in {fmt}.
* stop_crash::result_type 's `fmt::formatter` is not added, as we cannot define a partial specialization of `fmt::formatter` for a nested class for a template class. we will tackle this struct in another change.
Refs #13245
Closes scylladb/scylladb#17884
* github.com:scylladb/scylladb:
test: raft: generator: add fmt::formatter:s
test: randomized_nemesis_test: add fmt::formatter for some types
test: randomized_nemesis_test: add fmt::formatter for seastar::timed_out_error
raft: add fmt::formatter for error classes
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
* operation::either_of<Ops...>
* operation::exceptional_result<Op>
* operation::completion<Op>
* operation::invocable<Op>
and drop their operator<<:s.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
* raft_call
* raft_read
* network_majority_grudge
* reconfiguration
* stop_crash
* operation::thread_id
* append_seq
* append_entry
* AppendReg::append
* AppendReg::ret
and drop their operator<<:s.
in which,
* `operator<<` for `std::monostate` and `std::variant` are dropped.
as we are now using their counterparts in {fmt}.
* stop_crash::result_type 's `fmt::formatter` is not added, as we
cannot define a partial specialization of `fmt::formatter` for
a nested class for a template class. we will tackle this struct
in another change.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatter for `seastar::timed_out_error`,
which will be used by the `fmt::formatter` for `std::variant<...>`.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
The token ring table is a virtual table (`system.token_ring`), which contains the ring information for all keyspaces in the system. This is essentially an alternative to `nodetool describering`, but since it is a virtual table, it allows for all the usual filtering/aggregation/etc. that CQL supports.
Up until now, this table only supported keyspaces which use vnodes. This PR adds support for tablet keyspaces. To accommodate these keyspaces a new `table_name` column is added, which is set to `ALL` for vnodes keyspaces. For tablet keyspaces, this contains the name of the table.
Simple sanity tests are added for this virtual table (it had none).
Fixes: #16850
Closes scylladb/scylladb#17351
* github.com:scylladb/scylladb:
test/cql-pytest: test_virtual_tables: add test for token_ring table
db/virtual_tables: token_ring_table: add tablet support
db/virtual_tables: token_ring_table: add table_name column
db/virtual_tables: token_ring_table: extract ring emit
service/storage_service: describe_ring_for_table(): use topology to map hostid to ip
In topology on raft, management of CDC generations is moved to the topology coordinator.
We need to verify that CDC keeps working correctly during the upgrade to topology on raft.
A similar change will be made in the topology recovery test. It will reuse
the `start_writes_to_cdc_table` function.
Ref #17409
Closes scylladb/scylladb#17828
Affects load-and-stream for tablets only.
The intention is that only this loop is responsible for detecting
exhausted sstables and then discarding them for next iterations:
    while (sstable_it != _sstables.rend() && exhausted(*sstable_it)) {
        sstable_it++;
    }
But the loop which consumes non-exhausted sstables, on behalf of
each tablet, was incorrectly advancing the iterator, even though the
sstable wasn't considered exhausted.
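The intended split between the two loops can be sketched on a toy model. This is an illustration under assumptions, not the load-and-stream code: `token_range` and `count_overlaps` are hypothetical, tablets are visited in ascending order, and `sstables` is assumed to be sorted by `last` in descending order so that reverse iteration visits ascending `last`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct token_range {
    int first;
    int last;
};

// For each tablet, counts the sstables that overlap it.
inline std::vector<std::size_t> count_overlaps(const std::vector<token_range>& sstables,
                                               const std::vector<token_range>& tablets) {
    std::vector<std::size_t> counts;
    auto sstable_it = sstables.rbegin();
    for (const token_range& tablet : tablets) {
        auto exhausted = [&](const token_range& sst) { return sst.last < tablet.first; };
        // The only place where the persistent iterator is advanced.
        while (sstable_it != sstables.rend() && exhausted(*sstable_it)) {
            sstable_it++;
        }
        std::size_t n = 0;
        // Consume on behalf of this tablet using a copy of the iterator, so
        // an sstable spanning several tablets stays visible to the next one.
        for (auto it = sstable_it; it != sstables.rend(); ++it) {
            if (it->first <= tablet.last) {
                ++n;
            }
        }
        counts.push_back(n);
    }
    return counts;
}
```

In this model, advancing `sstable_it` inside the second loop would hide an sstable that spans two tablets from the second tablet.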
Fixes #17733.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closes scylladb/scylladb#17899
Fixes #17569
Tests do not close their file descriptors after they finish. This makes it impossible to continue testing, since the default limit on open files in Linux is 1024. The issue is easy to reproduce with the following command:
```
$ ./test.py --mode debug test_native_transport --repeat 1500
```
With the fix applied, all tests pass with the following command:
```
$ ./test.py --mode debug test_native_transport --repeat 10000
```
Closes scylladb/scylladb#17798
sstables_manager now depends on system_keyspace for access to the
system.sstables table, needed by object storage. This violates
modularity, since sstables_manager is a relatively low-level leaf
module while system_keyspace integrates large parts of the system
(including, indirectly, sstables_manager).
One area where this is grating is sstables::test_env, which has
to include the much higher level cql_test_env to accommodate it.
Fix this by having sstables_manager expose its dependency on
system_keyspace as an interface, sstables_registry, and have
system_keyspace implement the glue logic in
system_keyspace_sstables_manager.
Closes scylladb/scylladb#17868
Lots of BOOST_REQUIRE-s in this test require some integers to be in some
eq/gt/le relations to each other, and one place compares rack names
as strings. Using more verbose boost checkers is preferred in such cases.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#17866
This PR fixes a problem with replacing a node with tablets when
RF=N. Currently, this will fail because tablet replica allocation for
rebuild will not be able to find a viable destination, as the replacing node
is not considered to be a candidate. It cannot be a candidate because
replace rolls back on failure and we cannot roll back after tablets
were migrated.
The solution taken here is to not drain tablet replicas from replaced
node during topology request but leave it to happen later after the
replaced node is in left state and replacing node is in normal state.
The replacing node waits for this draining to be complete on boot
before the node is considered booted.
Fixes https://github.com/scylladb/scylladb/issues/17025
Nodes in the left state will be kept in tablet replica sets for a while after node
replace is done, until the new replica is rebuilt. So we need to know
about those nodes' location (dc, rack) for two reasons:
1) algorithms which work with replica sets filter nodes based on their location. For example materialized views code which pairs base replicas with view replicas filters by datacenter first.
2) tablet scheduler needs to identify each node's location in order to make decisions about new replica placement.
It's ok to not know the IP, and we don't keep it. Those nodes will not
be present in the IP-based replica sets, e.g. those returned by
get_natural_endpoints(), only in host_id-based replica
sets. storage_proxy request coordination is not affected.
Nodes in the left state are still not present in token ring, and not
considered to be members of the ring (datacenter endpoints excludes them).
In the future we could make the change even more transparent by only
loading locator::node* for those nodes and keeping node* in tablet replica sets.
Currently left nodes are never removed from topology, so will
accumulate in memory. We could garbage-collect them from topology
coordinator if a left node is absent in any replica set. That means we
need a new state - left_for_real.
Closes scylladb/scylladb#17388
* github.com:scylladb/scylladb:
test: py: Add test for view replica pairing after replace
raft, api: Add RESTful API to query current leader of a raft group
test: test_tablets_removenode: Verify replacing when there is no spare node
doc: topology-on-raft: Document replace behavior with tablets
tablets, raft topology: Rebuild tablets after replacing node is normal
tablets: load_balancer: Access node attributes via node struct
tablets: load_balancer: Extract ensure_node()
mv: Switch to using host_id-based replica set
effective_replication_map: Introduce host_id-based get_replicas()
raft topology: Keep nodes in the left state to topology
tablets: Introduce read_required_hosts()
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
* test_data in two different tests
* row_cache_stress_test::reader_id
and drop its operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17861
For now test is incomplete in several ways
1. It xfails, until #17116
2. It doesn't rebuild/repair tablets
3. It doesn't check that tablet data actually exists on replicas
refs: #17575
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#17808
It was observed that some use cases might append old data constantly to
memtable, blocking GC of expired tombstones.
That's because timestamp of memtable is unconditionally used for
calculating max purgeable, even when the memtable doesn't contain the
key of the tombstone we're trying to GC.
The idea is to treat memtable as we treat L0 sstables, i.e. it will
only prevent GC if it contains data that is possibly shadowed by the
expired tombstone (after checking for key presence and timestamp).
Memtable will usually have a small subset of keys in largest tier,
so after this change, a large fraction of keys containing expired
tombstones can be GCed when memtable contains old data.
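A minimal sketch of the rule described above, on a toy model. `toy_source` and `can_purge` are hypothetical names, a source stands for either a memtable or an sstable, and the real code works with partition keys and its own timestamp bookkeeping rather than string sets.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <string>
#include <vector>

// Toy stand-in for a memtable or an sstable.
struct toy_source {
    std::set<std::string> keys;  // keys present in this source
    std::int64_t min_timestamp;  // oldest write timestamp in this source
};

// A tombstone for `key` may be purged unless some source both contains the
// key and holds data old enough to be possibly shadowed by the tombstone.
inline bool can_purge(const std::vector<toy_source>& sources,
                      const std::string& key,
                      std::int64_t tombstone_timestamp) {
    for (const toy_source& s : sources) {
        if (s.keys.count(key) != 0 && s.min_timestamp <= tombstone_timestamp) {
            return false;
        }
    }
    return true;
}
```

Here a memtable that holds old data only for other keys no longer blocks GC of a tombstone whose key it does not contain.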
Fixes #17599.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closes scylladb/scylladb#17835
Fix writing cassandra-rackdc.properties with correct format data instead of yaml
Add a parameter to overwrite RF for specific DC
Add the possibility to connect cql to the specific node
This PR adds 4 tests for multi-DC functionality. One comes from the initial commit where the multi-DC possibility was introduced; however, that test was never committed. The other three are migrated from dtest, where they will later be deleted. To be able to execute the migrated tests, additional functionality is added: the ability to connect cql to a specific node in the cluster instead of using a pooled connection, and the possibility to overwrite the replication factor for a specific DC. To be able to use multi-DC in test.py, the incorrect format of the properties file is also fixed in this PR.
Closes scylladb/scylladb#17503
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for `perf_result_with_aio_writes`,
and drop its operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17849
Newly joining nodes may not have a host id yet. Handle this and print a
"?" for these nodes, instead of the host-id.
Extend the existing test for joining node case (also rename it and add
comment).
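A minimal sketch of the fallback, assuming the host id is modelled as an optional string. `format_host_id` is a hypothetical helper, not the function touched by this patch.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Returns the host id for printing, or "?" when the node has none yet,
// as is the case for a newly joining node.
inline std::string format_host_id(const std::optional<std::string>& host_id) {
    return host_id.value_or("?");
}
```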
Closes scylladb/scylladb#17853
test/raft/replication.cc defines a symbol named `tlogger`, while
test/raft/randomized_nemesis_test.cc also defines a symbol with
the same name. when linking the test with mold, it identified the ODR
violation.
in this change, we extract test-raft-helper out, so that
randomized_nemesis_test can selectively only link against this library.
this also matches with the behavior of the rules generated by `configure.py`.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17836
in gossiping_property_file_snitch_test, we use
`BOOST_REQUIRE_EQUAL(dc_racks[i], dc_racks[0])` to check the equality
of two instances of `pair<sstring, sstring>`, like:
```c++
BOOST_REQUIRE_EQUAL(dc_racks[i], dc_racks[0])
```
since the standard library does not provide the formatter for printing
`std::pair<>`, we rely on the homebrew generic formatter to
print `std::pair<>`, which in turn uses operator<< to format the
elements in the `pair`, but we intend to remove this formatter
in future, as the last step of #13245 .
so in order to enable Boost.test to print out lhs and rhs when
`BOOST_REQUIRE_EQUAL` check fails, we are adding
`boost_test_print_type()` for `pair<sstring,sstring>`. the helper
function uses {fmt} to print the `pair<>`.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17831
The assert_that_failed(future) pair of helpers are templates with
variadic futures, but since those are gone from seastar, they should be
gone from test/lib as well.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#17830
Empty histograms are missing some of the members that non-empty
histograms have. The code handling these histograms assumed all required
members are always present and thus errored out when receiving an empty
histogram.
Add tests for empty histograms and fix the code handling them to check
for the potentially missing members, instead of making assumptions.
Closes scylladb/scylladb#17816
The test is changed to be more strict. Verifies the case of replacing
when RF=N in which case tablet replicas have to be rebuilt using the
replacing node.
This would fail if tablets are drained as part of replace operation,
since replacing node is not yet a viable target for tablet migration.
Fix aiohttp usage issue in python 3.12:
"Timeout context manager should be used inside a task"
This occurs because UnixRESTClient is created in one event loop (created
inside pytest) but used in another (created in the rewritten event_loop
fixture). It is fixed by updating the UnixRESTClient object for every new
loop.
Closes scylladb/scylladb#17760
Those nodes will be kept in tablet replica sets for a while after node
replace is done, until the new replica is rebuilt. So we need to know
about those nodes' location (dc, rack) for two reasons:
1) algorithms which work with replica sets filter nodes based on
their location. For example materialized views code which pairs base
replicas with view replicas filters by datacenter first.
2) tablet scheduler needs to identify each node's location in order
to make decisions about new replica placement.
It's ok to not know the IP, and we don't keep it. Those nodes will not
be present in the IP-based replica sets, e.g. those returned by
get_natural_endpoints(), only in host_id-based replica
sets. storage_proxy request coordination is not affected.
Nodes in the left state are still not present in token ring, and not
considered to be members of the ring (datacenter endpoints excludes them).
In the future we could make the change even more transparent by only
loading locator::node* for those nodes and keeping node* in tablet
replica sets.
We load topology information only for left nodes which are actually
referenced by any tablet. To achieve that, topology loading code
queries system.tablet for the set of hosts. This set is then passed to
system.topology loading method which decides whether to load
replica_state for a left node or not.
Will be used by topology loading code to determine which hosts are
needed in topology, even if they're in the left state. We want to load
only left nodes if they are referenced by any tablet, which may happen
temporarily until the replacement replica is rebuilt.
As the first clustering column. For vnode keyspaces, this will always be
"ALL", for tablet keyspaces, this will contain the name of the described
table.
Remove an unused function from test/cql-pytest/test_using_timeout.py.
Some linters can complain that this function used re.compile(), but
the "re" package was never imported. Since this function isn't used,
the right fix is to remove it - and not add the missing import.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#17801
before this change, we rely on the homebrew generic formatter to
print unordered_set<>, which in turn uses operator<< to format the
elements in the `unordered_set`, but we intend to remove this formatter
in future, as the last step of #13245 .
so, to enable Boost.test to print out lhs and rhs when a
`BOOST_REQUIRE_EQUAL` check fails, we are adding
`boost_test_print_type()` for `unordered_set<fruit>`. the helper function
uses {fmt} to print the `unordered_set<>`, so we are adding a
fmt::formatter for `fruit`. the operator<< for this type is dropped, as
it is not used anymore.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17813
This series adds notification before dropping views and indices so that the
tablet_allocator can generate mutations to respectively drop all tablets associated with them from system.tablets.
Additional unit tests were added for these cases.
Note that one case is not yet tested: where a table is allowed to be dropped while having views that depend on it, when it is dropped from the alternator path.
This is tested indirectly by testing dropping a table with live secondary index as it follows the same notification path as views in this series.
Fixes #17627
Closes scylladb/scylladb#17773
* github.com:scylladb/scylladb:
migration_manager: notify before_drop_column_family when dropping indices
schema_tables: make_update_indices_mutations: use find_schema to lookup the view of dropped indices
migration_manager: notify before_drop_column_family before dropping views
cql-pytest: test_tablets: add test_tablets_are_dropped_when_dropping_table
tablet_allocator: on_before_drop_column_family: remove unused result variable
Call the before_drop_column_family notifications
before dropping the views to allow the tablet_allocator
to delete the view's tablets.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Tablet transition would get stuck anyway for such nodes, so it's not worth trying
refs: #16372 (not fixes, because there's also repair transitions with same problem)
Closes scylladb/scylladb#17796
* github.com:scylladb/scylladb:
topology_coordinator: Skip dead nodes when balancing tablets
test: Add test for load_balancer skiplist
tablet_allocator: Add skiplist to load_balancer
The status command has an extensive amount of requests to the server. To be able to handle this more easily, the rest api mock server is refactored extensively to be more flexible, accepting expected requests out-of-order. While at it, the rest api mock server also moves away from a deprecated `aiohttp` feature: providing custom router argument to the `aiohttp` app. This forces us to pre-register all API endpoints that any test currently uses, although due to some templating support, this is not as bad as it sounds. Still, this is an annoyance, but at this point we have implemented almost all commands, so this won't be much of a problem going forward.
Refs: https://github.com/scylladb/scylladb/issues/15588
Closes scylladb/scylladb#17547
* github.com:scylladb/scylladb:
tools/scylla-nodetool: implement the status command
test/nodetool: rest_api_mock.py: match requests out-of-order
test/nodetool: rest_api_mock.py: remove trailing / from request paths
test/nodetool: rest_api_mock.py: use static routes
test/nodetool: check only non-exhausted requests
tools/scylla-nodetool: repair: set the jobThreads request parameter
It's too late to call `remove_rpc_client_with_ignored_topology` on messaging service when a node becomes normal. Data plane requests can be routed to the node much earlier, at least when topology switches to `write_both_read_new`. The `remove_rpc_client_with_ignored_topology` function shuts down sockets and causes such requests to time out.
In this PR we move the `remove_rpc_client_with_ignored_topology` call to the earliest point possible when a node first appears in `token_metadata.topology`.
From the topology coordinator perspective this happens when a joining node moves to `node_state::bootstrapping` and the topology moves to `transition_state::join_group0`. In `sync_raft_topology_nodes` the node should be contained in transition_nodes. The successful `wait_for_ip` before entering `transition_state::join_group0` ensures that update_topology should find a node's IP and put it into the topology. The barrier in `commit_cdc_generation` will ensure that all nodes in the cluster are using the proper connection parameters.
Only outgoing connections are tracked by `remove_rpc_client_with_ignored_topology`, those created by the current node. This means we need to call `remove_rpc_client_with_ignored_topology` on each node of the cluster.
Fixes scylladb/scylladb#17445
Closes scylladb/scylladb#17757
* github.com:scylladb/scylladb:
test_remove_rpc_client_with_pending_requests: add a regression test
remove_rpc_client_with_ignored_topology: call it earlier
storage_service: decouple remove_rpc_client_with_ignored_topology from notify_joined
Since 6b87778 regular compaction tasks are removed from task manager
immediately after they are finished.
test_regular_compaction_task lists compaction tasks and then requests
their statuses. Only one regular compaction task is guaranteed to still
be running at that time, the rest of them may finish before their status
is requested and so it will no longer be in task manager, causing the test
to fail.
Fix statuses check to consider the possibility of a regular compaction
task being removed from task manager.
Fixes: #17776.
Closes scylladb/scylladb#17784
Nodetool currently assumes that positional arguments are only keyspaces.
ks.tbl pairs are only provided when --kt-list or friends are used. This
is not the case however. So check positional args too, and if they look
like ks.tbl, handle them accordingly.
While at it, also make sure that alternator keyspace and tables names
are handled correctly.
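The check on positional arguments can be sketched as follows. This is an illustration only: `keyspace_table` and `parse_positional_arg` are hypothetical names, the sketch splits on the first dot, and it deliberately ignores the alternator naming rules mentioned above.

```cpp
#include <cassert>
#include <optional>
#include <string>

struct keyspace_table {
    std::string keyspace;
    std::optional<std::string> table;  // empty when the argument is a bare keyspace
};

// Treats "ks.tbl" as a keyspace/table pair and anything without a dot as a
// keyspace name.
inline keyspace_table parse_positional_arg(const std::string& arg) {
    const auto dot = arg.find('.');
    if (dot == std::string::npos) {
        return {arg, std::nullopt};
    }
    return {arg.substr(0, dot), arg.substr(dot + 1)};
}
```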
Closes scylladb/scylladb#17480
The method in question can have a shorter name that matches all other injections in this class, and can be non-template.
Closes scylladb/scylladb#17734
* github.com:scylladb/scylladb:
error_injection: De-template inject() with handler
error_injection: Overload inject() instead of inject_with_handler()
This test reproduces the problem from scylladb/scylladb#17445.
It fails quite reliably without the fix from the previous
commit.
The test just bootstraps a new node while bombarding the cluster
with read requests.
The test is inspired by the test_load_balancing_with_empty_node one and
verifies that when a node is skiplisted, balancer doesn't put load on it
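The property being verified can be sketched on a toy target selection. This is not the load_balancer code: `pick_least_loaded` and its parameters are hypothetical, and nodes are identified by plain strings.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>
#include <set>
#include <string>

// Picks the node with the fewest tablets, never choosing a node from the
// skiplist. Returns an empty optional when every node is skiplisted.
inline std::optional<std::string> pick_least_loaded(
        const std::map<std::string, std::size_t>& tablets_per_node,
        const std::set<std::string>& skiplist) {
    std::optional<std::string> best;
    std::size_t best_load = 0;
    for (const auto& [node, load] : tablets_per_node) {
        if (skiplist.count(node) != 0) {
            continue;  // skiplisted nodes receive no load
        }
        if (!best || load < best_load) {
            best = node;
            best_load = load;
        }
    }
    return best;
}
```

An empty node would normally be the most attractive target, which is why the test starts from the empty-node scenario.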
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Contrary to Origin, the single-token case is not discriminated in the
native implementation, for two reasons:
* ScyllaDB doesn't ever run with a single token, it is even moving away
from vnodes.
* Origin implemented the logic to detect single-token with a mistake: it
compares the number of tokens to the number of DCs, not the number of
nodes.
Another difference is that the native implementation doesn't request
ownership information when a keyspace argument was not provided -- it is
not printed anyway.