Commit Graph

41839 Commits

Author SHA1 Message Date
Botond Dénes
a4e8bea679 tools/scylla-nodetool: status: handle missing host_id
Newly joining nodes may not have a host id yet. Handle this and print a
"?" for these nodes, instead of the host-id.
Extend the existing test for the joining node case (also rename it and add a
comment).

Closes scylladb/scylladb#17853
2024-03-18 12:26:59 +02:00
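The fallback described in a4e8bea679 can be illustrated with a minimal Python sketch. This is not the native C++ nodetool code; the node records and the `host_id` key are hypothetical names chosen for illustration.

```python
def format_host_id(node: dict) -> str:
    """Return the node's host ID, or "?" if the node does not have one yet."""
    host_id = node.get("host_id")
    return host_id if host_id else "?"


# Hypothetical node records; a joining node may lack a host ID.
nodes = [
    {"address": "127.0.0.1", "state": "NORMAL", "host_id": "host-id-1"},
    {"address": "127.0.0.2", "state": "JOINING"},
]
rows = [(node["address"], format_host_id(node)) for node in nodes]
```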
Kefu Chai
8811900602 build: cmake: do not link randomized_nemesis_test with replication.cc
test/raft/replication.cc defines a symbol named `tlogger`, while
test/raft/randomized_nemesis_test.cc also defines a symbol with
the same name. when linking the test with mold, it identified the ODR
violation.

in this change, we extract test-raft-helper out, so that
randomized_nemesis_test can selectively only link against this library.
this also matches with the behavior of the rules generated by `configure.py`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17836
2024-03-17 17:01:47 +02:00
Kefu Chai
e1ae36ecfd test/boost: add formatter for BOOST_REQUIRE_EQUAL
in gossiping_property_file_snitch_test, we use
`BOOST_REQUIRE_EQUAL(dc_racks[i], dc_racks[0])` to check the equality
of two instances of `pair<sstring, sstring>`, like:
```c++
BOOST_REQUIRE_EQUAL(dc_racks[i], dc_racks[0])
```

since the standard library does not provide the formatter for printing
`std::pair<>`, we rely on the homebrew generic formatter to
print `std::pair<>`, which in turn uses operator<< to format the
elements in the `pair`, but we intend to remove this formatter
in future, as the last step of #13245 .

so in order to enable Boost.test to print out lhs and rhs when
`BOOST_REQUIRE_EQUAL` check fails, we are adding
`boost_test_print_type()` for `pair<sstring,sstring>`. the helper
function uses {fmt} to print the `pair<>`.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17831
2024-03-17 16:58:39 +02:00
Kefu Chai
6244a2ae00 service:qos: add fmt::formatter for service_level_options::workload_type
this change prepares for the fmt::formatter based formatter used by
tests, which will use {fmt} to print the elements in a container,
so we need to define the formatter using fmt::formatter for these
elements. the operator<< for service_level_options::workload_type is
preserved, as the tests are still using it.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17837
2024-03-17 16:52:57 +02:00
Kefu Chai
7df3acd39c repair: add fmt::formatter for row_level_diff_detect_algorithm
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for
row_level_diff_detect_algorithm. please note, we already have
`format_as()` overload for this type, but we cannot use it as a
fallback of the proper `fmt::formatter<>` specialization before
{fmt} v10. so before we update our CI to a distro with {fmt} v10,
`fmt::formatter<row_level_diff_detect_algorithm>` is still
needed.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17824
2024-03-16 19:12:49 +02:00
Botond Dénes
03c47bc30b tools/scylla-nodetool: status: handle nodes without load
Some nodes may not have a load yet. Handle this. Also add a test
covering this case.

Closes scylladb/scylladb#17823
2024-03-16 17:38:53 +02:00
Pavel Emelyanov
42a2dce4b6 test/lib: Eliminate variadic futures from template
The assert_that_failed(future) pair of helpers are templates with
variadic futures. Variadic futures are gone from seastar, so they should
go from test/lib too.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#17830
2024-03-16 17:37:25 +02:00
Kefu Chai
8bab51733f db: add fmt::formatter for db::functions::function
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for `db::functions::function`.
please note, we use `std::ostream` as the parameter of the polymorphic
implementation of `function::print()`, so without an intrusive change
we have to use `fmt::ostream_formatter`, or at least a similar technique
that formats the `function` instance into an `ostream` first. so instead
of implementing a "native" `fmt::formatter`, in this change, we just use
`fmt::ostream_formatter`.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17832
2024-03-16 17:36:49 +02:00
Kefu Chai
23e9958ebb data_dictionary: do not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17826
2024-03-15 21:17:11 +03:00
Botond Dénes
ad9bad4700 tools/scylla-nodetool: {proxy,table}histograms: handle empty histograms
Empty histograms are missing some of the members that non-empty
histograms have. The code handling these histograms assumed all required
members are always present and thus errored out when receiving an empty
histogram.
Add tests for empty histograms and fix the code handling them to check
for the potentially missing members, instead of making assumptions.

Closes scylladb/scylladb#17816
2024-03-15 15:59:31 +03:00
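A minimal Python sketch of the defensive handling described in ad9bad4700, checking for potentially missing members instead of assuming they exist. The field names (`count`, `min`, `max`, `sum`) are assumptions made for illustration, not the actual REST API schema, and the native tool is written in C++.

```python
def histogram_summary(histogram: dict) -> dict:
    """Summarize a histogram, tolerating members that an empty one may lack."""
    count = histogram.get("count", 0)
    return {
        "count": count,
        "min": histogram.get("min", 0),
        "max": histogram.get("max", 0),
        # Avoid dividing by zero for an empty histogram.
        "mean": histogram.get("sum", 0) / count if count else 0.0,
    }


empty = histogram_summary({"count": 0})
full = histogram_summary({"count": 4, "min": 1, "max": 7, "sum": 16})
```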
Artsiom Mishuta
73ed4c0eb5 test.py: fix aiohttp usage issue in python 3.12
Fix aiohttp usage issue in python 3.12:
"Timeout context manager should be used inside a task"

This occurs because UnixRESTClient is created in one event loop (created
inside pytest) but used in another (created in the rewritten event_loop
fixture). It is now fixed by updating the UnixRESTClient object for every new
loop.

Closes scylladb/scylladb#17760
2024-03-15 11:17:29 +01:00
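The idea behind 73ed4c0eb5, recreating a loop-bound client whenever the event loop changes, can be sketched with the standard library alone. `LoopBoundClient` and `get_client` are hypothetical stand-ins, not the real UnixRESTClient or the fixture code in test.py.

```python
import asyncio


class LoopBoundClient:
    """Stand-in for a client that must be used on the loop that created it."""

    def __init__(self) -> None:
        self.loop = asyncio.get_running_loop()


_client = None


def get_client() -> LoopBoundClient:
    """Return a client bound to the running loop, recreating it if the loop changed."""
    global _client
    loop = asyncio.get_running_loop()
    if _client is None or _client.loop is not loop:
        _client = LoopBoundClient()
    return _client


async def use_client() -> LoopBoundClient:
    return get_client()


first = asyncio.run(use_client())
second = asyncio.run(use_client())  # asyncio.run() starts a new loop, so a new client
```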
Nadav Har'El
6cdb68f094 test/cql-pytest: remove unused function
Remove an unused function from test/cql-pytest/test_using_timeout.py.
Some linters can complain that this function used re.compile(), but
the "re" package was never imported. Since this function isn't used,
the right fix is to remove it - and not add the missing import.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#17801
2024-03-15 09:56:30 +02:00
Kefu Chai
e1a9340cc1 partition_version: add fmt::formatter for partition_entry::printer
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for `partition_entry::printer`,
and drop its operator<< .

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17812
2024-03-15 09:52:27 +02:00
Kefu Chai
a0625261ef build: cmake: reword the comment for dev-headers
before this change, the comment was difficult to parse. let's update
it for better readability.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17814
2024-03-15 09:51:47 +02:00
Kefu Chai
640d573106 schema_mutations: add fmt::formatter for schema_mutations
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for `schema_mutations`,
and drop its operator<< .

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17815
2024-03-15 09:49:56 +02:00
Kefu Chai
3edd530bd1 test/boost: add formatter for BOOST_REQUIRE_EQUAL
before this change, we rely on the homebrew generic formatter to
print unordered_set<>, which in turn uses operator<< to format the
elements in the `unordered_set`, but we intend to remove this formatter
in future, as the last step of #13245 .

so, to enable Boost.test to print out lhs and rhs when a `BOOST_REQUIRE_EQUAL`
check fails, we are adding `boost_test_print_type()` for
`unordered_set<fruit>`. the helper function uses {fmt} to print the
`unordered_set<>`, so we are adding a fmt::formatter for `fruit`. the
operator<< for this type is dropped, as it is not used anymore.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17813
2024-03-15 09:40:22 +02:00
Benny Halevy
530d270828 api: /storage_service/tablets/balancing: fix incorrect operation summary
It was probably copy-pasted from /storage_service/tablets/move

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#17811
2024-03-14 22:52:57 +01:00
Tomasz Grabiec
8c5d088928 Merge 'Drop tablets of dropped views and indices' from Benny Halevy
This series adds notification before dropping views and indices so that the
tablet_allocator can generate mutations to respectively drop all tablets associated with them from system.tablets.

Additional unit tests were added for these cases.

Note that one case is not yet tested: where a table is allowed to be dropped while having views that depend on it, when it is dropped from the alternator path.

This is tested indirectly by testing dropping a table with live secondary index as it follows the same notification path as views in this series.

Fixes #17627

Closes scylladb/scylladb#17773

* github.com:scylladb/scylladb:
  migration_manager: notify before_drop_column_family when dropping indices
  schema_tables: make_update_indices_mutations: use find_schema to lookup the view of dropped indices
  migration_manager: notify before_drop_column_family before dropping views
  cql-pytest: test_tablets: add test_tablets_are_dropped_when_dropping_table
  tablet_allocator: on_before_drop_column_family: remove unused result variable
2024-03-14 22:52:29 +01:00
Raphael S. Carvalho
c46c2d436f sstables: Reduce cost for loading sstables with tablets
Loader was changed to quickly determine ownership after consuming
sharding metadata only. If it's not available, it falls back to
reading first and last keys from summary. The fallback is only there
for backward compatibility and it costs a lot more as we don't
skip to the end where keys are located in summary.

With tablets, sharding metadata is only first and last keys so
we can do it without sharder. So loader will be able to use it
instead of looking up keys in summary.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#17805
2024-03-14 21:06:35 +01:00
Pavel Emelyanov
8ffb5f27c7 topology_coordinator: Clear tablet transition session after streaming
When jumping from streaming stage into cleanup_target, session must also
be cleared as pending replica may still process some incoming mutations
blocked in the pipeline. Deleting session prior to executing barrier
makes sure those mutations will not be applied.

fixes: #17682

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#17800
2024-03-14 20:35:00 +01:00
Pavel Emelyanov
6a77f36519 doc: Add tablets migration state diagram
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#17790
2024-03-14 20:29:21 +01:00
Benny Halevy
5bfca73b30 migration_manager: notify before_drop_column_family when dropping indices
Fixes #17627

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-03-14 20:19:12 +02:00
Benny Halevy
9cf6a2e510 schema_tables: make_update_indices_mutations: use find_schema to lookup the view of dropped indices
When dropping indices, we don't need to go through
`create_view_for_index` in order to drop the index.
That actually creates a new schema for this view
which is used just for its metadata for generating mutations
dropping it.

Instead, use `find_schema` to lookup the current schema
for the dropped index.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-03-14 20:19:11 +02:00
Benny Halevy
358e92e645 migration_manager: notify before_drop_column_family before dropping views
Call the before_drop_column_family notifications
before dropping the views to allow the tablet_allocator
to delete the view's tablets.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-03-14 20:14:56 +02:00
Avi Kivity
5e28bf9b5c Merge 'Do not try to balance tablets on nodes which are known to be down' from Pavel Emelyanov
Tablet transition would get stuck anyway for such nodes, so it's not worth trying

refs: #16372 (not fixes, because there's also repair transitions with same problem)

Closes scylladb/scylladb#17796

* github.com:scylladb/scylladb:
  topology_coordinator: Skip dead nodes when balancing tablets
  test: Add test for load_balancer skiplist
  tablet_allocator: Add skiplist to load_balancer
2024-03-14 18:47:51 +02:00
Avi Kivity
0f188f2d9f Merge 'tools/scylla-nodetool: implement the status command' from Botond Dénes
The status command sends a large number of requests to the server. To be able to handle this more easily, the rest api mock server is refactored extensively to be more flexible, accepting expected requests out-of-order. While at it, the rest api mock server also moves away from a deprecated `aiohttp` feature: providing a custom router argument to the `aiohttp` app. This forces us to pre-register all API endpoints that any test currently uses, although due to some templating support, this is not as bad as it sounds. Still, this is an annoyance, but at this point we have implemented almost all commands, so this won't be much of a problem going forward.

Refs: https://github.com/scylladb/scylladb/issues/15588

Closes scylladb/scylladb#17547

* github.com:scylladb/scylladb:
  tools/scylla-nodetool: implement the status command
  test/nodetool: rest_api_mock.py: match requests out-of-order
  test/nodetool: rest_api_mock.py: remove trailing / from request paths
  test/nodetool: rest_api_mock.py: use static routes
  test/nodetool: check only non-exhausted requests
  tools/scylla-nodetool: repair: set the jobThreads request parameter
2024-03-14 18:42:54 +02:00
Kamil Braun
5ef47c42b3 Merge 'remove_rpc_client_with_ignored_topology: recreate rpc client earlier' from Petr Gusev
It's too late to call `remove_rpc_client_with_ignored_topology` on messaging service when a node becomes normal. Data plane requests can be routed to the node much earlier, at least when topology switches to `write_both_read_new`. The `remove_rpc_client_with_ignored_topology` function shuts down sockets and causes such requests to time out.

In this PR we move the `remove_rpc_client_with_ignored_topology` call to the earliest point possible when a node first appears in `token_metadata.topology`.

From the topology coordinator perspective this happens when a joining node moves to `node_state::bootstrapping` and the topology moves to `transition_state::join_group0`. In `sync_raft_topology_nodes` the node should be contained in transition_nodes. The successful `wait_for_ip` before entering `transition_state::join_group0` ensures that update_topology should find a node's IP and put it into the topology. The barrier in `commit_cdc_generation` will ensure that all nodes in the cluster are using the proper connection parameters.

Only outgoing connections are tracked by `remove_rpc_client_with_ignored_topology`, those created by the current node. This means we need to call `remove_rpc_client_with_ignored_topology` on each node of the cluster.

fixes scylladb/scylladb#17445

Closes scylladb/scylladb#17757

* github.com:scylladb/scylladb:
  test_remove_rpc_client_with_pending_requests: add a regression test
  remove_rpc_client_with_ignored_topology: call it earlier
  storage_service: decouple remove_rpc_client_with_ignored_topology from notify_joined
2024-03-14 17:20:59 +01:00
Yaniv Kaul
a2ac80340f Typo: pint -> print
Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>

Closes scylladb/scylladb#17804
2024-03-14 15:50:35 +02:00
Wojciech Mitros
59d5bfa742 mv: fail base writes instead of dropping view updates when overloaded
Since 4c767c379c we can reach a situation
where we know that we have admitted too many expensive view update
operations and the mechanism of dropping the following view updates
can be triggered in a wider range of scenarios. Ideally, we would
want to fail whole requests on the coordinator level, but for now, we
change the behavior to failing just the base writes. This allows us
to avoid creating inconsistencies between base replicas and views
at the cost of introducing inconsistencies between different base
replicas. This, however, can be fixed by repair, in contrast to
base-view inconsistencies which we don't have a good method of fixing.

Fixes #17795

Closes scylladb/scylladb#17777
2024-03-14 15:11:45 +02:00
Aleksandra Martyniuk
43ef6e6ab9 test: fix regular compaction tasks check
Since 6b87778 regular compaction tasks are removed from task manager
immediately after they are finished.

test_regular_compaction_task lists compaction tasks and then requests
their statuses. Only one regular compaction task is guaranteed to still
be running at that time, the rest of them may finish before their status
is requested and so it will no longer be in task manager, causing the test
to fail.

Fix statuses check to consider the possibility of a regular compaction
task being removed from task manager.

Fixes: #17776.

Closes scylladb/scylladb#17784
2024-03-14 14:40:18 +02:00
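A rough Python sketch of the status check described in 43ef6e6ab9, tolerating tasks that finished and were removed between listing and the status request. The dict-backed task manager and the `TaskNotFound` exception are hypothetical; the real test talks to the task manager REST API.

```python
class TaskNotFound(Exception):
    """Raised by the stand-in task manager when a task was already removed."""


def get_status(task_manager: dict, task_id: str) -> dict:
    """Look up a task status; a missing task is reported as TaskNotFound."""
    try:
        return task_manager[task_id]
    except KeyError:
        raise TaskNotFound(task_id) from None


def collect_statuses(task_manager: dict, task_ids: list) -> list:
    """Fetch statuses, skipping tasks that were removed in the meantime."""
    statuses = []
    for task_id in task_ids:
        try:
            statuses.append(get_status(task_manager, task_id))
        except TaskNotFound:
            continue  # finished between listing and the status request
    return statuses


listed = ["a", "b", "c"]                              # tasks seen when listing
manager = {"a": {"id": "a", "state": "running"}}      # "b" and "c" already gone
statuses = collect_statuses(manager, listed)
```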
Piotr Smaron
ad2d039e3d db: move all group 0 tables to schema commitlog
This is to have durability for the group0 tables.
But also because I need it specifically to make
`system.topology` & `system_schema.scylla_keyspaces`
mutations under a single raft command in https://github.com/scylladb/scylladb/pull/16723

Fixes: #15596

Closes scylladb/scylladb#17783
2024-03-14 13:33:30 +01:00
Piotr Dulikowski
2d9e78b09a gossiper: failure detector: don't handle directly removed live endpoints
Commit 0665d9c346 changed the gossiper
failure detector in the following way: when live endpoints change
and per-node failure detectors finish their loops, the main failure
detector calls gossiper::convict for those nodes which were alive when
the current iteration of the main FD started but now are not. This was
changed in order to make sure that nodes are marked as down, because
some other code in gossiper could concurrently remove nodes from
the live node lists without marking them properly.

This was committed around 3 years ago and the situation changed:

- After 75d1dd3a76
  the `endpoint_state::_is_alive` field was removed and liveness
  of a node is solely determined by its presence
  in the `gossiper::_live_endpoints` field.
- Currently, all gossiper code which modifies `_live_endpoints`
  takes care to trigger relevant callback. The only function which
  modifies the field but does not trigger notifications
  is `gossiper::evict_from_membership`, but it is either called
  after `gossiper::remove_endpoint` which triggers callbacks
  by itself, or when a node is already dead and there is no need
  to trigger callbacks.

So, it looks like the reasons it was introduced for are not relevant
anymore. What's more important though is that it is involved in a bug
described in scylladb/scylladb#17515. In short, the following sequence
of events may happen:

1. Failure detector for some remote node X decides that it was dead
   long enough and `convict`s it, causing live endpoints to be updated.
2. The gossiper main loop sends a successful echo to X and *decides*
  to mark it as alive.
3. At the same time, failure detectors for all nodes other than X finish
  and the main failure detector continues; it notices that node X is
  not alive (because it was convicted in point 1.) and *decides*
  to convict it.
4. Actions planned in 2 and 3 run one after another, i.e. node is first
  marked as alive and then immediately as dead.

This causes `on_alive` callbacks to run first and then `on_dead`. The
second one is problematic as it closes RPC connections to node X - in
particular, if X is in the process of replacing another node with the
same IP then it may cause the replace operation to fail.

In order to simplify the code and fix the bug - remove the piece
of logic in question.

Fixes: scylladb/scylladb#17515

Closes scylladb/scylladb#17754
2024-03-14 13:29:17 +01:00
Botond Dénes
d6103dc1b6 tools/scylla-nodetool: snapshot: handle ks.tbl positional args correctly
Nodetool currently assumes that positional arguments are only keyspaces.
ks.tbl pairs are only provided when --kt-list or friends are used. This
is not the case however. So check positional args too, and if they look
like ks.tbl, handle them accordingly.

While at it, also make sure that alternator keyspace and tables names
are handled correctly.

Closes scylladb/scylladb#17480
2024-03-14 13:42:23 +02:00
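A simplified Python sketch of the positional-argument handling described in d6103dc1b6: arguments that look like ks.tbl are split, everything else is treated as a keyspace. The function name is hypothetical, the native tool is C++, and the alternator-specific name handling mentioned in the commit is deliberately not covered here.

```python
def parse_positional(args):
    """Split positional args into plain keyspaces and (keyspace, table) pairs."""
    keyspaces, tables = [], []
    for arg in args:
        keyspace, separator, table = arg.partition(".")
        if separator and keyspace and table:
            tables.append((keyspace, table))
        else:
            keyspaces.append(arg)
    return keyspaces, tables


keyspaces, tables = parse_positional(["ks1", "ks2.tbl"])
```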
Avi Kivity
dd76e1c834 Merge 'Simplify error_injection::inject_with_handler()' from Pavel Emelyanov
The method in question can have a shorter name that matches all other injections in this class, and can be non-template

Closes scylladb/scylladb#17734

* github.com:scylladb/scylladb:
  error_injection: De-template inject() with handler
  error_injection: Overload inject() instead of inject_with_handler()
2024-03-14 13:37:54 +02:00
Petr Gusev
2783985bb2 test_remove_rpc_client_with_pending_requests: add a regression test
This test reproduces the problem from scylladb/scylladb#17445.
It fails quite reliably without the fix from the previous
commit.

The test just bootstraps a new node while bombarding the cluster
with read requests.
2024-03-14 15:17:34 +04:00
Petr Gusev
398e14d6d0 remove_rpc_client_with_ignored_topology: call it earlier
In this commit we move the remove_rpc_client_with_ignored_topology
call to the earliest point possible - when a node first appears
in token_metadata.topology.

From the topology coordinator perspective this happens when a joining
node moves to node_state::bootstrapping and the topology moves to
transition_state::join_group0. In sync_raft_topology_nodes
the node should be contained in transition_nodes. The successful
wait_for_ip before entering transition_state::join_group0 ensures
that update_topology should find a node's IP and put it into the topology.
The barrier in commit_cdc_generation will ensure that all nodes
in the cluster are using the proper connection parameters.

Only outgoing connections are tracked by remove_rpc_client_with_ignored_topology,
those created by the current node. This means we need to call
remove_rpc_client_with_ignored_topology on each node of the cluster.

fixes scylladb/scylladb#17445
2024-03-14 15:10:09 +04:00
Petr Gusev
1b9f21314f storage_service: decouple remove_rpc_client_with_ignored_topology from
notify_joined

It's too late to call remove_rpc_client_with_ignored_topology on
messaging service when a node becomes normal. Data
plane requests can be routed to the node much earlier,
at least when topology switches to write_both_read_new.
The remove_rpc_client_with_ignored_topology function
shuts down sockets and causes such requests to time out.

We intend to call remove_rpc_client_with_ignored_topology
as soon as a node becomes part of token_metadata topology.
In this preparatory commit we refactor
storage_service::notify_joined. We remove the
remove_rpc_client_with_ignored_topology call from it and
call it separately from the two call sites of notify_joined.
2024-03-14 15:10:09 +04:00
Kefu Chai
ce17841860 tools/scylla-nodetool: print bpo::options_description with fmt::streamed
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, since boost::program_options::options_description is
defined by the boost.program_options library, and it only provides the
operator<< overload, we're inclined not to specialize `fmt::formatter`
for it at this moment, because

* this class is not defined by the scylla project. we would have to
  find a home for this formatter.
* we are not likely to reuse the formatter in multiple places

so, in this change we just print it using `fmt::streamed`.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17791
2024-03-14 10:44:32 +02:00
Pavel Emelyanov
33d258528e topology_coordinator: Skip dead nodes when balancing tablets
The coordinator can find out which nodes are marked as DOWN, thus when
calling tablets balancer it can feed it a skiplist

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-03-14 10:51:11 +03:00
Pavel Emelyanov
ee55e8442a test: Add test for load_balancer skiplist
The test is inspired by the test_load_balancing_with_empty_node one and
verifies that when a node is skiplisted, balancer doesn't put load on it

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-03-14 10:50:21 +03:00
Pavel Emelyanov
b4dd732dab tablet_allocator: Add skiplist to load_balancer
Currently the load balancer skips nodes based only on their "administrative"
state, i.e. whether a node is drained/decommissioned/removed/etc. There's no
way to exclude a node from balancing decisions based on anything else.
This patch adds this ability by adding a skiplist argument to the
balance_tablets() method. When a node is in it, it will not be
considered, as if it was removenode-d.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-03-14 10:47:31 +03:00
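The skiplist idea from b4dd732dab can be sketched in a few lines of Python. This is only an illustration under assumed names: the state strings and `balancing_candidates` are hypothetical, and the real `balance_tablets()` is C++ and far more involved.

```python
def balancing_candidates(nodes: dict, skiplist: set) -> list:
    """Return nodes eligible for balancing.

    A node is excluded either because of its administrative state or
    because the caller put it on the skiplist (e.g. it is known to be down).
    """
    excluded_states = {"drained", "decommissioned", "removed"}
    return [
        name
        for name, state in nodes.items()
        if state not in excluded_states and name not in skiplist
    ]


nodes = {"n1": "normal", "n2": "normal", "n3": "decommissioned"}
candidates = balancing_candidates(nodes, skiplist={"n2"})
```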
Kefu Chai
926fe29ebd db: commitlog: add fmt::formatter for commitlog types
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for

* db::commitlog::segment::cf_mark
* db::commitlog::segment_manager::named_file
* db::commitlog::segment_manager::dispose_mode
* db::commitlog::segment_manager::byte_flow<T>

please note, the formatter of `db::commitlog::segment` is not
included in this commit, as we are formatting it in the inline
definition of this class. so we cannot define the specialization
of `fmt::formatter` for this class before its callers -- we'd
either use `format_as()` provided by {fmt} v10, or use `fmt::streamed`.
either way, it's different from the theme of this commit, and we
will handle it in a separate commit.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17792
2024-03-14 09:28:12 +02:00
Botond Dénes
20d5c536b5 tools/scylla-nodetool: implement the status command
Contrary to Origin, the single-token case is not discriminated in the
native implementation, for two reasons:
* ScyllaDB doesn't ever run with a single token, it is even moving away
  from vnodes.
* Origin implemented the logic to detect single-token with a mistake: it
  compares the number of tokens to the number of DCs, not the number of
  nodes.

Another difference is that the native implementation doesn't request
ownership information when a keyspace argument was not provided -- it is
not printed anyway.
2024-03-14 03:27:04 -04:00
Botond Dénes
2d4f4cfad4 test/nodetool: rest_api_mock.py: match requests out-of-order
In the previous patch, we made requests to different endpoints match
out-of-order. In this patch we go one step further and make
requests to the same endpoint match out-of-order too.
With this, tests can register the expected requests in any order, not in
the same order as the nodetool-under-test is expected to send them. This
makes testing more flexible. Also, how requests are ordered is not
interesting from the correctness' POV anyway.
2024-03-14 03:27:04 -04:00
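Out-of-order matching as described in 2d4f4cfad4 could look roughly like the following Python sketch. `ExpectedRequest` and `match_request` here are hypothetical simplifications and do not mirror the actual classes in rest_api_mock.py.

```python
from dataclasses import dataclass, field


@dataclass
class ExpectedRequest:
    method: str
    path: str
    params: dict = field(default_factory=dict)
    response: object = None


def match_request(expected, method, path, params):
    """Consume and return the first matching expected request, wherever it sits."""
    for index, candidate in enumerate(expected):
        if (candidate.method, candidate.path, candidate.params) == (method, path, params):
            return expected.pop(index)
    return None  # no expectation registered for this request


expected = [
    ExpectedRequest("GET", "/endpoint", {"key": "a"}, response=1),
    ExpectedRequest("GET", "/endpoint", {"key": "b"}, response=2),
]
# The request registered second arrives first and still matches.
matched = match_request(expected, "GET", "/endpoint", {"key": "b"})
```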
Botond Dénes
09a27f49ea test/nodetool: rest_api_mock.py: remove trailing / from request paths
The legacy nodetool likes to append a "/" to the request paths every
now and then, but not consistently. Unfortunately, request path matching
in the mock rest server and in aiohttp is quite sensitive to this
currently. Reduce friction by removing trailing "/" from paths in the
mock api, allowing paths to match each other even if one has a trailing
"/" but the other doesn't.
Unfortunately there is nothing we can do about the aiohttp part, so some
API endpoints have to be registered with a trailing "/".
2024-03-14 03:27:04 -04:00
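The path normalization described in 09a27f49ea amounts to something like this Python sketch. `normalize_path` is a hypothetical helper name, not necessarily what rest_api_mock.py uses.

```python
def normalize_path(path: str) -> str:
    """Strip trailing slashes so that "/a/b/" and "/a/b" compare equal."""
    stripped = path.rstrip("/")
    return stripped or "/"  # keep the root path intact


same = normalize_path("/a/b/") == normalize_path("/a/b")
```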
Botond Dénes
5659f23b2a test/nodetool: rest_api_mock.py: use static routes
The mock server currently provides its own router to the aiohttp.web
app. The ability to provide custom routers, however, is deprecated and
can be removed at any point. So refactor the mock server to use the
built-in router. This requires some changes, because the built-in router
does not allow adding/removing routes once the server starts. However
the mock server only learns of the used routes when the tests run.
This unfortunately means that we have to statically register all
possible routes the tests will use. Fortunately, aiohttp has variable
route support (templated routes) and with this, we can get away with
just 9 statically registered routes, which is not too bad.

A (desired) side-effect of this refactoring is that now requests to
different routes do not have to arrive in order. This constraint of the
previous implementation proved to be not useful, and even made writing
certain tests awkward.
2024-03-14 03:27:04 -04:00
Botond Dénes
061bd89957 test/nodetool: check only non-exhausted requests
Refactor how the tests check for expected requests which were never
invoked. At the end of every test, the nodetool fixture requests all
unconsumed expected requests from the rest_api_mock.py and checks that
there is none. This mechanism has some interaction with requests which
have a "multiple" set: rest_api_mock.py allows registering requests with
different "multiple" requirements -- how many times a request is
expected to be invoked:
* ANY: [0, +inf)
* ONE: 1
* MULTIPLE: [1, +inf)

Requests are stored in a stack. When a request arrives, we pop off
requests from the top until we find a perfect match. We pop off
requests, iff: multiple == ANY || multiple == MULTIPLE and was hit at
least once.
This works as long as we don't have a multiple=ANY request at the
bottom of the stack which is never invoked. Or a multiple=MULTIPLE one.
This will get worse once we refactor requests to be not stored in a
stack.

So in this patch, we filter requests when collecting unexhausted ones,
dropping those which would be qualified to be popped from the stack.
2024-03-14 03:27:04 -04:00
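The filtering rule from 061bd89957 can be sketched in Python as follows. The class and function names are hypothetical, and the sketch assumes that a ONE request is removed from the store as soon as it is matched, so any ONE request still present counts as unconsumed.

```python
from dataclasses import dataclass
from enum import Enum


class Multiple(Enum):
    ANY = "any"            # expected [0, +inf) times
    ONE = "one"            # expected exactly once
    MULTIPLE = "multiple"  # expected [1, +inf) times


@dataclass
class ExpectedRequest:
    path: str
    multiple: Multiple
    hits: int = 0


def is_exhausted(request: ExpectedRequest) -> bool:
    """True if the request would qualify to be popped: ANY, or MULTIPLE hit at least once."""
    if request.multiple is Multiple.ANY:
        return True
    return request.multiple is Multiple.MULTIPLE and request.hits >= 1


def unexhausted(requests):
    """Return the expected requests that still count as unconsumed at test end."""
    return [request for request in requests if not is_exhausted(request)]


remaining = [
    ExpectedRequest("/a", Multiple.ANY),
    ExpectedRequest("/b", Multiple.MULTIPLE, hits=2),
    ExpectedRequest("/c", Multiple.MULTIPLE, hits=0),
    ExpectedRequest("/d", Multiple.ONE),
]
leftover_paths = [request.path for request in unexhausted(remaining)]
```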
Botond Dénes
be5a18c07d tools/scylla-nodetool: repair: set the jobThreads request parameter
Although ScyllaDB ignores this request parameter, the Java nodetool
sets it, so it is better to have the native one do the same for
symmetry. It makes testing easier.
Discovered with the more strict request matching introduced in the next
patches.
2024-03-14 03:26:13 -04:00
Benny Halevy
b4245bf46e cql-pytest: test_tablets: add test_tablets_are_dropped_when_dropping_table
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-03-14 09:01:30 +02:00
Asias He
9d41fb9bcd repair: Add hosts and ignore_nodes option support for tablet repair
It is not supported currently.

If a user passes the option, the request will be rejected with:

    The hosts option is not supported for tablet repair
    The ignore_nodes option is not supported for tablet repair

This option is useful to select nodes to repair.

Fixes: #17742

Tests: repair_additional_test.py::TestRepairAdditional::test_repair_ignore_nodes
       repair_additional_test.py::TestRepairAdditional::test_repair_ignore_nodes_errors
       repair_additional_test.py::TestRepairAdditional::test_repair_option_pr_dc_host

Closes scylladb/scylladb#17767
2024-03-14 08:40:30 +02:00