Commit Graph

41822 Commits

Author SHA1 Message Date
Tomasz Grabiec
6d24fdee75 test: test_tablets_removenode: Verify replacing when there is no spare node
The test is changed to be more strict. Verifies the case of replacing
when RF=N in which case tablet replicas have to be rebuilt using the
replacing node.

This would fail if tablets are drained as part of replace operation,
since replacing node is not yet a viable target for tablet migration.
2024-03-15 13:20:08 +01:00
Tomasz Grabiec
1d01b4ca20 doc: topology-on-raft: Document replace behavior with tablets 2024-03-15 13:20:08 +01:00
Tomasz Grabiec
1c71f44e63 tablets, raft topology: Rebuild tablets after replacing node is normal
This fixes a problem with replacing a node with tablets when
RF=N. Currently, this will fail because new tablet replica allocation
will not be able to find a viable destination, as the replacing node
is not considered a candidate. It cannot be a candidate because
replace rolls back on failure and we cannot roll back after tablets
were migrated.

The solution taken here is to not drain tablet replicas from replaced
node during topology request but leave it to happen later after the
replaced node is left and replacing node is normal.

The replacing node waits for this draining to be complete on boot
before the node is considered booted.

Fixes #17025
2024-03-15 13:20:08 +01:00
Tomasz Grabiec
b2418fab39 tablets: load_balancer: Access node attributes via node struct
Reduces lookups into topology and decouples the algorithm more from
the topology object.
2024-03-15 11:22:34 +01:00
Tomasz Grabiec
9090050244 tablets: load_balancer: Extract ensure_node()
Will be called in another loop to populate the "nodes" map with left node.
2024-03-15 11:22:32 +01:00
Tomasz Grabiec
9b656ec2aa mv: Switch to using host_id-based replica set
This is necessary to not break replica pairing between base and
view. After replacing a node, tablet replica set contains for a while
the replaced node which is in the left state. This node is not
returned by the IP-based get_natural_endpoints() so the replica
indexes would shift, changing the pairing with the view.

The host_id-based replica set always has stable indexes for replicas.
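
The index shift described above can be illustrated with a toy model. This is a minimal sketch, not ScyllaDB code: the host names, the `ip_of` map, and both helper functions are hypothetical, and pairing base replica i with view replica i is a simplification of the real pairing logic.

```python
# Toy model of base/view replica pairing by index (all names are hypothetical).

def ip_based_replicas(replica_hosts, ip_of):
    """Mimic an IP-based lookup: hosts without a known IP (e.g. left nodes) are dropped."""
    return [ip_of[h] for h in replica_hosts if h in ip_of]

def host_id_based_replicas(replica_hosts):
    """Mimic a host_id-based lookup: every replica keeps its slot, even a left node."""
    return list(replica_hosts)

# The base replica set still contains the replaced node "h1", which is in the left state.
base_replicas = ["h1", "h2", "h3"]
view_replicas = ["h4", "h5", "h6"]
ip_of = {"h2": "10.0.0.2", "h3": "10.0.0.3"}  # no IP is kept for the left node h1

# IP-based: h2 moves from index 1 to index 0, so it would pair with a different view replica.
ip_view = ip_based_replicas(base_replicas, ip_of)
shifted_index = ip_view.index(ip_of["h2"])

# host_id-based: h2 stays at index 1 and keeps its original pairing.
stable_index = host_id_based_replicas(base_replicas).index("h2")
paired_view = view_replicas[stable_index]
```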
2024-03-15 11:05:29 +01:00
Tomasz Grabiec
888dc41d66 effective_replication_map: Introduce host_id-based get_replicas() 2024-03-15 11:05:29 +01:00
Tomasz Grabiec
61b3453552 raft topology: Keep nodes in the left state to topology
Those nodes will be kept in tablet replica sets for a while after node
replace is done, until the new replica is rebuilt. So we need to know
about those nodes' location (dc, rack) for two reasons:

 1) algorithms which work with replica sets filter nodes based on
 their location. For example materialized views code which pairs base
 replicas with view replicas filters by datacenter first.

 2) tablet scheduler needs to identify each node's location in order
 to make decisions about new replica placement.

It's ok to not know the IP, and we don't keep it. Those nodes will not
be present in the IP-based replica sets, e.g. those returned by
get_natural_endpoints(), only in host_id-based replica
sets. storage_proxy request coordination is not affected.

Nodes in the left state are still not present in token ring, and not
considered to be members of the ring (datacenter endpoints exclude them).

In the future we could make the change even more transparent by only
loading locator::node* for those nodes and keeping node* in tablet
replica sets.

We load topology information only for left nodes which are actually
referenced by any tablet. To achieve that, topology loading code
queries system.tablet for the set of hosts. This set is then passed to
system.topology loading method which decides whether to load
replica_state for a left node or not.
2024-03-15 11:05:29 +01:00
Tomasz Grabiec
f7851696fa tablets: Introduce read_required_hosts()
Will be used by topology loading code to determine which hosts are
needed in topology, even if they're in the left state. We want to load
left nodes only if they are referenced by any tablet, which may happen
temporarily until the replacement replica is rebuilt.
2024-03-15 11:05:29 +01:00
Avi Kivity
5e28bf9b5c Merge 'Do not try to balance tablets on nodes which are known to be down' from Pavel Emelyanov
Tablet transition would get stuck anyway for such nodes, so it's not worth trying

refs: #16372 (not fixes, because there's also repair transitions with same problem)

Closes scylladb/scylladb#17796

* github.com:scylladb/scylladb:
  topology_coordinator: Skip dead nodes when balancing tablets
  test: Add test for load_balancer skiplist
  tablet_allocator: Add skiplist to load_balancer
2024-03-14 18:47:51 +02:00
Avi Kivity
0f188f2d9f Merge 'tools/scylla-nodetool: implement the status command' from Botond Dénes
The status command sends a large number of requests to the server. To be able to handle this more easily, the rest api mock server is refactored extensively to be more flexible, accepting expected requests out-of-order. While at it, the rest api mock server also moves away from a deprecated `aiohttp` feature: providing a custom router argument to the `aiohttp` app. This forces us to pre-register all API endpoints that any test currently uses, although due to some templating support, this is not as bad as it sounds. Still, this is an annoyance, but at this point we have implemented almost all commands, so this won't be much of a problem going forward.

Refs: https://github.com/scylladb/scylladb/issues/15588

Closes scylladb/scylladb#17547

* github.com:scylladb/scylladb:
  tools/scylla-nodetool: implement the status command
  test/nodetool: rest_api_mock.py: match requests out-of-order
  test/nodetool: rest_api_mock.py: remove trailing / from request paths
  test/nodetool: rest_api_mock.py: use static routes
  test/nodetool: check only non-exhausted requests
  tools/scylla-nodetool: repair: set the jobThreads request parameter
2024-03-14 18:42:54 +02:00
Kamil Braun
5ef47c42b3 Merge 'remove_rpc_client_with_ignored_topology: recreate rpc client earlier' from Petr Gusev
It's too late to call `remove_rpc_client_with_ignored_topology` on messaging service when a node becomes normal. Data plane requests can be routed to the node much earlier, at least when topology switches to `write_both_read_new`. The `remove_rpc_client_with_ignored_topology` function shuts down sockets and causes such requests to time out.

In this PR we move the `remove_rpc_client_with_ignored_topology` call to the earliest point possible when a node first appears in `token_metadata.topology`.

From the topology coordinator perspective this happens when a joining node moves to `node_state::bootstrapping` and the topology moves to `transition_state::join_group0`. In `sync_raft_topology_nodes` the node should be contained in transition_nodes. The successful `wait_for_ip` before entering `transition_state::join_group0` ensures that update_topology should find a node's IP and put it into the topology. The barrier in `commit_cdc_generation` will ensure that all nodes in the cluster are using the proper connection parameters.

Only outgoing connections are tracked by `remove_rpc_client_with_ignored_topology`, those created by the current node. This means we need to call `remove_rpc_client_with_ignored_topology` on each node of the cluster.

fixes scylladb/scylladb#17445

Closes scylladb/scylladb#17757

* github.com:scylladb/scylladb:
  test_remove_rpc_client_with_pending_requests: add a regression test
  remove_rpc_client_with_ignored_topology: call it earlier
  storage_service: decouple remove_rpc_client_with_ignored_topology from notify_joined
2024-03-14 17:20:59 +01:00
Yaniv Kaul
a2ac80340f Typo: pint -> print
Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>

Closes scylladb/scylladb#17804
2024-03-14 15:50:35 +02:00
Wojciech Mitros
59d5bfa742 mv: fail base writes instead of dropping view updates when overloaded
Since 4c767c379c we can reach a situation
where we know that we have admitted too many expensive view update
operations and the mechanism of dropping the following view updates
can be triggered in a wider range of scenarios. Ideally, we would
want to fail whole requests on the coordinator level, but for now, we
change the behavior to failing just the base writes. This allows us
to avoid creating inconsistencies between base replicas and views
at the cost of introducing inconsistencies between different base
replicas. This, however, can be fixed by repair, in contrast to
base-view inconsistencies which we don't have a good method of fixing.

Fixes #17795

Closes scylladb/scylladb#17777
2024-03-14 15:11:45 +02:00
Aleksandra Martyniuk
43ef6e6ab9 test: fix regular compaction tasks check
Since 6b87778 regular compaction tasks are removed from task manager
immediately after they are finished.

test_regular_compaction_task lists compaction tasks and then requests
their statuses. Only one regular compaction task is guaranteed to still
be running at that time, the rest of them may finish before their status
is requested and so it will no longer be in task manager, causing the test
to fail.

Fix statuses check to consider the possibility of a regular compaction
task being removed from task manager.

Fixes: #17776.

Closes scylladb/scylladb#17784
2024-03-14 14:40:18 +02:00
Piotr Smaron
ad2d039e3d db: move all group 0 tables to schema commitlog
This is to have durability for the group0 tables.
But also because I need it specifically to make
`system.topology` & `system_schema.scylla_keyspaces`
mutations under a single raft command in https://github.com/scylladb/scylladb/pull/16723

Fixes: #15596

Closes scylladb/scylladb#17783
2024-03-14 13:33:30 +01:00
Piotr Dulikowski
2d9e78b09a gossiper: failure detector: don't handle directly removed live endpoints
Commit 0665d9c346 changed the gossiper
failure detector in the following way: when live endpoints change
and per-node failure detectors finish their loops, the main failure
detector calls gossiper::convict for those nodes which were alive when
the current iteration of the main FD started but now are not. This was
changed in order to make sure that nodes are marked as down, because
some other code in gossiper could concurrently remove nodes from
the live node lists without marking them properly.

This was committed around 3 years ago and the situation changed:

- After 75d1dd3a76
  the `endpoint_state::_is_alive` field was removed and liveness
  of a node is solely determined by its presence
  in the `gossiper::_live_endpoints` field.
- Currently, all gossiper code which modifies `_live_endpoints`
  takes care to trigger relevant callback. The only function which
  modifies the field but does not trigger notifications
  is `gossiper::evict_from_membership`, but it is either called
  after `gossiper::remove_endpoint` which triggers callbacks
  by itself, or when a node is already dead and there is no need
  to trigger callbacks.

So, it looks like the reasons it was introduced for are not relevant
anymore. What's more important though is that it is involved in a bug
described in scylladb/scylladb#17515. In short, the following sequence
of events may happen:

1. Failure detector for some remote node X decides that it was dead
   long enough and `convict`s it, causing live endpoints to be updated.
2. The gossiper main loop sends a successful echo to X and *decides*
  to mark it as alive.
3. At the same time, failure detectors for all nodes other than X finish
  and the main failure detector continues; it notices that node X is
  not alive (because it was convicted in point 1.) and *decides*
  to convict it.
4. Actions planned in 2 and 3 run one after another, i.e. node is first
  marked as alive and then immediately as dead.

This causes `on_alive` callbacks to run first and then `on_dead`. The
second one is problematic as it closes RPC connections to node X - in
particular, if X is in the process of replacing another node with the
same IP then it may cause the replace operation to fail.

In order to simplify the code and fix the bug - remove the piece
of logic in question.

Fixes: scylladb/scylladb#17515

Closes scylladb/scylladb#17754
2024-03-14 13:29:17 +01:00
Botond Dénes
d6103dc1b6 tools/scylla-nodetool: snapshot: handle ks.tbl positional args correctly
Nodetool currently assumes that positional arguments are only keyspaces.
ks.tbl pairs are only provided when --kt-list or friends are used. This
is not the case however. So check positional args too, and if they look
like ks.tbl, handle them accordingly.
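
The check on positional arguments can be sketched roughly as follows. This is an illustrative assumption, not the nodetool implementation: `split_positional` is a hypothetical helper, it splits on the first `.` only, and it does not attempt the special handling of alternator names mentioned in this commit.

```python
def split_positional(args):
    """Split positional args into plain keyspaces and (keyspace, table) pairs."""
    keyspaces, tables = [], []
    for arg in args:
        ks, sep, tbl = arg.partition(".")
        if sep and ks and tbl:
            # Looks like ks.tbl: treat it as a keyspace/table pair.
            tables.append((ks, tbl))
        else:
            # No dot (or an empty side): treat the whole argument as a keyspace.
            keyspaces.append(arg)
    return keyspaces, tables

keyspaces, tables = split_positional(["ks1", "ks2.tbl"])
```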

While at it, also make sure that alternator keyspace and tables names
are handled correctly.

Closes scylladb/scylladb#17480
2024-03-14 13:42:23 +02:00
Avi Kivity
dd76e1c834 Merge 'Simplify error_injection::inject_with_handler()' from Pavel Emelyanov
The method in question can have a shorter name that matches all other injections in this class, and can be non-template

Closes scylladb/scylladb#17734

* github.com:scylladb/scylladb:
  error_injection: De-template inject() with handler
  error_injection: Overload inject() instead of inject_with_handler()
2024-03-14 13:37:54 +02:00
Petr Gusev
2783985bb2 test_remove_rpc_client_with_pending_requests: add a regression test
This test reproduces the problem from scylladb/scylladb#17445.
It fails quite reliably without the fix from the previous
commit.

The test just bootstraps a new node while bombarding the cluster
with read requests.
2024-03-14 15:17:34 +04:00
Petr Gusev
398e14d6d0 remove_rpc_client_with_ignored_topology: call it earlier
In this commit we move the remove_rpc_client_with_ignored_topology
call to the earliest point possible - when a node first appears
in token_metadata.topology.

From the topology coordinator perspective this happens when a joining
node moves to node_state::bootstrapping and the topology moves to
transition_state::join_group0. In sync_raft_topology_nodes
the node should be contained in transition_nodes. The successful
wait_for_ip before entering transition_state::join_group0 ensures
that update_topology should find a node's IP and put it into the topology.
The barrier in commit_cdc_generation will ensure that all nodes
in the cluster are using the proper connection parameters.

Only outgoing connections are tracked by remove_rpc_client_with_ignored_topology,
those created by the current node. This means we need to call
remove_rpc_client_with_ignored_topology on each node of the cluster.

fixes scylladb/scylladb#17445
2024-03-14 15:10:09 +04:00
Petr Gusev
1b9f21314f storage_service: decouple remove_rpc_client_with_ignored_topology from
notify_joined

It's too late to call remove_rpc_client_with_ignored_topology on
messaging service when a node becomes normal. Data
plane requests can be routed to the node much earlier,
at least when topology switches to write_both_read_new.
The remove_rpc_client_with_ignored_topology function
shuts down sockets and causes such requests to time out.

We intend to call remove_rpc_client_with_ignored_topology
as soon as a node becomes part of token_metadata topology.
In this preparatory commit we refactor
storage_service::notify_joined. We remove the
remove_rpc_client_with_ignored_topology call from it and
call it separately from the two call sites of notify_joined.
2024-03-14 15:10:09 +04:00
Kefu Chai
ce17841860 tools/scylla-nodetool: print bpo::options_description with fmt::streamed
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, since boost::program_options::options_description is
defined by boost.program_options library, and it only provides the
operator<< overload, we're inclined not to specialize `fmt::formatter`
for it at this moment, because

* this class is not defined by the scylla project. we would have to
  find a home for this formatter.
* we are not likely to reuse the formatter in multiple places

so, in this change we just print it using `fmt::streamed`.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17791
2024-03-14 10:44:32 +02:00
Pavel Emelyanov
33d258528e topology_coordinator: Skip dead nodes when balancing tablets
The coordinator can find out which nodes are marked as DOWN, thus when
calling tablets balancer it can feed it a skiplist

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-03-14 10:51:11 +03:00
Pavel Emelyanov
ee55e8442a test: Add test for load_balancer skiplist
The test is inspired by the test_load_balancing_with_empty_node one and
verifies that when a node is skiplisted, balancer doesn't put load on it

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-03-14 10:50:21 +03:00
Pavel Emelyanov
b4dd732dab tablet_allocator: Add skiplist to load_balancer
Currently load balancer skips nodes only based on its "administrative"
state, i.e. whether it's drained/decommissioned/removed/etc. There's no
way to exclude any node from balancing decision based on anything else.
This patch adds this ability by adding skiplist argument to
balance_tablets() method. When a node is in it, it will not be
considered, as if it was removenode-d.
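
The effect of a skiplist can be illustrated with a toy target picker. This is a sketch under stated assumptions, not the load_balancer algorithm: `pick_target`, the node names, and the "least loaded wins" rule are all hypothetical.

```python
def pick_target(load_by_node, skiplist=frozenset()):
    """Return the least-loaded node not in skiplist, or None if no candidate remains."""
    candidates = {n: load for n, load in load_by_node.items() if n not in skiplist}
    if not candidates:
        return None
    # Ties are broken by node name so the result is deterministic.
    return min(candidates, key=lambda n: (candidates[n], n))

load = {"n1": 4, "n2": 0, "n3": 2}
without_skiplist = pick_target(load)       # the empty node n2 is the natural target
with_skiplist = pick_target(load, {"n2"})  # n2 is skipped, so no load is put on it
```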

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-03-14 10:47:31 +03:00
Kefu Chai
926fe29ebd db: commitlog: add fmt::formatter for commitlog types
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for

* db::commitlog::segment::cf_mark
* db::commitlog::segment_manager::named_file
* db::commitlog::segment_manager::dispose_mode
* db::commitlog::segment_manager::byte_flow<T>

please note, the formatter of `db::commitlog::segment` is not
included in this commit, as we are formatting it in the inline
definition of this class. so we cannot define the specialization
of `fmt::formatter` for this class before its callers -- we'd
either use `format_as()` provided by {fmt} v10, or use `fmt::streamed`.
either way, it's different from the theme of this commit, and we
will handle it in a separated commit.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17792
2024-03-14 09:28:12 +02:00
Botond Dénes
20d5c536b5 tools/scylla-nodetool: implement the status command
Contrary to Origin, the single-token case is not discriminated in the
native implementation, for two reasons:
* ScyllaDB doesn't ever run with a single token, it is even moving away
  from vnodes.
* Origin implemented the logic to detect single-token with a mistake: it
  compares the number of tokens to the number of DCs, not the number of
  nodes.

Another difference is that the native implementation doesn't request
ownership information when a keyspace argument was not provided -- it is
not printed anyway.
2024-03-14 03:27:04 -04:00
Botond Dénes
2d4f4cfad4 test/nodetool: rest_api_mock.py: match requests out-of-order
In the previous patch, we made matching requests to different endpoints
be matched out-of-order. In this patch we go one step further and make
matching requests to the same endpoint match out-of-order too.
With this, tests can register the expected requests in any order, not in
the same order as the nodetool-under-test is expected to send them. This
makes testing more flexible. Also, how requests are ordered is not
interesting from the correctness' POV anyway.
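
Order-independent matching can be sketched as a scan over all registered expectations instead of only the first one. This is a minimal illustration, not the rest_api_mock.py code: `consume` and the `(path, params)` tuple representation are assumptions.

```python
def consume(expected, path, params):
    """Remove and return the first expected request matching (path, params), wherever it sits."""
    for i, req in enumerate(expected):
        if req == (path, params):
            return expected.pop(i)
    return None  # nothing registered for this request

# Two expectations for the same endpoint, registered in one order...
expected = [("/a", {"x": "1"}), ("/a", {"x": "2"})]
# ...and consumed in the opposite order.
first = consume(expected, "/a", {"x": "2"})
second = consume(expected, "/a", {"x": "1"})
```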
2024-03-14 03:27:04 -04:00
Botond Dénes
09a27f49ea test/nodetool: rest_api_mock.py: remove trailing / from request paths
The legacy nodetool likes to append an "/" to the requests paths every
now and then, but not consistently. Unfortunately, request path matching
in the mock rest server and in aiohttp is quite sensitive to this
currently. Reduce friction by removing trailing "/" from paths in the
mock api, allowing paths to match each other even if one has a trailing
"/" but the other doesn't.
Unfortunately there is nothing we can do about the aiohttp part, so some
API endpoints have to be registered with a trailing "/".
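
The normalization on the mock api side could look roughly like this. It is a sketch only: `normalize_path` is a hypothetical helper and the paths used are made up for illustration.

```python
def normalize_path(path):
    """Strip trailing '/' so that '/a/b/' and '/a/b' compare equal."""
    stripped = path.rstrip("/")
    return stripped or "/"  # keep the root path intact

same = normalize_path("/a/b/") == normalize_path("/a/b")
```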
2024-03-14 03:27:04 -04:00
Botond Dénes
5659f23b2a test/nodetool: rest_api_mock.py: use static routes
The mock server currently provides its own router to the aiohttp.web
app. The ability to provide custom routers, however, is deprecated and
can be removed at any point. So refactor the mock server to use the
built-in router. This requires some changes, because the built-in router
does not allow adding/removing routes once the server starts. However
the mock server only learns of the used routes when the tests run.
This unfortunately means that we have to statically register all
possible routes the tests will use. Fortunately, aiohttp has variable
route support (templated routes) and with this, we can get away with
just 9 statically registered routes, which is not too bad.

A (desired) side-effect of this refactoring is that now requests to
different routes do not have to arrive in order. This constraint of the
previous implementation proved to be not useful, and even made writing
certain tests awkward.
2024-03-14 03:27:04 -04:00
Botond Dénes
061bd89957 test/nodetool: check only non-exhausted requests
Refactor how the tests check for expected requests which were never
invoked. At the end of every test, the nodetool fixture requests all
unconsumed expected requests from the rest_api_mock.py and checks that
there is none. This mechanism has some interaction with requests which
have a "multiple" set: rest_api_mock.py allows registering requests with
different "multiple" requirements -- how many times a request is
expected to be invoked:
* ANY: [0, +inf)
* ONE: 1
* MULTIPLE: [1, +inf)

Requests are stored in a stack. When a request arrives, we pop off
requests from the top until we find a perfect match. We pop off
requests, iff: multiple == ANY || multiple == MULTIPLE and was hit at
least once.
This works as long as we don't have a multiple=ANY request at the
bottom of the stack which is never invoked. Or a multiple=MULTIPLE one.
This will get worse once we refactor requests to be not stored in a
stack.

So in this patch, we filter requests when collecting unexhausted ones,
dropping those which would be qualified to be popped from the stack.
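
The filtering rule can be sketched as follows. This is a minimal illustration, not the rest_api_mock.py code: the `Multiple` enum, `ExpectedRequest`, and `unexhausted` are hypothetical names, and the sketch assumes that a consumed ONE request has already been removed from the list elsewhere.

```python
from dataclasses import dataclass
from enum import Enum

class Multiple(Enum):
    ANY = "any"            # expected [0, +inf) times
    ONE = "one"            # expected exactly once
    MULTIPLE = "multiple"  # expected [1, +inf) times

@dataclass
class ExpectedRequest:
    path: str
    multiple: Multiple
    hits: int = 0

    def exhausted(self):
        """True if the request would qualify to be popped: ANY, or MULTIPLE hit at least once."""
        if self.multiple is Multiple.ANY:
            return True
        if self.multiple is Multiple.MULTIPLE:
            return self.hits >= 1
        return False  # assumption: a ONE request still present was never consumed

def unexhausted(expected):
    """Collect the requests that should be reported as never invoked."""
    return [r for r in expected if not r.exhausted()]

leftovers = unexhausted([
    ExpectedRequest("/any", Multiple.ANY),
    ExpectedRequest("/multi_hit", Multiple.MULTIPLE, hits=2),
    ExpectedRequest("/multi_missed", Multiple.MULTIPLE),
    ExpectedRequest("/one_missed", Multiple.ONE),
])
leftover_paths = [r.path for r in leftovers]
```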
2024-03-14 03:27:04 -04:00
Botond Dénes
be5a18c07d tools/scylla-nodetool: repair: set the jobThreads request parameter
Although ScyllaDB ignores this request parameter, the Java nodetool
sets it, so it is better to have the native one do the same for
symmetry. It makes testing easier.
Discovered with the more strict request matching introduced in the next
patches.
2024-03-14 03:26:13 -04:00
Asias He
9d41fb9bcd repair: Add hosts and ignore_nodes option support for tablet repair
It is not supported currently.

If a user passes the option, the request will be rejected with:

    The hosts option is not supported for tablet repair
    The ignore_nodes option is not supported for tablet repair

This option is useful to select nodes to repair.

Fixes: #17742

Tests: repair_additional_test.py::TestRepairAdditional::test_repair_ignore_nodes
       repair_additional_test.py::TestRepairAdditional::test_repair_ignore_nodes_errors
       repair_additional_test.py::TestRepairAdditional::test_repair_option_pr_dc_host

Closes scylladb/scylladb#17767
2024-03-14 08:40:30 +02:00
Avi Kivity
c1d8a1dda5 Merge 'Fix false-positive errors in scrub validate-mode' from Botond Dénes
The new MX-native validator, which validates the index in tandem with the data file, was discovered to print false-positive errors, related to range-tombstones and promoted-index positions.
This series fixes that. But first, it refactors the scrub-related tests. These are currently dominated by boiler-plate code. They are hard to read and hard to write. In the first half of the series, a new `scrub_test` is introduced, which moves all the boiler-plate to a central place, allowing the tests to focus on just the aspect of scrub that is tested.
Then, all the found bugs in validate are fixed and finally a new test, checking validate with valid sstable is introduced.

Fixes: #16326

Closes scylladb/scylladb#16327

* github.com:scylladb/scylladb:
  test/boost/sstable_compaction_test: add validation test with valid sstable
  sstablex/mx/reader: validate(): print trace message when finishing the PI block
  sstablex/mx/reader: validate(): make index-data PI position check message consistent
  sstablex/mx/reader: validate(): only load the next PI block if current is exhausted
  sstablex/mx/reader: validate(): reset the current PI block on partition-start
  sstablex/mx/reader: validate(): consume_range_tombstone(): check for finished clustering blocked
  sstablex/mx/reader: validate(): fix validator for range tombstone end bounds
  test/boost/sstable_compaction_test: drop write_corrupt_sstable() helper
  test/boost/sstable_compaction_test: fix indentation
  test/boost/sstable_compaction_test: use test_scrub_framework in test_scrub_quarantine_mode_test
  test/boost/sstable_compaction_test: use scrub_test_framework in sstable_scrub_segregate_mode_test
  test/boost/sstable_compaction_test: use scrub_test_framework in sstable_scrub_skip_mode_test
  test/boost/sstable_compaction_test: use scrub_test_framework in sstable_scrub_validate_mode_test
  test/boost/sstable_compaction_test: introduce scrub_test_framework
  test/lib/random_schema: add uncompatible_timestamp_generator()
2024-03-13 20:51:30 +02:00
Kefu Chai
15bea069a9 docs: use less slangy language
this is a follow-up change of 1519904fb9, to incorporate the comment
from Anna Stuchlik.

Signed-off-by: Anna Stuchlik <anna.stuchlik@scylladb.com>

Closes scylladb/scylladb#17671
2024-03-13 13:33:37 +02:00
Avi Kivity
4db4b2279c Merge 'tools/scylla-nodetool: implement the last batch of commands' from Botond Dénes
This PR implements the following new nodetool commands:
* netstats
* tablehistograms/cfhistograms
* proxyhistograms

All commands come with tests and all tests pass with both the new and the current nodetool implementations.

Refs: https://github.com/scylladb/scylladb/issues/15588

Closes scylladb/scylladb#17651

* github.com:scylladb/scylladb:
  tools/scylla-nodetool: implement the proxyhistograms command
  tools/scylla-nodetool: implement the tableshistograms command
  tools/scylla-nodetool: introduce buffer_samples
  utils/estimated_histogram: estimated_histogram: add constructor taking buckets
  tools/scylla-nodetool: implement the netstats command
  tools/scylla-nodetool: add correct units to file_size_printer
2024-03-13 12:46:11 +02:00
Avi Kivity
e120ba3514 sstables: partition_index_cache: evict entries within a page gently
When the partition_index_cache is evicted, we yield for preemption between
pages, but not within a page.

Commit 3b2890e1db ("sstables: Switch index_list to chunked_vector
to avoid large allocations") recognized that index pages can be large enough
to overflow a 128k alignment block (this was before the index cache and
index entries were not stored in LSA then). However, it did not go as far as
to gently free individual entries; either the problem was not recognized
or wasn't as bad.

As the referenced issue shows, a fairly large stall can happen when freeing
the page. The workload had a large number of tombstones, so index selectivity
was poor.

Fix by evicting individual rows gently.

The fix ignores the case where rows are still referenced: it is unlikely
that all index pages will be referenced, and in any case skipping over
a referenced page takes an insignificant amount of time, compared to freeing
a page.

Fixes #17605

Closes scylladb/scylladb#17606
2024-03-13 10:44:37 +01:00
Marcin Maliszkiewicz
7b60752e47 test: fix cql connection problem in test_auth_raft_command_split
This is a speculative fix as the problem is observed only on CI.
When run_async is called right after driver_connect and get_cql
it fails with ConnectionException('Host has been marked down or
removed').

If the approach proves to be successful we can start to deprecate
base get_cql in favor of get_ready_cql. It's better to have robust
testing helper libraries than try to take care of it in every test
case separately.

Fixes #17713

Closes scylladb/scylladb#17772
2024-03-13 10:36:51 +01:00
Pavel Emelyanov
4d83a8c12c topology_coordinator: Mark constant class methods with const
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#17756
2024-03-13 10:23:39 +02:00
Pavel Emelyanov
2e982df898 test/tablets: Generalize repair history loading
Two repair test cases verify that repair generated enough rows in the
history table. Both use identical code for that, worth generalizing

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#17761
2024-03-13 10:22:57 +02:00
Pavel Emelyanov
88a40b0dfa uuid: UUID_gen::get_UUID src argument is const pointer
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#17762
2024-03-13 10:21:25 +02:00
Botond Dénes
53e3325845 Merge 'mutation: add fmt::formatter for mutation types' from Kefu Chai
before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter.

in this change, we define formatters for

* mutation_partition_v2::printer
* frozen_mutation::printer
* mutation

their operator<<:s are dropped.

Refs #13245

Closes scylladb/scylladb#17769

* github.com:scylladb/scylladb:
  mutation: add fmt::formatter for mutation
  mutation: add fmt::formatter for frozen_mutation::printer
  mutation: add fmt::formatter for mutation_partition_v2::printer
2024-03-13 10:13:09 +02:00
Pavel Emelyanov
488404e080 gms: Remove unused i_failure_detection_event_listener
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#17765
2024-03-13 09:33:56 +02:00
Kefu Chai
fb4f48b4ed schema: add fmt::formatter for schema
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for

* column_definition
* column_mapping
* ordinal_column_id
* raw_view_info
* schema
* view_ptr

their operator<<:s are dropped. but operator<< for schema is preserved,
as we are still printing `seastar::lw_shared_ptr<const schema>` with
our homebrew generic formatter for `seastar::lw_shared_ptr<>`, which
uses operator<< to print the pointee.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17768
2024-03-13 09:29:00 +02:00
Kefu Chai
85c4034495 .git: skip redis/lolwut.cc when scanning spelling errors
codespell reports "Nees" should be "Needs" but "Nees" is the last
name of Georg Nees. so it is not a misspelling, and should not be
fixed.

since the purpose of lolwut.cc is to display Redis version and
print a generative computer art. the one included by our version
was created by Georg Nees. since the LOLWUT command does not contain
business logic connected with scylladb, we don't lose a lot if skip
it when scanning for spelling errors. so, in this change, let's
skip it, this should silence one more warning from the github
codespell workflow.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17770
2024-03-13 09:25:58 +02:00
Michał Chojnowski
75864e18a2 open-coredump.sh: respect http redirects
downloads.scylladb.com recently started redirecting from http to https
(via `301 Moved Permanently`).
This broke package downloading in open-coredump.sh.

To fix this, we have to instruct curl to follow redirects.

Closes scylladb/scylladb#17759
2024-03-13 08:57:04 +02:00
Pavel Emelyanov
d90db016bf treewide: Use partition_slice::is_reversed()
Continuation of cc56a971e8, more noisy places detected

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#17763
2024-03-13 08:52:46 +02:00
Botond Dénes
a329cc34b7 tools/scylla-nodetool: implement the proxyhistograms command 2024-03-13 02:06:30 -04:00
Botond Dénes
a52eddc9c1 tools/scylla-nodetool: implement the tableshistograms command 2024-03-13 02:06:30 -04:00