As the first clustering column. For vnode keyspaces, this will always be
"ALL", for tablet keyspaces, this will contain the name of the described
table.
Into a separate method. For vnodes there is a single ring per keyspace,
but for tablets, there is a separate ring for each table in the
keyspace. To accomodate both, we move the code emitting the ring into a
separate method, so execute() can just call it once per keyspace or once
per table, whichever appropriate.
Do no use the internal host2ip() method. This relies on `_group0`, which
is only set on shard 0. Consequently, any call to this method, coming
from a shard other than shard 0, would crash ScyllaDB, as it
dereferences a nullptr.
Tablet transition would get stuck anyway for such nodes, so it's not worth trying
refs: #16372 (not fixes, because there's also repair transitions with same problem)
Closesscylladb/scylladb#17796
* github.com:scylladb/scylladb:
topology_coordinator: Skip dead nodes when balancing tablets
test: Add test for load_balancer skiplist
tablet_allocator: Add skiplist to load_balancer
The status command has an extensive amount of requests to the server. To be able to handle this more easily, the rest api mock server is refactored extensively to be more flexible, accepting expected requests out-of-order. While at it, the rest api mock server also moves away from a deprecated `aiohttp` feature: providing custom router argument to the `aiohttp` app. This forces us to pre-register all API endpoints that any test currently uses, although due to some templateing support, this is not as bad as it sounds. Still, this is an annoyance, but this point we have implemented almost all commands, so this won't be much a of a problem going forward.
Refs: https://github.com/scylladb/scylladb/issues/15588Closesscylladb/scylladb#17547
* github.com:scylladb/scylladb:
tools/scylla-nodetool: implement the status command
test/nodetool: rest_api_mock.py: match requests out-of-order
test/nodetool: rest_api_mock.py: remove trailing / from request paths
test/nodetool: rest_api_mock.py: use static routes
test/nodetool: check only non-exhausted requests
tools/scylla-nodetool: repair: set the jobThreads request parameter
It's too late to call `remove_rpc_client_with_ignored_topology` on messaging service when a node becomes normal. Data plane requests can be routed to the node much earlier, at least when topology switches to `write_both_read_new`. The `remove_rpc_client_with_ignored_topology` function shutdowns sockets and causes such requests to timeout.
In this PR we move the `remove_rpc_client_with_ignored_topology` call to the earliest point possible when a node first appears in `token_metadata.topology`.
From the topology coordinator perspective this happens when a joining node moves to `node_state::bootstrapping` and the topology moves to `transition_state::join_group0`. In `sync_raft_topology_nodes` the node should be contained in transition_nodes. The successful `wait_for_ip` before entering `transition_state::join_group0` ensures that update_topology should find a node's IP and put it into the topology. The barrier in `commit_cdc_generation` will ensure that all nodes in the cluster are using the proper connection parameters.
Only outgoing connections are tracked by `remove_rpc_client_with_ignored_topology`, those created by the current node. This means we need to call `remove_rpc_client_with_ignored_topology` on each node of the cluster.
fixesscylladb/scylladb#17445Closesscylladb/scylladb#17757
* github.com:scylladb/scylladb:
test_remove_rpc_client_with_pending_requests: add a regression test
remove_rpc_client_with_ignored_topology: call it earlier
storage_service: decouple remove_rpc_client_with_ignored_topology from notify_joined
Since 4c767c379c we can reach a situation
where we know that we have admitted too many expensive view update
operations and the mechanism of dropping the following view updates
can be triggerred in a wider range of scenarios. Ideally, we would
want to fail whole requests on the coordinator level, but for now, we
change the behavior to failing just the base writes. This allows us
to avoid creating inconsistencies between base replicas and views
at the cost of introducing inconsistencies between different base
replicas. This, however, can be fixed by repair, in contrast to
base-view inconsistencies which we don't have a good method of fixing.
Fixes#17795Closesscylladb/scylladb#17777
Since 6b87778 regular compaction tasks are removed from task manager
immediately after they are finished.
test_regular_compaction_task lists compaction tasks and then requests
their statuses. Only one regular compaction task is guaranteed to still
be running at that time, the rest of them may finish before their status
is requested and so it will no longer be in task manager, causing the test
to fail.
Fix statuses check to consider the possibility of a regular compaction
task being removed from task manager.
Fixes: #17776.
Closesscylladb/scylladb#17784
Commit 0665d9c346 changed the gossiper
failure detector in the following way: when live endpoints change
and per-node failure detectors finish their loops, the main failure
detector calls gossiper::convict for those nodes which were alive when
the current iteration of the main FD started but now are not. This was
changed in order to make sure that nodes are marked as down, because
some other code in gossiper could concurrently remove nodes from
the live node lists without marking them properly.
This was committed around 3 years ago and the situation changed:
- After 75d1dd3a76
the `endpoint_state::_is_alive` field was removed and liveness
of a node is solely determined by its presence
in the `gossiper::_live_endpoints` field.
- Currently, all gossiper code which modifies `_live_endpoints`
takes care to trigger relevant callback. The only function which
modifies the field but does not trigger notifications
is `gossiper::evict_from_membership`, but it is either called
after `gossiper::remove_endpoint` which triggers callbacks
by itself, or when a node is already dead and there is no need
to trigger callbacks.
So, it looks like the reasons it was introduced for are not relevant
anymore. What's more important though is that it is involved in a bug
described in scylladb/scylladb#17515. In short, the following sequence
of events may happen:
1. Failure detector for some remote node X decides that it was dead
long enough and `convict`s it, causing live endpoints to be updated.
2. The gossiper main loop sends a successful echo to X and *decides*
to mark it as alive.
3. At the same time, failure detector for all nodes other than X finish
and main failure detector continues; it notices that node X is
not alive (because it was convicted in point 1.) and *decides*
to convict it.
4. Actions planned in 2 and 3 run one after another, i.e. node is first
marked as alive and then immediately as dead.
This causes `on_alive` callbacks to run first and then `on_dead`. The
second one is problematic as it closes RPC connections to node X - in
particular, if X is in the process of replacing another node with the
same IP then it may cause the replace operation to fail.
In order to simplify the code and fix the bug - remove the piece
of logic in question.
Fixes: scylladb/scylladb#17515Closesscylladb/scylladb#17754
Nodetool currently assumes that positional arguments are only keyspaces.
ks.tbl pairs are only provided when --kt-list or friends are used. This
is not the case however. So check positional args too, and if they look
like ks.tbl, handle them accordingly.
While at it, also make sure that alternator keyspace and tables names
are handled correctly.
Closesscylladb/scylladb#17480
The method in question can have a shorter name that matches all other injections in this class, and can be non-template
Closesscylladb/scylladb#17734
* github.com:scylladb/scylladb:
error_injection: De-template inject() with handler
error_injection: Overload inject() instead of inject_with_handler()
This test reproduces the problem from scylladb/scylladb#17445.
It fails quite reliably without the fix from the previous
commit.
The test just bootstraps a new node while bombarding the cluster
with read requests.
In this commit we move the remove_rpc_client_with_ignored_topology
call to the earliest point possible - when a node first appears
in token_metadata.topology.
From the topology coordinator perspective this happens when a joining
node moves to node_state::bootstrapping and the topology moves to
transition_state::join_group0. In sync_raft_topology_nodes
the node should be contained in transition_nodes. The successful
wait_for_ip before entering transition_state::join_group0 ensures
that update_topology should find a node's IP and put it into the topology.
The barrier in commit_cdc_generation will ensure that all nodes
in the cluster are using the proper connection parameters.
Only outgoing connections are tracked by remove_rpc_client_with_ignored_topology,
those created by the current node. This means we need to call
remove_rpc_client_with_ignored_topology on each node of the cluster.
fixesscylladb/scylladb#17445
notify_joined
It's too late to call remove_rpc_client_with_ignored_topology on
messaging service when a node becomes normal. Data
plane requests can be routed to the node much earlier,
at least when topology switches to write_both_read_new.
The remove_rpc_client_with_ignored_topology function
shutdowns sockets and causes such requests to timeout.
We intend to call remove_rpc_client_with_ignored_topology
as soon as a node becomes part of token_metadata topology.
In this preparatory commit we refactor
storage_service::notify_joined. We remove the
remove_rpc_client_with_ignored_topology call from it
call it separately from the two call sites of notify_joined.
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, since boost::program_options::options_description is
defined by boost.program_options library, and it only provides the
operator<< overload. we're inclined to not specializing `fmt::formatter`
for it at this moment, because
* this class is not in defined by scylla project. we would have to
find a home for this formatter.
* we are not likely to reuse the formatter in multiple places
so, in this change we just print it using `fmt::streamed`.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#17791
The coordinator can find out which nodes are marked as DOWN, thus when
calling tablets balancer it can feed it a skiplist
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The test is inspired by the test_load_balancing_with_empty_node one and
verifies that when a node is skiplisted, balancer doesn't put load on it
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Currently load balancer skips nodes only based on its "administrative"
state, i.e. whether it's drained/decommissioned/removed/etc. There's no
way to exclude any node from balancing decision based on anything else.
This patch add this ability by adding skiplist argument to
balance_tablets() method. When a node is in it, it will not be
considered, as if it was removenode-d.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
* db::commitlog::segment::cf_mark
* db::commitlog::segment_manager::named_file
* db::commitlog::segment_manager::dispose_mode
* db::commitlog::segment_manager::byte_flow<T>
please note, the formatter of `db::commitlog::segment` is not
included in this commit, as we are formatting it in the inline
definition of this class. so we cannot define the specialization
of `fmt::formatter` for this class before its callers -- we'd
either use `format_as()` provided by {fmt} v10, or use `fmt::streamed`.
either way, it's different from the theme of this commit, and we
will handle it in a separated commit.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#17792
Contrary to Origin, the single-token case is not discriminated in the
native implementation, for two reasons:
* ScyllaDB doesn't ever run with a single token, it is even moving away
from vnodes.
* Origin implemented the logic to detect single-token with a mistake: it
compares the number of tokens to the number of DCs, not the number of
nodes.
Another difference is that the native implementation doesn't request
ownership information when a keyspace argument was not provided -- it is
not printed anyway.
In the previous patch, we made matching requests to different endpoints
be matched out-of-order. In this patch we go one step further and make
matching requests to the same endpoint match out-of-order too.
With this, tests can register the expected requests in any order, not in
the same order as the nodetool-under-test is expected to send them. This
makes testing more flexible. Also, how requests are ordered is not
interesting from the correctness' POV anyway.
The legacy nodetool likes to append an "/" to the requests paths every
now and then, but not consistently. Unfortunately, request path matching
in the mock rest server and in aiohttp is quite sensitive to this
currently. Reduce friction by removing trailing "/" from paths in the
mock api, allowing paths to match each other even if one has a trailing
"/" but the other doesn't.
Unfortunately there is nothing we can do about the aiohttp part, so some
API endpoints have to be registered with a trailing "/".
The mock server currently provides its own router to the aiohttp.web
app. The ability to provide custom routers however is deprecated and
can be removed at any point. So refactor the mock server to use the
built-in router. This requires some changes, because the built-in router
does not allow adding/removing routes once the server starts. However
the mock server only learns of the used routes when the tests run.
This unfortunately means that we have to statically register all
possible routes the tests will use. Fortunately, aiohttp has variable
route support (templated routes) and with this, we can get away with
just 9 statically registered routes, which is not too bad.
A (desired) side-effect of this refactoring is that now requests to
different routes do not have to arrive in order. This constraint of the
previous implementation proved to be not useful, and even made writing
certain tests awkward.
Refactor how the tests check for expected requests which were never
invoked. At the end of every test, the nodetool fixture requests all
unconsumed expected requests from the rest_api_mock.py and checks that
there is none. This mechanism has some interaction with requests which
have a "multiple" set: rest_api_mock.py allows registering requests with
different "multiple" requirements -- how many times a request is
expected to be invoked:
* ANY: [0, +inf)
* ONE: 1
* MULTIPLE: [1, +inf)
Requests are stored in a stack. When a request arrives, we pop off
requests from the top until we find a perfect match. We pop off
requests, iff: multiple == ANY || multiple == MULTIPLE and was hit at
least once.
This works as long as we don't have an multiple=ANY request at the
bottom of the stack which is never invoked. Or a multiple=MULTIPLE one.
This will get worse once we refactor requests to be not stored in a
stack.
So in this patch, we filter requests when collecting unexhausted ones,
dropping those which would be qualified to be popped from the stack.
Although ScyllaDB ignores this request parameter, the Java nodetools
sets it, so it is better to have the native one do the same for
symmetry. It makes testing easier.
Discovered with the more strict request matching introduced in the next
patches.
It is not supported currently.
If a user passes the option, the request will be rejected with:
The hosts option is not supported for tablet repair
The ignore_nodes option is not supported for tablet repair
This option is useful to select nodes to repair.
Fixes: #17742
Tests: repair_additional_test.py::TestRepairAdditional::test_repair_ignore_nodes
repair_additional_test.py::TestRepairAdditional::test_repair_ignore_nodes_errors
repair_additional_test.py::TestRepairAdditional::test_repair_option_pr_dc_host
Closesscylladb/scylladb#17767
The new MX-native validator, which validates the index in tandem with the data file, was discovered to print false-positive errors, related to range-tombstones and promoted-index positions.
This series fixes that. But first, it refactors the scrub-related tests. These are currently dominated by boiler-plate code. They are hard to read and hard to write. In the first half of the series, a new `scrub_test` is introduced, which moves all the boiler-plate to a central place, allowing the tests to focus on just the aspect of scrub that is tested.
Then, all the found bugs in validate are fixed and finally a new test, checking validate with valid sstable is introduced.
Fixes: #16326Closesscylladb/scylladb#16327
* github.com:scylladb/scylladb:
test/boost/sstable_compaction_test: add validation test with valid sstable
sstablex/mx/reader: validate(): print trace message when finishing the PI block
sstablex/mx/reader: validate(): make index-data PI position check message consistent
sstablex/mx/reader: validate(): only load the next PI block if current is exhausted
sstablex/mx/reader: validate(): reset the current PI block on partition-start
sstablex/mx/reader: validate(): consume_range_tombstone(): check for finished clustering blocked
sstablex/mx/reader: validate(): fix validator for range tombstone end bounds
test/boost/sstable_compaction_test: drop write_corrupt_sstable() helper
test/boost/sstable_compaction_test: fix indentation
test/boost/sstable_compaction_test: use test_scrub_framework in test_scrub_quarantine_mode_test
test/boost/sstable_compaction_test: use scrub_test_framework in sstable_scrub_segregate_mode_test
test/boost/sstable_compaction_test: use scrub_test_framework in sstable_scrub_skip_mode_test
test/boost/sstable_compaction_test: use scrub_test_framework in sstable_scrub_validate_mode_test
test/boost/sstable_compaction_test: introduce scrub_test_framework
test/lib/random_schema: add uncompatible_timestamp_generator()
This PR implements the following new nodetool commands:
* netstats
* tablehistograms/cfhistograms
* proxyhistograms
All commands come with tests and all tests pass with both the new and the current nodetool implementations.
Refs: https://github.com/scylladb/scylladb/issues/15588Closesscylladb/scylladb#17651
* github.com:scylladb/scylladb:
tools/scylla-nodetool: implement the proxyhistograms command
tools/scylla-nodetool: implement the tableshistograms command
tools/scylla-nodetool: introduce buffer_samples
utils/estimated_histogram: estimated_histogram: add constructor taking buckets
tools/scylla-nodetool: implement the netstats command
tools/scylla-nodetool: add correct units to file_size_printer
When the partition_index_cache is evicted, we yield for preemption between
pages, but not within a page.
Commit 3b2890e1db ("sstables: Switch index_list to chunked_vector
to avoid large allocations") recognized that index pages can be large enough
to overflow a 128k alignment block (this was before the index cache and
index entries were not stored in LSA then). However, it did not go as far as
to gently free individual entries; either the problem was not recognized
or wasn't as bad.
As the referenced issue shows, a fairly large stall can happen when freeing
the page. The workload had a large number of tombstones, so index selectivity
was poor.
Fix by evicting individual rows gently.
The fix ignores the case where rows are still references: it is unlikely
that all index pages will be referenced, and in any case skipping over
a referenced page takes an insignificant amount of time, compared to freeing
a page.
Fixes#17605Closesscylladb/scylladb#17606
This is a speculative fix as the problem is observed only on CI.
When run_async is called right after driver_connect and get_cql
it fails with ConnectionException('Host has been marked down or
removed').
If the approach proves to be succesfull we can start to deprecate
base get_cql in favor of get_ready_cql. It's better to have robust
testing helper libraries than try to take care of it in every test
case separately.
Fixes#17713Closesscylladb/scylladb#17772
Two repair test cases verify that repair generated enough rows in the
history table. Both use identical code for that, worth generalizing
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closesscylladb/scylladb#17761
before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter.
in this change, we define formatters for
* mutation_partition_v2::printer
* frozen_mutation::printer
* mutation
their operator<<:s are dropped.
Refs #13245Closesscylladb/scylladb#17769
* github.com:scylladb/scylladb:
mutation: add fmt::formatter for mutation
mutation: add fmt::formatter for frozen_mutation::printer
mutation: add fmt::formatter for mutation_partition_v2::printer
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
* column_definition
* column_mapping
* ordinal_column_id
* raw_view_info
* schema
* view_ptr
their operator<<:s are dropped. but operator<< for schema is preserved,
as we are still printing `seastar::lw_shared_ptr<const schema>` with
our homebrew generic formatter for `seastar::lw_shared_ptr<>`, which
uses operator<< to print the pointee.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#17768
codespell reports "Nees" should be "Needs" but "Nees" is the last
name of Georg Nees. so it is not a misspelling. can should not be
fixed.
since the purpose of lolwut.cc is to display Redis version and
print a generative computer art. the one included by our version
was created by Georg Nees. since the LOLWUT command does not contain
business logic connected with scylladb, we don't lose a lot if skip
it when scanning for spelling errors. so, in this change, let's
skip it, this should silence one more warning from the github
codespell workflow.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#17770
downloads.scylladb.com recently started redirecting from http to https
(via `301 Moved Permanently`).
This broke package downloading in open-coredump.sh.
To fix this, we have to instruct curl to follow redirects.
Closesscylladb/scylladb#17759
When printing human-readable file-sizes, the Java nodetool always uses
base-2 steps (1024) to arrive at the human-readable size, but it uses
the base-10 units (MB) and base-2 units (MiB) interchangeably.
Adapt file_size_printer to support both. Add a flag to control which is
used.