One of the test cases explicitly wraps itself into async, but there's a
convenience helper for that already.
Indentation is deliberately left broken
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
the corresponding implementation of operator<< was dropped in
a40d3fc25b, so there is no need to
keep this friend declaration anymore.
also, drop `include <ostream>`, as this header does not reference
any of the ostream types with the change above.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17743
* seastar 5d3ee980...a71bd96d (51):
> util: add formatter for optimized_optional<>
> build: search protobuf using package config
> reactor: Move pieces of scollectd to scollectd
> reactor: Remove write-only task_queue._current
> Add missing include in tests/unit/rpc_test.cc
> doc/io_tester.md: include request_type::unlink in the docs
> doc/io-tester.md: update obsolete information in io_tester docs
> io_tester/conf.yaml: include an example of request_type::unlink job
> io_tester: implement request_type::unlink
> reactor: Print correct errno on io_submit failure
> src/core/reactor.cc: qualify metric function calls with "sm::"
> build: add shard_id.hh to seastar library
> thread: speed up thread creation in debug mode
> include: add missing modules.hh import to shard_id.hh
> prometheus: avoid ambiguity when calling MetricFamily.set_name()
> util/log: add formatter for log_level
> util/log: use string_view for log_level_names
> perf: Calculate length of name column in perf tests
> rpc_test: add a test for inter-compressor communication
> rpc: in multi_algo_compressor_factory, propagate send_empty_frame
> rpc: give compressors a way to send something over the connection
> rpc: allow (and skip) empty compressed frames
> metrics: change value_vector type to std::deque
> HACKING.md: remove doc related to test_dist
> test/unit: do not check if __cplusplus > 201703L
> json_elements: s/foramted/formatted/
> iostream: Refactor input_stream::read_exactly_part
> add unit test to verify str.starts_with(str), str.ends_with(str) return true.
> str.starts_with(str) and str.ends_with(str) should return true, just like std::string
> rpc: Remove FrameType::header_and_buffer_type
> rpc: Defuturize FrameType::return_type
> rpc: Kill FrameType::get_size()
> treewide: put std::invocable<> constraints in template param list
> include: do not include unuser headers
> rpc: fix a deadlock in connection::send()
> iostream: Replace recursion by iteration in input_stream::read_exactly_part
> core/bitops.hh: use std::integral when appropriate
> treewide: include <concepts> instead of seastar/util/concepts.hh
> abortable_fifo: fix the indent
> treewide: expand `SEASTAR_CONCEPT` macro
> util/concepts: always define SEASTAR_CONCEPT
> file: Remove unused thread-pool arg from directory lister
> seastar-json2code: collect required_query_params using a list
> seastar-json2code: reduce the indent level
> seastar-json2code: indent the enum and array elements
> seastar-json2code: generate code for enum type using Template
> seastar-json2code: extract add_operation() out
> reactor: Re-ifdef SIGSEGV sigaction installing
> reactor: Re-ifdef reactor::enable_timer()
> reactor: Re-ifdef task_histogram_add_task()
> reactor: Re-ifdef install_signal_handler_stack()
Closes scylladb/scylladb#17714
This small series improves the Alternator tests for metrics:
1. Improves some comments in the test.
2. Restores a test that was previously hidden by two tests having the same name.
3. Adds tests for latency histogram metrics.
Closes scylladb/scylladb#17623
* github.com:scylladb/scylladb:
test/alternator: tests for latency metrics
test/alternator: improve comments and unhide hidden test
There are four stages left to handle: cleanup, cleanup_target, end_migration and revert_migration. All are handling removed nodes already, so the PR just extends the test.
fixes: #16527
Closes scylladb/scylladb#17684
* github.com:scylladb/scylladb:
test/tablets_migration: Test revert_migration failure handling
test/tablets_migration: Test end_migration failure handling
test/tablets_migration: Test cleanup_target failure handling
test/tablets_migration: Test cleanup failure handling
test/tablets_migration: Prepare for do_... stages
test/tablets_migration: Add ability to removenode via any other node
test/tablets_migration: Wrap migration stages failing code into a helper class
storage_service: Add failure injection to crash cleanup_tablet
Instead of a functor, for those metrics that just return the value of an
existing member variable. This is ever so slightly more efficient than a
functor.
Closes scylladb/scylladb#17726
In test/alternator/test_metrics.py we had tests for the operation-count
metrics for different Alternator API operations, but not for the latency
histograms for these same operations. So this patch adds the missing
tests (and removes a TODO asking to do that).
Note that only a subset of the operations - PutItem, GetItem, DeleteItem,
UpdateItem, and GetRecords - currently have a latency histogram, and this
test verifies this. We have an issue (Refs #17616) about adding latency
histograms for more operations - at which point we will be able to expand
this test for the additional operations.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
The original goal of this patch was to improve comments in
test/alternator/test_metrics.py, but while doing that I discovered
that one of the test functions was hidden by a second test with
the same name! So this patch also renames the second test.
The test continues to work after this patch - the hidden test
was successful.
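The hiding is plain Python behavior: a second `def` with the same name
rebinds the module-level name, so a collector such as pytest only sees
the later function. A minimal sketch, with a hypothetical test name that
is not taken from test_metrics.py:

```python
# Two functions with the same name in one module: the second `def`
# rebinds the name, so the first body is no longer reachable through
# the module namespace and would not be collected as a test.

def test_example_metric():  # hypothetical name, first definition
    return "first"

def test_example_metric():  # same name: silently replaces the first
    return "second"

# Only the later definition can be called by name.
visible = test_example_metric()
```

Renaming either function makes both reachable again, which is the fix
this patch applies.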
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Before this change, when a user called the
'storage_service/ownership/{keyspace}' API with
a keyspace that uses tablets, an internal
error was thrown. The code was calling a function
that is intended for vnodes: get_vnode_effective_replication_map().
This commit introduces graceful handling of such scenario and
extends the API to allow passing 'cf' parameter that denotes
table name.
Now, when keyspace uses tablets and cf parameter is not passed
a descriptive error message is returned via BAD_REQUEST.
Users cannot query ownership for keyspace that uses tablets,
but they can query ownership for a table in a given keyspace that uses tablets.
Also, new tests have been added to test/rest_api/test_storage_service.py and
to test/topology_experimental_raft/test_tablets.py in order to verify the behavior
with and without tablets enabled.
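As a rough illustration of the extended API shape, the sketch below
builds the request path and query parameters for an ownership query.
The helper name `ownership_request` and the leading slash are
assumptions made for this example; only the path
'storage_service/ownership/{keyspace}' and the 'cf' parameter come from
the description above.

```python
from typing import Dict, Optional, Tuple
from urllib.parse import quote

def ownership_request(keyspace: str,
                      cf: Optional[str] = None) -> Tuple[str, Dict[str, str]]:
    """Return (path, query params) for an ownership query.

    Hypothetical helper, not part of the real REST client.
    """
    # The keyspace is part of the path, so it is percent-encoded.
    path = f"/storage_service/ownership/{quote(keyspace, safe='')}"
    # 'cf' names the table; it is only sent when the caller provides it.
    params = {"cf": cf} if cf is not None else {}
    return path, params

# Keyspace-level query (expected to be rejected with BAD_REQUEST when the
# keyspace uses tablets) and table-level query (accepted for tablets).
keyspace_query = ownership_request("ks")
table_query = ownership_request("ks", cf="tbl")
```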
Fixes: https://github.com/scylladb/scylladb/issues/17342
Closes scylladb/scylladb#17405
* github.com:scylladb/scylladb:
storage_service/ownership: discard get_ownership() requests when tablets enabled
storage_service/ownership/{keyspace}: handle requests when tablets are enabled
locator/effective_replication_map: make 'get_ranges(inet_address ep)' virtual
locator/tablets: add tablet_map::get_sorted_tokens()
pylib/rest_client.py: add ownership API to ScyllaRESTAPIClient
rest_api/test_storage_service: add simplistic tests of ownership API for vnodes
Seastar removed `task_queue::_current` in
258b11220d343d8c7ae1a2ab056fb5e202723cc8 . let's adapt scylla-gdb.py
accordingly. despite that `current_scheduling_group_ptr()` is an internal
API, it's been around for a while, and relatively stable. so let's use
it instead.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17720
The short series allows do_status_check to handle down nodes that don't have HOST_ID application state.
Fixes #16936
Closes scylladb/scylladb#17024
* github.com:scylladb/scylladb:
gossiper: do_status_check: fixup indentation
gossiper: do_status_check: allow evicting dead nodes from membership with no host_id
gossiper: print the host_id when endpoint state goes UP/DOWN
gossiper: get_host_id: differentiate between no endpoint_state and no application_state
gms: endpoint_state: add get_host_id
gossiper: do_status_check: continue loop after evicting FatClient
before this change, we rely on the default-generated fmt::formatter created
from operator<<, but fmt v10 dropped the default-generated formatter.
in this change, we define formatters for internal types in service/storage_proxy.cc.
please note, `service::storage_proxy::remote::read_verb` is extracted out of
the outer class, because, the class's implementation formats `read_verb` in this
class. so we have to put the formatter at the place where its callers can see.
that's why it is moved up and out of `service::storage_proxy::remote`.
some of the operator<<:s are preserved, as they are still being used by
the existing formatters, for instance, the one for
`seastar::shared_ptr<>`, which is used to print
`seastar::shared_ptr<service::paxos_response_handler>`.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17708
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for `bound_kind` and `bound_view`,
and drop the latter's operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17706
Shard-level latencies generate a lot of metrics. This patch reduces
the number of latencies reported by Alternator while keeping the same
functionality.
On the shard level, summaries will be reported instead of histograms.
On the instance level, an aggregated histogram will be reported.
Summaries, histograms, and counters are marked with skip_when_empty.
Fixes #12230
Closes scylladb/scylladb#17581
This change introduces logic that checks
whether tablets are enabled for any of the
keyspaces when get_ownership() is invoked.
Without it, the result would be calculated
based solely on sorted_tokens(), which is
invalid for tablets.
Refs: scylladb#17342
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
Before this change, when a user called the
'storage_service/ownership/{keyspace}' API with
a keyspace that uses tablets, an internal
error was thrown. The code was calling a function
that is intended for vnodes: get_vnode_effective_replication_map().
This commit introduces graceful handling of such scenario and
extends the API to allow passing 'cf' parameter that denotes
table name.
Now, when keyspace uses tablets and cf parameter is not passed
a descriptive error message is returned via BAD_REQUEST.
Users cannot query ownership for keyspace that uses tablets,
but they can query ownership for a table in a given keyspace that uses tablets.
Also, new tests have been added to test/rest_api/test_storage_service.py and
to test/topology_experimental_raft/test_tablets.py in order to verify the behavior
with and without tablets enabled.
Refs: scylladb#17342
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
Before this patch, the mentioned function was a specific
member of vnode_effective_replication_strategy class.
To allow its usage also when tablets are enabled it was
shifted to the base class - effective_replication_strategy
and made pure virtual to force the derived classes to
implement it.
It is used by 'storage_service::get_ranges_for_endpoint()'
that is used in calculation of effective ownership. Such
calculation needs to be performed also when tablets are
enabled.
Refs: scylladb#17342
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
This change introduces a new member function that
returns a vector of sorted tokens where each pair of adjacent
elements depicts a range of tokens that belong to tablet.
It will be used to produce the equivalent of sorted_tokens() of
vnodes when trying to use dht::describe_ownership() for tablets.
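To make the "adjacent pairs" reading concrete, here is a small sketch of
how a sorted token vector maps to per-tablet ranges. The token values
are invented, and the message does not say whether the range bounds are
inclusive or exclusive, so the pairs below carry no such meaning.

```python
from typing import List, Tuple

def adjacent_ranges(sorted_tokens: List[int]) -> List[Tuple[int, int]]:
    """Pair each token with its successor: one (start, end) pair per tablet."""
    return list(zip(sorted_tokens, sorted_tokens[1:]))

# Invented token values, already sorted in ascending order.
tokens = [-600, -200, 0, 350]
ranges = adjacent_ranges(tokens)
```

With N tokens the function yields N-1 pairs, so a vector describing K
tablets would hold K+1 boundary tokens under this reading.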
Refs: scylladb#17342
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
This change adds a member function that can be used
to access 'storage_service/ownership' API.
It will be used by tests that need to access this API.
Refs: scylladb#17342
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
This change is intended to introduce tests for vnodes for
the following API paths:
- 'storage_service/ownership'
- 'storage_service/ownership/{keyspace}'
In next patches the logic that is tested will be adjusted
to work correctly when tablets are enabled. This is a safety
net that ensures that the logic is not broken.
Refs: scylladb#17342
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
before this change, we rely on the default-generated fmt::formatter created
from operator<<, but fmt v10 dropped the default-generated formatter.
in this change, we define formatters for
* reader_permit::state
* reader_resources
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17707
instead of using fmt::runtime format string, use compile-time
format string, so that we can have compile-time format check provided
by {fmt}.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17709
This stage is also the error path that starts from write_both_read_old,
so check this failure in two steps -- first fail the latter stage in one
of the nodes, then fail the former in another.
For that one more node in the cluster is needed.
Also, to avoid name conflicts, the do_revert_migration pseudo stage name
is used.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This stage is a pure barrier. Barriers already take ignored nodes into
account, and so does the fail-injector, so just wire the stage name into
the test.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This stage is error path, so in order to fail it we need to fail some
other stage prior to that. This leads to the testing sequence of
1. fail streaming via source node
2. stop and remove source node to let state machine proceed
3. fail cleanup_target on the destination node
4. stop and remove destination node
First thing to note here, is that the test doesn't fail source node for
cleanup_target stage, symmetrically to how it does for cleanup stage.
Next, since we're removing two nodes, the cluster is equipped with more
nodes to have raft quorum.
Finally, since remove of source node doesn't finish until tablet
migration finishes, it's impossible to remove destination node via the
same node-0, so the 2nd removenode happens via node-3.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The handling itself is already there -- if the leaving node is excluded
the cleanup stage resolves immediately. So just add code that
validates that.
Also, skip testing of pending replica failure during cleanup stage, as
it doesn't really participate in it any longer.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The tablets migration test is parametrized with stage name to inject
failure in. Internal class node_failer uses this parameter as is when
injecting a failure into scylla barrier handler.
Next patch will need to extend the test with revert_migration value and
add handling of this name to node_failer class. The node_failer class,
in turn, will want to instantiate two other instances of the same class
-- one to fail the write_both_read_old stage, and the other one to fail
the revert_migration barrier. So internally the class will need to tell
revert_migration value as full test parameter from revert_migration as
barrier-only parameter.
This test adds the ability to add do_ prefix to node_failer parameter to
tell full test from barrier-only. When injecting a failure into scylla
the do_ prefix needs to be cut off, since scylla still needs to fail the
barrier named revert_migration, not do_revert_migration.
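The prefix handling can be sketched as below. The function name
`injection_stage_name` is hypothetical and is not claimed to match the
actual node_failer code; it only shows cutting off do_ before the name
is handed to scylla.

```python
DO_PREFIX = "do_"

def injection_stage_name(stage: str) -> str:
    """Return the stage name scylla should fail, without the do_ prefix.

    Hypothetical helper: the prefix only has meaning inside the test,
    so it is stripped before the name reaches the failure injection.
    """
    if stage.startswith(DO_PREFIX):
        return stage[len(DO_PREFIX):]
    return stage

# Both spellings map to the same barrier name on the scylla side.
plain = injection_stage_name("revert_migration")
prefixed = injection_stage_name("do_revert_migration")
```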
Also split the long line while at it.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Currently the test calls removenode via node-0 in the cluster, which is
always alive. Next test case will need to call removenode on some other
node (more details in that patch later).
refs: #17681
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
One of the next stages will need to use two of them at the same time and
it's going to be easier if the failing code is encapsulated.
No functional changes here, just large portions of code and local
variables are moved into class and its methods.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Will be needed by test that verifies how failures in tablets migration
stages are handled by state machine
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Be more permissive about the presence of host_id
application state for dead and expired nodes in release mode,
so do not throw runtime_error in this case, but
rather consider them as non-normal token owners.
Instead, call on_internal_error_noexcept that will
log the internal error and a backtrace, and will abort
if abort-on-internal-error is set.
This was seen when replacing dead nodes,
without https://github.com/scylladb/scylladb/pull/15788
Fixes #16936
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
The host_id is now used in token_metadata
and in raft topology changes so print it
when the gossiper marks the node as UP/DOWN.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Currently, we throw the same runtime_error:
`Host {} does not have HOST_ID application_state`
in both cases: when there is no endpoint_state
or when the endpoint_state has no HOST_ID
application state.
The latter case is unexpected, especially
after 8ba0decda5
(and also from the add_saved_endpoint path
after https://github.com/scylladb/scylladb/pull/15788
is merged), so throw different error in each case
so we can tell them apart in the logs.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
A simpler getter to get the HOST_ID application state
from the endpoint_state.
Return a null host_id if the application state is not found.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
We're seeing cases like #16936:
```
INFO 2024-01-23 02:14:19,915 [shard 0:strm] gossip - failure_detector_loop: Mark node 127.0.23.4 as DOWN
INFO 2024-01-23 02:14:19,915 [shard 0:strm] gossip - InetAddress 127.0.23.4 is now DOWN, status = BOOT
INFO 2024-01-23 02:14:27,913 [shard 0: gms] gossip - FatClient 127.0.23.4 has been silent for 30000ms, removing from gossip
INFO 2024-01-23 02:14:27,915 [shard 0: gms] gossip - Removed endpoint 127.0.23.4
WARN 2024-01-23 02:14:27,916 [shard 0: gms] gossip - === Gossip round FAIL: std::runtime_error (Host 127.0.23.4 does not have HOST_ID application_state)
```
Since the FatClient timeout handling already evicts the endpoint
from membership there is no need to check further if the
node is dead and expired, so just co_return.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter.
in this change, we define formatters for
* repair_hash
* read_strategy
* streaming::stream_summary
and drop their operator<<:s
Refs #13245
Closes scylladb/scylladb#17711
* github.com:scylladb/scylladb:
repair: add fmt::formatter for streaming::stream_summary
repair: add fmt::formatter for read_strategy
repair: add fmt::formatter for repair_hash
before this change, we rely on the default-generated fmt::formatter created
from operator<<, but fmt v10 dropped the default-generated formatter.
in this change, we define formatters for streaming::stream_summary, and
drop its operator<<
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we rely on the default-generated fmt::formatter created
from operator<<, but fmt v10 dropped the default-generated formatter.
in this change, we define formatters for read_strategy, and drop its
operator<<
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we rely on the default-generated fmt::formatter created
from operator<<, but fmt v10 dropped the default-generated formatter.
in this change, we define formatters for repair_hash.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
While measuring #17149 with this test some changes were applied, here they are
- keep initial_tablets number in output json's parameters section
- disable auto compaction
- add control over the amount of sstables generated for --bypass-cache case
Closes scylladb/scylladb#17473
* github.com:scylladb/scylladb:
perf_simple_query: Add --memtable-partitions option
perf_simple_query: Disable auto compaction
perf_simple_query: Keep number of initial tablets in output json
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
* raft::election_tracker
* raft::votes
* raft::vote_result
and drop their operator<<:s.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17670
Before the change, when a test failed because of some error
in the `cql_test_env.cc`, we were getting:
```
error: boost/virtual_table_test: failed to parse XML output '/home/piotrs/src/scylla2/testlog/debug/xml/boost.virtual_table_test.test_system_config_table_read.1.xunit.xml': no element found: line 1, column 0
```
After the change we're getting:
```
error: boost/virtual_table_test: Empty testcase XML output, possibly caused by a crash in the cql_test_env.cc, details: '/home/piotrs/src/scylla2/testlog/debug/xml/boost.virtual_table_test.test_system_config_table_read.1.xunit.xml': no element found: line 1, column 0
```
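A minimal sketch of how such a message could be produced, assuming the
output is parsed with Python's xml.etree.ElementTree. The function
`parse_test_xml` and its signature are made up for illustration and are
not the actual test.py code.

```python
import xml.etree.ElementTree as ET

def parse_test_xml(text: str, path: str) -> ET.Element:
    """Parse testcase XML, reporting empty output with a clearer message.

    Hypothetical helper: `text` is the file content, `path` is only used
    in the error message.
    """
    try:
        return ET.fromstring(text)
    except ET.ParseError as e:
        if not text.strip():
            # Nothing was written at all, which hints at a crash before
            # the test framework could emit its report.
            raise RuntimeError(
                "Empty testcase XML output, possibly caused by a crash "
                f"in the cql_test_env.cc, details: '{path}': {e}") from e
        raise RuntimeError(f"failed to parse XML output '{path}': {e}") from e

# Well-formed output parses normally; empty output gets the new message.
root = parse_test_xml("<testsuite/>", "report.xunit.xml")
try:
    parse_test_xml("", "report.xunit.xml")
    empty_error = ""
except RuntimeError as err:
    empty_error = str(err)
```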
Closes scylladb/scylladb#17679
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
`partition_snapshot_row_cursor`, and drop its operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17669
before this change, "ring" subcommand has two issues:
1. `--resolve-ip` option accepts a boolean argument, but this option
should be a switch, which does not accept any argument at all
2. it always prints the endpoint no matter if `--resolve-ip` is
specified or not. but it should print the resolved name, instead
of an IP address if `--resolve-ip` is specified.
in this change, both issues are addressed. and the test is updated
accordingly to exercise the case where `--resolve-ip` is used.
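The difference between a boolean-valued option and a switch is shown
below with Python's argparse, purely as an analogy: the tool itself is
not written in Python and its real option parser is not shown here. The
`display_name` helper and the sample host name are invented.

```python
import argparse

# A switch takes no argument: its presence alone sets the flag.
parser = argparse.ArgumentParser(prog="ring-sketch")
parser.add_argument("--resolve-ip", action="store_true",
                    help="print resolved host names instead of IP addresses")

def display_name(ip: str, resolved_name: str, resolve_ip: bool) -> str:
    """Pick what to print for an endpoint (hypothetical helper)."""
    return resolved_name if resolve_ip else ip

with_switch = parser.parse_args(["--resolve-ip"])
without_switch = parser.parse_args([])

shown_resolved = display_name("127.0.0.1", "node1.example",
                              with_switch.resolve_ip)
shown_ip = display_name("127.0.0.1", "node1.example",
                        without_switch.resolve_ip)
```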
Closes scylladb/scylladb#17553
* github.com:scylladb/scylladb:
tools/scylla-nodetool: print hostname if --resolve-ip is passed to "ring"
test/nodetool: calc max_width from all_hosts
test/nodetool: keep tokens as Host's member
test/nodetool: remove unused import
* tools/java 5e11ed17...e4878ae7 (2):
> nodetool: fix a typo in error message
> bin/cassandra-stress: Add extended version info
Closes scylladb/scylladb#17680