Although ScyllaDB ignores this request parameter, the Java nodetool
sets it, so it is better to have the native one do the same for
symmetry. It makes testing easier.
Discovered with the more strict request matching introduced in the next
patches.
This PR implements the following new nodetool commands:
* netstats
* tablehistograms/cfhistograms
* proxyhistograms
All commands come with tests and all tests pass with both the new and the current nodetool implementations.
Refs: https://github.com/scylladb/scylladb/issues/15588
Closes scylladb/scylladb#17651
* github.com:scylladb/scylladb:
tools/scylla-nodetool: implement the proxyhistograms command
tools/scylla-nodetool: implement the tableshistograms command
tools/scylla-nodetool: introduce buffer_samples
utils/estimated_histogram: estimated_histogram: add constructor taking buckets
tools/scylla-nodetool: implement the netstats command
tools/scylla-nodetool: add correct units to file_size_printer
When the partition_index_cache is evicted, we yield for preemption between
pages, but not within a page.
Commit 3b2890e1db ("sstables: Switch index_list to chunked_vector
to avoid large allocations") recognized that index pages can be large enough
to overflow a 128k alignment block (this was before the index cache and
index entries were not stored in LSA then). However, it did not go as far as
to gently free individual entries; either the problem was not recognized
or wasn't as bad.
As the referenced issue shows, a fairly large stall can happen when freeing
the page. The workload had a large number of tombstones, so index selectivity
was poor.
Fix by evicting individual rows gently.
The fix ignores the case where rows are still referenced: it is unlikely
that all index pages will be referenced, and in any case skipping over
a referenced page takes an insignificant amount of time, compared to freeing
a page.
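A minimal sketch of the idea in Python, not the actual C++ change: entries
of a page are freed one at a time, with a preemption check after each one,
and referenced pages are skipped. Page, evict_gently, need_preempt and
yield_now are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Page:
    # Hypothetical stand-in for an index page holding many entries.
    entries: List[str] = field(default_factory=list)
    referenced: bool = False


def evict_gently(pages: List[Page],
                 need_preempt: Callable[[], bool],
                 yield_now: Callable[[], None]) -> int:
    """Free unreferenced pages entry by entry; return the number of freed entries."""
    freed = 0
    for page in pages:
        if page.referenced:
            # Skipping a referenced page is cheap compared to freeing one.
            continue
        while page.entries:
            page.entries.pop()
            freed += 1
            # Check for preemption within the page, not only between pages.
            if need_preempt():
                yield_now()
    return freed
```

With this shape, the longest stretch without a preemption check is the cost
of freeing a single entry rather than a whole page.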
Fixes #17605
Closes scylladb/scylladb#17606
This is a speculative fix as the problem is observed only on CI.
When run_async is called right after driver_connect and get_cql
it fails with ConnectionException('Host has been marked down or
removed').
If the approach proves to be successful we can start to deprecate
base get_cql in favor of get_ready_cql. It's better to have robust
testing helper libraries than try to take care of it in every test
case separately.
Fixes #17713
Closes scylladb/scylladb#17772
Two repair test cases verify that repair generated enough rows in the
history table. Both use identical code for that, which is worth generalizing.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#17761
before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter.
in this change, we define formatters for
* mutation_partition_v2::printer
* frozen_mutation::printer
* mutation
their operator<<:s are dropped.
Refs #13245
Closes scylladb/scylladb#17769
* github.com:scylladb/scylladb:
mutation: add fmt::formatter for mutation
mutation: add fmt::formatter for frozen_mutation::printer
mutation: add fmt::formatter for mutation_partition_v2::printer
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
* column_definition
* column_mapping
* ordinal_column_id
* raw_view_info
* schema
* view_ptr
their operator<<:s are dropped. but operator<< for schema is preserved,
as we are still printing `seastar::lw_shared_ptr<const schema>` with
our homebrew generic formatter for `seastar::lw_shared_ptr<>`, which
uses operator<< to print the pointee.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17768
codespell reports "Nees" should be "Needs" but "Nees" is the last
name of Georg Nees. so it is not a misspelling, and should not be
fixed.
the purpose of lolwut.cc is to display the Redis version and
print a piece of generative computer art. the one included in our
version was created by Georg Nees. since the LOLWUT command does not contain
business logic connected with scylladb, we don't lose a lot if we skip
it when scanning for spelling errors. so, in this change, let's
skip it. this should silence one more warning from the github
codespell workflow.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17770
downloads.scylladb.com recently started redirecting from http to https
(via `301 Moved Permanently`).
This broke package downloading in open-coredump.sh.
To fix this, we have to instruct curl to follow redirects.
Closes scylladb/scylladb#17759
When printing human-readable file-sizes, the Java nodetool always uses
base-2 steps (1024) to arrive at the human-readable size, but it uses
the base-10 units (MB) and base-2 units (MiB) interchangeably.
Adapt file_size_printer to support both. Add a flag to control which is
used.
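A rough Python sketch of the behaviour described above, not the actual
file_size_printer code: the value is always scaled in steps of 1024, and a
flag only selects which unit names are printed. The function name, the
flag name and the two-decimal output format are assumptions made for
illustration.

```python
def format_file_size(size_bytes: int, binary_units: bool) -> str:
    """Scale by 1024 per step; the flag picks MiB-style or MB-style unit names."""
    suffixes = (["KiB", "MiB", "GiB", "TiB"] if binary_units
                else ["KB", "MB", "GB", "TB"])
    value = float(size_bytes)
    unit = "bytes"
    for suffix in suffixes:
        if value < 1024:
            break
        # The step is base-2 (1024) regardless of which unit names are used.
        value /= 1024
        unit = suffix
    return f"{value:.2f} {unit}"
```

For example, format_file_size(1536, binary_units=True) gives "1.50 KiB",
while format_file_size(1536, binary_units=False) gives "1.50 KB".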
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for mutation. but its operator<<
is preserved, as we are still using our homebrew generic formatter
for printing `std::vector<mutation>`, and this formatter is using
operator<< for printing the elements in vector.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for frozen_mutation::printer,
and drop its operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we rely on the default-generated fmt::formatter created
from operator<<, but fmt v10 dropped the default-generated formatter.
in this change, we define formatters for mutation_partition_v2::printer, and
drop its operator<<
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
This patch adds the dc option support for table repair. The management
tool can use this option to select nodes in specific data centers to run
repair.
Fixes: #17550
Tests: repair_additional_test.py::TestRepairAdditional::test_repair_option_dc
Closes scylladb/scylladb#17571
Calling scylla-nodetool with option describering and omitting the keyspace
name argument results in a boost exception with the following error message:
error running operation: boost::wrapexcept<boost::bad_any_cast> (boost::bad_any_cast: failed conversion using boost::any_cast)
This change checks for the missing keyspace and outputs a more sensible
error message:
error processing arguments: keyspace must be specified
Closes scylladb/scylladb#17741
Just a cleanup -- replace do_with_cql_env + async with do_with_cql_env_thread
Closes scylladb/scylladb#17758
* github.com:scylladb/scylladb:
test/storage_proxy: Restore indentation after previous patch
test/storage_proxy: Use do_with_cql_env_thread()
One of the test cases explicitly wraps itself into async, but there's a
convenience helper for that already.
Indentation is deliberately left broken
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Builder works in "steps". Each step runs for a given base table, when a
new view is created it either initiates a step or appends to currently
running step.
Running a step means reading mutations from local sstables reader and
applying them to all views that have jumped into this step so far. When a
view is added to the step it remembers the current token value the step
is on. When step receives end-of-stream it rewinds to minimal-token.
Rewinding is done by closing current reader and creating a new one. Each
time token is advanced, all the views that meet the new token value for
the second time (i.e. -- scan full round) are marked as built and are
removed from step. When no views are left on step, it finishes.
The above machinery can break when rewinding the end-of-stream reader.
The trick is that a running step silently assumes that if the reader
once produced some token (and there can be a view that remembered this
token as its starting one), then after rewinding the reader would
generate the same token or greater. With tablets, however, that's not
the case. When a node is decommissioned tablets are cleaned and all
sstables are removed. Rewinding a reader after that creates an empty reader
that produces no tokens from then on. Consequently, any build steps that
had captured tokens prior to cleanup would get stuck forever.
The fix is to check if the mutation consumer stepped at least one step
forward after rewind, and if not -- complete all the attached views.
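The logic of one scan round after a rewind can be sketched in Python as
follows. This is a simplified model, not the view builder code: tokens are
plain integers, a view counts as built once the reader reaches or passes
the token it remembered, and scan_after_rewind and pending are
hypothetical names.

```python
from typing import Dict, Iterable, List


def scan_after_rewind(tokens: Iterable[int], pending: Dict[str, int]) -> List[str]:
    """Run one round after a rewind.

    pending maps a view name to the token the step was at when the view
    joined. Returns the views marked as built and removes them from pending.
    """
    built: List[str] = []
    stepped = False
    for token in tokens:
        stepped = True
        # A view is built once the scan meets its remembered token again.
        done = [name for name, start in pending.items() if start <= token]
        for name in done:
            built.append(name)
            del pending[name]
    if not stepped:
        # The rewound reader produced nothing (e.g. sstables were removed):
        # complete all attached views instead of waiting forever.
        built.extend(pending)
        pending.clear()
    return built
```

Without the final check, an empty reader would leave every view in pending
and the step would never finish.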
fixes: #17293
Similar thing should happen if the base table is truncated with views
being built from it. Testing it steps on compaction assertion elsewhere
and needs more research.
refs: #17543
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#17548
summarize_tests() is only used to summarize boost tests, so reflect
this fact using its name. we will need to summarize the tests which
generate JUnit XML as well, so this change also prepares for a
following-up change to implement a new summarize helper.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17746
There are three endpoints in the api/cache_service that report "metrics"
for the row cache and the values they return:
- entries: number of partitions
- size: number of partitions
- capacity: used space
The size and capacity seem very inaccurate.
A comment says that in C* the size should be weighted, but Scylla doesn't
support weighting entries in the cache. Also, capacity is configurable via the
row_cache_size_in_mb config option or the set_row_cache_capacity_in_mb API
call, but Scylla doesn't support either of them.
This patch suggests changing return values for size and capacity endpoints.
Although the row cache doesn't support weights, it's natural to return
used_space in bytes as the value, which is closer to what "size"
means than the number of entries.
The capacity can return the total memory size, because this is what
Scylla really does -- row cache growth is only limited by other memory
consumers, not by configured limits.
fixes: #9418
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#17724
The test carries const std::string_view& around, but the type is a
lightweight class that can be copied around at the same cost as its
reference.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#17735
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for `view_info`, its operator<<
is dropped.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17745
before this change, we rely on the default-generated fmt::formatter created
from operator<<, but fmt v10 dropped the default-generated formatter.
in this change, we define formatters for
* utils::human_readable_value
* std::strong_ordering
* std::weak_ordering
* std::partial_ordering
* utils::exception_container
Refs https://github.com/scylladb/scylladb/issues/13245
Closes scylladb/scylladb#17710
* github.com:scylladb/scylladb:
utils/exception_container: add fmt::formatter for exception_container
utils/human_readable: add fmt::formatter for human_readable_value
utils: add fmt::formatter for std::strong_ordering and friends
This PR fixes comments left from #17481, namely
- adds case selection to boost suite
- describes the case selection in documentation
Closes scylladb/scylladb#17721
* github.com:scylladb/scylladb:
docs: Add info about the ability to run specific test case
test.py: Support case selection for boost tests
the corresponding implementation of operator<< was dropped in
a40d3fc25b, so there is no need to
keep this friend declaration anymore.
also, drop `include <ostream>`, as this header does not reference
any of the ostream types with the change above.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17743
* seastar 5d3ee980...a71bd96d (51):
> util: add formatter for optimized_optional<>
> build: search protobuf using package config
> reactor: Move pieces of scollectd to scollectd
> reactor: Remove write-only task_queue._current
> Add missing include in tests/unit/rpc_test.cc
> doc/io_tester.md: include request_type::unlink in the docs
> doc/io-tester.md: update obsolete information in io_tester docs
> io_tester/conf.yaml: include an example of request_type::unlink job
> io_tester: implement request_type::unlink
> reactor: Print correct errno on io_submit failure
> src/core/reactor.cc: qualify metric function calls with "sm::"
> build: add shard_id.hh to seastar library
> thread: speed up thread creation in debug mode
> include: add missing modules.hh import to shard_id.hh
> prometheus: avoid ambiguity when calling MetricFamily.set_name()
> util/log: add formatter for log_level
> util/log: use string_view for log_level_names
> perf: Calculate length of name column in perf tests
> rpc_test: add a test for inter-compressor communication
> rpc: in multi_algo_compressor_factory, propagate send_empty_frame
> rpc: give compressors a way to send something over the connection
> rpc: allow (and skip) empty compressed frames
> metrics: change value_vector type to std::deque
> HACKING.md: remove doc related to test_dist
> test/unit: do not check if __cplusplus > 201703L
> json_elements: s/foramted/formatted/
> iostream: Refactor input_stream::read_exactly_part
> add unit test to verify str.starts_with(str), str.ends_with(str) return true.
> str.starts_with(str) and str.ends_with(str) should return true, just like std::string
> rpc: Remove FrameType::header_and_buffer_type
> rpc: Defuturize FrameType::return_type
> rpc: Kill FrameType::get_size()
> treewide: put std::invocable<> constraints in template param list
> include: do not include unuser headers
> rpc: fix a deadlock in connection::send()
> iostream: Replace recursion by iteration in input_stream::read_exactly_part
> core/bitops.hh: use std::integral when appropriate
> treewide: include <concepts> instead of seastar/util/concepts.hh
> abortable_fifo: fix the indent
> treewide: expand `SEASTAR_CONCEPT` macro
> util/concepts: always define SEASTAR_CONCEPT
> file: Remove unused thread-pool arg from directory lister
> seastar-json2code: collect required_query_params using a list
> seastar-json2code: reduce the indent level
> seastar-json2code: indent the enum and array elements
> seastar-json2code: generate code for enum type using Template
> seastar-json2code: extract add_operation() out
> reactor: Re-ifdef SIGSEGV sigaction installing
> reactor: Re-ifdef reactor::enable_timer()
> reactor: Re-ifdef task_histogram_add_task()
> reactor: Re-ifdef install_signal_handler_stack()
Closes scylladb/scylladb#17714
This small series improves the Alternator tests for metrics:
1. Improves some comments in the test.
2. Restores a test that was previously hidden by two tests having the same name.
3. Adds tests for latency histogram metrics.
Closes scylladb/scylladb#17623
* github.com:scylladb/scylladb:
test/alternator: tests for latency metrics
test/alternator: improve comments and unhide hidden test
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for `exception_container<..>`
and drop its operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we rely on the default-generated fmt::formatter created
from operator<<, but fmt v10 dropped the default-generated formatter.
in this change, we define formatters for `utils::human_readable_value`,
and drop its operator<<
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we rely on the default-generated fmt::formatter created
from operator<<, but fmt v10 dropped the default-generated formatter.
in this change, we define formatters for
* std::strong_ordering
* std::weak_ordering
* std::partial_ordering
and their operator<<:s are moved to test/lib/test_utils.{hh,cc}, as they
are only used by Boost.test.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
There are four stages left to handle: cleanup, cleanup_target, end_migration and revert_migration. All are handling removed nodes already, so the PR just extends the test.
fixes: #16527
Closes scylladb/scylladb#17684
* github.com:scylladb/scylladb:
test/tablets_migration: Test revert_migration failure handling
test/tablets_migration: Test end_migration failure handling
test/tablets_migration: Test cleanup_target failure handling
test/tablets_migration: Test cleanup failure handling
test/tablets_migration: Prepare for do_... stages
test/tablets_migration: Add ability to removenode via any other node
test/tablets_migration: Wrap migration stages failing code into a helper class
storage_service: Add failure injection to crash cleanup_tablet
Instead of a functor, for those metrics that just return the value of an
existing member variable. This is ever so slightly more efficient than a
functor.
Closes scylladb/scylladb#17726
In test/alternator/test_metrics.py we had tests for the operation-count
metrics for different Alternator API operations, but not for the latency
histograms for these same operations. So this patch adds the missing
tests (and removes a TODO asking to do that).
Note that only a subset of the operations - PutItem, GetItem, DeleteItem,
UpdateItem, and GetRecords - currently have a latency histogram, and this
test verifies this. We have an issue (Refs #17616) about adding latency
histograms for more operations - at which point we will be able to expand
this test for the additional operations.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
The original goal of this patch was to improve comments in
test/alternator/test_metrics.py, but while doing that I discovered
that one of the test functions was hidden by a second test with
the same name! So this patch also renames the second test.
The test continues to work after this patch - the hidden test
was successful.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Before this change, when a user tried to use the
'storage_service/ownership/{keyspace}' API with a
keyspace parameter that uses tablets, an internal
error was thrown. The code was calling a function
that is intended for vnodes: get_vnode_effective_replication_map().
This commit introduces graceful handling of such a scenario and
extends the API to allow passing 'cf' parameter that denotes
table name.
Now, when the keyspace uses tablets and the cf parameter is not passed,
a descriptive error message is returned via BAD_REQUEST.
Users cannot query ownership for keyspace that uses tablets,
but they can query ownership for a table in a given keyspace that uses tablets.
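The request validation described above could look roughly like the
following Python sketch. It only models the decision, not the real C++
handler: validate_ownership_request is a hypothetical name and the error
text is made up for illustration.

```python
from http import HTTPStatus
from typing import Optional, Tuple


def validate_ownership_request(keyspace_uses_tablets: bool,
                               cf: Optional[str]) -> Tuple[HTTPStatus, str]:
    """Return (status, message) for an ownership request before computing anything."""
    if keyspace_uses_tablets and not cf:
        # Ownership is per table with tablets, so the table name is required.
        return (HTTPStatus.BAD_REQUEST,
                "keyspace uses tablets: pass the 'cf' parameter to query a table")
    return HTTPStatus.OK, ""
```

Rejecting the request up front avoids reaching the vnode-only code path
for a keyspace that uses tablets.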
Also, new tests have been added to test/rest_api/test_storage_service.py and
to test/topology_experimental_raft/test_tablets.py in order to verify the behavior
with and without tablets enabled.
Fixes: https://github.com/scylladb/scylladb/issues/17342
Closes scylladb/scylladb#17405
* github.com:scylladb/scylladb:
storage_service/ownership: discard get_ownership() requests when tablets enabled
storage_service/ownership/{keyspace}: handle requests when tablets are enabled
locator/effective_replication_map: make 'get_ranges(inet_address ep)' virtual
locator/tablets: add tablet_map::get_sorted_tokens()
pylib/rest_client.py: add ownership API to ScyllaRESTAPIClient
rest_api/test_storage_service: add simplistic tests of ownership API for vnodes
Seastar removed `task_queue::_current` in
258b11220d343d8c7ae1a2ab056fb5e202723cc8. let's adapt scylla-gdb.py
accordingly. despite that `current_scheduling_group_ptr()` is an internal
API, it's been around for a while, and relatively stable. so let's use
it instead.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17720
The short series allows do_status_check to handle down nodes that don't have HOST_ID application state.
Fixes #16936
Closes scylladb/scylladb#17024
* github.com:scylladb/scylladb:
gossiper: do_status_check: fixup indentation
gossiper: do_status_check: allow evicting dead nodes from membership with no host_id
gossiper: print the host_id when endpoint state goes UP/DOWN
gossiper: get_host_id: differentiate between no endpoint_state and no application_state
gms: endpoint_state: add get_host_id
gossiper: do_status_check: continue loop after evicting FatClient