instead of materializing the `managed_bytes_view` to a string and
printing it, print it directly to stdout. this change helps to deprecate
the `to_hex()` helpers; we should materialize a string only when necessary.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17463
API endpoints that collect vectors of ks/cf names can reserve the vectors in advance. Also, one of them can use a range loop for shorter code.
Closes scylladb/scylladb#17433
* github.com:scylladb/scylladb:
api: Reserve vectors in advance
api: Use range-loop to iterate keyspaces
When the topology barrier is blocked for longer than the configured threshold
(2s), stale versions are marked as stalled, and when they get released
they report a backtrace to the logs. This should help to identify what
was holding the token metadata pointer for too long.
Example log:
token_metadata - topology version 30 held for 299.159 [s] past expiry, released at: 0x2397ae1 0x23a36b6 ...
Closes scylladb/scylladb#17427
When reading a list of ranges with tablets, we don't need a multishard reader. Instead, we intersect the range list with the local node's tablet ranges, then read each range from the respective shard.
The individual ranges are read sequentially, with database::query[_mutations](), merging the results into a single
instance. This makes the code simple. For tablets, multishard_mutation_query.cc is no longer on the hot paths: range scans
on tables with tablets fork off to a different code-path in the coordinator. The only users of multishard_mutation_query.cc are forced, replica-local scans, like those used by SELECT * FROM MUTATION_FRAGMENTS(). These are mainly used for diagnostics and tests, so we optimize for simplicity, not performance.
Fixes: #16484
Closes scylladb/scylladb#16802
* github.com:scylladb/scylladb:
test/cql-pytest: remove skip_with_tablets fixture
test/cql-pytest: test_select_from_mutation_fragments.py parameterize tests
test/cql-pytest: test_select_from_mutation_fragments.py: remove skip_with_tablets
multishard_mutation_query: add tablets support
multishard_mutation_query: remove compaction-state from result-builder factory
multishard_mutation_query: do_query(): return foreign_ptr<lw_shared_ptr<result>>
mutation_query: reconcilable_result: add merge_disjoint()
locator: introduce tablet_range_spliter
dht/i_partitioner: to_partition_range(): don't assume input is fully inclusive
interval: add before() overload which takes another interval
The default AIO backend requires AIO blocks. On production systems, all
available AIO blocks could have been already taken by ScyllaDB. Even
though the tools only require a single unit, we have seen cases where
not even that is available, ScyllaDB having siphoned all of the available
blocks.
We could try to ensure all deployments have some spare blocks, but it is
just less friction to not have to deal with this problem at all, by just
using the epoll backend. We don't care about performance in the case of
the tools anyway, so long as they are not unreasonably slow. And since
these tools are replacing legacy tools written in Java, the bar is low.
Closes scylladb/scylladb#17438
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for the following types
* query::result::printer
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for the following types
* query::result_set
* query::result_set_row
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for the following types
* query::specific_ranges
* query::partition_slice
* query::read_command
* query::forward_request
* query::forward_request::reduction_type
* query::forward_request::aggregation_info
* query::forward_result::printer
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
To run with both vnodes and tablets. For this functionality, both
replication methods should be covered with tests, because it uses
different ways to produce partition lists, depending on the replication
method.
Also add scylla_only to those tests that were missing this fixture
before. All tests in this suite are scylla-only and with the
parameterization, this is even more apparent.
When reading a list of ranges with tablets, we don't need a multishard
reader. Instead, we intersect the range list with the local node's tablet
ranges, then read each range from the respective shard.
The individual ranges are read sequentially, with
database::query[_mutations](), merging the results into a single
instance. This makes the code simple. For tablets,
multishard_mutation_query.cc is no longer on the hot paths: range scans
on tables with tablets fork off to a different code-path in the
coordinator. The only users of multishard_mutation_query.cc are
forced, replica-local scans, like those used by SELECT * FROM
MUTATION_FRAGMENTS(). These are mainly used for diagnostics and tests,
so we optimize for simplicity, not performance.
This param was used by the query-result builder, to set the
last-position on end-of-stream. Instead, do this via a new ResultBuilder
method, maybe_set_last_position(), which is called from read_page(),
which has access to the compaction-state.
With this, the ResultBuilder can be created without a compaction-state
at hand. This will be important in the next patch.
Given a list of partition-ranges, yields the intersection of this
range-list with the tablet-ranges of the tablets located on the
given host.
This will be used in multishard_mutation_query.cc, to obtain the ranges
to read from the local node: given the read ranges, obtain the ranges
belonging to tablets that have replicas on the local node.
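The intersection step can be pictured with a minimal sketch. It assumes a simplified model that is not the code in this series: tokens are plain integers, ranges are closed on both ends, and both lists are sorted and non-overlapping. The names `token_range` and `intersect_with_local_tablets` are hypothetical, and shard selection is left out entirely.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a token range, closed on both ends: [start, end].
struct token_range {
    std::int64_t start;
    std::int64_t end;
};

bool operator==(const token_range& a, const token_range& b) {
    return a.start == b.start && a.end == b.end;
}

// Intersect the requested ranges with the ranges of locally owned tablets.
// Assumption: each input list is sorted and has no overlaps within itself.
std::vector<token_range> intersect_with_local_tablets(
        const std::vector<token_range>& requested,
        const std::vector<token_range>& local_tablets) {
    std::vector<token_range> out;
    std::size_t i = 0;
    std::size_t j = 0;
    while (i < requested.size() && j < local_tablets.size()) {
        assert(requested[i].start <= requested[i].end);
        assert(local_tablets[j].start <= local_tablets[j].end);
        const std::int64_t lo = std::max(requested[i].start, local_tablets[j].start);
        const std::int64_t hi = std::min(requested[i].end, local_tablets[j].end);
        if (lo <= hi) {
            out.push_back({lo, hi}); // the two ranges overlap on [lo, hi]
        }
        // Advance whichever range ends first; the other may still overlap
        // with the next range of the opposite list.
        if (requested[i].end < local_tablets[j].end) {
            ++i;
        } else {
            ++j;
        }
    }
    return out;
}
```

In this model, reading `{0..10, 20..30}` on a node owning tablets `{5..25, 28..40}` yields three sub-ranges, each of which would then be read on its own and merged into the result.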
Consider the inclusiveness of the token-range's start and end bounds and
copy the flag to the output bounds, instead of assuming they are always
inclusive.
The current point variant cannot take inclusiveness into account, when
said point comes from another interval bound.
This method had no tests at all, so add tests covering both overloads.
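Why a point cannot carry inclusiveness can be shown with a small sketch. It assumes a simplified model with integer bounds that are always present; `bound`, `simple_interval`, `point_before` and `interval_before` are hypothetical names, and the real overload in the interval library may differ in detail.

```cpp
#include <cassert>

// Hypothetical, simplified bound and interval types (no open-ended bounds).
struct bound {
    int value;
    bool inclusive;
};

struct simple_interval {
    bound start;
    bound end;
};

// Point variant: is `point` located before the start of `iv`?
// The point has no inclusiveness of its own.
bool point_before(const simple_interval& iv, int point) {
    if (point < iv.start.value) {
        return true;
    }
    return point == iv.start.value && !iv.start.inclusive;
}

// Interval variant: is `other` located entirely before `iv`?
// When the bounds touch, the intervals share a value only if both
// touching bounds are inclusive.
bool interval_before(const simple_interval& iv, const simple_interval& other) {
    assert(other.start.value <= other.end.value);
    if (other.end.value != iv.start.value) {
        return other.end.value < iv.start.value;
    }
    return !(other.end.inclusive && iv.start.inclusive);
}
```

With `iv = [5, 10]` and `other = [0, 5)`, passing `other.end.value` to the point variant reports "not before", because the exclusive flag on `other`'s end bound is lost; the interval variant sees the flag and reports "before".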
range.hh was deprecated in bd794629f9 (2020) since its names
conflict with the C++ library concept of an iterator range. The name
::range also mapped to the dangerous wrapping_interval rather than
nonwrapping_interval.
Complete the deprecation by removing range.hh and replacing all the
aliases by the names they point to from the interval library. Note
this now exposes uses of wrapping intervals as they are now explicit.
The unit tests are renamed and range.hh is deleted.
Closes scylladb/scylladb#17428
In order to avoid running out of memory, we can't
underestimate the memory used when processing a view
update. Particularly, we need to handle the remote
view updates well, because we may create many of them
at the same time in contrast to local updates which
are processed synchronously.
After investigating a coredump generated in a crash
caused by running out of memory due to these remote
view updates, we found that the current estimation
is much lower than what we observed in practice; we
identified overhead of up to 2288 bytes for each
remote view update. The overhead consists of:
- 512 bytes - a write_response_handler
- less than 512 bytes - excessive memory allocation
for the mutation in bytes_ostream
- 448 bytes - the apply_to_remote_endpoints coroutine
started in mutate_MV()
- 192 bytes - a continuation to the coroutine above
- 320 bytes - the coroutine in result_parallel_for_each
started in mutate_begin()
- 112 bytes - a continuation to the coroutine above
- 192 bytes - 5 unspecified allocations of 32, 32, 32,
48 and 48 bytes
This patch changes the previous overhead estimate
of 256 bytes to 2288 bytes, which should take into
account all allocations in the current version of the
code. It's worth noting that changes in the related
pieces of code may result in a different overhead.
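As a sanity check of the arithmetic, the listed components can be summed at compile time. This is only an illustration of the numbers above: the "less than 512 bytes" item is taken at its 512-byte upper bound, and all names are hypothetical.

```cpp
#include <array>
#include <cstddef>

// Per-remote-view-update overhead components, in bytes, in the order
// listed above. The bytes_ostream item is taken at its 512-byte upper bound.
constexpr std::array<std::size_t, 7> overhead_parts = {
    512, // write_response_handler
    512, // excessive allocation for the mutation in bytes_ostream (upper bound)
    448, // apply_to_remote_endpoints coroutine
    192, // continuation to that coroutine
    320, // coroutine in result_parallel_for_each
    112, // continuation to that coroutine
    192, // five small unspecified allocations
};

// The five small allocations that make up the last item.
constexpr std::array<std::size_t, 5> small_allocations = {32, 32, 32, 48, 48};

template <std::size_t N>
constexpr std::size_t sum(const std::array<std::size_t, N>& parts) {
    std::size_t total = 0;
    for (std::size_t i = 0; i < N; ++i) {
        total += parts[i];
    }
    return total;
}

constexpr std::size_t total_overhead() {
    return sum(overhead_parts);
}

static_assert(sum(small_allocations) == 192, "small allocations add up to 192 bytes");
static_assert(total_overhead() == 2288, "components add up to the new estimate");
```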
The allocations seem to be mostly captures for the
background tasks. Coroutines seem to allocate extra,
however testing shows that replacing a coroutine with
continuations may result in generating a few smaller
futures/continuations with a larger total size.
Besides that, considering that we're waiting for
a response for each remote view update, we need the
relatively large write_response_handler, which also
includes the mutation in case we needed to reuse it.
The change should not majorly affect workloads with many
local updates because we don't keep many of them at
the same time anyway, and an added benefit of correct
memory utilization estimation is avoiding evictions
of other memory that would be otherwise necessary
to handle the excessive memory used by view updates.
Fixes #17364
Closes scylladb/scylladb#17420
It can happen that a node is lost during tablet migration involving that node. Migration will be stuck, blocking the topology state machine. To recover from this, the current procedure is for the admin to execute nodetool removenode or replace the node. This marks the node as "ignored" and the tablet state machine can pick this up and abort the migration.
This PR implements the handling for the streaming stage only and adds a test for it. Checking other stages needs more work with failure injection to inject failures into specific barriers.
To handle streaming failure, two new stages are introduced: cleanup_target and revert_migration. The former cleans up the pending replica, which could have received some data by the time streaming stopped working; the latter is like end_migration, but doesn't commit the new_replicas into the replicas field.
refs: #16527
Closes scylladb/scylladb#17360
* github.com:scylladb/scylladb:
test/topology: Add checking error paths for failed migration
topology.tablets_migration: Handle failed streaming
topology.tablets_migration: Add cleanup_target transition stage
topology.tablets_migration: Add revert_migration transition stage
storage_service: Rewrap cleanup stage checking in cleanup_tablet()
test/topology: Move helpers to get tablet replicas to pylib
This PR removes information about outdated versions, including disclaimers and information when a given feature was added.
Now that the documentation is versioned, information about outdated versions is unnecessary (and makes the docs harder to read).
Fixes https://github.com/scylladb/scylladb/issues/12110
Closes scylladb/scylladb#17430
Some endpoints in api/column_family fill vectors with data obtained from
database and return them back. Since the amount of data is known in
advance, it's good to reserve the vector.
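A minimal sketch of the pattern, assuming a hypothetical `collect_table_names` helper over a plain `std::map` rather than the actual api/column_family code:

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: collect table names from a name -> id map.
// The number of results is known up front, so the vector is reserved once
// instead of being reallocated repeatedly while it grows.
std::vector<std::string> collect_table_names(const std::map<std::string, int>& tables) {
    std::vector<std::string> names;
    names.reserve(tables.size());
    for (const auto& entry : tables) { // range loop instead of explicit iterators
        names.push_back(entry.first);
    }
    assert(names.size() == tables.size());
    return names;
}
```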
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Set filesystem permissions for the maintenance socket to 660 (previously it was 755) to allow members of scyllaadm's group to connect.
Split the logic of creating sockets into two separate functions, one for each case: when it is a regular cql controller or used by maintenance_socket.
Fixes https://github.com/scylladb/scylladb/issues/16487.
Closes scylladb/scylladb#17113
* github.com:scylladb/scylladb:
maintenance_socket: add option to set owning group
transport/controller: get rid of magic number for socket path's maximal length
transport/controller: set unix_domain_socket_permissions for maintenance_socket
transport/controller: pass unix_domain_socket_permissions to generic_server::listen
transport/controller: split configuring sockets into separate functions
Use `parallel_for_each_table` instead of `for_each_table_gently` in
`repair_service::load_history`, to reduce bootstrap time.
Use uuid_xor_to_uint32 in the repair load_history dispatch to shards.
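The idea of spreading tables over shards by folding their UUID can be sketched as follows. This is an illustration under assumptions, not the body of uuid_xor_to_uint32: `simple_uuid`, `xor_fold_to_uint32` and `shard_for_table` are hypothetical names, and the real helper may fold the bits differently.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 128-bit UUID stand-in.
struct simple_uuid {
    std::uint64_t msb;
    std::uint64_t lsb;
};

// Fold 128 bits into 32 by XORing the four 32-bit words together.
inline std::uint32_t xor_fold_to_uint32(const simple_uuid& id) {
    const std::uint64_t folded = id.msb ^ id.lsb;
    return static_cast<std::uint32_t>(folded ^ (folded >> 32));
}

// Pick the shard that handles the given table. Because all words of the
// UUID contribute to the result, tables tend to spread across shards.
inline unsigned shard_for_table(const simple_uuid& id, unsigned shard_count) {
    assert(shard_count > 0);
    return xor_fold_to_uint32(id) % shard_count;
}
```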
Ref: https://github.com/scylladb/scylladb/issues/16774
Closes scylladb/scylladb#16927
* github.com:scylladb/scylladb:
repair: resolve load_history shard load skew
repair: accelerate repair load_history time
in da53854b66, we added a formatter for printing a `node*`, and switched
to this formatter when printing `node*`. but we failed to update some
call sites when migrating to the new formatter, where a
`unique_ptr<node>` is printed instead. this is not the behavior before
the change, and is not expected.
so, in this change, we explicitly instantiate `node_printer` instances
with the pointer held by `unique_ptr<node>`, to restore the behavior
before da53854b66.
this issue was identified when compiling the tree using {fmt} v10 with the
compile-time format-string check enabled, which is yet to be upstreamed to
Seastar.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17418
To allow filtering the returned keyspaces by the replication they
use: tablets or vnodes.
The filter can be disabled by omitting the parameter or passing "all".
The default is "all".
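The parameter semantics can be sketched like this. The names `replication_filter`, `parse_replication_filter` and `matches` are hypothetical, and rejecting unknown values with an exception is an assumption, not something stated by this change.

```cpp
#include <cassert>
#include <optional>
#include <stdexcept>
#include <string>

enum class replication_filter { all, tablets, vnodes };

// An omitted parameter and "all" both disable the filter.
// Assumption: any other unknown value is rejected.
replication_filter parse_replication_filter(const std::optional<std::string>& param) {
    if (!param || *param == "all") {
        return replication_filter::all;
    }
    if (*param == "tablets") {
        return replication_filter::tablets;
    }
    if (*param == "vnodes") {
        return replication_filter::vnodes;
    }
    throw std::invalid_argument("unknown replication filter: " + *param);
}

// Decide whether a keyspace passes the filter, given how it is replicated.
bool matches(replication_filter filter, bool uses_tablets) {
    switch (filter) {
    case replication_filter::all:
        return true;
    case replication_filter::tablets:
        return uses_tablets;
    case replication_filter::vnodes:
        return !uses_tablets;
    }
    assert(false && "unreachable");
    return false;
}
```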
Fixes: #16509
Closes scylladb/scylladb#17319
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for `raft::fsm`, and drop its
operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17414
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for `mutation_partition::printer`,
and drop its operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17419
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
`cached_promoted_index::promoted_index_block`, and drop its
operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17415
so we exercise the cases where state and status are not "normal" and "up".
turns out the MBean is able to cache some objects. so the requests retrieving datacenter and rack are now marked `ANY`.
* filter out the requests whose `multiple` is `ANY`
* include the unconsumed requests in the raised `AssertionError`. this
should help with debugging.
Fixes #17401
Closes scylladb/scylladb#17417
* github.com:scylladb/scylladb:
test/nodetool: parameterize test_ring
test/nodetool: fail a test only with leftover expected requests
For now only fail streaming stage and check that migration doesn't get
stuck and doesn't make tablet appear on dead node.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
In case pending or leaving replica is marked as ignored by operator,
streaming cannot be retried and should jump to "cleanup_target" stage
after a barrier.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The new stage will be used to revert a migration that fails at some
stages. The goal is to clean up the pending replica, which may have already
received some writes, by doing the cleanup RPC to the pending replica,
then jumping to the "revert_migration" stage introduced earlier.
If pending node is dead, the call to cleanup RPC is skipped.
Coordinators use old replicas.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
It's like end_migration, but keeps the old replicas intact, just removing
the transition (including new replicas).
Coordinators use old replicas.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Next patch will need to teach this code to handle new cleanup_target
stage, this change prepares this place for smoother patching
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
so we exercise the cases where state and status are not "normal" and "up".
turns out the MBean is able to cache some objects. so the requests
retrieving datacenter and rack are now marked `ANY`.
Fixes #17401
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
if there are unconsumed requests whose `multiple` is -1, we should
not consider them required: the test can consume them or not. and if
it does not, we should not consider the test a failure just because
these requests are sitting at the end of the queue.
so, in this change, we
* filter out the requests whose `multiple` is `ANY`
* include the unconsumed requests in the raised `AssertionError`. this
should help with debugging.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
The mentioned test failed on CI. It sets up two nodes and performs
operations related to creation and dropping of tables as well as
moving tablets. Locally, the issue was not visible. Also, the test
was passing on CI in the majority of cases.
One of the steps in the test case is intended to select the shard that
has some tablets on host_0 and then move them to (host_1, shard_3).
It also contains a precondition that requires the tablet count to
be greater than zero, to ensure that the move_tablets operation really
moves tablets.
The error message in the failed CI run comes from the precondition
related to the tablet count on (host0, src_shard): it was zero.
This indicated that there were no tablets on the entire host_0.
This commit removes the assumption about the existence of
tablets on host_0. If there are no tablets there, the
procedure is rerun for host_1.
Now the logic is as follows:
- find shard that has some tablets on host_0
- if such shard does not exist, then find such shard on host_1
- depending on the result of search set src/dest nodes
- verify that reported tablet count metric is changed when
move_tablet operation finishes
Refs: scylladb#17386
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
Closes scylladb/scylladb#17398
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
`attribute_path_map_node<update_expression::action>`, and drop its
operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17270