For tables using tablet based replication strategies, the sstables
should be reshaped only within the compaction groups they belong to.
Updated shard_reshaping_compaction_task_impl to group the sstables based
on their compaction groups before reshaping them within the groups.
Fixes#16966
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
The total reshaped size should only be updated on reshape success and
not after reshape has been failed due to some exception.
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
Catch and handle the exceptions directly instead of rethrowing and
catching again.
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for `alternator::parsed::path`,
and drop its operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#17458
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
* row_tombstone
* row_marker
* deletable_row::printer
* row::printer
* clustering_row::printer
* static_row::printer
* partition_start
* partition_end
* mutation_fragment::printer
and drop their operator<<:s
Refs #13245Closesscylladb/scylladb#17461
* github.com:scylladb/scylladb:
mutation: add fmt::formatter for clustering_row and friends
mutation: add fmt::formatter for row_tombstone and friends
Our interval template started life as `range`, and was supported wrapping to follow Cassandra's convention of wrapping around the maximum token.
We later recognized that an interval type should usually be non-wrapping and split it into wrapping_range and nonwrapping_range, with `range` aliasing wrapping_range to preserve compatibility.
Even later, we realized the name was already taken by C++ ranges and so renamed it to `interval`. Given that intervals are usually non-wrapping, the default `interval` type is non-wrapping.
We can now simplify it further, recognizing that everyone assumes that an interval is non-wrapping and so doesn't need the nonwrapping_interval_designation. We just rename nonwrapping_interval to `interval` and remove the type alias.
Closesscylladb/scylladb#17455
* github.com:scylladb/scylladb:
interval: rename nonwrapping_interval to interval
interval: rename interval_test to wrapping_interval_test
in af2553e8, we added formatters for cdc::image_mode and
cdc::delta_mode. but in that change, we failed to qualify `string_view`
with `std::` prefix. even it compiles, it depends on a `using
std::string_view` or a more error-prone `using namespace std`.
neither of which shold be relied on. so, in this change, we
add the `std::` prefix to `string_view`.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#17459
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
* `streaming::stream_request`,
* `stream_session_state`
and drop their operator<<:s
Refs #13245Closesscylladb/scylladb#17464
* github.com:scylladb/scylladb:
streaming: add fmt::formatter for streaming::stream_request
streaming: add fmt::formatter for stream_session_state
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
* `sstables::compaction_type`
* `sstables::compaction_type_options::scrub::mode`
* `sstables::compaction_type_options::scrub::quarantine_mode``
and drop their operator<<:s.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#17441
cluster_status_table virtual table have a status field for each node. In
gossiper mode the status is taken from the gossiper, but with raft the
states are different and are stored in the topology state machine. The
series fixes the code to check current mode and take the status from
correct place.
Refs scylladb/scylladb#16984
* 'gleb/cluster_status_table-v1' of github.com:scylladb/scylla-dev:
gossiper: remove unused REMOVAL_COORDINATOR state
virtual_tables: take node state from raft for cluster_status_table table if topology over raft is enabled
virtual_tables: create result for cluster_status_table read on shard 0
When we create a CDC generation and ring-delay is non-zero, the
timestamp of the new generation is in the future. Hence, we can
have multiple generations that can be written to. However, if we
add a new node to the cluster with the Raft-based topology, it
receives only the last committed generation. So, this node will
be rejecting writes considered correct by the other nodes until
the last committed generation starts operating.
In scylladb/scylladb#17134, we have allowed sending writes to the
previous CDC generations. So, the situation became even more
complicated. This PR adjusts the Raft-based topology
to ensure all required generations are loaded into memory and their
data isn't cleared too early.
To load all required generations into memory, we replace
`current_cdc_generation_{uuid, timestamp}` with the set containing
IDs of all committed generations - `committed_cdc_generations`.
To ensure this set doesn't grow endlessly, we remove an entry from
this set together with the data in CDC_GENERATIONS_V3.
Currently, we may clear a CDC generation's data from
CDC_GENERATIONS_V3 if it is not the last committed generation
and it is at least 24 hours old (according to the topology
coordinator's clock). However, after allowing writes to the
previous CDC generations, this condition became incorrect. We
might clear data of a generation that could still be written to.
The new solution introduced in this PR is to clear data of the
generations that finished operating more than 24 hours ago.
Apart from the changes mentioned above, this PR hardens
`test_cdc_generation_clearing.py`.
Fixesscylladb/scylladb#16916Fixesscylladb/scylladb#17184Fixesscylladb/scylladb#17288Closesscylladb/scylladb#17374
* github.com:scylladb/scylladb:
test: harden test_cdc_generation_clearing
test: test clean-up of committed_cdc_generations
raft topology: clean committed_cdc_generations
raft topology: clean only obsolete CDC generations' data
storage_service: topology_state_load: load all committed CDC generations
system_keyspace: load_topology_state: fix indentation
raft topology: store committed CDC generations' IDs in the topology
removenode --force is an unsafe operation and does not even make sense with
topology over raft. This patch disables it if raft is enabled and prints
a deprecation note otherwise. We already have a PR to remove it
(https://github.com/scylladb/scylladb/pull/15834), but it was decided
there that a deprecation period is needed for legacy use case.
Fixes: scylladb/scylladb#16293
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
* clustering_row::printer
* static_row::printer
* partition_start
* partition_end
* mutation_fragment::printer
and drop their operator<<:s
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
when '\' does not start an escape sequence, Python complains at seeing
it. but it continues anyway by considering '\' as a separate char.
but the warning message is still annoying:
```
scylla-gdb.py: 2417: SyntaxWarning: invalid escape sequence '\-'
branches = (r" |-- ", " \-- ")
```
when sourcing this script.
so, let's mark these strings as raw strings.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#17466
we already have generic operator<< based formatter for sequence-alike
ranges defined in `utils/to_string.hh`, but as a part of efforts to
address #13245, we will eventually drop the formatter.
to prepare for this change, we should create/find the alternatives
where the operator<< for printing the ranges is still used.
Boost::program_options is one of them. it prints the options' default
values using operator<< in its error message or usage. so in order
to keep it working, we define operator<< for `vector<sstring>` here.
if there are more types are required, we will need the generalize
this formatter. if there are more needs from different compiling
units, we might need to extract this helper into, for instance,
`utils/to_string.hh`. but we should do this after removing it.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#17413
instead of materializing the `managed_bytes_view` to a string, and
print it, print it directly to stdout. this change helps to deprecate
`to_hex()` helpers, we should materialize string only when necessary.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#17463
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for `streaming::stream_request`,
and drop its operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
`streaming::stream_session_state`, and drop its operator<<
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
* row_tombstone
* row_marker
* deletable_row::printer
* row::printer
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Our interval template started life as `range`, and was supported
wrapping to follow Cassandra's convention of wrapping around the
maximum token.
We later recognized that an interval type should usually be non-wrapping
and split it into wrapping_range and nonwrapping_range, with `range`
aliasing wrapping_range to preserve compatibility.
Even later, we realized the name was already taken by C++ ranges and
so renamed it to `interval`. Given that intervals are usually non-wrapping,
the default `interval` type is non-wrapping.
We can now simplify it further, recognizing that everyone assumes
that an interval is non-wrapping and so doesn't need the
nonwrapping_interval_designation. We just rename nonwrapping_interval
to `interval` and remove the type alias.
Those that collect vectors with ks/cf names can reserve the vectors in advance. Also one of those can use range loop for shorter code
Closesscylladb/scylladb#17433
* github.com:scylladb/scylladb:
api: Reserve vectors in advance
api: Use range-loop to iterate keyspaces
When topology barrier is blocked for longer than configured threshold
(2s), stale versions are marked as stalled and when they get released
they report backtrace to the logs. This should help to identify what
was holding for token metadata pointer for too long.
Example log:
token_metadata - topology version 30 held for 299.159 [s] past expiry, released at: 0x2397ae1 0x23a36b6 ...
Closesscylladb/scylladb#17427
When reading a list of ranges with tablets, we don't need a multishard reader. Instead, we intersect the range list with the local nodes tablet ranges, then read each range from the respective shard.
The individual ranges are read sequentially, with database::query[_mutations](), merging the results into a single
instance. This makes the code simple. For tablets multishard_mutation_query.cc is no longer on the hot paths, range scans
on tables with tablets fork off to a different code-path in the coordinator. The only code using multishard_mutation_query.cc are forced, replica-local scans, like those used by SELECT * FROM MUTATION_FRAGMENTS(). These are mainly used for diagnostics and tests, so we optimize for simplicity, not performance.
Fixes: #16484Closesscylladb/scylladb#16802
* github.com:scylladb/scylladb:
test/cql-pytest: remove skip_with_tablets fixture
test/cql-pytest: test_select_from_mutation_fragments.py parameterize tests
test/cql-pytest: test_select_from_mutation_fragments.py: remove skip_with_tablets
multishard_mutation_query: add tablets support
multishard_mutation_query: remove compaction-state from result-builder factory
multishard_mutation_query: do_query(): return foreign_ptr<lw_shared_ptr<result>>
mutation_query: reconcilable_result: add merge_disjoint()
locator: introduce tablet_range_spliter
dht/i_partitioner: to_partition_range(): don't assume input is fully inclusive
interval: add before() overload which takes another interval
The default AIO backend requires AIO blocks. On production systems, all
available AIO blocks could have been already taken by ScyllaDB. Even
though the tools only require a single unit, we have seen cases where
not even that is available, ScyllDB having siphoned all of the available
blocks.
We could try to ensure all deployments have some spare blocks, but it is
just less friction to not have to deal with this problem at all, by just
using the epoll backend. We don't care about performance in the case of
the tools anyway, so long as they are not unreasonably slow. And since
these tools are replacing legacy tools written in Java, the bar is low.
Closesscylladb/scylladb#17438
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for following types
* query::result::printer
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for following types
* query::result_set
* query::result_set_row
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for following types
* query::specific_ranges
* query::partition_slice
* query::read_command
* query::forward_request
* query::forward_request::reduction_type
* query::forward_request::aggregation_info
* query::forward_result::printer
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for `formatted_sstables_list`,
and drop its operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
* `sstables::compaction_type`
* `sstables::compaction_type_options::scrub::mode`
* `sstables::compaction_type_options::scrub::quarantine_mode`
and drop their operator<<:s.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
To run with both vnodes and tablets. For this functionality, both
replication methods should be covered with tests, because it uses
different ways to produce partition lists, depending on the replication
method.
Also add scylla_only to those tests that were missing this fixture
before. All tests in this suite are scylla-only and with the
parameterization, this is even more apparent.
When reading a list of ranges with tablets, we don't need a multishard
reader. Instead, we intersect the range list with the local nodes tablet
ranges, then read each range from the respective shard.
The individual ranges are read sequentially, with
database::query[_mutations](), merging the results into a single
instance. This makes the code simple. For tablets,
multishard_mutation_query.cc is no longer on the hot paths, range scans
on tables with tablets fork off to a different code-path in the
coordinator. The only code using multishard_mutation_query.cc are
forced, replica-local scans, like those used by SELECT * FROM
MUTATION_FRAGMENTS(). These are mainly used for diagnostics and tests,
so we optimize for simplicity, not performance.
This param was used by the query-result builder, to set the
last-position on end-of-stream. Instead, do this via a new ResultBuilder
method, maybe_set_last_position(), which is called from read_page(),
which has access to the compaction-state.
With this, the ResultBuilder can be created without a compaction-state
at hand. This will be important in the next patch.
Given a list of partition-ranges, yields the intersection of this
range-list, with that of that tablet-ranges, for tablets located on the
given host.
This will be used in multishard_mutation_query.cc, to obtain the ranges
to read from the local node: given the read ranges, obtain the ranges
belonging to tablets who have replicas on the local node.
Consider the inclusiveness of the token-range's start and end bounds and
copy the flag to the output bounds, instead of assuming they are always
inclusive.
The current point variant cannot take inclusiveness into account, when
said point comes from another interval bound.
This method had no tests at all, so add tests covering both overloads.
range.hh was deprecated in bd794629f9 (2020) since its names
conflict with the C++ library concept of an iterator range. The name
::range also mapped to the dangerous wrapping_interval rather than
nonwrapping_interval.
Complete the deprecation by removing range.hh and replacing all the
aliases by the names they point to from the interval library. Note
this now exposes uses of wrapping intervals as they are now explicit.
The unit tests are renamed and range.hh is deleted.
Closesscylladb/scylladb#17428
In order to avoid running out of memory, we can't
underestimate the memory used when processing a view
update. Particularly, we need to handle the remote
view updates well, because we may create many of them
at the same time in contrast to local updates which
are processed synchronously.
After investigating a coredump generated in a crash
caused by running out of memory due to these remote
view updates, we found that the current estimation
is much lower than what we observed in practice; we
identified overhead of up to 2288 bytes for each
remote view update. The overhead consists of:
- 512 bytes - a write_response_handler
- less than 512 bytes - excessive memory allocation
for the mutation in bytes_ostream
- 448 bytes - the apply_to_remote_endpoints coroutine
started in mutate_MV()
- 192 bytes - a continuation to the coroutine above
- 320 bytes - the coroutine in result_parallel_for_each
started in mutate_begin()
- 112 bytes - a continuation to the coroutine above
- 192 bytes - 5 unspecified allocations of 32, 32, 32,
48 and 48 bytes
This patch changes the previous overhead estimate
of 256 bytes to 2288 bytes, which should take into
account all allocations in the current version of the
code. It's worth noting that changes in the related
pieces of code may result in a different overhead.
The allocations seem to be mostly captures for the
background tasks. Coroutines seem to allocate extra,
however testing shows that replacing a coroutine with
continuations may result in generating a few smaller
futures/continuations with a larger total size.
Besides that, considering that we're waiting for
a response for each remote view update, we need the
relatively large write_response_handler, which also
includes the mutation in case we needed to reuse it.
The change should not majorly affect workloads with many
local updates because we don't keep many of them at
the same time anyway, and an added benefit of correct
memory utilization estimation is avoiding evictions
of other memory that would be otherwise necessary
to handle the excessive memory used by view updates.
Fixes#17364Closesscylladb/scylladb#17420
It can happen that a node is lost during tablet migration involving that node. Migration will be stuck, blocking topology state machine. To recover from this, the current procedure is for the admin to execute nodetool removenode or replacing the node. This marks the node as "ignored" and tablet state machine can pick this up and abort the migration.
This PR implements the handling for streaming stage only and adds a test for it. Checking other stages needs more work with failure injection to inject failures into specific barrier.
To handle streaming failure two new stages are introduced -- cleanup_target and revert_migration. The former is to clean the pending replica that could receive some data by the time streaming stopped working, the latter is like end_migration, but doesn't commit the new_replicas into replicas field.
refs: #16527Closesscylladb/scylladb#17360
* github.com:scylladb/scylladb:
test/topology: Add checking error paths for failed migration
topology.tablets_migration: Handle failed streaming
topology.tablets_migration: Add cleanup_target transition stage
topology.tablets_migration: Add revert_migration transition stage
storage_service: Rewrap cleanup stage checking in cleanup_tablet()
test/topology: Move helpers to get tablet replicas to pylib
This PR removes information about outdated versions, including disclaimers and information when a given feature was added.
Now that the documentation is versioned, information about outdated versions is unnecessary (and makes the docs harder to read).
Fixes https://github.com/scylladb/scylladb/issues/12110Closesscylladb/scylladb#17430
Some endpoints in api/column_family fill vectors with data obtained from
database and return them back. Since the amount of data is known in
advance, it's good to reserve the vector.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>