In one of the previous patches, we fixedscylladb/scylladb#16916 as
a side effect. We removed
`system_keyspace::get_cdc_generations_cleanup_candidate`, which
contained the bug causing the issue.
Even though we didn't have to fix this issue directly, it showed us
that `test_cdc_generation_clearing` was too weak. If something went
wrong during/after the only clearing, the test still could pass
because the clearing was the last action in the test. In
scylladb/scylladb#16916, the CDC generation publisher was stuck
after the clearing because of a recurring error. The test wouldn't
detect it. Therefore, we harden the test by expecting two clearings
instead of one. If something goes wrong during the first clearing,
there is a high chance that the second clearing will fail. The new
test version wouldn't pass with the old bug in the code.
We extend `test_cdc_generation_clearing`. Now, it also tests the
clean-up of `TOPOLOGY.committed_cdc_generations` added in the
previous patch.
In the implementation, we harden the already existing
`check_system_topology_and_cdc_generations_v3_consistency`. After
the previous patch, data of every generation present in
`committed_cdc_generations` should be present in CDC_GENERATIONS_V3.
In other words, `committed_cdc_generations` should always be a
subset of a set containing generations in CDC_GENERATIONS_V3.
Before the previous patch, this wasn't true after the clearing, so
the new version of `test_cdc_generation_clearing` wouldn't pass
back then.
We clean `TOPOLOGY.committed_cdc_generations` from obsolete
generations to ensure this set doesn't grow endlessly. After this
patch, the following invariant will be true: if a generation is in
`committed_cdc_generation`, its data is in CDC_GENERATIONS_V3.
Currently, we may clear a CDC generation's data from
CDC_GENERATIONS_V3 if it is not the last committed generation
and it is at least 24 hours old (according to the topology
coordinator's clock). However, after allowing writes to the
previous CDC generations, this condition became incorrect. We
might clear data of a generation that could still be written to.
The new solution is to clear data of the generations that
finished operating more than 24 hours ago. The rationale behind
it is in the new comment in
`topology_coordinator:clean_obsolete_cdc_generations`.
The previous solution used the clean-up candidate. After
introducing `committed_cdc_generations`, it became unneeded.
The last obsolete generation can be computed in
`topology_coordinator:clean_obsolete_cdc_generations`. Therefore,
we remove all the code that handles the clean-up candidate.
After changing how we clear CDC generations' data,
`test_current_cdc_generation_is_not_removed` became obsolete.
The tested feature is not present in the code anymore.
`test_dependency_on_timestamps` became the only test case covering
the CDC generation's data clearing. We adjust it after the changes.
We load all committed CDC generations into `cdc::metadata`. Since
we have allowed sending writes to the previous generations in
scylladb/scylladb#17134, the committed generations may be necessary
to handle a correct request.
When we create a CDC generation and ring-delay is non-zero, the
timestamp of the new generation is in the future. Hence, we can
have multiple generations that can be written to. However, if we
add a new node to the cluster with the Raft-based topology, it
receives only the last committed generation. So, this node will
be rejecting writes considered correct by the other nodes until
the last committed generation starts operating.
In scylladb/scylladb#17134, we have allowed sending writes to the
previous CDC generations. So, the situation became even more
complicated. We need to adjust the Raft-based topology to ensure
all required generations are loaded into memory and their data
isn't cleared too early.
This patch is the first step of the adjustment. We replace
`current_cdc_generation_{uuid, timestamp}` with the set containing
IDs of all committed generations - `committed_cdc_generations`.
This set is sorted by timestamps, just like
`unpublished_cdc_generations`.
This patch is mostly refactoring. The last generation in
`committed_cdc_generations` is the equivalent of the previous
`current_cdc_generation_{uuid, timestamp}`. The other generations
are irrelevant for now. They will be used in the following patches.
After introducing `committed_cdc_generations`, a newly committed
generation is also unpublished (it was current and unpublished
before the patch). We introduce `add_new_committed_cdc_generation`,
which updates both sets of generations so that we don't have to
call `add_committed_cdc_generation` and
`add_unpublished_cdc_generation` together. It's easy to forget
that both of them are necessary. Before this patch, there was
no call to `add_unpublished_cdc_generation` in
`topology_coordinator::build_coordinator_state`. It was a bug
reported in scylladb/scylladb#17288. This patch fixes it.
This patch also removes "the current generation" notion from the
Raft-based topology. For the Raft-based topology, the current
generation was the last committed generation. However, for the
`cdc::metadata`, it was the generation operating now. These two
generations could be different, which was confusing. For the
`cdc::metadata`, the current generation is relevant as it is
handled differently, but for the Raft-based topology, it isn't.
Therefore, we change only the Raft-based topology. The generation
called "current" is called "the last committed" from now.
To allow to filter the returned keyspaces based by the replication they
use: tablets or vnodes.
The filter can be disabled by omitting the parameter or passing "all".
The default is "all".
Fixes: #16509Closesscylladb/scylladb#17319
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for `raft::fsm`, and drop its
operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#17414
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for `mutation_partition::printer`,
and drop its operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#17419
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
`cached_promoted_index::promoted_index_block`, and drop its
operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#17415
so we exercise the cases where state and status are not "normal" and "up".
turns out the MBean is able to cache some objects. so the requets retrieving datacenter and rack are now marked `ANY`.
* filter out the requests whose `multiple` is `ANY`
* include the unconsumed requets in the raised `AssertionError`. this
should help with debugging.
Fixes#17401Closesscylladb/scylladb#17417
* github.com:scylladb/scylladb:
test/nodetool: parameterize test_ring
test/nodetool: fail a test only with leftover expected requests
so we exercise the cases where state and status are not "normal" and "up".
turns out the MBean is able to cache some objects. so the requets
retrieving datacenter and rack are now marked `ANY`.
Fixes#17401
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
if there are unconsumed requests whose `multiple` is -1, we should
not consider it a required, the test can consume it or not. but if
it does not, we should not consider the test a failure just because
these requests are sitting at the end of queue.
so, in this change, we
* filter out the requests whose `multiple` is `ANY`
* include the unconsumed requets in the raised `AssertionError`. this
should help with debugging.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
The mentioned test failed on CI. It sets up two nodes and performs
operations related to creation and dropping of tables as well as
moving tablets. Locally, the issue was not visible - also, the test
was passing on CI in majority of cases.
One of steps in the test case is intended to select the shard that
has some tablets on host_0 and then move them to (host_1, shard_3).
It contains also a precondition that requires the tablets count to
be greater than zero - to ensure, that move_tablets operation really
moves tablets.
The error message in the failed CI run comes from the precondition
related to tablets count on (host0, src_shard) - it was zero.
This indicated that there were no tablets on entire host_0.
The following commit removes the assumption about the existence of
tablets on host_0. In case when there are no tablets there, the
procedure is rerun for host_1.
Now the logic is as follows:
- find shard that has some tablets on host_0
- if such shard does not exist, then find such shard on host_1
- depending on the result of search set src/dest nodes
- verify that reported tablet count metric is changed when
move_tablet operation finishes
Refs: scylladb#17386
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
Closesscylladb/scylladb#17398
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
`attribute_path_map_node<update_expression::action>`, and drop its
operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#17270
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
`read_context::dismantle_buffer_stats`, and drop its operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#17389
`-Wno-unused-command-line-argument` is used to disable the warning of
`-Wunused-command-line-argument`, which is in turn used to split
warnings if any of the command line arguments passed to the compiler
driver is not used. see
https://clang.llvm.org/docs/DiagnosticsReference.html#wunused-command-line-argument
but it seems we are not passing unused command line arguments to
the compiler anymore. so let's drop this option.
this change helps to
* reduce the discrepencies between the compiling options used by
CMake-generated rules and those generated directly using
`configure.py`
* reenable the warning so we are aware if any of the options
is not used by compiler. this could a sign that the option fails
to serve its purpose.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#17195
There's a bunch of debug- and trace-level logging of locator::node-s that also include current_backtrace(). Printing node is done via debug_format() helper that generates and returns an sstring to print. Backtrace printing is not very lightweight on its own because of backtrace collecting. Not to slow things down in info log level, which is default, all such prints are wrapped with explicit if-s about log-level being enabled or not.
This PR removes those level checks by introducing lazy_backtrace() helper and by providing a formatter for nodes that also results in lazy node format string calculation.
Closesscylladb/scylladb#17235
* github.com:scylladb/scylladb:
topology: Restore indentation after previous patch
topology: Drop if_enabled checks for logging
topology: Add lazy_backtrace() helper
topology: Add printer wrapper for node* and formatter for it
topology: Expand formatter<locator::node>
With this commit:
- The information about ScyllaDB Enterprise OS support
is removed from the Open Source documentation.
- The information about ScyllaDB Open Source OS support
is moved to the os-support-info file in the _common folder.
- The os-support-info file is included in the os-support page
using the scylladb_include_flag directive.
This update employs the solution we added with
https://github.com/scylladb/scylladb/pull/16753.
It allows to dynamically add content to a page
depending on the opensource/enterprise flag.
Refs https://github.com/scylladb/scylladb/issues/15484Closesscylladb/scylladb#17310
Before the patch if a decommissioned node tries
to restart, it calls _group0->discover_group0 first
in join_cluster, which hangs since decommissioned
nodes are banned and other nodes don't respond
to their discovering requests.
We fix the problem by checking was_decommissioned()
flag before calling discover_group0.
fixesscylladb/scylladb#17282Closesscylladb/scylladb#17358
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for `gms::versioned_value`. its
operator<< is preserved, as it's still being used by the homebrew
generic formatter for std::unordered_map<gms::application_state,
gms::versioned_value>, which is in turn used in gms/gossiper.cc.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#17366
This patch wires up tombstone_gc repair with tablet repair. The flush
hints logic from the vnode table repair is reused. The way to mark the
finish of the repair is also adjusted for tablet repair because it only
has one shard per tablet token range instead of smp::count shards.
Fixes: #17046
Tests: test_tablet_repair_history
Closesscylladb/scylladb#17047
* github.com:scylladb/scylladb:
repair: Update repair history for tablet repair
repair: Extract flush hints code
ldflags are passed to ld (the linker), while cxxflags are passed to the
C++ compiler. the compiler does not understand the ldflags. if we
pass ldflags to it, it complains if `-Wunused-command-line-argument` is
enabled.
in this change, we do not include the ldflags in cxxflags, this helps
us to enable the warning option of `-Wunused-command-line-argument`,
so we don't need to disabled it.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#17328
As usual, the new command is covered with tests, which pass with both the legacy and the new native implementation.
Refs: #15588Closesscylladb/scylladb#17368
* github.com:scylladb/scylladb:
tools/scylla-nodetool: implement the repair command
test/nodetool: utils: add check_nodetool_fails_with_error_contains()
test/nodetool: util: replace flags with custom matcher
This is mostly a refactoring commit to make the test
more readable, as a byproduct of
scylladb/scylladb#17369 investigation.
We add the check for specific type of exceptions that
can be thrown (bad_property_file_error).
We also fix the potential race - the test may write
to res from multiple cores with no locks.
Closesscylladb/scylladb#17371
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
- gms::gossip_digest
- gms::gossip_digest_ack
- gms::gossip_digest_syn
and drop their operator<<:s
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#17379
This API endpoint was failing when tablets were enabled
because of usage of get_vnode_effective_replication_map().
Moreover, it was providing an error message that was not
user-friendly.
This change extends the handler to properly service the incoming requests.
Furthermore, it introduces two new test cases that verify the behavior of
storage_service/range_to_endpoint_map API. It also adjusts the test case
of this endpoint for vnodes to succeed when tablets are enabled by default.
The new logic is as follows:
- when tablets are disabled then users may query endpoints
for a keyspace or for a given table in a keyspace
- when tablets are enabled then users have to provide
table name, because effective replication map is per-table
When user does not provide table name when tablets are enabled
for a given keyspace, then BAD_REQUEST is returned with a
meaningful error message.
Fixes: scylladb#17343
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
Closesscylladb/scylladb#17372
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
cdc::image_mode and cdc::delta_mode, and drop their operator<<:s.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#17381
Before this PR, writes to the previous CDC generations would
always be rejected. After this PR, they will be accepted if the
write's timestamp is greater than `now - generation_leeway`.
This change was proposed around 3 years ago. The motivation was
to improve user experience. If a client generates timestamps by
itself and its clock is desynchronized with the clock of the node
the client is connected to, there could be a period during
generation switching when writes fail. We didn't consider this
problem critical because the client could simply retry a failed
write with a higher timestamp. Eventually, it would succeed. This
approach is safe because these failed writes cannot have any side
effects. However, it can be inconvenient. Writing to previous
generations was proposed to improve it.
The idea was rejected 3 years ago. Recently, it turned out that
there is a case when the client cannot retry a write with the
increased timestamp. It happens when a table uses CDC and LWT,
which makes timestamps permanent. Once Paxos commits an entry
with a given timestamp, Scylla will keep trying to apply that entry
until it succeeds, with the same timestamp. Applying the entry
involves writing to the CDC log table. If it fails, we get stuck.
It's a major bug with an unknown perfect solution.
Allowing writes to previous generations for `generation_leeway` is
a probabilistic fix that should solve the problem in practice.
Apart from this change, this PR adds tests for it and updates
the documentation.
This PR is sufficient to enable writes to the previous generations
only in the gossiper-based topology. The Raft-based topology
needs some adjustments in loading and cleaning CDC generations.
These changes won't interfere with the changes introduced in this
PR, so they are left for a follow-up.
Fixesscylladb/scylladb#7251Fixesscylladb/scylladb#15260Closesscylladb/scylladb#17134
* github.com:scylladb/scylladb:
docs: using-scylla: cdc: remove info about failing writes to old generations
docs: dev: cdc: document writing to previous CDC generations
test: add test_writes_to_previous_cdc_generations
cdc: generation: allow increasing generation_leeway through error injection
cdc: metadata: allow sending writes to the previous generations
This patch wires up tombstone_gc repair with tablet repair. The flush
hints logic from the vnode table repair is reused. The way to mark the
finish of the repair is also adjusted for tablet repair because it only
has one shard per tablet token range instead of smp::count shards.
Fixes: #17046
Tests: test_tablet_repair_history
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for `hints::host_filter`. its
operator<< is preserved as it's still used by the homebrew generic
formatter for vector<>, which is in turn used by db/config.cc.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#17347
This commit adds the missing redirections
to the pages whose source files were
previously stored in the install-scylla folder
and were moved to another location.
Closesscylladb/scylladb#17367
this change introduces a new exception which carries the status code
so that an operation can return a non-zero exit code without printing
any errors. this mimics the behavior of "viewbuildstatus" command of
C* nodetool.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#17359
_do_check_nodetool_fails_with() currently has a `match_all` flag to
control how the match is checked. Now we need yet another way to control
how matching is done. Instead of adding yet another flag (and who knows
how many more), jut replace the flag and the errors input with a matcher
functor, which gets the stdout and stderr and is delegated to do any
checks it wants. This method will scale much better going forward.
Tables in keyspaces governed by replication strategy that uses tablets, have separate effective_replication_maps. Update the upgrade compaction task to handle this when getting owned key ranges for a keyspace.
Fixes#16848Closesscylladb/scylladb#17335
* github.com:scylladb/scylladb:
compaction: upgrade: handle keyspaces that use tablets
replica/database: add an optional variant to get_keyspace_local_ranges
-Wunused-parameter, -Wmissing-field-initializers and -Wdeprecated-copy
warning options are enabled by -Wextra. the tree fails to build with
these options enabled, before we address them if the warning are genuine
problems, let's disable them.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#17352
When a node changes IP address we need to remove its old IP from `system.peers` and gossiper.
We do this in `sync_raft_topology_nodes` when the new IP is saved into `system.peers` to avoid losing the mapping if the node crashes between deleting and saving the new IP. We also handle the possible duplicates in this case by dropping them on the read path when the node is restarted.
The PR also fixes the problem with old IPs getting resurrected when a node changes its IP address.
The following scenario is possible: a node `A` changes its IP from `ip1` to `ip2` with restart, other nodes are not yet aware of `ip2` so they keep gossiping `ip1`. After restart `A` receives `ip1` in a gossip message and calls `handle_major_state_change` since it considers it as a new node. Then `on_join` event is called on the gossiper notification handlers, we receive such event in `raft_ip_address_updater` and reverts the IP of the node A back to ip1.
To fix this we ensure that the new gossiper generation number is used when a node registers its IP address in `raft_address_map` at startup.
The `test_change_ip` is adjusted to ensure that the old IPs are properly removed in all cases, even if the node crashes.
Fixes#16886Fixes#16691Fixes#17199Closesscylladb/scylladb#17162
* github.com:scylladb/scylladb:
test_change_ip: improve the test
raft_ip_address_updater: remove stale IPs from gossiper
raft_address_map: add my ip with the new generation
system_keyspace::update_peer_info: check ep and host_id are not empty
system_keyspace::update_peer_info: make host_id an explicit parameter
system_keyspace::update_peer_info: remove any_set flag optimisation
system_keyspace: remove duplicate ips for host_id
system_keyspace: peers table: use coroutines
storage_service::raft_ip_address_updater: log gossiper event name
raft topology: ip change: purge old IP
on_endpoint_change: coroutinize the lambda around sync_raft_topology_nodes
Tables in keyspaces governed by replication strategy that uses tablets, have
separate effective_replication_maps. Update the upgrade compaction task to
handle this when getting owned key ranges for a keyspace.
Fixes#16848
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>