When checking features on startup (i.e. whether support for any feature
was revoked in an unsafe way), it might happen that upgrade to raft
topology didn't finish yet. In that case, instead of loading an empty
set of features - which supposedly represents the set of features that
were enabled until last boot - we should fall back to loading the set
from the legacy `enabled_features` key in `system.scylla_local`.
The topology coordinator fiber is not started if a node starts in legacy
topology mode. We need to start the raft state monitor fiber after all
preconditions for starting upgrade to raft topology are met.
Add a fiber which is spawned only in legacy mode that will wait until:
- The schema-on-raft upgrade finishes,
- The SUPPORTS_CONSISTENT_CLUSTER_MANAGEMENT feature is enabled,
- The upgrade is triggered by the user.
and, after that, will spawn the raft state monitor fiber.
All nodes being capable of support for raft topology is a prerequisite
for starting upgrade to raft topology. The newly introduced feature will
track this prerequisite.
Move it before the topology coordinator is started. This way, the
topology coordinator will see non-empty state when it is started and it
will allow for us to assert that topology coordinator is never started
for an empty system.topology table.
Extend the prepare_and_broadcast_cdc_generation_data function like we
did in the case of prepare_new_cdc_generation_data - the topology
coordinator state building process not only has to create a new
generation, but also broadcast it.
During topology coordinator state build phase a new cdc generation will
be generated. We can reuse prepare_new_cdc_generation_data for that.
Currently, it always takes sharding information (shard count + ignore
msb) from the topology state machine - which won't be available yet at
the point of building the topology, so extend the function so that it
can accept a custom source of sharding information.
The FIXME mentions that token metadata should return host ID for given
token (instead of, presumably, an IP) - but that is already the case, so
let's remove the fixme.
Forbid starting new topology changes while upgrade to topology on raft
is in progress. While this does not take into account any ongoing
topology operations, it makes sure that at the end of the upgrade no
node will try to perform any legacy topology operations.
It tells whether the current node currently operates in recovery mode or
not. It will be vital for storage_service in determining which topology
operations to use at startup.
The `discover_group0` function returns only after it either finds a node
that belongs to some group 0, or learns that the current node is
supposed to create a new one. It will be very helpful to storage_service
in determining which topology mode to use.
This was previously done by `setup_group0`, which always was an
(indirect) caller of `discover_group0`. As we want to make
`discover_group0` public, it's more convenient for the callers if the
called method takes care of sanitizing the argument.
The goal is to make `discover_group0` public. The `my_id` argument was
always set to `this->load_my_id()`, so we can get rid of it and it will
make it more convenient to call `discover_group0` from the outside.
Currently, nodes either operate in the topology-on-raft mode or legacy
mode, depending on whether the experimental topology on raft flag is
enabled. This also affects the way nodes join the cluster, as both modes
have different procedures.
We want to allow joining nodes in legacy mode until the cluster is
upgraded. Nodes should automatically choose the best method. Therefore,
the existing boolean _raft_topology_change_enabled flag is extended into
an enum with the following variants:
- unknown - the node still didn't decide in which mode it will operate
- legacy - the node uses legacy topology operations
- upgrading_to_raft - the node is upgrading to use raft topology
operations
- raft - the node uses raft topology operations
Currently, only the `legacy` and `raft` variants are utilized, but this
will change in the commits that follow.
Additionally, the `_raft_experimental_topology` bool flag is introduced
which retains the meaning of the old `_raft_topology_change_enabled` but
has a more fitting name. It is explicitly needed in
`topology_state_load`.
In this PR we add the tests for two scenarios, related to the use of IPs in raft topology.
* When the replaced node transitions to the `LEFT` state we used to
remove the IP of such node from gossiper. If we replace with same IP,
this caused the IP of the new node to be removed from gossiper. This
problem was fixed by #16820, this PR adds a regression test for it.
* When a node is restarted after decommissioning some other node, the
restarting node tries to apply the raft log, this log contains a
record about the decommissioned node, and we got stuck trying to resolve
its IP. This was fixed by #16639 - we excluded IPs from the RAFt log
application code and moved it entirely to host_id-s. This PR adds a
regression test for this case.
Closesscylladb/scylladb#15967Closesscylladb/scylladb#14803Closesscylladb/scylladb#17180
* github.com:scylladb/scylladb:
test_topology_ops: check node restart after decommission
test_replace_reuse_ip: check other servers see the IP
For efficiency, if a base-table update generates many view updates that
go the same partition, they are collected as one mutation. If this
mutation grows too big it can lead to memory exhaustion, so since
commit 7d214800d0 we split the output
mutation to mutations no longer than 100 rows (max_rows_for_view_updates)
each.
This patch fixes a bug where this split was done incorrectly when
the update involved range tombstones, a bug which was discovered by
a user in a real use case (#17117).
Range tombstones are read in two parts, a beginning and an end, and the
code could split the processing between these two parts and the result
that some of the range tombstones in update could be missed - and the
view could miss some deletions that happened in the base table.
This patch fixes the code in two places to avoid breaking up the
processing between range tombstones:
1. The counter "_op_count" that decides where to break the output mutation
should only be incremented when adding rows to this output mutation.
The existing code strangely incrmented it on every read (!?) which
resulted in the counter being incremented on every *input* fragment,
and in particular could reach the limit 100 between two range
tombstone pieces.
2. Moreover, the length of output was checked in the wrong place...
The existing code could get to 100 rows, not check at that point,
read the next input - half a range tombstone - and only *then*
check that we reached 100 rows and stop. The fix is to calculate
the number of rows in the right place - exactly when it's needed,
not before the step.
The first change needs more justification: The old code, that incremented
_op_count on every input fragment and not just output fragments did not
fit the stated goal of its introduction - to avoid large allocations.
In one test it resulted in breaking up the output mutation to chunks of
25 rows instead of the intended 100 rows. But, maybe there was another
goal, to stop the iteration after 100 *input* rows and avoid the possibility
of stalls if there are no output rows? It turns out the answer is no -
we don't need this _op_count increment to avoid stalls: The function
build_some() uses `co_await on_results()` to run one step of processing
one input fragment - and `co_await` always checks for preemption.
I verfied that indeed no stalls happen by using the existing test
test_long_skipped_view_update_delete_with_timestamp. It generates a
very long base update where all the view updates go to the same partition,
but all but the last few updates don't generate any view updates.
I confirmed that the fixed code loops over all these input rows without
increasing _op_count and without generating any view update yet, but it
does NOT stall.
This patch also includes two tests reproducing this bug and confirming
its fixed, and also two additional tests for breaking up long deletions
that I wanted to make sure doesn't fail after this patch (it doesn't).
By the way, this fix would have also fixed issue #12297 - which we
fixed a year ago in a different way. That issue happend when the code
went through 100 input rows without generating *any* output rows,
and incorrectly concluding that there's no view update to send.
With this fix, the code no longer stops generating the view
update just because it saw 100 input rows - it would have waited
until it generated 100 output rows in the view update (or the
input is really done).
Fixes#17117
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closesscylladb/scylladb#17164
The new rpc::optional parameter must come after any existing parameters,
including the rpc::source parameters, otherwise it will break
compatibility.
The regression was introduced in:
```
commit fd3c089ccc
Author: Tomasz Grabiec <tgrabiec@scylladb.com>
Date: Thu Oct 26 00:35:19 2023 +0200
service: range_streamer: Propagate topology_guard to receivers
```
We need to backport this patch ASAP before we release anything that
contains commit fd3c089ccc.
Refs: #16941Fixes: #17175Closesscylladb/scylladb#17176
RF values appear as strings and strategies classes convert them to integers. This PR removes some duplication of efforts in converting code.
Closesscylladb/scylladb#17132
* github.com:scylladb/scylladb:
network_topology_strategy: Do not walk list of datacenters twice
replication_strategy: Do not convert string RF into int twise
abstract_replication_strategy: Make validate_replication_factor return value
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for `data_dictionary::user_types_metadata`,
and drop its operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#17140
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for `exceptions::exception_code`,
and drop its operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#17151
C++20 introduced a new overload of std::ostringstream::str() that is selected when the mentioned member function is called on r-value.
The new overload returns a string, that is move-constructed from the underlying string instead of being copy-constructed.
This change applies std::move() on stringstream objects before calling str() member function to avoid copying of the underlying buffer.
It also removes a helper function `inet_addr_type_impl::to_sstring()` - it was used only in two places. It was replaced with `fmt::to_string()`.
Closesscylladb/scylladb#16991
* github.com:scylladb/scylladb:
use fmt::to_string() for seastar::net::inet_address
types/types.cc: move stringstream content instead of copying it
so we can tighten our dependencies a little bit. there are only three places where we are using the `date` library. also, there is no need to reinvent the wheels if there are ready-to-use ones.
Closesscylladb/scylladb#17177
* github.com:scylladb/scylladb:
types: use {fmt} to format boolean
types: use {fmt} to format time
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for `dht::sharder`, and drop
its operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#17178
* seastar 85359b28...289ad5e5 (19):
> net/dpdk: use user-defined literal when appropriate
> io_tester: Allow running on non-XFS fs
> io: Apply rate-factor early
> circular_buffer: make iterator default constructible
> net/posix: add a way to change file permissions of unix domain socket
> resource: move includes to the top of the source file
> treewide: replace calls to future::get0() by calls to future::get()
> core/future: add as_ready_future utility
> build: do not expose -Wno-error=#warnings
> coroutine: remove remnants of variadic futures
> build: prevent gcc -Wstringop-overflow workaround from affecting clang
> util/spinlock: use #warning instead of #warn
> io_tester: encapsulate code into allocate_and_fill_buffer()
> io_tester: make maybe_remove_file a function
> future: remove tuples from get0_return_type
> circular_buffer_fixed_capacity: use std::uninitialized_move() instead of open-coding
> rpc/rpc_types: do not use integer literal in preprocessor macro
> future: use "(T(...))" instead of "{T(...)}" in uninitialized_set()
> net/posix: include used header
Closesscylladb/scylladb#17179
Adds a missing logging import in the file scylladb_common_images extension, which prevents the enterprise build from building.
Additionally, it standardizes logging handling across the extensions and removes "ami" references in Azure and GCP extensions.
Closesscylladb/scylladb#17137
also disable some more warnings which are failing the build after `-Wextra` is enabled. we can fix them on a case-by-case basis, if they are geniune issues. but before that, we just disable them.
this goal of this change is to reduce the discrepancies between the compile options used by CMake and those used by configure.py. the side effect is that we enable some more warning enabeld by `-Wextra`, for instance, `-Wsign-compare` is enable now. for the full list of the enabled warnings when building with Clang, please see https://clang.llvm.org/docs/DiagnosticsReference.html#wextra.
Closesscylladb/scylladb#17131
* github.com:scylladb/scylladb:
configure.py: add -Wextra to cflags
test/tablets: do not compare signed and unsigned
There used to be a problem with restarting a node after
decommissioning some other node - the restarting node
tries to apply the raft log, this log contains a record
about the decommissioned node, and we got stuck trying
to resolve its IP.
This was fixed in #16639 - we excluded IPs from
the RAFt log application code and moved it entirely
to host_id-s.
In this commit we add a regression test
for this case. We move the decommission_node
call before server_stop/server_start. We need
to add one more server to retain majority when
the node is decommissioned, otherwise the topology
coordinator won't migrate from the stopped node
before replacing it, and we'll get an error.
closes#14803
The replaced node transitions to LEFT state, and
we used to remove the IPs of such nodes from gossiper.
If we replace with same IP, this caused the IP of the
new node to be removed from gossiper.
This problem was fixed by #16820, this commit
adds a regression test for it.
closes#15967
This PR removes the following pages:
- ScyllaDB Open Source Features
- ScyllaDB Enterprise Features
They were outdated, incomplete, and misleading. They were also redundant, as the per-release updates are added as Release Notes.
With this update, the features listed on the removed pages are added under the common page: ScyllaDB Features.
In addition, a reference to the Enterprise-only Features section is added.
Note: No redirections are added because no file paths or URLs are changed with this PR.
Fixes https://github.com/scylladb/scylladb/issues/13485
Refs https://github.com/scylladb/scylladb/issues/16496
(nobackport)
Closesscylladb/scylladb#17150
* github.com:scylladb/scylladb:
Update docs/using-scylla/features.rst
doc: remove the OSS and Enterprise Features pages
This PR:
- Adds the upgrade guide from ScyllaDB Open Source 5.4 to ScyllaDB Enterprise 2024.1. Note: The need to include the "Restore system tables" step in rollback has been confirmed; see https://github.com/scylladb/scylladb/issues/11907#issuecomment-1842657959.
- Removes the 5.1-to-2022.2 upgrade guide (unsupported versions).
Fixes https://github.com/scylladb/scylladb/issues/16445Closesscylladb/scylladb#16887
* github.com:scylladb/scylladb:
doc: fix the OSS version number
doc: metric updates between 2024.1. and 5.4
doc: remove the 5.1-to-2022.2 upgrade guide
doc: add the 5.4-to-2024.1 upgrade guide
instead of filtering the keyspaces manually, let's reuse
`database::get_non_local_strategy_keyspaces_erms()`. less
repeatings and more future-proof this way.
Fixes#16974
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#17121
Validate replication strategy constraints in /storage_service/tablets/move API:
- replicas are not on the same node
- replicas don't move across DC (violates RF in each DC)
- availability is not reduced due to rack overloading
Add flag to force tablet move even though dc/rack constraints aren't fulfilled.
Test for the change: https://github.com/scylladb/scylla-dtest/pull/3911.
Fixes: #16379.
Closesscylladb/scylladb#16648
* github.com:scylladb/scylladb:
api: service: add force param to move_tablet api
service: validate replication strategy constraints
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for `dht::ring_position_ext` and
`dht::ring_position_view`, and drop their operator<<.
Refs #13245Closesscylladb/scylladb#17128
* github.com:scylladb/scylladb:
db: add formatter for dht::ring_position_ext
db: add formatter for dht::ring_position_view
This change removes inet_addr_type_impl::to_sstring()
and replaces its usages with fmt::to_string().
The removed helper performed an uneeded copying via
std::ostringstream::str().
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
C++20 introduced a new overload of std::ofstringstream::str()
that is selected when the mentioned member function is called
on r-value.
The new overload returns a string, that is move-constructed
from the underlying string instead of being copy-constructed.
This change applies std::move() on stringstream objects before
calling str() member function to avoid copying of the underlying
buffer.
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
After changing `left_token_ring` from a node state to a transition
state in scylladb/scylladb#17009, we do the same for
`rollback_to_normal`. `rollback_to_normal` was created as a node
state because `left_token_ring` was a node state.
This change will allow us to distinguish a failed removenode from
a failed decommission in the `rollback_to_normal` handler.
Currently, we use the same logic for both of them, so it's not
required. However, this might change, as it has happened with the
decommission and the failed bootstrap/replace in the
`left_token_ring` state (scylladb/scylladb#16797). We are making
this change now because it would be much harder after branching.
Fixesscylladb/scylladb#17032Closesscylladb/scylladb#17136
* github.com:scylladb/scylladb:
docs: dev: topology-over-raft: align indentation
docs: dev: topology-over-raft: document the rollback_to_normal state
topology_coordinator: improve logs in rollback_to_normal handler
raft topology: make rollback_to_normal a transition state
In a previous PR (https://github.com/scylladb/scylladb/pull/16840), we enabled tablets by default when running the cql-pytest suite. To handle tests which are failing with tablets enabled, we used a new fixture, `xfail_tablets` to mark these as xfail. This means that we effectively lost test coverage, as these tests can now freely fail and no-one will notice if this is due to a new regression. To restore test coverage, this PR re-enables all the previously disabled tests, by parametrizing each one of them to run with both vnodes and tablets, and targetedly mark as xfail, only the tablet variant. After these tests are fixed with tablets (or the underlying functionality they test is fixed to work with tablets), we will run them with both vnodes and tablets, because these tests apparently *do* care which replication method is used.
Together with https://github.com/scylladb/scylladb/pull/16802, this means all previously disabled test is re-enabled and no coverage is lost.
Closesscylladb/scylladb#16945
* github.com:scylladb/scylladb:
test/cql-pytest: conftest.py: remove xfail_tablets fixture
test/cql-pytest: test_tombstone_limit.py: re-enable disabled tests
test/cql-pytest: test_describe.py: re-enable disabled tests
test/cql-pytest: test_cdc.py: re-enable disabled tests
test/cql-pytest: add parameter support to test_keyspace
Since `t.parallel_foreach_table_state` may yield,
we should access `type` by reference when calling
`stop_compaction` since it is captured by the calling
lambda and gets lost when it returns if
`parallel_foreach_table_state` returns an unavailable
future.
Instead change all captures to `[&]` so we can access
the `type` variable held by the coroutine frame.
Fixes#16975
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closesscylladb/scylladb#17143