When changing configuration, don't pause and restart all tickers.
Only do it for the specific server(s) being removed.
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
Only pause and restart tickers for servers in the configuration.
Currently, when a server is taken out, it is reset and a new one is set
up outside the configuration. @gleb-cloudious requested that servers be
fully stopped while out of the configuration, until they are re-added.
This change is needed to allow that; otherwise, a restart would arm
tickers on servers no longer present.
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
Do verifications in raft_cluster::verify().
This will make it possible to keep persisted snapshots inside the class
and de-clutter caller code.
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
Keep snapshots inside raft_cluster, removing the need to access them
from outside. If external access is needed later, a const getter can be
implemented.
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
Since create_server() is in raft_cluster, there's no need for
change_configuration() to pass total values anymore. Remove that
argument.
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
Always do wait_log() for the next leader in elect_new_leader.
Wait on the new leader's log only if it is connected to the old leader.
Pause and restart tickers when creating a candidate to prevent another
node from becoming a dueling candidate.
Remove the pause and restart of tickers around calls to
elect_new_leader.
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
Remove the global create_raft_server() and replace it with a
create_server() helper in replication_test().
This means users of raft_cluster no longer need to create special
objects.
Note this no longer does move(apply), as apply is kept in raft_cluster.
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
Add a helper to reset a server in raft_cluster.
Besides simplifying code and preventing errors, this will help move
create_raft_server logic to raft_cluster.
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
Move tickers to the raft_cluster helper class. Ticker initialization
and pausing are done automatically in start_all() and stop_all().
Add temporary helpers to manage specific tickers. These might be removed
later once proper node abort and reset are implemented.
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
raft_cluster at the moment only allows sequential, 0-based ids.
The code was generating ids beyond this range, causing problems for
code changes.
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
Style.
Move the definitions of add_entry and add_remaining_entries to sit with
the rest of the raft_cluster definitions.
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
Add a helper to calculate the next value number to be added after the
snapshot and the leader's initial log.
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
For rpc tests, use raft_cluster::disconnect() instead of the local
connected reference.
This removes use of the connected object outside raft_cluster.
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
Add connectivity helpers disconnect(server, except) and connect_all()
so that users of raft_cluster don't need to keep a connectivity object
pointer.
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
When there's a defined next leader, only wait for log propagation on
that follower.
Split wait_log() into two overloads: one waiting for a single follower
and one waiting for all followers.
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
Remove the log wait after adding entries. It was added to handle some
debug hangs, but it is not appropriate for testing.
There are already log waits at the proper code locations
(e.g. elect_new_leader, partition).
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
When a user requests that repair be forcefully aborted, the
`_abort_all_as` abort source could be modified from multiple shards in
parallel by the `tracker::abort_all_repairs()` function, which can lead
to undefined behavior and a crash. This commit makes sure that
`_abort_all_as` is used only from shard 0 when repair is aborted.
Fixes #8693
Closes #8734
The new process has the following differences from the Dockerfile
based image:
- Using buildah commands instead of a Dockerfile. This is more flexible
since we don't need to pack everything into a "build context" and
transfer it to the container; instead we interact with the container
as we build it.
- Using packages instead of a remote yum repository. This makes it
easy to create an image in one step (no need to create a repository,
promote, and then download the packages back via yum). It means that
the image cannot be upgraded via yum, but container images are
usually just replaced with a new version anyway.
- Build output is an OCI archive (e.g. a tarball), not a docker image
in a local repository. This means the build process can later be
integrated into ninja, since the artifact is just a file. The file
can be uploaded into a repository or made available locally with
skopeo.
- Any build mode is supported, not just release. This can be used
for quick(er) testing with dev mode.
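The flow described above might look roughly like the following sketch.
This is not the actual build script; the base image, package paths,
image name, and registry are all illustrative assumptions.

```shell
#!/bin/bash
# Hedged sketch of a buildah-based image build (names and paths are illustrative).
set -e

# Start from a working container; no Dockerfile "build context" needed,
# we interact with the container directly as we build it.
ctr=$(buildah from docker.io/library/fedora:34)

# Install locally built packages directly, skipping the
# create-repository / promote / download-via-yum round trip.
buildah copy "$ctr" ./build/dist/*.rpm /tmp/pkgs/
buildah run "$ctr" -- rpm -ivh /tmp/pkgs/*.rpm

# Commit straight to an OCI archive (a plain file), not a local docker
# repository, so the artifact can be produced by ninja like any other file.
buildah commit "$ctr" oci-archive:scylla-image.tar

# The archive can later be published (or inspected locally) with skopeo.
skopeo copy oci-archive:scylla-image.tar docker://registry.example.com/scylla:dev
```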
I plan to integrate it further into the build system, but currently
this is blocked on a buildah bug [1].
[1] https://github.com/containers/buildah/issues/3262

Closes #8730
The value of a frozen collection may only be indexed (using a secondary
index) in full - it is not allowed to index only the keys for example -
"CREATE INDEX idx ON table (keys(v))" is not allowed.
The error message referred to a frozen<map>, but the problem can happen
on any frozen collection (e.g., a frozen set), not just a frozen map,
which can be confusing to a user who used a frozen set and got an
error about a frozen map.
So this patch fixes the error message to refer to a "frozen collection".
Note that the Cassandra error message in this case is different - it
reads: "Frozen collections are immutable and must be fully indexed".
Fixes#8744.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210529094056.825117-1-nyh@scylladb.com>