The test function test_mv_synchronous_updates checks the
synchronous_updates feature, which is a ScyllaDB extension and
doesn't exist in Cassandra. It should therefore be marked "scylla_only"
so that it doesn't fail when the tests are run on Cassandra.
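For reference, assuming the framework exposes scylla_only as a fixture (as the
marking described above suggests), a minimal sketch of the marking - the test
body here is a placeholder, not the real test:
```python
# Requesting the scylla_only fixture makes the framework skip this test
# when the server under test is Cassandra.
def test_mv_synchronous_updates(cql, test_keyspace, scylla_only):
    pass
```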
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Yet another test file in cql-pytest which failed when run on Cassandra
(via test/cql-pytest/run-cassandra).
When testing some invalid cases of ALTER TABLE, the test expected
that choosing SimpleStrategy without specifying a replication_factor
is an error. As explained in #16028, this is no longer true in
Cassandra 4.1 and up - replication_factor now has a default value
and is no longer required.
So in this patch we split that part of the test to a separate test
function and mark it scylla_only.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
The tests in test/cql-pytest/test_guardrail_replication_strategy.py
are for a Scylla-only feature that doesn't exist in Cassandra, so
obviously they all fail on Cassandra. Let's mark them all as
scylla_only.
We use an autouse fixture to automatically mark all tests in this file
as scylla-only, instead of marking each one separately.
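A minimal sketch of the pattern (the fixture name is illustrative; it leans on
the framework's existing scylla_only fixture):
```python
import pytest

# One autouse fixture that itself requests scylla_only, so every test in
# this file is skipped on Cassandra without marking each test separately.
@pytest.fixture(scope="module", autouse=True)
def all_tests_are_scylla_only(scylla_only):
    pass
```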
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Yet another test file in cql-pytest which failed when run on Cassandra
(via test/cql-pytest/run-cassandra).
This patch is only a partial fix - it fixes trivial differences in error
messages, but some potentially-real differences remain, so three of the
tests still fail:
1. Trying to set tombstone_threshold to 5.5 is an error in ScyllaDB
("must be between 0.0 and 1.0") but allowed in Cassandra.
2. Trying to set bucket_low to 0.0 is an error in ScyllaDB, giving the
wrong-looking error message "must be between 0.0 and 1.0" (so 0.0 should
have been fine?!) but allowed in Cassandra.
3. Trying to set timestamp_resolution to SECONDS is an error in ScyllaDB
("invalid timestamp resolution SECONDS") but allowed in Cassandra.
I don't think anybody wants to actually use "SECONDS", but it seems
legal in Cassandra, so do we need to support it?
The patch also simplifies the test to use cql-pytest's util.py, instead
of cassandra_tests/porting.py. The latter was meant to make porting
existing Cassandra tests easier - not for writing new ones - and made
checking error messages against a regular expression harder, so I
switched to pytest.raises(), whose "match=" parameter accepts a regular
expression.
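A sketch of the new style, assuming a hypothetical table1 fixture, the driver's
ConfigurationException, and ScyllaDB's wording of the error (details may differ
in the real test):
```python
import pytest
from cassandra.protocol import ConfigurationException

def test_bad_tombstone_threshold(cql, table1, scylla_only):
    # match= is searched as a regular expression in the error message.
    with pytest.raises(ConfigurationException, match="between 0.0 and 1.0"):
        cql.execute(f"ALTER TABLE {table1} WITH compaction = "
                    "{'class': 'SizeTieredCompactionStrategy', "
                    "'tombstone_threshold': '5.5'}")
```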
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Yet another test file in cql-pytest which failed when run on Cassandra
(via test/cql-pytest/run-cassandra).
It turns out that when the token() function is used with incorrect
parameters (it needs to be passed all partition-key columns), the
error message is different in ScyllaDB and Cassandra. Both are
reasonable error messages, so if we insist on checking the error
message - we should allow both.
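With pytest.raises() this is a one-liner - join the two wordings with "|" in
the match= regex. A sketch (the table and the message fragments are
illustrative, not the exact server texts):
```python
import pytest
from cassandra.protocol import InvalidRequest

def test_token_needs_all_pk_columns(cql, table1):
    # Accept either ScyllaDB's or Cassandra's wording of the error.
    with pytest.raises(InvalidRequest,
                       match="partition key|token function"):
        cql.execute(f"SELECT * FROM {table1} WHERE token(pk1) > 0")
```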
The same test also called its second partition-key column "ck". This
is confusing, because we usually use the name "ck" to refer to a clustering
key. So just for clarity, we change this name to "pk2". This is not a
functional change in the test.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
the "task" fixture is supposed to return a task for test, if it
fails to do so, it would be an issue not directly related to
the test. so let's fail it early.
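A hypothetical sketch of the idea (fixture names are illustrative):
```python
import pytest

@pytest.fixture
def task(tasks):  # "tasks" stands in for however the real fixture finds them
    if not tasks:
        # Failing here attributes the problem to the environment/setup,
        # not to the test that requested the fixture.
        pytest.fail("failed to obtain a task for the test")
    return tasks[0]
```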
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16042
Currently, when said feature is enabled, we recalculate the schema
digest. But this feature also influences how table versions are
calculated, so it has to trigger a recalculation of all table versions,
so that we can guarantee correct versions.
Before, this used to happen by happy accident. Another feature --
table_digest_insensitive_to_expiry -- used to take care of this, by
triggering a table version recalculation. However, this feature only takes
effect if digest_insensitive_to_expiry is also enabled. This happened to be
the case incidentally: by the time the reload triggered by
table_digest_insensitive_to_expiry ran, digest_insensitive_to_expiry was
already enabled. But this was not guaranteed whatsoever and as we've
recently seen, any change to the feature list, which changes the order
in which features are enabled, can cause this intricate balance to
break.
This patch makes digest_insensitive_to_expiry also kick off a schema
reload, to eliminate our dependence on (unguaranteed) feature order, and
to guarantee that table schemas have a correct version after all features
are enabled. In fact, all schema feature notification handlers now kick
off a full schema reload, to ensure bugs like this don't creep in in
the future.
Fixes: #16004
Closes scylladb/scylladb#16013
When a node tries to join the cluster, it asks the topology
coordinator to add it and then waits for the response. The
response is not guaranteed to come back. If the topology
coordinator cannot contact the joining node, it moves the node to
the left state and moves on.
Currently, to handle the case when the response does not come back,
the joining node gives up waiting for it after 3 minutes. However,
it might take more time for the topology coordinator to start
handling the request to join, as it might be working on other tasks
like adding other nodes, performing tablet migrations, etc. In
general, any timeout duration would be unreliable.
Therefore, we get rid of the timeout. From now on, the operator
will be responsible for shutting down the node if the topology
coordinator fails to deliver the rejection.
Additionally, after removing the timeout, we adjust the topology
coordinator. We make it try sending the response (both acceptance
and rejection) only once since we do not care if it fails anymore. We
only need to ensure that the joining node is moved to the left state
if sending fails.
Fixes #15865
Closes scylladb/scylladb#15944
* github.com:scylladb/scylladb:
raft topology: fix indentation
raft topology: join: try sending the response only once
raft topology: join: do not time out waiting for the node to be joined
group 0: group0_handshaker: add the abort_source parameter to post_server_start
Since the CentOS 7 default kernel is too old, has performance issues,
and also has some bugs, the kernel-ml kernel has been recommended to us.
Let's check the kernel version in scylla_setup and print a warning if
the kernel is the CentOS 7 default one.
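An illustrative sketch of such a check (the real code in scylla_setup may
match the kernel differently; CentOS 7 stock kernels look like 3.10.0-*.el7.*):
```python
import platform

def warn_if_centos7_default_kernel():
    release = platform.release()  # e.g. "3.10.0-1160.el7.x86_64"
    if release.startswith("3.10.0") and ".el7." in release:
        print("WARNING: you are running the CentOS 7 default kernel, "
              "which is too old and has known performance issues; "
              "the kernel-ml kernel is recommended.")
```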
Related: #7365
Closes scylladb/scylladb#15705
Remove `fall_back_to_syn_msg`, which is not necessary in newer Scylla versions.
Fix the calculation of `nodes_down`, which could count a single node multiple times.
Make shadow round mandatory during bootstrap and replace -- these operations are unsafe to do without checking features first, which are obtained during the shadow round (outside raft-topology mode).
Finally, during node restart, allow the shadow round to be skipped when getting `timeout_error`s from contact points, not only when getting `closed_error`s (during restart it's best-effort anyway, and in general it's impossible to distinguish between a dead node and a partitioned node).
More details in commit messages.
Ref: https://github.com/scylladb/scylladb/issues/15675
Closes scylladb/scylladb#15941
* github.com:scylladb/scylladb:
gossiper: do_shadow_round: increment `nodes_down` in case of timeout
gossiper: do_shadow_round: fix `nodes_down` calculation
storage_service: make shadow round mandatory during bootstrap/replace
gossiper: do_shadow_round: remove default value for nodes param
gossiper: do_shadow_round: remove `fall_back_to_syn_msg`
Currently, it is started/stopped in the streaming/maintenance sg, which
is what the API itself runs in.
Starting the native transport in the streaming sg will lead to severely
degraded performance, as the streaming sg has significantly fewer
CPU/disk shares and reader concurrency semaphore resources.
Furthermore, it will lead to multi-paged reads possibly switching
between scheduling groups mid-way, triggering an internal error.
To fix, use `with_scheduling_group()` for both starting and stopping
the native transport. Technically, it is only strictly necessary for
starting, but I added it to stop as well for consistency.
Also apply the same treatment to RPC (Thrift). Although no one uses it,
best to fix it, just to be on the safe side.
I think we need a more systematic approach for solving this once and for
all, like passing the scheduling group to the protocol server and having
it switch to it internally. This would allow the server to always run on
the correct scheduling group, without depending on the caller to
remember to use it. However, I think this is best done in a follow-up,
to keep this critical patch small and easily backportable.
Fixes: #15485
Closes scylladb/scylladb#16019
This commit updates the Repair-Based Node
Operations page. In particular:
- Information is added about RBNO being enabled
for all node operations (before 5.4, RBNO was
enabled only for the replace operation, while it
was experimental for the others).
- The content is rewritten to remove redundant
information about previous versions.
The improvement is part of the 5.4 release.
This commit must be backported to branch-5.4
Closes scylladb/scylladb#16015
A recent seastar update included RPC metrics (scylladb/seastar#1753). The
reported metrics group sockets together based on their "metrics_domain"
configuration option. This patch makes use of this domain to make scylla
metrics sane.
The domain as this patch defines it includes two strings:
First, the datacenter the server lives in. This is because grouping
metrics for connections to different datacenters makes little sense for
several reasons. For example -- packet delays _will_ differ for local-DC
vs cross-DC traffic and mixing those latencies together is pointless.
Another example -- the amount of traffic may also differ for local- vs
cross-DC connections, e.g. because of different usage of encryption and/or
compression.
Second, each verb-idx gets its own domain. That's to be able to tell apart
e.g. query-related traffic from gossiper traffic. For that, the existing
isolation cookie is taken as is.
Note that the metrics are _not_ per-server node. So e.g. two gossiper
connections to two different nodes (in one DC) will belong to the same
domain and thus their stats will be summed when reported.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#15785
to have feature parity with `configure.py`. we won't need this
once we migrate to C++20 modules. but before that day comes, we
need to stick with C++ headers.
we generate a rule for each .hh file to create a corresponding
.cc and then compile it, in order to verify the self-containedness of
that header. since the number of rules is quite large, to avoid the
unnecessary overhead the check-header target is enabled only if the
`Scylla_CHECK_HEADERS` option is enabled.
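the real implementation is CMake, but the idea can be sketched in a few lines
of Python: for every header, emit a stub .cc that includes nothing but that
header, so compiling the stub proves the header is self-contained (names here
are illustrative):
```python
from pathlib import Path

def write_check_stub(header: Path, build_dir: Path) -> Path:
    # e.g. utils/foo.hh -> <build>/foo.hh.cc containing only the #include
    stub = build_dir / (header.name + ".cc")
    stub.write_text(f'#include "{header}"\n')
    return stub
```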
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#15913
There are two tests, test_read_all and test_read_with_partition_row_limits,
which assert, on every page as well as at the end, that there are no misses
whatsoever. This is incorrect, because it is possible that on a given page
not all shards participate, and thus there won't be a saved reader on every
shard. On the subsequent page, a shard without a reader may produce a miss.
This is fine. Refine the asserts to check that we have only as many misses
as there are shards without readers on them.
Fixes: https://github.com/scylladb/scylladb/issues/14087
Closes scylladb/scylladb#15806
* github.com:scylladb/scylladb:
test/boost/multishard_mutation_query_test: fix querier cache misses expectations
test/lib/test_utils: add require_* variants for all comparators
The polling loop was intended to ignore
`condition_variable_timed_out` and check for progress
using a longer `max_idle_duration` timeout in the loop.
Fixes #15669
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closes scylladb/scylladb#15671
When a node tries to join the cluster, it asks the topology
coordinator to add it and then waits for the response.
In the previous commit, by removing the timeout, we made the operator
responsible for shutting down the joining node if the topology
coordinator fails to deliver a response. In this commit, we
adjust the topology coordinator. We make it try sending the
response (both acceptance and rejection) only once since we do not
care if it fails anymore. We only need to ensure that the joining
node is moved to the left state if sending fails.
When a node tries to join the cluster, it asks the topology
coordinator to add it and then waits for the response. The
response is not guaranteed to come back. If the topology
coordinator cannot contact the joining node, it moves the node to
the left state and moves on.
Currently, to handle the case when the response does not come back,
the joining node gives up waiting for it after 3 minutes. However,
it might take more time for the topology coordinator to start
handling the request to join, as it might be working on other tasks
like adding other nodes, performing tablet migrations, etc. In
general, any timeout duration would be unreliable.
Therefore, we get rid of the timeout. From now on, the operator
will be responsible for shutting down the node if the topology
coordinator fails to deliver the rejection.
This change additionally fixes the TODO in
raft_group0::join_group0.
This commit updates cqlsh's Python compatibility
to Python 3.
In addition it:
- Replaces "Cassandra" with "ScyllaDB" in
the description of cqlsh.
The previous description was outdated, as
we can no longer talk about using the cqlsh
released with Cassandra.
- Replaces occurrences of "Scylla" with "ScyllaDB".
- Adds additional locations of cqlsh (Docker Hub
and PyPI), as well as the link to the scylla-cqlsh
repository.
Closes scylladb/scylladb#16016
After 146e49d0dd (Rewrap keyspace population loop) the datadir layout is no longer needed by the sstables boot-time loader, and directories can finally be omitted for S3-backed keyspaces. Tables of such a keyspace don't touch/remove their datadirs either (snapshots still don't work for S3).
Fixes: #13020
Closes scylladb/scylladb#16007
* github.com:scylladb/scylladb:
test/object_store: Check that keyspace directory doesn't appear
sstables/storage: Do storage init/destroy based on storage options
replica/{ks|cf}: Move storage init/destroy to sstables manager
database: Add get_sstables_manager(bool_class is_system) method
This reverts commit 7c7baf71d5.
If `stop_gracefully` times out during the test teardown phase, it crashes
the test framework, reporting multiple errors, for example:
```
12:35:52 /jenkins/workspace/scylla-master/next/scylla/test/pylib/artifact_registry.py:41: RuntimeWarning: coroutine 'PythonTestSuite.get_cluster_factory.<locals>.create_cluster.<locals>.stop' was never awaited
12:35:52 self.exit_artifacts = {}
12:35:52 RuntimeWarning: Enable tracemalloc to get the object allocation traceback
12:35:52 Stopping server ScyllaServer(470, 127.126.216.43, a577f4c7-d5e6-4bdb-8d37-727c11e2a8df) gracefully took longer than 60s
12:35:52 Traceback (most recent call last):
12:35:52 File "/usr/lib64/python3.11/asyncio/tasks.py", line 500, in wait_for
12:35:52 return fut.result()
12:35:52 ^^^^^^^^^^^^
12:35:52 File "/usr/lib64/python3.11/asyncio/subprocess.py", line 137, in wait
12:35:52 return await self._transport._wait()
12:35:52 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
12:35:52 File "/usr/lib64/python3.11/asyncio/base_subprocess.py", line 230, in _wait
12:35:52 return await waiter
12:35:52 ^^^^^^^^^^^^
12:35:52 asyncio.exceptions.CancelledError
12:35:52
12:35:52 The above exception was the direct cause of the following exception:
12:35:52
12:35:52 Traceback (most recent call last):
12:35:52 File "/jenkins/workspace/scylla-master/next/scylla/test/pylib/scylla_cluster.py", line 521, in stop_gracefully
12:35:52 await asyncio.wait_for(wait_task, timeout=STOP_TIMEOUT_SECONDS)
12:35:52 File "/usr/lib64/python3.11/asyncio/tasks.py", line 502, in wait_for
12:35:52 raise exceptions.TimeoutError() from exc
12:35:52 TimeoutError
12:35:52
12:35:52 During handling of the above exception, another exception occurred:
12:35:52
12:35:52 Traceback (most recent call last):
12:35:52 File "/jenkins/workspace/scylla-master/next/scylla/./test.py", line 1615, in workaround_python26789
12:35:52 code = await main()
12:35:52 ^^^^^^^^^^^^
12:35:52 File "/jenkins/workspace/scylla-master/next/scylla/./test.py", line 1582, in main
12:35:52 await run_all_tests(signaled, options)
12:35:52 File "/jenkins/workspace/scylla-master/next/scylla/./test.py", line 1359, in run_all_tests
12:35:52 await reap(done, pending, signaled)
12:35:52 File "/jenkins/workspace/scylla-master/next/scylla/./test.py", line 1342, in reap
12:35:52 result = coro.result()
12:35:52 ^^^^^^^^^^^^^
12:35:52 File "/jenkins/workspace/scylla-master/next/scylla/./test.py", line 201, in run
12:35:52 await test.run(options)
12:35:52 File "/jenkins/workspace/scylla-master/next/scylla/./test.py", line 957, in run
12:35:52 async with get_cluster_manager(self.mode + '/' + self.uname, self.suite.clusters, test_path) as manager:
12:35:52 File "/usr/lib64/python3.11/contextlib.py", line 211, in __aexit__
12:35:52 await anext(self.gen)
12:35:52 File "/jenkins/workspace/scylla-master/next/scylla/test/pylib/scylla_cluster.py", line 1330, in get_cluster_manager
12:35:52 await manager.stop()
12:35:52 File "/jenkins/workspace/scylla-master/next/scylla/test/pylib/scylla_cluster.py", line 1024, in stop
12:35:52 await self.clusters.put(self.cluster, is_dirty=True)
12:35:52 File "/jenkins/workspace/scylla-master/next/scylla/test/pylib/pool.py", line 104, in put
12:35:52 await self.destroy(obj)
12:35:52 File "/jenkins/workspace/scylla-master/next/scylla/./test.py", line 368, in recycle_cluster
12:35:52 await cluster.stop_gracefully()
12:35:52 File "/jenkins/workspace/scylla-master/next/scylla/test/pylib/scylla_cluster.py", line 689, in stop_gracefully
12:35:52 await asyncio.gather(*(server.stop_gracefully() for server in self.running.values()))
12:35:52 File "/jenkins/workspace/scylla-master/next/scylla/test/pylib/scylla_cluster.py", line 527, in stop_gracefully
12:35:52 raise RuntimeError(
12:35:52 RuntimeError: Stopping server ScyllaServer(470, 127.126.216.43, a577f4c7-d5e6-4bdb-8d37-727c11e2a8df) gracefully took longer than 60s
12:35:58 sys:1: RuntimeWarning: coroutine 'PythonTestSuite.get_cluster_factory.<locals>.create_cluster.<locals>.uninstall' was never awaited
12:35:58 sys:1: RuntimeWarning: coroutine 'PythonTestSuite.get_cluster_factory.<locals>.create_cluster.<locals>.stop' was never awaited
```
The test for the rollback relies on the log message being there after the
operation fails, but if the node's state is changed before the message is
logged, the operation may fail before the log is printed.
Fixes scylladb/scylladb#15980
Message-ID: <ZUuwoq65SJcS+yTH@scylladb.com>
This PR is a follow-up to https://github.com/scylladb/scylladb/pull/15742#issuecomment-1766888218.
It adds CQL Reference for Materialized Views to the Materialized Views page.
In addition, it removes the irrelevant information about when the feature was added and replaces "Scylla" with "ScyllaDB".
(nobackport)
Closes scylladb/scylladb#15855
* github.com:scylladb/scylladb:
doc: remove versions from Materialized Views
doc: add CQL Reference for Materialized Views
This reverts commit 2860d43309, reversing
changes made to a3621dbd3e.
Reverting because rest_api.test_compaction_task started failing after
this was merged.
Fixes: #16005
The USE statement execution code can throw if the keyspace
doesn't exist. This can be a problem for code that calls
execute in a fiber, since the exception will break the fiber even
if `then_wrapped` is used.
Fixes #14449
Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
Closes scylladb/scylladb#14394
When creating an S3-backed keyspace, its storage dir shouldn't be made.
It also shouldn't be "resurrected" by the boot-time loader of existing
keyspaces.
For extra confidence, check that the system keyspace's directory does
exist where the test expects keyspaces' directories to appear.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
It's only the local storage type that needs directories touched/removed;
S3 storage initialization is for now a no-op - maybe some day soon it
will appear.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
It's the manager that knows about storages, so it should init/destroy
them. Also, the "upload" and "staging" paths are about to be hidden in
sstables/ code; this code move facilitates that too.
The indentation in storage.cc is deliberately broken to make the next
patch look nicer (spoiler: it won't have to shift those lines right).
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
There's one place that does this selection, and soon there will be
another, so it's worth having a convenience helper getter.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Add a space after each colon and comma (if they don't have one already) in values of table options that are JSON objects (`caching`, `tombstone_gc` and `cdc`).
This improves readability and matches the client-side describe format.
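The transformation itself is trivial - illustrated here in Python (the
server-side code is C++; this is only a sketch of the intended output format):
```python
import re

def add_spaces(option_value: str) -> str:
    # Insert a space after every ':' and ',' not already followed by one.
    return re.sub(r"([:,])(?!\s)", r"\1 ", option_value)

assert add_spaces("{'keys':'ALL','rows_per_partition':'ALL'}") \
       == "{'keys': 'ALL', 'rows_per_partition': 'ALL'}"
```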
Fixes: #14895
Closes scylladb/scylladb#15900
* github.com:scylladb/scylladb:
cql-pytest:test_describe: add test for whitespaces in json objects
schema: add whitespace to description of table options
This commit fixes the information about
Raft-based consistent cluster management
in the 5.2-to-5.4 upgrade guide.
This is a follow-up to https://github.com/scylladb/scylladb/pull/15880 and must be backported to branch-5.4.
In addition, it adds information about removing
DateTieredCompactionStrategy to the 5.2-to-5.4
upgrade guide, including the guideline to
migrate to TimeWindowCompactionStrategy.
Closes scylladb/scylladb#15988
`system.raft` was using the "user memory pool", i.e. the
`dirty_memory_manager` for this table was set to
`database::_dirty_memory_manager` (instead of
`database::_system_dirty_memory_manager`).
This meant that if a write workload caused memory pressure on the user
memory pool, internal `system.raft` writes would have to wait for
memtables of user tables to get flushed before the write would proceed.
This was observed in SCT longevity tests which ran a heavy workload on
the cluster and concurrently, schema changes (which underneath use the
`system.raft` table). Raft would often get stuck waiting many seconds
for user memtables to get flushed. More details in issue #15622.
Experiments showed that moving Raft to system memory fixed this
particular issue, bringing the waits to reasonable levels.
Currently `system.raft` stores only one group, group 0, which is
internally used for cluster metadata operations (schema and topology
changes) -- so it makes sense to keep using system memory.
In the future we'd like to have other groups, for strongly consistent
tables. These groups should use the user memory pool. It means we won't
be able to use `system.raft` for them -- we'll just have to use a
separate table.
Fixes: scylladb/scylladb#15622
Closes scylladb/scylladb#15972
This PR implements the following new nodetool commands:
* snapshot
* drain
* flush
* disableautocompaction
* enableautocompaction
All commands come with tests and all tests pass with both the new and the current nodetool implementations.
Refs: https://github.com/scylladb/scylladb/issues/15588
Closes scylladb/scylladb#15939
* github.com:scylladb/scylladb:
test/nodetool: add README.md
tools/scylla-nodetool: implement enableautocompaction command
tools/scylla-nodetool: implement disableautocompaction command
tools/scylla-nodetool: implement the flush command
tools/scylla-nodetool: extract keyspace/table parsing
tools/scylla-nodetool: implement the drain command
tools/scylla-nodetool: implement the snapshot command
test/nodetool: add support for matching approximate query parameters
utils/http: make dns_connection_factory::initialize() static
This is a translation of Cassandra's CQL unit test source file
validation/operations/CreateTest.java into our cql-pytest framework.
The 15 tests did not reproduce any previously-unknown bug, but did provide
additional reproducers for several known issues:
Refs #6442: Always print all schema parameters (including default values)
Refs #8001: Documented unit "µs" not supported for assigning a duration
type.
Refs #8892: Add an option for default RF for new keyspaces.
Refs #8948: Cassandra 3.11.10 uses "class" instead of "sstable_compression"
for compression settings by default
Unfortunately, I also had to comment out - and not translate - several
tests which weren't real "CQL tests" (tests that use only the CQL driver),
and instead relied on Cassandra's Java implementation details:
1. Tests for CREATE TRIGGER were commented out because testing them
in Cassandra requires adding a Java class for the test. We're also
not likely to ever add this feature to Scylla (Refs #2205).
2. Similarly, tests for CEP-11 (Pluggable memtable implementations)
used internal Java APIs instead of CQL, and it is also unlikely that
we'll ever implement it in a way compatible with Cassandra because
of its Java reliance.
3. One test for data center names used internal Cassandra Java APIs, not
CQL, to create mock data centers and snitches.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#15791
Compaction tasks which do not have a parent are abortable
through the task manager. Their children are aborted recursively.
Compaction tasks of the lowest level are aborted using the existing
compaction task executors' stopping mechanism.
Closes scylladb/scylladb#15083
* github.com:scylladb/scylladb:
test: test abort of compaction task that isn't started yet
test: test running compaction task abort
tasks: fail if a task was aborted
compaction: abort task manager compaction tasks
Unlike yum, "apt-get install" may fail because the package cache is outdated.
Let's check the package cache mtime and run "apt-get update" if it's too old.
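A sketch of the intended logic (the path and the staleness threshold are
illustrative; the real scylla_setup code may differ):
```python
import os
import subprocess
import time

APT_CACHE = "/var/cache/apt/pkgcache.bin"
MAX_AGE_SECONDS = 24 * 3600  # treat the cache as stale after one day

def apt_update_if_stale():
    try:
        age = time.time() - os.path.getmtime(APT_CACHE)
    except FileNotFoundError:
        age = float("inf")  # no cache yet: definitely refresh
    if age > MAX_AGE_SECONDS:
        subprocess.check_call(["apt-get", "update"])
```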
Fixes #4059
Closes scylladb/scylladb#15960
When running on a particularly slow setup, for example on
an ARM machine in debug mode, the execution time of even
a small Lua UDF that we're using in tests may exceed our
default limits.
To avoid timeout errors, the limit in tests is now increased
to a value that won't be exceeded in any reasonable scenario
(for the current set of tested UDFs), while not making the
test take an excessive amount of time in case of an error in
the UDF execution.
Fixes #15977
Closes scylladb/scylladb#15983
When the topology coordinator tries to fence the previous coordinator, it
performs a group0 operation. The current topology coordinator might be
aborted in the meantime, which will result in a `raft::request_aborted`
exception being thrown. After the fix to scylladb/scylladb#15728 was
merged, the exception is caught, but then `sleep_abortable` is called
which immediately throws `abort_requested_exception` as it uses the same
abort source as the group0 operation. The `fence_previous_coordinator`
function, which does all those things, is not supposed to throw
exceptions; if it does, it causes `raft_state_monitor_fiber` to exit,
completely disabling the topology coordinator functionality on that
node.
Modify the code in the following way:
- Catch `abort_requested_exception` thrown from `sleep_abortable` and
exit the function if it happens. In addition to the described issue,
it will also handle the case when abort is requested while
`sleep_abortable` is sleeping,
- Catch `raft::request_aborted` thrown from group0 operation, log the
exception with lower verbosity and exit the function explicitly.
Finally, wrap both `fence_previous_coordinator` and `run` functions in a
`try` block with `on_fatal_internal_error` in the catch handler in order
to implement the behavior that adding `noexcept` was originally supposed
to introduce.
Fixes: scylladb/scylladb#15747
Closes scylladb/scylladb#15948
* github.com:scylladb/scylladb:
raft topology: catch and abort on exceptions from topology_coordinator::run
Revert "storage_service: raft topology: mark topology_coordinator::run function as noexcept"
raft topology: don't print an error when fencing previous coordinator is aborted
raft topology: handle abort exceptions from sleeping in fence_previous_coordinator
this series tries to:
1. render options with a role, so the options can be cross-referenced and defined.
2. move the formatting out of the content, so the representation can be defined in a more flexible way.
Closes scylladb/scylladb#15860
* github.com:scylladb/scylladb:
docs: add divider using CSS
docs: extract _clean_description as a filter
docs: render option with role
docs: parse source files right into rst
Having to extract 1 keyspace and N tables from the command-line is
proving to be a common pattern among commands. Extract this into a
method, so the boilerplate can be shared. Add a forward-looking
overload as well, which will be used in the next patch.