Mirror of https://github.com/scylladb/scylladb.git (synced 2026-05-12 19:02:12 +00:00)
doc: add the tablet limitation to the manual recovery procedure
This commit adds the information that the manual recovery procedure is not supported if tablets are enabled.

In addition, the content in the Manual Recovery Procedure is reorganized by adding the Prerequisites and Procedure subsections - in this way, we can limit the number of Note and Warning boxes that made the page hard to follow.

Fixes https://github.com/scylladb/scylladb/issues/18895

Closes scylladb/scylladb#18935
Committed by: Kamil Braun
Parent: 2bfdb1b583
Commit: cfa3cd4c94
@@ -78,16 +78,10 @@ You can follow the manual recovery procedure when:
 **irrecoverable** nodes. If possible, restart your nodes, and use the manual
 recovery procedure as a last resort.

-.. note::
+.. warning::

-Before proceeding, make sure that the irrecoverable nodes are truly dead, and not,
-for example, temporarily partitioned away due to a network failure. If it is
-possible for the 'dead' nodes to come back to life, they might communicate and
-interfere with the recovery procedure and cause unpredictable problems.
-
-If you have no means of ensuring that these irrecoverable nodes won't come back
-to life and communicate with the rest of the cluster, setup firewall rules or otherwise
-isolate your alive nodes to reject any communication attempts from these dead nodes.
+The manual recovery procedure is not supported :doc:`if tablets are enabled on any of your keyspaces </architecture/tablets/>`.
+In such a case, you need to :doc:`restore from backup </operating-scylla/procedures/backup-restore/restore>`.

 During the manual recovery procedure you'll enter a special ``RECOVERY`` mode, remove
 all faulty nodes (using the standard :doc:`node removal procedure </operating-scylla/procedures/cluster-management/remove-node/>`),
@@ -97,15 +91,26 @@ perform the Raft upgrade procedure again, initializing the Raft algorithm from s
 The manual recovery procedure is applicable both to clusters that were not running Raft
 in the past and then had Raft enabled, and to clusters that were bootstrapped using Raft.

-.. note::
+**Prerequisites**

-Entering ``RECOVERY`` mode requires a node restart. Restarting an additional node while
-some nodes are already dead may lead to unavailability of data queries (assuming that
-you haven't lost it already). For example, if you're using the standard RF=3,
-CL=QUORUM setup, and you're recovering from a stuck of upgrade procedure because one
-of your nodes is dead, restarting another node will cause temporary data query
-unavailability (until the node finishes restarting). Prepare your service for
-downtime before proceeding.
+* Before proceeding, make sure that the irrecoverable nodes are truly dead, and not,
+for example, temporarily partitioned away due to a network failure. If it is
+possible for the 'dead' nodes to come back to life, they might communicate and
+interfere with the recovery procedure and cause unpredictable problems.
+
+If you have no means of ensuring that these irrecoverable nodes won't come back
+to life and communicate with the rest of the cluster, setup firewall rules or otherwise
+isolate your alive nodes to reject any communication attempts from these dead nodes.
+
+* Prepare your service for downtime before proceeding.
+Entering ``RECOVERY`` mode requires a node restart. Restarting an additional node while
+some nodes are already dead may lead to unavailability of data queries (assuming that
+you haven't lost it already). For example, if you're using the standard RF=3,
+CL=QUORUM setup, and you're recovering from a stuck upgrade procedure because one
+of your nodes is dead, restarting another node will cause temporary data query
+unavailability (until the node finishes restarting).
+
+**Procedure**

 #. Perform the following query on **every alive node** in the cluster, using e.g. ``cqlsh``:

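Note on the RF=3, CL=QUORUM example in the Prerequisites text above: the arithmetic behind the warning can be sketched in a few lines of Python. This is an illustration only, not part of the commit or of ScyllaDB. The function names ``quorum`` and ``quorum_available`` are hypothetical, and the sketch assumes that both the dead node and the restarting node hold replicas of the queried data (as is the case in a three-node cluster with RF=3).

```python
def quorum(rf: int) -> int:
    """Replicas that must respond at CL=QUORUM: floor(RF / 2) + 1."""
    return rf // 2 + 1


def quorum_available(rf: int, unavailable_replicas: int) -> bool:
    """True if the remaining replicas can still satisfy CL=QUORUM.

    Assumption: every unavailable node counted here is a replica
    of the data being queried.
    """
    return rf - unavailable_replicas >= quorum(rf)


if __name__ == "__main__":
    rf = 3
    print(quorum(rf))               # 2 replicas are required
    print(quorum_available(rf, 1))  # True: one dead node, two replicas remain
    print(quorum_available(rf, 2))  # False: one dead node plus one restarting
```

With one node dead, a QUORUM query still succeeds. Restarting a second replica to enter ``RECOVERY`` mode leaves one replica, which is below the quorum of two, until the restart completes.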