mirror of
https://github.com/scylladb/scylladb.git
synced 2026-05-12 19:02:12 +00:00
Docs: Describe driver issue with tablet RF increase
The current protocol extension that sends tablet info to drivers only does so if the driver selects a non-replica coordinator for a routable request. This works well if a node on the replica list is replaced by another node, or if some replicas are removed from the list. The driver will at some point send a request to a stale replica and receive the new list in response.

The issue is with extending the list with new replicas. In that case the old replicas are all still correct, so the driver will not select any wrong replica and will not receive the new list. As far as I know, the only scenario where this could happen is an RF increase.

It could be worked around to some degree in the drivers, but that would add significant complexity (definitely more than any other invalidations we introduced) while still not being an ideal solution. This scenario should be rare enough, and the consequences of not handling it minor enough (new replicas not being used as coordinators), that it does not warrant a driver-side solution.

Instead, this commit adds information about this to the documentation, advising users to restart applications after replica lists are extended.

It is worth noting that if the new tablet feedback protocol extension is implemented, this problem goes away. See issue #21664.

Closes scylladb/scylladb#23447
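The behaviour described in the commit message can be illustrated with a small toy model. This is a sketch only, not real driver or server code: the `Server` and `Driver` classes, the round-robin coordinator choice, and the node names `n1`..`n3` are all hypothetical. The only rule taken from the commit message is that tablet info is returned when the driver picks a coordinator that is not a replica.

```python
# Toy model of tablet-info feedback for a single tablet.
# All class and function names are hypothetical; this is not a real driver API.

class Server:
    def __init__(self, replicas):
        # Authoritative replica set for the tablet.
        self.replicas = set(replicas)

    def handle(self, coordinator):
        # Per the commit message: the replica list is sent back only when
        # the driver chose a coordinator that is not a replica.
        if coordinator not in self.replicas:
            return sorted(self.replicas)
        return None


class Driver:
    def __init__(self, replicas):
        self.cached = list(replicas)  # driver-side view of the replica list
        self._next = 0

    def send(self, server):
        # Assumption: simple round-robin over the cached replicas.
        coordinator = self.cached[self._next % len(self.cached)]
        self._next += 1
        fresh = server.handle(coordinator)
        if fresh is not None:
            self.cached = fresh  # stale entry detected, cache refreshed
        return coordinator


def coordinators_used(driver, server, requests=12):
    return {driver.send(server) for _ in range(requests)}


# Case 1: n2 is replaced by n3. The driver eventually hits stale n2
# and receives the new list.
replace_server = Server(["n1", "n2"])
replace_driver = Driver(["n1", "n2"])
replace_server.replicas = {"n1", "n3"}
used_after_replace = coordinators_used(replace_driver, replace_server)

# Case 2: RF increase adds n3. Every cached replica is still valid,
# so the server never sends the new list.
rf_server = Server(["n1", "n2"])
rf_driver = Driver(["n1", "n2"])
rf_server.replicas = {"n1", "n2", "n3"}
used_after_rf_increase = coordinators_used(rf_driver, rf_server)

print("after replace:", replace_driver.cached)
print("after RF increase:", rf_driver.cached)
```

In this model the replacement case converges to the new replica list, while the RF-increase case keeps the old two-node list indefinitely. Recreating the `Driver` object plays the role of the application restart that the documentation change recommends.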
committed by Tomasz Grabiec
parent cf11d5eb69
commit df64985a4e
@@ -302,6 +302,7 @@ Modifying a keyspace with tablets enabled is possible and doesn't require any sp
 - The ``ALTER`` statement may take longer than the regular query timeout, and even if it times out, it will continue to execute in the background.
 - The replication strategy cannot be modified, as keyspaces with tablets only support ``NetworkTopologyStrategy``.
 - The ``ALTER`` statement will fail if it would make the keyspace :term:`RF-rack-invalid <RF-rack-valid keyspace>`.
+- After the ``ALTER`` statement that increases the RF finishes, client applications should be restarted. Without a restart, drivers will not know about new replicas, which may cause request imbalance.

 .. _drop-keyspace-statement:

@@ -9,13 +9,16 @@ How to Safely Increase the Replication Factor
 **Audience: ScyllaDB administrators**


-Issue
------
+Issues
+------

 When a Replication Factor (RF) is increased, using the :ref:`ALTER KEYSPACE <alter-keyspace-statement>` command, the data consistency is effectively dropped
 by the difference of the RF_new value and the RF_old value for all pre-existing data.
 Consistency will only be restored after running a repair.

+Another issue occurs in keyspaces with tablets enabled and is driver-related. Due to limitations in the current protocol used to pass tablet data to drivers, drivers will not pick
+up new replicas after replication factor is increased. This will cause them to avoid routing requests to those replicas, causing imbalance.
+
 Resolution
 ----------

@@ -27,6 +30,8 @@ As a result, in order to make sure that you can keep on reading the old data wit

 After you run a repair, you can decrease the CL. If RF has only been changed in a particular Data Center (DC) only the nodes in that DC have to be repaired.

+To resolve the driver-related issue, restart the client applications after the ALTER statement that changes the RF completes successfully.
+
 Example
 =======
