issue 9065: add doc for node-agent prepare queue length

Signed-off-by: Lyndon-Li <lyonghui@vmware.com>
This commit is contained in:
Lyndon-Li
2025-07-28 14:11:46 +08:00
parent fb6ff2aa66
commit 1cd2a228ad
4 changed files with 60 additions and 5 deletions


@@ -290,7 +290,9 @@ In which manner the `DataUpload`/`DataDownload` CRs are processed is totally dec
For Velero built-in data mover, it uses Kubernetes' scheduler to mount the snapshot volume/restore volume associated with a `DataUpload`/`DataDownload` CR to a specific node, and then the `DataUpload`/`DataDownload` controller (in the node-agent daemonset) on that node handles the `DataUpload`/`DataDownload`.
By default, a `DataUpload`/`DataDownload` controller in one node handles one request at a time. You can configure more parallelism per node by [node-agent Concurrency Configuration][14].
That is to say, if the snapshot volumes/restore volumes are spread across different nodes, their associated `DataUpload`/`DataDownload` CRs are processed in parallel; whereas for the snapshot volumes/restore volumes on the same node, their associated `DataUpload`/`DataDownload` CRs are processed sequentially by default, or concurrently according to your [node-agent Concurrency Configuration][14].
The prepare process of mounting the snapshot volume/restore volume may generate multiple intermediate objects; to control the number of these intermediate objects, you can configure the [node-agent Prepare Queue Length][20].
You can check which node the `DataUpload`/`DataDownload` CRs are processed on, and their parallelism, by watching the `DataUpload`/`DataDownload` CRs:
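For instance, a minimal sketch (the exact printer columns shown vary by Velero version):
```
# Watch DataUpload CRs during a backup; the output includes the node handling each CR
kubectl -n velero get datauploads -w

# Watch DataDownload CRs during a restore
kubectl -n velero get datadownloads -w
```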
@@ -404,4 +406,5 @@ Sometimes, `RestorePVC` needs to be configured to increase the performance of re
[17]: backup-repository-configuration.md
[18]: https://github.com/vmware-tanzu/velero/pull/7576
[19]: data-movement-restore-pvc-configuration.md
[20]: node-agent-prepare-queue-length.md


@@ -3,9 +3,11 @@ title: "Data Movement Pod Resource Configuration"
layout: docs
---
During [CSI Snapshot Data Movement][1], Velero built-in data mover launches data mover pods to run the data transfer.
During [fs-backup][2], Velero also launches data mover pods to run the data transfer.
The data transfer is a time- and resource-consuming activity.
Velero by default uses the [BestEffort QoS][3] for the data mover pods, which guarantees the best performance of the data movement activities. On the other hand, it may consume a lot of cluster resources, i.e., CPU and memory; how much is consumed is decided by the concurrency and the scale of the data to be moved.
If the cluster nodes don't have sufficient resources, Velero also allows you to customize the resources for the data mover pods.
Note: If fewer resources are assigned to the data mover pods, the data movement activities may take longer; or the data mover pods may be OOM killed if the assigned memory doesn't meet the requirements. Consequently, the `DataUpload`/`DataDownload` may run longer or fail.
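As a reference point, a minimal sketch of such a customization in the node-agent configMap, using the `podResources` fields from the data-movement pod resource configuration (the values here are placeholders, not recommendations):
```json
{
    "podResources": {
        "cpuRequest": "1000m",
        "cpuLimit": "1000m",
        "memoryRequest": "512Mi",
        "memoryLimit": "1Gi"
    }
}
```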
@@ -52,5 +54,6 @@ spec:
```
[1]: csi-snapshot-data-movement.md
[2]: file-system-backup.md
[3]: https://kubernetes.io/docs/concepts/workloads/pods/pod-qos/
[4]: performance-guidance.md


@@ -0,0 +1,48 @@
---
title: "Node-agent Prepare Queue Length"
layout: docs
---
During [CSI Snapshot Data Movement][1], Velero built-in data mover launches data mover pods to run the data transfer.
During [fs-backup][2], Velero also launches data mover pods to run the data transfer.
Other intermediate resources may also be created along with the data mover pods, e.g., PVCs, VolumeSnapshots, VolumeSnapshotContents, etc.
Velero uses the [node-agent Concurrency Configuration][3] to control the number of concurrent data transfer activities across the nodes; by default, the concurrency is 1 per node.
When the parallelism across the available nodes is much lower than the total number of volumes to be backed up/restored, the intermediate objects may exist for a much longer time than necessary, which takes unnecessary resources from the cluster. For example, with 3 available nodes at the default concurrency of 1, only 3 CRs are processed at a time, so the intermediate objects for all the remaining volumes could otherwise pile up while waiting.
The available nodes are decided by various factors, e.g., node OS type (Linux or Windows), [Node Selection][4] (for CSI Snapshot Data Movement only), etc.
Velero allows you to configure `prepareQueueLength` in the node-agent configuration, which defines the maximum number of `DataUpload`/`DataDownload`/`PodVolumeBackup`/`PodVolumeRestore` CRs that are in the preparation statuses (e.g., the `Accepted` and `Prepared` phases) but not yet processed by any node. In this way, the number of intermediate objects is constrained.
### Sample
Here is a sample of the configMap with ```prepareQueueLength```:
```json
{
"prepareQueueLength": 10
}
```
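Since ```prepareQueueLength``` is just another key in the node-agent configMap, it can sit alongside other settings; for example, assuming you also set a global concurrency via the [node-agent Concurrency Configuration][3] (the `loadConcurrency` key below is taken from that configuration and is an assumption here), a combined configMap might look like:
```json
{
    "loadConcurrency": {
        "globalConfig": 2
    },
    "prepareQueueLength": 10
}
```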
To create the configMap, save something like the above sample to a json file and then run the command below:
```
kubectl create cm node-agent-config -n velero --from-file=<json file name>
```
To provide the configMap to node-agent, edit the node-agent daemonset and add the ```- --node-agent-config``` argument to the spec:
1. Open the node-agent daemonset spec
```
kubectl edit ds node-agent -n velero
```
2. Add ```- --node-agent-config``` to ```spec.template.spec.containers```
```
spec:
template:
spec:
containers:
- args:
- --node-agent-config=<configMap name>
```
[1]: csi-snapshot-data-movement.md
[2]: file-system-backup.md
[3]: node-agent-concurrency.md
[4]: data-movement-node-selection.md