From 2579ef10938273cbf5707d9021e14ec80cdab1bf Mon Sep 17 00:00:00 2001 From: Lyndon-Li Date: Tue, 18 Nov 2025 18:25:41 +0800 Subject: [PATCH 1/3] doc for cache volume Signed-off-by: Lyndon-Li --- changelogs/unreleased/9418-Lyndon-Li | 1 + .../docs/main/csi-snapshot-data-movement.md | 7 ++- .../docs/main/data-movement-cache-volume.md | 46 +++++++++++++++++++ site/content/docs/main/file-system-backup.md | 9 ++-- site/data/docs/main-toc.yml | 8 ++-- 5 files changed, 64 insertions(+), 7 deletions(-) create mode 100644 changelogs/unreleased/9418-Lyndon-Li create mode 100644 site/content/docs/main/data-movement-cache-volume.md diff --git a/changelogs/unreleased/9418-Lyndon-Li b/changelogs/unreleased/9418-Lyndon-Li new file mode 100644 index 000000000..7fa2c380e --- /dev/null +++ b/changelogs/unreleased/9418-Lyndon-Li @@ -0,0 +1 @@ +Fix issue #9276, add doc for cache volume support \ No newline at end of file diff --git a/site/content/docs/main/csi-snapshot-data-movement.md b/site/content/docs/main/csi-snapshot-data-movement.md index 3e96dcd6a..a83cbac49 100644 --- a/site/content/docs/main/csi-snapshot-data-movement.md +++ b/site/content/docs/main/csi-snapshot-data-movement.md @@ -376,7 +376,10 @@ For Velero built-in data mover, Velero uses [BestEffort as the QoS][13] for data If you want to constraint the CPU/memory usage, you need to [Customize Data Mover Pod Resource Limits][11]. The CPU/memory consumption is always related to the scale of data to be backed up/restored, refer to [Performance Guidance][12] for more details, so it is highly recommended that you perform your own testing to find the best resource limits for your data. During the restore, the repository may also cache data/metadata so as to reduce the network footprint and speed up the restore. The repository uses its own policy to store and clean up the cache. -For Kopia repository, the cache is stored in the data mover pod's root file system. 
Velero allows you to configure a limit of the cache size so that the data mover pod won't be evicted due to running out of the ephemeral storage. For more details, check [Backup Repository Configuration][17]. +For the Kopia repository, by default, the cache is stored in the data mover pod's root file system. If your root file system space is limited, the data mover pods may be evicted for running out of ephemeral storage, which causes the restore to fail. To cope with this problem, Velero allows you to: +- configure a limit on the cache size per backup repository; for more details, check [Backup Repository Configuration][17]. +- configure a dedicated volume for cache data; for more details, check [Data Movement Cache Volume][21]. + ### Node Selection @@ -416,4 +419,6 @@ Sometimes, `RestorePVC` needs to be configured to increase the performance of re [18]: https://github.com/vmware-tanzu/velero/pull/7576 [19]: data-movement-restore-pvc-configuration.md [20]: node-agent-prepare-queue-length.md +[21]: data-movement-cache-volume.md + diff --git a/site/content/docs/main/data-movement-cache-volume.md b/site/content/docs/main/data-movement-cache-volume.md new file mode 100644 index 000000000..c23f9efb1 --- /dev/null +++ b/site/content/docs/main/data-movement-cache-volume.md @@ -0,0 +1,46 @@ +--- +title: "Cache PVC Configuration for Data Movement Restore" +layout: docs +--- + +Velero data movement restore (i.e., for CSI snapshot data movement and fs-backup) may request the backup repository to cache data locally so as to reduce the data request from the remote backup storage. +The cache behavior is decided by the specific backup repository, and Velero allows you to configure a cache limit for the backup repositories who support it (i.e., kopia repository). For more details, see [Backup Repository Configuration][1]. +The size of cache may significantly impact on the performance. 
Specifically, if the cache size is too small, the restore throughput will be severely reduced and much more data would be downloaded from the backup storage. +By default, the cache data location is in the data mover pods' root disk. In some environments, the pods's root disk size is very limited, some a large cache size would cause the data mover pods evicted because of running out of ephemeral disk. + +To cope with the problems and guarantee the data mover pods always run with a fine tuned local cache, Velero supports dedicated cache PVCs for data movement restore, for CSI snapshot data movement and fs-backup. + +By default, Velero data mover pods run without cache PVCs. To enable cache PVCs, add the cache PVC configuration to the node-agent ConfigMap. + +A sample cache PVC configuration, as part of the ConfigMap, looks like: +```json +{ + "cachePVC": { + "thresholdInGB": 1, + "storageClass": "sc-wffc" + } +} +``` + +To create the ConfigMap, save content like the above sample to a file and then run the command below: +```shell +kubectl create cm node-agent-config -n velero --from-file= +``` + +`storageClass` is a required field; it tells Velero which storage class to use to provision the cache PVC. Velero relies on Kubernetes dynamic provisioning to provision the PVC; static provisioning is not supported. + +The cache PVC behavior could be further fine tuned through `thresholdInGB`. Its value is compared to the size of the backup; if the backup is smaller than this value, no cache PVC is created when restoring from it. This ensures that cache PVCs are not created needlessly when the backup is small enough to be accommodated in the data mover pods' root disk. + +This configuration decides whether and how to provision cache PVCs, but it doesn't decide their size. Instead, the size is decided by the specific backup repository. 
Specifically, Velero asks a cache limit from the backup repository and uses this limit to calculate the cache PVC size. +The cache limit is decided by the backup repository itself, for Kopia repository, if `cacheLimitMB` is specified in the backup repository configuration, its value will be used; otherwise, a default limit (5 GB) is used. +Then Velero inflates the limit with 20% by considering the non-payload overheads and delay cache cleanup behavior varying on backup repositories. + +Take Kopia repository and the above cache PVC configuration for example: +- When `cacheLimitMB` is not available for the repository, a 6GB cache PVC is created for the backup that is larger than 1GB; otherwise, no cache volume is created +- When `cacheLimitMB` is specified as `10240` for the repository, a 12GB cache PVC is created for the backup that is larger than 1GB; otherwise, no cache volume is created + +To enable both the node-agent ConfigMap and the backup repository ConfigMap, specify the following flags when installing Velero with the CLI: +`velero install --node-agent-configmap= --backup-repository-configmap=` + + +[1]: backup-repository-configuration.md \ No newline at end of file diff --git a/site/content/docs/main/file-system-backup.md b/site/content/docs/main/file-system-backup.md index b4d904f1b..f74571b4c 100644 --- a/site/content/docs/main/file-system-backup.md +++ b/site/content/docs/main/file-system-backup.md @@ -693,7 +693,7 @@ spec: ## Priority Class Configuration -For Velero built-in data mover, data mover pods launched during file system backup will use the priority class name configured in the node-agent configmap. The node-agent daemonset itself gets its priority class from the `--node-agent-priority-class-name` flag during Velero installation. This can help ensure proper scheduling behavior in resource-constrained environments. For more details on configuring data mover pod resources, see [Data Movement Pod Resource Configuration][data-movement-config]. 
+For Velero built-in data mover, data mover pods launched during file system backup will use the priority class name configured in the node-agent configmap. The node-agent daemonset itself gets its priority class from the `--node-agent-priority-class-name` flag during Velero installation. This can help ensure proper scheduling behavior in resource-constrained environments. For more details on configuring data mover pod resources, see [Data Movement Pod Resource Configuration][21]. ## Resource Consumption @@ -712,7 +712,9 @@ totalPreservedMemory = (128M + 24M * numOfCPUCores) * numOfWorkerNodes However, whether and when this limit is reached is related to the data you are backing up/restoring. During the restore, the repository may also cache data/metadata so as to reduce the network footprint and speed up the restore. The repository uses its own policy to store and clean up the cache. -For Kopia repository, the cache is stored in the node-agent pod's root file system. Velero allows you to configure a limit of the cache size so that the node-agent pod won't be evicted due to running out of the ephemeral storage. For more details, check [Backup Repository Configuration][18]. +For the Kopia repository, by default, the cache is stored in the data mover pod's root file system. If your root file system space is limited, the data mover pods may be evicted for running out of ephemeral storage, which causes the restore to fail. To cope with this problem, Velero allows you to: +- configure a limit on the cache size per backup repository; for more details, check [Backup Repository Configuration][18]. +- configure a dedicated volume for cache data; for more details, check [Data Movement Cache Volume][22]. 
## Restic Deprecation @@ -766,4 +768,5 @@ Velero still effectively manage restic repository, though you cannot write any n [18]: backup-repository-configuration.md [19]: node-agent-concurrency.md [20]: node-agent-prepare-queue-length.md -[data-movement-config]: data-movement-pod-resource-configuration.md +[21]: data-movement-pod-resource-configuration.md +[22]: data-movement-cache-volume.md diff --git a/site/data/docs/main-toc.yml b/site/data/docs/main-toc.yml index 8b93bc400..a8ff1a739 100644 --- a/site/data/docs/main-toc.yml +++ b/site/data/docs/main-toc.yml @@ -44,9 +44,7 @@ toc: - page: Restore Resource Modifiers url: /restore-resource-modifiers - page: Run in any namespace - url: /namespace - - page: File system backup - url: /file-system-backup + url: /namespace - page: CSI Support url: /csi - page: Volume Group Snapshots @@ -67,6 +65,8 @@ toc: subfolderitems: - page: CSI Snapshot Data Mover url: /csi-snapshot-data-movement + - page: File system backup + url: /file-system-backup - page: Data Movement Backup PVC Configuration url: /data-movement-backup-pvc-configuration - page: Data Movement Restore PVC Configuration @@ -75,6 +75,8 @@ toc: url: /data-movement-pod-resource-configuration - page: Data Movement Node Selection Configuration url: /data-movement-node-selection + - page: Data Movement Cache PVC Configuration + url: /data-movement-cache-volume - page: Node-agent Concurrency url: /node-agent-concurrency - title: Plugins From 39892abef232ef49a549f880b685282565cea18e Mon Sep 17 00:00:00 2001 From: lyndon-li <98304688+Lyndon-Li@users.noreply.github.com> Date: Wed, 19 Nov 2025 10:50:16 +0800 Subject: [PATCH 2/3] Update site/content/docs/main/data-movement-cache-volume.md Co-authored-by: Tiger Kaovilai Signed-off-by: lyndon-li <98304688+Lyndon-Li@users.noreply.github.com> --- site/content/docs/main/data-movement-cache-volume.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/site/content/docs/main/data-movement-cache-volume.md 
b/site/content/docs/main/data-movement-cache-volume.md index c23f9efb1..b86c10edc 100644 --- a/site/content/docs/main/data-movement-cache-volume.md +++ b/site/content/docs/main/data-movement-cache-volume.md @@ -6,7 +6,7 @@ layout: docs Velero data movement restore (i.e., for CSI snapshot data movement and fs-backup) may request the backup repository to cache data locally so as to reduce the data request from the remote backup storage. The cache behavior is decided by the specific backup repository, and Velero allows you to configure a cache limit for the backup repositories who support it (i.e., kopia repository). For more details, see [Backup Repository Configuration][1]. The size of cache may significantly impact on the performance. Specifically, if the cache size is too small, the restore throughput will be severely reduced and much more data would be downloaded from the backup storage. -By default, the cache data location is in the data mover pods' root disk. In some environments, the pods's root disk size is very limited, some a large cache size would cause the data mover pods evicted because of running out of ephemeral disk. +By default, the cache data is stored on the data mover pods' root disk. In some environments, the pods' root disk size is very limited, so a large cache would cause the data mover pods to be evicted for running out of ephemeral storage. To cope with the problems and guarantee the data mover pods always run with a fine tuned local cache, Velero supports dedicated cache PVCs for data movement restore, for CSI snapshot data movement and fs-backup. 
From 9dc27555bc1d2027ee75ce17bfd87b40658815df Mon Sep 17 00:00:00 2001 From: lyndon-li <98304688+Lyndon-Li@users.noreply.github.com> Date: Wed, 19 Nov 2025 10:50:25 +0800 Subject: [PATCH 3/3] Update site/content/docs/main/data-movement-cache-volume.md Co-authored-by: Tiger Kaovilai Signed-off-by: lyndon-li <98304688+Lyndon-Li@users.noreply.github.com> --- site/content/docs/main/data-movement-cache-volume.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/site/content/docs/main/data-movement-cache-volume.md b/site/content/docs/main/data-movement-cache-volume.md index b86c10edc..95ba80c1b 100644 --- a/site/content/docs/main/data-movement-cache-volume.md +++ b/site/content/docs/main/data-movement-cache-volume.md @@ -33,7 +33,7 @@ The cache PVC behavior could be further fine tuned through `thresholdInGB`. Its This configuration decides whether and how to provision cache PVCs, but it doesn't decide their size. Instead, the size is decided by the specific backup repository. Specifically, Velero asks a cache limit from the backup repository and uses this limit to calculate the cache PVC size. The cache limit is decided by the backup repository itself, for Kopia repository, if `cacheLimitMB` is specified in the backup repository configuration, its value will be used; otherwise, a default limit (5 GB) is used. -Then Velero inflates the limit with 20% by considering the non-payload overheads and delay cache cleanup behavior varying on backup repositories. +Then Velero inflates the limit by 20% to account for non-payload overheads and delayed cache cleanup, both of which vary across backup repositories. Take Kopia repository and the above cache PVC configuration for example: - When `cacheLimitMB` is not available for the repository, a 6GB cache PVC is created for the backup that is larger than 1GB; otherwise, no cache volume is created
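
The sizing rule documented in this series (the repository cache limit inflated by 20%, with no cache PVC below `thresholdInGB`) can be sanity-checked with a small sketch. This is only an illustration of the arithmetic in the doc text, not Velero's implementation: the function name `cache_pvc_size_gb` and its parameters are hypothetical, any rounding Velero may apply is not modeled, and the handling of a backup exactly at the threshold is an assumption.

```python
from typing import Optional

# Default Kopia cache limit stated in the doc (5 GB), expressed in MB.
DEFAULT_KOPIA_CACHE_LIMIT_MB = 5 * 1024
# The doc says Velero inflates the repository's cache limit by 20%.
INFLATION_PERCENT = 20


def cache_pvc_size_gb(
    backup_size_gb: float,
    threshold_gb: float,
    cache_limit_mb: Optional[int] = None,
) -> Optional[float]:
    """Return the cache PVC size in GB, or None when no cache PVC is created.

    Hypothetical helper that mirrors the prose of the doc only.
    """
    # Assumption: a backup exactly at the threshold still gets a cache PVC;
    # the doc only says that backups smaller than the threshold are skipped.
    if backup_size_gb < threshold_gb:
        return None
    # Fall back to the documented default when cacheLimitMB is not configured.
    limit_mb = DEFAULT_KOPIA_CACHE_LIMIT_MB if cache_limit_mb is None else cache_limit_mb
    inflated_mb = limit_mb * (100 + INFLATION_PERCENT) / 100
    return inflated_mb / 1024


if __name__ == "__main__":
    print(cache_pvc_size_gb(50, 1))          # 6.0  (default 5 GB limit)
    print(cache_pvc_size_gb(50, 1, 10240))   # 12.0 (cacheLimitMB = 10240)
    print(cache_pvc_size_gb(0.5, 1))         # None (backup below threshold)
```

The three calls reproduce the two bullet examples from the doc (6GB and 12GB) plus the below-threshold case, using the sample `thresholdInGB` of 1.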