Merge branch 'main' into upgrade-to-1.17-doc

2025-12-23 06:15:21 +00:00 · 2025-08-07 15:06:20 +08:00
parent f60dce06f2 0c4055c2c0
commit 5e3ae2a886
51 changed files with 2269 additions and 368 deletions
--- a/site/content/docs/main/basic-install.md
+++ b/site/content/docs/main/basic-install.md
@@ -4,7 +4,7 @@ layout: docs
 ---

 Use this doc to get a basic installation of Velero.
-Refer [this document](customize-installation.md) to customize your installation.
+Refer to [this document](customize-installation.md) to customize your installation, including setting priority classes for Velero components.

 ## Prerequisites

--- a/site/content/docs/main/csi-snapshot-data-movement.md
+++ b/site/content/docs/main/csi-snapshot-data-movement.md
@@ -18,7 +18,11 @@ On the other hand, there are quite some cases that CSI snapshot is not available

 CSI Snapshot Data Movement supports both built-in data mover and customized data movers. For the details of how Velero works with customized data movers, check the [Volume Snapshot Data Movement design][1]. Velero provides a built-in data mover which uses Velero built-in uploaders (at present the available uploader is Kopia uploader) to read the snapshot data and write to the Unified Repository (by default implemented by Kopia repository).    

-Velero built-in data mover restores both volume data and metadata, so the data mover pods need to run as root user.  
+Velero built-in data mover restores both volume data and metadata, so the data mover pods need to run as root user.
+
+### Priority Class Configuration
+
+For the Velero built-in data mover, data mover pods launched during CSI snapshot data movement use the priority class name configured in the node-agent configmap. The node-agent daemonset itself gets its priority class from the `--node-agent-priority-class-name` flag during Velero installation. Setting these priority classes lets you influence how the pods are scheduled and preempted in resource-constrained environments. For more details on configuring data mover pod resources, see [Data Movement Pod Resource Configuration][11].

 ## Setup CSI Snapshot Data Movement

@@ -364,7 +368,7 @@ At present, Velero doesn't allow to set `ReadOnlyRootFileSystem` parameter to da
 Both the uploader and repository consume remarkable CPU/memory during the backup/restore, especially for massive small files or large backup size cases.  

 For Velero built-in data mover, Velero uses [BestEffort as the QoS][13] for data mover pods (so no CPU/memory request/limit is set), so that backups/restores wouldn't fail due to resource throttling in any cases.  
-If you want to constraint the CPU/memory usage, you need to [Customize Data Mover Pod Resource Limits][11]. The CPU/memory consumption is always related to the scale of data to be backed up/restored, refer to [Performance Guidance][12] for more details, so it is highly recommended that you perform your own testing to find the best resource limits for your data.   
+If you want to constrain the CPU/memory usage, you need to [Customize Data Mover Pod Resource Limits][11]. The CPU/memory consumption is always related to the scale of data to be backed up/restored (refer to [Performance Guidance][12] for more details), so it is highly recommended that you perform your own testing to find the best resource limits for your data.  

 During the restore, the repository may also cache data/metadata so as to reduce the network footprint and speed up the restore. The repository uses its own policy to store and clean up the cache.  
 For Kopia repository, the cache is stored in the data mover pod's root file system. Velero allows you to configure a limit of the cache size so that the data mover pod won't be evicted due to running out of the ephemeral storage. For more details, check [Backup Repository Configuration][17]. 
--- a/site/content/docs/main/customize-installation.md
+++ b/site/content/docs/main/customize-installation.md
@@ -96,6 +96,53 @@ Note that if you specify `--colorized=true` as a CLI option it will override
 the config file setting.


+## Set priority class names for Velero components
+
+You can set priority class names for different Velero components during installation. This allows you to influence the scheduling and eviction behavior of Velero pods, which can be useful in clusters where resource contention is high.
+
+### Priority class configuration options
+
+1. **Velero server deployment**: Use the `--server-priority-class-name` flag
+2. **Node agent daemonset**: Use the `--node-agent-priority-class-name` flag
+3. **Data mover pods**: Configure through the node-agent configmap (see below)
+4. **Maintenance jobs**: Configure through the repository maintenance job configmap (see below)
+
+```bash
+velero install \
+    --server-priority-class-name=<SERVER_PRIORITY_CLASS> \
+    --node-agent-priority-class-name=<NODE_AGENT_PRIORITY_CLASS>
+```
+
+### Configuring priority classes for data mover pods and maintenance jobs
+
+For data mover pods and maintenance jobs, priority classes are configured through ConfigMaps that must be created before installation:
+
+**Data mover pods** (via node-agent configmap):
+```bash
+kubectl create configmap node-agent-config -n velero --from-file=config.json=/dev/stdin <<EOF
+{
+    "priorityClassName": "low-priority"
+}
+EOF
+
+velero install --node-agent-configmap node-agent-config # ... other flags
+```
+
+**Maintenance jobs** (via repository maintenance job configmap):
+```bash
+kubectl create configmap repo-maintenance-job-config -n velero --from-file=config.json=/dev/stdin <<EOF
+{
+    "global": {
+        "priorityClassName": "low-priority"
+    }
+}
+EOF
+
+velero install --repo-maintenance-job-configmap repo-maintenance-job-config # ... other flags
+```
+
+Note that you need to create the priority classes before installing Velero. For more information on priority classes, see the [Kubernetes documentation on Pod Priority and Preemption](https://kubernetes.io/docs/concepts/scheduling-eviction/pod-priority-preemption/).
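+
+As a minimal sketch, a priority class matching the `low-priority` name used in the examples above could be defined as follows. The value `100` and the description are assumptions chosen for illustration, not values recommended by Velero; pick a value that fits the priority classes already defined in your cluster.
+
+```yaml
+# Hypothetical PriorityClass for the "low-priority" name used above.
+apiVersion: scheduling.k8s.io/v1
+kind: PriorityClass
+metadata:
+  name: low-priority
+# Assumed value for illustration; higher values mean higher priority.
+value: 100
+globalDefault: false
+description: "Example low priority class for Velero data mover pods and maintenance jobs"
+```
+
+Apply the manifest with `kubectl apply -f <file>` before running `velero install`.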
+
 ## Customize resource requests and limits

 At installation, You could set resource requests and limits for the Velero pod, the node-agent pod and the [repository maintenance job][14], if you are using the [File System Backup][3] or [CSI Snapshot Data Movement][12].  
--- a/site/content/docs/main/data-movement-pod-resource-configuration.md
+++ b/site/content/docs/main/data-movement-pod-resource-configuration.md
@@ -27,7 +27,8 @@ Here is a sample of the configMap with ```podResources```:
        "cpuLimit": "1000m",
        "memoryRequest": "512Mi",
        "memoryLimit": "1Gi"        
-    }
+    },
+    "priorityClassName": "high-priority"
 }
 ```

@@ -53,6 +54,75 @@ spec:
        - --node-agent-config=<configMap name>
 ```

+### Priority Class
+
+The priority class for data mover pods is set by the `priorityClassName` field in the node-agent configmap (specified via the `--node-agent-configmap` flag), while the node-agent daemonset itself uses the priority class set by the `--node-agent-priority-class-name` flag during Velero installation.
+
+#### When to Use Priority Classes
+
+**Higher Priority Classes** (e.g., `system-cluster-critical`, `system-node-critical`, or custom high-priority):
+- When you have dedicated nodes for backup operations
+- When backup/restore operations are time-critical
+- When you want to ensure data mover pods are scheduled even during high cluster utilization
+- For disaster recovery scenarios where restore speed is critical
+
+**Lower Priority Classes** (e.g., `low-priority` or negative values):
+- When you want to protect production workload performance
+- When backup operations can be delayed during peak hours
+- When cluster resources are limited and production workloads take precedence
+- For non-critical backup operations that can tolerate delays
+
+#### Consequences of Priority Class Settings
+
+**High Priority**:
+- ✅ Data mover pods are more likely to be scheduled quickly
+- ✅ Less likely to be preempted by other workloads
+- ❌ May cause resource pressure on production workloads
+- ❌ Could lead to production pod evictions in extreme cases
+
+**Low Priority**:
+- ✅ Production workloads are protected from resource competition
+- ✅ Cluster stability is maintained during backup operations
+- ❌ Backup/restore operations may take longer to start
+- ❌ Data mover pods may be preempted, causing backup failures
+- ❌ In resource-constrained clusters, backups might not run at all
+
+#### Example Configuration
+
+To configure priority class for data mover pods, include it in your node-agent configmap:
+
+```json
+{
+    "podResources": {
+        "cpuRequest": "1000m",
+        "cpuLimit": "2000m",
+        "memoryRequest": "1Gi",
+        "memoryLimit": "4Gi"
+    },
+    "priorityClassName": "backup-priority"
+}
+```
+
+Before using this configuration, create the priority class in your cluster:
+
+```yaml
+apiVersion: scheduling.k8s.io/v1
+kind: PriorityClass
+metadata:
+  name: backup-priority
+value: 1000
+globalDefault: false
+description: "Priority class for Velero data mover pods"
+```
+
+Then create the node-agent configmap from the JSON file:
+
+```bash
+kubectl create cm node-agent-config -n velero --from-file=node-agent-config.json
+```
+
+**Note**: If the specified priority class doesn't exist in the cluster when data mover pods are created, the pods will fail to schedule. Velero validates the priority class at startup and logs a warning if it doesn't exist, but the pods will still attempt to use it.
+
 [1]: csi-snapshot-data-movement.md
 [2]: file-system-backup.md
 [3]: https://kubernetes.io/docs/concepts/workloads/pods/pod-qos/
--- a/site/content/docs/main/file-system-backup.md
+++ b/site/content/docs/main/file-system-backup.md
@@ -691,6 +691,10 @@ spec:
        ......
 ```  

+## Priority Class Configuration
+
+For the Velero built-in data mover, data mover pods launched during file system backup use the priority class name configured in the node-agent configmap. The node-agent daemonset itself gets its priority class from the `--node-agent-priority-class-name` flag during Velero installation. Setting these priority classes lets you influence how the pods are scheduled and preempted in resource-constrained environments. For more details on configuring data mover pod resources, see [Data Movement Pod Resource Configuration][data-movement-config].
+
 ## Resource Consumption

 Both the uploader and repository consume remarkable CPU/memory during the backup/restore, especially for massive small files or large backup size cases.  
@@ -762,3 +766,4 @@ Velero still effectively manage restic repository, though you cannot write any n
 [18]: backup-repository-configuration.md
 [19]: node-agent-concurrency.md
 [20]: node-agent-prepare-queue-length.md
+[data-movement-config]: data-movement-pod-resource-configuration.md
--- a/site/content/docs/main/repository-maintenance.md
+++ b/site/content/docs/main/repository-maintenance.md
@@ -155,7 +155,7 @@ Status:
 - `Recent Maintenance` keeps the status of the recent 3 maintenance jobs, including its start time, result (succeeded/failed), completion time (if the maintenance job succeeded), or error message (if the maintenance failed)

 ### Others
-Maintenance jobs will inherit toleration, nodeSelector, service account, image, environment variables, cloud-credentials etc. from Velero deployment.
+Maintenance jobs will inherit toleration, nodeSelector, service account, image, environment variables, cloud-credentials, priorityClassName, etc. from the Velero deployment.

 For labels and annotations, maintenance jobs do NOT inherit all labels and annotations from the Velero deployment. Instead, they include:

@@ -171,7 +171,24 @@ For labels and annotations, maintenance jobs do NOT inherit all labels and annot
  * `iam.amazonaws.com/role`

 **Important:** Other labels and annotations from the Velero deployment are NOT inherited by maintenance jobs. This is by design to ensure only specific labels and annotations required for cloud provider identity systems are propagated.
-Maintenance jobs will not run for backup repositories whose backup storage location is set as readOnly.  
+Maintenance jobs will not run for backup repositories whose backup storage location is set as readOnly.
+
+#### Priority Class Configuration
+Maintenance jobs can be configured with a specific priority class through the repository maintenance job ConfigMap. The priority class name should be specified in the global configuration section:
+
+```json
+{
+    "global": {
+        "priorityClassName": "low-priority",
+        "podResources": {
+            "cpuRequest": "100m",
+            "memoryRequest": "128Mi"
+        }
+    }
+}
+```
+
+Note that priority class configuration is only read from the global configuration section, ensuring all maintenance jobs use the same priority class regardless of which repository they are maintaining.

 [1]: velero-install.md#usage
 [2]: node-agent-concurrency.md
--- a/site/content/docs/main/velero-install.md
+++ b/site/content/docs/main/velero-install.md
@@ -31,12 +31,16 @@ velero install \
    [--maintenance-job-cpu-request <CPU_REQUEST>] \
    [--maintenance-job-mem-request <MEMORY_REQUEST>] \
    [--maintenance-job-cpu-limit <CPU_LIMIT>] \
-    [--maintenance-job-mem-limit <MEMORY_LIMIT>]
+    [--maintenance-job-mem-limit <MEMORY_LIMIT>] \
+    [--server-priority-class-name <PRIORITY_CLASS_NAME>] \
+    [--node-agent-priority-class-name <PRIORITY_CLASS_NAME>]
 ```

 The values for the resource requests and limits flags follow the same format as [Kubernetes resource requirements][3]
 For plugin container images, please refer to our [supported providers][2] page.

+The `--server-priority-class-name` and `--node-agent-priority-class-name` flags set the priority classes for the Velero server deployment and the node agent daemonset, respectively. Priority classes influence how these pods are scheduled and evicted in resource-constrained environments. Note that you must create the priority class before installing Velero.
+
 ## Examples

 This section provides examples that serve as a starting point for more customized installations.