Update README and move the implemented Designs for v1.14

Signed-off-by: Daniel Jiang <daniel.jiang@broadcom.com>
This commit is contained in:
Daniel Jiang
2024-05-23 14:08:47 +08:00
parent 0e7fb402cd
commit 349c8f26c6
7 changed files with 8 additions and 9 deletions

View File

@@ -0,0 +1,344 @@
# Extend VolumePolicies to support more actions
## Abstract
Currently, the [VolumePolicies feature](https://github.com/vmware-tanzu/velero/blob/main/design/Implemented/handle-backup-of-volumes-by-resources-filters.md) which can be used to filter/handle volumes during backup only supports the skip action on matching conditions. Users need more actions to be supported.
## Background
The `VolumePolicies` feature was introduced in Velero 1.11 as a flexible way to handle volumes. The main agenda of
introducing the VolumePolicies feature was to improve the overall user experience when performing backup operations
for volume resources, the feature enables users to group volumes according the `conditions` (criteria) specified and
also lets you specify the `action` that velero needs to take for these grouped volumes during the backup operation.
The limitation being that currently `VolumePolicies` only supports `skip` as an action, We want to extend the `action`
functionality to support more usable options like `fs-backup` (File system backup) and `snapshot` (VolumeSnapshots).
## Goals
- Extending the VolumePolicies to support more actions like `fs-backup` (File system backup) and `snapshot` (VolumeSnapshots).
- Improve user experience when backing up Volumes via Velero
## Non-Goals
- No changes to existing approaches to opt-in/opt-out annotations for volumes
- No changes to existing `VolumePolicies` functionalities
- No additions or implementations to support more granular actions like `snapshot-csi` and `snapshot-datamover`. These actions can be implemented as a future enhancement
## Use-cases/Scenarios
**Use-case 1:**
- A user wants to use `snapshot` (volumesnapshots) backup option for all the csi supported volumes and `fs-backup` for the rest of the volumes.
- Currently, velero supports this use-case but the user experience is not that great.
- The user will have to individually annotate the volume mounting pod with the annotation "backup.velero.io/backup-volumes" for `fs-backup`
- This becomes cumbersome at scale.
- Using `VolumePolicies`, the user can just specify 2 simple `VolumePolicies` like for csi supported volumes as `snapshot` action and rest can be backed up`fs-backup` action:
```yaml
version: v1
volumePolicies:
- conditions:
storageClass:
- gp2
action:
type: snapshot
- conditions: {}
action:
type: fs-backup
```
**Use-case 2:**
- A user wants to use `fs-backup` for nfs volumes pertaining to a particular server
- In such a scenario the user can just specify a `VolumePolicy` like:
```yaml
version: v1
volumePolicies:
- conditions:
nfs:
server: 192.168.200.90
action:
type: fs-backup
```
## High-Level Design
- When the VolumePolicy action is set as `fs-backup` the backup workflow modifications would be:
- We call [backupItem() -> backupItemInternal()](https://github.com/vmware-tanzu/velero/blob/main/pkg/backup/item_backupper.go#L95) on all the items that are to be backed up
- Here when we encounter [Pod as an item ](https://github.com/vmware-tanzu/velero/blob/main/pkg/backup/item_backupper.go#L195)
- We will have to modify the backup workflow to account for the `fs-backup` VolumePolicy action
- When the VolumePolicy action is set as `snapshot` the backup workflow modifications would be:
- Once again, We call [backupItem() -> backupItemInternal()](https://github.com/vmware-tanzu/velero/blob/main/pkg/backup/item_backupper.go#L95) on all the items that are to be backed up
- Here when we encounter [Persistent Volume as an item](https://github.com/vmware-tanzu/velero/blob/d4128542590470b204a642ee43311921c11db880/pkg/backup/item_backupper.go#L253)
- And we call the [takePVSnapshot func](https://github.com/vmware-tanzu/velero/blob/d4128542590470b204a642ee43311921c11db880/pkg/backup/item_backupper.go#L508)
- We need to modify the takePVSnapshot function to account for the `snapshot` VolumePolicy action.
- In case of csi snapshots for PVC objects, these snapshot actions are taken by the velero-plugin-for-csi, we need to modify the [executeActions()](https://github.com/vmware-tanzu/velero/blob/512fe0dabdcb3bbf1ca68a9089056ae549663bcf/pkg/backup/item_backupper.go#L232) function to account for the `snapshot` VolumePolicy action.
**Note:** `Snapshot` action can either be a native snapshot or a csi snapshot, as is the case with the current flow where velero itself makes the decision based on the backup CR.
## Detailed Design
- Update VolumePolicy action type validation to account for `fs-backup` and `snapshot` as valid VolumePolicy actions.
- Modifications needed for `fs-backup` action:
- Now based on the specification of volume policy on backup request we will decide whether to go via legacy pod annotations approach or the newer volume policy based fs-backup action approach.
- If there is a presence of volume policy(fs-backup/snapshot) on the backup request that matches as an action for a volume we use the newer volume policy approach to get the list of the volumes for `fs-backup` action
- Else continue with the annotation based legacy approach workflow.
- Modifications needed for `snapshot` action:
- In the [takePVSnapshot function](https://github.com/vmware-tanzu/velero/blob/d4128542590470b204a642ee43311921c11db880/pkg/backup/item_backupper.go#L508) we will check the PV fits the volume policy criteria and see if the associated action is `snapshot`
- If it is not snapshot then we skip the further workflow and avoid taking the snapshot of the PV
- Similarly, For csi snapshot of PVC object, we need to do similar changes in [executeAction() function](https://github.com/vmware-tanzu/velero/blob/512fe0dabdcb3bbf1ca68a9089056ae549663bcf/pkg/backup/item_backupper.go#L348). we will check the PVC fits the volume policy criteria and see if the associated action is `snapshot` via csi plugin
- If it is not snapshot then we skip the csi BIA execute action and avoid taking the snapshot of the PVC by not invoking the csi plugin action for the PVC
**Note:**
- When we are using the `VolumePolicy` approach for backing up the volumes then the volume policy criteria and action need to be specific and explicit, there is no default behavior, if a volume matches `fs-backup` action then `fs-backup` method will be used for that volume and similarly if the volume matches the criteria for `snapshot` action then the snapshot workflow will be used for the volume backup.
- Another thing to note is the workflow proposed in this design uses the legacy `opt-in/opt-out` approach as a fallback option. For instance, the user specifies a VolumePolicy but for a particular volume included in the backup there are no actions(fs-backup/snapshot) matching in the volume policy for that volume, in such a scenario the legacy approach will be used for backing up the particular volume.
- The relation between the `VolumePolicy` and the backup's legacy parameter `SnapshotVolumes`:
- The `VolumePolicy`'s `snapshot` action matching for volume has higher priority. When there is a `snapshot` action matching for the selected volume, it will be backed by the snapshot way, no matter of the `backup.Spec.SnapshotVolumes` setting.
- If there is no `snapshot` action matching the selected volume in the `VolumePolicy`, then the volume will be backed up by `snapshot` way, if the `backup.Spec.SnapshotVolumes` is not set to false.
- The relation between the `VolumePolicy` and the backup's legacy filesystem `opt-in/opt-out` approach:
- The `VolumePolicy`'s `fs-backup` action matching for volume has higher priority. When there is a `fs-backup` action matching for the selected volume, it will be backed by the fs-backup way, no matter of the `backup.Spec.DefaultVolumesToFsBackup` setting and the pod's `opt-in/opt-out` annotation setting.
- If there is no `fs-backup` action matching the selected volume in the `VolumePolicy`, then the volume will be backed up by the legacy `opt-in/opt-out` way.
## Implementation
- The implementation should be included in velero 1.14
- We will introduce a `VolumeHelper` interface. It will consist of two methods:
```go
type VolumeHelper interface {
ShouldPerformSnapshot(obj runtime.Unstructured, groupResource schema.GroupResource) (bool, error)
ShouldPerformFSBackup(volume corev1api.Volume, pod corev1api.Pod) (bool, error)
}
```
- The `VolumeHelperImpl` struct will implement the `VolumeHelper` interface and will consist of the functions that we will use through the backup workflow to accommodate volume policies for PVs and PVCs.
```go
type volumeHelperImpl struct {
volumePolicy *resourcepolicies.Policies
snapshotVolumes *bool
logger logrus.FieldLogger
client crclient.Client
defaultVolumesToFSBackup bool
backupExcludePVC bool
}
```
- We will create an instance of the structure `volumeHelperImpl` in `item_backupper.go`
```go
itemBackupper := &itemBackupper{
...
volumeHelperImpl: volumehelper.NewVolumeHelperImpl(
resourcePolicy,
backupRequest.Spec.SnapshotVolumes,
log,
kb.kbClient,
boolptr.IsSetToTrue(backupRequest.Spec.DefaultVolumesToFsBackup),
!backupRequest.ResourceIncludesExcludes.ShouldInclude(kuberesource.PersistentVolumeClaims.String()),
),
}
```
#### FS-Backup
- Regarding `fs-backup` action to decide whether to use legacy annotation based approach or volume policy based approach:
- We will use the `vh.ShouldPerformFSBackup()` function from the `volumehelper` package
- Functions involved in processing `fs-backup` volume policy action will somewhat look like:
```go
func (v volumeHelperImpl) ShouldPerformFSBackup(volume corev1api.Volume, pod corev1api.Pod) (bool, error) {
if !v.shouldIncludeVolumeInBackup(volume) {
v.logger.Debugf("skip fs-backup action for pod %s's volume %s, due to not pass volume check.", pod.Namespace+"/"+pod.Name, volume.Name)
return false, nil
}
if v.volumePolicy != nil {
pvc, err := kubeutil.GetPVCForPodVolume(&volume, &pod, v.client)
if err != nil {
v.logger.WithError(err).Errorf("fail to get PVC for pod %s", pod.Namespace+"/"+pod.Name)
return false, err
}
pv, err := kubeutil.GetPVForPVC(pvc, v.client)
if err != nil {
v.logger.WithError(err).Errorf("fail to get PV for PVC %s", pvc.Namespace+"/"+pvc.Name)
return false, err
}
action, err := v.volumePolicy.GetMatchAction(pv)
if err != nil {
v.logger.WithError(err).Errorf("fail to get VolumePolicy match action for PV %s", pv.Name)
return false, err
}
if action != nil {
if action.Type == resourcepolicies.FSBackup {
v.logger.Infof("Perform fs-backup action for volume %s of pod %s due to volume policy match",
volume.Name, pod.Namespace+"/"+pod.Name)
return true, nil
} else {
v.logger.Infof("Skip fs-backup action for volume %s for pod %s because the action type is %s",
volume.Name, pod.Namespace+"/"+pod.Name, action.Type)
return false, nil
}
}
}
if v.shouldPerformFSBackupLegacy(volume, pod) {
v.logger.Infof("Perform fs-backup action for volume %s of pod %s due to opt-in/out way",
volume.Name, pod.Namespace+"/"+pod.Name)
return true, nil
} else {
v.logger.Infof("Skip fs-backup action for volume %s of pod %s due to opt-in/out way",
volume.Name, pod.Namespace+"/"+pod.Name)
return false, nil
}
}
```
- The main function from the above will be called when we encounter Pods during the backup workflow:
```go
for _, volume := range pod.Spec.Volumes {
shouldDoFSBackup, err := ib.volumeHelperImpl.ShouldPerformFSBackup(volume, *pod)
if err != nil {
backupErrs = append(backupErrs, errors.WithStack(err))
}
...
}
```
#### Snapshot (PV)
- Making sure that `snapshot` action is skipped for PVs that do not fit the volume policy criteria, for this we will use the `vh.ShouldPerformSnapshot` from the `VolumeHelperImpl(vh)` receiver.
```go
func (v *volumeHelperImpl) ShouldPerformSnapshot(obj runtime.Unstructured, groupResource schema.GroupResource) (bool, error) {
// check if volume policy exists and also check if the object(pv/pvc) fits a volume policy criteria and see if the associated action is snapshot
// if it is not snapshot then skip the code path for snapshotting the PV/PVC
pvc := new(corev1api.PersistentVolumeClaim)
pv := new(corev1api.PersistentVolume)
var err error
if groupResource == kuberesource.PersistentVolumeClaims {
if err = runtime.DefaultUnstructuredConverter.FromUnstructured(obj.UnstructuredContent(), &pvc); err != nil {
return false, err
}
pv, err = kubeutil.GetPVForPVC(pvc, v.client)
if err != nil {
return false, err
}
}
if groupResource == kuberesource.PersistentVolumes {
if err = runtime.DefaultUnstructuredConverter.FromUnstructured(obj.UnstructuredContent(), &pv); err != nil {
return false, err
}
}
if v.volumePolicy != nil {
action, err := v.volumePolicy.GetMatchAction(pv)
if err != nil {
return false, err
}
// If there is a match action, and the action type is snapshot, return true,
// or the action type is not snapshot, then return false.
// If there is no match action, go on to the next check.
if action != nil {
if action.Type == resourcepolicies.Snapshot {
v.logger.Infof(fmt.Sprintf("performing snapshot action for pv %s", pv.Name))
return true, nil
} else {
v.logger.Infof("Skip snapshot action for pv %s as the action type is %s", pv.Name, action.Type)
return false, nil
}
}
}
// If this PV is claimed, see if we've already taken a (pod volume backup)
// snapshot of the contents of this PV. If so, don't take a snapshot.
if pv.Spec.ClaimRef != nil {
pods, err := podvolumeutil.GetPodsUsingPVC(
pv.Spec.ClaimRef.Namespace,
pv.Spec.ClaimRef.Name,
v.client,
)
if err != nil {
v.logger.WithError(err).Errorf("fail to get pod for PV %s", pv.Name)
return false, err
}
for _, pod := range pods {
for _, vol := range pod.Spec.Volumes {
if vol.PersistentVolumeClaim != nil &&
vol.PersistentVolumeClaim.ClaimName == pv.Spec.ClaimRef.Name &&
v.shouldPerformFSBackupLegacy(vol, pod) {
v.logger.Infof("Skipping snapshot of pv %s because it is backed up with PodVolumeBackup.", pv.Name)
return false, nil
}
}
}
}
if !boolptr.IsSetToFalse(v.snapshotVolumes) {
// If the backup.Spec.SnapshotVolumes is not set, or set to true, then should take the snapshot.
v.logger.Infof("performing snapshot action for pv %s as the snapshotVolumes is not set to false")
return true, nil
}
v.logger.Infof(fmt.Sprintf("skipping snapshot action for pv %s possibly due to no volume policy setting or snapshotVolumes is false", pv.Name))
return false, nil
}
```
- The function `ShouldPerformSnapshot` will be used as follows in `takePVSnapshot` function of the backup workflow:
```go
snapshotVolume, err := ib.volumeHelperImpl.ShouldPerformSnapshot(obj, kuberesource.PersistentVolumes)
if err != nil {
return err
}
if !snapshotVolume {
log.Info(fmt.Sprintf("skipping volume snapshot for PV %s as it does not fit the volume policy criteria specified by the user for snapshot action", pv.Name))
ib.trackSkippedPV(obj, kuberesource.PersistentVolumes, volumeSnapshotApproach, "does not satisfy the criteria for volume policy based snapshot action", log)
return nil
}
```
#### Snapshot (PVC)
- Making sure that `snapshot` action is skipped for PVCs that do not fit the volume policy criteria, for this we will again use the `vh.ShouldPerformSnapshot` from the `VolumeHelperImpl(vh)` receiver.
- We will pass the `VolumeHelperImpl(vh)` instance in `executeActions` method so that it is available to use.
```go
```
- The above function will be used as follows in the `executeActions` function of backup workflow.
- Considering the vSphere plugin doesn't support the VolumePolicy yet, don't use the VolumePolicy for vSphere plugin by now.
```go
if groupResource == kuberesource.PersistentVolumeClaims {
if actionName == csiBIAPluginName {
snapshotVolume, err := ib.volumeHelperImpl.ShouldPerformSnapshot(obj, kuberesource.PersistentVolumeClaims)
if err != nil {
return nil, itemFiles, errors.WithStack(err)
}
if !snapshotVolume {
log.Info(fmt.Sprintf("skipping csi volume snapshot for PVC %s as it does not fit the volume policy criteria specified by the user for snapshot action", namespace+"/"+name))
ib.trackSkippedPV(obj, kuberesource.PersistentVolumeClaims, volumeSnapshotApproach, "does not satisfy the criteria for volume policy based snapshot action", log)
continue
}
}
}
```
## Future Implementation
It makes sense to add more specific actions in the future, once we deprecate the legacy opt-in/opt-out approach to keep things simple. Another point of note is, csi related action can be
easier to implement once we decide to merge the csi plugin into main velero code flow.
In the future, we envision the following actions that can be implemented:
- `snapshot-native`: only use volume snapshotter (native cloud provider snapshots), do nothing if not present/not compatible
- `snapshot-csi`: only use csi-plugin, don't use volume snapshotter(native cloud provider snapshots), don't use datamover even if snapshotMoveData is true
- `snapshot-datamover`: only use csi with datamover, don't use volume snapshotter (native cloud provider snapshots), use datamover even if snapshotMoveData is false
**Note:** The above actions are just suggestions for future scope, we may not use/implement them as is. We could definitely merge these suggested actions as `Snapshot` actions and use volume policy parameters and criteria to segregate them instead of making the user explicitly supply the action names to such granular levels.
## Related to Design
[Handle backup of volumes by resources filters](https://github.com/vmware-tanzu/velero/blob/main/design/Implemented/handle-backup-of-volumes-by-resources-filters.md)
## Alternatives Considered
Same as the earlier design as this is an extension of the original VolumePolicies design

View File

@@ -0,0 +1,137 @@
# Node-agent Load Affinity Design
## Glossary & Abbreviation
**Velero Generic Data Path (VGDP)**: VGDP is the collective modules that is introduced in [Unified Repository design][1]. Velero uses these modules to finish data transfer for various purposes (i.e., PodVolume backup/restore, Volume Snapshot Data Movement). VGDP modules include uploaders and the backup repository.
**Exposer**: Exposer is a module that is introduced in [Volume Snapshot Data Movement Design][2]. Velero uses this module to expose the volume snapshots to Velero node-agent pods or node-agent associated pods so as to complete the data movement from the snapshots.
## Background
Velero node-agent is a daemonset hosting controllers and VGDP modules to complete the concrete work of backups/restores, i.e., PodVolume backup/restore, Volume Snapshot Data Movement backup/restore.
Specifically, node-agent runs DataUpload controllers to watch DataUpload CRs for Volume Snapshot Data Movement backups, so there is one controller instance in each node. One controller instance takes a DataUpload CR and then launches a VGDP instance, which initializes a uploader instance and the backup repository connection, to finish the data transfer. The VGDP instance runs inside a node-agent pod or in a pod associated to the node-agent pod in the same node.
Varying from the data size, data complexity, resource availability, VGDP may take a long time and remarkable resources (CPU, memory, network bandwidth, etc.).
Technically, VGDP instances are able to run in any node that allows pod schedule. On the other hand, users may want to constrain the nodes where VGDP instances run for various reasons, below are some examples:
- Prevent VGDP instances from running in specific nodes because users have more critical workloads in the nodes
- Constrain VGDP instances to run in specific nodes because these nodes have more resources than others
- Constrain VGDP instances to run in specific nodes because the storage allows volume/snapshot provisions in these nodes only
Therefore, in order to improve the compatibility, it is worthy to configure the affinity of VGDP to nodes, especially for backups for which VGDP instances run frequently and centrally.
## Goals
- Define the behaviors of node affinity of VGDP instances in node-agent for volume snapshot data movement backups
- Create a mechanism for users to specify the node affinity of VGDP instances for volume snapshot data movement backups
## Non-Goals
- It is also beneficial to support VGDP instances affinity for PodVolume backup/restore, however, it is not possible since VGDP instances for PodVolume backup/restore should always run in the node where the source/target pods are created.
- It is also beneficial to support VGDP instances affinity for data movement restores, however, it is not possible in some cases. For example, when the `volumeBindingMode` in the storageclass is `WaitForFirstConsumer`, the restore volume must be mounted in the node where the target pod is scheduled, so the VGDP instance must run in the same node. On the other hand, considering the fact that restores may not frequently and centrally run, we will not support data movement restores.
- As elaberated in the [Volume Snapshot Data Movement Design][2], the Exposer may take different ways to expose snapshots, i.e., through backup pods (this is the only way supported at present). The implementation section below only considers this approach currently, if a new expose method is introduced in future, the definition of the affinity configurations and behaviors should still work, but we may need a new implementation.
## Solution
We will use the ```node-agent-config``` configMap to host the node affinity configurations.
This configMap is not created by Velero, users should create it manually on demand. The configMap should be in the same namespace where Velero is installed. If multiple Velero instances are installed in different namespaces, there should be one configMap in each namespace which applies to node-agent in that namespace only.
Node-agent server checks these configurations at startup time and use it to initiate the related VGDP modules. Therefore, users could edit this configMap any time, but in order to make the changes effective, node-agent server needs to be restarted.
Inside ```node-agent-config``` configMap we will add one new kind of configuration as the data in the configMap, the name is ```loadAffinity```.
Users may want to set different LoadAffinity configurations according to different conditions (i.e., for different storages represented by StorageClass, CSI driver, etc.), so we define ```loadAffinity``` as an array. This is for extensibility consideration, at present, we don't implement multiple configurations support, so if there are multiple configurations, we always take the first one in the array.
The data structure for ```node-agent-config``` is as below:
```go
type Configs struct {
// LoadConcurrency is the config for load concurrency per node.
LoadConcurrency *LoadConcurrency `json:"loadConcurrency,omitempty"`
// LoadAffinity is the config for data path load affinity.
LoadAffinity []*LoadAffinity `json:"loadAffinity,omitempty"`
}
type LoadAffinity struct {
// NodeSelector specifies the label selector to match nodes
NodeSelector metav1.LabelSelector `json:"nodeSelector"`
}
```
### Affinity
Affinity configuration means allowing VGDP instances running in the nodes specified. There are two ways to define it:
- It could be defined by `MatchLabels` of `metav1.LabelSelector`. The labels defined in `MatchLabels` means a `LabelSelectorOpIn` operation by default, so in the current context, they will be treated as affinity rules.
- It could be defined by `MatchExpressions` of `metav1.LabelSelector`. The labels are defined in `Key` and `Values` of `MatchExpressions` and the `Operator` should be defined as `LabelSelectorOpIn` or `LabelSelectorOpExists`.
### Anti-affinity
Anti-affinity configuration means preventing VGDP instances running in the nodes specified. Below is the way to define it:
- It could be defined by `MatchExpressions` of `metav1.LabelSelector`. The labels are defined in `Key` and `Values` of `MatchExpressions` and the `Operator` should be defined as `LabelSelectorOpNotIn` or `LabelSelectorOpDoesNotExist`.
### Sample
A sample of the ```node-agent-config``` configMap is as below:
```json
{
"loadAffinity": [
{
"nodeSelector": {
"matchLabels": {
"beta.kubernetes.io/instance-type": "Standard_B4ms"
},
"matchExpressions": [
{
"key": "kubernetes.io/hostname",
"values": [
"node-1",
"node-2",
"node-3"
],
"operator": "In"
},
{
"key": "xxx/critial-workload",
"operator": "DoesNotExist"
}
]
}
}
]
}
```
This sample showcases two affinity configurations:
- matchLabels: VGDP instances will run only in nodes with label key `beta.kubernetes.io/instance-type` and value `Standard_B4ms`
- matchExpressions: VGDP instances will run in node `node1`, `node2` and `node3` (selected by `kubernetes.io/hostname` label)
This sample showcases one anti-affinity configuration:
- matchExpressions: VGDP instances will not run in nodes with label key `xxx/critial-workload`
To create the configMap, users need to save something like the above sample to a json file and then run below command:
```
kubectl create cm node-agent-config -n velero --from-file=<json file name>
```
### Implementation
As mentioned in the [Volume Snapshot Data Movement Design][2], the exposer decides where to launch the VGDP instances. At present, for volume snapshot data movement backups, the exposer creates backupPods and the VGDP instances will be initiated in the nodes where backupPods are scheduled. So the loadAffinity will be translated (from `metav1.LabelSelector` to `corev1.Affinity`) and set to the backupPods.
It is possible that node-agent pods, as a daemonset, don't run in every worker nodes, users could fulfil this by specify `nodeSelector` or `nodeAffinity` to the node-agent daemonset spec. On the other hand, at present, VGDP instances must be assigned to nodes where node-agent pods are running. Therefore, if there is any node selection for node-agent pods, users must consider this into this load affinity configuration, so as to guarantee that VGDP instances are always assigned to nodes where node-agent pods are available. This is done by users, we don't inherit any node selection configuration from node-agent daemonset as we think daemonset scheduler works differently from plain pod scheduler, simply inheriting all the configurations may cause unexpected result of backupPod schedule.
Otherwise, if a backupPod are scheduled to a node where node-agent pod is absent, the corresponding DataUpload CR will stay in `Accepted` phase until the prepare timeout (by default 30min).
At present, as part of the expose operations, the exposer creates a volume, represented by backupPVC, from the snapshot. The backupPVC uses the same storageClass with the source volume. If the `volumeBindingMode` in the storageClass is `Immediate`, the volume is immediately allocated from the underlying storage without waiting for the backupPod. On the other hand, the loadAffinity is set to the backupPod's affinity. If the backupPod is scheduled to a node where the snapshot volume is not accessible, e.g., because of storage topologies, the backupPod won't get into Running state, concequently, the data movement won't complete.
Once this problem happens, the backupPod stays in `Pending` phase, and the corresponding DataUpload CR stays in `Accepted` phase until the prepare timeout (by default 30min).
There is a common solution for the both problems:
- We have an existing logic to periodically enqueue the dataupload CRs which are in the `Accepted` phase for timeout and cancel checks
- We add a new logic to this existing logic to check if the corresponding backupPods are in unrecoverable status
- The above problems could be covered by this check, because in both cases the backupPods are in abnormal and unrecoverable status
- If a backupPod is unrecoverable, the dataupload controller cancels the dataupload and deletes the backupPod
Specifically, when the above problems happen, the status of a backupPod is like below:
```
status:
conditions:
- lastProbeTime: null
message: '0/2 nodes are available: 1 node(s) didn''t match Pod''s node affinity/selector,
1 node(s) had volume node affinity conflict. preemption: 0/2 nodes are available:
2 Preemption is not helpful for scheduling..'
reason: Unschedulable
status: "False"
type: PodScheduled
phase: Pending
```
[1]: Implemented/unified-repo-and-kopia-integration/unified-repo-and-kopia-integration.md
[2]: volume-snapshot-data-movement/volume-snapshot-data-movement.md

View File

@@ -0,0 +1,143 @@
# Volume information for restore design
## Background
Velero has different ways to handle data in the volumes during restore. The users want to have more clarity in terms of how
the volumes are handled in restore process via either Velero CLI or other downstream product which consumes Velero.
## Goals
- Create new metadata to store the information of the restored volume, which will have the same life-cycle as the restore CR.
- Consume the metadata in velero CLI to enable it display more details for volumes in the output of `velero restore describe --details`
## Non Goals
- Provide finer grained control of the volume restore process. The focus of the design is to enable displaying more details.
- Persist additional metadata like podvolume, datadownloads etc to the restore folder in backup-location.
## Design
### Structure of the restore volume info
The restore volume info will be stored in a file named like `${restore_name}-vol-info.json`. The content of the file will
be a list of volume info objects, each of which will map to a volume that is restored, and will contain the information
like name of the restored PV/PVC, restore method and related objects to provide details depending on the way it's restored,
it will look like this:
```
[
{
"pvcName": "nginx-logs-2",
"pvcNamespace": "nginx-app-restore",
"pvName": "pvc-e320d75b-a788-41a3-b6ba-267a553efa5e",
"restoreMethod": "PodVolumeRestore",
"snapshotDataMoved": false,
"pvrInfo": {
"snapshotHandle": "81973157c3a945a5229285c931b02c68",
"uploaderType": "kopia",
"volumeName": "nginx-logs",
"podName": "nginx-deployment-79b56c644b-mjdhp",
"podNamespace": "nginx-app-restore"
}
},
{
"pvcName": "nginx-logs-1",
"pvcNamespace": "nginx-app-restore",
"pvName": "pvc-98c151f4-df47-4980-ba6d-470842f652cc",
"restoreMethod": "CSISnapshot",
"snapshotDataMoved": false,
"csiSnapshotInfo": {
"snapshotHandle": "snap-01a3b21a5e9f85528",
"size": 2147483648,
"driver": "ebs.csi.aws.com",
"vscName": "velero-velero-nginx-logs-1-jxmbg-hx9x5"
}
}
......
]
```
Each field will have the same meaning as the corresponding field in the backup volume info. It will not have the fields
that were introduced to help with the backup process, like `pvInfo`, `dataupload` etc.
### How the restore volume info is generated
Two steps are involved in generating the restore volume info, the first is "collection", which is to gather the information
for restoration of the volumes, the second is "generation", which is to iterate through the data collected in the first step
and generate the volume info list as is described above.
Unlike backup, the CR objects created during the restore process will not be persisted to the backup storage location.
Therefore, to gather the information needed to generate volume information, we either need to collect the CRs in the middle
of the restore process, or retrieve the objects based on the `resouce-list.json` of the restore via API server.
The information to be collected are:
- **PV/PVC mapping relationship:** It will be collected via the `restore-resource-list.json`, b/c at the time the json is ready, all
PVCs and PVs are already created.
- **Native snapshot information:** It will be collected in the restore workflow when each snapshot is restored.
- **podvolumerestore CRs:** It will be collected in the restore workflow after each pvr is created.
- **volumesnapshot CRs for CSI snapshot:** It will be collected in the step of collecting PVC info, by reading the `dataSource`
field in the spec of the PVC.
- **datadownload CRs** It will be collected in the phase of collecting PVC info, by querying the API-server to list the datadownload
CRs labeled with the restore name.
After the collection step, the generation step is relatively straight-forward, as we have all the information needed in
the data structures.
The whole collection and generation steps will be done with the "best-effort" manner, i.e. if there are any failures we
will only log the error in restore log, rather than failing the whole restore process, we will not put these errors or warnings
into the `result.json`, b/c it won't impact the restored resources.
Depending on the number of the restored PVCs the "collection" step may involve many API calls, but it's considered acceptable
b/c at that time the resources are already created, so the actual RTO is not impacted. By using the client of controller runtime
we can make the collection step more efficient by using the cache of the API server. We may consider to make improvements if
we observe performance issues, like using multiple go-routines in the collection.
### Implementation
Because the restore volume info shares the same data structures with the backup volume info, we will refactor the code in
package `internal/volume` to make the sub-components in backup volume info shared by both backup and restore volume info.
We'll introduce a struct called `RestoreVolumeInfoTracker` which encapsulates the logic of collecting and generating the restore volume info:
```
// RestoreVolumeInfoTracker is used to track the volume information during restore.
// It is used to generate the RestoreVolumeInfo array.
type RestoreVolumeInfoTracker struct {
*sync.Mutex
restore *velerov1api.Restore
log logrus.FieldLogger
client kbclient.Client
pvPvc *pvcPvMap
// map of PV name to the NativeSnapshotInfo from which the PV is restored
pvNativeSnapshotMap map[string]NativeSnapshotInfo
// map of PV name to the CSISnapshot object from which the PV is restored
pvCSISnapshotMap map[string]snapshotv1api.VolumeSnapshot
datadownloadList *velerov2alpha1.DataDownloadList
pvrs []*velerov1api.PodVolumeRestore
}
```
The `RestoreVolumeInfoTracker` will be created when the restore request is initialized, and it will be passed to the `restoreContext`
and carried over the whole restore process.
The `client` in this struct is to be used to query the resources in the restored namespace, and the current client in restore
reconciler only watches the resources in the namespace where velero is installed. Therefore, we need to introduce the
`CrClient` which has the same life-cycle of velero server to the restore reconciler, because this is the client that watches all the
resources on the cluster.
In addition to that, we will make small changes in the restore workflow to collect the information needed. We'll make the
changes un-intrusive and make sure not to change the logic of the restore to avoid break change or regression.
We'll also introduce routine changes in the package `pkg/persistence` to persist the restore volume info to the backup storage location.
Last but not least, the `velero restore describe --details` will be updated to display the volume info in the output.
## Alternatives Considered
There used to be suggestion that to provide more details about volume, we can query the `backup-vol-info.json` with the resource
identifier in `restore-resource-list.json`. This will not work when there're resource modifiers involved in the restore process,
which may change the metadata of PVC/PV. In addition, we may add more detailed restore-specific information about the volumes that is not available
in the `backup-vol-info.json`. Therefore, the `restore-vol-info.json` is a better approach.
## Security Considerations
There should be no security impact introduced by this design.
## Compatibility
The restore volume info will be consumed by Velero CLI and downstream products for displaying details. So the functionality
of backup and restore will not be impacted for restores created by older versions of Velero which do not have the restore volume info
metadata. The client should properly handle the case when the restore volume info does not exist.
The data structures referenced by volume info is shared between both restore and backup and it's not versioned, so in the future
we must make sure there will only be incremental changes to the metadata, such that no break change will be introduced to the client.
## Open Issues
https://github.com/vmware-tanzu/velero/issues/7546
https://github.com/vmware-tanzu/velero/issues/6478

View File

@@ -0,0 +1,318 @@
# Design for repository maintenance job
## Abstract
This design proposal aims to decouple repository maintenance from the Velero server by launching a maintenance job when needed, to mitigate the impact on the Velero server during backups.
## Background
During backups, Velero performs periodic maintenance on the repository. This operation may consume significant CPU and memory resources in some cases, leading to potential issues such as the Velero server being killed by OOM. This proposal addresses these challenges by separating repository maintenance from the Velero server.
## Goals
1. **Independent Repository Maintenance**: Decouple maintenance from Velero's main logic to reduce the impact on the Velero server pod.
2. **Configurable Resources Usage**: Make the resources used by the maintenance job configurable.
3. **No API Changes**: Retain existing APIs and workflow in the backup repository controller.
## Non Goals
We have lots of concerns over parallel maintenance, which will increase the complexity of our design currently.
- Non-blocking maintenance job: it may conflict with updating the same `backuprepositories` CR when parallel maintenance.
- Maintenance job concurrency control: there is no one suitable mechanism in Kubernetes to control the concurrency of different jobs.
- Parallel maintenance: Maintaining the same repo by multiple jobs at the same time would have some compatible cases that some providers may not support.
Unfortunately, parallel maintenance is currently not a priority because of the concerns above, improving maintenance efficiency is not the primary focus at this stage.
## High-Level Design
1. **Add Maintenance Subcommand**: Introduce a new Velero server subcommand for repository maintenance.
2. **Create Jobs by Repository Manager**: Modify the backup repository controller to create a maintenance job instead of directly calling the multiple chain calls for Kopia or Restic maintenance.
3. **Update Maintenance Job Result in BackupRepository CR**: Retrieve the result of the maintenance job and update the status of the `BackupRepository` CR accordingly.
4. **Add Setting for Maintenance Job**: Introduce a configuration option to set maintenance jobs, including resource limits (CPU and memory), keeping the latest N maintenance jobs for each repository.
## Detailed Design
### 1. Add Maintenance sub-command
The CLI command will be added to the Velero CLI, the command is designed for use in a pod of maintenance jobs.
Our CLI command is designed as follows:
```shell
$ velero repo-maintenance --repo-name $repo-name --repo-type $repo-type --backup-storage-location $bsl
```
Compared with other CLI commands, the maintenance command is used in a pod of maintenance jobs not for user use, and the job should show the result of maintenance after finish.
Here we will write the error message into one specific file which could be read by the maintenance job.
on the whole, we record two kinds of logs:
- one is the log output of the intermediate maintenance process: this log could be retrieved via the Kubernetes API server, including the error log.
- one is the result of the command which could indicate whether the execution is an error or not: the result could be redirected to a file that the maintenance job itself could read, and the file only contains the error message.
we will write the error message into the `/dev/termination-log` file if execution is failed.
The main maintenance logic would be using the repository provider to do the maintenance.
```golang
func checkError(err error, file *os.File) {
if err != nil {
if err != context.Canceled {
if _, errWrite := file.WriteString(fmt.Sprintf("An error occurred: %v", err)); errWrite != nil {
fmt.Fprintf(os.Stderr, "Failed to write error to termination log file: %v\n", errWrite)
}
file.Close()
os.Exit(1) // indicate the command executed failed
}
}
}
func (o *Options) Run(f veleroCli.Factory) {
logger := logging.DefaultLogger(o.LogLevelFlag.Parse(), o.FormatFlag.Parse())
logger.SetOutput(os.Stdout)
errorFile, err := os.Create("/dev/termination-log")
if err != nil {
fmt.Fprintf(os.Stderr, "Failed to create termination log file: %v\n", err)
return
}
defer errorFile.Close()
...
err = o.runRepoPrune(cli, f.Namespace(), logger)
checkError(err, errorFile)
...
}
func (o *Options) runRepoPrune(cli client.Client, namespace string, logger logrus.FieldLogger) error {
...
var repoProvider provider.Provider
if o.RepoType == velerov1api.BackupRepositoryTypeRestic {
repoProvider = provider.NewResticRepositoryProvider(credentialFileStore, filesystem.NewFileSystem(), logger)
} else {
repoProvider = provider.NewUnifiedRepoProvider(
credentials.CredentialGetter{
FromFile: credentialFileStore,
FromSecret: credentialSecretStore,
}, o.RepoType, cli, logger)
}
...
err = repoProvider.BoostRepoConnect(context.Background(), para)
if err != nil {
return errors.Wrap(err, "failed to boost repo connect")
}
err = repoProvider.PruneRepo(context.Background(), para)
if err != nil {
return errors.Wrap(err, "failed to prune repo")
}
return nil
}
```
### 2. Create Jobs by Repository Manager
Currently, the backup repository controller will call the repository manager to do the `PruneRepo`, and Kopia or Restic maintenance is then finally called through multiple chain calls.
We will keep using the `PruneRepo` function in the repository manager, but we cut off the multiple chain calls by creating a maintenance job.
The job definition would be like below:
```yaml
apiVersion: v1
items:
- apiVersion: batch/v1
kind: Job
metadata:
# labels or affinity or topology settings would inherit from the velero deployment
labels:
# label the job name for later list jobs by name
job-name: nginx-example-default-kopia-pqz6c
name: nginx-example-default-kopia-pqz6c
namespace: velero
spec:
# Not retry it again
backoffLimit: 1
# Only have one job one time
completions: 1
# Not parallel running job
parallelism: 1
template:
metadata:
labels:
job-name: nginx-example-default-kopia-pqz6c
name: kopia-maintenance-job
spec:
containers:
# arguments for repo maintenance job
- args:
- repo-maintenance
- --repo-name=nginx-example
- --repo-type=kopia
- --backup-storage-location=default
# inherit from Velero server
- --log-level=debug
command:
- /velero
# inherit environment variables from the velero deployment
env:
- name: AZURE_CREDENTIALS_FILE
value: /credentials/cloud
# inherit image from the velero deployment
image: velero/velero:main
imagePullPolicy: IfNotPresent
name: kopia-maintenance-container
# resource limitation set by Velero server configuration
# if not specified, it would apply best effort resources allocation strategy
resources: {}
# error message would be written to /dev/termination-log
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
# inherit volume mounts from the velero deployment
volumeMounts:
- mountPath: /credentials
name: cloud-credentials
dnsPolicy: ClusterFirst
restartPolicy: Never
schedulerName: default-scheduler
securityContext: {}
# inherit service account from the velero deployment
serviceAccount: velero
serviceAccountName: velero
volumes:
# inherit cloud credentials from the velero deployment
- name: cloud-credentials
secret:
defaultMode: 420
secretName: cloud-credentials
# ttlSecondsAfterFinished set the job expired seconds
ttlSecondsAfterFinished: 86400
status:
# which contains the result after maintenance
message: ""
lastMaintenanceTime: ""
```
Now, the backup repository controller will call the repository manager to create one maintenance job and wait for the job to complete. The Kopia or Restic maintenance multiple chains are called by the job.
### 3. Update the Result of the Maintenance Job into BackupRepository CR
The backup repository controller will update the result of the maintenance job into the backup repository CR.
For how to get the result of the maintenance job we could refer to [here](https://kubernetes.io/docs/tasks/debug/debug-application/determine-reason-pod-failure/#writing-and-reading-a-termination-message).
After the maintenance job is finished, we could get the result of maintenance by getting the terminated message from the related pod:
```golang
func GetContainerTerminatedMessage(pod *v1.Pod) string {
...
for _, containerStatus := range pod.Status.ContainerStatuses {
if containerStatus.LastTerminationState.Terminated != nil {
return containerStatus.LastTerminationState.Terminated.Message
}
}
...
return ""
}
```
Then we could update the status of backupRepository CR with the message.
### 4. Add Setting for Resource Usage of Maintenance
Add one configuration for setting the resource limit of maintenance jobs as below:
```shell
velero server --maintenance-job-cpu-request $cpu-request --maintenance-job-mem-request $mem-request --maintenance-job-cpu-limit $cpu-limit --maintenance-job-mem-limit $mem-limit
```
Our default value is 0, which means we don't limit the resources, and the resource allocation strategy would be [best effort](https://kubernetes.io/docs/concepts/workloads/pods/pod-qos/#besteffort).
### 5. Automatic Cleanup for Finished Maintenance Jobs
Add configuration for clean up maintenance jobs:
- keep-latest-maintenance-jobs: the number of keeping latest maintenance jobs for each repository.
```shell
velero server --keep-latest-maintenance-jobs $num
```
We would check and keep the latest N jobs after a new job is finished.
```golang
func deleteOldMaintenanceJobs(cli client.Client, repo string, keep int) error {
// Get the maintenance job list by label
jobList := &batchv1.JobList{}
err := cli.List(context.TODO(), jobList, client.MatchingLabels(map[string]string{RepositoryNameLabel: repo}))
if err != nil {
return err
}
// Delete old maintenance jobs
if len(jobList.Items) > keep {
sort.Slice(jobList.Items, func(i, j int) bool {
return jobList.Items[i].CreationTimestamp.Before(&jobList.Items[j].CreationTimestamp)
})
for i := 0; i < len(jobList.Items)-keep; i++ {
err = cli.Delete(context.TODO(), &jobList.Items[i], client.PropagationPolicy(metav1.DeletePropagationBackground))
if err != nil {
return err
}
}
}
return nil
}
```
### 6 Velero Install with Maintenance Options
All the above maintenance options should be supported by Velero install command.
### 7. Observability and Debuggability
Some monitoring metrics are added for backup repository maintenance:
- repo_maintenance_total
- repo_maintenance_success_total
- repo_maintenance_failed_total
- repo_maintenance_duration_seconds
We will keep the latest N maintenance jobs for each repo, and users can get the log from the job. the job log level inherent from the Velero server setting.
Also, we would integrate maintenance job logs and `backuprepositories` CRs into `velero debug`.
Roughly, the process is as follows:
1. The backup repository controller will check the BackupRepository request in the queue periodically.
2. If the maintenance period of the repository checked by `runMaintenanceIfDue` in `Reconcile` is due, then the backup repository controller will call the Repository manager to execute `PruneRepo`
3. The `PruneRepo` of the Repository manager will create one maintenance job, the resource limitation, environment variables, service account, images, etc. would inherit from the Velero server pod. Also, one clean up TTL would be set to maintenance job.
4. The maintenance job will execute the Velero maintenance command, wait for maintaining to finish and write the maintenance result into the terminationMessagePath file of the related pod.
5. Kubernetes could show the result in the status of the pod by reading the termination message in the pod.
6. The backup repository controller will wait for the maintenance job to finish and read the status of the maintenance job, then update the message field and phase in the status of `backuprepositories` CR accordingly.
6. Clean up old maintenance jobs and keep only N latest for each repository.
### 8. Codes Refinement
Once `backuprepositories` CR status is modified, the CR would re-queue to be reconciled, and re-execute logics in reconcile shortly not respecting the re-queue frequency configured by `repoSyncPeriod`.
For one abnormal scenario if the maintenance job fails, the status of `backuprepositories` CR would be updated and the CR will re-queue immediately, if the new maintenance job still fails, then it will re-queue again, making the logic of `backuprepositories` CR re-queue like a dead loop.
So we change the Predicates logic in Controller manager making it only re-queue if the Spec of `backuprepositories` CR is changed.
```golang
ctrl.NewControllerManagedBy(mgr).For(&velerov1api.BackupRepository{}, builder.WithPredicates(kube.SpecChangePredicate{}))
```
This change would bring the behavior different from the previous, errors that occurred in the maintenance job would retry in the next reconciliation period instead of retrying immediately.
## Prospects for Future Work
Future work may focus on improving the efficiency of Velero maintenance through non-blocking parallel modes. Potential areas for enhancement include:
**Non-blocking Mode**: Explore the implementation of a non-blocking mode for parallel maintenance to enhance overall efficiency.
**Concurrency Control**: Investigate mechanisms for better concurrency control of different maintenance jobs.
**Provider Support for Parallel Maintenance**: Evaluate the feasibility of parallel maintenance for different providers and address any compatibility issues.
**Efficiency Improvements**: Investigate strategies to optimize maintenance efficiency without compromising reliability.
By considering these areas, future iterations of Velero may benefit from enhanced parallelization and improved resource utilization during repository maintenance.

View File

@@ -0,0 +1,120 @@
# Design for Adding Finalization Phase in Restore Workflow
## Abstract
This design proposes adding the finalization phase to the restore workflow. The finalization phase would be entered after all item restoration and plugin operations have been completed, similar to the way the backup process proceeds. Its purpose is to perform any wrap-up work necessary before transitioning the restore process to a terminal phase.
## Background
Currently, the restore process enters a terminal phase once all item restoration and plugin operations have been completed. However, there are some wrap-up works that need to be performed after item restoration and plugin operations have been fully executed. There is no suitable opportunity to perform them at present.
To address this, a new finalization phase should be added to the existing restore workflow. in this phase, all plugin operations and item restoration has been fully completed, which provides a clean opportunity to perform any wrap-up work before termination, improving the overall restore process.
Wrap-up tasks in Velero can serve several purposes:
- Post-restore modification - Velero can modify the restored data that was temporarily changed for some purpose but required to be changed back finally or data that was newly created but missing some information. For example, [issue6435](https://github.com/vmware-tanzu/velero/issues/6435) indicates that some custom settings(like labels, reclaim policy) on restored PVs was lost because those restored PVs was newly dynamically provisioned. Velero can address it by patching the PVs' custom settings back in the finalization phase.
- Clean up unused data - Velero can identify and delete any data that are no longer needed after a successful restore in the finalization phase.
- Post-restore validation - Velero can validate the state of restored data and report any errors to help users locate the issue in the finalization phase.
The uses of wrap-up tasks are not limited to these examples. Additional needs may be addressed as they develop over time.
## Goals
- Add the finalization phase and the corresponding controller to restore workflow.
## Non Goals
- Implement the specific wrap-up work.
## High-Level Design
- The finalization phase will be added to current restore workflow.
- The logic for handling current phase transition in restore and restore operations controller will be modified with the introduction of the finalization phase.
- A new restore finalizer controller will be implemented to handle the finalization phase.
## Detailed Design
### phase transition
Two new phases related to finalization will be added to restore workflow, which are `FinalizingPartiallyFailed` and `Finalizing`. The new phase transition will be similar to backup workflow, proceeding as follow:
![image](restore-phases-transition.png)
### restore finalizer controller
The new restore finalizer controller will be implemented to watch for restores in `FinalizingPartiallyFailed` and `Finalizing` phases. Any wrap-up work that needs to wait for the completion of item restoration and plugin operations will be executed by this controller, and the phase will be set to either `Completed` or `PartiallyFailed` based on the results of these works.
Points worth noting about the new restore finalizer controller:
A new structure `finalizerContext` will be created to facilitate the implementation of any wrap-up tasks. It includes all the dependencies the tasks require as well as a function `execute()` to orderly implement task logic.
```
// finalizerContext includes all the dependencies required by wrap-up tasks
type finalizerContext struct {
.......
restore *velerov1api.Restore
log logrus.FieldLogger
.......
}
// execute executes all the wrap-up tasks and return the result
func (ctx *finalizerContext) execute() (results.Result, results.Result) {
// execute task1
.......
// execute task2
.......
// the task execution logic will be expanded as new tasks are included
.......
}
// newFinalizerContext returns a finalizerContext object, the parameters will be added as new tasks are included.
func newFinalizerContext(restore *velerov1api.Restore, log logrus.FieldLogger, ...) *finalizerContext{
return &finalizerContext{
.......
restore: restore,
log: log,
.......
}
}
```
The finalizer controller is responsible for collecting all dependencies and creating a `finalizerContext` object using those dependencies. It then invokes the `execute` function.
```
func (r *restoreFinalizerReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
.......
// collect all dependencies required by wrap-up tasks
.......
// create a finalizerContext object and invoke execute()
finalizerCtx := newFinalizerContext(restore, log, ...)
warnings, errs := finalizerCtx.execute()
.......
}
```
After completing all necessary tasks, the result metadata in object storage will be updated if any errors or warnings occur during the execution. This behavior breaks the feature of keeping metadata files in object storage immutable, However, we believe the tradeoff is justified because it provides users with the access to examine the error/warning details when the wrap-up tasks go wrong.
```
// UpdateResults updates the result metadata in object storage if necessary
func (r *restoreFinalizerReconciler) UpdateResults(restore *api.Restore, newWarnings *results.Result, newErrs *results.Result, backupStore persistence.BackupStore) error {
originResults, err := backupStore.GetRestoreResults(restore.Name)
if err != nil {
return errors.Wrap(err, "error getting restore results")
}
warnings := originResults["warnings"]
errs := originResults["errors"]
warnings.Merge(newWarnings)
errs.Merge(newErrs)
m := map[string]results.Result{
"warnings": warnings,
"errors": errs,
}
if err := putResults(restore, m, backupStore); err != nil {
return errors.Wrap(err, "error putting restore results")
}
return nil
}
```
## Compatibility
The new finalization phases are added without modifying the existing phases in the restore workflow. Both new and ongoing restore processes will continue to eventually transition to a terminal phase from any prior phase, ensuring backward compatibility.
## Implementation
This will be implemented during the Velero 1.14 development cycle.

Binary file not shown.

After

Width:  |  Height:  |  Size: 65 KiB