mirror of https://github.com/vmware-tanzu/velero.git
synced 2026-01-03 11:45:20 +00:00

update 1.17 readme and implemented design

Signed-off-by: Lyndon-Li <lyonghui@vmware.com>

design/Implemented/clean_artifacts_in_csi_flow.md (new file, 374 lines)
@@ -0,0 +1,374 @@
# Design to clean the artifacts generated in the CSI backup and restore workflows

## Terminology

* VSC: VolumeSnapshotContent
* VS: VolumeSnapshot

## Abstract

* The design aims to delete the unnecessary VSs and VSCs generated during the CSI backup and restore processes.
* The design stops creating the related VSCs during backup syncing.
## Background

In the current CSI backup and restore workflows (here the CSI B/R workflows mean the workflows that only use CSI snapshots in the B/R, not the CSI snapshot data movement workflows), some generated artifacts are kept after the backup or restore completes.

Some of them are kept by design. For example, the VolumeSnapshotContents generated during the backup are kept to make sure the backup deletion can clean up the snapshots in the storage providers.

Some of them are kept by accident. For example, after restore, two VolumeSnapshotContents are generated for the same VolumeSnapshot: one comes from the backup content, and one is dynamically generated from the restore's VolumeSnapshot.

The design aims to clean up the unnecessary artifacts and make the CSI B/R workflows more concise and reliable.

## Goals

- Clean up the redundant VSCs generated during CSI backup and restore.
- Remove the VSCs from the backup sync process.

## Non Goals

- There were some discussions about whether a Velero backup should include VSs and VSCs that were not generated during the backup. By far, the conclusion is that not including them is the better option. Although that is a useful enhancement, it is not included in this design.
- Deleting all the CSI-related metadata files in the BSL is not the aim of this design.
## Detailed Design

### Backup

During backup, the main change is that the backup-generated VSCs are not kept anymore.

The reason is that we don't need them to ensure the snapshots are cleaned up during backup deletion. Please refer to the [Backup Deletion section](#backup-deletion) for details.

As a result, we can simplify the VS deletion logic in the backup. Before, we needed to not only delete the VS, but also recreate a static VSC pointing to a non-existing VS.

The deletion code in the VS BackupItemAction can be simplified to the following:
``` go
if backup.Status.Phase == velerov1api.BackupPhaseFinalizing ||
    backup.Status.Phase == velerov1api.BackupPhaseFinalizingPartiallyFailed {
    p.log.
        WithField("Backup", fmt.Sprintf("%s/%s", backup.Namespace, backup.Name)).
        WithField("BackupPhase", backup.Status.Phase).Debugf("Cleaning VolumeSnapshots.")

    if vsc == nil {
        vsc = &snapshotv1api.VolumeSnapshotContent{}
    }

    csi.DeleteReadyVolumeSnapshot(*vs, *vsc, p.crClient, p.log)
    return item, nil, "", nil, nil
}

func DeleteReadyVolumeSnapshot(
    vs snapshotv1api.VolumeSnapshot,
    vsc snapshotv1api.VolumeSnapshotContent,
    client crclient.Client,
    logger logrus.FieldLogger,
) {
    logger.Infof("Deleting Volumesnapshot %s/%s", vs.Namespace, vs.Name)
    if vs.Status == nil ||
        vs.Status.BoundVolumeSnapshotContentName == nil ||
        len(*vs.Status.BoundVolumeSnapshotContentName) <= 0 {
        logger.Errorf("VolumeSnapshot %s/%s is not ready. This is not expected.",
            vs.Namespace, vs.Name)
        return
    }

    if vs.Status != nil && vs.Status.BoundVolumeSnapshotContentName != nil {
        // Patch the DeletionPolicy of the VolumeSnapshotContent to set it to Retain.
        // This ensures that the volume snapshot in the storage provider is kept.
        if err := SetVolumeSnapshotContentDeletionPolicy(
            vsc.Name,
            client,
            snapshotv1api.VolumeSnapshotContentRetain,
        ); err != nil {
            logger.Warnf("Failed to patch DeletionPolicy of volume snapshot %s/%s",
                vs.Namespace, vs.Name)
            return
        }

        if err := client.Delete(context.TODO(), &vsc); err != nil {
            logger.Warnf("Failed to delete the VSC %s: %s", vsc.Name, err.Error())
        }
    }
    if err := client.Delete(context.TODO(), &vs); err != nil {
        logger.Warnf("Failed to delete volumesnapshot %s/%s: %v", vs.Namespace, vs.Name, err)
    } else {
        logger.Infof("Deleted volumesnapshot with volumesnapshotContent %s/%s",
            vs.Namespace, vs.Name)
    }
}
```
### Restore

#### Restore the VolumeSnapshot

The current behavior of VSC restoration is that the VSC from the backup is restored, and the restored VS also triggers dynamically creating a new VSC.

Creating two VSCs for the same VS in one restore is not right.

Skipping the restore of the VSC from the backup is not a viable alternative, because the VSC may reference a [snapshot create secret](https://kubernetes-csi.github.io/docs/secrets-and-credentials-volume-snapshot-class.html?highlight=snapshotter-secret-name#createdelete-volumesnapshot-secret).

If `SkipRestore` is set to true in the restore action's result, the secret returned in the additional items is ignored too.

As a result, restoring the VSC from the backup and setting up the relationship between the VSC and the VS is the better choice.

Another consideration is that the VSC name should not be the same as the backed-up VSC's, because older versions of Velero keep the VSC after the backup and restore complete.

There's a high possibility that the restore will fail because the VSC already exists in the cluster.

Multiple restores of the same backup will also hit the same problem.

The proposed solution is to use the restore's UID and the VS's UID to generate a SHA-256 hash as the new VSC name. Both the VS and VSC RestoreItemActions can access those UIDs, and this avoids the conflict issues.

The restored VS also references the same generated VSC name.

The VS-referenced VSC name and the VSC's snapshot handle name are in their status.

The Velero restore process purges the restored resources' metadata and status before running the RestoreItemActions.

As a result, we cannot read this information in the VS and VSC RestoreItemActions.

Fortunately, the RestoreItemAction input parameters include `ItemFromBackup`. The status is intact in `ItemFromBackup`.
``` go
func (p *volumeSnapshotRestoreItemAction) Execute(
    input *velero.RestoreItemActionExecuteInput,
) (*velero.RestoreItemActionExecuteOutput, error) {
    p.log.Info("Starting VolumeSnapshotRestoreItemAction")

    if boolptr.IsSetToFalse(input.Restore.Spec.RestorePVs) {
        p.log.Infof("Restore %s/%s did not request for PVs to be restored.",
            input.Restore.Namespace, input.Restore.Name)
        return &velero.RestoreItemActionExecuteOutput{SkipRestore: true}, nil
    }

    var vs snapshotv1api.VolumeSnapshot
    if err := runtime.DefaultUnstructuredConverter.FromUnstructured(
        input.Item.UnstructuredContent(), &vs); err != nil {
        return &velero.RestoreItemActionExecuteOutput{},
            errors.Wrapf(err, "failed to convert input.Item from unstructured")
    }

    var vsFromBackup snapshotv1api.VolumeSnapshot
    if err := runtime.DefaultUnstructuredConverter.FromUnstructured(
        input.ItemFromBackup.UnstructuredContent(), &vsFromBackup); err != nil {
        return &velero.RestoreItemActionExecuteOutput{},
            errors.Wrapf(err, "failed to convert input.ItemFromBackup from unstructured")
    }

    // If cross-namespace restore is configured, change the namespace
    // for the VolumeSnapshot object to be restored.
    newNamespace, ok := input.Restore.Spec.NamespaceMapping[vs.GetNamespace()]
    if !ok {
        // Use the original namespace.
        newNamespace = vs.Namespace
    }

    if csiutil.IsVolumeSnapshotExists(newNamespace, vs.Name, p.crClient) {
        p.log.Debugf("VolumeSnapshot %s already exists in the cluster. Return without change.", vs.Namespace+"/"+vs.Name)
        return &velero.RestoreItemActionExecuteOutput{UpdatedItem: input.Item}, nil
    }

    newVSCName := generateSha256FromRestoreAndVsUID(string(input.Restore.UID), string(vsFromBackup.UID))
    // Reset the Spec to convert the VolumeSnapshot from using
    // the dynamic VolumeSnapshotContent to the static one.
    resetVolumeSnapshotSpecForRestore(&vs, &newVSCName)

    // Reset the VolumeSnapshot annotations. By now, only change
    // DeletionPolicy to Retain.
    resetVolumeSnapshotAnnotation(&vs)

    vsMap, err := runtime.DefaultUnstructuredConverter.ToUnstructured(&vs)
    if err != nil {
        p.log.Errorf("Fail to convert VS %s to unstructured", vs.Namespace+"/"+vs.Name)
        return nil, errors.WithStack(err)
    }

    p.log.Infof("Returning from VolumeSnapshotRestoreItemAction with no additionalItems")

    return &velero.RestoreItemActionExecuteOutput{
        UpdatedItem:     &unstructured.Unstructured{Object: vsMap},
        AdditionalItems: []velero.ResourceIdentifier{},
    }, nil
}

// generateSha256FromRestoreAndVsUID uses the restore UID and the VS UID to generate the new VSC name.
// This way, the VS and VSC RIA actions can derive the same VSC name.
func generateSha256FromRestoreAndVsUID(restoreUID string, vsUID string) string {
    sha256Bytes := sha256.Sum256([]byte(restoreUID + "/" + vsUID))
    return "vsc-" + hex.EncodeToString(sha256Bytes[:])
}
```
#### Restore the VolumeSnapshotContent
``` go
// Execute restores a VolumeSnapshotContent object, returning the snapshot
// delete secret, if any, as an additional item to restore.
func (p *volumeSnapshotContentRestoreItemAction) Execute(
    input *velero.RestoreItemActionExecuteInput,
) (*velero.RestoreItemActionExecuteOutput, error) {
    if boolptr.IsSetToFalse(input.Restore.Spec.RestorePVs) {
        p.log.Infof("Restore did not request for PVs to be restored %s/%s",
            input.Restore.Namespace, input.Restore.Name)
        return &velero.RestoreItemActionExecuteOutput{SkipRestore: true}, nil
    }

    p.log.Info("Starting VolumeSnapshotContentRestoreItemAction")

    var vsc snapshotv1api.VolumeSnapshotContent
    if err := runtime.DefaultUnstructuredConverter.FromUnstructured(
        input.Item.UnstructuredContent(), &vsc); err != nil {
        return &velero.RestoreItemActionExecuteOutput{},
            errors.Wrapf(err, "failed to convert input.Item from unstructured")
    }

    var vscFromBackup snapshotv1api.VolumeSnapshotContent
    if err := runtime.DefaultUnstructuredConverter.FromUnstructured(
        input.ItemFromBackup.UnstructuredContent(), &vscFromBackup); err != nil {
        return &velero.RestoreItemActionExecuteOutput{},
            errors.Wrapf(err, "failed to convert input.ItemFromBackup from unstructured")
    }

    // If cross-namespace restore is configured, change the namespace
    // of the referenced VolumeSnapshot object.
    newNamespace, ok := input.Restore.Spec.NamespaceMapping[vsc.Spec.VolumeSnapshotRef.Namespace]
    if ok {
        // Update the referenced VS namespace to the mapped one.
        vsc.Spec.VolumeSnapshotRef.Namespace = newNamespace
    }

    // Reset the VSC name to align with the VS.
    vsc.Name = generateSha256FromRestoreAndVsUID(string(input.Restore.UID), string(vscFromBackup.Spec.VolumeSnapshotRef.UID))

    // Reset the ResourceVersion and UID of the referenced VolumeSnapshot.
    vsc.Spec.VolumeSnapshotRef.ResourceVersion = ""
    vsc.Spec.VolumeSnapshotRef.UID = ""

    // Set the DeletionPolicy to Retain so that deleting the VS will not trigger a snapshot deletion.
    vsc.Spec.DeletionPolicy = snapshotv1api.VolumeSnapshotContentRetain

    if vscFromBackup.Status != nil && vscFromBackup.Status.SnapshotHandle != nil {
        vsc.Spec.Source.VolumeHandle = nil
        vsc.Spec.Source.SnapshotHandle = vscFromBackup.Status.SnapshotHandle
    } else {
        p.log.Errorf("fail to get snapshot handle from VSC %s status", vsc.Name)
        return nil, errors.Errorf("fail to get snapshot handle from VSC %s status", vsc.Name)
    }

    additionalItems := []velero.ResourceIdentifier{}
    if csi.IsVolumeSnapshotContentHasDeleteSecret(&vsc) {
        additionalItems = append(additionalItems,
            velero.ResourceIdentifier{
                GroupResource: schema.GroupResource{Group: "", Resource: "secrets"},
                Name:          vsc.Annotations[velerov1api.PrefixedSecretNameAnnotation],
                Namespace:     vsc.Annotations[velerov1api.PrefixedSecretNamespaceAnnotation],
            },
        )
    }

    vscMap, err := runtime.DefaultUnstructuredConverter.ToUnstructured(&vsc)
    if err != nil {
        return nil, errors.WithStack(err)
    }

    p.log.Infof("Returning from VolumeSnapshotContentRestoreItemAction with %d additionalItems",
        len(additionalItems))
    return &velero.RestoreItemActionExecuteOutput{
        UpdatedItem:     &unstructured.Unstructured{Object: vscMap},
        AdditionalItems: additionalItems,
    }, nil
}
```
### Backup Sync

csi-volumesnapshotclasses.json, csi-volumesnapshotcontents.json, and csi-volumesnapshots.json are the CSI-related metadata files in the BSL for each backup.

csi-volumesnapshotcontents.json and csi-volumesnapshots.json are not needed anymore, but csi-volumesnapshotclasses.json is still needed.

One concrete scenario is that a backup is created in cluster-A, then the backup is synced to cluster-B, and the backup is deleted in cluster-B. In this case, we don't have a chance to create the VolumeSnapshotClass needed by the VS and VSC.

The VSC deletion workflow proposed by this design needs to create the VSC first. If the VSC's referenced VolumeSnapshotClass doesn't exist in the cluster, the creation of the VSC will fail.

As a result, the VolumeSnapshotClass should still be synced in the backup sync process.
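For clarity, below is an illustrative sketch (not the actual sync controller code; the variable name is hypothetical) of the per-backup CSI metadata files that remain in the sync list under this design.

``` go
// Illustrative only: the CSI metadata files the backup sync process still consumes
// per backup under this design.
var syncedCSIMetadataFiles = []string{
    "csi-volumesnapshotclasses.json", // still synced, so backup deletion can create a VSC whose VolumeSnapshotClass exists
    // "csi-volumesnapshotcontents.json" and "csi-volumesnapshots.json" are no longer needed and are not synced.
}
```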
### Backup Deletion

Two factors are worth considering for the backup deletion change:
* Because the VSCs generated by the backup are not synced anymore, and the VSCs generated during the backup are not kept either, the backup deletion needs to generate a VSC, then delete it, to make sure the snapshots in the storage provider are cleaned up too.
* The VSs generated by the backup are already deleted in the backup process, so we don't need a DeleteItemAction for the VS anymore. As a result, the `velero.io/csi-volumesnapshot-delete` plugin is unneeded.

For the VSC DeleteItemAction, we need to generate a VSC. Because we only care about the snapshot deletion, we don't need to create a VS associated with the VSC.

Creating a static VSC that points to a pseudo VS and references the snapshot handle should be enough.

To avoid the created VSC conflicting with ones generated by older versions of Velero B/R, the VSC name is set to `vsc-uuid`.

The following is an example of the implementation.
``` go
uuid, err := uuid.NewRandom()
if err != nil {
    p.log.WithError(err).Errorf("Fail to generate the UUID to create VSC %s", snapCont.Name)
    return errors.Wrapf(err, "Fail to generate the UUID to create VSC %s", snapCont.Name)
}
snapCont.Name = "vsc-" + uuid.String()

snapCont.Spec.DeletionPolicy = snapshotv1api.VolumeSnapshotContentDelete

snapCont.Spec.Source = snapshotv1api.VolumeSnapshotContentSource{
    SnapshotHandle: snapCont.Status.SnapshotHandle,
}

snapCont.Spec.VolumeSnapshotRef = corev1api.ObjectReference{
    APIVersion: snapshotv1api.SchemeGroupVersion.String(),
    Kind:       "VolumeSnapshot",
    Namespace:  "ns-" + string(snapCont.UID),
    Name:       "name-" + string(snapCont.UID),
}

snapCont.ResourceVersion = ""

if err := p.crClient.Create(context.TODO(), &snapCont); err != nil {
    return errors.Wrapf(err, "fail to create VolumeSnapshotContent %s", snapCont.Name)
}

// Read the resource timeout from the backup annotation; if not set, use the default value.
timeout, err := time.ParseDuration(
    input.Backup.Annotations[velerov1api.ResourceTimeoutAnnotation])
if err != nil {
    p.log.Warnf("fail to parse resource timeout annotation %s: %s",
        input.Backup.Annotations[velerov1api.ResourceTimeoutAnnotation], err.Error())
    timeout = 10 * time.Minute
}
p.log.Debugf("resource timeout is set to %s", timeout.String())

interval := 5 * time.Second

// Wait until the VSC is created and ReadyToUse is true.
if err := wait.PollUntilContextTimeout(
    context.Background(),
    interval,
    timeout,
    true,
    func(ctx context.Context) (bool, error) {
        tmpVSC := new(snapshotv1api.VolumeSnapshotContent)
        if err := p.crClient.Get(ctx, crclient.ObjectKeyFromObject(&snapCont), tmpVSC); err != nil {
            return false, errors.Wrapf(
                err, "failed to get VolumeSnapshotContent %s", snapCont.Name,
            )
        }

        if tmpVSC.Status != nil && boolptr.IsSetToTrue(tmpVSC.Status.ReadyToUse) {
            return true, nil
        }

        return false, nil
    },
); err != nil {
    return errors.Wrapf(err, "fail to wait for VolumeSnapshotContent %s to become ready", snapCont.Name)
}
```
## Security Considerations

Security is not relevant to this design.

## Compatibility

In this design, no new information is added to backups and restores. As a result, this design doesn't have any compatibility issues.

## Open Issues

Please notice that the CSI snapshot backup and restore mechanism doesn't support all file-store-based volumes, e.g. Azure Files, EFS or vSphere CNS File Volume. Only block-based volumes are supported.
Refer to [this comment](https://github.com/vmware-tanzu/velero/issues/3151#issuecomment-2623507686) for more details.
design/Implemented/node-agent-load-soothing.md (new file, 121 lines)
@@ -0,0 +1,121 @@
# Node-agent Load Soothing Design

## Glossary & Abbreviation

**Velero Generic Data Path (VGDP)**: VGDP is the collective of modules introduced in the [Unified Repository design][1]. Velero uses these modules to finish data transfer for various purposes (i.e., PodVolume backup/restore, Volume Snapshot Data Movement). VGDP modules include uploaders and the backup repository.
## Background

As mentioned in the [node-agent Concurrency design][2], [CSI Snapshot Data Movement design][3], [VGDP Micro Service design][4] and [VGDP Micro Service for fs-backup design][5], all data movement activities for CSI snapshot data movement backups/restores and fs-backup respect the `loadConcurrency` settings configured in the `node-agent-configmap`. Once the number of existing loads exceeds the corresponding `loadConcurrency` setting, the loads are throttled and some loads are held until VGDP quotas are available.

However, this throttling only happens after the data mover pod is started and gets to `running`. As a result, when there is a large number of concurrent volume backups, many data mover pods may get created while the VGDP instances inside them are actually on hold because of the VGDP throttling.

This could cause the problems below:
- In some environments, there is a pod limit per node or a pod limit throughout the cluster; too many inactive data mover pods may block other pods from running
- In some environments, the system disk of each node is limited, and pods also occupy system disk space, so many inactive data mover pods take unnecessary space from the system disk and may cause other critical pods to be evicted
- For CSI snapshot data movement backup, the volume snapshot is created before the data mover pod, so an excessive number of snapshots may also be created and live for a longer time, since the VGDP won't start until the quota is available. However, in some environments a large number of snapshots is not allowed or may degrade the storage performance

On the other hand, the VGDP throttling mentioned in the [node-agent Concurrency design][2] is an accurate controlling mechanism, that is, exactly the required number of data mover pods are throttled.

Therefore, another mechanism is required to soothe the creation of the data mover pods and volume snapshots before the VGDP throttling. It doesn't need to accurately control these creations, but it should effectively reduce the excessive number of inactive data mover pods and volume snapshots.

It is not practical to make an accurate control, as it is almost impossible to predict which group of nodes a data mover pod is scheduled to, under the consideration of many complex factors, i.e., selected node, affinity, node OS, etc.
## Goals

- Allow users to configure the expected number of loads pending on waiting for VGDP load concurrency quota
- Create a soothing mechanism to prevent new loads from starting if the number of existing loads exceeds the expected number

## Non-Goals

- Accurately controlling the loads from initiation is not a goal
## Solution

We introduce a new field `prepareQueueLength` in `loadConcurrency` of the `node-agent-configmap` as the allowed number of loads that are under preparing (expose). Specifically, loads are in this situation after their CR is in the `Accepted` or `Prepared` phase. The `prepareQueueLength` should be a positive number; negative numbers will be ignored.

Once the value is set, the soothing mechanism takes effect: as a best effort, only the allowed number of CRs go into the `Accepted` or `Prepared` phase, others wait and stay in the `New` state; and thereby only the allowed number of data mover pods and volume snapshots are created.

Otherwise, node-agent works the same as the legacy behavior: CRs go to the `Accepted` or `Prepared` state as soon as the controllers process them, and data mover pods and volume snapshots are created without any constraints.

If users want to constrain the excessive number of pending data mover pods and volume snapshots, they could set a value by considering the VGDP load concurrency; otherwise, if they don't see constraints for pods or volume snapshots in their environment, they don't need to use this feature, since preparing in parallel could also be beneficial for increasing the concurrency.

The node-agent server checks this configuration at startup time and uses it to initialize the related VGDP modules. Therefore, users could edit this ConfigMap at any time, but in order to make the changes effective, the node-agent server needs to be restarted.

The data structure is as below:
```go
type LoadConcurrency struct {
    // GlobalConfig specifies the concurrency number to all nodes for which per-node config is not specified
    GlobalConfig int `json:"globalConfig,omitempty"`

    // PerNodeConfig specifies the concurrency number to nodes matched by rules
    PerNodeConfig []RuledConfigs `json:"perNodeConfig,omitempty"`

    // PrepareQueueLength specifies the max number of loads that are under expose
    PrepareQueueLength int `json:"prepareQueueLength,omitempty"`
}
```
### Sample

A sample of the ConfigMap is as below:
```json
{
    "loadConcurrency": {
        "globalConfig": 2,
        "perNodeConfig": [
            {
                "nodeSelector": {
                    "matchLabels": {
                        "kubernetes.io/hostname": "node1"
                    }
                },
                "number": 3
            },
            {
                "nodeSelector": {
                    "matchLabels": {
                        "beta.kubernetes.io/instance-type": "Standard_B4ms"
                    }
                },
                "number": 5
            }
        ],
        "prepareQueueLength": 2
    }
}
```
To create the ConfigMap, users need to save something like the above sample to a json file and then run the below command:
```
kubectl create cm <ConfigMap name> -n velero --from-file=<json file name>
```
## Detailed Design

Changes apply to the DataUpload controller, DataDownload controller, PodVolumeBackup controller and PodVolumeRestore controller, as below (a sketch of the count-and-requeue logic follows this list):
1. The soothing happens to data mover CRs (DataUpload, DataDownload, PodVolumeBackup or PodVolumeRestore) that are in the `New` state
2. Before starting to process the CR, the corresponding controller counts the existing CRs under or pending for expose in the cluster, that is, the total number of existing DataUploads, DataDownloads, PodVolumeBackups and PodVolumeRestores that are in either the `Accepted` or `Prepared` state
3. If the total number doesn't exceed the allowed number, the controller sets the CR's phase to `Accepted`
4. Once the total number exceeds the allowed number, the controller gives up processing the CR and has it requeued later. The delay for the requeue is 5 seconds
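The following is a minimal sketch of steps 2~4, assuming a controller-runtime based reconciler; the helper and field names (`countLoadsUnderExpose`, `prepareQueueLength`) are hypothetical and not the actual implementation.

```go
// Sketch only: soothe a New CR before exposing it (names are hypothetical).
func (r *reconciler) sootheNewCR(ctx context.Context) (ctrl.Result, bool, error) {
    if r.prepareQueueLength <= 0 {
        return ctrl.Result{}, true, nil // feature disabled, keep the legacy behavior
    }

    // Count DataUpload/DataDownload/PodVolumeBackup/PodVolumeRestore CRs that are
    // in the Accepted or Prepared phase, reading from the controller's client cache.
    total, err := r.countLoadsUnderExpose(ctx)
    if err != nil {
        return ctrl.Result{}, false, err
    }

    if total >= r.prepareQueueLength {
        // Too many loads are already waiting for a VGDP quota; requeue this CR later.
        return ctrl.Result{RequeueAfter: 5 * time.Second}, false, nil
    }

    // Quota available: the caller may move the CR to Accepted and continue the expose flow.
    return ctrl.Result{}, true, nil
}
```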
The count happens for all the controllers in all nodes. To prevent the checks from overwhelming the API server, the count is done against the controller client caches for those CRs. The count result is also cached, so that the count only happens when necessary. Below shows how the necessity is judged:
- When one or more CRs' phase changes to `Accepted`
- When one or more CRs' phase changes from `Accepted` to one of the terminal phases
- When one or more CRs' phase changes from `Prepared` to one of the terminal phases
- When one or more CRs' phase changes from `Prepared` to `InProgress`

Ideally, steps 2~3 above need to be synchronized among the controllers in all nodes. However, this synchronization is not implemented, based on the considerations below:
1. It is impossible to accurately synchronize the count among controllers in different nodes, because the client cache is not coherent among nodes.
2. It is possible to synchronize the count among controllers in the same node. However, it is too expensive to make this synchronization, because steps 2~3 are part of the expose workflow, and the synchronization impacts the performance and stability of the existing workflow.
3. Even without the synchronization, the soothing mechanism still works eventually -- when the controllers see all the discharged loads (expected ones and over-discharged ones), they will stop creating new loads until the quota is available again.
4. Steps 2~3 that need to be synchronized complete very quickly.

This is why we say this mechanism is not an accurate control. In other words, it is possible that more loads than the number of `prepareQueueLength` are discharged if controllers perform the count and expose in overlapping time windows (steps 2~3).
For example, when multiple controllers of the same type (DataUpload, DataDownload, PodVolumeBackup or PodVolumeRestore) from different nodes make the count:
```
max number of waiting loads = number defined by `prepareQueueLength` + number of nodes in cluster
```
As another example, when hybrid loads are running the count concurrently, e.g., a mix of data mover backups, data mover restores, pod volume backups or pod volume restores, more loads may be discharged, and the number depends on the number of concurrent hybrid loads.
In either case, because steps 2~3 are short in time, it is less likely to reach the theoretically worst result.

[1]: Implemented/unified-repo-and-kopia-integration/unified-repo-and-kopia-integration.md
[2]: Implemented/node-agent-concurrency.md
[3]: Implemented/volume-snapshot-data-movement/volume-snapshot-data-movement.md
[4]: Implemented/vgdp-micro-service/vgdp-micro-service.md
[5]: vgdp-micro-service-for-fs-backup/vgdp-micro-service-for-fs-backup.md
design/Implemented/vgdp-affinity-enhancement.md (new file, 257 lines)
@@ -0,0 +1,257 @@
# Velero Generic Data Path Load Affinity Enhancement Design

## Glossary & Abbreviation

**Velero Generic Data Path (VGDP)**: VGDP is the collective of modules introduced in the [Unified Repository design][1]. Velero uses these modules to finish data transfer for various purposes (i.e., PodVolume backup/restore, Volume Snapshot Data Movement). VGDP modules include uploaders and the backup repository.

**Exposer**: Exposer is a module introduced in the [Volume Snapshot Data Movement design][2]. Velero uses this module to expose the volume snapshots to Velero node-agent pods or node-agent associated pods so as to complete the data movement from the snapshots.
## Background

The implemented [VGDP LoadAffinity design][3] already defined a structure `LoadAffinity` in the `--node-agent-configmap` parameter. The parameter is used to set the affinity of the backupPod of VGDP.

There are still some limitations in this design:
* The affinity setting is global. Say there are two StorageClasses and the underlying storage can only provision volumes to part of the cluster nodes, and the sets of supported nodes don't intersect. Then the affinity will definitely not work in some cases.
* The old design focuses on the backupPod affinity, but the restorePod also needs the affinity setting.

As a result, this design is created to address the limitations.
## Goals

- Enhance the node affinity of VGDP instances for volume snapshot data movement: add per-StorageClass node affinity.
- Enhance the node affinity of VGDP instances for volume snapshot data movement: support the OR logic between affinity selectors.
- Define the behaviors of node affinity of VGDP instances in node-agent for volume snapshot data movement restore, when the PVC restore doesn't require delay binding.

## Non-Goals

- It is also beneficial to support VGDP instance affinity for PodVolume backup/restore; this will be implemented after the PodVolume micro service completes.
## Solution

This design still uses the ConfigMap specified by the `velero node-agent` CLI's parameter `--node-agent-configmap` to host the node affinity configurations.

Upon the `[]*LoadAffinity` structure introduced by the implemented [VGDP LoadAffinity design][3], this design adds a new field `StorageClass`. This field is optional.
* If the `LoadAffinity` element's `StorageClass` doesn't have a value, the element applies globally, just as in the old design.
* If the `LoadAffinity` element's `StorageClass` has a value, the element applies only to the VGDP instances whose PVCs use the specified StorageClass.
* A `LoadAffinity` element whose `StorageClass` has a value has higher priority than a `LoadAffinity` element whose `StorageClass` doesn't have a value.
```go
type Configs struct {
    // LoadConcurrency is the config for load concurrency per node.
    LoadConcurrency *LoadConcurrency `json:"loadConcurrency,omitempty"`

    // LoadAffinity is the config for data path load affinity.
    LoadAffinity []*LoadAffinity `json:"loadAffinity,omitempty"`
}

type LoadAffinity struct {
    // NodeSelector specifies the label selector to match nodes
    NodeSelector metav1.LabelSelector `json:"nodeSelector"`
}
```

```go
type LoadAffinity struct {
    // NodeSelector specifies the label selector to match nodes
    NodeSelector metav1.LabelSelector `json:"nodeSelector"`

    // StorageClass specifies the VGDPs the LoadAffinity applies to. If StorageClass doesn't have a value, it applies to all. Otherwise, it applies only to the VGDPs that use this StorageClass.
    StorageClass string `json:"storageClass"`
}
```
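To make the selection rule above concrete, here is a minimal sketch (an assumption, not the actual exposer code; the function name is hypothetical) of how the affinity list for a VGDP pod could be filtered by the final StorageClass. Multiple matching entries are then combined with OR logic when building the pod's node affinity, per the goal above.

```go
// Sketch only: return the affinity entries that apply to a VGDP pod whose
// backupPVC/restorePVC uses storageClass. Per-StorageClass entries win over global ones.
func selectLoadAffinity(affinities []*LoadAffinity, storageClass string) []*LoadAffinity {
    perSC := []*LoadAffinity{}
    global := []*LoadAffinity{}

    for _, affinity := range affinities {
        switch affinity.StorageClass {
        case storageClass:
            perSC = append(perSC, affinity)
        case "":
            global = append(global, affinity)
        }
    }

    if len(perSC) > 0 {
        return perSC // entries dedicated to this StorageClass take priority
    }
    return global // fall back to the global entries, or none
}
```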
### Decision Tree

```mermaid
flowchart TD
    A[VGDP Pod Needs Scheduling] --> B{Is this a restore operation?}

    B -->|Yes| C{StorageClass has volumeBindingMode: WaitForFirstConsumer?}
    B -->|No| D[Backup Operation]

    C -->|Yes| E{restorePVC.ignoreDelayBinding = true?}
    C -->|No| F[StorageClass binding mode: Immediate]

    E -->|No| G[Wait for target Pod scheduling<br/>Use Pod's selected node<br/>⚠️ Affinity rules ignored]
    E -->|Yes| H[Apply affinity rules<br/>despite WaitForFirstConsumer]

    F --> I{Check StorageClass in loadAffinity by StorageClass field}
    H --> I
    D --> J{Using backupPVC with different StorageClass?}

    J -->|Yes| K[Use final StorageClass<br/>for affinity lookup]
    J -->|No| L[Use original PVC StorageClass<br/>for affinity lookup]

    K --> I
    L --> I

    I -->|StorageClass found| N[Filter the LoadAffinity by <br/>the StorageClass<br/>🎯 and apply the LoadAffinity HIGHEST PRIORITY]
    I -->|StorageClass not found| O{Check loadAffinity element without StorageClass field}

    O -->|No loadAffinity configured| R[No affinity constraints<br/>Schedule on any available node<br/>🌐 DEFAULT]

    O --> V[Validate node-agent availability<br/>⚠️ Ensure node-agent pods exist on target nodes]
    N --> V

    V --> W{Node-agent available on selected nodes?}
    W -->|Yes| X[✅ VGDP Pod scheduled successfully]
    W -->|No| Y[❌ Pod stays in Pending state<br/>Timeout after 30min<br/>Check node-agent DaemonSet coverage]

    R --> Z[Schedule on any node<br/>✅ Basic scheduling]

    %% Styling
    classDef successNode fill:#d4edda,stroke:#155724,color:#155724
    classDef warningNode fill:#fff3cd,stroke:#856404,color:#856404
    classDef errorNode fill:#f8d7da,stroke:#721c24,color:#721c24
    classDef priorityHigh fill:#e7f3ff,stroke:#0066cc,color:#0066cc
    classDef priorityMedium fill:#f0f8ff,stroke:#4d94ff,color:#4d94ff
    classDef priorityDefault fill:#f8f9fa,stroke:#6c757d,color:#6c757d

    class X,Z successNode
    class G,V,Y warningNode
    class Y errorNode
    class N,T,U priorityHigh
    class P,Q priorityMedium
    class R priorityDefault
```
### Examples

#### LoadAffinity interacts with LoadAffinityPerStorageClass

``` json
{
    "loadAffinity": [
        {
            "nodeSelector": {
                "matchLabels": {
                    "beta.kubernetes.io/instance-type": "Standard_B4ms"
                }
            }
        },
        {
            "nodeSelector": {
                "matchExpressions": [
                    {
                        "key": "kubernetes.io/os",
                        "values": [
                            "linux"
                        ],
                        "operator": "In"
                    }
                ]
            },
            "storageClass": "kibishii-storage-class"
        },
        {
            "nodeSelector": {
                "matchLabels": {
                    "beta.kubernetes.io/instance-type": "Standard_B8ms"
                }
            },
            "storageClass": "kibishii-storage-class"
        }
    ]
}
```

This sample demonstrates how the `loadAffinity` elements with the `StorageClass` field and without the `StorageClass` field work together.
If the VGDP mounting volume is created from StorageClass `kibishii-storage-class`, its pod will run on Linux nodes or on nodes whose instance type is `Standard_B8ms`.

The other VGDP instances will run on nodes whose instance type is `Standard_B4ms`.
#### LoadAffinity interacts with BackupPVC

``` json
{
    "loadAffinity": [
        {
            "nodeSelector": {
                "matchLabels": {
                    "beta.kubernetes.io/instance-type": "Standard_B4ms"
                }
            },
            "storageClass": "kibishii-storage-class"
        },
        {
            "nodeSelector": {
                "matchLabels": {
                    "beta.kubernetes.io/instance-type": "Standard_B2ms"
                }
            },
            "storageClass": "worker-storagepolicy"
        }
    ],
    "backupPVC": {
        "kibishii-storage-class": {
            "storageClass": "worker-storagepolicy"
        }
    }
}
```

The Velero data mover supports using a different StorageClass to create the backupPVC, per this [design](https://github.com/vmware-tanzu/velero/pull/7982).

In this example, if the backup target PVC's StorageClass is `kibishii-storage-class`, its backupPVC uses StorageClass `worker-storagepolicy`. Because the final StorageClass is `worker-storagepolicy`, the backupPod uses the load affinity specified by the `loadAffinity` element whose `StorageClass` field is set to `worker-storagepolicy`. The backupPod will be assigned to nodes whose instance type is `Standard_B2ms`.
#### LoadAffinity interacts with RestorePVC

``` json
{
    "loadAffinity": [
        {
            "nodeSelector": {
                "matchLabels": {
                    "beta.kubernetes.io/instance-type": "Standard_B4ms"
                }
            },
            "storageClass": "kibishii-storage-class"
        }
    ],
    "restorePVC": {
        "ignoreDelayBinding": false
    }
}
```

##### StorageClass's bind mode is WaitForFirstConsumer

``` yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: kibishii-storage-class
parameters:
  svStorageClass: worker-storagepolicy
provisioner: csi.vsphere.vmware.com
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
```

Say the restorePVC should be created from StorageClass `kibishii-storage-class`, and its volumeBindingMode is `WaitForFirstConsumer`.
Although `loadAffinityPerStorageClass` has a section that matches the StorageClass, `ignoreDelayBinding` is set to `false`, so the Velero exposer will wait until the target Pod is scheduled to a node, and returns that node as the SelectedNode for the restorePVC.
As a result, the `loadAffinityPerStorageClass` setting will not take effect.

##### StorageClass's bind mode is Immediate

``` yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: kibishii-storage-class
parameters:
  svStorageClass: worker-storagepolicy
provisioner: csi.vsphere.vmware.com
reclaimPolicy: Delete
volumeBindingMode: Immediate
```

Because the StorageClass volumeBindingMode is `Immediate`, even though `ignoreDelayBinding` is set to `false`, the restorePVC will not be created according to the target Pod.

The restorePod will be assigned to nodes whose instance type is `Standard_B4ms`.

[1]: Implemented/unified-repo-and-kopia-integration/unified-repo-and-kopia-integration.md
[2]: Implemented/volume-snapshot-data-movement/volume-snapshot-data-movement.md
[3]: Implemented/node-agent-affinity.md
design/Implemented/volume-group-snapshot.md (new file, 611 lines)
@@ -0,0 +1,611 @@
# Add Support for VolumeGroupSnapshots

This proposal outlines the design and implementation plan for incorporating VolumeGroupSnapshot support into Velero. The enhancement will allow Velero to perform consistent, atomic snapshots of groups of volumes using the new Kubernetes [VolumeGroupSnapshot API](https://kubernetes.io/blog/2024/12/18/kubernetes-1-32-volume-group-snapshot-beta/). This capability is especially critical for stateful applications that rely on multiple volumes to ensure data consistency, such as databases and analytics workloads.

## Glossary & Abbreviation

Terminology used in this document:
- VGS: VolumeGroupSnapshot
- VS: VolumeSnapshot
- VGSC: VolumeGroupSnapshotContent
- VSC: VolumeSnapshotContent
- VGSClass: VolumeGroupSnapshotClass
- VSClass: VolumeSnapshotClass
## Background

Velero currently enables snapshot-based backups on an individual volume basis through CSI drivers. However, modern stateful applications often require multiple volumes for data, logs, and backups. This distributed data architecture increases the risk of inconsistencies when volumes are captured individually. Kubernetes has introduced the VolumeGroupSnapshot (VGS) API [(KEP-3476)](https://github.com/kubernetes/enhancements/pull/1551), which allows for the atomic snapshotting of multiple volumes in a coordinated manner. By integrating this feature, Velero can offer enhanced disaster recovery for multi-volume applications, ensuring consistency across all related data.

## Goals
- Ensure that multiple related volumes are snapshotted simultaneously, preserving consistency for stateful applications via the VolumeGroupSnapshot (VGS) API.
- Integrate VolumeGroupSnapshot functionality into Velero's existing backup and restore workflows.
- Allow users to opt in to volume group snapshots by specifying the group label.

## Non-Goals
- The proposal does not require a complete overhaul of Velero's CSI integration; it will extend the current mechanism to support group snapshots.
- No changes pertaining to the execution of Restore Hooks
## High-Level Design

### Backup workflow:
#### Accept the label to be used for VGS from the user:
- Accept the label from the user; we will do this in 3 ways:
  - Firstly, we will have a hard-coded default label key like `velero.io/volume-group-snapshot` that the users can directly use on their PVCs.
  - Secondly, we will let the users override this default VGS label via a velero server arg, `--volume-group-snapshot-label-key`, if needed.
  - And finally, we will have the option to override the default label via the Backup API spec, `backup.spec.volumeGroupSnapshotLabelKey`
- In all the instances, the VGS label key will be present on the backup spec; this makes the label key accessible to plugins during the execution of the backup operation.
- This label will enable Velero to filter the PVCs to be included in the VGS spec.
- Users will have to label the PVCs before invoking the backup operation.
- This label would act as a group identifier for the PVCs to be grouped under a specific VGS.
- It will be used to collect the PVCs to be used for a particular instance of a VGS object.

**Note:**
- Modifying or adding the VGS label on PVCs during an active backup operation may lead to unexpected or undesirable backup results. To avoid inconsistencies, ensure PVC labels remain unchanged throughout the backup execution.
- Label Key Precedence: When determining which label key to use for grouping PVCs into a VolumeGroupSnapshot, Velero applies overrides in the following order (highest to lowest):
  - Backup API spec (`backup.spec.volumeGroupSnapshotLabelKey`)
  - Server flag (`--volume-group-snapshot-label-key`)
  - Built-in default (`velero.io/volume-group-snapshot`)

Whichever key wins this precedence is then injected into the Backup spec so that all Velero plugins can uniformly discover and use it during the backup execution (see the sketch below).
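A minimal sketch of this precedence (an assumed helper, not the actual Velero code):

```go
// Sketch only: resolve the effective VGS label key following the precedence above.
func resolveVGSLabelKey(backupSpecKey, serverFlagKey string) string {
    const defaultVGSLabelKey = "velero.io/volume-group-snapshot"

    if backupSpecKey != "" {
        return backupSpecKey // Backup API spec has the highest priority
    }
    if serverFlagKey != "" {
        return serverFlagKey // then the --volume-group-snapshot-label-key server flag
    }
    return defaultVGSLabelKey // built-in default
}
```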
#### Changes to the Existing PVC ItemBlockAction plugin:
- Currently the PVC IBA plugin is applied to PVCs and adds the RelatedItems for the particular PVC into the ItemBlock.
  - At first it checks whether the PVC is bound and VolumeName is non-empty.
  - Then it adds the related PV to the list of relatedItems.
  - Following on, the plugin adds the pods mounting the PVC as relatedItems.
- Now we need to extend this PVC IBA plugin to add the PVCs to be grouped for a particular VGS object, so that they are processed together under an ItemBlock by Velero.
  - First we will check if the PVC that is being processed by the plugin has the user-specified VGS label.
  - If it is present then we will execute a List call in the namespace with the label as the matching criterion and see if this results in any PVCs (other than the current one).
  - If there are PVCs matching the criterion then we add those PVCs to the relatedItems list.
- This helps in building the ItemBlock we need for VGS processing, i.e. we have the relevant pods and PVCs in the ItemBlock.

**Note:** The ItemBlock to VGS relationship will not always be 1:1. There might be scenarios where the ItemBlock has multiple VGS instances associated with it.
Let's go over some ItemBlock/VGS scenarios that we might encounter and visualize them for clarity:
1. Pod Mounts: Pod1 mounts both PVC1 and PVC2.
   Grouping: PVC1 and PVC2 share the same group label (group: A)
   ItemBlock: The item block includes Pod1, PVC1, and PVC2.
   VolumeGroupSnapshot (VGS): Because PVC1 and PVC2 are grouped together by their label, they trigger the creation of a single VGS (labeled with group: A).
```mermaid
flowchart TD
    subgraph ItemBlock
        P1[Pod1]
        PVC1[PVC1 group: A]
        PVC2[PVC2 group: A]
    end

    P1 -->|mounts| PVC1
    P1 -->|mounts| PVC2

    PVC1 --- PVC2

    PVC1 -- "group: A" --> VGS[VGS group: A]
    PVC2 -- "group: A" --> VGS
```
2. Pod Mounts: Pod1 mounts each of the four PVCs.
   Grouping:
   Group A: PVC1 and PVC2 share the same grouping label (group: A).
   Group B: PVC3 and PVC4 share the grouping label (group: B).
   ItemBlock: All objects (Pod1, PVC1, PVC2, PVC3, and PVC4) are collected into a single item block.
   VolumeGroupSnapshots:
   PVC1 and PVC2 (group A) point to the same VGS (VGS (group: A)).
   PVC3 and PVC4 (group B) point to a different VGS (VGS (group: B)).
```mermaid
flowchart TD
    subgraph ItemBlock
        P1[Pod1]
        PVC1[PVC1 group: A]
        PVC2[PVC2 group: A]
        PVC3[PVC3 group: B]
        PVC4[PVC4 group: B]
    end

    %% Pod mounts all PVCs
    P1 -->|mounts| PVC1
    P1 -->|mounts| PVC2
    P1 -->|mounts| PVC3
    P1 -->|mounts| PVC4

    %% Group A relationships: PVC1 and PVC2
    PVC1 --- PVC2
    PVC1 -- "group: A" --> VGS_A[VGS-A group: A]
    PVC2 -- "group: A" --> VGS_A

    %% Group B relationships: PVC3 and PVC4
    PVC3 --- PVC4
    PVC3 -- "group: B" --> VGS_B[VGS-B group: B]
    PVC4 -- "group: B" --> VGS_B
```
3. Pod Mounts: Pod1 mounts both PVC1 and PVC2; Pod2 mounts PVC1 and PVC3.
   Grouping:
   Group A: PVC1 and PVC2
   Group B: PVC3
   ItemBlock: All objects (Pod1, Pod2, PVC1, PVC2, and PVC3) are collected into a single item block.
   VolumeGroupSnapshots:
   PVC1 and PVC2 (group A) point to the same VGS (VGS (group: A)).
   PVC3 (group B) points to a different VGS (VGS (group: B)).
```mermaid
flowchart TD
    subgraph ItemBlock
        P1[Pod1]
        P2[Pod2]
        PVC1[PVC1 group: A]
        PVC2[PVC2 group: A]
        PVC3[PVC3 group: B]
    end

    %% Pod mount relationships
    P1 -->|mounts| PVC1
    P1 -->|mounts| PVC2
    P2 -->|mounts| PVC1
    P2 -->|mounts| PVC3

    %% Grouping for Group A: PVC1 and PVC2 are grouped into VGS_A
    PVC1 --- PVC2
    PVC1 -- "Group A" --> VGS_A[VGS Group A]
    PVC2 -- "Group A" --> VGS_A

    %% Grouping for Group B: PVC3 grouped into VGS_B
    PVC3 -- "Group B" --> VGS_B[VGS Group B]
```
#### Updates to CSI PVC plugin:
The CSI PVC plugin now supports obtaining a VolumeSnapshot (VS) reference for a PVC in three ways, and then applies common branching for datamover and non-datamover workflows:

- Scenario 1: PVC has a VGS label and no VS (created via the VGS workflow) exists for its volume group:
  - Determine VGSClass: The plugin will pick the `VolumeGroupSnapshotClass` by following the same tier-based precedence as it does for individual `VolumeSnapshotClasses`:
    - Default by Label: Use the one VGSClass labeled
      ```yaml
      metadata:
        labels:
          velero.io/csi-volumegroupsnapshot-class: "true"
      ```
      whose `spec.driver` matches the CSI driver used by the PVCs.
    - Backup-level Override: If the Backup CR has an annotation
      ```yaml
      metadata:
        annotations:
          velero.io/csi-volumegroupsnapshot-class_<driver>: <className>
      ```
      (with <driver> equal to the PVCs' CSI driver), use that class.
    - PVC-level Override: Finally, if the PVC itself carries an annotation
      ```yaml
      metadata:
        annotations:
          velero.io/csi-volume-group-snapshot-class: <className>
      ```
      and that class exists, use it.

    At each step, if the plugin finds zero or multiple matching classes, VGS creation is skipped and the backup fails.
  - Create VGS: The plugin creates a new VolumeGroupSnapshot (VGS) for the PVC's volume group. This action automatically triggers creation of the corresponding VGSC, VS, and VSC objects.
  - Wait for VS Status: The plugin waits until each VS (one per PVC in the group) has its `volumeGroupSnapshotName` populated. This confirms that the snapshot controller has completed its work. `CSISnapshotTimeout` will be used here (see the sketch after this list).
  - Update VS Objects: Once the VS objects are provisioned, the plugin updates them by removing VGS owner references and VGS-related finalizers, and by adding backup metadata labels (including BackupName, BackupUUID, and PVC name). These labels are later used to detect an existing VS when processing another PVC of the same group.
  - Patch and Cleanup: The plugin patches the deletionPolicy of the VGSC to "Retain" (ensuring that deletion of the VGSC does not remove the underlying VSC objects or storage snapshots) and then deletes the temporary VGS and VGSC objects.

- Scenario 2: PVC has a VGS label and a VS created via an earlier VGS workflow already exists:
  - The plugin lists VS objects in the PVC's namespace using backup metadata labels (BackupUID, BackupName, and PVCName).
  - It verifies that at least one VS has a non-empty `volumeGroupSnapshotName` in its status.
  - If such a VS exists, the plugin skips creating a new VGS (or VS) and proceeds with the legacy workflow using the existing VS.
  - If a VS is found but its status does not indicate it was created by the VGS workflow (i.e. its `volumeGroupSnapshotName` is empty), the backup for that PVC fails, resulting in a partially failed backup.
- Scenario 3: PVC does not have a VGS label:
  - The legacy workflow is followed, and an individual VolumeSnapshot (VS) is created for the PVC.
- Common Branching for Datamover and Non-datamover Workflows:
  - Once a VS reference (`vsRef`) is determined, whether through the VGS workflow (Scenario 1 or 2) or the legacy workflow (Scenario 3), the plugin then applies the common branching:
    - Non-datamover Case: The VS reference is directly added as an additional backup item.
    - Datamover Case: The plugin waits until the VS's associated VSC snapshot handle is ready (using the configured CSISnapshotTimeout), then creates a DataUpload for the VS-PVC pair. The resulting DataUpload is then added as an additional backup item.
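A minimal sketch of the "Wait for VS Status" step above, assuming a controller-runtime client; the helper name is hypothetical, and the exact Go field for `volumeGroupSnapshotName` may differ from what is shown.

```go
// Sketch only: wait until every VS created by the VGS workflow reports its
// volumeGroupSnapshotName (field name as described above; the exact Go type may differ).
func waitForGroupedVS(ctx context.Context, c crclient.Client, ns string, vsNames []string, timeout time.Duration) error {
    return wait.PollUntilContextTimeout(ctx, 5*time.Second, timeout, true,
        func(ctx context.Context) (bool, error) {
            for _, name := range vsNames {
                vs := new(snapshotv1api.VolumeSnapshot)
                if err := c.Get(ctx, crclient.ObjectKey{Namespace: ns, Name: name}, vs); err != nil {
                    return false, errors.Wrapf(err, "failed to get VolumeSnapshot %s/%s", ns, name)
                }
                if vs.Status == nil || vs.Status.VolumeGroupSnapshotName == nil ||
                    *vs.Status.VolumeGroupSnapshotName == "" {
                    // The snapshot controller hasn't finished the group snapshot yet; keep polling.
                    return false, nil
                }
            }
            return true, nil
        })
}
```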
```mermaid
flowchart TD
    %% Section 1: Accept VGS Label from User
    subgraph Accept_Label
        A1[User sets VGS label key using default velero.io/volume-group-snapshot or via server arg or Backup API spec]
        A2[User labels PVCs before backup]
        A1 --> A2
    end

    %% Section 2: PVC ItemBlockAction Plugin Extension
    subgraph PVC_ItemBlockAction
        B1[Check PVC is bound and has VolumeName]
        B2[Add related PV to relatedItems]
        B3[Add pods mounting PVC to relatedItems]
        B4[Check if PVC has user-specified VGS label]
        B5[List PVCs in namespace matching label criteria]
        B6[Add matching PVCs to relatedItems]
        B1 --> B2 --> B3 --> B4
        B4 -- Yes --> B5
        B5 --> B6
    end

    %% Section 3: CSI PVC Plugin Updates
    subgraph CSI_PVC_Plugin
        C1[For each PVC, check for VGS label]
        C1 -- Has VGS label --> C2[Determine scenario]
        C1 -- No VGS label --> C16[Scenario 3: Legacy workflow - create individual VS]

        %% Scenario 1: No existing VS via VGS exists
        subgraph Scenario1[Scenario 1: No existing VS via VGS]
            S1[List grouped PVCs using VGS label]
            S2[Determine CSI driver for grouped PVCs]
            S3[If single CSI driver then select matching VGSClass; else fail backup]
            S4[Create new VGS triggering VGSC, VS, and VSC creation]
            S5[Wait for VS objects to have nonempty volumeGroupSnapshotName]
            S6[Update VS objects; remove VGS owner refs and finalizers; add backup metadata labels]
            S7[Patch VGSC deletionPolicy to Retain]
            S8[Delete transient VGS and VGSC]
            S1 --> S2 --> S3 --> S4 --> S5 --> S6 --> S7 --> S8
        end

        %% Scenario 2: Existing VS via VGS exists
        subgraph Scenario2[Scenario 2: Existing VS via VGS exists]
            S9[List VS objects using backup metadata - BackupUID, BackupName, PVCName]
            S10[Check if any VS has nonempty volumeGroupSnapshotName]
            S9 --> S10
            S10 -- Yes --> S11[Use existing VS]
            S10 -- No --> S12[Fail backup for PVC]
        end

        C2 -- Scenario1 applies --> S1
        C2 -- Scenario2 applies --> S9

        %% Common Branch: After obtaining a VS reference
        subgraph Common_Branch[Common Branch]
            CB1[Obtain VS reference as vsRef]
            CB2[If non-datamover, add vsRef as additional backup item]
            CB3[If datamover, wait for VSC handle and create DataUpload; add DataUpload as additional backup item]
            CB1 --> CB2
            CB1 --> CB3
        end

        %% Connect Scenario outcomes and legacy branch to the common branch
        S8 --> CB1
        S11 --> CB1
        C16 --> CB1
    end

    %% Overall Flow Connections
    A2 --> B1
    B6 --> C1
```
### Restore workflow:

- No changes required for the restore workflow.

## Detailed Design
Backup workflow:
|
||||
- Accept the label to be used for VGS from the user as a server argument:
|
||||
- Set a default VGS label key to be used:
|
||||
```go
|
||||
// default VolumeGroupSnapshot Label
|
||||
defaultVGSLabelKey = "velero.io/volume-group-snapshot"
|
||||
|
||||
```
|
||||
- Add this as a server flag and pass it to backup reconciler, so that we can use it during the backup request execution.
|
||||
```go
|
||||
flags.StringVar(&c.DefaultVGSLabelKey, "volume-group-snapshot-label-key", c.DefaultVGSLabelKey, "Label key for grouping PVCs into VolumeGroupSnapshot")
|
||||
```
|
||||
|
||||
- Update the Backup CRD to accept the VGS Label Key as a spec value:
|
||||
```go
|
||||
// VolumeGroupSnapshotLabelKey specifies the label key to be used for grouping the PVCs under
|
||||
// an instance of VolumeGroupSnapshot, if left unspecified velero.io/volume-group-snapshot is used
|
||||
// +optional
|
||||
VolumeGroupSnapshotLabelKey string `json:"volumeGroupSnapshotLabelKey,omitempty"`
|
||||
```

- Modify the [`prepareBackupRequest` function](https://github.com/openshift/velero/blob/8c8a6cccd78b78bd797e40189b0b9bee46a97f9e/pkg/controller/backup_controller.go#L327) to set the default label key in the backup spec if the user does not specify a value:

```go
if len(request.Spec.VolumeGroupSnapshotLabelKey) == 0 {
    // set the default key value
    request.Spec.VolumeGroupSnapshotLabelKey = b.defaultVGSLabelKey
}
```

- Changes to the existing [PVC ItemBlockAction plugin](https://github.com/vmware-tanzu/velero/blob/512199723ff95d5016b32e91e3bf06b65f57d608/pkg/itemblock/actions/pvc_action.go#L64) (update the `GetRelatedItems` function):

```go
// Retrieve the VGS label key from the Backup spec.
vgsLabelKey := backup.Spec.VolumeGroupSnapshotLabelKey
if vgsLabelKey != "" {
    // Check if the PVC has the specified VGS label.
    if groupID, ok := pvc.Labels[vgsLabelKey]; ok {
        // List all PVCs in the namespace with the same label key and value (i.e. the same group).
        pvcList := new(corev1api.PersistentVolumeClaimList)
        if err := a.crClient.List(context.Background(), pvcList, crclient.InNamespace(pvc.Namespace), crclient.MatchingLabels{vgsLabelKey: groupID}); err != nil {
            return nil, errors.Wrap(err, "failed to list PVCs for VGS grouping")
        }
        // Add each matching PVC (except the current one) to the relatedItems.
        for _, groupPVC := range pvcList.Items {
            if groupPVC.Name == pvc.Name {
                continue
            }
            a.log.Infof("Adding grouped PVC %s to relatedItems for PVC %s", groupPVC.Name, pvc.Name)
            relatedItems = append(relatedItems, velero.ResourceIdentifier{
                GroupResource: kuberesource.PersistentVolumeClaims,
                Namespace:     groupPVC.Namespace,
                Name:          groupPVC.Name,
            })
        }
    }
} else {
    a.log.Info("No VolumeGroupSnapshotLabelKey provided in backup spec; skipping PVC grouping")
}
```

- Updates to the [CSI PVC plugin](https://github.com/vmware-tanzu/velero/blob/512199723ff95d5016b32e91e3bf06b65f57d608/pkg/backup/actions/csi/pvc_action.go#L200) (update the `Execute` method):

```go
func (p *pvcBackupItemAction) Execute(
    item runtime.Unstructured,
    backup *velerov1api.Backup,
) (
    runtime.Unstructured,
    []velero.ResourceIdentifier,
    string,
    []velero.ResourceIdentifier,
    error,
) {
    p.log.Info("Starting PVCBackupItemAction")

    // Validate backup policy and PVC/PV
    if valid := p.validateBackup(*backup); !valid {
        return item, nil, "", nil, nil
    }

    var pvc corev1api.PersistentVolumeClaim
    if err := runtime.DefaultUnstructuredConverter.FromUnstructured(item.UnstructuredContent(), &pvc); err != nil {
        return nil, nil, "", nil, errors.WithStack(err)
    }
    if valid, item, err := p.validatePVCandPV(pvc, item); !valid {
        if err != nil {
            return nil, nil, "", nil, err
        }
        return item, nil, "", nil, nil
    }

    shouldSnapshot, err := volumehelper.ShouldPerformSnapshotWithBackup(
        item,
        kuberesource.PersistentVolumeClaims,
        *backup,
        p.crClient,
        p.log,
    )
    if err != nil {
        return nil, nil, "", nil, err
    }
    if !shouldSnapshot {
        p.log.Debugf("CSI plugin skip snapshot for PVC %s according to VolumeHelper setting", pvc.Namespace+"/"+pvc.Name)
        return nil, nil, "", nil, nil
    }

    var additionalItems []velero.ResourceIdentifier
    var operationID string
    var itemToUpdate []velero.ResourceIdentifier
    // annotations is declared here because both the datamover branch and the
    // common branch below add entries to it.
    annotations := map[string]string{}

    // vsRef will be our common reference to the VolumeSnapshot (VS)
    var vsRef *corev1api.ObjectReference

    // Retrieve the VGS label key from the backup spec.
    vgsLabelKey := backup.Spec.VolumeGroupSnapshotLabelKey

    // Check if the PVC has the user-specified VGS label.
    if group, ok := pvc.Labels[vgsLabelKey]; ok && group != "" {
        p.log.Infof("PVC %s has VGS label with group %s", pvc.Name, group)
        // --- VGS branch ---
        // 1. Check if a VS created via a VGS workflow exists for this PVC.
        existingVS, err := p.findExistingVSForBackup(backup.UID, backup.Name, pvc.Name, pvc.Namespace)
        if err != nil {
            return nil, nil, "", nil, err
        }
        if existingVS != nil && existingVS.Status.VolumeGroupSnapshotName != "" {
            p.log.Infof("Existing VS %s found for PVC %s in group %s; skipping VGS creation", existingVS.Name, pvc.Name, group)
            vsRef = &corev1api.ObjectReference{
                Namespace: existingVS.Namespace,
                Name:      existingVS.Name,
            }
        } else {
            // 2. No existing VS via VGS; execute the VGS creation workflow.
            groupedPVCs, err := p.listGroupedPVCs(backup, pvc.Namespace, vgsLabelKey, group)
            if err != nil {
                return nil, nil, "", nil, err
            }
            pvcNames := extractPVCNames(groupedPVCs)
            // Determine the CSI driver used by the grouped PVCs.
            driver, err := p.determineCSIDriver(groupedPVCs)
            if err != nil {
                return nil, nil, "", nil, errors.Wrap(err, "failed to determine CSI driver for grouped PVCs")
            }
            if driver == "" {
                return nil, nil, "", nil, errors.New("multiple CSI drivers found for grouped PVCs; failing backup")
            }
            // Retrieve the appropriate VGSClass for the CSI driver.
            vgsClass := p.getVGSClassForDriver(driver)
            p.log.Infof("Determined CSI driver %s with VGSClass %s for PVC group %s", driver, vgsClass, group)

            newVGS, err := p.createVolumeGroupSnapshot(backup, pvc, pvcNames, vgsLabelKey, group, vgsClass)
            if err != nil {
                return nil, nil, "", nil, err
            }
            p.log.Infof("Created new VGS %s for PVC group %s", newVGS.Name, group)

            // Wait for the VS objects created via the VGS to have volumeGroupSnapshotName in status.
            if err := p.waitForVGSAssociatedVS(newVGS, pvc.Namespace, backup.Spec.CSISnapshotTimeout.Duration); err != nil {
                return nil, nil, "", nil, err
            }
            // Update the VS objects: remove VGS owner references and finalizers; add backup metadata labels.
            if err := p.updateVGSCreatedVS(newVGS, backup); err != nil {
                return nil, nil, "", nil, err
            }
            // Patch the VGSC deletionPolicy to Retain.
            if err := p.patchVGSCDeletionPolicy(newVGS, pvc.Namespace); err != nil {
                return nil, nil, "", nil, err
            }
            // Delete the transient VGS and VGSC.
            if err := p.deleteVGSAndVGSC(newVGS, pvc.Namespace); err != nil {
                return nil, nil, "", nil, err
            }
            // Fetch the VS that was created for this PVC via the VGS.
            vs, err := p.getVSForPVC(backup, pvc, vgsLabelKey, group)
            if err != nil {
                return nil, nil, "", nil, err
            }
            vsRef = &corev1api.ObjectReference{
                Namespace: vs.Namespace,
                Name:      vs.Name,
            }
        }
    } else {
        // Legacy workflow: PVC does not have a VGS label; create an individual VS.
        vs, err := p.createVolumeSnapshot(pvc, backup)
        if err != nil {
            return nil, nil, "", nil, err
        }
        vsRef = &corev1api.ObjectReference{
            Namespace: vs.Namespace,
            Name:      vs.Name,
        }
    }

    // --- Common Branch ---
    // Now we have vsRef populated from one of the above cases.
    // Branch further based on backup.Spec.SnapshotMoveData.
    if boolptr.IsSetToTrue(backup.Spec.SnapshotMoveData) {
        // Datamover case:
        operationID = label.GetValidName(
            string(velerov1api.AsyncOperationIDPrefixDataUpload) + string(backup.UID) + "." + string(pvc.UID),
        )
        dataUploadLog := p.log.WithFields(logrus.Fields{
            "Source PVC":     fmt.Sprintf("%s/%s", pvc.Namespace, pvc.Name),
            "VolumeSnapshot": fmt.Sprintf("%s/%s", vsRef.Namespace, vsRef.Name),
            "Operation ID":   operationID,
            "Backup":         backup.Name,
        })
        // Retrieve the current VS using vsRef.
        vs := &snapshotv1api.VolumeSnapshot{}
        if err := p.crClient.Get(context.TODO(), crclient.ObjectKey{Namespace: vsRef.Namespace, Name: vsRef.Name}, vs); err != nil {
            return nil, nil, "", nil, errors.Wrapf(err, "failed to get VolumeSnapshot %s", vsRef.Name)
        }
        // Wait until the VS-associated VSC snapshot handle is ready.
        _, err := csi.WaitUntilVSCHandleIsReady(
            vs,
            p.crClient,
            p.log,
            true,
            backup.Spec.CSISnapshotTimeout.Duration,
        )
        if err != nil {
            dataUploadLog.Errorf("Failed to wait for VolumeSnapshot to become ReadyToUse: %s", err.Error())
            csi.CleanupVolumeSnapshot(vs, p.crClient, p.log)
            return nil, nil, "", nil, errors.WithStack(err)
        }
        dataUploadLog.Info("Starting data upload of backup")
        dataUpload, err := createDataUpload(
            context.Background(),
            backup,
            p.crClient,
            vs,
            &pvc,
            operationID,
        )
        if err != nil {
            dataUploadLog.WithError(err).Error("Failed to submit DataUpload")
            if deleteErr := p.crClient.Delete(context.TODO(), vs); deleteErr != nil && !apierrors.IsNotFound(deleteErr) {
                dataUploadLog.WithError(deleteErr).Error("Failed to delete VolumeSnapshot")
            }
            return item, nil, "", nil, nil
        }
        dataUploadLog.Info("DataUpload submitted successfully")
        itemToUpdate = []velero.ResourceIdentifier{
            {
                GroupResource: schema.GroupResource{
                    Group:    "velero.io",
                    Resource: "datauploads",
                },
                Namespace: dataUpload.Namespace,
                Name:      dataUpload.Name,
            },
        }
        annotations[velerov1api.DataUploadNameAnnotation] = dataUpload.Namespace + "/" + dataUpload.Name
        // For the datamover case, add the dataUpload as an additional item directly.
        vsRef = &corev1api.ObjectReference{
            Namespace: dataUpload.Namespace,
            Name:      dataUpload.Name,
        }
        additionalItems = append(additionalItems, velero.ResourceIdentifier{
            GroupResource: schema.GroupResource{
                Group:    "velero.io",
                Resource: "datauploads",
            },
            Namespace: dataUpload.Namespace,
            Name:      dataUpload.Name,
        })
    } else {
        // Non-datamover case:
        // Use vsRef for snapshot purposes.
        additionalItems = append(additionalItems, convertVSToResourceIdentifiersFromRef(vsRef)...)
        p.log.Infof("VolumeSnapshot additional item added for VS %s", vsRef.Name)
    }

    // Update PVC metadata with common labels and annotations.
    labels := map[string]string{
        velerov1api.VolumeSnapshotLabel: vsRef.Name,
        velerov1api.BackupNameLabel:     backup.Name,
    }
    annotations[velerov1api.VolumeSnapshotLabel] = vsRef.Name
    annotations[velerov1api.MustIncludeAdditionalItemAnnotation] = "true"
    kubeutil.AddAnnotations(&pvc.ObjectMeta, annotations)
    kubeutil.AddLabels(&pvc.ObjectMeta, labels)

    p.log.Infof("Returning from PVCBackupItemAction with %d additionalItems to backup", len(additionalItems))
    for _, ai := range additionalItems {
        p.log.Debugf("%s: %s", ai.GroupResource.String(), ai.Name)
    }

    pvcMap, err := runtime.DefaultUnstructuredConverter.ToUnstructured(&pvc)
    if err != nil {
        return nil, nil, "", nil, errors.WithStack(err)
    }

    return &unstructured.Unstructured{Object: pvcMap},
        additionalItems, operationID, itemToUpdate, nil
}
```
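
The `Execute` sketch above relies on several helpers that are not spelled out in this document (`determineCSIDriver`, `updateVGSCreatedVS`, `patchVGSCDeletionPolicy`, and others). Below is a minimal sketch of two of them, following steps S2/S3 and S6 of the flowchart. The signatures are simplified (for example, `updateVGSCreatedVS` is shown taking the already-listed VS objects rather than the VGS itself) and should be read as assumptions, not final implementations.

```go
// determineCSIDriver returns the single CSI driver backing all grouped PVCs.
// It returns "" when more than one driver is found, which the caller treats
// as a failure, since a VGS can only span volumes of one driver.
func (p *pvcBackupItemAction) determineCSIDriver(
    pvcs []corev1api.PersistentVolumeClaim,
) (string, error) {
    driver := ""
    for i := range pvcs {
        pv := new(corev1api.PersistentVolume)
        if err := p.crClient.Get(
            context.TODO(),
            crclient.ObjectKey{Name: pvcs[i].Spec.VolumeName},
            pv,
        ); err != nil {
            return "", errors.Wrapf(err, "failed to get PV for PVC %s", pvcs[i].Name)
        }
        if pv.Spec.CSI == nil {
            return "", errors.Errorf("PV %s is not provisioned by a CSI driver", pv.Name)
        }
        switch {
        case driver == "":
            driver = pv.Spec.CSI.Driver
        case driver != pv.Spec.CSI.Driver:
            // Multiple drivers in one group: signal the caller to fail the backup.
            return "", nil
        }
    }
    return driver, nil
}

// updateVGSCreatedVS detaches the VS objects created by the transient VGS so
// they survive its deletion: owner references and finalizers are removed and
// the usual Velero backup labels are added (step S6 in the flowchart).
func (p *pvcBackupItemAction) updateVGSCreatedVS(
    vsList []snapshotv1api.VolumeSnapshot,
    backup *velerov1api.Backup,
) error {
    for i := range vsList {
        vs := vsList[i].DeepCopy()
        vs.OwnerReferences = nil
        vs.Finalizers = nil
        if vs.Labels == nil {
            vs.Labels = map[string]string{}
        }
        vs.Labels[velerov1api.BackupNameLabel] = label.GetValidName(backup.Name)
        vs.Labels[velerov1api.BackupUIDLabel] = string(backup.UID)
        if err := p.crClient.Update(context.TODO(), vs); err != nil {
            return errors.Wrapf(err, "failed to update VolumeSnapshot %s", vs.Name)
        }
    }
    return nil
}
```

Keeping the single-driver check in its own helper makes the "multiple CSI drivers" condition an explicit, testable outcome that maps directly to the fail-backup branch in the flowchart.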

## Implementation

This design proposal was targeted for Velero 1.16.

The implementation of this design is targeted for Velero 1.17.

**Note:**
- VGS support is not a requirement on restore. The design does not include any VGS-related elements or considerations in the restore workflow.

## Requirements and Assumptions

- Kubernetes Version:
  - Minimum: v1.32.0 or later, since the VolumeGroupSnapshot API graduated to beta in v1.32.
  - Assumption: CRDs for `VolumeGroupSnapshot`, `VolumeGroupSnapshotClass`, and `VolumeGroupSnapshotContent` are already installed.

- VolumeGroupSnapshot API Availability:
  - If the VGS API group (`groupsnapshot.storage.k8s.io/v1beta1`) is not present, the Velero backup will fail; a minimal detection sketch follows this list.

- CSI Driver Compatibility:
  - Only CSI drivers that implement the VolumeGroupSnapshot admission and controller components support this feature.
  - Upon VGS creation, we assume the driver will atomically snapshot all matching PVCs; if it does not, the plugin may time out.
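
A minimal sketch, assuming a plain client-go discovery client is available, of how the presence of the VGS API group could be checked up front so the backup fails fast with a clear error. The helper name and where it is wired in are illustrative assumptions; only the discovery call itself is standard client-go.

```go
// vgsAPIAvailable reports whether the v1beta1 VolumeGroupSnapshot API group
// is served by the cluster. The helper name and call site are assumptions.
func vgsAPIAvailable(d discovery.DiscoveryInterface) (bool, error) {
    if _, err := d.ServerResourcesForGroupVersion("groupsnapshot.storage.k8s.io/v1beta1"); err != nil {
        if apierrors.IsNotFound(err) {
            return false, nil
        }
        return false, errors.Wrap(err, "failed to query discovery for VolumeGroupSnapshot support")
    }
    return true, nil
}
```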

## Performance Considerations

- Use VGS if you have many similar volumes that must be snapshotted together and you want to minimize API server load.
- Use individual VS if you have only a few volumes, or if you want single-volume failures to be isolated.

## Testing Strategy

- Unit tests: We will add targeted unit tests to cover all new code paths, including existing-VS detection, VGS creation, the legacy VS fallback, and error scenarios.
- E2E tests: On a Kind cluster with a CSI driver that supports group snapshots, deploy an application with multiple PVCs, execute a Velero backup and restore, and verify that a VGS is created, all underlying VS objects reach ReadyToUse, and every PVC is restored successfully.
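
As one concrete example of the unit-test coverage, a table-style test of the single-driver check could use the controller-runtime fake client. This sketch depends on the `determineCSIDriver` helper sketched earlier and assumes unexported plugin fields (`crClient`, `log`); it is illustrative only.

```go
func TestDetermineCSIDriver(t *testing.T) {
    scheme := runtime.NewScheme()
    require.NoError(t, corev1api.AddToScheme(scheme))

    // Helper constructors for test fixtures.
    pv := func(name, driver string) *corev1api.PersistentVolume {
        return &corev1api.PersistentVolume{
            ObjectMeta: metav1.ObjectMeta{Name: name},
            Spec: corev1api.PersistentVolumeSpec{
                PersistentVolumeSource: corev1api.PersistentVolumeSource{
                    CSI: &corev1api.CSIPersistentVolumeSource{Driver: driver},
                },
            },
        }
    }
    pvc := func(name, volume string) corev1api.PersistentVolumeClaim {
        return corev1api.PersistentVolumeClaim{
            ObjectMeta: metav1.ObjectMeta{Name: name, Namespace: "app"},
            Spec:       corev1api.PersistentVolumeClaimSpec{VolumeName: volume},
        }
    }

    client := fake.NewClientBuilder().WithScheme(scheme).
        WithObjects(pv("pv-1", "csi.example.com"), pv("pv-2", "csi.example.com"), pv("pv-3", "other.csi.io")).
        Build()
    action := &pvcBackupItemAction{crClient: client, log: logrus.New()}

    // Same driver for all PVCs in the group: the driver name is returned.
    driver, err := action.determineCSIDriver([]corev1api.PersistentVolumeClaim{pvc("data-1", "pv-1"), pvc("data-2", "pv-2")})
    require.NoError(t, err)
    require.Equal(t, "csi.example.com", driver)

    // Mixed drivers: an empty string signals the caller to fail the backup.
    driver, err = action.determineCSIDriver([]corev1api.PersistentVolumeClaim{pvc("data-1", "pv-1"), pvc("data-3", "pv-3")})
    require.NoError(t, err)
    require.Empty(t, driver)
}
```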