* Specify the Kind explicitly in the API resource
Specify the Kind explicitly in the API resource to avoid wrong Kind conversion
* Do not attempt to restore a resource with no available GVK in the cluster (#7322)
Check for GVK before attempting restore.
---------
Signed-off-by: Wenkai Yin(尹文开) <yinw@vmware.com>
Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
Co-authored-by: Tiger Kaovilai <tkaovila@redhat.com>
Make "disable-informer-cache" option false (enabled) by default to keep it consistent with the help message
Fixes #7264
Signed-off-by: Wenkai Yin(尹文开) <yinw@vmware.com>
Fixes #7263
This commit makes the data structures more consistent: namespaces,
as cluster-scoped resources, will not have "targetNamespace" in the
"restoreableItem" instance.
Signed-off-by: Daniel Jiang <jiangd@vmware.com>
1. Add a sleep to avoid the snapshot limitation issue https://docs.aws.amazon.com/AWSEC2/latest/APIReference/errors-overview.html#:~:text=SnapshotCreationPerVolumeRateExceeded;
2. Move the InstallVelero variable out of the Veleroconfig struct and make it global, since it does not control any individual case;
3. Unskip the migration test case on the AWS pipeline: we added a new EKS pipeline and deleted the TKG AWS pipeline in the internal E2E test, so the restriction for the TKG AWS pipeline no longer applies;
4. Skip the retainPV test on the vSphere pipeline due to the issue of PV binding taking a long time;
5. Fix the failure to get CSI snapshots from EC2: snapshots taken by CSI have no backup name label.
Signed-off-by: danfengl <danfengl@vmware.com>
VolumeInfo contains several sub-structures. They are filled for
different scenarios. Do not generate empty structures for the
sub-structures that are not filled.
Signed-off-by: Xun Jiang <jxun@vmware.com>
Update CSIVolumeSnapshotsCompleted in backup's status and the metric
during backup finalize stage according to async operations content.
Signed-off-by: Xun Jiang <jxun@vmware.com>
Fixes #1970
Namespaces will be handled as cluster-scoped resources, but for
consistency they will still be created via the "Ensure namespace" flow.
Signed-off-by: Daniel Jiang <jiangd@vmware.com>
This commit makes sure that if no snapshot is taken of a PV because the flag
SnapshotVolumes is set to false in a backup CR, the PV is also
tracked as skipped in the tracker.
Signed-off-by: Daniel Jiang <jiangd@vmware.com>
Modify design according to comments.
Add PVInfo structure.
Add backup VolumeInfo's object storage's put and get methods.
Signed-off-by: Xun Jiang <jxun@vmware.com>
Remove dependency on the generated client from pkg/cmd/cli/snapshotLocation.
Remove the Velero generated informer from PVB and PVR.
Remove dependency of generated client from pkg/podvolume directory.
Replace generated codec with runtime codec.
Signed-off-by: Xun Jiang <jxun@vmware.com>
enabled, before executing the action.
The DeleteItemAction is not checked, because the DIA doesn't have a
method to get the action's plugin name.
This should be OK, because the CSI will check whether the VS and VSC
have a backup name annotation. If the VS and VSC are not handled by
the CSI plugin, then they don't have the annotation.
Signed-off-by: Xun Jiang <jxun@vmware.com>
PVC block mode backup and restore introduced some OS specific
system calls. Those calls are not available for Windows, so
add both non-Windows and Windows versions of the code, and
return an error for block mode on the Windows platform.
Signed-off-by: Xun Jiang <jxun@vmware.com>
Use informer cache with dynamic client for Get calls on restore
When enabled, also make the Get call before create.
Add server and install parameter to allow disabling this feature,
but enable by default
Signed-off-by: Scott Seago <sseago@redhat.com>
When creating resources with generateName, apimachinery
does not guarantee uniqueness when it appends the random
suffix to the generateName stub, so if it fails with
already exists error, we need to retry.
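The retry idea can be sketched as follows. This is a minimal, stdlib-only illustration rather than the code in this commit: `errAlreadyExists`, `createFunc` and `createWithRetry` are hypothetical names, and a real client would inspect the API server's status error instead of a sentinel value.

```go
package main

import (
	"errors"
	"fmt"
)

// errAlreadyExists stands in for the API server's "already exists" error
// (hypothetical; a real client would inspect the returned status error).
var errAlreadyExists = errors.New("already exists")

// createFunc is a hypothetical create call. Each invocation lets the
// server pick a fresh random suffix for the generateName stub.
type createFunc func() (name string, err error)

// createWithRetry retries only on errAlreadyExists, up to maxAttempts.
func createWithRetry(create createFunc, maxAttempts int) (string, error) {
	var lastErr error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		name, err := create()
		if err == nil {
			return name, nil
		}
		if !errors.Is(err, errAlreadyExists) {
			return "", err // unrelated failure: do not retry
		}
		lastErr = err
	}
	return "", fmt.Errorf("giving up after %d attempts: %w", maxAttempts, lastErr)
}

func main() {
	calls := 0
	// Simulated server: the first two suffixes collide, the third is free.
	create := func() (string, error) {
		calls++
		if calls < 3 {
			return "", errAlreadyExists
		}
		return "restore-abc12", nil
	}
	name, err := createWithRetry(create, 5)
	fmt.Println(name, err, calls)
}
```

Bounding the number of attempts keeps a persistent collision from looping forever, and other errors are returned immediately.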
Signed-off-by: Scott Seago <sseago@redhat.com>
* doc: Alert that plugins run as binaries when turning on debug logs
Signed-off-by: Mateus Oliveira <msouzaol@redhat.com>
---------
Signed-off-by: Mateus Oliveira <msouzaol@redhat.com>
When preparing a backup repository, Velero tries to connect to it and, if that fails, creates it. The repository status always records the error reported by the creation, but the real cause may be the failed connect operation. This is confusing and hard to debug.
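One way to surface both failures is sketched below. It is an illustration under assumptions, not the change made in this commit: `ensureRepo` and its `connect`/`create` callbacks are hypothetical names, and the two sentinel errors only exist to drive the example.

```go
package main

import (
	"errors"
	"fmt"
)

// Example errors used only by this sketch.
var (
	errConnect = errors.New("invalid credentials")
	errCreate  = errors.New("repository already initialized")
)

// ensureRepo tries to connect first and creates the repository only when
// connecting fails. connect and create are hypothetical callbacks.
func ensureRepo(connect, create func() error) error {
	connectErr := connect()
	if connectErr == nil {
		return nil
	}
	if createErr := create(); createErr != nil {
		// Report both failures so the recorded message shows the real cause.
		return fmt.Errorf("failed to connect to repo: %v; failed to create repo: %w", connectErr, createErr)
	}
	return nil
}

func main() {
	err := ensureRepo(
		func() error { return errConnect },
		func() error { return errCreate },
	)
	fmt.Println(err)
}
```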
Signed-off-by: Wenkai Yin(尹文开) <yinw@vmware.com>
This commit introduces our own Azure storage provider by wrapping Kopia's implementation rather than contributing to upstream, based on the following considerations:
1. Velero needs the capability to interact with the repository concurrently while Kopia doesn't; contributing this upstream would increase the complexity of Kopia
2. The configuration items provided by Velero and Kopia conflict, e.g. Velero supports customizing the storage account URI, which is a full path, while Kopia supports customizing the storage account domain, which is part of the URI. Contributing upstream would require extra effort to handle backward compatibility and the upgrade case
3. Contributing to upstream means a longer cycle whenever we need to introduce new changes. With this commit, we no longer depend on upstream for the Azure storage provider part, which makes it easier for us to maintain
Signed-off-by: Wenkai Yin(尹文开) <yinw@vmware.com>
1. Skip deleting the restore files from storage if the backup/BSL is not found
2. Allow deleting the restore files from storage even though the BSL is readonly
Signed-off-by: Wenkai Yin(尹文开) <yinw@vmware.com>
1. Capture the Velero pod log and K8S cluster events;
2. Fix the wrong path of the storageclass yaml file caused by the perf test;
3. Fix the change storageclass test issue: there is no sc named 'default' in the EKS cluster;
4. Support AWS credentials in config format;
5. Support more E2E script input parameters, such as standby cluster plugins and provider.
Signed-off-by: danfengl <danfengl@vmware.com>
Enlarge throttle of UT case TestThrottle_ShouldOutput to avoid occasional CI
failure due to timeout caused by test environment's CPU speed
Signed-off-by: Xun Jiang <jxun@vmware.com>
This commit introduces a deleteItemAction which writes a temporary configmap to
record the snapshot info so that the controller can trigger repo manager
to remove the snapshot
This process is a bit chatty and we should consider refactoring the code
so it's easier to connect to the repo directly in the DIA
Signed-off-by: Daniel Jiang <jiangd@vmware.com>
* fix: Typos and add more spell checking rules to CI
Signed-off-by: Mateus Oliveira <msouzaol@redhat.com>
---------
Signed-off-by: Mateus Oliveira <msouzaol@redhat.com>
1. In K8S v1.27 the API version v1beta1 for the volumesnapshotclass CR is deprecated, so the E2E test should adapt to both API versions to cover all K8S versions;
2. Support getting additional plugins from input;
3. The Velero version and plugin map should not deprecate versions older than v1.10, because the upgrade test will use them.
Signed-off-by: danfengl <danfengl@vmware.com>
When running `go mod why -m github.com/kopia/kopia` in velero-plugins prior to this change, you will see the following:
```
❯ go mod why -m github.com/kopia/kopia
github.com/konveyor/openshift-velero-plugin/velero-plugins
github.com/vmware-tanzu/velero/pkg/plugin/framework
github.com/vmware-tanzu/velero/pkg/util/logging
github.com/kopia/kopia/repo/logging
```
After this change:
```
❯ go mod why -m github.com/kopia/kopia
(main module does not need module github.com/kopia/kopia)
```
Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
Following the examples instructions[1], the nginx-deployment is not
backed up or restored. Add a label to the deployment so it will be
backed up and restored.
A similar change is needed for `examples/nginx-app/with-pv.yaml` but I did
not try that example.
[1] https://velero.io/docs/v1.11/contributions/minio/
Fixes #6347
Signed-off-by: Nir Soffer <nsoffer@redhat.com>
1. Because VolumeSnapshot and VolumeSnapshotContent CRs are not kept after backup completed,
don't persist them in the backup metadata.
2. Add some builder methods needed by CSI plugin.
Signed-off-by: Xun Jiang <jxun@vmware.com>
1. Bump up the Velero version to the latest 2 versions in the upgrade script;
2. Bump up the Velero version to the latest 1 version in the migration script;
3. Bring the B/R with restic test back in the vSphere pipeline since the vSphere plugin issue fix was included
in v1.5;
4. Disable the nodeport test in the AWS pipeline since the AWS k8s version was bumped up;
5. Prepare for the data mover test: allow the object store provider to differ from the cloud provider.
Signed-off-by: danfengl <danfengl@vmware.com>
Because the logic moved to the plugin, and the plugin cannot read the
Velero server's resourceTimeout setting, add the resourceTimeout
to the backup annotation to pass it to the plugin.
Remove VolumeSnapshotContent reset code from Velero server.
Signed-off-by: Xun Jiang <jxun@vmware.com>
For some use cases, namespace-scoped resources are included in the backup,
but the namespaces are not included due to the filter settings.
To handle this, remove the label selector filter from the namespace resource.
The namespace resource only honors the namespace include/exclude filters.
Signed-off-by: Xun Jiang <jxun@vmware.com>
This commit skips updating the restore progress, in the first loop for
restoration when CRDs are handled, so that the misleading "totalItem"
will not appear in the CR.
Fixes #5990
Signed-off-by: Daniel Jiang <jiangd@vmware.com>
1. Fix context issues produced by the previous PR; increase the timeout or add a case-scoped global timeout param to make the backup/restore command timeout configurable.
2. Add a global param for the storage class name used by test cases;
3. Fix the DefaultVolumesToFsBackup param usage issue: set DefaultVolumesToFsBackup to false in the backup CLI in case it was set to true in the install CLI.
4. Make the namespace names of each namespace mapping test unique so the tests do not interfere with each other.
Signed-off-by: danfengl <danfengl@vmware.com>
The log message should be clarified, otherwise when a user chooses to do
the backup via podvolume there will be confusing logs, but actually it's
just skipping the BIA for CSI plugin.
Signed-off-by: Daniel Jiang <jiangd@vmware.com>
Restore Services before Clusters so they can be adopted by AKO-operator and no new Services will be created for the same clusters
Signed-off-by: Wenkai Yin(尹文开) <yinw@vmware.com>
Update README to clarify the backward compatibility.
Trivial update to the support process to reflect how issues are labeled
as for now.
Signed-off-by: Daniel Jiang <jiangd@vmware.com>
1) default frequency 10s
2) per-reconcile log is now Debug not info
3) added predicate to reduce reconcile events
Signed-off-by: Scott Seago <sseago@redhat.com>
Add secret restore item action to handle service account token secret:
1. Skip the restoration for the auto-created service account token secret
2. Remove several fields for non-auto-created service account token secret to make sure the secret can be restored
Signed-off-by: Wenkai Yin(尹文开) <yinw@vmware.com>
Use the same pvb/pvr update functions across pkg/controller and pkg/cli/nodeagent for consistency of behavior
Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
The option "--service-account-name" is added so that users can use
an existing service account for velero and node-agent pods. This is
helpful for users who want to use IRSA.
Signed-off-by: Daniel Jiang <jiangd@vmware.com>
Update the group members of "maintainers" and "tech-writer" to reflect
the change in the team.
As for the group "tech-writer" I just selected a few members from
the maintainers team who have been working on velero for a relatively longer
time.
Signed-off-by: Daniel Jiang <jiangd@vmware.com>
1) clarification around Cancel() return values
2) updates to itemoperation json upload to account for progress
3) update to OperationProgress struct to avoid duplicate parameter
4) update new B/R phase name to WaitingForPluginOperationsPartiallyFailed for consistency
Signed-off-by: Scott Seago <sseago@redhat.com>
Because CSIDriver is checked for the Restic volume mounting path, and CSIDriver became GA and moved to the storage v1 group in k8s v1.18, update the compatible k8s version for Velero v1.8, v1.9 and v1.10 to 1.18-latest.
Signed-off-by: Xun Jiang <blackpiglet@gmail.com>
This commit updates the api-types docs to add missing
fields.
It also includes misc changes to the inline comment, and a change to
Dockerfile to make sure the build-image works on mac
Signed-off-by: Daniel Jiang <jiangd@vmware.com>
Restore ClusterBootstrap before Cluster, otherwise a new default ClusterBootstrap object is created for the cluster
Signed-off-by: Wenkai Yin(尹文开) <yinw@vmware.com>
This design combines the requirements for the previously-merged
Upload Progress Monitoring design with the requirements for the
(not submitted but discussed in meetings and slack) proposed asynchronous
item action plugins into one integrated proposal.
Signed-off-by: Scott Seago <sseago@redhat.com>
Enhance the restore priorities list to support specifying the low-priority resources that need to be restored last
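The resulting ordering can be illustrated with a small sketch. It assumes the priorities are expressed as two plain lists (high and low); `orderResources` and the resource names in `main` are hypothetical and not taken from Velero's implementation.

```go
package main

import "fmt"

// orderResources places high-priority resources first, low-priority
// resources last, and everything else in between in its original order.
func orderResources(all, high, low []string) []string {
	present := make(map[string]bool, len(all))
	for _, r := range all {
		present[r] = true
	}
	isLow := make(map[string]bool, len(low))
	for _, r := range low {
		isLow[r] = true
	}

	placed := make(map[string]bool, len(all))
	var ordered []string
	add := func(r string) {
		if present[r] && !placed[r] {
			ordered = append(ordered, r)
			placed[r] = true
		}
	}

	for _, r := range high {
		add(r)
	}
	for _, r := range all {
		if !isLow[r] {
			add(r)
		}
	}
	for _, r := range low {
		add(r)
	}
	return ordered
}

func main() {
	all := []string{"pods", "clusters", "namespaces", "services", "customresourcedefinitions"}
	high := []string{"customresourcedefinitions", "namespaces"}
	low := []string{"clusters"}
	fmt.Println(orderResources(all, high, low))
}
```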
Signed-off-by: Wenkai Yin(尹文开) <yinw@vmware.com>
The container name for the aws plugin is `velero-plugin-for-aws`. There was an extra `velero-` prefix in the doc.
Signed-off-by: Dave Pedu <dave@davepedu.com>
1. Fix the kubectl client and server version mismatch in the GitHub Actions E2E job, refer to https://github.com/elastic/cloud-on-k8s/issues/4737;
2. Adapt to the changed keyword for invoking Kopia as the fs backupper; the new installation broke the upgrade and migration tests;
3. Accept multiple Ginkgo focus labels as input of the E2E make command;
4. Distinguish the workload namespace of each test;
5. Fix issues of not using the Velero util to perform Velero commands;
6. Add a snapshot test case for the NamespaceMapping E2E test;
7. Collect the debug bundle after catching an error from the Velero backup or restore command;
Signed-off-by: danfengl <danfengl@vmware.com>
The RIA refactoring moved velero.RestoreItemAction into a separate
(restoreitemaction) v1 package. Unfortunately, this change would require
plugins to make code changes to locate the RestoreItemActionExecuteInput
and RestoreItemActionExecuteOutput structs.
This commit restores those structs to the original velero package, leaving
just the RestoreItemAction interface in the new v1 package.
Signed-off-by: Scott Seago <sseago@redhat.com>
This commit provides a simple contract that if the BackupItemAction
plugin sets an annotation in a resource it has handled, the additional
items will be considered "must include", i.e. each of them will skip the
"include-exclude" filter, so that the plugin developer can make sure
they are included in the backup regardless of the filter setting in the
backup CR.
Signed-off-by: Daniel Jiang <jiangd@vmware.com>
1. One of the API group tests failed due to another PR whose fix treats PartiallyFailed as a failure, in order to collect the debug bundle, without wrapping the original error;
2. Fix the migration test issue of using the wrong velero CLI for the backup command;
3. Fix the wrong parameter name in the pv opt-out backup test.
Signed-off-by: danfengl <danfengl@vmware.com>
When running a velero backup/restore command, if the command result is "PartiallyFailed", it won't return an error, by design. But we do need the debug information to figure out the reason, so the command output is needed to get the command result before further action is taken.
Signed-off-by: danfengl <danfengl@vmware.com>
Refactors the framework package to implement the plugin versioning changes
needed for BIA v1 and overall package refactoring to support plugin versions
in different packages. This should be all that's needed to move on to
v2 for BackupItemAction. The remaining plugin types still need similar
refactoring to what's being done here for BIA before attempting a
v2 implementation.
Signed-off-by: Scott Seago <sseago@redhat.com>
Refactors the clientmgmt package to implement the plugin versioning changes
needed for BIA v1 and overall package refactoring to support plugin versions
in different packages. This should be all that's needed to move on to
v2 for BackupItemAction. The remaining plugin types still need similar
refactoring to what's being done here for BIA before attempting a
v2 implementation.
Signed-off-by: Scott Seago <sseago@redhat.com>
I think this little comment about TTL expiration is necessary, because it can be confusing when the expiration time has passed and the allocated data and the snapshots are not erased at that time.
Signed-off-by: Aaron Arias <33655005+aaronariasperez@users.noreply.github.com>
If generating protoc go files from scratch, `make update` fails if
CRD generation happens first, since the protoc-generated
files are imported by the api go files.
protoc generation needs to happen earlier.
Signed-off-by: Scott Seago <sseago@redhat.com>
In determining whether a backup includes all namespaces, item_collector
checks for an empty string in the first element of the ns list. If processing
includes+excludes results in an empty list, treat this as another case
of a not-all-namespaces backup rather than crashing velero.
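The guard can be sketched as below. This is an illustration, not the item_collector code: `backsUpAllNamespaces` is a hypothetical helper, and it assumes the "all namespaces" case is represented by a list whose first element is the empty string, as described above.

```go
package main

import "fmt"

// backsUpAllNamespaces reports whether the resolved namespace list is the
// "all namespaces" marker, assumed here to be an empty string in the
// first element.
func backsUpAllNamespaces(namespaces []string) bool {
	// Check the length first: indexing an empty slice would panic.
	return len(namespaces) > 0 && namespaces[0] == ""
}

func main() {
	fmt.Println(backsUpAllNamespaces([]string{""}))        // all namespaces
	fmt.Println(backsUpAllNamespaces([]string{"app-a"}))   // explicit list
	fmt.Println(backsUpAllNamespaces([]string{}))          // includes and excludes cancelled out
}
```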
Signed-off-by: Scott Seago <sseago@redhat.com>
1. Add some refactored controllers initiation code into enabledRuntimeControllers.
2. Add reconciler struct initiation function for DownloadRequest and ServerStatusRequest controllers.
Signed-off-by: Xun Jiang <blackpiglet@gmail.com>
This stops subheading description from showing in posted issues by default.
Signed-off-by: Tiger Kaovilai <passawit.kaovilai@gmail.com>
Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
1. Clean backups after each test to avoid exceeding the storage capacity limit during the E2E test;
2. Fix the exclude label test issue: a namespace should not be included and excluded at the same time, no matter how it is configured.
Signed-off-by: danfengl <danfengl@vmware.com>
This commit adds the parameter "uploader-type" to velero server, and exposes the
setting via "velero install" in CLI.
Fixes #5062
Signed-off-by: Daniel Jiang <jiangd@vmware.com>
1. Also check the annotation "pv.kubernetes.io/migrated-to" to find out whether the volume is provisioned by CSI.
2. Add UT cases.
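The annotation check in item 1 might look roughly like this. It is a sketch under assumptions: `provisionedByCSI` is a hypothetical helper, the set of known CSI driver names is passed in by the caller, and checking "pv.kubernetes.io/provisioned-by" alongside "pv.kubernetes.io/migrated-to" is an assumption about what the existing check looked at.

```go
package main

import "fmt"

const (
	// provisionedByAnn is assumed to be the annotation checked before this change.
	provisionedByAnn = "pv.kubernetes.io/provisioned-by"
	// migratedToAnn is the annotation named in this commit.
	migratedToAnn = "pv.kubernetes.io/migrated-to"
)

// provisionedByCSI reports whether either annotation names a known CSI driver.
// csiDrivers is a hypothetical set of driver names installed in the cluster.
func provisionedByCSI(annotations map[string]string, csiDrivers map[string]bool) bool {
	for _, key := range []string{provisionedByAnn, migratedToAnn} {
		if driver, ok := annotations[key]; ok && csiDrivers[driver] {
			return true
		}
	}
	return false
}

func main() {
	drivers := map[string]bool{"ebs.csi.aws.com": true}
	// An in-tree volume that has been migrated to a CSI driver.
	annotations := map[string]string{
		provisionedByAnn: "kubernetes.io/aws-ebs",
		migratedToAnn:    "ebs.csi.aws.com",
	}
	fmt.Println(provisionedByCSI(annotations, drivers))
}
```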
Signed-off-by: Xun Jiang <jxun@vmware.com>
This commit splits the pkg/restic package into several packages to support the Kopia integration work
Fixes #5055
Signed-off-by: Wenkai Yin(尹文开) <yinw@vmware.com>
1. Disable server-side field pruning for the Restore hook.InitContainer.
2. Remove restore patch in update-generate-crd-code.sh.
3. Modify related testcases.
4. Add Container fields validation in Restore Init hook.
Signed-off-by: Xun Jiang <jxun@vmware.com>
"EnableAPIGroupVersions" is set
The crd-remap-version plugin will always back up the v1b1 resource for some
CRDs. It impacts the feature flag `EnableAPIGroupVersions`, which is meant to
back up all versions, and makes migration fail.
In this commit the featureSet was removed from the plugin server struct b/c
it blocks the param `--features` from being populated correctly. This change
should not have a negative impact b/c the attribute in the server struct is never used.
Fixes #5146
Signed-off-by: Daniel Jiang <jiangd@vmware.com>
This commit adds additional fields to podvolumebackup
and podvolumerestore. The resticrepository will be renamed to
backuprepository
Signed-off-by: Daniel Jiang <jiangd@vmware.com>
Pass in a new copy of the map of config values rather than
modifying the BSL Spec.Config and then passing in that field.
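A minimal sketch of the copy-then-extend pattern follows. `withExtraConfig` and the keys used in `main` are hypothetical; the point is only that the caller's map is left untouched.

```go
package main

import "fmt"

// withExtraConfig returns a new map holding config plus extra, leaving
// both inputs unmodified. Keys in extra win on conflict.
func withExtraConfig(config, extra map[string]string) map[string]string {
	out := make(map[string]string, len(config)+len(extra))
	for k, v := range config {
		out[k] = v
	}
	for k, v := range extra {
		out[k] = v
	}
	return out
}

func main() {
	specConfig := map[string]string{"region": "us-east-1"}
	pluginConfig := withExtraConfig(specConfig, map[string]string{"bucket": "backups"})
	// The spec map still has one entry; only the copy gained the extra key.
	fmt.Println(len(specConfig), len(pluginConfig))
}
```

Because Go maps are reference types, writing into the original map would also change the object held by any cache or later update call, which is what the copy avoids.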
Signed-off-by: Scott Seago <sseago@redhat.com>
This commit mitigates the issue for running "make update" locally when
the network is not friendly for accessing the default "proxy.golang.org"
Signed-off-by: Daniel Jiang <jiangd@vmware.com>
Mitigate the issue mentioned in #4782
When there's a bug or misconfiguration that causes nil pointer there
will be more stack trace information to help us debug.
Signed-off-by: Daniel Jiang <jiangd@vmware.com>
Fix bsl validation bug: the BSL is validated continually and doesn't respect the validation period configured
Fixes #5056
Signed-off-by: Wenkai Yin(尹文开) <yinw@vmware.com>
* move 'velero.io/exclude-from-backup' label name to const
Signed-off-by: Niu Lechuan <lechuan.niu@daocloud.io>
* add changelog file (in changelogs/unreleased) of this PR
Signed-off-by: Niu Lechuan <lechuan.niu@daocloud.io>
1. remove go.sum file from code spell check action.
2. change go version to 1.17 in CRD verify action, and add k8s 1.23 and 1.24 in verification list.
Signed-off-by: Xun Jiang <jxun@vmware.com>
Because the column and project specified by this action do not exist anymore, and Velero team doesn't use this action to assign issue and triage anymore, remove this action.
Signed-off-by: Xun Jiang <jxun@vmware.com>
Update the release steps to reflect the change in the `tag-release.sh`,
that the release branch must be created manually before RC is tagged.
Signed-off-by: Daniel Jiang <jiangd@vmware.com>
1. Use patch rather than status patch in the backup sync controller, as we have disabled status as a sub resource
2. Set the GC period with default value if it isn't set
Signed-off-by: Wenkai Yin(尹文开) <yinw@vmware.com>
It's not necessary to set the deletion policy as the delete item action
plugin in CSI plugin will set it to Delete when the backup is deleted.
Signed-off-by: Daniel Jiang <jiangd@vmware.com>
When status is enabled as a sub resource in the CRD, the status is ignored when creating a CR with status, which causes issues when syncing backups/pvbs
Fixes #4950
Signed-off-by: Wenkai Yin(尹文开) <yinw@vmware.com>
We have made a few changes to the CSI plugin to provide official support
for AWS/Azure. This commit updates the docs to reflect those
changes.
Signed-off-by: Daniel Jiang <jiangd@vmware.com>
Add filter functions for PeriodicalEnqueueSource.
Move BSL's validation frequency check test case to PeriodicalEnqueueSource's test.
Signed-off-by: Xun Jiang <jxun@vmware.com>
Mark in-progress PVB/PVR as failed when the restic controller restarts to avoid hanging backup/restore
Fixes #4772
Signed-off-by: Wenkai Yin(尹文开) <yinw@vmware.com>
1. Add a checkpoint in the snapshot E2E test to verify that the snapshot CR is created and that the snapshot is created on the cloud side after backup completion;
2. Fix the snapshot name issue: the CSI snapshot name on the cloud side is not the same as for other non-CSI cloud snapshots;
Signed-off-by: danfengl <danfengl@vmware.com>
When iterating over applicable restore actions, if a non-matching label
selector is found, velero should continue to the next action rather than
returning from the restoreItem func, which ends up preventing the item's
restore entirely.
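The difference between skipping one action and abandoning the item can be sketched as follows. The types and names here (`action`, `matches`, `applicableActions`) are hypothetical and simplified to exact-match label selectors; they are not Velero's restore code.

```go
package main

import "fmt"

// action is a simplified restore action with an exact-match label selector.
type action struct {
	name     string
	selector map[string]string // an empty selector matches every item
}

// matches reports whether labels satisfy every key/value in selector.
func matches(selector, labels map[string]string) bool {
	for k, want := range selector {
		if got, ok := labels[k]; !ok || got != want {
			return false
		}
	}
	return true
}

// applicableActions returns the names of the actions that apply to an item.
func applicableActions(actions []action, labels map[string]string) []string {
	var applied []string
	for _, a := range actions {
		if !matches(a.selector, labels) {
			// Skip only this action. Returning here would drop the
			// remaining actions and, in the bug described above, the item.
			continue
		}
		applied = append(applied, a.name)
	}
	return applied
}

func main() {
	actions := []action{
		{name: "db-action", selector: map[string]string{"app": "db"}},
		{name: "generic-action"},
	}
	fmt.Println(applicableActions(actions, map[string]string{"app": "web"}))
}
```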
Signed-off-by: Scott Seago <sseago@redhat.com>
1. Delete the VolumeSnapshot directly when DeletionPolicy is set to Retain.
2. Change the VolumeSnapshotContent's DeletionPolicy to Retain, then delete the VolumeSnapshot. After that, delete the VolumeSnapshotContent, change the VSC DeletionPolicy back to Delete, then re-create the VolumeSnapshotContent.
Signed-off-by: Xun Jiang <jxun@vmware.com>
This commit makes backup sync controller delete the volumesnapshot and
volumesnapshotcontent created by the backup which is cleaned up as orphan
Signed-off-by: Daniel Jiang <jiangd@vmware.com>
Mark in-progress backup/restore as failed during reconcile to avoid hanging in the in-progress status
Signed-off-by: Wenkai Yin(尹文开) <yinw@vmware.com>
Fixes #4760
This commit makes changes in 2 parts:
1) When a volumesnapshotcontent is persisted during backup, velero will reset its
`Source` field to remove the VolumeHandle, so that the
csi-snapshotter will not try to call `CreateSnapshot` when it's synced
to another cluster with a backup.
2) Make sure the referenced volumesnapshotclasses are persisted and
synced with the backup, so that when the volumesnapshotcontent is
deleted the storage snapshot is also removed.
Signed-off-by: Daniel Jiang <jiangd@vmware.com>
1. Add --insecure-tls for ResticManager's commands.
2. Add --insecure-tls in PodVolumeBackup and PodVolumeRestore controller.
3. Upgrade integrated Restic version to v0.13.1
4. Change --last flag in Restic command to --latest=1 due to Restic version update.
Signed-off-by: Xun Jiang <jxun@vmware.com>
As we are refactoring controllers with kubebuilder, use the controller-gen rather than code-generator to generate the deep copy methods for objects
Signed-off-by: Wenkai Yin(尹文开) <yinw@vmware.com>
- go install cmd/velero/velero.go
- go install cmd/velero-restic-restore-helper/velero-restic-restore-helper.go
Will generate binary in `$(go env GOPATH)/bin/` with the correct name.
build.sh still works the same.
Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
* Add bsl related TTL gc errors to labelSelectors
* if backup label map is nil, make map
* clear label if not BSL error
Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
This allows a user inspecting the restore logs to see any
errors or warnings generated by the restore so that they
will be seen even without having to use the describe cli.
Signed-off-by: Scott Seago <sseago@redhat.com>
* Add plugin versioning design doc
Signed-off-by: Bridget McErlean <bmcerlean@vmware.com>
* Use more generic versions in scenarios section
Signed-off-by: Bridget McErlean <bmcerlean@vmware.com>
* Address code review
Signed-off-by: Bridget McErlean <bmcerlean@vmware.com>
* Address code review
Signed-off-by: Bridget McErlean <bmcerlean@vmware.com>
* Modify design to allow other interface changes
The previous design assumed that only method addition would be
supported. It now includes guidance for making changes such as method
removal or signature changes.
Signed-off-by: Bridget McErlean <bmcerlean@vmware.com>
Co-authored-by: Bridget McErlean <bmcerlean@vmware.com>
The GINKGO_SKIP option is updated to string that can be separated by "." for "make test-e2e".
Signed-off-by: Xun Jiang <jxun@vmware.com>
Signed-off-by: Hoang, Phuong <phuong.n.hoang@dell.com>
1. Mark the BSL as "Unavailable" when any error occurs
2. Add a new field "Message" to the BSL status to record the error message
Fixes #4485
Fixes #4405
Signed-off-by: Wenkai Yin(尹文开) <yinw@vmware.com>
When velero is running on clusters that don't support v1beta1 CRD, the
plugin will not try to back up v1beta1 CRD.
The plugin should be kept for backward compatibility. It will be
removed when velero drops support for k8s v1.21
Signed-off-by: Daniel Jiang <jiangd@vmware.com>
1. rename zoneSeparator to gkeZoneSeparator
2. add example of regional PV's node affinity. modify test case description.
Signed-off-by: Xun Jiang <jxun@vmware.com>
Specify the risk of setting this parameter to true. Add a link to the issue that first reported this topic, which includes the Google document explaining it.
Signed-off-by: Xun Jiang <jxun@vmware.com>
Test case description is "Deleted backups are deleted from object storage and backups deleted from object storage can be deleted locally",
in this test, only resource backup objects are targets for verification; restic repo verification is not included in this PR, and snapshot verification will be in a later PR
Signed-off-by: danfengl <danfengl@vmware.com>
Fix #4499
When a hook influences multiple pods, in the current logic the first pod's container overwrites the hook's exec.container parameter. That causes the other pods to fail when executing the hook.
Signed-off-by: Xun Jiang <jxun@vmware.com>
Currently, only busybox:latest is used by e2e. It is already uploaded to gcr.io/velero-gcp/busybox:latest
Change the image to gcr.io to avoid the pull rate limit from Docker Hub.
Signed-off-by: Xun Jiang <jxun@vmware.com>
Push to GCR in the github workflow to facilitate some environments that are rate limited by Docker Hub, e.g. vSphere.
Signed-off-by: Xun Jiang <jxun@vmware.com>
Since Itemsnapshotter plugin is still WIP,
this commit removes the reference and the deprecation of volumeSnapshotter plugin
from the doc to avoid confusion.
We'll update the doc when it's ready and we have a reference
implementation.
Signed-off-by: Daniel Jiang <jiangd@vmware.com>
* Use OrderedResources in schedules
Make ParseOrderedResources public for use in schedules
Add changelog
Signed-off-by: Dominic <dominic@xdnx.org>
* Rename function in comment section
Signed-off-by: Dominic <dominic@xdnx.org>
* #4067 Initial design of the new plugins - pre-post backup and restore
Signed-off-by: Rafael Brito <rbrito@vmware.com>
* Update new-prepost-backuprestore-plugin-hooks.md
* Updated design doc as per feedback
Signed-off-by: Rafael Brito <rbrito@vmware.com>
* Adding design changes as per feedback
* Update design on prepost-backup-restore plugins
* More color on how to call plugins
Signed-off-by: Rafael Brito <rbrito@vmware.com>
* Proposing annotations to skip plugin execution
Signed-off-by: Rafael Brito <rbrito@vmware.com>
We introduced the installation option "--default-restic-prune-frequency" to make the restic prune frequency configurable in the previous release, but there is a bug that prevents the option from taking effect. This commit fixes the bug by removing the evaluation part. The restic repository controller will take care of the prune frequency for the repository
Fixes #3062
Signed-off-by: Wenkai Yin(尹文开) <yinw@vmware.com>
Check the existence of the expected service when ignoring the NodePort already allocated error
Fixes #2308
Signed-off-by: Wenkai Yin(尹文开) <yinw@vmware.com>
* Migrate backup sync controller from code-generator to kubebuilder
1. use kubebuilder's reconcile logic to replace controller's old logic.
2. use ginkgo and gomega to replace testing.
Signed-off-by: Xun Jiang <jxun@vmware.com>
* Fix: modify code according to comments
1. Remove DefaultBackupLocation
2. Remove unnecessary comment line
3. Add syncPeriod default value setting logic
4. Modify ListBackupStorageLocations function's context parameter
5. Add RequeueAfter parameter in Reconcile function return value
Signed-off-by: Xun Jiang <jxun@vmware.com>
* Reconcile function use context passed from parameter
1. Use context passed from parameter, instead of using Reconciler struct's context.
2. Delete Reconciler struct's context member.
3. Modify test case accordingly.
Signed-off-by: Xun Jiang <jxun@vmware.com>
* Remove backups and restic repos associated with deleted BSL(s)
Signed-off-by: F. Gold <fgold@vmware.com>
* add changelog
Signed-off-by: F. Gold <fgold@vmware.com>
* Add PR number to changelog
Signed-off-by: F. Gold <fgold@vmware.com>
* Fix typo
Signed-off-by: F. Gold <fgold@vmware.com>
* Only delete backups and restic repos and report success when without errors
Signed-off-by: F. Gold <fgold@vmware.com>
* Adds <backup-name>-itemsnapshots.gz file to backup (when provided). Also
adds DownloadTargetKindBackupItemSnapshots type to allow downloading.
Updated object store unit test
Fixes #3758
Signed-off-by: Dave Smith-Uchida <dsmithuchida@vmware.com>
* Removed redundant checks
Signed-off-by: Dave Smith-Uchida <dsmithuchida@vmware.com>
* Consolidated code for resolving actions and plugins into ActionResolver. Added BackupWithResolvers and
RestoreWithResolvers. Introduces ItemSnapshotterResolver to bring ItemSnapshotter plugins into backup and
restore. ItemSnapshotters are not used yet.
Added action_resolver_test
Signed-off-by: Dave Smith-Uchida <dsmithuchida@vmware.com>
* Addressed review comments
Signed-off-by: Dave Smith-Uchida <dsmithuchida@vmware.com>
This commit adds a restore item action plugin to reset invalid values
of "sideEffects" in mutatingwebhookconfiguration and
validatingwebhookconfiguration resources.
This fixes the problem that the "sideEffects" value is illegal for resources
migrated from v1beta1.
fixes #3516
Signed-off-by: Daniel Jiang <jiangd@vmware.com>
1. remove config/crd/v1beta1
2. remove PROJECT file
3. update controller-gen and kubebuilder version
4. generate client and CRD file
5. add changelog and remove v1beta1 CRD generated code.
6. add kubebuilder test bundle setup command.
7. because apiextensions.k8s.io/v1beta1 is no longer supported, only k8s v1.16 and later is supported, so remove the v1.15 check.
8. add CRD and k8s supported version update in changelog.
Signed-off-by: Xun Jiang <jxun@vmware.com>
* fix: modify generated from schedule's backup name timestamp to UTC timezone
fix #4279
When a backup is created from a schedule and the backup name is not specified, a generated name containing a timestamp is used. Because the velero client did not set the timezone to UTC, the local timezone was used for the generated name.
Signed-off-by: Xun Jiang <jxun@vmware.com>
* modify changelog description
Reword the changelog description according to comments.
Signed-off-by: Xun Jiang <jxun@vmware.com>
Co-authored-by: jxun <jxun@jxun-a01.vmware.com>
Co-authored-by: Xun Jiang <jxun@vmware.com>
logrusr is an open-source converter that turns a logrus logger into a logr logger.
By using logrusr, velero can reuse its existing formatted logrus logger rather than introducing zap as a new logger.
Signed-off-by: Xun Jiang <jxun@vmware.com>
Added ItemSnapshotter.proto
Added item_snapshotter Go interface
Added framework components for item_snapshotter
Updated plugins doc with ItemSnapshotter info
Added SnapshotPhase to item_snapshotter.go
ProgressOutputOutput now includes a phase as well as an error string for problems that occurred
Signed-off-by: Dave Smith-Uchida <dsmithuchida@vmware.com>
* Update EnableAPIGroupVersion feature design doc as implemented
Signed-off-by: F. Gold <fgold@vmware.com>
* Design doc for issue 2082 to delete associated resources when deleting BSLs
Signed-off-by: F. Gold <fgold@vmware.com>
* Changes per @dsu-igeek review comments
Signed-off-by: F. Gold <fgold@vmware.com>
The error should be returned explicitly, because when the default URL is
used S3 will return a 301 and the response can't be handled by restic.
Fixes #4178
Signed-off-by: Daniel Jiang <jiangd@vmware.com>
When snapshot uploading fails, it should not be treated as completed and allowed to continue.
This commit covers both the phases of in progress and failed when uploading snapshot with vSphere plugin
Signed-off-by: Wenkai Yin(尹文开) <yinw@vmware.com>
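
The behaviour described in the commit above can be sketched roughly as follows. The `UploadPhase` type, its values, and the `uploadDone` helper are hypothetical names chosen for this illustration; the real vSphere plugin phases may be named differently.

```go
package main

import (
	"errors"
	"fmt"
)

// UploadPhase is a hypothetical phase type for this sketch; the real
// plugin's phase names may differ.
type UploadPhase string

const (
	PhaseInProgress UploadPhase = "InProgress"
	PhaseCompleted  UploadPhase = "Completed"
	PhaseFailed     UploadPhase = "Failed"
)

// uploadDone reports whether the upload has finished successfully.
// In-progress uploads are not done yet, and failed uploads return an error
// instead of being treated as completed.
func uploadDone(phase UploadPhase) (bool, error) {
	switch phase {
	case PhaseCompleted:
		return true, nil
	case PhaseInProgress:
		return false, nil
	case PhaseFailed:
		return false, errors.New("snapshot upload failed")
	default:
		return false, fmt.Errorf("unknown upload phase %q", phase)
	}
}

func main() {
	for _, p := range []UploadPhase{PhaseInProgress, PhaseCompleted, PhaseFailed} {
		done, err := uploadDone(p)
		fmt.Println(p, done, err)
	}
}
```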
Previously, the BSL credential field would always be set when using the
`create` command, even if no credential details were provided. This
would result in an empty `SecretKeySelector` in the BSL which would
cause operations using this BSL to fail as Velero would attempt to fetch
a `Secret` with an empty name from the K8s API server.
With this change, the `Credential` field is only set if credential
details have been specified. This change also includes some refactoring
to allow the change to be tested.
Signed-off-by: Bridget McErlean <bmcerlean@vmware.com>
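
A minimal sketch of the change described above, using simplified stand-in types. `SecretKeySelector`, `StorageLocationSpec`, and `buildSpec` here are hypothetical and only mirror the shape of the idea: the credential pointer stays nil unless credential details were actually supplied.

```go
package main

import "fmt"

// SecretKeySelector and StorageLocationSpec are simplified, hypothetical
// stand-ins for the real API types.
type SecretKeySelector struct {
	Name string
	Key  string
}

type StorageLocationSpec struct {
	Provider   string
	Credential *SecretKeySelector
}

// buildSpec sets Credential only when both a secret name and a key were
// provided, so no empty selector ends up in the spec.
func buildSpec(provider, secretName, secretKey string) StorageLocationSpec {
	spec := StorageLocationSpec{Provider: provider}
	if secretName != "" && secretKey != "" {
		spec.Credential = &SecretKeySelector{Name: secretName, Key: secretKey}
	}
	return spec
}

func main() {
	fmt.Println(buildSpec("aws", "", "").Credential == nil)
	fmt.Println(buildSpec("aws", "cloud-credentials", "cloud").Credential != nil)
}
```

Keeping the field as a pointer lets callers distinguish "no credential configured" from "credential configured with empty values", which is the case that caused the failed `Secret` lookups.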
- fix paging items to use the list options passed by the paging function
The client-go pager sets the Limit option for the list call
to paginate the request[1]. This PR fixes the paging function
to use the options passed by the pager instead of shadowed options.
This is required for the pagination to work correctly.
- simplify the pager list implementation by using pager.List()
The List() function already implements a lot of the logic that was
needed for paging here, using it simplifies the code.
1. 3f40906dd8/staging/src/k8s.io/client-go/tools/pager/pager.go (L219)
Signed-off-by: Alay Patel <alay1431@gmail.com>
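
The shadowing problem described in the commit above can be illustrated without client-go. Everything in this sketch (`ListOptions`, `listFunc`, `listAll`, `sliceLister`) is a hypothetical stand-in that only mimics the pattern: the pager owns the paging options and hands them to the list callback, so the callback has to read the options it receives rather than an outer variable.

```go
package main

import "fmt"

// ListOptions is a hypothetical, minimal stand-in for the list options that
// a pager passes to its list callback.
type ListOptions struct {
	Limit    int
	Continue int // offset of the next page; real APIs use an opaque token
}

// listFunc fetches one page and returns the items plus the next Continue
// value (0 means there are no more pages).
type listFunc func(opts ListOptions) (items []string, next int)

// listAll plays the role of the pager: it owns Limit and Continue and passes
// them to fn on every call. pageSize must be greater than zero.
func listAll(pageSize int, fn listFunc) []string {
	var all []string
	opts := ListOptions{Limit: pageSize}
	for {
		items, next := fn(opts)
		all = append(all, items...)
		if next == 0 {
			return all
		}
		opts.Continue = next
	}
}

// sliceLister returns a listFunc over an in-memory slice. It reads Limit and
// Continue from the opts argument it is handed, not from an outer variable.
func sliceLister(data []string) listFunc {
	return func(opts ListOptions) ([]string, int) {
		end := opts.Continue + opts.Limit
		if end >= len(data) {
			return data[opts.Continue:], 0
		}
		return data[opts.Continue:end], end
	}
}

func main() {
	fmt.Println(listAll(2, sliceLister([]string{"a", "b", "c", "d", "e"})))
}
```

If the callback ignored `opts` and used options captured from the enclosing scope, the page size and continuation set by the pager would never reach the list call, which is the kind of bug the commit describes.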
@@ -5,15 +5,18 @@ about: Tell us about a problem you are experiencing
---
**What steps did you take and what happened:**
[A clear and concise description of what the bug is, and what commands you ran.)
<!--A clear and concise description of what the bug is, and what commands you ran.-->
**What did you expect to happen:**
**The following information will help us better understand what's going on**:
**The output of the following commands will help us better understand what's going on**:
(Pasting long output into a [GitHub gist](https://gist.github.com) or other pastebin is fine.)
_If you are using velero v1.7.0+:_
Please use `velero debug --backup <backupname> --restore <restorename>` to generate the support bundle and attach it to this issue. For more options, refer to `velero debug --help`.
_If you are using earlier versions:_
Please provide the output of the following commands (Pasting long output into a [GitHub gist](https://gist.github.com) or other pastebin is fine.)
- `kubectl logs deployment/velero -n velero`
- `velero backup describe <backupname>` or `kubectl get backup/<backupname> -n velero -o yaml`
- `velero backup logs <backupname>`
@@ -22,7 +25,7 @@ about: Tell us about a problem you are experiencing
**Anything else you would like to add:**
[Miscellaneous information that will assist in solving the issue.]
<!--Miscellaneous information that will assist in solving the issue.-->
- [ ] [Accepted the DCO](https://velero.io/docs/v1.5/code-standards/#dco-sign-off). Commits without the DCO will delay acceptance.
- [ ] [Created a changelog file](https://velero.io/docs/v1.5/code-standards/#adding-a-changelog) or added `/kind changelog-not-required`.
- [ ] [Created a changelog file](https://velero.io/docs/v1.5/code-standards/#adding-a-changelog) or added `/kind changelog-not-required` as a comment on this pull request.
- [ ] Updated the corresponding documentation in `site/content/docs/main`.
stale-issue-message: "This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days. If a Velero team member has requested log or more information, please provide the output of the shared commands."
close-issue-message: "This issue was closed because it has been stalled for 5 days with no activity."
days-before-issue-stale: 30
days-before-issue-close: 5
stale-issue-message: "This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days. If a Velero team member has requested log or more information, please provide the output of the shared commands."
close-issue-message: "This issue was closed because it has been stalled for 14 days with no activity."
days-before-issue-stale: 60
days-before-issue-close: 14
stale-issue-label: staled
# Disable stale PRs for now; they can remain open.
days-before-pr-stale: -1
days-before-pr-close: -1
# Only issues made after Feb 09 2021.
start-date: "2021-09-02T00:00:00"
# Only make issues stale if they have these labels. Comma separated.
only-labels: "Needs info,Duplicate"
exempt-issue-labels: "Epic,Area/CLI,Area/Cloud/AWS,Area/Cloud/Azure,Area/Cloud/GCP,Area/Cloud/vSphere,Area/CSI,Area/Design,Area/Documentation,Area/Plugins,Bug,Enhancement/User,kind/requirement,kind/refactor,kind/tech-debt,limitation,Needs investigation,Needs triage,Needs Product,P0 - Hair on fire,P1 - Important,P2 - Long-term important,P3 - Wouldn't it be nice if...,Product Requirements,Restic - GA,Restic,release-blocker,Security"
Below is a list of adopters of Velero in **production environments** that have
publicly shared the details of how they use it.
**[BitGo][20]**
BitGo uses Velero backup and restore capabilities to seamlessly provision and scale fullnode statefulsets on the fly as well as having it serve an integral piece for our kubernetes disaster-recovery story.
BitGo uses Velero backup and restore capabilities to seamlessly provision and scale fullnode statefulsets on the fly as well as having it serve an integral piece for our Kubernetes disaster-recovery story.
**[Bugsnag][30]**
We use Velero for managing backups of an internal instance of our on-premise clustered solution. We also recommend our users of [on-premise Bugsnag installations][31] use Velero for [managing their own backups][32].
We use Velero for managing backups of an internal instance of our on-premise clustered solution. We also recommend our users of [on-premise Bugsnag installations](https://www.bugsnag.com/on-premise) use Velero for [managing their own backups](https://docs.bugsnag.com/on-premise/clustered/backup-restore/). <!-- Velero.io word list : ignore -->
**[Banzai Cloud][60]**
[Banzai Cloud Pipeline][61] is a Kubernetes-based microservices platform that integrates services needed for Day-1 and Day-2 operations along with first-class support both for on-prem and hybrid multi-cloud deployments. We use Velero to periodically [backup and restore these clusters in case of disasters][62].
@@ -40,7 +42,9 @@ We have integrated our [solution with Velero][11] to provide our customers with
Kyma [integrates with Velero][41] to effortlessly back up and restore Kyma clusters with all its resources. Velero capabilities allow Kyma users to define and run manual and scheduled backups in order to successfully handle a disaster-recovery scenario.
**[Red Hat][50]**
Red Hat has developed the [Cluster Application Migration Tool][51] which uses [Velero and Restic][52] to drive the migration of applications between OpenShift clusters.
Red Hat has developed 2 operators for the OpenShift platform:
- [Migration Toolkit for Containers][51] (Crane): This operator uses [Velero and Restic][52] to drive the migration of applications between OpenShift clusters.
- [OADP (OpenShift API for Data Protection) Operator][53]: This operator sets up and installs Velero on the OpenShift platform, allowing users to backup and restore applications.
**[Dell EMC][70]**
For Kubernetes environments, [PowerProtect Data Manager][71] leverages the Container Storage Interface (CSI) framework to take snapshots to back up the persistent data or the data that the application creates e.g. databases. [Dell EMC leverages Velero][72] to backup the namespace configuration files (also known as Namespace meta data) for enterprise grade data protection.
@@ -56,8 +60,14 @@ MayaData is a large user of Velero as well as a contributor. MayaData offers a D
Okteto integrates Velero in [Okteto Cloud][94] and [Okteto Enterprise][95] to periodically backup and restore our clusters for disaster recovery. Velero is also a core software building block to provide namespace cloning capabilities, a feature that allows our users cloning staging environments into their personal development namespace for providing production-like development environments.
**[Replicated][100]**<br>
Replicated uses the Velero open source project to enable snapshots in [KOTS][101] to backup Kubernetes manifests & persistent volumes. In addition to the default functionality that Velero provides, [KOTS][101] provides a detailed interface in the [Admin Console][102] that can be used to manage the storage destination and schedule, and to perform and monitor the backup and restore process.
Replicated uses the Velero open source project to enable snapshots in [KOTS][101] to backup Kubernetes manifests & persistent volumes. In addition to the default functionality that Velero provides, [KOTS][101] provides a detailed interface in the [Admin Console][102] that can be used to manage the storage destination and schedule, and to perform and monitor the backup and restore process.<br>
**[CloudCasa][103]**<br>
[Catalogic Software][104] integrates Velero with [CloudCasa][103] - A Smart Home in the Cloud for Backups. CloudCasa is a simple, scalable, cloud-native solution providing data protection and disaster recovery as a service. This solution is built using Kubernetes for protecting Kubernetes clusters.<br>
**[Microsoft Azure][105]**<br>
[Azure Backup for AKS][106] is an Azure native, Kubernetes aware, Enterprise ready backup for containerized applications deployed on Azure Kubernetes Service (AKS). AKS Backup utilizes Velero to perform backup and restore operations to protect stateful applications in AKS clusters.<br>
## Adding your organization to the list of Velero Adopters
If you are using Velero and would like to be included in the list of `Velero Adopters`, add an SVG version of your logo to the `site/static/img/adopters` directory in this repo and submit a [pull request][3] with your change. Name the image file something that reflects your company (e.g., if your company is called Acme, name the image acme.svg). See this [PR][4] for an example.
@@ -77,8 +87,6 @@ If you would like to add your logo to a future `Adopters of Velero` section on [
[![Build Status][1]][2] [](https://bestpractices.coreinfrastructure.org/projects/3811)
Velero (formerly Heptio Ark) gives you tools to back up and restore your Kubernetes cluster resources and persistent volumes. You can run Velero with a public cloud platform or on-premises. Velero lets you:
Velero (formerly Heptio Ark) gives you tools to back up and restore your Kubernetes cluster resources and persistent volumes. You can run Velero with a public cloud platform or on-premises.
Velero lets you:
* Take backups of your cluster and restore in case of loss.
* Migrate cluster resources to other clusters.
@@ -18,7 +20,7 @@ Velero consists of:
## Documentation
[The documentation][29] provides a getting started guide and information about building from source, architecture, extending Velero, and more.
[The documentation][29] provides a getting started guide and information about building from source, architecture, extending Velero and more.
Please use the version selector at the top of the site to ensure you are using the appropriate documentation for your version of Velero.
@@ -34,6 +36,27 @@ If you are ready to jump in and test, add code, or help with documentation, foll
See [the list of releases][6] to find out about feature changes.
### Velero compatibility matrix
The following is a list of the supported Kubernetes versions for each Velero version.
| Velero version | Expected Kubernetes version compatibility | Tested on Kubernetes version |
Velero supports IPv4, IPv6, and dual stack environments. Support for this was tested against Velero v1.8.
The Velero maintainers are continuously working to expand testing coverage, but are not able to test every combination of Velero and supported Kubernetes versions for each Velero release. The table above is meant to track the current testing coverage and the expected supported Kubernetes versions for each Velero version. If you have a question about test coverage before v1.9, please reach out in the [#velero-users](https://kubernetes.slack.com/archives/C6VCGP4MT) Slack channel.
If you are interested in using a different version of Kubernetes with a given Velero version, we'd recommend that you perform testing before installing or upgrading your environment. For full information around capabilities within a release, also see the Velero [release notes](https://github.com/vmware-tanzu/velero/releases) or Kubernetes [release notes](https://github.com/kubernetes/kubernetes/tree/master/CHANGELOG). See the Velero [support page](https://velero.io/docs/latest/support-process/) for information about supported versions of Velero.
For each release, Velero maintainers run tests to verify the upgrade path from the previous two (n-2) minor releases. For example, before the release of v1.10.x, the tests verify that backups created by v1.9.x and v1.8.x can be restored using the build to be tagged as v1.10.x.
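
As a small illustration of the n-2 rule, the release lines covered by the upgrade test can be derived from the target version. The helper `upgradeSources` is a hypothetical name used only for this sketch and is not part of Velero.

```go
package main

import "fmt"

// upgradeSources returns the earlier minor release lines whose backups are
// restored with a v<major>.<minor>.x build under an n-2 policy.
func upgradeSources(major, minor int) []string {
	var sources []string
	for m := minor - 1; m >= minor-2 && m >= 0; m-- {
		sources = append(sources, fmt.Sprintf("v%d.%d.x", major, m))
	}
	return sources
}

func main() {
	fmt.Println(upgradeSources(1, 10)) // [v1.9.x v1.8.x]
}
```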
This document provides a link to the [Velero Project boards](https://github.com/vmware-tanzu/velero/projects) that serve as the up-to-date description of items that are in the release pipeline. The release boards have separate swim lanes based on prioritization. Most items are gathered from the community or include a feedback loop with the community. This should serve as a reference point for Velero users and contributors to understand where the project is heading, and help determine if a contribution could conflict with a longer-term plan.
### How to help?
Discussion on the roadmap can take place in threads under [Issues](https://github.com/vmware-tanzu/velero/issues) or in [community meetings](https://velero.io/community/). Please open and comment on an issue if you want to provide suggestions, use cases, and feedback to an item in the roadmap. Please review the roadmap to avoid potential duplicated effort.
### How to add an item to the roadmap?
One of the most important aspects in any open source community is the concept of proposals. Large changes to the codebase and / or new features should be preceded by a [proposal](https://github.com/vmware-tanzu/velero/blob/main/GOVERNANCE.md#proposal-process) in our repo.
For smaller enhancements, you can open an issue to track that initiative or feature request.
We work with and rely on community feedback to focus our efforts to improve Velero and maintain a healthy roadmap.
### Current Roadmap
The following table includes the current roadmap for Velero. If you have any questions or would like to contribute to Velero, please attend a [community meeting](https://velero.io/community/) to discuss with our team. If you don't know where to start, we are always looking for contributors that will help us reduce technical, automation, and documentation debt.
Please take the timelines & dates as proposals and goals. Priorities and requirements change based on community feedback, roadblocks encountered, community contributions, etc. If you depend on a specific item, we encourage you to attend community meetings to get updated status information, or help us deliver that feature by contributing to Velero.
`Last Updated: October 2021`
#### 1.8.0 Roadmap (to be delivered January/February 2022)
|Issue|Description|Timeline|Notes|
|---|---|---|---|
|[4108](https://github.com/vmware-tanzu/velero/issues/4108), [4109](https://github.com/vmware-tanzu/velero/issues/4109)|Solution for CSI - Azure and AWS|2022 H1|Currently, Velero plugins for AWS and Azure cannot back up persistent volumes that were provisioned using the CSI driver. This will fix that.|
|[3229](https://github.com/vmware-tanzu/velero/issues/3229),[4112](https://github.com/vmware-tanzu/velero/issues/4112)|Moving data mover functionality from the Velero Plugin for vSphere into Velero proper|2022 H1|This work is a precursor to decoupling the Astrolabe snapshotting infrastructure.|
|[3533](https://github.com/vmware-tanzu/velero/issues/3533)|Upload Progress Monitoring|2022 H1|Finishing up the work done in the 1.7 timeframe. The data mover work depends on this.|
|[1975](https://github.com/vmware-tanzu/velero/issues/1975)|Test dual stack mode|2022 H1|We already tested IPv6, but we want to confirm that dual stack mode works as well.|
|[2082](https://github.com/vmware-tanzu/velero/issues/2082)|Delete Backup CRs on removing target location. |2022 H1||
|[3516](https://github.com/vmware-tanzu/velero/issues/3516)|Restore issue with MutatingWebhookConfiguration v1beta1 API version|2022 H1||
|[2308](https://github.com/vmware-tanzu/velero/issues/2308)|Restoring nodePort service that has nodePort preservation always fails if service already exists in the namespace|2022 H1||
|[4115](https://github.com/vmware-tanzu/velero/issues/4115)|Support for multiple set of credentials for VolumeSnapshotLocations|2022 H1||
|[1980](https://github.com/vmware-tanzu/velero/issues/1980)|Velero triggers backup immediately for scheduled backups|2022 H1||
|[4067](https://github.com/vmware-tanzu/velero/issues/4067)|Pre and post backup and restore hooks|2022 H1||
|[3742](https://github.com/vmware-tanzu/velero/issues/3742)|Carvel packaging for Velero for vSphere|2022 H1|AWS and Azure have been completed already.|
|[3285](https://github.com/vmware-tanzu/velero/issues/3285)|Design doc for Velero plugin versioning|2022 H1||
|[4231](https://github.com/vmware-tanzu/velero/issues/4231)|Technical health (prioritizing giving developers confidence and saving developers time)|2022 H1|More automated tests (especially the pre-release manual tests) and more automation of the running of tests.|
|[4110](https://github.com/vmware-tanzu/velero/issues/4110)|Solution for CSI - GCP|2022 H1|Currently, the Velero plugin for GCP cannot back up persistent volumes that were provisioned using the CSI driver. This will fix that.|
|[3742](https://github.com/vmware-tanzu/velero/issues/3742)|Carvel packaging for Velero for restic|2022 H1|AWS and Azure have been completed already.|
|[4111](https://github.com/vmware-tanzu/velero/issues/4111)|Ignore items returned by ItemSnapshotter.AlsoHandles during backup|2022 H1|This will enable backup of complex objects, because we can then tell Velero to ignore things that were already backed up when Velero was previously called recursively.|
Other work may make it into the 1.8 release, but this is the work that will be prioritized first.
# Please go to the [Velero Wiki](https://github.com/vmware-tanzu/velero/wiki/) to see our latest roadmap, archived roadmaps and roadmap guidance.
This folder contains logo images for Velero in gray (for light backgrounds) and white (for dark backgrounds like black tshirts or dark mode!) – horizontal and stacked… in .eps and .svg.
This folder contains logo images for Velero in gray (for light backgrounds) and white (for dark backgrounds like black t-shirts or dark mode!) – horizontal and stacked… in .eps and .svg.
In this release, we introduced the Unified Repository architecture to build a data path where data movers and the backup repository are decoupled and a unified backup repository could serve various data movement activities.
In this release, we also deeply integrate Velero with Kopia, specifically, Kopia's uploader modules are isolated as a generic file system uploader; Kopia's repository modules are encapsulated as the unified backup repository.
For more information, refer to the [design document](https://github.com/vmware-tanzu/velero/blob/v1.10.0/design/unified-repo-and-kopia-integration/unified-repo-and-kopia-integration.md).
#### File system backup refactor
Velero's file system backup (a.k.a. pod volume backup, formerly restic backup) is refactored as the first user of the Unified Repository architecture. Specifically, we added a new path, the Kopia path, alongside the existing Restic path. While the Restic path is still available and set as the default, you can opt in to the Kopia path by specifying the `uploader-type` parameter at installation time. Meanwhile, you are free to restore from existing backups under either path; Velero dynamically switches to the correct path to process the restore.
Because of the new path, we renamed some modules and parameters. Refer to the Breaking changes section for more details.
For more information, visit the [file system backup document](https://velero.io/docs/v1.10/file-system-backup/) and [v1.10 upgrade guide document](https://velero.io/docs/v1.10/upgrade-to-1.10/).
Meanwhile, we've created a performance guide for both the Restic path and the Kopia path, which helps you choose between the two paths and describes best practices for configuring them under different scenarios. Please note that the results in the guide are based on our testing environments; you may get different results when testing in your own environment. For more information, visit the [performance guide document](https://velero.io/docs/v1.10/performance-guidance/).
#### Plugin versioning V1 refactor
In this release, Velero moves the BackupItemAction, RestoreItemAction and VolumeSnapshotterAction plugins to version v1. This allows future plugin changes that are not backward compatible, and so prepares for various complex tasks, for example, data movement tasks.
For more information, refer to the [plugin versioning design document](https://github.com/vmware-tanzu/velero/blob/v1.10.0/design/plugin-versioning.md).
#### Refactor the controllers using Kubebuilder v3
In this release we continued our code modernization work, rewriting some controllers using Kubebuilder v3. This work is ongoing and we will continue to make progress in future releases.
#### Add credentials to volume snapshot locations
In this release, we added dedicated credentials options to volume snapshot locations so that you can specify credentials per volume snapshot location, the same as for backup storage locations.
For more information, please visit the [locations document](https://velero.io/docs/v1.10/locations/).
#### CSI snapshot enhancements
In this release we added several changes to enhance the robustness of CSI snapshot procedures, for example, some protection code for error handling, and a mechanism to skip exclusion checks so that CSI snapshot works with various backup resource filters.
#### Backup schedule pause/unpause
In this release, Velero supports pausing and unpausing a backup schedule during or after its creation. Specifically:
* At creation time, you can pass the `--paused` flag to the `velero schedule create` command to create a paused schedule that will not run until it is unpaused.
* After creation, you can run the `velero schedule pause` or `velero schedule unpause` command to pause or unpause a schedule.
#### Runtime and dependencies
In order to fix CVEs, we changed Velero's runtime and dependencies as follows:
* Bump go runtime to v1.18.8
* Bump some core dependent libraries to newer versions
* Compile Restic (v0.13.1) with go 1.18.8 instead of packaging the official binary
#### Breaking changes
Due to the file system backup refactor, the following module and parameter names have changed in this release:
* `restic` daemonset is renamed to `node-agent`
* `resticRepository` CR is renamed to `backupRepository`
* `velero restic repo` command is renamed to `velero repo`
* `velero-restic-credentials` secret is renamed to `velero-repo-credentials`
* `default-volumes-to-restic` parameter is renamed to `default-volumes-to-fs-backup`
* `restic-timeout` parameter is renamed to `fs-backup-timeout`
* `default-restic-prune-frequency` parameter is renamed to `default-repo-maintain-frequency`
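
When updating install scripts for these renames, a lookup table can help. The sketch below covers only the three parameter renames listed above; `renamedParameters` and `migrateParameter` are hypothetical helper names for this example and are not part of Velero.

```go
package main

import "fmt"

// renamedParameters maps pre-v1.10 parameter names to their v1.10 names,
// taken from the list of renamed parameters above.
var renamedParameters = map[string]string{
	"default-volumes-to-restic":      "default-volumes-to-fs-backup",
	"restic-timeout":                 "fs-backup-timeout",
	"default-restic-prune-frequency": "default-repo-maintain-frequency",
}

// migrateParameter returns the v1.10 name for a parameter and whether it was
// renamed. Names that did not change are returned as-is.
func migrateParameter(name string) (string, bool) {
	if renamed, ok := renamedParameters[name]; ok {
		return renamed, true
	}
	return name, false
}

func main() {
	for _, p := range []string{"restic-timeout", "default-volumes-to-restic", "log-level"} {
		newName, changed := migrateParameter(p)
		fmt.Printf("%s -> %s (renamed: %t)\n", p, newName, changed)
	}
}
```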
#### Upgrade
Due to the major changes of file system backup, the old upgrade steps are not suitable any more. For the new upgrade steps, visit [v1.10 upgrade guide document](https://velero.io/docs/v1.10/upgrade-to-1.10/).
#### Limitations/Known issues
In this release, the Kopia backup repository (and therefore the Kopia path of file system backup) doesn't support self-signed certificates for S3-compatible storage. To track this problem, refer to this [Velero issue](https://github.com/vmware-tanzu/velero/issues/5123) or [Kopia issue](https://github.com/kopia/kopia/issues/1443).
Due to the code change in Velero, there will be some code change required in vSphere plugin, without which the functionality may be impacted. Therefore, if you are using vSphere plugin in your workflow, please hold the upgrade until the issue [#485](https://github.com/vmware-tanzu/velero-plugin-for-vsphere/issues/485) is fixed in vSphere plugin.
### All changes
* Restore ClusterBootstrap before Cluster, otherwise a new default ClusterBootstrap object is created for the cluster (#5616, @ywk253100)
* Add compile restic binary for CVE fix (#5574, @qiuming-best)
* Add credential store in backup deletion controller to support VSL credential. (#5521, @blackpiglet)
* Fix issue 5505: the pod volume backups/restores except the first one fail under the kopia path if "AZURE_CLOUD_NAME" is specified (#5512, @Lyndon-Li)
* After the Pod Volume Backup/Restore refactor, remove all leftover occurrences of the word "restic" from the documents (#5499, @Lyndon-Li)
* Refactor Pod Volume Backup/Restore doc to match the new behavior (#5484, @Lyndon-Li)
* Remove redundancy code block left by #5388. (#5483, @blackpiglet)
* Issue fix 5477: create the common way to support S3 compatible object storages that work for both Restic and Kopia; Keep the resticRepoPrefix parameter for compatibility (#5478, @Lyndon-Li)
* Update the k8s.io dependencies to 0.24.0. This also required an update to github.com/bombsimon/logrusr/v3. Removed the `WithClusterName` method as it is a "legacy field that was always cleared by the system and never used" as per upstream k8s
* Remove leftover "Restic" names in Velero code after the PVBR refactor (#5444, @Lyndon-Li)
* moved RIA execute input/output structs back to velero package (#5441, @sseago)
* Rename Velero pod volume restore init helper from "velero-restic-restore-helper" to "velero-restore-helper" (#5432, @Lyndon-Li)
* Skip the exclusion check for additional resources returned by BIA (#5429, @reasonerjt)
* Change B/R describe CLI to support Kopia (#5412, @allenxu404)
* Add nil check before execution of csi snapshot delete (#5401, @shubham-pampattiwar)
* update velero using klog to version v2.9.0 (#5396, @blackpiglet)
* Fix Test_prepareBackupRequest_BackupStorageLocation UT failure. (#5394, @blackpiglet)
* Rename Velero daemonset from "restic" to "node-agent" (#5390, @Lyndon-Li)
* Add some corner cases checking for CSI snapshot in backup controller. (#5388, @blackpiglet)
* Fix issue 5386: Velero provides a full URL as the S3Url while the underlying minio client only accepts the host part of the URL as the endpoint, and the scheme should be specified separately. (#5387, @Lyndon-Li)
* Fix restore error with flag namespace-mappings (#5377, @qiuming-best)
* Pod Volume Backup/Restore Refactor: Rename parameters in CRDs and commands to remove "Restic" word (#5370, @Lyndon-Li)
* Added backupController's UT to test the prepareBackupRequest() method BackupStorageLocation processing logic (#5362, @niulechuan)
* Fix a repoEnsurer problem introduced by the refactor: the repoEnsurer didn't check the "" state of BackupRepository; as a result, the function GetBackupRepository always returned without an error even though ensureReady was specified. (#5359, @Lyndon-Li)
* Add E2E test for schedule backup (#5355, @danfengliu)
* Add useOwnerReferencesInBackup field doc for schedule. (#5353, @cleverhu)
* Clarify the help message for the default value of parameter --snapshot-volumes, when it's not set. (#5350, @blackpiglet)
* Fix issue 4874 and 4752: check the daemonset pod is running in the node where the workload pod resides before running the PVB for the pod (#5319, @Lyndon-Li)
* plugin versioning v1 refactor for VolumeSnapshotter (#5318, @sseago)
* Change the status of a restore from partially failed to completed when restoring an empty backup (#5314, @allenxu404)
* RestoreItemAction v1 refactoring for plugin api versioning (#5312, @sseago)
* Refactor the repoEnsurer code to use controller runtime client and wrap some common BackupRepository operations to share with other modules (#5308, @Lyndon-Li)
* Remove snapshot related lister, informer and client from backup controller. (#5299, @jxun)
* change CSISnapshotTimeout from pointer to normal variables. (#5294, @cleverhu)
* Optimize code for restore exists resources. (#5293, @cleverhu)
* Add more detailed comments for labels columns. (#5291, @cleverhu)
* Add backup status checking in schedule controller. (#5283, @blackpiglet)
* Add changes for problems/enhancements found during smoking test for Kopia pod volume backup/restore (#5282, @Lyndon-Li)
* Support pause/unpause schedules (#5279, @ywk253100)
* plugin/clientmgmt refactoring for BackupItemAction v1 (#5271, @sseago)
* Don't move velero v1 plugins to new proto dir (#5263, @sseago)
* Fill gaps for Kopia path of PVBR: integrate Repo Manager with Unified Repo; pass UploaderType to PVBR backupper and restorer; pass RepositoryType to BackupRepository controller and Repo Ensurer (#5259, @Lyndon-Li)
* Add csiSnapshotTimeout for describe backup (#5252, @cleverhu)
* equip gc controller with configurable frequency (#5248, @allenxu404)
* Fix nil pointer panic when restoring StatefulSets (#5247, @divolgin)
* Add labeled and unlabeled events for PR changelog check action. (#5157, @jxun)
* VolumeSnapshotLocation refactor with kubebuilder. (#5148, @jxun)
* Delay CA file deletion in PVB controller. (#5145, @jxun)
* This commit splits the pkg/restic package into several packages to support Kopia integration works (#5143, @ywk253100)
* Kopia Integration: Add the Unified Repository Interface definition. Kopia Integration: Add the changes for Unified Repository storage config. Related Issues; #5076, #5080 (#5142, @Lyndon-Li)
* Update the CRD for kopia integration (#5135, @reasonerjt)
* Let "make shell xxx" respect GOPROXY (#5128, @reasonerjt)
* Modify BackupStoreGetter to avoid BSL spec changes (#5122, @sseago)
* Dump stack trace when the plugin server handles panic (#5110, @reasonerjt)
* Make CSI snapshot creation timeout configurable. (#5104, @jxun)
* Fix bsl validation bug: the BSL is validated continually and doesn't respect the validation period configured (#5101, @ywk253100)
* Exclude "csinodes.storage.k8s.io" and "volumeattachments.storage.k8s.io" from restore by default. (#5064, @jxun)
* Move 'velero.io/exclude-from-backup' label string to const (#5053, @niulechuan)
* Modify Github actions. (#5052, @jxun)
* Fix typo in doc, in https://velero.io/docs/main/restore-reference/ "Restore order" section, "Mamespace" should be "Namespace". (#5051, @niulechuan)
* Delete opened issues triage action. (#5041, @jxun)
* When spec.RestoreStatus is empty, don't restore status (#5008, @sseago)
* Added DownloadTargetKindCSIBackupVolumeSnapshots for retrieving the signed URL to download only the `<backup name>`-csi-volumesnapshots.json.gz and DownloadTargetKindCSIBackupVolumeSnapshotContents to download only `<backup name>`-csi-volumesnapshotcontents.json.gz in the DownloadRequest CR structure. These files are already present in the backup layout. (#4980, @anshulahuja98)
* Refactor BackupItemAction proto and related code to backupitemaction/v1 package. This is part of implementation of the plugin version design https://github.com/vmware-tanzu/velero/blob/main/design/plugin-versioning.md (#4943, @phuongatemc)
* Unified Repository Design (#4926, @Lyndon-Li)
* Add credentials to volume snapshot locations (#4864, @sseago)
This feature implements the BackupItemAction v2. BIA v2 has two new methods: Progress() and Cancel() and modifies the Execute() return value.
The API change is needed to facilitate long-running BackupItemAction plugin actions that may not be complete when the Execute() method returns. This allows long-running BackupItemAction plugin actions to continue in the background while Velero moves on to the following plugin or the next item.
#### RestoreItemAction v2
This feature implements the RestoreItemAction v2. RIA v2 has three new methods: Progress(), Cancel(), and AreAdditionalItemsReady(), and it modifies the RestoreItemActionExecuteOutput() structure in the RIA return value.
The Progress() and Cancel() methods are needed to facilitate long-running RestoreItemAction plugin actions that may not be complete when the Execute() method returns. This allows long-running RestoreItemAction plugin actions to continue in the background while Velero moves on to the following plugin or the next item. The AreAdditionalItemsReady() method is needed to allow plugins to tell Velero to wait until the returned additional items have been restored and are ready for use in the cluster before restoring the current item.
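The asynchronous pattern described above can be sketched roughly as follows. This is a minimal illustration only: `AsyncItemAction`, `OperationProgress`, `fakeAction`, and `waitForOperation` are hypothetical names invented for this example and are not Velero's actual plugin interfaces or signatures.

```go
package main

import (
	"errors"
	"fmt"
)

// OperationProgress is a hypothetical, simplified progress report.
type OperationProgress struct {
	Completed bool
}

// AsyncItemAction is an illustrative interface, NOT Velero's real plugin API.
type AsyncItemAction interface {
	// Execute starts the work and returns an ID for the background operation.
	Execute(item string) (operationID string, err error)
	// Progress reports the state of a previously started operation.
	Progress(operationID string) (OperationProgress, error)
	// Cancel stops an operation that is no longer wanted.
	Cancel(operationID string) error
}

// fakeAction completes each operation after two Progress polls.
type fakeAction struct {
	pollsLeft map[string]int
}

func (a *fakeAction) Execute(item string) (string, error) {
	id := "op-" + item
	a.pollsLeft[id] = 2
	return id, nil
}

func (a *fakeAction) Progress(id string) (OperationProgress, error) {
	left, ok := a.pollsLeft[id]
	if !ok {
		return OperationProgress{}, errors.New("unknown operation " + id)
	}
	if left == 0 {
		return OperationProgress{Completed: true}, nil
	}
	a.pollsLeft[id] = left - 1
	return OperationProgress{}, nil
}

func (a *fakeAction) Cancel(id string) error {
	delete(a.pollsLeft, id)
	return nil
}

// waitForOperation polls Progress up to maxPolls times and cancels the
// operation if it has not completed by then.
func waitForOperation(a AsyncItemAction, id string, maxPolls int) (bool, error) {
	for i := 0; i < maxPolls; i++ {
		p, err := a.Progress(id)
		if err != nil {
			return false, err
		}
		if p.Completed {
			return true, nil
		}
	}
	return false, a.Cancel(id)
}

func main() {
	action := &fakeAction{pollsLeft: map[string]int{}}
	id, _ := action.Execute("pvc-data")
	done, err := waitForOperation(action, id, 5)
	fmt.Println(id, done, err)
}
```

The point of the sketch is that Execute() returns immediately with an operation handle, so the caller is free to process other items and check back later instead of blocking.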
#### Plugin Progress Monitoring
This is intended as a replacement for the previously-approved Upload Progress Monitoring design ([Upload Progress Monitoring](https://github.com/vmware-tanzu/velero/blob/main/design/upload-progress.md)) to expand the supported use cases beyond snapshot upload to include what was previously called Async Backup/Restore Item Actions.
#### Flexible resource policy that can filter volumes to skip in the backup
This feature provides a flexible policy to filter volumes in the backup without requiring patching any labels or annotations to the pods or volumes. This policy is configured as a k8s ConfigMap and maintained by the users themselves, and it can be extended to more scenarios in the future. Currently, the policy rules out volumes from backup depending on the CSI driver, NFS setting, volume size, and StorageClass setting. Please refer to the [policy API design](https://github.com/vmware-tanzu/velero/blob/main/design/Implemented/handle-backup-of-volumes-by-resources-filters.md#api-design) for the policy's ConfigMap format. It is not guaranteed to work on unofficial third-party plugins, as they may not follow the existing backup workflow code logic of Velero.
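To make the filtering idea concrete, here is a minimal sketch of how a skip policy keyed on CSI driver, NFS, size, and StorageClass could be evaluated. The `Volume` and `SkipRule` types and their fields are assumptions made up for this illustration; the real policy format is the ConfigMap described in the linked design.

```go
package main

import "fmt"

// Volume is a hypothetical, simplified view of a volume under consideration.
type Volume struct {
	CSIDriver    string
	IsNFS        bool
	SizeGiB      int
	StorageClass string
}

// SkipRule is a hypothetical rule; zero-valued fields are ignored, so an
// entirely empty rule matches every volume.
type SkipRule struct {
	CSIDriver    string
	NFS          bool
	MinSizeGiB   int
	StorageClass string
}

// matches reports whether every condition set on the rule holds for v.
func (r SkipRule) matches(v Volume) bool {
	if r.CSIDriver != "" && r.CSIDriver != v.CSIDriver {
		return false
	}
	if r.NFS && !v.IsNFS {
		return false
	}
	if r.MinSizeGiB > 0 && v.SizeGiB < r.MinSizeGiB {
		return false
	}
	if r.StorageClass != "" && r.StorageClass != v.StorageClass {
		return false
	}
	return true
}

// shouldSkip returns true if any rule matches the volume.
func shouldSkip(rules []SkipRule, v Volume) bool {
	for _, r := range rules {
		if r.matches(v) {
			return true
		}
	}
	return false
}

func main() {
	rules := []SkipRule{
		{NFS: true},
		{StorageClass: "scratch", MinSizeGiB: 100},
	}
	fmt.Println(shouldSkip(rules, Volume{IsNFS: true, SizeGiB: 5}))
	fmt.Println(shouldSkip(rules, Volume{StorageClass: "scratch", SizeGiB: 10}))
}
```

Because the rules live outside the workload objects, nothing needs to be labeled or annotated on the pods or volumes themselves.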
#### Resource Filters that can distinguish cluster scope and namespace scope resources
This feature adds four new resource filters for backup. The new filters are separated into cluster scope and namespace scope. Before this feature, Velero could not filter cluster scope resources precisely. This feature provides the ability and refactors existing resource filter parameters.
#### Add a parameter for setting the Velero server connection with the k8s API server's timeout
In Velero, some code pieces need to communicate with the k8s API server. Before v1.11, these code pieces used hard-coded timeout settings. This feature adds a resource-timeout parameter in the velero server binary to make it configurable.
#### Add resource list in the output of the restore describe command
Before this feature, Velero restore didn't have a restored resources list as Velero backup does, which made it inconvenient for users to learn what was restored. This feature adds the resources list and the handling result of the resources (including created, updated, failed, and skipped).
#### Refactor controllers with controller-runtime
In v1.11, the Backup controller and Restore controller are refactored with controller-runtime. As of v1.11, all Velero controllers use the controller-runtime framework.
#### Runtime and dependencies
To fix CVEs and keep pace with Golang, Velero made changes as follows:
* Bump Golang runtime to v1.19.8.
* Bump several dependent libraries to new versions.
* Compile Restic (v0.15.0) with Golang v1.19.8 instead of packaging the official binary.
### Breaking changes
* The Velero CSI plugin now determines whether to restore a volume's data from snapshots based on the restore's restorePVs setting. Before v1.11, the CSI plugin didn't check the restorePVs parameter setting.
### Limitations/Known issues
* The Flexible resource policy that can filter volumes to skip in the backup is not guaranteed to work on unofficial third-party plugins because the plugins may not follow the existing backup workflow code logic of Velero. The ConfigMap used as the policy is supposed to be maintained by users.
### All Changes
* Modify new scope resource filters name. (#6089, @blackpiglet)
* Make Velero not exit when EnableCSI is on and CSI snapshot is not installed (#6062, @blackpiglet)
* Restore Services before Clusters (#6057, @ywk253100)
* Fixed backup deletion bug related to async operations (#6041, @sseago)
* Update Golang version to v1.19 for branch main. (#6039, @blackpiglet)
* Fix issue #5972, don't assume errorField as error type when dealing with logger.WithError (#6028, @Lyndon-Li)
* distinguish between New and InProgress operations (#6012, @sseago)
* Modify golangci.yaml file. Resolve found lint issues. (#6008, @blackpiglet)
* Remove Reference of itemsnapshotter (#5997, @reasonerjt)
* minor fixes for backup_operations_controller (#5996, @sseago)
* RIAv2 async operations controller work (#5993, @sseago)
* Follow-on fixes for BIAv2 controller work (#5971, @sseago)
* Refactor backup controller based on the controller-runtime framework. (#5969, @qiuming-best)
* Fix client wait problem after async operation change, velero backup/restore --wait should check a full list of the terminal status (#5964, @Lyndon-Li)
* Fix issue #5935, refactor the logics for backup/restore persistent log, so as to remove the contest to gzip writer (#5956, @Lyndon-Li)
* Switch the base image to distroless/base-nossl-debian11 to reduce the CVE triage efforts (#5939, @ywk253100)
* Wait for additional items to be ready before restoring current item (#5933, @sseago)
* Add configurable server setting for default timeouts (#5926, @eemcmullan)
* Add warning/error result to cmd `velero backup describe` (#5916, @allenxu404)
* Fix Dependabot alerts. Use 1.18 and 1.19 golang instead of patch image in dockerfile. Add release-1.10 and release-1.9 in Trivy daily scan. (#5911, @blackpiglet)
* Update client-go to v0.25.6 (#5907, @kaovilai)
* Limit the concurrent number for backup's VolumeSnapshot operation. (#5900, @blackpiglet)
* Fix goreleaser issue for resolving tags and update its version. (#5899, @anshulahuja98)
* This is to fix issue 5881, enhance the PVB tracker in two modes, Track and Taken (#5894, @Lyndon-Li)
* Add labels for velero installed namespace to support PSA. (#5873, @blackpiglet)
* Add restored resource list in the restore describe command (#5867, @ywk253100)
* Add a json output to cmd velero backup describe (#5865, @allenxu404)
* Make restore controller adopting the controller-runtime framework. (#5864, @blackpiglet)
* Replace k8s.io/apimachinery/pkg/util/clock with k8s.io/utils/clock (#5859, @hezhizhen)
* Restore finalizer and managedFields of metadata during the restoration (#5853, @ywk253100)
* BIAv2 async operations controller work (#5849, @sseago)
* Add secret restore item action to handle service account token secret (#5843, @ywk253100)
* Add new resource filters that can separate cluster and namespace scope resources. (#5838, @blackpiglet)
* Correct PVB/PVR Failed Phase patching during startup (#5828, @kaovilai)
* bump up golang net to fix CVE-2022-41721 (#5812, @Lyndon-Li)
* Update CRD descriptions for SnapshotVolumes and restorePVs (#5807, @shubham-pampattiwar)
* Add mapped selected-node existence check (#5806, @blackpiglet)
* Add option "--service-account-name" to install cmd (#5802, @reasonerjt)
* Define itemoperations.json format and update DownloadRequest API (#5752, @sseago)
* Add Trivy nightly scan. (#5740, @jxun)
* Fix issue 5696, check if the repo is still openable before running the prune and forget operation, if not, try to reconnect the repo (#5715, @Lyndon-Li)
* Fix error with Restic backup empty volumes (#5713, @qiuming-best)
* new backup and restore phases to support async plugin operations:
CSI Snapshot Data Movement refers to backing up CSI snapshot data from the volatile and limited production environment into durable, heterogeneous, and scalable backup storage in a consistent manner, and restoring the data to volumes in the original or an alternative environment.
CSI Snapshot Data Movement is useful in below scenarios:
* For on-premises users, the storage usually doesn't support durable snapshots, so it is impossible/less efficient/cost ineffective to keep volume snapshots by the storage. This feature helps to move the snapshot data to a storage with lower cost and larger scale for long time preservation.
* For public cloud users, this feature helps users to fulfill a multi-cloud strategy. It allows users to back up volume snapshots from one cloud provider and preserve or restore the data to another cloud provider. Users are then free to move their business data across cloud providers based on Velero backup and restore.
CSI Snapshot Data Movement is built according to the Volume Snapshot Data Movement design ([Volume Snapshot Data Movement](https://github.com/vmware-tanzu/velero/blob/main/design/Implemented/unified-repo-and-kopia-integration/unified-repo-and-kopia-integration.md)). More details can be found in the design.
#### Resource Modifiers
In many use cases, customers often need to substitute specific values in Kubernetes resources during the restoration process like changing the namespace, changing the storage class, etc.
To address this need, Resource Modifiers (also known as JSON Substitutions) offer a generic solution in the restore workflow. They allow the user to define filters for specific resources and then specify a JSON patch (operator, path, value) to apply to the resource. This feature simplifies the process of making substitutions without requiring the implementation of a new RestoreItemAction plugin. More details can be found in the Resource Modifiers design ([Resource Modifiers](https://github.com/vmware-tanzu/velero/blob/main/design/Implemented/json-substitution-action-design.md)).
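As a rough illustration of the (operator, path, value) idea, the sketch below applies a single `replace` operation to a resource held as nested maps. It is deliberately minimal and assumption-laden: `PatchOp` and `applyReplace` are hypothetical names, only `replace` on object keys is handled, and this is not how Velero itself implements Resource Modifiers.

```go
package main

import (
	"fmt"
	"strings"
)

// PatchOp is a hypothetical, simplified JSON Patch style operation.
type PatchOp struct {
	Operation string
	Path      string
	Value     interface{}
}

// applyReplace handles only the "replace" operation on nested object keys.
// Array indexes and JSON Pointer escapes are intentionally not supported.
func applyReplace(doc map[string]interface{}, op PatchOp) error {
	if op.Operation != "replace" {
		return fmt.Errorf("unsupported operation %q", op.Operation)
	}
	keys := strings.Split(strings.TrimPrefix(op.Path, "/"), "/")
	cur := doc
	// Walk down to the parent object of the target key.
	for _, k := range keys[:len(keys)-1] {
		next, ok := cur[k].(map[string]interface{})
		if !ok {
			return fmt.Errorf("path %q not found", op.Path)
		}
		cur = next
	}
	last := keys[len(keys)-1]
	// "replace" requires the target to exist already.
	if _, ok := cur[last]; !ok {
		return fmt.Errorf("path %q not found", op.Path)
	}
	cur[last] = op.Value
	return nil
}

func main() {
	pvc := map[string]interface{}{
		"kind": "PersistentVolumeClaim",
		"spec": map[string]interface{}{"storageClassName": "standard"},
	}
	op := PatchOp{Operation: "replace", Path: "/spec/storageClassName", Value: "premium"}
	if err := applyReplace(pvc, op); err != nil {
		fmt.Println("patch failed:", err)
		return
	}
	fmt.Println(pvc["spec"])
}
```

Changing a storage class during restore, as in the example, is one of the substitutions the section mentions.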
#### Multiple VolumeSnapshotClasses
Prior to version 1.12, the Velero CSI plugin would choose the VolumeSnapshotClass in the cluster based on matching driver names and the presence of the "velero.io/csi-volumesnapshot-class" label. However, this approach proved inadequate for many user scenarios.
With the introduction of version 1.12, Velero now offers support for multiple VolumeSnapshotClasses in the CSI Plugin, enabling users to select a specific class for a particular backup. More details can be found in Multiple VolumeSnapshotClasses design ([Multiple VolumeSnapshotClasses](https://github.com/vmware-tanzu/velero/blob/main/design/Implemented/multiple-csi-volumesnapshotclass-support.md)).
#### Restore Finalizer
Before v1.12, the restore controller would only delete restore resources but wouldn’t delete restore data from the backup storage location when the command `velero restore delete` was executed. The only time Velero deleted restore data from the backup storage location was when the associated backup was deleted.
In this version, Velero introduces a finalizer that ensures the cleanup of all associated data for restores when running the command `velero restore delete`.
#### Runtime and dependencies
To fix CVEs and keep pace with Golang, Velero made changes as follows:
* Bump Golang runtime to v1.20.7.
* Bump several dependent libraries to new versions.
* Bump Kopia to v0.13.
### Breaking changes
* Prior to v1.12, the parameter `uploader-type` for Velero installation had a default value of "restic". However, starting from this version, the default value has been changed to "kopia". This means that Velero will now use Kopia as the default path for file system backup.
* The ways of setting the CSI snapshot timeout have changed in v1.12. First, the sync waiting time for creating a snapshot handle in the CSI plugin is changed from a fixed 10 minutes to backup.Spec.CSISnapshotTimeout. Second, the async waiting time for the VolumeSnapshot and VolumeSnapshotContent status to become `ReadyToUse` uses the operation's timeout. The default value is 4 hours.
* As of [Velero helm chart v4.0.0](https://github.com/vmware-tanzu/helm-charts/releases/tag/velero-4.0.0), the chart supports multiple BSLs and VSLs, and the BSL and VSL configuration has changed from a map to a slice. [This breaking change](https://github.com/vmware-tanzu/helm-charts/pull/413) is not backward compatible, so change the BSL and VSL configuration into slices before upgrading.
### Limitations/Known issues
* The Azure plugin supports Azure AD Workload Identity, but only for Velero native snapshots. It cannot support filesystem backup and snapshot data mover scenarios.
### All Changes
* Fixes #6498. Get resource client again after restore actions in case resource's gv is changed. This is an improvement of pr #6499, to support group changes. A group change usually happens in a restore plugin which is used for resource conversion: convert a resource from a not supported gv to a supported gv (#6634, @27149chen)
* Add API support for volMode block, only error for now. (#6608, @shawn-hurley)
* Fix how the AWS credentials are obtained from configuration (#6598, @aws_creds)
* Add performance E2E test (#6569, @qiuming-best)
* Non default s3 credential profiles work on Unified Repository Provider (kopia) (#6558, @kaovilai)
* Fix issue #6571, fix the problem for restore item operation to set the errors correctly so that they can be recorded by Velero restore and then reflect the correct status for Velero restore. (#6594, @Lyndon-Li)
* Fix issue 6575, flush the repo after delete the snapshot, otherwise, the changes(deleting repo snapshot) cannot be committed to the repo. (#6587, @Lyndon-Li)
* Delete moved snapshots when the backup is deleted (#6547, @reasonerjt)
* check if restore crd exist before operating restores (#6544, @allenxu404)
* Remove PVC's selector in backup's PVC action. (#6481, @blackpiglet)
* Delete the expired deletebackuprequests that are stuck in "InProgress" (#6476, @reasonerjt)
* Fix issue #6534, reset PVB CR's StorageLocation to the latest one during backup sync as same as the backup CR. Also fix similar problem with DataUploadResult for data mover restore. (#6533, @Lyndon-Li)
* Fix issue #6519. Restrict the client manager of node-agent server to include only Velero resources from the server's namespace, otherwise, the controllers will try to reconcile CRs from all the installed Velero namespaces. (#6523, @Lyndon-Li)
* Track the skipped PVC and print the summary in backup log (#6496, @reasonerjt)
* Add restore finalizer to clean up external resources (#6479, @allenxu404)
* fix: Typos and add more spell checking rules to CI (#6415, @mateusoliveira43)
* Add missing CompletionTimestamp and metrics when restore moved into terminal phase in restoreOperationsReconciler (#6397, @Nutrymaco)
* Add support for resource Modifications in the restore flow. Also known as JSON Substitutions. (#6452, @anshulahuja98)
* Remove dependency of the legacy client code from pkg/cmd directory part 2 (#6497, @blackpiglet)
* Add data upload and download metrics (#6493, @allenxu404)
* Fix issue 6490, If a backup/restore has multiple async operations and one operation fails while others are still in-progress, when all the operations finish, the backup/restore will be set as Completed falsely (#6491, @Lyndon-Li)
* Velero Plugins no longer need kopia indirect dependency in their go.mod (#6484, @kaovilai)
* Remove dependency of the legacy client code from pkg/cmd directory (#6469, @blackpiglet)
* Add support for OpenStack CSI drivers topology keys (#6464, @openstack-csi-topology-keys)
* Add exit code log and possible memory shortage warning log for Restic command failure. (#6459, @blackpiglet)
* Add UT cases for pkg/podvolume (#6336, @Lyndon-Li)
* Remove Wait VolumeSnapshot to ReadyToUse logic. (#6327, @blackpiglet)
* Enhance the code because of #6297, the return value of GetBucketRegion is not recorded, as a result, when it fails, we have no way to get the cause (#6326, @Lyndon-Li)
* Skip updating status when CRDs are restored (#6325, @reasonerjt)
* Include namespaces needed by namespaced-scope resources in backup. (#6320, @blackpiglet)
* Update metrics when backup failed with validation error (#6318, @ywk253100)
* Add the code for data mover backup expose (#6308, @Lyndon-Li)
* Fix a PVR issue for generic data path -- the namespace remap was not honored, and enhance the code for better error handling (#6303, @Lyndon-Li)
* Add default values for defaultItemOperationTimeout and itemOperationSyncFrequency in velero CLI (#6298, @shubham-pampattiwar)
* Add UT cases for pkg/repository (#6296, @Lyndon-Li)
* Fix issue #5875. Since Kopia has supported IAM, Velero should not require static credentials all the time (#6283, @Lyndon-Li)
* Fixed a bug where status.progress is not getting updated for backups. (#6276, @kkothule)
* Add code change for async generic data path that is used by both PVB/PVR and data mover (#6226, @Lyndon-Li)
* Add data mover CRD under v2alpha1, include DataUpload CRD and DataDownload CRD (#6176, @Lyndon-Li)
* Remove any dataSource or dataSourceRef fields from PVCs in PVC BIA for cases of prior PVC restores with CSI (#6111, @eemcmullan)
* Add the design for Volume Snapshot Data Movement (#5968, @Lyndon-Li)
* Fix issue #5123, Kopia repository supports self-cert CA for S3 compatible storage. (#6268, @Lyndon-Li)
* Bump up Kopia to v0.13 (#6248, @Lyndon-Li)
* log volumes to backup to help debug why `IsPodRunning` is called. (#6232, @kaovilai)
* Enable errcheck linter and resolve found issues (#6208, @blackpiglet)
* Enable more linters, and remove mal-functioned milestoned issue action. (#6194, @blackpiglet)
* Enable stylecheck linter and resolve found issues. (#6185, @blackpiglet)
* Fix issue #6182. If pod is not running, don't treat it as an error, let it go and leave a warning. (#6184, @Lyndon-Li)
* Enable staticcheck and resolve found issues (#6183, @blackpiglet)
* Enable linter revive and resolve found errors: part 2 (#6177, @blackpiglet)
* Enable linter revive and resolve found errors: part 1 (#6173, @blackpiglet)
* Fix usestdlibvars and whitespace linters issues. (#6162, @blackpiglet)
* Update Golang to v1.20 for main. (#6158, @blackpiglet)
* Make GetPluginConfig accessible from other packages. (#6151, @tkaovila)
* Ignore not found error during patching managedFields (#6136, @ywk253100)
* Fix the goreleaser issues and add a new goreleaser action (#6109, @blackpiglet)
Velero introduced the Resource Modifiers in v1.12.0. This feature allows users to specify a ConfigMap with a set of rules to modify the resources during restoration. However, only the JSON Patch is supported when creating the rules, and JSON Patch has some limitations, which cannot cover all use cases. In v1.13.0, Velero adds new support for JSON Merge Patch and Strategic Merge Patch, which provide more power and flexibility and allow users to use the same ConfigMap to apply patches on the resources. More design details can be found in [Support JSON Merge Patch and Strategic Merge Patch in Resource Modifiers](https://github.com/vmware-tanzu/velero/blob/main/design/Implemented/merge-patch-and-strategic-in-resource-modifier.md) design. For instructions on how to use the feature, please refer to the [Resource Modifiers](https://velero.io/docs/v1.13/restore-resource-modifiers/) doc.
#### Node-Agent Concurrency
Velero data movement activities from fs-backups and CSI snapshot data movements run in Velero node-agent, so may be hosted by every node in the cluster and consume resources (i.e. CPU, memory, network bandwidth) from there. With v1.13, users are allowed to configure how many data movement activities (a.k.a, loads) run in each node globally or by node, so that users can better leverage the performance of Velero data movement activities and the resource consumption in the cluster. For more information, check the [Node-Agent Concurrency](https://velero.io/docs/v1.13/node-agent-concurrency/) document.
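The "globally or by node" choice can be pictured as a lookup with a fallback. The sketch below is an assumption-heavy illustration: `ConcurrencyConfig`, its fields, keying overrides by node name, and the fallback of 1 are all invented here and do not reflect the actual node-agent configuration format, which is described in the linked document.

```go
package main

import "fmt"

// ConcurrencyConfig is a hypothetical configuration shape for this example.
type ConcurrencyConfig struct {
	// GlobalLimit applies to every node without an override.
	GlobalLimit int
	// PerNode holds overrides keyed by node name (an assumption of this sketch).
	PerNode map[string]int
}

// limitFor returns the number of concurrent loads allowed on a node:
// the per-node override if set, else the global limit, else 1 (assumed default).
func (c ConcurrencyConfig) limitFor(node string) int {
	if n, ok := c.PerNode[node]; ok && n > 0 {
		return n
	}
	if c.GlobalLimit > 0 {
		return c.GlobalLimit
	}
	return 1
}

func main() {
	cfg := ConcurrencyConfig{
		GlobalLimit: 2,
		PerNode:     map[string]int{"node-a": 4},
	}
	fmt.Println(cfg.limitFor("node-a"), cfg.limitFor("node-b"))
}
```

A larger per-node value suits nodes with spare CPU, memory, and bandwidth, while the global value caps everything else.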
#### Parallel Files Upload Options
Velero now supports configurable options for parallel file uploads when using the Kopia uploader for fs-backups or CSI snapshot data movements, which makes it possible to speed up backups.
For more information, please check [Here](https://velero.io/docs/v1.13/backup-reference/#parallel-files-upload).
#### Write Sparse Files Options
When using fs-restore or CSI snapshot data movements, Velero now supports writing sparse files during restore. For more information, please check [Here](https://velero.io/docs/v1.13/restore-reference/#write-sparse-files).
#### Backup Describe
In v1.13, the Backup Volumes section is added to the velero backup describe command output. The Backup Volumes section describes information for all the volumes included in the backup across the various backup types, i.e. native snapshot, fs-backup, CSI snapshot, and CSI snapshot data movement. In particular, the velero backup describe command now shows the information of CSI snapshot data movements, which was not supported in v1.12.
Additionally, the backup describe command no longer checks the EnableCSI feature gate from the client side, so if a backup has volumes with CSI snapshot or CSI snapshot data movement, the backup describe command always shows the corresponding information in its output.
#### Backup's new VolumeInfo metadata
Create a new metadata file in the backup repository's backup name sub-directory to store the backup-including PVC and PV information. The information includes the backing-up method of the PVC and PV data, snapshot information, and status. The VolumeInfo metadata file determines how the PV resource should be restored. The Velero downstream software can also use this metadata file to get a summary of the backup's volume data information.
#### Enhancement for CSI Snapshot Data Movements when Velero Pod Restart
When performing backup and restore operations, enhancements have been implemented for Velero server pods or node agents to ensure that the current backup or restore process is not stuck or interrupted after restart due to certain exceptional circumstances.
#### New status fields added to show hook execution details
Hook execution status is now included in the backup/restore CR status and displayed in the backup/restore describe command output. Specifically, it will show the number of hooks which attempted to execute under the HooksAttempted field and the number of hooks which failed to execute under the HooksFailed field.
#### AWS SDK Bump Up
Bump up AWS SDK for Go to version 2, which offers significant performance improvements in CPU and memory utilization over version 1.
#### Azure AD/Workload Identity Support
Azure AD/Workload Identity is the recommended approach for authenticating with Azure services/AKS. Velero introduced support for Azure AD/Workload Identity on the Velero Azure plugin side in previous releases, and v1.13.0 adds support for Kopia operations (file system backup, data mover, etc.) with Azure AD/Workload Identity.
#### Runtime and dependencies
To fix CVEs and keep pace with Golang, Velero made changes as follows:
* Bump Golang runtime to v1.21.6.
* Bump several dependent libraries to new versions.
* Bump Kopia to v0.15.0.
### Breaking changes
* Backup describe command: due to the backup describe output enhancement, some existing information (i.e. the output for native snapshot, CSI snapshot, and fs-backup) has been moved to the Backup Volumes section with some format changes.
* API type changes: the field [DataMoverConfig](https://github.com/vmware-tanzu/velero/blob/v1.13.0/pkg/apis/velero/v2alpha1/data_upload_types.go#L54) in DataUploadSpec changes from `*map[string]string` to `map[string]string`.
* Velero install command: due to issue [#7264](https://github.com/vmware-tanzu/velero/issues/7264), v1.13.0 introduces a breaking change that makes the informer cache enabled by default, to keep the actual behavior consistent with the help message (the informer cache was disabled by default before the change).
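For code that reads `DataMoverConfig`, the practical effect of moving from a pointer-to-map to a plain map can be sketched as below. The `oldSpec` and `newSpec` structs are hypothetical stand-ins reduced to the one field for illustration, not the real DataUploadSpec, and the `"mode"` key is made up.

```go
package main

import "fmt"

// oldSpec and newSpec are hypothetical stand-ins that keep only the one field.
type oldSpec struct {
	DataMoverConfig *map[string]string
}

type newSpec struct {
	DataMoverConfig map[string]string
}

// oldLookup must guard against a nil pointer before dereferencing.
func oldLookup(s oldSpec, key string) string {
	if s.DataMoverConfig == nil {
		return ""
	}
	return (*s.DataMoverConfig)[key]
}

// newLookup needs no guard: reading from a nil map returns the zero value.
func newLookup(s newSpec, key string) string {
	return s.DataMoverConfig[key]
}

func main() {
	cfg := map[string]string{"mode": "fast"}
	fmt.Println(oldLookup(oldSpec{DataMoverConfig: &cfg}, "mode"))
	fmt.Println(newLookup(newSpec{DataMoverConfig: cfg}, "mode"))
	fmt.Println(newLookup(newSpec{}, "mode") == "")
}
```

Callers that previously passed `&cfg` or dereferenced the field need to be updated when building against the new type.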
### Limitations/Known issues
* The backup's VolumeInfo metadata doesn't have the information updated in the async operations. This function could be supported in v1.14 release.
### Note
* Velero introduces the informer cache which is enabled by default. The informer cache improves the restore performance but may cause higher memory consumption. Increase the memory limit of the Velero pod or disable the informer cache by specifying the `--disable-informer-cache` option when installing Velero if you get the OOM error.
### Deprecation announcement
* The generated k8s clients, informers, and listers are deprecated in the Velero v1.13 release. They are put in the Velero repository's pkg/generated directory. According to the n+2 supporting policy, the deprecated code is kept for two more releases. The pkg/generated directory should be deleted in the v1.15 release.
* After the backup VolumeInfo metadata file is added to the backup, Velero decides how to restore the PV resource according to the VolumeInfo content. To support the backup generated by the older version of Velero, the old logic is also kept. The support for the backup without the VolumeInfo metadata file will be kept for two releases. The support logic will be deleted in the v1.15 release.
### All Changes
* Check resource Group Version and Kind is available in cluster before attempting restore to prevent being stuck (#7336, @kaovilai)
* Make "disable-informer-cache" option false(enabled) by default to keep it consistent with the help message (#7294, @ywk253100)
* Do not set "targetNamespace" to namespace items (#7274, @reasonerjt)
* Fix issue #7244. By the end of the upload, check the outstanding incomplete snapshots and delete them by calling ApplyRetentionPolicy (#7245, @Lyndon-Li)
* Adjust the newline output of resource list in restore describer (#7238, @allenxu404)
* Remove the redundant newline in backup describe output (#7229, @allenxu404)
* Fix issue #7189, data mover generic restore - don't assume the first volume as the restore volume (#7201, @Lyndon-Li)
* Update CSIVolumeSnapshotsCompleted in backup's status and the metric during backup finalize stage according to async operations content. (#7184, @blackpiglet)
* Refactor DownloadRequest Stream function (#7175, @blackpiglet)
* Add `--skip-immediately` flag to schedule commands; `--schedule-skip-immediately` server and install (#7169, @kaovilai)
* Add node-agent concurrency doc and change the config name from dataPathConcurrency to loadConcurrency (#7161, @Lyndon-Li)
* Enhance hooks tracker by adding a returned error to record function (#7153, @allenxu404)
* Track the skipped PV when SnapshotVolumes set as false (#7152, @reasonerjt)
* Add more linters part 2. (#7151, @blackpiglet)
* Fix issue #7135, check pod status before checking node-agent pod status (#7150, @Lyndon-Li)
* Treat namespace as a regular restorable item (#7143, @reasonerjt)
* Fix issue #6695, add describe for data mover backups (#7125, @Lyndon-Li)
* Add hooks status to backup/restore CR (#7117, @allenxu404)
* Include plugin name in the error message by operations (#7115, @reasonerjt)
* Fix issue #7068, due to a behavior of CSI external snapshotter, manipulations of VS and VSC may not be handled in the same order inside external snapshotter as the API is called. So add a protection finalizer to ensure the order (#7102, @Lyndon-Li)
* Generate VolumeInfo for backup. (#7100, @blackpiglet)
* Fix issue #7094, fallback to full backup if previous snapshot is not found (#7096, @Lyndon-Li)
* Fix issue #7068, due to a behavior of CSI external snapshotter, manipulations of VS and VSC may not be handled in the same order inside external snapshotter as the API is called. So add a protection finalizer to ensure the order (#7095, @Lyndon-Li)
* Skip syncing the backup which doesn't contain backup metadata (#7081, @ywk253100)
* Fix issue #6693, partially fail restore if CSI snapshot is involved but CSI feature is not ready, i.e., CSI feature gate is not enabled or CSI plugin is not installed. (#7077, @Lyndon-Li)
* Truncate the credential file to avoid the change of secret content messing it up (#7072, @ywk253100)
* improve discoveryHelper.Refresh() in restore (#7069, @27149chen)
* Add DataUpload Result and CSI VolumeSnapshot check for restore PV. (#7061, @blackpiglet)
* Add the implementation for design #6950, configurable data path concurrency (#7059, @Lyndon-Li)
* Make data mover fail early (#7052, @qiuming-best)
* Remove dependency of generated client part 3. (#7051, @blackpiglet)
* Update Backup.Status.CSIVolumeSnapshotsCompleted during finalize (#7046, @kaovilai)
* Remove the Velero generated client. (#7041, @blackpiglet)
* Fix issue #7027, data mover backup exposer should not assume the first volume as the backup volume in backup pod (#7038, @Lyndon-Li)
* Read information from the credential specified by BSL (#7034, @ywk253100)
* Fix #6857. Added check for matching Owner References when synchronizing backups, removing references that are not found/have mismatched uid. (#7032, @deefdragon)
* Add description markers for dataupload and datadownload CRDs (#7028, @shubham-pampattiwar)
* Add HealthCheckNodePort deletion logic for Service restore. (#7026, @blackpiglet)
* Fix inconsistent behavior of Backup and Restore hook execution (#7022, @allenxu404)
* Fix #6964. Don't use csiSnapshotTimeout (10 min) when waiting for the snapshot to become readyToUse for data mover, so that the behavior is consistent with CSI snapshot backup (#7011, @Lyndon-Li)
* restore: Use warning when Create IsAlreadyExist and Get error (#7004, @kaovilai)
* Bump kopia to 0.15.0 (#7001, @Lyndon-Li)
* Make Kopia file parallelism configurable (#7000, @qiuming-best)
* It is a valid case that the Status.RestoreSize field in VolumeSnapshot is not set, if so, get the volume size from the source PVC to create the backup PVC (#6976, @Lyndon-Li)
* Check whether the action is a CSI action and whether CSI feature is enabled, before executing the action. (#6968, @blackpiglet)
* Add the PV backup information design document. (#6962, @blackpiglet)
* Change controller-runtime List option from MatchingFields to ListOptions (#6958, @blackpiglet)
* Add the design for node-agent concurrency (#6950, @Lyndon-Li)
* Import auth provider plugins (#6947, @0x113)
* Fix #6668, add a limitation for file system restore parallelism with other types of restores (CSI snapshot restore, CSI snapshot movement restore) (#6946, @Lyndon-Li)
* Add MSI Support for Azure plugin. (#6938, @yanggangtony)
* Partially fix #6734, guide Kubernetes' scheduler to spread backup pods evenly across nodes as much as possible, so that data mover backup could achieve better parallelism (#6926, @Lyndon-Li)
* Bump up aws sdk to aws-sdk-go-v2 (#6923, @reasonerjt)
* Optional check if targeted container is ready before executing a hook (#6918, @Ripolin)
* Support JSON Merge Patch and Strategic Merge Patch in Resource Modifiers (#6917, @27149chen)
* Fix issue #6913: Velero Built-in Datamover: Backup gets stuck in phase WaitingForPluginOperations when Node Agent pod gets restarted (#6914, @shubham-pampattiwar)
* Set ParallelUploadAboveSize as MaxInt64 and flush repo after setting up policy so that policy is retrieved correctly by TreeForSource (#6885, @Lyndon-Li)
* Replace the base image with paketobuildpacks image (#6883, @ywk253100)
* Fix issue #6859, move plugin depending podvolume functions to util pkg, so as to remove the dependencies to unnecessary repository packages like kopia, azure, etc. (#6875, @Lyndon-Li)
* Fix #6861. Only Restic path requires repoIdentifier, so for non-restic path, set the repoIdentifier fields as empty in PVB and PVR and also remove the RepoIdentifier column in the get output of PVBs and PVRs (#6872, @Lyndon-Li)
* Add volume types filter in resource policies (#6863, @qiuming-best)
* change the metrics backup_attempt_total default value to 1. (#6838, @yanggangtony)
* Bump kopia to v0.14 (#6833, @Lyndon-Li)
* Retry failed create when using generateName (#6830, @sseago)
* Fix issue #6786, always delete VSC regardless of the deletion policy (#6827, @Lyndon-Li)
* Proposal to support JSON Merge Patch and Strategic Merge Patch in Resource Modifiers (#6797, @27149chen)
* Fix the node-agent missing metrics-address defines. (#6784, @yanggangtony)
* Fix default BSL setting not working (#6771, @qiuming-best)
* Update restore controller logic for restore deletion (#6770, @ywk253100)
* Fix issue #6753, remove the check for read-only BSL in restore async operation controller since Velero cannot fully support read-only mode BSL in restore at present (#6757, @Lyndon-Li)
* Fix issue #6647, add the --default-snapshot-move-data parameter to Velero install, so that users don't need to specify --snapshot-move-data per backup when they want to move snapshot data for all backups (#6751, @Lyndon-Li)
* Use old(origin) namespace in resource modifier conditions in case namespace may change during restore (#6724, @27149chen)
* Perf improvements for existing resource restore (#6723, @sseago)
* Remove schedule-related metrics on schedule delete (#6715, @nilesh-akhade)
* The new Kubernetes 1.27 job label batch.kubernetes.io/controller-uid is deleted during restore per https://github.com/kubernetes/kubernetes/pull/114930 (#6712, @kaovilai)
* Improve Resource Modifiers: add a label selector and rename the field groupKind to groupResource (#6704, @27149chen)
* Make Kopia support Azure AD (#6686, @ywk253100)
* Add support for block volumes with Kopia (#6680, @dzaninovic)
* Delete PartiallyFailed orphaned backups as well as Completed ones (#6649, @sseago)
* Add CSI snapshot data movement doc (#6637, @Lyndon-Li)
* Fixes #6636, skip subresource in resource discovery (#6635, @27149chen)
* Add `orLabelSelectors` for backup, restore commands (#6475, @nilesh-akhade)
* fix run preHook and postHook on completed pods (#5211, @cleverhu)
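
As an illustration of the `orLabelSelectors` entry above (#6475), a Backup that selects resources carrying either of two labels might look like the sketch below. The resource name and label values are hypothetical; only the `orLabelSelectors` field itself comes from the changelog.

```yaml
# Minimal sketch: back up resources that match ANY of the listed selectors.
# "backup-example" and the label values are placeholders, not taken from the release notes.
apiVersion: velero.io/v1
kind: Backup
metadata:
  name: backup-example
  namespace: velero
spec:
  orLabelSelectors:
  - matchLabels:
      app: frontend
  - matchLabels:
      app: backend
```

Each list element is an ordinary label selector, and the elements are ORed together. This sketch assumes `labelSelector` is left unset.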
#### Velero plugins now support handling volumes created by the CSI drivers of cloud providers
Versions 1.4 of the Velero plugins for AWS, Azure and GCP now support snapshotting and restoring the persistent volumes provisioned by CSI driver via the APIs of the cloud providers. With this enhancement, users can backup and restore the persistent volumes on these cloud providers without using the Velero CSI plugin. The CSI plugin will remain beta and the feature flag `EnableCSI` will be disabled by default.
For the version of the plugins and the CSI drivers they support respectively please see the table:
We've verified the functionality of Velero on IPv6 dual stack by successfully running the E2E test on IPv6 dual stack environment.
#### Refactor the controllers using Kubebuilder v3
In this release we continued our code modernization work, rewriting some controllers using Kubebuilder v3. This work is ongoing and we will continue to make progress in future releases.
#### Enhancements to E2E test cases
More test cases have been added to the E2E test suite to improve the release health.
#### Respect the cron setting of scheduled backup
The creation time is now taken into account to calculate the next run for scheduled backup.
#### Deleting BSLs also cleans up related resources
When a Backup Storage Location (BSL) is deleted, backup and Restic repository resources will also be deleted.
#### Breaking changes
Starting in v1.8, Velero only supports Kubernetes v1 CRDs, meaning that Velero v1.8+ will only run on Kubernetes v1.16+. Before upgrading, make sure you are running a supported Kubernetes version. For more information, see our [compatibility matrix](https://github.com/vmware-tanzu/velero#velero-compatibility-matrix).
#### Upload Progress Monitoring and Item Snapshotter
Item Snapshotter plugin API was merged. This will support both Upload Progress
monitoring and the planned Data Mover. Upload Progress monitoring PRs are
in progress for 1.9.
### All changes
* E2E test on ssr object with controller namespace mix-ups (#4521, @mqiu)
* Check whether the volume is provisioned by CSI driver or not by the annotation as well (#4513, @ywk253100)
* Initialize the labels field of `velero backup-location create` option to avoid #4484 (#4491, @ywk253100)
* Fix e2e 2500 namespaces scale test timeout problem (#4480, @mqiu)
* Add backup deletion e2e test (#4401, @danfengliu)
* Return the error when getting backup store in backup deletion controller (#4465, @reasonerjt)
* Ignore the provided port is already allocated error when restoring the LoadBalancer service (#4462, @ywk253100)
* Add rbac and annotation test cases (#4455, @mqiu)
* remove --crds-version in velero install command. (#4446, @jxun)
* Upgrade e2e test vsphere plugin (#4440, @mqiu)
* Fix e2e test failures caused by an inappropriate optimization of velero install (#4438, @mqiu)
* Limit backup namespaces on test resource filtering cases (#4437, @mqiu)
* Bump up Go to 1.17 (#4431, @reasonerjt)
* Added `<backup name>`-itemsnapshots.json.gz to the backup format. This file exists
when item snapshots are taken and contains an array of volume.Itemsnapshots
containing the information about the snapshots. This will not be used unless
upload progress monitoring and item snapshots are enabled and an ItemSnapshot
plugin is used to take snapshots.
Also added DownloadTargetKindBackupItemSnapshots for retrieving the signed URL to download only the `<backup name>`-itemsnapshots.json.gz part of a backup for use by
`velero backup describe`. (#4429, @dsmithuchida)
* Migrate backup sync controller from code-generator to kubebuilder. (#4423, @jxun)
* Added UploadProgressFeature flag to enable Upload Progress Monitoring and Item
Snapshotters. (#4416, @dsmithuchida)
* Added BackupWithResolvers and RestoreWithResolvers calls. Will eventually replace Backup and Restore methods.
Adds ItemSnapshotters to Backup and Restore workflows. (#4410, @dsu)
* Build for darwin-arm64 (#4409, @epk)
* Add resource filtering test cases (#4404, @mqiu)
* Fix the issue that the backup cannot be deleted after the application is uninstalled (#4398, @ywk253100)
* Ignore the `provided port is already allocated` error when restoring the `NodePort` service (#4336, @ywk253100)
* Fixed an issue with the `backup-location create` command where the BSL Credential field would be set to an invalid empty SecretKeySelector when no credential details were provided. (#4322, @zubron)
* fix buggy pager func (#4306, @alaypatel07)
* Don't create a backup immediately after creating a schedule (#4281, @ywk253100)
* Fix CVE-2020-29652 and CVE-2020-26160 (#4274, @ywk253100)
* Refine tag-release.sh to align with change in release process (#4185, @reasonerjt)
* Fix plugins incompatible issue in upgrade test (#4141, @danfengliu)
* Verify group before treating resource as cohabiting (#4126, @sseago)
- No VolumeSnapshot will be left in the source namespace of the workload
- Report metrics for CSI snapshots
For more improvements, please refer to [CSI plugin improvement](https://github.com/vmware-tanzu/velero/issues?q=is%3Aissue+label%3A%22CSI+plugin+-+GA+-+phase1%22+is%3Aclosed)
With these improvements we'll provide official support for CSI snapshots on AKS/EKS clusters. (with CSI plugin v0.3.0)
#### Refactor the controllers using Kubebuilder v3
In this release we continued our code modernization work, rewriting some controllers using Kubebuilder v3. This work is ongoing and we will continue to make progress in future releases.
#### Optionally restore status on selected resources
Options are added to the CLI and Restore spec to control the group of resources whose status will be restored.
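
A minimal sketch of how this could look in a Restore spec, assuming the `restoreStatus` field with `includedResources` and `excludedResources` lists. The restore name, backup name and the `workflows` resource are hypothetical placeholders.

```yaml
# Sketch only: restore the status of "workflows" resources and no others.
# All names below are illustrative, not taken from the release notes.
apiVersion: velero.io/v1
kind: Restore
metadata:
  name: restore-example
  namespace: velero
spec:
  backupName: backup-example
  restoreStatus:
    includedResources:
    - workflows
    excludedResources: []
```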
#### ExistingResourcePolicy in the restore API
Users can choose to overwrite or patch the existing resources during restore by setting this policy.
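
For example, a Restore that updates resources already present in the cluster might be declared as in the sketch below. The object and backup names are hypothetical, and the value `update` is an assumption about the accepted policy values.

```yaml
# Sketch only: ask Velero to update existing resources instead of skipping them.
# "restore-example" and "backup-example" are placeholders.
apiVersion: velero.io/v1
kind: Restore
metadata:
  name: restore-example
  namespace: velero
spec:
  backupName: backup-example
  existingResourcePolicy: update
```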
#### Upgrade integrated Restic version and add skip TLS validation in Restic command
The integrated Restic version is upgraded, which resolves some CVEs, and skipping TLS validation is now supported in Restic backup/restore.
#### Breaking changes
Because the CSI plugin bumps the API to v1, the v0.3.0 CSI plugin will only work on Kubernetes v1.20+.
### All changes
* restic: add full support for setting SecurityContext for restore init container from configMap. (#4084, @MatthieuFin)
* Add metrics backup_items_total and backup_items_errors (#4296, @tobiasgiese)
* Convert PodVolumebackup controller to the Kubebuilder framework (#4436, @fgold)
* Skip not mounted volumes when backing up (#4497, @dkeven)
* Update doc for v1.8 (#4517, @reasonerjt)
* Fix bug to make the restic prune frequency configurable (#4518, @ywk253100)
* Add E2E test of backups sync from BSL (#4545, @mqiu)
* Fix: OrderedResources in Schedules (#4550, @dbrekau)
* Skip volumes of non-running pods when backing up (#4584, @bynare)
* E2E SSR test add retry mechanism and logs (#4591, @mqiu)
* Add pushing image to GCR in github workflow to facilitate some environments that have rate limitation to docker hub, e.g. vSphere. (#4623, @jxun)
* Add existingResourcePolicy to Restore API (#4628, @shubham-pampattiwar)
* Fix E2E backup namespaces test (#4634, @qiuming-best)
* Update image used by E2E test to gcr.io (#4639, @jxun)
* Add multiple label selector support to Velero Backup and Restore APIs (#4650, @shubham-pampattiwar)
* Convert Pod Volume Restore resource/controller to the Kubebuilder framework (#4655, @ywk253100)
* Update --use-owner-references-in-backup description in velero command line. (#4660, @jxun)
* Avoid overwriting hook's exec.container parameter when running pod command executor. (#4661, @jxun)
* Support regional pv for GKE (#4680, @jxun)
* Bypass the remap CRD version plugin when v1beta1 CRD is not supported (#4686, @reasonerjt)
* Add GINKGO_SKIP to support skipping specific cases in e2e test. (#4692, @jxun)
* Add --pod-labels flag to velero install (#4694, @j4m3s-s)
* Enable coverage in test.sh and upload to codecov (#4704, @reasonerjt)
* Mark the BSL as "Unavailable" when any error occurs and add a new field "Message" to the status to record the error message (#4719, @ywk253100)
* Support multiple skip options for E2E test (#4725, @jxun)
* Add PriorityClass to the AdditionalItems of Backup's PodAction and Restore's PodAction plugin to backup and restore PriorityClass if it is used by a Pod. (#4740, @phuongatemc)
* Insert all restore errors and warnings into restore log. (#4743, @sseago)
* Refactor schedule controller with kubebuilder (#4748, @ywk253100)
* Garbage collector now adds labels to backups that failed to delete for BSLNotFound, BSLCannotGet, BSLReadOnly reasons. (#4757, @kaovilai)
* Skip podvolumerestore creation when restore excludes pv/pvc (#4769, @half-life666)
* Add parameter for e2e test to support modifying the kibishii install path. (#4778, @jxun)
* Ensure the restore hook applied to new namespace based on the mapping (#4779, @reasonerjt)
* Add ability to restore status on selected resources (#4785, @RafaeLeal)
* Do not take snapshot for PV to avoid duplicated snapshotting, when CSI feature is enabled. (#4797, @jxun)
* Bump up to v1 API for CSI snapshot (#4800, @reasonerjt)
* fix: delete empty backups (#4817, @yuvalman)
* Add CSI VolumeSnapshot related metrics. (#4818, @jxun)
* Fix default-backup-ttl not working (#4831, @qiuming-best)
* Make the vsc created by backup sync controller deletable (#4832, @reasonerjt)
* Mark in-progress backup/restore as failed during reconcile to avoid hanging in in-progress status (#4833, @ywk253100)
* Use controller-gen to generate the deep copy methods for objects (#4838, @ywk253100)
* Update integrated Restic version and add insecureSkipTLSVerify for Restic CLI. (#4839, @jxun)
* Modify CSI VolumeSnapshot metric related code. (#4854, @jxun)
* Refactor backup deletion controller based on kubebuilder (#4855, @reasonerjt)
* Remove VolumeSnapshots created during backup when CSI feature is enabled. (#4858, @jxun)
* Convert Restic Repository resource/controller to the Kubebuilder framework (#4859, @qiuming-best)
* Add ClusterClasses to the restore priority list (#4866, @reasonerjt)
* Cleanup the .velero folder after restic done (#4872, @big-appled)
description:Backup is a Velero resource that represents the capture of Kubernetes
cluster state at a point in time (API objects and associated volume state).
properties:
apiVersion:
description:'APIVersion defines the versioned schema of this representation
of an object. Servers should convert recognized schemas to the latest
internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources'
type:string
kind:
description:'Kind is a string value representing the REST resource this
object represents. Servers may infer this from the endpoint the client
submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds'
type:string
metadata:
type:object
spec:
description:BackupSpec defines the specification for a Velero backup.
properties:
defaultVolumesToRestic:
description:DefaultVolumesToRestic specifies whether restic should
be used to take a backup of all pod volumes by default.
type:boolean
excludedNamespaces:
description:ExcludedNamespaces contains a list of namespaces that are
not included in the backup.
items:
type:string
nullable:true
type:array
excludedResources:
description:ExcludedResources is a slice of resource names that are
not included in the backup.
items:
type:string
nullable:true
type:array
hooks:
description:Hooks represent custom behaviors that should be executed
at different phases of the backup.
properties:
resources:
description:Resources are hooks that should be executed when backing
up individual instances of a resource.
items:
description:BackupResourceHookSpec defines one or more BackupResourceHooks
that should be executed based on the rules defined for namespaces,
resources, and label selector.
properties:
excludedNamespaces:
description:ExcludedNamespaces specifies the namespaces to
which this hook spec does not apply.
items:
type:string
nullable:true
type:array
excludedResources:
description:ExcludedResources specifies the resources to
which this hook spec does not apply.
items:
type:string
nullable:true
type:array
includedNamespaces:
description:IncludedNamespaces specifies the namespaces to
which this hook spec applies. If empty, it applies to all
namespaces.
items:
type:string
nullable:true
type:array
includedResources:
description:IncludedResources specifies the resources to
which this hook spec applies. If empty, it applies to all
resources.
items:
type:string
nullable:true
type:array
labelSelector:
description:LabelSelector, if specified, filters the resources
to which this hook spec applies.
nullable:true
properties:
matchExpressions:
description:matchExpressions is a list of label selector
requirements. The requirements are ANDed.
items:
description:A label selector requirement is a selector
that contains values, a key, and an operator that
relates the key and values.
properties:
key:
description:key is the label key that the selector
applies to.
type:string
operator:
description:operator represents a key's relationship
to a set of values. Valid operators are In, NotIn,
Exists and DoesNotExist.
type:string
values:
description:values is an array of string values.
If the operator is In or NotIn, the values array
must be non-empty. If the operator is Exists or
DoesNotExist, the values array must be empty.
This array is replaced during a strategic merge
patch.
items:
type:string
type:array
required:
- key
- operator
type:object
type:array
matchLabels:
additionalProperties:
type:string
description:matchLabels is a map of {key,value} pairs.
A single {key,value} in the matchLabels map is equivalent
to an element of matchExpressions, whose key field is
"key",the operator is "In", and the values array contains
only "value". The requirements are ANDed.
type:object
type:object
name:
description:Name is the name of this hook.
type:string
post:
description:PostHooks is a list of BackupResourceHooks to
execute after storing the item in the backup. These are
executed after all "additional items" from item actions
are processed.
items:
description:BackupResourceHook defines a hook for a resource.
properties:
exec:
description:Exec defines an exec hook.
properties:
command:
description:Command is the command and arguments
to execute.
items:
type:string
minItems:1
type:array
container:
description:Container is the container in the pod
where the command should be executed. If not specified,
the pod's first container is used.
type:string
onError:
description:OnError specifies how Velero should
behave if it encounters an error executing this
hook.
enum:
- Continue
- Fail
type:string
timeout:
description:Timeout defines the maximum amount
of time Velero should wait for the hook to complete
before considering the execution a failure.
type:string
required:
- command
type:object
required:
- exec
type:object
type:array
pre:
description:PreHooks is a list of BackupResourceHooks to
execute prior to storing the item in the backup. These are
executed before any "additional items" from item actions
are processed.
items:
description:BackupResourceHook defines a hook for a resource.
properties:
exec:
description:Exec defines an exec hook.
properties:
command:
description:Command is the command and arguments
to execute.
items:
type:string
minItems:1
type:array
container:
description:Container is the container in the pod
where the command should be executed. If not specified,
the pod's first container is used.
type:string
onError:
description:OnError specifies how Velero should
behave if it encounters an error executing this
hook.
enum:
- Continue
- Fail
type:string
timeout:
description:Timeout defines the maximum amount
of time Velero should wait for the hook to complete
description:Backup Storage Location status such as Available/Unavailable
name:Phase
type:string
- JSONPath:.status.lastValidationTime
description:LastValidationTime is the last time the backup store location was
validated
name:Last Validated
type:date
- JSONPath:.metadata.creationTimestamp
name:Age
type:date
- JSONPath:.spec.default
description:Default backup storage location
name:Default
type:boolean
group:velero.io
names:
kind:BackupStorageLocation
listKind:BackupStorageLocationList
plural:backupstoragelocations
shortNames:
- bsl
singular:backupstoragelocation
preserveUnknownFields:false
scope:Namespaced
subresources:
status:{}
validation:
openAPIV3Schema:
description:BackupStorageLocation is a location where Velero stores backup
objects
properties:
apiVersion:
description:'APIVersion defines the versioned schema of this representation
of an object. Servers should convert recognized schemas to the latest
internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources'
type:string
kind:
description:'Kind is a string value representing the REST resource this
object represents. Servers may infer this from the endpoint the client
submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds'
type:string
metadata:
type:object
spec:
description:BackupStorageLocationSpec defines the desired state of a Velero
BackupStorageLocation
properties:
accessMode:
description:AccessMode defines the permissions for the backup storage
location.
enum:
- ReadOnly
- ReadWrite
type:string
backupSyncPeriod:
description:BackupSyncPeriod defines how frequently to sync backup
API objects from object storage. A value of 0 disables sync.
nullable:true
type:string
config:
additionalProperties:
type:string
description:Config is for provider-specific configuration fields.
type:object
credential:
description:Credential contains the credential information intended
to be used with this location
properties:
key:
description:The key of the secret to select from. Must be a valid
secret key.
type:string
name:
description: 'Name of the referent. More info:https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names
TODO:Add other useful fields. apiVersion, kind, uid?'
type:string
optional:
description:Specify whether the Secret or its key must be defined
type:boolean
required:
- key
type:object
default:
description:Default indicates this location is the default backup storage
location.
type:boolean
objectStorage:
description:ObjectStorageLocation specifies the settings necessary
to connect to a provider's object storage.
properties:
bucket:
description:Bucket is the bucket to use for object storage.
type:string
caCert:
description:CACert defines a CA bundle to use when verifying TLS
connections to the provider.
format:byte
type:string
prefix:
description:Prefix is the path inside a bucket to use for Velero
storage. Optional.
type:string
required:
- bucket
type:object
provider:
description:Provider is the provider of the backup storage.
type:string
validationFrequency:
description:ValidationFrequency defines how frequently to validate
the corresponding object storage. A value of 0 disables validation.
nullable:true
type:string
required:
- objectStorage
- provider
type:object
status:
description:BackupStorageLocationStatus defines the observed state of BackupStorageLocation
properties:
accessMode:
description:"AccessMode is an unused field. \n Deprecated: there is
now an AccessMode field on the Spec and this field will be removed
entirely as of v2.0."
enum:
- ReadOnly
- ReadWrite
type:string
lastSyncedRevision:
description:"LastSyncedRevision is the value of the `metadata/revision`
file in the backup storage location the last time the BSL's contents
were synced into the cluster. \n Deprecated: this field is no longer
updated or used for detecting changes to the location's contents and
will be removed entirely in v2.0."
type:string
lastSyncedTime:
description:LastSyncedTime is the last time the contents of the location
were synced into the cluster.
format:date-time
nullable:true
type:string
lastValidationTime:
description:LastValidationTime is the last time the backup store location
was validated the cluster.
format:date-time
nullable:true
type:string
phase:
description:Phase is the current state of the BackupStorageLocation.
description: DeleteBackupRequest is a request to delete one or more backups.
properties:
  apiVersion:
    description: 'APIVersion defines the versioned schema of this representation
      of an object. Servers should convert recognized schemas to the latest
      internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources'
    type: string
  kind:
    description: 'Kind is a string value representing the REST resource this
      object represents. Servers may infer this from the endpoint the client
      submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds'
    type: string
  metadata:
    type: object
  spec:
    description: DeleteBackupRequestSpec is the specification for which backups
      to delete.
    properties:
      backupName:
        type: string
    required:
    - backupName
    type: object
  status:
    description: DeleteBackupRequestStatus is the current status of a DeleteBackupRequest.
    properties:
      errors:
        description: Errors contains any errors that were encountered during
          the deletion process.
        items:
          type: string
        nullable: true
        type: array
      phase:
        description: Phase is the current state of the DeleteBackupRequest.
description: DownloadRequest is a request to download an artifact from backup
  object storage, such as a backup log file.
properties:
  apiVersion:
    description: 'APIVersion defines the versioned schema of this representation
      of an object. Servers should convert recognized schemas to the latest
      internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources'
    type: string
  kind:
    description: 'Kind is a string value representing the REST resource this
      object represents. Servers may infer this from the endpoint the client
      submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds'
    type: string
  metadata:
    type: object
  spec:
    description: DownloadRequestSpec is the specification for a download request.
    properties:
      target:
        description: Target is what to download (e.g. logs for a backup).
        properties:
          kind:
            description: Kind is the type of file to download.
            enum:
            - BackupLog
            - BackupContents
            - BackupVolumeSnapshots
            - BackupResourceList
            - RestoreLog
            - RestoreResults
            type: string
          name:
            description: Name is the name of the kubernetes resource with which
              the file is associated.
            type: string
        required:
        - kind
        - name
        type: object
    required:
    - target
    type: object
  status:
    description: DownloadRequestStatus is the current status of a DownloadRequest.
    properties:
      downloadURL:
        description: DownloadURL contains the pre-signed URL for the target
          file.
        type: string
      expiration:
        description: Expiration is when this DownloadRequest expires and can
          be deleted by the system.
        format: date-time
        nullable: true
        type: string
      phase:
        description: Phase is the current state of the DownloadRequest.
description:'APIVersion defines the versioned schema of this representation
of an object. Servers should convert recognized schemas to the latest
internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources'
type:string
kind:
description:'Kind is a string value representing the REST resource this
object represents. Servers may infer this from the endpoint the client
submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds'
type:string
metadata:
type:object
spec:
description:PodVolumeBackupSpec is the specification for a PodVolumeBackup.
properties:
backupStorageLocation:
description:BackupStorageLocation is the name of the backup storage
location where the restic repository is stored.
type:string
node:
description:Node is the name of the node that the Pod is running on.
type:string
pod:
description:Pod is a reference to the pod containing the volume to
be backed up.
properties:
apiVersion:
description:API version of the referent.
type:string
fieldPath:
description:'If referring to a piece of an object instead of an
entire object, this string should contain a valid JSON/Go field
access statement, such as desiredState.manifest.containers[2].
For example, if the object reference is to a container within
a pod, this would take on a value like: "spec.containers{name}"
(where "name" refers to the name of the container that triggered
the event) or if no container name is specified "spec.containers[2]"
(container with index 2 in this pod). This syntax is chosen only
to have some well-defined way of referencing a part of an object.
TODO: this design is not final and this field is subject to change
in the future.'
type:string
kind:
description: 'Kind of the referent. More info:https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds'
type:string
name:
description: 'Name of the referent. More info:https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names'
type:string
namespace:
description: 'Namespace of the referent. More info:https://kubernetes.io/docs/concepts/overview/working-with-objects/namespaces/'
type:string
resourceVersion:
description:'Specific resourceVersion to which this reference is
made, if any. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#concurrency-control-and-consistency'
type:string
uid:
description: 'UID of the referent. More info:https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#uids'
type:string
type:object
repoIdentifier:
description:RepoIdentifier is the restic repository identifier.
type:string
tags:
additionalProperties:
type:string
description:Tags are a map of key-value pairs that should be applied
to the volume backup as tags.
type:object
volume:
description:Volume is the name of the volume within the Pod to be backed
up.
type:string
required:
- backupStorageLocation
- node
- pod
- repoIdentifier
- volume
type:object
status:
description:PodVolumeBackupStatus is the current status of a PodVolumeBackup.
properties:
completionTimestamp:
description:CompletionTimestamp records the time a backup was completed.
Completion time is recorded even on failed backups. Completion time
is recorded before uploading the backup object. The server's time
is used for CompletionTimestamps
format:date-time
nullable:true
type:string
message:
description:Message is a message about the pod volume backup's status.
type:string
path:
description:Path is the full path within the controller pod being backed
up.
type:string
phase:
description:Phase is the current state of the PodVolumeBackup.
enum:
- New
- InProgress
- Completed
- Failed
type:string
progress:
description:Progress holds the total number of bytes of the volume
and the current number of backed up bytes. This can be used to display
progress information about the backup operation.
properties:
bytesDone:
format:int64
type:integer
totalBytes:
format:int64
type:integer
type:object
snapshotID:
description:SnapshotID is the identifier for the snapshot of the pod
volume.
type:string
startTimestamp:
description:StartTimestamp records the time a backup was started. Separate
from CreationTimestamp, since that value changes on restores. The
description:'APIVersion defines the versioned schema of this representation
of an object. Servers should convert recognized schemas to the latest
internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources'
type:string
kind:
description:'Kind is a string value representing the REST resource this
object represents. Servers may infer this from the endpoint the client
submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds'
type:string
metadata:
type:object
spec:
description:PodVolumeRestoreSpec is the specification for a PodVolumeRestore.
properties:
backupStorageLocation:
description:BackupStorageLocation is the name of the backup storage
location where the restic repository is stored.
type:string
pod:
description:Pod is a reference to the pod containing the volume to
be restored.
properties:
apiVersion:
description:API version of the referent.
type:string
fieldPath:
description:'If referring to a piece of an object instead of an
entire object, this string should contain a valid JSON/Go field
access statement, such as desiredState.manifest.containers[2].
For example, if the object reference is to a container within
a pod, this would take on a value like: "spec.containers{name}"
(where "name" refers to the name of the container that triggered
the event) or if no container name is specified "spec.containers[2]"
(container with index 2 in this pod). This syntax is chosen only
to have some well-defined way of referencing a part of an object.
TODO: this design is not final and this field is subject to change
in the future.'
type:string
kind:
description: 'Kind of the referent. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds'
type:string
name:
description: 'Name of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names'
type:string
namespace:
description: 'Namespace of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/namespaces/'
type:string
resourceVersion:
description:'Specific resourceVersion to which this reference is
made, if any. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#concurrency-control-and-consistency'
type:string
uid:
description: 'UID of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#uids'
type:string
type:object
repoIdentifier:
description:RepoIdentifier is the restic repository identifier.
type:string
snapshotID:
description:SnapshotID is the ID of the volume snapshot to be restored.
type:string
volume:
description:Volume is the name of the volume within the Pod to be restored.
type:string
required:
- backupStorageLocation
- pod
- repoIdentifier
- snapshotID
- volume
type:object
status:
description:PodVolumeRestoreStatus is the current status of a PodVolumeRestore.
properties:
completionTimestamp:
description:CompletionTimestamp records the time a restore was completed.
Completion time is recorded even on failed restores. The server's
time is used for CompletionTimestamps
format:date-time
nullable:true
type:string
message:
description:Message is a message about the pod volume restore's status.
type:string
phase:
description:Phase is the current state of the PodVolumeRestore.
enum:
- New
- InProgress
- Completed
- Failed
type:string
progress:
description:Progress holds the total number of bytes of the snapshot
and the current number of restored bytes. This can be used to display
progress information about the restore operation.
properties:
bytesDone:
format:int64
type:integer
totalBytes:
format:int64
type:integer
type:object
startTimestamp:
description:StartTimestamp records the time a restore was started.
description:'APIVersion defines the versioned schema of this representation
of an object. Servers should convert recognized schemas to the latest
internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources'
type:string
kind:
description:'Kind is a string value representing the REST resource this
object represents. Servers may infer this from the endpoint the client
submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds'
type:string
metadata:
type:object
spec:
description:ResticRepositorySpec is the specification for a ResticRepository.
properties:
backupStorageLocation:
description:BackupStorageLocation is the name of the BackupStorageLocation
that should contain this repository.
type:string
maintenanceFrequency:
description:MaintenanceFrequency is how often maintenance should be
run.
type:string
resticIdentifier:
description:ResticIdentifier is the full restic-compatible string for
identifying this repository.
type:string
volumeNamespace:
description:VolumeNamespace is the namespace this restic repository
contains pod volume backups for.
type:string
required:
- backupStorageLocation
- maintenanceFrequency
- resticIdentifier
- volumeNamespace
type:object
status:
description:ResticRepositoryStatus is the current status of a ResticRepository.
properties:
lastMaintenanceTime:
description:LastMaintenanceTime is the last time maintenance was run.
format:date-time
nullable:true
type:string
message:
description:Message is a message about the current status of the ResticRepository.
type:string
phase:
description:Phase is the current state of the ResticRepository.
description:Schedule is a Velero resource that represents a pre-scheduled or
periodic Backup that should be run.
properties:
apiVersion:
description:'APIVersion defines the versioned schema of this representation
of an object. Servers should convert recognized schemas to the latest
internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources'
type:string
kind:
description:'Kind is a string value representing the REST resource this
object represents. Servers may infer this from the endpoint the client
submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds'
type:string
metadata:
type:object
spec:
description:ScheduleSpec defines the specification for a Velero schedule
properties:
schedule:
description:Schedule is a Cron expression defining when to run the
Backup.
type:string
template:
description:Template is the definition of the Backup to be run on the
provided schedule
properties:
defaultVolumesToRestic:
description:DefaultVolumesToRestic specifies whether restic should
be used to take a backup of all pod volumes by default.
type:boolean
excludedNamespaces:
description:ExcludedNamespaces contains a list of namespaces that
are not included in the backup.
items:
type:string
nullable:true
type:array
excludedResources:
description:ExcludedResources is a slice of resource names that
are not included in the backup.
items:
type:string
nullable:true
type:array
hooks:
description:Hooks represent custom behaviors that should be executed
at different phases of the backup.
properties:
resources:
description:Resources are hooks that should be executed when
backing up individual instances of a resource.
items:
description:BackupResourceHookSpec defines one or more BackupResourceHooks
that should be executed based on the rules defined for namespaces,
resources, and label selector.
properties:
excludedNamespaces:
description:ExcludedNamespaces specifies the namespaces
to which this hook spec does not apply.
items:
type:string
nullable:true
type:array
excludedResources:
description:ExcludedResources specifies the resources
to which this hook spec does not apply.
items:
type:string
nullable:true
type:array
includedNamespaces:
description:IncludedNamespaces specifies the namespaces
to which this hook spec applies. If empty, it applies
to all namespaces.
items:
type:string
nullable:true
type:array
includedResources:
description:IncludedResources specifies the resources
to which this hook spec applies. If empty, it applies
to all resources.
items:
type:string
nullable:true
type:array
labelSelector:
description:LabelSelector, if specified, filters the
resources to which this hook spec applies.
nullable:true
properties:
matchExpressions:
description:matchExpressions is a list of label selector
requirements. The requirements are ANDed.
items:
description:A label selector requirement is a selector
that contains values, a key, and an operator that
relates the key and values.
properties:
key:
description:key is the label key that the selector
applies to.
type:string
operator:
description:operator represents a key's relationship
to a set of values. Valid operators are In,
NotIn, Exists and DoesNotExist.
type:string
values:
description:values is an array of string values.
If the operator is In or NotIn, the values
array must be non-empty. If the operator is
Exists or DoesNotExist, the values array must
be empty. This array is replaced during a
strategic merge patch.
items:
type:string
type:array
required:
- key
- operator
type:object
type:array
matchLabels:
additionalProperties:
type:string
description:matchLabels is a map of {key,value} pairs.
A single {key,value} in the matchLabels map is equivalent
to an element of matchExpressions, whose key field
is "key", the operator is "In", and the values array
contains only "value". The requirements are ANDed.
type:object
type:object
name:
description:Name is the name of this hook.
type:string
post:
description:PostHooks is a list of BackupResourceHooks
to execute after storing the item in the backup. These
are executed after all "additional items" from item
actions are processed.
items:
description:BackupResourceHook defines a hook for a
resource.
properties:
exec:
description:Exec defines an exec hook.
properties:
command:
description:Command is the command and arguments
to execute.
items:
type:string
minItems:1
type:array
container:
description:Container is the container in the
pod where the command should be executed.
If not specified, the pod's first container
is used.
type:string
onError:
description:OnError specifies how Velero should
behave if it encounters an error executing
this hook.
enum:
- Continue
- Fail
type:string
timeout:
description:Timeout defines the maximum amount
of time Velero should wait for the hook to
complete before considering the execution
a failure.
type:string
required:
- command
type:object
required:
- exec
type:object
type:array
pre:
description:PreHooks is a list of BackupResourceHooks
to execute prior to storing the item in the backup.
These are executed before any "additional items" from
item actions are processed.
items:
description:BackupResourceHook defines a hook for a
description:ServerStatusRequest is a request to access current status information
about the Velero server.
properties:
apiVersion:
description:'APIVersion defines the versioned schema of this representation
of an object. Servers should convert recognized schemas to the latest
internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources'
type:string
kind:
description:'Kind is a string value representing the REST resource this
object represents. Servers may infer this from the endpoint the client
submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds'
type:string
metadata:
type:object
spec:
description:ServerStatusRequestSpec is the specification for a ServerStatusRequest.
type:object
status:
description:ServerStatusRequestStatus is the current status of a ServerStatusRequest.
properties:
phase:
description:Phase is the current lifecycle phase of the ServerStatusRequest.
enum:
- New
- Processed
type:string
plugins:
description:Plugins list information about the plugins running on the
Velero server
items:
description:PluginInfo contains attributes of a Velero plugin
properties:
kind:
type:string
name:
type:string
required:
- kind
- name
type:object
nullable:true
type:array
processedTimestamp:
description:ProcessedTimestamp is when the ServerStatusRequest was
processed by the ServerStatusRequestController.
format:date-time
nullable:true
type:string
serverVersion:
description:ServerVersion is the Velero server version.
description:VolumeSnapshotLocation is a location where Velero stores volume
snapshots.
properties:
apiVersion:
description:'APIVersion defines the versioned schema of this representation
of an object. Servers should convert recognized schemas to the latest
internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources'
type:string
kind:
description:'Kind is a string value representing the REST resource this
object represents. Servers may infer this from the endpoint the client
submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds'
type:string
metadata:
type:object
spec:
description:VolumeSnapshotLocationSpec defines the specification for a
Velero VolumeSnapshotLocation.
properties:
config:
additionalProperties:
type:string
description:Config is for provider-specific configuration fields.
type:object
provider:
description:Provider is the provider of the volume storage.
type:string
required:
- provider
type:object
status:
description:VolumeSnapshotLocationStatus describes the current status of
a Velero VolumeSnapshotLocation.
properties:
phase:
description:VolumeSnapshotLocationPhase is the lifecycle phase of a
- description:DataDownload status such as New/InProgress
jsonPath:.status.phase
name:Status
type:string
- description:Time duration since this DataDownload was started
jsonPath:.status.startTimestamp
name:Started
type:date
- description:Completed bytes
format:int64
jsonPath:.status.progress.bytesDone
name:Bytes Done
type:integer
- description:Total bytes
format:int64
jsonPath:.status.progress.totalBytes
name:Total Bytes
type:integer
- description:Name of the Backup Storage Location where the backup data is stored
jsonPath:.spec.backupStorageLocation
name:Storage Location
type:string
- description:Time duration since this DataDownload was created
jsonPath:.metadata.creationTimestamp
name:Age
type:date
- description:Name of the node where the DataDownload is processed
jsonPath:.status.node
name:Node
type:string
name:v2alpha1
schema:
openAPIV3Schema:
description:DataDownload acts as the protocol between data mover plugins
and data mover controller for the datamover restore operation
properties:
apiVersion:
description:'APIVersion defines the versioned schema of this representation
of an object. Servers should convert recognized schemas to the latest
internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources'
type:string
kind:
description:'Kind is a string value representing the REST resource this
object represents. Servers may infer this from the endpoint the client
submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds'
type:string
metadata:
type:object
spec:
description:DataDownloadSpec is the specification for a DataDownload.
properties:
backupStorageLocation:
description:BackupStorageLocation is the name of the backup storage
location where the backup repository is stored.
type:string
cancel:
description:Cancel indicates request to cancel the ongoing DataDownload.
It can be set when the DataDownload is in InProgress phase
type:boolean
dataMoverConfig:
additionalProperties:
type:string
description:DataMoverConfig is for data-mover-specific configuration
fields.
type:object
datamover:
description:DataMover specifies the data mover to be used by the
backup. If DataMover is "" or "velero", the built-in data mover
will be used.
type:string
operationTimeout:
description:OperationTimeout specifies the time used to wait internal
operations, before returning error as timeout.
type:string
snapshotID:
description:SnapshotID is the ID of the Velero backup snapshot to
be restored from.
type:string
sourceNamespace:
description:SourceNamespace is the original namespace where the volume
is backed up from. It may be different from SourcePVC's namespace
if namespace is remapped during restore.
type:string
targetVolume:
description:TargetVolume is the information of the target PVC and
PV.
properties:
namespace:
description:Namespace is the target namespace
type:string
pv:
description:PV is the name of the target PV that is created by
Velero restore
type:string
pvc:
description:PVC is the name of the target PVC that is created
by Velero restore
type:string
required:
- namespace
- pv
- pvc
type:object
required:
- backupStorageLocation
- operationTimeout
- snapshotID
- sourceNamespace
- targetVolume
type:object
status:
description:DataDownloadStatus is the current status of a DataDownload.
properties:
completionTimestamp:
description:CompletionTimestamp records the time a restore was completed.
Completion time is recorded even on failed restores. The server's
time is used for CompletionTimestamps
format:date-time
nullable:true
type:string
message:
description:Message is a message about the DataDownload's status.
type:string
node:
description:Node is name of the node where the DataDownload is processed.
type:string
phase:
description:Phase is the current state of the DataDownload.
enum:
- New
- Accepted
- Prepared
- InProgress
- Canceling
- Canceled
- Completed
- Failed
type:string
progress:
description:Progress holds the total number of bytes of the snapshot
and the current number of restored bytes. This can be used to display
progress information about the restore operation.
properties:
bytesDone:
format:int64
type:integer
totalBytes:
format:int64
type:integer
type:object
startTimestamp:
description:StartTimestamp records the time a restore was started.
- description:DataUpload status such as New/InProgress
jsonPath:.status.phase
name:Status
type:string
- description:Time duration since this DataUpload was started
jsonPath:.status.startTimestamp
name:Started
type:date
- description:Completed bytes
format:int64
jsonPath:.status.progress.bytesDone
name:Bytes Done
type:integer
- description:Total bytes
format:int64
jsonPath:.status.progress.totalBytes
name:Total Bytes
type:integer
- description:Name of the Backup Storage Location where this backup should be
stored
jsonPath:.spec.backupStorageLocation
name:Storage Location
type:string
- description:Time duration since this DataUpload was created
jsonPath:.metadata.creationTimestamp
name:Age
type:date
- description:Name of the node where the DataUpload is processed
jsonPath:.status.node
name:Node
type:string
name:v2alpha1
schema:
openAPIV3Schema:
description:DataUpload acts as the protocol between data mover plugins and
data mover controller for the datamover backup operation
properties:
apiVersion:
description:'APIVersion defines the versioned schema of this representation
of an object. Servers should convert recognized schemas to the latest
internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources'
type:string
kind:
description:'Kind is a string value representing the REST resource this
object represents. Servers may infer this from the endpoint the client
submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds'
type:string
metadata:
type:object
spec:
description:DataUploadSpec is the specification for a DataUpload.
properties:
backupStorageLocation:
description:BackupStorageLocation is the name of the backup storage
location where the backup repository is stored.
type:string
cancel:
description:Cancel indicates request to cancel the ongoing DataUpload.
It can be set when the DataUpload is in InProgress phase
type:boolean
csiSnapshot:
description:If SnapshotType is CSI, CSISnapshot provides the information
of the CSI snapshot.
nullable:true
properties:
snapshotClass:
description:SnapshotClass is the name of the snapshot class that
the volume snapshot is created with
type:string
storageClass:
description:StorageClass is the name of the storage class of
the PVC that the volume snapshot is created from
type:string
volumeSnapshot:
description:VolumeSnapshot is the name of the volume snapshot
to be backed up
type:string
required:
- storageClass
- volumeSnapshot
type:object
dataMoverConfig:
additionalProperties:
type:string
description:DataMoverConfig is for data-mover-specific configuration
fields.
nullable:true
type:object
datamover:
description:DataMover specifies the data mover to be used by the
backup. If DataMover is "" or "velero", the built-in data mover
will be used.
type:string
operationTimeout:
description:OperationTimeout specifies the time used to wait internal
operations, before returning error as timeout.
type:string
snapshotType:
description:SnapshotType is the type of the snapshot to be backed
up.
type:string
sourceNamespace:
description:SourceNamespace is the original namespace where the volume
is backed up from. It is the same namespace for SourcePVC and CSI
namespaced objects.
type:string
sourcePVC:
description:SourcePVC is the name of the PVC which the snapshot is
taken for.
type:string
required:
- backupStorageLocation
- operationTimeout
- snapshotType
- sourceNamespace
- sourcePVC
type:object
status:
description:DataUploadStatus is the current status of a DataUpload.
properties:
completionTimestamp:
description:CompletionTimestamp records the time a backup was completed.
Completion time is recorded even on failed backups. Completion time
is recorded before uploading the backup object. The server's time
is used for CompletionTimestamps
format:date-time
nullable:true
type:string
dataMoverResult:
additionalProperties:
type:string
description:DataMoverResult stores data-mover-specific information
as a result of the DataUpload.
nullable:true
type:object
message:
description:Message is a message about the DataUpload's status.
type:string
node:
description:Node is name of the node where the DataUpload is processed.
type:string
path:
description:Path is the full path of the snapshot volume being backed
up.
type:string
phase:
description:Phase is the current state of the DataUpload.
enum:
- New
- Accepted
- Prepared
- InProgress
- Canceling
- Canceled
- Completed
- Failed
type:string
progress:
description:Progress holds the total number of bytes of the volume
and the current number of backed up bytes. This can be used to display
progress information about the backup operation.
properties:
bytesDone:
format:int64
type:integer
totalBytes:
format:int64
type:integer
type:object
snapshotID:
description:SnapshotID is the identifier for the snapshot in the
backup repository.
type:string
startTimestamp:
description:StartTimestamp records the time a backup was started.
Separate from CreationTimestamp, since that value changes on restores.
# Delete Backup and Restic Repo Resources when BSL is Deleted
## Abstract
Issue #2082 requested that with the command `velero backup-location delete <bsl name>` (implemented in Velero 1.6 with #3073), the following will be deleted:
- associated Velero backups (to be clear, these are custom Kubernetes resources called "backups" that are stored in the API server)
- associated Restic repositories (custom Kubernetes resources called "resticrepositories")
This design doc explains how the request will be implemented.
## Background
When a BSL resource is deleted from its Velero namespace, the associated custom Kubernetes resources, backups and Restic repositories, can no longer be used.
It makes sense to clean those resources up when a BSL is deleted.
## Goals
Update the `velero backup-location delete <bsl name>` command to delete associated backup and Restic repository resources in the same Velero namespace.
## Non Goals
[It was suggested](https://github.com/vmware-tanzu/velero/issues/2082#issuecomment-827951311) to fix bug #2697 alongside this issue.
However, I think that should be fixed separately because although it is similar (restore objects are not being deleted), it is also quite different.
One adds a command feature (this issue) and the other is a bug fix, and each affects a different part of the code base.
## High-Level Design
Update the `velero backup-location delete <bsl name>` command to do the following:
- find the associated backup resources and Restic repositories, called "backups.velero.io" and "resticrepositories.velero.io" respectively, in the Velero namespace from which the BSL was deleted
- delete the resources found
The above logic will be added to [where BSLs are deleted](https://github.com/vmware-tanzu/velero/blob/main/pkg/cmd/cli/backuplocation/delete.go).
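The two steps above can be sketched in Go as follows. This is a minimal, self-contained illustration under stated assumptions, not the actual change to `delete.go`: the `Backup` and `ResticRepository` structs, the in-memory `store`, and the `deleteAssociated` helper are all hypothetical stand-ins for the real custom resources and API client calls.

```go
package main

import "fmt"

// Backup and ResticRepository are simplified, hypothetical stand-ins for the
// backups.velero.io and resticrepositories.velero.io custom resources.
type Backup struct {
	Namespace, Name, StorageLocation string
}

type ResticRepository struct {
	Namespace, Name, BackupStorageLocation string
}

// store is an in-memory substitute for the API server (assumption of this sketch).
type store struct {
	backups []Backup
	repos   []ResticRepository
}

// deleteAssociated removes every backup and restic repository in namespace ns
// that refers to the BSL named bsl, and reports how many of each were removed.
func (s *store) deleteAssociated(ns, bsl string) (nBackups, nRepos int) {
	var keptBackups []Backup
	for _, b := range s.backups {
		if b.Namespace == ns && b.StorageLocation == bsl {
			nBackups++
			continue
		}
		keptBackups = append(keptBackups, b)
	}
	s.backups = keptBackups

	var keptRepos []ResticRepository
	for _, r := range s.repos {
		if r.Namespace == ns && r.BackupStorageLocation == bsl {
			nRepos++
			continue
		}
		keptRepos = append(keptRepos, r)
	}
	s.repos = keptRepos
	return nBackups, nRepos
}

func main() {
	s := &store{
		backups: []Backup{
			{Namespace: "velero", Name: "daily-1", StorageLocation: "default"},
			{Namespace: "velero", Name: "daily-2", StorageLocation: "secondary"},
			{Namespace: "other", Name: "daily-3", StorageLocation: "default"},
		},
		repos: []ResticRepository{
			{Namespace: "velero", Name: "app-default", BackupStorageLocation: "default"},
		},
	}
	nb, nr := s.deleteAssociated("velero", "default")
	fmt.Printf("deleted %d backups and %d restic repositories\n", nb, nr)
}
```

Only resources in the same namespace as the deleted BSL are touched; resources in other namespaces or tied to other locations are left alone.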
## Alternative Considered
I had considered deleting the backup files (the ones in json format and tarballs) in the BSL itself.
However, a standard use case is to back up a cluster and then restore into a new cluster.
In that workflow, deleting the BSL from either cluster is not expected to remove the backups held in the backup storage location itself, so those files should not be deleted.
This document proposes a solution that allows users to specify a backup order for resources of a specific resource type.
## Background
During the backup process, users may need to back up resources of a specific type in a specific order to ensure that the resources are backed up properly, because these resources are related and ordering might be required to preserve consistency so that the apps can recover from the backup image (for example, primary-secondary database pods in a cluster).
## Goals
- Enable users to specify a backup order for resources that belong to a specific resource type
## Alternatives Considered
- Use a plugin to back up a resource and all of its sub-resources. For example, use a plugin for StatefulSet and back up the pods that belong to the StatefulSet in a specific order. This plugin solution is not generic and requires a plugin for each resource type.
## High-Level Design
Users will specify a map from resource type to a list of resource names (separated by semicolons). Each name will be in the format "namespaceName/resourceName" to enable ordering across namespaces. Based on this map, the resources of each resource type will be sorted in the order specified in the list of resources. If a resource instance belongs to that specific type but its name is not in the order list, it will be placed after the resources that are in the list.
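The sorting rule described above can be sketched in Go as follows. This is only an illustration under stated assumptions, not the Velero implementation: the `sortByOrder` function and the sample pod names are hypothetical, and a stable sort is used so that unlisted resources keep their original relative order.

```go
package main

import (
	"fmt"
	"sort"
)

// sortByOrder returns a copy of names in which entries listed in order come
// first, in the listed sequence, followed by all unlisted entries in their
// original relative order. Names use the "namespaceName/resourceName" format.
func sortByOrder(names, order []string) []string {
	// rank maps each listed name to its position in the order list.
	rank := make(map[string]int, len(order))
	for i, n := range order {
		if _, seen := rank[n]; !seen {
			rank[n] = i
		}
	}

	sorted := append([]string(nil), names...)
	sort.SliceStable(sorted, func(i, j int) bool {
		ri, iListed := rank[sorted[i]]
		rj, jListed := rank[sorted[j]]
		switch {
		case iListed && jListed:
			return ri < rj
		case iListed:
			return true // listed entries precede unlisted ones
		default:
			return false
		}
	})
	return sorted
}

func main() {
	pods := []string{"ns1/web", "ns1/db-secondary", "ns2/cache", "ns1/db-primary"}
	order := []string{"ns1/db-primary", "ns1/db-secondary"}
	fmt.Println(sortByOrder(pods, order))
}
```

In this example the two database pods are moved to the front in the requested order, while `ns1/web` and `ns2/cache` follow in the order in which they were originally found.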
- In the CLI, the design proposes to use commas to separate the items of a resource type and semicolons to separate key-value pairs. This follows the convention of using commas to separate items in a list (for example: --include-namespaces ns1,ns2). However, the map syntax for labels and annotations uses commas to separate key-value pairs, so this introduces some inconsistency.
- For pods that are managed by a Deployment or DaemonSet, this design may not work because the pods' names are randomly generated. If pods are restarted, they get different names, so the backup operation may not consider the restarted pods in the sorting algorithm. This problem will be addressed when we enhance the design to use regular expressions to specify the OrderResources instead of exact matches.
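The CLI syntax discussed above (commas between items, semicolons between key-value pairs) could be parsed along the following lines. This is a hedged sketch: the `parseOrderedResources` function is hypothetical, and the `=` separator between a resource type and its list of names is an assumption of this example, since the text above does not specify it.

```go
package main

import (
	"fmt"
	"strings"
)

// parseOrderedResources parses a string such as
// "pods=ns1/pod1,ns1/pod2;persistentvolumes=pv4,pv8" into a map from resource
// type to its ordered list of names. The "=" between a resource type and its
// list is an assumption of this sketch.
func parseOrderedResources(s string) (map[string][]string, error) {
	result := make(map[string][]string)
	for _, pair := range strings.Split(s, ";") {
		pair = strings.TrimSpace(pair)
		if pair == "" {
			continue // tolerate empty segments such as a trailing semicolon
		}
		parts := strings.SplitN(pair, "=", 2)
		kind := strings.TrimSpace(parts[0])
		if len(parts) != 2 || kind == "" {
			return nil, fmt.Errorf("invalid entry %q: expected <type>=<name>,<name>", pair)
		}
		var names []string
		for _, n := range strings.Split(parts[1], ",") {
			if n = strings.TrimSpace(n); n != "" {
				names = append(names, n)
			}
		}
		result[kind] = names
	}
	return result, nil
}

func main() {
	m, err := parseOrderedResources("pods=ns1/pod1,ns1/pod2;persistentvolumes=pv4,pv8")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(m["pods"], m["persistentvolumes"])
}
```

Because commas are already taken by the item lists, the key-value pairs need a different separator, which is the source of the inconsistency with label and annotation syntax noted in the first bullet.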
This design includes the changes to the BackupItemAction (BIA) API design as required by the [Item Action Progress Monitoring](general-progress-monitoring.md) feature.
The BIA v2 interface will have two new methods, and the Execute() return signature will be modified.
If there are any additional BIA API changes that are needed in the same Velero release cycle as this change, those can be added here as well.
## Background
This API change is needed to facilitate long-running plugin actions that may not be complete when the Execute() method returns.
It is an optional feature, so plugins which don't need this feature can simply return an empty operation ID and the new methods can be no-ops.
This will allow long-running plugin actions to continue in the background while Velero moves on to the next plugin, the next item, etc.
## Goals
- Allow for BIA Execute() to optionally initiate a long-running operation and report on operation status.
## Non Goals
- Allowing Velero to control when the long-running operation begins.
## High-Level Design
As per the [Plugin Versioning](plugin-versioning.md) design, a new BIAv2 plugin `.proto` file will be created to define the gRPC interface.
v2 Go files will also be created in `plugin/clientmgmt/backupitemaction` and `plugin/framework/backupitemaction`, and a new PluginKind will be created.
The Velero backup process will be modified to reference v2 plugins instead of v1 plugins.
An adapter will be created so that any existing BIA v1 plugin can be executed as a v2 plugin when executing a backup.
## Detailed Design
### proto changes (compiled into golang by protoc)
The v2 BackupItemAction.proto will be like the current v1 version with the following changes:
To support these new rpc methods, we define new request/response message types:
```
message BackupItemActionProgressRequest {
string plugin = 1;
string operationID = 2;
bytes backup = 3;
}
message BackupItemActionProgressResponse {
generated.OperationProgress progress = 1;
}
message BackupItemActionCancelRequest {
string plugin = 1;
string operationID = 2;
bytes backup = 3;
}
```
One new shared message type will be added, as this will also be needed for v2 RestoreItemAction and VolumeSnapshotter:
```
message OperationProgress {
bool completed = 1;
string err = 2;
int64 nCompleted = 3;
int64 nTotal = 4;
string operationUnits = 5;
string description = 6;
google.protobuf.Timestamp started = 7;
google.protobuf.Timestamp updated = 8;
}
```
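To make the shape of this message more concrete, here is a minimal Go sketch of a struct that mirrors its fields, plus a small helper that turns it into a status line. The Go field names and the `describe` helper are assumptions made for this illustration; they are not the code generated by protoc.

```go
package main

import (
	"fmt"
	"time"
)

// OperationProgress mirrors the fields of the proto message above.
// The Go names are assumptions for this sketch, not generated code.
type OperationProgress struct {
	Completed      bool
	Err            string
	NCompleted     int64
	NTotal         int64
	OperationUnits string
	Description    string
	Started        time.Time
	Updated        time.Time
}

// describe renders a short, human-readable status line (hypothetical helper).
func describe(p OperationProgress) string {
	switch {
	case p.Err != "":
		return "failed: " + p.Err
	case p.Completed:
		return "completed"
	case p.NTotal > 0:
		return fmt.Sprintf("%d/%d %s", p.NCompleted, p.NTotal, p.OperationUnits)
	default:
		return "in progress"
	}
}

func main() {
	p := OperationProgress{
		NCompleted:     3,
		NTotal:         10,
		OperationUnits: "bytes",
		Started:        time.Now(),
	}
	fmt.Println(describe(p))
}
```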
In addition to the two new rpc methods added to the BackupItemAction interface, there is also a new `Name()` method. This one is only actually used internally by Velero to get the name that the plugin was registered with, but it still must be defined in a plugin which implements BackupItemActionV2 in order to implement the interface. It doesn't really matter what it returns, though, as this particular method is not delegated to the plugin via RPC calls. The new (and modified) interface methods for `BackupItemAction` are as follows:
A new PluginKind, `BackupItemActionV2`, will be created, and the backup process will be modified to use this plugin kind.
See [Plugin Versioning](plugin-versioning.md) for more details on implementation plans, including v1 adapters, etc.
## Compatibility
The included v1 adapter will allow any existing BackupItemAction plugin to work as expected, with an empty operation ID returned from Execute() and no-op Progress() and Cancel() methods.
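As a rough illustration of the adapter idea, the sketch below wraps a v1-style action so that it satisfies a v2-style interface. The names `ItemActionV1`, `ItemActionV2`, `OperationProgress`, `adaptV1` and `upperAction` are simplified stand-ins invented for this example; the real Velero plugin interfaces take different arguments.

```go
package main

import (
	"fmt"
	"strings"
)

// ItemActionV1 is a simplified, hypothetical stand-in for a v1 action.
type ItemActionV1 interface {
	Execute(item string) (string, error)
}

// OperationProgress is a trimmed-down, hypothetical progress report.
type OperationProgress struct{ Completed bool }

// ItemActionV2 is a simplified, hypothetical stand-in for the v2 interface:
// Execute also returns an operation ID, and Progress/Cancel are added.
type ItemActionV2 interface {
	Execute(item string) (updated string, operationID string, err error)
	Progress(operationID string) (OperationProgress, error)
	Cancel(operationID string) error
}

// v1Adapter lets a v1 action be used wherever a v2 action is expected.
type v1Adapter struct{ v1 ItemActionV1 }

func adaptV1(a ItemActionV1) ItemActionV2 { return v1Adapter{v1: a} }

// Execute delegates to the v1 action and returns an empty operation ID,
// signalling that no asynchronous operation was started.
func (a v1Adapter) Execute(item string) (string, string, error) {
	updated, err := a.v1.Execute(item)
	return updated, "", err
}

// Progress is a no-op because a v1 action never starts an operation.
func (a v1Adapter) Progress(string) (OperationProgress, error) {
	return OperationProgress{}, nil
}

// Cancel is a no-op for the same reason.
func (a v1Adapter) Cancel(string) error { return nil }

// upperAction is a toy v1 action used only to exercise the adapter.
type upperAction struct{}

func (upperAction) Execute(item string) (string, error) {
	return strings.ToUpper(item), nil
}

func main() {
	action := adaptV1(upperAction{})
	updated, opID, err := action.Execute("pod/nginx")
	fmt.Printf("updated=%q operationID=%q err=%v\n", updated, opID, err)
}
```

With this shape, the caller only ever deals with the v2 interface, and an empty operation ID tells it there is nothing to monitor.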
## Implementation
This will be implemented during the Velero 1.11 development cycle.
# Proposal to add resource filters for backup can distinguish whether resource is cluster-scoped or namespace-scoped.
- [Proposal to add resource filters for backup can distinguish whether resource is cluster-scoped or namespace-scoped.](#proposal-to-add-resource-filters-for-backup-can-distinguish-whether-resource-is-cluster-scoped-or-namespace-scoped)
- [Abstract](#abstract)
- [Background](#background)
- [Goals](#goals)
- [Non Goals](#non-goals)
- [High-Level Design](#high-level-design)
- [Parameters Rules](#parameters-rules)
- [Using scenarios:](#using-scenarios)
- [no namespace-scoped resources + some cluster-scoped resources](#no-namespace-scoped-resources--some-cluster-scoped-resources)
- [no namespace-scoped resources + all cluster-scoped resources](#no-namespace-scoped-resources--all-cluster-scoped-resources)
- [some namespace-scoped resources + no cluster-scoped resources](#some-namespace-scoped-resources--no-cluster-scoped-resources)
- [scenario 1](#scenario-1)
- [scenario 2](#scenario-2)
- [scenario 3](#scenario-3)
- [scenario 4](#scenario-4)
- [some namespace-scoped resources + only related cluster-scoped resources](#some-namespace-scoped-resources--only-related-cluster-scoped-resources)
- [scenario 1](#scenario-1-1)
- [scenario 2](#scenario-2-1)
- [scenario 3](#scenario-3-1)
- [some namespace-scoped resources + some additional cluster-scoped resources](#some-namespace-scoped-resources--some-additional-cluster-scoped-resources)
- [scenario 1](#scenario-1-2)
- [scenario 2](#scenario-2-2)
- [scenario 3](#scenario-3-2)
- [scenario 4](#scenario-4-1)
- [some namespace-scoped resources + all cluster-scoped resources](#some-namespace-scoped-resources--all-cluster-scoped-resources)
- [scenario 1](#scenario-1-3)
- [scenario 2](#scenario-2-3)
- [scenario 3](#scenario-3-3)
- [all namespace-scoped resources + no cluster-scoped resources](#all-namespace-scoped-resources--no-cluster-scoped-resources)
- [all namespace-scoped resources + some additional cluster-scoped resources](#all-namespace-scoped-resources--some-additional-cluster-scoped-resources)
- [all namespace-scoped resources + all cluster-scoped resources](#all-namespace-scoped-resources--all-cluster-scoped-resources)
The current filter (IncludedResources/ExcludedResources + IncludeClusterResources flag) is not enough for some special cases, e.g. all namespace-scoped resources + some kind of cluster-scoped resource and all namespace-scoped resources + cluster-scoped resource excludes.
This design proposes adding a new group of resource filtering parameters that can distinguish cluster-scoped and namespace-scoped resources.
## Background
There are two sets of resource filters for Velero: `IncludedNamespaces/ExcludedNamespaces` and `IncludedResources/ExcludedResources`.
`IncludedResources` means only the resource types specified in the parameter are included. Both cluster-scoped and namespace-scoped resources are currently handled by this parameter.
The k8s resources are separated into cluster-scoped and namespace-scoped.
As a result, it's hard to include all resources in one group while including only specified resources in the other group.
## Goals
- Enable Velero to support more complicated namespace-scoped and cluster-scoped resource filtering scenarios in backup.
## Non Goals
- Enrich the resource filtering rules, for example, advanced PV filtering and filtering by resource names.
## High-Level Design
Four new parameters are added into command `velero backup create`: `--include-cluster-scoped-resources`, `--exclude-cluster-scoped-resources`, `--include-namespace-scoped-resources` and `--exclude-namespace-scoped-resources`.
`--include-cluster-scoped-resources` and `--exclude-cluster-scoped-resources` are used to filter cluster-scoped resources included or excluded in backup per resource type.
`--include-namespace-scoped-resources` and `--exclude-namespace-scoped-resources` are used to filter namespace-scoped resources included or excluded in backup per resource type.
Restore and other code paths that also use resource filtering will be handled in future releases.
### Parameters Rules
* `--include-cluster-scoped-resources`, `--include-namespace-scoped-resources`, `--exclude-cluster-scoped-resources` and `--exclude-namespace-scoped-resources` accept `*` or a comma-separated string as valid values. Each element of the CSV string should be a k8s resource name. The format should be `resource.group`, such as `storageclasses.storage.k8s.io`.
* The `--include-cluster-scoped-resources`, `--include-namespace-scoped-resources`, `--exclude-cluster-scoped-resources` and `--exclude-namespace-scoped-resources` parameters are mutually exclusive with the `--include-cluster-resources`, `--include-resources` and `--exclude-resources` parameters. If both sets of parameters are provided, a validation failure should be returned.
* `--include-cluster-scoped-resources` and `--exclude-cluster-scoped-resources` should only contain cluster-scoped resource type names. If namespace-scoped resource type names are included, they are ignored.
* If the resource type lists specified by `--include-cluster-scoped-resources` and `--exclude-cluster-scoped-resources` conflict, the `--exclude-cluster-scoped-resources` parameter has higher priority.
* `--include-namespace-scoped-resources` and `--exclude-namespace-scoped-resources` should only contain namespace-scoped resource type names. If cluster-scoped resource type names are included, they are ignored.
* If the resource type lists specified by `--include-namespace-scoped-resources` and `--exclude-namespace-scoped-resources` conflict, the `--exclude-namespace-scoped-resources` parameter has higher priority.
* If `--include-namespace-scoped-resources` is not present, all namespace-scoped resources are included per resource type.
* If neither `--include-cluster-scoped-resources` nor `--exclude-cluster-scoped-resources` is present, no additional cluster-scoped resource is included per resource type, just as when the existing `--include-cluster-resources` parameter is not set. Cluster-scoped resources that are related to namespace-scoped resources, meaning those returned in the `AdditionalItems` array of a namespace-scoped resource's BackupItemAction result, are still included in the backup by default. Taking PVC backup as an example: the PVC is namespace-scoped and the PV is cluster-scoped, and the PVC's BIA will include the PVC's related PV in the backup too.
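The include/exclude rules above can be sketched as a small Go filter. This is only an illustration under stated assumptions: `scopedFilter`, `newScopedFilter` and `shouldInclude` are hypothetical names, resource names are assumed to be already normalized, and the case where only an exclude list is given for cluster scope is treated like an empty include list.

```go
package main

import "fmt"

// scopedFilter is a hypothetical helper holding the include and exclude
// lists of resource type names for one scope (cluster or namespace).
type scopedFilter struct {
	includes map[string]bool
	excludes map[string]bool
}

func newScopedFilter(includes, excludes []string) scopedFilter {
	f := scopedFilter{includes: map[string]bool{}, excludes: map[string]bool{}}
	for _, r := range includes {
		f.includes[r] = true
	}
	for _, r := range excludes {
		f.excludes[r] = true
	}
	return f
}

// shouldInclude applies the rules from the list above:
//   - excludes always win over includes, and "*" matches every type;
//   - an empty include list means "all" for namespace scope
//     (includeAllWhenEmpty = true) and "no additional resources" for
//     cluster scope (includeAllWhenEmpty = false).
func (f scopedFilter) shouldInclude(resource string, includeAllWhenEmpty bool) bool {
	if f.excludes["*"] || f.excludes[resource] {
		return false
	}
	if len(f.includes) == 0 {
		return includeAllWhenEmpty
	}
	return f.includes["*"] || f.includes[resource]
}

func main() {
	namespaced := newScopedFilter(nil, []string{"ingresses"})
	cluster := newScopedFilter([]string{"*"}, []string{"storageclasses"})

	fmt.Println("pods:", namespaced.shouldInclude("pods", true))
	fmt.Println("ingresses:", namespaced.shouldInclude("ingresses", true))
	fmt.Println("persistentvolumes:", cluster.shouldInclude("persistentvolumes", false))
	fmt.Println("storageclasses:", cluster.shouldInclude("storageclasses", false))
}
```

Related cluster-scoped items returned by a BackupItemAction are outside the scope of this sketch, since they are added through a different path.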
### Using scenarios:
Please note: if a scenario gives an example using the old filtering parameters (`--include-cluster-resources`, `--include-resources` and `--exclude-resources`), the old parameters also work for that case. If no old-parameter example is given, the old parameters don't work for that scenario, and only the new parameters (`--include-cluster-scoped-resources`, `--include-namespace-scoped-resources`, `--exclude-cluster-scoped-resources` and `--exclude-namespace-scoped-resources`) work.
#### no namespace-scoped resources + some cluster-scoped resources
The following command backs up no namespace-scoped resources and some cluster-scoped resources.
``` bash
velero backup create <backup-name> \
    --exclude-namespace-scoped-resources=* \
    --include-cluster-scoped-resources=storageclass
```
#### no namespace-scoped resources + all cluster-scoped resources
The following command backs up no namespace-scoped resources and all cluster-scoped resources.
``` bash
velero backup create <backup-name> \
    --exclude-namespace-scoped-resources=* \
    --include-cluster-scoped-resources=*
```
#### some namespace-scoped resources + no cluster-scoped resources
##### scenario 1
The following commands back up all resources in the namespaces default and kube-system, and no cluster-scoped resources.
Example of new parameters:
``` bash
velero backup create <backup-name> \
    --include-namespaces=default,kube-system \
    --exclude-cluster-scoped-resources=*
```
Example of old parameters:
``` bash
velero backup create <backup-name> \
    --include-namespaces=default,kube-system \
    --include-cluster-resources=false
```
##### scenario 2
The following commands back up PVC, Deployment, Service, Endpoint, Pod and ReplicaSet resources in all namespaces, and no cluster-scoped resources. Although the PVC's related PV would normally be included, it is ruled out too because no cluster-scoped resources are included.
The following commands back up PVC, Deployment, Service, Endpoint, Pod and ReplicaSet resources in the namespaces default and kube-system, and no cluster-scoped resources. Although the PVC's related PV would normally be included, it is ruled out too because no cluster-scoped resources are included.
This backs up all resources except Ingress type resources in all namespaces, and related cluster-scoped resources.
Example of new parameters:
``` bash
velero backup create <backup-name> \
    --exclude-namespace-scoped-resources=ingress
```
Example of old parameters:
``` bash
velero backup create <backup-name> \
    --exclude-resources=ingress
```
#### some namespace-scoped resources + some additional cluster-scoped resources
##### scenario 1
This backs up all resources in the namespaces default and kube-system, and related cluster-scoped resources, plus all StorageClass resources.
``` bash
velero backup create <backup-name> \
    --include-namespaces=default,kube-system \
    --include-cluster-scoped-resources=storageclass
```
##### scenario 2
This backs up PVC, Deployment, Service, Endpoint, Pod and ReplicaSet resources in all namespaces, and related cluster-scoped resources, plus all StorageClass resources and PVC-related PVs.
This backs up PVC, Deployment, Service, Endpoint, Pod and ReplicaSet resources in the default and kube-system namespaces, and related cluster-scoped resources, plus all StorageClass resources and PVC-related PVs.
This backs up PVC, Deployment, Service, Endpoint, Pod and ReplicaSet resources in the default and kube-system namespaces, and related cluster-scoped resources, plus all cluster-scoped resources except StorageClass type resources.
#### all namespace-scoped resources + all cluster-scoped resources
The following commands have the same meaning: back up all namespace-scoped resources and all cluster-scoped resources.
``` bash
velero backup create <backup-name> \
    --include-cluster-scoped-resources=*
```
``` bash
velero backup create <backup-name> \
    --include-cluster-resources=true
```
#### describe command change
In the output of the `velero backup describe` command, the four new parameters should be displayed too.
``` bash
velero backup describe <backup-name>
......
Namespaces:
Included: ns2
Excluded: <none>
Resources:
Included cluster-scoped: StorageClass,PersistentVolume
Excluded cluster-scoped: <none>
Included namespace-scoped: default
Excluded namespace-scoped: <none>
......
```
**Note:** The `velero restore` command doesn't support the four new parameters in Velero v1.11, but `velero schedule` supports them through the backup specification.
## Detailed Design
After adding `IncludedNamespaceScopedResources`, `ExcludedNamespaceScopedResources`, `IncludedClusterScopedResources` and `ExcludedClusterScopedResources`, the `BackupSpec` looks like:
``` go
type BackupSpec struct {
......
// IncludedResources is a slice of resource names to include
// in the backup. If empty, all resources are included.
......
}
```
Proposal from Jibu Data [Issue 5120](https://github.com/vmware-tanzu/velero/issues/5120#issue-1304534563)
## Security Considerations
No security impact.
## Compatibility
The four new parameters cannot be mixed with existing resource filter parameters: `IncludedResources`, `ExcludedResources` and `IncludeClusterResources`.
If the new parameters and old parameters both appear on the command line, or are both specified in the backup spec, the command line and the backup should fail.
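A minimal sketch of this mutual-exclusion check might look as follows. The `filterSpec` struct is a hypothetical, trimmed-down view of the filter fields named in this design, and `validateFilters` and `errMixedFilters` are illustrative names rather than Velero's actual validation code.

```go
package main

import (
	"errors"
	"fmt"
)

// filterSpec is a hypothetical, trimmed-down view of the backup spec
// holding only the resource filter fields discussed in this design.
type filterSpec struct {
	// Old-style filters.
	IncludedResources       []string
	ExcludedResources       []string
	IncludeClusterResources *bool

	// New scope-aware filters.
	IncludedClusterScopedResources   []string
	ExcludedClusterScopedResources   []string
	IncludedNamespaceScopedResources []string
	ExcludedNamespaceScopedResources []string
}

var errMixedFilters = errors.New("old and new resource filters cannot be used together")

// validateFilters fails when filters from both groups are set.
func validateFilters(s filterSpec) error {
	oldUsed := len(s.IncludedResources) > 0 ||
		len(s.ExcludedResources) > 0 ||
		s.IncludeClusterResources != nil
	newUsed := len(s.IncludedClusterScopedResources) > 0 ||
		len(s.ExcludedClusterScopedResources) > 0 ||
		len(s.IncludedNamespaceScopedResources) > 0 ||
		len(s.ExcludedNamespaceScopedResources) > 0
	if oldUsed && newUsed {
		return errMixedFilters
	}
	return nil
}

func main() {
	mixed := filterSpec{
		IncludedResources:              []string{"pods"},
		IncludedClusterScopedResources: []string{"storageclasses"},
	}
	fmt.Println(validateFilters(mixed))
}
```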
## Implementation
This change should be included into Velero v1.11.
New parameters will coexist with `IncludedResources`, `ExcludedResources` and `IncludeClusterResources`.
Plan to deprecate `IncludedResources`, `ExcludedResources` and `IncludeClusterResources` in future releases, but also open to the community's feedback.
## Open Issues
`LabelSelector/OrLabelSelectors` apply to namespace-scoped resources.
It may be reasonable to make them also work on cluster-scoped resources.
An issue has been created to track this topic: [resource label selector not work for cluster-scoped resources](https://github.com/vmware-tanzu/velero/issues/5787)
## Alternatives Considered
Another proposal for higher level `DeleteItemActions` was initially included, which would require implementers to individually download the backup tarball themselves.
While this may be useful long term, it is not a good fit for the current goals as each plugin would be re-implementing a lot of boilerplate.
See the deletion-plugins.md file for this alternative proposal in more detail.
# Add support for `ExistingResourcePolicy` to restore API
## Abstract
Velero currently does not support any restore policy on Kubernetes resources that are already present in-cluster. Velero skips over the restore of the resource if it already exists in the namespace/cluster irrespective of whether the resource present in the restore is the same or different from the one present on the cluster. It is desired that Velero gives the option to the user to decide whether or not the resource in backup should overwrite the one present in the cluster.
## Background
As of today, Velero will skip over the restoration of resources that already exist in the cluster. The current workflow followed by Velero is (using a `service` that is backed up as an example):
- Velero tries to attempt restore of the `service`
- Fetches the `service` from the cluster
- If the `service` exists then:
- Checks whether the `service` instance in the cluster is equal to the `service` instance present in backup
- If not equal then skips the restore of the `service` and adds a restore warning (except for [ServiceAccount objects](https://github.com/vmware-tanzu/velero/blob/574baeb3c920f97b47985ec3957debdc70bcd5f8/pkg/restore/restore.go#L1246))
- If equal then skips the restore of the `service` and mentions that the restore of resource `service` is skipped in logs
It is desired to add the functionality to specify whether or not to overwrite the instance of resource `service` in cluster with the one present in backup during the restore process.
Related issue: https://github.com/vmware-tanzu/velero/issues/4066
## Goals
- Add support for `ExistingResourcePolicy` to restore API for Kubernetes resources.
## Non Goals
- Change existing restore workflow for `ServiceAccount` objects
- Add support for `ExistingResourcePolicy` as `recreate` for Kubernetes resources. (Future scope feature)
## Unrelated Proposals (Completely different functionalities than the one proposed in the design)
- Add support for `ExistingResourcePolicy` to restore API for Non-Kubernetes resources.
- Add support for `ExistingResourcePolicy` to restore API for `PersistentVolume` data.
### Use-cases/Scenarios
### A. Production Cluster - Backup Cluster:
Let's say you have a Backup Cluster which is identical to the Production Cluster. After some operations/usage/time the Production Cluster has changed: there might be new deployments, and some secrets might have been updated. This means that the Backup Cluster will no longer be identical to the Production Cluster. In order to keep the Backup Cluster up to date/identical to the Production Cluster with respect to Kubernetes resources (except PV data), we would like to use Velero to schedule new backups, which would in turn help us update the Backup Cluster via Velero restore.
Here delta resources mean the resources restored by a previous backup, but they are no longer in the latest backup. Let's follow a sequence of steps to understand this scenario:
- Consider there are 2 clusters, Cluster A, which has 3 resources - P1, P2 and P3.
- Create a Backup1 from Cluster A which has P1, P2 and P3.
- Perform restore on a new Cluster B using Backup1.
- Now, let's say in Cluster A resource P1 gets deleted and resource P2 gets updated.
- Create a new Backup2 with the new state of Cluster A, keep in mind Backup1 has P1, P2 and P3 while Backup2 has P2' and P3.
- So the Delta here is (|Cluster B - Backup2|), Delete P1 and Update P2.
- During Restore time we would want the Restore to help us identify this resource delta.
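The delta in the steps above can be computed with a simple comparison of two inventories. The sketch below is only illustrative: representing each resource as a name mapped to a version string is an assumption made for this example, and `delta` is a hypothetical helper, not part of Velero. It only identifies the delta; it does not delete or update anything.

```go
package main

import (
	"fmt"
	"sort"
)

// delta compares what a previous restore placed in the cluster with the
// contents of the latest backup. Both maps go from resource name to a
// version string (a hypothetical representation for this sketch).
func delta(restored, latest map[string]string) (toDelete, toUpdate []string) {
	for name, version := range restored {
		newVersion, stillPresent := latest[name]
		switch {
		case !stillPresent:
			// In the cluster but no longer in the latest backup.
			toDelete = append(toDelete, name)
		case newVersion != version:
			// Present in both, but changed since the previous backup.
			toUpdate = append(toUpdate, name)
		}
	}
	// Sort for stable output, since map iteration order is random.
	sort.Strings(toDelete)
	sort.Strings(toUpdate)
	return toDelete, toUpdate
}

func main() {
	clusterB := map[string]string{"P1": "v1", "P2": "v1", "P3": "v1"} // restored from Backup1
	backup2 := map[string]string{"P2": "v2", "P3": "v1"}              // P1 deleted, P2 updated

	toDelete, toUpdate := delta(clusterB, backup2)
	fmt.Println("delete:", toDelete)
	fmt.Println("update:", toUpdate)
}
```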
### Approach 1: Add a new spec field `existingResourcePolicy` to the Restore API
In this approach we do *not* change existing velero behavior. If the resource to restore in cluster is equal to the one backed up then do nothing following current Velero behavior. For resources that already exist in the cluster and are not equal to the resource in the backup (other than Service Accounts), we add a new optional spec field `existingResourcePolicy` which can have the following values:
1. `none`: This is the existing behavior. If Velero encounters a resource that already exists in the cluster, we simply skip restoration.
2. `update`: This option would provide the following behavior.
    - Unchanged resources: Velero would update the backup/restore labels on the unchanged resources. If the labels patch fails, Velero adds a restore error.
    - Changed resources: Velero will first try to patch the changed resource. If the patch:
        - succeeds: the in-cluster resource gets updated with the labels as well as the resource diff
        - fails: Velero adds a restore warning and tries to just update the backup/restore labels on the resource. If the labels patch also fails, we add a restore error.
3. `recreate`: If the resource already exists, Velero will delete it and recreate the resource.
*Note:* The `recreate` option is a non-goal for this enhancement proposal, but it is considered as a future scope.
Another thing to highlight is that Velero will not be deleting any resources in any of the policy options proposed in
this design but Velero will patch the resources in `update` policy option.
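The fallback logic of the `update` policy can be sketched as below. This is a simplified model under stated assumptions: `applyUpdatePolicy`, `outcome`, `succeed` and `fail` are hypothetical names, and the two patch operations are passed in as plain functions rather than real Kubernetes API calls.

```go
package main

import (
	"errors"
	"fmt"
)

// outcome counts the restore warnings and errors produced for one resource.
type outcome struct{ Warnings, Errors int }

var errPatch = errors.New("patch failed")

// succeed and fail are toy patch functions used to exercise the sketch.
func succeed() error { return nil }
func fail() error    { return errPatch }

// applyUpdatePolicy models the `update` policy for a resource that already
// exists in the cluster:
//   - changed resource: try the full patch first; on failure record a
//     warning and fall back to patching only the backup/restore labels;
//   - unchanged resource: patch only the labels;
//   - a failed labels patch is recorded as an error.
func applyUpdatePolicy(changed bool, patchResource, patchLabels func() error) outcome {
	var o outcome
	if changed {
		if err := patchResource(); err == nil {
			return o // resource diff and labels were applied together
		}
		o.Warnings++
	}
	if err := patchLabels(); err != nil {
		o.Errors++
	}
	return o
}

func main() {
	fmt.Printf("%+v\n", applyUpdatePolicy(true, fail, succeed))
	fmt.Printf("%+v\n", applyUpdatePolicy(false, succeed, fail))
}
```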
Example:
A. The following Restore will execute the `existingResourcePolicy` restore type `none` for the `services` and `deployments` present in the `velero-protection` namespace.
```
Kind: Restore
…
includeNamespaces: velero-protection
includeResources:
- services
- deployments
existingResourcePolicy: none
```
B. The following Restore will execute the `existingResourcePolicy` restore type `update` for the `secrets` and `daemonsets` present in the `gdpr-application` namespace.
```
Kind: Restore
…
includeNamespaces: gdpr-application
includeResources:
- secrets
- daemonsets
existingResourcePolicy: update
```
### Approach 2: Add a new spec field `existingResourcePolicyConfig` to the Restore API
In this approach we give the user the ability to specify which resources are to be included for a particular kind of force update behaviour, essentially a more granular approach wherein the user is able to specify a resource:behaviour mapping. It would look like:
`existingResourcePolicyConfig`:
- `patch:`
  - `includedResources:` [ ]string
- `recreate:`
  - `includedResources:` [ ]string
*Note:*
- There is no `none` behaviour in this approach as that would conform to the current/default Velero restore behaviour.
- The `recreate` option is a non-goal for this enhancement proposal, but it is considered as a future scope.
Example:
A. The following Restore will execute the restore type `patch` and apply the `existingResourcePolicyConfig` for `secrets` and `daemonsets` present in the `inventory-app` namespace.
```
Kind: Restore
…
includeNamespaces: inventory-app
existingResourcePolicyConfig:
patch:
includedResources:
- secrets
- daemonsets
```
### Approach 3: Combination of Approach 1 and Approach 2
Now, this approach is somewhat a combination of the aforementioned approaches. Here we propose addition of two spec fields to the Restore API - `existingResourceDefaultPolicy` and `existingResourcePolicyOverrides`. As the names suggest, the idea is that `existingResourceDefaultPolicy` would describe the default velero behaviour for this restore and `existingResourcePolicyOverrides` would override the default policy explicitly for some resources.
Example:
A. The following Restore will execute the restore type `patch` as the `existingResourceDefaultPolicy` but will override the default policy for `secrets` using the `existingResourcePolicyOverrides` spec as `none`.
```
Kind: Restore
…
includeNamespaces: inventory-app
existingResourceDefaultPolicy: patch
existingResourcePolicyOverrides:
none:
includedResources:
- secrets
```
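How a default policy and its overrides could be resolved for a single resource is sketched below. The `effectivePolicy` helper and the map-based representation of the overrides are assumptions made for this illustration, and the sketch assumes each resource appears under at most one override policy.

```go
package main

import "fmt"

// effectivePolicy returns the policy that applies to one resource type.
// overrides maps a policy name to the resource types it applies to; this
// representation is hypothetical. Each resource type is assumed to appear
// under at most one policy.
func effectivePolicy(resource, defaultPolicy string, overrides map[string][]string) string {
	for policy, resources := range overrides {
		for _, r := range resources {
			if r == resource {
				return policy
			}
		}
	}
	return defaultPolicy
}

func main() {
	overrides := map[string][]string{"none": {"secrets"}}

	fmt.Println("secrets:", effectivePolicy("secrets", "patch", overrides))
	fmt.Println("deployments:", effectivePolicy("deployments", "patch", overrides))
}
```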
## Detailed Design
### Approach 1: Add a new spec field `existingResourcePolicy` to the Restore API
The `existingResourcePolicy` spec field will be a `PolicyType` type field.
Restore API:
```
type RestoreSpec struct {
.
.
.
// ExistingResourcePolicy specifies the restore behaviour for the Kubernetes resource to be restored
// +optional
ExistingResourcePolicy PolicyType
}
```
PolicyType:
```
type PolicyType string
const PolicyTypeNone PolicyType = "none"
const PolicyTypePatch PolicyType = "update"
```
### Approach 2: Add a new spec field `existingResourcePolicyConfig` to the Restore API
The `existingResourcePolicyConfig` will be a spec of type `PolicyConfiguration` which gets added to the Restore API.
Restore API:
```
type RestoreSpec struct {
.
.
.
// ExistingResourcePolicyConfig specifies the restore behaviour for a particular/list of Kubernetes resource(s) to be restored
.
.
.
}
```
The restore workflow changes will be done [here](https://github.com/vmware-tanzu/velero/blob/b40bbda2d62af2f35d1406b9af4d387d4b396839/pkg/restore/restore.go#L1245)
### CLI changes for Approach 1
We would introduce a new CLI flag called `existing-resource-policy` of string type. This flag would be used to accept the
policy from the user. The velero restore command would look somewhat like this:
This is intended as a replacement for the previously-approved Upload Progress Monitoring design
([Upload Progress Monitoring](upload-progress.md)) in order to expand the supported use cases beyond
snapshot uploads to include what was previously called Async Backup/Restore Item Actions. This
updated design should handle the combined set of use cases for those previously separate designs.
Volume snapshotter plugins are used by Velero to take snapshots of persistent volume contents.
Depending on the underlying storage system, those snapshots may be available to use immediately,
they may be uploaded to stable storage internally by the plugin or they may need to be uploaded after
the snapshot has been taken. We would like for Velero to continue on to the next part of the backup as quickly
as possible but we would also like the backup to not be marked as complete until it is a usable backup. We'd also
eventually like to bring the control of upload under the control of Velero and allow the user to make decisions
about the ultimate destination of backup data independent of the storage system they're using.
We would also like any internal or third party Backup or Restore Item Action to have the option of
making use of this same ability to run some external process without blocking the current backup or
restore operation. Beyond Volume Snapshotters, this is also needed for data mover operations on both
backup and restore, and potentially useful for other third party operations (for example,
in-cluster registry image backup or restore could make use of this feature in a third party plugin).
### Glossary
- <b>BIA</b>: BackupItemAction
- <b>RIA</b>: RestoreItemAction
## Examples
- AWS - AWS snapshots return quickly, but are then uploaded in the background and cannot be used until EBS moves
the data into S3 internally.
- vSphere - The vSphere plugin takes a local snapshot and then the vSphere plugin uploads the data to S3. The local
snapshot is usable before the upload completes.
- Restic - Does not go through the volume snapshot path. Restic backups will block Velero progress
until completed. However, with the more generalized approach in the revised design, restic/kopia
backup and restore *could* make use of this framework if their actions are refactored as
Backup/RestoreItemActions.
- Data Movers
- Data movers are asynchronous processes executed inside backup/restore item actions that apply to specific Kubernetes resources. A common use case for a data mover is to back up/restore PVCs whose data we want to move to some form of backup storage outside of the velero kopia/restic implementations.
- Workflow
- User takes velero backup of PVC A
- BIA plugin applies to PVCs with compatible storage driver
- BIA plugin triggers data mover
- Most common use case would be for the plugin action to create a new CR which would
trigger an external controller action
- Another possible use case would be for the plugin to run its own async action in a
goroutine, although this would be less resilient to plugin container restarts.
- BIA plugin returns
- Velero backup process continues
- Main velero backup process monitors running BIA threads via gRPC to determine if process is done and healthy
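The monitoring step at the end of this workflow can be pictured as a polling loop over the operations still in flight. The sketch below is a simplified model: `progress`, `progressFunc`, `pollOnce` and `fakeMover` are hypothetical names, the fake data mover simply reports completion on its second poll, and a real loop would wait between rounds and talk to plugins over gRPC.

```go
package main

import "fmt"

// progress is a trimmed-down, hypothetical progress report.
type progress struct {
	Completed bool
	Err       string
}

// progressFunc stands in for asking a plugin about one operation.
type progressFunc func(operationID string) progress

// pollOnce checks every in-flight operation once. It returns the IDs that
// are still running and the number of operations that finished with an error.
func pollOnce(inFlight []string, check progressFunc) (remaining []string, failed int) {
	for _, id := range inFlight {
		p := check(id)
		switch {
		case !p.Completed:
			remaining = append(remaining, id)
		case p.Err != "":
			failed++
		}
	}
	return remaining, failed
}

// fakeMover is a toy data mover whose operations complete on the second poll.
type fakeMover struct{ polls map[string]int }

func (m *fakeMover) Progress(id string) progress {
	m.polls[id]++
	return progress{Completed: m.polls[id] >= 2}
}

func main() {
	mover := &fakeMover{polls: map[string]int{}}
	inFlight := []string{"op-1", "op-2"}

	for round := 1; len(inFlight) > 0; round++ {
		var failed int
		inFlight, failed = pollOnce(inFlight, mover.Progress)
		fmt.Printf("round %d: %d still running, %d failed\n", round, len(inFlight), failed)
	}
}
```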
## Primary changes from the original Upload Progress Monitoring design
The most fundamental change here is that rather than proposing a new special-purpose
SnapshotItemAction, the existing BackupItemAction plugin will be modified to accommodate an optional
snapshot (or other item operation) ID return. The primary reasons for this change are as follows:
1. The intended scope has moved beyond snapshot processing, so it makes sense to support
asynchronous operations in other backup or restore item actions.
2. We expect to have plugin API versioning implemented in Velero 1.10, making it feasible to
implement changes in the existing plugin APIs now.
3. We will need this feature on both backup and restore, meaning that if we took the "new plugin
type" approach, we'd need two new plugin types.
4. Other than the snapshot/operation ID return, the rest of the plugin processing is identical to
Backup/RestoreItemActions. With separate plugin types, we'd have to repeat all of that logic
(including dealing with additional items, etc.) twice.
The other major change is that we will be applying this to both backups and restores, although the
Volume Snapshotter use case only needs this on backup. This means that everything we're doing around
backup phase and workflow will also need to be done for restore.
Then there are various minor changes around terminology to make things more generic. Instead of
"snapshotID", we'll have "operationID" (which for volume snapshotters will be a snapshot
ID).
## Goals
- Enable monitoring of backup/restore item action operations that continue after snapshotting and other operations have completed
- Keep non-usable backups and restores (upload/persistence has not finished, etc.) from appearing as completed
- Make use of plugin API versioning functionality to manage changes to Backup/RestoreItemAction interfaces
- Enable vendors to plug their own data movers into velero using BIA/RIA plugins
## Non-goals
- Today, Velero is unable to recover from an in progress backup when the velero server crashes (pod is deleted). This has an impact on running asynchronous processes, but it’s not something we intend to solve in this design.
## Models
### Internal configuration and management
In this model, movement of the snapshot to stable storage is under the control of the snapshot
plugin. Decisions about where and when the snapshot gets moved to stable storage are not
directly controlled by Velero. This is the model for the current VolumeSnapshot plugins.
### Velero controlled management
In this model, the snapshot is moved to external storage under the control of Velero. This
enables Velero to move data between storage systems. This also allows backup partners to use
Velero to snapshot data and then move the data into their backup repository.
## Backup and Restore phases
Velero currently has backup/restore phases "InProgress" and "Completed". A backup moves to the
Completed phase when all of the volume snapshots have completed and the Kubernetes metadata has been
written into the object store. However, the actual data movement may be happening in the background
after the backup has been marked "Completed". The backup is not actually a stable backup until the
data has been persisted properly. In some cases (e.g. AWS) the backup cannot be restored from until
the snapshots have been persisted.
Once the snapshots have been taken, however, it is possible for additional backups or restores (as
long as they don't use not-yet-completed backups) to be made without interference. Waiting until
all data has been moved before starting the next backup will slow the progress of the system without
adding any actual benefit to the user.
New backup/restore phases, "WaitingForPluginOperations" and
"WaitingForPluginOperationsPartiallyFailed" will be introduced. When a backup or restore has
entered one of these phases, Velero is free to start another backup/restore. The backup/restore
will remain in the "WaitingForPluginOperations" phase until all BIA/RIA operations have completed
(for example, for a volume snapshotter, until all data has been successfully moved to persistent
storage). The backup/restore will not fail once it reaches this phase, although an error return
from a plugin could cause a backup or restore to move to "PartiallyFailed". If the backup is
deleted (cancelled), the plugins will attempt to delete the snapshots and stop the data movement -
this may not be possible with all storage systems.
In addition, for backups (but not restores), there will also be two additional phases, "Finalizing"
and "FinalizingPartiallyFailed", which will handle any steps required after plugin operations have
all completed. Initially, this will just include adding any required resources to the backup that
might have changed during asynchronous operation execution, although eventually other cleanup
actions could be added to this phase.
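One way to picture how these phases fit together is a small function that picks the phase following "InProgress". This is a sketch under stated assumptions: `phaseAfterInProgress` is a hypothetical helper, it takes only three inputs, and it does not model the case where the whole backup or restore is marked "Failed".

```go
package main

import "fmt"

// phaseAfterInProgress picks the phase that follows "InProgress".
// Hypothetical helper: a fully failed backup/restore is not modeled here.
func phaseAfterInProgress(isBackup, unfinishedOps, hasErrors bool) string {
	switch {
	case unfinishedOps && !hasErrors:
		return "WaitingForPluginOperations"
	case unfinishedOps && hasErrors:
		return "WaitingForPluginOperationsPartiallyFailed"
	case isBackup && !hasErrors:
		// Backups get an extra finalizing step once operations are done.
		return "Finalizing"
	case isBackup:
		return "FinalizingPartiallyFailed"
	case !hasErrors:
		return "Completed"
	default:
		return "PartiallyFailed"
	}
}

func main() {
	fmt.Println(phaseAfterInProgress(true, true, false))
	fmt.Println(phaseAfterInProgress(true, false, false))
	fmt.Println(phaseAfterInProgress(false, false, true))
}
```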
### State progression

### New
When a backup/restore request is initially created, it is in the "New" phase.
The next state is either "InProgress" or "FailedValidation"
### FailedValidation
If the backup/restore request is incorrectly formed, it goes to the "FailedValidation" phase and
terminates.
### InProgress
When work on the backup/restore begins, it moves to the "InProgress" phase. It remains in the
"InProgress" phase until all pre/post execution hooks have been executed, all snapshots have been
taken and the Kubernetes metadata and backup/restore info is safely written to the object store
plugin.
In the current implementation, Restic backups will move data during the "InProgress" phase. In the
future, it may be possible to combine a snapshot with a Restic (or equivalent) backup which would
allow for data movement to be handled in the "WaitingForPluginOperations" phase.
The next phase would be "WaitingForPluginOperations" for backups or restores which have unfinished
asynchronous plugin operations and no errors so far, "WaitingForPluginOperationsPartiallyFailed" for
backups or restores which have unfinished asynchronous plugin operations and at least one error,
"Completed" for restores with no unfinished asynchronous plugin operations and no errors,
"PartiallyFailed" for restores with no unfinished asynchronous plugin operations and at least one
error, "Finalizing" for backups with no unfinished asynchronous plugin operations and no errors,
"FinalizingPartiallyFailed" for backups with no unfinished asynchronous plugin operations and at
least one error, or "PartiallyFailed". Backups/restores which would have a final phase of
"Completed" or "PartiallyFailed" may move to the "WaitingForPluginOperations" or
"WaitingForPluginOperationsPartiallyFailed" state. A backup/restore which will be marked "Failed"
will go directly to the "Failed" phase. Uploads may continue in the background for snapshots that
were taken by a "Failed" backup/restore, but progress will not be monitored or updated. If there
are any operations in progress when a backup is moved to the "Failed" phase (although with the
current workflow, that shouldn't happen), Cancel() should be called on these operations. When a
"Failed" backup is deleted, all snapshots will be deleted and at that point any uploads still in
progress should be aborted.
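The choice of the phase that follows "InProgress" can be summarized as a small decision function. The sketch below is illustrative only: `nextPhase` and its three boolean inputs are hypothetical names, not part of Velero, and it covers only backups/restores that have not failed outright (those go directly to "Failed").

```go
package main

import "fmt"

// Phase names are taken from the text above.
const (
	WaitingForPluginOperations                = "WaitingForPluginOperations"
	WaitingForPluginOperationsPartiallyFailed = "WaitingForPluginOperationsPartiallyFailed"
	Finalizing                                = "Finalizing"
	FinalizingPartiallyFailed                 = "FinalizingPartiallyFailed"
	Completed                                 = "Completed"
	PartiallyFailed                           = "PartiallyFailed"
)

// nextPhase is a hypothetical helper: it picks the phase that follows
// "InProgress" for a backup or restore that has not failed outright.
func nextPhase(isBackup, hasUnfinishedOps, hasErrors bool) string {
	switch {
	case hasUnfinishedOps && !hasErrors:
		return WaitingForPluginOperations
	case hasUnfinishedOps && hasErrors:
		return WaitingForPluginOperationsPartiallyFailed
	case isBackup && !hasErrors:
		return Finalizing
	case isBackup && hasErrors:
		return FinalizingPartiallyFailed
	case !hasErrors:
		return Completed
	default:
		return PartiallyFailed
	}
}

func main() {
	// A backup with pending asynchronous operations and no errors so far.
	fmt.Println(nextPhase(true, true, false))
	// A restore with nothing pending but at least one error.
	fmt.Println(nextPhase(false, false, true))
}
```

Only backups pass through the two "Finalizing" phases; restores with no pending operations go straight to their terminal phase.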
### WaitingForPluginOperations (new)
The "WaitingForPluginOperations" phase signifies that the main part of the backup/restore, including
snapshotting has completed successfully and uploading and any other asynchronous BIA/RIA plugin
operations are continuing. In the event of an error during this phase, the phase will change to
"WaitingForPluginOperationsPartiallyFailed". On success, the phase changes to
"Finalizing" for backups and "Completed" for restores. Backups cannot be
restored from when they are in the "WaitingForPluginOperations" state.
_controller-tools_ works by reading the Go files that contain the API type definitions.
It uses a combination of the struct fields, types, tags and comments to build the OpenAPIv3 schema for the CRDs. The tooling makes some assumptions based on conventions followed in upstream Kubernetes and the ecosystem, which involves some changes to the Velero API type definitions, especially around optional fields.
In order for _controller-tools_ to read the Go files containing Velero API type definitions, the CRDs need to be generated at build time, as these files are not available at runtime (i.e. the Go files are not accessible by the compiled binary).
These generated CRD manifests (YAML) will then need to be available to the `pkg/install` package for it to include when installing Velero resources.
Currently, Velero doesn't have one flexible way to handle volumes.
If users want to skip the backup of volumes, or back up only some volumes across different namespaces in batch, they currently need to apply the opt-in or opt-out approach one by one, or use a label selector, which is cumbersome when the related pods carry very different labels and there are many volumes to handle. It would be convenient if Velero could provide policies that handle the backup of volumes based on `some specific volumes conditions`.
## Background
As of today, Velero has many filters to handle (back up or skip) resources, including resource filters like `IncludedNamespaces, ExcludedNamespaces`, label selectors like `LabelSelector, OrLabelSelectors`, and annotations like `backup.velero.io/must-include-additional-items`. But they are not flexible enough to handle volumes; we need one generic way to handle volumes.
## Goals
- Introduce flexible policies to handle volumes, without patching any labels or annotations onto the pods or volumes.
## Non-Goals
- We only handle volumes for backup and do not support restore.
- Currently, we only handle volumes and do not support other resources.
- Only general volume attributes that are independent of environment and platform are supported; volume attributes tied to a specific environment are not.
## Use-cases/Scenarios
### Skip backup volumes by some attributes
Users want to skip PV with the requirements:
- option to skip all PV data
- option to skip specified PV type (RBD, NFS)
- option to skip specified PV size
- option to skip specified storage-class
## High-Level Design
First, Velero will provide the user with one YAML file template that contains all supported volume policies.
Second, the user writes their own configuration file by imitating the YAML template; it may contain only part of the volume policies from the template.
Third, the user creates one configmap from their configuration file; the configmap should be in the Velero install namespace.
Fourth, the user creates a backup with the command `velero backup create --resource-policies-configmap $policiesConfigmap`, which makes the current backup reference the volume policies. At the same time, Velero validates all volume policies the user imported; the backup will fail if the volume policies are not supported or some items cannot be parsed.
Fifth, the current backup CR will record the reference to the volume policies configmap.
Sixth, Velero first filters volumes with the other currently supported filters, and finally applies the volume policies to the filtered volumes to get the final matched volumes to handle.
## Detailed Design
The volume resource policies should contain a list of policies, each of which is the combination of conditions and a related `action`. When target volumes meet the conditions, the related `action` takes effect.
Below is the API Design for the user configuration:
### API Design
```go
type VolumeActionType string

const Skip VolumeActionType = "skip"

// Action defines one action for a specific way of backup
type Action struct {
	// Type defines the specific type of action; currently it could be 'file-system-backup', 'volume-snapshot', or 'skip'
	Type VolumeActionType `yaml:"type"`
	// Parameters defines a map of parameters used when executing a specific action
	Parameters map[string]interface{} `yaml:"parameters,omitempty"`
	// we may support other resource policies in the future, and they could be added separately
	// OtherResourcePolicies: []OtherResourcePolicy
}
```
The policies YAML config file would look like this:
```yaml
version: v1
volumePolicies:
# it's a list and if the input item matches the first policy, the latters will be ignored
# each policy consists of a list of conditions and an action
# each key in the object is one condition, and one policy will apply to resources that meet ALL conditions
- conditions:
    # capacity condition matches the volumes whose capacity falls into the range
    capacity: "0,100Gi"
    csi:
      driver: aws.ebs.csi.driver
      fsType: ext4
    storageClass:
    - gp2
    - ebs-sc
  action:
    type: volume-snapshot
    parameters:
      # optional parameters which are custom-defined parameters when doing an action
      volume-snapshot-timeout: "6h"
- conditions:
    capacity: "0,100Gi"
    storageClass:
    - gp2
    - ebs-sc
  action:
    type: file-system-backup
- conditions:
    nfs:
      server: 192.168.200.90
  action:
    # type of file-system-backup could be defined a second time
    type: file-system-backup
- conditions:
    nfs: {}
  action:
    type: skip
- conditions:
    csi:
      driver: aws.efs.csi.driver
  action:
    type: skip
```
### Filter rules
#### VolumePolicies
The whole resource policies consist of groups of volume policies.
One specific volume policy is a combination of one action and several conditions, which means one action and several conditions are the smallest unit of a volume policy.
Volume policies are a list, and if the target volumes match the first policy, the later ones will be ignored, which reduces the complexity of matching volumes, especially when there are multiple complex volume policies.
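The first-match rule can be pictured with a short sketch. The `volume` and `policy` types and the `firstMatch` helper below are hypothetical simplifications for illustration, not the types Velero uses.

```go
package main

import "fmt"

// volume and policy are hypothetical, simplified types used only for this sketch.
type volume struct {
	StorageClass string
	NFS          bool
}

type policy struct {
	// match reports whether the volume meets ALL conditions of the policy.
	match  func(v volume) bool
	action string
}

// firstMatch returns the action of the first policy whose conditions match;
// later policies are ignored. ok is false when no policy matches.
func firstMatch(policies []policy, v volume) (action string, ok bool) {
	for _, p := range policies {
		if p.match(v) {
			return p.action, true
		}
	}
	return "", false
}

func main() {
	policies := []policy{
		{match: func(v volume) bool { return v.StorageClass == "gp2" }, action: "volume-snapshot"},
		{match: func(v volume) bool { return v.NFS }, action: "skip"},
		// Never reached for gp2 volumes, because the first policy already matched.
		{match: func(v volume) bool { return v.StorageClass == "gp2" }, action: "skip"},
	}
	action, ok := firstMatch(policies, volume{StorageClass: "gp2"})
	fmt.Println(action, ok)
}
```

Because evaluation stops at the first match, the order of entries in `volumePolicies` is significant: more specific policies should come before broader ones.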
#### Action
`Action` defines one action for a specific way of backup:
- if choosing `Kopia` or `Restic`, the action value would be `file-system-backup`.
- if choosing volume snapshot, the action value would be `volume-snapshot`.
- if choosing to skip the backup of a volume, the action value would be `skip`, and the volume will be skipped regardless of whether it would otherwise be handled by `file-system-backup` or `volume-snapshot`.
The policies could be extended later for other ways of backup, which means other `Action` values may be assigned in the future.
`file-system-backup`, `volume-snapshot`, and `skip` could each be partially or fully configured in the YAML file, and a configuration takes effect only for the related action.
#### Conditions
The conditions are a series of volume attributes; matched volumes should meet all the volume attributes in one conditions configuration.
##### Supported conditions
In Velero 1.11, we want to support the volume attributes listed below:
- capacity: matching volumes have the capacity that falls within this `capacity` range.
- storageClass: matching volumes are those with the specified `storageClass`, such as `gp2` or `ebs-sc` in EKS.
- volume sources: matching volumes are those that use the specified volume sources.
##### Parameters
Parameters are optional for one specific action. For example, it could be `csi-snapshot-timeout: 6h` for CSI snapshot.
#### Special rule definitions:
- A single condition in `Conditions` with a specific key and an empty value matches any value. For example, if `conditions.nfs` is `{}`, any Persistent Volume that uses `NFS` as its `persistentVolumeSource` matches (and, with a `skip` action, is skipped) no matter what the NFS server or NFS path is.
- The size of each single filter value should be limited to 256 bytes to guard against unfriendly long variable assignments.
- For the capacity of a PV or the size of a Volume, the value should include the lower value and upper value joined by a comma. The supported combinations are:
  - "0,5Gi" or "0Gi,5Gi" which means capacity or size matches from 0 to 5Gi, including value 0 and value 5Gi
  - ",5Gi" which is equal to "0,5Gi"
  - "5Gi," which means capacity or size matches larger than 5Gi, including value 5Gi
  - "5Gi" which is not supported and will fail configuration validation.
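The capacity rules above can be sketched as a small parser. Everything named here (`capacityRange`, `parseQuantity`, `parseCapacity`) is hypothetical, and `parseQuantity` is a deliberately simplified stand-in that only understands plain integers and the `Ki`/`Mi`/`Gi`/`Ti` suffixes; a real implementation would more likely rely on the Kubernetes quantity parser.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// capacityRange holds an inclusive range in bytes; upper == -1 means "no upper bound".
type capacityRange struct {
	lower, upper int64
}

var units = map[string]int64{"Ki": 1 << 10, "Mi": 1 << 20, "Gi": 1 << 30, "Ti": 1 << 40}

// parseQuantity handles plain integers and the binary suffixes listed in units.
// It is a simplified stand-in for a full Kubernetes quantity parser.
func parseQuantity(s string) (int64, error) {
	for suffix, factor := range units {
		if strings.HasSuffix(s, suffix) {
			n, err := strconv.ParseInt(strings.TrimSuffix(s, suffix), 10, 64)
			if err != nil {
				return 0, err
			}
			return n * factor, nil
		}
	}
	return strconv.ParseInt(s, 10, 64)
}

// parseCapacity parses "lower,upper"; either side may be empty, but the comma is required.
func parseCapacity(s string) (capacityRange, error) {
	parts := strings.Split(s, ",")
	if len(parts) != 2 {
		return capacityRange{}, errors.New("capacity must have the form \"lower,upper\"")
	}
	r := capacityRange{lower: 0, upper: -1}
	var err error
	if lo := strings.TrimSpace(parts[0]); lo != "" {
		if r.lower, err = parseQuantity(lo); err != nil {
			return capacityRange{}, err
		}
	}
	if hi := strings.TrimSpace(parts[1]); hi != "" {
		if r.upper, err = parseQuantity(hi); err != nil {
			return capacityRange{}, err
		}
	}
	return r, nil
}

// contains reports whether size (in bytes) falls in the inclusive range.
func (r capacityRange) contains(size int64) bool {
	return size >= r.lower && (r.upper < 0 || size <= r.upper)
}

func main() {
	r, _ := parseCapacity(",5Gi")
	fmt.Println(r.contains(5 << 30)) // the upper bound is inclusive
	_, err := parseCapacity("5Gi")
	fmt.Println(err != nil) // a single value without a comma is rejected
}
```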
### Configmap Reference
Currently, resource policies are defined in the `BackupSpec` struct, which will become more and more bloated as more filters are added, making the `Backup` CR bigger and bigger. So we want to store the resource policies in a configmap and have the `Backup` CRD reference that configmap.
The `configmap` created by the user would look like this:
```yaml
apiVersion: v1
data:
  policies.yaml: |
    version: v1
    volumePolicies:
    - conditions:
        capacity: "0,100Gi"
        csi:
          driver: aws.ebs.csi.driver
          fsType: ext4
        storageClass:
        - gp2
        - ebs-sc
      action:
        type: volume-snapshot
        parameters:
          volume-snapshot-timeout: "6h"
kind: ConfigMap
metadata:
  creationTimestamp: "2023-01-16T14:08:12Z"
  name: backup01
  namespace: velero
  resourceVersion: "17891025"
  uid: b73e7f76-fc9e-4e72-8e2e-79db717fe9f1
```
A new variable `resourcePolices` would be added into `BackupSpec`; its value is assigned the current resource policies configmap:
```yaml
apiVersion: velero.io/v1
kind: Backup
metadata:
  name: backup-1
spec:
  resourcePolices:
    refType: Configmap
    ref: backup01
  ...
```
The configmap only stores those assigned values, not the whole resources policies.
The name of the configmap is `$BackupName`, and it's in Velero install namespace.
#### Resource policies configmap related
The life cycle of the resource policies configmap is managed by the user instead of Velero, which makes it more flexible and easier to maintain.
- The resource policies configmap will remain in the cluster until the user deletes it.
- Unlike backups, the resource policies configmap will not be synced to a new cluster. So if the user wants to use resource policies that were not synced to the new cluster, the backup will fail with a resource policies not found error.
- One resource policies configmap could be used by multiple backups.
- If the resource policies configmap referenced by a backup has been deleted, already existing backups are not affected, but if the user references the deleted configmap when creating a new backup, it will fail with a resource policies not found error.
#### Versioning
We want to introduce the version field in the YAML data to handle breaking changes. Therefore, we won't follow a semver paradigm; for example, in v1.11 the data looks like this:
```yaml
version: v1
volumePolicies:
  ....
```
Hypothetically, if in v1.12 we add new fields like clusterResourcePolicies, the version will remain v1 because this change is backward compatible:
```yaml
version: v1
volumePolicies:
  ....
clusterResourcePolicies:
  ....
```
Suppose in v1.13 we have to introduce a breaking change; at that time we will bump up the version:
```yaml
version: v2
# This is just an example, we should try to avoid breaking changes
volume-policies:
  ....
```
We only support one version in Velero, so YAML data in a former version will not be recognized when a backup uses it.
#### Multiple versions supporting
To manage the maintenance effort, we will only support one version of the data in Velero. Suppose there is a breaking change to the YAML data in Velero v1.13: we should bump the config version to v2, and only v2 is supported in v1.13. Existing data with version: v1 should be migrated when Velero starts up; this won't hurt existing backup schedule CRs, as they only reference the configmap. To make the migration easier, the configmaps for such resource filter policies should be labeled manually before Velero starts up, as in the example below, and Velero will migrate the labeled configmaps.
We only support migrating from the previous version to the current version, to avoid complexity in data format conversion. Users could also regenerate the configmap in the new YAML data version, which makes version control easier.
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  labels:
    # This label can be optional but if this is not set, the backup will fail after the breaking change and the user will need to update the data manually
    velero.io/resource-filter-policies: "true"
  name: example
  namespace: velero
data:
  .....
```
### Display of resources policies
As the resource policies configmap is only referenced by the backup CR, the policies in the configmap are not directly visible, so we need to integrate the policies from the configmap into the output of the command `velero backup describe` and make them more readable.
## Compatibility
Currently, we have these resources filters:
- IncludedNamespaces
- ExcludedNamespaces
- IncludedResources
- ExcludedResources
- LabelSelector
- OrLabelSelectors
- IncludeClusterResources
- UseVolumeSnapshots
- velero.io/exclude-from-backup=true
- backup.velero.io/backup-volumes-excludes
- backup.velero.io/backup-volumes
- backup.velero.io/must-include-additional-items
So we should be careful with the combination of volume resource policies and the above resource filters.
- When volume resource policies conflict with the above resource filters, we should respect the above resource filters. For example, if the user used the opt-out approach by setting the `backup.velero.io/backup-volumes-excludes` annotation on the pod and also defined the volume as included in the volume resource policies configuration, we should respect the opt-out approach and skip the backup of the volume.
- If volume resource policies conflict with themselves, the first matched policy will be respected.
## Implementation
This implementation should be included in Velero v1.11.0.
Currently, in Velero v1.11.0 we only support the `Action` `skip`; `file-system-backup` and `volume-snapshot` will be supported in a later version. `Parameters` in `Action` is also not supported in v1.11.0 and will be supported in a later version.
In Velero 1.11, we support the conditions and formats listed below:
- capacity
  ```yaml
  capacity: "10Gi,100Gi" # match volumes whose size is between 10Gi and 100Gi
  ```
- storageClass
  ```yaml
  storageClass: # match volumes that have the storage class gp2 or ebs-sc
    - gp2
    - ebs-sc
  ```
- volume sources (currently only the formats and attributes below are supported)
  1. Specify the volume source name; the name could be `nfs`, `rbd`, `iscsi`, `csi`, etc.
     ```yaml
     nfs: {} # match any volume that has an nfs volume source
     csi: {} # match any volume that has a csi volume source
     ```
  2. Specify details for the related volume source (currently we only support the csi driver filter and the nfs server or path filter)
     ```yaml
     csi: # match volumes that have a csi volume source and use `aws.efs.csi.driver`
       driver: aws.efs.csi.driver
     nfs: # match volumes that have an nfs volume source and use the server and path below
       server: 192.168.200.90
       path: /mnt/nfs
     ```
The conditions could also be extended in later versions; for example, we could further support filtering on other volume source details, not only NFS and CSI.
## Alternatives Considered
### Configmap VS CRD
Here we let the user define the YAML config file and store the resource policies in a configmap. Alternatively, we could define a resource policies CRD and store the policies imported from the user-defined config file in the related CR.
But a CRD is more like a kind of resource with status: the Kubernetes API server handles the lifecycle of a CR and handles it in different statuses. Compared to a CRD, a configmap is more focused on storing data.
## Open Issues
Should we support more than one version of filter policies configmap?
# Proposal to add support for Resource Modifiers (AKA JSON Substitutions) in Restore Workflow
- [Proposal to add support for Resource Modifiers (AKA JSON Substitutions) in Restore Workflow](#proposal-to-add-support-for-resource-modifiers-aka-json-substitutions-in-restore-workflow)
  - [Abstract](#abstract)
  - [Goals](#goals)
  - [Non Goals](#non-goals)
  - [User Stories](#user-stories)
    - [Scenario 1](#scenario-1)
    - [Scenario 2](#scenario-2)
  - [Detailed Design](#detailed-design)
    - [Reference in velero API](#reference-in-velero-api)
    - [ConfigMap Structure](#configmap-structure)
    - [Operations supported by the JSON Patch library:](#operations-supported-by-the-json-patch-library)
    - [Advance scenarios](#advance-scenarios)
      - [Conditional patches using test operation](#conditional-patches-using-test-operation)
## Abstract
Currently Velero supports substituting certain values in the K8s resources during restoration, like changing the namespace, changing the storage class, etc. This proposal is to add generic support for JSON substitutions in the restore workflow. This will allow the user to specify filters for particular resources and then specify a JSON patch (operator, path, value) to apply on a resource. This will allow the user to substitute any value in the K8s resource without having to write a new RestoreItemAction plugin for each kind of substitution.
<!-- ## Background -->
## Goals
- Allow the user to specify a GroupResource, Name (optional), and JSON patch for modification.
- Allow the user to specify multiple JSON patches.
## Non Goals
- Deprecating the existing RestoreItemAction plugins for standard substitutions(like changing the namespace, changing the storage class, etc.)
## User Stories
### Scenario 1
- Alice has a PVC which is encrypted using a DES (Disk Encryption Set - Azure example) mentioned in the PVC YAML through the StorageClass YAML.
- Alice wishes to restore this snapshot to a different cluster. The new cluster does not have access to the same DES to provision disks out of the snapshot.
- She wishes to use a different DES for all the PVCs which use that DES.
- She can use this feature to substitute the DES in all StorageClass YAMLs with the new DES without having to create a fresh storageclass or know the name of the storageclass.
### Scenario 2
- Bob has a multi-zone cluster where nodes are spread across zones.
- Bob has pinned certain pods to a particular zone using nodeSelector/nodeAffinity on the pod spec.
- In case of a zone outage of the cloud provider, Bob wishes to restore the workload to a different namespace in the same cluster, but change the zone pinning of the workload.
- Bob can use this feature to substitute the nodeSelector/nodeAffinity in the pod spec with the new zone pinning to quickly fail over the workload to a different zone's nodes.
## Detailed Design
- The design and approach is inspired by the [kubectl patch command](https://github.com/kubernetes/kubectl/blob/0a61782351a027411b8b45b1443ec3dceddef421/pkg/cmd/patch/patch.go#L102C2-L104C1)
```bash
# Update a container's image using a json patch with positional arrays
kubectl patch pod valid-pod --type='json' -p='[{"op": "replace", "path": "/spec/containers/0/image", "value":"new image"}]'
```
- The user is expected to create a configmap with the desired Resource Modifications. Then the reference of the configmap will be provided in the RestoreSpec.
- Before the core restore workflow creates or updates a particular resource in the cluster, the resource will be checked against the provided filters and the respective substitutions will be applied to it.
### Reference in velero API
> Example of Reference to configmap in RestoreSpec
- The user first needs to provide details on which resources the JSON Substitutions need to be applied to.
- For this the user will provide three inputs: Namespaces (for NS Scoped resources), GroupResource (resource.group format similar to includeResources field in velero) and Name Regex (optional).
- If the user does not provide the Name, the JSON Substitutions will be applied to all the resources of the given Group and Kind under the given namespaces.
- Further, the user will specify the JSON Patch using the structure of kubectl's "JSON Patch" based inputs.
- Sample data in ConfigMap
```yaml
version: v1
resourceModifierRules:
- conditions:
    groupResource: persistentvolumeclaims
    resourceNameRegex: "mysql.*"
    namespaces:
    - bar
    - foo
  patches:
  - operation: replace
    path: "/spec/storageClassName"
    value: "premium"
  - operation: remove
    path: "/metadata/labels/test"
```
- The above configmap will apply the JSON Patch to all the PVCs in the namespaces bar and foo with name starting with mysql. The JSON Patch will replace the storageClassName with "premium" and remove the label "test" from the PVCs.
- Note that the Namespace here is the original namespace of the backed up resource, not the new namespace where the resource is going to be restored.
- The user can specify multiple JSON Patches for a particular resource. The patches will be applied in the order specified in the configmap. A subsequent patch is applied in order and if multiple patches are specified for the same path, the last patch will override the previous patches.
- The user can specify multiple resourceModifierRules in the configmap. The rules will be applied in the order specified in the configmap.
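The way a rule's conditions select resources can be sketched as follows. The `conditions` type mirrors the fields of the sample ConfigMap, but the Go type and its `matches` method are hypothetical names for illustration and are not taken from Velero's code.

```go
package main

import (
	"fmt"
	"regexp"
)

// conditions mirrors the fields of the sample ConfigMap; the Go type itself is hypothetical.
type conditions struct {
	GroupResource     string
	ResourceNameRegex string
	Namespaces        []string
}

// matches reports whether a resource, identified by its group resource, original
// (backed-up) namespace, and name, satisfies every condition that is set.
func (c conditions) matches(groupResource, namespace, name string) (bool, error) {
	if c.GroupResource != groupResource {
		return false, nil
	}
	if len(c.Namespaces) > 0 {
		found := false
		for _, ns := range c.Namespaces {
			if ns == namespace {
				found = true
				break
			}
		}
		if !found {
			return false, nil
		}
	}
	if c.ResourceNameRegex != "" {
		// Note: the pattern is unanchored unless it starts with ^ or ends with $.
		return regexp.MatchString(c.ResourceNameRegex, name)
	}
	return true, nil
}

func main() {
	c := conditions{
		GroupResource:     "persistentvolumeclaims",
		ResourceNameRegex: "mysql.*",
		Namespaces:        []string{"bar", "foo"},
	}
	ok, _ := c.matches("persistentvolumeclaims", "foo", "mysql-0")
	fmt.Println(ok)
}
```

In this sketch an empty `Namespaces` list or an empty `ResourceNameRegex` places no restriction, which corresponds to the optional inputs described above.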
> Users need to create one configmap in the Velero install namespace from a YAML file that defines resource modifiers. The creating command would be like the one below:
```bash
kubectl create cm <configmap-name> --from-file <yaml-file> -n velero
```
### Operations supported by the JSON Patch library:
- add
- remove
- replace
- move
- copy
- test (covered below)
### Advance scenarios
#### **Conditional patches using test operation**
The `test` operation can be used to check if a particular value is present in the resource. If the value is present, the patch will be applied. If the value is not present, the patch will not be applied. This can be used to apply a patch only if a particular value is present in the resource. For example, if the user wishes to change the storage class of a PVC only if the PVC is using a particular storage class, the user can use the following configmap.
## Alternatives Considered
1. JSON Path based addressal of json fields in the resource
    - This was the initial planned approach, but there is no open source library which gives satisfactory edit functionality with support for all operators supported by the JsonPath RFC.
    - We attempted modifying the [https://kubernetes.io/docs/reference/kubectl/jsonpath/](https://kubernetes.io/docs/reference/kubectl/jsonpath/) but given the complexity of the code it did not make sense to change it since it would become a long term maintainability problem.
1. RestoreItemAction for each kind of standard substitution
    - Not an extensible design. If a new kind of substitution is required, a new RestoreItemAction needs to be written.
1. RIA for JSON Substitution: The approach of doing JSON Substitution through a RestoreItemAction plugin was considered. But it is likely to have performance implications as the plugin will be invoked for all the resources.
## Security Considerations
No security impact.
## Compatibility
Compatibility with existing StorageClass mapping RestoreItemAction and similar plugins needs to be evaluated.
## Implementation
- Changes in Restore CRD. Add a new field to the RestoreSpec to reference the configmap.
- One example of where code will be modified: https://github.com/vmware-tanzu/velero/blob/eeee4e06d209df7f08bfabda326b27aaf0054759/pkg/restore/restore.go#L1266 On the obj before Creation, we can apply the conditions to check if the resource is filtered out using given parameters. Then using JsonPatch provided, we can update the resource.
- For Jsonpatch - https://github.com/evanphx/json-patch library is used.
- Additional features such as wildcard support in path, regex match support in value, etc. can be added in future. This would involve forking the https://github.com/evanphx/json-patch library and adding the required features, since those features are not supported by the library currently and are not part of jsonpatch RFC.
# Proposal to Support JSON Merge Patch and Strategic Merge Patch in Resource Modifiers
- [Proposal to Support JSON Merge Patch and Strategic Merge Patch in Resource Modifiers](#proposal-to-support-json-merge-patch-and-strategic-merge-patch-in-resource-modifiers)
  - [Abstract](#abstract)
  - [Goals](#goals)
  - [Non Goals](#non-goals)
  - [User Stories](#user-stories)
    - [Scenario 1](#scenario-1)
    - [Scenario 2](#scenario-2)
  - [Detailed Design](#detailed-design)
    - [How to choose the right patch type](#how-to-choose-the-right-patch-type)
    - [New Field MergePatches](#new-field-mergepatches)
    - [New Field StrategicPatches](#new-field-strategicpatches)
    - [Conditional Patches in ALL Patch Types](#conditional-patches-in-all-patch-types)
    - [Wildcard Support for GroupResource](#wildcard-support-for-groupresource)
    - [Helper Command to Generate Merge Patch and Strategic Merge Patch](#helper-command-to-generate-merge-patch-and-strategic-merge-patch)
## Abstract
Velero introduced the concept of Resource Modifiers in v1.12.0. This feature allows the user to specify a configmap with a set of rules to modify the resources during restore. The user can specify the filters to select the resources and then specify the JSON Patch to apply on the resource. This feature is currently limited to the operations supported by JSON Patch RFC.
This proposal is to add support for JSON Merge Patch and Strategic Merge Patch in the Resource Modifiers. This will allow the user to use the same configmap to apply JSON Merge Patch and Strategic Merge Patch on the resources during restore.
## Goals
- Allow the user to specify a JSON patch, JSON Merge Patch or Strategic Merge Patch for modification.
- Allow the user to specify multiple JSON Patch, JSON Merge Patch or Strategic Merge Patch.
- Allow the user to specify mixed JSON Patch, JSON Merge Patch and Strategic Merge Patch in the same configmap.
## Non Goals
- Deprecating the existing RestoreItemAction plugins for standard substitutions(like changing the namespace, changing the storage class, etc.)
## User Stories
### Scenario 1
- Alice has some Pods and part of them have an annotation `{"foo": "bar"}`.
- Alice wishes to restore these Pods to a different cluster without this annotation.
- Alice can use this feature to remove this annotation during restore.
### Scenario 2
- Bob has a Pod with several containers and one container with name nginx has an image `repo1/nginx`.
- Bob wishes to restore this Pod to a different cluster, but the new cluster cannot access repo1, so he pushes the image to repo2.
- Bob can use this feature to update the image of container nginx to `repo2/nginx` during restore.
## Detailed Design
- The design and approach is inspired by kubectl patch command and [this doc](https://kubernetes.io/docs/tasks/manage-kubernetes-objects/update-api-object-kubectl-patch/).
- New fields `MergePatches` and `StrategicPatches` will be added to the `ResourceModifierRule` struct to support all three patch types.
- Only one of the three patch types can be specified in a single `ResourceModifierRule`.
- Add wildcard support for `groupResource` in `conditions` struct.
- The workflow to create Resource Modifier ConfigMap and reference it in RestoreSpec will remain the same as described in document [Resource Modifiers](https://github.com/vmware-tanzu/velero/blob/main/site/content/docs/main/restore-resource-modifiers.md).
### How to choose the right patch type
- [JSON Merge Patch](https://datatracker.ietf.org/doc/html/rfc7386) is a naively simple format, with limited usability. Probably it is a good choice if you are building something small, with very simple JSON Schema.
- [JSON Patch](https://datatracker.ietf.org/doc/html/rfc6902) is a more complex format, but it is applicable to any JSON documents. For a comparison of JSON patch and JSON merge patch, see [JSON Patch and JSON Merge Patch](https://erosb.github.io/post/json-patch-vs-merge-patch/).
- Strategic Merge Patch is a Kubernetes defined patch type, mainly used to process resources of type list. You can replace/merge a list, add/remove items from a list by key, change the order of items in a list, etc. Strategic merge patch is not supported for custom resources. For more details, see [this doc](https://kubernetes.io/docs/tasks/manage-kubernetes-objects/update-api-object-kubectl-patch/).
### New Field MergePatches
MergePatches is a list to specify the merge patches to be applied on the resource. The merge patches will be applied in the order specified in the configmap. A subsequent patch is applied in order and if multiple patches are specified for the same path, the last patch will override the previous patches.
Example of MergePatches in ResourceModifierRule
```yaml
version: v1
resourceModifierRules:
- conditions:
    groupResource: pods
    namespaces:
    - ns1
  mergePatches:
  - patchData: |
      {
        "metadata": {
          "annotations": {
            "foo": null
          }
        }
      }
```
- The above configmap will apply the Merge Patch to all the pods in namespace ns1 and remove the annotation `foo` from the pods.
- Both json and yaml format are supported for the patchData.
### New Field StrategicPatches
StrategicPatches is a list to specify the strategic merge patches to be applied on the resource. The strategic merge patches will be applied in the order specified in the configmap. A subsequent patch is applied in order and if multiple patches are specified for the same path, the last patch will override the previous patches.
Example of StrategicPatches in ResourceModifierRule
```yaml
version: v1
resourceModifierRules:
- conditions:
    groupResource: pods
    resourceNameRegex: "^my-pod$"
    namespaces:
    - ns1
  strategicPatches:
  - patchData: |
      {
        "spec": {
          "containers": [
            {
              "name": "nginx",
              "image": "repo2/nginx"
            }
          ]
        }
      }
```
- The above configmap will apply the Strategic Merge Patch to the pod with name my-pod in namespace ns1 and update the image of container nginx to `repo2/nginx`.
- Both JSON and YAML formats are supported for the patchData.
### Conditional Patches in ALL Patch Types
Since JSON Merge Patch and Strategic Merge Patch do not support conditional patches, we will use the `test` operation of JSON Patch to support conditional patches in all patch types by adding it to `Conditions` struct in `ResourceModifierRule`.
- The above configmap will apply the Merge Patch to all the PVCs in all namespaces with storageClassName premium and remove the annotation `foo` from the PVCs.
- You can specify multiple rules in the `matches` list. The patch will be applied only if all the matches are satisfied.
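The AND semantics of the `matches` list can be sketched as follows. This is a hand-rolled approximation of what a JSON Patch `test` operation checks (the value at a JSON Pointer path equals an expected value), written with the standard library only. The `MatchRule` type and the helper names are assumptions made for illustration, not Velero's API.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
	"strconv"
	"strings"
)

// MatchRule is a hypothetical stand-in for one entry of the `matches` list.
type MatchRule struct {
	Path  string // JSON Pointer, e.g. "/spec/storageClassName"
	Value any    // expected value; JSON numbers decode as float64
}

// lookup resolves a JSON Pointer (RFC 6901) against a decoded JSON document.
func lookup(doc any, pointer string) (any, bool) {
	if pointer == "" {
		return doc, true
	}
	if !strings.HasPrefix(pointer, "/") {
		return nil, false
	}
	cur := doc
	for _, tok := range strings.Split(pointer[1:], "/") {
		// Unescape "~1" before "~0", as RFC 6901 requires.
		tok = strings.ReplaceAll(strings.ReplaceAll(tok, "~1", "/"), "~0", "~")
		switch node := cur.(type) {
		case map[string]any:
			next, ok := node[tok]
			if !ok {
				return nil, false
			}
			cur = next
		case []any:
			i, err := strconv.Atoi(tok)
			if err != nil || i < 0 || i >= len(node) {
				return nil, false
			}
			cur = node[i]
		default:
			return nil, false
		}
	}
	return cur, true
}

// allMatch reports whether every rule holds for the resource (AND semantics).
func allMatch(resourceJSON string, rules []MatchRule) (bool, error) {
	var doc any
	if err := json.Unmarshal([]byte(resourceJSON), &doc); err != nil {
		return false, err
	}
	for _, r := range rules {
		got, ok := lookup(doc, r.Path)
		if !ok || !reflect.DeepEqual(got, r.Value) {
			return false, nil
		}
	}
	return true, nil
}

func main() {
	pvc := `{"kind":"PersistentVolumeClaim","spec":{"storageClassName":"premium"}}`
	rules := []MatchRule{{Path: "/spec/storageClassName", Value: "premium"}}
	ok, err := allMatch(pvc, rules)
	if err != nil {
		panic(err)
	}
	fmt.Println("apply patch:", ok)
}
```

A rule whose path is absent from the resource is treated as not matching, so the patch is skipped for that resource.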
### Wildcard Support for GroupResource
The user can specify a wildcard for groupResource in the conditions' struct. This will allow the user to apply the patches for all the resources of a particular group or all resources in all groups. For example, `*.apps` will apply to all the resources in the `apps` group, `*` will apply to all the resources in all groups.
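A minimal sketch of the wildcard check, using Go's standard `path.Match` as a stand-in glob matcher. The design only says a glob will be used, so the actual library and its exact pattern syntax may differ; `matchesGroupResource` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"path"
)

// matchesGroupResource reports whether a groupResource such as "deployments.apps"
// matches a glob pattern such as "*.apps" or "*".
// A malformed pattern is treated as matching nothing.
func matchesGroupResource(pattern, groupResource string) bool {
	ok, err := path.Match(pattern, groupResource)
	return err == nil && ok
}

func main() {
	for _, gr := range []string{"deployments.apps", "statefulsets.apps", "pods"} {
		fmt.Printf("%-18s *.apps=%v *=%v\n", gr,
			matchesGroupResource("*.apps", gr),
			matchesGroupResource("*", gr))
	}
}
```

With this matcher, `*.apps` selects every resource in the `apps` group but not core resources such as `pods`, while `*` selects everything.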
### Helper Command to Generate Merge Patch and Strategic Merge Patch
The patchData of a Strategic Merge Patch can be complex for users to write by hand. We can provide a helper command that generates the patchData for a Strategic Merge Patch. The command will take the original resource and the modified resource as input and generate the patchData.
- Use "github.com/evanphx/json-patch" to support JSON Merge Patch.
- Use "k8s.io/apimachinery/pkg/util/strategicpatch" to support Strategic Merge Patch.
- Use glob to support wildcard for `groupResource` in `conditions` struct.
- Use `test` operation of JSON Patch to calculate the `matches` in `conditions` struct.
## Future enhancements
- Add a Velero subcommand to generate/validate the patchData for Strategic Merge Patch and JSON Merge Patch.
- Add jq support for more complex conditions or patches, to cover situations that the current conditions or patches cannot handle, such as [this issue](https://github.com/vmware-tanzu/velero/issues/6344).
# Proposal to add support for Multiple VolumeSnapshotClasses in CSI Plugin
Currently the Velero CSI plugin chooses the VolumeSnapshotClass in the cluster that has the same driver name and also has the velero.io/csi-volumesnapshot-class label set on it. This global selection is not sufficient for many use cases. This proposal is to add support for multiple VolumeSnapshotClasses in CSI Plugin where the user can specify the VolumeSnapshotClass to use for a particular driver and backup.
## Background
The Velero CSI plugin chooses the VolumeSnapshotClass in the cluster that has the same driver name and also has the velero.io/csi-volumesnapshot-class label set on it. This global selection is not sufficient for many use cases. For example, if a cluster has multiple VolumeSnapshotClasses for the same driver, the user may want to use a VolumeSnapshotClass that is different from the default one. The user might also have different schedules set up for backing up different parts of the cluster and might wish to use different VolumeSnapshotClasses for each of these backups.
## Goals
- Allow the user to specify the VolumeSnapshotClass to use for a particular driver and backup.
## Non Goals
- Deprecating existing VSC selection behaviour. (The current behaviour will remain the default behaviour if the user does not specify the VolumeSnapshotClass to use for a particular driver and backup.)
## User Stories
### Scenario 1
- Alice is a cluster admin with a cluster that has multiple VolumeSnapshotClasses for the same driver. Each VSC stores its snapshots in a different ResourceGroup (Azure equivalent).
- Alice has configured multiple scheduled backups, each covering a different set of namespaces that represent different apps owned by different teams.
- Alice wants to use a different VolumeSnapshotClass for each backup so that each snapshot goes into its respective ResourceGroup, to simplify management of snapshots (COGS, RBAC, etc.).
- In current Velero, Alice can't achieve this because the CSI plugin uses the default VolumeSnapshotClass for the driver, so all snapshots go into the same ResourceGroup.
- The proposed design will allow Alice to achieve this by specifying the VolumeSnapshotClass to use for a particular driver and backup/schedule.
### Scenario 2
- Bob is a cluster admin who has PVCs storing different types of data.
- Most of the PVCs store non-sensitive application data, but certain PVCs store critical financial data.
- For such PVCs, Bob wants to use a VolumeSnapshotClass with certain encryption-related parameters set.
- In current Velero, Bob can't achieve this because the CSI plugin uses the default VolumeSnapshotClass for the driver, so all snapshots are taken using the same VolumeSnapshotClass.
- The proposed design will allow Bob to achieve this by overriding the VolumeSnapshotClass to use for a particular driver and backup/schedule using annotations on those specific PVCs.
## Detailed Design
### Staged Approach:
### Stage 1 Approach
#### Through Annotations
1. **Support VolumeSnapshotClass selection at backup/schedule level**
The user can annotate the backup/schedule with the driver and VolumeSnapshotClass name. The CSI plugin will use the VolumeSnapshotClass specified in the annotation. If the annotation is not present, the CSI plugin will use the default VolumeSnapshotClass for the driver.
The annotation to query on a backup is "velero.io/csi-volumesnapshot-class_'driver name'", where the driver name comes from the PVC's driver.
2. **Support VolumeSnapshotClass selection at PVC level**
The user can annotate the PVCs with the driver and VolumeSnapshotClass name. The CSI plugin will use the VolumeSnapshotClass specified in the annotation. If the annotation is not present, the CSI plugin will use the default VolumeSnapshotClass for the driver. If the VolumeSnapshotClass provided is of a different driver, the CSI plugin will use the default VolumeSnapshotClass for the driver.
Consider this an override option used in conjunction with part 1.
**Note**: The user has to annotate the PVCs or backups with the VolumeSnapshotClass to use for each driver. This is not ideal for the user experience.
- **Mitigation**: We can extend the Velero CLI to also annotate backups/schedules with the VolumeSnapshotClass to use for each driver. This will make it easier for the user to annotate the backups/schedules. This mitigation does not apply to PVCs, since annotating PVCs is a specific use case anyway. This is similar to: `kubectl run --image myimage --annotations="foo=bar" --annotations="another=one" mypod`
We can add support for: `velero backup create my-backup --annotations "velero.io/csi:csi.cloud.disk.driver=csi-diskdriver-snapclass"`
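The Stage 1 selection order (PVC annotation, then backup/schedule annotation, then the labeled default) can be sketched as below. The `SnapshotClass` type and `selectClass` function are hypothetical simplifications, and the sketch assumes the PVC annotation uses the same key format as the backup annotation, which the text does not state explicitly.

```go
package main

import "fmt"

// annotationPrefix follows the key format quoted in the text; the driver name is appended.
const annotationPrefix = "velero.io/csi-volumesnapshot-class_"

// SnapshotClass is a simplified, hypothetical view of a VolumeSnapshotClass.
type SnapshotClass struct {
	Name      string
	Driver    string
	IsDefault bool // true when the class carries the velero.io/csi-volumesnapshot-class label
}

// selectClass picks the class name for a driver. A PVC annotation overrides a
// backup annotation; an override that names an unknown class or a class of a
// different driver falls back to the labeled default for the driver.
func selectClass(driver string, pvcAnn, backupAnn map[string]string, classes []SnapshotClass) (string, bool) {
	key := annotationPrefix + driver
	override, ok := pvcAnn[key]
	if !ok {
		override = backupAnn[key]
	}
	defaultName, haveDefault := "", false
	for _, c := range classes {
		if c.Driver != driver {
			continue
		}
		if override != "" && c.Name == override {
			return c.Name, true
		}
		if c.IsDefault && !haveDefault {
			defaultName, haveDefault = c.Name, true
		}
	}
	return defaultName, haveDefault
}

func main() {
	driver := "csi.cloud.disk.driver"
	classes := []SnapshotClass{
		{Name: "default-snapclass", Driver: driver, IsDefault: true},
		{Name: "csi-diskdriver-snapclass", Driver: driver},
	}
	backupAnn := map[string]string{annotationPrefix + driver: "csi-diskdriver-snapclass"}

	name, _ := selectClass(driver, nil, backupAnn, classes)
	fmt.Println("with backup annotation:", name)
	name, _ = selectClass(driver, nil, nil, classes)
	fmt.Println("without annotations:   ", name)
}
```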
### Stage 2 Approach
The annotations route above is a way to get started and to reach initial design closure and implementation. The north star is to either introduce CSI-specific fields in the backup/restore CR (considering that CSI might be a very core part of Velero going forward) or leverage the pluginInputs field being tracked in: https://github.com/vmware-tanzu/velero/pull/5981
Refer to the Alternatives Considered section, item 2, **Through generic property bag in the velero contracts**, in the design doc for more details on the pluginInputs field.
## Alternatives Considered
1. **Through CSI Specific Fields in Velero contracts**
**Considerations**
- Since CSI snapshotting is done through the plugin, we don't intend to bloat up the Backup Spec with CSI specific fields.
- But considering that CSI Snapshotting is the way forward, we can debate if we should add a CSI section to the Backup Spec.
**Approach**: Similar to VolumeSnapshotLocation param in the Backup Spec, we can add a VolumeSnapshotClass param in the Backup Spec. This will allow the user to specify the VolumeSnapshotClass to use for the backup. The CSI plugin will use the VolumeSnapshotClass specified in the Backup Spec. If the VolumeSnapshotClass is not specified, the CSI plugin will use the default VolumeSnapshotClass for the driver.
*example of VolumeSnapshotClass param in the Backup Spec:*
```yaml
apiVersion: velero.io/v1
kind: Backup
metadata:
  name: backup-1
spec:
  csiParameters:
    volumeSnapshotClasses:
      driver: csi.cloud.disk.driver
      snapClass: csi-diskdriver-snapclass
    timeout: 10m
```
1. **Through changes in velero contracts**
1. **Through configmap references.**
Currently, even the storageclass mapping plugin expects the user to create a configmap which is used globally and fetched through labels. This behaviour has the same issue as the VolumeSnapshotClass selection. We can introduce a field in the velero contracts which allows passing configmap references for each plugin. The plugin can then honour the configmap passed in as a reference. The configmap can be used to pass the VolumeSnapshotClass to use for the backup, as well as other parameters to tweak. This can help make plugins more flexible without depending on global behaviour.
*example of configmap reference in the velero contracts:*
```yaml
apiVersion: velero.io/v1
kind: Backup
metadata:
  name: backup-1
spec:
  configmapRefs:
  - name: csi-volumesnapshotclass-configmap
    namespace: velero
    plugin: velero.io/csi
```
2. **Through generic property bag in the velero contracts**: We can introduce a field in the velero contracts which allows passing a generic property bag for each plugin. The plugin can then honour the property bag passed in.
*example of property bag in the velero contracts:*
```yaml
apiVersion: velero.io/v1
kind: Backup
metadata:
  name: backup-1
spec:
  pluginInputs:
  - name: velero.io/csi
    properties:
    - key: csi.cloud.disk.driver
      value: csi-diskdriver-snapclass
    - key: csi.cloud.file.driver
      value: csi-filedriver-snapclass
```
**Note**: Both these approaches can also be used to tweak other parameters such as CSI Snapshotting Timeout/intervals. And further can be used by other plugins.
## Security Considerations
No security impact.
## Compatibility
Existing behaviour of csi plugin will be retained where it fetches the VolumeSnapshotClass through the label. This will be the default behaviour if the user does not specify the VolumeSnapshotClass.
## Implementation
TBD based on closure of high level design proposals.
# Ensure support for backing up resources based on multiple labels
## Abstract
As of today, Velero supports filtering of resources based on a single label selector per backup. It is desired that Velero
support backing up resources based on multiple labels (OR logic).
**Note:** This solution is required because Kubernetes label selectors only allow AND logic of labels.
## Background
Currently, Velero's Backup/Restore API has a spec field `LabelSelector` which helps in filtering resources based on
a **single** label value per backup/restore request. For instance, if the user specifies the `Backup.Spec.LabelSelector` as
`data-protection-app: true`, Velero will grab all the resources that possess this label and perform the backup
operation on them. The `LabelSelector` field does not accept more than one label, so if the user wants to back up
resources carrying any one label from a set of labels (label1 OR label2 OR label3), the user needs to
create multiple backups, one per label rule. It would be really useful if the Velero Backup API could respect a set of
labels (OR rule) for a single backup request.
Related Issue: https://github.com/vmware-tanzu/velero/issues/1508
## Goals
- Enable support for backing up resources based on multiple labels (OR Logic) in a single backup config.
- Enable support for restoring resources based on multiple labels (OR Logic) in a single restore config.
## Use Case/Scenario
Let's say that, as a Velero user, you want to take a backup of secrets, but these secrets do not share one consistent
label. We want to back up secrets having any one of the labels `app=gdpr`, `app=wpa` and `app=ccpa`. Today,
we would have to create 3 backups, one for each label rule. This can become cumbersome at scale.
## High-Level Design
### Addition of `OrLabelSelectors` spec to Velero Backup/Restore API
For Velero to back up resources that carry any one label from a set of labels, we would like to add a new spec
field `OrLabelSelectors` which lets the user specify that set. The Velero backup would look something like:
```
apiVersion: velero.io/v1
kind: Backup
metadata:
  name: backup-101
  namespace: openshift-adp
spec:
  includedNamespaces:
  - test
  storageLocation: velero-sample-1
  ttl: 720h0m0s
  orLabelSelectors:
  - matchLabels:
      app: gdpr
  - matchLabels:
      app: wpa
  - matchLabels:
      app: ccpa
```
**Note:** This approach will **not** change any current behavior related to the Backup API spec `LabelSelector`. Rather, we
propose that the label in the `LabelSelector` spec and the labels in `OrLabelSelectors` be treated as different Velero functionalities.
Both fields will be treated as separate Velero Backup API specs. If `LabelSelector` (singular) is present, then just match that label.
If `OrLabelSelectors` is present, then match any label in the set specified by the user. For the backup case, if both `LabelSelector` and `OrLabelSelectors`
are specified (we do not anticipate this as a real-world use case), then `OrLabelSelectors` will take precedence; `LabelSelector` will
be used to filter only when `OrLabelSelectors` is not specified by the user. This keeps the behaviour of the two specs independent and avoids confusing users.
This way we preserve the existing Velero behaviour and implement the new functionality in a much cleaner way.
For instance, let's take a look at the following cases:
1. Only `LabelSelector` specified: Velero will create a backup with resources matching label `app=gdpr`
```
apiVersion: velero.io/v1
kind: Backup
metadata:
  name: backup-101
  namespace: openshift-adp
spec:
  includedNamespaces:
  - test
  storageLocation: velero-sample-1
  ttl: 720h0m0s
  labelSelector:
    matchLabels:
      app: gdpr
```
2. Only `OrLabelSelectors` specified: Velero will create a backup with resources matching any label from set `{app=gdpr, app=wpa, app=ccpa}`
```
apiVersion: velero.io/v1
kind: Backup
metadata:
  name: backup-101
  namespace: openshift-adp
spec:
  includedNamespaces:
  - test
  storageLocation: velero-sample-1
  ttl: 720h0m0s
  orLabelSelectors:
  - matchLabels:
      app: gdpr
  - matchLabels:
      app: wpa
  - matchLabels:
      app: ccpa
```
Similar implementation will be done for the Restore API as well.
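The precedence between `LabelSelector` and `OrLabelSelectors` described above can be sketched as follows. `Selector` is a simplified stand-in for `metav1.LabelSelector` that supports only `matchLabels`, and `shouldInclude` is a hypothetical helper, not the code that will live in the item collector.

```go
package main

import "fmt"

// Selector is a simplified stand-in for metav1.LabelSelector (matchLabels only).
type Selector map[string]string

// matches reports whether the labels satisfy every key/value pair (AND logic).
// A nil or empty selector matches everything.
func (s Selector) matches(labels map[string]string) bool {
	for k, want := range s {
		if got, ok := labels[k]; !ok || got != want {
			return false
		}
	}
	return true
}

// shouldInclude applies the rule from the text: when OrLabelSelectors is set,
// a resource is included if it matches any one selector (OR logic) and
// LabelSelector is ignored; otherwise LabelSelector alone decides.
func shouldInclude(labels map[string]string, labelSelector Selector, orSelectors []Selector) bool {
	if len(orSelectors) > 0 {
		for _, s := range orSelectors {
			if s.matches(labels) {
				return true
			}
		}
		return false
	}
	return labelSelector.matches(labels)
}

func main() {
	or := []Selector{{"app": "gdpr"}, {"app": "wpa"}, {"app": "ccpa"}}
	for _, app := range []string{"gdpr", "ccpa", "billing"} {
		fmt.Printf("app=%s included: %v\n", app, shouldInclude(map[string]string{"app": app}, nil, or))
	}
}
```

Within a single selector the labels are still ANDed, which mirrors the Kubernetes behaviour noted in the abstract; the OR is only across the list.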
## Detailed Design
With the introduction of `OrLabelSelectors`, the BackupSpec and RestoreSpec will look like:
BackupSpec:
```
type BackupSpec struct {
    [...]
    // OrLabelSelectors is a set of []metav1.LabelSelector to filter with
    // when adding individual objects to the backup. Resources matching any one
    // label from the set of labels will be added to the backup. If empty
    // or nil, all objects are included. Optional.
    // +optional
    OrLabelSelectors []*metav1.LabelSelector
    [...]
}
```
RestoreSpec:
```
type RestoreSpec struct {
    [...]
    // OrLabelSelectors is a set of []metav1.LabelSelector to filter with
    // when restoring objects from the backup. Resources matching any one
    // label from the set of labels will be restored from the backup. If empty
    // or nil, all objects are included from the backup. Optional.
    // +optional
    OrLabelSelectors []*metav1.LabelSelector
    [...]
}
```
The logic to collect resources to be backed up for a particular backup will be updated in the `backup/item_collector.go`
around [here](https://github.com/vmware-tanzu/velero/blob/574baeb3c920f97b47985ec3957debdc70bcd5f8/pkg/backup/item_collector.go#L294).
And for filtering the resources to be restored, the changes will go [here](https://github.com/vmware-tanzu/velero/blob/d1063bda7e513150fd9ae09c3c3c8b1115cb1965/pkg/restore/restore.go#L1769)
**Note:**
- This feature will not be exposed via Velero CLI.
**Velero Generic Data Path (VGDP)**: VGDP is the collection of modules introduced in the [Unified Repository design][1]. Velero uses these modules to finish data transfer for various purposes (e.g., PodVolume backup/restore, Volume Snapshot Data Movement). VGDP modules include uploaders and the backup repository.
## Background
Velero node-agent is a daemonset hosting controllers and VGDP modules to complete the concrete work of backups/restores, e.g., PodVolume backup/restore and Volume Snapshot Data Movement backup/restore.
For example, node-agent runs DataUpload controllers to watch DataUpload CRs for Volume Snapshot Data Movement backups, so there is one controller instance on each node. One controller instance takes a DataUpload CR and then launches a VGDP instance, which initializes an uploader instance and the backup repository connection, to finish the data transfer. The VGDP instance runs inside the node-agent pod or in a pod associated with the node-agent pod on the same node.
Depending on the data size, data complexity and resource availability, VGDP may take a long time and consume significant resources (CPU, memory, network bandwidth, etc.).
Technically, VGDP instances are able to run concurrently regardless of the requesters. For example, a VGDP instance for a PodVolume backup could run in parallel with another VGDP instance for a DataUpload. The two VGDP instances then share the same resources if they are running on the same node.
Therefore, in order to get the best performance from limited resources, it is worthwhile to make the number of concurrent VGDP instances per node configurable. When the nodes have sufficient resources, users can set a large concurrent number to reduce the backup/restore time; otherwise, the concurrency should be reduced, or the backup/restore may encounter problems, e.g., lagging, hangs or OOM kills.
## Goals
- Define the behaviors of concurrent VGDP instances in node-agent
- Create a mechanism for users to specify the concurrent number of VGDP per node
## Non-Goals
- VGDP instances on different nodes always run concurrently, since in most common cases the resources are isolated. For special cases in which some resources are shared across nodes, there is no support at present
- In practice, restores run in prioritized scenarios, e.g., disaster recovery. However, the current design doesn't consider this difference: a VGDP instance for a restore is blocked if the concurrency limit has been reached, even if the instances blocking it are for backups. If users do run into problems here, they should consider stopping the backups first
- Sometimes, users want to totally block backups/restores from running on a specific node; this is out of the scope of the current design. To achieve this, more modules need to be considered (e.g., exposers of data movers); simply blocking the VGDP (e.g., by setting its concurrent number to 0) doesn't work. E.g., for a fs backup, the VGDP instance must run on the node where the source pod is running; if we simply block the VGDP instance, the PodVolumeBackup CR is still submitted but never processed.
## Solution
We introduce a configMap named ```node-agent-config``` for users to specify the node-agent related configurations. This configMap is not created by Velero; users should create it manually on demand. The configMap should be in the same namespace where Velero is installed. If multiple Velero instances are installed in different namespaces, there should be one configMap in each namespace, which applies to the node-agent in that namespace only.
The node-agent server checks these configurations at startup time and uses them to initialize the related VGDP modules. Therefore, users can edit this configMap at any time, but the node-agent server needs to be restarted for the changes to take effect.
The ```node-agent-config``` configMap may be used for other node-agent configuration purposes in the future. At present, there is only one kind of configuration in the configMap data, named ```loadConcurrency```.
The data structure for ```node-agent-config``` is as below:
```go
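type Configs struct {
    // LoadConcurrency is the config for load concurrency per node.
    LoadConcurrency *LoadConcurrency `json:"loadConcurrency,omitempty"`
}

[...]

type RuledConfigs struct {
    [...]
    // Number specifies the number value associated to the matched nodes
    Number int `json:"number"`
}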
```
### Global concurrent number
We allow users to specify a concurrent number that will be applied to all nodes if the per-node number is not specified. This number is set through ```globalConfig```.
The number starts from 1, which means there is no concurrency and only one instance of VGDP is allowed. There is no upper limit.
If this number is not specified or not valid, a hard-coded default value of 1 will be used.
### Per-node concurrent number
We allow users to specify a different concurrent number per node. For example, users can set 3 concurrent instances on Node-1, 2 instances on Node-2 and 1 instance on Node-3. This is for the following considerations:
- The resources may differ among nodes. Users can then specify a smaller concurrent number for nodes with fewer resources and a larger number for nodes with more resources
- Help users isolate critical environments. Users may run critical workloads on specific nodes; since VGDP instances may consume significant resources, users may want to run fewer instances on the nodes with critical workloads
The range of the Per-node concurrent number is the same as that of the Global concurrent number.
The Per-node concurrent number takes precedence over the Global concurrent number, so it overrides the Global concurrent number for that node.
The Per-node concurrent number is implemented through the ```perNodeConfig``` field.
```perNodeConfig``` is a list of ```RuledConfigs```, each item of which matches one or more nodes by label selectors and specifies the concurrent number for the matched nodes. This means the nodes are identified by labels.
For example, the ```perNodeConfig``` could have the elements below:
The first element means the node with host name ```node1``` gets the Per-node concurrent number of 3.
The second element means all the nodes with label ```beta.kubernetes.io/instance-type``` of value ```Standard_B4ms``` get the Per-node concurrent number of 5.
At least one node is expected to carry the label specified in a ```RuledConfigs``` element (rule). If no node has this label, the Per-node rule has no effect.
If one node matches more than one rule, e.g., if node1 also has the label ```beta.kubernetes.io/instance-type=Standard_B4ms```, the smallest number (3) will be used.
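The resolution rules above (default of 1, Global number, Per-node override, smallest number when several rules match) can be sketched as below. The types are simplified and hypothetical: the node selector is reduced to a plain label map, the `kubernetes.io/hostname` label is assumed as the way to identify a node by host name, and skipping per-node rules with an invalid number is an assumption the text does not spell out.

```go
package main

import "fmt"

const defaultConcurrency = 1 // hard-coded default when nothing valid is configured

// RuledConfig is a simplified stand-in for one perNodeConfig element.
type RuledConfig struct {
	NodeSelector map[string]string // all labels must match (simplified label selector)
	Number       int
}

// LoadConcurrency is a simplified stand-in for the loadConcurrency data.
type LoadConcurrency struct {
	GlobalConfig  int
	PerNodeConfig []RuledConfig
}

func selectorMatches(sel, labels map[string]string) bool {
	for k, want := range sel {
		if got, ok := labels[k]; !ok || got != want {
			return false
		}
	}
	return true
}

// concurrencyFor returns the number of concurrent VGDP instances for a node.
func concurrencyFor(nodeLabels map[string]string, cfg *LoadConcurrency) int {
	if cfg == nil {
		return defaultConcurrency
	}
	perNode := 0
	for _, rule := range cfg.PerNodeConfig {
		if rule.Number < 1 || !selectorMatches(rule.NodeSelector, nodeLabels) {
			continue // assumption: invalid or non-matching rules are ignored
		}
		if perNode == 0 || rule.Number < perNode {
			perNode = rule.Number // the smallest matching number wins
		}
	}
	if perNode > 0 {
		return perNode
	}
	if cfg.GlobalConfig >= 1 {
		return cfg.GlobalConfig
	}
	return defaultConcurrency
}

func main() {
	cfg := &LoadConcurrency{
		GlobalConfig: 2,
		PerNodeConfig: []RuledConfig{
			{NodeSelector: map[string]string{"kubernetes.io/hostname": "node1"}, Number: 3},
			{NodeSelector: map[string]string{"beta.kubernetes.io/instance-type": "Standard_B4ms"}, Number: 5},
		},
	}
	node1 := map[string]string{
		"kubernetes.io/hostname":           "node1",
		"beta.kubernetes.io/instance-type": "Standard_B4ms",
	}
	node2 := map[string]string{"beta.kubernetes.io/instance-type": "Standard_B4ms"}
	node3 := map[string]string{"kubernetes.io/hostname": "node3"}
	fmt.Println(concurrencyFor(node1, cfg), concurrencyFor(node2, cfg), concurrencyFor(node3, cfg))
}
```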
### Sample
A sample of the ```node-agent-config``` configMap is as below:
To create the configMap, users need to save something like the above sample to a JSON file and then run the command below:
```
kubectl create cm node-agent-config -n velero --from-file=<json file name>
```
### Global data path manager
As for the code implementation, the data path manager maintains the total number of running VGDP instances and ensures the limit is not exceeded. At present, there is one data path manager instance per controller; as a result, the concurrent numbers are calculated separately for each controller. This doesn't help to limit the concurrency among different requesters.
Therefore, we need to create one global data path manager instance server-wide and pass it to the different controllers. The instance will be created at node-agent server startup.
The concurrent number is required to initialize a data path manager; the number comes from either the Per-node concurrent number or the Global concurrent number.
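The idea of a single, shared manager that caps the number of running VGDP instances across all controllers can be sketched as follows. This is an illustrative, hypothetical shape (`Manager`, `Acquire`, `Release` and `ErrConcurrencyLimit` are invented names), not the prototypes of Velero's data path manager.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrConcurrencyLimit is returned when the node-wide limit has been reached.
var ErrConcurrencyLimit = errors.New("concurrency limit reached")

// Manager tracks running VGDP instances for the whole node-agent server.
type Manager struct {
	mu      sync.Mutex
	limit   int
	running map[string]struct{}
}

// NewManager creates a manager; limits below 1 fall back to 1.
func NewManager(limit int) *Manager {
	if limit < 1 {
		limit = 1
	}
	return &Manager{limit: limit, running: make(map[string]struct{})}
}

// Acquire registers a VGDP instance by name or fails if the limit is reached.
func (m *Manager) Acquire(name string) error {
	m.mu.Lock()
	defer m.mu.Unlock()
	if _, ok := m.running[name]; ok {
		return nil // already tracked
	}
	if len(m.running) >= m.limit {
		return ErrConcurrencyLimit
	}
	m.running[name] = struct{}{}
	return nil
}

// Release removes a finished VGDP instance so another one can start.
func (m *Manager) Release(name string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	delete(m.running, name)
}

func main() {
	// One manager shared by all controllers (DataUpload, PodVolumeBackup, ...).
	mgr := NewManager(2)
	fmt.Println(mgr.Acquire("dataupload-1"))      // <nil>
	fmt.Println(mgr.Acquire("podvolumebackup-1")) // <nil>
	fmt.Println(mgr.Acquire("datadownload-1"))    // concurrency limit reached
	mgr.Release("dataupload-1")
	fmt.Println(mgr.Acquire("datadownload-1")) // <nil>
}
```

Because every controller goes through the same instance, a PodVolume backup and a DataUpload on the same node count against one shared limit; a controller that receives the limit error would typically retry the request later.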
Below are some prototypes related to data path manager:
### 'core' Velero client/server required changes
- Creation of the VolumePluginBackup/VolumePluginRestore CRDs at installation time
- Persistence of VolumePluginBackup CRs towards the end of the backup operation
- As part of backup synchronization, VolumePluginBackup CRs related to the backup will be synced.
- Deletion of VolumePluginBackup when volumesnapshotter's DeleteSnapshot is called
- Deletion of VolumePluginRestore as part of handling deletion of Restore CR
But, this involves good amount of changes and needs a way for backward compatibility.
As volume plugins are mostly K8s native, it's fine to go ahead with the current limitation.
### Update Backup CR
Instead of creating new CRs, plugins can directly update the status of Backup CR. But, this deviates from current approach of having separate CRs like PodVolumeBackup/PodVolumeRestore to know operations progress.
This proposal outlines an approach to support versioning of Velero's plugin APIs to enable changes to those APIs.
It will allow for backwards compatible changes to be made, such as the addition of new plugin methods, but also backwards incompatible changes such as method removal or method signature changes.
## Background
When changes are made to Velero’s plugin APIs, there is no mechanism for the Velero server to communicate the version of the API that is supported, or for plugins to communicate what version they implement.
This means that any modification to a plugin API is a backwards incompatible change as it requires all plugins which implement the API to update and implement the new method.
There are several components involved to use plugins within Velero.
From the perspective of the core Velero codebase, all plugin kinds (e.g. `ObjectStore`, `BackupItemAction`) are defined by a single API interface and all interactions with plugins are managed by a plugin manager which provides an implementation of the plugin API interface for Velero to use.
Velero communicates with plugins via gRPC.
The core Velero project provides a framework (using the [go-plugin project](https://github.com/hashicorp/go-plugin)) for plugin authors to use to implement their plugins which manages the creation of gRPC servers and clients.
Velero plugins import the Velero plugin library in order to use this framework.
When a change is made to a plugin API, it needs to be made to the Go interface used by the Velero codebase, and also to the rpc service definition which is compiled to form part of the framework.
As each plugin kind is defined by a single interface, when a plugin imports the latest version of the Velero framework, it will need to implement the new APIs in order to build and run successfully.
If a plugin does not use the latest version of the framework, and is used with a newer version of Velero that expects the plugin to implement those methods, this will result in a runtime error as the plugin is incompatible.
With this proposal, we aim to break this coupling and introduce plugin API versions.
## Scenarios to Support
The following describes interactions between Velero and its plugins that will be supported with the implementation of this proposal.
For the purposes of this list, we will refer to existing Velero and plugin versions as `v1` and all following versions as version `n`.
Velero client communicating with plugins or plugin client calling other plugins:
- Version `n` client will be able to communicate with Version `n` plugin
- Version `n` client will be able to communicate with all previous versions of the plugin (Version `n-1` back to `v1`)
Velero plugins importing Velero framework:
- `v1` plugin built against Version `n` Velero framework
- A plugin may choose to only implement a `v1` API, but it must be able to be built using Version `n` of the Velero framework
## Goals
- Allow plugin APIs to change without requiring all plugins to implement the latest changes (even if they upgrade the version of Velero that is imported)
- Allow plugins to choose which plugin versions they support and enable them to support multiple versions
- Support breaking changes in the plugin APIs such as method removal or method signature changes
- Establish a design process for modifying plugin APIs such as method addition and removal and signature changes
- Establish a process for newer Velero clients to use older versions of a plugin API through adaptation
## Non Goals
- Change how plugins are managed or added
- Allow older plugin clients to communicate with new versions of plugins
## High-Level Design
With each change to a plugin API, a new version of the plugin interface and the proto service definition will be created which describes the new plugin API.
The plugin framework will be adapted to allow these new plugin versions to be registered.
Plugins can opt to implement any or all versions of an API, however Velero will always attempt to use the latest version, and the plugin management will be modified to adapt earlier versions of a plugin to be compatible with the latest API where possible.
Under the existing plugin framework, any new plugin version will be treated as a new plugin with a new kind.
The plugin manager (which provides implementations of a plugin to Velero) will include an adapter layer which will manage the different versions and provide the adaptation for versions which do not implement the latest version of the plugin API.
Providing an adaptation layer enables Velero and other plugin clients to use an older version of a plugin if it can be safely adapted.
As the plugins will be able to introduce backwards incompatible changes, it will _not_ be possible for older versions of Velero to use plugins which only support the latest versions of the plugin APIs.
Although adding new rpc methods to a service is considered a backwards compatible change within gRPC, due to the way the proto definitions are compiled and included in the framework used by plugins, this will require every plugin to implement the new methods.
Instead, we are opting to treat the addition of a method to an API as one requiring versioning.
The addition of optional fields to existing structs which are used as parameters to or return values of API methods will not be considered as a change requiring versioning.
These kinds of changes do not modify method signatures and have been safely made in the past with no impact on existing plugins.
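The adapter idea described in this section can be sketched as follows: the plugin manager prefers the latest version of a plugin and, when only an older version is registered, wraps it so that it satisfies the latest interface. All names here (`ObjectStoreV1`, `ObjectStoreV2`, `v1Adapter`, `latestObjectStore`, `ErrNotSupported`) and the choice to return an error for methods that cannot be adapted are assumptions for illustration only.

```go
package main

import (
	"errors"
	"fmt"
)

// ObjectStoreV1 is a trimmed-down, hypothetical v1 plugin interface.
type ObjectStoreV1 interface {
	PutObject(bucket, key, body string) error
}

// ObjectStoreV2 embeds v1 and adds a new method, so it is a new plugin version.
type ObjectStoreV2 interface {
	ObjectStoreV1
	NewMethod() error
}

// ErrNotSupported signals a method that an older plugin version cannot provide.
var ErrNotSupported = errors.New("not supported by this plugin version")

// v1Adapter makes a v1 plugin usable wherever the v2 interface is expected.
type v1Adapter struct{ ObjectStoreV1 }

func (v1Adapter) NewMethod() error { return ErrNotSupported }

// latestObjectStore returns the v2 plugin if registered, else an adapted v1 plugin.
func latestObjectStore(v1 ObjectStoreV1, v2 ObjectStoreV2) (ObjectStoreV2, error) {
	switch {
	case v2 != nil:
		return v2, nil
	case v1 != nil:
		return v1Adapter{v1}, nil
	default:
		return nil, errors.New("no ObjectStore implementation registered")
	}
}

// legacyStore is a toy plugin that implements only the v1 API.
type legacyStore struct{ objects map[string]string }

func (s *legacyStore) PutObject(bucket, key, body string) error {
	s.objects[bucket+"/"+key] = body
	return nil
}

func main() {
	store, err := latestObjectStore(&legacyStore{objects: map[string]string{}}, nil)
	if err != nil {
		panic(err)
	}
	fmt.Println("PutObject:", store.PutObject("bucket", "key", "data"))
	fmt.Println("NewMethod:", store.NewMethod())
}
```

The caller only ever sees the latest interface, while v1-only plugins keep working for the methods they implement.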
## Detailed Design
The following areas will need to be adapted to support plugin versioning.
### Plugin Interface Definitions
To provide versioned plugins, any change to a plugin interface (method addition, removal, or signature change) will require a new versioned interface to be created.
Currently, all plugin interface definitions reside in `pkg/plugin/velero` in a file corresponding to their plugin kind.
These files will be rearranged to be grouped by kind and then versioned: `pkg/plugin/velero/<plugin_kind>/<version>/`.
The following are examples of how each change may be treated:
#### Complete Interface Change
If the entire `ObjectStore` interface is being changed such that no previous methods are being included, a file would be added to `pkg/plugin/velero/objectstore/v2/` and would contain the new interface definition:
```
type ObjectStore interface {
    // Only include new methods that the new API version will support
    NewMethod()
    // ...
}
```
#### Method Addition
If a method is being added to the `ObjectStore` API, a file would be added to `pkg/plugin/velero/objectstore/v2/` and may contain a new API definition as follows:
```
type ObjectStore interface {
    // Import all the methods from the previous version of the API if they are to be included as is
    v1.ObjectStore
    // Provide definitions of any new methods
    NewMethod()
}
```
#### Method Removal
If a method is being removed from the `ObjectStore` API, a file would be added to `pkg/plugin/velero/objectstore/v2/` and may contain a new API definition as follows:
```
type ObjectStore interface {
    // Methods which are required from the previous API version must be included, for example
    Init(config)
    PutObject(bucket, key, body)
    // ...
    // Methods which are to be removed are not included
}
```
#### Method Signature modification
If a method signature in the `ObjectStore` API is being modified, a file would be added to `pkg/plugin/velero/objectstore/v2/` and may contain a new API definition as follows:
```
type ObjectStore interface {
// Methods which are required from the previous API version must be included, for example
Init(config)
PutObject(bucket, key, body)
// ...
// Provide new definitions for methods which are being modified
List(bucket, prefix, newParameter)
}
```
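The method-addition case above can be sketched in compilable form. This is a minimal illustration, not the actual Velero interfaces: the two versions would live in separate `v1` and `v2` packages, but here they are written as `ObjectStoreV1` and `ObjectStoreV2` in a single file, the method signatures are simplified, and `memStore` is a hypothetical toy implementation.

```go
package main

import "fmt"

// ObjectStoreV1 stands in for the existing v1 interface
// (assumption: reduced to two methods with simplified signatures).
type ObjectStoreV1 interface {
	Init(config map[string]string) error
	PutObject(bucket, key, body string) error
}

// ObjectStoreV2 embeds the v1 interface unchanged and adds one new method.
type ObjectStoreV2 interface {
	ObjectStoreV1
	NewMethod() string
}

// memStore is a hypothetical in-memory implementation of both versions.
type memStore struct {
	objects map[string]string
}

func (m *memStore) Init(config map[string]string) error {
	m.objects = map[string]string{}
	return nil
}

func (m *memStore) PutObject(bucket, key, body string) error {
	m.objects[bucket+"/"+key] = body
	return nil
}

func (m *memStore) NewMethod() string { return "v2" }

// Compile-time checks that memStore satisfies both interface versions.
var (
	_ ObjectStoreV1 = (*memStore)(nil)
	_ ObjectStoreV2 = (*memStore)(nil)
)

func main() {
	var store ObjectStoreV2 = &memStore{}
	if err := store.Init(nil); err != nil {
		panic(err)
	}
	if err := store.PutObject("bucket", "key", "body"); err != nil {
		panic(err)
	}
	fmt.Println(store.NewMethod())
}
```

Because the v2 interface embeds v1, any v2 implementation can also be passed wherever a v1 `ObjectStore` is expected.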
### Proto Service Definitions
The proto service definitions of the plugins will also be versioned and arranged by their plugin kind.
Currently, all the proto definitions reside under `pkg/plugin/proto` in a file corresponding to their plugin kind.
These files will be rearranged to be grouped by kind and then versioned: `pkg/plugin/proto/<plugin_kind>/<version>`,
except for the current v1 plugins. Those will remain in their current package/location for backwards compatibility.
This will allow plugin images built with earlier versions of Velero to work with the latest Velero (for v1 plugins
only). The `go_package` option will be added to all proto service definitions to allow the proto compilation script
to place the generated Go code for each plugin API version in the proper Go package directory.
It is not possible to import an existing proto service into a new one, so any methods will need to be duplicated across versions if they are required by the new version.
The message definitions can be shared however, so these could be extracted from the service definition files and placed in a file that can be shared across all versions of the service.
### Plugin Framework
To allow plugins to register which versions of the API they implement, the plugin framework will need to be adapted to accept new versions.
Currently, the plugin manager stores a [`map[string]RestartableProcess`](https://github.com/vmware-tanzu/velero/blob/main/pkg/plugin/clientmgmt/manager.go#L69), where the string key is the binary name for the plugin process (e.g. "velero-plugin-for-aws").
Each `RestartableProcess` contains a [`map[kindAndName]interface{}`](https://github.com/vmware-tanzu/velero/blob/main/pkg/plugin/clientmgmt/restartable_process.go#L60) which represents each of the unique plugin implementations provided by that binary.
[`kindAndName`](https://github.com/vmware-tanzu/velero/blob/main/pkg/plugin/clientmgmt/registry.go#L42) is a struct which combines the plugin kind (`ObjectStore`, `VolumeSnapshotter`) and the plugin name ("velero.io/aws", "velero.io/azure").
Each plugin version registration must be unique (to allow for multiple versions to be implemented within the same plugin binary).
This will be achieved by adding a specific registration method for each version to the Server interface in the plugin framework.
For example, if adding a V2 `RestoreItemAction` plugin, the Server interface would be modified to add the `RegisterRestoreItemActionV2` method.
This would require [adding a new plugin Kind const](https://github.com/vmware-tanzu/velero/blob/main/pkg/plugin/framework/plugin_kinds.go#L28-L46) to represent the new plugin version, e.g. `PluginKindRestoreItemActionV2`.
It also requires the creation of a new implementation of the go-plugin interface ([example](https://github.com/vmware-tanzu/velero/blob/main/pkg/plugin/framework/object_store.go)) to support that version and use the generated gRPC code from the proto definition (including a client and server implementation).
The Server will also need to be adapted to recognize this new plugin Kind and to serve the new implementation.
Existing plugin Kind consts and registration methods will be left unchanged and will correspond to the current version of the plugin APIs (assumed to be v1).
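As a rough sketch of how per-version registration keeps each registration unique, the following uses a simplified stand-in for the framework's Server. `PluginKindRestoreItemActionV2` and `RegisterRestoreItemActionV2` are the names proposed above; the `server` type, its `handlers` map, the initializer signature, and the error return are assumptions made for illustration and do not mirror the real framework types.

```go
package main

import "fmt"

// PluginKind identifies a plugin kind, including its version.
type PluginKind string

const (
	// The existing kind, assumed to correspond to v1.
	PluginKindRestoreItemAction PluginKind = "RestoreItemAction"
	// The new kind added for the V2 plugin API.
	PluginKindRestoreItemActionV2 PluginKind = "RestoreItemActionV2"
)

// kindAndName combines the plugin kind and the plugin name into one key.
type kindAndName struct {
	kind PluginKind
	name string
}

// server is a hypothetical, simplified stand-in for the framework Server.
type server struct {
	handlers map[kindAndName]func() interface{}
}

func newServer() *server {
	return &server{handlers: map[kindAndName]func() interface{}{}}
}

// register stores an initializer and rejects duplicate kind/name pairs.
func (s *server) register(kind PluginKind, name string, init func() interface{}) error {
	key := kindAndName{kind: kind, name: name}
	if _, exists := s.handlers[key]; exists {
		return fmt.Errorf("plugin %s/%s is already registered", kind, name)
	}
	s.handlers[key] = init
	return nil
}

// RegisterRestoreItemAction registers a v1 implementation.
func (s *server) RegisterRestoreItemAction(name string, init func() interface{}) error {
	return s.register(PluginKindRestoreItemAction, name, init)
}

// RegisterRestoreItemActionV2 registers a v2 implementation under its own kind.
func (s *server) RegisterRestoreItemActionV2(name string, init func() interface{}) error {
	return s.register(PluginKindRestoreItemActionV2, name, init)
}

func main() {
	s := newServer()
	// The same plugin name can be registered once per API version.
	fmt.Println(s.RegisterRestoreItemAction("velero.io/example", func() interface{} { return "v1" }))
	fmt.Println(s.RegisterRestoreItemActionV2("velero.io/example", func() interface{} { return "v2" }))
	fmt.Println(len(s.handlers))
}
```

Keying on the versioned kind is what lets one binary serve both a v1 and a v2 implementation under the same plugin name.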
### Plugin Manager
The plugin manager is responsible for managing the lifecycle of plugins.
It provides an interface which is used by Velero to retrieve an instance of a plugin kind with a specific name (e.g. `ObjectStore` with the name "velero.io/aws").
The manager contains a registry of all available plugins which is populated during the main Velero server startup.
When the plugin manager is requested to provide a particular plugin, it checks the registry for that plugin kind and name.
If it is available in the registry, the manager retrieves a `RestartableProcess` for the plugin binary, creating it if it does not already exist.
That `RestartableProcess` is then used by individual restartable implementations of a plugin kind (e.g. `restartableObjectStore`, `restartableVolumeSnapshotter`).
As new plugin versions are added, the plugin manager will be modified to always retrieve the latest version of a plugin kind.
This is to allow the remainder of the Velero codebase to assume that it will always interact with the latest version of a plugin.
If the latest version of a plugin is not available, it will attempt to fall back to previous versions and use an implementation adapted to the latest version if available.
It will be up to the author of new plugin versions to determine whether a previous version of a plugin can be adapted to work with the interface of the new version.
For each plugin kind, a new `Restartable<PluginKind>` struct will be introduced which will contain the plugin Kind and a function, `Get`, which will instantiate a restartable instance of that plugin kind and perform any adaptation required to make it compatible with the latest version.
For example, `RestartableObjectStore` or `RestartableVolumeSnapshotter`.
For each restartable plugin kind, a new function will be introduced which will return a slice of `Restartable<PluginKind>` objects, sorted by version in descending order.
The manager will iterate through the list of `Restartable<PluginKind>`s and will check the registry for the given plugin kind and name.
If the requested version is not found, it will skip and continue to iterate, attempting to fetch previous versions of the plugin kind.
Once the requested version is found, the `Get` function will be called, returning the restartable implementation of the latest version of that plugin Kind.
```
type RestartableObjectStore struct {
kind framework.PluginKind
// Get returns a restartable ObjectStore for the given name and process, wrapping if necessary
Get func(name string, restartableProcess RestartableProcess) v2.ObjectStore
}

// ...
// If none of the versions can be found in the registry, the lookup ends with:
// return nil, fmt.Errorf("unable to get valid ObjectStore for %q", name)
```
If the previous version is not available, or can not be adapted to the latest version, it should not be included in the `restartableObjectStores` slice.
This will result in an error being returned as is currently the case when a plugin implementation for a particular kind and provider can not be found.
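The lookup described above can be sketched as a self-contained program. This is an illustration under stated assumptions rather than the actual manager code: the registry is reduced to a set of kind/name keys, the `RestartableProcess` argument is omitted from `get`, and the `store` type and its `Version` method are hypothetical placeholders for real plugin clients.

```go
package main

import "fmt"

// PluginKind identifies a plugin kind, including its version.
type PluginKind string

const (
	KindObjectStore   PluginKind = "ObjectStore"   // v1
	KindObjectStoreV2 PluginKind = "ObjectStoreV2" // v2 (latest)
)

type kindAndName struct {
	kind PluginKind
	name string
}

// ObjectStore stands in for the latest (v2) interface.
type ObjectStore interface {
	Version() string
}

// store is a hypothetical placeholder for a restartable plugin client.
type store struct{ version string }

func (s store) Version() string { return s.version }

// restartableObjectStore pairs a versioned kind with a constructor that
// returns an implementation of the latest interface, adapting if needed.
type restartableObjectStore struct {
	kind PluginKind
	get  func(name string) ObjectStore
}

// manager holds a simplified registry: the set of registered plugins.
type manager struct {
	registry map[kindAndName]bool
}

// restartableObjectStores lists the supported versions, newest first.
func (m *manager) restartableObjectStores() []restartableObjectStore {
	return []restartableObjectStore{
		{kind: KindObjectStoreV2, get: func(name string) ObjectStore { return store{version: "v2"} }},
		{kind: KindObjectStore, get: func(name string) ObjectStore { return store{version: "v1 adapted to v2"} }},
	}
}

// GetObjectStore returns the newest registered version for name,
// falling back to older, adapted versions.
func (m *manager) GetObjectStore(name string) (ObjectStore, error) {
	for _, r := range m.restartableObjectStores() {
		if !m.registry[kindAndName{kind: r.kind, name: name}] {
			continue // this version is not registered, try the previous one
		}
		return r.get(name), nil
	}
	return nil, fmt.Errorf("unable to get valid ObjectStore for %q", name)
}

func main() {
	m := &manager{registry: map[kindAndName]bool{
		{kind: KindObjectStore, name: "velero.io/aws"}: true,
	}}
	s, err := m.GetObjectStore("velero.io/aws")
	fmt.Println(s.Version(), err)
	_, err = m.GetObjectStore("velero.io/missing")
	fmt.Println(err)
}
```

A version that cannot be adapted would simply be left out of the slice, so the loop falls through to the final error.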
There are situations where it may be beneficial to check at the point where a plugin API call is made whether it implements a specific version of the API.
This can be addressed with future amendments to this design; however, it does not seem to be necessary at this time.
#### Plugin Adaptation
When a new plugin API version is being proposed, it will be up to the author and the maintainer team to determine whether older versions of an API can be safely adapted to the latest version.
An adaptation will implement the latest version of the plugin API interface but will use the methods from the version that is being adapted.
In cases where the method signatures remain the same, the adaptation layer will call through to the same method in the version being adapted.
Examples where an adaptation may be safe:
- A method signature is being changed to add a new parameter but the parameter could be optional (for example, adding a context parameter). The adaptation could call through to the method provided in the previous version but omit the parameter.
- A method signature is being changed to remove a parameter, but it is safe to pass a default value to the previous version. The adaptation could call through to the method provided in the previous version but use a default value for the parameter.
- A new method is being added but does not impact any existing behaviour of Velero (for example, a new method which will allow Velero to [wait for additional items to be ready](https://github.com/vmware-tanzu/velero/blob/main/design/wait-for-additional-items.md)). The adaptation would return a value which allows the existing behaviour to be performed.
- A method is being deleted as it is no longer used. The adaptation would call through to any methods which are still included but would omit the deleted method in the adaptation.
Examples where an adaptation may not be safe:
- A new method is added which is used to provide new critical functionality in Velero. If this functionality can not be replicated using existing plugin methods in previous API versions, this should not be adapted and instead the plugin manager should return an error indicating that the plugin implementation can not be found.
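The first safe case above (a new optional parameter such as a context) can be sketched as follows. The interfaces are hypothetical and reduced to a single method; `adaptedV1ObjectStore`, `recordingStore`, and `putViaAdapter` are names invented for this illustration.

```go
package main

import (
	"context"
	"fmt"
)

// ObjectStoreV1 is a hypothetical v1 interface without a context parameter.
type ObjectStoreV1 interface {
	PutObject(bucket, key, body string) error
}

// ObjectStoreV2 is a hypothetical v2 interface that adds a context parameter.
type ObjectStoreV2 interface {
	PutObject(ctx context.Context, bucket, key, body string) error
}

// adaptedV1ObjectStore implements the v2 interface on top of a v1 plugin.
type adaptedV1ObjectStore struct {
	v1 ObjectStoreV1
}

// PutObject calls through to the v1 method and omits the new parameter.
func (a adaptedV1ObjectStore) PutObject(_ context.Context, bucket, key, body string) error {
	return a.v1.PutObject(bucket, key, body)
}

// recordingStore is a toy v1 plugin that records the keys it receives.
type recordingStore struct {
	calls []string
}

func (r *recordingStore) PutObject(bucket, key, body string) error {
	r.calls = append(r.calls, bucket+"/"+key)
	return nil
}

// putViaAdapter shows the caller's view: it only sees the v2 interface.
func putViaAdapter(legacy ObjectStoreV1, bucket, key, body string) error {
	var latest ObjectStoreV2 = adaptedV1ObjectStore{v1: legacy}
	return latest.PutObject(context.Background(), bucket, key, body)
}

func main() {
	legacy := &recordingStore{}
	if err := putViaAdapter(legacy, "bucket", "key", "body"); err != nil {
		panic(err)
	}
	fmt.Println(legacy.calls)
}
```

The rest of the codebase can then be written against the v2 interface only, while v1 plugins keep working through the adapter.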
### Restartable Plugin Process
As new versions of plugins are added, new restartable implementations of plugins will also need to be created.
These are currently located within "pkg/plugin/clientmgmt" but will be rearranged to be grouped by kind and version like other plugin files.
## Versioning Considerations
It should be noted that if changes are being made to a plugin's API, it will only be necessary to bump the API version once within a release cycle, regardless of how many changes are made within that cycle.
This is because the changes will only be available to consumers when they upgrade to the next minor version of the Velero library.
New plugin API versions will not be introduced or backported to patch releases.
Once a new minor or major version of Velero has been released, however, any further changes will need to follow the process above and use a new API version.
## Alternatives Considered
### Relying on gRPC’s backwards compatibility when adding new methods
One approach for adapting the plugin APIs would have been to rely on the fact that adding methods to gRPC services is a backwards compatible change.
This approach would allow older clients to communicate with newer plugins as the existing interface would still be provided.
This was considered but ruled out as our current framework would require any plugin that recompiles using the latest version of the framework to adapt to the new version.
Also, without specific versioned interfaces, it would require checking plugin implementations at runtime for the specific methods that are supported.
## Compatibility
This design doc aims to allow plugin API changes to be made in a manner that may provide some backwards compatibility.
Older versions of Velero will not be able to make use of new plugin versions, but may continue to use previous versions of a plugin API if supported by the plugin.
All compatibility concerns are addressed earlier in the document.
## Implementation
This design document primarily outlines an approach to allow future plugin API changes to be made.
However, there are changes to the existing code base that will be made to allow plugin authors to more easily propose and introduce changes to these APIs.
* Plugin interface definitions (currently in `pkg/plugin/velero`) will be rearranged to be grouped by kind and then versioned: `pkg/plugin/velero/<plugin_kind>/<version>/`.
* Proto definitions (currently in `pkg/plugin/proto`) will be rearranged to be grouped by kind and then versioned: `pkg/plugin/proto/<plugin_kind>/<version>`.
* This will also require changes to the `make update` build task to correctly find the new proto location and output to the versioned directories.
It is anticipated that changes to the plugin APIs will be made as part of the 1.9 release cycle.
To assist with this work, an additional follow-up task to the ones listed above would be to prepare a V2 version of each of the existing plugins.
These new versions will not yet provide any new API methods but will provide a layout for new additions to be made.
Create a new metadata file in the backup repository's backup name sub-directory to store information about the PVCs and PVs included in the backup. The information includes how the PVC and PV data is backed up, snapshot information, and status. The needed snapshot status can also be recorded there, but the Velero-native snapshot plugin doesn't provide a way to get the snapshot size from the API, so not all snapshot size information may be available.
This new metadata file is needed to:
* Get a summary of the backup's PVC and PV information, including how the data in them is backed up, or whether the data in them is skipped from backup.
* Find out how the PVC and PV should be restored in the restore process.
* Retrieve the PV's snapshot information for backup.
## Background
There is already a [PR](https://github.com/vmware-tanzu/velero/pull/6496) to track the skipped PVC in the backup. This design will depend on it and go further to get a summary of PVC and PV information, then persist into a metadata file in the backup repository.
In the restore process, the Velero server needs to decide how the PV resource should be restored according to how the PV was backed up. The current logic checks whether it was backed up by Velero-native snapshot or by file-system backup, or whether it has `DeletionPolicy` set to `Delete`.
The checks are made against the PVBs or Snapshots generated by the backup. There is no generic way to find this information, and the CSI backup and snapshot data movement backup are not covered.
Also, when describing the backup, there is no generic way to find the PV's snapshot information.
## Goals
- Create a new metadata file to store information about the backup's PVCs and PVs and the method used to back up the volume data. Downstream consumers can use the file to generate a summary.
- Create a generic way to let the Velero server know how the PV resources are backed up.
- Create a generic way to let the Velero server find the snapshot information corresponding to a PV.
## Non Goals
- Unify how to get snapshot size information for all PV backing-up methods, and all other currently not ready PVs' information.
## High-Level Design
Create _backup-name_-volumes-info.json metadata file in the backup's repository. This file will be encoded to contain all the PVC and PV information included in the backup. The information covers whether the PV or PVC's data is skipped during backup, how its data is backed up, and the backed-up detail information.
Note that the new metadata file includes all skipped volume information. This is used to address [the second phase needs of skipped volumes information](https://github.com/vmware-tanzu/velero/issues/5834#issuecomment-1526624211).
The `restoreItem` function can decode the _backup-name_-volumes-info.json file to determine how to handle the PV resource.
## Detailed Design
### The VolumeInfo structure
The _backup-name_-volumes-info.json file contains an array of `VolumeInfo` structures.
``` golang
type VolumeInfo struct {
PVCName string // The PVC's name.
PVCNamespace string // The PVC's namespace.
PVName string // The PV name.
BackupMethod string // How the volume data is backed up. Valid values include `VeleroNativeSnapshot`, `PodVolumeBackup` and `CSISnapshot`.
SnapshotDataMoved bool // Whether the volume's snapshot data is moved to the specified storage.
Skipped bool // Whether the volume is skipped in this backup.
SkippedReason string // The reason the volume is skipped in the backup.
}
// CSISnapshotInfo is used for displaying the CSI snapshot status
type CSISnapshotInfo struct {
SnapshotHandle string // It's the storage provider's snapshot ID for CSI.
Size int64 // The snapshot corresponding volume size.
Driver string // The name of the CSI driver.
VSCName string // The name of the VolumeSnapshotContent.
}
// SnapshotDataMovementInfo is used for displaying the snapshot data mover status.
type SnapshotDataMovementInfo struct {
DataMover string // The data mover used by the backup. The valid values are `velero` and the empty string (equivalent to `velero`).
UploaderType string // The type of the uploader that uploads the snapshot data. The valid values are `kopia` and `restic`.
RetainedSnapshot string // The name or ID of the snapshot associated object (SAO). SAO is used to support local snapshots for the snapshot data mover, e.g. it could be a VolumeSnapshot for CSI snapshot data movement.
SnapshotHandle string // It's the filesystem repository's snapshot ID.
}
// VeleroNativeSnapshotInfo is used for displaying the Velero native snapshot status.
type VeleroNativeSnapshotInfo struct {
SnapshotHandle string // It's the storage provider's snapshot ID for the Velero-native snapshot.
VolumeType string // The cloud provider snapshot volume type.
VolumeAZ string // The cloud provider snapshot volume's availability zones.
IOPS string // The cloud provider snapshot volume's IOPS.
}
// PodVolumeBackupInfo is used for displaying the PodVolumeBackup snapshot status.
type PodVolumeBackupInfo struct {
SnapshotHandle string // It's the file-system uploader's snapshot ID for PodVolumeBackup.
Size int64 // The snapshot corresponding volume size.
UploaderType string // The type of the uploader that uploads the data. The valid values are `kopia` and `restic`.
VolumeName string // The PVC's corresponding volume name used by Pod: https://github.com/kubernetes/kubernetes/blob/e4b74dd12fa8cb63c174091d5536a10b8ec19d34/pkg/apis/core/types.go#L48
PodName string // The Pod name mounting this PVC.
PodNamespace string // The Pod namespace.
NodeName string // The PVB-taken k8s node's name.
}
// PVInfo is used to store some PV information modified after creation.
// This information is lost after PV recreation.
type PVInfo struct {
ReclaimPolicy string // ReclaimPolicy of PV. It could be different from the referenced StorageClass.
Labels map[string]string // The PV's labels should be kept after recreation.
}
```
### How the VolumeInfo array is generated
The function `persistBackup` has `backup *pkgbackup.Request` in parameters.
From it, the `VolumeSnapshots`, `PodVolumeBackups`, `CSISnapshots`, `itemOperationsList`, and `SkippedPVTracker` can be read. All of them will be iterated and merged into the `VolumeInfo` array, and then persisted into backup repository in function `persistBackup`.
Note that changes that happen in async operations are not reflected in the new metadata file. The file only covers the volume changes that happen within the scope of the Velero server process.
A new method is added to BackupStore to download the VolumeInfo metadata file.
Uploading the metadata file is covered by the existing `PutBackup` method.
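The merge-and-persist step can be sketched as follows. This is a reduced illustration, not the `persistBackup` implementation: only two of the input sources are modelled, the `podVolumeBackup` and `skippedPV` input types are hypothetical, the JSON field names are assumptions, and the actual upload to the backup repository is left out.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// VolumeInfo is a reduced version of the structure defined above.
// The JSON tags are assumptions made for this sketch.
type VolumeInfo struct {
	PVCName       string `json:"pvcName"`
	PVCNamespace  string `json:"pvcNamespace"`
	PVName        string `json:"pvName"`
	BackupMethod  string `json:"backupMethod"`
	Skipped       bool   `json:"skipped"`
	SkippedReason string `json:"skippedReason"`
}

// podVolumeBackup and skippedPV are hypothetical stand-ins for the
// PodVolumeBackups and SkippedPVTracker data read from the backup request.
type podVolumeBackup struct {
	PVCName, PVCNamespace, PVName string
}

type skippedPV struct {
	Name, Reason string
}

// buildVolumeInfos merges the input lists into one VolumeInfo array.
func buildVolumeInfos(pvbs []podVolumeBackup, skipped []skippedPV) []VolumeInfo {
	infos := make([]VolumeInfo, 0, len(pvbs)+len(skipped))
	for _, p := range pvbs {
		infos = append(infos, VolumeInfo{
			PVCName:      p.PVCName,
			PVCNamespace: p.PVCNamespace,
			PVName:       p.PVName,
			BackupMethod: "PodVolumeBackup",
		})
	}
	for _, s := range skipped {
		infos = append(infos, VolumeInfo{PVName: s.Name, Skipped: true, SkippedReason: s.Reason})
	}
	return infos
}

// encodeVolumeInfos produces the content of the metadata file.
func encodeVolumeInfos(infos []VolumeInfo) ([]byte, error) {
	return json.Marshal(infos)
}

// decodeVolumeInfos reads the metadata file content back, as restore would.
func decodeVolumeInfos(data []byte) ([]VolumeInfo, error) {
	var infos []VolumeInfo
	err := json.Unmarshal(data, &infos)
	return infos, err
}

func main() {
	infos := buildVolumeInfos(
		[]podVolumeBackup{{PVCName: "data", PVCNamespace: "app", PVName: "pv-1"}},
		[]skippedPV{{Name: "pv-2", Reason: "opted out"}},
	)
	data, err := encodeVolumeInfos(infos)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(data))
}
```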
#### Generate the PVC backed-up information summary
The downstream tools can use this VolumeInfo array to format and display their volume information. This is not in the scope of this feature.
#### Retrieve volume backed-up information for `velero backup describe` command
The `velero backup describe` can also use this VolumeInfo array structure to display the volume information. The snapshot data mover volume should use this structure at first, then the Velero native snapshot, CSI snapshot, and PodVolumeBackup can also use this structure. The detailed implementation is also not in this feature's scope.
#### Let restore know how to restore the PV
In the function `restoreItem`, it will determine whether to restore the PV resource by checking it in the Velero native snapshots list, PodVolumeBackup list, and its DeletionPolicy. This logic is still kept. The logic will be used when the new `VolumeInfo` metadata cannot be found to support backward compatibility.
``` golang
if groupResource == kuberesource.PersistentVolumes {
switch {
case hasSnapshot(name, ctx.volumeSnapshots):
...
case hasPodVolumeBackup(obj, ctx):
...
case hasDeleteReclaimPolicy(obj.Object):
...
default:
...
}
}
```
After introducing the VolumeInfo array, the following logic will be added.
``` golang
if groupResource == kuberesource.PersistentVolumes {
	volumeInfo := GetVolumeInfo(pvName)
	switch volumeInfo.BackupMethod {
	case VeleroNativeSnapshot:
		...
	case PodVolumeBackup:
		...
	case CSISnapshot:
		...
	default:
		// Need to check whether the volume is backed up by the SnapshotDataMover.
		if volumeInfo.SnapshotDataMoved {
			...
		}
		// Check whether the Velero server should restore the PV depending on the DeletionPolicy setting.
		if volumeInfo.Skipped {
			...
		}
	}
}
```
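A runnable sketch of this dispatch, including the fallback to the legacy checks when no `VolumeInfo` entry exists, might look like the following. The branch bodies are elided in the design, so the function only returns a label naming the branch taken; `getVolumeInfo`, `restoreBranch`, and the label strings are hypothetical.

```go
package main

import "fmt"

// VolumeInfo is reduced to the fields the dispatch needs.
type VolumeInfo struct {
	PVName            string
	BackupMethod      string
	SnapshotDataMoved bool
	Skipped           bool
}

// Backup method values named in the VolumeInfo structure.
const (
	VeleroNativeSnapshot = "VeleroNativeSnapshot"
	PodVolumeBackup      = "PodVolumeBackup"
	CSISnapshot          = "CSISnapshot"
)

// getVolumeInfo looks up the entry for a PV. The second result is false
// when the backup has no metadata for it (for example, an older backup).
func getVolumeInfo(infos []VolumeInfo, pvName string) (VolumeInfo, bool) {
	for _, info := range infos {
		if info.PVName == pvName {
			return info, true
		}
	}
	return VolumeInfo{}, false
}

// restoreBranch returns a label for the branch that would handle the PV.
func restoreBranch(infos []VolumeInfo, pvName string) string {
	info, ok := getVolumeInfo(infos, pvName)
	if !ok {
		// No metadata: keep using the existing snapshot/PVB/DeletionPolicy checks.
		return "legacy checks"
	}
	switch info.BackupMethod {
	case VeleroNativeSnapshot:
		return "native snapshot"
	case PodVolumeBackup:
		return "pod volume backup"
	case CSISnapshot:
		return "CSI snapshot"
	default:
		if info.SnapshotDataMoved {
			return "snapshot data mover"
		}
		if info.Skipped {
			return "skipped volume, decide by DeletionPolicy"
		}
		return "no volume data"
	}
}

func main() {
	infos := []VolumeInfo{
		{PVName: "pv-1", BackupMethod: PodVolumeBackup},
		{PVName: "pv-2", SnapshotDataMoved: true},
		{PVName: "pv-3", Skipped: true},
	}
	for _, name := range []string{"pv-1", "pv-2", "pv-3", "pv-4"} {
		fmt.Println(name, "->", restoreBranch(infos, name))
	}
}
```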
### How the VolumeInfo metadata file is deleted
_backup-name_-volumes-info.json file is deleted during backup deletion.
## Alternatives Considered
The restore process needs more information about how the PVs are backed up to determine whether this PV should be restored. The released branches also need a similar function, but backporting a new feature into previous releases may not be a good idea, so according to [Anshul Ahuja's suggestion](https://github.com/vmware-tanzu/velero/issues/6595#issuecomment-1731081580), adding more cases here to support checking PV backed-up by CSI plugin and CSI snapshot data mover: https://github.com/vmware-tanzu/velero/blob/5ff5073cc3f364bafcfbd26755e2a92af68ba180/pkg/restore/restore.go#L1206-L1324.
## Security Considerations
There should be no security impact introduced by this design.
## Compatibility
After this design is implemented, there should be no impact on the existing [skipped PVC summary feature](https://github.com/vmware-tanzu/velero/pull/6496).
To support older version backup, which doesn't have the VolumeInfo metadata file, the old logic, which is checking the Velero native snapshots list, PodVolumeBackup list, and PVC DeletionPolicy, is still kept, and supporting CSI snapshots and snapshot data mover logic will be added too.
## Implementation
This will be implemented in the Velero v1.13 development cycle.
# Restore API Group Version by Priority Level When EnableAPIGroupVersions Feature is Set
Status: Accepted
## Abstract
@@ -29,7 +29,7 @@ During restore, the proposal is that Velero will determine if the `APIGroupVersi
The proposed code starts with creating three lists for each backed up resource. The three lists will be created by
(1) reading the directory names in the backup tarball file and seeing which API group versions were backed up from the source cluster,
(2) looking at the target cluster and determining which API group versions are supported, and
(3) getting ConfigMaps from the target cluster in order to get user-defined prioritization of versions.
The three lists will be used to create a map of chosen versions for each resource to restore. If there is a user-defined list of priority versions, the versions will be checked against the supported versions lists. The highest user-defined priority version that is/was supported by both target and source clusters will be the chosen version for that resource. If none of the user-specified versions is supported by both the target and the source, the versions will be logged and the restore will continue with other prioritizations.
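The user-defined priority step can be sketched as a small function. This is an illustration only: `chooseVersion` and `toSet` are hypothetical names, versions are plain strings, and the fallback to the other prioritizations is represented only by the `false` result.

```go
package main

import "fmt"

// toSet turns a list of versions into a lookup set.
func toSet(versions []string) map[string]bool {
	set := make(map[string]bool, len(versions))
	for _, v := range versions {
		set[v] = true
	}
	return set
}

// chooseVersion returns the highest user-priority version that is supported
// by both the source and the target cluster. The boolean is false when no
// user-specified version qualifies, so the caller can log the versions and
// continue with other prioritizations.
func chooseVersion(userPriority, source, target []string) (string, bool) {
	src, tgt := toSet(source), toSet(target)
	for _, v := range userPriority { // ordered from highest to lowest priority
		if src[v] && tgt[v] {
			return v, true
		}
	}
	return "", false
}

func main() {
	userPriority := []string{"v2", "v1", "v1beta1"}
	source := []string{"v1", "v1beta1"} // versions found in the backup tarball
	target := []string{"v2", "v1"}      // versions supported by the target cluster
	version, ok := chooseVersion(userPriority, source, target)
	fmt.Println(version, ok)
}
```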