Compare commits


14 Commits

Author SHA1 Message Date
copilot-swe-agent[bot]
2dea6b7d71 Update base image to fix glibc vulnerabilities CVE-2023-4813 and CVE-2024-33600
Co-authored-by: kaovilai <11228024+kaovilai@users.noreply.github.com>
2025-09-10 21:52:07 +00:00
copilot-swe-agent[bot]
128d9427dc Initial plan 2025-09-10 21:46:16 +00:00
dependabot[bot]
02edbc0c65 Bump actions/stale from 9.1.0 to 10.0.0 (#9232)
Some checks failed
Run the E2E test on kind / build (push) Failing after 4s
Run the E2E test on kind / setup-test-matrix (push) Successful in 3s
Run the E2E test on kind / run-e2e-test (push) Has been skipped
Main CI / Build (push) Failing after 4s
Bumps [actions/stale](https://github.com/actions/stale) from 9.1.0 to 10.0.0.
- [Release notes](https://github.com/actions/stale/releases)
- [Changelog](https://github.com/actions/stale/blob/main/CHANGELOG.md)
- [Commits](https://github.com/actions/stale/compare/v9.1.0...v10.0.0)

---
updated-dependencies:
- dependency-name: actions/stale
  dependency-version: 10.0.0
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-09-10 16:44:18 -05:00
lyndon-li
3be76da952 Merge pull request #8991 from sseago/concurrent-backup-design
Some checks failed
Run the E2E test on kind / build (push) Failing after 2m18s
Run the E2E test on kind / setup-test-matrix (push) Successful in 1m20s
Run the E2E test on kind / run-e2e-test (push) Has been skipped
Main CI / Build (push) Failing after 43s
Close stale issues and PRs / stale (push) Successful in 14s
Trivy Nightly Scan / Trivy nightly scan (velero, main) (push) Failing after 7s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-aws, main) (push) Failing after 4s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-gcp, main) (push) Failing after 3s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-microsoft-azure, main) (push) Failing after 4s
Concurrent backup design doc
2025-09-05 11:21:40 +08:00
Scott Seago
7132720a49 Concurrent backup design doc
Signed-off-by: Scott Seago <sseago@redhat.com>
2025-09-03 12:09:55 -04:00
Xun Jiang/Bruce Jiang
2dbfbc29e8 Merge pull request #9214 from weeix/patch-1
Some checks failed
Run the E2E test on kind / build (push) Failing after 7s
Run the E2E test on kind / setup-test-matrix (push) Successful in 2s
Run the E2E test on kind / run-e2e-test (push) Has been skipped
Main CI / Build (push) Failing after 3s
Close stale issues and PRs / stale (push) Successful in 26s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-aws, main) (push) Failing after 8s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-gcp, main) (push) Failing after 4s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-microsoft-azure, main) (push) Failing after 4s
Trivy Nightly Scan / Trivy nightly scan (velero, main) (push) Failing after 14m0s
clarify VolumeSnapshotClass error for mismatched driver/provisioner
2025-09-03 15:12:09 +08:00
weeix
80da461458 clarify VolumeSnapshotClass error for mismatched driver/provisioner
Signed-off-by: weeix <weeix@users.noreply.github.com>
2025-09-02 18:31:13 -05:00
Xun Jiang/Bruce Jiang
fdee2700a7 Merge pull request #9219 from blackpiglet/9157_e2e
Some checks failed
Run the E2E test on kind / build (push) Failing after 4s
Run the E2E test on kind / setup-test-matrix (push) Successful in 3s
Run the E2E test on kind / run-e2e-test (push) Has been skipped
Main CI / Build (push) Failing after 3s
Close stale issues and PRs / stale (push) Successful in 14s
Trivy Nightly Scan / Trivy nightly scan (velero, main) (push) Failing after 10s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-aws, main) (push) Failing after 5s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-gcp, main) (push) Failing after 4s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-microsoft-azure, main) (push) Failing after 5s
Add E2E auto case for node-agent-config validation.
2025-09-02 22:37:33 +08:00
Xun Jiang
8e1c4a7dc5 Add E2E cases for node-agent-configmap.
Some checks failed
Run the E2E test on kind / build (push) Failing after 11s
Run the E2E test on kind / setup-test-matrix (push) Successful in 2s
Run the E2E test on kind / run-e2e-test (push) Has been skipped
Fix the default BackupRepoConfig setting issue.
Delete PriorityClass in migration case clean stage.

Signed-off-by: Xun Jiang <xun.jiang@broadcom.com>
2025-09-02 15:03:20 +08:00
lyndon-li
09b5183fce Merge pull request #9173 from clementnuss/feat/backup-pvc-annotations
Some checks failed
Run the E2E test on kind / build (push) Failing after 8s
Run the E2E test on kind / setup-test-matrix (push) Successful in 3s
Run the E2E test on kind / run-e2e-test (push) Has been skipped
Main CI / Build (push) Failing after 3s
Close stale issues and PRs / stale (push) Successful in 13s
Trivy Nightly Scan / Trivy nightly scan (velero, main) (push) Failing after 3m0s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-aws, main) (push) Failing after 55s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-gcp, main) (push) Failing after 2s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-microsoft-azure, main) (push) Failing after 21s
feat: Permit specifying annotations for the BackupPVC
2025-08-29 16:46:30 +08:00
Clément Nussbaumer
c5b70b4a0d test: fix backuppvc annotations test case
Signed-off-by: Clément Nussbaumer <clement.nussbaumer@postfinance.ch>
2025-08-29 10:10:41 +02:00
Clément Nussbaumer
248a840918 feat: Permit specifying annotations for the BackupPVC
Signed-off-by: Clément Nussbaumer <clement.nussbaumer@postfinance.ch>
2025-08-29 10:10:41 +02:00
Xun Jiang/Bruce Jiang
04fb20676d Merge pull request #9215 from blackpiglet/9135_e2e
Add E2E test cases for repository maintenance job configuration.
2025-08-29 13:27:55 +08:00
Xun Jiang
996d2a025f Add E2E test cases for repository maintenance job configuration.
Signed-off-by: Xun Jiang <xun.jiang@broadcom.com>
2025-08-28 20:06:15 +08:00
18 changed files with 1071 additions and 22 deletions


@@ -7,7 +7,7 @@ jobs:
stale:
runs-on: ubuntu-latest
steps:
-      - uses: actions/stale@v9.1.0
+      - uses: actions/stale@v10.0.0
with:
repo-token: ${{ secrets.GITHUB_TOKEN }}
stale-issue-message: "This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days. If a Velero team member has requested log or more information, please provide the output of the shared commands."


@@ -73,7 +73,7 @@ RUN mkdir -p /output/usr/bin && \
go clean -modcache -cache
# Velero image packing section
-FROM paketobuildpacks/run-jammy-tiny:latest
+FROM paketobuildpacks/run-jammy-tiny:0.2.73
LABEL maintainer="Xun Jiang <jxun@vmware.com>"


@@ -0,0 +1 @@
feat: Permit specifying annotations for the BackupPVC


@@ -0,0 +1,257 @@
# Concurrent Backup Processing
This enhancement will enable Velero to process multiple backups at the same time. It is largely a usability enhancement rather than a performance enhancement: overall backup throughput may not improve significantly over the current implementation, because individual backup items are already processed in parallel. It is a significant usability improvement, though. With the current design, a user who submits a small backup immediately after a large one may have to wait much longer than expected.
## Background
With the current implementation, only one backup may be `InProgress` at a time. A second backup will not start processing until the first backup moves on to `WaitingForPluginOperations` or `Finalizing`. This is a usability concern, especially in clusters where multiple users initiate backups. With this enhancement, we intend to allow multiple backups to be processed concurrently, so that backups can start processing immediately even if another user has just submitted a large backup. This enhancement will build on the existing parallel item processing feature by creating a dedicated ItemBlock worker pool for each running backup. The pool will be created at the beginning of the backup reconcile, and its input channel will be passed to the Kubernetes backupper just as it is in the current release.
The primary challenge is to make sure that the same workload is not backed up concurrently by multiple backups. If that were to happen, we would risk data corruption, especially around the processing of pod hooks and volume backups. For this first release we will take a conservative, high-level approach to overlap detection: two backups will not run concurrently if there is any overlap in included namespaces. For example, if a backup that includes `ns1` and `ns2` is running, then a second backup for `ns2` and `ns3` will not be started. If a backup that does not filter namespaces is running (either a whole-cluster backup or a non-namespace-limited backup with a label selector), then no other backups will be started, since a backup across all namespaces overlaps with every other backup. Calculating item-level overlap for queued backups is problematic, since we don't know which items are included in a backup until backup processing has begun. A future release may add ItemBlock overlap detection, where the same item will not be processed by two different ItemBlock workers at the same time. This would work together with workload conflict detection to catch conflicts on resources shared between backups at a more granular level. Eventually, with a more complete understanding of individual workloads (either via ItemBlocks or some higher-level model), the namespace-level overlap detection may be relaxed in future versions.
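The namespace-level overlap rule can be expressed as a small predicate. This is a minimal sketch, assuming that an empty `includedNamespaces` list or a `*` entry means "all namespaces" and ignoring excluded namespaces entirely; `includesAllNamespaces` and `namespacesOverlap` are hypothetical helper names, not existing Velero functions.

```go
package main

// includesAllNamespaces reports whether a backup's namespace filter covers
// every namespace. Assumption: an empty list or a "*" entry means "all".
func includesAllNamespaces(ns []string) bool {
	if len(ns) == 0 {
		return true
	}
	for _, n := range ns {
		if n == "*" {
			return true
		}
	}
	return false
}

// namespacesOverlap reports whether two backups share at least one namespace.
// A backup covering all namespaces overlaps with every other backup.
func namespacesOverlap(a, b []string) bool {
	if includesAllNamespaces(a) || includesAllNamespaces(b) {
		return true
	}
	seen := make(map[string]struct{}, len(a))
	for _, n := range a {
		seen[n] = struct{}{}
	}
	for _, n := range b {
		if _, ok := seen[n]; ok {
			return true
		}
	}
	return false
}
```

For example, `namespacesOverlap([]string{"ns1", "ns2"}, []string{"ns2", "ns3"})` is true because of `ns2`, while any comparison against a whole-cluster backup is always true.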
## Goals
- Process multiple backups concurrently
- Detect namespace overlap to avoid conflicts
- For queued backups (not yet runnable due to concurrency limits or overlap), indicate the queue position in status
## Non Goals
- Handling NFS PVs when more than one PV points to the same underlying NFS share
- Handling VGDP cancellation for failed backups on restart
- Mounting a PVC for scenarios in which /tmp is too small for the number of concurrent backups
- Providing a mechanism to identify high priority backups which get preferential treatment in terms of ItemBlock worker availability
- Item-level overlap detection (future feature)
- Providing the ability to disable namespace-level overlap detection once item-level overlap detection is in place (although this may be supported in a future version).
## High-Level Design
### Backup CRD changes
Two new backup phases will be added: `Queued` and `ReadyToStart`. In the Backup workflow, new backups will be moved to the Queued phase when they are added to the backup queue. When a backup is removed from the queue because it is now able to run, it will be moved to the `ReadyToStart` phase, which will allow the backup controller to start processing it.
In addition, a new Status field, `QueuePosition`, will be added to track the backup's current position in the queue.
### New Controller: `backupQueueReconciler`
A new reconciler, `backupQueueReconciler`, will be added. It will use the current `backupReconciler` logic for reconciling `New` backups, but instead of running the backup, it will move the Backup to the `Queued` phase and set `QueuePosition`.
In addition, this reconciler will periodically reconcile all queued backups (on some configurable time interval) and if there is a runnable backup, remove it from the queue, update `QueuePosition` for any queued backups behind it, and update its phase to `ReadyToStart`.
Queued backups will be reconciled in order based on `QueuePosition`, so the first runnable backup found will be processed. A backup is runnable if both of the following conditions are true:
1) The total number of backups either `InProgress` or `ReadyToStart` is less than the configured number of concurrent backups.
2) The backup has no overlap with any backups currently `InProgress` or `ReadyToStart`, or with any `Queued` backups ahead of it in the queue (i.e. with a lower `QueuePosition`, closer to 1).
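The two runnability conditions above can be sketched as a single check. This is a minimal illustration, not the reconciler's actual code: `backupInfo` is a hypothetical, simplified stand-in for the Backup CR, phases are plain strings, and an empty namespace list is assumed to mean "all namespaces".

```go
package main

// backupInfo is a hypothetical, simplified view of a Backup for this sketch.
type backupInfo struct {
	Name          string
	Phase         string   // "InProgress", "ReadyToStart", "Queued", ...
	Namespaces    []string // empty means all namespaces (assumption)
	QueuePosition int
}

// overlaps reports whether two namespace lists share a namespace.
func overlaps(a, b []string) bool {
	if len(a) == 0 || len(b) == 0 {
		return true
	}
	set := make(map[string]bool, len(a))
	for _, n := range a {
		set[n] = true
	}
	for _, n := range b {
		if set[n] {
			return true
		}
	}
	return false
}

// isRunnable applies condition 1 (concurrency limit) and condition 2
// (no overlap with active backups or with Queued backups ahead of it).
func isRunnable(candidate backupInfo, all []backupInfo, concurrentBackups int) bool {
	active := 0
	for _, b := range all {
		if b.Phase == "InProgress" || b.Phase == "ReadyToStart" {
			active++
		}
	}
	if active >= concurrentBackups {
		return false
	}
	for _, b := range all {
		if b.Name == candidate.Name {
			continue
		}
		blocking := b.Phase == "InProgress" || b.Phase == "ReadyToStart" ||
			(b.Phase == "Queued" && b.QueuePosition < candidate.QueuePosition)
		if blocking && overlaps(candidate.Namespaces, b.Namespaces) {
			return false
		}
	}
	return true
}
```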
### Updates to Backup controller
The current `backupReconciler` will change its reconciling rules. Instead of watching and reconciling New backups, it will reconcile `ReadyToStart` backups. In addition, it will be configured to run in parallel by setting `MaxConcurrentReconciles` based on the `concurrent-backups` server arg.
The startup (and shutdown) of the ItemBlock worker pool will be moved from reconciler startup to the backup reconcile, which will give each running backup its own dedicated worker pool. The per-backup worker pool will use the existing `--item-block-worker-count` installer/server arg. This means that the maximum number of ItemBlock workers for the entire Velero pod will be the ItemBlock worker count multiplied by concurrentBackups. For example, if concurrentBackups is 5 and itemBlockWorkerCount is 6, then there will be at most 30 worker threads active, 6 dedicated to each InProgress backup; this maximum will only be reached when the maximum number of backups are InProgress. It also means that each InProgress backup will have a dedicated ItemBlock input channel with the same fixed buffer size.
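The worker-count arithmetic can be checked with a tiny helper. `itemBlockWorkers` is a hypothetical function used only to illustrate that the total scales with the number of InProgress backups and is bounded by `concurrentBackups × itemBlockWorkerCount`.

```go
package main

// itemBlockWorkers returns how many ItemBlock workers are active in the Velero
// pod when each InProgress backup runs its own pool of itemBlockWorkerCount
// workers. The number of InProgress backups is capped at concurrentBackups.
func itemBlockWorkers(inProgress, concurrentBackups, itemBlockWorkerCount int) int {
	if inProgress > concurrentBackups {
		inProgress = concurrentBackups
	}
	return inProgress * itemBlockWorkerCount
}
```

With `concurrentBackups = 5` and `itemBlockWorkerCount = 6`, two InProgress backups use 12 workers, and the 30-worker maximum is only reached with five InProgress backups.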
## Detailed Design
### New Install/Server configuration args
A new install/server arg, `concurrent-backups`, will be added. This int-valued field specifies the number of backups which may be processed concurrently (with phase `InProgress`). If not specified, the default value of 1 will be used.
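A sketch of how the new arg and its default might be parsed. Velero's server actually wires its flags through cobra/pflag; the standard library `flag` package is used here only to keep the example self-contained, and the lower bound of 1 is an assumed validation rule, not something this design specifies. `parseConcurrentBackups` is a hypothetical name.

```go
package main

import (
	"flag"
	"fmt"
)

// parseConcurrentBackups parses a hypothetical "--concurrent-backups" flag,
// defaulting to 1 as described in the design.
func parseConcurrentBackups(args []string) (int, error) {
	fs := flag.NewFlagSet("velero-server", flag.ContinueOnError)
	concurrent := fs.Int("concurrent-backups", 1,
		"number of backups which may be processed concurrently")
	if err := fs.Parse(args); err != nil {
		return 0, err
	}
	// Assumed validation: at least one backup must be able to run.
	if *concurrent < 1 {
		return 0, fmt.Errorf("concurrent-backups must be at least 1, got %d", *concurrent)
	}
	return *concurrent, nil
}
```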
### Consideration of backup overlap and concurrent backup processing
The primary consideration for running additional backups concurrently is the configured `concurrent-backups` parameter. If the total number of `InProgress` and `ReadyToStart` backups has reached `concurrent-backups`, then any `Queued` backups will remain in the queue.
The second consideration is backup overlap. In order to prevent interaction between running backups (particularly around volume backup and pod hooks), we cannot allow two overlapping backups to run at the same time. For now, we will define overlap broadly -- requiring that two concurrent backups don't include any of the same namespaces. A backup for `ns1` can run concurrently with a backup for `ns2`, but a backup for `[ns1,ns2]` cannot run concurrently with a backup for `ns1`. One consequence of this approach is that a backup which includes all namespaces (even if further filtered by resource or label) cannot run concurrently with *any other backup*.
When determining which queued backup to run next, Velero will look for the next queued backup which has no overlap with any InProgress or ReadyToStart backup, or with any Queued backup ahead of it. The reason we need to consider queued as well as running backups for overlap detection is as follows.
Consider the following scenario. These are the current incomplete backups (ordered from oldest to newest):
1. backup1, includedNamespaces: [ns1, ns2], phase: InProgress
2. backup2, includedNamespaces: [ns2, ns3, ns5], phase: Queued, QueuePosition: 1
3. backup3, includedNamespaces: [ns4, ns3], phase: Queued, QueuePosition: 2
4. backup4, includedNamespaces: [ns5, ns6], phase: Queued, QueuePosition: 3
5. backup5, includedNamespaces: [ns8, ns9], phase: Queued, QueuePosition: 4
Assuming `concurrent-backups` is 2, on the next reconcile Velero will be able to start a second backup if there is one with no overlap. `backup2` cannot run, since it overlaps with the running `backup1` on `ns2`. If we only considered running overlap (and not queued overlap), then `backup3` could run now: it conflicts with the queued `backup2` on `ns3`, but it does not conflict with the running backup. However, if it runs now, then when `backup1` completes, `backup2` still can't run (since it now overlaps with the running `backup3` on `ns3`), so `backup4` starts instead. Now when `backup3` completes, `backup2` still can't run (since it now conflicts with `backup4` on `ns5`). This means that even though `backup2` was the second backup created, it does not start until after both `backup3` and `backup4`, giving it a worse time to completion than it would have had without parallel backups. Worse, a queued backup with a large number of namespaces (a full-cluster backup, for example) might never run as long as new single-namespace backups keep being added to the queue.
To resolve this problem we consider both running backups as well as backups ahead in the queue when resolving overlap conflicts. In the above scenario, `backup2` can't run yet since it overlaps with the running backup on `ns2`. In addition, `backup3` and `backup4` also can't run yet since they overlap with queued `backup2`. Therefore, `backup5` will run now. Once `backup1` completes, `backup2` will be free to run.
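The scenario can be replayed with a short sketch that compares the two policies. It assumes one concurrency slot is free and uses the backups from the scenario above with queue positions 1 to 4 in creation order; `queuedBackup`, `shares`, and `nextToStart` are hypothetical names for illustration only.

```go
package main

import "sort"

// queuedBackup is a hypothetical, simplified queued Backup.
type queuedBackup struct {
	name       string
	namespaces []string
	position   int
}

// shares reports whether two namespace lists have a namespace in common.
func shares(a, b []string) bool {
	set := make(map[string]bool, len(a))
	for _, n := range a {
		set[n] = true
	}
	for _, n := range b {
		if set[n] {
			return true
		}
	}
	return false
}

// nextToStart returns the first queued backup, in QueuePosition order, that
// does not overlap with running backups and, when considerQueuedAhead is true,
// with queued backups ahead of it. It returns "" if nothing can start.
func nextToStart(running [][]string, queue []queuedBackup, considerQueuedAhead bool) string {
	sorted := append([]queuedBackup(nil), queue...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].position < sorted[j].position })
	for i, q := range sorted {
		blocked := false
		for _, r := range running {
			if shares(q.namespaces, r) {
				blocked = true
				break
			}
		}
		if !blocked && considerQueuedAhead {
			for _, ahead := range sorted[:i] {
				if shares(q.namespaces, ahead.namespaces) {
					blocked = true
					break
				}
			}
		}
		if !blocked {
			return q.name
		}
	}
	return ""
}
```

Considering only running backups picks `backup3`; also considering queued backups ahead picks `backup5`, which leaves `backup2` free to start as soon as `backup1` completes.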
### Backup CRD changes
New Backup phases:
```go
const (
// BackupPhaseQueued means the backup has been added to the
// queue by the BackupQueueReconciler.
BackupPhaseQueued BackupPhase = "Queued"
// BackupPhaseReadyToStart means the backup has been removed from the
// queue by the BackupQueueReconciler and is ready to start.
BackupPhaseReadyToStart BackupPhase = "ReadyToStart"
)
```
In addition, a new Status field, `queuePosition`, will be added to track the backup's current position in the queue.
```go
// QueuePosition is the position held by the backup in the queue.
// QueuePosition=1 means this backup is the next to be considered.
// Only relevant when Phase is "Queued"
// +optional
QueuePosition int `json:"queuePosition,omitempty"`
```
### New Controller: `backupQueueReconciler`
A new reconciler, `backupQueueReconciler`, will be added, which will reconcile backups under these conditions:
1) Watching Create/Update for backups in the `New` (or empty) phase
2) Watching for a Backup phase transition from `InProgress` to something else, to reconcile all `Queued` backups
3) Watching for a Backup phase transition from `New` (or empty) to `Queued`, to reconcile all `Queued` backups
4) Periodic reconcile of `Queued` backups, to handle backups queued at server startup and to make sure backups are never queued indefinitely because of a race condition or because they were otherwise missed in the reconcile on prior backup completion.
The reconciler will be set up as follows -- note that New backups are reconciled on Create/Update, while Queued backups are reconciled when an InProgress backup moves on to another state or when a new backup moves to the Queued state. We also reconcile Queued backups periodically to handle the case of a Velero pod restart with Queued backups, as well as to handle possible edge cases where a queued backup doesn't get moved out of the queue at the point of backup completion or an error occurs during a prior Queued backup reconcile.
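```go
import (
	"context"
	"sort"

	"k8s.io/apimachinery/pkg/types"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/builder"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/event"
	"sigs.k8s.io/controller-runtime/pkg/handler"
	"sigs.k8s.io/controller-runtime/pkg/predicate"
	"sigs.k8s.io/controller-runtime/pkg/reconcile"

	velerov1api "github.com/vmware-tanzu/velero/pkg/apis/velero/v1"
	"github.com/vmware-tanzu/velero/pkg/constant"
	"github.com/vmware-tanzu/velero/pkg/util/kube"
)

func (c *backupQueueReconciler) SetupWithManager(mgr ctrl.Manager) error {
	// periodic source: only consider Queued backups, ordered by QueuePosition
	gp := kube.NewGenericEventPredicate(func(object client.Object) bool {
		backup := object.(*velerov1api.Backup)
		return backup.Status.Phase == velerov1api.BackupPhaseQueued
	})
	// queuePositionOrderFunc (not shown) sorts the listed backups by Status.QueuePosition
	s := kube.NewPeriodicalEnqueueSource(c.logger.WithField("controller", constant.ControllerBackupQueue), mgr.GetClient(), &velerov1api.BackupList{}, c.frequency, kube.PeriodicalEnqueueSourceOption{
		Predicates: []predicate.Predicate{gp},
		OrderFunc:  queuePositionOrderFunc,
	})
	isNew := func(obj client.Object) bool {
		backup := obj.(*velerov1api.Backup)
		return backup.Status.Phase == "" || backup.Status.Phase == velerov1api.BackupPhaseNew
	}
	return ctrl.NewControllerManagedBy(mgr).
		// New (or empty-phase) backups are reconciled on Create/Update so they can be queued
		For(&velerov1api.Backup{}, builder.WithPredicates(predicate.Funcs{
			UpdateFunc:  func(ue event.UpdateEvent) bool { return isNew(ue.ObjectNew) },
			CreateFunc:  func(ce event.CreateEvent) bool { return isNew(ce.Object) },
			DeleteFunc:  func(event.DeleteEvent) bool { return false },
			GenericFunc: func(event.GenericEvent) bool { return false },
		})).
		// selected phase transitions enqueue all Queued backups, in QueuePosition order
		Watches(
			&velerov1api.Backup{},
			handler.EnqueueRequestsFromMapFunc(func(ctx context.Context, _ client.Object) []reconcile.Request {
				backupList := &velerov1api.BackupList{}
				if err := c.client.List(ctx, backupList); err != nil {
					c.logger.WithError(err).Error("error listing backups")
					return nil
				}
				// filter backup list by Phase=Queued
				queued := []velerov1api.Backup{}
				for _, backup := range backupList.Items {
					if backup.Status.Phase == velerov1api.BackupPhaseQueued {
						queued = append(queued, backup)
					}
				}
				// sort backup list by QueuePosition
				sort.Slice(queued, func(i, j int) bool {
					return queued[i].Status.QueuePosition < queued[j].Status.QueuePosition
				})
				requests := make([]reconcile.Request, 0, len(queued))
				for _, backup := range queued {
					requests = append(requests, reconcile.Request{
						NamespacedName: types.NamespacedName{Namespace: backup.Namespace, Name: backup.Name},
					})
				}
				return requests
			}),
			builder.WithPredicates(predicate.Funcs{
				UpdateFunc: func(ue event.UpdateEvent) bool {
					oldBackup := ue.ObjectOld.(*velerov1api.Backup)
					newBackup := ue.ObjectNew.(*velerov1api.Backup)
					// an InProgress backup moved on, or a backup was just queued
					return (oldBackup.Status.Phase == velerov1api.BackupPhaseInProgress &&
						newBackup.Status.Phase != velerov1api.BackupPhaseInProgress) ||
						(oldBackup.Status.Phase != velerov1api.BackupPhaseQueued &&
							newBackup.Status.Phase == velerov1api.BackupPhaseQueued)
				},
				CreateFunc:  func(event.CreateEvent) bool { return false },
				DeleteFunc:  func(event.DeleteEvent) bool { return false },
				GenericFunc: func(event.GenericEvent) bool { return false },
			}),
		).
		WatchesRawSource(s).
		Named(constant.ControllerBackupQueue).
		Complete(c)
}
```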
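The phase-transition rule used by the second watch in the `SetupWithManager` sketch can be isolated as a pure function, which makes it easy to unit test. `shouldRequeueQueuedBackups` is a hypothetical helper and phases are plain strings here; note that Go's `&&` binds tighter than `||`, so the combined condition reads as "left InProgress, or entered Queued".

```go
package main

// shouldRequeueQueuedBackups reports whether a Backup phase transition should
// trigger a reconcile of all Queued backups: either an InProgress backup moved
// on (possibly freeing a slot or a namespace), or a backup was newly queued.
func shouldRequeueQueuedBackups(oldPhase, newPhase string) bool {
	leftInProgress := oldPhase == "InProgress" && newPhase != "InProgress"
	enteredQueue := oldPhase != "Queued" && newPhase == "Queued"
	return leftInProgress || enteredQueue
}
```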
New backups will be queued: Phase will be set to `Queued`, and `QueuePosition` will be set to an int value one greater than the highest current `QueuePosition` among Queued backups.
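Assigning the queue position for a newly queued backup reduces to "highest current position plus one". `nextQueuePosition` is a hypothetical helper that takes the positions of the currently Queued backups; an empty queue yields position 1.

```go
package main

// nextQueuePosition returns the QueuePosition for a newly queued backup:
// one more than the highest position currently held by a Queued backup.
func nextQueuePosition(queuedPositions []int) int {
	highest := 0
	for _, p := range queuedPositions {
		if p > highest {
			highest = p
		}
	}
	return highest + 1
}
```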
Queued backups will be removed from the queue if runnable:
1) If the total number of backups either InProgress or ReadyToStart is greater than or equal to the concurrency limit, then exit without removing from the queue.
2) If the current backup overlaps with any InProgress, ReadyToStart, or Queued backup with `QueuePosition < currentBackup.QueuePosition` then exit without removing from the queue.
3) If we get here, the backup is runnable. To resolve a potential race condition where an InProgress backup completes between reconciling the backup with QueuePosition `n-1` and reconciling the current backup with QueuePosition `n`, we also check to see whether there are any runnable backups in the queue ahead of this one. The only time this will happen is if a backup completes immediately before reconcile starts which either frees up a concurrency slot or removes a namespace conflict. In this case, we don't want to run the current backup since the one ahead of this one in the queue (which was recently passed over before the InProgress backup completed) must run first. In this case, exit without removing from the queue.
4) If we get here, remove the backup from the queue by setting Phase to `ReadyToStart` and `QueuePosition` to zero. Decrement the `QueuePosition` of any other Queued backups with a `QueuePosition` higher than the current backup's queue position prior to dequeuing. At this point, the backup reconciler will start the backup.
```go
switch original.Status.Phase {
case "", velerov1api.BackupPhaseNew:
	// enqueue backup: set phase=Queued, set queuePosition=maxCurrentQueuePosition+1
case velerov1api.BackupPhaseQueued:
	// Queued backups are enqueued in QueuePosition order (by the periodical enqueue source
	// or the phase-transition watch), so as long as the current backup has no conflicts
	// with running backups or with backups ahead of it, it is safe to dequeue.
	//
	// list backups, filter on Queued, ReadyToStart, and InProgress
	// exit if the concurrency limit has been reached
	if len(inProgressBackups)+len(pendingStartBackups) >= concurrentBackups {
		return ctrl.Result{}, nil
	}
	// generate list of all namespaces included in InProgress, ReadyToStart, and Queued backups with
	// queuePosition < backup.Status.QueuePosition
	// if overlap found, exit
	// check backups ahead of this one in the queue for runnability. If any are runnable, exit
	// dequeue backup: set Phase to ReadyToStart, QueuePosition to 0, and decrement QueuePosition
	// for all Queued backups behind this one in the queue
}
```
The queue controller will run as a single reconciler thread, so we will not need to deal with concurrency issues when moving backups from New to Queued or from Queued to ReadyToStart, and all of the updates to QueuePosition will be from a single thread.
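The dequeue step (step 4 above) can be sketched on an in-memory slice. Because the queue controller is single-threaded, the sketch mutates the backups directly without locking. `queuedBackup` and `dequeue` are hypothetical names; the real reconciler would patch the Backup CRs through the Kubernetes client instead.

```go
package main

// queuedBackup is a hypothetical, simplified Backup with phase and position.
type queuedBackup struct {
	Name          string
	Phase         string
	QueuePosition int
}

// dequeue moves the named Queued backup to ReadyToStart, clears its
// QueuePosition, and shifts every Queued backup behind it one slot forward.
func dequeue(backups []*queuedBackup, name string) {
	var target *queuedBackup
	for _, b := range backups {
		if b.Name == name && b.Phase == "Queued" {
			target = b
			break
		}
	}
	if target == nil {
		return
	}
	removed := target.QueuePosition
	target.Phase = "ReadyToStart"
	target.QueuePosition = 0
	for _, b := range backups {
		if b.Phase == "Queued" && b.QueuePosition > removed {
			b.QueuePosition--
		}
	}
}
```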
### Updates to Backup controller
The Reconcile logic will be updated to respond to ReadyToStart backups instead of New backups:
```
@@ -234,8 +234,8 @@ func (b *backupReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctr
// InProgress, we still need this check so we can return nil to indicate we've finished processing
// this key (even though it was a no-op).
switch original.Status.Phase {
- case "", velerov1api.BackupPhaseNew:
- // only process new backups
+ case velerov1api.BackupPhaseReadyToStart:
+ // only process ReadyToStart backups
default:
b.logger.WithFields(logrus.Fields{
"backup": kubeutil.NamespaceAndName(original),
```
In addition, it will be configured to run in parallel by setting `MaxConcurrentReconciles` based on the `concurrent-backups` server arg.
```
@@ -149,6 +149,9 @@ func NewBackupReconciler(
func (b *backupReconciler) SetupWithManager(mgr ctrl.Manager) error {
return ctrl.NewControllerManagedBy(mgr).
For(&velerov1api.Backup{}).
+ WithOptions(controller.Options{
+ MaxConcurrentReconciles: concurrentBackups,
+ }).
Named(constant.ControllerBackup).
Complete(b)
}
```
The controller-runtime core reconciler logic already prevents the same resource from being reconciled by two different reconciler threads, so we don't need to worry about concurrency issues at the controller level.
The workerPool reference will be moved from the backupReconciler to the backupRequest, since this will now be backup-specific, and the initialization code for the worker pool will be moved from the reconciler init into the backup reconcile. This worker pool will be shut down upon exiting the Reconcile method.
### Resilience to restart of Velero pod
The new backup phases (`Queued` and `ReadyToStart`) will be resilient to Velero pod restarts. If the Velero pod crashes or is restarted, only backups in the `InProgress` phase will be failed, so there is no change to current behavior. Queued backups will retain their queue position on restart, and ReadyToStart backups will move to InProgress when reconciled.
### Observability
#### Logging
When a backup is dequeued, an info log message will include the wait time, calculated as `now - creationTimestamp`. When a backup is passed over due to overlap, an info log message will indicate which namespaces were in conflict.
#### Velero CLI
The `velero backup describe` output will include the current queue position for queued backups.


@@ -0,0 +1,52 @@
/*
Copyright 2019 the Velero contributors.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
package builder
import (
corev1api "k8s.io/api/core/v1"
schedulingv1api "k8s.io/api/scheduling/v1"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
type PriorityClassBuilder struct {
object *schedulingv1api.PriorityClass
}
func ForPriorityClass(name string) *PriorityClassBuilder {
return &PriorityClassBuilder{
object: &schedulingv1api.PriorityClass{
ObjectMeta: metav1.ObjectMeta{
Name: name,
},
},
}
}
func (p *PriorityClassBuilder) Value(value int) *PriorityClassBuilder {
p.object.Value = int32(value)
return p
}
func (p *PriorityClassBuilder) PreemptionPolicy(policy string) *PriorityClassBuilder {
preemptionPolicy := corev1api.PreemptionPolicy(policy)
p.object.PreemptionPolicy = &preemptionPolicy
return p
}
func (p *PriorityClassBuilder) Result() *schedulingv1api.PriorityClass {
return p.object
}


@@ -188,6 +188,7 @@ func (e *csiSnapshotExposer) Expose(ctx context.Context, ownerObject corev1api.O
backupPVCStorageClass := csiExposeParam.StorageClass
backupPVCReadOnly := false
spcNoRelabeling := false
+backupPVCAnnotations := map[string]string{}
if value, exists := csiExposeParam.BackupPVCConfig[csiExposeParam.StorageClass]; exists {
if value.StorageClass != "" {
backupPVCStorageClass = value.StorageClass
@@ -201,9 +202,13 @@ func (e *csiSnapshotExposer) Expose(ctx context.Context, ownerObject corev1api.O
curLog.WithField("vs name", volumeSnapshot.Name).Warn("Ignoring spcNoRelabling for read-write volume")
}
}
+if len(value.Annotations) > 0 {
+backupPVCAnnotations = value.Annotations
+}
}
-backupPVC, err := e.createBackupPVC(ctx, ownerObject, backupVS.Name, backupPVCStorageClass, csiExposeParam.AccessMode, volumeSize, backupPVCReadOnly)
+backupPVC, err := e.createBackupPVC(ctx, ownerObject, backupVS.Name, backupPVCStorageClass, csiExposeParam.AccessMode, volumeSize, backupPVCReadOnly, backupPVCAnnotations)
if err != nil {
return errors.Wrap(err, "error to create backup pvc")
}
@@ -485,7 +490,7 @@ func (e *csiSnapshotExposer) createBackupVSC(ctx context.Context, ownerObject co
return e.csiSnapshotClient.VolumeSnapshotContents().Create(ctx, vsc, metav1.CreateOptions{})
}
-func (e *csiSnapshotExposer) createBackupPVC(ctx context.Context, ownerObject corev1api.ObjectReference, backupVS, storageClass, accessMode string, resource resource.Quantity, readOnly bool) (*corev1api.PersistentVolumeClaim, error) {
+func (e *csiSnapshotExposer) createBackupPVC(ctx context.Context, ownerObject corev1api.ObjectReference, backupVS, storageClass, accessMode string, resource resource.Quantity, readOnly bool, annotations map[string]string) (*corev1api.PersistentVolumeClaim, error) {
backupPVCName := ownerObject.Name
volumeMode, err := getVolumeModeByAccessMode(accessMode)
@@ -507,8 +512,9 @@ func (e *csiSnapshotExposer) createBackupPVC(ctx context.Context, ownerObject co
pvc := &corev1api.PersistentVolumeClaim{
ObjectMeta: metav1.ObjectMeta{
-Namespace: ownerObject.Namespace,
-Name: backupPVCName,
+Namespace:   ownerObject.Namespace,
+Name:        backupPVCName,
+Annotations: annotations,
OwnerReferences: []metav1.OwnerReference{
{
APIVersion: ownerObject.APIVersion,


@@ -1001,8 +1001,9 @@ func Test_csiSnapshotExposer_createBackupPVC(t *testing.T) {
backupPVC := corev1api.PersistentVolumeClaim{
ObjectMeta: metav1.ObjectMeta{
-Namespace: velerov1.DefaultNamespace,
-Name: "fake-backup",
+Namespace:   velerov1.DefaultNamespace,
+Name:        "fake-backup",
+Annotations: map[string]string{},
OwnerReferences: []metav1.OwnerReference{
{
APIVersion: backup.APIVersion,
@@ -1031,8 +1032,9 @@ func Test_csiSnapshotExposer_createBackupPVC(t *testing.T) {
backupPVCReadOnly := corev1api.PersistentVolumeClaim{
ObjectMeta: metav1.ObjectMeta{
-Namespace: velerov1.DefaultNamespace,
-Name: "fake-backup",
+Namespace:   velerov1.DefaultNamespace,
+Name:        "fake-backup",
+Annotations: map[string]string{},
OwnerReferences: []metav1.OwnerReference{
{
APIVersion: backup.APIVersion,
@@ -1114,7 +1116,7 @@ func Test_csiSnapshotExposer_createBackupPVC(t *testing.T) {
APIVersion: tt.ownerBackup.APIVersion,
}
}
-got, err := e.createBackupPVC(t.Context(), ownerObject, tt.backupVS, tt.storageClass, tt.accessMode, tt.resource, tt.readOnly)
+got, err := e.createBackupPVC(t.Context(), ownerObject, tt.backupVS, tt.storageClass, tt.accessMode, tt.resource, tt.readOnly, map[string]string{})
if !tt.wantErr(t, err, fmt.Sprintf("createBackupPVC(%v, %v, %v, %v, %v, %v)", ownerObject, tt.backupVS, tt.storageClass, tt.accessMode, tt.resource, tt.readOnly)) {
return
}


@@ -56,6 +56,9 @@ type BackupPVC struct {
// SPCNoRelabeling sets Spec.SecurityContext.SELinux.Type to "spc_t" for the pod mounting the backupPVC
// ignored if ReadOnly is false
SPCNoRelabeling bool `json:"spcNoRelabeling,omitempty"`
+// Annotations permits setting annotations for the backupPVC
+Annotations map[string]string `json:"annotations,omitempty"`
}
type RestorePVC struct {
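Because `Annotations` is declared as `map[string]string`, annotation values in the node-agent ConfigMap JSON must be strings: `"true"` decodes, while a bare `true` makes decoding fail. The sketch below demonstrates this with `encoding/json`; `backupPVCConfig` is a hypothetical, trimmed copy of the `BackupPVC` struct shown in the diff, and `parseBackupPVCConfig` is an illustrative helper, not Velero code.

```go
package main

import "encoding/json"

// backupPVCConfig is a trimmed, hypothetical copy of the BackupPVC fields.
type backupPVCConfig struct {
	StorageClass    string            `json:"storageClass,omitempty"`
	ReadOnly        bool              `json:"readOnly,omitempty"`
	SPCNoRelabeling bool              `json:"spcNoRelabeling,omitempty"`
	Annotations     map[string]string `json:"annotations,omitempty"`
}

// parseBackupPVCConfig decodes a per-StorageClass backupPVC config map.
func parseBackupPVCConfig(raw string) (map[string]backupPVCConfig, error) {
	cfg := map[string]backupPVCConfig{}
	err := json.Unmarshal([]byte(raw), &cfg)
	return cfg, err
}
```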

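The new `Annotations map[string]string` field is filled from the node-agent ConfigMap JSON. Because the Go type is a string map, every annotation value has to be a JSON string; a bare `true` is rejected by `encoding/json`. The sketch below uses a trimmed, hypothetical copy of the struct (`backupPVC`, keeping only `readOnly` and `annotations`) and a hypothetical `decodeBackupPVCConfig` helper; it is not Velero's own ConfigMap loader.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// backupPVC is a hypothetical, trimmed mirror of velerotypes.BackupPVC,
// keeping only the fields needed for this illustration.
type backupPVC struct {
	ReadOnly    bool              `json:"readOnly,omitempty"`
	Annotations map[string]string `json:"annotations,omitempty"`
}

// decodeBackupPVCConfig parses a per-StorageClass map with the same shape as
// the "backupPVC" section of the node-agent ConfigMap.
func decodeBackupPVCConfig(raw string) (map[string]backupPVC, error) {
	cfg := map[string]backupPVC{}
	if err := json.Unmarshal([]byte(raw), &cfg); err != nil {
		return nil, err
	}
	return cfg, nil
}

func main() {
	// Annotation value quoted as a string: decodes cleanly.
	good := `{"storage-class-3": {"readOnly": true,
		"annotations": {"some-csi.provider.io/readOnlyClone": "true"}}}`
	cfg, err := decodeBackupPVCConfig(good)
	fmt.Println(cfg["storage-class-3"].Annotations, err)

	// A bare JSON boolean cannot be stored in map[string]string.
	bad := `{"storage-class-3": {"annotations": {"some-csi.provider.io/readOnlyClone": true}}}`
	_, err = decodeBackupPVCConfig(bad)
	fmt.Println("bool value rejected:", err != nil)
}
```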
@@ -447,8 +447,13 @@ func GetVolumeSnapshotClassForStorageClass(
return &vsClass, nil
}
return nil, fmt.Errorf(
"failed to get VolumeSnapshotClass for provisioner %s: "+
"ensure that the desired VolumeSnapshotClass has the %s label or %s annotation, "+
"and that its driver matches the StorageClass provisioner",
provisioner,
velerov1api.VolumeSnapshotClassSelectorLabel,
velerov1api.VolumeSnapshotClassKubernetesAnnotation,
)
}
// IsVolumeSnapshotClassHasListerSecret returns whether a volumesnapshotclass has a snapshotlister secret

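The reworded error names two conditions a VolumeSnapshotClass must meet: its driver must match the StorageClass provisioner, and it must carry Velero's selector label or the Kubernetes default-class annotation. A minimal model of that check follows, using a hypothetical `snapshotClass` type and `pickSnapshotClass` helper. The key strings are assumed to be the values of `VolumeSnapshotClassSelectorLabel` and `VolumeSnapshotClassKubernetesAnnotation`, requiring the value `"true"` is this sketch's choice, and trying the label before the annotation is a simplification rather than Velero's exact precedence.

```go
package main

import "fmt"

// Assumed values of velerov1api.VolumeSnapshotClassSelectorLabel and
// velerov1api.VolumeSnapshotClassKubernetesAnnotation.
const (
	selectorLabel     = "velero.io/csi-volumesnapshot-class"
	defaultAnnotation = "snapshot.storage.kubernetes.io/is-default-class"
)

// snapshotClass is a hypothetical, trimmed stand-in for a VolumeSnapshotClass.
type snapshotClass struct {
	Name        string
	Driver      string
	Labels      map[string]string
	Annotations map[string]string
}

// pickSnapshotClass returns the first class whose driver matches the
// provisioner and that carries the selector label set to "true"; failing
// that, the first matching-driver class annotated as the Kubernetes default.
func pickSnapshotClass(provisioner string, classes []snapshotClass) (string, error) {
	for _, c := range classes {
		if c.Driver == provisioner && c.Labels[selectorLabel] == "true" {
			return c.Name, nil
		}
	}
	for _, c := range classes {
		if c.Driver == provisioner && c.Annotations[defaultAnnotation] == "true" {
			return c.Name, nil
		}
	}
	return "", fmt.Errorf("failed to get VolumeSnapshotClass for provisioner %s", provisioner)
}

func main() {
	classes := []snapshotClass{
		{Name: "other-driver", Driver: "other.csi.example.com", Labels: map[string]string{selectorLabel: "true"}},
		{Name: "default", Driver: "hostpath.csi.k8s.io", Annotations: map[string]string{defaultAnnotation: "true"}},
	}
	// The labelled class has the wrong driver, so the default-annotated one is chosen.
	name, err := pickSnapshotClass("hostpath.csi.k8s.io", classes)
	fmt.Println(name, err)
}
```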
@@ -37,6 +37,9 @@ default the source PVC's storage class will be used.
The users can specify the ConfigMap name during velero installation by CLI:
`velero install --node-agent-configmap=<ConfigMap-Name>`
- `annotations`: sets annotations on the backupPVC itself. This is typically useful for CSI providers that cannot mount
a VolumeSnapshot without a custom annotation.
A sample of `backupPVC` config as part of the ConfigMap would look like:
```json
{
@@ -49,8 +52,11 @@ A sample of `backupPVC` config as part of the ConfigMap would look like:
"storageClass": "backupPVC-storage-class"
},
"storage-class-3": {
"readOnly": true,
"annotations": {
"some-csi.provider.io/readOnlyClone": "true"
}
},
"storage-class-4": {
"readOnly": true,
"spcNoRelabeling": true

@@ -39,10 +39,12 @@ import (
. "github.com/vmware-tanzu/velero/test/e2e/basic/resources-check"
. "github.com/vmware-tanzu/velero/test/e2e/bsl-mgmt"
. "github.com/vmware-tanzu/velero/test/e2e/migration"
. "github.com/vmware-tanzu/velero/test/e2e/nodeagentconfig"
. "github.com/vmware-tanzu/velero/test/e2e/parallelfilesdownload"
. "github.com/vmware-tanzu/velero/test/e2e/parallelfilesupload"
. "github.com/vmware-tanzu/velero/test/e2e/privilegesmgmt"
. "github.com/vmware-tanzu/velero/test/e2e/pv-backup"
. "github.com/vmware-tanzu/velero/test/e2e/repomaintenance"
. "github.com/vmware-tanzu/velero/test/e2e/resource-filtering"
. "github.com/vmware-tanzu/velero/test/e2e/resourcemodifiers"
. "github.com/vmware-tanzu/velero/test/e2e/resourcepolicies"
@@ -660,6 +662,24 @@ var _ = Describe(
ParallelFilesDownloadTest,
)
var _ = Describe(
"Test Repository Maintenance Job Configuration's global part",
Label("RepoMaintenance", "LongTime"),
GlobalRepoMaintenanceTest,
)
var _ = Describe(
"Test Repository Maintenance Job Configuration's specific part",
Label("RepoMaintenance", "LongTime"),
SpecificRepoMaintenanceTest,
)
var _ = Describe(
"Test node agent config's LoadAffinity part",
Label("NodeAgentConfig", "LoadAffinity"),
LoadAffinities,
)
func GetKubeConfigContext() error {
var err error
var tcDefault, tcStandby k8s.TestClient
@@ -740,6 +760,12 @@ var _ = BeforeSuite(func() {
).To(Succeed())
}
By("Install PriorityClasses for E2E.")
Expect(veleroutil.CreatePriorityClasses(
context.Background(),
test.VeleroCfg.ClientToInstallVelero.Kubebuilder,
)).To(Succeed())
if test.InstallVelero {
By("Install test resources before testing")
Expect(
@@ -764,6 +790,8 @@ var _ = AfterSuite(func() {
test.StorageClassName,
),
).To(Succeed())
By("Delete PriorityClasses created by E2E")
Expect(
k8s.DeleteStorageClass(
ctx,
@@ -783,6 +811,11 @@ var _ = AfterSuite(func() {
).To(Succeed())
}
Expect(veleroutil.DeletePriorityClasses(
ctx,
test.VeleroCfg.ClientToInstallVelero.Kubebuilder,
)).To(Succeed())
// If the Velero is installed during test, and the FailFast is not enabled,
// uninstall Velero. If not, either Velero is not installed, or kept it for debug on failure.
if test.InstallVelero && (testSuitePassed || !test.VeleroCfg.FailFast) {

@@ -342,6 +342,12 @@ func (m *migrationE2E) Restore() error {
Expect(veleroutil.InstallStorageClasses(
m.VeleroCfg.StandbyClusterCloudProvider)).To(Succeed())
By("Install PriorityClasses for E2E.")
Expect(veleroutil.CreatePriorityClasses(
context.Background(),
test.VeleroCfg.StandbyClient.Kubebuilder,
)).To(Succeed())
if strings.EqualFold(m.VeleroCfg.Features, test.FeatureCSI) &&
m.VeleroCfg.UseVolumeSnapshots {
By("Install VolumeSnapshotClass for E2E.")
@@ -447,6 +453,7 @@ func (m *migrationE2E) Clean() error {
Expect(k8sutil.KubectlConfigUseContext(
m.Ctx, m.VeleroCfg.StandbyClusterContext)).To(Succeed())
m.VeleroCfg.ClientToInstallVelero = m.VeleroCfg.StandbyClient
m.VeleroCfg.ClusterToInstallVelero = m.VeleroCfg.StandbyClusterName
@@ -459,7 +466,6 @@ func (m *migrationE2E) Clean() error {
fmt.Println("Fail to delete StorageClass1: ", err)
return
}
if err := k8sutil.DeleteStorageClass(
m.Ctx,
*m.VeleroCfg.ClientToInstallVelero,
@@ -469,6 +475,12 @@ func (m *migrationE2E) Clean() error {
return
}
By("Delete PriorityClasses created by E2E")
Expect(veleroutil.DeletePriorityClasses(
m.Ctx,
m.VeleroCfg.ClientToInstallVelero.Kubebuilder,
)).To(Succeed())
if strings.EqualFold(m.VeleroCfg.Features, test.FeatureCSI) &&
m.VeleroCfg.UseVolumeSnapshots {
By("Delete VolumeSnapshotClass created by E2E")

@@ -0,0 +1,347 @@
/*
Copyright the Velero contributors.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
package nodeagentconfig
import (
"context"
"encoding/json"
"fmt"
"strings"
"time"
. "github.com/onsi/gomega"
"github.com/pkg/errors"
corev1api "k8s.io/api/core/v1"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/labels"
"k8s.io/apimachinery/pkg/util/wait"
"sigs.k8s.io/controller-runtime/pkg/client"
velerov1api "github.com/vmware-tanzu/velero/pkg/apis/velero/v1"
velerov2alpha1api "github.com/vmware-tanzu/velero/pkg/apis/velero/v2alpha1"
"github.com/vmware-tanzu/velero/pkg/builder"
velerotypes "github.com/vmware-tanzu/velero/pkg/types"
"github.com/vmware-tanzu/velero/pkg/util/kube"
velerokubeutil "github.com/vmware-tanzu/velero/pkg/util/kube"
"github.com/vmware-tanzu/velero/test"
. "github.com/vmware-tanzu/velero/test/e2e/test"
k8sutil "github.com/vmware-tanzu/velero/test/util/k8s"
veleroutil "github.com/vmware-tanzu/velero/test/util/velero"
)
type NodeAgentConfigTestCase struct {
TestCase
nodeAgentConfigs velerotypes.NodeAgentConfigs
nodeAgentConfigMapName string
}
var LoadAffinities func() = TestFunc(&NodeAgentConfigTestCase{
nodeAgentConfigs: velerotypes.NodeAgentConfigs{
LoadAffinity: []*kube.LoadAffinity{
{
NodeSelector: metav1.LabelSelector{
MatchLabels: map[string]string{
"beta.kubernetes.io/arch": "amd64",
},
},
StorageClass: test.StorageClassName,
},
{
NodeSelector: metav1.LabelSelector{
MatchLabels: map[string]string{
"kubernetes.io/arch": "amd64",
},
},
StorageClass: test.StorageClassName2,
},
},
BackupPVCConfig: map[string]velerotypes.BackupPVC{
test.StorageClassName: {
StorageClass: test.StorageClassName2,
},
},
RestorePVCConfig: &velerotypes.RestorePVC{
IgnoreDelayBinding: true,
},
PriorityClassName: test.PriorityClassNameForDataMover,
},
nodeAgentConfigMapName: "node-agent-config",
})
func (n *NodeAgentConfigTestCase) Init() error {
// generate random number as UUIDgen and set one default timeout duration
n.TestCase.Init()
// generate variable names based on CaseBaseName + UUIDgen
n.CaseBaseName = "node-agent-config-" + n.UUIDgen
n.BackupName = "backup-" + n.CaseBaseName
n.RestoreName = "restore-" + n.CaseBaseName
// generate namespaces by NamespacesTotal
n.NamespacesTotal = 1
n.NSIncluded = &[]string{}
for nsNum := 0; nsNum < n.NamespacesTotal; nsNum++ {
createNSName := fmt.Sprintf("%s-%d", n.CaseBaseName, nsNum)
*n.NSIncluded = append(*n.NSIncluded, createNSName)
}
// assign values to the inner variable for specific case
n.VeleroCfg.UseNodeAgent = true
n.VeleroCfg.UseNodeAgentWindows = true
// Need to verify the data mover pod content, so don't wait until backup completion.
n.BackupArgs = []string{
"create", "--namespace", n.VeleroCfg.VeleroNamespace, "backup", n.BackupName,
"--include-namespaces", strings.Join(*n.NSIncluded, ","),
"--snapshot-volumes=true", "--snapshot-move-data",
}
// Need to verify the data mover pod content, so don't wait until restore completion.
n.RestoreArgs = []string{
"create", "--namespace", n.VeleroCfg.VeleroNamespace, "restore", n.RestoreName,
"--from-backup", n.BackupName,
}
// Message output by ginkgo
n.TestMsg = &TestMSG{
Desc: "Validate Node Agent ConfigMap configuration",
FailedMSG: "Failed to apply and / or validate configuration in VGDP pod.",
Text: "Should be able to apply and validate configuration in VGDP pod.",
}
return nil
}
func (n *NodeAgentConfigTestCase) InstallVelero() error {
// Because this test needs to use customized Node Agent ConfigMap,
// need to uninstall and reinstall Velero.
fmt.Println("Start to uninstall Velero")
if err := veleroutil.VeleroUninstall(n.Ctx, n.VeleroCfg); err != nil {
fmt.Printf("Fail to uninstall Velero: %s\n", err.Error())
return err
}
result, err := json.Marshal(n.nodeAgentConfigs)
if err != nil {
return err
}
nodeAgentConfigMap := builder.ForConfigMap(n.VeleroCfg.VeleroNamespace, n.nodeAgentConfigMapName).
Data("node-agent-config", string(result)).Result()
n.VeleroCfg.NodeAgentConfigMap = n.nodeAgentConfigMapName
return veleroutil.PrepareVelero(
n.Ctx,
n.CaseBaseName,
n.VeleroCfg,
nodeAgentConfigMap,
)
}
func (n *NodeAgentConfigTestCase) CreateResources() error {
for _, ns := range *n.NSIncluded {
if err := k8sutil.CreateNamespace(n.Ctx, n.Client, ns); err != nil {
fmt.Printf("Fail to create ns %s: %s\n", ns, err.Error())
return err
}
pvc, err := k8sutil.CreatePVC(n.Client, ns, "volume-1", test.StorageClassName, nil)
if err != nil {
fmt.Printf("Fail to create PVC %s: %s\n", "volume-1", err.Error())
return err
}
vols := k8sutil.CreateVolumes(pvc.Name, []string{"volume-1"})
deployment := k8sutil.NewDeployment(
n.CaseBaseName,
ns,
1,
map[string]string{"app": "test"},
n.VeleroCfg.ImageRegistryProxy,
n.VeleroCfg.WorkerOS,
).WithVolume(vols).Result()
deployment, err = k8sutil.CreateDeployment(n.Client.ClientGo, ns, deployment)
if err != nil {
fmt.Printf("Fail to create deployment %s: %s\n", n.CaseBaseName, err.Error())
return errors.Wrapf(err, "failed to create deployment %s", n.CaseBaseName)
}
if err := k8sutil.WaitForReadyDeployment(n.Client.ClientGo, deployment.Namespace, deployment.Name); err != nil {
fmt.Printf("Fail to wait for deployment %s to be ready: %s\n", n.CaseBaseName, err.Error())
return err
}
}
return nil
}
func (n *NodeAgentConfigTestCase) Backup() error {
if err := veleroutil.VeleroCmdExec(n.Ctx, n.VeleroCfg.VeleroCLI, n.BackupArgs); err != nil {
return err
}
backupPodList := new(corev1api.PodList)
if err := wait.PollUntilContextTimeout(n.Ctx, 5*time.Second, 5*time.Minute, true, func(ctx context.Context) (bool, error) {
duList := new(velerov2alpha1api.DataUploadList)
if err := n.VeleroCfg.ClientToInstallVelero.Kubebuilder.List(
ctx,
duList,
&client.ListOptions{Namespace: n.VeleroCfg.VeleroNamespace},
); err != nil {
fmt.Printf("Fail to list DataUpload: %s\n", err.Error())
return false, fmt.Errorf("fail to list DataUpload: %w", err)
}
if len(duList.Items) == 0 {
fmt.Println("No DataUpload found yet. Continue polling.")
return false, nil
}
if err := n.VeleroCfg.ClientToInstallVelero.Kubebuilder.List(
ctx,
backupPodList,
&client.ListOptions{
LabelSelector: labels.SelectorFromSet(map[string]string{
velerov1api.DataUploadLabel: duList.Items[0].Name,
}),
}); err != nil {
fmt.Printf("Fail to list backupPod: %s\n", err.Error())
return false, errors.Wrapf(err, "error to list backup pods")
}
if len(backupPodList.Items) == 0 {
fmt.Println("No backupPod found yet. Continue polling.")
return false, nil
}
return true, nil
}); err != nil {
return errors.Wrap(err, "fail to wait for the backup data mover pod")
}
fmt.Println("Start to verify backupPod content.")
Expect(backupPodList.Items[0].Spec.PriorityClassName).To(Equal(n.nodeAgentConfigs.PriorityClassName))
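// BackupPVCConfig remaps the backupPVC to StorageClassName2, so only the second LoadAffinity
// entry (whose StorageClass is StorageClassName2) should apply to the backup pod.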
expectedAffinity := velerokubeutil.ToSystemAffinity(n.nodeAgentConfigs.LoadAffinity[1:])
Expect(backupPodList.Items[0].Spec.Affinity).To(Equal(expectedAffinity))
fmt.Println("backupPod content verification completed successfully.")
if err := wait.PollUntilContextTimeout(n.Ctx, 5*time.Second, 5*time.Minute, true, func(ctx context.Context) (bool, error) {
backup := new(velerov1api.Backup)
if err := n.VeleroCfg.ClientToInstallVelero.Kubebuilder.Get(
ctx,
client.ObjectKey{Namespace: n.VeleroCfg.VeleroNamespace, Name: n.BackupName},
backup,
); err != nil {
return false, err
}
if backup.Status.Phase != velerov1api.BackupPhaseCompleted &&
backup.Status.Phase != velerov1api.BackupPhaseFailed &&
backup.Status.Phase != velerov1api.BackupPhasePartiallyFailed {
fmt.Printf("Backup phase is %s. Continue polling until the backup reaches a final state.\n", backup.Status.Phase)
return false, nil
}
return true, nil
}); err != nil {
return errors.Wrap(err, "fail to wait for the backup to reach a final state")
}
return nil
}
func (n *NodeAgentConfigTestCase) Restore() error {
if err := veleroutil.VeleroCmdExec(n.Ctx, n.VeleroCfg.VeleroCLI, n.RestoreArgs); err != nil {
return err
}
restorePodList := new(corev1api.PodList)
if err := wait.PollUntilContextTimeout(n.Ctx, 5*time.Second, 5*time.Minute, true, func(ctx context.Context) (bool, error) {
ddList := new(velerov2alpha1api.DataDownloadList)
if err := n.VeleroCfg.ClientToInstallVelero.Kubebuilder.List(
ctx,
ddList,
&client.ListOptions{Namespace: n.VeleroCfg.VeleroNamespace},
); err != nil {
fmt.Printf("Fail to list DataDownload: %s\n", err.Error())
return false, fmt.Errorf("fail to list DataDownload: %w", err)
}
if len(ddList.Items) == 0 {
fmt.Println("No DataDownload found yet. Continue polling.")
return false, nil
}
if err := n.VeleroCfg.ClientToInstallVelero.Kubebuilder.List(
ctx,
restorePodList,
&client.ListOptions{
LabelSelector: labels.SelectorFromSet(map[string]string{
velerov1api.DataDownloadLabel: ddList.Items[0].Name,
}),
}); err != nil {
fmt.Printf("Fail to list restorePod: %s\n", err.Error())
return false, errors.Wrapf(err, "error to list restore pods")
}
if len(restorePodList.Items) == 0 {
fmt.Println("No restorePod found yet. Continue polling.")
return false, nil
}
return true, nil
}); err != nil {
return errors.Wrap(err, "fail to wait for the restore data mover pod")
}
fmt.Println("Start to verify restorePod content.")
Expect(restorePodList.Items[0].Spec.PriorityClassName).To(Equal(n.nodeAgentConfigs.PriorityClassName))
// The restore keeps the original StorageClassName, so only the first LoadAffinity
// entry (whose StorageClass is StorageClassName) should apply to the restore pod.
expectedAffinity := velerokubeutil.ToSystemAffinity(n.nodeAgentConfigs.LoadAffinity[:1])
Expect(restorePodList.Items[0].Spec.Affinity).To(Equal(expectedAffinity))
fmt.Println("restorePod content verification completed successfully.")
if err := wait.PollUntilContextTimeout(n.Ctx, 5*time.Second, 5*time.Minute, true, func(ctx context.Context) (bool, error) {
restore := new(velerov1api.Restore)
if err := n.VeleroCfg.ClientToInstallVelero.Kubebuilder.Get(
ctx,
client.ObjectKey{Namespace: n.VeleroCfg.VeleroNamespace, Name: n.RestoreName},
restore,
); err != nil {
return false, err
}
if restore.Status.Phase != velerov1api.RestorePhaseCompleted &&
restore.Status.Phase != velerov1api.RestorePhaseFailed &&
restore.Status.Phase != velerov1api.RestorePhasePartiallyFailed {
fmt.Printf("Restore phase is %s. Continue polling until the restore reaches a final state.\n", restore.Status.Phase)
return false, nil
}
return true, nil
}); err != nil {
return errors.Wrap(err, "fail to wait for the restore to reach a final state")
}
return nil
}
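The comments in `Backup()` and `Restore()` encode the rule this test relies on: each `LoadAffinity` entry is tied to a StorageClass, and the data mover pod appears to receive only the entries whose `StorageClass` matches the PVC it serves. The backupPVC is remapped to `StorageClassName2`, so the backup pod gets entry `[1]`; the restore stays on `StorageClassName`, so the restore pod gets entry `[0]`. A minimal model of that filter follows, with a hypothetical `loadAffinity` type, a hypothetical `affinitiesFor` helper, and placeholder class names `sc-1`/`sc-2`; Velero's own selection code, including how it treats entries without a StorageClass, may differ.

```go
package main

import "fmt"

// loadAffinity is a hypothetical stand-in for kube.LoadAffinity.
type loadAffinity struct {
	MatchLabels  map[string]string
	StorageClass string
}

// affinitiesFor returns the entries whose StorageClass equals sc, preserving order.
func affinitiesFor(sc string, all []loadAffinity) []loadAffinity {
	var out []loadAffinity
	for _, a := range all {
		if a.StorageClass == sc {
			out = append(out, a)
		}
	}
	return out
}

func main() {
	all := []loadAffinity{
		{MatchLabels: map[string]string{"beta.kubernetes.io/arch": "amd64"}, StorageClass: "sc-1"},
		{MatchLabels: map[string]string{"kubernetes.io/arch": "amd64"}, StorageClass: "sc-2"},
	}
	// The backup PVC is remapped to sc-2; the restore stays on sc-1.
	fmt.Println("backup:", affinitiesFor("sc-2", all)[0].MatchLabels)
	fmt.Println("restore:", affinitiesFor("sc-1", all)[0].MatchLabels)
}
```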

@@ -0,0 +1,247 @@
/*
Copyright 2021 the Velero contributors.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
package repomaintenance
import (
"encoding/json"
"fmt"
"strings"
"time"
. "github.com/onsi/gomega"
"github.com/pkg/errors"
batchv1api "k8s.io/api/batch/v1"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/labels"
"sigs.k8s.io/controller-runtime/pkg/client"
velerov1api "github.com/vmware-tanzu/velero/pkg/apis/velero/v1"
"github.com/vmware-tanzu/velero/pkg/builder"
velerotypes "github.com/vmware-tanzu/velero/pkg/types"
"github.com/vmware-tanzu/velero/pkg/util/kube"
velerokubeutil "github.com/vmware-tanzu/velero/pkg/util/kube"
"github.com/vmware-tanzu/velero/test"
. "github.com/vmware-tanzu/velero/test/e2e/test"
k8sutil "github.com/vmware-tanzu/velero/test/util/k8s"
veleroutil "github.com/vmware-tanzu/velero/test/util/velero"
)
type RepoMaintenanceTestCase struct {
TestCase
repoMaintenanceConfigMapName string
repoMaintenanceConfigKey string
jobConfigs velerotypes.JobConfigs
}
var keepJobNum = 1
var GlobalRepoMaintenanceTest func() = TestFunc(&RepoMaintenanceTestCase{
repoMaintenanceConfigKey: "global",
repoMaintenanceConfigMapName: "global",
jobConfigs: velerotypes.JobConfigs{
KeepLatestMaintenanceJobs: &keepJobNum,
PodResources: &velerokubeutil.PodResources{
CPURequest: "100m",
MemoryRequest: "100Mi",
CPULimit: "200m",
MemoryLimit: "200Mi",
},
PriorityClassName: test.PriorityClassNameForRepoMaintenance,
},
})
var SpecificRepoMaintenanceTest func() = TestFunc(&RepoMaintenanceTestCase{
repoMaintenanceConfigKey: "",
repoMaintenanceConfigMapName: "specific",
jobConfigs: velerotypes.JobConfigs{
KeepLatestMaintenanceJobs: &keepJobNum,
PodResources: &velerokubeutil.PodResources{
CPURequest: "100m",
MemoryRequest: "100Mi",
CPULimit: "200m",
MemoryLimit: "200Mi",
},
PriorityClassName: test.PriorityClassNameForRepoMaintenance,
},
})
func (r *RepoMaintenanceTestCase) Init() error {
// generate random number as UUIDgen and set one default timeout duration
r.TestCase.Init()
// generate variable names based on CaseBaseName + UUIDgen
r.CaseBaseName = "repo-maintenance-" + r.UUIDgen
r.BackupName = "backup-" + r.CaseBaseName
r.RestoreName = "restore-" + r.CaseBaseName
// generate namespaces by NamespacesTotal
r.NamespacesTotal = 1
r.NSIncluded = &[]string{}
for nsNum := 0; nsNum < r.NamespacesTotal; nsNum++ {
createNSName := fmt.Sprintf("%s-%d", r.CaseBaseName, nsNum)
*r.NSIncluded = append(*r.NSIncluded, createNSName)
}
// If repoMaintenanceConfigKey is not set, the case tests a specific repository,
// so assemble the BackupRepository name, whose format is "volumeNamespace-bslName-uploaderName".
if r.repoMaintenanceConfigKey == "" {
r.repoMaintenanceConfigKey = (*r.NSIncluded)[0] + "-" + "default" + "-" + test.UploaderTypeKopia
}
// assign values to the inner variable for specific case
r.VeleroCfg.UseNodeAgent = true
r.VeleroCfg.UseNodeAgentWindows = true
r.BackupArgs = []string{
"create", "--namespace", r.VeleroCfg.VeleroNamespace, "backup", r.BackupName,
"--include-namespaces", strings.Join(*r.NSIncluded, ","),
"--snapshot-volumes=true", "--snapshot-move-data", "--wait",
}
// Message output by ginkgo
r.TestMsg = &TestMSG{
Desc: "Validate Repository Maintenance Job configuration",
FailedMSG: "Failed to apply and / or validate configuration in repository maintenance jobs.",
Text: "Should be able to apply and validate configuration in repository maintenance jobs.",
}
return nil
}
func (r *RepoMaintenanceTestCase) InstallVelero() error {
// Because this test needs to use customized repository maintenance ConfigMap,
// need to uninstall and reinstall Velero.
fmt.Println("Start to uninstall Velero")
if err := veleroutil.VeleroUninstall(r.Ctx, r.VeleroCfg); err != nil {
fmt.Printf("Fail to uninstall Velero: %s\n", err.Error())
return err
}
result, err := json.Marshal(r.jobConfigs)
if err != nil {
return err
}
repoMaintenanceConfig := builder.ForConfigMap(r.VeleroCfg.VeleroNamespace, r.repoMaintenanceConfigMapName).
Data(r.repoMaintenanceConfigKey, string(result)).Result()
r.VeleroCfg.RepoMaintenanceJobConfigMap = r.repoMaintenanceConfigMapName
return veleroutil.PrepareVelero(
r.Ctx,
r.CaseBaseName,
r.VeleroCfg,
repoMaintenanceConfig,
)
}
func (r *RepoMaintenanceTestCase) CreateResources() error {
for _, ns := range *r.NSIncluded {
if err := k8sutil.CreateNamespace(r.Ctx, r.Client, ns); err != nil {
fmt.Printf("Fail to create ns %s: %s\n", ns, err.Error())
return err
}
pvc, err := k8sutil.CreatePVC(r.Client, ns, "volume-1", test.StorageClassName, nil)
if err != nil {
fmt.Printf("Fail to create PVC %s: %s\n", "volume-1", err.Error())
return err
}
vols := k8sutil.CreateVolumes(pvc.Name, []string{"volume-1"})
deployment := k8sutil.NewDeployment(
r.CaseBaseName,
ns,
1,
map[string]string{"app": "test"},
r.VeleroCfg.ImageRegistryProxy,
r.VeleroCfg.WorkerOS,
).WithVolume(vols).Result()
deployment, err = k8sutil.CreateDeployment(r.Client.ClientGo, ns, deployment)
if err != nil {
fmt.Printf("Fail to create deployment %s: %s\n", r.CaseBaseName, err.Error())
return errors.Wrapf(err, "failed to create deployment %s", r.CaseBaseName)
}
if err := k8sutil.WaitForReadyDeployment(r.Client.ClientGo, deployment.Namespace, deployment.Name); err != nil {
fmt.Printf("Fail to wait for deployment %s to be ready: %s\n", r.CaseBaseName, err.Error())
return err
}
}
return nil
}
func (r *RepoMaintenanceTestCase) Verify() error {
// Reduce the MaintenanceFrequency to 1 minute.
backupRepositoryList := new(velerov1api.BackupRepositoryList)
if err := r.VeleroCfg.ClientToInstallVelero.Kubebuilder.List(
r.Ctx,
backupRepositoryList,
&client.ListOptions{
Namespace: r.VeleroCfg.Namespace,
LabelSelector: labels.SelectorFromSet(map[string]string{velerov1api.VolumeNamespaceLabel: (*r.NSIncluded)[0]}),
},
); err != nil {
return err
}
if len(backupRepositoryList.Items) <= 0 {
return fmt.Errorf("fail list BackupRepository. no item is returned")
}
backupRepository := backupRepositoryList.Items[0]
updated := backupRepository.DeepCopy()
updated.Spec.MaintenanceFrequency = metav1.Duration{Duration: time.Minute}
if err := r.VeleroCfg.ClientToInstallVelero.Kubebuilder.Patch(r.Ctx, updated, client.MergeFrom(&backupRepository)); err != nil {
fmt.Printf("failed to patch BackupRepository %q: %s\n", backupRepository.GetName(), err.Error())
return err
}
// The minimal time unit of Repository Maintenance is 5 minutes.
// Wait for more than one cycle to make sure the result is valid.
time.Sleep(6 * time.Minute)
jobList := new(batchv1api.JobList)
if err := r.VeleroCfg.ClientToInstallVelero.Kubebuilder.List(r.Ctx, jobList, &client.ListOptions{
Namespace: r.VeleroCfg.Namespace,
LabelSelector: labels.SelectorFromSet(map[string]string{"velero.io/repo-name": backupRepository.Name}),
}); err != nil {
return err
}
resources, err := kube.ParseResourceRequirements(
r.jobConfigs.PodResources.CPURequest,
r.jobConfigs.PodResources.MemoryRequest,
r.jobConfigs.PodResources.CPULimit,
r.jobConfigs.PodResources.MemoryLimit,
)
if err != nil {
return errors.Wrap(err, "failed to parse resource requirements for maintenance job")
}
Expect(jobList.Items).To(HaveLen(*r.jobConfigs.KeepLatestMaintenanceJobs))
Expect(jobList.Items[0].Spec.Template.Spec.Containers[0].Resources).To(Equal(resources))
Expect(jobList.Items[0].Spec.Template.Spec.PriorityClassName).To(Equal(r.jobConfigs.PriorityClassName))
return nil
}
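In the specific-repository case the ConfigMap key is the BackupRepository name, which the test comment gives as `volumeNamespace-bslName-uploaderName`. A small sketch of that assembly; `repoConfigKey` is a hypothetical helper, `"kopia"` is assumed to be the value of `test.UploaderTypeKopia`, and the format is taken from the test comment rather than from Velero's repository code.

```go
package main

import (
	"fmt"
	"strings"
)

// repoConfigKey builds the per-repository key
// "volumeNamespace-bslName-uploaderName" used by the specific-repo test case.
func repoConfigKey(volumeNamespace, bslName, uploaderType string) string {
	return strings.Join([]string{volumeNamespace, bslName, uploaderType}, "-")
}

func main() {
	fmt.Println(repoConfigKey("repo-maintenance-12345678-0", "default", "kopia"))
}
```

Because namespace names may themselves contain `-`, a key built this way cannot be split back into its parts unambiguously; it only works as an exact-match lookup key.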

@@ -41,6 +41,7 @@ depends on your test patterns.
*/
type VeleroBackupRestoreTest interface {
Init() error
InstallVelero() error
CreateResources() error
Backup() error
Destroy() error
@@ -109,6 +110,10 @@ func (t *TestCase) GenerateUUID() string {
return fmt.Sprintf("%08d", rand.IntN(100000000))
}
func (t *TestCase) InstallVelero() error {
return PrepareVelero(context.Background(), t.GetTestCase().CaseBaseName, t.GetTestCase().VeleroCfg)
}
func (t *TestCase) CreateResources() error {
return nil
}
@@ -221,7 +226,8 @@ func RunTestCase(test VeleroBackupRestoreTest) error {
fmt.Printf("Running test case %s %s\n", test.GetTestMsg().Desc, time.Now().Format("2006-01-02 15:04:05"))
if InstallVelero {
fmt.Printf("Install Velero for test case %s: %s\n", test.GetTestCase().CaseBaseName, time.Now().Format("2006-01-02 15:04:05"))
Expect(test.InstallVelero()).To(Succeed())
}
defer test.Clean()

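`InstallVelero()` joins the `VeleroBackupRestoreTest` interface with a default implementation on `TestCase`, and cases such as `NodeAgentConfigTestCase` and `RepoMaintenanceTestCase` override it to reinstall Velero with their own ConfigMap. Go has no inheritance; this works through method promotion, where a method declared on the outer type shadows the one promoted from the embedded type. A self-contained sketch with hypothetical names (`installer`, `baseCase`, `customCase`, `run`):

```go
package main

import "fmt"

// installer mirrors the shape of the InstallVelero hook on the test interface.
type installer interface {
	InstallVelero() error
}

// baseCase plays the role of TestCase and supplies the default hook.
type baseCase struct {
	installed string
}

func (b *baseCase) InstallVelero() error {
	b.installed = "default install"
	return nil
}

// customCase embeds baseCase, like NodeAgentConfigTestCase embeds TestCase,
// and overrides only InstallVelero.
type customCase struct {
	baseCase
}

func (c *customCase) InstallVelero() error {
	// A real case would uninstall Velero and reinstall it with its ConfigMap.
	c.installed = "reinstall with custom ConfigMap"
	return nil
}

// run calls the hook through the interface, as RunTestCase does,
// so the outermost method is the one executed.
func run(t installer) error { return t.InstallVelero() }

func main() {
	b, c := &baseCase{}, &customCase{}
	_ = run(b)
	_ = run(c)
	fmt.Println(b.installed)
	fmt.Println(c.installed)
}
```

One caveat of this design: a method on the embedded type that calls `InstallVelero` on its own receiver still gets the default version. Only calls made through the interface value, as in `RunTestCase`, reach the override.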
@@ -57,6 +57,11 @@ const (
BackupRepositoryConfigName = "backup-repository-config"
)
const (
PriorityClassNameForDataMover = "data-mover"
PriorityClassNameForRepoMaintenance = "repo-maintenance"
)
var PublicCloudProviders = []string{AWS, Azure, GCP, Vsphere}
var LocalCloudProviders = []string{Kind, VanillaZFS}
var CloudProviders = append(PublicCloudProviders, LocalCloudProviders...)

@@ -36,6 +36,7 @@ import (
"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
"k8s.io/apimachinery/pkg/util/wait"
clientset "k8s.io/client-go/kubernetes"
"sigs.k8s.io/controller-runtime/pkg/client"
velerov1api "github.com/vmware-tanzu/velero/pkg/apis/velero/v1"
"github.com/vmware-tanzu/velero/pkg/cmd/cli/install"
@@ -56,7 +57,17 @@ type installOptions struct {
WorkerOS string
}
/*
VeleroInstall installs Velero for an E2E test.
params:
ctx: the context
veleroCfg: the E2E case's Velero configuration
isStandbyCluster: whether Velero is installed on the standby cluster
objects: objects to create in the Velero install namespace, e.g. ConfigMaps
*/
func VeleroInstall(ctx context.Context, veleroCfg *test.VeleroConfig, isStandbyCluster bool, objects ...client.Object) error {
fmt.Printf("Velero install %s\n", time.Now().Format("2006-01-02 15:04:05"))
// veleroCfg struct including a set of BSL params and a set of additional BSL params,
@@ -152,6 +163,15 @@ func VeleroInstall(ctx context.Context, veleroCfg *test.VeleroConfig, isStandbyC
veleroCfg.VeleroNamespace,
)
}
veleroCfg.BackupRepoConfigMap = test.BackupRepositoryConfigName
// Install the passed-in objects in Velero installed namespace
for _, obj := range objects {
if err := veleroCfg.ClientToInstallVelero.Kubebuilder.Create(ctx, obj); err != nil {
fmt.Printf("fail to create object %s in namespace %s: %s\n", obj.GetName(), obj.GetNamespace(), err.Error())
return fmt.Errorf("fail to create object %s in namespace %s: %w", obj.GetName(), obj.GetNamespace(), err)
}
}
// For AWS IRSA credential test, AWS IAM service account is required, so if ServiceAccountName and EKSPolicyARN
// are both provided, we assume IRSA test is running, otherwise skip this IAM service account creation part.
@@ -635,7 +655,7 @@ func patchResources(resources *unstructured.UnstructuredList, namespace string,
APIVersion: corev1api.SchemeGroupVersion.String(),
},
ObjectMeta: metav1.ObjectMeta{
Name: "fs-restore-action-config",
Namespace: namespace,
Labels: map[string]string{
"velero.io/plugin-config": "",
@@ -652,7 +672,7 @@ func patchResources(resources *unstructured.UnstructuredList, namespace string,
return errors.Wrapf(err, "failed to convert restore action config to unstructure")
}
resources.Items = append(resources.Items, un)
fmt.Printf("the restore helper image is set by the configmap %q\n", "fs-restore-action-config")
}
return nil
@@ -790,7 +810,7 @@ func CheckBSL(ctx context.Context, ns string, bslName string) error {
return err
}
func PrepareVelero(ctx context.Context, caseName string, veleroCfg test.VeleroConfig, objects ...client.Object) error {
ready, err := IsVeleroReady(ctx, &veleroCfg)
if err != nil {
fmt.Printf("error in checking velero status with %v\n", err)
@@ -804,7 +824,7 @@ func PrepareVelero(ctx context.Context, caseName string, veleroCfg test.VeleroCo
return nil
}
fmt.Printf("need to install velero for case %s \n", caseName)
return VeleroInstall(ctx, &veleroCfg, false, objects...)
}
func VeleroUninstall(ctx context.Context, veleroCfg test.VeleroConfig) error {

@@ -38,6 +38,8 @@ import (
"github.com/pkg/errors"
"golang.org/x/mod/semver"
schedulingv1api "k8s.io/api/scheduling/v1"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/labels"
ver "k8s.io/apimachinery/pkg/util/version"
"k8s.io/apimachinery/pkg/util/wait"
@@ -45,9 +47,11 @@ import (
"github.com/vmware-tanzu/velero/internal/volume"
velerov1api "github.com/vmware-tanzu/velero/pkg/apis/velero/v1"
"github.com/vmware-tanzu/velero/pkg/builder"
cliinstall "github.com/vmware-tanzu/velero/pkg/cmd/cli/install"
"github.com/vmware-tanzu/velero/pkg/cmd/util/flag"
veleroexec "github.com/vmware-tanzu/velero/pkg/util/exec"
"github.com/vmware-tanzu/velero/test"
. "github.com/vmware-tanzu/velero/test"
common "github.com/vmware-tanzu/velero/test/util/common"
. "github.com/vmware-tanzu/velero/test/util/k8s"
@@ -274,6 +278,9 @@ func getProviderVeleroInstallOptions(veleroCfg *VeleroConfig,
io.ItemBlockWorkerCount = veleroCfg.ItemBlockWorkerCount
io.ServerPriorityClassName = veleroCfg.ServerPriorityClassName
io.NodeAgentPriorityClassName = veleroCfg.NodeAgentPriorityClassName
io.RepoMaintenanceJobConfigMap = veleroCfg.RepoMaintenanceJobConfigMap
io.BackupRepoConfigMap = veleroCfg.BackupRepoConfigMap
io.NodeAgentConfigMap = veleroCfg.NodeAgentConfigMap
return io, nil
}
@@ -1812,3 +1819,43 @@ func KubectlGetAllDeleteBackupRequest(ctx context.Context, backupName, veleroNam
return common.GetListByCmdPipes(ctx, cmds)
}
func CreatePriorityClasses(ctx context.Context, client kbclient.Client) error {
dataMoverPriorityClass := builder.ForPriorityClass(test.PriorityClassNameForDataMover).
Value(90000).PreemptionPolicy("Never").Result()
if err := client.Create(ctx, dataMoverPriorityClass); err != nil {
fmt.Printf("Fail to create PriorityClass %s: %s\n", test.PriorityClassNameForDataMover, err.Error())
return fmt.Errorf("fail to create PriorityClass %s: %w", test.PriorityClassNameForDataMover, err)
}
repoMaintenancePriorityClass := builder.ForPriorityClass(test.PriorityClassNameForRepoMaintenance).
Value(80000).PreemptionPolicy("Never").Result()
if err := client.Create(ctx, repoMaintenancePriorityClass); err != nil {
fmt.Printf("Fail to create PriorityClass %s: %s\n", test.PriorityClassNameForRepoMaintenance, err.Error())
return fmt.Errorf("fail to create PriorityClass %s: %w", test.PriorityClassNameForRepoMaintenance, err)
}
return nil
}
func DeletePriorityClasses(ctx context.Context, client kbclient.Client) error {
priorityClassDataMover := &schedulingv1api.PriorityClass{
ObjectMeta: metav1.ObjectMeta{
Name: test.PriorityClassNameForDataMover,
},
}
if err := client.Delete(ctx, priorityClassDataMover); err != nil {
return err
}
priorityClassRepoMaintenance := &schedulingv1api.PriorityClass{
ObjectMeta: metav1.ObjectMeta{
Name: test.PriorityClassNameForRepoMaintenance,
},
}
if err := client.Delete(ctx, priorityClassRepoMaintenance); err != nil {
return err
}
return nil
}