Commit Graph

6063 Commits

Author SHA1 Message Date
Xun Jiang/Bruce Jiang
fd99ed4dd6 Merge pull request #9696 from adam-jian-zhang/fix-restore-pvr-scope-1.18
Fix PodVolumeBackup list scope during restore
2026-04-10 16:01:59 +08:00
Adam Zhang
0291c53e9d Fix PodVolumeBackup list scope during restore
Restrict the listing of PodVolumeBackup resources to the specific
restore namespace in both the core restore controller and the pod
volume restore action plugin. This prevents "Forbidden" errors when
Velero is configured with namespace-scoped minimum privileges,
avoiding the need for cluster-scoped list permissions for
PodVolumeBackups.

Fixes: #9681

Signed-off-by: Adam Zhang <adam.zhang@broadcom.com>
2026-04-09 09:57:04 +08:00
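The scoping change in the commit above can be pictured with a small stand-alone sketch. It does not use Velero's or Kubernetes' real client API. The names `fakeLister`, `podVolumeBackup`, and `errForbidden`, and the namespace "velero", are hypothetical and exist only for illustration. The sketch models a client that holds list permission in a single namespace, so a cluster-scoped list is rejected while a namespaced list succeeds.

```go
package main

import (
	"errors"
	"fmt"
)

// errForbidden stands in for the API server's "Forbidden" response (assumption).
var errForbidden = errors.New("forbidden: list not permitted in this scope")

// podVolumeBackup is a hypothetical, trimmed-down stand-in for the real resource.
type podVolumeBackup struct {
	Namespace string
	Name      string
}

// fakeLister models a client that may list objects in one namespace only.
type fakeLister struct {
	allowedNamespace string
	items            []podVolumeBackup
}

// List returns the items in the given namespace. An empty namespace means a
// cluster-scoped list, which this restricted client is not allowed to perform.
func (l fakeLister) List(namespace string) ([]podVolumeBackup, error) {
	if namespace != l.allowedNamespace {
		return nil, errForbidden
	}
	var out []podVolumeBackup
	for _, it := range l.items {
		if it.Namespace == namespace {
			out = append(out, it)
		}
	}
	return out, nil
}

func main() {
	l := fakeLister{
		allowedNamespace: "velero",
		items: []podVolumeBackup{
			{Namespace: "velero", Name: "pvb-1"},
			{Namespace: "other", Name: "pvb-2"},
		},
	}

	// Cluster-scoped list: rejected under namespace-scoped privileges.
	_, err := l.List("")
	fmt.Println("cluster-scoped list error:", err)

	// Namespaced list: succeeds and only sees objects in that namespace.
	items, err := l.List("velero")
	fmt.Println("namespaced list:", len(items), "item(s), error:", err)
}
```

With a real Kubernetes client the same idea is normally expressed by passing a namespace option to the list call. The exact call sites changed by the commit are not shown in this log.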
Shubham Pampattiwar
5ad4e604b8 [release-1.18] Fix VolumeGroupSnapshot restore failure with Ceph RBD CSI driver (#9687)
* Fix VolumeGroupSnapshot restore failure with Ceph RBD CSI driver (#9516)

* Fix VolumeGroupSnapshot restore on Ceph RBD

This PR fixes two related issues affecting CSI snapshot restore on Ceph RBD:

1. VolumeGroupSnapshot restore fails because Ceph RBD populates
   volumeGroupSnapshotHandle on pre-provisioned VSCs, but Velero doesn't
   create the required VGSC during restore.

2. CSI snapshot restore fails because VolumeSnapshotClassName is removed
   from restored VSCs, preventing the CSI controller from getting
   credentials for snapshot verification.

Changes:
- Capture volumeGroupSnapshotHandle during backup as VS annotation
- Create stub VGSC during restore with matching handle in status
- Look up VolumeSnapshotClass by driver and set on restored VSC

Fixes #9512
Fixes #9515

Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>

* Add changelog for VGS restore fix

Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>

* Fix gofmt import order

Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>

* Add changelog for VGS restore fix

Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>

* Fix import alias corev1 to corev1api per lint config

Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>

* Fix: Add snapshot handles to existing stub VGSC and add unit tests

When multiple VolumeSnapshots from the same VolumeGroupSnapshot are
restored, they share the same VolumeGroupSnapshotHandle but have
different individual snapshot handles. This commit:

1. Fixes incomplete logic where existing VGSC wasn't updated with
   new snapshot handles (addresses review feedback)

2. Fixes race condition where Create returning AlreadyExists would
   skip adding the snapshot handle

3. Adds comprehensive unit tests for ensureStubVGSCExists (5 cases)
   and addSnapshotHandleToVGSC (4 cases) functions

Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>

* Clean up stub VolumeGroupSnapshotContents during restore finalization

Add cleanup logic for stub VGSCs created during VolumeGroupSnapshot restore.
The stub VGSCs are temporary objects needed to satisfy CSI controller
validation during VSC reconciliation. Once all related VSCs become
ReadyToUse, the stub VGSCs are no longer needed and should be removed.

The cleanup runs in the restore finalizer controller's execute() phase.
Before deleting each VGSC, it polls until all related VolumeSnapshotContents
(correlated by snapshot handle) are ReadyToUse, with a timeout fallback.
Deletion failures and CRD-not-installed scenarios are treated as warnings
rather than errors to avoid failing the restore.

Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>

* Fix lint: remove unused nolint directive and simplify cleanupStubVGSC return

The cleanupStubVGSC function only produces warnings (not errors), so
simplify its return signature. Also remove the now-unused nolint:unparam
directive on execute() since warnings are no longer always nil.

Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>

---------

Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>

* Rename changelog file to match cherry-pick PR number

Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>

---------

Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>
2026-04-08 12:45:02 -07:00
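The "create the stub, then always record the snapshot handle" behaviour described in the commit above can be sketched as follows. This is not Velero's implementation of ensureStubVGSCExists or addSnapshotHandleToVGSC. The types `stubVGSC` and `store`, the helper names, and the in-memory map that replaces the API server are all assumptions made for illustration. The sketch is single-threaded, so it only shows why an "already exists" result must not skip the handle update.

```go
package main

import (
	"errors"
	"fmt"
)

// errAlreadyExists stands in for the API server's AlreadyExists error (assumption).
var errAlreadyExists = errors.New("already exists")

// stubVGSC is a hypothetical stand-in for a stub VolumeGroupSnapshotContent.
type stubVGSC struct {
	GroupHandle     string
	SnapshotHandles []string
}

// store is an in-memory substitute for the API server, keyed by object name.
type store map[string]*stubVGSC

// create fails with errAlreadyExists when the name is taken, like an API create.
func (s store) create(name string, obj *stubVGSC) error {
	if _, ok := s[name]; ok {
		return errAlreadyExists
	}
	s[name] = obj
	return nil
}

// addHandle appends the handle unless it is already recorded, so repeated
// calls for the same snapshot do not create duplicates.
func addHandle(v *stubVGSC, handle string) {
	for _, h := range v.SnapshotHandles {
		if h == handle {
			return
		}
	}
	v.SnapshotHandles = append(v.SnapshotHandles, handle)
}

// ensureStub creates the stub if it is missing and, in every case, makes sure
// the snapshot handle is recorded on it. AlreadyExists is not treated as done.
func ensureStub(s store, name, groupHandle, snapHandle string) error {
	err := s.create(name, &stubVGSC{GroupHandle: groupHandle})
	if err != nil && !errors.Is(err, errAlreadyExists) {
		return err
	}
	addHandle(s[name], snapHandle)
	return nil
}

func main() {
	s := store{}
	// Two snapshots from the same group, plus one repeated call.
	for _, h := range []string{"snap-a", "snap-b", "snap-a"} {
		if err := ensureStub(s, "vgsc-1", "group-1", h); err != nil {
			fmt.Println("error:", err)
		}
	}
	fmt.Println("handles on stub:", s["vgsc-1"].SnapshotHandles)
}
```

Against a real API server the update after AlreadyExists would also need a fresh read and conflict handling, which this sketch leaves out.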
lyndon-li
f854a0653a Merge pull request #9678 from sseago/custom-volume-policy-1.18
[release-1.18] Add custom action type to volume policies
2026-04-08 13:42:36 +08:00
lyndon-li
cce0f20168 Merge branch 'release-1.18' into custom-volume-policy-1.18 2026-04-08 11:08:42 +08:00
Xun Jiang/Bruce Jiang
2b6b3091c2 Merge pull request #9670 from blackpiglet/xj014661/1.18/go-jose-cve
[1.18][cherry-pick] Bump github.com/go-jose/go-jose/v4 from 4.1.3 to 4.1.4
2026-04-08 10:21:04 +08:00
dependabot[bot]
4cb9c7b9a2 Bump github.com/go-jose/go-jose/v4 from 4.1.3 to 4.1.4
Bumps [github.com/go-jose/go-jose/v4](https://github.com/go-jose/go-jose) from 4.1.3 to 4.1.4.
- [Release notes](https://github.com/go-jose/go-jose/releases)
- [Commits](https://github.com/go-jose/go-jose/compare/v4.1.3...v4.1.4)

---
updated-dependencies:
- dependency-name: github.com/go-jose/go-jose/v4
  dependency-version: 4.1.4
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-04-08 10:11:48 +08:00
Xun Jiang/Bruce Jiang
a33e5a3f8f Merge pull request #9587 from blackpiglet/xj014661/1.18/9180_fix
Remove wildcard check from getNamespacesToList.
2026-04-08 09:33:59 +08:00
Xun Jiang/Bruce Jiang
f89b55269c Update pkg/util/podvolume/pod_volume_test.go
Co-authored-by: Tiger Kaovilai <passawit.kaovilai@gmail.com>
Signed-off-by: Xun Jiang/Bruce Jiang <59276555+blackpiglet@users.noreply.github.com>
2026-04-08 08:12:21 +08:00
Xun Jiang
8ac8f49b5c Remove wildcard check from getNamespacesToList.
Expand wildcards in the namespace filter only for the backup scenario.
Restore does not need this now, because it relies on the
IncludeEverything function to decide whether cluster-scoped resources
should be restored. Expanding the wildcard would break that logic.

Signed-off-by: Xun Jiang <xun.jiang@broadcom.com>
2026-04-08 08:12:21 +08:00
Scott Seago
5dd9d5242b Add custom action type to volume policies (#9540)
* Add custom action type to volume policies

Signed-off-by: Scott Seago <sseago@redhat.com>

* Update internal/resourcepolicies/resource_policies.go

Co-authored-by: Tiger Kaovilai <passawit.kaovilai@gmail.com>
Signed-off-by: Scott Seago <sseago@redhat.com>

* added "custom" to validation list

Signed-off-by: Scott Seago <sseago@redhat.com>

* responding to review comments

Signed-off-by: Scott Seago <sseago@redhat.com>

---------

Signed-off-by: Scott Seago <sseago@redhat.com>
Co-authored-by: Tiger Kaovilai <passawit.kaovilai@gmail.com>
Signed-off-by: Scott Seago <sseago@redhat.com>
2026-04-07 14:57:15 -04:00
lyndon-li
bf9e1f8fd7 Merge pull request #9671 from adam-jian-zhang/fix-node-agent-detection-1.18
fix node-agent node detection logic
2026-04-03 15:59:23 +08:00
lyndon-li
c9b5429a7a Merge branch 'release-1.18' into fix-node-agent-detection-1.18 2026-04-03 15:34:39 +08:00
lyndon-li
536e43719b Merge pull request #9672 from Lyndon-Li/release-1.18
[1.18] Issue 9659: fix crash on cancel without loading data path #9663
2026-04-03 15:27:15 +08:00
Lyndon-Li
ffede3ca6e issue 9659: fix crash on cancel without loading data path
Signed-off-by: Lyndon-Li <lyonghui@vmware.com>
2026-04-03 14:46:16 +08:00
Lyndon-Li
ed2daeedf6 issue 9659: fix crash on cancel without loading data path
Signed-off-by: Lyndon-Li <lyonghui@vmware.com>
2026-04-03 14:41:40 +08:00
Wenkai Yin(尹文开)
16f9e4f303 Merge pull request #9669 from Lyndon-Li/release-1.18
[1.18] Issue 9626: let go for uninitialized repo under readonly mode
2026-04-03 14:39:08 +08:00
Adam Zhang
ea057e42fa fix node-agent node detection logic
Add the namespace to ListOptions to fix node-agent node detection
in its deployed namespace.

Signed-off-by: Adam Zhang <adam.zhang@broadcom.com>
2026-04-03 14:31:13 +08:00
Lyndon-Li
a6e579cb93 issue 9626: let go for uninitialized repo under readonly mode
Signed-off-by: Lyndon-Li <lyonghui@vmware.com>
2026-04-03 13:21:58 +08:00
Xun Jiang/Bruce Jiang
856f1296fc Merge pull request #9661 from blackpiglet/xj014661/1.18/tarball_extraction_check
[1.18][cherry-pick] Add more check for file extraction from tarball.
2026-04-02 17:20:33 +08:00
Xun Jiang
c7fa4bfe35 Add more check for file extraction from tarball.
Signed-off-by: Xun Jiang <xun.jiang@broadcom.com>
2026-04-02 16:57:00 +08:00
Xun Jiang/Bruce Jiang
e9bc0eca53 Merge pull request #9665 from blackpiglet/xj014661/1.18/controller-runtime-tag
[1.18][cherry-pick] Pin the sigs.k8s.io/controller-runtime to v0.23.2
2026-04-02 16:56:36 +08:00
Xun Jiang
c857dff5a4 Pin the sigs.k8s.io/controller-runtime to v0.23.2
The tag used to be latest. Because the latest tag, v0.23.3, already
requires Golang v1.26 while Velero main still uses v1.25, the build failed.
To fix this, pin controller-runtime to v0.23.2.

Signed-off-by: Xun Jiang <xun.jiang@broadcom.com>
2026-04-02 15:55:21 +08:00
Xun Jiang/Bruce Jiang
1644a2c738 Merge pull request #9637 from adam-jian-zhang/fix-install-options
[1.18] fix configmap lookup in non-default namespaces
2026-04-02 15:41:29 +08:00
Adam Zhang
09795245e7 switch the call order of validate/complete
Switch the call order of validate/complete, which accomplishes
the same effect.

Signed-off-by: Adam Zhang <adam.zhang@broadcom.com>
2026-03-24 11:12:42 +08:00
Adam Zhang
cd7c9cba3e fix configmap lookup in non-default namespaces
o.Namespace is empty when Validate runs (Complete hasn't been called yet),
causing VerifyJSONConfigs to query the default namespace instead of the
intended one. Replace o.Namespace with f.Namespace() in all three ConfigMap
validation calls so the factory's already-resolved namespace is used.

Signed-off-by: Adam Zhang <adam.zhang@broadcom.com>
2026-03-23 14:55:06 +08:00
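The ordering problem described in the commit above can be reproduced in a few lines. This is a sketch under stated assumptions, not Velero's install code. The `factory`, `options`, and `configMaps` types, the `lookup` helper, and the names "velero" and "node-agent-config" are hypothetical. The rule that an empty namespace falls back to "default" is modelled directly from the commit's description.

```go
package main

import "fmt"

// factory is a hypothetical stand-in for a client factory that already
// knows the resolved namespace.
type factory struct{ namespace string }

func (f factory) Namespace() string { return f.namespace }

// options mimics a command option struct whose Namespace field is only
// filled in when Complete runs.
type options struct{ Namespace string }

func (o *options) Complete(f factory) { o.Namespace = f.Namespace() }

// configMaps is an in-memory set of existing ConfigMaps, keyed "namespace/name".
type configMaps map[string]bool

// lookup models a ConfigMap query. An empty namespace falls back to
// "default", which is the behaviour the commit message describes.
func lookup(cms configMaps, namespace, name string) error {
	if namespace == "" {
		namespace = "default"
	}
	if !cms[namespace+"/"+name] {
		return fmt.Errorf("configmap %s/%s not found", namespace, name)
	}
	return nil
}

func main() {
	f := factory{namespace: "velero"}
	cms := configMaps{"velero/node-agent-config": true}
	o := &options{}

	// Validation before Complete: o.Namespace is still empty.
	fmt.Println("o.Namespace before Complete:", lookup(cms, o.Namespace, "node-agent-config"))

	// Using the factory's namespace works regardless of call order.
	fmt.Println("f.Namespace() before Complete:", lookup(cms, f.Namespace(), "node-agent-config"))

	// Running Complete first makes o.Namespace usable as well.
	o.Complete(f)
	fmt.Println("o.Namespace after Complete:", lookup(cms, o.Namespace, "node-agent-config"))
}
```

Both remedies in this sketch lead to the same lookup, which matches the follow-up commit's remark that switching the validate/complete order has the same effect.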
Xun Jiang/Bruce Jiang
33b1fde8e1 Merge pull request #9629 from sseago/windows-polling-1.18
refactor: Optimize VSC handle readiness polling for VSS backups
2026-03-20 10:55:26 +08:00
Scott Seago
525036bc69 Merge branch 'release-1.18' into windows-polling-1.18 2026-03-19 08:44:32 -04:00
Xun Jiang/Bruce Jiang
974c465d0a Merge pull request #9631 from blackpiglet/xj014661/1.18/RepoMaintenance_e2e_fix
[1.18] Fix Repository Maintenance Job Configuration's global part E2E case.
2026-03-19 18:03:33 +08:00
Xun Jiang/Bruce Jiang
7da042a053 Merge branch 'release-1.18' into xj014661/1.18/RepoMaintenance_e2e_fix 2026-03-19 17:54:28 +08:00
lyndon-li
ca628ccc44 Merge pull request #9632 from blackpiglet/xj014661/1.18/grpc-1.79.3
[1.18][cherry-pick] Bump google.golang.org/grpc from 1.77.0 to 1.79.3
2026-03-19 17:24:19 +08:00
dependabot[bot]
6055bd5478 Bump google.golang.org/grpc from 1.77.0 to 1.79.3
Bumps [google.golang.org/grpc](https://github.com/grpc/grpc-go) from 1.77.0 to 1.79.3.
- [Release notes](https://github.com/grpc/grpc-go/releases)
- [Commits](https://github.com/grpc/grpc-go/compare/v1.77.0...v1.79.3)

---
updated-dependencies:
- dependency-name: google.golang.org/grpc
  dependency-version: 1.79.3
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-03-19 16:33:59 +08:00
Xun Jiang
f7890d3c59 Fix Repository Maintenance Job Configuration's global part E2E case.
Signed-off-by: Xun Jiang <xun.jiang@broadcom.com>
2026-03-19 16:09:50 +08:00
Scott Seago
a83ab21a9a feat: Implement early frequent polling for CSI snapshots
Co-authored-by: aider (gemini/gemini-2.5-pro) <aider@aider.chat>
Signed-off-by: Scott Seago <sseago@redhat.com>
2026-03-18 18:18:17 -04:00
Scott Seago
79f0e72fde refactor: Optimize VSC handle readiness polling for VSS backups
Co-authored-by: aider (gemini/gemini-2.5-pro) <aider@aider.chat>
Signed-off-by: Scott Seago <sseago@redhat.com>
2026-03-18 09:43:06 -04:00
lyndon-li
c5bca75f17 Merge pull request #9621 from Lyndon-Li/release-1.18
[1.18] Fix compile error for Windows
2026-03-16 14:53:19 +08:00
Lyndon-Li
fcdbc7cfa8 fix compile error for Windows
Signed-off-by: Lyndon-Li <lyonghui@vmware.com>
2026-03-16 13:54:57 +08:00
Xun Jiang/Bruce Jiang
2b87a2306e Merge pull request #9612 from vmware-tanzu/xj014661/1.18/fix_NodeAgentConfig_e2e_error
[E2E][1.18] Compare affinity by string instead of exactly same compare.
2026-03-16 10:47:03 +08:00
Xun Jiang
c239b27bf2 Compare affinity by string instead of exactly same compare.
From 1.18.1, Velero adds some default affinity in the backup/restore pod,
so we can't directly compare the whole affinity,
but we can verify if the expected affinity is contained in the pod affinity.

Signed-off-by: Xun Jiang <xun.jiang@broadcom.com>
2026-03-13 18:01:07 +08:00
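The difference between an exact comparison and a containment check, as described in the commit above, can be sketched like this. How the real E2E test renders affinity to a string is not shown in this log, so the `term` type, its `String` format, and the `containsAll` helper are assumptions for illustration only.

```go
package main

import (
	"fmt"
	"reflect"
)

// term is a hypothetical, simplified stand-in for one affinity match expression.
type term struct {
	Key      string
	Operator string
	Values   []string
}

// String renders a term in a fixed textual form used for comparison.
func (t term) String() string {
	return fmt.Sprintf("%s %s %v", t.Key, t.Operator, t.Values)
}

// containsAll reports whether every expected term appears, by its string
// form, among the actual terms. Extra terms in actual are allowed.
func containsAll(actual, expected []term) bool {
	seen := make(map[string]bool, len(actual))
	for _, a := range actual {
		seen[a.String()] = true
	}
	for _, e := range expected {
		if !seen[e.String()] {
			return false
		}
	}
	return true
}

func main() {
	// The expected affinity configured by the test (hypothetical values).
	expected := []term{{Key: "zone", Operator: "In", Values: []string{"a"}}}

	// The pod's affinity: the expected term plus a default one added elsewhere.
	actual := []term{
		{Key: "kubernetes.io/os", Operator: "In", Values: []string{"linux"}},
		{Key: "zone", Operator: "In", Values: []string{"a"}},
	}

	fmt.Println("exactly equal:", reflect.DeepEqual(actual, expected))
	fmt.Println("expected contained:", containsAll(actual, expected))
}
```

The containment check keeps passing when extra default terms are added to the pod, which an exact comparison cannot tolerate.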
lyndon-li
6ba0f86586 Merge pull request #9610 from Lyndon-Li/release-1.18
[1.18] Issue 9460: Uploader flush buffer
2026-03-12 15:37:42 +08:00
lyndon-li
6dfd8c96d0 Merge branch 'release-1.18' into release-1.18 2026-03-12 15:13:08 +08:00
Shubham Pampattiwar
336e8c4b56 Merge pull request #9604 from shubham-pampattiwar/cherry-pick-fix-dbr-stuck-release-1.18
[release-1.18] Fix DBR stuck when CSI snapshot no longer exists in cloud provider
2026-03-11 22:23:47 -07:00
Shubham Pampattiwar
883befcdde Remove cherry-picked changelog file from upstream PR
Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>
2026-03-11 20:41:06 -07:00
Shubham Pampattiwar
7cfd4af733 Add changelog for PR #9604
Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>
2026-03-11 20:41:06 -07:00
Shubham Pampattiwar
4cc1779fec Fix DBR stuck when CSI snapshot no longer exists in cloud provider (#9581)
* Fix DBR stuck when CSI snapshot no longer exists in cloud provider

During backup deletion, VolumeSnapshotContentDeleteItemAction creates a
new VSC with the snapshot handle from the backup and polls for readiness.
If the underlying snapshot no longer exists (e.g., deleted externally),
the CSI driver reports Status.Error but checkVSCReadiness() only checks
ReadyToUse, causing it to poll for the full 10-minute timeout instead of
failing fast. Additionally, the newly created VSC is never cleaned up on
failure, leaving orphaned resources in the cluster.

This commit:
- Adds Status.Error detection in checkVSCReadiness() to fail immediately
  on permanent CSI driver errors (e.g., InvalidSnapshot.NotFound)
- Cleans up the dangling VSC when readiness polling fails

Fixes #9579

Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>

* Add changelog for PR #9581

Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>

* Fix typo in pod_volume_test.go: colume -> volume

Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>

---------

Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>
2026-03-11 20:41:06 -07:00
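The fail-fast readiness check and cleanup described in the commit above can be sketched as follows. This is not Velero's checkVSCReadiness. The `vscStatus` type and the helpers `checkReadiness`, `pollUntilReady`, and `waitOrCleanup` are hypothetical. The sketch also counts attempts instead of waiting on a real clock, so the 10-minute timeout is represented only by `maxAttempts`.

```go
package main

import (
	"errors"
	"fmt"
)

// errTimeout is returned when the attempt budget is exhausted.
var errTimeout = errors.New("timed out waiting for snapshot content")

// vscStatus is a hypothetical, trimmed-down view of a VolumeSnapshotContent status.
type vscStatus struct {
	ReadyToUse bool
	Error      string // non-empty when the driver reported an error
}

// checkReadiness reports whether the object is ready. A reported error is
// returned immediately instead of being treated as "not ready yet".
func checkReadiness(st vscStatus) (bool, error) {
	if st.Error != "" {
		return false, fmt.Errorf("snapshot content failed: %s", st.Error)
	}
	return st.ReadyToUse, nil
}

// pollUntilReady calls get up to maxAttempts times and returns the number of
// attempts used. It stops early on readiness or on a reported error.
func pollUntilReady(get func() vscStatus, maxAttempts int) (int, error) {
	for i := 1; i <= maxAttempts; i++ {
		ready, err := checkReadiness(get())
		if err != nil {
			return i, err
		}
		if ready {
			return i, nil
		}
	}
	return maxAttempts, errTimeout
}

// waitOrCleanup runs cleanup whenever polling fails, so the temporary
// object is not left behind.
func waitOrCleanup(get func() vscStatus, cleanup func(), maxAttempts int) error {
	_, err := pollUntilReady(get, maxAttempts)
	if err != nil {
		cleanup()
	}
	return err
}

func main() {
	// The driver keeps reporting that the snapshot is gone.
	missing := func() vscStatus { return vscStatus{Error: "InvalidSnapshot.NotFound"} }

	attempts, err := pollUntilReady(missing, 100)
	fmt.Println("attempts used:", attempts, "error:", err)

	err = waitOrCleanup(missing, func() { fmt.Println("deleting temporary object") }, 100)
	fmt.Println("result:", err)
}
```

A check that looked only at ReadyToUse would use the whole attempt budget in this scenario, whereas the error check ends the wait on the first poll.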
Lyndon-Li
ce2b4c191f issue 9460: flush buffer when uploader completes
Signed-off-by: Lyndon-Li <lyonghui@vmware.com>
2026-03-12 11:26:30 +08:00
Lyndon-Li
1e6f02dc24 flush volume after restore
Signed-off-by: Lyndon-Li <lyonghui@vmware.com>
2026-03-12 11:19:48 +08:00
Lyndon-Li
e2bbace03b uploader flush buffer for restore
Signed-off-by: Lyndon-Li <lyonghui@vmware.com>
2026-03-12 11:19:25 +08:00
lyndon-li
341597f542 Merge pull request #9609 from Lyndon-Li/release-1.18
[1.18] Issue 9475: Selected node to node selector
2026-03-12 11:18:32 +08:00
Lyndon-Li
ea97ef8279 node-selector for selected node
Signed-off-by: Lyndon-Li <lyonghui@vmware.com>
2026-03-12 10:47:22 +08:00