2797 Commits

Author SHA1 Message Date
Adam Zhang
0291c53e9d Fix PodVolumeBackup list scope during restore
Restrict the listing of PodVolumeBackup resources to the specific
restore namespace in both the core restore controller and the pod
volume restore action plugin. This prevents "Forbidden" errors when
Velero is configured with namespace-scoped minimum privileges,
avoiding the need for cluster-scoped list permissions for
PodVolumeBackups.

Fixes: #9681

Signed-off-by: Adam Zhang <adam.zhang@broadcom.com>
2026-04-09 09:57:04 +08:00
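
The effect of the namespace-scoped listing in commit 0291c53e9d can be illustrated with a small model. This is only a sketch under stated assumptions: `fakeClient`, `podVolumeBackup`, and `errForbidden` are hypothetical stand-ins, not Velero or Kubernetes client types. The fake client mimics credentials that hold a namespace-scoped role and no cluster-scoped list permission.

```go
package main

import (
	"errors"
	"fmt"
)

// errForbidden stands in for the API server's "Forbidden" response.
var errForbidden = errors.New("forbidden: cannot list podvolumebackups in this scope")

// podVolumeBackup is a simplified, hypothetical stand-in for the real resource.
type podVolumeBackup struct {
	Namespace string
	Name      string
}

// fakeClient models credentials that may only list inside allowedNamespace.
type fakeClient struct {
	allowedNamespace string
	items            []podVolumeBackup
}

// list returns the PodVolumeBackups in namespace. An empty namespace means
// "all namespaces", which would need a cluster-scoped permission.
func (c *fakeClient) list(namespace string) ([]podVolumeBackup, error) {
	if namespace != c.allowedNamespace {
		return nil, errForbidden
	}
	var out []podVolumeBackup
	for _, pvb := range c.items {
		if pvb.Namespace == namespace {
			out = append(out, pvb)
		}
	}
	return out, nil
}

func main() {
	c := &fakeClient{
		allowedNamespace: "velero",
		items:            []podVolumeBackup{{Namespace: "velero", Name: "pvb-1"}},
	}

	// Listing across all namespaces is rejected for these credentials.
	_, err := c.list("")
	fmt.Println("cluster-wide list error:", err)

	// Listing only in the restore namespace succeeds.
	pvbs, err := c.list("velero")
	fmt.Println("namespaced list:", len(pvbs), "item(s), error:", err)
}
```
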
Shubham Pampattiwar
5ad4e604b8 [release-1.18] Fix VolumeGroupSnapshot restore failure with Ceph RBD CSI driver (#9687)
* Fix VolumeGroupSnapshot restore failure with Ceph RBD CSI driver (#9516)

* Fix VolumeGroupSnapshot restore on Ceph RBD

This PR fixes two related issues affecting CSI snapshot restore on Ceph RBD:

1. VolumeGroupSnapshot restore fails because Ceph RBD populates
   volumeGroupSnapshotHandle on pre-provisioned VSCs, but Velero doesn't
   create the required VGSC during restore.

2. CSI snapshot restore fails because VolumeSnapshotClassName is removed
   from restored VSCs, preventing the CSI controller from getting
   credentials for snapshot verification.

Changes:
- Capture volumeGroupSnapshotHandle during backup as VS annotation
- Create stub VGSC during restore with matching handle in status
- Look up VolumeSnapshotClass by driver and set on restored VSC

Fixes #9512
Fixes #9515

Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>

* Add changelog for VGS restore fix

Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>

* Fix gofmt import order

Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>

* Add changelog for VGS restore fix

Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>

* Fix import alias corev1 to corev1api per lint config

Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>

* Fix: Add snapshot handles to existing stub VGSC and add unit tests

When multiple VolumeSnapshots from the same VolumeGroupSnapshot are
restored, they share the same VolumeGroupSnapshotHandle but have
different individual snapshot handles. This commit:

1. Fixes incomplete logic where existing VGSC wasn't updated with
   new snapshot handles (addresses review feedback)

2. Fixes race condition where Create returning AlreadyExists would
   skip adding the snapshot handle

3. Adds comprehensive unit tests for ensureStubVGSCExists (5 cases)
   and addSnapshotHandleToVGSC (4 cases) functions

Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>

* Clean up stub VolumeGroupSnapshotContents during restore finalization

Add cleanup logic for stub VGSCs created during VolumeGroupSnapshot restore.
The stub VGSCs are temporary objects needed to satisfy CSI controller
validation during VSC reconciliation. Once all related VSCs become
ReadyToUse, the stub VGSCs are no longer needed and should be removed.

The cleanup runs in the restore finalizer controller's execute() phase.
Before deleting each VGSC, it polls until all related VolumeSnapshotContents
(correlated by snapshot handle) are ReadyToUse, with a timeout fallback.
Deletion failures and CRD-not-installed scenarios are treated as warnings
rather than errors to avoid failing the restore.

Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>

* Fix lint: remove unused nolint directive and simplify cleanupStubVGSC return

The cleanupStubVGSC function only produces warnings (not errors), so
simplify its return signature. Also remove the now-unused nolint:unparam
directive on execute() since warnings are no longer always nil.

Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>

---------

Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>

* Rename changelog file to match cherry-pick PR number

Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>

---------

Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>
2026-04-08 12:45:02 -07:00
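
Commit 5ad4e604b8 describes a create-or-update flow in which a Create call that returns AlreadyExists must still record the new snapshot handle. A minimal sketch of that idea follows. All types and function names here (`store`, `stubVGSC`, `ensureStub`, `addSnapshotHandle`) are hypothetical and are not the signatures of the functions named in the commit message; the in-memory map replaces the API server, and a real controller would also need to handle update conflicts.

```go
package main

import (
	"errors"
	"fmt"
)

// errAlreadyExists stands in for the API server's AlreadyExists error.
var errAlreadyExists = errors.New("already exists")

// stubVGSC is a simplified stand-in for a stub VolumeGroupSnapshotContent.
type stubVGSC struct {
	GroupHandle     string
	SnapshotHandles []string
}

// store is an in-memory stand-in for the API server.
type store struct {
	objects map[string]*stubVGSC
}

// create fails with errAlreadyExists when the name is taken.
func (s *store) create(name string, obj *stubVGSC) error {
	if _, ok := s.objects[name]; ok {
		return errAlreadyExists
	}
	s.objects[name] = obj
	return nil
}

// addSnapshotHandle appends handle unless it is already recorded.
func addSnapshotHandle(obj *stubVGSC, handle string) {
	for _, h := range obj.SnapshotHandles {
		if h == handle {
			return
		}
	}
	obj.SnapshotHandles = append(obj.SnapshotHandles, handle)
}

// ensureStub creates the stub if it is missing. If another restore already
// created it, the AlreadyExists error is not treated as "nothing to do":
// the snapshot handle is still added to the existing object.
func ensureStub(s *store, name, groupHandle, snapshotHandle string) error {
	err := s.create(name, &stubVGSC{
		GroupHandle:     groupHandle,
		SnapshotHandles: []string{snapshotHandle},
	})
	if err == nil {
		return nil
	}
	if !errors.Is(err, errAlreadyExists) {
		return err
	}
	addSnapshotHandle(s.objects[name], snapshotHandle)
	return nil
}

func main() {
	s := &store{objects: map[string]*stubVGSC{}}
	_ = ensureStub(s, "vgsc-1", "group-handle", "snap-a")
	_ = ensureStub(s, "vgsc-1", "group-handle", "snap-b")
	fmt.Println(s.objects["vgsc-1"].SnapshotHandles)
}
```
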
lyndon-li
cce0f20168 Merge branch 'release-1.18' into custom-volume-policy-1.18 2026-04-08 11:08:42 +08:00
Xun Jiang/Bruce Jiang
f89b55269c Update pkg/util/podvolume/pod_volume_test.go
Co-authored-by: Tiger Kaovilai <passawit.kaovilai@gmail.com>
Signed-off-by: Xun Jiang/Bruce Jiang <59276555+blackpiglet@users.noreply.github.com>
2026-04-08 08:12:21 +08:00
Xun Jiang
8ac8f49b5c Remove wildcard check from getNamespacesToList.
Expand wildcards in the namespace filter only in the backup scenario.
Restore doesn't need this now, because restore relies on the
IncludeEverything function to check whether cluster-scoped resources
should be restored. Expanding the wildcard would break that logic.

Signed-off-by: Xun Jiang <xun.jiang@broadcom.com>
2026-04-08 08:12:21 +08:00
Scott Seago
5dd9d5242b Add custom action type to volume policies (#9540)
* Add custom action type to volume policies

Signed-off-by: Scott Seago <sseago@redhat.com>

* Update internal/resourcepolicies/resource_policies.go

Co-authored-by: Tiger Kaovilai <passawit.kaovilai@gmail.com>
Signed-off-by: Scott Seago <sseago@redhat.com>

* added "custom" to validation list

Signed-off-by: Scott Seago <sseago@redhat.com>

* responding to review comments

Signed-off-by: Scott Seago <sseago@redhat.com>

---------

Signed-off-by: Scott Seago <sseago@redhat.com>
Co-authored-by: Tiger Kaovilai <passawit.kaovilai@gmail.com>
Signed-off-by: Scott Seago <sseago@redhat.com>
2026-04-07 14:57:15 -04:00
lyndon-li
c9b5429a7a Merge branch 'release-1.18' into fix-node-agent-detection-1.18 2026-04-03 15:34:39 +08:00
Lyndon-Li
ffede3ca6e issue 9659: fix crash on cancel without loading data path
Signed-off-by: Lyndon-Li <lyonghui@vmware.com>
2026-04-03 14:46:16 +08:00
Lyndon-Li
ed2daeedf6 issue 9659: fix crash on cancel without loading data path
Signed-off-by: Lyndon-Li <lyonghui@vmware.com>
2026-04-03 14:41:40 +08:00
Adam Zhang
ea057e42fa fix node-agent node detection logic
Add namespace in ListOptions, to fix node-agent node detection
in its deployed namespace.

Signed-off-by: Adam Zhang <adam.zhang@broadcom.com>
2026-04-03 14:31:13 +08:00
Lyndon-Li
a6e579cb93 issue 9626: let go for uninitialized repo under readonly mode
Signed-off-by: Lyndon-Li <lyonghui@vmware.com>
2026-04-03 13:21:58 +08:00
Xun Jiang
c7fa4bfe35 Add more checks for file extraction from tarball.
Signed-off-by: Xun Jiang <xun.jiang@broadcom.com>
2026-04-02 16:57:00 +08:00
Adam Zhang
09795245e7 switch the call order of validate/complete
Switch the call order of validate/complete, which accomplishes
the same effect.

Signed-off-by: Adam Zhang <adam.zhang@broadcom.com>
2026-03-24 11:12:42 +08:00
Adam Zhang
cd7c9cba3e fix configmap lookup in non-default namespaces
o.Namespace is empty when Validate runs (Complete hasn't been called yet),
causing VerifyJSONConfigs to query the default namespace instead of the
intended one. Replace o.Namespace with f.Namespace() in all three ConfigMap
validation calls so the factory's already-resolved namespace is used.

Signed-off-by: Adam Zhang <adam.zhang@broadcom.com>
2026-03-23 14:55:06 +08:00
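
The ordering problem described in commit cd7c9cba3e can be reproduced in a few lines. This is a sketch with hypothetical types (`factory`, `options`) and hypothetical method names; it assumes, as the commit message states, that an empty namespace falls back to the default namespace.

```go
package main

import "fmt"

// factory is a hypothetical stand-in for the CLI client factory, which
// already knows the resolved namespace.
type factory struct {
	namespace string
}

func (f *factory) Namespace() string { return f.namespace }

// options is a hypothetical stand-in for the command options struct.
type options struct {
	Namespace string
}

// Complete copies the factory's namespace into the options.
func (o *options) Complete(f *factory) {
	o.Namespace = f.Namespace()
}

// lookupNamespaceFromOptions shows the bug: before Complete runs,
// o.Namespace is empty, so the lookup falls back to "default".
func (o *options) lookupNamespaceFromOptions() string {
	if o.Namespace == "" {
		return "default"
	}
	return o.Namespace
}

// lookupNamespaceFromFactory shows the fix: ask the factory directly,
// which works regardless of whether Complete has run yet.
func (o *options) lookupNamespaceFromFactory(f *factory) string {
	return f.Namespace()
}

func main() {
	f := &factory{namespace: "backup-system"}
	o := &options{}

	// Validation runs before Complete in this scenario.
	fmt.Println("from options:", o.lookupNamespaceFromOptions())
	fmt.Println("from factory:", o.lookupNamespaceFromFactory(f))
}
```

Calling Complete before validation, as the later commit 09795245e7 does, gives the same result by a different route.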
Scott Seago
a83ab21a9a feat: Implement early frequent polling for CSI snapshots
Co-authored-by: aider (gemini/gemini-2.5-pro) <aider@aider.chat>
Signed-off-by: Scott Seago <sseago@redhat.com>
2026-03-18 18:18:17 -04:00
Scott Seago
79f0e72fde refactor: Optimize VSC handle readiness polling for VSS backups
Co-authored-by: aider (gemini/gemini-2.5-pro) <aider@aider.chat>
Signed-off-by: Scott Seago <sseago@redhat.com>
2026-03-18 09:43:06 -04:00
Lyndon-Li
fcdbc7cfa8 fix compile error for Windows
Signed-off-by: Lyndon-Li <lyonghui@vmware.com>
2026-03-16 13:54:57 +08:00
Lyndon-Li
ce2b4c191f issue 9460: flush buffer when uploader completes
Signed-off-by: Lyndon-Li <lyonghui@vmware.com>
2026-03-12 11:26:30 +08:00
Lyndon-Li
1e6f02dc24 flush volume after restore
Signed-off-by: Lyndon-Li <lyonghui@vmware.com>
2026-03-12 11:19:48 +08:00
Lyndon-Li
e2bbace03b uploader flush buffer for restore
Signed-off-by: Lyndon-Li <lyonghui@vmware.com>
2026-03-12 11:19:25 +08:00
Lyndon-Li
384a492aa2 replace nodeName with node selector
Signed-off-by: Lyndon-Li <lyonghui@vmware.com>
2026-03-12 10:45:11 +08:00
lyndon-li
e4774b32f3 Merge branch 'release-1.18' into release-1.18 2026-03-11 20:14:40 +08:00
Xun Jiang/Bruce Jiang
ea2c4f4e5c Merge branch 'release-1.18' into xj014661/1.18/ephemeral_storage_config 2026-03-11 18:14:54 +08:00
Lyndon-Li
2c0fddc498 support custom os
Signed-off-by: Lyndon-Li <lyonghui@vmware.com>
2026-03-11 18:02:34 +08:00
Lyndon-Li
eac69375c9 support custom os
Signed-off-by: Lyndon-Li <lyonghui@vmware.com>
2026-03-11 17:56:53 +08:00
Lyndon-Li
733b2eb6f5 support customized host os
Signed-off-by: Lyndon-Li <lyonghui@vmware.com>
2026-03-11 17:56:14 +08:00
Lyndon-Li
01bd153968 support customized host os - use affinity for host os selection
Signed-off-by: Lyndon-Li <lyonghui@vmware.com>
2026-03-11 17:56:08 +08:00
Lyndon-Li
57892169a9 support customized host os
Signed-off-by: Lyndon-Li <lyonghui@vmware.com>
2026-03-11 16:09:12 +08:00
lyndon-li
77c60589d6 Merge branch 'release-1.18' into release-1.18 2026-03-11 15:15:05 +08:00
Xun Jiang
9a39cbfbf5 Remove the skipped item from the resource list when it's skipped by BIA.
Signed-off-by: Xun Jiang <xun.jiang@broadcom.com>
2026-03-10 17:51:20 +08:00
Xun Jiang
62a24ece50 If a BIA returns updateObj with SkipFromBackupAnnotation, treat it as skipping the resource from backup.
Signed-off-by: Xun Jiang <xun.jiang@broadcom.com>
2026-03-10 17:49:59 +08:00
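
The behavior described in commits 62a24ece50 and 9a39cbfbf5 can be sketched roughly as follows. Everything in this block is an assumption made for illustration: the annotation key and value, the `item` type, and the `action` function type are hypothetical and do not reflect Velero's plugin interface.

```go
package main

import "fmt"

// skipAnnotation is a hypothetical key; the real annotation name may differ.
const skipAnnotation = "example.io/skip-from-backup"

// item is a simplified stand-in for a Kubernetes object.
type item struct {
	Name        string
	Annotations map[string]string
}

// action is a stand-in for a backup item action: it receives an item and
// returns a possibly modified copy.
type action func(item) item

// backupItems runs act on every item and returns the names that end up in
// the backup's resource list. An item that comes back carrying the skip
// annotation is left out of the list.
func backupItems(items []item, act action) []string {
	var backedUp []string
	for _, it := range items {
		updated := act(it)
		if updated.Annotations[skipAnnotation] == "true" {
			continue // skipped by the action, so not recorded
		}
		backedUp = append(backedUp, updated.Name)
	}
	return backedUp
}

func main() {
	// This example action asks for item "b" to be skipped.
	skipB := func(it item) item {
		if it.Name == "b" {
			it.Annotations = map[string]string{skipAnnotation: "true"}
		}
		return it
	}
	items := []item{{Name: "a"}, {Name: "b"}, {Name: "c"}}
	fmt.Println(backupItems(items, skipB))
}
```
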
Xun Jiang
b85a8f6784 Add ephemeral storage limit and request support for data mover and maintenance job.
Signed-off-by: Xun Jiang <xun.jiang@broadcom.com>
2026-03-10 17:42:06 +08:00
Lyndon-Li
d39285be32 issue 9343: include PV topology to data mover pod affinities
Signed-off-by: Lyndon-Li <lyonghui@vmware.com>
2026-03-10 15:56:01 +08:00
Lyndon-Li
c30164c355 issue 9343: include PV topology to data mover pod affinities
Signed-off-by: Lyndon-Li <lyonghui@vmware.com>
2026-03-10 15:53:39 +08:00
Lyndon-Li
ce0888ee44 issue 9343: include PV topology to data mover pod affinities
Signed-off-by: Lyndon-Li <lyonghui@vmware.com>
2026-03-10 15:38:27 +08:00
Lyndon-Li
c87e8acbf4 remove unnecessary changelogs for 1.18.0
Signed-off-by: Lyndon-Li <lyonghui@vmware.com>
2026-03-09 17:47:12 +08:00
Xun Jiang/Bruce Jiang
5db4c65a92 Merge branch 'release-1.18' into 1.18-9508 2026-02-12 11:32:06 +08:00
Tiger Kaovilai
5b54ccd2e0 Fix VolumePolicy PVC phase condition filter for unbound PVCs
Use typed error approach: Make GetPVForPVC return ErrPVNotFoundForPVC
when PV is not expected to be found (unbound PVC), then use errors.Is
to check for this error type. When a matching policy exists (e.g.,
pvcPhase: [Pending, Lost] with action: skip), apply the action without
error. When no policy matches, return the original error to preserve
default behavior.

Changes:
- Add ErrPVNotFoundForPVC sentinel error to pvc_pv.go
- Update ShouldPerformSnapshot to handle unbound PVCs with policies
- Update ShouldPerformFSBackup to handle unbound PVCs with policies
- Update item_backupper.go to handle Lost PVCs in tracking functions
- Remove checkPVCOnlySkip helper (no longer needed)
- Update tests to reflect new behavior

Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-11 15:29:14 -05:00
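
The typed-error approach in commit 5b54ccd2e0 follows the standard Go sentinel-error pattern. The sketch below shows that pattern under simplified assumptions: the `pvc` type, the lowercase `getPVForPVC` helper, and the `skipPhases` parameter are hypothetical and much smaller than the real volume policy code. Only the sentinel name `ErrPVNotFoundForPVC` is taken from the commit message.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrPVNotFoundForPVC is the sentinel returned when a PVC has no bound PV.
var ErrPVNotFoundForPVC = errors.New("PV not found for PVC")

// pvc is a simplified stand-in for a PersistentVolumeClaim.
type pvc struct {
	Name       string
	Phase      string
	VolumeName string
}

// getPVForPVC wraps the sentinel so callers can detect it with errors.Is
// while still seeing which PVC was involved.
func getPVForPVC(c pvc) (string, error) {
	if c.VolumeName == "" {
		return "", fmt.Errorf("pvc %s: %w", c.Name, ErrPVNotFoundForPVC)
	}
	return c.VolumeName, nil
}

// shouldSnapshot applies a hypothetical "skip these phases" policy.
// If the PVC is unbound and its phase matches the policy, the skip action
// is applied without error. If no policy matches, the original error is
// returned so the default behavior is preserved.
func shouldSnapshot(c pvc, skipPhases []string) (bool, error) {
	_, err := getPVForPVC(c)
	if err == nil {
		return true, nil
	}
	if errors.Is(err, ErrPVNotFoundForPVC) {
		for _, phase := range skipPhases {
			if phase == c.Phase {
				return false, nil
			}
		}
	}
	return false, err
}

func main() {
	pending := pvc{Name: "data", Phase: "Pending"}

	ok, err := shouldSnapshot(pending, []string{"Pending", "Lost"})
	fmt.Println("with policy:", ok, err)

	ok, err = shouldSnapshot(pending, nil)
	fmt.Println("without policy:", ok, err)
}
```
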
Joseph Antony Vaikath
43b926a58b Support all glob wildcard characters in namespace validation (#9502)
* Support all glob wildcard characters in namespace validation

Expand namespace validation to allow all valid glob pattern characters
(*, ?, {}, [], ,) by replacing them with valid characters during RFC 1123
validation. The actual glob pattern validation is handled separately by
the wildcard package.

Also add validation to reject unsupported characters (|, (), !) that are
not valid in glob patterns, and update terminology from "regex" to "glob"
for clarity since this implementation uses glob patterns, not regex.

Changes:
- Replace all glob wildcard characters in validateNamespaceName
- Add test coverage for valid glob patterns in includes/excludes
- Add test coverage for unsupported characters
- Reject exclamation mark (!) in wildcard patterns
- Clarify comments and error messages about glob vs regex

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Joseph <jvaikath@redhat.com>

* Changelog

Signed-off-by: Joseph <jvaikath@redhat.com>

* Add documentation: glob patterns are now accepted

Signed-off-by: Joseph <jvaikath@redhat.com>

* Error message fix

Signed-off-by: Joseph <jvaikath@redhat.com>

* Remove negation glob char test

Signed-off-by: Joseph <jvaikath@redhat.com>

* Add bracket pattern validation for namespace glob patterns

Extends wildcard validation to support square bracket patterns [] used in glob character classes. Validates bracket syntax including empty brackets, unclosed brackets, and unmatched brackets. Extracts ValidateNamespaceName as a public function to enable reuse in namespace validation logic.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Joseph <jvaikath@redhat.com>

* Reduce scope to *, ?, [ and ]

Signed-off-by: Joseph <jvaikath@redhat.com>

* Fix tests

Signed-off-by: Joseph <jvaikath@redhat.com>

* Add namespace glob patterns documentation page

Adds dedicated documentation explaining supported glob patterns
for namespace include/exclude filtering to help users understand
the wildcard syntax.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Joseph <jvaikath@redhat.com>

* Fix build-image Dockerfile envtest download

Replace inaccessible go.kubebuilder.io URL with setup-envtest and update envtest version to 1.33.0 to match Kubernetes v0.33.3 dependencies.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Joseph <jvaikath@redhat.com>

* kubebuilder binaries mv

Signed-off-by: Joseph <jvaikath@redhat.com>

* Reject brace patterns and update documentation

Add {, }, and , to unsupported characters list to explicitly reject
brace expansion patterns. Remove { from wildcard detection since these
patterns are not supported in the 1.18 release.

Update all documentation to show supported patterns inline (*, ?, [abc])
with clickable links to the detailed namespace-glob-patterns page.
Simplify YAML comments by removing non-clickable URLs.

Update tests to expect errors when brace patterns are used.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Joseph <jvaikath@redhat.com>

* Document brace expansion as unsupported

Add {} and , to the unsupported patterns section to clarify that
brace expansion patterns like {a,b,c} are not supported.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Joseph <jvaikath@redhat.com>

* Update tests to expect brace pattern rejection

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Joseph <jvaikath@redhat.com>

---------

Signed-off-by: Joseph <jvaikath@redhat.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-11 15:14:06 -05:00
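
The validation strategy described in commit 43b926a58b (replace the supported glob characters with a valid character, then run the RFC 1123 check, and reject brace patterns and other unsupported characters up front) might look roughly like the sketch below. The function names, error texts, and the exact bracket rules are assumptions for illustration; this is not the code from the commit.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// rfc1123Label matches a lowercase RFC 1123 DNS label.
var rfc1123Label = regexp.MustCompile(`^[a-z0-9]([-a-z0-9]*[a-z0-9])?$`)

// unsupportedChars lists characters the commit message says are rejected.
const unsupportedChars = "{},|()!"

// checkBrackets rejects empty, unclosed, unmatched, and nested brackets.
func checkBrackets(p string) error {
	open := false
	start := 0
	for i, r := range p {
		switch r {
		case '[':
			if open {
				return fmt.Errorf("nested '[' in %q", p)
			}
			open, start = true, i
		case ']':
			if !open {
				return fmt.Errorf("unmatched ']' in %q", p)
			}
			if i == start+1 {
				return fmt.Errorf("empty brackets in %q", p)
			}
			open = false
		}
	}
	if open {
		return fmt.Errorf("unclosed '[' in %q", p)
	}
	return nil
}

// validateNamespacePattern is a hypothetical validator for namespace
// include/exclude entries that may contain *, ?, and [...] patterns.
func validateNamespacePattern(p string) error {
	if strings.ContainsAny(p, unsupportedChars) {
		return fmt.Errorf("pattern %q contains an unsupported character", p)
	}
	if err := checkBrackets(p); err != nil {
		return err
	}
	// Substitute a valid character for each glob character so that the
	// rest of the string can be checked as an ordinary namespace name.
	plain := strings.NewReplacer("*", "a", "?", "a", "[", "a", "]", "a").Replace(p)
	if !rfc1123Label.MatchString(plain) {
		return fmt.Errorf("pattern %q is not a valid namespace name", p)
	}
	return nil
}

func main() {
	for _, p := range []string{"prod-*", "ns-[abc]", "ns-{a,b}", "ns-[", "Bad_Name"} {
		fmt.Printf("%-10s -> %v\n", p, validateNamespacePattern(p))
	}
}
```
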
Xun Jiang/Bruce Jiang
386599638f Merge pull request #9510 from Lyndon-Li/ignore-cache-volume-config-without-backup-repo-config
Log when cache volume configured but backup repo is not
2026-02-02 10:50:20 +08:00
Lyndon-Li
9796da389d ignore cache volume config when backup repo config is not provided
Signed-off-by: Lyndon-Li <lyonghui@vmware.com>
2026-01-30 18:35:36 +08:00
Scott Seago
dfb1d45831 Remove backup from running list when backup fails validation (#9498)
Signed-off-by: Scott Seago <sseago@redhat.com>
2026-01-27 16:25:30 -05:00
Xun Jiang/Bruce Jiang
72beb35edc Maintenance Job only uses the first element of the LoadAffinity array from the ConfigMap. (#9494)
Signed-off-by: Xun Jiang <xun.jiang@broadcom.com>
2026-01-23 14:27:50 -05:00
Wenkai Yin(尹文开)
7442d20f9d Merge pull request #9481 from Lyndon-Li/issue-fix-9478
Issue 9478: Diagnose expose on peek error
2026-01-15 16:53:57 +08:00
Lyndon-Li
e72fea8ecd fix issue for cache volume
Signed-off-by: Lyndon-Li <lyonghui@vmware.com>
2026-01-14 17:45:01 +08:00
Lyndon-Li
e703e06eeb diagnose expose on peek error
Signed-off-by: Lyndon-Li <lyonghui@vmware.com>
2026-01-13 16:33:14 +08:00
Xun Jiang
b7289b51c7 Add Role, RoleBinding, ClusterRole, and ClusterRoleBinding in restore sequence.
Ensure the RBAC resources are restored before pods.
This change helps avoid pod startup errors when a pod depends on RBAC resources.
For example, the Prometheus operator checks whether it has enough permissions before
launching its controllers. If the Prometheus operator pod starts before the RBAC
resources are created, it will not launch the controllers, and it will not retry.
f7f07bcdfb/cmd/operator/main.go (L392-L400)

Signed-off-by: Xun Jiang <xun.jiang@broadcom.com>
2026-01-07 12:40:23 +08:00
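
The ordering idea in commit b7289b51c7 can be sketched as a sort against a priority list. The list below is a hypothetical excerpt chosen for illustration and is not Velero's actual default restore sequence; the only property it is meant to show is that the four RBAC kinds sort ahead of pods.

```go
package main

import (
	"fmt"
	"sort"
)

// restoreOrder is a hypothetical excerpt of a restore priority list.
var restoreOrder = []string{
	"namespaces",
	"serviceaccounts",
	"roles",
	"rolebindings",
	"clusterroles",
	"clusterrolebindings",
	"pods",
}

// sortByRestoreOrder returns a copy of resources with prioritized kinds
// first, in list order, followed by all other kinds in their input order.
func sortByRestoreOrder(resources []string) []string {
	rank := make(map[string]int, len(restoreOrder))
	for i, r := range restoreOrder {
		rank[r] = i
	}
	out := append([]string(nil), resources...)
	sort.SliceStable(out, func(i, j int) bool {
		ri, okI := rank[out[i]]
		rj, okJ := rank[out[j]]
		if okI && okJ {
			return ri < rj
		}
		// A prioritized kind sorts before an unlisted one.
		return okI && !okJ
	})
	return out
}

func main() {
	in := []string{"pods", "deployments", "clusterrolebindings", "roles", "configmaps"}
	fmt.Println(sortByRestoreOrder(in))
}
```
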
lyndon-li
6eae73f0bf Merge pull request #9466 from Lyndon-Li/collect-kopia-content-log
Collect kopia content log
2026-01-06 15:51:47 +08:00
Lubron Zhan
0d80995e62 Update the logging to print correct affinity field
Signed-off-by: Lubron Zhan <lubronzhan@gmail.com>
2026-01-01 11:37:21 -08:00
Lyndon-Li
1425ebb369 collect kopia content log
Signed-off-by: Lyndon-Li <lyonghui@vmware.com>
2025-12-31 15:42:14 +08:00