Compare commits

..

174 Commits

Author SHA1 Message Date
dependabot[bot]
ac1f6e7f3e Bump github.com/moby/spdystream from 0.5.0 to 0.5.1
Bumps [github.com/moby/spdystream](https://github.com/moby/spdystream) from 0.5.0 to 0.5.1.
- [Release notes](https://github.com/moby/spdystream/releases)
- [Commits](https://github.com/moby/spdystream/compare/v0.5.0...v0.5.1)

---
updated-dependencies:
- dependency-name: github.com/moby/spdystream
  dependency-version: 0.5.1
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-04-16 20:51:51 +00:00
Scott Seago
8e9e6b4d36 added parallel backup configuration to install docs (#9729)
Some checks failed
Run the E2E test on kind / get-go-version (push) Failing after 8m3s
Run the E2E test on kind / build (push) Has been skipped
Run the E2E test on kind / setup-test-matrix (push) Successful in 3s
Run the E2E test on kind / run-e2e-test (push) Has been skipped
Main CI / get-go-version (push) Successful in 14s
Main CI / Build (push) Failing after 40s
Close stale issues and PRs / stale (push) Successful in 11s
Trivy Nightly Scan / Trivy nightly scan (velero, main) (push) Failing after 1m27s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-aws, main) (push) Failing after 1m8s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-gcp, main) (push) Failing after 1m14s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-microsoft-azure, main) (push) Failing after 1m8s
* added parallel backup configuration to install docs

Signed-off-by: Scott Seago <sseago@redhat.com>

* Update site/content/docs/main/customize-installation.md

Co-authored-by: Tiger Kaovilai <passawit.kaovilai@gmail.com>
Signed-off-by: Scott Seago <sseago@redhat.com>

---------

Signed-off-by: Scott Seago <sseago@redhat.com>
Co-authored-by: Tiger Kaovilai <passawit.kaovilai@gmail.com>
2026-04-16 14:42:48 -04:00
lyndon-li
71b230f82e Merge pull request #9728 from blackpiglet/9490_fix
Some checks failed
Run the E2E test on kind / get-go-version (push) Failing after 1m18s
Run the E2E test on kind / build (push) Has been skipped
Run the E2E test on kind / setup-test-matrix (push) Successful in 5s
Run the E2E test on kind / run-e2e-test (push) Has been skipped
Main CI / get-go-version (push) Successful in 22s
Main CI / Build (push) Failing after 33s
Remove Restic build from Dockerfile, Makefile and Tiltfile.
2026-04-16 18:07:36 +08:00
Nolan Emirot
fc6361ba06 perf: better string concatenation (#9705)
* perf: better string concatenation

Signed-off-by: emirot <emirot.nolan@gmail.com>
Signed-off-by: nolanemirot <nolan.emirot@broadcom.com>
Signed-off-by: emirot <emirot.nolan@gmail.com>

* fix: backup deletion silently succeeds when tarball download fails (#9693)

* Enhance backup deletion logic to handle tarball download failures and clean up associated CSI VolumeSnapshotContents
Signed-off-by: Priyansh Choudhary <im1706@gmail.com>

* added changelog
Signed-off-by: Priyansh Choudhary <im1706@gmail.com>

* Refactor error handling in backup deletion
Signed-off-by: Priyansh Choudhary <im1706@gmail.com>

* Refactor backup deletion logic to skip CSI snapshot cleanup on tarball download failure
Signed-off-by: Priyansh Choudhary <im1706@gmail.com>

* prevent backup deletion when errors occur
Signed-off-by: Priyansh Choudhary <im1706@gmail.com>

* added logger
Signed-off-by: Priyansh Choudhary <im1706@gmail.com>
Signed-off-by: emirot <emirot.nolan@gmail.com>

* perf: better string concatenation

Signed-off-by: emirot <emirot.nolan@gmail.com>

* Add delay to avoid race conditions during VolumeSnapshotContent deletion (#9700)

* Add delay to avoid race conditions during VolumeSnapshotContent deletion
Signed-off-by: Priyansh Choudhary <im1706@gmail.com>

* updated changelog
Signed-off-by: Priyansh Choudhary <im1706@gmail.com>

* Updated Changelog
Signed-off-by: Priyansh Choudhary <im1706@gmail.com>
Signed-off-by: emirot <emirot.nolan@gmail.com>

* block data mover design

Signed-off-by: Lyndon-Li <lyonghui@vmware.com>
Signed-off-by: emirot <emirot.nolan@gmail.com>

* block data mover design

Signed-off-by: Lyndon-Li <lyonghui@vmware.com>
Signed-off-by: emirot <emirot.nolan@gmail.com>

* irregular volume size

Signed-off-by: Lyndon-Li <lyonghui@vmware.com>
Signed-off-by: emirot <emirot.nolan@gmail.com>

* block data mover design

Signed-off-by: Lyndon-Li <lyonghui@vmware.com>
Signed-off-by: emirot <emirot.nolan@gmail.com>

* Update the "community" page of website (#9722)

Update the community page to add the correct links to community meeting
and meeting notes.
I also removed the referece of google group as I confirmed the last
message was sent 2 years ago.

Signed-off-by: Daniel Jiang <daniel.jiang@broadcom.com>
Signed-off-by: emirot <emirot.nolan@gmail.com>

* perf: better string concatenation

Signed-off-by: emirot <emirot.nolan@gmail.com>

---------

Signed-off-by: emirot <emirot.nolan@gmail.com>
Signed-off-by: nolanemirot <nolan.emirot@broadcom.com>
Signed-off-by: Priyansh Choudhary <im1706@gmail.com>
Signed-off-by: Lyndon-Li <lyonghui@vmware.com>
Signed-off-by: Daniel Jiang <daniel.jiang@broadcom.com>
Co-authored-by: Priyansh Choudhary <im1706@gmail.com>
Co-authored-by: nolanemirot <nolan.emirot@broadcom.com>
Co-authored-by: Lyndon-Li <lyonghui@vmware.com>
Co-authored-by: Daniel Jiang <daniel.jiang@broadcom.com>
2026-04-16 02:56:25 -04:00
Xun Jiang
39db9f9c1e Remove Restic build from Dockerfile, Makefile and Tiltfile.
Delete Restic-build-related scripts.

Signed-off-by: Xun Jiang <xun.jiang@broadcom.com>
2026-04-15 23:46:31 +08:00
Daniel Jiang
4d9bd91200 Update the "community" page of website (#9722)
Some checks failed
Run the E2E test on kind / get-go-version (push) Failing after 1m14s
Run the E2E test on kind / build (push) Has been skipped
Run the E2E test on kind / setup-test-matrix (push) Successful in 4s
Run the E2E test on kind / run-e2e-test (push) Has been skipped
Main CI / get-go-version (push) Successful in 14s
Main CI / Build (push) Failing after 41s
Close stale issues and PRs / stale (push) Failing after 10m58s
Trivy Nightly Scan / Trivy nightly scan (velero, main) (push) Failing after 2m1s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-aws, main) (push) Failing after 1m34s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-gcp, main) (push) Failing after 1m29s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-microsoft-azure, main) (push) Failing after 1m14s
Update the community page to add the correct links to community meeting
and meeting notes.
I also removed the referece of google group as I confirmed the last
message was sent 2 years ago.

Signed-off-by: Daniel Jiang <daniel.jiang@broadcom.com>
2026-04-15 11:15:57 -04:00
Xun Jiang/Bruce Jiang
c5fa50bedc Merge pull request #9528 from Lyndon-Li/block-data-mover-design
Some checks failed
Run the E2E test on kind / get-go-version (push) Failing after 1m15s
Run the E2E test on kind / build (push) Has been skipped
Run the E2E test on kind / setup-test-matrix (push) Successful in 4s
Run the E2E test on kind / run-e2e-test (push) Has been skipped
Main CI / get-go-version (push) Successful in 18s
Main CI / Build (push) Failing after 39s
Block data mover design
2026-04-15 13:43:14 +08:00
Priyansh Choudhary
df2686c146 Add delay to avoid race conditions during VolumeSnapshotContent deletion (#9700)
Some checks failed
Run the E2E test on kind / get-go-version (push) Failing after 1m8s
Run the E2E test on kind / build (push) Has been skipped
Run the E2E test on kind / setup-test-matrix (push) Successful in 5s
Run the E2E test on kind / run-e2e-test (push) Has been skipped
Main CI / get-go-version (push) Successful in 14s
Main CI / Build (push) Failing after 32s
* Add delay to avoid race conditions during VolumeSnapshotContent deletion
Signed-off-by: Priyansh Choudhary <im1706@gmail.com>

* updated changelog
Signed-off-by: Priyansh Choudhary <im1706@gmail.com>

* Updated Changelog
Signed-off-by: Priyansh Choudhary <im1706@gmail.com>
2026-04-14 18:26:49 -04:00
Priyansh Choudhary
8a6ac7af1c fix: backup deletion silently succeeds when tarball download fails (#9693)
Some checks failed
Run the E2E test on kind / get-go-version (push) Failing after 1m13s
Run the E2E test on kind / build (push) Has been skipped
Run the E2E test on kind / setup-test-matrix (push) Successful in 11s
Run the E2E test on kind / run-e2e-test (push) Has been skipped
Main CI / get-go-version (push) Successful in 13s
Main CI / Build (push) Failing after 34s
Close stale issues and PRs / stale (push) Successful in 14s
Trivy Nightly Scan / Trivy nightly scan (velero, main) (push) Failing after 1m34s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-aws, main) (push) Failing after 1m5s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-gcp, main) (push) Failing after 1m1s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-microsoft-azure, main) (push) Failing after 1m4s
* Enhance backup deletion logic to handle tarball download failures and clean up associated CSI VolumeSnapshotContents
Signed-off-by: Priyansh Choudhary <im1706@gmail.com>

* added changelog
Signed-off-by: Priyansh Choudhary <im1706@gmail.com>

* Refactor error handling in backup deletion
Signed-off-by: Priyansh Choudhary <im1706@gmail.com>

* Refactor backup deletion logic to skip CSI snapshot cleanup on tarball download failure
Signed-off-by: Priyansh Choudhary <im1706@gmail.com>

* prevent backup deletion when errors occur
Signed-off-by: Priyansh Choudhary <im1706@gmail.com>

* added logger
Signed-off-by: Priyansh Choudhary <im1706@gmail.com>
2026-04-14 16:36:40 -04:00
lyndon-li
a990bd81f1 Merge branch 'main' into block-data-mover-design 2026-04-14 15:37:45 +08:00
lyndon-li
15db9d2552 Merge pull request #9721 from Lyndon-Li/fix-change-log-path-error
Some checks failed
Run the E2E test on kind / get-go-version (push) Failing after 1m7s
Run the E2E test on kind / build (push) Has been skipped
Run the E2E test on kind / setup-test-matrix (push) Successful in 4s
Run the E2E test on kind / run-e2e-test (push) Has been skipped
Main CI / get-go-version (push) Successful in 13s
Main CI / Build (push) Failing after 32s
Fix change log path error for 9683
2026-04-14 15:37:31 +08:00
Lyndon-Li
1b4c7fe4be fix change log path error for 9683
Signed-off-by: Lyndon-Li <lyonghui@vmware.com>
2026-04-14 15:07:49 +08:00
Lyndon-Li
5b9bcc99f1 block data mover design
Signed-off-by: Lyndon-Li <lyonghui@vmware.com>
2026-04-14 14:39:56 +08:00
Xun Jiang/Bruce Jiang
e921c177cc Merge pull request #9701 from emirot/chore/update_base_image
Some checks failed
Run the E2E test on kind / get-go-version (push) Failing after 1m17s
Run the E2E test on kind / build (push) Has been skipped
Run the E2E test on kind / setup-test-matrix (push) Successful in 4s
Run the E2E test on kind / run-e2e-test (push) Has been skipped
build-image / Build (push) Failing after 13s
Main CI / get-go-version (push) Successful in 14s
Main CI / Build (push) Failing after 32s
chore: update base image to newer debian image
2026-04-14 09:16:33 +08:00
Tiger Kaovilai
cf605c948e Add CI check for invalid characters in file paths (#9553)
Some checks failed
Run the E2E test on kind / get-go-version (push) Failing after 1m13s
Run the E2E test on kind / build (push) Has been skipped
Run the E2E test on kind / setup-test-matrix (push) Successful in 3s
Run the E2E test on kind / run-e2e-test (push) Has been skipped
Main CI / get-go-version (push) Successful in 15s
Main CI / Build (push) Failing after 32s
Close stale issues and PRs / stale (push) Successful in 15s
Trivy Nightly Scan / Trivy nightly scan (velero, main) (push) Failing after 1m40s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-aws, main) (push) Failing after 1m14s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-gcp, main) (push) Failing after 1m6s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-microsoft-azure, main) (push) Failing after 1m15s
* Add CI check for invalid characters in file paths

Go's module zip rejects filenames containing certain characters (shell
special chars like " ' * < > ? ` |, path separators : \, and non-letter
Unicode such as control/format characters). This caused a build failure
when a changelog file contained an invisible U+200E LEFT-TO-RIGHT MARK
(see PR #9552).

Add a GitHub Actions workflow that validates all tracked file paths on
every PR to catch these issues before they reach downstream consumers.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>

* Fix changelog filenames containing invisible U+200E characters

Remove LEFT-TO-RIGHT MARK unicode characters from changelog filenames
that would cause Go module zip failures.

Generated with [Claude Code](https://claude.ai/code)
via [Happy](https://happy.engineering)

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Happy <yesreply@happy.engineering>
Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>

---------

Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Happy <yesreply@happy.engineering>
2026-04-13 14:52:42 -04:00
emirot
cd89c0ffa7 chore: update base image to newer debian image
Signed-off-by: emirot <emirot.nolan@gmail.com>
2026-04-13 13:42:30 +08:00
emirot
eb0a1814c6 chore: update base image to newer debian image
Signed-off-by: emirot <emirot.nolan@gmail.com>
2026-04-13 13:42:30 +08:00
lyndon-li
eaef4ead42 Merge pull request #9704 from adam-jian-zhang/fix-csi-pvc-backup-plugin-scoping
Some checks failed
Run the E2E test on kind / get-go-version (push) Failing after 1m7s
Run the E2E test on kind / build (push) Has been skipped
Run the E2E test on kind / setup-test-matrix (push) Successful in 3s
Run the E2E test on kind / run-e2e-test (push) Has been skipped
Main CI / get-go-version (push) Successful in 10s
Main CI / Build (push) Failing after 29s
Fix DataUpload list scope in CSI PVC backup plugin
2026-04-13 13:16:14 +08:00
Adam Zhang
7562011b79 Fix DataUpload list scope in CSI PVC backup plugin
The `getDataUpload` function in the CSI PVC backup plugin was
previously making a cluster-scoped list query to retrieve DataUpload
CRs. In environments with strict minimum-privilege RBAC, this would
fail with forbidden errors.
This explicitly passes the backup namespace into the `ListOptions`
when calling `crClient.List`, correctly scoping the queries to the
backup's namespace. Unit tests have also been updated to ensure
cross-namespace queries are rejected appropriately.

Signed-off-by: Adam Zhang <adam.zhang@broadcom.com>
2026-04-10 15:53:46 +08:00
lyndon-li
4a6756d57b Merge pull request #9683 from Lyndon-Li/increase-repo-maintenance-history-queue-length
Some checks failed
Run the E2E test on kind / get-go-version (push) Failing after 1m18s
Run the E2E test on kind / build (push) Has been skipped
Run the E2E test on kind / setup-test-matrix (push) Successful in 4s
Run the E2E test on kind / run-e2e-test (push) Has been skipped
Main CI / get-go-version (push) Successful in 16s
Main CI / Build (push) Failing after 31s
Close stale issues and PRs / stale (push) Successful in 15s
Trivy Nightly Scan / Trivy nightly scan (velero, main) (push) Failing after 1m24s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-aws, main) (push) Failing after 1m4s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-gcp, main) (push) Failing after 1m9s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-microsoft-azure, main) (push) Failing after 1m4s
Issue 9428: increase repo maintenance history queue length
2026-04-10 11:50:24 +08:00
Xun Jiang/Bruce Jiang
e1cc07cec3 Merge pull request #9695 from shubham-pampattiwar/bump-ext-snapshotter-v8.4-vgs-v1beta2
Bump external-snapshotter to v8.4.0 for VGS v1beta2 support
2026-04-10 11:38:24 +08:00
Lyndon-Li
1730b7f414 issue 9428: incremental repo maintenance history queue length
Signed-off-by: Lyndon-Li <lyonghui@vmware.com>
2026-04-09 14:41:07 +08:00
lyndon-li
37abfb4bfa Merge pull request #9682 from adam-jian-zhang/fix-restore-pvr-scope
Some checks failed
Run the E2E test on kind / get-go-version (push) Failing after 1m4s
Run the E2E test on kind / build (push) Has been skipped
Run the E2E test on kind / setup-test-matrix (push) Successful in 3s
Run the E2E test on kind / run-e2e-test (push) Has been skipped
Main CI / get-go-version (push) Successful in 13s
Main CI / Build (push) Failing after 30s
Close stale issues and PRs / stale (push) Successful in 11s
Trivy Nightly Scan / Trivy nightly scan (velero, main) (push) Failing after 1m34s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-aws, main) (push) Failing after 1m2s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-gcp, main) (push) Failing after 1m2s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-microsoft-azure, main) (push) Failing after 1m3s
Fix PodVolumeBackup list scope during restore
2026-04-09 10:59:26 +08:00
Shubham Pampattiwar
0cf8f94268 Add changelog for PR #9695
Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>
2026-04-08 15:47:06 -07:00
Shubham Pampattiwar
1b5503e20b Bump external-snapshotter to v8.4.0 for VGS v1beta2 support
Kubernetes 1.34 introduced VolumeGroupSnapshot v1beta2 API and
deprecated v1beta1. Distributions running K8s 1.34+ (e.g. OpenShift
4.21+) have removed v1beta1 VGS CRDs entirely, breaking Velero's
VGS functionality on those clusters.

This change bumps external-snapshotter/client/v8 from v8.2.0 to
v8.4.0 and migrates all VGS API usage from v1beta1 to v1beta2.

The v1beta2 API is structurally compatible - the Spec-level types
(GroupSnapshotHandles, VolumeGroupSnapshotContentSource) are
unchanged. The Status-level change (VolumeSnapshotHandlePairList
replaced by VolumeSnapshotInfoList) does not affect Velero as it
does not directly consume that type.

Fixes #9694

Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>
2026-04-08 15:46:06 -07:00
Shubham Pampattiwar
e439977117 Fix VolumeGroupSnapshot restore failure with Ceph RBD CSI driver (#9516)
Some checks failed
Run the E2E test on kind / get-go-version (push) Failing after 1m5s
Run the E2E test on kind / build (push) Has been skipped
Run the E2E test on kind / setup-test-matrix (push) Successful in 3s
Run the E2E test on kind / run-e2e-test (push) Has been skipped
Main CI / get-go-version (push) Successful in 14s
Main CI / Build (push) Failing after 30s
* Fix VolumeGroupSnapshot restore on Ceph RBD

This PR fixes two related issues affecting CSI snapshot restore on Ceph RBD:

1. VolumeGroupSnapshot restore fails because Ceph RBD populates
   volumeGroupSnapshotHandle on pre-provisioned VSCs, but Velero doesn't
   create the required VGSC during restore.

2. CSI snapshot restore fails because VolumeSnapshotClassName is removed
   from restored VSCs, preventing the CSI controller from getting
   credentials for snapshot verification.

Changes:
- Capture volumeGroupSnapshotHandle during backup as VS annotation
- Create stub VGSC during restore with matching handle in status
- Look up VolumeSnapshotClass by driver and set on restored VSC

Fixes #9512
Fixes #9515

Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>

* Add changelog for VGS restore fix

Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>

* Fix gofmt import order

Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>

* Add changelog for VGS restore fix

Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>

* Fix import alias corev1 to corev1api per lint config

Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>

* Fix: Add snapshot handles to existing stub VGSC and add unit tests

When multiple VolumeSnapshots from the same VolumeGroupSnapshot are
restored, they share the same VolumeGroupSnapshotHandle but have
different individual snapshot handles. This commit:

1. Fixes incomplete logic where existing VGSC wasn't updated with
   new snapshot handles (addresses review feedback)

2. Fixes race condition where Create returning AlreadyExists would
   skip adding the snapshot handle

3. Adds comprehensive unit tests for ensureStubVGSCExists (5 cases)
   and addSnapshotHandleToVGSC (4 cases) functions

Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>

* Clean up stub VolumeGroupSnapshotContents during restore finalization

Add cleanup logic for stub VGSCs created during VolumeGroupSnapshot restore.
The stub VGSCs are temporary objects needed to satisfy CSI controller
validation during VSC reconciliation. Once all related VSCs become
ReadyToUse, the stub VGSCs are no longer needed and should be removed.

The cleanup runs in the restore finalizer controller's execute() phase.
Before deleting each VGSC, it polls until all related VolumeSnapshotContents
(correlated by snapshot handle) are ReadyToUse, with a timeout fallback.
Deletion failures and CRD-not-installed scenarios are treated as warnings
rather than errors to avoid failing the restore.

Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>

* Fix lint: remove unused nolint directive and simplify cleanupStubVGSC return

The cleanupStubVGSC function only produces warnings (not errors), so
simplify its return signature. Also remove the now-unused nolint:unparam
directive on execute() since warnings are no longer always nil.

Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>

---------

Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>
2026-04-08 12:08:56 -07:00
Adam Zhang
dd82645909 Fix PodVolumeBackup list scope during restore
Restrict the listing of PodVolumeBackup resources to the specific
restore namespace in both the core restore controller and the pod
volume restore action plugin. This prevents "Forbidden" errors when
Velero is configured with namespace-scoped minimum privileges,
avoiding the need for cluster-scoped list permissions for
PodVolumeBackups.

Fixes: #9681

Signed-off-by: Adam Zhang <adam.zhang@broadcom.com>
2026-04-08 16:50:09 +08:00
lyndon-li
22f93ad457 Merge pull request #9676 from Lyndon-Li/remove-restic-for-repo
Some checks failed
Run the E2E test on kind / get-go-version (push) Failing after 1m5s
Run the E2E test on kind / build (push) Has been skipped
Run the E2E test on kind / setup-test-matrix (push) Successful in 3s
Run the E2E test on kind / run-e2e-test (push) Has been skipped
Main CI / get-go-version (push) Successful in 14s
Main CI / Build (push) Failing after 29s
Close stale issues and PRs / stale (push) Successful in 13s
Trivy Nightly Scan / Trivy nightly scan (velero, main) (push) Failing after 1m35s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-aws, main) (push) Failing after 1m10s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-gcp, main) (push) Failing after 1m13s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-microsoft-azure, main) (push) Failing after 1m16s
Remove restic for repo
2026-04-08 14:21:17 +08:00
Lyndon-Li
9598c50295 Merge branch 'main' into remove-restic-for-repo 2026-04-08 13:37:34 +08:00
Wenkai Yin(尹文开)
54761092c1 Merge pull request #9677 from Lyndon-Li/remove-restic-for-uploader
Remove restic for uploader
2026-04-08 12:32:58 +08:00
Lyndon-Li
dca3d3001f remove restic for repo
Signed-off-by: Lyndon-Li <lyonghui@vmware.com>
2026-04-08 11:11:15 +08:00
Scott Seago
e8fa708933 Add custom action type to volume policies (#9540)
Some checks failed
Run the E2E test on kind / get-go-version (push) Failing after 1m0s
Run the E2E test on kind / build (push) Has been skipped
Run the E2E test on kind / setup-test-matrix (push) Successful in 3s
Run the E2E test on kind / run-e2e-test (push) Has been skipped
Main CI / get-go-version (push) Successful in 10s
Main CI / Build (push) Failing after 25s
Close stale issues and PRs / stale (push) Successful in 11s
Trivy Nightly Scan / Trivy nightly scan (velero, main) (push) Failing after 1m34s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-aws, main) (push) Failing after 1m3s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-gcp, main) (push) Failing after 1m3s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-microsoft-azure, main) (push) Failing after 1m13s
* Add custom action type to volume policies

Signed-off-by: Scott Seago <sseago@redhat.com>

* Update internal/resourcepolicies/resource_policies.go

Co-authored-by: Tiger Kaovilai <passawit.kaovilai@gmail.com>
Signed-off-by: Scott Seago <sseago@redhat.com>

* added "custom" to validation list

Signed-off-by: Scott Seago <sseago@redhat.com>

* responding to review comments

Signed-off-by: Scott Seago <sseago@redhat.com>

---------

Signed-off-by: Scott Seago <sseago@redhat.com>
Co-authored-by: Tiger Kaovilai <passawit.kaovilai@gmail.com>
2026-04-07 10:22:38 -07:00
Lyndon-Li
fca4d405b1 remove restic for uploader
Signed-off-by: Lyndon-Li <lyonghui@vmware.com>
2026-04-07 18:07:51 +08:00
Xun Jiang/Bruce Jiang
d3f4b2c67e Merge pull request #9653 from BassinD/bugfix/nil-check-for-service-health-check-node-port-in-last-applied-config
Some checks failed
Run the E2E test on kind / get-go-version (push) Failing after 47s
Run the E2E test on kind / build (push) Has been skipped
Run the E2E test on kind / setup-test-matrix (push) Successful in 2s
Run the E2E test on kind / run-e2e-test (push) Has been skipped
Main CI / get-go-version (push) Successful in 10s
Main CI / Build (push) Failing after 24s
Fix service restore with null healthCheckNodePort in last-applied-configuration label
2026-04-07 16:41:23 +08:00
Lyndon-Li
235e579581 remove restic for repo
Signed-off-by: Lyndon-Li <lyonghui@vmware.com>
2026-04-07 07:35:25 +00:00
Daniil Basin
dd1def9d33 Use strict minimal structure to parse last applied configuration JSON
Signed-off-by: Daniil Basin <bassindanil@hotmail.com>
2026-04-07 09:12:51 +05:00
Xun Jiang/Bruce Jiang
baf2491344 Merge pull request #9668 from adam-jian-zhang/fix-node-agent-detection
Some checks failed
Run the E2E test on kind / get-go-version (push) Failing after 45s
Run the E2E test on kind / build (push) Has been skipped
Run the E2E test on kind / setup-test-matrix (push) Successful in 2s
Run the E2E test on kind / run-e2e-test (push) Has been skipped
Main CI / get-go-version (push) Successful in 9s
Main CI / Build (push) Failing after 23s
Close stale issues and PRs / stale (push) Successful in 10s
Trivy Nightly Scan / Trivy nightly scan (velero, main) (push) Failing after 1m22s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-aws, main) (push) Failing after 1m17s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-gcp, main) (push) Failing after 1m4s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-microsoft-azure, main) (push) Failing after 1m5s
fix node-agent node detection logic
2026-04-03 14:19:20 +08:00
Adam Zhang
e79ad64a10 fix node-agent node detection logic
Add namespace in ListOptions, to fix node-agent node detection
in its deployed namespace.

Signed-off-by: Adam Zhang <adam.zhang@broadcom.com>
2026-04-03 13:23:57 +08:00
Xun Jiang/Bruce Jiang
e9226527de Merge pull request #9634 from Lyndon-Li/let-go-for-unitialized-readonly-repo
Issue 9626: let go for uninitialized repo under readonly mode
2026-04-03 13:15:16 +08:00
Wenkai Yin(尹文开)
78fba2146c Merge pull request #9663 from Lyndon-Li/issue-fix-9659
Issue 9659: fix crash on cancel without loading data path
2026-04-03 12:40:44 +08:00
lyndon-li
4dbdd2df3a Merge pull request #9520 from sseago/custom-volume-policy-design
Some checks failed
Run the E2E test on kind / get-go-version (push) Failing after 51s
Run the E2E test on kind / build (push) Has been skipped
Run the E2E test on kind / setup-test-matrix (push) Successful in 3s
Run the E2E test on kind / run-e2e-test (push) Has been skipped
build-image / Build (push) Failing after 9s
Main CI / get-go-version (push) Successful in 9s
Main CI / Build (push) Failing after 26s
Close stale issues and PRs / stale (push) Successful in 10s
Trivy Nightly Scan / Trivy nightly scan (velero, main) (push) Failing after 1m31s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-aws, main) (push) Failing after 1m0s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-gcp, main) (push) Failing after 54s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-microsoft-azure, main) (push) Failing after 55s
Design for custom volume policy action API changes
2026-04-02 16:29:02 +08:00
Lyndon-Li
6869b7bf54 issue 9659: fix crash on cancel without loading data path
Signed-off-by: Lyndon-Li <lyonghui@vmware.com>
2026-04-02 16:14:12 +08:00
lyndon-li
30ddf3f35f Merge branch 'main' into let-go-for-unitialized-readonly-repo 2026-04-02 15:29:12 +08:00
lyndon-li
38d9e96130 Merge branch 'main' into issue-fix-9659 2026-04-02 15:29:05 +08:00
lyndon-li
531fc4810f Merge pull request #9664 from blackpiglet/xj014661/main/controller-runtime-tag
Pin the sigs.k8s.io/controller-runtime to v0.23.2
2026-04-02 15:28:32 +08:00
Xun Jiang
94259e8a5c Pin the sigs.k8s.io/controller-runtime to v0.23.2
The tag used to latest. Due to latest tag v0.23.3 already used
Golang v1.26, Velero main still uses v1.25. Build failed.
To fix this, pin the controller-runtime to v0.23.2

Signed-off-by: Xun Jiang <xun.jiang@broadcom.com>
2026-04-02 11:41:39 +08:00
Gabriele Fedi
5433eb3081 feat: support backup hooks on native sidecars (#9403)
Some checks failed
Run the E2E test on kind / get-go-version (push) Failing after 51s
Run the E2E test on kind / build (push) Has been skipped
Run the E2E test on kind / setup-test-matrix (push) Successful in 3s
Run the E2E test on kind / run-e2e-test (push) Has been skipped
Main CI / get-go-version (push) Successful in 10s
Main CI / Build (push) Failing after 25s
Close stale issues and PRs / stale (push) Successful in 9s
Trivy Nightly Scan / Trivy nightly scan (velero, main) (push) Failing after 1m28s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-aws, main) (push) Failing after 55s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-gcp, main) (push) Failing after 1m2s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-microsoft-azure, main) (push) Failing after 1m1s
* feat: support backup hooks on sidecars
Add support for configuring Kubernates native
Sidecars as target containrs for Backup Hooks
commands. This is purely a validation level
patch as the actual pods/exec API doesn't make
any distinction between standard and sidecar
containers.

Signed-off-by: Gabriele Fedi <gabriele.fedi@enterprisedb.com>

* test: extend unit tests

Signed-off-by: Gabriele Fedi <gabriele.fedi@enterprisedb.com>

* chore: changelog

Signed-off-by: Gabriele Fedi <gabriele.fedi@enterprisedb.com>

* style: fix linter issues

Signed-off-by: Gabriele Fedi <gabriele.fedi@enterprisedb.com>

---------

Signed-off-by: Gabriele Fedi <gabriele.fedi@enterprisedb.com>
2026-04-01 14:27:18 -04:00
Lyndon-Li
238b1e1f13 issue 9659: fix crash on cancel without loading data path
Signed-off-by: Lyndon-Li <lyonghui@vmware.com>
2026-04-01 18:11:26 +08:00
Lyndon-Li
ef7b468fb9 issue 9626: let go for uninitialized repo under readonly mode
Signed-off-by: Lyndon-Li <lyonghui@vmware.com>
2026-04-01 13:09:29 +08:00
Xun Jiang/Bruce Jiang
f0aa64172e Fix Repository Maintenance Job Configuration's global part E2E case. (#9633)
Some checks failed
Run the E2E test on kind / get-go-version (push) Failing after 48s
Run the E2E test on kind / build (push) Has been skipped
Run the E2E test on kind / setup-test-matrix (push) Successful in 2s
Run the E2E test on kind / run-e2e-test (push) Has been skipped
Main CI / get-go-version (push) Successful in 8s
Main CI / Build (push) Failing after 23s
Close stale issues and PRs / stale (push) Successful in 9s
Trivy Nightly Scan / Trivy nightly scan (velero, main) (push) Failing after 1m19s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-aws, main) (push) Failing after 57s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-gcp, main) (push) Failing after 55s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-microsoft-azure, main) (push) Failing after 54s
Signed-off-by: Xun Jiang <xun.jiang@broadcom.com>
2026-03-31 13:55:07 -04:00
Xun Jiang/Bruce Jiang
3f8e358849 Merge pull request #9614 from vmware-tanzu/xj014661/main/fix_extract_cve
Some checks failed
Run the E2E test on kind / get-go-version (push) Failing after 45s
Run the E2E test on kind / build (push) Has been skipped
Run the E2E test on kind / setup-test-matrix (push) Successful in 2s
Run the E2E test on kind / run-e2e-test (push) Has been skipped
Main CI / get-go-version (push) Successful in 9s
Main CI / Build (push) Failing after 24s
Add check for file extraction from tarball.
2026-03-31 16:59:13 +08:00
Wenkai Yin(尹文开)
bbd5ae079d Merge pull request #9655 from blackpiglet/xj014661/main/trivy_action_fix
Some checks failed
Run the E2E test on kind / get-go-version (push) Failing after 51s
Run the E2E test on kind / build (push) Has been skipped
Run the E2E test on kind / setup-test-matrix (push) Successful in 4s
Run the E2E test on kind / run-e2e-test (push) Has been skipped
Main CI / get-go-version (push) Successful in 13s
Main CI / Build (push) Failing after 27s
Close stale issues and PRs / stale (push) Successful in 9s
Trivy Nightly Scan / Trivy nightly scan (velero, main) (push) Failing after 1m25s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-aws, main) (push) Failing after 53s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-gcp, main) (push) Failing after 58s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-microsoft-azure, main) (push) Failing after 56s
[main] Update the trivy-action version from main to specific tag
2026-03-30 16:26:10 +08:00
Xun Jiang
91922103b4 Update the trivy-action version from main to specific tag to fix supply chain attack
https://www.aquasec.com/blog/trivy-supply-chain-attack-what-you-need-to-know/

Signed-off-by: Xun Jiang <xun.jiang@broadcom.com>
2026-03-27 16:09:53 +08:00
Anshul Ahuja
e6bdff61bd Merge pull request #9643 from priyansh17/user/priyansh/issue-#9641
Some checks failed
Run the E2E test on kind / get-go-version (push) Failing after 1m12s
Run the E2E test on kind / build (push) Has been skipped
Run the E2E test on kind / setup-test-matrix (push) Successful in 5s
Run the E2E test on kind / run-e2e-test (push) Has been skipped
Main CI / get-go-version (push) Successful in 14s
Main CI / Build (push) Failing after 29s
Close stale issues and PRs / stale (push) Successful in 7s
Trivy Nightly Scan / Trivy nightly scan (velero, main) (push) Failing after 58s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-aws, main) (push) Failing after 58s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-gcp, main) (push) Failing after 51s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-microsoft-azure, main) (push) Failing after 47s
Refactor: simplify VolumeSnapshotContent deletion logic
2026-03-26 12:02:19 +05:30
Priyansh Choudhary
c74d5e7aba refactor: streamline VolumeSnapshotContent deletion process and remove readiness checks
Signed-off-by: Priyansh Choudhary <im1706@gmail.com>
2026-03-25 15:23:20 +05:30
Priyansh Choudhary
905a561c84 added changelog
Signed-off-by: Priyansh Choudhary <im1706@gmail.com>
2026-03-24 20:49:16 +05:30
Priyansh Choudhary
e9d312c27e refactor: simplify VolumeSnapshotContent deletion logic and remove unused timeout handling
Signed-off-by: Priyansh Choudhary <im1706@gmail.com>
2026-03-24 20:35:51 +05:30
Adam Zhang
a5391e13e7 [main] fix configmap lookup in non-default namespaces (#9638)
Some checks failed
Run the E2E test on kind / get-go-version (push) Failing after 1m19s
Run the E2E test on kind / build (push) Has been skipped
Run the E2E test on kind / setup-test-matrix (push) Successful in 3s
Run the E2E test on kind / run-e2e-test (push) Has been skipped
Main CI / get-go-version (push) Successful in 17s
Main CI / Build (push) Failing after 27s
Close stale issues and PRs / stale (push) Successful in 13s
Trivy Nightly Scan / Trivy nightly scan (velero, main) (push) Failing after 1m35s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-aws, main) (push) Failing after 1m1s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-gcp, main) (push) Failing after 1m9s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-microsoft-azure, main) (push) Failing after 1m2s
* fix configmap lookup in non-default namespaces

o.Namespace is empty when Validate runs (Complete hasn't been called yet),
causing VerifyJSONConfigs to query the default namespace instead of the
intended one. Replace o.Namespace with f.Namespace() in all three ConfigMap
validation calls so the factory's already-resolved namespace is used.

Signed-off-by: Adam Zhang <adam.zhang@broadcom.com>

* switch the call order of validate/complete

switch the call order of validate/complete which accomplish
the same effect.

Signed-off-by: Adam Zhang <adam.zhang@broadcom.com>

---------

Signed-off-by: Adam Zhang <adam.zhang@broadcom.com>
2026-03-24 04:05:20 -04:00
Xun Jiang/Bruce Jiang
e368fc8803 Merge pull request #9628 from priyansh17/9625-fixOrphanesVscCleanup
Some checks failed
Run the E2E test on kind / get-go-version (push) Failing after 1m10s
Run the E2E test on kind / build (push) Has been skipped
Run the E2E test on kind / setup-test-matrix (push) Successful in 4s
Run the E2E test on kind / run-e2e-test (push) Has been skipped
Main CI / get-go-version (push) Successful in 12s
Main CI / Build (push) Failing after 49s
Close stale issues and PRs / stale (push) Successful in 12s
Trivy Nightly Scan / Trivy nightly scan (velero, main) (push) Failing after 1m15s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-aws, main) (push) Failing after 59s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-gcp, main) (push) Failing after 55s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-microsoft-azure, main) (push) Failing after 55s
Implement original VolumeSnapshotContent deletion for legacy backups - Fixes #9625
2026-03-23 15:53:19 +08:00
Xun Jiang/Bruce Jiang
b5734a6ba2 Merge branch 'main' into xj014661/main/fix_extract_cve
Some checks failed
Run the E2E test on kind / get-go-version (push) Failing after 1m12s
Run the E2E test on kind / build (push) Has been skipped
Run the E2E test on kind / setup-test-matrix (push) Successful in 3s
Run the E2E test on kind / run-e2e-test (push) Has been skipped
2026-03-20 10:59:55 +08:00
Priyansh Choudhary
65c88f3425 Merge branch 'main' into 9625-fixOrphanesVscCleanup 2026-03-19 20:10:16 +05:30
Wenkai Yin(尹文开)
bb9a94bebe Merge pull request #9620 from vmware-tanzu/jxun/main/fix_NodeAgentConfig_e2e_error
Some checks failed
Run the E2E test on kind / get-go-version (push) Failing after 1m5s
Run the E2E test on kind / build (push) Has been skipped
Run the E2E test on kind / setup-test-matrix (push) Successful in 3s
Run the E2E test on kind / run-e2e-test (push) Has been skipped
Main CI / get-go-version (push) Successful in 17s
Main CI / Build (push) Failing after 34s
Close stale issues and PRs / stale (push) Successful in 10s
Trivy Nightly Scan / Trivy nightly scan (velero, main) (push) Failing after 1m10s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-aws, main) (push) Failing after 53s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-gcp, main) (push) Failing after 1m4s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-microsoft-azure, main) (push) Failing after 56s
[main][cherry-pick] Compare affinity by string instead of exactly same compare.
2026-03-19 16:45:23 +08:00
Xun Jiang/Bruce Jiang
74401b20b0 Merge pull request #9630 from vmware-tanzu/dependabot/go_modules/google.golang.org/grpc-1.79.3
Some checks failed
Run the E2E test on kind / get-go-version (push) Failing after 1m3s
Run the E2E test on kind / build (push) Has been skipped
Run the E2E test on kind / setup-test-matrix (push) Successful in 4s
Run the E2E test on kind / run-e2e-test (push) Has been skipped
Main CI / get-go-version (push) Successful in 14s
Main CI / Build (push) Failing after 35s
Bump google.golang.org/grpc from 1.77.0 to 1.79.3
2026-03-19 14:02:57 +08:00
dependabot[bot]
417d3d2562 Bump google.golang.org/grpc from 1.77.0 to 1.79.3
Bumps [google.golang.org/grpc](https://github.com/grpc/grpc-go) from 1.77.0 to 1.79.3.
- [Release notes](https://github.com/grpc/grpc-go/releases)
- [Commits](https://github.com/grpc/grpc-go/compare/v1.77.0...v1.79.3)

---
updated-dependencies:
- dependency-name: google.golang.org/grpc
  dependency-version: 1.79.3
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-03-19 05:05:16 +00:00
Priyansh Choudhary
68cee893f1 Enhance logging and error handling in VolumeSnapshotContent deletion process
Signed-off-by: Priyansh Choudhary <im1706@gmail.com>
2026-03-19 03:17:45 +05:30
Priyansh Choudhary
fce276bca9 Added changelog
Signed-off-by: Priyansh Choudhary <im1706@gmail.com>
2026-03-19 02:38:11 +05:30
Priyansh Choudhary
ade433ecbd Implement original VolumeSnapshotContent deletion for legacy backups
Signed-off-by: Priyansh Choudhary <im1706@gmail.com>
2026-03-19 02:30:39 +05:30
Xun Jiang/Bruce Jiang
48e66b1790 Merge pull request #9564 from hollycai05/add-e2e-tests-for-PR9255
Some checks failed
Run the E2E test on kind / get-go-version (push) Failing after 1m25s
Run the E2E test on kind / build (push) Has been skipped
Run the E2E test on kind / setup-test-matrix (push) Successful in 4s
Run the E2E test on kind / run-e2e-test (push) Has been skipped
Main CI / get-go-version (push) Successful in 19s
Main CI / Build (push) Failing after 6m20s
Close stale issues and PRs / stale (push) Successful in 16s
Trivy Nightly Scan / Trivy nightly scan (velero, main) (push) Failing after 1m6s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-aws, main) (push) Failing after 23s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-gcp, main) (push) Failing after 51s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-microsoft-azure, main) (push) Failing after 27s
Add e2e test case for PR 9255
2026-03-17 16:18:19 +08:00
Xun Jiang
29a9f80f10 Compare affinity by string instead of exactly same compare.
From 1.18.1, Velero adds some default affinity in the backup/restore pod,
so we can't directly compare the whole affinity,
but we can verify if the expected affinity is contained in the pod affinity.

Signed-off-by: Xun Jiang <xun.jiang@broadcom.com>
2026-03-16 10:49:50 +08:00
Xun Jiang
70043af85b Add more check for file extraction from tarball.
Some checks failed
Run the E2E test on kind / get-go-version (push) Failing after 1m25s
Run the E2E test on kind / build (push) Has been skipped
Run the E2E test on kind / setup-test-matrix (push) Successful in 3s
Run the E2E test on kind / run-e2e-test (push) Has been skipped
Signed-off-by: Xun Jiang <xun.jiang@broadcom.com>
2026-03-13 16:42:10 +08:00
Xun Jiang/Bruce Jiang
66ac235e1f Merge pull request #9595 from vmware-tanzu/xj014661/main/disable_search_in_site
Some checks failed
Run the E2E test on kind / get-go-version (push) Failing after 1m5s
Run the E2E test on kind / build (push) Has been skipped
Run the E2E test on kind / setup-test-matrix (push) Successful in 3s
Run the E2E test on kind / run-e2e-test (push) Has been skipped
Main CI / get-go-version (push) Successful in 14s
Main CI / Build (push) Failing after 33s
Close stale issues and PRs / stale (push) Successful in 17s
Trivy Nightly Scan / Trivy nightly scan (velero, main) (push) Failing after 1m8s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-aws, main) (push) Failing after 43s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-gcp, main) (push) Failing after 1m5s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-microsoft-azure, main) (push) Failing after 1m13s
Disable Algolia docs search
2026-03-11 11:23:22 +08:00
Shubham Pampattiwar
afe7df17d4 Add itemOperationTimeout to Schedule API type docs (#9599)
The itemOperationTimeout field was missing from the Schedule API type
documentation even though it is supported in the Schedule CRD template.
This led users to believe the field was not available per-schedule.

Fixes #9598

Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>
2026-03-10 16:12:47 -04:00
Shubham Pampattiwar
a31f4abcb3 Fix DBR stuck when CSI snapshot no longer exists in cloud provider (#9581)
Some checks failed
Run the E2E test on kind / get-go-version (push) Failing after 1m17s
Run the E2E test on kind / build (push) Has been skipped
Run the E2E test on kind / setup-test-matrix (push) Successful in 3s
Run the E2E test on kind / run-e2e-test (push) Has been skipped
Main CI / get-go-version (push) Successful in 19s
Main CI / Build (push) Failing after 37s
Close stale issues and PRs / stale (push) Successful in 18s
Trivy Nightly Scan / Trivy nightly scan (velero, main) (push) Failing after 1m29s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-aws, main) (push) Failing after 40s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-gcp, main) (push) Failing after 50s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-microsoft-azure, main) (push) Failing after 53s
* Fix DBR stuck when CSI snapshot no longer exists in cloud provider

During backup deletion, VolumeSnapshotContentDeleteItemAction creates a
new VSC with the snapshot handle from the backup and polls for readiness.
If the underlying snapshot no longer exists (e.g., deleted externally),
the CSI driver reports Status.Error but checkVSCReadiness() only checks
ReadyToUse, causing it to poll for the full 10-minute timeout instead of
failing fast. Additionally, the newly created VSC is never cleaned up on
failure, leaving orphaned resources in the cluster.

This commit:
- Adds Status.Error detection in checkVSCReadiness() to fail immediately
  on permanent CSI driver errors (e.g., InvalidSnapshot.NotFound)
- Cleans up the dangling VSC when readiness polling fails

Fixes #9579

Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>

* Add changelog for PR #9581

Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>

* Fix typo in pod_volume_test.go: colume -> volume

Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>

---------

Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>
2026-03-10 13:40:09 -04:00
Xun Jiang/Bruce Jiang
2145c57642 Merge pull request #9562 from hollycai05/add-e2e-test-for-PR9366
Some checks failed
Run the E2E test on kind / get-go-version (push) Failing after 1m4s
Run the E2E test on kind / build (push) Has been skipped
Run the E2E test on kind / setup-test-matrix (push) Successful in 3s
Run the E2E test on kind / run-e2e-test (push) Has been skipped
Main CI / get-go-version (push) Successful in 18s
Main CI / Build (push) Failing after 41s
Add e2e test case for PR 9366
2026-03-10 17:28:23 +08:00
Xun Jiang
a9b3cfa062 Disable Algolia docs search.
Revert PR 6105.

Signed-off-by: Xun Jiang <xun.jiang@broadcom.com>
2026-03-10 16:10:44 +08:00
Wenkai Yin(尹文开)
bca6afada7 Merge pull request #9590 from Lyndon-Li/set-latest-do-to-1.18
Some checks failed
Run the E2E test on kind / get-go-version (push) Failing after 1m27s
Run the E2E test on kind / build (push) Has been skipped
Run the E2E test on kind / setup-test-matrix (push) Successful in 3s
Run the E2E test on kind / run-e2e-test (push) Has been skipped
Main CI / get-go-version (push) Successful in 19s
Main CI / Build (push) Failing after 1m55s
Close stale issues and PRs / stale (push) Successful in 16s
Trivy Nightly Scan / Trivy nightly scan (velero, main) (push) Failing after 1m3s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-aws, main) (push) Failing after 55s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-gcp, main) (push) Failing after 30s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-microsoft-azure, main) (push) Failing after 46s
Issue 9586: set latest doc to 1.18
2026-03-09 17:27:23 +08:00
Lyndon-Li
d1cc303553 issue 9586: set latest doc to 1.18
Signed-off-by: Lyndon-Li <lyonghui@vmware.com>
2026-03-09 15:41:13 +08:00
Xun Jiang/Bruce Jiang
befa61cee1 Merge pull request #9570 from H-M-Quang-Ngo/add-schedule-interval-metric
Add schedule_expected_interval_seconds metric
2026-03-09 15:28:59 +08:00
lyndon-li
245525c26b Merge pull request #9547 from blackpiglet/1.18_add_bia_skip_resource_logic
Some checks failed
Run the E2E test on kind / get-go-version (push) Failing after 1m6s
Run the E2E test on kind / build (push) Has been skipped
Run the E2E test on kind / setup-test-matrix (push) Successful in 2s
Run the E2E test on kind / run-e2e-test (push) Has been skipped
Main CI / get-go-version (push) Successful in 15s
Main CI / Build (push) Failing after 37s
Close stale issues and PRs / stale (push) Successful in 16s
Trivy Nightly Scan / Trivy nightly scan (velero, main) (push) Failing after 1m27s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-aws, main) (push) Failing after 52s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-gcp, main) (push) Failing after 29s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-microsoft-azure, main) (push) Failing after 47s
Add BIA skip resource logic
2026-03-06 12:28:05 +08:00
Xun Jiang/Bruce Jiang
55737b9cf1 Merge pull request #9574 from blackpiglet/xj014661/main/ephemeral_storage_config
Some checks failed
Run the E2E test on kind / get-go-version (push) Failing after 1m24s
Run the E2E test on kind / build (push) Has been skipped
Run the E2E test on kind / setup-test-matrix (push) Successful in 4s
Run the E2E test on kind / run-e2e-test (push) Has been skipped
Main CI / get-go-version (push) Successful in 17s
Main CI / Build (push) Failing after 40s
Close stale issues and PRs / stale (push) Successful in 16s
Trivy Nightly Scan / Trivy nightly scan (velero, main) (push) Failing after 1m28s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-aws, main) (push) Failing after 41s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-gcp, main) (push) Failing after 35s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-microsoft-azure, main) (push) Failing after 25s
Add ephemeral storage limit and request support for data mover and maintenance job
2026-03-05 22:43:16 +08:00
Xun Jiang
ffea850522 Add ephemeral storage limit and request support for data mover and maintenance job.
Signed-off-by: Xun Jiang <xun.jiang@broadcom.com>
2026-03-05 14:22:53 +08:00
dongqingcc
d315bca32b add namespace wildcard test case for restore
Signed-off-by: dongqingcc <dongqingcc@vmware.com>
2026-03-05 13:46:21 +08:00
Quang
b3aff97684 Merge branch 'main' into add-schedule-interval-metric 2026-03-05 09:15:52 +11:00
testsabirweb
23a3c242fa Add test coverage and fix validation for MRAP ARN bucket names (#9554)
Some checks failed
Run the E2E test on kind / get-go-version (push) Failing after 1m21s
Run the E2E test on kind / build (push) Has been skipped
Run the E2E test on kind / setup-test-matrix (push) Successful in 3s
Run the E2E test on kind / run-e2e-test (push) Has been skipped
Main CI / get-go-version (push) Successful in 13s
Main CI / Build (push) Failing after 43s
Close stale issues and PRs / stale (push) Successful in 16s
Trivy Nightly Scan / Trivy nightly scan (velero, main) (push) Failing after 2m35s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-aws, main) (push) Failing after 1m49s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-gcp, main) (push) Failing after 1m36s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-microsoft-azure, main) (push) Failing after 1m44s
* Issue #9544: Add test coverage and fix validation for MRAP ARN bucket names

S3 Multi-Region Access Point (MRAP) ARNs have the format:
  arn:aws:s3::{account-id}:accesspoint/{mrap-alias}.mrap

These ARNs contain a '/' as part of the ARN path, which caused Velero's
BSL bucket validation to reject them with an error asking the user to
put the value in the Prefix field instead.

Fix the bucket name validation in objectBackupStoreGetter.Get() to
exempt ARNs (identified by the "arn:" prefix) from the slash check,
since slashes are a valid and required part of ARN syntax.

Add unit tests in object_store_mrap_test.go covering:
- A plain MRAP ARN as bucket name succeeds
- A MRAP ARN with a trailing slash is trimmed and accepted

Signed-off-by: Sabir Ali <testsabirweb@gmail.com>
Co-authored-by: Cursor <cursoragent@cursor.com>

* Address review comments: fix changelog filename and import grouping

Signed-off-by: Sabir Ali <testsabirweb@gmail.com>
Co-authored-by: Cursor <cursoragent@cursor.com>

* Restrict MRAP ARN bucket validation to arn:aws:s3: prefix

Per review, use HasPrefix(bucket, "arn:aws:s3:") instead of
HasPrefix(bucket, "arn:") so only S3 ARNs (e.g. MRAP) are exempt
from the slash check, not any ARN from other AWS services.

Signed-off-by: Sabir Ali <sabir.ali@spectrocloud.com>
Co-authored-by: Cursor <cursoragent@cursor.com>

* Move MRAP bucket tests into TestNewObjectBackupStoreGetter

Consolidate MRAP ARN test cases into the existing table in
object_store_test.go and remove object_store_mrap_test.go.

Signed-off-by: Sabir Ali <sabir.ali@spectrocloud.com>
Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Signed-off-by: Sabir Ali <testsabirweb@gmail.com>
Signed-off-by: Sabir Ali <sabir.ali@spectrocloud.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-03-04 15:11:01 +00:00
Xun Jiang/Bruce Jiang
b7bc16f190 Merge pull request #9569 from vmware-tanzu/dependabot/go_modules/go.opentelemetry.io/otel/sdk-1.40.0
Bump go.opentelemetry.io/otel/sdk from 1.38.0 to 1.40.0
2026-03-04 23:00:11 +08:00
dongqingcc
bbec46f6ee Add e2e test case for PR 9366: Use hookIndex for recording multiple restore exec hooks.
Signed-off-by: dongqingcc <dongqingcc@vmware.com>
2026-03-03 17:53:11 +08:00
Quang
475050108b Merge branch 'main' into add-schedule-interval-metric 2026-03-03 01:00:32 +11:00
lyndon-li
b5f7cd92c7 Merge pull request #9571 from Lyndon-Li/fix-compile-error-for-windows
Some checks failed
Run the E2E test on kind / get-go-version (push) Failing after 1m14s
Run the E2E test on kind / build (push) Has been skipped
Run the E2E test on kind / setup-test-matrix (push) Successful in 2s
Run the E2E test on kind / run-e2e-test (push) Has been skipped
Main CI / get-go-version (push) Successful in 17s
Main CI / Build (push) Failing after 34s
Close stale issues and PRs / stale (push) Successful in 20s
Trivy Nightly Scan / Trivy nightly scan (velero, main) (push) Failing after 2m13s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-aws, main) (push) Failing after 1m42s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-gcp, main) (push) Failing after 1m46s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-microsoft-azure, main) (push) Failing after 1m41s
Fix compile error for Windows
2026-03-02 16:43:59 +08:00
Lyndon-Li
ab31b811ee fix compile error for Windows
Signed-off-by: Lyndon-Li <lyonghui@vmware.com>
2026-03-02 15:11:54 +08:00
dependabot[bot]
19360622e7 Bump go.opentelemetry.io/otel/sdk from 1.38.0 to 1.40.0
Some checks failed
Run the E2E test on kind / get-go-version (push) Failing after 1m11s
Run the E2E test on kind / build (push) Has been skipped
Run the E2E test on kind / setup-test-matrix (push) Successful in 3s
Run the E2E test on kind / run-e2e-test (push) Has been skipped
Bumps [go.opentelemetry.io/otel/sdk](https://github.com/open-telemetry/opentelemetry-go) from 1.38.0 to 1.40.0.
- [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases)
- [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md)
- [Commits](https://github.com/open-telemetry/opentelemetry-go/compare/v1.38.0...v1.40.0)

---
updated-dependencies:
- dependency-name: go.opentelemetry.io/otel/sdk
  dependency-version: 1.40.0
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-03-02 06:50:57 +00:00
lyndon-li
932d27541c Merge pull request #9561 from Lyndon-Li/uploader-flush-buffer
Some checks failed
Run the E2E test on kind / get-go-version (push) Failing after 1m11s
Run the E2E test on kind / build (push) Has been skipped
Run the E2E test on kind / setup-test-matrix (push) Successful in 3s
Run the E2E test on kind / run-e2e-test (push) Has been skipped
Main CI / get-go-version (push) Successful in 15s
Main CI / Build (push) Failing after 36s
Issue 9460: Uploader flush buffer
2026-03-02 14:49:51 +08:00
Quang
b0642b3078 Merge branch 'main' into add-schedule-interval-metric 2026-03-02 15:23:53 +11:00
Lyndon-Li
9cada8fc11 issue 9460: flush buffer when uploader completes
Signed-off-by: Lyndon-Li <lyonghui@vmware.com>
2026-03-02 11:43:44 +08:00
Wenkai Yin(尹文开)
25d5fa1b88 Merge pull request #9560 from Lyndon-Li/selected-node-to-node-selector
Issue 9475: Selected node to node selector
2026-03-02 11:26:26 +08:00
Quang Ngo
1c08af8461 Add changelog for #9570
Signed-off-by: Quang Ngo <quang.ngo@canonical.com>
2026-03-02 10:49:14 +11:00
Quang Ngo
6c3d81a146 Add schedule_expected_interval_seconds metric
Add a new Prometheus gauge metric that exposes the expected interval
between consecutive scheduled backups. This enables dynamic alerting
thresholds per schedule backups.

Signed-off-by: Quang Ngo <quang.ngo@canonical.com>
2026-03-02 10:20:09 +11:00
Lyndon-Li
81029d64ff irregular volume size
Signed-off-by: Lyndon-Li <lyonghui@vmware.com>
2026-02-27 16:08:35 +08:00
Xun Jiang/Bruce Jiang
8f32696449 Merge branch 'main' into 1.18_add_bia_skip_resource_logic 2026-02-27 11:38:27 +08:00
Xun Jiang
3f15e9219f Remove the skipped item from the resource list when it's skipped by BIA.
Signed-off-by: Xun Jiang <xun.jiang@broadcom.com>
2026-02-27 11:37:34 +08:00
dongqingcc
62aa70219b Add e2e test case for PR 9255
Signed-off-by: dongqingcc <dongqingcc@vmware.com>
2026-02-26 17:10:28 +08:00
Lyndon-Li
544b184d6c Merge branch 'main' into uploader-flush-buffer 2026-02-26 13:38:44 +08:00
Lyndon-Li
250c4db158 node-selector for selected node
Signed-off-by: Lyndon-Li <lyonghui@vmware.com>
2026-02-26 13:34:43 +08:00
Lyndon-Li
f0d81c56e2 Merge branch 'main' into selected-node-to-node-selector 2026-02-26 13:30:47 +08:00
lyndon-li
8b5559274d Merge pull request #9533 from Lyndon-Li/support-customized-host-os
Some checks failed
Run the E2E test on kind / get-go-version (push) Failing after 1m6s
Run the E2E test on kind / build (push) Has been skipped
Run the E2E test on kind / setup-test-matrix (push) Successful in 3s
Run the E2E test on kind / run-e2e-test (push) Has been skipped
Main CI / get-go-version (push) Successful in 12s
Main CI / Build (push) Failing after 46s
Close stale issues and PRs / stale (push) Successful in 15s
Trivy Nightly Scan / Trivy nightly scan (velero, main) (push) Failing after 2m12s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-aws, main) (push) Failing after 1m34s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-gcp, main) (push) Failing after 1m28s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-microsoft-azure, main) (push) Failing after 1m45s
Issue 9496: support customized host os
2026-02-26 12:00:02 +08:00
Lyndon-Li
a230929111 block data mover design
Signed-off-by: Lyndon-Li <lyonghui@vmware.com>
2026-02-24 23:17:03 +08:00
Lyndon-Li
7235180de4 Merge branch 'main' into support-customized-host-os 2026-02-24 15:40:56 +08:00
Tiger Kaovilai
ba5e7681ff rename malformed changelog file name (#9552)
Some checks failed
Run the E2E test on kind / get-go-version (push) Failing after 1m2s
Run the E2E test on kind / build (push) Has been skipped
Run the E2E test on kind / setup-test-matrix (push) Successful in 4s
Run the E2E test on kind / run-e2e-test (push) Has been skipped
Close stale issues and PRs / stale (push) Successful in 21s
Trivy Nightly Scan / Trivy nightly scan (velero, main) (push) Failing after 2m3s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-aws, main) (push) Failing after 1m29s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-gcp, main) (push) Failing after 1m15s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-microsoft-azure, main) (push) Failing after 1m29s
Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
2026-02-19 14:28:15 -05:00
lyndon-li
fc0a16d734 Merge pull request #9548 from Lyndon-Li/doc-for-1.18-2
Some checks failed
Run the E2E test on kind / get-go-version (push) Failing after 1m25s
Run the E2E test on kind / build (push) Has been skipped
Run the E2E test on kind / setup-test-matrix (push) Successful in 4s
Run the E2E test on kind / run-e2e-test (push) Has been skipped
Main CI / get-go-version (push) Successful in 14s
Main CI / Build (push) Failing after 41s
Close stale issues and PRs / stale (push) Successful in 14s
Trivy Nightly Scan / Trivy nightly scan (velero, main) (push) Failing after 2m28s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-aws, main) (push) Failing after 1m38s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-gcp, main) (push) Failing after 1m25s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-microsoft-azure, main) (push) Failing after 1m22s
Update doc link for 1.18
2026-02-13 18:02:40 +08:00
Xun Jiang
bcdee1b116 If BIA return updateObj with SkipFromBackupAnnotation, treat it as skip the resource from backup.
Signed-off-by: Xun Jiang <xun.jiang@broadcom.com>
2026-02-13 17:42:46 +08:00
Lyndon-Li
2a696a4431 update doc link for 1.18
Signed-off-by: Lyndon-Li <lyonghui@vmware.com>
2026-02-13 17:34:36 +08:00
Xun Jiang/Bruce Jiang
991bf1b000 Merge pull request #9545 from Lyndon-Li/add-upgrade-to-1.18-doc
Add upgrade-to-1.18 doc
2026-02-13 16:32:47 +08:00
Lyndon-Li
4d47471932 add upgrade-to-1.18 doc
Signed-off-by: Lyndon-Li <lyonghui@vmware.com>
2026-02-13 16:20:53 +08:00
lyndon-li
0bf968d24d Merge pull request #9532 from Lyndon-Li/issue-fix-9343
Some checks failed
Run the E2E test on kind / get-go-version (push) Failing after 1m33s
Run the E2E test on kind / build (push) Has been skipped
Run the E2E test on kind / setup-test-matrix (push) Successful in 3s
Run the E2E test on kind / run-e2e-test (push) Has been skipped
Main CI / get-go-version (push) Successful in 18s
Main CI / Build (push) Failing after 41s
Issue 9343: include PV topology to data mover pod affinities
2026-02-13 13:14:34 +08:00
Xun Jiang/Bruce Jiang
158681e927 Merge branch 'main' into custom-volume-policy-design 2026-02-13 11:36:04 +08:00
Lyndon-Li
05c9a8d8f8 issue 9343: include PV topology to data mover pod affinitiesq
Signed-off-by: Lyndon-Li <lyonghui@vmware.com>
2026-02-13 11:22:32 +08:00
Xun Jiang/Bruce Jiang
bc957a22b7 Merge pull request #9542 from blackpiglet/xj014661/main/cherry_pick_e2e_fixes
[main] cherry pick e2e fixes
2026-02-13 10:24:03 +08:00
Xun Jiang
7e3d66adc7 Fix test case issue and add UT.
Signed-off-by: Xun Jiang <xun.jiang@broadcom.com>
2026-02-12 13:22:18 +08:00
Xun Jiang
710ebb9d92 Update the migration and upgrade test cases.
Modify Dockerfile to fix GitHub CI action error.

Signed-off-by: Xun Jiang <xun.jiang@broadcom.com>
2026-02-12 13:20:34 +08:00
Joseph Antony Vaikath
1315399f35 Support all glob wildcard characters in namespace validation (#9502)
Some checks failed
Run the E2E test on kind / get-go-version (push) Failing after 1m19s
Run the E2E test on kind / build (push) Has been skipped
Run the E2E test on kind / setup-test-matrix (push) Successful in 3s
Run the E2E test on kind / run-e2e-test (push) Has been skipped
build-image / Build (push) Failing after 16s
Main CI / get-go-version (push) Successful in 13s
Main CI / Build (push) Failing after 36s
Close stale issues and PRs / stale (push) Successful in 16s
Trivy Nightly Scan / Trivy nightly scan (velero, main) (push) Failing after 1m56s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-aws, main) (push) Failing after 1m33s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-gcp, main) (push) Failing after 1m30s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-microsoft-azure, main) (push) Failing after 1m40s
* Support all glob wildcard characters in namespace validation

Expand namespace validation to allow all valid glob pattern characters
(*, ?, {}, [], ,) by replacing them with valid characters during RFC 1123
validation. The actual glob pattern validation is handled separately by
the wildcard package.

Also add validation to reject unsupported characters (|, (), !) that are
not valid in glob patterns, and update terminology from "regex" to "glob"
for clarity since this implementation uses glob patterns, not regex.

Changes:
- Replace all glob wildcard characters in validateNamespaceName
- Add test coverage for valid glob patterns in includes/excludes
- Add test coverage for unsupported characters
- Reject exclamation mark (!) in wildcard patterns
- Clarify comments and error messages about glob vs regex

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Joseph <jvaikath@redhat.com>

* Changelog

Signed-off-by: Joseph <jvaikath@redhat.com>

* Add documentation: glob patterns are now accepted

Signed-off-by: Joseph <jvaikath@redhat.com>

* Error message fix

Signed-off-by: Joseph <jvaikath@redhat.com>

* Remove negation glob char test

Signed-off-by: Joseph <jvaikath@redhat.com>

* Add bracket pattern validation for namespace glob patterns

Extends wildcard validation to support square bracket patterns [] used in glob character classes. Validates bracket syntax including empty brackets, unclosed brackets, and unmatched brackets. Extracts ValidateNamespaceName as a public function to enable reuse in namespace validation logic.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Joseph <jvaikath@redhat.com>

* Reduce scope to *, ?, [ and ]

Signed-off-by: Joseph <jvaikath@redhat.com>

* Fix tests

Signed-off-by: Joseph <jvaikath@redhat.com>

* Add namespace glob patterns documentation page

Adds dedicated documentation explaining supported glob patterns
for namespace include/exclude filtering to help users understand
the wildcard syntax.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Joseph <jvaikath@redhat.com>

* Fix build-image Dockerfile envtest download

Replace inaccessible go.kubebuilder.io URL with setup-envtest and update envtest version to 1.33.0 to match Kubernetes v0.33.3 dependencies.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Joseph <jvaikath@redhat.com>

* kubebuilder binaries mv

Signed-off-by: Joseph <jvaikath@redhat.com>

* Reject brace patterns and update documentation

Add {, }, and , to unsupported characters list to explicitly reject
brace expansion patterns. Remove { from wildcard detection since these
patterns are not supported in the 1.18 release.

Update all documentation to show supported patterns inline (*, ?, [abc])
with clickable links to the detailed namespace-glob-patterns page.
Simplify YAML comments by removing non-clickable URLs.

Update tests to expect errors when brace patterns are used.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Joseph <jvaikath@redhat.com>

* Document brace expansion as unsupported

Add {} and , to the unsupported patterns section to clarify that
brace expansion patterns like {a,b,c} are not supported.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Joseph <jvaikath@redhat.com>

* Update tests to expect brace pattern rejection

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Joseph <jvaikath@redhat.com>

---------

Signed-off-by: Joseph <jvaikath@redhat.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-11 12:43:55 -05:00
Scott Seago
eadacf43e1 updated to reflect moving VolumeHelper interface
Signed-off-by: Scott Seago <sseago@redhat.com>
2026-02-10 20:27:24 -05:00
Scott Seago
ddd83a66c5 Design for custom volume policy action API changes
Signed-off-by: Scott Seago <sseago@redhat.com>
2026-02-10 12:09:27 -05:00
lyndon-li
7af688fbf5 Merge pull request #9508 from kaovilai/9507
Some checks failed
Run the E2E test on kind / get-go-version (push) Failing after 1m41s
Run the E2E test on kind / build (push) Has been skipped
Run the E2E test on kind / setup-test-matrix (push) Successful in 5s
Run the E2E test on kind / run-e2e-test (push) Has been skipped
Main CI / get-go-version (push) Successful in 18s
Main CI / Build (push) Failing after 34s
Close stale issues and PRs / stale (push) Successful in 15s
Trivy Nightly Scan / Trivy nightly scan (velero, main) (push) Failing after 1m58s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-aws, main) (push) Failing after 1m34s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-gcp, main) (push) Failing after 2m0s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-microsoft-azure, main) (push) Failing after 1m45s
Fix VolumePolicy PVC phase condition filter for unbound PVCs (#9507)
2026-02-10 17:53:46 +08:00
Lyndon-Li
41fa774844 support custom os
Signed-off-by: Lyndon-Li <lyonghui@vmware.com>
2026-02-10 13:35:07 +08:00
Lyndon-Li
5121417457 Merge branch 'main' into support-customized-host-os 2026-02-09 18:36:55 +08:00
Lyndon-Li
ece04e6e39 Merge branch 'main' into issue-fix-9343 2026-02-09 18:34:14 +08:00
Tiger Kaovilai
71ddeefcd6 Fix VolumePolicy PVC phase condition filter for unbound PVCs
Use typed error approach: Make GetPVForPVC return ErrPVNotFoundForPVC
when PV is not expected to be found (unbound PVC), then use errors.Is
to check for this error type. When a matching policy exists (e.g.,
pvcPhase: [Pending, Lost] with action: skip), apply the action without
error. When no policy matches, return the original error to preserve
default behavior.

Changes:
- Add ErrPVNotFoundForPVC sentinel error to pvc_pv.go
- Update ShouldPerformSnapshot to handle unbound PVCs with policies
- Update ShouldPerformFSBackup to handle unbound PVCs with policies
- Update item_backupper.go to handle Lost PVCs in tracking functions
- Remove checkPVCOnlySkip helper (no longer needed)
- Update tests to reflect new behavior

Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-09 01:03:45 -05:00
Xun Jiang/Bruce Jiang
e159992f48 Merge pull request #9529 from Lyndon-Li/move-implemented-design-for-1.18
Some checks failed
Run the E2E test on kind / get-go-version (push) Failing after 1m49s
Run the E2E test on kind / build (push) Has been skipped
Run the E2E test on kind / setup-test-matrix (push) Successful in 3s
Run the E2E test on kind / run-e2e-test (push) Has been skipped
Main CI / get-go-version (push) Successful in 15s
Main CI / Build (push) Failing after 34s
Close stale issues and PRs / stale (push) Successful in 18s
Trivy Nightly Scan / Trivy nightly scan (velero, main) (push) Failing after 1m44s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-aws, main) (push) Failing after 1m25s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-gcp, main) (push) Failing after 1m35s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-microsoft-azure, main) (push) Failing after 1m36s
Move implemented design for 1.18
2026-02-09 10:32:08 +08:00
Lyndon-Li
48b14194df move implemented design for 1.18
Signed-off-by: Lyndon-Li <lyonghui@vmware.com>
2026-02-06 18:46:41 +08:00
Lyndon-Li
4ada356bf1 Merge branch 'main' into block-data-mover-design 2026-02-06 18:20:06 +08:00
Lyndon-Li
7f51017842 block data mover design
Signed-off-by: Lyndon-Li <lyonghui@vmware.com>
2026-02-06 18:18:53 +08:00
lyndon-li
556d5826a8 Merge pull request #9523 from Lyndon-Li/1.18-release-notes
Some checks failed
Run the E2E test on kind / get-go-version (push) Failing after 1m20s
Run the E2E test on kind / build (push) Has been skipped
Run the E2E test on kind / setup-test-matrix (push) Successful in 3s
Run the E2E test on kind / run-e2e-test (push) Has been skipped
Main CI / get-go-version (push) Successful in 15s
Main CI / Build (push) Failing after 38s
1.18 release note and changelog
2026-02-06 13:47:20 +08:00
Lyndon-Li
62939cec18 1.18 release note and changelog
Signed-off-by: Lyndon-Li <lyonghui@vmware.com>
2026-02-06 13:18:59 +08:00
lyndon-li
7d6a10d3ea Merge pull request #9524 from Lyndon-Li/1.18-doc
Add 1.18 doc
2026-02-06 13:16:17 +08:00
Xun Jiang/Bruce Jiang
1c0cf6c51d Merge pull request #9525 from reasonerjt/add-broadcom-to-adopters
Some checks failed
Run the E2E test on kind / get-go-version (push) Failing after 1m38s
Run the E2E test on kind / build (push) Has been skipped
Run the E2E test on kind / setup-test-matrix (push) Successful in 4s
Run the E2E test on kind / run-e2e-test (push) Has been skipped
Main CI / get-go-version (push) Successful in 29s
Main CI / Build (push) Failing after 37s
Close stale issues and PRs / stale (push) Successful in 15s
Trivy Nightly Scan / Trivy nightly scan (velero, main) (push) Failing after 1m42s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-aws, main) (push) Failing after 1m40s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-gcp, main) (push) Failing after 1m29s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-microsoft-azure, main) (push) Failing after 1m23s
Add Broadcom to adopters list
2026-02-05 18:33:27 +08:00
Daniel Jiang
58f0b29091 Add Broadcom to adopters list
Signed-off-by: Daniel Jiang <daniel.jiang@broadcom.com>
2026-02-05 17:24:49 +08:00
Lyndon-Li
5cb4cdba61 add 1.18 doc
Signed-off-by: Lyndon-Li <lyonghui@vmware.com>
2026-02-05 15:23:39 +08:00
Xun Jiang/Bruce Jiang
325eb50480 Merge pull request #9522 from Lyndon-Li/1.18-readme
Update readme for 1.18
2026-02-05 15:15:40 +08:00
Lyndon-Li
993b80a350 update readme for 1.18
Signed-off-by: Lyndon-Li <lyonghui@vmware.com>
2026-02-05 14:22:37 +08:00
Wenkai Yin(尹文开)
a909bd1f85 Merge pull request #9518 from blackpiglet/1.18_fix_cve
Some checks failed
Run the E2E test on kind / get-go-version (push) Failing after 1m24s
Run the E2E test on kind / build (push) Has been skipped
Run the E2E test on kind / setup-test-matrix (push) Successful in 5s
Run the E2E test on kind / run-e2e-test (push) Has been skipped
Main CI / get-go-version (push) Successful in 15s
Main CI / Build (push) Failing after 34s
Close stale issues and PRs / stale (push) Successful in 17s
Trivy Nightly Scan / Trivy nightly scan (velero, main) (push) Failing after 1m53s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-aws, main) (push) Failing after 1m24s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-gcp, main) (push) Failing after 1m20s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-microsoft-azure, main) (push) Failing after 1m21s
Bump golang.org/x/crypto to v0.45.0 to fix CVEs in restic
2026-02-04 17:10:08 +08:00
Xun Jiang/Bruce Jiang
62a47b9fc5 Merge pull request #9521 from reasonerjt/cleanup-maintainers-md
Clean up MAINTAINERS.md
2026-02-04 16:10:42 +08:00
Daniel Jiang
31e9dcbb87 Clean up MAINTAINERS.md
Update the affiliation of maintainers from Broadcom.
Removes the company-specific roles.

Signed-off-by: Daniel Jiang <daniel.jiang@broadcom.com>
2026-02-04 15:34:14 +08:00
Xun Jiang
f824c3ca3b Bump golang.org/x/crypto to v0.45.0 to fix CVEs.
* CVE-2025-47914
* CVE-2025-58181

Signed-off-by: Xun Jiang <xun.jiang@broadcom.com>
2026-02-03 17:23:01 +08:00
Xun Jiang/Bruce Jiang
386599638f Merge pull request #9510 from Lyndon-Li/ignore-cache-volume-config-without-backup-repo-config
Some checks failed
Run the E2E test on kind / get-go-version (push) Failing after 1m13s
Run the E2E test on kind / build (push) Has been skipped
Run the E2E test on kind / setup-test-matrix (push) Successful in 4s
Run the E2E test on kind / run-e2e-test (push) Has been skipped
Main CI / get-go-version (push) Successful in 14s
Main CI / Build (push) Failing after 34s
Close stale issues and PRs / stale (push) Successful in 18s
Trivy Nightly Scan / Trivy nightly scan (velero, main) (push) Failing after 1m59s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-aws, main) (push) Failing after 1m21s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-gcp, main) (push) Failing after 1m21s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-microsoft-azure, main) (push) Failing after 1m36s
Log when cache volume configured but backup repo is not
2026-02-02 10:50:20 +08:00
Lyndon-Li
9796da389d ignore cache volume config when backup repo config is not provided
Signed-off-by: Lyndon-Li <lyonghui@vmware.com>
2026-01-30 18:35:36 +08:00
Scott Seago
dfb1d45831 Remove backup from running list when backup fails validation (#9498)
Some checks failed
Run the E2E test on kind / get-go-version (push) Failing after 1m4s
Run the E2E test on kind / build (push) Has been skipped
Run the E2E test on kind / setup-test-matrix (push) Successful in 3s
Run the E2E test on kind / run-e2e-test (push) Has been skipped
Main CI / get-go-version (push) Successful in 12s
Main CI / Build (push) Failing after 32s
Close stale issues and PRs / stale (push) Successful in 15s
Trivy Nightly Scan / Trivy nightly scan (velero, main) (push) Failing after 1m34s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-aws, main) (push) Failing after 1m16s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-gcp, main) (push) Failing after 1m30s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-microsoft-azure, main) (push) Failing after 1m16s
Signed-off-by: Scott Seago <sseago@redhat.com>
2026-01-27 16:25:30 -05:00
Lyndon-Li
18c32ed29c support customized host os
Signed-off-by: Lyndon-Li <lyonghui@vmware.com>
2026-01-27 15:23:25 +08:00
Lyndon-Li
598c8c528b support customized host os - use affinity for host os selection
Signed-off-by: Lyndon-Li <lyonghui@vmware.com>
2026-01-27 14:49:55 +08:00
Lyndon-Li
8f9beb04f0 support customized host os
Signed-off-by: Lyndon-Li <lyonghui@vmware.com>
2026-01-27 14:37:38 +08:00
Lyndon-Li
bb518e6d89 replace nodeName with node selector
Signed-off-by: Lyndon-Li <lyonghui@vmware.com>
2026-01-26 13:58:29 +08:00
Lyndon-Li
89c5182c3c flush volume after restore
Signed-off-by: Lyndon-Li <lyonghui@vmware.com>
2026-01-26 13:17:44 +08:00
Lyndon-Li
d17435542e Merge branch 'main' into uploader-flush-buffer 2026-01-26 11:15:14 +08:00
Xun Jiang/Bruce Jiang
72beb35edc Maintenance Job only uses the first element of the LoadAffinity array from the ConfigMap. (#9494)
Some checks failed
Run the E2E test on kind / get-go-version (push) Failing after 1m7s
Run the E2E test on kind / build (push) Has been skipped
Run the E2E test on kind / setup-test-matrix (push) Successful in 3s
Run the E2E test on kind / run-e2e-test (push) Has been skipped
Main CI / get-go-version (push) Successful in 17s
Main CI / Build (push) Failing after 35s
Close stale issues and PRs / stale (push) Successful in 14s
Trivy Nightly Scan / Trivy nightly scan (velero, main) (push) Failing after 1m46s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-aws, main) (push) Failing after 1m8s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-gcp, main) (push) Failing after 1m25s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-microsoft-azure, main) (push) Failing after 1m13s
Signed-off-by: Xun Jiang <xun.jiang@broadcom.com>
2026-01-23 14:27:50 -05:00
Lyndon-Li
e3b501d0d9 issue 9343: include PV topology to data mover pod affinities
Signed-off-by: Lyndon-Li <lyonghui@vmware.com>
2026-01-23 15:45:43 +08:00
Wenkai Yin(尹文开)
7442d20f9d Merge pull request #9481 from Lyndon-Li/issue-fix-9478
Some checks failed
Run the E2E test on kind / get-go-version (push) Failing after 1m3s
Run the E2E test on kind / build (push) Has been skipped
Run the E2E test on kind / setup-test-matrix (push) Successful in 3s
Run the E2E test on kind / run-e2e-test (push) Has been skipped
Main CI / get-go-version (push) Successful in 15s
Main CI / Build (push) Failing after 35s
Close stale issues and PRs / stale (push) Successful in 13s
Trivy Nightly Scan / Trivy nightly scan (velero, main) (push) Failing after 1m46s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-aws, main) (push) Failing after 1m34s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-gcp, main) (push) Failing after 1m22s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-microsoft-azure, main) (push) Failing after 1m18s
Issue 9478: Diagnose expose on peek error
2026-01-15 16:53:57 +08:00
lyndon-li
4dfb47dd21 Merge pull request #9487 from Lyndon-Li/issue-fix-for-cache-volume
Fix issue for cache volume
2026-01-15 11:43:36 +08:00
Lyndon-Li
e72fea8ecd fix issue for cache volume
Signed-off-by: Lyndon-Li <lyonghui@vmware.com>
2026-01-14 17:45:01 +08:00
lyndon-li
f388a5ce51 update doc statement for Windows support (#9486)
Some checks failed
Run the E2E test on kind / get-go-version (push) Failing after 1m7s
Run the E2E test on kind / build (push) Has been skipped
Run the E2E test on kind / setup-test-matrix (push) Successful in 4s
Run the E2E test on kind / run-e2e-test (push) Has been skipped
Main CI / get-go-version (push) Successful in 13s
Main CI / Build (push) Failing after 1m1s
Close stale issues and PRs / stale (push) Successful in 13s
Trivy Nightly Scan / Trivy nightly scan (velero, main) (push) Failing after 1m31s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-aws, main) (push) Failing after 1m20s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-gcp, main) (push) Failing after 1m17s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-microsoft-azure, main) (push) Failing after 1m5s
Signed-off-by: Lyndon-Li <lyonghui@vmware.com>
2026-01-14 00:25:58 -05:00
Lyndon-Li
e703e06eeb diagnose expose on peek error
Signed-off-by: Lyndon-Li <lyonghui@vmware.com>
2026-01-13 16:33:14 +08:00
Wenkai Yin(尹文开)
1feaafc03e Merge pull request #9474 from blackpiglet/add_role_rolebinding_in_restore_sequence
Some checks failed
Run the E2E test on kind / get-go-version (push) Failing after 1m58s
Run the E2E test on kind / build (push) Has been skipped
Run the E2E test on kind / setup-test-matrix (push) Successful in 5s
Run the E2E test on kind / run-e2e-test (push) Has been skipped
Main CI / get-go-version (push) Successful in 15s
Main CI / Build (push) Failing after 42m42s
Close stale issues and PRs / stale (push) Successful in 14s
Trivy Nightly Scan / Trivy nightly scan (velero, main) (push) Failing after 1m38s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-aws, main) (push) Failing after 1m12s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-gcp, main) (push) Failing after 1m22s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-microsoft-azure, main) (push) Failing after 1m30s
Add Role, RoleBinding, ClusterRole, and ClusterRoleBinding in restore sequence
2026-01-08 11:58:06 +08:00
Xun Jiang/Bruce Jiang
e446ce54f6 Merge pull request #9473 from vmware-tanzu/fix_e2e_version_check_issue
Some checks failed
Run the E2E test on kind / get-go-version (push) Failing after 1m29s
Run the E2E test on kind / build (push) Has been skipped
Run the E2E test on kind / setup-test-matrix (push) Successful in 3s
Run the E2E test on kind / run-e2e-test (push) Has been skipped
Main CI / get-go-version (push) Successful in 14s
Main CI / Build (push) Failing after 44s
Close stale issues and PRs / stale (push) Successful in 16s
Trivy Nightly Scan / Trivy nightly scan (velero, main) (push) Failing after 1m44s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-aws, main) (push) Failing after 1m14s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-gcp, main) (push) Failing after 1m15s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-microsoft-azure, main) (push) Failing after 1m20s
Fix the version regexp to make sure releaseRe and tegRe can support string like v1.16
2026-01-07 22:41:40 +08:00
Xun Jiang
b7289b51c7 Add Role, RoleBinding, ClusterRole, and ClusterRoleBinding in restore sequence.
Ensure the RBAC resources are restored before pods.
The change help to avoid pod starting error when pod depends on the RBAC resources,
e.g., prometheus operator check whether it has enough permission before launching
controller, if prometheus operator pod starts before RBAC resources created, it
will not launch controllers, and it will not retry.
f7f07bcdfb/cmd/operator/main.go (L392-L400)

Signed-off-by: Xun Jiang <xun.jiang@broadcom.com>
2026-01-07 12:40:23 +08:00
lyndon-li
6eae73f0bf Merge pull request #9466 from Lyndon-Li/collect-kopia-content-log
Some checks failed
Run the E2E test on kind / get-go-version (push) Failing after 1m9s
Run the E2E test on kind / build (push) Has been skipped
Run the E2E test on kind / setup-test-matrix (push) Successful in 3s
Run the E2E test on kind / run-e2e-test (push) Has been skipped
Main CI / get-go-version (push) Successful in 13s
Main CI / Build (push) Failing after 33s
Close stale issues and PRs / stale (push) Successful in 14s
Trivy Nightly Scan / Trivy nightly scan (velero, main) (push) Failing after 1m49s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-aws, main) (push) Failing after 1m24s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-gcp, main) (push) Failing after 1m56s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-microsoft-azure, main) (push) Failing after 1m28s
Collect kopia content log
2026-01-06 15:51:47 +08:00
Xun Jiang
07a3cf759d Fix the version regexp to make sure releaseRe and tegRe can support string like v1.16.
Signed-off-by: Xun Jiang <xun.jiang@broadcom.com>
2026-01-06 11:51:59 +08:00
lyndon-li
8f8367be65 Merge pull request #9456 from lubronzhan/topic/lubron/fix_logger
Some checks failed
Run the E2E test on kind / get-go-version (push) Failing after 2m6s
Run the E2E test on kind / build (push) Has been skipped
Run the E2E test on kind / setup-test-matrix (push) Successful in 4s
Run the E2E test on kind / run-e2e-test (push) Has been skipped
Main CI / get-go-version (push) Successful in 13s
Main CI / Build (push) Failing after 35s
Close stale issues and PRs / stale (push) Successful in 15s
Trivy Nightly Scan / Trivy nightly scan (velero, main) (push) Failing after 1m38s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-aws, main) (push) Failing after 1m21s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-gcp, main) (push) Failing after 1m16s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-microsoft-azure, main) (push) Failing after 1m11s
Fix: Update the logging to print correct affinity field
2026-01-05 13:50:55 +08:00
Lubron Zhan
0d80995e62 Update the logging to print correct affinity field
Signed-off-by: Lubron Zhan <lubronzhan@gmail.com>
2026-01-01 11:37:21 -08:00
Lyndon-Li
1425ebb369 collect kopia content log
Signed-off-by: Lyndon-Li <lyonghui@vmware.com>
2025-12-31 15:42:14 +08:00
Lubron Zhan
04364ef2ca Update the logging to print correct affinity field
Signed-off-by: Lubron Zhan <lubronzhan@gmail.com>
2025-12-30 11:00:49 -08:00
lyndon-li
3b5118b45e Merge pull request #9463 from Lyndon-Li/kopia-bump-up-0.22.3
Some checks failed
Run the E2E test on kind / get-go-version (push) Failing after 1m10s
Run the E2E test on kind / build (push) Has been skipped
Run the E2E test on kind / setup-test-matrix (push) Successful in 3s
Run the E2E test on kind / run-e2e-test (push) Has been skipped
Main CI / get-go-version (push) Successful in 13s
Main CI / Build (push) Failing after 41s
Close stale issues and PRs / stale (push) Successful in 19s
Trivy Nightly Scan / Trivy nightly scan (velero, main) (push) Failing after 1m45s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-aws, main) (push) Failing after 1m19s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-gcp, main) (push) Failing after 1m22s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-microsoft-azure, main) (push) Failing after 1m21s
Bump kopia to 0.22.3
2025-12-30 18:18:33 +08:00
Lyndon-Li
c556603ce2 bump kopia to 0.22.3
Signed-off-by: Lyndon-Li <lyonghui@vmware.com>
2025-12-30 16:44:52 +08:00
Lyndon-Li
060b3364f2 uploader flush buffer for restore
Signed-off-by: Lyndon-Li <lyonghui@vmware.com>
2025-12-29 18:19:23 +08:00
Xun Jiang/Bruce Jiang
8708d4fda8 Merge pull request #9457 from blackpiglet/9380_fix
Some checks failed
Run the E2E test on kind / get-go-version (push) Failing after 1m17s
Run the E2E test on kind / build (push) Has been skipped
Run the E2E test on kind / setup-test-matrix (push) Successful in 3s
Run the E2E test on kind / run-e2e-test (push) Has been skipped
Main CI / get-go-version (push) Successful in 12s
Main CI / Build (push) Failing after 39s
Close stale issues and PRs / stale (push) Successful in 12s
Trivy Nightly Scan / Trivy nightly scan (velero, main) (push) Failing after 1m44s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-aws, main) (push) Failing after 1m11s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-gcp, main) (push) Failing after 1m24s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-microsoft-azure, main) (push) Failing after 1m17s
Add function to validate the E2E Velero version.
2025-12-29 14:04:39 +08:00
Xun Jiang/Bruce Jiang
0db7816fa6 Merge branch 'main' into 9380_fix 2025-12-26 18:09:22 +08:00
lyndon-li
8f2016000a Merge pull request #9166 from kaovilai/claude/issue-107-20250807-1638
Some checks failed
Run the E2E test on kind / get-go-version (push) Failing after 1m3s
Run the E2E test on kind / build (push) Has been skipped
Run the E2E test on kind / setup-test-matrix (push) Successful in 4s
Run the E2E test on kind / run-e2e-test (push) Has been skipped
Main CI / get-go-version (push) Successful in 11s
Main CI / Build (push) Failing after 35s
Close stale issues and PRs / stale (push) Successful in 15s
Trivy Nightly Scan / Trivy nightly scan (velero, main) (push) Failing after 1m32s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-aws, main) (push) Failing after 1m17s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-gcp, main) (push) Failing after 1m26s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-microsoft-azure, main) (push) Failing after 1m20s
Add VolumePolicy support for Pending PVC Phase conditions
2025-12-24 11:05:34 +08:00
claude[bot]
80211d77e5 feat: Add VolumePolicy support for PVC Phase conditions to allow skipping Pending PVCs
This commit implements VolumePolicy support for PVC Phase conditions, resolving
vmware-tanzu/velero#7233 where backups fail with ''PVC has no volume backing this claim''
for Pending PVCs.

Changes made:
- Extended VolumePolicy API to support PVC phase conditions
- Added pvcPhaseCondition struct with matching logic
- Modified getMatchAction() to evaluate policies for unbound PVCs before returning errors
- Added case to GetMatchAction() to handle PVC-only scenarios (nil PV)
- Added comprehensive unit tests for PVC phase parsing and matching

Users can now skip Pending PVCs through volume policy configuration:
  apiVersion: v1
  kind: ConfigMap
  metadata:
    name: volume-policy
    namespace: velero
  data:
    policy.yaml: |
      version: v1
      volumePolicies:
      - conditions:
          pvcPhase: [Pending]
        action:
          type: skip

chore: rename changelog file to match PR #9166

Renamed changelogs/unreleased/7233-claude to changelogs/unreleased/9166-claude
to match the opened PR at https://github.com/vmware-tanzu/velero/pull/9166

docs: Add PVC phase condition support to VolumePolicy documentation

- Added pvcPhase field to YAML template example
- Documented pvcPhase as a supported condition in the list
- Added comprehensive examples for using PVC phase conditions
- Included examples for Pending, Bound, and Lost phases
- Demonstrated combining PVC phase with other conditions

Co-Authored-By: Tiger Kaovilai <kaovilai@users.noreply.github.com>
2025-12-22 10:07:05 +07:00
369 changed files with 21150 additions and 5121 deletions

View File

@@ -22,7 +22,7 @@ jobs:
uses: actions/checkout@v6
- name: Run Trivy vulnerability scanner
uses: aquasecurity/trivy-action@master
uses: aquasecurity/trivy-action@57a97c7e7821a5776cebc9bb87c984fa69cba8f1
with:
image-ref: 'docker.io/velero/${{ matrix.images }}:${{ matrix.versions }}'
severity: 'CRITICAL,HIGH,MEDIUM'

93
.github/workflows/pr-filepath-check.yml vendored Normal file
View File

@@ -0,0 +1,93 @@
name: Pull Request File Path Check
on: [pull_request]
jobs:
filepath-check:
name: Check for invalid characters in file paths
runs-on: ubuntu-latest
steps:
- name: Check out the code
uses: actions/checkout@v6
- name: Validate file paths for Go module compatibility
run: |
# Go's module zip rejects filenames containing certain characters.
# See golang.org/x/mod/module fileNameOK() for the full specification.
#
# Allowed ASCII: letters, digits, and: !#$%&()+,-.=@[]^_{}~ and space
# Allowed non-ASCII: unicode letters only
# Rejected: " ' * < > ? ` | / \ : and any non-letter unicode (control
# chars, format chars like U+200E LEFT-TO-RIGHT MARK, etc.)
#
# This check catches issues like the U+200E incident in PR #9552.
EXIT_STATUS=0
git ls-files -z | python3 -c "
import sys, unicodedata
data = sys.stdin.buffer.read()
files = data.split(b'\x00')
# Characters explicitly rejected by Go's fileNameOK
# (path separators / and \ are inherent to paths so we check per-element)
bad_ascii = set('\"' + \"'\" + '*<>?\`|:')
allowed_ascii = set('!#$%&()+,-.=@[]^_{}~ ')
def is_ok(ch):
if ch.isascii():
return ch.isalnum() or ch in allowed_ascii
return ch.isalpha()
bad_files = [] # list of (original_path, clean_path, char_desc)
for f in files:
if not f:
continue
try:
name = f.decode('utf-8')
except UnicodeDecodeError:
print(f'::error::Non-UTF-8 bytes in filename: {f!r}')
bad_files.append((repr(f), None, 'non-UTF-8 bytes'))
continue
# Check each path element (split on /)
for element in name.split('/'):
for ch in element:
if not is_ok(ch):
cp = ord(ch)
char_name = unicodedata.name(ch, f'U+{cp:04X}')
char_desc = f'U+{cp:04X} ({char_name})'
# Build cleaned path by stripping invalid chars
clean = '/'.join(
''.join(c for c in elem if is_ok(c))
for elem in name.split('/')
)
print(f'::error file={name}::File \"{name}\" contains invalid char {char_desc}')
bad_files.append((name, clean, char_desc))
break
if bad_files:
print()
print('The following files have characters that are invalid in Go module zip archives:')
print()
for original, clean, desc in bad_files:
print(f' {original} — {desc}')
print()
print('To fix, rename the files to remove the problematic characters:')
print()
for original, clean, desc in bad_files:
if clean:
print(f' mv \"{original}\" \"{clean}\" && git add \"{clean}\"')
print(f' # or: git mv \"{original}\" \"{clean}\"')
else:
print(f' # {original} — cannot auto-suggest rename (non-UTF-8)')
print()
print('See https://github.com/vmware-tanzu/velero/pull/9552 for context.')
sys.exit(1)
else:
print('All file paths are valid for Go module zip.')
" || EXIT_STATUS=1
exit $EXIT_STATUS

View File

@@ -17,6 +17,7 @@ If you're using Velero and want to add your organization to this list,
<a href="https://www.replicated.com/" border="0" target="_blank"><img alt="replicated.com" src="site/static/img/adopters/replicated-logo-red.svg" height="50"></a>
<a href="https://cloudcasa.io/" border="0" target="_blank"><img alt="cloudcasa.io" src="site/static/img/adopters/cloudcasa.svg" height="50"></a>
<a href="https://azure.microsoft.com/" border="0" target="_blank"><img alt="azure.com" src="site/static/img/adopters/azure.svg" height="50"></a>
<a href="https://www.broadcom.com/" border="0" target="_blank"><img alt="broadcom.com" src="site/static/img/adopters/broadcom.svg" height="50"></a>
## Success Stories
Below is a list of adopters of Velero in **production environments** that have
@@ -68,6 +69,9 @@ Replicated uses the Velero open source project to enable snapshots in [KOTS][101
**[Microsoft Azure][105]**<br>
[Azure Backup for AKS][106] is an Azure native, Kubernetes aware, Enterprise ready backup for containerized applications deployed on Azure Kubernetes Service (AKS). AKS Backup utilizes Velero to perform backup and restore operations to protect stateful applications in AKS clusters.<br>
**[Broadcom][107]**<br>
[VMware Cloud Foundation][108] (VCF) offers built-in [vSphere Kubernetes Service][109] (VKS), a Kubernetes runtime that includes a CNCF certified Kubernetes distribution, to deploy and manage containerized workloads. VCF empowers platform engineers with native [Kubernetes multi-cluster management][110] capability for managing Kubernetes (K8s) infrastructure at scale. VCF utilizes Velero for Kubernetes data protection enabling platform engineers to back up and restore containerized workloads manifests & persistent volumes, helping to increase the resiliency of stateful applications in VKS cluster.
## Adding your organization to the list of Velero Adopters
If you are using Velero and would like to be included in the list of `Velero Adopters`, add an SVG version of your logo to the `site/static/img/adopters` directory in this repo and submit a [pull request][3] with your change. Name the image file something that reflects your company (e.g., if your company is called Acme, name the image acme.png). See this for an example [PR][4].
@@ -125,3 +129,8 @@ If you would like to add your logo to a future `Adopters of Velero` section on [
[105]: https://azure.microsoft.com/
[106]: https://learn.microsoft.com/azure/backup/backup-overview
[107]: https://www.broadcom.com/
[108]: https://www.vmware.com/products/cloud-infrastructure/vmware-cloud-foundation
[109]: https://www.vmware.com/products/cloud-infrastructure/vsphere-kubernetes-service
[110]: https://blogs.vmware.com/cloud-foundation/2025/09/29/empowering-platform-engineers-with-native-kubernetes-multi-cluster-management-in-vmware-cloud-foundation/

View File

@@ -13,7 +13,7 @@
# limitations under the License.
# Velero binary build section
FROM --platform=$BUILDPLATFORM golang:1.25-bookworm AS velero-builder
FROM --platform=$BUILDPLATFORM golang:1.25-trixie AS velero-builder
ARG GOPROXY
ARG BIN
@@ -48,30 +48,6 @@ RUN mkdir -p /output/usr/bin && \
-ldflags "${LDFLAGS}" ${PKG}/cmd/velero-helper && \
go clean -modcache -cache
# Restic binary build section
FROM --platform=$BUILDPLATFORM golang:1.25-bookworm AS restic-builder
ARG GOPROXY
ARG BIN
ARG TARGETOS
ARG TARGETARCH
ARG TARGETVARIANT
ARG RESTIC_VERSION
ENV CGO_ENABLED=0 \
GO111MODULE=on \
GOPROXY=${GOPROXY} \
GOOS=${TARGETOS} \
GOARCH=${TARGETARCH} \
GOARM=${TARGETVARIANT}
COPY . /go/src/github.com/vmware-tanzu/velero
RUN mkdir -p /output/usr/bin && \
export GOARM=$(echo "${GOARM}" | cut -c2-) && \
/go/src/github.com/vmware-tanzu/velero/hack/build-restic.sh && \
go clean -modcache -cache
# Velero image packing section
FROM paketobuildpacks/run-jammy-tiny:latest
@@ -79,7 +55,4 @@ LABEL maintainer="Xun Jiang <jxun@vmware.com>"
COPY --from=velero-builder /output /
COPY --from=restic-builder /output /
USER cnb:cnb

View File

@@ -15,7 +15,7 @@
ARG OS_VERSION=1809
# Velero binary build section
FROM --platform=$BUILDPLATFORM golang:1.25-bookworm AS velero-builder
FROM --platform=$BUILDPLATFORM golang:1.25-trixie AS velero-builder
ARG GOPROXY
ARG BIN

View File

@@ -7,11 +7,11 @@
| Maintainer | GitHub ID | Affiliation |
|---------------------|---------------------------------------------------------------|--------------------------------------------------|
| Scott Seago | [sseago](https://github.com/sseago) | [OpenShift](https://github.com/openshift) |
| Daniel Jiang | [reasonerjt](https://github.com/reasonerjt) | [VMware](https://www.github.com/vmware/) |
| Wenkai Yin | [ywk253100](https://github.com/ywk253100) | [VMware](https://www.github.com/vmware/) |
| Xun Jiang | [blackpiglet](https://github.com/blackpiglet) | [VMware](https://www.github.com/vmware/) |
| Daniel Jiang | [reasonerjt](https://github.com/reasonerjt) | Broadcom |
| Wenkai Yin | [ywk253100](https://github.com/ywk253100) | Broadcom |
| Xun Jiang | [blackpiglet](https://github.com/blackpiglet) | Broadcom |
| Shubham Pampattiwar | [shubham-pampattiwar](https://github.com/shubham-pampattiwar) | [OpenShift](https://github.com/openshift) |
| Yonghui Li | [Lyndon-Li](https://github.com/Lyndon-Li) | [VMware](https://www.github.com/vmware/) |
| Yonghui Li | [Lyndon-Li](https://github.com/Lyndon-Li) | Broadcom |
| Anshul Ahuja | [anshulahuja98](https://github.com/anshulahuja98) | [Microsoft Azure](https://www.github.com/azure/) |
| Tiger Kaovilai | [kaovilai](https://github.com/kaovilai) | [OpenShift](https://github.com/openshift) |
@@ -27,14 +27,3 @@
* JenTing Hsiao ([jenting](https://github.com/jenting))
* Dave Smith-Uchida ([dsu-igeek](https://github.com/dsu-igeek))
* Ming Qiu ([qiuming-best](https://github.com/qiuming-best))
## Velero Contributors & Stakeholders
| Feature Area | Lead |
|------------------------|:------------------------------------------------------------------------------------:|
| Technical Lead | Daniel Jiang [reasonerjt](https://github.com/reasonerjt) |
| Kubernetes CSI Liaison | |
| Deployment | |
| Community Management | Orlin Vasilev [OrlinVasilev](https://github.com/OrlinVasilev) |
| Product Management | Pradeep Kumar Chaturvedi [pradeepkchaturvedi](https://github.com/pradeepkchaturvedi) |

View File

@@ -105,8 +105,6 @@ see: https://velero.io/docs/main/build-from-source/#making-images-and-updating-v
endef
# comma cannot be escaped and can only be used in Make function arguments by putting into variable
comma=,
# The version of restic binary to be downloaded
RESTIC_VERSION ?= 0.15.0
CLI_PLATFORMS ?= linux-amd64 linux-arm linux-arm64 darwin-amd64 darwin-arm64 windows-amd64 linux-ppc64le linux-s390x
BUILD_OUTPUT_TYPE ?= docker
@@ -260,7 +258,6 @@ container-linux:
--build-arg=GIT_SHA=$(GIT_SHA) \
--build-arg=GIT_TREE_STATE=$(GIT_TREE_STATE) \
--build-arg=REGISTRY=$(REGISTRY) \
--build-arg=RESTIC_VERSION=$(RESTIC_VERSION) \
--provenance=false \
--sbom=false \
-f $(VELERO_DOCKERFILE) .

View File

@@ -42,13 +42,11 @@ The following is a list of the supported Kubernetes versions for each Velero ver
| Velero version | Expected Kubernetes version compatibility | Tested on Kubernetes version |
|----------------|-------------------------------------------|-------------------------------------|
| 1.17 | 1.18-latest | 1.31.7, 1.32.3, 1.33.1, and 1.34.0 |
| 1.18 | 1.18-latest | 1.33.7, 1.34.1, and 1.35.0 |
| 1.17 | 1.18-latest | 1.31.7, 1.32.3, 1.33.1, and 1.34.0 |
| 1.16 | 1.18-latest | 1.31.4, 1.32.3, and 1.33.0 |
| 1.15 | 1.18-latest | 1.28.8, 1.29.8, 1.30.4 and 1.31.1 |
| 1.14 | 1.18-latest | 1.27.9, 1.28.9, and 1.29.4 |
| 1.13 | 1.18-latest | 1.26.5, 1.27.3, 1.27.8, and 1.28.3 |
| 1.12 | 1.18-latest | 1.25.7, 1.26.5, 1.26.7, and 1.27.3 |
| 1.11 | 1.18-latest | 1.23.10, 1.24.9, 1.25.5, and 1.26.1 |
Velero supports IPv4, IPv6, and dual stack environments. Support for this was tested against Velero v1.8.

View File

@@ -103,11 +103,6 @@ local_resource(
deps = ["internal", "pkg/cmd"],
)
local_resource(
"restic_binary",
cmd = 'cd ' + '.' + ';mkdir -p _tiltbuild/restic; BIN=velero GOOS=linux GOARCH=amd64 GOARM="" RESTIC_VERSION=0.13.1 OUTPUT_DIR=_tiltbuild/restic ./hack/build-restic.sh',
)
# Note: we need a distro with a bash shell to exec into the Velero container
tilt_dockerfile_header = """
FROM ubuntu:22.04 as tilt
@@ -118,7 +113,6 @@ WORKDIR /
COPY --from=tilt-helper /start.sh .
COPY --from=tilt-helper /restart.sh .
COPY velero .
COPY restic/restic /usr/bin/restic
"""
dockerfile_contents = "\n".join([

View File

@@ -0,0 +1,109 @@
## v1.18
### Download
https://github.com/vmware-tanzu/velero/releases/tag/v1.18.0
### Container Image
`velero/velero:v1.18.0`
### Documentation
https://velero.io/docs/v1.18/
### Upgrading
https://velero.io/docs/v1.18/upgrade-to-1.18/
### Highlights
#### Concurrent backup
In v1.18, Velero is capable to process multiple backups concurrently. This is a significant usability improvement, especially for multiple tenants or multiple users case, backups submitted from different users could run their backups simultaneously without interfering with each other.
Check design https://github.com/vmware-tanzu/velero/blob/main/design/Implemented/concurrent-backup-processing.md for more details.
#### Cache volume for data movers
In v1.18, Velero allows users to configure cache volumes for data mover pods during restore for CSI snapshot data movement and fs-backup. This brings below benefits:
- Solve the problem that data mover pods fail to when pod's ephemeral disk is limited
- Solve the problem that multiple data mover pods fail to run concurrently in one node when the node's ephemeral disk is limited
- Working together with backup repository's cache limit configuration, cache volume with appropriate size helps to improve the restore throughput
Check design https://github.com/vmware-tanzu/velero/blob/main/design/Implemented/backup-repo-cache-volume.md for more details.
#### Incremental size for data movers
In v1.18, Velero allows users to observe the incremental size of data movers backups for CSI snapshot data movement and fs-backup, so that users could visually see the data reduction due to incremental backup.
#### Wildcard support for namespaces
In v1.18, Velero allows to use Glob regular expressions for namespace filters during backup and restore, so that users could filter namespaces in a batch manner.
#### VolumePolicy for PVC phase
In v1.18, Velero VolumePolicy supports actions by PVC phase, which help users to do special operations for PVCs with a specific phase, e.g., skip PVCs in Pending/Lost status from the backup.
#### Scalability and Resiliency improvements
##### Prevent Velero server OOM Kill for large backup repositories
In v1.18, some backup repository operations are delay executed out of Velero server, so Velero server won't be OOM Killed.
#### Performance improvement for VolumePolicy
In v1.18, VolumePolicy is enhanced for large number of pods/PVCs so that the performance is significantly improved.
#### Events for data mover pod diagnostic
In v1.18, events are recorded into data mover pod diagnostic, which allows user to see more information for troubleshooting when the data mover pod fails.
### Runtime and dependencies
Golang runtime: 1.25.7
kopia: 0.22.3
### Limitations/Known issues
### Breaking changes
#### Deprecation of PVC selected node feature
According to [Velero deprecation policy](https://github.com/vmware-tanzu/velero/blob/main/GOVERNANCE.md#deprecation-policy), PVC selected node feature is deprecated in v1.18. Velero could appropriately handle PVC's selected-node annotation, so users don't need to do anything particularly.
### All Changes
* Remove backup from running list when backup fails validation (#9498, @sseago)
* Maintenance Job only uses the first element of the LoadAffinity array (#9494, @blackpiglet)
* Fix issue #9478, add diagnose info on expose peek fails (#9481, @Lyndon-Li)
* Add Role, RoleBinding, ClusterRole, and ClusterRoleBinding in restore sequence. (#9474, @blackpiglet)
* Add maintenance job and data mover pod's labels and annotations setting. (#9452, @blackpiglet)
* Fix plugin init container names exceeding DNS-1123 limit (#9445, @mpryc)
* Add PVC-to-Pod cache to improve volume policy performance (#9441, @shubham-pampattiwar)
* Remove VolumeSnapshotClass from CSI B/R process. (#9431, @blackpiglet)
* Use hookIndex for recording multiple restore exec hooks. (#9366, @blackpiglet)
* Sanitize Azure HTTP responses in BSL status messages (#9321, @shubham-pampattiwar)
* Remove labels associated with previous backups (#9206, @Joeavaikath)
* Add VolumePolicy support for PVC Phase conditions to allow skipping Pending PVCs (#9166, @claude)
* feat: Enhance BackupStorageLocation with Secret-based CA certificate support (#9141, @kaovilai)
* Add `--apply` flag to `install` command, allowing usage of Kubernetes apply to make changes to existing installs (#9132, @mjnagel)
* Fix issue #9194, add doc for GOMAXPROCS behavior change (#9420, @Lyndon-Li)
* Apply volume policies to VolumeGroupSnapshot PVC filtering (#9419, @shubham-pampattiwar)
* Fix issue #9276, add doc for cache volume support (#9418, @Lyndon-Li)
* Add Prometheus metrics for maintenance jobs (#9414, @shubham-pampattiwar)
* Fix issue #9400, connect repo first time after creation so that init params could be written (#9407, @Lyndon-Li)
* Cache volume for PVR (#9397, @Lyndon-Li)
* Cache volume support for DataDownload (#9391, @Lyndon-Li)
* don't copy securitycontext from first container if configmap found (#9389, @sseago)
* Refactor repo provider interface for static configuration (#9379, @Lyndon-Li)
* Fix issue #9365, prevent fake completion notification due to multiple update of single PVR (#9375, @Lyndon-Li)
* Add cache volume configuration (#9370, @Lyndon-Li)
* Track actual resource names for GenerateName in restore status (#9368, @shubham-pampattiwar)
* Fix managed fields patch for resources using GenerateName (#9367, @shubham-pampattiwar)
* Support cache volume for generic restore exposer and pod volume exposer (#9362, @Lyndon-Li)
* Add incrementalSize to DU/PVB for reporting new/changed size (#9357, @sseago)
* Add snapshotSize for DataDownload, PodVolumeRestore (#9354, @Lyndon-Li)
* Add cache dir configuration for udmrepo (#9353, @Lyndon-Li)
* Fix the Job build error when BackupReposiotry name longer than 63. (#9350, @blackpiglet)
* Add cache configuration to VGDP (#9342, @Lyndon-Li)
* Fix issue #9332, add bytesDone for cache files (#9333, @Lyndon-Li)
* Fix typos in documentation (#9329, @T4iFooN-IX)
* Concurrent backup processing (#9307, @sseago)
* VerifyJSONConfigs verify every elements in Data. (#9302, @blackpiglet)
* Fix issue #9267, add events to data mover prepare diagnostic (#9296, @Lyndon-Li)
* Add option for privileged fs-backup pod (#9295, @sseago)
* Fix issue #9193, don't connect repo in repo controller (#9291, @Lyndon-Li)
* Implement concurrency control for cache of native VolumeSnapshotter plugin. (#9281, @0xLeo258)
* Fix issue #7904, remove the code and doc for PVC node selection (#9269, @Lyndon-Li)
* Fix schedule controller to prevent backup queue accumulation during extended blocking scenarios by properly handling empty backup phases (#9264, @shubham-pampattiwar)
* Fix repository maintenance jobs to inherit allowlisted tolerations from Velero deployment (#9256, @shubham-pampattiwar)
* Implement wildcard namespace pattern expansion for backup namespace includes/excludes. This change adds support for wildcard patterns (*, ?, [abc], {a,b,c}) in namespace includes and excludes during backup operations (#9255, @Joeavaikath)
* Protect VolumeSnapshot field from race condition during multi-thread backup (#9248, @0xLeo258)
* Update AzureAD Microsoft Authentication Library to v1.5.0 (#9244, @priyansh17)
* Get pod list once per namespace in pvc IBA (#9226, @sseago)
* Fix issue #7725, add design for backup repo cache configuration (#9148, @Lyndon-Li)
* Fix issue #9229, don't attach backupPVC to the source node (#9233, @Lyndon-Li)
* feat: Permit specifying annotations for the BackupPVC (#9173, @clementnuss)

View File

@@ -1 +0,0 @@
Add `--apply` flag to `install` command, allowing usage of Kubernetes apply to make changes to existing installs

View File

@@ -1 +0,0 @@
feat: Enhance BackupStorageLocation with Secret-based CA certificate support

View File

@@ -1 +0,0 @@
Fix issue #7725, add design for backup repo cache configuration

View File

@@ -1 +0,0 @@
feat: Permit specifying annotations for the BackupPVC

View File

@@ -1 +0,0 @@
Remove labels associated with previous backups

View File

@@ -1 +0,0 @@
Get pod list once per namespace in pvc IBA

View File

@@ -1 +0,0 @@
Fix issue #9229, don't attach backupPVC to the source node

View File

@@ -1 +0,0 @@
Update AzureAD Microsoft Authentication Library to v1.5.0

View File

@@ -1 +0,0 @@
Protect VolumeSnapshot field from race condition during multi-thread backup

View File

@@ -1,10 +0,0 @@
Implement wildcard namespace pattern expansion for backup namespace includes/excludes.
This change adds support for wildcard patterns (*, ?, [abc], {a,b,c}) in namespace includes and excludes during backup operations.
When wildcard patterns are detected, they are expanded against the list of active namespaces in the cluster before the backup proceeds.
Key features:
- Wildcard patterns in namespace includes/excludes are automatically detected and expanded
- Pattern validation ensures unsupported patterns (regex, consecutive asterisks) are rejected
- Empty wildcard results (e.g., "invalid*" matching no namespaces) correctly result in empty backups
- Exact namespace names and "*" continue to work as before (no expansion needed)

View File

@@ -1 +0,0 @@
Fix repository maintenance jobs to inherit allowlisted tolerations from Velero deployment

View File

@@ -1 +0,0 @@
Fix schedule controller to prevent backup queue accumulation during extended blocking scenarios by properly handling empty backup phases

View File

@@ -1 +0,0 @@
Fix issue #7904, remove the code and doc for PVC node selection

View File

@@ -1 +0,0 @@
Implement concurrency control for cache of native VolumeSnapshotter plugin.

View File

@@ -1 +0,0 @@
Fix issue #9193, don't connect repo in repo controller

View File

@@ -1 +0,0 @@
Add option for privileged fs-backup pod

View File

@@ -1 +0,0 @@
Fix issue #9267, add events to data mover prepare diagnostic

View File

@@ -1 +0,0 @@
VerifyJSONConfigs verify every elements in Data.

View File

@@ -1 +0,0 @@
Concurrent backup processing

View File

@@ -1 +0,0 @@
Sanitize Azure HTTP responses in BSL status messages

View File

@@ -1 +0,0 @@
Fix typos in documentation

View File

@@ -1 +0,0 @@
Fix issue #9332, add bytesDone for cache files

View File

@@ -1 +0,0 @@
Add cache configuration to VGDP

View File

@@ -1 +0,0 @@
Fix the Job build error when BackupReposiotry name longer than 63.

View File

@@ -1 +0,0 @@
Add cache dir configuration for udmrepo

View File

@@ -1 +0,0 @@
Add snapshotSize for DataDownload, PodVolumeRestore

View File

@@ -1 +0,0 @@
Add incrementalSize to DU/PVB for reporting new/changed size

View File

@@ -1 +0,0 @@
Support cache volume for generic restore exposer and pod volume exposer

View File

@@ -1 +0,0 @@
Use hookIndex for recording multiple restore exec hooks.

View File

@@ -1 +0,0 @@
Fix managed fields patch for resources using GenerateName

View File

@@ -1 +0,0 @@
Track actual resource names for GenerateName in restore status

View File

@@ -1 +0,0 @@
Add cache volume configuration

View File

@@ -1 +0,0 @@
Fix issue #9365, prevent fake completion notification due to multiple update of single PVR

View File

@@ -1 +0,0 @@
Refactor repo provider interface for static configuration

View File

@@ -1 +0,0 @@
don't copy securitycontext from first container if configmap found

View File

@@ -1 +0,0 @@
Cache volume support for DataDownload

View File

@@ -1 +0,0 @@
Cache volume for PVR

View File

@@ -0,0 +1 @@
Include InitContainer configured as Sidecars when validating the existence of the target containers configured for the Backup Hooks

View File

@@ -1 +0,0 @@
Fix issue #9400, connect repo first time after creation so that init params could be written

View File

@@ -1 +0,0 @@
Add Prometheus metrics for maintenance jobs

View File

@@ -1 +0,0 @@
Fix issue #9276, add doc for cache volume support

View File

@@ -1 +0,0 @@
Apply volume policies to VolumeGroupSnapshot PVC filtering

View File

@@ -1 +0,0 @@
Fix issue #9194, add doc for GOMAXPROCS behavior change

View File

@@ -1 +0,0 @@
Remove VolumeSnapshotClass from CSI B/R process.

View File

@@ -1 +0,0 @@
Add PVC-to-Pod cache to improve volume policy performance

View File

@@ -1 +0,0 @@
Fix plugin init container names exceeding DNS-1123 limit

View File

@@ -1 +0,0 @@
Add maintenance job and data mover pod's labels and annotations setting.

View File

@@ -0,0 +1 @@
Support all glob wildcard characters in namespace validation

View File

@@ -0,0 +1 @@
Fix VolumePolicy PVC phase condition filter for unbound PVCs (#9507)

View File

@@ -0,0 +1 @@
Fix VolumeGroupSnapshot restore failure with Ceph RBD CSI driver by creating stub VolumeGroupSnapshotContent during restore and looking up VolumeSnapshotClass by driver for credential support

View File

@@ -0,0 +1 @@
Add block data mover design for block level incremental backup by integrating with Kubernetes CBT

View File

@@ -0,0 +1 @@
Fix issue #9343, include PV topology to data mover pod affinities

View File

@@ -0,0 +1 @@
Fix issue #9496, support customized host os

View File

@@ -0,0 +1 @@
Add custom action type to volume policies

View File

@@ -0,0 +1 @@
If BIA return updateObj with SkipFromBackupAnnotation, treat it as skip the resource from backup.

View File

@@ -0,0 +1 @@
Issue #9544: Add test coverage for S3 bucket name in MRAP ARN notation and fix bucket validation to accept ARN format

View File

@@ -0,0 +1 @@
Fix issue #9475, use node-selector instead of nodName for generic restore

View File

@@ -0,0 +1 @@
Fix issue #9460, flush buffer before data mover completes

View File

@@ -0,0 +1 @@
Add schedule_expected_interval_seconds metric for dynamic backup alerting thresholds (#9559)

View File

@@ -0,0 +1 @@
Add ephemeral storage limit and request support for data mover and maintenance job

View File

@@ -0,0 +1 @@
Fix DBR stuck when CSI snapshot no longer exists in cloud provider

View File

@@ -0,0 +1 @@
Add check for file extraction from tarball.

View File

@@ -0,0 +1 @@
Implement original VolumeSnapshotContent deletion for legacy backups

View File

@@ -0,0 +1 @@
Fix issue #9626, let go for uninitialized repo under readonly mode

View File

@@ -0,0 +1 @@
Fix issue #9636, fix configmap lookup in non-default namespaces

View File

@@ -0,0 +1 @@
Fix issue #9641, Remove redundant ReadyToUse polling in CSI VolumeSnapshotContent delete plugin

View File

@@ -0,0 +1 @@
Fix service restore with null healthCheckNodePort in last-applied-configuration label

View File

@@ -0,0 +1 @@
Fix issue #9659, in the case that PVB/PVR/DU/DD is cancelled before the data path is really started, call EndEvent to prevent data mover pod from crashing because of delay event distribution

View File

@@ -0,0 +1 @@
Fix issue #9666, fix node-agent node detection in multiple instances scenario

View File

@@ -0,0 +1 @@
Fix issue #9470, remove restic from repository

View File

@@ -0,0 +1 @@
Fix issue #9469, remove restic for uploader

View File

@@ -0,0 +1 @@
Fix issue #9681, fix restores and podvolumerestores list options to only list in installed namespace

View File

@@ -0,0 +1 @@
Fix issue #9428, increase repo maintenance history queue length from 3 to 25

View File

@@ -0,0 +1 @@
Enhance backup deletion logic to handle tarball download failures

View File

@@ -0,0 +1 @@
Bump external-snapshotter to v8.4.0 and migrate VolumeGroupSnapshot API from v1beta1 to v1beta2 for Kubernetes 1.34+ compatibility

View File

@@ -0,0 +1 @@
Fix issue #9699, add a 2-second gap between temporary CSI VolumeSnapshotContent create and delete operations

View File

@@ -0,0 +1 @@
Update Debian base image from bookworm to trixie

View File

@@ -0,0 +1 @@
Fix issue #9703, fix CSI PVC Backup Plugin list options to only list in installed namespace

View File

@@ -0,0 +1 @@
perf: better string concatenation

View File

@@ -0,0 +1 @@
Remove Restic build from Dockerfile, Makefile and Tiltfile.

View File

@@ -69,9 +69,7 @@ spec:
- ""
type: string
resticIdentifier:
description: |-
ResticIdentifier is the full restic-compatible string for identifying
this repository. This field is only used when RepositoryType is "restic".
description: Deprecated
type: string
volumeNamespace:
description: |-

File diff suppressed because one or more lines are too long

Binary file not shown.

After

Width:  |  Height:  |  Size: 498 KiB

View File

@@ -0,0 +1,551 @@
# Block Data Mover Design
## Glossary & Abbreviation
**Backup Storage**: The storage to store the backup data. Check [Unified Repository design][1] for details.
**Backup Repository**: Backup repository is layered between BR data movers and Backup Storage to provide BR related features that is introduced in [Unified Repository design][1].
**Velero Generic Data Path (VGDP)**: VGDP is the collective of modules that is introduced in [Unified Repository design][1]. Velero uses these modules to finish data transfer for various purposes (i.e., PodVolume backup/restore, Volume Snapshot Data Movement). VGDP modules include uploaders and the backup repository.
**Velero Built-in Data Mover (VBDM)**: VBDM, which is introduced in [Volume Snapshot Data Movement design][2] and [Unified Repository design][1], is the built-in data mover shipped along with Velero, it includes Velero data mover controllers and VGDP.
**Data Mover Pods**: Intermediate pods which hold VGDP and complete the data transfer. See [VGDP Micro Service for Volume Snapshot Data Movement][3] for details.
**Change Block Tracking (CBT)**: CBT is the mechanism to track changed blocks, so that backups could back up the changed data only. CBT usually provides by the computing/storage platform.
**TCO**: Total Cost of Ownership. This is a general criteria for products/solutions, but also means a lot for BR solutions. For example, this means what kind of backup storage (and its cost) it requires, the retention policy of backup copies, the ways to remove backup data redundancy, etc.
**PodVolume Backup**: This is the Velero backup method which accesses the data from live file system, see [Kopia Integration design][1] for how it works.
**CAOS and CABS**: Content-Addressable Object Storage and Content-Addressable Block Storage, they are the parts from Kopia repository, see [Kopia Architecture][5].
## Background
Kubernetes supports two kinds of volume mode, `FileSystem` and `Block`, for persistent volumes. Underlyingly, the storage could use a block storage to provision either `FileSystem` mode or `Block` mode volumes; and the storage could use a file storage to provision `FileSystem` mode volumes.
For volumes provisioned by block storage, they could be backed up/restored from the block level, regardless the volume mode of the persistent volume.
On the other hand, as long as the data could be accessed from the file system, a backup/restore could be conducted from the file system level. That is to say `FileSystem` mode volumes could be backed up/restored from the file system level, regardless of the backend storage type.
Then if a `FileSystem` mode volume is provisioned by a block storage, the volume could be backed up/restored either from the file system level or block level.
For Velero, [CSI Snapshot Data Movement][2] which is implemented by VBDM, ships a file system uploader, so the backup/restore is done from file system only.
Once possible, block level backup/restore is better than file system level backup/restore:
- Block level backup could leverage CBT to process minimal size of data, so it significantly reduces the overhead to network, backup repository and backup storage. As a result, TCO is significantly reduced.
- Block level backup/restore is performant in throughput and resource consumption, because it doesn't need to handle the complexity of the file system, especially for the case that huge number of small files in the file system.
- Block level backup/restore is less OS dependent because the uploader doesn't need the OS to be aware of the file system in the volume.
At present, [Kubernetes CBT API][4] is mature and close to Beta stage. Many platform/storage has supported/is going to support it.
Therefore, it is very important for Velero to deliver the block level backup/restore and recommend users to use it over the file system data mover as long as:
- The volume is backed by block storage so block level access is possible
- The platform supports CBT
Meanwhile, file system level backup/restore is still valuable for below scenarios:
- The volume is backed by file storage, e.g., AWS EFS, Azure File, CephFS, VKS File Volume, etc.
- The volume is backed by block storage but CBT is not available
- The volume doesn't support CSI snapshot, so Velero PodVolume Backup method is used
There are rich features delivered with VGDP, VBDM and [VGDP micro service][3], to reuse these features, block data mover should be built based on these modules.
Velero VBDM supports linux and Windows nodes, however, Windows container doesn't support block mode volumes, so backing up/restoring from Windows nodes is not supported until Windows container removes this limitation. As a result, if there are both linux and Windows nodes in the cluster, block data mover can only run in linux nodes.
Both the Kubernetes CBT service and Velero work in the boundary of the cluster, even though the backend storage may be shared by multiple clusters, Velero can only protection workloads in the same cluster where it is running.
## Goals
Add a block data mover to VBDM and support block level backup/restore for [CSI Snapshot Data Movement][2], which includes:
- Support block level full backup for both `FileSystem` and `Block` mode volumes
- Support block level incremental backup for both `FileSystem` and `Block` mode volumes
- Support block level restore from full/incremental backup for both `FileSystem` and `Block` mode volumes
- Support block level backup/restore for both linux and Windows workloads from linux cluster nodes
- Support all existing features, i.e., load concurrency, node selection, cache volume, deduplication, compression, encryption, etc. for the block data mover
- Support volumes processed from file system level and block level in the same backup/restore
## Non-Goals
- PodVolume Backup does the backup/restore from file system level only, so block level backup/restore is not supported
- Volumes that are backed by file system storages, can only be backed up/restored from file system level, so block level backup/restore is not supported
- Backing up/restoring from Windows nodes is not supported
- Block level incremental backup requires special capabilities of the backup repository, and Velero [Unified Repository][1] supports multiple kinds of backup repositories. The current design focus on Kopia repository only, block level incremental backup support of other repositories will be considered when the specific backup repository is integrated to [Velero Unified Repository][1]
## Architecture
### Data Path
Below shows the architecture of VGDP when integrating to Unified Repository (implemented by Kopia repository).
A new block data mover will be added besides the existing file system data mover, the both data movers read/write data from/to the same backup repository through Unified Repo interface.
Unified Repo interface and the backup repository needs to be enhanced to support incremental backups.
![Data path overview](data-path-overview.png)
For more details of VGDP architecture, see [Unified Repository design][1], [Volume Snapshot Data Movement design][2] and [VGDP Micro Service for Volume Snapshot Data Movement][3].
### Backup
Below is the architecture for block data mover backup which is developed based on the existing VBDM:
![Backup architecture](backup-architecture.png)
The existing VBDM is reused, below are the major changes based on the existing VBDM:
**Exposer**: Exposer needs to create block mode backupPVC all the time regardless of the sourcePVC mode.
**CBT**: This is a new layer to retrieve, transform and store the changed blocks, it interacts with CSI SnapshotMetadataService through gRPC.
**Uploader**: A new block uploader is added. It interacts with CBT layer, holds special logics to make performant data read from block devices and holds special logics to write incremental data to Unified Repository.
**Extended Kopia repo**: A new Incremental Aware Object Extension is added to Kopia's CAOS, so as to support incremental data write. Other parts of Kopia repository, including the existing CAOS and CABS, are not changed.
### Restore
Below is architecture for block data mover restore which is developed based on the existing VBDM:
![Restore architecture](restore-architecture.png)
The existing VBDM is reused, below are the major changes based on the existing VBDM:
**Exposer**: While the restorePV is in block mode, exposer needs to rebind the restorePV to a targetPVC in either file system mode or block mode.
**Uploader**: The same block uploader holds special logics to make performant data write to block devices and holds special logics to read data from the backup chain in Unified repository.
For more details of VBDM, see [Volume Snapshot Data Movement design][2].
## Detailed Design
### Selectable Data Mover Type
#### Per Backup Selection
At present, the backup accepts a `DataMover` parameter and when its value is empty or `velero`, VBDM is used.
After block data mover is introduced, VBDM will have two types of data movers, Velero file system data mover and Velero block data mover.
A new type string `velero-block` is introduced for Velero block data mover, that is, when `DataMover` is set as `velero-block`, Velero block data mover is used.
Another new value `velero-fs` is introduced for Velero file system data mover, that is, when `DataMover` is set as `velero-fs`, Velero file system data mover is used.
For backwards compatibility consideration, `velero` is preserved a valid value, it refers to the default data mover, and the default data mover may change among releases. At present, Velero file system data mover is the default data mover; we can change the default one to Velero block data mover in future releases.
#### Volume Policy
It is a valid case that users have multiple volumes in a single backup, while they want to use Velero file system data mover for some of the volumes and use Velero block data mover for some others.
To meet this requirement, a combined solution of Per Backup Selection and Volume Policy is used.
Here are the data structs for VolumePolicy:
```go
type volPolicy struct {
action Action
conditions []volumeCondition
}
type volumeCondition interface {
match(v *structuredVolume) bool
validate() error
}
type structuredVolume struct {
capacity resource.Quantity
storageClass string
nfs *nFSVolumeSource
csi *csiVolumeSource
volumeType SupportedVolume
pvcLabels map[string]string
pvcPhase string
}
type Action struct {
Type VolumeActionType `yaml:"type"`
Parameters map[string]any `yaml:"parameters,omitempty"`
}
const (
ConfigmapRefType string = "configmap"
Skip VolumeActionType = "skip"
FSBackup VolumeActionType = "fs-backup"
Snapshot VolumeActionType = "snapshot"
)
```
`action.parameters` is used to provide extra information of the action. This is an ideal place to differentiate Velero file system data mover and Velero block data mover.
Therefore, Velero built-in data mover will support `dataMover` key in `parameters`, with the value either `velero-fs` or `velero-block`. While `velero-fs` and `velero-block` are with the same meaning with Per Backup Selection.
As an example, here is how a user might use both `velero-block` and `velero-fs` in a single backup:
- Users set `DataMover` parameter for the backup as `velero-block`
- Users add a record into Volume Policy, make `conditions` to filter the volumes they want to backup through Velero file system data mover, make `action.type` as `snapshot` and insert a record into `action.parameter` as `dataMover:velero-fs`
In this way, all volumes matched by `conditions` will be backed up with Velero file system data mover; while the others will fallback to the per backup method Velero block data mover.
Vice versa, users could set the per backup method as file system data mover and select volumes for Velero block data mover.
The selected data mover for each volume should be recorded to `volumeInfo.json`.
### Controllers
Backup controller and Restore controller are kept as is, async operations are still used to interact with VBDM with block data mover.
DataUpload controller and DataDownload controller are almost kept as is, with some minor changes to handle the data mover type and backup type appropriately and convey it to the exposers. With [VGDP Micro Service][3], the controllers are almost isolated from VGDP, so no major changes are required.
### Exposer
#### CSI Snapshot Exposer
The existing CSI Snapshot Exposer is reused with some changes to decide the backupPVC volume mode by access mode. Specifically, for Velero block data mover, access mode is always `Block`, so the backupPVC volume mode is always `Block`.
Once the backupPVC is created with correct volume mode, the existing code could create the backupPod and mount the backupPVC appropriately.
#### Generic Restore Exposer
The existing Generic Restore Exposer is reused, but the workflow needs some changes.
For block data mover, the restorePV is in Block mode all the time, whereas, the targetPVC may be in either file system mode or block mode.
However, Kubernetes doesn't allow to bound a PV to a PVC with mismatch volume mode.
Therefore, the workflow of ***Finish Volume Readiness*** as introduced in [Volume Snapshot Data Movement design][2] is changed as below:
- When restore completes and restorePV is created, set restorePV's `deletionPolicy` to `Retain`
- Create another rebindPV and copy restorePV's `volumeHandle` but the `volumeMode` matches to the targetPVC
- Delete restorePV
- Set the rebindPV's claim reference (the ```claimRef``` filed) to targetPVC
- Add the ```velero.io/dynamic-pv-restore``` label to the rebindPV
In this way, the targetPVC will be bound immediately by Kubernetes to rebindPV.
These changes work for file system data mover as well, so the old workflow will be replaced, only the new workflow is kept.
### VGDP
Below is the VGDP workflow during backup:
![VGDP Backup](vgdp-backup.png)
Below is the VGDP workflow during restore:
![VGDP Restore](vgdp-restore.png)
#### Unified Repo
For block data mover, one Unified Repo Object is created for each volume, and some metadata is also saved into Unified Repo to describe the volume.
During the backup, the write conducts a skippable-write manner:
- For the data range that the write does not skip, object is written with the real data
- For the data range that is skipped, the data is either filled as ZERO or cloned from the parent object. Specifically, for a full backup, data is filled as ZERO; for an incremental backup, data is cloned from the parent object
To support incremental backup, `ObjectWriter` interface needs to extend to support `io.WriterAt`, so that uploader could conduct a skippable-write manner:
```go
type ObjectWriter interface {
io.WriteCloser
io.WriterAt
// Seeker is used in the cases that the object is not written sequentially
io.Seeker
// Checkpoint is periodically called to preserve the state of data written to the repo so far.
// Checkpoint returns a unified identifier that represent the current state.
// An empty ID could be returned on success if the backup repository doesn't support this.
Checkpoint() (ID, error)
// Result waits for the completion of the object write.
// Result returns the object's unified identifier after the write completes.
Result() (ID, error)
}
```
To clone data from parent object, the caller needs to specify the parent object. To support this, `ObjectWriteOptions` is extended with `ParentObject`.
The existing `AccessMode` could be used to indicate the data access type, either file system or block:
```go
// ObjectWriteOptions defines the options when creating an object for write
type ObjectWriteOptions struct {
FullPath string // Full logical path of the object
DataType int // OBJECT_DATA_TYPE_*
Description string // A description of the object, could be empty
Prefix ID // A prefix of the name used to save the object
AccessMode int // OBJECT_DATA_ACCESS_*
BackupMode int // OBJECT_DATA_BACKUP_*
AsyncWrites int // Num of async writes for the object, 0 means no async write
ParentObject ID // the parent object based on which incremental write will be done
}
```
To support non-Kopia uploader to save snapshots to Unified Repo, snapshot related methods will be added to `BackupRepo` interface:
```go
// SaveSnapshot saves a repo snapshot
SaveSnapshot(ctx context.Context, snapshot Snapshot) (ID, error)
// GetSnapshot returns a repo snapshot from snapshot ID
GetSnapshot(ctx context.Context, id ID) (Snapshot, error)
// DeleteSnapshot deletes a repo snapshot
DeleteSnapshot(ctx context.Context, id ID) error
// ListSnapshot lists all snapshots in repo for the given source (if specified)
ListSnapshot(ctx context.Context, source string) ([]Snapshot, error)
```
To support non-Kopia uploader to save metadata, which is used to describe the backed up objects, some metadata related methods will be added to `BackupRepo` interface:
```go
// WriteMetadata writes metadata to the repo, metadata is used to describe data, e.g., file system
// dirs are saved as metadata
WriteMetadata(ctx context.Context, meta *Metadata, opt ObjectWriteOptions) (ID, error)
// ReadMetadata reads a metadata from repo by the metadata's object ID
ReadMetadata(ctx context.Context, id ID) (*Metadata, error)
```
kopia-lib for Unified Repo will implement these interfaces by calling the corresponding Kopia repository functions.
### Kopia Repository
CAOS of Kopia repository implements Unified Repo's Objects. However, CAOS supports full and sequential write only.
To make it support skippable write, a new Incremental Aware Object Extension is created based on the existing CAOS.
#### Block Address Table
Kopia CAOS uses Block Address Table (BAT) to track objects. It will be reused for both full backups and incremental backups.
![Incremental Aware Object Extension](caos-extension.png)
For Incremental Aware Object Extension, one object represents one volume.
For full backup, the skipped areas will be written as all ZERO by Incremental Aware Object Extension, since Kopia repository's interface doesn't support skippable write. But it is fine, the ZERO data will be deduplicated by Kopia repository so nothing is actually written to the backup storage.
For incremental backup, Incremental Aware Object Extension clones the table entries from the parent object for the skipped areas; for the written area, Incremental Aware Object Extension writes the data to Kopia repository and generate new entries. Finally, Incremental Aware Object Extension generates a new block address table for the incremental object which covers its entire logical space.
Incremental Aware Object Extension is automatically activated for block mode data access as set by `AccessMode` of `ObjectWriteOptions`.
#### Deduplication
The Incremental Aware Object Extension uses fix-sized splitter for deduplication, this is good enough for block level backup, reasons:
- Not like a file, a disk write never inserts data to the middle of the disk, it only does in-place update or append. So the data never shifts between two disks or the same disk of two different backups
- File system IO to disk general aligned to a specific size, e.g., 4KB for NTFS and ext4, as long as the chunk size is a multiply of this size, it effectively reduces the case that one IO kills two deduplication chunks
- For the usage cases that the disk is used as raw block device without a file system, the IO is still conducted by aligning to a specific boundary
The chunk size is intentionally chosen as 1MB, reasons:
- 1MB is a multiply of 4KB for file systems or common block sizes for raw block device usages
- 1MB is the start boundary of partitions for modern operating systems, for both MBR and GPT, so partition metadata could be isolated to a separate chunk
- The more chunks are there, the more indexes in the repository, 1MB is a moderate value regarding to the overhead of indexes for Kopia repository
#### Benefits
Since the existing block address table(BAT) of CAOS is reused and kept as is, it brings below benefits:
- All the entries are still managed by Kopia CAOS, so Velero doesn't need to keep an extra data
- The objects written by Velero block uploader is still recognizable by Kopia, for both full backup and incremental backup
- The existing data management in Kopia repository still works for objects generated by Velero block uploader, e.g., snapshot GC, repository maintenance, etc.
Most importantly, this solution is super performant:
- During incremental write, it doesn't copy any data from the parent object, instead, it only clones object block address entries
- During backup deletion, it doesn't need to move any data, it only deletes the BAT for the object
#### Uploader behavior
The block uploader's skippable write must also be aligned to this 1MB boundary, because Incremental Aware Object Extension needs to clone the entries that have been skipped from the parent object.
File system uploader is still using variable-sized deduplication, it is fine to keep data from the two uploaders into the same Kopia repository, though normally they won't be mutually deduplicated.
Volume could be resized; and volume size may not be aligned to 1MB boundary. The uploader need to handle the resize appropriately since Incremental Aware Object Extension cannot copy a BAT entry partially.
#### CBT Layer
CBT provides below functionalities:
1. For a full backup, it provides the allocated data ranges. E.g., for a 1TB volume, there may be only 1MB of files, with this functionality, the uploader could skip the ranges without real data
2. For an incremental backup, it provides the changed data ranges based on the provided parent snapshot. In this way, the uploader could skip the unchanged data and achieves an incremental backup
For case 1, the uploader calls Unified Repo Object's `WriteAt` method with the offset for the allocated data, ranges ahead of the offset will be filled as ZERO by unified repository.
For case 2, the uploader calls Unified Repo Object's `WriteAt` method with the offset for the changed data, ranges ahead of the offset will be cloned from the parent object unified repository.
A changeId is stored with each backup, the next backup will retrieve the parent snapshot's changeId and use it to retrieve the CBT.
The CBT retrieved from Kubernetes API are a list of `BlockMetadata`, each of range could be with fixed size or variable size.
Block uploader needs to maintain its own granularity that is friendly to its backup repository and uploader, as mentioned above.
From Kubernetes API, `GetMetadataAllocated` or `GetMetadataDelta` are called looply until all `BlockMetadata` are retrieved.
On the other hand, considering the complexity in uploader, e.g., multiple stream between read and write, the workflow should be driven by the uploader instead of the CBT iterator, therefore, in practice, all the allocated/changed blocks should be retrieved and preserved before passing it to the uploader.
As another fact, directly saving `BlockMetadata` list will be memory consuming.
With all the above considerations, the `Bitmap` data structure is used to save the allocated/changed blocks, calling CBT Bitmap.
CBT Bitmap chunk size could be set as 1MB or a multiply of it, but a larger chunk size would amplify the backup size, so 1MB size will be use.
Finally, interactions among CSI Snapshot Metadata Service, CBT Layer and Uploader is like below:
![CBT Layer](cbt.png)
In this way, CBT layer and uploader are decoupled and CBT bitmap plays as a north bound parameter of the uploader.
#### Block Uploader
Block uploader consists of the reader and writer which are running asynchronously.
During backup, reader reads data from the block device and also refers to CBT Bitmap for allocated/changed blocks; writer writes data to the Unified Repo.
During restore, reader reads data from the Unified Repo; writer writes data to the block device.
Reader and writer connects by a ring buffer, that is, reader pushes the block data to the ring buffer and writer gets data from the ring buffer and write to the target.
To improve performance, block device is opened with direct IO, so that no data is going through the system cache unnecessarily.
During restore, to optimize the write throughput and storage usage, zero blocks should be either skipped (for restoring to a new volume) or unmapped (for restoring to an existing volume). To cover the both cases in a unified way, the SCSI command `WRITE_SAME` is used. Logics are as below:
- Detect if a block read from the backup is with all zero data
- If true, the uploader sends `WRITE_SAME` SCSI command by calling `BLKZEROOUT` ioctl
- If the call fails, the uploader fallbaks to use the conservative way to write all zero bytes to the disk
Uploader implementation is OS dependent, but since Windows container doesn't support block volumes, the current implementation is for linux only.
#### ChangeId
ChangeId identifies the base that CBT is generated from, it must strictly map to the parent snapshot in the repository. Otherwise, there will be data corruption in the incremental backup.
Therefore, ChangeId is saved together with the repository snapshot.
The data mover always queries parent snapshot from Unified Repo together with the ChangeId. In this way, no mismatch would happen.
Inside the uploader, the upper layer (DataUpload controller) could also provide the ChangeId as a mechanism of double confirmation. The received ChangeId would be re-evaluated against the one in the provided snapshot.
For Kubernetes API, changeId is represented by `BaseSnapshotId`.
changeId retrieval is storage specific, generally, it is retrieved from the `SnapshotHandle` of the VolumeSnapshotContent object; however, storages may also refer to other places to retrieve the changeId.
That is, `SnapshotHandle` and changeId may be two different values, in this case, the both values need to be preserved.
#### Volume Snapshot Retention
Storages/CSI drivers may support the changeId differently based on the storage's capabilities:
1. In order to calculate the changes, some storages require the parent snapshot mapping to the changeId always exists at the time of `GetMetadataDelta` is called, then the parent snapshot can NOT be deleted as long as there are incremental backups based on it.
2. Some storages don't require the parent snapshot itself at the time of calculating changes, then parent snapshot could be deleted immediately after the parent backup completes.
The existing exposer works perfectly with Case 1, that is, the snapshot is always deleted when the backup completes.
However, for Case 2, since the snapshot must be retained, the exposer needs changes as below:
- At the end of each backup, keep the current VolumeSnapshot's `deletionPolicy` as `Retain`, then when the VolumeSnapshot is deleted at the end of the backup, the current snapshot is retained in the storage
- `GetMetadataDelta` is called with `BaseSnapshotId` set as the preserved changeId
- When deleting a backup, a VolumeSnapshot-VolumeSnapshotContent pair is rebuilt with `deletionPolicy` as `delete` and `snapshotHandle` as the preserved one
- Then the rebuilt VolumeSnapshot is deleted so that the volume snapshot is deleted from the storage
There is no way to automatically detect which way a specific volume support, so an interface is exposed to users to set the volume snapshot retention method.
The interface could be added to the `Action.Parameters` of Volume Policy. By default, Velero block data mover takes Way 1, so volume snapshot is never retained; if users specify `RetainSnapshot` parameter, Way 2 will be taken.
```go
type Action struct {
Type VolumeActionType `yaml:"type"`
Parameters map[string]any `yaml:"parameters,omitempty"`
}
```
In this way, users could specify --- for storage class "xxx" or CSI driver "yyy", backup through CSI snapshot with Velero block data mover and retain the snapshot.
#### Incremental Size
By the end of the backup, incremental size is also returned by the uploader, as same as Velero file system uploader. The size indicates how much data are unique so processed by the uploader, based on the provided CBT.
### Fallback to Full Backup
There are some occasions that the incremental backup won't continue, so the data mover fallbacks to full backup:
- `GetMetadataAllocated` or `GetMetadataDelta` returns error
- ChangeId is missing
- Parent snapshot is missing
When the fallback happens, the volume will be fully backed up from block level, but since because of the data deduplication from the backup repository, the unallocated/unchanged data would be probably deduplicated.
During restore, the volume will also be fully restored. The zero blocks handling as mentioned above is still working, so that write IO for unallocated data would be probably eliminated.
Fallback is to handle the exceptional cases, for most of the backups/restores, fallback is never expected.
### Irregular Volume Size
As mentioned above, during incremental backup, block uploader IO should be restricted to be aligned to the deduplication chunk size (1MB); on the other hand, there is no hard limit for users' volume size to be aligned.
To support volumes with irregular size, below measures are taken:
- Volume objects in the repository is always aligned to 1MB
- If the volume size is irregular, zero bytes will be padded to the tail of the volume object
- A real size is recorded in the repository snapshot
- During restore, the real size of data is restored
The padding must be always with zero bytes.
### Volume Size Change
Incremental backup could continue when volume is resized.
Block uploader supports to write disk with arbitrary size.
The volume resize cases don't need to be handled case by case.
Instead, when volume resize happens, block uploader needs to handle it appropriately in below ways:
- Loop with CBT
- Read data between RoundDownTo1M(newSize) and newSize to get the tail data
- If there is no tail data, which means the volume size is aligned to 1MB, then call `WriteAt(newSize, nil)`
- Otherwise, call `WriteAt(RoundDownTo1M(newSize), taildata)`, `taildata` is also padded to 1MB
That is to say:
- If CBT covers the tail of the volume, loop with CBT is enough for both shrink and expand case
- Otherwise, if volume is expanded, `WriteAt` guarantees to clone appropriate objects entries from the parent object and append zero data for the expanded areas. Particularly, if the parent volume is not in regular size, the zero padding bytes is also reused. Therefore, the parent object's padding bytes must be zero
- In the case the volume is shrunk, writing the tail data makes sure zero bytes are padding to the new volume object instead of inheriting non-zero data from the parent object
### Cancellation
The existing Cancellation mechanism is reused, so there is no change outside of the block uploader.
Inside the uploader, cancellation checkpoints are embedded to the uploader reader and writer, so that the execution could quit in a reasonable time once cancellation happens.
### Parallelism
Parallelism among data movers will reuse the existing mechanism --- load concurrency.
Inside the data mover, uploader reader and writer are always running in parallel. The number of reader and writer is always 1.
Sequential read/write of the volume is always optimized, there is no prove that multiple readers/writers are beneficial.
### Progress Report
Progress report outside of the data mover will reuse the existing mechanism.
Inside the data mover, progress update is embedded to the uploader writer.
The progress struct is kept as is, Velero block data mover still supports `TotalBytes` and `BytesDone`:
```go
type Progress struct {
TotalBytes int64 `json:"totalBytes,omitempty"`
BytesDone int64 `json:"doneBytes,omitempty"`
}
```
By the end of the backup, the progress for block data mover provides the same `GetIncrementalSize` which reports the incremental size of the backup, so that the incremental size is reported to users in the same way as the file system data mover.
### Selectable Backup Type
For many reasons, a periodical full backup is required:
- From user experience, a periodical full is required to make sure the data integrity among the incremental backups, e.g., every 1 week or 1 month
Therefore, backup type (full/incremental) should be supported in Velero's manual backup and backup schedule.
Backup type will also be added to `volumeInfo.json` to support observability purposes.
Backup TTL is still used for users to specify a backup's retention time. By default, both full and incremental backups are with 30 days retention, even though this is not so reasonable for the full backups. This could be enhanced when Velero supports sophisticated retention policy.
As a workaround, users could create two schedules for the same scope of backup, one is for full backups, with less frequency and longer backup TTL; the other one is for incremental backups, with normal frequency and shorter backup TTL.
#### File System Data Mover
At present, Velero file system data mover doesn't support selectable backup type, instead, incremental backups are always conducted once possible.
From user experience this is not reasonable.
Therefore, to solve this problem and to make it align with Velero block data mover, Velero file system data mover will support backup type as well.
At present, the data path for Velero file system data mover has already supported it, we only need to expose this functionality to users.
### Backup Describe
Backup type should be added to backup description, there are two appearances:
- The `backupType` in the Backup CR. This is the selected backup type by users
- The backup type recorded in `volumeInfo.json`, which is the actual type taken by the backup
With these two values, users are able to know the actual backup type and also whether a fallback happens.
The `DataMover` item in the existing backup description should be updated to reflect the actual data mover completing the backup, this information could be retrieved from `volumeInfo.json`.
### Backup Sync
No more data is required for sync, so Backup Sync is kept as is.
### Backup Deletion
As mentioned above, no data is moved when deleting a repo snapshot for Velero block data mover, so Backup Deletion is kept as is regarding to repo snapshot; and for volume snapshot retention case, backup deletion logics will be modified accordingly to delete the retained snapshots.
### Restarts
Restarts mechanism is reused without any change.
### Logging
Logging mechanism is not changed.
### Backup CRD
A `backupType` field is added to Backup CRD, two values are supported `full` or `incremental`.
`full` indicates the data mover to take a full backup.
`incremental` which is the default value, indicates the data mover to take an incremental backup.
```yaml
spec:
description: BackupSpec defines the specification for a Velero backup.
properties:
backupType:
description: BackupType indicates the type of the backup
enum:
- full
- incremental
type: string
```
### DataUpload CRD
A `parentSnapshot` field is added to the DataUpload CRD, below values are supported:
- `""`: it fallbacks to `auto`
- `auto`: it means the data mover finds the recent snapshot of the same volume from Unified Repository and use it as the parent
- `none`: it means the data mover is not assigned with a parent snapshot, so it runs a full backup
- a specific snapshotID: it means the data mover use the specific snapshotID to find the parent snapshot. If it cannot be found, the data mover fallbacks to a full backup
The last option is for a backup plan, it will not be used for now and may be useful when Velero supports sophisticated retention policy. This means, Velero always finds the recent backup as the parent.
When `backupType` of the Backup is `full`, the data mover controller sets `none` to `parentSnapshot` of DataUpload.
When `backupType` of the Backup is `incremental`, the data mover controller sets `auto` to `parentSnapshot` of DataUpload. And `""` is just kept for backwards compatibility consideration.
```yaml
spec:
description: DataUploadSpec is the specification for a DataUpload.
properties:
parentSnapshot:
description: |-
ParentSnapshot specifies the parent snapshot that current backup is based on.
If its value is "" or "auto", the data mover finds the recent backup of the same volume as parent.
If its value is "none", the data mover will do a full backup
If its value is a specific snapshotID, the data mover finds the specific snapshot as parent.
type: string
```
### DataDownload CRD
No change is required to DataDownload CRD.
## Plugin Data Movers
The current design doesn't break anything for plugin data movers.
The enhancement in VolumePolicy could also be used for plugin data movers. That is, users could select a plugin data mover through VolumePolicy as same as Velero built-in data movers.
## Installation
No change to Installation.
## Upgrade
No impacts to Upgrade. The new fields in the CRDs are all optional fields and have backwards compatible values.
## CLI
Backup type parameter is added to Velero CLI as below:
```
velero backup create --full
velero schedule create --full
```
When the parameter is not specified, by default, Velero goes with incremental backups.
[1]: ../Implemented/unified-repo-and-kopia-integration/unified-repo-and-kopia-integration.md
[2]: ../Implemented/volume-snapshot-data-movement/volume-snapshot-data-movement.md
[3]: ../Implemented/vgdp-micro-service/vgdp-micro-service.md
[4]: https://kubernetes.io/blog/2025/09/25/csi-changed-block-tracking/
[5]: https://kopia.io/docs/advanced/architecture/

Binary file not shown.

After

Width:  |  Height:  |  Size: 518 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 377 KiB

Some files were not shown because too many files have changed in this diff Show More