velero

mirror of https://github.com/vmware-tanzu/velero.git synced 2026-04-17 14:11:11 +00:00

Author	SHA1	Message	Date
Priyansh Choudhary	8a6ac7af1c	fix: backup deletion silently succeeds when tarball download fails (#9693) * Enhance backup deletion logic to handle tarball download failures and clean up associated CSI VolumeSnapshotContents Signed-off-by: Priyansh Choudhary <im1706@gmail.com> * added changelog Signed-off-by: Priyansh Choudhary <im1706@gmail.com> * Refactor error handling in backup deletion Signed-off-by: Priyansh Choudhary <im1706@gmail.com> * Refactor backup deletion logic to skip CSI snapshot cleanup on tarball download failure Signed-off-by: Priyansh Choudhary <im1706@gmail.com> * prevent backup deletion when errors occur Signed-off-by: Priyansh Choudhary <im1706@gmail.com> * added logger Signed-off-by: Priyansh Choudhary <im1706@gmail.com>	2026-04-14 16:36:40 -04:00
lyndon-li	4a6756d57b	Merge pull request #9683 from Lyndon-Li/increase-repo-maintenance-history-queue-length Issue 9428: increase repo maintenance history queue length	2026-04-10 11:50:24 +08:00
Xun Jiang/Bruce Jiang	e1cc07cec3	Merge pull request #9695 from shubham-pampattiwar/bump-ext-snapshotter-v8.4-vgs-v1beta2 Bump external-snapshotter to v8.4.0 for VGS v1beta2 support	2026-04-10 11:38:24 +08:00
Lyndon-Li	1730b7f414	issue 9428: incremental repo maintenance history queue length Signed-off-by: Lyndon-Li <lyonghui@vmware.com>	2026-04-09 14:41:07 +08:00
lyndon-li	37abfb4bfa	Merge pull request #9682 from adam-jian-zhang/fix-restore-pvr-scope Fix PodVolumeBackup list scope during restore	2026-04-09 10:59:26 +08:00
Shubham Pampattiwar	1b5503e20b	Bump external-snapshotter to v8.4.0 for VGS v1beta2 support Kubernetes 1.34 introduced VolumeGroupSnapshot v1beta2 API and deprecated v1beta1. Distributions running K8s 1.34+ (e.g. OpenShift 4.21+) have removed v1beta1 VGS CRDs entirely, breaking Velero's VGS functionality on those clusters. This change bumps external-snapshotter/client/v8 from v8.2.0 to v8.4.0 and migrates all VGS API usage from v1beta1 to v1beta2. The v1beta2 API is structurally compatible - the Spec-level types (GroupSnapshotHandles, VolumeGroupSnapshotContentSource) are unchanged. The Status-level change (VolumeSnapshotHandlePairList replaced by VolumeSnapshotInfoList) does not affect Velero as it does not directly consume that type. Fixes #9694 Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>	2026-04-08 15:46:06 -07:00
Shubham Pampattiwar	e439977117	Fix VolumeGroupSnapshot restore failure with Ceph RBD CSI driver (#9516) * Fix VolumeGroupSnapshot restore on Ceph RBD This PR fixes two related issues affecting CSI snapshot restore on Ceph RBD: 1. VolumeGroupSnapshot restore fails because Ceph RBD populates volumeGroupSnapshotHandle on pre-provisioned VSCs, but Velero doesn't create the required VGSC during restore. 2. CSI snapshot restore fails because VolumeSnapshotClassName is removed from restored VSCs, preventing the CSI controller from getting credentials for snapshot verification. Changes: - Capture volumeGroupSnapshotHandle during backup as VS annotation - Create stub VGSC during restore with matching handle in status - Look up VolumeSnapshotClass by driver and set on restored VSC Fixes #9512 Fixes #9515 Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com> * Add changelog for VGS restore fix Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com> * Fix gofmt import order Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com> * Add changelog for VGS restore fix Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com> * Fix import alias corev1 to corev1api per lint config Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com> * Fix: Add snapshot handles to existing stub VGSC and add unit tests When multiple VolumeSnapshots from the same VolumeGroupSnapshot are restored, they share the same VolumeGroupSnapshotHandle but have different individual snapshot handles. This commit: 1. Fixes incomplete logic where existing VGSC wasn't updated with new snapshot handles (addresses review feedback) 2. Fixes race condition where Create returning AlreadyExists would skip adding the snapshot handle 3. Adds comprehensive unit tests for ensureStubVGSCExists (5 cases) and addSnapshotHandleToVGSC (4 cases) functions Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com> * Clean up stub VolumeGroupSnapshotContents during restore finalization Add cleanup logic for stub VGSCs created during VolumeGroupSnapshot restore. The stub VGSCs are temporary objects needed to satisfy CSI controller validation during VSC reconciliation. Once all related VSCs become ReadyToUse, the stub VGSCs are no longer needed and should be removed. The cleanup runs in the restore finalizer controller's execute() phase. Before deleting each VGSC, it polls until all related VolumeSnapshotContents (correlated by snapshot handle) are ReadyToUse, with a timeout fallback. Deletion failures and CRD-not-installed scenarios are treated as warnings rather than errors to avoid failing the restore. Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com> * Fix lint: remove unused nolint directive and simplify cleanupStubVGSC return The cleanupStubVGSC function only produces warnings (not errors), so simplify its return signature. Also remove the now-unused nolint:unparam directive on execute() since warnings are no longer always nil. Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com> --------- Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>	2026-04-08 12:08:56 -07:00
Adam Zhang	dd82645909	Fix PodVolumeBackup list scope during restore Restrict the listing of PodVolumeBackup resources to the specific restore namespace in both the core restore controller and the pod volume restore action plugin. This prevents "Forbidden" errors when Velero is configured with namespace-scoped minimum privileges, avoiding the need for cluster-scoped list permissions for PodVolumeBackups. Fixes: #9681 Signed-off-by: Adam Zhang <adam.zhang@broadcom.com>	2026-04-08 16:50:09 +08:00
Lyndon-Li	9598c50295	Merge branch 'main' into remove-restic-for-repo	2026-04-08 13:37:34 +08:00
Lyndon-Li	dca3d3001f	remove restic for repo Signed-off-by: Lyndon-Li <lyonghui@vmware.com>	2026-04-08 11:11:15 +08:00
Lyndon-Li	fca4d405b1	remove restic for uploader Signed-off-by: Lyndon-Li <lyonghui@vmware.com>	2026-04-07 18:07:51 +08:00
Lyndon-Li	235e579581	remove restic for repo Signed-off-by: Lyndon-Li <lyonghui@vmware.com>	2026-04-07 07:35:25 +00:00
Quang Ngo	6c3d81a146	Add schedule_expected_interval_seconds metric Add a new Prometheus gauge metric that exposes the expected interval between consecutive scheduled backups. This enables dynamic alerting thresholds per schedule backups. Signed-off-by: Quang Ngo <quang.ngo@canonical.com>	2026-03-02 10:20:09 +11:00
Scott Seago	dfb1d45831	Remove backup from running list when backup fails validation (#9498) Signed-off-by: Scott Seago <sseago@redhat.com>	2026-01-27 16:25:30 -05:00
Wenkai Yin(尹文开)	7442d20f9d	Merge pull request #9481 from Lyndon-Li/issue-fix-9478 Issue 9478: Diagnose expose on peek error	2026-01-15 16:53:57 +08:00
Lyndon-Li	e72fea8ecd	fix issue for cache volume Signed-off-by: Lyndon-Li <lyonghui@vmware.com>	2026-01-14 17:45:01 +08:00
Lyndon-Li	e703e06eeb	diagnose expose on peek error Signed-off-by: Lyndon-Li <lyonghui@vmware.com>	2026-01-13 16:33:14 +08:00
lyndon-li	2d93ab261e	Merge pull request #9141 from kaovilai/9097 feat: Enhance BackupStorageLocation with Secret-based CA certificate support	2025-12-19 13:13:47 +08:00
Xun Jiang/Bruce Jiang	aa3bd251dd	Merge branch 'main' into 9097	2025-12-18 14:18:04 +08:00
Xun Jiang	e39374f335	Add maintenance job and data mover pod's labels and annotations setting. Add wait in file_system_test's async test cases. Add related documents. Signed-off-by: Xun Jiang <xun.jiang@broadcom.com>	2025-12-17 13:21:07 +08:00
Tiger Kaovilai	61bf2ef777	feat: Enhance BackupStorageLocation with Secret-based CA certificate support - Introduced `CACertRef` field in `ObjectStorageLocation` to reference a Secret containing the CA certificate, replacing the deprecated `CACert` field. - Implemented validation logic to ensure mutual exclusivity between `CACert` and `CACertRef`. - Updated BSL controller and repository provider to handle the new certificate resolution logic. - Enhanced CLI to support automatic certificate discovery from BSL configurations. - Added unit and integration tests to validate new functionality and ensure backward compatibility. - Documented migration strategy for users transitioning from inline certificates to Secret-based management. Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>	2025-12-12 21:07:37 +07:00
Shubham Pampattiwar	14b34f08cc	Merge pull request #9321 from shubham-pampattiwar/fix-azure-bsl-status-message-8368 Sanitize Azure HTTP responses in BSL status messages	2025-12-11 22:00:18 -08:00
Xun Jiang	096436507e	Remove VolumeSnapshotClass from CSI restore and deletion process. Remove VolumeSnapshotClass from backup sync process. Signed-off-by: Xun Jiang <xun.jiang@broadcom.com>	2025-12-11 17:56:11 +08:00
Shubham Pampattiwar	f0c97c489d	Merge pull request #9414 from shubham-pampattiwar/add-maintenance-job-metrics Add Prometheus metrics for maintenance jobs	2025-12-08 09:23:44 -08:00
Scott Seago	7286d24c35	Updates for merge conflict and to refine reconciler queueing logic Signed-off-by: Scott Seago <sseago@redhat.com>	2025-12-03 16:55:59 -05:00
Scott Seago	7e4797f588	Track running backup count via BackupTracker This avoids an unnecessary apiserver List call when the backup reconciler is already at capacity. Signed-off-by: Scott Seago <sseago@redhat.com>	2025-12-02 17:23:47 -05:00
Scott Seago	f238a7e47b	make update Signed-off-by: Scott Seago <sseago@redhat.com>	2025-12-02 17:23:21 -05:00
Scott Seago	0b2e7d1238	Minor refactoring Signed-off-by: Scott Seago <sseago@redhat.com>	2025-12-02 17:23:21 -05:00
Scott Seago	73864e31ff	Fix linters Signed-off-by: Scott Seago <sseago@redhat.com>	2025-12-02 17:04:55 -05:00
Scott Seago	8a95d512b3	make update, changelog Signed-off-by: Scott Seago <sseago@redhat.com>	2025-12-02 17:04:07 -05:00
Scott Seago	4d1802233a	add various scenarios to queue controller unit tests Signed-off-by: Scott Seago <sseago@redhat.com>	2025-12-02 17:01:09 -05:00
Scott Seago	f73443659a	Backup queue controller implementation Signed-off-by: Scott Seago <sseago@redhat.com>	2025-12-02 16:57:18 -05:00
Scott Seago	845eee4e60	feat: Create backup queue controller and add to disableable list Co-authored-by: aider (gemini/gemini-2.5-pro) <aider@aider.chat> Signed-off-by: Scott Seago <sseago@redhat.com>	2025-12-02 16:46:56 -05:00
Scott Seago	6a3f821606	fix lint Signed-off-by: Scott Seago <sseago@redhat.com> Signed-off-by: Scott Seago <sseago@redhat.com>	2025-12-02 16:39:10 -05:00
Scott Seago	34dc381182	Refactor after review Signed-off-by: Scott Seago <sseago@redhat.com> Signed-off-by: Scott Seago <sseago@redhat.com>	2025-12-02 16:39:10 -05:00
Scott Seago	29b01c3170	make update Signed-off-by: Scott Seago <sseago@redhat.com>	2025-12-02 16:39:10 -05:00
Scott Seago	9c1c7d20ff	Minor refactoring Signed-off-by: Scott Seago <sseago@redhat.com>	2025-12-02 16:39:10 -05:00
Scott Seago	7bc57b5a5f	Refactor queue controller to reduce apiserver list calls Signed-off-by: Scott Seago <sseago@redhat.com>	2025-12-02 16:39:10 -05:00
Scott Seago	e7b5d20f4c	Fix linters Signed-off-by: Scott Seago <sseago@redhat.com>	2025-12-02 16:39:10 -05:00
Scott Seago	aedc0fe5e2	make update, changelog Signed-off-by: Scott Seago <sseago@redhat.com>	2025-12-02 16:39:07 -05:00
Scott Seago	91357b28c4	Move worker pool creation to backup reconcile. ItemBlockWorkerPool is now created for each backup. Signed-off-by: Scott Seago <sseago@redhat.com>	2025-12-02 16:38:41 -05:00
Scott Seago	e0c08f03cf	add various scenarios to queue controller unit tests Signed-off-by: Scott Seago <sseago@redhat.com>	2025-12-02 16:38:41 -05:00
Scott Seago	a56ab10f23	Move debug logs to info Signed-off-by: Scott Seago <sseago@redhat.com>	2025-12-02 16:38:41 -05:00
Scott Seago	d39ad6f208	run multiple backup reconcilers, only reconcile ReadyToStart backups Signed-off-by: Scott Seago <sseago@redhat.com>	2025-12-02 16:38:41 -05:00
Scott Seago	13041b40c2	Backup queue controller implementation Signed-off-by: Scott Seago <sseago@redhat.com>	2025-12-02 16:38:41 -05:00
Scott Seago	fe799d7546	feat: Add concurrent backups configuration to backup reconciler Co-authored-by: aider (gemini/gemini-2.5-pro) <aider@aider.chat> Signed-off-by: Scott Seago <sseago@redhat.com>	2025-12-02 16:28:08 -05:00
Scott Seago	d91d50f696	feat: Add concurrentBackups to backupQueueReconciler Co-authored-by: aider (gemini/gemini-2.5-pro) <aider@aider.chat> Signed-off-by: Scott Seago <sseago@redhat.com>	2025-12-02 16:28:08 -05:00
Scott Seago	5d02af3ce3	feat: Create backup queue controller and add to disableable list Co-authored-by: aider (gemini/gemini-2.5-pro) <aider@aider.chat> Signed-off-by: Scott Seago <sseago@redhat.com>	2025-12-02 16:28:08 -05:00
Shubham Pampattiwar	20af2c20c5	Address PR review comments: sanitize errors and add SAS token scrubbing This commit addresses three review comments on PR #9321: 1. Keep sanitization in controller (response to @ywk253100) - Maintaining centralized error handling for easier extension - Azure-specific patterns detected and others passed through unchanged 2. Sanitize unavailableErrors array (@priyansh17) - Now using sanitizeStorageError() for both unavailableErrors array and location.Status.Message for consistency 3. Add SAS token scrubbing (@anshulahuja98) - Scrubs Azure SAS token parameters to prevent credential leakage - Redacts: sig, se, st, sp, spr, sv, sr, sip, srt, ss - Example: ?sig=secret becomes ?sig=*REDACTED* Added comprehensive test coverage for SAS token scrubbing with 4 new test cases covering various scenarios. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>	2025-12-02 11:37:50 -08:00
Shubham Pampattiwar	a5d32f29da	Sanitize Azure HTTP responses in BSL status messages Azure storage errors include verbose HTTP response details and XML in error messages, making the BSL status.message field cluttered and hard to read. This change adds sanitization to extract only the error code and meaningful message. Before: BackupStorageLocation "test" is unavailable: rpc error: code = Unknown desc = GET https://... RESPONSE 404: 404 The specified container does not exist. ERROR CODE: ContainerNotFound <?xml version="1.0"...> After: BackupStorageLocation "test" is unavailable: rpc error: code = Unknown desc = ContainerNotFound: The specified container does not exist. AWS and GCP error messages are preserved as-is since they don't contain verbose HTTP responses. Fixes #8368 Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>	2025-12-02 11:37:50 -08:00


898 Commits