velero

mirror of https://github.com/vmware-tanzu/velero.git synced 2026-07-22 16:02:21 +00:00

Author	SHA1	Message	Date
Adam Zhang	0291c53e9d	Fix PodVolumeBackup list scope during restore Restrict the listing of PodVolumeBackup resources to the specific restore namespace in both the core restore controller and the pod volume restore action plugin. This prevents "Forbidden" errors when Velero is configured with namespace-scoped minimum privileges, avoiding the need for cluster-scoped list permissions for PodVolumeBackups. Fixes: #9681 Signed-off-by: Adam Zhang <adam.zhang@broadcom.com>	2026-04-09 09:57:04 +08:00
Shubham Pampattiwar	5ad4e604b8	[release-1.18] Fix VolumeGroupSnapshot restore failure with Ceph RBD CSI driver (#9687) * Fix VolumeGroupSnapshot restore failure with Ceph RBD CSI driver (#9516) * Fix VolumeGroupSnapshot restore on Ceph RBD This PR fixes two related issues affecting CSI snapshot restore on Ceph RBD: 1. VolumeGroupSnapshot restore fails because Ceph RBD populates volumeGroupSnapshotHandle on pre-provisioned VSCs, but Velero doesn't create the required VGSC during restore. 2. CSI snapshot restore fails because VolumeSnapshotClassName is removed from restored VSCs, preventing the CSI controller from getting credentials for snapshot verification. Changes: - Capture volumeGroupSnapshotHandle during backup as VS annotation - Create stub VGSC during restore with matching handle in status - Look up VolumeSnapshotClass by driver and set on restored VSC Fixes #9512 Fixes #9515 Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com> * Add changelog for VGS restore fix Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com> * Fix gofmt import order Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com> * Add changelog for VGS restore fix Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com> * Fix import alias corev1 to corev1api per lint config Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com> * Fix: Add snapshot handles to existing stub VGSC and add unit tests When multiple VolumeSnapshots from the same VolumeGroupSnapshot are restored, they share the same VolumeGroupSnapshotHandle but have different individual snapshot handles. This commit: 1. Fixes incomplete logic where existing VGSC wasn't updated with new snapshot handles (addresses review feedback) 2. Fixes race condition where Create returning AlreadyExists would skip adding the snapshot handle 3. Adds comprehensive unit tests for ensureStubVGSCExists (5 cases) and addSnapshotHandleToVGSC (4 cases) functions Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com> * Clean up stub VolumeGroupSnapshotContents during restore finalization Add cleanup logic for stub VGSCs created during VolumeGroupSnapshot restore. The stub VGSCs are temporary objects needed to satisfy CSI controller validation during VSC reconciliation. Once all related VSCs become ReadyToUse, the stub VGSCs are no longer needed and should be removed. The cleanup runs in the restore finalizer controller's execute() phase. Before deleting each VGSC, it polls until all related VolumeSnapshotContents (correlated by snapshot handle) are ReadyToUse, with a timeout fallback. Deletion failures and CRD-not-installed scenarios are treated as warnings rather than errors to avoid failing the restore. Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com> * Fix lint: remove unused nolint directive and simplify cleanupStubVGSC return The cleanupStubVGSC function only produces warnings (not errors), so simplify its return signature. Also remove the now-unused nolint:unparam directive on execute() since warnings are no longer always nil. Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com> --------- Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com> * Rename changelog file to match cherry-pick PR number Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com> --------- Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>	2026-04-08 12:45:02 -07:00
Xun Jiang	8ac8f49b5c	Remove wildcard check from getNamespacesToList. Expand wildcard in namespace filter only for backup scenario. Restore doesn't need that now, because restore has logic to rely on IncludeEverything function to check whether cluster-scoped resources should be restored. Expand wildcard will break the logic. Signed-off-by: Xun Jiang <xun.jiang@broadcom.com>	2026-04-08 08:12:21 +08:00
Scott Seago	dfb1d45831	Remove backup from running list when backup fails validation (#9498) Signed-off-by: Scott Seago <sseago@redhat.com>	2026-01-27 16:25:30 -05:00
Wenkai Yin(尹文开)	7442d20f9d	Merge pull request #9481 from Lyndon-Li/issue-fix-9478 Issue 9478: Diagnose expose on peek error	2026-01-15 16:53:57 +08:00
Lyndon-Li	e72fea8ecd	fix issue for cache volume Signed-off-by: Lyndon-Li <lyonghui@vmware.com>	2026-01-14 17:45:01 +08:00
Lyndon-Li	e703e06eeb	diagnose expose on peek error Signed-off-by: Lyndon-Li <lyonghui@vmware.com>	2026-01-13 16:33:14 +08:00
lyndon-li	2d93ab261e	Merge pull request #9141 from kaovilai/9097 feat: Enhance BackupStorageLocation with Secret-based CA certificate support	2025-12-19 13:13:47 +08:00
Xun Jiang/Bruce Jiang	aa3bd251dd	Merge branch 'main' into 9097	2025-12-18 14:18:04 +08:00
Xun Jiang	e39374f335	Add maintenance job and data mover pod's labels and annotations setting. Add wait in file_system_test's async test cases. Add related documents. Signed-off-by: Xun Jiang <xun.jiang@broadcom.com>	2025-12-17 13:21:07 +08:00
Tiger Kaovilai	61bf2ef777	feat: Enhance BackupStorageLocation with Secret-based CA certificate support - Introduced `CACertRef` field in `ObjectStorageLocation` to reference a Secret containing the CA certificate, replacing the deprecated `CACert` field. - Implemented validation logic to ensure mutual exclusivity between `CACert` and `CACertRef`. - Updated BSL controller and repository provider to handle the new certificate resolution logic. - Enhanced CLI to support automatic certificate discovery from BSL configurations. - Added unit and integration tests to validate new functionality and ensure backward compatibility. - Documented migration strategy for users transitioning from inline certificates to Secret-based management. Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>	2025-12-12 21:07:37 +07:00
Shubham Pampattiwar	14b34f08cc	Merge pull request #9321 from shubham-pampattiwar/fix-azure-bsl-status-message-8368 Sanitize Azure HTTP responses in BSL status messages	2025-12-11 22:00:18 -08:00
Xun Jiang	096436507e	Remove VolumeSnapshotClass from CSI restore and deletion process. Remove VolumeSnapshotClass from backup sync process. Signed-off-by: Xun Jiang <xun.jiang@broadcom.com>	2025-12-11 17:56:11 +08:00
Shubham Pampattiwar	f0c97c489d	Merge pull request #9414 from shubham-pampattiwar/add-maintenance-job-metrics Add Prometheus metrics for maintenance jobs	2025-12-08 09:23:44 -08:00
Scott Seago	7286d24c35	Updates for merge conflict and to refine reconciler queueing logic Signed-off-by: Scott Seago <sseago@redhat.com>	2025-12-03 16:55:59 -05:00
Scott Seago	7e4797f588	Track running backup count via BackupTracker This avoids an unnecessary apiserver List call when the backup reconciler is already at capacity. Signed-off-by: Scott Seago <sseago@redhat.com>	2025-12-02 17:23:47 -05:00
Scott Seago	f238a7e47b	make update Signed-off-by: Scott Seago <sseago@redhat.com>	2025-12-02 17:23:21 -05:00
Scott Seago	0b2e7d1238	Minor refactoring Signed-off-by: Scott Seago <sseago@redhat.com>	2025-12-02 17:23:21 -05:00
Scott Seago	73864e31ff	Fix linters Signed-off-by: Scott Seago <sseago@redhat.com>	2025-12-02 17:04:55 -05:00
Scott Seago	8a95d512b3	make update, changelog Signed-off-by: Scott Seago <sseago@redhat.com>	2025-12-02 17:04:07 -05:00
Scott Seago	4d1802233a	add various scenarios to queue controller unit tests Signed-off-by: Scott Seago <sseago@redhat.com>	2025-12-02 17:01:09 -05:00
Scott Seago	f73443659a	Backup queue controller implementation Signed-off-by: Scott Seago <sseago@redhat.com>	2025-12-02 16:57:18 -05:00
Scott Seago	845eee4e60	feat: Create backup queue controller and add to disableable list Co-authored-by: aider (gemini/gemini-2.5-pro) <aider@aider.chat> Signed-off-by: Scott Seago <sseago@redhat.com>	2025-12-02 16:46:56 -05:00
Scott Seago	6a3f821606	fix lint Signed-off-by: Scott Seago <sseago@redhat.com> Signed-off-by: Scott Seago <sseago@redhat.com>	2025-12-02 16:39:10 -05:00
Scott Seago	34dc381182	Refactor after review Signed-off-by: Scott Seago <sseago@redhat.com> Signed-off-by: Scott Seago <sseago@redhat.com>	2025-12-02 16:39:10 -05:00
Scott Seago	29b01c3170	make update Signed-off-by: Scott Seago <sseago@redhat.com>	2025-12-02 16:39:10 -05:00
Scott Seago	9c1c7d20ff	Minor refactoring Signed-off-by: Scott Seago <sseago@redhat.com>	2025-12-02 16:39:10 -05:00
Scott Seago	7bc57b5a5f	Refactor queue controller to reduce apiserver list calls Signed-off-by: Scott Seago <sseago@redhat.com>	2025-12-02 16:39:10 -05:00
Scott Seago	e7b5d20f4c	Fix linters Signed-off-by: Scott Seago <sseago@redhat.com>	2025-12-02 16:39:10 -05:00
Scott Seago	aedc0fe5e2	make update, changelog Signed-off-by: Scott Seago <sseago@redhat.com>	2025-12-02 16:39:07 -05:00
Scott Seago	91357b28c4	Move worker pool creation to backup reconcile. ItemBlockWorkerPool is now created for each backup. Signed-off-by: Scott Seago <sseago@redhat.com>	2025-12-02 16:38:41 -05:00
Scott Seago	e0c08f03cf	add various scenarios to queue controller unit tests Signed-off-by: Scott Seago <sseago@redhat.com>	2025-12-02 16:38:41 -05:00
Scott Seago	a56ab10f23	Move debug logs to info Signed-off-by: Scott Seago <sseago@redhat.com>	2025-12-02 16:38:41 -05:00
Scott Seago	d39ad6f208	run multiple backup reconcilers, only reconcile ReadyToStart backups Signed-off-by: Scott Seago <sseago@redhat.com>	2025-12-02 16:38:41 -05:00
Scott Seago	13041b40c2	Backup queue controller implementation Signed-off-by: Scott Seago <sseago@redhat.com>	2025-12-02 16:38:41 -05:00
Scott Seago	fe799d7546	feat: Add concurrent backups configuration to backup reconciler Co-authored-by: aider (gemini/gemini-2.5-pro) <aider@aider.chat> Signed-off-by: Scott Seago <sseago@redhat.com>	2025-12-02 16:28:08 -05:00
Scott Seago	d91d50f696	feat: Add concurrentBackups to backupQueueReconciler Co-authored-by: aider (gemini/gemini-2.5-pro) <aider@aider.chat> Signed-off-by: Scott Seago <sseago@redhat.com>	2025-12-02 16:28:08 -05:00
Scott Seago	5d02af3ce3	feat: Create backup queue controller and add to disableable list Co-authored-by: aider (gemini/gemini-2.5-pro) <aider@aider.chat> Signed-off-by: Scott Seago <sseago@redhat.com>	2025-12-02 16:28:08 -05:00
Shubham Pampattiwar	20af2c20c5	Address PR review comments: sanitize errors and add SAS token scrubbing This commit addresses three review comments on PR #9321: 1. Keep sanitization in controller (response to @ywk253100) - Maintaining centralized error handling for easier extension - Azure-specific patterns detected and others passed through unchanged 2. Sanitize unavailableErrors array (@priyansh17) - Now using sanitizeStorageError() for both unavailableErrors array and location.Status.Message for consistency 3. Add SAS token scrubbing (@anshulahuja98) - Scrubs Azure SAS token parameters to prevent credential leakage - Redacts: sig, se, st, sp, spr, sv, sr, sip, srt, ss - Example: ?sig=secret becomes ?sig=*REDACTED* Added comprehensive test coverage for SAS token scrubbing with 4 new test cases covering various scenarios. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>	2025-12-02 11:37:50 -08:00
Shubham Pampattiwar	a5d32f29da	Sanitize Azure HTTP responses in BSL status messages Azure storage errors include verbose HTTP response details and XML in error messages, making the BSL status.message field cluttered and hard to read. This change adds sanitization to extract only the error code and meaningful message. Before: BackupStorageLocation "test" is unavailable: rpc error: code = Unknown desc = GET https://... RESPONSE 404: 404 The specified container does not exist. ERROR CODE: ContainerNotFound <?xml version="1.0"...> After: BackupStorageLocation "test" is unavailable: rpc error: code = Unknown desc = ContainerNotFound: The specified container does not exist. AWS and GCP error messages are preserved as-is since they don't contain verbose HTTP responses. Fixes #8368 Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>	2025-12-02 11:37:50 -08:00
Shubham Pampattiwar	27ca08b5a5	Address review comments: rename metrics to repo_maintenance_* - Rename metric constants from maintenance_job_* to repo_maintenance_* - Update metric help text to clarify these are for repo maintenance - Rename functions: RegisterMaintenanceJob* → RegisterRepoMaintenance* - Update all test references to use new names Addresses review comments from @Lyndon-Li on PR #9414 Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>	2025-12-02 11:36:15 -08:00
Shubham Pampattiwar	fdf439963c	Add Prometheus metrics for maintenance jobs Adds three new Prometheus metrics to track backup repository maintenance job execution: - velero_maintenance_job_success_total: Counter for successful jobs - velero_maintenance_job_failure_total: Counter for failed jobs - velero_maintenance_job_duration_seconds: Histogram for job duration Metrics use repository_name label to identify specific BackupRepositories. Duration is recorded for both successful and failed jobs (when job runs), but not when job fails to start. Includes comprehensive unit and integration tests. Fixes #9225 Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>	2025-12-02 11:36:15 -08:00
lyndon-li	f947092f1a	cache volume for PVR (#9397) Signed-off-by: Lyndon-Li <lyonghui@vmware.com>	2025-11-11 13:02:56 -05:00
Xun Jiang/Bruce Jiang	82367e7ff6	Fix the Job build error when BackupReposiotry name longer than 63. (#9350) * Fix the Job build error when BackupReposiotry name longer than 63. Fix the Job build error. Consider the name length limitation change in job list code. Use hash to replace the GetValidName function. Signed-off-by: Xun Jiang <xun.jiang@broadcom.com> * Use ref_name to replace ref. Signed-off-by: Xun Jiang <xun.jiang@broadcom.com> --------- Signed-off-by: Xun Jiang <xun.jiang@broadcom.com>	2025-11-11 12:56:27 -05:00
Lyndon-Li	7dbe2b4358	cache volume for data download Signed-off-by: Lyndon-Li <lyonghui@vmware.com>	2025-11-04 16:59:52 +08:00
Scott Seago	5fc76db8c0	Add incrementalSize to DU/PVB for reporting new/changed size Signed-off-by: Scott Seago <sseago@redhat.com>	2025-10-27 15:38:31 -04:00
Lyndon-Li	6dbe772590	snapshot size in restore CRs only Signed-off-by: Lyndon-Li <lyonghui@vmware.com>	2025-10-22 12:41:41 +08:00
Lyndon-Li	166f50d776	add snapshot size to data mover CRs Signed-off-by: Lyndon-Li <lyonghui@vmware.com>	2025-10-21 15:14:38 +08:00
Scott Seago	4ade8cf8a2	Add option for privileged fs-backup pod Signed-off-by: Scott Seago <sseago@redhat.com>	2025-09-25 15:38:39 -04:00
lyndon-li	21691451e9	Merge branch 'main' into backup-pvc-to-different-node	2025-09-23 11:43:24 +08:00

888 Commits