881 Commits

Author SHA1 Message Date
lyndon-li
2d93ab261e Merge pull request #9141 from kaovilai/9097
Some checks failed
Run the E2E test on kind / get-go-version (push) Failing after 1m8s
Run the E2E test on kind / build (push) Has been skipped
Run the E2E test on kind / setup-test-matrix (push) Successful in 3s
Run the E2E test on kind / run-e2e-test (push) Has been skipped
Main CI / get-go-version (push) Successful in 16s
Main CI / Build (push) Failing after 45s
feat: Enhance BackupStorageLocation with Secret-based CA certificate support
2025-12-19 13:13:47 +08:00
Xun Jiang/Bruce Jiang
aa3bd251dd Merge branch 'main' into 9097 2025-12-18 14:18:04 +08:00
Xun Jiang
e39374f335 Add maintenance job and data mover pod's labels and annotations setting.
Add wait in file_system_test's async test cases.
Add related documents.

Signed-off-by: Xun Jiang <xun.jiang@broadcom.com>
2025-12-17 13:21:07 +08:00
Tiger Kaovilai
61bf2ef777 feat: Enhance BackupStorageLocation with Secret-based CA certificate support
- Introduced `CACertRef` field in `ObjectStorageLocation` to reference a Secret containing the CA certificate, replacing the deprecated `CACert` field.
- Implemented validation logic to ensure mutual exclusivity between `CACert` and `CACertRef`.
- Updated BSL controller and repository provider to handle the new certificate resolution logic.
- Enhanced CLI to support automatic certificate discovery from BSL configurations.
- Added unit and integration tests to validate new functionality and ensure backward compatibility.
- Documented migration strategy for users transitioning from inline certificates to Secret-based management.

Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
2025-12-12 21:07:37 +07:00
Shubham Pampattiwar
14b34f08cc Merge pull request #9321 from shubham-pampattiwar/fix-azure-bsl-status-message-8368
Some checks failed
Run the E2E test on kind / get-go-version (push) Failing after 1m11s
Run the E2E test on kind / build (push) Has been skipped
Run the E2E test on kind / setup-test-matrix (push) Successful in 3s
Run the E2E test on kind / run-e2e-test (push) Has been skipped
Main CI / get-go-version (push) Successful in 13s
Main CI / Build (push) Failing after 33s
Sanitize Azure HTTP responses in BSL status messages
2025-12-11 22:00:18 -08:00
Xun Jiang
096436507e Remove VolumeSnapshotClass from CSI restore and deletion process.
Some checks failed
Run the E2E test on kind / get-go-version (push) Failing after 1m1s
Run the E2E test on kind / build (push) Has been skipped
Run the E2E test on kind / setup-test-matrix (push) Successful in 3s
Run the E2E test on kind / run-e2e-test (push) Has been skipped
Remove VolumeSnapshotClass from backup sync process.

Signed-off-by: Xun Jiang <xun.jiang@broadcom.com>
2025-12-11 17:56:11 +08:00
Shubham Pampattiwar
f0c97c489d Merge pull request #9414 from shubham-pampattiwar/add-maintenance-job-metrics
Some checks failed
Run the E2E test on kind / get-go-version (push) Failing after 1m8s
Run the E2E test on kind / build (push) Has been skipped
Run the E2E test on kind / setup-test-matrix (push) Successful in 5s
Run the E2E test on kind / run-e2e-test (push) Has been skipped
Main CI / get-go-version (push) Successful in 14s
Main CI / Build (push) Failing after 37s
Close stale issues and PRs / stale (push) Successful in 15s
Trivy Nightly Scan / Trivy nightly scan (velero, main) (push) Failing after 1m43s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-aws, main) (push) Failing after 58s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-gcp, main) (push) Failing after 1m8s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-microsoft-azure, main) (push) Failing after 58s
Add Prometheus metrics for maintenance jobs
2025-12-08 09:23:44 -08:00
Scott Seago
7286d24c35 Updates for merge conflict and to refine reconciler queueing logic
Signed-off-by: Scott Seago <sseago@redhat.com>
2025-12-03 16:55:59 -05:00
Scott Seago
7e4797f588 Track running backup count via BackupTracker
This avoids an unnecessary apiserver List call when
the backup reconciler is already at capacity.

Signed-off-by: Scott Seago <sseago@redhat.com>
2025-12-02 17:23:47 -05:00
Scott Seago
f238a7e47b make update
Signed-off-by: Scott Seago <sseago@redhat.com>
2025-12-02 17:23:21 -05:00
Scott Seago
0b2e7d1238 Minor refactoring
Signed-off-by: Scott Seago <sseago@redhat.com>
2025-12-02 17:23:21 -05:00
Scott Seago
73864e31ff Fix linters
Signed-off-by: Scott Seago <sseago@redhat.com>
2025-12-02 17:04:55 -05:00
Scott Seago
8a95d512b3 make update, changelog
Signed-off-by: Scott Seago <sseago@redhat.com>
2025-12-02 17:04:07 -05:00
Scott Seago
4d1802233a add various scenarios to queue controller unit tests
Signed-off-by: Scott Seago <sseago@redhat.com>
2025-12-02 17:01:09 -05:00
Scott Seago
f73443659a Backup queue controller implementation
Signed-off-by: Scott Seago <sseago@redhat.com>
2025-12-02 16:57:18 -05:00
Scott Seago
845eee4e60 feat: Create backup queue controller and add to disableable list
Co-authored-by: aider (gemini/gemini-2.5-pro) <aider@aider.chat>
Signed-off-by: Scott Seago <sseago@redhat.com>
2025-12-02 16:46:56 -05:00
Scott Seago
6a3f821606 fix lint
Signed-off-by: Scott Seago <sseago@redhat.com>

Signed-off-by: Scott Seago <sseago@redhat.com>
2025-12-02 16:39:10 -05:00
Scott Seago
34dc381182 Refactor after review
Signed-off-by: Scott Seago <sseago@redhat.com>

Signed-off-by: Scott Seago <sseago@redhat.com>
2025-12-02 16:39:10 -05:00
Scott Seago
29b01c3170 make update
Signed-off-by: Scott Seago <sseago@redhat.com>
2025-12-02 16:39:10 -05:00
Scott Seago
9c1c7d20ff Minor refactoring
Signed-off-by: Scott Seago <sseago@redhat.com>
2025-12-02 16:39:10 -05:00
Scott Seago
7bc57b5a5f Refactor queue controller to reduce apiserver list calls
Signed-off-by: Scott Seago <sseago@redhat.com>
2025-12-02 16:39:10 -05:00
Scott Seago
e7b5d20f4c Fix linters
Signed-off-by: Scott Seago <sseago@redhat.com>
2025-12-02 16:39:10 -05:00
Scott Seago
aedc0fe5e2 make update, changelog
Signed-off-by: Scott Seago <sseago@redhat.com>
2025-12-02 16:39:07 -05:00
Scott Seago
91357b28c4 Move worker pool creation to backup reconcile.
ItemBlockWorkerPool is now created for each backup.

Signed-off-by: Scott Seago <sseago@redhat.com>
2025-12-02 16:38:41 -05:00
Scott Seago
e0c08f03cf add various scenarios to queue controller unit tests
Signed-off-by: Scott Seago <sseago@redhat.com>
2025-12-02 16:38:41 -05:00
Scott Seago
a56ab10f23 Move debug logs to info
Signed-off-by: Scott Seago <sseago@redhat.com>
2025-12-02 16:38:41 -05:00
Scott Seago
d39ad6f208 run multiple backup reconcilers, only reconcile ReadyToStart backups
Signed-off-by: Scott Seago <sseago@redhat.com>
2025-12-02 16:38:41 -05:00
Scott Seago
13041b40c2 Backup queue controller implementation
Signed-off-by: Scott Seago <sseago@redhat.com>
2025-12-02 16:38:41 -05:00
Scott Seago
fe799d7546 feat: Add concurrent backups configuration to backup reconciler
Co-authored-by: aider (gemini/gemini-2.5-pro) <aider@aider.chat>
Signed-off-by: Scott Seago <sseago@redhat.com>
2025-12-02 16:28:08 -05:00
Scott Seago
d91d50f696 feat: Add concurrentBackups to backupQueueReconciler
Co-authored-by: aider (gemini/gemini-2.5-pro) <aider@aider.chat>
Signed-off-by: Scott Seago <sseago@redhat.com>
2025-12-02 16:28:08 -05:00
Scott Seago
5d02af3ce3 feat: Create backup queue controller and add to disableable list
Co-authored-by: aider (gemini/gemini-2.5-pro) <aider@aider.chat>
Signed-off-by: Scott Seago <sseago@redhat.com>
2025-12-02 16:28:08 -05:00
Shubham Pampattiwar
20af2c20c5 Address PR review comments: sanitize errors and add SAS token scrubbing
This commit addresses three review comments on PR #9321:

1. Keep sanitization in controller (response to @ywk253100)
   - Maintaining centralized error handling for easier extension
   - Azure-specific patterns detected and others passed through unchanged

2. Sanitize unavailableErrors array (@priyansh17)
   - Now using sanitizeStorageError() for both unavailableErrors array
     and location.Status.Message for consistency

3. Add SAS token scrubbing (@anshulahuja98)
   - Scrubs Azure SAS token parameters to prevent credential leakage
   - Redacts: sig, se, st, sp, spr, sv, sr, sip, srt, ss
   - Example: ?sig=secret becomes ?sig=***REDACTED***

Added comprehensive test coverage for SAS token scrubbing with 4 new
test cases covering various scenarios.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>
2025-12-02 11:37:50 -08:00
Shubham Pampattiwar
a5d32f29da Sanitize Azure HTTP responses in BSL status messages
Azure storage errors include verbose HTTP response details and XML
in error messages, making the BSL status.message field cluttered
and hard to read. This change adds sanitization to extract only
the error code and meaningful message.

Before:
  BackupStorageLocation "test" is unavailable: rpc error: code = Unknown
  desc = GET https://...
  RESPONSE 404: 404 The specified container does not exist.
  ERROR CODE: ContainerNotFound
  <?xml version="1.0"...>

After:
  BackupStorageLocation "test" is unavailable: rpc error: code = Unknown
  desc = ContainerNotFound: The specified container does not exist.

AWS and GCP error messages are preserved as-is since they don't
contain verbose HTTP responses.

Fixes #8368

Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>
2025-12-02 11:37:50 -08:00
Shubham Pampattiwar
27ca08b5a5 Address review comments: rename metrics to repo_maintenance_*
- Rename metric constants from maintenance_job_* to repo_maintenance_*
- Update metric help text to clarify these are for repo maintenance
- Rename functions: RegisterMaintenanceJob* → RegisterRepoMaintenance*
- Update all test references to use new names

Addresses review comments from @Lyndon-Li on PR #9414

Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>
2025-12-02 11:36:15 -08:00
Shubham Pampattiwar
fdf439963c Add Prometheus metrics for maintenance jobs
Adds three new Prometheus metrics to track backup repository
maintenance job execution:

- velero_maintenance_job_success_total: Counter for successful jobs
- velero_maintenance_job_failure_total: Counter for failed jobs
- velero_maintenance_job_duration_seconds: Histogram for job duration

Metrics use repository_name label to identify specific BackupRepositories.
Duration is recorded for both successful and failed jobs (when job runs),
but not when job fails to start.

Includes comprehensive unit and integration tests.

Fixes #9225

Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>
2025-12-02 11:36:15 -08:00
lyndon-li
f947092f1a cache volume for PVR (#9397)
Signed-off-by: Lyndon-Li <lyonghui@vmware.com>
2025-11-11 13:02:56 -05:00
Xun Jiang/Bruce Jiang
82367e7ff6 Fix the Job build error when BackupReposiotry name longer than 63. (#9350)
* Fix the Job build error when BackupReposiotry name longer than 63.

Fix the Job build error.
Consider the name length limitation change in job list code.
Use hash to replace the GetValidName function.

Signed-off-by: Xun Jiang <xun.jiang@broadcom.com>

* Use ref_name to replace ref.

Signed-off-by: Xun Jiang <xun.jiang@broadcom.com>

---------

Signed-off-by: Xun Jiang <xun.jiang@broadcom.com>
2025-11-11 12:56:27 -05:00
Lyndon-Li
7dbe2b4358 cache volume for data download
Signed-off-by: Lyndon-Li <lyonghui@vmware.com>
2025-11-04 16:59:52 +08:00
Scott Seago
5fc76db8c0 Add incrementalSize to DU/PVB for reporting new/changed size
Signed-off-by: Scott Seago <sseago@redhat.com>
2025-10-27 15:38:31 -04:00
Lyndon-Li
6dbe772590 snapshot size in restore CRs only
Signed-off-by: Lyndon-Li <lyonghui@vmware.com>
2025-10-22 12:41:41 +08:00
Lyndon-Li
166f50d776 add snapshot size to data mover CRs
Signed-off-by: Lyndon-Li <lyonghui@vmware.com>
2025-10-21 15:14:38 +08:00
Scott Seago
4ade8cf8a2 Add option for privileged fs-backup pod
Signed-off-by: Scott Seago <sseago@redhat.com>
2025-09-25 15:38:39 -04:00
lyndon-li
21691451e9 Merge branch 'main' into backup-pvc-to-different-node 2025-09-23 11:43:24 +08:00
0xLeo258
1ebe357d18 Add built-in mutex for SynchronizedVSList && Update unit tests
Signed-off-by: 0xLeo258 <noixe0312@gmail.com>
2025-09-20 09:13:07 +08:00
Shubham Pampattiwar
59289fba76 Fix Schedule Backup Queue Accumulation During Extended Blocking Scenarios
Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>
2025-09-15 16:01:33 -07:00
lyndon-li
aad9dd9068 Merge branch 'main' into backup-pvc-to-different-node 2025-09-11 14:47:35 +08:00
Lyndon-Li
81c5b6692d backupPVC to different node
Signed-off-by: Lyndon-Li <lyonghui@vmware.com>
2025-09-11 13:04:24 +08:00
Xun Jiang
e8208097ba Bump k8s library to v1.33.
Replace deprecated EventExpansion method with WithContext methods.
Modify UTs.
Align the E2E ginkgo CLI version with go.mod

Signed-off-by: Xun Jiang <xun.jiang@broadcom.com>
2025-09-10 17:58:38 +08:00
Xun Jiang
c62a486765 Add ConfigMap parameters validation for install CLI and server start.
Signed-off-by: Xun Jiang <xun.jiang@broadcom.com>
2025-08-22 20:31:38 +08:00
Xun Jiang
ec99b50970 Remove the repository maintenance job parameters from velero server.
Signed-off-by: Xun Jiang <xun.jiang@broadcom.com>
2025-08-07 23:25:22 +08:00