Commit Graph

874 Commits

Author SHA1 Message Date
Tiger Kaovilai
61bf2ef777 feat: Enhance BackupStorageLocation with Secret-based CA certificate support
- Introduced `CACertRef` field in `ObjectStorageLocation` to reference a Secret containing the CA certificate, replacing the deprecated `CACert` field.
- Implemented validation logic to ensure mutual exclusivity between `CACert` and `CACertRef`.
- Updated BSL controller and repository provider to handle the new certificate resolution logic.
- Enhanced CLI to support automatic certificate discovery from BSL configurations.
- Added unit and integration tests to validate new functionality and ensure backward compatibility.
- Documented migration strategy for users transitioning from inline certificates to Secret-based management.

Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
2025-12-12 21:07:37 +07:00
Shubham Pampattiwar
f0c97c489d Merge pull request #9414 from shubham-pampattiwar/add-maintenance-job-metrics
Add Prometheus metrics for maintenance jobs
2025-12-08 09:23:44 -08:00
Scott Seago
7286d24c35 Updates for merge conflict and to refine reconciler queueing logic
Signed-off-by: Scott Seago <sseago@redhat.com>
2025-12-03 16:55:59 -05:00
Scott Seago
7e4797f588 Track running backup count via BackupTracker
This avoids an unnecessary apiserver List call when
the backup reconciler is already at capacity.

Signed-off-by: Scott Seago <sseago@redhat.com>
2025-12-02 17:23:47 -05:00
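The idea of answering "is the reconciler at capacity?" from an in-memory tracker rather than an apiserver List call can be sketched like this. It is a hypothetical illustration: `backupTracker`, its `Add`/`Delete`/`RunningCount` methods, and `atCapacity` are assumed names, not Velero's actual `BackupTracker` API.

```go
package main

import (
	"fmt"
	"sync"
)

// backupTracker is a hypothetical, minimal in-memory record of backups that
// are currently running, keyed by "namespace/name".
type backupTracker struct {
	mu      sync.RWMutex
	running map[string]struct{}
}

func newBackupTracker() *backupTracker {
	return &backupTracker{running: make(map[string]struct{})}
}

// Add marks a backup as running; adding the same backup twice counts once.
func (t *backupTracker) Add(ns, name string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.running[ns+"/"+name] = struct{}{}
}

// Delete marks a backup as no longer running.
func (t *backupTracker) Delete(ns, name string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	delete(t.running, ns+"/"+name)
}

// RunningCount reports how many backups are in progress without any
// apiserver List call.
func (t *backupTracker) RunningCount() int {
	t.mu.RLock()
	defer t.mu.RUnlock()
	return len(t.running)
}

// atCapacity is checked before doing any further work for a new backup.
func atCapacity(t *backupTracker, concurrentBackups int) bool {
	return t.RunningCount() >= concurrentBackups
}

func main() {
	tracker := newBackupTracker()
	tracker.Add("velero", "backup-a")
	tracker.Add("velero", "backup-b")
	fmt.Println(atCapacity(tracker, 2)) // true
	tracker.Delete("velero", "backup-a")
	fmt.Println(atCapacity(tracker, 2)) // false
}
```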
Scott Seago
f238a7e47b make update
Signed-off-by: Scott Seago <sseago@redhat.com>
2025-12-02 17:23:21 -05:00
Scott Seago
0b2e7d1238 Minor refactoring
Signed-off-by: Scott Seago <sseago@redhat.com>
2025-12-02 17:23:21 -05:00
Scott Seago
73864e31ff Fix linters
Signed-off-by: Scott Seago <sseago@redhat.com>
2025-12-02 17:04:55 -05:00
Scott Seago
8a95d512b3 make update, changelog
Signed-off-by: Scott Seago <sseago@redhat.com>
2025-12-02 17:04:07 -05:00
Scott Seago
4d1802233a add various scenarios to queue controller unit tests
Signed-off-by: Scott Seago <sseago@redhat.com>
2025-12-02 17:01:09 -05:00
Scott Seago
f73443659a Backup queue controller implementation
Signed-off-by: Scott Seago <sseago@redhat.com>
2025-12-02 16:57:18 -05:00
Scott Seago
845eee4e60 feat: Create backup queue controller and add to disableable list
Co-authored-by: aider (gemini/gemini-2.5-pro) <aider@aider.chat>
Signed-off-by: Scott Seago <sseago@redhat.com>
2025-12-02 16:46:56 -05:00
Scott Seago
6a3f821606 fix lint
Signed-off-by: Scott Seago <sseago@redhat.com>
2025-12-02 16:39:10 -05:00
Scott Seago
34dc381182 Refactor after review
Signed-off-by: Scott Seago <sseago@redhat.com>
2025-12-02 16:39:10 -05:00
Scott Seago
29b01c3170 make update
Signed-off-by: Scott Seago <sseago@redhat.com>
2025-12-02 16:39:10 -05:00
Scott Seago
9c1c7d20ff Minor refactoring
Signed-off-by: Scott Seago <sseago@redhat.com>
2025-12-02 16:39:10 -05:00
Scott Seago
7bc57b5a5f Refactor queue controller to reduce apiserver list calls
Signed-off-by: Scott Seago <sseago@redhat.com>
2025-12-02 16:39:10 -05:00
Scott Seago
e7b5d20f4c Fix linters
Signed-off-by: Scott Seago <sseago@redhat.com>
2025-12-02 16:39:10 -05:00
Scott Seago
aedc0fe5e2 make update, changelog
Signed-off-by: Scott Seago <sseago@redhat.com>
2025-12-02 16:39:07 -05:00
Scott Seago
91357b28c4 Move worker pool creation to backup reconcile.
ItemBlockWorkerPool is now created for each backup.

Signed-off-by: Scott Seago <sseago@redhat.com>
2025-12-02 16:38:41 -05:00
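Creating the worker pool inside each backup reconcile, rather than once per server, keeps concurrent backups from competing for the same workers. The sketch below illustrates that scoping under assumptions: `workerPool`, `reconcileBackup`, and the string item blocks are hypothetical simplifications of `ItemBlockWorkerPool`, not its real interface.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// workerPool is a hypothetical, minimal stand-in for ItemBlockWorkerPool.
type workerPool struct {
	tasks chan func()
	wg    sync.WaitGroup
}

// newWorkerPool starts n goroutines that drain the task channel.
func newWorkerPool(n int) *workerPool {
	p := &workerPool{tasks: make(chan func())}
	for i := 0; i < n; i++ {
		p.wg.Add(1)
		go func() {
			defer p.wg.Done()
			for task := range p.tasks {
				task()
			}
		}()
	}
	return p
}

func (p *workerPool) Submit(task func()) { p.tasks <- task }

// Stop closes the channel and waits for in-flight tasks to finish.
func (p *workerPool) Stop() {
	close(p.tasks)
	p.wg.Wait()
}

// reconcileBackup creates a pool scoped to one backup and tears it down once
// that backup's item blocks are processed.
func reconcileBackup(itemBlocks []string, workers int) int64 {
	pool := newWorkerPool(workers)
	var processed int64
	for _, block := range itemBlocks {
		block := block
		pool.Submit(func() {
			_ = block // a real implementation would back up the block here
			atomic.AddInt64(&processed, 1)
		})
	}
	pool.Stop()
	return atomic.LoadInt64(&processed)
}

func main() {
	var wg sync.WaitGroup
	for _, name := range []string{"backup-a", "backup-b"} {
		wg.Add(1)
		go func(name string) {
			defer wg.Done()
			fmt.Println(name, reconcileBackup([]string{"b1", "b2", "b3"}, 2))
		}(name)
	}
	wg.Wait()
}
```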
Scott Seago
e0c08f03cf add various scenarios to queue controller unit tests
Signed-off-by: Scott Seago <sseago@redhat.com>
2025-12-02 16:38:41 -05:00
Scott Seago
a56ab10f23 Move debug logs to info
Signed-off-by: Scott Seago <sseago@redhat.com>
2025-12-02 16:38:41 -05:00
Scott Seago
d39ad6f208 run multiple backup reconcilers, only reconcile ReadyToStart backups
Signed-off-by: Scott Seago <sseago@redhat.com>
2025-12-02 16:38:41 -05:00
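One way to read "only reconcile ReadyToStart backups" together with the `concurrentBackups` setting is that a queue controller promotes waiting backups to `ReadyToStart` only while capacity remains. The sketch below assumes that reading; the `Queued` phase name, the `backup` type, and `promoteQueued` are hypothetical, and slice order stands in for queue order.

```go
package main

import "fmt"

type phase string

const (
	phaseQueued       phase = "Queued"       // assumed name for a waiting backup
	phaseReadyToStart phase = "ReadyToStart" // the only phase the backup reconcilers act on
	phaseInProgress   phase = "InProgress"
)

type backup struct {
	Name  string
	Phase phase
}

// promoteQueued moves the oldest queued backups to ReadyToStart until the
// number of active backups reaches concurrentBackups.
func promoteQueued(backups []backup, concurrentBackups int) {
	active := 0
	for _, b := range backups {
		if b.Phase == phaseReadyToStart || b.Phase == phaseInProgress {
			active++
		}
	}
	for i := range backups {
		if active >= concurrentBackups {
			return
		}
		if backups[i].Phase == phaseQueued {
			backups[i].Phase = phaseReadyToStart
			active++
		}
	}
}

func main() {
	bs := []backup{{"a", phaseInProgress}, {"b", phaseQueued}, {"c", phaseQueued}}
	promoteQueued(bs, 2)
	fmt.Println(bs[1].Phase, bs[2].Phase) // ReadyToStart Queued
}
```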
Scott Seago
13041b40c2 Backup queue controller implementation
Signed-off-by: Scott Seago <sseago@redhat.com>
2025-12-02 16:38:41 -05:00
Scott Seago
fe799d7546 feat: Add concurrent backups configuration to backup reconciler
Co-authored-by: aider (gemini/gemini-2.5-pro) <aider@aider.chat>
Signed-off-by: Scott Seago <sseago@redhat.com>
2025-12-02 16:28:08 -05:00
Scott Seago
d91d50f696 feat: Add concurrentBackups to backupQueueReconciler
Co-authored-by: aider (gemini/gemini-2.5-pro) <aider@aider.chat>
Signed-off-by: Scott Seago <sseago@redhat.com>
2025-12-02 16:28:08 -05:00
Scott Seago
5d02af3ce3 feat: Create backup queue controller and add to disableable list
Co-authored-by: aider (gemini/gemini-2.5-pro) <aider@aider.chat>
Signed-off-by: Scott Seago <sseago@redhat.com>
2025-12-02 16:28:08 -05:00
Shubham Pampattiwar
27ca08b5a5 Address review comments: rename metrics to repo_maintenance_*
- Rename metric constants from maintenance_job_* to repo_maintenance_*
- Update metric help text to clarify these are for repo maintenance
- Rename functions: RegisterMaintenanceJob* → RegisterRepoMaintenance*
- Update all test references to use new names

Addresses review comments from @Lyndon-Li on PR #9414

Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>
2025-12-02 11:36:15 -08:00
Shubham Pampattiwar
fdf439963c Add Prometheus metrics for maintenance jobs
Adds three new Prometheus metrics to track backup repository
maintenance job execution:

- velero_maintenance_job_success_total: Counter for successful jobs
- velero_maintenance_job_failure_total: Counter for failed jobs
- velero_maintenance_job_duration_seconds: Histogram for job duration

Metrics use repository_name label to identify specific BackupRepositories.
Duration is recorded for both successful and failed jobs (when job runs),
but not when job fails to start.

Includes comprehensive unit and integration tests.

Fixes #9225

Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>
2025-12-02 11:36:15 -08:00
lyndon-li
f947092f1a cache volume for PVR (#9397)
Signed-off-by: Lyndon-Li <lyonghui@vmware.com>
2025-11-11 13:02:56 -05:00
Xun Jiang/Bruce Jiang
82367e7ff6 Fix the Job build error when BackupRepository name longer than 63. (#9350)
* Fix the Job build error when BackupRepository name longer than 63.

Fix the Job build error.
Consider the name length limitation change in job list code.
Use hash to replace the GetValidName function.

Signed-off-by: Xun Jiang <xun.jiang@broadcom.com>

* Use ref_name to replace ref.

Signed-off-by: Xun Jiang <xun.jiang@broadcom.com>

---------

Signed-off-by: Xun Jiang <xun.jiang@broadcom.com>
2025-11-11 12:56:27 -05:00
Lyndon-Li
7dbe2b4358 cache volume for data download
Signed-off-by: Lyndon-Li <lyonghui@vmware.com>
2025-11-04 16:59:52 +08:00
Scott Seago
5fc76db8c0 Add incrementalSize to DU/PVB for reporting new/changed size
Signed-off-by: Scott Seago <sseago@redhat.com>
2025-10-27 15:38:31 -04:00
Lyndon-Li
6dbe772590 snapshot size in restore CRs only
Signed-off-by: Lyndon-Li <lyonghui@vmware.com>
2025-10-22 12:41:41 +08:00
Lyndon-Li
166f50d776 add snapshot size to data mover CRs
Signed-off-by: Lyndon-Li <lyonghui@vmware.com>
2025-10-21 15:14:38 +08:00
Scott Seago
4ade8cf8a2 Add option for privileged fs-backup pod
Signed-off-by: Scott Seago <sseago@redhat.com>
2025-09-25 15:38:39 -04:00
lyndon-li
21691451e9 Merge branch 'main' into backup-pvc-to-different-node
2025-09-23 11:43:24 +08:00
0xLeo258
1ebe357d18 Add built-in mutex for SynchronizedVSList && Update unit tests
Signed-off-by: 0xLeo258 <noixe0312@gmail.com>
2025-09-20 09:13:07 +08:00
Shubham Pampattiwar
59289fba76 Fix Schedule Backup Queue Accumulation During Extended Blocking Scenarios
Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>
2025-09-15 16:01:33 -07:00
lyndon-li
aad9dd9068 Merge branch 'main' into backup-pvc-to-different-node
2025-09-11 14:47:35 +08:00
Lyndon-Li
81c5b6692d backupPVC to different node
Signed-off-by: Lyndon-Li <lyonghui@vmware.com>
2025-09-11 13:04:24 +08:00
Xun Jiang
e8208097ba Bump k8s library to v1.33.
Replace deprecated EventExpansion method with WithContext methods.
Modify UTs.
Align the E2E ginkgo CLI version with go.mod

Signed-off-by: Xun Jiang <xun.jiang@broadcom.com>
2025-09-10 17:58:38 +08:00
Xun Jiang
c62a486765 Add ConfigMap parameters validation for install CLI and server start.
Signed-off-by: Xun Jiang <xun.jiang@broadcom.com>
2025-08-22 20:31:38 +08:00
Xun Jiang
ec99b50970 Remove the repository maintenance job parameters from velero server.
Signed-off-by: Xun Jiang <xun.jiang@broadcom.com>
2025-08-07 23:25:22 +08:00
lyndon-li
ae29030917 Merge branch 'main' into implement8869
2025-08-06 13:45:35 +08:00
Tiger Kaovilai
35d2cc0890 Add priority class support for Velero server and node-agent
- Add --server-priority-class-name and --node-agent-priority-class-name flags to velero install command
- Configure data mover pods (PVB/PVR/DataUpload/DataDownload) to use priority class from node-agent-configmap
- Configure maintenance jobs to use priority class from repo-maintenance-job-configmap (global config only)
- Add priority class validation with ValidatePriorityClass and GetDataMoverPriorityClassName utilities
- Update e2e tests to include PriorityClass testing utilities
- Move priority class design document to Implemented folder
- Add comprehensive unit tests for all priority class implementations
- Update documentation for priority class configuration
- Add changelog entry for #8883

Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>

remove unused test utils

Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>

feat: add unit test for getting priority class name in maintenance jobs

Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>

doc update

Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>

feat: add priority class validation for repository maintenance jobs

- Add ValidatePriorityClassWithClient function to validate priority class existence
- Integrate validation in maintenance.go when creating maintenance jobs
- Update tests to cover the new validation functionality
- Return boolean from ValidatePriorityClass to allow fallback behavior

This ensures maintenance jobs don't fail due to non-existent priority classes,
following the same pattern used for data mover pods.

Addresses feedback from:
https://github.com/vmware-tanzu/velero/pull/8883#discussion_r2238681442

Refs #8869

Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>

refactor: clean up priority class handling for data mover pods

- Fix comment in node_agent.go to clarify PriorityClassName is only for data mover pods
- Simplify server.go to use dataPathConfigs.PriorityClassName directly
- Remove redundant priority class logging from controllers as it's already logged during server startup
- Keep logging centralized in the node-agent server initialization

This reduces code duplication and clarifies the scope of priority class configuration.

🤖 Generated with [Claude Code](https://claude.ai/code)

Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>

refactor: remove GetDataMoverPriorityClassName from kube utilities

Remove GetDataMoverPriorityClassName function and its tests as priority
class is now read directly from dataPathConfigs instead of parsing from
ConfigMap. This simplifies the codebase by eliminating the need for
indirect ConfigMap parsing.

Refs #8869

🤖 Generated with [Claude Code](https://claude.ai/code)

Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>

refactor: remove priority class validation from install command

Remove priority class validation during install as it's redundant
since validation already occurs during server startup. Users cannot
see console logs during install, making the validation warnings
ineffective at this stage.

The validation remains in place during server and node-agent startup
where it's more appropriate and visible to users.

Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
Co-Authored-By: Claude <noreply@anthropic.com>
2025-08-06 01:36:22 -04:00
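Returning a boolean from priority class validation lets the caller fall back instead of failing maintenance job creation. The sketch below assumes the fallback is "no priority class" (the cluster default); `validatePriorityClass`, `maintenanceJobPriorityClass`, and the `exists` lookup that replaces a real Kubernetes client are hypothetical.

```go
package main

import (
	"fmt"
	"log"
)

// validatePriorityClass is a hypothetical stand-in: it reports whether a
// non-empty PriorityClass name exists, using a caller-supplied lookup.
func validatePriorityClass(name string, exists func(string) bool) bool {
	if name == "" {
		return true // nothing configured, nothing to validate
	}
	return exists(name)
}

// maintenanceJobPriorityClass returns the name to set on the job spec,
// falling back to "" (cluster default) rather than failing job creation.
func maintenanceJobPriorityClass(configured string, exists func(string) bool) string {
	if !validatePriorityClass(configured, exists) {
		log.Printf("priority class %q not found; using cluster default", configured)
		return ""
	}
	return configured
}

func main() {
	known := map[string]bool{"velero-critical": true}
	exists := func(n string) bool { return known[n] }
	fmt.Printf("%q\n", maintenanceJobPriorityClass("velero-critical", exists)) // "velero-critical"
	fmt.Printf("%q\n", maintenanceJobPriorityClass("missing", exists))         // ""
}
```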
Daniel Jiang
249d8f581a Add include/exclude policy to resources policy
fixes #8610

This commit extends the resources policy, such that user can define
resource include exclude filters in the policy and reuse it in different backups.

Signed-off-by: Daniel Jiang <daniel.jiang@broadcom.com>
2025-08-05 15:16:59 +08:00
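Defining include/exclude filters once in a resources policy and reusing them across backups can be pictured as a single filter value consulted by each backup. This is a hypothetical sketch: the `resourceFilterPolicy` shape, the "exclusion wins" rule, and "empty include list means everything" are assumptions, not the policy's documented semantics.

```go
package main

import "fmt"

// resourceFilterPolicy is a hypothetical shape for include/exclude filters
// stored once in a resources policy and reused by several backups.
type resourceFilterPolicy struct {
	IncludedResources []string
	ExcludedResources []string
}

// contains treats "*" as a wildcard matching any resource.
func contains(list []string, s string) bool {
	for _, v := range list {
		if v == s || v == "*" {
			return true
		}
	}
	return false
}

// shouldInclude assumes exclusion wins over inclusion and that an empty
// include list means "everything".
func (p resourceFilterPolicy) shouldInclude(resource string) bool {
	if contains(p.ExcludedResources, resource) {
		return false
	}
	return len(p.IncludedResources) == 0 || contains(p.IncludedResources, resource)
}

func main() {
	policy := resourceFilterPolicy{ExcludedResources: []string{"secrets"}}
	for _, backupName := range []string{"nightly", "weekly"} { // same policy, two backups
		fmt.Println(backupName, policy.shouldInclude("configmaps"), policy.shouldInclude("secrets"))
	}
	// nightly true false
	// weekly true false
}
```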
Xun Jiang/Bruce Jiang
9cb421c26f Fix the dd and du's node affinity issue. (#9130)
Signed-off-by: Xun Jiang <xun.jiang@broadcom.com>
2025-08-04 16:21:35 -04:00
Shubham Pampattiwar
d8f222c83f Add ConfigMap support for keepLatestMaintenanceJobs
Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>

add changelog file

Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>

lint fix

Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>
2025-07-31 16:33:46 -07:00
Tiger Kaovilai
1daa685e7d Make ResticIdentifier optional for kopia repositories (#8987)
The ResticIdentifier field in BackupRepository is only relevant for restic
repositories. For kopia repositories, this field is unused and should be
omitted. This change:

- Adds omitempty tag to ResticIdentifier field in BackupRepository CRD
- Updates controller to only populate ResticIdentifier for restic repos
- Adds tests to verify behavior for both restic and kopia repository types

This ensures backward compatibility while properly handling kopia repositories
that don't require a restic-compatible identifier.

Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
2025-07-24 22:25:09 -04:00
lyndon-li
9b721a8251 Merge branch 'main' into issue-fix-9077
2025-07-23 15:05:22 +08:00