Commit Graph

5913 Commits

Author SHA1 Message Date
Shubham Pampattiwar
f4c4653c08 Fix linter errors: use 'any' instead of 'interface{}'
Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>
2025-12-15 14:18:05 -08:00
Shubham Pampattiwar
987edf5037 Add global VolumeHelper caching in CSI PVC BIA plugin
Address review feedback to have a global VolumeHelper instance per
plugin process instead of creating one on each ShouldPerformSnapshot
call.

Changes:
- Add volumeHelper, cachedForBackup, and mu fields to pvcBackupItemAction
  struct for caching the VolumeHelper per backup
- Add getOrCreateVolumeHelper() method for thread-safe lazy initialization
- Update Execute() to use cached VolumeHelper via
  ShouldPerformSnapshotWithVolumeHelper()
- Update filterPVCsByVolumePolicy() to accept VolumeHelper parameter
- Add ShouldPerformSnapshotWithVolumeHelper() that accepts optional
  VolumeHelper for reuse across multiple calls
- Add NewVolumeHelperForBackup() factory function for BIA plugins
- Add comprehensive unit tests for both nil and non-nil VolumeHelper paths

This completes the fix for issue #9179 by ensuring the PVC-to-Pod cache
is built once per backup and reused across all PVC processing, avoiding
O(N*M) complexity.

Fixes #9179

Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>
2025-12-15 14:18:05 -08:00
Shubham Pampattiwar
99e821a870 Address review feedback: move cache building to volumehelper
- Rename NewVolumeHelperImplWithCache to NewVolumeHelperImplWithNamespaces
- Move cache building logic from backup.go into volumehelper
- Return error from NewVolumeHelperImplWithNamespaces if cache build fails
- Remove fallback in main backup path - backup fails if cache build fails
- Update NewVolumeHelperImpl to call NewVolumeHelperImplWithNamespaces
- Add comments clarifying fallback is only used by plugins
- Update tests for new error return signature

This addresses review comments from @Lyndon-Li and @kaovilai:
- Cache building is now encapsulated in volumehelper
- No fallback in main backup path ensures predictable performance
- Code reuse between constructors

Fixes #9179

Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>
2025-12-15 14:18:05 -08:00
Shubham Pampattiwar
041e5e2a7e Address review feedback
- Use ResolveNamespaceList() instead of GetIncludes() for more accurate
  namespace resolution when building the PVC-to-Pod cache
- Refactor NewVolumeHelperImpl to call NewVolumeHelperImplWithCache with
  nil cache parameter to avoid code duplication

Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>
2025-12-15 14:18:05 -08:00
Shubham Pampattiwar
8e58099674 Add test for cache usage without volume policy
Add test case to verify that the PVC-to-Pod cache is used even when
no volume policy is configured. When defaultVolumesToFSBackup is true,
the cache is used to find pods using the PVC to determine if fs-backup
should be used instead of snapshot.

Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>
2025-12-15 14:18:05 -08:00
Shubham Pampattiwar
a43f14b071 Add unit tests for ShouldPerformFSBackup with PVC-to-Pod cache
Add TestVolumeHelperImplWithCache_ShouldPerformFSBackup to verify:
- Volume policy match with cache returns correct fs-backup decision
- Volume policy match with snapshot action skips fs-backup
- Fallback to direct lookup when cache is not built

Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>
2025-12-15 14:18:05 -08:00
Shubham Pampattiwar
26053ae6d6 Add unit tests for VolumeHelperImpl with PVC-to-Pod cache
Add TestVolumeHelperImplWithCache_ShouldPerformSnapshot to verify:
- Volume policy match with cache returns correct snapshot decision
- fs-backup via opt-out with cache properly skips snapshot
- Fallback to direct lookup when cache is not built

These tests verify the cache-enabled code path added in the previous
commit for improved volume policy performance.

Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>
2025-12-15 14:18:05 -08:00
Shubham Pampattiwar
60203ad01b Add changelog for PR #9441
Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>
2025-12-15 14:18:05 -08:00
Shubham Pampattiwar
bcdc30b59a Add PVC-to-Pod cache to improve volume policy performance
The GetPodsUsingPVC function had O(N*M) complexity - for each PVC,
it listed ALL pods in the namespace and iterated through each pod.
With many PVCs and pods, this caused significant performance
degradation (2+ seconds per PV in some cases).

This change introduces a PVC-to-Pod cache that is built once per
backup and reused for all PVC lookups, reducing complexity from
O(N*M) to O(N+M).

Changes:
- Add PVCPodCache struct with thread-safe caching in podvolume pkg
- Add NewVolumeHelperImplWithCache constructor for cache support
- Build cache before backup item processing in backup.go
- Add comprehensive unit tests for cache functionality
- Graceful fallback to direct lookups if cache fails

Fixes #9179

Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>
2025-12-15 14:18:05 -08:00
Tiger Kaovilai
a1026cb531 Update golangci-lint installation script URL to use HEAD for latest version (#9451)
Some checks failed
Run the E2E test on kind / get-go-version (push) Failing after 1m16s
Run the E2E test on kind / build (push) Has been skipped
Run the E2E test on kind / setup-test-matrix (push) Successful in 4s
Run the E2E test on kind / run-e2e-test (push) Has been skipped
build-image / Build (push) Failing after 15s
Main CI / get-go-version (push) Successful in 13s
Main CI / Build (push) Failing after 39s
Close stale issues and PRs / stale (push) Successful in 14s
Trivy Nightly Scan / Trivy nightly scan (velero, main) (push) Failing after 1m39s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-aws, main) (push) Failing after 1m8s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-gcp, main) (push) Failing after 1m27s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-microsoft-azure, main) (push) Failing after 1m27s
Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
2025-12-12 12:25:45 -05:00
Shubham Pampattiwar
14b34f08cc Merge pull request #9321 from shubham-pampattiwar/fix-azure-bsl-status-message-8368
Some checks failed
Run the E2E test on kind / get-go-version (push) Failing after 1m11s
Run the E2E test on kind / build (push) Has been skipped
Run the E2E test on kind / setup-test-matrix (push) Successful in 3s
Run the E2E test on kind / run-e2e-test (push) Has been skipped
Main CI / get-go-version (push) Successful in 13s
Main CI / Build (push) Failing after 33s
Sanitize Azure HTTP responses in BSL status messages
2025-12-11 22:00:18 -08:00
Xun Jiang/Bruce Jiang
add66eac42 Merge pull request #9431 from vmware-tanzu/9033_fix
Remove VolumeSnapshotClass from CSI B/R process.
2025-12-12 10:59:47 +08:00
Xun Jiang
096436507e Remove VolumeSnapshotClass from CSI restore and deletion process.
Some checks failed
Run the E2E test on kind / get-go-version (push) Failing after 1m1s
Run the E2E test on kind / build (push) Has been skipped
Run the E2E test on kind / setup-test-matrix (push) Successful in 3s
Run the E2E test on kind / run-e2e-test (push) Has been skipped
Remove VolumeSnapshotClass from backup sync process.

Signed-off-by: Xun Jiang <xun.jiang@broadcom.com>
2025-12-11 17:56:11 +08:00
Xun Jiang/Bruce Jiang
554b04e6ca Merge pull request #9132 from mjnagel/crd-upgrade
Some checks failed
Run the E2E test on kind / get-go-version (push) Failing after 56s
Run the E2E test on kind / build (push) Has been skipped
Run the E2E test on kind / setup-test-matrix (push) Successful in 3s
Run the E2E test on kind / run-e2e-test (push) Has been skipped
Main CI / get-go-version (push) Successful in 13s
Main CI / Build (push) Failing after 25s
Close stale issues and PRs / stale (push) Successful in 12s
Trivy Nightly Scan / Trivy nightly scan (velero, main) (push) Failing after 1m36s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-aws, main) (push) Failing after 1m16s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-gcp, main) (push) Failing after 1m13s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-microsoft-azure, main) (push) Failing after 1m4s
feat: add apply flag to install command
2025-12-10 16:41:56 +08:00
Xun Jiang/Bruce Jiang
c594026c1f Merge pull request #9446 from vmware-tanzu/dependabot/github_actions/actions/stale-10.1.1
Some checks failed
Run the E2E test on kind / get-go-version (push) Failing after 1m5s
Run the E2E test on kind / build (push) Has been skipped
Run the E2E test on kind / setup-test-matrix (push) Successful in 3s
Run the E2E test on kind / run-e2e-test (push) Has been skipped
Main CI / get-go-version (push) Successful in 13s
Main CI / Build (push) Failing after 32s
Bump actions/stale from 10.1.0 to 10.1.1
2025-12-10 13:31:28 +08:00
Xun Jiang/Bruce Jiang
46776898ab Merge branch 'main' into dependabot/github_actions/actions/stale-10.1.1 2025-12-10 11:34:29 +08:00
Xun Jiang/Bruce Jiang
fdcfed84f9 Add the node-agent ConfigMap document. (#9434)
Some checks failed
Run the E2E test on kind / get-go-version (push) Failing after 1m1s
Run the E2E test on kind / build (push) Has been skipped
Run the E2E test on kind / setup-test-matrix (push) Successful in 3s
Run the E2E test on kind / run-e2e-test (push) Has been skipped
Main CI / get-go-version (push) Successful in 13s
Main CI / Build (push) Failing after 31s
Close stale issues and PRs / stale (push) Successful in 15s
Trivy Nightly Scan / Trivy nightly scan (velero, main) (push) Failing after 1m45s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-aws, main) (push) Failing after 1m9s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-gcp, main) (push) Failing after 1m6s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-microsoft-azure, main) (push) Failing after 1m14s
Signed-off-by: Xun Jiang <xun.jiang@broadcom.com>
2025-12-09 04:57:30 -05:00
dependabot[bot]
dbeb16aad7 Bump actions/stale from 10.1.0 to 10.1.1
Bumps [actions/stale](https://github.com/actions/stale) from 10.1.0 to 10.1.1.
- [Release notes](https://github.com/actions/stale/releases)
- [Changelog](https://github.com/actions/stale/blob/main/CHANGELOG.md)
- [Commits](https://github.com/actions/stale/compare/v10.1.0...v10.1.1)

---
updated-dependencies:
- dependency-name: actions/stale
  dependency-version: 10.1.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-12-08 19:02:57 +00:00
Shubham Pampattiwar
f0c97c489d Merge pull request #9414 from shubham-pampattiwar/add-maintenance-job-metrics
Some checks failed
Run the E2E test on kind / get-go-version (push) Failing after 1m8s
Run the E2E test on kind / build (push) Has been skipped
Run the E2E test on kind / setup-test-matrix (push) Successful in 5s
Run the E2E test on kind / run-e2e-test (push) Has been skipped
Main CI / get-go-version (push) Successful in 14s
Main CI / Build (push) Failing after 37s
Close stale issues and PRs / stale (push) Successful in 15s
Trivy Nightly Scan / Trivy nightly scan (velero, main) (push) Failing after 1m43s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-aws, main) (push) Failing after 58s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-gcp, main) (push) Failing after 1m8s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-microsoft-azure, main) (push) Failing after 58s
Add Prometheus metrics for maintenance jobs
2025-12-08 09:23:44 -08:00
Micah Nagel
3244cc605f feat: add apply flag to install command
Signed-off-by: Micah Nagel <micah.nagel@defenseunicorns.com>
2025-12-05 11:26:10 +08:00
Shubham Pampattiwar
6a0307142c Merge pull request #9307 from sseago/parallel-backup
Some checks failed
Run the E2E test on kind / get-go-version (push) Failing after 1m4s
Run the E2E test on kind / build (push) Has been skipped
Run the E2E test on kind / setup-test-matrix (push) Successful in 3s
Run the E2E test on kind / run-e2e-test (push) Has been skipped
Main CI / get-go-version (push) Successful in 13s
Main CI / Build (push) Failing after 30s
Close stale issues and PRs / stale (push) Successful in 17s
Trivy Nightly Scan / Trivy nightly scan (velero, main) (push) Failing after 1m46s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-aws, main) (push) Failing after 1m30s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-gcp, main) (push) Failing after 1m36s
Trivy Nightly Scan / Trivy nightly scan (velero-plugin-for-microsoft-azure, main) (push) Failing after 1m31s
Parallel backup processing
2025-12-04 11:37:24 -08:00
Shubham Pampattiwar
1ec622245b Run make update to fix gofmt alignment
Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>
2025-12-03 16:13:13 -08:00
Shubham Pampattiwar
31fb828f8e Add clarifying comment for histogram metric
Explain that the duration histogram tracks distribution of individual
job durations, not accumulated sums, to address reviewer concerns.

Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>
2025-12-03 16:05:32 -08:00
Scott Seago
7286d24c35 Updates for merge conflict and to refine reconciler queueing logic
Signed-off-by: Scott Seago <sseago@redhat.com>
2025-12-03 16:55:59 -05:00
Scott Seago
7e4797f588 Track running backup count via BackupTracker
This avoids an unnecessary apiserver List call when
the backup reconciler is already at capacity.

Signed-off-by: Scott Seago <sseago@redhat.com>
2025-12-02 17:23:47 -05:00
Scott Seago
f238a7e47b make update
Signed-off-by: Scott Seago <sseago@redhat.com>
2025-12-02 17:23:21 -05:00
Scott Seago
0b2e7d1238 Minor refactoring
Signed-off-by: Scott Seago <sseago@redhat.com>
2025-12-02 17:23:21 -05:00
Scott Seago
73864e31ff Fix linters
Signed-off-by: Scott Seago <sseago@redhat.com>
2025-12-02 17:04:55 -05:00
Scott Seago
8a95d512b3 make update, changelog
Signed-off-by: Scott Seago <sseago@redhat.com>
2025-12-02 17:04:07 -05:00
Scott Seago
4d1802233a add various scenarios to queue controller unit tests
Signed-off-by: Scott Seago <sseago@redhat.com>
2025-12-02 17:01:09 -05:00
Scott Seago
f73443659a Backup queue controller implementation
Signed-off-by: Scott Seago <sseago@redhat.com>
2025-12-02 16:57:18 -05:00
Scott Seago
7111f3cea2 feat: Remove pvc-for-tmp install arg
Co-authored-by: aider (gemini/gemini-2.5-pro) <aider@aider.chat>
Signed-off-by: Scott Seago <sseago@redhat.com>
2025-12-02 16:49:17 -05:00
Scott Seago
845eee4e60 feat: Create backup queue controller and add to disableable list
Co-authored-by: aider (gemini/gemini-2.5-pro) <aider@aider.chat>
Signed-off-by: Scott Seago <sseago@redhat.com>
2025-12-02 16:46:56 -05:00
Scott Seago
c50ab4a6ea feat: Add pvc-for-tmp install arg to use PVC for server /tmp dir
Co-authored-by: aider (gemini/gemini-2.5-pro) <aider@aider.chat>
Signed-off-by: Scott Seago <sseago@redhat.com>
2025-12-02 16:40:49 -05:00
Scott Seago
6a3f821606 fix lint
Signed-off-by: Scott Seago <sseago@redhat.com>

Signed-off-by: Scott Seago <sseago@redhat.com>
2025-12-02 16:39:10 -05:00
Scott Seago
34dc381182 Refactor after review
Signed-off-by: Scott Seago <sseago@redhat.com>

Signed-off-by: Scott Seago <sseago@redhat.com>
2025-12-02 16:39:10 -05:00
Scott Seago
29b01c3170 make update
Signed-off-by: Scott Seago <sseago@redhat.com>
2025-12-02 16:39:10 -05:00
Scott Seago
84571bc54d Added doc note around parallel backups and resource limits
Signed-off-by: Scott Seago <sseago@redhat.com>
2025-12-02 16:39:10 -05:00
Scott Seago
9c1c7d20ff Minor refactoring
Signed-off-by: Scott Seago <sseago@redhat.com>
2025-12-02 16:39:10 -05:00
Scott Seago
7bc57b5a5f Refactor queue controller to reduce apiserver list calls
Signed-off-by: Scott Seago <sseago@redhat.com>
2025-12-02 16:39:10 -05:00
Scott Seago
e7b5d20f4c Fix linters
Signed-off-by: Scott Seago <sseago@redhat.com>
2025-12-02 16:39:10 -05:00
Scott Seago
aedc0fe5e2 make update, changelog
Signed-off-by: Scott Seago <sseago@redhat.com>
2025-12-02 16:39:07 -05:00
Scott Seago
dbaa25405d move podVolumeContext into backupRequest
Signed-off-by: Scott Seago <sseago@redhat.com>
2025-12-02 16:38:41 -05:00
Scott Seago
91357b28c4 Move worker pool creation to backup reconcile.
ItemBlockWorkerPool is now created for each backup.

Signed-off-by: Scott Seago <sseago@redhat.com>
2025-12-02 16:38:41 -05:00
Scott Seago
e0c08f03cf add various scenarios to queue controller unit tests
Signed-off-by: Scott Seago <sseago@redhat.com>
2025-12-02 16:38:41 -05:00
Scott Seago
a56ab10f23 Move debug logs to info
Signed-off-by: Scott Seago <sseago@redhat.com>
2025-12-02 16:38:41 -05:00
Scott Seago
d39ad6f208 run multiple backup reconcilers, only reconcile ReadyToStart backups
Signed-off-by: Scott Seago <sseago@redhat.com>
2025-12-02 16:38:41 -05:00
Scott Seago
300bc70c68 Add queue position to backup list/describe
Signed-off-by: Scott Seago <sseago@redhat.com>
2025-12-02 16:38:41 -05:00
Scott Seago
13041b40c2 Backup queue controller implementation
Signed-off-by: Scott Seago <sseago@redhat.com>
2025-12-02 16:38:41 -05:00
Scott Seago
4ffb29d750 feat: Remove pvc-for-tmp install arg
Co-authored-by: aider (gemini/gemini-2.5-pro) <aider@aider.chat>
Signed-off-by: Scott Seago <sseago@redhat.com>
2025-12-02 16:28:08 -05:00