Remove 'self' from MIGRATE_FROM_VELERO_VERSION.
* Need to modify the BYOT case to specify MIGRATE_FROM_VELERO_VERSION.
Remove the CSI plugin installation check for no older than v1.14
Signed-off-by: Xun Jiang <xun.jiang@broadcom.com>
Ensure plugin init container names satisfy DNS-1123 label constraints
(max 63 chars). Long names are truncated with an 8-char hash suffix to
maintain uniqueness.
Fixes: #9444
Signed-off-by: Michal Pryc <mpryc@redhat.com>
The nolint:staticcheck directives are not needed in the test file because
it calls NewVolumeHelperImpl within the same package, which doesn't
trigger deprecation warnings. Only cross-package calls need the directive.
Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>
The ShouldPerformSnapshotWithVolumeHelper function and tests intentionally
use NewVolumeHelperImpl (deprecated) for backwards compatibility with
third-party plugins. Add nolint:staticcheck to suppress the linter
warnings with explanatory comments.
Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>
Remove deprecated functions that were marked for removal per review:
- Remove GetPodsUsingPVC (replaced by GetPodsUsingPVCWithCache)
- Remove IsPVCDefaultToFSBackup (replaced by IsPVCDefaultToFSBackupWithCache)
- Remove associated tests for deprecated functions
- Add deprecation marker to NewVolumeHelperImpl
- Add deprecation marker to ShouldPerformSnapshotWithBackup
Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>
This commit addresses reviewer feedback on PR #9441 regarding
concurrent backup caching concerns. Key changes:
1. Added lazy per-namespace caching for the CSI PVC BIA plugin path:
- Added IsNamespaceBuilt() method to check if namespace is cached
- Added BuildCacheForNamespace() for lazy, per-namespace cache building
- Plugin builds cache incrementally as namespaces are encountered
2. Added NewVolumeHelperImplWithCache constructor for plugins:
- Accepts externally-managed PVC-to-Pod cache
- Follows pattern from PR #9226 (Scott Seago's design)
3. Plugin instance lifecycle clarification:
- Plugin instances are unique per backup (created via newPluginManager)
- Cleaned up via CleanupClients at backup completion
- No mutex or backup UID tracking needed
4. Test coverage:
- Added tests for IsNamespaceBuilt and BuildCacheForNamespace
- Added tests for NewVolumeHelperImplWithCache constructor
- Added test verifying cache usage for fs-backup determination
This maintains the O(N+M) complexity improvement from issue #9179
while addressing architectural concerns about concurrent access.
Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>
Address review feedback to have a global VolumeHelper instance per
plugin process instead of creating one on each ShouldPerformSnapshot
call.
Changes:
- Add volumeHelper, cachedForBackup, and mu fields to pvcBackupItemAction
struct for caching the VolumeHelper per backup
- Add getOrCreateVolumeHelper() method for thread-safe lazy initialization
- Update Execute() to use cached VolumeHelper via
ShouldPerformSnapshotWithVolumeHelper()
- Update filterPVCsByVolumePolicy() to accept VolumeHelper parameter
- Add ShouldPerformSnapshotWithVolumeHelper() that accepts optional
VolumeHelper for reuse across multiple calls
- Add NewVolumeHelperForBackup() factory function for BIA plugins
- Add comprehensive unit tests for both nil and non-nil VolumeHelper paths
This completes the fix for issue #9179 by ensuring the PVC-to-Pod cache
is built once per backup and reused across all PVC processing, avoiding
O(N*M) complexity.
Fixes#9179
Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>
- Rename NewVolumeHelperImplWithCache to NewVolumeHelperImplWithNamespaces
- Move cache building logic from backup.go into volumehelper
- Return error from NewVolumeHelperImplWithNamespaces if cache build fails
- Remove fallback in main backup path - backup fails if cache build fails
- Update NewVolumeHelperImpl to call NewVolumeHelperImplWithNamespaces
- Add comments clarifying fallback is only used by plugins
- Update tests for new error return signature
This addresses review comments from @Lyndon-Li and @kaovilai:
- Cache building is now encapsulated in volumehelper
- No fallback in main backup path ensures predictable performance
- Code reuse between constructors
Fixes#9179
Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>
- Use ResolveNamespaceList() instead of GetIncludes() for more accurate
namespace resolution when building the PVC-to-Pod cache
- Refactor NewVolumeHelperImpl to call NewVolumeHelperImplWithCache with
nil cache parameter to avoid code duplication
Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>
Add test case to verify that the PVC-to-Pod cache is used even when
no volume policy is configured. When defaultVolumesToFSBackup is true,
the cache is used to find pods using the PVC to determine if fs-backup
should be used instead of snapshot.
Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>
Add TestVolumeHelperImplWithCache_ShouldPerformFSBackup to verify:
- Volume policy match with cache returns correct fs-backup decision
- Volume policy match with snapshot action skips fs-backup
- Fallback to direct lookup when cache is not built
Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>
Add TestVolumeHelperImplWithCache_ShouldPerformSnapshot to verify:
- Volume policy match with cache returns correct snapshot decision
- fs-backup via opt-out with cache properly skips snapshot
- Fallback to direct lookup when cache is not built
These tests verify the cache-enabled code path added in the previous
commit for improved volume policy performance.
Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>
The GetPodsUsingPVC function had O(N*M) complexity - for each PVC,
it listed ALL pods in the namespace and iterated through each pod.
With many PVCs and pods, this caused significant performance
degradation (2+ seconds per PV in some cases).
This change introduces a PVC-to-Pod cache that is built once per
backup and reused for all PVC lookups, reducing complexity from
O(N*M) to O(N+M).
Changes:
- Add PVCPodCache struct with thread-safe caching in podvolume pkg
- Add NewVolumeHelperImplWithCache constructor for cache support
- Build cache before backup item processing in backup.go
- Add comprehensive unit tests for cache functionality
- Graceful fallback to direct lookups if cache fails
Fixes#9179
Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>
This change enables BSL validation to work when using caCertRef
(Secret-based CA certificate) by resolving the certificate from
the Secret in velero core before passing it to the object store
plugin as 'caCert' in the config map.
This approach requires no changes to provider plugins since they
already understand the 'caCert' config key.
Changes:
- Add SecretStore to objectBackupStoreGetter struct
- Add NewObjectBackupStoreGetterWithSecretStore constructor
- Update Get method to resolve caCertRef from Secret
- Update server.go to use new constructor with SecretStore
- Add CACertRef builder method and unit tests
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
- Introduced `CACertRef` field in `ObjectStorageLocation` to reference a Secret containing the CA certificate, replacing the deprecated `CACert` field.
- Implemented validation logic to ensure mutual exclusivity between `CACert` and `CACertRef`.
- Updated BSL controller and repository provider to handle the new certificate resolution logic.
- Enhanced CLI to support automatic certificate discovery from BSL configurations.
- Added unit and integration tests to validate new functionality and ensure backward compatibility.
- Documented migration strategy for users transitioning from inline certificates to Secret-based management.
Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
Explain that the duration histogram tracks distribution of individual
job durations, not accumulated sums, to address reviewer concerns.
Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>
This commit addresses three review comments on PR #9321:
1. Keep sanitization in controller (response to @ywk253100)
- Maintaining centralized error handling for easier extension
- Azure-specific patterns detected and others passed through unchanged
2. Sanitize unavailableErrors array (@priyansh17)
- Now using sanitizeStorageError() for both unavailableErrors array
and location.Status.Message for consistency
3. Add SAS token scrubbing (@anshulahuja98)
- Scrubs Azure SAS token parameters to prevent credential leakage
- Redacts: sig, se, st, sp, spr, sv, sr, sip, srt, ss
- Example: ?sig=secret becomes ?sig=***REDACTED***
Added comprehensive test coverage for SAS token scrubbing with 4 new
test cases covering various scenarios.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>
Azure storage errors include verbose HTTP response details and XML
in error messages, making the BSL status.message field cluttered
and hard to read. This change adds sanitization to extract only
the error code and meaningful message.
Before:
BackupStorageLocation "test" is unavailable: rpc error: code = Unknown
desc = GET https://...
RESPONSE 404: 404 The specified container does not exist.
ERROR CODE: ContainerNotFound
<?xml version="1.0"...>
After:
BackupStorageLocation "test" is unavailable: rpc error: code = Unknown
desc = ContainerNotFound: The specified container does not exist.
AWS and GCP error messages are preserved as-is since they don't
contain verbose HTTP responses.
Fixes#8368
Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>
- Rename metric constants from maintenance_job_* to repo_maintenance_*
- Update metric help text to clarify these are for repo maintenance
- Rename functions: RegisterMaintenanceJob* → RegisterRepoMaintenance*
- Update all test references to use new names
Addresses review comments from @Lyndon-Li on PR #9414
Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>
Adds three new Prometheus metrics to track backup repository
maintenance job execution:
- velero_maintenance_job_success_total: Counter for successful jobs
- velero_maintenance_job_failure_total: Counter for failed jobs
- velero_maintenance_job_duration_seconds: Histogram for job duration
Metrics use repository_name label to identify specific BackupRepositories.
Duration is recorded for both successful and failed jobs (when job runs),
but not when job fails to start.
Includes comprehensive unit and integration tests.
Fixes#9225
Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>
* Add wildcard status fields
Signed-off-by: Joseph <jvaikath@redhat.com>
* Implement wildcard namespace expansion in item collector
- Introduced methods to get active namespaces and expand wildcard includes/excludes in the item collector.
- Updated getNamespacesToList to handle wildcard patterns and return expanded lists.
- Added utility functions for setting includes and excludes in the IncludesExcludes struct.
- Created a new package for wildcard handling, including functions to determine when to expand wildcards and to perform the expansion.
This enhances the backup process by allowing more flexible namespace selection based on wildcard patterns.
Signed-off-by: Joseph <jvaikath@redhat.com>
* Enhance wildcard expansion logic and logging in item collector
- Improved logging to include original includes and excludes when expanding wildcards.
- Updated the ShouldExpandWildcards function to check for wildcard patterns in excludes.
- Added comments for clarity in the expandWildcards function regarding pattern handling.
These changes enhance the clarity and functionality of the wildcard expansion process in the backup system.
Signed-off-by: Joseph <jvaikath@redhat.com>
* Add wildcard namespace fields to Backup CRD and update deepcopy methods
- Introduced `wildcardIncludedNamespaces` and `wildcardExcludedNamespaces` fields to the Backup CRD to support wildcard patterns for namespace inclusion and exclusion.
- Updated deepcopy methods to handle the new fields, ensuring proper copying of data during object manipulation.
These changes enhance the flexibility of namespace selection in backup operations, aligning with recent improvements in wildcard handling.
Signed-off-by: Joseph <jvaikath@redhat.com>
* Refactor Backup CRD to rename wildcard namespace fields
- Updated `BackupStatus` struct to rename `WildcardIncludedNamespaces` to `WildcardExpandedIncludedNamespaces` and `WildcardExcludedNamespaces` to `WildcardExpandedExcludedNamespaces` for clarity.
- Adjusted associated comments to reflect the new naming and ensure consistency in documentation.
- Modified deepcopy methods to accommodate the renamed fields, ensuring proper data handling during object manipulation.
These changes enhance the clarity and maintainability of the Backup CRD, aligning with recent improvements in wildcard handling.
Signed-off-by: Joseph <jvaikath@redhat.com>
* Fix
Signed-off-by: Joseph <jvaikath@redhat.com>
* Refactor where wildcard expansion happens
Signed-off-by: Joseph <jvaikath@redhat.com>
* Refactor Backup CRD and related components for expanded namespace handling
- Updated `BackupStatus` struct to rename fields for clarity: `WildcardExpandedIncludedNamespaces` and `WildcardExpandedExcludedNamespaces` are now `ExpandedIncludedNamespaces` and `ExpandedExcludedNamespaces`, respectively.
- Adjusted associated comments and deepcopy methods to reflect the new naming conventions.
- Removed the `getActiveNamespaces` function from the item collector, streamlining the namespace handling process.
- Enhanced logging during wildcard expansion to provide clearer insights into the process.
These changes improve the clarity and maintainability of the Backup CRD and enhance the functionality of namespace selection in backup operations.
Signed-off-by: Joseph <jvaikath@redhat.com>
* Refactor wildcard expansion logic in item collector and enhance testing
- Moved the wildcard expansion logic into a dedicated method, `expandNamespaceWildcards`, improving code organization and readability.
- Updated logging to provide detailed insights during the wildcard expansion process.
- Introduced comprehensive unit tests for wildcard handling, covering various scenarios and edge cases.
- Enhanced the `ShouldExpandWildcards` function to better identify wildcard patterns and validate inputs.
These changes improve the maintainability and robustness of the wildcard handling in the backup system.
Signed-off-by: Joseph <jvaikath@redhat.com>
* Enhance Restore CRD with expanded namespace fields and update logic
- Added `ExpandedIncludedNamespaces` and `ExpandedExcludedNamespaces` fields to the `RestoreStatus` struct to support expanded wildcard namespace handling.
- Updated the `DeepCopyInto` method to ensure proper copying of the new fields.
- Implemented logic in the restore process to expand wildcard patterns for included and excluded namespaces, improving flexibility in namespace selection during restores.
- Enhanced logging to provide insights into the expanded namespaces.
These changes improve the functionality and maintainability of the restore process, aligning with recent enhancements in wildcard handling.
Signed-off-by: Joseph <jvaikath@redhat.com>
* Refactor Backup and Restore CRDs to enhance wildcard namespace handling
- Renamed fields in `BackupStatus` and `RestoreStatus` from `ExpandedIncludedNamespaces` and `ExpandedExcludedNamespaces` to `IncludeWildcardMatches` and `ExcludeWildcardMatches` for clarity.
- Introduced a new field `WildcardResult` to record the final namespaces after applying wildcard logic.
- Updated the `DeepCopyInto` methods to accommodate the new field names and ensure proper data handling.
- Enhanced comments to reflect the changes and improve documentation clarity.
These updates improve the maintainability and clarity of the CRDs, aligning with recent enhancements in wildcard handling.
Signed-off-by: Joseph <jvaikath@redhat.com>
* Enhance wildcard namespace handling in Backup and Restore processes
- Updated `BackupRequest` and `Restore` status structures to include a new field `WildcardResult`, which captures the final list of namespaces after applying wildcard logic.
- Renamed existing fields to `IncludeWildcardMatches` and `ExcludeWildcardMatches` for improved clarity.
- Enhanced logging to provide detailed insights into the expanded namespaces and final results during backup and restore operations.
- Introduced a new utility function `GetWildcardResult` to streamline the selection of namespaces based on include/exclude criteria.
These changes improve the clarity and functionality of namespace selection in both backup and restore processes, aligning with recent enhancements in wildcard handling.
Signed-off-by: Joseph <jvaikath@redhat.com>
* Refactor namespace wildcard expansion logic in restore process
- Moved the wildcard expansion logic into a dedicated method, `expandNamespaceWildcards`, improving code organization and readability.
- Enhanced error handling and logging to provide detailed insights into the expanded namespaces during the restore operation.
- Updated the restore context with expanded namespace patterns and final results, ensuring clarity in the restore status.
These changes improve the maintainability and clarity of the restore process, aligning with recent enhancements in wildcard handling.
Signed-off-by: Joseph <jvaikath@redhat.com>
* Add checks for "*" in exclude
Signed-off-by: Joseph <jvaikath@redhat.com>
* Rebase
Signed-off-by: Joseph <jvaikath@redhat.com>
* Create NamespaceIncludesExcludes to get full NS listing for backup w/
Signed-off-by: Scott Seago <sseago@redhat.com>
Signed-off-by: Joseph <jvaikath@redhat.com>
* Add new NamespaceIncludesExcludes struct
Signed-off-by: Joseph <jvaikath@redhat.com>
* Move namespace expansion logic
Signed-off-by: Joseph <jvaikath@redhat.com>
* Update backup status with expansion
Signed-off-by: Joseph <jvaikath@redhat.com>
* Wildcard status update
Signed-off-by: Joseph <jvaikath@redhat.com>
* Skip ns check if wildcard expansion
Signed-off-by: Joseph <jvaikath@redhat.com>
* Move wildcard expansion to getResourceItems
Signed-off-by: Joseph <jvaikath@redhat.com>
* lint
Signed-off-by: Joseph <jvaikath@redhat.com>
* Changelog
Signed-off-by: Joseph <jvaikath@redhat.com>
* linting issues
Signed-off-by: Joseph <jvaikath@redhat.com>
* Remove wildcard restore to check if tests pass
Signed-off-by: Joseph <jvaikath@redhat.com>
* Fix namespace mapping test bug from lint fix
The previous commit (0a4aabcf4) attempted to fix linting issues by
using strings.Builder, but incorrectly wrote commas to a separate
builder and concatenated them at the end instead of between namespace
mappings.
This caused the namespace mapping string to be malformed:
Before: ns-1:ns-1-mapped,ns-2:ns-2-mapped
Bug: ns-1:ns-1-mappedns-2:ns-2-mapped,,
The malformed string was parsed as a single mapping with an invalid
namespace name containing a colon, causing Kubernetes to reject it:
"ns-1-mappedns-2:ns-2-mapped" is invalid
Fix by properly using strings.Builder to construct the mapping string
with commas between entries, addressing both the linting concern and
the functional bug.
Fixes the MultiNamespacesMappingResticTest and
MultiNamespacesMappingSnapshotTest failures.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
Signed-off-by: Joseph <jvaikath@redhat.com>
* Fix wildcard namespace expansion edge cases
This commit fixes two bugs in the wildcard namespace expansion feature:
1. Empty wildcard results: When a wildcard pattern (e.g., "invalid*")
matched no namespaces, the backup would incorrectly back up ALL
namespaces instead of backing up nothing. This was because the empty
includes list was indistinguishable from "no filter specified".
Fix: Added wildcardExpanded flag to NamespaceIncludesExcludes to
track when wildcard expansion has occurred. When true and the
includes list is empty, ShouldInclude now correctly returns false.
2. Premature namespace filtering: An earlier attempt to fix bug #1
filtered namespaces too early in collectNamespaces, breaking
LabelSelector tests where namespaces should be included based on
resources within them matching the label selector.
Fix: Removed the premature filtering and rely on the existing
filterNamespaces call at the end of getAllItems, which correctly
handles both wildcard expansion and label selector scenarios.
The fixes ensure:
- Wildcard patterns matching nothing result in empty backups
- Label selectors still work correctly (namespace included if any
resource in it matches the selector)
- State is preserved across multiple ResolveNamespaceList calls
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
Signed-off-by: Joseph <jvaikath@redhat.com>
* Run wildcard expansion during backup processing
Signed-off-by: Joseph <jvaikath@redhat.com>
* Lint fix
Signed-off-by: Joseph <jvaikath@redhat.com>
* Improve coverage
Signed-off-by: Joseph <jvaikath@redhat.com>
* gofmt fix
Signed-off-by: Joseph <jvaikath@redhat.com>
* Add wildcard details to describe backup status
Signed-off-by: Joseph <jvaikath@redhat.com>
* Revert "Remove wildcard restore to check if tests pass"
This reverts commit 4e22c2af855b71447762cb0a9fab7e7049f38a5f.
Signed-off-by: Joseph <jvaikath@redhat.com>
* Add restore describe for wildcard namespaces Revert restore wildcard removal
Signed-off-by: Joseph <jvaikath@redhat.com>
* Add coverage
Signed-off-by: Joseph <jvaikath@redhat.com>
* Lint
Signed-off-by: Joseph <jvaikath@redhat.com>
* Remove unintentional changes
Signed-off-by: Joseph <jvaikath@redhat.com>
* Remove wildcard status fields and mentionsRemove usage of wildcard fields for backup and restore status.
Signed-off-by: Joseph <jvaikath@redhat.com>
* Remove status update changelog line
Signed-off-by: Joseph <jvaikath@redhat.com>
* Rename getNamespaceIncludesExcludes
Signed-off-by: Scott Seago <sseago@redhat.com>
Signed-off-by: Scott Seago <sseago@redhat.com>
* Rewrite brace pattern validation
Signed-off-by: Joseph <jvaikath@redhat.com>
* Different var for internal loop
Signed-off-by: Joseph <jvaikath@redhat.com>
---------
Signed-off-by: Joseph <jvaikath@redhat.com>
Signed-off-by: Scott Seago <sseago@redhat.com>
Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
Co-authored-by: Scott Seago <sseago@redhat.com>
Co-authored-by: Tiger Kaovilai <tkaovila@redhat.com>
Co-authored-by: Claude <noreply@anthropic.com>
Bump golangci-lint to v1.25.0, because golangci-lint start to support
Golang v1.25 since v1.24.0, and v1.26.x was not stable yet.
Align action pr-linter-check's golangci-lint version to v1.25.0
Signed-off-by: Xun Jiang <xun.jiang@broadcom.com>
Add documentation explaining how volume policies are applied before
VGS grouping, including examples and troubleshooting guidance for the
multiple CSI drivers scenario.
Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>
VolumeGroupSnapshots were querying all PVCs with matching labels
directly from the cluster without respecting volume policies. This
caused errors when labeled PVCs included both CSI and non-CSI volumes,
or volumes from different CSI drivers that were excluded by policies.
This change filters PVCs by volume policy before VGS grouping,
ensuring only PVCs that should be snapshotted are included in the
group. A warning is logged when PVCs are excluded from VGS due to
volume policy.
Fixes#9344
Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>
* Fix the Job build error when BackupReposiotry name longer than 63.
Fix the Job build error.
Consider the name length limitation change in job list code.
Use hash to replace the GetValidName function.
Signed-off-by: Xun Jiang <xun.jiang@broadcom.com>
* Use ref_name to replace ref.
Signed-off-by: Xun Jiang <xun.jiang@broadcom.com>
---------
Signed-off-by: Xun Jiang <xun.jiang@broadcom.com>
Update test expectations to include createdName field for resources
with action 'created'. Also ensure namespaces track their created
names when created via EnsureNamespaceExistsAndIsReady.
Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>
When restoring resources with GenerateName, Kubernetes assigns the actual name
after creation, but Velero only tracked the original name from the backup in
itemKey. This caused volume information collection to fail when trying to fetch
PVCs using the original name instead of the actual created name.
Example:
- Original PVC name from backup: "test-vm-disk-1"
- Actual created PVC name: "test-vm-backup-2025-10-27-test-vm-disk-1-mdjkd"
- Volume info tried to fetch: "test-vm-disk-1" → Failed with "not found"
This affects any plugin or workflow using GenerateName during restore:
- kubevirt-velero-plugin (VMFR use case with PVC collision avoidance)
- Custom restore item actions using generateName
- Secrets/ConfigMaps restored with generateName
Changes:
1. Add createdName field to restoredItemStatus struct (pkg/restore/request.go)
2. Capture actual name from createdObj.GetName() (pkg/restore/restore.go:1520)
3. Use createdName in RestoredResourceList() when available (pkg/restore/request.go:93-95)
This fix is backwards compatible:
- createdName defaults to empty string
- When empty, falls back to itemKey.name (original behavior)
- Only populated for GenerateName resources where needed
Fixes volume information collection errors like:
"Failed to get PVC" error="persistentvolumeclaims \"<original-name>\" not found"
Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>
When restoring resources with GenerateName (where name is empty and K8s
assigns the actual name), the managed fields patch was failing with error
"name is required" because it was using obj.GetName() which returns empty
for GenerateName resources.
The fix uses createdObj.GetName() instead, which contains the actual name
assigned by Kubernetes after resource creation.
This affects any resource using GenerateName for restore, including:
- PersistentVolumeClaims restored by kubevirt-velero-plugin
- Secrets and ConfigMaps created with generateName
- Any custom resources using generateName
Changes:
- Line 1707: Use createdObj.GetName() instead of obj.GetName() in Patch call
- Lines 1702, 1709, 1713, 1716: Use createdObj in error/info messages for accuracy
This is a backwards-compatible fix since:
- For resources WITHOUT generateName: obj.GetName() == createdObj.GetName()
- For resources WITH generateName: createdObj.GetName() has the actual name
The managed fields patch was already correctly using createdObj (lines 1698-1700),
only the Patch() call was incorrectly using obj.
Fixes restore status showing FinalizingPartiallyFailed with "name is required"
error when restoring resources with GenerateName.
Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>
Add error message in the velero install CLI output if VerifyJSONConfigs fail.
Only allow one element in node-agent-configmap's Data.
Signed-off-by: Xun Jiang <xun.jiang@broadcom.com>
main branch will read go version from go.mod's go primitive, and
only keep major and minor version, because we want the actions to use
the lastest patch version automatically, even the go.mod specify version
like 1.24.0.
release branch can read the go version from go.mod file by setup-go
action's own logic.
Refactor the get Go version to reusable workflow.
Signed-off-by: Xun Jiang <xun.jiang@broadcom.com>
The Bitnami MinIO image bitnami/minio:2021.6.17-debian-10-r7 is no longer
available on Docker Hub, causing E2E tests to fail.
This change implements a solution to build the MinIO image locally from
Bitnami's public Dockerfile and cache it for subsequent runs:
- Fetches the latest commit hash of the Bitnami MinIO Dockerfile
- Uses GitHub Actions cache to store/retrieve built images
- Only rebuilds when the upstream Dockerfile changes
- Maintains compatibility with existing environment variables
Fixes#9279🤖 Generated with [Claude Code](https://claude.ai/code)
Update .github/workflows/e2e-test-kind.yaml
Signed-off-by: Tiger Kaovilai <passawit.kaovilai@gmail.com>
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
Fix GetResourceWithLabel's bug: labels were not applied.
Add workOS for deployment and pod creationg.
Add OS label for select node.
Enlarge the context timeout to 10 minutes. 5 min is not enough for Windows.
Enlarge the Kibishii test context to 15 minutes for Windows.
Signed-off-by: Xun Jiang <xun.jiang@broadcom.com>
- Expanded the design to include detailed implementation steps for wildcard expansion in both backup and restore operations.
- Added new status fields to the backup and restore CRDs to track expanded wildcard namespaces.
- Clarified the approach to ensure backward compatibility with existing `*` behavior.
- Addressed limitations and provided insights on restore operations handling wildcard-expanded backups.
This update aims to provide a comprehensive and clear framework for implementing wildcard namespace support in Velero.
Signed-off-by: Joseph <jvaikath@redhat.com>
- Clarified the use of the standalone `*` character in namespace specifications.
- Ensured consistent formatting for `*` throughout the document.
- Maintained focus on backward compatibility and limitations regarding wildcard usage.
This update enhances the clarity and consistency of the design document for implementing wildcard namespace support in Velero.
Signed-off-by: Joseph <jvaikath@redhat.com>
- Updated the abstract to clarify the current limitations of namespace specifications in Velero.
- Expanded the goals section to include specific objectives for implementing wildcard patterns in `--include-namespaces` and `--exclude-namespaces`.
- Detailed the high-level design and implementation steps, including the addition of new status fields in the backup CRD and the creation of a utility package for wildcard expansion.
- Addressed backward compatibility and known limitations regarding the use of wildcards alongside the existing "*" character.
This update aims to provide a comprehensive overview of the proposed changes for improved namespace selection flexibility.
Signed-off-by: Joseph <jvaikath@redhat.com>
This commit add content to cover "includeExcludePolicy" in resource
policies.
It also tweak the wordings to clarify the "volume policy" and "resource
policies"
Signed-off-by: Daniel Jiang <daniel.jiang@broadcom.com>
Fixes#4201: Ensure PriorityClasses are restored before pods that
reference them, preventing restoration failures when pods depend on
custom PriorityClasses.
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
It failed with fetching the wrong VolumeInfo. Correct it.
Add AdditionalBSL label for applied cases.
Remove not needed default BSL kibishii test for additional bsl case.
Signed-off-by: Xun Jiang <xun.jiang@broadcom.com>
PVB and PVR used to print related pod namespace in output.
In v1.17, the behavior changed. Use backup or restore name to filter them.
Shorten the timeout context from 1h to 5m, because AWS was not covered anymore.
Remove an used if branch for vSphere.
Signed-off-by: Xun Jiang <xun.jiang@broadcom.com>
- Add --server-priority-class-name and --node-agent-priority-class-name flags to velero install command
- Configure data mover pods (PVB/PVR/DataUpload/DataDownload) to use priority class from node-agent-configmap
- Configure maintenance jobs to use priority class from repo-maintenance-job-configmap (global config only)
- Add priority class validation with ValidatePriorityClass and GetDataMoverPriorityClassName utilities
- Update e2e tests to include PriorityClass testing utilities
- Move priority class design document to Implemented folder
- Add comprehensive unit tests for all priority class implementations
- Update documentation for priority class configuration
- Add changelog entry for #8883
Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
remove unused test utils
Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
feat: add unit test for getting priority class name in maintenance jobs
Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
doc update
Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
feat: add priority class validation for repository maintenance jobs
- Add ValidatePriorityClassWithClient function to validate priority class existence
- Integrate validation in maintenance.go when creating maintenance jobs
- Update tests to cover the new validation functionality
- Return boolean from ValidatePriorityClass to allow fallback behavior
This ensures maintenance jobs don't fail due to non-existent priority classes,
following the same pattern used for data mover pods.
Addresses feedback from:
https://github.com/vmware-tanzu/velero/pull/8883#discussion_r2238681442
Refs #8869
Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
refactor: clean up priority class handling for data mover pods
- Fix comment in node_agent.go to clarify PriorityClassName is only for data mover pods
- Simplify server.go to use dataPathConfigs.PriorityClassName directly
- Remove redundant priority class logging from controllers as it's already logged during server startup
- Keep logging centralized in the node-agent server initialization
This reduces code duplication and clarifies the scope of priority class configuration.
🤖 Generated with [Claude Code](https://claude.ai/code)
Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
refactor: remove GetDataMoverPriorityClassName from kube utilities
Remove GetDataMoverPriorityClassName function and its tests as priority
class is now read directly from dataPathConfigs instead of parsing from
ConfigMap. This simplifies the codebase by eliminating the need for
indirect ConfigMap parsing.
Refs #8869🤖 Generated with [Claude Code](https://claude.ai/code)
Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
refactor: remove priority class validation from install command
Remove priority class validation during install as it's redundant
since validation already occurs during server startup. Users cannot
see console logs during install, making the validation warnings
ineffective at this stage.
The validation remains in place during server and node-agent startup
where it's more appropriate and visible to users.
Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
Co-Authored-By: Claude <noreply@anthropic.com>
fixes#8610
This commit extends the resources policy, such that user can define
resource include exclude filters in the policy and reuse it in different backups.
Signed-off-by: Daniel Jiang <daniel.jiang@broadcom.com>
Addresses #9133 by adding clear documentation about the current limitation
where only the first element in the loadAffinity array is processed.
Changes:
- Added prominent warning at the beginning of loadAffinity section
- Updated misleading examples that showed multiple array elements
- Added warnings to each multi-element example explaining the limitation
- Clarified that the recommended approach is to combine all conditions
into a single loadAffinity element using both matchLabels and matchExpressions
This provides the "bare minimum" documentation clarification requested
in the issue until a code fix can be implemented.
🤖 Generated with [Claude Code](https://claude.ai/code)
Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
Apply suggestion from @kaovilai
Signed-off-by: Tiger Kaovilai <passawit.kaovilai@gmail.com>
Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
Apply suggestion from @kaovilai
Signed-off-by: Tiger Kaovilai <passawit.kaovilai@gmail.com>
Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
Apply suggestion from @kaovilai
Signed-off-by: Tiger Kaovilai <passawit.kaovilai@gmail.com>
Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
feat: Add CA cert fallback when caCertFile fails in download requests
- Fallback to BSL cert when caCertFile cannot be opened
- Combine certificate handling blocks to reuse CA pool initialization
- Add comprehensive unit tests for fallback behavior
This improves robustness by allowing downloads to proceed with BSL CA cert
when the provided CA cert file is unavailable or unreadable.
🤖 Generated with [Claude Code](https://claude.ai/code)
Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
Co-Authored-By: Claude <noreply@anthropic.com>
If distributed snapshotting is enabled in the external snapshotter a manager label is added to the volume snapshot content. When exposing the snapshot velero needs to keep this label around otherwise the exposed snapshot will never become ready.
Signed-off-by: Felix Prasse <1330854+flx5@users.noreply.github.com>
* Enhance File System Backup documentation with details on volume snapshot behavior and backup method decision flow
Fixes: Velero AWS snapshots not occurring with the AWS plugin #9090
Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
* Clarify conditions for excluding volumes from File System Backup and enhance decision flow for volume snapshots
Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
---------
Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
The ResticIdentifier field in BackupRepository is only relevant for restic
repositories. For kopia repositories, this field is unused and should be
omitted. This change:
- Adds omitempty tag to ResticIdentifier field in BackupRepository CRD
- Updates controller to only populate ResticIdentifier for restic repos
- Adds tests to verify behavior for both restic and kopia repository types
This ensures backward compatibility while properly handling kopia repositories
that don't require a restic-compatible identifier.
Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
The reason is toolchain cannot update automatically.
If 1.24 can force CI use the latest patch version,
and it will not force user to upgrade their local go version,
this should be the better approach.
Signed-off-by: Xun Jiang <xun.jiang@broadcom.com>
add changelog file
Show defaultVolumesToFsBackup in describe only when set by the user
minor ut fix
minor fix
Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>
* Refactor backup volume info retrieval and snapshot checkpoint building in e2e tests
Signed-off-by: Priyansh Choudhary <im1706@gmail.com>
log backup volume info retrieval and snapshot checkpoint building
Signed-off-by: Priyansh Choudhary <im1706@gmail.com>
Add error handling for volume info retrieval in backup tests
Signed-off-by: Priyansh Choudhary <im1706@gmail.com>
Add error handling for volume info retrieval in backup tests
Signed-off-by: Priyansh Choudhary <im1706@gmail.com>
* Update snapshot checkpoint building to use DefaultKibishiiWorkerCounts
Signed-off-by: Priyansh Choudhary <im1706@gmail.com>
---------
Signed-off-by: Priyansh Choudhary <im1706@gmail.com>
This PR fixes issue #8870 where Velero was unnecessarily adding the restore-wait init container when restoring pods with volumes that were backed up using native datamover or CSI.
When restoring pods with volumes, Velero was always adding the restore-wait init container, even when the volumes were backed up using native datamover or CSI and didn't need file system restores. This was causing unnecessary overhead and potential issues.
PVR action to remove restore-wait init container on restore
Changes:
- Remove ALL existing restore-wait init containers before deciding whether to add a new one
- This covers both scenarios: when no file system restore is needed AND when preventing duplicates
- Simplify the add logic since we've already cleaned up existing containers
- Add better logging to show how many containers were removed
Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
* Clarify thirdparty label/annotations on the maintenance jobs
Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
* Clarify that maintenance jobs do not inherit all labels/annotations
- Address PR review feedback and issue #8974
- Make it explicit that only specific predefined third-party labels and annotations are propagated
- Add Important note to prevent user confusion about label/annotation inheritance behavior
- Currently only azure.workload.identity/use label and iam.amazonaws.com/role annotation are inherited
Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
---------
Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
Co-authored-by: Xun Jiang/Bruce Jiang <59276555+blackpiglet@users.noreply.github.com>
* Add support for pod labels and service account annotations in Velero configuration
Signed-off-by: Priyansh Choudhary <im1706@gmail.com>
* Refactor Velero configuration to use string types for pod labels and service account annotations
Signed-off-by: Priyansh Choudhary <im1706@gmail.com>
* Please notice only Kibishii workload support Windows test,
because the other work loads use busybox image, and not support Windows.
* Refactor CreateFileToPod to support Windows.
* Add skip logic for migration test if the version is under 1.16.
* Add main in semver check.
Signed-off-by: Xun Jiang <xun.jiang@broadcom.com>
Migration cases use the Kibishii as the workload, and SC mapping
ConfigMap was needed for all scenarios, because standby cluster
doesn't have the Kibishii SC after setting up.
Signed-off-by: Xun Jiang <xun.jiang@broadcom.com>
The restore workflow used name represents the backup resource and the
restore to be restored, but the restored resource name may be different
from the backup one, e.g. PV and VSC are global resources, to avoid
conflict, need to rename them.
Reanme the name variable to backupResourceName, and use obj.GetName()
for restore operation.
Signed-off-by: Xun Jiang <xun.jiang@broadcom.com>
Enables the node-agent to start even if the
/host_pods path does not exist.
If the path is present, the existing logic
remains unchanged, ensuring it is readable.
Signed-off-by: Michal Pryc <mpryc@redhat.com>
Combine existing PVC non-CSI RIAs and move annotation
removal out of the CSI plugin to fix issues with
CSI volumes when using fs-backup
Signed-off-by: Scott Seago <sseago@redhat.com>
This change fixes a minor typo in the Backup Hooks documentation, changing "Defaults is" to "Defaults to".
Signed-off-by: Andreas Lindhé <7773090+lindhe@users.noreply.github.com>
Also bumped to support upgraded k8s.io/ deps.
- controller-gen to v0.16.5
- sigs.k8s.io/controller-runtime v0.19.2
Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
Run backup post hooks inside ItemBlock synchronously as the ItemBlocks are handled asynchronously
Fixes#8516
Signed-off-by: Wenkai Yin(尹文开) <yinw@vmware.com>
This commit makes sure when a backup is deleted the controller will
delete the CSI snapshot even when the bakckup tarball is not uploaded.
fixes#8160
Signed-off-by: Daniel Jiang <daniel.jiang@broadcom.com>
Check the PVB status via podvolume Backupper rather than calling API server to avoid API server issue
Fixes#8587
Signed-off-by: Wenkai Yin(尹文开) <yinw@vmware.com>
This commit makes change in restore finalizer controller, to make it
check the status in item operation of a PVC before patch the PV that is
bound to it. If the operation is not successful it will skip patching
the PV.
Signed-off-by: Daniel Jiang <daniel.jiang@broadcom.com>
The issue is caused by the changes of controller-runtime: WithEventFilter() doesn't apply to WatchesRawSource(),
this commit set Predicate for WatchesRawSource() seperatedly
Fixes#8437
Signed-off-by: Wenkai Yin(尹文开) <yinw@vmware.com>
* issue 8433: add ask label to data mover pods
Signed-off-by: Lyndon-Li <lyonghui@vmware.com>
* check existence of the same label from node-agent
Signed-off-by: Lyndon-Li <lyonghui@vmware.com>
---------
Signed-off-by: Lyndon-Li <lyonghui@vmware.com>
This commit adds SecurityContext that complies with "restricted" level
per Pod Security Standards to "restore-helper" initContainer.
It ensures the restore won't fail when the cluster enforces PSA.
Signed-off-by: Daniel Jiang <daniel.jiang@broadcom.com>
* Only install and uninstall SC and VSC once for default cluster.
* Install and uninstall SC and VSC for standby cluster on migration case.
* Refactor the StorageClass and VolumeSnapshotClass YAMLs.
* Prettify the e2e_suite_test.go
Signed-off-by: Xun Jiang <xun.jiang@broadcom.com>
Run the E2E test on kind / run-e2e-test (1.23.17, (NamespaceMapping && Single && Restic) || (NamespaceMapping && Multiple && Restic)) (push) Has been skipped
Run the E2E test on kind / run-e2e-test (1.23.17, ResourceModifier || (Backups && BackupsSync) || PrivilegesMgmt || OrderedResources) (push) Has been skipped
Run the E2E test on kind / run-e2e-test (1.24.17, (NamespaceMapping && Single && Restic) || (NamespaceMapping && Multiple && Restic)) (push) Has been skipped
Run the E2E test on kind / run-e2e-test (1.24.17, ResourceModifier || (Backups && BackupsSync) || PrivilegesMgmt || OrderedResources) (push) Has been skipped
Run the E2E test on kind / run-e2e-test (1.25.16, (NamespaceMapping && Single && Restic) || (NamespaceMapping && Multiple && Restic)) (push) Has been skipped
Run the E2E test on kind / run-e2e-test (1.25.16, ResourceModifier || (Backups && BackupsSync) || PrivilegesMgmt || OrderedResources) (push) Has been skipped
Run the E2E test on kind / run-e2e-test (1.26.13, (NamespaceMapping && Single && Restic) || (NamespaceMapping && Multiple && Restic)) (push) Has been skipped
Run the E2E test on kind / run-e2e-test (1.26.13, ResourceModifier || (Backups && BackupsSync) || PrivilegesMgmt || OrderedResources) (push) Has been skipped
Run the E2E test on kind / run-e2e-test (1.27.10, (NamespaceMapping && Single && Restic) || (NamespaceMapping && Multiple && Restic)) (push) Has been skipped
Run the E2E test on kind / run-e2e-test (1.27.10, ResourceModifier || (Backups && BackupsSync) || PrivilegesMgmt || OrderedResources) (push) Has been skipped
Run the E2E test on kind / run-e2e-test (1.28.6, (NamespaceMapping && Single && Restic) || (NamespaceMapping && Multiple && Restic)) (push) Has been skipped
Run the E2E test on kind / run-e2e-test (1.28.6, ResourceModifier || (Backups && BackupsSync) || PrivilegesMgmt || OrderedResources) (push) Has been skipped
Run the E2E test on kind / run-e2e-test (1.29.1, (NamespaceMapping && Single && Restic) || (NamespaceMapping && Multiple && Restic)) (push) Has been skipped
Run the E2E test on kind / run-e2e-test (1.29.1, ResourceModifier || (Backups && BackupsSync) || PrivilegesMgmt || OrderedResources) (push) Has been skipped
* Add new flag HAS_VSPHERE_PLUGIN for E2E test.
* Modify the E2E README for the new parameter.
* Add the VolumeSnapshotClass for VKS.
* Modify the plugin install logic.
* Modify the cases to support data mover case in VKS.
Signed-off-by: Xun Jiang <xun.jiang@broadcom.com>
Always test latest available patch version of each supported k8s version available in Kindest/node images.
ie. This adds v1.31, v1.30 to test matrix and upgrade patch versions for others.
Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
This commit implements a new --annotations flag in the backup and restore create commands.
This allows users to specify key-value pairs for annotations directly at the time of backup and restore creation, in the same way as the --labels flag.
Signed-off-by: Alvaro Romero <alromero@redhat.com>
by changing output type to image.
Then you can execute command like so to create a multi-arch image
```
BUILDX_PLATFORMS=linux/amd64,linux/arm64 BUILDX_OUTPUT_TYPE=image make container
```
Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
As with other plugin types, the information on how to implement
an IBA plugin will be in the velero-plugin-example repo.
Signed-off-by: Scott Seago <sseago@redhat.com>
Rename backup-repository-config to backup-repository-configmap.
Rename repo-maintenance-job-config to repo-maintenance-job-configmap.
Rename node-agent-config to node-agent-configmap.
Add those three parameters to `velero install` CLI.
Modify the design and the site documents.
Signed-off-by: Xun Jiang <xun.jiang@broadcom.com>
package "k8s.io/api/core/v1" is being imported more than once (ST1019)
pvc_pv_test.go:32:2: other import of "k8s.io/api/core/v1"
Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
The code changes are related to the `NewPeriodicalEnqueueSource` function in the `kube/periodical_enqueue_source.go` file. This function is used to create a new instance of the `PeriodicalEnqueueSource` struct, which is responsible for periodically enqueueing objects into a work queue.
The changes involve adding two new parameters to this function: `controllerName string` and modifying the existing `logger` parameter to include additional fields.
Here's what changed:
1. A new `controllerName` parameter was added to the `NewPeriodicalEnqueueSource` function.
These changes are to adding more context or metadata to the logging output, possibly for debugging or monitoring purposes.
The other files (`restore_operations_controller.go`, `schedule_controller.go`, and their respective test files) were modified to use this updated `NewPeriodicalEnqueueSource` function with the new `controllerName` parameter.
Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
Changed the tests to use mocked function that will not read actual
secrets from env variables nor AWS config file that may be
on the system that is running tests.
As a second guard against exposed secrets comparison for the values
does not shows the actual values for the AWS data. This is to prevent
situation where programming error may still allow the test to read
AWS config/env variables instead of using mocked function.
Signed-off-by: Michal Pryc <mpryc@redhat.com>
* Correcting Openshift on IBM Documentation Error
I have to admit to some significant error and embarrassment regarding the documentation update
about Openshift on IBM Cloud pull request https://github.com/vmware-tanzu/velero/pull/8069.
I will correct my error before it gets any further.
Just exposing /var/data/kubelet/pods is incorrect and host path /var/lib/kubelet/pods should remain unchanged.
The errors with the defaults during csi snapshot data movement were:
data path backup failed: Failed to run kopia backup: unable to get local
block device entry: resolveSymlink: lstat /var/data/: no such
file or directory
I suspected this was the same as RancherOS and Nutanix. It is not.
The original tested changes changed both /var/lib/kubelet/{pods,plugins} to
/var/data/kubelet/{pods,plugins}.
The published changes only result in the error:
```
status:
completionTimestamp: '2024-08-02T17:12:29Z'
message: >-
data path backup failed: Failed to run kopia backup: unable to get local
block device entry: resolveSymlink: lstat /var/data/kubelet/plugins: no such
file or directory
node: 10.240.0.5
phase: Failed
progress: {}
startTimestamp: '2024-08-02T17:12:11Z'
```
After making continued modifications to the daemonset the correct configuration was:
```
volumeMounts:
- name: host-pods
mountPath: /host_pods
mountPropagation: HostToContainer
- name: host-plugins
mountPath: /var/data/kubelet/plugins
mountPropagation: HostToContainer
```
```
volumes:
- name: host-pods
hostPath:
path: /var/lib/kubelet/pods
type: ''
- name: host-plugins
hostPath:
path: /var/data/kubelet/plugins
type: ''
```
Only the changes to the plugin path were required.
The plugin path changes were required to both the mount path and the host path.
Regardless of whether /var/lib/kubelet/pods or /var/data/kubelet/pods host path, backups and restore
succeeded provided the plugin path was modified.
```
volumeMounts:
- name: host-pods
mountPath: /host_pods
mountPropagation: HostToContainer
- name: host-plugins
mountPath: /var/data/kubelet/plugins
mountPropagation: HostToContainer
```
```
volumes:
- name: host-pods
hostPath:
path: /var/data/kubelet/pods
type: ''
- name: host-plugins
hostPath:
path: /var/data/kubelet/plugins
type: ''
```
After getting on-host access was able to confirm. Pods are at /var/lib/kubelet/pods.
```
ls /var/lib/kubelet/pods
07c0be63-335d-4cfb-b39f-816bc2fb32cd
51f31b3e-4710-4ef0-8626-5f1a78a624b2
a4802fd3-3b62-45a4-8f21-974880b6f92a
cccb35c9-b4f9-4ca9-a697-736ae64f09ad
0a5d4366-7fa1-4525-9e45-a43a362b8542
558b0643-0661-4d4a-b03e-aac60c6ad710
a4b106fb-5b7b-48e5-828a-ea7b41ba0e59
ce1290e1-4330-4df6-8166-14784bcce930
```
On host the volumes are in /var/data/kubelet/plugins.
```
ls /var/data/kubelet/plugins/kubernetes.io/csi/openshift-storage.cephfs.csi.ceph.com/231e04896c4f528efb95d23a3c153db9fc4a7206b7320f74443f30de7228dba5/globalmount/velero/backups/backup-resources-41d84d16-47a7-4ea8-a9cb-6348d01bcb2c/
backup-resources-41d84d16-47a7-4ea8-a9cb-6348d01bcb2c-csi-volumesnapshotclasses.json.gz
backup-resources-41d84d16-47a7-4ea8-a9cb-6348d01bcb2c-resource-list.json.gz
backup-resources-41d84d16-47a7-4ea8-a9cb-6348d01bcb2c-csi-volumesnapshotcontents.json.gz
backup-resources-41d84d16-47a7-4ea8-a9cb-6348d01bcb2c-results.gz
backup-resources-41d84d16-47a7-4ea8-a9cb-6348d01bcb2c-csi-volumesnapshots.json.gz
backup-resources-41d84d16-47a7-4ea8-a9cb-6348d01bcb2c-volumesnapshots.json.gz
backup-resources-41d84d16-47a7-4ea8-a9cb-6348d01bcb2c-itemoperations.json.gz
backup-resources-41d84d16-47a7-4ea8-a9cb-6348d01bcb2c.tar.gz
backup-resources-41d84d16-47a7-4ea8-a9cb-6348d01bcb2c-logs.gz
velero-backup.json
backup-resources-41d84d16-47a7-4ea8-a9cb-6348d01bcb2c-podvolumebackups.json.gz
```
With the volume config changed to expose /var/data/kubelet/plugins for the plugin hostPath, the DataUploads and DataDownloads
succeed for both Filesystem and Block mode PVCs.
```
status:
completionTimestamp: '2024-08-02T17:23:33Z'
node: 10.240.0.5
path: >-
/host_pods/7fcb9d56-7885-437c-acd3-67db6b1ee8ae/volumeDevices/kubernetes.io~csi/pvc-47b91f56-db8c-44bf-9ecc-737170561b4b
phase: Completed
progress:
bytesDone: 5368709120
totalBytes: 5368709120
snapshotID: 8faae36b3592fee4efbfad024f26033e
startTimestamp: '2024-08-02T17:21:22Z'
```
```
status:
completionTimestamp: '2024-08-02T18:42:19Z'
node: 10.240.0.5
phase: Completed
progress:
bytesDone: 5368709120
totalBytes: 5368709120
startTimestamp: '2024-08-02T18:41:00Z'
```
My apologies for the error.
Signed-off-by: Michael Fruchtman <msfrucht@us.ibm.com>
* Add context to plugins mountPath
Signed-off-by: Michael Fruchtman <msfrucht@us.ibm.com>
---------
Signed-off-by: Michael Fruchtman <msfrucht@us.ibm.com>
This commit makes sure the dbr's status is "Processed" when an error
happens before the actual deletion is started
fixes#7812
Signed-off-by: Daniel Jiang <daniel.jiang@broadcom.com>
Added option checksumAlgorith, this stops 403 errors as per https://github.com/vmware-tanzu/velero/issues/7543
Added plugins line as velero install failed without this option in version 1.14.0
Removed the volumesnapshotlocation as it does not exist in 1.14.0
Signed-off-by: Gareth Anderson <gareth.anderson03@gmail.com>
Updates the documentation for CSI snapshot data movement for OpenShift
on IBM Cloud.
The default hostpath /var/lib/kubelet/pods cannot find
PersistentVolumeClaims with volumeMode: Block on host.
The correct hostpath for OpenShift on IBM Cloud is
/var/data/kubelet/pods.
Signed-off-by: Michael Fruchtman <msfrucht@us.ibm.com>
Breaking change (can be mitigated if needed in the future): v1 branch accepted an optional seconds field at the beginning of the cron spec. This is non-standard and has led to a lot of confusion. The new default parser conforms to the standard as described by [the Cron wikipedia page.](https://en.wikipedia.org/wiki/Cron). It is unlikely that this affects us per https://github.com/vmware-tanzu/velero/pull/31
Other notes:
> CRON_TZ is now the recommended way to specify the timezone of a single schedule, which is sanctioned by the specification. The legacy "TZ=" prefix will continue to be supported since it is unambiguous and easy to do so.
References: https://pkg.go.dev/github.com/robfig/cron/v3#readme-upgrading-to-v3-june-2019
Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
add changelog file
change log level and add more detailed comments
make update
add return for sc get call if error
Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>
Check whether the namespaces specified in the
backup.Spec.IncludeNamespaces exist during backup resource collcetion
If not, log error to mark the backup as PartiallyFailed.
Signed-off-by: Xun Jiang <blackpigletbruce@gmail.com>
This commit fixes#7849.
It will use PVC instead of PV to track CSI snapshots to generate restore
volume info metadata. So that in the case the PVC is not bound to PV
the metadata can be populated correctly.
Signed-off-by: Daniel Jiang <daniel.jiang@broadcom.com>
When dry-run the tag-release.sh, there's an error
"fatal: detected dubious ownership in repository at
'/github.com/vmware-tanzu/velero'"
This commit works around this issue to make sure "tag-release.sh"
can finish successful
Signed-off-by: Daniel Jiang <daniel.jiang@broadcom.com>
Tweak the command and remove the sections which include upgrading from
older versions, given v1.13.x is a prerequisite.
Signed-off-by: Daniel Jiang <daniel.jiang@broadcom.com>
As per PR #7281, if repository count is more than 1, then snapshots deletion is achieved with a fast way, then we should have more than 1 FS backup repository per backup.
Signed-off-by: danfengl <danfengl@vmware.com>
1. In data movement scenario, volumesnapshotcontent by Velero backup will be deleted instead of retained in CSI scenaito, so add
a checkpoint for data movement scenario to verify no volumesnapshotcontent left after Velero backup;
2. Fix global context varaible issue, context varaible is not effective due to it's initialized right after the very beginning of
all tests instead of beginning of each test, so if someone script a new E2E test and did not overwrite it in the test body, then it
will fail the test if it was triggerd one hour later;
3. Due to CSI plugin is deprecated, it breaked down migration tests, because v1.13 still needs to install CSI plugin for the test.
Signed-off-by: danfengl <danfengl@vmware.com>
1. Add sleep for native snapshot tests when using test.go interface;
2. Add --confirm for velero plugin add CLI as new feature introduced.
Signed-off-by: danfengl <danfengl@vmware.com>
This commit makes change to CLI so `velero restore describe` will
download restore volume info and render the CSI snapshot restores based
on its content.
Signed-off-by: Daniel Jiang <daniel.jiang@broadcom.com>
This commit bumps up the golang for building and testing velero to v1.22
It also updates controller-gen to v0.14.0 to fix an issue under new
versino of go.
More details see https://github.com/golang/go/issues/65637
Signed-off-by: Daniel Jiang <jiangd@vmware.com>
The wait error changed from `timed out waiting for the condition`
to `context deadline exceeded`.
Signed-off-by: Xun Jiang <blackpigletbruce@gmail.com>
handleSkippedPVHasRetainPolicy
According to comment, calling executePVAction aims to reset PV's
claimRef, but the reset logic was moved into resetVolumeBindingInfo
since release-1.4.
Signed-off-by: Xun Jiang <blackpigletbruce@gmail.com>
Check the existence of the namespaces provided in the "--include-namespaces" opt
ion and reports validation error if not found
Fixes#7431
Signed-off-by: Wenkai Yin(尹文开) <yinw@vmware.com>
This commit makes sure when kopia connects to the repository the
crendentials file specified in BSL.spec.config has the higher priority over
Pod Environment credentials when IRSA is configured.
Signed-off-by: Daniel Jiang <jiangd@vmware.com>
When debugging this error it is currently hard to identify what
CRD is causing the issue. This is particularly difficult when
dealing with over a hundred CRDs.
Signed-off-by: Jose Arevalo <jose.matias.arevalo@gmail.com>
Make "disable-informer-cache" option false(enabled) by default to keep it consi
stent with the help message
Fixes#7264
Signed-off-by: Wenkai Yin(尹文开) <yinw@vmware.com>
fixes#7263
This commit makes the data structures more consistent, that namespaces,
as cluster scoped resource will not have "targetNamespace" in the
"restoreableItem" instance.
Signed-off-by: Daniel Jiang <jiangd@vmware.com>
1. Add sleep to avoid snapshot limitation issue https://docs.aws.amazon.com/AWSEC2/latest/APIReference/errors-overview.html#:~:text=SnapshotCreationPerVolumeRateExceeded;
2. Move InstallVelero variable out of struct of Veleroconfig as a global one since it's not for controlling any individual case;
3. Unskip migration test case on AWS pipeline, because we added a new EKS pipeline and deleted TKG AWS pipline in internal E2E test, so this restriction for TKG AWS pipline is no long existed;
4. Skip retainPV test on vSphere pipeline due to PV longtime bounding issue;
5. Fix failing get snapshot by CSI from EC2 issue, snapshot by CSI has no label of backup name.
Signed-off-by: danfengl <danfengl@vmware.com>
VolumeInfo contains several sub-structures. They are filled for
different scenarios. Do not generate empty structure for the
not filled sub-structures.
Signed-off-by: Xun Jiang <jxun@vmware.com>
Update CSIVolumeSnapshotsCompleted in backup's status and the metric
during backup finalize stage according to async operations content.
Signed-off-by: Xun Jiang <jxun@vmware.com>
Fixes#1970
Namespaces will be handled as cluster-scope resource, but for
consistency they will still created via "Ensure namespace" flow for
consistency.
Signed-off-by: Daniel Jiang <jiangd@vmware.com>
This commit makes sure if a PV is not taken snapshot b/c the flag
SnapshotVolumes is set to false in a backup CR, the PV is also also
tracked as skipped in the tracker.
Signed-off-by: Daniel Jiang <jiangd@vmware.com>
Modify design according to comments.
Add PVInfo structure.
Add backup VolumeInfo's object storage's put and get methods.
Signed-off-by: Xun Jiang <jxun@vmware.com>
Remove dependecy of generate client from pkg/cmd/cli/snapshotLocation.
Remove the Velero generated informer from PVB and PVR.
Remove dependency of generated client from pkg/podvolume directory.
Replace generated codec with runtime codec.
Signed-off-by: Xun Jiang <jxun@vmware.com>
enabled, before executing the action.
The DeleteItemAction is not checked, because the DIA doesn't have a
method to get the action's plugin name.
This should be OK, because the CSI will check whether the VS and VSC
have a backup name annotation. If the VS and VSC is not handled by
the CSI plugin, then they don't have the annotation.
Signed-off-by: Xun Jiang <jxun@vmware.com>
PVC block mode backup and restore introduced some OS specific
system calls. Those calls are not available for Windows, so
add both non Windows version and Windows version code, and
return error for block mode on the Windows platform.
Signed-off-by: Xun Jiang <jxun@vmware.com>
Use informer cache with dynamic client for Get calls on restore
When enabled, also make the Get call before create.
Add server and install parameter to allow disabling this feature,
but enable by default
Signed-off-by: Scott Seago <sseago@redhat.com>
When creating resources with generateName, apimachinery
does not guarantee uniqueness when it appends the random
suffix to the generateName stub, so if it fails with
already exists error, we need to retry.
Signed-off-by: Scott Seago <sseago@redhat.com>
* doc: Alert that plugins run as binaries when turning on debug logs
Signed-off-by: Mateus Oliveira <msouzaol@redhat.com>
* fixup! doc: Alert that plugins run as binaries when turning on debug logs
Signed-off-by: Mateus Oliveira <msouzaol@redhat.com>
* fixup! doc: Alert that plugins run as binaries when turning on debug logs
Signed-off-by: Mateus Oliveira <msouzaol@redhat.com>
* fixup! doc: Alert that plugins run as binaries when turning on debug logs
Signed-off-by: Mateus Oliveira <msouzaol@redhat.com>
---------
Signed-off-by: Mateus Oliveira <msouzaol@redhat.com>
When preparing a backup repository, Velero tries to connect to it, if fails then create it. The repository status always records the error reported by creation but the real reason maybe caused by the connect operation. This is confuseing and hard to debug
Signed-off-by: Wenkai Yin(尹文开) <yinw@vmware.com>
This commit introduces our own Azure storage provider by wrapping Kopia's implementation rather than contributing to upstream based on the following considerations:
1. Velero needs the capability to interact with the repository concurrently while Kopia doesn't, this will increase the complexity of Kopia if we contribute to upstream
2. The configuration items provided by Velero and Kopia are conflict, e.g. Velero supports customizing storage account URI which is a full path while Kopia supports customizing storage account domain which is part of the URI. We need to consider the backward compatibility and upgrade case if we contribute to upstream which needs extra efforts
3. Contribute to upstream is a longer cycle when we need to introduce new changes. With this commit, we no longer depends on upstream for the Azure storage provider part and is easy for us to maintain
Signed-off-by: Wenkai Yin(尹文开) <yinw@vmware.com>
1. Skip deleting the restore files from storage if the backup/BSL is not found
2. Allow deleting the restore files from storage even though the BSL is readonly
Signed-off-by: Wenkai Yin(尹文开) <yinw@vmware.com>
1. Capture Velero pod log and K8S cluster event;
2. Fix wrong path of storageclass yaml file issue caused by pert test;
3. Fix change storageclass test issue that no sc named 'default' in EKS cluster;
4. Support AWS credential as config format;
5. Support more E2E script input parameters like standy cluster plugins and provider.
Signed-off-by: danfengl <danfengl@vmware.com>
Enlarge throttle of UT case TestThrottle_ShouldOutput to avoid occasional CI
failure due to timeout caused by test environment's CPU speed
Signed-off-by: Xun Jiang <jxun@vmware.com>
This commit introduces a deleteItemAction which writes a temporary configmap to
record the snapshot info so that the controller can trigger repo manager
to remove the snapshot
This process is a bit chatty and we should consider to refactor the code
so it's easier to connect to the repo directly in the DIA
Signed-off-by: Daniel Jiang <jiangd@vmware.com>
* fix: Typos and add more spell checking rules to CI
Signed-off-by: Mateus Oliveira <msouzaol@redhat.com>
* fixup! fix: Typos and add more spell checking rules to CI
Signed-off-by: Mateus Oliveira <msouzaol@redhat.com>
* fixup! fix: Typos and add more spell checking rules to CI
Signed-off-by: Mateus Oliveira <msouzaol@redhat.com>
* fixup! fix: Typos and add more spell checking rules to CI
Signed-off-by: Mateus Oliveira <msouzaol@redhat.com>
* fixup! fix: Typos and add more spell checking rules to CI
Signed-off-by: Mateus Oliveira <msouzaol@redhat.com>
* fixup! fix: Typos and add more spell checking rules to CI
Signed-off-by: Mateus Oliveira <msouzaol@redhat.com>
* fixup! fix: Typos and add more spell checking rules to CI
Signed-off-by: Mateus Oliveira <msouzaol@redhat.com>
* fixup! fix: Typos and add more spell checking rules to CI
Signed-off-by: Mateus Oliveira <msouzaol@redhat.com>
---------
Signed-off-by: Mateus Oliveira <msouzaol@redhat.com>
1. In K8S v1.27 API Version v1beta1 for CR volumesnapshotclass is deprcated, so E2E test should adapt both API versions to cover all K8S versio;
2. Support getting additional plugin from input;
3. Velero version and plugin map should not deprated version older than v1.10, because upgrade test will use them.
Signed-off-by: danfengl <danfengl@vmware.com>
when running `go mod why -m github.com/kopia/kopia` in velero-plugins prior to this change you will see following
```
❯ go mod why -m github.com/kopia/kopia
github.com/konveyor/openshift-velero-plugin/velero-plugins
github.com/vmware-tanzu/velero/pkg/plugin/framework
github.com/vmware-tanzu/velero/pkg/util/logging
github.com/kopia/kopia/repo/logging
```
after
```
❯ go mod why -m github.com/kopia/kopia
(main module does not need module github.com/kopia/kopia)
```
Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
Following the examples instructions[1], the nginx-deployment is not
backed up or restored. Add a label to the deployment so it will be
backed up and restored.
Similar change is needed for `examples/nginx-app/with-pv.yaml` but I did
not try that example.
[1] https://velero.io/docs/v1.11/contributions/minio/Fixes#6347
Signed-off-by: Nir Soffer <nsoffer@redhat.com>
1. Because VolumeSnapshot and VolumeSnapshotContent CRs are not kept after backup completed,
don't persist them in the backup metadata.
2. Add some builder methods needed by CSI plugin.
Signed-off-by: Xun Jiang <jxun@vmware.com>
1. Bumpup velero version to the latest 2 versions in upgrade script;
2. Bumpup velero verioin to the latest 1 vesion in migration script;
3. Bring B/R with restic test back in vSphere pipeline since vSphere plugin issue fix was included
in v1.5;
4. Disable nodeport test in AWS pipeline since AWS k8s version bumpup;
5. Prepare for data mover test, allow object store provider diffrent from cloud provider.
Signed-off-by: danfengl <danfengl@vmware.com>
Due to the logic moving to plugin, and the plugin cannot read the
Velero server's resourceTimeout setting, add the resourceTimeout
in the backup annotation to pass to plugin.
Remove VolumeSnapshotContent reset code from Velero server.
Signed-off-by: Xun Jiang <jxun@vmware.com>
For some use cases, namespaced-scope resources are inluded into backup,
but the namespaces are not included due to filters setting.
To do this, removing label selector filter from namespace resource.
Namespace resource only honor namespace exclude/include filters.
Signed-off-by: Xun Jiang <jxun@vmware.com>
This commit skips updating the restore progress, in the first loop for
restoration when CRDs are handled, so that the misleading "totalItem"
will not appear in the CR.
Fixes#5990
Signed-off-by: Daniel Jiang <jiangd@vmware.com>
1. Fix context issues produced by previous PR, increase timeout or add case scpoed global timeout param to make backup/restore command timeout configurable.
2. Add global param for storage class name using by test cases;
3. Fix param DefaultVolumesToFsBackup usage issue: set DefaultVolumesToFsBackup to false in backup CLI in case it was set to true in install CLI.
4. Make namespace names of each namespace mapping test unique from being interfered by each other.
Signed-off-by: danfengl <danfengl@vmware.com>
The log message should be clarified, otherwise when a user chooses to do
the backup via podvolme there will be confusing logs, but actually it's
just skipping the BIA for CSI plugin.
Signed-off-by: Daniel Jiang <jiangd@vmware.com>
Restore Services before Clusters so they can be adopted by AKO-operator and no new Services will be created for the same clusters
Signed-off-by: Wenkai Yin(尹文开) <yinw@vmware.com>
Update README to clarify the backward compatibility.
Trivial update to the support process to reflect how issues are labeled
as for now.
Signed-off-by: Daniel Jiang <jiangd@vmware.com>
1) default frequency 10s
2) per-reconcile log is now Debug not info
3) added predicate to reduce reconcile events
Signed-off-by: Scott Seago <sseago@redhat.com>
Add secret restore item action to handle service account token secret:
1. Skip the restoration for the auto-created service account token secret
2. Remove several fields for non-auto-created service account token secret to make sure the secret can be restored
Signed-off-by: Wenkai Yin(尹文开) <yinw@vmware.com>
Use the same pvb/pvr update functions across pkg/controller and pkg/cli/nodeagent for consistency of behavior
Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
The option "--service-account-name" is to be added to that user can use
an existing service account for velero and node-agent pods. This is
helpful for users who wanna use IRSA.
Signed-off-by: Daniel Jiang <jiangd@vmware.com>
Update the group members of "maintainers" and "tech-writer" to reflect
the change in the team.
As for the group "tech-writer" I just selected a few members from
maintains team who has been working on velero for a relatively longer
time.
Signed-off-by: Daniel Jiang <jiangd@vmware.com>
1) clarification around Cancel() return values
2) updates to itemoperation json upload to account for progress
3) update to OperationProgress struct to avoid duplicate parameter
4) update new B/R phase name to WaitingForPluginOperationsPartiallyFailed for consistency
Signed-off-by: Scott Seago <sseago@redhat.com>
Due to CSIDriver is checked for Restic volume mounting path, and CSIDriver is GA and moved to storage v1 group in k8s v1.18, so update Velero v1.8, v1.9 and v1.10 compatible k8s version to 1.18-latest.
Signed-off-by: Xun Jiang <blackpiglet@gmail.com>
This commit makes update to the update api-types docs to add missing
fields.
It also includes misc changes to the inline comment, and a change to
Dockerfile to make sure the build-image works on mac
Signed-off-by: Daniel Jiang <jiangd@vmware.com>
Restore ClusterBootstrap before Cluster otherwise a new default ClusterBootstrap object is create for the cluster
Signed-off-by: Wenkai Yin(尹文开) <yinw@vmware.com>
This design combines the requirements for the previously-merged
Upload Progress Monitoring design with the requirements for the
(not submitted but discussed in meetings and slack) proposed asynchronous
item action plugins into one integrated proposal.
Signed-off-by: Scott Seago <sseago@redhat.com>
Enhance the restore priorities list to support specifying the low prioritized resources that need to be r
estored in the last
Signed-off-by: Wenkai Yin(尹文开) <yinw@vmware.com>
The container name for the aws plugin is `velero-plugin-for-aws`. There was an extra `velero-` prefix in the doc.
Signed-off-by: Dave Pedu <dave@davepedu.com>
1. Fix issue of kubectl client and server mismatch version in GitAction E2E job, refer to https://github.com/elastic/cloud-on-k8s/issues/4737;
2. Adapt to the changing of keyword for involing Kpoia as fs backupper, new installtion breaked upgrade and migration tests;
3. Accept multi-labels of Ginkgo focus as input of E2E make command;
4. Distinguish workload namespace from each tests;
5. Fix issues of not using Velero util to perform Velero commands;
6. Add snapshot test case for NamespaceMapping E2E test;
7. Collect debug bundle after catching error of Velero backup or restore command;
Signed-off-by: danfengl <danfengl@vmware.com>
The RIA refactoring moved velero.RestoreItemAction into a separate
(restoreitemaction) v1 package. Unfortunately, this change would require
plugins to make code changes to locate the RestoreItemActionExecuteInput
and RestoreItemActionExecuteOutput structs.
This commit restores those structs to the original velero package, leaving
just the RestoreItemAction interface in the new v1 package.
Signed-off-by: Scott Seago <sseago@redhat.com>
This commit provides a simple contract that if the BackupItemAction
plugin sets an annotation in a resource it has handled, the additional
items will considered "must include" i.e. each of them will skip the
"include-exclude" filter, such that the plugin developer can make sure
they are included in the backup disregarding the filter setting in the
bakcup CR.
Signed-off-by: Daniel Jiang <jiangd@vmware.com>
1. One of API group test failed due to other PR with fix for treat PartiallyFailed as failure to collect debugbundle without wrap the origin error;
2. Fix migration test issue of wrong velero cli for backup commmand;
3. Fix wrong pararmeter name issue for pv opt-out backup test.
Signed-off-by: danfengl <danfengl@vmware.com>
When running velero backup/restore command, if the command result is "PartiallyFailed", it won't reture error as design, but we do need to know the debug information to figure out the reason, so the command output is needed to get the command result, then further action will be taken.
Signed-off-by: danfengl <danfengl@vmware.com>
Refactors the framework package to implement the plugin versioning changes
needed for BIA v1 and overall package refactoring to support plugin versions
in different packages. This should be all that's needed to move on to
v2 for BackupItemAction. The remaining plugin types still need similar
refactoring to what's being done here for BIA before attempting a
v2 implementation.
Signed-off-by: Scott Seago <sseago@redhat.com>
Refactors the clientmgmt package to implement the plugin versioning changes
needed for BIA v1 and overall package refactoring to support plugin versions
in different packages. This should be all that's needed to move on to
v2 for BackupItemAction. The remaining plugin types still need similar
refactoring to what's being done here for BIA before attempting a
v2 implementation.
Signed-off-by: Scott Seago <sseago@redhat.com>
I think is necessary this little comment about TTL expiration, because it can be confusing when the expiration time has passed and the data allocated and the snapshots are not erased at that time.
Signed-off-by: Aaron Arias <33655005+aaronariasperez@users.noreply.github.com>
If generating protoc go files from scratch, `make update` fails if
CRD generation happens first, since the protoc-generated
files are imported by the api go files.
protoc generation needs to happen earlier.
Signed-off-by: Scott Seago <sseago@redhat.com>
In determining whether a backup includes all namespaces, item_collector
checks for an empty string in the first element of the ns list. If processing
includes+excludes results in an empty list, treat this as another case
of a not-all-namespaces backup rather than crashing velero.
Signed-off-by: Scott Seago <sseago@redhat.com>
1. Add some refactored controllers initiation code into enabledRuntimeControllers.
2. Add reconciler struct initiation function for DownloadRequest and ServerStatusRequest controllers.
Signed-off-by: Xun Jiang <blackpiglet@gmail.com>
This stops subheading description from showing in posted issues by default.
Signed-off-by: Tiger Kaovilai <passawit.kaovilai@gmail.com>
Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
1. Clean backups after each test to avoid exceeding limitation of storage capability during E2E test;
2. Fix exlude label test issue that namespace should not be included and excluded at the same time no matter by which way to config.
Signed-off-by: danfengl <danfengl@vmware.com>
This commit adds the parameter "uploader-type" to velero server, add exposes the
setting via "velero install" in CLI.
fixes#5062
Signed-off-by: Daniel Jiang <jiangd@vmware.com>
Signed-off-by: Daniel Jiang <jiangd@vmware.com>
1. Also checking annotation "pv.kubernetes.io/migrated-to" to find out whether volume is provisioned by CSI.
2. Add UT cases.
Signed-off-by: Xun Jiang <jxun@vmware.com>
This commit splits the pkg/restic package into several packages to support Kopia integration works
Fixes#5055
Signed-off-by: Wenkai Yin(尹文开) <yinw@vmware.com>
1. Make the Restore hook.InitConatianer server side field pruing disable.
2. Remove restore patch in update-generate-crd-code.sh.
3. Modify related testcases.
4. Add Container fields validation in Restore Init hook.
Signed-off-by: Xun Jiang <jxun@vmware.com>
"EnableAPIGroupVersions" is set
The crd-remap-version plugin will always backup v1b1 resource for some
CRD. It impacts the feature flag `EnableAPIGroupVersions` which means to
backup all versions, and make migration fail.
In this commit the featureSet was removed from plugin server struct b/c
it blocks the parm `--features` to be populated correctly. This change
should not have negative impact b/c the attribute in server struct is never used.
Fixes#5146
Signed-off-by: Daniel Jiang <jiangd@vmware.com>
This commit adds additional fields to podvolumebackup
and podvolumerestore. The resticrepository will be renamed to
backuprepository
Signed-off-by: Daniel Jiang <jiangd@vmware.com>
Pass in a new copy of the map of config values rather than
modifying the BSL Spec.Config and then pass in that field.
Signed-off-by: Scott Seago <sseago@redhat.com>
This commit mitigates the issue for running "make update" locally when
the network is not friendly for accessing the default "proxy.golang.org"
Signed-off-by: Daniel Jiang <jiangd@vmware.com>
Mitigate the issue mentioned in #4782
When there's a bug or misconfiguration that causes nil pointer there
will be more stack trace information to help us debug.
Signed-off-by: Daniel Jiang <jiangd@vmware.com>
Fix bsl validation bug: the BSL is validated continually and doesn't respect the validation period configured
Fixes#5056
Signed-off-by: Wenkai Yin(尹文开) <yinw@vmware.com>
* move 'velero.io/exclude-from-backup' label name to const
Signed-off-by: Niu Lechuan <lechuan.niu@daocloud.io>
* add changelog file (in changelogs/unreleased) of this PR
Signed-off-by: Niu Lechuan <lechuan.niu@daocloud.io>
1. remove go.sum file from code spell check action.
2. change go version to 1.17 in CRD verify action, and add k8s 1.23 and 1.24 in verification list.
Signed-off-by: Xun Jiang <jxun@vmware.com>
Because the column and project specified by this action do not exist anymore, and Velero team doesn't use this action to assign issue and triage anymore, remove this action.
Signed-off-by: Xun Jiang <jxun@vmware.com>
Update the release steps to reflect the change in the `tag-release.sh`,
that the release branch must be created manually before RC is tagged.
Signed-off-by: Daniel Jiang <jiangd@vmware.com>
1. Use patch rather status patch in backup sync controller as we have disable status as sub resource
2. Set the GC period with default value if it isn't set
Signed-off-by: Wenkai Yin(尹文开) <yinw@vmware.com>
It's not necessary to set the deletion policy as the delete item action
plugin in CSI plugin will set it to Delete when the backup is deleted.
Signed-off-by: Daniel Jiang <jiangd@vmware.com>
When enabling the status as sub resource in CRD, the status will be ignored when creating the CR with status, this will cause issues when syncing backups/pvbs
Fixes#4950
Signed-off-by: Wenkai Yin(尹文开) <yinw@vmware.com>
We have made a few changes to the CSI plugin to provide official support
for AWS/Azure. This commit makes change to the docs to reflect those
changes.
Signed-off-by: Daniel Jiang <jiangd@vmware.com>
Add filter functions for PeriodicalEnqueueSource.
Move BSL's valication frequency check test case to PeriodicalEnqueueSource's test.
Signed-off-by: Xun Jiang <jxun@vmware.com>
Make in-progress PVB/PVR as failed when restic controller restarts to avoid hanging backup/restore
Fixes#4772
Signed-off-by: Wenkai Yin(尹文开) <yinw@vmware.com>
1. Add checkpoint in snapshot E2E test to verify snapshot CR should be created and snapshot should be created in cloud side after backup completion;
2. Fix snapshot name issue that CSI snapshot name in cloud side is not the same with other non-CSI cloud snapshots;
Signed-off-by: danfengl <danfengl@vmware.com>
When iterating over applicable restore actions, if a non-matching label
selector is found, velero should continue to the next action rather than
returning from the restoreItem func, which ends up preventing the item's
restore entirely.
Signed-off-by: Scott Seago <sseago@redhat.com>
1. Delete VolumeSnapshot directly when DeletionPolicy set to Retain.
2. Change VolumeSnapshotContent's DeletionPolicy to Retain, then delete VolumeSnapshot. After that delete VolumeSnapshotContent and change VSC DeletionPolicy to Delete back, then re-create the VolumeSnapshotContent.
Signed-off-by: Xun Jiang <jxun@vmware.com>
This commit makes backup sync controller delete the volumesnapshot and
volumesnapshotcontent created by the backup which is cleaned up as orphan
Signed-off-by: Daniel Jiang <jiangd@vmware.com>
Make in-progress backup/restore as failed when doing the reconcile to avoid hanging in in-progress status
Signed-off-by: Wenkai Yin(尹文开) <yinw@vmware.com>
Fixes#4760
This commit make changes in 2 parts:
1) When a volumesnapshotcontent is persisted during backup, velero will reset its
`Source` field to remove the VolumeHandle, so that the
csi-snapshotter will not try to call `CreateSnapshot` when its synced
to another cluster with a backup.
2) Make sure the referenced volumesnapshotclasses are persisted and
synced with the backup, so that when the volumesnapshotcontent is
deleted the storage snapshot is also removed.
Signed-off-by: Daniel Jiang <jiangd@vmware.com>
1. Add --insecure-tls for ResticManager's commands.
2. Add --insecure-tls in PodVolumeBackup and PodVolumeRestore controller.
3. Upgrade integrated Restic version to v0.13.1
4. Change --last flag in Restic command to --latest=1 due to Restic version update.
Signed-off-by: Xun Jiang <jxun@vmware.com>
As we are refactoring controllers with kubebuilder, use the controller-gen rather than code-generator to generate the deep copy methods for objects
Signed-off-by: Wenkai Yin(尹文开) <yinw@vmware.com>
- go install cmd/velero/velero.go
- go install cmd/velero-restic-restore-helper/velero-restic-restore-helper.go
Will generate binary in `$(go env GOPATH)/bin/` with the correct name.
build.sh still works the same.
Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
* Add bsl related TTL gc errors to labelSelectors
* if backup label map is nil, make map
* clear label if not BSL error
Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
This allows a user inspecting the restore logs to see any
errors or warnings generated by the restore so that they
will be seen even without having to use the describe cli.
Signed-off-by: Scott Seago <sseago@redhat.com>
* Add plugin versioning design doc
Signed-off-by: Bridget McErlean <bmcerlean@vmware.com>
* Use more generic versions in scenarios section
Signed-off-by: Bridget McErlean <bmcerlean@vmware.com>
* Address code review
Signed-off-by: Bridget McErlean <bmcerlean@vmware.com>
* Address code review
Signed-off-by: Bridget McErlean <bmcerlean@vmware.com>
* Modify design to allow other interface changes
The previous design assumed that only method addition would be
supported. It now includes guidance for making changes such as method
removal or signature changes.
Signed-off-by: Bridget McErlean <bmcerlean@vmware.com>
Co-authored-by: Bridget McErlean <bmcerlean@vmware.com>
The GINKGO_SKIP option is updated to string that can be separated by "." for "make test-e2e".
Signed-off-by: Xun Jiang <jxun@vmware.com>
Signed-off-by: Hoang, Phuong <phuong.n.hoang@dell.com>
1. Mark the BSL as "Unavailable" when gets any error
2. Add a new field "Message" to the BSL status to record the error message
Fixes#4485Fixes#4405
Signed-off-by: Wenkai Yin(尹文开) <yinw@vmware.com>
When velero is running on clusters that don't support v1beta1 CRD, the
plugin will not try to backup v1beta1 CRD.
The plugin should be kept for backward compatibility. It will be
removed when velero drop the support for k8s v1.21
Signed-off-by: Daniel Jiang <jiangd@vmware.com>
1. rename zoneSeparator to gkeZoneSeparator
2. add example of regional PV's node affinity. modify test case description.
Signed-off-by: Xun Jiang <jxun@vmware.com>
Specify the risk of this parameter set to true. Add the issue first reported about this topic which includeds the google document illustrates about it.
Signed-off-by: Xun Jiang <jxun@vmware.com>
Test case description is "Deleted backups are deleted from object storage and backups deleted from object storage can be deleted locally",
in this test, only resource backup objects are target for verifition, restic repo verification is not included in this PR, and snapshot verification will be in later PR
Signed-off-by: danfengl <danfengl@vmware.com>
Fix#4499
When hook influnce multiple pods, current logic's first pod's container will overwrite the hook's exec.container parameter. That will cause the other pod fail on the hook executing.
Signed-off-by: Xun Jiang <jxun@vmware.com>
By now, only busybox:latest is used by e2e. It is already upload to gcr.io/velero-gcp/busybox:latest
Change the image to gcr.io to avoid pulling rate limitation from docker hub.
Signed-off-by: Xun Jiang <jxun@vmware.com>
Push to GCR in github workflow to faciliate some environments that have rate limitation to docker hub, e.g. vSphere.
<root@jxun-jumpserver.c.velero-gcp.internal>
Signed-off-by: Xun Jiang <jxun@vmware.com>
Since Itemsnapshotter plugin is still WIP,
this commit removes the reference and the deprecation of volumeSnapshotter plugin
from the doc to avoid confusion.
We'll update the doc when it's ready and we have a reference
implementation.
Signed-off-by: Daniel Jiang <jiangd@vmware.com>
* Use OrderedResources in schedules
Make ParseOrderedResources public for use in schedules
Add changelog
Signed-off-by: Dominic <dominic@xdnx.org>
* Rename function in comment section
Signed-off-by: Dominic <dominic@xdnx.org>
* #4067 Initial design of the new plugins - pre-post backup and restore
Signed-off-by: Rafael Brito <rbrito@vmware.com>
* Update new-prepost-backuprestore-plugin-hooks.md
* Updated design doc as per feedback
Signed-off-by: Rafael Brito <rbrito@vmware.com>
* Adding design changes as per feedback
* Update design on prepost-backup-restore plugins
* More color on how to call plugins
Signed-off-by: Rafael Brito <rbrito@vmware.com>
* Proposing annotations to skip plugin execution
Signed-off-by: Rafael Brito <rbrito@vmware.com>
We introduces the installation option "--default-restic-prune-frequency" to make restic prune frequency configuration in the previous release, but there is a bug that make the option don't take effect. This commit fixes the bug by removing the evaluation part. The restic repository controller will take care the prune frequency for the repository
Fixes#3062
Signed-off-by: Wenkai Yin(尹文开) <yinw@vmware.com>
Check the existence of the expected service when ignoring the NodePort already allocated error
Fixes 2308
Signed-off-by: Wenkai Yin(尹文开) <yinw@vmware.com>
Test case description is "Deleted backups are deleted from object storage and backups deleted from object storage can be deleted locally",
in this test, only resource backup objects are target for verifition, restic repo verification is not included in this PR, and snapshot verification will be in later PR
Signed-off-by: danfengl <danfengl@vmware.com>
* Migrate backup sync controller from code-generator to kubebuilder
1. use kubebuilder's reconcile logic to replace controller's old logic.
2. use ginkgo and gomega to replace testing.
Signed-off-by: Xun Jiang <jxun@vmware.com>
* Fix: modify code according to comments
1. Remove DefaultBackupLocation
2. Remove unneccessary comment line
3. Add syncPeriod default value setting logic
4. Modify ListBackupStorageLocations function's context parameter
5. Add RequeueAfter parameter in Reconcile function return value
Signed-off-by: Xun Jiang <jxun@vmware.com>
* Reconcile function use context passed from parameter
1. Use context passed from parameter, instead of using Reconciler struct's context.
2. Delete Reconciler struct's context member.
3. Modify test case accordingly.
Signed-off-by: Xun Jiang <jxun@vmware.com>
* Remove backups and restic repos associated with deleted BSL(s)
Signed-off-by: F. Gold <fgold@vmware.com>
* add changelog
Signed-off-by: F. Gold <fgold@vmware.com>
* Add PR number to changelog
Signed-off-by: F. Gold <fgold@vmware.com>
* Fix typo
Signed-off-by: F. Gold <fgold@vmware.com>
* Only delete backups and restic repos and report success when without errors
Signed-off-by: F. Gold <fgold@vmware.com>
* Adds <backup-name>-itemsnapshots.gz file to backup (when provided). Also
adds DownloadTargetKindBackupItemSnapshots type to allow downloading.
Updated object store unit test
Fixes#3758
Signed-off-by: Dave Smith-Uchida <dsmithuchida@vmware.com>
* Removed redundant checks
Signed-off-by: Dave Smith-Uchida <dsmithuchida@vmware.com>
* Consolidated code for resolving actions and plugins into ActionResolver. Added BackupWithResolvers and
RestoreWithResolvers. Introduces ItemSnapshooterResolver to bring ItemSnapshotter plugins into backup and
restore. ItemSnapshotters are not used yet.
Added action_resolver_test
Signed-off-by: Dave Smith-Uchida <dsmithuchida@vmware.com>
* Addressed review comments
Signed-off-by: Dave Smith-Uchida <dsmithuchida@vmware.com>
This commit adds a restore action item plugin to reset invalid value
of "sideEffects" in resource of mutatingwebhookconfiguration and
validating webhookconfiguration.
To fix the problem the "sideEffects" is illegal for resource migrated
from v1beta1.
fixes#3516
Signed-off-by: Daniel Jiang <jiangd@vmware.com>
1. remove config/crd/v1beta1
2. remove PROJECT file
3. update controller-gen and kubebuilder version
4. generate client and CRD file
5. add changelog and remove v1beta1 CRD generated code.
6. add kubebuilder test bundle setup command.
7. due to apiextensions.k8s.io/v1beta1 is not supported, only k8s after v1.16 is supported, so remove v1.15 check.
8. add CRD and k8s suppored version update in changelog.
Signed-off-by: Xun Jiang <jxun@vmware.com>
* fix: modify generated from schedule's backup name timestamp to UTC timezone
fix#4279
When backup is created from schedule, and the backup name is not specified, a containing-timestamp generated name will be used. Due to velero client not set timezone to UTC, a local timezone will be used for the generated name.
Signed-off-by: Xun Jiang <jxun@vmware.com>
* fix: modify generated from schedule's backup name timestamp to UTC timezone
fix#4279
When backup is created from schedule, and the backup name is not specified, a containing-timestamp generated name will be used. Due to velero client not set timezone to UTC, a local timezone will be used for the generated name.
Signed-off-by: Xun Jiang <jxun@vmware.com>
* fix: modify generated from schedule's backup name timestamp to UTC timezone
fix#4279
When backup is created from schedule, and the backup name is not specified, a containing-timestamp generated name will be used. Due to velero client not set timezone to UTC, a local timezone will be used for the generated name.
Signed-off-by: Xun Jiang <jxun@vmware.com>
* modify changelog description
Reword the changelog description according to comments.
Signed-off-by: Xun Jiang <jxun@vmware.com>
Co-authored-by: jxun <jxun@jxun-a01.vmware.com>
Co-authored-by: Xun Jiang <jxun@vmware.com>
logrusr is a open source convertor, which can convert logrus logger into logr.
By using logrusr, velero can use exsiting formatted logrus logger, other than introducing zap as a new logger.
Signed-off-by: Xun Jiang <jxun@vmware.com>
Added ItemSnapshotter.proto
Added item_snapshotter Go interface
Added framework components for item_snapshotter
Updated plugins doc with ItemSnapshotter info
Added SnapshotPhase to item_snapshotter.go
ProgressOutputOutput now includes a phase as well as an error string for problems that occured
Signed-off-by: Dave Smith-Uchida <dsmithuchida@vmware.com>
* Update EnableAPIGroupVersion feature design doc as implemented
Signed-off-by: F. Gold <fgold@vmware.com>
* Design doc for issue 2082 to delete associated resources when deleting BSLs
Signed-off-by: F. Gold <fgold@vmware.com>
* Changes per @dsu-igeek review comments
Signed-off-by: F. Gold <fgold@vmware.com>
The error should be returned explicitly, because when the default URL is
used S3 will return a 301 and the response can't be handled by restic.
Fixes#4178
Signed-off-by: Daniel Jiang <jiangd@vmware.com>
When the snapshot uploading is failed, it should not be treat as completed and continue.
This commit covers both the phases of in progress and failed when uploading snapshot with vSphere plugin
Signed-off-by: Wenkai Yin(尹文开) <yinw@vmware.com>
Previously, the BSL credential field would always be set when using the
`create` command, even if no credential details were provided. This
would result in an empty `SecretKeySelector` in the BSL which would
cause operations using this BSL to fail as Velero would attempt to fetch
a `Secret` with an empty name from the K8s API server.
With this change, the `Credential` field is only set if credential
details have been specified. This change also includes some refactoring
to allow the change to be tested.
Signed-off-by: Bridget McErlean <bmcerlean@vmware.com>
fix paging items in to use list options passed by the paging function
The client-go pager sets the Limit options for the list call
to paginate the request[1]. This PR fixes the paging function
to use the options passed by the pager instead of shadowed options
This is required for the pagination to work correctly.
- simplify the pager list implementation by using pager.List()
The List() function already implements a lot of the logic that was
needed for paging here, using it simplifies the code.
1. 3f40906dd8/staging/src/k8s.io/client-go/tools/pager/pager.go (L219)
Signed-off-by: Alay Patel <alay1431@gmail.com>
Bump up restic to v0.12.1 to fix CVE-2020-26160.
Bump up module "github.com/vmware-tanzu/crash-diagnostics" to v0.3.7 to fix CVE-2020-29652.
The "github.com/vmware-tanzu/crash-diagnostics" updates client-go to v0.22.2 which introduces several break changes, this commit updates the related codes as well
Signed-off-by: Wenkai Yin(尹文开) <yinw@vmware.com>
After the PR to implement `velero debug` - #4022 is reviewed, there are some
suggestion to let the command collect more resources, this commit make
the change to the design doc to reflect those changes.
It also remove some sections that are no longer relevant after `crashd`
has made enhancement in the v0.3.4 release.
Signed-off-by: Daniel Jiang <jiangd@vmware.com>
* #4040 - documentation - adding more troubleshooting information during Restic restore
Signed-off-by: Rafael Brito <rbrito@vmware.com>
* #4040 - documentation - adding more troubleshooting information during Restic restore and minor changes
Signed-off-by: Rafael Brito <rbrito@vmware.com>
* #4040 - documentation - tweaks on restic page
Signed-off-by: Rafael Brito <rbrito@vmware.com>
If the "--snapshot-volumes=false" isn't specified explicitly, the vSphere plugin will always take snapshots for the volumes even though the "--default-volumes-to-restic" is specified
This can be removed if the logic of vSphere plugin changes
Signed-off-by: Wenkai Yin(尹文开) <yinw@vmware.com>
This commit makes several changes to `tag-release.sh` according to the
change in release process:
1. It will support a "ON_RELEASE_BRANCH" param passed via env variable.
When it's set to "TRUE". The release will be created on the commit of
branch like `release-xxx`. This enables us to create release branch
before GA and tag RC release.
2. It removes the code to push a new branch to upstream. This is
because we decided to create branch manually. For patch releases, we
will not push the change to release branch, instead, we will make
sure the release branch has all commits cherrypicked BEFORE we run
this script to tag the release.
After the change the script will focus on only tag the release, not
making other code change to release branches.
Signed-off-by: Daniel Jiang <jiangd@vmware.com>
@@ -5,15 +5,18 @@ about: Tell us about a problem you are experiencing
---
**What steps did you take and what happened:**
[A clear and concise description of what the bug is, and what commands you ran.)
<!--A clear and concise description of what the bug is, and what commands you ran.-->
**What did you expect to happen:**
**The following information will help us better understand what's going on**:
**The output of the following commands will help us better understand what's going on**:
(Pasting long output into a [GitHub gist](https://gist.github.com) or other pastebin is fine.)
_If you are using velero v1.7.0+:_
Please use `velero debug --backup <backupname> --restore <restorename>` to generate the support bundle, and attach to this issue, more options please refer to `velero debug --help`
_If you are using earlier versions:_
Please provide the output of the following commands (Pasting long output into a [GitHub gist](https://gist.github.com) or other pastebin is fine.)
-`kubectl logs deployment/velero -n velero`
-`velero backup describe <backupname>` or `kubectl get backup/<backupname> -n velero -o yaml`
-`velero backup logs <backupname>`
@@ -22,7 +25,7 @@ about: Tell us about a problem you are experiencing
**Anything else you would like to add:**
[Miscellaneous information that will assist in solving the issue.]
<!--Miscellaneous information that will assist in solving the issue.-->
- [ ] [Accepted the DCO](https://velero.io/docs/v1.5/code-standards/#dco-sign-off). Commits without the DCO will delay acceptance.
- [ ] [Created a changelog file](https://velero.io/docs/v1.5/code-standards/#adding-a-changelog) or added`/kind changelog-not-required`.
- [ ] [Created a changelog file (`make new-changelog`)](https://velero.io/docs/main/code-standards/#adding-a-changelog) or comment`/kind changelog-not-required` on this PR.
- [ ] Updated the corresponding documentation in `site/content/docs/main`.
stale-issue-message:"This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days. If a Velero team member has requested log or more information, please provide the output of the shared commands."
close-issue-message:"This issue was closed because it has been stalled for 5 days with no activity."
days-before-issue-stale:30
days-before-issue-close:5
stale-issue-message:"This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days. If a Velero team member has requested log or more information, please provide the output of the shared commands."
close-issue-message:"This issue was closed because it has been stalled for 14 days with no activity."
days-before-issue-stale:60
days-before-issue-close:14
stale-issue-label:staled
# Disable stale PRs for now; they can remain open.
days-before-pr-stale:-1
days-before-pr-close:-1
# Only issues made after Feb 09 2021.
start-date:"2021-09-02T00:00:00"
# Only make issues stale if they have these labels. Comma separated.
only-labels:"Needs info,Duplicate"
exempt-issue-labels:"Epic,Area/CLI,Area/Cloud/AWS,Area/Cloud/Azure,Area/Cloud/GCP,Area/Cloud/vSphere,Area/CSI,Area/Design,Area/Documentation,Area/Plugins,Bug,Enhancement/User,kind/requirement,kind/refactor,kind/tech-debt,limitation,Needs investigation,Needs triage,Needs Product,P0 - Hair on fire,P1 - Important,P2 - Long-term important,P3 - Wouldn't it be nice if...,Product Requirements,Restic - GA,Restic,release-blocker,Security,backlog"
Below is a list of adopters of Velero in **production environments** that have
publicly shared the details of how they use it.
**[BitGo][20]**
BitGo uses Velero backup and restore capabilities to seamlessly provision and scale fullnode statefulsets on the fly as well as having it serve an integral piece for our kubernetes disaster-recovery story.
BitGo uses Velero backup and restore capabilities to seamlessly provision and scale fullnode statefulsets on the fly as well as having it serve an integral piece for our Kubernetes disaster-recovery story.
**[Bugsnag][30]**
We use Velero for managing backups of an internal instance of our on-premise clustered solution. We also recommend our users of [on-premise Bugsnag installations][31] use Velero for [managing their own backups][32].
We use Velero for managing backups of an internal instance of our on-premise clustered solution. We also recommend our users of [on-premise Bugsnag installations](https://www.bugsnag.com/on-premise) use Velero for [managing their own backups](https://docs.bugsnag.com/on-premise/clustered/backup-restore/). <!-- Velero.io word list : ignore -->
**[Banzai Cloud][60]**
[Banzai Cloud Pipeline][61] is a Kubernetes-based microservices platform that integrates services needed for Day-1 and Day-2 operations along with first-class support both for on-prem and hybrid multi-cloud deployments. We use Velero to periodically [backup and restore these clusters in case of disasters][62].
@@ -40,7 +42,9 @@ We have integrated our [solution with Velero][11] to provide our customers with
Kyma [integrates with Velero][41] to effortlessly back up and restore Kyma clusters with all its resources. Velero capabilities allow Kyma users to define and run manual and scheduled backups in order to successfully handle a disaster-recovery scenario.
**[Red Hat][50]**
Red Hat has developed the [Cluster Application Migration Tool][51] which uses [Velero and Restic][52] to drive the migration of applications between OpenShift clusters.
Red Hat has developed 2 operators for the OpenShift platform:
- [Migration Toolkit for Containers][51] (Crane): This operator uses [Velero and Restic][52] to drive the migration of applications between OpenShift clusters.
- [OADP (OpenShift API for Data Protection) Operator][53]: This operator sets up and installs Velero on the OpenShift platform, allowing users to backup and restore applications.
**[Dell EMC][70]**
For Kubernetes environments, [PowerProtect Data Manager][71] leverages the Container Storage Interface (CSI) framework to take snapshots to back up the persistent data or the data that the application creates e.g. databases. [Dell EMC leverages Velero][72] to backup the namespace configuration files (also known as Namespace meta data) for enterprise grade data protection.
@@ -56,8 +60,14 @@ MayaData is a large user of Velero as well as a contributor. MayaData offers a D
Okteto integrates Velero in [Okteto Cloud][94] and [Okteto Enterprise][95] to periodically backup and restore our clusters for disaster recovery. Velero is also a core software building block to provide namespace cloning capabilities, a feature that allows our users cloning staging environments into their personal development namespace for providing production-like development environments.
**[Replicated][100]**<br>
Replicated uses the Velero open source project to enable snapshots in [KOTS][101] to backup Kubernetes manifests & persistent volumes. In addition to the default functionality that Velero provides, [KOTS][101] provides a detailed interface in the [Admin Console][102] that can be used to manage the storage destination and schedule, and to perform and monitor the backup and restore process.
Replicated uses the Velero open source project to enable snapshots in [KOTS][101] to backup Kubernetes manifests & persistent volumes. In addition to the default functionality that Velero provides, [KOTS][101] provides a detailed interface in the [Admin Console][102] that can be used to manage the storage destination and schedule, and to perform and monitor the backup and restore process.<br>
**[CloudCasa][103]**<br>
[Catalogic Software][104] integrates Velero with [CloudCasa][103] - A Smart Home in the Cloud for Backups. CloudCasa is a full-featured, scalable, cloud-native solution providing Kubernetes data protection, disaster recovery, and migration as a service. An option to manage existing Velero instances and an enterprise self-hosted option are also available.<br>
**[Microsoft Azure][105]**<br>
[Azure Backup for AKS][106] is an Azure native, Kubernetes aware, Enterprise ready backup for containerized applications deployed on Azure Kubernetes Service (AKS). AKS Backup utilizes Velero to perform backup and restore operations to protect stateful applications in AKS clusters.<br>
## Adding your organization to the list of Velero Adopters
If you are using Velero and would like to be included in the list of `Velero Adopters`, add an SVG version of your logo to the `site/static/img/adopters` directory in this repo and submit a [pull request][3] with your change. Name the image file something that reflects your company (e.g., if your company is called Acme, name the image acme.png). See this for an example [PR][4].
@@ -77,8 +87,6 @@ If you would like to add your logo to a future `Adopters of Velero` section on [
@@ -107,6 +107,29 @@ Lazy consensus does _not_ apply to the process of:
* Removal of maintainers from Velero
## Deprecation Policy
### Deprecation Process
Any contributor may introduce a request to deprecate a feature or an option of a feature by opening a feature request issue in the vmware-tanzu/velero GitHub project. The issue should describe why the feature is no longer needed or has become detrimental to Velero, as well as whether and how it has been superseded. The submitter should give as much detail as possible.
Once the issue is filed, a one-month discussion period begins. Discussions take place within the issue itself as well as in the community meetings. The person who opens the issue, or a maintainer, should add the date and time marking the end of the discussion period in a comment on the issue as soon as possible after it is opened. A decision on the issue needs to be made within this one-month period.
The feature will be deprecated by a supermajority vote of 50% plus one of the project maintainers at the time of the vote tallying, which is 72 hours after the end of the community meeting that is the end of the comment period. (Maintainers are permitted to vote in advance of the deadline, but should hold their votes until as close as possible to hear all possible discussion.) Votes will be tallied in comments on the issue.
Non-maintainers may add non-binding votes in comments to the issue as well; these are opinions to be taken into consideration by maintainers, but they do not count as votes.
If the vote passes, the deprecation window takes effect in the subsequent release, and the removal follows the schedule.
### Schedule
If depreciation proposal passes by supermajority votes, the feature is deprecated in the next minor release and the feature can be removed completely after two minor version or equivalent major version e.g., if feature gets deprecated in Nth minor version, then feature can be removed after N+2 minor version or its equivalent if the major version number changes.
### Deprecation Window
The deprecation window is the period from the release in which the deprecation takes effect through the release in which the feature is removed. During this period, only critical security vulnerabilities and catastrophic bugs should be fixed.
**Note:** If a backup relies on a deprecated feature, then backups made with the last Velero release before this feature is removed must still be restorable in version `n+2`. For instance, something like restic feature support, that might mean that restic is removed from the list of supported uploader types in version `n` but the underlying implementation required to restore from a restic backup won't be removed until release `n+2`.
## Updating Governance
All substantive changes in Governance require a supermajority agreement by all maintainers.
[![Build Status][1]][2] [](https://bestpractices.coreinfrastructure.org/projects/3811)
Velero (formerly Heptio Ark) gives you tools to back up and restore your Kubernetes cluster resources and persistent volumes. You can run Velero with a public cloud platform or on-premises. Velero lets you:
Velero (formerly Heptio Ark) gives you tools to back up and restore your Kubernetes cluster resources and persistent volumes. You can run Velero with a public cloud platform or on-premises.
Velero lets you:
* Take backups of your cluster and restore in case of loss.
* Migrate cluster resources to other clusters.
@@ -18,7 +20,7 @@ Velero consists of:
## Documentation
[The documentation][29] provides a getting started guide and information about building from source, architecture, extending Velero, and more.
[The documentation][29] provides a getting started guide and information about building from source, architecture, extending Velero and more.
Please use the version selector at the top of the site to ensure you are using the appropriate documentation for your version of Velero.
@@ -34,6 +36,28 @@ If you are ready to jump in and test, add code, or help with documentation, foll
See [the list of releases][6] to find out about feature changes.
### Velero compatibility matrix
The following is a list of the supported Kubernetes versions for each Velero version.
| Velero version | Expected Kubernetes version compatibility | Tested on Kubernetes version |
Velero supports IPv4, IPv6, and dual stack environments. Support for this was tested against Velero v1.8.
The Velero maintainers are continuously working to expand testing coverage, but are not able to test every combination of Velero and supported Kubernetes versions for each Velero release. The table above is meant to track the current testing coverage and the expected supported Kubernetes versions for each Velero version.
If you are interested in using a different version of Kubernetes with a given Velero version, we'd recommend that you perform testing before installing or upgrading your environment. For full information around capabilities within a release, also see the Velero [release notes](https://github.com/vmware-tanzu/velero/releases) or Kubernetes [release notes](https://github.com/kubernetes/kubernetes/tree/master/CHANGELOG). See the Velero [support page](https://velero.io/docs/latest/support-process/) for information about supported versions of Velero.
For each release, Velero maintainers run the test to ensure the upgrade path from n-2 minor release. For example, before the release of v1.10.x, the test will verify that the backup created by v1.9.x and v1.8.x can be restored using the build to be tagged as v1.10.x.
This document provides a link to the [Velero Project boards](https://github.com/vmware-tanzu/velero/projects) that serves as the up to date description of items that are in the release pipeline. The release boards have separate swim lanes based on prioritization. Most items are gathered from the community or include a feedback loop with the community. This should serve as a reference point for Velero users and contributors to understand where the project is heading, and help determine if a contribution could be conflicting with a longer term plan.
### How to help?
Discussion on the roadmap can take place in threads under [Issues](https://github.com/vmware-tanzu/velero/issues) or in [community meetings](https://velero.io/community/). Please open and comment on an issue if you want to provide suggestions, use cases, and feedback to an item in the roadmap. Please review the roadmap to avoid potential duplicated effort.
### How to add an item to the roadmap?
One of the most important aspects in any open source community is the concept of proposals. Large changes to the codebase and / or new features should be preceded by a [proposal](https://github.com/vmware-tanzu/velero/blob/main/GOVERNANCE.md#proposal-process) in our repo.
For smaller enhancements, you can open an issue to track that initiative or feature request.
We work with and rely on community feedback to focus our efforts to improve Velero and maintain a healthy roadmap.
### Current Roadmap
The following table includes the current roadmap for Velero. If you have any questions or would like to contribute to Velero, please attend a [community meeting](https://velero.io/community/) to discuss with our team. If you don't know where to start, we are always looking for contributors that will help us reduce technical, automation, and documentation debt.
Please take the timelines & dates as proposals and goals. Priorities and requirements change based on community feedback, roadblocks encountered, community contributions, etc. If you depend on a specific item, we encourage you to attend community meetings to get updated status information, or help us deliver that feature by contributing to Velero.
`Last Updated: July 2021`
#### 1.7.0 Roadmap (to be delivered early fall)
The release roadmap is split into Core items that are required for the release and desired items that may slip the release.
##### Core items
The top priority of 1.7 is to increase the technical health of Velero and be more efficient with Velero developer time by streamlining the release process and automating and expanding the E2E test suite.
|Issue|Description|
|---|---|
||Streamline release process|
||Automate the running of the E2E tests|
||Convert pre-release manual tests to automated E2E tests|
|[3493](https://github.com/vmware-tanzu/velero/issues/3493)|[Carvel](https://github.com/vmware-tanzu/velero/issues/3493) based installation (in addition to the existing *velero install* CLI).|
|[675](https://github.com/vmware-tanzu/velero/issues/675)|Velero command to generate debugging information. Will integrate with [Crashd - Crash Diagnostics](https://github.com/vmware-tanzu/velero/issues/675)|
|[3285](https://github.com/vmware-tanzu/velero/issues/3285)|Design doc for Velero plugin versioning|
@@ -12,13 +12,13 @@ The Velero project maintains the following [governance document](https://github.
Security is of the highest importance and all security vulnerabilities or suspected security vulnerabilities should be reported to Velero privately, to minimize attacks against current users of Velero before they are fixed. Vulnerabilities will be investigated and patched on the next patch (or minor) release as soon as possible. This information could be kept entirely internal to the project.
If you know of a publicly disclosed security vulnerability for Velero, please **IMMEDIATELY** contact the VMware Security Team (security@vmware.com).
If you know of a publicly disclosed security vulnerability for Velero, please **IMMEDIATELY** contact the Security Team (velero-security.pdl@broadcom.com).
**IMPORTANT: Do not file public issues on GitHub for security vulnerabilities**
To report a vulnerability or a security-related issue, please contact the VMware email address with the details of the vulnerability. The email will be fielded by the VMware Security Team and then shared with the Velero maintainers who have committer and release permissions. Emails will be addressed within 3 business days, including a detailed plan to investigate the issue and any potential workarounds to perform in the meantime. Do not report non-security-impacting bugs through this channel. Use [GitHub issues](https://github.com/vmware-tanzu/velero/issues/new/choose) instead.
To report a vulnerability or a security-related issue, please contact the email address with the details of the vulnerability. The email will be fielded by the Security Team and then shared with the Velero maintainers who have committer and release permissions. Emails will be addressed within 3 business days, including a detailed plan to investigate the issue and any potential workarounds to perform in the meantime. Do not report non-security-impacting bugs through this channel. Use [GitHub issues](https://github.com/vmware-tanzu/velero/issues/new/choose) instead.
## Proposed Email Content
@@ -29,7 +29,7 @@ Provide a descriptive subject line and in the body of the email include the foll
* Basic identity information, such as your name and your affiliation or company.
* Detailed steps to reproduce the vulnerability (POC scripts, screenshots, and logs are all helpful to us).
* Description of the effects of the vulnerability on Velero and the related hardware and software configurations, so that the VMware Security Team can reproduce it.
* Description of the effects of the vulnerability on Velero and the related hardware and software configurations, so that the Security Team can reproduce it.
* How the vulnerability affects Velero usage and an estimation of the attack surface, if there is one.
* List other projects or dependencies that were used in conjunction with Velero to produce the vulnerability.
@@ -49,7 +49,7 @@ Provide a descriptive subject line and in the body of the email include the foll
## Patch, Release, and Disclosure
The VMware Security Team will respond to vulnerability reports as follows:
The Security Team will respond to vulnerability reports as follows:
@@ -62,7 +62,7 @@ The VMware Security Team will respond to vulnerability reports as follows:
5. The Security Team will also create a [CVSS](https://www.first.org/cvss/specification-document) using the [CVSS Calculator](https://www.first.org/cvss/calculator/3.0). The Security Team makes the final call on the calculated CVSS; it is better to move quickly than making the CVSS perfect. Issues may also be reported to [Mitre](https://cve.mitre.org/) using this [scoring calculator](https://nvd.nist.gov/vuln-metrics/cvss/v3-calculator). The CVE will initially be set to private.
6. The Security Team will work on fixing the vulnerability and perform internal testing before preparing to roll out the fix.
7. The Security Team will provide early disclosure of the vulnerability by emailing the [Velero Distributors](https://groups.google.com/u/1/g/projectvelero-distributors) mailing list. Distributors can initially plan for the vulnerability patch ahead of the fix, and later can test the fix and provide feedback to the Velero team. See the section **Early Disclosure to Velero Distributors List** for details about how to join this mailing list.
8. A public disclosure date is negotiated by the VMware SecurityTeam, the bug submitter, and the distributors list. We prefer to fully disclose the bug as soon as possible once a user mitigation or patch is available. It is reasonable to delay disclosure when the bug or the fix is not yet fully understood, the solution is not well-tested, or for distributor coordination. The timeframe for disclosure is from immediate (especially if it’s already publicly known) to a few weeks. For a critical vulnerability with a straightforward mitigation, we expect the report date for the public disclosure date to be on the order of 14 business days. The VMware Security Team holds the final say when setting a public disclosure date.
8. A public disclosure date is negotiated by the SecurityTeam, the bug submitter, and the distributors list. We prefer to fully disclose the bug as soon as possible once a user mitigation or patch is available. It is reasonable to delay disclosure when the bug or the fix is not yet fully understood, the solution is not well-tested, or for distributor coordination. The timeframe for disclosure is from immediate (especially if it’s already publicly known) to a few weeks. For a critical vulnerability with a straightforward mitigation, we expect the report date for the public disclosure date to be on the order of 14 business days. The Security Team holds the final say when setting a public disclosure date.
9. Once the fix is confirmed, the Security Team will patch the vulnerability in the next patch or minor release, and backport a patch release into all earlier supported releases. Upon release of the patched version of Velero, we will follow the **Public Disclosure Process**.
@@ -79,7 +79,7 @@ The Security Team will also publish any mitigating steps users can take until th
* Use security@vmware.com to report security concerns to the VMware Security Team, who uses the list to privately discuss security issues and fixes prior to disclosure.
* Use velero-security.pdl@broadcom.com to report security concerns to the Security Team, who uses the list to privately discuss security issues and fixes prior to disclosure.
* Join the [Velero Distributors](https://groups.google.com/u/1/g/projectvelero-distributors) mailing list for early private information and vulnerability disclosure. Early disclosure may include mitigating steps and additional information on security patch releases. See below for information on how Velero distributors or vendors can apply to join this list.
@@ -107,11 +107,11 @@ To be eligible to join the [Velero Distributors](https://groups.google.com/u/1/g
## Embargo Policy
The information that members receive on the Velero Distributors mailing list must not be made public, shared, or even hinted at anywhere beyond those who need to know within your specific team, unless you receive explicit approval to do so from the VMware Security Team. This remains true until the public disclosure date/time agreed upon by the list. Members of the list and others cannot use the information for any reason other than to get the issue fixed for your respective distribution's users.
The information that members receive on the Velero Distributors mailing list must not be made public, shared, or even hinted at anywhere beyond those who need to know within your specific team, unless you receive explicit approval to do so from the Security Team. This remains true until the public disclosure date/time agreed upon by the list. Members of the list and others cannot use the information for any reason other than to get the issue fixed for your respective distribution's users.
Before you share any information from the list with members of your team who are required to fix the issue, these team members must agree to the same terms, and only be provided with information on a need-to-know basis.
In the unfortunate event that you share information beyond what is permitted by this policy, you must urgently inform the VMware Security Team (security@vmware.com) of exactly what information was leaked and to whom. If you continue to leak information and break the policy outlined here, you will be permanently removed from the list.
In the unfortunate event that you share information beyond what is permitted by this policy, you must urgently inform the Security Team (velero-security.pdl@broadcom.com) of exactly what information was leaked and to whom. If you continue to leak information and break the policy outlined here, you will be permanently removed from the list.
@@ -123,6 +123,6 @@ Send new membership requests to projectvelero-distributors@googlegroups.com. In
## Confidentiality, integrity and availability
We consider vulnerabilities leading to the compromise of data confidentiality, elevation of privilege, or integrity to be our highest priority concerns. Availability, in particular in areas relating to DoS and resource exhaustion, is also a serious security concern. The VMware Security Team takes all vulnerabilities, potential vulnerabilities, and suspected vulnerabilities seriously and will investigate them in an urgent and expeditious manner.
We consider vulnerabilities leading to the compromise of data confidentiality, elevation of privilege, or integrity to be our highest priority concerns. Availability, in particular in areas relating to DoS and resource exhaustion, is also a serious security concern. The Security Team takes all vulnerabilities, potential vulnerabilities, and suspected vulnerabilities seriously and will investigate them in an urgent and expeditious manner.
Note that we do not currently consider the default settings for Velero to be secure-by-default. It is necessary for operators to explicitly configure settings, role based access control, and other resource related features in Velero to provide a hardened Velero environment. We will not act on any security disclosure that relates to a lack of safe defaults. Over time, we will work towards improved safe-by-default configuration, taking into account backwards compatibility.
This folder contains logo images for Velero in gray (for light backgrounds) and white (for dark backgrounds like black tshirts or dark mode!) – horizontal and stacked… in .eps and .svg.
This folder contains logo images for Velero in gray (for light backgrounds) and white (for dark backgrounds like black t-shirts or dark mode!) – horizontal and stacked… in .eps and .svg.
In this release, we introduced the Unified Repository architecture to build a data path where data movers and the backup repository are decoupled and a unified backup repository could serve various data movement activities.
In this release, we also deeply integrate Velero with Kopia, specifically, Kopia's uploader modules are isolated as a generic file system uploader; Kopia's repository modules are encapsulated as the unified backup repository.
For more information, refer to the [design document](https://github.com/vmware-tanzu/velero/blob/v1.10.0/design/unified-repo-and-kopia-integration/unified-repo-and-kopia-integration.md).
#### File system backup refactor
Velero's file system backup (a.k.s. pod volume backup or formerly restic backup) is refactored as the first user of the Unified Repository architecture. Specifically, we added a new path, the Kopia path, besides the existing Restic path. While Restic path is still available and set as default, you can opt in Kopia path by specifying the `uploader-type` parameter at installation time. Meanwhile, you are free to restore from existing backups under either path, Velero dynamically switches to the correct path to process the restore.
Because of the new path, we renamed some modules and parameters, refer to the Break Changes section for more details.
For more information, visit the [file system backup document](https://velero.io/docs/v1.10/file-system-backup/) and [v1.10 upgrade guide document](https://velero.io/docs/v1.10/upgrade-to-1.10/).
Meanwhile, we've created a performance guide for both Restic path and Kopia path, which helps you to choose between the two paths and provides you the best practice to configure them under different scenarios. Please note that the results in the guide are based on our testing environments, you may get different results when testing in your own ones. For more information, visit the [performance guide document](https://velero.io/docs/v1.10/performance-guidance/).
#### Plugin versioning V1 refactor
In this release, Velero moves plugins BackupItemAction, RestoreItemAction and VolumeSnapshotterAction to version v1, this allows future plugin changes that do not support backward compatibility, so is a preparation for various complex tasks, for example, data movement tasks.
For more information, refer to the [plugin versioning design document](https://github.com/vmware-tanzu/velero/blob/v1.10.0/design/plugin-versioning.md).
#### Refactor the controllers using Kubebuilder v3
In this release we continued our code modernization work, rewriting some controllers using Kubebuilder v3. This work is ongoing and we will continue to make progress in future releases.
#### Add credentials to volume snapshot locations
In this release, we enabled dedicate credentials options to volume snapshot locations so that you can specify credentials per volume snapshot location as same as backup storage location.
For more information, please visit the [locations document](https://velero.io/docs/v1.10/locations/).
#### CSI snapshot enhancements
In this release we added several changes to enhance the robustness of CSI snapshot procedures, for example, some protection code for error handling, and a mechanism to skip exclusion checks so that CSI snapshot works with various backup resource filters.
#### Backup schedule pause/unpause
In this release, Velero supports to pause/unpause a backup schedule during or after its creation. Specifically:
At creation time, you can specify `–paused` flag to `velero schedule create` command, if so, you will create a paused schedule that will not run until it is unpaused
After creation, you can run `velero schedule pause` or `velero schedule unpause` command to pause/unpause a schedule
#### Runtime and dependencies
In order to fix CVEs, we changed Velero's runtime and dependencies as follows:
Bump go runtime to v1.18.8
Bump some core dependent libraries to newer versions
Compile Restic (v0.13.1) with go 1.18.8 instead of packaging the official binary
#### Breaking changes
Due to file system backup refactor, below modules and parameters name have been changed in this release:
`restic` daemonset is renamed to `node-agent`
`resticRepository` CR is renamed to `backupRepository`
`velero restic repo` command is renamed to `velero repo`
`velero-restic-credentials` secret is renamed to `velero-repo-credentials`
`default-volumes-to-restic` parameter is renamed to `default-volumes-to-fs-backup`
`restic-timeout` parameter is renamed to `fs-backup-timeout`
`default-restic-prune-frequency` parameter is renamed to `default-repo-maintain-frequency`
#### Upgrade
Due to the major changes of file system backup, the old upgrade steps are not suitable any more. For the new upgrade steps, visit [v1.10 upgrade guide document](https://velero.io/docs/v1.10/upgrade-to-1.10/).
#### Limitations/Known issues
In this release, Kopia backup repository (so the Kopia path of file system backup) doesn't support self signed certificate for S3 compatible storage. To track this problem, refer to this [Velero issue](https://github.com/vmware-tanzu/velero/issues/5123) or [Kopia issue](https://github.com/kopia/kopia/issues/1443).
Due to the code change in Velero, there will be some code change required in vSphere plugin, without which the functionality may be impacted. Therefore, if you are using vSphere plugin in your workflow, please hold the upgrade until the issue [#485](https://github.com/vmware-tanzu/velero-plugin-for-vsphere/issues/485) is fixed in vSphere plugin.
### All changes
* Restore ClusterBootstrap before Cluster otherwise a new default ClusterBootstrap object is create for the cluster (#5616, @ywk253100)
* Add compile restic binary for CVE fix (#5574, @qiuming-best)
* Add credential store in backup deletion controller to support VSL credential. (#5521, @blackpiglet)
* Fix issue 5505: the pod volume backups/restores except the first one fail under the kopia path if "AZURE_CLOUD_NAME" is specified (#5512, @Lyndon-Li)
* After Pod Volume Backup/Restore refactor, remove all the unreasonable appearance of "restic" word from documents (#5499, @Lyndon-Li)
* Refactor Pod Volume Backup/Restore doc to match the new behavior (#5484, @Lyndon-Li)
* Remove redundancy code block left by #5388. (#5483, @blackpiglet)
* Issue fix 5477: create the common way to support S3 compatible object storages that work for both Restic and Kopia; Keep the resticRepoPrefix parameter for compatibility (#5478, @Lyndon-Li)
* Update the k8s.io dependencies to 0.24.0.
This also required an update to github.com/bombsimon/logrusr/v3.
Removed the `WithClusterName` method
as it is a "legacy field that was
always cleared by the system and never used" as per upstream k8s
* Remove irrational "Restic" names in Velero code after the PVBR refactor (#5444, @Lyndon-Li)
* moved RIA execute input/output structs back to velero package (#5441, @sseago)
* Rename Velero pod volume restore init helper from "velero-restic-restore-helper" to "velero-restore-helper" (#5432, @Lyndon-Li)
* Skip the exclusion check for additional resources returned by BIA (#5429, @reasonerjt)
* Change B/R describe CLI to support Kopia (#5412, @allenxu404)
* Add nil check before execution of csi snapshot delete (#5401, @shubham-pampattiwar)
* update velero using klog to version v2.9.0 (#5396, @blackpiglet)
* Fix Test_prepareBackupRequest_BackupStorageLocation UT failure. (#5394, @blackpiglet)
* Rename Velero daemonset from "restic" to "node-agent" (#5390, @Lyndon-Li)
* Add some corner cases checking for CSI snapshot in backup controller. (#5388, @blackpiglet)
* Fix issue 5386: Velero providers a full URL as the S3Url while the underlying minio client only accept the host part of the URL as the endpoint and the schema should be specified separately. (#5387, @Lyndon-Li)
* Fix restore error with flag namespace-mappings (#5377, @qiuming-best)
* Pod Volume Backup/Restore Refactor: Rename parameters in CRDs and commands to remove "Restic" word (#5370, @Lyndon-Li)
* Added backupController's UT to test the prepareBackupRequest() method BackupStorageLocation processing logic (#5362, @niulechuan)
* Fix a repoEnsurer problem introduced by the refactor - The repoEnsurer didn't check "" state of BackupRepository, as a result, the function GetBackupRepository always returns without an error even though the ensreReady is specified. (#5359, @Lyndon-Li)
* Add E2E test for schedule backup (#5355, @danfengliu)
* Add useOwnerReferencesInBackup field doc for schedule. (#5353, @cleverhu)
* Clarify the help message for the default value of parameter --snapshot-volumes, when it's not set. (#5350, @blackpiglet)
* Fix issue 4874 and 4752: check the daemonset pod is running in the node where the workload pod resides before running the PVB for the pod (#5319, @Lyndon-Li)
* plugin versioning v1 refactor for VolumeSnapshotter (#5318, @sseago)
* Change the status of restore to completed from partially failed when restore empty backup (#5314, @allenxu404)
* RestoreItemAction v1 refactoring for plugin api versioning (#5312, @sseago)
* Refactor the repoEnsurer code to use controller runtime client and wrap some common BackupRepository operations to share with other modules (#5308, @Lyndon-Li)
* Remove snapshot related lister, informer and client from backup controller. (#5299, @jxun)
* change CSISnapshotTimeout from pointer to normal variables. (#5294, @cleverhu)
* Optimize code for restore exists resources. (#5293, @cleverhu)
* Add more detailed comments for labels columns. (#5291, @cleverhu)
* Add backup status checking in schedule controller. (#5283, @blackpiglet)
* Add changes for problems/enhancements found during smoking test for Kopia pod volume backup/restore (#5282, @Lyndon-Li)
* Support pause/unpause schedules (#5279, @ywk253100)
* plugin/clientmgmt refactoring for BackupItemAction v1 (#5271, @sseago)
* Don't move velero v1 plugins to new proto dir (#5263, @sseago)
* Fill gaps for Kopia path of PVBR: integrate Repo Manager with Unified Repo; pass UploaderType to PVBR backupper and restorer; pass RepositoryType to BackupRepository controller and Repo Ensurer (#5259, @Lyndon-Li)
* Add csiSnapshotTimeout for describe backup (#5252, @cleverhu)
* equip gc controller with configurable frequency (#5248, @allenxu404)
* Fix nil pointer panic when restoring StatefulSets (#5247, @divolgin)
* Add labeled and unlabeled events for PR changelog check action. (#5157, @jxun)
* VolumeSnapshotLocation refactor with kubebuilder. (#5148, @jxun)
* Delay CA file deletion in PVB controller. (#5145, @jxun)
* This commit splits the pkg/restic package into several packages to support Kopia integration works (#5143, @ywk253100)
* Kopia Integration: Add the Unified Repository Interface definition. Kopia Integration: Add the changes for Unified Repository storage config. Related Issues; #5076, #5080 (#5142, @Lyndon-Li)
* Update the CRD for kopia integration (#5135, @reasonerjt)
* Let "make shell xxx" respect GOPROXY (#5128, @reasonerjt)
* Modify BackupStoreGetter to avoid BSL spec changes (#5122, @sseago)
* Dump stack trace when the plugin server handles panic (#5110, @reasonerjt)
* Make CSI snapshot creation timeout configurable. (#5104, @jxun)
* Fix bsl validation bug: the BSL is validated continually and doesn't respect the validation period configured (#5101, @ywk253100)
* Exclude "csinodes.storage.k8s.io" and "volumeattachments.storage.k8s.io" from restore by default. (#5064, @jxun)
* Move 'velero.io/exclude-from-backup' label string to const (#5053, @niulechuan)
* Modify Github actions. (#5052, @jxun)
* Fix typo in doc, in https://velero.io/docs/main/restore-reference/ "Restore order" section, "Mamespace" should be "Namespace". (#5051, @niulechuan)
* Delete opened issues triage action. (#5041, @jxun)
* When spec.RestoreStatus is empty, don't restore status (#5008, @sseago)
* Added DownloadTargetKindCSIBackupVolumeSnapshots for retrieving the signed URL to download only the `<backup name>`-csi-volumesnapshots.json.gz and DownloadTargetKindCSIBackupVolumeSnapshotContents to download only `<backup name>`-csi-volumesnapshotcontents.json.gz in the DownloadRequest CR structure. These files are already present in the backup layout. (#4980, @anshulahuja98)
* Refactor BackupItemAction proto and related code to backupitemaction/v1 package. This is part of implementation of the plugin version design https://github.com/vmware-tanzu/velero/blob/main/design/plugin-versioning.md (#4943, @phuongatemc)
* Unified Repository Design (#4926, @Lyndon-Li)
* Add credentials to volume snapshot locations (#4864, @sseago)
This feature implements the BackupItemAction v2. BIA v2 has two new methods: Progress() and Cancel() and modifies the Execute() return value.
The API change is needed to facilitate long-running BackupItemAction plugin actions that may not be complete when the Execute() method returns. This will allow long-running BackupItemAction plugin actions to continue in the background while the Velero moves to the following plugin or the next item.
#### RestoreItemAction v2
This feature implemented the RestoreItemAction v2. RIA v2 has three new methods: Progress(), Cancel(), and AreAdditionalItemsReady(), and it modifies RestoreItemActionExecuteOutput() structure in the RIA return value.
The Progress() and Cancel() methods are needed to facilitate long-running RestoreItemAction plugin actions that may not be complete when the Execute() method returns. This will allow long-running RestoreItemAction plugin actions to continue in the background while the Velero moves to the following plugin or the next item. The AreAdditionalItemsReady() method is needed to allow plugins to tell Velero to wait until the returned additional items have been restored and are ready for use in the cluster before restoring the current item.
#### Plugin Progress Monitoring
This is intended as a replacement for the previously-approved Upload Progress Monitoring design ([Upload Progress Monitoring](https://github.com/vmware-tanzu/velero/blob/main/design/upload-progress.md)) to expand the supported use cases beyond snapshot upload to include what was previously called Async Backup/Restore Item Actions.
#### Flexible resource policy that can filter volumes to skip in the backup
This feature provides a flexible policy to filter volumes in the backup without requiring patching any labels or annotations to the pods or volumes. This policy is configured as k8s ConfigMap and maintained by the users themselves, and it can be extended to more scenarios in the future. By now, the policy rules out volumes from backup depending on the CSI driver, NFS setting, volume size, and StorageClass setting. Please refer to [policy API design](https://github.com/vmware-tanzu/velero/blob/main/design/Implemented/handle-backup-of-volumes-by-resources-filters.md#api-design)for the policy's ConifgMap format. It is not guaranteed to work on unofficial third-party plugins as it may not follow the existing backup workflow code logic of Velero.
#### Resource Filters that can distinguish cluster scope and namespace scope resources
This feature adds four new resource filters for backup. The new filters are separated into cluster scope and namespace scope. Before this feature, Velero could not filter cluster scope resources precisely. This feature provides the ability and refactors existing resource filter parameters.
#### Add a parameter for setting the Velero server connection with the k8s API server's timeout
In Velero, some code pieces need to communicate with the k8s API server. Before v1.11, these code pieces used hard-code timeout settings. This feature adds a resource-timeout parameter in the velero server binary to make it configurable.
#### Add resource list in the output of the restore describe command
Before this feature, Velero restore didn't have a restored resources list as the Velero backup. It's not convenient for users to learn what is restored. This feature adds the resources list and the handling result of the resources (including created, updated, failed, and skipped).
#### Refactor controllers with controller-runtime
In v1.11, Backup Controller and Restore controller are refactored with controller-runtime. Till v1.11, all Velero controllers use the controller-runtime framework.
#### Runtime and dependencies
To fix CVEs and keep pace with Golang, Velero made changes as follows:
* Bump Golang runtime to v1.19.8.
* Bump several dependent libraries to new versions.
* Compile Restic (v0.15.0) with Golang v1.19.8 instead of packaging the official binary.
### Breaking changes
* The Velero CSI plugin now determines whether to restore Volume's data from snapshots on the restore's restorePVs setting. Before v1.11, the CSI plugin doesn't check the restorePVs parameter setting.
### Limitations/Known issues
* The Flexible resource policy that can filter volumes to skip in the backup is not guaranteed to work on unofficial third-party plugins because the plugins may not follow the existing backup workflow code logic of Velero. The ConfigMap used as the policy is supposed to be maintained by users.
### All Changes
* Modify new scope resource filters name. (#6089, @blackpiglet)
* Make Velero not exits when EnableCSI is on and CSI snapshot not installed (#6062, @blackpiglet)
* Restore Services before Clusters (#6057, @ywk253100)
* Fixed backup deletion bug related to async operations (#6041, @sseago)
* Update Golang version to v1.19 for branch main. (#6039, @blackpiglet)
* Fix issue #5972, don't assume errorField as error type when dealing with logger.WithError (#6028, @Lyndon-Li)
* distinguish between New and InProgress operations (#6012, @sseago)
* Modify golangci.yaml file. Resolve found lint issues. (#6008, @blackpiglet)
* Remove Reference of itemsnapshotter (#5997, @reasonerjt)
* minor fixes for backup_operations_controller (#5996, @sseago)
* RIAv2 async operations controller work (#5993, @sseago)
* Follow-on fixes for BIAv2 controller work (#5971, @sseago)
* Refactor backup controller based on the controller-runtime framework. (#5969, @qiuming-best)
* Fix client wait problem after async operation change, velero backup/restore --wait should check a full list of the terminal status (#5964, @Lyndon-Li)
* Fix issue #5935, refactor the logics for backup/restore persistent log, so as to remove the contest to gzip writer (#5956, @Lyndon-Li)
* Switch the base image to distroless/base-nossl-debian11 to reduce the CVE triage efforts (#5939, @ywk253100)
* Wait for additional items to be ready before restoring current item (#5933, @sseago)
* Add configurable server setting for default timeouts (#5926, @eemcmullan)
* Add warning/error result to cmd `velero backup describe` (#5916, @allenxu404)
* Fix Dependabot alerts. Use 1.18 and 1.19 golang instead of patch image in dockerfile. Add release-1.10 and release-1.9 in Trivy daily scan. (#5911, @blackpiglet)
* Update client-go to v0.25.6 (#5907, @kaovilai)
* Limit the concurrent number for backup's VolumeSnapshot operation. (#5900, @blackpiglet)
* Fix goreleaser issue for resolving tags and updated it's version. (#5899, @anshulahuja98)
* This is to fix issue 5881, enhance the PVB tracker in two modes, Track and Taken (#5894, @Lyndon-Li)
* Add labels for velero installed namespace to support PSA. (#5873, @blackpiglet)
* Add restored resource list in the restore describe command (#5867, @ywk253100)
* Add a json output to cmd velero backup describe (#5865, @allenxu404)
* Make restore controller adopting the controller-runtime framework. (#5864, @blackpiglet)
* Replace k8s.io/apimachinery/pkg/util/clock with k8s.io/utils/clock (#5859, @hezhizhen)
* Restore finalizer and managedFields of metadata during the restoration (#5853, @ywk253100)
* BIAv2 async operations controller work (#5849, @sseago)
* Add secret restore item action to handle service account token secret (#5843, @ywk253100)
* Add new resource filters can separate cluster and namespace scope resources. (#5838, @blackpiglet)
* Correct PVB/PVR Failed Phase patching during startup (#5828, @kaovilai)
* bump up golang net to fix CVE-2022-41721 (#5812, @Lyndon-Li)
* Update CRD descriptions for SnapshotVolumes and restorePVs (#5807, @shubham-pampattiwar)
* Add mapped selected-node existence check (#5806, @blackpiglet)
* Add option "--service-account-name" to install cmd (#5802, @reasonerjt)
* Define itemoperations.json format and update DownloadRequest API (#5752, @sseago)
* Add Trivy nightly scan. (#5740, @jxun)
* Fix issue 5696, check if the repo is still openable before running the prune and forget operation, if not, try to reconnect the repo (#5715, @Lyndon-Li)
* Fix error with Restic backup empty volumes (#5713, @qiuming-best)
* new backup and restore phases to support async plugin operations:
CSI Snapshot Data Movement refers to back up CSI snapshot data from the volatile and limited production environment into durable, heterogeneous, and scalable backup storage in a consistent manner; and restore the data to volumes in the original or alternative environment.
CSI Snapshot Data Movement is useful in below scenarios:
* For on-premises users, the storage usually doesn't support durable snapshots, so it is impossible/less efficient/cost ineffective to keep volume snapshots by the storage This feature helps to move the snapshot data to a storage with lower cost and larger scale for long time preservation.
* For public cloud users, this feature helps users to fulfill the multiple cloud strategy. It allows users to back up volume snapshots from one cloud provider and preserve or restore the data to another cloud provider. Then users will be free to flow their business data across cloud providers based on Velero backup and restore
CSI Snapshot Data Movement is built according to the Volume Snapshot Data Movement design ([Volume Snapshot Data Movement](https://github.com/vmware-tanzu/velero/blob/main/design/Implemented/unified-repo-and-kopia-integration/unified-repo-and-kopia-integration.md)). More details can be found in the design.
#### Resource Modifiers
In many use cases, customers often need to substitute specific values in Kubernetes resources during the restoration process like changing the namespace, changing the storage class, etc.
To address this need, Resource Modifiers (also known as JSON Substitutions) offer a generic solution in the restore workflow. It allows the user to define filters for specific resources and then specify a JSON patch (operator, path, value) to apply to the resource. This feature simplifies the process of making substitutions without requiring the implementation of a new RestoreItemAction plugin. More details can be found in Volume Snapshot Resource Modifiers design ([Resource Modifiers](https://github.com/vmware-tanzu/velero/blob/main/design/Implemented/json-substitution-action-design.md)).
#### Multiple VolumeSnapshotClasses
Prior to version 1.12, the Velero CSI plugin would choose the VolumeSnapshotClass in the cluster based on matching driver names and the presence of the "velero.io/csi-volumesnapshot-class" label. However, this approach proved inadequate for many user scenarios.
With the introduction of version 1.12, Velero now offers support for multiple VolumeSnapshotClasses in the CSI Plugin, enabling users to select a specific class for a particular backup. More details can be found in Multiple VolumeSnapshotClasses design ([Multiple VolumeSnapshotClasses](https://github.com/vmware-tanzu/velero/blob/main/design/Implemented/multiple-csi-volumesnapshotclass-support.md)).
#### Restore Finalizer
Before v1.12, the restore controller would only delete restore resources but wouldn’t delete restore data from the backup storage location when the command `velero restore delete` was executed. The only chance Velero deletes restores data from the backup storage location is when the associated backup is deleted.
In this version, Velero introduces a finalizer that ensures the cleanup of all associated data for restores when running the command `velero restore delete`.
#### Runtime and dependencies
To fix CVEs and keep pace with Golang, Velero made changes as follows:
* Bump Golang runtime to v1.20.7.
* Bump several dependent libraries to new versions.
* Bump Kopia to v0.13.
### Breaking changes
* Prior to v1.12, the parameter `uploader-type` for Velero installation had a default value of "restic". However, starting from this version, the default value has been changed to "kopia". This means that Velero will now use Kopia as the default path for file system backup.
* The ways of setting CSI snapshot time have changed in v1.12. First, the sync waiting time for creating a snapshot handle in the CSI plugin is changed from the fixed 10 minutes into backup.Spec.CSISnapshotTimeout. The second, the async waiting time for VolumeSnapshot and VolumeSnapshotContent's status turning into `ReadyToUse` in operation uses the operation's timeout. The default value is 4 hours.
* As from [Velero helm chart v4.0.0](https://github.com/vmware-tanzu/helm-charts/releases/tag/velero-4.0.0), it supports multiple BSL and VSL, and the BSL and VSL have changed from the map into a slice, and[ this breaking change](https://github.com/vmware-tanzu/helm-charts/pull/413) is not backward compatible. So it would be best to change the BSL and VSL configuration into slices before the Upgrade.
### Limitations/Known issues
* The Azure plugin supports Azure AD Workload identity way, but it only works for Velero native snapshots. It cannot support filesystem backup and snapshot data mover scenarios.
### All Changes
* Fixes #6498. Get resource client again after restore actions in case resource's gv is changed. This is an improvement of pr #6499, to support group changes. A group change usually happens in a restore plugin which is used for resource conversion: convert a resource from a not supported gv to a supported gv (#6634, @27149chen)
* Add API support for volMode block, only error for now. (#6608, @shawn-hurley)
* Fix how the AWS credentials are obtained from configuration (#6598, @aws_creds)
* Add performance E2E test (#6569, @qiuming-best)
* Non default s3 credential profiles work on Unified Repository Provider (kopia) (#6558, @kaovilai)
* Fix issue #6571, fix the problem for restore item operation to set the errors correctly so that they can be recorded by Velero restore and then reflect the correct status for Velero restore. (#6594, @Lyndon-Li)
* Fix issue 6575, flush the repo after delete the snapshot, otherwise, the changes(deleting repo snapshot) cannot be committed to the repo. (#6587, @Lyndon-Li)
* Delete moved snapshots when the backup is deleted (#6547, @reasonerjt)
* check if restore crd exist before operating restores (#6544, @allenxu404)
* Remove PVC's selector in backup's PVC action. (#6481, @blackpiglet)
* Delete the expired deletebackuprequests that are stuck in "InProgress" (#6476, @reasonerjt)
* Fix issue #6534, reset PVB CR's StorageLocation to the latest one during backup sync as same as the backup CR. Also fix similar problem with DataUploadResult for data mover restore. (#6533, @Lyndon-Li)
* Fix issue #6519. Restrict the client manager of node-agent server to include only Velero resources from the server's namespace, otherwise, the controllers will try to reconcile CRs from all the installed Velero namespaces. (#6523, @Lyndon-Li)
* Track the skipped PVC and print the summary in backup log (#6496, @reasonerjt)
* Add restore finalizer to clean up external resources (#6479, @allenxu404)
* fix: Typos and add more spell checking rules to CI (#6415, @mateusoliveira43)
* Add missing CompletionTimestamp and metrics when restore moved into terminal phase in restoreOperationsReconciler (#6397, @Nutrymaco)
* Add support for resource Modifications in the restore flow. Also known as JSON Substitutions. (#6452, @anshulahuja98)
* Remove dependency of the legacy client code from pkg/cmd directory part 2 (#6497, @blackpiglet)
* Add data upload and download metrics (#6493, @allenxu404)
* Fix issue 6490, If a backup/restore has multiple async operations and one operation fails while others are still in-progress, when all the operations finish, the backup/restore will be set as Completed falsely (#6491, @Lyndon-Li)
* Velero Plugins no longer need kopia indirect dependency in their go.mod (#6484, @kaovilai)
* Remove dependency of the legacy client code from pkg/cmd directory (#6469, @blackpiglet)
* Add support for OpenStack CSI drivers topology keys (#6464, @openstack-csi-topology-keys)
* Add exit code log and possible memory shortage warning log for Restic command failure. (#6459, @blackpiglet)
* Add UT cases for pkg/podvolume (#6336, @Lyndon-Li)
* Remove Wait VolumeSnapshot to ReadyToUse logic. (#6327, @blackpiglet)
* Enhance the code because of #6297, the return value of GetBucketRegion is not recorded, as a result, when it fails, we have no way to get the cause (#6326, @Lyndon-Li)
* Skip updating status when CRDs are restored (#6325, @reasonerjt)
* Include namespaces needed by namespaced-scope resources in backup. (#6320, @blackpiglet)
* Update metrics when backup failed with validation error (#6318, @ywk253100)
* Add the code for data mover backup expose (#6308, @Lyndon-Li)
* Fix a PVR issue for generic data path -- the namespace remap was not honored, and enhance the code for better error handling (#6303, @Lyndon-Li)
* Add default values for defaultItemOperationTimeout and itemOperationSyncFrequency in velero CLI (#6298, @shubham-pampattiwar)
* Add UT cases for pkg/repository (#6296, @Lyndon-Li)
* Fix issue #5875. Since Kopia has supported IAM, Velero should not require static credentials all the time (#6283, @Lyndon-Li)
* Fixed a bug where status.progress is not getting updated for backups. (#6276, @kkothule)
* Add code change for async generic data path that is used by both PVB/PVR and data mover (#6226, @Lyndon-Li)
* Add data mover CRD under v2alpha1, include DataUpload CRD and DataDownload CRD (#6176, @Lyndon-Li)
* Remove any dataSource or dataSourceRef fields from PVCs in PVC BIA for cases of
prior PVC restores with CSI (#6111, @eemcmullan)
* Add the design for Volume Snapshot Data Movement (#5968, @Lyndon-Li)
* Fix issue #5123, Kopia repository supports self-cert CA for S3 compatible storage. (#6268, @Lyndon-Li)
* Bump up Kopia to v0.13 (#6248, @Lyndon-Li)
* log volumes to backup to help debug why `IsPodRunning` is called. (#6232, @kaovilai)
* Enable errcheck linter and resolve found issues (#6208, @blackpiglet)
* Enable more linters, and remove mal-functioned milestoned issue action. (#6194, @blackpiglet)
* Enable stylecheck linter and resolve found issues. (#6185, @blackpiglet)
* Fix issue #6182. If pod is not running, don't treat it as an error, let it go and leave a warning. (#6184, @Lyndon-Li)
* Enable staticcheck and resolve found issues (#6183, @blackpiglet)
* Enable linter revive and resolve found errors: part 2 (#6177, @blackpiglet)
* Enable linter revive and resolve found errors: part 1 (#6173, @blackpiglet)
* Fix usestdlibvars and whitespace linters issues. (#6162, @blackpiglet)
* Update Golang to v1.20 for main. (#6158, @blackpiglet)
* Make GetPluginConfig accessible from other packages. (#6151, @tkaovila)
* Ignore not found error during patching managedFields (#6136, @ywk253100)
* Fix the goreleaser issues and add a new goreleaser action (#6109, @blackpiglet)
Velero introduced the Resource Modifiers in v1.12.0. This feature allows users to specify a ConfigMap with a set of rules to modify the resources during restoration. However, only the JSON Patch is supported when creating the rules, and JSON Patch has some limitations, which cannot cover all use cases. In v1.13.0, Velero adds new support for JSON Merge Patch and Strategic Merge Patch, which provide more power and flexibility and allow users to use the same ConfigMap to apply patches on the resources. More design details can be found in [Support JSON Merge Patch and Strategic Merge Patch in Resource Modifiers](https://github.com/vmware-tanzu/velero/blob/main/design/Implemented/merge-patch-and-strategic-in-resource-modifier.md) design. For instructions on how to use the feature, please refer to the [Resource Modifiers](https://velero.io/docs/v1.13/restore-resource-modifiers/) doc.
#### Node-Agent Concurrency
Velero data movement activities from fs-backups and CSI snapshot data movements run in Velero node-agent, so may be hosted by every node in the cluster and consume resources (i.e. CPU, memory, network bandwidth) from there. With v1.13, users are allowed to configure how many data movement activities (a.k.a, loads) run in each node globally or by node, so that users can better leverage the performance of Velero data movement activities and the resource consumption in the cluster. For more information, check the [Node-Agent Concurrency](https://velero.io/docs/v1.13/node-agent-concurrency/) document.
#### Parallel Files Upload Options
Velero now supports configurable options for parallel files upload when using Kopia uploader to do fs-backups or CSI snapshot data movements which makes speed up backup possible.
For more information, please check [Here](https://velero.io/docs/v1.13/backup-reference/#parallel-files-upload).
#### Write Sparse Files Options
If using fs-restore or CSI snapshot data movements, it’s supported to write sparse files during restore. For more information, please check [Here](https://velero.io/docs/v1.13/restore-reference/#write-sparse-files).
#### Backup Describe
In v1.13, the Backup Volume section is added to the velero backup describe command output. The backup Volumes section describes information for all the volumes included in the backup of various backup types, i.e. native snapshot, fs-backup, CSI snapshot, and CSI snapshot data movement. Particularly, the velero backup description now supports showing the information of CSI snapshot data movements, which is not supported in v1.12.
Additionally, backup describe command will not check EnableCSI feature gate from client side, so if a backup has volumes with CSI snapshot or CSI snapshot data movement, backup describe command always shows the corresponding information in its output.
#### Backup's new VolumeInfo metadata
Create a new metadata file in the backup repository's backup name sub-directory to store the backup-including PVC and PV information. The information includes the backing-up method of the PVC and PV data, snapshot information, and status. The VolumeInfo metadata file determines how the PV resource should be restored. The Velero downstream software can also use this metadata file to get a summary of the backup's volume data information.
#### Enhancement for CSI Snapshot Data Movements when Velero Pod Restart
When performing backup and restore operations, enhancements have been implemented for Velero server pods or node agents to ensure that the current backup or restore process is not stuck or interrupted after restart due to certain exceptional circumstances.
#### New status fields added to show hook execution details
Hook execution status is now included in the backup/restore CR status and displayed in the backup/restore describe command output. Specifically, it will show the number of hooks which attempted to execute under the HooksAttempted field and the number of hooks which failed to execute under the HooksFailed field.
#### AWS SDK Bump Up
Bump up AWS SDK for Go to version 2, which offers significant performance improvements in CPU and memory utilization over version 1.
#### Azure AD/Workload Identity Support
Azure AD/Workload Identity is the recommended approach to do the authentication with Azure services/AKS, Velero has introduced support for Azure AD/Workload Identity on the Velero Azure plugin side in previous releases, and in v1.13.0 Velero adds new support for Kopia operations(file system backup/data mover/etc.) with Azure AD/Workload Identity.
#### Runtime and dependencies
To fix CVEs and keep pace with Golang, Velero made changes as follows:
* Bump Golang runtime to v1.21.6.
* Bump several dependent libraries to new versions.
* Bump Kopia to v0.15.0.
### Breaking changes
* Backup describe command: due to the backup describe output enhancement, some existing information (i.e. the output for native snapshot, CSI snapshot, and fs-backup) has been moved to the Backup Volumes section with some format changes.
* API type changes: changes the field [DataMoverConfig](https://github.com/vmware-tanzu/velero/blob/v1.13.0/pkg/apis/velero/v2alpha1/data_upload_types.go#L54) in DataUploadSpec from `*map[string][string]`` to `map[string]string`
* Velero install command: due to the issue [#7264](https://github.com/vmware-tanzu/velero/issues/7264), v1.13.0 introduces a break change that make the informer cache enabled by default to keep the actual behavior consistent with the helper message(the informer cache is disabled by default before the change).
### Limitations/Known issues
* The backup's VolumeInfo metadata doesn't have the information updated in the async operations. This function could be supported in v1.14 release.
### Note
* Velero introduces the informer cache which is enabled by default. The informer cache improves the restore performance but may cause higher memory consumption. Increase the memory limit of the Velero pod or disable the informer cache by specifying the `--disable-informer-cache` option when installing Velero if you get the OOM error.
### Deprecation announcement
* The generated k8s clients, informers, and listers are deprecated in the Velero v1.13 release. They are put in the Velero repository's pkg/generated directory. According to the n+2 supporting policy, the deprecated are kept for two more releases. The pkg/generated directory should be deleted in the v1.15 release.
* After the backup VolumeInfo metadata file is added to the backup, Velero decides how to restore the PV resource according to the VolumeInfo content. To support the backup generated by the older version of Velero, the old logic is also kept. The support for the backup without the VolumeInfo metadata file will be kept for two releases. The support logic will be deleted in the v1.15 release.
### All Changes
* Make "disable-informer-cache" option false(enabled) by default to keep it consistent with the help message (#7294, @ywk253100)
* Do not set "targetNamespace" to namespace items (#7274, @reasonerjt)
* Fix issue #7244. By the end of the upload, check the outstanding incomplete snapshots and delete them by calling ApplyRetentionPolicy (#7245, @Lyndon-Li)
* Adjust the newline output of resource list in restore describer (#7238, @allenxu404)
* Remove the redundant newline in backup describe output (#7229, @allenxu404)
* Fix issue #7189, data mover generic restore - don't assume the first volume as the restore volume (#7201, @Lyndon-Li)
* Update CSIVolumeSnapshotsCompleted in backup's status and the metric
during backup finalize stage according to async operations content. (#7184, @blackpiglet)
* Refactor DownloadRequest Stream function (#7175, @blackpiglet)
* Add `--skip-immediately` flag to schedule commands; `--schedule-skip-immediately` server and install (#7169, @kaovilai)
* Add node-agent concurrency doc and change the config name from dataPathConcurrency to loadCocurrency (#7161, @Lyndon-Li)
* Enhance hooks tracker by adding a returned error to record function (#7153, @allenxu404)
* Track the skipped PV when SnapshotVolumes set as false (#7152, @reasonerjt)
* Add more linters part 2. (#7151, @blackpiglet)
* Fix issue #7135, check pod status before checking node-agent pod status (#7150, @Lyndon-Li)
* Treat namespace as a regular restorable item (#7143, @reasonerjt)
* Fix issue #6695, add describe for data mover backups (#7125, @Lyndon-Li)
* Add hooks status to backup/restore CR (#7117, @allenxu404)
* Include plugin name in the error message by operations (#7115, @reasonerjt)
* Fix issue #7068, due to a behavior of CSI external snapshotter, manipulations of VS and VSC may not be handled in the same order inside external snapshotter as the API is called. So add a protection finalizer to ensure the order (#7102, @Lyndon-Li)
* Generate VolumeInfo for backup. (#7100, @blackpiglet)
* Fix issue #7094, fallback to full backup if previous snapshot is not found (#7096, @Lyndon-Li)
* Fix issue #7068, due to an behavior of CSI external snapshotter, manipulations of VS and VSC may not be handled in the same order inside external snapshotter as the API is called. So add a protection finalizer to ensure the order (#7095, @Lyndon-Li)
* Skip syncing the backup which doesn't contain backup metadata (#7081, @ywk253100)
* Fix issue #6693, partially fail restore if CSI snapshot is involved but CSI feature is not ready, i.e., CSI feature gate is not enabled or CSI plugin is not installed. (#7077, @Lyndon-Li)
* Truncate the credential file to avoid the change of secret content messing it up (#7072, @ywk253100)
* improve discoveryHelper.Refresh() in restore (#7069, @27149chen)
* Add DataUpload Result and CSI VolumeSnapshot check for restore PV. (#7061, @blackpiglet)
* Add the implementation for design #6950, configurable data path concurrency (#7059, @Lyndon-Li)
* Make data mover fail early (#7052, @qiuming-best)
* Remove dependency of generated client part 3. (#7051, @blackpiglet)
* Update Backup.Status.CSIVolumeSnapshotsCompleted during finalize (#7046, @kaovilai)
* Remove the Velero generated client. (#7041, @blackpiglet)
* Fix issue #7027, data mover backup exposer should not assume the first volume as the backup volume in backup pod (#7038, @Lyndon-Li)
* Read information from the credential specified by BSL (#7034, @ywk253100)
* Fix #6857. Added check for matching Owner References when synchronizing backups, removing references that are not found/have mismatched uid. (#7032, @deefdragon)
* Add description markers for dataupload and datadownload CRDs (#7028, @shubham-pampattiwar)
* Add HealthCheckNodePort deletion logic for Service restore. (#7026, @blackpiglet)
* Fix inconsistent behavior of Backup and Restore hook execution (#7022, @allenxu404)
* Fix #6964. Don't use csiSnapshotTimeout (10 min) for waiting snapshot to readyToUse for data mover, so as to make the behavior complied with CSI snapshot backup (#7011, @Lyndon-Li)
* restore: Use warning when Create IsAlreadyExist and Get error (#7004, @kaovilai)
* Bump kopia to 0.15.0 (#7001, @Lyndon-Li)
* Make Kopia file parallelism configurable (#7000, @qiuming-best)
* It is a valid case that the Status.RestoreSize field in VolumeSnapshot is not set, if so, get the volume size from the source PVC to create the backup PVC (#6976, @Lyndon-Li)
* Check whether the action is a CSI action and whether CSI feature is enabled, before executing the action. (#6968, @blackpiglet)
* Add the PV backup information design document. (#6962, @blackpiglet)
* Change controller-runtime List option from MatchingFields to ListOptions (#6958, @blackpiglet)
* Add the design for node-agent concurrency (#6950, @Lyndon-Li)
* Import auth provider plugins (#6947, @0x113)
* Fix #6668, add a limitation for file system restore parallelism with other types of restores (CSI snapshot restore, CSI snapshot movement restore) (#6946, @Lyndon-Li)
* Add MSI Support for Azure plugin. (#6938, @yanggangtony)
* Partially fix #6734, guide Kubernetes' scheduler to spread backup pods evenly across nodes as much as possible, so that data mover backup could achieve better parallelism (#6926, @Lyndon-Li)
* Bump up aws sdk to aws-sdk-go-v2 (#6923, @reasonerjt)
* Optional check if targeted container is ready before executing a hook (#6918, @Ripolin)
* Support JSON Merge Patch and Strategic Merge Patch in Resource Modifiers (#6917, @27149chen)
* Fix issue 6913: Velero Built-in Datamover: Backup stucks in phase WaitingForPluginOperations when Node Agent pod gets restarted (#6914, @shubham-pampattiwar)
* Set ParallelUploadAboveSize as MaxInt64 and flush repo after setting up policy so that policy is retrieved correctly by TreeForSource (#6885, @Lyndon-Li)
* Replace the base image with paketobuildpacks image (#6883, @ywk253100)
* Fix issue #6859, move plugin depending podvolume functions to util pkg, so as to remove the dependencies to unnecessary repository packages like kopia, azure, etc. (#6875, @Lyndon-Li)
* Fix #6861. Only Restic path requires repoIdentifier, so for non-restic path, set the repoIdentifier fields as empty in PVB and PVR and also remove the RepoIdentifier column in the get output of PVBs and PVRs (#6872, @Lyndon-Li)
* Add volume types filter in resource policies (#6863, @qiuming-best)
* change the metrics backup_attempt_total default value to 1. (#6838, @yanggangtony)
* Bump kopia to v0.14 (#6833, @Lyndon-Li)
* Retry failed create when using generateName (#6830, @sseago)
* Fix issue #6786, always delete VSC regardless of the deletion policy (#6827, @Lyndon-Li)
* Proposal to support JSON Merge Patch and Strategic Merge Patch in Resource Modifiers (#6797, @27149chen)
* Fix the node-agent missing metrics-address defines. (#6784, @yanggangtony)
* Fix default BSL setting not work (#6771, @qiuming-best)
* Update restore controller logic for restore deletion (#6770, @ywk253100)
* Fix issue #6753, remove the check for read-only BSL in restore async operation controller since Velero cannot fully support read-only mode BSL in restore at present (#6757, @Lyndon-Li)
* Fix issue #6647, add the --default-snapshot-move-data parameter to Velero install, so that users don't need to specify --snapshot-move-data per backup when they want to move snapshot data for all backups (#6751, @Lyndon-Li)
* Use old(origin) namespace in resource modifier conditions in case namespace may change during restore (#6724, @27149chen)
* Perf improvements for existing resource restore (#6723, @sseago)
* Remove schedule-related metrics on schedule delete (#6715, @nilesh-akhade)
* Kubernetes 1.27 new job label batch.kubernetes.io/controller-uid are deleted during restore per https://github.com/kubernetes/kubernetes/pull/114930 (#6712, @kaovilai)
* This pr made some improvements in Resource Modifiers: 1. add label selector 2. change the field name from groupKind to groupResource (#6704, @27149chen)
* Make Kopia support Azure AD (#6686, @ywk253100)
* Add support for block volumes with Kopia (#6680, @dzaninovic)
* Delete PartiallyFailed orphaned backups as well as Completed ones (#6649, @sseago)
* Add CSI snapshot data movement doc (#6637, @Lyndon-Li)
* Fixes #6636, skip subresource in resource discovery (#6635, @27149chen)
* Add `orLabelSelectors` for backup, restore commands (#6475, @nilesh-akhade)
* fix run preHook and postHook on completed pods (#5211, @cleverhu)
#### The maintenance work for kopia/restic backup repositories is run in jobs
Since velero started using kopia as the approach for filesystem-level backup/restore, we've noticed an issue when velero connects to the kopia backup repositories and performs maintenance, it sometimes consumes excessive memory that can cause the velero pod to get OOM Killed. To mitigate this issue, the maintenance work will be moved out of velero pod to a separate kubernetes job, and the user will be able to specify the resource request in "velero install".
#### Volume Policies are extended to support more actions to handle volumes
In an earlier release, a flexible volume policy was introduced to skip certain volumes from a backup. In v1.14 we've made enhancement to this policy to allow the user to set how the volumes should be backed up. The user will be able to set "fs-backup" or "snapshot" as value of “action" in the policy and velero will backup the volumes accordingly. This enhancement allows the user to achieve a fine-grained control like "opt-in/out" without having to update the target workload. For more details please refer to https://velero.io/docs/v1.14/resource-filtering/#supported-volumepolicy-actions
#### Node Selection for Data Movement Backup
In velero the data movement flow relies on datamover pods, and these pods may take substantial resources and keep running for a long time. In v1.14, the user will be able to create a configmap to define the eligible nodes on which the datamover pods are launched. For more details refer to https://velero.io/docs/v1.14/data-movement-backup-node-selection/
#### VolumeInfo metadata for restored volumes
In v1.13, we introduced volumeinfo metadata for backup to help velero CLI and downstream adopter understand how velero handles each volume during backup. In v1.14, similar metadata will be persisted for each restore. velero CLI is also updated to bring more info in the output of "velero restore describe".
#### "Finalizing" phase is introduced to restores
The "Finalizing" phase is added to the state transition flow to restore, which helps us fix several issues: The labels added to PVs will be restored after the data in the PV is restored via volumesnapshotter. The post restore hook will be executed after datamovement is finished.
#### Certificate-based authentication support for Azure
Besides the service principal with secret(password)-based authentication, Velero introduces the new support for service principal with certificate-based authentication in v1.14.0. This approach enables you to adopt a phishing resistant authentication by using conditional access policies, which better protects Azure resources and is the recommended way by Azure.
### Runtime and dependencies
* Golang runtime: v1.22.2
* kopia: v0.17.0
### Limitations/Known issues
* For the external BackupItemAction plugins that take snapshots for PVs, such as vsphere plugin. If the plugin checks the value of the field "snapshotVolumes" in the backup spec as a criteria for snapshot, the settings in the volume policy will not take effect. For example, if the "snapshotVolumes" is set to False in the backup spec, but a volume meets the condition in the volume policy for "snapshot" action, because the plugin will not check the settings in the volume policy, the plugin will not take snapshot for the volume. For more details please refer to #7818
### Breaking changes
* CSI plugin has been merged into velero repo in v1.14 release. It will be installed by default as an internal plugin, and should not be installed via "–plugins " parameter in "velero install" command.
* The default resource requests and limitations for node agent are removed in v1.14, to make the node agent pods have the QoS class of "BestEffort", more details please refer to #7391
* There's a change in namespace filtering behavior during backup: In v1.14, when the includedNamespaces/excludedNamespaces fields are not set and the labelSelector/OrLabelSelectors are set in the backup spec, the backup will only include the namespaces which contain the resources that match the label selectors, while in previous releases all namespaces will be included in the backup with such settings. More details refer to #7105
* Patching the PV in the "Finalizing" state may cause the restore to be in "PartiallyFailed" state when the PV is blocked in "Pending" state, while in the previous release the restore may end up being in "Complete" state. For more details refer to #7866
### All Changes
* Fix backup log to show error string, not index (#7805, @piny940)
* Modify the volume helper logic. (#7794, @blackpiglet)
* Add documentation for extension of volume policy feature (#7779, @shubham-pampattiwar)
* Surface errors when waiting for backupRepository and timeout occurs (#7762, @kaovilai)
* Add existingResourcePolicy restore CR validation to controller (#7757, @kaovilai)
* Fix condition matching in resource modifier when there are multiple rules (#7715, @27149chen)
* Bump up the version of KinD and k8s in github actions (#7702, @reasonerjt)
* Implementation for Extending VolumePolicies to support more actions (#7664, @shubham-pampattiwar)
* Migrate from `github.com/Azure/azure-storage-blob-go` to `github.com/Azure/azure-sdk-for-go/sdk/storage/azblob` (#7598, @mmorel-35)
* When Included/ExcludedNamespaces are omitted, and LabelSelector or OrLabelSelector is used, namespaces without selected items are excluded from backup. (#7697, @blackpiglet)
* Display CSI snapshot restores in restore describe (#7687, @reasonerjt)
* Use specific credential rather than the credential chain for Azure (#7680, @ywk253100)
* Modify hook docs for clarity on displaying hook execution results (#7679, @allenxu404)
* Wait for results of restore exec hook executions in Finalizing phase instead of InProgress phase (#7619, @allenxu404)
* migrating to `sdk/resourcemanager/**/arm**` from `services/**/mgmt/**` (#7596, @mmorel-35)
* Bump up to go1.22 (#7666, @reasonerjt)
* Fix issue #7648. Adjust the exposing logic to avoid exposing failure and snapshot leak when expose fails (#7662, @Lyndon-Li)
* Track and persist restore volume info (#7630, @reasonerjt)
* Check the existence of the namespaces provided in the "--include-namespaces" option (#7569, @ywk253100)
* Add the finalization phase to the restore workflow (#7377, @allenxu404)
* Upgrade the version of go plugin related libs/tools (#7373, @ywk253100)
* Check resource Group Version and Kind is available in cluster before attempting restore to prevent being stuck. (#7322, @kaovilai)
* Merge CSI plugin code into Velero. (#7609, @blackpiglet)
* Fix issue #7391, remove the default constraint for node-agent pods (#7488, @Lyndon-Li)
* Fix DataDownload fails during restore for empty PVC workload (#7521, @qiuming-best)
Data transfer activities for CSI Snapshot Data Movement are moved from node-agent pods to dedicate backupPods or restorePods. This brings many benefits such as:
- This avoids to access volume data through host path, while host path access is privileged and may involve security escalations, which are concerned by users.
- This enables users to to control resource (i.e., cpu, memory) allocations in a granular manner, e.g., control them per backup/restore of a volume.
- This enhances the resilience, crash of one data movement activity won't affect others.
- This prevents unnecessary full backup because of host path changes after workload pods restart.
- For more information, check the design https://github.com/vmware-tanzu/velero/blob/main/design/Implemented/vgdp-micro-service/vgdp-micro-service.md.
#### Item Block concepts and ItemBlockAction (IBA) plugin
Item Block concepts are introduced for resource backups to help to achieve multiple thread backups. Specifically, correlated resources are categorized in the same item block and item blocks could be processed concurrently in multiple threads.
ItemBlockAction plugin is introduced to help Velero to categorize resources into item blocks. At present, Velero provides built-in IBAs for pods and PVCs and Velero also supports customized IBAs for any resources.
In v1.15, Velero doesn't support multiple thread process of item blocks though item block concepts and IBA plugins are fully supported. The multiple thread support will be delivered in future releases.
For more information, check the design https://github.com/vmware-tanzu/velero/blob/main/design/backup-performance-improvements.md.
#### Node selection for repository maintenance job
Repository maintenance are resource consuming tasks, Velero now allows you to configure the nodes to run repository maintenance jobs, so that you can run repository maintenance jobs in idle nodes or avoid them to run in nodes hosting critical workloads.
To support the configuration, a new repository maintenance configuration configMap is introduced.
For more information, check the document https://velero.io/docs/v1.15/repository-maintenance/.
#### Backup PVC read-only configuration
In 1.15, Velero allows you to configure the data mover backupPods to read-only mount the backupPVCs. In this way, the data mover expose process could be significantly accelerated for some storages (i.e., ceph).
To support the configuration, a new backup PVC configuration configMap is introduced.
For more information, check the document https://velero.io/docs/v1.15/data-movement-backup-pvc-configuration/.
#### Backup PVC storage class configuration
In 1.15, Velero allows you to configure the storageclass used by the data mover backupPods. In this way, the provision of backupPVCs don't need to adhere to the same pattern as workload PVCs, e.g., for a backupPVC, it only needs one replica, whereas, the a workload PVC may have multiple replicas.
To support the configuration, the same backup PVC configuration configMap is used.
For more information, check the document https://velero.io/docs/v1.15/data-movement-backup-pvc-configuration/.
#### Backup repository data cache configuration
The backup repository may need to cache data on the client side during various repository operations, i.e., read, write, maintenance, etc. The cache consumes the root file system space of the pod where the repository access happens.
In 1.15, Velero allows you to configure the total size of the cache per repository. In this way, if your pod doesn't have enough space in its root file system, the pod won't be evicted due to running out of ephemeral storage.
To support the configuration, a new backup repository configuration configMap is introduced.
For more information, check the document https://velero.io/docs/v1.15/backup-repository-configuration/.
#### Performance improvements
In 1.15, several performance related issues/enhancements are included, which makes significant performance improvements in specific scenarios:
- There was a memory leak of Velero server after plugin calls, now it is fixed, see issue https://github.com/vmware-tanzu/velero/issues/7925
- The `client-burst/client-qps` parameters are automatically inherited to plugins, so that you can use the same velero server parameters to accelerate the plugin executions when large number of API server calls happen, see issue https://github.com/vmware-tanzu/velero/issues/7806
- Maintenance of Kopia repository takes huge memory in scenarios that huge number of files have been backed up, Velero 1.15 has included the Kopia upstream enhancement to fix the problem, see issue https://github.com/vmware-tanzu/velero/issues/7510
### Runtime and dependencies
Golang runtime: v1.22.8
kopia: v0.17.0
### Limitations/Known issues
#### Read-only backup PVC may not work on SELinux environments
Due to an issue of Kubernetes upstream, if a volume is mounted as read-only in SELinux environments, the read privilege is not granted to any user, as a result, the data mover backup will fail. On the other hand, the backupPVC must be mounted as read-only in order to accelerate the data mover expose process.
Therefore, a user option is added in the same backup PVC configuration configMap, once the option is enabled, the backupPod container will run as a super privileged container and disable SELinux access control. If you have concern in this super privileged container or you have configured [pod security admissions](https://kubernetes.io/docs/concepts/security/pod-security-admission/) and don't allow super privileged containers, you will not be able to use this read-only backupPVC feature and lose the benefit to accelerate the data mover expose process.
### Breaking changes
#### Deprecation of Restic
Restic path for fs-backup is in deprecation process starting from 1.15. According to [Velero deprecation policy](https://github.com/vmware-tanzu/velero/blob/v1.15/GOVERNANCE.md#deprecation-policy), for 1.15, if Restic path is used the backup/restore of fs-backup still creates and succeeds, but you will see warnings in below scenarios:
- When `--uploader-type=restic` is used in Velero installation
- When Restic path is used to create backup/restore of fs-backup
#### node-agent configuration name is configurable
Previously, a fixed name is searched for node-agent configuration configMap. Now in 1.15, Velero allows you to customize the name of the configMap, on the other hand, the name must be specified by node-agent server parameter `node-agent-configmap`.
#### Repository maintenance job configurations in Velero server parameter are moved to repository maintenance job configuration configMap
In 1.15, below Velero server parameters for repository maintenance jobs are moved to the repository maintenance job configuration configMap. While for back compatibility reason, the same Velero sever parameters are preserved as is. But the configMap is recommended and the same values in the configMap take preference if they exist in both places:
```
--keep-latest-maintenance-jobs
--maintenance-job-cpu-request
--maintenance-job-mem-request
--maintenance-job-cpu-limit
--maintenance-job-mem-limit
```
#### Changing PVC selected-node feature is deprecated
In 1.15, the [Changing PVC selected-node feature](https://velero.io/docs/v1.15/restore-reference/#changing-pvc-selected-node) enters deprecation process and will be removed in future releases according to [Velero deprecation policy](https://github.com/vmware-tanzu/velero/blob/v1.15/GOVERNANCE.md#deprecation-policy). Usage of this feature for any purpose is not recommended.
### All Changes
* add no-relabeling option to backupPVC configmap (#8288, @sseago)
* only set spec.volumes readonly if PVC is readonly for datamover (#8284, @sseago)
* Add labels to maintenance job pods (#8256, @shubham-pampattiwar)
* Add the Carvel package related resources to the restore priority list (#8228, @ywk253100)
* Reduces indirect imports for plugin/framework importers (#8208, @kaovilai)
* Add controller name to periodical_enqueue_source. The logger parameter now includes an additional field with the value of reflect.TypeOf(objList).String() and another field with the value of controllerName. (#8198, @kaovilai)
* Update Openshift SCC docs link (#8170, @shubham-pampattiwar)
* Modify E2E and perf test report generated directory (#8129, @blackpiglet)
* Add docs for backup pvc config support (#8119, @shubham-pampattiwar)
* Delete generated k8s client and informer. (#8114, @blackpiglet)
* Add support for backup PVC configuration (#8109, @shubham-pampattiwar)
* ItemBlock model and phase 1 (single-thread) workflow changes (#8102, @sseago)
* Fix issue #8032, make node-agent configMap name configurable (#8097, @Lyndon-Li)
* Fix issue #8072, add the warning messages for restic deprecation (#8096, @Lyndon-Li)
* Fix issue #7620, add backup repository configuration implementation and support cacheLimit configuration for Kopia repo (#8093, @Lyndon-Li)
* Patch dbr's status when error happens (#8086, @reasonerjt)
* According to design #7576, after node-agent restarts, if a DU/DD is in InProgress status, re-capture the data mover ms pod and continue the execution (#8085, @Lyndon-Li)
* Updates to IBM COS documentation to match current version (#8082, @gjanders)
* Data mover micro service DUCR/DDCR controller refactor according to design #7576 (#8074, @Lyndon-Li)
* add retries with timeout to existing patch calls that moves a backup/restore from InProgress/Finalizing to a final status phase. (#8068, @kaovilai)
* Data mover micro service restore according to design #7576 (#8061, @Lyndon-Li)
In v1.16, Velero supports to run in Windows clusters and backup/restore Windows workloads, either stateful or stateless:
* Hybrid build and all-in-one image: the build process is enhanced to build an all-in-one image for hybrid CPU architecture and hybrid platform. For more information, check the design https://github.com/vmware-tanzu/velero/blob/main/design/multiple-arch-build-with-windows.md
* Deployment in Windows clusters: Velero node-agent, data mover pods and maintenance jobs now support to run in both linux and Windows nodes
* Data mover backup/restore Windows workloads: Velero built-in data mover supports Windows workloads throughout its full cycle, i.e., discovery, backup, restore, pre/post hook, etc. It automatically identifies Windows workloads and schedules data mover pods to the right group of nodes
Check the epic issue https://github.com/vmware-tanzu/velero/issues/8289 for more information.
#### Parallel Item Block backup
v1.16 now supports to back up item blocks in parallel. Specifically, during backup, correlated resources are grouped in item blocks and Velero backup engine creates a thread pool to back up the item blocks in parallel. This significantly improves the backup throughput, especially when there are large scale of resources.
Pre/post hooks also belongs to item blocks, so will also run in parallel along with the item blocks.
Users are allowed to configure the parallelism through the `--item-block-worker-count` Velero server parameter. If not configured, the default parallelism is 1.
For more information, check issue https://github.com/vmware-tanzu/velero/issues/8334.
#### Data mover restore enhancement in scalability
In previous releases, for each volume of WaitForFirstConsumer mode, data mover restore is only allowed to happen in the node that the volume is attached. This severely degrades the parallelism and the balance of node resource(CPU, memory, network bandwidth) consumption for data mover restore (https://github.com/vmware-tanzu/velero/issues/8044).
In v1.16, users are allowed to configure data mover restores running and spreading evenly across all nodes in the cluster. The configuration is done through a new flag `ignoreDelayBinding` in node-agent configuration (https://github.com/vmware-tanzu/velero/issues/8242).
#### Data mover enhancements in observability
In 1.16, some observability enhancements are added:
* Output various statuses of intermediate objects for failures of data mover backup/restore (https://github.com/vmware-tanzu/velero/issues/8267)
* Output the errors when Velero fails to delete intermediate objects during clean up (https://github.com/vmware-tanzu/velero/issues/8125)
The outputs are in the same node-agent log and enabled automatically.
#### CSI snapshot backup/restore enhancement in usability
In previous releases, a unnecessary VolumeSnapshotContent object is retained for each backup and synced to other clusters sharing the same backup storage location. And during restore, the retained VolumeSnapshotContent is also restored unnecessarily.
In 1.16, the retained VolumeSnapshotContent is removed from the backup, so no unnecessary CSI objects are synced or restored.
For more information, check issue https://github.com/vmware-tanzu/velero/issues/8725.
#### Backup Repository Maintenance enhancement in resiliency and observability
In v1.16, some enhancements of backup repository maintenance are added to improve the observability and resiliency:
* A new backup repository maintenance history section, called `RecentMaintenance`, is added to the BackupRepository CR. Specifically, for each BackupRepository, including start/completion time, completion status and error message. (https://github.com/vmware-tanzu/velero/issues/7810)
* Running maintenance jobs are now recaptured after Velero server restarts. (https://github.com/vmware-tanzu/velero/issues/7753)
* The maintenance job will not be launched for readOnly BackupStorageLocation. (https://github.com/vmware-tanzu/velero/issues/8238)
* The backup repository will not try to initialize a new repository for readOnly BackupStorageLocation. (https://github.com/vmware-tanzu/velero/issues/8091)
* Users now are allowed to configure the intervals of an effective maintenance in the way of `normalGC`, `fastGC` and `eagerGC`, through the `fullMaintenanceInterval` parameter in backupRepository configuration. (https://github.com/vmware-tanzu/velero/issues/8364)
#### Volume Policy enhancement of filtering volumes by PVC labels
In v1.16, Volume Policy is extended to support filtering volumes by PVC labels. (https://github.com/vmware-tanzu/velero/issues/8256).
#### Resource Status restore per object
In v1.16, users are allowed to define whether to restore resource status per object through an annotation `velero.io/restore-status` set on the object. (https://github.com/vmware-tanzu/velero/issues/8204).
#### Velero Restore Helper binary is merged into Velero image
In v1.16, Velero banaries, i.e., velero, velero-helper and velero-restore-helper, are all included into the single Velero image. (https://github.com/vmware-tanzu/velero/issues/8484).
### Runtime and dependencies
Golang runtime: 1.23.7
kopia: 0.19.0
### Limitations/Known issues
#### Limitations of Windows support
* fs-backup is not supported for Windows workloads and so fs-backup runs only in linux nodes for linux workloads
* Backup/restore of NTFS extended attributes/advanced features are not supported, i.e., Security Descriptors, System/Hidden/ReadOnly attributes, Creation Time, NTFS Streams, etc.
### All Changes
* Add third party annotation support for maintenance job, so that the declared third party annotations could be added to the maintenance job pods (#8812, @Lyndon-Li)
* Fix issue #8803, use deterministic name to create backupRepository (#8808, @Lyndon-Li)
* Refactor restoreItem and related functions to differentiate the backup resource name and the restore target resource name. (#8797, @blackpiglet)
* ensure that PV is removed before VS is deleted (#8777, @ix-rzi)
* host_pods should not be mandatory to node-agent (#8774, @mpryc)
* Log doesn't show pv name, but displays %!s(MISSING) instead (#8771, @hu-keyu)
* Fix issue #8754, add third party annotation support for data mover (#8770, @Lyndon-Li)
* Add docs for volume policy with labels as a criteria (#8759, @shubham-pampattiwar)
* Move pvc annotation removal from CSI RIA to regular PVC RIA (#8755, @sseago)
* Add doc for maintenance history (#8747, @Lyndon-Li)
* Fix issue #8733, add doc for restorePVC (#8737, @Lyndon-Li)
* Fix issue #8426, add doc for Windows support (#8736, @Lyndon-Li)
* Return directly if no pod volme backup are tracked (#8728, @ywk253100)
* Fix issue #8706, for immediate volumes, there is no selected-node annotation on PVC, so deduce the attached node from VolumeAttachment CRs (#8715, @Lyndon-Li)
* Add labels as a criteria for volume policy (#8713, @shubham-pampattiwar)
* Copy SecurityContext from Containers[0] if present for PVR (#8712, @sseago)
* Support pushing images to an insecure registry (#8703, @ywk253100)
* Modify golangci configuration to make it work. (#8695, @blackpiglet)
* Run backup post hooks inside ItemBlock synchronously (#8694, @ywk253100)
* Add docs for object level status restore (#8693, @shubham-pampattiwar)
* Clean artifacts generated during CSI B/R. (#8684, @blackpiglet)
* Don't run maintenance on the ReadOnly BackupRepositories. (#8681, @blackpiglet)
* Fix issue #8418, add Windows toleration to data mover pods (#8606, @Lyndon-Li)
* Check the PVB status via podvolume Backupper rather than calling API server to avoid API server issue (#8603, @ywk253100)
* Fix issue #8067, add tmp folder (/tmp for linux, C:\Windows\Temp for Windows) as an alternative of udmrepo's config file location (#8602, @Lyndon-Li)
* Data mover restore for Windows (#8594, @Lyndon-Li)
* Skip patching the PV in finalization for failed operation (#8591, @reasonerjt)
* Fix issue #8579, set event burst to block event broadcaster from filtering events (#8590, @Lyndon-Li)
* Configurable Kopia Maintenance Interval. backup-repository-configmap adds an option for configurable`fullMaintenanceInterval` where fastGC (12 hours), and eagerGC (6 hours) allowing for faster removal of deleted velero backups from kopia repo. (#8581, @kaovilai)
* Fix issue #7753, recall repo maintenance history on Velero server restart (#8580, @Lyndon-Li)
* Clear validation errors when schedule is valid (#8575, @ywk253100)
* Merge restore helper image into Velero server image (#8574, @ywk253100)
* Don't include excluded items in ItemBlocks (#8572, @sseago)
* fs uploader and block uploader support Windows nodes (#8569, @Lyndon-Li)
* Fix issue #8418, support data mover backup for Windows nodes (#8555, @Lyndon-Li)
* Fix issue #8044, allow users to ignore delay binding the restorePVC of data mover when it is in WaitForFirstConsumer mode (#8550, @Lyndon-Li)
* Fix issue #8539, validate uploader types when o.CRDsOnly is set to false only since CRD installation doesn't rely on uploader types (#8538, @Lyndon-Li)
* Fix issue #7810, add maintenance history for backupRepository CRs (#8532, @Lyndon-Li)
* Make fs-backup work on linux nodes with the new Velero deployment and disable fs-backup if the source/target pod is running in non-linux node (#8424) (#8518, @Lyndon-Li)
* Fix issue: backup schedule pause/unpause doesn't work (#8512, @ywk253100)
* Fix backup post hook issue #8159 (caused by #7571): always execute backup post hooks after PVBs are handled (#8509, @ywk253100)
* Fix issue #8267, enhance the error message when expose fails (#8508, @Lyndon-Li)
* Fix issue #8416, #8417, deploy Velero server and node-agent in linux/Windows hybrid env (#8504, @Lyndon-Li)
* Design to add label selector as a criteria for volume policy (#8503, @shubham-pampattiwar)
* Related to issue #8485, move the acceptedByNode and acceptedTimestamp to Status of DU/DD CRD (#8498, @Lyndon-Li)
* Add SecurityContext to restore-helper (#8491, @reasonerjt)
* Fix issue #8433, add third party labels to data mover pods when the same labels exist in node-agent pods (#8487, @Lyndon-Li)
* Fix issue #8485, add an accepted time so as to count the prepare timeout (#8486, @Lyndon-Li)
* Fix issue #8125, log diagnostic info for data mover exposers when expose timeout (#8482, @Lyndon-Li)
* Fix issue #8415, implement multi-arch build and Windows build (#8476, @Lyndon-Li)
* Pin kopia to 0.18.2 (#8472, @Lyndon-Li)
* Add nil check for updating DataUpload VolumeInfo in finalizing phase (#8471, @blackpiglet)
* Allowing Object-Level Resource Status Restore (#8464, @shubham-pampattiwar)
* For issue #8429. Add the design for multi-arch build and windows build (#8459, @Lyndon-Li)
* Upgrade go.mod k8s.io/ go.mod to v0.31.3 and implemented proper logger configuration for both client-go and controller-runtime libraries. This change ensures that logging format and level settings are properly applied throughout the codebase. The update improves logging consistency and control across the Velero system. (#8450, @kaovilai)
* Add Design for Allowing Object-Level Resource Status Restore (#8403, @shubham-pampattiwar)
* Fix issue #8391, check ErrCancelled from suffix of data mover pod's termination message (#8396, @Lyndon-Li)
* Fix issue #8394, don't call closeDataPath in VGDP callbacks, otherwise, the VGDP cleanup will hang (#8395, @Lyndon-Li)
* Adding support in velero Resource Policies for filtering PVs based on additional VolumeAttributes properties under CSI PVs (#8383, @mayankagg9722)
* Add --item-block-worker-count flag to velero install and server (#8380, @sseago)
* Make BackedUpItems thread safe (#8366, @sseago)
* Include --annotations flag in backup and restore create commands (#8354, @alromeros)
* Use aggregated discovery API to discovery API groups and resources (#8353, @ywk253100)
* Copy "envFrom" from Velero server when creating maintenance jobs (#8343, @evhan)
* Set hinting region to use for GetBucketRegion() in pkg/repository/config/aws.go (#8297, @kaovilai)
* Bump up version of client-go and controller-runtime (#8275, @ywk253100)
* fix(pkg/repository/maintenance): don't panic when there's no container statuses (#8271, @mcluseau)
* Add Backup warning for inclusion of NS managed by ArgoCD (#8257, @shubham-pampattiwar)
* Added tracking for deleted namespace status check in restore flow. (#8233, @sangitaray2021)
In v1.17, Velero fs-backup is modernized to the micro-service architecture, which brings below benefits:
- Many features that were absent to fs-backup are now available, i.e., load concurrency control, cancel, resume on restart, etc.
- fs-backup is more robust, the running backup/restore could survive from node-agent restart; and the resource allocation is in a more granular manner, the failure of one backup/restore won't impact others.
- The resource usage of node-agent is steady, especially, the node-agent pods won't request huge memory and hold it for a long time.
Check design https://github.com/vmware-tanzu/velero/blob/main/design/vgdp-micro-service-for-fs-backup/vgdp-micro-service-for-fs-backup.md for more details.
#### fs-backup support Windows cluster
In v1.17, Velero fs-backup supports to backup/restore Windows workloads. By leveraging the new micro-service architecture for fs-backup, data mover pods could run in Windows nodes and backup/restore Windows volumes. Together with CSI snapshot data movement for Windows which is delivered in 1.16, Velero now supports Windows workload backup/restore in full scenarios.
Check design https://github.com/vmware-tanzu/velero/blob/main/design/vgdp-micro-service-for-fs-backup/vgdp-micro-service-for-fs-backup.md for more details.
#### Volume group snapshot support
In v1.17, Velero supports [volume group snapshots](https://kubernetes.io/blog/2024/12/18/kubernetes-1-32-volume-group-snapshot-beta/) which is a beta feature in Kubernetes upstream, for both CSI snapshot backup and CSI snapshot data movement. This allows a snapshot to be taken from multiple volumes at the same point-in-time to achieve write order consistency, which is helpful to achieve better data consistency when multiple volumes being backed up are correlated.
Check the document https://velero.io/docs/main/volume-group-snapshots/ for more details.
#### Priority class support
In v1.17, [Kubernetes priority class](https://kubernetes.io/docs/concepts/scheduling-eviction/pod-priority-preemption/#priorityclass) is supported for all modules across Velero. Specifically, users are allowed to configure priority class to Velero server, node-agent, data mover pods, backup repository maintenance jobs separately.
Check design https://github.com/vmware-tanzu/velero/blob/main/design/Implemented/priority-class-name-support_design.md for more details.
#### Scalability and Resiliency improvements of data movers
##### Reduce excessive number of data mover pods in Pending state
In v1.17, Velero allows users to set a `PrepareQueueLength` in the node-agent configuration, data mover pods and volumes out of this number won't be created until data path quota is available, so that excessive number cluster resources won't be taken unnecessarily, which is particularly helpful for large scale environments. This improvement applies to all kinds of data movements, including fs-backup and CSI snapshot data movement.
Check design https://github.com/vmware-tanzu/velero/blob/main/design/node-agent-load-soothing.md for more details.
##### Enhancement on node-agent restart handling for data movements
In v1.17, data movements in all phases could survive from node-agent restart and resume themselves; when a data movement gets orphaned in special cases, e.g., cluster node absent, it could also be canceled appropriately after the restart. This improvement applies to all kinds of data movements, including fs-backup and CSI snapshot data movement.
Check issue https://github.com/vmware-tanzu/velero/issues/8534 for more details.
##### CSI snapshot data movement restore node-selection and node-selection by storage class
In v1.17, CSI snapshot data movement restore acquires the same node-selection capability as backup, that is, users could specify which nodes can/cannot run data mover pods for both backup and restore now. And users are also allowed to configure the node-selection per storage class, which is particularly helpful to the environments where a storage class are not usable by all cluster nodes.
Check issue https://github.com/vmware-tanzu/velero/issues/8186 and https://github.com/vmware-tanzu/velero/issues/8223 for more details.
#### Include/exclude policy support for resource policy
In v1.17, Velero resource policy supports `includeExcludePolicy` besides the existing `volumePolicy`. This allows users to set include/exclude filters for resources in a resource policy configmap, so that these filters are reusable among multiple backups.
Check the document https://velero.io/docs/main/resource-filtering/#creating-resource-policies:~:text=resources%3D%22*%22-,Resource%20policies,-Velero%20provides%20resource for more details.
### Runtime and dependencies
Golang runtime: 1.24.6
kopia: 0.21.1
### Limitations/Known issues
### Breaking changes
#### Deprecation of Restic
According to [Velero deprecation policy](https://github.com/vmware-tanzu/velero/blob/main/GOVERNANCE.md#deprecation-policy), backup of fs-backup under Restic path is removed in v1.17, so `--uploader-type=restic` is not a valid installation configuration anymore. This means you cannot create a backup under Restic path, but you can still restore from the previous backups under Restic path until v1.19.
#### Repository maintenance job configurations are removed from Velero server parameter
Since the repository maintenance job configurations are moved to repository maintenance job configMap, in v1.17 below Velero sever parameters are removed:
- --keep-latest-maintenance-jobs
- --maintenance-job-cpu-request
- --maintenance-job-mem-request
- --maintenance-job-cpu-limit
- --maintenance-job-mem-limit
### All Changes
* Add ConfigMap parameters validation for install CLI and server start. (#9200, @blackpiglet)
* Add priorityclasses to high priority restore list (#9175, @kaovilai)
* Introduced context-based logger for backend implementations (Azure, GCS, S3, and Filesystem) (#9168, @priyansh17)
* Fix issue #9140, add os=windows:NoSchedule toleration for Windows pods (#9165, @Lyndon-Li)
* Remove the repository maintenance job parameters from velero server. (#9147, @blackpiglet)
* Add include/exclude policy to resources policy (#9145, @reasonerjt)
* Add ConfigMap support for keepLatestMaintenanceJobs with CLI parameter fallback (#9135, @shubham-pampattiwar)
* Fix the dd and du's node affinity issue. (#9130, @blackpiglet)
* Remove the WaitUntilVSCHandleIsReady from vs BIA. (#9124, @blackpiglet)
* Add comprehensive Volume Group Snapshots documentation with workflow diagrams and examples (#9123, @shubham-pampattiwar)
* Update CSI Snapshot Data Movement doc for issue #8534, #8185 (#9113, @Lyndon-Li)
* Fix issue #8986, refactor fs-backup doc after VGDP Micro Service for fs-backup (#9112, @Lyndon-Li)
* Return error if timeout when checking server version (#9111, @ywk253100)
* Update "Default Volumes to Fs Backup" to "File System Backup (Default)" (#9105, @shubham-pampattiwar)
* Fix issue #9077, don't block backup deletion on list VS error (#9100, @Lyndon-Li)
* Bump up Kopia to v0.21.1 (#9098, @Lyndon-Li)
* Add imagePullSecrets inheritance for VGDP pod and maintenance job. (#9096, @blackpiglet)
* Avoid checking the VS and VSC status in the backup finalizing phase. (#9092, @blackpiglet)
* Fix issue #9053, Always remove selected-node annotation during PVC restore when no node mapping exists. Breaking change: Previously, the annotation was preserved if the node existed. (#9076, @Lyndon-Li)
* Enable parameterized kubelet mount path during node-agent installation (#9074, @longxiucai)
* Fix issue #8857, support third party tolerations for data mover pods (#9072, @Lyndon-Li)
* Fix issue #8813, remove restic from the valid uploader type (#9069, @Lyndon-Li)
* Fix issue #8185, allow users to disable pod volume host path mount for node-agent (#9068, @Lyndon-Li)
* Fix #8344, add the design for a mechanism to soothe creation of data mover pods for DataUpload, DataDownload, PodVolumeBackup and PodVolumeRestore (#9067, @Lyndon-Li)
* Fix #8344, add a mechanism to soothe creation of data mover pods for DataUpload, DataDownload, PodVolumeBackup and PodVolumeRestore (#9064, @Lyndon-Li)
* Add Gauge metric for BSL availability (#9059, @reasonerjt)
* Fix missing defaultVolumesToFsBackup flag output in Velero describe backup cmd (#9056, @shubham-pampattiwar)
* Allow for proper tracking of multiple hooks per container (#9048, @sseago)
* Make the backup repository controller doesn't invalidate the BSL on restart (#9046, @blackpiglet)
* Removed username/password credential handling from newConfigCredential as azidentity.UsernamePasswordCredentialOptions is reported as deprecated. (#9041, @priyansh17)
* Remove dependency with VolumeSnapshotClass in DataUpload. (#9040, @blackpiglet)
* Fix issue #8961, cancel PVB/PVR on Velero server restart (#9031, @Lyndon-Li)
* fix: update mc command in minio-deployment example (#8982, @vishal-chdhry)
* Fix issue #8957, add design for VGDP MS for fs-backup (#8979, @Lyndon-Li)
* Add BSL status check for backup/restore operations. (#8976, @blackpiglet)
* Mark BackupRepository not ready when BSL changed (#8975, @ywk253100)
* Add support for [distributed snapshotting](https://github.com/kubernetes-csi/external-snapshotter/tree/4cedb3f45790ac593ebfa3324c490abedf739477?tab=readme-ov-file#distributed-snapshotting) (#8969, @flx5)
* Fix issue #8534, refactor dm controllers to tolerate cancel request in more cases, e.g., node restart, node drain (#8952, @Lyndon-Li)
* The backup and restore VGDP affinity enhancement implementation. (#8949, @blackpiglet)
* Remove CSI VS and VSC metadata from backup. (#8946, @blackpiglet)
* Extend PVCAction itemblock plugin to support grouping PVCs under VGS label key (#8944, @shubham-pampattiwar)
* Copy security context from origin pod (#8943, @farodin91)
* Add support for configuring VGS label key (#8938, @shubham-pampattiwar)
* Add VolumeSnapshotContent into the RIA and the mustHave resource list. (#8924, @blackpiglet)
* Mounted cloud credentials should not be world-readable (#8919, @sseago)
* Warn for not found error in patching managed fields (#8902, @sseago)
#### Velero plugins now support handling volumes created by the CSI drivers of cloud providers
Versions 1.4 of the Velero plugins for AWS, Azure and GCP now support snapshotting and restoring the persistent volumes provisioned by CSI driver via the APIs of the cloud providers. With this enhancement, users can backup and restore the persistent volumes on these cloud providers without using the Velero CSI plugin. The CSI plugin will remain beta and the feature flag `EnableCSI` will be disabled by default.
For the version of the plugins and the CSI drivers they support respectively please see the table:
We've verified the functionality of Velero on IPv6 dual stack by successfully running the E2E test on IPv6 dual stack environment.
#### Refactor the controllers using Kubebuilder v3
In this release we continued our code modernization work, rewriting some controllers using Kubebuilder v3. This work is ongoing and we will continue to make progress in future releases.
#### Enhancements to E2E test cases
More test cases have been added to the E2E test suite to improve the release health.
#### Respect the cron setting of scheduled backup
The creation time is now taken into account to calculate the next run for scheduled backup.
#### Deleting BSLs also cleans up related resources
When a Backup Storage Location (BSL) is deleted, backup and Restic repository resources will also be deleted.
#### Breaking changes
Starting in v1.8, Velero will only support Kubernetes v1 CRD meaning that Velero v1.8+ will only run on Kubernetes v1.16+. Before upgrading, make sure you are running a supported Kubernetes version. For more information, see our [compatibility matrix](https://github.com/vmware-tanzu/velero#velero-compatibility-matrix).
#### Upload Progress Monitoring and Item Snapshotter
Item Snapshotter plugin API was merged. This will support both Upload Progress
monitoring and the planned Data Mover. Upload Progress monitoring PRs are
in progress for 1.9.
### All changes
* E2E test on ssr object with controller namespace mix-ups (#4521, @mqiu)
* Check whether the volume is provisioned by CSI driver or not by the annotation as well (#4513, @ywk253100)
* Initialize the labels field of `velero backup-location create` option to avoid #4484 (#4491, @ywk253100)
* Fix e2e 2500 namespaces scale test timeout problem (#4480, @mqiu)
* Add backup deletion e2e test (#4401, @danfengliu)
* Return the error when getting backup store in backup deletion controller (#4465, @reasonerjt)
* Ignore the provided port is already allocated error when restoring the LoadBalancer service (#4462, @ywk253100)
* Add rbac and annotation test cases (#4455, @mqiu)
* remove --crds-version in velero install command. (#4446, @jxun)
* Upgrade e2e test vsphere plugin (#4440, @mqiu)
* Fix e2e test failures for the inappropriate optimize of velero install (#4438, @mqiu)
* Limit backup namespaces on test resource filtering cases (#4437, @mqiu)
* Bump up Go to 1.17 (#4431, @reasonerjt)
* Added `<backup name>`-itemsnapshots.json.gz to the backup format. This file exists
when item snapshots are taken and contains an array of volume.Itemsnapshots
containing the information about the snapshots. This will not be used unless
upload progress monitoring and item snapshots are enabled and an ItemSnapshot
plugin is used to take snapshots.
Also added DownloadTargetKindBackupItemSnapshots for retrieving the signed URL to download only the `<backup name>`-itemsnapshots.json.gz part of a backup for use by
`velero backup describe`. (#4429, @dsmithuchida)
* Migrate backup sync controller from code-generator to kubebuilder. (#4423, @jxun)
* Added UploadProgressFeature flag to enable Upload Progress Monitoring and Item
Snapshotters. (#4416, @dsmithuchida)
* Added BackupWithResolvers and RestoreWithResolvers calls. Will eventually replace Backup and Restore methods.
Adds ItemSnapshotters to Backup and Restore workflows. (#4410, @dsu)
* Build for darwin-arm64 (#4409, @epk)
* Add resource filtering test cases (#4404, @mqiu)
* Fix the issue that the backup cannot be deleted after the application uninstalled (#4398, @ywk253100)
* Ignore the `provided port is already allocated` error when restoring the `NodePort` service (#4336, @ywk253100)
* Fixed an issue with the `backup-location create` command where the BSL Credential field would be set to an invalid empty SecretKeySelector when no credential details were provided. (#4322, @zubron)
* fix buggy pager func (#4306, @alaypatel07)
* Don't create a backup immediately after creating a schedule (#4281, @ywk253100)
* Fix CVE-2020-29652 and CVE-2020-26160 (#4274, @ywk253100)
* Refine tag-release.sh to align with change in release process (#4185, @reasonerjt)
* Fix plugins incompatible issue in upgrade test (#4141, @danfengliu)
* Verify group before treating resource as cohabiting (#4126, @sseago)
- No VolumeSnapshot will be left in the source namespace of the workload
- Report metrics for CSI snapshots
More improvements please refer to [CSI plugin improvement](https://github.com/vmware-tanzu/velero/issues?q=is%3Aissue+label%3A%22CSI+plugin+-+GA+-+phase1%22+is%3Aclosed)
With these improvements we'll provide official support for CSI snapshots on AKS/EKS clusters. (with CSI plugin v0.3.0)
#### Refactor the controllers using Kubebuilder v3
In this release we continued our code modernization work, rewriting some controllers using Kubebuilder v3. This work is ongoing and we will continue to make progress in future releases.
#### Optionally restore status on selected resources
Options are added to the CLI and Restore spec to control the group of resources whose status will be restored.
#### ExistingResourcePolicy in the restore API
Users can choose to overwrite or patch the existing resources during restore by setting this policy.
#### Upgrade integrated Restic version and add skip TLS validation in Restic command
Upgrade integrated Restic version, which will resolve some of the CVEs, and support skip TLS validation in Restic backup/restore.
#### Breaking changes
With bumping up the API to v1 in CSI plugin, the v0.3.0 CSI plugin will only work for Kubernetes v1.20+
### All changes
* restic: add full support for setting SecurityContext for restore init container from configMap. (#4084, @MatthieuFin)
* Add metrics backup_items_total and backup_items_errors (#4296, @tobiasgiese)
* Convert PodVolumebackup controller to the Kubebuilder framework (#4436, @fgold)
* Skip not mounted volumes when backing up (#4497, @dkeven)
* Update doc for v1.8 (#4517, @reasonerjt)
* Fix bug to make the restic prune frequency configurable (#4518, @ywk253100)
* Add E2E test of backups sync from BSL (#4545, @mqiu)
* Fix: OrderedResources in Schedules (#4550, @dbrekau)
* Skip volumes of non-running pods when backing up (#4584, @bynare)
* E2E SSR test add retry mechanism and logs (#4591, @mqiu)
* Add pushing image to GCR in github workflow to facilitate some environments that have rate limitation to docker hub, e.g. vSphere. (#4623, @jxun)
* Add existingResourcePolicy to Restore API (#4628, @shubham-pampattiwar)
* Fix E2E backup namespaces test (#4634, @qiuming-best)
* Update image used by E2E test to gcr.io (#4639, @jxun)
* Add multiple label selector support to Velero Backup and Restore APIs (#4650, @shubham-pampattiwar)
* Convert Pod Volume Restore resource/controller to the Kubebuilder framework (#4655, @ywk253100)
* Update --use-owner-references-in-backup description in velero command line. (#4660, @jxun)
* Avoid overwritten hook's exec.container parameter when running pod command executor. (#4661, @jxun)
* Support regional pv for GKE (#4680, @jxun)
* Bypass the remap CRD version plugin when v1beta1 CRD is not supported (#4686, @reasonerjt)
* Add GINKGO_SKIP to support skip specific case in e2e test. (#4692, @jxun)
* Add --pod-labels flag to velero install (#4694, @j4m3s-s)
* Enable coverage in test.sh and upload to codecov (#4704, @reasonerjt)
* Mark the BSL as "Unavailable" when gets any error and add a new field "Message" to the status to record the error message (#4719, @ywk253100)
* Support multiple skip option for E2E test (#4725, @jxun)
* Add PriorityClass to the AdditionalItems of Backup's PodAction and Restore's PodAction plugin to backup and restore PriorityClass if it is used by a Pod. (#4740, @phuongatemc)
* Insert all restore errors and warnings into restore log. (#4743, @sseago)
* Refactor schedule controller with kubebuilder (#4748, @ywk253100)
* Garbage collector now adds labels to backups that failed to delete for BSLNotFound, BSLCannotGet, BSLReadOnly reasons. (#4757, @kaovilai)
* Skip podvolumerestore creation when restore excludes pv/pvc (#4769, @half-life666)
* Add parameter for e2e test to support modify kibishii install path. (#4778, @jxun)
* Ensure the restore hook applied to new namespace based on the mapping (#4779, @reasonerjt)
* Add ability to restore status on selected resources (#4785, @RafaeLeal)
* Do not take snapshot for PV to avoid duplicated snapshotting, when CSI feature is enabled. (#4797, @jxun)
* Bump up to v1 API for CSI snapshot (#4800, @reasonerjt)
* fix: delete empty backups (#4817, @yuvalman)
* Add CSI VolumeSnapshot related metrics. (#4818, @jxun)
* Fix default-backup-ttl not work (#4831, @qiuming-best)
* Make the vsc created by backup sync controller deletable (#4832, @reasonerjt)
* Make in-progress backup/restore as failed when doing the reconcile to avoid hanging in in-progress status (#4833, @ywk253100)
* Use controller-gen to generate the deep copy methods for objects (#4838, @ywk253100)
* Update integrated Restic version and add insecureSkipTLSVerify for Restic CLI. (#4839, @jxun)
* Modify CSI VolumeSnapshot metric related code. (#4854, @jxun)
* Refactor backup deletion controller based on kubebuilder (#4855, @reasonerjt)
* Remove VolumeSnapshots created during backup when CSI feature is enabled. (#4858, @jxun)
* Convert Restic Repository resource/controller to the Kubebuilder framework (#4859, @qiuming-best)
* Add ClusterClasses to the restore priority list (#4866, @reasonerjt)
* Cleanup the .velero folder after restic done (#4872, @big-appled)
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.