Kubernetes 1.34 introduced VolumeGroupSnapshot v1beta2 API and
deprecated v1beta1. Distributions running K8s 1.34+ (e.g. OpenShift
4.21+) have removed v1beta1 VGS CRDs entirely, breaking Velero's
VGS functionality on those clusters.
This change bumps external-snapshotter/client/v8 from v8.2.0 to
v8.4.0 and migrates all VGS API usage from v1beta1 to v1beta2.
The v1beta2 API is structurally compatible - the Spec-level types
(GroupSnapshotHandles, VolumeGroupSnapshotContentSource) are
unchanged. The Status-level change (VolumeSnapshotHandlePairList
replaced by VolumeSnapshotInfoList) does not affect Velero as it
does not directly consume that type.
Fixes#9694
Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>
* Fix VolumeGroupSnapshot restore on Ceph RBD
This PR fixes two related issues affecting CSI snapshot restore on Ceph RBD:
1. VolumeGroupSnapshot restore fails because Ceph RBD populates
volumeGroupSnapshotHandle on pre-provisioned VSCs, but Velero doesn't
create the required VGSC during restore.
2. CSI snapshot restore fails because VolumeSnapshotClassName is removed
from restored VSCs, preventing the CSI controller from getting
credentials for snapshot verification.
Changes:
- Capture volumeGroupSnapshotHandle during backup as VS annotation
- Create stub VGSC during restore with matching handle in status
- Look up VolumeSnapshotClass by driver and set on restored VSC
Fixes#9512Fixes#9515
Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>
* Add changelog for VGS restore fix
Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>
* Fix gofmt import order
Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>
* Add changelog for VGS restore fix
Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>
* Fix import alias corev1 to corev1api per lint config
Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>
* Fix: Add snapshot handles to existing stub VGSC and add unit tests
When multiple VolumeSnapshots from the same VolumeGroupSnapshot are
restored, they share the same VolumeGroupSnapshotHandle but have
different individual snapshot handles. This commit:
1. Fixes incomplete logic where existing VGSC wasn't updated with
new snapshot handles (addresses review feedback)
2. Fixes race condition where Create returning AlreadyExists would
skip adding the snapshot handle
3. Adds comprehensive unit tests for ensureStubVGSCExists (5 cases)
and addSnapshotHandleToVGSC (4 cases) functions
Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>
* Clean up stub VolumeGroupSnapshotContents during restore finalization
Add cleanup logic for stub VGSCs created during VolumeGroupSnapshot restore.
The stub VGSCs are temporary objects needed to satisfy CSI controller
validation during VSC reconciliation. Once all related VSCs become
ReadyToUse, the stub VGSCs are no longer needed and should be removed.
The cleanup runs in the restore finalizer controller's execute() phase.
Before deleting each VGSC, it polls until all related VolumeSnapshotContents
(correlated by snapshot handle) are ReadyToUse, with a timeout fallback.
Deletion failures and CRD-not-installed scenarios are treated as warnings
rather than errors to avoid failing the restore.
Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>
* Fix lint: remove unused nolint directive and simplify cleanupStubVGSC return
The cleanupStubVGSC function only produces warnings (not errors), so
simplify its return signature. Also remove the now-unused nolint:unparam
directive on execute() since warnings are no longer always nil.
Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>
---------
Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>
Restrict the listing of PodVolumeBackup resources to the specific
restore namespace in both the core restore controller and the pod
volume restore action plugin. This prevents "Forbidden" errors when
Velero is configured with namespace-scoped minimum privileges,
avoiding the need for cluster-scoped list permissions for
PodVolumeBackups.
Fixes: #9681
Signed-off-by: Adam Zhang <adam.zhang@broadcom.com>
Add a new Prometheus gauge metric that exposes the expected interval
between consecutive scheduled backups. This enables dynamic alerting
thresholds per schedule backups.
Signed-off-by: Quang Ngo <quang.ngo@canonical.com>
- Introduced `CACertRef` field in `ObjectStorageLocation` to reference a Secret containing the CA certificate, replacing the deprecated `CACert` field.
- Implemented validation logic to ensure mutual exclusivity between `CACert` and `CACertRef`.
- Updated BSL controller and repository provider to handle the new certificate resolution logic.
- Enhanced CLI to support automatic certificate discovery from BSL configurations.
- Added unit and integration tests to validate new functionality and ensure backward compatibility.
- Documented migration strategy for users transitioning from inline certificates to Secret-based management.
Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
This commit addresses three review comments on PR #9321:
1. Keep sanitization in controller (response to @ywk253100)
- Maintaining centralized error handling for easier extension
- Azure-specific patterns detected and others passed through unchanged
2. Sanitize unavailableErrors array (@priyansh17)
- Now using sanitizeStorageError() for both unavailableErrors array
and location.Status.Message for consistency
3. Add SAS token scrubbing (@anshulahuja98)
- Scrubs Azure SAS token parameters to prevent credential leakage
- Redacts: sig, se, st, sp, spr, sv, sr, sip, srt, ss
- Example: ?sig=secret becomes ?sig=***REDACTED***
Added comprehensive test coverage for SAS token scrubbing with 4 new
test cases covering various scenarios.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>
Azure storage errors include verbose HTTP response details and XML
in error messages, making the BSL status.message field cluttered
and hard to read. This change adds sanitization to extract only
the error code and meaningful message.
Before:
BackupStorageLocation "test" is unavailable: rpc error: code = Unknown
desc = GET https://...
RESPONSE 404: 404 The specified container does not exist.
ERROR CODE: ContainerNotFound
<?xml version="1.0"...>
After:
BackupStorageLocation "test" is unavailable: rpc error: code = Unknown
desc = ContainerNotFound: The specified container does not exist.
AWS and GCP error messages are preserved as-is since they don't
contain verbose HTTP responses.
Fixes#8368
Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>