Compare commits

..

13 Commits

Author SHA1 Message Date
lyndon-li 54783fbe28 Merge pull request #9546 from Lyndon-Li/release-1.18
Run the E2E test on kind / get-go-version (push) Failing after 1h5m30s
Run the E2E test on kind / setup-test-matrix (push) Successful in 4s
Run the E2E test on kind / build (push) Has been skipped
Run the E2E test on kind / run-e2e-test (push) Has been skipped
Main CI / get-go-version (push) Failing after 21s
Main CI / Build (push) Has been skipped
Update base image for 1.18.0
2026-02-13 17:11:09 +08:00
Lyndon-Li cb5f56265a update base image for 1.18.0
Signed-off-by: Lyndon-Li <lyonghui@vmware.com>
2026-02-13 16:43:12 +08:00
Xun Jiang/Bruce Jiang aa89713559 Merge pull request #9537 from kaovilai/1.18-9508
Run the E2E test on kind / get-go-version (push) Failing after 1m33s
Run the E2E test on kind / build (push) Has been skipped
Run the E2E test on kind / setup-test-matrix (push) Successful in 3s
Run the E2E test on kind / run-e2e-test (push) Has been skipped
Main CI / get-go-version (push) Failing after 29s
Main CI / Build (push) Has been skipped
release-1.18: Fix VolumePolicy PVC phase condition filter for unbound PVCs
2026-02-12 11:41:25 +08:00
Xun Jiang/Bruce Jiang 5db4c65a92 Merge branch 'release-1.18' into 1.18-9508 2026-02-12 11:32:06 +08:00
Xun Jiang/Bruce Jiang 87db850f66 Merge pull request #9539 from Joeavaikath/1.18-9502
release-1.18: Support all glob wildcard characters in namespace validation (#9502)
2026-02-12 10:50:55 +08:00
Xun Jiang/Bruce Jiang c7631fc4a4 Merge branch 'release-1.18' into 1.18-9502 2026-02-12 10:41:26 +08:00
Wenkai Yin(尹文开) 9a37478cc2 Merge pull request #9538 from vmware-tanzu/topic/xj014661/update_migration_test_cases
Update the migration and upgrade test cases.
2026-02-12 10:19:43 +08:00
Tiger Kaovilai 5b54ccd2e0 Fix VolumePolicy PVC phase condition filter for unbound PVCs
Use typed error approach: Make GetPVForPVC return ErrPVNotFoundForPVC
when PV is not expected to be found (unbound PVC), then use errors.Is
to check for this error type. When a matching policy exists (e.g.,
pvcPhase: [Pending, Lost] with action: skip), apply the action without
error. When no policy matches, return the original error to preserve
default behavior.

Changes:
- Add ErrPVNotFoundForPVC sentinel error to pvc_pv.go
- Update ShouldPerformSnapshot to handle unbound PVCs with policies
- Update ShouldPerformFSBackup to handle unbound PVCs with policies
- Update item_backupper.go to handle Lost PVCs in tracking functions
- Remove checkPVCOnlySkip helper (no longer needed)
- Update tests to reflect new behavior

Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-11 15:29:14 -05:00
Joseph Antony Vaikath 43b926a58b Support all glob wildcard characters in namespace validation (#9502)
* Support all glob wildcard characters in namespace validation

Expand namespace validation to allow all valid glob pattern characters
(*, ?, {}, [], ,) by replacing them with valid characters during RFC 1123
validation. The actual glob pattern validation is handled separately by
the wildcard package.

Also add validation to reject unsupported characters (|, (), !) that are
not valid in glob patterns, and update terminology from "regex" to "glob"
for clarity since this implementation uses glob patterns, not regex.

Changes:
- Replace all glob wildcard characters in validateNamespaceName
- Add test coverage for valid glob patterns in includes/excludes
- Add test coverage for unsupported characters
- Reject exclamation mark (!) in wildcard patterns
- Clarify comments and error messages about glob vs regex

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Joseph <jvaikath@redhat.com>

* Changelog

Signed-off-by: Joseph <jvaikath@redhat.com>

* Add documentation: glob patterns are now accepted

Signed-off-by: Joseph <jvaikath@redhat.com>

* Error message fix

Signed-off-by: Joseph <jvaikath@redhat.com>

* Remove negation glob char test

Signed-off-by: Joseph <jvaikath@redhat.com>

* Add bracket pattern validation for namespace glob patterns

Extends wildcard validation to support square bracket patterns [] used in glob character classes. Validates bracket syntax including empty brackets, unclosed brackets, and unmatched brackets. Extracts ValidateNamespaceName as a public function to enable reuse in namespace validation logic.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Joseph <jvaikath@redhat.com>

* Reduce scope to *, ?, [ and ]

Signed-off-by: Joseph <jvaikath@redhat.com>

* Fix tests

Signed-off-by: Joseph <jvaikath@redhat.com>

* Add namespace glob patterns documentation page

Adds dedicated documentation explaining supported glob patterns
for namespace include/exclude filtering to help users understand
the wildcard syntax.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Joseph <jvaikath@redhat.com>

* Fix build-image Dockerfile envtest download

Replace inaccessible go.kubebuilder.io URL with setup-envtest and update envtest version to 1.33.0 to match Kubernetes v0.33.3 dependencies.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Joseph <jvaikath@redhat.com>

* kubebuilder binaries mv

Signed-off-by: Joseph <jvaikath@redhat.com>

* Reject brace patterns and update documentation

Add {, }, and , to unsupported characters list to explicitly reject
brace expansion patterns. Remove { from wildcard detection since these
patterns are not supported in the 1.18 release.

Update all documentation to show supported patterns inline (*, ?, [abc])
with clickable links to the detailed namespace-glob-patterns page.
Simplify YAML comments by removing non-clickable URLs.

Update tests to expect errors when brace patterns are used.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Joseph <jvaikath@redhat.com>

* Document brace expansion as unsupported

Add {} and , to the unsupported patterns section to clarify that
brace expansion patterns like {a,b,c} are not supported.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Joseph <jvaikath@redhat.com>

* Update tests to expect brace pattern rejection

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Joseph <jvaikath@redhat.com>

---------

Signed-off-by: Joseph <jvaikath@redhat.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-11 15:14:06 -05:00
Xun Jiang 9bfc78e769 Update the migration and upgrade test cases.
Run the E2E test on kind / get-go-version (push) Failing after 1m34s
Run the E2E test on kind / build (push) Has been skipped
Run the E2E test on kind / setup-test-matrix (push) Successful in 3s
Run the E2E test on kind / run-e2e-test (push) Has been skipped
Modify Dockerfile to fix GitHub CI action error.

Signed-off-by: Xun Jiang <xun.jiang@broadcom.com>
2026-02-11 14:56:55 +08:00
Xun Jiang/Bruce Jiang c9e26256fa Bump Golang version to v1.25.7 (#9536)
Run the E2E test on kind / get-go-version (push) Failing after 1m26s
Run the E2E test on kind / build (push) Has been skipped
Run the E2E test on kind / setup-test-matrix (push) Successful in 3s
Run the E2E test on kind / run-e2e-test (push) Has been skipped
Main CI / get-go-version (push) Failing after 16s
Main CI / Build (push) Has been skipped
Fix test case issue and add UT.

Signed-off-by: Xun Jiang <xun.jiang@broadcom.com>
2026-02-10 15:50:23 -05:00
Wenkai Yin(尹文开) 6e315c32e2 Merge pull request #9530 from Lyndon-Li/release-1.18
Run the E2E test on kind / get-go-version (push) Failing after 1m55s
Run the E2E test on kind / build (push) Has been skipped
Run the E2E test on kind / setup-test-matrix (push) Successful in 3s
Run the E2E test on kind / run-e2e-test (push) Has been skipped
Main CI / get-go-version (push) Failing after 12s
Main CI / Build (push) Has been skipped
[1.18] Move implemented design for 1.18
2026-02-09 11:18:40 +08:00
Lyndon-Li 91cbc40956 move implemented design for 1.18
Signed-off-by: Lyndon-Li <lyonghui@vmware.com>
2026-02-06 18:56:05 +08:00
904 changed files with 7444 additions and 16368 deletions
+1 -4
View File
@@ -7,10 +7,6 @@ on:
pull_request_target:
types: [opened, reopened, ready_for_review]
permissions:
contents: read
pull-requests: write
jobs:
# Automatically assigns reviewers and owner
add-reviews:
@@ -20,3 +16,4 @@ jobs:
uses: kentaro-m/auto-assign-action@v2.0.0
with:
configuration-path: ".github/auto-assignees.yml"
repo-token: "${{ secrets.GITHUB_TOKEN }}"
+1 -4
View File
@@ -8,10 +8,6 @@ on:
pull_request_target:
types: [opened, reopened, synchronize, ready_for_review]
permissions:
contents: read
pull-requests: write
jobs:
# Automatically labels PRs based on file globs in the change.
triage:
@@ -19,4 +15,5 @@ jobs:
steps:
- uses: actions/labeler@v5
with:
repo-token: "${{ secrets.GITHUB_TOKEN }}"
configuration-path: .github/labeler.yml
+1 -5
View File
@@ -5,10 +5,6 @@ on:
pull_request_target:
types: [opened, ready_for_review, reopened]
permissions:
contents: read
pull-requests: write
jobs:
auto-request-review:
name: Auto Request Review
@@ -17,5 +13,5 @@ jobs:
- name: Request a PR review based on files types/paths, and/or groups the author belongs to
uses: necojackarc/auto-request-review@v0.13.0
with:
config: .github/auto-assignees.yml
token: ${{ secrets.GITHUB_TOKEN }}
config: .github/auto-assignees.yml
+4 -4
View File
@@ -71,7 +71,7 @@ jobs:
run: |
echo "Building MinIO image from Bitnami Dockerfile..."
git clone --depth 1 https://github.com/bitnami/containers.git /tmp/bitnami-containers
cd /tmp/bitnami-containers/bitnami/minio/2026/debian-12
cd /tmp/bitnami-containers/bitnami/minio/2025/debian-12
docker build -t bitnami/minio:local .
docker save bitnami/minio:local > ${{ github.workspace }}/minio-image.tar
# Create json of k8s versions to test
@@ -95,9 +95,9 @@ jobs:
\"k8s\":$(wget -q -O - "https://hub.docker.com/v2/namespaces/kindest/repositories/node/tags?page_size=50" | grep -o '"name": *"[^"]*' | grep -o '[^"]*$' | grep -v -E "alpha|beta" | grep -E "v[1-9]\.(2[5-9]|[3-9][0-9])" | awk -F. '{if(!a[$1"."$2]++)print $1"."$2"."$NF}' | sort -r | sed s/v//g | jq -R -c -s 'split("\n")[:-1]'),\
\"labels\":[\
\"Basic && (ClusterResource || NodePort || StorageClass)\", \
\"ResourceFiltering && !FSBackup\", \
\"ResourceFiltering && !Restic\", \
\"ResourceModifier || (Backups && BackupsSync) || PrivilegesMgmt || OrderedResources\", \
\"(NamespaceMapping && Single && FSBackup) || (NamespaceMapping && Multiple && FSBackup)\"\
\"(NamespaceMapping && Single && Restic) || (NamespaceMapping && Multiple && Restic)\"\
]}" >> $GITHUB_OUTPUT
# Run E2E test against all Kubernetes versions on kind
@@ -136,7 +136,7 @@ jobs:
- uses: engineerd/setup-kind@v0.6.2
with:
skipClusterLogsExport: true
version: "v0.32.0"
version: "v0.27.0"
image: "kindest/node:v${{ matrix.k8s }}"
- name: Fetch built CLI
id: cli-cache
+1 -1
View File
@@ -25,7 +25,7 @@ jobs:
version=$(grep '^go ' go.mod | awk '{print $2}' | cut -d. -f1-2)
else
goDirectiveVersion=$(grep '^go ' go.mod | awk '{print $2}')
toolChainVersion=$(grep '^toolchain ' go.mod | awk '{print $2}' | sed 's/^go//')
toolChainVersion=$(grep '^toolchain ' go.mod | awk '{print $2}')
version=$(printf "%s\n%s\n" "$goDirectiveVersion" "$toolChainVersion" | sort -V | tail -n1)
fi
+1 -1
View File
@@ -22,7 +22,7 @@ jobs:
uses: actions/checkout@v6
- name: Run Trivy vulnerability scanner
uses: aquasecurity/trivy-action@57a97c7e7821a5776cebc9bb87c984fa69cba8f1
uses: aquasecurity/trivy-action@master
with:
image-ref: 'docker.io/velero/${{ matrix.images }}:${{ matrix.versions }}'
severity: 'CRITICAL,HIGH,MEDIUM'
+1 -1
View File
@@ -24,7 +24,7 @@ jobs:
- name: Make ci
run: make ci
- name: Upload test coverage
uses: codecov/codecov-action@v6
uses: codecov/codecov-action@v5
with:
token: ${{ secrets.CODECOV_TOKEN }}
files: coverage.out
+1 -2
View File
@@ -7,7 +7,6 @@ on:
- 'release-**'
paths:
- 'Dockerfile'
- 'Dockerfile-Windows'
jobs:
build:
@@ -33,6 +32,6 @@ jobs:
# by push, so BRANCH and TAG are empty by default. docker-push.sh will
# only build Velero image without pushing.
- name: Make Velero container without pushing to registry.
if: github.repository == 'velero-io/velero'
if: github.repository == 'vmware-tanzu/velero'
run: |
./hack/docker-push.sh
-93
View File
@@ -1,93 +0,0 @@
name: Pull Request File Path Check
on: [pull_request]
jobs:
filepath-check:
name: Check for invalid characters in file paths
runs-on: ubuntu-latest
steps:
- name: Check out the code
uses: actions/checkout@v6
- name: Validate file paths for Go module compatibility
run: |
# Go's module zip rejects filenames containing certain characters.
# See golang.org/x/mod/module fileNameOK() for the full specification.
#
# Allowed ASCII: letters, digits, and: !#$%&()+,-.=@[]^_{}~ and space
# Allowed non-ASCII: unicode letters only
# Rejected: " ' * < > ? ` | / \ : and any non-letter unicode (control
# chars, format chars like U+200E LEFT-TO-RIGHT MARK, etc.)
#
# This check catches issues like the U+200E incident in PR #9552.
EXIT_STATUS=0
git ls-files -z | python3 -c "
import sys, unicodedata
data = sys.stdin.buffer.read()
files = data.split(b'\x00')
# Characters explicitly rejected by Go's fileNameOK
# (path separators / and \ are inherent to paths so we check per-element)
bad_ascii = set('\"' + \"'\" + '*<>?\`|:')
allowed_ascii = set('!#$%&()+,-.=@[]^_{}~ ')
def is_ok(ch):
if ch.isascii():
return ch.isalnum() or ch in allowed_ascii
return ch.isalpha()
bad_files = [] # list of (original_path, clean_path, char_desc)
for f in files:
if not f:
continue
try:
name = f.decode('utf-8')
except UnicodeDecodeError:
print(f'::error::Non-UTF-8 bytes in filename: {f!r}')
bad_files.append((repr(f), None, 'non-UTF-8 bytes'))
continue
# Check each path element (split on /)
for element in name.split('/'):
for ch in element:
if not is_ok(ch):
cp = ord(ch)
char_name = unicodedata.name(ch, f'U+{cp:04X}')
char_desc = f'U+{cp:04X} ({char_name})'
# Build cleaned path by stripping invalid chars
clean = '/'.join(
''.join(c for c in elem if is_ok(c))
for elem in name.split('/')
)
print(f'::error file={name}::File \"{name}\" contains invalid char {char_desc}')
bad_files.append((name, clean, char_desc))
break
if bad_files:
print()
print('The following files have characters that are invalid in Go module zip archives:')
print()
for original, clean, desc in bad_files:
print(f' {original} — {desc}')
print()
print('To fix, rename the files to remove the problematic characters:')
print()
for original, clean, desc in bad_files:
if clean:
print(f' mv \"{original}\" \"{clean}\" && git add \"{clean}\"')
print(f' # or: git mv \"{original}\" \"{clean}\"')
else:
print(f' # {original} — cannot auto-suggest rename (non-UTF-8)')
print()
print('See https://github.com/velero-io/velero/pull/9552 for context.')
sys.exit(1)
else:
print('All file paths are valid for Go module zip.')
" || EXIT_STATUS=1
exit $EXIT_STATUS
+1 -1
View File
@@ -18,7 +18,7 @@ jobs:
name: Checkout
- name: Verify .goreleaser.yml and try a dryrun release.
if: github.repository == 'velero-io/velero'
if: github.repository == 'vmware-tanzu/velero'
run: |
CHANGELOG=$(ls changelogs | sort -V -r | head -n 1)
GITHUB_TOKEN=${{ secrets.GITHUB_TOKEN }} \
+1 -1
View File
@@ -28,5 +28,5 @@ jobs:
- name: Linter check
uses: golangci/golangci-lint-action@v9
with:
version: v2.12.0
version: v2.5.0
args: --verbose
-4
View File
@@ -5,10 +5,6 @@ on:
issue_comment:
types: [created]
permissions:
issues: write
pull-requests: write
jobs:
execute:
runs-on: ubuntu-latest
+1 -1
View File
@@ -28,7 +28,7 @@ jobs:
# Only try to publish the container image from the root repo; forks don't have permission to do so and will always get failures.
- name: Publish container image
if: github.repository == 'velero-io/velero'
if: github.repository == 'vmware-tanzu/velero'
run: |
docker login -u ${{ secrets.DOCKER_USER }} -p ${{ secrets.DOCKER_PASSWORD }}
+2 -2
View File
@@ -45,14 +45,14 @@ jobs:
- name: Test
run: make test
- name: Upload test coverage
uses: codecov/codecov-action@v6
uses: codecov/codecov-action@v5
with:
token: ${{ secrets.CODECOV_TOKEN }}
files: coverage.out
verbose: true
# Only try to publish the container image from the root repo; forks don't have permission to do so and will always get failures.
- name: Publish container image
if: github.repository == 'velero-io/velero'
if: github.repository == 'vmware-tanzu/velero'
run: |
sudo swapoff -a
sudo rm -f /mnt/swapfile
+1 -1
View File
@@ -20,4 +20,4 @@ jobs:
days-before-pr-close: -1
# Only issues made after Feb 09 2021.
start-date: "2021-09-02T00:00:00"
exempt-issue-labels: "Epic,Area/CLI,Area/Cloud/AWS,Area/Cloud/Azure,Area/Cloud/GCP,Area/Cloud/vSphere,Area/CSI,Area/Design,Area/Documentation,Area/Plugins,Bug,Enhancement/User,kind/requirement,kind/refactor,kind/tech-debt,limitation,Needs investigation,Needs triage,Needs Product,P0 - Hair on fire,P1 - Important,P2 - Long-term important,P3 - Wouldn't it be nice if...,Product Requirements,release-blocker,Security,backlog"
exempt-issue-labels: "Epic,Area/CLI,Area/Cloud/AWS,Area/Cloud/Azure,Area/Cloud/GCP,Area/Cloud/vSphere,Area/CSI,Area/Design,Area/Documentation,Area/Plugins,Bug,Enhancement/User,kind/requirement,kind/refactor,kind/tech-debt,limitation,Needs investigation,Needs triage,Needs Product,P0 - Hair on fire,P1 - Important,P2 - Long-term important,P3 - Wouldn't it be nice if...,Product Requirements,Restic - GA,Restic,release-blocker,Security,backlog"
+1 -4
View File
@@ -317,7 +317,7 @@ linters:
- errchkjson
- exptostd
- ginkgolinter
#- goconst # Disable goconst for now, as it reports a lot of false positives. We can enable it later after refactoring the codebase to reduce the number of string literals.
- goconst
- goheader
- goprintffuncname
- gosec
@@ -383,9 +383,6 @@ linters:
linters:
- dupword
text: "bucket"
- text: "non-constant format string"
linters:
- govet
generated: lax
presets:
+1 -1
View File
@@ -55,7 +55,7 @@ checksum:
name_template: 'CHECKSUM'
release:
github:
owner: velero-io
owner: vmware-tanzu
name: velero
draft: true
prerelease: auto
+1 -22
View File
@@ -1,23 +1,3 @@
# Velero Code of Conduct
Velero is a [Cloud Native Computing Foundation](https://www.cncf.io/) sandbox
project. As a CNCF project, the Velero community follows the
[**CNCF Code of Conduct**](https://github.com/cncf/foundation/blob/main/code-of-conduct.md).
The text below is the project's adopted Code of Conduct, based on the
[Contributor Covenant](https://www.contributor-covenant.org/), and is
substantively aligned with the CNCF Code of Conduct. Where any conflict exists,
the CNCF Code of Conduct prevails.
Instances of unacceptable behavior may be reported to the CNCF Code of
Conduct Committee at [conduct@cncf.io](mailto:conduct@cncf.io). For more
detailed instructions on how to submit a report, including how to submit a
report anonymously, please see the CNCF
[Incident Resolution Procedures](https://github.com/cncf/foundation/blob/main/code-of-conduct/coc-incident-resolution-procedures.md).
You can expect a response within three business days.
---
# Contributor Covenant Code of Conduct
## Our Pledge
@@ -79,8 +59,7 @@ representative at an online or offline event.
## Enforcement
Instances of abusive, harassing, or otherwise unacceptable behavior may be
reported by contacting the CNCF Code of Conduct Committee at
[conduct@cncf.io](mailto:conduct@cncf.io).
reported to the community leaders responsible for enforcement at oss-coc@vmware.com.
All complaints will be reviewed and investigated promptly and fairly.
All community leaders are obligated to respect the privacy and security of the
+31 -4
View File
@@ -13,7 +13,7 @@
# limitations under the License.
# Velero binary build section
FROM --platform=$BUILDPLATFORM golang:1.26-trixie AS velero-builder
FROM --platform=$BUILDPLATFORM golang:1.25.7-bookworm AS velero-builder
ARG GOPROXY
ARG BIN
@@ -48,11 +48,38 @@ RUN mkdir -p /output/usr/bin && \
-ldflags "${LDFLAGS}" ${PKG}/cmd/velero-helper && \
go clean -modcache -cache
# Velero image packing section
FROM paketobuildpacks/ubuntu-noble-run-tiny:latest
# Restic binary build section
FROM --platform=$BUILDPLATFORM golang:1.25.7-bookworm AS restic-builder
LABEL maintainer="Xun Jiang <xun.jiang@broadcom.com>"
ARG GOPROXY
ARG BIN
ARG TARGETOS
ARG TARGETARCH
ARG TARGETVARIANT
ARG RESTIC_VERSION
ENV CGO_ENABLED=0 \
GO111MODULE=on \
GOPROXY=${GOPROXY} \
GOOS=${TARGETOS} \
GOARCH=${TARGETARCH} \
GOARM=${TARGETVARIANT}
COPY . /go/src/github.com/vmware-tanzu/velero
RUN mkdir -p /output/usr/bin && \
export GOARM=$(echo "${GOARM}" | cut -c2-) && \
/go/src/github.com/vmware-tanzu/velero/hack/build-restic.sh && \
go clean -modcache -cache
# Velero image packing section
FROM paketobuildpacks/run-jammy-tiny:0.2.97
LABEL maintainer="Xun Jiang <jxun@vmware.com>"
COPY --from=velero-builder /output /
COPY --from=restic-builder /output /
USER cnb:cnb
+1 -1
View File
@@ -15,7 +15,7 @@
ARG OS_VERSION=1809
# Velero binary build section
FROM --platform=$BUILDPLATFORM golang:1.26-trixie AS velero-builder
FROM --platform=$BUILDPLATFORM golang:1.25.7-bookworm AS velero-builder
ARG GOPROXY
ARG BIN
+4 -2
View File
@@ -105,6 +105,8 @@ see: https://velero.io/docs/main/build-from-source/#making-images-and-updating-v
endef
# comma cannot be escaped and can only be used in Make function arguments by putting into variable
comma=,
# The version of restic binary to be downloaded
RESTIC_VERSION ?= 0.15.0
CLI_PLATFORMS ?= linux-amd64 linux-arm linux-arm64 darwin-amd64 darwin-arm64 windows-amd64 linux-ppc64le linux-s390x
BUILD_OUTPUT_TYPE ?= docker
@@ -211,7 +213,6 @@ shell: build-dirs build-env
-v "$$(pwd)/.go/std/$(GOOS)/$(GOARCH):/usr/local/go/pkg/$(GOOS)_$(GOARCH)_static:delegated" \
-v "$$(pwd)/.go/go-build:/.cache/go-build:delegated" \
-v "$$(pwd)/.go/golangci-lint:/.cache/golangci-lint:delegated" \
-v "$$(pwd)/.go/goimports:/.cache/goimports:delegated" \
-w /github.com/vmware-tanzu/velero \
$(BUILDER_IMAGE) \
/bin/sh $(CMD)
@@ -259,6 +260,7 @@ container-linux:
--build-arg=GIT_SHA=$(GIT_SHA) \
--build-arg=GIT_TREE_STATE=$(GIT_TREE_STATE) \
--build-arg=REGISTRY=$(REGISTRY) \
--build-arg=RESTIC_VERSION=$(RESTIC_VERSION) \
--provenance=false \
--sbom=false \
-f $(VELERO_DOCKERFILE) .
@@ -343,7 +345,7 @@ update-crd:
build-dirs:
@mkdir -p _output/bin/$(GOOS)/$(GOARCH)
@mkdir -p .go/src/$(PKG) .go/pkg .go/bin .go/std/$(GOOS)/$(GOARCH) .go/go-build .go/golangci-lint .go/goimports
@mkdir -p .go/src/$(PKG) .go/pkg .go/bin .go/std/$(GOOS)/$(GOARCH) .go/go-build .go/golangci-lint
build-env:
@# if we have overridden the value for the build-image Dockerfile,
+6 -21
View File
@@ -1,7 +1,7 @@
![100]
[![Build Status][1]][2] [![CII Best Practices](https://bestpractices.coreinfrastructure.org/projects/3811/badge)](https://bestpractices.coreinfrastructure.org/projects/3811)
![GitHub release (latest SemVer)](https://img.shields.io/github/v/release/velero-io/velero)
![GitHub release (latest SemVer)](https://img.shields.io/github/v/release/vmware-tanzu/velero)
## Overview
@@ -52,29 +52,14 @@ Velero supports IPv4, IPv6, and dual stack environments. Support for this was te
The Velero maintainers are continuously working to expand testing coverage, but are not able to test every combination of Velero and supported Kubernetes versions for each Velero release. The table above is meant to track the current testing coverage and the expected supported Kubernetes versions for each Velero version.
If you are interested in using a different version of Kubernetes with a given Velero version, we'd recommend that you perform testing before installing or upgrading your environment. For full information around capabilities within a release, also see the Velero [release notes](https://github.com/velero-io/velero/releases) or Kubernetes [release notes](https://github.com/kubernetes/kubernetes/tree/master/CHANGELOG). See the Velero [support page](https://velero.io/docs/latest/support-process/) for information about supported versions of Velero.
If you are interested in using a different version of Kubernetes with a given Velero version, we'd recommend that you perform testing before installing or upgrading your environment. For full information around capabilities within a release, also see the Velero [release notes](https://github.com/vmware-tanzu/velero/releases) or Kubernetes [release notes](https://github.com/kubernetes/kubernetes/tree/master/CHANGELOG). See the Velero [support page](https://velero.io/docs/latest/support-process/) for information about supported versions of Velero.
For each release, Velero maintainers run the test to ensure the upgrade path from n-2 minor release. For example, before the release of v1.10.x, the test will verify that the backup created by v1.9.x and v1.8.x can be restored using the build to be tagged as v1.10.x.
## Cloud Native Computing Foundation
<!-- remove sandbox once promoted -->
Velero is a [Cloud Native Computing Foundation](https://www.cncf.io/) sandbox project.
<p align="center">
<a href="https://www.cncf.io/">
<img src="https://raw.githubusercontent.com/cncf/artwork/main/other/cncf/horizontal/color/cncf-color.svg"
alt="Cloud Native Computing Foundation logo" width="300"/>
</a>
</p>
Copyright Contributors to Velero, established as Velero a Series of LF Projects, LLC.
For website terms of use, trademark policy and other project policies please see
<https://lfprojects.org/policies/>.
[1]: https://github.com/velero-io/velero/workflows/Main%20CI/badge.svg
[2]: https://github.com/velero-io/velero/actions?query=workflow%3A"Main+CI"
[4]: https://github.com/velero-io/velero/issues
[6]: https://github.com/velero-io/velero/releases
[1]: https://github.com/vmware-tanzu/velero/workflows/Main%20CI/badge.svg
[2]: https://github.com/vmware-tanzu/velero/actions?query=workflow%3A"Main+CI"
[4]: https://github.com/vmware-tanzu/velero/issues
[6]: https://github.com/vmware-tanzu/velero/releases
[9]: https://kubernetes.io/docs/setup/
[10]: https://kubernetes.io/docs/tasks/tools/install-kubectl/#install-with-homebrew-on-macos
[11]: https://kubernetes.io/docs/tasks/tools/install-kubectl/#tabset-1
+7 -1
View File
@@ -52,7 +52,7 @@ git_sha = str(local("git rev-parse HEAD", quiet = True, echo_off = True)).strip(
tilt_helper_dockerfile_header = """
# Tilt image
FROM golang:1.26 as tilt-helper
FROM golang:1.25.7 as tilt-helper
# Support live reloading with Tilt
RUN wget --output-document /restart.sh --quiet https://raw.githubusercontent.com/windmilleng/rerun-process-wrapper/master/restart.sh && \
@@ -103,6 +103,11 @@ local_resource(
deps = ["internal", "pkg/cmd"],
)
local_resource(
"restic_binary",
cmd = 'cd ' + '.' + ';mkdir -p _tiltbuild/restic; BIN=velero GOOS=linux GOARCH=amd64 GOARM="" RESTIC_VERSION=0.13.1 OUTPUT_DIR=_tiltbuild/restic ./hack/build-restic.sh',
)
# Note: we need a distro with a bash shell to exec into the Velero container
tilt_dockerfile_header = """
FROM ubuntu:22.04 as tilt
@@ -113,6 +118,7 @@ WORKDIR /
COPY --from=tilt-helper /start.sh .
COPY --from=tilt-helper /restart.sh .
COPY velero .
COPY restic/restic /usr/bin/restic
"""
dockerfile_contents = "\n".join([
-1
View File
@@ -1 +0,0 @@
Include InitContainer configured as Sidecars when validating the existence of the target containers configured for the Backup Hooks
@@ -1 +0,0 @@
Fix VolumeGroupSnapshot restore failure with Ceph RBD CSI driver by creating stub VolumeGroupSnapshotContent during restore and looking up VolumeSnapshotClass by driver for credential support
-1
View File
@@ -1 +0,0 @@
Add block data mover design for block level incremental backup by integrating with Kubernetes CBT
-1
View File
@@ -1 +0,0 @@
Fix issue #9343, include PV topology to data mover pod affinities
-1
View File
@@ -1 +0,0 @@
Fix issue #9496, support customized host os
-1
View File
@@ -1 +0,0 @@
Add custom action type to volume policies
-1
View File
@@ -1 +0,0 @@
If BIA return updateObj with SkipFromBackupAnnotation, treat it as skip the resource from backup.
-1
View File
@@ -1 +0,0 @@
Issue #9544: Add test coverage for S3 bucket name in MRAP ARN notation and fix bucket validation to accept ARN format
-1
View File
@@ -1 +0,0 @@
Wildcard namespaces: Log warning on empty resolution
-1
View File
@@ -1 +0,0 @@
Fix issue #9475, use node-selector instead of nodName for generic restore
-1
View File
@@ -1 +0,0 @@
Fix issue #9460, flush buffer before data mover completes
-1
View File
@@ -1 +0,0 @@
Add schedule_expected_interval_seconds metric for dynamic backup alerting thresholds (#9559)
-1
View File
@@ -1 +0,0 @@
Add ephemeral storage limit and request support for data mover and maintenance job
@@ -1 +0,0 @@
Fix DBR stuck when CSI snapshot no longer exists in cloud provider
-1
View File
@@ -1 +0,0 @@
update go-hclog to current version
-1
View File
@@ -1 +0,0 @@
Add check for file extraction from tarball.
-1
View File
@@ -1 +0,0 @@
Implement original VolumeSnapshotContent deletion for legacy backups
-1
View File
@@ -1 +0,0 @@
Fix issue #9626, let go for uninitialized repo under readonly mode
@@ -1 +0,0 @@
Fix issue #9636, fix configmap lookup in non-default namespaces
-1
View File
@@ -1 +0,0 @@
Fix issue #9641, Remove redundant ReadyToUse polling in CSI VolumeSnapshotContent delete plugin
-1
View File
@@ -1 +0,0 @@
Fix service restore with null healthCheckNodePort in last-applied-configuration label
@@ -1 +0,0 @@
Fix issue #9658, Honor --stderrthreshold when --logtostderr is enabled
-1
View File
@@ -1 +0,0 @@
Fix issue #9659, in the case that PVB/PVR/DU/DD is cancelled before the data path is really started, call EndEvent to prevent data mover pod from crashing because of delay event distribution
@@ -1 +0,0 @@
Fix issue #9666, fix node-agent node detection in multiple instances scenario
-1
View File
@@ -1 +0,0 @@
Fix issue #9470, remove restic from repository
-1
View File
@@ -1 +0,0 @@
Fix issue #9469, remove restic for uploader
@@ -1 +0,0 @@
Fix issue #9681, fix restores and podvolumerestores list options to only list in installed namespace
-1
View File
@@ -1 +0,0 @@
Fix issue #9428, increase repo maintenance history queue length from 3 to 25
-1
View File
@@ -1 +0,0 @@
Fix wildcard expansion when includes is empty and excludes has wildcards
-1
View File
@@ -1 +0,0 @@
Enhance backup deletion logic to handle tarball download failures
@@ -1 +0,0 @@
Bump external-snapshotter to v8.4.0 and migrate VolumeGroupSnapshot API from v1beta1 to v1beta2 for Kubernetes 1.34+ compatibility
-1
View File
@@ -1 +0,0 @@
Fix issue #9699, add a 2-second gap between temporary CSI VolumeSnapshotContent create and delete operations
-1
View File
@@ -1 +0,0 @@
Update Debian base image from bookworm to trixie
@@ -1 +0,0 @@
Fix issue #9703, fix CSI PVC Backup Plugin list options to only list in installed namespace
-1
View File
@@ -1 +0,0 @@
perf: better string concatenation
-1
View File
@@ -1 +0,0 @@
Fix issue #9709, add interfaces for CBT service and CBT bitmap
-1
View File
@@ -1 +0,0 @@
Fix issue #9723, extend Unified Repo Interface to support block uploader
-1
View File
@@ -1 +0,0 @@
Remove Restic build from Dockerfile, Makefile and Tiltfile.
-1
View File
@@ -1 +0,0 @@
Remove Restic code path from PodVolumeRestore.
-1
View File
@@ -1 +0,0 @@
Add CBT bitmap implementation for block data mover
-1
View File
@@ -1 +0,0 @@
fix: lint permission issue
@@ -1 +0,0 @@
Fix DataUploadDeleteAction creating snapshot-info ConfigMaps labeled with the wrong backup name when a DataUpload CR from another backup is incidentally captured in the backup tarball, which caused Kopia snapshots to be leaked in object storage on expiry of the real owning backup.
-1
View File
@@ -1 +0,0 @@
Restores from backups not in a completed or partially failed phase are now rejected.
-1
View File
@@ -1 +0,0 @@
Uploader interface for block data mover
-1
View File
@@ -1 +0,0 @@
Add Kopia repo snapshot operations for block data mover
-1
View File
@@ -1 +0,0 @@
Add metadata operation to Kopia repo for block data mover
-1
View File
@@ -1 +0,0 @@
Data path naming adjustment for block data mover
@@ -1 +0,0 @@
Fix issue #9811, add interface to support ClusterScopedFilterPolicy and NamespacedFilterPolicy
@@ -1 +0,0 @@
Fix issue #9812, validate ClusterScopedFilterPolicy and NamespacedFilterPolicy incompatible with legacy filters
-1
View File
@@ -1 +0,0 @@
Replace vmware-tanzu/velero GitHub org references with velero-io/velero (#9844)
-1
View File
@@ -1 +0,0 @@
Fix issue #9823, add incremental aware object writer for block data mover
-1
View File
@@ -1 +0,0 @@
Implement the CBT service.
@@ -1 +0,0 @@
Fix issue #9813, add validations for ClusterScopedFilterPolicy
@@ -1 +0,0 @@
Fix issue #9814, add validations for NamespacedFilterPolicies
-1
View File
@@ -1 +0,0 @@
Add the Write implementation for incremental aware object writer
-1
View File
@@ -1 +0,0 @@
Remove restic command package
-1
View File
@@ -1 +0,0 @@
Enhance backup exposer for block data mover
-1
View File
@@ -1 +0,0 @@
Add cbt service parameters to node-agent-config for block data mover
-1
View File
@@ -1 +0,0 @@
Remove Restic cases and workflow from E2E
-1
View File
@@ -1 +0,0 @@
Add totalSize to repo snapshot operation; and pin unified repo snapshots to Kopia repository and so pin them with Velero backup lifecycle
@@ -1 +0,0 @@
Fix issue #9816, add cli support for backup with ClusterScopedFilterPolicy and NamespacedFilterPolicies
-1
View File
@@ -1 +0,0 @@
Make ToSystemAffinity deterministic by sorting MatchLabels keys to avoid spurious affinity spec diffs and restarts
@@ -35,7 +35,7 @@ func main() {
for {
<-ticker.C
if done() {
fmt.Println("All PodVolumeRestores are done")
fmt.Println("All restic restores are done")
err := removeFolder()
if err != nil {
fmt.Println(err)
@@ -65,7 +65,6 @@ func done() bool {
doneFile := filepath.Join("/restores", child.Name(), ".velero", os.Args[1])
// #nosec G304,G703 -- doneFile is generated from internal logic and not user-controllable.
if _, err := os.Stat(doneFile); os.IsNotExist(err) {
fmt.Printf("The filesystem restore done file %s is not found yet. Retry later.\n", doneFile)
return false
@@ -69,7 +69,9 @@ spec:
- ""
type: string
resticIdentifier:
description: Deprecated
description: |-
ResticIdentifier is the full restic-compatible string for identifying
this repository. This field is only used when RepositoryType is "restic".
type: string
volumeNamespace:
description: |-
File diff suppressed because one or more lines are too long
@@ -1,839 +0,0 @@
# Fine Grained Backup Filters via Resource Policies
## Glossary & Abbreviation
**Backup Filter**: The mechanism in Velero that determines which Kubernetes resources are collected from the cluster and written into the backup archive. Backup filters currently operate on four dimensions: namespace, resource type, label, and cluster scope.
**Global Filter**: A filter that applies uniformly across all namespaces in a backup. All existing Velero backup filters are global filters.
**Namespace-Scoped Filter**: A filter that applies only within specific namespaces, overriding the global filter for those namespaces. This is the capability introduced by this design.
**ClusterScopedFilterPolicy**: A global filter for cluster-scoped resources that allows per-kind label selectors and name patterns, functioning similarly to `NamespacedFilterPolicy` but applied to cluster-scoped resources globally.
**Resource Filter**: A filter rule that pairs one or more resource kinds with their own label selector and/or name patterns. Multiple resource filters within a namespace-scoped policy allow different filtering criteria for different resource types.
**Resource Name Filter**: A filter that matches individual resource instances by their metadata.name, using glob patterns. This is a new filter dimension introduced by this design.
**Resource Policy**: An existing Velero mechanism where backup behavior rules are defined in a ConfigMap and referenced from `BackupSpec.ResourcePolicy`. Currently used for volume policies and global include/exclude policies.
## Background
Velero's backup filter system allows users to specify which resources to include or exclude from a backup. The filters operate on three dimensions:
1. **Namespace**`IncludedNamespaces`/`ExcludedNamespaces` select which namespaces to back up
2. **Resource Type**`IncludedResources`/`ExcludedResources` (or the newer scoped variants `Included/ExcludedClusterScopedResources`, `Included/ExcludedNamespaceScopedResources`) select which Kubernetes resource types to back up
3. **Labels**`LabelSelector`/`OrLabelSelectors` filter individual objects by their labels
All three dimensions are applied **globally** — the same resource type filter, the same label selector, and the same namespace list apply uniformly throughout the entire backup operation. Specifically:
- In `item_collector.go`, the `ResourceIncludesExcludes.ShouldInclude()` check is a single global check applied to every resource type across all namespaces.
- In `listResourceByLabelsPerNamespace()`, the same `LabelSelector` is passed to every Kubernetes API list call regardless of namespace.
- There is no mechanism to filter resources by their individual `metadata.name`.
This creates three critical gaps for common backup scenarios:
**Gap 1: Different resource needs per namespace.** When multiple applications share the same cluster, different namespaces often require different backup strategies. For example, a namespace running a database workload may need all resource types backed up, while a namespace running a stateless frontend may only need Deployments, ConfigMaps, and Services. Setting `IncludedResources: [configmaps]` means *all* ConfigMaps in *all* included namespaces — you cannot say "only ConfigMaps in namespace-a but everything in namespace-b."
**Gap 2: Same resource type, different workloads.** Resources of the same type (e.g., ConfigMaps or Secrets) in the same namespace may belong to different workloads. For instance, a namespace may contain `app-config`, `app-secret`, `monitoring-config`, and `monitoring-secret`. Without name-based filtering, you cannot selectively back up only the `app-*` resources — the only option is label-based selection, which requires workloads to have been pre-labeled appropriately.
**Gap 3: Different kinds need different selectors.** Within a single namespace, different resource types may belong to different workloads with different labels. For example, Deployments labeled `app=workload-1` and StatefulSets labeled `app=workload-2` in the same namespace. The current single-label-selector-per-namespace model cannot express this — the label selector applies identically to all resource types.
## Goals
- Extend the `ResourcePolicies` ConfigMap format with a `namespacedFilterPolicies` section that allows per-namespace, per-kind resource filtering with independent label selectors and name patterns for each resource type
- Extend the `ResourcePolicies` ConfigMap format with a `clusterScopedFilterPolicy` section that allows per-kind resource filtering with independent label selectors and name patterns for cluster-scoped resources globally
- Support resource name filtering by glob patterns using the same `gobwas/glob` library that Velero uses for namespace patterns, ensuring consistency across the codebase
- Support per-kind label selectors, so that different resource types within the same namespace can be filtered with different labels
- Maintain full backward compatibility — existing backups with no `namespacedFilterPolicies` behave exactly as they do today
- Define clear precedence rules for how per-namespace filters interact with global filters
- Add corresponding validation within the Resource Policies validation pipeline using existing Velero wildcard validation functions
- Update `velero backup describe` output to display per-namespace filter information when present
- Ensure the restore process works correctly with backups produced by namespace-scoped filters, without requiring restore-side code changes in the initial phase
## Non-Goals
- Adding namespace-scoped filters to `RestoreSpec` or the restore pipeline is not part of the initial implementation. Restore from a namespace-filtered backup works automatically because the restore process reads whatever is in the backup archive. Restore-side namespace filters will be addressed in a follow-up.
- Changing existing `BackupSpec` fields (`IncludedResources`, `LabelSelector`, etc.) or adding new CRD fields is explicitly avoided by this design.
- Supporting regex patterns for resource names is not included. Glob patterns (already used throughout Velero) are sufficient and consistent.
- Modifying the plugin `ResourceSelector` system (`AppliesTo()` / `resolvedAction.ShouldUse()`) is not part of this design.
- CLI flags for inline specification of namespace-scoped filters are not part of the initial implementation. The configuration is expressed in the ResourcePolicy ConfigMap YAML.
## Architecture of Namespace-Scoped Filters
### Configuration Model
The namespace-scoped filters and fine-grained global filters are defined in the same ResourcePolicy ConfigMap that is already referenced by `BackupSpec.ResourcePolicy`. The YAML format is extended with two new top-level keys: `clusterScopedFilterPolicy` and `namespacedFilterPolicies`
```yaml
version: v1
volumePolicies:
# existing volume policies (unchanged)
- conditions:
capacity: "0,100Gi"
action:
type: skip
includeExcludePolicy:
# existing global include/exclude policy (unchanged)
includedNamespaceScopedResources:
- configmaps
clusterScopedFilterPolicy:
# NEW: global overrides for cluster-scoped resources
resourceFilters:
- kinds: [ClusterRole, ClusterRoleBinding]
names: ["my-app-*"]
- kinds: [CustomResourceDefinition]
labelSelector:
app: my-app
namespacedFilterPolicies:
# NEW: per-namespace filter overrides
- namespaces:
- ns-a
resourceFilters:
- kinds: [ConfigMap, Secret, Deployment]
labelSelector:
app: my-app
- namespaces:
- ns-b
resourceFilters:
- kinds: [Deployment]
names: [app-1, app-2]
- kinds: [ConfigMap]
labelSelector:
app: my-service
```
All four sections coexist in the same ConfigMap. They are independent — `volumePolicies` handles volume backup strategy, `includeExcludePolicy` handles global resource type filtering, `clusterScopedFilterPolicy` handles cluster-scoped resource filtering by kind/name/label, and `namespacedFilterPolicies` handles per-namespace, per-kind overrides.
### The `resourceFilters` Model
Each `namespacedFilterPolicies` entry targets one or more namespaces and contains a `resourceFilters` array. Each entry in `resourceFilters` pairs one or more resource kinds with their own label selector and name patterns:
```yaml
namespacedFilterPolicies:
- namespaces: [ns-a]
resourceFilters:
- kinds: [ConfigMap, Secret] # these kinds share a selector
labelSelector: {app: my-app}
names: ["app-*"]
- kinds: [Deployment] # this kind has its own selector
names: [workload-1, workload-2]
- kinds: [StatefulSet] # this kind has no extra filtering
```
This model has one way to express filters — there is no ambiguity about how to structure the configuration. Only resource kinds listed in `resourceFilters` entries are included in the backup for the matched namespaces; unlisted kinds are implicitly excluded.
#### Catch-All Resource Filter (Empty `kinds` or `["*"]`)
A `ResourceFilter` entry with an empty (or omitted) `kinds` field, or a field explicitly set to `["*"]`, acts as a **catch-all**. Its `labelSelector` or `orLabelSelectors` (if provided) is applied to **all resource types in the namespace that are not already matched by a kind-specific filter entry**. If no selectors are provided, all unlisted resources are included. Using `["*"]` is highly recommended as it makes the catch-all intention explicit and self-documenting.
**Rules for catch-all entries:**
- At most **one** catch-all entry is allowed per `NamespacedFilterPolicy`.
- `names` and `excludedNames` are **not** supported on catch-all entries. Name patterns are kind-specific by nature and cannot be applied across arbitrary kinds; use kind-specific entries for name-based filtering.
- The catch-all applies to kinds that are **not listed in any other `resourceFilters` entry** in the same policy. Kind-specific entries take precedence over the catch-all.
- A catch-all entry **does not inherit or fall back to `BackupSpec.LabelSelector`**. If a catch-all entry has no `labelSelector`/`orLabelSelectors`, all unlisted resource kinds in the namespace are included with **no label filtering** — the global label selector is not applied. Define a catch-all with an explicit `labelSelector` if label-based filtering is desired for unlisted kinds.
**Evaluation order within a namespace filter policy:**
1. For each resource kind encountered during backup, the system first checks whether a kind-specific `resourceFilters` entry exists for that kind.
2. If a kind-specific entry exists, it is used exclusively (its label selectors and name patterns apply).
3. If no kind-specific entry exists but a catch-all entry is present, the catch-all's `labelSelector`/`orLabelSelectors` is applied to that kind.
4. If neither a kind-specific entry nor a catch-all entry exists, the kind is excluded from the backup for that namespace.
### Filter Precedence Model
The namespace-scoped filter system and fine-grained global filter system layer on top of the existing global filter system. They intentionally behave differently:
- **`namespacedFilterPolicies`** acts as an **exclusive allowlist (boundary)**. Only kinds explicitly listed (or matched by a catch-all) are backed up from that namespace. This gives namespace owners complete and isolated control over their namespace's backup contents, preventing unexpected data spillage from global fallbacks.
- **`clusterScopedFilterPolicy`** acts as a **refinement overlay (tweak)**. Unlisted cluster-scoped kinds fall back to the standard global filters. This allows administrators to selectively adjust filtering for a few specific cluster-scoped kinds without rewriting the entire global inclusion list.
**For Namespace-Scoped Resources:**
The evaluation order is:
1. **Global namespace filter** (`BackupSpec.IncludedNamespaces`/`ExcludedNamespaces`) is checked first. A namespace must pass this filter to be considered at all. `namespacedFilterPolicies` cannot override namespace exclusion — if a namespace is excluded globally, no filter policy entry can bring it back.
2. **Per-namespace filter lookup.** For each namespace that passes the global namespace filter, the system checks whether any `namespacedFilterPolicies` entry matches (by namespace name or glob pattern). If a match is found, the `resourceFilters` array determines what gets backed up for that namespace:
- Only resource kinds listed in `resourceFilters[].kinds` are included
- Each kind uses its own `labelSelector`/`orLabelSelectors` (if specified)
- Each kind uses its own `names`/`excludedNames` patterns (if specified)
3. **Namespaces without a matching filter policy** continue to use the global filters (`BackupSpec.IncludedResources`, `BackupSpec.LabelSelector`, etc., combined with `includeExcludePolicy`) exactly as they do today.
4. **If multiple filter policy entries could match the same namespace** (e.g., `team-*` and `team-frontend-*` both matching `team-frontend-prod`), the **first matching policy in the list** is used. **Important: Place more specific patterns before broader patterns** to achieve the intended filtering behavior.
5. **The `velero.io/exclude-from-backup=true` label** always takes precedence over all filters, regardless of whether the item matches global or per-namespace filters.
6. **Interaction with `includeExcludePolicy`**: `namespacedFilterPolicies` is a **refinement** of the global resource filter system, not a replacement. Global exclusions defined in `includeExcludePolicy` (e.g., `excludedNamespaceScopedResources: [secrets]`) are applied first at the resource-type level before per-namespace filter policies are consulted. A namespace-scoped filter policy cannot re-include a resource kind that has been globally excluded by `includeExcludePolicy`. For example, if `secrets` is listed under `excludedNamespaceScopedResources`, no `Secret` resources will be backed up from any namespace, even if a `namespacedFilterPolicies` entry explicitly lists `Secret` for that namespace. Users who need per-namespace secret selection must remove `secrets` from the global exclusion list.
To help users catch this misconfiguration early, Velero logs a warning at backup start when a `namespacedFilterPolicies` entry lists a kind that is globally excluded by `includeExcludePolicy`:
```
level=warn msg="namespacedFilterPolicies entry lists a kind that is globally excluded by includeExcludePolicy; the per-namespace filter entry has no effect" kind="secrets" namespacePattern="ns-a"
```
**For Cluster-Scoped Resources:**
1. If `clusterScopedFilterPolicy` is present, it acts as a **refinement overlay** over the existing global filters for cluster-scoped resources. It is NOT an exclusive allowlist.
- To back up cluster-scoped resources in a namespace-filtered backup, you must still explicitly include them via `BackupSpec.IncludedClusterScopedResources`.
- If a cluster-scoped kind is listed in its `resourceFilters`, its specific `labelSelector`/`orLabelSelectors` and `names`/`excludedNames` patterns are applied.
- If a cluster-scoped kind is **not listed**, it falls back to the standard global filters (`BackupSpec.LabelSelector`, etc.) and is included in the backup.
2. If `clusterScopedFilterPolicy` is absent, Velero falls back to the existing global filters (`IncludedClusterScopedResources`, `IncludedResources`, `LabelSelector`, etc.) for cluster-scoped resources.
3. **The `velero.io/exclude-from-backup=true` label** always takes precedence over all filters.
```mermaid
flowchart TD
A["BackupSpec Global<br>IncludedNamespaces / ExcludedNamespaces"]
B{Namespace passes<br>global filter?}
C[Namespace excluded<br>from backup]
D{namespacedFilterPolicies<br>lookup by namespace}
E{"For each resource kind:<br>is kind in resourceFilters?"}
F["Apply kind-specific filters:<br>- labelSelector / orLabelSelectors<br>- names / excludedNames"]
G[Kind skipped for<br>this namespace]
H["Use global filters:<br>- BackupSpec IncludedResources<br>- BackupSpec LabelSelector<br>- includeExcludePolicy"]
I{"Is resource<br>cluster-scoped?"}
J{"Is clusterScopedFilterPolicy<br>present?"}
K{"Is kind in resourceFilters?"}
L["Apply kind-specific filters:<br>- labelSelector / orLabelSelectors<br>- names / excludedNames"]
I -- Yes --> J
J -- Yes --> K
K -- Yes --> L
K -- No --> H
J -- No --> H
I -- No --> A
A --> B
B -- No --> C
B -- Yes --> D
D -- Match found --> E
E -- Yes --> F
E -- No --> G
D -- No match found --> H
```
### Data Flow in the Backup Pipeline
The existing backup pipeline has two stages: item collection and item backup. Namespace-scoped filters and fine-grained global filters are applied at both stages:
**Stage 1 — Item Collection (`item_collector.go`).** Resources are listed from the Kubernetes API.
- **Resource type check** in `getResourceItems()`: Before iterating namespaces, the global resource type check still applies.
- **For Cluster-Scoped Resources:** The global resource type check (`ShouldInclude`) determines if the kind is collected. `ClusterScopedFilterPolicy` does not skip unlisted cluster-scoped kinds at this stage.
- **For Namespace-Scoped Resources:** Within the namespace loop, a per-namespace resource type check is added. If a filter policy matches the current namespace, only resource kinds listed in `resourceFilters[].kinds` are included — if the current resource type is not listed, it is skipped for that namespace.
- **Label selector** in `listResourceByLabelsPerNamespace()` and `listResourceByLabelsGlobally()`: The function looks up the filter policy (either the namespace-specific one or the fine-grained global one). If found, it retrieves the `ResourceFilter` entry for the current resource kind and uses that entry's `labelSelector`/`orLabelSelectors` for the Kubernetes API list call. If no filter policy is found, the global selectors are used as before.
**Stage 2 — Item Backup (`item_backupper.go`).** Collected items are validated and written to the archive.
- **Name pattern check** in `itemInclusionChecks()`: After the existing namespace and resource type re-validation, the item's `metadata.name` is checked against the `ResourceFilter` entry's `names`/`excludedNames` glob patterns for the item's kind (checking the cluster-scoped map for cluster resources and namespace map for namespace resources). If the name doesn't match, the item is excluded.
- **Important:** If the item's kind is not listed in the namespace filter map **and** there is no catch-all entry, the item passes through Stage 2 without a name check. This is intentional — see [Plugin AdditionalItems and Auto-Backed Up CRDs](#edge-cases-and-behavior-documentation) below.
### Impact on Restore
The restore process (`pkg/restore/restore.go`) is **not modified** in this design. The reason:
- Restore reads the backup archive as-is. Items excluded by namespace-scoped filter policies during backup are simply absent from the archive. The restore process iterates what's in the tarball and applies `RestoreSpec` filters on top. No items excluded during backup will appear during restore.
- Restore plugins that request "additional items" (via `RestoreItemAction`) may reference items excluded from the backup. These items won't be in the archive, so the restore will skip them silently. This is the same behavior that occurs today with any incomplete backup — no new risk is introduced.
- Users can still use `RestoreSpec.IncludedNamespaces` to selectively restore from a namespace-filtered backup.
A follow-up design will add namespace-scoped filters to the restore pipeline.
### Edge Cases and Behavior Documentation
This section documents the system behavior in edge cases and error conditions:
**Plugin AdditionalItems and Auto-Backed Up CRDs:**
Cluster-scoped resources injected dynamically (such as `VolumeSnapshotClass` from the CSI plugin or `CustomResourceDefinition` from Velero's auto-backup loop) do not require hardcoded exceptions. In `itemInclusionChecks()`, Velero natively allows unlisted cluster-scoped resources to pass through unless explicitly excluded by the user. `ClusterScopedFilterPolicy` preserves this permissive behavior: if a dynamically injected cluster-scoped resource is NOT listed in the policy, it passes through untouched. If it IS listed, its specific `names` and `excludedNames` filters are strictly enforced.
**Plugin-injected namespace-scoped additional items** follow the same permissive principle. When a `BackupItemAction` returns additional items whose kind is not listed in the matched `namespacedFilterPolicies` entry and there is no catch-all entry, those items still pass through Stage 2 (`itemInclusionChecks`). This is intentional: blocking plugin-injected items at Stage 2 would break backup completeness — for example, a CSI plugin may inject a `VolumeSnapshotContent` that is required for a correct restore even when the user's filter policy only lists application resource types.
The kind-level exclusion that makes `namespacedFilterPolicies` an exclusive allowlist applies only during **Stage 1** (the primary collection pass in `item_collector.go`). At Stage 2, `itemInclusionChecks` enforces only:
- The `velero.io/exclude-from-backup=true` label (always takes precedence).
- The `names`/`excludedNames` patterns for **listed** kinds.
Plugin-injected items of unlisted kinds are therefore included as long as they are not explicitly excluded by label. Users who need to suppress a specific plugin-injected kind should apply the `velero.io/exclude-from-backup=true` label to those resources.
**Multiple Glob Patterns Matching Same Namespace (Incorrect Order):**
```yaml
namespacedFilterPolicies:
- namespaces: ["team-*"] # Broader pattern listed first
resourceFilters:
- kinds: [Deployment, Service]
- namespaces: ["team-frontend-*"] # More specific pattern listed second
resourceFilters:
- kinds: [ConfigMap, Secret, Deployment, Service]
```
**Behavior:** For namespace `team-frontend-prod`, the broader `team-*` pattern matches first, so only `Deployment` and `Service` are backed up. The more specific `team-frontend-*` rule is never reached.
**Multiple Glob Patterns Matching Same Namespace (Correct Order):**
```yaml
namespacedFilterPolicies:
- namespaces: ["team-frontend-*"] # More specific pattern listed first
resourceFilters:
- kinds: [ConfigMap, Secret, Deployment, Service]
- namespaces: ["team-*"] # Broader pattern listed second
resourceFilters:
- kinds: [Deployment, Service]
```
**Behavior:** For namespace `team-frontend-prod`, the specific `team-frontend-*` pattern matches first, backing up all specified resources. For `team-backend-dev`, the broader `team-*` pattern matches, backing up only `Deployment` and `Service`. This achieves the intended behavior.
**Namespace Included Globally But No Matching Filter Policy:**
```yaml
# BackupSpec includes "production" namespace
# ResourcePolicy has no namespacedFilterPolicies entry for "production"
```
**Behavior:** The namespace uses global filters exactly as it does today. This is the backward compatibility behavior — only namespaces with explicit filter policies get namespace-scoped filtering.
**Empty ResourceFilters Array:**
```yaml
namespacedFilterPolicies:
- namespaces: ["test-namespace"]
resourceFilters: [] # empty array
```
**Behavior:** Validation error during backup creation:
```
namespacedFilterPolicies[0]: at least one resourceFilter must be specified
```
**Namespace Pattern with No Matches:**
```yaml
namespacedFilterPolicies:
- namespaces: ["nonexistent-*"]
resourceFilters: [...]
```
**Behavior:** No error. The filter policy is loaded but never applied since no namespaces match the pattern. This allows for conditional filtering based on namespace existence.
**Resource Kind Not Present in Target Namespaces:**
```yaml
resourceFilters:
- kinds: ["StatefulSet"] # namespace has no StatefulSets
names: ["workload-1"]
```
**Behavior:** No error. The filter is applied but finds no matching resources. Empty result set is valid.
**Conflicting Name Patterns:**
```yaml
resourceFilters:
- kinds: ["ConfigMap"]
names: ["app-*"]
excludedNames: ["app-config"] # conflicts with names pattern
```
**Behavior:** The `excludedNames` takes precedence. Resources matching `app-*` are included, then `app-config` is excluded. Net result: includes `app-secret`, `app-data`, etc., but excludes `app-config`.
**Invalid Label Selector Syntax:**
```yaml
resourceFilters:
- kinds: ["Pod"]
labelSelector:
"invalid label key!": "value" # invalid key syntax
```
**Behavior:** Validation error during backup creation when `labels.SelectorFromSet()` fails:
```
namespacedFilterPolicies[0].resourceFilters[0]: invalid label selector: "invalid label key!" is not a valid label key
```
**Out-of-Scope Kinds in Filter Entries:**
A user may accidentally list a cluster-scoped kind (e.g., `ClusterRole`) inside a `namespacedFilterPolicies` entry, or a namespace-scoped kind (e.g., `ConfigMap`) inside `clusterScopedFilterPolicy`. The system silently ignores such entries at the Kubernetes API level — namespace-scoped items are never listed globally, and cluster-scoped items are never listed per-namespace, so no matching resources will ever be found. A warning is logged at backup start to help the user detect the misconfiguration. No validation error is raised — the entry is harmless but ineffective.
**Discovery Helper Unavailable:**
If the discovery helper is completely unavailable during backup initialization, the backup fails with:
```
failed to resolve namespace filter policies: discovery client unavailable
```
This is consistent with how other discovery-dependent features handle this error condition.
# Detailed Design
## ResourceFilter Field Notes
**`labelSelector`** supports equality-based selectors only (`key=value`). Set-based requirements (e.g., `environment in (prod, staging)`) are not supported. To match resources with any of several label combinations, use `orLabelSelectors` with multiple maps — each map is AND-evaluated internally, and the maps are OR-evaluated across the list. `labelSelector` and `orLabelSelectors` cannot co-exist in the same entry.
**`names` / `excludedNames`** accept exact resource names or glob patterns. If `names` is empty, all resource names are included (subject to label filters). `excludedNames` takes precedence over `names` when a name matches both.
## Validation
**Validation functions for `namespacedFilterPolicies`:**
1. **Each filter policy must specify at least one namespace:**
```
namespacedFilterPolicies[N]: at least one namespace must be specified
```
2. **Each filter policy must specify at least one resource filter:**
```
namespacedFilterPolicies[N]: at least one resourceFilter must be specified
```
3. **Each resource filter without kinds can only be defined once, and cannot specify names/excludedNames.**
```
namespacedFilterPolicies[N]: only one resource filter with empty kinds is allowed
namespacedFilterPolicies[N].resourceFilters[M]: names or excludedNames cannot be specified when kinds is empty
```
4. **No duplicate kinds across resource filter entries** within the same namespace filter:
```
namespacedFilterPolicies[N]: kind "Pod" appears in both resourceFilters[0] and resourceFilters[2]
```
5. **`labelSelector` and `orLabelSelectors` mutual exclusion** within each resource filter:
```
namespacedFilterPolicies[N].resourceFilters[M]: labelSelector and orLabelSelectors cannot co-exist
```
6. **No duplicate namespace patterns across filter policies.** This validates only exact duplicates - runtime behavior handles overlapping patterns.
**Rationale:** Detecting all possible pattern overlaps (like `team-*` vs `team-frontend-*`) is computationally complex and may reject valid configurations. Instead, the runtime uses first-match semantics - the first matching filter policy in the list is applied. This allows users flexibility while preventing obvious configuration errors.
7. **Namespace patterns must be valid globs.**
8. **Resource name patterns must be valid globs.**
9. **Resource kind validation with discovery helper** (performed during backup initialization).
**Validation functions for `clusterScopedFilterPolicy`:**
1. **At least one resourceFilter must be specified.**
```
clusterScopedFilterPolicy: at least one resourceFilter must be specified
```
2. **No duplicate kinds across resource filters.**
3. **`labelSelector` and `orLabelSelectors` mutual exclusion.**
4. **Resource name patterns must be valid globs.**
Additionally, in `backup_controller.go`, a validation check ensures that `namespacedFilterPolicies` and `clusterScopedFilterPolicy` are not used with old-style resource filters (`IncludedResources`/`ExcludedResources`/`IncludeClusterResources`), similar to the existing check for `includeExcludePolicy`.
## ConfigMap Examples
### Per-Namespace Resource Type Filtering
Back up only ConfigMaps, Secrets, and Deployments (with label `app=my-app`) from `ns-a`, but everything from `ns-b`:
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: backup-filter-policy
namespace: velero
data:
policy: |
version: v1
namespacedFilterPolicies:
- namespaces:
- ns-a
resourceFilters:
- kinds: [ConfigMap, Secret, Deployment]
labelSelector:
app: my-app
# ns-b has no filter policy entry, so global filters apply (include everything)
```
Backup CR referencing it:
```yaml
apiVersion: velero.io/v1
kind: Backup
metadata:
name: selective-backup
namespace: velero
spec:
includedNamespaces:
- ns-a
- ns-b
resourcePolicy:
kind: configmap
name: backup-filter-policy
storageLocation: default
ttl: 720h0m0s
```
### Per-Kind Label Selectors (Different Labels per Kind)
Back up Deployments with one label and StatefulSets with a different label from the same namespace:
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: vm-filter-policy
namespace: velero
data:
policy: |
version: v1
namespacedFilterPolicies:
- namespaces:
- target-namespace
resourceFilters:
- kinds: [Deployment]
labelSelector:
app: production-workload-1
- kinds: [StatefulSet]
labelSelector:
app: production-workload-2
```
### Per-Kind Exact Names
Back up specific Deployments, Configmaps, and Secrets by name:
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: named-resource-filter
namespace: velero
data:
policy: |
version: v1
namespacedFilterPolicies:
- namespaces:
- target-namespace
resourceFilters:
- kinds: [Deployment]
names: [workload-1, workload-2]
- kinds: [ConfigMap]
names: [p1, p2]
- kinds: [Secret]
names: [c1, c2]
```
### Name Pattern Filtering with Exclusion
Back up only `app-*` ConfigMaps and Secrets from `production`, excluding temporary and debug resources:
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: app-config-filter-policy
namespace: velero
data:
policy: |
version: v1
namespacedFilterPolicies:
- namespaces:
- production
resourceFilters:
- kinds: [ConfigMap, Secret]
names: ["app-*"]
excludedNames: ["*-tmp", "*-debug"]
```
### Catch-All with No Label Selector (Override-Only)
A user may want to use the global configuration for 99% of resources in a namespace, but only apply a specific name filter to a single kind. To achieve this without explicitly listing all other kinds or adding dummy labels, a catch-all filter without a label selector can be used:
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: override-only-filter-policy
namespace: velero
data:
policy: |
version: v1
namespacedFilterPolicies:
- namespaces:
- ns-a
resourceFilters:
- kinds: [Secret]
names: [my-secret] # Specific override for Secrets
- kinds: ["*"] # Catch-all: NO label selector
# Includes all other kinds unconditionally
```
**Result:**
- `Secret` resources: only `my-secret` is backed up.
- All other resource types: backed up unconditionally (acting like a global fallback).
### Catch-All Label Selector (Back Up Everything with a Specific Label)
When a user wants to back up any resource type in a namespace that carries a particular label — without enumerating every kind — the catch-all entry (empty `kinds` or `["*"]`) achieves this with a single rule:
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: label-based-filter-policy
namespace: velero
data:
policy: |
version: v1
namespacedFilterPolicies:
- namespaces:
- production
resourceFilters:
- kinds: ["*"] # catch-all: applies to every kind not listed below
labelSelector:
backup: "true" # back up any resource carrying this label
```
**Result:** Every resource type in `production` that has the label `backup=true` is backed up. Resources without that label are excluded. No kind enumeration is required.
### Catch-All with Per-Kind Name Overrides
A more advanced pattern: use exact names for specific kinds and fall back to a label selector for all remaining kinds. Kind-specific entries take precedence over the catch-all:
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: mixed-filter-policy
namespace: velero
data:
policy: |
version: v1
namespacedFilterPolicies:
- namespaces:
- production
resourceFilters:
- kinds: [Deployment]
names: [api-server, worker] # these exact Deployments by name
- kinds: [Secret]
names: [db-credentials, tls-cert] # these exact Secrets by name
- kinds: ["*"] # catch-all for all other kinds
labelSelector:
backup: "true" # back up by label
```
**Result:**
- `Deployment` resources: only `api-server` and `worker` are backed up (name filter; the catch-all does not apply).
- `Secret` resources: only `db-credentials` and `tls-cert` are backed up (name filter; the catch-all does not apply).
- All other resource types (ConfigMap, StatefulSet, Service, etc.): backed up only if they carry `backup=true`.
This pattern is useful when certain high-value resources need precise name-based selection, while the rest of the namespace is covered by a label convention.
### Glob Namespace Patterns and Ordering
Apply filters to namespaces matching patterns. **Critical: Order patterns from most specific to least specific:**
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: team-filter-policy
namespace: velero
data:
policy: |
version: v1
namespacedFilterPolicies:
# More specific patterns first
- namespaces:
- "team-frontend-prod" # Most specific (exact match)
resourceFilters:
- kinds: [Deployment, Service, ConfigMap, Secret, PVC]
- namespaces:
- "team-frontend-*" # Less specific (pattern match)
resourceFilters:
- kinds: [Deployment, Service, ConfigMap]
- namespaces:
- "team-*" # Least specific (broad pattern)
resourceFilters:
- kinds: [Deployment, Service]
```
**Pattern Matching Results:**
- `team-frontend-prod` → Uses exact match policy (backs up 5 resource types)
- `team-frontend-dev` → Uses `team-frontend-*` policy (backs up 3 resource types)
- `team-backend-test` → Uses `team-*` policy (backs up 2 resource types)
- `app-namespace` → No match, uses global filters
### Combined with Volume Policies
Both volume policies and namespace-scoped filters in the same ConfigMap:
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: combined-policy
namespace: velero
data:
policy: |
version: v1
volumePolicies:
- conditions:
capacity: "0,10Gi"
storageClass:
- standard
action:
type: fs-backup
- conditions:
capacity: "10Gi,100Gi"
action:
type: snapshot
namespacedFilterPolicies:
- namespaces:
- production
resourceFilters:
- kinds: [Deployment]
names: [workload-1, workload-2]
- kinds: [StatefulSet]
labelSelector:
app: my-app
- kinds: [ConfigMap, Secret]
names: ["app-*"]
excludedNames: ["*-tmp", "*-debug"]
```
### Backup CR — No ResourcePolicy (backward compatible)
Existing backups continue to work exactly as before:
```yaml
apiVersion: velero.io/v1
kind: Backup
metadata:
name: full-backup
namespace: velero
spec:
includedNamespaces:
- "*"
includedResources:
- "*"
labelSelector:
matchLabels:
backup: "true"
storageLocation: default
```
## CLI
### `velero backup describe`
The output is extended to display namespace-scoped filter policies when present in the ResourcePolicy ConfigMap:
```
Name: selective-backup
Namespace: velero
Labels: <none>
Annotations: <none>
Phase: Completed
Errors: 0
Warnings: 0
Namespaces:
Included: ns-a, ns-b
Excluded: <none>
Resources:
Included: *
Excluded: <none>
Cluster-scoped: auto
Label selector: <none>
Resource Policy: backup-filter-policy
Namespace-Scoped Filter Policies:
ns-a:
Resource Filters:
ConfigMap, Secret, Deployment:
Label selector: app=my-app
Included names: <none>
Excluded names: <none>
target-namespace:
Resource Filters:
Deployment:
Label selector: app=production-workload-1
Included names: <none>
Excluded names: <none>
StatefulSet:
Label selector: app=production-workload-2
Included names: <none>
Excluded names: <none>
production:
Resource Filters:
Deployment:
Label selector: <none>
Included names: [api-server, worker]
Excluded names: <none>
<catch-all> (all other kinds):
Label selector: backup=true
Included names: <none>
Excluded names: <none>
Fine-Grained Global Filter Policy:
Resource Filters:
ClusterRole, ClusterRoleBinding:
Label selector: <none>
Included names: [my-app-*]
Excluded names: <none>
CustomResourceDefinition:
Label selector: app=my-app
Included names: <none>
Excluded names: <none>
Storage Location: default
...
```
### `velero backup create`
No new CLI flags are added. The namespace-scoped filter policies are specified in the ResourcePolicy ConfigMap, which is already referenced via the existing `--resource-policies-configmap` flag:
```bash
velero backup create selective-backup \
--include-namespaces ns-a,ns-b \
--resource-policies-configmap backup-filter-policy
```
The `--help` output for `velero backup create` is updated to clarify the interaction between global and namespace-scoped filters:
```
Backup Filtering Options:
--include-namespaces stringArray namespaces to include in the backup (use '*' for all namespaces)
--exclude-namespaces stringArray namespaces to exclude from the backup
--include-resources stringArray resources to include in the backup, formatted as resource.group
--exclude-resources stringArray resources to exclude from the backup, formatted as resource.group
--include-cluster-resources optionalBool[=true] include cluster-scoped resources
--exclude-cluster-resources exclude cluster-scoped resources
--selector labelSelector only back up resources matching this label selector
--or-selector labelSelector back up resources matching any of the label selectors (can be repeated)
--resource-policies-configmap string reference to a configmap containing resource policies for volume snapshots and namespace-scoped filtering
Notes:
- Global filters (--include-resources, --selector, etc.) apply to all included namespaces
- Namespace-scoped filters defined in --resource-policies-configmap override global filters for matching namespaces
- Fine-grained global filter policies defined in --resource-policies-configmap override global filters for cluster-scoped resources
- Use 'velero backup describe' to view resolved filter policies after backup creation
```
### CLI Integration Points
**Backup Creation Workflow:**
1. User creates ResourcePolicy ConfigMap with `namespacedFilterPolicies`
2. User references ConfigMap via `--resource-policies-configmap` flag
3. Backup controller validates policies during backup initialization
4. Validation errors are reported immediately with specific line/field references
**Help and Discovery:**
- `velero backup create --help` includes updated filtering documentation
- `velero backup describe` shows resolved filter policies for troubleshooting
- Validation errors include ConfigMap field references for easy debugging
**Configuration Discovery:**
- `velero backup create --help` includes namespace-scoped filtering documentation
- `velero backup describe` shows resolved filter policies for verification
## User Perspective
This design provides fine-grained, per-namespace, per-kind control over backup filtering. Key user-facing aspects:
- **For users not using namespace-scoped filter policies**: Zero changes. All existing backups and workflows continue to work identically. The new YAML key is optional.
- **For users adopting namespace-scoped filter policies**: Create a ConfigMap with the `namespacedFilterPolicies` section and reference it via `BackupSpec.ResourcePolicy` (or the existing `--resource-policies-configmap` flag). The backup will selectively include/exclude resources per namespace based on the filter rules.
- **For users already using ResourcePolicy for volume policies**: Add the `namespacedFilterPolicies` section to the same ConfigMap. Both volume policies and namespace-scoped filters coexist.
- **For restore from a namespace-filtered backup**: No changes to restore workflow. Restore processes whatever is in the archive. Users can use existing `RestoreSpec.IncludedNamespaces` for additional filtering at restore time.
- **`velero backup describe` output**: Extended to show per-namespace, per-kind filter details when the ResourcePolicy ConfigMap contains `namespacedFilterPolicies`.
- **Validation errors**: Reported at backup start when the ResourcePolicy ConfigMap contains invalid `namespacedFilterPolicies` configurations. Consistent with how volume policy validation errors are reported today.
## Alternatives Considered
1. **CRD-Based `NamespacedFilters` Field**: Add `NamespacedFilters []NamespaceFilter` directly to `BackupSpec`. Rejected for this iteration due to heavy CRD change overhead. The ResourcePolicy approach achieves the same functionality with less API surface change.
2. **Flat Fields on NamespacedFilterPolicy (No Per-Kind Selectors)**: Use flat fields (`includedResources`, `labelSelector`, `includedResourceNames`) shared across all kinds within a namespace. Rejected because it cannot express per-kind label selectors or per-kind name lists — a critical requirement for workloads where different resource types have different labels or naming conventions.
3. **Scoped Label Selectors Only**: Augment existing label selectors with an optional namespace scope. Rejected because it only addresses label-scoped filtering and does not support per-namespace resource type filtering or name filtering.
4. **Global Name Filter Only**: Add only global name filter fields. Rejected because it only addresses name filtering globally and does not address namespace-scoped or kind-scoped filtering.
5. **Separate ConfigMap for Namespace Filters**: Use a new `BackupSpec` field pointing to a different ConfigMap (separate from volume policies). Rejected because it adds a new CRD field (which has similar reasons with #1) and splits configuration across multiple ConfigMaps.
Binary file not shown.

Before

Width:  |  Height:  |  Size: 498 KiB

-551
View File
@@ -1,551 +0,0 @@
# Block Data Mover Design
## Glossary & Abbreviation
**Backup Storage**: The storage to store the backup data. Check [Unified Repository design][1] for details.
**Backup Repository**: Backup repository is layered between BR data movers and Backup Storage to provide BR related features that is introduced in [Unified Repository design][1].
**Velero Generic Data Path (VGDP)**: VGDP is the collective of modules that is introduced in [Unified Repository design][1]. Velero uses these modules to finish data transfer for various purposes (i.e., PodVolume backup/restore, Volume Snapshot Data Movement). VGDP modules include uploaders and the backup repository.
**Velero Built-in Data Mover (VBDM)**: VBDM, which is introduced in [Volume Snapshot Data Movement design][2] and [Unified Repository design][1], is the built-in data mover shipped along with Velero, it includes Velero data mover controllers and VGDP.
**Data Mover Pods**: Intermediate pods which hold VGDP and complete the data transfer. See [VGDP Micro Service for Volume Snapshot Data Movement][3] for details.
**Change Block Tracking (CBT)**: CBT is the mechanism to track changed blocks, so that backups could back up the changed data only. CBT usually provides by the computing/storage platform.
**TCO**: Total Cost of Ownership. This is a general criteria for products/solutions, but also means a lot for BR solutions. For example, this means what kind of backup storage (and its cost) it requires, the retention policy of backup copies, the ways to remove backup data redundancy, etc.
**PodVolume Backup**: This is the Velero backup method which accesses the data from live file system, see [Kopia Integration design][1] for how it works.
**CAOS and CABS**: Content-Addressable Object Storage and Content-Addressable Block Storage, they are the parts from Kopia repository, see [Kopia Architecture][5].
## Background
Kubernetes supports two kinds of volume mode, `FileSystem` and `Block`, for persistent volumes. Underlyingly, the storage could use a block storage to provision either `FileSystem` mode or `Block` mode volumes; and the storage could use a file storage to provision `FileSystem` mode volumes.
For volumes provisioned by block storage, they could be backed up/restored from the block level, regardless the volume mode of the persistent volume.
On the other hand, as long as the data could be accessed from the file system, a backup/restore could be conducted from the file system level. That is to say `FileSystem` mode volumes could be backed up/restored from the file system level, regardless of the backend storage type.
Then if a `FileSystem` mode volume is provisioned by a block storage, the volume could be backed up/restored either from the file system level or block level.
For Velero, [CSI Snapshot Data Movement][2] which is implemented by VBDM, ships a file system uploader, so the backup/restore is done from file system only.
Once possible, block level backup/restore is better than file system level backup/restore:
- Block level backup could leverage CBT to process minimal size of data, so it significantly reduces the overhead to network, backup repository and backup storage. As a result, TCO is significantly reduced.
- Block level backup/restore is performant in throughput and resource consumption, because it doesn't need to handle the complexity of the file system, especially for the case that huge number of small files in the file system.
- Block level backup/restore is less OS dependent because the uploader doesn't need the OS to be aware of the file system in the volume.
At present, [Kubernetes CBT API][4] is mature and close to Beta stage. Many platform/storage has supported/is going to support it.
Therefore, it is very important for Velero to deliver the block level backup/restore and recommend users to use it over the file system data mover as long as:
- The volume is backed by block storage so block level access is possible
- The platform supports CBT
Meanwhile, file system level backup/restore is still valuable for below scenarios:
- The volume is backed by file storage, e.g., AWS EFS, Azure File, CephFS, VKS File Volume, etc.
- The volume is backed by block storage but CBT is not available
- The volume doesn't support CSI snapshot, so Velero PodVolume Backup method is used
There are rich features delivered with VGDP, VBDM and [VGDP micro service][3], to reuse these features, block data mover should be built based on these modules.
Velero VBDM supports linux and Windows nodes, however, Windows container doesn't support block mode volumes, so backing up/restoring from Windows nodes is not supported until Windows container removes this limitation. As a result, if there are both linux and Windows nodes in the cluster, block data mover can only run in linux nodes.
Both the Kubernetes CBT service and Velero work in the boundary of the cluster, even though the backend storage may be shared by multiple clusters, Velero can only protection workloads in the same cluster where it is running.
## Goals
Add a block data mover to VBDM and support block level backup/restore for [CSI Snapshot Data Movement][2], which includes:
- Support block level full backup for both `FileSystem` and `Block` mode volumes
- Support block level incremental backup for both `FileSystem` and `Block` mode volumes
- Support block level restore from full/incremental backup for both `FileSystem` and `Block` mode volumes
- Support block level backup/restore for both linux and Windows workloads from linux cluster nodes
- Support all existing features, i.e., load concurrency, node selection, cache volume, deduplication, compression, encryption, etc. for the block data mover
- Support volumes processed from file system level and block level in the same backup/restore
## Non-Goals
- PodVolume Backup does the backup/restore from file system level only, so block level backup/restore is not supported
- Volumes that are backed by file system storages, can only be backed up/restored from file system level, so block level backup/restore is not supported
- Backing up/restoring from Windows nodes is not supported
- Block level incremental backup requires special capabilities of the backup repository, and Velero [Unified Repository][1] supports multiple kinds of backup repositories. The current design focus on Kopia repository only, block level incremental backup support of other repositories will be considered when the specific backup repository is integrated to [Velero Unified Repository][1]
## Architecture
### Data Path
Below shows the architecture of VGDP when integrating to Unified Repository (implemented by Kopia repository).
A new block data mover will be added besides the existing file system data mover, the both data movers read/write data from/to the same backup repository through Unified Repo interface.
Unified Repo interface and the backup repository needs to be enhanced to support incremental backups.
![Data path overview](data-path-overview.png)
For more details of VGDP architecture, see [Unified Repository design][1], [Volume Snapshot Data Movement design][2] and [VGDP Micro Service for Volume Snapshot Data Movement][3].
### Backup
Below is the architecture for block data mover backup which is developed based on the existing VBDM:
![Backup architecture](backup-architecture.png)
The existing VBDM is reused, below are the major changes based on the existing VBDM:
**Exposer**: Exposer needs to create block mode backupPVC all the time regardless of the sourcePVC mode.
**CBT**: This is a new layer to retrieve, transform and store the changed blocks, it interacts with CSI SnapshotMetadataService through gRPC.
**Uploader**: A new block uploader is added. It interacts with CBT layer, holds special logics to make performant data read from block devices and holds special logics to write incremental data to Unified Repository.
**Extended Kopia repo**: A new Incremental Aware Object Extension is added to Kopia's CAOS, so as to support incremental data write. Other parts of Kopia repository, including the existing CAOS and CABS, are not changed.
### Restore
Below is architecture for block data mover restore which is developed based on the existing VBDM:
![Restore architecture](restore-architecture.png)
The existing VBDM is reused, below are the major changes based on the existing VBDM:
**Exposer**: While the restorePV is in block mode, exposer needs to rebind the restorePV to a targetPVC in either file system mode or block mode.
**Uploader**: The same block uploader holds special logics to make performant data write to block devices and holds special logics to read data from the backup chain in Unified repository.
For more details of VBDM, see [Volume Snapshot Data Movement design][2].
## Detailed Design
### Selectable Data Mover Type
#### Per Backup Selection
At present, the backup accepts a `DataMover` parameter and when its value is empty or `velero`, VBDM is used.
After block data mover is introduced, VBDM will have two types of data movers, Velero file system data mover and Velero block data mover.
A new type string `velero-block` is introduced for Velero block data mover, that is, when `DataMover` is set as `velero-block`, Velero block data mover is used.
Another new value `velero-fs` is introduced for Velero file system data mover, that is, when `DataMover` is set as `velero-fs`, Velero file system data mover is used.
For backwards compatibility consideration, `velero` is preserved a valid value, it refers to the default data mover, and the default data mover may change among releases. At present, Velero file system data mover is the default data mover; we can change the default one to Velero block data mover in future releases.
#### Volume Policy
It is a valid case that users have multiple volumes in a single backup, while they want to use Velero file system data mover for some of the volumes and use Velero block data mover for some others.
To meet this requirement, a combined solution of Per Backup Selection and Volume Policy is used.
Here are the data structs for VolumePolicy:
```go
type volPolicy struct {
action Action
conditions []volumeCondition
}
type volumeCondition interface {
match(v *structuredVolume) bool
validate() error
}
type structuredVolume struct {
capacity resource.Quantity
storageClass string
nfs *nFSVolumeSource
csi *csiVolumeSource
volumeType SupportedVolume
pvcLabels map[string]string
pvcPhase string
}
type Action struct {
Type VolumeActionType `yaml:"type"`
Parameters map[string]any `yaml:"parameters,omitempty"`
}
const (
ConfigmapRefType string = "configmap"
Skip VolumeActionType = "skip"
FSBackup VolumeActionType = "fs-backup"
Snapshot VolumeActionType = "snapshot"
)
```
`action.parameters` is used to provide extra information of the action. This is an ideal place to differentiate Velero file system data mover and Velero block data mover.
Therefore, Velero built-in data mover will support `dataMover` key in `parameters`, with the value either `velero-fs` or `velero-block`. While `velero-fs` and `velero-block` are with the same meaning with Per Backup Selection.
As an example, here is how a user might use both `velero-block` and `velero-fs` in a single backup:
- Users set `DataMover` parameter for the backup as `velero-block`
- Users add a record into Volume Policy, make `conditions` to filter the volumes they want to backup through Velero file system data mover, make `action.type` as `snapshot` and insert a record into `action.parameter` as `dataMover:velero-fs`
In this way, all volumes matched by `conditions` will be backed up with Velero file system data mover; while the others will fallback to the per backup method Velero block data mover.
Vice versa, users could set the per backup method as file system data mover and select volumes for Velero block data mover.
The selected data mover for each volume should be recorded to `volumeInfo.json`.
### Controllers
Backup controller and Restore controller are kept as is, async operations are still used to interact with VBDM with block data mover.
DataUpload controller and DataDownload controller are almost kept as is, with some minor changes to handle the data mover type and backup type appropriately and convey it to the exposers. With [VGDP Micro Service][3], the controllers are almost isolated from VGDP, so no major changes are required.
### Exposer
#### CSI Snapshot Exposer
The existing CSI Snapshot Exposer is reused with some changes to decide the backupPVC volume mode by access mode. Specifically, for Velero block data mover, access mode is always `Block`, so the backupPVC volume mode is always `Block`.
Once the backupPVC is created with correct volume mode, the existing code could create the backupPod and mount the backupPVC appropriately.
#### Generic Restore Exposer
The existing Generic Restore Exposer is reused, but the workflow needs some changes.
For block data mover, the restorePV is in Block mode all the time, whereas, the targetPVC may be in either file system mode or block mode.
However, Kubernetes doesn't allow to bound a PV to a PVC with mismatch volume mode.
Therefore, the workflow of ***Finish Volume Readiness*** as introduced in [Volume Snapshot Data Movement design][2] is changed as below:
- When restore completes and restorePV is created, set restorePV's `deletionPolicy` to `Retain`
- Create another rebindPV and copy restorePV's `volumeHandle` but the `volumeMode` matches to the targetPVC
- Delete restorePV
- Set the rebindPV's claim reference (the ```claimRef``` filed) to targetPVC
- Add the ```velero.io/dynamic-pv-restore``` label to the rebindPV
In this way, the targetPVC will be bound immediately by Kubernetes to rebindPV.
These changes work for file system data mover as well, so the old workflow will be replaced, only the new workflow is kept.
### VGDP
Below is the VGDP workflow during backup:
![VGDP Backup](vgdp-backup.png)
Below is the VGDP workflow during restore:
![VGDP Restore](vgdp-restore.png)
#### Unified Repo
For block data mover, one Unified Repo Object is created for each volume, and some metadata is also saved into Unified Repo to describe the volume.
During the backup, the write conducts a skippable-write manner:
- For the data range that the write does not skip, object is written with the real data
- For the data range that is skipped, the data is either filled as ZERO or cloned from the parent object. Specifically, for a full backup, data is filled as ZERO; for an incremental backup, data is cloned from the parent object
To support incremental backup, `ObjectWriter` interface needs to extend to support `io.WriterAt`, so that uploader could conduct a skippable-write manner:
```go
type ObjectWriter interface {
io.WriteCloser
io.WriterAt
// Seeker is used in the cases that the object is not written sequentially
io.Seeker
// Checkpoint is periodically called to preserve the state of data written to the repo so far.
// Checkpoint returns a unified identifier that represent the current state.
// An empty ID could be returned on success if the backup repository doesn't support this.
Checkpoint() (ID, error)
// Result waits for the completion of the object write.
// Result returns the object's unified identifier after the write completes.
Result() (ID, error)
}
```
To clone data from parent object, the caller needs to specify the parent object. To support this, `ObjectWriteOptions` is extended with `ParentObject`.
The existing `AccessMode` could be used to indicate the data access type, either file system or block:
```go
// ObjectWriteOptions defines the options when creating an object for write
type ObjectWriteOptions struct {
FullPath string // Full logical path of the object
DataType int // OBJECT_DATA_TYPE_*
Description string // A description of the object, could be empty
Prefix ID // A prefix of the name used to save the object
AccessMode int // OBJECT_DATA_ACCESS_*
BackupMode int // OBJECT_DATA_BACKUP_*
AsyncWrites int // Num of async writes for the object, 0 means no async write
ParentObject ID // the parent object based on which incremental write will be done
}
```
To support non-Kopia uploader to save snapshots to Unified Repo, snapshot related methods will be added to `BackupRepo` interface:
```go
// SaveSnapshot saves a repo snapshot
SaveSnapshot(ctx context.Context, snapshot Snapshot) (ID, error)
// GetSnapshot returns a repo snapshot from snapshot ID
GetSnapshot(ctx context.Context, id ID) (Snapshot, error)
// DeleteSnapshot deletes a repo snapshot
DeleteSnapshot(ctx context.Context, id ID) error
// ListSnapshot lists all snapshots in repo for the given source (if specified)
ListSnapshot(ctx context.Context, source string) ([]Snapshot, error)
```
To support non-Kopia uploader to save metadata, which is used to describe the backed up objects, some metadata related methods will be added to `BackupRepo` interface:
```go
// WriteMetadata writes metadata to the repo, metadata is used to describe data, e.g., file system
// dirs are saved as metadata
WriteMetadata(ctx context.Context, meta *Metadata, opt ObjectWriteOptions) (ID, error)
// ReadMetadata reads a metadata from repo by the metadata's object ID
ReadMetadata(ctx context.Context, id ID) (*Metadata, error)
```
kopia-lib for Unified Repo will implement these interfaces by calling the corresponding Kopia repository functions.
### Kopia Repository
CAOS of Kopia repository implements Unified Repo's Objects. However, CAOS supports full and sequential write only.
To make it support skippable write, a new Incremental Aware Object Extension is created based on the existing CAOS.
#### Block Address Table
Kopia CAOS uses Block Address Table (BAT) to track objects. It will be reused for both full backups and incremental backups.
![Incremental Aware Object Extension](caos-extension.png)
For Incremental Aware Object Extension, one object represents one volume.
For full backup, the skipped areas will be written as all ZERO by Incremental Aware Object Extension, since Kopia repository's interface doesn't support skippable write. But it is fine, the ZERO data will be deduplicated by Kopia repository so nothing is actually written to the backup storage.
For incremental backup, Incremental Aware Object Extension clones the table entries from the parent object for the skipped areas; for the written area, Incremental Aware Object Extension writes the data to Kopia repository and generate new entries. Finally, Incremental Aware Object Extension generates a new block address table for the incremental object which covers its entire logical space.
Incremental Aware Object Extension is automatically activated for block mode data access as set by `AccessMode` of `ObjectWriteOptions`.
#### Deduplication
The Incremental Aware Object Extension uses fix-sized splitter for deduplication, this is good enough for block level backup, reasons:
- Not like a file, a disk write never inserts data to the middle of the disk, it only does in-place update or append. So the data never shifts between two disks or the same disk of two different backups
- File system IO to disk general aligned to a specific size, e.g., 4KB for NTFS and ext4, as long as the chunk size is a multiply of this size, it effectively reduces the case that one IO kills two deduplication chunks
- For the usage cases that the disk is used as raw block device without a file system, the IO is still conducted by aligning to a specific boundary
The chunk size is intentionally chosen as 1MB, reasons:
- 1MB is a multiply of 4KB for file systems or common block sizes for raw block device usages
- 1MB is the start boundary of partitions for modern operating systems, for both MBR and GPT, so partition metadata could be isolated to a separate chunk
- The more chunks are there, the more indexes in the repository, 1MB is a moderate value regarding to the overhead of indexes for Kopia repository
#### Benefits
Since the existing block address table(BAT) of CAOS is reused and kept as is, it brings below benefits:
- All the entries are still managed by Kopia CAOS, so Velero doesn't need to keep an extra data
- The objects written by Velero block uploader is still recognizable by Kopia, for both full backup and incremental backup
- The existing data management in Kopia repository still works for objects generated by Velero block uploader, e.g., snapshot GC, repository maintenance, etc.
Most importantly, this solution is super performant:
- During incremental write, it doesn't copy any data from the parent object, instead, it only clones object block address entries
- During backup deletion, it doesn't need to move any data, it only deletes the BAT for the object
#### Uploader behavior
The block uploader's skippable write must also be aligned to this 1MB boundary, because Incremental Aware Object Extension needs to clone the entries that have been skipped from the parent object.
File system uploader is still using variable-sized deduplication, it is fine to keep data from the two uploaders into the same Kopia repository, though normally they won't be mutually deduplicated.
Volume could be resized; and volume size may not be aligned to 1MB boundary. The uploader need to handle the resize appropriately since Incremental Aware Object Extension cannot copy a BAT entry partially.
#### CBT Layer
CBT provides below functionalities:
1. For a full backup, it provides the allocated data ranges. E.g., for a 1TB volume, there may be only 1MB of files, with this functionality, the uploader could skip the ranges without real data
2. For an incremental backup, it provides the changed data ranges based on the provided parent snapshot. In this way, the uploader could skip the unchanged data and achieves an incremental backup
For case 1, the uploader calls Unified Repo Object's `WriteAt` method with the offset for the allocated data, ranges ahead of the offset will be filled as ZERO by unified repository.
For case 2, the uploader calls Unified Repo Object's `WriteAt` method with the offset for the changed data, ranges ahead of the offset will be cloned from the parent object unified repository.
A changeId is stored with each backup, the next backup will retrieve the parent snapshot's changeId and use it to retrieve the CBT.
The CBT retrieved from Kubernetes API are a list of `BlockMetadata`, each of range could be with fixed size or variable size.
Block uploader needs to maintain its own granularity that is friendly to its backup repository and uploader, as mentioned above.
From Kubernetes API, `GetMetadataAllocated` or `GetMetadataDelta` are called looply until all `BlockMetadata` are retrieved.
On the other hand, considering the complexity in uploader, e.g., multiple stream between read and write, the workflow should be driven by the uploader instead of the CBT iterator, therefore, in practice, all the allocated/changed blocks should be retrieved and preserved before passing it to the uploader.
As another fact, directly saving `BlockMetadata` list will be memory consuming.
With all the above considerations, the `Bitmap` data structure is used to save the allocated/changed blocks, calling CBT Bitmap.
CBT Bitmap chunk size could be set as 1MB or a multiply of it, but a larger chunk size would amplify the backup size, so 1MB size will be use.
Finally, interactions among CSI Snapshot Metadata Service, CBT Layer and Uploader is like below:
![CBT Layer](cbt.png)
In this way, CBT layer and uploader are decoupled and CBT bitmap plays as a north bound parameter of the uploader.
#### Block Uploader
Block uploader consists of the reader and writer which are running asynchronously.
During backup, reader reads data from the block device and also refers to CBT Bitmap for allocated/changed blocks; writer writes data to the Unified Repo.
During restore, reader reads data from the Unified Repo; writer writes data to the block device.
Reader and writer connects by a ring buffer, that is, reader pushes the block data to the ring buffer and writer gets data from the ring buffer and write to the target.
To improve performance, block device is opened with direct IO, so that no data is going through the system cache unnecessarily.
During restore, to optimize the write throughput and storage usage, zero blocks should be either skipped (for restoring to a new volume) or unmapped (for restoring to an existing volume). To cover the both cases in a unified way, the SCSI command `WRITE_SAME` is used. Logics are as below:
- Detect if a block read from the backup is with all zero data
- If true, the uploader sends `WRITE_SAME` SCSI command by calling `BLKZEROOUT` ioctl
- If the call fails, the uploader fallbaks to use the conservative way to write all zero bytes to the disk
Uploader implementation is OS dependent, but since Windows container doesn't support block volumes, the current implementation is for linux only.
#### ChangeId
ChangeId identifies the base that CBT is generated from, it must strictly map to the parent snapshot in the repository. Otherwise, there will be data corruption in the incremental backup.
Therefore, ChangeId is saved together with the repository snapshot.
The data mover always queries parent snapshot from Unified Repo together with the ChangeId. In this way, no mismatch would happen.
Inside the uploader, the upper layer (DataUpload controller) could also provide the ChangeId as a mechanism of double confirmation. The received ChangeId would be re-evaluated against the one in the provided snapshot.
For Kubernetes API, changeId is represented by `BaseSnapshotId`.
changeId retrieval is storage specific, generally, it is retrieved from the `SnapshotHandle` of the VolumeSnapshotContent object; however, storages may also refer to other places to retrieve the changeId.
That is, `SnapshotHandle` and changeId may be two different values, in this case, the both values need to be preserved.
#### Volume Snapshot Retention
Storages/CSI drivers may support the changeId differently based on the storage's capabilities:
1. In order to calculate the changes, some storages require the parent snapshot mapping to the changeId always exists at the time of `GetMetadataDelta` is called, then the parent snapshot can NOT be deleted as long as there are incremental backups based on it.
2. Some storages don't require the parent snapshot itself at the time of calculating changes, then parent snapshot could be deleted immediately after the parent backup completes.
The existing exposer works perfectly with Case 1, that is, the snapshot is always deleted when the backup completes.
However, for Case 2, since the snapshot must be retained, the exposer needs changes as below:
- At the end of each backup, keep the current VolumeSnapshot's `deletionPolicy` as `Retain`, then when the VolumeSnapshot is deleted at the end of the backup, the current snapshot is retained in the storage
- `GetMetadataDelta` is called with `BaseSnapshotId` set as the preserved changeId
- When deleting a backup, a VolumeSnapshot-VolumeSnapshotContent pair is rebuilt with `deletionPolicy` as `delete` and `snapshotHandle` as the preserved one
- Then the rebuilt VolumeSnapshot is deleted so that the volume snapshot is deleted from the storage
There is no way to automatically detect which way a specific volume support, so an interface is exposed to users to set the volume snapshot retention method.
The interface could be added to the `Action.Parameters` of Volume Policy. By default, Velero block data mover takes Way 1, so volume snapshot is never retained; if users specify `RetainSnapshot` parameter, Way 2 will be taken.
```go
type Action struct {
Type VolumeActionType `yaml:"type"`
Parameters map[string]any `yaml:"parameters,omitempty"`
}
```
In this way, users could specify --- for storage class "xxx" or CSI driver "yyy", backup through CSI snapshot with Velero block data mover and retain the snapshot.
#### Incremental Size
By the end of the backup, incremental size is also returned by the uploader, as same as Velero file system uploader. The size indicates how much data are unique so processed by the uploader, based on the provided CBT.
### Fallback to Full Backup
There are some occasions that the incremental backup won't continue, so the data mover fallbacks to full backup:
- `GetMetadataAllocated` or `GetMetadataDelta` returns error
- ChangeId is missing
- Parent snapshot is missing
When the fallback happens, the volume will be fully backed up from block level, but since because of the data deduplication from the backup repository, the unallocated/unchanged data would be probably deduplicated.
During restore, the volume will also be fully restored. The zero blocks handling as mentioned above is still working, so that write IO for unallocated data would be probably eliminated.
Fallback is to handle the exceptional cases, for most of the backups/restores, fallback is never expected.
### Irregular Volume Size
As mentioned above, during incremental backup, block uploader IO should be restricted to be aligned to the deduplication chunk size (1MB); on the other hand, there is no hard limit for users' volume size to be aligned.
To support volumes with irregular size, below measures are taken:
- Volume objects in the repository is always aligned to 1MB
- If the volume size is irregular, zero bytes will be padded to the tail of the volume object
- A real size is recorded in the repository snapshot
- During restore, the real size of data is restored
The padding must be always with zero bytes.
### Volume Size Change
Incremental backup could continue when volume is resized.
Block uploader supports to write disk with arbitrary size.
The volume resize cases don't need to be handled case by case.
Instead, when volume resize happens, block uploader needs to handle it appropriately in below ways:
- Loop with CBT
- Read data between RoundDownTo1M(newSize) and newSize to get the tail data
- If there is no tail data, which means the volume size is aligned to 1MB, then call `WriteAt(newSize, nil)`
- Otherwise, call `WriteAt(RoundDownTo1M(newSize), taildata)`, `taildata` is also padded to 1MB
That is to say:
- If CBT covers the tail of the volume, loop with CBT is enough for both shrink and expand case
- Otherwise, if volume is expanded, `WriteAt` guarantees to clone appropriate objects entries from the parent object and append zero data for the expanded areas. Particularly, if the parent volume is not in regular size, the zero padding bytes is also reused. Therefore, the parent object's padding bytes must be zero
- In the case the volume is shrunk, writing the tail data makes sure zero bytes are padding to the new volume object instead of inheriting non-zero data from the parent object
### Cancellation
The existing Cancellation mechanism is reused, so there is no change outside of the block uploader.
Inside the uploader, cancellation checkpoints are embedded to the uploader reader and writer, so that the execution could quit in a reasonable time once cancellation happens.
### Parallelism
Parallelism among data movers will reuse the existing mechanism --- load concurrency.
Inside the data mover, uploader reader and writer are always running in parallel. The number of reader and writer is always 1.
Sequential read/write of the volume is always optimized, there is no prove that multiple readers/writers are beneficial.
### Progress Report
Progress report outside of the data mover will reuse the existing mechanism.
Inside the data mover, progress update is embedded to the uploader writer.
The progress struct is kept as is, Velero block data mover still supports `TotalBytes` and `BytesDone`:
```go
type Progress struct {
TotalBytes int64 `json:"totalBytes,omitempty"`
BytesDone int64 `json:"doneBytes,omitempty"`
}
```
By the end of the backup, the progress for block data mover provides the same `GetIncrementalSize` which reports the incremental size of the backup, so that the incremental size is reported to users in the same way as the file system data mover.
### Selectable Backup Type
For many reasons, a periodical full backup is required:
- From user experience, a periodical full is required to make sure the data integrity among the incremental backups, e.g., every 1 week or 1 month
Therefore, backup type (full/incremental) should be supported in Velero's manual backup and backup schedule.
Backup type will also be added to `volumeInfo.json` to support observability purposes.
Backup TTL is still used for users to specify a backup's retention time. By default, both full and incremental backups are with 30 days retention, even though this is not so reasonable for the full backups. This could be enhanced when Velero supports sophisticated retention policy.
As a workaround, users could create two schedules for the same scope of backup, one is for full backups, with less frequency and longer backup TTL; the other one is for incremental backups, with normal frequency and shorter backup TTL.
#### File System Data Mover
At present, Velero file system data mover doesn't support selectable backup type, instead, incremental backups are always conducted once possible.
From user experience this is not reasonable.
Therefore, to solve this problem and to make it align with Velero block data mover, Velero file system data mover will support backup type as well.
At present, the data path for Velero file system data mover has already supported it, we only need to expose this functionality to users.
### Backup Describe
Backup type should be added to backup description, there are two appearances:
- The `backupType` in the Backup CR. This is the selected backup type by users
- The backup type recorded in `volumeInfo.json`, which is the actual type taken by the backup
With these two values, users are able to know the actual backup type and also whether a fallback happens.
The `DataMover` item in the existing backup description should be updated to reflect the actual data mover completing the backup, this information could be retrieved from `volumeInfo.json`.
### Backup Sync
No more data is required for sync, so Backup Sync is kept as is.
### Backup Deletion
As mentioned above, no data is moved when deleting a repo snapshot for Velero block data mover, so Backup Deletion is kept as is regarding to repo snapshot; and for volume snapshot retention case, backup deletion logics will be modified accordingly to delete the retained snapshots.
### Restarts
Restarts mechanism is reused without any change.
### Logging
Logging mechanism is not changed.
### Backup CRD
A `backupType` field is added to Backup CRD, two values are supported `full` or `incremental`.
`full` indicates the data mover to take a full backup.
`incremental` which is the default value, indicates the data mover to take an incremental backup.
```yaml
spec:
description: BackupSpec defines the specification for a Velero backup.
properties:
backupType:
description: BackupType indicates the type of the backup
enum:
- full
- incremental
type: string
```
### DataUpload CRD
A `parentSnapshot` field is added to the DataUpload CRD, below values are supported:
- `""`: it fallbacks to `auto`
- `auto`: it means the data mover finds the recent snapshot of the same volume from Unified Repository and use it as the parent
- `none`: it means the data mover is not assigned with a parent snapshot, so it runs a full backup
- a specific snapshotID: it means the data mover use the specific snapshotID to find the parent snapshot. If it cannot be found, the data mover fallbacks to a full backup
The last option is for a backup plan, it will not be used for now and may be useful when Velero supports sophisticated retention policy. This means, Velero always finds the recent backup as the parent.
When `backupType` of the Backup is `full`, the data mover controller sets `none` to `parentSnapshot` of DataUpload.
When `backupType` of the Backup is `incremental`, the data mover controller sets `auto` to `parentSnapshot` of DataUpload. And `""` is just kept for backwards compatibility consideration.
```yaml
spec:
description: DataUploadSpec is the specification for a DataUpload.
properties:
parentSnapshot:
description: |-
ParentSnapshot specifies the parent snapshot that current backup is based on.
If its value is "" or "auto", the data mover finds the recent backup of the same volume as parent.
If its value is "none", the data mover will do a full backup
If its value is a specific snapshotID, the data mover finds the specific snapshot as parent.
type: string
```
### DataDownload CRD
No change is required to DataDownload CRD.
## Plugin Data Movers
The current design doesn't break anything for plugin data movers.
The enhancement in VolumePolicy could also be used for plugin data movers. That is, users could select a plugin data mover through VolumePolicy as same as Velero built-in data movers.
## Installation
No change to Installation.
## Upgrade
No impacts to Upgrade. The new fields in the CRDs are all optional fields and have backwards compatible values.
## CLI
Backup type parameter is added to Velero CLI as below:
```
velero backup create --full
velero schedule create --full
```
When the parameter is not specified, by default, Velero goes with incremental backups.
[1]: ../Implemented/unified-repo-and-kopia-integration/unified-repo-and-kopia-integration.md
[2]: ../Implemented/volume-snapshot-data-movement/volume-snapshot-data-movement.md
[3]: ../Implemented/vgdp-micro-service/vgdp-micro-service.md
[4]: https://kubernetes.io/blog/2025/09/25/csi-changed-block-tracking/
[5]: https://kopia.io/docs/advanced/architecture/
Binary file not shown.

Before

Width:  |  Height:  |  Size: 518 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 377 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 389 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 476 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 547 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 504 KiB

-93
View File
@@ -1,93 +0,0 @@
# Add custom volume policy action
## Abstract
Currently, velero supports 3 different volume policy actions:
snapshot, fs-backup, and skip, which tell Velero how to handle backing
up PVC contents. Any other policy action is not allowed. This prevents
third party BackupItemAction (BIA) plugins which might want to perform
different actions on PVC via defined volume policies.
## Background
An external BIA plugin that wants to back up volumes via some custom
means (i.e. not CSI snapshots or fs-backup with kopia) is not able to
make use of the existing volume policy API. While the plugin could use
something like PVC annotations instead, this won't integrate with
existing volume policies, which is desirable in case the user wants to
specify some PVCs to use the custom plugin while leaving others using
CSI snapshots or fs-backup.
## Goals
- Add a fourth valid volume policy action "custom"
- Make use of the existing action parameters field to distinguish between multiple custom actions.
## Non Goals
- Implementing custom action logic in velero repo
## High-Level Design
A new VolumeActionType with the value "custom" will be added to
`internal/resourcepolicies`. When the action is "custom", velero will
not perform a snapshot or use fs-backup on the PVC. If there is no
registered plugin which implements the desired custom action, then it
will be equivalent to the "skip" action. Since there could be
different plugins that implement custom actions, when making the API
call (defined below) the plugin should also pass in a partial action
parameters map containing the portion of the map that identifies the
custom plugin as belonging to a particular external
implementation. For example, there might be a custom BIA that's
looking for a `custom` volume policy action with the parameter
`myCustomAction=true`. The volume policy action would be defined like
this:
```yaml
action:
type: custom
parameters:
myCustomAction: true
```
In `internal/volumehelper/volume_policy_helper.go` a new interface
method will be added, similar to `ShouldPerformSnapshot` but it takes
a partial parameter map as an additional input, since for custom
actions to match, the action type must be `custom`, but also there may
be some parameter that needs to match (to distinguish between
different custom actions). We also want a way for the plugin to get
the parameter map for the action. This should probably just return the
map rather than the Action struct is under `internal`.
```go
type VolumeHelper interface {
ShouldPerformSnapshot(obj runtime.Unstructured, groupResource schema.GroupResource) (bool, error)
ShouldPerformFSBackup(volume corev1api.Volume, pod corev1api.Pod) (bool, error)
ShouldPerformCustomAction(obj runtime.Unstructured, groupResource schema.GroupResource, map[string]any) (bool, error)
GetActionParameters(obj runtime.Unstructured, groupResource schema.GroupResource) (map[string]any, error)
}
```
In addition, since the VolumeHelper interface is expected to be called by external plugins, the interface (but not the implementation) should be moved from `internal/volumehelper` to `pkg/util/volumehelper`.
In `pkg/plugin/utils/volumehelper/volume_policy_helper.go`, a new helper func will be added which delegates to the internal volumehelper.NewVolumeHelperImplWithNamespaces
```go
func NewVolumeHelper(
volumePolicy *resourcepolicies.Policies,
snapshotVolumes *bool,
logger logrus.FieldLogger,
client crclient.Client,
defaultVolumesToFSBackup bool,
backupExcludePVC bool,
namespaces []string,
) (VolumeHelper, error) {
```
## Alternative Considered
An alternate approach was to create a new server arg to allow
user-defined parameters. That was rejected in favor of this approach,
as the explicitly-supported "custom" option integrates more easily
into a supportable plugin-callable API.

Some files were not shown because too many files have changed in this diff Show More