Mirror of https://github.com/vmware-tanzu/velero.git (synced 2026-04-17 22:21:05 +00:00)

Compare commits: fix_e2e_ve...dependabot (164 commits)
Commit SHA1s (164):
ac1f6e7f3e, 8e9e6b4d36, 71b230f82e, fc6361ba06, 39db9f9c1e, 4d9bd91200, c5fa50bedc, df2686c146,
8a6ac7af1c, a990bd81f1, 15db9d2552, 1b4c7fe4be, 5b9bcc99f1, e921c177cc, cf605c948e, cd89c0ffa7,
eb0a1814c6, eaef4ead42, 7562011b79, 4a6756d57b, e1cc07cec3, 1730b7f414, 37abfb4bfa, 0cf8f94268,
1b5503e20b, e439977117, dd82645909, 22f93ad457, 9598c50295, 54761092c1, dca3d3001f, e8fa708933,
fca4d405b1, d3f4b2c67e, 235e579581, dd1def9d33, baf2491344, e79ad64a10, e9226527de, 78fba2146c,
4dbdd2df3a, 6869b7bf54, 30ddf3f35f, 38d9e96130, 531fc4810f, 94259e8a5c, 5433eb3081, 238b1e1f13,
ef7b468fb9, f0aa64172e, 3f8e358849, bbd5ae079d, 91922103b4, e6bdff61bd, c74d5e7aba, 905a561c84,
e9d312c27e, a5391e13e7, e368fc8803, b5734a6ba2, 65c88f3425, bb9a94bebe, 74401b20b0, 417d3d2562,
68cee893f1, fce276bca9, ade433ecbd, 48e66b1790, 29a9f80f10, 70043af85b, 66ac235e1f, afe7df17d4,
a31f4abcb3, 2145c57642, a9b3cfa062, bca6afada7, d1cc303553, befa61cee1, 245525c26b, 55737b9cf1,
ffea850522, d315bca32b, b3aff97684, 23a3c242fa, b7bc16f190, bbec46f6ee, 475050108b, b5f7cd92c7,
ab31b811ee, 19360622e7, 932d27541c, b0642b3078, 9cada8fc11, 25d5fa1b88, 1c08af8461, 6c3d81a146,
81029d64ff, 8f32696449, 3f15e9219f, 62aa70219b, 544b184d6c, 250c4db158, f0d81c56e2, 8b5559274d,
a230929111, 7235180de4, ba5e7681ff, fc0a16d734, bcdee1b116, 2a696a4431, 991bf1b000, 4d47471932,
0bf968d24d, 158681e927, 05c9a8d8f8, bc957a22b7, 7e3d66adc7, 710ebb9d92, 1315399f35, eadacf43e1,
ddd83a66c5, 7af688fbf5, 41fa774844, 5121417457, ece04e6e39, 71ddeefcd6, e159992f48, 48b14194df,
4ada356bf1, 7f51017842, 556d5826a8, 62939cec18, 7d6a10d3ea, 1c0cf6c51d, 58f0b29091, 5cb4cdba61,
325eb50480, 993b80a350, a909bd1f85, 62a47b9fc5, 31e9dcbb87, f824c3ca3b, 386599638f, 9796da389d,
dfb1d45831, 18c32ed29c, 598c8c528b, 8f9beb04f0, bb518e6d89, 89c5182c3c, d17435542e, 72beb35edc,
e3b501d0d9, 7442d20f9d, 4dfb47dd21, e72fea8ecd, f388a5ce51, e703e06eeb, 1feaafc03e, e446ce54f6,
b7289b51c7, 6eae73f0bf, 1425ebb369, 060b3364f2
.github/workflows/nightly-trivy-scan.yml (vendored, 2 changes)
@@ -22,7 +22,7 @@ jobs:
       uses: actions/checkout@v6

     - name: Run Trivy vulnerability scanner
-      uses: aquasecurity/trivy-action@master
+      uses: aquasecurity/trivy-action@57a97c7e7821a5776cebc9bb87c984fa69cba8f1
       with:
         image-ref: 'docker.io/velero/${{ matrix.images }}:${{ matrix.versions }}'
         severity: 'CRITICAL,HIGH,MEDIUM'
.github/workflows/pr-filepath-check.yml (vendored, new file, 93 lines)
@@ -0,0 +1,93 @@
name: Pull Request File Path Check
on: [pull_request]
jobs:
  filepath-check:
    name: Check for invalid characters in file paths
    runs-on: ubuntu-latest
    steps:
      - name: Check out the code
        uses: actions/checkout@v6

      - name: Validate file paths for Go module compatibility
        run: |
          # Go's module zip rejects filenames containing certain characters.
          # See golang.org/x/mod/module fileNameOK() for the full specification.
          #
          # Allowed ASCII: letters, digits, and: !#$%&()+,-.=@[]^_{}~ and space
          # Allowed non-ASCII: unicode letters only
          # Rejected: " ' * < > ? ` | / \ : and any non-letter unicode (control
          # chars, format chars like U+200E LEFT-TO-RIGHT MARK, etc.)
          #
          # This check catches issues like the U+200E incident in PR #9552.

          EXIT_STATUS=0

          git ls-files -z | python3 -c "
          import sys, unicodedata

          data = sys.stdin.buffer.read()
          files = data.split(b'\x00')

          # Characters explicitly rejected by Go's fileNameOK
          # (path separators / and \ are inherent to paths so we check per-element)
          bad_ascii = set('\"' + \"'\" + '*<>?\`|:')

          allowed_ascii = set('!#$%&()+,-.=@[]^_{}~ ')

          def is_ok(ch):
              if ch.isascii():
                  return ch.isalnum() or ch in allowed_ascii
              return ch.isalpha()

          bad_files = []  # list of (original_path, clean_path, char_desc)
          for f in files:
              if not f:
                  continue
              try:
                  name = f.decode('utf-8')
              except UnicodeDecodeError:
                  print(f'::error::Non-UTF-8 bytes in filename: {f!r}')
                  bad_files.append((repr(f), None, 'non-UTF-8 bytes'))
                  continue

              # Check each path element (split on /)
              for element in name.split('/'):
                  for ch in element:
                      if not is_ok(ch):
                          cp = ord(ch)
                          char_name = unicodedata.name(ch, f'U+{cp:04X}')
                          char_desc = f'U+{cp:04X} ({char_name})'
                          # Build cleaned path by stripping invalid chars
                          clean = '/'.join(
                              ''.join(c for c in elem if is_ok(c))
                              for elem in name.split('/')
                          )
                          print(f'::error file={name}::File \"{name}\" contains invalid char {char_desc}')
                          bad_files.append((name, clean, char_desc))
                          break

          if bad_files:
              print()
              print('The following files have characters that are invalid in Go module zip archives:')
              print()
              for original, clean, desc in bad_files:
                  print(f'  {original} ({desc})')
              print()
              print('To fix, rename the files to remove the problematic characters:')
              print()
              for original, clean, desc in bad_files:
                  if clean:
                      print(f'  mv \"{original}\" \"{clean}\" && git add \"{clean}\"')
                      print(f'  # or: git mv \"{original}\" \"{clean}\"')
                  else:
                      print(f'  # {original}: cannot auto-suggest rename (non-UTF-8)')
              print()
              print('See https://github.com/vmware-tanzu/velero/pull/9552 for context.')
              sys.exit(1)
          else:
              print('All file paths are valid for Go module zip.')
          " || EXIT_STATUS=1

          exit $EXIT_STATUS
@@ -17,6 +17,7 @@ If you're using Velero and want to add your organization to this list,
 <a href="https://www.replicated.com/" border="0" target="_blank"><img alt="replicated.com" src="site/static/img/adopters/replicated-logo-red.svg" height="50"></a>
 <a href="https://cloudcasa.io/" border="0" target="_blank"><img alt="cloudcasa.io" src="site/static/img/adopters/cloudcasa.svg" height="50"></a>
 <a href="https://azure.microsoft.com/" border="0" target="_blank"><img alt="azure.com" src="site/static/img/adopters/azure.svg" height="50"></a>
+<a href="https://www.broadcom.com/" border="0" target="_blank"><img alt="broadcom.com" src="site/static/img/adopters/broadcom.svg" height="50"></a>
 ## Success Stories

 Below is a list of adopters of Velero in **production environments** that have
@@ -68,6 +69,9 @@ Replicated uses the Velero open source project to enable snapshots in [KOTS][101
 **[Microsoft Azure][105]**<br>
 [Azure Backup for AKS][106] is an Azure native, Kubernetes aware, Enterprise ready backup for containerized applications deployed on Azure Kubernetes Service (AKS). AKS Backup utilizes Velero to perform backup and restore operations to protect stateful applications in AKS clusters.<br>

+**[Broadcom][107]**<br>
+[VMware Cloud Foundation][108] (VCF) offers built-in [vSphere Kubernetes Service][109] (VKS), a Kubernetes runtime that includes a CNCF certified Kubernetes distribution, to deploy and manage containerized workloads. VCF empowers platform engineers with native [Kubernetes multi-cluster management][110] capability for managing Kubernetes (K8s) infrastructure at scale. VCF utilizes Velero for Kubernetes data protection enabling platform engineers to back up and restore containerized workloads manifests & persistent volumes, helping to increase the resiliency of stateful applications in VKS cluster.
+
 ## Adding your organization to the list of Velero Adopters

 If you are using Velero and would like to be included in the list of `Velero Adopters`, add an SVG version of your logo to the `site/static/img/adopters` directory in this repo and submit a [pull request][3] with your change. Name the image file something that reflects your company (e.g., if your company is called Acme, name the image acme.png). See this for an example [PR][4].
@@ -125,3 +129,8 @@ If you would like to add your logo to a future `Adopters of Velero` section on [

 [105]: https://azure.microsoft.com/
 [106]: https://learn.microsoft.com/azure/backup/backup-overview
+
+[107]: https://www.broadcom.com/
+[108]: https://www.vmware.com/products/cloud-infrastructure/vmware-cloud-foundation
+[109]: https://www.vmware.com/products/cloud-infrastructure/vsphere-kubernetes-service
+[110]: https://blogs.vmware.com/cloud-foundation/2025/09/29/empowering-platform-engineers-with-native-kubernetes-multi-cluster-management-in-vmware-cloud-foundation/
Dockerfile (29 changes)
@@ -13,7 +13,7 @@
 # limitations under the License.

 # Velero binary build section
-FROM --platform=$BUILDPLATFORM golang:1.25-bookworm AS velero-builder
+FROM --platform=$BUILDPLATFORM golang:1.25-trixie AS velero-builder

 ARG GOPROXY
 ARG BIN
@@ -48,30 +48,6 @@ RUN mkdir -p /output/usr/bin && \
     -ldflags "${LDFLAGS}" ${PKG}/cmd/velero-helper && \
     go clean -modcache -cache

-# Restic binary build section
-FROM --platform=$BUILDPLATFORM golang:1.25-bookworm AS restic-builder
-
-ARG GOPROXY
-ARG BIN
-ARG TARGETOS
-ARG TARGETARCH
-ARG TARGETVARIANT
-ARG RESTIC_VERSION
-
-ENV CGO_ENABLED=0 \
-    GO111MODULE=on \
-    GOPROXY=${GOPROXY} \
-    GOOS=${TARGETOS} \
-    GOARCH=${TARGETARCH} \
-    GOARM=${TARGETVARIANT}
-
-COPY . /go/src/github.com/vmware-tanzu/velero
-
-RUN mkdir -p /output/usr/bin && \
-    export GOARM=$(echo "${GOARM}" | cut -c2-) && \
-    /go/src/github.com/vmware-tanzu/velero/hack/build-restic.sh && \
-    go clean -modcache -cache
-
 # Velero image packing section
 FROM paketobuildpacks/run-jammy-tiny:latest
@@ -79,7 +55,4 @@ LABEL maintainer="Xun Jiang <jxun@vmware.com>"

 COPY --from=velero-builder /output /

-COPY --from=restic-builder /output /
-
 USER cnb:cnb
@@ -15,7 +15,7 @@
 ARG OS_VERSION=1809

 # Velero binary build section
-FROM --platform=$BUILDPLATFORM golang:1.25-bookworm AS velero-builder
+FROM --platform=$BUILDPLATFORM golang:1.25-trixie AS velero-builder

 ARG GOPROXY
 ARG BIN
@@ -7,11 +7,11 @@
 | Maintainer          | GitHub ID                                                     | Affiliation                                       |
 |---------------------|---------------------------------------------------------------|--------------------------------------------------|
 | Scott Seago         | [sseago](https://github.com/sseago)                           | [OpenShift](https://github.com/openshift)        |
-| Daniel Jiang        | [reasonerjt](https://github.com/reasonerjt)                   | [VMware](https://www.github.com/vmware/)         |
-| Wenkai Yin          | [ywk253100](https://github.com/ywk253100)                     | [VMware](https://www.github.com/vmware/)         |
-| Xun Jiang           | [blackpiglet](https://github.com/blackpiglet)                 | [VMware](https://www.github.com/vmware/)         |
+| Daniel Jiang        | [reasonerjt](https://github.com/reasonerjt)                   | Broadcom                                         |
+| Wenkai Yin          | [ywk253100](https://github.com/ywk253100)                     | Broadcom                                         |
+| Xun Jiang           | [blackpiglet](https://github.com/blackpiglet)                 | Broadcom                                         |
 | Shubham Pampattiwar | [shubham-pampattiwar](https://github.com/shubham-pampattiwar) | [OpenShift](https://github.com/openshift)        |
-| Yonghui Li          | [Lyndon-Li](https://github.com/Lyndon-Li)                     | [VMware](https://www.github.com/vmware/)         |
+| Yonghui Li          | [Lyndon-Li](https://github.com/Lyndon-Li)                     | Broadcom                                         |
 | Anshul Ahuja        | [anshulahuja98](https://github.com/anshulahuja98)             | [Microsoft Azure](https://www.github.com/azure/) |
 | Tiger Kaovilai      | [kaovilai](https://github.com/kaovilai)                       | [OpenShift](https://github.com/openshift)        |
@@ -27,14 +27,3 @@
 * JenTing Hsiao ([jenting](https://github.com/jenting))
 * Dave Smith-Uchida ([dsu-igeek](https://github.com/dsu-igeek))
 * Ming Qiu ([qiuming-best](https://github.com/qiuming-best))
-
-## Velero Contributors & Stakeholders
-
-| Feature Area            | Lead                                                                                  |
-|-------------------------|:-------------------------------------------------------------------------------------:|
-| Technical Lead          | Daniel Jiang [reasonerjt](https://github.com/reasonerjt)                              |
-| Kubernetes CSI Liaison  |                                                                                       |
-| Deployment              |                                                                                       |
-| Community Management    | Orlin Vasilev [OrlinVasilev](https://github.com/OrlinVasilev)                         |
-| Product Management      | Pradeep Kumar Chaturvedi [pradeepkchaturvedi](https://github.com/pradeepkchaturvedi)  |
Makefile (3 changes)
@@ -105,8 +105,6 @@ see: https://velero.io/docs/main/build-from-source/#making-images-and-updating-v
 endef
 # comma cannot be escaped and can only be used in Make function arguments by putting into variable
 comma=,
-# The version of restic binary to be downloaded
-RESTIC_VERSION ?= 0.15.0

 CLI_PLATFORMS ?= linux-amd64 linux-arm linux-arm64 darwin-amd64 darwin-arm64 windows-amd64 linux-ppc64le linux-s390x
 BUILD_OUTPUT_TYPE ?= docker
@@ -260,7 +258,6 @@ container-linux:
 	--build-arg=GIT_SHA=$(GIT_SHA) \
 	--build-arg=GIT_TREE_STATE=$(GIT_TREE_STATE) \
 	--build-arg=REGISTRY=$(REGISTRY) \
-	--build-arg=RESTIC_VERSION=$(RESTIC_VERSION) \
 	--provenance=false \
 	--sbom=false \
 	-f $(VELERO_DOCKERFILE) .
@@ -42,13 +42,11 @@ The following is a list of the supported Kubernetes versions for each Velero ver

 | Velero version | Expected Kubernetes version compatibility | Tested on Kubernetes version        |
 |----------------|-------------------------------------------|-------------------------------------|
-| 1.17           | 1.18-latest                               | 1.31.7, 1.32.3, 1.33.1, and 1.34.0  |
+| 1.18           | 1.18-latest                               | 1.33.7, 1.34.1, and 1.35.0          |
+| 1.17           | 1.18-latest                               | 1.31.7, 1.32.3, 1.33.1, and 1.34.0  |
 | 1.16           | 1.18-latest                               | 1.31.4, 1.32.3, and 1.33.0          |
 | 1.15           | 1.18-latest                               | 1.28.8, 1.29.8, 1.30.4 and 1.31.1   |
 | 1.14           | 1.18-latest                               | 1.27.9, 1.28.9, and 1.29.4          |
 | 1.13           | 1.18-latest                               | 1.26.5, 1.27.3, 1.27.8, and 1.28.3  |
-| 1.12           | 1.18-latest                               | 1.25.7, 1.26.5, 1.26.7, and 1.27.3  |
-| 1.11           | 1.18-latest                               | 1.23.10, 1.24.9, 1.25.5, and 1.26.1 |

 Velero supports IPv4, IPv6, and dual stack environments. Support for this was tested against Velero v1.8.
Tiltfile (6 changes)
@@ -103,11 +103,6 @@ local_resource(
     deps = ["internal", "pkg/cmd"],
 )

-local_resource(
-    "restic_binary",
-    cmd = 'cd ' + '.' + ';mkdir -p _tiltbuild/restic; BIN=velero GOOS=linux GOARCH=amd64 GOARM="" RESTIC_VERSION=0.13.1 OUTPUT_DIR=_tiltbuild/restic ./hack/build-restic.sh',
-)
-
 # Note: we need a distro with a bash shell to exec into the Velero container
 tilt_dockerfile_header = """
 FROM ubuntu:22.04 as tilt
@@ -118,7 +113,6 @@ WORKDIR /
 COPY --from=tilt-helper /start.sh .
 COPY --from=tilt-helper /restart.sh .
 COPY velero .
-COPY restic/restic /usr/bin/restic
 """

 dockerfile_contents = "\n".join([
changelogs/CHANGELOG-1.18.md (new file, 109 lines)
@@ -0,0 +1,109 @@
## v1.18

### Download
https://github.com/vmware-tanzu/velero/releases/tag/v1.18.0

### Container Image
`velero/velero:v1.18.0`

### Documentation
https://velero.io/docs/v1.18/

### Upgrading
https://velero.io/docs/v1.18/upgrade-to-1.18/

### Highlights
#### Concurrent backup
In v1.18, Velero can process multiple backups concurrently. This is a significant usability improvement, especially for multi-tenant or multi-user scenarios: backups submitted by different users can run simultaneously without interfering with each other.

Check the design https://github.com/vmware-tanzu/velero/blob/main/design/Implemented/concurrent-backup-processing.md for more details.

#### Cache volume for data movers
In v1.18, Velero allows users to configure cache volumes for data mover pods during restore for CSI snapshot data movement and fs-backup. This brings the following benefits:
- Solves the problem that data mover pods fail when the pod's ephemeral disk space is limited
- Solves the problem that multiple data mover pods fail to run concurrently on one node when the node's ephemeral disk space is limited
- Together with the backup repository's cache limit configuration, a cache volume of appropriate size helps to improve restore throughput

Check the design https://github.com/vmware-tanzu/velero/blob/main/design/Implemented/backup-repo-cache-volume.md for more details.

#### Incremental size for data movers
In v1.18, Velero allows users to observe the incremental size of data mover backups for CSI snapshot data movement and fs-backup, so that users can see the data reduction achieved by incremental backups.

#### Wildcard support for namespaces
In v1.18, Velero allows glob patterns in the namespace filters during backup and restore, so that users can filter namespaces in a batch manner.
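As an illustration (the namespace names and patterns are hypothetical), a backup spec could use glob patterns in both the include and exclude lists:

```yaml
apiVersion: velero.io/v1
kind: Backup
metadata:
  name: team-namespaces
  namespace: velero
spec:
  includedNamespaces:
    - "team-*"     # expanded against active namespaces before the backup runs
  excludedNamespaces:
    - "team-qa?"   # ?, [abc] and {a,b,c} are supported as well
```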
#### VolumePolicy for PVC phase
In v1.18, Velero VolumePolicy supports actions conditioned on the PVC phase, which helps users perform special handling for PVCs in a specific phase, e.g., skipping PVCs in Pending/Lost status during the backup.
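A minimal sketch of such a policy; the `pvcPhase` condition key is assumed from the linked changes, not confirmed here:

```yaml
version: v1
volumePolicies:
  - conditions:
      pvcPhase: Pending   # assumed key; skip PVCs that never bound
    action:
      type: skip
```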
#### Scalability and Resiliency improvements
##### Prevent Velero server OOM Kill for large backup repositories
In v1.18, some backup repository operations are deferred and executed outside the Velero server, so the Velero server won't be OOM Killed.

#### Performance improvement for VolumePolicy
In v1.18, VolumePolicy is enhanced for large numbers of pods/PVCs, so its performance is significantly improved.

#### Events for data mover pod diagnostic
In v1.18, events are recorded in the data mover pod diagnostics, which allows users to see more information for troubleshooting when a data mover pod fails.

### Runtime and dependencies
Golang runtime: 1.25.7
kopia: 0.22.3

### Limitations/Known issues

### Breaking changes
#### Deprecation of PVC selected node feature
According to the [Velero deprecation policy](https://github.com/vmware-tanzu/velero/blob/main/GOVERNANCE.md#deprecation-policy), the PVC selected-node feature is deprecated in v1.18. Velero handles the PVC's selected-node annotation appropriately, so users don't need to do anything in particular.

### All Changes
* Remove backup from running list when backup fails validation (#9498, @sseago)
* Maintenance Job only uses the first element of the LoadAffinity array (#9494, @blackpiglet)
* Fix issue #9478, add diagnose info on expose peek fails (#9481, @Lyndon-Li)
* Add Role, RoleBinding, ClusterRole, and ClusterRoleBinding in restore sequence. (#9474, @blackpiglet)
* Add maintenance job and data mover pod's labels and annotations setting. (#9452, @blackpiglet)
* Fix plugin init container names exceeding DNS-1123 limit (#9445, @mpryc)
* Add PVC-to-Pod cache to improve volume policy performance (#9441, @shubham-pampattiwar)
* Remove VolumeSnapshotClass from CSI B/R process. (#9431, @blackpiglet)
* Use hookIndex for recording multiple restore exec hooks. (#9366, @blackpiglet)
* Sanitize Azure HTTP responses in BSL status messages (#9321, @shubham-pampattiwar)
* Remove labels associated with previous backups (#9206, @Joeavaikath)
* Add VolumePolicy support for PVC Phase conditions to allow skipping Pending PVCs (#9166, @claude)
* feat: Enhance BackupStorageLocation with Secret-based CA certificate support (#9141, @kaovilai)
* Add `--apply` flag to `install` command, allowing usage of Kubernetes apply to make changes to existing installs (#9132, @mjnagel)
* Fix issue #9194, add doc for GOMAXPROCS behavior change (#9420, @Lyndon-Li)
* Apply volume policies to VolumeGroupSnapshot PVC filtering (#9419, @shubham-pampattiwar)
* Fix issue #9276, add doc for cache volume support (#9418, @Lyndon-Li)
* Add Prometheus metrics for maintenance jobs (#9414, @shubham-pampattiwar)
* Fix issue #9400, connect repo first time after creation so that init params could be written (#9407, @Lyndon-Li)
* Cache volume for PVR (#9397, @Lyndon-Li)
* Cache volume support for DataDownload (#9391, @Lyndon-Li)
* don't copy securitycontext from first container if configmap found (#9389, @sseago)
* Refactor repo provider interface for static configuration (#9379, @Lyndon-Li)
* Fix issue #9365, prevent fake completion notification due to multiple update of single PVR (#9375, @Lyndon-Li)
* Add cache volume configuration (#9370, @Lyndon-Li)
* Track actual resource names for GenerateName in restore status (#9368, @shubham-pampattiwar)
* Fix managed fields patch for resources using GenerateName (#9367, @shubham-pampattiwar)
* Support cache volume for generic restore exposer and pod volume exposer (#9362, @Lyndon-Li)
* Add incrementalSize to DU/PVB for reporting new/changed size (#9357, @sseago)
* Add snapshotSize for DataDownload, PodVolumeRestore (#9354, @Lyndon-Li)
* Add cache dir configuration for udmrepo (#9353, @Lyndon-Li)
* Fix the Job build error when the BackupRepository name is longer than 63. (#9350, @blackpiglet)
* Add cache configuration to VGDP (#9342, @Lyndon-Li)
* Fix issue #9332, add bytesDone for cache files (#9333, @Lyndon-Li)
* Fix typos in documentation (#9329, @T4iFooN-IX)
* Concurrent backup processing (#9307, @sseago)
* VerifyJSONConfigs verifies every element in Data. (#9302, @blackpiglet)
* Fix issue #9267, add events to data mover prepare diagnostic (#9296, @Lyndon-Li)
* Add option for privileged fs-backup pod (#9295, @sseago)
* Fix issue #9193, don't connect repo in repo controller (#9291, @Lyndon-Li)
* Implement concurrency control for cache of native VolumeSnapshotter plugin. (#9281, @0xLeo258)
* Fix issue #7904, remove the code and doc for PVC node selection (#9269, @Lyndon-Li)
* Fix schedule controller to prevent backup queue accumulation during extended blocking scenarios by properly handling empty backup phases (#9264, @shubham-pampattiwar)
* Fix repository maintenance jobs to inherit allowlisted tolerations from Velero deployment (#9256, @shubham-pampattiwar)
* Implement wildcard namespace pattern expansion for backup namespace includes/excludes. This change adds support for wildcard patterns (*, ?, [abc], {a,b,c}) in namespace includes and excludes during backup operations (#9255, @Joeavaikath)
* Protect VolumeSnapshot field from race condition during multi-thread backup (#9248, @0xLeo258)
* Update AzureAD Microsoft Authentication Library to v1.5.0 (#9244, @priyansh17)
* Get pod list once per namespace in pvc IBA (#9226, @sseago)
* Fix issue #7725, add design for backup repo cache configuration (#9148, @Lyndon-Li)
* Fix issue #9229, don't attach backupPVC to the source node (#9233, @Lyndon-Li)
* feat: Permit specifying annotations for the BackupPVC (#9173, @clementnuss)
@@ -1 +0,0 @@
-Add `--apply` flag to `install` command, allowing usage of Kubernetes apply to make changes to existing installs
@@ -1 +0,0 @@
-feat: Enhance BackupStorageLocation with Secret-based CA certificate support
@@ -1 +0,0 @@
-Fix issue #7725, add design for backup repo cache configuration
@@ -1 +0,0 @@
-Add VolumePolicy support for PVC Phase conditions to allow skipping Pending PVCs
@@ -1 +0,0 @@
-feat: Permit specifying annotations for the BackupPVC
@@ -1 +0,0 @@
-Remove labels associated with previous backups
@@ -1 +0,0 @@
-Get pod list once per namespace in pvc IBA
@@ -1 +0,0 @@
-Fix issue #9229, don't attach backupPVC to the source node
@@ -1 +0,0 @@
-Update AzureAD Microsoft Authentication Library to v1.5.0
@@ -1 +0,0 @@
-Protect VolumeSnapshot field from race condition during multi-thread backup
@@ -1,10 +0,0 @@
-Implement wildcard namespace pattern expansion for backup namespace includes/excludes.
-
-This change adds support for wildcard patterns (*, ?, [abc], {a,b,c}) in namespace includes and excludes during backup operations.
-When wildcard patterns are detected, they are expanded against the list of active namespaces in the cluster before the backup proceeds.
-
-Key features:
-- Wildcard patterns in namespace includes/excludes are automatically detected and expanded
-- Pattern validation ensures unsupported patterns (regex, consecutive asterisks) are rejected
-- Empty wildcard results (e.g., "invalid*" matching no namespaces) correctly result in empty backups
-- Exact namespace names and "*" continue to work as before (no expansion needed)
@@ -1 +0,0 @@
-Fix repository maintenance jobs to inherit allowlisted tolerations from Velero deployment
@@ -1 +0,0 @@
-Fix schedule controller to prevent backup queue accumulation during extended blocking scenarios by properly handling empty backup phases
@@ -1 +0,0 @@
-Fix issue #7904, remove the code and doc for PVC node selection
@@ -1 +0,0 @@
-Implement concurrency control for cache of native VolumeSnapshotter plugin.
@@ -1 +0,0 @@
-Fix issue #9193, don't connect repo in repo controller
@@ -1 +0,0 @@
-Add option for privileged fs-backup pod
@@ -1 +0,0 @@
-Fix issue #9267, add events to data mover prepare diagnostic
@@ -1 +0,0 @@
-VerifyJSONConfigs verifies every element in Data.
@@ -1 +0,0 @@
-Concurrent backup processing
@@ -1 +0,0 @@
-Sanitize Azure HTTP responses in BSL status messages
@@ -1 +0,0 @@
-Fix typos in documentation
@@ -1 +0,0 @@
-Fix issue #9332, add bytesDone for cache files
@@ -1 +0,0 @@
-Add cache configuration to VGDP
@@ -1 +0,0 @@
-Fix the Job build error when the BackupRepository name is longer than 63.
@@ -1 +0,0 @@
-Add cache dir configuration for udmrepo
@@ -1 +0,0 @@
-Add snapshotSize for DataDownload, PodVolumeRestore
@@ -1 +0,0 @@
-Add incrementalSize to DU/PVB for reporting new/changed size
@@ -1 +0,0 @@
-Support cache volume for generic restore exposer and pod volume exposer
@@ -1 +0,0 @@
-Use hookIndex for recording multiple restore exec hooks.
@@ -1 +0,0 @@
-Fix managed fields patch for resources using GenerateName
@@ -1 +0,0 @@
-Track actual resource names for GenerateName in restore status
@@ -1 +0,0 @@
-Add cache volume configuration
@@ -1 +0,0 @@
-Fix issue #9365, prevent fake completion notification due to multiple update of single PVR
@@ -1 +0,0 @@
-Refactor repo provider interface for static configuration
@@ -1 +0,0 @@
-don't copy securitycontext from first container if configmap found
@@ -1 +0,0 @@
-Cache volume support for DataDownload
@@ -1 +0,0 @@
-Cache volume for PVR
changelogs/unreleased/9403-GabriFedi97 (new file, 1 line)
@@ -0,0 +1 @@
+Include InitContainer configured as Sidecars when validating the existence of the target containers configured for the Backup Hooks

@@ -1 +0,0 @@
-Fix issue #9400, connect repo first time after creation so that init params could be written
@@ -1 +0,0 @@
-Add Prometheus metrics for maintenance jobs
@@ -1 +0,0 @@
-Fix issue #9276, add doc for cache volume support
@@ -1 +0,0 @@
-Apply volume policies to VolumeGroupSnapshot PVC filtering
@@ -1 +0,0 @@
-Fix issue #9194, add doc for GOMAXPROCS behavior change
@@ -1 +0,0 @@
-Remove VolumeSnapshotClass from CSI B/R process.
@@ -1 +0,0 @@
-Add PVC-to-Pod cache to improve volume policy performance
@@ -1 +0,0 @@
-Fix plugin init container names exceeding DNS-1123 limit
@@ -1 +0,0 @@
-Add maintenance job and data mover pod's labels and annotations setting.

changelogs/unreleased/9502-Joeavaikath (new file, 1 line)
@@ -0,0 +1 @@
+Support all glob wildcard characters in namespace validation

changelogs/unreleased/9508-kaovilai (new file, 1 line)
@@ -0,0 +1 @@
+Fix VolumePolicy PVC phase condition filter for unbound PVCs (#9507)

changelogs/unreleased/9516-shubham-pampattiwar (new file, 1 line)
@@ -0,0 +1 @@
+Fix VolumeGroupSnapshot restore failure with Ceph RBD CSI driver by creating stub VolumeGroupSnapshotContent during restore and looking up VolumeSnapshotClass by driver for credential support

changelogs/unreleased/9528-Lyndon-Li (new file, 1 line)
@@ -0,0 +1 @@
+Add block data mover design for block level incremental backup by integrating with Kubernetes CBT

changelogs/unreleased/9532-Lyndon-Li (new file, 1 line)
@@ -0,0 +1 @@
+Fix issue #9343, include PV topology in data mover pod affinities

changelogs/unreleased/9533-Lyndon-Li (new file, 1 line)
@@ -0,0 +1 @@
+Fix issue #9496, support customized host OS

changelogs/unreleased/9540-sseago (new file, 1 line)
@@ -0,0 +1 @@
+Add custom action type to volume policies

changelogs/unreleased/9547-blackpiglet (new file, 1 line)
@@ -0,0 +1 @@
+If BIA returns updateObj with SkipFromBackupAnnotation, treat it as skipping the resource from backup.

changelogs/unreleased/9554-testsabirweb (new file, 1 line)
@@ -0,0 +1 @@
+Issue #9544: Add test coverage for S3 bucket name in MRAP ARN notation and fix bucket validation to accept ARN format

changelogs/unreleased/9560-Lyndon-Li (new file, 1 line)
@@ -0,0 +1 @@
+Fix issue #9475, use node-selector instead of nodeName for generic restore

changelogs/unreleased/9561-Lyndon-Li (new file, 1 line)
@@ -0,0 +1 @@
+Fix issue #9460, flush buffer before data mover completes

changelogs/unreleased/9570-H-M-Quang-Ngo (new file, 1 line)
@@ -0,0 +1 @@
+Add schedule_expected_interval_seconds metric for dynamic backup alerting thresholds (#9559)

changelogs/unreleased/9574-blackpiglet (new file, 1 line)
@@ -0,0 +1 @@
+Add ephemeral storage limit and request support for data mover and maintenance job

changelogs/unreleased/9581-shubham-pampattiwar (new file, 1 line)
@@ -0,0 +1 @@
+Fix DBR stuck when CSI snapshot no longer exists in cloud provider

changelogs/unreleased/9614-blackpiglet (new file, 1 line)
@@ -0,0 +1 @@
+Add check for file extraction from tarball.

changelogs/unreleased/9628-priyansh17 (new file, 1 line)
@@ -0,0 +1 @@
+Implement original VolumeSnapshotContent deletion for legacy backups

changelogs/unreleased/9634-Lyndon-Li (new file, 1 line)
@@ -0,0 +1 @@
+Fix issue #9626, let go for uninitialized repo under readonly mode

changelogs/unreleased/9638-adam-jian-zhang (new file, 1 line)
@@ -0,0 +1 @@
+Fix issue #9636, fix configmap lookup in non-default namespaces

changelogs/unreleased/9643-priyansh17 (new file, 1 line)
@@ -0,0 +1 @@
+Fix issue #9641, remove redundant ReadyToUse polling in CSI VolumeSnapshotContent delete plugin

changelogs/unreleased/9653-BassinD (new file, 1 line)
@@ -0,0 +1 @@
+Fix service restore with null healthCheckNodePort in last-applied-configuration label

changelogs/unreleased/9663-Lyndon-Li (new file, 1 line)
@@ -0,0 +1 @@
+Fix issue #9659, in the case that PVB/PVR/DU/DD is cancelled before the data path has really started, call EndEvent to prevent the data mover pod from crashing because of delayed event distribution

changelogs/unreleased/9668-adam-jian-zhang (new file, 1 line)
@@ -0,0 +1 @@
+Fix issue #9666, fix node-agent node detection in the multiple-instances scenario

changelogs/unreleased/9676-Lyndon-Li (new file, 1 line)
@@ -0,0 +1 @@
+Fix issue #9470, remove restic from repository

changelogs/unreleased/9677-Lyndon-Li (new file, 1 line)
@@ -0,0 +1 @@
+Fix issue #9469, remove restic from uploader

changelogs/unreleased/9682-adam-jian-zhang (new file, 1 line)
@@ -0,0 +1 @@
+Fix issue #9681, fix restores and podvolumerestores list options to only list in installed namespace

changelogs/unreleased/9683-Lyndon-Li (new file, 1 line)
@@ -0,0 +1 @@
+Fix issue #9428, increase repo maintenance history queue length from 3 to 25

changelogs/unreleased/9693-priyansh17 (new file, 1 line)
@@ -0,0 +1 @@
+Enhance backup deletion logic to handle tarball download failures

changelogs/unreleased/9695-shubham-pampattiwar (new file, 1 line)
@@ -0,0 +1 @@
+Bump external-snapshotter to v8.4.0 and migrate VolumeGroupSnapshot API from v1beta1 to v1beta2 for Kubernetes 1.34+ compatibility

changelogs/unreleased/9700-priyansh17 (new file, 1 line)
@@ -0,0 +1 @@
+Fix issue #9699, add a 2-second gap between temporary CSI VolumeSnapshotContent create and delete operations

changelogs/unreleased/9701-emirot (new file, 1 line)
@@ -0,0 +1 @@
+Update Debian base image from bookworm to trixie

changelogs/unreleased/9704-adam-jian-zhang (new file, 1 line)
@@ -0,0 +1 @@
+Fix issue #9703, fix CSI PVC Backup Plugin list options to only list in installed namespace

changelogs/unreleased/9705-emirot (new file, 1 line)
@@ -0,0 +1 @@
+perf: better string concatenation

changelogs/unreleased/9728-blackpiglet (new file, 1 line)
@@ -0,0 +1 @@
+Remove Restic build from Dockerfile, Makefile and Tiltfile.
@@ -69,9 +69,7 @@ spec:
                     - ""
                   type: string
                 resticIdentifier:
-                  description: |-
-                    ResticIdentifier is the full restic-compatible string for identifying
-                    this repository. This field is only used when RepositoryType is "restic".
+                  description: Deprecated
                   type: string
                 volumeNamespace:
                   description: |-
File diff suppressed because one or more lines are too long
design/block-data-mover/backup-architecture.png (BIN, new file)
Binary file not shown. (After: 498 KiB)
design/block-data-mover/block-data-mover.md (new file, 551 lines)
@@ -0,0 +1,551 @@
# Block Data Mover Design

## Glossary & Abbreviation

**Backup Storage**: The storage that stores the backup data. Check the [Unified Repository design][1] for details.
**Backup Repository**: The backup repository is layered between BR data movers and Backup Storage to provide BR related features, as introduced in the [Unified Repository design][1].
**Velero Generic Data Path (VGDP)**: VGDP is the collection of modules introduced in the [Unified Repository design][1]. Velero uses these modules to finish data transfer for various purposes (i.e., PodVolume backup/restore, Volume Snapshot Data Movement). VGDP modules include the uploaders and the backup repository.
**Velero Built-in Data Mover (VBDM)**: VBDM, introduced in the [Volume Snapshot Data Movement design][2] and the [Unified Repository design][1], is the built-in data mover shipped along with Velero; it includes the Velero data mover controllers and VGDP.
**Data Mover Pods**: Intermediate pods which host VGDP and complete the data transfer. See [VGDP Micro Service for Volume Snapshot Data Movement][3] for details.
**Change Block Tracking (CBT)**: CBT is the mechanism for tracking changed blocks, so that backups can back up only the changed data. CBT is usually provided by the computing/storage platform.
**TCO**: Total Cost of Ownership. This is a general criterion for products/solutions, but it means a lot for BR solutions. For example, it covers what kind of backup storage (and its cost) a solution requires, the retention policy of backup copies, the ways to remove backup data redundancy, etc.
**PodVolume Backup**: This is the Velero backup method which accesses the data from the live file system; see the [Kopia Integration design][1] for how it works.
**CAOS and CABS**: Content-Addressable Object Storage and Content-Addressable Block Storage, two parts of the Kopia repository; see [Kopia Architecture][5].

## Background
Kubernetes supports two kinds of volume mode, `FileSystem` and `Block`, for persistent volumes. Under the hood, a block storage backend can provision either `FileSystem` mode or `Block` mode volumes, while a file storage backend can provision `FileSystem` mode volumes only.
Volumes provisioned by block storage can be backed up/restored at the block level, regardless of the volume mode of the persistent volume.
On the other hand, as long as the data can be accessed from the file system, a backup/restore can be conducted at the file system level. That is to say, `FileSystem` mode volumes can be backed up/restored at the file system level, regardless of the backend storage type.
So if a `FileSystem` mode volume is provisioned by block storage, the volume can be backed up/restored either at the file system level or at the block level.

For Velero, [CSI Snapshot Data Movement][2], which is implemented by VBDM, ships a file system uploader, so the backup/restore is done at the file system level only.

Where possible, block level backup/restore is better than file system level backup/restore:
- Block level backup can leverage CBT to process a minimal amount of data, so it significantly reduces the overhead on the network, the backup repository and the backup storage. As a result, TCO is significantly reduced.
- Block level backup/restore is more performant in throughput and resource consumption, because it doesn't need to handle the complexity of the file system, especially when there are a huge number of small files in the file system.
- Block level backup/restore is less OS dependent, because the uploader doesn't need the OS to be aware of the file system in the volume.

At present, the [Kubernetes CBT API][4] is mature and close to Beta stage. Many platforms/storages already support it or are going to.

Therefore, it is very important for Velero to deliver block level backup/restore and recommend users to use it over the file system data mover as long as:
- The volume is backed by block storage, so block level access is possible
- The platform supports CBT

Meanwhile, file system level backup/restore is still valuable in the following scenarios:
- The volume is backed by file storage, e.g., AWS EFS, Azure File, CephFS, VKS File Volume, etc.
- The volume is backed by block storage but CBT is not available
- The volume doesn't support CSI snapshot, so the Velero PodVolume Backup method is used

There are rich features delivered with VGDP, VBDM and the [VGDP micro service][3]; to reuse these features, the block data mover should be built on these modules.

Velero VBDM supports Linux and Windows nodes; however, Windows containers don't support block mode volumes, so backing up/restoring from Windows nodes is not supported until Windows containers remove this limitation. As a result, if there are both Linux and Windows nodes in the cluster, the block data mover can only run on Linux nodes.

Both the Kubernetes CBT service and Velero work within the boundary of the cluster; even though the backend storage may be shared by multiple clusters, Velero can only protect workloads in the same cluster where it is running.

## Goals

Add a block data mover to VBDM and support block level backup/restore for [CSI Snapshot Data Movement][2], which includes:
- Support block level full backup for both `FileSystem` and `Block` mode volumes
- Support block level incremental backup for both `FileSystem` and `Block` mode volumes
- Support block level restore from full/incremental backups for both `FileSystem` and `Block` mode volumes
- Support block level backup/restore for both Linux and Windows workloads from Linux cluster nodes
- Support all existing features, i.e., load concurrency, node selection, cache volume, deduplication, compression, encryption, etc., for the block data mover
- Support volumes processed at the file system level and at the block level in the same backup/restore

## Non-Goals

- PodVolume Backup does the backup/restore at the file system level only, so block level backup/restore is not supported for it
- Volumes that are backed by file system storage can only be backed up/restored at the file system level, so block level backup/restore is not supported for them
- Backing up/restoring from Windows nodes is not supported
- Block level incremental backup requires special capabilities of the backup repository, and the Velero [Unified Repository][1] supports multiple kinds of backup repositories. The current design focuses on the Kopia repository only; block level incremental backup support for other repositories will be considered when the specific backup repository is integrated into the [Velero Unified Repository][1]

## Architecture

### Data Path

Below is the architecture of VGDP when integrating with the Unified Repository (implemented by the Kopia repository).
A new block data mover will be added beside the existing file system data mover; both data movers read/write data from/to the same backup repository through the Unified Repo interface.
The Unified Repo interface and the backup repository need to be enhanced to support incremental backups.

*(diagram not shown)*

For more details of the VGDP architecture, see the [Unified Repository design][1], the [Volume Snapshot Data Movement design][2] and [VGDP Micro Service for Volume Snapshot Data Movement][3].

### Backup

Below is the architecture for block data mover backup, which is developed based on the existing VBDM:

![Backup architecture](backup-architecture.png)

The existing VBDM is reused; below are the major changes on top of it:
**Exposer**: The exposer needs to create a block mode backupPVC all the time, regardless of the sourcePVC's mode.
**CBT**: This is a new layer to retrieve, transform and store the changed blocks; it interacts with the CSI SnapshotMetadataService through gRPC (see the sketch after this list).
**Uploader**: A new block uploader is added. It interacts with the CBT layer, holds special logic to read data performantly from block devices, and holds special logic to write incremental data to the Unified Repository.
**Extended Kopia repo**: A new Incremental Aware Object Extension is added to Kopia's CAOS so as to support incremental data writes. Other parts of the Kopia repository, including the existing CAOS and CABS, are not changed.
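A minimal sketch of how the CBT layer could query the changed blocks between two snapshots, assuming the cluster runs the external-snapshot-metadata service from KEP-3314; the message and field names follow that upstream API but should be treated as assumptions here, and the snapshot names are placeholders:

```go
// BlockRange is a changed byte range of the volume, as reported by CBT.
type BlockRange struct {
	Offset int64
	Length int64
}

// changedRanges streams the block delta between two VolumeSnapshots from the
// CSI SnapshotMetadataService (KEP-3314) and converts it to byte ranges for
// the block uploader. Client/stub and field names are assumptions.
func changedRanges(ctx context.Context, client pb.SnapshotMetadataClient, token string) ([]BlockRange, error) {
	stream, err := client.GetMetadataDelta(ctx, &pb.GetMetadataDeltaRequest{
		SecurityToken:      token,         // audience-scoped token for the service
		Namespace:          "my-app",      // namespace of the two snapshots
		BaseSnapshotName:   "vs-previous", // snapshot of the last backup
		TargetSnapshotName: "vs-current",  // snapshot taken for this backup
	})
	if err != nil {
		return nil, err
	}
	var ranges []BlockRange
	for {
		resp, err := stream.Recv()
		if err == io.EOF {
			break
		}
		if err != nil {
			return nil, err
		}
		for _, b := range resp.GetBlockMetadata() {
			ranges = append(ranges, BlockRange{Offset: b.ByteOffset, Length: b.SizeBytes})
		}
	}
	return ranges, nil
}
```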
### Restore

Below is the architecture for block data mover restore, which is developed based on the existing VBDM:

*(diagram not shown)*

The existing VBDM is reused; below are the major changes on top of it:
**Exposer**: Since the restorePV is in block mode, the exposer needs to rebind the restorePV to a targetPVC in either file system mode or block mode.
**Uploader**: The same block uploader holds special logic to write data performantly to block devices and holds special logic to read data from the backup chain in the Unified Repository.

For more details of VBDM, see the [Volume Snapshot Data Movement design][2].

## Detailed Design

### Selectable Data Mover Type

#### Per Backup Selection
At present, the backup accepts a `DataMover` parameter, and when its value is empty or `velero`, VBDM is used.
After the block data mover is introduced, VBDM will contain two data movers: the Velero file system data mover and the Velero block data mover.
A new type string `velero-block` is introduced for the Velero block data mover; that is, when `DataMover` is set to `velero-block`, the Velero block data mover is used.
Another new value, `velero-fs`, is introduced for the Velero file system data mover; that is, when `DataMover` is set to `velero-fs`, the Velero file system data mover is used.
For backwards compatibility, `velero` is preserved as a valid value; it refers to the default data mover, and the default data mover may change between releases. At present, the Velero file system data mover is the default; the default may be changed to the Velero block data mover in a future release.
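For illustration, a backup selecting the proposed block data mover could look as follows; this is a sketch assuming the existing `snapshotMoveData` and `datamover` Backup spec fields, with `velero-block` being the value proposed above:

```yaml
apiVersion: velero.io/v1
kind: Backup
metadata:
  name: app-backup-block
  namespace: velero
spec:
  includedNamespaces:
    - my-app
  snapshotMoveData: true    # move CSI snapshot data with a data mover
  datamover: velero-block   # "velero" or "velero-fs" select the fs mover
```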
#### Volume Policy
It is a valid case that users have multiple volumes in a single backup but want to use the Velero file system data mover for some of the volumes and the Velero block data mover for the others.
To meet this requirement, a combined solution of Per Backup Selection and Volume Policy is used.

Here are the data structs for VolumePolicy:
```go
type volPolicy struct {
	action     Action
	conditions []volumeCondition
}

type volumeCondition interface {
	match(v *structuredVolume) bool
	validate() error
}

type structuredVolume struct {
	capacity     resource.Quantity
	storageClass string
	nfs          *nFSVolumeSource
	csi          *csiVolumeSource
	volumeType   SupportedVolume
	pvcLabels    map[string]string
	pvcPhase     string
}

type Action struct {
	Type       VolumeActionType `yaml:"type"`
	Parameters map[string]any   `yaml:"parameters,omitempty"`
}

const (
	ConfigmapRefType string           = "configmap"
	Skip             VolumeActionType = "skip"
	FSBackup         VolumeActionType = "fs-backup"
	Snapshot         VolumeActionType = "snapshot"
)
```

`action.parameters` is used to provide extra information for the action. This is an ideal place to differentiate the Velero file system data mover and the Velero block data mover.
Therefore, the Velero built-in data mover will support a `dataMover` key in `parameters`, with the value either `velero-fs` or `velero-block`. `velero-fs` and `velero-block` have the same meanings as in Per Backup Selection.

As an example, here is how a user might use both `velero-block` and `velero-fs` in a single backup (a sketch of such a policy follows below):
- The user sets the backup's `DataMover` parameter to `velero-block`
- The user adds a record to the Volume Policy, sets `conditions` to filter the volumes to be backed up through the Velero file system data mover, sets `action.type` to `snapshot`, and inserts a record into `action.parameters` as `dataMover: velero-fs`

In this way, all volumes matched by `conditions` will be backed up with the Velero file system data mover, while the others fall back to the per-backup method, the Velero block data mover.

Vice versa, users could set the per-backup method to the file system data mover and select volumes for the Velero block data mover.

The selected data mover for each volume should be recorded in `volumeInfo.json`.
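A sketch of the volume policy data for the example above; the `storageClass` condition and its value are illustrative, and only the `dataMover` action parameter is new in this design:

```yaml
version: v1
volumePolicies:
  - conditions:
      storageClass:
        - cephfs-sc          # illustrative: volumes that cannot go block level
    action:
      type: snapshot
      parameters:
        dataMover: velero-fs # everything else falls back to velero-block
```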
### Controllers
The Backup controller and Restore controller are kept as is; async operations are still used to interact with VBDM when the block data mover is in use.
The DataUpload controller and DataDownload controller are almost kept as is, with some minor changes to handle the data mover type and backup type appropriately and convey them to the exposers. With the [VGDP Micro Service][3], the controllers are almost isolated from VGDP, so no major changes are required.

### Exposer

#### CSI Snapshot Exposer
The existing CSI Snapshot Exposer is reused, with some changes to decide the backupPVC volume mode by access mode. Specifically, for the Velero block data mover, the access mode is always `Block`, so the backupPVC volume mode is always `Block`.
Once the backupPVC is created with the correct volume mode, the existing code can create the backupPod and mount the backupPVC appropriately.
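A sketch of the backupPVC the exposer would create for the block data mover; all names and the size are placeholders:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: du-sample              # placeholder: named after the DataUpload
  namespace: velero
spec:
  volumeMode: Block            # always Block for the block data mover
  accessModes: ["ReadWriteOnce"]
  dataSource:                  # expose the CSI snapshot taken for the backup
    apiGroup: snapshot.storage.k8s.io
    kind: VolumeSnapshot
    name: vs-sample            # placeholder
  resources:
    requests:
      storage: 10Gi            # sized from the source PVC
```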
#### Generic Restore Exposer
The existing Generic Restore Exposer is reused, but the workflow needs some changes.
For the block data mover, the restorePV is in Block mode all the time, whereas the targetPVC may be in either file system mode or block mode.
However, Kubernetes doesn't allow binding a PV to a PVC with a mismatched volume mode.

Therefore, the workflow of ***Finish Volume Readiness***, as introduced in the [Volume Snapshot Data Movement design][2], is changed as below (a Go sketch of the rebind step follows below):
- When the restore completes and the restorePV is created, set the restorePV's `deletionPolicy` to `Retain`
- Create another rebindPV that copies the restorePV's `volumeHandle` but whose `volumeMode` matches the targetPVC
- Delete the restorePV
- Set the rebindPV's claim reference (the `claimRef` field) to the targetPVC
- Add the `velero.io/dynamic-pv-restore` label to the rebindPV

In this way, the targetPVC will be bound to the rebindPV immediately by Kubernetes.

These changes work for the file system data mover as well, so the old workflow will be replaced and only the new workflow kept.
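A minimal sketch of the rebind step, assuming helper naming and label value; the key points are the `volumeMode` copied from the claim and the pre-set `claimRef`:

```go
import corev1 "k8s.io/api/core/v1"

// buildRebindPV clones the restored PV (keeping its volumeHandle via
// DeepCopy) so that it can bind to the target PVC: the volumeMode is copied
// from the claim, and claimRef pre-binds the pair so Kubernetes binds them
// immediately. Function name and label value are illustrative.
func buildRebindPV(restorePV *corev1.PersistentVolume, targetPVC *corev1.PersistentVolumeClaim) *corev1.PersistentVolume {
	pv := restorePV.DeepCopy()
	pv.Name = restorePV.Name + "-rebind"
	pv.ResourceVersion = ""
	pv.UID = ""
	pv.Spec.VolumeMode = targetPVC.Spec.VolumeMode // file system or block
	pv.Spec.ClaimRef = &corev1.ObjectReference{
		APIVersion: "v1",
		Kind:       "PersistentVolumeClaim",
		Namespace:  targetPVC.Namespace,
		Name:       targetPVC.Name,
	}
	if pv.Labels == nil {
		pv.Labels = map[string]string{}
	}
	pv.Labels["velero.io/dynamic-pv-restore"] = "true"
	return pv
}
```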
### VGDP

Below is the VGDP workflow during backup:

*(diagram not shown)*

Below is the VGDP workflow during restore:

*(diagram not shown)*

#### Unified Repo
For the block data mover, one Unified Repo Object is created for each volume, and some metadata is also saved into the Unified Repo to describe the volume.
During the backup, writes are performed in a skippable manner:
- For the data ranges that the write does not skip, the object is written with the real data
- For the data ranges that are skipped, the data is either filled with ZERO or cloned from the parent object. Specifically, for a full backup, data is filled with ZERO; for an incremental backup, data is cloned from the parent object

To support incremental backup, the `ObjectWriter` interface needs to be extended to support `io.WriterAt`, so that the uploader can perform skippable writes:
```go
type ObjectWriter interface {
	io.WriteCloser
	io.WriterAt

	// Seeker is used in the cases that the object is not written sequentially
	io.Seeker

	// Checkpoint is periodically called to preserve the state of data written to the repo so far.
	// Checkpoint returns a unified identifier that represents the current state.
	// An empty ID could be returned on success if the backup repository doesn't support this.
	Checkpoint() (ID, error)

	// Result waits for the completion of the object write.
	// Result returns the object's unified identifier after the write completes.
	Result() (ID, error)
}
```

To clone data from a parent object, the caller needs to specify the parent object. To support this, `ObjectWriteOptions` is extended with `ParentObject`.
The existing `AccessMode` can be used to indicate the data access type, either file system or block:

```go
// ObjectWriteOptions defines the options when creating an object for write
type ObjectWriteOptions struct {
	FullPath     string // Full logical path of the object
	DataType     int    // OBJECT_DATA_TYPE_*
	Description  string // A description of the object, could be empty
	Prefix       ID     // A prefix of the name used to save the object
	AccessMode   int    // OBJECT_DATA_ACCESS_*
	BackupMode   int    // OBJECT_DATA_BACKUP_*
	AsyncWrites  int    // Num of async writes for the object, 0 means no async write
	ParentObject ID     // the parent object based on which incremental write will be done
}
```
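To make the skippable-write contract concrete, here is a sketch of an upload pass driving the extended writer with the `BlockRange` list from the CBT sketch earlier; it assumes `ID` is a string-like type and that the writer was created with `AccessMode`/`BackupMode`/`ParentObject` set as described above:

```go
// writeChangedBlocks is illustrative only: it writes just the changed ranges
// through io.WriterAt; every range it skips is filled with ZERO (full backup)
// or cloned from ParentObject (incremental backup) by the repository.
func writeChangedBlocks(dev io.ReaderAt, w ObjectWriter, ranges []BlockRange) (ID, error) {
	buf := make([]byte, 1<<20) // 1MB, matching the dedup chunk size chosen below
	for _, r := range ranges {
		for off := r.Offset; off < r.Offset+r.Length; off += int64(len(buf)) {
			n := int64(len(buf))
			if rem := r.Offset + r.Length - off; rem < n {
				n = rem
			}
			if _, err := dev.ReadAt(buf[:n], off); err != nil {
				return "", err
			}
			if _, err := w.WriteAt(buf[:n], off); err != nil {
				return "", err
			}
		}
	}
	// Result blocks until the object write completes and returns its ID.
	return w.Result()
}
```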
To support non-Kopia uploaders saving snapshots to the Unified Repo, snapshot related methods will be added to the `BackupRepo` interface:
```go
// SaveSnapshot saves a repo snapshot
SaveSnapshot(ctx context.Context, snapshot Snapshot) (ID, error)

// GetSnapshot returns a repo snapshot from snapshot ID
GetSnapshot(ctx context.Context, id ID) (Snapshot, error)

// DeleteSnapshot deletes a repo snapshot
DeleteSnapshot(ctx context.Context, id ID) error

// ListSnapshot lists all snapshots in repo for the given source (if specified)
ListSnapshot(ctx context.Context, source string) ([]Snapshot, error)
```
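As a usage sketch, an incremental backup could resolve its parent object from the newest prior snapshot of the same source; the `Snapshot` field names (`SnapshotTime`, `RootObjectID`) are assumptions for illustration:

```go
// latestParent returns the root object of the most recent snapshot for the
// given source volume, or an empty ID when no parent exists and a full
// backup must be taken. Field names are illustrative.
func latestParent(ctx context.Context, repo BackupRepo, source string) (ID, error) {
	snaps, err := repo.ListSnapshot(ctx, source)
	if err != nil || len(snaps) == 0 {
		return "", err
	}
	sort.Slice(snaps, func(i, j int) bool {
		return snaps[i].SnapshotTime.Before(snaps[j].SnapshotTime)
	})
	return snaps[len(snaps)-1].RootObjectID, nil
}
```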
|
||||
To support non-Kopia uploader to save metadata, which is used to describe the backed up objects, some metadata related methods will be added to `BackupRepo` interface:
|
||||
```go
|
||||
// WriteMetadata writes metadata to the repo, metadata is used to describe data, e.g., file system
|
||||
// dirs are saved as metadata
|
||||
WriteMetadata(ctx context.Context, meta *Metadata, opt ObjectWriteOptions) (ID, error)
|
||||
|
||||
// ReadMetadata reads a metadata from repo by the metadata's object ID
|
||||
ReadMetadata(ctx context.Context, id ID) (*Metadata, error)
|
||||
```
|
||||
|
||||
kopia-lib for Unified Repo will implement these interfaces by calling the corresponding Kopia repository functions.
|
||||
|
||||
### Kopia Repository
|
||||
CAOS of Kopia repository implements Unified Repo's Objects. However, CAOS supports full and sequential write only.
|
||||
To make it support skippable write, a new Incremental Aware Object Extension is created based on the existing CAOS.
|
||||
|
||||
#### Block Address Table

Kopia CAOS uses a Block Address Table (BAT) to track objects. It is reused for both full and incremental backups.

![caos-extension.png](caos-extension.png)

For the Incremental Aware Object Extension, one object represents one volume.
For a full backup, the skipped areas are written as all ZERO by the Incremental Aware Object Extension, since the Kopia repository's interface doesn't support skippable writes. This is acceptable: the ZERO data is deduplicated by the Kopia repository, so nothing is actually written to the backup storage.
For an incremental backup, the Incremental Aware Object Extension clones the table entries from the parent object for the skipped areas; for the written areas, it writes the data to the Kopia repository and generates new entries. Finally, it generates a new block address table for the incremental object which covers its entire logical space.

The Incremental Aware Object Extension is automatically activated for block mode data access, as set by the `AccessMode` of `ObjectWriteOptions`.

#### Deduplication

The Incremental Aware Object Extension uses a fixed-size splitter for deduplication. This is sufficient for block-level backup, for the reasons below:

- Unlike a file, a disk write never inserts data into the middle of the disk; it only updates in place or appends. So data never shifts between two disks or between two backups of the same disk
- File system IO to disk is generally aligned to a specific size, e.g., 4KB for NTFS and ext4; as long as the chunk size is a multiple of this size, the chance that one IO invalidates two deduplication chunks is effectively reduced
- For cases where the disk is used as a raw block device without a file system, IO is still aligned to a specific boundary

The chunk size is intentionally chosen as 1MB, for the reasons below (illustrative alignment helpers follow the list):

- 1MB is a multiple of 4KB for file systems and of the common block sizes for raw block device usages
- 1MB is the start boundary of partitions for modern operating systems, for both MBR and GPT, so partition metadata can be isolated to a separate chunk
- The more chunks there are, the more indexes in the repository; 1MB is a moderate value with regard to the index overhead of the Kopia repository

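Later sections of this design refer to a `RoundDownTo1M` helper. A minimal illustrative implementation of the chosen chunk size and the alignment helpers could look like this (the helper names are used conceptually below; the implementation here is only a sketch):

```go
// ChunkSize is the fixed dedup chunk size chosen above; the block uploader
// aligns its skippable writes to this boundary.
const ChunkSize = int64(1) << 20 // 1MB

// RoundDownTo1M rounds an offset or size down to the 1MB boundary.
func RoundDownTo1M(n int64) int64 { return n &^ (ChunkSize - 1) }

// RoundUpTo1M rounds a size up to the 1MB boundary (used for zero padding).
func RoundUpTo1M(n int64) int64 { return (n + ChunkSize - 1) &^ (ChunkSize - 1) }
```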
#### Benefits

Since the existing block address table (BAT) of CAOS is reused and kept as is, this brings the benefits below:

- All the entries are still managed by Kopia CAOS, so Velero doesn't need to keep any extra data
- The objects written by the Velero block uploader are still recognizable by Kopia, for both full and incremental backups
- The existing data management in the Kopia repository still works for objects generated by the Velero block uploader, e.g., snapshot GC, repository maintenance, etc.

Most importantly, this solution is highly performant:

- During an incremental write, it doesn't copy any data from the parent object; instead, it only clones object block address entries
- During backup deletion, it doesn't need to move any data; it only deletes the BAT for the object

#### Uploader behavior

The block uploader's skippable writes must also be aligned to this 1MB boundary, because the Incremental Aware Object Extension needs to clone the entries that have been skipped from the parent object.
The file system uploader still uses variable-size deduplication; it is fine to keep data from the two uploaders in the same Kopia repository, though normally they won't be deduplicated against each other.
Volumes can be resized, and a volume's size may not be aligned to the 1MB boundary. The uploader needs to handle the resize appropriately, since the Incremental Aware Object Extension cannot copy a BAT entry partially.
#### CBT Layer

CBT provides the following functionality:

1. For a full backup, it provides the allocated data ranges. E.g., for a 1TB volume there may be only 1MB of files; with this functionality, the uploader can skip the ranges without real data
2. For an incremental backup, it provides the changed data ranges based on the provided parent snapshot. In this way, the uploader can skip the unchanged data and achieve an incremental backup

For case 1, the uploader calls the Unified Repo object's `WriteAt` method with the offset of the allocated data; the ranges ahead of the offset are filled as ZERO by the Unified Repository.
For case 2, the uploader calls the Unified Repo object's `WriteAt` method with the offset of the changed data; the ranges ahead of the offset are cloned from the parent object by the Unified Repository.

A changeId is stored with each backup; the next backup retrieves the parent snapshot's changeId and uses it to retrieve the CBT.

The CBT retrieved from the Kubernetes API is a list of `BlockMetadata`; each range may have a fixed or variable size.
The block uploader needs to maintain its own granularity that is friendly to its backup repository and to the uploader itself, as mentioned above.

Through the Kubernetes API, `GetMetadataAllocated` or `GetMetadataDelta` is called in a loop until all `BlockMetadata` are retrieved.
On the other hand, considering the complexity in the uploader, e.g., multiple streams between read and write, the workflow should be driven by the uploader instead of the CBT iterator. Therefore, in practice, all the allocated/changed blocks should be retrieved and preserved before being passed to the uploader.

Moreover, directly saving the `BlockMetadata` list would consume a lot of memory.

With all the above considerations, a `Bitmap` data structure, called the CBT Bitmap, is used to save the allocated/changed blocks.
The CBT Bitmap chunk size could be set to 1MB or a multiple of it, but a larger chunk size would amplify the backup size, so 1MB is used.

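Below is a minimal sketch of how the CBT Bitmap could be built from the retrieved block metadata. The `blockMetadata` struct mirrors the byte offset/size pair returned for each allocated/changed range; the type and function names are illustrative only.

```go
const cbtChunkSize = int64(1) << 20 // 1MB, matching the dedup chunk size

// blockMetadata mirrors the (byte offset, size) pair returned for each
// allocated/changed range; field names here are illustrative.
type blockMetadata struct {
	ByteOffset int64
	SizeBytes  int64
}

// cbtBitmap records which 1MB chunks of the volume contain allocated/changed
// data: one bit per chunk, so a 1TiB volume needs only 128KiB of memory.
type cbtBitmap struct {
	bits []byte
}

func newCBTBitmap(volumeSize int64) *cbtBitmap {
	chunks := (volumeSize + cbtChunkSize - 1) / cbtChunkSize
	return &cbtBitmap{bits: make([]byte, (chunks+7)/8)}
}

// markRange sets the bit of every 1MB chunk the range touches, normalizing
// fixed- or variable-size BlockMetadata entries to the uploader's granularity.
func (b *cbtBitmap) markRange(m blockMetadata) {
	first := m.ByteOffset / cbtChunkSize
	last := (m.ByteOffset + m.SizeBytes - 1) / cbtChunkSize
	for c := first; c <= last; c++ {
		b.bits[c/8] |= 1 << uint(c%8)
	}
}

// isSet tells the uploader whether the given 1MB chunk needs to be read and
// written, or can be skipped.
func (b *cbtBitmap) isSet(chunk int64) bool {
	return b.bits[chunk/8]&(1<<uint(chunk%8)) != 0
}
```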
Finally, the interactions among the CSI Snapshot Metadata Service, the CBT Layer and the Uploader are as below:

![cbt-interactions.png](cbt-interactions.png)

In this way, the CBT layer and the uploader are decoupled, and the CBT bitmap serves as a northbound parameter of the uploader.

#### Block Uploader

The block uploader consists of a reader and a writer, which run asynchronously.
During backup, the reader reads data from the block device and refers to the CBT Bitmap for allocated/changed blocks; the writer writes data to the Unified Repo.
During restore, the reader reads data from the Unified Repo; the writer writes data to the block device.

The reader and writer are connected by a ring buffer: the reader pushes block data into the ring buffer, and the writer takes data from the ring buffer and writes it to the target.

To improve performance, the block device is opened with direct IO so that no data goes through the system cache unnecessarily.

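As an illustration of the reader/writer pipeline, the sketch below uses a bounded channel as a simplified ring buffer and `golang.org/x/sync/errgroup` to run the two sides in parallel; the `block` struct and the callback signatures are hypothetical, not the actual uploader API.

```go
// block is one unit of data flowing from the reader to the writer.
type block struct {
	offset int64
	data   []byte
}

// runPipeline runs the reader and writer asynchronously: the reader pushes
// blocks into a bounded channel (a simplified ring buffer) and the writer
// drains it. In the real uploader the block-device side would be opened with
// direct IO (O_DIRECT) to bypass the system cache.
// Requires golang.org/x/sync/errgroup.
func runPipeline(
	ctx context.Context,
	read func(ctx context.Context, out chan<- block) error,
	write func(ctx context.Context, in <-chan block) error,
) error {
	ring := make(chan block, 64) // bounded buffer between reader and writer

	g, ctx := errgroup.WithContext(ctx)
	g.Go(func() error {
		defer close(ring) // tell the writer no more blocks are coming
		return read(ctx, ring)
	})
	g.Go(func() error {
		return write(ctx, ring)
	})
	return g.Wait()
}
```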
During restore, to optimize write throughput and storage usage, zero blocks should be either skipped (for restoring to a new volume) or unmapped (for restoring to an existing volume). To cover both cases in a unified way, the SCSI command `WRITE_SAME` is used. The logic is as below (a sketch follows the list):

- Detect whether a block read from the backup contains all-zero data
- If so, the uploader sends the `WRITE_SAME` SCSI command by calling the `BLKZEROOUT` ioctl
- If the call fails, the uploader falls back to the conservative way of writing all zero bytes to the disk

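A Linux-only sketch of this zero-block handling is below. It is illustrative only and assumes the device file was opened for writing; the raw `BLKZEROOUT` ioctl comes from `golang.org/x/sys/unix` and `unsafe`, and the function name is hypothetical.

```go
// isZeroBlock reports whether the block contains only zero bytes.
func isZeroBlock(b []byte) bool {
	for _, v := range b {
		if v != 0 {
			return false
		}
	}
	return true
}

// writeRestoredBlock sketches the restore-side zero handling: all-zero blocks
// are zeroed via the BLKZEROOUT ioctl (which the kernel can service with
// WRITE_SAME/unmap on capable devices); if the ioctl fails, it falls back to
// the conservative path of writing the zero bytes out.
// Requires golang.org/x/sys/unix and unsafe.
func writeRestoredBlock(dev *os.File, offset int64, data []byte) error {
	if isZeroBlock(data) {
		rng := [2]uint64{uint64(offset), uint64(len(data))} // [start, length]
		_, _, errno := unix.Syscall(unix.SYS_IOCTL, dev.Fd(), unix.BLKZEROOUT, uintptr(unsafe.Pointer(&rng[0])))
		if errno == 0 {
			return nil
		}
		// Fall through to the conservative write below.
	}
	_, err := dev.WriteAt(data, offset)
	return err
}
```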
The uploader implementation is OS dependent, but since Windows containers don't support block volumes, the current implementation is for Linux only.

#### ChangeId

ChangeId identifies the base from which the CBT is generated; it must strictly map to the parent snapshot in the repository, otherwise the incremental backup will be corrupted.
Therefore, the ChangeId is saved together with the repository snapshot.
The data mover always queries the parent snapshot from the Unified Repo together with the ChangeId. In this way, no mismatch can happen.
Inside the uploader, the upper layer (the DataUpload controller) could also provide the ChangeId as a double-confirmation mechanism. The received ChangeId would be re-evaluated against the one in the provided snapshot.

In the Kubernetes API, the changeId is represented by `BaseSnapshotId`.
ChangeId retrieval is storage specific. Generally, it is retrieved from the `SnapshotHandle` of the VolumeSnapshotContent object; however, storages may also retrieve the changeId from other places.
That is, `SnapshotHandle` and the changeId may be two different values; in this case, both values need to be preserved.
#### Volume Snapshot Retention

Storages/CSI drivers may support the changeId differently based on the storage's capabilities:

1. Some storages don't require the parent snapshot itself at the time of calculating changes, so the parent snapshot can be deleted immediately after the parent backup completes.
2. In order to calculate the changes, other storages require that the parent snapshot mapped to the changeId still exists at the time `GetMetadataDelta` is called, so the parent snapshot can NOT be deleted as long as there are incremental backups based on it.

The existing exposer works perfectly with Case 1, that is, the snapshot is always deleted when the backup completes.
However, for Case 2, since the snapshot must be retained, the exposer needs the changes below:

- At the end of each backup, keep the `deletionPolicy` of the current VolumeSnapshot's VolumeSnapshotContent as `Retain`, so that when the VolumeSnapshot is deleted at the end of the backup, the current snapshot is retained in the storage
- `GetMetadataDelta` is called with `BaseSnapshotId` set to the preserved changeId
- When deleting a backup, a VolumeSnapshot-VolumeSnapshotContent pair is rebuilt with `deletionPolicy` as `Delete` and `snapshotHandle` as the preserved one
- Then the rebuilt VolumeSnapshot is deleted so that the volume snapshot is deleted from the storage

There is no way to automatically detect which way a specific volume supports, so an interface is exposed to users to set the volume snapshot retention method.
The interface could be added to the `Action.Parameters` of the Volume Policy. By default, the Velero block data mover takes Way 1, so the volume snapshot is never retained; if users specify the `RetainSnapshot` parameter, Way 2 is taken.

```go
type Action struct {
	Type       VolumeActionType `yaml:"type"`
	Parameters map[string]any   `yaml:"parameters,omitempty"`
}
```
In this way, users can specify, for storage class "xxx" or CSI driver "yyy", to back up through CSI snapshots with the Velero block data mover and retain the snapshot.

#### Incremental Size

At the end of the backup, the incremental size is also returned by the uploader, the same as for the Velero file system uploader. The size indicates how much data is unique and therefore processed by the uploader, based on the provided CBT.

### Fallback to Full Backup

There are occasions when an incremental backup cannot continue, so the data mover falls back to a full backup:

- `GetMetadataAllocated` or `GetMetadataDelta` returns an error
- The ChangeId is missing
- The parent snapshot is missing

When the fallback happens, the volume will be fully backed up at block level, but because of the data deduplication in the backup repository, the unallocated/unchanged data will probably be deduplicated.
During restore, the volume will also be fully restored. The zero-block handling mentioned above still works, so write IO for unallocated data will probably be eliminated.

Fallback handles exceptional cases; for most backups/restores, fallback is never expected.
### Irregular Volume Size

As mentioned above, during incremental backup the block uploader's IO should be aligned to the deduplication chunk size (1MB); on the other hand, there is no hard requirement that users' volume sizes be aligned.

To support volumes with irregular sizes, the measures below are taken:

- Volume objects in the repository are always aligned to 1MB
- If the volume size is irregular, zero bytes are padded to the tail of the volume object
- The real size is recorded in the repository snapshot
- During restore, the real size of data is restored

The padding must always be zero bytes.
### Volume Size Change

Incremental backup can continue when the volume is resized.
The block uploader supports writing disks of arbitrary size.
The volume resize cases don't need to be handled case by case.

Instead, when a volume resize happens, the block uploader handles it appropriately in the ways below:

- Loop with the CBT
- Read the data between `RoundDownTo1M(newSize)` and `newSize` to get the tail data
- If there is no tail data, which means the volume size is aligned to 1MB, then call `WriteAt(newSize, nil)`
- Otherwise, call `WriteAt(RoundDownTo1M(newSize), taildata)`, where `taildata` is also padded to 1MB

That is to say (an illustrative sketch follows the list):

- If the CBT covers the tail of the volume, looping with the CBT is enough for both the shrink and expand cases
- Otherwise, if the volume is expanded, `WriteAt` guarantees to clone the appropriate entries from the parent object and append zero data for the expanded areas. In particular, if the parent volume is not of regular size, the zero padding bytes are also reused. Therefore, the parent object's padding bytes must be zero
- In case the volume is shrunk, writing the tail data makes sure zero bytes are padded to the new volume object instead of inheriting non-zero data from the parent object

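Below is an illustrative sketch of the tail handling when the volume has been resized. The `roundDownTo1M` helper and the `WriteAt` semantics follow the description above (Go's `WriteAt(data, offset)` argument order is used); the function itself is hypothetical.

```go
const chunkSize1M = int64(1) << 20 // 1MB dedup chunk size

func roundDownTo1M(n int64) int64 { return n &^ (chunkSize1M - 1) }

// writeResizedTail re-writes the last (possibly partial) 1MB chunk of the
// resized volume so that the new object never inherits stale or non-zero
// data from the parent object at its tail.
func writeResizedTail(w io.WriterAt, dev io.ReaderAt, newSize int64) error {
	tailStart := roundDownTo1M(newSize)
	if tailStart == newSize {
		// Volume size is 1MB aligned: no tail data, just mark the new end of
		// the object with an empty write at newSize.
		_, err := w.WriteAt(nil, newSize)
		return err
	}

	// Read the tail data and pad it with zero bytes up to the 1MB boundary;
	// the padding must be zero so later incremental backups can safely reuse it.
	tail := make([]byte, chunkSize1M)
	if _, err := dev.ReadAt(tail[:newSize-tailStart], tailStart); err != nil && err != io.EOF {
		return err
	}
	_, err := w.WriteAt(tail, tailStart)
	return err
}
```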
### Cancellation

The existing cancellation mechanism is reused, so there is no change outside of the block uploader.
Inside the uploader, cancellation checkpoints are embedded in the reader and writer so that execution can quit within a reasonable time once cancellation happens.

### Parallelism

Parallelism among data movers will reuse the existing mechanism --- load concurrency.
Inside the data mover, the uploader reader and writer always run in parallel. There is always exactly one reader and one writer.
Sequential read/write of the volume is already optimized, and there is no proof that multiple readers/writers are beneficial.
### Progress Report

Progress reporting outside of the data mover will reuse the existing mechanism.
Inside the data mover, progress updates are embedded in the uploader writer.
The progress struct is kept as is; the Velero block data mover still supports `TotalBytes` and `BytesDone`:

```go
type Progress struct {
	TotalBytes int64 `json:"totalBytes,omitempty"`
	BytesDone  int64 `json:"doneBytes,omitempty"`
}
```

At the end of the backup, the progress for the block data mover provides the same `GetIncrementalSize`, which reports the incremental size of the backup, so the incremental size is reported to users in the same way as for the file system data mover.
### Selectable Backup Type

For several reasons, a periodic full backup is required:

- From a user-experience perspective, a periodic full backup is required to ensure data integrity across the incremental backups, e.g., every week or every month

Therefore, the backup type (full/incremental) should be supported in Velero's manual backup and backup schedule.
Backup type will also be added to `volumeInfo.json` for observability purposes.

Backup TTL is still used for users to specify a backup's retention time. By default, both full and incremental backups have a 30-day retention, even though this is not ideal for full backups. This could be enhanced when Velero supports a sophisticated retention policy.
As a workaround, users could create two schedules for the same backup scope: one for full backups, with lower frequency and a longer backup TTL; the other for incremental backups, with normal frequency and a shorter backup TTL.

#### File System Data Mover

At present, the Velero file system data mover doesn't support a selectable backup type; instead, incremental backups are always taken whenever possible.
From a user-experience perspective, this is not ideal.

Therefore, to solve this problem and to align with the Velero block data mover, the Velero file system data mover will support the backup type as well.

At present, the data path for the Velero file system data mover already supports this; we only need to expose the functionality to users.
### Backup Describe

The backup type should be added to the backup description; it appears in two places:

- The `backupType` in the Backup CR. This is the backup type selected by users
- The backup type recorded in `volumeInfo.json`, which is the actual type taken by the backup

With these two values, users are able to know the actual backup type and also whether a fallback happened.

The `DataMover` item in the existing backup description should be updated to reflect the actual data mover that completed the backup; this information can be retrieved from `volumeInfo.json`.

### Backup Sync

No additional data is required for sync, so Backup Sync is kept as is.

### Backup Deletion

As mentioned above, no data is moved when deleting a repo snapshot for the Velero block data mover, so Backup Deletion is kept as is with regard to repo snapshots; for the volume snapshot retention case, the backup deletion logic will be modified accordingly to delete the retained snapshots.

### Restarts

The restart mechanism is reused without any change.

### Logging

The logging mechanism is not changed.
### Backup CRD

A `backupType` field is added to the Backup CRD; two values are supported: `full` and `incremental`.
`full` instructs the data mover to take a full backup.
`incremental`, which is the default value, instructs the data mover to take an incremental backup.

```yaml
spec:
  description: BackupSpec defines the specification for a Velero backup.
  properties:
    backupType:
      description: BackupType indicates the type of the backup
      enum:
      - full
      - incremental
      type: string
```
### DataUpload CRD

A `parentSnapshot` field is added to the DataUpload CRD; the values below are supported:

- `""`: falls back to `auto`
- `auto`: the data mover finds the most recent snapshot of the same volume from the Unified Repository and uses it as the parent
- `none`: the data mover is not assigned a parent snapshot, so it runs a full backup
- a specific snapshotID: the data mover uses the specified snapshotID to find the parent snapshot; if it cannot be found, the data mover falls back to a full backup

The last option is reserved for backup planning; it will not be used for now and may become useful when Velero supports a sophisticated retention policy. For now, Velero always uses the most recent backup as the parent.

When the Backup's `backupType` is `full`, the data mover controller sets the DataUpload's `parentSnapshot` to `none`.
When the Backup's `backupType` is `incremental`, the data mover controller sets the DataUpload's `parentSnapshot` to `auto`; `""` is kept only for backwards compatibility (a sketch of this mapping follows the CRD snippet below).

```yaml
spec:
  description: DataUploadSpec is the specification for a DataUpload.
  properties:
    parentSnapshot:
      description: |-
        ParentSnapshot specifies the parent snapshot that the current backup is based on.
        If its value is "" or "auto", the data mover finds the most recent backup of the same volume as the parent.
        If its value is "none", the data mover will do a full backup.
        If its value is a specific snapshotID, the data mover finds the specific snapshot as the parent.
      type: string
```
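A minimal, illustrative sketch of how the controller could map the Backup's `backupType` onto the DataUpload's `parentSnapshot`; the function name is ours, not an existing Velero API.

```go
// parentSnapshotForBackup derives the DataUpload parentSnapshot value from
// the Backup's backupType, following the mapping described above.
func parentSnapshotForBackup(backupType string) string {
	switch backupType {
	case "full":
		return "none" // force a full backup
	default: // "incremental" or unset
		return "auto" // find the most recent snapshot of the volume as the parent
	}
}
```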
### DataDownload CRD

No change is required to the DataDownload CRD.

## Plugin Data Movers

The current design doesn't break anything for plugin data movers.
The enhancement in VolumePolicy could also be used for plugin data movers. That is, users can select a plugin data mover through VolumePolicy in the same way as Velero's built-in data movers.

## Installation

No change to Installation.

## Upgrade

No impact to Upgrade. The new CRD fields are all optional and have backwards-compatible values.

## CLI

A backup type parameter is added to the Velero CLI as below:

```
velero backup create --full
velero schedule create --full
```

When the parameter is not specified, Velero defaults to incremental backups.

[1]: ../Implemented/unified-repo-and-kopia-integration/unified-repo-and-kopia-integration.md
[2]: ../Implemented/volume-snapshot-data-movement/volume-snapshot-data-movement.md
[3]: ../Implemented/vgdp-micro-service/vgdp-micro-service.md
[4]: https://kubernetes.io/blog/2025/09/25/csi-changed-block-tracking/
[5]: https://kopia.io/docs/advanced/architecture/