* Add builders for CRD schemas
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
* Add test case for #2319
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
* Add failing test case
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
* Remove unnecessary print and temporary variable
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
* Add some options for fixing the test case
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
* Switch to a JSON middle step to "fix" conversions
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
* Add comment and changelog
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
* Account for possible missing schemas on v1 CRDs
If a v1beta1 CRD without a schema was submitted to a Kubernetes v1.16
cluster, then Kubernetes will serve it back as a v1 CRD without a
schema.
However, when Velero tries to restore this document, the request will be
rejected as a v1 CRD must have a schema.
This commit adds some defensive coding on the restore side, as well as
potential fixes on the backup side to work around this.
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
* Back up nonstructural CRDs as v1beta1
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
* Add tests for remapping plugin
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
* Add builders for v1 CRDs
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
* Address review feedback
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
* Remove extraneous log message
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
* Add changelog
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
The OpenEBS community uses Velero as a preferred tool to enable data protection strategies. At MayaData, we have
been using Velero since its Ark days to power DMaaS (Data Migration as a Service). Thank you for an amazing product and
community.
Signed-off-by: kmova <kiran.mova@mayadata.io>
* Wait for CRDs to be available and ready
When restoring CRDs, we should wait for the definition to be ready and
available before moving on to restoring specific CRs.
While the CRDs are often ready by the time we get to restoring a CR,
there is a race condition where the CRD isn't ready.
This change waits on each CRD at restore time.
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
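A minimal sketch of this kind of wait, assuming the apiextensions v1beta1 types and a caller-supplied getter; it illustrates the idea rather than Velero's exact implementation.

```go
package restore

import (
	"time"

	apiextv1beta1 "k8s.io/apiextensions-apiserver/pkg/apis/apiextensions/v1beta1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"k8s.io/apimachinery/pkg/util/wait"
)

// waitForCRDReady polls until the CRD reports Established=True or the timeout expires.
// getCRD abstracts the API call so the sketch doesn't depend on a particular client-go version.
func waitForCRDReady(getCRD func() (*apiextv1beta1.CustomResourceDefinition, error), timeout time.Duration) error {
	return wait.PollImmediate(time.Second, timeout, func() (bool, error) {
		crd, err := getCRD()
		if apierrors.IsNotFound(err) {
			return false, nil // CRD not visible yet; keep polling
		}
		if err != nil {
			return false, err
		}
		for _, cond := range crd.Status.Conditions {
			if cond.Type == apiextv1beta1.Established && cond.Status == apiextv1beta1.ConditionTrue {
				return true, nil
			}
		}
		return false, nil // created but not yet established
	})
}
```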
* Pruning unknown fields
In CRD API version v1beta1, preserveUnknownFields defaults to true.
In CRD API version v1, preserveUnknownFields can only be false;
otherwise, Kubernetes validation rejects the object with an error about
the invalid preserveUnknownFields value.
When Velero is deployed on Kubernetes 1.16+ with CRDs at API version v1beta1,
the cluster converts them from v1beta1 to v1 automatically.
A full backup and restore of the cluster then fails, because
preserveUnknownFields=true is not allowed on Kubernetes 1.16+.
Since the CRD's structural schema is already defined, setting preserveUnknownFields
to false resolves the restore error on Kubernetes 1.16+.
Signed-off-by: JenTing Hsiao <jenting.hsiao@suse.com>
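A hedged sketch of this fix using the apiextensions v1beta1 types; the actual Velero change may differ in detail.

```go
package backup

import apiextv1beta1 "k8s.io/apiextensions-apiserver/pkg/apis/apiextensions/v1beta1"

// dropPreserveUnknownFields forces preserveUnknownFields to false when the CRD already
// carries a structural schema, so the object passes v1 CRD validation on Kubernetes 1.16+.
func dropPreserveUnknownFields(crd *apiextv1beta1.CustomResourceDefinition) {
	hasSchema := crd.Spec.Validation != nil && crd.Spec.Validation.OpenAPIV3Schema != nil
	if hasSchema && crd.Spec.PreserveUnknownFields != nil && *crd.Spec.PreserveUnknownFields {
		preserve := false
		crd.Spec.PreserveUnknownFields = &preserve
	}
}
```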
* Add changelog
Signed-off-by: JenTing Hsiao <jenting.hsiao@suse.com>
(For some reason, basic Kubernetes is able to run a Debian-based container as nobody:nobody, but
docker run and VMware WCP fail, which should be the expected behavior.)
Signed-off-by: Dave Smith-Uchida <dsmithuchida@vmware.com>
The Velero client config file should have restricted file permissions, so it is
readable/writable only by the user that creates it--similar to files like
`.ssh/id_rsa`.
Refer to OTG-CONFIG-009: Test File Permission
> Improper file permission configuration may result in privilege
escalation, information exposure, DLL injection, or unauthorized file
access.
Therefore, file permissions must be properly configured with minimum
access permission by default.
[source](https://www.owasp.org/index.php/Test_File_Permission_(OTG-CONFIG-009))
Ticket: #1758
Signed-off-by: John Naulty <johnnaulty@bitgo.com>
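A minimal sketch of writing a client config with owner-only permissions; the path and payload here are illustrative, not necessarily Velero's.

```go
package main

import (
	"encoding/json"
	"io/ioutil"
	"log"
	"os"
	"path/filepath"
)

func saveClientConfig(cfg map[string]string) error {
	data, err := json.Marshal(cfg)
	if err != nil {
		return err
	}
	home, err := os.UserHomeDir()
	if err != nil {
		return err
	}
	path := filepath.Join(home, ".config", "velero", "config.json")
	if err := os.MkdirAll(filepath.Dir(path), 0700); err != nil {
		return err
	}
	// 0600: readable/writable by the owning user only, like ~/.ssh/id_rsa.
	return ioutil.WriteFile(path, data, 0600)
}

func main() {
	if err := saveClientConfig(map[string]string{"namespace": "velero"}); err != nil {
		log.Fatal(err)
	}
}
```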
* Check for nil LastMaintenanceTime in dueForMaintenance
ResticRepository.dueForMaintenance causes a panic in the velero pod
("invalid memory address or nil pointer dereference") if
repository.Status.LastMaintenanceTime is nil. This fix returns 'true'
if it's nil, so the repository is due for maintenance if LastMaintenanceTime
is nil *or* the time elapsed since the last maintenance is greater than
repository.Spec.MaintenanceFrequency.Duration
Signed-off-by: Scott Seago <sseago@redhat.com>
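A sketch of the nil-safe check, with field names following the ResticRepository API described above; treat it as an illustration rather than the exact Velero source.

```go
// velerov1api stands for Velero's v1 API types package; metav1 timestamps embed time.Time.
// dueForMaintenance returns true when maintenance has never run (nil timestamp) or when
// more than MaintenanceFrequency has elapsed since the last run.
func dueForMaintenance(repo *velerov1api.ResticRepository, now time.Time) bool {
	return repo.Status.LastMaintenanceTime == nil ||
		repo.Status.LastMaintenanceTime.Add(repo.Spec.MaintenanceFrequency.Duration).Before(now)
}
```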
* changelog for PR#2200
Signed-off-by: Scott Seago <sseago@redhat.com>
* update revision of go-hclog to match go.mod requirement
Signed-off-by: Steve Kriss <krisss@vmware.com>
* update prometheus dep to prepare for go module migration
Signed-off-by: Steve Kriss <krisss@vmware.com>
Installing restic with CPU/memory limits is optional.
If Velero cannot parse the resource requirements, it uses default values instead.
That way, the administrator won't be confused into thinking that something failed to recover.
Signed-off-by: JenTing Hsiao <jenting.hsiao@suse.com>
Migrate logic from NewUUID function into the pvRenamer function.
PR #2133 switched to a new NewUUID function that returns an error, but
the invocation of that function needs to happen within the pvRenamer
closure. Because the new function returns an error, the pvRenamer should
return the error, the signature needs to be changed and the return
checked.
Signed-off-by: John Naulty <johnnaulty@bitgo.com>
satori/go.uuid has a known issue with random uuid generation.
gofrs/uuid is still maintained and has fixed the random uuid generation
issue present in satori/go.uuid
Signed-off-by: John Naulty <johnnaulty@bitgo.com>
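A small sketch combining the two changes above: the renamer closure now returns an error, and UUIDs come from github.com/gofrs/uuid, whose NewV4 reports generation failures. The naming scheme is illustrative.

```go
package main

import (
	"fmt"
	"log"

	"github.com/gofrs/uuid"
)

// newPVRenamer returns a renamer closure; UUID generation failures now propagate
// as errors instead of being dropped.
func newPVRenamer() func(oldName string) (string, error) {
	return func(oldName string) (string, error) {
		u, err := uuid.NewV4()
		if err != nil {
			return "", err
		}
		return "velero-clone-" + u.String(), nil
	}
}

func main() {
	rename := newPVRenamer()
	name, err := rename("my-pv")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(name)
}
```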
* Updating restic document for OpenShift cluster having version 4.1 or later
Signed-off-by: shashank855 <shashank.ranjan@mayadata.io>
* update documentation for velero-v1.2.0
Signed-off-by: shashank855 <shashank.ranjan@mayadata.io>
Fixes: #2094
Updates to site colours to align with VMware branding and improve contrast ratios for site accessibility.
Updates to the YouTube plugin so that it also inserts the iframe's title. New usage of the Liquid template: `{% youtube "<title>" %}`
Updates to links to provide link text.
Updates to images to add alt text.
Accessibility changes assist people visiting the site with visual impairments and improve the function of text-to-speech tools such as JAWS.
Signed-off-by: Brett Johnson <brett@sdbrett.com>
Fixes: #2092
Resolves: CVE-2019-13117
Updated Gemfile.lock for the security vulnerability.
Updated the Gemfile to specify gem versions, providing more control over versions when using `bundle update`. Including the Jekyll version in the Gemfile tells Netlify which version to build with.
Signed-off-by: Brett Johnson <brett@sdbrett.com>
* remove fsfreeze-pause image, replace with ubuntu in nginx example
Signed-off-by: Steve Kriss <krisss@vmware.com>
* changelog
Signed-off-by: Steve Kriss <krisss@vmware.com>
* switch to sleep infinity for clarity
Signed-off-by: Steve Kriss <krisss@vmware.com>
* restic: don't try to restore PVBs with no snapshotID
Signed-off-by: Steve Kriss <krisss@vmware.com>
* changelog
Signed-off-by: Steve Kriss <krisss@vmware.com>
Without a GOPROXY, go modules are fetched from their respective hosts,
which increases the likelihood that any given host might be unavailable
and break builds.
Fixes #2036
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
* restic: use restore cmd's env when getting snapshot size
Signed-off-by: Steve Kriss <krisss@vmware.com>
* restic: remove code that considers 0-byte backups an error
Signed-off-by: Steve Kriss <krisss@vmware.com>
* add --allow-partially-failed flag to velero restore create
Signed-off-by: Steve Kriss <krisss@vmware.com>
* remove extraneous client creation
Signed-off-by: Steve Kriss <krisss@vmware.com>
* add godoc to helper func
Signed-off-by: Steve Kriss <krisss@vmware.com>
* todo
Signed-off-by: Steve Kriss <krisss@vmware.com>
* WIP revised install overview
Signed-off-by: Steve Kriss <krisss@vmware.com>
* add info on different installing different VSL provider
Signed-off-by: Steve Kriss <krisss@vmware.com>
* address review and TODOs
Signed-off-by: Steve Kriss <krisss@vmware.com>
Related issue: https://github.com/heptio/velero/issues/1830
This accomplishes everything
that's needed, although there might be room for improvement in avoiding
a GET call for matching CRDs for each resource backed up. An alternative
could be a single call to get all CRDs prior to iterating over resources
and passing this into the backupResource function.
Signed-off-by: Scott Seago <sseago@redhat.com>
* feat: add azure china support
Signed-off-by: andyzhangx <xiazhang@microsoft.com>
* remove AZURE_CLOUD_NAME from required env var fetching
Signed-off-by: Steve Kriss <krisss@vmware.com>
* minor simplification of parseAzureEnvironment
Signed-off-by: Steve Kriss <krisss@vmware.com>
* changelog
Signed-off-by: Steve Kriss <krisss@vmware.com>
* remove cloudNameEnvVar from getRequiredValues call
Signed-off-by: Steve Kriss <krisss@vmware.com>
* just check for err != nil
Signed-off-by: Steve Kriss <krisss@vmware.com>
* Allow the velero server to be created on GCP even without a provided service account key in order to support workload identity and default compute engine credentials. Add option for adding service account annotations.
Signed-off-by: Joshua Wong <joshua99wong@gmail.com>
By default, git does not fetch tags on a checkout, so fetch those when
building a tag.
When tags were not fetched and master was being built, both HIGHEST and
LATEST_TAG were "", which compared as equal. Thus, every master push was tagged
as latest. This is now handled correctly.
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
* update import paths to github.com/vmware-tanzu/...
Signed-off-by: Steve Kriss <krisss@vmware.com>
* update other GH org refs to vmware-tanzu
Signed-off-by: Steve Kriss <krisss@vmware.com>
* site and docs: update GH org to vmware-tanzu
Signed-off-by: Steve Kriss <krisss@vmware.com>
* update travis badge links on docs readmes
Signed-off-by: Steve Kriss <krisss@vmware.com>
* backup sync controller: replace revision file with full diff each interval
Signed-off-by: Steve Kriss <krisss@vmware.com>
* remove getting/setting of metadata/revision file
Signed-off-by: Steve Kriss <krisss@vmware.com>
* changelog
Signed-off-by: Steve Kriss <krisss@vmware.com>
* tweak logging
Signed-off-by: Steve Kriss <krisss@vmware.com>
* don't keep podVolumeBackup log field around after syncing PVBs
Signed-off-by: Steve Kriss <krisss@vmware.com>
* update generated CRDs
Signed-off-by: Steve Kriss <krisss@vmware.com>
Using layers can simplify iteration on the builder image itself, and
shorten build times when only one command is changed
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
* fail on make verify if generated CRDs differ
Signed-off-by: Adnan Abdulhussein <aadnan@vmware.com>
* make verification error more clear
Signed-off-by: Adnan Abdulhussein <aadnan@vmware.com>
* velero install: wait for restic daemonset to be ready
Signed-off-by: Steve Kriss <krisss@vmware.com>
* changelog
Signed-off-by: Steve Kriss <krisss@vmware.com>
A user encountered the following error on a GCP project:
An error occurred: some backup storage locations are invalid: error getting backup store for location "default": rpc error: code = Unknown desc = invalid character '-' in numeric literal
This error was ambiguous and took some time to track down to the fact
that their credentials file wasn't a JSON file, but instead the contents
of the private key field. This change makes the problem slightly easier
to debug.
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
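One way to make this failure mode clearer is a pre-flight JSON check on the credentials file; the sketch below illustrates the idea (the file path is illustrative), not necessarily the change Velero made.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io/ioutil"
	"log"
)

// checkCredentialsFile returns a descriptive error if the file is not valid JSON,
// e.g. when it contains a raw private key instead of a service account key file.
func checkCredentialsFile(path string) error {
	data, err := ioutil.ReadFile(path)
	if err != nil {
		return err
	}
	if !json.Valid(data) {
		return fmt.Errorf("credentials file %q is not valid JSON; expected a GCP service account key file", path)
	}
	return nil
}

func main() {
	if err := checkCredentialsFile("/credentials/cloud"); err != nil {
		log.Fatal(err)
	}
}
```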
* change youtube embed to playlist
Signed-off-by: Steve Kriss <krisss@vmware.com>
* add blog post links to resources
Signed-off-by: Steve Kriss <krisss@vmware.com>
* Add script for pushing container images via Travis
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
* Explain the latest tag logic
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
* Add travis integration to deployment
* ensure $BRANCH is always the same value (borrowed from Sonobuoy)
* get gcloud SDK installed (borrowed from Sonobuoy)
* use deploy step to run GCR push script (borrowed from Sonobuoy)
* use gcloud's docker to do the image building/pushing
* placeholders for secure values
* rename $LATEST to $HIGHEST to more accurately reflect what it is
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
* Add encrypted GCR creds
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
* Remove unused env section
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
* Rearrange logic so that there's only one make call
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
* Review feedback
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
* Update gcloud and OS for Travis environment
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
* Remove redundant make dependencies
verify and test targets already run on the ci target, which must pass
before deploy.
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
* Re-encrypt file after testing
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
After discussing with the team, it doesn't make sense for all design proposals to have a draft/accepted status. Currently draft proposals can be kept open as PRs and all merged proposals are considered to be accepted. Removing the status field here removes the extra step of having to update the proposal status before merging.
Signed-off-by: Adnan Abdulhussein <aadnan@vmware.com>
* rename PV during restore when cloning a namespace
Signed-off-by: Steve Kriss <krisss@vmware.com>
* rename func and vars, switch to if..else
Signed-off-by: Steve Kriss <krisss@vmware.com>
* make pv renamer func configurable for testing purposes
Signed-off-by: Steve Kriss <krisss@vmware.com>
* add unit test cases
Signed-off-by: Steve Kriss <krisss@vmware.com>
* changelog
Signed-off-by: Steve Kriss <krisss@vmware.com>
* address review feedback
Signed-off-by: Steve Kriss <krisss@vmware.com>
* address review feedback
Signed-off-by: Steve Kriss <krisss@vmware.com>
* when backing up PVCs with restic, explicitly specify --parent
Signed-off-by: Steve Kriss <krisss@vmware.com>
* changelog
Signed-off-by: Steve Kriss <krisss@vmware.com>
* address review feedback
Signed-off-by: Steve Kriss <krisss@vmware.com>
* restic backup and restore progress proposal
Signed-off-by: Adnan Abdulhussein <aadnan@vmware.com>
* update approach to lookup based on files rather than total size
Signed-off-by: Adnan Abdulhussein <aadnan@vmware.com>
* typo
Signed-off-by: Adnan Abdulhussein <aadnan@vmware.com>
* record TotalBytes and BytesDone in PodVolumeBackup/Restore and calculate percentage in UI
Signed-off-by: Adnan Abdulhussein <aadnan@vmware.com>
* update to go with restic stats approach for restore
Signed-off-by: Adnan Abdulhussein <aadnan@vmware.com>
* typo
Signed-off-by: Adnan Abdulhussein <aadnan@vmware.com>
* update high-level design for restic stats
Signed-off-by: Adnan Abdulhussein <aadnan@vmware.com>
* wait for bytes_done
Signed-off-by: Adnan Abdulhussein <aadnan@vmware.com>
* change status to accepted
Signed-off-by: Adnan Abdulhussein <aadnan@vmware.com>
* patch velero to handle self-signed certs on client
you'll get this error otherwise:
x509: certificate signed by unknown authority
Signed-off-by: Steven Chung <schung@d2iq.com>
The Velero deployment did not have a way of exposing the namespace it
was installed in to the API client. This is a problem for plugins that
need to query for resources in that namespace, such as the restic
restore process that needs to find PodVolume(Backup|Restore)s.
While the Velero client is consulted for a configured namespace, this
cannot be set in the server pod since there is no valid home directory
in which to place it.
This change provides the namespace to the deployment via the downward
API, and updates the API client factory to use the VELERO_NAMESPACE
before looking at the config file, so that any plugins using the client
will look at the appropriate namespace.
Fixes #1743
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
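A sketch of the lookup order described above: prefer the VELERO_NAMESPACE value injected via the downward API, then fall back to the client config file. The config helper here is hypothetical.

```go
package main

import (
	"fmt"
	"os"
)

// namespaceFromConfigFile stands in for reading the client config file; it is a
// hypothetical helper for this sketch.
func namespaceFromConfigFile() string { return "" }

// veleroNamespace resolves the namespace the server and plugins should use.
func veleroNamespace() string {
	if ns := os.Getenv("VELERO_NAMESPACE"); ns != "" {
		return ns // set on the deployment via the downward API
	}
	if ns := namespaceFromConfigFile(); ns != "" {
		return ns
	}
	return "velero" // default
}

func main() {
	fmt.Println(veleroNamespace())
}
```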
* Add SEO functionality to the site
Signed-off-by: Jonas Rosland <jrosland@vmware.com>
* Add some more SEO
Signed-off-by: Jonas Rosland <jrosland@vmware.com>
* use Backup CR labels as tags for snapshots
This allows users to define custom tags to be added to snapshots, by
specifying custom labels on the Backup CR with the `velero backup create
--labels` flag.
Signed-off-by: Adnan Abdulhussein <aadnan@vmware.com>
* print resource list metadata in velero backup describe details
Signed-off-by: Adnan Abdulhussein <aadnan@vmware.com>
* rewrite TestGetDownloadURL to test more scenarios
Signed-off-by: Adnan Abdulhussein <aadnan@vmware.com>
* move backup printer helpers to backup_printer.go
Signed-off-by: Adnan Abdulhussein <aadnan@vmware.com>
* move describe printer helpers back to backup_describer
and rename to prefix with describe* to indicate that they are used for the describe command
Signed-off-by: Adnan Abdulhussein <aadnan@vmware.com>
* split backup and restore tests for TestGetDownloadURL
Signed-off-by: Adnan Abdulhussein <aadnan@vmware.com>
* friendlier error message when backup resource list missing
Signed-off-by: Adnan Abdulhussein <aadnan@vmware.com>
* periodically check for stale restic repo locks
Signed-off-by: Steve Kriss <krisss@vmware.com>
* changelog
Signed-off-by: Steve Kriss <krisss@vmware.com>
* only try to init a restic repo if it doesn't already exist
Signed-off-by: Steve Kriss <krisss@vmware.com>
* reword comment
Signed-off-by: Steve Kriss <krisss@vmware.com>
* migrate pkg/backup restic tests to new structure
Signed-off-by: Steve Kriss <krisss@vmware.com>
* rename backup_new_test.go to backup_test.go
Signed-off-by: Steve Kriss <krisss@vmware.com>
* use pod volume backup builder
Signed-off-by: Steve Kriss <krisss@vmware.com>
* remove docs header about project rename
Signed-off-by: Steve Kriss <krisss@vmware.com>
* remove Upgrade to 1.0 page from master TOC
Signed-off-by: Steve Kriss <krisss@vmware.com>
* standardize TOC titles
Signed-off-by: Steve Kriss <krisss@vmware.com>
* add backup/restore docs for undocumented features
Signed-off-by: Steve Kriss <krisss@vmware.com>
Without the incremental switch, live rebuilds were taking 20s (on a
Linux host) for a single-line change. With the switch, they dropped to
8s.
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
Flags specifying the kubeconfig or kubecontext to use weren't actually
being used by the install command.
Fixes #1651
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
* Remove references to apps/v1beta1 API group
In Kubernetes v1.16, the apps/v1 API group will be the default served
for relevant resources.
Update any references to apps/v1beta1 for forwards compatibility.
Fixes #1672
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
* Update API group on plugin commands
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
* add restore item action to change PVC/PV storage class name
Signed-off-by: Steve Kriss <krisss@vmware.com>
* code review
Signed-off-by: Steve Kriss <krisss@vmware.com>
* change existing plugin names to lowercase/hyphenated
Signed-off-by: Steve Kriss <krisss@vmware.com>
* add validation for new storage class name
Signed-off-by: Steve Kriss <krisss@vmware.com>
* add test cases
Signed-off-by: Steve Kriss <krisss@vmware.com>
* changelog
Signed-off-by: Steve Kriss <krisss@vmware.com>
* fix imports
Signed-off-by: Steve Kriss <krisss@vmware.com>
* update plugin names to be more consistent
Signed-off-by: Steve Kriss <krisss@vmware.com>
* update unit tests to use pkg/test object constructors
Signed-off-by: Steve Kriss <krisss@vmware.com>
CSI volumes are mounted one level deeper than "native" Kubernetes
volumes, and this extra path level needs to be appended for proper restic support.
Fixes #1313.
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
* allow users to specify additional Velero/restic pod annotations on the command line with the pod-annotations flag
Signed-off-by: Traci Kamp <traci.kamp@gmail.com>
* record PodVolumeBackup start and completion timestamps
adds startTimestamp and completionTimestamp fields to the
PodVolumeBackup status spec
Signed-off-by: Adnan Abdulhussein <aadnan@vmware.com>
* record PodVolumeRestore start and completion timestamps
Signed-off-by: Adnan Abdulhussein <aadnan@vmware.com>
* migrate hooks tests
Signed-off-by: Steve Kriss <krisss@vmware.com>
* add more test cases
Signed-off-by: Steve Kriss <krisss@vmware.com>
* refactor to strongly typed expectation, add pre+post hook test case
Signed-off-by: Steve Kriss <krisss@vmware.com>
* allow exclusion of resources using standard label
excludes any resources with the velero.io/exclude-from-backup=true label
Signed-off-by: Adnan Abdulhussein <aadnan@vmware.com>
* ensure backup item action modifications reflected in tarball filepath
This patch ensures the updated backup item's name and namespace are used
when constructing the filepath for the tarball.
Signed-off-by: Adnan Abdulhussein <aadnan@vmware.com>
* changelog
Signed-off-by: Adnan Abdulhussein <aadnan@vmware.com>
* added top spacing and reduced bottom
Signed-off-by: Ross Fenrick <rossf@heptio.com>
* added margin to h3's
Signed-off-by: Ross Fenrick <rossf@heptio.com>
* Add restic instructions for Enterprise PKS
Signed-off-by: Stephen Carter <carters@vmware.com>
* Add instructions to v1.0.0 docs
Signed-off-by: Stephen Carter <carters@vmware.com>
This fix initialises an empty map if the request object's Labels map
is nil, allowing the controller to later add and modify labels on the
object.
Signed-off-by: Adnan Abdulhussein <aadnan@vmware.com>
* migrate and enhance backup item action tests
Signed-off-by: Steve Kriss <krisss@vmware.com>
* migrate terminating resource test
Signed-off-by: Steve Kriss <krisss@vmware.com>
* add godoc for test functions
Signed-off-by: Steve Kriss <krisss@vmware.com>
* Adding in Jekyll site | Adjusting gitignore for Jekyll build | Moving docs to site folder
Signed-off-by: Lee Springer <lee@smalltalk.agency>
* Restore main.go to original location
Signed-off-by: Lee Springer <lee@smalltalk.agency>
* Updates to footer
Signed-off-by: Lee Springer <lee@smalltalk.agency>
* Updates to homepage links
Signed-off-by: Lee Springer <lee@smalltalk.agency>
* Content updates
Signed-off-by: Lee Springer <lee@smalltalk.agency>
`velero snapshot-location create` existed, but not `velero create
snapshot-location`; update subcommand for parity with backup-location
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
When using `velero install`, the image used should be a reasonable
default, even if buildinfo.Version is missing (such as when using `go
build` directly).
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
Velero should handle cases when the label length exceeds 63 characters.
- if the length of the backup/restore name is <= 63 characters, use it as the value of the label
- if it's > 63 characters, take the SHA256 hash of the name. The value of
the label will be the first 57 characters of the backup/restore name
plus the first six characters of the SHA256 hash.
Fixes heptio#1021
Signed-off-by: Anshul Chandra <anshulc@vmware.com>
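A sketch of the rule just described; the 63-character cap is the Kubernetes label value limit, and the helper name is illustrative.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// labelValue returns a label-safe value for a backup/restore name: the name itself
// if it fits, otherwise the first 57 characters plus 6 hex characters of its SHA256.
func labelValue(name string) string {
	if len(name) <= 63 {
		return name
	}
	sum := sha256.Sum256([]byte(name))
	return name[:57] + fmt.Sprintf("%x", sum)[:6] // 57 + 6 = 63 characters
}

func main() {
	fmt.Println(labelValue("a-very-long-backup-name-that-exceeds-the-kubernetes-label-value-limit-of-63-characters"))
}
```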
Using only the `component: velero` selector caused the restic pods to be
matched, as well, which made things such as `kubectl logs` return too
much information.
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
Prior to 1.0, user-provided values such as cloud provider names didn't
need to be namespaced. This change allows those values to continue
working without the user editing the fields manually.
Fixes #1363
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
* Adds support for allowing a RestoreItemAction to skip item restore
This allows a RestoreItemAction plugin to signal to velero that
the returned item should be skipped rather than restored to the
cluster.
To support this, a boolean SkipRestore attribute is added to
RestoreItemActionExecuteOutput. If restore.restoreResource finds
this set to true, any remaining actions on the item are skipped
and the item itself is not restored. Execution continues with
the next item of this resource type.
To signal this for a particular item, the RestoreItemAction's
Execute method should call WithoutRestore() on the
RestoreItemActionExecuteOutput before returning it.
Signed-off-by: Scott Seago <sseago@redhat.com>
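A hedged example of a RestoreItemAction that uses this mechanism; the constructor and method names follow the plugin API described in this commit, while MyRestoreAction and shouldSkip are hypothetical.

```go
// The velero package here is Velero's plugin framework package; shouldSkip is a
// hypothetical predicate deciding which items to skip.
func (a *MyRestoreAction) Execute(input *velero.RestoreItemActionExecuteInput) (*velero.RestoreItemActionExecuteOutput, error) {
	if shouldSkip(input.Item) {
		// WithoutRestore signals Velero to skip restoring this item to the cluster.
		return velero.NewRestoreItemActionExecuteOutput(input.Item).WithoutRestore(), nil
	}
	return velero.NewRestoreItemActionExecuteOutput(input.Item), nil
}
```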
* Autogenerated code to support SkipRestore
Signed-off-by: Scott Seago <sseago@redhat.com>
* Added changelog for #1336
Signed-off-by: Scott Seago <sseago@redhat.com>
Original error was:
```
ark -n <redacted> restore logs <redacted>
time="2019-03-06T18:31:06Z" level=info msg="Not including resource" groupResource=nodes logSource="pkg/restore/restore.go:124"
time="2019-03-06T18:31:06Z" level=info msg="Not including resource" groupResource=events logSource="pkg/restore/restore.go:124"
time="2019-03-06T18:31:06Z" level=info msg="Not including resource" groupResource=events.events.k8s.io logSource="pkg/restore/restore.go:124"
time="2019-03-06T18:31:06Z" level=info msg="Not including resource" groupResource=backups.ark.heptio.com logSource="pkg/restore/restore.go:124"
time="2019-03-06T18:31:06Z" level=info msg="Not including resource" groupResource=restores.ark.heptio.com logSource="pkg/restore/restore.go:124"
time="2019-03-06T18:31:06Z" level=info msg="Starting restore of backup backup/<redacted>" logSource="pkg/restore/restore.go:342"
time="2019-03-06T18:31:06Z" level=info msg="error unzipping and extracting: open /tmp/604421455/resources/rolebindings.rbac.authorization.k8s.io/namespaces/<redacted>/<redacted>: too many open files" logSource="pkg/restore/restore.go:346"
```
Downloading the directory from S3 and untarring it, I found 1036 files; `ulimit -n` reports 1024. This is our team's best guess at a root cause. But the code fixed in this PR is definitely holding all the files open until the method returns: https://blog.learngoprogramming.com/gotchas-of-defer-in-go-1-8d070894cb01
Please note my Go abilities are not great and I did not test this; I just edited the file in GitHub. All I did was remove the defer and call the file's Close after the copy is done. Theoretically this should only hold one file open at a time now. Let me know if you want me to do any further steps.
Thank you,
-Asaf
Signed-off-by: Asaf Erlich <aerlich@groupon.com>
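A minimal illustration (not the actual Velero code) of the fix: closing each file right after it is processed rather than deferring all closes to function return.

```go
package main

import (
	"io"
	"io/ioutil"
	"log"
	"os"
)

// processAll opens each path, hands it to handle, and closes it immediately,
// so at most one file descriptor is held at a time.
func processAll(paths []string, handle func(io.Reader) error) error {
	for _, p := range paths {
		f, err := os.Open(p)
		if err != nil {
			return err
		}
		herr := handle(f)
		f.Close() // close now, not via defer at function return
		if herr != nil {
			return herr
		}
	}
	return nil
}

func main() {
	err := processAll([]string{"/etc/hostname"}, func(r io.Reader) error {
		_, err := io.Copy(ioutil.Discard, r)
		return err
	})
	if err != nil {
		log.Fatal(err)
	}
}
```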
* Move Phase to right under Metadata(name/namespace/label/annotations)
* Move Validation errors: section right after Phase: section and only
show it if the item has a phase of FailedValidation
* For restores move Warnings and Errors under Validation errors. Do not
show Warnings or Errors if there are none.
Signed-off-by: DheerajSShetty <dheerajs@vmware.com>
Fixes #987
To create a regional disk, the URLs for the zones in which the disk is
replicated must be provided to the GCP API.
Fixes #1183
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
Due to the assumption of building from a git repository inherent in the
Makefile targets, update the documentation to limit source code archive
builds to the `go build` commands.
Note that it may be possible to update hack/build.sh to cope with
building outside a git repository, but there are a number of questions
to answer regarding how to populate the buildinfo info, and that is a
larger discussion than ensuring we can build the binary from the sources
in the source code archive.
Signed-off-by: Darren Hart <dvhart@vmware.com>
Update the documentation to include minimal instructions for building
the velero binary from the GitHub Release "Source code" archive.
Correct the Download link in the Table of Contents for
build-from-scratch.md.
Signed-off-by: Darren Hart <dvhart@vmware.com>
If a PV already exists, wait for it, its associated PVC, and associated
namespace to be deleted before attempting to restore it.
If a namespace already exists, wait for it to be deleted before
attempting to restore it.
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
The binary, deployment, and namespace have yet to change for most users,
so make sure the bug template has the correct information when filing.
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
Some AWS S3 implementations, for example the Quobyte object storage, do not
support the v4 signing algorithm, only v1.
This makes it possible to configure the signatureVersion.
The algorithm implementation was ported from d6c1be296e/botocore/auth.py (L860-L862)
which is used by the aws CLI client.
This fixes https://github.com/heptio/ark/issues/811.
Signed-off-by: Bastian Hofmann <bashofmann@gmail.com>
Instead of only removing the default token from a service account when
it already exists in the cluster, always remove it. If the service
account already exists, continue to do the merging logic.
Signed-off-by: Andy Goldstein <andy.goldstein@gmail.com>
When backups are run manually (outside of a schedule) the metrics
will be counted for ark_*{schedule=""}. To prevent partial NaN
metrics they will be initialised on server init.
Signed-off-by: Christian Beneke <c.beneke@syseleven.de>
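A sketch of pre-initializing the schedule="" series with the Prometheus client so manually triggered backups never leave partial/NaN metrics; the metric name is illustrative.

```go
package main

import "github.com/prometheus/client_golang/prometheus"

var backupAttemptTotal = prometheus.NewCounterVec(
	prometheus.CounterOpts{
		Name: "ark_backup_attempt_total",
		Help: "Total number of attempted backups",
	},
	[]string{"schedule"},
)

func initMetrics() {
	prometheus.MustRegister(backupAttemptTotal)
	// Touch the empty-schedule label so the series exists from server start.
	backupAttemptTotal.WithLabelValues("").Add(0)
}

func main() {
	initMetrics()
}
```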
This allows the Ark server to use one URL for the majority of
communications with S3 (or compatible) object storage, and a different
URL base for pre-signed URLs (for streaming logs, etc. to clients).
Signed-off-by: Andy Goldstein <andy.goldstein@gmail.com>
We were checking for nil, but were getting back an empty
*unstructured.Unstructured{} instead, along with a NotFound error.
Change the logic to check for the NotFound error instead of a nil
object.
Signed-off-by: Andy Goldstein <andy.goldstein@gmail.com>
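A hedged sketch of the pattern: decide existence from apierrors.IsNotFound rather than from a nil object, since the client can return a non-nil empty value alongside the error. The getter is abstracted here.

```go
package main

import (
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
)

// exists reports whether the object is present, treating a NotFound error as
// "absent" even if the returned object is a non-nil empty value.
func exists(get func() (*unstructured.Unstructured, error)) (bool, error) {
	_, err := get()
	if apierrors.IsNotFound(err) {
		return false, nil
	}
	if err != nil {
		return false, err
	}
	return true, nil
}

func main() {}
```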
When a backup is deleted, the delete method uses ListObjects to get a list of
files it needs to delete in s3. Different s3 implementations may return
the object lists in different, even non-deterministic orders, which can
result in the deletion not working because ark tries to delete a non-empty folder
before it tries to delete the files in the folder.
Signed-off-by: Bastian Hofmann <bashofmann@gmail.com>
rename variables to reflect the metric name
fix comments for exported methods
explicitly record per schedule per schedule metric values
initialize metrics and change variable name to match with that of metric
add metric for recording failed volume snapshots
use singular variable instead of plural
remove extra field for failed snapshots, calculate using existing fields
initialize failure metric and rename methods
Signed-off-by: Shubheksha Jalan <jshubheksha@gmail.com>
- Pin to a specific revision of goimports
- Use -local flag with goimports to keep ark imports separated
- Correct shellcheck errors
Signed-off-by: Andy Goldstein <andy.goldstein@gmail.com>
refactor and move DeleteOptions struct and methods
unexport fields not used outside the package in DeleteOptions struct
refactor BindFlags() to work with name of command
fix constructor
Signed-off-by: Shubheksha Jalan <jshubheksha@gmail.com>
The support-matrix.md has been moved to the docs folder, hence the link in the readme didn't work anymore.
Signed-off-by: Oliver Hummel <oliver.hummel@sap.com>
A blank line is necessary before starting a table with Jekyll. Fix it
here so when new versions are cut they render to HTML correctly.
Signed-off-by: Nolan Brubaker <nolan@heptio.com>
refactor util methods into a common package
move utility methods into cli package instead of common
rename util.go to common.go
Signed-off-by: Shubheksha Jalan <jshubheksha@gmail.com>
Due to the version of docker used to build images, the Dockerfile used
with -f must be in the same directory that's used for the context. Copy
the Dockerfile into the _output directory and make the custom targets
more closely match the standard ones.
Fixes #833
Signed-off-by: Nolan Brubaker <nolan@heptio.com>
Signed-off-by: Lukasz Jakimczuk <ljakimczuk@gmail.com>
Moving check for environment variable outside the loop
Signed-off-by: Lukasz Jakimczuk <ljakimczuk@gmail.com>
Insert a note about AWS_CLUSTER_NAME in the aws-config doc
Signed-off-by: Lukasz Jakimczuk <ljakimczuk@gmail.com>
Improving implementation and documentation
Signed-off-by: Lukasz Jakimczuk <ljakimczuk@gmail.com>
Changing instructions, adding unit test for getTagsForCluster and removing duplicated Lookup
Signed-off-by: Lukasz Jakimczuk <ljakimczuk@gmail.com>
Commit after update
Signed-off-by: Lukasz Jakimczuk <ljakimczuk@gmail.com>
Correcting bad formatting in aws-config.md
Signed-off-by: Lukasz Jakimczuk <ljakimczuk@gmail.com>
This commit only provides the data model for further work. It does not
implement any logic around locations, nor does it remove anything from
the Config API type.
Closes #736, closes #732
Signed-off-by: Nolan Brubaker <nolan@heptio.com>
This is to better reflect the intent of the user when node ports are
specified explicitly (as opposed to being assigned by Kubernetes). The
`last-applied-configuration` annotation added by `kubectl apply` is one
such indicator we are now leveraging.
We still default to omitting the node ports when the annotation is
missing.
Signed-off-by: Timo Reimann <ttr314@googlemail.com>
Refactor plugin management:
- support multiple plugins per executable
- support restarting a plugin process in the event it terminates
- simplify plugin lifecycle management by using separate managers for
each scope (server vs backup vs restore)
Signed-off-by: Andy Goldstein <andy.goldstein@gmail.com>
These commands are intended to be general, and give users some context
for where they can look to get more information about Ark's behaviors.
Signed-off-by: Nolan Brubaker <nolan@heptio.com>
- ScheduleName is added as an API field to the Restore object
- Restore controller validates that exactly one of BackupName
or ScheduleName has been provided
- If ScheduleName is provided, Restore controller populates
BackupName with the name of the most recent successful backup
created from the schedule
- --from-schedule flag is added to `ark restore create` CLI cmd
Signed-off-by: Steve Kriss <steve@heptio.com>
Add backups and restores to the list of non-restorable resources. Backups,
if applicable, are synced from object storage by the backup sync
controller. Restores are specific to a cluster and don't have value
moving across clusters.
Signed-off-by: Andy Goldstein <andy.goldstein@gmail.com>
Mirror pods are pods created from static manifest files on a node.
They're mirrored to the apiserver so they're visible when querying the
apiserver for a list of pods, but it's not possible to send a pod
containing the mirror pod annotation to the apiserver and have it be
created successfully. Instead of trying to do this, log a message that
we're skipping restoring the pod because it's a mirror pod.
Signed-off-by: Andy Goldstein <andy.goldstein@gmail.com>
If a PV has a reclaim policy of Delete and we didn't create a snapshot
of it, don't restore the PV, as doing so would create a PV whose
underlying volume is incorrect.
Also "reset" any PVCs bound to the PV so they'll be dynamically
provisioned when restored.
Signed-off-by: Andy Goldstein <andy.goldstein@gmail.com>
All properties from a backup will be merged into the ServiceAccount
except for the default token secret.
Signed-off-by: Nolan Brubaker <nolan@heptio.com>
* add a metrics package to handle metric registration and publishing
* add a metricsAddress field to the server struct
* make metrics a part of the server
* start a metrics endpoint as part of starting the controllers
* instrument backup_controller to report metrics
* update cli-reference docs
* update example deployments with prometheus annotations
* update 'pkg/install' tooling with prometheus annotations
Signed-off-by: Ashish Amarnath <ashish.amarnath@gmail.com>
Handle the case where a BackupItemAction may return nil for updatedItem,
meaning "no modifications to the item". The backupPVAction does this,
and we were panicking instead of accepting it.
Signed-off-by: Andy Goldstein <andy.goldstein@gmail.com>
Writing to GCP's object store is an async operation, so errors need to
be checked both on write and close calls, since errors like permission
violations aren't reported until a close.
Signed-off-by: Nolan Brubaker <nolan@heptio.com>
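An illustrative sketch with the GCS Go client: errors such as permission denials may only surface when the writer is closed, so both the copy error and the Close error must be checked. Bucket and key names are illustrative.

```go
package main

import (
	"context"
	"io"
	"log"
	"strings"

	"cloud.google.com/go/storage"
)

// upload streams body to an object; Close finalizes the upload, and a permissions
// error may not appear until then.
func upload(ctx context.Context, bucket *storage.BucketHandle, key string, body io.Reader) error {
	w := bucket.Object(key).NewWriter(ctx)
	if _, err := io.Copy(w, body); err != nil {
		w.Close() // best effort; report the copy error
		return err
	}
	return w.Close()
}

func main() {
	ctx := context.Background()
	client, err := storage.NewClient(ctx)
	if err != nil {
		log.Fatal(err)
	}
	if err := upload(ctx, client.Bucket("my-bucket"), "backups/test", strings.NewReader("hello")); err != nil {
		log.Fatal(err)
	}
}
```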
This will help users use the `--selector` flag to selectively exclude objects from being backed up by ark.
This is a workaround for #404 until dedicated flags are implemented.
Signed-off-by: Dhananjay Sathe <dhanajaysathe@gmail.com>
Completed jobs and pods may be useful in the backup for auditing
purposes, but don't recreate them when restoring.
Signed-off-by: Nolan Brubaker <nolan@heptio.com>
These internal `gcr.io/heptio-images/golang` images are deprecated. It looks like `git` and `bash` are the only things the Ark build needed that aren't in the upstream `golang:1.9-alpine3.6` image.
Signed-off-by: Matt Moyer <moyer@heptio.com>
Update k8s.io/api to v1.10.0
Update k8s.io/apimachinery to v1.10.0
Update k8s.io/client-go to v7.0
Update k8s.io/kubernetes to v1.10
Signed-off-by: Steve Kriss <steve@heptio.com>
Below is a list of adopters of Velero in **production environments** that have
publicly shared the details of how they use it.
**[BitGo][20]**
BitGo uses Velero's backup and restore capabilities to seamlessly provision and scale fullnode statefulsets on the fly, as well as serving as an integral piece of our Kubernetes disaster-recovery story.
**[Bugsnag][30]**
We use Velero for managing backups of an internal instance of our on-premise clustered solution. We also recommend our users of [on-premise Bugsnag installations][31] use Velero for [managing their own backups][32].
**[Banzai Cloud][60]**
[Banzai Cloud Pipeline][61] is a Kubernetes-based microservices platform that integrates services needed for Day-1 and Day-2 operations along with first-class support both for on-prem and hybrid multi-cloud deployments. We use Velero to periodically [backup and restore these clusters in case of disasters][62].
## Solutions built with Velero
Below is a list of solutions where Velero is being used as a component.
**[Nirmata][10]**
We have integrated our [solution with Velero][11] to provide our customers with out-of-the-box backup/DR.
**[Kyma][40]**
Kyma [integrates with Velero][41] to effortlessly back up and restore Kyma clusters with all their resources. Velero's capabilities allow Kyma users to define and run manual and scheduled backups in order to successfully handle a disaster-recovery scenario.
**[Red Hat][50]**
Red Hat has developed the [Cluster Application Migration Tool][51] which uses [Velero and Restic][52] to drive the migration of applications between OpenShift clusters.
**[Dell EMC][70]**
For Kubernetes environments, [PowerProtect Data Manager][71] leverages the Container Storage Interface (CSI) framework to take snapshots to back up the persistent data or the data that the application creates, e.g. databases. [Dell EMC leverages Velero][72] to back up the namespace configuration files (also known as namespace metadata) for enterprise-grade data protection.
**[SIGHUP][80]**
SIGHUP integrates Velero in its [Fury Kubernetes Distribution][81] providing predefined schedules and configurations to ensure an optimized disaster recovery experience.
[Fury Kubernetes Disaster Recovery Module][82] is ready to be deployed into any Kubernetes cluster running anywhere.
**[MayaData][90]**
MayaData is a large user of Velero as well as a contributor. MayaData offers a Data Agility platform called [OpenEBS Director][91], which helps customers confidently and easily manage stateful workloads in Kubernetes. Velero is one of the core software building blocks of OpenEBS Director's [DMaaS or data migration as a service offering][92], used to enable data protection strategies.
## Adding your organization to the list of Velero Adopters
If you are using Velero and would like to be included in the list of `Velero Adopters`, add an SVG version of your logo to the `site/img/adopters` directory in this repo and submit a [pull request][3] with your change. Name the image file something that reflects your company (e.g., if your company is called Acme, name the image acme.png). See this for an example [PR][4].
### Adding a logo to velero.io
If you would like to add your logo to a future `Adopters of Velero` section on [velero.io][2], follow the steps above to add your organization to the list of Velero Adopters. Our community will follow up and publish it to the [velero.io][2] website.
* Backup deletion has been completely revamped to make it simpler and less error-prone. As a user, you still use the `ark backup delete` command to request deletion of a backup and its associated cloud
resources; behind the scenes, we've switched to using a new `DeleteBackupRequest` Custom Resource and associated controller for processing deletion requests.
* We've reduced the number of required fields in the Ark config. For Azure, `location` is no longer required, and for GCP, `project` is not needed.
* Ark now copies tags from volumes to snapshots during backup, and from snapshots to new volumes during restore.
##### Breaking Changes:
* Ark has moved back to a single namespace (`heptio-ark` by default) as part of #383.
##### All New Features:
* Add global `--kubecontext` flag to Ark CLI (#296, @blakebarnett)
* Azure: support cross-resource group restores of volumes (#356, #378, @skriss)
* AWS/Azure/GCP: copy tags from volumes to snapshots, and from snapshots to volumes (#341, @skriss)
* Replace finalizer for backup deletion with `DeleteBackupRequest` custom resource & controller (#383, #431, @ncdc, @nrb)
* Don't log warnings during restore if an identical object already exists in the cluster (#405, @nrb)
* Add bash & zsh completion support (#384, @containscafeine)
##### Bug Fixes / Other Changes:
* Error from the Ark CLI if attempting to restore a non-existent backup (#302, @ncdc)
* Enable running the Ark server locally for development purposes (#334, @ncdc)
* Add examples to `ark schedule create` documentation (#331, @lypht)
* GCP: Remove `project` requirement from Ark config (#345, @skriss)
* Add `--from-backup` flag to `ark restore create` and allow custom restore names (#342, #409, @skriss)
* Azure: remove `location` requirement from Ark config (#344, @skriss)
* Add documentation/examples for storing backups in IBM Cloud Object Storage (#321, @roytman)
* Reduce verbosity of hooks logging (#362, @skriss)
* AWS: Add minimal IAM policy to documentation (#363, #419, @hopkinsth)
* Don't restore events (#374, @sanketjpatel)
* Azure: reduce API polling interval from 60s to 5s (#359, @skriss)
* Switch from hostPath to emptyDir volume type for minio example (#386, @containscafeine)
* Add limit ranges as a prioritized resource for restores (#392, @containscafeine)
* AWS: Add documentation on using Ark with kube2iam (#402, @domderen)
* Azure: add node selector so Ark pod is scheduled on a linux node (#415, @ffd2subroutine)
* Error from the Ark CLI if attempting to get logs for a non-existent restore (#391, @containscafeine)
* GCP: Add minimal IAM policy to documentation (#429, @skriss, @jody-frankowski)
##### Upgrading from v0.7.1:
Ark v0.7.1 moved the Ark server deployment into a separate namespace, `heptio-ark-server`. As of v0.8.0 we've
returned to a single namespace, `heptio-ark`, for all Ark-related resources. If you're currently running v0.7.1,
here are the steps you can take to upgrade:
1. Execute the steps from the **Credentials and configuration** section for your cloud:
* **Plugins** - We now support user-defined plugins that can extend Ark functionality to meet your custom backup/restore needs without needing to be compiled into the core binary. We support pluggable block and object stores as well as per-item backup and restore actions that can execute arbitrary logic, including modifying the items being backed up or restored. For more information see the [documentation](docs/plugins.md), which includes a reference to a fully-functional sample plugin repository. (#174 #188 #206 #213 #215 #217 #223 #226)
* **Describers** - The Ark CLI now includes `describe` commands for `backups`, `restores`, and `schedules` that provide human-friendly representations of the relevant API objects.
Breaking Changes:
* The config object format has changed. In order to upgrade to v0.6.0, the config object will have to be updated to match the new format. See the [examples](examples) and [documentation](docs/config-definition.md) for more information.
* The restore object format has changed. The `warnings` and `errors` fields are now ints containing the counts, while full warnings and errors are now stored in the object store instead of etcd. Restore objects created prior to v0.6.0 should be deleted, or a new bucket used, and the old restore objects deleted from Kubernetes (`kubectl -n heptio-ark delete restore --all`).
* The backup tar file format has changed. Backups created using previous versions of Ark cannot be restored using v0.5.0.
* When backing up one or more specific namespaces, cluster-scoped resources are no longer backed up by default, with the exception of PVs that are used within the target namespace(s). Cluster-scoped resources can still be included by explicitly specifying `--include-cluster-resources`.
New features:
* Add customized user-agent string for Ark CLI
* Switch from glog to logrus
* Exclude nodes from restoration
* Add a FAQ
* Record PV availability zone and use it when restoring volumes from snapshots
* Back up the PV associated with a PVC
* Add `--include-cluster-resources` flag to `ark backup create`
* Add `--include-cluster-resources` flag to `ark restore create`
* Properly support resource restore priorities across cluster-scoped and namespace-scoped resources
We as members, contributors, and leaders pledge to make participation in our community a harassment-free experience for everyone, regardless of age, body size, visible or invisible disability, ethnicity, sex characteristics, gender identity and expression, level of experience, education, socio-economic status, nationality, personal appearance, race, religion, or sexual identity and orientation.
We pledge to act and interact in ways that contribute to an open, welcoming, diverse, inclusive, and healthy community.
## Our Standards
Examples of behavior that contributes to a positive environment for our community include:
* Demonstrating empathy and kindness toward other people
* Being respectful of differing opinions, viewpoints, and experiences
* Giving and gracefully accepting constructive feedback
* Accepting responsibility and apologizing to those affected by our mistakes, and learning from the experience
* Focusing on what is best not just for us as individuals, but for the overall community
Examples of unacceptable behavior include:
* The use of sexualized language or imagery, and sexual attention or
advances of any kind
* Trolling, insulting or derogatory comments, and personal or political attacks
* Public or private harassment
* Publishing others' private information, such as a physical or email
address, without their explicit permission
* Other conduct which could reasonably be considered inappropriate in a
professional setting
## Enforcement Responsibilities
Community leaders are responsible for clarifying and enforcing our standards of acceptable behavior and will take appropriate and fair corrective action in response to any behavior that they deem inappropriate, threatening, offensive, or harmful.
Community leaders have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct, and will communicate reasons for moderation decisions when appropriate.
## Scope
This Code of Conduct applies within all community spaces, and also applies when an individual is officially representing the community in public spaces. Examples of representing our community include using an official e-mail address, posting via an official social media account, or acting as an appointed representative at an online or offline event.
## Enforcement
Instances of abusive, harassing, or otherwise unacceptable behavior may be reported to the community leaders responsible for enforcement at [oss-coc@vmware.com](mailto:oss-coc@vmware.com). All complaints will be reviewed and investigated promptly and fairly.
All community leaders are obligated to respect the privacy and security of the reporter of any incident.
## Enforcement Guidelines
Community leaders will follow these Community Impact Guidelines in determining the consequences for any action they deem in violation of this Code of Conduct:
### 1. Correction
**Community Impact**: Use of inappropriate language or other behavior deemed unprofessional or unwelcome in the community.
**Consequence**: A private, written warning from community leaders, providing clarity around the nature of the violation and an explanation of why the behavior was inappropriate. A public apology may be requested.
### 2. Warning
**Community Impact**: A violation through a single incident or series of actions.
**Consequence**: A warning with consequences for continued behavior. No interaction with the people involved, including unsolicited interaction with those enforcing the Code of Conduct, for a specified period of time. This includes avoiding interactions in community spaces as well as external channels like social media. Violating these terms may lead to a temporary or permanent ban.
### 3. Temporary Ban
**Community Impact**: A serious violation of community standards, including sustained inappropriate behavior.
**Consequence**: A temporary ban from any sort of interaction or public communication with the community for a specified period of time. No public or private interaction with the people involved, including unsolicited interaction with those enforcing the Code of Conduct, is allowed during this period. Violating these terms may lead to a permanent ban.
### 4. Permanent Ban
**Community Impact**: Demonstrating a pattern of violation of community standards, including sustained inappropriate behavior, harassment of an individual, or aggression toward or disparagement of classes of individuals.
**Consequence**: A permanent ban from any sort of public interaction within the community.
## Attribution
This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 2.0,
available at https://www.contributor-covenant.org/version/2/0/code_of_conduct.html.
Community Impact Guidelines were inspired by [Mozilla's code of conduct enforcement ladder](https://github.com/mozilla/diversity).
[homepage]: https://www.contributor-covenant.org
For answers to common questions about this code of conduct, see the FAQ at
https://www.contributor-covenant.org/faq. Translations are available at https://www.contributor-covenant.org/translations.
All authors to the project retain copyright to their work. However, to ensure
that they are only submitting work that they have rights to, we are requiring
everyone to acknowledge this by signing their work.
Any copyright notices in this repo should specify the authors as "the Heptio Ark project contributors".
To sign your work, just add a line like this at the end of your commit message:
```
Signed-off-by: Joe Beda <joe@heptio.com>
```
This can easily be done with the `--signoff` option to `git commit`.
By doing this you state that you can certify the following (from https://developercertificate.org/):
```
Developer Certificate of Origin
Version 1.1
Copyright (C) 2004, 2006 The Linux Foundation and its contributors.
1 Letterman Drive
Suite D4700
San Francisco, CA, 94129
Everyone is permitted to copy and distribute verbatim copies of this
license document, but changing it is not allowed.
Developer's Certificate of Origin 1.1
By making a contribution to this project, I certify that:
(a) The contribution was created in whole or in part by me and I
have the right to submit it under the open source license
indicated in the file; or
(b) The contribution is based upon previous work that, to the best
of my knowledge, is covered under an appropriate open source
license and I have the right under that license to submit that
work with modifications, whether created in whole or in part
by me, under the same open source license (unless I am
permitted to submit under a different license), as indicated
in the file; or
(c) The contribution was provided directly to me by some other
person who certified (a), (b) or (c) and I have not modified
it.
(d) I understand and agree that this project and the contribution
are public and that a record of the contribution (including all
personal information I submit with it, including my sign-off) is
maintained indefinitely and may be redistributed consistent with
this project or the open source license(s) involved.
```
Authors are expected to follow some guidelines when submitting PRs. Please see [our documentation](https://velero.io/docs/master/code-standards/) for details.
Velero (formerly Heptio Ark) gives you tools to back up and restore your Kubernetes cluster resources and persistent volumes. You can run Velero with a cloud provider or on-premises. Velero lets you:
* Take backups of your cluster and restore in case of loss.
* Copy cluster resources across cloud providers. NOTE: Cloud volume migrations are not yet supported.
* Replicate your production environment for development and testing environments.
* Migrate cluster resources to other clusters.
* Replicate your production cluster to development and testing clusters.
Velero consists of:
* A server that runs on your cluster
* A command-line client that runs locally
## Documentation
[The documentation][29] provides a getting started guide and information about building from source, architecture, extending Velero, and more.
## Getting started
The following example sets up the Ark server and client, then backs up and restores a sample application.
For simplicity, the example uses Minio, an S3-compatible storage service that runs locally on your cluster. See [Set up Ark with your cloud provider][3] for how to run on a cloud provider.
### Prerequisites
* Access to a Kubernetes cluster, version 1.7 or later. Version 1.7.5 or later is required to run `ark backup delete`.
* A DNS server on the cluster
* `kubectl` installed
### Download
Clone or fork the Ark repository:
```
git clone git@github.com:heptio/ark.git
```
NOTE: Make sure to check out the appropriate version. We recommend that you check out the latest tagged version. The master branch is under active development and might not be stable.
### Set up server
1. Start the server and the local storage service. In the root directory of Ark, run:
```bash
kubectl apply -f examples/common/00-prereqs.yaml
kubectl apply -f examples/minio/
```
NOTE: If you get an error about Config creation, wait for a minute, then run the commands again.
1. Deploy the example nginx application:
```bash
kubectl apply -f examples/nginx-app/base.yaml
```
1. Check to see that both the Ark and nginx deployments are successfully created:
```
kubectl get deployments -l component=ark --namespace=heptio-ark
kubectl get deployments --namespace=nginx-example
```
### Install client
For this example, we recommend that you [download a pre-built release][26].
You can also [build from source][7].
Make sure that you install the `ark` binary somewhere in your `$PATH`.
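For example, on Linux (the archive name is illustrative; use the filename of the release you actually downloaded):
```
tar -xzf ark-v0.10.0-linux-amd64.tar.gz
sudo mv ark /usr/local/bin/
```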
### Back up
1. Create a backup for any object that matches the `app=nginx` label selector:
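For example, using the backup name that the restore steps below refer to:
```
ark backup create nginx-backup --selector app=nginx
```
1. Simulate a disaster by deleting the `nginx-example` namespace:
```
kubectl delete namespace nginx-example
```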
1. To check that the nginx deployment and service are gone, run:
```
kubectl get deployments --namespace=nginx-example
kubectl get services --namespace=nginx-example
kubectl get namespace/nginx-example
```
You should get no results.
NOTE: You might need to wait for a few minutes for the namespace to be fully cleaned up.
### Restore
1. Run:
```
ark restore create --from-backup nginx-backup
```
1. Run:
```
ark restore get
```
After the restore finishes, the output looks like the following:
```
NAME BACKUP STATUS WARNINGS ERRORS CREATED SELECTOR
nginx-backup-20170727200524 nginx-backup Completed 0 0 2017-07-27 20:05:24 +0000 UTC <none>
```
NOTE: The restore can take a few moments to finish. During this time, the `STATUS` column reads `InProgress`.
After a successful restore, the `STATUS` column is `Completed`, and `WARNINGS` and `ERRORS` are 0. All objects in the `nginx-example` namespace should be just as they were before you deleted them.
If there are errors or warnings, you can look at them in detail:
```
ark restore describe <RESTORE_NAME>
```
For more information, see [the debugging information][18].
### Clean up
If you want to delete any backups you created, including data in object storage and persistent
volume snapshots, you can run:
```
ark backup delete BACKUP_NAME
```
This asks the Ark server to delete all backup data associated with `BACKUP_NAME`. You need to do
this for each backup you want to permanently delete. A future version of Ark will allow you to
delete multiple backups by name or label selector.
Once fully removed, the backup is no longer visible when you run:
```
ark backup get BACKUP_NAME
```
If you want to uninstall Ark but preserve the backup data in object storage and persistent volume
snapshots, it is safe to remove the `heptio-ark` namespace and everything else created for this
example:
```
kubectl delete -f examples/common/
kubectl delete -f examples/minio/
kubectl delete -f examples/nginx-app/base.yaml
```
Please use the version selector at the top of the site to ensure you are using the appropriate documentation for your version of Velero.
## Troubleshooting
If you encounter issues, review the [troubleshooting docs][30], [file an issue][4], or talk to us on the [#velero channel][25] on the Kubernetes Slack server.
## Contributing
Thanks for taking the time to join our community and start contributing!
Feedback and discussion are available on [the mailing list][24].
#### Before you start
* Please familiarize yourself with the [Code of Conduct][8] before contributing.
* See [CONTRIBUTING.md][5] for instructions on the developer certificate of origin that we require.
#### Pull requests
* We welcome pull requests. Feel free to dig through the [issues][4] and jump in.
If you are ready to jump in and test, add code, or help with documentation, follow the instructions on our [Start contributing][31] documentation for guidance on how to set up Velero for development.
## Changelog
See [the list of releases][6] to find out about feature changes.
- We've introduced two new custom resource definitions, `BackupStorageLocation` and `VolumeSnapshotLocation`, that replace the `Config` CRD from
previous versions. As part of this, you may now configure more than one possible location for where backups and snapshots are stored, and when you
create a `Backup` you can select the location where you'd like that particular backup to be stored. See the [Locations documentation][2] for an overview
of this feature.
- Ark's plugin system has been significantly refactored to improve robustness and ease of development. Plugin processes are now automatically restarted
if they unexpectedly terminate. Additionally, plugin binaries can now contain more than one plugin implementation (e.g. an object store *and* a block store,
or many backup item actions).
- The sync process, which ensures that Backup custom resources exist for each backup in object storage, has been revamped to run much more frequently (once
per minute rather than once per hour), to use significantly fewer cloud provider API calls, and to not generate spurious Kubernetes API errors.
- Ark can now be configured to store all data under a prefix within an object storage bucket. This means that you no longer need a separate bucket per Ark
instance; you can now have all of your clusters' Ark backups go into a single bucket, with each cluster having its own prefix/subdirectory
within that bucket.
- Restic backup data is now automatically stored within the same bucket/prefix as the rest of the Ark data. A separate bucket is no longer required (or allowed).
- Ark resources (backups, restores, schedules) can now be bulk-deleted through the `ark` CLI, using the `--all` or `--selector` flags, or by specifying
multiple resource names as arguments to the `delete` commands.
- The `ark` CLI now supports waiting for backups and restores to complete with the `--wait` flag for `ark backup create` and `ark restore create`.
- Restores can be created directly from the most recent backup for a schedule, using `ark restore create --from-schedule SCHEDULE_NAME` (see the example below).
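For illustration, here are both flags in use (the backup, selector, and schedule names are placeholders):
```
ark backup create nginx-backup --selector app=nginx --wait
ark restore create --from-schedule daily-backup
```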
### Breaking Changes
Heptio Ark v0.10 contains a number of breaking changes. Upgrading will require some additional steps beyond just updating your client binary and your
container image tag. We've provided a [detailed set of instructions][1] to help you with the upgrade process. **Please read and follow these instructions
carefully to ensure a successful upgrade!**
- The `Config` CRD has been replaced by `BackupStorageLocation` and `VolumeSnapshotLocation` CRDs.
- The interface for external plugins (object/block stores, backup/restore item actions) has changed. If you have authored any custom plugins, they'll
need to be updated for v0.10.
- The [`ObjectStore.ListCommonPrefixes`](https://github.com/heptio/ark/blob/master/pkg/cloudprovider/object_store.go#L50) signature has changed to add a `prefix` parameter.
- Registering plugins has changed. Create a new plugin server with the `NewServer` function, and register plugins with the appropriate functions. See the [`Server`](https://github.com/heptio/ark/blob/master/pkg/plugin/server.go#L37) interface for details.
- The organization of Ark data in object storage has changed. Existing data will need to be moved around to conform to the new layout.
### All Changes
- [b9de44ff](https://github.com/heptio/ark/commit/b9de44ff) update docs to reference config/ dir within release tarballs
- [eace0255](https://github.com/heptio/ark/commit/eace0255) goreleaser: update example image tags to match version being released
- [cff02159](https://github.com/heptio/ark/commit/cff02159) add rbac content, rework get-started for NodePort and publicUrl, add versioning information
- [fa14255e](https://github.com/heptio/ark/commit/fa14255e) add content for docs issue 819
- [06d6665a](https://github.com/heptio/ark/commit/06d6665a) check s3URL scheme upon AWS ObjectStore Init()
- [cc359f6e](https://github.com/heptio/ark/commit/cc359f6e) Add contributor docs for our ZenHub usage
- [f6204562](https://github.com/heptio/ark/commit/f6204562) cleanup service account action log statement
- [450fa72f](https://github.com/heptio/ark/commit/450fa72f) Initialize schedule Prometheus metrics to have them created beforehand (see https://prometheus.io/docs/practices/instrumentation/#avoid-missing-metrics)
- [39c4267a](https://github.com/heptio/ark/commit/39c4267a) Clarify that object storage should be per-cluster
- [78cbdf95](https://github.com/heptio/ark/commit/78cbdf95) delete old deletion requests for backup when processing a new one
- [85a61b8e](https://github.com/heptio/ark/commit/85a61b8e) return nil error if 404 encountered when deleting snapshots
- [a2a7dbda](https://github.com/heptio/ark/commit/a2a7dbda) fix tagging latest by using make's ifeq
- [b4a52e45](https://github.com/heptio/ark/commit/b4a52e45) Add commands for context to the bug template
- [3efe6770](https://github.com/heptio/ark/commit/3efe6770) Update Ark library code to work with Kubernetes 1.11
- [7e8c8c69](https://github.com/heptio/ark/commit/7e8c8c69) Add some basic troubleshooting commands
- [d1955120](https://github.com/heptio/ark/commit/d1955120) require namespace for backups/etc. to exist at server startup
- [683f7afc](https://github.com/heptio/ark/commit/683f7afc) switch to using .status.startTimestamp for sorting backups
- [b71a37db](https://github.com/heptio/ark/commit/b71a37db) Record backup completion time before uploading
- [217084cd](https://github.com/heptio/ark/commit/217084cd) Add example ark version command to issue templates
- [040788bb](https://github.com/heptio/ark/commit/040788bb) Add minor improvements and aws example
- [5b89f7b6](https://github.com/heptio/ark/commit/5b89f7b6) Skip backup sync if it already exists in k8s
- [c6050845](https://github.com/heptio/ark/commit/c6050845) restore controller: switch to 'c' for receiver name
- [706ae07d](https://github.com/heptio/ark/commit/706ae07d) enable a schedule to be provided as the source for a restore
- [aea68414](https://github.com/heptio/ark/commit/aea68414) fix up Slack link in troubleshooting on master branch
- [bb8e2e91](https://github.com/heptio/ark/commit/bb8e2e91) Document how to run the Ark server locally
* Added the `velero migrate-backups` command to migrate legacy Ark backup metadata to the current Velero format in object storage. This command needs to be run in preparation for upgrading to v1.0, **if** you have backups that were originally created prior to v0.11 (i.e. when the project was named Ark).
* Heptio Ark is now Velero! This release is the first one to use the new name. For details on the changes and how to migrate to v0.11, see the [migration instructions][1]. **Please follow the instructions to ensure a successful upgrade to v0.11.**
* Restic has been upgraded to v0.9.4, which brings significantly faster restores thanks to a new multi-threaded restorer.
* Velero now waits for terminating namespaces and persistent volumes to delete before attempting to restore them, rather than trying and failing to restore them while they're being deleted.
* Wait for PVs and namespaces to delete before attempting to restore them. (#826, @nrb)
* Set the zones for GCP regional disks on restore. This requires the `compute.zones.get` permission on the GCP serviceaccount in order to work correctly. (#1200, @nrb)
* Renamed Heptio Ark to Velero. Changed internal imports, environment variables, and binary name. (#1184, @nrb)
* use 'restic stats' instead of 'restic check' to determine if repo exists (#1171, @skriss)
* upgrade restic to v0.9.4 & replace --hostname flag with --host (#1156, @skriss)
* Clarify restore log when object unchanged (#1153, @daved)
* Add backup-version file in backup tarball. (#1117, @wwitzel3)
* add ServerStatusRequest CRD and show server version in `ark version` output (#1116, @skriss)
* If a Service is headless, retain ClusterIP = None when backing up and restoring.
* Use the specified --label-selector when listing backups, schedules, and restores.
* Restore namespace mapping functionality that was accidentally broken in 0.5.0.
* Always include namespaces in the backup, regardless of the --include-cluster-resources setting.
## v0.5.0
#### 2017-10-26
### Download
- https://github.com/heptio/ark/tree/v0.5.0
### Breaking changes
* The backup tar file format has changed. Backups created using previous versions of Ark cannot be restored using v0.5.0.
* When backing up one or more specific namespaces, cluster-scoped resources are no longer backed up by default, with the exception of PVs that are used within the target namespace(s). Cluster-scoped resources can still be included by explicitly specifying `--include-cluster-resources`.
### New features
* Add customized user-agent string for Ark CLI
* Switch from glog to logrus
* Exclude nodes from restoration
* Add a FAQ
* Record PV availability zone and use it when restoring volumes from snapshots
* Back up the PV associated with a PVC
* Add `--include-cluster-resources` flag to `ark backup create`
* Add `--include-cluster-resources` flag to `ark restore create`
* Properly support resource restore priorities across cluster-scoped and namespace-scoped resources
* **Plugins** - We now support user-defined plugins that can extend Ark functionality to meet your custom backup/restore needs without needing to be compiled into the core binary. We support pluggable block and object stores as well as per-item backup and restore actions that can execute arbitrary logic, including modifying the items being backed up or restored. For more information see the [documentation](docs/plugins.md), which includes a reference to a fully-functional sample plugin repository. (#174, #188, #206, #213, #215, #217, #223, #226)
* **Describers** - The Ark CLI now includes `describe` commands for `backups`, `restores`, and `schedules` that provide human-friendly representations of the relevant API objects.
### Breaking Changes
* The config object format has changed. In order to upgrade to v0.6.0, the config object will have to be updated to match the new format. See the [examples](examples) and [documentation](docs/config-definition.md) for more information.
* The restore object format has changed. The `warnings` and `errors` fields are now ints containing the counts, while full warnings and errors are now stored in the object store instead of etcd. Restore objects created prior to v0.6.0 should be deleted, or a new bucket used, and the old restore objects deleted from Kubernetes (`kubectl -n heptio-ark delete restore --all`).
* Backup deletion has been completely revamped to make it simpler and less error-prone. As a user, you still use the `ark backup delete` command to request deletion of a backup and its associated cloud
resources; behind the scenes, we've switched to using a new `DeleteBackupRequest` Custom Resource and associated controller for processing deletion requests.
* We've reduced the number of required fields in the Ark config. For Azure, `location` is no longer required, and for GCP, `project` is not needed.
* Ark now copies tags from volumes to snapshots during backup, and from snapshots to new volumes during restore.
### Breaking Changes:
* Ark has moved back to a single namespace (`heptio-ark` by default) as part of #383.
### All New Features:
* Add global `--kubecontext` flag to Ark CLI (#296, @blakebarnett)
* Azure: support cross-resource group restores of volumes (#356, #378, @skriss)
* AWS/Azure/GCP: copy tags from volumes to snapshots, and from snapshots to volumes (#341, @skriss)
* Replace finalizer for backup deletion with `DeleteBackupRequest` custom resource & controller (#383, #431, @ncdc, @nrb)
* Don't log warnings during restore if an identical object already exists in the cluster (#405, @nrb)
* Add bash & zsh completion support (#384, @containscafeine)
### Bug Fixes / Other Changes:
* Error from the Ark CLI if attempting to restore a non-existent backup (#302, @ncdc)
* Enable running the Ark server locally for development purposes (#334, @ncdc)
* Add examples to `ark schedule create` documentation (#331, @lypht)
* GCP: Remove `project` requirement from Ark config (#345, @skriss)
* Add `--from-backup` flag to `ark restore create` and allow custom restore names (#342, #409, @skriss)
* Azure: remove `location` requirement from Ark config (#344, @skriss)
* Add documentation/examples for storing backups in IBM Cloud Object Storage (#321, @roytman)
* Reduce verbosity of hooks logging (#362, @skriss)
* AWS: Add minimal IAM policy to documentation (#363, #419, @hopkinsth)
* Don't restore events (#374, @sanketjpatel)
* Azure: reduce API polling interval from 60s to 5s (#359, @skriss)
* Switch from hostPath to emptyDir volume type for minio example (#386, @containscafeine)
* Add limit ranges as a prioritized resource for restores (#392, @containscafeine)
* AWS: Add documentation on using Ark with kube2iam (#402, @domderen)
* Azure: add node selector so Ark pod is scheduled on a linux node (#415, @ffd2subroutine)
* Error from the Ark CLI if attempting to get logs for a non-existent restore (#391, @containscafeine)
* GCP: Add minimal IAM policy to documentation (#429, @skriss, @jody-frankowski)
### Upgrading from v0.7.1:
Ark v0.7.1 moved the Ark server deployment into a separate namespace, `heptio-ark-server`. As of v0.8.0 we've
returned to a single namespace, `heptio-ark`, for all Ark-related resources. If you're currently running v0.7.1,
here are the steps you can take to upgrade:
1. Execute the steps from the **Credentials and configuration** section for your cloud:
* Ark now has support for backing up and restoring Kubernetes volumes using a free open-source backup tool called [restic](https://github.com/restic/restic).
This provides users an out-of-the-box solution for backing up and restoring almost any type of Kubernetes volume, whether or not it has snapshot support
integrated with Ark. For more information, see the [documentation](https://github.com/heptio/ark/blob/master/docs/restic.md).
* Support for Prometheus metrics has been added! View total number of backup attempts (including success or failure), total backup size in bytes, and backup
durations. More metrics coming in future releases!
### All New Features:
* Add restic support (#508, #532, #533, #534, #535, #537, #540, #541, #545, #546, #547, #548, #555, #557, #561, #563, #569, #570, #571, #606, #608, #610, #621, #631, #636, @skriss)
- We've added a new command, `velero install`, to make it easier to get up and running with Velero. This CLI command replaces the static YAML installation files that were previously part of release tarballs. See the updated [install instructions][3] for more information.
- We've made a number of improvements to the plugin framework:
  - we've reorganized the relevant packages to minimize the import surface for plugin authors
  - all plugins are now wrapped in panic handlers that will report information on panics back to Velero
  - Velero's `--log-level` flag is now passed to plugin implementations
  - Errors logged within plugins are now annotated with the file/line of where the error occurred
  - Restore item actions can now optionally return a list of additional related items that should be restored
  - Restore item actions can now indicate that an item *should not* be restored
- For Azure installation, the `cloud-credentials` secret can now be created from a file containing a list of environment variables. Note that `velero install` always uses this method of providing credentials for Azure. For more details, see [Run on Azure][0].
- We've added a new phase, `PartiallyFailed`, for both backups and restores. This new phase is used for backups/restores that successfully process some but not all of their items.
- We removed all legacy Ark references, including API types, prometheus metrics, restic & hook annotations, etc.
- The restic integration remains a **beta feature**. Please continue to try it out and provide feedback, and we'll be working over the next couple of releases to bring it to GA.
### Breaking Changes
#### API
* All legacy Ark data types and pre-1.0 compatibility code have been removed. Users should migrate any backups created pre-v0.11.0 with the `velero migrate-backups` command, available in [v0.11.1][2].
#### Image
* The base container image has been switched to `ubuntu:bionic`
#### Labels/Annotations/Metrics
* The "ark" annotations for specifying hooks are no longer supported, and have been replaced with "velero"-based equivalents.
* The "ark" annotation for specifying restic backups is no longer supported, and has been replaced with a "velero"-based equivalent.
* The "ark" prometheus metrics no longer exist, and have been replaced with "velero"-based equivalents.
#### Plugin Development
* `BlockStore` plugins are now named `VolumeSnapshotter` plugins
* Plugin APIs have moved to reduce the import surface:
  * Plugin gRPC servers live in `github.com/heptio/velero/pkg/plugin/framework`
  * Plugin interface types live in `github.com/heptio/velero/pkg/plugin/velero`
* RestoreItemAction interface now takes the original item from the backup as a parameter
* RestoreItemAction plugins can now return additional items to restore
* RestoreItemAction plugins can now skip restoring an item
* Plugins may now send stack traces with errors to the Velero server, so that the errors may be put into the server log
* Plugins must now be "namespaced," using `example.domain.com/plugin-name` format
  * For external ObjectStore and VolumeSnapshotter plugins, this name will also be the provider name in BackupStorageLocation and VolumeSnapshotLocation objects
* `--log-level` flag is now passed to all plugins
#### Validation
* Configs for Azure, AWS, and GCP are now checked for invalid or extra keys, and the server is halted if any are found
To upgrade from a previous version of Velero, see our [upgrade instructions][1].
### All Changes
* Change base images to ubuntu:bionic (#1488, @skriss)
* Expose the timestamp of the last successful backup in a gauge (#1448, @fabito)
* check backup existence before download (#1447, @fabito)
* Use `latest` image tag if no version information is provided at build time (#1439, @nrb)
* switch from `restic stats` to `restic snapshots` for checking restic repository existence (#1416, @skriss)
* GCP: add optional 'project' config to volume snapshot location for use when snapshots are in a different project than the IAM account (#1405, @skriss)
* Disallow bucket names starting with '-' (#1407, @nrb)
* Shorten label values when they're longer than 63 characters (#1392, @anshulc)
* Fail backup if it already exists in object storage. (#1390, @ncdc, @carlisia)
* Add PartiallyFailed phase for backups, log + continue on errors during backup process (#1386, @skriss)
* Remove deprecated "hooks" for backups (they've been replaced by "pre hooks") (#1384, @skriss)
* Restic repo ensurer: return error if new repository does not become ready within a minute, and fix channel closing/deletion (#1367, @skriss)
* Support non-namespaced names for built-in plugins (#1366, @nrb)
* Change container base images to debian:stretch-slim and upgrade to go 1.12 (#1365, @skriss)
* Azure: allow credentials to be provided in a .env file (path specified by $AZURE_CREDENTIALS_FILE), formatted like (#1364, @skriss):
```
AZURE_TENANT_ID=${AZURE_TENANT_ID}
AZURE_SUBSCRIPTION_ID=${AZURE_SUBSCRIPTION_ID}
AZURE_CLIENT_ID=${AZURE_CLIENT_ID}
AZURE_CLIENT_SECRET=${AZURE_CLIENT_SECRET}
AZURE_RESOURCE_GROUP=${AZURE_RESOURCE_GROUP}
```
* Instantiate the plugin manager with the per-restore logger so plugin logs are captured in the per-restore log (#1358, @skriss)
* Add gauge metrics for number of existing backups and restores (#1353, @fabito)
* Set default TTL for backups (#1352, @vorar)
* Validate that there can't be any duplicate plugin name, and that the name format is `example.io/name`. (#1339, @carlisia)
* AWS/Azure/GCP: fail fast if unsupported keys are provided in BackupStorageLocation/VolumeSnapshotLocation config (#1338, @skriss)
* `velero backup logs` & `velero restore logs`: show helpful error message if backup/restore does not exist or is not finished processing (#1337, @skriss)
* Add support for allowing a RestoreItemAction to skip item restore. (#1336, @sseago)
* Improve error message around invalid S3 URLs, and gracefully handle trailing backslashes. (#1331, @skriss)
* Set backup's start timestamp before patching it to InProgress so start times display in `velero backup get` while in progress (#1330, @skriss)
* Added ability to dynamically disable controllers (#1326, @amanw)
* Remove deprecated code in preparation for v1.0 release (#1323, @skriss):
- remove ark.heptio.com API group
- remove support for reading ark-backup.json files from object storage
- remove Ark field from RestoreResult type
- remove support for "hook.backup.ark.heptio.com/..." annotations for specifying hooks
- remove support for $HOME/.config/ark/ client config directory
- remove support for restoring Azure snapshots using short snapshot ID formats in backup metadata
- stop applying "velero-restore" label to restored resources and remove it from the API pkg
- remove code that strips the "gc.ark.heptio.com" finalizer from backups
- remove support for "backup.ark.heptio.com/..." annotations for requesting restic backups
- remove "ark"-prefixed prometheus metrics
- remove VolumeBackups field and related code from Backup's status
* Rename BlockStore plugin to VolumeSnapshotter (#1321, @skriss)
* Bump plugin ProtocolVersion to version 2 (#1319, @carlisia)
* Remove Warning field from restore item action output (#1318, @skriss)
* Fix for #1312, use describe to determine if AWS EBS snapshot is encrypted and explicitly pass that value in EC2 CreateVolume call. (#1316, @mstump)
* Allow restic restore helper image name to be optionally specified via ConfigMap (#1311, @skriss)
* Compile only once to lower the initialization cost for regexp.MustCompile. (#1306, @pei0804)
* Enable restore item actions to return additional related items to be restored; have pods return PVCs and PVCs return PVs (#1304, @skriss)
* Log error locations from plugin logger, and don't overwrite them in the client logger if they exist already (#1301, @skriss)
* Send stack traces from plugin errors to Velero via gRPC so error location info can be logged (#1300, @skriss)
* Azure: restore volumes in the original region's zone (#1298, @sylr)
* Check for and exclude hostPath-based persistent volumes from restic backup (#1297, @skriss)
* Make resticrepositories non-restorable resources (#1296, @skriss)
* Gracefully handle failed API groups from the discovery API (#1293, @fabito)
* Add `velero install` command for basic use cases. (#1287, @nrb)
* Collect 3 new metrics: backup_deletion_{attempt|failure|success}_total (#1280, @fabito)
* Pass --log-level flag to internal/external plugins, matching Velero server's log level (#1278, @skriss)
* AWS EBS Volume IDs now contain AZ (#1274, @tsturzl)
* Add panic handlers to all server-side plugin methods (#1270, @skriss)
* Move all the interfaces and associated types necessary to implement all of the Velero plugins to under the new package `velero`. (#1264, @carlisia)
* Update `velero restore` to not open every single file during extraction of the data (#1261, @asaf)
* Remove restore code that waits for a PV to become Available (#1254, @skriss)
* Improve `describe` output:
  * Move Phase to right under Metadata (name/namespace/label/annotations)
  * Move the Validation errors section right after the Phase section and only show it if the item has a phase of FailedValidation
  * For restores, move Warnings and Errors under Validation errors. Leave their display as is. (#1248, @DheerajSShetty)
* Don't remove storage class from a persistent volume when restoring it (#1246, @skriss)
* Need to defer closing the ReadCloser in ObjectStoreGRPCServer.GetObject (#1236, @DheerajSShetty)
* Update Kubernetes dependencies to match v1.12, and update Azure SDK to v19.0.0 (GA) (#1231, @skriss)
* Remove pkg/util/collections/map_utils.go, replace with structured API types and apimachinery's unstructured helpers (#1146, @skriss)
* Add original resource (from backup) to restore item action interface (#1123, @mwieczorek)
**If you are running Velero in a non-default namespace**, i.e. any namespace other than `velero`, manual intervention is required when upgrading to v1.1. See [upgrading to v1.1](https://velero.io/docs/v1.1.0/upgrade-to-1.1/) for details.
### Highlights
#### Improved Restic Support
A big focus of our work this cycle was continuing to improve support for restic. To that end, we’ve fixed the following bugs:
- Prior to version 1.1, restic backups could be delayed or failed due to long-lived locks on the repository. Now, Velero removes stale locks from restic repositories every 5 minutes, ensuring they do not interrupt normal operations.
- Previously, the PodVolumeBackup custom resources that represented a restic backup within a cluster were not synchronized between clusters, making it unclear what restic volumes were available to restore into a new cluster. In version 1.1, these resources are synced into clusters, so they are more visible to you when you are trying to restore volumes.
- Originally, Velero would not validate the host path in which volumes were mounted on a given node. If a node did not expose the filesystem correctly, you wouldn’t know about it until a backup failed. Now, Velero’s restic server will validate that the directory structure is correct on startup, providing earlier feedback when it’s not.
- Velero’s restic support is intended to work on a broad range of volume types. With the general release of the [Container Storage Interface API](https://kubernetes.io/blog/2019/01/15/container-storage-interface-ga/), Velero can now use restic to back up CSI volumes.
Along with our bug fixes, we’ve provided an easier way to move restic backups between storage providers. Different providers often have different StorageClasses, requiring user intervention to make restores successfully complete.
To make cross-provider moves simpler, we’ve introduced a StorageClass remapping plug-in. It allows you to automatically translate one StorageClass on PersistentVolumeClaims and PersistentVolumes to another. You can read more about it in our [documentation](https://velero.io/docs/v1.1.0/restore-reference/#changing-pv-pvc-storage-classes).
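A minimal sketch of creating such a config map with `kubectl`, assuming the labels and data format described in the linked documentation (the config map name and storage class names are placeholders):
```bash
# Map PVs/PVCs that use the "gp2" storage class to "standard" on restore
kubectl -n velero create configmap change-storage-class-config \
    --from-literal=gp2=standard

# Label the config map so the restore item action plug-in can find it
kubectl -n velero label configmap change-storage-class-config \
    velero.io/plugin-config="" \
    velero.io/change-storage-class=RestoreItemAction
```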
#### Quality-of-Life Improvements
We’ve also made several other enhancements to Velero that should benefit all users.
Users sometimes ask about recommendations for Velero’s resource allocation within their cluster. To help with this concern, we’ve added default resource requirements to the Velero Deployment and restic init containers, along with configurable requests and limits for the restic DaemonSet. All these values can be adjusted if your environment requires it.
We’ve also taken some time to improve Velero for the future by updating the Deployment and DaemonSet to use the apps/v1 API group, which will be the [default in Kubernetes 1.16](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.16.md#action-required-3). This change means that `velero install` and the `velero plugin` commands will require Kubernetes 1.9 or later to work. Existing Velero installs will continue to work without needing changes, however.
In order to help you better understand what resources have been backed up, we’ve added a list of resources in the `velero backup describe --details` command. This change makes it easier to inspect a backup without having to download and extract it.
In the same vein, we’ve added the ability to put custom tags on cloud-provider snapshots. This approach should provide a better way to keep track of the resources being created in your cloud account. To add a label to a snapshot at backup time, use the `--labels` argument in the `velero backup create` command.
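For example (the backup name and labels are illustrative):
```bash
velero backup create nginx-backup --labels env=production,team=platform
```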
Our final change for increasing visibility into your Velero installation is the `velero plugin get` command. This command will report all the plug-ins within the Velero deployment.
Velero has previously used a restore-only flag on the server to control whether a cluster could write backups to object storage. With Velero 1.1, we’ve now moved the restore-only behavior into read-only BackupStorageLocations. This move means that the Velero server can use a BackupStorageLocation as a source to restore from, but not for backups, while still retaining the ability to back up to other configured locations. In the future, the `--restore-only` flag will be removed in favor of configuring read-only BackupStorageLocations.
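As a rough sketch, assuming the `accessMode` field described in the v1.1 documentation, an existing location can be switched to read-only with `kubectl` (the location name `default` is a placeholder):
```bash
kubectl -n velero patch backupstoragelocation default \
    --type merge --patch '{"spec":{"accessMode":"ReadOnly"}}'
```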
#### Community Contributions
We appreciate all community contributions, whether they be pull requests, bug reports, feature requests, or just questions. With this release, we wanted to draw attention to a few contributions in particular:
For users of node-based IAM authentication systems such as kube2iam, `velero install` now supports the `--pod-annotations` argument for applying necessary annotations at install time. This support should make `velero install` more flexible for scenarios that do not use Secrets for access to their cloud buckets and volumes. You can read more about how to use this new argument in our [AWS documentation](https://velero.io/docs/v1.1.0/aws-config/#alternative-setup-permissions-using-kube2iam). Huge thanks to [Traci Kamp](https://github.com/tlkamp) for this contribution.
Structured logging is important for any application, and Velero is no different. Starting with version 1.1, the Velero server can now output its logs in a JSON format, allowing easier parsing and ingestion. Thank you to [Donovan Carthew](https://github.com/carthewd) for this feature.
AWS supports multiple profiles for accessing object storage, but in the past Velero only used the default. With v1.1, you can set the `profile` key on your BackupStorageLocation to specify an alternate profile. If no profile is set, the default one is used, making this change backward compatible. Thanks [Pranav Gaikwad](https://github.com/pranavgaikwad) for this change.
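A hedged example of setting the key on an existing location with `kubectl` (the location name `default` and the profile name are placeholders):
```bash
kubectl -n velero patch backupstoragelocation default \
    --type merge --patch '{"spec":{"config":{"profile":"secondary"}}}'
```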
Finally, thanks to testing by [Dylan Murray](https://github.com/dymurray) and [Scott Seago](https://github.com/sseago), an issue with running Velero in non-default namespaces was found in our beta version for this release. If you’re running Velero in a namespace other than `velero`, please follow the [upgrade instructions](https://velero.io/docs/v1.1.0/upgrade-to-1.1/).
### All Changes
* Add the prefix to BSL config map so that object stores can use it when initializing (#1767, @betta1)
* Use `VELERO_NAMESPACE` to determine what namespace Velero server is running in. For any v1.0 installations using a different namespace, the `VELERO_NAMESPACE` environment variable will need to be set to the correct namespace. (#1748, @nrb)
* support setting CPU/memory requests with unbounded limits using velero install (#1745, @prydonius)
* sort output of resource list in `velero backup describe --details` (#1741, @prydonius)
* adds the ability to define custom tags to be added to snapshots by specifying custom labels on the Backup CR with the velero backup create --labels flag (#1729, @prydonius)
* Restore restic volumes from PodVolumeBackups CRs (#1723, @carlisia)
* properly restore PVs backed up with restic and a reclaim policy of "Retain" (#1713, @skriss)
* Make `--secret-file` flag on `velero install` optional, add `--no-secret` flag for explicit confirmation (#1699, @nrb)
* Add low cpu/memory limits to the restic init container. This allows for restoration into namespaces with quotas defined. (#1677, @nrb)
* Adds configurable CPU/memory requests and limits to the restic DaemonSet generated by velero install. (#1710, @prydonius)
* remove any stale locks from restic repositories every 5m (#1708, @skriss)
* error if backup storage location's Bucket field also contains a prefix, and gracefully handle leading/trailing slashes on Bucket and Prefix fields. (#1694, @skriss)
* Adds configurable CPU/memory requests and limits to the Velero Deployment generated by velero install. (#1678, @prydonius)
* Store restic PodVolumeBackups in obj storage & use that as source of truth like regular backups. (#1577, @carlisia)
* Update Velero Deployment to use apps/v1 API group. `velero install` and `velero plugin add/remove` commands will now require Kubernetes 1.9+ (#1673, @nrb)
* Respect the --kubecontext and --kubeconfig arguments for `velero install`. (#1656, @nrb)
* add plugin for updating PV & PVC storage classes on restore based on a config map (#1621, @skriss)
* Add restic support for CSI volumes (#1615, @nrb)
* enhancement: allow users to specify additional Velero/Restic pod annotations on the command line with the pod-annotations flag. (#1626, @tlkamp)
* adds validation for pod volumes hostPath mount on restic server startup (#1616, @prydonius)
* enable support for ppc64le architecture (#1605, @prajyot)
* bug fix: only restore additional items returned from restore item actions if they match the restore's namespace/resource selectors (#1612, @skriss)
* add startTimestamp and completionTimestamp to PodVolumeBackup and PodVolumeRestore status fields (#1609, @prydonius)
* bug fix: respect namespace selector when determining which restore item actions to run (#1607, @skriss)
* ensure correct backup item actions run with namespace selector (#1601, @prydonius)
* allows excluding resources from backups with the `velero.io/exclude-from-backup=true` label (#1588, @prydonius)
* ensures backup item action modifications to an item's namespace/name are saved in the file path in the tarball (#1587, @prydonius)
* Hides `velero server` and `velero restic server` commands from the list of available commands as these are not intended for use by the velero CLI user. (#1561, @prydonius)
* remove dependency on glog, update to klog (#1559, @skriss)
* move issue-template-gen from docs/ to hack/ (#1558, @skriss)
* fix panic when processing DeleteBackupRequest objects without labels (#1556, @prydonius)
* support for multiple AWS profiles (#1548, @pranavgaikwad)
* Add CLI command to list (get) all Velero plugins (#1535, @carlisia)
* Added author as a tag on blog post. Should fix 404 error when trying to follow link as specified in issue #1522. (#1522, @coonsd)
* Allow individual backup storage locations to be read-only (#1517, @skriss)
* Stop returning an error when a restic volume is empty since it is a valid scenario. (#1480, @carlisia)
* add ability to use wildcard in includes/excludes (#1428, @guilhem)
Please note that as of this release we are no longer publishing new container images to `gcr.io/heptio-images`. The existing ones will remain there for the foreseeable future.
### Documentation
https://velero.io/docs/v1.2.0/
### Upgrading
https://velero.io/docs/v1.2.0/upgrade-to-1.2/
### Highlights
#### Moving Cloud Provider Plugins Out of Tree
Velero has had built-in support for AWS, Microsoft Azure, and Google Cloud Platform (GCP) since day 1. When Velero moved to a plugin architecture for object store providers and volume snapshotters in version 0.6, the code for these three providers was converted to use the plugin interface provided by this new architecture, but the cloud provider code still remained inside the Velero codebase. This put the AWS, Azure, and GCP plugins in a different position compared with other providers’ plugins, since they automatically shipped with the Velero binary and could include documentation in-tree.
With version 1.2, we’ve extracted the AWS, Azure, and GCP plugins into their own repositories, one per provider. We now also publish one plugin image per provider. This change brings these providers to parity with other providers’ plugin implementations, reduces the size of the core Velero binary by not requiring each provider’s SDK to be included, and opens the door for the plugins to be maintained and released independently of core Velero.
#### Restic Integration Improvements
We’ve continued to work on improving Velero’s restic integration. With this release, we’ve made the following enhancements:
- Restic backup and restore progress is now captured during execution and visible to the user through the `velero backup/restore describe --details` command. The details are updated every 10 seconds. This provides a new level of visibility into restic operations for users.
- Restic backups of persistent volume claims (PVCs) now remain incremental across the rescheduling of a pod. Previously, if the pod using a PVC was rescheduled, the next restic backup would require a full rescan of the volume’s contents. This improvement potentially makes such backups significantly faster.
- Read-write-many volumes are no longer backed up once for every pod using the volume, but instead just once per Velero backup. This improvement speeds up backups and prevents potential restore issues due to multiple copies of the backup being processed simultaneously.
#### Clone PVs When Cloning a Namespace
Before version 1.2, you could clone a Kubernetes namespace by backing it up and then restoring it to a different namespace in the same cluster by using the `--namespace-mappings` flag with the `velero restore create` command. However, in this scenario, Velero was unable to clone persistent volumes used by the namespace, leading to errors for users.
In version 1.2, Velero automatically detects when you are trying to clone an existing namespace, and clones the persistent volumes used by the namespace as well. This doesn’t require the user to specify any additional flags for the `velero restore create` command. This change lets you fully achieve your goal of cloning namespaces using persistent storage within a cluster.
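For example (the backup and namespace names are illustrative):
```bash
velero restore create --from-backup wordpress-backup \
    --namespace-mappings wordpress-prod:wordpress-clone
```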
#### Improved Server-Side Encryption Support
To help you secure your important backup data, we’ve added support for more forms of server-side encryption of backup data on both AWS and GCP. Specifically:
- On AWS, Velero now supports Amazon S3-managed encryption keys (SSE-S3), which uses AES256 encryption, by specifying `serverSideEncryption: AES256` in a backup storage location’s config.
- On GCP, Velero now supports using a specific Cloud KMS key for server-side encryption by specifying `kmsKeyName: <key name>` in a backup storage location’s config (see the example below).
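A rough illustration using `kubectl patch` against existing backup storage locations (the location names and the KMS key path are placeholders):
```bash
# AWS: enable SSE-S3 (AES256) server-side encryption
kubectl -n velero patch backupstoragelocation aws-default \
    --type merge --patch '{"spec":{"config":{"serverSideEncryption":"AES256"}}}'

# GCP: encrypt backups with a specific Cloud KMS key
kubectl -n velero patch backupstoragelocation gcp-default \
    --type merge \
    --patch '{"spec":{"config":{"kmsKeyName":"projects/my-project/locations/us-central1/keyRings/my-ring/cryptoKeys/my-key"}}}'
```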
#### CRD Structural Schema
In Kubernetes 1.16, custom resource definitions (CRDs) reached general availability. Structural schemas are required for CRDs created in the `apiextensions.k8s.io/v1` API group. Velero now defines a structural schema for each of its CRDs and automatically applies it when the user runs the `velero install` command. The structural schemas enable the user to get quicker feedback when their backup, restore, or schedule request is invalid, so they can immediately remediate their request.
### All Changes
* Ensure object store plugin processes are cleaned up after restore and after BSL validation during server start up (#2041, @betta1)
* bug fix: don't try to restore pod volume backups that don't have a snapshot ID (#2031, @skriss)
* Restore documentation: updated with clarification on the implications of removing a restore object. (#1957, @nainav)
* add `--allow-partially-failed` flag to `velero restore create` for use with `--from-schedule` to allow partially-failed backups to be restored (#1994, @skriss)
* Allow backup storage locations to specify backup sync period or toggle off sync (#1936, @betta1)
* Remove cloud provider code (#1985, @carlisia)
* Restore action for cluster/namespace role bindings (#1974, @alexander-demichev)
* Add `--no-default-backup-location` flag to `velero install` (#1931, @Frank51)
* If includeClusterResources is nil/auto, pull in necessary CRDs in backupResource (#1831, @sseago)
* Azure: add support for Azure China/German clouds (#1938, @andyzhangx)
* Add a new required `--plugins` flag for `velero install` command. `--plugins` takes a list of container images to add as initcontainers. (#1930, @nrb)
* restic: only backup read-write-many PVCs at most once, even if they're annotated for backup from multiple pods. (#1896, @skriss)
* Azure: add support for cross-subscription backups (#1895, @boxcee)
* adds `insecureSkipTLSVerify` server config for AWS storage and `--insecure-skip-tls-verify` flag on client for self-signed certs (#1793, @s12chung)
* Add check to update resource field during backupItem (#1904, @spiffcs)
* Add `LD_LIBRARY_PATH` (=/plugins) to the env variables of velero deployment. (#1893, @lintongj)
* backup sync controller: stop using `metadata/revision` file, do a full diff of bucket contents vs. cluster contents each sync interval (#1892, @skriss)
* bug fix: during restore, check item's original namespace, not the remapped one, for inclusion/exclusion (#1909, @skriss)
* adds structural schema to Velero CRDs created on Velero install, enabling validation of Velero API fields (#1898, @prydonius)
* GCP: add support for specifying a Cloud KMS key name to use for encrypting backups in a storage location. (#1879, @skriss)
* AWS: add support for SSE-S3 AES256 encryption via `serverSideEncryption` config field in BackupStorageLocation (#1869, @skriss)
* change default `restic prune` interval to 7 days, add `velero server/install` flags for specifying an alternate default value. (#1864, @skriss)
* velero install: if `--use-restic` and `--wait` are specified, wait up to a minute for restic daemonset to be ready (#1859, @skriss)
* report restore progress in PodVolumeRestores and expose progress in the velero restore describe --details command (#1854, @prydonius)
* Jekyll Site updates - modifies documentation to use a wider layout; adds better markdown table formatting (#1848, @ccbayer)
* fix excluding additional items with the velero.io/exclude-from-backup=true label (#1843, @prydonius)
* report backup progress in PodVolumeBackups and expose progress in the velero backup describe --details command. Also upgrades restic to v0.9.5 (#1821, @prydonius)
* Add `--features` argument to all velero commands to provide feature flags that can control enablement of pre-release features. (#1798, @nrb)
* when backing up PVCs with restic, specify `--parent` flag to prevent full volume rescans after pod reschedules (#1807, @skriss)
* remove 'restic check' calls from before/after 'restic prune' since they're redundant (#1794, @skriss)
* fix error formatting due to interpreting % as printf formatted strings (#1781, @s12chung)
* when using `velero restore create --namespace-mappings ...` to create a second copy of a namespace in a cluster, create copies of the PVs used (#1779, @skriss)
* adds --from-schedule flag to the `velero create backup` command to create a Backup from an existing Schedule (#1734, @prydonius)
* Allow `plugins/` as a valid top-level directory within backup storage locations. This directory is a place for plugin authors to store arbitrary data as needed. It is recommended to create an additional subdirectory under `plugins/` specifically for your plugin, e.g. `plugins/my-plugin-data/`. (#2350, @skriss)
* bug fix: don't panic in `velero restic repo get` when last maintenance time is `nil` (#2315, @skriss)
#### Custom Resource Definition Backup and Restore Improvements
This release includes a number of related bug fixes and improvements to how Velero backs up and restores custom resource definitions (CRDs) and instances of those CRDs.
We found and fixed three issues around restoring CRDs that were originally created via the `v1beta1` CRD API. The first issue affected CRDs that had the `PreserveUnknownFields` field set to `true`. These CRDs could not be restored into 1.16+ Kubernetes clusters, because the `v1` CRD API does not allow this field to be set to `true`. We added code to the restore process to check for this scenario, to set the `PreserveUnknownFields` field to `false`, and to instead set `x-kubernetes-preserve-unknown-fields` to `true` in the OpenAPIv3 structural schema, per Kubernetes guidance. For more information on this, see the [Kubernetes documentation](https://kubernetes.io/docs/tasks/access-kubernetes-api/custom-resources/custom-resource-definitions/#pruning-versus-preserving-unknown-fields). The second issue affected CRDs without structural schemas. These CRDs need to be backed up/restored through the `v1beta1` API, since all CRDs created through the `v1` API must have structural schemas. We added code to detect these CRDs and always back them up/restore them through the `v1beta1` API. Finally, related to the previous issue, we found that our restore code was unable to handle backups with multiple API versions for a given resource type, and we’ve remediated this as well.
We also improved the CRD restore process to enable users to properly restore CRDs and instances of those CRDs in a single restore operation. Previously, users found that they needed to run two separate restores: one to restore the CRD(s), and another to restore instances of the CRD(s). This was due to two deficiencies in the Velero code. First, Velero did not wait for a CRD to be fully accepted by the Kubernetes API server and ready for serving before moving on; and second, Velero did not refresh its cached list of available APIs in the target cluster after restoring CRDs, so it was not aware that it could restore instances of those CRDs.
We fixed both of these issues by (1) adding code to wait for CRDs to be “ready” after restore before moving on, and (2) refreshing the cached list of APIs after restoring CRDs, so any instances of newly-restored CRDs could subsequently be restored.
With all of these fixes and improvements in place, we hope that the CRD backup and restore experience is now seamless across all supported versions of Kubernetes.
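The readiness check Velero now performs is conceptually similar to what you could do by hand with `kubectl` (the CRD name is hypothetical):
```bash
kubectl wait --for condition=established --timeout=60s crd/widgets.example.com
```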
#### Multi-Arch Docker Images
Thanks to community members [@Prajyot-Parab](https://github.com/Prajyot-Parab) and [@shaneutt](https://github.com/shaneutt), Velero now provides multi-arch container images by using Docker manifest lists. We are currently publishing images for `linux/amd64`, `linux/arm64`, `linux/arm`, and `linux/ppc64le` in [our Docker repository](https://hub.docker.com/r/velero/velero/tags?page=1&name=v1.3&ordering=last_updated).
Users don’t need to change anything other than updating their version tag - the v1.3 image is `velero/velero:v1.3.0`, and Docker will automatically pull the proper architecture for the host.
For more information on manifest lists, see [Docker’s documentation](https://docs.docker.com/registry/spec/manifest-v2-2/).
#### Bug Fixes, Usability Enhancements, and More
We fixed a large number of bugs and made some smaller usability improvements in this release. Here are a few highlights:
- Support private registries with custom ports for the restic restore helper image ([PR #1999](https://github.com/vmware-tanzu/velero/pull/1999), [@cognoz](https://github.com/cognoz))
- Use AWS profile from BackupStorageLocation when invoking restic ([PR #2096](https://github.com/vmware-tanzu/velero/pull/2096), [@dinesh](https://github.com/dinesh))
- Allow restores from schedules in other clusters ([PR #2218](https://github.com/vmware-tanzu/velero/pull/2218), [@cpanato](https://github.com/cpanato))
* Corrected the selfLink for Backup CR in site/docs/master/output-file-format.md (#2292, @RushinthJohn)
* Back up schema-less CustomResourceDefinitions as v1beta1, even if they are retrieved via the v1 endpoint. (#2264, @nrb)
* Bug fix: restic backup volume snapshot to the second location failed (#2244, @jenting)
* Added support of using PV name from volumesnapshotter('SetVolumeID') in case of PV renaming during the restore (#2216, @mynktl)
* Replaced the deprecated Helm repo URL everywhere it appears in the docs. (#2209, @markrity)
* added support for arm and arm64 images (#2227, @shaneutt)
* when restoring from a schedule, validate by checking for backup(s) labeled with the schedule name rather than existence of the schedule itself, to allow for restoring from deleted schedules and schedules in other clusters (#2218, @cpanato)
* bug fix: back up server-preferred version of CRDs rather than always the `v1beta1` version (#2230, @skriss)
* Wait for CustomResourceDefinitions to be ready before restoring CustomResources. Also refresh the resource list from the Kubernetes API server after restoring CRDs in order to properly restore CRs. (#1937, @nrb)
* When restoring a v1 CRD with PreserveUnknownFields = True, make sure that the preservation behavior is maintained by copying the flag into the Open API V3 schema, but update the flag so as to allow the Kubernetes API server to accept the CRD without error. (#2197, @nrb)
# Design proposal template (replace with your proposal's title)
One to two sentences that describes the goal of this proposal.
The reader should be able to tell by the title, and the opening paragraph, if this document is relevant to them.
_Note_: The preferred style for design documents is one sentence per line.
*Do not wrap lines*.
This aids review of the document, since changes to a line are not obscured by the reflowing those changes would otherwise cause, and it has the side effect of avoiding debate about one or two spaces after a period.
## Goals
- A short list of things which will be accomplished by implementing this proposal.
- Two things is ok.
- Three is pushing it.
- More than three goals suggests that the proposal's scope is too large.
## Non Goals
- A short list of items which are:
  - a. out of scope
  - b. follow on items which are deliberately excluded from this proposal.
## Background
One to two paragraphs of exposition to set the context for this proposal.
## High-Level Design
One to two paragraphs that describe the high level changes that will be made to implement this proposal.
## Detailed Design
A detailed design describing how the changes to the product should be made.
The names of types, fields, interfaces, and methods should be agreed on here, not debated in code review.
The same applies to changes in CRDs, YAML examples, and so on.
Ideally the changes should be made in sequence so that the work required to implement this design can be done incrementally, possibly in parallel.
## Alternatives Considered
If there are alternative high level or detailed designs that were not pursued they should be called out here with a brief explanation of why they were not pursued.
## Security Considerations
If this proposal has an impact on the security of the product, its users, or data stored or transmitted via the product, those impacts must be addressed here.
# Expose list of backed up resources in backup details
Status: Accepted
To increase the visibility of what a backup might contain, this document proposes storing metadata about backed up resources in object storage and adding a new section to the detailed backup description output to list them.
## Goals
- Include a list of backed up resources as metadata in the bucket
- Enable users to get a view of what resources are included in a backup using the Velero CLI
## Non Goals
- Expose the full manifests of the backed up resources
## Background
As reported in [#396](https://github.com/heptio/velero/issues/396), the information reported in a `velero backup describe <name> --details` command is fairly limited, and does not easily describe what resources a backup contains.
In order to see what a backup might contain, a user would have to download the backup tarball and extract it.
This makes it difficult to keep track of different backups in a cluster.
## High-Level Design
After performing a backup, a new file will be created that contains the list of the resources that have been included in the backup.
This file will be persisted in object storage alongside the backup contents and existing metadata.
A section will be added to the output of `velero backup describe <name> --details` command to view this metadata.
## Detailed Design
### Metadata file
This metadata will be in JSON (or YAML) format so that it can be easily inspected from the bucket outside of Velero tooling, and will contain the API resource and group, namespaces and names of the resources:
```
apps/v1/Deployment:
- default/database
- default/wordpress
v1/Service:
- default/database
- default/wordpress
v1/Secret:
- default/database-root-password
- default/database-user-password
v1/ConfigMap:
- default/database
v1/PersistentVolume:
- my-pv
```
The filename for this metadata will be `<backup name>-resource-list.json.gz`.
The top-level key is the string form of the `schema.GroupResource` type that we currently keep track of in the backup controller code path.
### Changes in Backup controller
The Backupper currently initialises a map to track the `backedUpItems` (https://github.com/heptio/velero/blob/1594bdc8d0132f548e18ffcc1db8c4cd2b042726/pkg/backup/backup.go#L269); this is passed down through GroupBackupper, ResourceBackupper, and ItemBackupper, where ItemBackupper records each backed-up item.
This property will be moved to the [Backup request struct](https://github.com/heptio/velero/blob/16910a6215cbd8f0bde385dba9879629ebcbcc28/pkg/backup/request.go#L11), allowing the BackupController to access it after a successful backup.
`backedUpItems` currently uses the `schema.GroupResource` as a key for the resource.
In order to record the API group, version and kind for the resource, this key will be constructed from the object's `schema.GroupVersionKind` in the format `{group}/{version}/{kind}` (e.g. `apps/v1/Deployment`).
The `backedUpItems` map is kept as a flat structure internally for quick lookup.
When the backup is ready to upload, `backedUpItems` will be converted to a nested structure representing the metadata file above, grouped by `schema.GroupVersionKind`.
After converting to the right format, it can be passed to the `persistBackup` function to persist the file in object storage.
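As a rough illustration, the conversion from the flat map to the grouped structure could look like the sketch below; the key type and helper name are hypothetical, not the final API.

```go
package backup

import "sort"

// itemKey is an assumed shape for the tracking key; the real key may differ.
type itemKey struct {
	resource  string // e.g. "apps/v1/Deployment"
	namespace string // empty for cluster-scoped resources
	name      string
}

// resourceList groups backed-up items by their {group}/{version}/{kind} key,
// producing the nested structure that will be persisted as the resource list.
func resourceList(backedUpItems map[itemKey]struct{}) map[string][]string {
	list := map[string][]string{}
	for key := range backedUpItems {
		entry := key.name
		if key.namespace != "" {
			entry = key.namespace + "/" + key.name
		}
		list[key.resource] = append(list[key.resource], entry)
	}
	// Sort entries so the persisted file is deterministic.
	for _, entries := range list {
		sort.Strings(entries)
	}
	return list
}
```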
### Changes to DownloadRequest CRD and processing
A new `DownloadTargetKind` "BackupResourceList" will be added to the DownloadRequest CR.
The `GetDownloadURL` function in the `persistence` package will be updated to handle this new DownloadTargetKind to enable the Velero client to fetch the metadata from the bucket.
### Changes to `velero backup describe <name> --details`
This command will need to be updated to fetch the metadata from the bucket using the `Stream` method used in other commands.
The file will be read in memory and displayed in the output of the command.
Depending on the format the metadata is stored in, it may need processing to print in a more human-readable format.
If we choose to store the metadata in YAML, it can likely be directly printed out.
If the metadata file does not exist, this is an older backup and we cannot display the list of resources that were backed up.
## Open Questions
## Alternatives Considered
### Fetch backup contents archive and walkthrough to list contents
Instead of recording new metadata about what resources have been backed up, we could simply download the backup contents archive and walk through it to list the contents every time `velero backup describe <name> --details` is run.
The advantage of this approach is that we don't need to change any backup procedures as we already have this content, and we will also be able to list resources for older backups.
Additionally, if we wanted to expose more information about the backed up resources, we can do so without having to update what we store in the metadata.
The disadvantages are:
- downloading the whole backup archive will be larger than just downloading a smaller file with metadata
- reduces the metadata available in the bucket that users might want to inspect outside of Velero tooling (though this is not an explicit requirement)
The Container Storage Interface (CSI) [introduced an alpha snapshot API in Kubernetes v1.12][1].
It will reach beta support in Kubernetes v1.17, scheduled for release in December 2019.
This proposal documents an approach for integrating support for this snapshot API within Velero, augmenting its existing capabilities.
## Goals
- Enable Velero to back up and restore CSI-backed volumes using the Kubernetes CSI CustomResourceDefinition API
## Non Goals
- Replacing Velero's existing [VolumeSnapshotter][7] API
- Replacing Velero's Restic support
## Background
Velero has had support for performing persistent volume snapshots since its inception.
However, support has been limited to a handful of providers.
The plugin API introduced in Velero v0.7 enabled the community to expand the number of supported providers.
In the meantime, the Kubernetes sig-storage advanced the CSI spec to allow for a generic storage interface, opening up the possibility of moving storage code out of the core Kubernetes code base.
The CSI working group has also developed a generic snapshotting API that any CSI driver developer may implement, giving users the ability to snapshot volumes from a standard interface.
By supporting the CSI snapshot API, Velero can extend its support to any CSI driver, without requiring a Velero-specific plugin be written, easing the development burden on providers while also reaching more end users.
## High-Level Design
In order to support CSI's snapshot API, Velero must interact with the [`VolumeSnapshot`][2] and [`VolumeSnapshotContent`][3] CRDs.
These act as requests to the CSI driver to perform a snapshot on the underlying provider's volume.
This can largely be accomplished with Velero `BackupItemAction` and `RestoreItemAction` plugins that operate on these CRDs.
Additionally, changes to the Velero server and client code are necessary to track `VolumeSnapshot`s that are associated with a given backup, similarly to how Velero tracks its own [`volume.Snapshot`][4] type.
Tracking these is important for allowing users to see what is in their backup, and provides parity for the existing `volume.Snapshot` and [`PodVolumeBackup`][5] types.
This is also done to retain the object store as Velero's source of truth, without having to query the Kubernetes API server for associated `VolumeSnapshot`s.
`velero backup describe --details` will use the stored VolumeSnapshots to list CSI snapshots included in the backup to the user.
## Detailed Design
### Resource Plugins
A set of [prototype][6] plugins was developed that informed this design.
The plugins will be as follows:
#### A `BackupItemAction` for `PersistentVolumeClaim`s, named `velero.io/csi-pvc`
This plugin will act directly on PVCs, since an implementation of Velero's VolumeSnapshotter does not have enough information about the StorageClass to properly create the `VolumeSnapshot` objects.
The associated PV will be queried and checked for the presence of `PersistentVolume.Spec.PersistentVolumeSource.CSI` (see the "Snapshot selection mechanism" section below).
If this field is `nil`, then the plugin will return early without taking action.
If the `Backup.Spec.SnapshotVolumes` value is `false`, the plugin will return early without taking action.
Additionally, to prevent creating CSI snapshots for volumes backed up by restic, the plugin will query for all pods in the `PersistentVolumeClaim`'s namespace.
It will then filter out the pods that have the PVC mounted, and inspect the `backup.velero.io/backup-volumes` annotation for the associated volume's name.
If the name is found in the list, then the plugin will return early without taking further action.
The plugin will create a `VolumeSnapshot.snapshot.storage.k8s.io` object from the PVC.
It will label the `VolumeSnapshot` object with the [`velero.io/backup-name`][10] label for ease of lookup later.
It will also set an ownerRef on the `VolumeSnapshot` so that cascading deletion of the Velero `Backup` will delete associated `VolumeSnapshots`.
The CSI controllers will create a `VolumeSnapshotContent.snapshot.storage.k8s.io` object associated with the `VolumeSnapshot`.
Associated `VolumeSnapshotContent` objects will be retrieved and updated with the [`velero.io/backup-name`][10] label for ease of lookup later.
`velero.io/volume-snapshot-name` will be applied as a label to the PVC so that the `VolumeSnapshot` can be found easily for restore.
`VolumeSnapshot`, `VolumeSnapshotContent`, and `VolumeSnapshotClass` objects would be returned as additional items to be backed up. GitHub issue [1566][18] represents this work.
The `VolumeSnapshotContent.Spec.VolumeSnapshotSource.SnapshotHandle` field is the link to the underlying platform's on-disk snapshot, and must be preserved for restoration.
The plugin will _not_ wait for the `VolumeSnapshot.Status.readyToUse` field to be `true` before returning.
This field indicates that the snapshot is ready to use for restoration, and for different vendors can indicate that the snapshot has been made durable.
However, the applications can proceed as soon as `VolumeSnapshot.Status.CreationTime` is set.
This also maintains current Velero behavior, which allows applications to quiesce and resume quickly, with minimal interruption.
Any sort of monitoring or waiting for durable snapshots, whether Velero-native or CSI snapshots, is not covered by this proposal.
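A rough sketch of how the plugin might construct the `VolumeSnapshot` for a PVC, using `unstructured` to avoid pinning a specific snapshot API version; the group/version and field paths shown here are assumptions that depend on the snapshot API actually installed in the cluster.

```go
package main

import (
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
)

// newVolumeSnapshotFor sketches building a VolumeSnapshot for a PVC.
// The apiVersion and spec field names are assumptions based on the
// v1beta1 snapshot API and may need adjusting for other versions.
func newVolumeSnapshotFor(pvcName, namespace, backupName, snapshotClass string) *unstructured.Unstructured {
	vs := &unstructured.Unstructured{}
	vs.SetAPIVersion("snapshot.storage.k8s.io/v1beta1")
	vs.SetKind("VolumeSnapshot")
	vs.SetNamespace(namespace)
	vs.SetGenerateName(pvcName + "-")
	// Label with the backup name for easy lookup later.
	vs.SetLabels(map[string]string{"velero.io/backup-name": backupName})
	_ = unstructured.SetNestedField(vs.Object, pvcName, "spec", "source", "persistentVolumeClaimName")
	_ = unstructured.SetNestedField(vs.Object, snapshotClass, "spec", "volumeSnapshotClassName")
	return vs
}
```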
(Diagram omitted: K8s object relationships inside of the backup tarball.)
No special logic is required to restore `VolumeSnapshotClass` objects.
These plugins should be provided with Velero, as there will also be some changes to core Velero code to enable association of a `Backup` to the included `VolumeSnapshot`s.
### Velero server changes
Any non-plugin code changes must be behind an `EnableCSI` feature flag, and the behavior will be opt-in until the feature has exited beta status.
This will allow development on the feature to continue while it is in a pre-production state, while also reducing the need for long-lived feature branches.
[`persistBackup`][8] will be extended to query for all `VolumeSnapshot`s associated with the backup, and persist the list to JSON.
[`BackupStore.PutBackup`][9] will receive an additional argument, `volumeSnapshots io.Reader`, that contains the JSON representation of `VolumeSnapshots`.
This will be written to a file named `csi-snapshots.json.gz`.
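For illustration, the JSON list could be gzipped into an `io.Reader` along the lines of the sketch below; the helper name is hypothetical and the real persistence code may differ.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"encoding/json"
	"io"
)

// encodeToGzippedJSON marshals the given value to JSON and gzips it, returning
// a reader suitable for handing to a persistence layer such as BackupStore.PutBackup.
func encodeToGzippedJSON(v interface{}) (io.Reader, error) {
	buf := &bytes.Buffer{}
	gzw := gzip.NewWriter(buf)
	if err := json.NewEncoder(gzw).Encode(v); err != nil {
		gzw.Close()
		return nil, err
	}
	if err := gzw.Close(); err != nil {
		return nil, err
	}
	return buf, nil
}
```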
[`defaultRestorePriorities`][11] should be rewritten to the following to accommodate proper association between the CSI objects and PVCs. `CustomResourceDefinition`s are moved up because they're necessary for creating the CSI CRDs. The CSI CRDs are created before `PersistentVolume`s and `PersistentVolumeClaim`s so that they may be used as data sources.
GitHub issue [1565][17] represents this work.
```go
var defaultRestorePriorities = []string{
"namespaces",
"storageclasses",
"customresourcedefinitions",
"volumesnapshotclass.snapshot.storage.k8s.io",
"volumesnapshotcontents.snapshot.storage.k8s.io",
"volumesnapshots.snapshot.storage.k8s.io",
"persistentvolumes",
"persistentvolumeclaims",
"secrets",
"configmaps",
"serviceaccounts",
"limitranges",
"pods",
"replicaset",
}
```
### Restic and CSI interaction
Volumes found in a `Pod`'s `backup.velero.io/backup-volumes` list will use Velero's current Restic code path.
This also means Velero will continue to offer Restic as an option for CSI volumes.
The `velero.io/csi-pvc` BackupItemAction plugin will inspect pods in the namespace to ensure that it does not act on PVCs already being backed up by restic.
This is preferred to modifying the PVC due to the fact that Velero's current backup process backs up PVCs and PVs mounted to pods at the same time as the pod.
A drawback to this approach is that we're querying all pods in the namespace per PVC, which could be a large number.
In the future, the plugin interface could be improved to have some sort of context argument, so that additional data such as our existing `resticSnapshotTracker` could be passed to plugins and reduce work.
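A sketch of the restic-exclusion check described above; the helper name is hypothetical.

```go
package main

import (
	"strings"

	corev1 "k8s.io/api/core/v1"
)

// pvcBackedUpByRestic reports whether any pod mounting the PVC lists the
// corresponding volume in its backup.velero.io/backup-volumes annotation,
// in which case the CSI plugin should skip the PVC.
func pvcBackedUpByRestic(pvcName string, pods []corev1.Pod) bool {
	for _, pod := range pods {
		resticVolumes := strings.Split(pod.Annotations["backup.velero.io/backup-volumes"], ",")
		for _, vol := range pod.Spec.Volumes {
			if vol.PersistentVolumeClaim == nil || vol.PersistentVolumeClaim.ClaimName != pvcName {
				continue
			}
			for _, rv := range resticVolumes {
				if strings.TrimSpace(rv) == vol.Name {
					return true
				}
			}
		}
	}
	return false
}
```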
### Garbage collection and deletion
To ensure that all created resources are deleted when a backup expires or is deleted, `VolumeSnapshot`s will have an `ownerRef` defined pointing to the Velero backup that created them.
In order to fully delete these objects, each `VolumeSnapshotContent` object will need to be edited to ensure the associated provider snapshot is deleted.
This will be done by editing the object and setting `VolumeSnapshotContent.Spec.DeletionPolicy` to `Delete`, regardless of whether or not the default policy for the class is `Retain`.
See the Deletion Policies section below.
The edit will happen before making Kubernetes API deletion calls to ensure that the cascade works as expected.
Deleting a Velero `Backup` or any associated CSI object via `kubectl` is unsupported; data will be lost or orphaned if this is done.
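A minimal sketch of the deletion-policy edit, operating on the unstructured form of the object; the helper name is hypothetical, and the updated object would then be written back via the dynamic client before issuing the delete call.

```go
package main

import (
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
)

// setDeletionPolicyToDelete forces the VolumeSnapshotContent's deletion policy
// to Delete so the provider-level snapshot is cleaned up when the object is deleted.
// The argument is assumed to be a VolumeSnapshotContent retrieved from the cluster.
func setDeletionPolicyToDelete(vsc *unstructured.Unstructured) error {
	return unstructured.SetNestedField(vsc.Object, "Delete", "spec", "deletionPolicy")
}
```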
### Other snapshots included in the backup
Since `VolumeSnapshot` and `VolumeSnapshotContent` objects are contained within a Velero backup tarball, it is possible that all CRDs and on-disk provider snapshots have been deleted, yet the CRDs are still within other Velero backup tarballs.
Thus, when a Velero backup that contains these CRDs is restored, the `VolumeSnapshot` and `VolumeSnapshotContent` objects are restored into the cluster, the CSI controllers will attempt to reconcile their state, and there are two possible states when the on-disk snapshot has been deleted:
1) If the driver _does not_ support the `ListSnapshots` gRPC method, then the CSI controllers have no way of knowing how to find the on-disk snapshot, and set the `VolumeSnapshot.Status.readyToUse` field to `true`.
2) If the driver _does_ support the `ListSnapshots` gRPC method, then the CSI controllers will query the state of the on-disk snapshot, see it is missing, and set `VolumeSnapshot.Status.readyToUse` and `VolumeSnapshotContent.Status.readyToUse` fields to `false`.
### Velero client changes
To use CSI features, the Velero client must use the `EnableCSI` feature flag.
[`DescribeBackupStatus`][13] will be extended to download the `csi-snapshots.json.gz` file for processing. GitHub Issue [1568][19] captures this work.
A new `describeCSIVolumeSnapshots` function should be added to the [output][12] package that knows how to render the included `VolumeSnapshot` names referenced in the `csi-snapshots.json.gz` file.
### Snapshot selection mechanism
The most accurate, reliable way to detect if a PersistentVolume is a CSI volume is to check for a non-`nil` [`PersistentVolume.Spec.PersistentVolumeSource.CSI`][16] field.
Using the [`volume.beta.kubernetes.io/storage-provisioner`][14] annotation is not viable, since it is applied to any PVC that should be dynamically provisioned and is _not_ limited to CSI implementations.
It was [introduced with dynamic provisioning support][15] in 2016, predating CSI.
In the `BackupItemAction` for PVCs, the associated PV will be queried and checked for the presence of `PersistentVolume.Spec.PersistentVolumeSource.CSI`.
Volumes with any other `PersistentVolumeSource` set will use Velero's current VolumeSnapshotter plugin code path.
### VolumeSnapshotLocations and VolumeSnapshotClasses
Velero uses its own `VolumeSnapshotLocation` CRDs to specify configuration options for a given storage system.
In Velero, this often includes topology information such as regions or availability zones, as well as credential information.
CSI volume snapshotting has a `VolumeSnapshotClass` CRD which also contains configuration options for a given storage system, but these options are not the same as those that Velero would use.
Since CSI volume snapshotting is operating within the same storage system that manages the volumes already, it does not need the same topology or credential information that Velero does.
As such, when used with CSI volumes, Velero's `VolumeSnapshotLocation` CRDs are not relevant, and could be omitted.
This will create a separate path in our documentation for the time being, and should be called out explicitly.
## Alternatives Considered
* Implementing similar logic in a Velero VolumeSnapshotter plugin was considered.
However, this is inappropriate given CSI's data model, which requires a PVC/PV's StorageClass.
Given the arguments to the VolumeSnapshotter interface, the plugin would have to instantiate its own client and do queries against the Kubernetes API server to get the necessary information.
This is unnecessary given the fact that the `BackupItemAction` and `RestoreItemAction` APIs can act directly on the appropriate objects.
Additionally, the VolumeSnapshotter plugins and CSI volume snapshot drivers overlap - both produce a snapshot on backup and a PersistentVolume on restore.
Thus, there is no logical place to fit VolumeSnapshot creation into the VolumeSnapshotter interface.
* Implement CSI logic directly in Velero core code.
The plugins could be packaged separately, but that doesn't necessarily make sense with server and client changes being made to accommodate CSI snapshot lookup.
* Implementing the CSI logic entirely in external plugins.
As mentioned above, the necessary plugins for `PersistentVolumeClaim`, `VolumeSnapshot`, and `VolumeSnapshotContent` could be hosted out-of-tree from Velero.
In fact, much of the logic for creating the CSI objects will be driven entirely inside of the plugin implementation.
However, Velero currently has no way for plugins to communicate that some arbitrary data should be stored in or retrieved from object storage, such as a list of all `VolumeSnapshot` objects associated with a given `Backup`.
This is important, because to display snapshots included in a backup, whether as native snapshots or Restic backups, separate JSON-encoded lists are stored within the backup on object storage.
Snapshots are not listed directly on the `Backup` to fit within the etcd size limitations.
Additionally, there are no client-side Velero plugin mechanisms, which means that the `velero describe backup --details` command would have no way of displaying the objects to the user, even if they were stored.
## Deletion Policies
In order for underlying, provider-level snapshots to be retained similarly to Velero's current functionality, the `VolumeSnapshotContent.Spec.DeletionPolicy` field must be set to `Retain`.
This is most easily accomplished by setting the `VolumeSnapshotClass.DeletionPolicy` field to `Retain`, which will be inherited by all `VolumeSnapshotContent` objects associated with the `VolumeSnapshotClass`.
The current default for dynamically provisioned `VolumeSnapshotContent` objects is `Delete`, which will delete the provider-level snapshot when the `VolumeSnapshotContent` object representing it is deleted.
Additionally, the `Delete` policy will cascade a deletion of a `VolumeSnapshot`, removing the associated `VolumeSnapshotContent` object.
It is not currently possible to define a deletion policy on a `VolumeSnapshot` that gets passed to a `VolumeSnapshotContent` object on an individual basis.
## Security Considerations
This proposal does not significantly change Velero's security implications within a cluster.
If a deployment is using solely CSI volumes, Velero will no longer need privileges to interact with volumes or snapshots, as these will be handled by the CSI driver.
This reduces the provider permissions footprint of Velero.
Velero must still be able to access cluster-scoped resources in order to back up `VolumeSnapshotContent` objects.
Without these objects, the provider-level snapshots cannot be located in order to re-associate them with volumes in the event of a restore.
Some features may take a while to get fully implemented, and we don't necessarily want to have long-lived feature branches.
A simple feature flag implementation allows code to be merged into master, but not used unless a flag is set.
## Goals
- Allow unfinished features to be present in Velero releases, but only enabled when the associated flag is set.
## Non Goals
- A robust feature flag library.
## Background
When considering the [CSI integration work](https://github.com/heptio/velero/pull/1661), the timelines involved presented a problem in balancing a release and longer-running feature work.
A simple implementation of feature flags can help protect unfinished code while allowing the rest of the changes to ship.
## High-Level Design
A new command-line flag, `--features`, will be added to the root `velero` command.
`--features` will accept a comma-separated list of features, such as `--features EnableCSI,Replication`.
Each feature listed will correspond to a key in a map in `pkg/features/features.go` defining whether a feature should be enabled.
Any code implementing the feature would then import the map and look up the key's value.
For the Velero client, a `features` key can be added to the `config.json` file for more convenient client invocations.
## Detailed Design
A new `features` package will be introduced with these basic structs:
```go
type FeatureFlagSet struct {
	flags map[string]bool
}

type Flags interface {
	// Enabled reports whether or not the specified flag is found.
	Enabled(name string) bool

	// Enable adds the specified flags to the list of enabled flags.
	Enable(names ...string)

	// All returns all enabled features
	All() []string
}

// NewFeatureFlagSet initializes and populates a new FeatureFlagSet
```
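As an illustration of how the package might be wired together and consulted, a sketch follows; the constructor body is not shown above, so its implementation and signature here are assumptions.

```go
// NewFeatureFlagSet sketches the constructor referenced above; the real
// signature may differ. It builds on the FeatureFlagSet type shown earlier.
func NewFeatureFlagSet(names ...string) FeatureFlagSet {
	fs := FeatureFlagSet{flags: map[string]bool{}}
	fs.Enable(names...)
	return fs
}

// Enable adds the given feature names to the set.
func (f FeatureFlagSet) Enable(names ...string) {
	for _, name := range names {
		f.flags[name] = true
	}
}

// Enabled reports whether the named feature was enabled.
func (f FeatureFlagSet) Enabled(name string) bool { return f.flags[name] }

// Example call site guarding CSI-specific behavior:
//
//	flags := NewFeatureFlagSet(strings.Split(featuresFlagValue, ",")...)
//	if flags.Enabled("EnableCSI") {
//		// register CSI plugins, etc.
//	}
```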
# Generating Velero CRDs with structural schema support
As the apiextensions.k8s.io API moves to GA, structural schema in Custom Resource Definitions (CRDs) will become required.
This document proposes updating the CRD generation logic as part of `velero install` to include structural schema for each Velero CRD.
## Goals
- Enable structural schema and validation for Velero Custom Resources.
## Non Goals
- Update Velero codebase to use Kubebuilder for controller/code generation.
- Solve for keeping CRDs in the Velero Helm chart up-to-date.
## Background
Currently, Velero CRDs created by the `velero install` command do not contain any structural schema.
The CRD is simply [generated at runtime](https://github.com/heptio/velero/blob/8b0cf3855c2b8aa631cf22e63da0955f7b1d06a8/pkg/install/crd.go#L39) using the name and plurals from the [`velerov1api.CustomResources()`](https://github.com/heptio/velero/blob/8b0cf3855c2b8aa631cf22e63da0955f7b1d06a8/pkg/apis/velero/v1/register.go#L60) info.
Updating the info returned by that method would be one way to add support for structural schema when generating the CRDs, but this would require manually describing the schema and would duplicate information from the API structs (e.g. comments describing a field).
Instead, the [controller-tools](https://github.com/kubernetes-sigs/controller-tools) project from Kubebuilder provides tooling for generating CRD manifests (YAML) from the Velero API types.
This document proposes adding _controller-tools_ to the project to automatically generate CRDs, and use these generated CRDs as part of `velero install`.
## High-Level Design
_controller-tools_ works by reading the Go files that contain the API type definitions.
It uses a combination of the struct fields, types, tags and comments to build the OpenAPIv3 schema for the CRDs. The tooling makes some assumptions based on conventions followed in upstream Kubernetes and the ecosystem, which involves some changes to the Velero API type definitions, especially around optional fields.
In order for _controller-tools_ to read the Go files containing Velero API type definitions, the CRDs need to be generated at build time, as these files are not available at runtime (i.e. the Go files are not accessible by the compiled binary).
These generated CRD manifests (YAML) will then need to be available to the `pkg/install` package for it to include when installing Velero resources.
## Detailed Design
### Changes to Velero API type definitions
API type definitions need to be updated to correctly identify optional and required fields for each API type.
Upstream Kubernetes defines all optional fields using the `omitempty` tag as well as a `// +optional` annotation above the field (e.g. see [PodSpec definition](https://github.com/kubernetes/api/blob/master/core/v1/types.go#L2835-L2838)).
_controller-tools_ will mark a field as optional if it sees either the tag or the annotation, but to keep consistent with upstream, optional fields will be updated to use both indicators (as [suggested](https://github.com/kubernetes-sigs/kubebuilder/issues/479) by the Kubebuilder project).
Additionally, upstream Kubernetes defines the metav1.ObjectMeta, metav1.ListMeta, Spec and Status as [optional on all types](https://github.com/kubernetes/api/blob/master/core/v1/types.go#L3517-L3531).
Some Velero API types set the `omitempty` tag on Status, but not on other fields - these will all need to be updated to be made optional.
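For illustration, an optional spec field would carry both markers; the type and field below are arbitrary examples, not actual Velero API definitions.

```go
package v1

// ExampleSpec is illustrative only; it shows the convention of pairing the
// `omitempty` JSON tag with the `// +optional` marker for optional fields.
type ExampleSpec struct {
	// IncludedNamespaces is an optional slice of namespace names to include.
	// +optional
	IncludedNamespaces []string `json:"includedNamespaces,omitempty"`

	// A required field carries neither marker.
	Provider string `json:"provider"`
}
```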
Below is a list of the Velero API type fields and what changes (if any) will be made.
Note that this only includes fields used in the spec; all status fields will become optional.
To tie in the CRD manifest generation with existing scripts/workflows, the `hack/update-generated-crd-code.sh` script will be updated to use _controller-tools_ to generate CRDs manifests after it generates the client code.
The generated CRD manifests will be placed in the `pkg/generated/crds/manifests` folder.
Similarly to client code generation, these manifests will be checked-in to the git repo.
Checking in these manifests allows including documentation and schema changes to API types as part of code review.
### Updating `velero install` to include generated CRD manifests
As described above, CRD generation using _controller-tools_ will happen at build time due to the need to inspect Go files.
To enable the `velero install` to access the generated CRD manifests at runtime, the `pkg/generated/crds/manifests` folder will be embedded as binary data in the Velero binary (e.g. using a tool like [vfsgen](https://github.com/shurcooL/vfsgen) - see [POC branch](https://github.com/prydonius/velero/commit/4aa7413f97ce9b23e071b6054f600dd0c283351e)).
`velero install` will then unmarshal the binary data as `unstructured.Unstructured` types and append them to the [resources list](https://github.com/heptio/velero/blob/8b0cf3855c2b8aa631cf22e63da0955f7b1d06a8/pkg/install/resources.go#L217) in place of the existing CRD generation.
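A sketch of how an embedded CRD manifest could be turned into an `unstructured.Unstructured` object before being appended to the resources list; the helper name is hypothetical.

```go
package install

import (
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"sigs.k8s.io/yaml"
)

// crdFromManifest converts one embedded CRD manifest (YAML bytes) into an
// unstructured object that can be appended to the install resource list.
func crdFromManifest(manifest []byte) (*unstructured.Unstructured, error) {
	obj := map[string]interface{}{}
	if err := yaml.Unmarshal(manifest, &obj); err != nil {
		return nil, err
	}
	return &unstructured.Unstructured{Object: obj}, nil
}
```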
## Alternatives Considered
Instead of generating and bundling CRD manifests, it could be possible to instead embed the `pkg/apis` package in the Velero binary.
With this, _controller-tools_ could be run at runtime during `velero install` to generate the CRD manifests.
However, this would require including _controller-tools_ as a dependency in the project, which might not be desirable as it is a developer tool.
Another option, to avoid embedding static files in the binary, would be to generate the CRD manifest as one YAML file in CI and upload it as a release artifact (e.g. using GitHub releases).
`velero install` could then download this file for the current version and use it on install.
The downside here is that `velero install` becomes dependent on the GitHub network, and we lose visibility on changes to the CRD manifests in the Git history.
# Plan for moving the Velero GitHub repo into the VMware GitHub organization
Currently, the Velero repository sits under the Heptio GitHub organization. With the acquisition of Heptio by VMware, it is due time that this repo moves to one of the VMware GitHub organizations. This document outlines a plan to move this repo to the VMware Tanzu (https://github.com/vmware-tanzu) organization.
## Goals
- List all steps necessary to have this repo fully functional under the new org
## Non Goals
- Highlight any step necessary around setting up the new organization and its members
## Action items
### Todo list
#### Pre move
- [ ] PR: Blog post communicating the move. https://github.com/heptio/velero/issues/1841. Who: TBD.
- [ ] PR: Find/replace in all Go, script, yaml, documentation, and website files: `github.com/heptio/velero -> github.com/vmware-tanzu/velero`. Who: a Velero developer; TBD
- [ ] PR: Update website with the correct GH links. Who: a Velero developer; TBD
- [ ] PR: Change deployment and grpc-push scripts with the new location path. Who: a Velero developer; TBD
- [ ] Delete branches not to be carried over (https://github.com/heptio/velero/branches/all). Who: Any of the current repo owners; TBD
#### Move
- [ ] Use GH UI to transfer the repository to the VMW org; must be accepted within a day. Who: new org owner; TBD
- [ ] Make owners of this repo owners of repo in the new org. Who: new org owner; TBD
- [ ] Update Travis CI. Who: Any of the new repo owners; TBD
- [ ] Add DCO for signoff check (https://probot.github.io/apps/dco/). Who: Any of the new repo owners; TBD
#### Post move
- [ ] Each individual developer should point their origin to the new location: `git remote set-url origin git@github.com:vmware-tanzu/velero.git`
- [ ] Transfer ZenHub. Who: Any of the new repo owners; TBD
- [ ] Update Netlify deploy settings. Any of the new repo owners; TBD
- [ ] GH app: Netlify integration. Who: Any of the new repo owners; TBD
- [ ] GH app: Slack integration. Who: Any of the new repo owners; TBD
- [ ] Add webhook: travis CI. Who: Any of the new repo owners; TBD
- [ ] Add webhook: zenhub. Who: Any of the new repo owners; TBD
- [ ] Move all 3 native provider plugins into their own individual repo. https://github.com/heptio/velero/issues/1537. Who: @carlisia.
- [ ] Merge PRs from the "pre move" section
- [ ] Create a team for the Velero core members (https://github.com/orgs/vmware-tanzu/teams/). Who: Any of the new repo owners; TBD
### Notes/How-Tos
#### Transferring the GH repository
All action items needed for the repo transfer are listed in the Todo list above. For details about what gets moved and other info, this is the GH documentation: https://help.github.com/en/articles/transferring-a-repository
[Pending] We will find out this week who will be the organization owner(s) who will accept this transfer in the new GH org. This organization owner will make all current owners in this repo owners in the new org Velero repo.
#### Updating Travis CI
Someone with owner permission on the new repository needs to go to their Travis CI account and authorize Travis CI on the repo. Here are instructions: https://docs.travis-ci.com/user/tutorial/.
After this, webhook notifications can be added following these instructions: https://docs.travis-ci.com/user/notifications/#configuring-webhook-notifications.
#### Transferring ZenHub
Pre-requisite: A new Zenhub account must exist for a vmware or vmware-tanzu organization.
This page contains a pre-migration checklist for ensuring a repo migration goes well with Zenhub: https://help.zenhub.com/support/solutions/articles/43000010366-moving-a-repo-cross-organization-or-to-a-new-organization. After this, webhooks can be added by following these instructions: https://github.com/ZenHubIO/API#webhooks.
#### Updating Netlify
The settings for Netlify should remain the same, except that it now needs to be installed in the new repo. The instructions on how to install Netlify on the new repo are here: https://www.netlify.com/docs/github-permissions/.
#### Communication strategy
[Pending] We will find out this week how this move will be communicated to the community. In particular, the Velero repository move might be tied to the move of our provider plugins into their own repos, also in the new org: https://github.com/heptio/velero/issues/1814.
#### TBD
Many items on the todo list must be done by a repository member with owner permission. Obviously, this doesn't all need to be done by the same person, but we should specify whether @skriss wants to split these tasks with any other owner(s).
#### Other notes
Might want to exclude updating documentation prior to v1.0.0.
GH documentation does not specify if branches on the server are also moved.
All links to the original repository location are automatically redirected to the new location.
## Alternatives Considered
Alternatives such as moving Velero to its own organization, or even not moving at all, were considered. Collectively, however, the open source leadership decided it would be best to move it so it lives alongside other VMware supported cloud native related repositories.
## Security Considerations
- Ensure that only the Velero core team has maintainer/owner privileges.
# Plan to extract the provider plugins out of (the Velero) tree
Currently, the Velero project contains in-tree plugins for three cloud providers: AWS, Azure, and GCP. The Velero team has decided to extract each of those plugins into their own separate repository. This document details the steps necessary to create the new repositories, as well as a general design for what each plugin project will look like.
## Goals
- Have 3 new repositories for each cloud provider plugin currently supported by the Velero team: AWS, Azure, and GCP
- Have the currently in-tree cloud provider plugins behave like any other plugin external to Velero
## Non Goals
- Extend the Velero plugin framework capability in any way
- Create GH repositories for any plugin other than the 3 currently in-tree plugins
- Extract out any plugin that is not a cloud provider plugin (ex: item action related plugins)
## Background
With more and more providers wanting to support Velero, it becomes harder to justify excluding their plugins from being in-tree when the three original ones are. At the same time, if we were to include any more plugins in-tree, it would ultimately become the responsibility of the Velero team to maintain an increasing number of plugins. This move aims to level the field so all plugins are treated equally. We also hope that, with time, developers interested in getting involved in the upkeep of those plugins will become active enough to be promoted to maintainers. Lastly, having the plugins live in their own individual repositories allows for iterating on them separately from the core codebase.
## Action items
### Todo list
#### Repository creation
- [ ] Use GH UI to create each repository in the new VMW org. Who: new org owner; TBD
- [ ] Make owners of the Velero repo owners of each repo in the new org. Who: new org owner; TBD
- [ ] Add Travis CI. Who: Any of the new repo owners; TBD
- [ ] Add webhook: travis CI. Who: Any of the new repo owners; TBD
- [ ] Add DCO for signoff check (https://probot.github.io/apps/dco/). Who: Any of the new repo owners; TBD
#### Plugin changes
- [ ] Modify Velero so it can install any of the provider plugins. https://github.com/heptio/velero/issues/1740 - Who: @nrb
- [ ] Extract each provider plugin into their own repo. https://github.com/heptio/velero/issues/1537
- [ ] Create deployment and gcr-push scripts with the new location path. Who: @carlisia
- [ ] Add documentation for how to use the plugin. Who: @carlisia
- [ ] Update Helm chart to install Velero using any of the provider plugins. https://github.com/heptio/velero/issues/1819
[Pending] The organization owner will make all current owners in the Velero repo also owners in each of the new org plugin repos.
#### Setting up Travis CI
Someone with owner permission on the new repository needs to go to their Travis CI account and authorize Travis CI on the repo. Here are instructions: https://docs.travis-ci.com/user/tutorial/.
After this, any webhook notifications can be added following these instructions: https://docs.travis-ci.com/user/notifications/#configuring-webhook-notifications.
## High-Level Design
Each provider plugin will be an independent project, using the Velero library to implement their specific functionalities.
The way Velero is installed will be changed to accommodate installing these plugins at deploy time, namely the Velero `install` command, as well as the Helm chart.
Each plugin repository will need to have their respective images built and pushed to the same registry as the Velero images.
## Detailed Design
### Projects
Each provider plugin will be an independent GH repository, named: `velero-plugin-aws`, `velero-plugin-azure`, and `velero-plugin-gcp`.
Build of the project will be done the same way as with Velero, using Travis.
Images for all the plugins will be pushed to the same repository as the Velero image, also using Travis.
Releases of each of these plugins will happen in sync with releases of Velero. Starting at v1.2, this will consist of a tag in the repo and a tagged image built with the same release version as Velero, making it easy to identify which versions are compatible.
Documentation for how to install and use the plugins will be augmented in the existing Plugins section of the Velero documentation.
Documentation for how to use each plugin will reside in their respective repos. The navigation on the Velero documentation will be modified for easy discovery of the docs/images for these plugins.
#### Version compatibility
We will keep the major and minor release points in sync, but the plugins can have multiple patch releases as long as they remain compatible with the corresponding major/minor release of Velero. Ex:
| Velero | Plugin | Compatible? |
|---|---|---|
| v1.2 | v1.2 | ✅ |
| v1.2 | v1.2.3 | ✅ |
| v1.2 | v1.3 | 🚫 |
| v1.3 | v1.2 | 🚫 |
| v1.3 | v1.3.3 | ✅ |
### Installation
As per https://github.com/heptio/velero/issues/1740, we will add a `plugins` flag to the Velero install command which will accept an array of URLs pointing to one or more plugin images to be installed. The `velero plugin add` command should continue working as is; specifically, it should also allow the installation of any of the 3 new provider plugins. @nrb will provide specifics about how this change will be tackled, as well as what will be documented. Part of the work of adding the `plugins` flag will be removing the logic that adds `velero.io` name spacing to plugins that are added without it.
The Helm chart that allows the installation of Velero will be modified to accept the array of plugin images with an added `plugins` configuration item.
### Design code changes and considerations
The naming convention to use for name spacing each plugin will be `velero.io`, since they are currently maintained by the Velero team.
Install dep
Question: are there any places outside the plugins where we depend on the cloud-provider SDKs? Can we eliminate those dependencies too?
- The `restic` package uses the `aws` SDK to get the bucket region for the AWS object store (https://github.com/carlisia/velero/blob/32d46871ccbc6b03e415d1e3d4ad9ae2268b977b/pkg/restic/config.go#L41).
- Could not find usage of the cloud provider SDKs anywhere else.
Plugins such as the pod -> pvc -> pv backupitemaction ones make sense to stay in the core repo as they provide some important logic that just happens to be implemented in a plugin.
### Upgrade
The documentation for how to fresh install the out-of-tree plugin with Velero v1.2 will be specified together with the documentation for the install changes on issue https://github.com/heptio/velero/issues/1740.
For upgrades, we will provide a script that will:
- change the tag on the Velero deployment yaml for both the main image and any of the three plugins installed.
- rename existing aws, azure or gcp plugin names to have the `velero.io/` namespace preceding the name (ex: `velero.io/aws`).
Alternatively, we could add a CLI `velero upgrade` command that would make these changes. Ex: `velero upgrade 1.3` would upgrade from `v1.2` to `v1.3`.
For upgrading:
- Edit the provider field in the backupstoragelocations and volumesnapshotlocations CRDs to include the new namespace.
## Alternatives Considered
We considered having the plugins all live in the same GH repository. The downside of that approach is ending up with a binary and image bigger than necessary, since they would contain the SDKs of all three providers.
## Security Considerations
- Ensure that only the Velero core team has maintainer/owner privileges.
Velero supports restoring resources into different namespaces than they were backed up from.
This enables a user to, among other things, clone a namespace within a cluster.
However, if the namespace being cloned uses persistent volume claims, Velero cannot currently create a second copy of the original persistent volume when restoring.
This limitation is documented in detail in [issue #192](https://github.com/heptio/velero/issues/192).
This document proposes a solution that allows new copies of persistent volumes to be created during a namespace clone.
## Goals
- Enable persistent volumes to be cloned when using `velero restore create --namespace-mappings ...` to create a second copy of a namespace within a cluster.
## Non Goals
- Cloning of persistent volumes in any scenario other than when using `velero restore create --namespace-mappings ...` flag.
## High-Level Design
During a restore, Velero will detect that it needs to assign a new name to a persistent volume being restored if and only if both of the following conditions are met:
- the persistent volume is claimed by a persistent volume claim in a namespace that's being remapped using `velero restore create --namespace-mappings ...`
- a persistent volume already exists in the cluster with the original name
If these conditions exist, Velero will give the persistent volume a new arbitrary name before restoring it.
It will also update the `spec.volumeName` of the related persistent volume claim.
## Detailed Design
In `pkg/restore/restore.go`, around [line 872](https://github.com/heptio/velero/blob/master/pkg/restore/restore.go#L872), Velero has special-case code for persistent volumes.
This code will be updated to check for the two preconditions described in the previous section.
If the preconditions are met, the object will be given a new name.
The persistent volume will also be annotated with the original name, e.g. `velero.io/original-pv-name=NAME`.
Importantly, the name change will occur **before** [line 890](https://github.com/heptio/velero/blob/master/pkg/restore/restore.go#L890), where Velero checks to see if it should restore the persistent volume.
Additionally, the old and new persistent volume names will be recorded in a new field that will be added to the `context` struct, `renamedPVs map[string]string`.
In the special-case code for persistent volume claims starting on [line 987](https://github.com/heptio/velero/blob/master/pkg/restore/restore.go#L987), Velero will check to see if the claimed persistent volume has been renamed by looking in `ctx.renamedPVs`.
If so, Velero will update the persistent volume claim's `spec.volumeName` to the new name.
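A rough sketch of the rename logic follows; the function and argument names are hypothetical, and the real code would live in the restore context described above.

```go
package restore

import (
	"fmt"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/util/rand"
)

// maybeRenamePV sketches the behavior described above: if the PV is claimed from
// a remapped namespace and a PV with the same name already exists in the cluster,
// give it a new arbitrary name, annotate it with the original name, and record
// the mapping so the PVC's spec.volumeName can be fixed up later.
func maybeRenamePV(obj *unstructured.Unstructured, claimedFromRemappedNamespace, pvExistsInCluster bool, renamedPVs map[string]string) {
	if !claimedFromRemappedNamespace || !pvExistsInCluster {
		return
	}
	oldName := obj.GetName()
	newName := fmt.Sprintf("%s-renamed-%s", oldName, rand.String(5))

	annotations := obj.GetAnnotations()
	if annotations == nil {
		annotations = map[string]string{}
	}
	annotations["velero.io/original-pv-name"] = oldName
	obj.SetAnnotations(annotations)

	obj.SetName(newName)
	renamedPVs[oldName] = newName
}
```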
## Alternatives Considered
One alternative approach is to add a new CLI flag and API field for restores, e.g. `--clone-pvs`, that a user could provide to indicate they want to create copies of persistent volumes.
This approach would work fine, but it does require the user to be aware of this flag/field and to properly specify it when needed.
It seems like a better UX to detect the typical conditions where this behavior is needed, and to automatically apply it.
Additionally, the design proposed here does not preclude such a flag/field from being added later, if it becomes necessary to cover other use cases.
# Progress reporting for restic backups and restores
Status: Accepted
During long-running restic backups/restores, there is no visibility into what (if anything) is happening, making it hard to know if the backup/restore is making progress or hung, how long the operation might take, etc.
We should capture progress during restic operations and make it user-visible so that it's easier to reason about.
This document proposes an approach for capturing progress of backup and restore operations and exposing this information to users.
## Goals
- Provide basic visibility into restic operations to inform users about their progress.
## Non Goals
- Capturing progress for non-restic backups and restores.
## Background
(Omitted, see introduction)
## High-Level Design
### restic backup progress
The `restic backup` command provides progress reporting to stdout in JSON format, which includes the completion percentage of the backup.
This progress will be read on some interval and the PodVolumeBackup Custom Resource's (CR) status will be updated with this information.
### restic restore progress
The `restic stats` command returns the total size of a backup.
This can be compared with the total size of the volume periodically to calculate the completion percentage of the restore.
The PodVolumeRestore CR's status will be updated with this information.
## Detailed Design
### Changes to PodVolumeBackup and PodVolumeRestore Status type
A new `Progress` field will be added to PodVolumeBackupStatus and PodVolumeRestoreStatus of type `PodVolumeOperationProgress`:
```go
type PodVolumeOperationProgress struct {
TotalBytes int64
BytesDone int64
}
```
### restic backup progress
restic added support for [streaming JSON output for the `restic backup` command](https://github.com/restic/restic/pull/1944) in 0.9.5.
Our current images ship restic 0.9.4, and so the Dockerfile will be updated to pull the new version: https://github.com/heptio/velero/blob/af4b9373fc73047f843cd4bc3648603d780c8b74/Dockerfile-velero#L21.
With the `--json` flag, `restic backup` outputs single lines of JSON reporting the status of the backup:
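For reference, an illustrative status line is shown below; the exact set of fields depends on the restic version, but `total_bytes` and `bytes_done` are the properties used in the rest of this section.

```
{"message_type":"status","percent_done":0.42,"total_files":512,"total_bytes":10558111744,"bytes_done":4434406932}
```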
The [command factory for backup](https://github.com/heptio/velero/blob/af4b9373fc73047f843cd4bc3648603d780c8b74/pkg/restic/command_factory.go#L37) will be updated to include the `--json` flag.
The code to run the `restic backup` command (https://github.com/heptio/velero/blob/af4b9373fc73047f843cd4bc3648603d780c8b74/pkg/controller/pod_volume_backup_controller.go#L241) will be changed to include a Goroutine that reads from the command's stdout stream.
The implementation of this will largely follow [@jmontleon's PoC](https://github.com/fusor/velero/pull/4/files) of this.
The Goroutine will periodically read the stream (every 10 seconds) and get the last printed status line, which will be parsed as JSON.
If `bytes_done` is empty, restic has not finished scanning the volume and hasn't calculated the `total_bytes`.
In this case, we will not update the PodVolumeBackup and instead will wait for the next iteration.
Once we get a non-zero value for `bytes_done`, the `bytes_done` and `total_bytes` properties will be read and the PodVolumeBackup will be patched to update `status.Progress.BytesDone` and `status.Progress.TotalBytes` respectively.
Once the backup has completed successfully, the PodVolumeBackup will be patched to set `status.Progress.BytesDone = status.Progress.TotalBytes`.
This is done since the main thread may cause early termination of the Goroutine once the operation has finished, preventing a final update to the `BytesDone` property.
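A sketch of how the stdout-reading Goroutine might be structured; the type and helper names are hypothetical, and the real controller code will differ.

```go
package main

import (
	"bufio"
	"encoding/json"
	"io"
	"time"
)

// resticStatus mirrors the fields of interest from restic's --json status lines.
type resticStatus struct {
	TotalBytes int64 `json:"total_bytes"`
	BytesDone  int64 `json:"bytes_done"`
}

// trackProgress keeps the most recent status line from restic's stdout and,
// on a ticker, hands it to a caller-supplied update function (which would
// patch the PodVolumeBackup).
func trackProgress(stdout io.Reader, update func(resticStatus), done <-chan struct{}) {
	lines := make(chan string)
	go func() {
		scanner := bufio.NewScanner(stdout)
		for scanner.Scan() {
			lines <- scanner.Text()
		}
		close(lines)
	}()

	ticker := time.NewTicker(10 * time.Second)
	defer ticker.Stop()

	var last string
	for {
		select {
		case line, ok := <-lines:
			if !ok {
				lines = nil // disable this case once the stream ends
				continue
			}
			last = line
		case <-ticker.C:
			var status resticStatus
			if last == "" || json.Unmarshal([]byte(last), &status) != nil {
				continue
			}
			if status.BytesDone == 0 {
				// restic is still scanning; wait for the next iteration.
				continue
			}
			update(status)
		case <-done:
			return
		}
	}
}
```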
### restic restore progress
The `restic stats <snapshot_id> --json` command provides information about the size of backups:
```
{"total_size":10558111744,"total_file_count":11}
```
Before beginning the restore operation, we can use the output of `restic stats` to get the total size of the backup.
The PodVolumeRestore will be patched to set `status.Progress.TotalBytes` to the total size of the backup.
The code to run the `restic restore` command will be changed to include a Goroutine that periodically (every 10 seconds) gets the current size of the volume.
To get the current size of the volume, we will recursively walk through all files in the volume to accumulate the total size.
The current total size is the number of bytes transferred so far and the PodVolumeRestore will be patched to update `status.Progress.BytesDone`.
Once the restore has completed successfully, the PodVolumeRestore will be patched to set `status.Progress.BytesDone = status.Progress.TotalBytes`.
This is done since the main thread may cause early termination of the Goroutine once the operation has finished, preventing a final update to the `BytesDone` property.
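A minimal sketch of the volume-size calculation using the standard library; the helper name is hypothetical.

```go
package main

import (
	"os"
	"path/filepath"
)

// volumeSize walks the volume path and sums regular file sizes, giving the
// BytesDone value described above.
func volumeSize(path string) (int64, error) {
	var total int64
	err := filepath.Walk(path, func(_ string, info os.FileInfo, err error) error {
		if err != nil {
			return err
		}
		if info.Mode().IsRegular() {
			total += info.Size()
		}
		return nil
	})
	return total, err
}
```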
### Velero CLI changes
The output that describes detailed information about [PodVolumeBackups](https://github.com/heptio/velero/blob/559d62a2ec99f7a522924348fc4a173a0699813a/pkg/cmd/util/output/backup_describer.go#L349) and [PodVolumeRestores](https://github.com/heptio/velero/blob/559d62a2ec99f7a522924348fc4a173a0699813a/pkg/cmd/util/output/restore_describer.go#L160) will be updated to calculate and display a completion percentage from `status.Progress.TotalBytes` and `status.Progress.BytesDone` if available.
## Open Questions
- Can we assume that the volume we are restoring in will be empty? Can it contain other artefacts?
- Based on discussion in this PR, we are okay making the assumption that the PVC is empty and will proceed with the above proposed approach.
## Alternatives Considered
### restic restore progress
If we cannot assume that the volume we are restoring into will be empty, we can instead use the output from `restic snapshot` to get the list of files in the backup.
This can then be used to calculate the current total size of just those files in the volume, so that we avoid considering any other files unrelated to the backup.
The above proposed approach is simpler than this one, as we don't need to keep track of each file in the backup, but this will be more robust if the volume could contain other files not included in the backup.
It's possible that certain volume types may contain hidden files that could contribute to the total size of the volume, though these should be small enough that the BytesDone calculation will only be slightly inflated.
Another option is to contribute progress reporting similar to `restic backup` for `restic restore` upstream.
This may take more time, but would give us a more native view on the progress of a restore.
There are several issues about this already in the restic repo (https://github.com/restic/restic/issues/426, https://github.com/restic/restic/issues/1154), and what looks like an abandoned attempt (https://github.com/restic/restic/pull/2003) which we may be able to pick up.
The YAML config files in this directory can be used to quickly deploy a containerized Ark deployment.
This directory contains sample YAML config files that can be used for exploring Velero.
* `common/`: Contains manifests to set up Ark. Can be used across cloud provider platforms. (Note that Azure requires its own deployment file due to its unique way of loading credentials).
* `minio/`: Used in the [Quickstart][0] to set up [Minio][1], a local S3-compatible object storage service. It provides a convenient way to test Velero without tying you to a specific cloud provider.
* `nginx-app/`: A sample nginx app that can be used to test backups and restores.
* `aws/`, `azure/`, `gcp/`, `ibm/`: Contains manifests specific to the given cloud provider's setup.
# replace known version-specific links -- the sed syntax is slightly different in OS X and Linux,
# so check which OS we're running on.
if [[ $(uname) == "Darwin" ]]; then
echo "[OS X] updating version-specific links"
find site/docs/${NEW_DOCS_VERSION} -type f -name "*.md" | xargs sed -i '' "s|https://velero.io/docs/master|https://velero.io/docs/$NEW_DOCS_VERSION|g"
find site/docs/${NEW_DOCS_VERSION} -type f -name "*.md" | xargs sed -i '' "s|https://github.com/vmware-tanzu/velero/blob/master|https://github.com/vmware-tanzu/velero/blob/$NEW_DOCS_VERSION|g"
echo "[OS X] Updating latest version in _config.yml"
sed -i '' "s/latest: ${PREVIOUS_DOCS_VERSION}/latest: ${NEW_DOCS_VERSION}/" site/_config.yml
# newlines and lack of indentation are requirements for this sed syntax
# which is doing an append
echo "[OS X] Adding latest version to versions list in _config.yml"
sed -i '' "/- master/a\\
- ${NEW_DOCS_VERSION}
" site/_config.yml
echo "[OS X] Adding ToC mapping entry"
sed -i '' "/master: master-toc/a\\
${NEW_DOCS_VERSION}: ${NEW_DOCS_TOC}
" site/_data/toc-mapping.yml
else
echo "[Linux] updating version-specific links"
find site/docs/${NEW_DOCS_VERSION} -type f -name "*.md" | xargs sed -i'' "s|https://velero.io/docs/master|https://velero.io/docs/$NEW_DOCS_VERSION|g"
find site/docs/${NEW_DOCS_VERSION} -type f -name "*.md" | xargs sed -i'' "s|https://github.com/vmware-tanzu/velero/blob/master|https://github.com/vmware-tanzu/velero/blob/$NEW_DOCS_VERSION|g"
echo "[Linux] Updating latest version in _config.yml"
sed -i'' "s/latest: ${PREVIOUS_DOCS_VERSION}/latest: ${NEW_DOCS_VERSION}/" site/_config.yml
echo "[Linux] Adding latest version to versions list in _config.yml"
sed -i'' "/- master/a - ${NEW_DOCS_VERSION}" site/_config.yml
echo "[Linux] Adding ToC mapping entry"
sed -i'' "/master: master-toc/a ${NEW_DOCS_VERSION}: ${NEW_DOCS_TOC}" site/_data/toc-mapping.yml
fi
echo "Success! site/docs/$NEW_DOCS_VERSION has been created."
echo ""
echo "The next steps are:"
echo " 1. Consult site/README-JEKYLL.md for further manual steps required to finalize the new versioned docs generation."
echo " 2. Run a 'git diff' to review all changes made to the docs since the previous version."
echo " 3. Make any manual changes/corrections necessary."
echo " 4. Run 'git add' to stage all unstaged changes, then 'git commit'."
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
package v1

const (
	// DefaultNamespace is the Kubernetes namespace that is used by default for
	// the Ark server and API objects.
	DefaultNamespace = "heptio-ark"

	// ResourcesDir is a top-level directory expected in backups which contains sub-directories
	// for each resource type in the backup.
	ResourcesDir = "resources"

	// RestoreLabelKey is the label key that's applied to all resources that
	// are created during a restore. This is applied for ease of identification
	// of restored resources. The value will be the restore's name.
	RestoreLabelKey = "ark-restore"

	// ClusterScopedDir is the name of the directory containing cluster-scoped
	// resources within an Ark backup.
	ClusterScopedDir = "cluster"

	// NamespaceScopedDir is the name of the directory containing namespace-scoped
	// resource within an Ark backup.
	NamespaceScopedDir = "namespaces"
)