* restic backup and restore progress proposal
Signed-off-by: Adnan Abdulhussein <aadnan@vmware.com>
* update approach to lookup based on files rather than total size
Signed-off-by: Adnan Abdulhussein <aadnan@vmware.com>
* typo
Signed-off-by: Adnan Abdulhussein <aadnan@vmware.com>
* record TotalBytes and BytesDone in PodVolumeBackup/Restore and calculate percentage in UI
Signed-off-by: Adnan Abdulhussein <aadnan@vmware.com>
* update to go with restic stats approach for restore
Signed-off-by: Adnan Abdulhussein <aadnan@vmware.com>
* typo
Signed-off-by: Adnan Abdulhussein <aadnan@vmware.com>
* update high-level design for restic stats
Signed-off-by: Adnan Abdulhussein <aadnan@vmware.com>
* wait for bytes_done
Signed-off-by: Adnan Abdulhussein <aadnan@vmware.com>
* change status to accepted
Signed-off-by: Adnan Abdulhussein <aadnan@vmware.com>
* patch velero to handle self-signed certs on client
you'll get this error otherwise:
x509: certificate signed by unknown authority
Signed-off-by: Steven Chung <schung@d2iq.com>
The Velero deployment did not have a way of exposing the namespace it
was installed in to the API client. This is a problem for plugins that
need to query for resources in that namespace, such as the restic
restore process that needs to find PodVolume(Backup|Restore)s.
While the Velero client is consulted for a configured namespace, this
cannot be set in the server pod since there is no valid home directory
in which to place it.
This change provides the namespace to the deployment via the downward
API, and updates the API client factory to use the VELERO_NAMESPACE
before looking at the config file, so that any plugins using the client
will look at the appropriate namespace.
Fixes #1743
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
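A minimal sketch of the lookup order this describes (the helper and its fallback parameter are hypothetical; only the VELERO_NAMESPACE variable comes from the change itself):
```go
package client

import "os"

// veleroNamespace returns the namespace the client should operate in:
// the VELERO_NAMESPACE env var (injected into the deployment via the
// downward API) wins, with the config-file value as the fallback.
func veleroNamespace(configFileNS string) string {
	if ns := os.Getenv("VELERO_NAMESPACE"); ns != "" {
		return ns
	}
	return configFileNS
}
```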
* Add SEO functionality to the site
Signed-off-by: Jonas Rosland <jrosland@vmware.com>
* Add some more SEO
Signed-off-by: Jonas Rosland <jrosland@vmware.com>
* use Backup CR labels as tags for snapshots
This allows users to define custom tags to be added to snapshots, by
specifying custom labels on the Backup CR with the `velero backup create
--labels` flag.
Signed-off-by: Adnan Abdulhussein <aadnan@vmware.com>
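A rough sketch of how such tagging can work (the tag key and helper are illustrative, not the exact Velero code):
```go
package backup

// snapshotTags merges the Backup CR's labels into the tag set passed
// to the cloud provider when a volume snapshot is created, so labels
// set with `velero backup create --labels` end up on the snapshot.
func snapshotTags(backupName string, backupLabels map[string]string) map[string]string {
	tags := make(map[string]string, len(backupLabels)+1)
	for k, v := range backupLabels {
		tags[k] = v
	}
	tags["velero.io/backup"] = backupName // key illustrative
	return tags
}
```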
* print resource list metadata in velero backup describe details
Signed-off-by: Adnan Abdulhussein <aadnan@vmware.com>
* rewrite TestGetDownloadURL to test more scenarios
Signed-off-by: Adnan Abdulhussein <aadnan@vmware.com>
* move backup printer helpers to backup_printer.go
Signed-off-by: Adnan Abdulhussein <aadnan@vmware.com>
* move describe printer helpers back to backup_describer
and rename them with a describe* prefix to indicate that they are used for the describe command
Signed-off-by: Adnan Abdulhussein <aadnan@vmware.com>
* split backup and restore tests for TestGetDownloadURL
Signed-off-by: Adnan Abdulhussein <aadnan@vmware.com>
* friendlier error message when backup resource list missing
Signed-off-by: Adnan Abdulhussein <aadnan@vmware.com>
* periodically check for stale restic repo locks
Signed-off-by: Steve Kriss <krisss@vmware.com>
* changelog
Signed-off-by: Steve Kriss <krisss@vmware.com>
* only try to init a restic repo if it doesn't already exist
Signed-off-by: Steve Kriss <krisss@vmware.com>
* reword comment
Signed-off-by: Steve Kriss <krisss@vmware.com>
* migrate pkg/backup restic tests to new structure
Signed-off-by: Steve Kriss <krisss@vmware.com>
* rename backup_new_test.go to backup_test.go
Signed-off-by: Steve Kriss <krisss@vmware.com>
* use pod volume backup builder
Signed-off-by: Steve Kriss <krisss@vmware.com>
* remove docs header about project rename
Signed-off-by: Steve Kriss <krisss@vmware.com>
* remove Upgrade to 1.0 page from master TOC
Signed-off-by: Steve Kriss <krisss@vmware.com>
* standardize TOC titles
Signed-off-by: Steve Kriss <krisss@vmware.com>
* add backup/restore docs for undocumented features
Signed-off-by: Steve Kriss <krisss@vmware.com>
Without the incremental switch, live rebuilds were taking 20s (on a
Linux host) for a single-line change. With the switch, they dropped to
8s.
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
Flags specifying the kubeconfig or kubecontext to use weren't actually
being used by the install command.
Fixes #1651
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
* Remove references to apps/v1beta1 API group
In Kubernetes v1.16, the apps/v1 API group will be the default served
for relevant resources.
Update any references to apps/v1beta1 for forwards compatibility.
Fixes #1672
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
* Update API group on plugin commands
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
* add restore item action to change PVC/PV storage class name
Signed-off-by: Steve Kriss <krisss@vmware.com>
* code review
Signed-off-by: Steve Kriss <krisss@vmware.com>
* change existing plugin names to lowercase/hyphenated
Signed-off-by: Steve Kriss <krisss@vmware.com>
* add validation for new storage class name
Signed-off-by: Steve Kriss <krisss@vmware.com>
* add test cases
Signed-off-by: Steve Kriss <krisss@vmware.com>
* changelog
Signed-off-by: Steve Kriss <krisss@vmware.com>
* fix imports
Signed-off-by: Steve Kriss <krisss@vmware.com>
* update plugin names to be more consistent
Signed-off-by: Steve Kriss <krisss@vmware.com>
* update unit tests to use pkg/test object constructors
Signed-off-by: Steve Kriss <krisss@vmware.com>
CSI volumes are mounted one level deeper than "native" kubernetes
volumes, and this needs to be appended for proper restic support.
Fixes #1313.
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
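A sketch of the path adjustment this implies (the kubelet base path is shown for illustration; the real code derives it differently):
```go
package restic

import "path/filepath"

// podVolumePath builds the host path for a pod volume. CSI volumes
// keep their data one level deeper, in a "mount" subdirectory, than
// in-tree ("native") volumes do.
func podVolumePath(podUID, pluginDir, volumeName string, isCSI bool) string {
	p := filepath.Join("/var/lib/kubelet/pods", podUID, "volumes", pluginDir, volumeName)
	if isCSI {
		p = filepath.Join(p, "mount")
	}
	return p
}
```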
* allow users to specify additional Velero/restic pod annotations on the command line with the pod-annotations flag
Signed-off-by: Traci Kamp <traci.kamp@gmail.com>
* record PodVolumeBackup start and completion timestamps
adds startTimestamp and completionTimestamp fields to the
PodVolumeBackup status spec
Signed-off-by: Adnan Abdulhussein <aadnan@vmware.com>
* record PodVolumeRestore start and completion timestamps
Signed-off-by: Adnan Abdulhussein <aadnan@vmware.com>
* migrate hooks tests
Signed-off-by: Steve Kriss <krisss@vmware.com>
* add more test cases
Signed-off-by: Steve Kriss <krisss@vmware.com>
* refactor to strongly typed expectation, add pre+post hook test case
Signed-off-by: Steve Kriss <krisss@vmware.com>
* allow exclusion of resources using standard label
excludes any resources with the velero.io/exclude-from-backup=true label
Signed-off-by: Adnan Abdulhussein <aadnan@vmware.com>
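A minimal sketch of the check (the label key is from the change above; the helper itself is illustrative):
```go
package backup

const excludeFromBackupLabel = "velero.io/exclude-from-backup"

// excludedFromBackup reports whether an item opted out of backups via
// the standard exclusion label.
func excludedFromBackup(labels map[string]string) bool {
	return labels[excludeFromBackupLabel] == "true"
}
```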
* ensure backup item action modifications reflected in tarball filepath
This patch ensures the updated backup item's name and namespace are used
when constructing the filepath for the tarball.
Signed-off-by: Adnan Abdulhussein <aadnan@vmware.com>
* changelog
Signed-off-by: Adnan Abdulhussein <aadnan@vmware.com>
* added top spacing and reduced bottom
Signed-off-by: Ross Fenrick <rossf@heptio.com>
* added margin to h3's
Signed-off-by: Ross Fenrick <rossf@heptio.com>
* Add restic instructions for Enterprise PKS
Signed-off-by: Stephen Carter <carters@vmware.com>
* Add instructions to v1.0.0 docs
Signed-off-by: Stephen Carter <carters@vmware.com>
This fix initialises an empty map if the request object's Labels map
is nil, allowing the controller to later add and modify labels on the
object.
Signed-off-by: Adnan Abdulhussein <aadnan@vmware.com>
* migrate and enhance backup item action tests
Signed-off-by: Steve Kriss <krisss@vmware.com>
* migrate terminating resource test
Signed-off-by: Steve Kriss <krisss@vmware.com>
* add godoc for test functions
Signed-off-by: Steve Kriss <krisss@vmware.com>
* Adding in Jekyll site | Adjusting gitignore for Jekyll build | Moving docs to site folder
Signed-off-by: Lee Springer <lee@smalltalk.agency>

* Restore main.go to original location
Signed-off-by: Lee Springer <lee@smalltalk.agency>
* Updates to footer
Signed-off-by: Lee Springer <lee@smalltalk.agency>
* Updates to homepage links
Signed-off-by: Lee Springer <lee@smalltalk.agency>
* Content updates
Signed-off-by: Lee Springer <lee@smalltalk.agency>
`velero snapshot-location create` existed, but not `velero create
snapshot-location`; update subcommand for parity with backup-location
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
When using `velero install`, the image used should be a reasonable
default, even if buildinfo.Version is missing (such as when using `go
build` directly).
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
Velero should handle cases when the label length exceeds 63 characters.
- if the length of the backup/restore name is <= 63 characters, use it as the value of the label
- if it's > 63 characters, take the SHA256 hash of the name. The value of
the label will be the first 57 characters of the backup/restore name
plus the first six characters of the SHA256 hash.
Fixes heptio#1021
Signed-off-by: Anshul Chandra <anshulc@vmware.com>
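A sketch of that truncation scheme (function name hypothetical); the result is always at most 63 characters, and the hash suffix keeps truncated names distinct:
```go
package label

import (
	"crypto/sha256"
	"encoding/hex"
)

// validLabelValue returns a value that fits the 63-character label
// limit: short names pass through unchanged; longer names become the
// first 57 characters plus the first 6 hex characters of the name's
// SHA256 hash (57 + 6 = 63).
func validLabelValue(name string) string {
	if len(name) <= 63 {
		return name
	}
	sum := sha256.Sum256([]byte(name))
	return name[:57] + hex.EncodeToString(sum[:])[:6]
}
```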
Using only the `component: velero` selector caused the restic pods to be
matched, as well, which made things such as `kubectl logs` return too
much information.
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
Prior to 1.0, user-provided values such as cloud provider names didn't
need to be namespaced. This change allows those values to continue
working without the user editing the fields manually.
Fixes #1363
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
* Adds support for allowing a RestoreItemAction to skip item restore
This allows a RestoreItemAction plugin to signal to velero that
the returned item should be skipped rather than restored to the
cluster.
To support this, a boolean SkipRestore attribute is added to
RestoreItemActionExecuteOutput. If restore.restoreResource finds
this set to true, any remaining actions on this item are skipped,
and restore on this item is skipped. Execution continues with
the next item of this resource type.
To signal this for a particular item, the RestoreItemAction's
Execute method should call WithoutRestore() on the
RestoreItemActionExecuteOutput before returning it.
Signed-off-by: Scott Seago <sseago@redhat.com>
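A sketch of a plugin using this (constructor and import path per the plugin framework of this era; exact signatures may differ between versions, and the AppliesTo method is omitted):
```go
package myplugin

import "github.com/heptio/velero/pkg/plugin/velero"

type SkipAction struct{}

// Execute tells Velero to skip restoring the item entirely by calling
// WithoutRestore() on the returned output.
func (a *SkipAction) Execute(input *velero.RestoreItemActionExecuteInput) (*velero.RestoreItemActionExecuteOutput, error) {
	return velero.NewRestoreItemActionExecuteOutput(input.Item).WithoutRestore(), nil
}
```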
* Autogenerated code to support SkipRestore
Signed-off-by: Scott Seago <sseago@redhat.com>
* Added changelog for #1336
Signed-off-by: Scott Seago <sseago@redhat.com>
Original error was:
```
ark -n <redacted> restore logs <redacted>
time="2019-03-06T18:31:06Z" level=info msg="Not including resource" groupResource=nodes logSource="pkg/restore/restore.go:124"
time="2019-03-06T18:31:06Z" level=info msg="Not including resource" groupResource=events logSource="pkg/restore/restore.go:124"
time="2019-03-06T18:31:06Z" level=info msg="Not including resource" groupResource=events.events.k8s.io logSource="pkg/restore/restore.go:124"
time="2019-03-06T18:31:06Z" level=info msg="Not including resource" groupResource=backups.ark.heptio.com logSource="pkg/restore/restore.go:124"
time="2019-03-06T18:31:06Z" level=info msg="Not including resource" groupResource=restores.ark.heptio.com logSource="pkg/restore/restore.go:124"
time="2019-03-06T18:31:06Z" level=info msg="Starting restore of backup backup/<redacted>" logSource="pkg/restore/restore.go:342"
time="2019-03-06T18:31:06Z" level=info msg="error unzipping and extracting: open /tmp/604421455/resources/rolebindings.rbac.authorization.k8s.io/namespaces/<redacted>/<redacted>: too many open files" logSource="pkg/restore/restore.go:346"
```
Downloading the directory from S3 and untarring it, I found 1036 files; `ulimit -n` reports 1024. This is our team's best guess at a root cause. But the code fixed in this PR definitely holds all the files open until the method returns: https://blog.learngoprogramming.com/gotchas-of-defer-in-go-1-8d070894cb01
Please note my Go abilities are not great and I did not test this; I just edited the file in GitHub. All I did was remove the defer and close the file after the copy is done. Theoretically this should only hold one file open at a time now. Let me know if you want me to do any further steps.
Thank you,
-Asaf
Signed-off-by: Asaf Erlich <aerlich@groupon.com>
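A sketch of the before/after described in this message (not the exact Velero code):
```go
package archive

import (
	"io"
	"os"
)

// copyAll concatenates each file into dst. Deferring Close inside the
// loop would keep every file open until the function returned, which
// can exceed `ulimit -n`; closing right after each copy holds at most
// one file open at a time.
func copyAll(dst io.Writer, paths []string) error {
	for _, p := range paths {
		src, err := os.Open(p)
		if err != nil {
			return err
		}
		_, err = io.Copy(dst, src)
		src.Close() // was: defer src.Close()
		if err != nil {
			return err
		}
	}
	return nil
}
```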
* Move Phase to right under Metadata(name/namespace/label/annotations)
* Move Validation errors: section right after Phase: section and only
show it if the item has a phase of FailedValidation
* For restores move Warnings and Errors under Validation errors. Do not
show Warnings or Errors if there are none.
Signed-off-by: DheerajSShetty <dheerajs@vmware.com>
Fixes #987
To create a regional disk, the URLs for the zones in which the disk is
replicated must be provided to the GCP API.
Fixes #1183
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
Due to the assumption of building from a git repository inherent in the
Makefile targets, update the documentation to limit source code archive
builds to the `go build` commands.
Note that it may be possible to update hack/build.sh to cope with
building outside a git repository, but there are a number of questions
to answer regarding how to populate the buildinfo info, and that is a
larger discussion than ensuring we can build the binary from the sources
in the source code archive.
Signed-off-by: Darren Hart <dvhart@vmware.com>
Update the documentation to include minimal instructions for building
the velero binary from the GitHub Release "Source code" archive.
Correct the Download link in the Table of Contents for
build-from-scratch.md.
Signed-off-by: Darren Hart <dvhart@vmware.com>
If a PV already exists, wait for it, its associated PVC, and associated
namespace to be deleted before attempting to restore it.
If a namespace already exists, wait for it to be deleted before
attempting to restore it.
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
The binary, deployment, and namespace have yet to change for most users,
so make sure the bug template has the correct information when filing.
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
Some aws implementations, for example the quobyte object storage, do not
support the v4 signing algorithm, but only v1.
This makes it possible to configure the signatureVersion.
The algorithm implementation was ported from d6c1be296e/botocore/auth.py (L860-L862)
which is used by the aws CLI client.
This fixes https://github.com/heptio/ark/issues/811.
Signed-off-by: Bastian Hofmann <bashofmann@gmail.com>
Instead of only removing the default token from a service account when
it already exists in the cluster, always remove it. If the service
account already exists, continue to do the merging logic.
Signed-off-by: Andy Goldstein <andy.goldstein@gmail.com>
When backups are run manually (outside of a schedule) the metrics
will be counted for ark_*{schedule=""}. To prevent partial NaN
metrics they will be initialised on server init.
Signed-off-by: Christian Beneke <c.beneke@syseleven.de>
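A sketch of that pre-initialization (metric and function names illustrative), following the Prometheus guidance on avoiding missing metrics:
```go
package metrics

import "github.com/prometheus/client_golang/prometheus"

var backupAttemptTotal = prometheus.NewCounterVec(
	prometheus.CounterOpts{Name: "ark_backup_attempt_total"},
	[]string{"schedule"},
)

// initSchedules touches every known label combination at server start
// so the series exist (at zero) before any backup runs; the empty
// schedule label covers manually created backups.
func initSchedules(schedules []string) {
	backupAttemptTotal.WithLabelValues("").Add(0)
	for _, s := range schedules {
		backupAttemptTotal.WithLabelValues(s).Add(0)
	}
}
```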
This allows the Ark server to use one URL for the majority of
communications with S3 (or compatible) object storage, and a different
URL base for pre-signed URLs (for streaming logs, etc. to clients).
Signed-off-by: Andy Goldstein <andy.goldstein@gmail.com>
We were checking for nil, but were getting back an empty
*unstructured.Unstructured{} instead, along with a NotFound error.
Change the logic to check for the NotFound error instead of a nil
object.
Signed-off-by: Andy Goldstein <andy.goldstein@gmail.com>
When a backup is deleted, the delete method uses ListObjects to get a list of
files it needs to delete in s3. Different s3 implementations may return
the object lists in different, even non-deterministic orders, which can
result in the deletion not working because Ark tries to delete a non-empty
folder before it tries to delete the files in the folder.
Signed-off-by: Bastian Hofmann <bashofmann@gmail.com>
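One way to make that order deterministic is sketched below (the approach is assumed, not lifted from the patch): reverse-lexicographic order guarantees any key inside a "folder" sorts before the folder marker itself.
```go
package objectstore

import "sort"

// sortForDeletion orders object keys so children are deleted before
// their parent "folder" markers, e.g. "backups/foo/data.tar.gz" comes
// before "backups/foo/".
func sortForDeletion(keys []string) {
	sort.Sort(sort.Reverse(sort.StringSlice(keys)))
}
```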
rename variables to reflect the metric name
fix comments for exported methods
explicitly record per-schedule metric values
initialize metrics and change variable name to match with that of metric
add metric for recording failed volume snapshots
use singular variable instead of plural
remove extra field for failed snapshots, calculate using existing fields
initialize failure metric and rename methods
Signed-off-by: Shubheksha Jalan <jshubheksha@gmail.com>
- Pin to a specific revision of goimports
- Use -local flag with goimports to keep ark imports separated
- Correct shellcheck errors
Signed-off-by: Andy Goldstein <andy.goldstein@gmail.com>
refactor and move DeleteOptions struct and methods
unexport fields not used outside the package in DeleteOptions struct
refactor BindFlags() to work with name of command
fix constructor
Signed-off-by: Shubheksha Jalan <jshubheksha@gmail.com>
The support-matrix.md has been moved to the docs folder, hence the link in the readme didn't work anymore.
Signed-off-by: Oliver Hummel <oliver.hummel@sap.com>
A blank line is necessary before starting a table with Jekyll. Fix it
here so when new versions are cut they render to HTML correctly.
Signed-off-by: Nolan Brubaker <nolan@heptio.com>
refactor util methods into a common package
move utility methods into cli package instead of common
rename util.go to common.go
Signed-off-by: Shubheksha Jalan <jshubheksha@gmail.com>
Due to the version of docker used to build images, the Dockerfile used
with -f must be in the same directory that's used for the context. Copy
the Dockerfile into the _output directory and make the custom targets
more closely match the standard ones.
Fixes #833
Signed-off-by: Nolan Brubaker <nolan@heptio.com>
Signed-off-by: Lukasz Jakimczuk <ljakimczuk@gmail.com>
Moving check for environment variable outside the loop
Signed-off-by: Lukasz Jakimczuk <ljakimczuk@gmail.com>
Insert a note about AWS_CLUSTER_NAME in the aws-config doc
Signed-off-by: Lukasz Jakimczuk <ljakimczuk@gmail.com>
Improving implementation and documentation
Signed-off-by: Lukasz Jakimczuk <ljakimczuk@gmail.com>
Changing instructions, adding unit test for getTagsForCluster and removing duplicated Lookup
Signed-off-by: Lukasz Jakimczuk <ljakimczuk@gmail.com>
Commit after update
Signed-off-by: Lukasz Jakimczuk <ljakimczuk@gmail.com>
Correcting bad formatting in aws-config.md
Signed-off-by: Lukasz Jakimczuk <ljakimczuk@gmail.com>
This commit only provides the data model for further work. It does not
implement any logic around locations, nor does it remove anything from
the Config API type.
Closes #736, closes #732
Signed-off-by: Nolan Brubaker <nolan@heptio.com>
This is to better reflect the intent of the user when node ports are
specified explicitly (as opposed to being assigned by Kubernetes). The
`last-applied-configuration` annotation added by `kubectl apply` is one
such indicator we are now leveraging.
We still default to omitting the node ports when the annotation is
missing.
Signed-off-by: Timo Reimann <ttr314@googlemail.com>
Refactor plugin management:
- support multiple plugins per executable
- support restarting a plugin process in the event it terminates
- simplify plugin lifecycle management by using separate managers for
each scope (server vs backup vs restore)
Signed-off-by: Andy Goldstein <andy.goldstein@gmail.com>
These commands are intended to be general, and give users some context
for where they can look to get more information about Ark's behaviors.
Signed-off-by: Nolan Brubaker <nolan@heptio.com>
- ScheduleName is added as an API field to the Restore object
- Restore controller validates that exactly one of BackupName
or ScheduleName has been provided
- If ScheduleName is provided, Restore controller populates
BackupName with the name of the most recent successful backup
created from the schedule
- --from-schedule flag is added to `ark restore create` CLI cmd
Signed-off-by: Steve Kriss <steve@heptio.com>
Add backups and restores to the list of non-restorable resources. Backups,
if applicable, are synced from object storage by the backup sync
controller. Restores are specific to a cluster and don't have value
moving across clusters.
Signed-off-by: Andy Goldstein <andy.goldstein@gmail.com>
Mirror pods are pods created from static manifest files on a node.
They're mirrored to the apiserver so they're visible when querying the
apiserver for a list of pods, but it's not possible to send a pod
containing the mirror pod annotation to the apiserver and have it be
created successfully. Instead of trying to do this, log a message that
we're skipping restoring the pod because it's a mirror pod.
Signed-off-by: Andy Goldstein <andy.goldstein@gmail.com>
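A minimal sketch of the detection (the annotation key is the standard one the kubelet sets on mirror pods; the helper is illustrative):
```go
package restore

const mirrorPodAnnotation = "kubernetes.io/config.mirror"

// isMirrorPod reports whether a pod was mirrored from a static
// manifest, in which case restoring it is skipped with a log message.
func isMirrorPod(annotations map[string]string) bool {
	_, found := annotations[mirrorPodAnnotation]
	return found
}
```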
If a PV has a reclaim policy of Delete and we didn't create a snapshot
of it, don't restore the PV, as doing so would create a PV whose
underlying volume is incorrect.
Also "reset" any PVCs bound to the PV so they'll be dynamically
provisioned when restored.
Signed-off-by: Andy Goldstein <andy.goldstein@gmail.com>
All properties from a backup will be merged into the ServiceAccount
except for the default token secret.
Signed-off-by: Nolan Brubaker <nolan@heptio.com>
* add a metrics package to handle metric registration and publishing
* add a metricsAddress field to the server struct
* make metrics a part of the server
* start a metrics endpoint as part of starting the controllers
* instrument backup_controller to report metrics
* update cli-reference docs
* update example deployments with prometheus annotations
* update 'pkg/install' tooling with prometheus annotations
Signed-off-by: Ashish Amarnath <ashish.amarnath@gmail.com>
Handle the case where a BackupItemAction may return nil for updatedItem,
meaning "no modifications to the item". The backupPVAction does this,
and we were panicking instead of accepting it.
Signed-off-by: Andy Goldstein <andy.goldstein@gmail.com>
Writing to GCP's object store is an async operation, so errors need to
be checked both on write and close calls, since errors like permission
violations aren't reported until a close.
Signed-off-by: Nolan Brubaker <nolan@heptio.com>
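A sketch of the pattern with the GCS client library (function shape assumed):
```go
package gcp

import (
	"context"
	"io"

	"cloud.google.com/go/storage"
)

// putObject uploads body to the given key. Because GCS writes are
// asynchronous, errors such as permission violations may only surface
// on Close, so the Close error must be checked as well.
func putObject(ctx context.Context, bucket *storage.BucketHandle, key string, body io.Reader) error {
	w := bucket.Object(key).NewWriter(ctx)
	if _, err := io.Copy(w, body); err != nil {
		w.Close() // best effort; the copy error takes precedence
		return err
	}
	return w.Close() // deferred write errors are reported here
}
```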
This will help users use the `--selector` flag to selectively exclude objects from being backed up by Ark
workaround for #404 until dedicated flags are implemented
Signed-off-by: Dhananjay Sathe <dhanajaysathe@gmail.com>
Completed jobs and pods may be useful in the backup for auditing
purposes, but don't recreate them when restoring.
Signed-off-by: Nolan Brubaker <nolan@heptio.com>
These internal `gcr.io/heptio-images/golang` images are deprecated. It looks like `git` and `bash` are the only things the Ark build needed that aren't in the upstream `golang:1.9-alpine3.6` image.
Signed-off-by: Matt Moyer <moyer@heptio.com>
Update k8s.io/api to v1.10.0
Update k8s.io/apimachinery to v1.10.0
Update k8s.io/client-go to v7.0
Update k8s.io/kubernetes to v1.10
Signed-off-by: Steve Kriss <steve@heptio.com>
Now that we're no longer using a finalizer as part of backup deletion,
it's fine to run the Ark server in the same namespace as all of the
resources.
Signed-off-by: Andy Goldstein <andy.goldstein@gmail.com>
This PR updates the documentation & example deployment template to show how `ark` can be run utilizing [kube2iam](https://github.com/jtblin/kube2iam) for AWS IAM permissions, rather than using an access key & secret key.
Signed-off-by: Dominik Deren <dominik.deren@live.com>
This commit adds support for auto completion for bash and zsh
shells. A new root level command called "completion" has been
introduced, and the user can get the auto completion code by
running `ark completion bash/zsh`.
For bash completion, the built-in GenBashCompletion() from cobra
has been used, but for zsh, the built-in GenZshCompletion() is
known to cause issues. The workaround has been copied from zsh
completion code of kubectl.
Signed-off-by: Shubham <shubham@linux.com>
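A sketch of such a command with cobra's built-in generators (structure assumed; the kubectl-derived zsh workaround is omitted here):
```go
package cmd

import (
	"os"

	"github.com/spf13/cobra"
)

// NewCompletionCommand returns the root-level "completion" command.
func NewCompletionCommand() *cobra.Command {
	return &cobra.Command{
		Use:       "completion SHELL",
		Short:     "Output shell completion code for bash or zsh",
		ValidArgs: []string{"bash", "zsh"},
		Args:      cobra.ExactArgs(1),
		RunE: func(c *cobra.Command, args []string) error {
			if args[0] == "zsh" {
				return c.Root().GenZshCompletion(os.Stdout)
			}
			return c.Root().GenBashCompletion(os.Stdout)
		},
	}
}
```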
This commit introduces validation logic to `ark restore logs`
command, the way it already exists in other commands like `ark
restore create`.
Before the logs for a restore are fetched from the server, the
server is contacted to check if the specified restore exists. If
it does not, it errors out.
Fixes #389
Signed-off-by: Shubham <shubham@linux.com>
When restoring resources that raise an already exists error, check their
equality before logging a message on the restore. If they're the same
except for some metadata, don't generate a message.
The restore process was modified so that if an object had an empty
namespace string, no namespace key is created on the object. This was to
avoid manipulating the copy of the current cluster's object by adding
the target namespace.
There are some cases right now that are known to not be equal via this
method:
- The `default` ServiceAccount in a namespace will not match, primarily
because of differing default tokens. These will be handled in their own
patch
- IP addresses for Services are recorded in the backup object, but are
either not present on the cluster object, or different. An issue for
this already exists at https://github.com/heptio/ark/issues/354
- Endpoints have differing values for `renewTime`. This may be
insubstantial, but isn't currently handled by the resetMetadataAndStatus
function.
- PersistentVolume objects do not match on spec fields, such as
claimRef and cloud provider persistent disk info
Signed-off-by: Nolan Brubaker <nolan@heptio.com>
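A sketch of that comparison, with a simplified stand-in for the resetMetadataAndStatus helper mentioned above (signatures illustrative):
```go
package restore

import (
	"k8s.io/apimachinery/pkg/api/equality"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
)

// resetMetadataAndStatus keeps only stable identifying metadata and
// drops status (simplified stand-in for the real helper).
func resetMetadataAndStatus(obj *unstructured.Unstructured) *unstructured.Unstructured {
	obj.Object["metadata"] = map[string]interface{}{
		"name":      obj.GetName(),
		"namespace": obj.GetNamespace(),
	}
	delete(obj.Object, "status")
	return obj
}

// objectsAreEqual strips volatile metadata and status from both copies
// before comparing, so an "already exists" error only produces a
// restore message when the objects genuinely differ.
func objectsAreEqual(fromBackup, fromCluster *unstructured.Unstructured) bool {
	a := resetMetadataAndStatus(fromBackup.DeepCopy())
	b := resetMetadataAndStatus(fromCluster.DeepCopy())
	return equality.Semantic.DeepEqual(a, b)
}
```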
When the BackupDeletionController processes a request, set the request's
backup-name and backup-uid labels if they aren't currently set.
Signed-off-by: Andy Goldstein <andy.goldstein@gmail.com>
Make sure a DeleteBackupRequest has its Spec.BackupName filled in. If
not, record an error in the status and mark the request as processed.
Signed-off-by: Andy Goldstein <andy.goldstein@gmail.com>
Always request DeleteBackupRequests for a given backup so we can show
failed deletion attempts if you try to delete a backup that has PV
snapshots when Ark doesn't have a persistentVolumeProvider configured.
When creating a DeleteBackupRequest, include a label for the UID so we
can match based on name and UID when associated DeleteBackupRequests
with a given backup.
Signed-off-by: Andy Goldstein <andy.goldstein@gmail.com>
We ran into a lot of problems using a finalizer on the backup to allow
the Ark server to clean up all associated backup data when deleting a
backup.
Users also found it less than desirable that deleting the heptio-ark
namespace resulted in all the backup data being deleted.
This removes the finalizer and replaces it with an explicit
DeleteBackupRequest that is created as a means of requesting the
deletion of a backup and all its associated data. This is what `ark
backup delete` does.
If you use kubectl to delete a backup or to delete the heptio-ark
namespace, this no longer deletes associated backups. Additionally, as
long as the heptio-ark namespace still exists, the Ark server's
BackupSyncController will continually sync backups into the heptio-ark
namespace from object storage.
Signed-off-by: Andy Goldstein <andy.goldstein@gmail.com>
Use a custom builder image to do all of Ark's builds. This image now
contains k8s.io/code-generator for code generation.
Enable docker in travis to use the builder image.
Signed-off-by: Andy Goldstein <andy.goldstein@gmail.com>
Now that we've configured pruning for dep, this removes all unused
packages, all non-go files, and all tests from the vendor directory.
NOTE: due to a change in dep, it preserves anything that looks like a
license file. We'll be pulling in a few files we weren't previously
using - mostly license files. It's easier to just go with what dep does
than to try to exclude them after the fact.
Signed-off-by: Andy Goldstein <andy.goldstein@gmail.com>
- Add pruning settings to Gopkg.toml
- Update vendoring deps doc to point to dep installation instructions
and to use dep instead of hack/dep-save.sh
- Remove hack/dep-save.sh
Signed-off-by: Andy Goldstein <andy.goldstein@gmail.com>
This commit adds limitranges to defaultResourcePriorities as
suggested in #385.
This is done so that pods are not restored before the LimitRange
objects, because that would lead to pods not honoring the requests
and limits set in LimitRange objects.
Fixes #385
Signed-off-by: Shubham <shubham@linux.com>
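A sketch of the resulting priority list (the surrounding entries are assumed; only the limitranges-before-pods ordering comes from this change):
```go
package server

// defaultResourcePriorities controls restore order; limitranges must
// precede pods so restored pods honor LimitRange defaults.
var defaultResourcePriorities = []string{
	"namespaces",
	"persistentvolumes",
	"persistentvolumeclaims",
	"secrets",
	"configmaps",
	"serviceaccounts",
	"limitranges", // added by this change
	"pods",
}
```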
This commit changes the type of volume mounted inside the minio pod
from hostPath to emptyDir. This is done because minio requires
at least 1Gi to start, but the default hostPath under /tmp in
minishift does not have enough capacity.
Fixes #382
Signed-off-by: Shubham <shubham@linux.com>
Instead of requiring the Ark admin to specify a "location" in the azure
persistentVolumeProvider config (meaning only a single location is
supported), get info about the disk (for its location) when creating a
snapshot, and get info about the snapshot (for its location) when
creating a disk from a snapshot.
Signed-off-by: Andy Goldstein <andy.goldstein@gmail.com>
If you want to test changes to the ark server without having to rebuild
and redeploy the ark container, this change allows you to do something
like this (assuming you've created your cloud credentials file):
AWS_SHARED_CREDENTIALS_FILE=credentials-minio ark server -n heptio-ark
Signed-off-by: Andy Goldstein <andy.goldstein@gmail.com>
Add --force and --confirm to `ark backup delete` to support forced
backup deletion. This forcibly removes the Ark GC finalizer (if it's
present) from a backup and will orphan any resources associated with the
backup, such as backup tarballs in object storage, persistent volume
snapshots, and restores for the backup.
If a backup has a deletion timestamp, display `Deleting` in `ark backup
describe` and `ark backup get`.
Signed-off-by: Andy Goldstein <andy.goldstein@gmail.com>
Move ark server deployment & minio deployment to a separate namespace
from the backups/schedules/restores/config because backups now have a
finalizer. If everything lives in one namespace, you have to delete all
the backups and wait for the GC controller to process them and remove the
finalizer from each before deleting the namespace.
By moving the server into a separate namespace, users can now delete the
heptio-ark namespace the normal way (kubectl delete), and once that
namespace is fully removed, they can delete the heptio-ark-server
namespace.
Signed-off-by: Andy Goldstein <andy.goldstein@gmail.com>
Always try to create the config directory when saving the client config
in case it doesn't exist.
Signed-off-by: Andy Goldstein <andy.goldstein@gmail.com>
When running `ark <resource> create`, a request is sent to the server,
but the status is not immediately known. Inform the user that a request
was sent and provide a way to get more information on it.
Signed-off-by: Nolan Brubaker <nolan@heptio.com>
When creating a restore based on a backup that doesn't exist, the
restore should be marked as invalid and the error clearly communicated
so the user understands why the restore wasn't made.
Previously, the restore was left as in progress with an error attached.
Since restores are CRDs and must be updated via a controller, there's
currently not a way to give the client immediate errors.
Signed-off-by: Nolan Brubaker <nolan@heptio.com>
Add the ability for the Ark server to run in any namespace.
Add `ark client config get/set` for manipulating the new client
configuration file in $HOME/.config/ark/config.json. This holds client
defaults, such as the Ark server's namespace (to avoid having to specify
the --namespace flag all the time).
Add a --namespace flag to all client commands.
Signed-off-by: Andy Goldstein <andy.goldstein@gmail.com>
Also update:
- kubernetes to v1.9.0
- k8s.io/api, k8s.io/apimachinery, k8s.io/code-generator to kubernetes-1.9.0
- gengo to b58fc7edb82e0c6ffc9b8aef61813c7261b785d4 (to match code-generator)
Signed-off-by: Andy Goldstein <andy.goldstein@gmail.com>
If a backup item action errors, log the error as soon as it occurs, so
it's clear when the error happened. Also include information about the
groupResource, namespace, and name of the item in the error.
Signed-off-by: Andy Goldstein <andy.goldstein@gmail.com>
When running a backup, try to do as much as possible, collecting errors
along the way, and return an aggregate at the end. This way, if a backup
fails for most reasons, we'll be able to upload the backup log file to
object storage, which wasn't happening before.
Signed-off-by: Andy Goldstein <andy.goldstein@gmail.com>
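A sketch of the collect-then-aggregate pattern (the item type and backup helper are stand-ins; the aggregate call is the standard apimachinery one):
```go
package backup

import (
	kubeerrs "k8s.io/apimachinery/pkg/util/errors"
)

type item struct{ name string }

func backupItem(it item) error { return nil } // stand-in

// backupAllItems keeps going after individual failures so the backup
// log can still be uploaded, then returns every error at the end.
func backupAllItems(items []item) error {
	var errs []error
	for _, it := range items {
		if err := backupItem(it); err != nil {
			errs = append(errs, err) // record and continue
		}
	}
	return kubeerrs.NewAggregate(errs) // nil when errs is empty
}
```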
When using GKE, an additional step is needed to become cluster admin.
Without this, generating the RBAC scaffolding will result in an error.
Signed-off-by: Nolan Brubaker <nolan@heptio.com>
The main Ark code was hard-coding specific support for AWS, GCE, and
Azure volume snapshots and restores, and anything else was considered
unsupported.
Add GetVolumeID and SetVolumeID to the BlockStore interface, to allow
block store plugins to handle volume snapshots and restores.
Signed-off-by: Andy Goldstein <andy.goldstein@gmail.com>
The log location hook was matching github.com/heptio/ark and stripping
off that + 1 more char. This meant that
github.com/heptio/ark-plugin-example/foo.go was being listed as
plugin-example/foo.go instead of
github.com/heptio/ark-plugin-example/foo.go.
Signed-off-by: Andy Goldstein <andy.goldstein@gmail.com>
If you have "cluster.local" as a search domain in /etc/resolv.conf and
you have DNS set up so it can resolve cluster.local queries (e.g.
with dnsmasq), this makes commands such as `ark restore logs` work
correctly outside of the cluster.
Signed-off-by: Andy Goldstein <andy.goldstein@gmail.com>
If backing up specific namespaces with "auto" cluster resources, the
actual namespace objects themselves were not being included in the
backup. Restore would create them but any labels or metadata would be
lost.
Instead handle the special case of namespace as a cluster level resource
we may still need, even if excluding most cluster level resources.
Signed-off-by: Devan Goodwin <dgoodwin@redhat.com>
If you have a large number of warnings and/or errors, the restore
object's size can exceed the maximum allowed by etcd. Move them to
object storage, and add a new describe command to fetch and display them
on the fly.
Signed-off-by: Andy Goldstein <andy.goldstein@gmail.com>
Deleting the clusterIP field when the service should be headless will
cause it to be assigned a new IP on restore; instead it should retain
the headless state after restoration.
Fixes #168
Signed-off-by: Nolan Brubaker <nolan@heptio.com>
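A sketch of the fix, typed for clarity (the real code works with unstructured objects):
```go
package restore

import corev1 "k8s.io/api/core/v1"

// resetClusterIP clears clusterIP so the API server assigns a fresh IP
// on restore, unless the Service is headless, in which case the "None"
// value must be preserved.
func resetClusterIP(svc *corev1.Service) {
	if svc.Spec.ClusterIP != "None" {
		svc.Spec.ClusterIP = ""
	}
}
```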
We only need them if we've got unstructured/unknown data and we want to
convert it to typed objects.
Signed-off-by: Andy Goldstein <andy.goldstein@gmail.com>
The generated clientsets use this package, but there are no explicit
imports, so we have to manually require it.
Signed-off-by: Andy Goldstein <andy.goldstein@gmail.com>
Update k8s.io/kubernetes to v1.8.
Update k8s.io/client-go to v5.0.0
Update k8s.io/apimachinery to match
Pull in k8s.io/api release-1.8 branch
Pull in k8s.io/code-generator release-1.8 branch
Signed-off-by: Andy Goldstein <andy.goldstein@gmail.com>
## Older releases:
* [CHANGELOG-1.0.md][10]
* [CHANGELOG-0.11.md][9]
* [CHANGELOG-0.10.md][8]
* [CHANGELOG-0.9.md][7]
* [CHANGELOG-0.8.md][6]
* [CHANGELOG-0.7.md][5]
* [CHANGELOG-0.6.md][4]
* [CHANGELOG-0.5.md][3]
* [CHANGELOG-0.4.md][2]
* [CHANGELOG-0.3.md][1]
Heptio Ark is a utility for managing disaster recovery, specifically for your [Kubernetes][14] cluster resources and persistent volumes. It provides a simple, configurable, and operationally robust way to back up and restore applications and PVs from a series of checkpoints. This allows you to better automate in the following scenarios:
* **Disaster recovery** with reduced TTR (time to respond), in the case of:
* Infrastructure loss
* Data corruption
* Service outages
Velero (formerly Heptio Ark) gives you tools to back up and restore your Kubernetes cluster resources and persistent volumes. Velero lets you:
* **Cross-cloud-provider migration** for Kubernetes API objects (cross-cloud-provider migration of persistent volume snapshots not yet supported)
* Take backups of your cluster and restore in case of loss.
* Migrate cluster resources to other clusters.
* Replicate your production cluster to development and testing clusters.
* **Dev and testing environment setup (+ CI)**, via replication of prod environment
Velero consists of:
More concretely, Heptio Ark combines an in-cluster service with a CLI that allows you to record both:
1. *Configurable subsets of Kubernetes API objects* -- as tarballs stored in object storage
2. *Disk snapshots of Persistent Volumes* -- via the cloud provider APIs
* A server that runs on your cluster
* A command-line client that runs locally
Heptio Ark currently supports the [AWS][15], [GCP][16], and [Azure][17] cloud provider platforms.
You can run Velero in clusters on a cloud provider or on-premises. For detailed information, see [Compatible Storage Providers][99].
## Quickstart
## Installation
This guide gets Ark up and running on your cluster, and goes through an example using the following:
* **Minio, an S3-compatible storage service** that runs locally on your cluster. This is the storage service where backup files are uploaded. *Note that Ark is intended to run on a cloud provider--we are using Minio here to keep the example convenient and self-contained.*
We strongly recommend that you use an [official release][6] of Velero. The tarballs for each release contain the
`velero` command-line client. Follow the [installation instructions][28] to get started.
* **A sample nginx app** under the `nginx-example` namespace, used to demonstrate Ark's backup and restore functionality.
_The code and sample YAML files in the master branch of the Velero repository are under active development and are not guaranteed to be stable. Use them at your own risk!_
Note that this example *does not* include a demonstration of PV disk snapshots, because that feature requires integration with a cloud provider API. For snapshotting examples and instructions specific to AWS, GCP, and Azure, see [Cloud Provider Specifics][23].
## More information
### 0. Prerequisites
[The documentation][29] provides a getting started guide, plus information about building from source, architecture, extending Velero, and more.
* *You should have access to an up-and-running Kubernetes cluster (minimum version 1.7).* If you do not have a cluster, [choose a setup solution][9] from the official Kubernetes docs.
* *You will need to have a DNS server set up on your cluster for the example files to work.* You can check this with `kubectl get svc -l k8s-app=kube-dns --namespace=kube-system`. If said service does not exist, [these instructions][12] may help.
* *You should have `kubectl` installed.* If not, follow the instructions for [installing via Homebrew (MacOS)][10] or [building the binary (Linux)][11].
### 1. Download
Clone or fork the Heptio Ark repo:
```
git clone git@github.com:heptio/ark.git
```
> NOTE: Documentation may change between releases. See the [Changelog][20] for links to previous versions of this repository and its docs.
>
> To ensure that you are working off a specific release, `git checkout <VERSION_TAG>` where `<VERSION_TAG>` is the appropriate tag for the Ark version you wish to use (e.g. "v0.3.3"). You should `git checkout master` only if you're planning on [building the Ark image from scratch][7].
### 2. Setup
There are two types of Ark instances that work in tandem:
1. **Ark server**: Runs persistently on the cluster.
2. **Ark client**: Launched by the user whenever they want to initiate an operation (e.g. a backup).
To get the server started on your cluster (as well as the local storage service), execute the following commands in Ark's root directory:
*NOTE: If you encounter an error related to Config creation, wait for a minute and run the command again. (The Config CRD does not always finish registering in time.)*
Now deploy the example nginx app:
```
kubectl apply -f examples/nginx-app/base.yaml
```
Check to see that both the Ark and nginx deployments have been successfully created:
```
kubectl get deployments -l component=ark --namespace=heptio-ark
kubectl get deployments --namespace=nginx-example
```
Finally, install the Ark client somewhere in your `$PATH`:
* [Download a pre-built release][26], or
* [Build it from scratch][7]
### 3. Back up and restore
First, create a backup specifically for any object matching the `app=nginx` label selector:
Oh no! The nginx deployment and service are both gone, as you can see (though you may have to wait a minute or two for the namespace to be fully cleaned up):
```
kubectl get deployments --namespace=nginx-example
kubectl get services --namespace=nginx-example
```
Neither command should yield any results. However, because Ark has your back(up), you can run this command:
```
ark restore create nginx-backup
```
To check on the status of the Restore:
```
ark restore get
```
The output should look something like the table below:
```
NAME BACKUP STATUS WARNINGS ERRORS CREATED SELECTOR
nginx-backup-20170727200524 nginx-backup Completed 0 0 2017-07-27 20:05:24 +0000 UTC <none>
```
If the Restore's `STATUS` column is "Completed", and `WARNINGS` and `ERRORS` are both zero, the restore is a success. All of the objects in the `nginx-example` namespace should be just as they were before.
Otherwise, if there are warnings or errors indicated, you can run the following command to look at them in more detail:
```
ark restore get <RESTORE NAME> -o yaml
```
See the [debugging documentation][18] for more details.
*NOTE*: In the example files, the `storage` volume is defined via `hostPath` for better visibility. If you're curious to see the [structure of the backup files][13] firsthand, you can find the compressed results in `/tmp/minio/ark/nginx-backup`.
### 4. Tear Down
Using the following command, you can remove all Kubernetes objects associated with this example:
```
kubectl delete -f examples/common/
kubectl delete -f examples/minio/
kubectl delete -f examples/nginx-app/base.yaml
```
## Architecture
Each of Heptio Ark's operations (Backups, Schedules, and Restores) is a custom resource itself, defined using [CRDs][20]. Their accompanying [custom controllers][21] handle them when they are submitted to the Kubernetes API server.
As mentioned before, Ark runs in two different modes:
* **Ark client**: Allows you to query, create, and delete the Ark resources as desired.
* **Ark server**: Runs all of the Ark controllers. Each controller watches its respective custom resource for API operations, performs validation, and handles the majority of the cloud API logic (e.g. interfacing with object storage and persistent volumes).
Looking at a specific example--an `ark backup create test-backup` command triggers the following operations:
![19]
1. The *ark client* makes a call to the Kubernetes API server, creating a `Backup` custom resource (which is stored in [etcd][22]).
2. The `BackupController` sees that a new `Backup` has been created, and validates it.
3. Once validation passes, the `BackupController` begins the backup process. It collects data by querying the Kubernetes API Server for resources.
4. Once the data has been aggregated, the `BackupController` makes a call to the object storage service (e.g. Amazon S3) to upload the backup file.
5. By default, Ark also makes disk snapshots of any persistent volumes, using the appropriate cloud service API. (This can be disabled via the option `--snapshot-volumes=false`)
## Further documentation
To learn more about Heptio Ark operations and their applications, see the [`/docs` directory][3].
Please use the version selector at the top of the site to ensure you are using the appropriate documentation for your version of Velero.
## Troubleshooting
If you encounter any problems that the documentation does not address, [file an issue][4] or talk to us on the [Kubernetes Slack team][25] channel `#ark-dr`.
If you encounter issues, review the [troubleshooting docs][30], [file an issue][4], or talk to us on the [#velero channel][25] on the Kubernetes Slack server.
## Contributing
Thanks for taking the time to join our community and start contributing!
Feedback and discussion are available on [the mailing list][24].
### Before you start
* Please familiarize yourself with the [Code of Conduct][8] before contributing.
* See [CONTRIBUTING.md][5] for instructions on the developer certificate of origin that we require.
* Read how [we're using ZenHub][26] for project and roadmap planning
### Pull requests
* We welcome pull requests. Feel free to dig through the [issues][4] and jump in.
## Changelog
See [the list of releases][6] to find out about feature changes.
- We've introduced two new custom resource definitions, `BackupStorageLocation` and `VolumeSnapshotLocation`, that replace the `Config` CRD from
previous versions. As part of this, you may now configure more than one possible location for where backups and snapshots are stored, and when you
create a `Backup` you can select the location where you'd like that particular backup to be stored. See the [Locations documentation][2] for an overview
of this feature.
- Ark's plugin system has been significantly refactored to improve robustness and ease of development. Plugin processes are now automatically restarted
if they unexpectedly terminate. Additionally, plugin binaries can now contain more than one plugin implementation (e.g. an object store *and* a block store,
or many backup item actions).
- The sync process, which ensures that Backup custom resources exist for each backup in object storage, has been revamped to run much more frequently (once
per minute rather than once per hour), to use significantly fewer cloud provider API calls, and to not generate spurious Kubernetes API errors.
- Ark can now be configured to store all data under a prefix within an object storage bucket. This means that you no longer need a separate bucket per Ark
instance; you can now have all of your clusters' Ark backups go into a single bucket, with each cluster having its own prefix/subdirectory
within that bucket.
- Restic backup data is now automatically stored within the same bucket/prefix as the rest of the Ark data. A separate bucket is no longer required (or allowed).
- Ark resources (backups, restores, schedules) can now be bulk-deleted through the `ark` CLI, using the `--all` or `--selector` flags, or by specifying
multiple resource names as arguments to the `delete` commands.
- The `ark` CLI now supports waiting for backups and restores to complete with the `--wait` flag for `ark backup create` and `ark restore create`
- Restores can be created directly from the most recent backup for a schedule, using `ark restore create --from-schedule SCHEDULE_NAME`
### Breaking Changes
Heptio Ark v0.10 contains a number of breaking changes. Upgrading will require some additional steps beyond just updating your client binary and your
container image tag. We've provided a [detailed set of instructions][1] to help you with the upgrade process. **Please read and follow these instructions
carefully to ensure a successful upgrade!**
- The `Config` CRD has been replaced by `BackupStorageLocation` and `VolumeSnapshotLocation` CRDs.
- The interface for external plugins (object/block stores, backup/restore item actions) has changed. If you have authored any custom plugins, they'll
need to be updated for v0.10.
- The [`ObjectStore.ListCommonPrefixes`](https://github.com/heptio/ark/blob/master/pkg/cloudprovider/object_store.go#L50) signature has changed to add a `prefix` parameter.
- Registering plugins has changed. Create a new plugin server with the `NewServer` function, and register plugins with the appropriate functions. See the [`Server`](https://github.com/heptio/ark/blob/master/pkg/plugin/server.go#L37) interface for details.
- The organization of Ark data in object storage has changed. Existing data will need to be moved around to conform to the new layout.
### All Changes
- [b9de44ff](https://github.com/heptio/ark/commit/b9de44ff) update docs to reference config/ dir within release tarballs
- [eace0255](https://github.com/heptio/ark/commit/eace0255) goreleaser: update example image tags to match version being released
- [cff02159](https://github.com/heptio/ark/commit/cff02159) add rbac content, rework get-started for NodePort and publicUrl, add versioning information
- [fa14255e](https://github.com/heptio/ark/commit/fa14255e) add content for docs issue 819
- [06d6665a](https://github.com/heptio/ark/commit/06d6665a) check s3URL scheme upon AWS ObjectStore Init()
- [cc359f6e](https://github.com/heptio/ark/commit/cc359f6e) Add contributor docs for our ZenHub usage
- [f6204562](https://github.com/heptio/ark/commit/f6204562) cleanup service account action log statement
- [450fa72f](https://github.com/heptio/ark/commit/450fa72f) Initialize schedule Prometheus metrics to have them created beforehand (see https://prometheus.io/docs/practices/instrumentation/#avoid-missing-metrics)
- [39c4267a](https://github.com/heptio/ark/commit/39c4267a) Clarify that object storage should be per-cluster
- [78cbdf95](https://github.com/heptio/ark/commit/78cbdf95) delete old deletion requests for backup when processing a new one
- [85a61b8e](https://github.com/heptio/ark/commit/85a61b8e) return nil error if 404 encountered when deleting snapshots
- [a2a7dbda](https://github.com/heptio/ark/commit/a2a7dbda) fix tagging latest by using make's ifeq
- [b4a52e45](https://github.com/heptio/ark/commit/b4a52e45) Add commands for context to the bug template
- [3efe6770](https://github.com/heptio/ark/commit/3efe6770) Update Ark library code to work with Kubernetes 1.11
- [7e8c8c69](https://github.com/heptio/ark/commit/7e8c8c69) Add some basic troubleshooting commands
- [d1955120](https://github.com/heptio/ark/commit/d1955120) require namespace for backups/etc. to exist at server startup
- [683f7afc](https://github.com/heptio/ark/commit/683f7afc) switch to using .status.startTimestamp for sorting backups
- [b71a37db](https://github.com/heptio/ark/commit/b71a37db) Record backup completion time before uploading
- [217084cd](https://github.com/heptio/ark/commit/217084cd) Add example ark version command to issue templates
- [040788bb](https://github.com/heptio/ark/commit/040788bb) Add minor improvements and aws example
- [5b89f7b6](https://github.com/heptio/ark/commit/5b89f7b6) Skip backup sync if it already exists in k8s
- [c6050845](https://github.com/heptio/ark/commit/c6050845) restore controller: switch to 'c' for receiver name
- [706ae07d](https://github.com/heptio/ark/commit/706ae07d) enable a schedule to be provided as the source for a restore
- [aea68414](https://github.com/heptio/ark/commit/aea68414) fix up Slack link in troubleshooting on master branch
- [bb8e2e91](https://github.com/heptio/ark/commit/bb8e2e91) Document how to run the Ark server locally
* Added the `velero migrate-backups` command to migrate legacy Ark backup metadata to the current Velero format in object storage. This command needs to be run in preparation for upgrading to v1.0, **if** you have backups that were originally created prior to v0.11 (i.e. when the project was named Ark).
* Heptio Ark is now Velero! This release is the first one to use the new name. For details on the changes and how to migrate to v0.11, see the [migration instructions][1]. **Please follow the instructions to ensure a successful upgrade to v0.11.**
* Restic has been upgraded to v0.9.4, which brings significantly faster restores thanks to a new multi-threaded restorer.
* Velero now waits for terminating namespaces and persistent volumes to delete before attempting to restore them, rather than trying and failing to restore them while they're being deleted.
* Wait for PVs and namespaces to delete before attempting to restore them. (#826, @nrb)
* Set the zones for GCP regional disks on restore. This requires the `compute.zones.get` permission on the GCP serviceaccount in order to work correctly. (#1200, @nrb)
* Renamed Heptio Ark to Velero. Changed internal imports, environment variables, and binary name. (#1184, @nrb)
* use 'restic stats' instead of 'restic check' to determine if repo exists (#1171, @skriss)
* upgrade restic to v0.9.4 & replace --hostname flag with --host (#1156, @skriss)
* Clarify restore log when object unchanged (#1153, @daved)
* Add backup-version file in backup tarball. (#1117, @wwitzel3)
* add ServerStatusRequest CRD and show server version in `ark version` output (#1116, @skriss)
* If a Service is headless, retain ClusterIP = None when backing up and restoring.
* Use the specified --label-selector when listing backups, schedules, and restores.
* Restore namespace mapping functionality that was accidentally broken in 0.5.0.
* Always include namespaces in the backup, regardless of the --include-cluster-resources setting.
## v0.5.0
#### 2017-10-26
### Download
- https://github.com/heptio/ark/tree/v0.5.0
### Breaking changes
* The backup tar file format has changed. Backups created using previous versions of Ark cannot be restored using v0.5.0.
* When backing up one or more specific namespaces, cluster-scoped resources are no longer backed up by default, with the exception of PVs that are used within the target namespace(s). Cluster-scoped resources can still be included by explicitly specifying `--include-cluster-resources`.
### New features
* Add customized user-agent string for Ark CLI
* Switch from glog to logrus
* Exclude nodes from restoration
* Add a FAQ
* Record PV availability zone and use it when restoring volumes from snapshots
* Back up the PV associated with a PVC
* Add `--include-cluster-resources` flag to `ark backup create`
* Add `--include-cluster-resources` flag to `ark restore create`
* Properly support resource restore priorities across cluster-scoped and namespace-scoped resources
* **Plugins** - We now support user-defined plugins that can extend Ark functionality to meet your custom backup/restore needs without needing to be compiled into the core binary. We support pluggable block and object stores as well as per-item backup and restore actions that can execute arbitrary logic, including modifying the items being backed up or restored. For more information see the [documentation](docs/plugins.md), which includes a reference to a fully-functional sample plugin repository. (#174, #188, #206, #213, #215, #217, #223, #226)
* **Describers** - The Ark CLI now includes `describe` commands for `backups`, `restores`, and `schedules` that provide human-friendly representations of the relevant API objects.
### Breaking Changes
* The config object format has changed. In order to upgrade to v0.6.0, the config object will have to be updated to match the new format. See the [examples](examples) and [documentation](docs/config-definition.md) for more information.
* The restore object format has changed. The `warnings` and `errors` fields are now ints containing the counts, while full warnings and errors are now stored in the object store instead of etcd. Restore objects created prior to v0.6.0 should be deleted, or a new bucket used, and the old restore objects deleted from Kubernetes (`kubectl -n heptio-ark delete restore --all`).
* Backup deletion has been completely revamped to make it simpler and less error-prone. As a user, you still use the `ark backup delete` command to request deletion of a backup and its associated cloud
resources; behind the scenes, we've switched to using a new `DeleteBackupRequest` Custom Resource and associated controller for processing deletion requests.
* We've reduced the number of required fields in the Ark config. For Azure, `location` is no longer required, and for GCP, `project` is not needed.
* Ark now copies tags from volumes to snapshots during backup, and from snapshots to new volumes during restore.
### Breaking Changes:
* Ark has moved back to a single namespace (`heptio-ark` by default) as part of #383.
### All New Features:
* Add global `--kubecontext` flag to Ark CLI (#296, @blakebarnett)
* Azure: support cross-resource group restores of volumes (#356, #378, @skriss)
* AWS/Azure/GCP: copy tags from volumes to snapshots, and from snapshots to volumes (#341, @skriss)
* Replace finalizer for backup deletion with `DeleteBackupRequest` custom resource & controller (#383, #431, @ncdc, @nrb)
* Don't log warnings during restore if an identical object already exists in the cluster (#405, @nrb)
* Add bash & zsh completion support (#384, @containscafeine)
### Bug Fixes / Other Changes:
* Error from the Ark CLI if attempting to restore a non-existent backup (#302, @ncdc)
* Enable running the Ark server locally for development purposes (#334, @ncdc)
* Add examples to `ark schedule create` documentation (#331, @lypht)
* GCP: Remove `project` requirement from Ark config (#345, @skriss)
* Add `--from-backup` flag to `ark restore create` and allow custom restore names (#342, #409, @skriss)
* Azure: remove `location` requirement from Ark config (#344, @skriss)
* Add documentation/examples for storing backups in IBM Cloud Object Storage (#321, @roytman)
* Reduce verbosity of hooks logging (#362, @skriss)
* AWS: Add minimal IAM policy to documentation (#363, #419, @hopkinsth)
* Don't restore events (#374, @sanketjpatel)
* Azure: reduce API polling interval from 60s to 5s (#359, @skriss)
* Switch from hostPath to emptyDir volume type for minio example (#386, @containscafeine)
* Add limit ranges as a prioritized resource for restores (#392, @containscafeine)
* AWS: Add documentation on using Ark with kube2iam (#402, @domderen)
* Azure: add node selector so Ark pod is scheduled on a linux node (#415, @ffd2subroutine)
* Error from the Ark CLI if attempting to get logs for a non-existent restore (#391, @containscafeine)
* GCP: Add minimal IAM policy to documentation (#429, @skriss, @jody-frankowski)
### Upgrading from v0.7.1:
Ark v0.7.1 moved the Ark server deployment into a separate namespace, `heptio-ark-server`. As of v0.8.0 we've
returned to a single namespace, `heptio-ark`, for all Ark-related resources. If you're currently running v0.7.1,
here are the steps you can take to upgrade:
1. Execute the steps from the **Credentials and configuration** section for your cloud:
* Ark now has support for backing up and restoring Kubernetes volumes using a free open-source backup tool called [restic](https://github.com/restic/restic).
This provides users an out-of-the-box solution for backing up and restoring almost any type of Kubernetes volume, whether or not it has snapshot support
integrated with Ark. For more information, see the [documentation](https://github.com/heptio/ark/blob/master/docs/restic.md).
* Support for Prometheus metrics has been added! View total number of backup attempts (including success or failure), total backup size in bytes, and backup
durations. More metrics coming in future releases!
### All New Features:
* Add restic support (#508, #532, #533, #534, #535, #537, #540, #541, #545, #546, #547, #548, #555, #557, #561, #563, #569, #570, #571, #606, #608, #610, #621, #631, #636, @skriss)
- We've added a new command, `velero install`, to make it easier to get up and running with Velero. This CLI command replaces the static YAML installation files that were previously part of release tarballs. See the updated [install instructions][3] for more information.
- We've made a number of improvements to the plugin framework:
- we've reorganized the relevant packages to minimize the import surface for plugin authors
- all plugins are now wrapped in panic handlers that will report information on panics back to Velero
- Velero's `--log-level` flag is now passed to plugin implementations
- Errors logged within plugins are now annotated with the file/line of where the error occurred
- Restore item actions can now optionally return a list of additional related items that should be restored
- Restore item actions can now indicate that an item *should not* be restored
- For Azure installation, the `cloud-credentials` secret can now be created from a file containing a list of environment variables. Note that `velero install` always uses this method of providing credentials for Azure. For more details, see [Run on Azure][0].
- We've added a new phase, `PartiallyFailed`, for both backups and restores. This new phase is used for backups/restores that successfully process some but not all of their items.
- We removed all legacy Ark references, including API types, prometheus metrics, restic & hook annotations, etc.
- The restic integration remains a **beta feature**. Please continue to try it out and provide feedback, and we'll be working over the next couple of releases to bring it to GA.
### Breaking Changes
#### API
* All legacy Ark data types and pre-1.0 compatibility code have been removed. Users should migrate any backups created pre-v0.11.0 with the `velero migrate-backups` command, available in [v0.11.1][2].
#### Image
* The base container image has been switched to `ubuntu:bionic`
#### Labels/Annotations/Metrics
* The "ark" annotations for specifying hooks are no longer supported, and have been replaced with "velero"-based equivalents.
* The "ark" annotation for specifying restic backups is no longer supported, and has been replaced with a "velero"-based equivalent.
* The "ark" prometheus metrics no longer exist, and have been replaced with "velero"-based equivalents.
#### Plugin Development
* `BlockStore` plugins are now named `VolumeSnapshotter` plugins
* Plugin APIs have moved to reduce the import surface:
* Plugin gRPC servers live in `github.com/heptio/velero/pkg/plugin/framework`
* Plugin interface types live in `github.com/heptio/velero/pkg/plugin/velero`
* RestoreItemAction interface now takes the original item from the backup as a parameter
* RestoreItemAction plugins can now return additional items to restore
* RestoreItemAction plugins can now skip restoring an item
* Plugins may now send stack traces with errors to the Velero server, so that the errors may be put into the server log
* Plugins must now be "namespaced," using `example.domain.com/plugin-name` format
* For external ObjectStore and VolumeSnapshotter plugins, this name will also be the provider name in BackupStorageLocation and VolumeSnapshotLocation objects
* `--log-level` flag is now passed to all plugins
#### Validation
* Configs for Azure, AWS, and GCP are now checked for invalid or extra keys, and the server is halted if any are found
To upgrade from a previous version of Velero, see our [upgrade instructions][1].
### All Changes
* Change base images to ubuntu:bionic (#1488, @skriss)
* Expose the timestamp of the last successful backup in a gauge (#1448, @fabito)
* check backup existence before download (#1447, @fabito)
* Use `latest` image tag if no version information is provided at build time (#1439, @nrb)
* switch from `restic stats` to `restic snapshots` for checking restic repository existence (#1416, @skriss)
* GCP: add optional 'project' config to volume snapshot location for cases where snapshots are in a different project than the IAM account (#1405, @skriss)
* Disallow bucket names starting with '-' (#1407, @nrb)
* Shorten label values when they're longer than 63 characters (#1392, @anshulc)
* Fail backup if it already exists in object storage. (#1390, @ncdc, @carlisia)
* Add PartiallyFailed phase for backups, log + continue on errors during backup process (#1386, @skriss)
* Remove deprecated "hooks" for backups (they've been replaced by "pre hooks") (#1384, @skriss)
* Restic repo ensurer: return error if new repository does not become ready within a minute, and fix channel closing/deletion (#1367, @skriss)
* Support non-namespaced names for built-in plugins (#1366, @nrb)
* Change container base images to debian:stretch-slim and upgrade to go 1.12 (#1365, @skriss)
* Azure: allow credentials to be provided in a .env file (path specified by $AZURE_CREDENTIALS_FILE), formatted like (#1364, @skriss):
```
AZURE_TENANT_ID=${AZURE_TENANT_ID}
AZURE_SUBSCRIPTION_ID=${AZURE_SUBSCRIPTION_ID}
AZURE_CLIENT_ID=${AZURE_CLIENT_ID}
AZURE_CLIENT_SECRET=${AZURE_CLIENT_SECRET}
AZURE_RESOURCE_GROUP=${AZURE_RESOURCE_GROUP}
```
* Instantiate the plugin manager with the per-restore logger so plugin logs are captured in the per-restore log (#1358, @skriss)
* Add gauge metrics for number of existing backups and restores (#1353, @fabito)
* Set default TTL for backups (#1352, @vorar)
* Validate that there are no duplicate plugin names, and that the name format is `example.io/name`. (#1339, @carlisia)
* AWS/Azure/GCP: fail fast if unsupported keys are provided in BackupStorageLocation/VolumeSnapshotLocation config (#1338, @skriss)
* `velero backup logs` & `velero restore logs`: show helpful error message if backup/restore does not exist or is not finished processing (#1337, @skriss)
* Add support for allowing a RestoreItemAction to skip item restore. (#1336, @sseago)
* Improve error message around invalid S3 URLs, and gracefully handle trailing backslashes. (#1331, @skriss)
* Set backup's start timestamp before patching it to InProgress so start times display in `velero backup get` while in progress (#1330, @skriss)
* Added ability to dynamically disable controllers (#1326, @amanw)
* Remove deprecated code in preparation for v1.0 release (#1323, @skriss):
- remove ark.heptio.com API group
- remove support for reading ark-backup.json files from object storage
- remove Ark field from RestoreResult type
- remove support for "hook.backup.ark.heptio.com/..." annotations for specifying hooks
- remove support for $HOME/.config/ark/ client config directory
- remove support for restoring Azure snapshots using short snapshot ID formats in backup metadata
- stop applying "velero-restore" label to restored resources and remove it from the API pkg
- remove code that strips the "gc.ark.heptio.com" finalizer from backups
- remove support for "backup.ark.heptio.com/..." annotations for requesting restic backups
- remove "ark"-prefixed prometheus metrics
- remove VolumeBackups field and related code from Backup's status
* Rename BlockStore plugin to VolumeSnapshotter (#1321, @skriss)
* Bump plugin ProtocolVersion to version 2 (#1319, @carlisia)
* Remove Warning field from restore item action output (#1318, @skriss)
* Fix for #1312, use describe to determine if AWS EBS snapshot is encrypted and explicitly pass that value in EC2 CreateVolume call. (#1316, @mstump)
* Allow restic restore helper image name to be optionally specified via ConfigMap (#1311, @skriss)
* Compile only once to lower the initialization cost for regexp.MustCompile. (#1306, @pei0804)
* Enable restore item actions to return additional related items to be restored; have pods return PVCs and PVCs return PVs (#1304, @skriss)
* Log error locations from plugin logger, and don't overwrite them in the client logger if they exist already (#1301, @skriss)
* Send stack traces from plugin errors to Velero via gRPC so error location info can be logged (#1300, @skriss)
* Azure: restore volumes in the original region's zone (#1298, @sylr)
* Check for and exclude hostPath-based persistent volumes from restic backup (#1297, @skriss)
* Make resticrepositories non-restorable resources (#1296, @skriss)
* Gracefully handle failed API groups from the discovery API (#1293, @fabito)
* Add `velero install` command for basic use cases. (#1287, @nrb)
* Collect 3 new metrics: backup_deletion_{attempt|failure|success}_total (#1280, @fabito)
* Pass --log-level flag to internal/external plugins, matching Velero server's log level (#1278, @skriss)
* AWS EBS Volume IDs now contain AZ (#1274, @tsturzl)
* Add panic handlers to all server-side plugin methods (#1270, @skriss)
* Move all the interfaces and associated types necessary to implement all of the Velero plugins to under the new package `velero`. (#1264, @carlisia)
* Update `velero restore` to not open every single file during extraction of the data (#1261, @asaf)
* Remove restore code that waits for a PV to become Available (#1254, @skriss)
* Improve `describe` output (#1248, @DheerajSShetty):
  - Move Phase to directly under Metadata (name/namespace/labels/annotations)
  - Move the Validation errors section to right after the Phase section, and only show it if the item's phase is FailedValidation
  - For restores, move Warnings and Errors under Validation errors; leave their display as is
* Don't remove storage class from a persistent volume when restoring it (#1246, @skriss)
* Need to defer closing the ReadCloser in ObjectStoreGRPCServer.GetObject (#1236, @DheerajSShetty)
* Update Kubernetes dependencies to match v1.12, and update Azure SDK to v19.0.0 (GA) (#1231, @skriss)
* Remove pkg/util/collections/map_utils.go, replace with structured API types and apimachinery's unstructured helpers (#1146, @skriss)
* Add original resource (from backup) to restore item action interface (#1123, @mwieczorek)
**If you are running Velero in a non-default namespace**, i.e. any namespace other than `velero`, manual intervention is required when upgrading to v1.1. See [upgrading to v1.1](https://velero.io/docs/v1.1.0/upgrade-to-1.1/) for details.
### Highlights
#### Improved Restic Support
A big focus of our work this cycle was continuing to improve support for restic. To that end, we’ve fixed the following bugs:
- Prior to version 1.1, restic backups could be delayed or failed due to long-lived locks on the repository. Now, Velero removes stale locks from restic repositories every 5 minutes, ensuring they do not interrupt normal operations.
- Previously, the PodVolumeBackup custom resources that represented a restic backup within a cluster were not synchronized between clusters, making it unclear what restic volumes were available to restore into a new cluster. In version 1.1, these resources are synced into clusters, so they are more visible to you when you are trying to restore volumes.
- Originally, Velero would not validate the host path in which volumes were mounted on a given node. If a node did not expose the filesystem correctly, you wouldn’t know about it until a backup failed. Now, Velero’s restic server will validate that the directory structure is correct on startup, providing earlier feedback when it’s not.
- Velero’s restic support is intended to work on a broad range of volume types. With the general release of the [Container Storage Interface API](https://kubernetes.io/blog/2019/01/15/container-storage-interface-ga/), Velero can now use restic to back up CSI volumes.
Along with our bug fixes, we’ve provided an easier way to move restic backups between storage providers. Different providers often have different StorageClasses, requiring user intervention to make restores successfully complete.
To make cross-provider moves simpler, we’ve introduced a StorageClass remapping plug-in. It allows you to automatically translate one StorageClass on PersistentVolumeClaims and PersistentVolumes to another. You can read more about it in our [documentation](https://velero.io/docs/v1.1.0/restore-reference/#changing-pv-pvc-storage-classes).
#### Quality-of-Life Improvements
We’ve also made several other enhancements to Velero that should benefit all users.
Users sometimes ask about recommendations for Velero’s resource allocation within their cluster. To help with this concern, we’ve added default resource requirements to the Velero Deployment and restic init containers, along with configurable requests and limits for the restic DaemonSet. All these values can be adjusted if your environment requires it.
We’ve also taken some time to improve Velero for the future by updating the Deployment and DaemonSet to use the apps/v1 API group, which will be the [default in Kubernetes 1.16](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.16.md#action-required-3). This change means that `velero install` and the `velero plugin` commands will require Kubernetes 1.9 or later to work. Existing Velero installs will continue to work without needing changes, however.
In order to help you better understand what resources have been backed up, we’ve added a list of resources in the `velero backup describe --details` command. This change makes it easier to inspect a backup without having to download and extract it.
In the same vein, we’ve added the ability to put custom tags on cloud-provider snapshots. This approach should provide a better way to keep track of the resources being created in your cloud account. To add a label to a snapshot at backup time, use the `--labels` argument in the `velero backup create` command.
Our final change for increasing visibility into your Velero installation is the `velero plugin get` command. This command will report all the plug-ins within the Velero deployment.
Velero has previously used a restore-only flag on the server to control whether a cluster could write backups to object storage. With Velero 1.1, we’ve now moved the restore-only behavior into read-only BackupStorageLocations. This move means that the Velero server can use a BackupStorageLocation as a source to restore from, but not for backups, while still retaining the ability to back up to other configured locations. In the future, the `--restore-only` flag will be removed in favor of configuring read-only BackupStorageLocations.
#### Community Contributions
We appreciate all community contributions, whether they be pull requests, bug reports, feature requests, or just questions. With this release, we wanted to draw attention to a few contributions in particular:
For users of node-based IAM authentication systems such as kube2iam, `velero install` now supports the `--pod-annotations` argument for applying necessary annotations at install time. This support should make `velero install` more flexible for scenarios that do not use Secrets for access to their cloud buckets and volumes. You can read more about how to use this new argument in our [AWS documentation](https://velero.io/docs/v1.1.0/aws-config/#alternative-setup-permissions-using-kube2iam). Huge thanks to [Traci Kamp](https://github.com/tlkamp) for this contribution.
Structured logging is important for any application, and Velero is no different. Starting with version 1.1, the Velero server can now output its logs in a JSON format, allowing easier parsing and ingestion. Thank you to [Donovan Carthew](https://github.com/carthewd) for this feature.
AWS supports multiple profiles for accessing object storage, but in the past Velero only used the default. With v1.1, you can set the `profile` key on your BackupStorageLocation to specify an alternate profile. If no profile is set, the default one is used, making this change backward compatible. Thanks [Pranav Gaikwad](https://github.com/pranavgaikwad) for this change.
Finally, thanks to testing by [Dylan Murray](https://github.com/dymurray) and [Scott Seago](https://github.com/sseago), an issue with running Velero in non-default namespaces was found in our beta version for this release. If you’re running Velero in a namespace other than `velero`, please follow the [upgrade instructions](https://velero.io/docs/v1.1.0/upgrade-to-1.1/).
### All Changes
* Add the prefix to BSL config map so that object stores can use it when initializing (#1767, @betta1)
* Use `VELERO_NAMESPACE` to determine what namespace Velero server is running in. For any v1.0 installations using a different namespace, the `VELERO_NAMESPACE` environment variable will need to be set to the correct namespace. (#1748, @nrb)
* support setting CPU/memory requests with unbounded limits using velero install (#1745, @prydonius)
* sort output of resource list in `velero backup describe --details` (#1741, @prydonius)
* adds the ability to define custom tags to be added to snapshots by specifying custom labels on the Backup CR with the velero backup create --labels flag (#1729, @prydonius)
* Restore restic volumes from PodVolumeBackups CRs (#1723, @carlisia)
* properly restore PVs backed up with restic and a reclaim policy of "Retain" (#1713, @skriss)
* Make `--secret-file` flag on `velero install` optional, add `--no-secret` flag for explicit confirmation (#1699, @nrb)
* Add low cpu/memory limits to the restic init container. This allows for restoration into namespaces with quotas defined. (#1677, @nrb)
* Adds configurable CPU/memory requests and limits to the restic DaemonSet generated by velero install. (#1710, @prydonius)
* remove any stale locks from restic repositories every 5m (#1708, @skriss)
* error if backup storage location's Bucket field also contains a prefix, and gracefully handle leading/trailing slashes on Bucket and Prefix fields. (#1694, @skriss)
* Adds configurable CPU/memory requests and limits to the Velero Deployment generated by velero install. (#1678, @prydonius)
* Store restic PodVolumeBackups in obj storage & use that as source of truth like regular backups. (#1577, @carlisia)
* Update Velero Deployment to use apps/v1 API group. `velero install` and `velero plugin add/remove` commands will now require Kubernetes 1.9+ (#1673, @nrb)
* Respect the --kubecontext and --kubeconfig arguments for `velero install`. (#1656, @nrb)
* add plugin for updating PV & PVC storage classes on restore based on a config map (#1621, @skriss)
* Add restic support for CSI volumes (#1615, @nrb)
* enhancement: allow users to specify additional Velero/Restic pod annotations on the command line with the pod-annotations flag. (#1626, @tlkamp)
* adds validation for pod volumes hostPath mount on restic server startup (#1616, @prydonius)
* enable support for ppc64le architecture (#1605, @prajyot)
* bug fix: only restore additional items returned from restore item actions if they match the restore's namespace/resource selectors (#1612, @skriss)
* add startTimestamp and completionTimestamp to PodVolumeBackup and PodVolumeRestore status fields (#1609, @prydonius)
* bug fix: respect namespace selector when determining which restore item actions to run (#1607, @skriss)
* ensure correct backup item actions run with namespace selector (#1601, @prydonius)
* allows excluding resources from backups with the `velero.io/exclude-from-backup=true` label (#1588, @prydonius)
* ensures backup item action modifications to an item's namespace/name are saved in the file path in the tarball (#1587, @prydonius)
* Hides `velero server` and `velero restic server` commands from the list of available commands as these are not intended for use by the velero CLI user. (#1561, @prydonius)
* remove dependency on glog, update to klog (#1559, @skriss)
* move issue-template-gen from docs/ to hack/ (#1558, @skriss)
* fix panic when processing DeleteBackupRequest objects without labels (#1556, @prydonius)
* support for multiple AWS profiles (#1548, @pranavgaikwad)
* Add CLI command to list (get) all Velero plugins (#1535, @carlisia)
* Added author as a tag on blog post. Should fix 404 error when trying to follow link as specified in issue #1522. (#1522, @coonsd)
* Allow individual backup storage locations to be read-only (#1517, @skriss)
* Stop returning an error when a restic volume is empty since it is a valid scenario. (#1480, @carlisia)
* add ability to use wildcard in includes/excludes (#1428, @guilhem)
# Design proposal template (replace with your proposal's title)
Status: {Draft,Accepted,Declined}
One to two sentences that describes the goal of this proposal.
The reader should be able to tell by the title, and the opening paragraph, if this document is relevant to them.
_Note_: The preferred style for design documents is one sentence per line.
*Do not wrap lines*.
This aids in review of the document, since changes to a line are not obscured by reflowing, and it has the side effect of avoiding debates about one or two spaces after a period.
## Goals
- A short list of things which will be accomplished by implementing this proposal.
- Two things is ok.
- Three is pushing it.
- More than three goals suggests that the proposal's scope is too large.
## Non Goals
- A short list of items which are:
- a. out of scope
  - b. follow-on items which are deliberately excluded from this proposal.
## Background
One to two paragraphs of exposition to set the context for this proposal.
## High-Level Design
One to two paragraphs that describe the high level changes that will be made to implement this proposal.
## Detailed Design
A detailed design describing how the changes to the product should be made.
The names of types, fields, interfaces, and methods should be agreed on here, not debated in code review.
The same applies to changes in CRDs, YAML examples, and so on.
Ideally the changes should be made in sequence so that the work required to implement this design can be done incrementally, possibly in parallel.
## Alternatives Considered
If there are alternative high level or detailed designs that were not pursued they should be called out here with a brief explanation of why they were not pursued.
## Security Considerations
If this proposal has an impact to the security of the product, its users, or data stored or transmitted via the product, they must be addressed here.
# Expose list of backed up resources in backup details
Status: Accepted
To increase the visibility of what a backup might contain, this document proposes storing metadata about backed up resources in object storage and adding a new section to the detailed backup description output to list them.
## Goals
- Include a list of backed up resources as metadata in the bucket
- Enable users to get a view of what resources are included in a backup using the Velero CLI
## Non Goals
- Expose the full manifests of the backed up resources
## Background
As reported in [#396](https://github.com/heptio/velero/issues/396), the information reported in a `velero backup describe <name> --details` command is fairly limited, and does not easily describe what resources a backup contains.
In order to see what a backup might contain, a user would have to download the backup tarball and extract it.
This makes it difficult to keep track of different backups in a cluster.
## High-Level Design
After performing a backup, a new file will be created that contains the list of the resources that have been included in the backup.
This file will be persisted in object storage alongside the backup contents and existing metadata.
A section will be added to the output of `velero backup describe <name> --details` command to view this metadata.
## Detailed Design
### Metadata file
This metadata will be in JSON (or YAML) format so that it can be easily inspected from the bucket outside of Velero tooling, and will contain the API group, version, and kind, plus the namespaces and names of the resources:
```
apps/v1/Deployment:
- default/database
- default/wordpress
v1/Service:
- default/database
- default/wordpress
v1/Secret:
- default/database-root-password
- default/database-user-password
v1/ConfigMap:
- default/database
v1/PersistentVolume:
- my-pv
```
The filename for this metadata will be `<backup name>-resource-list.json.gz`.
The top-level key is the string form of the `schema.GroupResource` type that we currently keep track of in the backup controller code path.
### Changes in Backup controller
The Backupper currently initialises a map to track the `backedUpItems` (https://github.com/heptio/velero/blob/1594bdc8d0132f548e18ffcc1db8c4cd2b042726/pkg/backup/backup.go#L269); this map is passed down through GroupBackupper, ResourceBackupper, and ItemBackupper, where ItemBackupper records each backed up item.
This property will be moved to the [Backup request struct](https://github.com/heptio/velero/blob/16910a6215cbd8f0bde385dba9879629ebcbcc28/pkg/backup/request.go#L11), allowing the BackupController to access it after a successful backup.
`backedUpItems` currently uses the `schema.GroupResource` as a key for the resource.
In order to record the API group, version and kind for the resource, this key will be constructed from the object's `schema.GroupVersionKind` in the format `{group}/{version}/{kind}` (e.g. `apps/v1/Deployment`).
The `backedUpItems` map is kept as a flat structure internally for quick lookup.
When the backup is ready to upload, `backedUpItems` will be converted to a nested structure representing the metadata file above, grouped by `schema.GroupVersionKind`.
After converting to the right format, it can be passed to the `persistBackup` function to persist the file in object storage.
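To make the conversion concrete, here is a minimal sketch; `itemKey` and its fields are hypothetical stand-ins for the actual key type used in the backup package:
```go
import "sort"

// itemKey is an illustrative stand-in for the key type of backedUpItems.
type itemKey struct {
	resource  string // "{group}/{version}/{kind}", e.g. "apps/v1/Deployment"
	namespace string // empty for cluster-scoped resources
	name      string
}

// resourceList groups the flat backedUpItems map into the nested structure
// persisted as <backup name>-resource-list.json.gz.
func resourceList(backedUpItems map[itemKey]struct{}) map[string][]string {
	resources := map[string][]string{}
	for key := range backedUpItems {
		entry := key.name
		if key.namespace != "" {
			entry = key.namespace + "/" + key.name
		}
		resources[key.resource] = append(resources[key.resource], entry)
	}
	// sort entries so the persisted file is deterministic
	for _, entries := range resources {
		sort.Strings(entries)
	}
	return resources
}
```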
### Changes to DownloadRequest CRD and processing
A new `DownloadTargetKind` "BackupResourceList" will be added to the DownloadRequest CR.
The `GetDownloadURL` function in the `persistence` package will be updated to handle this new DownloadTargetKind to enable the Velero client to fetch the metadata from the bucket.
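For illustration, a client could then request a signed URL for this file with a DownloadRequest such as the following (object names here are hypothetical):
```yaml
apiVersion: velero.io/v1
kind: DownloadRequest
metadata:
  name: backup1-resource-list
  namespace: velero
spec:
  target:
    kind: BackupResourceList
    name: backup1
```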
### Changes to `velero backup describe <name> --details`
This command will need to be updated to fetch the metadata from the bucket using the `Stream` method used in other commands.
The file will be read into memory and displayed in the output of the command.
Depending on the format the metadata is stored in, it may need processing to print in a more human-readable format.
If we choose to store the metadata in YAML, it can likely be directly printed out.
If the metadata file does not exist, this is an older backup and we cannot display the list of resources that were backed up.
## Open Questions
## Alternatives Considered
### Fetch backup contents archive and walkthrough to list contents
Instead of recording new metadata about what resources have been backed up, we could simply download the backup contents archive and walk through it to list the contents every time `velero backup describe <name> --details` is run.
The advantage of this approach is that we don't need to change any backup procedures as we already have this content, and we will also be able to list resources for older backups.
Additionally, if we wanted to expose more information about the backed up resources, we can do so without having to update what we store in the metadata.
The disadvantages are:
- downloading the whole backup archive will be larger than just downloading a smaller file with metadata
- reduces the metadata available in the bucket that users might want to inspect outside of Velero tooling (though this is not an explicit requirement)
# Feature flags
Some features may take a while to be fully implemented, and we don't necessarily want long-lived feature branches.
A simple feature flag implementation allows code to be merged into master, but not used unless a flag is set.
## Goals
- Allow unfinished features to be present in Velero releases, but only enabled when the associated flag is set.
## Non Goals
- A robust feature flag library.
## Background
When considering the [CSI integration work](https://github.com/heptio/velero/pull/1661), the timelines involved presented a problem in balancing a release and longer-running feature work.
A simple implementation of feature flags can help protect unfinished code while allowing the rest of the changes to ship.
## High-Level Design
A new command line flag, `--features` will be added to the root `velero` command.
`--features` will accept a comma-separated list of features, such as `--features EnableCSI,Replication`.
Each feature listed will correspond to a key in a map in `pkg/features/features.go` defining whether a feature should be enabled.
Any code implementing the feature would then import the map and look up the key's value.
For the Velero client, a `features` key can be added to the `config.json` file for more convenient client invocations.
## Detailed Design
A new `features` package will be introduced with these basic structs:
```go
type FeatureFlagSet struct {
	flags map[string]bool
}

type Flags interface {
	// Enabled reports whether or not the specified flag is found.
	Enabled(name string) bool

	// Enable adds the specified flags to the list of enabled flags.
	Enable(names ...string)

	// All returns all enabled features
	All() []string
}

// NewFeatureFlagSet initializes and populates a new FeatureFlagSet
```
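To illustrate how this would be wired together, here is a self-contained sketch; the stand-in implementation and flag names like `EnableCSI` are illustrative, not the final `pkg/features` code:
```go
import (
	"fmt"
	"strings"
)

// FeatureFlagSet here is a minimal stand-in so the example is self-contained.
type FeatureFlagSet struct{ flags map[string]bool }

func NewFeatureFlagSet(names ...string) FeatureFlagSet {
	fs := FeatureFlagSet{flags: map[string]bool{}}
	fs.Enable(names...)
	return fs
}

func (fs FeatureFlagSet) Enabled(name string) bool { return fs.flags[name] }

func (fs FeatureFlagSet) Enable(names ...string) {
	for _, n := range names {
		if n != "" {
			fs.flags[n] = true
		}
	}
}

// enableFromFlag parses a --features value such as "EnableCSI,Replication".
func enableFromFlag(value string) FeatureFlagSet {
	return NewFeatureFlagSet(strings.Split(value, ",")...)
}

// maybeBackupCSIVolumes gates a hypothetical code path on the EnableCSI flag.
func maybeBackupCSIVolumes(flags FeatureFlagSet) {
	if flags.Enabled("EnableCSI") {
		fmt.Println("running CSI-specific backup logic")
	}
}
```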
# Cloning persistent volumes when remapping namespaces
Velero supports restoring resources into different namespaces than they were backed up from.
This enables a user to, among other things, clone a namespace within a cluster.
However, if the namespace being cloned uses persistent volume claims, Velero cannot currently create a second copy of the original persistent volume when restoring.
This limitation is documented in detail in [issue #192](https://github.com/heptio/velero/issues/192).
This document proposes a solution that allows new copies of persistent volumes to be created during a namespace clone.
## Goals
- Enable persistent volumes to be cloned when using `velero restore create --namespace-mappings ...` to create a second copy of a namespace within a cluster.
## Non Goals
- Cloning of persistent volumes in any scenario other than when using `velero restore create --namespace-mappings ...` flag.
## High-Level Design
During a restore, Velero will detect that it needs to assign a new name to a persistent volume being restored if and only if both of the following conditions are met:
- the persistent volume is claimed by a persistent volume claim in a namespace that's being remapped using `velero restore create --namespace-mappings ...`
- a persistent volume already exists in the cluster with the original name
If these conditions exist, Velero will give the persistent volume a new arbitrary name before restoring it.
It will also update the `spec.volumeName` of the related persistent volume claim.
## Detailed Design
In `pkg/restore/restore.go`, around [line 872](https://github.com/heptio/velero/blob/master/pkg/restore/restore.go#L872), Velero has special-case code for persistent volumes.
This code will be updated to check for the two preconditions described in the previous section.
If the preconditions are met, the object will be given a new name.
The persistent volume will also be annotated with the original name, e.g. `velero.io/original-pv-name=NAME`.
Importantly, the name change will occur **before** [line 890](https://github.com/heptio/velero/blob/master/pkg/restore/restore.go#L890), where Velero checks to see if it should restore the persistent volume.
Additionally, the old and new persistent volume names will be recorded in a new field that will be added to the `context` struct, `renamedPVs map[string]string`.
In the special-case code for persistent volume claims starting on [line 987](https://github.com/heptio/velero/blob/master/pkg/restore/restore.go#L987), Velero will check to see if the claimed persistent volume has been renamed by looking in `ctx.renamedPVs`.
If so, Velero will update the persistent volume claim's `spec.volumeName` to the new name.
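A compact sketch of the rename-and-remap bookkeeping described above (all type and helper names here are illustrative, not Velero's actual restore code):
```go
// restoreContext stands in for the restore context described above.
type restoreContext struct {
	renamedPVs         map[string]string      // old PV name -> new PV name
	remappedNamespaces map[string]bool        // from --namespace-mappings
	pvExistsInCluster  func(name string) bool // cluster lookup
}

// restoredPVName renames a PV only when both preconditions hold, recording
// the mapping; the restored PV would also be annotated with
// velero.io/original-pv-name=<old name>.
func (ctx *restoreContext) restoredPVName(pvName, claimNamespace string) string {
	if ctx.remappedNamespaces[claimNamespace] && ctx.pvExistsInCluster(pvName) {
		newName := pvName + "-clone" // Velero would generate an arbitrary unique name
		ctx.renamedPVs[pvName] = newName
		return newName
	}
	return pvName
}

// claimVolumeName points a restored PVC's spec.volumeName at the renamed PV, if any.
func (ctx *restoreContext) claimVolumeName(original string) string {
	if newName, ok := ctx.renamedPVs[original]; ok {
		return newName
	}
	return original
}
```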
## Alternatives Considered
One alternative approach is to add a new CLI flag and API field for restores, e.g. `--clone-pvs`, that a user could provide to indicate they want to create copies of persistent volumes.
This approach would work fine, but it does require the user to be aware of this flag/field and to properly specify it when needed.
It seems like a better UX to detect the typical conditions where this behavior is needed, and to automatically apply it.
Additionally, the design proposed here does not preclude such a flag/field from being added later, if it becomes necessary to cover other use cases.
# Progress reporting for restic backups and restores
Status: Accepted
During long-running restic backups/restores, there is no visibility into what (if anything) is happening, making it hard to know if the backup/restore is making progress or hung, how long the operation might take, etc.
We should capture progress during restic operations and make it user-visible so that it's easier to reason about.
This document proposes an approach for capturing progress of backup and restore operations and exposing this information to users.
## Goals
- Provide basic visibility into restic operations to inform users about their progress.
## Non Goals
- Capturing progress for non-restic backups and restores.
## Background
(Omitted, see introduction)
## High-Level Design
### restic backup progress
The `restic backup` command provides progress reporting to stdout in JSON format, which includes the completion percentage of the backup.
This progress will be read on some interval and the PodVolumeBackup Custom Resource's (CR) status will be updated with this information.
### restic restore progress
The `restic stats` command returns the total size of a backup.
This can be compared periodically with the total size of the volume to calculate the completion percentage of the restore.
The PodVolumeRestore CR's status will be updated with this information.
## Detailed Design
## Changes to PodVolumeBackup and PodVolumeRestore Status type
A new `Progress` field will be added to PodVolumeBackupStatus and PodVolumeRestoreStatus of type `PodVolumeOperationProgress`:
```go
type PodVolumeOperationProgress struct {
TotalBytes int64
BytesDone int64
}
```
### restic backup progress
restic added support for [streaming JSON output for the `restic backup` command](https://github.com/restic/restic/pull/1944) in 0.9.5.
Our current images ship restic 0.9.4, and so the Dockerfile will be updated to pull the new version: https://github.com/heptio/velero/blob/af4b9373fc73047f843cd4bc3648603d780c8b74/Dockerfile-velero#L21.
With the `--json` flag, `restic backup` outputs single lines of JSON reporting the status of the backup:
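For illustration, such a status line looks roughly like the following (the values are made up; `bytes_done` and `total_bytes` are the fields this design reads):
```json
{"message_type":"status","percent_done":0.42,"total_files":1024,"files_done":420,"total_bytes":10558111744,"bytes_done":4434406932}
```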
The [command factory for backup](https://github.com/heptio/velero/blob/af4b9373fc73047f843cd4bc3648603d780c8b74/pkg/restic/command_factory.go#L37) will be updated to include the `--json` flag.
The code to run the `restic backup` command (https://github.com/heptio/velero/blob/af4b9373fc73047f843cd4bc3648603d780c8b74/pkg/controller/pod_volume_backup_controller.go#L241) will be changed to include a Goroutine that reads from the command's stdout stream.
The implementation will largely follow [@jmontleon's PoC](https://github.com/fusor/velero/pull/4/files).
The Goroutine will periodically read the stream (every 10 seconds) and parse the last printed status line as JSON.
If `bytes_done` is empty, restic has not finished scanning the volume and hasn't calculated the `total_bytes`.
In this case, we will not update the PodVolumeBackup and instead will wait for the next iteration.
Once we get a non-zero value for `bytes_done`, the `bytes_done` and `total_bytes` properties will be read and the PodVolumeBackup will be patched to update `status.Progress.BytesDone` and `status.Progress.TotalBytes` respectively.
Once the backup has completed successfully, the PodVolumeBackup will be patched to set `status.Progress.BytesDone = status.Progress.TotalBytes`.
This is done since the main thread may cause early termination of the Goroutine once the operation has finished, preventing a final update to the `BytesDone` property.
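A minimal sketch of that Goroutine, assuming restic's stdout is a stream of JSON status lines as above (this is illustrative, not the actual controller code):
```go
import (
	"bufio"
	"encoding/json"
	"io"
	"time"
)

// resticStatus holds the fields of restic's --json status lines that this
// proposal reads; all other fields are ignored.
type resticStatus struct {
	MessageType string `json:"message_type"`
	TotalBytes  int64  `json:"total_bytes"`
	BytesDone   int64  `json:"bytes_done"`
}

// tailProgress scans stdout line by line and invokes patch at most once per
// interval, skipping updates until restic reports a non-zero bytes_done.
func tailProgress(stdout io.Reader, interval time.Duration, patch func(done, total int64)) {
	scanner := bufio.NewScanner(stdout)
	var lastPatch time.Time
	for scanner.Scan() {
		var s resticStatus
		if err := json.Unmarshal(scanner.Bytes(), &s); err != nil || s.MessageType != "status" {
			continue
		}
		// restic has not finished scanning the volume yet; wait for the
		// next status line instead of patching with incomplete numbers.
		if s.BytesDone == 0 {
			continue
		}
		if time.Since(lastPatch) >= interval {
			patch(s.BytesDone, s.TotalBytes)
			lastPatch = time.Now()
		}
	}
}
```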
### restic restore progress
The `restic stats <snapshot_id> --json` command provides information about the size of backups:
```json
{"total_size":10558111744,"total_file_count":11}
```
Before beginning the restore operation, we can use the output of `restic stats` to get the total size of the backup.
The PodVolumeRestore will be patched to set `status.Progress.TotalBytes` to the total size of the backup.
The code to run the `restic restore` command will be changed to include a Goroutine that periodically (every 10 seconds) gets the current size of the volume.
To get the current size of the volume, we will recursively walk through all files in the volume and accumulate their total size.
The current total size is the number of bytes transferred so far, and the PodVolumeRestore will be patched to update `status.Progress.BytesDone`.
Once the restore has completed successfully, the PodVolumeRestore will be patched to set `status.Progress.BytesDone = status.Progress.TotalBytes`.
This is done since the main thread may cause early termination of the Goroutine once the operation has finished, preventing a final update to the `BytesDone` property.
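The size calculation can be sketched as a plain filesystem walk (an illustrative helper, not Velero's actual code):
```go
import (
	"os"
	"path/filepath"
)

// volumeSize accumulates the size of all regular files under path.
func volumeSize(path string) (int64, error) {
	var total int64
	err := filepath.Walk(path, func(_ string, info os.FileInfo, err error) error {
		if err != nil {
			return err
		}
		if info.Mode().IsRegular() {
			total += info.Size()
		}
		return nil
	})
	return total, err
}
```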
### Velero CLI changes
The output that describes detailed information about [PodVolumeBackups](https://github.com/heptio/velero/blob/559d62a2ec99f7a522924348fc4a173a0699813a/pkg/cmd/util/output/backup_describer.go#L349) and [PodVolumeRestores](https://github.com/heptio/velero/blob/559d62a2ec99f7a522924348fc4a173a0699813a/pkg/cmd/util/output/restore_describer.go#L160) will be updated to calculate and display a completion percentage from `status.Progress.TotalBytes` and `status.Progress.BytesDone` if available.
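The displayed percentage is then a guarded division over the new `Progress` fields, e.g. this minimal sketch:
```go
import "fmt"

// completionPercent formats Progress for the describe output; it guards
// against TotalBytes being zero or not yet set.
func completionPercent(p PodVolumeOperationProgress) string {
	if p.TotalBytes <= 0 {
		return "N/A"
	}
	return fmt.Sprintf("%.2f%%", float64(p.BytesDone)/float64(p.TotalBytes)*100)
}
```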
## Open Questions
- Can we assume that the volume we are restoring in will be empty? Can it contain other artefacts?
- Based on discussion in this PR, we are okay making the assumption that the PVC is empty and will proceed with the above proposed approach.
## Alternatives Considered
### restic restore progress
If we cannot assume that the volume we are restoring into will be empty, we can instead use the output from `restic snapshot` to get the list of files in the backup.
This can then be used to calculate the current total size of just those files in the volume, so that we avoid considering any other files unrelated to the backup.
The above proposed approach is simpler than this one, as we don't need to keep track of each file in the backup, but this alternative would be more robust if the volume can contain other files not included in the backup.
It's possible that certain volume types may contain hidden files that could contribute to the total size of the volume, though these should be small enough that the BytesDone calculation will only be slightly inflated.
Another option is to contribute progress reporting similar to `restic backup` for `restic restore` upstream.
This may take more time, but would give us a more native view on the progress of a restore.
There are several issues about this already in the restic repo (https://github.com/restic/restic/issues/426, https://github.com/restic/restic/issues/1154), and what looks like an abandoned attempt (https://github.com/restic/restic/pull/2003) which we may be able to pick up.
This directory contains sample YAML config files that can be used for exploring Velero.
* `common/`: Contains manifests to set up Ark. Can be used across cloud provider platforms. (Note that Azure requires its own deployment file due to its unique way of loading credentials.)
*`minio/`: Used in the [Quickstart][0] to set up [Minio][1], a local S3-compatible object storage service. It provides a convenient way to test Velero without tying you to a specific cloud provider.
* `nginx-app/`: A sample nginx app that can be used to test backups and restores.
* `aws/`, `azure/`, `gcp/`: Contains manifests specific to the given cloud provider's setup.
# replace known version-specific links -- the sed syntax is slightly different in OS X and Linux,
# so check which OS we're running on.
if [[ $(uname) == "Darwin" ]]; then
    echo "[OS X] updating version-specific links"
    find site/docs/${NEW_DOCS_VERSION} -type f -name "*.md" | xargs sed -i '' "s|https://velero.io/docs/master|https://velero.io/docs/$NEW_DOCS_VERSION|g"
    find site/docs/${NEW_DOCS_VERSION} -type f -name "*.md" | xargs sed -i '' "s|https://github.com/heptio/velero/blob/master|https://github.com/heptio/velero/blob/$NEW_DOCS_VERSION|g"

    echo "[OS X] Updating latest version in _config.yml"
    sed -i '' "s/latest: ${PREVIOUS_DOCS_VERSION}/latest: ${NEW_DOCS_VERSION}/" site/_config.yml

    # newlines and lack of indentation are requirements for this sed syntax
    # which is doing an append
    echo "[OS X] Adding latest version to versions list in _config.yml"
    sed -i '' "/- master/a\\
- ${NEW_DOCS_VERSION}
" site/_config.yml

    echo "[OS X] Adding ToC mapping entry"
    sed -i '' "/master: master-toc/a\\
${NEW_DOCS_VERSION}: ${NEW_DOCS_TOC}
" site/_data/toc-mapping.yml
else
    echo "[Linux] updating version-specific links"
    find site/docs/${NEW_DOCS_VERSION} -type f -name "*.md" | xargs sed -i'' "s|https://velero.io/docs/master|https://velero.io/docs/$NEW_DOCS_VERSION|g"
    find site/docs/${NEW_DOCS_VERSION} -type f -name "*.md" | xargs sed -i'' "s|https://github.com/heptio/velero/blob/master|https://github.com/heptio/velero/blob/$NEW_DOCS_VERSION|g"

    echo "[Linux] Updating latest version in _config.yml"
    sed -i'' "s/latest: ${PREVIOUS_DOCS_VERSION}/latest: ${NEW_DOCS_VERSION}/" site/_config.yml

    echo "[Linux] Adding latest version to versions list in _config.yml"
    sed -i'' "/- master/a - ${NEW_DOCS_VERSION}" site/_config.yml

    echo "[Linux] Adding ToC mapping entry"
    sed -i'' "/master: master-toc/a ${NEW_DOCS_VERSION}: ${NEW_DOCS_TOC}" site/_data/toc-mapping.yml
fi

echo "Success! site/docs/$NEW_DOCS_VERSION has been created."
echo ""
echo "The next steps are:"
echo " 1. Consult site/README-JEKYLL.md for further manual steps required to finalize the new versioned docs generation."
echo " 2. Run a 'git diff' to review all changes made to the docs since the previous version."
echo " 3. Make any manual changes/corrections necessary."
echo " 4. Run 'git add' to stage all unstaged changes, then 'git commit'."
// PodVolumeBackupList is a list of PodVolumeBackups.
type PodVolumeBackupList struct {
	metav1.TypeMeta `json:",inline"`
	metav1.ListMeta `json:"metadata"`

	Items []PodVolumeBackup `json:"items"`
}