* Restore CAPI cluster objects in a better order
Restoring CAPI workload clusters without this ordering caused the
capi-controller-manager code to panic, resulting in an unhealthy cluster
state.
This can be worked around
(https://community.pivotal.io/s/article/5000e00001pJyN41611954332537?language=en_US),
but we now include these resources in the default restore order to
provide a better out-of-the-box experience.
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
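For context, Velero restores resources in the order given by an ordered priority list (configurable via the server's `--restore-resource-priorities` flag). A minimal sketch of extending that list; the entries below, including the Cluster API resource names, are illustrative assumptions rather than the exact list from this change:

```go
package main

import "fmt"

// defaultRestorePriorities sketches Velero's ordered restore list: resources
// are restored in this order, and anything not listed follows afterwards.
// The Cluster API entries are illustrative assumptions, not the exact names
// added by this commit.
var defaultRestorePriorities = []string{
	"customresourcedefinitions",
	"namespaces",
	"persistentvolumes",
	"persistentvolumeclaims",
	"secrets",
	"configmaps",
	// Restore the owning Cluster API objects before the resources that
	// reference them, so capi-controller-manager never reconciles an orphan.
	"clusters.cluster.x-k8s.io",
	"clusterresourcesets.addons.cluster.x-k8s.io",
}

func main() {
	for i, r := range defaultRestorePriorities {
		fmt.Printf("%d: %s\n", i, r)
	}
}
```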
* Add changelog
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
* Use pod namespace from backup when matching PVBs (#3475)
* Use pod namespace from backup when matching PVBs
In #3051, we introduced an additional check to ensure that a PVB matched
a particular pod by checking both the name and the namespace of the pod.
This caused an issue when using a namespace mapping on restore: the
check for whether a PVB matches a particular pod would fail, because the
PVB was created for the original pod namespace and is not aware of the
new namespace mapping being used. This resulted in PVRs not being
created for pods that were being restored into new namespaces. The
restic init containers were still created to wait on the volume restore;
however, this caused the restored pods to block indefinitely, waiting
for a volume restore that was never scheduled.
To fix this, we use the original namespace of the pod from the backup to
match the PVB to the pod being restored, not the new namespace into
which the pod is being restored.
Fixes #3467.
Signed-off-by: Bridget McErlean <bmcerlean@vmware.com>
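A minimal sketch of the corrected matching logic; the type and helper below are simplified stand-ins for Velero's actual PodVolumeBackup handling, not its real code:

```go
package main

import "fmt"

// PodVolumeBackup is a simplified stand-in for Velero's PVB custom resource;
// it records the pod it was created for at backup time.
type PodVolumeBackup struct {
	PodName      string
	PodNamespace string // the namespace the pod had in the backup
}

// pvbMatchesPod reports whether pvb was created for the given pod. Crucially,
// originalNamespace is the pod's namespace from the backup, not the (possibly
// remapped) namespace it is being restored into: the PVB was written before
// any namespace mapping existed, so comparing against the restore namespace
// would never match.
func pvbMatchesPod(pvb PodVolumeBackup, podName, originalNamespace string) bool {
	return pvb.PodName == podName && pvb.PodNamespace == originalNamespace
}

func main() {
	pvb := PodVolumeBackup{PodName: "app-0", PodNamespace: "ns-old"}
	// Restoring with a namespace mapping ns-old -> ns-new: match on ns-old.
	fmt.Println(pvbMatchesPod(pvb, "app-0", "ns-old")) // true
	fmt.Println(pvbMatchesPod(pvb, "app-0", "ns-new")) // false: why PVRs were skipped
}
```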
* Explain why the namespace mapping can't be used
Signed-off-by: Bridget McErlean <bmcerlean@vmware.com>
* Allow Dockerfiles to be configurable (#3634)
For internal builds of Velero, we need to be able to specify an
alternative Dockerfile which uses an alternative image registry to pull
the base images from. This change adapts our Makefile such that both the
main Dockerfile and build image Dockerfile can be overridden.
We have special handling for the build image so that it is only rebuilt
when its Dockerfile has changed. We now check whether a custom
Dockerfile has been provided and, if so, always rebuild. For custom
build image Dockerfiles, we use a fixed tag rather than one based on the
commit SHA of the original file.
Signed-off-by: Bridget McErlean <bmcerlean@vmware.com>
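A sketch of the kind of Makefile wiring this enables; the variable names and tag value are assumptions for illustration, not the exact ones added:

```makefile
# Default Dockerfiles; internal builds can override either on the command line,
# e.g. `make container DOCKERFILE=internal/Dockerfile`.
DOCKERFILE ?= Dockerfile
BUILDER_IMAGE_DOCKERFILE ?= hack/build-image/Dockerfile
VERSION ?= main

ifeq ($(BUILDER_IMAGE_DOCKERFILE), hack/build-image/Dockerfile)
# Stock build image: tag by the Dockerfile's last commit SHA so the image is
# only rebuilt when the file actually changes.
BUILDER_IMAGE_TAG := $(shell git log -1 --pretty=%h $(BUILDER_IMAGE_DOCKERFILE))
else
# Custom build image: use a fixed tag and always rebuild.
BUILDER_IMAGE_TAG := custom
endif

container:
	docker build -f $(DOCKERFILE) -t velero:$(VERSION) .
```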
* Combine CRD install verification into 1 job, and update k8s versions (#3448)
* Validate CRDs against latest Kubernetes versions
Add Kubernetes v1.19 and v1.20 series images, and consolidate the job
into a single file to reduce repetition.
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
* Ignore job if the changes are only site/design
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
* Fix codespell error
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
* Cache Velero binary for reuse on workers
This will cache the Velero binary based on the PR number and a SHA256 of
the generated binary.
This way, the runners testing each version of Kubernetes do not need to
build it independently.
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
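A hedged sketch of the caching step using actions/cache; the path and key format below are assumptions for illustration, not the workflow's actual values:

```yaml
# Sketch: cache the built binary so the per-Kubernetes-version runners can
# download it instead of rebuilding. Keyed on the PR number plus a hash of
# the generated binary, per the description above.
- uses: actions/cache@v2
  with:
    path: _output/bin/linux/amd64/velero   # assumed output path
    key: velero-${{ github.event.pull_request.number }}-${{ hashFiles('_output/bin/linux/amd64/velero') }}
```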
* Fix GitHub event access
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
* Wrap output path in quotes
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
* Move code checkout to build step
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
* Also cache go modules
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
* Fix syntax issues
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
* Download cached binary on each node
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
* Use cached go modules on main CI
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
* Add changelog for v1.5.4
Signed-off-by: Bridget McErlean <bmcerlean@vmware.com>
Co-authored-by: Nolan Brubaker <brubakern@vmware.com>
* Exec hooks in restored pods
Signed-off-by: Andrew Reed <andrew@replicated.com>
* WaitExecHookHandler implements ItemHookHandler
This required adding a context.Context argument to the ItemHookHandler
interface, which is unused by the DefaultItemHookHandler implementation.
It also means passing nil for the []ResourceHook argument, since that
holds BackupResourceHooks.
Signed-off-by: Andrew Reed <andrew@replicated.com>
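Later commits in this series revert to a separate WaitExecHookHandler interface; a simplified sketch of that eventual shape (the names and signatures below are assumptions, not Velero's actual definitions):

```go
// Package hooks is a hypothetical package name for illustration.
package hooks

import (
	"context"

	corev1 "k8s.io/api/core/v1"
)

// PodExecRestoreHook is a simplified stand-in for a post-restore exec hook.
type PodExecRestoreHook struct {
	Container string   // container to exec in; defaults to the pod's first container
	Command   []string // command to run once the container is running
}

// WaitExecHookHandler waits for restored containers to start and then execs
// the configured hooks. The context bounds the total wait, so a hook whose
// container never runs does not block the restore forever.
type WaitExecHookHandler interface {
	HandleHooks(ctx context.Context, pod *corev1.Pod, hooks []PodExecRestoreHook) []error
}
```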
* WaitExecHookHandler unit tests
Signed-off-by: Andrew Reed <andrew@replicated.com>
* Changelog and go fmt
Signed-off-by: Andrew Reed <andrew@replicated.com>
* Fix double import
Signed-off-by: Andrew Reed <andrew@replicated.com>
* Default to first container in pod
Signed-off-by: Andrew Reed <andrew@replicated.com>
* Use constants for hook error modes in tests
Signed-off-by: Andrew Reed <andrew@replicated.com>
* Revert to separate WaitExecHookHandler interface
Signed-off-by: Andrew Reed <andrew@replicated.com>
* Negative tests for invalid timeout annotations
Signed-off-by: Andrew Reed <andrew@replicated.com>
* Rename NamedExecRestoreHook to PodExecRestoreHook
Also make field names more descriptive.
Signed-off-by: Andrew Reed <andrew@replicated.com>
* Cleanup test names
Signed-off-by: Andrew Reed <andrew@replicated.com>
* Separate maxHookWait and add unit tests
Signed-off-by: Andrew Reed <andrew@replicated.com>
* Comment on maxWait <= 0
Also log at info level when the container that hooks should execute in
is not running.
Also add the context error to the errors for hooks that were not executed.
Signed-off-by: Andrew Reed <andrew@replicated.com>
* Remove log about default for invalid timeout
There is no default wait or exec timeout.
Signed-off-by: Andrew Reed <andrew@replicated.com>
* Linting
Signed-off-by: Andrew Reed <andrew@replicated.com>
* Fix log message and rename controller to podWatcher
Signed-off-by: Andrew Reed <andrew@replicated.com>
* Comment on exactly-once semantics for handler
Signed-off-by: Andrew Reed <andrew@replicated.com>
* Fix logging and comments
Use field logger for pod in handler.
Add comment about pod changes in unit tests.
Use kube util NamespaceAndName in messages.
Signed-off-by: Andrew Reed <andrew@replicated.com>
* Fix maxHookWait
Signed-off-by: Andrew Reed <andrew@replicated.com>
* k8s 1.18 import wip
backup, cmd, controller, generated, restic, restore, serverstatusrequest, test and util
Signed-off-by: Andrew Lavery <laverya@umich.edu>
* go mod tidy
Signed-off-by: Andrew Lavery <laverya@umich.edu>
* add changelog file
Signed-off-by: Andrew Lavery <laverya@umich.edu>
* go fmt
Signed-off-by: Andrew Lavery <laverya@umich.edu>
* update code-generator and controller-gen in CI
Signed-off-by: Andrew Lavery <laverya@umich.edu>
* checkout proper code-generator version, regen
Signed-off-by: Andrew Lavery <laverya@umich.edu>
* fix remaining calls
Signed-off-by: Andrew Lavery <laverya@umich.edu>
* regenerate CRDs with ./hack/update-generated-crd-code.sh
Signed-off-by: Andrew Lavery <laverya@umich.edu>
* use existing context in restic and server
Signed-off-by: Andrew Lavery <laverya@umich.edu>
* fix test cases by resetting resource version
also use the standard library context package, not golang.org/x/net/context, in pkg/restore/restore.go
Signed-off-by: Andrew Lavery <laverya@umich.edu>
* clarify changelog message
Signed-off-by: Andrew Lavery <laverya@umich.edu>
* use github.com/kubernetes-csi/external-snapshotter/v2@v2.2.0-rc1
Signed-off-by: Andrew Lavery <laverya@umich.edu>
* run 'go mod tidy' to remove old external-snapshotter version
Signed-off-by: Andrew Lavery <laverya@umich.edu>
* kubebuilder init - minimalist version
Signed-off-by: Carlisia <carlisia@vmware.com>
* Add back main.go; apparently kubebuilder needs it
Signed-off-by: Carlisia <carlisia@vmware.com>
* Tweak makefile to accommodate kubebuilder expectations
Signed-off-by: Carlisia <carlisia@vmware.com>
* Port BSL to kubebuilder api client
Signed-off-by: Carlisia <carlisia@vmware.com>
* s/cache/client because the client fetches from the cache
And other naming improvements
Signed-off-by: Carlisia <carlisia@vmware.com>
* So, .GetAPIReader is how we bypass the cache
In this case, the cache hasn't started yet
Signed-off-by: Carlisia <carlisia@vmware.com>
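A minimal sketch of that pattern; the function, namespace, and name below are placeholders for illustration:

```go
// Package bslinit is a hypothetical package name for illustration.
package bslinit

import (
	"context"

	velerov1api "github.com/vmware-tanzu/velero/pkg/apis/velero/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/manager"
)

// fetchBSLBeforeCacheStarts reads a BackupStorageLocation directly from the
// API server. The manager's normal client is backed by an informer cache that
// is only populated once mgr.Start() runs, so reads before then would fail;
// GetAPIReader returns a client.Reader that bypasses the cache entirely.
func fetchBSLBeforeCacheStarts(ctx context.Context, mgr manager.Manager, ns, name string) (*velerov1api.BackupStorageLocation, error) {
	location := &velerov1api.BackupStorageLocation{}
	key := client.ObjectKey{Namespace: ns, Name: name}
	if err := mgr.GetAPIReader().Get(ctx, key, location); err != nil {
		return nil, err
	}
	return location, nil
}
```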
* Oh that's what this code was for... adding back
We still need to embed the CRDs as binary data in the Velero binary to
access the generated CRDs at runtime.
Signed-off-by: Carlisia <carlisia@vmware.com>
* Tie in CRD/code generation w/ existing scripts
Signed-off-by: Carlisia <carlisia@vmware.com>
* Mostly result of running update-fmt, updated file formatting
Signed-off-by: Carlisia <carlisia@vmware.com>
* Just a copyright fix
Signed-off-by: Carlisia <carlisia@vmware.com>
* All the test fixes
Signed-off-by: Carlisia <carlisia@vmware.com>
* Add changelog + some cleanup
Signed-off-by: Carlisia <carlisia@vmware.com>
* Update backup manifest
Signed-off-by: Carlisia <carlisia@vmware.com>
* Remove unneeded auto-generated files
Signed-off-by: Carlisia <carlisia@vmware.com>
* Keep everything in the same (existing) package
Signed-off-by: Carlisia <carlisia@vmware.com>
* Fix/clean scripts, generated code, and calls
Deleting the entire `generated` directory and running `make update`
works. Modifying an API and running `make verify` works as expected.
Signed-off-by: Carlisia <carlisia@vmware.com>
* Clean up schema and client calls + code reviews
Signed-off-by: Carlisia <carlisia@vmware.com>
* Move all code gen to inside builder container
Signed-off-by: Carlisia <carlisia@vmware.com>
* Address code review
Signed-off-by: Carlisia <carlisia@vmware.com>
* Fix imports/aliases
Signed-off-by: Carlisia <carlisia@vmware.com>
* More code reviews
Signed-off-by: Carlisia <carlisia@vmware.com>
* Add waitforcachesync
Signed-off-by: Carlisia <carlisia@vmware.com>
* Have manager register ALL controllers
This will allow for proper cache management.
Signed-off-by: Carlisia <carlisia@vmware.com>
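A sketch of the registration pattern, assuming the pre-context Reconcile signature used by controller-runtime releases of that era; the reconciler type here is a placeholder, not Velero's actual controller:

```go
// Package controllers is a hypothetical package name for illustration.
package controllers

import (
	velerov1api "github.com/vmware-tanzu/velero/pkg/apis/velero/v1"
	ctrl "sigs.k8s.io/controller-runtime"
)

// BackupStorageLocationReconciler is a placeholder reconciler for this sketch.
type BackupStorageLocationReconciler struct{}

func (r *BackupStorageLocationReconciler) Reconcile(req ctrl.Request) (ctrl.Result, error) {
	// Real reconciliation logic lives in Velero's controller package.
	return ctrl.Result{}, nil
}

// SetupWithManager wires the reconciler into the shared manager. Because all
// controllers register against the same manager, they share one informer
// cache, and mgr.Start() runs them together once that cache has synced.
func (r *BackupStorageLocationReconciler) SetupWithManager(mgr ctrl.Manager) error {
	return ctrl.NewControllerManagedBy(mgr).
		For(&velerov1api.BackupStorageLocation{}).
		Complete(r)
}
```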
* Status subresource is now enabled; cleanup
Signed-off-by: Carlisia <carlisia@vmware.com>
* More code reviews
Signed-off-by: Carlisia <carlisia@vmware.com>
* Clean up
Signed-off-by: Carlisia <carlisia@vmware.com>
* Manager registers ALL controllers for restic too
Signed-off-by: Carlisia <carlisia@vmware.com>
* More code reviews
Signed-off-by: Carlisia <carlisia@vmware.com>
* Add deprecation warning/todo
Signed-off-by: Carlisia <carlisia@vmware.com>
* Add documentation
Signed-off-by: Carlisia <carlisia@vmware.com>
* Add helpful comments
Signed-off-by: Carlisia <carlisia@vmware.com>
* Address code review
Signed-off-by: Carlisia <carlisia@vmware.com>
* More idiomatic Runnable
Signed-off-by: Carlisia <carlisia@vmware.com>
* Clean up imports
Signed-off-by: Carlisia <carlisia@vmware.com>
* Switch to backing up v1beta1 CRDs from API server
Instead of simply switching out the APIVersion string on a v1
CustomResourceDefinition object, re-download the object from the API
server entirely to get the correct fields.
This should fix validation errors upon restore.
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
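A sketch of the re-download approach using the apiextensions clientset; this is a simplified illustration of the idea, not the plugin's actual code, and the context-taking Get shown here matches client-go for Kubernetes 1.18+:

```go
// Package crdremap is a hypothetical package name for illustration.
package crdremap

import (
	"context"

	apiextv1beta1 "k8s.io/apiextensions-apiserver/pkg/apis/apiextensions/v1beta1"
	apiextclient "k8s.io/apiextensions-apiserver/pkg/client/clientset/clientset"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// fetchCRDAsV1beta1 asks the API server for the CRD at v1beta1 instead of
// rewriting the apiVersion string on a v1 object. The server performs the
// real conversion, so version-specific fields come back populated correctly
// and the restored object passes validation.
func fetchCRDAsV1beta1(ctx context.Context, c apiextclient.Interface, name string) (*apiextv1beta1.CustomResourceDefinition, error) {
	return c.ApiextensionsV1beta1().CustomResourceDefinitions().Get(ctx, name, metav1.GetOptions{})
}
```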
* Fix existing tests
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
* Add full example CRDs to automated tests
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
* Move beta CRD lookup into helper function
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
* Add case for preserveUnknownFields CRDs
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
* Add PreserveUnknownFields case and refactor execute
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
* Add older prometheus CRD test cases
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
* Add changelog
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
Informers won't start if cancelFunc is invoked as soon as the newServer
function exits via the defer.
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
Account for having CSI enabled or not, as well as having the snapshot
CRDs installed in the Kubernetes cluster.
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
* Account for possible missing schemas on v1 CRDs
If a v1beta1 CRD without a schema was submitted to a Kubernetes v1.16
cluster, then Kubernetes will serve it back as a v1 CRD, still without a
schema.
However, when Velero tries to restore this document, the request will be
rejected, since a v1 CRD must have a schema.
This commit has some defensive coding on the restore side, as well as
potential fixes on the backup side for getting around this.
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
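A minimal sketch of the kind of defensive check this implies; the helper name is an assumption for illustration, not Velero's actual code:

```go
// Package crdbackup is a hypothetical package name for illustration.
package crdbackup

import (
	apiextv1 "k8s.io/apiextensions-apiserver/pkg/apis/apiextensions/v1"
)

// hasMissingSchema reports whether any version of a v1 CRD lacks an OpenAPI
// schema. Such objects can exist when a schema-less v1beta1 CRD was created
// on an older cluster, but they are invalid to submit at v1, so Velero must
// fall back to handling them as v1beta1.
func hasMissingSchema(crd *apiextv1.CustomResourceDefinition) bool {
	for _, v := range crd.Spec.Versions {
		if v.Schema == nil || v.Schema.OpenAPIV3Schema == nil {
			return true
		}
	}
	return false
}
```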
* Back up nonstructural CRDs as v1beta1
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
* Add tests for remapping plugin
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
* Add builders for v1 CRDs
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
* Address review feedback
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
* Remove extraneous log message
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
* Add changelog
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
* Wait for CRDs to be available and ready
When restoring CRDs, we should wait for the definition to be ready and
available before moving on to restoring specific CRs.
While the CRDs are often ready by the time we get to restoring a CR,
there is a race condition where the CRD isn't ready.
This change waits on each CRD at restore time.
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
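A sketch of waiting on a CRD's Established condition; the polling interval is arbitrary, the context-taking Get matches client-go for Kubernetes 1.18+, and Velero's actual implementation may differ:

```go
// Package crdwait is a hypothetical package name for illustration.
package crdwait

import (
	"context"
	"time"

	apiextv1 "k8s.io/apiextensions-apiserver/pkg/apis/apiextensions/v1"
	apiextclient "k8s.io/apiextensions-apiserver/pkg/client/clientset/clientset"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
)

// waitForCRDReady polls until the CRD reports the Established condition,
// i.e. the API server is actually serving the new resource, so restoring
// CRs of that type immediately afterwards will not race the definition.
func waitForCRDReady(ctx context.Context, c apiextclient.Interface, name string, timeout time.Duration) error {
	return wait.PollImmediate(time.Second, timeout, func() (bool, error) {
		crd, err := c.ApiextensionsV1().CustomResourceDefinitions().Get(ctx, name, metav1.GetOptions{})
		if err != nil {
			return false, err
		}
		for _, cond := range crd.Status.Conditions {
			if cond.Type == apiextv1.Established && cond.Status == apiextv1.ConditionTrue {
				return true, nil
			}
		}
		return false, nil
	})
}
```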
* update import paths to github.com/vmware-tanzu/...
Signed-off-by: Steve Kriss <krisss@vmware.com>
* update other GH org refs to vmware-tanzu
Signed-off-by: Steve Kriss <krisss@vmware.com>
* site and docs: update GH org to vmware-tanzu
Signed-off-by: Steve Kriss <krisss@vmware.com>
* update travis badge links on docs readmes
Signed-off-by: Steve Kriss <krisss@vmware.com>
The Velero deployment did not have a way of exposing the namespace it
was installed in to the API client. This is a problem for plugins that
need to query for resources in that namespace, such as the restic
restore process that needs to find PodVolume(Backup|Restore)s.
While the Velero client is consulted for a configured namespace, this
cannot be set in the server pod since there is no valid home directory
in which to place it.
This change provides the namespace to the deployment via the downward
API, and updates the API client factory to use the VELERO_NAMESPACE
before looking at the config file, so that any plugins using the client
will look at the appropriate namespace.
Fixes #1743
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
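The deployment exposes the namespace with a downward API env var (a fieldRef to metadata.namespace named VELERO_NAMESPACE). A minimal sketch of the factory's resulting lookup order; the function name and the "velero" fallback are assumptions for illustration:

```go
// Package clientcfg is a hypothetical package name for illustration.
package clientcfg

import "os"

// resolveNamespace sketches the client factory's lookup order: the
// VELERO_NAMESPACE environment variable (populated in the server pod via the
// downward API) wins over any namespace from the on-disk client config,
// which may not exist inside the pod at all.
func resolveNamespace(configFileNamespace string) string {
	if ns := os.Getenv("VELERO_NAMESPACE"); ns != "" {
		return ns
	}
	if configFileNamespace != "" {
		return configFileNamespace
	}
	return "velero" // assumed fallback to the default install namespace
}
```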