# Proxy Setup Instructions

This doc covers procedures to configure, build and deploy the
[Netty](https://netty.io)-based proxy onto [Kubernetes](https://kubernetes.io)
clusters.
[Google Kubernetes Engine](https://cloud.google.com/kubernetes-engine/) is used
as the deployment target. Any Kubernetes cluster should in theory work, but the
user needs to change some dependencies on other GCP features such as Cloud KMS
for key management and Stackdriver for monitoring.

## Overview

Nomulus runs on GKE, and natively only supports HTTP(S) traffic. In order to
work with [EPP](https://tools.ietf.org/html/rfc5730.html) (TCP port 700), a
proxy is needed to relay traffic between clients and Nomulus and do protocol
translation.

We provide a Netty-based proxy that runs as a standalone service (separate from
Nomulus) either on a VM or on Kubernetes clusters. Deploying to Kubernetes is
recommended as it provides automatic scaling and management for Docker
containers that alleviates much of the pain of running a production service.

The procedure described here can be used to set up a production environment, as
most of the steps only need to be configured once for each environment.
However, proper release management (cutting a release, rolling updates, canary
analysis, reliable rollback, etc.) is not covered. The user is advised to use a
service like [Spinnaker](https://www.spinnaker.io/) for release management.

## Detailed Instructions

We use [`gcloud`](https://cloud.google.com/sdk/gcloud/) and
[`terraform`](https://terraform.io) to configure the proxy project on GCP and to
create a GCS bucket for storing the terraform state file. We use
[`kubectl`](https://kubernetes.io/docs/tasks/tools/install-kubectl/) to deploy
the proxy to the project. These instructions assume that all three tools are
installed.

### Setup GCP project

There are three projects involved:

- Nomulus project: the project that hosts Nomulus.
- Proxy project: the project that hosts this proxy.
- GCR
  ([Google Container Registry](https://cloud.google.com/container-registry/))
  project: the project from which the proxy pulls its Docker image.

We recommend using the same project for Nomulus and the proxy, so that logs for
both are collected in the same place and easily accessible. If there are
multiple Nomulus projects (environments), such as production, sandbox, alpha,
etc., it is recommended to use just one as the GCR project. This way the same
proxy images are deployed to each environment, and what is running in production
is the same image that was tested in sandbox.

The rest of this document outlines the procedure to set up the proxy for one
environment.

In the proxy project, create a GCS bucket to store the terraform state file:

```bash
$ gcloud auth login # only if you haven't run gcloud before.
$ gcloud storage buckets create gs://<bucket-name>/ --project <proxy-project>
```

### Obtain a domain and SSL certificate

The proxy exposes one endpoint: `epp.<yourdomain.tld>`. The base domain
`<yourdomain.tld>` needs to be obtained from a registrar (RIP to
[Google Domains](https://domains.google/)). Nomulus operators can also
self-allocate a domain in the TLDs under management.

[EPP protocol over TCP](https://tools.ietf.org/html/rfc5734) requires a
client-authenticated SSL connection. The operator of the proxy needs to obtain
an SSL certificate for the domain `epp.<yourdomain.tld>`.
[Let's Encrypt](https://letsencrypt.org) offers SSL certificates free of charge,
but any other CA can fill the role.

Concatenate the certificate and its private key into one file:

```bash
$ cat <certificate.pem> <private.key> > <combined_secret.pem>
```

The order between the certificate and the private key inside the combined file
does not matter. However, if the certificate file is chained, i.e. it contains
not only the certificate for your domain, but also certificates from
intermediate CAs, these certificates must appear in order: each certificate's
issuer must be the subject of the certificate that follows it.

The certificate will be encrypted by KMS and uploaded to a GCS bucket. The
bucket will be created automatically by terraform.

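
The ordering rule for chained certificates can be checked before the file is
encrypted. The following is a minimal sketch and not part of the Nomulus
tooling: `check_chain_order` is a hypothetical helper that reads the
`subject=`/`issuer=` lines printed by `openssl pkcs7 -print_certs -noout` and
confirms that each certificate's issuer matches the next certificate's subject.
The sample input at the end is made up for illustration.

```bash
# Hypothetical helper: verifies that every certificate's issuer equals the
# subject of the certificate that follows it. Input is read from stdin.
check_chain_order() {
  local prev_issuer="" subject="" line
  while IFS= read -r line; do
    case "$line" in
      subject=*)
        subject="${line#subject=}"
        if [ -n "$prev_issuer" ] && [ "$prev_issuer" != "$subject" ]; then
          echo "out of order: expected subject '$prev_issuer', got '$subject'"
          return 1
        fi
        ;;
      issuer=*)
        prev_issuer="${line#issuer=}"
        ;;
    esac
  done
  echo "chain order ok"
}

# With a real chained certificate file (assumes openssl is installed):
#   openssl crl2pkcs7 -nocrl -certfile <certificate.pem> \
#     | openssl pkcs7 -print_certs -noout | check_chain_order

# Demo with made-up subject/issuer lines:
printf 'subject=CN = epp.example.tld\nissuer=CN = Intermediate CA\n\nsubject=CN = Intermediate CA\nissuer=CN = Root CA\n' \
  | check_chain_order
```

Run the check on the certificate file alone, before the private key is
concatenated to it.
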
### Setup proxy project

First set up the
[Application Default Credential](https://cloud.google.com/docs/authentication/production)
locally:

```bash
$ gcloud auth application-default login
```

Log in with an account that has the "Project Owner" role on all three projects
mentioned above.

Navigate to `proxy/terraform`, create a folder called `envs`, and inside it,
create a folder for the environment that the proxy is deployed to ("alpha" for
example). Copy `example_config.tf` and `outputs.tf` to the environment folder.

```bash
$ cd proxy/terraform
$ mkdir -p envs/alpha
$ cp example_config.tf envs/alpha/config.tf
$ cp outputs.tf envs/alpha
```

Now go to the environment folder, edit the `config.tf` file and replace
placeholders with actual project and domain names.

Run terraform:

```bash
$ cd envs/alpha
(edit config.tf)
$ terraform init -upgrade
$ terraform apply
```

Go over the proposed changes, and answer "yes". Terraform will start configuring
the projects, including setting up clusters, keyrings, load balancer, etc. This
takes a couple of minutes.

### Setup Nomulus

After terraform completes, it outputs some information, among which is the email
address of the service account created for the proxy. This needs to be added to
the Nomulus configuration file so that Nomulus accepts traffic from the proxy.
Edit the following section in
`core/src/main/java/google/registry/config/files/nomulus-config-<env>.yaml` and
redeploy Nomulus:

```yaml
auth:
  allowedServiceAccountEmails:
    - <email address>
```

### Setup nameservers

The terraform output (run `terraform output` in the environment folder to show
it again) also shows the nameservers of the proxy domain (`<yourdomain.tld>`).
Delegate this domain to these nameservers (through your registrar). If the
domain is self-allocated by Nomulus, run:

```bash
$ nomulus -e production update_domain <yourdomain.tld> \
    -c <registrar_client_name> -n <nameserver1>,<nameserver2>,...
```

### Setup named ports

Unfortunately, terraform currently cannot add named ports on the instance groups
of the GKE clusters it manages.
[Named ports](https://cloud.google.com/compute/docs/load-balancing/http/backend-service#named_ports)
are needed for the load balancer it sets up to route traffic to the proxy. To
set the named ports, run the following in the environment folder:

```bash
$ bash ../../update_named_ports.sh
```

### Encrypt the certificate to Cloud KMS

With the newly set up Cloud KMS key, encrypt the certificate/key combo file
created earlier:

```bash
$ gcloud kms encrypt --plaintext-file <combined_secret.pem> \
    --ciphertext-file - --key <key-name> --keyring <keyring-name> --location \
    global | base64 > <combined_secret.pem.enc>
```

This encrypted file is then uploaded to a GCS bucket specified in the
`config.tf` file.

```bash
$ gcloud storage cp <combined_secret.pem.enc> gs://<your-certificate-bucket>
```

### Edit proxy config file

Proxy configuration files are at
`proxy/src/main/java/google/registry/proxy/config/`. There is a default config
that provides most values needed to run the proxy, and several
environment-specific configs for proxy instances that communicate with different
Nomulus environments. The values specified in the environment-specific file
override those in the default file.

The values that need to be changed include the project name, the Nomulus
endpoint, the encrypted certificate/key combo filename and the GCS bucket it is
stored in, the Cloud KMS keyring and key names, etc. Refer to the default file
for detailed descriptions of each field.

### Upload proxy docker image to GCR

The GKE deployment manifest is set up to pull the proxy docker image from
[Google Container Registry](https://cloud.google.com/container-registry/) (GCR).
Instead of using `docker` and `gcloud` to build and push images, respectively,
we provide `gradle` rules for the same tasks. To push an image, first use
[`docker-credential-gcr`](https://github.com/GoogleCloudPlatform/docker-credential-gcr)
to obtain the necessary credentials. Gradle uses them to push the image.

After credentials are configured, verify that Gradle will use the proper
`gcpProject` for deployment in the main `build.gradle` file. We recommend using
the same project and image for proxies intended for different Nomulus
environments; this way one can deploy the same proxy image first to sandbox for
testing, and then to production.

To push to GCR, run:

```bash
$ ./gradlew proxy:pushProxyImage
```

If the GCP project that hosts the images (GCR project) is different from the
project that the proxy runs in (proxy project), grant the proxy service account
the "Storage Object Viewer" role on the GCR project.

```bash
$ gcloud projects add-iam-policy-binding <image-project> \
    --member serviceAccount:<service-account-email> \
    --role roles/storage.objectViewer
```

### Deploy proxy

Terraform by default creates three clusters, in the Americas, EMEA, and APAC,
respectively. We will have to deploy to each cluster separately. The cluster
information is shown by `terraform output` as well.

Deployment is defined in two files, `proxy-deployment-<env>.yaml` and
`proxy-service.yaml`. Edit `proxy-deployment-<env>.yaml` for your environment
and fill in the GCR project name and image name. You can also change the
arguments in the file to turn on logging, for example. To deploy to a cluster:

```bash
# Get credentials to deploy to a cluster.
$ gcloud container clusters get-credentials --project <proxy-project> \
    --zone <cluster-zone> <cluster-name>

# Deploys environment specific kubernetes objects.
$ kubectl create -f \
    proxy/kubernetes/proxy-deployment-<env>.yaml

# Deploys shared kubernetes objects.
$ kubectl create -f \
    proxy/kubernetes/proxy-service.yaml
```

Repeat this for all three clusters.

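
The per-cluster steps above can be scripted. The sketch below is illustrative
only: `print_deploy_commands` is a hypothetical helper, and the project name,
cluster names and zones in the example call are placeholders rather than values
produced by the Terraform config (use the ones shown by `terraform output`). It
prints the commands instead of running them, so they can be reviewed first.

```bash
# Hypothetical helper: prints the deployment commands for every cluster.
# Usage: print_deploy_commands <proxy-project> <env> <cluster-name>:<cluster-zone>...
print_deploy_commands() {
  local project="$1" env="$2"
  shift 2
  local entry name zone
  for entry in "$@"; do
    name="${entry%%:*}"   # text before the first colon
    zone="${entry#*:}"    # text after the first colon
    echo "gcloud container clusters get-credentials --project ${project} --zone ${zone} ${name}"
    echo "kubectl create -f proxy/kubernetes/proxy-deployment-${env}.yaml"
    echo "kubectl create -f proxy/kubernetes/proxy-service.yaml"
  done
}

# Placeholder project, cluster names and zones:
print_deploy_commands my-proxy-project alpha \
  example-cluster-americas:us-east4-a \
  example-cluster-emea:europe-west4-b \
  example-cluster-apac:asia-northeast1-a
```

Once the printed commands look right, the output can be piped to `bash` to
execute them in order.
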
### Afterwork

Remember to turn on
[Stackdriver Monitoring](https://cloud.google.com/monitoring/docs/) for the
proxy project as we use it to collect metrics from the proxy.

You are done! The proxy should be running now. You should store the private key
safely, or delete it as you now have the encrypted file shipped with the proxy.
See "Additional steps" at the end of this document for other things to check.

## Appendix

Here we give detailed instructions on how to configure a GCP project to host the
proxy manually. We strongly recommend against doing so because it is tedious and
error-prone. Using Terraform is much easier. The following instructions are for
educational purposes, so that readers understand why we set up the
infrastructure this way. The Terraform config is essentially a translation of
the following procedure.

### Set default project

The proxy can run on its own GCP project, or use the existing project that also
hosts Nomulus. We recommend initializing the
[`gcloud`](https://cloud.google.com/sdk/gcloud/) config to use that project as
default, as it avoids having to provide the `--project` flag for every `gcloud`
command:

```bash
$ gcloud init
```

Follow the prompt and choose the project you want to deploy the proxy to. You
can skip picking default region and zones, as we will explicitly create clusters
in multiple zones to provide geographical redundancy.

### Create service account

The proxy will run with the credential of a
[service account](https://cloud.google.com/compute/docs/access/service-accounts).
In theory, it can take advantage of
[Application Default Credentials](https://cloud.google.com/docs/authentication/production)
and use the service account that the GCE instance underpinning the GKE cluster
uses, but we recommend creating a separate service account. With a dedicated
service account, one can grant only the permissions that the proxy needs. To
create a service account:

```bash
$ gcloud iam service-accounts create proxy-service-account \
    --display-name "Service account for Nomulus proxy"
```

Generate a `.json` key file for the newly created service account. The key file
contains the secret necessary to construct credentials of the service account
and needs to be stored safely (it should be deleted later).

```bash
$ gcloud iam service-accounts keys create proxy-key.json --iam-account \
    <service-account-email>
```

A `proxy-key.json` file will be created inside the current working directory.

The service account email address needs to be added to the Nomulus configuration
file so that Nomulus accepts the OAuth tokens generated for this service
account. Add its value to
`core/src/main/java/google/registry/config/files/nomulus-config-<env>.yaml`:

```yaml
auth:
  allowedServiceAccountEmails:
    - <email address>
```

Redeploy Nomulus for the change to take effect.

Also bind the "Logs Writer" role to the proxy service account so that it can
write logs to [Stackdriver Logging](https://cloud.google.com/logging/).

```bash
$ gcloud projects add-iam-policy-binding <project-id> \
    --member serviceAccount:<service-account-email> \
    --role roles/logging.logWriter
```

### Create keyring and encrypt the certificate/private key

The proxy needs access to both the private key and the certificate. Do *not*
package them directly with the proxy. Instead, use
[Cloud KMS](https://cloud.google.com/kms/) to encrypt them, ship the encrypted
file with the proxy, and call Cloud KMS to decrypt them on the fly. (If you want
to use another keyring solution, you will have to modify the proxy and implement
yours.)

Concatenate the private key file with the certificate. It does not matter which
file is appended to which. However, if the certificate file is a chained `.pem`
file, make sure that the certificates appear in order, i.e. the issuer of one
certificate is the subject of the next certificate:

```bash
$ cat <private-key.key> <chained-certificates.pem> > ssl-cert-key.pem
```

Create a keyring and a key in Cloud KMS, and use the key to encrypt the combined
file:

```bash
# create keyring
$ gcloud kms keyrings create <keyring-name> --location global

# create key
$ gcloud kms keys create <key-name> --purpose encryption --location global \
    --keyring <keyring-name>

# encryption using the key
$ gcloud kms encrypt --plaintext-file ssl-cert-key.pem \
    --ciphertext-file ssl-cert-key.pem.enc \
    --key <key-name> --keyring <keyring-name> --location global
```

A file named `ssl-cert-key.pem.enc` will be created. Upload it to a GCS bucket
in the proxy project. To create a bucket and upload the file:

```bash
$ gcloud storage buckets create gs://<bucket-name> --project <proxy-project>
$ gcloud storage cp ssl-cert-key.pem.enc gs://<bucket-name>
```

The proxy service account needs the "Cloud KMS CryptoKey Decrypter" role to
decrypt the file using Cloud KMS:

```bash
$ gcloud projects add-iam-policy-binding <project-id> \
    --member serviceAccount:<service-account-email> \
    --role roles/cloudkms.cryptoKeyDecrypter
```

The service account also needs the "Storage Object Viewer" role to retrieve the
encrypted file from GCS:

```bash
$ gcloud storage buckets add-iam-policy-binding gs://<bucket-name> \
    --member=serviceAccount:<service-account-email> \
    --role=roles/storage.objectViewer
```

### Proxy configuration

Proxy configuration files are at
`proxy/src/main/java/google/registry/proxy/config/`. There is a default config
that provides most values needed to run the proxy, and several
environment-specific configs for proxy instances that communicate with different
Nomulus environments. The values specified in the environment-specific file
override those in the default file.

The values that need to be changed include the project name, the Nomulus
endpoint, the encrypted certificate/key combo filename (`ssl-cert-key.pem.enc`
in the above example), the Cloud KMS keyring and key names, etc. Refer to the
default file for detailed descriptions of each field.

### Setup Stackdriver for the project

The proxy streams metrics to
[Stackdriver](https://cloud.google.com/stackdriver/). Refer to
[Stackdriver Monitoring](https://cloud.google.com/monitoring/docs/)
documentation on how to enable monitoring on the GCP project.

The proxy service account needs to have the
["Monitoring Metric Writer"](https://cloud.google.com/monitoring/access-control#predefined_roles)
role in order to stream metrics to Stackdriver:

```bash
$ gcloud projects add-iam-policy-binding <project-id> \
    --member serviceAccount:<service-account-email> \
    --role roles/monitoring.metricWriter
```

### Create GKE clusters

We recommend creating several clusters in different zones for better
geographical redundancy and better network performance, for example clusters in
the Americas, EMEA and APAC. It is also a good idea to enable
[autorepair](https://cloud.google.com/kubernetes-engine/docs/concepts/node-auto-repair),
[autoupgrade](https://cloud.google.com/kubernetes-engine/docs/concepts/node-auto-upgrades),
and
[autoscaling](https://cloud.google.com/kubernetes-engine/docs/concepts/cluster-autoscaler)
on the clusters.

The default Kubernetes version on GKE is usually old; consider specifying a
newer version when creating the cluster, to save time upgrading the nodes
immediately after.

```bash
$ gcloud container clusters create proxy-americas-cluster --enable-autorepair \
    --enable-autoupgrade --enable-autoscaling --max-nodes=3 --min-nodes=1 \
    --zone=us-east1-c --tags=proxy-cluster \
    --service-account=<service-account-email>
```

We give the GCE instances inside the cluster the same credential as the proxy
service account, which makes it easier to limit permissions granted to service
accounts. If we use the default GCE service account, we'd have to grant the
default GCE service account permission to read from GCR in order to download
images of the proxy to create pods, which gives *any* GCE instance with the
default service account that permission.

Note the `--tags` flag: it will apply the tag to all GCE instances running in
the cluster, making it easier to set up firewall rules later on. Use the same
tag for all clusters.

Repeat this for all the zones you want to create clusters in.

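
Creating the same cluster in several zones lends itself to a loop. The sketch
below is a hedged illustration: `print_cluster_commands` is a hypothetical
helper, the EMEA and APAC cluster names and zones are assumptions chosen to
mirror the Americas example, and the service account email is a placeholder.
The flags are the ones used in the command above.

```bash
# Hypothetical helper: prints one cluster creation command per
# <cluster-name>:<zone> argument, all sharing the proxy-cluster tag.
print_cluster_commands() {
  local sa_email="$1"
  shift
  local entry name zone
  for entry in "$@"; do
    name="${entry%%:*}"
    zone="${entry#*:}"
    echo "gcloud container clusters create ${name} --enable-autorepair --enable-autoupgrade --enable-autoscaling --max-nodes=3 --min-nodes=1 --zone=${zone} --tags=proxy-cluster --service-account=${sa_email}"
  done
}

# Only the first cluster comes from the text; the other two are assumed.
print_cluster_commands proxy-service-account@example-project.iam.gserviceaccount.com \
  proxy-americas-cluster:us-east1-c \
  proxy-emea-cluster:europe-west4-b \
  proxy-apac-cluster:asia-northeast1-c
```

Keeping the tag identical across clusters is what lets a single firewall rule
cover all of them later.
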
### Upload proxy service account key to GKE cluster

The kubernetes pods (containers) are configured to read the proxy service
account key file from a secret resource stored in the cluster.

First set the cluster credential in `gcloud` so that `kubectl` knows which
cluster to manage:

```bash
$ gcloud container clusters get-credentials proxy-americas-cluster \
    --zone us-east1-c
```

To upload the key file as `service-account-key.json` in a secret named
`service-account`:

```bash
$ kubectl create secret generic service-account \
    --from-file=service-account-key.json=<service-account-key.json>
```

More details on using service accounts on GKE can be found
[here](https://cloud.google.com/kubernetes-engine/docs/tutorials/authenticating-to-cloud-platform).

Repeat the same step for all clusters you want to deploy to. Use `gcloud` to
switch context, and then `kubectl` to upload the key.

### Deploy proxy to GKE clusters

Use `kubectl` to create the deployment and autoscale objects:

```bash
$ kubectl create -f \
    proxy/kubernetes/proxy-deployment-alpha.yaml
```

The kubernetes
[deployment](https://kubernetes.io/docs/concepts/workloads/controllers/deployment/)
object specifies the image to run, along with its parameters. The
[autoscale](https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/)
object changes the number of pods running based on CPU load. This is different
from GKE cluster autoscaling, which changes the number of nodes (VMs) running
based on pod resource requests. Ideally, if there is no load, just one pod will
be running in each cluster, resulting in only one node running as well, saving
resources.

Repeat the same step for all clusters you want to deploy to.

### Expose the proxy service

The proxies running on GKE clusters need to be exposed to the outside. Do not
use Kubernetes
[`LoadBalancer`](https://kubernetes.io/docs/concepts/services-networking/service/#type-loadbalancer).
It will create a GCP
[Network Load Balancer](https://cloud.google.com/compute/docs/load-balancing/network/),
which has several problems:

- This load balancer does not terminate TCP connections. It simply acts as an
  edge router that forwards IP packets to a "healthy" node in the cluster. As
  such, it does not support IPv6, because GCE instances themselves are
  currently IPv4 only.
- IP packets that arrive at the node may be routed to another node for
  reasons of capacity and availability. In doing so it will
  [SNAT](https://en.wikipedia.org/wiki/Network_address_translation#SNAT) the
  packet, therefore losing the source IP information that the proxy needs. The
  proxy uses the source IP address to cap QPS and passes the EPP source IP to
  Nomulus for validation. Note that a TCP terminating load balancer also has
  this problem as the source IP becomes that of the load balancer, but it can
  be addressed in other ways (explained later). See
  [here](https://kubernetes.io/docs/tutorials/services/source-ip/) for more
  details on how Kubernetes routes traffic and translates source IPs inside
  the cluster.
- Acting as an edge router, this type of load balancer can only work within a
  given region, as each GCP region forms its own subnet. Therefore multiple
  load balancers and IP addresses are needed if the proxy is to run in
  multiple regional clusters.

Instead, we split the task of exposing the proxy to the Internet into two tasks,
first to expose it within the cluster, then to expose the cluster to the outside
through a
[TCP Proxy Load Balancer](https://cloud.google.com/compute/docs/load-balancing/tcp-ssl/tcp-proxy).
This load balancer terminates TCP connections and allows for the use of a single
anycast IP address (IPv4 and IPv6) to reach any clusters connected to its
backend (it chooses a particular cluster based on geographical proximity). From
this point forward we will refer to this type of load balancer simply as the
load balancer.

#### Set up proxy NodePort service

Kubernetes pods and nodes are
[ephemeral](https://kubernetes.io/docs/concepts/services-networking/service/). A
pod may crash and be killed, and a new pod will be spun up by the master node to
fill its role. Similarly a node may be shut down due to under-utilization
(thanks to GKE autoscaling). In order to reliably route incoming traffic to the
proxy, a
[NodePort](https://kubernetes.io/docs/concepts/services-networking/service/#type-nodeport)
service is used to expose the proxy on specified port(s) on every running node
in the cluster, even if no proxy pod runs on that node (in which case the
traffic is routed to a node that has the proxy running). With a `NodePort`
service, the load balancer can always route traffic to any healthy node, and
kubernetes takes care of delivering that traffic to a servicing proxy pod.

To deploy the NodePort service:

```bash
$ kubectl create -f \
    proxy/kubernetes/proxy-service.yaml
```

This service object will open up port 30000 (health check) and 30002 (EPP) on
the nodes, routing to the same ports inside a pod.

Repeat this for all clusters.

#### Map named ports in GCE instance groups

GKE uses GCE as its underlying infrastructure. A GKE cluster (or more precisely,
a node pool) corresponds to a GCE instance group. In order to receive traffic
from a load balancer backend, an instance group needs to designate the ports
that are to receive traffic, by giving them names (i.e. making them "named
ports").

As mentioned above, the Kubernetes `NodePort` service object sets up two ports
to receive traffic (30000 and 30002). Port 30000 is used by the health check
protocol (discussed later) and does not need to be explicitly named.

First obtain the instance group names for the clusters:

```bash
$ gcloud compute instance-groups list
```

They start with `gke` and have the cluster names in them, so they should be
easy to spot.

Then set the named ports:

```bash
$ gcloud compute instance-groups set-named-ports <instance-group> \
    --named-ports epp:30002 --zone <zone>
```

Repeat this for each instance group (cluster).

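
The two steps above (list the groups, then name the port on each) can be
combined. This is a minimal sketch under stated assumptions:
`print_named_port_commands` is a hypothetical helper that expects
`<instance-group> <zone>` pairs on stdin, and the `--format` expression in the
comment is one assumed way to produce that input. The group name in the demo is
made up.

```bash
# Hypothetical helper: reads "<instance-group> <zone>" pairs from stdin and
# prints a set-named-ports command for each group whose name starts with "gke".
print_named_port_commands() {
  local group zone
  while read -r group zone; do
    case "$group" in
      gke*)
        echo "gcloud compute instance-groups set-named-ports ${group} --named-ports epp:30002 --zone ${zone}"
        ;;
    esac
  done
}

# Assumed way to feed it real data (requires an authenticated gcloud):
#   gcloud compute instance-groups list \
#     --format="value(name,zone.basename())" | print_named_port_commands

# Demo with made-up input; the second group is skipped because it is not a GKE group.
printf 'gke-proxy-americas-cluster-default-pool-0000-grp\tus-east1-c\nunrelated-group\tus-east1-b\n' \
  | print_named_port_commands
```
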
#### Set up firewall rules to allow traffic from the load balancer

By default, inbound traffic from the load balancer is dropped by the GCE
firewall. A new firewall rule needs to be added to explicitly allow TCP packets
originating from the load balancer to the ports opened by the `NodePort`
service on the nodes.

```bash
$ gcloud compute firewall-rules create proxy-loadbalancer \
    --source-ranges 130.211.0.0/22,35.191.0.0/16 \
    --target-tags proxy-cluster \
    --allow tcp:30000,tcp:30001,tcp:30002
```

The target tag controls which GCE VMs can receive the traffic allowed by this
rule. It is the same tag used during cluster creation. Since we use the same tag
for all clusters, this rule applies to all VMs running the proxy. The load
balancer source IP ranges are taken from
[here](https://cloud.google.com/compute/docs/load-balancing/tcp-ssl/tcp-proxy#config-hc-firewall).

#### Create health check

The load balancer sends TCP requests to a designated port on each backend VM to
probe if the VM is healthy to serve traffic. The proxy by default uses port
30000 (which is exposed as the same port on the node) for health check and
returns a pre-configured response (`HEALTH_CHECK_RESPONSE`) when an expected
request (`HEALTH_CHECK_REQUEST`) is received. To add the health check:

```bash
$ gcloud compute health-checks create tcp proxy-health \
    --description "Health check on port 30000 for Nomulus proxy" \
    --port 30000 --request "HEALTH_CHECK_REQUEST" --response "HEALTH_CHECK_RESPONSE"
```

#### Create load balancer backend

The load balancer backend configures what instance groups the load balancer
sends packets to. We have already set up a `NodePort` service on each node in
all the clusters to ensure that traffic to any of the exposed node ports will be
routed to the corresponding port on a proxy pod. The backend service codifies
which ports on the clusters' nodes should receive traffic from the load
balancer.

Create a backend service for EPP:

```bash
# EPP backend
$ gcloud compute backend-services create proxy-epp-loadbalancer \
    --global --protocol TCP --health-checks proxy-health --timeout 1h \
    --port-name epp
```

This backend service routes packets to the EPP named port on any instance group
attached to it.

Then add (attach) the instance groups that the proxies run on to the backend
service:

```bash
# EPP backend
$ gcloud compute backend-services add-backend proxy-epp-loadbalancer \
    --global --instance-group <instance-group> --instance-group-zone <zone> \
    --balancing-mode UTILIZATION --max-utilization 0.8
```

Repeat this for each instance group.

#### Reserve static IP addresses for the load balancer frontend

These are the public IP addresses that receive all outside traffic. We need one
address for IPv4 and one for IPv6:

```bash
# IPv4
$ gcloud compute addresses create proxy-ipv4 \
    --description "Global static anycast IPv4 address for Nomulus proxy" \
    --ip-version IPV4 --global

# IPv6
$ gcloud compute addresses create proxy-ipv6 \
    --description "Global static anycast IPv6 address for Nomulus proxy" \
    --ip-version IPV6 --global
```

To check the IP addresses obtained:

```bash
$ gcloud compute addresses describe proxy-ipv4 --global
$ gcloud compute addresses describe proxy-ipv6 --global
```

Set these IP addresses as the A/AAAA records for `epp.<nic.tld>`, where
`<nic.tld>` is the domain that was obtained earlier. (If you use
[Cloud DNS](https://cloud.google.com/dns/) as your DNS provider, this step can
also be performed by `gcloud`.)

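
For operators on Cloud DNS, the record creation might look like the following.
This is a sketch, not a tested procedure: `print_dns_commands` is a
hypothetical helper, the managed zone name, domain, TTL of 300 seconds and the
documentation-range IP addresses are all assumptions, and it presumes an
installed SDK that provides `gcloud dns record-sets create`.

```bash
# Hypothetical helper: prints the commands that create the A and AAAA records
# for epp.<domain> in a Cloud DNS managed zone.
# Usage: print_dns_commands <managed-zone> <domain> <ipv4-address> <ipv6-address>
print_dns_commands() {
  local zone="$1" domain="$2" ipv4="$3" ipv6="$4"
  echo "gcloud dns record-sets create epp.${domain}. --zone ${zone} --type A --ttl 300 --rrdatas ${ipv4}"
  echo "gcloud dns record-sets create epp.${domain}. --zone ${zone} --type AAAA --ttl 300 --rrdatas ${ipv6}"
}

# Placeholder zone, domain and addresses; substitute the addresses reported by
# `gcloud compute addresses describe`.
print_dns_commands example-managed-zone example.tld 203.0.113.10 2001:db8::10
```
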
#### Create load balancer frontend

The frontend receives traffic from the Internet and routes it to the backend
service.

First create a TCP proxy (yes, it is confusing, this GCP resource is called
"proxy" as well) which is a TCP termination point. Outside connections terminate
on a TCP proxy, which establishes its own connection to the backend services
defined above. As such, the source IP address from the outside is lost. But the
TCP proxy can add the
[PROXY protocol header](https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt)
at the beginning of the connection to the backend. The proxy running on the
backend can parse the header and obtain the original source IP address of a
request.

```bash
# EPP
$ gcloud compute target-tcp-proxies create proxy-epp-proxy \
    --backend-service proxy-epp-loadbalancer --proxy-header PROXY_V1
```

Note the use of the `--proxy-header` flag, which turns on the PROXY protocol
header.

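
To make the PROXY protocol header concrete: in version 1 it is a single text
line of the form `PROXY TCP4 <source-ip> <destination-ip> <source-port>
<destination-port>` terminated by CRLF. The snippet below only illustrates that
layout. It is not how the Netty-based proxy parses the header, and
`proxy_v1_source_ip` is a hypothetical name.

```bash
# Hypothetical helper: extracts the client (source) IP address from a PROXY
# protocol v1 header line passed as the first argument.
proxy_v1_source_ip() {
  local tag proto src rest
  # Drop the trailing carriage return, then split the line into fields.
  read -r tag proto src rest <<EOF
$(printf '%s' "$1" | tr -d '\r')
EOF
  if [ "$tag" != "PROXY" ]; then
    echo "not a PROXY v1 header" >&2
    return 1
  fi
  case "$proto" in
    TCP4|TCP6) echo "$src" ;;
    *) echo "unknown" ;;   # "PROXY UNKNOWN" carries no usable addresses
  esac
}

# Demo with documentation-range addresses:
proxy_v1_source_ip "PROXY TCP4 192.0.2.7 198.51.100.1 56324 700"
```
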
Next, create the forwarding rule that routes outside traffic to a given IP to
the TCP proxy just created:

```bash
$ gcloud compute forwarding-rules create proxy-epp-ipv4 \
    --global --target-tcp-proxy proxy-epp-proxy \
    --address proxy-ipv4 --ports 700
```

The above command sets up a forwarding rule that routes traffic destined to the
static IPv4 address reserved earlier, on port 700 (actual port for EPP), to the
TCP proxy that connects to the EPP backend service.

Repeat the above command to set up IPv6 forwarding for EPP.

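
Both forwarding rules differ only in the IP version, so they can be generated
together. In this sketch the rule name `proxy-epp-ipv6` is an assumption made
by analogy with `proxy-epp-ipv4`; the address names `proxy-ipv4` and
`proxy-ipv6` are the ones reserved earlier. `print_forwarding_rule_commands` is
a hypothetical helper that prints the commands for review.

```bash
# Hypothetical helper: prints the forwarding rule commands for both IP versions.
print_forwarding_rule_commands() {
  local version
  for version in ipv4 ipv6; do
    echo "gcloud compute forwarding-rules create proxy-epp-${version} --global --target-tcp-proxy proxy-epp-proxy --address proxy-${version} --ports 700"
  done
}

print_forwarding_rule_commands
```
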
## Additional steps

### Check logs and metrics

The proxy saves logs to
[Stackdriver Logging](https://cloud.google.com/logging/), which is the same
place that Nomulus saves its logs to. On GCP console, navigate to Logging -
Logs - GKE Container - <cluster name> - default. Do not choose "All
namespace_id" as it includes logs from the Kubernetes system itself and can be
quite overwhelming.

Metrics are stored in
[Stackdriver Monitoring](https://cloud.google.com/monitoring/docs/). To view the
metrics, go to Stackdriver [console](https://app.google.stackdriver.com) (also
accessible from GCE console under Monitoring), navigate to Resources - Metrics
Explorer. Choose resource type "GKE Container" and search for metrics with name
"/proxy/" in it. Currently available metrics include total connection counts,
active connection count, request/response count, request/response size,
round-trip latency and quota rejection count.

### Cleanup sensitive files

Delete the service account key file and the SSL certificate private key, or
store them in some secure location.