mirror of https://github.com/google/nomulus synced 2026-03-05 02:05:04 +00:00

Compare commits


2 Commits

Author SHA1 Message Date
gbrodman
72016b1e5f Update more of the documentation (#2974)
We should be at least at a "good enough" state after this -- I'm sure
there are many updates we could make that would improve the
documentation but this is definitely much improved from before and
should hopefully be good enough to get people started.
2026-03-03 20:25:30 +00:00
gbrodman
25fcef8a5b Fix typo in a command (#2973) 2026-03-02 18:15:44 +00:00
7 changed files with 463 additions and 600 deletions

View File

@@ -59,8 +59,6 @@ Nomulus has the following capabilities:
implementation that works with BIND. If you are using Google Cloud DNS, you
may need to understand its capabilities and provide your own
multi-[AS](https://en.wikipedia.org/wiki/Autonomous_system_\(Internet\)) solution.
* **[WHOIS](https://en.wikipedia.org/wiki/WHOIS)**: A text-based protocol that
returns ownership and contact information on registered domain names.
* **[Registration Data Access Protocol
(RDAP)](https://en.wikipedia.org/wiki/Registration_Data_Access_Protocol)**:
A JSON API that returns structured, machine-readable information about

View File

@@ -65,7 +65,7 @@ public class BulkDomainTransferCommand extends ConfirmingCommand implements Comm
@Parameter(
names = {"-d", "--domain_names_file"},
description = "A file with a list of newline-delimited domain names to create tokens for")
description = "A file with a list of newline-delimited domain names to transfer")
private String domainNamesFile;
@Parameter(
@@ -82,7 +82,7 @@ public class BulkDomainTransferCommand extends ConfirmingCommand implements Comm
@Parameter(
names = {"--reason"},
description = "Reason to transfer the domains",
description = "Reason to transfer the domains, possibly a bug number",
required = true)
private String reason;

View File

@@ -1,153 +1,97 @@
# Architecture
This document contains information on the overall architecture of Nomulus on
[Google Cloud Platform](https://cloud.google.com/). It covers the App Engine
architecture as well as other Cloud Platform services used by Nomulus.
[Google Cloud Platform](https://cloud.google.com/).
## App Engine
Nomulus was originally built for App Engine, but the modern architecture now
uses Google Kubernetes Engine (GKE) for better flexibility and control over
networking, running as a series of Java-based microservices within GKE pods.
[Google App Engine](https://cloud.google.com/appengine/) is a cloud computing
platform that runs web applications in the form of servlets. Nomulus consists of
Java servlets that process web requests. These servlets use other features
provided by App Engine, including task queues and cron jobs, as explained
below.
In addition, because GKE (and standard HTTP load balancers) typically handle
HTTP(s) traffic, Nomulus uses a custom proxy to handle raw TCP traffic required
for EPP (Port 700). This proxy can run as a GKE sidecar or a standalone cluster.
For more information on the proxy, see [the proxy setup guide](proxy-setup.md).
### Services
### Workloads
Nomulus contains three [App Engine
services](https://cloud.google.com/appengine/docs/python/an-overview-of-app-engine),
which were previously called modules in earlier versions of App Engine. The
services are: default (also called front-end), backend, and tools. Each service
runs independently in a lot of ways, including that they can be upgraded
individually, their log outputs are separate, and their servers and configured
scaling are separate as well.
Nomulus contains four Kubernetes
[workloads](https://kubernetes.io/docs/concepts/workloads/). Each workload is
largely independent of the others, including in how it scales.
Once you have your app deployed and running, the default service can be accessed
at `https://project-id.appspot.com`, substituting whatever your App Engine app
is named for "project-id". Note that that is the URL for the production instance
of your app; other environments will have the environment name appended with a
hyphen in the hostname, e.g. `https://project-id-sandbox.appspot.com`.
The four workloads are referred to as `frontend`, `backend`, `console`, and
`pubapi`.
The URL for the backend service is `https://backend-dot-project-id.appspot.com`
and the URL for the tools service is `https://tools-dot-project-id.appspot.com`.
The reason that the dot is escaped rather than forming subdomains is because the
SSL certificate for `appspot.com` is only valid for `*.appspot.com` (no double
wild-cards).
Each workload's URL is created by prefixing the name of the workload to the base
domain, e.g. `https://pubapi.mydomain.example`. Requests to each workload are
all handled by the
[RegistryServlet](https://github.com/google/nomulus/blob/master/core/src/main/java/google/registry/module/RegistryServlet.java).
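The workload-prefix routing scheme described above can be sketched as a small pure function. The class and method names here are illustrative, not actual Nomulus code:

```java
// Illustrative only: maps a request's Host header to the Nomulus
// workload that should handle it, mirroring the subdomain-prefix
// scheme described above. Not the real RegistryServlet logic.
import java.util.Set;

public class WorkloadRouter {
  private static final Set<String> WORKLOADS =
      Set.of("frontend", "backend", "console", "pubapi");

  /** Returns the workload name for a host like "pubapi.mydomain.example". */
  public static String workloadFor(String host) {
    int dot = host.indexOf('.');
    String prefix = (dot == -1) ? host : host.substring(0, dot);
    if (!WORKLOADS.contains(prefix)) {
      throw new IllegalArgumentException("Unknown workload: " + prefix);
    }
    return prefix;
  }
}
```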
#### Default service
#### Frontend workload
The default service is responsible for all registrar-facing
The frontend workload is responsible for all registrar-facing
[EPP](https://en.wikipedia.org/wiki/Extensible_Provisioning_Protocol) command
traffic, all user-facing WHOIS and RDAP traffic, and the admin and registrar web
consoles, and is thus the most important service. If the service has any
problems and goes down or stops servicing requests in a timely manner, it will
begin to impact users immediately. Requests to the default service are handled
by the `FrontendServlet`, which provides all of the endpoints exposed in
`FrontendRequestComponent`.
traffic. If the workload has any problems or goes down, it will begin to impact
users immediately.
#### Backend service
#### PubApi workload
The backend service is responsible for executing all regularly scheduled
background tasks (using cron) as well as all asynchronous tasks. Requests to the
backend service are handled by the `BackendServlet`, which provides all of the
endpoints exposed in `BackendRequestComponent`. These include tasks for
generating/exporting RDE, syncing the trademark list from TMDB, exporting
backups, writing out DNS updates, handling asynchronous contact and host
deletions, writing out commit logs, exporting metrics to BigQuery, and many
more. Issues in the backend service will not immediately be apparent to end
users, but the longer it is down, the more obvious it will become that
user-visible tasks such as DNS and deletion are not being handled in a timely
manner.
The PubApi (Public API) workload is responsible for all public traffic to the
registry. In practice, this primarily consists of RDAP traffic. This is split
into a separate workload so that public users (without authentication) will have
a harder time impacting intra-registry or registrar-registry actions.
The backend service is also where scheduled and automatically invoked MapReduces
run, which includes some of the aforementioned tasks such as RDE and
asynchronous resource deletion. Consequently, the backend service should be
sized to support not just the normal ongoing DNS load but also the load incurred
by MapReduces, both scheduled (such as RDE) and on-demand (asynchronous
contact/host deletion).
#### Backend workload
#### BSA service
The backend workload is responsible for executing all regularly scheduled
background tasks (using cron) as well as all asynchronous tasks. These include
tasks for generating/exporting RDE, syncing the trademark list from TMDB,
exporting backups, writing out DNS updates, syncing BSA data,
generating/exporting ICANN activity data, and many more. Issues in the backend
workload will not immediately be apparent to end users, but the longer it is
down, the more obvious it will become that user-visible tasks such as DNS and
deletion are not being handled in a timely manner.
The bsa service is responsible for business logic behind Nomulus and BSA
functionality. Requests to the backend service are handled by the `BsaServlet`,
which provides all of the endpoints exposed in `BsaRequestComponent`. These
include tasks for downloading, processing and uploading BSA data.
The backend workload is also where scheduled and automatically-invoked BEAM
pipelines run, which includes some of the aforementioned tasks such as RDE.
Consequently, the backend workload should be sized to support not just the
normal ongoing DNS load but also the load incurred by BEAM pipelines, both
scheduled (such as RDE) and on-demand (started by registry employees).
The backend workload also supports handling of manually-performed actions using
the `nomulus` command-line tool, which provides administrative-level
functionality for developers and tech support employees of the registry.
#### Tools service
### Cloud Tasks queues
The tools service is responsible for servicing requests from the `nomulus`
command line tool, which provides administrative-level functionality for
developers and tech support employees of the registry. It is thus the least
critical of the three services. Requests to the tools service are handled by the
`ToolsServlet`, which provides all of the endpoints exposed in
`ToolsRequestComponent`. Some example functionality that this service provides
includes the server-side code to update premium lists, run EPP commands from the
tool, and manually modify contacts/hosts/domains/and other resources. Problems
with the tools service are not visible to users.
The tools service also runs ad-hoc MapReduces, like those invoked via `nomulus`
tool subcommands like `generate_zone_files` and by manually hitting URLs under
https://tools-dot-project-id.appspot.com, like
`/_dr/task/refreshDnsForAllDomains`.
### Task queues
App Engine [task
queues](https://cloud.google.com/appengine/docs/java/taskqueue/) provide an
GCP's [Cloud Tasks](https://docs.cloud.google.com/tasks/docs) provides an
asynchronous way to enqueue tasks and then execute them on some kind of
schedule. There are two types of queues, push queues and pull queues. Tasks in
push queues are always executing up to some throttlable limit. Tasks in pull
queues remain there until the queue is polled by code that is running for some
other reason. Essentially, push queues run their own tasks while pull queues
just enqueue data that is used by something else. Many other parts of App Engine
are implemented using task queues. For example, [App Engine
cron](https://cloud.google.com/appengine/docs/java/config/cron) adds tasks to
push queues at regularly scheduled intervals, and the [MapReduce
framework](https://cloud.google.com/appengine/docs/java/dataprocessing/) adds
tasks for each phase of the MapReduce algorithm.
schedule. Task queues are essential because GKE's architecture does not, by
nature, support long-running background processes; queues are thus the
fundamental building block that allows asynchronous and background execution of
code that is not in response to incoming web requests.
Nomulus uses a particular pattern of paired push/pull queues that is worth
explaining in detail. Push queues are essential because App Engine's
architecture does not support long-running background processes, and so push
queues are thus the fundamental building block that allows asynchronous and
background execution of code that is not in response to incoming web requests.
However, they also have limitations in that they do not allow batch processing
or grouping. That's where the pull queue comes in. Regularly scheduled tasks in
the push queue will, upon execution, poll the corresponding pull queue for a
specified number of tasks and execute them in a batch. This allows the code to
execute in the background while taking advantage of batch processing.
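The paired push/pull pattern described above can be sketched as follows. `BatchedQueue` and its in-memory storage are illustrative stand-ins for the real queue infrastructure, not Cloud Tasks or Nomulus APIs:

```java
// A minimal sketch of the paired push/pull queue pattern: work items
// accumulate in a pull queue, and a regularly scheduled push task
// drains up to a fixed batch size and processes the batch together.
// Names here are illustrative, not Nomulus APIs.
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

public class BatchedQueue<T> {
  private final Queue<T> pullQueue = new ArrayDeque<>();

  /** Called synchronously, e.g. while handling an EPP command. */
  public void enqueue(T task) {
    pullQueue.add(task);
  }

  /** Called by the scheduled push task; drains at most batchSize items. */
  public List<T> drainBatch(int batchSize) {
    List<T> batch = new ArrayList<>();
    while (batch.size() < batchSize && !pullQueue.isEmpty()) {
      batch.add(pullQueue.poll());
    }
    return batch;
  }
}
```

The scheduled task would call `drainBatch` on each run and hand the whole batch to a single processing step, getting batching for free while the enqueue side stays cheap.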
The task queues used by Nomulus are configured in the `cloud-tasks-queue.xml`
file. Note that many push queues have a direct one-to-one correspondence with
entries in `cloud-scheduler-tasks-ENVIRONMENT.xml` because they need to be
fanned-out on a per-TLD or other basis (see the Cron section below for more
explanation). The exact queue that a given cron task will use is passed as the
query string parameter "queue" in the url specification for the cron task.
The task queues used by Nomulus are configured in the `cloud-tasks-queue.xml`
file. Note that many push queues have a direct one-to-one correspondence with
entries in `cloud-scheduler-tasks.xml` because they need to be fanned-out on a
per-TLD or other basis (see the Cron section below for more explanation).
The exact queue that a given cron task will use is passed as the query string
parameter "queue" in the url specification for the cron task.
Here are the task queues in use by the system. All are push queues unless
explicitly marked as otherwise.
Here are the task queues in use by the system:
* `brda` -- Queue for tasks to upload weekly Bulk Registration Data Access
(BRDA) files to a location where they are available to ICANN. The
`RdeStagingReducer` (part of the RDE MapReduce) creates these tasks at the
end of generating an RDE dump.
* `dns-pull` -- A pull queue to enqueue DNS modifications. Cron regularly runs
`ReadDnsQueueAction`, which drains the queue, batches modifications by TLD,
and writes the batches to `dns-publish` to be published to the configured
`DnsWriter` for the TLD.
(BRDA) files to a location where they are available to ICANN. The RDE
pipeline creates these tasks at the end of generating an RDE dump.
* `dns-publish` -- Queue for batches of DNS updates to be pushed to DNS
writers.
* `lordn-claims` and `lordn-sunrise` -- Pull queues for handling LORDN
exports. Tasks are enqueued synchronously during EPP commands depending on
whether the domain name in question has a claims notice ID.
* `dns-refresh` -- Queues for reading and fanning out DNS refresh requests,
using the `DnsRefreshRequest` SQL table as the source of data.
* `marksdb` -- Queue for tasks to verify that an upload to NORDN was
successfully received and verified. These tasks are enqueued by
`NordnUploadAction` following an upload and are executed by
`NordnVerifyAction`.
* `nordn` -- Cron queue used for NORDN exporting. Tasks are executed by
`NordnUploadAction`, which pulls LORDN data from the `lordn-claims` and
`lordn-sunrise` pull queues (above).
`NordnUploadAction`.
* `rde-report` -- Queue for tasks to upload RDE reports to ICANN following
successful upload of full RDE files to the escrow provider. Tasks are
enqueued by `RdeUploadAction` and executed by `RdeReportAction`.
@@ -157,28 +101,25 @@ explicitly marked as otherwise.
* `retryable-cron-tasks` -- Catch-all cron queue for various cron tasks that
run infrequently, such as exporting reserved terms.
* `sheet` -- Queue for tasks to sync registrar updates to a Google Sheets
spreadsheet. Tasks are enqueued by `RegistrarServlet` when changes are made
to registrar fields and are executed by `SyncRegistrarsSheetAction`.
spreadsheet, done by `SyncRegistrarsSheetAction`.
### Cron jobs
### Scheduled cron jobs
Nomulus uses App Engine [cron
jobs](https://cloud.google.com/appengine/docs/java/config/cron) to run periodic
scheduled actions. These actions run as frequently as once per minute (in the
case of syncing DNS updates) or as infrequently as once per month (in the case
of RDE exports). Cron tasks are specified in `cron.xml` files, with one per
environment. There are more tasks that run in Production than in other
environments because tasks like uploading RDE dumps are only done for the live
system. Cron tasks execute on the `backend` service.
Nomulus uses [Cloud Scheduler](https://docs.cloud.google.com/scheduler/docs) to
run periodic scheduled actions. These actions run as frequently as once per
minute (in the case of syncing DNS updates) or as infrequently as once per month
(in the case of RDE exports). Cron tasks are specified in
`cloud-scheduler-tasks-{ENVIRONMENT}.xml` files, with one per environment. There
are more tasks that run in Production than in other environments because tasks
like uploading RDE dumps are only done for the live system.
Most cron tasks use the `TldFanoutAction` which is accessed via the
`/_dr/cron/fanout` URL path. This action, which is run by the BackendServlet on
the backend service, fans out a given cron task for each TLD that exists in the
registry system, using the queue that is specified in the `cron.xml` entry.
Because some tasks may be computationally intensive and could risk spiking
system latency if all start executing immediately at the same time, there is a
`jitterSeconds` parameter that spreads out tasks over the given number of
seconds. This is used with DNS updates and commit log deletion.
`/_dr/cron/fanout` URL path. This action fans out a given cron task for each TLD
that exists in the registry system, using the queue that is specified in the XML
entry. Because some tasks may be computationally intensive and could risk
spiking system latency if all start executing immediately at the same time,
there is a `jitterSeconds` parameter that spreads out tasks over the given
number of seconds. This is used with DNS updates and commit log deletion.
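The jitter mechanism can be sketched like this. `JitteredFanout` is a hypothetical name; the real `TldFanoutAction` enqueues tasks into the configured queue rather than returning a map of delays:

```java
// Sketch of fan-out-with-jitter: each TLD gets the same task, but with
// a random start delay in [0, jitterSeconds) so the tasks do not all
// fire at once. Illustrative only, not the real TldFanoutAction.
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

public class JitteredFanout {
  /** Returns a delay in seconds for each TLD's copy of the task. */
  public static Map<String, Integer> fanOut(
      List<String> tlds, int jitterSeconds, Random random) {
    Map<String, Integer> delays = new LinkedHashMap<>();
    for (String tld : tlds) {
      delays.put(tld, random.nextInt(jitterSeconds));
    }
    return delays;
  }
}
```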
The reason the `TldFanoutAction` exists is that a lot of tasks need to be done
separately for each TLD, such as RDE exports and NORDN uploads. It's simpler to
@@ -192,8 +133,7 @@ tasks retry in the face of transient errors.
The full list of URL parameters to `TldFanoutAction` that can be specified in
cron.xml is:
* `endpoint` -- The path of the action that should be executed (see
`web.xml`).
* `endpoint` -- The path of the action that should be executed.
* `queue` -- The cron queue to enqueue tasks in.
* `forEachRealTld` -- Specifies that the task should be run in each TLD of
type `REAL`. This can be combined with `forEachTestTld`.
@@ -218,14 +158,14 @@ Each environment is thus completely independent.
The different environments are specified in `RegistryEnvironment`. Most
correspond to a separate App Engine app except for `UNITTEST` and `LOCAL`, which
by their nature do not use real environments running in the cloud. The
recommended naming scheme for the App Engine apps that has the best possible
compatibility with the codebase and thus requires the least configuration is to
pick a name for the production app and then suffix it for the other
environments. E.g., if the production app is to be named 'registry-platform',
then the sandbox app would be named 'registry-platform-sandbox'.
recommended project naming scheme that has the best possible compatibility with
the codebase and thus requires the least configuration is to pick a name for the
production app and then suffix it for the other environments. E.g., if the
production app is to be named 'registry-platform', then the sandbox app would be
named 'registry-platform-sandbox'.
The full list of environments supported out-of-the-box, in descending order from
real to not, is:
real to not-real, is:
* `PRODUCTION` -- The real production environment that is actually running
live TLDs. Since Nomulus is a shared registry platform, there need only ever
@@ -270,28 +210,28 @@ of experience running a production registry using this codebase.
## Cloud SQL
To be filled.
Nomulus uses [GCP Cloud SQL](https://cloud.google.com/sql) (Postgres) to store
its data. For more information, see the
[DB project README file](../db/README.md).
## Cloud Storage buckets
Nomulus uses [Cloud Storage](https://cloud.google.com/storage/) for bulk storage
of large flat files that aren't suitable for Cloud SQL. These files include
backups, RDE exports, and reports. Each bucket name must be unique across all of
Google Cloud Storage, so we use the common recommended pattern of prefixing all
buckets with the name of the App Engine app (which is itself globally unique).
Most of the bucket names are configurable, but the defaults are as follows, with
PROJECT standing in as a placeholder for the App Engine app name:
of large flat files that aren't suitable for SQL. These files include backups,
RDE exports, and reports. Each bucket name must be unique across all of Google
Cloud Storage, so we use the common recommended pattern of prefixing all buckets
with the name of the project (which is itself globally unique). Most of the
bucket names are configurable, but the most important / relevant defaults are:
* `PROJECT-billing` -- Monthly invoice files for each registrar.
* `PROJECT-commits` -- Daily exports of commit logs that are needed for
potentially performing a restore.
* `PROJECT-bsa` -- BSA data and output
* `PROJECT-domain-lists` -- Daily exports of all registered domain names per
TLD.
* `PROJECT-gcs-logs` -- This bucket is used at Google to store the GCS access
logs and storage data. This bucket is not required by the Registry system,
but can provide useful logging information. For instructions on setup, see
the [Cloud Storage
documentation](https://cloud.google.com/storage/docs/access-logs).
the
[Cloud Storage documentation](https://cloud.google.com/storage/docs/access-logs).
* `PROJECT-icann-brda` -- This bucket contains the weekly ICANN BRDA files.
There is no lifecycle expiration; we keep a history of all the files. This
bucket must exist for the BRDA process to function.
@@ -301,9 +241,3 @@ PROJECT standing in as a placeholder for the App Engine app name:
regularly uploaded to the escrow provider. Lifecycle is set to 90 days. The
bucket must exist.
* `PROJECT-reporting` -- Contains monthly ICANN reporting files.
* `PROJECT.appspot.com` -- Temporary MapReduce files are stored here. By
default, the App Engine MapReduce library places its temporary files in a
bucket named {project}.appspot.com. This bucket must exist. To keep
temporary files from building up, a 90-day or 180-day lifecycle should be
applied to the bucket, depending on how long you want to be able to go back
and debug MapReduce problems.

View File

@@ -3,54 +3,46 @@
This document contains information on the overall structure of the code, and how
particularly important pieces of the system are implemented.
## Bazel build system
## Gradle build system
[Bazel](https://www.bazel.io/) is used to build and test the Nomulus codebase.
[Gradle](https://gradle.org/) is used to build and test the Nomulus codebase.
Bazel builds are described using [BUILD
files](https://www.bazel.io/versions/master/docs/build-ref.html). A directory
containing a BUILD file defines a package consisting of all files and
directories underneath it, except those directories which themselves also
contain BUILD files. A package contains targets. Most targets in the codebase
are of the type `java_library`, which generates `JAR` files, or `java_test`,
which runs tests.
Nomulus, for the most part, uses fairly standard Gradle task naming for building
and running tests, with tasks defined in the various `build.gradle` files.
The key to Bazel's ability to create reproducible builds is the requirement that
each build target must declare its direct dependencies. Each of those
dependencies is a target, which, in turn, must also declare its dependencies.
This recursive description of a target's dependencies forms an acyclic graph
that fully describes the targets which must be built in order to build any
target in the graph.
Dependencies and their version restrictions are defined in the
`dependencies.gradle` file. Within each subproject's `build.gradle` file, the
actual dependencies used by that subproject are listed along with the type of
dependency (e.g. implementation, testImplementation). Versions of each
dependency are locked to avoid frequent dependency churn, with the locked
versions stored in the various `gradle.lockfile` files. To update these
versions, run any Gradle command (e.g. `./gradlew build`) with the
`--write-locks` argument.
A wrinkle in this system is managing external dependencies. Bazel was designed
first and foremost to manage builds where all code lives in a single source
repository and is compiled from `HEAD`. In order to mesh with other build and
packaging schemes, such as libraries distributed as compiled `JAR`s, Bazel
supports [external target
declarations](https://www.bazel.io/versions/master/docs/external.html#transitive-dependencies).
The Nomulus codebase uses external targets pulled in from Maven Central, these
are declared in `java/google/registry/repositories.bzl`. The dependencies of
these external targets are not managed by Bazel; you must manually add all of
the dependencies or use the
[generate_workspace](https://docs.bazel.build/versions/master/generate-workspace.html)
tool to do it.
### Generating WAR archives for deployment
### Generating EAR/WAR archives for deployment
The `jetty` project is the main entry point for building the Nomulus WAR files,
and one can use the `war` gradle task to build the base WAR file. The various
deployment/release files use Docker to deploy this, in a system that is too
Google-specialized to replicate directly here.
There are special build target types for generating `WAR` and `EAR` files for
deploying Nomulus to GAE. These targets, `zip_file` and `registry_ear_file` respectively, are used in `java/google/registry/BUILD`. To generate archives suitable for deployment on GAE:
## Subprojects
```shell
$ bazel build java/google/registry:registry_ear
...
bazel-genfiles/java/google/registry/registry.ear
INFO: Elapsed time: 0.216s, Critical Path: 0.00s
# This will also generate the per-module WAR files:
$ ls bazel-genfiles/java/google/registry/*.war
bazel-genfiles/java/google/registry/registry_backend.war
bazel-genfiles/java/google/registry/registry_default.war
bazel-genfiles/java/google/registry/registry_tools.war
```
Within the Nomulus repository there are a few notable subprojects:
* `util` contains tools that don't depend on any of our other code, e.g.
libraries or raw utilities.
* `db` contains database-related code, managing the schema and
deployment/testing of the database.
* `integration` contains tests to make sure that schema rollouts won't break
Nomulus, and that code versions and schema versions are cross-compatible.
* `console-webapp` contains the Typescript/HTML/CSS/Angular code for the
registrar console frontend.
* `proxy` contains code for the EPP proxy, which relays port 700 requests to
the core EPP services.
* `core` contains the bulk of the core Nomulus code, including request
handling and serving, backend tasks, actions, etc.
## Cursors
@@ -72,8 +64,8 @@ The following cursor types are defined:
* **`RDE_UPLOAD`** - RDE (thick) escrow deposit upload
* **`RDE_UPLOAD_SFTP`** - Cursor that tracks the last time we talked to the
escrow provider's SFTP server for a given TLD.
* **`RECURRING_BILLING`** - Expansion of `BillingRecurrence` (renew) billing events
into one-time `BillingEvent`s.
* **`RECURRING_BILLING`** - Expansion of `BillingRecurrence` (renew) billing
events into one-time `BillingEvent`s.
* **`SYNC_REGISTRAR_SHEET`** - Tracks the last time the registrar spreadsheet
was successfully synced.
@@ -82,16 +74,9 @@ next timestamp at which an operation should resume processing and a `CursorType`
that identifies which operation the cursor is associated with. In many cases,
there are multiple cursors per operation; for instance, the cursors related to
RDE reporting, staging, and upload are per-TLD cursors. To accomplish this, each
`Cursor` also has a scope, a `Key<ImmutableObject>` to which the particular
cursor applies (this can be e.g. a `Registry` or any other `ImmutableObject` in
the database, depending on the operation). If the `Cursor` applies to the entire
registry environment, it is considered a global cursor and has a scope of
`EntityGroupRoot.getCrossTldKey()`.
Cursors are singleton entities by type and scope. The id for a `Cursor` is a
deterministic string that consists of the websafe string of the Key of the scope
object concatenated with the name of the cursor type, separated by
an underscore.
`Cursor` also has a scope, a string to which the particular cursor applies (this
can be anything, but in practice is either a TLD or `GLOBAL` for cross-TLD
cursors). Cursors are singleton entities by type and scope.
## Guava
@@ -101,8 +86,7 @@ idiomatic, well-tested, and performant add-ons to the JDK. There are several
libraries in particular that you should familiarize yourself with, as they are
used extensively throughout the codebase:
* [Immutable
Collections](https://github.com/google/guava/wiki/ImmutableCollectionsExplained):
* [Immutable Collections](https://github.com/google/guava/wiki/ImmutableCollectionsExplained):
Immutable collections are a useful defensive programming technique. When an
Immutable collection type is used as a parameter type, it immediately
indicates that the given collection will not be modified in the method.
@@ -144,11 +128,10 @@ as follows:
* `Domain` ([RFC 5731](https://tools.ietf.org/html/rfc5731))
* `Host` ([RFC 5732](https://tools.ietf.org/html/rfc5732))
* `Contact` ([RFC 5733](https://tools.ietf.org/html/rfc5733))
All `EppResource` entities use a Repository Object Identifier (ROID) as their
unique id, in the format specified by [RFC
5730](https://tools.ietf.org/html/rfc5730#section-2.8) and defined in
unique id, in the format specified by
[RFC 5730](https://tools.ietf.org/html/rfc5730#section-2.8) and defined in
`EppResourceUtils.createRoid()`.
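As a rough illustration, the ROID shape from RFC 5730 section 2.8 can be checked with a regular expression matching the schema pattern `(\w|_){1,80}-\w{1,8}` (up to 80 word characters, a hyphen, then a 1-8 character repository identifier). `RoidValidator` is a hypothetical helper, not Nomulus code:

```java
// Validates the ROID shape from RFC 5730 section 2.8. Illustrative
// only; Nomulus builds ROIDs in EppResourceUtils.createRoid().
import java.util.regex.Pattern;

public class RoidValidator {
  private static final Pattern ROID = Pattern.compile("\\w{1,80}-\\w{1,8}");

  public static boolean isValidRoid(String roid) {
    return ROID.matcher(roid).matches();
  }
}
```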
Each entity also tracks a number of timestamps related to its lifecycle (in
@@ -164,12 +147,9 @@ the status of a resource at a given point in time.
## Foreign key indexes
Foreign key indexes provide a means of loading active instances of `EppResource`
objects by their unique IDs:
* `Domain`: fully-qualified domain name
* `Contact`: contact id
* `Host`: fully-qualified host name
`Domain` and `Host` are each foreign-keyed, meaning we often wish to query them
by their foreign keys (fully-qualified domain name and fully-qualified host
name, respectively).
Since all `EppResource` entities are indexed on ROID (which is also unique, but
not as useful as the resource's name), the `ForeignKeyUtils` provides a way to
@@ -203,10 +183,9 @@ events that are recorded as history entries, including:
The full list is captured in the `HistoryEntry.Type` enum.
Each `HistoryEntry` has a parent `Key<EppResource>`, the EPP resource that was
mutated by the event. A `HistoryEntry` will also contain the complete EPP XML
command that initiated the mutation, stored as a byte array to be agnostic of
encoding.
Each `HistoryEntry` has a reference to a singular EPP resource that was mutated
by the event. A `HistoryEntry` will also contain the complete EPP XML command
that initiated the mutation, stored as a byte array to be agnostic of encoding.
A `HistoryEntry` also captures other event metadata, such as the `DateTime` of
the change, whether the change was created by a superuser, and the ID of the
@@ -215,9 +194,9 @@ registrar that sent the command.
## Poll messages
Poll messages are the mechanism by which EPP handles asynchronous communication
between the registry and registrars. Refer to [RFC 5730 Section
2.9.2.3](https://tools.ietf.org/html/rfc5730#section-2.9.2.3) for their protocol
specification.
between the registry and registrars. Refer to
[RFC 5730 Section 2.9.2.3](https://tools.ietf.org/html/rfc5730#section-2.9.2.3)
for their protocol specification.
Poll messages are stored by the system as entities in the database. All poll
messages have an event time at which they become active; any poll request before
@@ -245,8 +224,9 @@ poll messages are ACKed (and thus deleted) in `PollAckFlow`.
## Billing events
Billing events capture all events in a domain's lifecycle for which a registrar
will be charged. A `BillingEvent` will be created for the following reasons (the
full list of which is represented by `BillingEvent.Reason`):
will be charged. A one-time `BillingEvent` can be created for the following
reasons (the full list of which is represented by `BillingBase.Reason`):
* Domain creates
* Domain renewals
@@ -254,19 +234,19 @@ full list of which is represented by `BillingEvent.Reason`):
* Server status changes
* Domain transfers
A `BillingBase` can also contain one or more `BillingBase.Flag` flags that
provide additional metadata about the billing event (e.g. the application phase
during which the domain was applied for).
All `BillingBase` entities contain a parent `VKey<HistoryEntry>` to identify the
mutation that spawned the `BillingBase`.
There are 4 types of billing events, all of which extend the abstract
`BillingBase` base class:
* **`BillingEvent`**, a one-time billing event.
* **`BillingRecurrence`**, a recurring billing event (used for events such as domain
renewals).
* **`BillingCancellation`**, which represents the cancellation of either a `OneTime`
or `BillingRecurrence` billing event. This is implemented as a distinct event to
preserve the immutability of billing events.
* **`BillingRecurrence`**, a recurring billing event (used for events such as
domain renewals).
* **`BillingCancellation`**, which represents the cancellation of either a
`BillingEvent` or `BillingRecurrence` billing event. This is implemented as
a distinct event to preserve the immutability of billing events.
A `BillingBase` can also contain one or more `BillingBase.Flag` flags that
provide additional metadata about the billing event (e.g. the application phase
during which the domain was applied for).
All `BillingBase` entities contain a reference to a given ROID (an
`EppResource` reference) to identify the mutation that spawned the
`BillingBase`.


@@ -2,10 +2,11 @@
There are multiple different kinds of configuration that go into getting a
working registry system up and running. Broadly speaking, configuration works in
two ways -- globally, for the entire sytem, and per-TLD. Global configuration is
managed by editing code and deploying a new version, whereas per-TLD
configuration is data that lives in the database in `Tld` entities, and is
updated by running `nomulus` commands without having to deploy a new version.
two ways -- globally, for the entire system, and per-TLD. Global configuration
is managed by editing code and deploying a new version, whereas per-TLD
configuration is data that lives in the database in `Tld` entities, and
[is updated](operational-procedures/modifying-tlds.md) without having to deploy
a new version.
## Initial configuration
@@ -23,40 +24,14 @@ Before getting into the details of configuration, it's important to note that a
lot of configuration is environment-dependent. It is common to see `switch`
statements that operate on the current `RegistryEnvironment`, and return
different values for different environments. This is especially pronounced in
the `UNITTEST` and `LOCAL` environments, which don't run on App Engine at all.
As an example, some timeouts may be long in production and short in unit tests.
the `UNITTEST` and `LOCAL` environments, which don't run on GCP at all. As an
example, some timeouts may be long in production and short in unit tests.
See the [Architecture documentation](./architecture.md) for more details on
environments as used by Nomulus.
## App Engine configuration
App Engine configuration isn't covered in depth in this document as it is
thoroughly documented in the [App Engine configuration docs][app-engine-config].
The main files of note that come pre-configured in Nomulus are:
* `cron.xml` -- Configuration of cronjobs
* `web.xml` -- Configuration of URL paths on the webserver
* `appengine-web.xml` -- Overall App Engine settings including number and type
of instances
* `cloud-scheduler-tasks.xml` -- Configuration of Cloud Scheduler Tasks
* * `cloud-tasks-queue.xml` -- Configuration of Cloud Tasks Queue
* `application.xml` -- Configuration of the application name and its services
Cron, web, and queue are covered in more detail in the "App Engine architecture"
doc, and the rest are covered in the general App Engine documentation.
If you are not writing new code to implement custom features, is unlikely that
you will need to make any modifications beyond simple changes to
`application.xml` and `appengine-web.xml`. If you are writing new features, it's
likely you'll need to add cronjobs, URL paths, and task queues, and thus edit
those associated XML files.
The existing codebase is configured for running a full-scale registry with
multiple TLDs. In order to deploy to App Engine, you will either need to
[increase your quota](https://cloud.google.com/compute/quotas#requesting_additional_quota)
to allow for at least 100 running instances or reduce `max-instances` in the
backend `appengine-web.xml` files to 25 or less.
TODO: documentation about how to set up GKE and what config points are necessary
to modify there
## Global configuration
@@ -65,9 +40,9 @@ deployed in the app. The full list of config options and their default values
can be found in the [`default-config.yaml`][default-config] file. If you wish to
change any of these values, do not edit this file. Instead, edit the environment
configuration file named
`google/registry/config/files/nomulus-config-ENVIRONMENT.yaml`, overriding only
the options you wish to change. Nomulus ships with blank placeholders for all
standard environments.
`core/src/main/java/google/registry/config/files/nomulus-config-ENVIRONMENT.yaml`,
overriding only the options you wish to change. Nomulus ships with blank
placeholders for all standard environments.
You will not need to change most of the default settings. Here is the subset of
settings that you will need to change for all deployed environments, including
@@ -75,52 +50,65 @@ development environments. See [`default-config.yaml`][default-config] for a full
description of each option:
```yaml
appEngine:
projectId: # Your App Engine project ID
toolsServiceUrl: https://tools-dot-PROJECT-ID.appspot.com # Insert your project ID
isLocal: false # Causes saved credentials to be used.
gcpProject:
projectId: # Your GCP project ID
projectIdNumber: # The corresponding ID number, found on the home page
locationId: # e.g. us-central1
isLocal: false # Causes saved credentials to be used
baseDomain: # the base domain from which the registry will be served, e.g. registry.google
gSuite:
domainName: # Your G Suite domain name
adminAccountEmailAddress: # An admin login for your G Suite account
  domainName: # Your GSuite domain name, likely same as baseDomain above
adminAccountEmailAddress: # An admin login for your GSuite account
auth:
allowedServiceAccountEmails:
- # a list of service account emails given access to Nomulus
oauthClientId: # the client ID of the Identity-Aware Proxy
cloudSql:
jdbcUrl: # path to the Postgres server
```
For fully-featured production environments that need the full range of features
(e.g. RDE, correct contact information on the registrar console, etc.) you will
need to specify more settings.
need to specify *many* more settings.
From a code perspective, all configuration settings ultimately come through the
[`RegistryConfig`][registry-config] class. This includes a Dagger module called
`ConfigModule` that provides injectable configuration options. While most
configuration options can be changed from within the yaml config file, certain
derived options may still need to be overriden by changing the code in this
derived options may still need to be overridden by changing the code in this
module.
## OAuth 2 client id configuration
## OAuth 2 client ID configuration
The open source Nomulus release uses OAuth 2 to authenticate and authorize
users. This includes the `nomulus` tool when it connects to the system to
execute commands. OAuth must be configured before you can use the `nomulus` tool
to set up the system.
Nomulus uses OAuth 2 to authenticate and authorize users. This includes the
`nomulus` [command-line tool](admin-tool.md) when it connects to the system to
execute commands as well as the
[Identity-Aware Proxy](https://pantheon.corp.google.com/security/iap) used to
authenticate standard requests. OAuth must be configured before you can use
either system.
OAuth defines the concept of a *client id*, which identifies the application
OAuth defines the concept of a *client ID*, which identifies the application
which the user wants to authorize. This is so that, when a user clicks in an
OAuth permission dialog and grants access to data, they are not granting access
to every application on their computer (including potentially malicious ones),
but only to the application which they agree needs access. Each environment of
the Nomulus system should have its own client id. Multiple installations of the
`nomulus` tool application can share the same client id for the same
environment.
the Nomulus system should have its own pair of client IDs. Multiple
installations of the `nomulus` tool application can share the same client ID for
the same environment.
There are three steps to configuration.
For the Nomulus tool OAuth configuration, do the following steps:
* **Create the client id in App Engine:** Go to your project's
* **Create the registry tool client ID in GCP:** Go to your project's
["Credentials" page](https://console.developers.google.com/apis/credentials)
in the Developer's Console. Click "Create credentials" and select "OAuth
client ID" from the dropdown. In the create credentials window, select an
application type of "Desktop app". After creating the client id, copy the
client id and client secret which are displayed in the popup window. You may
also obtain this information by downloading the json file for the client id.
application type of "Desktop app". After creating the client ID, copy the
client ID and client secret which are displayed in the popup window. You may
    also obtain this information by downloading the JSON file for the client ID.
* **Copy the client secret information to the config file:** The *client
secret file* contains both the client ID and the client secret. Copy the
@@ -129,18 +117,21 @@ There are three steps to configuration.
`registryTool` section. This will make the `nomulus` tool use this
credential to authenticate itself to the system.
* **Add the new client id to the configured list of allowed client ids:** The
configuration files include an `oAuth` section, which defines a parameter
called `allowedOauthClientIds`, specifying a list of client ids which are
permitted to connect. Add the client ID to the list. You will need to
rebuild and redeploy the project so that the configuration changes take
effect.
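
Concretely, both of the settings above live in the environment config file; a
sketch with placeholder values (field names taken from `default-config.yaml` --
verify them there):

```yaml
# nomulus-config-ENVIRONMENT.yaml; values are placeholders.
registryTool:
  clientId: 123456789-abc.apps.googleusercontent.com
  clientSecret: your-client-secret
oAuth:
  allowedOauthClientIds:
    - 123456789-abc.apps.googleusercontent.com
```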
For IAP configuration, do the following steps:

*   **Create the IAP client ID:** Follow similar steps from above to create an
    additional OAuth client ID, but using an application type of "Web
    application". Note the client ID and secret.
*   **Enable IAP for your HTTPS load balancer:** On the
    [IAP page](https://pantheon.corp.google.com/security/iap), enable IAP for
    all of the backend services that use the same HTTPS load balancer.
*   **Use a custom OAuth configuration:** For the backend services, under the
    "Settings" section (in the three-dot menu), enable custom OAuth and insert
    the client ID and secret that we just created.
*   **Save the client ID:** In the configuration file, save the client ID as
    `oauthClientId` in the `auth` section.
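
If you prefer the command line, the IAP enablement step above can also be done
with `gcloud`; a sketch, where `nomulus-backend` is a placeholder for your
backend service name (the custom OAuth step still needs the console's
"Settings" menu):

```shell
# Turn on IAP for one backend service behind the HTTPS load balancer.
# "nomulus-backend" is a placeholder; repeat for each backend service.
$ gcloud iap web enable \
    --resource-type=backend-services \
    --service=nomulus-backend
```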
Once these steps are taken, the `nomulus` tool will use a client id which the
server is configured to accept, and authentication should succeed. Note that
many Nomulus commands also require that the user have App Engine admin
privileges, meaning that the user needs to be added as an owner or viewer of the
App Engine project.
Once these steps are taken, the `nomulus` tool and IAP will both use client IDs
which the server is configured to accept, and authentication should succeed.
Note that many Nomulus commands also require that the user have GCP admin
privileges on the project in question.
## Sensitive global configuration
@@ -151,8 +142,8 @@ control mishap. We use a secret store to persist these values in a secure
manner, which is backed by the GCP Secret Manager.
The `Keyring` interface contains methods for all sensitive configuration values,
which are primarily credentials used to access various ICANN and ICANN-
affiliated services (such as RDE). These values are only needed for real
which are primarily credentials used to access various ICANN and
ICANN-affiliated services (such as RDE). These values are only needed for real
production registries and PDT environments. If you are just playing around with
the platform at first, it is OK to put off defining these values until
necessary. This allows the codebase to start and run, but of course any actions
@@ -169,16 +160,16 @@ ${KEY_NAME}`.
## Per-TLD configuration
`Tld` entities, which are persisted to the database, are used for per-TLD
configuration. They contain any kind of configuration that is specific to a TLD,
such as the create/renew price of a domain name, the pricing engine
implementation, the DNS writer implementation, whether escrow exports are
enabled, the default currency, the reserved label lists, and more. The `nomulus
update_tld` command is used to set all of these options. See the
[admin tool documentation](./admin-tool.md) for more information, as well as the
command-line help for the `update_tld` command. Unlike global configuration
above, per-TLD configuration options are stored as data in the running system,
and thus do not require code pushes to update.
`Tld` entities, which are persisted to the database and stored in YAML files,
are used for per-TLD configuration. They contain any kind of configuration that
is specific to a TLD, such as the create/renew price of a domain name, the
pricing engine implementation, the DNS writer implementation, whether escrow
exports are enabled, the default currency, the reserved label lists, and more.
To create or update TLDs, we use
[YAML files](operational-procedures/modifying-tlds.md) and the `nomulus
configure_tld` command. Because the TLDs are stored as data in the running
system, they do not require code pushes to update.
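
For example, creating or updating a TLD from a YAML definition looks roughly
like this (the `--input` flag name is an assumption; consult `nomulus help
configure_tld` for the exact usage):

```shell
# Apply the TLD configuration in tld.yaml to the alpha environment.
$ nomulus -e alpha configure_tld --input=tld.yaml
```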
[app-engine-config]: https://cloud.google.com/appengine/docs/java/configuration-files
[default-config]: https://github.com/google/nomulus/blob/master/java/google/registry/config/files/default-config.yaml
@@ -242,7 +233,7 @@ connectionName: your-project:us-central1:nomulus
Use the `update_keyring_secret` command to update the `SQL_PRIMARY_CONN_NAME`
key with the connection name. If you have created a read-replica, update the
`SQL_REPLICA_CONN_NAME` key with the replica's connection time.
`SQL_REPLICA_CONN_NAME` key with the replica's connection name.
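
A hedged sketch of those updates (flag names here are assumptions -- check
`nomulus help update_keyring_secret` for the exact syntax):

```shell
# Store the primary instance's connection name in the keyring.
$ echo -n "your-project:us-central1:nomulus" > /tmp/conn_name
$ nomulus -e production update_keyring_secret \
    --keyname SQL_PRIMARY_CONN_NAME \
    --input /tmp/conn_name
```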
### Installing the Schema
@@ -334,6 +325,17 @@ $ gcloud sql connect nomulus --user=nomulus
From this, you should have a postgres prompt and be able to enter the "GRANT"
command specified above.
### Replication and Backups
We highly recommend creating a read-only replica of the database and using the
previously-mentioned `SQL_REPLICA_CONN_NAME` value in the keyring to the name of
that replica. By doing so, you can remove some load from the primary database.
We also recommend enabling
[point-in-time recovery](https://docs.cloud.google.com/sql/docs/postgres/backup-recovery/pitr)
for the instance, just in case something bad happens and you need to restore
from a backup.
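
Point-in-time recovery can be enabled on an existing instance with `gcloud`; a
sketch, where `nomulus` is the instance name from the earlier steps (verify the
flags against the current `gcloud sql` documentation):

```shell
# Automated backups are a prerequisite for point-in-time recovery.
$ gcloud sql instances patch nomulus \
    --backup-start-time=00:00 \
    --enable-point-in-time-recovery
```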
### Cloud SecretManager
You'll need to enable the SecretManager API in your project.


@@ -6,48 +6,41 @@ This document covers the steps necessary to download, build, and deploy Nomulus.
You will need the following programs installed on your local machine:
* A recent version of the [Java 11 JDK][java-jdk11].
* [Google App Engine SDK for Java][app-engine-sdk], and configure aliases to the `gcloud` and `appcfg.sh` utilities (
you'll use them a lot).
* [Git](https://git-scm.com/) version control system.
* Docker (confirm with `docker info` no permission issues, use `sudo groupadd docker` for sudoless docker).
* Python version 3.7 or newer.
* gnupg2 (e.g. in run `sudo apt install gnupg2` in Debian-like Linuxes)
* A recent version of the [Java 21 JDK][java-jdk21].
* The [Google Cloud CLI](https://docs.cloud.google.com/sdk/docs/install-sdk)
    (configure an alias to the `gcloud` utility, because you'll use it a lot)
* [Git](https://git-scm.com/) version control system.
* Docker (confirm with `docker info` no permission issues, use `sudo groupadd
docker` for sudoless docker).
* Python version 3.7 or newer.
*   gnupg2 (e.g. run `sudo apt install gnupg2` in Debian-like Linuxes)
**Note:** App Engine does not yet support Java 9. Also, the instructions in this
document have only been tested on Linux. They might work with some alterations
on other operating systems.
**Note:** The instructions in this document have only been tested on Linux. They
might work with some alterations on other operating systems.
## Download the codebase
Start off by using git to download the latest version from the [Nomulus GitHub
page](https://github.com/google/nomulus). You may checkout any of the daily
tagged versions (e.g. `nomulus-20200629-RC00`), but in general it is also
safe to simply checkout from HEAD:
Start off by using git to download the latest version from the
[Nomulus GitHub page](https://github.com/google/nomulus). You may check out any
of the daily tagged versions (e.g. `nomulus-20260101-RC00`), but in general it
is also safe to simply check out from HEAD:
```shell
$ git clone git@github.com:google/nomulus.git
Cloning into 'nomulus'...
[ .. snip .. ]
$ cd nomulus
$ ls
apiserving CONTRIBUTORS java LICENSE scripts
AUTHORS docs javascript python third_party
CONTRIBUTING.md google javatests README.md WORKSPACE
```
Most of the directory tree is organized into gradle sub-projects (see
`settings.gradle` for details). The following other top-level directories are
Most of the directory tree is organized into gradle subprojects (see
`settings.gradle` for details). The following other top-level directories are
also defined:
* `buildSrc` -- Gradle extensions specific to our local build and release
methodology.
* `config` -- Tools for build and code hygiene.
* `docs` -- The documentation (including this install guide)
* `gradle` -- Configuration and code managed by the gradle build system.
* `gradle` -- Configuration and code managed by the Gradle build system.
* `integration` -- Testing scripts for SQL changes.
* `java-format` -- The Google java formatter and wrapper scripts to use it
incrementally.
* `python` -- Some Python reporting scripts
* `release` -- Configuration for our continuous integration process.
## Build the codebase
@@ -56,34 +49,29 @@ The first step is to build the project, and verify that this completes
successfully. This will also download and install dependencies.
```shell
$ ./nom_build build
$ ./gradlew build
Starting a Gradle Daemon (subsequent builds will be faster)
Plugins: Using default repo...
> Configure project :buildSrc
Java dependencies: Using Maven central...
[ .. snip .. ]
```
The `nom_build` script is just a wrapper around `gradlew`. Its main
additional value is that it formalizes the various properties used in the
build as command-line flags.
The "build" command builds all the code and runs all the tests. This will take a
while.
The "build" command builds all of the code and runs all of the tests. This
will take a while.
## Create and configure a GCP project
## Create an App Engine project
First, [create an
application](https://cloud.google.com/appengine/docs/java/quickstart) on Google
Cloud Platform. Make sure to choose a good Project ID, as it will be used
repeatedly in a large number of places. If your company is named Acme, then a
good Project ID for your production environment would be "acme-registry". Keep
First,
[create an application](https://cloud.google.com/appengine/docs/java/quickstart)
on Google Cloud Platform. Make sure to choose a good Project ID, as it will be
used repeatedly in a large number of places. If your company is named Acme, then
a good Project ID for your production environment would be "acme-registry". Keep
in mind that project IDs for non-production environments should be suffixed with
the name of the environment (see the [Architecture
documentation](./architecture.md) for more details). For the purposes of this
example we'll deploy to the "alpha" environment, which is used for developer
testing. The Project ID will thus be `acme-registry-alpha`.
the name of the environment (see the
[Architecture documentation](./architecture.md) for more details). For the
purposes of this example we'll deploy to the "alpha" environment, which is used
for developer testing. The Project ID will thus be `acme-registry-alpha`.
Now log in using the command-line Google Cloud Platform SDK and set the default
project to be this one that was newly created:
@@ -96,6 +84,17 @@ You are now logged in as [user@email.tld].
$ gcloud config set project acme-registry-alpha
```
And make sure the required APIs are enabled in the project:
```shell
$ gcloud services enable \
container.googleapis.com \
artifactregistry.googleapis.com \
sqladmin.googleapis.com \
secretmanager.googleapis.com \
compute.googleapis.com
```
Now modify `projects.gradle` with the name of your new project:
<pre>
@@ -106,42 +105,51 @@ rootProject.ext.projects = ['production': 'your-production-project',
'crash' : 'your-crash-project']
</pre>
Next follow the steps in [configuration](./configuration.md) to configure the
complete system or, alternatively, read on for an initial deploy in which case
you'll need to deploy again after configuration.
#### Create GKE Clusters
## Deploy the code to App Engine
We recommend Standard clusters with Workload Identity enabled to allow pods to
securely access Cloud SQL and Secret Manager. Feel free to adjust the numbers
and sizing as desired.
AppEngine deployment with gradle is straightforward:
```shell
$ gcloud container clusters create nomulus-cluster \
--region=$REGION \
--workload-pool=$PROJECT_ID.svc.id.goog \
--num-nodes=3 \
--enable-ip-alias
$ gcloud container clusters create proxy-cluster \
--region=$REGION \
--workload-pool=$PROJECT_ID.svc.id.goog \
--num-nodes=3 \
--enable-ip-alias
```
$ ./nom_build appengineDeploy --environment=alpha
Then create an artifact repository:

```shell
$ gcloud artifacts repositories create nomulus-repo \
    --repository-format=docker \
    --location=$REGION \
    --description="Nomulus Docker images"
```
To verify successful deployment, visit
https://acme-registry-alpha.appspot.com/registrar in your browser (adjusting
appropriately for the project ID that you actually used). If the project
deployed successfully, you'll see a "You need permission" page indicating that
you need to configure the system and grant access to your Google account. It's
time to go to the next step, configuration.
See the files and documentation in the `release/` folder for more information on
the release process. You will likely need to customize the internal build
process for your own setup, including internal repository management, builds,
and where Nomulus is deployed.
Configuration is handled by editing code, rebuilding the project, and deploying
again. See the [configuration guide](./configuration.md) for more details.
Once you have completed basic configuration (including most critically the
project ID, client id and secret in your copy of the `nomulus-config-*.yaml`
files), you can rebuild and start using the `nomulus` tool to create test
entities in your newly deployed system. See the [first steps tutorial](./first-steps-tutorial.md)
again. See the [configuration guide](./configuration.md) for more details. Once
you have completed basic configuration (including most critically the project
ID, client id and secret in your copy of the `nomulus-config-*.yaml` files), you
can rebuild and start using the `nomulus` tool to create test entities in your
newly deployed system. See the [first steps tutorial](./first-steps-tutorial.md)
for more information.
[app-engine-sdk]: https://cloud.google.com/appengine/docs/java/download
[java-jdk11]: https://www.oracle.com/java/technologies/javase-downloads.html
[java-jdk21]: https://www.oracle.com/java/technologies/javase-downloads.html
## Deploy the BEAM Pipelines
## Deploy the Beam Pipelines
Nomulus is in the middle of migrating all pipelines to use flex-template. For
pipelines already based on flex-template, deployment in the testing environments
Deployment of the Beam pipelines to Cloud Dataflow in the testing environments
(alpha and crash) can be done using the following command:
```shell
./nom_build :core:stageBeamPipelines --environment=alpha
./gradlew :core:stageBeamPipelines -Penvironment=alpha
```
Pipeline deployment in other environments are through CloudBuild. Please refer


@@ -2,22 +2,22 @@
This doc covers procedures to configure, build and deploy the
[Netty](https://netty.io)-based proxy onto [Kubernetes](https://kubernetes.io)
clusters. [Google Kubernetes
Engine](https://cloud.google.com/kubernetes-engine/) is used as deployment
target. Any kubernetes cluster should in theory work, but the user needs to
change some dependencies on other GCP features such as Cloud KMS for key
management and Stackdriver for monitoring.
clusters.
[Google Kubernetes Engine](https://cloud.google.com/kubernetes-engine/) is used
as deployment target. Any kubernetes cluster should in theory work, but the user
needs to change some dependencies on other GCP features such as Cloud KMS for
key management and Stackdriver for monitoring.
## Overview
Nomulus runs on Google App Engine, which only supports HTTP(S) traffic. In order
to work with [EPP](https://tools.ietf.org/html/rfc5730.html) (TCP port 700) and
[WHOIS](https://tools.ietf.org/html/rfc3912) (TCP port 43), a proxy is needed to
relay traffic between clients and Nomulus and do protocol translation.
Nomulus runs on GKE, and natively only supports HTTP(S) traffic. In order to
work with [EPP](https://tools.ietf.org/html/rfc5730.html) (TCP port 700), a
proxy is needed to relay traffic between clients and Nomulus and do protocol
translation.
We provide a Netty-based proxy that runs as a standalone service (separate from
Nomulus) either on a VM or Kubernetes clusters. Deploying to kubernetes is
recommended as it provides automatic scaling and management for docker
Nomulus) either on a VM or Kubernetes clusters. Deploying to Kubernetes is
recommended as it provides automatic scaling and management for Docker
containers that alleviates much of the pain of running a production service.
The procedure described here can be used to set up a production environment, as
@@ -26,13 +26,13 @@ However, proper release management (cutting a release, rolling updates, canary
analysis, reliable rollback, etc) is not covered. The user is advised to use a
service like [Spinnaker](https://www.spinnaker.io/) for release management.
## Detailed Instruction
## Detailed Instructions
We use [`gcloud`](https://cloud.google.com/sdk/gcloud/) and
[`terraform`](https://terraform.io) to configure the proxy project on GCP and to create a GCS
bucket for storing the terraform state file. We use
[`kubectl`](https://kubernetes.io/docs/tasks/tools/install-kubectl/) to deploy
the proxy to the project. These instructions assume that all three tools are
[`terraform`](https://terraform.io) to configure the proxy project on GCP and to
create a GCS bucket for storing the terraform state file. We use
[`kubectl`](https://kubernetes.io/docs/tasks/tools/install-kubectl/) to deploy
the proxy to the project. These instructions assume that all three tools are
installed.
### Setup GCP project
@@ -41,9 +41,9 @@ There are three projects involved:
- Nomulus project: the project that hosts Nomulus.
- Proxy project: the project that hosts this proxy.
- GCR ([Google Container
Registry](https://cloud.google.com/container-registry/)) project: the
project from which the proxy pulls its Docker image.
- GCR
([Google Container Registry](https://cloud.google.com/container-registry/))
project: the project from which the proxy pulls its Docker image.
We recommend using the same project for Nomulus and the proxy, so that logs for
both are collected in the same place and easily accessible. If there are
@@ -64,16 +64,16 @@ $ gcloud storage buckets create gs://<bucket-name>/ --project <proxy-project>
### Obtain a domain and SSL certificate
The proxy exposes two endpoints, whois.\<yourdomain.tld\> and
epp.\<yourdomain.tld\>. The base domain \<yourdomain.tld\> needs to be obtained
from a registrar ([Google Domains](https://domains.google) for example). Nomulus
operators can also self-allocate a domain in the TLDs under management.
The proxy exposes one endpoint: `epp.<yourdomain.tld>`. The base domain
`<yourdomain.tld>` needs to be obtained from a registrar (RIP to
[Google Domains](https://domains.google/)). Nomulus operators can also
self-allocate a domain in the TLDs under management.
[EPP protocol over TCP](https://tools.ietf.org/html/rfc5734) requires a
client-authenticated SSL connection. The operator of the proxy needs to obtain
an SSL certificate for domain epp.\<yourdomain.tld\>. [Let's
Encrypt](https://letsencrypt.org) offers SSL certificate free of charge, but any
other CA can fill the role.
an SSL certificate for domain `epp.<yourdomain.tld>`.
[Let's Encrypt](https://letsencrypt.org) offers SSL certificate free of charge,
but any other CA can fill the role.
Concatenate the certificate and its private key into one file:
@@ -82,7 +82,7 @@ $ cat <certificate.pem> <private.key> > <combined_secret.pem>
```
The order between the certificate and the private key inside the combined file
does not matter. However, if the certificate file is chained, i. e. it contains
does not matter. However, if the certificate file is chained, i.e. it contains
not only the certificate for your domain, but also certificates from
intermediate CAs, these certificates must appear in order. The previous
certificate's issuer must be the next certificate's subject.
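
A quick way to verify an individual certificate's place in the chain is to
print its subject and issuer with standard `openssl` tooling; in a correctly
ordered file, each certificate's issuer should match the subject of the
certificate that follows it:

```shell
# Print who this certificate identifies (subject) and who signed it (issuer).
$ openssl x509 -in certificate.pem -noout -subject -issuer
```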
@@ -92,8 +92,9 @@ bucket will be created automatically by terraform.
### Setup proxy project
First setup the [Application Default
Credential](https://cloud.google.com/docs/authentication/production) locally:
First setup the
[Application Default Credential](https://cloud.google.com/docs/authentication/production)
locally:
```bash
$ gcloud auth application-default login
@@ -102,10 +103,9 @@ $ gcloud auth application-default login
Login with the account that has "Project Owner" role of all three projects
mentioned above.
Navigate to `proxy/terraform`, create a folder called
`envs`, and inside it, create a folder for the environment that proxy is
deployed to ("alpha" for example). Copy `example_config.tf` and `outputs.tf`
to the environment folder.
Navigate to `proxy/terraform`, create a folder called `envs`, and inside it,
create a folder for the environment that proxy is deployed to ("alpha" for
example). Copy `example_config.tf` and `outputs.tf` to the environment folder.
```bash
$ cd proxy/terraform
@@ -132,12 +132,12 @@ takes a couple of minutes.
### Setup Nomulus
After terraform completes, it outputs some information, among which is the
email address of the service account created for the proxy. This needs to be
added to the Nomulus configuration file so that Nomulus accepts traffic from the
proxy. Edit the following section in
`java/google/registry/config/files/nomulus-config-<env>.yaml` and redeploy
Nomulus:
After terraform completes, it outputs some information, among which is the email
address of the service account created for the proxy. This needs to be added to
the Nomulus configuration file so that Nomulus accepts traffic from the proxy.
Edit the following section in
`core/src/main/java/google/registry/config/files/nomulus-config-<env>.yaml` and
redeploy Nomulus:
```yaml
auth:
@@ -148,7 +148,7 @@ auth:
### Setup nameservers
The terraform output (run `terraform output` in the environment folder to show
it again) also shows the nameservers of the proxy domain (\<yourdomain.tld\>).
it again) also shows the nameservers of the proxy domain (`<yourdomain.tld>`).
Delegate this domain to these nameservers (through your registrar). If the
domain is self-allocated by Nomulus, run:
@@ -160,8 +160,8 @@ $ nomulus -e production update_domain <yourdomain.tld> \
### Setup named ports
Unfortunately, terraform currently cannot add named ports on the instance groups
of the GKE clusters it manages.
[Named ports](https://cloud.google.com/compute/docs/load-balancing/http/backend-service#named_ports)
are needed for the load balancer it sets up to route traffic to the proxy. To
set named ports, in the environment folder, do:
### Edit proxy config file
Proxy configuration files are at
`proxy/src/main/java/google/registry/proxy/config/`. There is a default config
that provides most values needed to run the proxy, and several
environment-specific configs for proxy instances that communicate with different
Nomulus environments. The values specified in the environment-specific file
override those in the default file.
### Upload proxy docker image to GCR
The GKE deployment manifest is set up to pull the proxy docker image from
[Google Container Registry](https://cloud.google.com/container-registry/) (GCR).
Instead of using `docker` and `gcloud` to build and push images, respectively,
we provide `gradle` rules for the same tasks. To push an image, first use
[`docker-credential-gcr`](https://github.com/GoogleCloudPlatform/docker-credential-gcr)
to obtain the necessary credentials, which Gradle uses to push the image.
After credentials are configured, verify that Gradle will use the proper
`gcpProject` for deployment in the main `build.gradle` file. We recommend using
the same project and image for proxies intended for different Nomulus
environments; this way one can deploy the same proxy image first to sandbox for
testing, and then to production.
To push to GCR, run:
```bash
$ ./gradlew proxy:pushProxyImage
```
If the GCP project that hosts the images (the GCR project) is different from the
project that the proxy runs in (the proxy project), grant the service account
the "Storage Object Viewer" role on the GCR project.
```bash
$ gcloud projects add-iam-policy-binding <image-project> \
--member serviceAccount:<service-account-email> \
--role roles/storage.objectViewer
```
### Deploy proxy
### Afterwork
Remember to turn on
[Stackdriver Monitoring](https://cloud.google.com/monitoring/docs/) for the
proxy project as we use it to collect metrics from the proxy.
You are done! The proxy should be running now. You should store the private key
safely, or delete it as you now have the encrypted file shipped with the proxy.
### Create service account
The proxy will run with the credential of a
[service account](https://cloud.google.com/compute/docs/access/service-accounts).
In theory, it can take advantage of
[Application Default Credentials](https://cloud.google.com/docs/authentication/production)
and use the service account that the GCE instance underpinning the GKE cluster
uses, but we recommend creating a separate service account. With a dedicated
service account, one can grant permissions only necessary to the proxy. To
create a service account:
```bash
$ gcloud iam service-accounts create proxy-service-account \
```
A `proxy-key.json` file will be created inside the current working directory.
The service account email address needs to be added to the Nomulus configuration
file so that Nomulus accepts the OAuth tokens generated for this service
account. Add its value to
`core/src/main/java/google/registry/config/files/nomulus-config-<env>.yaml`:
```yaml
auth:
```

The service account also needs the `roles/logging.logWriter` role so that the
proxy can write logs:

```bash
$ gcloud projects add-iam-policy-binding <project-id> \
    --member serviceAccount:<service-account-email> \
    --role roles/logging.logWriter
```
### Obtain a domain and SSL certificate
A domain is needed (if you do not want to rely on IP addresses) for clients to
communicate to the proxy. Domains can be purchased from a domain registrar
([Google Domains](https://domains.google), for example). A Nomulus operator
could also consider self-allocating a domain under an owned TLD instead.
An SSL certificate is needed because [EPP over
TCP](https://tools.ietf.org/html/rfc5734) requires SSL. You can apply for a free
SSL certificate for the domain name you intend to serve as the EPP endpoint
(epp.nic.tld, for example) from [Let's Encrypt](https://letsencrypt.org). For
now, you will need to manually renew your certificate before it expires.
### Create keyring and encrypt the certificate/private key
The proxy needs access to both the private key and the certificate. Do *not*
package them directly with the proxy. Instead, use
[Cloud KMS](https://cloud.google.com/kms/) to encrypt them, ship the encrypted
file with the proxy, and call Cloud KMS to decrypt them on the fly. (If you want
to use another keyring solution, you will have to modify the proxy and implement
your own.)
Concatenate the private key file with the certificate. It does not matter which
A file named `ssl-cert-key.pem.enc` will be created. Upload it to a GCS bucket
in the proxy project. To create a bucket and upload the file:
```bash
$ gcloud storage buckets create gs://<bucket-name> --project <proxy-project>
$ gcloud storage cp ssl-cert-key.pem.enc gs://<bucket-name>
```
### Proxy configuration
Proxy configuration files are at
`proxy/src/main/java/google/registry/proxy/config/`. There is a default config
that provides most values needed to run the proxy, and several
environment-specific configs for proxy instances that communicate with different
Nomulus environments. The values specified in the environment-specific file
override those in the default file.
### Setup Stackdriver for the project
The proxy streams metrics to
[Stackdriver](https://cloud.google.com/stackdriver/). Refer to
[Stackdriver Monitoring](https://cloud.google.com/monitoring/docs/)
documentation on how to enable monitoring on the GCP project.
The proxy service account needs to have
["Monitoring Metric Writer"](https://cloud.google.com/monitoring/access-control#predefined_roles)
role in order to stream metrics to Stackdriver. Grant
`roles/monitoring.metricWriter` to the proxy service account with `gcloud
projects add-iam-policy-binding`, in the same way as the logging role.
Repeat this for all the zones you want to create clusters in.
### Upload proxy service account key to GKE cluster
The kubernetes pods (containers) are configured to read the proxy service
The proxies running on GKE clusters need to be exposed to the outside. Do not
use Kubernetes
[`LoadBalancer`](https://kubernetes.io/docs/concepts/services-networking/service/#type-loadbalancer).
It will create a GCP
[Network Load Balancer](https://cloud.google.com/compute/docs/load-balancing/network/),
which has several problems:
- This load balancer does not terminate TCP connections. It simply acts as an
edge router that forwards IP packets to a "healthy" node in the cluster. As
such, it does not support IPv6, because GCE instances themselves are
currently IPv4 only.
-   IP packets that arrive at the node may be routed to another node for
reasons of capacity and availability. In doing so it will
[SNAT](https://en.wikipedia.org/wiki/Network_address_translation#SNAT) the
packet, therefore losing the source IP information that the proxy needs. The
proxy uses source IP address to cap QPS and passes EPP source IP to Nomulus
for validation. Note that a TCP terminating load balancer also has this
problem as the source IP becomes that of the load balancer, but it can be
addressed in other ways (explained later). See
[here](https://kubernetes.io/docs/tutorials/services/source-ip/) for more
details on how Kubernetes routes traffic and translates source IPs inside the
cluster.
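The per-source-IP QPS cap mentioned above can be pictured as a token bucket
keyed by client address. The sketch below is illustrative Python only — the
proxy's actual quota logic is implemented in Java and configured in its config
file, and every name here is invented:

```python
import time
from collections import defaultdict


class PerIpRateLimiter:
    """Minimal token-bucket rate limiter keyed by client source IP (sketch only)."""

    def __init__(self, qps: float, burst: int):
        self.qps = qps      # tokens refilled per second
        self.burst = burst  # bucket capacity
        # Each bucket starts full; value is (tokens, last refill timestamp).
        self.buckets = defaultdict(lambda: (float(burst), time.monotonic()))

    def allow(self, source_ip: str) -> bool:
        tokens, last = self.buckets[source_ip]
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at the burst size.
        tokens = min(float(self.burst), tokens + (now - last) * self.qps)
        if tokens >= 1.0:
            self.buckets[source_ip] = (tokens - 1.0, now)
            return True
        self.buckets[source_ip] = (tokens, now)
        return False


limiter = PerIpRateLimiter(qps=1.0, burst=2)
print(limiter.allow("192.0.2.1"))  # first request from this IP is allowed
```

A limiter like this only works if the proxy still sees the real client address,
which is exactly why losing the source IP at the load balancer matters.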
Instead, we split the task of exposing the proxy to the Internet into two tasks,
first to expose it within the cluster, then to expose the cluster to the outside
through a
[TCP Proxy Load Balancer](https://cloud.google.com/compute/docs/load-balancing/tcp-ssl/tcp-proxy).
This load balancer terminates TCP connections and allows for the use of a single
anycast IP address (IPv4 and IPv6) to reach any clusters connected to its
backend (it chooses a particular cluster based on geographical proximity). From
```bash
$ kubectl create -f \
    proxy/kubernetes/proxy-service.yaml
```
This service object will open up port 30000 (health check) and 30002 (EPP) on
the nodes, routing to the same ports inside a pod.
Repeat this for all clusters.
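The health check on port 30000 is a simple fixed request/response exchange over
TCP. A minimal sketch of both sides of that exchange — the probe strings are
placeholders (the real ones are set in the proxy configuration file), and an
ephemeral localhost port stands in for 30000:

```python
import socket
import threading

# Placeholder probe strings; the proxy's actual values come from its config.
CHECK_REQUEST = b"HEALTH_CHECK_REQUEST\n"
CHECK_RESPONSE = b"HEALTH_CHECK_RESPONSE\n"


def serve_one_probe(server: socket.socket) -> None:
    """Answer a single health-check probe, then close the connection."""
    conn, _ = server.accept()
    with conn:
        if conn.recv(1024) == CHECK_REQUEST:
            conn.sendall(CHECK_RESPONSE)


def probe(host: str, port: int) -> bytes:
    """Act as the load balancer: send the probe and return the reply."""
    with socket.create_connection((host, port)) as conn:
        conn.sendall(CHECK_REQUEST)
        return conn.recv(1024)


server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))  # ephemeral port instead of the real 30000
server.listen(1)
port = server.getsockname()[1]
t = threading.Thread(target=serve_one_probe, args=(server,))
t.start()
reply = probe("127.0.0.1", port)
t.join()
server.close()
print(reply)
```

A node only stays in the load balancer's rotation while this exchange succeeds,
which is why the health-check port must be exposed alongside the EPP port.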
Then set the named ports:
```bash
$ gcloud compute instance-groups set-named-ports <instance-group> \
--named-ports epp:30002 --zone <zone>
```
Repeat this for each instance group (cluster).
routed to the corresponding port on a proxy pod. The backend service codifies
which ports on the node's clusters should receive traffic from the load
balancer.
Create a backend service for EPP:
```bash
# EPP backend
$ gcloud compute backend-services create proxy-epp-loadbalancer \
--global --protocol TCP --health-checks proxy-health --timeout 1h \
--port-name epp
```
This backend service routes packets to the EPP named port on any instance group
attached to it.
Then add (attach) the instance groups that the proxies run on to the backend
service:
```bash
# EPP backend
$ gcloud compute backend-services add-backend proxy-epp-loadbalancer \
--global --instance-group <instance-group> --instance-group-zone <zone> \
--balancing-mode UTILIZATION --max-utilization 0.8
```
Repeat this for each instance group.
```bash
$ gcloud compute addresses describe proxy-ipv4 --global
$ gcloud compute addresses describe proxy-ipv6 --global
```
Set these IP addresses as the A/AAAA records for epp.<nic.tld>, where <nic.tld>
is
the domain that was obtained earlier. (If you use
[Cloud DNS](https://cloud.google.com/dns/) as your DNS provider, this step can
also be performed by `gcloud`)
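Once the records propagate, one can sanity-check what the resolver returns with
Python's standard library. This sketch is shown against `localhost` because
`<nic.tld>` is a placeholder; with real records in place you would query your
actual EPP hostname instead:

```python
import socket


def resolve(host: str) -> tuple[set[str], set[str]]:
    """Return the (A, AAAA) addresses the local resolver sees for host."""
    ipv4, ipv6 = set(), set()
    for family, _, _, _, sockaddr in socket.getaddrinfo(host, None):
        if family == socket.AF_INET:
            ipv4.add(sockaddr[0])
        elif family == socket.AF_INET6:
            ipv6.add(sockaddr[0])
    return ipv4, ipv6


# With real records in place you would check e.g. resolve("epp.<nic.tld>")
# and compare the results against the reserved static addresses.
ipv4, ipv6 = resolve("localhost")
print(ipv4)
```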
#### Create load balancer frontend
First create a TCP proxy (yes, it is confusing, this GCP resource is called
"proxy" as well) which is a TCP termination point. Outside connections terminate
on a TCP proxy, which establishes its own connection to the backend services
defined above. As such, the source IP address from the outside is lost. But the
TCP proxy can add the
[PROXY protocol header](https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt)
at the beginning of the connection to the backend. The proxy running on the
backend can parse the header and obtain the original source IP address of a
request.
```bash
# EPP
$ gcloud compute target-tcp-proxies create proxy-epp-proxy \
--backend-service proxy-epp-loadbalancer --proxy-header PROXY_V1
```
Note the use of the `--proxy-header` flag, which turns on the PROXY protocol
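Concretely, the load balancer prepends a single text line (PROXY protocol v1)
before the client's first bytes; everything after the `\r\n` is the original
payload. A minimal parser sketch — this is Python for illustration, not the
proxy's actual Java implementation:

```python
def parse_proxy_v1(data: bytes) -> tuple[str, str, int, bytes]:
    """Parse a PROXY protocol v1 header; return (src_ip, dst_ip, src_port, rest)."""
    header, sep, rest = data.partition(b"\r\n")
    if not sep or not header.startswith(b"PROXY "):
        raise ValueError("not a PROXY protocol v1 header")
    parts = header.decode("ascii").split(" ")
    # e.g. ["PROXY", "TCP4", "<src_ip>", "<dst_ip>", "<src_port>", "<dst_port>"]
    if parts[1] == "UNKNOWN":
        return "", "", 0, rest  # connection info not forwarded
    _, _, src_ip, dst_ip, src_port, _ = parts
    return src_ip, dst_ip, int(src_port), rest


src_ip, dst_ip, src_port, payload = parse_proxy_v1(
    b"PROXY TCP4 192.0.2.10 198.51.100.1 56324 700\r\n<epp-client-hello>")
print(src_ip, src_port)  # the original client address, recovered behind the LB
```

This is how the proxy recovers the real client address even though the TCP
connection it accepts comes from the load balancer.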
Next, create the forwarding rule that routes outside traffic to a given IP to
the
TCP proxy just created:
```bash
$ gcloud compute forwarding-rules create proxy-epp-ipv4 \
--global --target-tcp-proxy proxy-epp-proxy \
--address proxy-ipv4 --ports 700
```
The above command sets up a forwarding rule that routes traffic destined to the
static IPv4 address reserved earlier, on port 700 (actual port for EPP), to the
TCP proxy that connects to the EPP backend service.
Repeat the above command to set up IPv6 forwarding for EPP.
## Additional steps
### Check if it all works
At this point the proxy should be working and reachable from the Internet. Try
to contact the EPP endpoint with an EPP client (or, for a quick TLS-level
check, `openssl s_client -connect epp.<nic.tld>:700`).
### Check logs and metrics
The proxy saves logs to
[Stackdriver Logging](https://cloud.google.com/logging/), which is the same
place that Nomulus saves its logs to. On GCP console, navigate to Logging -
Logs - GKE Container - <cluster name> - default. Do not choose "All
namespace_id" as it includes logs from the Kubernetes system itself and can be
quite overwhelming.
Metrics are stored in
[Stackdriver Monitoring](https://cloud.google.com/monitoring/docs/). To view the
metrics, go to Stackdriver [console](https://app.google.stackdriver.com) (also
accessible from GCE console under Monitoring), navigate to Resources - Metrics
Explorer. Choose resource type "GKE Container" and search for metrics with name
"/proxy/" in it. Currently available metrics include total connection counts,
active connection count, request/response count, request/response size,
round-trip latency and quota rejection count.
### Cleanup sensitive files