The primary annoyance with this is that it means we need to (or at least
should) split all tests that use the fee extension into two separate
tests -- one that simulates non-prod environments, and one that
simulates prod environments. This leads to duplication of many tests but
that's fine since this is theoretically temporary.
* Configure Cloud Scheduler to trigger MoSAPI SLA status export to Cloud Monitoring in production
- This job is scheduled to trigger every 3 minutes so that we get near-real-time updates.
- This will not emit metrics for now, as the metrics-triggering logic has not been written yet.
- Logs are added.
* Change Trigger scheduling from 3 minutes to 5 minutes
This primarily affects the EPP greeting. We were already erroring out when any
contact flows were attempted; this should just prevent registrars from even
trying them at all.
This PR is designed to be minimally invasive, and does not remove any of the
contact flows or Jakarta XML/XJC objects/files themselves. That can be done
later as a follow-up.
Also note that the contact namespace urn:ietf:params:xml:ns:contact-1.0 is still
present for now in RDE exports, but I'll remove that subsequently as well.
BUG= http://b/475506288
This means that attempting to add a status that is already present will now
fail, and attempting to remove a status that is not present will also now fail.
This also refactors the existing checks into a single verify method, rather than
having to call three separate methods from every callsite.
BUG= http://b/474645068
* add cloud profiler to dockerfile and start script
* add apt-get update
* change in cb machine type for nomulus
* fix typo
* add max worker limit to gradle tests
* Switch to root before doing apt-get
* correct dockerfile
* jetty/Dockerfile
* make profiler service conditional on the Kubernetes container name
Many of the actual fee extension changes are based off Weimin's PR
https://github.com/google/nomulus/pull/2912, though this makes some
additional changes based on the XML schema and description from RFC 8748.
This adds tests for the DomainCheckFlow which is the most complex and
thorough user of the fee extension, but we'll want to add further tests
to the other domain flows to make sure they're handled correctly.
It just makes it possible to delete allocation tokens, otherwise we need
to do a linear search over the entire Domain and DomainHistory tables if
we ever want to delete something.
The RST testing expects us to fail if they try to remove an IP from a
host that doesn't already have that IP, or to add one that already
exists (ditto on both for a domain's nameservers). I don't really see an
issue with our previous no-op implementation, but we need to do this to
pass the tests.
We need to have this enabled in sandbox, but we wish to wait to enable
it for production to make sure that the implementation is correct and
that clients can use it.
Soon we'll want to do something similar (but the opposite) with the old
fee extensions, where we **only** serve them in production (or maybe
unit test as well). That will allow us to pass the RST tests that depend
on only having the fee extension 1.0.
This is / will be required in https://datatracker.ietf.org/doc/rfc8748/.
I split this out from the rest of the fee-extension testing so that it
can be easily visible.
This commit introduces a new backend endpoint at `/_dr/task/triggerMosApiServiceState` that initiates the process of fetching the latest service states for all TLDs from the MoSAPI endpoint and exporting them as metrics to Cloud Monitoring.
The key changes include:
- A new `TriggerServiceStateAction` class that handles the GET request to the new endpoint.
- Logic within `MosApiStateService` to concurrently fetch states for all configured TLDs.
- A new `MosApiMetrics` class (currently a placeholder) responsible for sending the collected states to the monitoring service.
- Unit tests for the new action and the updated service logic.
This endpoint will be called periodically to ensure that the MosApi service health metrics are kept up-to-date.
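A minimal sketch of roughly what such an action could look like (the method names on `MosApiStateService` and `MosApiMetrics`, and the action's wiring, are assumptions for illustration; the real action uses Nomulus's own annotation and injection machinery):
```java
import com.google.common.flogger.FluentLogger;
import java.util.Map;

/** Illustrative sketch of the action behind /_dr/task/triggerMosApiServiceState. */
public final class TriggerServiceStateActionSketch implements Runnable {

  private static final FluentLogger logger = FluentLogger.forEnclosingClass();

  /** Assumed shapes of the collaborators described above. */
  interface MosApiStateService {
    Map<String, String> fetchAllServiceStates(); // TLD -> latest service state
  }

  interface MosApiMetrics {
    void export(Map<String, String> statesByTld);
  }

  private final MosApiStateService stateService;
  private final MosApiMetrics metrics;

  TriggerServiceStateActionSketch(MosApiStateService stateService, MosApiMetrics metrics) {
    this.stateService = stateService;
    this.metrics = metrics;
  }

  @Override
  public void run() {
    // The service fetches states for all configured TLDs (concurrently, per the
    // description above); the results are then handed to the metrics exporter.
    Map<String, String> statesByTld = stateService.fetchAllServiceStates();
    statesByTld.forEach(
        (tld, state) -> logger.atInfo().log("MoSAPI state for %s: %s", tld, state));
    metrics.export(statesByTld);
  }
}
```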
This comes from the schema at https://datatracker.ietf.org/doc/rfc8748/,
Section 6.1. The class that we use for "premium" notes moved from the
command to the object itself.
I don't know where in the spec these are explicitly disallowed, but it
seems like good practice and we'll fail the RST tests if we don't
disallow them.
* Add GetServiceState action for MoSAPI service monitoring
Implements the `/api/mosapi/getServiceState` endpoint to retrieve service health summaries for TLDs from the MoSAPI system.
- Introduces `GetServiceStateAction` to fetch TLD service status.
- Implements `MosApiStateService` to transform raw MoSAPI responses into a curated `ServiceStateSummary`.
- Uses concurrent processing with a fixed thread pool to fetch states for all configured TLDs efficiently while respecting MoSAPI rate limits.
JUnit tests added.
* Refactor MoSAPI models to records and address review nits
- Convert model classes to Java records for conciseness and immutability.
- Update unit tests to use Java text blocks for improved JSON readability.
- Simplify service and action layers by removing redundant logic and logging.
- Fix configuration nits regarding primitive types and comment formatting.
* Consolidate MoSAPI models and enhance null-safety
- Moves model records into a single MosApiModels.java file.
- Switches to ImmutableList/ImmutableMap with non-null defaults in constructors.
- Removes redundant pass-through methods in MosApiStateService.
- Updates tests to use Java Text Blocks and non-null collection assertions.
* Improve MoSAPI client error handling and clean up data models
Refactors the MoSAPI monitoring client to be more robust against
infrastructure failures
* Refactor: use nullToEmptyImmutableCopy() for MoSAPI models
Standardize null-handling in model classes by using the Nomulus
`nullToEmptyImmutableCopy()` utility. This ensures consistent API
responses with empty lists instead of omitted fields.
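A minimal sketch of the pattern, assuming the utility lives in `google.registry.util.CollectionUtils` and accepts a possibly-null list (the record and field names are made up):
```java
import static google.registry.util.CollectionUtils.nullToEmptyImmutableCopy;

import com.google.common.collect.ImmutableList;

/** Hypothetical MoSAPI model record that never exposes a null collection. */
record ServiceStateSummarySketch(String tld, String status, ImmutableList<String> incidents) {

  ServiceStateSummarySketch {
    // If the JSON response omitted the list entirely, substitute an empty ImmutableList
    // so the API response always contains a (possibly empty) array instead of omitting it.
    incidents = nullToEmptyImmutableCopy(incidents);
  }
}
```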
* Add RST support in Sandbox
Added RST test label files as resources.
Added a RstTmchUtils class that loads appropriate labels according to
TLD pattern.
Temporarily changed label fetching in production to include the TLD
string, so that the new class may know which set of labels to use.
* Addressing comments
* Addressing comments
We do most of these on host create already so we should also do them on
host checks. The only added change is the character validation (our
existing hostnames all match these).
2306 signifies something that is syntactically valid but semantically
invalid (like if someone tried to register a .com domain). These errors
are for domain syntax that could never be valid, thus we should throw a
syntax exception instead of a policy exception.
Now that we have transitioned to the minimum dataset, we no longer
support any actions on contacts (and by the time this is merged /
deployed, all contacts will be deleted). We should just throw an
appropriate exception on all contact-related flows. We don't delete the
flows themselves, so that we can have an appropriate error message.
We also keep all the flows and XML templates around individually for now because we may be
required to continue to differentiate the requests in ICANN activity
reporting (e.g. srs-cont-create vs srs-cont-delete)
This is a follow-up to PR #2908, which relaxed this restriction on bare TLDs
only, but now we also allow it systemwide on domains and hostnames as well. The
rules against hyphens in these positions are still enforced on all parts of the
domain name except the last one. Correct handling of multi-part TLDs in this
regard is out of scope in this PR; a multi-part TLD that looked something like
".zz--foobar.foobar" would still fail validation. (But of course you cannot a
priori know just from looking at a 3-part string whether it might be a hostname
on a normal TLD, or a domain name on a 2-part TLD.)
This also has some annoying interactions with a trailing dot (indicating the
root), which need to be preserved, but otherwise don't affect how TLD validation
is handled.
BUG= http://b/471013082
The canonicalizeHostname() helper method is only suitable for use with domain
names or host names. It does not work on bare TLDs, because a bare TLD can
have hyphens in the third and fourth position without necessarily being an IDN.
Note that the configure TLD command already correctly allows TLDs with such
names to be created.
Note that we are still enforcing that the TLDs to be added exist, so they have
to pass all TLD naming requirements that are enforced on creating TLDs, and we
are still lowercasing the TLD names passed as arguments here (though we're no
longer punycoding them, although arguably that's not super useful on
command-line params anyway).
BUG= http://b/471013082
We used to publish test artifacts to a Maven repo on GCS, for use by
schema tests. For this to work with Kokoro, the GCS bucket must be
accessible to all users.
To comply with the no-public-user requirement, we store the necessary
jars in a well-known bucket and map them into Kokoro. This strategy
cannot be used on the Maven repo because only a small number of files
with fixed names may be mapped. With the Maven repo, there are too many
files to map.
This is a decently simple wrapper around the previously-created
BulkDomainTransferAction that batches a provided list of domains up and
sends them along to be transferred.
This PR finds instances where we previously checked if the feature flag
for contacts-prohibited was set and removes those checks, making the
contacts-prohibited behavior the only behavior. Because the tests didn't
have that feature flag set, this means we need to change a ton of tests
to remove contact references.
* Add MoSAPI client to interact with ICANN's monitoring system
This change introduces a comprehensive client to interact with ICANN's Monitoring System API (MoSAPI). This provides direct, automated access to critical registry health and compliance data, moving Nomulus towards a more proactive monitoring posture.
It adds a core, stateless MosApiClient that manages communication and authentication with the MoSAPI service using TLS client certificates.
* Resolve review feedback & upgrade to OkHttp3 client
This commit addresses and resolves all outstanding review comments, primarily encompassing a shift to OkHttp3, security configuration cleanup, and general code improvements.
* **Review:** Addressed and resolved all pending review comments.
* **Feature:** Switched the underlying HTTP client implementation to [OkHttp3](https://square.github.io/okhttp/).
* **Configuration:** Consolidated TLS Certificates-related configuration into the dedicated configuration area.
* **Cleanup:** Removed unused components (`HttpUtils` and `HttpModule`) and performed general code cleanup.
* **Quality:** Improved exception handling logic for better robustness.
* Refactor and fix Mosapi exception handling
Addresses code review feedback and resulting test failures.
- Flattens package structure by moving MosApiException and its test.
- Corrects exception handling to ensure MosApiAuthorizationException
propagates correctly, before the general exception handler.
- Adds a default case to the MosApiException factory for robustness.
- Uses lowercase for placeholder TLDs in default-config.yaml.
* Refactor and improve Mosapi client implementation
Simplifying URL validation with Guava
Preconditions and refining exception handling to use `Throwables`.
* Refactor precondition checks using project specific utility
This will be necessary if we wish to do larger BTAPPA transfers (or
other types of transfers, I suppose). The nomulus command-line tool is
not fast enough to quickly transfer thousands of domains within a
reasonable timeframe.
After each deployment in sandbox or production, move the artifacts from
the corresponding release to a well-known location so that they can be
mapped to Kokoro in presubmit tests. The Kokoro-mapping does not need
public access to the GCS bucket.
The artifacts include the postgresql schema jar, the nomulus release
jar, and the uber jar of the nomulus schema integration test classes.
Every jar name consists of a fixed prefix and the environment. Each jar
of a new deployment overwrites the previous copy.
These are old/pointless now that we've migrated to GKE. Note that this
doesn't update anything in the docs/ folder, as that's a much larger
project that should be done on its own.
Add a flag indicating that a Sandbox TLD should use the production
servers.
No additional TLD name pattern checks. Cloud DNS has an allowlist for
names that may use production servers.
Also updated default descriptive name generation: dropping the trailing
'.', and replacing remaining dots with '_'.
This contains the same codepoints from the
core/src/main/java/google/registry/idn/Latin-IDN.xml file, just in the old .txt
IDN format that Nomulus actually ingests.
We don't need to support the mix of GAE and GKE any more so we can get
rid of the GaeService bits and unify everything under one constant
service. This also allows us to reduce the number of services down to
four (FE, BE, PUBAPI, console) which is nice.
We've previously been using Scrypt since PR #2191 which, while being a
memory-hard slow function, isn't the optimal solution according to the
OWASP recommendations. While we could get away with increasing the
parallelization parameter to 3, it's better to just switch to the
most-recommended solution if we're switching things up anyway.
For the transition, we do something similar to PR #2191: if the
previous algorithm's hash verifies successfully, we re-hash with Argon2id and
store that version. By doing this, we should not need any intervention
for registrars who log in at any point during the transition period.
Much of this PR, especially the parts where we re-hash the passwords in
Argon2 instead of Scrypt upon login, is based on the code that was
eventually removed in #2310.
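A rough sketch of the transition using Bouncy Castle's Argon2 generator; the cost parameters, credential shape, and helper names below are illustrative only and not necessarily what Nomulus uses:
```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Optional;
import java.util.function.Predicate;
import org.bouncycastle.crypto.generators.Argon2BytesGenerator;
import org.bouncycastle.crypto.params.Argon2Parameters;

final class Argon2TransitionSketch {

  /** Hypothetical stored credential shape; the real fields live on the registrar entity. */
  record StoredHash(byte[] salt, byte[] hash, boolean isArgon2) {}

  /** Hashes a password with Argon2id; the cost parameters here are illustrative. */
  static byte[] argon2id(String password, byte[] salt) {
    Argon2Parameters params =
        new Argon2Parameters.Builder(Argon2Parameters.ARGON2_id)
            .withSalt(salt)
            .withParallelism(1)
            .withMemoryAsKB(19 * 1024) // roughly OWASP's suggested minimum
            .withIterations(2)
            .build();
    Argon2BytesGenerator generator = new Argon2BytesGenerator();
    generator.init(params);
    byte[] out = new byte[32];
    generator.generateBytes(password.getBytes(StandardCharsets.UTF_8), out);
    return out;
  }

  /**
   * On login: if the stored hash is already Argon2id, just verify it. If it is still the
   * legacy Scrypt hash and the password verifies, re-hash with Argon2id so the upgraded
   * credential can be persisted, making the migration invisible to registrars.
   */
  static Optional<StoredHash> verifyAndMaybeUpgrade(
      String password, StoredHash stored, Predicate<String> legacyScryptVerifier) {
    if (stored.isArgon2()) {
      boolean ok = MessageDigest.isEqual(argon2id(password, stored.salt()), stored.hash());
      return ok ? Optional.of(stored) : Optional.empty();
    }
    if (!legacyScryptVerifier.test(password)) {
      return Optional.empty(); // wrong password; leave the legacy hash untouched
    }
    return Optional.of(new StoredHash(stored.salt(), argon2id(password, stored.salt()), true));
  }
}
```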
Reformat the current schema file for the RFC 8748 final version. It was
adapted from v0.12 and is not fully consistent with the final schema.
This helps highlight the differences we missed in PR 2855 when we check
in the official schema.
The servlets, at this point now that we're off GAE, are only used for
the test server (and, indirectly, in one BSA test). Instead of having
them all remain separate, we can unify them in one test servlet that
lives in the test/ folder.
This removes one avenue of potential confusion w/r/t how request routing
actually works and where we would want to add new routing.
We need to stop using maven repo on GCS to store artifacts for the
schema compatibility tests. After public access is removed from GCS
buckets, Kokoro won't be able to access it: normal access will be
denied, and the repo is too large to map (copy) to Kokoro VM as a
resource.
This PR uploads the relevant jars to each release's folder. See
go/dr-gcs-public-access-prevention for details.
With GKE, we don't need the individual servlets because the services
aren't partitioned out the same way they were in GAE.
We keep FrontendServlet and BackendServlet around for now as they serve
as the backbone for the local RegistryTestServer (for testing things
like the console).
I did some cursory tests on alpha and things seem to be unaffected -- I
was able to curl RDAP (pubapi) and create domains.
It's been over half a year now since we last used any of these and we definitely
no longer have any intentions of ever using App Engine again.
BUG= http://b/457471639
Still part of b/454947209, removing references to WHOIS where we can. We
keep the registrar type and the column names (at least for now) because
changing those is much more complicated.
We no longer need this now that no contacts can be applied to any domains at all.
A follow-up PR in subsequent weeks will delete the column from the DB schema.
BUG= http://b/448619572
Previously, we would have separate database calls for mapping from
foreign key to repo ID and then from repo ID to object. This PR modifies
those calls to load the resource directly (the old system was an
artifact of the Datastore key-value storage system).
In this PR, we merge the load-resource-by-foreign-key calls into a
single database load, as well as adding a separate cache object for
(foreign key) -> (resource). Now we cache, and have separate cleaner
code paths, for fk -> resource, fk -> repo ID, and repo ID -> resource.
Also removes the unused RdeFragmenter class
* Support Fee Extension standard in rfc 8748
Adding support to the final version of RFC 8748.
Compared with draft-0.12, the only meaningful change is in the namespace.
The rest is either schema-tightening that reflects actual usage, or
optional server-side features that we do not support.
We reuse draft-0.12 tests, only changing namespace uris in the input and
output files for the new version.
* Addressing reviews
We haven't been serving these for a while, so let's finally get rid of them.
We keep some Soy rules around in the presubmits file because we use some
Soy files as XML templates for EPP actions.
This doesn't change any underlying implementation details, and is mainly useful to
reduce the number of diffs in PR #2852 (which does change implementation
details) thus making that easier to review.
This allows us to specify a getter delegation to bypass Hibernate's limitations
on field types for the purposes of, e.g., using a sorted set in toString()
output rather than the base Hibernate unsorted HashSet type.
BUG=http://b/448631639
This affects FKs pointing to both Contact and ContactHistory. This is in
preparation to us deleting all rows in those two tables, and then subsequently
removing all application logic having to do with contacts entirely.
This is steps one and two of b/454947209
We already haven't been serving WHOIS for a while, so there's no point
in keeping the old code around. This can simplify some code paths in the
future (like, certain foreign-key-loads that are only used in WHOIS
queries).
Basically, what happened is that the cache's expireAfterWrite was being
called some number of milliseconds (say, 50-100) after the transaction
was started. That method used the transaction time instead of the
current time, so as a result the entries were sticking around 50-100ms
longer in the cache than they should have been.
This fix contains two parts, each of which I believe would be sufficient
on their own to fix the issue:
1. Use the currentTime parameter passed to Expiry::expireAfterCreate
2. Use the transaction time in the cache's Ticker. This keeps everything
on the same schedule.
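A simplified sketch of both fixes using Caffeine's `Expiry` and `Ticker` APIs (the cached value shape and TTL handling below are illustrative, not the actual Nomulus cache):
```java
import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;
import com.github.benmanes.caffeine.cache.Expiry;
import com.github.benmanes.caffeine.cache.Ticker;

final class ExpiryFixSketch {

  /** Hypothetical cached value that carries its own expiration instant, in ticker nanos. */
  record Entry(Object value, long expiresAtNanos) {}

  static Cache<String, Entry> buildCache(Ticker transactionClockTicker) {
    return Caffeine.newBuilder()
        // Fix 2: drive the cache clock from the same source as transaction times, so cache
        // ages and transaction times stay on the same schedule.
        .ticker(transactionClockTicker)
        .expireAfter(
            new Expiry<String, Entry>() {
              @Override
              public long expireAfterCreate(String key, Entry entry, long currentTime) {
                // Fix 1: compute the remaining lifetime from the currentTime that Caffeine
                // passes in, rather than from the (older) transaction start time.
                return Math.max(0, entry.expiresAtNanos() - currentTime);
              }

              @Override
              public long expireAfterUpdate(
                  String key, Entry entry, long currentTime, long currentDuration) {
                return Math.max(0, entry.expiresAtNanos() - currentTime);
              }

              @Override
              public long expireAfterRead(
                  String key, Entry entry, long currentTime, long currentDuration) {
                return currentDuration; // reads do not extend the entry's lifetime
              }
            })
        .build();
  }
}
```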
During the release process, we are seeing the message "Gradle build daemon disappeared unexpectedly (it may have been killed or may have crashed)", which seemingly can be caused by OOMs.
I analyzed SQL statements run during the following flows and EXPLAIN
ANALYZEd each of them to figure out if there are any additional hash
indexes we could add that could be particularly helpful. Note: it's not
worth adding a hash index on the host_repo_id field in DomainHost
because so many rows (domains) use the same host.
- domain create
- domain delete
- domain info
- domain renew
- domain update
- host create
- host delete
- host update
I skipped the ones that use the read-only replica, as well as contact
flows (we're getting rid of them), and domain transfer/restore-related
flows as those are extremely infrequent.
Updates (AKA merges) run an extra SELECT statement to figure out if the
resource exists so that it can merge the entity into the existing object
in Hibernate's schema. When we're inserting new rows (such as new poll
messages or resource creates), we know that we don't need to do that
merge. Doing this should save us some SELECT statements (this has been
borne out in alpha).
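Loosely, the distinction at the JPA level (Nomulus's own transaction-manager methods wrap this; the snippet below just shows the underlying idea):
```java
import jakarta.persistence.EntityManager;

final class InsertVsMergeSketch {

  /**
   * For rows we know are brand new (new poll messages, resource creates), persist() issues
   * only an INSERT; Hibernate does not need to check whether the row already exists.
   */
  static void insertNew(EntityManager em, Object newEntity) {
    em.persist(newEntity);
  }

  /**
   * merge() may first SELECT the current row (if it is not already in the persistence
   * context) so it can reconcile state before writing. That extra SELECT is what the
   * change described above avoids for known-new entities.
   */
  static <T> T upsert(EntityManager em, T entity) {
    return em.merge(entity);
  }
}
```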
This should help in instances of popular domains dropping, since we
won't need to do an additional two database loads every time (assuming
the deletion time is in the future).
When running the action in sandbox on 1.5M domains, it failed a few times
updating individual domains (requiring a manual restart of the entire action).
It's better to just log the individual failures for manual inspection and then
otherwise continue running the action to process the vast majority of other
updates that won't fail.
BUG = http://b/439636188
Add a flag to the CreateCdnsTld command to bypass the dns name format
check in Sandbox (limiting names to `*.test.`). With this flag, we
can create TLDs for RST testing in Sandbox.
Note that if the new flag is wrongly set for a disallowed name, the
request to the Cloud DNS API will fail. The format check in the command
just provides a user-friendly error message.
I went through all the SQL statements generated by some sample
DomainCreateFlow and DomainDeleteFlow cases to find situations where we
were either SELECTing from, or UPDATEing, tables with a direct "field =
value" format. These are the situations that I found where we can add
hash indexes. This does two things:
1. Makes these queries slightly faster, since these are usually queries on
columns that are either unique or very close to unique, and O(1) is
faster than O(log(n))
2. Spreads around the optimistic predicate locks on the previously-used
btree indexes. Many of our serialization errors came from the fact
that we were autogenerating incrementing ID values for various
tables, meaning that SELECTs, INSERTs, and UPDATEs would all try to
take predicate locks out on the same page of the btree index. Using a
hash index means that the page locks will be spread out to various
index pages, rather than conflicting with each other.
Running load tests on alpha I see significant improvements in speed and
error rates. Speed is hard to quantify due to the nature of the way the
load tests distribute tasks among the queues but it could be more than
50% improvement, and serialization errors in the logs drop by more than
90%.
This implements the first part of Minimum Data Set phase 3, wherein we delete
all contact data. This action is necessary to leave a permanent record on the
domain (in the form of a domain history entry) documenting when the contacts
were removed by the administrative user.
Then, after this has finished removing all contact associations, we can simply
empty out or drop the Contact/ContactHistory tables and associated join tables.
* Skip user loading for proxy service account
Reduces database load by skipping the User entity lookup for the proxy
service account during OIDC authentication.
The high volume of EPP "hello" and "login" commands from the proxy
service account results in a constant database load. These lookups
are unnecessary as the proxy service account is not expected to have a
corresponding User object.
This change optimizes the authentication flow by checking for the proxy
service account email *before* attempting to load a User from the
database. This bypasses the database transaction entirely for these
high-volume requests.
This approach is more efficient than caching, as it eliminates the
database lookup for the proxy service account altogether, rather than
just caching the result.
* Comment added and service account lookup time improved
* comment updated for more clarity
* Add cache for User entities in OIDC auth flow
* refactor: Address review feedback
- Refactor database call into a single, reusable method
- Increase the default cache size to 200
- Remove .recordStats() and use a spy for testing
- Split unit tests into separate implementation tests that use Mockito spies instead of checking internal cache stats
We've moved these over to the User class, so we should remove these for
clarity. In addition, we should make it clear (in Java at least) that
the field in the RegistryLock object refers to the email address used
for the lock in question.
This allows us to also check / modify the CharlestonRoad registrar in
the console, and also allows us to test actions (like password reset)
using that registrar in the prod environment.
* Fix OOM in UploadBsaUnavailableDomains action
The action was using string concatenation to generate the upload content.
This causes an OOM when string length exceeds 25MB on our current VM.
This PR switches to streaming upload.
Also added an HTTP upload test.
The error happened when an unblockable name reported with
'Registered' as the reason had been deregistered. We tried to check the
deletion time of the domain to decide if this is a transient error
that is not worth reporting. However, we forgot that we do not have
the domain key in this case.
Since this is a best-effort action and the case rarely happens, we decided
not to make the optimization (staleness check) in this case.
Note: this still includes "contacts" for registrars, which are actually
a different concept that we call RegistrarPoc. That's different from
"Contact" objects, e.g. registrant.
The TLD is technically valid but it doesn't exist for us -- we should
return 404 instead of 400 in these situations according to the RDAP
conformance docs
Load balancer / internal redirections can result in the final request
URL lacking "https" when finally getting to the servlet. As a result,
even if you use https in the request, the resulting URL can be plain
http.
We need to include the actual (HTTPS) URL in the output, so replace it.
This increases the Hikari fetch size from 20 to 40 in order to decrease the
number of round trips.
This also sets a lower CPU request, as we seem to have overshot CPU consumption.
This also sets min replicas to 8 for EPP and max to 16, as we've been
running on 8-10 for the last week.
Tested locally and on alpha with dummy values (and throwing an
exception).
I was able to reuse a bit of code from the EPP password reset, but not
all of it.
This is the first in a series of PRs to implement the expiry access period
(XAP). The overall fee schedules will be set in YAML config files, so the only
DB change necessary should be this single new boolean column on the Tld entity,
which defaults to false so as to require XAP explicitly being turned on for a
given TLD.
BUG=http://b/437398822
Postgresql-17.6 introduces two new lines in pg_dump output as a
security feature: `\restricted {HASH}` and `\unrestricted {HASH}`.
We filter out lines starting with these two prefixes when comparing
schemas.
The db upgrade also adds two empty lines to the pg_dump output. We
now ignore all empty lines when comparing schemas.
It's necessary to remove the GAE-related code (and use GKE launch
commands instead), and we might as well remove contact-related fields
and actions because of the upcoming move to the minimum data set.
This reverts commit 5cef2dd8b5.
We faced CPU quota issue with standard machine type, so rolling back to c4
for now to monitor server performance and decide if we want to try to
downgrade again in the future.
We probably want this to run before the billing recurrence expansion
pipeline just in case there are any domains that should be deleted
before their billing recurrence gets expanded.
nodeSelector can limit the scheduling capabilities of k8s, which leads to delays in assigning new workloads. Since we do not require any particular machine for execution, it can be removed.
* Fix: Robustly parse certs and provide specific errors
* Add test for expired certificate failure
* fixing indentation
* fixing indentation
* Update SecurityActionTest.java
* Update SecurityActionTest.java for correcting the testcase
* Fix: Provide indentation fix
* Fixing Deduplication in test
All deployments received an update to averageUtilization CPU. This should allow us to stay ahead of the traffic curve and create instances before CPU reaches the limit.
Frontend CPU allocation has caused a "noisy neighbors" problem with pods assigned to nodes where there's not enough bursting capacity, so I increased it.
Adjusted the rest of the deployments according to their utilization.
Missing resource requests, as well as metrics for when to evict a resource,
produced a situation where k8s struggled to assign pods under load. This
adds default resource requirements based on two weeks of metrics and
instructions for when a resource should be evicted.
* Fix error handling in CopyDetailReportsAction
The action tries to record errors per registrar in an ImmutableMap, without realizing that
there may be duplicate keys due to retries.
Switched to the `buildKeepingLast` method to build the map.
* Addressing comments and rebase
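For reference, a toy illustration of the Guava behavior being relied on (the keys and values are made up):
```java
import com.google.common.collect.ImmutableMap;

final class BuildKeepingLastExample {
  public static void main(String[] args) {
    // With plain build(), a duplicate key (e.g. the same registrar reported twice because
    // of a retry) throws IllegalArgumentException; buildKeepingLast() keeps the last value.
    ImmutableMap<String, String> errorsByRegistrar =
        ImmutableMap.<String, String>builder()
            .put("registrarA", "first attempt failed")
            .put("registrarA", "retry failed")
            .buildKeepingLast();
    System.out.println(errorsByRegistrar); // {registrarA=retry failed}
  }
}
```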
Given an entity with auto-filled id fields (annotated with
@GeneratedValue, with null as initial value), when inserting it
using Hibernate, the id fields will be filled with non-nulls even
if the transaction fails.
If the same entity instance is used again in a retry, Hibernate mistakes
it as a detached entity and raises an error.
The workaround is to make a new copy of the entity in each transaction.
This PR applies this pattern to affected entity types.
We considered applying this pattern to JpaTransactionManagerImpl's insert
method so that individual call sites do not have to change. However, we
decided against it because:
- It is unnecessary for entity types that do not have auto-filled id
- The JpaTransactionManager cannot tell if copying is cheap or
  expensive. It is better to expose this to the user.
- The JpaTransactionManager needs to know how to clone entities. A new
interface may need to be introduced just for a handful of use cases.
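A sketch of the resulting call-site pattern, assuming an entity with an auto-generated id and some way to construct a fresh copy per attempt (all names below are illustrative):
```java
import java.util.function.Consumer;
import java.util.function.Supplier;

final class RetryWithFreshEntitySketch {

  /**
   * Retries a transactional insert, building a brand-new entity instance for each attempt.
   * Hibernate fills in @GeneratedValue ids even when the transaction fails, so reusing the
   * same instance on retry makes Hibernate treat it as detached and raise an error;
   * constructing a fresh copy per attempt sidesteps that.
   */
  static <T> void insertWithRetry(
      Supplier<T> freshEntitySupplier, Consumer<T> transactionalInsert, int maxAttempts) {
    RuntimeException last = null;
    for (int attempt = 0; attempt < maxAttempts; attempt++) {
      T entity = freshEntitySupplier.get(); // new copy each time, with a still-null id
      try {
        transactionalInsert.accept(entity); // e.g. tm().transact(() -> tm().insert(entity))
        return;
      } catch (RuntimeException e) {
        last = e; // the failed attempt may have populated the id on this (discarded) copy
      }
    }
    throw last != null ? last : new IllegalArgumentException("maxAttempts must be positive");
  }
}
```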
It will now only throw errors on domain updates if a new contact/registrant has
been specified where none was previously present. This means that domain updates
on unrelated fields (e.g. nameserver changes) will succeed even if there is
existing contact data that the update is not removing.
This is a follow-up to #2781.
BUG=http://b/434958659
Some of these have been around since the Datastore days and are no
longer relevant (dealing with things like Datastore foreign keys). Let's
simplify things.
This works fairly similarly to the registry lock request and
verification mechanism. The request action generates a UUID which is
emailed (in link form) to the user in question. The frontend will send a
request to the verify action with the UUID and hopefully the action
should be finalized.
EPP password reset requests can be sent by anyone with edit-registrar
permissions and must be approved by an admin POC email.
Registry lock password resets can only be sent by primary contacts, and
are verified/performed by the user in question.
This prohibits all contact data on create and update EPP flows for both domain
and contact flows. It also refactors how default values on FeatureFlags work, as
it's safer to specify a single default on the flag itself rather than have to
specify it independently at a number of callsites (and potentially end up having
an inconsistent value). Domain updates on existing domains that still have
contact data will fail unless all contact data is removed, as a forcing function
to require registrars to rectify the situation prior to being able to do any
other kind of domain changes.
Contact-related flows that are still allowed after this point: Updating a domain
to remove all contacts from it, and deleting a contact object.
This corresponds to the Feb 2024 response profile section 1.2 and
implementation guide 1.3 respectively, now that we comply (or are at
least closer to complying) with the Feb 2024 versions.
This should probably depend on https://github.com/google/nomulus/pull/2771
because that includes a small change included in the Feb 2024 version
This also updates the documentation to reference the proper areas of the
specifications.
From the response profile:
2.4.6. Registrar URL - The entity with the registrar role in the RDAP response
MUST contain a links member [RFC9083]. The links object MUST contain
the elements: value, identical to the RDAP Base URL for the
Registrar as provided in the IANA “Registrar IDs” registry (i.e.,
https://www.iana.org/assignments/registrar-ids); rel:about, and href
containing the Registrar URL. Note: in cases where the Registry Operator
acts as sponsoring Registrar (e.g., IANA Registrar ID 9999), the href shall
contain a URL from the Registry.
This is necessary so that the total number of requests/responses adds up
correctly even though some fraction of them are only being recorded. It uses
stochastic rounding so that the totals add up correctly even when the reciprocal
of the ratio isn't an integer.
This is a follow-up to PR #2772.
This ratio defaults to 1.0 (i.e. all metrics will be recorded), but we will set
it much lower in sandbox and production, probably something closer to 0.01. This
will reduce recorded metrics volume and thus StackDriver cost, while still
retaining enough data for overall performance monitoring.
This is handled stochastically, so as to not require any coordination between
Java threads or GKE pods/clusters, as alternative approaches would (i.e. using a
counter and recording every Nth, or throttling to a max metrics qps).
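Roughly, the combined idea behind the ratio and the stochastic rounding (the counter interface and ratio plumbing here are placeholders, not the real Nomulus metrics classes):
```java
import java.util.concurrent.ThreadLocalRandom;

final class SampledMetricRecorder {

  private final double recordingRatio; // e.g. 1.0 by default, ~0.01 in sandbox/production

  SampledMetricRecorder(double recordingRatio) {
    this.recordingRatio = recordingRatio;
  }

  /**
   * Records a request count such that, in expectation, totals still add up even though only
   * a fraction of requests are recorded. Each recorded sample is scaled by 1/ratio, and the
   * non-integer part is resolved by stochastic rounding.
   */
  void maybeRecordRequest(CounterLike counter) {
    if (ThreadLocalRandom.current().nextDouble() >= recordingRatio) {
      return; // skip recording this request entirely
    }
    double scaled = 1.0 / recordingRatio; // e.g. ratio 0.3 -> each sample counts as ~3.33
    long whole = (long) scaled;
    double fraction = scaled - whole;
    // Stochastic rounding: round up with probability equal to the fractional part, so the
    // expected recorded total equals the true total even when 1/ratio is not an integer.
    long increment = whole + (ThreadLocalRandom.current().nextDouble() < fraction ? 1 : 0);
    counter.incrementBy(increment);
  }

  /** Minimal stand-in for whatever counter metric is actually used. */
  interface CounterLike {
    void incrementBy(long delta);
  }
}
```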
In RDAP, domain queries are the most common by a factor of like 40,000
so we should optimize these as much as possible. We already have an EPP
resource / foreign key cache which does improve performance somewhat but
looking at some sample logs, it only cuts the RDAP request times by like
40% (looking at requests for the same domain a few seconds apart).
History entries don't change often, so we should cache them to make
subsequent queries faster as well. In addition, we're only caching two
fields per repo ID (modification time, registrar ID) so we can cache
more entries than we can for the EPP resource cache (which stores large
objects).
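A rough sketch of such a cache (the key/value shapes, size, and TTL are assumptions for illustration):
```java
import com.github.benmanes.caffeine.cache.Caffeine;
import com.github.benmanes.caffeine.cache.LoadingCache;
import java.time.Duration;
import java.util.function.Function;
import org.joda.time.DateTime;

final class LatestHistoryCacheSketch {

  /** Only the two fields RDAP needs, so entries stay small and many of them can be cached. */
  record LatestHistoryInfo(DateTime modificationTime, String registrarId) {}

  /** Builds a repo-ID-keyed cache; the loader falls through to the database on a miss. */
  static LoadingCache<String, LatestHistoryInfo> create(
      Function<String, LatestHistoryInfo> loadLatestHistoryFromDb) {
    return Caffeine.newBuilder()
        .maximumSize(100_000) // illustrative; can be far larger than the EPP resource cache
        .expireAfterWrite(Duration.ofMinutes(5)) // illustrative TTL
        .build(loadLatestHistoryFromDb::apply);
  }
}
```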
Pubapi actions should always use cache, regardless of the config
settings on caching.
In EppResource.java, the original `loadCached(Iterable<VKey>)`
method is renamed to `loadByCacheIfEnabled`. The original
`loadCached(VKey)` method is renamed to `loadByCache` and always
uses the cache.
In EppResourceUtils.java, the original `loadByForeignKeyCached`
method is renamed to `loadByForeignKeyByCacheIfEnabled`. A new
`loadByForeignKeyByCache` method, which always uses the cache, is added.
In ForeignKeyUtils.java, the original `loadCached` method is
renamed to `loadByCacheIfEnabled`, and a new `loadCached` method
is added which always uses the cache.
Also added a `getContactsFromReplica` method in Registrar,
for use by RDAP actions.
A future PR will add the actions that save and use this object. That
future PR will also require loading RegistrarPoc objects given the
registrar ID, hence the change in that class.
This implements two types of changes:
1. changing the link type for things like the terms of service
2. adding the request URL to each and every link with the "value" field.
This is a bit tricky to implement because the links are generated in
various places, but we can implement it by adding it to the results
after generation.
See b/418782147 for more information
Add a new DnsWritersModule for use by the component classes.
To override the set of writers installed, we can easily overwrite this
file with a private version.
This is necessary because we'll use primary-contact emails as a way of
resetting passwords.
In the UI, don't allow editing of email address for primary contacts,
and don't allow addition/removal of the primary contact field
post-creation.
In the backend, make sure that all emails previously added still exist.
We're changing the way that allocation tokens work in suboptimal (i.e. incorrect) situations in the domain check, creation, and renewal process. Currently, if a token is not applicable, in any way, to any of the operations (including when a check has multiple operations requested) we return some variation of "Allocation token not valid" for all of those options. We wish to allow for a more lenient process, where if a token is "not applicable" instead of "invalid", we just pass through that part of the request as if the token were not there.
Types of errors that will remain catastrophic, where we'll basically return a token error immediately in all cases:
- nonexistent or null token
- token is assigned to a particular domain and the request isn't for that domain
- token is not valid for this registrar
- token is a single-use token that has already been redeemed
- token has a promotional schedule and it's no longer valid
Types of errors that will now be a silent pass-through, as if the user did not issue a token:
- token is not allowed for this TLD
- token has a discount, is not valid for premium names, and the domain name is premium
- token does not allow the provided EPP action
Currently, the last three types of errors cause that generic "token invalid" message but in the future, we'll pass the requests through as if the user did not pass in a token. This does allow for a default token to apply to these requests if available, meaning that it's possible that a single DomainCheckFlow with multiple check requests could use the provided token for some check(s), and a default token for others.
The flip side of this is that if the user passes in a catastrophically invalid token (the first five error messages above), we will return that result to any/all checks that they request, even if there are other issues with that request (e.g. the domain is reserved or already registered).
See b/315504612 for more details and background
We suspected this could be a cause of optimistic locking failures
(because long transactions would lead to optimistic locks not being
released) but this didn't end up being the case. Let's remove this to
reduce log spam.
1. This doesn't remove the SQL tables yet (this is necessary to pass
tests and also good practice just in case we need or want to look at
history for a little bit)
2. This also removes the Registrar, RegistrarPoc, and User base classes
that were only necessary because we were saving copies of those
objects in the old history classes.
We no longer need to union GKE+GAE logs since we've moved all production
traffic to GKE only.
For testing, I copied the affected *_test.sql files to Bigquery, removed
all the "-alpha" bits, and changed the dates to 20250301 and 20250331
and ran them to make sure they returned the expected data.
Now that we have effective global sessions thanks to #2734, there is no
longer a need to keep the number of pods on the EPP service static.
We are also not vulnerable to random pod restarts. K8s never guarantees
perpetual pod lifetime anyway, and not having to be at its mercy is
certainly a relief.
It turns out period can be used in the URI, such as in
"urn:ietf:params:xml:ns:fee-0.12". I don't think pipe is used, at least
not according to EPP URI namespace naming convention.
Ideally we'd use serialization, but using the default serialization runs
the risk of it being platform/JDK dependent, so a new deployment might
not be able to deserialize existing cookies. A custom serializer that
guarantees stability would have been needed.
This was added back in early 2018 to enable promotions, but
since then (and for many years) we've added the ability to run
promotions on the tokens themselves, rather than relying on custom Java
classes.
This will make the changes for b/315504612 much easier, as that will
split up token validation into "is this token valid in general?" and "is
this token valid for this domain/action?"
This can potentially help even more with serializable transaction
failures (optimistic locking exceptions, which are expected to occur
somewhat frequently).
With six attempts, we will sleep at most five times, for
100+200+400+800+1600 ms each, for a total of at most 3.1 seconds (much
less than the EPP maximum, which I believe (?) to be 30 seconds).
In addition, we add a 20% skew in an attempt to spread out
possibly-conflicting transaction retries.
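A sketch of that retry schedule (the real Nomulus retrier is structured differently; the 20% skew is modeled here as symmetric jitter on each sleep):
```java
import java.time.Duration;
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.Supplier;

final class TransactionRetrierSketch {

  private static final int MAX_ATTEMPTS = 6;
  private static final Duration INITIAL_SLEEP = Duration.ofMillis(100);

  /** Retries up to six times, sleeping 100/200/400/800/1600 ms between attempts, +/-20%. */
  static <T> T retry(Supplier<T> transaction) throws InterruptedException {
    RuntimeException last = null;
    for (int attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
      try {
        return transaction.get();
      } catch (RuntimeException e) { // in practice only retriable serialization failures
        last = e;
        if (attempt == MAX_ATTEMPTS) {
          break;
        }
        long baseMillis = INITIAL_SLEEP.toMillis() << (attempt - 1); // 100, 200, 400, 800, 1600
        double skew = 1.0 + (ThreadLocalRandom.current().nextDouble() - 0.5) * 0.4; // +/-20%
        Thread.sleep((long) (baseMillis * skew));
      }
    }
    throw last;
  }
}
```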
This changes the code to only save console histories of this type. We
keep the old Java code (and, necessarily, the corresponding SQL code)
for now because there's no harm in doing so and we want to avoid hastily
deleting too much.
The SQL statement was incorrectly flooring to zero one layer too deep, which was
negating all negative transaction report rows (which occur most frequently when
a domain in the autorenew grace period is deleted). I've changed it so that it
now only floors to zero at the report level, which still solves the issue
reported in http://b/290228682 but whose original fix caused the issue
http://b/344645788
This bug was introduced in https://github.com/google/nomulus/pull/2074
I tested this by running the new query against the DB for 2024 Q4 using the
registrar that was having issues and confirmed that the total renewal numbers
for .app now match with the sum total of what we invoiced for the last three
months of 2024.
This doesn't check for correctness (we have other scripts that do that)
but just that the service is available at all (the other scripts do not
do that).
This should, and will, be configured with a scheduled trigger in GCB (for us, in
the domain-registry-dev project) and configuration to send some sort of
pub/sub notification on failure (for us, this is already set up on
domain-registry-dev and it sends messages to the "Domain Registry
Notifications" chat channel).
We've seen this issue happen more often than not recently, where the GAE
canary deployment gets stuck for about 10 min and then fails. The reason
is not clear, but deleting the canary version prior to a deployment always
fixes the issue.
We use the $environment property to set the console config. If it is not
given, 'alpha' is used, which has the same effect as 'production'.
TESTED=ran :jetty:copyConsole with
-Penvironment=(sandbox|production|alpha) and checked the resulting js
file.
For GKE all tasks should be on backend, BSA was on its own service
because of egress IP constraint.
Also made it possible to specify a timeout for the Cloud Scheduler job,
with the default (3m) suitable for most tasks.
There are two session cookies, JSESSIONID, which is set by Jetty, and
GCLB, which is set by the Gateway.
In one session, every request other than the first one (the <hello>)
should have the same GCLB value, and every request after a successful
<login> should have the same JSESSIONID.
With these two pieces of metadata, we should be able to trace all requests that
*should* belong to the same session and debug issues with session
mismatch (if any).
There can be delays in releasing predicate locks when we have
transactions that are long-lived -- even delays in releasing predicate
locks acquired by shorter-lived transactions. Logging the transaction
duration will allow us to get a sense as to transaction durations during
busy times.
This can happen on public endpoints (in pubapi) where the service is
behind IAP but all users (including not-logged-in ones) are allowed. IAP
will add an OIDC token with no email field in the request header.
This removes logic from an inner helper method so that it becomes more clear
from callsites within each test exactly which behavior is expected from those
test conditions.
We now pin to PostgreSQL v17 when running tests, which means that the minor
version might increase without our intervention. This causes (at least)
the comment in the golden schema to change, failing the test as a
result.
This PR adds the ability to strip lines that we deem to be comments from the
comparison, so we don't have to do trivial updates to the golden schema
whenever there's a minor version upgrade.
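Conceptually, the comparison just drops comment lines before diffing; a simplified stand-in for the test helper might look like this (pg_dump emits `--` line comments):
```java
import com.google.common.collect.ImmutableList;
import java.util.List;

final class SchemaComparisonSketch {

  /** Strips lines we deem to be comments before comparing the golden and actual schemas. */
  static ImmutableList<String> stripComments(List<String> schemaLines) {
    return schemaLines.stream()
        .filter(line -> !line.trim().startsWith("--"))
        .collect(ImmutableList.toImmutableList());
  }

  static boolean schemasMatch(List<String> goldenSchema, List<String> actualSchema) {
    return stripComments(goldenSchema).equals(stripComments(actualSchema));
  }
}
```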
This allows us to easily tell which tag was deployed.
Also set the gateway to use named address so they are stable, and so
that we can attach an IPv6 record to it. Auto-provisioned addresses are
IPv4 only.
Failed pg_dump may not leave a file, failing the subsequent diffing and
causing the verifier to return success.
The verifier should abort in this case.
GKE logs are routed to a different dataset and the table is different.
The structs to look for are also different (jsonPayload vs textPayload
or protoPayload).
TESTED=Ran the resulting query in crash.
I obtained access to an IBM s390x VM so I thought I'd see how multi-arch
Nomulus is.
Our main application is in Java so it is already multi-arch, but several
tests use docker images that are by default x64. Luckily postgres has an
s390x port, but selenium does not. So I had to disable Screenshot tests
when the arch is not amd64.
This isn't a situation we'll encounter often, but if the client has an
allocation token that's valid for premium domains that gives a 0 cost,
we shouldn't require them to include the fee extension when creating the
domain. We already don't require it for standard domains.
We use this now for the DomainDeleteFlow in an attempt to figure out
what statements it's running (cross-referencing that with PSQL's own
statement logging to find slow statements).
This is kinda nonsensical because this use case is trying to apply a
single-use token multiple times in the same domain:check request --
like, trying to use a single-use token for create, renew, and
transfer while having a $0 create price and a premium renewal price.
This change doesn't affect any actual business / costs, since SPECIFIED
token renewal prices were already set on the BillingRecurrence.
Instead of using a separate RenewalPriceInfo object, just map the
behavior (if it exists) onto the BillingRecurrence with a special
carve-out, as always, for anchor tenants (note: this shouldn't matter
much since anchor tenants *should* use NONPREMIUM renewal tokens anyway,
but just in case, double-check).
This also fixes DomainPricingLogic to treat a multiyear create as a
one-year-create + n-minus-1-year-renewal for cases where either the
creation or the renewal (or both) are nonpremium.
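As a toy illustration of that split (real pricing goes through DomainPricingLogic and Money types; this just shows the arithmetic):
```java
import java.math.BigDecimal;

final class MultiYearCreatePriceSketch {

  /**
   * Treats an n-year create as one year at the create price plus (n - 1) years at the
   * renewal price, so that a token affecting only one side (create vs. renewal) still
   * prices the overall registration correctly.
   */
  static BigDecimal totalCreatePrice(
      BigDecimal firstYearCreatePrice, BigDecimal perYearRenewalPrice, int years) {
    if (years < 1) {
      throw new IllegalArgumentException("years must be >= 1");
    }
    return firstYearCreatePrice.add(perYearRenewalPrice.multiply(BigDecimal.valueOf(years - 1)));
  }
}
```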
We are required to respond to HTTP(S) requests on port 80/443 on the
same domain where we serve port 43 WHOIS requests. The proxy already
does this by redirecting to the web WHOIS lookup page on the marketing
website.
This PR makes it so that requests to port 80/443 can be routed to the
proxy for redirect.
TESTED=tested on crash and the redirect works.
The SecretManager-based keyring was not included in keyring bindings, resulting
in a runtime failure.
We should simplify keyring bindings. There is no use case for multiple
implementations. See b/388835696.
Known nested transact calls found in sandbox have been refactored away.
Enable logging in production to catch any missing cases. Logging is
throttled at 1 message per minute per VM.
k8s does not have a way to expose a global load balancer with TCP
endpoints, and setting up node port-based routing is a chore, even with
Terraform (which is what we did with the standalone proxy).
We will use Cloud DNS's geolocation routing policy to ensure that
clients connect to the endpoint closest to them.
- We never delete rows from DomainHistory (and even if we do in the
future, they'll be old / the references won't matter)
- This is likely creating lock contention when lots of requests come
through at once for domains with many DomainHistory entries
There are some breaking method changes in the 10.x.y versions and we're encountering exceptions when trying to run the flywayMigrate task thanks to those.
For now it only includes two options (domain deletion and domain
suspension). In the future, as necessary, we can add other actions but
this seems like a relatively simple starting point (actions like bulk
updates are much more conceptually complex).
This might help alleviate DB transaction contention on the PollMessage table. A
lower transaction isolation level is safe because acking a poll message is
idempotent: there are only two things it does, either delete a poll message or
take a recurring one from the past and set it to be a year in the future from
the date in the past. Both of these operations will always yield the same final
result even if executed multiple times simultaneously for some reason.
It doesn't need a higher transaction isolation level as it's only loading a given poll
message once, and we want to avoid putting any kind of locks on the PollMessage table
as it seems to be having contention issues. Note that the poll message request flow
is by far the most frequent code that touches the PollMessage table, as there are many
many requests every minute from dozens of registrars, but much fewer poll messages
than that to actually ACK.
We only include the deletion time if the domain is in the 5-day
PENDING_DELETE period after the 30 day REDEMPTION period. For all other
domains, we just have an empty string as that field.
This is behind a feature flag so that we can control when it is enabled
It was failing when any kind of transfer data was present, even completed
transfer data. Note that completed transfer data persists on a domain
indefinitely until/unless a new transfer is requested.
BUG= http://b/377328244
It seems like `/usr/bin/python` is no longer symlinked to the `python3`
binary in the `gcr.io/cloud-builders/git` image.
I've sent out a separate fix to upstream to change the shebang.
https://gerrit-review.git.corp.google.com/c/gcompute-tools/+/439501
But in the meantime, we need this temporary fix for the release to
build.
These flags are suggested by GitHub support to disable reusing caches
during Gradle build. They think that could fix the intermittent error
message:
```
Encountered a fatal error while running "/opt/hostedtoolcache/CodeQL/2.19.0/x64/codeql/codeql database finalize --finalize-dataset --threads=4 --ram=14576 --verbosity=progress++ /home/runner/work/_temp/codeql_databases/java". Exit code was 32 and last log line was: CodeQL detected code written in Java/Kotlin but could not process any of it. For more information, review our troubleshooting guide at https://gh.io/troubleshooting-code-scanning/no-source-code-seen-during-build . See the logs for more details.
```
This fails for me every day for some reason (starting about a month
ago). The same commit went through the workflow fine when the action was
triggered by a push.
I think there's no reason for us to have a cron run as the changes to the
master branch can only come from commit pushes.
Newer versions of GPG (v2.4.5 in my case) use different wording
than what's available in our build image (and Ubuntu, I suspect). For
example, it says "rsa2048" instead of "2048-bit RSA".
Make the tests work in both cases. Admittedly we cannot check for the
string RSA/rsa easily, but I don't think it matters much for tests.
It looks like /usr/bin/python *may* no longer exist in the latest cloud
builder git image. I ran the latest image and logged into it to verify
that /usr/bin/python3 does exist on 9/25, and again on 9/26 where it
re-appeared.
I think it is generally a good idea to not rely on it being there going
forward.
* Include discount price in domain pricing
* Partial progress in logic
* Tests and logic passing
* Change pricing for multi year create
* Tests for discount pricing logic
* Token currency check
* Add some comments
* Java formatting
* Discount price to Optional
* Change discount price to be optional nullable
* Re-add deleted tests
Depending on if a "--gke" parameter (must be the last one) is passed,
the deployer constructs the corresponding URIs for GAE or GKE
accordingly.
TESTED=Used the deployer to deploy tasks to alpha and verified that they
run on GKE.
* Drop the `transact` call in Id services
All usages already routed through `tm().allocateId()`, which is
guaranteed to be in a transaction.
* Addressing reviews
Several jars in our dependencies are now multi-release, including
dnsjava and snakeyaml, and a few more. Such jars include
jvm-version-specific classes that will only be loaded by the vm that can
handle them. All it takes is a new manifest attribute.
This change allows us to upgrade to dnsjava 3.6+: the base (Java 8) version of
this jar breaks Java 21. The correct manifest allows Java 21 to find the
classes it needs.
This single metric currently accounts for 22.2% of our total metrics bill,
almost double the size of our EPP requests metric, while also simultaneously
being much less useful. This change reduces the cardinality by removing two
parameters we don't care that much about, which should significantly reduce the
size and thus the cost. If after this change the metric is still too large, I'll
also then remove the matchCount parameter from this metric. We could possibly
even consider deleting the metric in its entirety, as we hardly ever use it.
This PR also removes unused code for premium list metrics that have never
actually been written out (and that we won't bother with at this point).
This will help us if/when we need to run the report generation multiple
times, or for past dates and we don't want to send extra emails or
upload any extra reports to ICANN.
Instead of having to parse the protoPayload.line from the request logs,
we just want to inspect the textPayload from the app logs (stored in a
separate table). This applies to the EPP metrics from the activity
reporting and the attempted-adds column for the transaction reporting.
This doesn't really add any tests, and we'll require many more additions
if we actually want to have full unit testing, but this at least makes
the tests pass when running `npm test`.
Previous PRs and token changes (see b/332928676) have made it so that
SPECIFIED renewalPriceBehavior tokens must have a renewal price. As
such, we can now use that renewalPrice when creating domains with
SPECIFIED tokens.
* Add QPS and incomplete connections metrics to load test client
* Add a failed request count
* Add todos
* Reuse contact
* Add bugs to todos
* small fix
* Clarify QPS
It's valid for the auth data to be null (although it only happens 10 times
across our entire registry), so the domain update flow should not fail out with
a NullPointerException when the existing state of the data is null and the
update isn't adding that data either.
BUG=http://b/359264787
This is the first step in the field removal (second will be removing the
column from SQL once this is deployed).
There's no point in using a UserDao versus just doing the standard
loading-from-DB that we do everywhere else. No need to special-case it.
When creating/deleting users, we need to add/remove the emails in
question to/from the console email group (if it exists). This used to be
done synchronously by calling the Groups API directly from the nomulus
tool. However #2488 made it so that in all cases where group membership
is modified, a Cloud Tasks task is created to execute the change on
the server side asynchronously (because there are multiple places where
this change needs to be done, and it is easier to make it all happen on the
server side).
Alas, as it turns out, Cloud Tasks tasks need to be created with a
service account's credential (which is trivially done on the server side
because the ADC is a service account). Nomulus command runs with a user
credential, and we need to grant the relevant user permission to
masquerade as a service account, in order to enqueue tasks from the
nomulus tool. It is therefore easier to just revert to the old behavior.
Originally, we thought that User entities were going to have mutable
email addresses, and thus would require a non-changing primary key. This
proved to not be the case. It'll simplify the User loading/saving code
if we just do everything by email address.
Obviously this doesn't change much functionality, but it prepares us for
removing the id field down the line once the changes propagate.
For instance, on sandbox this will allow us to remove our global roles
but keep roles to the CharlestonRoad admin registrar. Then, when we view
the console, it will be as if we were a registrar user.
This is consistent with how other registries are handling RDAP and is also consistent
with overall behavior in WHOIS and domain info flows as implemented in my previous
PRs #2477 and #2490.
* Check FeatureFlag in domain flows before checking contacts
Check if phase 1 has begun of the transition to the minimum registry dataset, and if it has, do not require the presence of contacts in domain flows.
* Add tests
* Small test fixes
* rename flag
* Fix merge conflicts
* Change todo
* Add isActive methods
* Add javadocs
* small fix
As requested, for registrars participating in these tiered pricing
promos that wish to receive this type of response, we make the following
changes:
1. The pre-promotional (i.e. base tier) price is returned as the
standard domain-create fee when running a domain check.
2. The promotional (i.e. correct) price is returned as a special custom
command class with a name of "STANDARD PROMO" when running a domain
check
3. Domain creates will return the non-promotional (i.e. incorrect) price
rather than the actual promotional price.
This PR does only number 3. See PR #2489 for the others.
As requested, for registrars participating in these tiered pricing promos
that wish to receive this type of response, we make the following
changes:
1. The non-promotional (i.e. incorrect) price is returned as the
standard domain-create fee when running a domain check.
2. The promotional (i.e. correct) price is returned as a special custom
command class with a name of "STANDARD PROMO" when running a domain
check.
3. Domain creates will return the non-promotional (i.e. incorrect) price
rather than the actual promotional price. This is not implemented in
this PR.
Our logs are getting gummed up with an indefinitely failing and retrying task to
re-save a prober domain that doesn't exist (likely because it was hard-deleted
by the delete prober data action). This makes the re-save action resilient to
that failure case: it stops assuming that every enqueued re-save actually
corresponds to an entity that exists, and instead fails permanently if
the entity doesn't exist. Failing permanently is the right thing to do, as if
the entity doesn't exist now there's no reason to think it will in the future,
plus all re-saves are optimistic rather than guaranteed anyway.
This should fix http://b/350530720
* Add registryTool commands for FeatureFlags
* Fix merge conflicts
* Add required parameters and inject mapper
* Use optionals in cache to negative-cache missing objects
* Fix spelling
* Change back to bulk load in cache
* Add FeatureName enum
* Change variable name
* Use FeatureName in main parameter
This doesn't yet allow them to be absent in EPP flows, but it should make the
code not break if they happen to be null in the database. This is a follow-up to
PR #2477, which ends up being a bit easier because whereas the registrant is
used in more parts of the codebase, the other contact types (admin, technical,
billing) are really only used in RDE, WHOIS, and RDAP, and because they were
already being used as a collection anyway, the handling for when that collection
contains fewer elements or is empty was already mostly correct.
When using this token (which must be tied to a particular domain), the
first year price (and only the first year price, i.e. the creation
price) will be the standard price for this TLD. Future years (i.e.
renewals) will continue at the normal premium price.
This isn't the worst thing in the world but it does result in a bad
request to the server otherwise, and log/error spam. So, only load the
domains list if we have a registrar selected.
This is the first step in migrating to the minimum registration data set. Note
that our database model already permits null domain registrants, so this just
makes the code accept it as well. Note that I haven't changed any requirements
in EPP flows yet; a later step will be to check the migration schedule and then
not require the registrant to be present if in a suitable state.
This does potentially affect the output of WHOIS/RDAP, but that's a NOOP so long
as EPP commands and other tools continue to enforce the requirement of a
registrant.
This is an optional field (will be required when the renewal price
behavior is SPECIFIED). This will allow us to set arbitrary renewal
prices for domains as part of one-off negotiations.
https://b.corp.google.com/issues/332928676
This is the last remaining GAE API that we depend on. By removing it, we are able to remove all common GAE dependencies as well.
To merge this PR, we need to create console User objects that have the same email address as the RegistrarPoc objects' login_email_address and copy over the existing registry lock hashes and salts.
We are also able to simplify the code base by removing some redundant logic like AuthMethod (API is now the only supported one) and UserAuthInfo (console user is now the only supported one)
There are several behavioral changes that are worth noting:
The XsrfTokenManager now uses the console user's email address to mint and verify the token. Previously, only email addresses returned by the GAE Users service were used, and a blank email address would be used if the user was logged in as a console user. I believe this was an oversight that is now corrected.
The legacy console will return 401 when no user is logged in, instead of redirecting to the Users service login flow.
The logout URL in the legacy console is changed to use the IAP logout flow. It will clear the cookie and redirect users to the IAP login page (tested on QA).
The screenshot changes are mostly due to the console users lacking a display name and therefore showing the email address instead. Some changes are due to using the console user's email address as the registry lock email address, which is being fixed in Add DB column for separate rlock email address #2413 and its follow-up PRs.
GKE now displays log messages correctly. There is no need for an extra
leading newline, which now results in a useless blank line for each log
entry in Log Explorer.
* Create a load testing EPP client.
This code is mostly based off of what was used for a past EPP load testing client that can be found in Google3 at https://source.corp.google.com/piper///depot/google3/experimental/users/jianglai/proxy/java/google/registry/proxy/client/
I modified the old client to be open-source friendly and use Gradle.
For now, this only performs a login and logout command, I will further expand on this in later PRs to add other EPP commands so that we can truly load test the system.
* Small changes
* Remove unnecessary build dep
* Add gradle build tasks
* Small fixes
* Add an instances setUp and cleanUp script
* More modifications to instance setup scripts
* change to ubuntu instance
* Add comment to make ssh work
For reasons unclear to me, if the stack trace is appended directly to
the message, the log entry will be lumped together with following logs
on GKE.
Also updated the GKE service account for Nomulus in the manifest so we
can use workload identity just for Nomulus, not other pods on the same
cluster.
There are a bunch of cases where we want common exception handling and
it's annoying to have to deal with the common "set failed response and
make sure to return" a bunch of times.
This allows us to break up request methods more easily, since we can now
often throw exceptions that will break all the way back up to
ConsoleApiAction. Previously, any error handling had to exist in the
primary handler method so it could return.
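A minimal sketch of that shared handling idea; the class and method names here are illustrative, not the actual ConsoleApiAction API:

```java
import java.io.IOException;
import javax.servlet.http.HttpServletResponse;

/** Sketch: subclasses throw freely, and the base action translates exceptions into responses. */
abstract class ConsoleApiActionSketch {

  /** Subclasses do the real work here and may throw instead of returning early. */
  protected abstract void runInternal() throws Exception;

  public final void run(HttpServletResponse response) throws IOException {
    try {
      runInternal();
      response.setStatus(HttpServletResponse.SC_OK);
    } catch (IllegalArgumentException e) {
      // Expected user errors become 400s with a readable message.
      response.sendError(HttpServletResponse.SC_BAD_REQUEST, e.getMessage());
    } catch (Exception e) {
      // Anything else becomes a generic 500 so the caller never sees a stack trace.
      response.sendError(HttpServletResponse.SC_INTERNAL_SERVER_ERROR, "Internal error");
    }
  }
}
```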
The user, on the front end, should not be required to provide whether or
not they're trying to verify a lock or an unlock. They should only need
the verification code. We can inspect the lock object itself (and the
domain in question) to see whether or not we're verifying a lock or an
unlock.
We've added the field in the database in a previous PR. This is only
used in the old console for now because the new console does not have
registry lock functionality yet
* Remove the createBillingCost field from Tld
* fix spacing
* Change field name of map
* Rename getter
* Fix formatting
* Fix todo
* unchange column name
* Add log traces to Nomulus service on GKE
Add request-scope log traces to Nomulus on GKE which, unlike
AppEngine and Cloud Run etc, does not generate traces for hosted
applications. This change only affects the GKE image. It does not affect
the AppEngine services.
Log traces are added to Nomulus-generated logs in request-processing
threads. Forked threads are not covered yet. The single relevant use
case (TimeLimiter) will be addressed in a followup PR.
The main change is in the logging configuration:
* Use gcp-cloud-logging's LoggingHandler
* Add gcp-cloud-logging's TraceLoggingEnhancer to the handler.
* Set a thread-local trace id through the TraceLoggingEnhancer in
ServletBase on request's entry and clear it on completion.
Also removed an unused class (`RequestLogId`).
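A rough sketch of the request-scoped trace handling, using google-cloud-logging's TraceLoggingEnhancer; the servlet wiring and the UUID-based trace id below are illustrative, not the exact ServletBase code:

```java
import com.google.cloud.logging.TraceLoggingEnhancer;
import java.util.UUID;

/** Sketch: tag the current request thread with a trace id and always clear it afterwards. */
final class RequestTraceSketch {
  static void runWithTrace(Runnable requestHandler) {
    // Illustrative id; a real implementation would use the Cloud Trace id for the request.
    TraceLoggingEnhancer.setCurrentTraceId(UUID.randomUUID().toString());
    try {
      requestHandler.run();
    } finally {
      TraceLoggingEnhancer.setCurrentTraceId(null); // clear on request completion
    }
  }
}
```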
* CR
* CR
This handles both GET and POST requests. For POST requests it doesn't
actually change anything about the domains because we will need to add a
verification action (this will be done in a future PR).
* Avoid contention over the RefreshDnsRequest table
This table can be small at times, when PSQL may use a table scan even for
queries by key. At the SERIALIZABLE isolation level, updates to
unrelated rows may conflict due to this `optimization`.
Lower the isolation level to repeatable read.
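As a generic illustration of the change (plain JDBC, not the Nomulus transaction manager, whose per-transaction override mechanism is described later in this log):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.function.Consumer;

/** Sketch: run work at REPEATABLE READ instead of the default SERIALIZABLE level. */
final class RepeatableReadSketch {
  static void runAtRepeatableRead(String jdbcUrl, Consumer<Connection> work) throws SQLException {
    try (Connection connection = DriverManager.getConnection(jdbcUrl)) {
      connection.setTransactionIsolation(Connection.TRANSACTION_REPEATABLE_READ);
      connection.setAutoCommit(false);
      work.accept(connection); // updates to unrelated rows no longer conflict via predicate locks
      connection.commit();
    }
  }
}
```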
* Code review
This includes using the new switch format (though IntelliJ does not yet
understand patterns including default so those aren't used), multiline strings,
replacing some unnecessary type declarations with <>, converting some classes to
records, replacing some Guava predicates with native Java code, and some other
miscellaneous Code Inspection fixes.
This adds an index on transfer_billing_cancellation_id to Domain and superordinate_domain to Host. When tested on crash with the action limited to only delete 10,000 domains, before these indexes were added the action took about 2 hours to delete 10,000 domains. Once these indexes were added, the action was able to delete the 10,000 domains in a little under 2 minutes.
This also creates base classes for the objects contained within the
history classes, e.g. RegistrarBase. This is the same way that objects
stored in the HistoryEntry subclasses have base classes, e.g.
DomainBase.
We cannot rely on the user checking their login email, so we'll want to
send the emails to the other address if configured. This is already the
case in RegistrarPoc.
Use REPEATABLE READ for lock acquire/release operations to avoid conflicts
between locks.
PostgreSQL uses table scans on small tables, causing false sharing at
the SERIALIZABLE isolation level.
See b/333537928 for details.
Console users need IAP to inject the necessary OIDC tokens into their
request headers and therefore need to be bound to appropriate roles. Note
that in environments managed by latchkey, the bindings will need to be
present in latchkey config files as well, otherwise the changes made by
the nomulus tool will be reverted.
TESTED=ran the nomulus command against alpha and verified that the
bindings are created/removed upon console user creation/deletion.
The existing behavior was to ignore bad header names, in a way that was
counter-intuitive as a user of the Google Sheet. If a header name was bad (which
could just be someone accidentally changing it without realizing it needs to
correspond exactly to the name of the field on the Java object), then all of the
data in that column was just silently left as-is and never updated. This led to
gradually worsening sync and offset shift errors over time.
Now, it will write out an error message into every single cell in the bad
column, so it's clear that the column name is wrong and does not correspond to any
actual data in the DB.
BUG=http://b/332336068
As a read-only action that tolerates staleness, locking is unnecessary.
This should help with the lock contention we are observing.
Also reduces the number of VM instances provisioned for BSA and increases
the idle timeout. This should reduce invocation delay. Longer delay may
cause AppEngine to return `Timeout` status to Cloud Scheduler even
though the cron job succeeds.
This also involved breaking out an improperly done assertThat() helper overload
method for JsonObjects into a proper Subject that doesn't further overload
assertThat().
* Check for missing BSA unblockable domains
All unblockable domains created before the last refresh run should be
reported as unblockable (registered).
All reserved domains that are not registered should be reported as
unblockable (reserved). Note that transient errors may be reported for
newly added reserved domains since we do not maintain update time for
when a reserved label is associated with a TLD. However, this scenario
is extremely rare in operations.
* Addressing review
* Add batching to DeleteProberDataAction
* Only get time once
* Add separate query for dry run
* Update queries to actually properly delete all the data
* Fix merge conflicts
* Add test for foreign key constraints
* Make transaction repeatable read
* Make queries to subtables native
* Add native query for GracePeriodHistory
* Kill job after 20 hours
* remove extra time check from read query
* Add index for domainRepoId to PollMessage and DomainHistoryHost
* Add flyway fix for Concurrent
* fix gradle.properties
* Modify lockfiles
* Update the release tool and add IF NOT EXISTS
* Test removing transactional lock from deploy script
* Add transactional lock flag to actual flyway commands in script
* Remove flag from info command
* Add configuration for integration test
* Verify unblockables are truly unblockable
Unblockable domains may become blockable due to deregistration or
removal from the reserved list. The BSA refresh job is responsible
for removing them from the database. This PR verifies that the refreshes
are correct.
Note that recent changes since last refresh are not reflected in the
result, and inconsistencies due to recent deregistrations are ignored.
Changes in reserved status or IDN validity are not timestamped,
therefore we cannot ignore recent inconsistencies. However, these
changes are rare.
* Addressing code review
* Addressing code review
Upgrade to using Jakarta EE 10 from Java EE 8 by mostly following the upgrade instructions. Only the servlet package is upgraded. Other Jakarta EE components (like the persistence package that Hibernate depends on) need to be upgraded separately.
TESTED=deployed and successfully communicated with the pubapi endpoint for web WHOIS.
Note that this currently requires packaging the App Engine runtime per instructions here due to GoogleCloudPlatform/appengine-java-standard#98. This PR will not be merged until the fix is deployed to production (https://rapid.corp.google.com/#/release/serverless_runtimes_run_java/java21_20240310_21_0).
We don't use the upload results feature (kokoro picks the results
artifacts directly and uploads them).
Keeping it around is a maintenance burden.
Also fixed a deprecation warning.
Labels that are not in any supported IDN are not added to the database.
Remove such labels from those loaded from the block list files before
comparing with DB.
There is one remaining instance in JpaTransactionManagerImpl that cannot
be removed because DetachingTypedQuery implements TypedQuery, which has
a method that expects java.util.Date.
Console tests fail for the files that are affected by the redesign. There's no point in fixing them here. I will re-enable the task after the console redesign PR is merged.
Note that Dagger currently doesn't work with the Jakarta namespace and
we have to cap the jakarta inject package version below 2.0 so that it
still provides classes in the old namespace.
Removed the deprecation mark as it is natural to expose methods related
to a transaction like getting the entity manager or checking if one is
in a transaction through the transaction manager interface.
Also only caches/resets the original TM when in unit tests (TBH I'm not so sure
that even this is necessary, as we don't seem to call the tool from tests
that often. Only ShellCommandTest calls the run() function
in RegistryCli, and we could just put these tests in fragileTest and make
them run sequentially and fork every time to get around issues with
interference).
The issue with caching is that it tries to first create the to-be-cached
TM, and when the environment given is prod/sandbox/... It will try to
retrieve SQL credentials from prod/sandbox/... secret manager. This
works fine locally as we all have access to prod/sandbox/..., but fails
in Cloud Build jobs such as sync-db-objects where it provides its own
credential that has direct SQL access, but not access to
prod/sandbox/... secret manager.
TESTED=ran `./gradlew devTool --args="-e localhost generate_sql_er_diagram -o ../db/src/main/resources/sql/er_diagram"`
* Add a GitHub Action workflow
This allows us to create Gradle dependency graphs for Dependabot analysis (like the ones we already get for JavaScript dependencies).
* Update Java version
* Add build scan
* codeql 3
* run with gradle
* exclude jIFC
* build scan
* Finalize
Chose the default transaction manager based on RegistryEnvironment. This
makes it possible to run nomulus on Jetty locally. Tested with the
following:
```bash
./gradlew :jetty:run -Penvironment=alpha
curl http://localhost:8080/beta.app
```
The docker image is also updated to take an argument that specifies the
environment. It runs locally as well but the container doesn't get
access to locally stored credentials, so it fails to initialize the
transaction manager.
* Add BSA validation job
Add the BsaValidateAction class with a first check (for inconsistency
between downloaded and persisted labels).
* Addressing comments
* Addressing reviews
The staging job runs at 9 AM on the 2nd day of each month, so we should set
the cursor to be after that time, otherwise we attempt to upload reports
on the 1st day of each month before they are ready, causing an error
email to be sent to us.
Remove email classes that depend on AppEngine API. They have been
replaced by the gmail-based client.
Remove `EmailMessage.from` method, which is no longer used.
There is a fixed sender address for the entire domain, which is
set by the Gmail client.
The configs remain to be cleaned up. There is a bug (b/279671974) that
tracks it.
This PR makes the runtime of most of our workload Java 21.
1. App Engine. Java 21 is in GA and it supports Java EE 8. I had to add
an environment variable so that we don't get an
AppEngineCredentials by default (we have been using
ComputeEngineCredentials for a couple of years). The upgrade to the Java
21 runtime changed a system property that controls how Jetty logging
works, which also controls whether AppEngineCredentials is returned. Tested by
deploying to alpha.
2. Proxy base image upgraded to Java 21 (distroless still doesn't
support Java 21 and it looks like Temurin is the way to go,
b/306728455). Tested by deploying to alpha.
3. Nomulus tool image upgraded to Temurin 21 as well. Tested locally.
4. Beam pipeline base image upgraded to Java 21. The JAVA21 flag is not
supported by gcloud yet, but specifying the image URL directly works
(and is supported). Tested by running in alpha.
5. Jetty base image upgraded to Java 21. Tested locally.
Make the necessary changes for the code base to compile with JDK 21.
Other changes:
1. Upgraded testcontainer version and the SQL image version (to be the
same as what we use in Cloud SQL). This led to some schema changes and
also changed the order of results in some test queries (for the
better I think, as the new order appears to be alphabetical).
2. Removed the dependency on Truth8, which is deprecated.
3. Enabled parallel Gradle task execution and greatly increased the
number of parallel tests in standardTest. Removed outcastTest.
This PR creates a unified RegistryServlet that will serve all
non-console traffic. It also creates a jetty subproject that allows one
to run Nomulus on top of a standard Jetty 12 runtime.
`./gradlew :jetty:stage` will create a jetty base folder at
`jetty/build/jetty-base` where one is able to spin up a local Nomulus server
by running the following command inside the folder:
```bash
java -jar ${JETTY_HOME}/start.jar
```
`JETTY_HOME` is a folder where the [Jetty runtime](https://repo1.maven.org/maven2/org/eclipse/jetty/jetty-home/12.0.6/jetty-home-12.0.6.zip) is located.
This PR also adds a Gradle task to create a Nomulus image based on the
official Jetty image:
```bash
./gradlew :jetty:buildNomulusImage
```
This reverts commit 64f5971275.
The catch block is too broad, and most of the time the errors caught are
because `command.run()` failed and have nothing to do with getting
the transaction manager. The `runCommand` method is already wrapped in a try
block that checks for `LoginRequiredException` and gives the appropriate
error message.
We need to re-assess the situation when the next time we encounter a
login issue that did not trigger `LoginRequiredException`. A blanket try
catch block is not the solution and only makes the situation more
confusing.
* Add a check so newly saved createCostTransitions get recognized and saved to the database
* Fix equals check
* Rename equals method
* Add comment explaining need for createBillingCostTransitionEqualCheck
* Remove use of shouldPublishField from ReservedList
* Remove from tests
* Update test comment
* Fix indentation
* fix test comment
* Fix test
* fix test
* Make shouldPublish column nullable
We introduced Scrypt as the default password hashing algorithm in
November 2023 and have been auto-converting saved hashes whenever a
successful EPP login or registry lock/unlock request is processed.
We will send comms to registrars to inform them of the upcoming removal of
SHA256 support and urge them to log in at least once before the change.
Otherwise, they will need to contact support to reset the password out of
band after the change.
This PR will NOT be submitted until comms are out and the effective date
is immediate.
Co-authored-by: Weimin Yu <weiminyu@google.com>
This makes the callsites look neater, as the work to execute itself is often a
many line lambda, whereas the transaction isolation level is not more than a
couple dozen characters.
The Distinct transform removes duplicates based on the serialized format
of the elements. By providing a deterministic coder, we can guarantee
that no duplicates exist.
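A small Beam sketch of the idea; the element values are placeholders, and StringUtf8Coder is deterministic, so Distinct's comparison of serialized bytes behaves as expected:

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.Distinct;
import org.apache.beam.sdk.values.PCollection;

/** Sketch: deduplicate elements with Distinct, backed by a deterministic coder. */
final class DistinctWithDeterministicCoder {
  public static void main(String[] args) {
    Pipeline pipeline = Pipeline.create(PipelineOptionsFactory.create());
    PCollection<String> names =
        pipeline
            .apply(Create.of("example.tld", "example.tld", "other.tld"))
            .setCoder(StringUtf8Coder.of()); // deterministic: equal values serialize identically
    PCollection<String> deduped = names.apply(Distinct.<String>create());
    pipeline.run().waitUntilFinish();
  }
}
```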
This is the start of a long and low priority migration, but for now I wanted to do a couple of them just to see what it looks like.
This also demonstrates the pattern for use of an @AutoBuilder to replace an @AutoValue.Builder. See https://github.com/google/auto/blob/main/value/userguide/records.md#builders for full details on that.
This removes a dependency on the App Engine SDK. It also looks
(from the logs at least) like shutdown hooks registered the old way stopped
working after the runtime was upgraded to Java 17.
Also removed some random leftover dependencies on the App Engine SDK
that are not needed any more.
* Add java changes for createBillingCostTransitions
* Add negative cost test
* Remove default value
* remove unused variable
* Add check that create cost and transitions map are the same
* inject clock, only use key set when checking for missing fields
* Add test for removing map
* Add ALLOW_BSA allocation type
Add a new type to allow creation of domains blocked by BSA.
Except for the BSA semantics, the new type behaves exactly
like SINGLE_USE.
* Addressing reviews
* Addressing review
* Change tld-update to db-object-updater
* rename sync_tlds.sh to sync_db_objects.sh
* Change to configured command name
* Change environment to sandbox explicitly for testing on alpha
* Add remaining object steps and change cloudbuild-tld-sync to cloudbuild-sync-db-objects
* Add build_environment flag
* Change order of command and directory
* Uncomment out reserved list part
Query size is borderline too-large for the replica.
At 50000, about 2 out of 30 took more than 30 seconds and were retried.
Lower to 40000 and we will keep monitoring the executions.
The fallback should only apply on creates, not on updates, otherwise it can
override an existing value for the email address when only the referral email
should be what's updated.
This fixes a bug introduced back in commit 0ead4f8d9d.
BUG= http://b/322026165
* Stop saving BSA empty refresh changes
We thought that, as a way to verify that the refresh job is running, browsing
the GCS bucket with empty files would be easier than querying the DB or going to the GCP
logging dashboard, but there are too many of them to be useful.
* Add some updates to UpdateReservedListCommand to facilitate internal config presubmits and syncing
Added a dry-run tag for presubmit tests
Added early exit behavior when there are no new changes to the list
Added a new --build_environment tag to be used to indicate command runs from build tools. This tag was also added to UpdatePremiumListCommand. Once this new tag is deployed, and break glass behavior is added, these commands will be modified to prevent runs on the command line in the production environment unless the --build_environment or --break_glass flag is used.
* Fix capitalization
* Added in commented out production environment check for buildEnv flag
The blocking issue is fixed in
https://github.com/google/nomulus/pull/2224.
Java 8 support is being deprecated on 2024-01-31 and no further deployment is
possible afterwards without exception:
https://cloud.google.com/appengine/docs/legacy/standard/java/deprecations
We have been using Java 17 on alpha/crash/qa for several months and have
not observed any other blocking issue other than possibly missing email
attachments, which is being mitigated by including a link to the
attachments saved in GCS.
* Fix BsaRefreshAction bugs
Added functional tests for BsaRefreshAction, which checks for changes in
domain registration and reservation, and applies them to the Unblockable
domain list.
Fixed a few bugs exposed by the tests.
Also refactored a few other tests.
This also moves it back to the replica transaction manager now that it shouldn't be timing
out its queries.
And this adds a test as well (more to come!).
* Define the --end_breakglass and --build_environment flags
It is necessary to define these flags in a deployment before merging go/r3pr/2273 in order to prevent breaking the existing TLD syncing and entity presubmit testing that has already been enabled
* make break glass 2 words
* Change break_glass flag to take a Boolean and use false value to end break glass mode
* small fixes
* Fix spacing
* Add missing G
* Add clarifying comment
* Refactor a few BSA unit tests
Added a few helpers for managing reserved list in tests and updated the
tests to use them.
Also fixed a bug: when querying for newly created domains, the query
should be restricted to BSA-enrolled TLDs.
* Remove create_tld and update_tld commands
These commands are no longer necessary now that configure_tld command is available. However, the configure_tld command should only be used for crash, QA, and alpha environments. TLDs in production and sandbox must be modified using modifications to their config files in Gerrit unless using the configure_tld command in breakglass mode. Check the "How to configure TLDs" procedure doc for more info.
* re-delete file
- Registered names are not affected.
- Reserved names are not affected.
- Names that are none of the above and match some BSA labels are
reported as blocked.
Also fixes the action taken in the case where zero unavailable domains are
found, and temporarily changes over to using the primary DB (because the replica
transaction was timing out at 30 seconds on large databases). I'll switch this
over to use batching and move it back to replica afterwards, but this should
unblock us temporarily.
This PR makes it possible to build the Nomulus code base using Java 17.
Building with Java 11 continues to be possible and the resulting bytecodes are
still at Java 8 level. Also upgraded Gradle to 8.5.
There are several necessary changes to make this happen:
1. Some Gradle plugins need to be upgraded to support Java 17, notably
errorprone. As a result, a lot more "errors" were caught and corrected.
2. All test code is now built and run at the Java 8 level. Previously it was left
undefined (which defaults to the version of the compiler), which had led to
situations where we inadvertently used post-Java 8 features in production that
were not caught by tests. The change also made the java8compatibility subproject
obsolete, which is therefore removed.
3. Removed the docs subproject. Its main use is to generate flows.md, but it
relies heavily on Java internal APIs that have changed significantly with each
version. Upgrading to Java 11 required extensive refactoring of the code there,
and Java 17 again removed many APIs that were used. I don't think it is worth
the maintenance effort just to have a tool to generate flows.md which no one
actually reads.
4. Capped a few GCP dependencies because the latest version depends on
grpc-java >= 1.59.0, which includes a runtime incompatibility
(https://github.com/grpc/grpc-java/releases/tag/v1.59.0).
* Check BSA block status in CheckApi
Checks for and reports BSA block status if the name is not registered or
reserved.
Also moves CheckApiActionTest to standardTest. Whatever problem forced
it into another suite has apparently disappeared.
Supports the full blocklist download cycle (download, diffing, diff-apply, and order-status reporting) and the refreshing of unblockable domains.
Submitted due to tight deadline. We will conduct post-submit review and refactoring.
Scrypt is computationally much heavier than SHA256 (by design), which
resulted in test run time doubling due to most tests initializing canned
data that uses hashing.
Since our tests are not verifying the correctness of a specific hashing
algorithm anyway, this PR makes it so that simple concatenation is used
in tests.
Also moved RegistryEnvironment to the util subproject so it can be called by
PasswordUtils, which makes sense as it is a utility class.
* Add BigInt conversion to TimedTransitionProperty<Money> deserializer to handle JPY currency
* Remove unnecessary lines in test
* Add eap schedule check
* Don't use raw LinkedHashMap type
* add timezone
From our investigation, the Monday night WHOIS storm does not cause any
strain to the backend system. The backend latency metrics are all well within
the limits. The latency measured from the proxy matches observed latency
by the prober, and we see that the "used" CPU is 1.5x of "requested" CPU
during the time when the latency is above the threshold.
Making this change hopefully removes the proxy as the bottleneck and
ameliorates the pages.
Add the BsaDomainRefresh class which tracks the refresh actions.
The refresh action checks for changes in the set of registered and
reserved domains, which are referred to as unblockables by BSA.
Currently, a verify action is enqueued every time the upload method
succeeds. Because the upload job is wrapped in a transaction, the
same task will be enqueued again if the transaction retries.
We cannot move the upload method outside the transaction because the
read-upload-write logic needs to be atomic, and the upload part itself
is idempotent (and therefore retriable). We can, however, move the
enqueuing part outside the transaction as we only need to enqueue the
verify task once the transaction succeeds. This should fix the issue
where multiple verify jobs try to hit the same marksdb endpoints,
resulting in 429 (Too Many Requests) errors.
* Add a dryrun tag to UpdatePremiumListCommand and early exit command if no new changes to the list
* Change prompt string when no change to list to reflect that there is no actual prompted user input
* Add camelCase and correct flag name
This might be the cause of the SQL performance degradation that we are
observing during the recent launch. The change went in a month ago but
there hasn't been enough increase in mutating traffic to make it
problematic until the launch.
Note that presubmits should run faster too with this change, which
serves as evidence that excessive logging is the culprit.
Add the BsaDomainRefresh table which tracks the refresh actions.
The refresh action checks for changes in the set of registered and
reserved domains, which are referred to as unblockables by BSA.
This doesn't fix any issues with dead/livelocks when deleting or
updating allocation tokens, but it at least will significantly reduce
the time to load the tokens that we'll want to update/delete.
The code as previously written assumed that creation fees would be the
same as renewal fees -- this is not the case for anchor tenants, where
the renewal fee is always the standard cost for the TLD (instead of any
premium cost). This was already handled properly in the actual billing
implementation, but we didn't tell the user the right renewal cost in
domain checks.
This also removes some warning logs related to nested transactions
We shouldn't have to parse through every single entry to see what
changed
Note: we don't do this for premium lists because those can be HUGE and
we don't want/need to load and display every entry. This was an explicit
choice made in https://github.com/google/nomulus/pull/1482
Since the replica SQL instance is read-only, any transaction performed
on it should be explicitly read-only, which would allow PostgreSQL to
optimize away (some) use of predicate locks.
Also changed the EPP cache to read from the replica. The foreign key
cache already behaves this way.
See: https://www.postgresql.org/docs/current/transaction-iso.html
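As a generic JDBC illustration of declaring the transaction read-only (not the Nomulus replica transaction manager itself; the table name below is only an example):

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

/** Sketch: an explicitly read-only transaction lets PostgreSQL skip some predicate locking. */
final class ReadOnlyReplicaSketch {
  static int countDomains(Connection replicaConnection) throws SQLException {
    replicaConnection.setReadOnly(true); // maps to PostgreSQL's read-only transaction mode
    replicaConnection.setAutoCommit(false);
    try (Statement statement = replicaConnection.createStatement();
        ResultSet resultSet = statement.executeQuery("SELECT count(*) FROM \"Domain\"")) {
      resultSet.next();
      return resultSet.getInt(1);
    } finally {
      replicaConnection.commit();
    }
  }
}
```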
For reasons unclear at this point, Java 17's servlet implementation on
GAE injects IP addresses (including unroutable private IPs) into the
standard X-Forwarded-For header, which we currently use to embed
registrar IP addresses to check against the allow list. This results in
the server not properly parsing the header and rejecting legitimate
connections.
This PR sets a custom header that should not be interfered with by any
JVM implementation to store the IP address, while maintaining the old
header as a fallback. The proxy will set both headers to allow the
server to gracefully migrate from Java 8 to Java 17 (and potentially
roll back).
Also removed some headers and logic that are not used.
This does not include any styling for now, just wanted to make sure
we're all good with regards to the basic approach. I'm open to suggestion on
which columns to include.
Note: filter searching is not implemented yet because the backend does
not allow for it (yet)
Java 17 injects unexpected headers to X-Forwarded-For, which causes
issues with validating incoming IP addresses.
This is a partial reversion of #2201. We are still keeping Java 17 in other environments, but sandbox and production need to be able to parse the header to accept incoming EPP connections from registrars. Once we fix it we will re-enable Java 17 in these environments.
This allows us to switch the proxy to a different client ID without
disrupting the service. This is a temporary measure and will be removed
once the switch is complete.
Also adds a placeholder getter in the Tld class, so that it can be
mocked/spied in tests. This way more BSA related code can be submitted
before the schema is deployed to prod.
This also adds a targeted QPS as a parameter in case we need to manually bump it
up (or down) for some reason without having to make code changes and re-deploy.
This PR makes a few changes to make it possible to turn on
per-transaction isolation level with minimal disruption:
1) Changed the signatures of the transact() and reTransact() methods to allow
passing in lambdas that throw checked exceptions (see the sketch after this
list). Previously one always had to wrap such lambdas in try-and-rethrow
blocks, which wasn't a big issue when one could liberally open nested
transactions around small lambdas and keep the "throwing" part outside the
lambda. This becomes a much bigger hassle when the goal is to eliminate nested
transactions and put as much code as possible within the top-level lambda. As a
result, the transactNoRetry() method now handles checked exceptions by
re-throwing them as runtime exceptions.
2) Changed the name and meaning of the config file field that used to
indicate if per-transaction isolation level is enabled or not. Now it
decides, when transact() is called within a transaction, whether to
throw or to log, regardless of whether the transaction could have
succeeded based on the isolation level override (if provided). The
flag will initially be set to false and would help us identify all
instances of nested calls and either refactor them or use reTransact()
instead. Once we are fairly certain that no nested calls to transact()
exists, we flip the flag to true and start enforcing this logic.
Eventually the flag will go away and nested calls to transact() will
always throw.
3) Per-transaction isolation level will now always be applied, if an
override is provided. Because currently there should be no actual
use of such feature (except for places where we explicitly use an
override and have ensured no nested transactions exist, like in
RefreshDnsForAllDomainsAction), we do not expect any issues with
conflicting isolation levels, which would have resulted in failure.
4) transactNoRetry() is made package private and removed from the
exposed API of JpaTransactionManager. This saves a lot of redundant
methods that do not have a practical use. The only instance where this
method was called outside the package was in the reader of
RegistryJpaIO, which should have no problem with retrying.
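A minimal sketch of the "lambdas may throw checked exceptions" shape from point 1; the names are illustrative, not the actual JpaTransactionManager signatures:

```java
/** Sketch: accept a throwing lambda and rethrow checked exceptions as runtime exceptions. */
final class ThrowingTransactSketch {

  @FunctionalInterface
  interface ThrowingSupplier<T> {
    T get() throws Exception;
  }

  static <T> T transact(ThrowingSupplier<T> work) {
    try {
      // A real implementation opens the transaction, runs the work, and commits or rolls back.
      return work.get();
    } catch (RuntimeException e) {
      throw e;
    } catch (Exception e) {
      throw new RuntimeException(e); // checked exceptions are rethrown as runtime exceptions
    }
  }
}
```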
We have been using SHA256 to hash passwords (for both EPP and registry lock),
which is now considered too weak.
This PR switches to using Scrypt, a memory-hard slow hash function, with
recommended parameters per go/crypto-password-hash.
To ease the transition, when a password is being verified, both Scrypt
and SHA256 are tried. If SHA256 verification is successful, we re-hash
the verified password with Scrypt and replace the stored SHA256 hash
with the new one. This way, as long as a user uses the password once
before the transition period ends (when Scrypt becomes the only valid
algorithm), there would be no need for manual intervention from them.
We will send out notifications to users to remind them of the transition
and urge them to use the password (which should not be a problem with
EPP, but less so with the registry lock). After the transition,
out-of-band reset for EPP password, or remove-and-add on the console for
registry lock password, would be required for the hashes that have not
been re-saved.
Note that the re-save logic is not present for console user's registry
lock password, as there is no production data for console users yet.
Only legacy GAE user's password requires re-save.
The domains (and their associated billing recurrences) were already being loaded
to check restores, but they also now need to be loaded for renews and transfers
as well, as the billing renewal behavior on the recurrence could be modifying
the relevant renew price that should be shown. (The renew price is used for
transfers as well.)
See https://buganizer.corp.google.com/issues/306212810
Both actions have not been used for a while (the wipe out action
actually caused problems when it ran unintentionally and wiped out QA).
Keeping them around is a burden when refactoring efforts have to take
them into consideration.
It is always possible to resurrect them from git history should the need
arise.
We finally fixed Spinnaker (I hope) to deploy bundled services with Java
17 runtime. Note that the bytecodes are still targeting Java 8. The only
change this PR introduces is to switch the runtime environment to Java
17.
TESTED=deployed to crash.
This was already set to true in all environments except prod last week. Now that the release has gone out and we have not seen any issues, we should feel safe turning this on in production as well.
We previously needed to use URLFetch in some instances where TLS 1.3 is
required (mostly when connecting to ICANN servers), and the JDK-bundled SSL
engine that came with App Engine runtime did not support TLS 1.3.
It appears now that the Java 8 runtime on App Engine supports TLS 1.3
out of the box, which allows us to get rid of URLFetch, which depends on
App Engine APIs.
Also removed some redundant retry and logging logic, now that we know
the HTTP client behaves correctly.
TESTED=modified the CannedScriptExecutionAction and deployed to alpha, used the
new HTTP client to connect to the three URL endpoints that were
problematic before and confirmed that TLS connections can be established. HTTP
sessions were rejected in some cases when authentication failed, but
that was expected.
* Add a cloudbuild-tld-sync job
This job checks the Tld config files in the internal repo and syncs them with the actual Tld objects in the database using the configure_tld nomulus command.
* Add the dockerfile and shell script
* Force the command
* Add comments
* add newline
* Create a separate copy of the job for each environment
* fix file name
* Fix indentation
* Lower the isolation level for RefreshDnsForAllDomainsAction
This lowers the isolation level to TRANSACTION_REPEATABLE_READ which will hopefully allow the action to run the entire action without timing out on our larger TLDs.
* Unchange default config
Cache loads will likely always be inner transactions, if they have a transaction at all. Cache loads do not always open a transaction, since a load is only needed if the cache is not fresh at the time it is called. Since the cache itself needs to decide whether or not a DB transaction is necessary, it should use the reTransact method to safely indicate that the isolation level of the outer transaction is what should be used.
If the cert(s) are invalid or expired, that's a problem, but that
shouldn't necessarily prevent us from changing other things. If we're
not changing the certs, leave them alone.
If we don't explicitly handle random unexpected exceptions, the error
that the front end receives includes a big ole stacktrace, which is
unhelpful for regular users and possibly bad to expose. Instead, we
should provide a vague "something went wrong" message.
Separately, we can create default SnackBar options and use those (we
want them longer than 1.5 seconds because that's pretty short).
Also did some refactoring of various Auth-related classes to clean things up a bit and make the logic less convoluted:
1. In Auth, remove AUTH_API_PUBLIC as it is only used by the WHOIS and EPP endpoints accessed by the proxy. Previously, the proxy relied on OAuth and its service account was not given the admin role (in OAuth parlance), so we made these endpoints accessible to a public user, deferring authorization to the actions themselves. In practice, OAuth checked for allowlisted client IDs and only the proxy client ID was allowlisted, which effectively limited access to only the proxy anyway.
2. In AuthResult, expose the service account email if it is at APP level. RequestAuthenticator will print out the auth result and therefore log the email, making it easy to identify which account was used. This field is mutually exclusive to the user auth info field. As a result, the factory methods are refactored to explicitly create either APP or USER level auth result.
3. Completely re-wrote RequestAuthenticatorTest. Previously, the test mingled testing functionalities of the target class with testing how various authentication mechanisms work. Now they are cleanly decoupled, and each method in RequestAuthenticator is tested individually.
4. Removed nomulus-config-production-sample.yaml as it is vastly out of date.
Surround the dot in unsafe domain names with square brackets. This
is suggested by Gmail abuse-detection and allows outgoing messages
to pass Gmail's check. This should also help with recipients' checks.
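A tiny sketch of the substitution (plain string replacement, not the actual Nomulus helper):

```java
/** Sketch: neutralize domain names in outgoing email, e.g. "example.tld" -> "example[.]tld". */
final class UnsafeDomainSketch {
  static String bracketDots(String domainName) {
    return domainName.replace(".", "[.]");
  }
}
```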
Adds a delay between emails sent in a tight loop. This helps avoid
triggering Gmail abuse detections.
Also updated the recipient address for billing alerts.
* Add custom YAML serializer for Duration
This addresses b/301119144. This changes the YAML representation of a TLD to show Duration fields as a String representation using the Java Duration object's toString() format (a sketch follows below). This eliminates the previous ambiguity over the time unit that is being used for each duration.
* change standardSeconds to standardMinutes in test
* Add custom serializer to the entire mapper
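A minimal sketch of such a serializer, assuming the Joda-Time Duration used elsewhere in the TLD configuration (whose toString() yields ISO-8601 output such as "PT300S"); this is not the exact serializer added in the PR:

```java
import com.fasterxml.jackson.core.JsonGenerator;
import com.fasterxml.jackson.databind.JsonSerializer;
import com.fasterxml.jackson.databind.SerializerProvider;
import java.io.IOException;
import org.joda.time.Duration;

/** Sketch: write Duration fields as their toString() form to remove time-unit ambiguity. */
class DurationToStringSerializer extends JsonSerializer<Duration> {
  @Override
  public void serialize(Duration value, JsonGenerator gen, SerializerProvider serializers)
      throws IOException {
    gen.writeString(value.toString()); // e.g. Duration.standardMinutes(5) -> "PT300S"
  }
}
```

It would then be registered on the shared ObjectMapper via a SimpleModule so it applies to every Duration field.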
Unless an --oauth flag is used, the nomulus tool will only send the OIDC
header. The server still accepts both headers and the user should use
`create_user` command to create an admin User (with the --oauth flag on), which
will then allow one to use the nomulus tool without the --oauth flag.
The --oauth flag and the server's ability to support OAuth-based
authentication will be removed soon. Users are urged to create the User
object in time to avoid service interruption.
TESTED=verified on alpha.
We want to do this because it takes the user to an external site, which
could potentially lead to confusion if they tried to use the back button
without a new tab.
Add documentation that describes the current Cloud Build status notification
to Google Chat, as well as how to update the configuration and the
notifier service.
This isn't the prettiest thing, but it replicates the type of view /
edit functionality that we had in the original console.
Of note: this doesn't include input field validation, which would
probably be a good idea to add at some point.
It will call equalsImmutableObject(), which seems the right thing to do.
We only care if the two Tld objects have the same fields, not if they
are the same object. ErrorProne complained about comparison by identity.
* Defend against deserialization-based attacks
Added the `SafeObjectInputStream` class that defends attacks using
malformed serialized data, including remote code execution and
denial-of-service attacks.
Started using the new class to handle EPP resource VKeys and
PendingDeposits, which are passed across credential-boundaries: between
TaskQueue and AppEngine server, and between AppEngine server and the RDE
pipeline on GCE. Note that the wire format of VKeys does not change,
therefore existing tasks sitting in the TaskQueue are not affected.
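The general allowlisting idea looks roughly like the sketch below; this is an illustration of the technique, not the actual SafeObjectInputStream class:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InvalidClassException;
import java.io.ObjectInputStream;
import java.io.ObjectStreamClass;
import java.util.Set;

/** Sketch: reject deserialization of any class not on an explicit allowlist. */
class AllowlistingObjectInputStream extends ObjectInputStream {
  private final Set<String> allowedClassNames;

  AllowlistingObjectInputStream(InputStream in, Set<String> allowedClassNames) throws IOException {
    super(in);
    this.allowedClassNames = allowedClassNames;
  }

  @Override
  protected Class<?> resolveClass(ObjectStreamClass desc)
      throws IOException, ClassNotFoundException {
    if (!allowedClassNames.contains(desc.getName())) {
      throw new InvalidClassException(desc.getName(), "class not allowed for deserialization");
    }
    return super.resolveClass(desc);
  }
}
```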
Also removed an unused class: JaxbFragment.
This token is only ever used for logging. The GAE OAuth service will
parse the header directly when called to retrieve the current user and
user id. Logging it in prod could be a security risk if the logs are
leaked.
Previously this didn't properly deal with nested routings, e.g.
"settings/whois". It tried to just pass "whois" as the next url which
doesn't work with the router because it's nested under the settings.
Using all parts of the URL allows us to handle the nesting.
* Add a configureTld command that uses YAML
* Add more tests and edge case handling
* Add out of order test and fix wrong inject
* small changes
* Add check for ascii
* Add check for ROID suffix
Hibernate logs certain information at the ERROR level, which for the
purpose of troubleshooting is misleading, since most affected operations
succeed after retry. ERROR-level logging should only be added by Nomulus
code.
This PR does two things:
1. Disable all logging in two Hibernate classes: we cannot disable
logging at a finer granularity, and we cannot preserve lower-level
logging while disabling ERROR.
2. Adds a DatabaseException which captures all error details that may
escape the typical loggers' attention: SQLException instances can be
chained in a different way from Throwable's `getCause()` method.
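For illustration, SQLException chains link through getNextException() rather than getCause(), so a logger walking only the cause chain misses details; a minimal sketch of collecting them:

```java
import java.sql.SQLException;

/** Sketch: gather every message in an SQLException chain via getNextException(). */
final class SqlExceptionChainSketch {
  static String describeChain(SQLException first) {
    StringBuilder details = new StringBuilder();
    for (SQLException e = first; e != null; e = e.getNextException()) {
      details.append(e.getSQLState()).append(": ").append(e.getMessage()).append('\n');
    }
    return details.toString();
  }
}
```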
The old semantics for TransactionalFlow meant "anything that needs to mutate the
database", but then FlowRunner was not creating transactions for
non-transactional flows even though nearly every flow needs a transaction (as
nearly every flow needs to hit the database for some purpose). So now
TransactionalFlow simply means "any flow that needs the database", and
MutatingFlow means "a flow that mutates the database". In the future we will
have FlowRunner use a read-only transaction for TransactionalFlow and then a
normal writes-allowed transaction for MutatingFlow. That is a TODO.
This also fixes up some transact() calls inside caches to be reTransact(), as we
rightly can't move the transaction outside them as from some callsites we
legitimately do not know whether a transaction will be needed at all (depending
on whether said data is already in memory). And it removes the replicaTm() calls
which weren't actually doing anything as they were always nested inside of normal
tm()s, thus causing confusion.
In the future, reTransact() will be the only way to initiate a transaction that
doesn't fail when called inside an outer wrapping transaction (when wrapped,
it's a no-op). It should be used sparingly, with a preference towards
refactoring the code to move transactions outwards (which this PR also
contains).
Note that this PR includes some potential efficiency gains caused by existing
poor use of transactions. E.g. in the file RefreshDnsAction, the existing code
was using two separate transactions to refresh the DNS for domains and hosts
(one is hidden in loadAndVerifyExistence()), whereas now as of this PR it has a
single wrapping transaction to do so.
It turns out that disallowing all nested transactions is impractical. So
in this PR we make it possible to run nested transactions (which are not
really nested as far as SQL is concerned, but rather lexically nested
calls to tm().transact() which will NOT open new transactions when
called within a transaction) as long as there is no conflict between the
specified isolation levels between the enclosing and the enclosed
transactions.
Note that this will change the behavior of calling tm().transact() with
no isolation level override, or a null override INSIDE a transaction.
The lack of the override will allow the nested transaction to run at
whatever level the enclosing transaction runs at, instead of at the
default level specified in the config file.
* Fix Cloud Tasks failure to retry
Replace `SC_NOT_MODIFIED` (304) with `SC_SERVICE_UNAVAILABLE` (503) when
data is not available yet. Affected actions are invoice- and spec11-publishing.
It is confirmed that Cloud Tasks currently does not retry with code 304,
despite the public documentation stating so. We will use 503 for now,
pending the decision by Cloud Tasks whether to change behavior or
documentation.
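The change itself is small; a sketch of the "not ready yet" response using the javax.servlet constants mentioned below:

```java
import javax.servlet.http.HttpServletResponse;

/** Sketch: report "data not ready yet" with a status that Cloud Tasks reliably retries. */
final class RetryStatusSketch {
  static void respondNotReadyYet(HttpServletResponse response) {
    // SC_NOT_MODIFIED (304) was observed not to trigger a Cloud Tasks retry,
    // so signal a transient failure with SC_SERVICE_UNAVAILABLE (503) instead.
    response.setStatus(HttpServletResponse.SC_SERVICE_UNAVAILABLE);
  }
}
```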
The code `TOO_EARLY` (425) is another alternative. It is not meant for
our use case but at least sounds like it is. However, it is not in any
javax.servlet jar. We don't want to define our own constant, and we cannot upgrade
to jakarta.servlet yet.
Also revert previous mitigation.
This includes a bit of refactoring of the GSON creation. There can exist
some objects (e.g. Address) where the JSON representation is not equal to the
representation that we store in the database. For these objects, when
deserializing, we should update the objects so that they reflect the
proper DB structure (indeed, this is already what we do for the XML
parsing of Address).
* Change PackagePromotion to BulkPricingPackage
* More name changes
* Fix some test names
* Change token type "BULK" to "BULK_PRICING"
* Fix missed token_type reference
* Add todo to remove package type
A config file field is added to control if per-transaction isolation
level is actually used. If set to true, nested transactions will throw
a runtime exception as the enclosing transaction may run at a different
isolation level.
In this PR we only add the ability to specify the isolation level,
without enabling it in any environment (including unit tests), or
actually specifying one for any query. This should allow us to set up
the system without impacting anything currently working.
* Mitigate Cloud task retry problem
Increase PublishSpec11Action start delay to avoid the need to retry.
The only other use case is invoice, which typically does not retry:
delay is 10 minutes, pipeline finishes within 7 minutes.
This includes two changes:
1. Creating a base string-type adapter for use parsing to/from JSON
classes that are represented as simple strings
2. Changing the object-provider methods so that the POST bodies should
contain precisely the expected object(s) and nothing else. This way,
it's easier for the frontend and backend to agree that, for instance,
one POST endpoint might accept exactly a Registrar object, or a list
of Contact objects.
Co-Authored-By: gbrodman <gbrodman@google.com>
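A minimal sketch of such a base string-type adapter using Gson's TypeAdapter; the conversion functions are stand-ins, not the actual Nomulus adapter:

```java
import com.google.gson.TypeAdapter;
import com.google.gson.stream.JsonReader;
import com.google.gson.stream.JsonWriter;
import java.io.IOException;
import java.util.function.Function;

/** Sketch: adapter for classes represented in JSON as plain strings. */
class StringBackedTypeAdapter<T> extends TypeAdapter<T> {
  private final Function<T, String> toStringFn;
  private final Function<String, T> fromStringFn;

  StringBackedTypeAdapter(Function<T, String> toStringFn, Function<String, T> fromStringFn) {
    this.toStringFn = toStringFn;
    this.fromStringFn = fromStringFn;
  }

  @Override
  public void write(JsonWriter out, T value) throws IOException {
    out.value(toStringFn.apply(value)); // emit the string form directly
  }

  @Override
  public T read(JsonReader in) throws IOException {
    return fromStringFn.apply(in.nextString()); // parse back from the string form
  }
}
```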
2.25.0 contains a breaking change that made HttpStorageOptions not
serializable, which breaks RDE as it needs to access GCS from Beam.
2.22.6 was the last version that was used before the Gradle upgrade.
Also had to downgrade google-cloud-nio to pass the tests.
For some inexplicable reason, I had to manually add
guava-listenablefuture as
testRuntimeClasspath/runtimeClasspath/deploy_jar dependencies to the
networking, docs and prober subprojects' lock files, as running
`gradle test --write-locks` would NOT add them and succeed; but without
`--write-locks`, running the corresponding tests would fail.
See: b/294378137.
Use a system property to specify whether this check should be executed.
We will update the presubmit test script to run this check only during
foss-pr.
Add placeholder configs for sending emails using Gmail in GSuite.
The names of the new configs are temporary. After migration they
will revert to the names currently in use by the AppEngine email API.
The instance ID used to be uniquely determined by the App Engine SDK. Since
we no longer call the SDK, we need a way to distinguish instances so
that their metrics do not stomp on each other and result in time
inversion errors (as we have seen frequently in the logs since the
removal of the App Engine SDK).
This includes removing (hopefully temporarily) the gradle-lint plugin as
it is incompatible with various Gradle versions (see
https://github.com/nebula-plugins/gradle-lint-plugin/issues/393). This
is somewhat unfortunate since the plugin is useful for removing unused
dependencies, though with the relatively small amount of Gradle code we
write hopefully it will not be missed much. If Nebula changes their
code to be compatible with Gradle 8+, we can re-add it easily.
This upgrade means we can remove the code added in 342051e1.
* Use Jackson to create and read Tld YAML files
* Add getObjectMapper to TldYamlUtils
* revert lockfiles
* Fix optionals
* Add more tests and javadocs
* small fixes
This is part of the spec in RFC 5730 that we hadn't implemented until now. Note
that this requires changing LoginFlow to be transactional, but I don't think
that should cause any issues.
It was only called in one place (in actual production code), and it was just
slightly obscuring the fact that re-saves can be scheduled for multiple points
in the future, in a way that wasn't particularly helpful to understanding the
system logic at the callsite.
This includes two changes, the second necessary for testing the first.
1. We add the rdap-queries field as mandated by the amendment to the
registry agreement,
https://itp.cdn.icann.org/en/files/registry-agreement/proposed-global-amendment-base-gtld-registry-agreement-12-04-2023-en.pdf.
This is fairly similar to the whois-queries field where we just query
the logs, but instead of searching for "whois" we search for "rdap".
2. BigQuery doesn't use MAX to compare two fields; MAX is an aggregate
over rows. In order to do what we want (and to
have the BigQuery statements succeed), we need to use GREATEST.
Tested both versions in alpha and production BigQuery instances.
This is part of b/247839944 as a followup to the large bug from
September 2022. As a result of that, there are domains whose
BillingRecurrence objects were closed but the domain wasn't deleted. In
order to avoid having these domains stick around forever without being
billed, we want to restart billing on them whenever their next billing
cycle would have been.
See b/290228682, there are edge cases in which the net_renew would be negative when
a domain is cancelled by superusers during renew grace period. The correct thing
to do is attribute the cancellation to the owning registrar, but that would require
changing the owning registrar of the corresponding cancellation DomainHistory,
which has cascading effects that we don't want to deal with. As such we simply
floor the number here to zero to prevent any negative value from appearing, which
should have negligible impact as the edge case happens very rarely, more specifically
when a cancellation happens during the grace period by a registrar other than the
owning one. All the numbers here should be positive to pass ICANN validation.
See b/248035435 for more details / reasoning, but basically this will
make it easier if we ever need to restore user actions in the future (or
figure out which user actions went wrong)
It was used by cron jobs and task queues, which now use OIDC-based auth.
Also renamed and consolidated auth enums to make them easier to
understand. Ultimately we should get rid of the AuthMethod part as OIDC
will be the only auth method used.
Based on the updated routing map:
Backend and tools: the only change is that INTERNAL is removed from allowed
auth methods. This should be a no-op.
Pubapi: INTERNAL is removed from allowed auth. For endpoints that only
allowed INTERNAL before, API and LEGACY become the allowed methods.
However this should not affect anything because regardless of which auth
method is ultimately used, the required auth level is always NONE for
pubapi endpoints. Therefore any auth result is discarded anyway.
Frontend: INTERNAL is removed. RegistryLockVerifyAction has lowered
its required auth level to NONE because it extends HtmlAction, which can
redirect the user to login if necessary. All other endpoints extending
HtmlAction require NONE, so it's better to keep things consistent.
Instead of using REDACTED FOR PRIVACY everywhere, we should just include
the empty string (this is what the spec says, what other gTLD registrars
do, and what the RDAP conformance tool at
https://github.com/icann/rdap-conformance-tool says to do).
In the contact VCards, we omit redacted fields entirely unless the spec
requires that they exist (the version number and an empty 'fn' field).
This also applies to the "handle" field.
Eventually we will probably want to add the redaction extension but
that's not RFCed yet and isn't required for the August RDAP conformance
deadline.
It was previously calling toString() on an Optional<PremiumList> which was
unnecessarily verbose. The existing premium list is required to be present
anyway.
This is a follow-on to comments in PR #2037. It makes the main loop cleaner and
also removes ambiguities around database handling when the first query is run
with the cursor still empty because no results have been found yet.
This PR changes the two flavors of OIDC authentication mechanisms to
verify the same audience. This allows the same token to pass both
mechanisms. Previously the regular OIDC flavor used the project id as
its required audience, which does not work for local user credentials
(such as the ones used by the nomulus tool), which require a valid OAuth
client ID as the audience when minting the token (the project id is NOT a
valid OAuth client ID).
I considered allowing multiple audiences, but the result is not as clean
as just using the same audience everywhere, because the fall-through logic
would have generated a lot of noise from failed attempts.
This PR also changes the client side to solely use OIDC token whenever
possible, including the proxy, cloud scheduler and cloud tasks. The nomulus
tool still uses OAuth access token by default because it requires USER level
authentication, which in turn requires us to fill the User table with objects
corresponding to the email address of everyone needing access to the tool.
TESTED=verified each client is able to make authenticated calls on QA with or
without IAP.
* Add an includeDeleted option to RefreshDnsForAllDomainsAction
* Add batching to the query
* Some refactoring
* Make batch size configurable
* Set status to ok
* Combine into one transaction
* Remove smear minutes parameter
* Only pass in lastInPreviousBatch
Add a --canary option (default to false) to the CurlCommand that allows
connection to the canary endpoints.
During canary analysis, only the DEFAULT-canary receives traffic. This
new flag allows us to test other canary services manually using the
curl command.
* Add Gmail Client and set up tests
Add a Gmail client and manually triggered email tests in
CannedScriptExecutionAction.
We will test Gmail with Google Workspace in Sandbox, since Alpha and
Crash are not properly set up for Google Workspace, and we have not
figured out why.
* Remove nested transaction from requestDnsRefresh
* Add a bulk version
* Remove transaction time as field
* Only add delay once
* have PublishDnsUpdatesAction use bulk refresh
The Java code will be added in a followup PR.
Also fixed tests failing due to org.json upgrade: decimal whole numbers
no longer have their fractional parts removed, so currency value strings
must end with ".00" instead of ".0".
When RdeReportAction is invoked without a prefix parameter (as in the
case when it is kicked off by cron jobs for potential catch-ups), we
need to use the same heuristic that is employed in RdeUploadAction to
find the most recent prefix for the given watermark; otherwise the job
will not find any deposits to upload.
Also renamed RdeUtil to RdeUtils, to be consistent with our naming
conventions.
IAP and regular OIDC auth mechanisms are unified under a base class that
produces either APP or USER level AuthResult based on the principal email
found in the OIDC token.
Also moved some enum classes to better organize code structure.
This encompasses most of the basic information that is viewable in the
existing console; basically, just viewing the base info of the Registrar
object.
Use the ApplicationDefaultCredential annotation instead.
The new annotation has been verified in sandbox and production using the
'executeCannedScript' endpoint. The verification code is removed in this
PR too.
This adds a possible configuration point "defaultServiceAccount" (which
in GAE will be the standard GAE service account). If this is configured,
CloudTasksUtils can create tasks with standard HTTP requests with an
OIDC token corresponding to that service account, as opposed to using
the AppEngine-specific request methods.
This also works with IAP, in that if IAP is on and we specify the IAP
client ID in the config, CloudTasksUtils will use the IAP client ID as
the token audience and the request will successfully be passed through
the IAP layer.
Tested in QA.
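For illustration, a minimal sketch of this mechanism using the public Cloud Tasks v2 client directly (not the actual CloudTasksUtils wiring); the service account, audience, queue, and URL below are illustrative assumptions:

```java
import com.google.cloud.tasks.v2.CloudTasksClient;
import com.google.cloud.tasks.v2.HttpMethod;
import com.google.cloud.tasks.v2.HttpRequest;
import com.google.cloud.tasks.v2.OidcToken;
import com.google.cloud.tasks.v2.QueueName;
import com.google.cloud.tasks.v2.Task;

public class OidcTaskSketch {
  public static void main(String[] args) throws Exception {
    // Hypothetical values; in practice these come from the registry config.
    String serviceAccount = "my-project@appspot.gserviceaccount.com";
    String audience = "iap-client-id-or-service-url"; // IAP client ID when IAP is on
    try (CloudTasksClient client = CloudTasksClient.create()) {
      Task task =
          Task.newBuilder()
              .setHttpRequest(
                  HttpRequest.newBuilder()
                      .setUrl("https://backend.example.com/_dr/task/someAction")
                      .setHttpMethod(HttpMethod.GET)
                      // The OIDC token is minted for the configured service account
                      // with the chosen audience, so the request passes IAP.
                      .setOidcToken(
                          OidcToken.newBuilder()
                              .setServiceAccountEmail(serviceAccount)
                              .setAudience(audience))
                      .build())
              .build();
      client.createTask(QueueName.of("my-project", "us-central1", "my-queue"), task);
    }
  }
}
```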
This column is used by the billing team to create invoices. Registrars
have asked that a single invoice be created for a given registrar,
instead of one per registrar-tld pair. This should have no other effect
on the billing pipeline as the invoice grouping key has a description
field that also contains the TLD, so the granularity as a whole does not
change.
We have been using it as a poor man's timed flag that triggers a system
behavior change after a certain time. We have no foreseeable future use
for it now that the DNS pull queue related code is deleted. If in the
future a need for such a flag arises, we are better off implementing a
proper flag system than hijacking this class this way.
* Prepare switch of credential annotation
Prepare the switch from DefaultCredential to ApplicationCredential.
In nomulus tools, start using the new annotation. This is tested by
successfully using the nomulus curl command, which actually needs a
valid credential to work.
For remaining use cases of the old annotation in the Nomulus server, add
some code that relies on the new credential to work. Once that code
is tested in sandbox and production, we will switch the annotations.
If the user passes, e.g., `--allowed_nameservers=` (or the contact-ids
equivalent), that shouldn't mean a list consisting solely of the empty string.
Using this parameter / converter allows us to ensure that lists of
strings look reasonable.
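As a hedged illustration of the conversion this implies (not the actual converter class), Guava's Splitter makes it easy to drop empty pieces:

```java
import com.google.common.base.Splitter;
import com.google.common.collect.ImmutableList;

/** Illustrative only: "" and "," both become an empty list, not [""]. */
final class NameserverListParser {
  static ImmutableList<String> parse(String value) {
    return ImmutableList.copyOf(
        Splitter.on(',').trimResults().omitEmptyStrings().split(value));
  }
}
```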
This includes renaming the billing classes to match the SQL table names,
as well as splitting them out into their own separate top-level classes.
The rest of the changes are mostly renaming variables and comments etc.
We now use `BillingBase` as the name of the common billing superclass,
because one-time events are called BillingEvents.
* Allow rotation when updating registrar cert
When updating a registrar's primary cert, add a flag to activate
rotation of previous primary cert to failover.
This functionality is part of the prober ssl cert renewal automation.
The only method that is called from this class is setNumInstances. However, we
don't currently use `nomulus set_num_instances` anywhere. If we need to change
the number of instances, it is either done by updating appengine-web.xml, which
is deployed by Spinnaker, or done manually as a break-glass fix via gcloud
or on Pantheon.
Because we need to check if a contact history is the most recent for its
underlying contact resource, the query-wipe-out-repeat loop no longer works
well due to the added overhead of the query.
Instead, we refactor the logic into a Beam pipeline where the query only
needs to be performed once and history entries eligible for wipe out are
handled individually in their own transforms. Because history entries
are otherwise immutable, we can run the pipeline at the relatively relaxed
repeatable-read isolation level. We also do not worry about batching for
performance, as we do not anticipate this operation putting much strain
on the particular table.
This also adds README files to explain the two different IDN table locations
(which have different purposes). See http://b/278565478 for more information.
This includes changes to make sure that we use the proper per-TLD IDN
tables as well as setting/updating/removing them via the Create/Update
TLD commands.
There can be situations (anchor tenants, test tokens, other ways of
getting a domain to cost $0) where we may want to delete a domain during
the add grace period but the credit applied is 0. We should not fail on
those cases.
See b/277115241 for an example.
This new table has just been approved by ICANN. It is the same as our existing
Extended Latin table, except with the removal of some lesser-used characters
with diacritic marks that are confusable variants.
The filenames for the IDN tables are made explicit to improve code readability.
And this reverses the removal of G with stroke from the existing Extended Latin
table (see PR #1938), so that that table continues to accurately reflect the
state of our previously launched TLDs.
This is the full list of removed characters:
U+00E1 # LATIN SMALL LETTER A WITH ACUTE
U+0101 # LATIN SMALL LETTER A WITH MACRON
U+01CE # LATIN SMALL LETTER A WITH CARON
U+010B # LATIN SMALL LETTER C WITH DOT ABOVE
U+01E7 # LATIN SMALL LETTER G WITH CARON
U+0123 # LATIN SMALL LETTER G WITH CEDILLA
U+01E5 # LATIN SMALL LETTER G WITH STROKE
U+0131 # LATIN SMALL LETTER DOTLESS I
U+00ED # LATIN SMALL LETTER I WITH ACUTE
U+00EF # LATIN SMALL LETTER I WITH DIAERESIS
U+01D0 # LATIN SMALL LETTER I WITH CARON
U+0144 # LATIN SMALL LETTER N WITH ACUTE
U+014B # LATIN SMALL LETTER ENG
U+00F3 # LATIN SMALL LETTER O WITH ACUTE
U+014D # LATIN SMALL LETTER O WITH MACRON
U+01D2 # LATIN SMALL LETTER O WITH CARON
U+0157 # LATIN SMALL LETTER R WITH CEDILLA
U+0163 # LATIN SMALL LETTER T WITH CEDILLA
U+00FA # LATIN SMALL LETTER U WITH ACUTE
U+00FC # LATIN SMALL LETTER U WITH DIAERESIS
U+01D4 # LATIN SMALL LETTER U WITH CARON
U+1E83 # LATIN SMALL LETTER W WITH ACUTE
U+1E81 # LATIN SMALL LETTER W WITH GRAVE
U+1E85 # LATIN SMALL LETTER W WITH DIAERESIS
U+1EF3 # LATIN SMALL LETTER Y WITH GRAVE
U+017C # LATIN SMALL LETTER Z WITH DOT ABOVE
Makes the next run fall on the first Monday of December, which should give us
plenty of time to fix the issue with it wiping out PII in the most recent
contact history.
* Check allowedEppActions when validating tokens
* Reflect failed tokens in the fee check
* Add tests for domainCheckFlow
* Add hyphens to fee class name
* Add clarifying comment to catch block
* Add specific exception types
* Implement default tokens for the fee extension in domain check
* Add test for expired token
* Add test for alloc token and default token
* Fix formatting
* Always check for default tokens
* Change transaction time to passed in DateTime
Also adds a DnsUtils class to deal with adding, polling, and removing
DNS refresh requests (only adding is implemented for now). The class
also takes care of choosing which mechanism to use (pull queue vs. SQL)
based on the current time and the database migration schedule map.
When submitting tasks to Cloud Tasks, we will use the built-in OIDC
authentication which runs under the default service account (not the
cloud scheduler service account). We want either to work for app-level
auth.
This does nothing for now, but in the future this will allow us to refer
to the RegistryConfig and/or Service objects from the core project. This
will be necessary when changing CloudTasksUtils to not use the AppEngine
built-in connection (it will need to use a standard HTTP request
instead).
It seems like we're hitting App Engine capacity issues resulting in actual pages
now (for whatever reason, but likely one customer), and we obviously don't want
that.
The value of the column would be set to START_OF_TIME for new entries.
Every time a row is read, the value is updated to the current time. This
allows concurrent reads to not repeatedly pick up the same entry that has the
earliest request time, because they only look for rows whose process time is
before (current time - some padding time).
This basically fulfills the same function that LEASE_PADDING gives us
when using a pull queue, where a task would be leased for a certain
time, during which it could not be leased by anyone else.
See: https://cs.opensource.google/nomulus/nomulus/+/master:core/src/main/java/google/registry/dns/ReadDnsQueueAction.java;l=99?q=readdnsqueue&ss=nomulus%2Fnomulus
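As a rough sketch of the read side (the entity, column, and helper names here are assumptions, not the actual schema or code):

```java
import java.util.List;
import javax.persistence.EntityManager;
import org.joda.time.DateTime;
import org.joda.time.Duration;

final class DnsRequestLeasingSketch {
  /**
   * Claims up to {@code batchSize} requests whose process time is older than the padding,
   * bumping their process time so that concurrent readers skip them for a while.
   * {@code DnsRefreshRequest} is a hypothetical JPA entity with requestTime/processTime columns.
   */
  static List<DnsRefreshRequest> claimBatch(
      EntityManager em, DateTime now, Duration padding, int batchSize) {
    List<DnsRefreshRequest> batch =
        em.createQuery(
                "FROM DnsRefreshRequest r WHERE r.processTime < :cutoff"
                    + " ORDER BY r.requestTime ASC",
                DnsRefreshRequest.class)
            .setParameter("cutoff", now.minus(padding))
            .setMaxResults(batchSize)
            .getResultList();
    // "Lease" the rows: other readers will not pick them up again until the
    // padding interval has elapsed.
    batch.forEach(r -> r.setProcessTime(now));
    return batch;
  }
}
```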
This PR adds an alternative method to upload LORDN to the Nordn server without
using the App Engine pull queue. A new database migration stage is added to control
whether a new task is scheduled with the old or the new method. The
NordnUploadAction is configured to process both kinds of tasks. Once the tasks
scheduled with the old method are all processed, we can start using the
new method exclusively.
See: go/registry-pull-queue-redesign
There is no point comparing the old CRL to the new one when the old one
is invalid. This can happen when the CA cert rotates, after which the
old CRL stops being valid because it fails signature verification against the
new cert.
This change will allow us to keep updating the CRL after a CA rotation without
having to manually delete the old CRL from the database.
See b/270983553.
ICANN doesn't like this character because it's confusable with a normal G (the
stroke tends to get lost in the visual clutter of the descender), and .com's
Extended Latin table doesn't use it either. Best to get rid of it.
For some reason `./gradlew clean build` on master is failing for me on
multiple machines due to a new org.json:json version triggering license
violations, even though the lock files are not changing.
Note that the old versions are still present because if I remove
"The JSON license", which the old versions use, the check also fails...
We might (likely will) modify some of the fiddly bits around this (maybe
the GSON serialization, where we do the actual authorization, etc) but
this should be a decent basic shell structure for endpoints that the new
registrar console can call to retrieve JSON results.
This is only generated and used if "iapClientId" is set in the proxy
config. If so, we use code similar to
https://cloud.google.com/iap/docs/authentication-howto#obtaining_an_oidc_token_for_the_default_service_account
to generate an ID token that is valid for IAP. We set the token on the
Proxy-Authorization header so that we can keep using the pre-existing
access token as well -- IAP allows for us to use either the
Authorization header or the Proxy-Authorization header.
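A minimal sketch of minting such a token with the standard google-auth-library, following the pattern from the linked docs page (the proxy's actual plumbing differs, and iapClientId is whatever is configured):

```java
import com.google.auth.oauth2.GoogleCredentials;
import com.google.auth.oauth2.IdTokenCredentials;
import com.google.auth.oauth2.IdTokenProvider;
import java.io.IOException;

final class IapTokenSketch {
  static String mintIapIdToken(String iapClientId) throws IOException {
    GoogleCredentials credentials = GoogleCredentials.getApplicationDefault();
    IdTokenCredentials idTokenCredentials =
        IdTokenCredentials.newBuilder()
            .setIdTokenProvider((IdTokenProvider) credentials)
            .setTargetAudience(iapClientId)
            .build();
    idTokenCredentials.refreshIfExpired();
    // Sent as "Proxy-Authorization: Bearer <token>" so the Authorization header
    // can keep carrying the pre-existing access token.
    return idTokenCredentials.getIdToken().getTokenValue();
  }
}
```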
We were under the mistaken impression before that there was a reliable
way to, out-of-band, get a GAIA ID for a particular email address.
Unfortunately, that isn't the case (at least, not in a scalable way or
one that support agents could use). As a result, we have to allow null
GAIA IDs in the database.
When we (or the support team) create new users, we will only specify the
email address and not the GAIA ID. Then, when the user logs in for the
first time, we will have the GAIA ID from the provided ID token, and we
can populate it then.
See b/260945047.
Also refactored the corresponding tests, which should make future updates easier.
This change should be deployed at or around 2023-02-15T16:00:00Z.
* Modify DomainCreateFlow to check for an applicable defaultPromoToken
* Add handling for deleted tokens
* Change cache to allocation token cache
* Abstract away cache methods
* Use AllocationToken.getAll in create flow
* Filter out empty tokens
This is an 'easy' upgrade that requires a minor change in
common/build.gradle and the removal of an unnecessary import in buildSrc.
Gradle 7.4 and above has breaking changes that break the latest nebula lint plugin. We may have to wait a while.
Due to the way the Beam pipeline is designed, it will expand a
recurring billing event when its event time is in scope for expansion,
instead of its billing time. This means that the one-time will be generated
45 days earlier. This negates the need to check if the expansion is
finished when generating monthly invoices.
We will need to backfill the past 45 days of onetimes before the new
code is deployed. As an illustration, with the old code, a cursor time
of 2023-01-17 means that all auto-renewals whose billing time is before
2023-01-17 were created, which corresponds to an effective cursor time
of 2022-12-03 (45 days before 2023-01-17) for event time. This cursor
will need to be brought to 2023-01-17 to ensure that there is no gap in
generated event times when switching to use the new code.
Currently we synthesize an instance id, which requires the use of the App
Engine Modules API. Given that we only have one version of the code running
at any time, and HTTP is stateless, there is no point tracking exactly
which GAE "instance" it is. We do lose information on which service (default,
backend, etc.) is writing the metric, but that does not seem very
important.
Using a constant fake instance ID allows us to get rid of another GAE
dependency.
The temporary queue.xml file is not deleted in the afterEach() method,
likely causing some flaky tests that we saw due to overwriting of the
file by concurrent tests.
Most of its usage can be replaced by JpaIntegrationTestExtension. In
places where specific GAE APIs are still needed, namely when the pull queue
or the User service is used, two simplified extensions are used instead, which
makes them much easier to identify once the APIs are no longer used.
We've been using it in production for three weeks now. Everything seems
to be working fine. Removing the code related to checking the migration
state and using the override.
This will replace the ExpandRecurringBillingEventsAction, which has a
couple of issues:
1) The action starts with too many Recurrings that are later filtered out
because their expanded OneTimes are not actually in scope. This is due
to the Recurrings not recording their latest expanded event time, and
therefore many Recurrings that are not yet due for renewal get included
in the initial query.
2) The action works in sequence, which exacerbates the issue in 1) and
makes it very slow to run if the window of operation is wider than
one day, which in turn makes it impossible to run any catch-up
expansions with any significant gap to fill.
3) The action only expands a recurrence when its billing time becomes
due, but most of its logic works on event time, which is 45 days
before billing time, making the code hard to reason about and
error-prone. This has led to b/258822640, where a premature
optimization intended to fix 1) caused some autorenewals to not be
expanded correctly when subsequent manual renews within the autorenew
grace period closed the original recurrence.
As a result, the new pipeline addresses the above issues in the
following way:
1) Update the recurrenceLastExpansion field on the Recurring when a new
expansion occurs, and narrow down the Recurrings in scope for
expansion by only looking for the ones that have not been expanded for
more than a year.
2) Make it a Beam pipeline so expansions can happen in parallel. The
Recurrings are grouped into batches in order to not overwhelm the
database with writes for each expansion.
3) Create new expansions when the event time, as opposed to the billing
time, is within the operation window. This streamlines the logic and
makes it clearer and easier to reason about. It also aligns with
how other (cancellable) operations with accompanying grace periods are
handled, where the corresponding data is always speculatively created
at event time. Lastly, doing this negates the need to check if the
expansion has finished running before generating the monthly invoices,
because the billing events are now created not just-in-time, but 45
days in advance.
Note that this PR only adds the pipeline. It does not switch the default
behavior to using the pipeline, which is still done by
ExpandRecurringBillingEventsAction. We will first use this pipeline to
generate missing billing events and domain histories caused by
b/258822640. This also allows us to test it in production, as it
backfills data that will not affect ongoing invoice generation. If
anything goes wrong, we can always delete the generated billing events
and domain histories, based on the unique "reason" in them.
This pipeline can only run after we switch to use SQL sequence based ID
allocation, introduced in #1831.
We can use the saved refresh token associated with the nomulus tool to
request an ID token with an audience of the IAP client ID in order to
satisfy IAP with the Nomulus tool.
Note: this requires that the user of the Nomulus tool, e.g.
"gbrodman@google.com" has a User object stored in SQL.
Tested on QA
This is similar to where we store the SQL files for beam pipelines, and
frankly makes more sense. Also streamlined the use of the API to read
SQL files from a jar.
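For illustration, reading such a bundled file off the classpath with Guava could look like this (the resource path and helper are assumptions, not the actual API usage):

```java
import com.google.common.io.Resources;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

final class SqlResourceReaderSketch {
  /** Reads a SQL file bundled in the jar from an assumed "sql/" resource directory. */
  static String readSql(String fileName) throws IOException {
    return Resources.toString(
        Resources.getResource("sql/" + fileName), StandardCharsets.UTF_8);
  }
}
```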
The AppEngine thread factory is only useful if we can't create our own
(this is no longer the case) or if we need access to AppEngine APIs
(this is no longer the case).
The Concurrent class is only used by the DNS writer and the
CreateGroupsAction.
This parameter is misleading and does not do what it purports to do.
Namely, it does not impact the level of parallelism. Given the input n for this
parameter, and m for the batch size, the elements are divided (keyed) into n
groups, each of which is then spread evenly across all threads and
eventually batched into batches of size m:
https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/GroupIntoBatches.java#L227
This is also evident in the implementation itself, where the ShardedKey
is determined by the unique number for a worker/thread combo and the
original key:
https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/GroupIntoBatches.java#L268
Using a more concrete example, suppose we have 100 elements and 10
worker threads, with a target batch size of 5. If the "shard" number is set to
1, we first spread the 100 elements across 10 threads, resulting in 10
elements per thread, each thread then batches the elements into 2
batches of size 5.
If the "shard" number is set to 2, the 100 elements are first divided into 2
"shards" of 50 each. Each "shard" is then distributed within the 10
threads, resulting in 5 elements per "shard" per thread. They then get
turned into 1 batch per "shard" per thread. In the end, each thread
still processes 2 batches, even though they are from 2 different "shards".
Therefore this "shard" number does not perform the horizontal partitioning
that one normally associates with sharding, and provides no
performance benefit; it only confuses the user.
It is also suggested that using withShardedKey() alone is already
sufficient to achieve auto-sharding within the keyed group; there is no
need to manually divide the input by keying elements differently based on
the "shard" number specified:
https://youtu.be/jses0W4Zalc?t=967
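For illustration, a sketch (not the actual pipeline code) of dropping the manual shard key and relying solely on withShardedKey(); the key/value types and element values are placeholders:

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.coders.KvCoder;
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.coders.VarLongCoder;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.GroupIntoBatches;
import org.apache.beam.sdk.values.KV;

public class BatchingSketch {
  public static void main(String[] args) {
    Pipeline pipeline = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());
    int batchSize = 5;
    pipeline
        .apply(
            "Create keyed elements (all under one constant key)",
            Create.of(KV.of("all", 1L), KV.of("all", 2L), KV.of("all", 3L))
                .withCoder(KvCoder.of(StringUtf8Coder.of(), VarLongCoder.of())))
        .apply(
            "Batch into groups of batchSize; withShardedKey() handles parallelism",
            GroupIntoBatches.<String, Long>ofSize(batchSize).withShardedKey());
    pipeline.run();
  }
}
```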
Some retriers are no longer needed because transactions are
automatically retried by the JPA transaction manager when there's a
transient exception.
We no longer use App Engine Remote API as of #1858.
The pipeline endpoint is only for GAE MapReduce, which we stopped using
a while ago.
TESTED=deployed to alpha and used nomulus tool built from master to
connect to the tools service.
* Add defaultPromoTokens to Registry
* Remove flyway files from this PR
* Fix merge conflicts
* Add back flyway file
* Add more info to error messages
* Change to a list
* Fix javadoc
* Change error message
* Add note to field declaration
These were always properly reflected in the actual creations, but when
running a check flow it would still show a non-zero cost even when using
an ANCHOR_TENANT allocation token. This changes it so that we accurately
show the $0.00 cost.
This is only used for contacting Datastore. With the removal of:
1. All standard usages of Datastore
2. Usage of Datastore for allocation of object IDs
3. Usage of Datastore for GAE user IDs
we can remove the remote API without affecting functionality.
This also allows us to just use SQL for every command since it's lazily
supplied. This simplifies the SQL setup and means that we remove a
possible situation where we forget the SQL setup.
So long, farewell, adios, ciao, sayonara, 再见!
TESTED=deployed to alpha and used `nomulus list_tlds` to confirm that the web app can receive and serve requests.
With newer versions of Java 11, javadoc fails to build due to unknown
tags in package-info.java files. These files are not important so we
exclude them.
Switch to using the login email address instead of GAE user ID to
identify console users. The primary use cases are:
1) When a user logs in to the registrar console, we need to figure out
which registrars they have access to (in
AuthenticatedRegistrarAccessor).
2) When a user tries to apply a registry lock, we need to know if they
can (in RegistryLockGetAction).
Both cases were tested in alpha with a personal email address to ensure
it does not get the permission merely by being a GAE admin account.
Also verified that the soy templates include the hidden login email
form field instead of the GAE user ID when registrars are displayed on the
console, and consequently that when a contact update is posted to the server,
the login email is part of the JSON payload, even though it does not
look like it is used in any way by RegistrarSettingsAction, which
receives the POST request. Like the GAE user ID, the field is hidden, so it
cannot be changed by the user from the console; it is also not used to
identify the RegistrarPoc entity, whose composite keys are the contact
email and the registrar ID associated with it.
The login email address is backfilled for all RegistrarPocs that have a
non-null GAE user ID. The backfilled addresses convert to the same IDs
as the ones stored in the database.
It doesn't actually use any App Engine libraries or code -- it's just a
generic connection with authentication to a service. This also involves
changing that block of config to be "gcpProject" instead of "appEngine"
since it's more generic.
Note: this will require an internal PR as well to change the
corresponding private config block
* Remove package token on manual transfer approval
* remove extra variables
* Add back white space
* Don't overwrite existingDomain
* Format fixes, use available helper variables
* Use PACKAGE allocation tokens in tests
* Refactor
* Fix merge conflicts
* Dont overwrite existingRecurring
* Remove names from packages on automatic transfers
* Add more tests
* Remove unnecessary local variable
* Eliminate unnecessary api call
* Reformat if blocks
* Don't overwrite existingRecurring
This PR fixes the issue where the onetime billing event for an autorenew
is not correctly created if the recurrence of the autorenew is closed
during the autorenew grace period, such as the case if a manual renew
happens during the same grace period.
The detailed analysis of the issue is captured in b/258822640. Note that
this is a quick and dirty fix to make ongoing billing event expansion work
correctly in the future. It does not fix the missing events in the past,
nor can it be used to reconstruct the missing ones (by providing a
different cursor time), due to timeouts when triggering the action from
nomulus curl.
Per Weimin, the recurrences that fit the new condition alone, based on
the current cursor, would increase from 382k to 430k, a 12% increase. I
checked last night's cron job run, which started at 22:00 EST and seemed
to finish at 22:15 EST (when the last log for this request was
recorded), so it should definitely still finish in time for the nightly
runs with the new condition.
This is not a complete removal of ofy, as we still have a dependency on it
(GaeUserIdConverter). But this PR removes it from a lot of places where
it's no longer needed.
* Fix Gradle dependency version pinning
In Gradle 7, version labels require '!!' at the end to be free from
any forced upgrade.
Hibernate min version needs to be advanced past 5.6.12, which is buggy.
Upgraded most dependencies to the latest version.
Therefore this PR also removed several classes and related tests that
support the setup and verification of Ofy entities.
In addition, support for creating a VKey from a string is limited to
VKey<? extends EppResource> only because it is the only use case (to
pass a key to an EPP resource in a web safe way to facilitate resave),
and we do not want to keep an extra simple name to class mapping, in
addition to what persistence.xml contains. I looked into using
PersistenceXmlUtility to obtain the mapping, but the xml file contains
classes with the same simple name (namely OneTime from both PollMessage
and BillingEvent). It doesn't seem like a worthwhile investment to write
more code to deal with that, when the fact is that we only need to
consider EppResource.
* Remove Cloud KMS from Nomulus Server
Removed Cloud KMS from the Nomulus (:core) since it is no longer used.
Renamed remaining classes to reflect their use of the SecretManager.
Updated the config instructions to use a new codename for the keyring:
KMS to CSM. This PR works with both codenames. Will drop 'KMS' after
the internal repo is updated.
Add an implementation of delegated credentials without using a downloaded private key.
This is a stop-gap implementation while waiting for a solution from the Java auth library.
Also added a verifier action to test the new credential in production. Testing is helpful because:
Configuration is per-environment; therefore success in alpha does not fully validate prod.
The relevant use case is triggered by low-frequency activities, so problems may not surface for hours or longer.
This PR removes all Ofy related cruft around `HistoryEntry` and its three subclasses in order to support dual-write to datastore and SQL. The class structure was refactored to take advantage of inheritance to reduce code duplication and improve clarity.
Note that for the embedded EPP resources, either their columns are all empty (for pre-3.0 entities imported into SQL), including their unique foreign key (domain name, host name, contact id) and the update timestamp; or they are filled as expected (for entities that were written since dual writing was implemented).
Therefore the check for foreign key column nullness in the various `@PostLoad` methods in the original code is a no-op, as the EPP resource would have been loaded as null. In other words, there is no case where the update timestamp is null but the other columns are not.
See the following query for the most recent entries in each table where the foreign key column or the update timestamp are null -- they are the same.
```
[I]postgres=> select MAX(history_modification_time) from "DomainHistory" where update_timestamp is null;
max
----------------------------
2021-09-27 15:56:52.502+00
(1 row)
[I]postgres=> select MAX(history_modification_time) from "DomainHistory" where domain_name is null;
max
----------------------------
2021-09-27 15:56:52.502+00
(1 row)
[I]postgres=> select MAX(history_modification_time) from "ContactHistory" where update_timestamp is null;
max
----------------------------
2021-09-27 15:56:04.311+00
(1 row)
[I]postgres=> select MAX(history_modification_time) from "ContactHistory" where contact_id is null;
max
----------------------------
2021-09-27 15:56:04.311+00
(1 row)
[I]postgres=> select MAX(history_modification_time) from "HostHistory" where update_timestamp is null;
max
----------------------------
2021-09-27 15:52:16.517+00
(1 row)
[I]postgres=> select MAX(history_modification_time) from "HostHistory" where host_name is null;
max
----------------------------
2021-09-27 15:52:16.517+00
(1 row)
```
These alternative ORMs were introduced as a way to make querying large numbers of
domains and domain histories more efficient through bulk loading from several
to-be-joined tables separately, then in-memory re-assembly of the final entity,
bypassing the need to query multiple tables each time an entity is queried.
Their primary use case was loading these entities for comparison between
Datastore and SQL during the migration, which has been completed. The
code remains unused as of now, and its existence makes refactoring and
general maintenance more complicated than necessary due to the need to keep
it up to date.
Therefore we remove the related code.
Also removed the ability to disable update timestamp auto update as it
was only needed during the migration.
Lastly, rectified the use of raw Coder in RegistryJpaIO.
Passing in an already-existing instance is an antipattern because it can
lead to race conditions where something else modified the object in
between when the pipeline loaded it and when you're saving it. The Write
action should only be writing new entities.
We cannot check IDs for the objects (some IDs are not autogenerated so
they might exist already). We also cannot call `insert` on the objects
because the underlying JPA `persist` call adds the input object to the
persistence context, meaning that any modifications (e.g.
updateTimestamp) are reflected in the input object. Beam doesn't allow
modification of input objects.
* Does not self allocate IDs in Beam by default.
Per b/250948425, it is dangerous to implicitly allow all Beam pipelines
to create buildables by self allocating the IDs. This change makes it so
that one has to explicitly request self allocation in Beam.
A boolean is added to the pipeline option so that it can be passed to
the beam worker initializer that controls the behavior of the JVM on
each worker. Note that we did not add the option to the metadata.json file
because we did not want people to use the override at run time when launching
a pipeline, due to the risk. As shown in RdePipeline.java, we instead
explicitly hard-code the option in the pipeline. There is nothing that
stops one from supplying that option when launching the pipeline, but it's
not advised.
Tested=deployed the pipeline to alpha and ran it.
* Properly create and use default credential
This PR consists of the following changes:
- Stopped adding scopes to the default credential when using it to access other
non-workspace GCP APIs. Scopes are not needed here.
- Started applying scopes to the default credential when using it to access
Drive and Sheets APIs.
- Upgraded Drive access from the deprecated credential lib to the
up-to-date one
- Switched Sheet access from the exported json credential to the
scoped default credential.
This PR requires that the affected files be writable to the default
service account (project-name@appspot.gserviceaccount.com) of the
project.
- This is already the case for exported files (premium terms, reserved
terms, and domain list).
- The registrar sync sheets in alpha, sandbox, and production have been
updated with the new permissions.
All impacted operations have been tested in alpha.
* Properly create and use default credential
This PR consists of the following changes:
- Added a new method to generate scope-less default credential when using it to
access other non-workspace GCP APIs. Scopes are not needed here.
- Started to use the new credential in the SecretManager.
- Will migrate other usages to this new credential gradually.
- Marked the old DefaultCredential as deprecated.
- Started applying scopes to the default credential when using it to access Drive
and Sheets APIs.
- Upgraded Drive access from the deprecated credentials lib
- Switched Sheet access from the exported json credential to the scoped
default credential.
This PR requires that the affected files be writable to the default service
account (project-name@appspot.gserviceaccount.com) of the project.
- This is already the case for exported files (premium terms, reserved terms,
and domain list).
- The registrar sync sheets in alpha, sandbox, and production have been
updated with the new permissions.
All impacted operations have been tested in alpha.
This reverts commit 1ab077d267.
Apparently the new version of Spinnaker that is compatible with this doesn't
work for our release, so we need to roll this back for now. (Again!)
We started getting failures because some of the tests used October. In
general we should freeze the clock for testing as much as possible.
Same thing with the Get*Commands
Basically, any time we're loading a bunch of linked objects that might
change, we want to have REPEATABLE_READ so that another transaction
doesn't come along and smush whatever we think we're loading.
The following instances of READ_COMMITTED haven't changed:
- RdePipeline (it only loads immutable objects like histories)
- Invoicing pipeline (only immutable objects like BillingEvents)
- Spec11 (doesn't use any linked info from Domain)
This also changes the PersistenceModule to use REPEATABLE_READ by
default on the replica JPA TM, for the standard reasoning.
This runs over all domains that weren't deleted as of September 5. This
will fix most of b/248112997, which is itself caused by b/245940594 --
creating synthetic history objects means that the RDE pipeline will look
at those instead of the potentially-no-longer-valid data in the old
history objects.
Also fixed a bug introduced in #1785 where identity checks were performed instead of equality checks. This resulted in two sets containing the same elements not being regarded as equal and subsequent DNS updates being unnecessarily enqueued.
The old class is modeled after Datastore, with some logic jammed in for it to work with SQL as well. As of #1777, the ofy-related logic is deleted; however, the general structure of the class remains Datastore-oriented.
This PR refactors the existing class into a ForeignKeyUtils helper class that does away with the index subclasses and provides static helper methods to do the same, in a SQL-idiomatic fashion.
Some minor changes are made to the EPP resource classes to make it possible to create them in a SQL only environment in tests.
`TransferData` is currently a generic class with a complicated type parameter that designates the `Builder` class of its concrete subclass, in order to facilitate returning said `Builder` from an instance loosely typed to the superclass (`TransferData`) itself.
While this works, in almost all places where a `TransferData` is used, the raw, un-generic type is declared, resulting in a lot of warnings, not to mention the fact that type safety is not actually checked when the raw type is used.
In this PR, we make it so that the concrete `Builder` is returned through a protected abstract method that is implemented by the subclasses. The type information therefore no longer needs to be embedded in the superclass type signature, and reflection is not necessary to create the `Builder` either. Overall, this makes `TransferData` a much cleaner class without the messiness of generics.
This uses the GoogleIdTokenVerifier to verify ID tokens passed in
(presumably from a front end) via cookies. This isn't used anywhere yet
but it will be used for front-end API calls for the new console.
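A rough sketch of what such verification looks like with that class (the audience value and helper are illustrative assumptions, not the actual console code):

```java
import com.google.api.client.googleapis.auth.oauth2.GoogleIdToken;
import com.google.api.client.googleapis.auth.oauth2.GoogleIdTokenVerifier;
import com.google.api.client.http.javanet.NetHttpTransport;
import com.google.api.client.json.gson.GsonFactory;
import java.util.Collections;

final class IdTokenCookieVerifierSketch {
  /** Returns the verified email address, or null if the token is invalid. */
  static String verify(String rawIdTokenFromCookie) throws Exception {
    GoogleIdTokenVerifier verifier =
        new GoogleIdTokenVerifier.Builder(
                new NetHttpTransport(), GsonFactory.getDefaultInstance())
            .setAudience(Collections.singletonList("oauth-client-id.apps.googleusercontent.com"))
            .build();
    GoogleIdToken idToken = verifier.verify(rawIdTokenFromCookie);
    return idToken == null ? null : idToken.getPayload().getEmail();
  }
}
```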
First, we create a sequence of User IDs in Postgres and assign it to the
User ID field, meaning that Hibernate can autogenerate IDs.
Next, add an update timestamp.
Next, add a constraint that we can't have multiple Users with the same
email address.
Finally, create a DAO since we'll usually want to query by that email
address (at least for now).
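As a sketch of the mapping this describes (illustrative only; the real User entity has more fields, extends the project's base classes, and the sequence/column names are assumptions):

```java
import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;
import javax.persistence.SequenceGenerator;
import javax.persistence.Table;
import javax.persistence.UniqueConstraint;

// Sequence-generated Long ID plus a uniqueness constraint on the email address.
@Entity
@Table(uniqueConstraints = @UniqueConstraint(columnNames = "emailAddress"))
class ConsoleUser {
  @Id
  @SequenceGenerator(name = "user_id_seq", sequenceName = "user_id_sequence", allocationSize = 1)
  @GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "user_id_seq")
  Long id;

  @Column(nullable = false)
  String emailAddress;
}
```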
FKI used to be persisted in datastore to help speed up loading by foreign key.
Now it is just a helper class to do the same thing in SQL because
indexing is natively supported in SQL.
- Create a sequence to generate IDs for the user (this allows us to have
Long ID types so that Hibernate can autogenerate IDs)
- Add an update timestamp column so we can extend BackupGroupRoot
- Add a restriction that there can't be multiple users with the same
email address
* Rename ContactResource -> Contact
This is a follow-up to PR #1725 and #1733. Now all EPP resource entity class
names have been rationalized to match with their SQL table names.
This means that LegacyAuthenticationMechanism (or a to-be-created
OAuth2AuthenticationMechanism) can return a UserAuthInfo object that
contains either the GAE User or the console User as appropriate. The
goal is that the non-auth flows shouldn't have to know about which user
type it is. Note: the registry lock flow (for now) needs to know about
the separate types of auth because it is a separate level of auth from
the standard AuthenticatedRegistrarAccessor.
The AuthenticatedRegistrarAccessor code is a bit odd because the new
role system doesn't quite fit neatly into the old registrar ->
OWNER,ADMIN system but this is a fine approximation. Basically, any
new registrar role will map to the old OWNER role.
* Add the PackagePromotion table
* Add long id
* Add NOT NULL
* fix formatting
* make package price non null
* Add not nulls to java file
* Fix broken tests from merge conflicts
Also deletes the autorenew poll message history revision id field in
Domain, which is only needed to recreate the ofy key for the poll
message. The column already contains null values in it, making it
impossible to depend on it. The column itself will be deleted from the
schema after this PR is deployed.
The logic to update the autorenew recurrence end time is changed
accordingly: when a poll message already exists, we simply update the
end time, but when it no longer exists, i.e. when it was deleted
speculatively after a transfer request, we recreate one using the
history entry id that resulted in its creation (e.g. a cancelled or rejected
transfer).
This should fix b/240984498. Though the exact reason for that bug is
still unclear to me. Namely, it throws an NPE at this line during an
explicit domain transfer approval:
https://cs.opensource.google/nomulus/nomulus/+/master:core/src/main/java/google/registry/flows/domain/DomainFlowUtils.java;l=603;bpv=1;bpt=0;drc=ede919d7dcdb7f209b074563b3d449ebee19118a
The domain in question has a null autorenewPollMessageHistoryId, but
that in itself should not have caused an NPE because we are not
operating on the null pointer. On that line the only possible way to
throw an NPE is for the domain itself to be null, but if that were the
case, the NPE would have been thrown at line 599 where we called a
method on the domain object.
Regardless of the cause, with this PR we are using an explicitly
provided history id and checking for its nullness before using it. If a
similar issue arises again, we should have a better idea why.
Lastly, the way the poll message id is constructed is greatly simplified in
PollMessageExternalKeyConverter as a result of the removal of ofy parent
keys in PollMessage. This does present a possibility of failure when,
immediately before deployment, a registrar requests a poll message and
receives the old id, but by the time the registrar acks the id, the new
version is deployed and therefore does not recognize the old key. The
likelihood of this happening should be slim, and we could have prevented
it by letting the converter recognize both the old and the new key.
However, we would like to eventually phase out the old key, and in
theory a registrar could ack a poll message at any time after it was
requested. So there is no safe time by which all the old ids will have been
acked, unless we develop some elaborate scheme to keep track of which
messages were sent with an old id when requested and which of those old
ids have been acked. Only then could we be truly safe to phase out the old id.
The benefit does not seem to warrant the effort. If a registrar does
encounter a situation like this, they could open a support bug to have
us manually ack the poll message for them.
Basically, what's happening here is that some flow tests are adding
things to the claims list cache which is stored statically, meaning that
some other tests can pick those up when they shouldn't. By adding the
extension in RFTC, it'll clear out the caches after each test.
* Allow anchor tenant creation via allocation token behavior
This also enforces that non-superusers cannot create registrations on
trademarked names prior to the sunrise period, even if they have an
allocation token with ANCHOR_TENANT behavior.
* Create a registry lock permission and corresponding account manager role
This allows us to distinguish between standard account managers and
users that might have the registry lock permission. This will make the
registry lock password-setting flow easier (user can reset their
password iff they have the REGISTRY_LOCK permission, instead of having a
separate boolean) and allows us to easily determine whether or not a
user should have access to registry lock views in the UI.
We cache the ClaimsList Java object for six hours (we don't expect it to
change frequently, and the cron job to update it only runs every twelve
hours). Subsequent calls to ClaimsListDao::get will return the cached
value.
Within the ClaimsList Java object itself, we cache any labels that we
have retrieved. While we already have a form of a cache here in the
"labelsToKeys" map, that only handles situations where we've loaded the
entire map from the database. We want to have a non-guaranteed cache in
order to make repeated calls to getClaimKey fast.
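A rough sketch of the two layers described above, with assumed helper names and a plain map standing in for the real ClaimsList entity (the actual implementation lives in the DAO and entity classes):

```java
import com.google.common.base.Supplier;
import com.google.common.base.Suppliers;
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;
import com.google.common.collect.ImmutableMap;
import java.util.Optional;
import java.util.concurrent.TimeUnit;

final class ClaimsListCachingSketch {
  // Layer 1: memoize the whole claims list for six hours.
  static final Supplier<ImmutableMap<String, String>> CACHED_LIST =
      Suppliers.memoizeWithExpiration(
          ClaimsListCachingSketch::loadEntireListFromDb, 6, TimeUnit.HOURS);

  // Layer 2: cache individual label -> claim key lookups so repeated
  // getClaimKey() calls don't each hit the database.
  final LoadingCache<String, Optional<String>> claimKeyCache =
      CacheBuilder.newBuilder()
          .maximumSize(10_000)
          .build(CacheLoader.from(this::loadSingleClaimKeyFromDb));

  // Assumed stand-ins for the real DAO queries.
  static ImmutableMap<String, String> loadEntireListFromDb() {
    return ImmutableMap.of();
  }

  Optional<String> loadSingleClaimKeyFromDb(String label) {
    return Optional.empty();
  }
}
```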
* Use the new renewal price logic in transfer flow
* Fix build
* Add renewal handling on all transfer flows
* Merge branch 'master' into transfer-retain-renewal-price
* Merge branch 'master' into transfer-retain-renewal-price
* Add more tests
* Add base object classes for new user/role permissioning model
- Adds the permissions themselves
- Adds the six roles that a user may have -- three global, three
per-registrar
- Adds the mapping from role -> set of permissions
- Adds a UserRoles object to encapsulate the answer to the question of
"does this user have this permission?"
- Adds a User class as a base to show how we will use the new UserRoles
object
Core.jar is no longer included in the generated WAR, even though the deploy_jar
configuration is specified as a dependency.
I could not figure out a way to tweak the configuration dependency to
have core.jar pulled into the .war, so I decided to just explicitly pick
it from its known location.
TESTED=deployed to alpha and verified that the instances can start.
* Rename DomainContent -> DomainBase
This is a follow-up to PR #1725 which renamed DomainBase to Domain. Now, the
class naming hierarchy has the same structure as ContactBase/HostBase.
* Rename DomainBase -> Domain
This was a long time coming, but we couldn't do it until we left Datastore, as
the Java class name has to match the Datastore entity name.
Subsequent PRs will rename ContactResource to Contact and HostResource to Host,
so that everything matches the SQL table names (and is shorter!).
* Merge branch 'master' into rename-domainbase
The fix is based on b/240627423. I tested locally and was able to build
with the -PenableCrossReferencing=true flag successfully.
TESTED=run the kythe GCB pipeline locally.
This PR turns out to be more massive than I would have liked, but it
deals with all billing event related stuff, which is more or less all
intertwined:
* Remove all billing events as Ofy entities.
* Add a temporary annotation to allow BillingEvent's ID to be
auto-allocated by ofy while not lacking the Ofy @Id annotation.
* Remove Modification, which is only used in ofy.
* Remove BillingVKey, as we do not need to store the ofy key parent
information anymore. The VKey for a billing event now just contains
its primary key, and can be converted by VKeyConverter.
* Remove BigQuery related code in the billing pipeline.
Note that after BillingVKey is removed, several columns in
BillingCancellation are no longer needed. The change to database schema
will be handled in https://github.com/google/nomulus/pull/1721 after
this PR is deployed to production.
* Revert "Upgrade App Engine Standard to Java 17 w/ bundled APIs (#1714)"
This partially reverts commit d8e77e2ab2 (it keeps
intact unrelated version upgrades).
We need to temporarily revert this because Spinnaker isn't quite yet playing
nice with the new <app-engine-apis> configuration option in appengine-web.xml
(it seems like this was added recently and Spinnaker is still stuck on App
Engine SDK version 1.9.82 which predates it). Hopefully we can get that
dependency updated in Spinnaker soon and then we can re-upgrade to Java 17.
* Flyway files for adding allocationToken column to Domain table
* Rename column to current_package_token
* Update er diagram
* Add foreign key to DomainHistory
This shouldn't matter for billing or anything like that because the
actual actions performed that month are still correct, but before this
PR we were including all domains ever created in the total_domains number,
including deleted domains.
* Upgrade App Engine Standard to Java 17 w/ bundled APIs
Note that this doesn't yet upgrade our actual Gradle scripts to use a more
recent version of Java (that will happen separately); this solely affects the
GAE instances.
I followed the instructions here:
https://cloud.google.com/appengine/docs/standard/java-gen2/services/access
And note that I removed threadsafe true from appengine's XML config because
that doesn't do anything anymore and was just throwing errors (the new
instances handle multiple requests in parallel by default, no configuration
necessary).
* Convert to gradle 7.
* More fixes, regenerated lockfiles.
* Update lockfiles for dependency update.
* Fix show_upgrade_diff for new lockfile format
* Add property for allowInsecureProtocol
Allow us to override the restriction against use of plain HTTP for
communication to dependency repositories. We need this to be able to use a
local proxy for dependency gathering.
* Checking in missing gradle.lockfile
* Split failing dns update batches and kill after 10 retries
* format fixes
* Add another test
* Switch to CloudTasks
* Change back to app engine header
* Change to immutableList and other changes
* Change to optional header
* Add bug ID to todo
* Switch to constructor injection
* Remove old queue
* Set response status
* Change to Optional<Integer>
* Rename action status
* Switched to use CloudTasksHelper
* Remove spy in test
This is, as of now, unused but we can use it for b/237683906 and
b/237800445 in the future to allow for special behavior dictated by
allocation tokens rather than having to reserve specific domains.
Note that we enforce a tied domain for ANCHOR_TENANT tokens (because
they should be matched to a domain) but not for BYPASS_TLD_STATE tokens.
This deletes LastSqlTransaction and ReplayGap in Datastore, as well as
SqlReplayCheckpoint and TransactionEntity in SQL. These are all related
to replay and are no longer used.
* Remove columns for unused ofy key reconstitution
Remove the columns in Domain, DomainHistory, GracePeriod and
GracePeriodHistory that were only used for Ofy key reconstitution.
All uses of these were removed in #1660.
* Add forgotten flyway file.
* Re-add database migration state commands
These were removed in PR #1661, but we do still need them for the time being
until we complete the ID migration as well.
This uses the Apache Commons CSV parser instead of rolling our own.
Annoyingly, the results that we're given aren't exactly proper CSVs
since they have a non-standard line of data at the top, and the header
is actually the second line.
Fortunately this no longer requires a log-in, we can just send a GET
request and receive a CSV result in return.
This also adds the apache-commons CSV parser to the dependencies.
See https://b.corp.google.com/issues/237784559 for more details
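A hedged sketch of the parsing described above (the actual feed fields are not shown, and the helper is an assumption): drop the non-standard first line, then let Commons CSV treat the next line as the header.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.List;
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;
import org.apache.commons.csv.CSVRecord;

final class CsvFeedParserSketch {
  static List<CSVRecord> parse(String responseBody) throws IOException {
    // Skip the non-standard preamble line; the real header is the second line.
    String withoutPreamble = responseBody.substring(responseBody.indexOf('\n') + 1);
    try (Reader reader = new StringReader(withoutPreamble);
        CSVParser parser = CSVFormat.DEFAULT.withFirstRecordAsHeader().parse(reader)) {
      return parser.getRecords();
    }
  }
}
```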
* Resolve conflict
* Fix setup for existing test cases in info and check flow
* Revise info flow test cases
* Fix lint
* Merge branch 'master' into handlefeerequest-renew
* Address code review comments myself
* Merge branch 'master' into handlefeerequest-renew
* Get test passing
* Add check flow tests
* Format, consolidate test helpers
* Don't unnecessarily specify XML name
Now that ofy keys aren't necessarily being restored, it seemed prudent to
audit existing uses to verify that we aren't relying on any keys that may now
be null.
This fixes one case that appeared to be potentially problematic (in
ResourceFlowUtils), removes a few methods that call getOfyKey() but are no
longer used, and adds comments to one use of the key that appears to be safe
after visual inspection.
This includes:
- deletion of helper DB methods in tests
- deletion of various old Datastore-only classes and removal of any
endpoints
- removal of the dual-database test concept
- removal of 'ofy' from the AppEngineExtension
This removes the code that converts between ofy fields and SQL fields in DomainContent and a number of related core classes (basically anything that also needed modification to support the removal from DomainContent).
Cursor was originally envisioned to support arbitrary ImmutableObject
scopes. However, in practice only the Registry scope is used. The SQL
representation of Cursor assumes that, and the schema uses a composite ID
with a string column for the primary key of the scope object. Without a
schema migration to persist the VKey of the scope, we cannot support any
ImmutableObject other than those with a primitive string primary key.
Given the complexity involved and the limited use case, the scope is now
explicitly limited to Registry only.
Also removed mapreduces that depend on Ofy keys of Cursors, and made
some code quality improvements based on IntelliJ suggestions on modified
files.
These tests use Ofy exclusively and should not run anymore, as any class
they test also uses Ofy and should be deleted.
More importantly, running tests in Ofy mode makes it hard to remove Ofy
from entities, especially Registrar and RegistrarContact, as some of
them are created as canned data when tests are initiated, and the
creation would fail if they are not registered as Ofy entities.
It is therefore a prerequisite to disable these tests before we can
remove Ofy from those entities. We could have deleted them, but I think
that should be done when the corresponding classes tested by them are
deleted.
GPG1 is deprecated and stuck in v1.4 from 2018; GPG2 is recommended. We
only use the GPG binary in tests, but when the host system has both
versions it causes problems: we hardcode the GPG import command
in GpgSystemCommandExtension to use the binary named "gpg", which could
be linked to either GPG1 or GPG2, causing tests to fail when the version
of GPG that runs in tests is incompatible with the version of GPG
that imports the keys.
With this PR we only support GPG2 from now on.
Added methods, mirroring those in AppEngineExtension, that preload some canned
data. This data is loaded by default and a lot of tests rely on it. As
we migrate away from App Engine, it is helpful to have these methods in the JPA
test extension, which will replace the App Engine extension that is
used to set up the database in dual-database tests.
I'm not 100% sure that this is strictly necessary, but for now we can
replicate the ability to generate zonefiles for any point in time in the
recent past.
* Migrate ReadDnsQueueAction to use CloudTasksUtils
Also marked TaskQueueUtils as deprecated and fixed a few linter errors.
Note that DNS pull queue still requires the use of the GAE Task Queue API.
* Fix a test failure
* Remove TaskQueueUtils from VKeyTest
* Remove the @error exception that was inadvertently pulled in
One of the more significant changes introduced in this PR is that we use
SQL as the backing database in all tests unless otherwise specified,
e.g. by using the TmOverrideExtension. We change various ofy-related
tests to use this.
This includes various changes:
- Deletion of SqlEntity/DatastoreEntity and related classes. Includes
any necessary changes because of that (e.g. getting a nice SQL key on
error in RegistryJpaIO).
- Deletion of classes that used libraries from the init-sql code
(RefreshDnsOnHostRenameAction)
- Removal of the JpaTransactionManager's backup implementation
- Modification of RegistryJpaWriteTest to not use init-sql code
- Removal of the Transaction class and related classes; however, this does
not remove the TransactionEntity class, as that would require DB
changes
- Removal of anything related to the actual usage of the database
migration schedule or read-only phases
- Various test changes and fixes to account for the differences in SQL
(like how foreign keys need to exist)
This deliberately doesn't do anything to alter the objects actually
stored in the DB yet, just how we use them
This is the only user of the ofy code that will stick around at least
until we move to the new registrar console. By removing references to
the transaction manager, we will be able to delete all the tm code
without interfering with this.
* Remove version pin for java-diff-utils dependency
The latest version of the lib introduces a small behavior change/bug fix:
it no longer ignores empty lines, which actually makes sense.
Update the test data to reflect this change.
The main purpose of this PR is to help debug b/234189023, where a
registrar reported that in sandbox they observed seemingly successful EPP
update responses for deleting NS records, even though the records were not
actually deleted after the commands executed.
To actually load the persisted domain resource after an update would
require us to execute another transaction immediately after the update
transaction, and that can only be achieved outside the flow (i.e. in
FlowRunner or EppController); we would also need to test for the type of flow
before logging, which seems unnecessarily complex.
For now we are just adding logs inside the update transaction itself to
validate that:
1. The NS records to delete are as expected.
2. The current NS records are as expected.
3. The new NS records to persist are as expected.
The EPP success reply is the default reply when no errors are thrown in
a transaction. If we see a success reply (which means that the
transaction finished successfully) and the expected logs from the transaction,
the only explanation could be that somewhere in the ORM layer the Java
representation of the entity is different from what is being
presented to the database. I think that would signal a much bigger and
more fundamental problem, which is quite unlikely given how isolated the
issue under consideration is.
In any case we would like to add the logging functionality in sandbox and ask
the registrar to report again when they see similar issues.
Also made some typo and linting fixes.
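A minimal sketch of the kind of in-transaction logging described above, using Flogger (FluentLogger), which Nomulus uses elsewhere; the class, method, and argument names here are hypothetical:

```java
import com.google.common.collect.ImmutableSet;
import com.google.common.flogger.FluentLogger;

// Hypothetical illustration of the three log lines described above; the actual
// flow class and fields in Nomulus may differ.
class NameserverUpdateLogger {
  private static final FluentLogger logger = FluentLogger.forEnclosingClass();

  void logNameserverChange(
      String domainName,
      ImmutableSet<String> toRemove,
      ImmutableSet<String> current,
      ImmutableSet<String> resulting) {
    // 1. The NS records requested for deletion.
    logger.atInfo().log("Domain %s: nameservers to remove: %s", domainName, toRemove);
    // 2. The NS records currently on the domain.
    logger.atInfo().log("Domain %s: current nameservers: %s", domainName, current);
    // 3. The NS records that will be persisted by this transaction.
    logger.atInfo().log("Domain %s: resulting nameservers: %s", domainName, resulting);
  }
}
```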
* Fix some small transactional issues in SQL mode
These weren't caught until I switched the default database type in tests
to be SQL (separate PR). Fortunately these don't seem to be catastrophic
This includes:
- removing the actions that do the replay
- removing the tests for the replay
- removing the ReplayExtension and adjusting the various tests that used
it appropriately
- removing functionality relating to "things that happen during replay",
e.g. beforeSqlSaveOnReplay
This does not include:
- removing the InitSqlPipeline or similar tasks
- removing e.g. SqlEntity (it's used in other places)
- removing Transforms/RegistryJpaIO and other SQL-pipeline-creation code
* Remove brackets around varname in CloudBuild script
Due to a Spinnaker restriction: it cannot handle variable references where the variable name has brackets around it.
Added the Spinnaker error message to the comments.
This included removing ofy-specific code from various tests. Also, some
of the other tests (e.g. RdapDomainActionTest) had to be configured to
use only SQL -- otherwise, as it currently stands, they were trying to
use ofy.
We also delete the CreateSyntheticHistoryEntriesAction and pipeline
because they're no longer relevant, and impossible to test (the goal of
the actions was to create objects in ofy, which doesn't happen any
more).
We're running into issues pulling 2.1.3 from Maven, possibly due to
vulnerabilities in dependencies, so this updates it to the most recent
version, 2.2.6.
* Add a new recurrenceLastExpansion column to the BillingRecurrence table
This will be used to determine which recurrences are in scope for
expansion. Anything that has already been expanded within the past year is
automatically out of scope.
Note that the default value is set to just over a year ago so that, initially,
everything is in scope for expansion, and then it will gradually be winnowed down
over time so that most recurrences at any given point are out of scope. Newly
created recurrences (after the subsequent code change goes in) will have their
last expansion time set to the event time at which the recurrence is
written, such that they'll first be considered in-scope precisely one year
later.
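A sketch of the in-scope predicate this column implies, using Joda-Time as elsewhere in the codebase; the class and method names are illustrative, not the actual implementation:

```java
import org.joda.time.DateTime;

// Hypothetical sketch: a recurrence is a candidate for expansion only if it has
// not been expanded within the past year.
class ExpansionScopeSketch {
  static boolean isInScopeForExpansion(DateTime recurrenceLastExpansion, DateTime now) {
    return recurrenceLastExpansion.isBefore(now.minusYears(1));
  }
}
```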
* Summarize schema-related tests
Document existing schema-related tests including presubmit tests and
the schema-verify predeployment test newly added to Spinnaker.
* Inject a DomainPricingLogic into ExpandRecurringBillingEventsAction
This will be used in other PRs to set the renewal price correctly based on the
renewal price behavior of the BillingRecurrence event.
Note that, in order for this to work, a not-null constraint has been lifted on
the EPP flow state when the DomainPricingCustomLogic is being constructed, as
the pricing here will occur in a backend action outside the context of any EPP
flow.
* Slightly improve performance of ExpandRecurringBillingEventsAction
We don't need to log every single no-op batch of 50 Recurrences that are
processed (considering we have 1.5M total in our system), and we also don't need
to process Recurrences that already ended prior to the Cursor time (this gets us
down to 420k from 1.5M).
* Disable Ofy tests.
This change just turns off the Ofy tests at the root, by removing processing
for dual tests and disassociating the TestOfyOnly annotation from test
annotations.
This is far less comprehensive than #1631, but it's probably worth entering as
a stopgap solution just because it should speed up our test runs and unblock a
lot of other cleanup work.
* Fix DualDatabaseTestInvocationContextProviderTest
* Optimize RDAP entity event query
For each EPP entity, directly load the latest HistoryEntry per event type
instead of loading all events through the HistoryEntryDao.
Although most entities have a small number of history entries, there are
a few entities with many entries, enough to cause OutOfMemory errors.
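A rough sketch of the per-type "latest history entry" query described above, written against plain JPA; the entity and field names (HistoryEntry, repoId, type, modificationTime) are assumptions for illustration and may not match the actual schema:

```java
import java.util.List;
import javax.persistence.EntityManager;

// Hypothetical illustration: load only the most recent history entry of a given
// type for one EPP entity, instead of loading every entry for that entity.
class LatestHistoryEntrySketch {
  static Object loadLatestOfType(EntityManager em, String repoId, String type) {
    List<?> results =
        em.createQuery(
                "SELECT h FROM HistoryEntry h"
                    + " WHERE h.repoId = :repoId AND h.type = :type"
                    + " ORDER BY h.modificationTime DESC")
            .setParameter("repoId", repoId)
            .setParameter("type", type)
            .setMaxResults(1)
            .getResultList();
    return results.isEmpty() ? null : results.get(0);
  }
}
```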
We have backend max-instances set to 100, which apparently exceeds the default
quota for GAE. Add info to the configuration doc on updating the quota or
changing this parameter.
* Add batching to ExpandRecurringBillingEventsAction
It's OOMing when trying to load every single BillingRecurrence that needs to be
expanded simultaneously (which is to be expected), so this processes them in
transactional batches of 50.
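A minimal sketch of the transactional batching described above; `inTransaction` stands in for the registry's transaction wrapper and is not a real Nomulus API:

```java
import com.google.common.collect.Iterables;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch: process recurrences 50 at a time, each batch in its own
// transaction, instead of loading everything at once.
class BatchProcessorSketch<T> {
  private static final int BATCH_SIZE = 50;

  void processInBatches(List<T> recurrences, Consumer<List<T>> inTransaction) {
    for (List<T> batch : Iterables.partition(recurrences, BATCH_SIZE)) {
      inTransaction.accept(batch);
    }
  }
}
```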
* Cleanup gpg-agent instances and home directories
The GpgSystemCommandExtension leaks home directories, but more importantly it
leaks gpg-agent instances. This can cause problems with inotify limits, since
the agent seems to make use of inotify. Do a proper cleanup in afterEach() (a
sketch follows below).
* Don't fail if we can't kill the agent
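A hedged sketch of what such an afterEach() cleanup might look like: ask the gpg-agent bound to the test's temporary GNUPGHOME to exit via `gpgconf`, then delete the home directory. The class and method names are hypothetical, and (per the follow-up above) a failure to kill the agent is not treated as fatal:

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

// Hypothetical cleanup sketch, not the actual GpgSystemCommandExtension code.
class GpgAgentCleanupSketch {
  static void cleanUp(Path gpgHomeDir) throws IOException, InterruptedException {
    // Ask the agent for this home directory to exit.
    Process kill =
        new ProcessBuilder(
                "gpgconf", "--homedir", gpgHomeDir.toString(), "--kill", "gpg-agent")
            .start();
    // A non-zero exit here is deliberately ignored rather than failing the test.
    kill.waitFor();
    // Delete the temporary home directory, children first.
    try (Stream<Path> paths = Files.walk(gpgHomeDir)) {
      paths.sorted(Comparator.reverseOrder()).map(Path::toFile).forEach(File::delete);
    }
  }
}
```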
We'll delete the associated code soon enough too, but it's safer to delete the
cron jobs first and run in that state for a week, so we can still run them
manually if need be.
* Reduce the number of manually scaled instances for default/pubapi
This is in the spirit of "not always running significantly over-provisioned",
which helps to save costs and also expose potential scaling issues when they are
still small rather than all at once when they're a big problem.
This can always be reverted if necessary, and can be instantaneously adjusted by
running the `nomulus set_num_instances` command.
* Don't enforce billing account map check on TEST TLDs
This was affecting monitoring (i.e. prober TLDs). Note that test TLDs are
already excluded from the billing account map check in the Registrar builder()
method (see PR #1601), but we forgot to make that same test TLD exclusion in the
EPP flows check (see PR #1605).
* Add test for Java 8 Compatibility
Add a test to check for Java 8 compatibility of jars deployed to
AppEngine.
It is not enough to run existing tests with a Java 8 VM, since many API
jars are not exercised by tests, for example those for GCP services
like the SecretManager.
We take the conservative approach and verify that every class in every
jar is compiled for Java 8.
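One way to implement such a check is to read the class-file major version of every `.class` entry in each jar and verify it is at most 52 (Java 8). The following is an illustrative sketch, not the actual test:

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Enumeration;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical sketch: fail if any class in the jar targets a bytecode level
// newer than Java 8 (class-file major version 52).
class Java8CompatibilityCheckSketch {
  static void checkJar(String jarPath) throws IOException {
    try (JarFile jar = new JarFile(jarPath)) {
      Enumeration<JarEntry> entries = jar.entries();
      while (entries.hasMoreElements()) {
        JarEntry entry = entries.nextElement();
        if (!entry.getName().endsWith(".class")) {
          continue;
        }
        try (InputStream in = jar.getInputStream(entry);
            DataInputStream data = new DataInputStream(in)) {
          if (data.readInt() != 0xCAFEBABE) {
            continue; // Not a valid class file; skip.
          }
          data.readUnsignedShort(); // minor version
          int major = data.readUnsignedShort();
          if (major > 52) {
            throw new IllegalStateException(
                entry.getName() + " requires a newer JVM (major version " + major + ")");
          }
        }
      }
    }
  }
}
```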
We would like to re-use the build cache when building RCs for different
environments. There's not much practical use in doing a "clean" for
every build when Gradle should be able to figure out which artifacts
need to be rebuilt. It also does not make sense to build each
environment in a separate step, which also introduces redundancy because
not all artifacts are cached across steps. The build cache is enabled by
default.
Lastly, the cache needs to be inside the /workspace folder, which is the
default persisted storage location.
TESTED=tried to build the RCs on alpha and saved about 10 min.
* Add missing transaction for whois lookups
Nameserver whois lookups are failing under SQL for hosts with superordinate
domains because the query in this case is not done in a transaction. We
missed this during testing because a) we didn't have a test for lookups of
hosts with superordinate domains and b) we missed converting
NameserverWhoisResponseTest to a DualDatabaseTest.
This PR fixes the problem and adds the requisite testing.
* Use a single transaction to get host registrars
* Replace streaming with Maps.toMap()
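A small sketch of the Maps.toMap() pattern mentioned above: build the host-to-registrar map in one pass rather than with a hand-rolled stream/collector chain. The lookup function here is a hypothetical stand-in for whatever actually loads the registrar:

```java
import com.google.common.collect.ImmutableMap;
import com.google.common.collect.ImmutableSet;
import com.google.common.collect.Maps;
import java.util.function.Function;

// Hypothetical illustration of Maps.toMap(); not the actual whois lookup code.
class HostRegistrarLookupSketch {
  static ImmutableMap<String, String> registrarsForHosts(
      ImmutableSet<String> hostNames, Function<String, String> lookupRegistrarIdFor) {
    // Keys are the host names; values come from the (assumed) lookup function.
    return Maps.toMap(hostNames, lookupRegistrarIdFor::apply);
  }
}
```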
* Downgrade dependencies that no longer support Java 8
Downgrade two dependencies whose latest versions no longer support
Java 8.
A follow-up PR will add Java 8 compatibility to presubmit tests.
We have recently started to routinely breach the 1h timeout. Increasing
this value to 2h. We should also look into reusing the artifacts when
building RCs for different environments.
* Use Gradle dependency dynamic versioning
Use dynamic versioning for Gradle dependencies when possible.
Please refer to go/dr-dependency-upgrade for more information about the
automation plan.
This PR calls out all dependencies that must be pinned to specific
versions for various reasons. The remaining ones are converted to
open-ended version ranges ("[version_str,)").
* Check PAK on domain create
* Add unit test
* update docs
* Remove unnecessary setup
* Fix blank line
* Add check and test to all relevant flows
* Change error message
* Ignore version UIDs during txn deserialization
When deserializing transactions for replay to datastore, ignore class version
UIDs that don't match those of the local classes and just use the local class
descriptors instead. This is a simple solution for the problem of persisted
VKeys containing references to classes where the class has been updated and
the serial version UID has changed.
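One common way to do this is to override ObjectInputStream#readClassDescriptor() so that the local class's descriptor is used when one exists; the sketch below illustrates the technique and is not the actual Nomulus implementation:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectStreamClass;

// Hypothetical sketch: prefer the local class descriptor so a mismatched
// serialVersionUID does not cause an InvalidClassException during replay.
class LenientObjectInputStream extends ObjectInputStream {
  LenientObjectInputStream(InputStream in) throws IOException {
    super(in);
  }

  @Override
  protected ObjectStreamClass readClassDescriptor() throws IOException, ClassNotFoundException {
    ObjectStreamClass streamDescriptor = super.readClassDescriptor();
    Class<?> localClass;
    try {
      localClass = Class.forName(streamDescriptor.getName());
    } catch (ClassNotFoundException e) {
      return streamDescriptor; // No local class; fall back to the stream's descriptor.
    }
    ObjectStreamClass localDescriptor = ObjectStreamClass.lookup(localClass);
    // Use the local descriptor (and its UID) whenever the class exists locally.
    return localDescriptor != null ? localDescriptor : streamDescriptor;
  }
}
```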
Also add a "replay_txns" command that replays the transactions from a given
start point so we can verify all transactions are deserializable.
TESTED:
Ran replay_txns against all transactions on sandbox beginning with
transaction id 1828385, which includes Recurring billing events containing
both the old and the new serial version UIDs.
* Change check for root directory during rollback
`rollback_tool` tries to infer the root of the nomulus tree by checking for a
directory named "nomulus". This is potentially problematic (and, indeed, was
for me) since there is no guarantee what that directory will be named.
There are a number of features that characterize the root directory. Check
for the presence of the `rollback_tool` wrapper script, as this is both at
root level and tightly coupled to the python code, so hopefully we won't
move it without testing that the script still works.
This will require edits to a substantial number of registrars on sandbox (nearly
all of them) because almost all of them have access to at least one TLD, but
almost none of them have any billing accounts set. Until this is set, any updates
to the existing registrars that aren't adding the billing accounts will cause
failures.
Unfortunately, there wasn't any less invasive foolproof way to implement this
change, and we already had one attempt to implement it in the create registrar
command that wasn't working (because allowed TLDs tend not to be added on
initial registrar creation, but rather afterwards as an update).
* Downgrade Caffeine to 2.9.3
Apparently Caffeine >=3.* requires Java 11, and we're still stuck on Java 8
because of App Engine Standard. Fortunately this doesn't affect the exposed
interface we're using, so we can simply go back to the newest Caffeine version
once Registry 3.0 Phase 3 (GKE migration) is completed.
Now that SQL is the default, we do not need this side job to run
alongside the main one. Its purpose was to validate the BEAM pipeline
while Datastore was primary.