This is necessary because we'll use primary-contact emails as a way of
resetting passwords.
In the UI, don't allow editing of email address for primary contacts,
and don't allow addition/removal of the primary contact field
post-creation.
In the backend, make sure that all emails previously added still exist.
We're changing the way that allocation tokens work in suboptimal (i.e. incorrect) situations in the domain check, creation, and renewal processes. Currently, if a token is not applicable, in any way, to any of the operations (including when a check has multiple operations requested), we return some variation of "Allocation token not valid" for all of those operations. We wish to allow for a more lenient process, where if a token is "not applicable" instead of "invalid", we just pass through that part of the request as if the token were not there.
Types of errors that will remain catastrophic, where we'll basically return a token error immediately in all cases:
- nonexistent or null token
- token is assigned to a particular domain and the request isn't for that domain
- token is not valid for this registrar
- token is a single-use token that has already been redeemed
- token has a promotional schedule and it's no longer valid
Types of errors that will now be a silent pass-through, as if the user did not issue a token:
- token is not allowed for this TLD
- token has a discount, is not valid for premium names, and the domain name is premium
- token does not allow the provided EPP action
Currently, the last three types of errors cause that generic "token invalid" message but in the future, we'll pass the requests through as if the user did not pass in a token. This does allow for a default token to apply to these requests if available, meaning that it's possible that a single DomainCheckFlow with multiple check requests could use the provided token for some check(s), and a default token for others.
The flip side of this is that if the user passes in a catastrophically invalid token (the first five error messages above), we will return that result to any/all checks that they request, even if there are other issues with that request (e.g. the domain is reserved or already registered).
See b/315504612 for more details and background.
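The split between the two error lists above can be sketched roughly as
follows. This is only an illustration, not the actual flow code:
TokenCheckSketch, TokenProblem, TokenException, and resolveToken are all
hypothetical names, and the sketch assumes validation has already
determined which problem (if any) the provided token has.

```java
import java.util.Optional;

/** Hypothetical sketch of lenient allocation token handling; all names are illustrative. */
final class TokenCheckSketch {

  /** Ways a provided token can fail, mirroring the two lists in the description. */
  enum TokenProblem {
    // Catastrophic: a token error is returned immediately in all cases.
    NONEXISTENT_OR_NULL(true),
    ASSIGNED_TO_OTHER_DOMAIN(true),
    INVALID_FOR_REGISTRAR(true),
    ALREADY_REDEEMED(true),
    PROMOTION_NOT_ACTIVE(true),
    // Not applicable: silently pass through as if no token was supplied.
    TLD_NOT_ALLOWED(false),
    DISCOUNT_NOT_VALID_FOR_PREMIUM(false),
    EPP_ACTION_NOT_ALLOWED(false);

    final boolean catastrophic;

    TokenProblem(boolean catastrophic) {
      this.catastrophic = catastrophic;
    }
  }

  /** Hypothetical stand-in for the "Allocation token not valid" family of errors. */
  static final class TokenException extends RuntimeException {
    TokenException(String message) {
      super(message);
    }
  }

  /**
   * Returns the token that should apply to one operation: the provided token if it has no
   * problem, the default token (if any) when the provided one is merely not applicable.
   * Throws when the problem is catastrophic.
   */
  static Optional<String> resolveToken(
      String providedToken, Optional<TokenProblem> problem, Optional<String> defaultToken) {
    if (!problem.isPresent()) {
      return Optional.of(providedToken);
    }
    if (problem.get().catastrophic) {
      throw new TokenException("Allocation token not valid: " + problem.get());
    }
    // Not applicable: fall back to the default token, which may itself be absent.
    return defaultToken;
  }

  public static void main(String[] args) {
    System.out.println(resolveToken("abc", Optional.empty(), Optional.of("dflt")));
    System.out.println(
        resolveToken("abc", Optional.of(TokenProblem.TLD_NOT_ALLOWED), Optional.of("dflt")));
  }
}
```

Calling resolveToken once per check in a multi-check request is what
would let the provided token apply to some checks and a default token to
others.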
We suspected this could be a cause of optimistic locking failures
(because long transactions would lead to optimistic locks not being
released) but this didn't end up being the case. Let's remove this to
reduce log spam.
1. This doesn't remove the SQL tables yet (this is necessary to pass
tests and also good practice just in case we need or want to look at
history for a little bit)
2. This also removes the Registrar, RegistrarPoc, and User base classes
that were only necessary because we were saving copies of those
objects in the old history classes.
We no longer need to union GKE+GAE logs since we've moved all production
traffic to GKE only.
For testing, I copied the affected *_test.sql files to BigQuery, removed
all the "-alpha" bits, changed the dates to 20250301 and 20250331, and
ran them to make sure they returned the expected data.
Now that we have effective global sessions thanks to #2734, there is no
longer a need to keep the number of pods on the EPP service static.
We are also not vulnerable to random pod restarts. K8s never guarantees
perpetual pod lifetime anyway, and not having to be at its mercy is
certainly a relief.
It turns out a period can be used in the URI, such as in
"urn:ietf:params:xml:ns:fee-0.12". I don't think a pipe is used, at least
not according to the EPP URI namespace naming convention.
Ideally we'd use serialization, but using the default serialization runs
the risk of it being platform/JDK dependent, so a new deployment might
not be able to deserialize existing cookies. A custom serializer that
guarantees stability would have been needed.
This was added back in early 2018 to enable promotions, but for many
years now we've had the ability to run promotions on the tokens
themselves, rather than relying on custom Java classes.
This will make the changes for b/315504612 much easier, as that will
split up token validation into "is this token valid in general?" and "is
this token valid for this domain/action?"
This can potentially help even more with serializable transaction
failures (optimistic locking exceptions, which are expected to occur
somewhat frequently).
With six attempts, we will sleep at most five times, for 100, 200, 400,
800, and 1600 ms respectively, for a total of at most 3.1 seconds (much
less than the EPP maximum, which I believe (?) to be 30 seconds).
In addition, we add a 20% skew in an attempt to spread out
possibly-conflicting transaction retries.
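The arithmetic above can be sketched as follows. This is a minimal
illustration rather than the actual retry code: RetryBackoffSketch and
its constants are hypothetical names, and it assumes the 20% skew is
applied symmetrically (plus or minus) to each sleep, which is not stated
above. Under that assumption, the worst-case total with skew would be
somewhat above 3.1 seconds.

```java
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical sketch of the exponential backoff schedule; names are illustrative. */
final class RetryBackoffSketch {
  static final int MAX_ATTEMPTS = 6;
  static final long INITIAL_SLEEP_MS = 100;
  // Assumption: the skew is a symmetric +/- 20% of the base sleep.
  static final double SKEW = 0.2;

  /** Base sleep before the given retry (1-based): 100, 200, 400, 800, 1600 ms. */
  static long baseSleepMs(int retry) {
    return INITIAL_SLEEP_MS << (retry - 1);
  }

  /** Sum of the base sleeps across all retries (six attempts means five sleeps). */
  static long totalBaseSleepMs() {
    long total = 0;
    for (int retry = 1; retry < MAX_ATTEMPTS; retry++) {
      total += baseSleepMs(retry);
    }
    return total;
  }

  /** Base sleep with a random skew so conflicting transactions do not retry in lockstep. */
  static long skewedSleepMs(int retry) {
    double factor = 1.0 + ThreadLocalRandom.current().nextDouble(-SKEW, SKEW);
    return Math.round(baseSleepMs(retry) * factor);
  }

  public static void main(String[] args) {
    for (int retry = 1; retry < MAX_ATTEMPTS; retry++) {
      System.out.println(
          "retry " + retry + ": base " + baseSleepMs(retry) + " ms, skewed "
              + skewedSleepMs(retry) + " ms");
    }
    System.out.println("total base sleep: " + totalBaseSleepMs() + " ms");
  }
}
```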
This changes the code to only save console histories of this type. We
keep the old Java code (and, necessarily, the corresponding SQL code)
for now because there's no harm in doing so and we want to avoid hastily
deleting too much.
The SQL statement was incorrectly flooring to zero one layer too deep, which was
zeroing out all negative transaction report rows (which occur most frequently
when a domain in the autorenew grace period is deleted). I've changed it so that
it now only floors to zero at the report level. This still solves the issue
reported in http://b/290228682, whose original fix caused the issue in
http://b/344645788
This bug was introduced in https://github.com/google/nomulus/pull/2074
I tested this by running the new query against the DB for 2024 Q4 using the
registrar that was having issues and confirmed that the total renewal numbers
for .app now match with the sum total of what we invoiced for the last three
months of 2024.
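To illustrate the difference between flooring per row and flooring at the
report level, here is a small sketch in Java rather than SQL. The class
and method names are hypothetical and the row values are made up for the
example; it only shows why flooring one layer too deep inflates totals
when negative rows are present.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration of per-row versus report-level flooring; not the real query. */
final class FloorLevelSketch {

  /** Old behavior: floors each row at zero before summing, so negative rows are lost. */
  static int floorPerRow(List<Integer> rows) {
    int total = 0;
    for (int row : rows) {
      total += Math.max(0, row);
    }
    return total;
  }

  /** New behavior: sums all rows first, then floors only the final report value. */
  static int floorAtReportLevel(List<Integer> rows) {
    int total = 0;
    for (int row : rows) {
      total += row;
    }
    return Math.max(0, total);
  }

  public static void main(String[] args) {
    // Made-up rows: two positive renewal counts and one negative adjustment.
    List<Integer> rows = Arrays.asList(5, -2, 3);
    System.out.println("per-row floor: " + floorPerRow(rows));
    System.out.println("report-level floor: " + floorAtReportLevel(rows));
  }
}
```

A report whose rows sum to a negative number is still reported as zero
under the report-level floor.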
This doesn't check for correctness (we have other scripts that do that)
but just that the service is available at all (the other scripts do not
do that).
This should, and will, be configured with a scheduled trigger in GCB (for us, in
the domain-registry-dev project) and configuration to send some sort of
pub/sub notification on failure (for us, this is already set up on
domain-registry-dev and it sends messages to the "Domain Registry
Notifications" chat channel).
We've seen this issue happen more often than not recently, where the GAE
canary deployment is stuck for about 10 min and then fails. The reason
is not clear, but deleting the canary version prior to a deployment
always fixes the issue.