These are old/pointless now that we've migrated to GKE. Note that this
doesn't update anything in the docs/ folder, as that's a much larger
project that should be done on its own.
We don't need to support the mix of GAE and GKE any more so we can get
rid of the GaeService bits and unify everything under one constant
service. This also allows us to reduce the number of services down to
four (FE, BE, PUBAPI, console) which is nice.
We've previously been using Scrypt since PR #2191 which, while being a
memory-hard slow function, isn't the optimal solution according to the
OWASP recommendations. While we could get away with increasing the
parallelization parameter to 3, it's better to just switch to the
most-recommended solution if we're switching things up anyway.
For the transition, we do something similar to PR #2191 where if the
previous-algorithm's hash is successful, we re-hash with Argo2id and
store that version. By doing this, we should not need any intervention
for registrars who log in at any point during the transition period.
Much of this PR, especially the parts where we re-hash the passwords in
Argon2 instead of Scrypt upon login, is based on the code that was
eventually removed in #2310.
This can potentially help even more with serializable transaction
failures (optimistic locking exceptions, which are expected to occur
somewhat frequently).
With six attempts, we will sleep at most five times, for
100+200+400+800+1600 ms each, for a total of at most 3.1 seconds (much
less than the EPP maximum which I believe (?) to be 30 seconds.
In addition, we add a 20% skew in an attempt to spread out
possibly-conflicting transaction retries.
There are two session cookies, JSESSIONID, which is set by Jetty, and
GCLB, which is set by the Gateway.
In one session, every request other than the first one (the <hello>)
should have the same GCLB value, and every request after a successful
<login> should have the same JSESSIONID.
With these two metadata, we should be able to trace all requests that
*should* belong to the same session and debug issues with session
mismatch (if any).
I obtained access to an IBM s390x VM so I thought I'd see how multi-arch
Nomulus is.
Our main application is in Java so it is already multi-arch, but several
tests use docker images that are by default x64. Luckily postgres has an
s390x port, but selenium does not. So I had to disable Screenshot tests
when the arch is not amd64.
This is the last remaining GAE API that we depend on. By removing it, we are able to remove all common GAE dependencies as well.
To merge this PR, we need to create console User objects that have the same email address as the RegistrarPoc objects' login_email_address and copy over the existing registry lock hashes and salts.
We are also able to simply the code base by removing some redundant logic like AuthMethod (API is now the only supported one) and UserAuthInfo (console user is now the only supported one)
There are several behavioral changes that are worth noting:
The XsrfTokenManager now uses the console user's email address to mint and verify the token. Previously, only email addresses returned by the GAE Users service are used, whereas a blank email address will be used if the user is logged in as a console user. I believe this was an oversight that is now corrected.
The legacy console will return 401 when no user is logged in, instead of redirecting to the Users service login flow.
The logout URL in the legacy console is changed to use the IAP logout flow. It will clear the cookie and redirect the users to IAP login page (tested on QA).
The screenshot changes are mostly due to the console users lacking a display name and therefore showing the email address instead. Some changes are due to using the console user's email address as the registry lock email address, which is being fixed in Add DB column for separate rlock email address #2413 and its follow-up RPs.
GKE now displays log messages correctly. There is no need for an extra
leading newline, which now results in a useless blank line for each log
entry in Log Explorer.
For reasons unclear to me, if the stack trace is appended directly to
the message, the log entry will be lumped together with following logs
on GKE.
Also updated the GKE service account for Nomulus in the manifest so we
can use workload identity just for Nomulus, not other pods on the same
cluster.
* Add log traces to Nomulus service on GKE
Add request-scope log traces to Nomulus on GKE which, unlike
AppEngine and Cloud Run etc, does not generate traces for hosted
applications. This change only affects the GKE image. It does not affect
the AppEngine services.
Log traces are added to Nomulus-generated logs in request-processing
threads. Forked threads are not covered yet. The single relevant use
case (TimeLimiter) will be addressed in a followup PR.
The main change is in the logging configuration:
* Use gcp-cloud-logging's LoggingHandler
* Add gcp-cloud-logging's TraceLoggingEnhancer to the handler.
* Set a thread-local trace id through the TraceLoggingEnhancer in
ServletBase on request's entry and clear it on completion.
Also removed an unused class (`RequestLogId`).
* CR
* CR
This includes using the new switch format (though IntelliJ does not yet
understand patterns including default so those aren't used), multiline strings,
replacing some unnecessary type declarations with <>, converting some classes to
records, replacing some Guava predicates with native Java code, and some other
miscellaneous Code Inspection fixes.
There is one remaining instance in JpaTransactionManagerImpl that cannot
be removed because DetachingTypedQuery is implementing TypedQuery, which has
a method that expectred java.util.Date.
Note that Dagger currently doesn't work with the Jakarta namespace and
we have to cap the jakarta inject package version below 2.0 so that it
sill provides classes in the old namespace.