This is the last remaining GAE API that we depend on. By removing it, we are able to remove all common GAE dependencies as well.
To merge this PR, we need to create console User objects that have the same email address as the RegistrarPoc objects' login_email_address and copy over the existing registry lock hashes and salts.
We are also able to simply the code base by removing some redundant logic like AuthMethod (API is now the only supported one) and UserAuthInfo (console user is now the only supported one)
There are several behavioral changes that are worth noting:
The XsrfTokenManager now uses the console user's email address to mint and verify the token. Previously, only email addresses returned by the GAE Users service are used, whereas a blank email address will be used if the user is logged in as a console user. I believe this was an oversight that is now corrected.
The legacy console will return 401 when no user is logged in, instead of redirecting to the Users service login flow.
The logout URL in the legacy console is changed to use the IAP logout flow. It will clear the cookie and redirect the users to IAP login page (tested on QA).
The screenshot changes are mostly due to the console users lacking a display name and therefore showing the email address instead. Some changes are due to using the console user's email address as the registry lock email address, which is being fixed in Add DB column for separate rlock email address #2413 and its follow-up RPs.
* Add log traces to Nomulus service on GKE
Add request-scope log traces to Nomulus on GKE which, unlike
AppEngine and Cloud Run etc, does not generate traces for hosted
applications. This change only affects the GKE image. It does not affect
the AppEngine services.
Log traces are added to Nomulus-generated logs in request-processing
threads. Forked threads are not covered yet. The single relevant use
case (TimeLimiter) will be addressed in a followup PR.
The main change is in the logging configuration:
* Use gcp-cloud-logging's LoggingHandler
* Add gcp-cloud-logging's TraceLoggingEnhancer to the handler.
* Set a thread-local trace id through the TraceLoggingEnhancer in
ServletBase on request's entry and clear it on completion.
Also removed an unused class (`RequestLogId`).
* CR
* CR
This includes using the new switch format (though IntelliJ does not yet
understand patterns including default so those aren't used), multiline strings,
replacing some unnecessary type declarations with <>, converting some classes to
records, replacing some Guava predicates with native Java code, and some other
miscellaneous Code Inspection fixes.
Console users need IAP to inject the necessary OIDC tokens into their
request headers and therefore need to be bound to appropriate roles. Note
that in environments managed by latchkey, the bindings will need to be
present in latchkey config files as well, otherwise the changes made by
the nomulus tool will be reverted.
TESTED=ran the nomulus command against alpha and verified that the
bindings are created/removed upon console user creation/deletion.
Upgrade to using Jakarta EE 10 from Java EE 8 by mostly following the upgrade instructions. Only the servlet package is upgrade. Other Jakarta EE components (like the persistence package that Hibernate depends on) need to be upgraded separately.
TESTED=deployed and successfully communicated with the pubapi endpoint for web WHOIS.
Note that this currently requires packaing the App Engine runtime per instructions here due to GoogleCloudPlatform/appengine-java-standard#98. This PR will only be merged until the fix is deployed to production (https://rapid.corp.google.com/#/release/serverless_runtimes_run_java/java21_20240310_21_0).
Note that Dagger currently doesn't work with the Jakarta namespace and
we have to cap the jakarta inject package version below 2.0 so that it
sill provides classes in the old namespace.
This PR makes the runtime of most of our workload Java 21.
1. App Engine. Java 21 is in GA and it supports Java EE 8. I had to add
an environmental variable so that we don't get an
AppEngineCredentails by default (we have been using
ComputeEngineCredentials for a couple of years). The uprade to Java
21 runtime changed a system property that controls how jetty logging
works, which also control if AppEngineCredential is return. Tested by
deploying to alpha.
2. Proxy base image upgradedd to Java 21 (distroless still doesn't
support Java 21 and it looks like Temurin is the way to go
b/306728455). Tested by deploying to alpha.
3. Nomulus tool image upgrade to Temurin 21 as well. Tested locally.
4. Beam pipeline base image upgrade to Java 21. The JAVA21 flag is not
supported by gcloud yet, but specifying the image URL directly works
(and is supported). Tested by running in alpha.
5. Jetty base image upgraded to Java 21. Tested locally.
Make the necessary changes for the code base to compile with JDK 21.
Other changes:
1. Upgraded testcontainer version and the SQL image version (to be the
same as what we use in Cloud SQL). This led to some schema changes and
also changed the order of results in some test queries (for the
better I think, as the new order appears to be alphabetical).
2. Remove dependency on Truth8, which is deprecated.
3. Enable parallel Gradle task execution and greatly increased the
number of parallel tests in standardTest. Removed outcastTest.
This PR makes it possible to build the Nomulus code base using Java 17.
Building with Java 11 continue to be possible and the resulting bytecodes are
still at Java 8 level. Also upgraded Gradle to 8.5.
There are several necessary changes to make this happen:
1. Some Gradle plugins need to be upgraded to support Java 17, notably
errorprone. As a result, a lot more "errors" were caught and corrected.
2. All test code are now built and run at Java 8 level. Previously it was left
undefined (which defaults to the version of the compiler) and had led to
situations where we inadvertently called Java 8+ features in production that
are not caught by tests. The change also made the java8compatibility subproject
obsolete, which is therefore removed.
3. Removed the docs subproject. Its main use is to generate flows.md, but it
relies heavily on Java internal APIs that have changed significant with each
version. Upgrading to Java 11 required extensive refactoring of the code there,
and Java 17 again removed many APIs that were used. I don't think it is worth
the maintenance effort just to have a tool to generate flows.md which no one
actually reads.
4. Capped a few GCP dependencies because the latest version depends on
grpc-java >= 1.59.0, which includes a runtime incompatibility
(https://github.com/grpc/grpc-java/releases/tag/v1.59.0).
From our investigation, the Monday night WHOIS storm does not cause any
strain to the backend system. The backend latency metrics are all well within
the limits. The latency measured from the proxy matches observed latency
by the prober, and we see that the "used" CPU is 1.5x of "requested" CPU
during the time when the latency is above the threshold.
Making this change hopefully removes the proxy as the bottleneck and
ameliorate the pages.
For reasons unclear at this point, Java 17's servlet implementation on
GAE injects IP addresses (including unroutable private IPs) into the
standard X-Forwarded-For header, which we currently use to embed
registrar IP addresses to check against the allow list. This results in
the server not properly parsing the header and rejecting legitimate
connections.
This PR sets a custom header that should not be interfered with by any
JVM implementation to store the IP address, while maintaining the old
header as a fallback. The proxy will set both headers to allow the
server to gracefully migrate from Java 8 and Java 17 (and potentially
rollback).
Also removed some headers and logic that are not used.
2.25.0 contains a breaking change that made HttpStorageOptions not
serializeable, which breaks RDE as it needs to access GCS from Beam.
2.22.6 was the last version that was used before the Gradle upgrade.
Also had to downgrade google-cloud-nio to pass the tests.
For some inexplicable reason, I had to manually add
guava-listenablefuture as
testRuntimeClasspath/runtimeClasspath/deploy_jar dependencies to the
networking, docs and prober subprojects' lock files, as running
`gradle test --write-locks` would NOT add them and succeed; but without
`--write-locks`, running the corresponding tests would fail.
See: b/294378137.
This includes removing (hopefully temporarily) the gradle-lint plugin as
it is incompatible with various Gradle versions (see
https://github.com/nebula-plugins/gradle-lint-plugin/issues/393). This
is somewhat unfortunate since the plugin is useful for removing unused
dependencies, though with the relatively small amount of Gradle code we
write hopefully it will not be missed much. If Nebula changes their
code to be compatible with Gradle 8+, we can re-add it easily.
This upgrade means we can remove the code added in 342051e1.
This PR changes the two flavors of OIDC authentication mechanisms to
verify the same audience. This allows the same token to pass both
mechanisms. Previously the regular OIDC flavor uses the project id as
its required audience, which does not work for local user credentials
(such as ones used by the nomulus tool), which requires a valid OAuth
client ID as audience when minting the token (project id is NOT a valid
OAuth client ID).
I considered allowing multiple audiences, but the result is not as clean
as just using the same everywhere, because the fall-through logic would
have generated a lot of noises for failed attempts.
This PR also changes the client side to solely use OIDC token whenever
possible, including the proxy, cloud scheduler and cloud tasks. The nomulus
tool still uses OAuth access token by default because it requires USER level
authentication, which in turn requires us to fill the User table with objects
corresponding to the email address of everyone needing access to the tool.
TESTED=verified each client is able to make authenticated calls on QA with or
without IAP.
The Java code will be added in a followup PR.
Also fixed tests failing due to org.json upgrade: decimal whole numbers
no longer have their fractional parts removed, so currency value strings
must end with ".00" instead of ".0".
This is only generated and used if "iapClientId" is set in the proxy
config. If so, we use code similar to
https://cloud.google.com/iap/docs/authentication-howto#obtaining_an_oidc_token_for_the_default_service_account
to generate an ID token that is valid for IAP. We set the token on the
Proxy-Authorization header so that we can keep using the pre-existing
access token as well -- IAP allows for us to use either the
Authorization header or the Proxy-Authorization header.