For reasons unclear to me, if the stack trace is appended directly to
the message, the log entry will be lumped together with following logs
on GKE.
Also updated the GKE service account for Nomulus in the manifest so we
can use workload identity just for Nomulus, not other pods on the same
cluster.
There are a bunch of cases where we want common exception handling and
it's annoying to have to deal with the common "set failed response and
make sure to return" a bunch of times.
This allows us to break up request methods more easily, since we can now
often throw exceptions that will break all the way back up to
ConsoleApiAction. Previously, any error handling had to exist in the
primary handler method so it could return.
The user, on the front end, should not be required to provide whether or
not they're trying to verify a lock or an unlock. They should only need
the verification code. We can inspect the lock object itself (and the
domain in question) to see whether or not we're verifying a lock or an
unlock.
We've added the field in the database in a previous PR. This is only
used in the old console for now because the new console does not have
registry lock functionality yet
* Remove the createBillingCost field from Tld
* fix spacing
* Change field name of map
* Rename getter
* Fix formatting
* Fix todo
* unchange column name
* Add log traces to Nomulus service on GKE
Add request-scope log traces to Nomulus on GKE which, unlike
AppEngine and Cloud Run etc, does not generate traces for hosted
applications. This change only affects the GKE image. It does not affect
the AppEngine services.
Log traces are added to Nomulus-generated logs in request-processing
threads. Forked threads are not covered yet. The single relevant use
case (TimeLimiter) will be addressed in a followup PR.
The main change is in the logging configuration:
* Use gcp-cloud-logging's LoggingHandler
* Add gcp-cloud-logging's TraceLoggingEnhancer to the handler.
* Set a thread-local trace id through the TraceLoggingEnhancer in
ServletBase on request's entry and clear it on completion.
Also removed an unused class (`RequestLogId`).
* CR
* CR
This handles both GET and POST requests. For POST requests it doesn't
actually change anything about the domains because we will need to add a
verification action (this will be done in a future PR).
* Avoid contention over the RefreshDnsRequest table
This table can be small at times, when PSQL may use table scan in
queries by keys. At the SERIALIZABLE isolation level, updates to
unrelated rows may conflict due to this `optimization`.
Lower the isolation level to repeatable read.
* Code review
This includes using the new switch format (though IntelliJ does not yet
understand patterns including default so those aren't used), multiline strings,
replacing some unnecessary type declarations with <>, converting some classes to
records, replacing some Guava predicates with native Java code, and some other
miscellaneous Code Inspection fixes.
This adds an index on transfer_billing_cancellation_id to Domain and superordinate_domain to Host. When tested on crash with the action limited to only delete 10,000 domains, before these indexes were added the action took about 2 hours to delete 10,000 domains. Once these indexes were added, the action was able to delete the 10,000 domains in a little under 2 minutes.
This also creates base classes for the objects contained within the
history classes, e.g. RegistrarBase. This is the same way that objects
stored in the HistoryEntry subclasses have base classes, e.g.
DomainBase.
We cannot rely on the user checking their login email, so we'll want to
send the emails to the other address if configured. This is already the
case in RegistrarPoc.
Use REPEATABLE READ for lock acquire/release operation to avoid conlicts
between locks.
Postgresql uses table scan on small tables, causing false sharing at
the SERIALIZABLE isolation level.
See b/333537928 for details.
Console users need IAP to inject the necessary OIDC tokens into their
request headers and therefore need to be bound to appropriate roles. Note
that in environments managed by latchkey, the bindings will need to be
present in latchkey config files as well, otherwise the changes made by
the nomulus tool will be reverted.
TESTED=ran the nomulus command against alpha and verified that the
bindings are created/removed upon console user creation/deletion.
The existing behavior was to ignore bad header names, in a way that was
counter-intuitive as a user of the Google Sheet. If a header name was bad (which
could just be someone accidentally changing it not realizing it needs to
correspond exactly to the name of the field on the Java object), then all of the
data in that column was just silently left as-is and never updated. This led to
gradually worsening sync and offset shift errors over time.
Now, it will write out an error message into every single cell in the bad
column, so it's clear that the column name is wrong and does not correspond to any
actual data in the DB.
BUG=http://b/332336068
As a read-only action that tolerates staleness, locking is unnecessary.
This should help with the lock contention we are observing.
Also reduces the number of VM instances provisioned for BSA and increase
the idle timeout. This should reduce invocation delay. Longer delay may
cause AppEngine to return `Timeout` status to Cloud Scheduler even
though the cron job succeeds.
This also involved breaking out an improperly done assertThat() helper overload
method for JsonObjects into a proper Subject that doesn't further overload
assertThat().