Depending on whether a "--gke" parameter (which must be the last one) is
passed, the deployer constructs the corresponding URIs for GAE or GKE.
TESTED=Used the deployer to deploy tasks to alpha and verified that they
run on GKE.
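As an illustration of the flag handling described above, here is a minimal
sketch assuming hypothetical URI templates and helper names; the real
deployer's URI construction is not shown here.

```java
// Minimal sketch of the "--gke" flag check; the URI templates are placeholders,
// not the actual GAE/GKE service addresses.
public class DeployTargetSketch {

  private static final String GAE_URI_TEMPLATE = "https://%s-dot-%s.appspot.com";
  private static final String GKE_URI_TEMPLATE = "https://%s.%s.example.com";

  static String chooseUriTemplate(String[] args) {
    // "--gke" must be the last parameter; anything else falls through to GAE.
    boolean useGke = args.length > 0 && "--gke".equals(args[args.length - 1]);
    return useGke ? GKE_URI_TEMPLATE : GAE_URI_TEMPLATE;
  }
}
```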
* Add index for domainRepoId to PollMessage and DomainHistoryHost
* Add flyway fix for Concurrent
* Fix gradle.properties
* Modify lockfiles
* Update the release tool and add IF NOT EXISTS
* Test removing transactional lock from deploy script
* Add transactional lock flag to actual flyway commands in script
* Remove flag from info command
* Add configuration for integration test
This PR moves most of our workloads to the Java 21 runtime.
1. App Engine. Java 21 is in GA and it supports Java EE 8. I had to add
an environment variable so that we don't get an
AppEngineCredentials by default (we have been using
ComputeEngineCredentials for a couple of years); see the sketch after
this list. The upgrade to the Java 21 runtime changed a system property
that controls how Jetty logging works, which also controls whether an
AppEngineCredentials is returned. Tested by deploying to alpha.
2. Proxy base image upgraded to Java 21 (distroless still doesn't
support Java 21 and it looks like Temurin is the way to go,
b/306728455). Tested by deploying to alpha.
3. Nomulus tool image upgraded to Temurin 21 as well. Tested locally.
4. Beam pipeline base image upgraded to Java 21. The JAVA21 flag is not
supported by gcloud yet, but specifying the image URL directly works
(and is supported). Tested by running in alpha.
5. Jetty base image upgraded to Java 21. Tested locally.
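Regarding item 1, here is a minimal sketch, assuming the google-auth-library
classes, of how one can check which credential type Application Default
Credentials resolves to on a given runtime. This is a diagnostic illustration,
not the Nomulus code.

```java
import com.google.auth.oauth2.ComputeEngineCredentials;
import com.google.auth.oauth2.GoogleCredentials;

public class AdcCredentialCheck {
  public static void main(String[] args) throws Exception {
    // Resolve Application Default Credentials in the current environment.
    GoogleCredentials credentials = GoogleCredentials.getApplicationDefault();
    // On the Java 21 App Engine runtime we want ComputeEngineCredentials here;
    // an AppEngineCredentials instance would indicate the legacy code path.
    System.out.println("ADC resolved to: " + credentials.getClass().getName());
    System.out.println(
        "Is ComputeEngineCredentials: " + (credentials instanceof ComputeEngineCredentials));
  }
}
```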
Make the necessary changes for the code base to compile with JDK 21.
Other changes:
1. Upgraded the Testcontainers version and the SQL image version (to be the
same as what we use in Cloud SQL). This led to some schema changes and
also changed the order of results in some test queries (for the
better, I think, as the new order appears to be alphabetical).
2. Removed the dependency on Truth8, which is deprecated (see the sketch
after this list).
3. Enabled parallel Gradle task execution and greatly increased the
number of parallel tests in standardTest. Removed outcastTest.
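For item 2, here is a minimal sketch of the kind of mechanical change removing
Truth8 implies, assuming a Truth version in which the Java 8 type assertions
have been folded into the main Truth entry point; the test class and values
are hypothetical.

```java
import static com.google.common.truth.Truth.assertThat;

import java.util.Optional;
import org.junit.jupiter.api.Test;

class Truth8MigrationSketchTest {
  @Test
  void optionalAssertion() {
    Optional<String> tld = Optional.of("example");
    // Previously: com.google.common.truth.Truth8.assertThat(tld).hasValue("example");
    assertThat(tld).hasValue("example");
  }
}
```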
* Change tld-update to db-object-updater
* Rename sync_tlds.sh to sync_db_objects.sh
* Change to configured command name
* Change environment to sandbox explicitly for testing on alpha
* Add remaining object steps and change cloudbuild-tld-sync to cloudbuild-sync-db-objects
* Add build_environment flag
* Change order of command and directory
* Uncomment the reserved list part
* Add a cloudbuild-tld-sync job
This job checks the Tld config files in the internal repo and syncs them with the actual Tld objects in the database using the configure_tld nomulus command.
* Add the dockerfile and shell script
* Force the command
* Add comments
* Add newline
* Create a separate copy of the job for each environment
* Fix file name
* Fix indentation
Add documentation that describes the current Cloud Build status notifications
to Google Chat, as well as how to update the configuration and the
notifier service.
This PR changes the two flavors of OIDC authentication mechanisms to
verify the same audience. This allows the same token to pass both
mechanisms. Previously, the regular OIDC flavor used the project ID as
its required audience, which does not work for local user credentials
(such as those used by the nomulus tool), which require a valid OAuth
client ID as the audience when minting the token (the project ID is NOT a
valid OAuth client ID).
I considered allowing multiple audiences, but the result is not as clean
as just using the same audience everywhere, because the fall-through logic
would have generated a lot of noise from failed attempts.
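Here is a minimal sketch, assuming google-auth-library's TokenVerifier and a
placeholder audience value, of verifying that an OIDC ID token carries the
shared audience; this is not the actual Nomulus implementation.

```java
import com.google.auth.oauth2.TokenVerifier;

public class OidcAudienceCheck {
  // Placeholder: in practice this would be the shared OAuth client ID that both
  // authentication flavors are configured to accept.
  private static final String EXPECTED_AUDIENCE = "shared-oauth-client-id";

  public static boolean isTokenValid(String idToken) {
    TokenVerifier verifier =
        TokenVerifier.newBuilder().setAudience(EXPECTED_AUDIENCE).build();
    try {
      // Verifies the signature, expiration, and that the audience matches.
      verifier.verify(idToken);
      return true;
    } catch (TokenVerifier.VerificationException e) {
      return false;
    }
  }
}
```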
This PR also changes the client side to use OIDC tokens exclusively whenever
possible, including the proxy, Cloud Scheduler, and Cloud Tasks. The nomulus
tool still uses an OAuth access token by default because it requires USER-level
authentication, which in turn requires us to fill the User table with objects
corresponding to the email address of everyone needing access to the tool.
TESTED=verified each client is able to make authenticated calls on QA with or
without IAP.
This includes renaming the billing classes to match the SQL table names,
as well as splitting them out into their own separate top-level classes.
The rest of the changes are mostly renames of variables, comments, etc.
We now use `BillingBase` as the name of the common billing superclass,
because one-time events are called BillingEvents.
The only method that is called from this class is setNumInstances. However,
we don't currently use `nomulus set_num_instances` anywhere. If we need to
change the number of instances, it is done either by updating
appengine-web.xml, which is deployed by Spinnaker, or manually as a
break-glass fix via gcloud or Pantheon.
Because we need to check whether a contact history is the most recent one for
its underlying contact resource, the query-wipe-out-repeat loop no longer
works well due to the added overhead of the query.
Instead, we refactor the logic into a Beam pipeline where the query only
needs to be performed once and history entries eligible for wipe-out are
handled individually in their own transforms. Because history entries
are otherwise immutable, we can run the pipeline at the relatively relaxed
REPEATABLE READ isolation level. We also do not worry about batching for
performance, as we do not anticipate this operation putting a lot of
strain on the table.
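A minimal Beam sketch of the shape described above, with hypothetical query
and wipe-out helpers standing in for the actual transforms: the eligible
history entries are queried once, then each one is handled in its own
transform.

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;

public class WipeOutContactHistoryPiiSketch {

  public static void main(String[] args) {
    Pipeline pipeline = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());
    pipeline
        // Stand-in for the single up-front query that finds history entries old
        // enough to wipe and not the most recent entry for their contact.
        .apply("Load eligible history IDs", Create.of(queryEligibleHistoryIds()))
        .apply(
            "Wipe PII from each entry",
            ParDo.of(
                new DoFn<Long, Void>() {
                  @ProcessElement
                  public void processElement(@Element Long historyId) {
                    // Each entry is wiped in its own transaction, which is safe at
                    // REPEATABLE READ because history entries are otherwise immutable.
                    wipePiiInTransaction(historyId);
                  }
                }));
    pipeline.run().waitUntilFinish();
  }

  private static Iterable<Long> queryEligibleHistoryIds() {
    throw new UnsupportedOperationException("hypothetical query helper");
  }

  private static void wipePiiInTransaction(long historyId) {
    throw new UnsupportedOperationException("hypothetical wipe helper");
  }
}
```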
This will replace the ExpandRecurringBillingEventsAction, which has a
couple of issues:
1) The action starts with too many Recurrings that are later filtered out
because their expanded OneTimes are not actually in scope. This is due
to the Recurrings not recording their latest expanded event time, and
therefore many Recurrings that are not yet due for renewal get included
in the initial query.
2) The action works sequentially, which exacerbates the issue in 1) and
makes it very slow to run if the window of operation is wider than
one day, which in turn makes it impossible to run any catch-up
expansions with any significant gap to fill.
3) The action only expands the recurrence when the billing time becomes
due, but most of its logic works on event time, which is 45 days
before billing time, making the code hard to reason about and
error-prone. This led to b/258822640, where a premature
optimization intended to fix 1) caused some autorenewals to not be
expanded correctly when subsequent manual renewals within the autorenew
grace period closed the original recurrence.
As a result, the new pipeline addresses the above issues in the
following way:
1) Update the recurrenceLastExpansion field on the Recurring when a new
expansion occurs, and narrow down the Recurrings in scope for
expansion by only looking for the ones that have not been expanded for
more than a year (see the sketch after this list).
2) Make it a Beam pipeline so expansions can happen in parallel. The
Recurrings are grouped into batches in order to not overwhelm the
database with writes for each expansion.
3) Create new expansions when the event time, as opposed to billing
time, is within the operation window. This streamlines the logic and
makes it clearer and easier to reason about. This also aligns with
how other (cancellable) operations for which there are accompanying
grace periods are handled, where the corresponding data is always
speculatively created at event time. Lastly, doing this negates the
need to check if the expansion has finished running before generating
the monthly invoices, because the billing events are now created not
just-in-time, but 45 days in advance.
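A minimal sketch, with assumed entity and field names and a plain JPA
EntityManager, of the scope-narrowing query from item 1) above: only
recurrences whose recurrenceLastExpansion is more than a year in the past are
selected. The entity name "Recurring" and the date types are illustrative,
not the actual pipeline code.

```java
import java.time.ZonedDateTime;
import java.util.List;
import javax.persistence.EntityManager;

public class RecurringExpansionScopeSketch {

  static List<Long> findRecurringsInScope(EntityManager em, ZonedDateTime now) {
    // Only recurrences not expanded for more than a year are in scope.
    ZonedDateTime cutoff = now.minusYears(1);
    return em.createQuery(
            "SELECT r.id FROM Recurring r WHERE r.recurrenceLastExpansion <= :cutoff",
            Long.class)
        .setParameter("cutoff", cutoff)
        .getResultList();
  }
}
```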
Note that this PR only adds the pipeline. It does not switch the default
behavior to using the pipeline, which is still done by
ExpandRecurringBillingEventsAction. We will first use this pipeline to
generate missing billing events and domain histories caused by
b/258822640. This also allows us to test it in production, as it
backfills data that will not affect ongoing invoice generation. If
anything goes wrong, we can always delete the generated billing events
and domain histories, based on the unique "reason" in them.
This pipeline can only run after we switch to use SQL sequence based ID
allocation, introduced in #1831.
The fix is based on b/240627423. I tested locally and was able to build
with the -PenableCrossReferencing=true flag successfully.
TESTED=ran the Kythe GCB pipeline locally.
This includes:
- deletion of helper DB methods in tests
- deletion of various old Datastore-only classes and removal of any
endpoints
- removal of the dual-database test concept
- removal of 'ofy' from the AppEngineExtension
GPG1 is deprecated and stuck at v1.4 from 2018; GPG2 is recommended. We
only use the GPG binary in tests, and when the host system has both
versions installed it causes problems, because we hardcode the GPG import
command in GpgSystemCommandExtension to use the binary named "gpg", which
could be linked to either GPG1 or GPG2. This causes tests to fail when
the version of GPG that runs in the tests is incompatible with the version
of GPG that imported the keys.
With this PR we only support GPG2 from now on.
One of the more significant changes introduced in this PR is that we use
SQL as the backing database in all tests unless otherwise specified,
e.g. by using the TmOverrideExtension. We change various ofy-related
tests to use this.
This includes various changes:
- Deletion of SqlEntity/DatastoreEntity and related classes. Includes
any necessary changes because of that (e.g. getting a nice SQL key on
error in RegistryJpaIO).
- Deletion of classes that used libraries from the init-sql code
(RefreshDnsOnHostRenameAction)
- Removal of the JpaTransactionManager's backup implementation
- Modification of RegistryJpaWriteTest to not use init-sql code
- Removal of the Transaction class and related classes; however, this does
  not remove the TransactionEntity class, as that would require DB
  changes
- Removal of anything related to the actual usage of the database
migration schedule or read-only phases
- Various test changes and fixes to account for the differences in SQL
(like how foreign keys need to exist)
This deliberately doesn't do anything to alter the objects actually
stored in the DB yet, just how we use them.