Instead of having to parse the protoPayload.line from the request logs,
we just want to inspect the textPayload from the app logs (stored in a
separate table). This applies to the EPP metrics from the activity
reporting and the attempted-adds column for the transaction reporting.
This also creates base classes for the objects contained within the
history classes, e.g. RegistrarBase. This is the same way that objects
stored in the HistoryEntry subclasses have base classes, e.g.
DomainBase.
Supports the full blocklist download cycle (download, diffing, diff-apply, and order-status reporting) and the refreshing of unblockable domains.
Submitted due to a tight deadline. We will conduct a post-submit review and refactoring.
Add the BsaDomainRefresh class, which tracks the refresh actions.
The refresh action checks for changes in the set of registered and
reserved domains, which BSA calls unblockables.
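Both the blocklist diffing and the unblockable refresh come down to comparing two snapshots of a set of labels or domains. The sketch below shows that comparison. It assumes a simplified model in which a blocklist is just a set of labels and the unblockables are the union of registered and reserved domains. `LabelDiff`, `diff_labels`, and `unblockables` are hypothetical names, not the classes added in this change.

```python
from typing import NamedTuple


class LabelDiff(NamedTuple):
    """Entries added to and removed from a set between two snapshots."""
    added: frozenset[str]
    removed: frozenset[str]


def diff_labels(previous: set[str], current: set[str]) -> LabelDiff:
    # Entries present now but not before must be applied as new blocks;
    # entries present before but not now must be lifted.
    return LabelDiff(
        added=frozenset(current - previous),
        removed=frozenset(previous - current),
    )


def unblockables(registered: set[str], reserved: set[str]) -> set[str]:
    # Hypothetical model: a domain is "unblockable" if it is either
    # registered or reserved.
    return registered | reserved


# A refresh is the same diff applied to two unblockable snapshots.
old = unblockables({"a.app"}, set())
new = unblockables({"a.app"}, {"b.app"})
refresh = diff_labels(old, new)  # added={"b.app"}, removed=empty
```

Storing only the diff keeps each download cycle proportional to what changed, not to the full list size.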
Neither action has been used for a while (the wipe-out action
actually caused problems when it ran unintentionally and wiped out QA).
Keeping them around is a burden, as refactoring efforts have to take
them into consideration.
It is always possible to resurrect them from git history should the need
arise.
* Change PackagePromotion to BulkPricingPackage
* More name changes
* Fix some test names
* Change token type "BULK" to "BULK_PRICING"
* Fix missed token_type reference
* Add todo to remove package type
This includes two changes, the second necessary for testing the first.
1. We add the rdap-queries field as mandated by the amendment to the
registry agreement,
https://itp.cdn.icann.org/en/files/registry-agreement/proposed-global-amendment-base-gtld-registry-agreement-12-04-2023-en.pdf.
This is fairly similar to the whois-queries field where we just query
the logs, but instead of searching for "whois" we search for "rdap".
2. In BigQuery, MAX does not return the larger of two fields. It is an aggregate
function that returns the maximum of an expression across a group of rows. To get
the larger of two fields within a single row (and to have the BigQuery statements
succeed), we need to use GREATEST.
Tested both versions in alpha and production BigQuery instances.
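The MAX/GREATEST distinction is easy to miss. The Python analogy below contrasts the two. `aggregate_max` mimics SQL `MAX(column)`, which produces one value across all rows. `row_greatest` mimics `GREATEST(a, b)`, which produces one value per row. Both helpers are illustrative only and do not model BigQuery's NULL handling.

```python
def aggregate_max(rows: list[dict], column: str) -> int:
    # Like SQL MAX(column): a single value computed across all rows.
    return max(row[column] for row in rows)


def row_greatest(rows: list[dict], a: str, b: str) -> list[int]:
    # Like SQL GREATEST(a, b): one value per row, the larger of two fields.
    return [max(row[a], row[b]) for row in rows]


rows = [{"x": 1, "y": 5}, {"x": 7, "y": 2}]
aggregate_max(rows, "x")      # one number for the whole table
row_greatest(rows, "x", "y")  # one number per row
```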
See b/290228682: there are edge cases in which net_renew would be negative when
a domain is cancelled by a superuser during the renew grace period. The correct thing
to do is to attribute the cancellation to the owning registrar, but that would require
changing the owning registrar of the corresponding cancellation DomainHistory,
which has cascading effects that we don't want to deal with. As such, we simply
floor the number here at zero to prevent any negative value from appearing. This
should have negligible impact, as the edge case happens very rarely: specifically,
when a cancellation during the grace period is performed by a registrar other than the
owning one. All the numbers here must be non-negative to pass ICANN validation.
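The flooring is a simple clamp. The sketch below uses hypothetical field names, not the report's actual column names. In the BigQuery report itself, the equivalent is to wrap the difference in `GREATEST(..., 0)`.

```python
def net_renew(renews: int, renew_cancellations: int) -> int:
    """Net renewals for one registrar, floored at zero (hypothetical fields)."""
    # A superuser cancellation during the renew grace period is attributed
    # to a registrar other than the one that renewed, so that registrar
    # can show more cancellations than renewals. ICANN validation rejects
    # negative values, so clamp the result at zero.
    return max(0, renews - renew_cancellations)
```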
We have been using it as a poor man's timed flag that triggers a system
behavior change after a certain time. We have no foreseeable future use
for it now that the DNS pull queue related code is deleted. If in the
future a need for such a flag arises, we are better off implementing a
proper flag system than hijacking this class anyway.
This includes renaming the billing classes to match the SQL table names,
as well as splitting them out into their own separate top-level classes.
The rest of the changes are mostly renaming variables and comments etc.
We now use `BillingBase` as the name of the common billing superclass,
because one-time events are called BillingEvents.
Because we need to check whether a contact history entry is the most recent
one for its underlying contact resource, the query/wipe-out/repeat loop no
longer works well due to the added overhead of the query.
Instead, we refactor the logic into a Beam pipeline where the query only
needs to be performed once and history entries eligible for wipe-out are
handled individually in their own transforms. Because history entries
are otherwise immutable, we can run the pipeline at the relatively relaxed
REPEATABLE READ isolation level. We also do not worry about batching for
performance, as we do not anticipate this operation putting much
strain on the particular table.
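The selection step can be sketched as follows. Find the latest revision of each contact once, then treat every older entry before a cutoff as independently eligible. This sketch makes several assumptions that are not stated above. It assumes the most recent entry per contact is the one that must be kept. It assumes revision IDs increase monotonically. It uses a simplified, hypothetical `ContactHistory` row.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ContactHistory:
    # Hypothetical, simplified stand-in for a contact history row.
    contact_id: str
    revision_id: int
    modification_time: datetime


def eligible_for_wipe_out(histories: list[ContactHistory], cutoff: datetime):
    # The "query once" step: latest revision ID per contact
    # (assumes revision IDs increase monotonically).
    latest: dict[str, int] = {}
    for h in histories:
        if h.revision_id > latest.get(h.contact_id, -1):
            latest[h.contact_id] = h.revision_id
    # Each remaining entry can now be handled on its own, which is what
    # lets a pipeline process them in parallel transforms.
    return [
        h for h in histories
        if h.modification_time < cutoff
        and h.revision_id != latest[h.contact_id]
    ]
```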
This includes changes to make sure that we use the proper per-TLD IDN
tables as well as setting/updating/removing them via the Create/Update
TLD commands.
Also adds a DnsUtils class to deal with adding, polling, and removing
DNS refresh requests (only adding is implemented for now). The class
also takes care of choosing which mechanism to use (pull queue vs. SQL)
based on the current time and the database migration schedule map.
See b/260945047.
Also refactored the corresponding tests, which should make future updates easier.
This change should be deployed at or around 2023-02-15T16:00:00Z.
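Choosing the mechanism from the current time amounts to a lookup in a time-keyed transition map: take the entry with the latest start time that is not after now. The sketch below assumes a standalone, hypothetical schedule. The switch-over date is illustrative only and is not the real database migration schedule. `request_dns_refresh` only models the "add" operation, using plain lists in place of the pull queue and the SQL table.

```python
import bisect
from datetime import datetime, timezone
from enum import Enum


class DnsRefreshMechanism(Enum):
    PULL_QUEUE = "pull queue"
    SQL = "SQL"


# Hypothetical schedule: (start time, mechanism in effect from then on),
# sorted by start time. The switch-over date is illustrative only.
SCHEDULE = [
    (datetime(1970, 1, 1, tzinfo=timezone.utc), DnsRefreshMechanism.PULL_QUEUE),
    (datetime(2023, 3, 1, tzinfo=timezone.utc), DnsRefreshMechanism.SQL),
]


def mechanism_at(now: datetime, schedule=SCHEDULE) -> DnsRefreshMechanism:
    """Returns the mechanism whose start time is the latest one <= now."""
    starts = [start for start, _ in schedule]
    index = bisect.bisect_right(starts, now) - 1
    if index < 0:
        raise ValueError(f"{now} precedes the first schedule entry")
    return schedule[index][1]


def request_dns_refresh(domain: str, now: datetime, pull_queue: list, sql_table: list):
    # Only "adding" a request is modeled, matching the current scope.
    if mechanism_at(now) is DnsRefreshMechanism.SQL:
        sql_table.append(domain)
    else:
        pull_queue.append(domain)
```

Keeping the decision in one helper means callers never branch on the migration state themselves.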
This will replace the ExpandRecurringBillingEventsAction, which has a
couple of issues:
1) The action starts with too many Recurrings that are later filtered out
because their expanded OneTimes are not actually in scope. This is because
the Recurrings do not record their latest expanded event time, and
therefore many Recurrings that are not yet due for renewal get included
in the initial query.
2) The action works in sequence, which exacerbates the issue in 1) and
makes it very slow to run if the window of operation is wider than
one day, which in turn makes it impossible to run any catch-up
expansions with any significant gap to fill.
3) The action only expands the recurrence when the billing time becomes
due, but most of its logic works on event time, which is 45 days
before billing time, making the code hard to reason about and
error-prone. This has led to b/258822640, where a premature
optimization intended to fix 1) caused some autorenewals not to be
expanded correctly when subsequent manual renews within the autorenew
grace period closed the original recurrence.
As a result, the new pipeline addresses the above issues in the
following way:
1) Update the recurrenceLastExpansion field on the Recurring when a new
expansion occurs, and narrow down the Recurrings in scope for
expansion by only looking for the ones that have not been expanded for
more than a year.
2) Make it a Beam pipeline so expansions can happen in parallel. The
Recurrings are grouped into batches in order to not overwhelm the
database with writes for each expansion.
3) Create new expansions when the event time, as opposed to billing
time, is within the operation window. This streamlines the logic and
makes it clearer and easier to reason about. This also aligns with
how other (cancellable) operations with accompanying grace periods
are handled, where the corresponding data is always
speculatively created at event time. Lastly, doing this removes the
need to check whether the expansion has finished running before generating
the monthly invoices, because the billing events are now created not
just-in-time, but 45 days in advance.
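Points 1) and 3) can be illustrated with a small model. A recurrence is in scope only if it has not been expanded within the past year. Expansions are generated for anniversaries whose event time falls in the window, with billing time 45 days later. The model is hypothetical and simplified. It has one event per year from `first_event_time` and no recurrence end time. `last_expansion` plays the role of recurrenceLastExpansion. It is not the pipeline's actual code.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Billing time is 45 days after event time, per the description above.
BILLING_DELAY = timedelta(days=45)


@dataclass
class Recurrence:
    # Hypothetical, simplified model of a Recurring billing event.
    first_event_time: date  # first autorenew anniversary
    last_expansion: date    # analogous to recurrenceLastExpansion


def add_years(d: date, years: int) -> date:
    try:
        return d.replace(year=d.year + years)
    except ValueError:  # Feb 29 in a non-leap year: fall back to Feb 28
        return d.replace(year=d.year + years, day=28)


def in_scope(r: Recurrence, now: date) -> bool:
    # Only recurrences not expanded within the past year can be due.
    return r.last_expansion <= add_years(now, -1)


def expand(r: Recurrence, window_start: date, window_end: date):
    """Returns (event_time, billing_time) pairs whose *event* time is in
    [window_start, window_end) and after the last expansion, then advances
    r.last_expansion so that re-running is a no-op."""
    expansions = []
    years = 0
    while (event := add_years(r.first_event_time, years)) < window_end:
        if event >= window_start and event > r.last_expansion:
            expansions.append((event, event + BILLING_DELAY))
        years += 1
    if expansions:
        r.last_expansion = expansions[-1][0]
    return expansions
```

Recording the last expansion on the recurrence makes the in-scope query selective and makes catch-up runs over wide windows safe to repeat.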
Note that this PR only adds the pipeline. It does not switch the default
behavior to using the pipeline, which is still done by
ExpandRecurringBillingEventsAction. We will first use this pipeline to
generate missing billing events and domain histories caused by
b/258822640. This also allows us to test it in production, as it
backfills data that will not affect ongoing invoice generation. If
anything goes wrong, we can always delete the generated billing events
and domain histories, based on the unique "reason" in them.
This pipeline can only run after we switch to SQL sequence-based ID
allocation, introduced in #1831.
This is similar to where we store the SQL files for Beam pipelines, and
frankly makes more sense. Also streamlined the use of the API for reading
SQL files from a jar.
This parameter is misleading and does not do what it purports to do.
Namely, it does not affect the level of parallelism. Given the input n for this
parameter and m for the batch size, the elements are divided (keyed) into n
groups, each of which is then spread evenly across all threads, where the
elements are in turn batched into batches of size m:
https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/GroupIntoBatches.java#L227
This is also evident in the implementation itself, where the ShardedKey
is determined by the unique number for a worker/thread combo and the
original key:
https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/GroupIntoBatches.java#L268
Using a more concrete example, suppose we have 100 elements and 10
worker threads, with a target batch size of 5. If the "shard" number is set to
1, we first spread the 100 elements across 10 threads, resulting in 10
elements per thread, each thread then batches the elements into 2
batches of size 5.
If the "shard" number is set to 2, the 100 elements are first divided into 2
"shards" of 50 each. Each "shard" is then distributed within the 10
threads, resulting in 5 elements per "shard" per thread. They then get
turned into 1 batch per "shard" per thread. In the end, each thread
still processes 2 batches, even though they are from 2 different "shards".
Therefore this "shard" number does not perform horizontal partitioning
that one normally associates with sharding, and provides no
performance benefits but rather confuses the user.
It has also been suggested that using withShardedKey() alone is already
sufficient to achieve auto-sharding within a keyed group. There is no
need to manually divide the input by keying it differently based on
the "shard" number specified:
https://youtu.be/jses0W4Zalc?t=967
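The worked example above can be checked with a toy simulation. It uses a simplified model, not Beam's actual implementation. Elements are keyed into `shards` groups, each group is spread round-robin across threads (standing in for withShardedKey()), and each (group, thread) bucket is batched. For 100 elements, 10 threads, and batch size 5, every thread ends up with two batches of 5 whether the shard count is 1 or 2.

```python
from collections import defaultdict


def simulate(num_elements: int, num_threads: int, shards: int, batch_size: int):
    """Returns {thread: [batch sizes]} under a simplified model of keying
    into `shards` groups, spreading each group across threads, and then
    batching within each (group, thread) bucket."""
    buckets = defaultdict(list)  # (shard, thread) -> elements
    for i in range(num_elements):
        shard = i % shards
        # Position of the element within its shard, used to spread that
        # shard's elements round-robin across threads.
        position = i // shards
        thread = position % num_threads
        buckets[(shard, thread)].append(i)

    batches = defaultdict(list)  # thread -> sizes of the batches it emits
    for (_, thread), elements in sorted(buckets.items()):
        for start in range(0, len(elements), batch_size):
            batches[thread].append(len(elements[start:start + batch_size]))
    return dict(batches)


one_shard = simulate(100, 10, shards=1, batch_size=5)
two_shards = simulate(100, 10, shards=2, batch_size=5)
# Both give each of the 10 threads two batches of 5.
```

Raising the shard count only changes which key a batch belongs to, not how much work each thread does, which is the point made above.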
* Add defaultPromoTokens to Registry
* Remove flyway files from this PR
* Fix merge conflicts
* Add back flyway file
* Add more info to error messages
* Change to a list
* Fix javadoc
* Change error message
* Add note to field declaration
Switch to using the login email address instead of GAE user ID to
identify console users. The primary use cases are:
1) When a user logs in to the registrar console, we need to figure out
which registrars they have access to (in
AuthenticatedRegistrarAccess).
2) When a user tries to apply a registry lock, we need to know whether they
are allowed to (in RegistryLockGetAction).
Both cases were tested in alpha with a personal email address to ensure
that it is not granted permission merely by being a GAE admin account.
Also verified that the soy templates include the hidden login email
form field instead of the GAE user ID when registrars are displayed on the
console, and consequently that when a contact update is posted to the server,
the login email is part of the JSON payload, even though it does not
appear to be used in any way by RegistrarSettingsAction, which
receives the POST request. Like the GAE user ID, the field is hidden, so it
cannot be changed by the user from the console; it is also not used to
identify the RegistrarPoc entity, whose composite keys are the contact
email and the registrar ID associated with it.
The login email address is backfilled for all RegistrarPocs that have a
non-null GAE user ID. The backfilled addresses convert to the same IDs
as those stored in the database.
Also fixed a bug introduced in #1785 where identity checks were performed instead of equality checks. As a result, two sets containing the same elements were not regarded as equal, and subsequent DNS updates were unnecessarily enqueued.
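The identity-versus-equality bug can be shown in a few lines. In Java this is typically the difference between comparing collections with `==` and with `equals()`. The Python analogue is `is` versus `==`. The function names and host values below are hypothetical.

```python
def needs_dns_update_buggy(old_hosts: set, new_hosts: set) -> bool:
    # Identity check: two distinct set objects always "differ", even when
    # they hold exactly the same elements, so updates are always enqueued.
    return old_hosts is not new_hosts


def needs_dns_update(old_hosts: set, new_hosts: set) -> bool:
    # Equality check: compares the elements themselves.
    return old_hosts != new_hosts
```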
* Rename ContactResource -> Contact
This is a follow-up to PR #1725 and #1733. Now all EPP resource entity class
names have been rationalized to match their SQL table names.
* Add the PackagePromotion table
* Add long id
* Add NOT NULL
* fix formatting
* make package price non null
* Add not nulls to java file
* Fix broken tests from merge conflicts
* Rename DomainBase -> Domain
This was a long time coming, but we couldn't do it until we left Datastore, as
the Java class name has to match the Datastore entity name.
Subsequent PRs will rename ContactResource to Contact and HostResource to Host,
so that everything matches the SQL table names (and is shorter!).
* Merge branch 'master' into rename-domainbase
This PR turns out to be more massive than I would have liked, but it
deals with all billing-event-related code, which is more or less all
intertwined:
* Remove all billing events as Ofy entities.
* Add a temporary annotation to allow BillingEvent's ID to be
auto-allocated by ofy without the Ofy @Id annotation.
* Remove Modification, which is only used in ofy.
* Remove BillingVKey, as we do not need to store the ofy key parent
information anymore. The VKey for a billing event now just contains
its primary key, and can be converted by VKeyConverter.
* Remove BigQuery related code in the billing pipeline.
Note that after BillingVKey is removed, several columns in
BillingCancellation are no longer needed. The change to database schema
will be handled in https://github.com/google/nomulus/pull/1721 after
this PR is deployed to production.
This includes:
- deletion of helper DB methods in tests
- deletion of various old Datastore-only classes and removal of any
endpoints
- removal of the dual-database test concept
- removal of 'ofy' from the AppEngineExtension
One of the more significant changes introduced in this PR is that we use
SQL as the backing database in all tests unless otherwise specified,
e.g. by using the TmOverrideExtension. We change various ofy-related
tests to use this.
This includes various changes:
- Deletion of SqlEntity/DatastoreEntity and related classes. Includes
any necessary changes because of that (e.g. getting a nice SQL key on
error in RegistryJpaIO).
- Deletion of classes that used libraries from the init-sql code
(RefreshDnsOnHostRenameAction)
- Removal of the JpaTransactionManager's backup implementation
- Modification of RegistryJpaWriteTest to not use init-sql code
- Removal of the Transaction class and related classes (this does
not remove the TransactionEntity class, as that would require DB
changes)
- Removal of anything related to the actual usage of the database
migration schedule or read-only phases
- Various test changes and fixes to account for the differences in SQL
(like how foreign keys need to exist)
This deliberately doesn't do anything to alter the objects actually
stored in the DB yet, just how we use them.
* Create a Dataflow pipeline to resave EPP resources
This has two modes.
If `fast` is false, then we will just load all EPP resources, project them to the current time, and save them.
If `fast` is true, we will attempt to intelligently load and save only resources that we expect to have changes applied when we project them to the current time. This means resources with pending transfers that have expired, domains with expired grace periods, and non-deleted domains that have expired (we expect that they autorenewed).
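The `fast` mode's selection criteria can be expressed as a single predicate. The sketch below uses a hypothetical, simplified `Resource` record in which `None` means "not set". It is not the pipeline's actual query. It selects resources with an expired pending transfer, domains with an expired grace period, and non-deleted domains past their expiration.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class Resource:
    # Hypothetical, simplified stand-in for an EPP resource row.
    is_domain: bool
    deletion_time: Optional[datetime] = None
    pending_transfer_expiration: Optional[datetime] = None
    grace_period_ends: list = field(default_factory=list)
    registration_expiration: Optional[datetime] = None


def needs_resave(r: Resource, now: datetime) -> bool:
    # Any resource with a pending transfer that has already expired.
    if r.pending_transfer_expiration is not None and r.pending_transfer_expiration <= now:
        return True
    if not r.is_domain:
        return False
    # Domains with at least one expired grace period.
    if any(end <= now for end in r.grace_period_ends):
        return True
    # Non-deleted domains past expiration (expected to have autorenewed).
    not_deleted = r.deletion_time is None or r.deletion_time > now
    return (not_deleted
            and r.registration_expiration is not None
            and r.registration_expiration <= now)
```

Everything that fails this predicate would come out of projection unchanged, which is why skipping it is safe.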
* Change billingIdentifier to BillingAccountMap in invoicing pipeline
* Add a default for billing account map
* Throw error on missing PAK
* Add unit test
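The "throw error on missing PAK" behavior can be sketched as a lookup that fails loudly instead of silently emitting an invoice line with no account. This assumes the billing account map is keyed by currency code. The function and exception names are hypothetical, not the pipeline's actual API.

```python
class MissingBillingAccountError(Exception):
    """Raised when a registrar has no billing account for a currency."""


def billing_account_for(registrar_id: str, billing_account_map: dict, currency: str) -> str:
    # Assumption: billing_account_map maps a currency code to an account ID.
    account = billing_account_map.get(currency)
    if account is None:
        raise MissingBillingAccountError(
            f"Registrar {registrar_id} has no billing account for {currency}")
    return account
```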