* Cleanup gpg-agent instances and home directories
The GpgSystemCommandExtension leaks home directories, but more importantly it
leaks gpg-agent instances. This can cause problems with inotify limits, since
the agent seems to make use of inotify. Do a proper cleanup in afterEach().
* Don't fail if we can't kill the agent
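Roughly, the cleanup looks like the sketch below (the field and command wiring are assumptions, not the actual extension code; `gpgconf --kill gpg-agent` is the standard way to shut down an agent for a given GNUPGHOME):
```java
import java.io.IOException;
import java.nio.file.Path;
import org.junit.jupiter.api.AfterEach;

// Illustrative only: the gpgHomeDir field stands in for the temporary GNUPGHOME
// created during test setup.
class GpgAgentCleanupExample {

  private Path gpgHomeDir;

  @AfterEach
  void killGpgAgent() {
    try {
      ProcessBuilder killer = new ProcessBuilder("gpgconf", "--kill", "gpg-agent");
      killer.environment().put("GNUPGHOME", gpgHomeDir.toString());
      killer.start().waitFor();
    } catch (IOException | InterruptedException e) {
      // Per the commit above: don't fail the test if we can't kill the agent.
    }
    // The temporary home directory would be deleted here as well.
  }
}
```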
* Don't enforce billing account map check on TEST TLDs
This was affecting monitoring (i.e. prober TLDs). Note that test TLDs are
already excluded from the billing account map check in the Registrar builder()
method (see PR #1601), but we forgot to make that same test TLD exclusion in the
EPP flows check (see PR #1605).
* Add missing transaction for whois lookups
Nameserver whois lookups are failing under SQL for hosts with superordinate
domains because the query in this case is not done in a transaction. We
missed this during testing because a) we didn't have a test for lookups of
hosts with superordinate domains and b) we missed converting
NameserverWhoisResponseTest to a DualDatabaseTest.
This PR fixes the problem and adds the requisite testing.
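The shape of the fix is simply to run the lookup inside a transaction. A minimal sketch, where jpaTm() is the JPA transaction manager accessor and loadHostByName() is a hypothetical stand-in for the actual superordinate-host query:
```java
import static google.registry.persistence.transaction.TransactionManagerFactory.jpaTm;

import google.registry.model.host.HostResource;

// Before: the query ran outside any transaction and failed under SQL.
// After: wrap the lookup so the query executes inside a transaction.
HostResource lookUpNameserver(String hostName) {
  return jpaTm().transact(() -> loadHostByName(hostName));
}
```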
* Use a single transaction to get host registrars
* Replace streaming with Maps.toMap()
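A small illustration of the Maps.toMap() pattern; the host names and the lookUpRegistrarFor() helper are made up for the example:
```java
import com.google.common.collect.ImmutableMap;
import com.google.common.collect.ImmutableSet;
import com.google.common.collect.Maps;

// Instead of streaming and collecting into a map, Guava's Maps.toMap() builds
// the map directly from the key set and a value function.
ImmutableSet<String> hostNames = ImmutableSet.of("ns1.example.tld", "ns2.example.tld");
ImmutableMap<String, String> registrarByHost =
    Maps.toMap(hostNames, hostName -> lookUpRegistrarFor(hostName));
```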
* Check PAK on domain create
* Add unit test
* update docs
* Remove unnecessary setup
* Fix blank line
* Add check and test to all relevant flows
* Change error message
This will require edits to a substantial number of registrars on sandbox (nearly
all of them) because almost all of them have access to at least one TLD, but
almost none of them have any billing accounts set. Until this is set, any updates
to the existing registrars that aren't adding the billing accounts will cause
failures.
Unfortunately, there wasn't any less invasive foolproof way to implement this
change, and we already had one attempt to implement it in the create registrar
command that wasn't working (because allowed TLDs tend not to be added on
initial registrar creation, but rather afterwards as an update).
* Create a Dataflow pipeline to resave EPP resources
This has two modes.
If `fast` is false, then we will just load all EPP resources, project them to the current time, and save them.
If `fast` is true, we will attempt to intelligently load and save only resources that we expect to have changes applied when we project them to the current time. This means resources with pending transfers that have expired, domains with expired grace periods, and non-deleted domains that have expired (we expect that they autorenewed).
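A rough sketch of the kind of predicate `fast` mode implies; the Candidate interface and its accessors are hypothetical stand-ins for the pipeline's real queries, not the actual implementation:
```java
import org.joda.time.DateTime;

// Illustrative only: a resource qualifies for a fast-mode resave if projecting
// it to `now` is expected to change it.
class FastModeFilterSketch {
  interface Candidate {
    boolean hasPendingTransfer();
    DateTime transferExpirationTime();
    boolean isDomain();
    boolean hasExpiredGracePeriods(DateTime now);
    boolean isDeleted(DateTime now);
    DateTime registrationExpirationTime();
  }

  static boolean needsResave(Candidate resource, DateTime now) {
    return (resource.hasPendingTransfer() && resource.transferExpirationTime().isBefore(now))
        || (resource.isDomain() && resource.hasExpiredGracePeriods(now))
        || (resource.isDomain()
            && !resource.isDeleted(now)
            && resource.registrationExpirationTime().isBefore(now));
  }
}
```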
* Begin migration from Guava Cache to Caffeine
Caffeine is apparently strictly superior to the older Guava Cache (and is even
recommended in lieu of Guava Cache on Guava Cache's own documentation).
This adds the relevant dependencies and switches over just a single call site to
use the new Caffeine cache. It also implements a new pattern: asynchronously
refreshing the cache value starting at half of our configured expiry time. For
frequently accessed entities this will allow us to NEVER block on a load, as the
value will be asynchronously refreshed in the background long before it ever
expires synchronously during a read operation.
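A minimal sketch of that refresh-ahead pattern, assuming a ten-minute configured expiry; the loadRegistrar() loader is illustrative:
```java
import com.github.benmanes.caffeine.cache.Caffeine;
import com.github.benmanes.caffeine.cache.LoadingCache;
import google.registry.model.registrar.Registrar;
import java.time.Duration;

// Refresh asynchronously at half the expiry so hot entries reload in the
// background and reads never block on an expired entry.
Duration cacheExpiry = Duration.ofMinutes(10);
LoadingCache<String, Registrar> registrarCache =
    Caffeine.newBuilder()
        .expireAfterWrite(cacheExpiry)
        .refreshAfterWrite(cacheExpiry.dividedBy(2))
        .build(registrarId -> loadRegistrar(registrarId));
```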
* Add new columns to BillingEvent.java
* Improve PR and modify JodaMoneyType to handle null currency in override
* Add test cases for edge cases of nullSafeGet in JodaMoneyType
* Improve assertions
* Ignore read-only when saving commit logs
Ignore read-only when saving commit logs and commit log mutations so that we
can safely replicate in read-only mode. This should be safe, as we only ever
get into the situation of saving commit logs and mutations when something has
already actually been modified in a transaction, meaning that we should have hit
the "read only" sentinel already.
This also introduces the ability to set the Clock in the
TransactionManagerFactory so that we can test this functionality.
* Changes per review
* Fix issues affecting tests
- Restore clobbered async phase in testNoInMigrationState_doesNothing
- Restore system clock to TransactionManagerFactory to avoid affecting other
tests.
* Change billingIdentifier to BillingAccountMap in invoicing pipeline
* Add a default for billing account map
* Throw error on missing PAK
* Add unit test
* Add a no-async actions DB migration phase
This needs to be set several hours prior to entering the READONLY stage. It is
not itself a read-only stage; all synchronous actions under Datastore (such as
domain creates) will continue to succeed. The only things that will fail are host
deletes, host renames, and contact deletes, as these three actions require a
mapreduce to run before they are complete, and we don't want mapreduces hanging
around and executing during what is supposed to be a short-duration READONLY
period.
* Use UrlFetch for RDE and default TLS (1.2) for other URL connections
This removes the TLS 1.3 settings in the module providers and essentially
reverts the changes in #1535, but only for RdeReporter and RdeReportActionTest.
We have a cron job that runs the RDE upload action every 4 hours for all
TLDs. Normally this should be a no-op because an RDE upload is scheduled
after RDE staging completes, and if it fails with a non-2XX status it
will retry. However, if for some reason it fails with a 2XX status (e.g. while
waiting for the SFTP cursor), it will not retry but instead rely on the cron job
to catch up.
With the BEAM RDE pipeline every staging job saves all its deposits in a
uniquely named folder to avoid the need to use a lock, which is not
practical in BEAM. However the cron job has no way of knowing what the
prefixes are for each TLD so it will fail in SQL mode.
In this PR we implement logic to guess what the prefix should be and use it
when we are in SQL mode and a prefix is not provided.
* Fix sporadic SQL Snapshot failure
The Postgresql set-snapshot statement (called in
JpaTransactionManager.setDatabaseSnapshot() method) must be the first
statement in the SQL transaction.
Currently the JpaTransactionManager.transact() method may insert a query for
DatabaseMigrationStateSchedule before the user query when the cache is
empty or the cached value has expired.
This PR proactively preloads the cache in RegistryJpaIO to prevent cache
loading inside the transaction.
This PR also changes some DatabaseSnapshotTest tests to be retrying, in
case they run just after the cache expires. (This has happened before in
CI).
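Conceptually, the ordering fix looks like this (names here are illustrative; only the ordering is the point):
```java
// Illustrative only: the preload call and snapshotId are assumptions. The cache
// is loaded *before* the transaction opens, so that "SET TRANSACTION SNAPSHOT"
// is the first statement PostgreSQL sees inside the transaction.
preloadDatabaseMigrationStateCache(); // hypothetical; done in RegistryJpaIO setup
jpaTm()
    .transact(
        () -> {
          jpaTm().setDatabaseSnapshot(snapshotId); // now truly the first SQL statement
          // ... user queries run after the snapshot is set ...
        });
```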
* Ignore trivial differences when comparing DB
Some data differences are due to entity model differences and are also
harmless. We should ignore them when comparing Datastore and SQL.
This PR ignores the following diffs:
- null vs empty collection
- the empty string in the Address.street field, which is a list
1. testRun_withPrefix() in RdeUploadActionTest calls a mock lock
handler and does not actually try to read from the fake GCS
implementation. Therefore there's no point setting it up.
2. Remove an unused field in UploadDatastoreBackupActionTest.
* Remove static methods in back up actions
* Remove BigqueryPollJob helper class
* Add schedule time in task comparison
* Change payload type from byte[] to ByteString
* Fix a subtle issue in BRDA copy caused by Cloud Tasks
After the Cloud Tasks migration and #1508, the BRDA copy job now
routinely fails on the first try because the revision update is not
committed by the time the Cloud Tasks job enqueued in the same
transaction runs for the first time. This is because the enqueueing is
a side effect and not part of the transaction. The job eventually
succeeds because of retries.
This PR attempts to mitigate the initial failure by adding a delay to
the enqueued job, and by checking the cursor in the job itself to prevent
it from running before the transaction is committed.
* Fix issues with saving and deleting gap records
Datastore limits us to mutating up to 25 records per transaction. We
sometimes exceed that when deleting expired gap records. In addition, it is
theoretically possible for us to accumulate enough continuous gap records to
exceed this count while replaying the original transaction.
Deal with deletion by breaking the gap records to be deleted into batches
small enough to be deleted transactionally (in practice, we don't much care
about the transactionality, but it doesn't seem like we can delete batches
without it).
Deal with the possibility of too many additions by always breaking out gap
record storage and last transaction number updates into their own
transaction(s) (separate from the replay of the original SQL transaction).
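A hedged sketch of the batched deletion, using Guava's Lists.partition; the transaction-manager calls are illustrative rather than the actual replay code:
```java
import static google.registry.persistence.transaction.TransactionManagerFactory.ofyTm;

import com.google.common.collect.Lists;
import java.util.List;

// Datastore allows only about 25 entity mutations per transaction, so expired
// gap records are deleted in small transactional batches.
class GapRecordCleanupSketch {
  private static final int MAX_ENTITIES_PER_TRANSACTION = 25;

  static <T> void deleteInBatches(List<T> expiredGapRecords) {
    for (List<T> batch : Lists.partition(expiredGapRecords, MAX_ENTITIES_PER_TRANSACTION)) {
      ofyTm().transact(() -> batch.forEach(record -> ofyTm().delete(record)));
    }
  }
}
```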
* Track and replay Transaction table gaps
Id gaps in the Transaction table can be the result of transactions committed
out of order. To deal with this, keep track of gaps for up to five minutes
and check whether they've been back-filled prior to applying the next batch
of transactions during replay.
* Changes for review
* Calculate gap expiration time before gap queries
* Reformat.
- Use the standard HttpsURLConnection to write/read data
- Rewrite RdeReporter, Nordn*Action, and Marksdb classes and related
tests to conform to the new format
- Remove FakeURLFetchService and ForwardingUrlFetchService as they weren't used
- Refactor UrlFetchException to UrlConnectionException
- Refactor UrlFetchUtils to UrlConnectionUtils
I will need to test this on Alpha. Fortunately the connections that
don't require auth (e.g. TMDB downloading) should be testable.
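For reference, a bare-bones sketch of the write/read pattern the rewritten classes share (headers, error handling, and retry logic in the real classes are more involved):
```java
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URL;
import javax.net.ssl.HttpsURLConnection;

// Illustrative only: POST a payload and read the response with the standard
// HttpsURLConnection instead of App Engine's UrlFetch.
class UrlConnectionSketch {
  static byte[] post(URL url, byte[] payload) throws Exception {
    HttpsURLConnection connection = (HttpsURLConnection) url.openConnection();
    try {
      connection.setRequestMethod("POST");
      connection.setDoOutput(true);
      connection.setRequestProperty("Content-Type", "text/xml; charset=utf-8");
      try (OutputStream out = connection.getOutputStream()) {
        out.write(payload);
      }
      int responseCode = connection.getResponseCode();
      if (responseCode != HttpsURLConnection.HTTP_OK) {
        throw new IllegalStateException("Unexpected response code: " + responseCode);
      }
      try (InputStream in = connection.getInputStream()) {
        return in.readAllBytes();
      }
    } finally {
      connection.disconnect();
    }
  }
}
```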
* Allow replicateToDatastore to skip gaps
As it turns out, gaps in the transaction id sequence number are expected,
because rollbacks do not roll back sequence numbers.
To deal with this, stop checking for them.
This change is not adequate in and of itself, as it is possible for a gap to
be introduced if two transactions are committed out of order of their sequence
number. We are currently discussing several strategies to mitigate this.
* Remove println, add a record verification
* Fix hanging test
Tests using the TestServerExtension may hang forever if an underlying
component (e.g., testcontainer for psql) fails. This may be the cause
of some Kokoro runs that timed out after three hours.
* Don't reset the update time for TLD updates
It turns out that the reason the Registrar update timestamp isn't updated
for some of the tests is that the record is updated with unchanged values. We
can avoid this problem by not trying to update the registrar to the same value.
So in this case, if the registrar already contains the TLD we're adding, don't
try to add it.
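A hedged sketch of that guard; persistResource() and the builder calls are assumed here:
```java
import com.google.common.collect.ImmutableSet;
import google.registry.model.registrar.Registrar;

// If the TLD is already allowed, skip the save entirely; re-persisting an
// unchanged Registrar is what left the update timestamp untouched in tests.
static Registrar addAllowedTld(Registrar registrar, String tld) {
  if (registrar.getAllowedTlds().contains(tld)) {
    return registrar;
  }
  Registrar updated =
      registrar
          .asBuilder()
          .setAllowedTlds(
              ImmutableSet.<String>builder().addAll(registrar.getAllowedTlds()).add(tld).build())
          .build();
  persistResource(updated);
  return updated;
}
```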
* Fix entity delete replication, compare db @ replay
Replay tests currently only verify that the contents of a transaction can
be successfully replicated to the other database. They do not verify that
the contents of both databases are equivalent. As a result, we miss any
changes omitted from the transaction (as was the case with entity deletions).
This change adds a final database comparison to ReplayExtension so we can
safely say that the databases are in the same state.
This comparison is introduced in part as a unit test for the one-line fix for
replication of an "entity delete" operation (where we delete using an entity
object instead of the object's key) which so far has only affected PollMessage
deletion. The fix is also included in this commit within
JpaTransactionManagerImpl.
* Exclude tests and entities with failing comparisons
* Get all tests to pass and fix more timestamps
Fix most of the unit tests that were broken by this change.
- Fix timestamp updates after grace period changes in DomainContent and for
TLD changes in Registrar.
- Re-enable full database comparison for most of the DomainCreateFlowTest tests.
- Make some test entities NonReplicated so they don't break when used with
jpaTm().delete()
- Disable checking of a few more entity types that are failing comparisons.
- Add some formatting fixes.
* Remove unnecessary "NoDatabaseCompare"
It turns out that after other fixes/elisions we no longer need these for
any tests in DomainCreateFlowTest.
* Changes for review
* Remove old "compare" flag.
* Reformatted.
* Make a few quality-of-life improvements in CloudTasksUtils
1. Update the method names. There are too many overloaded methods and it
is hard to figure out which one does which without checking the
javadoc.
2. Add a method in the task matcher to specify the delay time as a
DateTime, so the caller does not need to convert it to a Timestamp (see the
sketch below).
3. Remove the explicit dependency on a clock when enqueueing a task with a
delay; the clock is now injected directly into the util instance itself.
This is necessary because the Cloud Tasks API is not transactionally enrolled,
so it's possible that multiple tasks might end up being enqueued. We need to be
able to handle them.
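A hedged sketch of the DateTime-friendly delay from item 2 (the method name is illustrative; only the Joda-to-protobuf Timestamp conversion is the point):
```java
import com.google.cloud.tasks.v2.Task;
import com.google.protobuf.util.Timestamps;
import org.joda.time.DateTime;
import org.joda.time.Duration;

// Illustrative helper: callers express the delay in Joda time and the util does
// the protobuf Timestamp conversion that Cloud Tasks requires.
class DelayedTaskSketch {
  static Task withScheduleTime(Task task, DateTime now, Duration delay) {
    return task.toBuilder()
        .setScheduleTime(Timestamps.fromMillis(now.plus(delay).getMillis()))
        .build();
  }
}
```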
* Fix update timestamps for DomainContent types
We expect update timestamps to be updated whenever a containing entity is
modified and persisted, but unfortunately Hibernate doesn't seem to do this --
instead it appears to regard such an entity as unchanged.
To work around this, we explicitly reset the update timestamp whenever a
nested collection is modified in the Builder.
Note that this change only solves the problem for DomainContent. All other
entities containing UpdateAutoTimestamp will need to be audited and
instrumented with a similar change.
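The workaround's shape is roughly the following; the field access and the UpdateAutoTimestamp factory call are assumptions about the Builder code, not a copy of it:
```java
// Sketch only: when a nested collection changes, explicitly clear the stored
// timestamp so it is regenerated on persist, since Hibernate otherwise treats
// the containing entity as unchanged.
public Builder setGracePeriods(ImmutableSet<GracePeriod> gracePeriods) {
  getInstance().gracePeriods = gracePeriods;
  getInstance().updateTimestamp = UpdateAutoTimestamp.create(null);
  return this;
}
```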
* Fix a handful of tests broken by this change
* Reformatted.
* Use CloudTaskUtils to enqueue
* Add CloudTasksUtilsModule to FrontendComponent
* Fix Uri query issue
* Remove header and check service in matcher
* Use a ThreadLocal boolean in TestServer to determine enqueueing
* Extract enqueuing and email sending from tm().transact()