88 Commits

Author SHA1 Message Date
Marko
9379bc92fd fix lint failures with 1.31 (#5489) 2020-10-22 13:36:08 +02:00
Marko
b8d08b9ef4 lint: add errchecks (#5316)
## Description

Work towards enabling errcheck

ref #5059
2020-09-04 11:58:03 +00:00
Marko
dedf0d2350 proto: folder structure adhere to buf (#5025) 2020-06-22 10:00:51 +02:00
Marko
d54de61bf6 consensus: proto migration (#4984)
## Description

migrate consensus to protobuf

Closes: #XXX
2020-06-10 12:08:47 +00:00
Erik Grinaker
db8f1b3df3 migrate all JSON to new JSON encoder (#4975)
Uses new JSON encoder in #4955 for all JSON. Branched off of #4968.
2020-06-08 12:22:59 +00:00
Anton Kaliaev
b7b721c484 change use of errors.Wrap to fmt.Errorf with %w verb
Closes #4603

Commands used (VIM):

```
:args `rg -l errors.Wrap`
:argdo normal @q | update
```

where q is a macros rewriting the `errors.Wrap` to `fmt.Errorf`.
2020-05-12 03:35:47 +00:00
Marko
b7c2d7a977 lint: enable nolintlinter, disable on tests
## Description
- enable nolintlint
- disable linting on tests

Closes: #XXX
2020-05-04 07:49:53 +00:00
Marko
044f1bf288 format: add format cmd & goimport repo (#4586)
* format: add format cmd & goimport repo

- replaced format command
- added goimports to format command
- ran goimports

Signed-off-by: Marko Baricevic <marbar3778@yahoo.com>

* fix outliers & undo proto file changes
2020-03-23 09:19:26 +01:00
Erik Grinaker
fe11219795 Fix unexported returns (#4450) 2020-02-21 19:21:39 +01:00
Marko
7b52f51700 libs/common: Refactor libs/common 5 (#4240)
* libs/common: Refactor libs/common 5

- move mathematical functions and types out of `libs/common` to math pkg
- move net functions out of `libs/common` to net pkg
- move string functions out of `libs/common` to strings pkg
- move async functions out of `libs/common` to async pkg
- move bit functions out of `libs/common` to bits pkg
- move cmap functions out of `libs/common` to cmap pkg
- move os functions out of `libs/common` to os pkg

Signed-off-by: Marko Baricevic <marbar3778@yahoo.com>

* fix testing issues

* fix tests

closes #41417

woooooooooooooooooo kill the cmn pkg

Signed-off-by: Marko Baricevic <marbar3778@yahoo.com>

* add changelog entry

* fix goimport issues

* run gofmt
2019-12-12 09:33:27 +01:00
Marko
27b00cf8d1 libs/common: refactor libs common 3 (#4232)
* libs/common: refactor libs common 3

- move nil.go into types folder and make private
- move service & baseservice out of common into service pkg

ref #4147

Signed-off-by: Marko Baricevic <marbar3778@yahoo.com>

* add changelog entry
2019-12-11 09:31:25 +01:00
Marko
3e2751d274 lint: Enable Golint (#4212)
* Fix many golint errors

* Fix golint errors in the 'lite' package

* Don't export Pool.store

* Fix typo

* Revert unwanted changes

* Fix errors in counter package

* Fix linter errors in kvstore package

* Fix linter error in example package

* Fix error in tests package

* Fix linter errors in v2 package

* Fix linter errors in consensus package

* Fix linter errors in evidence package

* Fix linter error in fail package

* Fix linter errors in query package

* Fix linter errors in core package

* Fix linter errors in node package

* Fix linter errors in mempool package

* Fix linter error in conn package

* Fix linter errors in pex package

* Rename PEXReactor export to Reactor

* Fix linter errors in trust package

* Fix linter errors in upnp package

* Fix linter errors in p2p package

* Fix linter errors in proxy package

* Fix linter errors in mock_test package

* Fix linter error in client_test package

* Fix linter errors in coretypes package

* Fix linter errors in coregrpc package

* Fix linter errors in rpcserver package

* Fix linter errors in rpctypes package

* Fix linter errors in rpctest package

* Fix linter error in json2wal script

* Fix linter error in wal2json script

* Fix linter errors in kv package

* Fix linter error in state package

* Fix linter error in grpc_client

* Fix linter errors in types package

* Fix linter error in version package

* Fix remaining errors

* Address review comments

* Fix broken tests

* Reconcile package coregrpc

* Fix golangci bot error

* Fix new golint errors

* Fix broken reference

* Enable golint linter

* minor changes to bring golint into line

* fix failing test

* fix pex reactor naming

* address PR comments
2019-12-05 10:12:08 +01:00
Anton Kaliaev
43aed989ac cs: clarify where 24 comes from in maxMsgSizeBytes (wal.go) 2019-11-22 12:59:08 +04:00
Ethan Buchman
cbd5e031d6 Add more comments about the hard-coded limits (#4100)
* crypto: expose MaxAunts for documentation purposes

* types: update godoc for new maxes

* docs: make hard-coded limits more explicit

* wal: add todo to clarify max size

* shorten lines in test
2019-11-01 15:16:53 -07:00
Marko
5206ce32a0 Update master (#4087)
* cs: panic only when WAL#WriteSync fails

- modify WAL#Write and WAL#WriteSync to return an error

* fix test

* types: validate Part#Proof

add ValidateBasic to crypto/merkle/SimpleProof

* cs: limit max bit array size and block parts count

* cs: test new limits

* cs: only assert important stuff

* update changelog and bump version to 0.32.7

* fixes after Ethan's review

* align max wal msg and max consensus msg sizes

* fix tests

* fix test

* add change log for 31.11
2019-10-30 12:25:58 -07:00
Phil Salant
bc572217c0 Fix linter errors thrown by lll (#3970)
* Fix long line errors in abci, crypto, and libs packages

* Fix long lines in p2p and rpc packages

* Fix long lines in abci, state, and tools packages

* Fix long lines in behaviour and blockchain packages

* Fix long lines in cmd and config packages

* Begin fixing long lines in consensus package

* Finish fixing long lines in consensus package

* Add lll exclusion for lines containing URLs

* Fix long lines in crypto package

* Fix long lines in evidence package

* Fix long lines in mempool and node packages

* Fix long lines in libs package

* Fix long lines in lite package

* Fix new long line in node package

* Fix long lines in p2p package

* Ignore gocritic warning

* Fix long lines in privval package

* Fix long lines in rpc package

* Fix long lines in scripts package

* Fix long lines in state package

* Fix long lines in tools package

* Fix long lines in types package

* Enable lll linter
2019-10-17 10:42:28 +02:00
Anton Kaliaev
ec9bff5234 rename WAL#Flush to WAL#FlushAndSync (#3345)
* rename WAL#Flush to WAL#FlushAndSync

- rename auto#Flush to auto#FlushAndSync
- cleanup WAL interface to not leak implementation details!
  * remove Group()
  * add WALReader interface and return it in SearchForEndHeight()
- add interface assertions

Refs #3337

* replace WALReader with io.ReadCloser
2019-02-25 09:11:07 +04:00
Anton Kaliaev
ed1de13548 cs: update wal comments (#3334)
* cs: update wal comments

Follow-up to https://github.com/tendermint/tendermint/pull/3300

* Update consensus/wal.go

Co-Authored-By: melekes <anton.kalyaev@gmail.com>
2019-02-21 18:28:02 +04:00
Thane Thomson
dff3deb2a9 cs: sync WAL more frequently (#3300)
As per #3043, this adds a ticker to sync the WAL every 2s while the WAL is running.

* Flush WAL every 2s

This adds a ticker that flushes the WAL every 2s while the WAL is
running. This is related to #3043.

* Fix spelling

* Increase timeout to 2mins for slower build environments

* Make WAL sync interval configurable

* Add TODO to replace testChan with more comprehensive testBus

* Remove extraneous debug statement

* Remove testChan in favour of using system time

As per
https://github.com/tendermint/tendermint/pull/3300#discussion_r255886586,
this removes the `testChan` WAL member and replaces the approach with a
system time-oriented one. In this new approach, we keep track of the
system time at which each flush and periodic flush successfully
occurred.

The naming of the various functions is also updated here to be more
consistent with "flushing" as opposed to "sync'ing".

* Update naming convention and ensure lock for timestamp update

* Add Flush method as part of WAL interface

Adds a `Flush` method as part of the WAL interface to enforce the idea
that we can manually trigger a WAL flush from outside of the WAL. This
is employed in the consensus state management to flush the WAL prior to
signing votes/proposals, as per https://github.com/tendermint/tendermint/issues/3043#issuecomment-453853630

* Update CHANGELOG_PENDING

* Remove mutex approach and replace with DI

The dependency injection approach to dealing with testing concerns could
allow similar effects to some kind of "testing bus"-based approach. This
commit introduces an example of this, where instead of relying on
(potentially fragile) timing of things between the code and the test, we
inject code into the function under test that can signal the test
through a channel.

This allows us to avoid the `time.Sleep()`-based approach previously
employed.

* Update comment on WAL flushing during vote signing

Co-Authored-By: thanethomson <connect@thanethomson.com>

* Simplify flush interval definition

Co-Authored-By: thanethomson <connect@thanethomson.com>

* Expand commentary on WAL disk flushing

Co-Authored-By: thanethomson <connect@thanethomson.com>

* Add broken test to illustrate WAL sync test problem

Removes test-related state (dependency injection code) from the WAL data
structure and adds test code to illustrate the problem with using
`WALGenerateNBlocks` and `wal.SearchForEndHeight` to test periodic
sync'ing.

* Fix test error messages

* Use WAL group buffer size to check for flush

A function is added to `libs/autofile/group.go#Group` in order to return
the size of the buffered data (i.e. data that has not yet been flushed
to disk). The test now checks that, prior to a `time.Sleep`, the group
buffer has data in it. After the `time.Sleep` (during which time the
periodic flush should have been called), the buffer should be empty.

* Remove config root dir removal from #3291

* Add godoc for NewWAL mentioning periodic sync
2019-02-20 09:45:18 +04:00
Anton Kaliaev
0b0a8b3128 cs/wal: refuse to encode msg that is bigger than maxMsgSizeBytes (#3303)
Earlier this week somebody posted this in GoS Riot chat:

```
E[2019-02-12|10:38:37.596] Corrupted entry. Skipping... module=consensus wal=/home/gaia/.gaiad/data/cs.wal/wal err="DataCorruptionError[length 878916964 exceeded maximum possible value of 1048576 bytes]"
E[2019-02-12|10:38:37.596] Corrupted entry. Skipping... module=consensus wal=/home/gaia/.gaiad/data/cs.wal/wal err="DataCorruptionError[length 825701731 exceeded maximum possible value of 1048576 bytes]"
E[2019-02-12|10:38:37.596] Corrupted entry. Skipping... module=consensus wal=/home/gaia/.gaiad/data/cs.wal/wal err="DataCorruptionError[length 1631073634 exceeded maximum possible value of 1048576 bytes]"
E[2019-02-12|10:38:37.596] Corrupted entry. Skipping... module=consensus wal=/home/gaia/.gaiad/data/cs.wal/wal err="DataCorruptionError[length 912418148 exceeded maximum possible value of 1048576 bytes]"
E[2019-02-12|10:38:37.600] Corrupted entry. Skipping... module=consensus wal=/home/gaia/.gaiad/data/cs.wal/wal err="DataCorruptionError[failed to read data: EOF]"
E[2019-02-12|10:38:37.600] Error on catchup replay. Proceeding to start ConsensusState anyway module=consensus err="Cannot replay height 7242. WAL does not contain #ENDHEIGHT for 7241"
E[2019-02-12|10:38:37.861] Error dialing peer module=p2p err="dial tcp 35.183.126.181:26656: i/o timeout
```

Note the length error messages. What has happened is the length field got corrupted probably. I've looked at the code and noticed that we don't check the msg size during encoding. This PR fixes that. It also improves a few error messages in WALDecoder.
2019-02-18 11:23:06 +04:00
Ethan Buchman
dc6567c677 consensus: flush wal on stop (#3297)
Should fix #3295
Also partial fix of #3043
2019-02-12 09:32:40 +04:00
Ethan Buchman
45b70ae031 fix non deterministic test failures and race in privval socket (#3258)
* node: decrease retry conn timeout in test

Should fix #3256

The retry timeout was set to the default, which is the same as the
accept timeout, so it's no wonder this would fail. Here we decrease the
retry timeout so we can try many times before the accept timeout.

* p2p: increase handshake timeout in test

This fails sometimes, presumably because the handshake timeout is so low
(only 50ms). So increase it to 1s. Should fix #3187

* privval: fix race with ping. closes #3237

Pings happen in a go-routine and can happen concurrently with other
messages. Since we use a request/response protocol, we expect to send a
request and get back the corresponding response. But with pings
happening concurrently, this assumption could be violated. We were using
a mutex, but only a RWMutex, where the RLock was being held for sending
messages - this was to allow the underlying connection to be replaced if
it fails. Turns out we actually need to use a full lock (not just a read
lock) to prevent multiple requests from happening concurrently.

* node: fix test name. DelayedStop -> DelayedStart

* autofile: Wait() method

In the TestWALTruncate in consensus/wal_test.go we remove the WAL
directory at the end of the test. However the wal.Stop() does not
properly wait for the autofile group to finish shutting down. Hence it
was possible that the group's go-routine is still running when the
cleanup happens, which causes a panic since the directory disappeared.
Here we add a Wait() method to properly wait until the go-routine exits
so we can safely clean up. This fixes #2852.
2019-02-06 10:24:43 -05:00
Ethan Buchman
39eba4e154 WAL: better errors and new fail point (#3246)
* privval: more info in errors

* wal: change Debug logs to Info

* wal: log and return error on corrupted wal instead of panicing

* fail: Exit right away instead of sending interupt

* consensus: FAIL before handling our own vote

allows to replicate #3089:
- run using `FAIL_TEST_INDEX=0`
- delete some bytes from the end of the WAL
- start normally

Results in logs like:

```
I[2019-02-03|18:12:58.225] Searching for height module=consensus wal=/Users/ethanbuchman/.tendermint/data/cs.wal/wal height=1 min=0 max=0
E[2019-02-03|18:12:58.225] Error on catchup replay. Proceeding to start ConsensusState anyway module=consensus err="failed to read data: EOF"
I[2019-02-03|18:12:58.225] Started node module=main nodeInfo="{ProtocolVersion:{P2P:6 Block:9 App:1} ID_:35e87e93f2e31f305b65a5517fd2102331b56002 ListenAddr:tcp://0.0.0.0:26656 Network:test-chain-J8JvJH Version:0.29.1 Channels:4020212223303800 Moniker:Ethans-MacBook-Pro.local Other:{TxIndex:on RPCAddress:tcp://0.0.0.0:26657}}"
E[2019-02-03|18:12:58.226] Couldn't connect to any seeds module=p2p
I[2019-02-03|18:12:59.229] Timed out module=consensus dur=998.568ms height=1 round=0 step=RoundStepNewHeight
I[2019-02-03|18:12:59.230] enterNewRound(1/0). Current: 1/0/RoundStepNewHeight module=consensus height=1 round=0
I[2019-02-03|18:12:59.230] enterPropose(1/0). Current: 1/0/RoundStepNewRound module=consensus height=1 round=0
I[2019-02-03|18:12:59.230] enterPropose: Our turn to propose module=consensus height=1 round=0 proposer=AD278B7767B05D7FBEB76207024C650988FA77D5 privValidator="PrivValidator{AD278B7767B05D7FBEB76207024C650988FA77D5 LH:1, LR:0, LS:2}"
E[2019-02-03|18:12:59.230] enterPropose: Error signing proposal module=consensus height=1 round=0 err="Error signing proposal: Step regression at height 1 round 0. Got 1, last step 2"
I[2019-02-03|18:13:02.233] Timed out module=consensus dur=3s height=1 round=0 step=RoundStepPropose
I[2019-02-03|18:13:02.233] enterPrevote(1/0). Current: 1/0/RoundStepPropose module=consensus
I[2019-02-03|18:13:02.233] enterPrevote: ProposalBlock is nil module=consensus height=1 round=0
E[2019-02-03|18:13:02.234] Error signing vote module=consensus height=1 round=0 vote="Vote{0:AD278B7767B0 1/00/1(Prevote) 000000000000 000000000000 @ 2019-02-04T02:13:02.233897Z}" err="Error signing vote: Conflicting data"
```

Notice the EOF, the step regression, and the conflicting data.

* wal: change errors to be DataCorruptionError

* exit on corrupt WAL

* fix log

* fix new line
2019-02-04 13:00:06 -05:00
Anton Kaliaev
d178ea9eaf use our logger in autofile/group 2018-11-09 16:29:43 +01:00
goolAdapter
110b07fb3f libs: Call Flush() before rename #2428 (#2439)
* fix Group.RotateFile need call Flush() before rename. #2428
* fix some review issue. #2428
 refactor Group's config: replace  setting member with initial option
* fix a handwriting mistake
* fix a time window error between rename and write.
* fix a syntax mistake.
* change option name Get_ to With_
* fix review issue
* fix review issue
2018-09-25 13:22:45 +02:00
Anton Kaliaev
0e1cd88863 Remove ConsensusParams.TxSize and ConsensusParams.BlockGossip (#2364)
* remove ConsensusParams.TxSize and ConsensusParams.BlockGossip

Refs #2347

* block part size is now fixed

Refs #2347

* use max data size, not max bytes for tx limit

Refs #2347
2018-09-12 15:44:43 -04:00
Zarko Milosevic
7b88172f41 Implement BFT time (#2203)
* Implement BFT time

* set LastValidators when creating state in state helper

for heights >= 2
2018-08-31 19:33:51 -04:00
Dev Ojha
2756be5a59 libs: Remove usage of custom Fmt, in favor of fmt.Sprintf (#2199)
* libs: Remove usage of custom Fmt, in favor of fmt.Sprintf

Closes #2193

* Fix bug that was masked by custom Fmt!
2018-08-10 09:25:57 +04:00
Ethan Buchman
d55243f0e6 fix import paths 2018-07-01 22:36:49 -04:00
Anton Kaliaev
1f22f34edf flush wal group on stop
Refs #1659
Refs https://github.com/tendermint/tmlibs/pull/217
2018-06-04 16:47:44 +04:00
Anton Kaliaev
708f35e5c1 do not look for height in older files if we've seen height - 1
Refs #1600
2018-05-25 15:11:15 +04:00
Anton Kaliaev
f3f5c7f472 we must only return io.EOF to progress to the next file in auto.Group
since we never write msg partially, if we've encountered io.EOF in the
middle of the msg, we must abort
2018-05-25 15:10:51 +04:00
Anton Kaliaev
68f6226bea data is corrupted, but this requires manual intervention
i.e., can't be skipped

and we should only return DataCorruptionError if we can skip a msg safely
2018-05-25 15:10:51 +04:00
Anton Kaliaev
118b86b1ef fix nil panic error
msg is nil and if we continue executing, we'll get nil exception at
`msg.Msg.(....)`
2018-05-25 15:10:51 +04:00
Anton Kaliaev
b9afcbe3a2 fix typo 2018-05-25 15:10:51 +04:00
Ethan Buchman
ee4eb59355 update comments 2018-05-20 16:44:08 -04:00
Ethan Buchman
082a02e6d1 consensus: only fsync wal after internal msgs 2018-05-20 14:40:47 -04:00
Anton Kaliaev
e88f74bb9b remove wal_light setting
Closes #1428
2018-04-11 10:08:03 +02:00
Jae Kwon
fb64314d1c Review from Anton 2018-04-06 13:46:40 -07:00
Ethan Buchman
799beebd36 fix consensus tests 2018-04-05 17:54:26 +03:00
Jae Kwon
45ec5fd170 WIP consensus 2018-04-05 07:05:45 -07:00
Ethan Buchman
a17105fd46 p2p: peer.Key -> peer.ID 2018-01-01 22:39:05 -05:00
Anton Kaliaev
843e1ed400 Updates -> ValidatoSetUpdates 2017-12-19 13:03:39 -06:00
Anton Kaliaev
e57cad6c3f correct maxMsgSizeBytes 2017-12-15 11:42:53 -06:00
Anton Kaliaev
06aece31cf lower the max message size 2017-12-12 13:02:40 -06:00
Anton Kaliaev
af79a2a59e fix error msg 2017-12-11 19:50:05 -06:00
Anton Kaliaev
ee66476d62 set max msg size
otherwise, it is easy to get OutOfMemory panic (somebody can even expoit
this)
2017-12-11 19:48:57 -06:00
Anton Kaliaev
40f9261d48 handle data corruption errors
Refs #573
2017-12-11 19:48:20 -06:00
Anton Kaliaev
90944bb1a2 be specific about what type we're encoding
to be consistent with Decode, which returns TimedWALMessage
2017-12-07 11:45:50 -06:00
Anton Kaliaev
07571741c5 [consensus] remove WAL separator (Refs #785)
We don't really need a separator unless we have complex structures
(rows, cells like RDBMS have https://www.sqlite.org/fileformat.html).
2017-12-07 11:36:46 -06:00