Commit Graph

378 Commits

Author SHA1 Message Date
William Banfield
b92a19b2ce proxy: add initial set of abci metrics (port of #7115) (#9169)
* internal/proxy: add initial set of abci metrics (#7115)

This PR adds an initial set of metrics for use ABCI. The initial metrics enable the calculation of timing histograms and call counts for each of the ABCI methods. The metrics are also labeled as either 'sync' or 'async' to determine if the method call was performed using ABCI's `*Async` methods.

An example of these metrics is included here for reference:
```
tendermint_abci_connection_method_timing_bucket{chain_id="ci",method="commit",type="sync",le="0.0001"} 0
tendermint_abci_connection_method_timing_bucket{chain_id="ci",method="commit",type="sync",le="0.0004"} 5
tendermint_abci_connection_method_timing_bucket{chain_id="ci",method="commit",type="sync",le="0.002"} 12
tendermint_abci_connection_method_timing_bucket{chain_id="ci",method="commit",type="sync",le="0.009"} 13
tendermint_abci_connection_method_timing_bucket{chain_id="ci",method="commit",type="sync",le="0.02"} 13
tendermint_abci_connection_method_timing_bucket{chain_id="ci",method="commit",type="sync",le="0.1"} 13
tendermint_abci_connection_method_timing_bucket{chain_id="ci",method="commit",type="sync",le="0.65"} 13
tendermint_abci_connection_method_timing_bucket{chain_id="ci",method="commit",type="sync",le="2"} 13
tendermint_abci_connection_method_timing_bucket{chain_id="ci",method="commit",type="sync",le="6"} 13
tendermint_abci_connection_method_timing_bucket{chain_id="ci",method="commit",type="sync",le="25"} 13
tendermint_abci_connection_method_timing_bucket{chain_id="ci",method="commit",type="sync",le="+Inf"} 13
tendermint_abci_connection_method_timing_sum{chain_id="ci",method="commit",type="sync"} 0.007802058000000001
tendermint_abci_connection_method_timing_count{chain_id="ci",method="commit",type="sync"} 13
```

These metrics can easily be graphed using prometheus's `histogram_quantile(...)` method to pick out a particular quantile to graph or examine. I chose buckets that were somewhat of an estimate of expected range of times for ABCI operations. They start at .0001 seconds and range to 25 seconds. The hope is that this range captures enough possible times to be useful for us and operators.

* fixup

* fixups

* docs: add abci timing metrics to the metrics docs (#7311)

* format table
2022-08-05 13:29:00 -04:00
Callum Waters
4206a0e9b7 remove v1 and v2 blocksync protocol impementations (#9146) 2022-08-02 11:30:28 +02:00
Callum Waters
49ec3b9780 chore: lint repo (use american english) (#9144) 2022-08-01 14:24:49 +02:00
Joe Abbey
4a1df4911d fix: "Lazy" Stringers to defer Sprintf and Hash until logs print (#8845) 2022-06-23 14:56:34 -04:00
mergify[bot]
624bbac8f6 blocksync: validate block before persisting it (backport #8493) (#8495) 2022-05-11 16:20:23 +02:00
mergify[bot]
4e25703d58 rpc/jsonrpc/server: return an error in WriteRPCResponseHTTP(Error) (bp #6204) (#6230)
* rpc/jsonrpc/server: return an error in WriteRPCResponseHTTP(Error) (#6204)

instead of panicking
Closes #5529

(cherry picked from commit 00b9524168)

# Conflicts:
#	CHANGELOG_PENDING.md
#	rpc/jsonrpc/server/http_json_handler.go
#	rpc/jsonrpc/server/http_server.go
#	rpc/jsonrpc/server/http_server_test.go
#	rpc/jsonrpc/server/http_uri_handler.go

* resolve conflicts

* fix linting

* fix conflict

Co-authored-by: Anton Kaliaev <anton.kalyaev@gmail.com>
Co-authored-by: Marko Baricevic <marbar3778@yahoo.com>
2021-03-17 14:55:05 +00:00
Erik Grinaker
3185bb8b22 blockchain/v0: stop tickers on poolRoutine exit (#5860)
Fixes #5841.
2021-01-06 17:27:51 +01:00
Erik Grinaker
2eba38051a blockchain/v2: fix missing mutex unlock (#5862)
Fixes #5843.
2021-01-06 17:27:51 +01:00
Anton Kaliaev
b1328db07f modify Reactor priorities (#5826) (#5830)
blockchain/vX reactor priority was decreased because during the normal operation
(i.e. when the node is not fast syncing) blockchain priority can't be
the same as consensus reactor priority. Otherwise, it's theoretically possible to
slow down consensus by constantly requesting blocks from the node.

NOTE: ideally blockchain/vX reactor priority would be dynamic. e.g. when
the node is fast syncing, the priority is 10 (max), but when it's done
fast syncing - the priority gets decreased to 5 (only to serve blocks
for other nodes). But it's not possible now, therefore I decided to
focus on the normal operation (priority = 5).

evidence and consensus critical messages are more important than
the mempool ones, hence priorities are bumped by 1 (from 5 to 6).

statesync reactor priority was changed from 1 to 5 to be the same as
blockchain/vX priority.

Refs https://github.com/tendermint/tendermint/issues/5816
2020-12-23 18:05:14 +01:00
Tess Rinearson
2a4fd3804c blockchain/v1: omit incoming message bytes from log 2020-12-04 12:18:14 +01:00
Tess Rinearson
0d9606e1b4 reactors: omit incoming message bytes from reactor logs (#5743)
After a reactor has failed to parse an incoming message, it shouldn't output the "bad" data into the logs, as that data is unfiltered and could have anything in it. (We also don't think this information is helpful to have in the logs anyways.)
2020-12-04 12:18:14 +01:00
Anton Kaliaev
54a0940e40 blockchain/v2: remove peers from the processor (#5607)
after they were pruned by the scheduler

Closes #5513
2020-11-05 16:55:11 +04:00
Anton Kaliaev
25fafb30b5 blockchain/v2: make the removal of an already removed peer a noop (#5553)
also, since multiple StopPeerForError calls may be executed in parallel,
only execute StopPeerForError once

Closes #5541
2020-11-05 14:48:31 +04:00
Marko
9379bc92fd fix lint failures with 1.31 (#5489) 2020-10-22 13:36:08 +02:00
Marko
41ab199378 blockchain/v1: add noBlockResponse handling (#5401)
## Description

Add simple `NoBlockResponse` handling to blockchain reactor v1. I tested before and after with erik's e2e testing and was not able to reproduce the inability to sync after the changes were applied

Closes: #5394
2020-10-22 13:08:12 +02:00
Anton Kaliaev
020edbc11d blockchain/v2: fix panic: processed height X+1 but expected height X (#5530)
Before: scheduler receives psBlockProcessed event, but does not mark block as processed because peer timed out (or was removed for other reasons) and all associated blocks were rescheduled.

After: scheduler receives psBlockProcessed event and marks block as processed in any case (even if peer who provided this block errors).

Closes #5387
2020-10-21 13:28:41 +04:00
Anton Kaliaev
79d535dd67 blockchain/v2: fix "panic: duplicate block enqueued by processor" (#5499)
When a peer is stopped due to some network issue, the Reactor calls scheduler#handleRemovePeer, which removes the peer from the scheduler. BUT the peer stays in the processor, which sometimes could lead to "duplicate block enqueued by processor" panic WHEN the same block is requested by the scheduler again from a different peer. The solution is to return scPeerError, which will be propagated to the processor. The processor will clean up the blocks associated with the peer in purgePeer.

Closes #5513, #5517
2020-10-21 13:26:20 +04:00
Marko
09982ae407 backport block size fixes (#5492)
* mempool: length prefix txs when getting them from mempool (#5483)

* correctly calculate evidence data size (#5482)

* block: use commit sig size instead of vote size (#5490)

* tx: reduce function to one parameter (#5493)
2020-10-13 18:07:54 +02:00
Callum Waters
ed002cea7e evidence: introduction of LightClientAttackEvidence and refactor of evidence lifecycle (#5361)
evidence: modify evidence types (#5342)

light: detect light client attacks (#5344)

evidence: refactor evidence pool (#5345)

abci: application evidence prepared by evidence pool (#5354)
2020-09-22 10:22:54 +02:00
Marko
56911ee352 state: define interface for state store (#5348)
## Description

Make an interface for the state store. 

Closes: #5213
2020-09-15 07:45:48 +00:00
Marko
6ab2a19088 header: check block protocol (#5340)
## Description

Check block protocol version in header validate basic. 

I tried searching for where we check the P2P protocol version but was unable to find it. When we check compatibility with a node we check we both have the same block protocol and are on the same network, but we do not check if we are on the same P2P protocol. It makes sense if there is a handshake change because we would not be able to establish a secure connection, but a p2p protocol version bump may be because of a p2p message change, which would go unnoticed until that message is sent over the wire.  Is this purposeful?

Closes: #4790
2020-09-09 09:13:18 +00:00
Marko
0ed8dba991 lint: enable errcheck (#5336)
## Description

Enable errcheck linter throughout the codebase

Closes: #5059
2020-09-07 15:03:18 +00:00
Marko
135ac0400e blockchain: verify +2/3 (#5278)
## Description

Verify only +2/3 of the commit. 

Closes: #5259
2020-08-25 07:07:19 +00:00
Erik Grinaker
edf5cff80f blockchain: fix fast sync halt with initial height > 1 (#5249)
Blockchain reactors were not updated to handle arbitrary initial height after #5191.
2020-08-14 13:04:51 +00:00
Marko
40bd416d59 test: protobuf vectors for reactors (#5221)
## Description

Add test vectors for all reactors

- [x] state-sync
- [x] privval
- [x] mempool
- [x] p2p
- [x] evidence
- [ ] light?

this PR is primarily oriented at testvectors for things going over the wire. should we expand the testvectors into types as well?

Closes: #XXX
2020-08-11 14:00:11 +00:00
Erik Grinaker
f66b7a8e32 merkle: return hashes for empty merkle trees (#5193)
Fixes #5192.

@liamsi Can you verify that the test vectors match the Rust implementation? I updated `ProofsFromByteSlices()` as well, anything else that should be updated?
2020-08-11 10:31:05 +00:00
n-hutton
375f0c819f add fixes for flaky tests (#5146)
While working on tendermint my colleague @jinmannwong fixed a few of the unit tests that we found to be flaky in our CI. We thought that you might find this useful, see below for comments.
2020-07-27 10:36:56 +04:00
Marko
2ac5a559b4 libs: wrap mutexes for build flag with godeadlock (#5126)
## Description

This PR wraps the stdlib sync.(RW)Mutex & godeadlock.(RW)Mutex. This enables using go-deadlock via a build flag instead of using sed to replace sync with godeadlock in all files

Closes: #3242
2020-07-20 07:55:09 +00:00
Marko
7c8c356f71 ci: version linter fix (#5128)
## Description
linter version fix and run make format to have all ci run

Closes: #XXX
2020-07-16 09:01:02 +00:00
Marko
6ccccb0933 lint: errcheck (#5091)
## Description

add more error checks to tests


gonna do a third PR that tackles the non test cases
2020-07-14 11:04:41 +00:00
Anton Kaliaev
730e16566e proto: change type + a cleanup (#5107)
- drop Height & Base from StatusRequest
It does not make sense nor it's used anywhere currently. Also, there
seem to be no trace of these fields in the ADR-40 (blockchain reactor
v2).

- change PacketMsg#EOF type from int32 to bool
2020-07-13 10:24:17 +00:00
Lei Wang
430162f8a1 Update reactor.go (#5088)
check bcR.fastSync flag when "OnStop"

fix "service/service.go:161	Not stopping BlockPool -- have not been started yet	{"impl": "BlockPool"}" error when kill process
2020-07-07 09:47:49 +00:00
Marko
943bbd75a4 blockchain: test vectors for proto encoding (#5073)
## Description

this PR adds test vectors for proto encoding. the main difference from amino was the removal of four bytes due to interface encoding.

should i add more cases?

Closes: #XXX
2020-07-02 13:48:31 +00:00
Marko
7e2cc1db5e linter: (1/2) enable errcheck (#5064)
## Description

partially cleanup in preparation for errcheck

i ignored a bunch of defer errors in tests but with the update to go 1.14 we can use `t.Cleanup(func() { if err := <>; err != nil {..}}` to cover those errors, I will do this in pr number two of enabling errcheck.

ref #5059
2020-07-01 15:13:11 +00:00
Marko
dedf0d2350 proto: folder structure adhere to buf (#5025) 2020-06-22 10:00:51 +02:00
Marko
51da4fe356 types: rename partsheader to partsetheader (#5029)
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2020-06-22 09:38:03 +02:00
Marko
f6243d8b9e privval: migrate to protobuf (#4985) 2020-06-11 11:54:02 +02:00
Marko
a89f2581fc blockchain: proto migration (#4969)
## Description

migration of blockchain reactors to protobuf

Closes: #XXX
2020-06-08 12:45:03 +00:00
Erik Grinaker
b76b270a23 blockchain/v2: correctly set block store base in status responses (#4971)
See: https://github.com/tendermint/tendermint/pull/4969#pullrequestreview-425298225
2020-06-05 15:18:12 +00:00
Marko
a88537bb88 ints: stricter numbers (#4939) 2020-06-04 16:34:56 +02:00
Erik Grinaker
b9a0d47f14 test/blockchain/v0: mitigate test data race (#4886)
Mitigates the below data race. The proper fix involves not fiddling with reactor internals, which needs a rewrite of the test and possible additional reactor infrastructure.

```
==================
WARNING: DATA RACE
Write at 0x00c001118e78 by goroutine 187:
  github.com/tendermint/tendermint/blockchain/v0.TestBadBlockStopsPeer()
      /go/src/github.com/tendermint/tendermint/blockchain/v0/reactor_test.go:234 +0x9d7
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:992 +0x1eb

Previous read at 0x00c001118e78 by goroutine 326:
  [failed to restore the stack]

Goroutine 187 (running) created at:
  testing.(*T).Run()
      /usr/local/go/src/testing/testing.go:1043 +0x660
  testing.runTests.func1()
      /usr/local/go/src/testing/testing.go:1285 +0xa6
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:992 +0x1eb
  testing.runTests()
      /usr/local/go/src/testing/testing.go:1283 +0x527
  testing.(*M).Run()
      /usr/local/go/src/testing/testing.go:1200 +0x2ff
  main.main()
      _testmain.go:112 +0x337

Goroutine 326 (running) created at:
  github.com/tendermint/tendermint/blockchain/v0.(*BlockchainReactor).OnStart()
      /go/src/github.com/tendermint/tendermint/blockchain/v0/reactor.go:118 +0x12c
  github.com/tendermint/tendermint/libs/service.(*BaseService).Start()
      /go/src/github.com/tendermint/tendermint/libs/service/service.go:140 +0x504
  github.com/tendermint/tendermint/blockchain/v0.(*BlockchainReactor).Start()
      <autogenerated>:1 +0x43
  github.com/tendermint/tendermint/p2p.(*Switch).OnStart()
      /go/src/github.com/tendermint/tendermint/p2p/switch.go:225 +0x120
  github.com/tendermint/tendermint/libs/service.(*BaseService).Start()
      /go/src/github.com/tendermint/tendermint/libs/service/service.go:140 +0x504
  github.com/tendermint/tendermint/p2p.StartSwitches()
      /go/src/github.com/tendermint/tendermint/p2p/test_util.go:168 +0x75
  github.com/tendermint/tendermint/p2p.MakeConnectedSwitches()
      /go/src/github.com/tendermint/tendermint/p2p/test_util.go:89 +0x17d
  github.com/tendermint/tendermint/blockchain/v0.TestBadBlockStopsPeer()
      /go/src/github.com/tendermint/tendermint/blockchain/v0/reactor_test.go:209 +0x768
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:992 +0x1eb
==================
panic: BlockStore can only save contiguous blocks. Wanted 149, got 147

goroutine 1259 [running]:
github.com/tendermint/tendermint/store.(*BlockStore).SaveBlock(0xc000ff9cc0, 0xc001997180, 0xc0010c6a00, 0xc0013b3000)
	/go/src/github.com/tendermint/tendermint/store/store.go:276 +0xbc4
github.com/tendermint/tendermint/blockchain/v0.(*BlockchainReactor).poolRoutine(0xc001118d00, 0x107c000)
	/go/src/github.com/tendermint/tendermint/blockchain/v0/reactor.go:355 +0xe90
created by github.com/tendermint/tendermint/blockchain/v0.(*BlockchainReactor).OnStart
	/go/src/github.com/tendermint/tendermint/blockchain/v0/reactor.go:118 +0x12d
FAIL	github.com/tendermint/tendermint/blockchain/v0	11.447s
FAIL
```
2020-05-26 11:51:37 +00:00
Callum Waters
970cbbad6d blockchain[v1]: increased timeout times for peer tests (#4871) 2020-05-25 17:33:13 +02:00
Marko
9149ee7d8b lint: various fixes
## Description

various linitng fixes
2020-05-18 10:20:06 +00:00
Anton Kaliaev
b7b721c484 change use of errors.Wrap to fmt.Errorf with %w verb
Closes #4603

Commands used (VIM):

```
:args `rg -l errors.Wrap`
:argdo normal @q | update
```

where q is a macros rewriting the `errors.Wrap` to `fmt.Errorf`.
2020-05-12 03:35:47 +00:00
Erik Grinaker
eb443f4b77 blockchain/v2: integrate with state sync
Integrates the blockchain v2 reactor with state sync, fixes #4765. This mostly involves deferring fast syncing until after state sync completes. I tried a few different approaches, this was the least effort:

* `Reactor.events` is `nil` if no fast sync is in progress, in which case events are not dispatched - most importantly `AddPeer`.

* Accept status messages from unknown peers in the scheduler and register them as ready. On fast sync startup, broadcast status requests to all existing peers.

* When switching from state sync, first send a `bcResetState` message to the processor and scheduler to update their states - most importantly the initial block height.

* When fast sync completes, shut down event loop, scheduler and processor, and set `events` channel to `nil`.
2020-05-07 10:34:01 +00:00
Callum Waters
47cfadb0aa evidence: refactor evidence mocks throughout packages (#4787)
Predominantly following the discussions regarding the conventions of using mocks, I have decided to revert back to the previous state where mocks were specialized and stored in the separate packages that used them rather then have a generalized mock in the evidence package.

This also was a problem as the state package were running tests too slow and occasionally timing out unnecessarily.

For the replay file I renamed mockEvidencePool to emptyEvidencePool to illustrate that it was intentionally like this and not give the impression that testing software was being used in production

Closes: #4786
2020-05-05 13:11:44 +04:00
Marko
b7c2d7a977 lint: enable nolintlinter, disable on tests
## Description
- enable nolintlint
- disable linting on tests

Closes: #XXX
2020-05-04 07:49:53 +00:00
Callum
15a9f1760d move mempool mock directory 2020-04-29 15:25:01 +02:00
Callum Waters
ba967c077b Merge branch 'master' into callum/4728-check-evidence 2020-04-29 14:04:04 +02:00
Erik Grinaker
8108ac9d17 blockchain/v2: respect fast_sync option (#4772)
Not thoroughly tested, but seems to work. Will do further testing as this is integrated with state sync.

Fixes #4688.
2020-04-29 13:54:37 +02:00